

SIMULATING THE IMPACT OF

MARKER-ASSISTED SELECTION

IN A WHEAT BREEDING

PROGRAM

Narelle Lee Kruger B.Agr.Sc (Hons I)

The University of Queensland

A thesis submitted for the degree of Doctor of Philosophy

The University of Queensland Australia

School of Land and Food Sciences

February 2005


Declaration of Originality

This thesis is the original work of the author, except as otherwise indicated.

It has not been submitted previously for a degree at any University.

Narelle Lee Kruger


Acknowledgements

I would like to thank my supervisors Mark Cooper, Kaye Basford and Dean Podlich. They have provided countless hours of direction, guidance, assistance and support to me throughout this research and I appreciate the time they have given up to see this work through to the end. Thank you also to Mark and Dean’s families who let me into their homes while I was visiting them in the USA.

I thank Chris Winkler at Pioneer Hi-bred International and Pioneer Hi-bred International for accommodating me on my visits to Des Moines, USA.

I would like to thank all the QTL detection analysis software programmers who helped me via email, especially Friedrich Utz, who helped to ensure PLABQTL would run on our computer systems.

Thank you to the Australian Grains Research and Development Corporation for financial support as a Grains Research Scholar. The Graduate School Research Travel Award from The University of Queensland was invaluable as a mechanism for visiting Mark and Dean in the USA to ensure this work was completed.

Thanks to my good friends and colleagues Nicole Jensen, Jo Stringer, Kevin Micallef, Hunter Laidlaw, Ky Mathews and Allan Rattey, who made studying at UQ immensely enjoyable. You have all provided me with invaluable advice in your areas of expertise, and have either been through, or are presently immersed in the PhD process.

To Chris, who I truly love, for without this thesis we would never have met. Thank you for everything.

Finally, thanks to Mum, Dad, Shane, Karen and Debra and the rest of my family who supported me through the whole process, even when the light seemed to be moving away faster than I was travelling. I love and miss you all.


Abstract

The wheat Germplasm Enhancement Program, managed from the University of Queensland, was developed to provide a source of high yielding and high quality wheat germplasm to the pedigree breeding programs run by the Leslie Research Centre at Toowoomba and the Plant Breeding Institute of the University of Sydney at Narrabri. Investigating the feasibility of introducing marker-assisted selection into the Germplasm Enhancement Program was considered an important step in an attempt to increase genetic gains for this breeding program. Implementing and testing marker-assisted selection in the Germplasm Enhancement Program as an empirical experiment would be costly and time consuming. By examining through simulation the impact of marker-assisted selection in combination with S1 family (the current approach) and doubled haploid line selection strategies, it was feasible to determine their ability to contribute towards accelerated rates of response to selection.

The aim of most wheat breeding programs is to develop commercially viable cultivars that are superior in performance (quality and yield stability) to those presently being grown in the target production system. Until recently, producing a superior cultivar has been based on a combination of experiences, quantitative genetic theory predictions and the outcomes of the laborious work involved in empirical studies. Empirical experimentation will always be essential; however, simulation provides a methodology to extend the basic quantitative genetics theoretical prediction equations by relaxing some key assumptions applied to make the mathematical equations tractable. The simulation work in this thesis was conducted using the QU-GENE (QUantitative-GENEtics) simulation platform developed at the University of Queensland (Podlich and Cooper 1998). To ensure that the simulation model was an accurate extension of the theory, it was important to test the consistency and convergence of the different strategies for deriving expectations of selection. It was found that under simple additive models, the simulation accurately modelled multi-genic recombination and produced the same results as the prediction equations. It was also observed that departure from the simple additive model frequently invalidated the normality assumption held by theory and caused the expectations from the prediction equations to overestimate the response compared to the simulations.
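The kind of consistency check described above can be illustrated with a minimal Python sketch. It is not QU-GENE and does not reproduce the thesis experiments. It assumes the standard two-locus expectation under random mating, D_t = D_0(1 - r)^t, where D is the linkage disequilibrium between two loci and r is the per meiosis recombination fraction, and compares that expectation with a simple gamete-pool simulation. The function names, pool size and seed are hypothetical choices made for illustration.

```python
import random


def expected_ld(d0, r, generations):
    """Expected linkage disequilibrium after random mating: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations


def measure_ld(gametes):
    """D = freq(AB) - freq(A) * freq(B) for two-locus gametes coded as (0/1, 0/1)."""
    n = len(gametes)
    p_a = sum(g[0] for g in gametes) / n
    p_b = sum(g[1] for g in gametes) / n
    p_ab = sum(1 for g in gametes if g == (1, 1)) / n
    return p_ab - p_a * p_b


def simulate_ld(r, generations, n_gametes=50_000, seed=1):
    """Simulate random mating from a base of half AB and half ab gametes (D_0 = 0.25)."""
    rng = random.Random(seed)
    half = n_gametes // 2
    pool = [(1, 1)] * half + [(0, 0)] * (n_gametes - half)
    for _ in range(generations):
        new_pool = []
        for _ in range(n_gametes):
            # Two gametes drawn at random form a diploid parent.
            g1 = rng.choice(pool)
            g2 = rng.choice(pool)
            if rng.random() < r:
                # Recombinant gamete: first locus from g1, second locus from g2.
                new_pool.append((g1[0], g2[1]))
            else:
                # Non-recombinant gamete: g1 is passed on unchanged.
                new_pool.append(g1)
        pool = new_pool
    return measure_ld(pool)


if __name__ == "__main__":
    r = 0.2
    for t in (0, 1, 5, 10):
        print(t, round(expected_ld(0.25, r, t), 4), round(simulate_ld(r, t), 4))
```

With a large gamete pool the simulated values should track the expectation closely, which is the sense in which a simulation can be checked against theory before it is used for models where no closed-form prediction exists.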

Reliable detection of quantitative trait loci (QTL) is a critical step in conducting marker-assisted selection in a breeding program. After comparing a number of programs, PLABQTL (Utz and Melchinger 1996) was selected as the QTL detection analysis program to be used throughout this thesis. The modelling of multiple QTL scenarios for a simulated wheat genome was examined to determine the extent to which the wheat genome needed to be represented in the simulation experiment to examine the reliability of the detection of QTL. Representing the full wheat genome did not change the conclusions compared to simulations based on a reduced genome model. For example, it was found that a model based on 12 chromosomes, 12 QTL and two flanking markers per QTL could be used in place of a 21 chromosome, 12 QTL, and eight flanking markers per QTL model. An advantage of the reduced genome size in the simulation experiments was a saving in the time taken for the QTL analysis to complete. As approximately 45 million simulation experiments were analysed in this thesis, this represented a significant saving in time.

Mapping population size, heritability and per meiosis recombination fraction between a marker and a quantitative trait locus each influenced the detection of QTL. The number of QTL detected in this study generally increased as the heritability increased, the per meiosis recombination fraction became smaller, the mapping population size was increased or when two or more of these variables were combined. This work has reinforced that the recommended threshold mapping population size of 500 to 1000 individuals is required for confidence in the power of the mapping study for QTL detection (Beavis 1998, Ober and Cox 1998, Holland 2004).
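The direction of these three effects can be illustrated with a small Python sketch. It does not reproduce the PLABQTL analysis used in the thesis: it swaps in a much simpler single-marker t-test on one QTL, assumes doubled-haploid-like lines in which the marker and QTL alleles are separated with probability r, and uses a hypothetical detection threshold of |t| > 3.29. All function names, parameter values and the threshold are assumptions made for illustration only.

```python
import math
import random
import statistics


def simulate_detection(n_lines, h2, r, rng, threshold=3.29):
    """Return True if a single marker linked to one QTL passes the test threshold."""
    # QTL effects are coded -1/+1, so the QTL variance is 1 and Ve = (1 - h2) / h2.
    error_sd = math.sqrt((1.0 - h2) / h2)
    groups = {1: [], -1: []}
    for _ in range(n_lines):
        qtl = rng.choice((-1, 1))
        # With probability r the marker allele is recombined away from the QTL allele.
        marker = -qtl if rng.random() < r else qtl
        phenotype = qtl + rng.gauss(0.0, error_sd)
        groups[marker].append(phenotype)
    a, b = groups[1], groups[-1]
    if len(a) < 2 or len(b) < 2:
        return False  # too few lines in one marker class to test
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(t_stat) > threshold


def detection_power(n_lines, h2, r, reps=200, seed=1):
    """Proportion of simulated mapping populations in which the QTL is detected."""
    rng = random.Random(seed)
    hits = sum(simulate_detection(n_lines, h2, r, rng) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for n in (50, 200, 500, 1000):
        for h2 in (0.1, 0.5):
            for r in (0.05, 0.2):
                print(n, h2, r, detection_power(n, h2, r))
```

Under these assumptions the estimated power rises with population size and heritability and falls as the recombination fraction grows, which is the qualitative pattern reported above; the absolute values depend entirely on the simplified model and should not be read as results from the thesis.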

Complexities were simulated through the addition of epistasis and genotype-by-environment (G×E) interaction into the genetic models to determine their impact on the detection of QTL and on response to selection. These interactions have been shown experimentally to be important factors influencing grain yield variation in the reference population of the Germplasm Enhancement Program. Digenic epistatic networks were found to have no effect on the detection of QTL under the models tested, while more complex epistatic networks involving a large number of genes did have an effect. Genotype-by-environment interactions were found to influence the detection of QTL in a mapping population due to the complications they can cause in the phenotyping of individuals, and were particularly influential where QTL had different effects on trait phenotypes in different environmental conditions. Epistasis and G×E interactions were also found to cause a decrease in the response to selection for the breeding strategies when they were included in the genetic models.

For the range of quantitative trait genetic models considered, marker-assisted selection produced a greater response to selection than phenotypic selection and marker selection. The results of this simulation study indicated that a breeding strategy based on a combination of doubled haploid lines and marker-assisted selection was likely to produce the greatest response to selection for quantitative traits across a wide range of simple to complex genetic models.


List of Publications

Principal Author

Kruger NL, Cooper M and Podlich DW (2002) Comparison of phenotypic, marker and marker-assisted selection strategies in an S1 family recurrent selection strategy. In: JA McComb (ed.) 'Plant Breeding for the 11th Millennium'. Proceedings of the 12th Australasian Plant Breeding Conference, 15-20 September 2002. Perth, W. Australia: Australasian Plant Breeding Association Inc. pp. 696-701.

Kruger NL, Cooper M, Podlich DW, Jensen NM and Basford KE (2001) The effect of population size on QTL detection in recombinant inbred lines. In: G Hollamby, T Rathjen, R Eastwood and N Gororo (eds). Wheat Breeding Society of Australia Inc. 10th Assembly Proceedings. Mildura, Australia. pp. 194-196.

Kruger NL (1999) Simulation analysis of doubled haploids in a wheat breeding program. The University of Queensland, School of Land and Food Sciences, Plant Improvement Group Research Report No. 5.

Kruger NL, Podlich DW and Cooper M (1999) Comparison of S1 and doubled haploid recurrent selection strategies by computer simulation with applications for the Germplasm Enhancement Program of the Northern Wheat Improvement Program. In: P Williamson, P Banks, I Haak, J Thompson and AW Campbell (eds). Proceedings of the Ninth Assembly Wheat Breeding Society of Australia - Vision 2020. Toowoomba: The University of Southern Queensland. pp. 216-219.

Co-author

Cooper M, Podlich DW, Micallef KP, Smith OS, Jensen NM, Chapman SC and Kruger NL (2001) Complexity, quantitative traits and plant breeding: a role for simulation modeling in the genetic improvement of crops. In: MS Kang (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International: Wallingford, UK. pp. 143-166.


Table of Contents

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF PUBLICATIONS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

PART I BACKGROUND

CHAPTER 1 INTRODUCTION

CHAPTER 2 REVIEW OF LITERATURE
2.1 INTRODUCTION
2.2 PLANT BREEDING PROGRAMS: A REVIEW OF TRADITIONAL AND MOLECULAR SELECTION TECHNIQUES
2.2.1 Traditional selection
2.2.2 Indirect selection
2.2.2.1 Recombination and linkage
2.2.2.2 Generating genetic maps
2.2.2.3 Detecting QTL
2.2.2.4 Statistical methods used to detect QTL
2.2.2.5 Statistical issues to consider when detecting QTL
2.2.2.6 Marker-assisted selection
2.3 THE GERMPLASM ENHANCEMENT PROGRAM
2.4 GENOTYPE-ENVIRONMENT FACTORS INFLUENCING RESPONSE TO SELECTION
2.4.1 Introduction
2.4.2 Epistasis
2.4.3 G×E interactions
2.5 A ROLE FOR COMPUTER SIMULATION IN THE ANALYSIS OF GENETIC SYSTEMS
2.5.1 Background
2.5.2 The QU-GENE simulation platform
2.6 SYNOPSIS FROM LITERATURE

CHAPTER 3 MODELLING METHODOLOGY
3.1 INTRODUCTION
3.2 ITERATIVE MODELLING PROCESS
3.2.1 Propose the relevant questions
3.2.2 Define the proposed simulation experiment or module
3.2.3 Develop and test the QU-GENE software
3.2.4 Finalise the design of the simulation experiment
3.2.5 Implementation of the simulation experiment
3.2.6 Compilation of results of the simulation experiment
3.2.7 Analysis and interpretation of the simulation experiment
3.2.8 Evaluate the results of the simulation experiment in relation to the questions posed
3.3 QUESTIONS PROPOSED FOR THE THESIS


PART II SIMULATION AS A MODELLING APPROACH

CHAPTER 4 EXAMINING THE CONSISTENCY BETWEEN PREDICTIONS FROM QUANTITATIVE GENETIC EQUATIONS AND QU-GENE SIMULATIONS OF KEY GENETIC PROCESSES REQUIRED FOR MODELLING SELECTION RESPONSE
4.1 INTRODUCTION
4.2 RECOMBINATION PREDICTION EQUATIONS
4.2.1 Materials and Methods
4.2.1.1 Recombination and linkage disequilibrium
4.2.1.2 Theory underlying the breaking of linkage
4.2.1.3 QU-GENE simulation of recombination
4.2.2 Results
4.2.2.1 Recombination and linkage disequilibrium
4.3 RESPONSE TO SELECTION PREDICTION EQUATIONS
4.3.1 Materials and Methods
4.3.1.1 Theoretical prediction equations for mass, S1 family, and DH line selection methods
4.3.1.1.1 Basic response to selection prediction equation
4.3.1.1.2 Comstock’s response to selection prediction equations
4.3.1.2 Simulating mass, S1 family and DH line selection methods
4.3.1.2.1 Investigating convergence of expectation from prediction theory and simulation
4.3.1.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium
4.3.2 Results
4.3.2.1 Response to selection prediction equations
4.3.2.1.1 Investigating convergence of expectation from prediction theory and simulation
4.3.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium
4.4 DISCUSSION
4.5 CONCLUSION

CHAPTER 5 COMPARING QTL DETECTION ANALYSIS PROGRAMS AND SIMULATING THE WHEAT GENOME IN QU-GENE
5.1 INTRODUCTION
5.2 SELECTING A QTL DETECTION PROGRAM TO BE USED IN THIS THESIS
5.2.1 Materials and Methods
5.2.1.1 Genetic models
5.2.1.2 Creating the mapping population and generating the linkage groups
5.2.1.3 Conducting the QTL detection analysis
5.2.2 Results
5.2.3 Discussion
5.2.4 Conclusion
5.3 MODELLING THE WHEAT GENOME FOR QTL DETECTION ANALYSIS USING PLABQTL
5.3.1 Materials and Methods
5.3.1.1 Genetic models
5.3.1.2 Creating the mapping population and generating the linkage groups
5.3.1.3 Conducting the QTL detection analysis
5.3.2 Results
5.3.3 Discussion
5.3.4 Conclusion


PART III FACTORS AFFECTING THE POWER OF QTL DETECTION

CHAPTER 6 EFFECT OF MAPPING POPULATION SIZE, PER MEIOSIS RECOMBINATION FRACTION AND HERITABILITY ON QTL DETECTION
6.1 INTRODUCTION
6.2 MATERIALS AND METHODS
6.2.1 Genetic models
6.2.2 Creating the mapping population and generating the linkage groups
6.2.3 Conducting the QTL detection analysis
6.2.4 Conducting the statistical analyses
6.3 RESULTS
6.4 DISCUSSION
6.5 CONCLUSION

CHAPTER 7 THE EFFECT OF GENOTYPE-BY-ENVIRONMENT INTERACTIONS AND DIGENIC EPISTATIC NETWORKS ON QTL DETECTION
7.1 INTRODUCTION
7.2 MATERIALS AND METHODS
7.2.1 Genetic models
7.2.1.1 Core model
7.2.1.2 Digenic epistatic models; E(NK) = 1(10:1)
7.2.1.3 G×E interaction models; E(NK) = 1(10:0), 2(10:0), 5(10:0), 10(10:0)
7.2.2 Creating the mapping population and generating the linkage groups
7.2.3 Conducting the QTL detection analysis
7.2.4 Conducting the statistical analyses
7.3 RESULTS
7.3.1 Genetic Models: Additive and Epistatic
7.3.2 Genetic Models: Additive and G×E interaction
7.4 DISCUSSION
7.5 CONCLUSION

PART IV SIMULATION OF PHENOTYPIC, MARKER AND MARKER-ASSISTED SELECTION IN THE WHEAT GERMPLASM ENHANCEMENT PROGRAM

CHAPTER 8 SELECTION RESPONSE IN THE GERMPLASM ENHANCEMENT PROGRAM FOR ADDITIVE GENETIC MODELS
8.1 INTRODUCTION
8.2 MATERIALS AND METHODS
8.2.1 Genetic models
8.2.2 Creating the mapping population and generating linkage groups
8.2.3 Assigning marker profiles
8.2.4 Conducting the QTL detection analysis
8.2.5 Simulating phenotypic selection, marker selection and marker-assisted selection for S1 families in the Germplasm Enhancement Program
8.2.6 Conducting the statistical analysis
8.3 RESULTS
8.3.1 Number of QTL detected
8.3.2 Response to selection: phenotypic selection, marker selection, and marker-assisted selection
8.4 DISCUSSION
8.5 CONCLUSION


CHAPTER 9 SELECTION RESPONSE IN THE GERMPLASM ENHANCEMENT PROGRAM FOR COMPLEX GENETIC MODELS...............................................................................................189

9.1 INTRODUCTION ...............................................................................................................................189 9.2 MATERIALS AND METHODS............................................................................................................194

9.2.1 Genetic models.......................................................................................................................194 9.2.2 Creating the mapping population and generating linkage groups ........................................197 9.2.3 Assigning marker profiles ......................................................................................................197 9.2.4 Conducting the QTL detection analysis .................................................................................197 9.2.5 Simulating phenotypic selection, marker selection, and marker-assisted selection for S1 families and DH lines in the Germplasm Enhancement Program ..................................................201 9.2.6 Conducting the statistical analyses........................................................................................203

9.2.6.1 QTL detection analysis .......... 203
9.2.6.2 Response to selection .......... 204
9.3 RESULTS .......... 207
9.3.1 Analysis of the QTL detection results over all genetic models .......... 207
9.3.1.1 Percent of QTL segregating .......... 207
9.3.1.2 Percent of QTL detected .......... 207
9.3.1.3 Percent of QTL detected of those segregating .......... 209
9.3.1.4 Percent of QTL detected with incorrect marker-QTL allele associations .......... 211
9.3.2 Analysis of the trait mean value (response to selection) .......... 215
9.3.2.1 Analysis over 10 cycles of selection of the Germplasm Enhancement Program .......... 215
9.3.2.2 Analysis conducted at cycle five of the Germplasm Enhancement Program .......... 217
9.3.3 Detailed analysis of the trait mean value for specific genetic models .......... 219
9.3.3.1 Case 1: No G×E interaction, no epistasis; E(NK) = 1(12:0) .......... 219
9.3.3.2 Case 2: G×E interaction present, no epistasis; E(NK) = 10(12:0) .......... 222
9.3.3.3 Case 3: No G×E interaction, epistasis present; E(NK) = 1(12:5) .......... 225
9.3.3.4 Case 4: G×E interactions and epistasis present; E(NK) = 10(12:5) .......... 229
9.3.4 General trends across E(NK) models .......... 232
9.4 DISCUSSION .......... 233
9.4.1 QTL detection analysis .......... 233
9.4.2 Response to selection: S1 and DH with phenotypic selection, marker selection and marker-assisted selection strategies .......... 238

9.5 CONCLUSION ..................................................................................................................................243

PART V GENERAL DISCUSSION AND CONCLUSIONS..............................................................245

CHAPTER 10 GENERAL DISCUSSION............................................................................................247

BIBLIOGRAPHY ..................................................................................................................................261

APPENDICES ........................................................................................................................................285

APPENDIX 1 ADDITIONAL INFORMATION ASSOCIATED WITH CHAPTER 4 .......... 287
A1.1 ADDITIONAL INFORMATION FOR THE RESPONSE TO SELECTION PREDICTION EQUATIONS .......... 287
A1.1.1 Gene action definitions for different prediction equations .......... 287
A1.1.2 Alternate S1 family prediction equations .......... 287
A1.1.3 Effect of inbreeding on the variance components coefficient .......... 288
A1.2 QUANTITATIVE GENETICS THEORY ASSUMPTIONS .......... 290
A1.3 ASSUMPTION OF NORMALITY IN THE BASE POPULATION DOES NOT HOLD WHEN DOMINANCE IS INCLUDED .......... 291


APPENDIX 2 ADDITIONAL INFORMATION ASSOCIATED WITH CHAPTER 5 .......... 299
A2.1 GENERATING A LINKAGE MAP AND ITS ASSOCIATION WITH MAPPING POPULATION SIZE .......... 299
A2.1.1 Model 1 - one chromosome, one QTL, two flanking markers .......... 300
A2.1.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL .......... 301
A2.1.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL .......... 302
A2.1.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL .......... 303
A2.1.5 Conclusion .......... 304
A2.2 QU-GENE INPUT FILES FOR QTL DETECTION ANALYSIS PROGRAMS .......... 305
A2.2.1 Model 1 - one chromosome, one QTL, two flanking markers .......... 305
A2.2.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL .......... 305
A2.2.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL .......... 306
A2.2.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL .......... 307

APPENDIX 3 ADDITIONAL INFORMATION ASSOCIATED WITH CHAPTER 8 .......... 311
A3.1 NUMBER OF QTL DETECTED .......... 311
A3.2 RESPONSE TO SELECTION: PHENOTYPIC SELECTION, MARKER SELECTION, AND MARKER-ASSISTED SELECTION .......... 312

APPENDIX 4 ANALYSES OF VARIANCE FOR FACTORS AFFECTING THE DETECTION OF QTL AND RESPONSE TO SELECTION .......... 317
A4.1 FACTORS AFFECTING QTL SEGREGATION AND DETECTION .......... 317
A4.2 ANALYSIS OF RESPONSE TO SELECTION .......... 323
A4.3 RESPONSE TO SELECTION RESULTS .......... 331


List of Tables

Table 2.1 Estimated variance components (±s.e.) relative to F2 for grain yield (t ha-1) of recombinant inbred lines derived from 11IBSWN50/Vasco and Hartog/Vasco crosses tested in Queensland in 1989. Extract of Table 3 (Fabrizius et al. 1997) .......... 43

Table 2.2 Estimates of genetic parameters for grain yield (t ha-1) of 49 wheat lines tested in six environments in Queensland. Extract of Table 10.1 (Cooper et al. 1996b) .......... 46

Table 2.3 Estimated variance components (±s.e.) for grain yield (t ha-1) of recombinant inbred lines derived from two crosses, 11IBSWN50/Vasco and Hartog/Vasco, tested at three sites in Queensland in 1989. Extract of Table 2 (Fabrizius et al. 1997) .......... 46

Table 2.4 Characterisation of the genetic architecture of a trait according to heritability level and some of the factors affecting complexity. Adapted from Cooper and Hammer (1996) .......... 54

Table 4.1 Experimental variable levels defined in the PEQ module to compare the response to selection from simulation and expectations from prediction equations .......... 84

Table 4.2 Experimental variable levels used in the PEQ module to verify linkage equilibrium results from Section 4.2 .......... 85

Table 4.3 Average number of generations of random mating (RM) required to reach linkage equilibrium (observed recombination fraction, R = 0.5) for three per meiosis recombination fractions (based on linkage in coupling over 500 runs). Results from Figure 4.3 .......... 85

Table 5.1 Experimental variables used to define each genetic model for the QUGENE input file. Chr = chromosome, c = per meiosis recombination fraction and h2 = heritability of trait on an observational unit, MP-LG = mapping population size used to determine the linkage groups and MP-QTL = QTL detection mapping population size .......... 102

Table 5.2 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted .......... 106

Table 5.3 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted .......... 106

Table 5.4 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted .......... 107

Table 5.6 Experimental variables used to define each genetic model for the QUGENE input file. chr = chromosome, c = per meiosis recombination fraction and h2 = heritability of trait on an observational unit, MP-QTL = QTL detection mapping population size .......... 112


Table 6.1 Analysis of variance for the number of QTL detected. Degrees of freedom (DF) and F values are shown for per meiosis recombination fraction (c), heritability (h2), and mapping population size (MP) and first-order interactions. σ2 = error mean square .......... 123

Table 6.2 Number of QTL detected (averaged over 100 runs) for a simulated Germplasm Enhancement Program mapping study for four mapping population sizes (MP), two heritability levels (h2) and three per meiosis recombination fractions (c) between a marker and QTL. Percentage of QTL detected out of the total number of polymorphic QTL also shown in parentheses .......... 126

Table 7.1 Experimental variable levels used to specify the core genetic models studied .......... 134

Table 7.2 The percentage of additive ($\sigma^2_A$), dominance ($\sigma^2_D$) and epistatic ($\sigma^2_K$) variance of the total genotypic ($\sigma^2_G$) variance for each of the models .......... 135

Table 7.3 The matrix of gene codes in each environment-type. A 0 indicates no G×E interaction as the gene has no effect, a 1 indicates the gene follows m = midpoint, a = additive, d = dominance values, a -1 indicates a crossover effect. This table is set out so that as the number of environment-types increases the level of complexity in the system increases as more genes are interacting with the environment-type .......... 138

Table 7.4 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), epistatic model (B), and first-order interactions affecting the number of QTL detected. σ2 = error mean square .......... 141

Table 7.5 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), number of environment-types (E), and first-order interactions affecting the number of QTL detected. σ2 = error mean square .......... 143

Table 8.1 Experimental variable levels used to specify the core genetic models studied .......... 162

Table 8.2 Experimental variable levels utilised in the GEPMAS module. METs = multi-environment trials, GEP = Germplasm Enhancement Program .......... 166

Table 8.3 Number of polymorphic QTL for each bi-parental mapping population replication and the number of QTL detected for each of the 36 genetic models. Average across replications is also presented. c = per meiosis recombination fraction between QTL and marker, h2 = heritability, MP = mapping population size .......... 172

Table 8.4 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), gene frequency (GF), and first-order interactions affecting the number of QTL detected. σ2 = error mean square .......... 173

Table 8.5 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), gene frequency (GF), selection strategy (SS), cycles (cyc) and first-order interactions affecting the response to selection. σ2 = error mean square .......... 175

Table 9.1 Experimental variable levels defined in the QU-GENE engine to create the genotype-environment genetic models .......... 196


Table 9.2 Experimental variable levels utilised in the QTL detection analysis .......... 197

Table 9.3 Experimental variable levels utilised in the GEPMAS module .......... 198


List of Figures

Figure 1.1 Outline of the structure of investigations conducted to simulate the different breeding strategies considered for the Germplasm Enhancement Program in this thesis. Blue indicates the definition of genetic models and construction of reference and base populations for the Germplasm Enhancement Program. Yellow indicates the simulation of mapping and QTL experiments and the green indicates the simulation of the breeding strategies of interest. The part numbers indicate within which Parts of the thesis these phases are addressed ................ 8

Figure 2.1 Genetic map of the group 1 chromosomes of Triticeae (Vandeynze et al. 1995). The centromere of the chromosome is indicated by the bold letter C .......... 16

Figure 2.2 QTL detection analysis for a single chromosome with six markers (equally spaced 0.2 Morgans apart) and three segregating QTL. The mapping population size was 200. All six markers were significant for QTL effects using single marker analysis (single marker). Interval mapping (IM) detected four significant QTL peaks. Composite interval mapping (CIM) detected three significant QTL peaks and multiple interval mapping (MIM) detected four significant QTL peaks. Detection of false QTL may be a result of low population size. The likelihood ratio threshold was set at 11.5. These simulated data were generated using QU-GENE, the analyses were conducted in QTL CARTOGRAPHER (Basten et al. 1994, 2001) .......... 22

Figure 2.3 Outline of the wheat growing areas in Australia and the northern grains region. Adapted from Montana Wheat & Barley Committee (2002) .......... 30

Figure 2.4 Components and pathways of germplasm transfer for yield improvement in the Australian Northern Wheat Improvement Program: LRC-QDPI represents the Queensland Department of Primary Industries pedigree breeding programs located in Toowoomba at the Leslie Research Centre; PBI-US represents the University of Sydney pedigree breeding programs located in Narrabri; and the Germplasm Enhancement Program is conducted by the University of Queensland (Cooper et al. 1999a) .......... 31

Figure 2.5 Outline of the activities involved in the S1 family and doubled haploid (DH) line breeding strategies over one cycle of the Germplasm Enhancement Program. The S1 activities are adapted from Fabrizius et al. (1996). MET = multi-environment trial .......... 34

Figure 2.6 Example of additive×additive interaction. Shows favourable allelic combinations aabb and AABB give the highest genotypic value .......... 40

Figure 2.7 Classification of genotype-by-environment (G×E) interactions, A and B are two genotypes and lines represent the responses of the genotypes in two environments; type 1 parallel response (no G×E interaction), type 2 non-crossover response, type 3 crossover response .......... 45

Figure 2.8 Number of articles published in the last 34 years with “simulation” and either “genetic*” or “plant breeding” as words anywhere in the AGRICOLA (1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003) databases. Note: some article duplication may have occurred. * represents all extensions of genetic. Each category contains five years, except the last which contains 4 years .......... 49


Figure 2.9 Number of articles published in the last 34 years with “marker assisted” or “marker assisted and simulation” as words anywhere in the AGRICOLA (1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003) databases. Note: some article duplication may have occurred. Each category contains five years, except the last, which contains 4 years ............................ 51

Figure 2.10 Schematic outline of the QU-GENE simulation software. The central ellipse shows the engine and the surrounding boxes show the application modules (Podlich and Cooper 1997, 1998) .......... 52

Figure 3.1 Iterative modelling methodology process used to design simulation experiments for this thesis .......... 58

Figure 4.1 Schematic outline of the LINKEQ module. Two opposing extreme inbred individuals with two genes in coupling phase linkage were crossed to form the F1, which was selfed to form the F2 population. The F2 population was subjected to a number of generations of random mating until the observed frequency of recombinant gametes reaches R ≥ 0.4. After each cycle of random mating if the observed frequency of recombinant gametes R < 0.4, the F2 population is randomly mated until R ≥ 0.4 .......... 71

Figure 4.2 Number of generations of random mating required to reach an observed recombination fraction of R = 0.4 between two genes for the simulation (with standard deviation bars) using QU-GENE and the theoretical values calculated from Equation (4.1) for a range of per meiosis recombination fractions. The smaller the per meiosis recombination fraction, the tighter the linkage and the more generations of random mating required to break the linkage .......... 72

Figure 4.3 Number of generations of random mating required to reach an observed recombination fraction of R = 0.5 between two genes for the simulation (with standard deviation bars) using QU-GENE for a range of per meiosis recombination fractions. The smaller the per meiosis recombination fraction, the tighter the linkage and the more generations of random mating required to break this linkage .......... 73

Figure 4.4 Schematic outline of the PEQ module, (a) mass selection strategy, (b) S1 family (self) and DH line (double) strategy. This example shows a two gene model in coupling with a base population size of 1000 individuals .......... 82

Figure 4.5 Response to selection for the mass selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.3) and Comstock prediction equation (Com, Equation 4.9). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference F2 population size of 1000, additive gene action, and linkage equilibrium .......... 87

Figure 4.6 Response to selection for the S1 family selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.4) and Comstock prediction equation (Com, Equation 4.11). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference S0 population size of 1000, additive gene action, and linkage equilibrium. f is the number of progeny tested per S0 plant (level of replication) and b is the number of reserve seed intermated to create the reference population after selection .......... 88

Figure 4.7 Response to selection for the DH line selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.5) and Comstock prediction equation (Com, Equation 4.12). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference S0 population size of 1000, additive gene action, and linkage equilibrium. f is the number of progeny tested per S0 plant (level of replication) and b is the number of reserve seed intermated to create the reference population after selection .......... 90

Figure 4.8 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the mass selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage .......... 92

Figure 4.9 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the S1 family selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage .......... 93

Figure 4.10 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the DH line selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage .......... 94

Figure 5.1 The three-step process that allows a QTL detection analysis to be conducted on a simulated population .......... 102

Figure 5.2 Schematic outline of the Model 1, 2, 3 and 4 linkage groups. For Model 1 and 2 the markers are spaced at 11 cM (c = 0.1) from each QTL or marker. For Model 3 the markers are spaced at 5.2 cM (c = 0.05) from the QTL and for Model 4 the markers are spaced at 5.2 cM (c = 0.05) from a marker and 2.5 cM (c = 0.025) from a QTL. The per meiosis recombination fraction was converted to map distance (cM) using the Haldane mapping function (Haldane 1931) .......... 103

Figure 5.3 Schematic outline of artificially zooming in on regions of the wheat genome containing QTL contributing towards a trait of interest. Simulation of the wheat genome progressed from the genetic map of wheat (a), which may contain 12 QTL of interest and can be represented for simulation using 21 linkage groups, each with eight markers, and 12 linkage groups with one QTL (b), this can be reduced to 12 chromosomes each containing a QTL (c) and then to 12 chromosomes each with one QTL and two flanking markers (d). The Haldane mapping function (Haldane 1931) was used to convert from per meiosis recombination fractions. Wheat genome figures (Nelson et al. 1995a, Nelson et al. 1995b, Nelson et al. 1995c, Vandeynze et al. 1995, Marino et al. 1996) .......... 111

Figure 6.1 A sample of articles (86) on plant QTL analysis was assessed on the basis of the mapping population size used to find QTL and the number of QTL detected per trait. The filled bars indicate the percentage of papers that reported a mapping population size in the indicated range. The error bars indicate the minimum and maximum number of QTL per trait, with the filled circle indicating the average. 51% of the papers used a mapping population size between 60 and 140 individuals .......... 120

Figure 6.2 Schematic outline of the simulated linkage groups. Ten chromosomes, each with one QTL and two flanking markers. The example here has the markers spaced at 11 cM from the QTL, or a per meiosis recombination fraction of c = 0.1 on either side of the QTL when converted using the Haldane mapping function (Haldane 1931) .......... 121

Figure 6.3 Percent of QTL detected (averaged over 100 runs) for each significant experimental variable from the analysis of variance. All levels within experimental variable factors were significantly different. All 10 QTL were segregating .......... 124

Figure 6.4 Significant first-order interactions from the analysis of variance for the number of QTL detected. h2 = heritability, c = per meiosis recombination fraction, MP = mapping population size .......... 125

Figure 7.1 Genotypic values for the six genetic models considered: (a) an additive model, (b-d) are the random digenic epistatic networks and (e-f) are the McMullen (2001), maysin and 3-deoxyanthocyanin digenic epistatic networks, respectively .......... 136

Figure 7.2 Number of QTL detected as a percentage of the total runs are shown for four digenic epistatic models (E(NK) = 1(10:1)) with a heritability of h2 = 0.1, per meiosis recombination fraction of c = 0.01 (a-c) and c = 0.1 (d) with four mapping population sizes (MP = 100, 200, 500, 1000). Presence of false QTL occurs when 11 QTL were detected .......... 142

Figure 7.3 Percent of QTL detected (averaged over 100 runs) for the number of environment-types (a) and significant first-order interactions (b-c). h2 = heritability, MP = mapping population size and E = number of environment-types .......... 143

Figure 7.4 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 0.25, per meiosis recombination fraction of c = 0.01 and four mapping population sizes (MP = 100, 200, 500, 1000) .......... 145

Figure 7.5 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 1.0, per meiosis recombination fraction of c = 0.01 and four mapping population sizes (MP = 100, 200, 500, 1000) .......... 146

Figure 7.6 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0), environment-types in the target population of environments with a heritability of h2 = 0.25, per meiosis recombination fraction of c = 0.1 and four mapping population sizes (MP = 100, 200, 500, 1000) .......... 147

Figure 7.7 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0), environment-types in the target population of environments with a heritability of h2 = 1.0, per meiosis recombination fraction of c = 0.1 and four mapping population sizes (MP = 100, 200, 500, 1000) .......... 148

Figure 8.1 Schematic outline of the sequence of computer programs used to determine response to selection in the GEP. QUGENE is the QU-GENE engine; GEXPV2 uses the output from QUGENE to create input data for PLABQTL. PLABQTL then conducts the QTL detection analysis. GEPMAS is a QU-GENE module that conducts S1 recurrent selection by phenotypic selection and, using the QTL detected by PLABQTL, also conducts marker selection and marker-assisted selection .......... 161

Figure 8.2 Schematic outline of the sequence of procedures used to simulate the creation of the mapping population (for QTL detection analysis) and Germplasm Enhancement Program base population. The orange arrows show the information from the QTL detection utilised in marker selection (MS) and marker-assisted selection (MAS) strategies. The two parents used to create the mapping population are also included in the 10 parent structure used to create the half diallel population of the Germplasm Enhancement Program S1 recurrent selection breeding program (see Figure 8.3). PS = phenotypic selection, RIL = recombinant inbred line .......... 163

Figure 8.3 Schematic outlines of the simulation of phenotypic selection (PS), marker selection (MS), and marker-assisted selection (MAS) procedures in the S1 recurrent selection module (GEPMAS) used to simulate the Germplasm Enhancement Program. For phenotypic selection, 1 indicates random mating of the reserve seed from the seed increase after multi-environment trials (METs) have been performed, for marker selection, the 2 indicates random mating of the selected plants from the space plant population based on their marker profile and for marker-assisted selection, 3 indicates random mating of the reserve seed from the seed increase after marker profiles and multi-environment trials have been performed. The three strategies of the Germplasm Enhancement Program simulated here can be compared to the more detailed description of the Germplasm Enhancement Program given in Chapter 2, Figure 2.5 .......... 167

Figure 8.4 Significant main effects from the analysis of variance for the number of QTL

detected. All effect levels were significantly different except for those indi-cated by the same letter ................................................................................................ 173

Figure 8.5 Significant main effects from the analysis of variance for response to

selection. Response to selection expressed relative to the maximum potential response to selection (%TG) where TG = target genotype. All effect levels were significantly different except for those indicated by the same letter ................... 176

Figure 8.6 Significant first-order interactions from the analysis of variance for the

response to selection. Response to selection expressed relative to the maxi-mum potential response to selection (%TG) where TG = target genotype. SS = selection strategy, c = per meiosis recombination fraction, h2 = heritability, GF = gene frequency, MP = mapping population size ....................................................... 177

Figure 8.7 Response to selection expressed as percentage of target genotype (average of

the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cy-cles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.01, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype ............................................................. 179


Figure 8.8 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.2, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

Figure 8.9 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the GEP. E(NK) = 1(10:0), GF = 0.5, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.01, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

Figure 8.10 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.5, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.2, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

Figure 9.1 Outline of the structure of investigations of the thesis towards the simulation of different breeding strategies. Blue indicates the definition of genetic models and construction of reference and base populations for the Germplasm Enhancement Program. Yellow indicates the simulation of mapping and QTL experiments and the green indicates the simulation of the breeding strategies of interest. The part numbers indicate which parts of the thesis these phases are addressed in (Replication of Chapter 1, Figure 1.1; included here for ease of reference)

Figure 9.2 Schematic outline of the linkage groups. There were 12 chromosomes each with one QTL and two flanking markers. The example has the markers spaced at 11 cM from the QTL, equivalent to a per meiosis recombination fraction of c = 0.1 on either side of the QTL using the Haldane mapping function (Haldane 1931)

Figure 9.3 Schematic outline of the simulation of phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) procedures in the DH line recurrent selection module (GEPMAS) used to simulate the Germplasm Enhancement Program. For PS, 1 indicates random mating of the reserve seed from the seed increase after multi-environment trials have been performed; for marker selection, 2 indicates random mating of the selected plants from the space plant population based on their marker profile; and for marker-assisted selection, 3 indicates random mating of the reserve seed from the seed increase after marker profiles and multi-environment trials have been performed. The implementation of DH line recurrent selection in the Germplasm Enhancement Program can be compared to the S1 family implementation in Chapter 8, Figure 8.3

Figure 9.4 Significant main effects from the analysis of variance for the percent of QTL segregating. All effect levels were significantly different except for those indicated by the same letter

Figure 9.5 Significant main effects from the analysis of variance for the percent of QTL detected. All effect levels were significantly different except for those indicated by the same letter

Figure 9.6 Significant first-order interactions from the analysis of variance for the percent of QTL detected. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types, c = per meiosis recombination fraction, and h2 = heritability

Figure 9.7 Significant main effects from the analysis of variance for the percent of QTL detected of those segregating. All effect levels were significantly different except for those indicated by the same letter

Figure 9.8 Significant first-order interactions from the analysis of variance for the percent of QTL detected of those segregating. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types, and h2 = heritability

Figure 9.9 Significant main effects from the analysis of variance for the percent of incorrect marker-QTL allele associations. All effect levels were significantly different except for those indicated by the same letter

Figure 9.10 Significant first-order interactions from the analysis of variance for the percent of QTL detected with incorrect marker-QTL allele associations. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types and h2 = heritability

Figure 9.11 Percent of QTL detected with incorrect marker-QTL allele associations (IAA) against the percent of QTL detected, and the percent of replications containing those combinations for (a) a simple additive case, E(NK) = 1(12:0), (b) increasing epistasis value E(NK) = 1(12:5), (c) increasing the number of environment-types E(NK) = 10(12:0), and (d) increasing both epistasis and environment-types E(NK) = 10(12:5) for a per meiosis recombination fraction of c = 0.05, gene frequency of GF = 0.1 and heritability of h2 = 1.0

Figure 9.12 Significant main effects from analysis of variance conducted over 10 cycles of the Germplasm Enhancement Program. All experimental variable levels were significantly different except epistasis where levels of zero and two were not significantly different. All effect levels were significantly different except for those indicated by the same letter

Figure 9.13 Significant first-order interactions from the analysis of variance conducted over 10 cycles of the Germplasm Enhancement Program. K = epistasis level, E = number of environment-types, SS = selection strategy, PT = population type

Figure 9.14 Significant main effects from analysis of variance conducted at cycle five of the Germplasm Enhancement Program. All experimental variable levels were significantly different

Figure 9.15 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:0) model with gene frequency (GF) of 0.1, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.16 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:0) model with gene frequency (GF) of 0.5, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.17 400 replications of the response to selection for DH and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 1(12:0) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.15b

Figure 9.18 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:0) model with gene frequency (GF) of 0.1, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.19 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:0) model with gene frequency (GF) of 0.5, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.20 400 replications of the response to selection for DH and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 10(12:0) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.18b

Figure 9.21 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:5) model with gene frequency (GF) of 0.1, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.22 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:5) model with gene frequency (GF) of 0.5, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.23 400 replications of the response to selection for DH and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 1(12:5) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.21b

Figure 9.24 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:5) model with gene frequency (GF) of 0.1, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.25 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:5) model with gene frequency (GF) of 0.5, two per meiosis recombination fractions (c) 0.05 and 0.1 and two heritabilities (h2) 0.1 and 1.0

Figure 9.26 400 replications of the response to selection for DH and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 10(12:5) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.24b


List of Abbreviations

α         Critical value
ANOVA     Analysis of variance
c         Per meiosis recombination fraction
cM        centiMorgans
Chr       Chromosome
CIM       Composite interval mapping
CIMMYT    The International Center for Maize and Wheat Improvement
D/S       Percent of QTL detected of those segregating
Det       Percent of QTL detected
DF        Degrees of freedom
DH        Doubled haploid
DNA       Deoxyribonucleic acid
E         Number of environment-types as per the E(NK) model
E(NK)     Number of environment-types (E), number of genes (N) and the level of epistasis (K)
Fn        Filial generation n
F value   Calculated F statistic value to be compared to a threshold in the F distribution
GEP       Germplasm Enhancement Program
GEXP      Genetic Experiments (QU-GENE module)
GEPMAS    QU-GENE module used to conduct simulation experiments of the Germplasm Enhancement Program with phenotypic selection, marker selection and marker-assisted selection
GF        Gene frequency
G×E       Genotype-by-environment
h2        Heritability of trait on an observational unit basis
IAA       Incorrect marker-QTL allele association (Type III QTL detection error)
IM        Interval mapping
K         Level of epistasis as per the E(NK) model
LG        Linkage group
LINKEQ    QU-GENE module used to conduct the linkage equilibrium experiments
LOD       log10 likelihood odds ratio
lsd       Least significant difference
M         Morgans
MAS       Marker-assisted selection
MET       Multi-environment trial
MP        Mapping population size
MS        Marker selection
N         Number of genes as per the E(NK) model
NWIP      Northern Wheat Improvement Program
PEQ       QU-GENE module used to compare simulation against theoretical prediction equations
PLABQTL   QTL detection analysis software (PLAnt breeding and Biology QTL)
PS        Phenotypic selection
QCC       QU-GENE computing cluster
QTL       Quantitative trait loci
QTL×E     Quantitative trait loci-by-environment
QUGENE    QU-GENE genotype-environment system engine
QU-GENE   Genetic analysis simulation software
RIL       Recombinant inbred line
RM        Random mating
S1        Self-pollinated for one generation following an inter-individual cross
Seg       Percent of QTL segregating
TG        Target genotype
TPE       Target population of environments


PART I

BACKGROUND


CHAPTER 1

INTRODUCTION

The motivation for and focus of the research reported in this thesis were based on the need for strategic research to support the continued evolution of a breeding strategy for yield improvement of wheat in the northern grains region of Australia (Northern Wheat Improvement Program). There has been, and continues to be, a long-term commitment to the improvement of yield potential, adaptation and stability of performance of wheat within the context of the complex target populations of environments (TPE: Comstock 1977) in this dryland farming region (e.g. Brennan and Byth 1979, Brennan et al. 1981, Cooper et al. 1996a). This historical long-term wheat breeding effort, and the associated research, has provided a large body of empirical data on the important factors that can impact yield performance of wheat in this region. The evolution to the pedigree breeding strategy that was in place in the 1990s was an outcome of empirically evaluating modifications and suggestions for improvements; where evidence dictated, adjustments were made to the breeding program. Strengths and weaknesses of the incumbent pedigree breeding strategy were recognised and the overall breeding effort was altered to incorporate backcross breeding, targeted at incorporating genes for specific traits, and recurrent selection methodology, to enhance the pool of locally adapted inbred lines used as parents in the pedigree breeding program.

During the 1990s the impetus for further enhancements to the overall breeding effort grew with the availability of molecular marker technology (e.g. restriction fragment length polymorphisms (RFLP), randomly amplified polymorphic deoxyribonucleic acid (RAPD), amplified fragment length polymorphisms (AFLP) and simple sequence repeat (SSR); Nadella 1998, Susanto 2004) and doubled haploid (DH) line production technology (e.g. Jensen and Kammholz 1998). It was recognised that empirical evaluation of all potential modifications to the incumbent breeding strategy was impractical because of the cost and the difficulty of conducting sufficiently large experiments to evaluate the power of suggested alternative breeding strategies. Therefore, to support the empirical research underway on the genetic architecture of yield and the impact of alternative breeding strategies on improving yield, an investment was made to develop computer simulation technologies that would enable realistic modelling of the impact and power of alternative breeding strategies (Podlich and Cooper 1998, Podlich 1999). This simulation approach gave rise to a co-ordinated research effort with goals to: (i) obtain empirical results on the genetic control of variation for important traits and their contributions to yield; (ii) investigate appropriate theoretical models for quantitative traits; (iii) develop simulation software and high performance computing infrastructure; and (iv) use these in combination to conduct the strategic research necessary to evolve the wheat breeding strategies used in the northern grains region.

This thesis is one component of the larger strategic research effort. As such, the work reported here relies heavily on the empirical genetic research conducted by others (Cooper et al. 1997, Fabrizius et al. 1997, Nadella 1998, Peake 2002, Jensen 2004, Susanto 2004) and the simulation infrastructure and methodology developed by others (Podlich and Cooper 1998, Micallef et al. 2001, Cooper and Podlich 2002). The specific focus of this thesis was on the use of computer simulation to evaluate the opportunity to enhance the rate of genetic gain for quantitative traits within the recurrent selection Germplasm Enhancement Program component of the Northern Wheat Improvement Program. The technologies of interest to this evaluation were molecular markers, to enable marker-assisted selection, and DH production, to rapidly generate inbred lines for evaluation in multi-environment trials. This thesis reports the results of the computer simulation investigations that were undertaken to make recommendations on how these two breeding technologies could be used to enhance the long-term genetic gain from the Germplasm Enhancement Program. A parallel series of investigations has been undertaken for other components of the Northern Wheat Improvement Program (e.g. Jensen 2004).


The current structure of the Germplasm Enhancement Program is an S1 (self-pollinated for one generation following an inter-individual cross) recurrent selection program operating as a parent building component of the Northern Wheat Improvement Program of Australia (Fabrizius et al. 1996). Recurrent selection programs are conducted to achieve medium and long-term genetic improvement by increasing the frequency of favourable alleles for genes and gene combinations (Hallauer and Miranda 1988). Optimising the allocation of resources to activities within the Germplasm Enhancement Program to achieve its role in the Northern Wheat Improvement Program is a complex problem. There is interest in how effectively markers can be used to enhance the current phenotypic selection strategy. Any modified breeding strategy will need to be robust for multiple traits that differ in their genetic architecture, ranging from simple additive to more complex situations including epistatic and genotype-by-environment (G×E) interactions. The importance and influence of G×E interactions and epistasis in the northern grains region, and specifically for the germplasm of relevance to the Germplasm Enhancement Program, have been outlined in many studies (Brennan and Byth 1979, Brennan et al. 1981, Cooper et al. 1994a, 1994b, Cooper and DeLacy 1994, Cooper et al. 1996b, Fabrizius et al. 1997, Basford and Cooper 1998, Peake 2002, Jensen 2004) and are considered as components for the genetic models investigated in this thesis.

Marker-assisted selection is a recent technological advancement in wheat breeding programs (Howes et al. 1998). Many species now have a sufficient number of markers to create dense maps and localise associated QTL (Moreau et al. 2000). Theoretical studies have shown that marker-assisted selection is capable of improving the efficiency of selection (Lande and Thompson 1990, Lande 1992, Dudley 1993), and much of the mapping / marker-assisted selection literature reports that knowing the position of QTL regions and markers will enable breeders to increase the rate of response of a breeding program. However, moving from these general statements to evaluating the impact of marker-assisted selection within an applied breeding program context is not a simple task. Conducting marker-assisted selection experiments has in the past been an expensive venture for a relatively unknown benefit, with the result that marker-assisted selection has rarely been empirically evaluated in large field experiments (Young 1999, Moreau et al. 2000). The ability to use computer simulation to model a plant breeding program and conduct many cycles of breeding in silico provides a tool that allows a breeder to determine the impact of a selection strategy on a breeding program with less time and cost than is the case for field experiments. Computer simulation has been evolving over the past 40+ years (e.g. Fraser 1957a, Kempthorne 1988, Podlich and Cooper 1998), and with the increase in modern computer speeds, simulation has the potential to be a useful tool in exploring the response to selection of a breeding program and to help with the decision making process. Computer simulation research methodologies are also widely applied outside of the discipline of genetics and plant breeding (e.g. Casti 1997a, Schrage 1999, Wolfram 2002).

The computer simulation platform QU-GENE was designed for the quantitative analysis of genetic models and can be used to model plant breeding programs (Podlich and Cooper 1998). The two-stage architecture of QU-GENE allows many independent modules, representing alternative breeding strategies, to be attached to multiple genetic models of a genotype-environment system defined in the QU-GENE engine. These modules have the ability to explore a range of breeding strategies, construct mapping populations and produce multiple breeding population structures. QU-GENE has the ability to simulate generic genetic model problems, but it can also be used to model specific breeding programs (e.g. Fabrizius et al. 1996, Jensen 2004).

The question posed at the initiation of this thesis was: “Is there a difference in the expected response to selection of the Germplasm Enhancement Program for S1 families and DH lines when either phenotypic selection, marker selection or marker-assisted selection is implemented and both G×E interaction and epistasis influence the trait of interest?” To answer this question using quantitative genetics theory would be difficult, as it would require relaxing many assumptions and the algebraic equations needed to model these systems then become intractable. To answer this question empirically is not feasible as it would require many years of field experimentation and significant resources that are well beyond the scope of the breeding program. Following preliminary studies (Kruger 1999), and experiences gained from other projects (Fabrizius et al. 1996, Jensen 2004), simulation was identified as an appropriate platform on which to seek answers to this question and was used for this thesis.

A schematic outline (Figure 1.1) presents an overview of how the parts of the thesis are interrelated. It was important to undertake the work completed in each of the preceding parts to enable the thesis to develop an answer to the key question posed above. Part I provides the foundation knowledge underlying the concepts examined in this thesis (not shown on figure). Part II investigates the convergence of simulation and theory to acquire experience with simulation methods and to determine whether simulation was an appropriate extension of quantitative genetics theory for the objectives of this thesis. Part II also includes investigations into which QTL detection method and analysis program to use, and into whether a reduced genome model could be used instead of the full wheat genome model for the simulation of a QTL detection experiment. Part III investigates how QTL detection would be implemented in the Germplasm Enhancement Program, and how linkage maps would be created. Part III also evaluates the influence of population size, heritability, per meiosis recombination fraction, epistasis and G×E interactions on the detection of QTL. This section was important for the thesis as there was a need to determine the most efficient method for mapping QTL, conducting a QTL detection analysis using an additional stand alone program, and incorporating these results back into QU-GENE to simulate the breeding strategies considered. In Part IV, the work completed in the previous parts allowed a detailed investigation to be conducted of the opportunities to implement marker-assisted selection for S1 families and DH lines into the Germplasm Enhancement Program.
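
Marker spacing in this thesis is expressed as a per meiosis recombination fraction (c), and the caption of Figure 9.2 relates c = 0.1 to a spacing of about 11 cM under the Haldane mapping function. As an illustration only, that conversion can be sketched in Python as below. The function names are hypothetical and are not part of QU-GENE or PLABQTL; the sketch assumes the standard Haldane form c = 0.5(1 - exp(-2d)), with d in Morgans.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Map distance in centiMorgans -> recombination fraction (Haldane function)."""
    if distance_cm < 0.0:
        raise ValueError("map distance must be non-negative")
    d_morgans = distance_cm / 100.0  # 1 Morgan = 100 cM
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_map_distance_cm(c: float) -> float:
    """Recombination fraction (0 <= c < 0.5) -> map distance in centiMorgans."""
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= c < 0.5")
    # Inverse of the function above: d = -0.5 * ln(1 - 2c) Morgans
    return -50.0 * math.log(1.0 - 2.0 * c)


if __name__ == "__main__":
    # Recombination fractions named in the figure captions of Chapters 8 and 9
    for c in (0.01, 0.05, 0.1, 0.2):
        print(f"c = {c:<4} -> {haldane_map_distance_cm(c):.2f} cM")
```

Under this function c = 0.1 corresponds to roughly 11.2 cM, which agrees with the 11 cM quoted for Figure 9.2.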


[Figure 1.1 diagram; labels recoverable from the original: Modelling Methodology: Defining & validating a modelling approach; Base Population; Mapping Population; QTL analysis algorithms; QTL information; MS & MAS; Germplasm Enhancement Program; PS; MS; MAS; Part II; Part III; Part IV]

Figure 1.1 Outline of the structure of investigations conducted to simulate the different breeding strategies considered for the Germplasm Enhancement Program in this thesis. Blue indicates the definition of genetic models and construction of reference and base populations for the Germplasm Enhancement Program. Yellow indicates the simulation of mapping and QTL experiments and the green indicates the simulation of the breeding strategies of interest. The part numbers indicate within which Parts of the thesis these phases are addressed (Part I refers to the background literature and is not shown in figure)

This thesis is structured into the following five parts:

Part I: Background (Chapters 1-3): This section gives the foundation and background to the study and reviews the relevant literature.

Part II: Simulation as a modelling approach (Chapters 4 and 5): The objective of this

section was to introduce the concepts behind the quantitative genetic theory used in

plant breeding programs and how they apply in a computer simulation environment.

This was done by first exploring the convergence between quantitative theory and

computer simulation as two ways of encoding a breeding system into a formal mathe-

matical system for analysis by quantitative methods (Casti 1997a). To focus this

comparison selected topics relevant to this thesis were considered. Simulation experi-

ments were extended from simple genetic models to more complex genetic models for

mass selection, S1 family and DH line population types. Recombination was examined


in greater detail because of its importance in modelling QTL detection and marker-

assisted selection. Preliminary exploration was conducted on how recombination is

modelled in simulation and the effect of generation time on breaking linkages, an

important concept in long-term marker-assisted selection. A comparison between QTL

detection analysis programs to determine their reliability and the ease with which they

could be run in batch mode was also conducted. PLABQTL (Utz and Melchinger 1996)

was selected as the program to be used for this thesis. An experiment was also con-

ducted to determine whether the detection of QTL was affected by the size of the wheat

genome represented in the simulation experiments. A comparison was made between a

12 chromosome, 12 QTL, two flanking markers per QTL genome model as opposed to a

21 chromosome, 12 QTL, eight flanking markers per QTL wheat genome model

representation.

Part III: Factors affecting the power of QTL detection (Chapters 6 and 7): The objective

of this section was to test a range of factors that may affect the detection of QTL in the

mapping studies underway for the Germplasm Enhancement Program (Nadella 1998,

Cooper et al. 1999a, Susanto 2004). The factors included in this study were mapping

population size, heritability, per meiosis recombination fraction, epistasis, and G×E

interaction. By testing these factors, their influence on QTL detection was determined

and recommended values were established for the variables such as population size,

marker density (defined in terms of per meiosis recombination rate between adjacent

markers) and target heritability for phenotyping. The influence of epistasis and G×E

interactions on QTL detection was also determined.

Part IV: Simulation of phenotypic, marker, and marker-assisted selection in the wheat

Germplasm Enhancement Program (Chapters 8 and 9): The objective of this section

was to apply the outcomes of Parts II and III to a simulation of an applied breeding

situation and determine the effect of marker-assisted selection versus phenotypic

selection and pure marker selection in the Germplasm Enhancement Program. The

response to selection of the Germplasm Enhancement Program for a range of genetic

models, including effects of epistasis and G×E interactions, was examined. The


prospect of using marker-assisted selection to enhance the outcomes of the Germplasm

Enhancement Program for both S1 families and DH lines was determined.

Part V: General discussion and conclusions (Chapter 10): This final section of the

thesis integrates the main findings and developments from Parts I to IV and discusses

issues associated with the design of marker-assisted selection strategies in plant

breeding and the recommendations for the inclusion of marker-assisted selection in the

Germplasm Enhancement Program.


CHAPTER 2

REVIEW OF LITERATURE

2.1 Introduction

This review is structured to give a balance of considerations of the literature

relevant to modelling marker-assisted selection in a plant breeding program. These

considerations provide much of the background for the design of the series of simula-

tion experiments conducted in the following Chapters of this thesis. Conventional

selection techniques presently utilised in plant breeding programs are outlined, with an

overview of molecular markers, QTL detection and marker-assisted selection also

given. The Germplasm Enhancement Program goals and strategy are provided as the

specific wheat breeding program case study under investigation. Epistasis, G×E

interaction, and per meiosis recombination fraction are discussed as important factors

that may influence marker-assisted selection as they can introduce potential complica-

tions that can affect the ability to detect true QTL (i.e. QTL that do exist), and define

favourable genotypes for multiple QTL models of traits. This is followed by a review of

computer simulation in genetics, including an overview of the QU-GENE software, the

simulation platform used throughout this thesis. While these review sections build a

foundation for the concepts and experiments used in this thesis, additional relevant

literature is introduced as necessary in the following Chapters.


2.2 Plant breeding programs: a review of traditional and molecular selection techniques

2.2.1 Traditional selection

For centuries farmers have been improving crop germplasm by visually selecting

plants with the preferred phenotype and using the selected plants to produce seed for the

next generation of cropping. This system of phenotypic selection is commonly referred

to as mass selection. More recently, beginning in the late part of the 19th century and

early part of the 20th century, universities, public institutions, private companies and

corporations have taken over this role by designing and managing plant breeding

programs to produce and supply improved genotypes to farmers. Through this evalua-

tion of breeding strategies, plant breeding programs have evolved from simple mass

selection procedures to sophisticated formal plant breeding programs.

The success of a breeding program can be estimated by monitoring the differ-

ence between the mean phenotypic value of the offspring and the parental generation

before selection (Falconer and Mackay 1996). Any change in the mean genetic value of

a population due to the influence of selective forces is termed the realised response to

selection or genetic gain. The basic principle of any plant breeding program is the

continuous improvement of the target species, achieved by maintaining the long-term

response to selection while sustaining new cultivar development using the short-term

response to selection (Hallauer 1981).

For a given trait, predicted response to selection ($\Delta G$) quantifies the expected

genetic gain achievable in any cycle of selection. Equally, realised response to selection,

measured by comparing the performance of successive cycles of selection, indicates

how much of the predicted gain was obtained in practice (Duvick et al. 2004). The plant

breeder’s role is to control the intensity and speed of this genetic improvement by

changing the genetic structure of a population (Williams 1964). By understanding the

underlying concepts of the components of the direct response to selection prediction

equation for a trait y,

$\Delta G_y = i_y h_y^2 \sigma_{p_y}$, (2.1)

populations can be manipulated by altering the intensity of selection applied to trait y

($i_y$), the heritability of trait y ($h_y^2$) and the square root of the phenotypic variance for

trait y ($\sigma_{p_y}$). Here heritability is defined in the narrow sense as the ratio of the additive

genetic variation to phenotypic variation.
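As an illustration of Equation 2.1, the following minimal Python sketch computes a predicted response to selection. It is not part of the QU-GENE software. The function names are hypothetical, and the selection intensity is derived under the assumption of truncation selection on a normally distributed trait, which the text above does not require.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Standardised selection differential i for truncation selection.

    Assumes a normally distributed trait: i = pdf(z) / p, where z is the
    truncation point that leaves a fraction p in the upper tail.
    """
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must lie strictly between 0 and 1")
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(z) / proportion_selected


def predicted_response(i: float, h2: float, sigma_p: float) -> float:
    """Predicted response to selection: Delta G = i * h^2 * sigma_p."""
    return i * h2 * sigma_p


if __name__ == "__main__":
    # Illustrative values only: select the top 10%, h^2 = 0.4, sigma_p = 2.0.
    i = selection_intensity(0.10)
    print(f"i = {i:.3f}, Delta G = {predicted_response(i, 0.4, 2.0):.3f}")
```

The sketch makes the three levers in Equation 2.1 explicit: a smaller selected proportion raises the intensity, while heritability and the phenotypic standard deviation scale the response linearly.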

The ability to produce superior genotypes by imposing a breeding strategy de-

pends on the quality of the germplasm used, the genetic architecture of the trait of

interest and the power of the selection techniques used. Most plant breeding programs to

date have used direct selection methods based on selection for the phenotype of the

traits to be improved. Phenotypic selection involves selecting solely on the basis of

phenotypic information provided by the individuals to be selected, and in some cases

their relatives. However, the phenotype rarely gives a complete representation of the

underlying genotype, particularly if epistasis and G×E interactions are important factors

affecting a trait’s phenotypic performance (Mackay 2004). High performance pheno-

types may result from gene combinations that are not easily transferred across genera-

tions (from parent to offspring) resulting in a reduced realised response to selection in

comparison to expectation. The concept of narrow sense heritability provides a measure

of the ease of transfer of genotypic performance values from parents to offspring and is

based on the concepts of the average effects of genes in combination with the reference

population and the additive genetic variance (Falconer and Mackay 1996). This narrow

sense heritability and additive genetic variance have been central concepts in the

definition of prediction equations for a range of breeding strategies. Much of the theory

for the estimation of heritability and prediction of the response to selection has been

based on models that assume no epistasis or G×E interactions. In the presence of

epistasis and G×E interactions the interpretation of the average effects of genes, additive

genetic variance, narrow sense heritability, and predicted response to selection is

complicated and the appropriate application of these concepts in an applied breeding

program comes into question (Cheverud 2001, Holland 2001, Cooper et al. 2002a). In

theory and practice, breeding programs can be designed to account for the influences of

epistasis and G×E interaction on the variation of a trait’s phenotype. Pedigree breeding

strategies have evolved to deal with the specifics of how genotypes combine to produce


improved progeny and to include testing across many environments (Duvick et al.

2004). However, in the absence of a detailed understanding of the genetic architecture

of quantitative traits (e.g. Mackay 2001) all of these breeding strategies have been based

on selection at the level of the phenotype.

2.2.2 Indirect selection

With the increasing availability of molecular markers, dense genetic

maps and genome sequences for plant species, breeding programs have, in principle, the

opportunity to advance from direct selection methods based on phenotype selection to

indirect selection methods that use the knowledge of the genome structure and gene-to-

phenotype relationships for traits. Indirect selection methods involve the use of markers

(either morphological or molecular) and their association with quantitative trait loci (QTL) to select for

traits of interest. Quantitative trait loci give breeders the ability to select for a trait based

on the presence or absence of markers, and can allow selection of plants to occur earlier

in a life cycle, in particular before reproduction. Early in the 20th century the use of

morphological markers to locate QTL was first proposed by Sax (1923), who reported

an association between seed coat colour and seed size in beans. However, the number of

morphological markers available has been rapidly overtaken by the number of molecu-

lar markers that can be associated with QTL. Therefore, it was not until the generation

of large numbers of molecular markers became cheap and reliable that QTL detection

became feasible and popular for many traits and species.

For the remainder of this thesis molecular markers will usually be abbreviated to

markers. This Section of the review aims to provide an overview of recombination

fraction, linkage disequilibrium and the production of a genetic map. It also covers the

statistical methods and issues involved in the detection of QTL and some background to

marker-assisted selection and its use in breeding programs.

2.2.2.1 Recombination and linkage

Per meiosis recombination fraction (c) is used in genetics as a measure of the

genetic distance separating two loci and is determined by the likelihood that a crossover

or recombination event will occur between two loci in a single meiosis event. A per


meiosis recombination fraction is estimated as the ratio of recombinant gametes over the

total pool of gametes and is expected to have a value between 0 and 0.5. A per meiosis

recombination fraction c = 0 indicates that no recombinant gametes were observed and

the loci are estimated to be completely linked, while a per meiosis recombination

fraction c = 0.5 indicates that recombinant and parental gametes are equally likely to

occur and the two loci show independent segregation. A genetic map is created by

estimating the probability of a recombination event occurring between many pairs of

markers (Figure 2.1). In general, recombination fractions are not additive along a

chromosome, a problem that becomes more obvious as the genetic distance between

loci increases (Liu 1998). Therefore, mapping functions were developed to convert

recombination fractions into additive map distances, which are measured in terms of the

units of Morgans (M) or centiMorgans (cM).

One of the differences between mapping functions is their ability to account for

the effects of double-crossover events between two loci and/or interference. Double-

crossovers occur when recombination occurs twice between two loci. They cause the

original genotype at the two loci to be restored, therefore no genotypic difference will

be observed in a mapping population where the two loci are the only reference points

observed. Thus, with the incidence of double-crossover events the relative frequency of

crossover events is underestimated by the observable recombination fraction. Interfer-

ence reduces the number of double crossover events because the formation of one

crossover reduces the chances of a second crossover forming close to the first. Kearsey

and Pooni (1996) suggested from experimental data that double-crossovers are unlikely

to occur at a per meiosis recombination fraction c ≤ 0.15 because of interference. If no

interference is assumed, then the probability of a double-crossover is the product of the

probability of a crossover in one region multiplied by the probability of a crossover in

the second region. Haldane’s (1931) mapping function assumes no interference while

Ludwig (1934), Kosambi (1944), Carter and Falconer (1951), Sturt (1976), Rao et al.

(1977), Karlin and Liberman (1978), and Felsenstein (1979), all account for different

levels of interference.
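The conversion from recombination fraction to map distance can be sketched in Python for two of the mapping functions named above. The formulas are the standard Haldane (no interference) and Kosambi (partial interference) functions. The function names are hypothetical, and the helper that combines two adjacent intervals assumes no interference.

```python
import math


def _check(c: float) -> None:
    # Map distance is only finite for 0 <= c < 0.5.
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= c < 0.5")


def haldane_cm(c: float) -> float:
    """Haldane map distance in centiMorgans (assumes no interference)."""
    _check(c)
    return -50.0 * math.log(1.0 - 2.0 * c)


def kosambi_cm(c: float) -> float:
    """Kosambi map distance in centiMorgans (allows partial interference)."""
    _check(c)
    return 25.0 * math.log((1.0 + 2.0 * c) / (1.0 - 2.0 * c))


def combine_no_interference(c_ab: float, c_bc: float) -> float:
    """Recombination fraction between A and C for loci ordered A-B-C.

    Without interference, double crossovers (probability c_ab * c_bc)
    restore the parental arrangement, so they are subtracted twice.
    """
    return c_ab + c_bc - 2.0 * c_ab * c_bc


if __name__ == "__main__":
    c_ac = combine_no_interference(0.1, 0.1)
    print(f"c(A,C) = {c_ac:.2f}, which is less than 0.1 + 0.1")
    print(f"Haldane: {haldane_cm(0.1):.2f} cM per interval, {haldane_cm(c_ac):.2f} cM in total")
    print(f"Kosambi: {kosambi_cm(0.1):.2f} cM per interval")
```

The example shows why mapping functions are needed: the recombination fractions of the two intervals do not sum to the recombination fraction across both, whereas the corresponding Haldane distances do.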


Figure 2.1 Genetic map of the group 1 chromosomes of Triticeae (Vandeynze et al. 1995). The centromere of the chromosome is indicated by the bold letter C

Groups of genes that are linked, and tend to be transmitted intact from one gen-

eration to the next, are referred to as linkage groups. Linkage can influence estimates of

genetic variance for quantitative characters. For achievement of linkage equilibrium in a

population, the opportunity must be provided for genetic recombination within double

heterozygous individuals (i.e. individuals that are heterozygous at the two loci under

consideration). This requires repeated generations of intermating or selfing of heterozy-

gous individuals. Recombination acts to break up linkage blocks and reduce the effect

of linkage disequilibrium. Fehr (1987) commented on a number of factors that influence

the length of linkage blocks that are retained in a breeding population, including: (i) the

number of parents used to develop the population; (ii) the number of generations of

intermating before selfing is initiated; and (iii) the number of selfing generations

conducted after intermating is completed. Another important factor is the extent of

coancestry among the parents used to develop the population.


The number of recombination events is important in determining the extent of

linkage disequilibrium in a mapping population and in turn this determines the extent of

resolution that can be achieved in the mapping of QTL positions. In a breeding

population, once a favourable marker-QTL allele combination has been defined, it is

important that recombination between the favourable QTL allele and the marker allele

does not occur frequently and break up the linkage group. Consider an example of a

favourable linkage combination between a marker and a QTL. Assume a marker, M

(alleles are M and m) is associated with a QTL, Q (alleles Q and q) in a mapping

population. The objective is to use allelic variation at marker M to indirectly select for

an allele at QTL Q in a breeding population. If the favourable allele combination is

defined as MQ (unfavourable combinations are mq, mQ, and Mq) then when selection is

for marker allele M, and against marker allele m, it is expected that selection will be for

QTL allele Q. If a single recombination event occurs between M and Q then the

resulting allele combinations in the progeny from the recombinant gametes will be mQ

and Mq. For progeny possessing the recombinant chromosome, when selecting for

marker allele M, indirect selection is for the unfavourable QTL allele q and not Q. In

these cases, an outcome of recombination is that selection is not for the favourable QTL

allele in the breeding population, and, as the frequency of the recombinant chromo-

somes increases in the breeding population the response to marker selection will

decrease. Therefore, it is important to have an appropriate balance between marker

density on the genetic map and the likelihood of recombination events that will break up

marker-QTL associations both within the mapping study and in the breeding population.

A long-term target for most breeding programs is to establish a dense marker map to

find markers closely linked to the QTL of interest in order to minimise the chance of

recombination events that break marker-QTL associations for a wide range of breeding

strategies. For successful forward breeding, a related, and equally important issue, is to

ensure that any mapping population identifies linkage phase associations between

marker and QTL alleles that are consistent with linkage phase relationships that exist in

the elite populations of the breeding program.
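The erosion of a marker-QTL association such as MQ over generations can be illustrated with the classical result for the decay of linkage disequilibrium, D_t = D_0 (1 - c)^t. The sketch below assumes random mating in a large population with no selection, which is a simplification and does not correspond to the S1 family or DH line schemes simulated in this thesis. The function names are hypothetical.

```python
import math


def ld_after_generations(d0: float, c: float, generations: int) -> float:
    """Linkage disequilibrium between marker M and QTL Q after t generations.

    Uses D_t = D_0 * (1 - c)**t, which assumes random mating, a large
    population and no selection (simplifying assumptions for illustration).
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= c <= 0.5")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return d0 * (1.0 - c) ** generations


def generations_to_halve(c: float) -> float:
    """Number of generations for the marker-QTL association to halve."""
    if not 0.0 < c <= 0.5:
        raise ValueError("recombination fraction must satisfy 0 < c <= 0.5")
    return math.log(0.5) / math.log(1.0 - c)


if __name__ == "__main__":
    for c in (0.01, 0.05, 0.20):
        print(f"c = {c:.2f}: association halves in about "
              f"{generations_to_halve(c):.1f} generations")
```

Under these assumptions a tightly linked marker (small c) retains its association with the QTL for many more generations than a loosely linked one, which is the motivation for dense marker maps given above.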


2.2.2.2 Generating genetic maps

Markers are sequences of deoxyribonucleic acid (DNA) that indicate positions in

a genome. While markers have a physical position in a genome, to date they have

largely been utilised in combination with genetic maps. A large proportion of the

world’s commercial crops have genetic maps based on markers. Public versions of these

maps can be accessed at http://www.nalusda.gov/pgdic/. There is a range of systems

available for generating markers with each of the techniques having a range of advan-

tages and disadvantages (Korzun 2003). While most genetic maps for crops are based

on restriction fragment length polymorphisms (RFLP), additional simpler techniques

like randomly amplified polymorphic deoxyribonucleic acid (RAPD), amplified

fragment length polymorphisms (AFLP), simple sequence repeats (SSR) and single

nucleotide polymorphisms (SNP) have been developed (Nadella 1998, Korzun 2003,

Susanto 2004). More recently, the development of Diversity Array Technology (DArT)

has provided the ability to discover hundreds of markers in a low-cost single experi-

ment, a major advantage over the systems mentioned above (Wenzl et al. 2004).

To obtain markers and create a sufficiently dense genetic map (e.g. 1 marker / 5

cM, Liu 1998), specially designed experiments are set up to ensure the offspring are

genetically variable, and that polymorphic markers exist and are in linkage disequilib-

rium for the trait of interest. Examples of popular mapping population designs are

backcrosses (BC), F2’s, recombinant inbred lines (RIL) and doubled haploids (DH).

Backcrosses and F2’s are the most frequently used due to the shorter time period

required to generate them. However, recombinant inbred lines and DH populations

allow unlimited replication of the measurement unit (Carbonell et al. 1993), an

important advantage when it is necessary to collect phenotypic data over multiple

environments (Stam 1994). The size of the populations used to generate markers is also

an important component when creating a genetic map. A small population size can

result in undetected or unresolved linkage and low marker coverage across the genetic

map. A larger population size can result in more accurate marker coverage as the

probability of detecting linkages and joining genome segments increases (Liu 1998).


Statistical linkage tests are conducted on the polymorphic markers found in the

mapping population to create a genetic map (Liu 1998). When parents have different

genotypes for a marker, the progeny will segregate for this marker. Multiple markers

segregating in the progeny of a cross provides the structured genetic variation needed to

statistically estimate the relationship between markers and determine whether they co-

segregate and are thus linked on the same chromosome. The differences in the extent of

co-segregation are expected to be due to the different locations of the markers on

chromosomes and the recombination fraction between markers on a chromosome during

meiosis. By conducting a linkage analysis (two-point or multipoint maximum likelihood

ratio and the least squares method), with a program such as MAPMAKER/EXP (Lander

et al. 1987) or JoinMap (Van Ooijen and Voorrips 2001), a genetic map of the genome

of interest can be estimated. Linkage analysis involves aligning the markers into a linear

order by minimising the genetic distances between them based on the patterns of co-

segregation of the markers. A genetic map indicates the number of linkage groups or

chromosomes detected and the estimated recombination fraction between the markers

on each chromosome. It is important to acknowledge that the genetic distances between

the markers are statistical estimates that are measured with some level of error. Figure

2.1 is an example of the estimated genetic map of the Triticeae group 1 chromosomes.

Wheat generally displays relatively low levels of polymorphism, with significantly fewer

markers occurring on the D genome than on the A and B genomes (Chalmers et al.

2001), which can be seen in Figure 2.1.
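A two-point linkage test of the kind described above can be sketched as follows. This is a simplified illustration, not the algorithm used by MAPMAKER/EXP or JoinMap. It assumes a population with two genotype classes per marker (for example DH lines), known linkage phase and no missing data. The function name and the 0/1 allele coding are hypothetical.

```python
import math


def two_point_linkage(marker_a, marker_b):
    """Estimate the recombination fraction and LOD score for two markers.

    marker_a, marker_b: equal-length sequences of 0/1 codes giving the
    parental origin of the allele carried by each line at each marker.
    Returns (c_hat, lod), where lod compares c = c_hat against c = 0.5.
    """
    if len(marker_a) != len(marker_b) or len(marker_a) == 0:
        raise ValueError("marker vectors must be non-empty and of equal length")
    n = len(marker_a)
    recombinants = sum(a != b for a, b in zip(marker_a, marker_b))
    # Maximum likelihood estimate, truncated at 0.5 (independent segregation).
    c_hat = min(recombinants / n, 0.5)

    def term(count, prob):
        # Treat 0 * log10(0) as 0 so that c_hat = 0 is handled cleanly.
        return count * math.log10(prob) if count > 0 else 0.0

    log_l_hat = term(recombinants, c_hat) + term(n - recombinants, 1.0 - c_hat)
    log_l_null = n * math.log10(0.5)
    return c_hat, log_l_hat - log_l_null


if __name__ == "__main__":
    a = [0] * 50 + [1] * 50
    b = [0] * 45 + [1] * 5 + [1] * 45 + [0] * 5  # 10 recombinant lines
    c_hat, lod = two_point_linkage(a, b)
    print(f"c_hat = {c_hat:.2f}, LOD = {lod:.2f}")
```

Pairs of markers with a high LOD score would be placed in the same linkage group, and the estimated recombination fractions would then be converted to map distances with a mapping function.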

2.2.2.3 Detecting QTL

Quantitative trait loci are specific regions in the genome that are statistically associated

with genetic variation for quantitative traits. A QTL detection analysis can be

conducted using programs such as PLABQTL (Utz and Melchinger 1996) or QTL

Cartographer (Basten et al. 1994, 2001). A more extensive list of programs can be

found at http://linkage.rockefeller.edu/soft/list.html. To conduct a QTL detection

analysis a reliably estimated genetic map and accurately collected phenotypic and

marker data are required. If the estimated genetic map is poor, then QTL locations will

be poorly estimated. It is also essential to ensure phenotypic values are estimated

accurately and with precision to prevent wrongful QTL detection through errors of


measurement. A range of statistical detection methods can be used to determine the

association between a marker and QTL. A major issue involved in the successful

application of this method in practice is the accurate detection of QTL in mapping

studies and accurate definition of the appropriate multi-locus QTL genotypes as

selection targets.

A major limitation in many QTL mapping studies is population size (Darvasi et

al. 1993, Beavis 1998, Liu 1998, Charmet 2000). The mapping population size needs to

be large enough to ensure that the possible marker-QTL genotypic combinations are

sampled close to their expected frequencies. If a small proportion of the combinations

are sampled then the genetic map created will not be accurate and the QTL detection

investigation is likely to detect QTL that do not exist (i.e. Type I errors) or not detect

QTL that do exist (i.e. Type II errors) due to the lack of information. Previous studies

have shown that population sizes less than 500 have limited power to identify QTL with

small effects and are likely to make a large number of Type I errors (Beavis 1998,

Kruger et al. 2001). Most genome wide searches use 500 individuals with a 10 – 12 cM

map as both a denser map and a larger population size enable more QTL to be detected

and a greater resolution to be achieved in positioning the QTL (Ober and Cox 1998,

Chalmers et al. 2001). In addition, it has been suggested that a population size of 1000

is required to obtain accurate QTL positions and to estimate effects (Holland 2004) and

QTL mapping studies in maize have been conducted using 976 progeny families

(Openshaw and Frascaroli 1997).

The recombination fraction between the marker and a QTL and the number of

meiotic events that allow crossover events to occur are important issues in the power to

detect QTL in a mapping population. If a QTL is located at a marker (c = 0, complete

linkage) then the QTL effect will be measured with high power as the marker is

perfectly associated with the QTL. If, however, the QTL is not located at the marker (0 <

c < 0.5), then the phenotypic effect of the QTL may be biased downwards by a factor of (1-2c)

(Lander and Botstein 1989). To compensate for this bias and achieve the same power

as under complete linkage, the population size needs to be increased by a factor of

$\frac{1}{(1-2c)^2}$, and as a consequence the variance explained by the marker decreases by a

factor of $\frac{1}{(1-2c)^2}$, that is, it is multiplied by $(1-2c)^2$ (Lander and Botstein 1989). This relationship emphasises one of the

important aspects of population size in mapping studies.
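The relationship between marker-QTL recombination fraction, apparent QTL effect and required population size can be tabulated with a short Python sketch. The function names are hypothetical and the baseline population size of 200 is an arbitrary illustrative value, not a recommendation from this thesis.

```python
def effect_attenuation(c: float) -> float:
    """Fraction of the true QTL effect observed at a marker: (1 - 2c)."""
    return 1.0 - 2.0 * c


def variance_fraction_retained(c: float) -> float:
    """Fraction of the QTL variance explained by the marker: (1 - 2c)^2."""
    return (1.0 - 2.0 * c) ** 2


def population_size_inflation(c: float) -> float:
    """Factor by which population size must grow to match the power
    obtained under complete linkage: 1 / (1 - 2c)^2."""
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= c < 0.5")
    return 1.0 / variance_fraction_retained(c)


if __name__ == "__main__":
    baseline_n = 200  # hypothetical population size under complete linkage
    for c in (0.0, 0.05, 0.10, 0.20, 0.25):
        print(f"c = {c:.2f}: effect x {effect_attenuation(c):.2f}, "
              f"required n = {baseline_n * population_size_inflation(c):.0f}")
```

The inflation factor grows quickly with c. At c = 0.25 the marker captures only a quarter of the QTL variance, so four times as many individuals are needed for the same power.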

2.2.2.4 Statistical methods used to detect QTL

Several statistical methods have been developed over the last 25 years to improve

the accuracy and precision for the detection of QTL. Analyses have progressed

from single marker t-tests to testing for multiple QTL over an entire genome. An

overview of single marker analysis t-tests, interval mapping, composite interval

mapping, multiple interval mapping and some statistical issues that need to be consid-

ered when detecting QTL follows.

Single marker analysis t-tests (Soller et al. 1976) involve detecting QTL associ-

ated with each single marker in a series of independent tests. In this analysis method, a

genetic map is not required and QTL effect and location relative to the marker are

confounded. Single marker analysis involves testing for the presence of a QTL only at

markers and determines whether there is a difference in the means of the genetic marker

classes. It is also equivalent to testing for allelic substitution. A test statistic (likelihood

ratio, t-test, analysis of variance or linear regression) of the marker being associated

with a QTL is calculated and compared to a significance threshold for accepting the

presence of a QTL. If the statistical test produces a value greater than the threshold then

the marker is associated with a QTL for the trait of interest (Figure 2.2: single marker).
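A single marker t-test can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used by any of the QTL analysis programs cited in this thesis. It assumes two marker genotype classes coded 0 and 1 (as in a DH or backcross population) and equal variances within classes. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def single_marker_t(genotypes, phenotypes):
    """Pooled-variance t statistic for the difference between the mean
    phenotypes of the two marker genotype classes (coded 0 and 1)."""
    group0 = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    group1 = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    n0, n1 = len(group0), len(group1)
    if n0 < 2 or n1 < 2:
        raise ValueError("each marker class needs at least two individuals")
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    if pooled_var == 0.0:
        raise ValueError("no phenotypic variation within marker classes")
    standard_error = math.sqrt(pooled_var * (1.0 / n0 + 1.0 / n1))
    return (mean(group1) - mean(group0)) / standard_error


if __name__ == "__main__":
    marker = [0, 0, 0, 1, 1, 1]
    trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(f"t = {single_marker_t(marker, trait):.3f}")
```

The statistic would be computed for every marker in turn and compared with the chosen significance threshold. Because the difference between class means shrinks as the marker lies further from the QTL, a small statistic cannot distinguish a distant QTL of large effect from a close QTL of small effect, which is the confounding noted above.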

Interval mapping (Lander and Botstein 1989) analyses two markers at a time to

map a QTL. Interval mapping requires a genetic map, and uses the information from the

map positions of the markers to remove the confounding effect of location and

recombination fraction allowing the location and effects of the QTL to be estimated. By

removing the confounding of a QTL effect and its location, interval mapping is more

powerful than single marker analysis. Interval mapping involves stepping along the

genome (e.g. every 2 cM) and calculating a test statistic for the likelihood of the


presence of a QTL. This test statistic (likelihood ratio or linear regression) calculates the

probability that an individual has a particular QTL genotype given by the marker

information and QTL position. The test statistic can be plotted against the genome and

compared to a threshold value to observe significantly associated markers and QTL

location (Figure 2.2: IM). With interval mapping, QTL positions and effects may be

biased if more than one QTL is present and they both co-segregate on a chromosome as

there will be a level of co-segregation of their effects. In addition, interval mapping does

not account for information provided by other QTL. As a result, searching for one QTL

within intervals can be complicated and confounded by multiple QTL.
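The core calculation in interval mapping, the probability of a QTL genotype given the flanking marker genotypes, can be sketched as follows. The sketch assumes a population with two genotype classes per locus (for example DH lines), no crossover interference, and marker alleles coded 1 (first parent) or 0 (second parent). The function name and coding are hypothetical, and the full likelihood or regression analysis is not reproduced.

```python
def qtl_genotype_probability(m1: int, m2: int, r1: float, r2: float) -> float:
    """P(QTL carries the first parent's allele | flanking markers m1, m2).

    r1: recombination fraction between the left marker and the putative QTL.
    r2: recombination fraction between the putative QTL and the right marker.
    Assumes no interference, so crossovers in the two sub-intervals are
    independent events.
    """
    for r in (r1, r2):
        if not 0.0 <= r <= 0.5:
            raise ValueError("recombination fractions must lie in [0, 0.5]")

    def joint(q: int) -> float:
        # Probability of the marker pair together with QTL allele q,
        # up to the constant 1/2 shared by both QTL alleles.
        left = (1.0 - r1) if m1 == q else r1
        right = (1.0 - r2) if m2 == q else r2
        return left * right

    total = joint(1) + joint(0)
    if total == 0.0:
        raise ValueError("marker genotypes are impossible for these distances")
    return joint(1) / total


if __name__ == "__main__":
    # Step a putative QTL across an interval with marker-to-marker c = 0.18.
    for r1, r2 in ((0.02, 0.1667), (0.10, 0.10), (0.1667, 0.02)):
        p = qtl_genotype_probability(1, 0, r1, r2)
        print(f"r1 = {r1:.4f}, r2 = {r2:.4f}: P(Q | M1, m2) = {p:.3f}")
```

For non-recombinant marker pairs the QTL genotype is nearly certain, while for recombinant pairs the probability depends strongly on the assumed QTL position. Stepping the position along the interval and recomputing the test statistic with these probabilities produces the profile that is compared against the threshold.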

[Figure 2.2: line plot of the likelihood ratio (vertical axis, 0 to 250) against position in Morgans (horizontal axis, 0.0 to 1.0), with curves for single marker analysis, IM, CIM and MIM, a threshold line, and the positions of the three QTL marked.]

Figure 2.2 QTL detection analysis for a single chromosome with six markers (equally spaced 0.2 Morgans apart) and three segregating QTL. The mapping population size was 200. All six markers were significant for QTL effects using single marker analysis (single marker). Interval mapping (IM) detected four significant QTL peaks. Composite interval mapping (CIM) detected three significant QTL peaks and multiple interval mapping (MIM) detected four significant QTL peaks. Detection of false QTL may be a result of low population size. The likelihood ratio threshold was set at 11.5. These simulated data were generated using QU-GENE, the analyses were conducted in QTL CARTOGRAPHER (Basten et al. 1994, 2001).

Composite interval mapping (Jansen 1993, Zeng 1993, 1994) combines interval

mapping (Lander and Botstein 1989) and multiple regression, allowing both regression


on the QTL within an interval and on marker loci outside that interval. Composite

interval mapping was developed in response to the recognition that many QTL in the

genome may contribute simultaneously towards the genetic variation observed for a

trait. To overcome this, co-factors are used to control for the background genetic

variation from other QTL located at other linked or unlinked markers. Composite

interval mapping adjusts for the effects of these background QTL by regressing on

markers outside the interval where the QTL effect is being tested. The goal of compos-

ite interval mapping is to test for QTL in an interval with statistical independence of

effects of other QTL along the chromosome. This allows an improvement in the

precision and efficiency of mapping multiple QTL (Figure 2.2: CIM).

Multiple interval mapping (Kao et al. 1999) searches for multiple QTL and

combines QTL mapping analysis with an analysis of the genetic architecture of a

quantitative trait. Multiple interval mapping uses a search algorithm to search for

number, positions, effects, and the interactions of significant QTL simultaneously.

Multiple interval mapping is a new technique that tends to be more powerful and precise

than either interval mapping or composite interval mapping for QTL detection (Figure

2.2: MIM). An added advantage of multiple interval mapping is that it can search for

epistatic QTL and estimate individual genotypic value and heritabilities of quantitative

traits. This is an advantage over composite interval mapping, which cannot be directly

extended to analysing epistasis (Zeng 2000).

2.2.2.5 Statistical issues to consider when detecting QTL

A common issue with QTL detection is the selection of a critical value to determine

a significance threshold. A critical value (α) determines what error risk is

acceptable. There are two classical types of errors. A Type I error (false positive) occurs

when the alternative hypothesis that a QTL effect exists is accepted when there really is

no QTL effect at that position. A Type II error (false negative) occurs when the null

hypothesis of no QTL effect is accepted when in reality a QTL effect does exist. A small

critical value decreases the rate of Type I errors; however, it also increases the rate of

Type II errors and reduces the power of the test to detect QTL. A middle ground

needs to be reached when testing for QTL detection. A critical value α = 0.05 is a


commonly accepted significance threshold for general QTL detection (Manly and Olson

1999, Knott and Haley 2000). When dealing with exploratory QTL detection experiments,

a critical value α = 0.25 may be more appropriate (Beavis 1998) as it allows QTL

with both strong and weak effects to be detected as significant. This then allows a more

stringent significance level or validation test to be imposed on these QTL to determine

their real effect on the trait of interest. All of the QTL detection analyses conducted

within this thesis are exploratory and the recommendations of Beavis (1998) have been

used as a guide throughout this thesis with a critical value α = 0.25 being the common

significance level. In addition to Type I and II errors, a less common type of error, a

Type III error, may also exist (Mosteller 1948). In the case of QTL detection, a Type III

error occurs when the presence of a QTL is correctly identified, however, the definition

of the favourable and unfavourable QTL allele is incorrect. This can occur in a QTL

detection analysis when the unfavourable marker alleles are identified with the

favourable QTL allele in relation to the favourable QTL alleles defined in the target

genotype. Following the convention given above in Section 2.2.2.1, this error would

occur whenever mq was defined as the favourable marker-QTL allele combination and

MQ, mQ and Mq as the unfavourable combinations. Thus, in the case described here the

QTL, Q, is detected, but the marker-QTL allele combinations are ranked incorrectly, i.e.

mq is defined as superior to MQ when MQ is in fact superior.
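
The trade-off between Type I and Type II error rates described above can be illustrated numerically. The following Python sketch is not part of the original analyses; it assumes a two-sided z-test on a single marker effect, and the function name `two_sided_power` and the example effect size of two standard errors are hypothetical choices made only for illustration.

```python
from statistics import NormalDist


def two_sided_power(effect_in_se: float, alpha: float) -> float:
    """Power of a two-sided z-test when the true effect equals
    `effect_in_se` standard errors (an assumed, simplified test)."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the test statistic lands in either rejection tail.
    return (1.0 - z.cdf(crit - effect_in_se)) + z.cdf(-crit - effect_in_se)


if __name__ == "__main__":
    # Hypothetical QTL effect of two standard errors.
    for alpha in (0.01, 0.05, 0.25):
        power = two_sided_power(2.0, alpha)
        print(f"alpha={alpha:.2f}  power={power:.3f}  Type II rate={1.0 - power:.3f}")
```

When the true effect is zero the function returns α itself (the Type I error rate), and for a fixed non-zero effect a larger α gives higher power, which is the rationale for a relaxed critical value in exploratory work.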

Significance thresholds determine whether a QTL will be accepted as significant

or not. Thresholds are calculated as either likelihood ratios (LR) or log10 likelihood odds

ratio (LOD) tests ($\mathrm{LR} = \mathrm{LOD} \times 4.6052$) based on normal distributions. Problems with

underlying trait and error distributions being non-normal brought about the use of the

permutation test (Churchill and Doerge 1994, Doerge and Churchill 1996). Permutation

tests rest on the premise that phenotype and genotype are associated if there is a QTL

effect. By breaking up this association, through randomly reassigning phenotypes to

genotypes, the null hypothesis of no phenotype-genotype association is tested. Repeated

permutation tests lead to a distribution of the differences of sample means under the

hypothesis of no association between marker and trait (Doerge et al. 1997). The more

repetitions conducted, the more reliable the empirical significance threshold will be.
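
The permutation procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular QTL package: it assumes two marker classes coded 0 and 1 (as in a DH or backcross population), uses the absolute difference between class means as the test statistic, and takes the largest statistic across markers in each permutation to give an experiment-wise threshold in the spirit of Churchill and Doerge (1994). The function names and the simulated data are hypothetical.

```python
import random


def mean_difference(genotypes, phenotypes):
    """Absolute difference between the mean phenotypes of marker classes 0 and 1."""
    class0 = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    class1 = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    return abs(sum(class1) / len(class1) - sum(class0) / len(class0))


def permutation_threshold(markers, phenotypes, alpha=0.05, n_perm=1000, seed=1):
    """Empirical threshold: the (1 - alpha) quantile of the largest statistic
    across all markers, collected over n_perm shuffles of the phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        maxima.append(max(mean_difference(m, shuffled) for m in markers))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1.0 - alpha) * n_perm))]


if __name__ == "__main__":
    rng = random.Random(7)
    n = 128
    # Five hypothetical, balanced markers; only marker 0 affects the trait.
    markers = [[(i >> k) & 1 for i in range(n)] for k in range(5)]
    phenotypes = [2.0 * markers[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    threshold = permutation_threshold(markers, phenotypes, alpha=0.25, n_perm=500)
    for k, marker in enumerate(markers):
        stat = mean_difference(marker, phenotypes)
        print(f"marker {k}: statistic={stat:.3f} significant={stat > threshold}")
```

Increasing `n_perm` stabilises the estimated quantile, which is the sense in which more repetitions give a more reliable empirical threshold.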


There is a wide range of statistical and experimental issues that may affect the

power to detect QTL, which include: (i) trait values and errors that are assumed to be

normally distributed when they may in fact be a mixture of distributions; (ii) a scale of

measurement for the trait that is non-linear, requiring transformation of the data and

raising the question of whether the transformation is appropriate; (iii) the use of

small sample sizes; (iv) the use of statistical tests for QTL that are not independent as

markers are ordered; (v) inappropriate balance between marker density and the extent of

recombination; (vi) the amount of missing data; (vii) the presence of segregation

distortion, which tends to expand a map; and (viii) the influence of non-additive genetic

effects due to epistasis and G×E interactions. All of these issues have been noted as they

need to be considered when conducting, interpreting or using the results of a QTL

detection analysis in any situation, including marker-assisted selection applications.

2.2.2.6 Marker-assisted selection

A major motivation for QTL detection analysis in plant breeding has been to

generate knowledge of the genetic architecture of a trait (Mackay 2001, 2004) to enable

marker-assisted selection in a breeding program (Lande and Thompson 1990, Openshaw

and Frascaroli 1997, Jansen et al. 2003, Podlich et al. 2004). Throughout this

thesis marker-assisted selection is considered to be the integration of information from

marker-based QTL detection and traditional phenotypic-based selection methods to find

genetically superior individuals. The marker information can be used to select individuals

early in a breeding population with certain desired markers to progress through the

program and undergo phenotypic selection. Marker-assisted selection is generally used

for traits that have a low heritability, or where other means of selection are difficult and

economically unjustified. The use of markers and QTL associations to select indirectly

effectively increases the heritability of economically important traits (Stuber et al.

1992).

The potential impact of marker-assisted selection can be examined as a special

case of indirect selection. The indirect response to selection can be estimated by

Equation (2.2), given by Falconer and Mackay (1996, Equation [19.6]):


$$\Delta G_{y|x} = i_x h_x h_y r_{g_{xy}} \sigma_{p_y} , \tag{2.2}$$

where $\Delta G_{y|x}$ is the genetic change in trait y brought about by selection on trait x; $i_x$ is

the selection intensity (defined as a standardised selection differential) applied to trait x;

$h_x$ and $h_y$ are square roots of the heritability for traits x and y, respectively; $r_{g_{xy}}$ is the

genetic correlation between trait x and y, which can be defined on an additive or

genotypic variation basis; and $\sigma_{p_y}$ is the square root of the phenotypic variance of trait y.

Assuming that trait x is a marker and trait y is a QTL which is to be manipulated by

selection based on the marker trait x, then the following assumption and simplification

to Equation (2.2) can occur. For a reliable polymorphic marker, it is assumed that the

heritability of marker trait x is $h_x^2 = 1.0$. Thus, substituting $h_x = 1.0$ into Equation

(2.2) gives Equation (2.3),

$$\Delta G_{y|x} = i_x h_y r_{g_{xy}} \sigma_{p_y} . \tag{2.3}$$

Recognising that $h_y = \sigma_{g_y} / \sigma_{p_y}$, Equation (2.3) can be further simplified by substituting this

form of $h_y$ into Equation (2.3) and cancelling the two $\sigma_{p_y}$ terms:

$$\Delta G_{y|x} = i_x r_{g_{xy}} \sigma_{g_y} . \tag{2.4}$$

From this form (Equation 2.4) the indirect response to selection for trait y is a function

of the selection intensity applied to trait x (the marker), the genetic correlation between

traits x and y, and the extent of genetic variation for trait y. Here the genetic correlation

can be interpreted in terms of the strength of the linkage between the QTL for trait y and

marker x.

Further, it is informative to compare Equation (2.2) for indirect response to

selection with the comparable equation for direct response to phenotypic selection

(Equation 2.1), expressed as the ratio of the two responses:


$$\frac{\Delta G_{y|x}}{\Delta G_y} = \frac{i_x h_x h_y r_{g_{xy}} \sigma_{p_y}}{i_y h_y^2 \sigma_{p_y}} . \tag{2.5}$$

Recalling that in the case where trait x is a marker, $h_x = 1.0$, and cancelling the

common terms, Equation (2.5) becomes,

$$\frac{\Delta G_{y|x}}{\Delta G_y} = \frac{i_x r_{g_{xy}}}{i_y h_y} . \tag{2.6}$$

From Equation (2.6) it can be seen that indirect selection for trait y (QTL) using the

marker will be more efficient than direct selection on the phenotype of the QTL when

$i_x r_{g_{xy}} > i_y h_y$. Therefore, the relative efficiency of a marker-assisted selection strategy

can be examined in terms of the degree of linkage between the marker and the QTL

($r_{g_{xy}}$) and the heritability of the target trait y. It is also important to consider the case

where there is potential to apply greater selection intensity to the markers ($i_x$) than

directly to the trait phenotype ($i_y$). Throughout this thesis, components of this quantitative

framework will be used in combination with computer simulation to evaluate the

relative merits of direct selection on phenotypic variation (phenotypic selection),

indirect selection on markers alone (marker selection) and selection on a combination of

phenotypic and marker information (marker-assisted selection).
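
The relationships in Equations (2.4) to (2.6) can be checked with a small numerical sketch. The Python below is illustrative only: the function names and all parameter values are hypothetical, and the direct response is taken to be $i_y h_y^2 \sigma_{p_y}$, which is the form implied by the denominator of Equation (2.5).

```python
import math


def indirect_response(i_x, r_g, sigma_g_y):
    """Equation (2.4): response in trait y from selection on a marker (h_x = 1)."""
    return i_x * r_g * sigma_g_y


def direct_response(i_y, h2_y, sigma_p_y):
    """Direct phenotypic response, assumed here to be i_y * h_y^2 * sigma_p_y."""
    return i_y * h2_y * sigma_p_y


def relative_efficiency(i_x, i_y, r_g, h2_y):
    """Equation (2.6): ratio of indirect (marker) to direct (phenotypic) response."""
    return (i_x * r_g) / (i_y * math.sqrt(h2_y))


if __name__ == "__main__":
    # Hypothetical values: equal selection intensities and a low-heritability trait.
    i_x = i_y = 1.4
    h2_y, sigma_p_y = 0.25, 2.0
    sigma_g_y = math.sqrt(h2_y) * sigma_p_y
    for r_g in (0.4, 0.8):
        ratio = relative_efficiency(i_x, i_y, r_g, h2_y)
        print(f"r_g={r_g:.1f}  marker/phenotypic response ratio={ratio:.2f}")
        # The ratio of the two responses agrees with Equation (2.6).
        check = indirect_response(i_x, r_g, sigma_g_y) / direct_response(i_y, h2_y, sigma_p_y)
        assert math.isclose(ratio, check)
```

With equal selection intensities the marker is favoured only when the marker-QTL correlation exceeds the square root of the trait heritability, so the advantage grows as heritability falls.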

An important consideration in utilising QTL that are detected in a mapping

population is determining whether they are still valid, or even segregating in the

breeding population. Mapping populations are usually developed from two contrasting

inbred parents to create a large amount of genetic variance and a much higher heritability

than what would be observed in a typical breeding population. Many studies have

suggested that markers associated with agronomic traits offer a great potential for use in

marker-assisted selection (Lande and Thompson 1990, Lande 1992, Dudley 1993, De

Koyer et al. 2001). However, successful and practical examples of marker-assisted

selection in a breeding program are rare (Young 1999). Since one of the goals of QTL


detection analysis is to provide the foundation for marker-assisted selection programs, it

may be useful to identify QTL that have already been selected in a breeding population.

The argument in support of this approach is that the QTL that have already been under

the influence of selection have demonstrated value for the trait in the reference breeding

populations.

For marker-assisted selection techniques to be adopted in a breeding program to enhance

the rate of genetic gain, it needs to be demonstrated that as a breeding strategy it is

capable of producing greater genetic gains than those observed with phenotypic

selection. Heritability plays an important role in maximising the response from marker-

assisted selection relative to phenotypic selection, as marker-assisted selection is

increasingly more effective relative to phenotypic selection as heritability decreases.

However, as heritability decreases, the power of experiments to detect QTL will also

decrease. Therefore, maximising marker-assisted selection relative to phenotypic

selection is theoretically harder as heritability decreases (Knapp 1994). A number of

simulation studies have been conducted to compare marker-assisted selection and

phenotypic selection (Zhang and Smith 1992, 1993, Edwards and Page 1994, Gimelfarb

and Lande 1994a, 1994b, 1995, Whittaker et al. 1995, Hospital and Charcosset 1997,

Whittaker et al. 1997, Cooper and Podlich 2002). A general conclusion can be drawn

from all these papers, that under the models tested, marker-assisted selection is capable

of producing a rapid response to selection, which declines with time relative to

phenotypic selection. The decline with time is generally due to marker-assisted selection

quickly fixing genes that were identified as important in the population. At the same

time phenotypic selection would also be increasing the frequency of the same genes in

the population over a longer timeframe, therefore continually improving to the point

that marker-assisted selection has already reached.

Marker-assisted selection is a relatively new technique available to plant breeders

and is likely to bring about a cultural change in selection methods. Like any new

method, breeders have to be aware of both its advantages in a breeding program and

also its limitations. A breeder needs to understand how markers are detected and how

QTL detection analyses operate to be able to utilise marker-assisted selection efficiently.

Appropriate marker maps and relevant QTL need to be readily available to plant

breeders in a form that is easy to utilise in a breeding program. In this thesis, computer

simulation will be used to model an active wheat breeding program (the Germplasm

Enhancement Program) to allow the breeders to observe the response of the breeding

program utilising QTL and marker-assisted selection in comparison to phenotypic

selection. This will provide a basis for determining the potential power of a marker-

assisted selection strategy within the breeding program context and evaluating the

situations where it may fail or succeed for the Germplasm Enhancement Program.

2.3 The Germplasm Enhancement Program

Bread wheat (Triticum aestivum L.) is the most important crop in the Australian

grains industry. Australia was forecast in 2000 to be the 9th largest wheat producing

nation in the world, producing 18.5 million tonnes (Montana Wheat & Barley Committee

2001). In the 1999/2000 grains season, total wheat production in Australia was

25,012,000 tonnes followed by barley (5,043,000 tonnes) and sorghum (2,163,000

tonnes) (AWB Ltd 2001). Wheat is grown in all Australian states, except the Northern

Territory (Figure 2.3), and is Queensland’s major cereal crop (Douglas 1985), covering

a large proportion of the fertile cropping lands in the south-eastern section of the state.

The Australian Northern Wheat Improvement Program was established to target

wheat breeding for the Queensland and northern New South Wales (northern grains

region, Figure 2.3) growing regions. The environments in the northern region differ

from those in the southern states when examined in terms of G×E interactions (Watson

et al. 1995, Basford and Cooper 1998) and production variability. The variation is

mostly due to the differences in distribution of rainfall. Southern New South Wales,

Victoria and South Australia (southern grains region) receive spring rainfalls to promote

high yields, while northern New South Wales and southern Queensland (northern grains

region) receive summer rainfalls with yields relying heavily on water stored in the

heavy clay soils (Simmonds 1989). This does not preclude the existence of a large amount

of environmental influence within the northern grains region (further discussion

in Section 2.4.3).


[Figure 2.3: map of Australia showing Western Australia, Northern Territory, South Australia, Queensland, New South Wales, Victoria, Tasmania and the ACT. Legend: major growing area; northern grains region.]

Figure 2.3 Outline of the wheat growing areas in Australia and the northern grains region. Adapted from Montana Wheat & Barley Committee (2002)

The aim of the Northern Wheat Improvement Program in the year 2000 was to

develop superior high quality wheat cultivars. This aim is targeted by integrating three

separate breeding programs that employ different breeding strategies (Figure 2.4;

Cooper et al. 1999a). The main objective of the Germplasm Enhancement Program,

managed from the University of Queensland, is to provide a source of high yielding and

high quality wheat germplasm to the pedigree breeding programs run by the Leslie

Research Centre at Toowoomba and the Plant Breeding Institute of the University of

Sydney at Narrabri. The Germplasm Enhancement Program maintains a long-term

population improvement strategy using combinations of high yielding germplasm from

selected sources around the world with high quality Australian wheat cultivars.


[Figure 2.4: flow diagram. Box labels: Overseas Germplasm; Research Programs; Germplasm Enhancement Program, The University of Queensland; Parents (two pathways); LRC-QDPI Toowoomba; PBI-US Narrabri; Cultivar.]

Figure 2.4 Components and pathways of germplasm transfer for yield improvement in the Australian Northern Wheat Improvement Program: LRC-QDPI represents the Queensland Department of Primary Industries pedigree breeding programs located in Toowoomba at the Leslie Research Centre; PBI-US represents the University of Sydney pedigree breeding programs located in Narrabri; and the Germplasm Enhancement Program is conducted by the University of Queensland (Cooper et al. 1999a)

In the case of the Australian Northern Wheat Improvement Program, the Germplasm

Enhancement Program recurrent selection strategy was specifically designed to

exploit an elite source of high yielding wheat lines that were derived from the Veery

cross (Fox et al. 1996), which was developed by The International Center for Maize and

Wheat Improvement (CIMMYT). A number of the lines developed from this cross have

shown consistent high grain yield performance across diverse international multi-

environment trials conducted by CIMMYT (Cooper et al. 1993a, Cooper et al. 1993b)

and also in a range of high and low rainfall conditions in the Australian northern grains

region (Cooper et al. 1994a, 1994b, Cooper et al. 1995, Cooper et al. 1997). Two of the

Veery lines, Seri and Genaro, were identified for further use in the pedigree breeding

program at the Leslie Research Centre. Both Seri and Genaro contain the 1BL/1RS

translocation on chromosome 1B. This translocation has been associated with significant

quality deficiencies in Australian wheat cultivars (Dhaliwal et al. 1987, Barnes and

McKenzie 1993). In addition, associations between the presence of this translocation

and high yields have been reported in winter wheat (Carver and Rayburn 1994, Schlegel

and Meinel 1994) with mixed results observed for spring wheat (Villareal et al. 1994,

Singh et al. 1998). However, more recent evidence suggests that this association is not

causal and is variable in different backgrounds (Peake 2002). Following sufficient


intermating and recombination to reduce linkage disequilibrium, it is possible to select

to remove the 1RS component of the translocation and still achieve high grain yield

(Peake 2002). At the commencement of this thesis it was considered that conventional

pedigree breeding in the Leslie Research Centre breeding program using bi-parental

crosses with Veery lines Seri and Genaro as one parent and prime hard quality cultivars

as the other parent had not been successful in improving yield to the levels suggested by

the potential of the Veery lines (Cooper 1998). Subsequent investigations have

demonstrated significant and large sources of epistasis for grain yield within these

crosses (Peake 2002, Jensen 2004). Therefore, recurrent selection based on a targeted

base population combining the high yield lines Seri and Genaro with high quality

Australian lines was identified as a viable germplasm enhancement strategy (Fabrizius

et al. 1996). The outcome sought was a breeding strategy that enabled the combining of

high grain yield and high quality in the presence of epistasis and genotype-by-

environment interactions without the presence of the 1BL/1RS translocation. The lines

derived from the Germplasm Enhancement Program would in turn be used as enhanced

parental lines in the Leslie Research Centre and Plant Breeding Institute pedigree

breeding programs.

The current strategy used in the Germplasm Enhancement Program (Year 2000)

is a modified S1 recurrent selection strategy. The goal of recurrent selection is to

maintain the variability of a population for one or more quantitative traits, with minimal

reduction of genetic diversity in the long-term to allow for continued genetic gain

(Hallauer 1981, Strahwald and Geiger 1988, Carver and Bruns 1993, De Koyer et al.

1999). Recurrent selection maintains heterozygosity of loci and promotes crossing over

within gene blocks, which has the potential to release genetic variance and contribute

positively to maximising genetic gain in the long-term. Recurrent selection is most

commonly associated with breeding of allogamous (cross-pollinating) species (e.g.

maize, Hallauer and Miranda (1988)). A recent review of genetic gains (Carver and

Bruns 1993) for grain yield and quality for autogamous (self-pollinating) species

indicates that recurrent selection has been as effective as, if not more effective than, traditional

breeding methods, such as the pedigree strategy.


Currently the Germplasm Enhancement Program works on a four-year cycle

within a general recurrent selection framework (Figure 2.5: S1). Years one and two are

used for intermating, selection for the traits maturity and height, and seed multiplication

of the S1 families for yield testing. In addition, screening and selection based on the

presence or absence of the 1RS chromosome arm, derived from the 1BL/1RS translocation,

can also be implemented during these stages when necessary (Nadella et al. 2002).

Multi-environment trials of the S1 families are conducted in years three and four and

selection is based on grain yield and grain protein concentration data measured in the

multi-environment trials. This improvement strategy is expected to provide a gradual

increase of favourable allelic frequencies and thus increase the mean of the population

for the selected traits (Fabrizius et al. 1996).

Optimising the allocation of resources to activities within the Germplasm Enhancement

Program to achieve its role in the Northern Wheat Improvement Program is

a complex problem. By testing homozygous lines, e.g. developed as DH lines, rather

than heterogeneous, heterozygous families (S1), selection efficiency can be increased in

a recurrent selection breeding scheme (Griffing 1975, Baenziger et al. 1984). Simulation

experiments have been conducted for the Germplasm Enhancement Program to

determine whether a strategy using DH lines can contribute to an increase in the rate of

genetic improvement relative to that achieved by the current S1 strategy. An outline of

how DH lines would be implemented in the Germplasm Enhancement Program is given

in Figure 2.5: DH. Some advantages of using DH lines in the Germplasm Enhancement

Program are considered to be: (i) the plants are completely homozygous in one

generation; (ii) for DH lines, twice as much of the additive genetic variation is partitioned

among lines relative to S1 families (Wricke and Weber 1986); and (iii) selection

of superior genotypes should be easier and more efficient with fixed lines. Some

disadvantages of using DH lines in the Germplasm Enhancement Program are considered

to be: (i) their production is technically more difficult relative to S1 families;

(ii) their cost of production is high; and (iii) with the current DH technology based on the

wheat / maize crossing system (Jensen and Kammholz 1998) they would add an extra

year to the Germplasm Enhancement Program cycle. Preliminary results based on

simulation experiments indicated that for the additive genetic models considered, the


DH line strategy can achieve higher rates of response to selection than the S1 family

strategy (Kruger et al. 1999).
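
The statement that DH lines partition twice as much additive variance among lines as S1 families can be verified for the simplest possible case. The Python sketch below is an illustration under strong assumptions that are not taken from the Germplasm Enhancement Program itself: a single biallelic locus with purely additive gene action (genotypic values +a, 0 and -a), S0 plants in Hardy-Weinberg proportions, and both S1 families and DH lines derived directly from those S0 plants without selection. The function names are hypothetical.

```python
from fractions import Fraction


def variance(values_with_probs):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(prob * value for value, prob in values_with_probs)
    return sum(prob * (value - mean) ** 2 for value, prob in values_with_probs)


def among_unit_variances(p, a=1):
    """Return (variance among S1 family means, variance among DH lines)
    for one additive locus with favourable allele frequency p."""
    p, a = Fraction(p), Fraction(a)
    q = 1 - p
    s0 = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}  # S0 genotype frequencies
    # Selfing an Aa plant gives 1/4 AA, 1/2 Aa, 1/4 aa, so its S1 family mean is 0.
    s1_means = [(a, s0["AA"]), (Fraction(0), s0["Aa"]), (-a, s0["aa"])]
    # DH lines from an Aa plant are AA or aa with equal probability.
    dh_lines = [(a, s0["AA"] + s0["Aa"] / 2), (-a, s0["aa"] + s0["Aa"] / 2)]
    return variance(s1_means), variance(dh_lines)


if __name__ == "__main__":
    for p in (Fraction(1, 2), Fraction(3, 10)):
        s1_var, dh_var = among_unit_variances(p)
        print(f"p={p}: among S1 families={s1_var}, among DH lines={dh_var}")
```

Under these assumptions the variance among S1 family means equals the additive variance 2pqa², and the variance among DH lines is exactly twice that value at any allele frequency.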

[Figure 2.5: flow chart with columns for the S1 and DH strategies and rows for Years 1 to 5. Activity labels: Random intermating; 10,000 S0 plants, sample 2,000; 2,000 S1 families, sample 1,000; Generate DH plants; Produce DH lines, seed increase; MET (5 sites) S1 evaluation (two years); MET (5 sites) DH evaluation (two years).]

Figure 2.5 Outline of the activities involved in the S1 family and doubled haploid (DH) line breeding strategies over one cycle of the Germplasm Enhancement Program. The S1 activities are adapted from Fabrizius et al. (1996). MET = multi-environment trial

Hartog and Seri, two of the 10 parents used to establish the base populations

used for forward breeding in the Germplasm Enhancement Program, have been

screened for polymorphic markers and a preliminary amplified fragment length

polymorphism linkage map has been constructed (Nadella 1998). Quantitative trait loci

for four quantitative traits (plant height, days to flower, grain weight and grain yield)

have been located on this map. Eighteen QTL were detected, with two, four, eight, and

four QTL detected, respectively, for each trait. The four QTL for grain yield were also

associated with QTL for plant height and grain weight indicating the inheritance of

grain yield to be a complex multi-trait gene-to-phenotype model (Nadella 1998). An

extension of this work involved creating an integrated map by incorporating the

additional markers found using simple sequence repeats (Susanto 2004). There was


some agreement and validation between the QTL detected for the agronomic traits for

both of the studies (Susanto 2004). Susanto (2004) found three major QTL for yellow

spot (caused by Pyrenophora tritici-repentis) in addition to the detection of five extra

QTL for other traits to extend the work reported by Nadella (1998). These studies have

demonstrated the possibility of finding polymorphic markers and detecting QTL in the

base population of the Germplasm Enhancement Program. Further investigations have

been conducted on the influence of plant height on grain yield in the Hartog/Seri cross

(Peake 2002).

In the Germplasm Enhancement Program the number of parents that was used to

form the starting population was relatively small, with 10 initial parents (11IBWSN50,

Seri 82, Genaro 81, Batavia, Hartog, Janz, QT4646, Sun 276A, Sun 290B and Sunvale;

Fabrizius et al. 1996). These 10 lines were selected following extensive analysis of the

yield performance of a diverse set of lines from the international testing program of

CIMMYT and comparisons with cultivars developed in the Australian northern grains

region target population of environments. The coancestry of these 10 parents has been

studied by pedigree analysis (Fabrizius et al. 1996) and is currently being examined by

use of markers (Susanto et al. 2002, Susanto 2004). Pedigree data indicate an expected

degree of coancestry among the 10 parents, which is supported by the molecular marker

data. To initiate the Germplasm Enhancement Program the 10 parents were intercrossed

in a diallel design followed by one generation of random mating. The individual

progeny from the random mating underwent one generation of selfing to generate the

evaluation units used within the Germplasm Enhancement Program modified S1 family

strategy. This crossing strategy is expected to result in a relatively low frequency of

recombination events and a high level of linkage disequilibrium in the base population

of the Germplasm Enhancement Program. It is expected that the level of linkage

disequilibrium in the DH line strategy will be greater than that of the S1 families

(Powell et al. 1992), as the S1 families, unlike DH lines, have further opportunities to

recombine during selfing after the intermating of the selected lines. Even though it is

expected that a relatively high level of linkage disequilibrium is present in the Germplasm

Enhancement Program breeding population, it is expected that there are sufficient

opportunities for recombination to break up some of the parental linkage groups. It was


considered necessary to limit the extent of recombination initially in the Germplasm

Enhancement Program because only a low resolution molecular map was available for

any QTL detection analysis (Nadella 1998). Therefore, it was expected that marker-

QTL associations found in the QTL mapping study (Cooper et al. 1999a) are still likely

to be present in the Germplasm Enhancement Program forward breeding population so

that successful marker-assisted selection can take place. This aspect of transferring the

results of a QTL mapping study to the active breeding program populations to implement

marker-assisted selection will be examined in this thesis.
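
The expectation that few generations of intermating leave much of the parental linkage disequilibrium intact can be illustrated with the standard random-mating result that disequilibrium between two loci decays by a factor of (1 - r) per generation, where r is the recombination fraction. The Python sketch below is an idealised illustration only: it assumes a large random-mating population without selection or drift, which the Germplasm Enhancement Program does not strictly satisfy, and it uses the Haldane map function to convert map distance to r. The function names and example distances are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Haldane map function: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def ld_after_random_mating(d0, r, generations):
    """Expected disequilibrium after the given number of random-mating generations."""
    return d0 * (1.0 - r) ** generations


if __name__ == "__main__":
    d0 = 0.25  # hypothetical starting disequilibrium between marker and QTL
    for distance_cm in (5.0, 20.0, 50.0):
        r = haldane_recombination_fraction(distance_cm)
        remaining = [ld_after_random_mating(d0, r, t) / d0 for t in (1, 2, 5)]
        summary = ", ".join(f"{x:.2f}" for x in remaining)
        print(f"{distance_cm:>4.0f} cM (r={r:.3f}): fraction of LD left after 1, 2, 5 generations = {summary}")
```

Under these assumptions, tightly linked marker-QTL pairs retain most of their association after one or two generations of intermating, which is consistent with the expectation that associations found in the mapping study persist in the forward breeding population.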

Testing the feasibility of introducing marker-assisted selection into the Germplasm

Enhancement Program is considered an important step in an attempt to increase

genetic gains for this breeding program. Implementing and testing marker-assisted

selection in the Germplasm Enhancement Program as an empirical experiment would be

costly and time consuming. By examining the power of marker-assisted selection in the

Germplasm Enhancement Program through simulation it is feasible to conduct a

comparison of S1 family and DH line selection strategies to determine their ability to

contribute towards accelerated rates of response to selection. If marker-assisted

selection is shown to increase the response to selection of the Germplasm Enhancement

Program under a wide range of genetic models then the Germplasm Enhancement

Program has the potential to produce superior parents for the pedigree programs earlier

than expected. Therefore, this simulation investigation will provide useful information

in any decisions on whether to use marker-assisted selection in future cycles of the

Germplasm Enhancement Program.

2.4 Genotype-environment factors influencing response to selection

2.4.1 Introduction

Understanding the influence of the genetic architecture of a trait on response to

selection is one of the most important aspects of the application of quantitative genetics

to plant breeding. Without the complications introduced by interactions among genes

and between genes and the environment, a phenotype would more closely resemble its


genotype and improving breeding populations would be simplified by selecting the best

phenotypes based on performance in one environment. The aspects of the genetic

architecture of a trait considered of importance in this thesis, because of their effect on

gene-to-phenotype relationships, are epistasis (gene×gene interactions) and G×E

interactions. Mackay (2001, 2004) has given recent reviews of the expanding body of

experimental evidence indicating the importance of these factors in the genetic

architecture of quantitative traits, based predominantly on work in the model organism

Drosophila melanogaster.

From Equation (2.7), the impact of epistasis and G×E interactions on response to

selection, when selection is based on phenotypes, can be evaluated. In classical

quantitative genetic theory phenotypic variance ($\sigma^2_P$) is the sum of the genotypic

variance ($\sigma^2_G$), the environmental variance ($\sigma^2_E$), the variance due to the interaction of

genotypes and the environment ($\sigma^2_{G \times E}$) and the variance due to experimental error ($\sigma^2_{\varepsilon}$),

$$\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{G \times E} + \sigma^2_{\varepsilon} . \tag{2.7}$$

In quantitative genetics and statistical theory, epistasis (intergenic interaction), along

with dominance (intragenic interaction), are non-additive forms of the genotypic

variance for a trait, $\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$, where $\sigma^2_A$ is the additive variance,

$\sigma^2_D$ is the dominance variance, and $\sigma^2_{AA}$ is the additive×additive, $\sigma^2_{AD}$ the additive×dominance,

and $\sigma^2_{DD}$ the dominance×dominance components of digenic epistatic

variance. Therefore, both epistasis and G×E interactions have important roles in

determining the response to selection of a breeding population (Lynch and Walsh 1998).
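
The bookkeeping in Equation (2.7) and the partition of the genotypic variance can be made concrete with a few lines of Python. This is a sketch with hypothetical function names and arbitrary variance components, intended only to show how larger G×E or epistatic components reduce the share of phenotypic variance that selection can act on directly.

```python
def genotypic_variance(var_a, var_d=0.0, var_aa=0.0, var_ad=0.0, var_dd=0.0):
    """Genotypic variance as the sum of additive, dominance and digenic epistatic parts."""
    return var_a + var_d + var_aa + var_ad + var_dd


def phenotypic_variance(var_g, var_e, var_ge, var_error):
    """Equation (2.7): sum of genotypic, environmental, GxE and error variances."""
    return var_g + var_e + var_ge + var_error


def genotypic_share(var_g, var_e, var_ge, var_error):
    """Proportion of the phenotypic variance in Equation (2.7) that is genotypic."""
    return var_g / phenotypic_variance(var_g, var_e, var_ge, var_error)


if __name__ == "__main__":
    # Arbitrary illustrative components, not estimates from any experiment.
    var_g = genotypic_variance(var_a=2.0, var_d=1.0, var_aa=1.0)
    for var_ge in (0.0, 3.0, 6.0):
        share = genotypic_share(var_g, var_e=2.0, var_ge=var_ge, var_error=1.0)
        print(f"G x E variance={var_ge:.1f}  genotypic share of phenotypic variance={share:.2f}")
```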

Selecting for combinations of genes is further complicated when linked genes

recombine, and coupling and repulsion combinations of alleles change over generations.

When recombination occurs, the original intergenic allele combinations are broken up

and new combinations are created in populations where substantial linkage disequilibrium exists. Recombination can cause problems when unfavourable QTL alleles end up


being linked with what are designated as the favourable marker alleles within a mapping

study. Incorrect determinations of marker-QTL allele associations will affect the

response of marker-assisted selection in a breeding program (See Sections 2.2.2.1 and

2.2.2.5).

When epistasis and G×E interactions are significant components of the genetic

architecture of a trait, they both have influential roles in the detection of QTL (Mauricio

2001, Dekkers and Hospital 2002, Doerge 2002) and ultimately the efficiency of

marker-assisted selection in a breeding program (Cooper and Podlich 2002, Podlich et

al. 2004). Each factor has the ability to impact and complicate selection and cause

realised response to selection from a marker-assisted selection strategy to be less than

expected, unless their influences are managed appropriately. Potential influences of

these factors on response for marker-assisted selection in the Germplasm Enhancement

Program will be considered in a series of simulation experiments in this thesis.

2.4.2 Epistasis

Epistasis is the interaction between alleles at different loci, and is a form of non-additive gene action. Epistasis was first defined by Bateson (1909) to describe the

interaction between genes where the action of one gene blocked or masked the action of

another gene. Fisher (1918) expanded this concept to include quantitative differences

between genotypes, concluding that “epistacy” is the remaining genetic variance not

attributable to additive and dominance effects. Wright (1932) took a more biological

approach and viewed epistasis as the functional interaction between genes. The debate

between the Fisher and Wright theories remains today, as there is still no powerful

statistical means to detect epistatic effects (Wu 2000). However, within the field of

quantitative genetics the Fisherian model is the more widely accepted and used

definition.

In principle, genetic experiments are able to detect epistasis as a genetic component of quantitative trait variation (Cheverud and Routman 1993, Whitlock et al. 1995).

Population designs have been specifically created to help study epistatic components. A

generation means analysis, based on six generations, was proposed by Hayman (1958)


as a method to estimate genetic effects attributable to the cross mean, additivity,

dominance and epistasis (additive×additive, additive×dominance and dominance×dominance epistasis) while Kearsey and Jinks (1968) suggested the use of a

triple testcross. Crow and Kimura (1979) proposed that epistasis can be detected by

comparing covariances among relatives and comparing means of different types of

hybrids. An analysis of variance (Fisher 1918) is the most popular method for detecting

the presence of epistatic effects as a specific epistatic component of variance defined in

relation to a linear statistical genetic model. However, Whitlock et al. (1995) listed a

number of problems with epistatic genetic variance as a measure of epistasis, which

include: (i) analysis of variance techniques are biased in the detection of epistasis;

(ii) confidence limits on genetic variance components, and particularly epistatic

variance, are generally large; (iii) in artificial environments, G×E interaction can

obscure the true nature of genetic variation in natural environments; and (iv) epistasis

can be concealed by linkage disequilibrium. These issues still exist with the analysis

methods used today, and epistasis remains a difficult component of the genetic

architecture of a trait to measure.

The complexity associated with selecting superior genotypes in a breeding program when epistasis is an important component of the genetic architecture of a trait can

be illustrated through example. Achieving a high response to selection in a breeding

population requires favourable allele combinations in the population of genotypes to be

fixed. When two genes interact, the contribution of an allele at one locus is

dependent on the genotype at the other locus (Kauffman 1993, Wade 2001, Cooper and

Podlich 2002). Therefore, in a population situation it can be difficult to determine the

favourable allele as there are many possible genetic contexts. Using an example of

additive×additive digenic epistasis (Figure 2.6), which would be the favourable allele at

locus A? From Figure 2.6 the answer to this question clearly depends on which allele is

present at locus B. A context-free favourable allele at each locus does not exist; the

favourable allele combinations across all the loci interacting in the epistatic network

need to be found. This example has only two interacting genes and the nature of the

interaction is simple in relation to the many alternative forms the interaction can take.


The relationship between gene action effects and statistical genetic effects for a

character becomes less direct as the number of interacting loci increases (Holland 2001).

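[Figure 2.6: plot of genotypic value (y-axis, 0 to 10) against the genotype at locus B (x-axis: bb, Bb, BB), with one line for each genotype at locus A (aa, Aa, AA).]

Figure 2.6 Example of additive×additive epistatic interaction. The favourable allelic combinations aabb and AABB give the highest genotypic value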
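The context dependence described for Figure 2.6 can be sketched numerically. The following is a minimal illustration only: the genotype coding (-1, 0, 1), the mean of 5 and the interaction effect of 5 are hypothetical values chosen so that aabb and AABB score highest, and they are not read from the figure.

```python
# Hypothetical additive×additive model: the numbers are illustrative only
# and are not read from Figure 2.6.
LOCUS_A = {"aa": -1, "Aa": 0, "AA": 1}   # coded genotypes at locus A
LOCUS_B = {"bb": -1, "Bb": 0, "BB": 1}   # coded genotypes at locus B


def genotypic_value(geno_a, geno_b, mean=5.0, aa_effect=5.0):
    """Genotypic value with an additive×additive interaction and no main effects."""
    return mean + aa_effect * LOCUS_A[geno_a] * LOCUS_B[geno_b]


def best_at_locus_a(geno_b):
    """Locus-A genotypes with the highest value on a given locus-B background."""
    values = {g: genotypic_value(g, geno_b) for g in LOCUS_A}
    top = max(values.values())
    return [g for g, v in values.items() if v == top]


if __name__ == "__main__":
    # The best genotype at locus A changes with the locus-B background.
    for geno_b in LOCUS_B:
        print(geno_b, best_at_locus_a(geno_b))
```

Under these assumed values the best genotype at locus A is aa on a bb background and AA on a BB background, while on a Bb background all three are equivalent, so no single allele at locus A can be labelled favourable without reference to locus B.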

Epistasis has been argued to be of little importance in response to selection because of its apparent small effect when it has been experimentally investigated (Crow

and Kimura 1979). This argument has been widely accepted without a comprehensive

understanding of the power of the statistical methods used to determine the important

features of epistasis in quantitative traits. However, it is recognised that accurate and

precise experimental estimation of the epistatic effects of genes is extremely difficult.

Baker (1984) and Wricke and Weber (1986) suggested that epistasis seems to have little

impact on selection strategies and on optimum allocation of breeding resources, yet

Baker did note that epistasis may have a much greater impact on inbred crop species

than cross fertilised species. In wheat, epistasis may occur for quantitative traits

between both homologous and non-homologous chromosomes (Snape et al. 1975).

Rahman et al. (2003) showed epistasis to be an influential component of both quality and yield characters in wheat, affecting plant height, spikes per plant, spike length, grains per spike, 1000-grain weight, grain yield per plant and protein content. A

significant epistatic effect in wheat has also been reported by Goldringer et al. (1997),

who found epistatic variance to be almost twice as large as the additive variance for

grain yield.


Although there is an expectation that epistasis can be an important factor in the

genetic variation of quantitative traits (Carlborg and Haley 2004), QTL mapping studies

rarely explicitly deal with its effects (Ohno et al. 2000). Reviews on epistasis and QTL

studies show that few epistatic interactions are important for determining the phenotypes of interest (Cheverud and Routman 1993, Tanksley 1993). Significant interactions

between QTL are also generally difficult to identify (Tanksley 1993). Lukens and

Doebley (1999) gave three reasons why QTL mapping may underestimate the number

of non-additive interactions including: (i) the presence of two-locus double homozygous

classes at low frequency, even in large mapping populations, decreases statistical

power; (ii) mapping populations can be segregating for many QTL that may interfere

with detection of an interaction between loci under consideration; and (iii) the need to

impose high significance thresholds as detecting epistatic interactions requires many

statistical tests. The interpretation of the importance of epistasis continues as the results

from QTL studies accumulate. Holland (2001) reported that many QTL studies have

shown epistasis to be an important component of genetic variance on plant yield and

fitness. Tanksley (1993) considered that QTL studies suggest that strong epistatic

interactions are the exception rather than the rule, and that epistatic effects are more

likely to be detected between QTL in near isogenic lines that can be replicated to allow

a more precise measurement of epistasis. Carlborg and Haley (2004) argue that epistasis

should become a factor that is routinely accounted for as it has generally been overlooked. In any case, the importance of epistasis remains contested, and when designing a breeding program a breeder must consider the potential impacts of epistasis on response to selection in their own context. In

this thesis epistasis will be considered as an important factor in the evaluation of

selection strategies within the context of the Germplasm Enhancement Program. This

focus is supported by growing evidence from a series of experiments conducted with the

wheat germplasm targeted for forward breeding by the Germplasm Enhancement

Program (Peake 2002, Jensen 2004).

The effect of epistasis for the target traits and germplasm of the Germplasm Enhancement Program is still under investigation. The evidence accumulated to date

indicates that epistasis and epistatic effects that are conditional on environmental


conditions (referred to here as epistasis×environment interactions, see also Mackay

(2004) for further discussion of this topic) could be of importance in the northern wheat

region. Fabrizius et al. (1997) used variance components analysis to test for epistatic

effects in crosses derived from parents of the Germplasm Enhancement Program and

reported that there was little evidence to suggest significant epistatic effects on grain

yield (Table 2.1). A striking feature of this investigation was the presence of a negative

additive×additive epistasis×environment interaction variance component that was more

than twice the magnitude of its standard error for both crosses. This is indicative of

problems with the statistical genetic models applied to these experiments. The presence

of strong G×E interactions in combination with a relatively small sample of diverse

environments may be a dominant feature of this experiment. A conclusive statement

cannot be based on only two crosses. Peake (2002) studied three other Germplasm Enhancement Program parent crosses (Hartog/Seri, Hartog/Genaro and Hartog/11IBSWN50) in eight environments and found evidence of significant additive×additive epistatic variance for grain yield in the Hartog/Seri cross. Additive×additive epistasis×environment interaction variance was also detected for yield,

adding to the complexities involving epistasis. Further investigations by Peake (2002)

based on comparisons of recombinant inbred line population means with mid-parent

means provided stronger evidence for the important effects of additive×additive

epistasis for grain yield in crosses between lines that were important founding crosses

for the Germplasm Enhancement Program. Subsequent work by Jensen (2004) has

confirmed these findings by Peake (2002). Therefore, it is highly likely that epistasis

plays a significant role in the genetic variation for the target traits of interest to the

Germplasm Enhancement Program and could influence the success and outcomes of the

Germplasm Enhancement Program. The experimental investigations

conducted by Peake (2002) and Jensen (2004) provide further motivation for the

theoretical consideration of epistasis within this thesis.

The selection strategies applied in the context of the Germplasm Enhancement

Program are examined in this thesis to determine whether they are able to deal with

some of the potential effects of epistasis. The thesis will also determine how robust the

strategies are in developing superior genotypes from individuals with complex genetic


architectures, e.g. including the presence of epistasis. A simulation modelling approach

that is an extension of the approach used by Jensen (2004), and based on the framework

discussed by Cooper and Podlich (2002), is used to model epistasis in this thesis.

Table 2.1 Estimated variance components (±s.e.) relative to F2 for grain yield (t ha-1) of recombinant inbred lines derived from 11IBSWN50/Vasco and Hartog/Vasco crosses tested in Queensland in 1989. Extract of Table 3 (Fabrizius et al. 1997)

Variance component                          11IBSWN50/Vasco   Hartog/Vasco
additive                                    -0.005±0.028      -0.030±0.032
additive×environment                         0.146±0.042       0.192±0.052
additive×additive epistasis                  0.006±0.012       0.018±0.013
additive×additive epistasis×environment     -0.068±0.017      -0.060±0.019
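The statement that the negative additive×additive epistasis×environment component exceeds twice its standard error in both crosses can be checked directly from Table 2.1. The sketch below simply divides each estimate by its standard error; the function names and the threshold argument are illustrative choices, not part of the analysis of Fabrizius et al. (1997).

```python
# Estimates and standard errors transcribed from Table 2.1
# (Fabrizius et al. 1997), stored as (estimate, standard error).
TABLE_2_1 = {
    "11IBSWN50/Vasco": {
        "additive": (-0.005, 0.028),
        "additive x environment": (0.146, 0.042),
        "additive x additive epistasis": (0.006, 0.012),
        "additive x additive epistasis x environment": (-0.068, 0.017),
    },
    "Hartog/Vasco": {
        "additive": (-0.030, 0.032),
        "additive x environment": (0.192, 0.052),
        "additive x additive epistasis": (0.018, 0.013),
        "additive x additive epistasis x environment": (-0.060, 0.019),
    },
}


def ratio_to_se(estimate, se):
    """Estimate expressed in units of its standard error."""
    return estimate / se


def large_negative_components(table, threshold=2.0):
    """Negative components whose magnitude exceeds `threshold` standard errors."""
    flagged = []
    for cross, components in table.items():
        for name, (estimate, se) in components.items():
            if estimate < 0 and abs(ratio_to_se(estimate, se)) > threshold:
                flagged.append((cross, name))
    return flagged


if __name__ == "__main__":
    for cross, name in large_negative_components(TABLE_2_1):
        print(cross, name)
```

Only the additive×additive epistasis×environment component is flagged, in both crosses; the negative additive estimates are each smaller than one standard error.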

Epistasis can influence the outcomes of marker-assisted selection in a breeding

program in a number of ways. Epistasis introduces complexity in the process of

determining marker-trait associations. In the presence of epistasis the identified marker-

trait associations will be context dependent (e.g. Figure 2.6). This can result in favourable marker alleles being associated with unfavourable QTL alleles as the genetic

background changes. Selection on markers for QTL alleles that are favourable in the mapping study context but unfavourable in the breeding program context will reduce the response to selection of the breeding program, because selection on the marker-QTL allele association increases the frequency of the unfavourable alleles in the population. QTL detection analysis programs are capable of detecting epistatic

QTL, and if epistasis is found to be important, then genomic tools can be used to

identify the nature and components of interacting genic systems and marker-assisted

selection schemes can be designed to exploit epistasis (Holland 2001). The potential for

selection to exploit epistasis will be considered in a context that is relevant for the

Germplasm Enhancement Program in this thesis.

2.4.3 G×E interactions

When genotypes are compared in different environments, their performance

relative to each other may change giving rise to G×E interactions. Genotype-by-

environment interactions have a large impact on response to selection for grain yield of

wheat in the Australian target production environments, particularly the genotype-by-

site-by-year (G×S×Y) component of the G×E interactions (Brennan et al. 1981, Basford


and Cooper 1998). Genotype-by-environment interactions can result in changes in the

rank of genotypes in different environmental conditions (Haldane 1947, Comstock and

Moll 1963). In the presence of G×E interactions that change the ranks of the genotypes,

one genotype may have the highest yield in some environments and a second genotype

may excel in others. Therefore, G×E interactions can be a major complication in the

study of quantitative traits as they: (i) make the interpretation of genetic experiments

dependent on the environmental context; and (ii) can reduce the repeatability of

experimental results when validation is conducted and realised genetic gain is evaluated

in different environmental contexts. In turn these effects of G×E interactions make

predictions difficult and reduce the efficiency of selection (Kearsey and Pooni 1996).

To emphasise the different influences of G×E interactions on the efficiency of selection, three cases are sometimes distinguished:

(i) no G×E interaction (Figure 2.7: Type 1);

(ii) heterogeneity of genetic variance among environments (Robertson 1959), i.e.

the ranking of the genotypes does not differ between environments, only the

magnitude of the differences between the genotypes in each environment

changes, therefore the same genotypes are selected regardless of environment

and prediction of response to selection is not complicated by changes in rank of

genotypes (Figure 2.7: Type 2); or

(iii) lack of genetic correlation among environments (Robertson 1959), i.e. this

source of interaction can result in cross-over interactions, where reranking of

the genotypes occurs and a genotype that performs well in one environment,

does not perform well relative to the other genotypes in other environments;

this form of G×E interaction greatly complicates the selection decisions in a

breeding program (Figure 2.7: Type 3), particularly if there is no knowledge of

the environmental contexts that give rise to the G×E interactions.


[Figure 2.7: three panels (Type 1, Type 2, Type 3), each plotting genotypic value (y-axis, 0 to 5) against environment (x-axis: E1, E2) for genotypes A and B.]

Figure 2.7 Classification of genotype-by-environment (G×E) interactions. A and B are two genotypes and lines represent the responses of the genotypes in two environments; type 1 parallel response (no G×E interaction), type 2 non-crossover response, type 3 crossover response
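The three-way classification in Figure 2.7 can be expressed as a simple rule for two genotypes in two environments. The following is a minimal sketch under that two-by-two assumption; the function name, the tolerance and the example values are hypothetical and are not taken from the figure.

```python
def classify_gxe(genotype_a, genotype_b, tol=1e-9):
    """Classify the G×E pattern for two genotypes in two environments.

    Each argument is a pair (value in E1, value in E2).
    Returns 1 (parallel, no G×E), 2 (non-crossover) or 3 (crossover).
    """
    diff_e1 = genotype_a[0] - genotype_b[0]
    diff_e2 = genotype_a[1] - genotype_b[1]
    if abs(diff_e1 - diff_e2) <= tol:
        return 1  # same difference in both environments: parallel response
    if diff_e1 * diff_e2 < 0:
        return 3  # the difference changes sign: genotypes change rank
    return 2      # the difference changes size only: rank is unchanged


if __name__ == "__main__":
    print(classify_gxe((2.0, 4.0), (1.0, 3.0)))  # parallel response
    print(classify_gxe((2.0, 5.0), (1.0, 2.0)))  # change in magnitude only
    print(classify_gxe((3.0, 1.0), (1.0, 3.0)))  # change in rank
```

A tie in one environment is treated here as Type 2, which is a design choice of this sketch rather than a convention stated in the text.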

The analysis of variance (Fisher 1926) has been used to partition total phenotypic variation into components due to genotype, environment, G×E interaction and

experimental error (Brennan and Byth 1979, DeLacy et al. 1990). The relative sizes of

the variance components are frequently used to quantify the magnitude and importance

of G×E interactions. The influence of G×E interactions in a breeding program is a

problem when the ratio of the G×E interaction to genotypic variance ($\sigma^2_{G \times E} : \sigma^2_G$) is high

(Cooper and DeLacy 1994). Studies in the northern grains region have outlined the

importance of accounting for G×E interaction (Brennan and Byth 1979, Brennan et al.

1981, Cooper et al. 1994a, 1994b, Cooper et al. 1995, Watson et al. 1995, Cooper et al.

1996b, Fabrizius et al. 1997, Basford and Cooper 1998). A study of 49 wheat lines in

six environments in Queensland showed grain yield to have a high G×E interaction

component, with 86% of the G×E interaction component being attributed to the lack of

correlation of genotypes among the six environments (Table 2.2). An experiment

involving progeny from crosses based on a subset of the lines used as the parents in the

Germplasm Enhancement Program (Fabrizius et al. 1997) also found a significant

amount of the phenotypic variance among the lines for yield was attributed to G×E

interactions (Table 2.3). These findings reinforce the importance of G×E interactions for

grain yield of wheat in experimental studies and breeding program trials conducted in

the northern grains region of Australia.


Table 2.2 Estimates of genetic parameters for grain yield (t ha-1) of 49 wheat lines tested in six environments in Queensland. Extract of Table 10.1 (Cooper et al. 1996b)

Statistic                                                      Estimate   % of $\sigma^2_{G \times E}$
Genetic variance component ($\sigma^2_G$)                      0.029      27
G×E interaction variance component ($\sigma^2_{G \times E}$)   0.108      -
  Heterogeneity of genotypic variance                          0.015      14
  Lack of genetic correlation                                  0.094      86
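The percentages in Table 2.2 can be recovered from the tabulated estimates, which also gives the ratio of G×E interaction to genotypic variance discussed in the text. This is a rough check only: because the published estimates are rounded, the two parts of the interaction sum to 0.109 rather than 0.108, and the sketch assumes the shares of the two parts are taken against their own sum.

```python
# Estimates transcribed from Table 2.2 (Cooper et al. 1996b).
var_g = 0.029                 # genetic variance component
var_ge = 0.108                # G×E interaction variance component
heterogeneity = 0.015         # part due to heterogeneity of genotypic variance
lack_of_correlation = 0.094   # part due to lack of genetic correlation

# Ratio of G×E interaction variance to genotypic variance.
ratio_ge_to_g = var_ge / var_g

# Genetic variance expressed as a percentage of the G×E component.
g_as_pct_of_ge = 100 * var_g / var_ge

# Assumption: shares of the two parts are computed against their own sum,
# since the rounded parts do not add exactly to the rounded total.
parts_total = heterogeneity + lack_of_correlation
heterogeneity_pct = 100 * heterogeneity / parts_total
lack_of_correlation_pct = 100 * lack_of_correlation / parts_total

if __name__ == "__main__":
    print(f"G×E : G ratio = {ratio_ge_to_g:.2f}")
    print(f"G as % of G×E = {g_as_pct_of_ge:.0f}")
    print(f"heterogeneity share = {heterogeneity_pct:.0f}%")
    print(f"lack of correlation share = {lack_of_correlation_pct:.0f}%")
```

On these figures the G×E interaction component is between three and four times the genotypic component, which is the situation described in the text as a problem for a breeding program.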

Table 2.3 Estimated variance components (±s.e.) for grain yield (t ha-1) of recombinant inbred lines derived from two crosses, 11IBSWN50/Vasco and Hartog/Vasco, tested at three sites in Queensland in 1989. Extract of Table 2 (Fabrizius et al. 1997)

Variance component            11IBSWN50/Vasco   Hartog/Vasco
Genotypic        F2            0.001±0.018      -0.011±0.021
G×E interaction  F2×Site       0.077±0.027       0.132±0.034

The focus on G×E interactions for yield of wheat in the northern grains region

arises because the interactions are of sufficient magnitude to introduce uncertainty into

the process of selection among genotypes, especially when selection is based on their

phenotypic performance in a relatively small sample of environments taken from the

target population of environments (Cooper and DeLacy 1994), as occurs in the case of

the Germplasm Enhancement Program. The Germplasm Enhancement Program utilises

two years of multi-environment trials to accommodate the effects of the G×S×Y

interactions that are encountered, as a strategy to improve S1 family mean heritability,

and thus improve the expected response to selection. The traditional S1 recurrent

selection strategy, as described for maize (Hallauer and Miranda 1988), works on a

three year cycle, using only one year of multi-environment trials. The first two years

involve similar steps to those conducted for the Germplasm Enhancement Program,

intermating and production of S1 families. The traditional S1 selection strategy has been

applied in maize breeding for target environment populations where G×S×Y interactions are not considered to be sufficiently large as to warrant two years of multi-environment trials (Hallauer and Miranda 1988). For wheat yields in the northern grains

region, the incidence of large G×S×Y interactions requires at least two years of multi-

environment trials (Brennan et al. 1981, Cooper et al. 1996a). Hence the modification

of the S1 recurrent selection strategy for the Germplasm Enhancement Program involves

an additional year of multi-environment testing of the S1 families. The use of a DH line


strategy in place of S1 families has previously been investigated for a range of genetic

models with different levels of G×E interaction (Kruger 1999). This preliminary work

indicated that the use of a DH line strategy for yield testing can outperform S1 families

at both high and low heritabilities in the presence of G×E interactions (Kruger 1999)

and will be assessed in more detail in this thesis.

Breeding for any trait, whether it utilises information from QTL mapping studies

or not, needs to optimise selection for a population of target environments (Comstock

and Moll 1963, Gardner 1963). In general, crossover interactions can have a strong

negative effect on the outcome of marker-assisted selection if the incorrect allele for the

target population of environments is detected as favourable based on the environment-

types sampled in the QTL mapping study. With non-crossover G×E interactions,

unfavourable marker-QTL allele associations are less likely to be identified in mapping

studies as re-ranking of genotypes across environment-types does not occur as in the

cases where crossover interactions occur. Where there are no crossover interactions, the

case may be that the QTL selected do not affect the trait across important target

environments or QTL that do affect the trait across target environments may be missed

(Knapp 1994). Where there is heterogeneity of genetic variance and no crossover

interaction the differences between QTL genotype means within some environments are

greater than across environments. Conducting multi-environment QTL mapping

experiments will help determine whether a quantitative trait loci-by-environment

(QTL×E) interaction is present in a study (Knapp 1994).

Problems with poor representation of the target environments in multi-

environment QTL studies may result in the fixing of alleles at QTL that have no mean

effect across a target population of environments. When conducting marker-assisted

selection for broad adaptation response in a target population of environments, it has

been argued that only the QTL genotype means across environments need to be estimated. However, this method will fail if every QTL manifests a crossover and the test environments do not uncover these interactions. These are unlikely events if both the test and target environments are carefully selected (Knapp 1994). However,

studies have suggested that a large proportion of QTL (especially major QTL) affecting


a quantitative trait in one environment will be active in other environments, which is a

positive result when the objective is to develop lines for a range of environments using

markers (Tanksley 1993). As G×E interactions are important for grain yield of wheat in

the northern grains region of Australia, the potential effects of QTL×E interactions on

selection response from the Germplasm Enhancement Program will be examined in the

simulation studies conducted in this thesis. Developing an ability to understand and

characterise the extent and specific types of QTL×E interactions will be useful in

optimising both marker-assisted selection and conventional breeding.

2.5 A role for computer simulation in the analysis of genetic systems

2.5.1 Background

Reviews by Scheinberg (1968), Fraser (1970) and Kempthorne (1988) have indicated the importance of computer simulation in the field of genetics. Computer

applications in selection theory were first investigated by Fraser (1957a). Following

this, a number of pioneering papers on simulating simple genetic models emerged

(Fraser 1957b, Martin Jr and Cockerham 1960, Young 1966, Cress 1967, Young 1967,

Baker 1968, Bliss and Gates 1968, Qureshi 1968, Qureshi and Kempthorne 1968,

Qureshi et al. 1968, Casali and Tigchelaar 1975, Snape and Riggs 1975). The use of computer simulation in genetics has increased at a fairly constant rate over the past 34 years, with around 3000 papers published in the last five years (Figure

2.8).

With the introduction of high speed, user-friendly personal computers in the last

10-20 years, an extensive use of computer simulation in genetics and plant breeding has

occurred (Figure 2.8). Weir and Cockerham (1977) and Kempthorne (1988) have both

outlined the restrictions placed on quantitative genetics theory as it attempts to model

realistic situations involving multiple loci, multiple alleles, inbreeding, linkage and

selection. Simulation can be a powerful tool for assessing the ability of breeding

programs to deal with these factors. Recently, applications of computer simulation in

plant breeding have been an area of focus at the University of Queensland, including


work by Podlich and Cooper (1998), Podlich et al. (1999), Cooper et al. (1999d),

Cooper et al. (1999c), Kruger (1999), Kruger et al. (1999, 2001, 2002), Wang et al.

(2001), Wang et al. (2003) and Ye et al. (2004).

[Figure 2.8: plot of number of articles (y-axis, 0 to 3000) by publication period (x-axis, Year: 1970-1974, 1975-1979, 1980-1984, 1985-1989, 1990-1994, 1995-1999, 2000-2003), with two series: "Simulation & genetic*" and "Simulation & plant breeding".]

Figure 2.8 Number of articles published in the last 34 years with “simulation” and either “genetic*” or “plant breeding” as words anywhere in the AGRICOLA (1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003) databases. Note: some article duplication may have occurred. * represents all extensions of genetic. Each category contains five years, except the last which contains four years

As methods for modelling plant-breeding programs have advanced, so has the

use of simulation in the more specialised areas of plant breeding, like marker-assisted

selection. Marker-assisted selection is a relatively new area of genetics and with nearly

5% of all papers on marker-assisted selection containing the term simulation in the last

four years, its utilisation in these more complex breeding systems has been increasing

(Figure 2.9). The ability to model marker-assisted selection more accurately should

improve as the understanding of the influential factors, including recombination,

epistasis and G×E interactions, improves. Consequently, it is expected that the number

of experiments dealing with simulation and marker-assisted selection will increase with

time. To date most QTL studies have focussed on theoretical issues or applications to

experimental situations for traits, where the objective is to study the genetic architecture

of the trait. Broadening the range of field and simulation experiments investigating

marker-assisted selection will help create a better understanding of the situations in

which QTL detection and marker-assisted selection can operate successfully.


The use of computers to model breeding strategies from the available information, and so help develop new cultivars, is an important tool available to plant breeders

(Rafalski and Tingey 1993). As there is limited time to conduct many experimental

cycles of marker-assisted selection in a breeding program, a substantial component of

marker-assisted selection research has been based on computer simulation (e.g. Zhang

and Smith 1992, 1993, Edwards and Page 1994, Gimelfarb and Lande 1994a, 1994b,

1995, Whittaker et al. 1995, Hospital and Charcosset 1997, Hospital et al. 1997, Frisch

and Melchinger 2001a, 2001b). Simulation studies of marker-assisted selection based on

a range of genetic models allow the plant breeder to observe important trends that

occur in the breeding population that may also be expected to occur under a range of

conditions. Applied to marker-assisted selection, this approach can be used by the

breeder to judge how marker-assisted selection can be incorporated efficiently into their

breeding program.

While simulation may prove a more powerful technique than quantitative genetic theory for modelling the processes of a genetic system, because it relaxes some of the theoretical assumptions, it is important to remember that simulation also brings about a

new set of assumptions and limitations (Podlich and Cooper 1998). Simulation provides

a different platform to theoretical equations to test theories, allowing different properties

and processes of a genetic system to be modelled. Like quantitative genetic theory,

simulating a genetic system is not attempting to model the intricacies of a “real life”

model, but is attempting to capture the key properties and processes specific to the

system that is being modelled. For this thesis the goal of using simulation was to model

complex quantitative trait genetic models and the design of a breeding program, using

key properties and processes that focussed on achieving these two goals. By knowing

the key properties and processes that needed to be modelled, these complex processes

were able to be simplified. Improving the results obtained through simulation, by

relaxing some of the simplifying assumptions applied in the simulation experiments,

can be achieved by gaining a greater understanding of the underlying biological

processes modelled and obtaining more reliable data through experimental work. For

this thesis, previous experimental studies have aided in improving the modelling process

and ultimately the results of the simulations (Cooper et al. 1999a, Cooper and Podlich


1999, Podlich 1999, Podlich and Cooper 1999, Cooper et al. 2002a, Cooper and Podlich

2002, Peake 2002, Chapman et al. 2003, Wang et al. 2003, Hammer et al. 2004, Jensen

2004, Peccoud et al. 2004, Podlich et al. 2004) and will contribute to the improvement

of future modelling work.

[Figure 2.9: plot of number of articles (y-axis, 0 to 2500) by publication period (x-axis, Year: 1970-1974, 1975-1979, 1980-1984, 1985-1989, 1990-1994, 1995-1999, 2000-2003), with two series: "Marker assisted" and "Simulation & marker assisted".]

Figure 2.9 Number of articles published in the last 34 years with “marker assisted” or “marker assisted and simulation” as words anywhere in the AGRICOLA (1970-12/2003), CAB (1984-1/2004), and Biological Abstracts (1984-12/2003) databases. Note: some article duplication may have occurred. Each category contains five years, except the last which contains four years

The focus of this thesis will be on the use of computer simulation as a tool to assist

the plant breeders responsible for the Germplasm Enhancement Program to evaluate the

current phenotypic selection breeding strategy against the marker selection and marker-

assisted selection breeding strategies considered. A wide range of genetic scenarios will

be considered and the influences of a number of variables, including population size,

heritability, number of QTL and number of markers, will be examined. It is argued that

by enabling a greater understanding of the average expectations, the variability of the

outcomes and the distribution of these outcomes, computer simulation will assist in the

effective implementation of complex breeding strategies like marker-assisted selection

into the Germplasm Enhancement Program wheat breeding program.


2.5.2 The QU-GENE simulation platform

The QU-GENE software is a computer simulation platform developed at the

University of Queensland for the quantitative analysis of genetic models (Podlich and

Cooper 1998). The QU-GENE software was developed with a modular structure (Figure

2.10) and consists of two major component levels:

(i) the genotype-environment system engine (QUGENE), which is used to define

the genetic models to be examined; and

(ii) the application modules that examine properties of the genotype-environment

system by investigating, analysing or manipulating a model of a population of

genotypes for a target population of environments that was created within the

QUGENE engine.

[Figure 2.10: diagram. Central ellipse: QUGENE (Genotype-Environment System). Surrounding application modules: HMSSLT (Half Mass Selection), DHAP (Doubled Haploids), MSSLT (Mass Selection), HSRRS (Half-Sib Reciprocal Recurrent Selection), HGEPRSS (Half Germplasm Enhancement), GEPRSS (Germplasm Enhancement), PEDIGREE (Pedigree), GEXP (Genetic Experiments).]

Figure 2.10 Schematic outline of the QU-GENE simulation software. The central ellipse shows the engine and the surrounding boxes show the application modules (Podlich and Cooper 1997, 1998)

An important feature of QU-GENE is that it allows the relaxing of some of the

common assumptions and simplifications that are used within the algebraic theoretical

equations when predicting population values (Podlich and Cooper 1998). As the number

of parameters in a mathematical equation increases, more assumptions may be required

to make the solutions to the equation mathematically tractable. Often the implications of

the invalidation of these assumptions are not fully understood, but it is likely that they

result in undesirable statistical properties, such as biased estimation of genetic proper-

ties of the model. Where it is desirable to relax the model assumptions, computer

simulation provides a tractable estimation procedure and allows potentially more appropriate

answers to be formulated. With the speed of computers continuously increasing, the

availability of enhanced computer software and the ability to cluster computers for

higher experiment throughput (Micallef et al. 2001), computer simulation is becoming a

powerful tool for the quantitative geneticist. With the ability of the simulation software

to accommodate a range of specific breeding program parameter values, examine

different breeding strategies, and consider a wide range of gene actions, a range of

situations can be analysed for any breeding program to determine the most appropriate

combination of variables for specific situations.

The QUGENE engine is based on a core E(NK) modelling framework, which

enables the user to create a range of genetic models with a defined number of genes (N),

levels of epistasis (K) and environment-types (E), where different NK models can be

defined for different environmental situations represented by the E environment-types.

This flexibility allows the incorporation of G×E interactions and epistasis into the

genetic models of a quantitative trait (Podlich and Cooper 1998). Kauffman (1993)

developed the NK genetic model to study genetic regulatory networks. Kauffman used

this framework to model multi-locus interactions in haploid genomes and in a quantita-

tive genetics system defined N as the number of genes in the genotype and K as the

average number of genes acting on every other gene. Boolean networks have previously

been used as a way of modelling the behaviour of complex epistatic networks in

combination with computer simulation. An application of Boolean networks in nature

has been demonstrated by Yuh et al. (1998), who illustrated experimentally and through

simulation that a Boolean network encoded in a gene's upstream region regulated the

activity of the gene.

It is possible that gene networks interact differently under different environ-

mental conditions when G×E interaction is influencing the performance of a quantita-

tive trait (Mackay 2004). Podlich (1999) investigated the extension of Kauffman’s NK

framework (Kauffman 1993) to diploid organisms in multiple environments. In such a

genotype-environment system the NK framework can be nested within environment

types, producing an E(NK) model to define a genotype-environment system. Here E is

defined as the number of environment types in a target population of environments.


Within the E(NK) framework the concept of a target population of environments was

defined following the qualitative ideas articulated by Comstock (1977).
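
The E(NK) idea described above can be illustrated with a minimal sketch. It assumes a haploid, two-allele genotype in the style of Kauffman's NK model, with one independent NK model per environment type. The function names (`make_nk_model`, `genotype_value`, `make_enk_model`) and the use of uniform random contributions are assumptions made for illustration only; this is not the QU-GENE implementation.

```python
import random


def make_nk_model(n, k, rng):
    """Build one NK model: for each of the n genes, choose k other genes that
    influence it and a random table of contributions, one entry for every
    state of the (k + 1) genes involved."""
    model = []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        table = [rng.random() for _ in range(2 ** (k + 1))]
        model.append((others, table))
    return model


def genotype_value(genotype, model):
    """Average the per-gene contributions for a haploid 0/1 genotype."""
    total = 0.0
    for i, (others, table) in enumerate(model):
        index = genotype[i]
        for j in others:
            # Append the state of each interacting gene to the table index.
            index = (index << 1) | genotype[j]
        total += table[index]
    return total / len(model)


def make_enk_model(e, n, k, seed=0):
    """Nest one NK model inside each of e environment types (assumption:
    the NK models of different environment types are independent)."""
    rng = random.Random(seed)
    return [make_nk_model(n, k, rng) for _ in range(e)]


enk = make_enk_model(e=3, n=5, k=2, seed=1)
genotype = [1, 0, 1, 1, 0]
# One genotypic value per environment type; differences between these
# values are how G×E interaction appears in this sketch.
values = [genotype_value(genotype, nk) for nk in enk]
```

With K = 0 each gene contributes independently of the others, so the sketch reduces to an additive model; increasing K makes each contribution depend on more genes.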

The QUGENE engine allows the manipulation of a range of factors important in

defining the genotype-environment system, including: (i) the number of traits the genes

influence; (ii) the number of genes contributing towards a trait; (iii) the action of each

gene, for example in the case of a model with no epistasis this can be defined using the

classical m (midpoint value), a (additive effect) and d (dominance effect), as defined by

Falconer and Mackay (1996); (iv) the number of chromosomes; (v) per meiosis

recombination fraction between genes; (vi) epistasis within the NK model framework;

(vii) G×E interaction within the E(NK) model framework; (viii) marker genes; (ix)

heritability of each trait; (x) number of alleles; and (xi) base population size.

These factors allow an array of genetic models to be explored, ranging from low

complexity - high heritability simple models (Table 2.4: bottom left quadrant) to highly

complex - low heritability models (Table 2.4: top right quadrant).
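
For the no-epistasis case mentioned in item (iii), the m, a and d parameterisation of Falconer and Mackay (1996) can be sketched as follows. The function names are hypothetical, and the way the error variance is derived from a target heritability is an assumption (heritability taken as genetic variance divided by total phenotypic variance); the source does not state how QU-GENE applies heritability internally.

```python
def locus_value(n_favourable, m, a, d):
    """Genotypic value at one locus given the number of favourable alleles:
    0 -> m - a, 1 (heterozygote) -> m + d, 2 -> m + a."""
    if n_favourable == 0:
        return m - a
    if n_favourable == 1:
        return m + d
    if n_favourable == 2:
        return m + a
    raise ValueError("a diploid locus carries 0, 1 or 2 favourable alleles")


def genotypic_value(genotype, effects):
    """Sum locus values across loci (no epistasis).
    genotype: favourable-allele counts per locus; effects: (m, a, d) per locus."""
    return sum(locus_value(g, m, a, d) for g, (m, a, d) in zip(genotype, effects))


def error_variance(genetic_variance, h2):
    """Error variance that gives heritability h2 = Vg / (Vg + Ve)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    return genetic_variance * (1.0 - h2) / h2
```

For example, three additive loci with (m, a, d) = (0, 1, 0) and genotype counts [2, 1, 0] give a genotypic value of 1 + 0 - 1 = 0.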

Table 2.4 Characterisation of the genetic architecture of a trait according to heritability level and some of the factors affecting complexity. Adapted from Cooper and Hammer (1996)

Heritability   Low complexity              High complexity
Low            No G×E interaction          G×E interaction
               No epistasis                Epistasis
               No linkage                  Linkage
               Few genes                   Many genes
               Large experimental error    Large experimental error
High           No G×E interaction          G×E interaction
               No epistasis                Epistasis
               No linkage                  Linkage
               Few genes                   Many genes
               Small experimental error    Small experimental error

QU-GENE is capable of modelling the linkage relationship between multiple

loci on multiple chromosomes (Podlich and Cooper 1998). Recombination presently

follows the method outlined in Fraser and Burnell (1970), and does not include

modelling of double-crossovers. The relationship between the loci is coded by specifying

a per meiosis recombination fraction (c ∈ [0, 0.5]) between adjacent loci. A per

meiosis recombination fraction (c = 0) indicates complete linkage while independent

segregation occurs when the per meiosis recombination fraction c = 0.5. Simulation of

recombination by the Fraser and Burnell (1970) method involves the equivalence of a

random walk along the length of a pair of homologous chromosomes, changing from

one chromosome to the other depending on the constraints and probability of that

change occurring. The chromosomes are stored as bit patterns of zeros and ones with

recombination modelled by suitable logical operations using masks to combine parts of

one gamete with the complementary parts of another (Fraser and Burnell 1970). This

method of modelling has also been used by Mulitze and Baker (1985), Charlesworth et

al. (1992, 1993), Lascoux (1997), and Latter (1998). Work on modelling both positional

and chromatid interference has been conducted by Speed et al. (1992), McPeek and

Speed (1995) and Zhao et al. (1995a, 1995b). Speed et al. (1992) state in their work that

the no interference (positional) model was asymptotically robust for gene ordering with

models which do attempt to account for interference, however some efficiency is lost in

the ordering when there is interference in the underlying crossover process. However,

McPeek and Speed (1995) point out that this model “clearly does not fit the data”,

and concluded that none of the models they tried fitted the data fully, although the

models did capture certain aspects observed in the data.
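
The bit-pattern approach described above can be sketched as follows. This is a minimal illustration that assumes no interference and a two-allele coding (one bit per locus); the function name `make_gamete` and its arguments are hypothetical, and the code is not taken from QU-GENE or from Fraser and Burnell (1970).

```python
import random


def make_gamete(hom_a, hom_b, rec_fractions, rng):
    """Form one gamete from two homologous chromosomes stored as integers.

    rec_fractions[i] is the per meiosis recombination fraction between
    locus i and locus i + 1 (0 = complete linkage, 0.5 = independence).
    A random walk along the loci decides which homologue is copied; the
    walk is recorded as a bit mask and applied with logical operations.
    """
    n_loci = len(rec_fractions) + 1
    full = (1 << n_loci) - 1
    current = rng.randrange(2)      # 1 = currently copying from hom_a
    mask = 0
    for locus in range(n_loci):
        if locus > 0 and rng.random() < rec_fractions[locus - 1]:
            current ^= 1            # switch to the other homologue
        if current:
            mask |= 1 << locus
    # Take masked loci from hom_a and the complementary loci from hom_b.
    return (hom_a & mask) | (hom_b & ~mask & full)


rng = random.Random(42)
# Two loci in coupling phase (11 / 00) with c = 0.1 between them.
gametes = [make_gamete(0b11, 0b00, [0.1], rng) for _ in range(20000)]
recombinant_fraction = sum(g in (0b01, 0b10) for g in gametes) / len(gametes)
```

In this sketch the observed fraction of recombinant gametes should approach the specified recombination fraction as the number of sampled gametes increases.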

2.6 Synopsis from literature

Used as a tool, and in combination with appropriate attention to available empirical

evidence on important features of the genetic architecture of traits and simulation

experiment design and validation studies, QU-GENE enables investigation of the

impact of resource allocation decisions within a breeding program. The result of these

investigations can be used to assist decisions on how the resources will be allocated

within a breeding program (Fabrizius et al. 1996). The QU-GENE software has

previously been used to model breeding programs (Fabrizius et al. 1996, Podlich and

Cooper 1998, Cooper et al. 1999c, Cooper et al. 1999d, Kruger 1999, Kruger et al.

1999, Podlich et al. 1999, Kruger et al. 2001, Wang et al. 2001, Kruger et al. 2002,

Wang et al. 2003, Ye et al. 2004). In this thesis a suite of modules was developed to

simulate components of the Germplasm Enhancement Program to provide the necessary

tools to investigate the potential for implementing marker-assisted selection for

quantitative traits. This involves considering the use of the current S1 family selection


strategy or replacing the use of S1 families with DH lines. Motivated by empirical

studies (Fabrizius et al. 1997, Nadella 1998, Peake 2002, Jensen 2004), properties of the

genetic architecture of a quantitative trait that have the potential to impact on the

effectiveness of marker-assisted selection are considered in Chapters 6-8. In Chapter 9 a

simulation experiment examines the implementation of marker-assisted selection in the

context of the sequence of steps within the Germplasm Enhancement Program.


CHAPTER 3

MODELLING METHODOLOGY

3.1 Introduction

The purpose of this Chapter is to explain the iterative modelling process that was

undertaken to develop the investigations conducted throughout this thesis. This

modelling process was important in: (i) the identification of questions that needed to be

examined; (ii) determining the design of simulation experiments; (iii) identifying who

was involved in the design of the experiments; and (iv) formulating the next set of

questions to be examined from the results of the simulation experiments. To success-

fully employ computer simulation as a tool to be used in experimental work, it was

important to ensure that the simulation experiments were set up to answer the proposed

questions. Detailed analysis and specification needed to occur to ensure that the QU-

GENE simulation module accurately modelled or encoded the genetic and breeding

system of interest. It was also important that the output from the simulation experiment

was set up to provide results that answered the questions posed. Without this initial

process the simulation experiments would not have progressed as planned and the

results would not have met the expectations outlined in Chapter 1.

3.2 Iterative modelling process

To model the Germplasm Enhancement Program, or any experiment involving

simulation for this thesis, a series of phases was followed to ensure the successful

completion of the experiment. These phases have been schematically illustrated as a

flowchart (Figure 3.1). Each of the phases within the flowchart is described in detail

below.


[Figure 3.1: flowchart. Boxes: Pose question; Define proposed simulation experiment or module; Develop and test QU-GENE software; Finalise design of simulation experiment; Implement simulation experiments (desktop, QCC); Compile results; Analysis & interpretation; Evaluation.]

Figure 3.1 Iterative modelling methodology process used to design simulation experiments for this thesis, QCC = QU-GENE Computing Cluster

3.2.1 Propose the relevant questions

This phase of the modelling process was important as an experiment cannot be

completed or a simulation module created without knowing what questions need to be

answered. This phase involved an interactive discussion between researchers who were

familiar with: (i) the empirical data associated with the Germplasm Enhancement

Program; (ii) the important practical and theoretical issues under consideration in the

Germplasm Enhancement Program; and (iii) the responsibility and development of the

QU-GENE simulation software and computing infrastructure requirements. In addition

to the discussion group, extensive research was conducted into the relevant literature to

ensure the envisioned work had not been previously completed, and to ensure the


questions posed were relevant. The group of researchers varied during the course of this

thesis. The group predominantly included Narelle Kruger¹, Mark Cooper¹&², Dean

Podlich¹&², Nicole Jensen¹&², Kevin Micallef¹ and Chris Winkler².

3.2.2 Define the proposed simulation experiment or module

After the questions that needed to be answered had been defined, this next phase

involved designing the basic framework of the simulation experiment or module to

ensure the questions could be answered. The specifications of the module and the

factors that may need to be varied within the module were also outlined. This design

phase also required interaction between researchers, however, this group was small and

generally included Narelle Kruger, Mark Cooper and Dean Podlich. Any extra work that

needed to be completed or computer programs that needed to be found or trialled was

also specified at this point.

3.2.3 Develop and test the QU-GENE software

Once the specifications of the software had been detailed and documented, a

new QU-GENE module was developed to answer the proposed questions. A simulation

experiment, especially involving a new module, was generally initially tested on a

single desktop computer. Small exploratory experiments were conducted on the desktop

allowing debugging of the program to occur on a single user scale. During the testing

phase the software was evaluated and any additional requirements were outlined and

added into the experiments or module at this point. The software development of QU-

GENE was completed by Dean Podlich. The testing of the software was undertaken by

Narelle Kruger.

¹ The University of Queensland, St Lucia, Brisbane, 4072
² Pioneer Hi-bred International Inc., 7250 N.W. 62nd Avenue, Johnston, Iowa 50131-0552, USA

3.2.4 Finalise the design of the simulation experiment

To finalise an experiment or module, a range of experimental variables was

taken into consideration that could not be foreseen or accounted for at the development

stage. The major factor that defines how an experiment will be conducted is the amount

of time a module requires to conduct one run of a single genetic model. As the number

of experimental variables and the level of complexity in the genetic model increase, the

time taken to run the model in a module increases. A simple genetic model may take

one minute to run, while a larger complex genetic model can take one week. By the time

a set of genetic models to be compared is determined, with each model conducted 100

times to encapsulate the variation, a single experiment may take years to run. Therefore,

it was important to balance the experimental variables being compared against the time

taken to run the experiment to ensure that results could be collated in a reasonable time

frame that would answer the proposed questions. An important part of the design of an

experiment was to be selective in the number of experimental variables being tested and

the levels within each of these experimental variables. For this thesis there were two

areas that needed to be considered when determining experiment runtime: (i) the

experimental variables used to create the genetic model; and (ii) the experimental

variables required to conduct the breeding strategy. The questions being posed dictated

how the variables within each of these areas would be tested. Examples of steps taken

in this thesis to reduce the size of experiments were: (i) testing a high and low heritability

as opposed to many levels between a heritability of h² = 0 and h² = 1.0 (Chapter 4); (ii)

using a 12 chromosome genetic model of the wheat genome as opposed to the full set of

21 wheat chromosomes (Chapter 5); (iii) using two flanking markers as opposed to

eight flanking markers (Chapter 5); and (iv) conducting 10 cycles of the breeding

program as opposed to 50 cycles (simulation time constraints). Preliminary experiments

demonstrated that the simplified models gave a representation of the more detailed

analysis and that the variable levels that were not tested did not contribute significantly

to the results or their interpretation while they would have contributed significantly to

the module runtime.
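
The trade-off between the number of experimental variables and total runtime can be made concrete with a small calculation. The sketch below assumes a full factorial design in which every combination of variable levels is one genetic model, and an even division of work across processors; the function name and the example numbers (other than the 100 runs per model mentioned above) are hypothetical.

```python
from math import prod


def experiment_runtime_hours(levels_per_variable, runs_per_model,
                             minutes_per_run, n_processors=1):
    """Estimate wall-clock hours for a full factorial simulation experiment.

    levels_per_variable: number of levels tested for each experimental variable.
    runs_per_model: replicate runs of each genetic model (100 in this thesis).
    minutes_per_run: time for one run of one genetic model.
    n_processors: processors sharing the work evenly (an idealisation).
    """
    n_models = prod(levels_per_variable)
    total_minutes = n_models * runs_per_model * minutes_per_run
    return total_minutes / 60 / n_processors


# Hypothetical design: three variables with 2, 3 and 4 levels, 1 minute per run.
small = experiment_runtime_hours([2, 3, 4], runs_per_model=100, minutes_per_run=1)
# Adding one further variable with 5 levels multiplies the runtime by 5.
larger = experiment_runtime_hours([2, 3, 4, 5], runs_per_model=100, minutes_per_run=1)
```

Because the number of genetic models is the product of the levels of each variable, each added variable or level multiplies the runtime, which is why the levels tested had to be chosen selectively.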

3.2.5 Implementation of the simulation experiment

Once the experiment or module setup was finalised and available for running on

a large scale, the experiment was set up on the QU-GENE Computing Cluster (QCC). If

the QCC was involved, relevant scripts and, where necessary, stand-alone software to

manage experiment outputs were developed. This involved Narelle Kruger, Kevin

Micallef and Dean Podlich.


3.2.6 Compilation of results of the simulation experiment

The results from the simulation experiments were generally comprehensive and

involved large amounts of data in thousands of output files. If an experiment was small

enough, the results were collated into a spreadsheet and the data were manipulated into

a form that was appropriate for statistical analysis. For large datasets, the results were

generally collated using a stand alone program and then entered into a database for

manipulation into a manageable format. This work was completed by Narelle Kruger.

3.2.7 Analysis and interpretation of the simulation experiment

Once the data had been manipulated into the appropriate format a statistical

analysis was conducted. The statistical analysis usually involved conducting an analysis

of variance using the statistical software package ASREML (Gilmour et al. 1999). The

results of the analyses were summarised graphically in combination with a statistical

analysis as this greatly assisted interpretation of the results. This work was done by

Narelle Kruger.

3.2.8 Evaluate the results of the simulation experiment in relation to the questions posed

After the data had been analysed and interpreted it was important to evaluate the

results to ensure that the questions posed at the beginning of the experiment were

answered and to check whether the results fit with any relevant empirical evidence that was

available. This was completed as a small group discussion involving Narelle Kruger,

Mark Cooper and Dean Podlich. Following the discussions it was determined whether

further experiments needed to be completed with the same focus or whether new

experiments needed to be created to answer a new set of questions. If so, a new set of

questions was proposed and the process returned to the first phase (Figure 3.1).

3.3 Questions proposed for the thesis

This thesis can be viewed as a series of iterations of the modelling process (Figure

3.1) to investigate a sequence of key questions identified as relevant to evaluating

the improvement of the Germplasm Enhancement Program breeding strategy, with a


specific focus on marker-assisted selection and DH breeding technologies. In some

Chapters of this thesis, the iterative modelling process was conducted multiple times to

answer a set of related questions. In other Chapters a larger question was posed and only

one cycle of the iteration was conducted. The set of questions proposed to be answered

in each of the experimental Chapters of the thesis were as follows:

Chapter  Questions

4        When modelling the same genetic system was there a convergence between expectations of theoretical prediction equations and simulation results?

5        What is the appropriate QTL detection analysis program to use?
         Can a reduced genome model be used to simulate linkage relationships for QTL detection analyses? For this thesis the example considered was whether a 12 chromosome representation of the wheat genome model accurately represented a 21 chromosome genome model for the purposes of studying QTL detection?

6        Does population size affect QTL detection?

7        Does G×E interaction and epistasis affect QTL detection?

8        Is there a difference in the expected response to selection of the Germplasm Enhancement Program for S1 families when either phenotypic selection, marker selection or marker-assisted selection is implemented when the genetic architecture of the trait was defined as an additive finite locus model?

9        Is there a difference in the expected response to selection of the Germplasm Enhancement Program for S1 families and DH lines when either phenotypic selection, marker selection or marker-assisted selection are implemented when G×E interaction and epistasis are components of the genetic architecture for the trait of interest?

The remainder of this thesis focuses on answering the questions proposed above.

The aim of this thesis (as outlined in Chapter 1) was to answer the main question of how

to improve the rate of genetic gain for the Germplasm Enhancement Program. This

question is examined in detail in Chapter 9. However, the exploratory work indicated

for Chapters 4 to 8 first needed to be completed to ensure the validity of using simula-

tion and designing the experiment to answer this question. The initial Chapters 4 and 5

addressed questions related to the functionality of the simulation program and whether it

could be effectively used to address the main question in Chapter 9. Chapters 6 to 8

addressed questions that were used to define the size and scope of the experiment

required to address the main question, which was examined in Chapter 9.


PART II

SIMULATION AS A

MODELLING APPROACH


CHAPTER 4

EXAMINING THE CONSISTENCY

BETWEEN PREDICTIONS FROM

QUANTITATIVE GENETIC EQUATIONS

AND QU-GENE SIMULATIONS OF KEY

GENETIC PROCESSES REQUIRED FOR

MODELLING SELECTION RESPONSE

4.1 Introduction

In the field of quantitative genetics, population level genetic processes have been

modelled predominantly using algebraically derived statistical prediction equations

(Fisher 1918, Kempthorne 1988, Comstock 1996, Falconer and Mackay 1996). An

alternative, less frequently used approach, is to use computer simulation to numerically

investigate the same genetic process (Martin Jr and Cockerham 1960, Cress 1967,

Fraser and Burnell 1970, Cooper et al. 2002b). When the same assumptions are used in

the prediction equations, both models, though encoded differently, are expected to give

the same predictions of the genetic process. Thus, as a prelude to the simulation of

complex genetic processes involved in marker-assisted selection, it is possible to

examine the consistency between predictions of the properties of genetic systems based

on both modelling approaches for cases that have previously been studied and explicit

prediction equations have been developed.


Some of the motivating factors for examining the use of computer simulation as

a tool for extending the theoretical models currently used in quantitative genetic theory arise

from a combination of factors including: (i) growing questions about the

validity of the simplifying assumptions used to derive the theoretical models (e.g. many

independently segregating genes each with small and equal effects, no gene-by-

environment interaction, no linkage) as discussed in detail by Kempthorne (1988); (ii)

the growing body of evidence from molecular genetic investigations that models of

genetic processes need to explicitly incorporate context dependent interactions between

genes that may be considered to represent biological examples of the statistical

interactions, commonly referred to as epistasis and genotype-by-environment interac-

tions (e.g. Mackay 2001, 2004); (iii) the increasing availability of powerful experimen-

tal approaches to study some of the details of the genetic architecture of quantitative

traits (Kearsey and Pooni 1996); and (iv) the increase in the speed of simulation

methodology as a consequence of advances in computer software (Podlich and Cooper

1998) and hardware (Moore 1965, Micallef et al. 2001). The increased use of computers

in genetics was predicted by Kempthorne (1988) and Keen and Spain (1992), who

recognised that situations requiring computer simulation were becoming more common:

hand calculations may be sufficient for simple models, but computer simulation is

essential for understanding multi-component models and their complex

interrelationships if a useful solution is to be found.

The use of computer simulation as an investigative approach is not unique to

quantitative genetics. More generally in modern mathematical modelling, computer

simulation has been used as a valid tool to obtain answers to complex problems. Casti

(1997a) described this evolution in his systems modelling text “Reality rules: I Picturing

the world in mathematics - the fundamentals”. Further, in a useful complement to this

theoretical treatment of modelling, Schrage (1999) describes how simulation modelling

is being widely used to study complex engineering and business problems. Cooper et al.

(2002b) give a recent overview of how these developments can be applied to study

complex genetic problems relevant to plant breeding.


A common approach for understanding the implications of a genetic model for

response to selection of a trait in a breeding program has traditionally been through the

use of both: (i) experimentation to estimate either the variables in the theoretical

prediction equations or the realised response to selection directly; and (ii) the

development of appropriate theoretical prediction equations based on assumed quantitative

genetic models. As discussed above, more recently computer simulation has been

applied. It has been argued that QU-GENE, a computer program developed for the

quantitative analysis of genetic models (Podlich and Cooper 1998), allows the relaxa-

tion of some of the assumptions and simplifications made in the derivation of theoretical

prediction equations. While a completely general computer simulation platform (e.g.

QU-GENE) is a desirable quantitative genetic tool, the importance of theoretical

prediction equations has not been lessened. The purpose for the investigations in this

Chapter was to check for consistency between the predictions from quantitative genetics

theoretical equations and QU-GENE simulations under conditions where consistency is

expected to be observed. In these situations the prediction equations are able to provide

an explicit and independent verification of the algorithms implemented within the QU-

GENE simulation program. As the QU-GENE software is the simulation platform used

throughout this thesis, it was considered important to examine the results produced from

the simulation program prior to considering the use of simulation as a valid framework

for the extension of the theoretical framework.

Prediction equations are mathematical formulae used in both plant and animal

genetics to derive expectations of genetic processes. In this Section, expectations

obtained from two prediction equations will be compared with simulation results. The

first prediction equation used in this study is a recombination prediction equation given

by Liu et al. (1996). This equation can be used to calculate the number of generations of

random mating required to breakdown an initial linkage in relation to a defined level of

linkage disequilibrium for a given per meiosis recombination fraction. This comparison

is important to understand the level of consistency between the equation-derived

expectations and the outcomes of the method of simulation of recombination imple-

mented in QU-GENE, which is based on the method given by Fraser and Burnell

(1970). Understanding the properties of recombination that are modelled is an important


component of studying QTL detection and marker-assisted selection. Simulating

different recombination events in populations using computer software programs, e.g.

QU-GENE (Podlich and Cooper 1998) and GenomeMixer (Williams and Williams

2004), helps to determine optimal breeding designs for these populations.
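
As a rough illustration of the kind of calculation described above, the sketch below uses the standard textbook result that, under random mating without selection, linkage disequilibrium between two loci decays by a factor of (1 - c) per generation. This relationship is an assumption adopted here for illustration; it is not claimed to be the exact form of the equation given by Liu et al. (1996), and the function names are hypothetical.

```python
import math


def ld_after_generations(d0, c, t):
    """Linkage disequilibrium after t generations of random mating,
    starting from d0, with per meiosis recombination fraction c."""
    return d0 * (1.0 - c) ** t


def generations_to_reach(fraction, c):
    """Smallest whole number of generations of random mating needed for
    disequilibrium to fall to the given fraction of its initial value."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if c <= 0.0:
        return math.inf          # complete linkage: no decay
    return math.ceil(math.log(fraction) / math.log(1.0 - c))


# Generations needed to reduce disequilibrium to 5% of its initial value.
unlinked = generations_to_reach(0.05, 0.5)   # independent segregation
linked = generations_to_reach(0.05, 0.1)     # moderately tight linkage
```

Under this relationship tighter linkage (smaller c) requires many more generations of random mating to reach the same level of disequilibrium, which is the pattern the simulated recombination would be expected to reproduce.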

The second type of prediction equation examined in this thesis is the response to

selection prediction equation. Response to selection prediction equations are used to

derive the expected level of genetic gain that can be accomplished by a selection

strategy after a cycle of selection. There are many response to selection prediction

equations available for use by plant breeders, which can range from simple models

(Empig et al. 1971, Fehr 1987, Falconer and Mackay 1996), to more complex equations

including more extensive parameters (Comstock 1996). A major component of the

simulation studies reported in this thesis involves examining the expected response to

selection using marker-assisted selection for a wide range of genetic models. As a first

step in this process, expectations from prediction equations for mass selection (the reference
breeding strategy) and for S1 family and DH line selection (as used and proposed for the
Germplasm Enhancement Program), as given by Falconer and Mackay (1996)
and Comstock (1996), are compared with the results for the same genetic models
simulated in QU-GENE.

The following Materials and Methods and Results have been divided into two

sections. The first Section considers modelling recombination and compares the

theoretical equation results to the results from a simulation experiment modelling the

same process. The second Section compares simulation results to two different

prediction equations for the response to selection of mass, S1 family and DH line

selection as they are relevant to this thesis. A combined discussion and conclusion

Section is presented.

4.2 Recombination prediction equations

When two or more alleles at multiple loci occur together more frequently than

would be expected by chance, linkage disequilibrium is present in the population. This

association is determined by the extent of recombination events that change the

association between the alleles of different loci. For segments on different chromosomes

and in the absence of any effects of selection, this is determined by the patterns of

chromosome assortment during meiosis. Usually it is expected that there will be

independent assortment of different chromosomes into gametes during meiosis. For loci

on the same chromosome, the assortment of alleles during meiosis is determined by the

number and distribution of crossing over events observed as chiasmata during meiosis.

Measures of linkage disequilibrium are variable and influenced by the patterns of

intermating of individuals within a population and how this enables recombination

events that can result in the rearrangement of polymorphic chromosome segments. An

understanding of the level of linkage disequilibrium in a mapping population is

important when creating genetic maps. Decreasing the level of linkage disequilibrium in

a study population by conducting multiple generations of random mating can change the

relationship between the genetic map size of an interval containing a QTL and the DNA
content of that interval, which may improve the precision with

which QTL are mapped (Paterson 1998). By manipulating the number of generations of

random mating it is possible to manipulate the level of linkage disequilibrium in a

mapping population and thus the expected level of map resolution. Modelling of the

genetic process of recombination is an important component of any investigation

examining multiple locus models, particularly QTL detection and marker-assisted

selection.

4.2.1 Materials and Methods

4.2.1.1 Recombination and linkage disequilibrium

A simulation experiment was conducted to compare the consistency between the

simulation results and the predictions derived from the theoretical equation given by Liu

et al. (1996), used to determine the expected number of generations of random mating

required to break a known level of linkage association between two loci.

4.2.1.2 Theory underlying the breaking of linkage

A method for calculating the number of generations of random mating required
to reach an observed recombination fraction greater than 0.4 ($t_{R>0.4}$) was given by Liu et
al. (1996) (Equation (4.1)) and used by Paterson (1998),

$$t_{R>0.4} = \min\left( I > \frac{\ln(1 - 2R) - \ln(1 - 2c)}{\ln(1 - c)} \right)_{R=0.4}, \tag{4.1}$$

where, R is the observed recombination fraction of an F2 population after t generations

of random mating, c is the per meiosis recombination fraction, and I is an integer (whole

number) indicating the number of generations of random mating. An observed recombination
fraction of 0.4 ($R = 0.4$) was utilised as the prediction point for the equation

since in practice, it is difficult to experimentally detect linkage between markers at

recombination values greater than this.
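As an illustrative sketch only (it is not part of the original analysis), the prediction from Equation (4.1) can be evaluated numerically as follows. The function name `generations_to_reach` is hypothetical, and the code assumes the reading of Equation (4.1) in which t is the smallest whole number of generations that exceeds the ratio of logarithms.

```python
import math


def generations_to_reach(c, target_r=0.4):
    """Smallest whole number of generations of random mating that exceeds
    the ratio in Equation (4.1), for per meiosis recombination fraction c.
    """
    if not 0.0 < c < 0.5:
        raise ValueError("c must lie strictly between 0 and 0.5")
    if not 0.0 < target_r < 0.5:
        # ln(1 - 2R) is undefined at R = 0.5, so the equation cannot be used there
        raise ValueError("target_r must lie strictly between 0 and 0.5")
    ratio = (math.log(1 - 2 * target_r) - math.log(1 - 2 * c)) / math.log(1 - c)
    # Smallest integer strictly greater than the ratio, never below zero
    return max(0, math.floor(ratio) + 1)


if __name__ == "__main__":
    for c in (0.005, 0.05, 0.1, 0.2):
        print(c, generations_to_reach(c))
```

Under these assumptions the number of generations rises steeply as c becomes small, because ln(1 − c) approaches zero.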

4.2.1.3 QU-GENE simulation of recombination

The modelling of recombination in QU-GENE follows the method of Fraser and

Burnell (1970) and was previously outlined in Chapter 2, Section 2.5.2. Following the

approach described in Chapter 3, the LINKEQ module (Figure 4.1), was created to

simulate the number of generations of random mating required to reach a defined level

of observed recombination fraction (R) between two genes for a range of per meiosis

recombination fractions (c). The LINKEQ module was designed to simulate the

conditions represented by the Liu et al. (1996) prediction equation (Equation (4.1)).

The LINKEQ module simulated an F2 population size of 1000 individuals cre-

ated from two parents at opposing genotypic extremes with coupling phase linkage

association between the alleles of two genes (Figure 4.1). The starting gene frequency

of the favourable allele (GF) in the F2 population was GF = 0.5. Any realistic per

meiosis recombination fraction could be tested. The module continued the process of
random mating without selection until a user-defined frequency of recombinant
gametes was observed in the F2 population (an observed recombination fraction R >
0.4, to conform with the work of Liu et al. (1996)). The measure of linkage disequilibrium used

in this study was obtained using Equation (4.2),

$$R = \frac{\text{non-parentals}}{\text{non-parentals} + \text{parentals}} = \frac{Ab + aB}{Ab + aB + AB + ab}, \tag{4.2}$$

where Ab and aB are the non-parental gametes and AB and ab are the parental gametes.

If the observed recombination fraction after a cycle of random mating was less than 0.4

($R < 0.4$), the population was subjected to another cycle of random mating and the
observed recombination fraction was recalculated. This procedure continued until the
observed recombination fraction was equal to or greater than 0.4 ($R \geq 0.4$) and the

number of generations of random mating to reach this point was counted.

[Figure 4.1: flow chart. AABB × aabb → AaBb (F1) → F2 population (1000 individuals) → calculate R of the F2 population → if R < 0.4, conduct a generation of random mating of the F2 population and recalculate R; if R ≥ 0.4, count the number of generations of random mating required to reach R ≥ 0.4.]

Figure 4.1 Schematic outline of the LINKEQ module. Two opposing extreme inbred individuals with two genes in coupling phase linkage were crossed to form the F1, which was selfed to form the F2 population. The F2 population was subjected to a number of generations of random mating until the observed frequency of recombinant gametes reached R ≥ 0.4. After each cycle of random mating, if the observed frequency of recombinant gametes was R < 0.4, the F2 population was randomly mated again until R ≥ 0.4
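The procedure outlined for the LINKEQ module can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QU-GENE implementation: the function name `simulate_generations` is hypothetical, parents are sampled with replacement at each generation (so chance selfing is possible), the population size is held constant, and a single crossover probability c between the two genes stands in for the recombination method of Fraser and Burnell (1970).

```python
import random


def simulate_generations(c, target_r=0.4, pop_size=1000, seed=None, max_gen=100000):
    """Count generations of random mating of an F2 population until the
    observed frequency of recombinant gametes, R (Equation (4.2)), reaches target_r.
    """
    rng = random.Random(seed)

    def gamete(ind):
        # ind is a pair of haplotypes; a haplotype is (allele at gene A, allele at gene B)
        h1, h2 = ind if rng.random() < 0.5 else (ind[1], ind[0])
        if rng.random() < c:  # crossover between the two genes
            return (h1[0], h2[1])
        return h1

    def next_generation(pop, selfing=False):
        new = []
        for _ in range(pop_size):
            p = rng.choice(pop)
            q = p if selfing else rng.choice(pop)
            new.append((gamete(p), gamete(q)))
        return new

    def observed_r(pop):
        # Non-parental haplotypes (Ab, aB) carry one allele from each original parent
        haps = [h for ind in pop for h in ind]
        return sum(1 for h in haps if h[0] != h[1]) / len(haps)

    f1 = [((1, 1), (0, 0))]  # AB/ab: coupling phase, 1 = allele from AABB parent
    pop = next_generation(f1, selfing=True)  # F2 population
    generations = 0
    while observed_r(pop) < target_r and generations < max_gen:
        pop = next_generation(pop)
        generations += 1
    return generations


if __name__ == "__main__":
    runs = [simulate_generations(0.1, seed=s) for s in range(20)]
    print(sum(runs) / len(runs))
```

Repeating the call over many seeds, as in the usage lines, gives a mean and spread that can be set beside the value predicted by Equation (4.1).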

The simulation experiment was conducted to determine the number of generations

of random mating required to reach an observed recombination fraction between two

genes of R = 0.4 for a range of per meiosis recombination fractions. The simulation was

repeated 500 times for each per meiosis recombination fraction with the average and

standard deviation (σ) of the number of generations of random mating required to

achieve an observed recombination fraction, R ≥ 0.4 recorded. The per meiosis

recombination fractions tested were c = 0.005, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07,

0.08, 0.09, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.25, 0.30, 0.35, and 0.4. These levels

allowed a detailed consideration of the range from a tight linkage (per meiosis recombi-

nation fraction c = 0.005) to a weak linkage (per meiosis recombination fraction c =

0.35). In addition to the target observed recombination fraction of R = 0.4, a target

observed recombination fraction of R = 0.5 was also simulated for the per meiosis

recombination fractions listed above.

4.2.2 Results

4.2.2.1 Recombination and linkage disequilibrium

The average number of generations of random mating obtained from the

simulation conformed well to the expectations from Equation (4.1) for a target observed

recombination fraction of 0.4 (Figure 4.2).

[Figure 4.2: line plot. x-axis: recombination fraction (c), 0.0 to 0.5; y-axis: number of random mating generations, t(R = 0.4), 0 to 600; series: Simulation, Theoretical.]

Figure 4.2 Number of generations of random mating required to reach an observed recombination fraction of R = 0.4 between two genes for the simulation (with standard deviation bars) using QU-GENE and the theoretical values calculated from Equation (4.1) for a range of per meiosis recombination fractions. The smaller the per meiosis recombination fraction, the tighter the linkage and the more generations of random mating required to break the linkage

As the per meiosis recombination fraction approached c = 0.005, the standard

deviation for the number of generations of random mating for the simulations became

larger. Thus, the number of generations required to break tight linkages was found to be

highly variable even though the average expectations were consistent between the

prediction equations and the simulations. Below a per meiosis recombination fraction of

0.1, the number of generations of random mating required to break up the linkage

associations increased rapidly.

[Figure 4.3: line plot. x-axis: recombination fraction (c), 0.0 to 0.5; y-axis: number of random mating generations, t(R = 0.5), 0 to 1000; series: Simulation.]

Figure 4.3 Number of generations of random mating required to reach an observed recombination fraction of R = 0.5 between two genes for the simulation (with standard deviation bars) using QU-GENE for a range of per meiosis recombination fractions. The smaller the per meiosis recombination fraction, the tighter the linkage and the more generations of random mating required to break this linkage

An advantage the simulation approach has over the prediction equation is that

the simulation can provide an estimate of the number of generations of random mating

required to reach an observed recombination fraction of 0.5. Equation (4.1) cannot
estimate this point because ln(1 − 2R) = ln 0 is undefined at R = 0.5; therefore, a
comparison between theory and simulation cannot be obtained at this limit. Using the

LINKEQ module the number of generations of random mating required to reach an

observed recombination fraction R = 0.5 was estimated (Figure 4.3). The standard

deviation of the number of generations for a given per meiosis recombination fraction

was consistently larger than when the target observed recombination fraction was

defined as R ≥ 0.4, and more generations of random mating were required to reach an

observed recombination fraction of R = 0.5 (cf. Figures 4.2 and 4.3; note that the
vertical axes of these figures are on different scales).

4.3 Response to selection prediction equations

The Germplasm Enhancement Program was based initially on an S1 family
selection strategy (Fabrizius et al. 1996). Recently, replacement of the S1 family selection

by DH line selection has been considered for the Germplasm Enhancement Program.

Thus, the breeding strategies considered in this Section for comparisons between
prediction equation expectations and simulation results were restricted to S1 family and DH line selection, to

retain relevance to the Germplasm Enhancement Program. The mass selection strategy

(individual plants selected on their phenotype) was also included as a base line

reference point. In comparison to mass selection, both S1 and DH selection represent

family selection strategies. The S1 family and DH lines (families) differ in the extent of

self-pollination that is undertaken. In the case of S1 family selection there is one

generation of self-pollination following the random intermating of selected individuals

(equivalent to an F3 generation). Therefore, there is genetic variation among and within

S1 families. In the case of DH lines, random gametes are sampled from the individuals

and these gametes are doubled to create completely homozygous lines. Therefore, all of

the genetic variation is among the DH lines.

An overview of the relevant response to selection prediction equation theory is
given for mass, S1 family and DH line selection. One set of equations is based on the
framework developed by Comstock (1996); these are labelled Comstock’s response to
selection prediction equations. A set of Basic response to selection prediction equations is also
compared, using the equations of Empig et al. (1971), Fehr (1987) and Falconer and
Mackay (1996). These equations are simpler and contain fewer variables than
Comstock’s. More details on the prediction equations are given in Appendix 1, Section
A1.1. Some common assumptions made when deriving the prediction equations
considered for mass, S1 family and DH line selection include Mendelian inheritance, no

mutation, infinite populations, Hardy-Weinberg equilibrium, many genes with small and

equal effects, no linkage or linkage phase equilibrium, no epistasis, no genotype-by-

environment interaction, and no correlated environmental effects. A more extensive list

of assumptions relevant to the prediction equations of interest can be found in Appendix

1, Section A1.2.

4.3.1 Materials and Methods

4.3.1.1 Theoretical prediction equations for mass, S1 family, and DH line selection methods

The simulation experiments reported here focus on examining the convergence

between expectations based on theoretical equations and the simulated mean for

response to selection. In this Section, each of the theoretical prediction equations

compared with the simulations is described. The QU-GENE module PEQ was also

developed to simulate the breeding processes relevant to the prediction equations.

Two simulation experiments were conducted in this Section. The first experi-

ment examines convergence between the prediction equations and simulation results for

three selection strategies: (i) mass; (ii) S1 family; and (iii) DH line selection, without

linkage between genes. The second experiment examines the effects of linkage and

recombination in combination with selection pressure. Linkage with a small
per meiosis recombination fraction of c = 0.05 was imposed in the base population of the
simulation study, and a range of numbers of generations of random mating was then applied.
This allowed linkage equilibrium to be approached, so that the simulations could be
expected to produce the same selection response as
the prediction equations, which were calculated using an observed recombination
fraction of R = 0.5 (linkage equilibrium).

4.3.1.1.1 Basic response to selection prediction equation

The Basic response to selection prediction equation has been reproduced from

Falconer and Mackay (1996) as Equation (4.3),

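$$\begin{aligned} R &= h^2 S, \\ &= i h^2 \sigma_p, \\ &= i \, \frac{\sigma_A^2}{\sigma_A^2 + \sigma_D^2 + \sigma_e^2} \sqrt{\sigma_A^2 + \sigma_D^2 + \sigma_e^2}, \end{aligned} \tag{4.3}$$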

where $h^2$ is the narrow-sense heritability, which can be obtained from genetic
experiments conducted for the reference population (generations) prior to selection, $S$ is
the selection differential, $i$ is the standardised selection differential, $\sigma_p$ is the standard
deviation of the phenotypic values of the individuals (selection units), and $\sigma_A^2$, $\sigma_D^2$ and $\sigma_e^2$
are the additive genetic, dominance genetic and environmental variances, respectively.
The selection differential is the mean phenotypic value of the individuals

selected as parents for the next cycle of the breeding process, expressed as a deviation

from the population mean. In practice the selection differential is not known until the

parents are selected. However, the expected value of the standardised selection

differential can be predicted assuming that the distribution of phenotypic values of the

individuals to be subjected to selection is a normal distribution.

Equation (4.3) is the formula used for predicting response to selection for a mass

selection strategy. Mass selection is a simple breeding strategy that involves selection

among individuals on the basis of some measure of their own phenotype. Equation (4.3)

can also be extended to predict the response to selection of other selection units such as

S1 families and DH lines.
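To make the mass selection prediction concrete, the following sketch evaluates R = i h² σ_p for a chosen selected proportion. It is illustrative only: the function names are hypothetical, and the standardised selection differential is computed for truncation selection from an infinite, normally distributed population, which is an assumption of the sketch rather than a statement about how QU-GENE or the thesis calculated it.

```python
from statistics import NormalDist


def selection_intensity(proportion):
    """Expected standardised selection differential (i) when the top
    `proportion` of a normally distributed population is selected.
    """
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1.0 - proportion)  # truncation point
    return NormalDist().pdf(z) / proportion


def mass_response(var_a, var_d, var_e, proportion):
    """Predicted response to one cycle of mass selection, R = i * h^2 * sigma_p."""
    var_p = var_a + var_d + var_e
    h2 = var_a / var_p
    return selection_intensity(proportion) * h2 * var_p ** 0.5


if __name__ == "__main__":
    # Additive model, heritability 0.5, best 20% selected
    print(round(selection_intensity(0.2), 3), round(mass_response(1.0, 0.0, 1.0, 0.2), 3))
```

With a selected proportion of 0.2, the value used in the experiments described later in this Section, the intensity is close to 1.4 under these assumptions.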

The S1 family response to selection prediction equation is shown by Equation
(4.4). For a diploid two allele system, where the frequencies of the alternative alleles are
defined as p and q, when dominance is present in the population and $p \neq q$ such that
$q - p \neq 0$, then an additional component C, where

$$C = 2 \sum_{i=1}^{n} pq \left(p - \tfrac{1}{2}\right) d \left[a + (q - p) d\right],$$

is added on to the additive genetic variance ($\sigma_A^2$), giving $\sigma_{A'}^2$ (Empig et al. 1971), as
shown by Equation (4.4),

$$R = i \, \frac{\sigma_{A'}^2}{\sigma_{A'}^2 + \frac{1}{4}\sigma_D^2 + \dfrac{\frac{1}{2}\sigma_{A'}^2 + \frac{1}{2}\sigma_D^2 + \sigma_e^2}{\eta}} \sqrt{\sigma_{A'}^2 + \frac{1}{4}\sigma_D^2 + \frac{\frac{1}{2}\sigma_{A'}^2 + \frac{1}{2}\sigma_D^2 + \sigma_e^2}{\eta}}, \tag{4.4}$$

where $\sigma_{A'}^2$ is the additive genetic variance plus the deviation (C) due to the dominance
effect (Empig et al. 1971), and $\eta$ is the number of replications per environment.

Doubled haploid lines are completely homozygous and as such do not express

dominance variation or any segregation within lines. Doubled haploid lines exhibit

twice as much additive genetic variation among lines as that for S1 families used in an

S1 recurrent selection program. Therefore, a coefficient of two is placed in front of $\sigma_A^2$
in Equation (4.4) and the dominance genetic variance ($\sigma_D^2$) is removed to produce
Equation (4.5),

$$R = i \, \frac{2\sigma_A^2}{2\sigma_A^2 + \frac{\sigma_e^2}{\eta}} \sqrt{2\sigma_A^2 + \frac{\sigma_e^2}{\eta}}. \tag{4.5}$$

The three equations will be referred to as the Basic equations throughout the remainder

of this Section for mass (Equation (4.3)), S1 family (Equation (4.4)) and DH line

(Equation (4.5)) selection.

4.3.1.1.2 Comstock’s response to selection prediction equations

Comstock (1996) derived a set of prediction equations for a range of breeding
systems based on gene frequency, gene action, level of inbreeding of the parents,
effective population size ($N_e$), selection intensity and level of linkage disequilibrium

between two loci. These were also considered to be important factors in the simulation

modelling of the Germplasm Enhancement Program. For Equations (4.3), (4.4) and

(4.5), there are no explicit terms to account for effective population size or linkage

disequilibrium. Therefore, components of Comstock’s formal treatment of the three

breeding strategies were considered in this Section. For further details on the derivations

and justifications of the following equations the reader is referred to Comstock (1996).

Equation (4.6) is Comstock’s general response to selection prediction equation for

calculating the expected change per cycle of selection of the average value of genotypes

at a locus i, $E(\Delta y_{xi})$, when the target germplasm population is the same as the selected

population, and selection is among Sn families on the basis of their own phenotypic

performance. Equation (4.6) is a combination of Comstock’s (1996) Equation (11.27, pg

199) with the addition of linkage disequilibrium as outlined in Table 8.2 (pg 127) of

Comstock (1996). This provides an equation that sums the effects of alleles over all loci

and includes the effects of effective population size and linkage disequilibrium between

locus i and locus j. Comstock states that only the expected response to selection value

($E(\Delta y_{xi})$) can be calculated, as changes in allele frequency in finite populations are

subject to sampling variation.

( ) ( ) ( )

( )

( ) ( ) ( )( )( )

2

ˆ

(1 )(1 2 ) 12 1 (1 ) 1 1 2 12 2 2

1 1 211 ,

2

i xixi i i i xi xi

e ei

j xji i xi xixjij

e j iX

h q a Zy q q h q a uN N

h q aq q a uk pt rs uN

πσ ≠

⎧ ⎫⎛ ⎞− −⎡ ⎤⎪ ⎪⎡ ⎤Ε Δ = − + + + − − − ×⎨ ⎬⎜ ⎟⎢ ⎥ ⎣ ⎦⎣ ⎦⎪ ⎪⎝ ⎠⎩ ⎭⎡ ⎤ ⎡ ⎤− −−⎢ ⎥ ⎢ ⎥− + − + +⎢ ⎥ ⎢ ⎥⎣ ⎦⎢ ⎥⎣ ⎦

∑(4.6)

where, $Z \approx a_{xi}\left\{(3 + h)(1 - 2q_i) + 2(1 - h)\left[1 - 5q_i(1 - q_i)\right]a_{xi}\right\}$,

$$\pi = \left[\frac{1 - 2\gamma}{1 + 2\gamma}\right]\left[1 - \left\{\frac{1 - 2\gamma}{2}\right\}^{n}\right],$$

where, $q_i$ is the gene frequency of the favourable allele at the ith locus, $q_j$ is the gene
frequency of the favourable allele at the jth locus, $h$ is the inbreeding coefficient of the
parents ($h = 1 - (\tfrac{1}{2})^{n-1}$), $a_{xi}$ is the dominance effect of the gene at the ith locus, $a_{xj}$ is the
dominance effect of the gene at the jth locus, $u_{xi}$ is the additive gene effect at the ith
locus, $u_{xj}$ is the additive gene effect at the jth locus, $N_e$ is the effective population size, $k$
is the standardised selection differential, $\gamma$ is the probability of recombination between
the ith and jth loci and $n$ is the number of successive generations of selfing from the
reference population. The standard deviation of the selection criterion ($\sigma_{\hat{X}}$) can be
calculated according to Equation (4.7),

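$$\sigma_{\hat{X}} = \sqrt{\sigma_{(z)}^2 + \frac{\sigma_{(za)}^2}{v} + \frac{\sigma_{(zb)}^2}{t} + \frac{\sigma_{(zab)}^2}{vt} + \frac{\sigma_{(\varepsilon)}^2}{\eta vt}}, \tag{4.7}$$

where, $\sigma_{(za)}^2 = \sigma_{(zb)}^2 = \frac{\sigma_{(z)}^2}{4}$ and $\sigma_{(zab)}^2 = \frac{\sigma_{(z)}^2}{2}$ were the ratios used by Comstock for
comparing different breeding strategies, $\sigma_{(z)}^2$ is the average variance of values of the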

genotypes of the genetic population, $\sigma_{(za)}^2$ is the average effect of the interaction
between families and locations, $\sigma_{(zb)}^2$ is the average effect of the interaction between
families and years, $\sigma_{(zab)}^2$ is the average effect of the interaction between families,
locations and years, $\sigma_{(\varepsilon)}^2$ is the residual error and contains the within family variance,
G×E interaction variance and a constant error (depending on the heritability), and $\eta$ is
the number of replications at each of $v$ locations in each of $t$ years. Equation (4.8) gives
the genetic variance among Sn families, assuming linkage equilibrium in the reference
population, that is used for calculating the standard deviation of the selection criterion
(for the purposes of this thesis, S1 families and Sn→∞ (DH lines) ≈ S∞ lines are considered).

$$\begin{aligned} \sigma_{(z)}^2 = {} & \sum_i 2q_i(1 - q_i)\left\{(1 + h) + (1 - h)(1 - 2q_i)a_{xi} + \frac{(1 - h)\left[1 - 2q_i(1 - q_i)(1 - h)\right]a_{xi}^2}{4}\right\}u_{xi}^2 \\ & + 2\sum_i \sum_{j > i} q_i(1 - q_i)\,q_j(1 - q_j)\left[B - (1 - h)^2\right]a_i u_i a_j u_j, \end{aligned} \tag{4.8}$$

where, $B = \left[\dfrac{1 - 2\gamma + 2\gamma^2}{2}\right]^{n-1}$.
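A short numerical check helps to show how the linkage term B behaves. The sketch below is illustrative only: the function names are hypothetical, and it assumes the readings h = 1 − (1/2)^(n−1) for the inbreeding coefficient of the parents and B = [(1 − 2γ + 2γ²)/2]^(n−1), where γ is the probability of recombination between the two loci. Under these assumptions B equals (1 − h)² when the loci recombine freely (γ = 0.5) and exceeds it when they are linked.

```python
def inbreeding_coefficient(n):
    """Inbreeding coefficient of the parents of Sn families (assumed form)."""
    return 1.0 - 0.5 ** (n - 1)


def linkage_term_b(gamma, n):
    """B term for two loci with recombination probability gamma (assumed form)."""
    return ((1.0 - 2.0 * gamma + 2.0 * gamma ** 2) / 2.0) ** (n - 1)


if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        h = inbreeding_coefficient(n)
        # Free recombination versus tight linkage, compared with (1 - h)^2
        print(n, linkage_term_b(0.5, n), linkage_term_b(0.05, n), (1.0 - h) ** 2)
```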

Firstly, consider the development of the mass selection prediction equation.

Equation (4.9) is based on selection units that are non-inbred individuals or families of non-inbred
individuals; it sums over all loci and includes the effects of effective population size
($N_{mass} = N_e + \tfrac{1}{2}$) and linkage disequilibrium,

( ) ( ) ( ) ( ) ( ) ( )

( )( ) ( ) ( )

2 21 1

ˆ2 2

12

12 1 1 1 2 12 2

11 1 2 ,

xi i i i xi xii X

i i xi xij xj xjij

j i

Z kE y q q q a uN N

q q a upt rs q a u

N

σ

⎡ ⎤⎧ ⎫⎛ ⎞⎪ ⎪ ⎢ ⎥⎡ ⎤Δ = − + − − −⎜ ⎟⎨ ⎬⎣ ⎦ ⎜ ⎟ ⎢ ⎥+ +⎪ ⎪⎝ ⎠⎩ ⎭ ⎢ ⎥⎣ ⎦− ⎡ ⎤− + − + −⎣ ⎦+

∑ (4.9)

where, $Z \approx a_{xi}\left\{4(1 - 2q_i) + \left[1 - 5q_i(1 - q_i)\right]a_{xi}\right\}$.

For the mass strategy, selection is based on individuals; therefore, there is no within
family variance included in the residual error, $\sigma_{(\varepsilon)}^2$. The genetic variance for the mass
selection strategy was not explicitly outlined in Comstock (1996). However, as there is
no family structure in the mass strategy, $\sigma_{(z)}^2$ can be calculated by adding the additive
and dominance genetic variances together, assuming linkage equilibrium and no epistasis
(Equation (4.10)).

$$\sigma_{(z)}^2 = \sigma_g^2 + \sigma_d^2 = \sum_i 2q_i(1 - q_i)\left[1 + (1 - 2q_i)a_{xi}\right]^2 u_{xi}^2 + \sum_i 4q_i^2(1 - q_i)^2 a_{xi}^2 u_{xi}^2. \tag{4.10}$$
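The additive and dominance components that are summed for the mass strategy can be computed locus by locus. The sketch below is illustrative and rests on stated assumptions: the function name is hypothetical, each locus is described by its favourable allele frequency q, dominance effect a (expressed relative to the additive effect) and additive effect u, and the two components take the standard single-locus forms 2q(1 − q)[1 + (1 − 2q)a]²u² and 4q²(1 − q)²a²u² under linkage equilibrium with no epistasis.

```python
def mass_genetic_variance(loci):
    """Return (additive, dominance) genetic variance summed over loci.

    loci: iterable of (q, a, u) tuples, one per locus.
    """
    loci = list(loci)
    var_g = sum(2 * q * (1 - q) * (1 + (1 - 2 * q) * a) ** 2 * u ** 2 for q, a, u in loci)
    var_d = sum(4 * q ** 2 * (1 - q) ** 2 * a ** 2 * u ** 2 for q, a, u in loci)
    return var_g, var_d


if __name__ == "__main__":
    # Ten additive genes at a frequency of 0.5, as in an F2 reference population
    print(mass_genetic_variance([(0.5, 0.0, 1.0)] * 10))
```

For an additive model at a gene frequency of 0.5 the dominance component is zero, and each gene contributes u²/2 to the additive variance.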

The S1 family selection strategy was based on parents that were non-inbred

members of the reference population. Equation (4.11) is a simplification of equation

(4.6) where the inbreeding coefficient of the parents was zero (i.e. h = 0). With the S1

family effective population size 1 2 12

S e bb

MN+

⎛ ⎞⎟⎜ ⎟=⎜ ⎟⎜ ⎟⎟⎜⎝ ⎠ substituted in,

( ) ( ) ( )( ) ( )

( )

( )

( )( ) ( )

( )

2 1 2 12 2

2 12

2

ˆ

(1 2 ) 12 1 1 1 1 2 12 2 2

1 211 ,

2

b bb b

bb

i xixi i i i xi xi

M Mi

j xji i xi xixjijM j iX

q a Zy q q q a u

q aq q a uk pt rs uπσ

+ +

+ ≠

⎧ ⎫⎛ ⎞⎪ ⎪⎟⎪ ⎪⎜ ⎟⎪ ⎪⎜⎡ ⎤− ⎟⎪ ⎜ ⎪⎪ ⎪⎟⎡ ⎤⎢ ⎥Ε Δ = − + + − − − ×⎜⎨ ⎬⎟⎜⎣ ⎦ ⎟⎢ ⎥⎪ ⎪⎜ ⎟⎣ ⎦⎪ ⎪⎟⎜ ⎟⎪ ⎪⎜ ⎟⎜⎝ ⎠⎪ ⎪⎪ ⎪⎩ ⎭⎡ ⎤ ⎡ ⎤−−⎢ ⎥ ⎢ ⎥⎢ ⎥ − + − + +⎢ ⎥⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎣ ⎦⎣ ⎦

(4.11)

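where, $Z \approx a_{xi}\left\{3(1 - 2q_i) + 2\left[1 - 5q_i(1 - q_i)\right]a_{xi}\right\}$,

$$\sigma_{\hat{X}} = \sqrt{\sigma_{(z)}^2 + \frac{\sigma_{(za)}^2}{v} + \frac{\sigma_{(zb)}^2}{t} + \frac{\sigma_{(zab)}^2}{vt} + \frac{\sigma_{(\varepsilon)}^2}{rvt}},$$

$$\sigma_{(\varepsilon)}^2 = \tfrac{1}{2}\sigma_g^2 + \tfrac{1}{2}\sigma_d^2 + \tfrac{1}{2}\sigma_{(za)}^2 + \tfrac{1}{2}\sigma_{(zb)}^2 + \tfrac{1}{2}\sigma_{(zab)}^2 + \sigma_{(e)}^2,$$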

where, M is the number of S0 plants selected from the reference population based on S1

family performance, b is the number of reserve seed (S1 individuals within each of the

selected S1 families derived from the selected S0 individuals) randomly mated per S0

plant selected, and $\sigma_{(e)}^2$ is a constant error variance term.

As for S1 family selection the DH line selection strategy also utilised parents that

were non-inbred members of the reference population. However, in the case of DH

lines, a single gamete was selected from each individual in the reference population then

doubled to homozygosity. These DH lines were evaluated and selection was made

among the S0 derived DH lines. For the selected lines there is no within family genetic

variance. Equation (4.12) is given as a simplification of Equation (4.6), where the
inbreeding coefficient of the parents is assumed to be 1 (h = 1) and the DH line effective
population size ($N_{e\,DH} = \frac{M}{2}$) is substituted in,

( ) ( ) ( ) ( ) ( )

( )

( )( ) ( ) ( )

2

2 2

ˆ 2

12 1 2 1 1 2 12 2

11 ,

xi i i i xi xiM Mi

i i xi xixjijM

j iX

Zy q q q a u

q q a uk pt rs uπσ ≠

⎧ ⎫⎛ ⎞⎪ ⎪⎡ ⎤Ε Δ = − + − − − ×⎜ ⎟⎨ ⎬⎣ ⎦⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭⎡ ⎤ −⎢ ⎥ − + − +⎢ ⎥⎢ ⎥⎣ ⎦

∑ (4.12)

where, $Z \approx a_{xi}\left\{4(1 - 2q_i)\right\}$,

$$\sigma_{\hat{X}} = \sqrt{\sigma_{(z)}^2 + \frac{\sigma_{(za)}^2}{v} + \frac{\sigma_{(zb)}^2}{t} + \frac{\sigma_{(zab)}^2}{vt} + \frac{\sigma_{(e)}^2}{rvt}}.$$

The DH line prediction equation does not include any within family variances in the
residual error ($\sigma_{(\varepsilon)}^2$), as the individuals within a DH line are all genetically identical.

4.3.1.2 Simulating mass, S1 family and DH line selection methods

Following the procedures given in Chapter 3, an application module for QU-GENE
was developed to test the convergence between the Basic and Comstock

response to selection prediction equations against the simulated response to selection

using the QU-GENE PEQ simulation module. The PEQ module (Figure 4.4) simulates

the mass, S1 family and DH line selection strategies simultaneously from a single F2

reference population. The simulation module calculates the mean and standard deviation

of the change in the F2 or S0 population mean for each strategy after one cycle of

selection. To create the F2 base reference population (equivalent to the S0 as discussed

above for the S1 families and DH lines), two genotypically different parents with genes

either in coupling or repulsion phase linkage were first crossed to produce the F1. The F1

was self-pollinated (selfed) to produce the F2 or S0 reference population. The mass, S1

family and DH line selection strategies were then applied to the reference population for

one cycle of selection. An option was included in the PEQ module to allow the user to
apply a chosen number of generations of random mating to the F2 reference population
to remove the effect of linkage disequilibrium, based on the relationships defined in
Section 4.2.2.1 (Figures 4.2 and 4.3).

[Figure 4.4: flow charts. (a) Mass: AABB × aabb → AaBb → F2 base population (1000) → number of generations of random mating → phenotypic evaluation of all individuals → intermate selected individuals → mean and standard deviation calculated. (b) S1 family and DH line: AABB × aabb → AaBb → S0 base population (1000) → number of generations of random mating → self or double → reserve seed (b) and phenotypic evaluation (no. progeny tested per S0 plant (f), no. locations) → intermate selected individuals → mean and standard deviation calculated.]

Figure 4.4 Schematic outline of the PEQ module, (a) mass selection strategy, (b) S1 family (self) and DH line (double) strategy. This example shows a two gene model in coupling with a base population size of 1000 individuals

For the mass strategy each individual F2 plant phenotype was evaluated and the

individuals with the largest phenotype values were selected (Figure 4.4a). The selected

plants were then taken from the F2 base population and randomly mated to create the

new base population. For S1 family selection each individual S0 plant was selfed to

produce the S1 seed that represents an S1 family derived from an S0 individual (Figure

4.4b). A component of this S1 seed was designated and kept as reserve seed (b as

defined in Equation (4.11)) while the remainder of the seed (Figure 4.4b, No. progeny

tested per S0 plant) is used to measure the phenotype of the S1 family. The individuals

were phenotypically evaluated at a number of locations (Figure 4.4b, No. locations).

The selection proportion determined how many of the high performing S1 families were

selected. The reserve seed of the selected S1 families was randomly mated to create the

new base population and the mean of the progeny was calculated.

For the DH line selection strategy a random gamete from each of the F2 plants

was doubled to produce a doubled haploid seed (Figure 4.4b). A component of the DH

seed (only one seed is needed in the simulation since all individuals were assumed to be

genetically identical homozygotes) was designated as reserve seed (b) while the

remainder of the seed (Figure 4.4b, No. progeny tested per S0 plant) is used to measure

the phenotype of the DH line at a number of locations (Figure 4.4b, No. locations). The

DH lines were selected on phenotypic performance and the reserve seed of the selected

plants was used to conduct one cycle of random mating to create the new base plant

population. The mean of the new base population was recorded.

4.3.1.2.1 Investigating convergence of expectation from prediction theory and simulation

Simulation experiments were conducted using the PEQ module to compare

simulation results with the expected results based on the response to selection prediction

equations. Only additive models were considered in the cases presented in this Section.

The experimental variables examined in the first experiment are defined in Table 4.1.

The three selection strategies mass, S1 families and DH lines were examined. Three

levels of gene number were tested over four heritability levels. Heritability was

calculated by using an error that was proportional to half, equal to and twice the additive


genetic variance, as well as a heritability of h2 = 1.0. The number of progeny tested per

F2 or S0 plant, and reserve seed values were altered to add replication into the

experiment (Table 4.1).
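The heritability levels in Table 4.1 follow from the stated error variances. A minimal sketch, assuming heritability is defined on a single-plant basis as h2 = VA / (VA + VE) under the additive model (the function name and the unit additive variance are illustrative and are not taken from the PEQ module):

```python
def heritability(var_additive, var_error):
    """Narrow-sense heritability for a purely additive model (assumed definition)."""
    return var_additive / (var_additive + var_error)


VAR_A = 1.0  # arbitrary additive genetic variance; only the ratio VE/VA matters

# Error variance expressed as a multiple of the additive genetic variance.
for label, ratio in [("half", 0.5), ("equal", 1.0), ("twice", 2.0), ("no error", 0.0)]:
    print(f"error = {label:8s} -> h2 = {heritability(VAR_A, ratio * VAR_A):.3f}")
```

Under this assumption the four cases give h2 of about 0.67, 0.50, 0.33 and 1.0, which correspond to the levels 0.66, 0.5, 0.3 and 1.0 listed in Table 4.1.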

Table 4.1 Experimental variable levels defined in the PEQ module to compare the response to selection from simulation and expectations from prediction equations

Experimental variable                        Levels
F2 or S0 population size                     1000
Selection strategy                           mass, S1, DH
Gene action                                  additive
No. progeny per F2 or S0 plant (f)           1, 10
No. reserve seed (b)                         1, 10
Linkage types                                coupling
Per meiosis recombination fraction (c)       0.5
Selection proportion                         0.2
No. of genes                                 2, 10, 50
Heritability (h2)                            0.3, 0.5, 0.66, 1.0

4.3.1.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium

The PEQ module provided an option to examine the effect of linkage disequilibrium on response to selection for the three breeding strategies. Because simulation is more flexible than the theoretical equations, it was possible to observe the effects of linkage disequilibrium for both coupling and repulsion phase linkage, and the effect that a given number of generations of random mating had on approaching equilibrium. This

provided an independent verification of the results of Section 4.2 where, for a given per

meiosis recombination fraction, the number of generations of random mating required to

achieve equilibrium was determined. If the expected number of generations of random

mating required to achieve linkage equilibrium from Section 4.2 were correct, then the

simulations in this Section, using random mating to remove the effect of linkage

disequilibrium, should produce the same mean result as the prediction equations under

the assumption of linkage equilibrium. This simulation experiment was conducted for

mass, S1 family and DH line selection strategies with the experimental variables

outlined in Table 4.2.

The results of Section 4.2.2.1 indicated that removing the effects of linkage

disequilibrium associated with a per meiosis recombination fraction of c = 0.05 required


approximately 75 generations of random mating (Figure 4.3 and Table 4.3). Thus, in the

simulation experiments conducted in this Section, 75 generations of random mating

were conducted prior to selection to remove the linkage disequilibrium effect caused by

a per meiosis recombination fraction of c = 0.05.

Table 4.2 Experimental variable levels used in the PEQ module to verify linkage equilibrium results from Section 4.2

Experimental variable                        Levels
F2 or S0 population size                     1000
Selection strategy                           mass, S1, DH
Gene action                                  additive
No. progeny tested per F2 or S0 plant (f)    10
No. reserve seed (b)                         10
Linkage types                                coupling, repulsion
Per meiosis recombination fraction           0.05
Selection proportion                         0.2
Generations of random mating                 0, 40, 80
No. of genes                                 10
Heritability                                 0.3, 0.5, 0.66, 1.0

Table 4.3 Average number of generations of random mating (RM) required to reach linkage equilibrium (observed recombination fraction, R = 0.5) for three per meiosis recombination fractions (based on linkage in coupling over 500 runs). Results from Figure 4.3

Per meiosis recombination fraction (c)       Average number of generations of RM ± standard error
0.5                                          2 ± 0.11
0.05                                         75 ± 2.18
0.005                                        544 ± 18.32
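The pattern in Table 4.3 can be related to the standard deterministic result that, under random mating, linkage disequilibrium between two loci decays by a factor of (1 - c) per generation. The sketch below uses that textbook relationship, which is an assumption here and not necessarily the prediction equation of Section 4.2; the function names are illustrative.

```python
from math import ceil, log


def ld_fraction_remaining(c, generations):
    """Fraction of the initial disequilibrium left after random mating: (1 - c)^t."""
    return (1.0 - c) ** generations


def generations_to_reduce_ld(c, fraction):
    """Smallest whole number of generations t with (1 - c)^t <= fraction."""
    return ceil(log(fraction) / log(1.0 - c))


# Per meiosis recombination fractions and average generations reported in Table 4.3.
for c, t in [(0.5, 2), (0.05, 75), (0.005, 544)]:
    left = ld_fraction_remaining(c, t)
    print(f"c = {c}: {left:.3f} of the initial LD left after {t} generations")
```

Tighter linkage (smaller c) slows the decay, which is consistent with the much larger number of generations reported for c = 0.005.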

4.3.2 Results

4.3.2.1 Response to selection prediction equations

4.3.2.1.1 Investigating convergence of expectation from prediction theory and simulation

The response to selection was calculated for three selection strategies: mass

selection (Figure 4.5), S1 family selection (Figure 4.6) and the DH line selection

strategy (Figure 4.7). Each individual graph illustrates the response to selection for a

specified gene level (i.e. genetic models based on two, 10 and 50 genes) against four

heritability levels, two levels of reserve seed and two levels of number of progeny tested

per F2 or S0 plant (or level of replication within a plot). The variables reserve seed (b) and

number of progeny tested per F2 plant (f), set at b = f = 1 and b = f = 10, do not apply to mass


selection as there is no replication and selection is based on individuals. The simulation

was considered to produce the same response to selection as the prediction equations if

the prediction equations fell within the simulation standard deviation bars. For the

simulation, the response to selection was estimated as the difference between the

simulated reference population mean before and after one cycle of selection. The

predicted response to selection was determined using the Basic prediction equations

(Section 4.3.1.1.1) and the Comstock prediction equations (Section 4.3.1.1.2).
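For orientation, the expected response under truncation selection on a normally distributed trait is commonly written as R = i h2 σP, where i is the standardised selection intensity. The sketch below computes i for the selection proportion of 0.2 used here and applies that general form. It illustrates the standard breeder's equation and is not claimed to reproduce Equation 4.3 or Equation 4.9 exactly; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(proportion):
    """Standardised selection intensity i for truncation selection on a normal trait."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion)  # truncation point in SD units
    return std_normal.pdf(threshold) / proportion


def predicted_response(proportion, h2, var_additive):
    """Breeder's equation R = i * h2 * sigma_P, with sigma_P^2 = var_additive / h2."""
    sigma_p = sqrt(var_additive / h2)
    return selection_intensity(proportion) * h2 * sigma_p


print(round(selection_intensity(0.2), 2))  # about 1.40 when the top 20% are selected
```

Because R = i h σA under these assumptions, the predicted response rises with heritability for a fixed additive variance, which is the trend against which the simulated responses are compared.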

For mass selection, when two genes were present in the genetic model, the prediction equations and the simulation produced the same response to selection at the lower heritabilities (Figure 4.5a). With a heritability of h2 = 1.0 the prediction equation response was slightly higher than the simulated response. Further investigation of the cases where h2 = 1.0 found that the population phenotypes were not normally distributed in the simulation experiment (Appendix 1, Section A1.3) and that the superior homozygotes were selected in one cycle, both of which contributed to the discrepancy between the prediction equations and the simulations. As the number of genes in the genetic model increased to 10 (Figure 4.5b) and 50 (Figure 4.5c), the prediction equations and the simulation produced the same response.
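The agreement between simulation and prediction for larger gene numbers can be illustrated with a small stand-alone simulation. The sketch below is not QU-GENE or the PEQ module: it assumes unlinked additive loci with allele frequency 0.5, one trait unit per favourable allele, normally distributed error, and that the progeny mean after random mating equals the mean genotypic value of the selected parents. All names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_mass_selection(n_genes, h2, pop_size=1000, proportion=0.2, rng=None):
    """One cycle of mass selection; returns the response in favourable-allele units."""
    rng = rng or random.Random()
    n_alleles = 2 * n_genes
    var_a = n_alleles * 0.25            # additive variance at allele frequency 0.5
    sd_e = sqrt(var_a * (1.0 - h2) / h2)  # error SD implied by single-plant h2
    # Genotypic value = number of favourable alleles carried by the plant.
    genotypes = [bin(rng.getrandbits(n_alleles)).count("1") for _ in range(pop_size)]
    phenotypes = [g + rng.gauss(0.0, sd_e) for g in genotypes]
    n_selected = round(pop_size * proportion)
    ranked = sorted(range(pop_size), key=phenotypes.__getitem__, reverse=True)
    selected = ranked[:n_selected]
    # Additive model: expected progeny mean = mean genotypic value of the parents.
    return mean(genotypes[i] for i in selected) - mean(genotypes)


def expected_response_normal(n_genes, h2, proportion=0.2):
    """Normal-theory expectation R = i * h * sigma_A for the same model."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - proportion)
    intensity = std_normal.pdf(z) / proportion
    sigma_a = sqrt(2 * n_genes * 0.25)
    return intensity * sqrt(h2) * sigma_a


if __name__ == "__main__":
    rng = random.Random(2005)
    for n_genes in (2, 10, 50):
        runs = [simulate_mass_selection(n_genes, 0.5, rng=rng) for _ in range(20)]
        print(n_genes, round(mean(runs), 2), round(expected_response_normal(n_genes, 0.5), 2))
```

Averaging over repeated runs, as done here, mirrors the comparison of a mean simulated response against the prediction; the run-to-run spread plays the role of the standard deviation bars.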

Both the number of progeny tested per S0 plant and the number of reserve seed

are factors that can influence the response for S1 family selection. With the two-gene

genetic model and the reserve seed and number of progeny tested per S0 plant set at b =

f = 1, both the prediction equations and simulation produced the same response to

selection (Figure 4.6a). When the reserve seed and number of progeny tested per S0

plant were increased to b = f = 10 the response to selection increased at the lower

heritability levels. Simulation and prediction equations produced the same response to

selection at the lower heritability. With the higher heritabilities for the two-gene model

the simulation produced a slightly lower response to selection compared to the

expectations based on the prediction equations (Figure 4.6b).


[Figure 4.5: three panels, (a) E(NK) = 1(2:0), (b) E(NK) = 1(10:0), (c) E(NK) = 1(50:0). x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.5 Response to selection for the mass selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.3) and Comstock prediction equation (Com, Equation 4.9). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference F2 population size of 1000, additive gene action, and linkage equilibrium

As discussed for mass selection, in the case of the two-gene model, at the higher

levels of heritability the distribution of phenotypic values of the S0 families was not

normal and superior families dominated. Double homozygotes for the favourable alleles

were predominantly selected in the single cycle of selection. As the number of genes in

the model was increased to N = 10 (Figure 4.6c and Figure 4.6d) and N = 50 (Figure 4.6e

and Figure 4.6f) the prediction equations and simulation produced the same response to


selection when the reserve seed and number of progeny tested per S0 plant were both b

= f = 1 and b = f = 10. When the reserve seed and number of progeny tested per S0 plant

were b = f = 10 the overall response to selection was higher than when the reserve seed

and number of progeny tested per S0 plant was b = f = 1, especially with the higher

number of genes in the genetic models (Figure 4.6d and f cf. Figure 4.6c and e).

[Figure 4.6: six panels, (a) E(NK) = 1(2:0), b = 1, f = 1; (b) E(NK) = 1(2:0), b = 10, f = 10; (c) E(NK) = 1(10:0), b = 1, f = 1; (d) E(NK) = 1(10:0), b = 10, f = 10; (e) E(NK) = 1(50:0), b = 1, f = 1; (f) E(NK) = 1(50:0), b = 10, f = 10. x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.6 Response to selection for the S1 family selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.4) and Comstock prediction equation (Com, Equation 4.11). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference S0 population size of 1000, additive gene action, and linkage equilibrium. f is the number of progeny tested per S0 plant (level of replication) and b is the number of reserve seed intermated to create the reference population after selection


For DH line selection both the reserve seed and number of progeny tested per S0

plant influenced the response to selection for the two-gene model. When the reserve

seed and number of progeny tested per S0 plant were b = f = 1, the prediction equations and the simulation produced the same response to selection at the lowest and highest heritability levels; however, the simulation produced a higher response than the prediction equations at the intermediate heritability levels (Figure 4.7a). This divergence was further extended when the reserve seed and number of progeny tested per S0 plant were increased to b = f = 10 (Figure 4.7b), where the simulation produced a greater response to selection than the prediction equations at the low heritability. As the number of genes in the model was

increased to N = 10 (Figure 4.7c and Figure 4.7d) and N = 50 (Figure 4.7e and Figure

4.7f) the prediction equations and simulation produced the same responses. Therefore,

as for mass selection and S1 family selection, there were deviations between the

simulation results and the expectations of the prediction equations for the two-gene

model case.

The assumption of a normally distributed quantitative trait did not hold for the

two-gene model, particularly when the heritability of the trait approached h2 = 1.0.

Under these circumstances deviations were observed between the expectations of the

prediction equations and the outcomes of the simulations for mass, S1 family and DH

line selection. For the 10-gene model and 50-gene model cases, when the assumption of

a normally distributed trait held (Appendix 1, Section A1.3) there was convergence

between the response to selection obtained from the simulation and the expectations of

the prediction equations.
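The effect of a small gene number on the normality assumption can be made concrete with an exact calculation. The sketch below assumes h2 = 1.0, unlinked additive loci with allele frequency 0.5 and one trait unit per favourable allele, so that the genotypic value follows a binomial distribution. It compares the exact selection differential for the top 20% (ties split at random) with the value expected under a normal distribution. These assumptions and the function names are illustrative and are not taken from the thesis.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_selection_differential(n_genes, proportion):
    """Exact mean superiority of the selected fraction for a binomial trait."""
    n_alleles = 2 * n_genes
    probs = [comb(n_alleles, k) / 2 ** n_alleles for k in range(n_alleles + 1)]
    remaining = proportion
    selected_sum = 0.0
    # Fill the selected fraction from the best genotypic class downwards.
    for k in range(n_alleles, -1, -1):
        take = min(probs[k], remaining)
        selected_sum += take * k
        remaining -= take
        if remaining <= 0:
            break
    return selected_sum / proportion - n_alleles / 2


def normal_selection_differential(n_genes, proportion):
    """Selection differential i * sigma expected if the trait were normal."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - proportion)
    intensity = std_normal.pdf(z) / proportion
    return intensity * sqrt(2 * n_genes * 0.25)


for n in (2, 10, 50):
    exact = exact_selection_differential(n, 0.2)
    normal = normal_selection_differential(n, 0.2)
    print(f"N = {n:2d}: exact = {exact:.3f}, normal theory = {normal:.3f}")
```

For the two-gene case under these assumptions the exact differential is 1.3125 trait units against roughly 1.40 from the normal approximation, so the normal-theory value is the larger of the two, in line with the direction of the discrepancy reported for h2 = 1.0.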


[Figure 4.7: six panels, (a) E(NK) = 1(2:0), b = 1, f = 1; (b) E(NK) = 1(2:0), b = 10, f = 10; (c) E(NK) = 1(10:0), b = 1, f = 1; (d) E(NK) = 1(10:0), b = 10, f = 10; (e) E(NK) = 1(50:0), b = 1, f = 1; (f) E(NK) = 1(50:0), b = 10, f = 10. x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.7 Response to selection for the DH line selection strategy for the simulation (Sim), with standard deviation bars, Basic prediction equation (Basic, Equation 4.5) and Comstock prediction equation (Com, Equation 4.12). Response was assessed in one environment (E = 1) with three gene levels (N = 2, 10, 50) and no epistasis (K = 0), with a reference S0 population size of 1000, additive gene action, and linkage equilibrium. f is the number of progeny tested per S0 plant (level of replication) and b is the number of reserve seed intermated to create the reference population after selection

A number of other cases where the assumption of a normally distributed trait

may not hold can be examined using simulation. In particular the influence of the

number of genes and dominance on the distribution of the trait phenotypes was

examined. The results of these investigations are summarised in Appendix 1, Section

A1.3. As in the case of the two-gene model here, when the assumption of normality was


violated there were deviations between the results of the simulation and the expectations

from the prediction equations.

4.3.2.2 Verifying the number of generations of random mating required to reach linkage equilibrium

The presence of linkage disequilibrium in the reference population can result

from non-random association of alleles at different loci in the founding individuals that

give rise to the reference population. The presence of linkage disequilibrium can

produce a deviation between the simulation and the predictions based on the assumption

of linkage equilibrium. Randomly mating the reference population should reduce

linkage disequilibrium and reduce or remove any discrepancy between the simulation

and the prediction equations. For the simulation experiment a per meiosis recombination

fraction of c = 0.05 was defined, and three levels of generations of random mating were

conducted to determine the effect of random mating on reducing linkage disequilibrium

in the population. Based on the results of Section 4.2 (summarised in Table 4.3) after 80

generations of random mating, the simulation and prediction equations (with the

prediction equations using an observed recombination fraction of R = 0.5) should

produce the same response to selection. Both coupling and repulsion phase linkage

associations were considered.

For the mass selection strategy with zero generations of random mating the

simulated response was higher for coupling phase linkage (Figure 4.8a) and lower for

repulsion phase linkage (Figure 4.8b) than the prediction equations due to the effect of

linkage disequilibrium. As the number of generations of random mating increased to 40,

the simulation response approached the prediction equation as the effect of linkage

disequilibrium decreased (Figure 4.8c and Figure 4.8d). After 80 generations of random

mating, linkage equilibrium was approached and the results of the simulation were the

same as the expectations based on the prediction equations, for both coupling and

repulsion phase linkage, as the prediction equations fell inside the standard deviation

bars of the simulation (Figure 4.8e and Figure 4.8f). Heritability affected the response to

selection for both simulation and prediction equations. As the heritability increased the

response to selection increased.


[Figure 4.8: six panels for E(NK) = 1(10:0). Rows: (a, b) 0 RM; (c, d) 40 RM; (e, f) 80 RM. Columns: coupling (a, c, e) and repulsion (b, d, f). x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.8 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the mass selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage

For the S1 family selection strategy with zero generations of random mating the

simulated response was higher for coupling phase linkage (Figure 4.9a) and lower for

repulsion phase linkage (Figure 4.9b) than the expectation based on the prediction


equations, due to the effect of the linkage disequilibrium in the reference population. As

the number of generations of random mating was increased to 40 the simulation

response approached the prediction equations as the effect of linkage was reduced

(Figure 4.9c and Figure 4.9d).

[Figure 4.9: six panels for E(NK) = 1(10:0). Rows: (a, b) 0 RM; (c, d) 40 RM; (e, f) 80 RM. Columns: coupling (a, c, e) and repulsion (b, d, f). x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.9 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the S1 family selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage


After 80 generations of random mating, linkage equilibrium was more closely

approached and the simulation produced the same response to selection as the prediction

equations for both coupling and repulsion phase linkage, as the prediction equations fell

inside the standard deviation bars of the simulation (Figure 4.9e and Figure 4.9f).

[Figure 4.10: six panels for E(NK) = 1(10:0). Rows: (a, b) 0 RM; (c, d) 40 RM; (e, f) 80 RM. Columns: coupling (a, c, e) and repulsion (b, d, f). x-axis: Heritability; y-axis: Response to selection (trait units); series: Basic, Com, Sim.]

Figure 4.10 Random mating reduced the effect of linkage disequilibrium for a per meiosis recombination fraction of c = 0.05 to reach an observed linkage equilibrium of R = 0.5 for the response to selection of the simulation (Sim) for the DH line selection strategy. Response to selection for the Basic (Basic) and Comstock (Com) prediction equations are the same across all plots and assume linkage equilibrium. A one environment (E = 1), 10 gene (N = 10) and no epistasis (K = 0) genetic model was tested. A reduction in linkage disequilibrium was observed for both coupling and repulsion phase linkage


The heritability parameter had little effect on the response to selection for both

simulation and prediction equations as S1 family selection involves replication which

resulted in an increase in the effective heritability of the differences among the family

means.

With zero generations of random mating for the DH line selection, the simulation response was higher for the coupling phase linkage (Figure 4.10a) and lower for repulsion phase linkage (Figure 4.10b) than the expectations based on the prediction equations due to the effect of linkage disequilibrium. As the number of generations of random mating was increased to 40, the simulation response approached the expectations of the prediction equation as the effect of linkage was reduced (Figure 4.10c and Figure 4.10d). After 80 generations of random mating, the population was closer to linkage equilibrium and the simulation produced the same response to selection as the prediction equations for both coupling and repulsion phase linkage, as the prediction equations fell inside the standard deviation bars of the simulation (Figure 4.10e and Figure 4.10f). The average response to selection of the simulations tended to be slightly lower than the prediction equations for DH lines. This could possibly be due to a small loss of alleles through genetic drift during the 80 generations of random mating and the creation of the DH lines. Heritability had little effect on the response to selection for both simulation and prediction equations, as DH line selection also involved replication, which increased the effective heritability of the differences among the family means.

For the three selection strategies considered, 80 generations of random mating

was able to reduce the linkage disequilibrium present in the base population to a point

where the simulation and prediction equations produced similar results. This was

observed for both coupling and repulsion phase linkage associations and was within the

expectations based on the findings from Section 4.2.2.1.

4.4 Discussion

By conducting a simulation with the same parameters and assumptions as those

held by prediction equations, both methods could be compared on their consistency to


predict similar outcomes. For the recombination prediction equation, the method of

modelling recombination by QU-GENE conformed to the expectations of the theory.

The limitations of theory, however, were reinforced as simulation was able to estimate

the number of generations required to reach true linkage equilibrium (R = 0.5), whereas

the prediction equation could test only to an observed recombination fraction of R = 0.4.

The response to selection results showed that when simple models were examined (the point at which the assumptions held by the prediction equation match the simulation), the simulation produced the same answer as the prediction equation. As more complex genetic models were tested it was not possible to compare the prediction equation and simulation results, as the assumptions held by the prediction equation failed and no longer matched the simulation. This generally resulted in the prediction equations over-estimating the response to selection compared with the simulation method and, for the two-gene model, predicting a response to selection that is not possible (Appendix 1, Section A1.3). The recombination work was also verified by

demonstrating that the predicted number of 80 generations of random mating was

sufficient to reduce a set amount of linkage disequilibrium in a population (and create a

population in linkage equilibrium) to the point where the simulations produced the same

response to selection as the prediction equations.

There were important consistencies and discrepancies between the response to

selection observed for the prediction equations and simulation results. Cases where the

simulation and prediction equations were not consistent occurred with the two-gene

model and a heritability of h2 = 1.0 for mass selection and S1 family selection and all

heritability levels for DH line selection. The experiment conducted to examine the

frequency of genotypes in the F2 base population for additive, partial dominance and

complete dominance models (Appendix 1, Section A1.3), illustrates other instances

where a deviation between the simulation results and the prediction equation results was

observed with the prediction equations consistently over-estimating the response to

selection. Deviations between the simulation and prediction equation results coincided

with departures from the additive model (Appendix 1, Figure A1.3). The assumption of

the base population phenotypic values having a normal distribution is a common and

important assumption as the theoretical applied selection intensity value depends on this

assumption. However, in most cases it was an invalid assumption when dominance was


present in the genetic models. This caused a problem with estimates based on the

prediction equations and created inconsistencies between the prediction and simulation

results, with the degree of inconsistency depending on the skewness of the distribution.

Discrepancies can be large for simulation of finite locus models based on small gene

numbers. Some researchers, however, argue that the normal distribution assumption can

be substantiated using the central limit theorem (Ronningen 1976).

It is also important to note the effect the number of reserve seed and number of

progeny tested per S0 or F2 plant had on response to selection. When the number of

reserve seed and number of progeny tested per S0 or F2 plant was increased from 1 to

10, there was a significant increase in the response to selection observed at the low heritability. Testing more progeny per S0 or F2 plant added replication, which increased the heritability on a family-mean basis and contributed towards a higher response to selection.
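The effect of replication on heritability can be sketched with the usual family-mean expression, in which the error variance is divided by the number of progeny tested per family. The sketch assumes that all genetic variance is among families and that only the error variance is reduced by replication (a simplification for segregating S1 families); it is not necessarily the parameterisation used in the prediction equations of Section 4.3.1.1, and the function name is hypothetical.

```python
def family_mean_heritability(h2_single, n_progeny):
    """Heritability of family means when error variance is averaged over n_progeny."""
    var_g = h2_single        # genetic variance, with phenotypic variance scaled to 1
    var_e = 1.0 - h2_single  # single-plant error variance on the same scale
    return var_g / (var_g + var_e / n_progeny)


# Single-plant heritability levels from Table 4.1, with f = 1 and f = 10.
for h2 in (0.3, 0.5, 0.66, 1.0):
    print(h2,
          round(family_mean_heritability(h2, 1), 2),
          round(family_mean_heritability(h2, 10), 2))
```

With these assumptions a single-plant heritability of 0.3 rises to about 0.81 on a family-mean basis when f = 10, while a heritability of 1.0 is unchanged, which illustrates why the gain from replication is largest at low heritability.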

While they were based on a different parameterisation, the Basic and Comstock

prediction equations produced the same response to selection under the genetic models

tested. This demonstrated that under the simple additive models tested, both prediction

equations were employing the same underlying modelling framework and assumptions.

These assumptions were also employed for the finite locus models considered in the

simulation for the cases when the prediction equations and simulations produced the

same results.

The results from this Chapter provide justification for proceeding to simulation

as prediction equations cannot be constructed to deal with all the possible complexities

in a genetic system and the failure of assumptions. As research progresses and QTL data

relevant to the wheat Germplasm Enhancement Program considered here become more

widely available (e.g. Nadella 1998, Susanto 2004), simulation will have the ability to

predict more realistic values for the Germplasm Enhancement Program as the genetic

architecture of the quantitative traits modelled is better understood.


4.5 Conclusion

Departure from the simple additive model invalidated the normality assumption

held by theory and caused the expectations from the prediction equations to over-

estimate the response to selection compared with the simulation method. Some of the

deviations are expected as a consequence of finite sampling effects when representing a

quantitative trait by a small number of genes. Within a simulation modelling framework it is possible to relax some of the assumptions applied to develop the prediction theory; therefore, simulation can be used to study complex genetic systems beyond the basic additive model. The ability of QU-GENE to model recombination and produce similar results to the prediction equations under simple additive models, and to detect important deviations when the assumptions do not hold, provided a validation of the simulation algorithms and supported the use of a simulation approach to study the complex genetic systems that are relevant to predicting response to selection in the Germplasm Enhancement Program. A series of simulation-based investigations, adopting the procedures outlined in Chapter 3, is used for Chapters 5 to 9.


CHAPTER 5

COMPARING QTL DETECTION

ANALYSIS PROGRAMS AND

SIMULATING THE WHEAT

GENOME IN QU-GENE

5.1 Introduction

A key step in simulating marker-assisted selection in the Germplasm Enhancement Program is integrating the methodology for detection of QTL into the simulation of the marker-assisted selection process. The simulated detection of QTL within this thesis involves using a stand-alone QTL detection analysis program, the output of which is used as an input to a QU-GENE module for the simulation of marker-assisted selection. There are many QTL detection analysis programs available. The first Section of this Chapter focuses on choosing which of three selected QTL detection analysis programs will be used in this thesis.

The second Section of this Chapter involves a set of experiments to determine how best

to model multiple QTL scenarios on a simulated wheat genome. Bread wheat has three

closely related genomes (A, B and D genomes) consisting of six sets of chromosomes in

total (hexaploid). Because wheat chromosomes pair in strict homologous relationships, despite the presence of three genomes, wheat behaves in a diploid-like manner (Riley and Chapman 1958, Sutton et al. 2003) and is therefore modelled as a diploid genome in this thesis. A 21-chromosome, 12-QTL, eight flanking markers per QTL model was compared to a 12-chromosome, 12-QTL, two flanking markers per


QTL model to determine whether the removal of the excess chromosomes and markers

affected the detection of QTL. Part of this process also looked at whether the QTL

detection program would run faster with a reduced wheat genome. Since approximately

45 million simulation experiments needed to be analysed for this thesis, QTL detection

analysis program speed was an important criterion.

5.2 Selecting a QTL detection program to be used in this thesis

An experiment was conducted to assess, for each of three QTL detection analysis programs and a range of genetic models: (i) ease of use; (ii) the ability to automate high-throughput QTL detection analysis; and (iii) the number of QTL detected. A wider range of QTL detection analysis programs was initially tested; however, the number of programs was reduced to three because the other programs did not fulfil the requirements of this thesis. The three QTL detection analysis programs selected and examined in this Section were PLABQTL (Utz and Melchinger 1996), QTL Cartographer (Basten et al. 1994, 2001) and MapQTL (Van Ooijen and Maliepaard 1996). Each of these programs was able to analyse doubled haploid and recombinant inbred line mapping populations and could handle population sizes of 1000 individuals. Software licensing costs were also a practical issue as the software was to run on the multiple processor QU-GENE Computing Cluster (QCC: Micallef et al. 2001). PLABQTL and QTL Cartographer did not require the payment of any fees for their use and are freely available on the internet. MapQTL was available for a licence fee. As the University of Queensland already had a licence for its use, it was included for evaluation. The way in which the QTL detection analysis programs accounted for G×E interaction or epistasis was not used as a selection criterion, as this function of the software was not intended to be used in this thesis.

All three QTL detection analysis programs use the same underlying methodologies (refer to Chapter 2, Section 2.2.2.3) to implement permutation tests (Churchill and Doerge 1994, Doerge and Churchill 1996) and interval mapping (Lander and Botstein 1989). For composite interval mapping the PLABQTL method is based on the Jansen


(1993, 1994) and Zeng (1994) methods, MapQTL bases its analysis on the methodology of Jansen (1993, 1994) and Jansen and Stam (1994), while QTL Cartographer bases its composite interval mapping method on the work of Zeng (1993, 1994). Because the QTL detection analysis programs use similar methodologies for composite interval mapping, it was expected that each would produce similar results. However, the ease of use of the three programs had the potential to be quite variable depending on how the software was coded, and this aspect of the software was unknown at the beginning of this study.

5.2.1 Materials and Methods

Conducting a QTL detection analysis on a simulated population required three steps to be followed for this thesis (Figure 5.1). Firstly, the genetic models to be tested were determined and set up in a format suitable for processing in the QU-GENE module GEXP (Genetic EXPeriments). In the GEXP module, the genetic model was used to create and simulate a specific type of mapping population, resulting in a set of files which specified the marker genotype and the phenotypic data for the trait of interest for each of the individuals in the mapping population. Secondly, the marker genotype data were used by MAPMAKER/EXP (Lander et al. 1987) to estimate the linkage map of the simulated population. Thirdly, with this linkage map, the marker data and the phenotypic data for the trait of interest, a QTL detection analysis software program (e.g. PLABQTL) was used to determine if there were any QTL for the trait of interest associated with the markers on the linkage map. These last two steps allowed the same genetic model to be used with different population sizes for creating the linkage map and for conducting the QTL detection analysis. In this thesis, the maximum number of QTL that could be detected was known prior to conducting the QTL analysis, as the number of segregating QTL was specified in the input file for each genetic model. All LOD curves above the specified LOD threshold (e.g. Figure 2.2) were considered to be QTL detected by the QTL detection analysis program. All detected QTL were assumed to be the QTL specified in the genetic model; if more QTL were detected than were known to be segregating, it was assumed that a false QTL had been detected. Preliminary assessments consistently found that the QTL detected were the same as those QTL known to be segregating in the genetic model, which suggests that this


simple counting procedure was a reliable method for monitoring QTL detection. There

was no feedback mechanism in place to determine whether the QTL detected by the

QTL detection analysis program were the same as those specified in the QU-GENE

genetic model, or whether the QTL was a true QTL or not.
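The counting procedure described above can be illustrated with a minimal Python sketch. It assumes that a "LOD curve above the threshold" is one contiguous run of scan positions on a single linkage group whose LOD exceeds the threshold; the function names and the LOD values are hypothetical and are not taken from PLABQTL or QU-GENE.

```python
from typing import List, Tuple


def count_lod_peaks(lod_profile: List[float], threshold: float) -> int:
    """Count contiguous runs of scan positions whose LOD exceeds the threshold."""
    peaks = 0
    in_peak = False
    for lod in lod_profile:
        if lod > threshold and not in_peak:
            peaks += 1  # a new run above the threshold starts here
        in_peak = lod > threshold
    return peaks


def classify_detections(n_detected: int, n_segregating: int) -> Tuple[int, int]:
    """Counting rule: detections up to the number of segregating QTL are taken
    to be the specified QTL; any excess is counted as false QTL."""
    n_assumed_true = min(n_detected, n_segregating)
    n_false = max(0, n_detected - n_segregating)
    return n_assumed_true, n_false


# Made-up LOD values along one linkage group (illustration only).
profile = [0.4, 1.1, 3.2, 4.0, 2.9, 1.0, 0.3, 2.7, 3.1, 0.8]
detected = count_lod_peaks(profile, threshold=2.5)
print(detected, classify_detections(detected, n_segregating=1))
```

As in the text, this rule only compares counts; it does not check whether a detected peak sits at the position of a specified QTL.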

PROCESS                                      SOFTWARE
Determine the genetic models to be tested    QU-GENE
Estimate / create a linkage map              MAPMAKER/EXP
Conduct a QTL detection analysis             PLABQTL

Figure 5.1 The three step process to follow allowing a QTL detection analysis to be conducted on a simulated population

5.2.1.1 Genetic models

A simulation experiment was designed to: (i) compare QTL detection between the three QTL detection analysis programs; (ii) observe the effect of population size on QTL detection (1000 individuals was a reference population size); and (iii) select the QTL detection analysis program to be used in this thesis. Four genetic models were set up and simulated with an array of experimental variables (Table 5.1). The experiments were simulated to represent a population in a single environment, where there were no epistatic effects and all QTL had equal additive effects. Further investigations into the association between population size and the creation of a correctly structured linkage map (relative to the map specified in the QUGENE input file) using MAPMAKER/EXP (Lander et al. 1987) are presented in Appendix 2, Section A2.1.

Table 5.1 Experimental variables used to define each genetic model for the QUGENE input file. Chr = chromosome, c = per meiosis recombination fraction, h2 = heritability of trait on an observational unit, MP-LG = mapping population size used to determine the linkage groups and MP-QTL = QTL detection mapping population size

Model  No. chr  QTL / chr  Markers / QTL  c(QTL-marker)  c(marker-marker)  h2   MP-LG  MP-QTL
1      1        1          2              0.1            -                 1.0  100    100
2      2        3          2              0.1            0.1               1.0  1000   100, 500, 1000
3      10       1          2              0.05           -                 1.0  1000   100, 500, 1000
4      10       2          4              0.025          0.05              1.0  1000   100, 500, 1000


The level of complexity for creating linkage groups and detecting QTL increased as the model number increased from Model 1 (one chromosome, one QTL, two flanking markers) to Model 2 (two chromosomes, three QTL per chromosome, two flanking markers per QTL), Model 3 (10 chromosomes, one QTL per chromosome, two flanking markers per QTL) and Model 4 (10 chromosomes, two QTL per chromosome, four flanking markers per QTL) (Figure 5.2).

[Figure 5.2 linkage group diagrams. Model 1: one chromosome (Marker1, QTL, Marker2). Model 2: two chromosomes, each with Marker1, QTL, Marker2, Marker3, QTL, Marker4, Marker5, QTL, Marker6. Model 3: 10 chromosomes, each with Marker1, QTL, Marker2. Model 4: 10 chromosomes, each with Marker1, Marker2, QTL, Marker3, Marker4, Marker5, Marker6, QTL, Marker7, Marker8.]

Figure 5.2 Schematic outline of the Model 1, 2, 3 and 4 linkage groups. For Model 1 and 2 the markers are spaced at 11 cM (c = 0.1) from each QTL or marker. For Model 3 the markers are spaced at 5.2 cM (c = 0.05) from the QTL and for Model 4 the markers are spaced at 5.2 cM (c = 0.05) from a marker and 2.5 cM (c = 0.025) from a QTL. The per meiosis recombination fraction was converted to cM using the Haldane mapping function (Haldane 1931)

As outlined in Table 5.1 the linkage groups for Model 1 were created from a recombinant inbred line mapping population of 100 individuals whilst Models 2, 3 and 4 used 1000 recombinant inbred lines as the smaller population sizes did not accurately create the linkage groups for these more complex models (Appendix 2, Section A2.1). Heritability for the trait was set at h2 = 1.0. The QUGENE input file for Model 1, 2, 3


and 4 can be found in Appendix 2, Figures A2.5 to A2.8. The QTL detection analysis was conducted on the phenotypic data, marker data and linkage map created from a simulated recombinant inbred line population of 100 individuals for Model 1 and 100, 500 and 1000 individuals for Models 2, 3 and 4.
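The conversion between per meiosis recombination fractions and map distances used for the linkage groups can be sketched as follows. This is a minimal illustration of the standard Haldane mapping function, d = -50 ln(1 - 2c) cM; the function names are hypothetical and are not part of QU-GENE.

```python
import math


def haldane_cm(c: float) -> float:
    """Map distance in cM for a recombination fraction c (Haldane, no interference)."""
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * c)


def haldane_c(d_cm: float) -> float:
    """Inverse mapping: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-d_cm / 50.0))


# Recombination fractions that appear in the genetic models of this Chapter.
for c in (0.025, 0.05, 0.1, 0.2):
    print(f"c = {c:<5} -> {haldane_cm(c):.2f} cM")
```

Under this function the four fractions correspond to roughly 2.6, 5.3, 11.2 and 25.5 cM, which is close to the distances quoted for the models.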

5.2.1.2 Creating the mapping population and generating the linkage groups

Following the processes of Figure 5.1, once the genetic models were set up, the mapping population and linkage maps were created. One recombinant inbred line mapping population was created for each of the four genetic models. The simulations of the mapping populations were established so that every QTL in the mapping population was segregating. Each recombinant inbred line population was created from a bi-parental cross between two genotypically extreme parents to form the F1. The F1 was selfed to form an F2 population of a specific size (100, 500 and 1000 individuals). Single seed descent was simulated and each F2 plant was selfed for more than 10 generations to reach homozygosity. The QUGENE input file for each of the four genetic models was run through the QUGENE engine (Chapter 2, Figure 2.10) to create a genotype-environment system output file. The QUGENE output file was used as an input for the GEXP module (Chapter 2, Figure 2.10). The GEXP module conducted the bi-parental cross and generations of single seed descent, producing the marker data at each locus and the phenotypic data for the trait simulated from the individuals derived in the last generation of selfing.
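The crossing scheme described above (two extreme parents, F1, F2, then single seed descent) can be sketched in Python. This is not the GEXP implementation; it is a simplified illustration for one linkage group that assumes no crossover interference, and all names and default values are hypothetical.

```python
import random
from typing import List, Tuple

Haplotype = List[int]                     # one allele (0 or 1) per locus, in map order
Individual = Tuple[Haplotype, Haplotype]  # diploid: two haplotypes


def make_gamete(ind: Individual, rec_fracs: List[float], rng: random.Random) -> Haplotype:
    """Form one gamete. rec_fracs[i] is the per meiosis recombination fraction
    between locus i and locus i + 1."""
    strand = rng.randrange(2)             # haplotype the gamete starts on
    gamete = [ind[strand][0]]
    for locus, c in enumerate(rec_fracs, start=1):
        if rng.random() < c:              # crossover in this interval
            strand = 1 - strand
        gamete.append(ind[strand][locus])
    return gamete


def self_once(ind: Individual, rec_fracs: List[float], rng: random.Random) -> Individual:
    """Self-fertilise: the offspring receives two independent gametes."""
    return make_gamete(ind, rec_fracs, rng), make_gamete(ind, rec_fracs, rng)


def simulate_rils(n_lines: int, rec_fracs: List[float],
                  n_selfing_generations: int = 12, seed: int = 1) -> List[Individual]:
    """Cross two extreme inbred parents, self the F1 to give n_lines F2 plants,
    then advance each plant by single seed descent."""
    rng = random.Random(seed)
    n_loci = len(rec_fracs) + 1
    f1: Individual = ([0] * n_loci, [1] * n_loci)
    f2 = [self_once(f1, rec_fracs, rng) for _ in range(n_lines)]
    lines = []
    for plant in f2:
        for _ in range(n_selfing_generations):
            plant = self_once(plant, rec_fracs, rng)  # one seed kept per generation
        lines.append(plant)
    return lines


# Marker1 - QTL - Marker2 with c = 0.1 in each interval (as in Model 1).
rils = simulate_rils(n_lines=100, rec_fracs=[0.1, 0.1])
het = sum(a != b for h1, h2 in rils for a, b in zip(h1, h2))
print(f"heterozygous loci remaining across all lines: {het}")
```

Each generation of selfing is expected to halve the remaining heterozygosity, which is why more than 10 generations leaves the lines close to fully homozygous.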

MAPMAKER/EXP (Lander et al. 1987) was used to estimate the linkage groups

and was run manually. The output file from the QU-GENE module GEXP for each of

the genetic models was run through the MAPMAKER/EXP software. For Model 1 a

linkage map was created for a recombinant inbred line population of size 100. For

Models 2 and 3 a linkage map was created for a recombinant inbred line population of

100, 500 and 1000 individuals (the recombinant inbred line population of 100 and 500

individuals was used in Appendix 2, Section A2.2). For Model 4 a linkage map was

created for a recombinant inbred line population of 1000 individuals.


5.2.1.3 Conducting the QTL detection analysis

As per Figure 5.1, after the genetic models were created and the linkage maps were estimated, the QTL detection analysis was conducted on the simulated data sets using three QTL detection analysis programs. Each QTL detection analysis program required different input files based on the output of MAPMAKER/EXP; therefore, a set of Tcl/Tk (Tool command language/Toolkit, http://tcl.activestate.com) scripts was created to automate the formation of the different input files. Both interval mapping and composite interval mapping were conducted using QTL Cartographer and PLABQTL. Only interval mapping was conducted with MapQTL, as the automation of the permutations for composite interval mapping was not practical for high volume simulation data sets. The significance threshold for interval mapping was set at a LOD value of 2.5 (the default value in the PLABQTL software). When composite interval mapping was conducted, a permutation test (Churchill and Doerge 1994, Doerge and Churchill 1996) was first conducted to determine an empirical LOD threshold for a significance level of α = 0.25, as suggested by Beavis (1998) for exploratory QTL detection analysis investigations. When composite interval mapping was used, automatic co-factor selection was also enabled. The number of QTL detected was recorded for each genetic model and QTL detection analysis program.
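The idea behind the permutation test for an empirical LOD threshold can be sketched as follows. In the thesis the test was run within the QTL detection programs using composite interval mapping; this sketch swaps in a simple single-marker test as a stand-in so that it stays short and self-contained. Genotypes are assumed to be coded 0 or 1 (homozygous recombinant inbred lines), and all names and the demonstration data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def marker_lod(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """LOD of a single-marker test comparing the two homozygous classes."""
    n = len(phenotypes)
    groups = {0: [], 1: []}
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    if not groups[0] or not groups[1]:
        return 0.0
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_full = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        rss_full += sum((y - mean) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0
    if rss_full == 0.0:
        return math.inf
    return max(0.0, (n / 2.0) * math.log10(rss_null / rss_full))


def permutation_threshold(markers: Sequence[Sequence[int]], phenotypes: Sequence[float],
                          alpha: float = 0.25, n_perm: int = 1000, seed: int = 1) -> float:
    """Empirical genome-wide LOD threshold: the (1 - alpha) quantile of the
    maximum LOD over all markers when phenotypes are shuffled among lines."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods: List[float] = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        max_lods.append(max(marker_lod(m, shuffled) for m in markers))
    max_lods.sort()
    index = max(0, min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1))
    return max_lods[index]


# Hypothetical demonstration data: 10 unlinked markers, 100 lines, one causal marker.
demo_rng = random.Random(0)
n_lines, n_markers = 100, 10
markers = [[demo_rng.randrange(2) for _ in range(n_lines)] for _ in range(n_markers)]
phenotypes = [float(markers[3][i]) + demo_rng.gauss(0.0, 1.0) for i in range(n_lines)]
threshold = permutation_threshold(markers, phenotypes, alpha=0.25, n_perm=200)
print(f"empirical LOD threshold (alpha = 0.25): {threshold:.2f}")
```

A larger α gives a lower threshold, which is the reason a relaxed value such as 0.25 suits exploratory analyses where missing a QTL is considered more costly than a false positive.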

5.2.2 Results

For the genetic models simulated, the order of markers on the genetic map generated using MAPMAKER/EXP was compared to the order of markers specified in the QU-GENE input file. For the four models tested the genetic map generated by MAPMAKER/EXP was the same as that specified in the genetic model (Appendix 2, Sections A2.1.1 to A2.1.4).

For Model 1, the single QTL was detected by all three QTL detection analysis

programs. For Model 2, a QTL mapping population size of 500 and 1000 individuals

resulted in all QTL being detected for the three QTL detection analysis programs and

the two different detection methods. For a mapping population size of 100 individuals,

QTL Cartographer did not detect QTL three on chromosome two using interval


mapping; however, all QTL were detected using composite interval mapping. For both

interval mapping and composite interval mapping, PLABQTL detected all QTL except

QTL three on chromosome one. MapQTL detected all the QTL with interval mapping

(Table 5.2).

Table 5.2 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted

                QTL Cartographer    PLABQTL      MapQTL
Analysis type   IM     CIM          IM    CIM    IM    CIM
Chromosome 1                                           NC
Chromosome 2                                           NC

With a mapping population size of 100 individuals for Model 3, PLABQTL and MapQTL each detected 50% of the QTL using interval mapping, and both programs detected exactly the same QTL. Composite interval mapping provided an increase in the

number of QTL detected, with PLABQTL detecting all 10 QTL and QTL Cartographer

detecting one more QTL than with interval mapping (Table 5.3). For the 500 and 1000

population sizes, all QTL were detected by all three QTL detection analysis programs

using interval mapping, and composite interval mapping for QTL Cartographer and

PLABQTL.

Table 5.3 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted

                QTL Cartographer    PLABQTL      MapQTL
Analysis type   IM     CIM          IM    CIM    IM    CIM
Chromosome 1                                           NC
Chromosome 2                                           NC
Chromosome 3                                           NC
Chromosome 4                                           NC
Chromosome 5                                           NC
Chromosome 6                                           NC
Chromosome 7                                           NC
Chromosome 8                                           NC
Chromosome 9                                           NC
Chromosome 10                                          NC

For Model 4, with a mapping population size of 100 individuals, the detection of

QTL was variable across chromosomes for all QTL detection analysis programs (Table

5.4). Generally, a specific QTL was not detected by all the QTL detection analysis


programs using interval mapping. For example, with a population size of 100 individuals and interval mapping, QTL one on chromosome two, QTL one and two on chromosome four, QTL two on chromosome five and chromosome six, and QTL one and two on chromosome nine and chromosome 10 were not detected by any of the QTL detection programs. Employing composite interval mapping with a population size of 100 resulted in an increase in QTL detected.

Table 5.4 QTL detection analysis results for a QTL mapping population size of 100 individuals: if QTL detected, if QTL not detected, IM = interval mapping and CIM = composite interval mapping. NC = not conducted

                QTL Cartographer    PLABQTL      MapQTL
Analysis type   IM     CIM          IM    CIM    IM    CIM
Chromosome 1                                           NC
Chromosome 2                                           NC
Chromosome 3                                           NC
Chromosome 4                                           NC
Chromosome 5                                           NC
Chromosome 6                                           NC
Chromosome 7                                           NC
Chromosome 8                                           NC
Chromosome 9                                           NC
Chromosome 10                                          NC

As the mapping population size increased, the number of QTL detected increased. With a population size of 500 and 1000 individuals using interval mapping,

both QTL Cartographer and MapQTL detected all 20 QTL. PLABQTL detected 16

QTL with a population size of 500 individuals, and 18 QTL with a population size of

1000 individuals. When composite interval mapping was employed, PLABQTL

detected all 20 QTL for the population sizes of 500 and 1000 individuals while QTL

Cartographer detected 19 QTL for a population size of 500 individuals and 20 QTL for

a population size of 1000 individuals.

5.2.3 Discussion

The population size used to generate the linkage map was important for the

markers to be placed on the correct linkage groups (relative to the map specified in the

QUGENE input file). The results of Appendix 2, Section A2.1 reinforced the need for

linkage maps to be created from larger population sizes to ensure markers were

correctly assigned to linkage groups, especially as the complexity of the genetic model


increased. With a population size of 1000 individuals, linkage maps were correctly

generated using MAPMAKER/EXP. Therefore, in Chapters 8 and 9 of this thesis,

where large mapping populations were used throughout, the simulation of the linkage

map generation step was removed and the linkage maps were automatically generated

using the recombination fractions specified in the QUGENE input file.

A general observation for the three different QTL detection analysis programs

was that as the complexity of the genetic model increased (e.g. by adding more QTL,

markers and chromosomes) the QTL detection analysis programs did not always detect

the same number of QTL. This was more apparent at the lower mapping population size

of 100 individuals.

The mapping population size used to detect QTL influenced the number of QTL

detected. As the population size increased, the number of QTL detected increased. This

result is consistent with the findings of Beavis (1994, 1998). Therefore, the extra

information that the additional individuals contributed towards the QTL detection

analysis was important in identifying all segregating QTL. Based on the QTL detection

experiments conducted in Section 5.2, each program produced similar results when the

larger population sizes of 500 and 1000 individuals were used. It is also important to

note that these comparisons were conducted on models that did not include epistasis or

G×E interactions.

MapQTL was eliminated from further consideration on the practical grounds

that it was not easy to automate its use for the simulation studies that were the focus of

this thesis. QTL Cartographer was also eliminated on the grounds that it was more

difficult to automate and run in batch mode for the large scale simulation experiments

required for this thesis.

5.2.4 Conclusion

Following these investigations PLABQTL was selected as the program of choice

for this thesis due to its comparable detection results with the other two QTL detection

analysis programs, ease of automation, requirement of little manual input in the many


steps of the QTL detection analysis and its ability to run efficiently and easily in batch

mode. Composite interval mapping with automatic co-factor selection using PLABQTL

was selected as the QTL detection analysis method to be used throughout the remainder

of this thesis. A permutation test conducted to generate the empirical LOD score for a

significance threshold critical value of α = 0.25 was also selected.


5.3 Modelling the wheat genome for QTL detection analysis using PLABQTL

In this Section, QU-GENE and PLABQTL were used to determine how best to

model multiple QTL scenarios for a simulated wheat genome (Figure 5.3). While it is

possible within QU-GENE, simulating the entire wheat genome (Figure 5.3a) would be

an inefficient use of computer time for the objective of this thesis, especially as only a

select proportion of the genome contributes to the variation observed in a trait.

Preliminary investigations indicated that larger genome models can take more than 24

hours to complete the QTL detection analysis, including permutations. Therefore, when

many thousands of genetic models are considered, a comprehensive simulation

experiment would have been impractical due to the computing time required.

A series of simulation experiments was conducted to compare the detection of

12 QTL on three representations of the simulated wheat genome. The choice of 12 QTL

was based on the ease of specifying a range of epistatic models within the E(NK) model when N = 12 (Chapter 9). Firstly, a 21 chromosome model was simulated with one QTL

per chromosome and eight flanking markers per QTL to represent the full wheat

genome chromosome numbers (Figure 5.3b). Of the 21 chromosomes, 12 of the

chromosomes possessed a QTL contributing towards the trait of interest, while nine

chromosomes contained eight markers each and no QTL. The next genome model

involved removing the nine chromosomes with no QTL from Model 1, resulting in 12

chromosomes each with one QTL contributing towards the trait of interest and eight

flanking markers per QTL (Figure 5.3c). This genome model was then further reduced

to a 12 chromosome model, with each chromosome containing one QTL, and two

flanking markers (Figure 5.3d). This progression from a comprehensive but complex

representation of the wheat genome, to a simpler genome map was used to determine

whether the simpler genome model would be sufficient to simulate the QTL detection

analysis process expected for the full genome model.


Figure 5.3 Schematic outline of artificially zooming in on regions of the wheat genome containing QTL contributing towards a trait of interest. Simulation of the wheat genome progressed from the genetic map of wheat (a), which may contain 12 QTL of interest and can be represented for simulation using 21 linkage groups, each with eight markers, and 12 linkage groups with one QTL (b), this can be reduced to 12 chromosomes each containing a QTL (c) and then to 12 chromosomes each with one QTL and two flanking markers (d). The Haldane mapping function (Haldane 1931) was used to convert per meiosis recombination fractions to cM. Wheat genome figures (Nelson et al. 1995a, Nelson et al. 1995b, Nelson et al. 1995c, Vandeynze et al. 1995, Marino et al. 1996)

[Figure 5.3 panels. (a) Genetic map of the 21 wheat chromosomes (1A to 7D). (b) 21 linkage groups: 12 with Marker1, Marker2, Marker3, Marker4, QTL, Marker5, Marker6, Marker7, Marker8 at intervals of 25.5, 25.5, 5.2, 5.2, 5.2, 5.2, 25.5 and 25.5 cM, and nine with Marker1 to Marker8 and no QTL at intervals of 25.5, 25.5, 5.2, 10.4, 5.2, 25.5 and 25.5 cM. (c) The 12 linkage groups containing a QTL, with the same eight markers and intervals. (d) 12 linkage groups, each with Marker4, QTL, Marker5 at intervals of 5.2 and 5.2 cM.]


5.3.1 Materials and Methods

5.3.1.1 Genetic models

Three simulation experiments were conducted to evaluate the use of a simplified genetic map for modelling multiple QTL genetic models for QTL detection for a trait in wheat. The three experiments were based on three different genome configurations (Table 5.6). All Models used a recombinant inbred line mapping population of

1000 individuals.

Table 5.6 Experimental variables used to define each genetic model for the QUGENE input file. chr = chromosome, c = per meiosis recombination fraction, h2 = heritability of trait on an observational unit and MP-QTL = QTL detection mapping population size

Model  No. chr  No. chr with no QTL  QTL / chr  Markers / QTL  c(QTL-marker)  c(marker-marker)  h2   MP-QTL
1      21       9                    1          4              0.05           0.05, 0.2         1.0  1000
2      12       0                    1          4              0.05           0.05, 0.2         1.0  1000
3      12       0                    1          2              0.05           0.05              1.0  1000

Model 1 (21 chromosomes, 12 chromosomes with a QTL, nine chromosomes with

no QTL, eight markers per chromosome) consisted of four markers flanking the QTL

(two either side of the QTL) at a per meiosis recombination fraction c = 0.05 between

the markers and c = 0.05 between a marker and QTL. The next two sets of flanking

markers were spaced at a per meiosis recombination fraction c = 0.2 between the

markers. This same setup occurred for the chromosomes with no QTL; however, the

genetic distance between the two middle markers was a per meiosis recombination

fraction of c = 0.1, as there was no QTL present on these chromosomes (Figure 5.3b).
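The three genome configurations can be written down as data to compare their size. The sketch below is illustrative only: the class and variable names are hypothetical, the interval lists follow the recombination fractions given in the text and Figure 5.3, and map lengths assume the Haldane mapping function.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


def haldane_cm(c: float) -> float:
    """Map distance in cM for a per meiosis recombination fraction c."""
    return -50.0 * math.log(1.0 - 2.0 * c)


@dataclass(frozen=True)
class Chromosome:
    interval_c: Tuple[float, ...]  # recombination fractions between adjacent loci
    has_qtl: bool                  # True if one of the loci is a QTL

    @property
    def n_markers(self) -> int:
        n_loci = len(self.interval_c) + 1
        return n_loci - 1 if self.has_qtl else n_loci

    @property
    def length_cm(self) -> float:
        return sum(haldane_cm(c) for c in self.interval_c)


# M1 M2 M3 M4 QTL M5 M6 M7 M8
WITH_QTL_8 = Chromosome((0.2, 0.2, 0.05, 0.05, 0.05, 0.05, 0.2, 0.2), has_qtl=True)
# M1 M2 M3 M4 M5 M6 M7 M8, with c = 0.1 between the two middle markers
NO_QTL_8 = Chromosome((0.2, 0.2, 0.05, 0.1, 0.05, 0.2, 0.2), has_qtl=False)
# M4 QTL M5
WITH_QTL_2 = Chromosome((0.05, 0.05), has_qtl=True)

MODELS: Dict[int, List[Chromosome]] = {
    1: [WITH_QTL_8] * 12 + [NO_QTL_8] * 9,
    2: [WITH_QTL_8] * 12,
    3: [WITH_QTL_2] * 12,
}

for model, chromosomes in MODELS.items():
    n_markers = sum(ch.n_markers for ch in chromosomes)
    total_cm = sum(ch.length_cm for ch in chromosomes)
    print(f"Model {model}: {len(chromosomes)} chromosomes, "
          f"{n_markers} markers, {total_cm:.0f} cM")
```

Counting markers this way shows how much the analysis shrinks from Model 1 to Model 3, which is the motivation for testing whether the reduced model gives the same detection results.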

The Model 2 experiment involved removing the markers and chromosomes from

Model 1 that did not contribute towards the trait of interest. This resulted in a 12

chromosome model, with one QTL per chromosome and eight flanking markers per

QTL (Figure 5.3c). This model was used to determine whether all segregating QTL

were detected when the extra chromosomes and markers were removed from the model.

Model 3 involved removal of the flanking markers from Model 2, except for the

closest flanking markers. Model 3 therefore, had 12 chromosomes with one QTL per

chromosome and two flanking markers per QTL (Figure 5.3d). This model was


simulated to determine whether the two closest flanking markers were sufficient to

detect the segregating QTL and to determine whether this reduced genome model

produced similar results to the larger genome models.

5.3.1.2 Creating the mapping population and generating the linkage groups

One recombinant inbred line mapping population was created for each of the

three genetic models (Table 5.6). The simulation of the mapping population was

conducted to represent the mapping process used for the wheat Germplasm Enhance-

ment Program (Cooper et al. 1999a). Ten inbred parents were created in the QUGENE

engine according to the genetic model specifications. The parents were genotyped based

on polymorphic markers and the two most extreme genotypes were crossed and the

progeny selfed to form a recombinant inbred line population. This process may result in

fewer QTL segregating in the population, in contrast to Section 5.2.1.2 where the

population was set up so that all QTL were segregating. The two selected parents were

crossed to form the F1. The F1 was selfed to form an F2 population of a specific size

(100, 500 and 1000 individuals). Single seed descent was simulated and each F2 plant

was selfed for more than 10 generations to reach homozygosity. While the actual Germplasm Enhancement Program mapping population was not selfed for so many generations, the simulation study was designed to remove residual heterozygosity from the recombinant inbred lines. The QUGENE input file for each of the three genetic

models was run through the QUGENE engine (Chapter 2, Figure 2.10) to create a

genotype-environment system output file. The QUGENE output file was used as an

input for the GEXP module (Chapter 2, Figure 2.10). The GEXP module conducted the

bi-parental cross and generations of single seed descent as well as producing the marker

data at each locus and the phenotypic data for the trait simulated from the individuals

derived in the last generation of selfing.
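As a rough illustration of why more than 10 generations of selfing effectively remove residual heterozygosity, the sketch below follows a single polymorphic locus through single seed descent. It is a minimal sketch and not the QUGENE or GEXP implementation: the function names, the allele labels and the one-locus simplification are assumptions made for the example.

```python
import random

def self_once(genotype, rng):
    """Self one plant at one locus: each gamete carries one of the
    parent's two alleles with equal probability."""
    return (rng.choice(genotype), rng.choice(genotype))

def single_seed_descent(n_lines, n_selfings, seed=0):
    """Advance n_lines independent lines from a heterozygous F1 by selfing,
    keeping one offspring per line per generation (hypothetical helper)."""
    rng = random.Random(seed)
    lines = [("A", "a")] * n_lines  # F1 of a cross between two inbred parents
    for _ in range(n_selfings):
        lines = [self_once(genotype, rng) for genotype in lines]
    return lines

def expected_heterozygosity(n_selfings):
    """Expected heterozygote frequency after n selfings of the F1."""
    return 0.5 ** n_selfings

# One selfing gives the F2; ten further selfings give 11 in total.
rils = single_seed_descent(n_lines=1000, n_selfings=11)
observed = sum(a != b for a, b in rils) / len(rils)
print(f"expected {expected_heterozygosity(11):.5f}, observed {observed:.5f}")
```

Because heterozygosity halves with each selfing, fewer than one locus in a thousand is expected to remain heterozygous after the F2 has been selfed ten more times, which is the sense in which the simulated lines can be treated as fully inbred.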

Based on the results of Appendix 2, Section A2.1 the linkage groups were cre-

ated using the values specified in the QUGENE input file and not using MAP-

MAKER/EXP (Lander et al. 1987). The linkage groups were manually entered into the

PLABQTL input file.


5.3.1.3 Conducting the QTL detection analysis

Based on the results of Section 5.2, the QTL detection analysis was conducted

on the simulated data using PLABQTL. To obtain a LOD threshold level, PLABQTL

was run using the permutation function. One thousand permutations were run for each

of the genetic models to generate a LOD threshold. Composite interval mapping with

automatic co-factor selection was employed using the critical value α = 25% threshold

obtained from the permutation test. This threshold was suggested by Beavis (1998) as

an acceptable value when using a genome scanning approach to explore a genome for

QTL.
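The logic of a permutation-derived threshold can be sketched as follows. This is an illustrative sketch only and not PLABQTL's procedure: it substitutes a simple single-marker regression LOD for composite interval mapping, and the function names, the toy data and the quantile rule are assumptions. Phenotypes are shuffled against the marker data to break any marker-trait association, the largest LOD across the genome is recorded for each shuffle, and the threshold is taken as the (1 - α) quantile of those maxima.

```python
import math
import random

def lod_score(marker, phenotype):
    """LOD for a single-marker regression: -(n/2) * log10(1 - r^2)."""
    n = len(phenotype)
    mean_g = sum(marker) / n
    mean_y = sum(phenotype) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(marker, phenotype))
    sxx = sum((g - mean_g) ** 2 for g in marker)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait: no evidence
    r2 = min(sxy * sxy / (sxx * syy), 1.0 - 1e-12)
    return -(n / 2.0) * math.log10(1.0 - r2)

def permutation_threshold(markers, phenotype, n_perm=1000, alpha=0.25, seed=1):
    """Genome-wide LOD threshold from the permutation distribution of the
    maximum LOD (hypothetical helper, not a PLABQTL function)."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the marker-trait association
        max_lods.append(max(lod_score(m, shuffled) for m in markers))
    max_lods.sort()
    index = min(n_perm - 1, math.ceil((1.0 - alpha) * n_perm) - 1)
    return max_lods[index]

# Toy data: 10 unlinked markers scored 0/1 on 100 inbred lines, no real QTL.
rng = random.Random(42)
markers = [[rng.randint(0, 1) for _ in range(100)] for _ in range(10)]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(100)]
print(permutation_threshold(markers, phenotype, n_perm=200))
```

A larger α, such as the 25% used here, gives a lower threshold than α = 5%, trading a higher genome-wide false positive rate for more power to detect QTL with small effects.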

Permutation tests were the longest stage of the QTL detection analysis and could

take a significant amount of time to run. Therefore, for the differing genome model

sizes, the length of time 1000 permutation tests took to run was recorded (see footnote 3). The QTL

detection analysis step using PLABQTL was quick and took three seconds to run for all

genome sizes. As the QTL detection analysis step of the process was found to be fast, a

comparison was not conducted between the different genome sizes.

5.3.2 Results

Since the simulation of the mapping population was conducted to represent the

Germplasm Enhancement Program mapping study (Cooper et al. 1999a), it was possible

that not all of the 12 QTL were segregating in the recombinant inbred line mapping

population for each of the genetic models. When a QTL was not segregating this meant

that the two parents (selected from the 10 parents) had the same allele for a QTL and no

polymorphism was detected. When this occurred the QTL were monomorphic in the

recombinant inbred line population. Only polymorphic QTL could be detected.

For Model 1, 10 of the 12 QTL were polymorphic in the recombinant inbred line

mapping population. All 10 QTL were detected as contributing towards the variation

observed for the trait of interest. Even though polymorphic markers were segregating on

the nine chromosomes that did not contain a QTL (chromosomes 1B, 2A, 2D, 4A, 4D,

5D, 6B, 7A and 7D), no false positive QTL were detected on these chromosomes. For

Model 1, 1000 permutations using PLABQTL took 17 minutes and 27 seconds.

3 Computer Hardware: AMD Athlon™ XP 1600+, 1.4 GHz, 1.00 GB RAM.

For the 12 chromosome, one QTL per chromosome, eight flanking markers

model (Model 2), eight of the 12 QTL were segregating in the recombinant inbred line

mapping population. All of the segregating QTL were detected. The time required to

conduct 1000 permutations using PLABQTL was 5 minutes and 19 seconds.

When the flanking markers were reduced to two (Model 3), nine of the 12 QTL were segregating in the recombinant inbred line mapping population. All of the segregating QTL were detected; therefore, removing the additional six markers present in Model 2

had no effect on the number of QTL detected. The time required to conduct 1000

permutations using PLABQTL for Model 3 was 25 seconds.

5.3.3 Discussion

From the results comparing QTL detection based on different representations of

a wheat genome, it was concluded that a multi-QTL model for a trait, with up to 12

QTL contributing towards the trait of interest, can be modelled using a reduced genome

model, e.g. in this case a 12 chromosome, one QTL per chromosome, two flanking

markers model (Figure 5.3d) as compared to 21 chromosomes with eight markers per

chromosome and nine of the chromosomes containing no QTL (Figure 5.3b). For the more

comprehensive genome representation based on the 21 chromosome experiment (Model

1, Figure 5.3b), even though polymorphic markers were placed on chromosomes not

containing QTL, false QTL were not detected on these additional chromosomes.

Therefore, it is concluded that there was no need to include these additional chromo-

somes in the genome model. It was also observed that all flanking markers on a

chromosome containing a QTL did not need to be modelled to obtain comparable QTL

detection results. For the 12 chromosome experiment with eight flanking markers

(Model 2, Figure 5.3c), only the two closest flanking markers were required to detect

the QTL. For all three genome models considered, all of the segregating QTL were

detected. Therefore, a simulated genome structure based on a 12 chromosome, 12 QTL,


two flanking markers model (Model 3) as shown in Figure 5.3d would be suggested as a

viable modelling option.

In addition to the simplicity of Model 3 over Model 1, the difference in the

computing time taken to conduct a permutation test to establish the significance

threshold was substantial. The 12 chromosome, two flanking marker genome model

(Model 3) was more than 12 times faster than the 12 chromosome, eight flanking markers genome model (Model 2) and more than 41 times faster than the 21 chromosome, eight flanking markers

genome model (Model 1). This is a substantial saving in analysis time and allows more

efficient simulation of the wheat genome without detectable loss in representation of the

QTL detection process. Increasing the speed of the QTL detection analysis allowed

consideration of many more genetic model scenarios in the following Chapters. Based

on the time saved, and ability to reliably model a more complex genome system, the

reduced genome model e.g. the 12 chromosome, 12 QTL, two flanking markers model

(Figure 5.3d), as well as a smaller 10 chromosome, 10 QTL, two flanking markers

genome model will be predominantly used in the experiments throughout the rest of this

thesis.
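The relative computing times can be checked directly from the permutation run times reported in Section 5.3.2. The short sketch below only repeats that arithmetic; the variable and function names are illustrative.

```python
def seconds(minutes, secs):
    """Convert a minutes-and-seconds run time to seconds."""
    return 60 * minutes + secs

# Time for 1000 permutations in PLABQTL, as reported in Section 5.3.2.
permutation_time = {
    "Model 1": seconds(17, 27),
    "Model 2": seconds(5, 19),
    "Model 3": seconds(0, 25),
}

baseline = permutation_time["Model 3"]
speedup = {model: t / baseline for model, t in permutation_time.items()}

for model, ratio in speedup.items():
    print(f"{model}: {permutation_time[model]} s, {ratio:.1f}x the Model 3 time")
```

The ratios come to roughly 12.8 for Model 2 and 41.9 for Model 1, in line with the comparison drawn above.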

5.3.4 Conclusion

A set of progressive simulation experiments was conducted to show how a 12

chromosome, one QTL per chromosome, two flanking markers per QTL model could

be used to simulate the wheat genome for the QTL detection studies that are the focus of

this thesis. In addition, the smaller genome sizes also required shorter computer

simulation time. Together they allow a high throughput simulation investigation of the

detection of segregating QTL for Parts III and IV of this thesis.


PART III

FACTORS AFFECTING THE

POWER OF QTL DETECTION



CHAPTER 6

EFFECT OF MAPPING POPULATION

SIZE, PER MEIOSIS RECOMBINATION

FRACTION AND HERITABILITY ON

QTL DETECTION

6.1 Introduction

With the availability of large numbers of markers distributed across plant

genomes, marker-assisted selection has become more widely available for use in

breeding programs, such as the Germplasm Enhancement Program (Nadella 1998,

Cooper et al. 1999a, Susanto 2004). Detecting the presence of QTL for a trait relies on

many factors, with one of the most important being identified as the mapping population

size (Beavis 1998, Liu 1998, Charmet 2000, Carlborg and Haley 2004, Holland 2004).

Small mapping population sizes are convenient (e.g. 96-well PCR plates) and desirable,

as they require fewer resources to analyse. However, small populations may miss QTL

that exist, inaccurately estimate the contribution the QTL makes towards the variation

observed for a trait and can also contribute to the detection of false QTL. A review of

plant QTL detection literature (all population types) shows that 51% of the studies used

mapping population sizes between 60 and 140 individuals (Figure 6.1). These smaller

mapping population sizes detected approximately the same number of QTL per trait as

the studies based on larger mapping population sizes.

The mapping population constructed to detect QTL so that marker-assisted se-

lection could be implemented in the wheat Germplasm Enhancement Program was


designed taking into consideration the recommendations of Beavis (1994, 1998), as

described by Cooper et al. (1999a). It was considered to be important to re-evaluate the

findings of Beavis (1994, 1998) and examine the influence of other variables in

combination with population size for situations relevant to the Germplasm Enhancement

Program. For the quantitative traits of interest to the Germplasm Enhancement Program

it is expected that trait heritability can range from low to high. Also with the current

status of marker map development it is likely that map density will be relatively low at

present (Nadella 1998, Susanto 2004) and therefore, the recombination fraction between

markers could range from low to high. Based on these considerations the aim of this

Chapter is to use simulation to examine how mapping population size, heritability and

per meiosis recombination fraction between a marker and QTL can influence QTL

detection. Here the smaller recombination fraction is considered to represent the case of

a dense genetic map and the larger recombination fraction the case of a less dense

genetic map. This investigation provides a basis for recommending a threshold mapping

population size for the Germplasm Enhancement Program at which confidence could be

placed in the power of the mapping study for QTL detection.

[Figure 6.1: bar chart. x-axis: population size in bins from 40-59 to 300-320; left y-axis: percentage of papers (%), 0 to 22; right y-axis: number of QTL per trait, 0 to 22; legend: population frequency (bars) and average number of QTL per trait (circles).]

Figure 6.1 A sample of articles (86) on plant QTL analysis was assessed on the basis of the mapping population size used to find QTL and the number of QTL detected per trait. The filled bars indicate the percentage of papers that reported a mapping population size in the indicated range. The error bars indicate the minimum and maximum number of QTL per trait, with the filled circle indicating the average. 51% of the papers used a mapping population size between 60 and 140 individuals


6.2 Materials and Methods

6.2.1 Genetic models

A factorial experiment based on 24 genetic models was conducted to determine

the dependence of QTL detection on mapping population size, per meiosis recombina-

tion fraction between a marker and QTL, and the heritability of the trait. Following the

investigations in Chapter 5, a reduced model representation of the wheat genome was

applied. The basic model consisted of 10 chromosomes with each chromosome

containing one segregating QTL evenly spaced between two flanking markers (Figure

6.2).

[Figure 6.2: ten linkage groups, numbered 1 to 10, each drawn as Marker1, QTL, Marker2 with 11.0 cM between each marker and the QTL.]

Figure 6.2 Schematic outline of the simulated linkage groups. Ten chromosomes, each with one QTL and two flanking markers. The example here has the markers spaced at 11 cM from the QTL, or a per meiosis recombination fraction of c = 0.1 on either side of the QTL when converted using the Haldane mapping function (Haldane 1931)
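The conversion between map distance and recombination fraction quoted in the caption can be reproduced with the Haldane mapping function, c = (1 - exp(-2d)) / 2 with d in Morgans. The sketch below is illustrative; the function names are assumptions and are not taken from QUGENE or MAPMAKER/EXP.

```python
import math

def haldane_cm_to_c(distance_cm):
    """Recombination fraction for a map distance in centimorgans (Haldane)."""
    d = distance_cm / 100.0  # convert cM to Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_c_to_cm(c):
    """Map distance in centimorgans for a recombination fraction c < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * c)

print(round(haldane_cm_to_c(11.0), 4))
for c in (0.01, 0.05, 0.1):
    print(c, round(haldane_c_to_cm(c), 2))
```

An 11 cM interval corresponds to c of approximately 0.099, and c = 0.1 corresponds to approximately 11.2 cM, which is consistent with the rounded values shown in the figure.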

The experimental variables for this experiment were: (i) recombinant inbred line

mapping population size for QTL detection (MP): 100, 200, 500 and 1000 (reference

population size); (ii) heritability of the trait on an observed unit (single plant) basis (h2):

0.25 (low) and 1.0 (high, reference value); and (iii) per meiosis recombination fraction

between a marker and QTL (c): 0.01 (small), 0.05 and 0.1 (large). The mapping experiments

were simulated to represent an experiment in a single environment, where there were no

epistatic effects and all QTL had small and equal additive effects. More complex

genetic models including the effects of epistasis and G×E interactions are considered in

Chapter 7. In the present study each combination of variables was replicated 100 times,

resulting in 2400 genetic model replicate scenarios.
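The size of the factorial design can be enumerated as a quick check. The sketch below assumes the three recombination fractions reported with the results in Table 6.2 (c = 0.01, 0.05 and 0.1); the variable names are illustrative.

```python
from itertools import product

mapping_population_sizes = (100, 200, 500, 1000)
heritabilities = (0.25, 1.0)
recombination_fractions = (0.01, 0.05, 0.1)
replicates = 100

# Every combination of the three experimental variables is one genetic model.
genetic_models = list(
    product(mapping_population_sizes, heritabilities, recombination_fractions)
)
total_runs = len(genetic_models) * replicates

print(len(genetic_models), total_runs)
```

Four population sizes, two heritability levels and three recombination fractions give the 24 genetic models, and 100 replicates of each give the 2400 scenarios analysed.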

6.2.2 Creating the mapping population and generating the linkage groups

One recombinant inbred line mapping population was created for each of the

genetic model scenarios. The procedures as outlined in Chapter 5, Section 5.2.1.2 were


followed for creating the mapping populations. In this case, all of the QTL were

segregating and linkage groups were generated using MAPMAKER/EXP (Lander et al.

1987). In this Chapter, the F1 was selfed to form an F2 population of a specific size (MP = 100, 200, 500 or 1000; Section 6.2.1), and a linkage map was created for each genetic model for use in the QTL detection analysis.

6.2.3 Conducting the QTL detection analysis

Following the exploratory work and comparisons reported in Chapter 5, the QTL

detection analyses were conducted using the computer program PLABQTL (Utz and

Melchinger 1996). To obtain a LOD threshold level, PLABQTL was run using the

permutation function. One thousand permutations were run for each of the genetic

models and a LOD threshold obtained. Composite interval mapping with automatic co-

factor selection was employed using the critical value α = 25% threshold obtained from

the permutation test. A LOD threshold for the critical value α = 25% was suggested by

Beavis (1998) as an acceptable value when using a genome scanning approach to

explore a genome for QTL, and was used in this experiment and throughout this thesis.

6.2.4 Conducting the statistical analyses

An analysis of variance was conducted for the 2400 genetic models tested. The

variate recorded for each of the genetic models was the number of QTL detected. The

sources of variation in the analysis of variance were per meiosis recombination fraction,

heritability level, mapping population size and all combinations of these to produce the

first-order (i.e. two-factor) interactions. All other interactions were confounded into the

residual and treated as error. The model used for the analysis of variance is shown as

Equation (6.1),

\[
x_{ijkl} = \mu + c_i + h^2_j + MP_k + (c \times h^2)_{ij} + (c \times MP)_{ik} + (h^2 \times MP)_{jk} + \varepsilon_{ijkl}, \qquad (6.1)
\]

where,

$x_{ijkl}$ is the number of QTL detected for replicate l, at per meiosis recombination fraction level i, heritability level j and mapping population size k,

$\mu$ is the overall mean,

$c_i$ is the fixed effect of the ith per meiosis recombination fraction level,

$h^2_j$ is the fixed effect of the jth heritability level,

$MP_k$ is the fixed effect of the kth mapping population size,

combinations of the above terms represent their interactions,

$\varepsilon_{ijkl}$ is the random residual effect of per meiosis recombination fraction level i, heritability level j, and mapping population size k for replicate l, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

The significance level for the analysis of variance was set at a critical value of α

= 0.05. Analyses were conducted with the fixed effects constrained to sum-to-zero

within the ASREML software (Gilmour et al. 1999). A least significant difference test

was conducted on the means of the levels within a factor that had a significant F value.
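For a balanced design, a least significant difference between two means can be approximated with the textbook formula LSD = t × sqrt(2 × EMS / n), where EMS is the error mean square and n is the number of observations per mean. The sketch below is an assumption-laden illustration and not ASREML output: it takes EMS = 1.005 from Table 6.1, replaces the t critical value with its normal approximation (reasonable with 2382 error degrees of freedom), and uses illustrative names throughout.

```python
import math
from statistics import NormalDist

ERROR_MEAN_SQUARE = 1.005  # error mean square reported in Table 6.1
TOTAL_RUNS = 2400          # 24 genetic models x 100 replicates
LEVELS = {"c": 3, "h2": 2, "MP": 4}

def lsd(n_per_mean, alpha=0.05):
    """Approximate LSD between two means, each based on n_per_mean runs."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal approx. to t
    return critical * math.sqrt(2.0 * ERROR_MEAN_SQUARE / n_per_mean)

def runs_per_mean(*factors):
    """Number of runs averaged in each mean of a main effect or interaction."""
    cells = math.prod(LEVELS[f] for f in factors)
    return TOTAL_RUNS // cells

for term in (("MP",), ("c",), ("h2",), ("c", "h2"), ("h2", "MP"), ("c", "MP")):
    print(" x ".join(term), round(lsd(runs_per_mean(*term)), 2))
```

Under these assumptions the approximate values are 0.11, 0.10 and 0.08 for the main effects and 0.14, 0.16 and 0.20 for the first-order interactions, which appear consistent with the lsd values annotated on Figures 6.3 and 6.4.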

6.3 Results

There was a difference at the 5% significance level between the three per meiosis recombination fractions, two heritability levels and four mapping population sizes (Table 6.1). There was also a significant interaction between per meiosis recombination fraction and heritability level, per meiosis recombination fraction and mapping population size, and heritability level and mapping population size (Table 6.1). The means for each significant main effect are illustrated in Figure 6.3. The significant first-order interactions are presented in Figure 6.4.

Table 6.1 Analysis of variance for the number of QTL detected. Degrees of freedom (DF) and F values are shown for per meiosis recombination fraction (c), heritability (h2), and mapping population size (MP) and first-order interactions. σ2 = error mean square

Source     DF     F value
c          2      316.8 *
h2         1      6798.7 *
MP         3      2170.7 *
c × h2     2      51.0 *
c × MP     6      38.8 *
h2 × MP    3      1429.7 *
Error      2382   σ2 = 1.005
Total      2399

* indicates significant at α = 5%, F distribution
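The degrees of freedom in Table 6.1 follow from the number of levels of each factor and the 100 replicates, with the second-order interaction pooled into the error term. A short check, using illustrative variable names:

```python
levels = {"c": 3, "h2": 2, "MP": 4}
replicates = 100

# Total degrees of freedom: one less than the number of simulation runs.
total_df = levels["c"] * levels["h2"] * levels["MP"] * replicates - 1

# Main effects: number of levels minus one.
main_df = {name: n - 1 for name, n in levels.items()}

# First-order interactions: product of the main-effect degrees of freedom.
interaction_df = {
    "c x h2": main_df["c"] * main_df["h2"],
    "c x MP": main_df["c"] * main_df["MP"],
    "h2 x MP": main_df["h2"] * main_df["MP"],
}

# Everything else, including the c x h2 x MP interaction, is treated as error.
error_df = total_df - sum(main_df.values()) - sum(interaction_df.values())

print(main_df, interaction_df, error_df, total_df)
```

The result reproduces the 2382 error and 2399 total degrees of freedom shown in the table.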


There was a significant difference between all four mapping population sizes

with the smaller mapping population sizes producing a lower percentage of QTL

detected than the larger mapping population sizes (Figure 6.3a). With a mapping

population size of 100 individuals, 57% of the QTL were detected (Figure 6.3a). The

average number of QTL detected increased as the mapping population size increased,

with 99% of the QTL detected when the mapping population size was 1000 individuals

(Figure 6.3a). For a large per meiosis recombination fraction c = 0.1, 74% of the QTL

were detected on average. As the per meiosis recombination fraction decreased to c =

0.05 and c = 0.01 the percentage of QTL detected increased to 83% and 86%, respectively (Figure 6.3b). Heritability had a significant effect on the efficiency of QTL detection (Table 6.1). Averaged over the combinations of mapping population size and per meiosis recombination fraction considered, 64% of the QTL were detected at a heritability of h2 = 0.25 and 98% at a heritability of h2 = 1.0 (Figure 6.3c).

[Figure 6.3: three bar-chart panels showing percent of QTL detected (y-axis, 0 to 100). (a) Mapping population size: 100, 200, 500, 1000. (b) Recombination fraction: 0.01, 0.05, 0.1. (c) Heritability: 0.25, 1. Panel annotations: lsd = 0.11, lsd = 0.08, lsd = 0.1.]

Figure 6.3 Percent of QTL detected (averaged over 100 runs) for each significant experimental variable from the analysis of variance. All levels within experimental variable factors were significantly different. All 10 QTL were segregating

Three first-order interactions were significant from the analysis of variance: (i) heritability × per meiosis recombination fraction (h2 × c) interaction; (ii) heritability × mapping population size (h2 × MP) interaction; and (iii) per meiosis recombination fraction × mapping population size (c × MP) interaction (Table 6.1). For the h2 × c interaction, a per meiosis recombination fraction of c = 0.01 had a higher number of QTL detected on average for a heritability of h2 = 0.25 than a per meiosis recombination fraction of c = 0.05 and c = 0.1. With a heritability of h2 = 1.0 there was no significant difference between a per meiosis recombination fraction of c = 0.01 and c = 0.05, which were significantly different from a per meiosis recombination fraction of c = 0.1 (Figure 6.4a).

[Figure 6.4: three panels showing average number of QTL detected (y-axis, 0 to 10). (a) h2 x c: x-axis recombination fraction (0.01, 0.05, 0.1), series h2 = 0.25 and h2 = 1.0. (b) h2 x MP: x-axis mapping population size (100, 200, 500, 1000), series h2 = 0.25 and h2 = 1.0. (c) c x MP: x-axis mapping population size (100, 200, 500, 1000), series c = 0.01, c = 0.05 and c = 0.1. Panel annotations: lsd = 0.14, lsd = 0.2, lsd = 0.16.]

Figure 6.4 Significant first-order interactions from the analysis of variance for the number of QTL detected. h2 = heritability, c = per meiosis recombination fraction, MP = mapping population size

For the heritability × mapping population size interaction, with a heritability of h2 = 1.0 there was no significant difference in the number of QTL detected across a mapping population size of 200, 500, and 1000 individuals (Figure 6.4b). With a heritability of h2 = 0.25 there was a significant difference between all mapping population sizes for the number of QTL detected. A mapping population size of 100 and 200 individuals in combination with a heritability of h2 = 0.25 also had a significantly lower percentage of QTL detected than with a heritability of h2 = 1.0 (Figure 6.4b). For the per meiosis recombination fraction × mapping population size interaction there was a significant difference between the three mapping population sizes (MP = 100, 200 and 500) for all three per meiosis recombination fractions. For a mapping population size of 1000 individuals there was no difference in the number of QTL detected for all per meiosis recombination fractions (Figure 6.4c).

The average results over the 100 replications for each of the 24 genetic models are presented for the number of QTL detected in Table 6.2. In Table 6.2 the first three columns describe the experimental variables of the genetic models, which were the mapping population size, heritability level, and per meiosis recombination fraction between the marker and QTL. The fourth column contains the average number of QTL detected (out of a possible 10 QTL as all QTL were segregating or polymorphic) over the 100 runs for each genetic model. The average number of QTL detected expressed as a percentage provides a measure of the power of the experimental approach used to detect QTL for each of the genetic models.

Table 6.2 Number of QTL detected (averaged over 100 runs) for a simulated Germplasm Enhancement Program mapping study for four mapping population sizes (MP), two heritability levels (h2) and three per meiosis recombination fractions (c) between a marker and QTL. Percentage of QTL detected out of the total number of polymorphic QTL also shown in parentheses

MP     h2     c      No. QTL detected (%)
100    0.25   0.10   1.7 (17%)
100    0.25   0.05   2.5 (25%)
100    0.25   0.01   3.1 (31%)
100    1.00   0.10   7.2 (72%)
100    1.00   0.05   10.0 (100%)
100    1.00   0.01   10.0 (100%)
200    0.25   0.10   3.3 (33%)
200    0.25   0.05   4.8 (48%)
200    0.25   0.01   5.9 (59%)
200    1.00   0.10   10.0 (100%)
200    1.00   0.05   10.0 (100%)
200    1.00   0.01   10.0 (100%)
500    0.25   0.10   7.0 (70%)
500    0.25   0.05   8.9 (89%)
500    0.25   0.01   9.6 (96%)
500    1.00   0.10   10.0 (100%)
500    1.00   0.05   10.0 (100%)
500    1.00   0.01   10.0 (100%)
1000   0.25   0.10   9.6 (96%)
1000   0.25   0.05   10.0 (100%)
1000   0.25   0.01   10.0 (100%)
1000   1.00   0.10   10.0 (100%)
1000   1.00   0.05   10.0 (100%)
1000   1.00   0.01   10.0 (100%)

The number of QTL detected was highly dependent on mapping population size

and heritability (Table 6.2). With a low mapping population size of 100 individuals, all

10 QTL were detected when the heritability was h2 = 1.0 and the per meiosis recombi-

nation fraction was small (c = 0.05 and 0.01). Decreasing the heritability (h2 = 0.25) and

increasing the per meiosis recombination fraction (c = 0.10) resulted in a decrease in the

number of QTL detected to 17% of the QTL being detected on average. Increasing the

mapping population size to 200 individuals with a low heritability and large per meiosis

recombination fraction resulted in an increase in the percent of QTL detected to 33%,

and to 70% and 96% for a mapping population size of 500 and 1000 individuals,

respectively (Table 6.2). With a heritability of h2 = 1.0, and a mapping population size

of 200, 500, or 1000 individuals, all of the polymorphic QTL were detected, regardless

of the per meiosis recombination fraction. For all mapping population sizes, the percent

of QTL detected increased as the per meiosis recombination fraction decreased when

the heritability was h2 = 0.25 (Table 6.2).
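The main-effect percentages reported with Figure 6.3 can be recovered, to within rounding of the tabulated means, by averaging the cell means of Table 6.2 over the other two factors. The sketch below does this; the dictionary layout and function name are illustrative assumptions.

```python
# Mean number of QTL detected (out of 10), keyed by (MP, h2, c), from Table 6.2.
table_6_2 = {
    (100, 0.25, 0.10): 1.7, (100, 0.25, 0.05): 2.5, (100, 0.25, 0.01): 3.1,
    (100, 1.00, 0.10): 7.2, (100, 1.00, 0.05): 10.0, (100, 1.00, 0.01): 10.0,
    (200, 0.25, 0.10): 3.3, (200, 0.25, 0.05): 4.8, (200, 0.25, 0.01): 5.9,
    (200, 1.00, 0.10): 10.0, (200, 1.00, 0.05): 10.0, (200, 1.00, 0.01): 10.0,
    (500, 0.25, 0.10): 7.0, (500, 0.25, 0.05): 8.9, (500, 0.25, 0.01): 9.6,
    (500, 1.00, 0.10): 10.0, (500, 1.00, 0.05): 10.0, (500, 1.00, 0.01): 10.0,
    (1000, 0.25, 0.10): 9.6, (1000, 0.25, 0.05): 10.0, (1000, 0.25, 0.01): 10.0,
    (1000, 1.00, 0.10): 10.0, (1000, 1.00, 0.05): 10.0, (1000, 1.00, 0.01): 10.0,
}

def marginal_percent(factor_index, level, n_qtl=10):
    """Percent of QTL detected for one level of a factor, averaged over the
    other two factors. factor_index: 0 = MP, 1 = h2, 2 = c."""
    values = [v for key, v in table_6_2.items() if key[factor_index] == level]
    return 100.0 * sum(values) / (n_qtl * len(values))

for index, name, factor_levels in (
    (0, "MP", (100, 200, 500, 1000)),
    (1, "h2", (0.25, 1.0)),
    (2, "c", (0.01, 0.05, 0.1)),
):
    for level in factor_levels:
        print(name, level, f"{marginal_percent(index, level):.1f}%")
```

For example, the six cells with MP = 100 average to about 57.5%, and the eight cells with c = 0.1 average to about 73.5%, close to the 57% and 74% quoted in the text.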


6.4 Discussion

The aim of most QTL mapping studies is to reliably detect as many as possible

of the true QTL that contribute towards the variation observed in a trait. The QTL

detection results observed when varying the QTL mapping population sizes, heritability

levels and per meiosis recombination fractions for the levels used in this experiment

indicate that these are important factors in the power of detection of QTL for the

Germplasm Enhancement Program mapping study.

Heritability was important in the detection of QTL as it affected the quality of

the phenotypic data collected on the progeny. When heritability was high, phenotypic

differences more reliably reflected the genetic differences. When the heritability was

low, the phenotypic values varied not only with respect to the genetic differences, but

were also strongly influenced by the environment (experimental error). Low heritability

resulted in a reduction in the power of the QTL detection analysis program to detect

QTL.
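The effect of heritability on the signal available for each QTL can be made concrete with a small calculation. The sketch below assumes the standard definition h2 = Vg / (Vg + Ve) on a single-plant basis and 10 unlinked QTL with equal additive effects, as in the genetic models of this Chapter; the function names are illustrative.

```python
def error_variance(genetic_variance, h2):
    """Error variance Ve such that h2 = Vg / (Vg + Ve)."""
    return genetic_variance * (1.0 - h2) / h2

def per_qtl_share(h2, n_qtl=10):
    """Fraction of phenotypic variance due to each of n equal, unlinked QTL."""
    return h2 / n_qtl

for h2 in (0.25, 1.0):
    print(h2, error_variance(1.0, h2), per_qtl_share(h2))
```

Under these assumptions, at h2 = 0.25 the error variance is three times the genetic variance and each QTL accounts for only 2.5% of the phenotypic variance, compared with 10% at h2 = 1.0. This gives some intuition for why much larger populations were needed at the lower heritability.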

The per meiosis recombination fraction between a marker and QTL was found to

be an important parameter in this experiment. Therefore, it is expected that since map

density will influence the likely per meiosis recombination fraction between a marker

and a QTL, map density will influence the QTL detection analysis outcomes of the

Germplasm Enhancement Program mapping studies. A smaller per meiosis recombina-

tion fraction (c = 0.01) resulted in more QTL being detected than when a larger per

meiosis recombination fraction of c = 0.1 was used. A per meiosis recombination

fraction of c = 0.01 helped to retain the linkage between markers and

QTL. As the per meiosis recombination fraction increased to c = 0.05 and c = 0.1, the

number of QTL detected decreased. This indicated that recombination events occurred

between the markers and QTL resulting in the breaking up of important linkage

relationships. It is therefore important to have a map of sufficient density to increase the

likelihood of a tight linkage between markers and favourable QTL contributing towards

the trait of interest.
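The loss of marker-QTL association with increasing recombination fraction can be illustrated with the classical result for recombinant inbred lines produced by selfing, in which the expected proportion of recombinant lines is R = 2c / (1 + 2c). The sketch below is a simplified single-marker illustration under that assumption, not the flanking-marker analysis carried out by PLABQTL, and the function names are illustrative.

```python
def ril_recombinant_fraction(c):
    """Expected proportion of recombinant lines among RILs derived by
    selfing, for per meiosis recombination fraction c: R = 2c / (1 + 2c)."""
    return 2.0 * c / (1.0 + 2.0 * c)

def marker_qtl_r2(c):
    """Squared correlation between marker and QTL genotypes among RILs,
    (1 - 2R)^2, i.e. the share of QTL variance a single marker captures."""
    return (1.0 - 2.0 * ril_recombinant_fraction(c)) ** 2

for c in (0.01, 0.05, 0.1):
    print(c, round(ril_recombinant_fraction(c), 4), round(marker_qtl_r2(c), 3))
```

Repeated meioses during inbreeding nearly double the chance of recombination between a marker and a QTL, so that a single marker at c = 0.1 would capture less than half of the QTL variance, whereas a marker at c = 0.01 would capture more than 90%. Flanking-marker methods recover more of the signal than this, but the trend is the same.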


For the additive multiple QTL models considered here, one of the most impor-

tant parameters in QTL detection was found to be mapping population size. With the

larger mapping population size (MP = 1000 individuals) the percentage of QTL

detected reached 100% in most cases considered, irrespective of the heritability or per

meiosis recombination fraction. With a mapping population size of 100, heritability and

per meiosis recombination fraction played important roles in increasing the detection of

QTL. Mapping population size alone, for the models tested in this experiment, was not

large enough to completely overcome the effects of a low heritability (h2 = 0.25) and

large per meiosis recombination fraction (c = 0.1) (Table 6.2). Overall, small mapping

population sizes (100-200 individuals) had a low power for detecting QTL. Mapping

population sizes approaching 500 to 1000 individuals gave a high, reliable power for

QTL detection across the range of genetic models tested.

As observed in this Chapter, reports of previous studies have shown that map-

ping population sizes less than 500 individuals have reduced power to identify QTL

with small effects (Beavis 1994, 1998, Utz et al. 2000), with suggestions that population

sizes need to be large (1000 individuals) to obtain QTL positions and estimate effects

with reasonable accuracy, and that even 2000 individuals may not be large enough

(Holland 2004). In addition to these findings simulation work conducted by Charmet

(2000) found that the accuracy in determining a QTL position was mostly affected by

population size and heritability and less by marker spacing. Ultimately, predicting a

definitive value for mapping population size over different heritabilities or per meiosis

recombination fractions when creating linkage maps, and for QTL detection analysis is

difficult. Breeding programs employing QTL detection will use different population

types and have many differing parameter values. Certainly, mapping population size is

one of the most important factors influencing QTL detection analysis. Based on the

results of the present study a mapping population size of at least 500 individuals would

be recommended as the minimum for the Germplasm Enhancement Program, since

some of the important traits are known to have a relatively low to moderate heritability

(h2 = 0.1 to 0.5: Nadella 1998, Peake 2002, Jensen 2004). Further, with the current

status of the Germplasm Enhancement Program marker map (Nadella 1998, Susanto


2004) there are likely to be regions of the genome with relatively low marker density

and a per meiosis recombination fraction of c ≥ 0.1.

The results of this investigation suggest that the dimensions of the preliminary

empirical QTL analyses conducted for the Germplasm Enhancement Program by

Nadella (1998) are such that there would have been limited power to detect QTL for

complex traits like grain weight and grain yield. This is consistent with the results

reported by Nadella (1998), that QTL were only detected for traits with a higher level of

heritability. Therefore, the mapping population size of 143 recombinant inbred lines

used by Nadella (1998) was most likely too small to detect many of the QTL for the

traits of interest and should be increased to at least 500 recombinant inbred lines for

subsequent investigations aimed at mapping complex traits such as grain yield and its

components.

6.5 Conclusion

The results of the present study, while restricted to additive multiple QTL models

and thus preliminary, do exhibit general features that are common in QTL detection

analysis experiments reported in the literature (Figure 6.1). The results reported in this

Chapter provide an independent verification of Beavis’s (1998) observation that a

mapping population size of at least 500 individuals is required to identify QTL with

small effects, particularly for traits with low to moderate heritabilities. The results of the

study reported in this chapter are applicable for basic genetic models and assumptions

with very few genetic complexities. In reality, the situation is likely to be more

complicated; therefore, prediction of the minimum practical mapping population size in

order to detect QTL for the Germplasm Enhancement Program mapping study is

difficult. However, it is recommended that larger population sizes of 500 to 1000

individuals be employed to overcome the complexities contributing towards the

variation observed for a trait of interest.
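The dependence of detection power on mapping population size can be illustrated with a small simulation. The following is a minimal sketch only, not the QU-GENE and PLABQTL procedure used in this thesis. It assumes a single marker linked to one of 10 equal-effect QTL in a recombinant inbred line population, lumps all other genetic and environmental variation into normally distributed noise, and uses a simple correlation test with a hypothetical critical value (z_crit) in place of a LOD threshold. The proportion of recombinant lines is taken from the standard relation for inbreeding by selfing, R = 2c/(1 + 2c). All function and parameter names are illustrative.

```python
import math
import random


def ril_recombinant_fraction(c):
    """Expected proportion of recombinant lines after inbreeding by selfing,
    given the per meiosis recombination fraction c."""
    return 2 * c / (1 + 2 * c)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def detection_power(n_lines, h2=0.25, n_qtl=10, c=0.1,
                    n_reps=200, z_crit=1.96, seed=1):
    """Fraction of simulated populations in which the marker is declared
    associated with the trait (hypothetical single-marker test)."""
    rng = random.Random(seed)
    big_r = ril_recombinant_fraction(c)
    qtl_var = h2 / n_qtl                  # share of phenotypic variance per QTL
    effect = math.sqrt(qtl_var)           # QTL coded -1/+1 has unit variance
    noise_sd = math.sqrt(1.0 - qtl_var)   # other QTL and environment, lumped
    hits = 0
    for _ in range(n_reps):
        markers, phenos = [], []
        for _ in range(n_lines):
            qtl = rng.choice((-1, 1))
            # the marker differs from the QTL allele in recombinant lines
            marker = -qtl if rng.random() < big_r else qtl
            markers.append(marker)
            phenos.append(effect * qtl + rng.gauss(0.0, noise_sd))
        r = pearson(markers, phenos)
        z = r * math.sqrt(n_lines - 2) / math.sqrt(1.0 - r * r)
        if abs(z) > z_crit:
            hits += 1
    return hits / n_reps


power_small = detection_power(100)
power_large = detection_power(1000)
print(power_small, power_large)
```

Under these assumptions the estimated power for 1000 lines is well above that for 100 lines, which is the qualitative pattern described above; the absolute values depend on the hypothetical test and threshold and should not be read as estimates for the Germplasm Enhancement Program.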

In Chapter 7 further exploratory experiments will continue to consider factors

that affect QTL detection. The extra factors considered in Chapter 7 are the effects of

introducing digenic epistatic networks or G×E interactions into the models. In Chapter 7


the per meiosis recombination fraction of c = 0.05 was removed from the models to help

keep the experiment size down to a manageable level, given the available computer

resources. The per meiosis recombination fraction of c = 0.05 was selected to be

removed as it did not contribute any extra information to the effect of per meiosis

recombination fraction that could not be obtained using per meiosis recombination

fractions of c = 0.01 and c = 0.1 (Chapter 6).

CHAPTER 7

THE EFFECT OF GENOTYPE-BY-ENVIRONMENT INTERACTIONS AND DIGENIC EPISTATIC NETWORKS ON QTL DETECTION

7.1 Introduction

In the case of the wheat Germplasm Enhancement Program, the target breeding

program of the research conducted in this thesis, empirical evidence indicates that both

epistasis and G×E interactions are important factors contributing to the genetic

architecture of grain yield in the reference population (Peake 2002, Jensen 2004,

Chapter 2, Section 2.4). Both epistasis and G×E interactions result in context dependent

effects of genes and are expected to complicate the mapping of traits and marker-

assisted selection (Cooper and Podlich 2002). Epistasis and G×E interactions are

expected to contribute to a more complex genetic architecture for traits, making QTL

detection more difficult than for the case of the additive genetic models considered in

Chapter 6. The work in this Chapter involves conducting a trait mapping study in the

presence of either: (i) a range of digenic epistatic networks, including two published

digenic networks functioning in a maize population; or (ii) a range of G×E interactions

associated with environmental variation in a target population of environments. As for

the cases considered in Chapter 6 the mapping populations considered in this Chapter

were designed to simulate the mapping process used for the Germplasm Enhancement

Program (Cooper et al. 1999a).


The term epistasis is used in this thesis to refer to gene-by-gene interactions in

the determination of the effects of a gene (or QTL) on a trait. Quantitative experimental

studies have indicated that epistatic interactions between QTL are rarely detected (Stuber

et al. 1992, Tanksley 1993). However, these findings are not universal and epistatic

interactions have been reported for quantitative traits (Damerval et al. 1994, Cheverud

and Routman 1995, Doebley et al. 1995, Lark et al. 1995, Long et al. 1995, Cockerham

and Zeng 1996, Eshed and Zamir 1996, Mackay 2001, McMullen et al. 2001, Gadau et

al. 2002, Mackay 2004). Reasons why it may be difficult to detect epistatic interactions

in mapping studies include: (i) marker-QTL associations with significant effects that are

likely to show small epistatic effects (Lynch and Walsh 1998); (ii) recombination

occurring between markers and QTL (Doebley et al. 1995); (iii) limitations with the

analysis of variance method for detecting interactions (Wade 1992); (iv) the use of

small population sizes; and (v) populations that do not control genetic background

effects (Lynch and Walsh 1998). It is important to remember that not all epistatic

interactions are negative in effect (Holland 2001), and that by finding QTL that do

interact, it may be feasible to use markers to select for individuals with favourable gene

(or QTL) combinations (McMullen et al. 2001).

G×E interactions refer to changes in the relative trait values of genotypes in dif-

ferent environmental conditions. Considered at the gene level, gene-by-environment (or

QTL-by-environment) interactions arise when the contributions of the genes to trait

values change between environment-types or environmental conditions that vary among

experiments. Empirical evidence indicates that QTL can have a consistent effect across

environments (no G×E interaction), or their effect may vary across environments (G×E

interaction) (van Eeuwijk et al. 2002). By testing the individuals in the mapping

population across multiple environments, the effect of a QTL in different environments

can be examined as opposed to testing in only one environment where the genetic

effects will be confounded with the conditions in that one environment. Significant G×E

interactions for QTL have been reported in the literature (Paterson et al. 1991, Stuber et

al. 1992, Hayes et al. 1993, Zhuang et al. 1997, Yan et al. 1998, van Eeuwijk et al.

2002). However, the detection of QTL with small effects across environments is

expected to be less likely than the detection of QTL with large effects (Koester et al.


1993). Experimental conditions that result in a low power to detect QTL, like small

mapping population sizes (Chapter 6), can result in QTL only being detected in some

environments, even if their effect is identical over all environments (Lynch and Walsh

1998).

As both G×E interaction and epistasis have been shown to be important factors

influencing grain yield variation in the reference population of the Germplasm En-

hancement Program (Chapter 2, Section 2.4) it was considered important to test the

effects of these factors on the power of QTL detection. The effects of epistasis and G×E

interactions in the genetic models were not accounted for by the QTL detection analysis

programs, to illustrate a worst-case scenario where it is assumed that these effects do not

exist (which is a common assumption, Appendix 1, Section A1.2). In this Chapter the

effects of epistasis and G×E interaction are introduced into the genetic models to

determine their impact on QTL detection. Therefore, the research reported in this

Chapter is considered to be an extension of the study reported in Chapter 6.

7.2 Materials and Methods

7.2.1 Genetic models

7.2.1.1 Core model

A simulated factorial experiment was conducted to determine the dependence of

QTL detection on mapping population size, per meiosis recombination fraction and

heritability, in the presence of epistasis or G×E interactions. The core genetic model

consisted of 10 chromosomes, with each chromosome containing one QTL evenly

spaced between two flanking markers (Chapter 6, Figure 6.2).

Based on Chapter 6, the core genetic model experimental variables evaluated

were: (i) QTL detection mapping population size (MP): 100, 200, 500 and 1000

recombinant inbred lines; (ii) heritability of the trait on an observation unit (single

plant) basis (h2): 0.25 (low) and 1.0 (high); and (iii) per meiosis recombination fraction

between a marker and QTL (c): 0.01 (small) and 0.1 (large) (Table 7.1).


Table 7.1 Experimental variable levels used to specify the core genetic models studied

Experimental variable                  Level
Number of chromosomes                  10
Number of QTL                          10
Number of flanking markers per QTL     2
Heritability                           0.25, 1.0
Per meiosis recombination fraction     0.01, 0.1
Mapping population sizes               100, 200, 500, 1000
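The factorial structure of the core model can be made explicit by enumerating the treatment combinations. The sketch below is illustrative only; the variable names are not taken from the simulation software, and the levels are those listed for the core genetic model.

```python
from itertools import product

# Levels of the three core experimental variables (core genetic model).
RECOMBINATION_FRACTIONS = (0.01, 0.1)        # c
HERITABILITIES = (0.25, 1.0)                 # h2
MAPPING_POPULATION_SIZES = (100, 200, 500, 1000)  # MP

# Every combination of c, h2 and MP is one treatment of the factorial.
core_treatments = list(product(RECOMBINATION_FRACTIONS,
                               HERITABILITIES,
                               MAPPING_POPULATION_SIZES))

print(len(core_treatments))
for c, h2, mp in core_treatments[:3]:
    print(f"c = {c}, h2 = {h2}, MP = {mp}")
```

Each of these 16 core treatments is then combined with a genetic model (an epistatic model or a number of environment-types) in the experiments that follow.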

7.2.1.2 Digenic epistatic models; E(NK) = 1(10:1)

As this section was a preliminary study of the effects of epistasis on QTL detection, a broad range of digenic epistatic models (i.e. two genes interacting to form an epistatic network) was constructed to study the effects of epistasis on the number of QTL detected (out of 10 possible QTL). Each model was tested in one environment (E = 1), as no G×E interaction was included in the epistatic models. Five digenic epistatic models (K = 1) were compared to an additive model (K = 0) in which no epistasis was present. Ten QTL were present in the core genetic model; therefore, five separate digenic epistatic networks were defined for the trait of interest. All QTL had two alleles; therefore, nine genotypes were possible for each digenic network. The epistatic networks were implemented in each model following the procedures described by Kauffman (1993) and Cooper and Podlich (2002). Epistatic effects were simulated by drawing the values for each of the nine genotypes from a uniform distribution. For the 10 gene model the same genotype values were used for each of the five sets of digenic interactions. Thus, one parameterisation defined the values for the nine genotypes, which were applied to the five epistatic networks within each model. For example, for a digenic epistatic network with two loci (A and B), each with two alleles (a and A for locus A, and b and B for locus B), the nine genotype values were:

Genotype   Value
aabb       1.17
aaBb       1.95
aaBB       1.70
Aabb       0.53
AaBb       4.00
AaBB       1.12
AAbb       3.68
AABb       2.79
AABB       0.00


These values were repeated for each of the five digenic epistatic networks (10 QTL, two

QTL interacting therefore five digenic epistatic networks) within a genetic model.

The first three epistatic models considered in this Chapter (Epi 1, Epi 2 and

Epi 3, Table 7.2) were created so that the total genotypic variance comprised a

varying proportion of epistatic variance (Table 7.2). The remaining two epistatic models

considered (referred to as Maysin & Deoxy, Table 7.2) were obtained from the work of

McMullen et al. (2001). Maysin and 3-deoxyanthocyanin synthesis are two components

of the flavonoid pathway in maize (Zea mays L.) with two interacting QTL, W23a1 and

GT119. The interaction of these two genes produces the nine genotypes for the products

of the flavonoid pathway. From the publication by McMullen et al. (2001), the values

determined for each of the genotypes for each trait were used to define the epistatic

networks in this Chapter.

Table 7.2 The percentage of additive ($\sigma^2_A$), dominance ($\sigma^2_D$) and epistatic ($\sigma^2_K$) variance of the total genotypic ($\sigma^2_G$) variance for each of the models

Epistatic model   $\sigma^2_A/\sigma^2_G$ (%)   $\sigma^2_D/\sigma^2_G$ (%)   $\sigma^2_K/\sigma^2_G$ (%)
Additive          100                           0                             0
Epi 1             27                            42                            31
Epi 2             4                             53                            43
Epi 3             15                            16                            69
Maysin            81                            6                             12
Deoxy             71                            11                            18

An overview of the sources of genetic variance for each of the models can be

observed in Table 7.2, which displays for each model the percentage of additive ($\sigma^2_A$),

dominance ($\sigma^2_D$) and epistatic ($\sigma^2_K$) variance of the total genotypic ($\sigma^2_G$) variance. These

values were calculated from the genotypic values using the orthogonal contrasts given

by Kempthorne (1969) for an F2 reference population. For the E(NK) = 1(10:0) additive

model, all of the genotypic variance, as expected, was additive (Table 7.2). For the

E(NK) = 1(10:1) models Epi 1, Epi 2, and Epi 3 there were significant non-additive

components of genetic variance. The contribution of the epistatic component of variance

to the total variance increased in the order of Epi 1, Epi 2, and Epi 3.
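The partition of genotypic variance for a digenic network can be sketched in a few lines. This is a minimal illustration, assuming an F2 reference population (genotype frequencies 1/4, 1/2, 1/4 at each locus) and two unlinked loci, so that the single-locus additive and dominance components can be computed from the marginal genotype means and the epistatic component obtained as the remainder. It is not the code used to produce Table 7.2, and the function names are hypothetical. The input is the set of nine example genotype values listed in Section 7.2.1.2.

```python
F2_FREQ = (0.25, 0.5, 0.25)  # frequencies of the three genotypes at one locus

# Rows: locus A genotype (aa, Aa, AA); columns: locus B genotype (bb, Bb, BB).
EXAMPLE_VALUES = [
    [1.17, 1.95, 1.70],
    [0.53, 4.00, 1.12],
    [3.68, 2.79, 0.00],
]


def single_locus_components(marginal_means):
    """Additive and dominance variance from the three marginal genotype means."""
    low, het, high = marginal_means
    a = (high - low) / 2.0            # additive effect
    d = het - (high + low) / 2.0      # dominance deviation
    # With allele frequencies p = q = 0.5: 2pq*a^2 and (2pq*d)^2.
    return 0.5 * a * a, 0.25 * d * d


def variance_proportions(values):
    """Return the (additive, dominance, epistatic) shares of genotypic variance."""
    cells = [(i, j) for i in range(3) for j in range(3)]
    mean = sum(F2_FREQ[i] * F2_FREQ[j] * values[i][j] for i, j in cells)
    total = sum(F2_FREQ[i] * F2_FREQ[j] * (values[i][j] - mean) ** 2
                for i, j in cells)
    means_a = [sum(F2_FREQ[j] * values[i][j] for j in range(3)) for i in range(3)]
    means_b = [sum(F2_FREQ[i] * values[i][j] for i in range(3)) for j in range(3)]
    add_a, dom_a = single_locus_components(means_a)
    add_b, dom_b = single_locus_components(means_b)
    additive = add_a + add_b
    dominance = dom_a + dom_b
    epistatic = total - additive - dominance   # remainder after main effects
    return additive / total, dominance / total, epistatic / total


shares = variance_proportions(EXAMPLE_VALUES)
print([round(100 * s) for s in shares])   # [4, 53, 43]
```

Under these assumptions the example values give 4% additive, 53% dominance and 43% epistatic variance, which agrees after rounding with the Epi 2 row of Table 7.2. A purely additive table of values (for example, the count of favourable alleles) returns 100% additive variance.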

Figure 7.1 Genotypic values for the six genetic models considered: (a) an additive model, (b-d) the random digenic epistatic networks and (e-f) the McMullen et al. (2001) maysin and 3-deoxyanthocyanin digenic epistatic networks, respectively. Panels: (a) Additive (E(NK) = 1(10:0)); (b) Epi 1, (c) Epi 2, (d) Epi 3, (e) Maysin and (f) Deoxy (all E(NK) = 1(10:1)). Each panel plots genotypic value (0 to 7) against the genotypes of Gene 1 (aa, Aa, AA) and Gene 2 (bb, Bb, BB).

For the maysin and deoxy models epistatic components of variance were pre-

sent, but were small relative to the proportion of additive variance to total variance. In


addition to the table of variances (Table 7.2) the genotypic values for the five epistatic

models can be examined graphically to observe where peaks in the trait performance

landscape occur (Figure 7.1). Figure 7.1 illustrates how peaks in the epistatic models

(Figure 7.1b-f) may not occur at the traditional additive model favourable allelic

combination of AABB (Figure 7.1a).
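The point about peaks can be checked directly on the nine example genotype values listed in Section 7.2.1.2. The short sketch below simply ranks those values; the dictionary and variable names are illustrative.

```python
# Example genotype values for one digenic epistatic network (Section 7.2.1.2).
GENOTYPE_VALUES = {
    "aabb": 1.17, "aaBb": 1.95, "aaBB": 1.70,
    "Aabb": 0.53, "AaBb": 4.00, "AaBB": 1.12,
    "AAbb": 3.68, "AABb": 2.79, "AABB": 0.00,
}

# Genotype with the highest value (the peak of this landscape).
peak = max(GENOTYPE_VALUES, key=GENOTYPE_VALUES.get)

# Genotypes ordered from highest to lowest value.
ranking = sorted(GENOTYPE_VALUES, key=GENOTYPE_VALUES.get, reverse=True)

print(peak, GENOTYPE_VALUES[peak])   # AaBb 4.0
print(ranking[-1])                   # AABB
```

For this network the peak is the double heterozygote AaBb, while AABB, the favourable combination under the additive model, has the lowest value of the nine genotypes.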

7.2.1.3 G×E interaction models; E(NK) = 1(10:0), 2(10:0), 5(10:0), 10(10:0)

A broad range of G×E interaction models were simulated to study the effect of

G×E interactions on the power of QTL detection. To examine G×E interactions, four

models were created with each model containing a different number of environment-

types (E) in the target population of environments. The four models were based on one,

two, five and 10 environment-types. As the number of environment-types increases, the

phenotypic variation due to G×E interactions increases. The gene effects in each of the

environment-types were defined as shown in Table 7.3. In Table 7.3 the gene-environment codes are: (i) 0 indicates that the gene has no effect in that environment-type; (ii) 1 indicates that the gene acts according to its m = midpoint, a = additive, d = dominance values (Falconer and Mackay 1996); and (iii) -1 indicates that the gene effect is opposite to its specified m, a, d values, giving rise to crossover gene-by-environment interactions. When there was no G×E interaction each gene was given the gene-environment code 1 (Table 7.3, Environment-type 1). The G×E interaction model with two environment-types (E = 2) included the first two columns of Table 7.3 ($\sigma^2_{GE} : \sigma^2_G = 0.12$), the model with five environment-types included the first five columns ($\sigma^2_{GE} : \sigma^2_G = 0.53$), and the model with 10 environment-types included all 10 columns ($\sigma^2_{GE} : \sigma^2_G = 2.06$). Thus, the level of G×E interaction increased with the number of environment-types included in the genetic model.


Table 7.3 The matrix of gene codes in each environment-type. A 0 indicates no G×E interaction as the gene has no effect, a 1 indicates the gene follows m = midpoint, a = additive, d = dominance values, a -1 indicates a crossover effect. This table is set out so that as the number of environment-types increases the level of complexity in the system increases as more genes are interacting with the environment-type

        Environment-type
Gene     1    2    3    4    5    6    7    8    9   10
1        1    1    1    1    1    1    1    1    1    1
2        1    1    1    1    1    1    1    1    1   -1
3        1    1    1    1    1    1    1    1    0    0
4        1    1    1    1    1    1    0    0   -1   -1
5        1    1    1    1    1    0   -1   -1    0    0
6        1    1    1    1    0   -1    0   -1   -1    0
7        1    1    1   -1   -1    1   -1    1    0   -1
8        1    1   -1    0    0   -1    1    0   -1    1
9        1    0    0    1   -1    0    1   -1    1    1
10       1   -1   -1    0    1    0   -1    1    1    0
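The structure of the gene code matrix can be examined programmatically. The sketch below is an illustration only: it stores the codes of Table 7.3 with genes as rows and environment-types as columns, selects the columns used by a model with a given number of environment-types, and counts how many genes depart from code 1 in each environment-type. The function names are hypothetical.

```python
# Gene codes from Table 7.3: one row per gene, one column per environment-type.
GENE_ENV_CODES = [
    [1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
    [1,  1,  1,  1,  1,  1,  1,  1,  1, -1],
    [1,  1,  1,  1,  1,  1,  1,  1,  0,  0],
    [1,  1,  1,  1,  1,  1,  0,  0, -1, -1],
    [1,  1,  1,  1,  1,  0, -1, -1,  0,  0],
    [1,  1,  1,  1,  0, -1,  0, -1, -1,  0],
    [1,  1,  1, -1, -1,  1, -1,  1,  0, -1],
    [1,  1, -1,  0,  0, -1,  1,  0, -1,  1],
    [1,  0,  0,  1, -1,  0,  1, -1,  1,  1],
    [1, -1, -1,  0,  1,  0, -1,  1,  1,  0],
]


def codes_for_model(n_env_types):
    """Codes used by a model with the first n_env_types environment-types."""
    return [row[:n_env_types] for row in GENE_ENV_CODES]


def genes_interacting(env_type):
    """Number of genes whose code is not 1 in an environment-type (1-based)."""
    return sum(1 for row in GENE_ENV_CODES if row[env_type - 1] != 1)


counts = [genes_interacting(e) for e in range(1, 11)]
print(counts)   # [0, 2, 3, 3, 4, 5, 5, 5, 6, 7]
```

The count of genes with a code of 0 or -1 never decreases from one environment-type to the next, which is consistent with the statement that complexity increases as environment-types are added.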

7.2.2 Creating the mapping population and generating the linkage groups

One recombinant inbred line mapping population was created for each of the ge-

netic models (i.e. core model plus the gene effects of either the epistatic models or G×E

interactions). Based on the results of Chapter 5, Section 5.3, the mapping population

was established to represent the case for the wheat Germplasm Enhancement Program

(Cooper et al. 1999a). The procedures as outlined in Chapter 5, Section 5.3.1.2 were

followed for creating the mapping population and generating the linkage groups. The

experiment was repeated 100 times with a different QTL model parameterisation for

each repeat.

7.2.3 Conducting the QTL detection analysis

The procedures described in Section 6.2.3 of Chapter 6 were used for the QTL

detection analyses. As for Chapter 6 the QTL detection analysis was conducted in one

environment assuming that no epistasis or G×E interaction was present in the mapping

population. The QTL detection analysis in this Chapter provides results on the number

of QTL detected when either epistasis or G×E interaction is present in the mapping

population; however, these factors were not accounted for in the QTL detection analysis

program.


7.2.4 Conducting the statistical analyses

An analysis of variance was conducted to determine the significant factors affecting

the number of QTL detected when five digenic epistatic networks were

analysed, along with a non-epistatic (additive) model. The variate recorded for each of

the genetic models was the number of QTL detected. The model used for the analysis of

variance is shown as Equation (7.1),

\[
x_{ijklm} = \mu + c_i + h^2_j + MP_k + B_l + (c \times h^2)_{ij} + (c \times MP)_{ik} + (c \times B)_{il} + (h^2 \times MP)_{jk} + (h^2 \times B)_{jl} + (MP \times B)_{kl} + \varepsilon_{ijklm}, \tag{7.1}
\]

where:

$x_{ijklm}$ is the number of QTL detected for observation m, at per meiosis recombination fraction level i, heritability level j, mapping population size k and epistatic model l,

$\mu$ is the overall mean,

$c_i$ is the fixed effect of the ith per meiosis recombination fraction level,

$h^2_j$ is the fixed effect of the jth heritability level,

$MP_k$ is the fixed effect of the kth mapping population size,

$B_l$ is the fixed effect of the lth epistatic model,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklm}$ is the random residual effect of per meiosis recombination fraction level i, heritability level j, mapping population size k, epistatic model l, for observation m, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

An analysis of variance was also conducted to determine whether G×E interac-

tions affected the number of QTL detected. The model used for the analysis of variance

is shown as Equation (7.2),

\[
x_{ijklm} = \mu + c_i + h^2_j + MP_k + E_l + (c \times h^2)_{ij} + (c \times MP)_{ik} + (c \times E)_{il} + (h^2 \times MP)_{jk} + (h^2 \times E)_{jl} + (MP \times E)_{kl} + \varepsilon_{ijklm}, \tag{7.2}
\]

where:

$x_{ijklm}$ is the number of QTL detected for observation m, at per meiosis recombination fraction level i, heritability level j, mapping population size k and environment-type level l,

$\mu$ is the overall mean,

$c_i$ is the fixed effect of the ith per meiosis recombination fraction level,

$h^2_j$ is the fixed effect of the jth heritability level,

$MP_k$ is the fixed effect of the kth mapping population size,

$E_l$ is the fixed effect of the lth environment-type level,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklm}$ is the random residual effect of per meiosis recombination fraction level i, heritability level j, mapping population size k, environment-type level l, for observation m, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.
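The degrees of freedom implied by Equations (7.1) and (7.2) follow from the number of levels of each factor and the 100 repeats of each treatment combination. The sketch below is a bookkeeping aid rather than part of the ASREML analysis; the function name and the dictionary layout are illustrative.

```python
from itertools import combinations


def anova_degrees_of_freedom(levels, n_reps):
    """Degrees of freedom for main effects, first-order interactions,
    error and total in a replicated factorial with fixed effects."""
    main = {name: k - 1 for name, k in levels.items()}
    interactions = {
        f"{a} x {b}": (levels[a] - 1) * (levels[b] - 1)
        for a, b in combinations(levels, 2)
    }
    n_obs = n_reps
    for k in levels.values():
        n_obs *= k
    total = n_obs - 1
    error = total - sum(main.values()) - sum(interactions.values())
    return main, interactions, error, total


# Factor levels: c (2), h2 (2), MP (4), and either six genetic models
# (additive plus five epistatic) or four numbers of environment-types.
epistasis_design = {"c": 2, "h2": 2, "MP": 4, "B": 6}
gxe_design = {"c": 2, "h2": 2, "MP": 4, "E": 4}

for design in (epistasis_design, gxe_design):
    main, interactions, error, total = anova_degrees_of_freedom(design, 100)
    print(main, interactions, error, total)
```

For the epistasis experiment this gives 9557 error and 9599 total degrees of freedom, and for the G×E experiment 6369 and 6399, in agreement with the values reported in Tables 7.4 and 7.5.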

For both models the significance level for the analysis of variance was set at a

critical value of α = 0.05. Analyses were conducted with the fixed effects constrained to

sum-to-zero within the ASREML software (Gilmour et al. 1999). A least significant

difference test was conducted on the means of the levels within a factor that had a

significant F value.
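A least significant difference for comparing two means can be sketched as follows. This is the textbook form for a balanced design, LSD = t × sqrt(2 × error mean square / n), with the t quantile replaced by a normal quantile on the assumption that the error degrees of freedom are very large (several thousand here). Whether the ASREML analysis computed its LSD values in exactly this way is not stated in this Chapter, so the function below is an approximation with a hypothetical name.

```python
import math
from statistics import NormalDist


def least_significant_difference(error_mean_square, n_per_mean, alpha=0.05):
    """Approximate LSD between two means, each based on n_per_mean
    observations, using a normal quantile in place of Student's t."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * math.sqrt(2.0 * error_mean_square / n_per_mean)


# Illustration: an error mean square of 1.25 with 1600 observations per
# mean, as applies to the environment-type main effect in Section 7.3.2
# (6400 observations spread over four numbers of environment-types).
lsd = least_significant_difference(1.25, 1600)
print(round(lsd, 3))
```

Two means that differ by more than this amount would be declared significantly different at α = 0.05 under the stated approximation.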

7.3 Results

7.3.1 Genetic Models: Additive and Epistatic

From the analysis of variance there was no significant difference between the

epistatic models, or between epistasis being present or absent (additive model) for the

number of QTL detected (Table 7.4). As observed in Chapter 6, mapping population

size, heritability and per meiosis recombination fraction significantly affected QTL

detection. The average number of QTL detected increased as mapping population size

increased from 200 to 500 and 1000 individuals. For the smaller per meiosis recombina-

tion fraction (c = 0.01), significantly more QTL were detected on average than with the

larger per meiosis recombination fraction (c = 0.1), and the higher heritability level of


h2 = 1.0 detected more QTL on average than the lower heritability of h2 = 0.25. Because

these results closely resemble those of Chapter 6 they are not shown here; refer to Figure 6.3 for trends. In

Chapter 7, a per meiosis recombination fraction of c = 0.05 was not analysed, which

must be taken into consideration when referring to Figure 6.3b.

Table 7.4 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), epistatic model (B), and first-order interactions affecting the number of QTL detected. σ2 = error mean square

Source      DF      F value
c           1       2996.0 *
h2          1       26472.0 *
MP          3       9056.3 *
B           5       2.2
c × h2      1       479.3 *
c × MP      3       335.8 *
c × B       5       0.9
h2 × MP     3       4970.7 *
h2 × B      5       1.2
MP × B      15      1.04
Error       9557    σ2 = 1.06
Total       9599

* indicates significant at α = 5%, F distribution

The significant first-order interactions were the heritability × per meiosis re-

combination fraction (h2 × c) interaction, heritability × mapping population size (h2 ×

MP) interaction and per meiosis recombination fraction × mapping population size (c ×

MP) interaction (Table 7.4). From these interactions, the number of QTL detected

increased as: (i) heritability increased and per meiosis recombination fraction decreased;

(ii) population size increased for a low heritability; and (iii) population size

increased for a per meiosis recombination fraction of c = 0.01. The means are not shown

due to their similarity to Chapter 6 (Figure 6.4).

From the analysis of variance, the effect of epistatic model was not statistically significant; therefore,

the number of QTL detected did not differ significantly among the five epistatic models and the additive model.

However, a few of the epistatic models will be presented as they show the occurrence of

false QTL. In the context of this thesis, a false QTL is assumed to exist when 11 or

more QTL were detected when only 10 were specified in the model. False QTL were

not detected with a heritability of h2 = 0.25; however, they were detected for per meiosis

recombination fractions of c = 0.01 and c = 0.1, and only for a mapping population size of


200 individuals. In the epistatic model, Epi 1, 1% of the runs with a mapping population

size of 200 individuals identified a false QTL (Figure 7.2a). False QTL were also

observed for the epistatic models Epi 2 and Epi 3, where 2% and 1%, respectively, of

the runs detected a false QTL with a mapping population size of 200 individuals (Figure

7.2b, c). In the epistatic model deoxy, 1% of the runs with a mapping population size of

200 individuals falsely identified an additional QTL (Figure 7.2d).
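The false QTL criterion used here can be written as a short function. The sketch below applies the definition given above (more QTL detected in a run than the 10 specified in the model); the function name is illustrative and the example counts are made up for demonstration, not taken from the simulation output.

```python
def percent_runs_with_false_qtl(qtl_counts, n_true_qtl=10):
    """Percentage of runs in which more QTL were detected than exist
    in the genetic model, i.e. at least one detected QTL must be false."""
    if not qtl_counts:
        raise ValueError("at least one run is required")
    false_runs = sum(1 for n in qtl_counts if n > n_true_qtl)
    return 100.0 * false_runs / len(qtl_counts)


# Hypothetical counts for 100 runs: 98 runs detect 10 QTL, 2 runs detect 11.
example_counts = [10] * 98 + [11] * 2
print(percent_runs_with_false_qtl(example_counts))   # 2.0
```

Because the criterion is based only on the count of QTL detected, it cannot flag a run in which a spurious QTL was declared while a true QTL was missed, so the percentages should be read as a lower bound on the occurrence of false QTL.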

Figure 7.2 Number of QTL detected as a percentage of the total runs for four digenic epistatic models (E(NK) = 1(10:1)) with a heritability of h2 = 1.0, a per meiosis recombination fraction of c = 0.01 (a-c) or c = 0.1 (d), and four mapping population sizes (MP = 100, 200, 500, 1000). A false QTL occurred when 11 QTL were detected. Panels: (a) Epi 1, (b) Epi 2, (c) Epi 3, (d) Deoxy. Each panel plots percent of runs (0 to 100) against number of QTL detected (0 to 12).

7.3.2 Genetic Models: Additive and G×E interaction

From the analysis of variance there was a significant effect of the level of G×E

interaction on the number of QTL detected (Table 7.5). Consistent with the results of

Chapter 6 and the additive and epistatic genetic models (Section 7.3.1), per meiosis

recombination fraction, heritability level and mapping population size were dominating

factors contributing towards variation in the number of QTL detected (Table 7.5).


Table 7.5 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), number of environment-types (E), and first-order interactions affecting the number of QTL detected. σ2 = error mean square

Source      DF      F value
c           1       1260.1 *
h2          1       12123.2 *
MP          3       4060.0 *
E           3       587.3 *
c × h2      1       226.4 *
c × MP      3       141.7 *
c × E       3       1.0
h2 × MP     3       2391.6 *
h2 × E      3       35.5 *
MP × E      9       17.6 *
Error       6369    σ2 = 1.25
Total       6399

* indicates significant at α = 5%, F distribution

Mapping population size, heritability and per meiosis recombination fraction all

significantly affected QTL detection as was observed in Chapter 6 and Section 7.3.1.

Due to their similarity the results are not shown here; refer to Figure 6.3. In addition to

these effects, the number of environment-types also significantly affected the number of

QTL detected (Figure 7.3a), with the number of QTL detected decreasing as the number

of environment-types increased.

Figure 7.3 Percent of QTL detected (averaged over 100 runs) for the number of environment-types main effect (a) and significant first-order interactions (b-c). h2 = heritability, MP = mapping population size and E = number of environment-types. Panels: (a) No. environment-types, (b) h2 × E (h2 = 0.25 and h2 = 1.0 against number of environment-types), (c) E × MP (E = 1, 2, 5 and 10 against mapping population size). Least significant differences annotated on the figure: lsd = 0.08, 0.16 and 0.11.

The significant first-order interactions in common with Chapter 6 and Section

7.3.1 were the heritability × per meiosis recombination fraction (h2 × c) interaction,

heritability × mapping population size (h2 × MP) interaction and per meiosis recombina-

tion fraction × mapping population size (c × MP) interaction (Table 7.4). The means are

not shown here due to their similarity to Chapter 6 (Figure 6.4). In addition, the


heritability × environment-types (h2 × E) interaction and environment-types × mapping

population size (E × MP) interaction were also significant (Table 7.5). For the heritabil-

ity × environment-types interaction there was a significant difference between all

numbers of environment-types over the two heritability levels (Figure 7.3b). For the

environment-types × mapping population size interaction there was a significant

difference between all mapping population sizes with a mapping population size of

1000 individuals detecting the highest number of QTL and a mapping population size of

100 individuals detecting the lowest number of QTL over all numbers of environment-

types (Figure 7.3c).

The additive models with one environment-type, E(NK) = 1(10:0), did not con-

tain any G×E interaction and were used as a reference point for the detection of QTL in

the presence of G×E interactions. For a heritability of h2 = 0.25 in combination with a

per meiosis recombination fraction of c = 0.01, as the number of environment-types in

the target population of environments increased (i.e. E = 2, 5, and 10), the distribution

of the number of QTL detected for each mapping population size shifted slightly to the

left, indicating fewer QTL were detected on average (Figure 7.4). For example, in

Figure 7.4a, the mapping population size of 1000 individuals detected 10 QTL for 100% of

the runs; however, with 10 environment-types a mapping population size of 1000

individuals detected from six to 10 QTL (Figure 7.4d). This effect also occurred in the

lower mapping population sizes; however, the effect was not as obvious as the lower

mapping population sizes had a broader distribution of the number of QTL detected for

all levels of G×E interaction.

Figure 7.4 Number of QTL detected as a percentage of the total runs for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 0.25, per meiosis recombination fraction of c = 0.01 and four mapping population sizes (MP = 100, 200, 500, 1000). Each panel plots percent of runs (0 to 100) against number of QTL detected (0 to 12).

Increasing the heritability from h2 = 0.25 (Figure 7.4) to h2 = 1.0 (Figure 7.5) re-

sulted in a higher number of QTL being detected on average. With no G×E interaction

(i.e. E = 1), all mapping population sizes detected 10 QTL for 100% of the runs (Figure

7.5a). As the number of environment-types increased, all mapping populations could

still detect 10 QTL; however, the percentage of runs where 10 QTL were detected

decreased. For this genetic model, mapping population size seemed to have little effect

on the number of QTL detected as all mapping population sizes produced the same

basic distribution for each number of environment-types.

[Figure 7.5: panels (a) to (d); x-axis: Number of QTL detected (0 to 12); y-axis: Percent of runs (0 to 100); legend: MP = 100, 200, 500, 1000]

Figure 7.5 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 1.0, per meiosis recombination fraction of c = 0.01 and four mapping population sizes (MP = 100, 200, 500, 1000)

With an increase in the per meiosis recombination fraction from c = 0.01 (Figure

7.4) to c = 0.1 (Figure 7.6), in combination with a low heritability of h2 = 0.25, a

decrease occurred in the average number of QTL detected. The decrease in the average

number of QTL detected resulted in a shift of the distribution towards the left for all

mapping population sizes (Figure 7.6). With no G×E interaction (E = 1), a mapping

population size of 1000 individuals and a per meiosis recombination fraction of c = 0.1,

10 QTL were detected for 69% of the runs (Figure 7.6a). A mapping population size of

500 individuals detected 10 QTL for 5% of the runs, while a mapping population of 100

and 200 individuals did not detect 10 QTL for any of the runs (Figure 7.6a). As the

number of environment-types increased, the average number of QTL detected de-

creased, with the biggest decrease occurring for a mapping population size of 1000

individuals (Figure 7.6).

[Figure 7.6: panels (a) to (d); x-axis: Number of QTL detected (0 to 12); y-axis: Percent of runs (0 to 100); legend: MP = 100, 200, 500, 1000]

Figure 7.6 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 0.25, per meiosis recombination fraction of c = 0.1 and four mapping population sizes (MP = 100, 200, 500, 1000)

Increasing the heritability of the trait from h2 = 0.25 (Figure 7.6) to h2 = 1.0

(Figure 7.7), in combination with a per meiosis recombination fraction of c = 0.1,

resulted in an increase in the number of QTL detected for all mapping population sizes.

With no G×E interaction (E = 1) the 500 and 1000 individual mapping population sizes

detected all 10 QTL for 100% of the runs (Figure 7.7a). A mapping population size of

200 individuals detected 10 QTL for 98% of the runs, while the distribution of QTL

detected ranged from three to 10 for a mapping population size of 100 individuals

(Figure 7.7a). As the number of environment-types increased, there was a trend for the

distribution of the number of QTL detected to broaden for each mapping population

size. For the larger per meiosis recombination fraction of c = 0.1, the 200, 500 and 1000

individual mapping population sizes for all levels of G×E interaction (Figure 7.7), gave

slightly broader distributions compared to when the lower per meiosis recombination

fraction of c = 0.01 was used (Figure 7.5d). A difference was noted for a mapping

population size of 100 individuals.

[Figure 7.7: panels (a) to (d); x-axis: Number of QTL detected (0 to 12); y-axis: Percent of runs (0 to 100); legend: MP = 100, 200, 500, 1000]

Figure 7.7 Number of QTL detected as a percentage of the total runs are shown for genetic models with no epistasis and either (a) one: E(NK) = 1(10:0), (b) two: E(NK) = 2(10:0), (c) five: E(NK) = 5(10:0), or (d) 10: E(NK) = 10(10:0) environment-types in the target population of environments with a heritability of h2 = 1.0, per meiosis recombination fraction of c = 0.1 and four mapping population sizes (MP = 100, 200, 500, 1000)

7.4 Discussion

Over all the G×E interaction and epistatic models tested in this experiment,

heritability, per meiosis recombination fraction and mapping population size contributed

significantly to the variation observed for the number of QTL detected. This result was

consistent with the results observed in Chapter 6 when only additive genetic models

were considered.

Interestingly, there was no significant difference between the additive model and

the five digenic models considered here for the number of QTL detected or between the

five digenic models for the number of QTL detected (Table 7.4). This was unexpected

as the epistatic models varied in the percentage of the total genetic variance that was

epistatically based from 12% to 69% (Table 7.2). The epistatic models from the

McMullen et al. (2001) study were similar in their genetic variances and were not

significantly different from each other or the additive model. For the epistatic models

tested in this experiment the digenic epistatic networks did not reduce the power of the

mapping experiments to detect QTL. The QTL detection analysis was able to detect

QTL when epistasis was present in the model as easily as when the additive model was

tested. There were also no significant interactions between the epistatic models and

heritability, per meiosis recombination fraction or mapping population size. The

presence of epistasis, for the models considered here, did not reduce the likelihood of

detecting QTL in any of the mapping studies simulated in this experiment.

False QTL were detected in some cases in the experiments which contained

epistasis (Figure 7.2). False QTL were only detected when the heritability of the trait

was h2 = 1.0, and the mapping population size was 200 individuals. This may be due to

a mapping population size of 100 individuals being too small to detect more than 10

QTL and mapping population sizes of 500 and 1000 individuals being sufficiently

large samples to greatly reduce the chance of detecting a false QTL. It is noted

that in this study, false QTL are only recognised if more than 10 QTL are detected. This

only occurred when experimental conditions enabled a large number of QTL to be

detected, e.g. h2 = 1.0. Therefore, it is possible that false QTL may have been present

under some of the other model and experimental conditions considered here, but simply

were not recognised by the criterion applied in this study. Regardless, the results

suggest that the numbers of false positive QTL are likely to be small. Changing the

critical value of the LOD threshold used in the QTL detection analysis (currently: α =

0.25) to a more conservative value (e.g. α = 0.05) should decrease the possibility of

detecting false QTL.
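The counting criterion described above can be sketched in a few lines. This is a minimal illustration only, assuming that the sole information available per run is the number of QTL reported; the function name and the example counts are hypothetical and are not taken from the simulation output.

```python
# Number of QTL simulated in the genetic models of this Chapter.
N_TRUE_QTL = 10


def min_false_qtl(n_detected, n_true=N_TRUE_QTL):
    """Lower bound on the number of false QTL in one run.

    A run that reports more QTL than were simulated must contain at
    least (n_detected - n_true) false QTL. Runs reporting n_true or
    fewer QTL may still contain false QTL, but the counting criterion
    cannot recognise them.
    """
    return max(0, n_detected - n_true)


# Hypothetical detection counts from five runs (illustrative only).
example_counts = [8, 10, 11, 12, 9]
false_counts = [min_false_qtl(n) for n in example_counts]
print(false_counts)
```

Because the function returns only a lower bound, a value of zero does not show that every detected QTL in that run was a true QTL.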

Increasing the amount of G×E interaction, by increasing the number of

environment-types in the target population of environments (Table 7.3), had a significant

effect on the number of QTL detected (Table 7.5). As the number of environment-types

in the target population of environments increased, the number of QTL that were

detected, on average, decreased. The level of G×E interaction in the genetic model

influences the QTL detection analysis as G×E interactions affect the QTL detection

program's ability to determine associations between phenotypic values and markers.

When there were no G×E interactions, i.e. one environment-type, the QTL detection

analysis was effective in identifying QTL as the phenotype produced the same response

in all environment-types. However, as the level of G×E interaction increased and the

number of environment-types in the target population of environments increased the

chances of finding all QTL decreased. As the genes contributing towards the variation

observed for the trait had complex gene actions in a range of environment-types, it was

difficult for the QTL detection analysis program to find consistencies in the trait value

over different environment-types, which resulted in associations not being found

between QTL and markers in some environments. With a large number of environment-types and a QTL detection analysis conducted in one environment, not all of the environment-types were sampled, so QTL specific to the unsampled environment-types were not detected. Conducting a QTL detection analysis over many environment-types will help resolve problems associated with sampling one environment. Sampling many environment-types could help identify major QTL that are detected in all environment-types and also find QTL that are only detected in certain environment-types.

In this Chapter, 100 replications of each model were conducted, allowing a dis-

tribution of the number of QTL detected across repeated runs of the same experiment to

be created. With the distributions, it was observed that the results of any one QTL study

could be highly variable for a specific genetic model. The distribution of the number of

QTL detected was broad when the trait had a low heritability (h2 = 0.25) and a per

meiosis recombination fraction of c = 0.1 (Figure 7.6). With a high heritability (h2 =

1.0) and small per meiosis recombination fraction of c = 0.01 the distribution was

narrower (Figure 7.5). In addition to the result reported in Chapter 6 (i.e. that more QTL were detected with a higher heritability and a lower per meiosis recombination fraction), it is apparent that the results are expected to have a higher repeatability with a denser genetic map, where a per meiosis recombination fraction of c = 0.01 can be achieved, and when the trait can be measured with a higher heritability.
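The distributions discussed above are tabulations of the number of QTL detected across replicate runs. A minimal sketch of that tabulation follows, assuming the per-run counts are held in a plain list; the function name and the example counts are hypothetical and do not reproduce any figure in this Chapter.

```python
from collections import Counter


def percent_of_runs(qtl_counts, max_qtl=12):
    """Return {number of QTL detected: percent of runs} for 0..max_qtl."""
    n_runs = len(qtl_counts)
    tally = Counter(qtl_counts)
    return {k: 100.0 * tally.get(k, 0) / n_runs for k in range(max_qtl + 1)}


# Hypothetical counts for 100 replicate runs of one genetic model.
counts = [10] * 60 + [9] * 30 + [8] * 10
dist = percent_of_runs(counts)

# Print only the non-empty classes of the distribution.
for n_qtl, pct in dist.items():
    if pct > 0:
        print(f"{n_qtl} QTL detected in {pct:.0f}% of runs")
```

A narrow distribution (most runs in one or two classes) corresponds to the high repeatability described in the text, while a broad distribution indicates that a single mapping study could give quite different answers.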

As observed in Chapter 6, mapping population size was an important factor in

determining the number of QTL detected. In an empirical mapping study of quantitative

traits of corn, a mapping population size of 976 F5 testcross progeny resulted in QTL

detection that accounted for 60% to 80% of the total genotypic variance, depending on

the trait (Openshaw and Frascaroli 1997). In the simulation study reported in this

Chapter, for each model tested, each mapping population size had its own distribution

for the number of QTL detected. The smallest mapping population size of 100 individuals generally had the lowest number of QTL detected, followed by an increase in the number of QTL detected as mapping population size increased to 200, 500, and 1000 individuals. A model for which a mapping population size of 100 individuals was inappropriate for QTL detection is presented in Figure 7.7a. This figure represents a model where the per meiosis recombination fraction was c = 0.1, the heritability was h2 = 1.0 and the analysis was conducted in one environment-type sampled at random from the target population of environments. Under this model all other mapping population sizes detected 10 QTL for at least 95% of the runs, while the mapping population size of 100 detected between three and 10 QTL.

Mapping population sizes of 500 and 1000 individuals detected 10 QTL in at least

some runs under all the models tested. However, this was not true of the 100 and

200 individual mapping population sizes. The larger mapping population size of 1000

individuals generally had a higher percentage of runs for each number of QTL detected

over the 500 individuals mapping population size. Occasions where mapping population

size was not as important were models with a heritability of h2 = 1.0 and a per meiosis

recombination fraction of c = 0.01 (Figure 7.5). The small likelihood of recombination

and removal of error effects when measuring the phenotype meant that the larger

mapping population sizes gave no advantage over the smaller mapping population sizes.

This effect was observed again at the larger per meiosis recombination fraction (Figure

7.7), however, in this case the increased chance of recombination meant that a mapping

population size of 100 individuals was not sufficient to detect the same number of QTL

as the other mapping population sizes. By increasing the size of mapping populations

the power to detect epistasis is expected to increase (McMullen et al. 2001) as there are

a greater number of genotypic classes represented in the mapping population, and

greater numbers of individuals within these classes.
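The argument about genotypic classes can be made concrete with a small calculation. The sketch below assumes a fully inbred recombinant inbred line population, unlinked loci and no selection during line development, so that each of the 2^k homozygous classes at k loci is equally likely; the function name is hypothetical.

```python
def expected_class_size(mp_size, n_loci):
    """Expected number of lines in each homozygous multi-locus class.

    Assumes fully inbred lines, unlinked loci and no selection, so the
    2 ** n_loci homozygous classes are equally frequent.
    """
    n_classes = 2 ** n_loci
    return mp_size / n_classes


# Digenic epistatic network: two interacting loci, four genotypic classes.
for mp in (100, 200, 500, 1000):
    print(f"MP = {mp}: {expected_class_size(mp, 2):.0f} lines expected per class")
```

Under these assumptions a mapping population of 100 lines is expected to place 25 lines in each digenic class, compared with 250 lines for a mapping population of 1000, and the expected class size halves with every additional interacting locus.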

A lower per meiosis recombination fraction was necessary to ensure a limited

amount of recombination between the marker allele and the trait QTL allele. Achieving

a high heritability was also important. Any experimental methods that could be used to

improve the heritability of the trait should be employed in mapping studies to increase

the power of the experiment to detect QTL. Larger mapping population sizes of 500 or

more individuals were necessary to ensure a large proportion of the genotypes in an

epistatic network are represented in the study and to increase the power of QTL

detection.

From the study by Nadella (1998) it is important to note that to incorporate

marker-assisted selection into the Germplasm Enhancement Program a more saturated

linkage map needs to be created to associate QTL more efficiently with markers. In the

Nadella (1998) study, 403 amplified fragment length polymorphic markers were

segregating in the Hartog/Seri mapping population for the Germplasm Enhancement

Program. Only 114 amplified fragment length polymorphic markers were used in the

mapping study, along with 10 loci of known function, to form 19 linkage groups, with

19 markers considered unlinked. A more dense linkage map could possibly be made

which could detect 21 linkage groups with inclusion of the remaining amplified

fragment length polymorphic markers. With respect to the QTL analyses conducted in the Nadella (1998) study, only 143 recombinant inbred lines, from a larger recombinant inbred line mapping population of 850 lines, were used to localise QTL. This led to the detection of 18 QTL for four quantitative traits. With the use of the

larger mapping population size, effects like linkage between QTL and pleiotropy which

could not be distinguished in the Nadella (1998) study, could possibly be examined, and

a greater power in the detection of QTL could be achieved.

7.5 Conclusion

The digenic epistatic models tested were not found to have a significant influence on the number of QTL detected. It was, however, found that the detection of false QTL did occur at low population sizes for the digenic models. The number of environment-types contributing towards G×E interactions in the target population of environments did have a significant effect on the number of QTL detected. Increasing the

number of environment-types in the genetic model resulted in a decrease in the number

of QTL detected. Per meiosis recombination fraction between a marker and QTL,

heritability and mapping population size were significant sources of variation in the

power of the mapping experiment in the detection of QTL and were consistent with the

results observed in Chapter 6. For the genetic models with either epistasis or G×E

interactions, the highest number of runs with all 10 QTL detected occurred with a per

meiosis recombination fraction of c = 0.01, heritability of h2 = 1.0 and mapping

population size of 1000 individuals. Therefore, when creating mapping populations for

the Germplasm Enhancement Program it would be important to use as large a popula-

tion size as possible to investigate the effects of epistasis and G×E interaction and detect

true QTL.

In this Chapter only a limited number of epistatic and G×E interaction models

have been considered. To draw more general conclusions on the effects of epistasis, and

G×E interaction, experiments with more complex epistatic networks and G×E interac-

tions should be considered. Epistasis and G×E interaction should also be investigated as

factors that occur simultaneously. Mapping population sizes should, if possible, be at

least 500 individuals. Part IV of this thesis progresses from these results by considering

a range of genetic models including both G×E interaction and epistasis in combination

and their effect on both QTL detection and the response to selection for a marker-

assisted selection strategy proposed for the Germplasm Enhancement Program (Chapter

9). However, before the study of these complex factors on marker-assisted selection is

considered, it is first necessary to assess the introduction of marker-assisted selection

into the Germplasm Enhancement Program S1 family breeding program for less

complex additive genetic models to provide a reference for consideration of the effect of

the complex genetic models involving epistasis and G×E interactions (Chapter 8).

PART IV

SIMULATION OF PHENOTYPIC, MARKER, AND MARKER-ASSISTED SELECTION IN THE WHEAT GERMPLASM ENHANCEMENT PROGRAM

CHAPTER 8

SELECTION RESPONSE IN THE GERMPLASM ENHANCEMENT PROGRAM FOR ADDITIVE GENETIC MODELS

8.1 Introduction

Phenotypic selection, the process of selecting individuals, lines or families based on their phenotypic performance as estimated from field experiments, is the classical direct selection method used in plant breeding programs. The ability to now

create genetic maps and find associations between markers and QTL regions allows the

possibility of exploring new indirect selection techniques that include selecting a

phenotype based on its markers (marker selection). With marker selection, marker

profiles of the breeding population are created and compared with the definition of

favourable alleles of QTL estimated from a mapping population. Plants with marker

profiles that indicate a higher frequency of favourable QTL alleles present for the trait

of interest are selected. This technique is highly dependent on the quality of the

association that is established between the markers and the QTL and on the information

the markers provide. It is expected that if the favourable alleles for all QTL for a trait are

reliably detected and can be selected on, marker selection will work well in the short-

term and long-term. However, if only a few of the possible QTL are detected then the

response from marker selection will be limited relative to the potential from phenotypic

selection. Based on the results reported in Chapters 6 and 7 it is unlikely that all QTL

and all favourable alleles of these QTL will always be detected in any mapping study.

Marker-assisted selection can improve on this limitation by incorporating phenotypic

selection together with marker selection and utilising the performance information not

accounted for by the markers.
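The distinction between the three strategies can be sketched as a selection index. The code below is an illustration only and is not the method implemented in this thesis: it assumes a marker score equal to the count of favourable marker alleles, and a hypothetical weight w_marker that blends the standardised marker score with the standardised phenotype. All function and parameter names are assumptions made for the example.

```python
def marker_score(marker_genotype, favourable):
    """Count favourable marker alleles over all marker loci.

    marker_genotype: list of (allele_1, allele_2) tuples, one per marker.
    favourable: the allele designated favourable at each marker.
    """
    return sum(allele == fav
               for locus, fav in zip(marker_genotype, favourable)
               for allele in locus)


def standardise(values):
    """Centre values and scale to unit standard deviation (zeros if constant)."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]


def mas_index(phenotypes, scores, w_marker=0.5):
    """Hypothetical index: w_marker = 0 is phenotypic selection,
    w_marker = 1 is marker selection, values in between combine both."""
    z_pheno = standardise(phenotypes)
    z_marker = standardise(scores)
    return [(1.0 - w_marker) * p + w_marker * m
            for p, m in zip(z_pheno, z_marker)]


# Three hypothetical plants scored at two marker loci (allele 1 favourable).
genotypes = [[(1, 1), (1, 1)], [(1, 0), (0, 0)], [(0, 0), (0, 0)]]
scores = [marker_score(g, [1, 1]) for g in genotypes]
print(scores, mas_index([5.0, 6.0, 4.0], scores))
```

Setting the weight between the two extremes lets the phenotype contribute the performance information that the detected markers do not capture, which is the rationale given above for marker-assisted selection.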

Marker-assisted selection has emerged as a strategy with the potential to

increase response to selection (Lande and Thompson 1990, Lande 1992, Dudley 1993),

with results showing that marker-assisted selection produces greater selection gains than

phenotypic selection for a normally distributed quantitative trait (Knapp 1998). Despite

this theory, marker-assisted selection for quantitative traits has rarely been utilised in

breeding programs for complex traits such as grain yield. Before marker-assisted

selection techniques will be readily incorporated into a breeding program, it is necessary

to demonstrate that marker-assisted selection is capable of producing greater genetic

gains than those observed with phenotypic selection. A number of theoretical (Van

Berloo and Stam 1999, Yousef and Juvik 2001) and simulation studies (Zhang and

Smith 1992, 1993, Edwards and Page 1994, Gimelfarb and Lande 1994a, 1994b, 1995,

Whittaker et al. 1995, Hospital and Charcosset 1997, Whittaker et al. 1997, Cooper and

Podlich 2002) have been conducted to compare marker-assisted selection and pheno-

typic selection. A general conclusion drawn from these papers is that for the models

tested, marker-assisted selection is capable of producing a rapid response to selection,

which declines with time relative to phenotypic selection.

As shown in Chapters 6 and 7, many factors affect QTL detection, including mapping population size, map density (due to its effect on the per meiosis recombination fraction between the marker and QTL) and heritability of the trait. Since these factors influence QTL detection, they will also have a carry-through effect on both

marker selection and marker-assisted selection strategies. In this Chapter the difference

in gene frequency between the mapping population and the breeding reference popula-

tion in which selection is ultimately applied will also be analysed for impact on marker

selection and marker-assisted selection. A synopsis on the importance of each of these is

provided below.

Population size has been shown to be one of the most limiting factors when de-

tecting QTL (Beavis 1998, Chapters 6 and 7). Low population sizes result in low QTL

detection numbers, as well as the detection of false QTL. A problem with low popula-

tion sizes (i.e. < 500 individuals; Chapters 6 and 7) is that they are not able to sample all

of the segregating QTL combinations. Of the genotypes present in the population, there

will be large variation around the phenotypic values of these individuals due to the low

sampling rate of the different genotypes, which leads to poor QTL detection.

Map density influences the likelihood that segregating markers will be located

close to the QTL for the trait of interest. Map density affects the expected per meiosis

recombination fraction between the marker and the QTL and the persistence of linkage

disequilibrium between the marker alleles and the alleles of the gene(s) contributing to

the QTL. As the linkage between markers and QTL weakens (i.e. as the per meiosis recombination fraction increases), the probability of crossover events increases. This can lead to favourable QTL

combinations being broken up, as well as previously designated unfavourable marker

alleles being linked with favourable QTL alleles. A small per meiosis recombination

fraction between a marker and QTL is expected to lead to greater QTL detection than do

the larger values. The impact of the strength of linkage association between markers and

QTL in marker-assisted selection is influenced by the number of opportunities for

meiotic events that allow for recombination between the marker and the QTL in

individuals that are heterozygous for both the marker and the QTL.
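The effect of repeated meiotic opportunities can be illustrated with the standard result that, under random mating, the linkage disequilibrium between two loci declines by a factor of (1 - c) per generation. The sketch below applies this idealised result to the three recombination fractions considered in this Chapter. It assumes one round of random mating per cycle, which is a simplification: the S1 recurrent selection scheme includes selfing, which would slow the decay. The function name is hypothetical.

```python
def ld_remaining(c, generations):
    """Fraction of the initial marker-QTL disequilibrium remaining after
    the given number of generations of random mating, (1 - c) ** t."""
    if not 0.0 <= c <= 0.5:
        raise ValueError("c must lie between 0 and 0.5")
    return (1.0 - c) ** generations


# Ten generations, matching the 10 cycles of selection simulated.
for c in (0.01, 0.1, 0.2):
    print(f"c = {c}: {ld_remaining(c, 10):.3f} of the initial disequilibrium remains")
```

Under these assumptions about 90% of the initial association persists after 10 generations at c = 0.01, compared with about 35% at c = 0.1 and about 11% at c = 0.2.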

Heritability affects the reliability of the phenotypic values of the traits measured

on the individuals in the mapping population. A low heritability indicates that the

phenotypic values include a large amount of error. This can lead to poor QTL detection

power, as, during QTL detection analysis, an association cannot easily be made between

markers and a QTL contributing to trait variation. In the best case where heritability

approaches 1.0, there is little error contributing towards the phenotypic values, and

marker-QTL associations will be more easily detected.
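The relationship between heritability and error can be written out directly. With heritability defined on a single-plant basis as h2 = Vg / (Vg + Ve), the error variance implied by a given heritability is Ve = Vg(1 - h2) / h2. The sketch below uses this relationship to add error to genotypic values; it is an illustration under that definition, with hypothetical function names, and is not the QU-GENE implementation.

```python
import random


def error_variance(genetic_variance, h2):
    """Error variance implied by h2 = Vg / (Vg + Ve)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be greater than 0 and at most 1")
    return genetic_variance * (1.0 - h2) / h2


def simulate_phenotypes(genotypic_values, h2, rng):
    """Add normally distributed error so the expected heritability is h2."""
    n = len(genotypic_values)
    mean = sum(genotypic_values) / n
    vg = sum((g - mean) ** 2 for g in genotypic_values) / n
    sd_e = error_variance(vg, h2) ** 0.5
    return [g + rng.gauss(0.0, sd_e) for g in genotypic_values]


rng = random.Random(2005)  # fixed seed so the example is repeatable
genotypic = [0.0, 5.0, 10.0, 15.0, 20.0]  # hypothetical genotypic values
print(error_variance(1.0, 0.25))  # error variance is three times Vg
print(simulate_phenotypes(genotypic, 1.0, rng) == genotypic)  # no error at h2 = 1.0
```

At h2 = 0.25 the error variance is three times the genetic variance, which is why the marker-QTL associations are harder to detect than at h2 = 1.0, where the phenotype equals the genotypic value.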

For the mapping population constructed to support the Germplasm Enhancement

Program the parents of the mapping population were selected from the reference

population of the breeding program (Nadella 1998, Cooper et al. 1999a, Susanto 2004).

For this situation the starting gene frequency for alleles of a QTL in the reference

population of the breeding program affects the likelihood that the QTL will be segregat-

ing in the mapping population and therefore, the likelihood that the QTL will be

detected. If a QTL allele has a low frequency in the potential set of parents of a mapping

population then it is likely the QTL will be monomorphic in the mapping population.

Any QTL that are monomorphic in the mapping population and are important for the

genetic variation for the trait in the breeding program cannot be detected in the mapping

population, leading to a general decrease in the number of relevant QTL that can be

detected in a mapping population. With a higher starting QTL allele frequency in the

reference population, the parents of the mapping population are more likely to be

polymorphic for the QTL, leading to a higher number of segregating QTL in the

mapping population.
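The link between starting gene frequency and segregation in the mapping population can be sketched with a simple probability. The calculation assumes two fully inbred parents drawn at random from the reference population, each fixed for the favourable allele with probability equal to the gene frequency, and independent loci. These are assumptions made for illustration; parents chosen deliberately (for example, for contrasting phenotypes) would not follow this expectation. The function names are hypothetical.

```python
def prob_polymorphic(gene_frequency):
    """Probability that two random inbred parents carry different alleles
    at a biallelic QTL: 2p(1 - p)."""
    p = gene_frequency
    return 2.0 * p * (1.0 - p)


def expected_segregating_qtl(gene_frequency, n_qtl=10):
    """Expected number of QTL segregating in a bi-parental population."""
    return n_qtl * prob_polymorphic(gene_frequency)


for gf in (0.1, 0.5):
    print(f"GF = {gf}: P(polymorphic) = {prob_polymorphic(gf):.2f}, "
          f"expected segregating QTL = {expected_segregating_qtl(gf):.1f} of 10")
```

Under these assumptions a starting gene frequency of 0.1 gives an expectation of fewer than two of the 10 QTL segregating in a bi-parental population, compared with five at a starting gene frequency of 0.5.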

Starting gene frequency can also be an important factor in achieving a response

to selection in a breeding program. If the starting gene frequency is low, then response

to selection can be slow initially but the potential for improvement is high. With a

higher starting gene frequency, the allele is already present at a relatively high level in

the population and in some cases it may be easy to select for the allele and fix the

favourable allele in the population. However, since the starting gene frequency is higher

the potential for changing the population mean trait value may be less.

The aim of this chapter was to use simulation to examine the potential advan-

tages of marker-assisted selection in the Germplasm Enhancement Program over both

phenotypic and marker selection. The three selection strategies were applied as

variations of the S1 family recurrent selection breeding program for a quantitative trait

determined by additive finite locus genetic models and compared by measuring their

response to selection over 10 cycles of selection. The influences of epistasis and G×E

interaction are considered in Chapter 9. The impact of these three selection methods and

their response to selection are compared for varying levels of heritability, per meiosis

recombination fraction, starting gene frequency in the breeding program reference

population, QTL mapping population size, and combinations of lines used as parents of

the mapping population. The results of this simulation study were used to design the

more comprehensive simulation experiment considered in Chapter 9.

8.2 Materials and Methods

8.2.1 Genetic models

The simulation experiment involved the use of two computer programs, the ge-

netic simulation program QU-GENE (Podlich and Cooper 1998), and the QTL detection

analysis program PLABQTL (Utz and Melchinger 1996). The QU-GENE engine

(QUGENE) was used to simulate reference populations for the Germplasm Enhance-

ment Program breeding program according to predefined genetic models (Table 8.1).

Two QU-GENE modules were developed: (i) GEXPV2, which was used to create the

marker, phenotypic, and map data from QUGENE required as input by PLABQTL; and

(ii) GEPMAS, which utilises the QTL detection analysis results from PLABQTL to

conduct marker-assisted selection, marker selection, and phenotypic selection in the

Germplasm Enhancement Program S1 recurrent selection breeding program (Figure

8.1).

[Figure 8.1 flow: QUGENE, GEXPV2, PLABQTL, GEPMAS]

Figure 8.1 Schematic outline of the sequence of computer programs used to determine response to selection in the GEP. QUGENE is the QU-GENE engine, GEXPV2 used the output from QUGENE to create input data for PLABQTL. PLABQTL then conducts the QTL detection analysis. GEPMAS is a QU-GENE module that conducts S1 recurrent selection by phenotypic selection and using the QTL detected by analysis using PLABQTL also conducts marker selection and marker-assisted selection

A factorial experiment based on 36 genetic models was conducted to observe the

response to selection for the implementation of phenotypic selection, marker selection,

and marker-assisted selection in the S1 recurrent selection breeding program conducted

to simulate the Germplasm Enhancement Program breeding strategy. The core model

consisted of 10 chromosomes with each chromosome having one QTL spaced between

two flanking markers (refer to Figure 6.2).

The genetic models considered were all based on finite-locus, additive genes

with effects of the same magnitude. The experimental variables used in this experiment

were: (i) QTL mapping population size (MP): 200, 500 and 1000; (ii) heritability of the

trait on an observational unit (single plant) basis (h2): 0.25 and 1.0; (iii) per meiosis

recombination fraction between marker and QTL (c): 0.01 (small), 0.1 (intermediate)

and 0.2 (large); and (iv) starting gene frequency of the favourable QTL allele in the base

population of the Germplasm Enhancement Program (GF): 0.1 and 0.5 (Table 8.1).

Table 8.1 Experimental variable levels used to specify the core genetic models studied

Experimental variable                  Level
Number of chromosomes                  10
Number of QTL                          10
Number of flanking markers / QTL       2
Heritability                           0.25, 1.0
Per meiosis recombination fraction     0.01, 0.1, 0.2
Starting gene frequency                0.1, 0.2
Mapping population sizes               200, 500, 1000
Replications                           5
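The factorial structure of the experiment can be enumerated directly. The sketch below builds every combination of the four experimental variables, with the levels taken from the text of this section (the starting gene frequency levels used here are 0.1 and 0.5, as given in the text). The variable names are hypothetical and the code is an illustration, not part of QU-GENE.

```python
from itertools import product

# Levels of the four experimental variables, as listed in the text.
MAPPING_POPULATION_SIZES = (200, 500, 1000)
HERITABILITIES = (0.25, 1.0)
RECOMBINATION_FRACTIONS = (0.01, 0.1, 0.2)
STARTING_GENE_FREQUENCIES = (0.1, 0.5)

# One dictionary per genetic model in the factorial experiment.
models = [
    {"MP": mp, "h2": h2, "c": c, "GF": gf}
    for mp, h2, c, gf in product(MAPPING_POPULATION_SIZES,
                                 HERITABILITIES,
                                 RECOMBINATION_FRACTIONS,
                                 STARTING_GENE_FREQUENCIES)
]

print(len(models))  # 3 x 2 x 3 x 2 combinations
print(models[0])
```

The product of the numbers of levels, 3 x 2 x 3 x 2, gives the 36 genetic models referred to in the text.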

For this simulation experiment no epistatic or genotype-by-environment interac-

tion effects were included in the genetic models. All QTL mapping populations and

breeding program experiments were conducted in a single environment. In the notation

of the E(NK) model, all of the genetic models were E(NK) = 1(10:0). The QTL

detection analysis for each genetic model (within each gene frequency) was conducted

using the same five bi-parental mapping populations (i.e. five replications, with each

replicate representing a different pair of parents selected from the 10 parents used to

create the S1 recurrent selection breeding program base population).
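The factorial structure of the experiment described above can be sketched in a few lines of Python. This is only an illustrative enumeration of the factor levels listed in this section; the variable and function names are hypothetical and are not part of the QUGENE software.

```python
from itertools import product

# Factor levels taken from Section 8.2.1; the variable names are illustrative.
MAPPING_POPULATION_SIZES = (200, 500, 1000)   # MP
HERITABILITIES = (0.25, 1.0)                  # h2, single plant basis
RECOMBINATION_FRACTIONS = (0.01, 0.1, 0.2)    # c, between marker and QTL
GENE_FREQUENCIES = (0.1, 0.5)                 # GF, favourable QTL allele
N_REPLICATES = 5                              # bi-parental mapping populations


def enumerate_genetic_models():
    """Return one dict per genetic model (full factorial of the four factors)."""
    return [
        {"MP": mp, "h2": h2, "c": c, "GF": gf}
        for mp, h2, c, gf in product(
            MAPPING_POPULATION_SIZES,
            HERITABILITIES,
            RECOMBINATION_FRACTIONS,
            GENE_FREQUENCIES,
        )
    ]


models = enumerate_genetic_models()
# Every genetic model is analysed in each mapping population replicate.
experiments = [(model, rep) for model in models
               for rep in range(1, N_REPLICATES + 1)]
print(len(models), "genetic models;", len(experiments), "model-by-replicate combinations")
```

The counts produced (36 genetic models and 180 model-by-replicate combinations) are the ones referred to later in this Chapter.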

8.2.2 Creating the mapping population and generating linkage groups

One of the limitations of trait mapping noted by Spelman and Bovenhuis (1998)

is whether QTL detected in a mapping population are directly applicable for use in a

breeding program. In this study there is a clear relationship between the mapping

population and the breeding population (Figure 8.2). The procedures outlined in Chapter

5, Section 5.3.1.2 were followed to create mapping populations that represented the case

for the wheat Germplasm Enhancement Program (Cooper et al. 1999a). In this Chapter,

a mapping population was created for each of the five replicates (within each gene

frequency).


[Figure 8.2: flow chart. Box labels: 10 initial parents; QUGENE (define g-e system); cross two extreme parents; single seed descent (n > 10 generations); RIL mapping population; genotype parents; QTL detection analysis; half diallel; space plant population; marker profile; S1 family production; METs. The three breeding strategy columns are labelled PS, MAS and MS.]

Figure 8.2 Schematic outline of the sequence of procedures used to simulate the creation of the mapping population (for QTL detection analysis) and the Germplasm Enhancement Program base population. The orange arrows show the information from the QTL detection analysis utilised in the marker selection (MS) and marker-assisted selection (MAS) strategies. The two parents used to create the mapping population are also included in the 10 parent structure used to create the half diallel population of the Germplasm Enhancement Program S1 recurrent selection breeding program (see Figure 8.3). PS = phenotypic selection, RIL = recombinant inbred line

In earlier experimental work (Chapters 5, 6 and 7) MAPMAKER/EXP (Lander

et al. 1987) was used to determine the linkage groups. From the results presented in

Appendix 2, Section A2.1 it was shown that there was consistency between the

specified per meiosis recombination fraction entered into the QUGENE engine and the

linkage groups generated by MAPMAKER/EXP for a mapping population size of 1000

individuals. Therefore, based on these results and to save computing time, MAPMAKER/EXP was not executed in this Chapter; instead, a genetic map was generated in the GEXPV2 module from the values specified by the user in the QUGENE engine input file. It is recognised that removing the map generation step from the simulation of the

marker selection and marker-assisted selection breeding strategies and directly utilising

the true genetic map will remove a source of error from the simulation of these two

breeding strategies. However, given the consistency of results between the estimated


and true maps for mapping populations based on 1000 individuals, this source of error

and its potential effects on the simulated results of marker selection and marker-assisted

selection are considered to be small.

8.2.3 Assigning marker profiles

Using the results of the QTL detection analysis, a marker value is assigned to

each individual in the space plant population stage of the Germplasm Enhancement

Program based on its marker profile (Figure 8.2). These inferred QTL genotypes based

on marker profiles are used to implement marker selection and marker-assisted selection

(Figure 8.3). The trait QTL value of an individual is defined as the sum of individual

QTL values for all of the segregating markers. This procedure is explained by way of an

example. For a QTL (Q) with two flanking markers (M and N), the favourable alleles are defined as Q, M and N, and the unfavourable alleles as q, m and n, respectively. In practice these designations of favourable and unfavourable alleles are based on the results of the mapping analysis. Each favourable marker allele is assigned a value of two (in the inbred case a value of two is used because the duplicate chromosome will be identical), while each unfavourable marker allele is assigned a value of zero. An example of assigning marker values is shown below for a case where two QTL are segregating in the breeding population. Because the favourable QTL allele is present at both QTL in every example, the true QTL value of each example is four (two per QTL). As each QTL is flanked by two markers, the assigned marker value when all favourable marker alleles are present (eight) is twice the true QTL value; this is the preferred case for marker selection and marker-assisted selection, as it ensures that the correct QTL allele is being selected.

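Example   Linkage group 1   Linkage group 2   Assigned marker value   True QTL value
1         M Q N  (2, 2)     M Q N  (2, 2)     8                       4
2         M Q n  (2, 0)     M Q n  (2, 0)     4                       4
3         M Q N  (2, 2)     m Q n  (0, 0)     4                       4
4         m Q n  (0, 0)     m Q n  (0, 0)     0                       4

Values in parentheses are the marker values assigned to the two flanking marker alleles. Each favourable QTL allele (Q) contributes a true QTL value of two.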


For example 1, two segregating QTL were present on two linkage groups. Each QTL was flanked by two favourable marker alleles and each favourable marker allele was assigned a marker value of two, giving a total assigned marker value of eight. Therefore, when this individual is selected on its assigned marker value it is assumed that the favourable QTL alleles are being indirectly selected. For example 2, a recombination event has occurred between the QTL Q and marker N on both linkage groups. There is one favourable and one unfavourable marker allele on each linkage group, resulting in an assigned marker value of two per linkage group and four in total. With the lower marker value it is assumed that a recombination event may have caused the unfavourable QTL allele to be present, even though this has not occurred. The same assumption applies to example 3, where no recombination has occurred on linkage group 1 and the favourable marker alleles are associated with the favourable QTL allele, but there have been two crossover events on linkage group 2. This individual is therefore given an assigned marker value of four which, on a scale where a fully favourable marker profile scores eight, underestimates its true QTL value. In example 4, two recombination events have occurred on each linkage group, one between marker M and QTL Q and one between QTL Q and marker N. In this situation the favourable QTL allele is linked to the unfavourable marker alleles (as defined from the QTL mapping results), resulting in an assigned marker value of zero. This individual, and ultimately the favourable QTL allele, will not be selected as the marker profile is equal to zero.
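The four examples can be reproduced with a short Python sketch. It assumes, purely for illustration, that each linkage group is written as a (left marker, QTL, right marker) tuple of allele symbols with upper case denoting the favourable allele; the function names are hypothetical and do not correspond to routines in the GEPMAS module.

```python
MARKER_ALLELE_VALUE = 2   # value assigned to each favourable marker allele (inbred case)
QTL_ALLELE_VALUE = 2      # true value contributed by each favourable QTL allele


def assigned_marker_value(linkage_groups):
    """Sum the values of the flanking marker alleles over all linkage groups."""
    return sum(
        MARKER_ALLELE_VALUE
        for left, _qtl, right in linkage_groups
        for allele in (left, right)
        if allele.isupper()          # upper case = favourable marker allele
    )


def true_qtl_value(linkage_groups):
    """Sum the values of the favourable QTL alleles over all linkage groups."""
    return sum(QTL_ALLELE_VALUE for _l, qtl, _r in linkage_groups if qtl.isupper())


# The four examples of Section 8.2.3 (two segregating QTL, one per linkage group).
examples = {
    1: [("M", "Q", "N"), ("M", "Q", "N")],
    2: [("M", "Q", "n"), ("M", "Q", "n")],
    3: [("M", "Q", "N"), ("m", "Q", "n")],
    4: [("m", "Q", "n"), ("m", "Q", "n")],
}

for number, profile in examples.items():
    print(number, assigned_marker_value(profile), true_qtl_value(profile))
```

Only the marker alleles enter the assigned value, which is why example 4 scores zero even though both favourable QTL alleles are present.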

8.2.4 Conducting the QTL detection analysis

The procedures described in Section 6.2.3 of Chapter 6 for implementing

PLABQTL were used here for the QTL detection analyses. As for Chapter 6 the QTL

detection analysis was conducted in one environment assuming that no epistasis or G×E

interaction was present in the mapping population. The detected QTL were then used to

conduct marker selection and marker-assisted selection in the QU-GENE GEPMAS

module (Figures 8.1, 8.2 and 8.3).


8.2.5 Simulating phenotypic selection, marker selection and marker-assisted selection for S1 families in the Germplasm Enhancement Program

The S1 recurrent selection breeding program modelled in the GEPMAS module

(Figure 8.1) is an adaptation of the Germplasm Enhancement Program of the Northern

Wheat Improvement Program. The GEPMAS module allows the modelling of pheno-

typic selection (current Germplasm Enhancement Program selection method), marker

selection, and marker-assisted selection over 10 cycles of selection. Table 8.2 contains

the experimental variables used in the GEPMAS module.

Table 8.2 Experimental variable levels utilised in the GEPMAS module. METs = multi-environment trials, GEP = Germplasm Enhancement Program

Experimental variable                          Level
Number of environments in METs in GEP          10
Number of cycles of GEP                        10
Number of families in METs in GEP              500 (50 selected)
Number of runs                                 100
Population types                               S1
Number of bi-parental mapping populations      5
Selection type                                 PS, MS, MAS

All selection methods (Figure 8.3) start with the creation of a reference population from a half diallel among the 10 Germplasm Enhancement Program parents (Figure 8.2). The F1 progeny of the half diallel are randomly mated for one cycle to create the first S0, or space plant, population (10000 individuals; Fabrizius et al. 1996).

For phenotypic selection, 500 individuals were randomly sampled from the

space plant population. During the S1 family production phase, S1 families were created

for each of the 500 sampled S0 individuals. Multi-environment trials were conducted on

these 500 S1 families. A multi-environment trial size consisting of a random sample of

10 environments was applied as this is the target multi-environment trial size for the

Germplasm Enhancement Program based on the studies reported by Cooper et al. (1995,

1997). The top 50 S1 families were selected on their mean phenotypic values across the

10 environments sampled from the target population of environments in the multi-

environment trials. Reserve S1 seed from the seed increase of the 50 selected families


was then randomly mated to create the new space plant population for the next cycle

(Figure 8.3, PS).

[Figure 8.3: three flow charts, labelled PS, MS and MAS, each starting from a half diallel and a space plant population (10 000). PS: S1 family production; METs; random mating step 1. MS: marker profile; random mating step 2. MAS: marker profile; S1 family production; METs; random mating step 3.]

Figure 8.3 Schematic outlines of the simulation of phenotypic selection (PS), marker selection (MS), and marker-assisted selection (MAS) procedures in the S1 recurrent selection module (GEPMAS) used to simulate the Germplasm Enhancement Program. For phenotypic selection, 1 indicates random mating of the reserve seed from the seed increase after multi-environment trials (METs) have been performed; for marker selection, 2 indicates random mating of the selected plants from the space plant population based on their marker profile; and for marker-assisted selection, 3 indicates random mating of the reserve seed from the seed increase after marker profiles and multi-environment trials have been performed. The three strategies of the Germplasm Enhancement Program simulated here can be compared to the more detailed description of the Germplasm Enhancement Program given in Chapter 2, Figure 2.5

For marker selection, plants were selected solely on their marker profile; no phenotypic evaluation was conducted. A marker profile was created for all 10000 space plants based on the results of the QTL detection analysis, and a QTL trait value was determined for each of the 10000 individuals from its marker profile as in Section 8.2.3. The top 50 individuals, based on the QTL trait values determined from their


marker profiles were selected and randomly mated to create the new space plant

population (Figure 8.3, MS).

The marker-assisted selection strategy considered for the Germplasm Enhancement Program in this thesis was implemented as a two-stage tandem process. Selection on marker-QTL associations was conducted on the space plant population in the first stage and selection on phenotypic performance in a multi-environment trial was conducted in the second stage. Therefore, in stage one of marker-assisted selection a

marker profile and QTL trait values were determined for all 10000 space plants based

on the results of the QTL detection analysis. The top 500 individuals were selected

based on their marker profile and QTL trait values. In stage two of marker-assisted

selection, S1 families were created for each of these 500 individuals selected from stage

one. The 500 S1 families were then evaluated in a 10 environment multi-environment

trial as for phenotypic selection. The family mean phenotypic value for the trait was

estimated from the multi-environment trials. The 50 S1 families with the highest trait

mean phenotype were selected. The reserve seed of the top 50 families selected

following marker-assisted selection (stage one = marker selection and stage two =

phenotypic selection), were randomly mated to create the new space plant population

for the next cycle of the breeding program (Figure 8.3, MAS). The approach to

implementing marker selection and marker-assisted selection represents the current

strategy under evaluation for the Germplasm Enhancement Program (Cooper et al.

1999a).
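The selection step of the three strategies can be summarised as truncation selection on different scores. The following Python sketch is a simplified illustration only: it assumes that a marker value and an S1 family mean across the multi-environment trial are already available for every space plant, it replaces the genetic simulation with random numbers, and all function names are hypothetical rather than taken from the GEPMAS module.

```python
import random

N_SPACE_PLANTS = 10_000   # size of the space plant (S0) population
N_FAMILIES = 500          # S1 families evaluated in the multi-environment trials
N_SELECTED = 50           # individuals or families retained per cycle


def top(candidates, score, n):
    """Truncation selection: return the n candidates with the highest score."""
    return sorted(candidates, key=score, reverse=True)[:n]


def phenotypic_selection(plants, family_mean, rng):
    """PS: random sample of 500 plants, then top 50 on S1 family mean."""
    sampled = rng.sample(plants, N_FAMILIES)
    return top(sampled, family_mean, N_SELECTED)


def marker_selection(plants, marker_value):
    """MS: top 50 of all space plants on assigned marker value only."""
    return top(plants, marker_value, N_SELECTED)


def marker_assisted_selection(plants, marker_value, family_mean):
    """MAS: top 500 on marker value (stage one), then top 50 on family mean (stage two)."""
    stage_one = top(plants, marker_value, N_FAMILIES)
    return top(stage_one, family_mean, N_SELECTED)


# Placeholder scores (assumption): random numbers stand in for simulated values.
rng = random.Random(2005)
plants = list(range(N_SPACE_PLANTS))
marker_values = [2 * rng.randint(0, 20) for _ in plants]    # 0 to 40 for 10 QTL
family_means = [rng.gauss(0.0, 1.0) for _ in plants]

ps = phenotypic_selection(plants, family_means.__getitem__, rng)
ms = marker_selection(plants, marker_values.__getitem__)
mas = marker_assisted_selection(plants, marker_values.__getitem__,
                                family_means.__getitem__)
```

In each case the selected set would then be randomly mated to form the next space plant population, a step that is not shown in this sketch.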

To implement the marker selection and marker-assisted selection strategies, the

QTL were detected for each of the 180 experimental combinations (36 genetic models ×

five mapping populations). The QTL detection analysis listed the QTL name, and which

chromosome it was detected on. This information was used to simulate marker selection

and marker-assisted selection. Each of the breeding strategies was implemented

separately for 10 cycles of selection for each of the 36 genetic models and five bi-

parental mapping population replications. The response to selection based on the trait

mean value was recorded for each cycle of selection. The trait mean value is expressed

as a percentage of the target genotype (Podlich and Cooper 1998), which in the case of


the additive QTL models considered here, is the percentage of favourable alleles present

in the population in relation to the target genotype. The target genotype may be

specified in the QUGENE engine or, for simple genetic models it is the presence of the

favourable allele for the genes contributing towards the trait; e.g. for a simple additive

genetic model with three loci the target genotype would be AABBCC. Each of the 180

experiments was simulated 100 times for each selection strategy, with the results

averaged over the 100 runs. The results were then averaged over the five bi-parental

mapping population replications for graphing the response to selection (36 separate

graphs).
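For the additive models with equal gene effects used in this Chapter, the trait mean value as a percentage of the target genotype reduces to the percentage of favourable alleles in the population. A minimal Python sketch of that calculation follows; it assumes each individual is recorded as the number of favourable alleles (0, 1 or 2) carried at each locus, and the function name is hypothetical.

```python
def percent_target_genotype(population):
    """Mean trait value as a percentage of the target genotype (%TG).

    population: list of individuals, each a list holding the number of
    favourable alleles (0, 1 or 2) at each locus. Assumes additive loci
    with effects of the same magnitude, as in the models of this Chapter.
    """
    n_loci = len(population[0])
    target_score = 2 * n_loci                      # e.g. AABBCC for three loci
    mean_score = sum(sum(ind) for ind in population) / len(population)
    return 100.0 * mean_score / target_score


# Three-locus example: AABBCC, AaBbCc and aabbcc.
target = [2, 2, 2]
heterozygote = [1, 1, 1]
unfavourable = [0, 0, 0]

print(percent_target_genotype([target]))                              # 100.0
print(percent_target_genotype([target, unfavourable]))                # 50.0
print(percent_target_genotype([target, heterozygote, unfavourable]))  # 50.0
```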

8.2.6 Conducting the statistical analysis

Statistical analyses were conducted on both the number of QTL detected and

also on the simulated response to selection. The analysis for the number of QTL

detected was conducted on the average of the five bi-parental mapping population

replicates for the 36 genetic models. The analysis of the response to selection was conducted on the average of the 100 runs of each of the 36 genetic models for each selection strategy, averaged in turn over the five bi-parental mapping population replicates.

An analysis of variance was conducted to determine the significant factors affecting the number of QTL detected. The variate recorded for each of the genetic models was the average number of QTL detected over the five bi-parental mapping population replicates. The model used for the analysis of variance is shown as Equation (8.1),

$$
\begin{aligned}
x_{ijklm} ={}& \mu + c_i + h^2_j + \mathrm{MP}_k + \mathrm{GF}_l + (c \times h^2)_{ij} + (c \times \mathrm{MP})_{ik} + (c \times \mathrm{GF})_{il} \\
&+ (h^2 \times \mathrm{MP})_{jk} + (h^2 \times \mathrm{GF})_{jl} + (\mathrm{MP} \times \mathrm{GF})_{kl} + \varepsilon_{ijklm},
\end{aligned}
\tag{8.1}
$$

where:

$x_{ijklm}$ is the number of QTL detected for observation m, at per meiosis recombination fraction level i, heritability level j, mapping population size k and starting gene frequency l,

$\mu$ is the overall mean,


$c_i$ is the fixed effect of the ith per meiosis recombination fraction level,

$h^2_j$ is the fixed effect of the jth heritability level,

$\mathrm{MP}_k$ is the fixed effect of the kth mapping population size,

$\mathrm{GF}_l$ is the fixed effect of the lth starting gene frequency,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklm}$ is the random residual effect of per meiosis recombination fraction level i, heritability level j, mapping population size k, starting gene frequency l, for observation m, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

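An analysis of variance was conducted to determine the significant factors affecting the response to selection of each of the selection strategies. The variate recorded for each of the genetic models was the population mean trait value after each cycle of selection. The model used for the analysis of variance is shown as Equation (8.2),

$$
\begin{aligned}
x_{ijklmno} ={}& \mu + \mathrm{GF}_i + c_j + h^2_k + \mathrm{MP}_l + \mathrm{SS}_m + \mathrm{Cyc}_n + (\mathrm{GF} \times c)_{ij} + (\mathrm{GF} \times h^2)_{ik} \\
&+ (\mathrm{GF} \times \mathrm{MP})_{il} + (\mathrm{GF} \times \mathrm{SS})_{im} + (\mathrm{GF} \times \mathrm{Cyc})_{in} + (c \times h^2)_{jk} + (c \times \mathrm{MP})_{jl} \\
&+ (c \times \mathrm{SS})_{jm} + (c \times \mathrm{Cyc})_{jn} + (h^2 \times \mathrm{MP})_{kl} + (h^2 \times \mathrm{SS})_{km} + (h^2 \times \mathrm{Cyc})_{kn} \\
&+ (\mathrm{MP} \times \mathrm{SS})_{lm} + (\mathrm{MP} \times \mathrm{Cyc})_{ln} + (\mathrm{SS} \times \mathrm{Cyc})_{mn} + \varepsilon_{ijklmno},
\end{aligned}
\tag{8.2}
$$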

where:

$x_{ijklmno}$ is the population mean trait value for observation o, at starting gene frequency i, per meiosis recombination fraction j, heritability level k, mapping population size l, selection strategy m and cycle n,

$\mu$ is the overall mean,

$\mathrm{GF}_i$ is the fixed effect of the ith starting gene frequency,

$c_j$ is the fixed effect of the jth per meiosis recombination fraction,

$h^2_k$ is the fixed effect of the kth heritability level,

$\mathrm{MP}_l$ is the fixed effect of the lth mapping population size,

$\mathrm{SS}_m$ is the fixed effect of the mth selection strategy,

$\mathrm{Cyc}_n$ is the fixed effect of the nth cycle,

Combinations of the above terms represent their interactions,


$\varepsilon_{ijklmno}$ is the random residual effect of starting gene frequency i, per meiosis recombination fraction j, heritability level k, mapping population size l, selection strategy m, and cycle n, for observation o, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

The significance level for each analysis of variance was set at a critical value of

α = 0.05. Analyses were conducted with the fixed effects constrained to sum-to-zero

within the ASREML software (Gilmour et al. 1999). A least significant difference test

was conducted on the means of the levels within a factor that had a significant F value.
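The degrees of freedom implied by Equations (8.1) and (8.2) can be checked with a small Python sketch. The numbers of observations used below (180 and 1188) are assumptions, chosen to agree with the total degrees of freedom reported in Tables 8.4 and 8.5; the function name is hypothetical and the calculation is independent of ASREML.

```python
from itertools import combinations


def anova_degrees_of_freedom(levels, n_observations):
    """Degrees of freedom for main effects and all first-order interactions.

    levels: dict mapping factor name to its number of levels.
    """
    main = {name: n - 1 for name, n in levels.items()}
    interactions = {
        f"{a} x {b}": main[a] * main[b] for a, b in combinations(levels, 2)
    }
    total = n_observations - 1
    error = total - sum(main.values()) - sum(interactions.values())
    return {"main": main, "interactions": interactions,
            "error": error, "total": total}


# Equation (8.1): number of QTL detected (assumed 36 models x 5 replicates).
qtl_model = anova_degrees_of_freedom(
    {"c": 3, "h2": 2, "MP": 3, "GF": 2}, n_observations=180)

# Equation (8.2): response to selection
# (assumed 36 models x 3 strategies x 11 cycles, counting cycle 0).
response_model = anova_degrees_of_freedom(
    {"GF": 2, "c": 3, "h2": 2, "MP": 3, "SS": 3, "Cyc": 11},
    n_observations=1188)

print(qtl_model["error"], qtl_model["total"])            # 160 179
print(response_model["error"], response_model["total"])  # 1064 1187
```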

8.3 Results

8.3.1 Number of QTL detected

The number of QTL detected for each of the genetic models for the five bi-

parental mapping population replications is presented in Table 8.3. The replicate and

gene frequency columns are separated as each of these genetic models required a

different base population to be created. The number of polymorphic QTL column

indicates for each of the replicates, within gene frequencies, the number of QTL that

were segregating in the mapping population and had the potential to be detected. The

data in the remainder of the table is the number of QTL detected within each gene

frequency and replicate for a range of models with differing per meiosis recombination

fractions, heritability and mapping population size. The average column contains the

means for each genetic model over the five bi-parental mapping population replicates.

On average, the fewest of the QTL segregating in the mapping population were detected for the combination of a large per meiosis recombination fraction (c = 0.2), low heritability (h2 = 0.25), and small mapping population size (MP = 200), where the average number detected was 0.6 of 3.4 for GF = 0.1 and 1.2 of 6.8 for GF = 0.5 (Table 8.3). A starting gene frequency of GF = 0.1 usually had fewer QTL segregating in the mapping population than a starting gene frequency of GF = 0.5. In general, increasing the heritability and mapping population size resulted in more QTL being detected, even when the per meiosis recombination fraction was c = 0.2. On average, a high heritability (h2 = 1.0) and large mapping population size (MP = 1000) resulted in all segregating QTL being detected over all per meiosis recombination fractions (Table 8.3).
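As a worked illustration of how the average columns of Table 8.3 are formed, the following Python sketch recomputes them for the most difficult genetic model (c = 0.2, h2 = 0.25, MP = 200). The replicate counts are copied from Table 8.3; the dictionary and function names are illustrative only.

```python
from statistics import mean

# Replicate counts from Table 8.3, keyed by starting gene frequency (GF).
polymorphic_qtl = {0.1: [5, 3, 3, 2, 4], 0.5: [4, 7, 9, 6, 8]}
# QTL detected for c = 0.2, h2 = 0.25, MP = 200.
detected_qtl = {0.1: [1, 1, 0, 0, 1], 0.5: [0, 0, 4, 1, 1]}


def replicate_average(counts):
    """Average over the five bi-parental mapping population replicates."""
    return round(mean(counts), 1)


for gf in polymorphic_qtl:
    segregating = replicate_average(polymorphic_qtl[gf])
    detected = replicate_average(detected_qtl[gf])
    print(f"GF = {gf}: {detected} of {segregating} segregating QTL detected on average")
```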


Table 8.3 Number of polymorphic QTL for each bi-parental mapping population replication and the number of QTL detected for each of the 36 genetic models. Average across replications is also presented. c = per meiosis recombination fraction between QTL and marker, h2 = heritability, MP = mapping population size

Gene frequency             0.1                0.5             Average
Replicate             1  2  3  4  5     1  2  3  4  5     0.1    0.5
No. polymorphic QTL   5  3  3  2  4     4  7  9  6  8     3.4    6.8

c     h2     MP       Number of QTL detected
0.01  0.25   200      5  1  1  1  4     4  6  9  6  5     2.4†   6.0†
0.01  0.25   500      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.01  0.25  1000      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.01  1      200      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.01  1      500      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.01  1     1000      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.1   0.25   200      2  3  1  1  3     0  4  3  3  1     2.0†   2.2†
0.1   0.25   500      5  3  3  2  4     3  6  7  6  6     3.4    5.6†
0.1   0.25  1000      5  3  3  2  4     4  6  9  6  8     3.4    6.6†
0.1   1      200      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.1   1      500      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.1   1     1000      5  3  3  2  4     4  7  9  6  8     3.4    6.8
0.2   0.25   200      1  1  0  0  1     0  0  4  1  1     0.6†   1.2†
0.2   0.25   500      4  2  1  2  3     2  2  3  4  4     2.4†   3.0†
0.2   0.25  1000      4  3  3  1  3     4  2  7  3  5     2.8†   4.2†
0.2   1      200      5  3  3  2  4     4  4  5  4  6     3.4    4.6†
0.2   1      500      5  3  3  2  4     4  7  9  6  6     3.4    6.4†
0.2   1     1000      5  3  3  2  4     4  7  9  6  8     3.4    6.8

† Average lower than the average number of polymorphic QTL, i.e. on average not all segregating QTL were detected (shown as red values in the original table).

An analysis of variance was conducted on the average number of QTL detected

over the five bi-parental mapping population replicates. Consistent with the results of

Chapters 6 and 7, heritability level, per meiosis recombination fraction and mapping

population size were major factors contributing towards variation in the number of QTL

detected (Table 8.4). For the additional factor, gene frequency of the favourable QTL

allele in the reference population in this study, there was a significant difference

between the two levels (Table 8.4).

As the per meiosis recombination fraction increased, the average number of

QTL detected decreased (Figure 8.4a). For a trait heritability of h2 = 1.0, more QTL

were detected than for a heritability of h2 = 0.25 (Figure 8.4b). There was no significant

difference in the number of QTL detected for the mapping population size of 500 and

1000 individuals, which detected more QTL than 200 individuals (Figure 8.4c). A


starting gene frequency of GF = 0.5 for the favourable allele resulted in more QTL

being detected on average than for the lower starting gene frequency of GF = 0.1

(Figure 8.4d).

Table 8.4 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), gene frequency (GF), and first-order interactions affecting the number of QTL detected. σ2 = error mean square

Source       DF     F value
c             2      14.6 *
h2            1      33.4 *
MP            2      11.9 *
GF            1     137.0 *
c × h2        2       6.8 *
c × MP        4       1.2
c × GF        2       5.1 *
h2 × MP       2       5.9 *
h2 × GF       1       5.5 *
MP × GF       2       1.6
Error       160     σ2 = 2.2
Total       179

* significant value at α = 0.05, F distribution

[Figure 8.4: four panels, (a) Recombination fraction (0.01, 0.1, 0.2; lsd = 0.54), (b) Heritability (0.25, 1; lsd = 0.44), (c) Mapping population size (200, 500, 1000; lsd = 0.54) and (d) Gene frequency (0.1, 0.5; lsd = 0.44). Vertical axis: average number of QTL detected (0 to 6).]

Figure 8.4 Significant main effects from the analysis of variance for the number of QTL detected. All effect levels were significantly different except for those indicated by the same letter


Several two-factor interactions were significant for the number of QTL detected

(Table 8.4). These were the heritability × per meiosis recombination fraction (h2× c)

interaction, gene frequency × per meiosis recombination fraction (GF × c), heritability ×

mapping population size (h2 × MP) interaction and gene frequency × heritability (GF ×

h2) interaction. As the responses generated by these interactions were linear, the

interaction graphs have been placed in Appendix 3, Figure A3.1.

8.3.2 Response to selection: phenotypic selection, marker selection, and marker-assisted selection

Once QTL were detected for each of the genetic models, the marker selection

and marker-assisted selection strategies could be implemented in the simulated

Germplasm Enhancement Program. The response to selection (or trait mean value) of

the marker selection and marker-assisted selection strategies could then be measured

and compared to the response to selection of the phenotypic selection strategy con-

ducted on the same breeding population. An analysis of variance was conducted on the

average over 100 runs of the response to selection from the simulation of the three

selection strategies in the Germplasm Enhancement Program (Table 8.5). All of the

main effects were found to be significant (p < 0.05), (Table 8.5).

On average, the trait mean value increased as the number of cycles of selection

increased, although there was no difference in the trait mean value for cycles eight, nine

and 10 (Figure 8.5a). The marker-assisted selection strategy had a higher trait mean

value on average than the phenotypic selection and marker selection strategies (Figure

8.5b). Mapping population size had little effect on the response to selection observed

with no significant difference in the trait mean value for the 500 and 1000 individuals

mapping population sizes, which had a higher trait mean value than the 200 individuals

mapping population size (Figure 8.5c). The trait mean value was higher when the

favourable alleles started at a frequency of GF = 0.5 in the base population, compared to

a starting gene frequency of GF = 0.1 (Figure 8.5d). There was a slightly higher trait

mean value with a heritability of h2 = 1.0 in comparison to a heritability of h2 = 0.25

(Figure 8.5e), and as the per meiosis recombination fraction increased from c = 0.01 to c

= 0.2 the trait mean value decreased (Figure 8.5f).


Table 8.5 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), mapping population size (MP), gene frequency (GF), Selection strategy (SS), cycles (cyc) and first-order interactions affecting the response to selection. σ2 = error mean square

Source       DF      F value
GF            1      9346.2 *
c             2        48.1 *
h2            1        28.5 *
MP            2        13.6 *
SS            2      3822.2 *
Cyc          10      1249.3 *
GF × c        2         1.1
GF × h2       1         5.7 *
GF × MP       2         1.3
GF × SS       2       474.7 *
GF × cyc     10       177.0 *
c × h2        2         8.1 *
c × MP        4         1.5
c × SS        4        31.9 *
c × cyc      20         0.6
h2 × MP       2         8.4 *
h2 × SS       2        18.4 *
h2 × cyc     10         0.3
MP × SS       4        10.0 *
MP × cyc     20         0.1
SS × cyc     20       141.2 *
Error      1064      σ2 = 26.7
Total      1187

* significant value at α = 0.05, F distribution

A number of two-factor interactions were also significant in the analysis of variance (Table 8.5). For the selection strategy × cycle (SS × cycle) interaction (Figure 8.6a), marker-assisted selection had the highest trait mean value after the first cycle of selection, followed by marker selection and then phenotypic selection. From cycle two to seven marker-assisted selection retained the highest trait mean value, followed by phenotypic selection and marker selection. The marker selection strategy had no further

increase in the mean after cycle two while there was continued improvement observed

for both phenotypic selection and marker-assisted selection. In the longer term, both

phenotypic selection and marker-assisted selection achieved a similar improvement in

the trait mean value, which reached a plateau between cycle eight and nine at 100% of

the target genotype with all favourable alleles fixed for the 10 QTL. The majority of the

contributions from the marker information to the trait mean value in the marker-assisted

selection strategy occurred in the earlier cycles of selection. The contributions to the


trait mean for marker-assisted selection in the later cycles came from the phenotypic selection stage. Thus, in the early cycles of selection marker-assisted selection demonstrated an advantage over phenotypic selection, which decreased over time.

[Figure 8.5: six panels, (a) Cycles (0 to 10), (b) Selection strategy (PS, MS, MAS), (c) Mapping population size (200, 500, 1000), (d) Gene frequency (0.1, 0.5), (e) Heritability (0.25, 1) and (f) Recombination fraction (0.01, 0.1, 0.2). Vertical axis: trait mean value (%TG), 0 to 100.]

Figure 8.5 Significant main effects from the analysis of variance for response to selection. Response to selection expressed relative to the maximum potential response to selection (%TG) where TG = target genotype. All effect levels were significantly different except for those indicated by the same letter

There was a significant starting gene frequency × cycle (GF × cycle) interaction

(Figure 8.6b). For this interaction a starting gene frequency of GF = 0.5 had a higher

trait mean value than a starting gene frequency of GF = 0.1 over all cycles of selection.

With a starting gene frequency of GF = 0.5 the trait mean reached a plateau (cycle four)

at a higher trait mean value than for the starting gene frequency of GF = 0.1 which

reached a plateau at cycle eight.

[Figure 8.5 lsd values, in the order printed on the figure: 0.94, 0.42, 0.42, 0.03, 0.03, 0.42.]


[Figure 8.6: four panels, (a) SS x cycle (cycles 0 to 10; lines for PS, MS and MAS), (b) GF x cycle (cycles 0 to 10; lines for GF = 0.1 and GF = 0.5), (c) SS x MP (mapping population sizes 200, 500, 1000; PS, MS and MAS) and (d) MP x h2 (heritability 0.25, 1; MP = 200, MP = 500 and MP = 1000). Vertical axis: trait mean value (%TG), 0 to 100.]

Figure 8.6 Significant first-order interactions from the analysis of variance for the response to selection. Response to selection expressed relative to the maximum potential response to selection (%TG) where TG = target genotype. SS = selection strategy, h2 = heritability, GF = gene frequency, MP = mapping population size

For the selection strategy × mapping population size (SS × MP) interaction (Figure 8.6c), mapping population size had no effect on phenotypic selection, as no marker information was used. Mapping population size also had little effect on marker-assisted selection, as marker-assisted selection on average reverted to phenotypic selection after cycle two. For marker selection, the larger mapping population sizes contributed significantly to an increase in the trait mean value (Figure 8.6c). For the mapping population size × heritability (MP × h2) interaction (Figure 8.6d), there was no difference in the trait mean value between mapping population sizes of 500 and 1000 individuals at either heritability level. For a heritability of h2 = 0.25 the mapping population size of 200 individuals had the lowest trait mean value. With a heritability of h2 = 1.0 the mapping population size of 200 individuals had the same trait mean value as mapping population sizes of 500 and 1000 individuals (Figure 8.6d). The remaining interactions do not add significantly to the results and have been placed in Appendix 3, Figure A3.2.

The following sets of figures (Figures 8.7, 8.8, 8.9 and 8.10) plot the mean trait

value (response to selection) for the three selection strategies over 10 cycles of selection

for a range of heritability levels, starting gene frequencies, per meiosis recombination

fractions and mapping population sizes. As mapping population size had no effect on

the phenotypic selection strategy and the lower heritability had little effect due to

replication across 10 environments in the multi-environment trial, phenotypic selection

was similar over all sub-figures within each of the following Figures.

For a starting gene frequency of GF = 0.1 (Figure 8.7; c = 0.01 and 8.8; c = 0.2)

both phenotypic selection and marker-assisted selection achieved the target genotype by

cycle eight. Marker selection rapidly fixed the favourable alleles of the QTL detected in

the mapping study by cycle two. Marker-assisted selection had a higher trait mean value

than marker selection over all cycles of selection and a higher trait mean value than

phenotypic selection over the first seven to eight cycles of selection. When not all of the QTL

segregating in the mapping study were detected (Table 8.3), marker-assisted

selection and marker selection returned a slightly lower response at cycle two (Figure

8.7a, marker-assisted selection was 4% lower and marker selection was 3% lower and

Figure 8.8a marker-assisted selection was 10% lower and marker selection was 9%

lower) than when all segregating QTL were detected (Table 8.3, Figures 8.7b, c, d, e, f

and Figures 8.8b, c, d, e, f). There was no difference in the response to selection for the

three selection strategies for each of the heritability levels for Figure 8.7 as the small per

meiosis recombination fraction of c = 0.01 resulted in all QTL being detected for all

scenarios, with the exception of Figure 8.7a, which did not detect all of the segregating

QTL due to the low heritability and small mapping population size.

[Figure 8.7, panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. Vertical axis: trait mean value (%TG), 0 to 100. Horizontal axis: cycles, 0 to 10. Legend: PS, MS, MAS. GF = 0.1, c = 0.01.]

Figure 8.7 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.01, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

As the per meiosis recombination fraction was increased from c = 0.01 (Figure

8.7) to c = 0.2 (Figure 8.8) the 14% advantage previously observed by marker-assisted

selection over phenotypic selection at cycle two decreased to 9%. When not all segregating

QTL were detected in the mapping study (Table 8.3, Figure 8.8a, b, c), the trait mean

value was lower than when all segregating QTL were detected (Table 8.3, Figure 8.8d,

e, f). Increasing the mapping population size resulted in an increase in the number of

segregating QTL detected, which is observed as small increases in the response to

selection for both marker selection (12% - 1%) and marker-assisted selection (6% -

1%), at cycle two (Figure 8.8a cf. 8.8b and 8.8c). The main impact of heritability was for

the marker selection strategy, where a heritability of h2 = 1.0 gave a 13% higher trait

mean value than a heritability of h2 = 0.25, with a mapping population size of 200

individuals. When heritability was increased from h2 = 0.25 to h2 = 1.0, all segregating

QTL were detected for all population sizes on average (Table 8.3) and the responses of

all three selection strategies were similar across mapping population sizes (Figure 8.8d,

e, f).

[Figure 8.8, panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. Vertical axis: trait mean value (%TG), 0 to 100. Horizontal axis: cycles, 0 to 10. Legend: PS, MS, MAS. GF = 0.1, c = 0.2.]

Figure 8.8 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.2, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

With an increase in the starting gene frequency to GF = 0.5 in the base popula-

tion from which the 10 parents were drawn, there was an increase in the starting

population mean and response to selection (Figure 8.9; c = 0.01 and Figure 8.10; c =

0.2) over the comparable cases where the starting gene frequency was GF = 0.1 (Figure

8.7; c = 0.01 and 8.8; c = 0.2). A higher favourable allele frequency in the base

population of GF = 0.5, resulted in a higher trait mean value in cycle zero compared to

the starting gene frequency of GF = 0.1. With a per meiosis recombination fraction of c

= 0.01, marker-assisted selection had the fastest increase in trait mean value, with the

target genotype being reached in cycle three, and cycle four for phenotypic selection

(Figure 8.9), as opposed to cycle eight with a starting gene frequency of GF = 0.1

(Figure 8.7). The trait mean value for marker-assisted selection and marker selection

were 0.5% and 3% lower, respectively, with the low heritability and small mapping

population size as not all of the QTL were detected (Table 8.3, Figure 8.9a). All QTL

were detected for the remaining mapping population sizes for a heritability of h2 = 0.25

and h2 = 1.0, resulting in a similar response being observed for these models for all

selection strategies (Figure 8.9b, c, d, e, f).

[Figure 8.9, panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. Vertical axis: trait mean value (%TG), 0 to 100. Horizontal axis: cycles, 0 to 10. Legend: PS, MS, MAS. GF = 0.5, c = 0.01.]

Figure 8.9 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.5, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.01, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

With an increase of the per meiosis recombination fraction to c = 0.2 from c =

0.01, fewer QTL were detected (Table 8.3). The effect on response to selection for a

differing number of QTL detected is illustrated within Figure 8.10. In Figure 8.10a,

18% of the QTL segregating were detected, which at cycle two resulted in marker

selection performing 28% lower than phenotypic selection, and marker-assisted

selection performing only 3% better than phenotypic selection. When the percentage of

QTL detected increased to 62% (Figure 8.10c), the response of marker selection

remained 14% lower than phenotypic selection; however, it had improved compared

to when fewer QTL were detected (Figure 8.10a). With the increase in the number of

QTL detected the response of marker-assisted selection at cycle two also increased to be

8% higher than phenotypic selection. When 100% of the QTL were detected (Figure

8.10f), marker selection had a slightly higher trait mean value than phenotypic selection

in the first cycle of selection. Marker-assisted selection also had a higher trait mean

value than both marker selection and phenotypic selection, as opposed to the cases

where fewer QTL were detected (Figure 8.10a).

[Figure 8.10, panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. Vertical axis: trait mean value (%TG), 0 to 100. Horizontal axis: cycles, 0 to 10. Legend: PS, MS, MAS. GF = 0.5, c = 0.2.]

Figure 8.10 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.5, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.2, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

Overall, marker-assisted selection produced a greater rate of response to selec-

tion than both marker selection and phenotypic selection, moving the population more

rapidly towards the target genotype with all favourable alleles for the 10 QTL, for the

additive genetic models considered in Chapter 8. Only the marker-assisted selection and

marker selection strategies were affected by the mapping population size as this variable

influenced the number of QTL detected. Per meiosis recombination fraction and

mapping population size had the greatest effect on the response to selection of marker

selection and marker-assisted selection through the impact these variables have on the

number of QTL that could be detected. Heritability had little impact on response to

selection; however it did influence response through an influence on QTL detection in

the mapping study. A starting gene frequency of GF = 0.5 resulted in a faster response

to selection than a starting gene frequency of GF = 0.1 in the reference population.

8.4 Discussion

Heritability was an important factor affecting the number of QTL detected during

the QTL detection phase. At a heritability of h2 = 1.0 generally all segregating QTL

were detected while fewer QTL were detected with a heritability of h2 = 0.25 (Table

8.3). With a heritability of h2 = 1.0 the observed phenotype was more representative of

the underlying genotype, which resulted in the composite interval mapping methodol-

ogy implemented in PLABQTL being able to associate markers with QTL for the trait

of interest. With the lower heritability of h2 = 0.25, fewer QTL were detected as the

phenotype did not accurately reflect the underlying genotype and composite interval

mapping implemented in PLABQTL was unable to find associations between markers

and all of the QTL for the trait of interest. One way of increasing the heritability during

the QTL detection phase would be to collect phenotypic data in many environments or

replications to help determine whether markers were associated with QTL for the trait

of interest under different conditions. When fewer QTL were detected, a lower trait

mean value was observed for marker-assisted selection and marker selection (e.g.

Figure 8.10a) than when more QTL were detected (e.g. Figure 8.10c). As noted in

Chapters 6 and 7, this effect on response to selection was due to less QTL information

contributing towards these selection strategies than when more QTL were detected.

Little difference was observed in the trait mean value of phenotypic selection for a

heritability of h2 = 0.25 and h2 = 1.0. This is due in part to the heritability being defined

on a single-plant basis in the base population. In the simulated Germplasm Enhance-

ment Program any phenotypic selection (in both the phenotypic selection and marker-

assisted selection strategies) conducted was based on means from multi-environment

trials based on a sample size of 10 environments. The repetition of observational units

in the multi-environment trials meant that heritability on a family-mean basis was

increased, resulting in a higher response to selection for the lower heritability of h2 =

0.25.
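The effect of replication on heritability described above can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the thesis: total phenotypic variance is scaled to 1, there is no G×E interaction variance, there is one observation per environment, and the genetic variance among families equals the single-plant genetic variance. The function name `family_mean_heritability` is hypothetical.

```python
def family_mean_heritability(h2_plot: float, n_env: int) -> float:
    """Heritability of a family mean averaged over n_env environments.

    Assumption (illustrative only): phenotypic variance is scaled to 1, so the
    genetic variance is h2_plot and the error variance is 1 - h2_plot. Averaging
    over n_env independent observations divides the error variance by n_env.
    """
    if not 0.0 <= h2_plot <= 1.0:
        raise ValueError("h2_plot must lie in [0, 1]")
    if n_env < 1:
        raise ValueError("n_env must be at least 1")
    var_g = h2_plot
    var_e = 1.0 - h2_plot
    return var_g / (var_g + var_e / n_env)


# Single-plant heritabilities used in this chapter, averaged over 10 environments.
for h2 in (0.25, 1.0):
    print(f"h2 = {h2:.2f} -> family-mean h2 = {family_mean_heritability(h2, 10):.2f}")
```

Under these assumptions a single-plant heritability of 0.25 rises to roughly 0.77 on a family-mean basis, which is consistent with the small difference in phenotypic selection response between the two heritability levels.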

Many studies (Lande and Thompson 1990, Gimelfarb and Lande 1994a,

Whittaker et al. 1995, Van Berloo and Stam 1999, Yousef and Juvik 2001) have

observed that at high heritabilities the advantage of marker-assisted selection over

phenotypic selection decreases. The same effect was observed in this study. This is an

expected effect as with high levels of replication the phenotype is a better predictor of

the underlying genotype, resulting in phenotypic selection being more effective than

selection based on markers. For the models tested in this Chapter, marker selection and

marker-assisted selection initially allowed a much faster rate of fixing the favourable

alleles to reach the target genotype; however, because the information available from the

markers was exhausted by cycle two or three, these selection strategies lost the ability to

reach the target genotype at a faster rate than phenotypic selection in the later cycles of

selection.

The starting gene frequency determined the proportion of each of the two alleles

at a locus in the reference population used to initiate the breeding program. With a

starting gene frequency of GF = 0.1 for the favourable allele, on average 10% of the

alleles in the base population are the favourable allele at that locus and 90% are the

unfavourable allele. In this study, a low starting gene frequency resulted in a slow

increase in the trait mean value for phenotypic selection and marker-assisted selection,

with both requiring eight cycles of selection to reach the target genotype (Figure 8.7 and

8.8). With the higher starting gene frequency of GF = 0.5, the favourable allele was at a

higher proportion in the base population, and selection was more effective in the early

cycles of the program with the target genotype being reached within three to four cycles of selection

for phenotypic selection and marker-assisted selection (Figure 8.9 and 8.10). A higher

starting gene frequency also resulted in a larger number of segregating QTL in the

mapping populations compared to a gene frequency of GF = 0.1. This resulted in the

detection of more QTL and a higher response to selection for both marker selection and

marker-assisted selection compared to the lower starting gene frequency.
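The link between starting gene frequency and the number of QTL segregating in a bi-parental cross can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the simulations: the two parents are taken to be fully inbred lines sampled independently from a base population, each QTL is biallelic with the same favourable allele frequency, and loci are treated as independent. The function names are hypothetical.

```python
def prob_polymorphic(p: float) -> float:
    """Probability that two randomly drawn inbred parents differ at a biallelic locus.

    One parent must carry the favourable allele (probability p) and the other
    the unfavourable allele (probability 1 - p), in either order.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p)


def expected_segregating_qtl(p: float, n_qtl: int = 10) -> float:
    """Expected number of QTL segregating in a bi-parental cross (independent loci)."""
    return n_qtl * prob_polymorphic(p)


for gf in (0.1, 0.5):
    print(f"GF = {gf}: expected segregating QTL = {expected_segregating_qtl(gf):.1f} of 10")
```

Under these assumptions about 1.8 of the 10 QTL are expected to segregate with GF = 0.1 and about 5 with GF = 0.5, so the higher starting gene frequency offers more QTL for the mapping study to detect.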

Per meiosis recombination fraction was an important factor in both the detection

of QTL and the simulation of the Germplasm Enhancement Program for the marker

selection and marker-assisted selection strategies. The probability of a recombination

event between a marker and QTL occurring increases as the per meiosis recombination

fraction increases, which can result in favourable marker-QTL allele combinations

being lost during cycles of the breeding program or incorrect allele associations being

detected between QTL and marker alleles during the QTL detection analysis. This may

lead to a lower trait mean value for the marker selection and marker-assisted selection

strategies. A smaller per meiosis recombination fraction generally resulted in most of

the polymorphic QTL that were segregating being detected in the mapping study,

resulting in a higher trait mean value than when the per meiosis recombination fraction

was larger. When the per meiosis recombination fraction was increased to c = 0.2, the

number of QTL detected of those segregating was low, resulting in the marker-assisted

selection strategy approaching the response of the phenotypic selection strategy. With

the larger per meiosis recombination fraction the marker selection and marker-assisted

selection strategies had a lower trait mean value than when the per meiosis recombina-

tion fraction was smaller, a trend also observed by Edwards and Page (1994). Figures

containing the results of a per meiosis recombination fraction of c = 0.1 (Table 8.3)

have not been shown due to their similarity to a per meiosis recombination fraction of c

= 0.2 (Figures 8.8 and 8.10), however for completeness they are included in Appendix

3, Figure A3.3 and Figure A3.4.
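The loss of marker-QTL associations through recombination, discussed in this paragraph, can be illustrated with the standard result that an association between two loci decays by a factor of (1 - c) per generation of random mating. The sketch below is an approximation only: the Germplasm Enhancement Program is not a randomly mating population, and the function name `association_retained` is hypothetical.

```python
def association_retained(c: float, generations: int) -> float:
    """Fraction of an initial marker-QTL association remaining after a number
    of generations of random mating, for recombination fraction c.

    Assumption (illustrative only): one round of meiosis per generation and
    no selection, so the association decays as (1 - c) ** generations.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - c) ** generations


# Recombination fractions considered in this chapter.
for c in (0.01, 0.1, 0.2):
    kept = association_retained(c, 2)
    print(f"c = {c}: {100 * kept:.0f}% of the association retained after 2 generations")
```

With c = 0.01 almost all of the association survives two generations, whereas with c = 0.2 only about two thirds remains, which is in line with the lower response of the marker-based strategies at the larger recombination fraction.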

On average there was no significant difference in the number of QTL detected

for the QTL mapping population sizes of 500 and 1000 individuals. As a flow-on effect,

there was no significant difference in the trait mean value for the breeding

program for these mapping population sizes, as mapping population size did not contribute towards

the modelling of the Germplasm Enhancement Program component of the simulation

experiment, only to the detection of QTL. Any differences observed in individual experiments

were due to the mapping population of 500 individuals detecting fewer QTL than

the mapping population of 1000 individuals; however, this difference was small.

For marker selection and marker-assisted selection based on a mapping population size

of 200 individuals, response was generally lower than the other mapping population

sizes when the heritability was also low. The low response to selection for the mapping

population size of 200 individuals was a result of the low number of QTL detected.

With a high heritability, a mapping population size of 200 individuals was comparable

to the 500 and 1000 individuals mapping population sizes. This result is consistent with

the results reported in Chapters 6 and 7 where smaller population sizes resulted in both

fewer QTL being detected, and some false QTL being detected.

Mapping studies with a large number of segregating QTL relevant to the breed-

ing program are preferable crosses for use as a foundation in the implementation of

marker-assisted selection. A bi-parental mapping population as simulated here follow-

ing the strategy implemented by Cooper et al. (1999a), may not be the best type of

population for detecting QTL for use in marker-assisted selection for the Germplasm

Enhancement Program, as the number of polymorphic QTL was usually low and

variable between the five bi-parental mapping population replications for each gene

frequency (Table 8.3). The information provided by the marker-QTL associations only

lasted for two cycles of selection. Therefore, choice of mapping population is critical in

the design of an effective marker-assisted selection strategy and further investigation

should be conducted to find population types and designs that can produce and detect

more polymorphic QTL (e.g. Jansen et al. 2003). The need for additional mapping

studies at later cycles of selection may be necessary to detect QTL that were not

detected in the first mapping study.

Generally marker-assisted selection was found to have the highest trait mean

value, especially in the short and medium term, followed by phenotypic selection and

then marker selection. The different responses observed between phenotypic selection

(no marker use) and marker-assisted selection and marker selection (both use markers)

are a result of the number of QTL detected. When few QTL were detected, marker-assisted

selection had only a slightly higher response to selection than phenotypic

selection, and little genetic gain was achieved by marker selection (e.g. Figure 8.10a).

As there was no further mapping study and no phenotypic selection within the marker

selection strategy, no selection pressure was applied to the non-segregating QTL. This

resulted in the population mean remaining constant after the mapped QTL were fixed

and therefore a poor response to selection in the long term. When an intermediate

number of QTL were detected, the response to selection of marker selection was similar

to phenotypic selection in the short term (e.g. Figure 8.10c). When all possible QTL

were detected, marker-assisted selection was superior to phenotypic selection in the

short and long-term and marker selection could be better than phenotypic selection in

the short term (e.g. Figure 8.10f). Marker-assisted selection performed better than both

marker selection and phenotypic selection until marker-QTL association information

was exhausted by selection, at which point marker-assisted selection was equivalent to

phenotypic selection and further responses to selection were based on the QTL

segregating in the breeding population that were not polymorphic in the mapping

population. This result can be observed in both Figure 8.8 and Figure 8.10.
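The plateau reached by marker selection once the mapped QTL are fixed can be approximated with a short calculation. The sketch assumes equal additive effects at every QTL, a trait scale on which a genotype with no favourable alleles scores 0 %TG, and no change in allele frequency at the unmapped QTL (drift and hitchhiking are ignored). These are simplifications introduced for illustration, and `ms_plateau_pct_tg` is a hypothetical name.

```python
def ms_plateau_pct_tg(n_qtl: int, n_fixed: int, p_other: float) -> float:
    """Approximate trait mean (%TG) after marker selection has fixed n_fixed QTL.

    Assumptions (illustrative only): equal additive effects at all n_qtl loci,
    and the remaining loci stay at favourable-allele frequency p_other because
    no selection pressure is applied to them.
    """
    if not 0 <= n_fixed <= n_qtl:
        raise ValueError("n_fixed must lie between 0 and n_qtl")
    if not 0.0 <= p_other <= 1.0:
        raise ValueError("p_other must lie in [0, 1]")
    return 100.0 * (n_fixed + (n_qtl - n_fixed) * p_other) / n_qtl


# Hypothetical scenarios: few versus many mapped QTL fixed, 10 QTL in total.
for n_fixed, p in ((2, 0.1), (5, 0.5), (10, 0.5)):
    print(f"fixed = {n_fixed}, p = {p}: plateau = {ms_plateau_pct_tg(10, n_fixed, p):.1f} %TG")
```

The calculation shows why marker selection cannot reach the target genotype unless every QTL is mapped and fixed, whereas strategies that retain phenotypic selection can continue to act on the remaining loci.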

Marker-assisted selection produced the highest selection response of the three

strategies over all variables studied (Figure 8.5b). There is scope to further improve the

effectiveness of marker-assisted selection by selecting a different mapping population,

or conducting another mapping study at cycle two to increase the number of polymor-

phic QTL detected by mapping.

8.5 Conclusion

For the additive QTL models considered in this study, the rate of response to

selection of marker-assisted selection and marker selection relative to phenotypic

selection was dependent on the number of QTL detected. A low percentage of QTL

detected in the mapping study resulted in a similar rate of genetic gain for the

phenotypic selection and marker-assisted selection strategies, with marker selection

performing poorly. Increasing the percentage of QTL detected of those segregating in

the mapping population resulted in marker-assisted selection having a greater response

to selection than phenotypic selection. Once the information from the marker-QTL

associations was utilised and the identified QTL were fixed for the favourable allele in

the breeding population, marker-assisted selection reverted to phenotypic selection for

the remaining QTL, and there was no further gain from marker selection as marker

selection was equivalent to a random mating strategy in the implementation considered

here. In addition, marker-assisted selection generally produced a greater response to

selection in the medium term. It is important to note that in the case of the additive

models considered here, both phenotypic selection and marker-assisted selection

achieved the same long-term (10 cycles) response to selection. Marker-assisted

selection outperformed phenotypic selection by achieving the maximum response to

selection in fewer cycles of selection for GF = 0.5.

QTL mapping population size was an important factor affecting the number of

QTL that were detected; however, it had little impact on the response to selection. Over

all the models studied, a mapping population size of 1000 individuals did not consis-

tently detect all the segregating QTL. It is therefore important when selecting a QTL

mapping population size to have a reliable map established and an estimate of the

heritability of the trait of interest. Reliable detection of QTL and avoiding false

positives (Type I errors) is also necessary when marker-assisted selection is to be

introduced into a breeding program. Use of larger population sizes can reduce the

occurrence of these complications. Thus a reliable and relevant mapping strategy is a

critical issue in the design of an effective marker-assisted selection breeding strategy.

This experiment showed that the three selection strategies gave different

responses to selection for S1 families; however, it is not obvious whether phenotypic

selection and marker-assisted selection will achieve similar long-term response to

selection in the presence of epistasis or genotype-by-environment interactions, a point

raised previously by Holland (2001, 2004) and examined by Cooper and Podlich (2002).

A broader investigation of the impact of genetic architecture of a trait, including the

effects of epistasis and G×E interaction on both QTL detection and response to selection

of the Germplasm Enhancement Program, is considered in Chapter 9.

CHAPTER 9

SELECTION RESPONSE IN THE

GERMPLASM ENHANCEMENT

PROGRAM FOR COMPLEX

GENETIC MODELS

9.1 Introduction

G×E interaction is an important factor to include as a component of the

genetic models used to examine the effectiveness of marker-assisted selection. It is a

large component of variation for quantitative traits of wheat in the target population of

environments of the Germplasm Enhancement Program and has been shown to

influence the response to selection observed in breeding programs in the northern grains

region (Brennan and Byth 1979, Brennan et al. 1981, Cooper et al. 1994a, 1994b,

Cooper et al. 1995, Cooper et al. 1996b, Fabrizius et al. 1997, Basford and Cooper

1998). The influences of G×E interactions have been investigated in earlier simulations

for the detection of QTL (Chapter 7), and on response to selection using phenotypic

selection (Kruger et al. 1999). The earlier simulations found that G×E interaction by

itself affected QTL detection and also caused a decrease in the response to selection;

however, progress was still achievable and, in the study by Kruger et al.

(1999), DH lines produced a greater response than S1 families for the Germplasm

Enhancement Program. G×E interactions have not yet been included in the models

investigating marker selection or marker-assisted selection schemes considered in this

thesis. It is expected, however, that they will affect the ability to detect QTL, as QTL

important in determining the phenotype in one environment may not be important in

another environment (Tanksley 1993, Cooper and Podlich 2002, Chapter 7), reinforcing

the importance of conducting QTL detection analysis over many environments. For the

analysis of marker-assisted selection in this chapter, there are two stages of interest

where G×E interaction may influence the overall response to selection in the Germ-

plasm Enhancement Program; (i) in the QTL detection analysis phase for marker

selection and marker-assisted selection by possibly re-ranking genotypes across

environments; and (ii) in the phenotypic selection phase of marker-assisted selection

and phenotypic selection.

The effect of epistasis on the detection of QTL has been examined in earlier

simulation investigations in this thesis (Chapter 7). Epistasis has been argued to be of

little importance in response to selection because of its apparent small effect when it has

been experimentally investigated (Crow and Kimura 1979). However, today the

accumulating body of molecular evidence suggests that epistasis may be a significant

component of the genetic variation for a trait, even when it is difficult to detect as a

component of variance using classical quantitative genetics methodology (Chapter 2).

Therefore, there is a great need to investigate the impact of epistasis on QTL detection

analysis and the impact of the QTL detected on response to selection in the short-term

and long-term (Holland 2001, Cooper and Podlich 2002, Carlborg and Haley 2004). The

impact of epistasis on the response to selection has not yet been simulated in this thesis.

Studies by Peake (2002) and Jensen (2004) indicate that significant epistatic effects are

present for grain yield when bi-parental crosses based on the parents of the Germplasm

Enhancement Program are examined, and this is the basis for inclusion of epistasis in

the trait genetic models considered in this thesis. For the analysis of marker-assisted

selection in this chapter there are two instances of interest where epistatic interactions

may influence the overall response to selection in the Germplasm Enhancement

Program; (i) influences in the QTL detection phase for marker selection and marker-

assisted selection; and (ii) influences in the phenotypic selection phase of marker-

assisted selection and phenotypic selection. The preliminary work reported in Chapter 7

indicated that for a small number of examples, digenic epistatic networks had no

influence on the number of QTL detected. However, it is unclear from these results

whether the effects of epistasis on the specific QTL that were detected will influence the

outcomes of marker-assisted selection. In this Chapter the range of epistatic networks

considered is extended from K = 1 (digenic) to K = 2 (trigenic) and K = 5 (hexgenic) to

determine whether increasing the number of genes in the epistatic network will affect

QTL detection and ultimately marker-assisted selection for the Germplasm Enhance-

ment Program. For additive genetic models (K = 0), when finding associations between

markers and QTL it is important to find the marker allele linked to the favourable QTL

allele (Chapter 8). However, if the QTL is interacting with other genes the effects of its

alleles are going to be dependent on the effects of allele combinations at other genes.

Under these conditions it is less obvious how to define the favourable allele for a QTL

for short-term and long-term response to selection. Therefore, it is important to find

QTL that are interacting epistatically and determine what combination of QTL alleles

are likely to contribute to the best phenotypic response to selection within the context of

the gene network (e.g. Holland 2001). By finding these marker combinations, it

becomes possible to select for the best epistatic network combination. Defining optimal

QTL allele combinations in the presence of epistasis is a challenging task and a

comprehensive treatment of this topic is considered to be beyond the scope of this

thesis. Here the focus will be on whether QTL detection analysis and marker-assisted

selection, as proposed for the Germplasm Enhancement Program, can contribute to a

greater rate of response to selection than phenotypic selection.

Doubled haploid lines have previously been examined as an alternative to S1

family selection in the Germplasm Enhancement Program (Kruger 1999, Kruger et al.

1999). Under a phenotypic selection strategy, DH lines produced a greater response to

selection than S1 families in the Germplasm Enhancement Program for a range of

genetic models. They have not yet been simulated in combination with a marker

selection or marker-assisted selection scheme for the Germplasm Enhancement

Program. Howes et al. (1998) found through simulation that DH lines increased the

efficiency of marker-assisted selection, and also concluded that it was a fast strategy for

combining large numbers of genes with a minimum number of marker tests. Therefore,

it was considered appropriate to include DH lines as another factor in evaluating

marker-assisted selection for the Germplasm Enhancement Program in this study.

The levels of per meiosis recombination fraction used in this study represented a

realistic situation for the Germplasm Enhancement Program. From the integrated

AFLP-SSR linkage map for the parents of the Germplasm Enhancement Program

(Susanto 2004), the smallest per meiosis recombination fraction between two markers

over all of the linkage groups was c = 0.0019 (0.2 cM, Haldane conversion (Haldane

1931)), the largest per meiosis recombination fraction between two markers was c =

0.25 (34.7 cM Haldane conversion (Haldane 1931)) and the average per meiosis

recombination fraction between two markers over the linkage groups was c = 0.07 (8.7

cM Haldane conversion (Haldane 1931)). Therefore, modelling a recombination

fraction of c = 0.05 and c = 0.1 provided a realistic approach to the expected per meiosis

recombination fraction for the Germplasm Enhancement Program.
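The centimorgan values quoted above follow from the Haldane mapping function, d = -50 ln(1 - 2c) with d in cM. As a minimal sketch (the function names are illustrative and are not part of QU-GENE), the conversion in both directions can be written as:

```python
import math

def haldane_cm_from_r(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r_from_cm(d_cm: float) -> float:
    """Recombination fraction for a map distance given in centimorgans."""
    if d_cm < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

# Recombination fractions mentioned in the text, converted to map distance.
for c in (0.0019, 0.05, 0.1, 0.25):
    print(f"c = {c:<6} -> {haldane_cm_from_r(c):.1f} cM")
```

Because the function is non-linear, converting an average recombination fraction does not in general give the average map distance, so the two averages need not agree exactly.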

The results of the investigations reported in Chapters 4 to 8 were used as a basis

for designing the simulation experiment considered here. Figure 9.1 (replication of

Chapter 1, Figure 1.1) provides a schematic overview of how each part of the thesis is

interrelated and contributes to the modelling of the Germplasm Enhancement Program

breeding strategies considered here. It was important to undertake the work completed

in each of the preceding parts to enable credible simulation of marker-assisted

selection for the Germplasm Enhancement Program. Part I investigated the convergence

of simulation and theory to determine that simulation was an adequate extension of

theory for a range of genetic models. Emphasis was given to the strategies for modelling

linkage and recombination in the QU-GENE software. Part I also considered which

QTL detection analysis methodology and software program to use and determined a

more efficient way of modelling QTL and markers on chromosomes for QTL detection.

Part II investigated how QTL detection analysis would be implemented in the Germplasm

Enhancement Program, and how linkage maps would be created. This Part also

looked at the influence of population size, heritability, per meiosis recombination

fraction, epistasis and G×E interaction on the detection of QTL. In Part IV, the work

completed in the previous parts allowed a detailed investigation to be conducted of the

opportunities to implement marker-assisted selection into the Germplasm Enhancement

Program. In Chapter 8, a comparison of the implementation of phenotypic selection,

marker selection and marker-assisted selection for simple additive genetic models in the

Germplasm Enhancement Program, using S1 families, was conducted. This previous

work was integral for the design of the simulation experiment in this final chapter where

phenotypic selection, marker selection and marker-assisted selection strategies were

implemented in the Germplasm Enhancement Program for both S1 families and DH

lines with the additional influence of epistasis and G×E interaction in the breeding

program. Based on the results of the previous Chapters and relevant literature, the

variables and treatment levels that were expected to have a critical influence on the

relative performance of selection strategies within the Germplasm Enhancement

Program were selected for inclusion in the simulation experiment.

[Figure 9.1 schematic. Labels: Modelling Methodology (defining and validating a modelling approach); Base Population; Mapping Population; QTL analysis algorithms; QTL information; Germplasm Enhancement Program (PS, MS, MAS); Part II, Part III, Part IV]

Figure 9.1 Outline of the structure of the investigations in the thesis leading to the simulation of different breeding strategies. Blue indicates the definition of genetic models and the construction of reference and base populations for the Germplasm Enhancement Program. Yellow indicates the simulation of mapping and QTL experiments, and green indicates the simulation of the breeding strategies of interest. The part numbers indicate the parts of the thesis in which these phases are addressed (replication of Chapter 1, Figure 1.1; included here for ease of reference).

To measure response to selection, the trait mean value was compared for the

measurement units of S1 families and DH lines for phenotypic selection, marker

selection and marker-assisted selection. Marker-assisted selection is theoretically

expected to return a greater response to selection than phenotypic selection under

certain conditions (Lande and Thompson 1990). However, the genetic models tested

under simulation by other authors (Zhang and Smith 1992, 1993, Edwards and Page

1994, Gimelfarb and Lande 1994a, Whittaker et al. 1995, Hospital et al. 1997, Howes et

al. 1998) have not explicitly included effects due to epistasis or G×E interaction;

generally the effects of heritability for additive finite locus models were examined. The

present experiment examines: (i) the outcomes of QTL mapping in terms of the percent

of QTL segregating, the percent of QTL detected, the percent of QTL detected of

the segregating loci, and errors in QTL detection; and (ii) how the results of the QTL

detection phase in turn affect the response to selection of S1 families and DH lines

within the marker selection and marker-assisted selection strategies for the Germplasm

Enhancement Program for simple to complex genetic models. Therefore, in this Chapter

the simulation experiment was designed to examine how the results of the QTL

detection analysis influence forward selection and can therefore be used as a strong

guide to the expectations of outcomes from applying these strategies in the Germplasm

Enhancement Program.

9.2 Materials and Methods

9.2.1 Genetic models

To create the range of genetic models required in this experiment, the statistical

ensemble approach (Kauffman 1993, Podlich 1999, Cooper and Podlich 2002) was

applied to the E(NK) framework so that a large number and wide range of genetic

models could be examined. Parameterisations of the E(NK) genetic models involved the

application of context independent (i.e. E = 1 and K = 0) and context dependent (i.e. E >

1 and K > 0) gene values drawn from the uniform distribution (See Kauffman (1993),

for further discussion of the influence of using alternative distributions to the uniform

distribution). A detailed discussion of the parameterisation of the E(NK) models was

given by Podlich (1999) and a summary of the approach used in this thesis was

described by Cooper and Podlich (2002). For example, for an E(NK) = 2(12:0) framework

the effects of each of the twelve genes in each of the two environments were

allocated as values drawn at random from the uniform distribution. The sampling from

the uniform distribution was conducted such that it is expected that each different

E(NK) model parameterisation will be independent. When using this approach to

generate genetic models, genes no longer have small and equal effects, as is often

assumed for quantitative traits; there is expected to be a distribution of major and minor

genes (Cooper et al. 2002a). The statistical ensemble parameterisation of the genetic

models only occurred in this Chapter and was implemented to enable a large number of

genetic model scenarios to be examined.
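The following is a minimal sketch of the kind of parameterisation described above, and is not the QU-GENE implementation. It assumes three genotype classes per biallelic gene, randomly chosen interaction partners that are shared across environment-types, and gene values drawn independently from U(0, 1) for every environment-type and genotype context. All function and variable names are hypothetical.

```python
import itertools
import random

def parameterise_enk(n_env, n_genes, k, rng):
    """Draw one E(NK) parameterisation with uniform gene values.

    Assumption: each biallelic gene has three genotype classes (0, 1, 2),
    so a gene with K partners has 3 ** (K + 1) genotype contexts.
    """
    # Interaction partners are chosen once and reused in every environment-type.
    partners = [
        rng.sample([j for j in range(n_genes) if j != g], k)
        for g in range(n_genes)
    ]
    contexts = list(itertools.product(range(3), repeat=k + 1))
    # tables[e][g][context] is the value of gene g in environment-type e.
    tables = [
        [{ctx: rng.random() for ctx in contexts} for _ in range(n_genes)]
        for _ in range(n_env)
    ]
    return {"partners": partners, "tables": tables}

def genotypic_value(model, env, genotype):
    """Sum the context-dependent gene values of one genotype in one environment-type."""
    total = 0.0
    for g, table in enumerate(model["tables"][env]):
        context = (genotype[g],) + tuple(genotype[j] for j in model["partners"][g])
        total += table[context]
    return total

rng = random.Random(2005)
model = parameterise_enk(n_env=2, n_genes=12, k=0, rng=rng)  # E(NK) = 2(12:0)
print(genotypic_value(model, env=0, genotype=[2] * 12))
```

Calling `parameterise_enk` repeatedly with different seeds gives independent parameterisations, which is the sense in which an ensemble of models is sampled.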

To create the genotype-environment systems considered in this experiment, four

levels of number of environment-types in the target population of environments, E = 1,

2, 5, and 10, and four levels of epistasis, K = 0, 1, 2, and 5, were considered. Note that

the combination of E = 1 and K = 0 is equivalent to the additive finite locus model

considered in Chapter 8. The range for the number of environment-types considered

here was based on the work of Chapman et al. (2000a, 2000b, 2000c) for sorghum in

the same target region as the Germplasm Enhancement Program and the extensions of

this work to wheat for this region (Mathews et al. 2002). The levels of epistasis

considered were based on the theoretical arguments by Kauffman (1993) and earlier

investigations considering applications to plant breeding (Podlich 1999, Podlich and

Cooper 1999, Podlich et al. 1999, Cooper et al. 2002a, Cooper and Podlich 2002).

Heritability of the trait on an observational unit (single plant) basis (h2) was fixed at two

levels h2 = 0.1 (low) and h2 = 1.0 (high; reference point). Starting gene frequency (GF)

in the Germplasm Enhancement Program reference population for cycle zero was also

fixed at two levels GF = 0.1 (low) and GF = 0.5 (intermediate). The per meiosis

recombination fraction between the marker and QTL (c) was also specified at c = 0.05

(tight linkage representing a dense genetic map) and c = 0.1 (intermediate linkage

representing a relatively dense genetic map). These parameters, when combined with

the four levels of G×E interaction and four levels of epistasis, created 2×2×2×4×4 = 128

genotype-environment genetic models (Table 9.1). For each of the 128 genotype-

environment genetic models, 20 gene effect parameterisations were created giving a

total of 20×128 = 2560 genotype-environment genetic models.

Table 9.1 Experimental variable levels defined in the QU-GENE engine to create the genotype-environment genetic models

| Experimental variable | Level |
|---|---|
| Number QTL (N) | 12 |
| Linkage phase | coupling |
| Heritability (h2) | 0.1, 1.0 |
| Epistatic networks (K) | 0, 1, 2, 5 |
| G×E interaction: Number environment-types (E) | 1, 2, 5, 10 |
| Gene frequency (GF) | 0.1, 0.5 |
| Per meiosis recombination fraction (c) | 0.05, 0.1 |
| Number of parameterisations | 20 |

Number of models = Heritability × K × E × GF × c = 128 × 20 parameterisations = 2560 genotype-environment genetic models.
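The factorial structure in Table 9.1 can be enumerated directly. The sketch below is illustrative bookkeeping only (the dictionary keys and variable names are assumptions, not QU-GENE inputs) and reproduces the counts of 128 factor combinations and 2560 parameterised models.

```python
from itertools import product

# Factor levels taken from Table 9.1.
levels = {
    "h2": (0.1, 1.0),
    "GF": (0.1, 0.5),
    "c": (0.05, 0.1),
    "E": (1, 2, 5, 10),
    "K": (0, 1, 2, 5),
}
N_PARAMETERISATIONS = 20

# One dictionary per factor combination.
models = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# Each factor combination is paired with 20 gene-effect parameterisations.
scenarios = [(m, rep) for m in models for rep in range(N_PARAMETERISATIONS)]

print(len(models), len(scenarios))
```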

The basic linkage group model consisted of 12 chromosomes. Each chromosome

consisted of one QTL evenly spaced between two flanking markers (Figure 9.2).

[Figure 9.2 schematic: twelve linkage groups (1 to 12), each showing Marker1, QTL and Marker2, with 11.0 cM between each marker and the QTL]

Figure 9.2 Schematic outline of the linkage groups. There were 12 chromosomes each with one QTL and two flanking markers. The example has the markers spaced at 11 cM from the QTL, equivalent to a per meiosis recombination fraction of c = 0.1 on either side of the QTL using the Haldane mapping function (Haldane 1931)

For each of the 2560 genotype-environment genetic models (Table 9.1), 20 parental

reference populations were created by taking 20 samples of the 10 parents of the

Germplasm Enhancement Program. The different parental reference populations varied

the linkage phase associations between the alternative alleles of the QTL for each of the

2560 genotype-environment genetic models. Thus, the experiment increased in size to

20 × 2560 = 51200 genetic model population scenarios.

To simulate phenotypic selection, marker selection and marker-assisted selection

the procedures outlined in Section 8.2 were followed. This involved creating the

reference populations using the QU-GENE engine, constructing the linkage map and

mapping population in the GEXPV2 module, conducting a QTL detection analysis in

PLABQTL and conducting the selection strategies in the GEPMAS module (Figure

8.1).

9.2.2 Creating the mapping population and generating linkage groups

The procedures outlined in Chapter 8, Section 8.2.2 were followed to create the

mapping populations and involved combining each parental population with each

genotype-environment genetic model creating a Germplasm Enhancement Program

reference breeding population of 10 parents for each of the 51200 genetic models (Table

9.2). The relationship between the mapping population and breeding population was

retained as per Chapter 8, Figure 8.2. Based on the studies reported in Chapters 6, 7, and

8 it was concluded that a large number of lines and a sample of environment-types was

necessary to reliably detect QTL in the presence of G×E interactions (Cooper et al.

1999b). In this study the mapping population consisted of 1000 recombinant inbred line

individuals and the QTL detection analysis was conducted on line mean trait phenotype

values over 10 environments sampled at random from the target population of environments.

9.2.3 Assigning marker profiles

Marker profiles were assigned to individuals as per the procedure described in

Chapter 8, Section 8.2.3.

9.2.4 Conducting the QTL detection analysis

A QTL detection analysis was conducted on the mapping populations for the

51200 genetic model scenarios based on a recombinant inbred line mapping population

size of 1000 individuals (Table 9.2). The QTL detection analysis followed the procedures

as described in Chapter 8, Section 8.2.4.

Table 9.2 Experimental variable levels utilised in the QTL detection analysis

| Experimental variable | Level |
|---|---|
| QTL mapping population size (recombinant inbred lines) | 1000 |
| Number environments in the QTL detection analysis multi-environment trials | 10 |

The results from the QTL detection analysis were used as input for the GEPMAS

module for conducting marker selection and marker-assisted selection. In the

GEPMAS module phenotypic selection, marker selection and marker-assisted selection

techniques were applied in combination with S1 families and DH lines and were

conducted with 10 runs (simulation replicates) of each of the 51200 model scenarios for

each of the breeding strategies (Table 9.3). Therefore, each of the six breeding strategies

(S1-PS, S1-MS, S1-MAS, DH-PS, DH-MS and DH-MAS) was simulated 512000 times,

giving a total of 3072000 QTL detection analysis by breeding strategy scenarios that

were assessed for their impact on response to selection over 10 cycles of selection in the

Germplasm Enhancement Program.
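For reference, the scenario counts quoted in this Section follow from the design factors by simple multiplication. The tally below only restates figures already given in the text:

```latex
\begin{align*}
128 \text{ factor combinations} \times 20 \text{ parameterisations} &= 2560 \text{ genetic models} \\
2560 \times 20 \text{ parental reference populations} &= 51\,200 \text{ model scenarios} \\
51\,200 \times 10 \text{ runs} &= 512\,000 \text{ simulations per breeding strategy} \\
512\,000 \times 6 \text{ breeding strategies} &= 3\,072\,000 \text{ scenarios in total}
\end{align*}
```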

Table 9.3 Experimental variable levels utilised in the GEPMAS module

| Experimental variable | Level |
|---|---|
| Number starting parents for the Germplasm Enhancement Program | 10 |
| Number of environments in the multi-environment trials in the Germplasm Enhancement Program | 10 |
| Number cycles of Germplasm Enhancement Program | 10 |
| Number of families in multi-environment trials in the Germplasm Enhancement Program | 500 (50 selected) |
| Number runs (simulation replicates) | 10 |
| Population types | S1 family, DH line |
| Number parental populations | 20 |
| Selection type | PS, MS, MAS |

In addition to the procedures followed in Chapter 8: (i) the percent of QTL

segregating; (ii) percent of QTL detected; (iii) percent of QTL detected of those

segregating; and (iv) percent of QTL detected with incorrect marker-QTL allele

associations were also recorded for each of the 51200 genetic models subjected to QTL

detection analysis.

The percent of QTL segregating was the percent of QTL segregating in the

mapping population of the N = 12 possible QTL. For each mapping population, the

percent of QTL segregating could be determined because the two parents selected to create

the mapping population were known. Following selection of the two parents of the

mapping population based on highest and lowest trait value, QTL were only segregating

if the parents were polymorphic for that QTL. Of the 12 QTL potentially influencing the

trait there is a low likelihood that the parents will be polymorphic for all QTL in any

individual mapping study. An important factor affecting the number of QTL segregating

is the starting gene frequency of the favourable allele in the reference population. If the

starting gene frequency is GF = 0.1 for the favourable QTL allele, on average only 10%

of the alleles in the reference population are the favourable allele, and the remaining

90% are the unfavourable allele in the base population. Therefore, the chance of

selecting two parents polymorphic for all of the QTL of interest following the procedures

used in this investigation is quite low. This complicating feature of mapping for

breeding applications is considered relevant to the situation for the Germplasm

Enhancement Program. It is expected, based on available pedigree and marker data

(Nadella 1998, Susanto et al. 2002), that all of the important QTL segregating for a trait

in the Germplasm Enhancement Program breeding population would not be segregating

in the mapping population.
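A rough feel for how starting gene frequency limits segregation can be obtained from a simplified baseline. The sketch below assumes two fully inbred parents drawn at random and independently at each QTL, which is not the procedure used here (the mapping parents were chosen on highest and lowest trait value from a sample of 10 parents), so the numbers are indicative only. The function names are illustrative.

```python
def p_polymorphic(gf: float) -> float:
    """Chance that two randomly drawn inbred parents carry different alleles at one QTL."""
    return 2.0 * gf * (1.0 - gf)

def expected_segregating(gf: float, n_qtl: int = 12) -> float:
    """Expected number of QTL segregating in a two-parent mapping population."""
    return n_qtl * p_polymorphic(gf)

def p_all_segregating(gf: float, n_qtl: int = 12) -> float:
    """Chance that every QTL segregates, assuming independent loci."""
    return p_polymorphic(gf) ** n_qtl

for gf in (0.1, 0.5):
    print(gf, round(expected_segregating(gf), 2), p_all_segregating(gf))
```

Under these assumptions the per-locus chance of polymorphism is 0.18 at GF = 0.1 and 0.5 at GF = 0.5, so even at intermediate gene frequency only about half of the 12 QTL would be expected to segregate in a single cross.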

[Linkage map example: twelve linkage groups (1 to 12), each showing Marker1, QTL and Marker2 at 10.0 cM spacing, with a symbol marking each QTL that was not segregating in the mapping population]

In the linkage map example above, a symbol marks each QTL that was not

segregating in the mapping population. Therefore, the percentage of segregating QTL in

this example is the number of QTL segregating divided by the total number of QTL

multiplied by 100: $\frac{8}{12} \times 100 \approx 66\%$, i.e. eight of the possible 12 QTL were segregating

and could potentially be detected in the QTL detection analysis.

The percent of QTL detected was the percent of QTL that were detected from

the QTL detection analysis, of the total QTL (N = 12). To calculate the percentage of

QTL detected, a composite interval mapping analysis was conducted using PLABQTL

(as per Section 9.2.4).

[Linkage map example: the same twelve linkage groups, each showing Marker1, QTL and Marker2 at 10.0 cM spacing, with symbols marking the detected QTL and the QTL that were not segregating]

In the example above, after conducting the QTL detection analysis, six QTL

were detected and are marked with one symbol; a second symbol indicates that the QTL was not

segregating in the mapping population. In this example the percent of QTL detected of

the total number of QTL (N = 12) was $\frac{6}{12} \times 100 = 50\%$, i.e. six of the 12 QTL were

detected in the QTL detection analysis.

The percent of QTL detected of those segregating is the percent of QTL detected

divided by the percentage of QTL that were segregating in the mapping study. For the

examples in the above two Sections, the percent of QTL detected of those segregating

was $\frac{50}{66} \times 100 \approx 75\%$, as six QTL were detected out of the eight segregating in the

mapping population. Two QTL, one on linkage group six and one on linkage group 11

were not detected but were segregating in the mapping population in the example above.
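The three percentages defined so far can be computed from two sets of linkage group labels. The sketch below is illustrative only: the function name and dictionary keys are assumptions, and because the original figure symbols are not available the particular linkage groups listed as segregating are hypothetical, chosen only so that eight QTL segregate and the two on linkage groups 6 and 11 go undetected.

```python
def qtl_detection_summary(segregating, detected, n_qtl=12):
    """Summarise a QTL detection analysis against the known segregating QTL."""
    segregating = set(segregating)
    detected = set(detected)
    detected_and_segregating = detected & segregating
    return {
        "pct_segregating": 100.0 * len(segregating) / n_qtl,
        "pct_detected": 100.0 * len(detected) / n_qtl,
        "pct_detected_of_segregating": (
            100.0 * len(detected_and_segregating) / len(segregating)
            if segregating else 0.0
        ),
    }

# Hypothetical labels: eight segregating QTL, of which 6 and 11 were missed.
segregating = {1, 2, 3, 5, 6, 8, 10, 11}
detected = segregating - {6, 11}
print(qtl_detection_summary(segregating, detected))
```

Working from counts rather than from the rounded percentages gives exactly 75% for the QTL detected of those segregating (six of eight).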

The percent of QTL detected with incorrect marker-QTL allele associations is

the percent of QTL that were detected, where the marker alleles were incorrectly

associated with the globally favourable QTL alleles. This quantifies the percentage of

cases where the results of the QTL detection analysis identified the QTL in the mapping

study, but selected the wrong allele as favourable in comparison to its true value when

all possible genotypes were considered in the breeding program reference population. In

this case, for the E(NK) models considered, the total genotypic space could be defined

and the favourable allele combinations could be determined for all epistatic networks.

Therefore, the true QTL allele value is known for all QTL alleles for each model

parameterisation. As discussed in Chapter 2, Section 2.2.2.5, incorrect marker-QTL

allele associations are also known as Type III errors.

Incorrect marker-QTL allele associations can occur for a number of reasons. For

example, incorrect marker-QTL allele associations can occur for an additive genetic

model if composite interval mapping analysis cannot distinguish accurately which

marker alleles are associated with the QTL alleles for the superior and inferior phenotypes.

Alternatively, incorrect marker-QTL allele associations could arise when

epistasis is present and the effects of the alleles are context dependent and confounded

with specific background effects of the non-segregating QTL in the mapping

study. In the case of the genetic models considered here it is possible to establish when

an incorrect marker-QTL allele association occurs because the true effects of the alleles

are known from the model parameterisation created in the QU-GENE engine. This

information was recorded by comparing the results of the model parameterisation with

the results of the QTL detection analysis. For example, from the model parameterisation

it is known that the favourable allele combination between two flanking markers M

(alleles M and m) and N (alleles N and n) and QTL Q (alleles Q and q) is M-Q-N, and

therefore, the unfavourable allele combination is m-q-n. In this case, an incorrect

marker-QTL allele association occurs when the results of the QTL detection analysis

associate the unfavourable marker alleles with the favourable QTL allele, for example

m-Q-n, which also means that the favourable marker alleles are associated with the

unfavourable QTL allele, M-q-N. These outcomes were frequently observed in the

genetic models where epistasis contributed to the trait values. The frequency of

occurrence of an incorrect marker-QTL allele association was recorded.
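The comparison between the model parameterisation and the QTL detection results can be sketched as a lookup of the allele called favourable against the allele known to be favourable. This is a minimal illustration, not the GEPMAS code: the function name and data layout are hypothetical, and it assumes the percentage is expressed relative to the number of QTL detected.

```python
def incorrect_association_rate(true_favourable, called_favourable):
    """Percent of detected QTL whose favourable allele was called incorrectly.

    true_favourable:   {qtl_id: allele} known from the model parameterisation.
    called_favourable: {qtl_id: allele} inferred from the QTL detection analysis;
                       only detected QTL appear here.
    """
    if not called_favourable:
        return 0.0
    wrong = sum(
        1 for qtl, allele in called_favourable.items()
        if allele != true_favourable[qtl]
    )
    return 100.0 * wrong / len(called_favourable)

# Hypothetical example: QTL 2 was detected, but q was called favourable instead of Q.
truth = {1: "Q", 2: "Q", 3: "Q", 4: "Q"}
calls = {1: "Q", 2: "q", 3: "Q"}
print(round(incorrect_association_rate(truth, calls), 1))
```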

9.2.5 Simulating phenotypic selection, marker selection, and marker-assisted selection for S1 families and DH lines in the Germplasm Enhancement Program

The recurrent selection strategy modelled in the GEPMAS module (Chapter 8,

Figure 8.1) is an implementation of the Germplasm Enhancement Program of the

Northern Wheat Improvement Program. The GEPMAS module allows the modelling of

phenotypic selection (current Germplasm Enhancement Program selection method),

marker selection, and marker-assisted selection over 10 cycles of selection.

The Germplasm Enhancement Program strategy was discussed in detail in Chapter

2, Section 2.3 and an outline of the procedures used to implement the S1 family

selection strategy was given in Chapter 8 (Figure 8.3). These same procedures were

applied for S1 family selection in this Chapter.

An outline of the implementation of the DH line selection strategy in the Germplasm

Enhancement Program is illustrated in Figure 9.3. The technology used to

implement DH line production in practice is the maize × wheat crossing strategy (Laurie

and Bennett 1986, 1988). A detailed description of the application of this technology in

the Germplasm Enhancement Program was given by Jensen and Kammholz (1998) and

Jensen (2004) and is not discussed further here. All selection methods (Figure 9.3) start

with the creation of a reference population from random mating of the F1 progeny of a

half diallel of the 10 parents. The F1 individuals from the half diallel are then randomly

mated for one cycle to create the first S0 or space plant population (10000 individuals).

[Figure 9.3 schematic. Three parallel flows labelled PS, MS and MAS, each starting from a Half Diallel and a Space Plant Population (10 000), followed by randomly sampling 500 plants and creating 10 DH per plant; MS includes a Marker Profile step, PS includes METs, and MAS includes a Marker Profile step and METs; the recycling paths are numbered 1, 2 and 3]

Figure 9.3 Schematic outline of the simulation of phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) procedures in the DH line recurrent selection module (GEPMAS) used to simulate the Germplasm Enhancement Program. For PS, 1 indicates random mating of the reserve seed from the seed increase after multi-environment trials have been performed; for MS, 2 indicates random mating of the selected plants from the space plant population based on their marker profile; and for MAS, 3 indicates random mating of the reserve seed from the seed increase after marker profiles and multi-environment trials have been performed. The implementation of DH line recurrent selection in the Germplasm Enhancement Program can be compared to the S1 family implementation in Chapter 8, Figure 8.3.

For DH line phenotypic selection, 500 individuals were randomly sampled from

the space plant population. Ten doubled haploid plants were created for each of the 500

S0 individuals to create 5000 individuals. From these 5000 individuals, 500 were

randomly sampled and grown in multi-environment trials (10 environments). The top 50

lines were selected on their mean phenotypic values across the 10 environments

sampled at random from the target population of environments in the multi-environment

trials. The reserve seed (for the special case of DH lines the reserve seed is identical to

the DH lines) from the creation of the doubled haploids of the 50 selected lines was then

randomly mated to create the new space plant population for the next cycle (Figure 9.3,

PS).
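The creation of doubled haploids from an S0 plant can be pictured as sampling a recombinant gamete and doubling it. The sketch below is a simplified illustration and not the GEPMAS implementation: it assumes a single Marker1-QTL-Marker2 linkage group, the same per meiosis recombination fraction c in each interval, and no crossover interference. All names are hypothetical.

```python
import random

def make_gamete(hap_a, hap_b, c, rng):
    """Sample one gamete from a diploid parent with haplotypes hap_a and hap_b."""
    strands = (hap_a, hap_b)
    current = rng.randrange(2)          # start on a randomly chosen strand
    gamete = []
    for locus in range(len(hap_a)):
        # Between adjacent loci, switch strands with probability c.
        if locus > 0 and rng.random() < c:
            current = 1 - current
        gamete.append(strands[current][locus])
    return tuple(gamete)

def make_doubled_haploids(hap_a, hap_b, c, n, rng):
    """Create n DH lines; each is a gamete duplicated, so fully homozygous."""
    lines = []
    for _ in range(n):
        gamete = make_gamete(hap_a, hap_b, c, rng)
        lines.append((gamete, gamete))
    return lines

rng = random.Random(9)
s0_plant = (("M", "Q", "N"), ("m", "q", "n"))   # coupling-phase heterozygote
dh_lines = make_doubled_haploids(*s0_plant, c=0.1, n=10, rng=rng)
for line in dh_lines:
    print("-".join(line[0]))
```

Because each DH line is homozygous at every locus, its marker profile identifies the linked QTL allele unless a recombination has occurred in the marker-QTL interval.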

For DH line marker selection, plants are selected solely on their marker profile,

with no phenotypic selection. For this strategy 500 individuals were

randomly sampled from the space plant population. Ten doubled haploid plants were

created for each of the 500 S0 individuals. A marker profile was created for all 5000

plants based on the results of the QTL detection analysis. The top 50 individuals, based

on their marker profiles, were randomly mated to create the new space plant population

(Figure 9.3, MS).
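Both strategies described so far reduce to truncation selection on a score; they differ only in which candidates are scored and what the score is. The sketch below illustrates this with placeholder data and is not the GEPMAS implementation: the `marker_score` field (a count of favourable marker alleles) and the random phenotypes are hypothetical stand-ins for the marker profile and the multi-environment trial results.

```python
import random
from statistics import mean

def select_top(candidates, score, n_select):
    """Truncation selection: keep the n_select candidates with the highest score."""
    return sorted(candidates, key=score, reverse=True)[:n_select]

rng = random.Random(0)
# 5000 DH lines with placeholder marker scores and phenotypes in 10 environments.
dh_lines = [
    {
        "id": i,
        "marker_score": rng.randint(0, 24),
        "phenotypes": [rng.gauss(0.0, 1.0) for _ in range(10)],
    }
    for i in range(5000)
]

# Phenotypic selection: 500 randomly sampled lines are trialled, top 50 on mean phenotype.
trial_entries = rng.sample(dh_lines, 500)
ps_selected = select_top(trial_entries, lambda line: mean(line["phenotypes"]), 50)

# Marker selection: all 5000 lines are ranked on marker profile alone, top 50 kept.
ms_selected = select_top(dh_lines, lambda line: line["marker_score"], 50)

print(len(ps_selected), len(ms_selected))
```

Applying the two rankings in sequence, markers first and phenotype on the survivors, gives a two-stage scheme with the same building block.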

For DH line marker-assisted selection, 500 individuals were randomly sampled

from the space plant population. Ten DH plants were created for each of the 500 S0

individuals. A marker profile was created for all 5000 plants based on the results of the

QTL detection analysis. The top 500 individuals were selected based on their marker

profile. The selected 500 DH lines were then evaluated in a 10 environment multi-

environment trial as for phenotypic selection and 50 lines were selected. The reserve

seed from the creation of the doubled haploids of the 50 selected lines was then

randomly mated to create the new space plant population for the next cycle (Figure 9.3,

MAS).

9.2.6 Conducting the statistical analyses

An analysis of variance was conducted on the results from both the QTL detection

analysis phase and the simulated response to selection phase of the Germplasm

Enhancement Program.

9.2.6.1 QTL detection analysis

For the QTL detection analysis the analysis of variance was conducted on the

average of the 400 replications (20 E(NK) parameterisations × 20 parental replications)

for each of the 128 genotype-environment genetic models. The variates recorded for

each of the genetic models were: (i) percent of QTL segregating; (ii) percent of QTL

detected; (iii) percent of QTL detected of those segregating; and (iv) percent of QTL

detected with incorrect marker-QTL allele associations. The statistical model used for

the above variates is shown as Equation (9.1).

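$$
\begin{aligned}
x_{ijklmn} = {} & \mu + GF_i + E_j + K_k + c_l + h^2_m + (GF \times E)_{ij} + (GF \times K)_{ik} + (GF \times c)_{il} \\
& + (GF \times h^2)_{im} + (E \times K)_{jk} + (E \times c)_{jl} + (E \times h^2)_{jm} + (K \times c)_{kl} + (K \times h^2)_{km} \\
& + (c \times h^2)_{lm} + \varepsilon_{ijklmn},
\end{aligned}
\tag{9.1}
$$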

where:

$x_{ijklmn}$ is either the (i) percent of QTL detected; (ii) percent of QTL segregating; (iii) percent of QTL detected of those segregating; or (iv) percent of QTL detected with incorrect marker-QTL allele associations for observation n, for starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction level l and heritability level m,

$\mu$ is the overall mean,

$GF_i$ is the fixed effect of the ith starting gene frequency,

$E_j$ is the fixed effect of the jth environment-type level,

$K_k$ is the fixed effect of the kth epistasis level,

$c_l$ is the fixed effect of the lth per meiosis recombination fraction level,

$h^2_m$ is the fixed effect of the mth heritability level,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklmn}$ is the random residual effect of starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction level l, heritability level m for observation n, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.
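As a check on the structure of Equation (9.1), the degrees of freedom for the main effects and two-way interactions can be tallied from the number of levels of each factor. The sketch below is bookkeeping only; it assumes one observation per factor combination (the average over the 400 replications for each of the 128 models), and the variable names are illustrative.

```python
from itertools import combinations
from math import prod

# Number of levels of each factor in Equation (9.1).
levels = {"GF": 2, "E": 4, "K": 4, "c": 2, "h2": 2}

# Main effect degrees of freedom: levels minus one.
main_df = {factor: n - 1 for factor, n in levels.items()}

# Two-way interaction degrees of freedom: product of the two main effect df.
interaction_df = {
    (a, b): main_df[a] * main_df[b] for a, b in combinations(levels, 2)
}

n_obs = prod(levels.values())                       # one averaged value per model
model_df = sum(main_df.values()) + sum(interaction_df.values())
residual_df = (n_obs - 1) - model_df

print(n_obs, model_df, residual_df)
```

Under the stated assumption the residual is formed from the pooled three-way and higher-order interactions, since there is no within-cell replication after averaging.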

9.2.6.2 Response to selection

An analysis of variance was conducted on the trait mean value for the ten cycles

of selection in the Germplasm Enhancement Program. The statistical model is shown as

Equation (9.2).

$$
\begin{aligned}
x_{ijklmnopq} = {} & \mu + GF_i + E_j + K_k + c_l + h^2_m + SS_n + PT_o + Cyc_p + (GF \times E)_{ij} \\
& + (GF \times K)_{ik} + (GF \times c)_{il} + (GF \times h^2)_{im} + (GF \times SS)_{in} + (GF \times PT)_{io} \\
& + (GF \times Cyc)_{ip} + (E \times K)_{jk} + (E \times c)_{jl} + (E \times h^2)_{jm} + (E \times SS)_{jn} \\
& + (E \times PT)_{jo} + (E \times Cyc)_{jp} + (K \times c)_{kl} + (K \times h^2)_{km} + (K \times SS)_{kn} \\
& + (K \times PT)_{ko} + (K \times Cyc)_{kp} + (c \times h^2)_{lm} + (c \times SS)_{ln} + (c \times PT)_{lo} \\
& + (c \times Cyc)_{lp} + (SS \times PT)_{no} + (SS \times Cyc)_{np} + (PT \times Cyc)_{op} + \varepsilon_{ijklmnopq},
\end{aligned}
\tag{9.2}
$$

where:

$x_{ijklmnopq}$ is the trait mean value (as a measure of response to selection) for observation q, for starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction l, heritability m, selection strategy n, population type o and cycle p,

$\mu$ is the overall mean,

$GF_i$ is the fixed effect of the ith starting gene frequency,

$E_j$ is the fixed effect of the jth environment-type level,

$K_k$ is the fixed effect of the kth epistasis level,

$c_l$ is the fixed effect of the lth per meiosis recombination fraction level,

$h^2_m$ is the fixed effect of the mth heritability level,

$SS_n$ is the fixed effect of the nth selection strategy,

$PT_o$ is the fixed effect of the oth population type,

$Cyc_p$ is the fixed effect of the pth cycle,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklmnopq}$ is the random residual effect of starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction l, heritability m, selection strategy n, population type o, cycle p for observation q, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

In addition to an analysis of variance over all ten cycles of selection an analysis

of variance was also conducted for the population trait mean at cycle five. The statistical

model is shown as Equation (9.3).


\[
\begin{aligned}
x_{ijklmnop} = {} & \mu + GF_i + E_j + K_k + c_l + h^2_m + SS_n + PT_o + (GF \times E)_{ij} \\
& + (GF \times K)_{ik} + (GF \times c)_{il} + (GF \times h^2)_{im} + (GF \times SS)_{in} \\
& + (GF \times PT)_{io} + (E \times K)_{jk} + (E \times c)_{jl} + (E \times h^2)_{jm} \\
& + (E \times SS)_{jn} + (E \times PT)_{jo} + (K \times c)_{kl} + (K \times h^2)_{km} \\
& + (K \times SS)_{kn} + (K \times PT)_{ko} + (c \times h^2)_{lm} + (c \times SS)_{ln} \\
& + (c \times PT)_{lo} + (SS \times PT)_{no} + \varepsilon_{ijklmnop},
\end{aligned}
\tag{9.3}
\]

where:

$x_{ijklmnop}$ is the trait mean value for observation p at cycle five, for starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction l, heritability m, selection strategy n and population type o,

$\mu$ is the overall mean,

$GF_i$ is the fixed effect of the ith starting gene frequency,

$E_j$ is the fixed effect of the jth environment-type level,

$K_k$ is the fixed effect of the kth epistasis level,

$c_l$ is the fixed effect of the lth per meiosis recombination fraction level,

$h^2_m$ is the fixed effect of the mth heritability level,

$SS_n$ is the fixed effect of the nth selection strategy,

$PT_o$ is the fixed effect of the oth population type,

Combinations of the above terms represent their interactions,

$\varepsilon_{ijklmnop}$ is the random residual effect of starting gene frequency i, environment-type level j, epistasis level k, per meiosis recombination fraction l, heritability m, selection strategy n, population type o, for observation p, $\varepsilon \sim N(0, \sigma^2_\varepsilon)$.

For all analyses the significance level was set at a critical value of α = 0.05.

Analyses were conducted with the fixed effects constrained to sum-to-zero within the

ASREML software (Gilmour et al. 1999). A least significant difference test was

conducted on the means of the levels within a factor that had a significant F value.
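
The least significant difference comparison described above can be sketched as follows. This is an illustration only and not the ASREML procedure: the function names, the residual mean square and the number of observations per mean are hypothetical, and the Student t quantile is approximated by the standard normal quantile, which is only reasonable when the residual degrees of freedom are large.

```python
from math import sqrt
from statistics import NormalDist


def least_significant_difference(mse, n_per_mean, alpha=0.05):
    """Approximate LSD between two level means.

    mse        -- residual mean square from the analysis of variance
    n_per_mean -- number of observations averaged into each level mean
    Assumes large residual df, so the t quantile is replaced by the
    normal quantile (an approximation, labelled as such).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sqrt(2 * mse / n_per_mean)


def differs(mean_a, mean_b, lsd):
    """Two level means are declared different if they differ by more than the LSD."""
    return abs(mean_a - mean_b) > lsd


# Hypothetical values, not taken from the thesis
lsd = least_significant_difference(mse=4.0, n_per_mean=200)
print(round(lsd, 3))
print(differs(50.0, 50.3, lsd), differs(50.0, 50.5, lsd))
```

Level means that do not differ by more than the LSD would share a letter in the figures that follow.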


9.3 Results

9.3.1 Analysis of the QTL detection results over all genetic models

9.3.1.1 Percent of QTL segregating

From the analysis of variance of the percent of QTL segregating (Appendix 4,

Table A4.1), the significant main effects were starting gene frequency, number of

environment-types and epistasis levels (Figure 9.4). There was a significant difference

between the two starting gene frequencies with the higher gene frequency of GF = 0.5

having a higher percent of segregating QTL than a starting gene frequency of GF = 0.1

(Figure 9.4a). Whilst there was a significant difference between a target population of

environments based on one, two, five, and 10 environment-types, the differences were

small (Figure 9.4b). All epistasis levels were significantly different, with epistasis level

K = 1 having the highest percent of segregating QTL and epistasis level K = 5 having

the lowest (Figure 9.4c). Again, while significant, these differences were also small.

Significant first-order interactions have been placed in Appendix 4, Figure A4.1 as they

did not contribute significantly to the interpretation of the results.

[Figure 9.4. Panels: (a) Gene frequency (0.1, 0.5); (b) No. environment-types (1, 2, 5, 10); (c) Epistasis level (0, 1, 2, 5). Vertical axis: Percent of QTL segregating.]

Figure 9.4 Significant main effects from the analysis of variance for the percent of QTL segregating. All effect levels were significantly different except for those indicated by the same letter.

9.3.1.2 Percent of QTL detected

From the analysis of variance of the percent of QTL detected (Appendix 4, Table A4.2), the significant main effects were starting gene frequency, number of

environment-types in the target population of environments, epistasis level, per meiosis

recombination fraction and heritability (Figure 9.5). There was a significant difference

between the two gene frequencies with the starting gene frequency of GF = 0.5 having a


higher percent of QTL detected (Figure 9.5a). Genetic models with one or two envi-

ronment-types were not significantly different and had a significantly higher percent of

QTL detected compared to five and 10 environment-types, which were both different

(Figure 9.5b). Thus, on average as the level of G×E interaction in the target population

of environments increased, the percent of QTL detected decreased. An epistatic level of

K = 1 had the highest percent of QTL detected with K = 5 having the lowest percent of

QTL detected. There was no significant difference in the percent of QTL detected

between epistasis level K = 0 and K = 2 (Figure 9.5c). Thus, in contrast with the results

in Chapter 7, where a small sample of epistatic models was tested, the wider range of

epistatic models considered in this study showed that epistasis level did affect the

percent of QTL detected. As expected from the results of Chapters 6, 7 and 8, a greater

percent of QTL detection was associated with a smaller per meiosis recombination

fraction (Figure 9.5d) and higher heritability (Figure 9.5e).

[Figure 9.5. Panels: (a) Gene frequency (0.1, 0.5); (b) No. environment-types (1, 2, 5, 10); (c) Epistasis level (0, 1, 2, 5); (d) Recombination fraction (0.05, 0.1); (e) Heritability (0.1, 1). Vertical axis: Percent of QTL detected.]

Figure 9.5 Significant main effects from the analysis of variance for the percent of QTL detected. All effect levels were significantly different except for those indicated by the same letter.

There were a number of significant first-order interactions that affected the percent of QTL detected (Appendix 4, Table A4.2). Only a select few are shown here; the remainder can be found in Appendix 4, Figure A4.2. The starting gene frequency × epistasis level (GF × K) interaction was significant, with a re-ranking of epistasis level K = 0 relative to K = 1, K = 2 and K = 5 for the percent of QTL detected over the two starting gene frequencies (Figure 9.6a).

[Figure 9.6. Panels: (a) GF × K, plotted against epistasis level (0, 1, 2, 5) for GF = 0.1 and GF = 0.5; (b) h2 × E, plotted against no. environment-types (1, 2, 5, 10) for h2 = 0.1 and h2 = 1.0; (c) h2 × K, plotted against epistasis level (0, 1, 2, 5) for h2 = 0.1 and h2 = 1.0. Vertical axis: Percent of QTL detected.]

Figure 9.6 Significant first-order interactions from the analysis of variance for the percent of QTL detected. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types, and h2 = heritability.

For the heritability × number of environment-types (h2 × E) interaction, all numbers of environment-types had the same percent of QTL detected with a heritability of h2 = 1.0, whereas each number of environment-types had a different percent of QTL detected with a heritability of h2 = 0.1 (Figure 9.6b). For the heritability × epistasis level (h2 × K) interaction, there was a re-ranking of epistasis level K = 0 relative to K = 1, K = 2 and K = 5 for the percent of QTL detected over the two heritability levels (Figure 9.6c). Over both heritability levels, epistasis level K = 1 had a higher percent of QTL detected than K = 2 and K = 5. With a heritability of h2 = 0.1, epistasis level K = 0 had the same percent of QTL detected as K = 5, and with a heritability of h2 = 1.0, K = 0 had the same percent of QTL detected as K = 1 (Figure 9.6c).

9.3.1.3 Percent of QTL detected of those segregating

From the analysis of variance of the percent of QTL detected of those segregating (Appendix 4, Table A4.3), the significant main effects were gene frequency, number

of environment-types, epistasis level, per meiosis recombination fraction and heritabil-

ity (Figure 9.7).

[Figure 9.7. Panels: (a) Gene frequency (0.1, 0.5); (b) No. environment-types (1, 2, 5, 10); (c) Epistasis level (0, 1, 2, 5); (d) Recombination fraction (0.05, 0.1); (e) Heritability (0.1, 1). Vertical axis: Percent of QTL detected of those segregating.]

Figure 9.7 Significant main effects from the analysis of variance for the percent of QTL detected of those segregating. All effect levels were significantly different except for those indicated by the same letter.

The percent of QTL detected of those segregating was significantly different be-

tween the two gene frequencies, with the lower starting gene frequency of GF = 0.1

having a higher percent of QTL detected of those segregating than the higher starting

gene frequency of GF = 0.5 (Figure 9.7a). All numbers of environment-types were

significantly different, with E = 1 environment-type having the highest percent of QTL

detected of those segregating and E = 10 environment-types having the lowest percent

of QTL detected of those segregating (Figure 9.7b). Epistasis levels of K = 1 and K = 2

were not significantly different and also had the highest percent of QTL detected of

those segregating, while K = 5 had the lowest percent of QTL detected of those

segregating (Figure 9.7c). The two per meiosis recombination fractions were significantly

different, with a lower percent of QTL detected of those segregating at the larger per

meiosis recombination fraction (Figure 9.7d). There was a significant difference

between heritability levels with a heritability of h2 = 1.0 detecting a higher percent of

QTL that were segregating than a heritability of h2 = 0.1 (Figure 9.7e).
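
The contrast between the percent of QTL detected and the percent of QTL detected of those segregating can be illustrated with a small sketch. It assumes, for illustration only, that both the percent segregating and the percent detected are expressed relative to the total number of QTL in the model and that only segregating QTL can be detected; the function name and the counts used are hypothetical and are not results from this study.

```python
def detection_summary(n_qtl, n_segregating, n_detected):
    """Return (Seg, Det, D/S) as percentages for one mapping population.

    Seg -- percent of all QTL that are segregating
    Det -- percent of all QTL that are detected
    D/S -- percent of the segregating QTL that are detected
    """
    if not 0 <= n_detected <= n_segregating <= n_qtl:
        raise ValueError("expected 0 <= detected <= segregating <= total")
    seg = 100.0 * n_segregating / n_qtl
    det = 100.0 * n_detected / n_qtl
    d_s = 100.0 * n_detected / n_segregating if n_segregating else 0.0
    return seg, det, d_s


# Hypothetical counts for a 12-QTL model
low_gf = detection_summary(n_qtl=12, n_segregating=4, n_detected=3)
high_gf = detection_summary(n_qtl=12, n_segregating=8, n_detected=5)
print(low_gf)
print(high_gf)
```

In this hypothetical case the population with more segregating QTL has the higher percent of QTL detected but the lower percent detected of those segregating, because the denominator of the second measure is larger.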

There were four significant first-order interactions that affected the percent of QTL detected of those segregating (Appendix 4, Table A4.3). For the heritability × epistasis level (h2 × K) interaction, there was a re-ranking of epistatic levels K = 0 and K = 1 relative to K = 2 and K = 5 for the percent of QTL detected of those segregating over the two heritability levels (Figure 9.8a). There was a significant difference in the percent of QTL detected of those segregating for each epistatic level at both heritability levels (Figure 9.8a). For the heritability × number of environment-types (h2 × E) interaction, all numbers of environment-types had the same percent of QTL detected of those segregating with a heritability of h2 = 1.0 (Figure 9.8b). With a heritability of h2 = 0.1 all numbers of environment-types were different, with E = 1 environment-type having the highest percent of QTL detected of those segregating and E = 10 environment-types having the lowest percent of QTL detected of those segregating. The remaining interactions can be found in Appendix 4, Figure A4.3.

[Figure 9.8. Panels: (a) h2 × K, plotted against epistasis level (0, 1, 2, 5); (b) h2 × E, plotted against no. environment-types (1, 2, 5, 10); both for h2 = 0.1 and h2 = 1.0. Vertical axis: Percent of QTL detected of those segregating.]

Figure 9.8 Significant first-order interactions from the analysis of variance for the percent of QTL detected of those segregating. All effect levels were significantly different except for those indicated by the same letter. K = epistasis level, E = number of environment-types, and h2 = heritability.

9.3.1.4 Percent of QTL detected with incorrect marker-QTL allele associations

From the percent of QTL detected with incorrect marker-QTL allele associations

analysis of variance (Appendix 4, Table A4.4), the significant main effects were gene

frequency, number of environment-types, epistasis level, and heritability. There was a

significant difference between the two gene frequencies with the lower gene frequency

having a higher percent of QTL detected with incorrect marker-QTL allele associations

(Figure 9.9a). For the number of environment-types, E = 1 and E = 2 environment-types

were not significantly different and had a significantly lower percent of QTL detected

with incorrect marker-QTL allele associations compared to E = 5 and E = 10 environ-

ment-types, which were significantly different (Figure 9.9b). As the level of G×E

interaction increased with the number of environment-types (E), there was a tendency


for an increase in the percent of QTL detected with incorrect marker-QTL allele

associations. All epistasis levels were significantly different with epistasis level K = 0

(i.e. the additive model) having the lowest percent of QTL detected with incorrect

marker-QTL allele associations and K = 5 having the highest percent of QTL detected

with incorrect marker-QTL allele associations (Figure 9.9c). As the level of epistasis

increased there was a strong trend towards an increase in the percent of QTL detected

with incorrect marker-QTL allele associations. In the presence of epistasis, the percent

of QTL detected with incorrect marker-QTL allele associations was substantial, ranging

from 32.7% for epistasis level K = 1 to 49.6% for K = 5. The heritability levels were

significantly different, with a slightly higher percent of QTL detected with incorrect

marker-QTL allele associations for a heritability of h2 = 1.0 compared to a heritability

of h2 = 0.1 (Figure 9.9d).

[Figure 9.9. Panels: (a) Gene frequency (0.1, 0.5); (b) No. environment-types (1, 2, 5, 10); (c) Epistasis level (0, 1, 2, 5); (d) Heritability (0.1, 1). Vertical axis: Percent of QTL detected with IAA.]

Figure 9.9 Significant main effects from the analysis of variance for the percent of incorrect marker-QTL allele associations. All effect levels were significantly different except for those indicated by the same letter.

There were several significant first-order interactions from the analysis of vari-

ance for the percent of QTL detected with incorrect marker-QTL allele associations

(Appendix 4, Table A4.4). A significant interaction existed between the heritability and

number of environment-types (h2 × E), with the effects being small (Figure 9.10a).


There was no difference in the percent of QTL detected with incorrect marker-QTL

allele associations for E = 1 environment-type and E = 2 environment-types for the two

heritability levels. There was a significant interaction between the level of epistasis and

number of environment-types (K × E) for the percent of QTL detected with incorrect

marker-QTL allele associations. Each epistasis level had a different percent of QTL

detected with incorrect marker-QTL allele associations for all numbers of environment-

types, with the ranking of each epistatic level at each number of environment-types

being consistent: epistatic level K = 5 > K = 2 > K = 1 > K = 0 (Figure 9.10b). With the

non-epistatic model (K = 0), as the number of environment-types increased, so did the

percent of QTL detected with incorrect marker-QTL allele associations. The remaining

interactions can be found in Appendix 4, Figure A4.4.

[Figure 9.10. Panels: (a) h2 × E, for h2 = 0.1 and h2 = 1.0; (b) K × E, for K = 0, K = 1, K = 2 and K = 5; both plotted against no. environment-types (1, 2, 5, 10). Vertical axis: Percent of QTL detected with IAA.]

Figure 9.10 Significant first-order interactions from the analysis of variance for the percent of QTL detected with incorrect marker-QTL allele associations. All effect levels were significantly different except for those indicated by the same letter. K = epistasis level, E = number of environment-types and h2 = heritability.

Introducing the effects of G×E interaction and epistasis into the E(NK) model

resulted in an increase in the percent of QTL detected with incorrect marker-QTL allele

associations (Figure 9.10b). One way of observing this effect is to construct a three-

dimensional plot containing the percent of replications, the percent of QTL detected,

and the percent of QTL detected with incorrect marker-QTL allele associations. This

figure visualises under what conditions incorrect marker-QTL allele associations were

observed. A subset of models has been used to illustrate this effect (Figure 9.11). In the

simple additive model scenario with no G×E interaction or epistasis, E(NK) = 1(12:0),

GF = 0.1, c = 0.05, and h2 = 1.0 (Figure 9.11a), there were no QTL detected with

incorrect marker-QTL allele associations.
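
The construction of such a three-dimensional summary can be sketched as follows. This is a minimal illustration, assuming each replicate is summarised by a pair (percent of QTL detected, percent of QTL detected with incorrect marker-QTL allele associations); the function name and the replicate values are hypothetical and are not the simulation output.

```python
from collections import Counter


def replicate_surface(replicates):
    """Map each (percent detected, percent IAA) cell to the percent of replications in it."""
    counts = Counter(replicates)
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Hypothetical replicate outcomes: (percent of QTL detected, percent with IAA)
replicates = [(50, 0), (50, 0), (42, 0), (33, 33), (50, 0)]
surface = replicate_surface(replicates)

for (detected, iaa), pct in sorted(surface.items()):
    print(f"detected={detected}%, IAA={iaa}%: {pct:.0f}% of replications")
```

Each key of the resulting dictionary gives the two horizontal coordinates of a bar and the value gives its height, which is the quantity plotted on the vertical axis.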


The introduction of the highest level of epistasis modelled in this experiment,

(E(NK) = 1(12:5), Figure 9.11b) resulted in a large increase in the percent of QTL

detected with incorrect marker-QTL allele associations. For replicates with 8, 16, 25, 33

and 50 percent of the QTL detected, all QTL were identified to have incorrect marker-

QTL allele associations. With an increase in G×E interaction to the highest level

modelled in this experiment and no epistasis (E(NK) = 10(12:0), Figure 9.11c), the

percent of QTL detected with incorrect marker-QTL allele associations increased

compared to the additive E(NK) = 1(12:0) model (Figure 9.11a).

[Figure 9.11. Four three-dimensional panels, each for c = 0.05, GF = 0.1 and h2 = 1.0: (a) E(NK) = 1(12:0); (b) E(NK) = 1(12:5); (c) E(NK) = 10(12:0); (d) E(NK) = 10(12:5). Axes: Percent QTL detected, Percent QTL with IAA, and Percent of replications.]

Figure 9.11 Percent of QTL detected with incorrect marker-QTL allele associations (IAA) against the percent of QTL detected, and the percent of replications containing those combinations for (a) a simple additive case, E(NK) = 1(12:0), (b) increasing epistasis value E(NK) = 1(12:5), (c) increasing the number of environment-types E(NK) = 10(12:0), and (d) increasing both epistasis and environment-types E(NK) = 10(12:5) for a per meiosis recombination fraction of c = 0.05, gene frequency of GF = 0.1 and heritability of h2 = 1.0.

For the E(NK) = 10(12:0) model there was one replicate case where 42 percent of the QTL were detected, and all of these QTL were identified to have incorrect marker-QTL allele associations. When both G×E interaction and epistasis were included (E(NK) = 10(12:5), Figure 9.11d) there were once again five cases, with 8, 16, 25, 33 and 50 percent of the QTL detected, in which all detected QTL had incorrect marker-QTL allele associations. These were the same cases observed when only epistasis was included in the model (Figure 9.11b); however, the frequency of replications in which these cases occurred was higher than for the epistasis-only model. Overall, epistasis seems to be the larger contributor towards the occurrence of incorrect marker-QTL allele associations in comparison to G×E interactions.

9.3.2 Analysis of the trait mean value (response to selection)

9.3.2.1 Analysis over 10 cycles of selection of the Germplasm Enhancement Program

All main effects were found to be significant for the analysis of variance of the

trait mean value conducted over 10 cycles of selection of the Germplasm Enhancement

Program (Appendix 4, Table A4.5). Averaged over the remaining factors, the trait mean

value or response to selection increased as the number of cycles increased (Figure

9.12a). Marker-assisted selection had a higher trait mean value than phenotypic

selection and marker selection (Figure 9.12b), and DH lines had a higher trait mean

value than S1 families (Figure 9.12c). A higher trait mean value was observed for the

higher starting gene frequency in the base population (Figure 9.12d) and for the higher

heritability (Figure 9.12e). There was also a difference in the trait mean value between

the two per meiosis recombination fractions (Figure 9.12f). The differences observed

for selection strategy, starting gene frequency, heritability and per meiosis recombina-

tion fraction are consistent with the results of Chapter 8. There was little difference

between the four levels of epistasis. An epistatic level of K = 1 had the lowest trait mean

value, with K = 2 and K = 0 having the same trait mean value and K = 5 having a

slightly higher trait mean value (Figure 9.12g). The number of environment-types in the

target population of environments affected the trait mean value, with the trait mean

decreasing as the number of environment-types increased (Figure 9.12h).

Many of the first-order interactions for the trait mean value over 10 cycles of selection in the Germplasm Enhancement Program were significant (Appendix 4, Table A4.5). Only a selected few of these interactions are illustrated here; the remainder can be found in Appendix 4, Figure A4.5.

[Figure 9.12. Panels: (a) Cycles (0 to 10); (b) Selection strategy (PS, MS, MAS); (c) Population type (S1, DH); (d) Gene frequency (0.1, 0.5); (e) Heritability (0.1, 1); (f) Recombination fraction (0.05, 0.1); (g) Epistasis level (0, 1, 2, 5); (h) No. environment-types (1, 2, 5, 10). Vertical axis: Trait mean value (% of TG).]

Figure 9.12 Significant main effects from analysis of variance conducted over 10 cycles of the Germplasm Enhancement Program. All experimental variable levels were significantly different except epistasis where levels of zero and two were not significantly different. All effect levels were significantly different except for those indicated by the same letter.

For the epistasis × cycle (K × cycle) interaction, epistatic levels K = 1 and K = 2 generally had the same response over all cycles (Figure 9.13a). For epistatic levels K = 2 and K = 5, the response to selection was similar over the first five cycles, yet by cycle 10, K = 5 had the higher response. For epistatic level K = 0, the trait mean value in the first two cycles was the lowest; however, from cycle five to cycle 10, epistatic level K = 0 had the highest trait mean value. For the selection strategy × population type (SS

× PT) interaction, the trait mean value followed the descending order of DH-MAS > S1-

MAS > DH-PS > S1-PS > DH-MS > S1-MS (Figure 9.13b). Therefore, the use of DH

lines always gave a higher response than S1 families for each selection strategy (Figure

9.13b). For both the selection strategy × epistasis (SS × K) interaction (Figure 9.13c)

and selection strategy × number of environment-types (SS × E) interaction (Figure

9.13d), marker-assisted selection had a higher response to selection than phenotypic

selection and marker selection.

[Figure 9.13. Panels: (a) K × cycle, plotted against cycle (0 to 10) for K = 0, K = 1, K = 2 and K = 5; (b) SS × PT, plotted against population type (S1, DH); (c) SS × K, plotted against epistasis level (0, 1, 2, 5); (d) SS × E, plotted against no. environment-types (1, 2, 5, 10); panels (b) to (d) for PS, MS and MAS. Vertical axis: Trait mean value (% of TG).]

Figure 9.13 Significant first-order interactions from the analysis of variance conducted over 10 cycles of the Germplasm Enhancement Program. K = epistasis level, E = number of environment-types, SS = selection strategy, PT = population type.

9.3.2.2 Analysis conducted at cycle five of the Germplasm Enhancement Program

The analysis of variance conducted over 10 cycles of selection provided a repre-

sentation of what occurred on average over the course of the breeding program. The 10

cycle analysis showed that by cycle five a large amount of the progress had, on average, been achieved (Figure 9.12a). An analysis of variance was therefore also conducted on the data at this half-way point of the breeding program. After five cycles of selection, the Germplasm Enhancement Program would, in practice, have progressed for 20 years (five cycles by four years per cycle), a significant amount of career time for a breeder to dedicate to one breeding program.

In general the results of the analysis of variance for cycle five (Appendix 4, Ta-

ble A4.6) were similar to the results from the analysis over all 10 cycles. All of the main

effects were significant at cycle five, except for per meiosis recombination fraction

(Figure 9.14). Marker-assisted selection had the highest trait mean value followed by

phenotypic selection and marker selection (Figure 9.14a). Doubled haploid lines had a

higher trait mean value than S1 families (Figure 9.14b). The trait mean value was

highest for the higher starting gene frequency (Figure 9.14c) and higher heritability

(Figure 9.14d). The number of environment-types in the target population of environ-

ments affected the trait mean value, with the trait mean value at cycle five decreasing as

the number of environment-types or level of G×E interaction increased (Figure 9.14e).

There was a significant difference between the four epistasis levels with K = 0 having a

higher trait mean value than K = 5, K = 2 and K = 1 (Figure 9.14f).

[Figure 9.14. Panels: (a) Selection strategy (PS, MS, MAS); (b) Population type (S1, DH); (c) Gene frequency (0.1, 0.5); (d) Heritability (0.1, 1); (e) No. environment-types (1, 2, 5, 10); (f) Epistasis level (0, 1, 2, 5). Vertical axis: Trait mean value (% of TG).]

Figure 9.14 Significant main effects from analysis of variance conducted at cycle five of the Germplasm Enhancement Program. All experimental variable levels were significantly different.

Many of the first-order interactions were significant at cycle five (Appendix 4, Table A4.6); however, as they are all similar to the results over 10 cycles of selection, they have been placed in Appendix 4, Figure A4.6.

9.3.3 Detailed analysis of the trait mean value for specific genetic models

Of the 16 E(NK) models (four environment-type levels by four epistasis levels)

examined, four examples were selected to represent the transition from simple to

complex genetic models. Case 1 is the simplest genetic model containing no G×E

interaction or epistasis (E(NK) = 1(12:0)). Case 2 considers the effect of G×E interac-

tion as the number of environment-types in the target population of environments is

increased to ten, while epistasis remains absent (E(NK) = 10(12:0)). Case 3 considers

the effect of increasing the level of epistasis, while leaving the number of environment-

types at one (E(NK) = 1(12:5)). Case 4 considers the most complex model with both

G×E interactions and epistasis combined (E(NK) = 10(12:5)). For each of the E(NK)

models two starting gene frequencies (GF = 0.1 and GF = 0.5), and two heritability

levels (h2 = 0.1 and h2 = 1.0) were considered for a per meiosis recombination fraction

of c = 0.1 (as there was little difference for the per meiosis recombination fraction

levels).
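
The set of models and the four cases described above can be enumerated with a short sketch. The constant and function names are hypothetical, and the value N = 12 is simply read from the E(NK) labels used in this chapter, for example 1(12:0); the sketch only lists combinations and does not reproduce the simulation itself.

```python
from itertools import product

ENV_TYPES = (1, 2, 5, 10)   # number of environment-types, E
EPISTASIS = (0, 1, 2, 5)    # epistasis level, K
N = 12                      # N as it appears in the E(NK) labels, e.g. 1(12:0)


def enk_label(e, k, n=N):
    """Format a model label in the E(N:K) style used in the text."""
    return f"{e}({n}:{k})"


# All 16 E(NK) models: four environment-type levels by four epistasis levels
models = [enk_label(e, k) for e, k in product(ENV_TYPES, EPISTASIS)]

# The four cases selected for detailed analysis
cases = {
    1: enk_label(1, 0),    # no GxE interaction, no epistasis
    2: enk_label(10, 0),   # GxE interaction only
    3: enk_label(1, 5),    # epistasis only
    4: enk_label(10, 5),   # GxE interaction and epistasis
}

# Each case is examined at two gene frequencies and two heritabilities (c = 0.1)
scenarios = list(product(cases.values(), (0.1, 0.5), (0.1, 1.0)))
print(len(models), len(scenarios))
```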

9.3.3.1 Case 1: No G×E interaction, no epistasis; E(NK) = 1(12:0)

The E(NK) = 1(12:0) model was the simplest trait genetic architecture examined. With a low gene frequency (GF = 0.1) and low heritability (h2 = 0.1), 85% of the segregating QTL were detected and, on average, a small percent of QTL were detected with incorrect marker-QTL allele associations (0.3%; Figure 9.15a). Marker-assisted

selection had a higher response to selection than phenotypic selection over all 10 cycles,

while marker selection performed better than phenotypic selection for the first three

cycles of selection for S1 families. For the DH lines the response to selection of marker-

assisted selection and phenotypic selection was higher than for S1 families. The

response to selection was higher for marker-assisted selection than phenotypic selection

until cycle seven, after which the responses were similar with a slight increase associ-

ated with phenotypic selection by cycle 10. The marker selection strategy achieved a


higher trait mean than phenotypic selection until cycle two. An increase in the heritabil-

ity resulted in a higher percent of the segregating QTL being detected (Figure 9.15b).

For marker selection there was no difference in mean trait value achieved for DH lines

and S1 families over the two heritabilities (Figure 9.15a and b); however, for both

marker-assisted selection and phenotypic selection the response to selection was greater

for both S1 families and DH lines with a heritability of h2 = 1.0 than with h2 = 0.1.

[Figure 9.15 (graphs not reproduced): E(NK) = 1(12:0), GF = 0.1, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.15 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:0) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Increasing the starting gene frequency in the base population from GF = 0.1 to

GF = 0.5 increased the trait mean value at cycle zero for both S1 families and DH lines

(Figure 9.16). The response to selection was increased in comparison to the low starting

gene frequency (cf. Figure 9.15), as a higher proportion of the favourable alleles were

already present in the population. The percent of QTL segregating for both heritabilities

was around double that of the case of the lower gene frequency (Figure 9.15); however,

the percent of QTL detected of those segregating was approximately the same. A trait

mean value of 100% was achieved for both S1 families and DH lines with a heritability of

h2 = 1.0 (Figure 9.16b). Marker-assisted selection produced a faster initial response than

phenotypic selection for both S1 families and DH lines at both heritability levels. Overall,

DH lines produced a faster response than S1 families for the three selection strategies.

[Figure 9.16 (graphs not reproduced): E(NK) = 1(12:0), GF = 0.5, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.16 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:0) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Graphing the trait mean value for each of the 400 replications (20 parameterisa-

tions × 20 parental replications) of the E(NK) = 1(12:0) model shows the variation and

range of responses that occurred (Figure 9.17). For the simple model (Case 1: E(NK) =

1(12:0), GF = 0.1, c = 0.1 and h2 = 1.0; Figure 9.17), the trait mean value for S1 family

runs was more variable than for the DH lines. Phenotypic selection was the least

variable strategy, whereas marker selection was the most variable. With the marker

selection strategy, no progress was made in two of the 400 replications for both S1

families (Figure 9.17b) and DH lines (Figure 9.17e).

[Figure 9.17 (graphs not reproduced): six panels of trait mean value (% of TG) against cycle (0 to 10): (a) S1 PS, (b) S1 MS, (c) S1 MAS, (d) DH PS, (e) DH MS, (f) DH MAS.]

Figure 9.17 400 replications of the response to selection for DH lines and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 1(12:0) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.15b.

The E(NK) = 1(12:0) model may represent an overly simplified situation in that

effects of G×E interaction and epistasis are excluded. This is the common assumption

made in theoretical treatments of marker-assisted selection. In the following three cases

these assumptions are relaxed within the framework of the E(NK) model and the change

in trait mean value is examined for these more complex cases.

9.3.3.2 Case 2: G×E interaction present, no epistasis; E(NK) = 10(12:0)

The E(NK) = 10(12:0) model introduces an increase in the complexity of the ge-

netic architecture of the trait by including the effects of G×E interaction for 10

environment-types in the target population of environments. Consider first the case

where the frequency of the favourable alleles in the base population is low (GF = 0.1)

(Figure 9.18). With a low gene frequency (GF = 0.1) and low heritability (h2 = 0.1),

63% of the segregating QTL were detected, and about 7% of those had incorrect

marker-QTL allele associations (Figure 9.18a). Marker-assisted selection produced a

higher response to selection than phenotypic selection over all 10 cycles for both S1

families and DH lines. The marker selection response was higher than marker-assisted

selection for cycles one and two for S1 families and cycle one for DH lines. For DH lines

the response of marker-assisted selection and phenotypic selection was faster and higher

than for S1 families.

Compared to the E(NK) = 1(12:0) model (Figure 9.15 and 9.16), introducing

G×E interaction by including 10 environment-types in the target population of environ-

ments decreased the magnitude of the response for all breeding strategies considered,

especially in the long-term (Figure 9.18 and Figure 9.19). This effect was more

dramatic at a gene frequency of GF = 0.1 than for the case where the gene frequency

commenced at GF = 0.5 in the base population.

[Figure 9.18 (graphs not reproduced): E(NK) = 10(12:0), GF = 0.1, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.18 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:0) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

An increase in the heritability resulted in a higher percent of the segregating

QTL being detected, with 13% of the QTL having incorrect marker-QTL allele

associations (Figure 9.18b). The trait mean value for marker-assisted selection and

phenotypic selection was greater for both S1 families and DH lines with the increase in

heritability; however, in the long term the marker-assisted selection trait mean value

was less than that of phenotypic selection due to the influence of incorrect marker-QTL allele

associations. Overall the increase in trait mean value for DH lines was faster than for S1

families.

[Figure 9.19 (graphs not reproduced): E(NK) = 10(12:0), GF = 0.5, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.19 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:0) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Increasing the gene frequency of the favourable alleles in the base population to

GF = 0.5 (Figure 9.19) increased the population mean, and the initial trait mean value of

S1 families and DH lines was also higher, as the frequency of favourable QTL alleles in

the reference population was higher. In general the response to selection was faster than

for the lower gene frequency. The percent of QTL segregating for both heritabilities was

approximately double that of the lower gene frequency; however, the percent of QTL

detected of those segregating was lower with h2 = 0.1 than with h2 = 1.0. As with the lower

gene frequency, the marker-assisted selection response was not always the best (Figures

9.18a and 9.19a).

[Figure 9.20 (graphs not reproduced): six panels of trait mean value (% of TG) against cycle (0 to 10): (a) S1 PS, (b) S1 MS, (c) S1 MAS, (d) DH PS, (e) DH MS, (f) DH MAS.]

Figure 9.20 400 replications of the response to selection for DH lines and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 10(12:0) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.18b.

Graphing the trait mean value for each of the 400 replications (20 parameterisa-

tions × 20 parental replications) of the E(NK) = 10(12:0) model shows the variation and

range of responses that occurred (Figure 9.20). For the case with G×E interaction

introduced by including 10 environment-types in the target population of environments, E(NK) =

10(12:0), GF = 0.1, c = 0.1, and h2 = 1.0 (Figure 9.20), the trait mean value for S1

family runs was as variable as for the DH lines. The response to selection of phenotypic

selection and marker-assisted selection was more variable when G×E interaction was

included (Figure 9.20) compared to when it was not included (Figure 9.17).

9.3.3.3 Case 3: No G×E interaction, epistasis present; E(NK) = 1(12:5)

The E(NK) = 1(12:5) model introduced an increase in the complexity of the genetic

architecture of the trait by introducing epistatic networks of genes into the model.

In this case, on average five genes (i.e. K = 5) were acting on every other gene. With a

12 gene model (N = 12), there were always two sets of six genes interacting. Considering

first the case where the frequency of the favourable alleles was low in the population (GF =

0.1) with a low heritability (h2 = 0.1), 91% of the segregating QTL were detected, and

53% of those had incorrect marker-QTL allele associations relative to the target

genotype (Figure 9.21a). Because of the conditional effects of the alleles in the epistatic

networks the population mean was approximately 50% of the target genotype in the

base population. Marker-assisted selection had a higher response to selection than

phenotypic selection over the first six cycles for both S1 families and DH lines. The

marker selection strategy had the lowest overall trait mean value for both S1 families

and DH lines. For the DH lines, the response of marker-assisted selection and phenotypic

selection was higher initially than for S1 families. An increase in the heritability

(Figure 9.21b) produced no change in the percent of QTL detected of those segregating or the

percent of QTL detected with incorrect marker-QTL allele associations. There was an

improvement of the phenotypic selection response, which exceeded marker-assisted

selection one cycle earlier for S1 families and DH lines in comparison to the low

heritability case (Figure 9.21a).
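
The epistatic structure described for this case (K = 5 with N = 12, so that the genes fall into two sets of six) can be sketched as below. This is a minimal illustration only: which genes belong to which set is not stated here, so the contiguous grouping and the function names are assumptions.

```python
def epistatic_blocks(n_genes: int, k: int) -> list[list[int]]:
    """Split genes 0..n_genes-1 into blocks of k + 1 mutually interacting genes.

    Assumes (for illustration) contiguous, non-overlapping blocks, so that
    every gene is influenced by the k other genes in its own block.
    """
    block_size = k + 1
    if n_genes % block_size != 0:
        raise ValueError("n_genes must be a multiple of k + 1")
    return [list(range(start, start + block_size))
            for start in range(0, n_genes, block_size)]


def interacting_genes(gene: int, blocks: list[list[int]]) -> list[int]:
    """Return the other genes in the same block as `gene`."""
    for block in blocks:
        if gene in block:
            return [g for g in block if g != gene]
    raise ValueError(f"gene {gene} is not in any block")


blocks = epistatic_blocks(n_genes=12, k=5)        # two sets of six genes
no_epistasis = epistatic_blocks(n_genes=12, k=0)  # twelve independent genes
```

With K = 0 each gene forms its own block, which corresponds to the additive models of Cases 1 and 2.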

[Figure 9.21 (graphs not reproduced): E(NK) = 1(12:5), GF = 0.1, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.21 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:5) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Increasing the starting gene frequency in the base population to GF = 0.5 (Figure

9.22) did not cause an increase in the population trait mean at cycle zero, in comparison

with a gene frequency of GF = 0.1 (Figure 9.21). This is in contrast to the two previous

cases where epistasis was not included in the model. The percent of QTL segregating

for both heritabilities (Figure 9.22) was around double that of the lower gene frequency

(Figure 9.21); however, the percent of QTL detected of those segregating decreased to

around 70%. The percent of QTL detected with incorrect marker-QTL allele associa-

tions remained around 50%. The response of the trait mean value across cycles for

phenotypic selection and marker-assisted selection based on S1 families became

sigmoidal, or s-shaped, indicating a low initial response to selection, followed by a mid-

term rapid response, and returning to a slower response in the later cycles of selection.

This effect was not observed for the DH lines. Marker-assisted selection had the highest

response for both S1 families and DH lines when c = 0.1 and h2 = 0.1 (Figure 9.22a).

Increasing heritability to h2 = 1.0, resulted in marker-assisted selection reaching a

plateau earlier than phenotypic selection for both S1 families and DH lines. However,

phenotypic selection had the highest response in the long-term (Figure 9.22b). Overall,

DH lines produced a faster initial response than S1 families for marker-assisted selection

and marker selection, whereas S1 families showed no response to the marker selection

strategy. Marker-assisted selection had a faster initial response than phenotypic selection.

Compared to the E(NK) = 1(12:0) model (Figure 9.15 and Figure 9.16), intro-

ducing epistasis (K = 5) resulted in more complex patterns of response to selection for

the different selection strategies. In general the response to selection was slower with

epistasis included in the genetic model. In the long-term, the simple E(NK) = 1(12:0)

model (Figure 9.15 and Figure 9.16) produced a greater response than the E(NK) =

1(12:5) model (Figure 9.21 and Figure 9.22).

[Figure 9.22 (graphs not reproduced): E(NK) = 1(12:5), GF = 0.5, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.22 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 1(12:5) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Graphing the trait mean value for each of the 400 replications (20 parameterisa-

tions × 20 parental replications) of the E(NK) = 1(12:5) model shows the variation and

range of responses that occurred (Figure 9.23). For the case with epistasis introduced at

a level of K = 5 (E(NK) = 1(12:5), GF = 0.1, c = 0.1 and h2 = 1.0; Figure 9.23), the trait

mean value for S1 family runs was slightly more variable than for the DH lines. These

graphs also help explain the effect epistasis has on increasing the base population mean

(cycle zero). With epistasis absent (E(NK) = 1(12:0), GF = 0.1, c = 0.1 and h2 = 1.0;

Figure 9.17), the variation of each of the population types and selection methods at

cycle zero was small. With the inclusion of epistasis (Figure 9.23), the variation at

cycle zero was large due to the conditional effects of the alleles in the epistatic

networks, and on average created a higher initial trait mean value (Figure 9.21 and

Figure 9.22).

[Figure 9.23 (graphs not reproduced): six panels of trait mean value (% of TG) against cycle (0 to 10): (a) S1 PS, (b) S1 MS, (c) S1 MAS, (d) DH PS, (e) DH MS, (f) DH MAS.]

Figure 9.23 400 replications of the response to selection for DH lines and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 1(12:5) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.21b.

9.3.3.4 Case 4: G×E interactions and epistasis present; E(NK) = 10(12:5)

With the E(NK) = 10(12:5) model, both G×E interactions and epistasis were introduced

into the genetic architecture of the trait. In this case there were both 10

environment-types in the target population of environments (E = 10) and a high level of

epistasis (K = 5). Therefore, on average five genes were acting on every other gene and

the effects within these networks changed among the 10 environment-types. This case

represents the combined effects of G×E interaction and epistasis considered in the

previous two cases. With a low starting gene frequency in the base population (GF =

0.1), 82 to 90% of the segregating QTL were detected, and generally 50% of these had

incorrect marker-QTL allele associations relative to the target genotype (Figure 9.24).

In general the response to selection was slow in comparison to the previous

cases. For the low heritability (h2 = 0.1) marker-assisted selection had a higher response

to selection than phenotypic selection over all cycles with the S1 families, and the first

six cycles for DH lines (Figure 9.24a). The marker selection strategy response was poor

for both S1 families and DH lines. For the DH lines, the response of marker-assisted

selection and phenotypic selection was higher initially than for S1 families. An increase

in the heritability resulted in an improvement of the phenotypic selection response, such

that phenotypic selection resulted in a higher trait mean value than marker-assisted

selection at cycle six for S1 families and cycle four for DH lines (Figure 9.24b).

[Figure 9.24 (graphs not reproduced): E(NK) = 10(12:5), GF = 0.1, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.24 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:5) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Increasing the initial gene frequency in the base population to GF = 0.5 (Figure

9.25) did not change the population mean trait value at cycle zero compared to the case

where the starting gene frequency was GF = 0.1 (Figure 9.24). The percent of QTL

segregating was once again around double that of the lower gene frequency (Figure

9.25 cf. Figure 9.24); however, the percent of QTL detected of those segregating

decreased to around 70%. The percent of QTL detected with incorrect marker-QTL

allele associations remained around 50%. The trait mean value of phenotypic selection,

especially for S1 families, retained some of the sigmoidal s-shaped response observed

for the E(NK) = 1(12:5) model (Figure 9.22). For the low heritability (h2 = 0.1) marker-

assisted selection had the highest response for S1 families, and was higher than

phenotypic selection until cycle seven for the DH lines (Figure 9.25a). Increasing the

heritability to h2 = 1.0 (Figure 9.25b), resulted in marker-assisted selection reaching a

plateau earlier than phenotypic selection for both S1 families and DH lines.

[Figure 9.25 (graphs not reproduced): E(NK) = 10(12:5), GF = 0.5, c = 0.1; rows (a) h2 = 0.1 and (b) h2 = 1.0. Each row contains a panel of the average % of QTL (Seg, Det, D/S, IAA) and panels of trait mean value (% of TG) against cycle (0 to 10) for PS, MS and MAS, one for S1 families and one for DH lines.]

Figure 9.25 Average percent of QTL segregating (Seg), detected (Det), detected of segregating (D/S) and incorrect marker-QTL allele associations (IAA), with corresponding trait mean value response as a percent of the target genotype for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) of S1 families and DH lines for an E(NK) = 10(12:5) model. GF = gene frequency, h2 = heritability and c = per meiosis recombination fraction.

Compared to the E(NK) = 1(12:0) model (Figure 9.15 and Figure 9.16), intro-

ducing epistasis (K = 5) and G×E interaction (E = 10) in combination resulted in more

complex patterns of response to selection for the different selection strategies. In

general the trait mean value was lower and progressed at a slower rate over cycles of

selection with epistasis and G×E interaction in combination included in the genetic

model compared to the other three cases.

[Figure 9.26 (graphs not reproduced): six panels of trait mean value (% of TG) against cycle (0 to 10): (a) S1 PS, (b) S1 MS, (c) S1 MAS, (d) DH PS, (e) DH MS, (f) DH MAS.]

Figure 9.26 400 replications of the response to selection for DH lines and S1 families for the three selection strategies (phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS)), E(NK) = 10(12:5) model with gene frequency of 0.1, per meiosis recombination fraction of 0.1 and heritability of 1.0. Corresponds to the set of graphs in Figure 9.24b.

Graphing the trait mean value for each of the 400 replications (20 parameterisa-

tions × 20 parental replications) of the E(NK) = 10(12:5) model shows the variation and

range of responses that occurred (Figure 9.26). For this complex genetic model all

response patterns were highly variable. For the E(NK) = 10(12:5), GF = 0.1, c = 0.1, h2 =

1.0 case, the runs were variable for both S1 families and DH lines. For the

marker selection strategy, no progress was made in many of the replications for both

S1 families (Figure 9.26b) and DH lines (Figure 9.26e). In some cases marker selection

produced a negative response over the cycles of selection, which was

attributed to selection on the incorrect marker-QTL allele associations.

9.3.4 General trends across E(NK) models

Increasing the complexity of the genetic architecture of the trait within the

framework of the E(NK) model of the genotype-environment system, by incorporating

either G×E interactions, epistasis, or the combined effects of both, affected the response

to selection within the simulations of the Germplasm Enhancement Program. Each of

the different strategies illustrated a different and varied response to the genetic models

tested. It was observed that even with the highest level of complexity considered in this

study (E(NK) = 10(12:5)), on average, progress was still obtained in the Germplasm

Enhancement Program when either marker-assisted selection or phenotypic selection

was utilised. For marker selection it was possible to observe a positive response, no

response, or a negative response to selection. This indicates that for models beyond the

additive model, phenotypic selection is likely to be a critical component of any marker-

based selection strategy. Ultimately, marker-assisted selection was the superior

selection strategy on average over all scenarios considered, particularly in the short to

medium-term for the Germplasm Enhancement Program system that was implemented.

Doubled haploid lines were the superior population type over S1 families on average,

for the models considered. Comparable figures (cf. Figure 9.15) for all of the other cases

considered in this simulation experiment are included in Appendix 4, Section A4.3.

9.4 Discussion

9.4.1 QTL detection analysis

The QTL detection analysis in this simulation experiment involved collecting

data for the number and location of segregating markers and QTL for the trait of interest

for use in the marker selection and marker-assisted selection strategies. In addition to

these data, the percent of QTL segregating, percent of QTL detected, percent of QTL

detected of those segregating and percent of QTL detected with incorrect marker-QTL

allele associations were also recorded for each genetic model. These

components allowed a dissection of how the QTL were acting in the population and

their impact on the results of the marker-assisted selection and marker selection

strategies and therefore, the performance of these strategies relative to phenotypic

selection. The starting gene frequency in the base population was an important factor

affecting each of these components of the QTL detection analysis. The effect of

heritability and per meiosis recombination fraction on each of these components was

also assessed and the results were consistent with those reported on in Chapters 6, 7,

and 8.
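
The four QTL detection components can be made concrete with a small calculation. The sketch below assumes that Seg and Det are expressed as a percent of all QTL in the model, that D/S is the detected QTL as a percent of those segregating, and that IAA is the percent of detected QTL with an incorrect marker-QTL allele association. These denominators, the function name and the example counts are assumptions made for illustration and are not taken from the simulation software.

```python
def qtl_detection_summary(n_qtl, n_segregating, n_detected, n_incorrect):
    """Return Seg, Det, D/S and IAA as percentages.

    Assumed denominators (not confirmed by the text): all QTL for Seg and
    Det, segregating QTL for D/S, detected QTL for IAA.
    """
    if not 0 <= n_incorrect <= n_detected <= n_segregating <= n_qtl:
        raise ValueError("expected incorrect <= detected <= segregating <= total")

    def pct(part, whole):
        # Guard against an empty denominator (e.g. no QTL detected).
        return 100.0 * part / whole if whole else 0.0

    return {
        "Seg": pct(n_segregating, n_qtl),
        "Det": pct(n_detected, n_qtl),
        "D/S": pct(n_detected, n_segregating),
        "IAA": pct(n_incorrect, n_detected),
    }


# Hypothetical counts for a 12-QTL model: 6 segregating, 3 detected, 1 wrong.
summary = qtl_detection_summary(12, 6, 3, 1)
```

Under these assumed definitions a model can show a low Det but a high D/S when few QTL segregate, which is the distinction drawn between the two gene frequencies in the text.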

With the higher gene frequency (GF = 0.5 cf. GF = 0.1), there were more fa-

vourable QTL alleles segregating in the base population of the Germplasm Enhance-

ment Program, meaning that the mapping population was more likely to contain

polymorphic loci for the QTL influencing the trait under selection in the Germplasm

Enhancement Program. With the lower gene frequency, there was a lower likelihood of

finding two parents segregating for all of the QTL in the mapping population; therefore,

a greater number of QTL were found to be segregating with a gene frequency of GF =

0.5 than for GF = 0.1 (Figure 9.4a). When more QTL were segregating in the mapping

population, it was possible to detect more QTL with a gene frequency of GF = 0.5 than

with GF = 0.1 (Figure 9.5a). With the lower gene frequency, a higher proportion of the

segregating QTL was detected than with the higher gene frequency, as there were fewer QTL segregating

and it was easier to detect them using the composite interval mapping methodology

considered here (Figure 9.7a). The marker-assisted selection and marker selection

strategies both relied on the presence and detection of QTL to improve the trait mean

value and show an advantage over phenotypic selection by selecting for those QTL. If

few QTL are detected, then the marker selection and marker-assisted selection strategies

are less likely to show an advantage over phenotypic selection than when there were

more QTL detected. The cases where there were more QTL segregating resulted in a

greater chance of detecting a larger number of QTL and therefore, a better response

from marker-assisted selection and marker selection.
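
The link between starting gene frequency and the number of segregating QTL can be illustrated with a simple probability argument. The sketch below assumes, purely for illustration, two fully inbred mapping parents drawn independently from a population in which the favourable allele at a biallelic QTL has frequency GF; under those assumptions the QTL segregates in the cross only if the two parents carry different alleles. This is not the sampling scheme of the simulation itself, so the values should not be read as the simulated percentages, and the function names are hypothetical.

```python
def prob_segregating(gf: float) -> float:
    """Probability that two random inbred parents differ at a biallelic locus.

    One parent carries the favourable allele (probability gf) and the other
    the unfavourable allele (probability 1 - gf), in either order.
    """
    if not 0.0 <= gf <= 1.0:
        raise ValueError("gene frequency must lie between 0 and 1")
    return 2.0 * gf * (1.0 - gf)


def expected_segregating_qtl(gf: float, n_qtl: int = 12) -> float:
    """Expected number of the n_qtl QTL that are polymorphic in the cross."""
    return n_qtl * prob_segregating(gf)


low = prob_segregating(0.1)    # 2 * 0.1 * 0.9
high = prob_segregating(0.5)   # 2 * 0.5 * 0.5
```

Under these assumptions the probability is largest at GF = 0.5, which agrees in direction with the higher percent of QTL segregating reported for that gene frequency.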

The level of recombination between a marker and QTL affected the percent of

QTL detected, and the percent of QTL detected of those segregating (Figure 9.5 and

Figure 9.7). As the per meiosis recombination fraction increased, the percent of QTL

detected and percent of QTL detected of those segregating decreased. As the genetic

distance between a marker and QTL increased, the QTL detection analysis program

had more difficulty finding a statistical association between the marker and QTL, most

likely due to crossovers, and consequently the percent of QTL detected decreased.

The levels of per meiosis recombination fraction used in this study represented a

realistic situation for the Germplasm Enhancement Program. From the integrated

AFLP-SSR linkage map for the parents of the Germplasm Enhancement Program

(Susanto 2004), the smallest per meiosis recombination fraction between two markers

over all of the linkage groups was c = 0.0019 (0.2 cM, Haldane conversion (Haldane

1931)), the largest per meiosis recombination fraction between two markers was c =

0.25 (34.7 cM Haldane conversion (Haldane 1931)) and the average per meiosis

recombination fraction between two markers over the linkage groups was c = 0.07 (8.7

cM Haldane conversion (Haldane 1931)). Therefore, modelling a recombination

fraction of c = 0.05 and c = 0.1 provided a realistic approach to the expected per meiosis

recombination fraction for the Germplasm Enhancement Program.
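
The conversions quoted above can be checked with Haldane's mapping function, d = -(1/2) ln(1 - 2c) Morgans. The following is a minimal sketch rather than code from this thesis or from QU-GENE, and the function names are illustrative. Because the function is non-linear, an average of map distances need not equal the conversion of an average recombination fraction, so the averaged figures are not expected to convert into each other exactly.

```python
import math


def haldane_cm(c):
    """Map distance in centimorgans for a recombination fraction c (0 <= c < 0.5)."""
    if not 0.0 <= c < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * c)


def haldane_c(d_cm):
    """Recombination fraction for a map distance given in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


# Smallest and largest fractions on the linkage map, plus the two modelled levels.
for c in (0.0019, 0.05, 0.1, 0.25):
    print(f"c = {c:<6} -> {haldane_cm(c):5.1f} cM")
```

The inverse function is included so that a map distance read from a linkage map can be turned back into the per meiosis recombination fraction that a simulation needs.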

Heritability, as expected, had no influence on the percent of QTL segregating. A

higher heritability produced a higher percent of QTL detected as it contributed towards

the phenotypic values that were used in the QTL detection analysis, allowing a more

accurate representation of the underlying genotype. The lower heritability meant that the

phenotype was not an accurate representation of the genotype, which caused the QTL

detection analysis program to have trouble associating markers with QTL regions.
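
The link between heritability and the accuracy of the phenotype can be made concrete with a small sketch. It assumes the simple model P = G + E with heritability defined as the ratio of genotypic to phenotypic variance; the function names are illustrative and the heritability levels other than 0.1 are hypothetical values chosen for comparison.

```python
import math


def error_variance(var_g, h2):
    """Error variance implied by genotypic variance var_g and heritability h2."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    return var_g * (1.0 - h2) / h2


def genotype_phenotype_correlation(h2):
    """Correlation between genotypic and phenotypic values under P = G + E."""
    return math.sqrt(h2)


# 0.1 is a level used in this chapter; 0.5 and 1.0 are hypothetical comparison points.
for h2 in (0.1, 0.5, 1.0):
    print(h2, round(error_variance(1.0, h2), 2), round(genotype_phenotype_correlation(h2), 3))
```

Under these assumptions a heritability of 0.1 implies an error variance nine times the genotypic variance, which is one way to see why marker-trait associations become harder to establish.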

The effect of G×E interaction, or the number of environment-types in the target

population of environments, on the percent of QTL segregating was small (Figure 9.4b).

The decrease in the percent of QTL detected was due to QTL responding differently

under different environments, and even though the QTL detection analysis was

conducted on the average of the phenotypic values of 1000 recombinant inbred lines

over 10 environment-types, some QTL may have had a small effect in all environment-

types, and subsequently were not detected in the QTL detection analysis (Figure 9.5b).

The percent of incorrect marker-QTL allele associations increased with the number of

environment-types, as G×E interactions introduced complexity into the

models. This made it harder for the composite interval mapping methodology in

PLABQTL to determine what the true favourable allele was for the detected marker,

especially as the QTL detection analysis did not test for QTL × environment interac-

tions in the QTL detection model. The observed decrease in the percent of QTL

detected meant that the response to selection of the marker-assisted selection and

marker selection strategy would not be at their greatest level due to the influence of

G×E interactions on QTL detection.
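
A small sketch may help show why averaging phenotypes over environment-types can hide a QTL or blur which allele is favourable. All effect values below are hypothetical and the function names are illustrative; this is not the analysis performed by PLABQTL.

```python
from statistics import mean

# Hypothetical additive effects of three QTL in four environment-types.
qtl_effects = {
    "stable QTL": [1.0, 1.0, 1.0, 1.0],
    "scaling QTL": [2.0, 1.0, 0.5, 0.5],
    "crossover QTL": [1.0, -1.0, 1.0, -1.0],
}


def mean_effect(effects):
    """Effect seen by an analysis run on phenotypes averaged over environment-types."""
    return mean(effects)


def favourable_allele_consistent(effects):
    """True when the same allele is favourable in every environment-type."""
    return all(e > 0 for e in effects) or all(e < 0 for e in effects)


for name, effects in qtl_effects.items():
    print(name, mean_effect(effects), favourable_allele_consistent(effects))
```

In this toy case the crossover QTL has a mean effect of zero, so an analysis of averaged phenotypes has nothing to detect even though the QTL matters within each environment-type.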

There was a significant difference between the levels of epistasis for the percent

of QTL segregating (Figure 9.4c). This was due to the allelic combination of the

extreme parents being different for the different levels of epistasis. For example, assume

the following four parent combinations for a two gene/QTL model exist, where the A

and B alleles are favourable; AABB, AAbb, aaBB, and aabb. For a simple additive

model, or for the K = 0 case, the highest performing genotype will be AABB and the

lowest performing will be aabb. Crossing the extreme parents will produce an AaBb F1

and both QTL will be segregating. In the case of epistasis level K = 1, there is a non-linear

relationship between the alleles' effects on the phenotypes, and they do not

contribute to the genotypic value independently. If AABB is the highest performing

genotype, and instead of aabb being the lowest performing genotype, AAbb is the

lowest performing genotype, then the F1 will be AABb and only one QTL will be

segregating in the mapping population. Therefore, while the starting parent populations

for the breeding program are identical, the ranking of the genotypes can change for

different genetic models. The higher the level of epistasis (i.e. as K increases), the more

complex these networks become. A consequence in this study was fewer QTL segregat-

ing in the mapping population in the presence of epistasis.
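
The two-locus example above can be written out as a short sketch. The genotypic values are hypothetical and were chosen only so that the rankings match the two cases described; the function names are illustrative and do not come from QU-GENE.

```python
def segregating_loci(parent_a, parent_b):
    """Indices of loci at which two homozygous parents carry different alleles.

    Genotypes are strings with two characters per locus, e.g. "AAbb".
    """
    loci = range(len(parent_a) // 2)
    return [i for i in loci if parent_a[2 * i:2 * i + 2] != parent_b[2 * i:2 * i + 2]]


def extreme_parents(values):
    """Highest- and lowest-valued genotype in a {genotype: value} table."""
    return max(values, key=values.get), min(values, key=values.get)


# Hypothetical genotypic values, chosen only to reproduce the rankings in the text.
additive = {"AABB": 4.0, "AAbb": 2.0, "aaBB": 2.0, "aabb": 0.0}   # K = 0
epistatic = {"AABB": 4.0, "AAbb": 0.0, "aaBB": 3.0, "aabb": 2.0}  # K = 1

for name, table in (("K = 0", additive), ("K = 1", epistatic)):
    high, low = extreme_parents(table)
    print(name, high, "x", low, "segregating loci:", segregating_loci(high, low))
```

With the additive table both loci segregate in the cross of extreme parents, while with the epistatic table only the second locus does.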

The effect of epistasis on the percent of QTL detected, and the percent of QTL

detected with incorrect marker-QTL allele associations can be discussed simultane-

ously. The presence of epistatic networks made QTL detection difficult, and conse-

quently the QTL detection analysis program had trouble determining whether a QTL

was associated with a trait when there were other QTL influencing the phenotypic

value. As mentioned earlier, epistasis can cause phenotypic values to depart from a

linear relationship with the theoretically expected value of a genotypic combination.

The QTL detection analysis program then fails to detect the QTL because no linear

relationship is found between a marker allele and QTL allele. Also, as the

epistasis level increased, the percentage of QTL detected with incorrect marker-QTL

allele associations increased (Figure 9.9c). When epistasis is present, the genetic

background influences the values of the alleles of a QTL. With the bi-parental cross, it

is unlikely that all genotypic combinations within an epistatic network will be encoun-

tered, and the true favourable allele will not be found. Instead the favourable allele is

restricted to what combinations are present in the mapping population. It is an important

statistic to record, as even though QTL have been detected, the association may not be

correct in the reference breeding population. The presence of incorrect marker-QTL

allele associations means that when the marker-assisted selection and marker selection

strategies are conducted, the unfavourable QTL allele will be selected for instead of the

favourable allele. In these strategies, when a large percent of QTL were detected with

incorrect marker-QTL allele associations, the Germplasm Enhancement Program

breeding program did not progress as well as when there were fewer incorrect marker-

QTL allele associations, especially when using the marker selection strategy where

there is no other method to counteract this problem (Figure 9.22, S1 families).

The percent of QTL detected with incorrect marker-QTL allele associations was

quite an interesting component of the QTL detection analysis. Simulation has provided

an advantage over conventional QTL detection analysis in a breeding program by being

able to record how many QTL were detected where the incorrect or unfavourable

marker allele was associated with the favourable QTL allele. In Figure 9.11, the

relationship between the percent of QTL detected and the percent of QTL detected with

incorrect marker-QTL allele associations for the simple, epistatic, G×E interaction and

combined epistasis and G×E interaction models, indicated that as the level of complex-

ity in the genetic model increased, the percent of QTL detected with incorrect marker-

QTL allele associations increased. Epistasis had a larger impact on the percent of QTL

detected with incorrect marker-QTL allele associations than G×E interaction. This could

be a common problem in breeding programs where breeders would not be able to tell if

the association between a marker and a QTL was favourable beyond the mapping study

until they attempted to use the association in forward breeding. Therefore, it is impor-

tant to account for epistatic and G×E interaction effects in the QTL detection analysis

when these factors are known to have a significant influence. The presence of a small

number of incorrect marker-QTL allele associations in the case 1: E(NK) = 1(12:0)

model was due to minor QTL having their alleles incorrectly assigned: the QTL

detection analysis methodology lacked the power to resolve these alleles due

to the difference in the mean of the genotypic classes of the QTL being small.
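
Because the simulated genetic model is known, the statistics discussed in this section can be computed directly by comparing the analysis output with the truth. The sketch below shows one way to do so; the data structures, the function name and the choice of denominators are assumptions made for illustration, not the definitions used in the thesis.

```python
def qtl_detection_summary(true_favourable, segregating, detected):
    """Summarise one simulated QTL analysis.

    true_favourable: {qtl: allele} known from the simulated genetic model
    segregating:     set of QTL segregating in the mapping population
    detected:        {qtl: allele} favourable allele assigned by the QTL analysis
    """
    n_total = len(true_favourable)
    n_detected = len(detected)
    n_wrong = sum(1 for q, allele in detected.items() if true_favourable[q] != allele)
    return {
        "pct_segregating": 100.0 * len(segregating) / n_total,
        "pct_detected": 100.0 * n_detected / n_total,
        # Assumed denominators: segregating QTL here, detected QTL for the error rate.
        "pct_detected_of_segregating": 100.0 * n_detected / len(segregating) if segregating else 0.0,
        "pct_incorrect": 100.0 * n_wrong / n_detected if n_detected else 0.0,
    }


# Hypothetical four-QTL example: q4 is not segregating, q3 is missed, q2 is mis-assigned.
truth = {"q1": "A", "q2": "B", "q3": "C", "q4": "D"}
summary = qtl_detection_summary(truth, {"q1", "q2", "q3"}, {"q1": "A", "q2": "b"})
print(summary)
```

In an empirical mapping study only the detected set is observable, so the last statistic cannot be calculated without validation in further populations.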

As mentioned in the results Section of this Chapter, the QTL detection analysis

by composite interval mapping in PLABQTL was conducted without accounting for the

effects of epistasis or G×E interaction acting in the mapping population in an attempt to

simulate a situation where a researcher may assume that these factors do not influence

the trait of interest. QTL methodology that explicitly models the effects of epistasis and

G×E interactions in the QTL detection analysis program (e.g. van Eeuwijk et al. 2002)

may offer some opportunities to overcome the negative effects of these features of the

genetic architecture of a trait on the marker selection and marker-assisted selection

strategies. These analysis methods have not been considered here, but are recommended

as topics for further research.

9.4.2 Response to selection: S1 and DH with phenotypic selection, marker selection and marker-assisted selection strategies

Doubled haploids on average produced a faster initial increase in trait mean

value over S1 families for all the genetic models tested (Figure 9.12c), which has also

been observed in earlier studies for phenotypic selection (Kruger 1999). Doubled

haploid plants are homozygous in one generation, and can fix favourable allelic

combinations in the breeding population more quickly than in the case for S1 family

selection. On the other hand, DH lines can also fix unfavourable allelic combinations

more rapidly in some situations, which can cause a reduction in the trait mean value. As

mentioned in Chapter 2, DH lines are expected to exhibit twice as much additive genetic

variance ($2\sigma^2_A$) among lines relative to S1 families ($\sigma^2_A$), which can be illustrated using

the corresponding response to selection equations (Equation 4.5 cf. 4.4). Therefore, in

the multi-environment trial stages of the phenotypic selection and marker-assisted

selection strategies, the trait of interest in the DH lines is easier to select for than the

same trait in the S1 families as the environment-type has a smaller influence on the

phenotype for DH lines. Both of these factors contributed towards DH lines producing a

faster response to selection than S1 families in the Germplasm Enhancement Program.
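
The effect of doubling the among-line variance can be illustrated with the generic textbook form of the response to selection, R = i σ²_G / σ_P. This is a sketch under stated assumptions: Equations 4.4 and 4.5 are not reproduced here and may differ in detail, the error variance is taken to be the same for both population types, and all numerical values are hypothetical.

```python
import math


def response_to_selection(intensity, var_among, var_error):
    """Generic one-cycle response R = i * var_among / sqrt(var_among + var_error)."""
    return intensity * var_among / math.sqrt(var_among + var_error)


var_a = 1.0      # additive genetic variance (arbitrary units, hypothetical)
var_error = 3.0  # error variance of a family or line mean (hypothetical)
intensity = 1.4  # standardised selection intensity (hypothetical)

r_s1 = response_to_selection(intensity, var_a, var_error)        # among-family variance sigma^2_A
r_dh = response_to_selection(intensity, 2.0 * var_a, var_error)  # among-line variance 2 sigma^2_A
print(round(r_s1, 3), round(r_dh, 3))
```

Doubling the genetic variance among selection units raises both the numerator and the heritability on a unit-mean basis, so the predicted response for DH lines exceeds that for S1 families whenever the error variance is non-zero.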

Marker-assisted selection produced a higher trait mean value than both marker

selection and phenotypic selection over all the models tested. As marker selection relied

solely on marker information, when QTL were not detected, or if QTL were detected

with incorrect marker-QTL allele associations, there was no procedure to remove these

effects, leading to marker selection having a low trait mean value compared to pheno-

typic selection and marker-assisted selection, especially for the more complex genetic

models. The advantage of marker-assisted selection over marker selection was that

errors made in QTL detection or the detection of QTL with incorrect marker-QTL allele

associations could be counteracted through the use of phenotypic selection. Phenotypic

selection in the long term produced a higher trait mean value than marker selection because,

for the phenotypic selection strategy, selection was based on the phenotype, to which all

QTL contributed, not only the few detected in the mapping study. The addition

of phenotypic selection to a marker selection strategy produced the marker-assisted

selection strategy, which produced a faster initial increase in the trait mean value, with

the additional benefit of correcting or at least compensating for some of the marker-

QTL allele association problems due to epistasis and G×E interaction. However, once

all the markers have been fixed in the marker-assisted selection strategy, the strategy

reverts to phenotypic selection. At this stage it may be useful to conduct another

mapping study for the Germplasm Enhancement Program to find more segregating QTL

and markers for the trait of interest.

Introducing G×E interaction into the genetic architecture of the traits subjected

to selection in the Germplasm Enhancement Program breeding program created

complexities that caused a decrease in the genetic gain compared to when G×E

interaction was absent from the genetic model. G×E interaction occurs when genotypes

respond differently relative to each other in different environments, therefore making it

harder to select superior genotypes as the number of different environment-types in the

target population of environments increases. As the number of environment-types

increases, the ability to select superior genotypes decreases due to the changes in QTL

allele effects and their contribution to genotype trait performance across environment-

types. Introducing a model based on 10 environment-types in the target population of

environments for the genetic models caused a decrease in the trait mean value for each

selection strategy. Selection strategies involving markers are expected to perform better

than phenotypic selection as markers are not influenced by the environment. However,

the inclusion of G×E interaction into the genetic models affected the ability of the QTL

detection analysis program to associate the favourable marker allele with the favourable

QTL allele and consequently affected the response to selection of the marker selection

and marker-assisted selection strategies, which are both reliant on QTL detection. With

increasing levels of G×E interactions, marker selection performed worse due to

incorrect marker-QTL allele associations, in addition to the fact that some QTL contributing

towards the value of the trait were either not segregating or not detected in the

mapping population. In the simulated Germplasm Enhancement Program, with

inclusion of G×E interaction, marker selection performed better than phenotypic

selection until cycle seven for a gene frequency of GF = 0.1 and cycle five for a gene

frequency of GF = 0.5 for S1 families and a heritability of $h^2 = 0.1$ (Figure 9.18a, c and

9.19a, c) compared to the E(NK) = 1(12:0) model where marker selection performed

better than phenotypic selection for three cycles and one cycle, respectively (Figure

9.15a, c and 9.16a, c). Conducting multi-environment trials over 10 environments

sampled at random from the target population of environments to account for the effect

of G×E interaction may have resulted in a decrease in the response of the marker-

assisted selection strategy (Cooper et al. 1999b), while multi-environment trials are

necessary for an increase in the response of phenotypic selection. Marker-assisted

selection on average produced the highest response to selection over all levels of G×E

interaction considered (Figure 9.13d). This was due to the combination of markers and

phenotypic selection allowing a greater likelihood of the selection of genotypes with

superior allelic combinations. A different approach to dealing with G×E interaction, and

the fixing of major QTL, may be to select for consistent QTL in early screening

procedures, which should be adapted to diverse environments, and then conduct further

studies for QTL specific to a target environment (Austin and Lee 1998). While this

stratified QTL selection strategy was not considered in the present study, this could be

investigated in further simulation experiments.

Epistasis was an important factor influencing QTL detection and response to se-

lection. Epistasis caused a significant percent of incorrect marker-QTL allele associa-

tions due to the complexity it created by giving unfavourable allelic combinations high

phenotypic values, making it difficult for the QTL detection analysis methodology to

determine the correctly associated alleles. The s-shaped response of the S1 families

phenotypic selection strategy in the case of the epistatic models was most obvious with

a gene frequency of GF = 0.5 (Figure 9.22). The response started off low initially,

followed by a mid-term rapid response, returning to a slow response. This may be due

to major QTL in the epistatic networks becoming fixed early. Once enough QTL had

become fixed, it was easier to exploit the additive variance of the remaining genes in the

epistatic networks. This may also account for marker-assisted selection having a slightly

s-shaped response, yet a much higher response than phenotypic selection, as marker-

assisted selection had the ability to fix QTL in the marker stage and exploit the

remaining additive variance in the phenotypic selection stage earlier than straight

phenotypic selection.

The normalised trait mean value at cycle zero increased from 0.1, when there

was no epistasis present in the model, to 0.5 when epistasis was present and the gene

frequency was initially set to GF = 0.1 (Figures 9.15 and 9.21). The increase in the trait

mean value was caused by the presence of multiple peaks in the performance landscape

(occurring with the presence of epistasis), which resulted in more than one global

genotypic combination providing a high phenotypic value. Averaging out the starting

trait mean value at cycle 0 for all runs for that model resulted in a starting trait mean

value of 0.5. In the reference population the genotypic combinations and their relation-

ship to the phenotypic values no longer behave in a linear pattern (Cooper et al. 2002b).

When epistasis was present, some globally unfavourable allelic combinations had

higher local trait performance values than some other favourable allelic combinations.

Therefore, with a low gene frequency, the globally unfavourable allelic combinations

can have a high presence in the population and, with their high local trait performance

value, on average, can pull up the trait mean value at cycle zero (e.g. Figure 9.23,

Kauffman (1993)).
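
The shift in the cycle-zero mean can be illustrated numerically. The sketch below assumes Hardy-Weinberg and linkage equilibrium, six biallelic loci, and min-max normalisation of genotypic values; the "random landscape" assigns an independent uniform value to every genotype as a crude stand-in for strong epistasis. None of this is the E(NK) implementation in QU-GENE, and all names are illustrative.

```python
import itertools
import random


def genotype_weights(n_loci, p):
    """Expected frequency of each multi-locus genotype (favourable-allele counts per locus)."""
    per_locus = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}
    weights = {}
    for g in itertools.product((0, 1, 2), repeat=n_loci):
        w = 1.0
        for count in g:
            w *= per_locus[count]
        weights[g] = w
    return weights


def population_mean(values, weights):
    """Frequency-weighted mean of genotypic values normalised to the range 0 to 1."""
    lo, hi = min(values.values()), max(values.values())
    return sum(weights[g] * (values[g] - lo) / (hi - lo) for g in weights)


N_LOCI, GF = 6, 0.1  # N_LOCI is hypothetical; GF = 0.1 is a level used in this chapter
weights = genotype_weights(N_LOCI, GF)

# Additive model: value is the number of favourable alleles carried.
additive_mean = population_mean({g: float(sum(g)) for g in weights}, weights)

# Random landscapes, averaged over many draws in the same way results are averaged over runs.
rng = random.Random(1)
random_mean = sum(
    population_mean({g: rng.random() for g in weights}, weights) for _ in range(200)
) / 200
print(round(additive_mean, 3), round(random_mean, 2))
```

Under these assumptions the additive mean equals the gene frequency, while the average over random landscapes sits near the middle of the scale, which is consistent in direction with the pattern reported above.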

For some of the genetic models tested the phenotypic selection and marker-

assisted selection trait mean values crossed over for both DH lines and S1 families

(examples; Figures 9.15, 9.18 and 9.21). Selection strategy crossovers occurred whether

there were incorrect marker-QTL allele associations identified or not, and happened over all

environment-types, epistasis and heritability levels. It generally occurred in earlier

cycles of selection with a gene frequency of GF = 0.5. The crossing over of

phenotypic selection over marker-assisted selection may be due to the fixing or loss of

favourable QTL alleles in the case of marker-assisted selection. The two strategies

(phenotypic selection and marker-assisted selection) are fixing different combinations

of QTL alleles. The marker-assisted selection strategy used only QTL that had

been detected in the mapping study based on the two selected parents; however, the

breeding population was created from 10 parents. Any combination of two parents may

not have contained all of the QTL contributing towards the trait of interest, or all QTL

may not be segregating. When marker profiles were conducted on the space plant

population, plant selection was based on the presence of markers, with the top 500

plants being selected on their marker profile. There may have been genotypes in the

space plant population that contained important QTL alleles; however, these QTL alleles

were not segregating in the mapping population. Therefore, there were no marker-QTL

associations for those QTL and plants with favourable alleles for these QTL would not

be selected to progress through the breeding program from the marker profile. These

important QTL alleles may be lost from the breeding population when marker selection

or marker-assisted selection was conducted, which is why the phenotypic selection trait

mean value overtook the marker-assisted selection trait mean value, and marker-assisted

selection did not reach 100% of the target genotype. The reason phenotypic selection

overtook marker-assisted selection may have been due to phenotypic selection not

losing as many favourable QTL alleles from the breeding reference population. This

indicates an important limitation of the choice of mapping population type when

conducting marker-assisted selection. The selection strategy crossovers occur earlier for

the DH lines than S1 families as DH lines reach homozygosity earlier, and are capable

of losing favourable QTL alleles from the population earlier than the heterozygous S1

families.
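
The loss of unmapped QTL alleles under selection on marker profiles can be sketched as follows. The plants, QTL names and scoring rule are hypothetical and deliberately simple; they are not the marker profile calculation used in the Germplasm Enhancement Program simulations.

```python
def marker_score(plant, marker_qtl):
    """Number of favourable alleles carried at QTL that have a marker association.

    plant:      {qtl: count of favourable alleles (0, 1 or 2)}
    marker_qtl: QTL detected in the mapping study; other QTL are invisible to markers
    """
    return sum(plant.get(q, 0) for q in marker_qtl)


def select_on_markers(plants, marker_qtl, n_select):
    """Keep the n_select plants with the highest marker score."""
    ranked = sorted(plants, key=lambda name: marker_score(plants[name], marker_qtl), reverse=True)
    return ranked[:n_select]


# Hypothetical plants; q3 was not segregating in the mapping cross, so it has no marker.
plants = {
    "plant_1": {"q1": 2, "q2": 2, "q3": 0},
    "plant_2": {"q1": 2, "q2": 1, "q3": 0},
    "plant_3": {"q1": 1, "q2": 0, "q3": 2},
}
selected = select_on_markers(plants, marker_qtl=("q1", "q2"), n_select=2)
q3_retained = any(plants[name]["q3"] > 0 for name in selected)
print(selected, q3_retained)
```

In this toy population the only carrier of the favourable q3 allele is discarded at the marker stage, so the allele is lost before phenotypic selection has a chance to act on it.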

Observing the variation in the trait mean value across the 400 replications for four of the

genetic models (Figures 9.17, 9.20, 9.23 and 9.26) gave an overall view of variation

around the average responses. From the simulated data there were two runs of the

simple model in which no QTL were detected, as the marker selection strategy made no

progress in those runs (Figure 9.17b, e). Therefore, random mating was effectively

occurring in these cases. For cases 2, 3 and 4, many of the runs showed little or

backwards progress for marker selection (Figures 9.20, 9.23 and 9.26b, e). For case 4,

from the raw data only one run did not detect any QTL. However, 17.5% of the runs

detected all of the QTL with incorrect marker-QTL allele associations while 12.25% of

the runs that detected all QTL assigned the alleles correctly. It should be noted that with

the marker-assisted selection strategy, even though for some of the runs no QTL were

detected, or QTL were detected with incorrect marker-QTL allele associations, progress

was still made. In the Germplasm Enhancement Program, marker-assisted selection was

implemented to act like phenotypic selection when no QTL are detected. Therefore, the

implementation of phenotypic selection within the marker-assisted selection strategy

was able to compensate for the progress lost through incorrect marker-QTL allele

associations or the absence of QTL detected within the mapping study.
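
The fallback behaviour described above can be sketched as a two-stage procedure: a marker screen followed by selection on phenotype, with the marker screen skipped when no QTL were detected. The data layout, names and stage sizes are hypothetical, and the sketch is a simplification of the strategy simulated in this thesis.

```python
def marker_assisted_selection(plants, marker_qtl, n_marker_stage, n_final):
    """Two-stage sketch: marker screen, then selection on phenotype.

    plants: {name: {"alleles": {qtl: favourable-allele count}, "phenotype": float}}
    """
    names = list(plants)
    if marker_qtl:  # the marker stage is skipped when no QTL were detected
        names.sort(key=lambda n: sum(plants[n]["alleles"].get(q, 0) for q in marker_qtl),
                   reverse=True)
        names = names[:n_marker_stage]
    names.sort(key=lambda n: plants[n]["phenotype"], reverse=True)
    return names[:n_final]


# Hypothetical plants with one marked QTL and a measured phenotype.
plants = {
    "p1": {"alleles": {"q1": 2}, "phenotype": 5.0},
    "p2": {"alleles": {"q1": 2}, "phenotype": 7.0},
    "p3": {"alleles": {"q1": 0}, "phenotype": 9.0},
    "p4": {"alleles": {"q1": 1}, "phenotype": 6.0},
}
with_markers = marker_assisted_selection(plants, ("q1",), n_marker_stage=2, n_final=1)
without_markers = marker_assisted_selection(plants, (), n_marker_stage=2, n_final=1)
print(with_markers, without_markers)
```

With an empty marker set the function reduces to phenotypic selection, which mirrors why progress was still made in runs where no QTL were detected.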

9.5 Conclusion

Doubled haploid lines were able to compensate for some of the effects of G×E

interaction and epistasis in a superior way to S1 families. On average, response to

selection was positive for all of the combinations tested. Overall, marker-assisted

selection with a DH line population in the Germplasm Enhancement Program gave the

highest trait mean value on average over all the genetic models simulated. G×E

interactions and epistasis had a large influence on the percent of QTL detected and on

the association of favourable marker alleles with favourable QTL alleles. This impacted

the trait mean value of both the marker selection and marker-assisted selection

strategies. G×E interactions and epistasis also influenced response to selection for the

phenotypic selection strategy and the phenotypic component of the marker-assisted

selection strategy.

Accounting for the effect of G×E interaction and epistasis within the QTL

detection analysis program is likely to improve the percent of QTL detected and remove

some of the incorrect marker-QTL allele associations. The QTL detected would be more

reliable and contribute positively to a higher response to selection in the Germplasm

Enhancement Program for the marker selection and marker-assisted selection strategies.

In this Chapter, epistasis and G×E interactions were not explicitly accounted for in the

QTL detection analysis models. It is likely that if they were considered, more reliable

QTL could be detected, and large mapping population sizes may possibly be reduced.

This area is a logical extension of the work reported in this thesis and needs to be

considered in future investigations.

PART V

GENERAL DISCUSSION

AND

CONCLUSIONS

CHAPTER 10

GENERAL DISCUSSION

The research reported in this thesis was motivated by the need to evaluate

alternative breeding strategies for the long-term improvement of complex traits, such as

yield, for wheat in the northern grains region of Australia. Computer simulation was

selected as an appropriate investigative methodology to undertake this research. The

simulation experiments undertaken in this thesis focussed on the implementation of

marker-assisted selection in the wheat Germplasm Enhancement Program. The

inclusion of marker-assisted selection into the Germplasm Enhancement Program was

compared for S1 families and DH lines against the traditional phenotypic selection

strategy. Including variables of importance to the progress from selection within the

Germplasm Enhancement Program (i.e. the effects of per meiosis recombination

fraction, heritability, starting gene frequency, mapping population size, G×E interac-

tions and epistasis) has allowed a detailed evaluation of each of the different breeding

strategies, allowing some general conclusions to be formed on the relative merits of

the breeding strategies considered for a wide range of genetic model scenarios. The

investigations have allowed a discussion to be built around the general areas of: (i)

simulating breeding programs; (ii) QTL detection and marker-assisted selection; and

(iii) the implications of the genetic architecture of traits for response to selection within

the Germplasm Enhancement Program. This Chapter summarises the general findings

for each of these areas followed by a summary of the key conclusions and recommenda-

tions for the use of marker-assisted selection within the Germplasm Enhancement

Program.

General conclusions related to the simulation of breeding strategies

Empirical and theoretical evaluations for the genetic gain achievable in a breeding

program can quickly become impractical as the complexity of the genetic system

increases. Quantitative genetic theory requires many assumptions to ensure the

mathematical equations used to predict the response to selection remain tractable, which

often limits their use in practice where the assumptions are not valid. Empirical

experimentation to evaluate alternative strategies quickly becomes time and resource

intensive as the number of experimental combinations increases. In contrast, computer

simulation can allow the investigator to relax some of the assumptions required for

theory to apply and is a practical approach for investigating a vast number of genetic

models with little time and few resources in comparison to empirical evaluation. Therefore,

the use of a computer simulation methodology to model the wheat Germplasm

Enhancement Program breeding strategy allowed a detailed investigation of the

response to selection of this breeding program for a range of genetic model scenarios

considered to be of importance to the program. The computer simulation methodology

utilised in this thesis allowed insights into the detailed procedures of the Germplasm

Enhancement Program strategy that would otherwise not have been possible by

theoretical and empirical investigation.

Simulation involves modelling a process, and in the case of this thesis, the proc-

esses are different selection strategies in the Germplasm Enhancement Program. The

role of simulation, however, is not to model reality (or in this case the Germplasm

Enhancement Program) exactly but to model key processes and determine the impact of

these processes on what is being studied (Casti 1997b). It is a tool that can be used to

help predict the outcomes for a range of different variables. The steps taken within this

thesis to ensure key processes were simulated on a basis as close to reality as possible

included: (i) the method for modelling recombination was consistent with theoretical

expectations; (ii) the per meiosis recombination fractions used were plausible from

mapping work conducted on parents of the Germplasm Enhancement Program (Nadella

1998, Susanto 2004); (iii) a breeding program that exists was simulated; and (iv) the

results of simulating marker-assisted selection in the Germplasm Enhancement Program

were similar to what others had found, i.e. gains declined with time compared to

phenotypic selection (Zhang and Smith 1992, 1993, Edwards and Page 1994, Gimelfarb

and Lande 1994a, 1994b, 1995, Whittaker et al. 1995, Hospital and Charcosset 1997,

Whittaker et al. 1997, Cooper and Podlich 2002). At this point in time it is not possible

to model the effect of every gene in every epistatic network and every environment as

these effects are not understood for the Germplasm Enhancement Program. However,

over time, as the genetic architecture of traits is dissected and modelled, the results

from the simulations will represent a more realistic outcome.
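
As an illustration of step (i), a gamete-sampling routine can be checked against the recombination fraction it is meant to reproduce. The sketch below assumes independent crossovers in adjacent intervals (no interference); it is not the recombination routine in QU-GENE, and the function name and seed are arbitrary.

```python
import random


def make_gamete(hap_a, hap_b, c, rng):
    """Sample one gamete from a parent with haplotypes hap_a and hap_b.

    c[i] is the per meiosis recombination fraction between locus i and locus i + 1.
    """
    current, other = (hap_a, hap_b) if rng.random() < 0.5 else (hap_b, hap_a)
    gamete = [current[0]]
    for i in range(1, len(hap_a)):
        if rng.random() < c[i - 1]:  # crossover in this interval
            current, other = other, current
        gamete.append(current[i])
    return gamete


rng = random.Random(2005)
n = 20000
recombinants = 0
for _ in range(n):
    g = make_gamete(["A", "B"], ["a", "b"], [0.1], rng)
    if g in (["A", "b"], ["a", "B"]):
        recombinants += 1
observed = recombinants / n
print(f"observed recombinant fraction: {observed:.3f}")
```

Comparing the observed recombinant fraction with the specified value of c is a simple consistency check of the kind described in step (i).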

Through the series of experiments reported in this thesis it was demonstrated

that detailed simulations of the specifics of a breeding program (e.g. the Germplasm

Enhancement Program) were achievable in a high throughput format. To simulate the

Germplasm Enhancement Program, access was required to:

(i) the operating knowledge of the Germplasm Enhancement Program and an

understanding of the genetics underlying traits that are important for wheat

in the northern grains region of Australia;

(ii) the QU-GENE software;

(iii) high performance, high throughput computer hardware and software in the

form of the QU-GENE Computing Cluster and the software required to run

the QU-GENE Computing Cluster (Micallef et al. 2001); and,

(iv) the relevant technical support to develop the QU-GENE software modules

used in this investigation and their implementation on the QU-GENE Com-

puting Cluster. The approach used to undertake the software development

and implementation on the QU-GENE Computing Cluster was described in

Chapter 3.

For the simulation of breeding programs to be a successful undertaking, it was

important to accurately identify the goals of each individual simulation experiment. By

initially determining a set of key questions to investigate, an efficient design of the QU-

GENE simulation module to meet those goals could be developed. It was also important

to ensure that the necessary aspects of the breeding program were modelled in the

simulation module to ensure the most accurate portrayal of the breeding program could

occur based on current knowledge. Accurately identifying and designing simulations

was an important aspect in the simulation of a breeding program to ensure the results

were as close to reality as could be expected based on current evidence (Casti 1997a).

The simulation of breeding programs has been conducted in various studies us-

ing QU-GENE. More recently, a comparable investigation for pedigree breeding was

undertaken by Jensen (2004), and for the CIMMYT wheat breeding program by Wang

et al. (2003). These studies found simulation to be a viable tool to

explore and compare different breeding methods for a wide range of situations.

Using computer simulation to model a breeding program allows an investigation

into the effects of many experimental variables. Those considered in this study

included: (i) number of chromosomes; (ii) number of genes; (iii) number of QTL; (iv)

number of markers; (v) starting gene frequency in the base population; (vi) gene action;

(vii) linkage; (viii) per meiosis recombination fraction; (ix) heritability on an observation

and selection unit basis; (x) epistasis; (xi) G×E interaction; (xii) selected proportion;

(xiii) mapping population size; (xiv) selection strategies; and (xv) population type.

The ability to modify so many experimental variables allowed a detailed analysis of the

impact a breeding strategy has on the response to selection for defined reference

populations with combinations of these variables. This aspect of simulation allowed

valuable conclusions to be made on the influence of the experimental variables for the

outcomes of the Germplasm Enhancement Program.

A wide range of effects of epistasis and G×E interactions on the genetic architecture

of a trait was included in a series of experiments to broaden the range of genetic

models considered in the simulation investigations. In the absence of sufficient detail to

define the specifics of the situations for the wheat Germplasm Enhancement Program,

considering a range of possibilities was important to represent an ensemble of plausible

plant breeding situations. The E(NK) framework was implemented in QU-GENE to

allow the generation of gene effects by applying a statistical ensemble approach

(Kauffman 1993, Cooper and Podlich 2002). The availability of this theoretical

framework required significant developments of the NK model given by Kauffman

(1993). The diploid models considered in this thesis relied on prior work by Podlich

(1999). Parameterisation of the E(NK) model is presently limited in many cases by

the lack of detailed knowledge available on the genetic architecture of traits. However,

this information will become available as the results of QTL and genomic investigations

are validated, candidate genes and gene networks are identified and G×E interactions

and epistatic effects can be quantified with greater precision (e.g. Cooper et al. 2005).

Some preliminary work in this direction was reported by Jensen (2004). The current

investigation relied heavily on the empirical body of information from classical

quantitative genetic investigations that indicate the importance of G×E interactions and

epistasis for grain yield for the germplasm relevant to the Germplasm Enhancement

Program. Future investigations of the Germplasm Enhancement Program breeding

strategy will benefit from E(NK) model parameterisations based on validated empirical

results of trait mapping investigations. Some preliminary work in this direction

was reported by Nadella (1998) and Susanto (2004).

Main findings related to QTL detection analyses and marker-assisted selection

The limitations of population size, per meiosis recombination fraction, heritability

and gene frequency in the reference breeding population on QTL detection have

been outlined in previous studies (Beavis 1994, 1998, Jansen et al. 2003). In this thesis,

each of these variables was also found to be an important factor influencing the

detection of QTL and ultimately the response from marker-assisted selection.

As map density decreased and the genetic distance between a marker and QTL

became larger, there was a higher probability of recombination events occurring and the

strength of the association between the marker and QTL being reduced. A larger per

meiosis recombination fraction led to a decrease in the number of QTL detected in the

mapping studies compared to when a more dense map was available and the per meiosis

recombination fraction was smaller.
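The link between map density and the per meiosis recombination fraction described above can be illustrated with a mapping function. The following is a minimal sketch only, assuming Haldane's mapping function (no crossover interference). The function name and the marker spacings are illustrative assumptions and are not taken from the QU-GENE modules used in this thesis.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Convert a map distance (cM) to a recombination fraction.

    Assumes Haldane's mapping function, i.e. no crossover interference.
    """
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


# Hypothetical marker spacings. The worst case for a QTL is to sit
# midway between two flanking markers, i.e. spacing / 2 from each.
for spacing_cm in (5, 10, 20, 40):
    r = haldane_recombination_fraction(spacing_cm / 2)
    print(f"spacing {spacing_cm:>2} cM: r to nearest marker at most {r:.3f}")
```

A sparser map gives a larger worst-case recombination fraction between a QTL and its nearest marker, which weakens the marker-QTL association in the way described above.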

A higher heritability resulted in a larger number of the segregating QTL being

detected in the mapping studies. A higher heritability meant that the phenotype was a

more accurate representation of the underlying genotype, making it easier for the QTL

detection analysis methodology to determine associations between QTL and markers as

there was low variability in the phenotypic data due to environmental variation and

other sources of experimental error. However, in cases where a trait had a high

heritability, QTL detection and marker-assisted selection may not be the preferred

option as selection on the phenotype may be the economically optimum selection

method. If a trait has a low heritability, its heritability can also be increased by

using methods such as progeny testing in the QTL detection mapping population. Progeny

testing involves scoring replicated progeny which provides a higher family-mean

heritability, and, if grown in a range of environment-types, can provide a basis for

estimating QTL×E interactions, and an advantage to the use of marker-assisted selection

for traits with a low heritability.
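The gain in heritability from scoring replicated progeny can be sketched with the standard expression for heritability on a family-mean basis, n·h² / (1 + (n − 1)·h²), where h² is the heritability on a single-observation basis and n is the number of replicates. The sketch below assumes that the only non-genetic variance is error that averages out over replicates (no G×E variance). The function name and the example values are hypothetical.

```python
def family_mean_heritability(h2_single: float, n_reps: int) -> float:
    """Heritability of a family mean over n_reps replicated observations.

    Assumes total variance = genetic + error, with the error variance
    divided by n_reps when family means are used (no G x E term).
    """
    if not 0.0 <= h2_single <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    if n_reps < 1:
        raise ValueError("at least one replicate is required")
    return n_reps * h2_single / (1.0 + (n_reps - 1) * h2_single)


# Hypothetical low-heritability trait scored with increasing replication.
for n in (1, 2, 4, 8):
    print(f"replicates = {n}: family-mean h2 = {family_mean_heritability(0.1, n):.3f}")
```

Under these assumptions the returns from additional replicates diminish, so the number of replicates has to be weighed against the number of families that can be evaluated.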

The starting gene frequency in the reference breeding population of the breeding

program determined the proportion of each of the two alleles for each QTL in the

reference population. With a lower starting gene frequency, e.g. GF = 0.1 for the

favourable allele, fewer QTL were detected than with the higher starting gene frequency

of GF = 0.5, as there was a smaller chance that the two parents that formed the mapping

population carried different alleles at the QTL influencing trait variation in the reference

population.
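The effect of starting gene frequency on the number of QTL available for detection can be approximated as follows. This is a sketch under simplifying assumptions that are not stated in the text: the two parents are fully inbred lines sampled independently from the reference population, each QTL has two alleles, and loci are treated independently. Under these assumptions the probability that the parents differ at a QTL is 2p(1 − p), where p is the frequency of the favourable allele. The function names and the number of QTL are hypothetical.

```python
def prob_parents_differ(p: float) -> float:
    """Probability that two independently sampled inbred parents carry
    different alleles at a biallelic locus with allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * p * (1.0 - p)


def expected_segregating_qtl(n_qtl: int, p: float) -> float:
    """Expected number of QTL segregating in a bi-parental cross,
    assuming every QTL has the same allele frequency p."""
    return n_qtl * prob_parents_differ(p)


# Hypothetical trait controlled by 10 QTL.
for gf in (0.1, 0.5):
    print(f"GF = {gf}: expected segregating QTL = {expected_segregating_qtl(10, gf):.1f}")
```

QTL that do not segregate in the cross cannot be detected, whatever the population size or heritability, which is consistent with fewer QTL being detected at GF = 0.1 than at GF = 0.5.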

Mapping population size was found to be one of the most influential factors for

QTL detection. With a low mapping population size the number of genotypes that can

be sampled is limited. This results in some associations between the phenotype and

genotype not being found and ultimately, some segregating QTL not being detected.

With a mapping population size of 100 individuals, heritability and per meiosis

recombination fraction played important roles in determining the detection of QTL. With small

mapping population sizes the chances of detecting segregating QTL increased with

greater heritability and a denser genetic map, which was simulated by a lower per

meiosis recombination fraction. With larger population sizes more genotypes were able

to be sampled and associations between marker and QTL alleles were more likely to be

found. Most genome-wide searches for QTL use 500 individuals with a 10–12 cM map

as both a denser map and a larger population size enable more QTL to be detected and a

greater resolution to be achieved in positioning the QTL (Ober and Cox 1998, Chalmers

et al. 2001). It has also been suggested that a population size of 1000 individuals is

required to obtain accurate QTL positions and to estimate effects (Holland 2004), with a

practical QTL mapping study in maize being conducted using 976 progeny families

(Openshaw and Frascaroli 1997). Both of these references recognise the need for larger

mapping population sizes to be used. In this thesis, mapping population sizes approaching

500 to 1000 recombinant inbred line individuals gave a high, reliable power for

QTL detection across the genetic models tested. Whilst mapping population

size was an important factor in the detection of QTL for marker-assisted selection, it

had a small impact on the response to selection as the phenotypic selection phase of

marker-assisted selection helped to overcome small numbers of detected QTL or QTL

with incorrect marker-QTL allele associations.

The presence of epistasis and G×E interactions as components of the genetic architecture

of a complex quantitative trait had strong implications for the results of QTL

detection and marker-assisted selection in comparison to models based on the assumption

of no epistasis and no G×E interactions. Increasing levels of G×E interactions and

epistasis generally caused a decrease in the number of QTL detected. A major finding

from this thesis is that in the presence of epistasis and G×E interactions it is expected

that mapping studies will detect QTL; however, the marker-QTL allele associations

detected are less likely to be the globally desirable marker-QTL allele associations for

the reference breeding population.

The incidence of the detection of marker-QTL associations in the reference

mapping population that were not the preferred associations for the reference breeding

populations was examined in terms of incorrect marker-QTL allele associations, which

are a form of Type III errors. Determining the level of incorrect marker-QTL allele

associations allowed an insight into the effects that G×E interactions and epistasis had

on the detection of favourable QTL alleles. An incorrect marker-QTL allele association

occurred when a QTL that was segregating in the mapping population was detected

but the directional effect of the QTL was incorrect for the global model in the

reference breeding population, i.e. the QTL allele had a negative effect when it should

have been positive based on the model that was specified. An incorrect marker-QTL

allele association therefore resulted in the favourable marker allele being associated

with the unfavourable QTL allele. Selection for the detected QTL leads to a build-up of

globally unfavourable QTL alleles in the breeding population. It is noted that even

though these QTL alleles may not be globally superior, they may be favourable in

specific epistatic combinations, i.e. locally favourable. Therefore, even in the presence

of incorrect marker-QTL allele associations that are unfavourable from the perspective

of the global performance landscape, and thus the global target genotype, marker-

assisted selection, and to a lesser extent marker selection, can still contribute a positive

response to selection. An interpretation of this result is that while the marker-QTL allele

associations identified in the mapping study are not always globally favourable, they

can still be locally favourable on the performance landscape. Thus, climbing the local

performance peak on the landscape response surface results in a positive response to

selection. However, if this interpretation is correct, different results should be observed

from different replicates of the same model. While this could not be investigated in

detail in this study it was observed that when epistasis and G×E interactions were

included in the genetic model of the trait, the responses to selection were more variable

among the replicates than for the additive models with no epistasis and G×E interactions.

This result was also noted for cases of G×E interaction for simulated yield of

sorghum by Chapman et al. (2003). This result is consistent with the expectations of

exploiting different local peaks on the global performance landscape, as discussed first

by Wright (1932) and more recently by Kauffman (1993), Cooper and Podlich (2002),

and Podlich et al. (2004).
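The idea of an incorrect marker-QTL allele association discussed above can be made concrete with a small sketch. It assumes that, for each QTL, the true global effect of the marker-linked allele is known (as it is in a simulation) and that the mapping study returns an estimated effect for each detected QTL. An association is counted as incorrect when the two have opposite signs. The function name, QTL labels and effect values are hypothetical and are not results from this thesis.

```python
def incorrect_associations(true_effects, estimated_effects):
    """Return the detected QTL whose estimated effect has the opposite
    sign to the true global effect specified in the genetic model.

    true_effects: dict mapping QTL name -> true global effect
    estimated_effects: dict mapping QTL name -> estimated effect,
        containing only the QTL detected in the mapping study
    """
    return sorted(
        qtl
        for qtl, estimate in estimated_effects.items()
        if estimate * true_effects[qtl] < 0
    )


# Hypothetical example: four QTL in the model, three of them detected.
true_effects = {"Q1": 1.0, "Q2": 0.5, "Q3": -0.8, "Q4": 0.3}
estimated_effects = {"Q1": 0.7, "Q2": -0.2, "Q3": -0.5}  # Q4 was not detected

wrong = incorrect_associations(true_effects, estimated_effects)
print(f"{len(wrong)} of {len(estimated_effects)} detected QTL have an incorrect association: {wrong}")
```

In this example Q2 is detected but with the wrong direction, so selecting on its marker would favour the globally unfavourable allele, while Q4 is simply missed.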

The reliable detection of QTL was important for the success of the marker-

assisted selection strategy investigated in this thesis. As the proportion of QTL detected

increased, the advantage of marker-assisted selection over phenotypic selection

increased. The advantage of marker-assisted selection over phenotypic selection was due

to the favourable detected QTL alleles being fixed in one or two cycles in the breeding

program for marker-assisted selection, as opposed to phenotypic selection which

required a longer timeframe to fix the same favourable QTL alleles as selection was

only occurring on the observed phenotype. Generally the experimental variables which

affected QTL detection (heritability, per meiosis recombination fraction, starting gene

frequency and the effects of G×E interaction and epistasis) had a carry-through

effect on the response to selection of both the marker selection and marker-assisted

selection strategies. These variables also affected the phenotypic selection phase of

marker-assisted selection. Therefore, the effect of these variables impacted marker-

assisted selection in two phases of the strategy (QTL detection and phenotypic selection

phases) as opposed to marker selection where they affected only the QTL detection

phase. However, for all of the models tested in this thesis, marker-assisted selection was

generally found to be the selection method that gave the greatest rate of genetic gain.

The marker-assisted selection strategy considered in this thesis was able to produce a

faster response to selection in the Germplasm Enhancement Program than phenotypic

selection, as marker-assisted selection involved the selection of individuals based on

their marker profile, and then utilised phenotypic selection to further evaluate the

individuals selected first on the results of the QTL detection analysis. The phenotypic

selection stage allowed the removal of individuals which may have been selected on an

incorrect QTL allele marker profile due to an incorrect marker-QTL allele association.

Another advantage of the phenotypic selection stage was that it also allowed the

selection of individuals which contained QTL that were not segregating or were not

detected in the mapping population.
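The two-stage structure of the marker-assisted selection strategy described above can be sketched as follows. This is an illustrative outline only and is not the QU-GENE implementation. The candidate records, the marker score (taken here as a count of favourable marker alleles) and the selected proportions are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    name: str
    marker_score: int  # favourable marker alleles carried (hypothetical)
    phenotype: float   # observed trait value (hypothetical units)


def select_top(candidates: List[Candidate],
               key: Callable[[Candidate], float],
               proportion: float) -> List[Candidate]:
    """Keep the best `proportion` of candidates ranked by `key`."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("selected proportion must be in (0, 1]")
    n_keep = max(1, int(len(candidates) * proportion))
    return sorted(candidates, key=key, reverse=True)[:n_keep]


def marker_assisted_selection(candidates: List[Candidate],
                              marker_proportion: float,
                              phenotype_proportion: float) -> List[Candidate]:
    """Stage 1: select on marker profile. Stage 2: select on phenotype."""
    stage_one = select_top(candidates, lambda c: c.marker_score, marker_proportion)
    return select_top(stage_one, lambda c: c.phenotype, phenotype_proportion)


population = [
    Candidate("A", 5, 3.1), Candidate("B", 5, 4.0),
    Candidate("C", 4, 4.4), Candidate("D", 4, 2.0),
    Candidate("E", 2, 5.0), Candidate("F", 1, 3.9),
    Candidate("G", 3, 1.0), Candidate("H", 0, 4.9),
]

selected = marker_assisted_selection(population, 0.5, 0.5)
print([c.name for c in selected])
```

In this example line D passes the marker stage but is discarded at the phenotypic stage because of its poor observed performance, which is the kind of correction the phenotypic stage provides when a marker profile is misleading.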

Findings specific to the Germplasm Enhancement Program

The work in this thesis indicates that implementing DH lines, marker-assisted

selection, or both in combination will result in a higher genetic gain than the

currently implemented S1 family phenotypic selection method for complex quantitative

traits. This advantage is especially large in the first few cycles of selection. Marker

selection was not a realistic breeding strategy for the Germplasm Enhancement Program

at this time given the lack of understanding of the genetic architecture of the quantitative

traits of interest, particularly grain yield. Mapping population size, G×E interactions

and epistasis all had important influences on the outcomes of marker-assisted

selection for the Germplasm Enhancement Program.

Small mapping population size was a significant limitation to the detection of

QTL in the simulations. The current Germplasm Enhancement Program empirical trait

mapping investigations that have been based on recombinant inbred line population

sizes of 100-120 lines (Nadella 1998, Susanto 2004) are likely to provide unreliable

QTL detection analysis results for complex traits from the perspective of implementing

marker-assisted selection within the Germplasm Enhancement Program. These studies

have provided a foundation for the creation of a linkage map and the detection of QTL

for a range of agronomic traits for the Germplasm Enhancement Program. However, to

validate the detected QTL to determine that they are true QTL and are able to be used

for marker-assisted selection in a breeding program, further experiments involving

larger population sizes of 500 to 1000 individuals, and different parental crosses than

the single bi-parental mapping population currently used, will need to be considered.

Crosses that segregate for a large number of QTL relevant to the breeding

program are preferable for mapping studies that support marker-assisted selection. A bi-parental

mapping population may not be the best type of population for detecting QTL for use in

the Germplasm Enhancement Program, as the number of polymorphic QTL was usually

found to be low and variable. The information provided by the markers and their

contribution to the response to selection only lasted for two cycles. Therefore, choice of

mapping population is shown to be critical in the design of an effective marker-assisted

selection strategy. Future investigations should involve examining a range of different

mapping population types and designs that can produce and detect more polymorphic

QTL that are relevant to the reference population of the breeding program (e.g. Jansen

et al. 2003). There may also be a need for additional mapping studies at later cycles of

selection in the Germplasm Enhancement Program to find QTL that were not detected

in the first mapping study (Podlich et al. 2004).

Given the current empirical evidence, both epistasis and G×E interactions are

likely to be important influences in the response to selection realised from the Germplasm

Enhancement Program (Peake 2002, Jensen 2004). The observed variability of

the simulated responses to selection for replicates of the Germplasm Enhancement

Program, given the same genetic model but different starting conditions in the presence

of epistasis and G×E interactions, suggests that the long-term outcomes from the

breeding program could be strongly context dependent. This contrasts in many

important ways with the additive models E(NK) = 1(N:0), in that for the additive models

the long-term outcomes were much less variable than for the models including epistasis

and G×E interactions.

Marker-assisted selection was found to provide scope to improve the rate of

progress from selection for quantitative traits in the Germplasm Enhancement Program

if the mapping phase can be conducted within the guidelines below to achieve acceptable

QTL detection power. Any future investments into mapping quantitative traits for

the Germplasm Enhancement Program should focus on:

(i) a recombinant inbred line population size of at least 500 individuals;

(ii) investigating experimental methods that improve the heritability of the traits,

e.g. reducing the incidence and influence of spatial variation within experiments

and other sources of experimental errors; and

(iii) targeting a map density that results in a marker coverage of around one polymorphic

marker every 10 cM across the genome.

The results of this thesis show that DH lines are a more efficient

breeding method in the Germplasm Enhancement Program than S1

families when considered in terms of genetic gain. Other studies have also found the use

of DH lines to be more efficient than the strategies they were compared against (Gallais

1988, 1989, 1990). For the genetic models examined in this thesis, the inclusion of DH

lines into the Germplasm Enhancement Program looks promising.

The expense and time involved in producing DH lines may result in it not being feasible

for the Germplasm Enhancement Program to be completely dependent on DH lines.

However, as the production of DH plants becomes easier, DH lines can become an

efficient option for a recurrent selection strategy (Picard et al. 1988). Doubled haploid

selection offers an advantage over S1 selection and could be implemented into the

Germplasm Enhancement Program with or without the implementation of marker-

assisted selection.

The incorporation of DH lines and/or marker-assisted selection into the Germplasm

Enhancement Program as examined in this thesis may result in the release of

commercial lines earlier than from the conventional S1 family selection program. If a

cultivar, superior to those already being commercially grown, can be developed one

year earlier than expected, the time saving can be of significant value to the target

industry. A recent study in rice (Pandey and Rajatasereekul 1999) put the net present

value of reducing a breeding cycle by: (i) one year with a discount rate of 5% at $19

million, and (ii) five years with a discount rate of 5% at $105.1 million, in an area

where the rice industry has a value of $1.5 billion. In Kansas (USA), from 1979 to 1994

the wheat breeding program cost an average $3.8 million per year. During this period

new semi-dwarf varieties were released and increased wheat production by greater than

1% per year, resulting in an economic benefit to wheat producers of $52.7 million per

year, or, for every $1 invested in varietal improvement, nearly $12 was earned by

Kansas wheat producers (Barkley 1997). In respect to marker-assisted selection, a

CIMMYT study has shown that even though it may be more efficient than phenotypic

selection, marker-assisted selection may not always be cost efficient and the choice

between the two techniques will be a trade-off between time and money (Dreher et al.

2003, Morris et al. 2003).

The incorporation of marker-assisted selection into plant breeding programs has

been relatively slow due to the time and resources involved (Lee 1995). There are many

costs involved when conducting marker-assisted selection. With the economic assessment

of marker-assisted selection being addressed in only a few studies (Dekkers and

Hospital 2002), there is little foundation information on which to base an estimate of

costs. A cost-benefit analysis would need to be conducted on the inclusion of marker-assisted

selection into the Germplasm Enhancement Program, as compared to phenotypic

selection, to determine whether the increase in genetic gain is offset by the

increase in resources and time required. If marker-assisted selection is shown to

increase the response to selection of the Germplasm Enhancement Program under a

wide range of genetic models, then the Germplasm Enhancement Program has the

potential to produce superior parents for the pedigree programs of the Northern Wheat

Improvement Program earlier than expected. Therefore, the simulation investigation

reported in this thesis provides useful information for any decisions on whether to use

marker-assisted selection in future cycles of the Germplasm Enhancement Program.

Opportunities for further work

There is potential for further investigations developed from this work to allow a

more detailed analysis of marker-assisted selection in the Germplasm Enhancement

Program. Given the empirical evidence supporting the importance of epistasis and G×E

interactions for grain yield in the reference population of the Germplasm Enhancement

Program, further work should investigate the genetic and physiological basis of these

interactions for the Germplasm Enhancement Program reference breeding population.

The results for the response to selection for marker-assisted selection in the Germplasm

Enhancement Program indicate that the information contributing towards marker-assisted

selection from the detected QTL was effective for two to three cycles of selection.

Conducting further QTL mapping after the contributions of the detected QTL have been

utilised may allow more opportunities to detect additional QTL. This aspect of long-

term marker-assisted selection was not investigated in this thesis and is identified as a

topic for further investigation. There were also important interactions between the QTL

mapping phase and the selection phase of marker-assisted selection in the Germplasm

Enhancement Program which require further investigation.

Conclusions

The inclusion of DH lines and marker-assisted selection in the Germplasm Enhancement

Program generally provided a larger genetic gain than S1 family and

phenotypic selection for the range of genetic models tested in this thesis. The use of

QU-GENE to simulate these selection strategies in the Germplasm Enhancement

Program allowed an extensive study of the program to be conducted. Both quantitative

genetic theory prediction equations and empirical experimentation were unable to

efficiently or practically manage the scenarios investigated in this thesis; however, they

are vital in providing the solid foundation on which the simulation study was developed.

The results from this thesis form part of a body of research helping to improve

the genetic gain for quantitative traits within the Germplasm Enhancement Program as a

breeding program for the long-term improvement of wheat varieties for the northern

grains region, as well as forming part of a larger strategic research effort to improve the

modelling of genetic systems.

As simulation becomes a more widely accessed and utilised tool, its application

in plant breeding programs could become the primary point of focus in determining the

design of empirical experiments to ensure efficient use of time and resources to gain the

most information from the inputs for an experiment. As with any modelling approach,

simulation is only as good as the information that it works with. As more empirical

experimentation is undertaken to determine the detailed genetic architecture of a trait

including the effects of G×E interaction and epistasis, simulation will increasingly

produce more realistic outcomes for particular scenarios as the genetic model entered

into the simulation study approaches the true genetic composition of a trait.

The results of this study emphasise the power computer simulation technology

has provided to determine the efficiency of six complex selection strategies in the

Germplasm Enhancement Program. Although the genetic models in this thesis were

applied to a specifically modelled wheat breeding program, the results can be applied

beyond this breeding program to help guide the decision-making process of other

plant breeders in determining the use and efficiency of marker-assisted selection in plant

breeding.

BIBLIOGRAPHY

Austin DF and Lee M (1998) Detection of quantitative trait loci for grain yield and yield components in maize across generations in stress and nonstress environments. Crop Science. 38: 1296-1308.

AWB Ltd (2001) Grain Production. ABARE. www.awb.com.au/AWB/user/communityEducation/e43.asp

Baenziger PS, Kudirka DT, Schaeffer GW and Lazar MD (1984) The significance of doubled haploid variation. In: JP Gustafson (ed.) Gene manipulation in plant improvement. Plenum Press: New York. pp. 385-414.

Baker RJ (1968) Extent of intermating in self-pollinated species necessary to counteract the effects of genetic drift. Crop Science. 8: 547-550.

Baker RJ (1984) Quantitative genetic principles in plant breeding. In: JP Gustafson (ed.) Gene manipulation in plant improvement. Plenum Press: New York. pp. 147-176.

Barkley AP (1997) Kansas Wheat Breeding: an economic analysis. Kansas State University Agricultural Experiment Station and Cooperative Extension Service 793.

Barnes WC and McKenzie EA (1993) Dough mixing tolerance in non-1BL/1RS translocation wheats. Euphytica. 66: 187-195.

Basford KE and Cooper M (1998) Genotype × environment interactions and some considerations of their implications for wheat breeding in Australia. Australian Journal of Agricultural Research. 49: 153-174.

Basten CJ, Weir BS and Zeng Z-B (1994) Zmap - a QTL cartographer. In: C Smith et al. (eds). Proceedings of the 5th World Congress on Genetics Applied to Livestock Production: Computing Strategies and Software, Vol. 22. Guelph, Ontario, Canada: Organizing Committee, 5th World Congress on Genetics Applied to Livestock Production. pp. 65-66.

Basten CJ, Weir BS and Zeng Z-B (2001) QTL Cartographer, Version 1.15. Department of Statistics, North Carolina State University.

Bateson W (1909) Mendel's principles of heredity. Cambridge University Press: Cambridge.

Beavis WD (1994) The power and deceit of QTL experiments: lessons from comparative QTL studies. In: DB Wilkinson (ed.) Forty-Ninth Annual Corn and Sorghum Research Conference. Chicago, Illinois: Wilkinson, D.B. pp. 250-266.

Beavis WD (1998) QTL analyses: power, precision, and accuracy. In: AH Paterson (ed.) Molecular Dissection of Complex Traits. CRC Press: Boca Raton. pp. 145-162.

Bliss FA and Gates CE (1968) Directional selection in simulated populations of self-pollinated plants. Australian Journal of Biological Science. 21: 705-719.

Brennan PS and Byth DE (1979) Genotype × environment interactions for wheat yields and selection for widely adapted wheat genotypes. Australian Journal of Agricultural Research. 30: 221-232.

Brennan PS, Byth DE, Drake DW, DeLacy IH and Butler DG (1981) Determination of the location and number of test environments for a wheat cultivar evaluation program. Australian Journal of Agricultural Research. 32: 189-201.

Carbonell EA, Asins MJ, Baselga M, Balansard E and Gerig TM (1993) Power studies in the estimation of genetic parameters and the localization of quantitative trait loci for backcross and doubled haploid populations. Theoretical and Applied Genetics. 86: 411-416.

Carlborg Ö and Haley CS (2004) Epistasis: too often neglected in complex trait studies? Nature Reviews Genetics. 5: 618-625.

Carter TC and Falconer DS (1951) Stocks for detecting linkage in the mouse and the theory of their design. Journal of Genetics. 50: 307-323.

Carver BF and Bruns RF (1993) Emergence of alternative breeding methods for autogamous crops. In: BC Imrie and JB Hacker (eds). Focused plant improvement: towards responsible and sustainable agriculture. Proceedings of the Tenth Australian Plant Breeding Conference. Canberra: Organising Committee, Australian Convention and Travel Service. pp. 43-56.

Carver BF and Rayburn AL (1994) Comparisons of related wheat stocks possessing 1B or 1RS.1BL chromosomes: Agronomic performance. Crop Science. 34: 1505-1510.

Casali VWD and Tigchelaar EC (1975) Computer simulation studies comparing pedigree, bulk, and single seed descent selection in self pollinated populations. Journal of the American Society for Horticultural Science. 100: 364-367.

Casti JL (1997a) Reality rules: I Picturing the world in mathematics - the fundamentals. John Wiley & Sons Inc: New York.

Casti JL (1997b) Would-be-worlds: how simulation is changing the frontiers of science. J. Wiley: New York.

Chalmers KJ, Campbell AW, Kretschmer J, Karakousis A, Henschke PH, Pierens S, Harker N, Pallotta M, Cornish GB, Shariflou MR, Rampling LR, McLauchlan A, Daggard G, Sharp PJ, Holton TA, Sutherland MW, Appels R and Langridge P (2001) Construction of three linkage maps in bread wheat, Triticum aestivum. Australian Journal of Agricultural Research. 52: 1089-1119.

Chapman SC, Cooper M, Butler DG and Henzell RG (2000a) Genotype by environment interactions affecting grain sorghum. I. Characteristics that confound interpretation of hybrid seed. Australian Journal of Agricultural Research. 51: 197-207.

Chapman SC, Cooper M, Butler DG and Henzell RG (2000b) Genotype by environment interactions affecting grain sorghum. II. Frequencies of different seasonal patterns of drought stress are related to location effects on hybrid yields. Australian Journal of Agricultural Research. 51: 209-221.

Chapman SC, Cooper M, Butler DG and Henzell RG (2000c) Genotype by environment interactions affecting grain sorghum. III. Temporal sequences and spatial patterns in the target population of environments. Australian Journal of Agricultural Research. 51: 223-234.

Chapman SC, Cooper M, Podlich DW and Hammer GL (2003) Evaluating plant breeding strategies by simulating gene action and dryland environment effects. Agronomy Journal. 95: 99-113.

Charlesworth D, Morgan MT and Charlesworth B (1992) The effect of linkage and population size on inbreeding depression due to mutational load. Genetical Research. 59: 49-61.

Charlesworth D, Morgan MT and Charlesworth B (1993) Mutation accumulation in finite outbreeding and inbreeding populations. Genetical Research. 61: 39-56.

Charmet G (2000) Power and accuracy of QTL detection: simulation studies of one-QTL models. Agronomie. 20: 309-323.

Cheverud JM and Routman E (1993) Quantitative trait loci: individual gene effects on quantitative characters. Journal of Evolutionary Biology. 6: 463-480.

Cheverud JM and Routman EJ (1995) Epistasis and its contribution to genetic variance components. Genetics. 139: 1455-1461.

Cheverud JM (2001) The genetic architecture of pleiotropic relations and differential epistasis. In: GP Wagner (ed.) The Character Concept in Evolutionary Biology. Academic Press: San Diego. pp. 411-433.

Churchill GA and Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics. 138: 963-971.

Cochran WG (1951) Improvement by means of selection. In: J Neyman (ed.) Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California: University of California Press. pp. 449-470.

Cockerham CC and Zeng Z-B (1996) Design III with marker loci. Genetics. 143: 1437-1456.

Comstock RE and Moll RH (1963) Genotype-environment interactions. In: WD Hanson and HF Robinson (eds). Statistical genetics and plant breeding, National Academy of Sciences-National Research Council, Publication 982. National Academy of Sciences-National Research Council: Washington. pp. 164-196.

Comstock RE (1977) Quantitative genetics and the design of breeding programs. In: E Pollack et al. (eds). Proceedings of the International Conference on Quantitative Genetics. Iowa: Iowa State University Press. pp. 705-718.

Comstock RE (1996) Quantitative genetics with special reference to plant and animal breeding. Iowa State University Press: Ames.

Cooper M, Byth DE and DeLacy IH (1993a) A procedure to assess the relative merit of classification strategies for grouping environments to assist selection in plant breeding regional evaluation trials. Field Crops Research. 35: 63-74.

Cooper M, Byth DE, DeLacy IH and Woodruff DR (1993b) Predicting grain yield in Australian environments using data from CIMMYT international wheat performance trial. 1. Potential for exploiting correlated response to selection. Field Crops Research. 32: 305-322.

Cooper M, Byth DE and Woodruff DR (1994a) An investigation of the grain yield adaptation of advanced CIMMYT wheat lines to water stress environments in Queensland. 1. Crop physiology analysis. Australian Journal of Agricultural Research. 45: 965-984.

Cooper M, Byth DE and Woodruff DR (1994b) An investigation of the grain yield adaptation of advanced CIMMYT wheat lines to water stress environments in Queensland. 2. Classification analysis. Australian Journal of Agricultural Research. 45: 985-1002.

Cooper M and DeLacy IH (1994) Relationships among analytical methods used to study genotypic variation and genotype-by-environment interaction in plant breeding multi-environment experiments. Theoretical and Applied Genetics. 88: 561-572.

Cooper M, Woodruff DR, Eisemann RL, Brennan PS and DeLacy IH (1995) A selection strategy to accommodate genotype-by-environment interaction for grain yield of wheat: managed-environments for selection among genotypes. Theoretical and Applied Genetics. 90: 492-502.

Cooper M, Brennan PS and Sheppard JA (1996a) A strategy for yield improvement of wheat which accommodates large genotype by environment interaction. In: M Cooper and GL Hammer (eds). Plant Adaptation and Crop Improvement. CAB International in association with IRRI and ICRISAT: United Kingdom. pp. 487-511.

Cooper M, DeLacy IH and Basford KE (1996b) Relationship among analytical methods used to analyse genotypic adaptation in multi-environment trials. In: M Cooper and GL Hammer (eds). Plant Adaptation and Crop Improvement. CAB International in association with IRRI and ICRISAT: United Kingdom. pp. 193-224.

Cooper M and Hammer GL (1996) Synthesis of strategies for crop improvement. In: M Cooper and GL Hammer (eds). Plant Adaptation and Crop Improvement. CAB International in association with IRRI and ICRISAT: United Kingdom. pp. 591-623.

Cooper M, Stucker RE, DeLacy IH and Harch BD (1997) Wheat breeding nurseries, target environments, and indirect selection for grain yield. Crop Science. 37: 1168-1176.

Cooper M (1998) Pers. Comm.

Cooper M, Jensen NM, Carroll BJ, Godwin ID and Podlich DW (1999a) QTL mapping activities and marker assisted selection for yield in the Germplasm Enhancement Program of the Australian Northern Wheat Improvement Program. In: JM Ribaut and D Poland (eds). Molecular Approaches for the Genetic Improvement of Cereals for Stable Production in Water-Limited Environments, A Strategic Planning Workshop held at CIMMYT, El Batan, Mexico, June 21-25. Mexico D.F.: CIMMYT. pp. 120-127.

Cooper M and Podlich DW (1999) Breeding field crops for farming systems: A case for modelling breeding programs. In: 11th Australian Plant Breeding Conference Proceedings, Vol. 1. Adelaide.

Cooper M, Podlich DW and Fukai S (1999b) Combining information from multi-environment trials and molecular markers to select adaptive traits for yield improvement of rice in water-limited environments. In: O Ito et al. (eds). Genetic improvement of rice for water-limited environments. Proceedings of the Workshop on Genetic Improvement of Rice for Water-Limited Environments. Los Banos: International Rice Research Institute. pp. 13-33.

Cooper M, Podlich DW, Jensen NM, Chapman SC and Hammer GL (1999c) Modelling plant breeding programs. Trends in Agronomy. 2: 33-64.

Cooper M, Rajatasereekul S, Somrith B, Sriwisut S, Immark S, Boonwite C, Suwanwongse A, Ruangsook S, Hanviriyapant P, Romyen P, Porn-uraisanit P, Skulkhu E, Fukai S, Basnayake J and Podlich DW (1999d) Rainfed lowland rice breeding strategies for Northeast Thailand. II. Comparison of intrastation and interstation selection. Field Crops Research. 64: 153-176.

Cooper M, Chapman SC, Podlich DW and Hammer GL (2002a) The GP problem: quantifying gene-to-phenotype relationships. In Silico Biology. 2: 151-164.

Cooper M and Podlich DW (2002) The E(NK) model: extending the NK model to incorporate gene-by-environment interactions and epistasis for diploid genomes. Complexity. 7: 31-47.

Cooper M, Podlich DW, Micallef KP, Smith OS, Jensen NM, Chapman SC and Kruger NL (2002b) Complexity, quantitative traits and plant breeding: a role for simulation modeling in the genetic improvement of crops. In: MS Kang (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International: Wallingford, UK. pp. 143-166.

Cooper M, Podlich DW and Smith OS (2005) Gene-to-phenotype models and complex trait genetics. Australian Journal of Agricultural Research. 56: 895-918.

Cress CE (1967) Reciprocal recurrent selection and modifications in simulated populations. Crop Science. 7: 561-567.

Crow JF and Kimura M (1979) Efficiency of truncation selection. Proceedings of the National Academy of Sciences of the United States of America. 76: 396-399.

Damerval C, Maurice A, Josse JM and de Vienne D (1994) Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics. 137: 289-301.

Darvasi A, Weinreb A, Minke V, Weller JI and Soller M (1993) Detecting marker-QTL linkage and estimating QTL gene effect and map location using a saturated genetic map. Genetics. 134: 943-951.

De Koyer DL, Phillips RL and Stuthman DD (1999) Changes in genetic diversity during seven cycles of selection for grain yield in oat, Avena sativa L. Plant Breeding. 118: 37-43.

De Koyer DL, Phillips RL and Stuthman DD (2001) Allelic shifts and quantitative trait loci in a recurrent selection population of oat. Crop Science. 41: 1228-1234.

Dekkers JCM and Hospital F (2002) The use of molecular genetics in the improvement of agricultural populations. Nature Reviews Genetics. 3: 22-32.

DeLacy IH, Eisemann RL and Cooper M (1990) The importance of genotype-by-environment interaction in regional variety trials. In: MS Kang (ed.) Genotype-by-Environment Interaction and Plant Breeding. Louisiana State University: Louisiana. pp. 108-117.

Dhaliwal AS, Mares DJ and Marshall DR (1987) Effect of 1B/1R chromosome on milling and quality characteristics of bread wheats. Cereal Chemistry. 64: 72-76.

Doebley J, Stec A and Gustus C (1995) Teosinte branched1 and the origin of maize: evidence for epistasis and the evolution of dominance. Genetics. 141: 333-346.

Doerge RW and Churchill GA (1996) Permutation tests for multiple loci affecting a quantitative character. Genetics. 142: 285-294.

Doerge RW, Zeng ZB and Weir BS (1997) Statistical issues in the search for genes affecting quantitative traits in experimental populations. Statistical Science. 12: 195-219.

Doerge RW (2002) Mapping and analysis of quantitative trait loci in experimental populations. Nature Reviews Genetics. 3: 43-52.

Douglas NJ (1985) Wheat growing in Queensland. Queensland Government: Brisbane. pp. 49.

Dreher K, Khairallah MM, Ribaut J-M and Morris M (2003) Money matters (I): costs of field and laboratory procedures associated with conventional and marker-assisted maize breeding at CIMMYT. Molecular Breeding. 11: 221-234.

Dudley JW (1993) Molecular markers in plant improvement: manipulation of genes affecting quantitative traits. Crop Science. 33: 660-668.

Duvick DN, Smith JSC and Cooper M (2004) Long-term selection in a commercial hybrid maize breeding program. Plant Breeding Reviews. 24: 109-151.

Edwards MD and Page NJ (1994) Evaluation of marker-assisted selection through computer simulation. Theoretical and Applied Genetics. 88: 376-382.

Empig LT, Gardner CO and Compton WA (1971) Theoretical gains for different population improvement procedures, Vol. MP26. University of Nebraska: Nebraska.

Eshed Y and Zamir D (1996) Less-than-additive epistatic interactions of quantitative trait loci in tomato. Genetics. 143: 1807-1817.

Fabrizius MA, Cooper M, Podlich DW, Brennan PS, Ellison FW and DeLacy IH (1996) Design and simulation of a recurrent selection program to improve yield and protein in spring wheat. In: RA Richards et al. (eds). Proceedings of the Eighth Assembly Wheat Breeding Society of Australia. Canberra: Wheat Breeding Society of Australia. pp. P8-P11.

Fabrizius MA, Cooper M and Basford KE (1997) Genetic analysis of variation for grain yield and protein concentration in two wheat crosses. Australian Journal of Agricultural Research. 48: 605-614.

Falconer DS and Mackay TFC (1996) Introduction to quantitative genetics. Longman Group Ltd: Essex.

Fehr WR (1987) Principles of cultivar development, v.1. Theory and Technique. Macmillan Publishing Company: USA.

Felsenstein J (1979) A mathematically tractable family of genetic mapping functions with different amount of interference. Genetics. 91: 769-775.

Fisher RA (1918) The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh. 52: 399-433.

Fisher RA (1926) The arrangement of field experiments. Journal of the Ministry of Agriculture. 33: 503-513.

Fox PN, Lopez C, Skovmand B, Sanchez H, Herrera R, White JW, Duveiller E and van Ginkel M (1996) International Wheat Information System (IWIS), Version 1. CIMMYT: Mexico, D.F. CD-ROM.

Fraser AS (1957a) Simulation of genetic systems by automatic digital computers I. Introduction. Australian Journal of Biological Science. 10: 484-491.

Fraser AS (1957b) Simulation of genetic systems by automatic digital computers II. Effects of linkage on rates of advance under selection. Australian Journal of Biological Science. 10: 491-499.

Fraser AS and Burnell D (1970) Computer Models in Genetics. McGraw-Hill Book Co.: New York.

Frisch M and Melchinger AE (2001a) Marker-assisted backcrossing for introgression of a recessive gene. Crop Science. 41: 1485-1494.

Frisch M and Melchinger AE (2001b) Marker-assisted backcrossing for simultaneous introgression of two genes. Crop Science. 41: 1716-1725.

Gadau J, Page RE and Werren JH (2002) The genetic basis of the interspecific differences in wing size in Nasonia (Hymenoptera; Pteromalidae): major quantitative trait loci and epistasis. Genetics. 161: 673-684.

Gallais A (1988) A method of line development using doubled haploids: the single doubled haploid descent recurrent selection. Theoretical and Applied Genetics. 75: 330-332.

Gallais A (1989) Optimization of recurrent selection on the phenotypic value of doubled haploid lines. Theoretical and Applied Genetics. 77: 501-504.

Gallais A (1990) Quantitative genetics of doubled haploid populations and application to the theory of line development. Genetics. 124: 199-206.

Gardner CO (1963) Estimates of genetic parameters in cross-fertilizing plants and their implications in plant breeding. In: WD Hanson and HF Robinson (eds). Statistical genetics and plant breeding, National Academy of Sciences-National Research Council, Publication 982, Vol. 982. National Academy of Sciences-National Research Council: Washington.

Gilmour AR, Cullis BR and Verbyla AP (1999) ASREML program user manual. NSW Agriculture: Orange.

Gimelfarb A and Lande R (1994a) Simulation of marker assisted selection in hybrid populations. Genetical Research. 63: 39-47.

Gimelfarb A and Lande R (1994b) Simulation of marker assisted selection for non-additive traits. Genetical Research. 64: 127-136.

Gimelfarb A and Lande R (1995) Marker-assisted selection and marker-QTL associations in hybrid populations. Theoretical and Applied Genetics. 91: 522-528.

Goldringer I, Brabant P and Gallais A (1997) Estimation of additive and epistatic genetic variances for agronomic traits in a population of doubled-haploid lines of wheat. Heredity. 79: 60-71.

Griffing B (1975) Efficiency changes due to use of doubled-haploids in recurrent selection methods. Theoretical and Applied Genetics. 46: 367-386.

Haldane JBS (1931) The combination of linkage values, and the calculation of distances between the loci of linked factors. Journal of Genetics. 8: 299-309.

Haldane JBS (1947) The interaction of nature and nurture. Annals of Eugenics. 13: 197-205.

Hallauer AR (1981) Selection and breeding methods. In: KJ Frey (ed.) Plant Breeding II. Iowa State University Press: Iowa. pp. 3-55.

Hallauer AR and Miranda FJB (1988) Quantitative genetics in maize breeding. The Iowa State University Press: Iowa.

Hammer GL, Chapman SC, van Oosterom E and Podlich DW (2004) Trait physiology and crop modelling to link phenotypic complexity to underlying genetic systems. In: T Fischer et al. (eds). New directions for a diverse planet: Proceedings for the 4th International Crop Science Congress. Brisbane, Australia.

Hayes PM, Liu B-H, Knapp SJ, Chen F, Jones B, Blake T, Franckowiak J, Rasmusson D, Sorrells M, Ullrich SE, Wesenberg DM and Kleinhofs A (1993) Quantitative trait loci effects and environmental interaction in a sample of North American barley germplasm. Theoretical and Applied Genetics. 87: 392-401.

Hayman BI (1958) The separation of epistatic from additive and dominance variation in generation means. Heredity. 12: 371-390.

Holland JB (2001) Epistasis and plant breeding. Plant Breeding Reviews. 21: 27-92.

Holland JB (2004) Implementation of molecular markers for quantitative traits in breeding programs - challenges and opportunities. In: T Fischer et al. (eds). New directions for a diverse planet: Proceedings for the 4th International Crop Science Congress. Brisbane, Australia.

Hospital F and Charcosset A (1997) Marker-assisted introgression of quantitative trait loci. Genetics. 147: 1469-1485.

Hospital F, Moreau L, Lacoudre F, Charcosset A and Gallais A (1997) More on the efficiency of marker-assisted selection. Theoretical and Applied Genetics. 95: 1181-1189.

Howes NK, Woods SM and Townley-Smith TF (1998) Simulations and practical problems of applying multiple marker assisted selection and doubled haploids to wheat breeding programs. In: HJ Braun et al. (eds). Wheat: Prospects for Global Improvement. Developments in Plant Breeding Volume 6. Kluwer Academic Publishers: Netherlands. pp. 291-296.

Jansen RC (1993) Interval mapping of multiple quantitative trait loci. Genetics. 135: 205-211.

Jansen RC (1994) Controlling the type I and type II errors in mapping quantitative trait loci. Genetics. 138: 871-881.

Jansen RC and Stam P (1994) High resolution of quantitative traits into multiple loci via interval mapping. Genetics. 136: 1447-1455.

Jansen RC, Jannink J-L and Beavis WD (2003) Mapping quantitative trait loci in plant breeding populations: use of parental haplotype sharing. Crop Science. 43: 829-834.

Jensen NM and Kammholz S (1998) A wheat × maize cross protocol for the development of doubled haploid wheat populations. The University of Queensland, School of Land and Food, Plant Improvement Group Research Report No. 3.

Jensen NM (2004) Investigating quantitative genetic issues for a pedigree plant breeding program using computer simulation. PhD. The University of Queensland, Brisbane.

Kao C-H, Zeng Z-B and Teasdale RD (1999) Multiple interval mapping for quantitative trait loci. Genetics. 152: 1203-1216.

Karlin S and Liberman U (1978) Classification and comparison of multilocus recombination distributions. Proceedings of the National Academy of Sciences (USA). 75: 6332-6336.

Kauffman SA (1993) The origins of order: self-organization and selection in evolution. Oxford University Press, Inc.: Oxford.

Kearsey MJ and Jinks JL (1968) A general method of detecting additive, dominance and epistatic variation for metrical traits. I. Theory. Heredity. 23: 403-409.

Kearsey MJ and Pooni HS (1996) The genetical analysis of quantitative traits. Chapman and Hall: London. pp. 381.

Keen RE and Spain JD (1992) Computer simulation in biology: a BASIC introduction. Wiley-Liss, Inc: New York.

Kempthorne O (1969) An introduction to genetic statistics. The Iowa State University Press: Ames.

Kempthorne O (1988) An overview of the field of quantitative genetics. In: BS Weir et al. (eds). Proceedings of the Second International Conference on Quantitative Genetics. Sunderland, MA: Sinauer Associates Inc. pp. 47-56.

Knapp SJ (1994) Mapping quantitative trait loci. In: RL Phillips and IK Vasil (eds). DNA based markers in plants, Vol. 1. Kluwer Academic Publishers: Netherlands. pp. 58-96.

Knapp SJ (1998) Marker-assisted selection as a strategy for increasing the probability of selecting superior genotypes. Crop Science. 38: 1164-1174.

Knott SA and Haley CS (2000) Multitrait least squares for quantitative trait loci detection. Genetics. 156: 899-911.

Koester RP, Sisco PH and Stuber CW (1993) Identification of quantitative trait loci controlling days to flowering and plant height in two near isogenic lines of maize. Crop Science. 33: 1209-1216.

Korzun V (2003) Molecular markers and their applications in cereals breeding. In: P Donini et al. (eds). Marker assisted selection: a fast track to increase genetic gain in plant and animal breeding? The University of Turin, Turin, Italy. pp. 18-22.

Kosambi DD (1944) The estimation of the map distance from recombination values. Annals of Eugenics. 12: 172-175.

Kruger NL (1999) Simulation analysis of doubled haploids in a wheat breeding program. The University of Queensland, School of Land and Food Sciences, Plant Improvement Group Research Report No. 5.

Kruger NL, Podlich DW and Cooper M (1999) Comparison of S1 and doubled haploid recurrent selection strategies by computer simulation with applications for the Germplasm Enhancement Program of the Northern Wheat Improvement Program. In: P Williamson et al. (eds). Proceedings of the Ninth Assembly Wheat Breeding Society of Australia - Vision 2020. Toowoomba: The University of Southern Queensland. pp. 216-219.

Kruger NL, Cooper M, Podlich DW, Jensen NM and Basford KE (2001) The effect of population size on QTL detection in recombinant inbred lines. In: G Hollamby et al. (eds). Wheat Breeding Society of Australia Inc. 10th Assembly Proceedings. Mildura, Australia. pp. 194-196.

Kruger NL, Cooper M and Podlich DW (2002) Comparison of phenotypic, marker and marker-assisted selection strategies in an S1 family recurrent selection strategy. In: JA McComb (ed.) 'Plant Breeding for the 11th Millennium'. Proceedings of the 12th Australasian Plant Breeding Conference, 15-20 September 2002. Perth, W. Australia: Australasian Plant Breeding Association Inc. pp. 696-701.

Lande R and Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics. 124: 743-756.

Lande R (1992) Marker-assisted selection in relation to traditional methods of plant breeding. In: HT Stalker and JP Murphy (eds). Plant breeding in the 1990s, Proceedings of the symposium on plant breeding in the 1990s. C.A.B International: Raleigh. pp. 437-451.

Lander ES, Green P, Abrahamson J, Barlow A, Daley M, Lincoln SE and Newburg L (1987) MAPMAKER: An interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics. 1: 174-181.

Lander ES and Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 121: 185-199.

Lark KG, Chase K, Adler FR, Mansur LM and Orf JH (1995) Interactions between quantitative trait loci in soybean in which trait variation at one locus is conditional upon a specific allele at another. Proceedings of the National Academy of Sciences (USA). 92: 4656-4660.

Lascoux M (1997) Unpredictability of correlated response to selection: linkage and initial frequency also matter. Evolution. 51: 1394-1400.

Latter BDH (1998) Mutant alleles of small effects are primarily responsible for the loss of fitness with slow inbreeding in Drosophila melanogaster. Genetics. 148: 1143-1158.

Laurie DA and Bennett MD (1986) Wheat × maize hybridisation. Canadian Journal of Genetics and Cytology. 28: 313-316.

Laurie DA and Bennett MD (1988) The production of wheat plants from wheat × maize crosses. Theoretical and Applied Genetics. 76: 393-397.

Lee M (1995) DNA markers and plant breeding programs. Advances in Agronomy. 55: 265-344.

Liu B-H (1998) Statistical genomics: linkage, mapping and QTL analysis. CRC Press: Boca Raton.

Liu S-C, Kowalski SP, Lan T-H, Feldmann KA and Paterson AH (1996) Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics. 142: 247-258.

Long AD, Mullaney SL, Reid LA, Fry JD, Langley CH and Mackay TFC (1995) High resolution mapping of genetic factors affecting abdominal bristle number in Drosophila melanogaster. Genetics. 139: 1273-1291.

Ludwig W (1934) Über numerische Beziehungen der Crossover-Werte untereinander. Zeitschrift für induktive Abstammungs- und Vererbungslehre. 67: 58-95.

Lukens LN and Doebley J (1999) Epistatic and environmental interactions for quantitative trait loci involved in maize evolution. Genetical Research. 74: 291-302.

Lynch M and Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Inc: Massachusetts.

Mackay TFC (2001) The genetic architecture of quantitative traits. Annual Review of Genetics. 35: 303-339.

Mackay TFC (2004) The genetic architecture of quantitative traits: lessons from Drosophila. Current Opinion in Genetics and Development. 14: 1-5.

Manly KF and Olson JM (1999) Overview of QTL mapping software and introduction to Map Manager QT. Mammalian Genome. 10: 327-334.

Marino CL, Nelson JC, Lu YH, Sorrells ME, Leroy P, Tuleen NA, Lopes CR and Hart GE (1996) Molecular genetic maps of the group 6 chromosomes of hexaploid wheat (Triticum aestivum L em Thell). Genome. 39: 359-366.

Martin Jr FG and Cockerham CC (1960) High speed selection studies. In: O Kempthorne (ed.) Biometrical genetics. Pergamon Press: London. pp. 35-45.

Mathews KL, Chapman SC, Butler DG, Cooper M, DeLacy IH, Sheppard JA, Kelly A and Sahama T (2002) Inter-annual changes in genotypic and genotype by environment variance components for different stages of the Northern Wheat Improvement Program. In: JA McComb (ed.) 'Plant Breeding for the 11th Millennium'. Proceedings of the 12th Australasian Plant Breeding Conference, 15-20 September 2002. Perth, W. Australia: Australasian Plant Breeding Association Inc. pp. 650-654.

Mauricio R (2001) Mapping quantitative trait loci in plants: Uses and caveats for evolutionary biology. Nature Reviews Genetics. 2: 370-381.

McMullen MD, Snook M, Lee EA, Byrne PF, Kross H, Musket TA, Houchins K and Coe EHJ (2001) The biological basis of epistasis between quantitative trait loci for flavone and 3-deoxyanthocyanin synthesis in maize (Zea mays L.). Genome. 44: 667-676.

McPeek MS and Speed TP (1995) Modeling interference in genetic recombination. Genetics. 139: 1031-1044.

Micallef KP, Cooper M and Podlich DW (2001) Using clusters of computers for large QU-GENE simulation experiments. Bioinformatics. 17: 194-195.

Montana Wheat & Barley Committee (2001) Grains Market Report 30/11/2000. International Grains Council. http://wbc.agr.state.mt.us/prodfacts/wf/wptwp.html website.

Montana Wheat & Barley Committee (2002) Australian Winter Wheat map. USDA. http://wbc.agr.state.mt.us/prodfacts/maps/wwau.html website.

Moore GE (1965) Cramming more components onto integrated circuits. Electronics. 38.

Moreau L, Lemarie S, Charcosset A and Gallais A (2000) Economic efficiency of one cycle of marker-assisted selection. Crop Science. 40: 329-337.

Morris M, Dreher K, Ribaut J-M and Khairallah MM (2003) Money matters (II): costs of maize inbred line conversion schemes at CIMMYT using conventional and marker-assisted selection. Molecular Breeding. 11: 235-247.

Mosteller F (1948) A k-sample slippage test for an extreme population. The Annals of Mathematical Statistics. 19: 58-65.

Mulitze DK and Baker RJ (1985) Evaluation of biometrical methods for estimating the number of genes 2. Effect of type I and type II statistical errors. Theoretical and Applied Genetics. 69: 559-566.

Nadella KD (1998) An investigation of the potential of using marker assisted selection for the genetic improvement of wheat in the northern region of Australia. PhD. The University of Queensland, Brisbane.

Nadella KD, Peake AS, Bariana HS, Cooper M, Godwin ID and Carroll BJ (2002) A rapid PCR protocol for marker assisted detection of heterozygotes in segregating generations involving 1BL/1RS translocation and normal wheat lines. Australian Journal of Agricultural Research. 53: 931-938.

Nelson JC, Sorrells ME, Vandeynze AE, Lu YH, Atkinson M, Bernard M, Leroy P, Faris JD and Anderson JA (1995a) Molecular mapping of wheat - major genes and rearrangements in homoeologous group-4, group-5, and group-7. Genetics. 141: 721-731.

Nelson JC, Vandeynze AE, Autrique E, Sorrells ME, Lu YH, Merlino M, Atkinson M and Leroy P (1995b) Molecular mapping of wheat - homoeologous group-2. Genome. 38: 516-524.

Nelson JC, Vandeynze AE, Autrique E, Sorrells ME, Lu YH, Negre S, Bernard M and Leroy P (1995c) Molecular mapping of wheat - homoeologous group-3. Genome. 38: 525-533.

Ober C and Cox NJ (1998) Mapping genes for complex traits in founder populations. Clinical and Experimental Allergy. 28: 101-105.

Ohno Y, Tanase H, Nabika T, Otsuka K, Sasaki T, Suzawa T, Morii T, Yamori Y and Saruta T (2000) Selective genotyping with epistasis can be utilised for a major quantitative trait locus mapping in hypertension in rats. Genetics. 155: 785-792.

Openshaw SJ and Frascaroli E (1997) QTL detection and marker-assisted selection for complex traits in maize. In: Proceedings of the 52nd Annual Corn and Sorghum Research Conference. Washington DC: ASTA (American Seed Trade Association). pp. 44-53.

Pandey S and Rajatasereekul S (1999) Economics of plant breeding: the value of shorter breeding cycles for rice in Northeast Thailand. Field Crops Research. 64: 187-197.

Paterson AH, Damon S, Hewitt JD, Zamir D, Rabinowitch HD, Lincoln SE, Lander ES and Tanksley SD (1991) Mendelian factors underlying quantitative traits in tomato: Comparison across species, generations, and environments. Genetics. 127: 181-197.

Paterson AH (1998) High resolution mapping of QTLs. In: AH Paterson (ed.) Molecular Dissection of Complex Traits. CRC Press: Boca Raton. pp. 163-173.

Peake AS (2002) Inheritance of grain yield, and effect of the 1BL/1RS translocation, in three bi-parental wheat (Triticum aestivum) populations in production environments of north-eastern Australia. Master of Agricultural Science. The University of Queensland, St Lucia.

Peccoud J, Vander Velden K, Podlich DW, Winkler CR, Arthur L and Cooper M (2004) The selective values of alleles in a molecular network model are context dependent. Genetics. 166: 1715-1725.

Picard E, Parisot C, Blanchard P, Brabant P, Causse M, Doussinault G, Trottet M and Rousset M (1988) Comparison of the doubled haploid method with other breeding procedures in wheat (Triticum aestivum) when applied to populations. In: TE Miller and RMD Koebner (eds). Proceedings of the Seventh International Wheat Genetics Symposium held at Cambridge, England, 13-19 July 1988. Cambridge: Institute of Plant Science Research Cambridge Laboratory.

Podlich DW and Cooper M (1997) QU-GENE: a platform for quantitative analysis of genetic models. Centre for Statistics Research Report 83. The University of Queensland.

Podlich DW and Cooper M (1998) QU-GENE: a simulation platform for quantitative analysis of genetic models. Bioinformatics. 14: 632-653.

Podlich DW (1999) Using simulation to model plant breeding programs as search strategies on a response surface. PhD. The University of Queensland, Brisbane.

Podlich DW and Cooper M (1999) Modelling plant breeding programs as search strategies on a complex response surface. Simulated Evolution and Learning, Vol. 1585. Springer-Verlag Berlin: Berlin. pp. 171-178.

Podlich DW, Cooper M and Basford KE (1999) Computer simulation of a selection strategy to accommodate genotype-environment interactions in a wheat recurrent selection programme. Plant Breeding. 118: 17-28.

Podlich DW, Winkler CR and Cooper M (2004) Mapping as you go: an effective approach for marker-assisted selection of complex traits. Crop Science. 44: 1560-1571.

Powell W, Thomas DM, Swanston JS and Waugh R (1992) Association between rDNA alleles and quantitative traits in doubled haploid populations of barley. Genetics. 130: 187-194.

Qureshi AW (1968) The role of finite population size and linkage in response to continued truncation selection. II. Dominance and overdominance. Theoretical and Applied Genetics. 38: 264-270.

Qureshi AW and Kempthorne O (1968) On the fixation of genes of large effects due to continued truncation selection in small populations of polygenic systems with linkage. Theoretical and Applied Genetics. 38: 249-255.

Qureshi AW, Kempthorne O and Hazel LN (1968) The role of finite population size and linkage in response to continued truncation selection. I. Additive gene action. Theoretical and Applied Genetics. 38: 256-263.

Rafalski JA and Tingey SV (1993) Genetic diagnostics in plant breeding: RAPDs, microsatellites and machines. Trends in Genetics. 9: 275-280.

Rahman MA, Siddquie NA, Robiul Alam M, Khan ASMMR and Alam MS (2003) Genetic analysis of some yield contributing and quality characters in spring wheat (Triticum aestivum). Asian Journal of Plant Sciences. 2: 277-282.

Rao DC, Morton NE, Lindsten J, Hulten M and Yee S (1977) A mapping function for man. Human Heredity. 27: 99-104.

Riley R and Chapman V (1958) Genetic control of the cytologically diploid behaviour of hexaploid wheat. Nature. 182: 713-715.

Robertson A (1959) The sampling variance of the genetic correlation coefficient. Biometrics: 469-485.

Ronningen K (1976) A method for the estimation of appropriate selection intensity from skewed distribution. Acta Agriculturae Scandinavica. 26: 82-86.

Sax K (1923) The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics. 8: 552-560.


Scheinberg E (1968) Methodology of computer genetics research. Canadian Journal of Genetics and Cytology. 10: 754-761.

Schlegel R and Meinel A (1994) A quantitative trait locus (QTL) on chromosome arm 1RS of Rye and its effect on yield performance of hexaploid wheat. Cereal Research Communications. 22: 7-13.

Schrage M (1999) Serious Play. Harvard Business School Press: Boston.

Simmonds DH (1989) Wheat and wheat quality in Australia. CSIRO: Australia. pp. 299.

Singh RP, Huerta-Espino J, Rajaram S and Crossa J (1998) Agronomic effects from chromosome translocations 7DL.7Ag and 1BL.1RS in spring wheat. Crop Science. 36: 27-33.

Snape JW, Law CN and Worland AJ (1975) A method for the detection of epistasis in chromosome substitution lines of hexaploid wheat. Heredity. 34: 297-303.

Snape JW and Riggs TJ (1975) Genetical consequences of single seed descent in the breeding of self-pollinating crops. Heredity. 35: 211-219.

Soller M, Brody T and Genizi A (1976) On the power of experimental design for the detection of linkage between marker loci and quantitative loci in crosses between inbred lines. Theoretical and Applied Genetics. 47: 35-39.

Speed TP, McPeek MS and Evans SN (1992) Robustness of the no-interference model for ordering genetic markers. Proceedings of the National Academy of Sciences of the United States of America. 89: 3103-3106.

Spelman RJ and Bovenhuis H (1998) Moving from QTL experimental results to the utilization of QTL in breeding programmes. Animal Genetics. 29: 77-84.

Stam P (1994) Marker-assisted breeding. In: JW Van Ooijen (ed.) Biometrics in plant breeding: applications of molecular markers. Wageningen; The Netherlands: EUCARPIA. pp. 32-44.

Strahwald JF and Geiger HH (1988) Theoretical studies on the usefulness of doubled haploids for improving the efficiency of recurrent selection in spring barley. In: Proceedings of the Seventh Meeting of the EUCARPIA Section, Biometrics in plant breeding. Norway: The Norwegian State Agricultural Research Stations, Norway.

Stuber CW, Lincoln SE, Wolff DW, Helentjaris T and Lander ES (1992) Identification of genetic-factors contributing to heterosis in a hybrid from two elite maize inbred lines using molecular markers. Genetics. 132: 823-839.

Sturt E (1976) A mapping function for human chromosomes. Annals of Human Genetics. 40: 147-163.


Susanto D, Cooper M, Carroll BJ and Godwin ID (2002) Genetic diversity among the 13 wheat lines used to create the base populations for yield improvement by recurrent selection in the Germplasm Enhancement Program. In: JA McComb (ed.) 'Plant Breeding for the 11th Millennium'. Proceedings of the 12th Australasian Plant Breeding Conference, 15-20 September 2002. Perth, W. Australia: Australasian Plant Breeding Association Inc. pp. 870-874.

Susanto D (2004) DNA markers for yellow spot resistance and agronomic traits in wheat (Triticum aestivum L.). PhD. The University of Queensland, Brisbane.

Sutton T, Whitford R, Baumann U, Dong C, Able JA and Langridge P (2003) The Ph2 pairing homoeologous locus of wheat (Triticum aestivum): identification of candidate meiotic genes using a comparative genetics approach. The Plant Journal. 36: 443-456.

Tanksley SD (1993) Mapping polygenes. Annual Review of Genetics. 27: 205-233.

Utz HF and Melchinger AE (1996) PLABQTL: A program for composite interval mapping of QTL. Journal of Quantitative Trait Loci. 2: Article 1.

Utz HF, Melchinger AE and Schön CC (2000) Bias and sampling error of the estimated proportion of genotypic variance explained by quantitative trait loci determined from experimental data in maize using cross validation and validation with independent samples. Genetics. 154: 1839-1849.

Van Berloo R and Stam P (1999) Comparison between marker-assisted selection and phenotypical selection in a set of Arabidopsis thaliana recombinant inbred lines. Theoretical and Applied Genetics. 98: 113-118.

van Eeuwijk FA, Crossa J, Vargas M and Ribaut J-M (2002) Analysing QTL-environment interaction by factorial regression, with an application to the CIMMYT drought and low-nitrogen stress programme in maize. In: MS Kang (ed.) Quantitative Genetics, Genomics and Plant Breeding. CAB International. pp. 245-256.

Van Ooijen JW and Maliepaard C (1996) MapQTL™ version 3.0: Software for the calculation of QTL positions on genetic maps. CPRO-DLO: Wageningen.

Van Ooijen JW and Voorrips RE (2001) JoinMap® 3.0, Software for the calculation of genetic linkage maps. Plant Research International: Wageningen, Netherlands.

Vandeynze AE, Dubcovsky J, Gill KS, Nelson JC, Sorrells ME, Dvorak J, Gill BS, Lagudah ES, McCouch SR and Appels R (1995) Molecular-genetic maps for group-1 chromosomes of triticeae species and their relation to chromosomes in rice and oat. Genome. 38: 45-59.

Villareal RL, Mujeeb-Kazi A, Rajaram S and Del Toro E (1994) Associated effects of chromosome 1B/1R translocation on agronomic traits in hexaploid wheat. Breeding Science. 44.


Wade MJ (1992) Sewall Wright: gene interaction and the Shifting Balance Theory. Oxford Surveys in Evolutionary Biology. 8: 35-62.

Wade MJ (2001) Epistasis, complex traits, and mapping genes. Genetica. 112-113: 59-69.

Wang J, Podlich DW, Cooper M and DeLacy IH (2001) Power of the Joint Segregation Analysis method for testing mixed major gene and polygene inheritance models of quantitative traits. Theoretical and Applied Genetics. 103: 804-816.

Wang J-K, van Ginkel M, Podlich DW, Ye G, Trethowan R, Pfeiffer W, DeLacy IH, Cooper M and Rajaram S (2003) Comparison of two breeding strategies by computer simulation. Crop Science. 43: 1764-1773.

Watson SL, Phillips IG and Basford KE (1995) Analyses and interpretation of yield from interstate wheat variety trial Series 24. In: RJ Puckridge (ed.) Australian Interstate Wheat Variety Trials 1994 Program. Adelaide: Grains Research and Development Corporation. pp. 5-9, 49-73.

Weir BS and Cockerham CC (1977) Two-locus theory in quantitative genetics. In: E Pollack et al. (eds). Proceedings of the First International Conference on Quantitative Genetics. Ames, IA: Iowa State University Press. pp. 247-269.

Wenzl P, Caig V, Carling J, Cayla C, Evans M, Jaccoud D, Patarapuwadol S, Uszynski G, Xia L, Yang S, Huttner E and Kilian A (2004) Diversity Arrays Technology, a novel tool for harnessing crop genetic diversity. In: T Fischer et al. (eds). New directions for a diverse planet: Proceedings for the 4th International Crop Science Congress. Brisbane, Australia.

Whitlock MC, Phillips PC, Moore FB-G and Tonsor SJ (1995) Multiple fitness peaks and epistasis. Annual Review of Ecology and Systematics. 26: 601-629.

Whittaker JC, Curnow RN, Haley CS and Thompson R (1995) Using marker-maps in marker-assisted selection. Genetical Research. 66: 255-265.

Whittaker JC, Haley CS and Thompson J (1997) Optimal weighting of information in marker-assisted selection. Genetical Research. 69: 137-144.

Williams AG and Williams RW (2004) GenomeMixer: a complex genetic cross simulator. Bioinformatics. 20: 2491-2492.

Williams W (1964) Genetical principles and plant breeding. Blackwell Scientific Publications: Oxford.

Wolfram S (2002) A new kind of science. Wolfram Media, Inc: Champaign.

Wricke G and Weber WE (1986) Quantitative genetics and selection in plant breeding. Walter de Gruyter & Co.: Berlin.


Wright S (1932) The roles of mutation, inbreeding, cross breeding and selection in evolution. In: DF Jones (ed.) Proceedings of the Sixth International Conference of Genetics, Vol. 1. Ithaca, NY. pp. 356-366.

Wu RL (2000) Partitioning of population genetic variance under multiplicative-epistatic gene action. Theoretical and Applied Genetics. 100: 743-749.

Yan J, Zhu J, He C, Benmoussa M and Wu P (1998) Molecular dissection of developmental behaviour of plant height in rice (Oryza sativa L.). Genetics. 150: 1257-1265.

Ye G, Dieters M, Pudmenzky A, Micallef KP and Basford KE (2004) Simulation of positive assortment mating for inbred line development using QU-GENE. In: T Fischer et al. (eds). "New directions for a diverse planet". Proceedings of the 4th International Crop Science Congress. Brisbane.

Young ND (1999) A cautiously optimistic vision for marker-assisted breeding. Molecular Breeding. 5: 505-510.

Young SSY (1966) Computer simulation of directional selection in large populations I. The programme, the additive and the dominance models. Genetics. 53: 189-205.

Young SSY (1967) Computer simulation of directional selection in large populations II. The additive × additive and mixed models. Genetics. 56: 73-87.

Yousef GG and Juvik JA (2001) Comparison of phenotypic and marker-assisted selection for quantitative traits in sweet corn. Crop Science. 41: 645-655.

Yuh CH, Bolouri H and Davidson EH (1998) Genomic cis-regulatory logic: experimental and computational analysis of a sea urchin gene. Science. 279: 1896-1902.

Zeng Z-B (1993) Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proceedings of the National Academy of Sciences of the United States of America. 90: 10972-10976.

Zeng Z-B (1994) Precision mapping of quantitative trait loci. Genetics. 136: 1457-1468.

Zeng Z-B (2000) Multiple Interval Mapping. QTL Mapping 2001, Southern Summer Institute in Statistical Genetics: Raleigh, NC.

Zhang W and Smith C (1992) Computer simulation of marker-assisted selection utilizing linkage disequilibrium. Theoretical and Applied Genetics. 83: 813-820.

Zhang W and Smith C (1993) Simulation of marker-assisted selection utilizing linkage disequilibrium: the effects of several additional factors. Theoretical and Applied Genetics. 86: 492-496.


Zhao H, McPeek MS and Speed TP (1995a) Statistical analysis of chromatid interference. Genetics. 139: 1057-1065.

Zhao H, Speed TP and McPeek MS (1995b) Statistical analysis of crossover interference using the chi-squared model. Genetics. 139: 1045-1056.

Zhuang J-Y, Lin H-X, Lu J, Qian H-R, Hittalmani S, Huang N and Zheng K-L (1997) Analysis of QTL × environment interaction for yield components and plant height in rice. Theoretical and Applied Genetics. 95: 799-808.


APPENDICES


APPENDIX 1

ADDITIONAL INFORMATION ASSOCIATED WITH CHAPTER 4

A1.1 Additional information for the response to selection prediction equations

A1.1.1 Gene action definitions for different prediction equations

Most of the response to selection equations considered in Chapter 4 relate to the

work of Falconer and Mackay (1996) and Comstock (1996). Falconer and Mackay

(1996) define gene action using the genetic parameters m, a, and d, where m is the

midpoint effect, a is the additive effect and d is the dominance effect such that the

genotypes are given the genotypic values BB = m + a, Bb = m + d and bb = m - a

(Falconer and Mackay 1996). Comstock (1996) defines gene action by allocating a gene

effect (u) and a gene action (a) such that the genotypes are given the values BB = 2u, Bb

= u + au and bb = 0.0; where a = -1 for complete dominance of the unfavourable allele,

a = 0 for additive, a = 1 for complete dominance of the favourable allele, and a < -1 or a > 1

is overdominance. Since both genetic models parameterise the differences between the

same three genotypes there is a relationship between the model parameters used by

Falconer and Mackay (1996) and those used by Comstock (1996), which provides a

stable foundation for the comparison of the prediction equations.
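The relationship between the two parameterisations can be sketched by equating the three genotypic values. The function names below are hypothetical and are used only for illustration; the mapping assumes the Comstock convention bb = 0, which corresponds to m = a in the Falconer and Mackay notation.

```python
def comstock_to_falconer(u, a_c):
    """Map Comstock's gene effect u and gene action a_c to (m, a, d).

    Derived by equating genotypic values:
      BB: m + a = 2u,  bb: m - a = 0  ->  m = a = u
      Bb: m + d = u + a_c * u         ->  d = a_c * u
    """
    m = u        # midpoint between the two homozygotes
    a_f = u      # half the difference between the homozygotes
    d = a_c * u  # deviation of the heterozygote from the midpoint
    return m, a_f, d


def falconer_values(m, a, d):
    """Genotypic values under the Falconer and Mackay definition."""
    return {"BB": m + a, "Bb": m + d, "bb": m - a}


def comstock_values(u, a_c):
    """Genotypic values under the Comstock definition."""
    return {"BB": 2 * u, "Bb": u + a_c * u, "bb": 0.0}


if __name__ == "__main__":
    # Partial dominance of the favourable allele with unit gene effect.
    m, a_f, d = comstock_to_falconer(u=1.0, a_c=0.5)
    print(falconer_values(m, a_f, d))
    print(comstock_values(1.0, 0.5))
```

Under this mapping, the settings m = 1, a = 1 used later in this appendix correspond to u = 1, with d taking the role of Comstock's gene action value.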

A1.1.2 Alternate S1 family prediction equations

The basic S1 family prediction equation used in this thesis is shown in Chapter 4,

Equation (4.4). Another form of this response to selection prediction equation for the S1

family selection strategy was given by Fehr (1987) and is repeated here as Equation


(A1.1). This response to selection prediction equation incorporates an explicit parental

control factor, dominance, and environmental interactions,

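$$R_c = \frac{k\,c\,\sigma^2_{A'}}{\sqrt{\dfrac{\sigma^2_e}{\eta t} + \dfrac{\sigma^2_{A'E} + \frac{1}{4}\sigma^2_{DE}}{t} + \sigma^2_{A'} + \frac{1}{4}\sigma^2_D}}, \qquad \text{(A1.1)}$$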

where $R_c$ is the expected gain per cycle, k is the standardised selection differential applied to S1 families, c is the parental control factor, which is 1 for S1 family selection, $\sigma^2_{A'}$ is the additive genetic variance plus a component that is mainly a function of degree of dominance, $\sigma^2_e$ is the environmental (error) component of variance, η is the number of replications per environment, t is the number of environments, $\sigma^2_{A'E}$ and $\sigma^2_{DE}$ are the additive-by-environmental and dominance-by-environmental interaction components of variance, and $\sigma^2_D$ is the dominance genetic variance (Fehr 1987).
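As a rough illustration of how the terms of Equation (A1.1) combine, the following sketch evaluates the expected gain per cycle. It assumes the denominator is the phenotypic standard deviation among S1 family means (the square root of the summed variance terms, each weighted as defined above); the function and argument names are hypothetical.

```python
import math


def s1_family_response(k, c, var_a, var_d, var_e, var_ae, var_de, reps, envs):
    """Expected gain per cycle for S1 family selection (sketch of Eq. A1.1).

    Assumed form: R_c = k * c * var_a / sqrt(phenotypic variance), where the
    phenotypic variance among family means is
      var_e / (reps * envs) + (var_ae + var_de / 4) / envs + var_a + var_d / 4.
    """
    phenotypic_var = (
        var_e / (reps * envs)                # error averaged over plots
        + (var_ae + 0.25 * var_de) / envs    # G x E averaged over environments
        + var_a                              # additive (A') variance
        + 0.25 * var_d                       # dominance variance
    )
    return k * c * var_a / math.sqrt(phenotypic_var)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the thesis.
    gain = s1_family_response(k=1.4, c=1.0, var_a=1.0, var_d=0.5,
                              var_e=2.0, var_ae=0.5, var_de=0.2,
                              reps=2, envs=4)
    print(round(gain, 3))
```

Increasing the number of replications or environments shrinks the error and interaction terms in the denominator, so the predicted gain rises towards the limit set by the genetic terms alone.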

A1.1.3 Effect of inbreeding on the variance components coefficient

It is important to note that the coefficients of the variance components change

with the level of inbreeding in the different breeding strategies. When there is a

sequence of generations of selfing within families, the variance is partitioned into within

and among line sources of genetic variance. For the cases of mass, S1 family and DH

line selection strategies each of these populations can be considered to be points on the

continuum of inbreeding, where the coefficient of inbreeding is represented by F

(Figure A1.1). Mass selection represents the case where the parents of the progeny are F

= 0. The S1 family structure is based on random individuals from a random mating population. Therefore, the S1 families are derived by selfing individuals with an inbreeding coefficient of F = 0 (F2 random mating reference population), whereas for DH lines the progeny are from completely inbred individuals (F∞, which is equivalent to F = 1).


Figure A1.1 Inbreeding coefficient continuum from F = 0 (no inbreeding) to F = 1 (completely inbred) for mass, S1 family and DH line selection

The coefficient of inbreeding affects the coefficients of the variance components among the selection units associated with the different breeding strategies. When F = 0 the coefficients of the additive ($\sigma^2_A$) and dominance ($\sigma^2_D$) genetic variances are both 1. As the coefficient of inbreeding increases (F→1) the additive genetic variance coefficient increases and the dominance genetic variance coefficient decreases. When the

inbreeding coefficient is F = 1 the additive genetic variance coefficient is two and the

dominance genetic variance coefficient is zero (Wricke and Weber 1986). The effect of

this can be examined by using the prediction equations. For the DH line (F = 1)

response to selection equation, Chapter 4, Equation (4.5) the coefficient for the additive

genetic variance is two, and the coefficient for the dominance genetic variance is zero,

while for mass selection (F = 0) response to selection equation, Chapter 4, Equation

(4.3) the coefficient for the additive genetic variance is one, and the coefficient for the

dominance genetic variance is one. This difference arises from the level of heterozygosity, and therefore of dominance, retained in the population under inbreeding by self-pollination. As the number of selfing generations increases, the level of heterozygosity decreases, reaching zero when the genotypes are completely inbred (F∞).

When F = 1 all of the genetic variance is among homozygous lines and is therefore

additive, assuming there is no epistasis.
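The decline in heterozygosity under selfing can be sketched numerically. The example below assumes the standard result that each generation of self-pollination halves the remaining heterozygosity, and it uses the coefficient (1 + F) for the additive variance among lines, which is an assumption chosen to be consistent with the endpoints quoted above (1 at F = 0 and 2 at F = 1). The function names are hypothetical.

```python
def inbreeding_after_selfing(generations, f0=0.0):
    """Inbreeding coefficient after a number of generations of selfing.

    Heterozygosity is halved each generation, so 1 - F is halved as well.
    """
    return 1.0 - (1.0 - f0) * 0.5 ** generations


def additive_coefficient(f):
    """Assumed coefficient of the additive variance among lines: 1 + F."""
    return 1.0 + f


if __name__ == "__main__":
    for t in range(0, 9):
        f = inbreeding_after_selfing(t)
        print(t, round(f, 4), round(additive_coefficient(f), 4))
```

After about eight generations of selfing F is already above 0.99, so the additive coefficient is close to the value of two that applies to DH lines.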

Inbreeding also affects the partitioning of genetic variance among and within

lines. As the level of inbreeding increases, there is greater variation among lines and

less variation within lines (Falconer and Mackay 1996). The increase in among line

variance ($\sigma^2_b$) is due to the gene frequencies within each line moving towards either


zero or one. The movement of the gene frequencies within each line towards the

extreme values of zero or one results in a decrease in the within line variance ($\sigma^2_w$). As inbreeding continues the among line variance increasingly becomes the majority of the genetic variance and the within line variance is confounded with the experimental error $\left(\sigma^2_{\varepsilon} = \dfrac{\sigma^2_b + \sigma^2_{\varepsilon}}{\eta}\right)$, as it is part of the variation among plants within a plot (Fehr 1987).

The partitioning of the genetic variance was observed in the S1 family prediction

equation, Chapter 4, Equation (4.4). The DH prediction equation contains no within line

genetic variance as all individuals within a line are genetically identical, Chapter 4,

Equation (4.5).

A1.2 Quantitative genetics theory assumptions

Quantitative genetic theory provides a modelling framework that can be used to

construct a mathematical representation of the effects of genes in populations. To derive the common prediction equations, the simplifications and assumptions mentioned in Chapter 4, Section 4.3.1.1, were applied. The common set of assumptions is: (i) Mendelian inheritance: inheritance which follows the laws of segregation and independent assortment as proposed by Mendel; (ii) no mutation: removes a systematic process

(Falconer and Mackay 1996) capable of changing gene frequencies; (iii) infinite

populations: removes the effect of dispersive processes (Falconer and Mackay 1996);

(iv) Hardy-Weinberg equilibrium: maintenance of allele and genotype frequencies in a

population undergoing random mating in the absence of selection; (v) many genes with

small and equal effects: common assumption which may not be true for many traits; (vi)

no linkage of two loci situated close together on the same chromosome or linkage phase

equilibrium, thus there is no tendency for the occurrence together of two or more alleles

at closely linked loci more frequently than would be expected by chance; (vii) no

epistasis or interaction between non-allelic genes; (viii) no genotype-by-environment

interaction; and (ix) no correlated environmental effects between the environmental

values of two traits. To make the mathematical derivation of Comstock’s (1996)

prediction equations tractable, Comstock employed some of these simplifications in

addition to:


1. mitosis, meiosis, gametogenesis and fertilisation follow patterns described

as normal in genetics texts;

2. all theory is for diploid or functionally diploid organisms;

3. sex-linked genes are ignored;

4. mostly assumes no epistasis and no multiple alleles;

5. linkage equilibrium is only assumed in part of the work; otherwise the effects of linkage (in the absence of epistasis and multiple alleles) are thoroughly examined. When linkage disequilibrium is involved, probabilities were based on the assumptions:

a) that the S1 generation was formed by self-fertilisation of random individuals from a random mating source population;

b) that each later generation was formed by self-fertilisation of random individuals from the immediately preceding generation;

c) note that linkage equilibrium in the source population was not assumed, and that the specification that parents of each generation were random members of their own generation equates to assuming no selection;

6. mutation is ignored;

7. barring that already mentioned, there are no restrictive assumptions made

on dominance, pleiotropy or G×E interaction;

8. procedures for obtaining E(k) (expected standardised selection differential) assume a “normal distribution” of the phenotypic values X̂ of the experimental units that represent the selection criterion (Comstock 1996).

A1.3 Assumption of normality in the base population does not hold when dominance is included

The assumption of normality was commonly found to be invalid under many

genetic models, leading to divergences between the expectations from prediction

equations and the simulation results. Invalidation of the assumption of normality

impacted the selection intensity (i) and resulted in significant differences between the

selection intensity used in a prediction equation and that realised in the simulation


experiment. Departures from the additive model, e.g. the presence of dominance,

contribute to the genetic variance of the population tending to cause the phenotypic

distribution to skew (Figure A1.2b) and not conform to the assumption of a normal

distribution (Figure A1.2a). When selecting intensely, it is the tail of the frequency distribution that contributes to the gains from selection; therefore, knowing the skewness coefficient of a frequency distribution is important (Cochran 1951).

Figure A1.2 Change in the distribution of the measured phenotypic values when the genetic model deviates from the additive scenario; (a) normal distribution, (b) left skewed distribution

The effect of dominance on the skewness of the F2 base population distribution

was investigated using a genetic model involving three gene levels (N = 2, 10 and 200)

and five dominance levels; no dominance or additive (m = 1, a = 1, d = 0), partial

dominance (m = 1, a = 1, d = +0.5 or -0.5) and complete dominance (m = 1, a = 1, d =

+1 or -1). Other experimental variables investigated are outlined in Table A1.1. The

PEQ module (Chapter 4, Figure 4.4) was used to calculate the F2 population mean under

the mass selection strategy.

Table A1.1 Experimental variable levels used in the PEQ module to test the assumption that the individuals of the F2 are normally distributed

Experimental variable                   Levels
F2 population size                      1000
Selection strategy                      Mass selection
Gene action                             additive, partial, complete
No. plants per F2 plant (j)             1
No. reserve seed (b)                    1
Linkage type                            coupling
Per meiosis recombination fraction      0.5
Selection proportion                    0.2
No. of genes                            2, 10, 200
Heritability                            1.0


The F2 base population genotype values for the 1000 individuals for 1000 runs were recorded and their frequency (expressed as a percentage of the total number of individuals) was graphed (Figure A1.3). In the presence of coupling phase linkage

associations in the base population, linkage disequilibrium resulted in an increase in the

association of the dominant alleles (Figure A1.3). Under this situation the assumption of

the multi-genic genotype values being normally distributed does not hold. The deviation

from normality increased as the amount of dominance in the population increased. For

the two gene model (E(NK) = 1(2:0)) the increasing presence of dominance in the base

population severely skewed the F2 population distribution with even the additive model

not looking normally distributed (Figure A1.3a). As the number of genes in the model

was increased to 10 (E(NK) = 1(10:0)) each of the distributions for the different gene

actions approached normality; however, they were still fairly skewed (Figure A1.3c).

Only for the 200 gene model (E(NK) = 1(200:0)) did the distribution for all gene actions

approximate a normal distribution (Figure A1.3e).

Further analysis of the F2 base population genotypic values was conducted to determine the mean ± standard deviation (Table A1.2), skewness coefficient (Table A1.3) and kurtosis coefficient (Table A1.4) of the F2 population for each gene level. The

additive gene action mean fell halfway between the + partial dominance gene action

(+d) and - partial dominance gene action (-d) model. The additive gene action mean also

fell halfway between the + complete dominance gene action (+d) and - complete

dominance gene action (-d) model (Table A1.2). This was observed for all gene levels.

As the number of genes in the model increased the means and standard deviations

increased.

Table A1.2 Mean ± standard deviation of the F2 population for each gene level and gene action

Gene action      2 genes        10 genes        100 genes        200 genes
Additive         2.00 ± 1.00    10.00 ± 2.24    99.99 ± 7.07     199.99 ± 10.00
Partial: +d      2.49 ± 1.06    12.50 ± 2.37    125.00 ± 7.50    249.99 ± 10.60
Partial: -d      1.50 ± 1.06    7.50 ± 2.37     74.99 ± 7.50     150.00 ± 10.60
Complete: +d     2.99 ± 1.22    14.99 ± 2.74    149.99 ± 8.66    299.99 ± 12.23
Complete: -d     0.99 ± 1.22    4.99 ± 2.74     49.99 ± 8.67     99.98 ± 12.24
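The simulated means and standard deviations in Table A1.2 can be compared with their analytical expectations. The sketch below assumes independent loci (recombination fraction 0.5), equal effects at every locus, F2 genotype frequencies of 1/4 BB, 1/2 Bb and 1/4 bb, and a heritability of 1.0; the function name is hypothetical.

```python
import math


def f2_mean_sd(n_genes, m=1.0, a=1.0, d=0.0):
    """Expected mean and standard deviation of F2 genotypic values.

    Per locus the values are m + a, m + d and m - a with frequencies
    1/4, 1/2 and 1/4, giving mean m + d/2 and variance a^2/2 + d^2/4.
    Means and variances add over independent loci.
    """
    locus_mean = m + d / 2.0
    locus_var = a ** 2 / 2.0 + d ** 2 / 4.0
    return n_genes * locus_mean, math.sqrt(n_genes * locus_var)


if __name__ == "__main__":
    for label, d in [("Additive", 0.0), ("Partial +d", 0.5),
                     ("Partial -d", -0.5), ("Complete +d", 1.0),
                     ("Complete -d", -1.0)]:
        row = [f2_mean_sd(n, d=d) for n in (2, 10, 100, 200)]
        print(label, [(round(mu, 2), round(sd, 2)) for mu, sd in row])
```

The expectations agree with the simulated values to within sampling error, and they show directly why the additive mean lies midway between the +d and -d means: the dominance term enters the mean as N·d/2 with opposite signs.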


Figure A1.3 F2 population (1000 individuals) frequency distribution over 1000 runs, expressed as a percentage (a, c, e), and response to selection (b, d, f) for the mass selection strategy, for E(NK) = 1(2:0) (a, b), E(NK) = 1(10:0) (c, d) and E(NK) = 1(200:0) (e, f). Panels (a, c, e) plot frequency (%) against genotypic value; panels (b, d, f) plot response to selection against gene action. Gene action is defined as additive (m = 1, a = 1, d = 0), partial dominance (m = 1, a = 1, d = 0.5 or -0.5) and complete dominance (m = 1, a = 1, d = 1 or -1). For subfigures (a, c, e), as the number of genes increases the distributions approach the expectation of normality. Corresponding response to selection plots (b, d, f) contain the Falconer and Mackay (1996) and Comstock (1996) response to selection prediction equations (on top of each other) and simulation results with standard deviation bars for both +d and -d. The red mark (graph b, d = 1) indicates the maximum response possible. Therefore, for finite locus models with low gene levels the response to selection prediction equations (with +d) overestimate the response to selection. Note: scaling differs on all graphs


Estimating the skewness coefficient (k3) for each combination of gene number and gene action quantifies the asymmetry of the distributions (Table A1.3). A perfectly symmetric distribution is expected to have a skewness coefficient of zero. The additive models for each gene level had a skewness coefficient of zero. For each of the dominance models the +d and -d models were similar in magnitude; however, as the level of dominance increased the skewness coefficient increased. A pattern was also observed with gene number. As the number of genes increased, the magnitude of the coefficient decreased, confirming a closer approximation to the normal distribution, as observed in Figure A1.3.

Table A1.3 Skewness coefficient (k3) of the F2 population for each gene level and gene action. A perfectly symmetric distribution has a skewness of zero

Gene action      2 genes    10 genes    50 genes    100 genes    200 genes
Additive         0.00       0.00        0.00        0.00         0.00
Partial: +d      0.63       0.28        0.12        0.09         0.06
Partial: -d      0.63       0.28        0.12        0.09         0.06
Complete: +d     0.82       0.36        0.16        0.12         0.08
Complete: -d     0.82       0.36        0.17        0.12         0.08

The kurtosis coefficient (k4) describes another aspect of the shape of a distribution compared to the normal distribution. A truly normal distribution has a kurtosis

coefficient k4 = 0. A distribution with a high narrow peak relative to the normal (k4 > 0)

is leptokurtic. A broader than normal peak (k4 < 0) is referred to as platykurtic. At low

gene numbers the distributions had a large negative kurtosis coefficient (Table A1.4)

and appeared platykurtic (Figure A1.3a). As the number of genes increased the kurtosis coefficient became smaller in magnitude and closer to the expectation of k4 = 0 for the normal distribution (Table A1.4, Figure A1.3e).

Table A1.4 Kurtosis coefficient (k4) of the F2 population for each gene level and gene action. A normal distribution has k4 = 0. A distribution with a high narrow peak relative to the normal (k4 > 0) is leptokurtic. A broader than normal peak (k4 < 0) is referred to as platykurtic

Gene action      2 genes    10 genes    50 genes    100 genes    200 genes
Additive         -0.50      -0.10       -0.02       0.01         0.00
Partial: +d      -0.41      -0.09       -0.0245     0.00         0.00
Partial: -d      -0.41      -0.08       -0.0222     -0.01        0.00
Complete: +d     -0.33      -0.08       -0.0124     0.00         0.00
Complete: -d     -0.33      -0.07       -0.0087     0.00         0.00
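The pattern in Tables A1.3 and A1.4 can be reproduced analytically under the same assumptions as before (independent, equal-effect loci at F2 genotype frequencies). This is a sketch only, with a hypothetical function name; it relies on the standard result that, for a sum of N independent and identical contributions, skewness scales as 1/√N and excess kurtosis as 1/N. The signed skewness is returned, whereas Table A1.3 appears to report magnitudes.

```python
import math


def f2_shape(n_genes, a=1.0, d=0.0):
    """Expected skewness and excess kurtosis of F2 genotypic values."""
    probs = (0.25, 0.5, 0.25)   # frequencies of BB, Bb, bb
    values = (a, d, -a)         # deviations from the midpoint m
    mean = sum(p * v for p, v in zip(probs, values))
    # Central moments of a single locus.
    c2 = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    c3 = sum(p * (v - mean) ** 3 for p, v in zip(probs, values))
    c4 = sum(p * (v - mean) ** 4 for p, v in zip(probs, values))
    locus_skew = c3 / c2 ** 1.5
    locus_excess_kurtosis = c4 / c2 ** 2 - 3.0
    # Cumulants add over independent loci, giving the 1/sqrt(N) and 1/N scaling.
    return locus_skew / math.sqrt(n_genes), locus_excess_kurtosis / n_genes


if __name__ == "__main__":
    for d in (0.0, 0.5, -0.5, 1.0, -1.0):
        print(d, [tuple(round(x, 2) for x in f2_shape(n, d=d))
                  for n in (2, 10, 50, 100, 200)])
```

With +d the computed skewness is negative (a longer tail towards low genotypic values) and with -d it is positive, which matches the mirror-image distributions described for the two dominance directions.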


The effect on response to selection when skewness occurs in the positive

direction, i.e. having a negative dominance (-d) value, or a recessive gene in the genetic

model, was also investigated. It was expected that due to the dominance being negative,

most of the heterozygotes will have a low genotypic value (lower than the midpoint

value) therefore creating the positively skewed distribution. The -d models were

symmetrical to the +d models as observed in Figure A1.3a, c and e, and the standard

deviation (Table A1.2), skewness (Table A1.3) and kurtosis (Table A1.4) coefficient

values were similar. However, when selecting, the higher the genotypic value the better the genotype; therefore, when the top 20% of the F2 population was selected, predominantly favourable homozygotes and a lower than expected frequency of heterozygotes (as opposed to when d is positive) were selected. Therefore, selection was

more effective than expected based on the prediction equation and the response to

selection from the simulation was higher than predicted (Figure A1.3b).

When a positive d value (+d) was used for the simulations, the same genetic

model as the prediction equations was being tested (-d was only tested using simulation). In Figure A1.3b, the red dash indicates the maximum response to selection possible for two genes and complete dominance (E(NK) = 1(2:0)). This can be calculated because the F2 population mean is three (m = 1, a = 1, d = 1) and the value of AABB (the favourable genotype) is four; therefore the maximum response possible is one. For

the two gene model with complete dominance the response to selection prediction

equations were predicting values higher than possible (Figure A1.3b). In the presence of

partial dominance, the difference between the prediction equations and simulation was

smaller and with the additive model, the difference was even smaller. As the number of

genes in the model increased, the difference between the prediction equations and

simulation decreased (Figure A1.3b, d and e). Increasing the number of genes in the

model to 200 genes resulted in the F2 population becoming normally distributed (Figure

A1.3e). This resulted in the prediction equations and simulation converging for all gene

actions as the assumption of normality in the F2 population became realistic (Figure

A1.3e and f). From these observations it should be noted that for finite locus models

including small gene numbers and the effects of dominance, there is a strong likelihood


of observing deviations between the expectations from the prediction equations and the

results of simulation experiments, particularly in the presence of linkage disequilibrium.
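The two-gene, complete-dominance ceiling quoted above (F2 mean of three, AABB value of four, maximum response of one) can be reproduced by enumerating the F2 genotypes. This is only an illustrative sketch: the function name is hypothetical, each locus is assumed to contribute m + a, m + d or m - a with F2 frequencies 1:2:1, and locus effects are assumed to add across loci.

```python
from itertools import product


def f2_mean_and_ceiling(n_genes, m=1.0, a=1.0, d=1.0):
    """Return (F2 mean, best genotype value, maximum possible response)."""
    # (frequency, genotypic value) for AA, Aa and aa at a single locus.
    locus = [(0.25, m + a), (0.5, m + d), (0.25, m - a)]
    mean = 0.0
    # Enumerate every multi-locus genotype and weight its value by its frequency.
    for combo in product(locus, repeat=n_genes):
        prob = 1.0
        value = 0.0
        for freq, effect in combo:
            prob *= freq
            value += effect
        mean += prob * value
    best = n_genes * (m + a)  # all loci homozygous for the favourable allele
    return mean, best, best - mean


if __name__ == "__main__":
    print(f2_mean_and_ceiling(2))  # two genes, complete dominance
```

Any predicted response above the third returned value exceeds what the genetic model can deliver, which is the situation described for the two-gene complete dominance case.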

The normal distribution assumption can only be considered valid using predic-

tion equations when either an additive genetic model is being used or when the finite

locus model is based on a large number of genes. The larger the number of genes used,

the greater the agreement between the prediction equation theory and the simulation

results. A further consideration in the interpretation of the simulation results in relation

to the prediction equations is the influence of linkage disequilibrium. In the experiment

reported in this Section, linkage disequilibrium resulted from the initiation of the

simulation experiment from parents with the favourable alleles in coupling associations.

As was expected, when the assumptions applied in developing the prediction equations

were not satisfied in the simulation experiment the response values predicted from the

equations diverged from the simulation results.

This simulation study demonstrated that with low gene numbers and dominance

in the presence of coupling phase associations, the extension of prediction equations

from an additive model produced expectations for the response to selection that deviated

from the simulation results.


APPENDIX 2

ADDITIONAL INFORMATION

ASSOCIATED WITH CHAPTER 5

A2.1 Generating a linkage map and its association with mapping population size

For each of the genetic models (Chapter 5, Table 5.1) a comparison between the

specified and estimated per meiosis recombination fraction was conducted. The

specified per meiosis recombination fraction is the value that was entered into the

QUGENE input file. The estimated per meiosis recombination fraction is the genetic

distance between markers on the chromosome as estimated by MAPMAKER/EXP

(Lander et al. 1987).

In the QUGENE input file a per meiosis recombination fraction is specified between adjacent markers and between a marker and a QTL. MAPMAKER/EXP does not calculate the genetic distance between a marker and a QTL because it does not know where the QTL are located; it only calculates the genetic distance between markers. Because per meiosis recombination fractions are not additive, and to account for double crossovers, the specified value between two markers with a QTL between them is calculated using Equation (A2.1),

c_{DF} = c_{DE} + c_{EF} - 2 c_{DE} c_{EF},     (A2.1)

where, c = per meiosis recombination fraction, D is a marker locus, E is a QTL locus

and F is a marker locus. An example of the use of this equation can be shown with


Model 1 (Section A2.1.1) where there is a chromosome with one QTL and two flanking

markers at a per meiosis recombination fraction genetic distance of c = 0.1 between the

QTL and each marker (Chapter 5, Figure 5.2, Model 1). From Equation (A2.1) the

specified genetic distance between the two markers is calculated to be 0.18.
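The Model 1 value can be checked directly from Equation (A2.1). In the minimal sketch below the function name is hypothetical. The second expression restates the same equation as the probability that exactly one of the two intervals recombines, which assumes no crossover interference.

```python
def flanking_recombination_fraction(c_de, c_ef):
    """Equation (A2.1): recombination fraction between markers D and F
    when a QTL E lies between them."""
    return c_de + c_ef - 2.0 * c_de * c_ef


if __name__ == "__main__":
    c = flanking_recombination_fraction(0.1, 0.1)
    # Same quantity written as "exactly one interval recombines".
    same = 0.1 * (1 - 0.1) + (1 - 0.1) * 0.1
    print(round(c, 4), round(same, 4))
```

Both expressions evaluate to 0.18 (to rounding), the specified value used for the Model 1 comparison.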

Specific details on each of the models, including chromosome setup can be

found in Chapter 5, Section 5.2. For Model 1, a comparison between the specified and

calculated per meiosis recombination fraction was conducted for a recombinant inbred

line mapping population size of 100 individuals. For Model 2, the comparison was for a

recombinant inbred line mapping population size of 100, 500 and 1000 individuals. For

Model 3, the comparison was for a recombinant inbred line mapping population size of

100, 500 and 1000 individuals for 10 chromosomes. Since each chromosome was

defined to have the same linkage relationship and was simulated as independent linkage

groups, each chromosome was considered a replication that could be averaged across as

they were all identical. For Model 4, the comparison was conducted for a recombinant

inbred line mapping population size of 1000 individuals for 10 chromosomes at each of

the chromosome regions on a chromosome. Once again as each chromosome was

defined to have the same linkage relationship and was simulated as independent linkage

groups, each chromosome was considered a replication that could be averaged across as

they were all identical. A chromosome region refers to the simulated genetic distance

between two markers on a chromosome.

A2.1.1 Model 1 - one chromosome, one QTL, two flanking markers

Details relevant to this model can be found in Chapter 5, Section 5.2.1.1. The

genetic map generated by MAPMAKER/EXP for the Model 1 recombinant inbred line

mapping population of 100 individuals was the same as that specified in QUGENE. The

per meiosis recombination fraction between the two markers was estimated to be 0.246

by MAPMAKER/EXP. This value is larger than the specified value of 0.18; therefore,

the mapping population size may not have been large enough for MAPMAKER/EXP to

accurately estimate the per meiosis recombination fraction between the two markers

(Figure A2.1).


[Figure A2.1 plot not reproduced. Bar chart for chromosome 1 (Chr 1); y-axis: recombination fraction (0.00 to 0.25); bars: Specified, 100.]

Figure A2.1 Per meiosis recombination fraction as simulated in QU-GENE (Specified) and estimated by MAPMAKER/EXP for Model 1 with a recombinant inbred line mapping population size of 100 individuals

A2.1.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL

Details relevant to this model can be found in Chapter 5, Section 5.2.1.1. Based

on the simulated recombinant inbred line mapping population size of 100 individuals a

linkage map was created. One of the markers (marker 7 on chromosome 2) was

considered to be unlinked by MAPMAKER/EXP (Figure A2.2b, missing pink bar in

last group). Due to marker 7 not being placed in a linkage group, larger mapping

population sizes were examined. With a mapping population size of 500 and 1000

recombinant inbred lines, marker 7 was placed in its correct linkage group (Figure

A2.2b, last group – green and gold bars). Therefore, mapping population size was

important in correctly placing all markers on their specified linkage group (relative to

the map specified in the QUGENE input file). Figure A2.2a (chromosome 1) and Figure A2.2b (chromosome 2) illustrate how the estimated per meiosis recombination fraction varied for each chromosome region on each chromosome, and for the different mapping population sizes, compared to the specified per meiosis recombination fraction.


[Figure A2.2 plots not reproduced. Panels: (a) Recombination fraction between markers (Chr 1); (b) Recombination fraction between markers (Chr 2). x-axis: chromosome region (m-q-m, m-m, m-q-m, m-m, m-q-m); y-axis: recombination fraction (0.00 to 0.25); legend: Specified, 100, 500, 1000.]

Figure A2.2 Per meiosis recombination fraction as simulated in QU-GENE (Specified) and estimated by MAPMAKER/EXP for Model 2 for recombinant inbred line mapping population sizes of 100, 500 and 1000. The per meiosis recombination fractions for the different chromosome regions are indicated by m = marker and q = QTL for chromosome 1 (a) and chromosome 2 (b)

A2.1.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL

Details relevant to this model can be found in Chapter 5, Section 5.2.1.1. For

Model 3 the correct linkage group was created for each of the recombinant inbred line

mapping population sizes. As this model consisted of 10 chromosomes, each with one

QTL and two flanking markers per QTL, each individual chromosome was graphed

since each chromosome was defined to have the same linkage relationships and was

simulated as independent linkage groups (Figure A2.3a). The estimated per meiosis

recombination fraction was similar to the specified per meiosis recombination fraction

for all mapping population sizes. On average, across the 10 chromosomes within a

mapping population size, the estimated per meiosis recombination fraction slowly

approached the specified per meiosis recombination fraction as mapping population size

increased (Figure A2.3b).


[Figure A2.3 plots not reproduced. Panels: (a) Recombination fraction between markers for 10 chromosomes; x-axis: chromosome (Chr 1 to Chr 10); y-axis: recombination fraction (0.00 to 0.25); legend: Specified, 100, 500, 1000. (b) Average recombination fraction between markers for 10 chromosomes; x-axis: population size; y-axis: average recombination fraction (0.00 to 0.25).]

Figure A2.3 (a) Per meiosis recombination fraction as simulated in QU-GENE (Specified) and estimated by MAPMAKER/EXP for Model 3 for recombinant inbred line mapping population sizes of 100, 500 and 1000. The per meiosis recombination fraction between the two markers on each of the 10 chromosomes is shown. (b) The average per meiosis recombination fraction between the two markers on each chromosome is shown for the three mapping population sizes

A2.1.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL

Details relevant to this model can be found in Chapter 5, Section 5.2.1.1. For

Model 4 the correct linkage group was created for a recombinant inbred line mapping

population size of 1000 individuals. As this model consisted of 10 chromosomes, each

with two QTL and four flanking markers per QTL, each chromosome region was

graphed rather than each chromosome, with each chromosome effectively being a

replication since each chromosome was defined to have the same linkage relationships

and was simulated as independent linkage groups (Figure A2.4a). The estimated per

meiosis recombination fraction was variable around the specified per meiosis recombi-

nation fraction for each of the 10 chromosome replications (Figure A2.4a). However, on

average across chromosomes, the estimated per meiosis recombination fraction was

similar to the specified per meiosis recombination fraction (Figure A2.4b).


[Figure A2.4 plots not reproduced. Panels: (a) Recombination fraction between markers for 10 chromosomes; legend: Specified, Chr 1 to Chr 10. (b) Average recombination fraction between markers for 10 chromosomes; legend: Specified, Average. x-axis (both panels): chromosome region (m-m, m-q-m, m-m, m-m, m-m, m-q-m, m-m); y-axis: recombination fraction in (a) and average recombination fraction in (b) (0.00 to 0.07).]

Figure A2.4 Per meiosis recombination fraction as simulated in QU-GENE (Specified) and estimated by MAPMAKER/EXP for Model 4 for a recombinant inbred line mapping population size of 1000. (a) The per meiosis recombination fractions for the different chromosome regions, indicated by m = marker and q = QTL, for all 10 chromosomes per chromosome region. (b) The average per meiosis recombination fraction over the 10 chromosomes for each of the chromosome regions

A2.1.5 Conclusion

From the experiments conducted in this Section it is feasible to remove the need

to conduct the map construction step using MAPMAKER/EXP in the simulation

experiments and allow the linkage map and per meiosis recombination fractions

specified in QUGENE to be used to represent the linkage map for the QTL detection

analysis step. Even though per meiosis recombination fractions may not have been

similar for all of the genetic models, the linkage map was always created correctly for

the larger mapping population size of 1000 individuals (relative to the map specified in

the QUGENE input file). Following these results the majority of the experiments in the

remainder of this thesis did not use MAPMAKER/EXP to create the linkage maps as a

step in the simulation experiments. Instead the maps were automatically generated using the per meiosis recombination fractions specified in the QUGENE engine input file, under the assumption that all maps had been created using a recombinant inbred line mapping population of 1000 individuals. As the per meiosis recombination fractions were generally small (i.e. c ≤ 0.1), they were simply added (Liu 1998) instead of using Equation (A2.1). This saved dramatically on the time taken to conduct the simulation experiments.
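The size of the error introduced by simply adding small recombination fractions, rather than applying Equation (A2.1), can be gauged with a short comparison. The sketch below uses hypothetical helper names and applies Equation (A2.1) pairwise along a chain of intervals; the interval values are the marker-QTL spacings listed for Models 1, 3 and 4.

```python
from functools import reduce


def combine(c1, c2):
    """Equation (A2.1) applied to two adjacent intervals."""
    return c1 + c2 - 2.0 * c1 * c2


def added_versus_combined(intervals):
    """Return (simple sum, Equation (A2.1) value, difference) for a chain of intervals."""
    added = sum(intervals)
    combined = reduce(combine, intervals)
    return added, combined, added - combined


if __name__ == "__main__":
    for label, intervals in [
        ("Model 1 (c = 0.1)", [0.1, 0.1]),
        ("Model 3 (c = 0.05)", [0.05, 0.05]),
        ("Model 4 (c = 0.025)", [0.025, 0.025]),
    ]:
        added, combined, diff = added_versus_combined(intervals)
        print(f"{label}: added={added:.5f} combined={combined:.5f} diff={diff:.5f}")
```

For two intervals the difference is 2 c1 c2, so it falls from 0.02 at c = 0.1 to 0.00125 at c = 0.025, which illustrates why simple addition is a reasonable shortcut for the smaller recombination fractions.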


A2.2 QU-GENE input files for QTL detection analysis programs

The following figures are a selected section of the QUGENE engine (version

1.0) input files for each of the genetic models in Chapter 5, Section 5.2.

A2.2.1 Model 1 - one chromosome, one QTL, two flanking markers

GN   M  A  D  AT  L  LN     K  E1  P
Chromosome 1
1    0  0  0  0   1  1      0  1   0.5
2    1  1  0  1   1  0.100  0  1   0.5
3    0  0  0  0   1  0.100  0  1   0.5

Figure A2.5 A section of the QUGENE engine input file showing the marker and QTL gene action setup for a one chromosome, one QTL, two flanking marker genome model (Section 5.2.1.1.1). QTL are highlighted in blue, markers are left black. Heritability for the trait was set as 1. Abbreviations are outlined at end of this Section

A2.2.2 Model 2 - two chromosomes, three QTL per chromosome, two flanking markers per QTL

GN   M  A  D  AT  L  LN     K  E1  P
Chromosome 1
1    0  0  0  0   1  1      0  1   0.5
2    1  1  0  1   1  0.100  0  1   0.5
3    0  0  0  0   1  0.100  0  1   0.5
4    0  0  0  0   1  0.100  0  1   0.5
5    1  1  0  1   1  0.100  0  1   0.5
6    0  0  0  0   1  0.100  0  1   0.5
7    0  0  0  0   1  0.100  0  1   0.5
8    1  1  0  1   1  0.100  0  1   0.5
9    0  0  0  0   1  0.100  0  1   0.5
Chromosome 2
10   0  0  0  0   1  2      0  1   0.5
11   1  1  0  1   1  0.100  0  1   0.5
12   0  0  0  0   1  0.100  0  1   0.5
13   0  0  0  0   1  0.100  0  1   0.5
14   1  1  0  1   1  0.100  0  1   0.5
15   0  0  0  0   1  0.100  0  1   0.5
16   0  0  0  0   1  0.100  0  1   0.5
17   1  1  0  1   1  0.100  0  1   0.5
18   0  0  0  0   1  0.100  0  1   0.5

Figure A2.6 A section of the QUGENE engine input file showing the marker and QTL gene action setup for a two chromosome, three QTL per chromosome, two flanking markers per QTL genome model (Section 5.2.1.1.2). QTL are highlighted in blue, markers are left black. Heritability for the trait was set as 1. Abbreviations are outlined at end of this Section


A2.2.3 Model 3 - 10 chromosomes, one QTL per chromosome, two flanking markers per QTL

GN   M  A  D  AT  L  LN     K  E1  P
Chromosome 1
1    0  0  0  0   1  1      0  1   0.5
2    1  1  0  1   1  0.050  0  1   0.5
3    0  0  0  0   1  0.050  0  1   0.5
Chromosome 2
4    0  0  0  0   1  2      0  1   0.5
5    1  1  0  1   1  0.050  0  1   0.5
6    0  0  0  0   1  0.050  0  1   0.5
Chromosome 3
7    0  0  0  0   1  3      0  1   0.5
8    1  1  0  1   1  0.050  0  1   0.5
9    0  0  0  0   1  0.050  0  1   0.5
Chromosome 4
10   0  0  0  0   1  4      0  1   0.5
11   1  1  0  1   1  0.050  0  1   0.5
12   0  0  0  0   1  0.050  0  1   0.5
Chromosome 5
13   0  0  0  0   1  5      0  1   0.5
14   1  1  0  1   1  0.050  0  1   0.5
15   0  0  0  0   1  0.050  0  1   0.5
Chromosome 6
16   0  0  0  0   1  6      0  1   0.5
17   1  1  0  1   1  0.050  0  1   0.5
18   0  0  0  0   1  0.050  0  1   0.5
Chromosome 7
19   0  0  0  0   1  7      0  1   0.5
20   1  1  0  1   1  0.050  0  1   0.5
21   0  0  0  0   1  0.050  0  1   0.5
Chromosome 8
22   0  0  0  0   1  8      0  1   0.5
23   1  1  0  1   1  0.050  0  1   0.5
24   0  0  0  0   1  0.050  0  1   0.5
Chromosome 9
25   0  0  0  0   1  9      0  1   0.5
26   1  1  0  1   1  0.050  0  1   0.5
27   0  0  0  0   1  0.050  0  1   0.5
Chromosome 10
28   0  0  0  0   1  10     0  1   0.5
29   1  1  0  1   1  0.050  0  1   0.5
30   0  0  0  0   1  0.050  0  1   0.5

Figure A2.7 A section of the QUGENE engine input file showing the marker and QTL gene action setup for a 10 chromosome, one QTL per chromosome, two flanking markers per QTL genome model (Section 5.2.1.1.3). QTL are highlighted in blue, markers are left black. Heritability for the trait was set as 1. Abbreviations are outlined at end of this Section


A2.2.4 Model 4 - 10 chromosomes, two QTL per chromosome, four flanking markers per QTL

GN   M  A  D  AT  L  LN     K  E1  P
Chromosome 1
1    0  0  0  0   1  1      0  1   0.5
2    0  0  0  0   1  0.050  0  1   0.5
3    1  1  0  1   1  0.025  0  1   0.5
4    0  0  0  0   1  0.025  0  1   0.5
5    0  0  0  0   1  0.050  0  1   0.5
6    0  0  0  0   1  0.050  0  1   0.5
7    0  0  0  0   1  0.050  0  1   0.5
8    1  1  0  1   1  0.025  0  1   0.5
9    0  0  0  0   1  0.025  0  1   0.5
10   0  0  0  0   1  0.050  0  1   0.5
Chromosome 2
11   0  0  0  0   1  2      0  1   0.5
12   0  0  0  0   1  0.050  0  1   0.5
13   1  1  0  1   1  0.025  0  1   0.5
14   0  0  0  0   1  0.025  0  1   0.5
15   0  0  0  0   1  0.050  0  1   0.5
16   0  0  0  0   1  0.050  0  1   0.5
17   0  0  0  0   1  0.050  0  1   0.5
18   1  1  0  1   1  0.025  0  1   0.5
19   0  0  0  0   1  0.025  0  1   0.5
20   0  0  0  0   1  0.050  0  1   0.5
Chromosome 3
21   0  0  0  0   1  3      0  1   0.5
22   0  0  0  0   1  0.050  0  1   0.5
23   1  1  0  1   1  0.025  0  1   0.5
24   0  0  0  0   1  0.025  0  1   0.5
25   0  0  0  0   1  0.050  0  1   0.5
26   0  0  0  0   1  0.050  0  1   0.5
27   0  0  0  0   1  0.050  0  1   0.5
28   1  1  0  1   1  0.025  0  1   0.5
29   0  0  0  0   1  0.025  0  1   0.5
30   0  0  0  0   1  0.050  0  1   0.5
Chromosome 4
31   0  0  0  0   1  4      0  1   0.5
32   0  0  0  0   1  0.050  0  1   0.5
33   1  1  0  1   1  0.025  0  1   0.5
34   0  0  0  0   1  0.025  0  1   0.5
35   0  0  0  0   1  0.050  0  1   0.5
36   0  0  0  0   1  0.050  0  1   0.5
37   0  0  0  0   1  0.050  0  1   0.5
38   1  1  0  1   1  0.025  0  1   0.5
39   0  0  0  0   1  0.025  0  1   0.5
40   0  0  0  0   1  0.050  0  1   0.5
Chromosome 5
41   0  0  0  0   1  5      0  1   0.5
42   0  0  0  0   1  0.050  0  1   0.5
43   1  1  0  1   1  0.025  0  1   0.5
44   0  0  0  0   1  0.025  0  1   0.5
45   0  0  0  0   1  0.050  0  1   0.5
46   0  0  0  0   1  0.050  0  1   0.5
47   0  0  0  0   1  0.050  0  1   0.5
48   1  1  0  1   1  0.025  0  1   0.5
49   0  0  0  0   1  0.025  0  1   0.5
50   0  0  0  0   1  0.050  0  1   0.5
Chromosome 6
51   0  0  0  0   1  6      0  1   0.5
52   0  0  0  0   1  0.050  0  1   0.5
53   1  1  0  1   1  0.025  0  1   0.5
54   0  0  0  0   1  0.025  0  1   0.5
55   0  0  0  0   1  0.050  0  1   0.5
56   0  0  0  0   1  0.050  0  1   0.5
57   0  0  0  0   1  0.050  0  1   0.5
58   1  1  0  1   1  0.025  0  1   0.5
59   0  0  0  0   1  0.025  0  1   0.5
60   0  0  0  0   1  0.050  0  1   0.5
Chromosome 7
61   0  0  0  0   1  7      0  1   0.5
62   0  0  0  0   1  0.050  0  1   0.5
63   1  1  0  1   1  0.025  0  1   0.5
64   0  0  0  0   1  0.025  0  1   0.5
65   0  0  0  0   1  0.050  0  1   0.5
66   0  0  0  0   1  0.050  0  1   0.5
67   0  0  0  0   1  0.050  0  1   0.5
68   1  1  0  1   1  0.025  0  1   0.5
69   0  0  0  0   1  0.025  0  1   0.5
70   0  0  0  0   1  0.050  0  1   0.5
Chromosome 8
71   0  0  0  0   1  8      0  1   0.5
72   0  0  0  0   1  0.050  0  1   0.5
73   1  1  0  1   1  0.025  0  1   0.5
74   0  0  0  0   1  0.025  0  1   0.5
75   0  0  0  0   1  0.050  0  1   0.5
76   0  0  0  0   1  0.050  0  1   0.5
77   0  0  0  0   1  0.050  0  1   0.5
78   1  1  0  1   1  0.025  0  1   0.5
79   0  0  0  0   1  0.025  0  1   0.5
80   0  0  0  0   1  0.050  0  1   0.5
Chromosome 9
81   0  0  0  0   1  9      0  1   0.5
82   0  0  0  0   1  0.050  0  1   0.5
83   1  1  0  1   1  0.025  0  1   0.5
84   0  0  0  0   1  0.025  0  1   0.5
85   0  0  0  0   1  0.050  0  1   0.5
86   0  0  0  0   1  0.050  0  1   0.5
87   0  0  0  0   1  0.050  0  1   0.5
88   1  1  0  1   1  0.025  0  1   0.5
89   0  0  0  0   1  0.025  0  1   0.5
90   0  0  0  0   1  0.050  0  1   0.5
Chromosome 10
91   0  0  0  0   1  10     0  1   0.5
92   0  0  0  0   1  0.050  0  1   0.5
93   1  1  0  1   1  0.025  0  1   0.5
94   0  0  0  0   1  0.025  0  1   0.5
95   0  0  0  0   1  0.050  0  1   0.5
96   0  0  0  0   1  0.050  0  1   0.5
97   0  0  0  0   1  0.050  0  1   0.5
98   1  1  0  1   1  0.025  0  1   0.5
99   0  0  0  0   1  0.025  0  1   0.5
100  0  0  0  0   1  0.050  0  1   0.5

Figure A2.8 A section of the QUGENE engine input file showing the marker and QTL gene action setup for a 10 chromosome, two QTL per chromosome, four flanking markers per QTL genome model (Section 5.2.1.1.4). QTL are highlighted in blue, markers are left black. Heritability for the trait was set as 1. Abbreviations are outlined at end of this Section


Abbreviations

GN   Gene Number
M    Midpoint value
A    Additive effect
D    Dominance effect
AT   Indicates which attribute the gene is contributing towards (0 = marker)
L    Linkage
LN   Per meiosis recombination fraction between gene n and gene n-1 (whole number indicates start of a new chromosome)
K    Epistasis network (0 = no epistasis)
E1   Environment 1 gene effect
P    Starting gene frequency of the favourable allele
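The column layout described by these abbreviations can be read programmatically. The sketch below is an illustration only and is not the QU-GENE parser: the `Gene` class, the `parse_genes` function and the rule that an LN token without a decimal point starts a new chromosome are assumptions based on the abbreviation list and the excerpts in Figures A2.5 to A2.8.

```python
from dataclasses import dataclass
from typing import List, Optional

FIELDS = ["GN", "M", "A", "D", "AT", "L", "LN", "K", "E1", "P"]

# Model 1 excerpt (Figure A2.5), including the header and chromosome label.
MODEL_1 = """\
GN M A D AT L LN K E1 P
Chromosome 1
1 0 0 0 0 1 1 0 1 0.5
2 1 1 0 1 1 0.100 0 1 0.5
3 0 0 0 0 1 0.100 0 1 0.5
"""


@dataclass
class Gene:
    gn: int                        # gene number
    is_qtl: bool                   # AT != 0 means the gene contributes to a trait
    chromosome: int                # chromosome the gene is placed on
    rec_fraction: Optional[float]  # fraction to the previous gene; None at a chromosome start


def parse_genes(text: str) -> List[Gene]:
    genes: List[Gene] = []
    chromosome = 0
    for line in text.splitlines():
        tokens = line.split()
        # Keep only data rows: ten columns beginning with a gene number.
        if len(tokens) != len(FIELDS) or not tokens[0].isdigit():
            continue
        row = dict(zip(FIELDS, tokens))
        if "." not in row["LN"]:
            # Assumed rule: a whole number in LN marks the start of that chromosome.
            chromosome = int(row["LN"])
            rec = None
        else:
            rec = float(row["LN"])
        genes.append(Gene(int(row["GN"]), row["AT"] != "0", chromosome, rec))
    return genes


if __name__ == "__main__":
    for gene in parse_genes(MODEL_1):
        print(gene)
```

Grouping the parsed genes by `chromosome` and counting `is_qtl` flags gives a quick check that an excerpt matches the model description, for example one QTL and two markers on one chromosome for Model 1.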


APPENDIX 3

ADDITIONAL INFORMATION

ASSOCIATED WITH CHAPTER 8

A3.1 Number of QTL detected

A number of two-factor interactions were significant (Chapter 8, Table 8.4) for

the number of QTL detected. The heritability × per meiosis recombination fraction (h2×

c) interaction was significant with a heritability of h2 = 1.0 detecting a higher number of

QTL on average than a heritability of h2 = 0.25 for all per meiosis recombination

fractions. For both heritability levels a per meiosis recombination fraction of c = 0.2

detected fewer QTL on average than a per meiosis recombination fraction of c = 0.1 and c

= 0.01 (Figure A3.1a). There was a significant gene frequency × per meiosis recombina-

tion fraction (GF × c) interaction (Figure A3.1b), where the number of QTL detected

increased as the per meiosis recombination fraction decreased for both starting gene

frequencies. On average more QTL were detected for a starting gene frequency of GF =

0.5 than for a starting gene frequency of GF = 0.1 over all per meiosis recombination

fractions. Heritability had a significant interaction with mapping population size (h2 ×

MP, Figure A3.1c). At the lower heritability of h2 = 0.25, fewer QTL were detected

with the smaller mapping population size of 200 individuals, than a mapping population

size of 500 and 1000 individuals. With a heritability of h2 = 1.0, mapping population

size was of less importance and there was little change in the number of QTL detected

with a change in mapping population size. Heritability also interacted significantly with

gene frequency (GF × h2, Figure A3.1d), where the number of QTL detected was

greater at a heritability of h2 = 1.0 than at h2 = 0.25, with the higher gene frequency of

GF = 0.5.


[Figure A3.1 plots not reproduced. Panels: (a) h2 x c, x-axis: recombination fraction (0.01, 0.1, 0.2); (b) GF x c, x-axis: recombination fraction (0.01, 0.1, 0.2); (c) h2 x MP, x-axis: mapping population size (200, 500, 1000); (d) GF x h2, x-axis: heritability (0.25, 1). y-axis: average no. of QTL detected (0 to 8). Legends: h2 = 0.25, h2 = 1.0; GF = 0.1, GF = 0.5. lsd values printed with the panels: 0.76, 0.76, 0.76, 0.62.]

Figure A3.1 Significant first-order interactions from the analysis of variance for the number of QTL detected. c = per meiosis recombination fraction, h2 = heritability, GF = gene frequency, MP = mapping population size

A3.2 Response to selection: phenotypic selection, marker selection, and marker-assisted selection

A number of two-factor interactions were significant (Chapter 8, Table 8.5) for

the response to selection. Selection strategy interacted significantly with the starting

gene frequency (SS × GF, Figure A3.2a) and per meiosis recombination fraction (SS ×

c, Figure A3.2b). While there was no change in the rank of the three selection strategies

with a change in starting gene frequency, the marker selection trait mean value was

lower relative to phenotypic selection and marker-assisted selection with a starting gene

frequency of GF = 0.1 in comparison to a starting gene frequency of GF = 0.5 (Figure

A3.2a). As no marker information was used in phenotypic selection, recombination fraction had no influence on this strategy, but for both marker selection and marker-assisted selection there was a reduction in the trait mean as linkage weakened, i.e. as the per meiosis recombination fraction increased (Figure A3.2b).

A range of first-order interactions involving heritability were also significant.

Heritability interacted significantly with selection strategy (SS × h2, Figure A3.2c),


starting gene frequency (GF × h2, Figure A3.2d) and per meiosis recombination fraction

(c × h2, Figure A3.2e). Heritability had no effect on phenotypic selection or marker-assisted selection, but at a heritability of h2 = 0.25 marker selection had a lower trait mean value than at a heritability of h2 = 1.0 (Figure A3.2c). A starting gene frequency of

GF = 0.5 had a higher trait mean value than a starting gene frequency of GF = 0.1 over

both heritability levels (Figure A3.2d). With a heritability of h2 = 0.25 there was a

significant difference in the trait mean value of the three per meiosis recombination

fractions with c = 0.01 having the highest trait mean value and c = 0.2 having the lowest

trait mean value. With a heritability of h2 = 1.0 there was no significant difference

between a per meiosis recombination fraction of c = 0.01 and c = 0.1, and a per meiosis

recombination fraction of c = 0.2 had the lowest trait mean value (Figure A3.2e).

[Figure A3.2 plots not reproduced. Panels: (a) SS x GF, x-axis: gene frequency (0.1, 0.5); (b) SS x c, x-axis: recombination fraction (0.01, 0.1, 0.2); (c) SS x h2, x-axis: heritability (0.25, 1); (d) GF x h2, x-axis: heritability (0.25, 1); (e) c x h2, x-axis: heritability (0.25, 1). y-axis: trait mean value (%TG) (0 to 100). Legends: PS, MS, MAS; GF = 0.1, GF = 0.5; c = 0.01, c = 0.1, c = 0.2. lsd values printed with the panels: 1.04, 0.85, 1.27, 1.04, 1.04.]

Figure A3.2 Remaining significant first-order interactions from the analysis of variance for the response to selection. Response to selection expressed relative to the maximum potential response to selection (%TG) where TG = target genotype. SS = selection strategy, c = per meiosis recombination fraction, h2 = heritability, GF = gene frequency

The following sets of figures show the trait mean value for phenotypic selection,

marker selection and marker-assisted selection over 10 cycles of selection for a range of

heritability levels, starting gene frequencies and mapping population sizes, for a per

meiosis recombination fraction of c = 0.1. For a starting gene frequency of GF = 0.1

(Figure A3.3) both phenotypic selection and marker-assisted selection achieved the

target genotype by cycle eight. Marker selection rapidly fixed the favourable alleles of


the QTL detected in the mapping study by cycle two. Marker-assisted selection had a

higher trait mean value than marker selection over all cycles of selection and a higher

trait mean value than phenotypic selection over the first seven to eight cycles of

selection. The main impact of heritability was for the marker selection strategy where a

heritability of h2 = 1.0 gave a 4% higher trait mean value than a heritability of h2 = 0.25,

with a mapping population size of 200 individuals.

[Figure A3.3 plots not reproduced. Panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. x-axis: cycles (0 to 10); y-axis: trait mean value (%TG) (0 to 100); legend: PS, MS, MAS; GF = 0.1, c = 0.1.]

Figure A3.3 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.1, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.1, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

With an increase in the starting gene frequency to GF = 0.5 in the base popula-

tion from which the 10 parents were drawn, there was an increase in the response to

selection (Figure A3.4) over the case where the starting gene frequency was GF = 0.1

(Figure A3.3). A higher favourable allele frequency in the base population of GF = 0.5,

resulted in a higher trait mean value at cycle zero compared to the starting gene

frequency of GF = 0.1.


[Figure A3.4 plots not reproduced. Panels: (a) 1(10:0) h2=0.25, MP=200; (b) 1(10:0) h2=0.25, MP=500; (c) 1(10:0) h2=0.25, MP=1000; (d) 1(10:0) h2=1.0, MP=200; (e) 1(10:0) h2=1.0, MP=500; (f) 1(10:0) h2=1.0, MP=1000. x-axis: cycles (0 to 10); y-axis: trait mean value (%TG) (0 to 100); legend: PS, MS, MAS; GF = 0.5, c = 0.1.]

Figure A3.4 Response to selection expressed as percentage of target genotype (average of the five bi-parental mapping population replicates) for phenotypic selection (PS), marker selection (MS) and marker-assisted selection (MAS) over 10 cycles of the Germplasm Enhancement Program. E(NK) = 1(10:0), GF = 0.5, h2 = 0.25 (a-c) and h2 = 1.0 (d-f), c = 0.1, and three mapping population sizes (MP = 200, 500, 1000). TG = target genotype

With a per meiosis recombination fraction of c = 0.1, marker-assisted selection

had the fastest increase in trait mean value, with the target genotype being reached in

cycle three or four, and cycle four for phenotypic selection, as opposed to cycle 8 with a

starting gene frequency of GF = 0.1 (Figure A3.3). The trait mean values for marker-assisted selection and marker selection were slightly lower (7% - 0.5% for marker-assisted selection and 20% - 1% for marker selection) with the low heritability as not all of the segregating QTL were detected (Chapter 8, Table 8.3 and Figure A3.4a, b and c).

All QTL were detected for a heritability of h2 = 1.0, resulting in a similar response

being observed for the three selection strategies for the models tested (Figure A3.4d, e

and f).


APPENDIX 4

ANALYSES OF VARIANCE FOR FACTORS

AFFECTING THE DETECTION OF QTL

AND RESPONSE TO SELECTION

A4.1 Factors affecting QTL segregation and detection An analysis of variance was conducted on the percentage of QTL segregating in

the mapping population (Table A4.1). The model used for this analysis is shown in

Chapter 9 as Equation (9.1). The significant main effects were gene frequency, number

of environment-types and epistatic model.

Table A4.1 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF) and first-order interactions affecting the percent of QTL segregating. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source      DF    F value
GF           1    157352.7 *
E            3    4.6 *
K            3    649.7 *
h2           1    0.0
c            1    0.0
GF × E       3    5.2 *
GF × K       3    1736.1 *
GF × c       1    0.0
GF × h2      1    0.0
E × K        9    13.2 *
E × c        3    0.0
E × h2       3    0.0
K × c        3    0.0
K × h2       3    0.0
c × h2       1    0.0
Error       88    σ2 = 0.17
Total      127
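The degrees of freedom in Table A4.1 can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes a full factorial with one observation per factor combination, with the numbers of levels read from the figure legends in this appendix (GF: 0.1, 0.5; E: 1, 2, 5, 10; K: 0, 1, 2, 5; h2: 0.1, 1.0; c: 0.05, 0.1), and it assumes that everything beyond main effects and first-order interactions is pooled into the error term. The names `LEVELS` and `anova_df` are hypothetical, and Equation (9.1) itself is not reproduced here.

```python
from itertools import combinations
from math import prod

# Number of levels per factor (assumption: read from the figure legends).
LEVELS = {"GF": 2, "E": 4, "K": 4, "h2": 2, "c": 2}

def anova_df(levels):
    """Degrees of freedom for main effects, first-order interactions,
    error and total, assuming one observation per factor combination."""
    main = {name: n - 1 for name, n in levels.items()}
    # DF of a two-factor interaction is the product of the two main-effect DF.
    inter = {f"{a} x {b}": main[a] * main[b] for a, b in combinations(levels, 2)}
    total = prod(levels.values()) - 1
    # Everything not fitted explicitly is pooled into the error term.
    error = total - sum(main.values()) - sum(inter.values())
    return main, inter, error, total

main_df, inter_df, error_df, total_df = anova_df(LEVELS)
print(main_df, inter_df, error_df, total_df)
```

Under these assumptions the 2 × 4 × 4 × 2 × 2 = 128 combinations give a total of 127 degrees of freedom, 9 for main effects, 30 for first-order interactions and 88 for error, which agrees with the Error and Total rows of the table.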


Significant first-order interactions for the percentage of QTL segregating in the mapping population included the starting gene frequency × number of environment-types (GF × E) interaction. While this interaction was declared significant, graphical analysis showed that the differences were extremely small. Therefore, on average, each number of environment-types responded similarly within each gene frequency. The percent of QTL segregating for each number of environment-types was higher with a starting gene frequency of GF = 0.5 than with GF = 0.1 (Figure A4.1a). For the starting gene frequency × epistasis level (GF × K) interaction (Figure A4.1b), as for the GF × E interaction, the percent of QTL segregating increased as the starting gene frequency increased. Among epistasis levels K = 1, K = 2 and K = 5 the ranking was consistent across both starting gene frequencies, with K = 1 having the highest percent of QTL segregating, followed by K = 2 and K = 5. There was, however, a change in the ranking of epistasis level K = 0 relative to the remaining epistasis levels over the two starting gene frequencies: K = 0 had the lowest percent of QTL segregating for a starting gene frequency of GF = 0.1 and the highest percent of QTL segregating for a starting gene frequency of GF = 0.5. Although the epistasis level × number of environment-types (K × E) interaction was declared significant, each epistasis level had approximately the same percent of QTL segregating at each number of environment-types (Figure A4.1c). The ranking of epistasis levels for percent of QTL segregating at each number of environment-types was K = 1 > K = 0 > K = 2 > K = 5.

[Figure A4.1: three panels plotting percent of QTL segregating (0 to 70). (a) GF x E: against number of environment-types (1, 2, 5, 10), for GF = 0.1 and GF = 0.5. (b) GF x K: against epistasis level (0, 1, 2, 5), for GF = 0.1 and GF = 0.5. (c) K x E: against number of environment-types (1, 2, 5, 10), for K = 0, K = 1, K = 2 and K = 5. lsd values printed with the figure: 0.29, 0.29, 0.41.]

Figure A4.1 Significant first-order interactions from the analysis of variance for the percent of QTL segregating. GF = starting gene frequency, K = epistasis level, and E = number of environment-types
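The lsd values printed with Figure A4.1 can be related to the error mean square in Table A4.1. This section does not state the formula used, so the sketch below assumes the conventional Fisher least significant difference, lsd = t × sqrt(2 × error mean square / n), where n is the number of observations behind each interaction mean. It further assumes 128 observations in total (Total DF = 127), so that an interaction with 8 cells has n = 16 and one with 16 cells has n = 8. The function names are hypothetical, and the t quantile is a Cornish-Fisher approximation rather than an exact value.

```python
from math import sqrt
from statistics import NormalDist

def t_quantile(p, df):
    """Approximate Student-t quantile from the normal quantile using a
    Cornish-Fisher expansion (adequate for large df such as 88)."""
    z = NormalDist().inv_cdf(p)
    return (z
            + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))

def lsd(error_ms, error_df, n_per_mean, alpha=0.05):
    """Fisher's least significant difference between two means that are
    each based on n_per_mean observations (assumed formula)."""
    return t_quantile(1 - alpha / 2, error_df) * sqrt(2 * error_ms / n_per_mean)

# Table A4.1: error mean square 0.17 on 88 degrees of freedom.
lsd_8_cells = lsd(0.17, 88, 128 // 8)    # GF x E and GF x K: 2 x 4 cell means
lsd_16_cells = lsd(0.17, 88, 128 // 16)  # K x E: 4 x 4 cell means
print(round(lsd_8_cells, 2), round(lsd_16_cells, 2))
```

Under these assumptions the two values round to 0.29 and 0.41, the same values printed with the figure. Two interaction means that differ by less than the relevant lsd would not be declared different at α = 0.05.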


An analysis of variance was conducted on the percentage of QTL detected in the

mapping population (Table A4.2). The model used for this analysis is shown in Chapter

9 as Equation (9.1). All main effects were significant (p < 0.05, Table A4.2).

Table A4.2 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF) and first-order interactions affecting the percent of QTL detected. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source      DF    F value
GF           1    4862.2 *
E            3    23.7 *
K            3    111.7 *
h2           1    284.1 *
c            1    31.8 *
GF × E       3    5.2 *
GF × K       3    116.6 *
GF × c       1    7.5 *
GF × h2      1    46.8 *
E × K        9    2.1 *
E × c        3    0.1
E × h2       3    21.0 *
K × c        3    0.2
K × h2       3    43.5
c × h2       1    0.0
Error       88    σ2 = 0.17
Total      127

There were a number of significant first-order interactions that affected the percent of QTL detected (Table A4.2). For the starting gene frequency × number of environment-types (GF × E) interaction, there was no significant difference between E = 5 and E = 10 environment-types, or between E = 1 and E = 2 environment-types, for a starting gene frequency of GF = 0.1 (Figure A4.2a). There was also no significant difference between E = 1 and E = 2 environment-types for a starting gene frequency of GF = 0.5. The per meiosis recombination fraction × starting gene frequency (c × GF) interaction (Figure A4.2b) and the heritability × starting gene frequency (h2 × GF) interaction (Figure A4.2c) were both significant. For the epistasis × number of environment-types (K × E) interaction, there was a significant difference between all epistasis levels and numbers of environment-types (Figure A4.2d).


[Figure A4.2: four panels plotting percent of QTL detected (0 to 60). (a) GF x E: against number of environment-types (1, 2, 5, 10), for GF = 0.1 and GF = 0.5. (b) c x GF: against gene frequency (0.1, 0.5), for c = 0.05 and c = 0.1. (c) h2 x GF: against gene frequency (0.1, 0.5), for h2 = 0.1 and h2 = 1.0. (d) K x E: against number of environment-types (1, 2, 5, 10), for K = 0, K = 1, K = 2 and K = 5. lsd values printed with the figure: 1.15, 0.81, 0.81, 1.63.]

Figure A4.2 Significant first-order interactions from the analysis of variance for the percent of QTL detected. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types, c = per meiosis recombination fraction, and h2 = heritability

An analysis of variance was conducted on the percentage of QTL detected of those segregating in the mapping population (Table A4.3). The model used for this analysis is shown as Equation (9.1) in Chapter 9. All main effects were significant (p < 0.05, Table A4.3).

There were significant first-order interactions that affected the percent of QTL detected of those segregating (Table A4.3). There was a significant starting gene frequency × epistasis level (GF × K) interaction (Figure A4.3a). For this interaction there was a re-ranking of epistatic level K = 5 relative to K = 0, K = 1, and K = 2 for the percent of QTL detected of those segregating. At a starting gene frequency of GF = 0.1 there was no difference in the percent of QTL detected of those segregating for epistatic levels K = 1, K = 2, and K = 5, with K = 0 having the lowest percent of QTL detected of those segregating. With an increase in the starting gene frequency to GF = 0.5, all epistatic levels were significantly different, with epistatic level K = 5 having the lowest percent of QTL detected of those segregating (Figure A4.3a). There was no difference in the percent of QTL detected of those segregating between epistatic levels K = 1 and K = 2 over all numbers of environment-types (Figure A4.3b).

Table A4.3 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF) and first-order interactions affecting the percent of QTL detected of those segregating. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source      DF    F value
GF           1    597.2 *
E            3    66.5 *
K            3    63.4 *
h2           1    790.4 *
c            1    85.6 *
GF × E       3    0.7
GF × K       3    63.4 *
GF × c       1    3.2
GF × h2      1    3.8
E × K        9    6.1 *
E × c        3    0.7
E × h2       3    57.0 *
K × c        3    0.4
K × h2       3    112.6 *
c × h2       1    0.1
Error       88    σ2 = 4.2
Total      127

[Figure A4.3: two panels plotting percent of QTL detected of those segregating (0 to 100). (a) GF x K: against epistasis level (0, 1, 2, 5), for GF = 0.1 and GF = 0.5. (b) K x E: against number of environment-types (1, 2, 5, 10), for K = 0, K = 1, K = 2 and K = 5. lsd values printed with the figure: 1.45, 2.06.]

Figure A4.3 Significant first-order interactions from the analysis of variance for the percent of QTL detected of those segregating. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level and E = number of environment-types

An analysis of variance was conducted on the percentage of QTL detected with incorrect allele associations (Table A4.4). The model used for this analysis is shown in Chapter 9 as Equation (9.1). All main effects except for per meiosis recombination fraction were significant (p < 0.05, Table A4.4).

Table A4.4 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF) and first-order interactions affecting the percent of QTL detected with incorrect marker-QTL allele association. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source      DF    F value
GF           1    773.5 *
E            3    51.9 *
K            3    6046.1 *
h2           1    24.8 *
c            1    0.4
GF × E       3    11.7 *
GF × K       3    266.2 *
GF × c       1    0.0
GF × h2      1    0.2
E × K        9    45.6 *
E × c        3    0.1
E × h2       3    5.1 *
K × c        3    0.1
K × h2       3    9.8 *
c × h2       1    0.0
Error       88    σ2 = 1.9
Total      127

There were a number of significant first-order interactions from the analysis of variance for the percent of QTL detected with incorrect marker-QTL allele associations (Table A4.4). The starting gene frequency × epistasis (GF × K) interaction was significant and all epistasis levels were different for the percentage of incorrect marker-QTL allele associations at the two gene frequencies (Figure A4.4a). No re-ranking of epistatic levels occurred across the two starting gene frequencies: epistasis level K = 5 had the highest percent of QTL detected with incorrect marker-QTL allele associations, followed by K = 2 and K = 1, with K = 0 having the lowest (Figure A4.4a). There was a significant interaction between heritability and epistasis level (h2 × K) for percent of QTL detected with incorrect marker-QTL allele associations (Figure A4.4b). There was no difference in the percent of QTL detected with incorrect marker-QTL allele associations for epistatic level K = 5 or K = 2 across the two heritability levels (Figure A4.4b). There was a significant interaction between starting gene frequency and the number of environment-types (GF × E) for percent of QTL detected with incorrect marker-QTL allele associations (Figure A4.4c). As the gene frequency increased from GF = 0.1 to GF = 0.5, the percent of QTL detected with incorrect marker-QTL allele associations decreased for each number of environment-types. For E = 1, E = 2 and E = 5 environment-types there was no difference in the percent of QTL detected with incorrect marker-QTL allele associations with a starting gene frequency of GF = 0.1. All numbers of environment-types were different with a starting gene frequency of GF = 0.5.

[Figure A4.4: three panels plotting percent of QTL detected with incorrect allele associations (IAA, 0 to 60). (a) GF x K: against epistasis level (0, 1, 2, 5), for GF = 0.1 and GF = 0.5. (b) h2 x K: against epistasis level (0, 1, 2, 5), for h2 = 0.1 and h2 = 1.0. (c) GF x E: against number of environment-types (1, 2, 5, 10), for GF = 0.1 and GF = 0.5. lsd values printed with the figure: 0.96, 0.96, 0.96.]

Figure A4.4 Significant first-order interactions from the analysis of variance for the percent of QTL detected with incorrect marker-QTL allele associations. All effect levels were significantly different except for those indicated by the same letter. GF = starting gene frequency, K = epistasis level, E = number of environment-types and h2 = heritability

A4.2 Analysis of response to selection

An analysis of variance was conducted on the response to selection over 10 cycles of selection in the Germplasm Enhancement Program (Table A4.5). The model used for this analysis is shown in Chapter 9 as Equation (9.2). All main effects were significant (p < 0.05, Table A4.5).

Table A4.5 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF), population type (PT), selection strategy (SS), cycles (cyc) and first-order interactions affecting the response to selection over 10 cycles of selection. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source        DF    F value
E              3    2300.2 *
K              3    93.8 *
c              1    3.9 *
h2             1    2432.7 *
GF             1    13156.5 *
PT             1    2303.5 *
SS             2    11876.2 *
cyc           10    3543.6 *
E × K          9    21.0 *
E × c          3    0.0
E × h2         3    40.6 *
E × GF         3    37.9 *
E × PT         3    14.2 *
E × SS         6    136.4 *
E × cyc       30    28.1 *
K × c          3    1.1
K × h2         3    356.2 *
K × GF         3    6080.5 *
K × PT         3    21.4 *
K × SS         6    57.3 *
K × cyc       30    100.1 *
c × h2         1    0.3
c × GF         1    0.1
c × PT         1    0.4
c × SS         2    6.2 *
c × cyc       10    0.1
h2 × GF        1    128.2 *
h2 × PT        1    110.9 *
h2 × SS        2    506.2 *
h2 × cyc      10    36.4 *
GF × PT        1    0.1
GF × SS        2    0.4
GF × cyc      10    22.1 *
PT × SS        2    225.9 *
PT × cyc      10    42.4 *
SS × cyc      20    399.3 *
Error       8246    σ2 = 19.4
Total       8447

The remaining significant first-order interactions not presented in Chapter 9, Figure 9.13 are presented here as Figure A4.5. Many of the first-order interactions for the trait mean value over 10 cycles of selection in the Germplasm Enhancement Program were significant (Table A4.5). For the epistasis × number of environment-types (K × E) interaction, the trait mean value decreased as the number of environment-types increased (Figure A4.5a). There was a significant difference between all heritability levels and numbers of environment-types for the heritability × number of environment-types (h2 × E) interaction, with the trait mean value decreasing as the number of environment-types increased for both heritability levels (Figure A4.5b). For the starting gene frequency × number of environment-types (GF × E) interaction, the trait mean value decreased as the number of environment-types increased for both starting gene frequencies (Figure A4.5c). There was a significant difference between all heritability levels and epistasis levels for the heritability × epistasis (h2 × K) interaction, with the trait mean value appearing to increase as the level of epistasis increased for both heritability levels (Figure A4.5d). For the starting gene frequency × epistasis (GF × K) interaction, the trait mean value decreased as the level of epistasis increased for a starting gene frequency of GF = 0.5 and increased for a starting gene frequency of GF = 0.1 (Figure A4.5e). There was a significant difference between all selection strategies and per meiosis recombination fractions for the selection strategy × per meiosis recombination fraction (SS × c) interaction, with the trait mean value being lowest for marker selection and highest for marker-assisted selection for both per meiosis recombination fractions (Figure A4.5f).

For the starting gene frequency × heritability (GF × h2) interaction, the trait mean value decreased as the starting gene frequency decreased (Figure A4.5g). There was a significant difference between all selection strategies and heritability levels for the selection strategy × heritability (SS × h2) interaction, with the trait mean value being lowest for marker selection and highest for marker-assisted selection for both heritability levels (Figure A4.5h). There was also a significant difference between the two population types and heritability levels for the population type × heritability (PT × h2) interaction, with the trait mean value being lowest for the S1 families and highest for the DH lines for both heritability levels (Figure A4.5i). For the heritability × cycles (h2 × cycles) interaction, the higher heritability had a higher trait mean value than the lower heritability over all cycles (Figure A4.5j). For the starting gene frequency × cycles (GF × cycles) interaction, the larger starting gene frequency had a higher trait mean value than the lower starting gene frequency over all cycles (Figure A4.5k). For the selection strategy × cycle (SS × cycle) interaction there was no significant difference at cycle zero between the three selection strategies (Figure A4.5l). The trait mean value for the three strategies thereafter changed with cycles. For both phenotypic selection and marker-assisted selection there was an increase in the trait mean value across all 10 cycles. Initially, marker-assisted selection resulted in a greater rate of increase than phenotypic selection and marker selection. The trait mean value for marker selection increased until cycle two, after which there was no further increase. Thus, it was inferred that the effect of the markers in the marker-assisted selection strategy also occurred predominantly in the early cycles of selection. In the long term, phenotypic selection produced a comparable response to marker-assisted selection.


[Figure A4.5: sixteen panels plotting trait mean value (% of TG, 0 to 100). Against number of environment-types (1, 2, 5, 10): (a) K x E, (b) h2 x E, (c) GF x E, (p) PT x E. Against epistasis level (0, 1, 2, 5): (d) h2 x K, (e) GF x K, (o) PT x K. Against recombination fraction (0.05, 0.1): (f) SS x c. Against heritability (0.1, 1): (g) GF x h2, (h) SS x h2, (i) PT x h2. Against cycles (0 to 10): (j) h2 x cycles, (k) GF x cycles, (l) SS x cycle, (m) PT x cycle, (n) E x cycle. Legend levels: K = 0, 1, 2, 5; h2 = 0.1, 1.0; GF = 0.1, 0.5; PS, MS, MAS; S1, DH; E = 1, 2, 5, 10. lsd values printed with the figure: 0.54, 0.38, 0.38, 0.38, 0.38, 0.33, 0.27, 0.33, 0.27, 0.63, 0.63, 0.78, 0.63, 0.89, 0.38, 0.38.]

Figure A4.5 Remaining significant first-order interactions from the analysis of variance conducted over 10 cycles of the Germplasm Enhancement Program (Table A4.5). GF = starting gene frequency, K = epistasis level, E = number of environment-types, h2 = heritability, SS = selection strategy, PT = population type


There was a significant population type × cycle (PT × cycle) interaction (Figure

A4.5m), where selection based on DH lines achieved a higher response to selection than

selection on S1 families for all cycles. Increasing the level of G×E interaction by

increasing the number of environment-types in the target population of environments

reduced the trait mean value for the number of environment-types × cycle (E × cycle)

interaction (Figure A4.5n). Over all cycles of selection, one environment-type (i.e. no

G×E interaction) had the highest trait mean value followed by E = 2, E = 5 and E = 10

environment-types (Figure A4.5n). For the population type × epistasis (PT × K)

interaction (Figure A4.5o) and population type × number of environment-types (PT × E)

interaction (Figure A4.5p), DH lines produced a higher trait mean value than S1

families.

An analysis of variance was conducted on the response to selection at cycle five of the Germplasm Enhancement Program (Table A4.6). The model used for this analysis is shown in Chapter 9 as Equation (9.3). All main effects except for per meiosis recombination fraction were significant (p < 0.05, Table A4.6).

Many of the first-order interactions for the trait mean value at cycle five of selection in the Germplasm Enhancement Program were significant (Table A4.6). There was a significant difference between all heritability levels and numbers of environment-types for the heritability × number of environment-types (h2 × E) interaction, with the trait mean value decreasing as the number of environment-types increased for both heritability levels (Figure A4.6a). For the starting gene frequency × number of environment-types (GF × E) interaction, the trait mean value decreased as the number of environment-types increased for both starting gene frequencies (Figure A4.6b). There was a significant difference between all heritability levels and epistasis levels for the heritability × epistasis (h2 × K) interaction, with the trait mean value appearing to increase as the level of epistasis increased for both heritability levels (Figure A4.6c). For the starting gene frequency × epistasis (GF × K) interaction, the trait mean value decreased as the level of epistasis increased for a starting gene frequency of GF = 0.5 and increased for a starting gene frequency of GF = 0.1 (Figure A4.6d). For the starting gene frequency × heritability (GF × h2) interaction, the trait mean value decreased as the starting gene frequency decreased (Figure A4.6e). There was a significant difference between all selection strategies and heritability levels for the selection strategy × heritability (SS × h2) interaction, with the trait mean value being lowest for marker selection and highest for marker-assisted selection for both heritability levels (Figure A4.6f). There was also a significant difference between the two population types and heritability levels for the population type × heritability (PT × h2) interaction, with the trait mean value being lowest for the S1 families and highest for the DH lines for both heritability levels (Figure A4.6g).

Table A4.6 Degrees of freedom (DF) and F values shown for per meiosis recombination fraction (c), heritability (h2), number of environment-types (E), epistatic model (K), gene frequency (GF), population type (PT), selection strategy (SS) and first-order interactions affecting the response to selection at cycle 5 of the Germplasm Enhancement Program. σ2 = error mean square. * significant value at α = 0.05, F distribution

Source       DF    F value
E             3    353.2 *
K             3    20.3 *
c             1    0.5
h2            1    489.7 *
GF            1    1302.5 *
PT            1    366.3 *
SS            2    1851.6 *
E × K         9    3.3 *
E × c         3    0.0
E × h2        3    6.8 *
E × GF        3    5.6 *
E × PT        3    0.9
E × SS        6    24.2 *
K × c         3    0.1
K × h2        3    62.1 *
K × GF        3    621.3 *
K × PT        3    3.7
K × SS        6    8.2 *
c × h2        1    0.1
c × GF        1    0.0
c × PT        1    0.1
c × SS        2    0.8
h2 × GF       1    25.5 *
h2 × PT       1    18.3 *
h2 × SS       2    106.4 *
GF × PT       1    1.7
GF × SS       2    0.3
PT × SS       2    57.5 *
Error       696    σ2 = 16.5
Total       767


[Figure A4.6: eleven panels plotting trait mean value (% of TG, 0 to 100). Against number of environment-types (1, 2, 5, 10): (a) h2 x E, (b) GF x E, (i) K x E, (k) SS x E. Against epistasis level (0, 1, 2, 5): (c) h2 x K, (d) GF x K, (j) SS x K. Against heritability (0.1, 1): (e) GF x h2, (f) SS x h2, (g) PT x h2. Against population type (S1, DH): (h) SS x PT. Legend levels: h2 = 0.1, 1.0; GF = 0.1, 0.5; PS, MS, MAS; S1, DH; K = 0, 1, 2, 5. lsd values printed with the figure: 0.83, 1.01, 1.65, 1.17, 0.83, 1.01, 1.17, 1.17, 1.17, 1.43, 1.43.]

Figure A4.6 Remaining significant first-order interactions from the analysis of variance conducted at cycle five of the Germplasm Enhancement Program (Table A4.6). GF = starting gene frequency, K = epistasis level, E = number of environment-types, h2 = heritability, SS = selection strategy, PT = population type

For the selection strategy × population type (SS × PT) interaction at cycle five, the ranking for trait mean value was DH-MAS > DH-PS > S1-MAS > S1-PS > DH-MS > S1-MS (Figure A4.6h). The DH lines always produced a higher trait mean value than S1 families for each strategy (Figure A4.6h). For the epistasis × number of environment-types (K × E) interaction (Figure A4.6i), each epistasis level generally gave a similar response for each of the environment-type levels. All epistasis levels produced the same trait mean value with 10 environment-types. For both the selection strategy × epistasis (SS × K) interaction (Figure A4.6j) and the selection strategy × number of environment-types (SS × E) interaction (Figure A4.6k), marker-assisted selection had a higher response than phenotypic selection and marker selection.


A4.3 Response to selection results

The following sets of subfigures illustrate the complete set of genetic models that were tested in the Chapter 9 experiment. Each set of figures is titled using the E(NK) nomenclature, the starting gene frequency (GF) and the heritability (h2). The two rows of plots within each set represent the two per meiosis recombination fractions (RF). The first three plots in each row show the percentage of runs (% Runs) falling in each 10% class for the percentage of QTL segregating (Possible QTLs (%)), the percentage of QTL detected (Found QTLs (%)), and the percentage of detected QTL that had incorrect marker-QTL allele associations (Incorrect allele id (%)). The fourth plot shows the averages of these measures: the percent of QTL segregating (Possible), the percent of QTL detected (Found), the percent of QTL detected of those segregating (Fnd/Poss), the percentage of QTL detected with incorrect marker-QTL allele associations (Incorrect) and the percentage of QTL detected with incorrect marker-QTL allele associations out of the total number of QTL in the genetic model (Inc*Fnd). The remaining two plots show the response to selection as a percentage of the target genotype for phenotypic selection, marker selection and marker-assisted selection for both S1 families and DH lines under the relevant genetic model.
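The five summary measures and the 10% classes described above can be written out explicitly. The sketch below is one reading of those definitions, not the thesis software: the function names, the choice of denominators (total QTL for Possible, Found and Inc*Fnd; segregating QTL for Fnd/Poss; detected QTL for Incorrect), the handling of class boundaries and the example counts are all assumptions made for illustration.

```python
from collections import Counter

def qtl_summary(n_total, n_segregating, n_detected, n_incorrect):
    """Summary percentages for one run (denominators are assumptions)."""
    return {
        "Possible": 100 * n_segregating / n_total,
        "Found": 100 * n_detected / n_total,
        "Fnd/Poss": 100 * n_detected / n_segregating if n_segregating else 0.0,
        "Incorrect": 100 * n_incorrect / n_detected if n_detected else 0.0,
        # Incorrectly associated QTL as a share of all QTL in the model,
        # equal to Incorrect x Found / 100.
        "Inc*Fnd": 100 * n_incorrect / n_total,
    }

def percent_runs_by_class(values, width=10):
    """Percentage of runs in each class 0-10, 10-20, ..., 90-100.
    A value on a boundary goes to the upper class, except 100,
    which is placed in the last class (an assumed convention)."""
    n_classes = 100 // width
    counts = Counter(min(int(v // width), n_classes - 1) for v in values)
    return {f"{i * width}-{(i + 1) * width}": 100 * counts[i] / len(values)
            for i in range(n_classes)}

# Hypothetical run of a 12-QTL model: 6 segregating, 3 detected, 1 incorrect.
summary = qtl_summary(n_total=12, n_segregating=6, n_detected=3, n_incorrect=1)
# Hypothetical "Found QTLs (%)" values from four runs.
hist = percent_runs_by_class([50.0, 25.0, 100.0, 8.3])
print(summary)
print(hist)
```

In this hypothetical run half of the QTL segregate and a quarter are detected, so Fnd/Poss is 50%, and the single incorrect association is one third of the detected QTL but only one twelfth of all QTL in the model.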

E(NK) = 1(12:0), GF = 0.1, h2 = 0.1

[Figure set: two rows of plots, one for RF = 0.05 and one for RF = 0.10. Each row contains % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) in classes 0-10 to 90-100; an Average plot (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (%TG, 0 to 100) over cycles 0 to 10 for S1 families and for DH lines under PS, MS and MAS.]


E(NK) = 1(12:0), GF = 0.5, h2 = 0.1

[Figure set: two rows of plots, one for RF = 0.05 and one for RF = 0.10. Each row contains % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) in classes 0-10 to 90-100; an Average plot (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (%TG, 0 to 100) over cycles 0 to 10 for S1 families and for DH lines under PS, MS and MAS.]

E(NK) = 1(12:0), GF = 0.1, h2 = 1.0

[Figure set: two rows of plots, one for RF = 0.05 and one for RF = 0.10. Each row contains % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) in classes 0-10 to 90-100; an Average plot (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (%TG, 0 to 100) over cycles 0 to 10 for S1 families and for DH lines under PS, MS and MAS.]

E(NK) = 1(12:0), GF = 0.5, h2 = 1.0

[Figure set: two rows of plots, one for RF = 0.05 and one for RF = 0.10. Each row contains % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) in classes 0-10 to 90-100; an Average plot (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (%TG, 0 to 100) over cycles 0 to 10 for S1 families and for DH lines under PS, MS and MAS.]

[Figure: E(NK) = 1(12:1), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:1), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:1), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:1), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:2), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:2), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:2), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:2), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:5), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:5), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:5), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 1(12:5), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:0), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:0), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:0), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:0), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:1), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:1), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:1), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:1), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:2), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:2), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:2), GF = 0.1, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:2), GF = 0.5, h2 = 1.0, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:5), GF = 0.1, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:5), GF = 0.5, h2 = 0.1, shown for RF = 0.05 and RF = 0.10. Panels: Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), each as % runs (0 to 100) over classes 0-10 to 90-100; Average (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); Response (S1 family) and Response (DH lines), each as response (% TG, 0 to 100) over 0 to 10, with legend PS, MS, MAS.]

[Figure: E(NK) = 2(12:5), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 2(12:5), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:0), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:0), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 5(12:0), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:0), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:1), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 5(12:1), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:1), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:1), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 5(12:2), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:2), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:2), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 5(12:2), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:5), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:5), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 5(12:5), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 5(12:5), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 10(12:0), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 10(12:0), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 10(12:0), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 10(12:0), GF = 0.5, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 10(12:1), GF = 0.1, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 10(12:1), GF = 0.5, h2 = 0.1. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]

[Figure: E(NK) = 10(12:1), GF = 0.1, h2 = 1.0. Two rows of panels, RF = 0.05 and RF = 0.10. Each row: histograms of Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%) (bins 0-10 to 90-100; y-axis % Runs, 0 to 100); an Average panel (Possible, Found, Fnd/Poss, Incorrect, Inc*Fnd); and Response (% TG, 0 to 100) for S1 families and DH lines (x-axis 0 to 10; PS, MS, MAS).]


[Figure: E(NK) = 10(12:1), GF = 0.5, h2 = 1.0. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:2), GF = 0.1, h2 = 0.1. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:2), GF = 0.5, h2 = 0.1. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]


[Figure: E(NK) = 10(12:2), GF = 0.1, h2 = 1.0. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:2), GF = 0.5, h2 = 1.0. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:5), GF = 0.1, h2 = 0.1. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]


[Figure: E(NK) = 10(12:5), GF = 0.5, h2 = 0.1. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:5), GF = 0.1, h2 = 1.0. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]

[Figure: E(NK) = 10(12:5), GF = 0.5, h2 = 1.0. Panels for RF = 0.05 and RF = 0.10. Recoverable labels: histograms of % Runs (0 to 100) against Possible QTLs (%), Found QTLs (%) and Incorrect allele id. (%), in 10% bins from 0-10 to 90-100; Average panel with categories Possible, Found, Fnd/Poss, Incorrect and Inc*Fnd; Response (%TG, 0 to 100) for S1 families and for DH lines, x-axis 0 to 10, legend PS, MS, MAS.]