
Journal of the Science of Food and Agriculture J Sci Food Agric 87:925–929 (2007)

Perspective

Extreme breeding: Leveraging genomics for crop improvement

Siobhan M Brady¹ and Nicholas J Provart²∗
¹ Duke University, Department of Biology, Box 91000, Durham, NC, 27708, USA
² University of Toronto, Department of Cell and Systems Biology, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada

Abstract: The genomic revolution has led to dramatic increases in our understanding of plant biology in the past 10 years, especially in model plant species such as Arabidopsis. The technologies associated with this revolution, such as tilling, array mapping, and association mapping, will see widespread application to crop improvement in the near future. The genes for desirable traits identified through such efforts may be introgressed at an accelerated rate into elite germplasm by marker-assisted breeding.
© 2007 Society of Chemical Industry

Keywords: marker-assisted breeding; ecotilling; array mapping; association mapping; QTL

INTRODUCTION
Plant breeding is closely linked with the success of the human species. From the earliest domestication of maize, wheat, beans, rice and other agricultural species through to the efforts that in part brought about the green revolution in the 1960s, plant breeding ultimately involves identifying plants that have good genes. Especially in the past 50 years, plant breeders have exerted tremendous effort to create cultivars of plants having an appropriate complement of genes, often sourced from different wild populations. Before the advent of DNA sequence information, the identification of a desirable version of a gene – that is, of a ‘good’ allele – was limited to its physical manifestation as a characteristic that could be scored as a phenotype, such as yield, height or cooking quality.

In the last decade or so, a number of scientific milestones have been achieved in plant genetics which have allowed for much more precise identification of such alleles at the molecular level. These milestones include the sequencing of the Arabidopsis, rice and poplar genomes, the generation of expressed sequence tag (EST) databases, the development of microarray technologies, and the availability of extensive mutant collections, molecular markers and numerous recombinant inbred line resources in Arabidopsis, and in rice, maize and other agricultural species. A number of methods have been developed that use genomics to complement standard forward and reverse genetic approaches. Ecotilling, array mapping and association mapping are all methodologies that can assist in the rapid identification of genes and alleles that are responsible for favourable traits. Accelerated introgression using molecular markers can then be used to rapidly introduce a desirable allele into an elite germplasm. This article examines how the enormous amount of genomic information and genomic technologies could be used in the not-too-distant future to advance and accelerate crop improvement programs.

MAPPING QUANTITATIVE TRAIT LOCI USING ARRAY AND ASSOCIATION MAPPING
A paradigm shift has occurred in the past decade or so that has aided in the identification of genes responsible for a desired trait. The DNA sequence information from the appropriate alleles can then be used to create functional markers,¹,² which can be used to accelerate their introgression into elite germplasm. We touch on the two main methodologies for genomics-assisted gene identification, array and association mapping, in the following subsections. It should be noted, however, that these methodologies are useful for genotyping or mapping loci associated with a phenotype of interest. Determining the function or necessity of these loci for a particular trait can be more difficult, and the degree of difficulty depends in part on the number of genes that affect the trait of interest.

∗ Correspondence to: Nicholas J Provart, University of Toronto, Department of Cell and Systems Biology, 25 Willcocks Street, Toronto, ON, M5S 3B2, Canada
E-mail: [email protected]
(Received 28 April 2006; revised version received 9 August 2006; accepted 18 September 2006)
Published online 5 February 2007; DOI: 10.1002/jsfa.2763

© 2007 Society of Chemical Industry. J Sci Food Agric 0022–5142/2007/$30.00

Array mapping: Taking QTL mapping to the extreme
Quantitative trait locus (QTL) mapping is used to measure and identify loci that contribute to complex patterns of genetic inheritance. QTL mapping is a lengthy process and often involves the construction and genotyping of recombinant inbred lines. The coupling of QTL mapping to microarray hybridization has reduced the amount of time and effort required to genotype and map QTLs. In this approach, named extreme array mapping (XAM), DNA is isolated from pools of recombinant inbred lines that display an extreme phenotype; this phenotype should represent the tails of a continuous phenotype distribution.³,⁴ The DNA is then hybridized to microarrays containing oligomers that represent a reference genome of interest, and many single feature polymorphisms (SFPs) are then detected. Potential deletions are identified based on the principle that multiple adjacent SFPs may not be independent: a single deletion may simultaneously disrupt binding to many oligomers. This approach has been used to identify QTLs that are responsible for response to light and flowering-time variation.³,⁵ For more moderate-effect QTL phenotypes, and in cases where a phenotype is caused by two or more genes, larger populations or increased selection intensity are required.

This approach is also complementary to standard forward genetic methods and has been used to map ethyl methanesulfonate (EMS) mutations in genes involved in development.⁶ Bulk segregant analysis is a simple method that further increases the efficiency of the mapping process. Segregating F2 populations of mutants backcrossed to a mapping population are separated based on phenotype into pools of 50–100 individuals.⁴ The chromosomal region linked to the gene causing the phenotype will be fixed for alternative alleles between the two pools, while unlinked chromosomes or chromosomal regions will be at approximately equal frequency in each pool. The region of interest is therefore identified by a difference in allele frequency between the two pools. The DNA from each pool is hybridized to a microarray representing a genome of interest. SFPs are identified and analysed, and each polymorphism is scaled so that the difference between the mutant parent genotype and the mapping population genotype is 1 and the mean is zero.⁴ With each SFP on the same scale, the homozygous mutant parent and homozygous mapping parent allele frequencies should equal +0.5 and −0.5, respectively, with heterozygous individuals at 0. In a pool, unlinked regions should therefore sit near 0, and any region that deviates towards −0.5 or +0.5 marks a genomic region of interest, containing alleles that can be tested before introgression into elite germplasm.
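The scaling step in bulk segregant analysis can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the intensity values, the dictionary keys, the function names and the 0.5 threshold are all hypothetical and do not come from the cited studies, which use their own statistical treatment of array data.

```python
def scale_sfp(signal, mutant_parent, mapping_parent):
    """Rescale one SFP signal so that the mutant parent maps to +0.5 and the
    mapping parent to -0.5 (difference of 1, mean of 0)."""
    spread = mutant_parent - mapping_parent
    if spread == 0:
        raise ValueError("feature is not polymorphic between the parents")
    midpoint = (mutant_parent + mapping_parent) / 2.0
    return (signal - midpoint) / spread


def pool_contrast(sfps):
    """Return (position, mutant pool minus wild-type pool) on the scaled axis."""
    contrast = []
    for sfp in sfps:
        parents = (sfp["mutant_parent"], sfp["mapping_parent"])
        mutant = scale_sfp(sfp["mutant_pool"], *parents)
        wildtype = scale_sfp(sfp["wildtype_pool"], *parents)
        contrast.append((sfp["position"], mutant - wildtype))
    return contrast


def linked_positions(sfps, threshold=0.5):
    """Positions where the two pools differ by at least `threshold` (assumed cutoff)."""
    return [pos for pos, diff in pool_contrast(sfps) if abs(diff) >= threshold]


# Hypothetical hybridization intensities (arbitrary units) for four SFPs.
toy_sfps = [
    {"position": 1.0, "mutant_parent": 10.0, "mapping_parent": 6.0,
     "mutant_pool": 8.5, "wildtype_pool": 7.5},
    {"position": 5.0, "mutant_parent": 10.0, "mapping_parent": 6.0,
     "mutant_pool": 9.5, "wildtype_pool": 7.0},
    {"position": 6.0, "mutant_parent": 10.0, "mapping_parent": 6.0,
     "mutant_pool": 10.0, "wildtype_pool": 7.0},
    {"position": 12.0, "mutant_parent": 10.0, "mapping_parent": 6.0,
     "mutant_pool": 8.0, "wildtype_pool": 8.0},
]

print(linked_positions(toy_sfps))
```

In this toy data set only the SFPs at positions 5.0 and 6.0 show a large pool contrast, which is the pattern expected around a linked locus. Unlinked SFPs stay close to zero.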

There are currently microarray projects under way for a number of crop species including barley, Brassica, maize, Medicago truncatula, potato, rice, soybean, tomato and wheat⁷⁻⁹ which utilize a variety of platforms from spotted cDNAs to oligonucleotides. The genome coverage of these arrays is in turn dependent on the availability of genome or EST sequences. With array mapping, the degree of genome coverage on an array directly affects the chances of successfully identifying polymorphisms in QTL regions or in regions with induced mutations. Adaptation of array mapping to other species will aid in identifying loci that contribute to complex traits more quickly.

Association mapping: Functional gene identification by scoring for characters
An alternative method to mapping QTLs is the use of association mapping, which is frequently employed in studies of human diseases. Association mapping is based on the concept of linkage disequilibrium: the goal is to identify unusually similar regions in the genomes of individuals who are phenotypically similar for a given character.¹⁰ The region of similarity theoretically contains the gene responsible for that character. Similarity in the genome is measured using linkage to molecular markers, which may be SFPs as described before, or more traditional molecular markers, such as Cleaved Amplified Polymorphic Sequence (CAPS), Amplified Fragment Length Polymorphism (AFLP) and Simple Sequence Repeat (SSR) markers. An experimental population can include a core collection from a gene bank, varieties representing the elite germplasm of a breeding program or inbred lines representing a synthetic outcrossing population.¹¹ In association mapping, multiple traits can be studied in one population using the same genotypic data, and mapping resolution is increased when compared to standard linkage mapping because haplotype sharing between unrelated individuals reflects the action of recombination over a large number of generations.¹⁰

A complicating factor in association mapping studies is population architecture. Since the population is genetically heterogeneous, population structure can cause spurious correlations. Selfing species like rice or Arabidopsis exhibit substantial population structure, and samples from them are often less suitable for association mapping. Even so, association mapping in Arabidopsis has been used to map known flowering time and pathogen resistance loci in a sample of 96 accessions for which genome-wide polymorphism data were available.¹⁰ In maize, an association mapping population of 302 lines is now available and its population structure has been described.¹² Incorporation of this population structure into association models will aid in identifying QTL with small effects in genetically diverse maize. A new mixed-model approach that incorporates genomic tools to uncover population structure (Q) and relative kinship (K) has been developed and is able to control population structure effects more efficiently.¹³ These measures are based on single nucleotide polymorphisms (SNPs) identified from a series of maize and teosinte inbred lines using EST databases and array methods.¹⁴⁻¹⁶ For multiple maize traits, this mixed-model approach performed better in terms of controlling error in quantitative trait dissection.¹³

In the near future, the availability of increasingly dense polymorphism data across numerous accessions/varieties for Arabidopsis, maize and other agricultural species raises the tantalizing possibility of being able to identify the precise gene responsible for a trait simply by scoring the accessions/varieties for the trait of interest, no mapping necessary.
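A toy Python example may help to show why population structure produces spurious marker-trait associations. This sketch is deliberately much simpler than the unified mixed model of reference 13: it treats subpopulation membership as a known fixed effect (a stand-in for Q) and omits kinship (K) altogether. The data and function names are hypothetical. Centring genotype and phenotype within each subpopulation before regressing gives the same marker coefficient as an ordinary least-squares fit with subpopulation indicator covariates.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def structure_adjusted_slope(genotypes, phenotypes, subpops):
    """Marker effect after removing subpopulation means (fixed-effect adjustment)."""
    return slope(center_within_groups(genotypes, subpops),
                 center_within_groups(phenotypes, subpops))


# Hypothetical data: the marker allele is common in subpopulation A, which also
# has the higher trait mean, but the marker has no effect within either group.
genotypes = [1, 1, 1, 0, 0, 0, 0, 1]
phenotypes = [10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 5.0, 5.0]
subpops = ["A", "A", "A", "A", "B", "B", "B", "B"]

print("naive marker effect:   ", slope(genotypes, phenotypes))
print("adjusted marker effect:", structure_adjusted_slope(genotypes, phenotypes, subpops))
```

The naive regression reports a large marker effect that disappears once subpopulation means are removed. Correcting for structure is meant to suppress this kind of false positive.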

ECOTILLING: LEVERAGING NATURAL VARIATION FOR CROP IMPROVEMENT
Once a gene responsible for a desired trait has been identified, a novel method can be used to identify a naturally occurring allele that might deliver even better results in the field. In ecotilling, the TILLING (targeting induced local lesions in genomes) method is adapted to natural populations to identify DNA polymorphisms.¹⁷⁻¹⁹ The ecotilling method has been applied to 192 natural accessions of Arabidopsis thaliana to identify desired natural allele variants. DNA from individuals belonging to a specific geographical accession is pooled and then combined with reference DNA from the standard accession, Col-0, whose genome has been fully sequenced. TILLING does not require a full genome sequence, only sufficient knowledge to design primers that amplify 1 to 1.6 kb regions of interest. Fluorescent primers are then used to amplify the target locus, and once amplified, the DNA is denatured and annealed to form heteroduplexes. These amplified regions of interest are incubated with the CEL1 endonuclease, which enzymatically digests heteroduplexes at mismatch positions. Although this method is generally exploited to identify single nucleotide polymorphisms, it has also been used to detect deletions as large as 21 bp. Errors are possible, but relatively infrequent. This method of genotyping is cheap and fast and can aid in determining the spectrum of variation in individuals and in genetic mapping. Indeed, current sequencing technology has vastly increased the speed and scale of TILLING analysis.²⁰
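The mismatch-detection idea behind ecotilling can be pictured with a small in-silico sketch. It assumes two pre-aligned amplicons of equal length that differ only by substitutions, and it idealizes cleavage as a single cut immediately after the mismatched base. The sequences and function names are hypothetical, and real assays read fragment sizes from a gel rather than comparing known sequences.

```python
def mismatch_positions(reference, sample):
    """1-based positions at which two aligned, equal-length amplicons differ."""
    if len(reference) != len(sample):
        raise ValueError("sketch assumes equal-length, pre-aligned amplicons")
    pairs = zip(reference.upper(), sample.upper())
    return [i + 1 for i, (ref_base, smp_base) in enumerate(pairs) if ref_base != smp_base]


def cleavage_fragments(amplicon_length, position):
    """Lengths of the two pieces produced by one cut just after `position`."""
    return position, amplicon_length - position


# Hypothetical 21-base amplicons: the accession carries one substitution.
reference = "ATGGCCATTGTAATGGGCCGC"
accession = "ATGGCCATTGCAATGGGCCGC"

for pos in mismatch_positions(reference, accession):
    print(pos, cleavage_fragments(len(reference), pos))
```

The two fragment lengths always sum to the amplicon length, so a band observed at one size implies the position of the polymorphism within the amplified region.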

How can this ecotilling approach be extended to other crop species that are polyploid or to cases where two or more alleles are present? The ability to detect multiple alleles has been demonstrated using two CEL1 reactions; in one reaction the reference DNA is not present, which would reveal heterozygous or homozygous individuals, and in the second reaction, the reference DNA sample is included as a control.²¹ High-throughput tilling has also been applied to maize to detect induced point mutations (http://genome.purdue.edu/maizetilling/).²² All that is needed to perform TILLING is sequence information about a locus of interest, perhaps gained from studies in other plant species or by linkage association analysis. An important caveat in using this method in other plant species does exist. The high frequency of common single nucleotide polymorphisms between genetically heterogeneous individuals suggests that the sequence information gained from TILLING approaches must be analysed in the appropriate context. What types of mutations are present and how should they affect the protein of interest? Functional confirmation of the effects of these mutations is most likely necessary. Regardless, the advantages of the TILLING method when compared to the alternative – full sequencing of multiple genomes – are obvious. When used in a systematic, high-throughput way, TILLING is an extremely useful resource in molecular breeding.
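One way to read the two-reaction design is as a small decision table. The sketch below is an interpretation, not a protocol from the cited work. It assumes a diploid individual and a single polymorphic site, and it assumes that cleavage without added reference DNA can only arise from a heteroduplex formed between the individual's own two alleles. The function name and return labels are hypothetical.

```python
def call_genotype(cut_without_reference, cut_with_reference):
    """Interpret two idealized CEL1 reactions for one diploid individual.

    cut_without_reference: cleavage seen when only the sample DNA is present.
    cut_with_reference:    cleavage seen when reference DNA is mixed in.
    """
    if cut_without_reference and cut_with_reference:
        # The sample forms heteroduplexes on its own, so it carries two alleles.
        return "heterozygous"
    if cut_with_reference:
        # Mismatches appear only against the reference: both alleles are variant.
        return "homozygous variant"
    if not cut_without_reference:
        # No mismatch in either reaction: the sample matches the reference.
        return "homozygous reference"
    # Cleavage without reference but not with it is unexpected in this model.
    return "inconsistent"


for observed in [(True, True), (False, True), (False, False), (True, False)]:
    print(observed, "->", call_genotype(*observed))
```

Polyploid species would need a richer model, since more than two alleles can contribute to heteroduplex formation.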

MARKER-ASSISTED BREEDING: ACCELERATED INTROGRESSION INTO ELITE GERMPLASM USING MOLECULAR MARKERS
Once the desired allele of a gene has been identified by ecotilling or association mapping, it is a relatively trivial process to develop molecular markers, such as CAPS or Polymerase Chain Reaction–Restriction Fragment Length Polymorphism (PCR-RFLP) markers, for that allele and then to use these and other molecular markers across the genome to rapidly introgress the allele into an elite germplasm. Basically, the task at hand may be summarized as follows: allow the two parental genotypes to cross, and then in the progeny, identify those which contain only the desired allele-of-interest from the one parent, with the remaining genetic material coming from the elite parent. The chance of identifying the desired recombination event increases with the number of progeny screened. Typically, plant breeders will grow as many progeny as possible in the field or nursery, and then wait for each plant to reach a specified maturity stage so that the desired phenotype may be scored; for example, if the desired character is some aspect of grain quality, then the breeder must wait for plants to reach maturity and to produce seed. These plants in turn are backcrossed to the elite parent several times to yield an improved elite germplasm, containing the new allele but with the rest of the genetic information, painstakingly pieced together over years of breeding, coming from the elite parent.
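The statement that the chance of finding the desired recombinant grows with the number of progeny screened can be made concrete with elementary probability. The sketch below assumes that each progeny plant independently has the same probability of carrying the desired combination. That probability (1% in the example) is a hypothetical figure chosen for illustration, not a value from the article.

```python
import math


def prob_at_least_one(p_desired, n_progeny):
    """Probability that at least one of n independent progeny is the desired type."""
    return 1.0 - (1.0 - p_desired) ** n_progeny


def progeny_needed(p_desired, confidence=0.95):
    """Smallest number of progeny giving at least `confidence` of one success."""
    if not 0.0 < p_desired < 1.0:
        raise ValueError("p_desired must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_desired))


# Hypothetical example: the desired genotype appears in 1% of progeny.
for n in (50, 100, 300):
    print(n, round(prob_at_least_one(0.01, n), 3))
print("progeny for 95% confidence:", progeny_needed(0.01, 0.95))
```

Under these assumptions roughly three hundred plants must be screened to be 95% sure of recovering one desired individual, which is why screening capacity matters so much in practice.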

[Figure 1 (image not reproduced). Panel labels: QTL Mapping; EMS Mutagenesis; Bulk Segregant Analysis; eXtreme Array Mapping (XAM), with a plot of allele frequency (−1.0 to 1.0) along the chromosome showing SFPs for Ecotype 1 and Ecotype 2; Natural Accessions; Ecotilling; Association Mapping (Individuals 1–4, plus population structure estimate); Molecular Markers; Identification of desired allele; Accelerated introgression of desired allele into elite germplasm.]

Figure 1. Pipeline illustrating how ecotilling, association mapping and extreme array mapping may be used in the identification of desirable alleles. Subsequent introgression into an elite germplasm is accelerated through the use of molecular markers. See text for details.

The use of molecular markers offers significant savings in both of these areas. First, it is not necessary to phenotypically screen the progeny for the desired character, obviating the need to wait until plants reach a specific maturity stage. Instead, plants can be screened at an extremely young age for those containing the correct parental mix of chromosomal regions. Only those plants that contain the desired mix are then propagated for subsequent rounds of backcrossing. Thus the number of plants that can be screened may be increased, and the length of time necessary for identification of positives may be dramatically decreased, especially in the case of plants that take a long time to reach maturity. Second, in terms of backcrossing to elite parental lines, the number of backcrosses necessary can be significantly decreased using molecular markers. The reason for this comes down to the numbers of plants that can be screened. The more plants that can be screened, the greater the likelihood of identifying one with the appropriate genetic combinations. With a high enough density of molecular markers, e.g. using whole genome tiling arrays to identify single-feature polymorphisms, it is even possible to identify the exact recombination break points leading to the deletion of a gene involved in flowering time in Arabidopsis, FLM, in recombinant inbred lines.⁵ Although array technologies are expensive, running at about US$700 per sample, the future will certainly bring reduced prices. Nevertheless, the availability of even a smallish collection of molecular markers scattered across the genome can have largely the same beneficial effects, slashing the number of years to the development of an improved elite germplasm.
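The backcrossing argument can be sketched in Python. Two ideas are shown. The first is the textbook expectation that, without selection, the average fraction of elite (recurrent) parent genome after n backcrosses is 1 − (1/2)^(n+1); this is a standard result rather than something stated in the article. The second is a toy marker screen that keeps only progeny carrying the donor allele at the target marker and ranks them by how many background markers are already homozygous for the elite parent. The marker names, genotype codes ("E" for homozygous elite, "H" for heterozygous) and plant names are all hypothetical.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Average elite-parent genome fraction after n backcrosses, no selection.
    n_backcrosses = 0 corresponds to the F1 (one half)."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


def recurrent_fraction(genotype, background_markers):
    """Fraction of background markers that are homozygous elite ('E')."""
    elite = sum(1 for marker in background_markers if genotype[marker] == "E")
    return elite / len(background_markers)


def pick_best(progeny, target_marker, background_markers):
    """Among carriers of the donor allele, return the most elite-like plant."""
    carriers = {name: g for name, g in progeny.items() if g[target_marker] == "H"}
    if not carriers:
        return None
    return max(carriers,
               key=lambda name: recurrent_fraction(carriers[name], background_markers))


background = ["m1", "m2", "m3", "m4"]
progeny = {
    "plant_1": {"target": "E", "m1": "E", "m2": "E", "m3": "E", "m4": "E"},
    "plant_2": {"target": "H", "m1": "H", "m2": "H", "m3": "E", "m4": "E"},
    "plant_3": {"target": "H", "m1": "E", "m2": "E", "m3": "E", "m4": "H"},
    "plant_4": {"target": "H", "m1": "H", "m2": "H", "m3": "H", "m4": "H"},
}

print([expected_recurrent_fraction(n) for n in range(4)])
print(pick_best(progeny, "target", background))
```

Selecting the carrier that is already ahead of the unselected average in each generation is what allows the elite background to be recovered in fewer backcrosses.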

SUMMARY
In conclusion, genomic technologies will certainly dramatically alter the field of plant breeding in the coming decade (Fig. 1). The identification of candidate genes affecting traits of agricultural importance will be accelerated using array and association mapping methodologies, as opposed to traditional trial-and-error-based methods. Better alleles of the genes, which are present in natural populations, will be identified rapidly using ecotilling. Finally, the time necessary for introgression of these alleles into existing elite germplasms will be reduced by half or more. It should be emphasized that all of these technologies may be employed without the risk of consumer backlash that has accompanied genetically modified organisms (GMOs): both the Soil Association and the Organic Consumers Association have endorsed marker-assisted breeding.

REFERENCES
1 Andersen JR and Lubberstedt T, Functional markers in plants. Trends Plant Sci 8:554–560 (2003).
2 Varshney RK, Graner A and Sorrells ME, Genomics-assisted breeding for crop improvement. Trends Plant Sci 10:621–630 (2005).
3 Borevitz JO, Array genotyping and mapping, in Arabidopsis Protocols, 2nd edition, ed. by Salinas J and Sanchez-Serrano JJ. Humana Press, Totowa, NJ, pp. 137–145 (2005).
4 Wolyn DJ, Borevitz JO, Loudet O, Schwartz C, Maloof J, Ecker JR, et al, Light-response quantitative trait loci identified with composite interval and eXtreme Array Mapping in Arabidopsis thaliana. Genetics 167:907–917 (2004).
5 Werner JD, Borevitz JO, Warthmann N, Trainer GT, Ecker JR, Chory J, et al, Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc Natl Acad Sci USA 102:2460–2465 (2005).
6 Hazen SP, Borevitz JO, Harmon FG, Pruneda-Paz JL, Schultz TF, Yanovsky MJ, et al, Rapid Array Mapping of circadian clock and developmental mutations in Arabidopsis. Plant Physiol 138:990–997 (2005).
7 Ewing RM, Kahla AB, Poirot O, Lopez F, Audic S and Claverie J-M, Large-scale statistical analyses of rice ESTs reveal correlated patterns of gene expression. Genome Res 9:950–959 (1999).
8 Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, et al, Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302:842–846 (2003).
9 Rensink WA and Buell CR, Microarray expression profiling resources for plant genomics. Trends Plant Sci 10:603–609 (2005).
10 Aranzana MJ, Kim S, Zhao K, Bakker E, Horton M, Jakob K, et al, Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genetics 1:531–539 (2005).
11 Breseghello F and Sorrells ME, Association mapping of kernel size and milling quality in wheat (Triticum aestivum L.) cultivars. Genetics 172:1165–1177 (2006).
12 Flint-Garcia SA, Thuillet A-C, Yu J, Pressoir G, Romero SM, Mitchell SE, et al, Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J 44:1054–1064 (2005).
13 Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al, A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203–208 (2006).
14 Jurinke C, van den Boom D, Cantor CR and Koster H, The use of MassARRAY technology for high throughput genotyping. Adv Biochem Eng Biotechnol 77:57–74 (2002).
15 Gardiner J, Schroeder S, Polacco ML, Sanchez-Villeda H, Fang Z, Morgante M, et al, Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial artificial chromosome contig map by two-dimensional overgo hybridization. Plant Physiol 134:1317–1326 (2004).
16 Wright SI, Bi IV, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, et al, The effects of artificial selection on the maize genome. Science 308:1310–1314 (2005).
17 Comai L, Young K, Till BJ, Reynolds SH, Greene EA, Codomo CA, et al, Efficient discovery of DNA polymorphisms in natural populations by Ecotilling. Plant J 37:778–786 (2004).
18 Henikoff S, Till BJ and Comai L, TILLING. Traditional mutagenesis meets functional genomics. Plant Physiol 135:630–636 (2004).
19 Till BJ, Burtner C, Comai L and Henikoff S, Mismatch cleavage by single-strand specific nucleases. Nucl Acids Res 32:2632–2641 (2004).
20 Comai L and Henikoff S, TILLING: practical single-nucleotide mutation discovery. Plant J 45:684–694 (2006).
21 Till BJ, Reynolds SH, Greene EA, Codomo CA, Enns LC, Johnson JE, et al, Large-scale discovery of induced point mutations with high-throughput TILLING. Genome Res 13:524–530 (2003).
22 Till B, Reynolds S, Weil C, Springer N, Burtner C, Young K, et al, Discovery of induced point mutations in maize genes by TILLING. BMC Plant Biol 4:12 (2004).