Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato

Jeffrey B. Endelman*, Cari A. Schmitz Carley*, Paul C. Bethke*†, Joseph J. Coombs‡, Mark E. Clough§, Washington L. da Silva**, Walter S. De Jong**, David S. Douches‡, Curtis M. Frederick*, Kathleen G. Haynes††, David G. Holm‡‡, J. Creighton Miller Jr.§§, Patricio R. Muñoz***, Felix M. Navarro*, Richard G. Novy†††, Jiwan P. Palta*, Gregory A. Porter‡‡‡, Kyle T. Rak*, Vidyasagar R. Sathuvalli§§§, Asunta L. Thompson****, G. Craig Yencho§

*Department of Horticulture, University of Wisconsin, Madison, WI 53706, USA
†USDA-ARS Vegetable Crops Research Unit, Madison, WI 53706, USA
‡Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
§Department of Horticultural Science, North Carolina State University, Raleigh, NC 27695, USA
**School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
††USDA-ARS Genetic Improvement of Fruits and Vegetables Laboratory, Beltsville, MD 20705, USA
‡‡San Luis Valley Research Center, Department of Horticulture and Landscape Architecture, Colorado State University, Center, CO 81125, USA
§§Department of Horticultural Sciences, Texas A&M University, College Station, TX 77843, USA
***Horticulture Sciences Department, University of Florida, Gainesville, FL 32611, USA
†††USDA-ARS Small Grains and Potato Germplasm Research Unit, Aberdeen, ID 83210, USA
‡‡‡School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
§§§Department of Crop and Soil Science, Oregon State University, Hermiston, OR 97838, USA
****Department of Plant Sciences, North Dakota State University, Fargo, ND 58108, USA

Genetics: Early Online, published on March 8, 2018 as 10.1534/genetics.118.300685
Copyright 2018.

Genetic Variance Partitioning and Genome-Wide Prediction ...by G0123 ≡ 50123 −A7 + 90 +91 +92 +93B, are known as the dominance deviation. In diploids this deviation uniquely defines




Running title: Average effects in tetraploid potato

Keywords: tetraploid, non-additive effects, genome-wide prediction, potato

Corresponding author:
Jeffrey B. Endelman
University of Wisconsin-Madison
1575 Linden Dr
Madison, WI 53706
Phone: 608-250-0754
Email: [email protected]


ABSTRACT

As one of the world's most important food crops, potato (Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. non-additive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (G), digenic dominance (D), and additive × additive epistatic (G#G) effects were calculated using 3895 markers, and the numerator relationship matrix (A) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not for specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including non-additive effects improved accuracy and/or bias when predicting total genotypic value, for all three traits. When six F1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (by 0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and will improve selection for breeding value, given the substantial amount of non-additive genetic variance in elite germplasm.


INTRODUCTION

Cultivated potato (Solanum tuberosum L.) is unique among the major global food crops in that it is autotetraploid and clonally propagated. As of 2018 there are 12 public breeding programs in the US with a mandate to release varieties for commercial production, as well as several additional programs with a focus on germplasm enhancement. The variety development process begins with botanical seed from the sexual hybridization of heterozygous clones. Seedlings are grown in a greenhouse, and one or more tubers from each plant are retained for subsequent vegetative propagation. Crossing and seedling tuber production take 1–2 years, depending on the breeding program, followed by 1–2 years of field selection based primarily on visual assessment of tuber appearance, plant maturity, and yield components (tuber number and size), with some post-harvest evaluation of processing traits, such as fry color and specific gravity. Quantitative measurement of these traits in replicated and/or multi-location trials begins in field year (FY) three or four and continues for several years. Because it takes 3–4 years to establish clones as disease-free plantlets in vitro and produce foundation seed, new varieties are typically released 10–12 years after crossing, by which time dozens of traits have been evaluated. The duration of the potato breeding cycle, from sexual hybridization to incorporating new clones as parents, is shorter than the time to variety release but still 5–7 years for most US programs. Until now, US breeding programs have primarily used phenotypic selection in combination with genetic markers for a handful of major resistance genes (Lopez-Pardo et al. 2013).

The use of phenotype means (or BLUPs) for parent selection is not ideal because the estimates contain additive and non-additive genetic effects, but only the former are efficiently transmitted to offspring (Gallais 2003). A number of previous studies have investigated non-additive genetic variance in potato using factorial mating designs to estimate general and specific combining abilities (Plaisted et al. 1962; Tai 1976; Brown and Caligari 1989; Maris 1989; Neele et al. 1991; Gopal 1998; Bradshaw et al. 2000). The variance of general combining ability (GCA) is equivalent to the covariance between half-sibs, which equals $\tfrac{1}{4}V_a + \tfrac{1}{16}V_{aa} + \tfrac{1}{36}V_d + \cdots$ for autotetraploid loci in panmictic and linkage equilibrium (Kempthorne 1955c; Gallais 2003); the symbols Va, Vd, and Vaa are the genetic variances for additive, digenic dominance, and additive × additive epistasis effects, respectively. The variance of specific combining ability (SCA) is the covariance between full-sibs ($\tfrac{1}{2}V_a + \tfrac{1}{4}V_{aa} + \tfrac{2}{9}V_d + \cdots$) minus twice the covariance between half-sibs, which leads to an expression containing only non-additive genetic variances ($\tfrac{1}{6}V_d + \tfrac{1}{8}V_{aa} + \cdots$). The SCA/GCA ratio therefore provides an indication of the importance of non-additive genetic variance. Although a wide range of values for SCA/GCA is found in the aforementioned references for potato, the general conclusion is that non-additive genetic variance is important in many contexts.
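The coefficient arithmetic behind the SCA expression can be checked with exact fractions. The sketch below is illustrative only: it encodes the half-sib and full-sib coefficients quoted above (higher-order terms omitted), and the variance components passed to the hypothetical helper `sca_gca_ratio` are made-up inputs, not estimates from any study.

```python
from fractions import Fraction

# Coefficients of Va, Vd, Vaa in the half-sib and full-sib covariances for
# autotetraploid loci (panmixia, linkage equilibrium), as quoted in the text.
# Higher-order terms (the "..." in the text) are omitted.
COV_HALF_SIB = {"Va": Fraction(1, 4), "Vd": Fraction(1, 36), "Vaa": Fraction(1, 16)}
COV_FULL_SIB = {"Va": Fraction(1, 2), "Vd": Fraction(2, 9), "Vaa": Fraction(1, 4)}


def sca_coefficients(cov_fs, cov_hs):
    """SCA = cov(full-sibs) - 2 * cov(half-sibs), computed term by term."""
    return {term: cov_fs[term] - 2 * cov_hs[term] for term in cov_fs}


def sca_gca_ratio(va, vd, vaa):
    """SCA/GCA ratio implied by (hypothetical) variance components."""
    sca = sca_coefficients(COV_FULL_SIB, COV_HALF_SIB)
    comps = {"Va": va, "Vd": vd, "Vaa": vaa}
    sca_value = sum(sca[t] * comps[t] for t in comps)
    gca_value = sum(COV_HALF_SIB[t] * comps[t] for t in comps)
    return sca_value / gca_value


print(sca_coefficients(COV_FULL_SIB, COV_HALF_SIB))
# {'Va': Fraction(0, 1), 'Vd': Fraction(1, 6), 'Vaa': Fraction(1, 8)}
```

The additive coefficient cancels exactly, which is why a nonzero SCA/GCA ratio points to non-additive variance; for example, `sca_gca_ratio(1, 1, 0)` returns `Fraction(3, 5)`.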

For pedigreed populations, an alternative approach to estimating additive genetic variance is via the numerator relationship, or A, matrix. Kerr et al. (2012) were the first to publish a complete, recursive algorithm for A in autotetraploids, which has been applied to potato (Slater et al. 2014) and blueberry (Amadeu et al. 2016) populations. Mixed model analysis with A also enables selection on additive values calculated by best linear unbiased prediction, or BLUP (Henderson 1975). Although pedigree-BLUP is the cornerstone of genetic improvement for quantitative traits, the method has several limitations: (i) it depends on accurate pedigree records, (ii) it neglects genetic covariance between founders, and (iii) the covariance is based on expected, rather than realized, parental contribution.

Genomic selection (GS) has the potential to overcome these limitations by replacing A with a genomic covariance, or G, matrix estimated from markers (Bernardo 1994; Nejati-Javaremi et al. 1997; Habier et al. 2007; VanRaden 2008) or by estimating marker effects directly (Meuwissen et al. 2001). There have been several studies on GS in autotetraploid species (Li et al. 2015; Annicchiarico et al. 2015; Slater et al. 2016; Habyarimana et al. 2017; Sverrisdóttir et al. 2017), but none have used non-additive genomic covariance matrices to partition genetic variance or predict total genotypic value. Both topics are addressed in this manuscript, building on analogous studies at the diploid level (Su et al. 2012; Xu 2013; Vitezica et al. 2013; Muñoz et al. 2014; Jiang and Reif 2015) and the classical theory of average effects in tetraploids.

THEORY

In a series of papers in 1955 (Kempthorne 1955a,b,c), which were further distilled in the monograph An Introduction to Genetic Statistics (Kempthorne 1957), Kempthorne developed the theory of average effects for arbitrary ploidy, drawing upon the same mathematical methods used in the analysis of variance for factorial experiments. Key results from this literature are reproduced here, as well as details on the derivation omitted by Kempthorne.

For an autotetraploid locus in panmictic equilibrium, assuming random bivalent formation (i.e., random chromosome segregation) and no inbreeding, the genotypic value ($g_{ijkl}$) of genotype $ijkl$ (each index ranges from 1 to the number of alleles, and permutations of the indices are distinct) can be orthogonally decomposed into the population mean ($\mu$) plus four additive effects ($\alpha_i$) corresponding to the four genes, six digenic dominance effects ($\beta_{ij}$) for all pairs of genes, four trigenic interactions ($\gamma_{ijk}$), and one quadrigenic term ($\delta_{ijkl}$):

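$$g_{ijkl} - \mu = \alpha_i + \alpha_j + \alpha_k + \alpha_l + \beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl} + \gamma_{ijk} + \gamma_{ijl} + \gamma_{ikl} + \gamma_{jkl} + \delta_{ijkl} \tag{1}$$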

Eq. 1 uses standard notation from the analysis of factorial experiments, in which the symbols denote the regression coefficients, and the regressors are implied to be indicator variables. Focusing on the additive effects, and grouping the other parameters into a residual term, Eq. 1 becomes a regression of genotypic value on allele dosage (Fisher 1941). The average effects minimize the sum of squared residuals for the population, which is equivalent to a sum over genotypes weighted by genotype frequency $p_{ijkl}$ ($= p_i p_j p_k p_l$ under the assumptions of the model):

$$\sum_{ijkl} p_{ijkl} \left[ (g_{ijkl} - \mu) - (\alpha_i + \alpha_j + \alpha_k + \alpha_l) \right]^2 \tag{2}$$

Taking the derivative with respect to the additive effect for each allele and equating the result to zero generates a set of normal equations, which can be solved (Supporting Methods, File S1) to produce the following linear constraint:

$$0 = \sum_i p_i \alpha_i \tag{3}$$

which is identical to the result for diploids. Substituting Eq. 3 into the normal equations (Eq. S3, File S1), the solution for the additive effect of an allele becomes the average genotypic value of all individuals with that allele (multiple doses contribute separately), relative to the population mean:

$$\alpha_i = \sum_{jkl} p_j p_k p_l \,(g_{ijkl} - \mu) = \bar{g}_{i\cdot\cdot\cdot} - \mu \tag{4}$$
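Eq. 4 can be checked numerically by enumerating ordered genotypes. The sketch below is a minimal illustration, assuming genes are sampled independently with the given allele frequencies (so genotype $ijkl$ has frequency $p_i p_j p_k p_l$); the function name and the three-allele genotypic values in `toy_value` are hypothetical, chosen only to include some dominance.

```python
from itertools import product


def additive_effects(freqs, value):
    """Population mean and average effects (Eq. 4) at one autotetraploid locus.

    freqs: allele frequencies p_i, summing to 1.
    value: function of an ordered genotype (i, j, k, l) returning g_ijkl; it
           should not depend on the order of the four genes.
    """
    n = len(freqs)

    def prob(genes):
        # Product of allele frequencies = frequency of this ordered gene tuple
        result = 1.0
        for allele in genes:
            result *= freqs[allele]
        return result

    mu = sum(prob(gt) * value(gt) for gt in product(range(n), repeat=4))
    # alpha_i = sum_{jkl} p_j p_k p_l (g_ijkl - mu) = g_bar_i... - mu
    alpha = [sum(prob(rest) * value((i,) + rest)
                 for rest in product(range(n), repeat=3)) - mu
             for i in range(n)]
    return mu, alpha


# Hypothetical locus with three alleles; allele 0 has a dominant component.
p = [0.2, 0.5, 0.3]


def toy_value(genes):
    dose0 = genes.count(0)
    return dose0 + (2.0 if dose0 > 0 else 0.0) + 0.5 * genes.count(2)


mu, alpha = additive_effects(p, toy_value)
# Eq. 3: the frequency-weighted sum of average effects is zero (up to rounding).
print(abs(sum(pi * ai for pi, ai in zip(p, alpha))) < 1e-12)  # True
```

For a purely additive locus ($g_{ijkl} = a_i + a_j + a_k + a_l$), the same function returns $\alpha_i = a_i - \sum_j p_j a_j$, as expected.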

The residuals from the regression equation for the additive effects, which we denote by $y_{ijkl} \equiv g_{ijkl} - (\mu + \alpha_i + \alpha_j + \alpha_k + \alpha_l)$, are known as the dominance deviation. In diploids this deviation uniquely defines one dominance effect for each genotype, but in tetraploids the dominance deviation is composed of digenic, trigenic, and quadrigenic effects (Eq. 1). The tetraploid solution for the digenic effects corresponds to regressing the dominance deviation on the dosage of pairs of alleles, which involves minimizing the following sum of squared residuals:

$$\sum_{ijkl} p_{ijkl} \left[ y_{ijkl} - (\beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl}) \right]^2 \tag{5}$$

Taking the derivative with respect to the digenic effect for each allele pair, and equating the result to zero, generates a set of normal equations that can be solved (Supporting Methods, File S1) to produce the following linear constraint for any allele $i$:

$$0 = \sum_j p_j \beta_{ij} \tag{6}$$

which is the same result as for diploids. Substituting Eq. 6 into the normal equations (Eq. S6, File S1) leads to the solution

$$\beta_{ij} = \sum_{kl} p_k p_l \, y_{ijkl} = \bar{g}_{ij\cdot\cdot} - \mu - \alpha_i - \alpha_j \tag{7}$$

By now the pattern is clear, and the least-squares solution for the trigenic effects can be written as

$$\gamma_{ijk} = \bar{g}_{ijk\cdot} - \mu - \alpha_i - \alpha_j - \alpha_k - \beta_{ij} - \beta_{ik} - \beta_{jk} \tag{8}$$

Having solved for the additive, digenic, and trigenic terms, the quadrigenic effect $\delta_{ijkl}$ is the residual in Eq. 1.
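Eqs. 7 and 8 only require conditional means of the genotypic value, so the decomposition can be sketched generically. The code below is an illustrative sketch under the same assumptions as Eq. 1 (independent genes, order-invariant genotypic values); the function names are ours, and `pairwise_value` is a hypothetical locus with a single digenic interaction, for which Eq. 6 should hold and all trigenic effects should vanish.

```python
from itertools import product


def cond_mean(freqs, value, fixed):
    """Mean genotypic value with the first genes fixed and the rest random.

    fixed = (i,), (i, j) or (i, j, k) gives g_bar_i..., g_bar_ij.. or
    g_bar_ijk.; fixed = () gives the population mean mu.
    """
    total = 0.0
    for rest in product(range(len(freqs)), repeat=4 - len(fixed)):
        weight = 1.0
        for allele in rest:
            weight *= freqs[allele]
        total += weight * value(tuple(fixed) + rest)
    return total


def kempthorne_effects(freqs, value):
    """Additive (Eq. 4), digenic (Eq. 7) and trigenic (Eq. 8) effects."""
    n = len(freqs)
    mu = cond_mean(freqs, value, ())
    alpha = [cond_mean(freqs, value, (i,)) - mu for i in range(n)]
    beta = [[cond_mean(freqs, value, (i, j)) - mu - alpha[i] - alpha[j]
             for j in range(n)] for i in range(n)]
    gamma = {}
    for i, j, k in product(range(n), repeat=3):
        gamma[i, j, k] = (cond_mean(freqs, value, (i, j, k)) - mu
                          - alpha[i] - alpha[j] - alpha[k]
                          - beta[i][j] - beta[i][k] - beta[j][k])
    return mu, alpha, beta, gamma


# Hypothetical locus: every pair of genes that are both allele 0 adds 1.0.
p = [0.2, 0.5, 0.3]


def pairwise_value(genes):
    return sum(1.0 for a in range(4) for b in range(a + 1, 4)
               if genes[a] == genes[b] == 0)


mu, alpha, beta, gamma = kempthorne_effects(p, pairwise_value)
# Eq. 6: sum_j p_j beta_ij = 0 for every allele i
print(max(abs(sum(p[j] * beta[i][j] for j in range(3))) for i in range(3)) < 1e-12)  # True
# Only pairwise interactions were built in, so the trigenic effects vanish
print(max(abs(v) for v in gamma.values()) < 1e-12)  # True
```

The quadrigenic residual could be obtained the same way, by subtracting all lower-order terms from $g_{ijkl} - \mu$.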

The breeding value (BV) of an individual is defined as twice the mean genotypic value of its progeny relative to the population mean. Under the model assumptions, all six possible gene pairs for tetraploid genotype $ijkl$ have equal frequency in its gametes, which in conjunction with Eq. 7 leads to the following expression:

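$$\begin{aligned} \mathrm{BV}_{ijkl} &= 2\left[\tfrac{1}{6}\left(\bar{g}_{ij\cdot\cdot} + \bar{g}_{ik\cdot\cdot} + \bar{g}_{il\cdot\cdot} + \bar{g}_{jk\cdot\cdot} + \bar{g}_{jl\cdot\cdot} + \bar{g}_{kl\cdot\cdot}\right) - \mu\right] \\ &= (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + \tfrac{1}{3}(\beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl}) = u + \tfrac{1}{3}v \end{aligned} \tag{9}$$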

Eq. 9 shows that breeding value equals the total additive value ($u$) plus 1/3 of the total digenic dominance ($v$), but it is conventional to refer to the additive value as "breeding value" because the contribution of digenic dominance diminishes exponentially: $(1/3)^t$ after $t$ generations (Gallais 2003). This is analogous to the situation in diploids (and polyploids) with regard to additive × additive epistasis, as it contributes to breeding value with a coefficient of ½ but is generally omitted when referring to "breeding value."
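The identity in Eq. 9 can be verified numerically. The sketch below assumes random bivalent formation without double reduction (so the six gene pairs of a parent are equally likely gametes, each uniting with a random gamete from the population); `toy_value` is a hypothetical bi-allelic locus, and the function names are illustrative. It computes the breeding value directly from the first line of Eq. 9 and compares it with $u + v/3$.

```python
from itertools import combinations, product


def cond_mean(freqs, value, fixed):
    """Mean genotypic value with the genes in `fixed` set and the rest random."""
    total = 0.0
    for rest in product(range(len(freqs)), repeat=4 - len(fixed)):
        weight = 1.0
        for allele in rest:
            weight *= freqs[allele]
        total += weight * value(tuple(fixed) + rest)
    return total


def breeding_value(freqs, value, genotype):
    """First line of Eq. 9: twice the mean progeny value relative to mu."""
    mu = cond_mean(freqs, value, ())
    pair_means = [cond_mean(freqs, value, pair) for pair in combinations(genotype, 2)]
    return 2 * (sum(pair_means) / 6 - mu)


def u_plus_v_over_3(freqs, value, genotype):
    """Second line of Eq. 9, using Eq. 4 for alpha and Eq. 7 for beta."""
    mu = cond_mean(freqs, value, ())
    alpha = {i: cond_mean(freqs, value, (i,)) - mu for i in set(genotype)}
    u = sum(alpha[i] for i in genotype)
    v = sum(cond_mean(freqs, value, (i, j)) - mu - alpha[i] - alpha[j]
            for i, j in combinations(genotype, 2))
    return u + v / 3


# Hypothetical bi-allelic locus: value jumps once two or more copies of allele 0 are present.
p = [0.4, 0.6]


def toy_value(genes):
    dose0 = genes.count(0)
    return 1.0 * dose0 + (3.0 if dose0 >= 2 else 0.0)


for genotype in [(0, 0, 1, 1), (0, 1, 1, 1), (0, 0, 0, 0)]:
    print(genotype, round(breeding_value(p, toy_value, genotype), 6),
          round(u_plus_v_over_3(p, toy_value, genotype), 6))
```

The two printed columns should agree for every genotype, since the identity holds exactly.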

MATERIALS AND METHODS

Training population

Phenotype data for a training population (TP) of 571 round white clones were collected between 2012 and 2017 at the University of Wisconsin (UW-Madison) Hancock Agricultural Research Station (number of clones trialed per year in Table S1, File S1). Between 2012 and 2015, all clones were entries in the National Chip Processing Trial (NCPT), contributed by 11 public US breeding programs. In 2016 and 2017, FY3 and FY4 selections from the UW-Madison breeding program were included in addition to the NCPT clones. The NCPT uses a two-tier evaluation system, with one plot per location for Tier 1 clones and two plots per location for Tier 2 clones. FY3 clones were evaluated with a single plot, and FY4 clones were evaluated with two plots in 2016 and one plot in 2017. All plots contained 15 seed pieces, planted with 30 cm in-row spacing and 91 cm between rows. Trials were planted in late April and harvested in early September, with vine desiccation 2–3 weeks before harvest.

Phenotype data for three traits (yield, specific gravity, and fry color) are included in this study. Total yield is based on the weight of all harvested tubers and reported in Mg ha⁻¹. Specific gravity was determined by water displacement, using 2–3 kg of tubers per plot (Wang et al. 2017). Fry color was measured in March of each year after 6 months of storage (1 month at 12.8°C for wound healing, followed by 5 months at 8.9°C), using 1 mm slices fried for 130 s in vegetable oil at 182°C. For the 2012–2014 trials, fry color was measured on a 1–10 visual scale, while for the 2015–2017 trials it was measured on the L* lightness scale using the D25 HunterLab colorimeter (Hunter Associates Laboratory, Inc., Virginia, USA). Fry color measurements on the visual scale (x) were converted to L* using the formula L* = −1.37x + 63.7, which is based on a linear regression analysis of 70 clones phenotyped with both methods.
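The conversion is a simple linear map; a minimal helper might look like the following (the function name is ours, for illustration, and rejecting out-of-range scores is a design choice, not something the text specifies).

```python
def visual_to_lightness(score):
    """Convert a 1-10 visual fry color score to HunterLab L* (L* = -1.37x + 63.7).

    The coefficients are the regression reported in the text; scores outside
    the 1-10 scale are rejected rather than extrapolated.
    """
    if not 1 <= score <= 10:
        raise ValueError("visual fry color score must be between 1 and 10")
    return -1.37 * score + 63.7


print([round(visual_to_lightness(x), 2) for x in (1, 5, 10)])  # [62.33, 56.85, 50.0]
```

Because the slope is negative, higher visual scores map to lower L* values.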

TP samples were genotyped with either version 1 or version 2 of the SolCAP potato SNP array, which have in common a set of 8303 markers used for this study (Hamilton et al. 2011; Felcher et al. 2012). Tetraploid genotype calls (coded 0–4) were made using version 1.6 of the ClusterCall package (Schmitz Carley et al. 2017) in R (R Development Core Team 2015), which calibrates the relationship between signal intensity from the array and allele dosage for each marker based on multiple F1 populations. In addition to the Atlantic × Superior, Wauseon × Lenape, and Rio Grande × Premier Russet populations used by Schmitz Carley et al. (2017), two more calibration populations were used: Waneta × Pike (da Silva et al. 2017; n = 184) and A06084-1TE × Castle Russet (n = 245). Default parameters were used except for min.train = 3, which required a marker to be called in at least 3 of the 5 calibration families. The curated marker set contained 3895 polymorphic SNPs with ≥95% concordance across samples (File S2).

The numerator relationship matrix A was calculated with R package AGHmatrix (Amadeu et al. 2016), using pedigree records maintained by the authors as well as a public database (van Berloo et al. 2007). After removing uninformative ancestors, there were 185 founders (clones with no parent) and 1138 total clones in the pedigree (File S3).

Genomic covariance matrices

In the THEORY section, average effects were derived for an arbitrary number of alleles. For bi-allelic SNPs, additional simplifications are possible. Consider alleles B and b with frequencies $p$ and $q$, respectively. In this case Eq. 3 becomes $p\alpha_B + q\alpha_b = 0$, which reduces to the following well-known formulas involving $\alpha \equiv \alpha_B - \alpha_b$:

$$\alpha_B = q\alpha, \qquad \alpha_b = -p\alpha \tag{10}$$

If $X$ denotes the dosage of B, then the total additive value is

$$u = X\alpha_B + (4 - X)\alpha_b = (X - 4p)\alpha \equiv W\alpha \tag{11}$$

where $W$ is a centered genotype because $4p$ is the population mean of $X$. To obtain a similarly parsimonious expression for the total digenic dominance $v$, we follow the example of Wricke and Weber (1986) and introduce the parameter $\beta \equiv \beta_{BB} - 2\beta_{Bb} + \beta_{bb}$. When combined with Eq. 6, the result is (Supporting Methods, File S1)

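$$\beta_{BB} = q^2\beta, \qquad \beta_{Bb} = -pq\beta, \qquad \beta_{bb} = p^2\beta \tag{12}$$

$$v = \left[6p^2 - 3pX + \tfrac{1}{2}X(X - 1)\right]\beta \equiv Z\beta \tag{13}$$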
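As a check on Eqs. 11 and 13 (and on the variance results used in Eqs. 15 and 18), the two codings can be averaged over the panmictic dosage distribution. This sketch assumes the dosage $X$ follows Binomial(4, $p$), which holds under random union of genes; the function names are illustrative.

```python
from math import comb


def additive_code(x, p):
    """W = X - 4p (Eq. 11): centered dosage of allele B."""
    return x - 4 * p


def digenic_code(x, p):
    """Z = 6p^2 - 3pX + X(X - 1)/2 (Eq. 13)."""
    return 6 * p**2 - 3 * p * x + x * (x - 1) / 2


def moments(code, p):
    """Mean and mean square of a coding when X ~ Binomial(4, p) (panmixia)."""
    probs = [comb(4, x) * p**x * (1 - p) ** (4 - x) for x in range(5)]
    mean = sum(pr * code(x, p) for x, pr in enumerate(probs))
    mean_sq = sum(pr * code(x, p) ** 2 for x, pr in enumerate(probs))
    return mean, mean_sq


p = 0.3
print(moments(additive_code, p), 4 * p * (1 - p))         # mean ~0, mean square 4pq
print(moments(digenic_code, p), 6 * p**2 * (1 - p) ** 2)  # mean ~0, mean square 6p^2 q^2
```

Both codings have mean zero, so their mean squares are the variances $4pq$ and $6p^2q^2$ that appear in Eqs. 15 and 18.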

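For mixed model analysis, $G_{ij}$ quantifies the covariance between the additive values for clones $i$ and $j$, relative to the additive genetic variance $\sigma_A^2$:

$$G_{ij} = \sigma_A^{-2}\,\mathrm{cov}_\alpha(u_i, u_j) = \sigma_A^{-2}\,\mathrm{cov}_\alpha(W_i\alpha, W_j\alpha) = \sigma_A^{-2} W_i W_j \,\mathrm{var}_\alpha[\alpha] = \sigma_A^{-2} W_i W_j \,\sigma_\alpha^2 \tag{14}$$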

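The covariance in Eq. 14 involves the expectation with respect to $\alpha \sim N(0, \sigma_\alpha^2)$, but the additive genetic variance $\sigma_A^2$ is based on the theory of average effects, in which the expectation is with respect to genotypes. To relate the two variance components, the expectation with respect to both $\alpha$ and the genotype is used:

$$\sigma_A^2 = E_{\alpha,W}[u^2] - E_{\alpha,W}[u]^2 = E_\alpha[\alpha^2]\,E_W[W^2] - E_\alpha[\alpha]^2\,E_W[W]^2 = 4pq\,\sigma_\alpha^2 \tag{15}$$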

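Upon substituting Eq. 15 into Eq. 14, and extending the analysis to $m$ loci in linkage equilibrium (with the same $\sigma_\alpha^2$ at every locus, so that it cancels), the result is

$$G_{ij} = \frac{\sum_{k=1}^{m} W_{ik} W_{jk}}{\sum_{k=1}^{m} 4 p_k q_k} \;\Rightarrow\; \mathbf{G} = \frac{\mathbf{W}\mathbf{W}'}{\sum_k 4 p_k q_k} \tag{16}$$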
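Eq. 16 translates directly into code. The sketch below is a plain-Python illustration (no linear-algebra library) with a hypothetical function name; it estimates $p_k$ from the clones being analyzed, which is an assumption, and it is not the implementation used in this study. Markers should be polymorphic so that the denominator is positive.

```python
def additive_relationship(dosage):
    """G = W W' / sum_k 4 p_k q_k (Eq. 16).

    dosage: one row per clone, one column per marker, entries 0-4 (dose of B).
    Allele frequencies are estimated from the same clones (an assumption).
    """
    n, m = len(dosage), len(dosage[0])
    p = [sum(row[k] for row in dosage) / (4 * n) for k in range(m)]
    # Centered genotypes W = X - 4p, one row per clone
    w = [[row[k] - 4 * p[k] for k in range(m)] for row in dosage]
    scale = sum(4 * pk * (1 - pk) for pk in p)
    return [[sum(wi[k] * wj[k] for k in range(m)) / scale for wj in w] for wi in w]


# Toy dosages for three clones at two markers (hypothetical).
G = additive_relationship([[0, 1], [2, 2], [4, 3]])
for row in G:
    print([round(x, 3) for x in row])
```

With real data, a matrix library would replace the nested loops, but the arithmetic is the same.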

The digenic dominance matrix $D_{ij}$ is defined similarly, as the covariance between dominance values relative to the dominance genetic variance, based on the expectation with respect to $\beta \sim N(0, \sigma_\beta^2)$:

$$D_{ij} = \sigma_D^{-2}\,\mathrm{cov}_\beta(v_i, v_j) = \sigma_D^{-2}\,\mathrm{cov}_\beta(Z_i\beta, Z_j\beta) = \sigma_D^{-2}\,\sigma_\beta^2 Z_i Z_j \tag{17}$$

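Using the following result for the dominance genetic variance:

$$\sigma_D^2 = E_{\beta,W}[v^2] - E_{\beta,W}[v]^2 = \sigma_\beta^2\,E_W[Z^2] = 6p^2q^2\sigma_\beta^2 \tag{18}$$

the D matrix is

$$\mathbf{D} = \frac{\mathbf{Z}\mathbf{Z}'}{\sum_k 6 p_k^2 q_k^2} \tag{19}$$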

For the covariance between additive × additive epistatic effects, we used the Hadamard product of the G matrix with itself, denoted by G#G (Henderson 1985; Su et al. 2012).
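A matching plain-Python sketch for Eq. 19 and for the Hadamard product follows; as in any such illustration, allele frequencies are estimated from the input clones (an assumption), and the function names are our own.

```python
def dominance_relationship(dosage):
    """D = Z Z' / sum_k 6 p_k^2 q_k^2 (Eq. 19), with Z from Eq. 13."""
    n, m = len(dosage), len(dosage[0])
    p = [sum(row[k] for row in dosage) / (4 * n) for k in range(m)]
    # Digenic coding Z = 6p^2 - 3pX + X(X - 1)/2 for every clone and marker
    z = [[6 * p[k] ** 2 - 3 * p[k] * row[k] + row[k] * (row[k] - 1) / 2
          for k in range(m)] for row in dosage]
    scale = sum(6 * pk**2 * (1 - pk) ** 2 for pk in p)
    return [[sum(zi[k] * zj[k] for k in range(m)) / scale for zj in z] for zi in z]


def hadamard(a, b):
    """Element-wise product; hadamard(G, G) gives the G#G epistatic matrix."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


D = dominance_relationship([[0], [2], [4]])  # one hypothetical marker, p = 0.5
print([[round(x, 3) for x in row] for row in D])
```

Note that the duplex clone (dosage 2) gets a negative dominance covariance with both homozygotes, because its digenic coding has the opposite sign.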

Mixed model analysis of the training population dataset

Stage-wise analysis of the multi-year TP dataset was performed using ASReml-R version 3 (Butler et al. 2009) and a diagonal weight matrix to account for the varying precision of the estimates in the first stage (Smith et al. 2001; Damesa et al. 2017). Stage One was an analysis within year, including blocking effects when present and modeling the genotype effect for each clone as fixed. The covariance matrix for the vector of genotype effect estimates in year $j$ was obtained from the inverse of the coefficient matrix of the mixed model equations (Henderson 1975), which is returned as Cfixed in the asreml object. Stage Two was a multi-year analysis based on the following linear model:

$$\hat{\mu}_{ij} = \phi + g_i + y_j + (gy)_{ij} + \varepsilon_{ij} \tag{20}$$

In Eq. 20 the parameter $\phi$ is the intercept, $g_i$ is a random effect for genotype, $y_j$ is a fixed effect for year, $(gy)_{ij}$ is a random effect for the genotype × year interaction, and the variance of the residual $\varepsilon_{ij}$ is $w_{ij}^{-1}$, where $w_{ij}$ is the $i$-th diagonal element of the inverse of the Stage One covariance matrix for year $j$ (Damesa et al. 2017). File S4 contains the genotype effect estimates ($\hat{\mu}_{ij}$) and corresponding weights ($w_{ij}$) used in the multi-year analysis.

After fitting a baseline model with independent genotype effects (var[$g_i$] = Vg), five genetic models with different covariance structures for $g_i$ were tested (Table 1). The variance of the genotype × year interaction (var[$(gy)_{ij}$] = Vgy) was estimated in the baseline model and constrained at that value for the other models. This allowed for the partitioning of the genetic variance (Vg) into additive (Va) and residual (Vr) genetic components for models A and G. Models G+GG, G+D, and G+GG+D involved the estimation of non-additive variance components for digenic dominance (Vd) and/or additive × additive epistasis (Vaa).

Variances are reported using the standardization proposed by Legarra (2016) to make them comparable; for covariance matrix K, the parameter estimate is multiplied by the difference between the mean of the diagonal elements and the mean of all elements: $(\overline{K_{ii}} - \overline{K_{ij}})$. Goodness-of-fit was assessed by the Akaike Information Criterion (AIC), defined as the deviance plus twice the number of variance parameters (Piepho 2009).
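The standardization amounts to a single multiplier per covariance matrix. A minimal sketch, with hypothetical function names and `k` given as a list of rows:

```python
def legarra_multiplier(k):
    """mean(diag K) - mean(all elements of K)."""
    n = len(k)
    mean_diag = sum(k[i][i] for i in range(n)) / n
    mean_all = sum(sum(row) for row in k) / n**2
    return mean_diag - mean_all


def standardized_variance(estimate, k):
    """Variance component rescaled so estimates from different K are comparable."""
    return estimate * legarra_multiplier(k)


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(legarra_multiplier(identity))  # 0.75, i.e. (n - 1)/n for n = 4
```

For the independent-effects baseline (K = I) the multiplier is (n − 1)/n, and for a G matrix centered with sample allele frequencies, whose rows sum to zero, it reduces to the mean diagonal element.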

Our objective was to compare how well the different covariance models predicted the total genotypic value $g$ of unobserved clones. Each of the models in Table 1 has the form $g = q + r$, where $q$ is a sum of average effects and $r$ is the residual genetic effect. Genomic predictions were calculated as BLUP[$q$] ≡ $\hat{q}$ from a Stage Two analysis (Eq. 20) without response values ($\hat{\mu}$) for clones in the validation set, using the variance parameter estimates from ASReml-R and custom scripts to solve Henderson's mixed model equations (Henderson 1975). The validation data were calculated as BLUP[$g$] ≡ $\hat{g}$ from a Stage Two analysis with all clones, assuming independent clone effects (i.e., the baseline model). The reliability ($r^2_{g\hat{g}}$) of the validation data was calculated from the prediction error variance (PEV) by $1 - \mathrm{PEV}_i/V_g$ for clone $i$ (Clark et al. 2012). To estimate prediction accuracy ($r_{\hat{q}g}$) from predictive ability (i.e., the Pearson correlation between the genomic predictions and validation data, $r_{\hat{q}\hat{g}}$), the latter was divided by the square root of the mean reliability (i.e., broad-sense heritability) of the validation data (Dekkers 2007). Because the mean-squared error of the accuracy estimate increases as the reliability of the validation data decreases (Estaghvirou et al. 2013), only validation data with reliability ≥0.6 were used.
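The accuracy correction can be sketched in a few lines. The code below assumes per-clone PEVs and a genetic variance Vg from the baseline model; the helper names are hypothetical, and applying the 0.6 reliability threshold clone by clone is our assumption (the text also applies it at the level of whole validation datasets).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def estimated_accuracy(predictions, validation, pev, vg, min_reliability=0.6):
    """Predictive ability divided by sqrt(mean reliability) of the validation data.

    Reliability of clone i is 1 - PEV_i / Vg. Clones below min_reliability are
    dropped first (a per-clone threshold is an assumption in this sketch).
    """
    reliability = [1 - e / vg for e in pev]
    keep = [i for i, r in enumerate(reliability) if r >= min_reliability]
    ability = pearson([predictions[i] for i in keep], [validation[i] for i in keep])
    mean_rel = sum(reliability[i] for i in keep) / len(keep)
    return ability / sqrt(mean_rel)


# Hypothetical genomic predictions, validation BLUPs and PEVs for five clones.
q_hat = [0.1, -0.3, 0.4, 0.0, 0.2]
g_hat = [0.5, -0.2, 0.9, -0.4, 0.1]
pev = [0.3, 0.35, 0.3, 0.25, 0.6]
print(round(estimated_accuracy(q_hat, g_hat, pev, vg=1.0), 3))
```

In this toy input the last clone (reliability 0.4) is excluded before the correlation is computed.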

Genome-wide prediction in F1 populations

As part of various research projects, six unselected F1 populations (Table 2) were evaluated at the same location as the TP during the same time period. Populations W12011, W12012, and W12060 were evaluated for yield and specific gravity with a single plot of 12 plants per clone in 2015 and 2016. Populations W9817 and W10010 were evaluated for yield with two 8-plant plots in 2013 and one 20-plant plot in 2014 (Rak et al. 2015). Population WxL was evaluated with a single plot for specific gravity for four years (2012–2015), but yield was only measured in 2014 and 2015; there were 6 plants per plot in 2012 and 10 plants per plot in 2013–2015 (Frederick 2017). Phenotype data were analyzed separately for each F1 population, using a linear model with fixed effects for year and independent random effects for clone. Genetic and residual variance components were estimated with ASReml-R and used to calculate BLUPs for validation. The BLUPs and corresponding reliabilities (which were used to estimate accuracy from predictive ability, as described above) are provided in File S5, except that yield BLUPs for W12011 and W12060 were excluded because of low reliability (<0.6).


The F1 populations were genotyped using the same SNP array as the training population. Tetraploid genotype calls (coded 0–4) were made two ways: (i) as described above (using CC.anypop in the ClusterCall package), for the same set of 3895 markers selected for the TP, and (ii) using the CC.bipop function. Polymorphic markers with identical calls for the two approaches were used for prediction (File S6). Genomic predictions for each F1 population were calculated as BLUP[$q$] from a Stage Two analysis of the entire TP (Eq. 20), using the TP variance estimates with multi-population covariance matrices ("G_New" from Wientjes et al. 2017) to account for different allele frequencies in the TP vs. F1 population. If $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{Z}}$ denote scaled versions of the matrices defined previously:

$$\tilde{\mathbf{W}} = \frac{\mathbf{W}}{\sqrt{\sum_k 4 p_k q_k}} \qquad \tilde{\mathbf{Z}} = \frac{\mathbf{Z}}{\sqrt{\sum_k 6 p_k^2 q_k^2}} \tag{21}$$

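then, using the subscripts 1 and 2 to denote the different populations, the covariance matrices are

$$\mathbf{G} = \begin{bmatrix} \tilde{\mathbf{W}}_1\tilde{\mathbf{W}}_1' & \tilde{\mathbf{W}}_1\tilde{\mathbf{W}}_2' \\ \tilde{\mathbf{W}}_2\tilde{\mathbf{W}}_1' & \tilde{\mathbf{W}}_2\tilde{\mathbf{W}}_2' \end{bmatrix} \qquad \mathbf{D} = \begin{bmatrix} \tilde{\mathbf{Z}}_1\tilde{\mathbf{Z}}_1' & \tilde{\mathbf{Z}}_1\tilde{\mathbf{Z}}_2' \\ \tilde{\mathbf{Z}}_2\tilde{\mathbf{Z}}_1' & \tilde{\mathbf{Z}}_2\tilde{\mathbf{Z}}_2' \end{bmatrix} \tag{22}$$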
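A sketch of Eqs. 21 and 22 for the G matrix follows (the D matrix is analogous, with Z in place of W). It assumes that each population is centered and scaled with its own allele frequencies and that both populations are scored at the same markers in the same order; the function names are hypothetical, and this is not the G_New implementation of Wientjes et al. (2017).

```python
from math import sqrt


def scaled_w(dosage):
    """W_tilde (Eq. 21) using allele frequencies from this population only."""
    n, m = len(dosage), len(dosage[0])
    p = [sum(row[k] for row in dosage) / (4 * n) for k in range(m)]
    scale = sqrt(sum(4 * pk * (1 - pk) for pk in p))
    return [[(row[k] - 4 * p[k]) / scale for k in range(m)] for row in dosage]


def multi_population_g(dosage_1, dosage_2):
    """Block matrix of Eq. 22; population 1 clones first, then population 2."""
    w = scaled_w(dosage_1) + scaled_w(dosage_2)  # stack the two scaled matrices
    return [[sum(a * b for a, b in zip(wi, wj)) for wj in w] for wi in w]


# Hypothetical: 3 training clones and 2 F1 clones scored at the same 2 markers.
tp = [[0, 1], [2, 2], [4, 3]]
f1 = [[1, 2], [3, 2]]
G = multi_population_g(tp, f1)
print(len(G), len(G[0]))  # 5 5
```

A population in which every marker is monomorphic would make the scale zero, so in practice such markers must be filtered per population.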

Data availability

All marker, pedigree, and phenotype data needed to reproduce the results in this manuscript are provided in Supplemental Files S2–S6:

File S2, Marker data for the training population (.csv)
File S3, Pedigree table for the training population (.csv)
File S4, Clone means by year for Stage Two analysis of the training population (.csv)
File S5, Clone BLUPs for the six F1 populations (.csv)
File S6, Marker data for the F1 populations (.csv)

RESULTS

Pedigree information was used to calculate the numerator relationship, or A, matrix for a training population (TP) of 571 clones. All but 13 clones had a pedigree depth (maximum number of generations from a founder) of at least 7, with a median depth of 10 generations (distribution in Fig. S1, File S1). For autotetraploids, diagonal element $A_{ii}$ is related to the inbreeding coefficient $F_i$ (defined as the probability that two randomly chosen genes, sampled without replacement, are identical by descent) via the equation $A_{ii} = 1 + 3F_i$ (Gallais 2003). Values for $A_{ii}$ ranged from 1 to 1.55, with a mean of 1.05, so the most inbred clone in the population had $F_i = 0.18$.
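Rearranging $A_{ii} = 1 + 3F_i$ makes the link to inbreeding explicit; for the largest diagonal element observed,

$$F_i = \frac{A_{ii} - 1}{3} = \frac{1.55 - 1}{3} \approx 0.18$$

and, because the relation is linear, the mean diagonal value of 1.05 corresponds to a mean inbreeding coefficient of $(1.05 - 1)/3 \approx 0.017$.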

In the G-BLUP method of genomic selection, the covariance between additive values is proportional to a G matrix calculated from markers (instead of A). We calculated G using 3895 polymorphic markers from the potato SNP array, for which accurate allele dosage information was available. The overall scaling of G is such that the mean of the diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99. As further confirmation, the observed frequency of heterozygotes was inspected as a function of allele frequency and found to be in close agreement with the expected values under panmixia (Fig. S2, File S1).

G has been called the "realized relationship matrix" because it captures Mendelian segregation around the expected value A (Hayes et al. 2009). This connection is the motivation for regression analysis between G and A (VanRaden 2008), shown in Figure 1 for our potato dataset. The dashed line is the fit of the linear model when all off-diagonal elements are used (G = 0.66A − 0.06, R² = 0.41), which underestimated G at high values of A. By excluding very distant relationships (A < 0.05), the model fit improved overall and at the upper end, as shown by the solid black line (G = 0.79A − 0.09, R² = 0.51). These results are based on the assumption of no double reduction, i.e., that diploid gametes do not contain genes from sister chromatids, but in potato the probability of double reduction varies from 0 at the centromere to as high as 0.07 at the telomere (Bourke et al. 2015). In the context of a polygenic trait, with genes distributed across the entire chromosome, the effective value of the double reduction parameter is expected to be small. When the probability of double reduction was increased to 0.05 for computing A, the goodness-of-fit and intercept for the regression were unaffected, but the slope decreased from 0.79 to 0.73.

Figure 2 compares the off-diagonal elements of the digenic dominance covariance matrix D against the corresponding elements of G. For close relationships ($G_{ij} \geq 0.4$), G and D were highly correlated (r = 0.81), but when these pairs were excluded the correlation dropped to 0.08. The overall scaling of D is such that the mean of its diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99.

A potential concern when using genomic relationship matrices is that estimates of genetic variance may be too low due to incomplete linkage disequilibrium (LD) between markers and QTL. Yang et al. (2010) presented a method to assess (and correct) this issue. First, the markers are randomly partitioned, such that one half represent QTL and the other half markers; then G (or D) is calculated using each half separately; and finally $G_{QTL}$ is regressed onto $G_{mark}$. For both the G and D matrices in our dataset, the mean regression coefficient after 100 iterations was 1.00 (SD 0.01), indicating sufficient marker density under the assumption that markers are sampled from the same distribution as QTL.

Variance components

Phenotype data for three economically important traits were analyzed: total yield, specific gravity (as a proxy for dry matter content), and potato chip fry color after six months of storage. Initially, genotype effects were modeled as independent to estimate the total genetic variance (Vg) and the variance of the genotype × year interaction (Vgy). The Vg estimate was higher than Vgy for all traits, ranging from 2.7 times higher for yield to 5.6 times higher for specific gravity (Table 3).

By adding another random effect to the baseline model, with covariance proportional to A or G, the genetic variance (Vg) was partitioned into additive (Va) and residual genetic variance (Vr), the latter corresponding to the independent clone effect (Table 1). Both A and G lowered the AIC compared to the baseline model for all traits, with G producing a better fit than A for yield and fry color, and A producing a better fit for specific gravity (Figure 3). Using A, the proportion of genetic variance due to additive effects was 0.52 for specific gravity, 0.59 for yield, and 0.76 for fry color (Figure 4). When G was used, the additive genetic variance estimates were reduced by 0.12–0.18 of the total genetic variance, depending on the trait.

For specific gravity and fry color, including additive × additive epistasis lowered the AIC compared to the additive G-BLUP model, but when dominance was included the AIC increased (Fig. 3). For these traits, a substantial amount of the estimated additive variance in the G model became additive × additive epistasis in the G+GG model: Va dropped from 34 to 20% of Vg for specific gravity and from 63 to 45% for fry color, with 44–51% of the genetic variance captured by G#G (Fig. 4; standard errors in Table S2, File S1). For yield, neither dominance nor additive × additive epistasis improved the AIC compared to G-BLUP, and only 10% of the genetic variance was captured by D or G#G compared to 45% for the residual clone effect.

Prediction accuracy

Many studies use random cross-validation to assess the accuracy of genome-wide prediction. However, in the context of a pedigreed breeding population, this approach leads to training set individuals that are descendants of individuals in the validation set, which is unrepresentative of how genomic selection will be used in practice and may produce unrealistically high accuracies. To avoid this pitfall, we used the pedigree depth metric to partition the population into a set of 168 candidates for selection (depth ≥ 12) and a set of 403 clones ancestral to this group (depth < 12) as the training set. The selection candidates were further narrowed by excluding clones with insufficiently reliable data for validation, leaving 54 clones for yield, 132 clones for specific gravity, and 49 clones for fry color (with mean reliability in the range 0.71–0.72 for all traits).
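The depth-based split can be sketched as follows. Pedigree depth is taken here as the maximum number of generations back to a founder, the definition given for Figure S1. Three details are assumptions for illustration: founders (clones with no known parents) are treated as depth 0, the pedigree is stored as a `{clone: (mother, father)}` dictionary, and the function names are hypothetical. The recursion assumes an acyclic pedigree.

```python
def pedigree_depth(pedigree):
    """Return {clone: depth} for a pedigree given as {clone: (mother, father)}.

    Depth is the maximum number of generations back to a founder; a clone with no
    known parents (or one absent from the dictionary) is assumed to have depth 0.
    """
    depth = {}

    def visit(clone):
        if clone not in depth:
            parents = [p for p in pedigree.get(clone, (None, None)) if p is not None]
            depth[clone] = 1 + max(visit(p) for p in parents) if parents else 0
        return depth[clone]

    for clone in pedigree:
        visit(clone)
    return depth


def split_by_depth(depth, threshold=12):
    """Training set: depth < threshold; selection candidates: depth >= threshold."""
    training = sorted(c for c, d in depth.items() if d < threshold)
    candidates = sorted(c for c, d in depth.items() if d >= threshold)
    return training, candidates


# Toy pedigree (hypothetical clone names); threshold lowered to fit the example.
pedigree = {"F1": (None, None), "F2": (None, None),
            "C1": ("F1", "F2"), "C2": ("C1", "F2"), "C3": ("C2", "C1")}
depths = pedigree_depth(pedigree)
print(depths)                                  # {'F1': 0, 'F2': 0, 'C1': 1, 'C2': 2, 'C3': 3}
print(split_by_depth(depths, threshold=2))     # (['C1', 'F1', 'F2'], ['C2', 'C3'])
```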

Figure 5 shows the accuracy (left) and regression coefficient (right) when using each of the models in Table 1 to predict total genotypic value in the validation set. Prediction accuracy using only the A matrix was just over 0.5 for total yield vs. 0.4 for specific gravity and fry color. Replacing A with G improved accuracy for yield by 0.03 and fry color by 0.06 but decreased the accuracy for specific gravity by 0.07, which is consistent with the trend observed for AIC. Including dominance improved yield accuracy by 0.01, and including epistasis improved specific gravity accuracy by 0.05. Based on the AIC results for fry color, we expected higher accuracy with the G+GG model, but this was not observed.

Including epistasis reduced prediction bias for fry color and specific gravity, with regression coefficients over 0.9 for the G+GG model compared to 0.75–0.78 for G-BLUP.
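The two summary statistics in Figure 5 can be sketched as below. The sketch assumes that accuracy is the Pearson correlation between predicted and observed values in the validation set. It also assumes that the regression coefficient is the slope of observed on predicted values, so 1 indicates unbiased predictions and values below 1 indicate over-dispersed predictions. It does not adjust for the reliability of the validation data, and the function name is hypothetical.

```python
import math


def accuracy_and_slope(predicted, observed):
    """Return (Pearson r, slope of observed regressed on predicted)."""
    n = len(predicted)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    spo = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    spp = sum((p - mp) ** 2 for p in predicted)
    soo = sum((o - mo) ** 2 for o in observed)
    return spo / math.sqrt(spp * soo), spo / spp


# Predictions twice as dispersed as the observations: perfect rank, slope 0.5.
r, b = accuracy_and_slope([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0])
print(r, b)   # 1.0 0.5
```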

To investigate the effect of training population size on accuracy, 200 random subsets of the TP were taken at N = 100, 200, and 300 clones (Fig. S3, File S1). For all three traits, prediction accuracy decreased as the TP was reduced. Fry color accuracy was the most sensitive, dropping by 0.28 when population size was reduced from 403 to 100, compared to accuracy decreases of 0.14 and 0.19 for yield and specific gravity, respectively.
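The subsampling design can be sketched as below. The `evaluate` callable is a hypothetical placeholder for fitting the G+GG+D model on a training subset and returning validation accuracy. The standard error is computed as the standard deviation across subsets divided by the square root of the number of subsets. That choice is an assumption and not necessarily how the error bars in Fig. S3 were computed.

```python
import random
import statistics


def subsample_accuracy(training_ids, sizes, evaluate, reps=200, seed=0):
    """Return {N: (mean accuracy, standard error)} over `reps` random subsets of size N.

    `evaluate` (hypothetical) takes a list of training clone IDs and returns an accuracy.
    """
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        accs = [evaluate(rng.sample(training_ids, n)) for _ in range(reps)]
        se = statistics.stdev(accs) / len(accs) ** 0.5   # assumed SE definition
        summary[n] = (statistics.mean(accs), se)
    return summary


# Usage with a dummy evaluator whose "accuracy" simply grows with subset size.
ids = [f"clone{i}" for i in range(403)]
result = subsample_accuracy(ids, [100, 200, 300], lambda s: len(s) / 1000, reps=20)
print(result)
```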

We also determined accuracy when using the entire (N = 571) training population to predict yield and specific gravity in six unselected F1 populations (Table 4). The F1 populations ranged in size from 48 to 167 clones, and the number of polymorphic markers ranged from 1376 to 2311 (Table 2). Several of the parents had little pedigree relationship to the TP because they were russet clones, which is a distinct market category from the round white, chip processing type. Prediction accuracies ranged from 0.06 to 0.63 with G-BLUP, with no discernible connection between accuracy and pedigree relationship to the TP. The G+GG+D model performed very similarly, with no difference in average accuracy (to two decimal places) across the eight cases in Table 4. To assess the value of tetraploid allele dosage, predictions were made with “diploidized” marker data (Gdip), in which the three heterozygotes were recoded to be identical. The accuracy of the Gdip model was consistently lower, with an average loss of 0.13.


DISCUSSION

This is the first study to connect the classical theory of genetic variance partitioning in tetraploids with covariance matrices constructed from genome-wide, allele dosage information. Our results are based on a two-stage analysis in which the genotype estimate for each clone × environment combination was calculated in Stage One assuming independent effects, and in Stage Two genomic covariance matrices were used. We note that not all references to two-stage analysis in the literature employ this convention; often a single genotype mean across all environments is estimated in the first stage. In our dataset there were 849 genotype × year means estimated in Stage One for the 571 clones (File S4), and this partial replication is expected to improve the precision of the variance component estimates compared to analyzing a single mean per genotype in Stage Two (Kruijer et al. 2015).

The partial replication across years also allowed for explicit modeling of a residual genetic effect with no covariance structure, in addition to the additive, dominance, and epistatic random effects. We interpret the residual genetic effect to include higher-order non-additive effects (e.g., trigenic dominance, additive × dominance epistasis) and genetic variance not captured by the markers. The latter might appear to be ruled out based on our analysis of the G and D matrices using the Yang et al. (2010) method, but this assumes QTL are drawn from the same distribution as the markers. In reality, low-frequency alleles are under-represented on the potato 8303 SNP array and are expected to contribute residual genetic variance (Vos et al. 2015). The estimates for Vd and Vaa were sensitive to whether the residual genetic effect was included in the model. Without it, Vaa for yield was estimated at 40 (SE 14) Mg² ha⁻², which is 45% of the total genetic variance (88 Mg² ha⁻²). When the residual effect was included, most of this variance shifted into Vr. This phenomenon and the large standard errors of the estimates (Table S2, File S1) suggest that the partitioning of the non-additive genetic variance is uncertain, probably because of limited population size.

At first glance, our finding that more of the genetic variance was additive for yield (45%) compared to specific gravity (20%) seems unexpected. Diallel studies have typically found the ratio between specific and general combining ability (SCA/GCA) to be higher for yield. Tai (1976) reported SCA/GCA = 3.8 for yield vs. 0.6 for specific gravity, and Bradshaw et al. (2000) reported SCA/GCA = 0.75 for yield vs. 0.06 for specific gravity. Whereas these earlier studies used unselected populations, the clones in our training population had been selected for high specific gravity at least once, and in many cases for 2–3 years. Specific gravity (which is closely correlated with dry matter content) is one of the most important traits for the chip processing market, and strong selection early in the variety development process is practiced because the trait shows relatively little genotype × environment interaction (Table 3; Wang et al. 2017). Our results suggest that in the tail of the phenotype distribution for specific gravity, the partitioning of genetic variance is shifted toward non-additive effects.

A unique feature of our study compared to previous reports of genomic selection in autotetraploid species, such as alfalfa (Li et al. 2015; Annichiarico et al. 2015) and potato (Habyarimana et al. 2017; Sverrisdóttir et al. 2017), was the use of SNP array data rather than genotyping-by-sequencing (GBS). A major benefit of the SNP array for polyploids is the ability to accurately determine allele dosage (Voorrips et al. 2011; Schmitz Carley et al. 2017), but the cost of the array is determined by sales volume and can be prohibitively expensive for genomic selection in small breeding programs or minor crops. GBS methods achieve low per-sample costs by pooling many samples into one library for sequencing, but much higher read depth (per sample-marker combination) is needed for accurate genotype assignment in tetraploids compared to heterozygous diploids. By comparing GBS with KASP markers in potato, Uitdewilligen et al. (2013) recommended 60–80X read depth to differentiate the three heterozygous genotypes, which agrees well with theoretical calculations (Endelman, unpublished). For genomic selection, a pressing question is whether paying for more sequencing to improve estimates of allele dosage provides a return on investment, in terms of prediction accuracy and ultimately genetic gain. A completely general answer may be elusive due to complex interactions between GBS method, population, and phenotype, but our finding that “diploidization” of the marker data consistently reduced prediction accuracy in the F1 populations (by 0.13 on average) highlights the need for further research.
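A binomial model of read counts gives a rough idea of why tetraploid dosage calling needs high read depth. The sketch below is not the unpublished calculation cited above. It assumes that B-allele reads at a marker follow Binomial(depth, dosage/4), with no sequencing error, allelic bias, or overdispersion, and calls dosage by maximum likelihood. The function names are hypothetical.

```python
from math import comb


def call_dosage(b_reads, depth):
    """Maximum-likelihood dosage (0-4) from the number of B-allele reads.

    Assumes reads are Binomial(depth, dosage / 4) with no sequencing error
    or allelic bias; ties go to the lower dosage.
    """
    def likelihood(dosage):
        f = dosage / 4
        return comb(depth, b_reads) * f ** b_reads * (1 - f) ** (depth - b_reads)

    return max(range(5), key=likelihood)


def prob_correct_call(true_dosage, depth):
    """Probability that call_dosage recovers the true dosage at a given read depth."""
    f = true_dosage / 4
    return sum(comb(depth, x) * f ** x * (1 - f) ** (depth - x)
               for x in range(depth + 1)
               if call_dosage(x, depth) == true_dosage)


# Probability of a correct call for the three heterozygous dosages vs. depth.
for depth in (10, 20, 40, 80):
    print(depth, [round(prob_correct_call(d, depth), 3) for d in (1, 2, 3)])
```

Under these idealized assumptions, the probability of a correct heterozygote call rises steadily with depth. Real GBS data, with errors and uneven coverage, would likely need more depth than this model suggests.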

In plant breeding, both the total genotypic value and additive value are relevant for selection. For many crops, the unit of commercial production (i.e., inbred, F1 hybrid, or vegetative clone) is the same genotype evaluated by the breeder, and therefore selection should be based on total genotypic value. When selecting new parents, however, only the additive value should be considered because non-additive effects are less efficiently transmitted to progeny. We have demonstrated the feasibility of this paradigm for potato using the G+GG+D model, although questions remain regarding its optimal implementation. For many issues, further progress will require larger populations genotyped with less ascertainment bias.


Acknowledgments

Financial support was provided by Potatoes USA, the USDA National Institute of Food and Agriculture, Award Number 2014-67013-22418, and USDA Hatch Projects 1002731 and 1013047.

Author contributions

JBE designed the study. JBE and CASC analyzed the data and drafted the manuscript. All authors contributed germplasm, data, or financial resources.

Literature Cited

Amadeu, R. R., C. Cellon, J. W. Olmstead, A. A. Garcia, M. F. Resende et al., 2016 AGHmatrix: R package to construct relationship matrices for autotetraploid and diploid species: a blueberry example. Plant Genome 9: 3.

Annichiarico, P., N. Nazzicari, X. Li, Y. Wei, L. Pecetti et al., 2015 Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics 16: 1020.

Bernardo, R., 1994 Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34: 20–25.

Bourke, P. M., R. E. Voorrips, R. G. F. Visser, and C. Maliepaard, 2015 The double-reduction landscape in tetraploid potato as revealed by a high-density linkage map. Genetics 201: 853–863.


Bradshaw, J. E., D. Todd, and R. N. Wilson, 2000 Use of tuber progeny tests for genetical studies as part of a potato (Solanum tuberosum subsp. tuberosum) breeding programme. Theor. Appl. Genet. 100: 772–781.

Brown, J., and P. D. S. Caligari, 1989 Cross prediction in a potato breeding programme by evaluation of parental material. Theor. Appl. Genet. 77: 246–252.

Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel, 2009 ASReml-R reference manual Version 3. Queensland Department of Primary Industries and Fisheries, Brisbane.

Clark, S. A., J. M. Hickey, H. D. Daetwyler, and J. H. J. van der Werf, 2012 The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet. Sel. Evol. 44: 4.

da Silva, W. L., J. Ingram, C. A. Hackett, J. J. Coombs, D. Douches et al., 2017 Mapping loci that control tuber and foliar symptoms caused by PVY in autotetraploid potato (Solanum tuberosum L.). G3 (Bethesda) 7: 3587–3595.

Damesa, T. M., J. Möhring, M. Worku, and H. P. Piepho, 2017 One step at a time: Stage-wise analysis of a series of experiments. Agron. J. 109: 845–857.

Dekkers, J. C. M., 2007 Prediction of response to marker-assisted and genomic selection using selection index theory. J. Anim. Breed. Genet. 124: 331–341.

Estaghvirou, S. B. O., J. O. Ogutu, T. Schulz-Streeck, C. Knaak, M. Ouzunova et al., 2013 Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics 14: 860.


Felcher, K. J., J. J. Coombs, A. N. Massa, C. N. Hansey, J. P. Hamilton et al., 2012 Integration of two diploid potato linkage maps with the potato genome sequence. PLoS ONE 7: e36347.

Fisher, R. A., 1941 Average excess and average effect of a gene substitution. Ann. Eugen. 11: 53–63.

Frederick, C. M., 2017 Exploration of novel phenotyping techniques and identification of quantitative trait loci for chip processing potatoes (PhD thesis). University of Wisconsin, Madison.

Gallais, A., 2003 Quantitative genetics and breeding methods in autopolyploid plants. INRA, Paris.

Gopal, J., 1998 General combining ability and its repeatability in early generations of potato breeding programmes. Potato Res. 41: 21–28.

Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397.

Habyarimana, E., B. Parisi, and G. Mandolino, 2017 Genomic prediction for yields, processing and nutritional quality traits in cultivated potato (Solanum tuberosum L.). Plant Breeding 136: 245–252.

Hamilton, J. P., C. N. Hansey, B. R. Whitty, K. Stoffel, A. N. Massa et al., 2011 Single nucleotide polymorphism discovery in elite North American potato germplasm. BMC Genomics 12: 302.


Hayes, B. J., P. M. Visscher, and M. E. Goddard, 2009 Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. Camb. 91: 47–60.

Henderson, C. R., 1975 Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423–447.

Henderson, C. R., 1985 Best linear unbiased prediction of non-additive genetic merits in noninbred populations. J. Anim. Sci. 60: 111–117.

Jiang, Y., and J. C. Reif, 2015 Modeling epistasis in genomic selection. Genetics 201: 759–768.

Kempthorne, O., 1955a The theoretical values of correlations between relatives in random mating populations. Genetics 40: 153–167.

Kempthorne, O., 1955b The correlation between relatives in a simple autotetraploid population. Genetics 40: 168–174.

Kempthorne, O., 1955c The correlations between relatives in random mating populations. Cold Spring Harb. Symp. Quant. Biol. 20: 60–78.

Kempthorne, O., 1957 An introduction to genetic statistics. John Wiley & Sons, New York.

Kerr, R. J., L. Li, B. Tier, G. W. Dutkowski, and T. A. McRae, 2012 Use of the numerator relationship matrix in genetic analysis of autopolyploid species. Theor. Appl. Genet. 124: 1271–1282.

Kruijer, W., M. P. Boer, M. Malosetti, P. J. Flood, B. Engel et al., 2015 Marker-based estimation of heritability in immortal populations. Genetics 199: 379–398.


Legarra, A., 2016 Comparing estimates of genetic variance across different relationship models. Theor. Pop. Biol. 107: 26–30.

Li, X., Y. Wei, A. Acharya, J. L. Hansen, J. L. Crawford et al., 2015 Genomic prediction of biomass yield in two selection cycles of a tetraploid alfalfa breeding program. Plant Genome 8: 2.

Lopez-Pardo, R., L. Barandalla, E. Ritter, and J. I. Ruiz De Galarreta, 2013 Validation of molecular markers for pathogen resistance in potato. Plant Breeding 132: 246–251.

Maris, B., 1989 Analysis of an incomplete diallel cross among three ssp. tuberosum varieties and seven long-day adapted ssp. andigena clones of the potato (Solanum tuberosum L.). Euphytica 41: 163–182.

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.

Muñoz, P. R., M. F. R. Resende, S. A. Gezan, M. D. V. Resende, G. de los Campos et al., 2014 Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 198: 1759–1768.

Neele, A. E. F., H. J. Nab, and K. M. Louwes, 1991 Identification of superior parents in a potato breeding programme. Theor. Appl. Genet. 82: 264–272.

Nejati-Javaremi, A., C. Smith, and J. P. Gibson, 1997 Effect of total allelic relationship on accuracy of evaluation and response to selection. J. Anim. Sci. 75: 1738–1745.

Piepho, H. P., 2009 Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49: 1165–1176.


Plaisted, R. L., L. Sanford, W. T. Federer, A. E. Kehr, and L. C. Peterson, 1962 Specific and general combining ability for yield in potatoes. Am. Potato J. 39: 185–197.

R Development Core Team, 2015 R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.

Rak, K., and J. P. Palta, 2015 Influence of mating structure on agronomic performance, chip fry color, and genetic distance among biparental tetraploid families. Am. J. Potato Res. 92: 518–535.

Schmitz Carley, C. A., J. J. Coombs, D. S. Douches, P. C. Bethke, J. P. Palta et al., 2017 Automated tetraploid genotype calling by hierarchical clustering. Theor. Appl. Genet. 130: 717–726.

Slater, A. T., N. O. I. Cogan, J. W. Forster, B. J. Hayes, and H. D. Daetwyler, 2016 Improving genetic gain with genomic selection in autotetraploid potato. Plant Genome 9: 3.

Slater, A. T., G. M. Wilson, N. O. I. Cogan, J. W. Forster, and B. J. Hayes, 2014 Improving the analysis of low heritability complex traits for enhanced genetic gain in potato. Theor. Appl. Genet. 127: 809–820.

Smith, A. B., B. R. Cullis, and A. Gilmour, 2001 The analysis of crop variety evaluation data in Australia. Aust. N. Z. J. Stat. 43: 129–145.

Su, G., O. F. Christensen, T. Ostersen, M. Henryon, and M. S. Lund, 2012 Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS ONE 7: e45293.


Sverrisdóttir, E., S. Byrne, H. E. R. Sundmark, H. Ø. Johnsen, H. G. Kirk et al., 2017 Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing. Theor. Appl. Genet. 130: 2091–2108.

Tai, G. C. C., 1976 Estimation of general and specific combining abilities in potato. Can. J. Genet. Cytol. 18: 463–470.

Uitdewilligen, J. G. A. M. L., A. M. A. Wolters, B. B. D’hoop, T. J. A. Borm, R. G. F. Visser et al., 2013 A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato. PLoS ONE 8: e62355.

van Berloo, R., R. C. B. Hutten, H. J. van Eck, and R. G. F. Visser, 2007 An online potato pedigree database resource. Potato Res. 50: 45–57.

VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423.

Vitezica, Z. G., L. Varona, and A. Legarra, 2013 On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics 195: 1223–1230.

Voorrips, R. E., G. Gort, and B. Vosman, 2011 Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics 12: 172.

Vos, P. G., J. G. A. M. L. Uitdewilligen, R. E. Voorrips, R. G. F. Visser, and H. J. van Eck, 2015 Development and analysis of a 20K SNP array for potato (Solanum tuberosum): an insight into the breeding history. Theor. Appl. Genet. 128: 2387–2401.


Wang, Y., L. B. Snodgrass, P. C. Bethke, A. J. Bussan, D. G. Holm et al., 2017 Reliability of measurement and genotype × environment interaction for potato specific gravity. Crop Sci. 57: 1–7.

Wientjes, Y. C. J., P. Bijma, J. Vandenplas, and M. P. L. Calus, 2017 Multi-population genomic relationships for estimating current genetic variances within and genetic correlations between populations. Genetics 207: 503–515.

Wricke, G., and W. E. Weber, 1986 Quantitative genetics and selection in plant breeding. Walter de Gruyter & Co., Berlin.

Xu, S., 2013 Mapping quantitative trait loci by controlling polygenic background effects. Genetics 195: 1209–1222.

Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010 Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42: 565–569.


FIGURE CAPTIONS

Figure 1 Comparison of off-diagonal elements for the additive covariance matrices estimated

from markers (G) vs. pedigree (A), for the training population of 571 clones. The dashed line is

the linear regression using all pairs of clones (G = 0.66A − 0.06, R² = 0.41), and the solid line is

the regression (G = 0.79A − 0.09, R² = 0.51) when distant relationships (A < 0.05) are excluded.

Figure 2 Comparison of off-diagonal elements for the digenic dominance (D) vs. additive (G)

covariance matrices estimated from markers, for the training population of 571 clones.

Figure 3 Comparing goodness-of-fit in the training population for different genetic covariance

structures (see Table 1), using the Akaike Information Criterion (AIC) relative to a baseline

model with independent clone effects.

Figure 4 Partitioning of genetic variance in the training population of 571 clones, for five

different covariance structures (see Table 1). Variance symbols are Va = additive, Vaa = additive ×

additive epistasis, Vd = digenic dominance, and Vr = residual genetic variance (for independent

clone effects).

Figure 5 Prediction accuracy (left) and bias (right) when using all clones with pedigree depth <

12 to predict clones with pedigree depth ≥ 12.


TABLES

Table 1 Covariance structures for the total genotypic value (g) in Eq. 20.

Model | var[g]
A | I Vr + A Va
G | I Vr + G Va
G+GG | I Vr + G Va + (G#G) Vaa
G+D | I Vr + G Va + D Vd
G+GG+D | I Vr + G Va + (G#G) Vaa + D Vd
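Assembling var[g] for the models in Table 1 can be sketched with plain Python lists. The additive × additive covariance is taken as the element-wise (Hadamard) product G#G without any rescaling, which is an assumption. The function names are hypothetical. The A row is obtained by passing A in place of G with Vaa = 0.

```python
def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-sized matrices, written G#G in Table 1."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add_scaled(total, mat, scale):
    """Return total + scale * mat."""
    return [[t + scale * m for t, m in zip(rt, rm)] for rt, rm in zip(total, mat)]


def var_g(G, Vr, Va, Vaa=0.0, D=None, Vd=0.0):
    """var[g] = I Vr + G Va + (G#G) Vaa + D Vd (the G+GG+D row of Table 1).

    Setting Vaa = 0 and/or omitting D gives the G, G+GG, and G+D rows.
    """
    n = len(G)
    cov = [[Vr if i == k else 0.0 for k in range(n)] for i in range(n)]  # I Vr
    cov = add_scaled(cov, G, Va)
    cov = add_scaled(cov, hadamard(G, G), Vaa)
    if D is not None:
        cov = add_scaled(cov, D, Vd)
    return cov


G = [[1.0, 0.5], [0.5, 1.0]]
D = [[1.0, 0.2], [0.2, 1.0]]
print(var_g(G, Vr=1.0, Va=2.0, Vaa=3.0, D=D, Vd=4.0))
# diagonal 10.0; off-diagonal approximately 2.55 (= 0.5*2 + 0.25*3 + 0.2*4)
```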

Table 2 Parentage, population size, and number of polymorphic markers in unselected F1 populations.

Population | Mother (A90) | Father (A90) | <G90> | Pop Size | No. Markers
W12011 | W6360-1rus (0.01) | Silverton Russet (0.01) | 0.059 | 58 | 1580
W12012 | W8736-6rus (0.06) | Silverton Russet (0.01) | 0.062 | 55 | 1744
W12060 | Russet Norkotah (0.01) | Canela Russet (0.01) | 0.062 | 65 | 1376
W9817 | Liberator (0.16) | W4013-1 (0.12) | 0.070 | 76 | 2311
W10010 | Tundra (0.31) | Bannock Russet (0.01) | 0.066 | 48 | 1629
W×L | Wauseon (0.20) | Lenape (0.25) | 0.070 | 167 | 1999

A90 = 90th percentile of pedigree relationship with the training population. G90 = 90th percentile of G coefficient between an F1 individual and the TP. <G90> is the average G90 for the F1 population.

Table 3 Variance parameter estimates for the total genotypic value (Vg) and genotype × year effect (Vgy).

Variance Component | Yield (Mg² ha⁻²) | Specific Gravity | Fry Color (L*²)
Vg | 88 | 26.5 × 10⁻⁶ | 5.1
Vgy | 33 | 4.7 × 10⁻⁶ | 1.4


Table 4 Prediction accuracy in unselected F1 populations.

Population | Trait (Reliabilityᵃ) | G model accuracy | G+GG+D accuracy | Gdip model accuracy
W12011 | SpGr (0.74) | 0.63 | 0.61 | 0.36
W12012 | Yield (0.66) | 0.31 | 0.31 | 0.06
W12012 | SpGr (0.70) | 0.27 | 0.32 | 0.16
W12060 | SpGr (0.84) | 0.25 | 0.29 | 0.13
W9817 | Yield (0.82) | 0.06 | 0.06 | 0.08
W10010 | Yield (0.82) | 0.12 | 0.11 | 0.14
W×L | Yield (0.65) | 0.34 | 0.34 | 0.16
W×L | SpGr (0.85) | 0.33 | 0.33 | 0.19

ᵃ Mean reliability of the validation data. Gdip = G matrix based on diploidized marker data. SpGr = specific gravity.

[Figure 1: scatter plot of off-diagonal elements, G (vertical axis, −0.50 to 1.00) vs. A (horizontal axis, 0.00 to 1.00)]

[Figure 2: scatter plot of off-diagonal elements, D (vertical axis, −0.25 to 1.00) vs. G (horizontal axis, −0.50 to 1.00)]

[Figure 3: AIC relative to the baseline model (vertical axis, −50 to 0) for models A, G, G+GG, G+D, and G+GG+D; panels: Yield, Specific Gravity, Fry Color]

[Figure 4: proportion of genetic variance (vertical axis, 0.00 to 1.00) for models A, G, G+GG, G+D, and G+GG+D, partitioned into Va, Vaa, Vd, and Vr; panels: Yield, Specific Gravity, Fry Color]

[Figure 5: accuracy (left, vertical axis 0.2 to 0.6) and regression coefficient (right, vertical axis 0.5 to 1.1) for models A, G, G+GG, G+D, and G+GG+D; panels: Yield, SpGr, Fry Color]

File S1: Tables S1–S2, Figures S1–S3, and Supporting Methods

Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato

Endelman et al.

Table S1 Number of new vs. returning clones by year in the training population.

Year | No. new clones | No. returning clones
2012 | 78 | 0
2013 | 59 | 45
2014 | 82 | 58
2015 | 43 | 56
2016 | 128 | 52
2017 | 181 | 67

Table S2 Genetic variance estimates (and SE) for the G+GG+D model, as a proportion of total genetic variance.

Component | Yield | Specific Gravity | Fry Color
Va | 0.45 (0.13) | 0.20 (0.10) | 0.45 (0.17)
Vaa | 0.03 (0.24) | 0.51 (0.22) | 0.44 (0.23)
Vd | 0.07 (0.10) | 0.00 (NAᵃ) | 0.00 (NA)
Vr | 0.45 (0.20) | 0.29 (0.18) | 0.12 (0.17)

ᵃ REML solution on the boundary.

Figure S1 Distribution of pedigree depth for the training population of 571 clones. Pedigree depth is the maximum number of generations to a founder.

[Figure S1: histogram of number of clones (vertical axis) vs. pedigree depth, 1 to 13 (horizontal axis)]


Figure S2 Comparison of the observed (red circle) vs. expected (black line) frequency of heterozygotes under random mating, for the 3895 SNPs used in this study. For allele frequency p (q = 1 − p), the heterozygote frequency at panmictic equilibrium is 4p³q + 6p²q² + 4pq³ for tetraploids compared with 2pq for diploids (Gallais 2003).

[Figure S2: frequency of heterozygous genotypes (vertical axis, 0.0 to 1.0) vs. allele frequency (horizontal axis, 0.0 to 1.0); expected curves for Tetraploid and Diploid]
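The expected curves in Figure S2 follow from binomial genotype frequencies at panmictic equilibrium. The short sketch below evaluates the caption's formulas and computes an observed heterozygote frequency from one SNP's dosages. The function names are illustrative.

```python
def expected_het_tetraploid(p):
    """4p^3 q + 6p^2 q^2 + 4p q^3, which equals 1 - p^4 - q^4."""
    q = 1 - p
    return 4 * p**3 * q + 6 * p**2 * q**2 + 4 * p * q**3


def expected_het_diploid(p):
    """2pq for a diploid at Hardy-Weinberg equilibrium."""
    return 2 * p * (1 - p)


def observed_het_tetraploid(dosages):
    """Fraction of clones with dosage 1, 2, or 3 at one SNP."""
    return sum(1 for x in dosages if 0 < x < 4) / len(dosages)


print(expected_het_tetraploid(0.5), expected_het_diploid(0.5))
# 0.875 0.5
```

At intermediate allele frequencies, most tetraploid clones are therefore heterozygous at a SNP (87.5% at p = 0.5, vs. 50% for a diploid).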

Figure S3 Effect of population size (N) on prediction accuracy within the training set. The points at N = 403 correspond to using all clones with pedigree depth < 12 to predict clones with pedigree depth ≥ 12, using the G+GG+D model. The points at N = 100, 200, 300 are the mean accuracy for 200 random subsets, and the error bars show ± 1 standard error.

[Figure S3: accuracy (vertical axis, 0.2 to 0.6) vs. training population size (horizontal axis, 100 to 400) for Yield, Specific Gravity, and Fry Color]

Supporting Methods

Normal Equations for the Additive Effects

TakingthederivativeofEq.2withrespecttoanarbitraryallelet,andsettingtheresult

equaltozero:

0 = −2%&'&(&)&*+,'()* − -.' + .( + .) + .*01()*

− 2%&2&'&)&*[,2')* − (.2 + .' + .) + .*)]2)*

− 2%&2&(&'&*+,2('* − -.2 + .( + .' + .*012(*

− 2%&2&(&)&'+,2()' − -.2 + .( + .) + .'012()

(S1)

Becausegenotypicvalueisunchangeduponpermutationoftheindices,eachofthetermsin

Eq.S1isidentical,whichleadstotheresult:

0 =%&(&)&*+,'()* − -.' + .( + .) + .*01()*

(S2)

RearrangingEq.S2,andusingtheidentity∑ &( = 1( ,leadsto

%&(&)&*,'()* = .' + 3%&(.((()*

(S3)

MultiplyingEq.S3bypt,summingovert,andusingtheinterchangeabilityoftheindices,

generates

$$
\sum_{tjkl} p_t p_j p_k p_l \, z_{tjkl} = 4\sum_j p_j \alpha_j \tag{S4}
$$

From the definition of $z$ (Eq. 1), the left side of Eq. S4 is zero, so the right side is also zero, i.e., $\sum_j p_j \alpha_j = 0$. This completes the proof of Eq. 3.
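The additive result can be checked numerically. The Python sketch below assumes that Eq. 2 is the frequency-weighted least-squares criterion whose derivative appears in Eq. S1, and that $z_{ijkl} = G_{ijkl} - \mu$ with $\mu = \sum p_i p_j p_k p_l G_{ijkl}$ (our reading of Eq. 1, which is not reproduced here). It draws hypothetical allele frequencies and random, permutation-invariant genotypic values for an illustrative three-allele locus; the helper names `random_genotypic_values` and `additive_effects` are hypothetical. It then confirms Eq. 3 and the identity $\alpha_t = \sum_{jkl} p_j p_k p_l z_{tjkl}$, which follows from Eq. S3 once Eq. 3 holds.

```python
import itertools
import numpy as np

PLOIDY = 4

def random_genotypic_values(n_alleles, rng):
    """Hypothetical genotypic values G_ijkl, unchanged by permuting the allele indices."""
    by_multiset = {}
    G = np.empty((n_alleles,) * PLOIDY)
    for idx in itertools.product(range(n_alleles), repeat=PLOIDY):
        key = tuple(sorted(idx))              # same allele multiset -> same genotype
        if key not in by_multiset:
            by_multiset[key] = rng.normal()
        G[idx] = by_multiset[key]
    return G

def additive_effects(G, p):
    """Frequency-weighted least-squares additive effects (assumed form of Eq. 2)."""
    n = len(p)
    w = np.einsum("i,j,k,l->ijkl", p, p, p, p)   # weight of ordered genotype ijkl
    z = G - np.sum(w * G)                         # center on the population mean
    genotypes = list(itertools.product(range(n), repeat=PLOIDY))
    X = np.array([np.bincount(g, minlength=n) for g in genotypes], dtype=float)
    y = np.array([z[g] for g in genotypes])
    sw = np.sqrt(np.array([w[g] for g in genotypes]))
    # Row ijkl of X holds allele dosages, so X @ alpha = alpha_i + alpha_j + alpha_k + alpha_l
    alpha, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return z, alpha

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(3))                     # hypothetical allele frequencies
z, alpha = additive_effects(random_genotypic_values(3, rng), p)

print(np.dot(p, alpha))                                             # Eq. 3: close to 0
print(np.allclose(alpha, np.einsum("tjkl,j,k,l->t", z, p, p, p)))   # Eq. S3 with Eq. 3
```

Because each ordered genotype contributes one row weighted by $p_i p_j p_k p_l$, the gradient of this weighted regression with respect to $\alpha_t$ is exactly the four-term sum in Eq. S1.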


Normal Equations for the Digenic Dominance Effects

Taking the derivative of Eq. 5 with respect to the digenic effect $\beta_{st}$ of an arbitrary allele pair $(s,t)$, setting the result equal to zero, and using the interchangeability of the indices generates

$$
0 = \sum_{kl} p_k p_l \left[ \delta_{stkl} - (\beta_{st} + \beta_{sk} + \beta_{sl} + \beta_{tk} + \beta_{tl} + \beta_{kl}) \right] \tag{S5}
$$

Upon rearranging and using the interchangeability of indices, Eq. S5 becomes

$$
\sum_{kl} p_k p_l \, \delta_{stkl} = \beta_{st} + 2\sum_k \beta_{sk} p_k + 2\sum_k \beta_{tk} p_k + \sum_{kl} \beta_{kl} p_k p_l \tag{S6}
$$

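The left side of this equation equals $\bar{G}_{st\cdot\cdot} - \mu - \alpha_s - \alpha_t$, where $\bar{G}_{st\cdot\cdot} = \sum_{kl} p_k p_l G_{stkl}$ (using Eq. 3). Multiplying by $p_t$ and summing over $t$ (at fixed $s$), the left side becomes zero, because $\sum_t p_t \bar{G}_{st\cdot\cdot} - \mu = \alpha_s$ by Eqs. S3 and 3 and $\sum_t p_t \alpha_t = 0$. The right side becomes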

$$
0 = 3\sum_k \beta_{sk} p_k + 3\sum_{kl} \beta_{kl} p_k p_l \tag{S7}
$$

Multiplying Eq. S7 by $p_s$ and summing over $s$ leads to

$$
0 = 6\sum_{sk} \beta_{sk} p_s p_k \tag{S8}
$$

Eq. S8 shows that the second term in Eq. S7 is zero, which implies that the first term is also zero, i.e., $\sum_k p_k \beta_{sk} = 0$ for every allele $s$. This completes the proof of Eq. 6.
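Eq. 6 can be checked the same way. The Python sketch below assumes that Eq. 5 is the frequency-weighted least-squares fit of the dominance deviation $\delta_{ijkl}$ on the six digenic effects of each genotype, and that $\delta$ is the residual of the additive least-squares fit to $G_{ijkl} - \mu$. The three-allele locus, random genotypic values, and helper names (`random_genotypic_values`, `weighted_lstsq`, `digenic_effects`) are hypothetical. It fits both stages and then checks that $\sum_k p_k \beta_{sk}$ vanishes for every $s$.

```python
import itertools
import numpy as np

PLOIDY = 4

def random_genotypic_values(n, rng):
    """Hypothetical genotypic values, unchanged by permuting the four allele indices."""
    cache, G = {}, np.empty((n,) * PLOIDY)
    for idx in itertools.product(range(n), repeat=PLOIDY):
        key = tuple(sorted(idx))
        if key not in cache:
            cache[key] = rng.normal()
        G[idx] = cache[key]
    return G

def weighted_lstsq(design, target, weight):
    """Minimize sum(weight * (target - design @ coef)**2)."""
    sw = np.sqrt(weight)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], target * sw, rcond=None)
    return coef

def digenic_effects(G, p):
    n = len(p)
    genotypes = list(itertools.product(range(n), repeat=PLOIDY))
    w = np.array([np.prod(p[list(g)]) for g in genotypes])       # p_i p_j p_k p_l
    g_vals = np.array([G[g] for g in genotypes])
    z = g_vals - np.sum(w * g_vals)                               # G - mu
    dosage = np.array([np.bincount(g, minlength=n) for g in genotypes], dtype=float)
    alpha = weighted_lstsq(dosage, z, w)                          # additive fit
    delta = z - dosage @ alpha                                    # dominance deviation
    # One column per unordered allele pair (s <= t); each genotype contributes six pairs
    pairs = [(s, t) for s in range(n) for t in range(s, n)]
    col = {pair: c for c, pair in enumerate(pairs)}
    X = np.zeros((len(genotypes), len(pairs)))
    for r, g in enumerate(genotypes):
        for a, b in itertools.combinations(g, 2):
            X[r, col[(min(a, b), max(a, b))]] += 1
    coef = weighted_lstsq(X, delta, w)                            # digenic fit
    beta = np.zeros((n, n))
    for (s, t), c in zip(pairs, coef):
        beta[s, t] = beta[t, s] = c
    return beta

rng = np.random.default_rng(7)
p = rng.dirichlet(np.ones(3))          # hypothetical allele frequencies
beta = digenic_effects(random_genotypic_values(3, rng), p)
print(beta @ p)                        # Eq. 6: sum_k p_k beta_sk for each s, close to 0
```

Fitting the two stages sequentially mirrors the text, where Eq. 5 is written in terms of the dominance deviation left after the additive fit.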


Derivation of Eq. 12

For a bi-allelic locus with alleles A and B at frequencies $p$ and $q = 1 - p$, the constraint Eq. 6 is a system of two equations in three digenic dominance parameters:

$$
\begin{aligned}
p\beta_{AA} + q\beta_{AB} &= 0 \\
p\beta_{AB} + q\beta_{BB} &= 0
\end{aligned}
\tag{S9}
$$

To uniquely determine the solution, we introduce the parameter $\beta \equiv \beta_{AA} - 2\beta_{AB} + \beta_{BB}$, which in combination with Eq. S9 produces the linear system:

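$$
\begin{pmatrix} 1 & -2 & 1 \\ p & q & 0 \\ 0 & p & q \end{pmatrix}
\begin{pmatrix} \beta_{AA} \\ \beta_{AB} \\ \beta_{BB} \end{pmatrix}
=
\begin{pmatrix} \beta \\ 0 \\ 0 \end{pmatrix}
\tag{S10}
$$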

Using Gaussian elimination, Eq. S10 can be reduced to

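$$
\begin{pmatrix} 1 & -2 & 1 \\ 0 & q + 2p & -p \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} \beta_{AA} \\ \beta_{AB} \\ \beta_{BB} \end{pmatrix}
=
\begin{pmatrix} \beta \\ -p\beta \\ -pq\beta \end{pmatrix}
\tag{S11}
$$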

and back-substitution produces Eq. 12: $\beta_{AA} = q^2\beta$, $\beta_{AB} = -pq\beta$, and $\beta_{BB} = p^2\beta$.
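Eq. S10 can be solved with exact rational arithmetic to confirm the back-substituted solution $\beta_{AA} = q^2\beta$, $\beta_{AB} = -pq\beta$, $\beta_{BB} = p^2\beta$. The Python sketch below uses `fractions.Fraction` and applies textbook Gaussian elimination with back-substitution to the augmented matrix of Eq. S10. The function name `solve_s10` and the A/B allele labels are illustrative assumptions. Its first elimination step reproduces the second row of Eq. S11; later row operations need not match those used to reach Eq. S11, but the solution is the same.

```python
from fractions import Fraction

def solve_s10(p, beta):
    """Gaussian elimination plus back-substitution on the augmented system S10."""
    q = 1 - p
    # Rows of [coefficient matrix | right-hand side]; unknowns (beta_AA, beta_AB, beta_BB)
    M = [[Fraction(1), Fraction(-2), Fraction(1), beta],
         [p, q, Fraction(0), Fraction(0)],
         [Fraction(0), p, q, Fraction(0)]]
    n = 3
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)  # first nonzero pivot
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # Back-substitution on the upper-triangular system
    x = [Fraction(0)] * n
    for r in reversed(range(n)):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x

p, beta = Fraction(3, 10), Fraction(2)
print(solve_s10(p, beta))  # [Fraction(49, 50), Fraction(-21, 50), Fraction(9, 50)]
```

With $p = 3/10$ and $\beta = 2$, the output equals $(q^2\beta, -pq\beta, p^2\beta)$ with $q = 7/10$.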

Derivation of Eq. 13

Using Eq. 12, the total digenic dominance $v_X$ (the sum over the six allele pairs in the genotype) for a clone with dosage $X$ of the B allele is

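$$
\begin{aligned}
v_0 &= 6\beta_{AA} = 6q^2\beta \\
v_1 &= 3\beta_{AA} + 3\beta_{AB} = (3q^2 - 3pq)\beta \\
v_2 &= \beta_{AA} + 4\beta_{AB} + \beta_{BB} = (q^2 - 4pq + p^2)\beta \\
v_3 &= 3\beta_{AB} + 3\beta_{BB} = (-3pq + 3p^2)\beta \\
v_4 &= 6\beta_{BB} = 6p^2\beta
\end{aligned}
\tag{S12}
$$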

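Replacing all instances of $q$ with $1 - p$, Eq. S12 becomes

$$
\begin{aligned}
v_0 &= (6p^2 - 12p + 6)\beta \\
v_1 &= (6p^2 - 9p + 3)\beta \\
v_2 &= (6p^2 - 6p + 1)\beta \\
v_3 &= (6p^2 - 3p)\beta \\
v_4 &= 6p^2\beta
\end{aligned}
\tag{S13}
$$

which is equivalent to Eq. 13.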
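The pair counts behind Eq. S12 follow a simple pattern: a clone with $X$ copies of B has $\binom{4-X}{2}$ AA pairs, $X(4-X)$ AB pairs, and $\binom{X}{2}$ BB pairs. The Python sketch below sums the Eq. 12 effects over those pairs and compares the result with the polynomial coefficients of Eq. S13 using exact fractions. The names `v_from_pairs`, `v_closed_form`, and `S13`, and the A/B labels, are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def v_from_pairs(X, p, beta):
    """Eq. S12: total digenic dominance of a clone with X copies of B."""
    q = 1 - p
    b_AA, b_AB, b_BB = q * q * beta, -p * q * beta, p * p * beta   # Eq. 12
    n_AA, n_AB, n_BB = comb(4 - X, 2), (4 - X) * X, comb(X, 2)     # pair counts
    return n_AA * b_AA + n_AB * b_AB + n_BB * b_BB

# Eq. S13 coefficients (p^2, p, constant) for dosages X = 0..4
S13 = {0: (6, -12, 6), 1: (6, -9, 3), 2: (6, -6, 1), 3: (6, -3, 0), 4: (6, 0, 0)}

def v_closed_form(X, p, beta):
    a2, a1, a0 = S13[X]
    return (a2 * p * p + a1 * p + a0) * beta

p, beta = Fraction(1, 4), Fraction(1)
print([v_from_pairs(X, p, beta) for X in range(5)])
# [Fraction(27, 8), Fraction(9, 8), Fraction(-1, 8), Fraction(-3, 8), Fraction(3, 8)]
```

Weighting $v_X$ by the binomial dosage frequencies $\binom{4}{X} q^X p^{4-X}$ (B at frequency $q$) gives a population mean of zero, consistent with Eq. 6.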