S4.3. Association Mapping, Breeder Ready markers and Genomic Selection


Raman Babu, Jill Cairns, Gary Atlin, PH Zaidi, Pichet Grudloyma, George Mahuku, Sudha K Nair, Natalia Palacios, Kevin Pixley, Jose Crossa, BM Prasanna and all the Breeders of CIMMYT

Outline

Association Mapping for Drought Tolerance – CIMMYT's experience

● Are there large effect genes for GY_stress?

● Should we bother about “rare alleles” that have large effects?

Association Mapping for Disease Resistance

Association Mapping (Candidate-gene based) for Carotenoids

'Breeder-ready' markers for disease resistance and ProA

Integrating Genomic Selection in the breeding pipeline

LD and Population structure in DTMA-AM panel based on 55K SNPs

Average distance between two markers is 55 kb and average EM-R² is 0.26

LD in the DTMA panel is low, and the panel is hence suitable for association mapping

Population structure is 'mild', and association results were corrected for structure (through PCA) and kinship by MLM

[Figure: population structure plot of the DTMA-AM panel; group labels: LaPosta Seq, CIM-CALI, DTP]
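The LD figure quoted above (an average r² between neighbouring markers) can be illustrated with a minimal sketch. It assumes inbred lines with SNPs coded 0/1 and uses the squared Pearson correlation as r²; the EM-based estimate named on the slide also handles heterozygotes and is not reproduced here. The function names and toy genotypes are hypothetical.

```python
import numpy as np

def pairwise_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNP columns coded 0/1."""
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return float("nan")  # monomorphic marker: r2 is undefined
    r = np.corrcoef(a, b)[0, 1]
    return float(r * r)

def mean_adjacent_r2(genotypes):
    """Average r2 over neighbouring marker pairs (rows = lines, columns = SNPs in map order)."""
    g = np.asarray(genotypes, dtype=float)
    values = [pairwise_r2(g[:, j], g[:, j + 1]) for j in range(g.shape[1] - 1)]
    return float(np.nanmean(values))

# Toy panel: 6 inbred lines x 3 SNPs (made-up data, not from the DTMA panel)
toy = np.array([
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
])
print(mean_adjacent_r2(toy))
```

A low average r² at a given physical distance means LD decays quickly, so a significant SNP points to a narrow genomic interval.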

DTMA-AM panel and 55K SNPs can identify large effect genes – 1. Grain Color

Psy1: R² = 37%

SNP with largest significant association with grain color located within one of the exons of Phytoene Synthase1 (psy1) on chr.6

Phenotype coding: 92 yellow lines (1), 186 white lines (0)

DTMA-AM panel and 55K SNPs can identify large effect genes – 2. QPM

Opaque2 (at 7.01): R² = 16%

Ask2: R² = 8%

Besides opaque-2 and ask-2, several minor QTL regions influencing kernel modification and tryptophan content were identified that overlap with previously reported regions…

Phenotype coding: 10 QPM lines (1), 268 normal lines (0)

Mapping Drought Tolerance

Strategy: GWAS; AM-panel of ~300 inbreds, test-crossed (TCd) with CML312

Known DT sources: La Posta Sequia C7, DTP C9, MBR, etc.

Phenotyping: 10 locations – Stress & Optimal

Heritabilities: Kiboko-10-Late (0.64), M-10-Tlalti (0.67), Thailand-10 (0.49), M-Tlalti-09 (0.54), Zim-10 (0.22); across locations: 0.35

Phenotype used in GWAS: Combined BLUPs of TC_GY under stress, corrected for anthesis date

Genotyping: Genome-wide, high density markers – 55K SNPs and GBS markers (500K SNPs)

Statistical Model: General Linear Model (PCA correction) and Mixed Linear Model (PCA + Kinship – Q+K)
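The General Linear Model with PCA correction listed above can be sketched as a single-marker scan: each SNP is tested after the leading principal components of the genotype matrix have been fitted as covariates. This is a simplified illustration in plain NumPy, not the software used for the analysis; the function names, the number of PCs and the simulated data are assumptions, and the kinship term of the Mixed Linear Model (Q+K) is not included.

```python
import numpy as np

def top_pcs(genotypes, n_pcs):
    """Scores of the leading principal components of the centred genotype matrix."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def glm_scan(genotypes, phenotype, n_pcs=2):
    """Return (marker R2, F statistic) per SNP, with PCs as structure covariates."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = len(y)
    base = np.column_stack([np.ones(n), top_pcs(g, n_pcs)])  # intercept + PCs
    rss0 = rss(base, y)                                      # structure-only model
    tss = float(((y - y.mean()) ** 2).sum())
    results = []
    for j in range(g.shape[1]):
        full = np.column_stack([base, g[:, j]])              # add the tested SNP
        rss1 = rss(full, y)
        df_resid = n - full.shape[1]
        f_stat = (rss0 - rss1) / (rss1 / df_resid)
        results.append(((rss0 - rss1) / tss, f_stat))
    return results

# Simulated example: 120 lines, 30 SNPs, SNP 5 carries a real effect
rng = np.random.default_rng(0)
geno = rng.integers(0, 2, size=(120, 30)).astype(float)
pheno = 3.0 * geno[:, 5] + rng.normal(size=120)
scan = glm_scan(geno, pheno, n_pcs=2)
best = max(range(len(scan)), key=lambda j: scan[j][0])
print(best, round(scan[best][0], 2))
```

The marker R² here is the extra share of phenotypic variance explained once structure is accounted for, which is the sense in which R² values are reported per SNP in association tables.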

12 -15 significant genomic regions identified for DT

[Figure: genome-wide association plot; R² labels on the significant peaks: 7.3%, 6.2%, 5.7%, 7.0%, 5.7%, 5.5%, 5.1%, 5.8%, 4.9%, 5.1%]

Only 147 SNPs (~15 genomic regions) had R² values of more than 5%

Significant Genomic Regions associated with TC_GY_Stress

SNP Chr Position P value MAF R2 (%) Effect (kg/ha) Candidate Gene

SYN39332 10 142655119 9.62E-06 0.49 7.6 29.3 Starch Synthase

PZE-107042377 7 72216348 1.49E-05 0.32 7.3 35.6

Myb family transcription factor-related protein

PZE-108046876 8 77237318 2.33E-05 0.35 6.9 -34.4

PZE-110029252 10 50842298 7.77E-05 0.40 6.8 -26.3

PZE-107032355 7 45011599 6.62E-05 0.38 6.2 38.6

SYN37988 2 146399448 3.84E-05 0.26 6.1 49.5 TSA: Zea mays contig27975, mRNA sequence

PZE-101090321 1 80757998 1.67E-04 0.46 4.9 -31.4

PZE-109041733 9 62608362 1.35E-04 0.42 4.9 25.5

PZE-104047052 4 78536398 1.21E-04 0.32 4.7 30.1

Average GY of the stress trials – 1.3 t/ha

Heritability across locations – 0.32

Rare Alleles with Large Effects

Columns: Marker, Chr, Position, P, Minor Allele, MAF, Average for DD, Average for Dd, Average for dd, Effect (kg/ha), and the number of DD, Dd and dd lines. Rows with no Dd lines omit the Dd average.

PZE-104042524 4 67259441 3.70E-03 A 0.14 1499.5 1414.1 1382.8 116.6 7 59 188

PZE-101066401 1 49827350 1.54E-02 A 0.04 1487.8 1391.5 96.3 10 0 257

SYN36769 4 4914023 7.83E-03 A 0.06 1479.7 1355.9 1391.7 88.0 14 2 249

SYN26515 1 63053588 2.42E-03 A 0.06 1472.1 1253.0 1391.0 81.0 15 1 251

SYN1035 5 5786027 3.24E-02 G 0.07 1463.1 1395.2 1390.3 72.8 16 2 240

PZE-110053356 10 100124247 4.70E-03 A 0.11 1331.7 1381.2 1401.8 -70.1 17 24 224

PZE-104113536 4 194565443 7.31E-04 A 0.13 1334.7 1282.7 1405.3 -70.6 33 3 231

PZE-102096857 2 107898705 3.07E-03 G 0.08 1329.7 1400.4 -70.6 20 0 238

PZE-109074314 9 116545321 2.21E-04 G 0.08 1328.8 1400.3 -71.5 20 0 246

PZE-105127701 5 183968110 2.24E-04 A 0.07 1323.4 1400.7 -77.4 19 0 249

PZE-102121069 2 162773047 8.92E-04 G 0.06 1320.1 1400.2 -80.1 17 0 250

PZE-106064720 6 116886483 1.09E-02 A 0.07 1318.2 1307.6 1401.3 -83.1 17 1 248

SYN14434 2 15813081 1.40E-03 A 0.08 1314.6 1433.4 1398.6 -84.0 19 1 221

PZE-106056703 6 107499158 1.98E-04 G 0.06 1310.2 1307.6 1401.2 -91.0 15 1 251

SYN8914 3 194356323 4.25E-03 G 0.08 1307.9 1381.6 1400.1 -92.2 9 25 226
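The MAF and Effect columns of the table above can be re-derived from the genotype counts and class means. The sketch below assumes that D is the minor allele, that the last three columns are the numbers of DD, Dd and dd lines, and that the effect is the difference between the two homozygote means; these readings are inferred from the numbers rather than stated on the slide, and the function names are hypothetical.

```python
def minor_allele_freq(n_DD, n_Dd, n_dd):
    """Frequency of allele D from genotype counts (two alleles per line)."""
    total_alleles = 2 * (n_DD + n_Dd + n_dd)
    return (2 * n_DD + n_Dd) / total_alleles

def homozygote_contrast(mean_DD, mean_dd):
    """Difference in mean grain yield (kg/ha) between the two homozygous classes."""
    return mean_DD - mean_dd

# First row of the table, PZE-104042524: counts 7 / 59 / 188, means 1499.5 and 1382.8
maf = minor_allele_freq(7, 59, 188)           # 73 / 508
effect = homozygote_contrast(1499.5, 1382.8)

print(round(maf, 2))     # 0.14, as tabulated
print(round(effect, 1))  # 116.7; the table shows 116.6, presumably from unrounded means
```

With only 7 to 20 lines carrying the rare homozygote, each class mean rests on few observations, which is why the slides follow up with validation in BC-NILs and DH lines.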

Rare Alleles with Positive Large Effects

PZE-104042524 (chr. 4, position 67259441, allele A); GY (kg/ha) by line:

90[SPMATC4/P500(SELY)]#-B-4-2-B-B 1483.8

DTPYC9-F46-3-9-1-1-B-B 1461.7

La Posta Seq C7-F125-2-1-1-2-B-B-B 1436.8

La Posta Seq C7-F103-2-2-2-1-B-B-B 1626.9

La Posta Seq C7-F180-3-1-1-1-B-B-B 1593.5

La Posta Seq C7-F96-1-1-1-B-B 1482.1

DTPYC9-F72-1-2-1-1-B-B 1411.4

PZE-101066401 (chr. 1, position 49827350, allele A); GY (kg/ha) by line:

POB.502 c3 F2 10-3-2-1-BBBBBB-B 1429.0

POB.502c3 F2 9-14-1-2-B-B-B-B 1482.4

CLQ-RCYQ28=(CLQ6502*CLQ6601)-B-34-2-2-B*6-B 1476.1

DTPWC9-F24-4-3-1-B-B-B 1554.0

DTPWC9-F115-1-4-1-1-B-B-B 1483.4

DTPWC9-F103-2-1-1-1-B-B-B 1469.6

DTPYC9-F46-3-4-1-1-B-B-B 1535.9

DTPYC9-F46-3-9-1-1-B-B 1461.7

DTPYC9-F46-1-2-1-2-B-B 1606.1

DTPYC9-F13-2-1-1-2-B-B 1379.5

SYN36769 (chr. 4, position 4914023, allele A); GY (kg/ha) by line:

[SYN-USAB2/SYN-ELIB2]-12-1-1-2-BBB 1497.3

[CML440/[[[K64R/G16SR]-39-1/[K64R/G16SR]-20-2]-5-1-2-B*4/CML390]-B-39-2-B-4-#-1-B//ZM303c1-243-3-B-1-1-B]-9-1

[[KILIMA ST94A]-30/MSV-03-1-10-B-1-B-B-1xP84c1 F27-4-1-6-B-5-B] F8-3-2-2-1 x G16SeqC1F47-2-1-2-1-BBBB-B-xP84c1 F26-2-2-6-B-3-B]-3-1-B/CML395]-1-1 1419.5

[Pob.SEW-HG"B"c0F39-1-1-1-1xMBR C5 Bc F22-2-1-4-B-B-B-B-2-2-B-B-B/CML442]-1-1 1333.2

[Cuba/Guad C3 F34-2-1-1-B-B-B x CML264Q]-1-1 1376.4

CML-322 1428.5

DTPWC9-F115-1-4-1-1-B-B-B 1483.4

DTPWC9-F31-1-3-1-1-B-B-B 1492.0

DTPWC9-F67-1-2-1-2-B-B-B 1506.5

DTPWC9-F104-5-4-1-1-B-B-B 1454.3

DTPYC9-F46-3-4-1-1-B-B-B 1535.9

DTPYC9-F46-3-9-1-1-B-B 1461.7

DTPYC9-F46-1-2-1-1-B-B 1552.7

DTPYC9-F46-1-2-1-2-B-B 1606.1

DTPWC9-F67-2-2-1-B-B-B 1568.7

SYN26515 (chr. 1, position 63053588, allele A); GY (kg/ha) by line:

CML444-B 1501.9

S87P69Q(SIYF) 109-1-1-4-B 1518.4

CLQ-RCYQ40 = (CML165 x CLQ-6203)-B-9-1-1-B*8 1509.3

CML497=[CL-00331*v]-3-B-3-2-1-B*6 1443.1

DTPWC9-F115-1-4-1-1-B-B-B 1483.4

DTPWC9-F109-2-6-1-1-B-B-B 1467.8

DTPWC9-F67-1-2-1-2-B-B-B 1506.5

DTPWC9-F104-5-4-1-1-B-B-B 1454.3

DTPWC9-F128-1-1-1-1-B-B-B 1390.9

DTPYC9-F143-5-4-1-2-B-B-B 1442.1

DTPYC9-F143-1-6-1-B-B 1414.6

DTPWC9-F67-2-2-1-B-B-B 1568.7

Rare Alleles with Negative Large Effects

SYN8914 (chr. 3, position 194356323, allele G):

[CML198/ZSR923S4BULK-2-2-X-X-X-X-1-BB]-3-3-1-1-2-B*7 1196.562

S99TLWQ-B-8-1-B*5 1245.322

4001 1292.372

CLA41 1389.549

(A.I.Z.T.V.C. 20-3-1-1-2-B-B x A.I.Z.T.V.C.PR93A-17-1-3-1-1-B-B)-B-14TL-1-3-B-B 1252.957

[G16SeqC1F47-2-1-2-1-BBBB-B-xP84c1 F27-4-1-6-B-5-B] F23-1-3-1-1 x [KILIMA ST94A]-30/MSV-03-2-10-B-1-B-B-xP84c1 F27-4-1-6-B-5-B]-2-1-B/CML395]-1-1 1270.448

POB.501c3 F2 13-8-2-1-BBBB 1383.065

CL-RCY031=(CL-02410*CML-287)-B-9-1-1-2-B*7 1433.411

PZE-106056703 (chr. 6, position 107499158, allele G):

[CML444/CML395//DTPWC8F31-4-2-1-6]-2-1-1-1-B*4 1331.949

[(CML395/CML444)-B-4-1-3-1-B/CML395//DTPWC8F31-1-1-2-2]-5-1-2-2-BB 1346.993

CML 384xMBR/MDR C3 Bc F58-2-1-3-B-B-B-B-3-1-B-B-BB-B 1344.688

MBR C6 Bc F280-2-B-#-1-1-B-B-B-B-B-B 1256.056

[G16SeqC1F47-2-1-2-1-BBBB-B-xP84c1 F27-4-1-6-B-5-B] F23-2-1-2-3 x P43C9-1-1-1-1-1-BBBB-1-xP84c1 F26-2-2-6-B-3-B]-2-1-B/CML395]-1-1 1258.137

[M37W/ZM607#bF37sr-2-3sr-6-2-X]-8-2-X-1-BB-B-xP84c1 F27-4-3-3-B-1-B] F29-1-2-1-6 x [KILIMA ST94A]-30/MSV-03-2-10-B-1-B-B-xP84c1 F27-4-1-6-B-5-B]3-1-2-B/CML442]-1-1 1190.413

[Pob.SEW-HG"B"c0F39-1-1-1-1xMBR C5 Bc F22-2-1-4-B-B-B-B-2-2-B-B-B/CML442]-1-1 1333.209

[MBR Et/MBR Bc C1 F4-1-1-3-B-B-B-Bx1760B B1 Bco x Comp.-B-1-1-1-1-B-B-B/CML395]-1-1 1354.8

[CML 329/MBR C3 Am F103-1-1-2-B-B x CML486]-1-1 1346.293

[(87036/87923)-X-800-3-1-X-1-B-B-1-1-1-B-B-xP84c1 F26-2-2-4-B-2-B] F47-3-1-1-3 x M37W/ZM607#bF37sr-2-3sr-6-2-X]-8-2-X-1-BB-B-xP84c1 F27-4-3-3-B-1-B]-3-2-B x P33c3 F64-1-1-4-BB]-1-1 1295.392

P390amC3/285x287 F73-3-2-3xMIRTC5Am F96-1-1-1-3-1)-1-1-B 1399.776

CL-G1837=G18SeqC2-F141-2-2-1-1-1-2-##-2-B*4 1275.469

CML421=P31DMR#1-55-2-3-2-1-B*18-B 1252.385

DTPWC9-F73-2-1-1-1-B-B-B 1329.332

SYN14434 (chr. 2, position 15813081, allele A):

P501SRc0-F2-47-3-2-1-B-B 1268.038

[CML444/CML395//DTPWC8F31-1-1-2-2-BB]-4-2-2-2-2-BB-B 1267.39

[CML444/CML395//DTPWC8F31-1-1-2-2-BB]-4-2-2-2-1-BB-B 1408.142

02SADVL2B-#-17-1-1-B 1419.196

[CML440/[[[K64R/G16SR]-39-1/[K64R/G16SR]-20-2]-5-1-2-B*4/CML390]-B-39-2-B-4-#-1-B//ZM303c1-243-3-B-1-1-B]-9-1

[CML144/[CML144/CML395]F2-5sx]-1-3-1-3-B*4 1397.445

[CML198/ZSR923S4BULK-2-2-X-X-X-X-1-BB]-3-3-1-1-2-B*7 1196.562

[CML144/[CML144/CML395]F2-8sx]-1-1-1-B*5 1171.759

[CML144/[CML144/CML395]F2-8sx]-1-2-3-2-B*5 1203.073

CLA222 1337.217

[M37W/ZM607#bF37sr-2-3sr-6-2-X]-8-2-X-1-BB-B-xP84c1 F27-4-3-3-B-1-B] F29-1-2-1-6 x [KILIMA ST94A]-30/MSV-03-2-10-B-1-B-B-xP84c1 F27-4-1-6-B-5-B]3-1-2-B/CML442]-1-1 1190.413

[Cuba/Guad C3 F34-2-1-1-B-B-B x CML264Q]-1-1 1376.38

CA00344 / PAC777F2-6-1-1-BB-B-B-BB 1321.875

P44 C10MH8-30-4-B-4-1-B-B-B-B- 1329.436

P147-#136-5-1-B-1-BBB 1356.154

CLQ-6211=P62QC6HC13-1-3-BBB-6-B-7-6-BBBB-7-9-B-B-B-B 1311.726

CML269=P25STEC1F13-6-1-1-#-BBB-f-##-B*6-B 1407.819

CL-02143 P21C6S1MH247-5-B-1-1-2-BBB-1-##-B*10 1471.196

CML421=P31DMR#1-55-2-3-2-1-B*18-B 1252.385

DTPWC9-F66-2-1-1-2-B-B-B 1291.755

Rare Alleles – Candidate genes

Candidate genes identified by rare alleles, each followed by its putative function (At = Arabidopsis thaliana):

Upstream of a DNA-binding/membrane-bound receptor: many membrane-bound receptors, such as Rpk1, have been shown to confer DT in At.

DEAD-box helicase domain: less documented; helicase domain proteins in At shown to contribute to DT in CK-dependent pathways.

Related to ethylene insensitive2: cross-talk between ethylene signalling and drought response pathways is well documented.

Extensin-like cell wall protein: glycoproteins rich in hydroxyproline, first studied in tracheophytes, which can withstand severe stress.

Annexin IV domain: role of annexins in DT is well documented in At.

Peroxidase protein: known for involvement in DT in rice, At, etc.

Major Facilitator Superfamily (MFS): transporters play key roles in different stress conditions.

Aminotransferase: overexpression of aspartate aminotransferase along with other genes has been patented for DT.

CREB domain-containing TF: known component in stress-related pathways.

Ubiquitin subgroup: known component in drought tolerance pathways.

Traits for which AM analysis has been accomplished in the DTMA-AM panel

GY_Stress_BLUPs

MSV

GLS

NDVI

Senescence

SPAD

Canopy senescence

ASI

Root traits (Shovelomics!)

Anaerobic Emergence

% reduction in shoot weight under waterlogged conditions

% reduction in root weight under waterlogged conditions

Following up the AM results

● BC-NILs for validation of important genomic regions

● Identify MARS progenies with contrasting genotypes and check the drought phenotypes

● Genotype the DH lines from DT x Normal crosses and check the phenotypes

● Introgress validated genomic regions into tester lines through MAS

Artesian – Recent Drought Tolerant Hybrid from Syngenta

[Figure: Base Hybrid vs. Artesian Hybrid]

Artesian – how was it developed?

Strategy: Association mapping (candidate gene-based); BC-MAS of 4-8 QTLs

DT source germplasm: CML333, CML322, Cateto SP VII (Brazil), Confite Morocho AYA 38 (Peru), or Tuxpeno VEN 692 (Venezuela)

AM-panel: 575 inbreds – 47 different testers (mostly S-2 and S-3 TCHs)

Phenotyping: 4 locations (Colorado, California and Chile) – Optimal & stress; yield reduction under stress was 40-60% from optimal

Genotyping: 85 polymorphisms (corresponding to 57 candidate genes) and 149 random polymorphisms across 600 lines – in total only ~250 markers

Effect sizes of identified genomic regions: 60 to 650 kg/ha

Minimum P value of any significant region: 0.0001

Significant Conclusions – DT mapping

LD in the DTMA-AM panel is low and hence conducive to association mapping

55K genotype data is capable of identifying large effect genes

'Reasonably large effect' genomic regions (10-15) do exist for GY_Stress and co-locate with genes previously implicated in DT in At, rice and maize

9 genomic regions that had robust p-values together explained 35% of phenotypic variance for GY_Stress_Combined

Lines with multiple donor segments identified for validation and introgression

Two key genes in the carotenoid biosynthetic pathway identified

Lycopene epsilon cyclase (Harjes et al. 2008; Science)

Hydroxylase (CrtRB-1/HydB-1) (Yan et al. 2010; Nature Genetics)

Association Mapping based on candidate gene sequences

Breeder-ready markers developed and routinely being used in the H+ breeding program of CIMMYT for CrtRB1 and LcyE

AM leads to identification of key genes and polymorphisms

Polymorphisms validated in diverse tropical genetic backgrounds and breeder-ready high throughput markers developed

Routine use of markers and selection of favorable genotypes in H+ breeding program

MAS for LcyE + MAS for HydB + deep orange ears = High ProvitA maize!

Allele Mining for CrtRB1 (HydB1) across various Association Mapping Panels

Panel: Genotypes with Fav. allele/Total; White (W)/Yellow (Y)

CIMMYT_Syngenta CAM Panel: 24*/501 – 16 new sources; All Yellow (Y)

IMAS: 16/430 (6 from ARC, SA and 3 from KARI); 14-W and 2-Y

Subtropical Collections: 71/1131 – many new sources; 24-W and 47-Y

ADP lines of SYNGENTA: 19/122 – “1” and 23/122 – “H”

PS: * out of 24, 8 were previously fixed for the fav. allele of CrtRB1 in the H+ breeding program through MAS

MSV – Harare 2010 data (Heritability = 0.79)

Association Mapping for Disease Resistance

GLS-combined analysis (Heritability = 0.6)

Columns: Marker, Chr, Position, Corr/Trend P value, Corr/Trend Chi-square, FDR (false discovery rate), R2 (%), Minor Allele, Minor Allele Freq., Major Allele, Trait average for Minor allele, Trait average for Major allele

PZE-101093951 1 86065123 4.50E-08 29.92 0.002 11.5 A 0.34 G 1.83 3.08

PZE-101098418 1 92204598 6.47E-07 24.77 0.011 9.5 G 0.36 A 2.15 2.95

SYN36281 1 187128850 1.93E-06 22.67 0.019 8.7 G 0.11 A 2.21 2.72

PZE-101094082 1 86384320 2.45E-06 22.21 0.020 8.5 G 0.39 A 1.99 3.10

PZE-104024779 4 28770811 4.04E-06 21.24 0.022 8.2 A 0.15 G 2.26 2.73

PZE-101098295 1 91837910 5.31E-06 20.72 0.022 8.0 A 0.33 G 2.12 2.92

PZE-108038832 8 59948253 5.57E-06 20.63 0.021 7.9 A 0.47 G 2.63 2.70

PZE-103070254 3 111066077 6.36E-06 20.38 0.022 7.8 G 0.24 A 3.07 2.52

PZE-101094056 1 86365447 6.37E-06 20.37 0.021 7.8 G 0.50 A 2.16 3.16

PZE-108039819 8 62905375 7.00E-06 20.19 0.022 7.8 G 0.46 A 2.62 2.69

PZE-101090488 1 80905706 7.02E-06 20.19 0.020 7.8 A 0.29 G 1.83 3.00

PZE-104016598 4 16339600 7.13E-06 20.16 0.019 7.8 A 0.33 C 2.21 2.87

PZE-102080891 2 64845534 7.21E-06 20.14 0.019 7.7 A 0.28 C 2.19 2.84

PZE-101098960 1 93244458 7.76E-06 20.00 0.019 7.7 A 0.40 G 3.11 2.36
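The FDR column in the table above is the kind of quantity produced by a multiple-testing adjustment of the per-marker P values. The slide does not say which procedure was used; the sketch below shows the widely used Benjamini-Hochberg adjustment as an assumption, with a hypothetical function name and made-up P values.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # raw adjustment: p * m / rank, for P values sorted ascending
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest P value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Made-up P values for four markers
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q_values)
```

Because the adjustment scales with the number of tests, the same raw P value gives a larger FDR on a 55K SNP scan than on a small candidate-gene set.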

MSV – Harare 2010 data (Heritability = 0.79)

Significant chromosomal regions (P < 1.0E-05) associated with MSV resistance (Har-2010 data) based on DTMA-AM panel and 55K genotype data (MLM)

[Figure: R/S labels for the significant regions on Chr.1 (Msv1), Chr.3, Chr.4 and Chr.8; marker labels: PZE0175698629, PZE01132220936, PZA02090_1, PZA03527_1, PZA02614_2, PZA00529_4, PZA03651_1, PHM14104_23]

Validation of AM regions and Breeder-ready markers for MSV

PZE0186365075

csu1138_4

PZA00944_1

PZE0195148805

PZE01101110579

PZE01111422982

PZE0175698629

PZE-101093951

Candidate SNPs for MSV

Genomic Selection


Trait RR-BLUP B-LASSO RP

Grain Color (Binary) 0.8 0.82 0.87

QPM (Binary) 0.96 0.96 0.95

ProA - Quant 0.39 0.42 0.6

GLS - Quant 0.52 0.53 0.55

MSV - Quant 0.62 0.61 0.60

GY - Quant 0.34 0.35 0.36

Using 55K SNP data across 300 individuals in the AM panels
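The RR-BLUP column above refers to ridge-regression prediction of breeding values from all markers at once. The following is a minimal sketch of that idea, assuming the tabulated values are cross-validated correlations between predicted and observed phenotypes (the slide does not state this). The fixed ridge parameter, the function names and the simulated data are illustrative assumptions; in practice the shrinkage is usually estimated from variance components.

```python
import numpy as np

def rr_blup_fit(markers, phenotype, ridge):
    """Ridge regression on centred markers, solved in the dual (lines x lines) form."""
    x_mean = markers.mean(axis=0)
    y_mean = phenotype.mean()
    xc = markers - x_mean
    gram = xc @ xc.T
    alpha = np.linalg.solve(gram + ridge * np.eye(len(phenotype)), phenotype - y_mean)
    effects = xc.T @ alpha  # marker effects: X'(XX' + ridge*I)^-1 y
    return x_mean, y_mean, effects

def rr_blup_predict(model, markers):
    """Predicted genetic values (GEBVs on the phenotype scale) for new lines."""
    x_mean, y_mean, effects = model
    return y_mean + (markers - x_mean) @ effects

def cv_accuracy(markers, phenotype, ridge=1.0, folds=5, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(phenotype))
    cors = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        model = rr_blup_fit(markers[train], phenotype[train], ridge)
        pred = rr_blup_predict(model, markers[test])
        cors.append(np.corrcoef(pred, phenotype[test])[0, 1])
    return float(np.mean(cors))

# Simulated example: 200 lines, 100 markers, 10 of them with real effects
rng = np.random.default_rng(1)
sim_markers = rng.integers(0, 2, size=(200, 100)).astype(float)
true_effects = np.zeros(100)
true_effects[:10] = rng.normal(size=10)
sim_pheno = sim_markers @ true_effects + rng.normal(scale=0.5, size=200)
print(round(cv_accuracy(sim_markers, sim_pheno), 2))
```

The dual form solves a system the size of the number of lines rather than the number of markers, which is convenient when roughly 300 lines are genotyped with 55K SNPs.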

Integrating GS in breeding pipeline (DH + off-season nursery + GS)

Season: Activity

Summer:

• Grow 50-100 F2s/BC1s

• Select 50 plants/cross and cross to haploid inducer

Winter:

• Chromosome doubling of putative haploids to get DHs

• Seed chip (one kernel/DH) 2500 – 5000 DH kernels

• Discard disease-susceptible DHs through specific marker screening

• Select DHs through GY-GEBVs and seed increase (top 5-10%)

Summer:

• Test cross GEBV-selected DH lines to one/two testers

• Yield trials of DH-TCHs
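The winter selection step in the pipeline above (keep the top 5-10% of DH lines by GY-GEBV) amounts to ranking and truncation. A minimal sketch, with a hypothetical function name and made-up GEBVs:

```python
def select_top_fraction(gebvs, fraction):
    """Return the names of the lines with the highest GEBVs, keeping the given fraction."""
    n_keep = max(1, round(len(gebvs) * fraction))
    ranked = sorted(gebvs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n_keep]]

# Made-up GEBVs for 20 DH lines; keep the top 10%
toy_gebvs = {f"DH{i}": float(i) for i in range(20)}
print(select_top_fraction(toy_gebvs, 0.10))

# Number of lines advanced for the range given on the slide
print(round(2500 * 0.05), round(5000 * 0.10))  # 125 and 500 lines
```

At the stated scale, 2500 to 5000 chipped DH kernels reduce to roughly 125 to 500 lines for test-crossing in the following summer.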

THANKS

% phenotypic variance explained by structure alone in the DTMA-AM panel

Trait/Location: % phenotypic variance explained by 10 PCs

GY_Stress_Combined_BLUP: 15.8

MSV (Harare2010+09-1): 38.2

GLS_Combined: 25.1

GLS_Har_10: 8.8

GLS_Kakamega: 11.5

GLS_Columbia_Scatalina: 30.2

GLS_San Pedro_Mexico: 29.6

GLS_Acatec_Mex: 23

GLS_Paraguacito_Columbia: 6.7
