
eXtreme Array Mapping and Haplotype Analysis Using Arrays

Justin Borevitz, Salk Institute, naturalvariation.org

Talk Outline

• Bulk segregant mapping of Mendelian mutations

• eXtreme Array Mapping of QTL: Kas x Col RILs and simulations

• Haplotype analysis: patterns, global variation, selection

Potential Deletions

False Discovery and Sensitivity

[Figure: real data vs. permuted data; SAM threshold on PM features only, set at 5% FDR.]

GeneChip features: 3806 SFPs, 89118 nonSFPs (Cereon marker accuracy: 100%)

Sequenced features: 817 (121 SFPs, 696 nonSFPs)

• Polymorphic: 340 (117 SFPs, 223 nonSFPs); sensitivity 34%

• Non-polymorphic: 477 (4 SFPs, 473 nonSFPs)

False discovery rate: 3% (4 of 121 SFP calls). Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40
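The sequence-validation counts form a 2x2 table (truly polymorphic vs. not, called SFP vs. not). The sketch below recomputes sensitivity, FDR and the independence test from those counts. It assumes a Pearson chi-square without Yates' continuity correction, which is the variant that gives 177.34 for these counts. The function name is illustrative.

```python
import math

def summarize(a, b, c, d):
    """Sensitivity, FDR and Pearson chi-square (no continuity correction, df = 1)
    for the 2x2 table [[a, b], [c, d]]:
    rows = polymorphic / non-polymorphic, columns = called SFP / called nonSFP."""
    n = a + b + c + d
    sensitivity = a / (a + b)   # SFP calls among truly polymorphic features
    fdr = c / (a + c)           # non-polymorphic features among SFP calls
    chisq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the chi-square upper tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chisq / 2))
    return sensitivity, fdr, chisq, p_value

sens, fdr, chisq, p = summarize(117, 223, 4, 473)
print(f"sensitivity={sens:.0%} FDR={fdr:.0%} chisq={chisq:.2f} p={p:.3e}")
```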

Observed t statistics vs. null (permuted) t statistics
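Comparing observed with permuted t statistics gives an FDR estimate at any threshold: the expected number of false calls (here the median count over label permutations) divided by the number of observed calls. This is a minimal SAM-style sketch. It uses a plain Welch t in place of SAM's moderated statistic, and the data layout and toy values are assumptions.

```python
import random
import statistics

def t_stat(x, y):
    """Welch two-sample t statistic for one array feature."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / (vx / len(x) + vy / len(y)) ** 0.5

def count_calls(table, n_a, threshold):
    """Features with |t| >= threshold; the first n_a columns are accession A."""
    return sum(abs(t_stat(row[:n_a], row[n_a:])) >= threshold for row in table)

def permutation_fdr(table, n_a, threshold, n_perm=50, seed=0):
    """SAM-style FDR: median calls under permuted array labels / observed calls."""
    rng = random.Random(seed)
    observed = count_calls(table, n_a, threshold)
    cols = list(range(len(table[0])))
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(cols)  # one relabeling of arrays, applied to every feature
        permuted = [[row[c] for c in cols] for row in table]
        null_counts.append(count_calls(permuted, n_a, threshold))
    if observed == 0:
        return 0.0, 0
    return min(1.0, statistics.median(null_counts) / observed), observed

# Toy data (invented): 200 features, 4 arrays per accession; the first 20 are "SFPs".
rng = random.Random(42)
table = [[rng.gauss(5.0 if f < 20 else 0.0, 1) for _ in range(4)] +
         [rng.gauss(0.0, 1) for _ in range(4)] for f in range(200)]
fdr, calls = permutation_fdr(table, n_a=4, threshold=5.0)
print(f"{calls} features called, estimated FDR {fdr:.1%}")
```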

Chip genotyping of a Recombinant Inbred Line

29 kb interval
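To genotype a RIL from the chip, each SFP feature can be assigned to whichever parent it resembles, and recombination breakpoints can then be placed where the genotype switches. The calling rule below is an assumption for illustration: nearest parental mean, then a 5-feature majority window to absorb single-feature errors. The talk does not describe its procedure, and the data are invented.

```python
def call_genotypes(ril, col_mean, ler_mean):
    """Per-feature call: 'C' if the RIL signal is closer to the Col mean, else 'L'."""
    return ["C" if abs(r - c) <= abs(r - l) else "L"
            for r, c, l in zip(ril, col_mean, ler_mean)]

def smooth(calls, window=5):
    """Majority vote in a centered window to suppress single-feature errors."""
    half = window // 2
    out = []
    for i in range(len(calls)):
        win = calls[max(0, i - half): i + half + 1]
        out.append("C" if win.count("C") >= win.count("L") else "L")
    return out

def breakpoints(calls, positions):
    """Midpoints between adjacent features whose smoothed genotypes differ."""
    return [(positions[i] + positions[i + 1]) / 2
            for i in range(len(calls) - 1) if calls[i] != calls[i + 1]]

# Toy data (invented): 20 SFP features 1 kb apart; Col hybridizes at ~10, Ler at ~6.
positions = [i * 1000 for i in range(20)]
col_mean, ler_mean = [10.0] * 20, [6.0] * 20
ril = [10.1, 9.8, 10.3, 9.9, 10.0, 6.0, 10.2, 9.7, 10.1, 9.9,
       6.1, 5.8, 6.2, 6.0, 5.9, 10.0, 6.1, 5.7, 6.3, 6.0]
smoothed = smooth(call_genotypes(ril, col_mean, ler_mean))
print(breakpoints(smoothed, positions))  # [9500.0]
```

The two isolated mismatches (features 5 and 15) are outvoted by their neighbours, so only the true switch survives.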

Potential Deletions

111 potential deletions, 45 confirmed by Ler sequence

• 23 (of 114) transposons

• Disease resistance (R) gene clusters

• Single R gene deletions

• Genes involved in secondary metabolism

• Unknown genes
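A single SFP usually reflects a local sequence difference. When most of the features in a gene lose signal together, a deletion of that gene is more likely. The rule below is a hypothetical sketch of that idea: flag a gene when at least 80% of its features, and at least 4, drop by 1 log2 unit or more. All cutoffs and data are invented.

```python
from collections import defaultdict

def potential_deletions(features, drop=1.0, min_features=4, min_fraction=0.8):
    """Genes in which most features hybridize much more weakly in the test accession.

    features: iterable of (gene_id, reference_log2, test_log2), one tuple per feature.
    A feature counts as 'low' if test_log2 <= reference_log2 - drop. All cutoffs
    are hypothetical defaults, not values from the talk.
    """
    low, total = defaultdict(int), defaultdict(int)
    for gene, ref, test in features:
        total[gene] += 1
        if test <= ref - drop:
            low[gene] += 1
    return sorted(g for g in total
                  if total[g] >= min_features and low[g] / total[g] >= min_fraction)

# Toy data (invented): gene A loses signal at every feature, gene B has one SFP,
# gene C has too few features to judge.
features = ([("A", 10.0, 7.0)] * 5 + [("B", 10.0, 7.0)] + [("B", 10.0, 9.9)] * 4
            + [("C", 10.0, 6.0)] * 2)
print(potential_deletions(features))  # ['A']
```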

Potential Deletions Suggest Candidate Genes

[Figure: FLOWERING1 QTL on chromosome 1 (position in bp), with the deletion of MAF1 marked.]

Flowering time QTL caused by a natural deletion in MAF1

[Figure: Venn diagram comparing EST deletions, all Kas x Col deletions and all Ler deletions among 19838 genes.]

Deletions between Accessions

Fast Neutron deletions

FKF1: 80 kb deletion, chromosome 1

cry2: 10 kb deletion, chromosome 1

Map bibb: 100 bibb mutant plants, 100 wild-type plants

bibb mapping

[Figure: ChipMap scan with AS1 marked.]

Bulk segregant mapping using chip hybridization

bibb maps to chromosome 2 near ASYMMETRIC LEAVES1
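In a bulk segregant scan, the mutant pool and the wild-type pool differ in allele frequency only near the mutation. The sketch below scales each SFP's pool difference by the parental difference, averages it over neighbouring markers, and reports the peak. It rests on several assumptions: intensity is linear in allele frequency, and the recessive mutation sits in parent A's background. Under those assumptions, the mutant pool is 100% parent A at the locus and the wild-type pool is 1/3 parent A (2/3 heterozygotes), so the expected score there is about 2/3. The data are invented.

```python
import math

def bsa_scan(positions, mut, wt, parent_a, parent_b, window=5):
    """Return (peak_position, smoothed_scores) for a bulk segregant scan.

    Per-SFP score = (mutant pool - wt pool) / (parent A - parent B):
    about 2/3 at a recessive locus in parent A's background, about 0 when unlinked.
    """
    raw = [(m - w) / (a - b) for m, w, a, b in zip(mut, wt, parent_a, parent_b)]
    half = window // 2
    smoothed = []
    for i in range(len(raw)):
        win = raw[max(0, i - half): i + half + 1]
        smoothed.append(sum(win) / len(win))
    peak = max(range(len(smoothed)), key=smoothed.__getitem__)
    return positions[peak], smoothed

# Toy data (invented): markers every 5 cM, signal peaking at 60 cM.
positions = [5 * i for i in range(21)]
score = [2 / 3 * math.exp(-abs(x - 60) / 20) for x in positions]
parent_a, parent_b = [10.0] * 21, [6.0] * 21
wt = [8.0] * 21
mut = [w + 4.0 * s for w, s in zip(wt, score)]
peak, smoothed = bsa_scan(positions, mut, wt, parent_a, parent_b)
print(peak)  # 60
```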

BIBB = ASYMMETRIC LEAVES1

Sequenced the AS1 coding region from bib-1… found a G -> A change that would introduce a stop codon in the MYB domain
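In the standard genetic code, tryptophan has a single codon, TGG, so the W49* allele is consistent with a G -> A change: either G in TGG mutated to A gives a stop codon. The helper below, a hypothetical name, enumerates the G -> A substitutions of a codon that create a stop.

```python
# Stop codons of the standard genetic code.
STOPS = {"TAA", "TAG", "TGA"}

def g_to_a_stops(codon):
    """(1-based position, mutant codon) for each G->A change that yields a stop."""
    hits = []
    for i, base in enumerate(codon):
        if base == "G":
            mutant = codon[:i] + "A" + codon[i + 1:]
            if mutant in STOPS:
                hits.append((i + 1, mutant))
    return hits

print(g_to_a_stops("TGG"))  # [(2, 'TAG'), (3, 'TGA')]
```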

[Figure: AS1 protein with the MYB domain; stop mutations bib-1 (W49*) and as1-101 (Q107*); bibb and as1 mutant phenotypes.]

AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64 cM

Other Mendelian mutations

aar21: arrhythmic; ein6: ethylene insensitive (no een?). Also aar90, aar60 and stamenstay

[Figure: Kas x Col RILs, all features; short pool minus tall pool. Log likelihood ratio along chromosomes 1 to 5 (cM), with the RED2 QTL marked.]

eXtreme Array Mapping

Red light QTL RED2 from 100 Kas/Col RILs

QTL likelihood model using bulk segregant analysis with SFP genotyping

[Figure: composite interval mapping of the RED2 QTL on chromosome 2; LOD (0 to 16) vs. position (0 to 100 cM).]

15 tallest RILs pooled vs. 15 shortest RILs pooled

Simulation Genotypes

[Figure: simulated genotypes (-1 to 1) along chromosomes 1 to 5 (cM).]

15 eXtreme RILs of 100; 2 QTL: chr2 (37% var), chr5 (13% var)
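This kind of simulation can be sketched as follows. Simulate RIL chromosomes as mosaics of the two parents, give each line a phenotype from two additive QTL explaining 37% and 13% of the variance, keep the 15 highest and 15 lowest of 100 lines, and compare their mean genotypes marker by marker. Several choices are hypothetical and not taken from the talk: the QTL positions, the chromosome lengths, Haldane's map function with the RIL expansion R = 2r/(1 + 2r), and the Markov-chain marker model.

```python
import math
import random

def ril_switch_prob(step_cm):
    """Chance that a selfed RIL changes parental origin between markers step_cm apart:
    Haldane recombination fraction r, expanded to R = 2r / (1 + 2r) for RILs."""
    r = 0.5 * (1 - math.exp(-2 * step_cm / 100))
    return 2 * r / (1 + 2 * r)

def simulate_chromosome(length_cm, rng, step_cm=1):
    """Genotypes (+1 or -1 for the two parents) at markers every step_cm."""
    p = ril_switch_prob(step_cm)
    g = [rng.choice((1, -1))]
    for _ in range(int(length_cm // step_cm)):
        g.append(-g[-1] if rng.random() < p else g[-1])
    return g

# Hypothetical layout: chromosome -> (length cM, QTL position cM, share of variance).
QTL = {"chr2": (80, 40, 0.37), "chr5": (100, 60, 0.13)}

def extreme_scan(n_lines=100, n_extreme=15, seed=1):
    """Mean genotype of the top n_extreme lines minus the bottom n_extreme, per marker."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - sum(v for _, _, v in QTL.values()))
    lines = []
    for _ in range(n_lines):
        genome = {c: simulate_chromosome(length, rng) for c, (length, _, _) in QTL.items()}
        # Genotypes are +/-1 with frequency 1/2, so an effect sqrt(v) explains a share v.
        pheno = sum(math.sqrt(v) * genome[c][pos] for c, (_, pos, v) in QTL.items())
        lines.append((pheno + rng.gauss(0, noise_sd), genome))
    lines.sort(key=lambda line: line[0])
    low, high = lines[:n_extreme], lines[-n_extreme:]
    scan = {}
    for c in QTL:
        n_markers = len(lines[0][1][c])
        scan[c] = [(sum(g[c][m] for _, g in high) - sum(g[c][m] for _, g in low)) / n_extreme
                   for m in range(n_markers)]
    return scan

scan = extreme_scan()
for c, values in scan.items():
    peak = max(range(len(values)), key=lambda m: abs(values[m]))
    print(c, "peak at", peak, "cM, difference", round(values[peak], 2))
```

Changing `n_lines` and `n_extreme` reproduces the other scenarios (e.g. 100 of 700), although F2 populations would need three genotype classes instead of two.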

Simulation Genotypes

100 eXtreme RILs of 700; 2 QTL

[Figure: simulated genotypes (-1 to 1) along chromosomes 1 to 5 (cM).]

Simulation Genotypes

50 eXtreme F2s of 500; 2 QTL

[Figure: simulated genotypes (-1 to 1) along chromosomes 1 to 5 (cM).]

Simulation Chip Noise

50 eXtreme F2s of 500; 2 QTL

[Figure: simulated genotypes with added chip noise (-1 to 1) along chromosomes 1 to 5 (cM).]

Simulation Likelihood

50 eXtreme F2s of 500; 2 QTL

[Figure: simulated log likelihood ratio along chromosomes 1 to 5 (cM).]
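The talk does not spell out its likelihood model. One simple version, offered here as an assumption, treats each marker's noisy pool difference as y_i = delta * w_i + e_i. In this model, w_i is the expected genotype correlation between the marker and a candidate QTL position q (1 - 2r in an F2), and e_i is Gaussian noise with known sigma. Maximizing over delta gives ln LR = (sum w*y)^2 / (2 sigma^2 sum w^2), and scanning q gives a LOD profile. The data below are invented.

```python
import math
import random

def linkage_weight(d_cm, population="f2"):
    """Expected correlation of additive genotype scores d_cm apart.

    F2: 1 - 2r with Haldane r. RIL: 1 - 2R with R = 2r / (1 + 2r).
    """
    r = 0.5 * (1 - math.exp(-2 * abs(d_cm) / 100))
    if population == "ril":
        r = 2 * r / (1 + 2 * r)
    return 1 - 2 * r

def lod_profile(positions, y, sigma, grid, population="f2"):
    """Single-QTL LOD at each grid position under the assumed Gaussian model."""
    lods = []
    for q in grid:
        w = [linkage_weight(p - q, population) for p in positions]
        sxy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * wi for wi in w)
        # Natural-log likelihood ratio at the MLE of delta, converted to log10 (LOD).
        lods.append(sxy * sxy / (2 * sigma ** 2 * sxx) / math.log(10))
    return lods

# Toy data (invented): pool difference 1.5 at a QTL at 50 cM, markers every 2 cM.
rng = random.Random(3)
positions = list(range(0, 101, 2))
y = [1.5 * linkage_weight(p - 50) + rng.gauss(0, 0.3) for p in positions]
grid = list(range(0, 101))
lods = lod_profile(positions, y, sigma=0.3, grid=grid)
best = max(range(len(grid)), key=lods.__getitem__)
print("peak", grid[best], "cM, LOD", round(lods[best], 1))
```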

Array Haplotyping

• Hybridize 48 arrays with 15 accessions

• ~300 ng DNA (DNeasy miniprep) from leaf tissue

• Overnight BioPrime Klenow labeling at 25 °C

• Accessions: "col", "lz", "ler", "bay", "shah", "cvi", "kas", "c24", "est", "kendl", "mt", "nd", "sorbo", "van", "ws2"

Linkage Disequilibrium explained

• 1 SNP: 2 haplotypes

• Mutation: 2 SNPs, 3 haplotypes

• Recombination: 2 SNPs, 4 haplotypes
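These steps can be quantified with the usual pairwise LD measures D, D' and r². While only three of the four possible haplotypes exist, D' = 1 (complete LD). Seeing all four gametes implies a past recombination, assuming no recurrent mutation, and D' drops below 1. This is the basis of the four-gamete test. The sample haplotypes below are invented.

```python
from collections import Counter

def ld_stats(haplotypes):
    """D, D', r^2 and number of distinct haplotypes for two biallelic sites.
    Each haplotype is a 2-character string, e.g. 'AB', 'Ab', 'ab'."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    a = sorted({h[0] for h in haplotypes})[0]   # reference allele at site 1
    b = sorted({h[1] for h in haplotypes})[0]   # reference allele at site 2
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    d = counts[a + b] / n - p_a * p_b
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom else 0.0
    return d, d_prime, r2, len(counts)

three = ["AB", "AB", "Ab", "Ab", "ab", "ab"]        # after mutation: 3 haplotypes
four = ["AB", "AB", "AB", "Ab", "ab", "ab", "aB"]   # after recombination: 4 haplotypes
print(ld_stats(three))
print(ld_stats(four))
```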

Sequence Variation at a Candidate Locus, LIGHT2

PHYB locus (6.5 kb)

I143 / L1072: mean = 8.4 ± 0.8 mm

L143 / V1072: mean = 10.2 ± 0.4 mm

Association testing: p < 0.01 (permutation testing)
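A permutation test for this kind of association shuffles phenotypes between the two haplotype groups and asks how often the shuffled mean difference is at least as large as the observed one. The sketch below is a generic two-sided version. The phenotype values in the usage line are invented, not the PHYB data.

```python
import random
import statistics

def permutation_p(group1, group2, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group1) - statistics.mean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n1]) - statistics.mean(pooled[n1:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # add-one so the estimate is never exactly 0

# Invented phenotypes (mm) for two haplotype groups of six accessions each.
short = [8.0, 8.5, 7.9, 8.8, 8.2, 9.0]
tall = [10.1, 10.5, 9.9, 10.3, 10.0, 10.4]
print(permutation_p(short, tall))
```

With six lines per group and complete separation, the smallest achievable p-value is about 2/924, so small samples limit how strong such evidence can be.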

[Figure: PHYB amino acid haplotypes at positions 143 and 1072 (Ler PHYB protein, 1172 aa) for Col, Ler, Uk-4, Sorbo, Tsu-1, Wei-0, Van-0, Ema-1, Cvi-0, Ts-1, Sf-0 and Se-0, grouped as taller vs. shorter.]

These polymorphisms are in complete LD.


Pairwise Correlation between and within replicates

Feature Density chr1

Diversity measure

LIGHT1 tstat and raw data


Array Haplotyping

• Inbred lines

• Low effective recombination due to partial selfing

• Extensive LD blocks

[Figure: haplotypes of Col, Ler, Cvi, Kas, Bay, Shah, Lz and Nd across ~500 kb of chromosome 1.]
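The loss of effective recombination under selfing can be put in numbers with a standard population-genetic approximation. The formula is an assumption here, since the talk gives none. With selfing rate s, the equilibrium inbreeding coefficient is F = s/(2 - s), and crossovers in homozygous individuals do not create new haplotypes, so the effective rate is roughly r(1 - F). The selfing rate s = 0.99 used below is only illustrative. It yields about a 50-fold reduction.

```python
def inbreeding_coefficient(s):
    """Equilibrium inbreeding coefficient F = s / (2 - s) for selfing rate s."""
    return s / (2 - s)

def effective_recombination(r, s):
    """Approximate effective recombination rate r * (1 - F) under partial selfing."""
    return r * (1 - inbreeding_coefficient(s))

for s in (0.0, 0.5, 0.9, 0.99):
    r_e = effective_recombination(1.0, s)
    print(f"s={s:.2f}: F={inbreeding_coefficient(s):.3f}, r_e/r={r_e:.4f}")
```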

Quantitative Trait Loci

Feature level model

FLC controls flowering time; difference detected in 3-day-old seedlings

Gene expression index that accounts for feature effects and polymorphisms
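The slide does not give the index's exact form. One common way to build a feature-level expression index is Tukey's median polish, as used in RMA summarization, which fits log intensity = overall + feature effect + sample effect. The sketch below uses that technique, not necessarily the author's, and treats features flagged as SFPs in a sample as missing. This way a hybridization difference caused by a polymorphism does not show up as lower expression. The table values are invented.

```python
import statistics

def median_polish(y, n_iter=10):
    """Tukey median polish of a features x samples table of log2 intensities.

    Cells set to None (e.g. features flagged as SFPs in that sample) are ignored.
    Returns (overall, feature_effects, sample_effects); the expression index of
    sample j is overall + sample_effects[j].
    """
    n_f, n_s = len(y), len(y[0])
    resid = [row[:] for row in y]
    overall, feat, samp = 0.0, [0.0] * n_f, [0.0] * n_s
    for _ in range(n_iter):
        for i in range(n_f):  # sweep row medians into feature effects
            vals = [v for v in resid[i] if v is not None]
            m = statistics.median(vals) if vals else 0.0
            feat[i] += m
            resid[i] = [None if v is None else v - m for v in resid[i]]
        m = statistics.median(samp)
        overall += m
        samp = [s - m for s in samp]
        for j in range(n_s):  # sweep column medians into sample effects
            vals = [resid[i][j] for i in range(n_f) if resid[i][j] is not None]
            m = statistics.median(vals) if vals else 0.0
            samp[j] += m
            for i in range(n_f):
                if resid[i][j] is not None:
                    resid[i][j] -= m
        m = statistics.median(feat)
        overall += m
        feat = [f - m for f in feat]
    return overall, feat, samp

# Invented additive table: 4 features x 3 samples; feature 3 is an SFP in sample 2.
a, b = [0.0, 1.0, -1.0, 2.0], [0.0, 0.5, -0.5]
y = [[8.0 + ai + bj for bj in b] for ai in a]
y[3][2] = None
overall, feat, samp = median_polish(y)
print(overall, samp)  # 8.5 [0.0, 0.5, -0.5]
```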

PAG1 down-regulated in Cvi

PALE GREEN1 knockout has long hypocotyl in red light

Review

• Single Feature Polymorphisms (SFPs) can be used to identify recombination breakpoints and potential deletions

• Bulk segregant mapping and eXtreme Array Mapping of QTL

• Haplotyping and diversity scans

NaturalVariation.org

Salk: Jon Werner, Sam Hazen, Sarah Liljegren, Ramlah Nehring, Joanne Chory, Joseph Ecker

UC San Diego: Charles Berry

Scripps: Elizabeth Winzeler

Syngenta: Hur-Song Chang, Tong Zhu
