eXtreme Array Mapping and Haplotype Analysis Using Arrays
Justin Borevitz
Salk Institute
naturalvariation.org
Talk Outline
• Bulk Segregant Mapping of
  – Mendelian mutations
• eXtreme Array Mapping of QTL
  – Kas x Col RILs and Simulations
• Haplotype analysis
  – Patterns, Global Variation, Selection
Potential Deletions
False Discovery and Sensitivity
[Figure: permuted data vs. real data, each with a 5% FDR line; PM only SAM threshold]
GeneChip                  SFPs    nonSFPs
Cereon marker accuracy    3806    89118      100%
Sequence 817              121     696        Sensitivity
Polymorphic 340           117     223        34%
Non-polymorphic 477       4       473
False discovery rate: 3%
Test for independence of all factors: Chisq = 177.34, df = 1, p-value = 1.845e-40
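The sensitivity, false discovery rate, and chi-square statistic on this slide can be recomputed from the four sequence-verified counts. The sketch below assumes a 2x2 reading of the table: 117 and 223 polymorphic features called SFP and nonSFP, and 4 and 473 non-polymorphic features called SFP and nonSFP. The function name is illustrative, and the use of Pearson's chi-square without continuity correction is an assumption about how the slide's value was obtained.

```python
import math

def sfp_call_summary(tp, fn, fp, tn):
    """Summarize SFP calls against sequence-verified features.

    tp: polymorphic features called SFP
    fn: polymorphic features called nonSFP
    fp: non-polymorphic features called SFP
    tn: non-polymorphic features called nonSFP
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)   # share of true polymorphisms detected
    fdr = fp / (tp + fp)           # share of SFP calls that are wrong

    # Pearson chi-square on the 2x2 table (assumed: no continuity correction).
    observed = [[tp, fn], [fp, tn]]
    row_totals = [tp + fn, fp + tn]
    col_totals = [tp + fp, fn + tn]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return sensitivity, fdr, chi2, p_value

sensitivity, fdr, chi2, p_value = sfp_call_summary(tp=117, fn=223, fp=4, tn=473)
print(f"sensitivity = {sensitivity:.0%}, FDR = {fdr:.0%}")
print(f"Chisq = {chi2:.2f}, df = 1, p-value = {p_value:.3e}")
```

Under these assumptions the counts give the slide's 34% sensitivity, 3% false discovery rate, and Chisq = 177.34.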
Observed t statistics vs Null (permuted) t statistics
Chip genotyping of a Recombinant Inbred Line
29kb interval
Potential Deletions
111 potential deletions
45 confirmed by Ler sequence
23 (of 114) transposons
Disease Resistance(R) gene clusters
Single R gene deletions
Genes involved in Secondary metabolism
Unknown genes
Potential Deletions Suggest Candidate Genes
deletion of MAF1
FLOWERING1 QTL
Chr1 (bp)
Flowering Time QTL caused by a natural deletion in MAF1
MAF1
[Venn diagram: sets labeled "EST Deletions all", "KASC ALL DEL", and "LER del all" within 19838 all genes; region counts 211, 172, 135, 46, 54, 70, 53]
Deletions between Accessions
Fast Neutron deletions
FKF1 80 kb deletion, CHR1
cry2 10 kb deletion, CHR1
Het
Map bibb
100 bibb mutant plants
100 wt mutant plants
bibb mapping
ChipMap
AS1
Bulk segregant Mapping using Chip hybridization
bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1
BIBB = ASYMMETRIC LEAVES1
Sequenced AS1 coding region from bib-1 …found g -> a change that would introduce a stop codon in the MYB domain
bibb as1-101
MYB
bib-1 W49*
as-101 Q107*
as1 bibb
AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64 cM
Other Mendelian mutations
aar21: arrhythmic
ein6: ethylene insensitive (no een?)
Also aar90, aar60 and stamenstay
Short pool – Tall pool
Kas x Col RILs, all Features
RED2 QTL
[Figure: log likelihood ratio (LOD) vs. position (0 to 150), one panel each for Chromosomes 1 to 5]
eXtreme Array Mapping
Red light QTL RED2 from 100 Kas/Col RILs
QTL likelihood model using bulk segregant analysis with SFP genotyping
[Figure: LOD (0 to 16) vs. position (0 to 100 cM) on Chromosome 2; Composite Interval Mapping, RED2 QTL]
15 tallest RILs pooled vs 15 shortest RILs pooled
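Pooling the tallest and shortest lines works because markers linked to a QTL differ in allele frequency between the two pools, while unlinked markers do not. The sketch below computes only that pool contrast from known genotypes. It is not the talk's likelihood model, and the function name, the 0/1 genotype coding, and the toy data are assumptions made for illustration.

```python
def extreme_pool_difference(genotypes, phenotypes, n_extreme):
    """Per-marker allele-frequency difference between the two extreme pools.

    genotypes: one list per line, markers coded 1 (parent A) or 0 (parent B)
    phenotypes: one trait value per line (e.g. hypocotyl length)
    n_extreme: number of lines taken from each tail of the distribution
    """
    if n_extreme < 1 or 2 * n_extreme > len(phenotypes):
        raise ValueError("n_extreme must give two non-overlapping pools")

    # Rank lines by phenotype, then take the two tails.
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    short_pool = order[:n_extreme]
    tall_pool = order[-n_extreme:]

    diffs = []
    for m in range(len(genotypes[0])):
        tall_freq = sum(genotypes[i][m] for i in tall_pool) / n_extreme
        short_freq = sum(genotypes[i][m] for i in short_pool) / n_extreme
        diffs.append(tall_freq - short_freq)  # ranges from -1 to 1
    return diffs

# Toy data (made up): 4 lines, 2 markers; marker 0 tracks the trait, marker 1 does not.
toy_genotypes = [[1, 0], [1, 1], [0, 0], [0, 1]]
toy_phenotypes = [10.0, 9.0, 1.0, 2.0]
toy_diffs = extreme_pool_difference(toy_genotypes, toy_phenotypes, n_extreme=2)
print(toy_diffs)
```

On an array, the same contrast would be estimated indirectly from SFP hybridization intensities of the two pools rather than from individual genotypes.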
Simulation Genotypes
15 eXtreme RILs of 100
2 QTL: chr2 37% var, chr5 13% var
[Figure: genotype (-1.0 to 1.0) vs. position (cM), one panel each for Chromosomes 1 to 5]
Simulation Genotypes
100 eXtreme RILs of 700
2 QTL
[Figure: genotype (-1.0 to 1.0) vs. position (cM), one panel each for Chromosomes 1 to 5]
Simulation Genotypes
50 eXtreme F2s of 500
2 QTL
[Figure: genotype (-1.0 to 1.0) vs. position (cM), one panel each for Chromosomes 1 to 5]
Simulation Chip Noise
50 eXtreme F2s of 500
2 QTL
[Figure: genotype (-1.0 to 1.0) vs. position (cM), one panel each for Chromosomes 1 to 5]
Simulation Likelihood
50 eXtreme F2s of 500
2 QTL
[Figure: log likelihood ratio vs. position (cM), one panel each for Chromosomes 1 to 5]
Array Haplotyping
• Hybridize 48 arrays with 15 accessions
• ~300 ng DNeasy MiniPrep leaf tissue
• Overnight Bioprime Klenow labeling, 25°C
• "col", "lz", "ler", "bay", "shah", "cvi", "kas", "c24", "est", "kendl", "mt", "nd", "sorbo", "van", "ws2"
Linkage Disequilibrium explained
1 SNP, 2 haplotypes
Mutation
2 SNPs, 3 haplotypes
Recombination
2 SNPs, 4 haplotypes
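The haplotype counts above map directly onto standard pairwise LD statistics. The sketch below computes D and r² for two biallelic sites from a list of haplotypes. The function name and the 0/1 allele coding are illustrative assumptions, and the toy haplotype sets are idealized with equal frequencies.

```python
def pairwise_ld(haplotypes):
    """Return (D, r2) for two biallelic sites.

    haplotypes: list of (a, b) tuples, one per chromosome, alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at site A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # departure from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d, d * d / denom

# A second mutation arises on one background: 2 SNPs, 3 haplotypes.
three_haplotypes = [(0, 0), (1, 0), (1, 1)]
# Recombination creates the missing combination: 2 SNPs, 4 haplotypes.
four_haplotypes = [(0, 0), (1, 0), (0, 1), (1, 1)]

d3, r2_3 = pairwise_ld(three_haplotypes)
d4, r2_4 = pairwise_ld(four_haplotypes)
print(f"3 haplotypes: D = {d3:.3f}, r2 = {r2_3:.3f}")
print(f"4 haplotypes: D = {d4:.3f}, r2 = {r2_4:.3f}")
```

In the three-haplotype case D equals its maximum possible value for these allele frequencies (D' = 1) even though r² is below 1. Once all four haplotypes are present at equal frequency, D and r² both fall to zero.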
Sequence Variation at a Candidate Locus, LIGHT2
PHYB locus (6.5 kb)
Ler PHYB protein (1172 aa)
I143 / L1072: x = 8.4 ± 0.8 mm (SHORTER)
L143 / V1072: x = 10.2 ± 0.4 mm (TALLER)
Association testing: p < 0.01 (permutation testing)
These polymorphisms are in complete LD
Accessions: Col, Ler, Uk-4, Sorbo, Tsu-1, Wei-0, Van-0, Ema-1, Cvi-0, Ts-1, Sf-0, Se-0
[Figure: per-accession residues at the two sites: L (9) and I (3); V (9) and L (3)]
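The slide reports the association p-value from permutation testing without giving the procedure. The sketch below shows the general idea in its exact form: enumerate every way of reassigning the haplotype labels and count how often the difference in group means is at least as large as the one observed. The function name and the hypocotyl lengths are made up for illustration and are not the talk's data.

```python
from itertools import combinations

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(indices):
        sum_a = sum(pooled[i] for i in indices)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(range(n_a))
    n_total = 0
    n_as_extreme = 0
    # Every choice of n_a individuals is one possible relabeling.
    for indices in combinations(range(len(pooled)), n_a):
        n_total += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(indices) >= observed - 1e-12:
            n_as_extreme += 1
    return n_as_extreme / n_total

# Made-up hypocotyl lengths in mm, for illustration only.
shorter_group = [8.0, 8.5, 9.0]
taller_group = [10.0, 10.5, 11.0]
p = exact_permutation_p(shorter_group, taller_group)
print(f"exact permutation p-value = {p:.3f}")
```

Exhaustive enumeration is only practical for small samples. For larger ones, a random subset of relabelings is the usual substitute.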
[Figure: axis labels for 48 arrays: Col (7), jwC (3), jwL (3), Ler (3), Bay (3), Sha (2), Cvi (3), kas (2), c24 (2), Est (3), Ken (2), Mt0 (3), Nd- (3), Sor (3), van (3), Ws- (3)]
Pairwise Correlation between and within replicates
Feature Density chr1
Diversity measure
LIGHT1 tstat and raw data
LIGHT1 tstat and raw data
Array Haplotyping
Inbred lines
Low effective recombination due to partial selfing
Extensive LD blocks
Col Ler Cvi Kas Bay Shah Lz Nd
Chromosome 1, ~500 kb
Quantitative Trait Loci
Feature level model
FLC controls flowering time
Difference detected in 3-day-old seedlings
Gene Expression index that accounts for feature effect and polymorphisms
PAG1 down regulated in Cvi
PALE GREEN1 knockout has long hypocotyl in red light
Review
• Single Feature Polymorphisms (SFPs) can be used to identify recombination breakpoints, potential deletions
• Bulk segregant mapping, and
  – eXtreme Array Mapping of QTL
• Haplotyping Diversity scans
NaturalVariation.org
Salk: Jon Werner, Sam Hazen, Sarah Liljegren, Ramlah Nehring, Joanne Chory, Joseph Ecker
UC San Diego: Charles Berry
Scripps: Elizabeth Winzeler
Syngenta: Hur-Song Chang, Tong Zhu