February 20, 2002, UD, Newark, DE
SNPs, Haplotypes, Alleles
Sequence differences
• Intra-specific differences: between individuals within species
• Inter-specific differences: between orthologous genes in different species
Intra-specific and inter-specific variation
• Mutations
  • radiation
  • chemicals
  • replication errors
  • transposable elements
  • somatic vs. germinal
• Mutation frequency
  • maize: 6.5 x 10^-9 mutations per nucleotide per year (Gaut et al. 1996, PNAS 93, 1997-2001)
Frequency of SNPs
• Intra-specific diversity
  • Humans: 1 in 1000 nt = 3,000,000 ea
  • Maize: 1 in 60~120 nt = 35,000,000 ea
  • Soybean: 1 in 350 nt = 3,000,000 ea
  • Melon: 1 in 700 nt = 1,400,000
• Inter-specific sequence difference - dependent on evolutionary distance
  • Humans - chimpanzees: 1 in 100 nt
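Each intra-specific figure above pairs an average SNP spacing with a total SNP count, so the product of the two is the genome size that line implicitly assumes. A minimal sketch of that arithmetic follows; the dictionary layout and the function name `implied_genome_size` are illustrative choices, not part of the slides.

```python
# Slide figures: (min spacing, max spacing) in nt between SNPs, and quoted SNP count.
slide_figures = {
    "Humans": ((1000, 1000), 3_000_000),
    "Maize": ((60, 120), 35_000_000),
    "Soybean": ((350, 350), 3_000_000),
    "Melon": ((700, 700), 1_400_000),
}


def implied_genome_size(spacing_range, snp_count):
    """Return the (low, high) genome size in nt implied by spacing * count."""
    low, high = spacing_range
    return low * snp_count, high * snp_count


for species, (spacing, count) in slide_figures.items():
    low, high = implied_genome_size(spacing, count)
    if low == high:
        print(f"{species}: about {low:,} nt")
    else:
        print(f"{species}: about {low:,} to {high:,} nt")
```

The maize line gives a range because the slide quotes the spacing as a range (60~120 nt).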
Types of sequence variation
• Single Nucleotide Polymorphisms (SNPs)
• Insertions / Deletions (Indels)
• Silent mutations vs. amino acid changing mutations
• Nonsense mutations
• Missense mutations
• Frameshifts
• Simple sequence repeats
SNPs and Indels
DNA Sequence
Individual 1: ...GATATTCGTACGGATGT-TCCA...
Individual 2: ...GATGTTCGTACTGATGTATCCA...
Individual 3: ...GATATTCGTACGGATGT-TCCA...
Individual 4: ...GATATTCGTACGGATGTATCCA...
Individual 5: ...GATGTTCGTACTGATGTATCCA...
Individual 6: ...GATGTTCGTACTGATGTATCCA...
SNPs: alignment columns 4 (A/G) and 12 (G/T). Indel: column 18 (-/A).
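The variable positions in the six aligned sequences can be located mechanically by scanning the alignment column by column. The sketch below assumes the sequences are already aligned and of equal length, with `-` marking a gap; the function names are illustrative.

```python
# The six aligned sequences from the slide (flanking "..." omitted).
alignment = [
    "GATATTCGTACGGATGT-TCCA",
    "GATGTTCGTACTGATGTATCCA",
    "GATATTCGTACGGATGT-TCCA",
    "GATATTCGTACGGATGTATCCA",
    "GATGTTCGTACTGATGTATCCA",
    "GATGTTCGTACTGATGTATCCA",
]


def variable_columns(seqs):
    """Map 1-based column number -> sorted alleles, for columns with >1 allele."""
    result = {}
    for idx, column in enumerate(zip(*seqs), start=1):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            result[idx] = alleles
    return result


def classify(alleles):
    """Call a column an indel if any sequence has a gap there, else a SNP."""
    return "indel" if "-" in alleles else "SNP"


variants = variable_columns(alignment)
for col, alleles in variants.items():
    print(col, classify(alleles), "/".join(alleles))
```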
Haplotypes
Alleles at the two SNPs and the indel, individuals 1-6:
1. AG-
2. GTA
3. AG-
4. AGA
5. GTA
6. GTA
Haplotype Sequence Frequency
1 AG- 2/6
2 GTA 3/6
3 AGA 1/6
Haplotypes provide more information than individual SNPs
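The frequency table above can be reproduced by counting the six per-individual haplotypes. This is a small sketch using only the values shown on the slide; the variable names are illustrative.

```python
from collections import Counter

# Haplotype (SNP, SNP, indel) carried by individuals 1-6, as listed on the slide.
individual_haplotypes = ["AG-", "GTA", "AG-", "AGA", "GTA", "GTA"]

counts = Counter(individual_haplotypes)
total = len(individual_haplotypes)

for haplotype, n in counts.items():
    print(f"{haplotype}  {n}/{total}")

# Each SNP alone has only two states, but together the sites define three haplotypes.
first_snp_alleles = {h[0] for h in individual_haplotypes}
print(len(first_snp_alleles), "alleles at the first SNP,", len(counts), "haplotypes")
```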
Haplotype list
Haplotype
• A set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by recombination). Some haplotypes may be in linkage disequilibrium
(from Birgid Schlindwein's Hypermedia Glossary Of Genetic Terms)
G G A C A
Set of SNP polymorphisms: a SNP haplotype
Diploids vs Haploids
Haploid cell    Diploid cell
Chr1 Chr2 Chr1 Chr2
Homo vs. Hetero
Chr1 Chr2 Chr1 Chr2
Homozygous Heterozygous
Haploid > Diploid
Haploid cell
Chr1 Chr2
Haploid cell
Chr1 Chr2 Chr1 Chr2
Diploid, heterozygous
Example: sex cells
Problem of Phase
[Figure: a chromosome pair (Chr1) with alleles G and T at SNP1 and alleles A and C at SNP2]
Observed: SNP1 G/T, SNP2 A/C
Possible haplotypes: GA, TC or GC, TA
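The ambiguity can be made explicit by enumerating every haplotype pair compatible with an unphased genotype. The sketch below is illustrative only; `possible_phasings` is a hypothetical helper, and a genotype is represented as a list of two-allele tuples, one per site.

```python
from itertools import product


def possible_phasings(genotype):
    """List all distinct haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per site.
    """
    first, *rest = genotype
    phasings = set()
    # Fix the orientation of the first site; try both orientations of the others.
    for flips in product([False, True], repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pair = tuple(sorted(("".join(hap1), "".join(hap2))))
        phasings.add(pair)
    return sorted(phasings)


# SNP1 is G/T and SNP2 is A/C, as on the slide.
print(possible_phasings([("G", "T"), ("A", "C")]))
```

With n heterozygous sites there are 2^(n-1) possible phasings, which is why genotype data alone quickly becomes ambiguous.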
How to resolve the problem of phase?
• Experimental solution
  [Figure: the genotype G/T at SNP1, A/C at SNP2 separated into the haplotypes G A and T C]
• Computational solution
  [Figure: several individuals, each carrying a pair drawn from the haplotypes G A, T C and A T]
Not all combinations occur. Need to observe several haplotypes in various combinations
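One way to read the computational idea: if a pool of haplotypes has already been observed in the population, an ambiguous genotype can be explained only by pairs drawn from that pool. The sketch below is a deliberately simplified illustration of that reasoning, not a specific published algorithm; the pool contents are taken from the figure and `consistent_pairs` is a hypothetical helper.

```python
def consistent_pairs(genotype, known_haplotypes):
    """Return pairs of known haplotypes that together explain an unphased genotype.

    genotype: list of (allele, allele) tuples, one per site.
    known_haplotypes: iterable of haplotype strings already seen in the population.
    """
    target = [tuple(sorted(site)) for site in genotype]
    haps = sorted(set(known_haplotypes))
    pairs = []
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:
            if [tuple(sorted(p)) for p in zip(h1, h2)] == target:
                pairs.append((h1, h2))
    return pairs


# Haplotypes seen across individuals in the figure (assumed to be the full pool).
pool = ["GA", "TC", "AT"]

# Genotype G/T at SNP1 and A/C at SNP2: only GA + TC is supported by the pool,
# so the alternative phasing GC + TA is rejected.
print(consistent_pairs([("G", "T"), ("A", "C")], pool))
```

If more than one pair were returned, the genotype would stay ambiguous until more haplotypes are observed.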
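Linkage Disequilibrium: “Non-random association of alleles”
Equilibrium (D’=0), counts of allele combinations:
                   Locus 1: 1   Locus 1: 2
  Locus 2: 1       3            3
  Locus 2: 2       3            3
Disequilibrium (D’=1), counts of allele combinations:
                   Locus 1: 1   Locus 1: 2
  Locus 2: 1       6            0
  Locus 2: 2       0            6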
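The two examples above can be checked numerically. The sketch below uses the standard textbook definitions of D, D' and r squared computed from a 2x2 table of haplotype counts; the function name and argument order are illustrative assumptions.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (D, D', r^2) from counts of the four two-locus haplotypes.

    n11: allele 1 at both loci; n12: allele 1 at locus 1, allele 2 at locus 2;
    n21: allele 2 at locus 1, allele 1 at locus 2; n22: allele 2 at both loci.
    """
    n = n11 + n12 + n21 + n22
    p11 = n11 / n
    p1 = (n11 + n12) / n  # frequency of allele 1 at locus 1
    q1 = (n11 + n21) / n  # frequency of allele 1 at locus 2
    d = p11 - p1 * q1
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_measures(3, 3, 3, 3))  # equilibrium example
print(ld_measures(6, 0, 0, 6))  # disequilibrium example
print(ld_measures(6, 0, 3, 3))  # only three of four haplotypes present
```

The third call shows a case where D' is 1 while r squared stays well below 1, which happens whenever one of the four haplotypes is absent but allele frequencies differ between the loci.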
Linkage Disequilibrium
A single marker is enough to completely define the haplotypes in this example. The second marker provides redundant information.
In the general case, a subset of the markers will be sufficient to define the major haplotypes.
Linkage Disequilibrium
For example, a marker in the 3’-UTR will be completely predicted by a marker at the 5’-end of the gene and vice versa, if LD extends across the gene.
Marker alleles in the inbred lines (distinct row patterns only; each pattern repeats across several lines; I/D = insertion/deletion marker, - = cell left blank in the figure):
A A T A C G A  -   T
G A T A C A A I/D  C
G A T A C A A I/D  T
G C A T T A C  -   T
Uniquely defined haplotypes: A-T, AIC, AIT, C-T
The remaining markers are redundant.
Stearoyl-ACP desaturase
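The claim that a few markers define all haplotypes can be tested by brute force: try marker subsets of increasing size until every haplotype gets a distinct signature. The sketch below assumes the four row patterns read from the figure, with `I` standing for the I/D cell and `-` for a blank cell; both encodings, and the function name, are assumptions made for illustration.

```python
from itertools import combinations

# Four distinct haplotypes over nine markers, as read from the figure (assumed encoding).
haplotypes = ["AATACGA-T", "GATACAAIC", "GATACAAIT", "GCATTAC-T"]


def minimal_marker_sets(haps):
    """Return all smallest sets of 0-based marker indices that separate every haplotype."""
    n_markers = len(haps[0])
    n_distinct = len(set(haps))
    for size in range(1, n_markers + 1):
        hits = [
            cols
            for cols in combinations(range(n_markers), size)
            if len({tuple(h[c] for c in cols) for h in haps}) == n_distinct
        ]
        if hits:
            return hits
    return []


sets = minimal_marker_sets(haplotypes)
print(len(sets[0]), "markers are enough; one such set is", sets[0])
```

Under this reading, no pair of markers separates all four haplotypes, and the three-marker signature shown on the slide (A-T, AIC, AIT, C-T) corresponds to one of the minimal sets.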
Linkage disequilibrium
• The extent of linkage disequilibrium (LD) in the germplasm is important for association mapping
• The LD in the population depends on population history
• LD is also expected to vary along the length of the genome: regions that recombine less will have more LD and vice versa.
Recombination of ancestral haplotypes
[Figure: Parents to Progeny over time; High LD changing to Low LD]
Conserved ancestral haplotypes are reduced in size by recombination.
Size of conserved segments depends on the history of the population and on the recombination frequency of the genome segment of interest.
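The erosion of LD over time can be illustrated with the textbook random-mating model, in which D shrinks by a factor (1 - c) each generation for recombination fraction c. This model is an assumption brought in for illustration; the slides do not state a specific formula, and real breeding populations rarely mate at random.

```python
def ld_after_generations(d0, recombination_fraction, generations):
    """Expected D after a number of generations of random mating (textbook model)."""
    return d0 * (1 - recombination_fraction) ** generations


# Tightly linked sites keep their LD far longer than loosely linked ones.
for c in (0.0001, 0.01, 0.5):
    remaining = ld_after_generations(0.25, c, 50)
    print(f"c = {c}: D after 50 generations = {remaining:.6f}")
```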
Sequence diversity
Individual  DNA Sequence     Phenotype
1           GATATTCGTACGGAT  Resistant
2           GATGTTCGTACTGAT  Sensitive
3           GATATTCGTACGGAT  Resistant
4           GATATTCGTACGGAT  Resistant
5           GATGTTCGTACTGAT  Sensitive
6           GATGTTCGTACTGAT  Sensitive
SNPs at columns 4 and 12
Genetic Map
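Whether a SNP allele is associated with the phenotype can be checked with Fisher's exact test on a 2x2 table of allele by phenotype. The sketch below implements the two-sided test with the standard library only; pairing the sequences with the phenotypes in slide order is an assumption, and the function name is illustrative.

```python
from math import comb

sequences = [
    "GATATTCGTACGGAT", "GATGTTCGTACTGAT", "GATATTCGTACGGAT",
    "GATATTCGTACGGAT", "GATGTTCGTACTGAT", "GATGTTCGTACTGAT",
]
# Assumed to be listed in the same order as the sequences.
phenotypes = ["Resistant", "Sensitive", "Resistant", "Resistant", "Sensitive", "Sensitive"]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


# Build the table for the SNP at column 4 (0-based index 3): allele A vs. G.
pairs = list(zip((s[3] for s in sequences), phenotypes))
a = pairs.count(("A", "Resistant"))
b = pairs.count(("A", "Sensitive"))
c = pairs.count(("G", "Resistant"))
d = pairs.count(("G", "Sensitive"))
p_value = fisher_exact_two_sided(a, b, c, d)
print((a, b, c, d), p_value)
```

With only six individuals, even a perfect allele-phenotype match cannot give a small p-value, which is one reason larger samples are needed for association studies.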
SNP Haplotypes
DNA Sequence, individuals with contrasting phenotypes:
1  GATATTCGTACGGAT
2  GATGTTCGTACTGAT
3  GATATTCGTACGGAT
4  GATATTCGTACGGAT
5  GATGTTCGTACTGAT
6  GATGTTCGTACTGAT
SNPs at columns 4 and 12
Genetic Map: Candidate gene
Phenotype distribution
Allelic series and phenotypes (hypothetical example)
Haplotype Frequency Phenotypic mean
GATTGTA 0.35 96
GATTATA 0.55 89
GTTCATA 0.10 115
Haplotype information provides better resolution!
102
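The resolution argument can be made concrete by comparing haplotype means with the means obtained when individuals are grouped by a single SNP allele. The sketch below uses the hypothetical table above and frequency-weighted averaging; the weighting scheme and the function name are assumptions for illustration.

```python
# Hypothetical allelic series from the slide: haplotype -> (frequency, phenotypic mean).
allelic_series = {
    "GATTGTA": (0.35, 96),
    "GATTATA": (0.55, 89),
    "GTTCATA": (0.10, 115),
}


def allele_means(table, site):
    """Frequency-weighted phenotypic mean per allele at one 0-based site."""
    totals = {}
    for haplotype, (freq, mean) in table.items():
        allele = haplotype[site]
        f, w = totals.get(allele, (0.0, 0.0))
        totals[allele] = (f + freq, w + freq * mean)
    return {allele: w / f for allele, (f, w) in totals.items()}


# The three haplotypes differ at 0-based sites 1, 3 and 4.
for site in (1, 3, 4):
    print("site", site + 1, allele_means(allelic_series, site))
```

At the fifth site, for example, the A allele pools the haplotypes with means 89 and 115, so the single-SNP comparison hides the largest difference in the series.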
Haplotype diversity in maize elite germplasm
Ada Ching, Dinakar Bhattramakki, Antoni Rafalski
[Figure: the marker-by-inbred allele table shown earlier for Stearoyl-ACP desaturase, with the four haplotypes numbered 1-4]
Genetic diversity in maize breeding germplasm
• This sample of 32 individuals provides an excellent representation of maize elite germplasm
• Conserved haplotypes over several hundred bp
• Small number of haplotypes (2-8)
• Lots of polymorphisms
  • SNP frequency: 1/61 bp
  • Insertion / deletion frequency: 1/126 bp
Stearoyl-ACP desaturase
32 inbreds - 4 haplotypes
Two approaches to association mapping
• Candidate gene approach
  • Testing candidates for association with the trait of interest
• Whole genome scan approach
  • Testing thousands of markers distributed along the genome for association with the trait
  • Suitable only for large linkage disequilibrium situations
Consequences of LD
LD:                               High                Low
Resolution:                       Low                 High
Required number of markers:       Low                 High
Approach to association mapping:  Whole genome scan   Candidate gene only
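The link between the extent of LD and the number of markers can be sketched with a rough estimate: one marker per stretch of conserved LD. The genome size used below (2.5 billion bp) is an illustrative assumption, not a figure from the slides, and one marker per LD segment is a simplification that real study designs would refine.

```python
import math


def markers_needed(genome_size_bp, ld_extent_bp):
    """Rough marker count for a whole genome scan: one marker per LD segment."""
    return math.ceil(genome_size_bp / ld_extent_bp)


ASSUMED_GENOME_SIZE_BP = 2_500_000_000  # illustrative assumption only

# LD extending about 100 kb versus LD that decays within about 500 bp.
for ld_extent in (100_000, 500):
    n = markers_needed(ASSUMED_GENOME_SIZE_BP, ld_extent)
    print(f"LD extent {ld_extent:,} bp: about {n:,} markers")
```

The several-hundred-fold gap between the two results is what makes a whole genome scan practical only when LD is extensive.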
Extent of linkage disequilibrium in maize elite breeding germplasm
[Figure: Distance dependence of LD, abs(D'). x-axis: Distance (bp), 0 to 600; y-axis: LD measure Abs(D'), 0 to 1.2. Data: 18 genes, 32-35 individuals]
[Figure: Distance dependence of LD, R squared. x-axis: Distance (bp), 0 to 600; y-axis: LD measure R squared, 0 to 1.2]
No decline of LD within 500 bp was detected
LD at Adh1: ~120kb
Segments: 137522-137693 and 16523-16864
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
G C C C C T A A A - G T C G G A
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C C T C
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - G T C G G A
T T T C T C A A A - A C C G G C
A T T T C C G T T - G C A G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C A G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C
D’ = 1.0, R² = 0.85, Fisher P < 0.001
$$r = \frac{D}{\sqrt{Q(1-Q)\,R(1-R)}}$$
$$D = p_{11}\,p_{22} - p_{12}\,p_{21}$$
Here $p_{ij}$ is the frequency of the haplotype carrying allele $i$ at the first locus and allele $j$ at the second, and $Q$ and $R$ are allele frequencies at the two loci. The reported R² statistic is the square of $r$.
121,171 bp
Mark Jung, Ada Ching
Conclusions
• Significant LD extending in some cases to >100 kb
• Whole genome scans may be possible
• More data is needed on LD in other regions of the genome and in other populations
• Haplotype association analysis adds power
• Work in progress (D. Bhattramakki)
Haplotype Organization of the Genome
A C A CC
G A T AT
T A C GC
A G A TG
(After Lander et al. 2001)
Low recombination: conserved haplotype
Low recombination: conserved haplotype
High recombination
A.R. 1-25-2001