February 20, 2002 UD, Newark, DE SNPs, Haplotypes, Alleles


Sequence differences

• Intra-specific differences: between individuals within species

• Inter-specific differences: between orthologous genes in different species


Intra-specific and inter-specific variation

• Mutations
  • radiation
  • chemicals
  • replication errors
  • transposable elements
  • somatic vs. germinal

• Mutation frequency
  • maize: 6.5 × 10^-9 mutations per nucleotide per year (Gaut et al. 1996, PNAS 93, 1997-2001)


Frequency of SNPs

• Intra-specific diversity
  • Humans: 1 in 1000 nt = 3,000,000 ea
  • Maize: 1 in 60~120 nt = 35,000,000 ea
  • Soybean: 1 in 350 nt = 3,000,000 ea
  • Melon: 1 in 700 nt = 1,400,000

• Inter-specific sequence difference - dependent on evolutionary distance
  • Humans - chimpanzees: 1 in 100 nt
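
The intra-specific figures pair a SNP spacing with a SNP total, so multiplying the two gives the sequence length each pair implies. The sketch below only does that arithmetic on the slide's numbers; the dictionary layout and function name are illustrative, and the resulting lengths are what the slide implies, not independent genome-size estimates.

```python
# (nucleotides per SNP as a (low, high) range, total SNPs), as quoted on the slide
snp_data = {
    "human": ((1000, 1000), 3_000_000),
    "maize": ((60, 120), 35_000_000),
    "soybean": ((350, 350), 3_000_000),
    "melon": ((700, 700), 1_400_000),
}

def implied_length(spacing, total_snps):
    """Sequence length (nt) implied by one SNP every `spacing` nt."""
    low, high = spacing
    return low * total_snps, high * total_snps

implied = {name: implied_length(*vals) for name, vals in snp_data.items()}
for name, (low, high) in implied.items():
    print(f"{name}: {low:.2e} to {high:.2e} nt")
```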


Types of sequence variation

• Single Nucleotide Polymorphisms (SNPs)

• Insertions / Deletions (Indels)

• Silent mutations vs. amino acid changing mutations

• Nonsense mutations

• Missense mutations

• Frameshifts

• Simple sequence repeats


SNPs and Indels

DNA Sequence (individuals 1-6; the variable columns are marked SNP, SNP, Indel)

1 ...GATATTCGTACGGATGT-TCCA...
2 ...GATGTTCGTACTGATGTATCCA...
3 ...GATATTCGTACGGATGT-TCCA...
4 ...GATATTCGTACGGATGTATCCA...
5 ...GATGTTCGTACTGATGTATCCA...
6 ...GATGTTCGTACTGATGTATCCA...
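
As a minimal sketch of how the variable columns can be picked out of such an alignment, the code below scans the six sequences column by column. The function names are illustrative, and calling any column that contains a gap an indel is a simplifying assumption.

```python
# The six aligned sequences from the slide, without the leading/trailing "..."
alignment = [
    "GATATTCGTACGGATGT-TCCA",
    "GATGTTCGTACTGATGTATCCA",
    "GATATTCGTACGGATGT-TCCA",
    "GATATTCGTACGGATGTATCCA",
    "GATGTTCGTACTGATGTATCCA",
    "GATGTTCGTACTGATGTATCCA",
]

def variable_sites(seqs):
    """Return 0-based column indices where the aligned sequences differ."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def classify(seqs, col):
    """Call a variable column an indel if it contains a gap, else a SNP."""
    return "indel" if any(s[col] == "-" for s in seqs) else "SNP"

sites = variable_sites(alignment)
kinds = [classify(alignment, c) for c in sites]
# Keep only the variable columns: one short haplotype string per individual
haplotypes = ["".join(s[c] for c in sites) for s in alignment]
print(sites, kinds, haplotypes)
```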


Haplotypes

DNA Sequence, variable positions only (columns: SNP, SNP, Indel; rows: individuals 1-6)

1 A G -
2 G T A
3 A G -
4 A G A
5 G T A
6 G T A

Haplotypes: AG-, GTA, AG-, AGA, GTA, GTA


Haplotypes

Haplotype list

1. AG-
2. GTA
3. AG-
4. AGA
5. GTA
6. GTA

Haplotype  Sequence  Frequency
1          AG-       2/6
2          GTA       3/6
3          AGA       1/6

Haplotypes provide more information than individual SNPs
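
The frequency column is a simple tally over the six observed haplotypes. A minimal sketch, with an illustrative function name; note that `Fraction` reduces 2/6 to 1/3 and 3/6 to 1/2.

```python
from collections import Counter
from fractions import Fraction

# One haplotype per individual, in the order listed on the slide
observed = ["AG-", "GTA", "AG-", "AGA", "GTA", "GTA"]

def haplotype_frequencies(haps):
    """Map each distinct haplotype to its frequency in the sample."""
    counts = Counter(haps)
    total = len(haps)
    return {h: Fraction(n, total) for h, n in counts.items()}

freqs = haplotype_frequencies(observed)
for hap, freq in freqs.items():
    print(hap, freq)
```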


Haplotype

• A set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by recombination). Some haplotypes may be in linkage disequilibrium.

(from Birgid Schlindwein's Hypermedia Glossary Of Genetic Terms)

Example: G G A C A

Set of SNP polymorphisms: a SNP haplotype


Diploids vs Haploids

[Figure: a haploid cell and a diploid cell, each with chromosomes labeled Chr1 and Chr2]


Homo vs. Hetero

[Figure: a homozygous and a heterozygous cell, each with chromosomes labeled Chr1 and Chr2]


Haploid > Diploid

[Figure: two haploid cells, each with Chr1 and Chr2, giving a diploid, heterozygous cell with two copies each of Chr1 and Chr2]

Example: sex cells


Problem of Phase

[Figure: the Chr1 pair, carrying G/T at SNP1 and A/C at SNP2]

Observed: SNP1 G/T, SNP2 A/C
Possible haplotypes: GA, TC or GC, TA
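
The ambiguity can be enumerated directly: each heterozygous site after the first can be oriented two ways, so a genotype with k heterozygous sites has 2^(k-1) possible haplotype pairs. The sketch below is illustrative only; the function name and the list-of-pairs genotype representation are assumptions.

```python
from itertools import product

def possible_phasings(genotype):
    """Enumerate haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per site.
    Returns a set of frozensets, each holding the two haplotype strings.
    """
    first, rest = genotype[0], genotype[1:]
    results = set()
    # Fix the orientation of the first site so mirror images are not repeated
    for flips in product([False, True], repeat=len(rest)):
        hap1, hap2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            if flip:
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        results.add(frozenset(("".join(hap1), "".join(hap2))))
    return results

# The slide's example: SNP1 is G/T, SNP2 is A/C
phasings = possible_phasings([("G", "T"), ("A", "C")])
print(phasings)
```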


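How to resolve the problem of phase?

Experimental solution

[Figure: the heterozygote (SNP1 G/T, SNP2 A/C) and its two chromosomes shown separately, with haplotypes T C and G A at SNP1 and SNP2]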


Computational solution

[Figure: eight haplotype pairs at SNP1 and SNP2: GA/TC, GA/TC, AT/GA, TC/AT, GA/TC, AT/GA, GA/TC, AT/GA]

Not all combinations occur. Several haplotypes need to be observed in various combinations.
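
One well-known way to exploit this is a Clark-style parsimony pass: haplotypes seen in unambiguous individuals are used to explain ambiguous ones. The slides do not say which method was used, so the sketch below is only an illustration of the general idea; the function names and genotype representation are assumptions.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None  # `hap` is incompatible with this genotype
    return "".join(other)

def clark_phase(genotypes):
    """Phase genotypes (lists of (allele, allele) per site) by parsimony."""
    known, phased = set(), {}
    # Step 1: at most one heterozygous site means the phase is unambiguous
    for i, g in enumerate(genotypes):
        if sum(a != b for a, b in g) <= 1:
            h1 = "".join(a for a, _ in g)
            h2 = "".join(b for _, b in g)
            known.update((h1, h2))
            phased[i] = (h1, h2)
    # Step 2: explain ambiguous genotypes with an already known haplotype
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in phased:
                continue
            for h in sorted(known):
                other = complement(g, h)
                if other is not None:
                    phased[i] = (h, other)
                    known.add(other)
                    progress = True
                    break
    return phased  # individuals that stay ambiguous are absent

sample = [
    [("G", "G"), ("A", "A")],  # homozygous GA/GA
    [("G", "T"), ("A", "C")],  # double heterozygote, phase unknown
    [("T", "T"), ("C", "C")],  # homozygous TC/TC
]
result = clark_phase(sample)
print(result)
```

If no individual in the sample is unambiguous, this simple pass cannot start and returns an empty result.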


Linkage Disequilibrium: “Non-random association of alleles”

[Figure: two 2 × 2 tables of haplotype counts, Locus 1 × Locus 2]

Equilibrium: counts 3, 3, 3, 3 (D' = 0)

Disequilibrium: counts 6 and 6, with the other two cells empty (D' = 1)
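
The two D' values can be checked from 2 × 2 haplotype counts using the standard definitions D = p11 - pA·pB and D' = D / Dmax. This is a minimal sketch; the function and argument names are illustrative, and returning 0.0 when a locus is monomorphic is a convention chosen here (D' is undefined in that case).

```python
def d_prime(n11, n12, n21, n22):
    """Return (D, D') from counts of the four two-locus haplotypes.

    n11: allele 1 at both loci; n12: allele 1 at locus 1, allele 2 at locus 2;
    n21 and n22 follow the same pattern.
    """
    total = n11 + n12 + n21 + n22
    p11 = n11 / total
    p_a = (n11 + n12) / total  # frequency of allele 1 at locus 1
    p_b = (n11 + n21) / total  # frequency of allele 1 at locus 2
    d = p11 - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, (d / d_max if d_max > 0 else 0.0)

equilibrium = d_prime(3, 3, 3, 3)
disequilibrium = d_prime(6, 0, 0, 6)
print(equilibrium, disequilibrium)
```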


Linkage Disequilibrium

[Figure: markers 1 and 2 in disequilibrium]

A single marker is enough to completely define the haplotypes in this example; the second marker provides redundant information. In the general case, a subset of the markers will be sufficient to define the major haplotypes.


Linkage Disequilibrium

[Figure: markers 1 and 2 in disequilibrium]

For example, a marker in the 3’-UTR will be completely predicted by a marker at the 5’-end of the gene, and vice versa, if LD extends across the gene.


A A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D TG A T A C A A I/D TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C T

[Figure label: Redundant]

Uniquely defined haplotypes: A-T, AIC, AIT, C-T

Stearoyl-ACP desaturase
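
A small brute-force search shows how a minimal set of non-redundant markers can be found for the four row patterns in this figure. This is a sketch under assumptions: the indel column is written as `I` where the slide shows I/D and as `-` where the cell is blank, the labels h1 to h4 are hypothetical, and exhaustive search is only practical for a handful of sites.

```python
from itertools import combinations

# The four distinct row patterns, with the blank indel cell written as "-"
haplotypes = {
    "h1": "AATACGA-T",
    "h2": "GATACAAIC",
    "h3": "GATACAAIT",
    "h4": "GCATTAC-T",
}

def distinguishes(haps, cols):
    """True if the chosen columns alone tell all distinct haplotypes apart."""
    seqs = set(haps.values())
    return len({tuple(s[c] for c in cols) for s in seqs}) == len(seqs)

def minimal_tag_set(haps):
    """Return the first smallest set of columns that separates all haplotypes."""
    n_sites = len(next(iter(haps.values())))
    for size in range(1, n_sites + 1):
        for cols in combinations(range(n_sites), size):
            if distinguishes(haps, cols):
                return cols
    return tuple(range(n_sites))

tags = minimal_tag_set(haplotypes)
print(tags)
```

Several three-marker sets work equally well. The slide's A-T, AIC, AIT, C-T labels correspond to columns 1, 7 and 8 (0-based) under the encoding assumed here, and `distinguishes(haplotypes, (1, 7, 8))` confirms that choice.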


Linkage disequilibrium

• The extent of linkage disequilibrium (LD) in the germplasm is important for association mapping

• The LD in the population depends on population history

• LD is also expected to vary along the length of the genome: regions that recombine less will have more LD and vice versa.


Recombination of ancestral haplotypes

[Figure: haplotypes passing from parents to progeny over time, going from high LD to low LD]

Conserved ancestral haplotypes are reduced in size by recombination.

The size of conserved segments depends on the history of the population and on the recombination frequency of the genome segment of interest.


Sequence diversity

Individual  DNA Sequence     Phenotype
1           GATATTCGTACGGAT  Resistant
2           GATGTTCGTACTGAT  Sensitive
3           GATATTCGTACGGAT  Resistant
4           GATATTCGTACGGAT  Resistant
5           GATGTTCGTACTGAT  Sensitive
6           GATGTTCGTACTGAT  Sensitive

[Figure labels: SNP, SNP (the two variable columns); Genetic Map]
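
One simple way to ask whether a SNP allele tracks the phenotype is a two-sided Fisher exact test on the 2 × 2 table of allele by phenotype. The sketch below applies it to the first SNP in the six sequences; the function name is illustrative, and the choice of Fisher's test here is an assumption for the example.

```python
from math import comb

sequences = [
    "GATATTCGTACGGAT", "GATGTTCGTACTGAT", "GATATTCGTACGGAT",
    "GATATTCGTACGGAT", "GATGTTCGTACTGAT", "GATGTTCGTACTGAT",
]
phenotypes = ["Resistant", "Sensitive", "Resistant",
              "Resistant", "Sensitive", "Sensitive"]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

site = 3  # 0-based position of the first SNP (A or G)
pairs = list(zip(sequences, phenotypes))
a = sum(s[site] == "A" and p == "Resistant" for s, p in pairs)
b = sum(s[site] == "A" and p == "Sensitive" for s, p in pairs)
c = sum(s[site] == "G" and p == "Resistant" for s, p in pairs)
d = sum(s[site] == "G" and p == "Sensitive" for s, p in pairs)
p_value = fisher_exact_two_sided(a, b, c, d)
print((a, b, c, d), p_value)
```

With only six individuals, even a perfect split between allele and phenotype does not reach the conventional 0.05 threshold, which is why larger samples are needed in practice.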


SNP Haplotypes

DNA Sequence (individuals 1-6, with contrasting phenotypes; the two variable columns are SNPs)

1 GATATTCGTACGGAT
2 GATGTTCGTACTGAT
3 GATATTCGTACGGAT
4 GATATTCGTACGGAT
5 GATGTTCGTACTGAT
6 GATGTTCGTACTGAT

[Figure labels: Genetic Map, Candidate gene, Phenotype distribution]


Allelic series and phenotypes (Hypothetical example)

Haplotype  Frequency  Phenotypic mean
GATTGTA    0.35       96
GATTATA    0.55       89
GTTCATA    0.10       115

[Figure annotation: 102]

Haplotype information provides better resolution!
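
The gain in resolution can be seen by collapsing the hypothetical table to a single SNP. At the fifth site, haplotypes GATTATA and GTTCATA both carry A, so a single-SNP test pools phenotypic means of 89 and 115. The 102 shown on the slide equals the unweighted mean of those two values; the sketch below computes the frequency-weighted version instead. The function name is illustrative.

```python
# Haplotype -> (frequency, phenotypic mean), from the hypothetical table
haplotypes = {
    "GATTGTA": (0.35, 96),
    "GATTATA": (0.55, 89),
    "GTTCATA": (0.10, 115),
}

def allele_means(haps, site):
    """Frequency-weighted phenotypic mean per allele at a 0-based site."""
    sums, weights = {}, {}
    for seq, (freq, mean) in haps.items():
        allele = seq[site]
        sums[allele] = sums.get(allele, 0.0) + freq * mean
        weights[allele] = weights.get(allele, 0.0) + freq
    return {allele: sums[allele] / weights[allele] for allele in sums}

site5 = allele_means(haplotypes, 4)  # fifth site: G in one haplotype, A in two
unweighted_a = (89 + 115) / 2        # the pooled value without frequencies
print(site5, unweighted_a)
```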


Haplotype diversity in maize elite germplasm

Ada Ching, Dinakar Bhattramakki, Antoni Rafalski


A A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TA A T A C G A TG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D CG A T A C A A I/D TG A T A C A A I/D TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C TG C A T T A C T

1

2

3

4

Genetic diversity in maize breeding germplasm

• This sample of 32 individuals provides an excellent representation of maize elite germplasm

• Conserved haplotypes over several hundred bp

• Small number of haplotypes (2-8)

• Lots of polymorphisms
  • SNP frequency: 1/61 bp
  • Insertion / deletion frequency: 1/126 bp

Stearoyl-ACP desaturase

32 inbreds - 4 haplotypes


Two approaches to association mapping

• Candidate gene approach
  • Testing candidates for association with the trait of interest

• Whole genome scan approach
  • Testing thousands of markers distributed along the genome for association with the trait
  • Suitable only where linkage disequilibrium is extensive


Consequences of LD

LD                               High               Low
Resolution                       Low                High
Required number of markers       Low                High
Approach to association mapping  Whole genome scan  Candidate gene only


Extent of linkage disequilibrium in maize elite breeding germplasm


Distance dependence of LD: abs(D')

[Plot: LD measure abs(D') (y-axis, 0 to 1.2) against distance in bp (x-axis, 0 to 600); 18 genes, 32-35 individuals]


Distance dependence of LD: R squared

[Plot: LD measure R squared (y-axis, 0 to 1.2) against distance in bp (x-axis, 0 to 600)]

No decline of LD within 500 bp was detected


LD at Adh1: ~120kb

Sites from two regions, 137522-137693 and 16523-16864:

T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
G C C C C T A A A - G T C G G A
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C C C T C
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - G T C G G A
T T T C T C G T T - A C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
G C C C C T A A A - G T C G G A
G C C C C T A A A - G T C G G A
T T T C T C A A A - A C C G G C
A T T T C C G T T - G C A G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - G C A G G C
T T T C T C G T T - A C C G G C
T T T C T C G T T - G C C G G C
T T T C T C G T T - A C C G G C

D' = 1.0, R^2 = 0.85, Fisher P < 0.001

\[ r = \frac{D}{\sqrt{Q(1-Q)\,R(1-R)}}, \qquad D = \pi_{11}\pi_{22} - \pi_{12}\pi_{21} \]

121,171 bp

Mark Jung, Ada Ching
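
The correlation measure on this slide divides D by the square root of Q(1-Q)R(1-R), with D taken as the difference between the products of the diagonal and off-diagonal haplotype frequencies. A minimal sketch follows, assuming Q and R are the frequencies of one allele at each of the two loci (the slide does not define them) and using hypothetical argument names for the four haplotype frequencies.

```python
from math import sqrt

def r_from_frequencies(p11, p12, p21, p22):
    """Correlation r between two biallelic loci from haplotype frequencies.

    p11..p22 are the four two-locus haplotype frequencies and must sum to 1.
    """
    q = p11 + p12       # assumed meaning of Q: allele 1 frequency at locus 1
    r_freq = p11 + p21  # assumed meaning of R: allele 1 frequency at locus 2
    d = p11 * p22 - p12 * p21
    return d / sqrt(q * (1 - q) * r_freq * (1 - r_freq))

complete = r_from_frequencies(0.5, 0.0, 0.0, 0.5)  # only two haplotypes occur
partial = r_from_frequencies(0.4, 0.1, 0.1, 0.4)   # all four haplotypes occur
print(complete ** 2, partial ** 2)
```

The illustrative frequencies above are not taken from the Adh1 data; r squared reaches 1 only when two of the four haplotypes are absent and the allele frequencies at the two loci match.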


Conclusions

• Significant LD extending in some cases to >100 kb

• Whole genome scans may be possible

• More data is needed on LD in other regions of the genome and in other populations

• Haplotype association analysis adds power

• Work in progress (D. Bhattramakki)


Haplotype Organization of the Genome

[Figure: haplotype blocks along a chromosome, with two low-recombination regions carrying conserved haplotypes and a region of high recombination]

(After Lander et al. 2001)


A.R. 1-25-2001