Association Studies, Haplotype Blocks and Tagging SNPs

Prof. Sorin Istrail


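Association studies

Figure: individuals are split into cases (Disease / Responder) and controls (Control / Non-responder) and typed at Marker A, which has two alleles, Allele 0 and Allele 1. Marker A is associated with the phenotype when one allele is significantly more frequent among cases than among controls.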
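The slides do not name a statistical test. As one common choice (an assumption here), a case/control association at a single biallelic marker such as Marker A can be checked with a Pearson chi-square test on a 2x2 table of allele counts. The sketch uses only the standard library; the function name `allelic_chi_square` and the counts are hypothetical.

```python
import math


def allelic_chi_square(cases, controls):
    """Pearson chi-square test on a 2x2 table of allele counts.

    cases, controls: (count of Allele 0, count of Allele 1) in each group.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    table = [list(cases), list(controls)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("every group and allele needs a nonzero total")
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 100 alleles observed in each group.
stat, p = allelic_chi_square(cases=(30, 70), controls=(50, 50))
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

In a genome-wide study this test would be repeated per marker, so the resulting p-values need a multiple-testing correction.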

Association studies

• Evaluate whether nucleotide polymorphisms associate with phenotype

T A G A A
C G G A A
C G T A A
T A T C G
T G T A G
T G G A G


Association studies

Hypothesis – Haplotype Blocks?

The genome consists largely of blocks of common SNPs with relatively little recombination within the blocks (Patil et al., Science, 2001; Jeffreys et al., Nature Genetics, 2001; Daly et al., Nature Genetics, 2001).

Figure: a 200 kb stretch of DNA with sense and antisense genes, the SNPs along it, and haplotype blocks numbered 1 to 4.

Haplotype Block Structure: LD Blocks and 4-Gamete Test Blocks

One definition of block

• Based on the Four Gamete test.

• Intuition: if all four gametes are observed for two SNPs, then, assuming each site mutated only once, there must have been a recombination somewhere in between the two sites.

Four Gamete Block Test

• Hudson and Kaplan, 1985

A segment of SNPs is a block if, for every pair of SNPs in it, at most 3 of the 4 gametes (00, 01, 10, 11) are observed.

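Example (rows are haplotypes, columns are three SNPs):

0 0 1
0 1 1
1 1 0
1 1 1

BLOCK: every pair of SNPs shows at most three gametes.

0 0 1
0 1 1
1 1 0
1 0 1

VIOLATES THE BLOCK DEFINITION: the first two SNPs show all four gametes (00, 01, 11, 10).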
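The block definition translates directly into a pairwise check. A minimal sketch, assuming haplotypes are given as equal-length strings over biallelic sites (the helper names `gametes` and `is_four_gamete_block` are hypothetical), encoding the slide's two examples as four haplotypes over three SNPs:

```python
from itertools import combinations


def gametes(haplotypes, i, j):
    """Distinct allele combinations (gametes) observed at sites i and j."""
    return {(h[i], h[j]) for h in haplotypes}


def is_four_gamete_block(haplotypes):
    """True if every pair of sites shows at most 3 of the 4 gametes."""
    n_sites = len(haplotypes[0])
    return all(len(gametes(haplotypes, i, j)) < 4
               for i, j in combinations(range(n_sites), 2))


block = ["001", "011", "110", "111"]
violates = ["001", "011", "110", "101"]
print(is_four_gamete_block(block))     # True
print(is_four_gamete_block(violates))  # False: SNPs 1 and 2 show 00, 01, 11, 10
```

Checking all pairs costs O(m^2 n) for m SNPs and n haplotypes, which is fine for block-sized segments.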

Finding Recombination Hotspots: Many Possible Partitions into Blocks

A C T A G A T A G C C T
G T T C G A C A A C A T
A C T C T A T G A T C G
G T T A T A C G A C A T
A C T C T A T A G T A T
A C T A G C T G G C A T

All four gametes are present for some pairs of sites; for example, sites 1 and 4 show AA, AC, GA and GC. Each such pair is a constraint: the interval between the two sites must contain at least one recombination point.

Find the left-most right endpoint of any constraint and mark the position just before it as a recombination site.

Eliminate any constraints crossing that site.

Repeat until all constraints are gone.

The final result is a minimum-size set of sites crossing all constraints.
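The steps above are the classic greedy algorithm for stabbing intervals, which is why its output is minimum-size. A minimal sketch, assuming biallelic sites stored as equal-length strings (so rows of nucleotides can be passed in directly) and 0-based site indices; the function names `constraints`, `recombination_points` and `blocks` are hypothetical:

```python
from itertools import combinations


def constraints(haplotypes):
    """Site pairs (i, j), i < j, that show all four gametes.

    Assumes biallelic sites; haplotypes are equal-length strings.
    """
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if len({(h[i], h[j]) for h in haplotypes}) == 4]


def recombination_points(intervals):
    """Greedy stabbing of constraint intervals.

    A point k sits between sites k and k + 1 and crosses (i, j) when
    i <= k < j. Intervals are scanned by right endpoint; an interval not
    crossed by the last point gets a new point just before its right end.
    """
    points = []
    for i, j in sorted(intervals, key=lambda iv: iv[1]):
        if points and i <= points[-1]:
            continue  # already crossed by the most recent point
        points.append(j - 1)
    return points


def blocks(n_sites, points):
    """(first, last) site of each block between consecutive points."""
    starts = [0] + [k + 1 for k in points]
    ends = list(points) + [n_sites - 1]
    return list(zip(starts, ends))


haps = ["001", "011", "110", "101"]  # toy data: sites 0 and 1 show all four gametes
pts = recombination_points(constraints(haps))
print(pts)             # [0]
print(blocks(3, pts))  # [(0, 0), (1, 2)]
```

Checking only the most recent point is enough because points are added in increasing order and every later interval ends at or beyond the current right endpoint.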

Tagging SNPs

ACGATCGATCATGAT

GGTGATTGCATCGAT

ACGATCGGGCTTCCG

ACGATCGGCATCCCG

GGTGATTATCATGAT

A------A---TG--

G------G---CG--

A------G---TC--

A------G---CC--

G------A---TG--

Figure: an example of a real data set and its haplotype block structure. Colors refer to the founding population, one color for each founding haplotype.

Only 4 SNPs are needed to tag all the different haplotypes.
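Whether a set of tag SNPs works can be checked mechanically: project every haplotype onto the tag positions and confirm that no two distinct haplotypes collapse together. A minimal sketch, assuming equal-length haplotype strings and 0-based positions; the helpers `project` and `tags_distinguish_all` are hypothetical. Positions 0, 7, 11 and 12 are the columns kept in the dashed rows of the Tagging SNPs example.

```python
def project(haplotypes, tags):
    """Restrict each haplotype to the tag positions (0-based)."""
    return ["".join(h[p] for p in tags) for h in haplotypes]


def tags_distinguish_all(haplotypes, tags):
    """True if distinct haplotypes remain distinct on the tag positions."""
    return len(set(project(haplotypes, tags))) == len(set(haplotypes))


haplotypes = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]
tags = (0, 7, 11, 12)
print(project(haplotypes, tags))               # ['AATG', 'GGCG', 'AGTC', 'AGCC', 'GATG']
print(tags_distinguish_all(haplotypes, tags))  # True
print(tags_distinguish_all(haplotypes, (0, 7)))  # False: 'AG' appears twice
```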

Informativeness

A measure of the “information” a SNP contains about another SNP. Useful for designing SNP arrays and for tagging SNP selection.

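Figure: a SNP s distinguishing two haplotypes h1 and h2 (rows 0 1 0 0 1 and 0 1 1 0 0).

Columns are SNPs s1 to s5; rows are haplotypes:

1 0 0 0 0
0 1 0 0 1
0 1 1 0 0
1 0 1 1 1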

I(s1,s2) = 2/4 = 1/2

Informativeness


I({s1,s2}, s4) = 3/4

Informativeness


I({s3,s4},{s1,s2,s5}) = 3

S = {s3, s4} is a Minimal Informative Subset.
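The slides give informativeness values but no formula. One way to make the idea concrete, assumed here, follows the h1/h2/s picture: a SNP distinguishes a pair of haplotypes when their alleles differ, every pair is weighted equally, I(S, t) is the fraction of pairs distinguished by t that some SNP in S also distinguishes, and I(S, T) sums I(S, t) over t in T (so its maximum is |T|). The function names and the toy matrix are hypothetical, and population haplotype frequencies are ignored.

```python
from itertools import combinations


def distinguished_pairs(matrix, snp):
    """Haplotype pairs (r, q), r < q, whose alleles differ at column `snp`."""
    return {(r, q) for r, q in combinations(range(len(matrix)), 2)
            if matrix[r][snp] != matrix[q][snp]}


def informativeness(matrix, S, t):
    """I(S, t): fraction of pairs distinguished by t that S also distinguishes."""
    target = distinguished_pairs(matrix, t)
    if not target:
        return 1.0  # assumption: a monomorphic t leaves nothing to predict
    covered = set().union(*(distinguished_pairs(matrix, s) for s in S))
    return len(target & covered) / len(target)


def set_informativeness(matrix, S, T):
    """I(S, T) as the sum of I(S, t) over t in T (assumed convention)."""
    return sum(informativeness(matrix, S, t) for t in T)


toy = ["0000", "0110", "1010", "1101"]  # hypothetical: 4 haplotypes x 4 SNPs
print(informativeness(toy, [0], 1))              # 0.5
print(informativeness(toy, [0, 3], 1))           # 0.75
print(set_informativeness(toy, [0, 1], [2, 3]))  # 2.0
```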

Informativeness

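Minimum Set Cover = Minimum Informative Subset

Figure (graph theory insight): a bipartite graph linking the SNPs s1, s2, s5, s3, s4 to the edges e1 to e6, drawn next to the haplotype matrix. There is one edge per pair of the four haplotypes, and each SNP is joined to the pairs it distinguishes, so a set of SNPs is fully informative about the others when it covers every edge those SNPs cover.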
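Under that pair-based reading, picking a minimum informative subset is a minimum set cover: each SNP is the set of haplotype pairs it distinguishes, and we want the fewest SNPs covering every pair that any SNP covers. A minimal sketch under that assumption; the exhaustive search is only for small examples (minimum set cover is NP-hard in general), and the greedy heuristic is the standard set-cover approximation, not something the slides describe. Function names and the toy matrix are hypothetical.

```python
from itertools import combinations


def pair_sets(matrix):
    """For each SNP (column), the set of haplotype pairs it distinguishes."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [{(r, q) for r, q in combinations(range(n_rows), 2)
             if matrix[r][c] != matrix[q][c]}
            for c in range(n_cols)]


def minimum_informative_subset(matrix):
    """Smallest tuple of SNPs whose pairs cover every pair any SNP covers."""
    sets = pair_sets(matrix)
    universe = set().union(*sets)
    for size in range(len(sets) + 1):
        for subset in combinations(range(len(sets)), size):
            if universe <= set().union(*(sets[c] for c in subset)):
                return subset
    return None  # unreachable: the full set of SNPs always covers


def greedy_informative_subset(matrix):
    """Greedy heuristic: repeatedly take the SNP covering most uncovered pairs."""
    sets = pair_sets(matrix)
    uncovered = set().union(*sets)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda c: len(sets[c] & uncovered))
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


toy = ["0000", "0110", "1010", "1101"]  # hypothetical: 4 haplotypes x 4 SNPs
print(minimum_informative_subset(toy))  # (0, 1)
print(greedy_informative_subset(toy))   # [0, 1]
```

The greedy version scales to real arrays but can return more SNPs than the optimum; its cover is within a logarithmic factor of the minimum.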

Informativeness

Minimum Set Cover {s3, s4} = Minimum Informative Subset
