Applying haplotype models to association study design

Natalie Castellana

June 7, 2005

Background

Certain characteristics are linked to genetic factors.

By finding models for genetic variation, we can determine which genes render an individual susceptible to a certain disease.

Normal allele

Mutant allele

1000011010110100000010

Background (2)

SNP

Haplotype

Association test: given that this sample has haplotype 01101, does it have the disease?

…1110101…

…1000011…

Genetic Variation

Mutation:

…1000001…

Recombination:

…1110011…

…1000101…

…1001001…

Because of recombination, similar genetic variation can be found within closely linked regions.
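The two sources of variation above can be sketched in a few lines of Python. This is only an illustration: the function names `mutate` and `recombine`, the single-crossover model, and the choice of position are assumptions, not something the slides specify. The inputs are the two example haplotypes from the "Background (2)" slide.

```python
from typing import Tuple


def mutate(haplotype: str, position: int) -> str:
    """Flip the allele (0 <-> 1) at a single SNP position."""
    flipped = "1" if haplotype[position] == "0" else "0"
    return haplotype[:position] + flipped + haplotype[position + 1:]


def recombine(parent_a: str, parent_b: str, crossover: int) -> Tuple[str, str]:
    """Single crossover (an assumed model): swap the tails after `crossover`."""
    child_a = parent_a[:crossover] + parent_b[crossover:]
    child_b = parent_b[:crossover] + parent_a[crossover:]
    return child_a, child_b


# Haplotypes taken from the slides; positions are chosen for illustration.
print(mutate("1000011", 5))
print(recombine("1110101", "1000011", 4))
```

With these inputs, the mutation yields 1000001 and the crossover yields 1110011 and 1000101, which match the sequences shown on this slide.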

Generating Data

Generate genetic segments.

Isolate the disease-causing allele, and separate the case (diseased) samples from the control (healthy) samples.

…1 0 1 1 1 0 1 1 1 1…

…1 1 1 0 0 1 1 0 1 0 …

…0 0 1 1 0 0 1 1 0 1…

…1 0 1 0 1 1 0 0 1 0…

Control Case
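A minimal sketch of the case/control split, under stated assumptions: the slides do not say how the segments are generated or whether the disease SNP is kept in the observed data. Here the function name `split_case_control` is hypothetical, the disease SNP is removed from each sample, and a sample counts as a case when it carries allele 1 at that SNP.

```python
from typing import List, Tuple


def split_case_control(
    haplotypes: List[str], disease_snp: int, disease_allele: str = "1"
) -> Tuple[List[str], List[str]]:
    """Split samples by the allele at `disease_snp`.

    Assumption: the disease SNP itself is dropped from the returned
    samples, so later tests only see the surrounding SNPs.
    """
    cases, controls = [], []
    for h in haplotypes:
        observed = h[:disease_snp] + h[disease_snp + 1:]
        if h[disease_snp] == disease_allele:
            cases.append(observed)
        else:
            controls.append(observed)
    return cases, controls


# Toy data, not taken from the slides.
cases, controls = split_case_control(["1011", "0010", "1110", "0001"], disease_snp=0)
print(cases, controls)
```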

Testing individual SNPs

Go through each SNP and determine which SNPs accurately predict which samples have the disease and which do not.

Case Control

10 11….. 010 0…

01 10...... 100 1…
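One way to score each SNP is a Pearson chi-square test on the 2×2 table of allele counts in cases versus controls. The chi-square statistic is the one plotted later in the deck, but the exact test setup is not given in the slides, so the sketch below (including the names `chi_square` and `snp_chi_squares`) is an assumption.

```python
from typing import List


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


def snp_chi_squares(cases: List[str], controls: List[str]) -> List[float]:
    """Chi-square value of every SNP, from counts of allele 1 and allele 0."""
    scores = []
    for k in range(len(cases[0])):
        table = [
            [sum(h[k] == "1" for h in cases), sum(h[k] == "0" for h in cases)],
            [sum(h[k] == "1" for h in controls), sum(h[k] == "0" for h in controls)],
        ]
        scores.append(chi_square(table))
    return scores


# Toy data: the first SNP separates cases from controls, the second does not.
print(snp_chi_squares(["10", "11"], ["00", "01"]))
```

A larger value means the SNP's alleles are distributed more unevenly between cases and controls.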

Haplotype block method

Instead of looking at each individual SNP, we can look at groups of contiguous SNPs.

1101000000…11…

1101100100…01…

0111000000…10…

1101100100…00…
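The block idea can be sketched by treating each block's pattern as a single multi-valued marker and running the same chi-square test on a 2×k table (case/control by observed pattern). The sketch assumes fixed-width blocks with shared boundaries; how the blocks are actually chosen is not described in the slides, and `block_chi_squares` is a hypothetical name.

```python
from collections import Counter
from typing import List


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def block_chi_squares(cases: List[str], controls: List[str], block_size: int) -> List[float]:
    """Chi-square value per block, using fixed-width blocks (an assumption)."""
    scores = []
    for start in range(0, len(cases[0]), block_size):
        end = start + block_size
        case_counts = Counter(h[start:end] for h in cases)
        control_counts = Counter(h[start:end] for h in controls)
        patterns = sorted(set(case_counts) | set(control_counts))
        table = [
            [case_counts[p] for p in patterns],
            [control_counts[p] for p in patterns],
        ]
        scores.append(chi_square(table))
    return scores


# Toy data: the first block separates the groups, the second does not.
print(block_chi_squares(["1100", "1101"], ["0000", "0001"], block_size=2))
```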

Blocks vs. SNPs

[Figure: "High Return (Bounded Blocks), C = 15". Chi-square value (y-axis, 0 to 1000) against SNP location (x-axis, 1 to 1065), with one series for blocks and one for SNPs.]

Haplotype motif method

A sequence is modeled as a concatenation of segments (as in the block method), but the segment boundaries do not have to be conserved.

1101000000…1100100100…0111000000…1101100111…

Approximation Algorithm

General idea:

[Figure: the sequence …1 0 0 0 1 … divided into consecutive windows, each labeled c.]

Pick the best partition, minimizing the number of motifs needed to explain all the data.
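The selection criterion can be illustrated with a deliberately simplified sketch. It is not the approximation algorithm from the slides, which is not spelled out here. Assumptions: every sequence shares one set of cut positions, a motif is a (start position, segment) pair, and the candidate partitions are supplied by the caller. The names `motifs_for_partition` and `best_partition` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def motifs_for_partition(haplotypes: List[str], cuts: Sequence[int]) -> Set[Tuple[int, str]]:
    """Distinct (start, segment) pairs needed to spell every haplotype."""
    bounds = [0, *cuts, len(haplotypes[0])]
    motifs = set()
    for h in haplotypes:
        for start, end in zip(bounds, bounds[1:]):
            motifs.add((start, h[start:end]))
    return motifs


def best_partition(haplotypes: List[str], candidates: List[Sequence[int]]) -> Sequence[int]:
    """Return the candidate cut list that needs the fewest distinct motifs."""
    return min(candidates, key=lambda cuts: len(motifs_for_partition(haplotypes, cuts)))


# Toy data: cutting in the middle reuses the segments 00 and 11.
data = ["0011", "0000", "1111", "1100"]
print(best_partition(data, [[1], [2], [3]]))
```

In this toy example a cut after position 2 needs four motifs, while a cut after position 1 or 3 needs six, so the middle cut is selected.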

Finding Motifs

[Figure: the sequence 0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 0 1 with a span labeled C, and branches labeled 0 and 1 leading to the strings 000…000, 000..100, ………, 111…111.]

What I am working on

Implementing an efficient algorithm for finding motifs.

Performing association tests on the SNP, block, and motif models.