Computational Problems in Haplotype Recognition
by Ali Katanforoush
Under supervision of Dr Hamid Pezeshk and Dr Mehdi Sadeghi
A thesis submitted to the Graduate Studies Office of University of Tehran in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics
Institute of Biochemistry and Biophysics, November 1, 2009

Computational Problems in Haplotype Recognitionfaculties.sbu.ac.ir/~katanforoush/dissertation/myDefense.pdf · Computational Problems in Haplotype Recognition by Ali Katanforoush


Outline

● Haplotype basis and terminology
● Haplotype inference
● Haplotype block partitioning
● Assessment of haplotype blocks

– Common haplotype coverage and tagSNP coverage

– Robustness of partitioning method

– Application to recombination hotspot detection

– Application to disease association studies

Single Nucleotide Polymorphism (SNP)

● A genetic variation at a single nucleotide position that is observed in a population at a frequency that is not too rare.

● SNPs are usually bi-allelic.

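[Figure: five aligned sequences with five SNP sites, and the binary coding of each sequence at those sites]

AGGACTAGATAATAGACCG    0 1 1 0 0
AGGACCACATTATAGTCCG    0 0 0 1 1
AGGACCAGATAATAGTCCG    0 0 1 0 1
ATGACCACATTATAGTCCG    1 0 0 1 1
ATGACTACATAATAGACCG    1 1 0 0 0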
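The binary coding of bi-allelic SNP sites can be sketched in a few lines of Python. This is only an illustration: the function name `encode_snps` and the labelling rule (alphabetically first allele coded as 0) are assumptions, chosen because that rule happens to reproduce the 0/1 values on this slide; the slides do not state which allele is coded as 0.

```python
def encode_snps(sequences):
    """Return (polymorphic site indices, 0/1 rows) for equal-length sequences."""
    length = len(sequences[0])
    # A site counts as a SNP here if more than one nucleotide is observed at it.
    sites = [i for i in range(length) if len({s[i] for s in sequences}) > 1]
    rows = []
    for s in sequences:
        row = []
        for i in sites:
            alleles = sorted({t[i] for t in sequences})
            # Assumption: alphabetically first allele -> 0, the other allele -> 1.
            row.append(0 if s[i] == alleles[0] else 1)
        rows.append(row)
    return sites, rows


sequences = [
    "AGGACTAGATAATAGACCG",
    "AGGACCACATTATAGTCCG",
    "AGGACCAGATAATAGTCCG",
    "ATGACCACATTATAGTCCG",
    "ATGACTACATAATAGACCG",
]
sites, rows = encode_snps(sequences)
for seq, row in zip(sequences, rows):
    print(seq, row)
```

Only the five variable positions are kept; every other column is identical across the sample and carries no haplotype information.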

Single Nucleotide Polymorphism (SNP)

● A SNP is the result of a single-site mutation that has become established in a population.

Single Nucleotide Polymorphism (SNP)

● SNPs are the most common form of genetic polymorphism in genomes.

● Each new cell contains ~3 new mutations.
● Each new “child” carries ~20 new mutations.
● Currently more than 4.3 million SNPs have been reported to dbSNP (0.1% of the whole genome).

--A--------C--------A----G--------T---C---A----
--T--------G--------A----G--------C---C---A----
--A--------G--------G----G--------C---C---A----
--A--------C--------A----G--------T---C---A----
--T--------C--------A----G--------T---C---A----
--T--------C--------A----T--------T---A---A----

Haplotype Map of the Human Genome

● Define patterns of genetic variation across the human genome.

● Guide the efficient selection of SNPs that “tag” common variants.

● Public release of all data (assays, genotypes).

Haplotype Map of the Human Genome

● Phase I: 1.3 M SNPs in 269 people.
● Phase II: +2.8 M SNPs in 270 people;

– 30 parent-parent-offspring trios from Nigeria (YRI)

– 30 trios of European descent from Utah (CEU)

– 45 unrelated individuals from Beijing (CHB)

– 45 unrelated individuals from Tokyo (JPT)

● Phase III:  1.3 M SNPs in 1184 people (10 panels).

The first problem: genotype phasing

● Every genotype can be considered as the sum of two unknown haplotypes.

[Figure: real haplotypes → genotyping → genotypes]

The first problem: genotype phasing

● Given a set of genotype samples of unrelated individuals, determine the pairs of haplotypes that add up to the given genotypes.

[Figure: real haplotypes → genotyping → genotypes → computational phasing → inferred haplotypes]
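To see why phasing is ambiguous, the following sketch enumerates every unordered haplotype pair that adds up to one genotype. It assumes the additive coding in which each haplotype entry is 0 or 1 and a genotype entry is their sum (0, 1 or 2, with 1 marking a heterozygous site); the function name `compatible_pairs` is hypothetical.

```python
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs (a, b) with a[i] + b[i] == genotype[i]."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: 0 -> (0, 0) and 2 -> (1, 1).
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        a, b = list(base), list(base)
        for i, bit in zip(het, bits):
            a[i], b[i] = bit, 1 - bit
        # Sort the two haplotypes so that (a, b) and (b, a) count once.
        pairs.add(tuple(sorted((tuple(a), tuple(b)))))
    return sorted(pairs)


for pair in compatible_pairs((0, 1, 2, 0, 1)):
    print(pair)
```

A genotype with k ≥ 1 heterozygous sites has 2^(k-1) compatible pairs, which is why additional modelling assumptions are needed to pick one.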

Haplotype inference by maximum parsimony

● Inferring a set of haplotypes consistent with the genotype data requires some biological assumptions.

● Maximum parsimony is one of the most common models in biology.

● Other models: perfect phylogeny, maximum likelihood, Bayesian model.

Haplotypes:

0 1 1 0 0
0 0 0 1 1
0 0 1 0 1
1 0 0 1 1
1 1 0 0 0

Genotypes:

0 1 1 1 1
0 1 2 0 1
1 0 1 1 1
2 1 0 1 1
1 2 1 0 0

Haplotype inference by maximum parsimony

● Clark's algorithm (1990); greedy algorithm

● Finding a maximum parsimony solution to haplotype phasing is NP-hard: Hubbell (2002), Pinotti et al. (2004)

● 0/1 linear programming, Gusfield (2003)

● Branch-and-Bound, Wang (2003)

Other approaches to haplotype inference

● Perfect phylogeny, Gusfield (2002), Filkov, Gusfield and Ding (2006)

● Inference of haplotype frequencies by maximum likelihood

– Expectation-Maximization (EM), Slatkin and Excoffier (1995)

– Partition-Ligation (PL-EM), Qin et al. (2002)

● Bayesian model, Smith, Donnelly and Stephens (2001); PHASE

Genetic Algorithm (GA)

$\min f(x) \quad \text{s.t.} \quad P(x) = \text{true}$

● Consider N feasible solutions; each one is represented by a bit string called a “chromosome”.

● Select the “chromosomes” of highest fitness to produce a new generation.

● Cross over random pairs of selected “chromosomes” and mutate some bits on other “chromosomes”.

● The optimal solution is expected to be approached after many generations.

Genetic Algorithm for haplotype inference with maximum parsimony

● Given n genotypes on l SNPs, $g_1, g_2, \ldots, g_n$, find

$$\min |H| \quad \text{s.t.} \quad \exists\, h_a, h_b \in H : g_i = h_a \oplus h_b, \quad \text{for } i = 1, 2, \ldots, n$$

● Braaten et al. (2000). The GA applied to haplotype data at the LDL receptor locus.

● Tapadar et al. (2000). Haplotyping in pedigrees via GA.
● Azuma et al. (2009). Haplotype frequency estimation by GA.

A naive Genetic Algorithm for MP haplotyping

● “Chromosome” representation

A naive Genetic Algorithm for MP haplotyping

● “Crossing-over on chromosomes”

A naive Genetic Algorithm for MP haplotyping

● “Mutation on chromosomes”

A parametric greedy phasing aimed at MP

● Input: n genotypes on l SNPs
● Algorithm parameters:

– a permutation of {1, 2, ..., n}, σ = <σ_1, σ_2, ..., σ_n>

– a set of “guide haplotypes” {ħ_1, ħ_2, ..., ħ_n} where ħ_i ~ g_i

● In a greedy manner, the algorithm tries to resolve g(σ_i) with one of the haplotypes resolving g(σ_1), g(σ_2), ..., g(σ_{i-1}); if it fails, it applies ħ_i.
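A minimal sketch of such a greedy pass is given below. It is not the thesis implementation: it assumes additive genotype coding (0/1/2), reads "ħ_i ~ g_i" as "ħ_i is compatible with g_i", indexes guide haplotypes by genotype, and takes the first compatible haplotype found in the pool. The names `complement` and `greedy_phase` are hypothetical.

```python
def complement(genotype, hap):
    """Haplotype h2 with hap + h2 == genotype, or None if hap is incompatible."""
    other = tuple(g - h for g, h in zip(genotype, hap))
    return other if all(x in (0, 1) for x in other) else None


def greedy_phase(genotypes, order, guides):
    """Resolve genotypes in the given order, reusing earlier haplotypes first."""
    pool = []    # haplotypes used so far, in order of first use
    phased = {}  # genotype index -> (haplotype, mate)
    for idx in order:
        g = genotypes[idx]
        pair = None
        for h in pool:  # try to reuse a haplotype that resolved an earlier genotype
            mate = complement(g, h)
            if mate is not None:
                pair = (h, mate)
                break
        if pair is None:  # fall back to the guide haplotype of this genotype
            mate = complement(g, guides[idx])
            if mate is None:
                raise ValueError(f"guide haplotype {idx} is incompatible")
            pair = (guides[idx], mate)
        for h in pair:
            if h not in pool:
                pool.append(h)
        phased[idx] = pair
    return phased, pool


genotypes = [(0, 1, 1, 1, 1), (0, 1, 2, 0, 1), (1, 2, 1, 0, 0)]
guides = [(0, 1, 1, 0, 0), (0, 1, 1, 0, 0), (0, 1, 1, 0, 0)]
phased, pool = greedy_phase(genotypes, order=[0, 1, 2], guides=guides)
print(len(pool), phased)
```

The size of `pool` is the quantity |H| that maximum parsimony tries to minimize; different permutations and guide haplotypes give different pool sizes, which is what makes them useful as search parameters.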

GAhap: the Genetic Algorithm for MP phasing

● Each “chromosome” ↔ an instance of the greedy phasing algorithm

● Various permutations and “guide haplotypes” are encoded by bit-strings.

● Naive procedures for crossing over and mutation are applied to the “guide haplotypes”.

● Cross-over and mutation operators on permutations are also straightforward to define.

Cross-over on permutations

Parameter setting for GAhap

cr    crint   mrint   selection     fitness scaling   successful cases of 20
0.8   0.9     0.9     stochastics   shift linear      18
0.2   0.5     0.5     stochastics   rank              16
0.2   0.9     0.1     tournament    linear            16
0.2   0.1     0.9     uniform       rank              15
0.9   0.9     0.9     tournament    rank              14

Effect of “cross over” on convergence

[Figure: convergence curves for cr = 0 and cr = 0.2]

GAhap vs. other haplotyping methods

method       framework                      |H|   haplotype error rate   switch error rate
HAPLOTYPER   Bayesian, Dirichlet prior      33    5.4                    3.0
PHASE        Bayesian, perfect phylogeny    32    5.6                    3.1
fastPHASE    Simplified PHASE               35    7.3                    4.5
2SNP         2-SNP phasing and MST          40    10.4                   5.6
GAhap        GA and MP                      34    9.7                    5.7

Methods have been evaluated with 150 genotypes of GH1 with known phases (Horan et al. 2003).
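The slides do not define the two error measures, so the sketch below uses the common convention as an assumption: the switch error counts, over consecutive heterozygous sites, how often the inferred haplotype jumps from one true haplotype to the other. The function name `switch_errors` is hypothetical.

```python
def switch_errors(true_pair, inferred_pair):
    """Return (number of switches, number of switch opportunities) for one individual."""
    t1, t2 = true_pair
    i1, _ = inferred_pair
    # Only heterozygous sites carry phase information.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    # At each heterozygous site, record which true haplotype the first
    # inferred haplotype agrees with (0 -> t1, 1 -> t2).
    origin = [0 if i1[k] == t1[k] else 1 for k in het]
    switches = sum(1 for a, b in zip(origin, origin[1:]) if a != b)
    return switches, max(len(het) - 1, 0)


true_pair = ((0, 1, 0, 1), (1, 0, 1, 0))
inferred_pair = ((0, 1, 1, 0), (1, 0, 0, 1))
print(switch_errors(true_pair, inferred_pair))
```

Summing switches and opportunities over all individuals and dividing gives a switch error rate; under the same assumption, a haplotype error rate would be the fraction of individuals whose inferred pair differs from the true pair at all.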

Generate random haplotype samples under the coalescent model

Before the second problem

● Simulate a coalescent process.

Generate random haplotype samples under the coalescent model

Before the second problem

● Determine haplotype frequencies subject to a minor allele frequency constraint.

Second problem: haplotype block partitioning

● The genome comprises regions with definite boundaries within which haplotypes are transferred unchanged through generations.

● Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Patil et al. (2001)

● The pattern of Linkage Disequilibrium shows a picture of discrete haplotype blocks over the genome. Daly et al. (2001)

● Haplotype blocks arise in the absence of recombination hot spots. Wang et al. (2002)

Blocks of limited haplotype diversity

[Figure: nine haplotypes split into three blocks, followed by the distinct haplotypes observed within each block]

Block 1   Block 2   Block 3
011010    0000      01010
000101    0000      00110
011010    0000      01010
011010    0110      00110
000101    0110      01011
101000    1000      01011
000101    1100      00010
011010    1100      00010
101000    1100      10011

Distinct haplotypes per block (number of copies):

Block 1: 011010 (4), 000101 (3), 101000 (2)
Block 2: 0000 (3), 0110 (2), 1000 (1), 1100 (3)
Block 3: 01010 (2), 01011 (2), 00110 (2), 00010 (2), 10011 (1)


An early example of haplotype blocks

courtesy Daly et al. (2001)

● Block structure and common haplotypes on 5q31

Bases of haplotype block definition

● Haplotype diversity

– common haplotypes
– minimum number of SNPs needed to cover the information of the majority of haplotypes (Patil et al. 2001, Zhang et al. 2002)

● Linkage Disequilibrium

– point estimation of the LD coefficient: D, r², D'
– interval estimation (Gabriel et al. 2002)

● Four-gamete test (Wang et al. 2002)
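As an illustration of the four-gamete criterion, the sketch below flags a SNP pair when all four gametes 00, 01, 10 and 11 are observed, and cuts a block whenever a new SNP forms such a pair with any SNP already in the current block. The left-to-right scan and the names `four_gametes` and `fgt_blocks` are assumptions made for illustration, not necessarily the procedure of Wang et al. (2002).

```python
def four_gametes(haplotypes, i, j):
    """True if all four allele combinations occur at SNPs i and j."""
    return len({(h[i], h[j]) for h in haplotypes}) == 4


def fgt_blocks(haplotypes):
    """Greedy left-to-right blocks (start, end) with no four-gamete pair inside."""
    n = len(haplotypes[0])
    blocks, start = [], 0
    for j in range(1, n):
        # Close the current block if SNP j conflicts with any SNP inside it.
        if any(four_gametes(haplotypes, i, j) for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    blocks.append((start, n - 1))
    return blocks


haplotypes = [
    (0, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
    (0, 0, 1, 0, 1),
    (1, 0, 0, 1, 1),
    (1, 1, 0, 0, 0),
]
print(fgt_blocks(haplotypes))
```

Under an infinite-sites assumption, seeing all four gametes at a pair of sites implies at least one historical recombination between them, which is why such pairs are treated as block breakers.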

Local partitioning vs. global partitioning methods

● A local partitioning method defines haplotype blocks so that the boundaries of each block are determined independently of the other blocks.

● Local partitioning usually yields blocks as a series of separated regions on the genome, like “islands”.

● A global partitioning method defines a partitioning of the whole genome rather than defining each block independently.

● With global partitioning, the genome is usually “tiled” by blocks placed tightly next to each other.

Methods of haplotype block partitioning

[Figure: partitioning methods labelled Myers (2005), Zhang (2005), Zhang (2005), Anderson (2003), Wang (2002), Gabriel (2002)]

Application of haplotype block partitioning

● Studies on human origin, history of human migrations and genetic diversity between races

● Genetic mapping and recognizing recombination hotspots

● Genotyping and phasing

● tagSNP selection

● Disease Association Study

Global haplotype partitioning for maximal associated SNP pairs

Outline

● Categorize SNP pairs into association classes.
● Establish a constrained optimization that finds blocks containing the largest possible number of “associated” pairs, subject to a limited number of “independent” pairs.

● Solve the constrained optimization problem.

Linkage Disequilibrium between two SNPs

[Figure: two loci with alleles A/a and B/b]

● A long enough time after the population has settled, with

– no admixture
– no selection

● In the presence of crossing over, $P_{AB} = P_A \cdot P_B$

● In the absence of crossing over, $P_{AB} \neq P_A \cdot P_B$

Standardized coefficient of LD
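The formula on this slide did not survive extraction. As a reference point, the sketch below computes the textbook pairwise LD coefficients from phased 0/1 haplotypes: D = P_AB − P_A·P_B, the standardized |D'| = |D| / D_max, and r². These are the standard definitions, assumed here rather than taken from the slide, and the function name `ld_coefficients` is hypothetical.

```python
def ld_coefficients(haplotypes, i, j):
    """Return (D, |D'|, r^2) for SNPs i and j of 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at SNP i
    p_b = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at SNP j
    p_ab = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ab - p_a * p_b
    # D_max is the largest |D| attainable with these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(ld_coefficients([(0, 0), (0, 0), (1, 1), (1, 1)], 0, 1))
print(ld_coefficients([(0, 0), (0, 1), (1, 0), (1, 1)], 0, 1))
```

The first call describes two perfectly correlated SNPs (|D'| and r² both equal to 1), the second two SNPs in equilibrium (all three coefficients equal to 0).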

Assessment of LD estimation

● Confidence interval (Gabriel et al. 2002)

– Apply thresholds on the confidence interval of |D'|

– Each SNP pair is then categorized into one of three classes: “strongly associated”, “recombinant” and “uninformative”

● Fisher's exact test and p-value (present work)


An association index for SNP pairs based on Fisher's Exact Test

An association index for SNP pairs based on Fisher's Exact Test

● Estimate the value of |D'| on the given sample.
● Compute the p-value of Fisher's exact test.

● Applying thresholds to the p-value yields a three-state association index: “associated”, “independent” and “not statistically significant”.
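The p-value step can be illustrated with a small pure-Python version of the two-sided Fisher's exact test on the 2×2 table of haplotype counts (n11, n12, n21, n22). The three-state rule that follows is only a guess at how two thresholds could produce three states: the cut-offs `alpha_assoc` and `alpha_indep`, their default values, and the function names are hypothetical and are not given in the slides.

```python
from math import comb


def fisher_exact_p(n11, n12, n21, n22):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    r1, r2 = n11 + n12, n21 + n22  # row totals
    c1 = n11 + n21                 # first column total
    n = r1 + r2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(r1, k) * comb(r2, c1 - k) / comb(n, c1)

    p_obs = prob(n11)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def association_index(p_value, alpha_assoc=0.01, alpha_indep=0.5):
    """Hypothetical three-state index; both thresholds are illustrative only."""
    if p_value <= alpha_assoc:
        return "associated"
    if p_value >= alpha_indep:
        return "independent"
    return "not statistically significant"


p = fisher_exact_p(3, 0, 0, 3)
print(p, association_index(p))
```

Here n11 is the number of sampled haplotypes carrying allele 1 at both SNPs, and so on for the other three cells.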

Notion of Fisher's Exact Test

[Figure: Fex, r² and |D'| plotted against n11]

Global haplotype partitioning for maximal associated SNP pairs

● Establish a constrained optimization ...

Global haplotype partitioning for maximal associated SNP pairs

● Establish a constrained optimization ...

Solve the constrained optimization problem

● Convert it into an unconstrained optimization using a Lagrange multiplier λ.

● Given a fixed λ, the partitioning can be obtained via a dynamic programming procedure.
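The objective function itself is not recoverable from the extracted slides. The sketch below therefore assumes a simple penalized score: a block earns +1 for every “associated” pair inside it and −λ for every “independent” pair, and a dynamic program picks the tiling of SNPs 0..n-1 with the highest total score. The score, the label encoding (+1 / −1) and the name `partition` are all assumptions made for illustration.

```python
def partition(labels, n, lam):
    """Tile SNPs 0..n-1 into blocks maximizing (#associated - lam * #independent).

    labels maps a pair (i, j) with i < j to +1 (associated) or -1 (independent);
    pairs that are absent count as uninformative and score 0.
    """
    def score(a, b):
        # Score of the block covering SNPs a..b inclusive (naive pair scan).
        s = 0.0
        for i in range(a, b + 1):
            for j in range(i + 1, b + 1):
                v = labels.get((i, j), 0)
                s += 1.0 if v > 0 else (-lam if v < 0 else 0.0)
        return s

    best = [0.0] * (n + 1)  # best[e]: best total score for SNPs 0..e-1
    back = [0] * (n + 1)    # back[e]: start index of the last block in that optimum
    for end in range(1, n + 1):
        best[end], back[end] = max(
            (best[start] + score(start, end - 1), start) for start in range(end)
        )
    blocks, end = [], n
    while end > 0:  # walk the back-pointers to recover the blocks
        blocks.append((back[end], end - 1))
        end = back[end]
    return blocks[::-1]


labels = {(0, 1): 1, (2, 3): 1, (1, 2): -1, (0, 2): -1, (0, 3): -1, (1, 3): -1}
print(partition(labels, n=4, lam=2.0))
```

For a fixed λ the recursion is exact for this score; in a Lagrangian scheme λ would then be tuned, for example by bisection, until the constraint on the number of “independent” pairs is met.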


Global haplotype partitioning for maximal associated SNP pairs

Method evaluation

● General features of haplotype blocks

– block length and block distribution
– coverage of “common haplotypes”
– consistency with LD pattern
– the minimum number of tagSNPs and their coverage
– similarity between different partitioning methods

● Robustness of the partitioning method
● Performance in identifying recombination hotspots
● Performance in case-control association studies

Evaluation on ENCODE haplotypes

● The Encyclopedia of DNA Elements (ENCODE)
● Ten regions have been selected by the ENCODE project as the pilot phase to identify the functional elements of the human genome.

● There are about 2000 SNPs assayed by the HapMap Project in each ENCODE region (CEU panel).

● We restricted the SNPs to those ascertained in all three HapMap panels.

● Moreover, from each region we selected the 400 SNPs with the highest heterozygosity.

Courtesy of ENSEMBL for genome annotation

General features of haplotype blocks

Consistency with LD pattern

The number of haplotype tagging SNPs

● The minimum number of htSNPs for each haplotype block has been obtained using htSNPer (Ding et al. 2005)

htSNP coverage

Similarity of blocks between different methods

Robustness of block partitioning

[Figure: boundaries of haplotype blocks in 9q34.11 obtained by different methods; values shown in the figure: 92.0, 69.2, 99.4, 99.7, 100, 97.6]

How often does a given method reproduce the same boundaries when applied to simulated recombinant haplotypes?

Application to recombination hotspot detection

● Generate random haplotype samples under the coalescent model with recombination using msHOT (Hellenthal & Stephens 2007)
● Two simulated haplotype sets, each with 100 samples
● Each sample contains 40/100 haplotypes on 300 SNPs
● Six 2 kb regions, chosen at random, are considered as hotspot regions
● The recombination rate in hotspots is chosen to be 50-400 times higher than the background rate

Application to recombination hotspot detection

● Total error rate on detection of recombination hotspots

Application to recombination hotspot detection

Application to disease association study

● Single-site association test

● Haplotype-based association test: chi-squared test on a hierarchical clustering of case/control haplotypes

Application to disease association study

● Simulate random case-control samples under various multiplicative models:

– GRR1 (first genotype relative risk ratio) = 3, 5

– DAF (disease allele frequency) = 0.05-0.15, 0.20-0.30

– The sample generator gs (Li & Chen 2008) simulates the pattern of LD in real haplotypes.

– 500 sample sets of 50 cases / 50 controls have been produced for each ENCODE region.

– The causative SNP has been removed from the samples before assessment.


Type I error in the disease association study

Marker selection

● Uniform marker selection: the first SNP out of every k consecutive SNPs is selected as a marker.

● Prioritized marker selection: rank each SNP by its “informativeness”, then select markers according to that ranking within each haplotype block.
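Both selection schemes can be sketched briefly. The slides do not say how “informativeness” is measured or how many markers are taken per block, so the per-SNP `score` list and the `per_block` parameter below are hypothetical placeholders, as are the function names.

```python
def uniform_markers(n_snps, k):
    """Indices of the first SNP out of every k consecutive SNPs."""
    return list(range(0, n_snps, k))


def prioritized_markers(blocks, score, per_block):
    """Within each block (start, end), keep the per_block highest-scoring SNPs."""
    chosen = []
    for start, end in blocks:
        ranked = sorted(range(start, end + 1), key=lambda s: score[s], reverse=True)
        # Report the selected markers in genomic order.
        chosen.extend(sorted(ranked[:per_block]))
    return chosen


print(uniform_markers(10, 4))
print(prioritized_markers([(0, 2), (3, 5)], [0.1, 0.5, 0.3, 0.2, 0.2, 0.4], per_block=1))
```

Uniform selection ignores block structure, whereas prioritized selection spends its marker budget block by block.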

Effect of marker selection on performance of disease gene identification

● Uniform marker selection

Effect of marker selection on performance of disease gene identification

● Prioritized marker selection

Conclusion

Genotype phasing with maximum parsimony

● Incorporating a parametric greedy phasing into the GA considerably improved the results.

● Yet the search space of the most parsimonious haplotypes is too complicated to be tractable by a Genetic Algorithm.

● In practice, it seems that the most parsimonious haplotypes are not necessarily close to the actual haplotypes.

Conclusion

Haplotype block partitioning using the global partitioning for maximal associated SNP pairs

● Methods based on pairwise analysis of SNPs find blocks of limited haplotype diversity.

● There is no general concordance among the block boundaries obtained by different methods.

● Permutation re-sampling shows that Gabriel's method and its association index are highly robust. Our algorithm is also relatively robust.

Conclusion

● The global block partitioning methods performed best in the identification of recombination hotspots.
● In case-control studies, the block-based association test is considerably more efficient than the conventional single-site association test.

● Our block partitioning method achieved the best accuracy in the case-control study, even when only a low marker density is available.