Constraint Programming and Biology: Haplotype Inferenceagostino.dovier/WROCLAW/BIOCP12_1.pdf · In...

Preview:

Citation preview

Constraint Programming and Biology:Haplotype Inference

Agostino Dovier

Dept. Math and Computer Science, Univ. of Udine, Italy

ACP Summer School in Constraint ProgrammingWrocław, September 2012

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 1 / 20

Haplotype Inference Introduction

DNA and Genome in a nutshell

DNA (DeoxyriboNucleic Acid) ischaracterized by a string of nucleotides: A,C, G, and T (Adenine, Cytosine, Guanine,Thymine)Given a sequence s ∈ {A,C,G,T}∗ thecomplementary sequence s̄ isdeterministically obtained by substitutingA↔ T and C ↔ Gs and s̄ fold together forming the famousdouble helix

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 2 / 20

Haplotype Inference Introduction

DNA and Genome in a nutshell

DNA strings are huge (106–1010 nucleotides).Differences between the DNAs of two members of the samespecie are limited (e.g., 1 on 1000 for humans)Some fragments of the DNA encode proteins (we’ll be back onthat later). Let’s say for now that they are very important parts andcalled genes.In the Human DNA it is estimated that there are 23000 (maybefew) protein-coding genes.Differences of some nucleotides in the same gene characterize aproperty of an individual w.r.t. another.The set of all genes of an individual is called genome

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 3 / 20

Haplotype Inference Introduction

Haplotype Inference

Genes are packaged in bundles called chromosomes.(Chromosomes are therefore regions of DNA)In diploid organisms (like humans) we have 23 homologouschromosome pairs, one coming from the DNA of the father,another coming from the DNA of the mother.A haplotype is a DNA sequence that has been inherited from oneparent.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 4 / 20

Haplotype Inference Introduction

Haplotype Inference

Each person inherits two haplotypes (from the mother and from thefather) for most regions of the genome.

· · · G A T C T G T A C T G A G T · · ·· · · G A T C T G T A C T G A A T · · ·

⇑ ⇑ ⇑

In some (typical) points, the bases can be different.If this is the case, we say that there is a Single NucleotidePolymorphism (SNP).

Changes are always C ↔ T and A↔ G

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 5 / 20

Haplotype Inference Introduction

Haplotype Inference

Each person inherits two haplotypes (from the mother and from thefather) for most regions of the genome.

· · · G A T C T G T A C T G A G T · · ·· · · G A T C T G T A C T G A A T · · ·

⇑ ⇑ ⇑

In some (typical) points, the bases can be different.

If this is the case, we say that there is a Single NucleotidePolymorphism (SNP).

Changes are always C ↔ T and A↔ G

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 5 / 20

Haplotype Inference Introduction

Haplotype Inference

Each person inherits two haplotypes (from the mother and from thefather) for most regions of the genome.

· · · G A T C T G T A C T G A G T · · ·· · · G A T C T G T A C T G A A T · · ·

⇑ ⇑ ⇑

In some (typical) points, the bases can be different.If this is the case, we say that there is a Single NucleotidePolymorphism (SNP).

Changes are always C ↔ T and A↔ G

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 5 / 20

Haplotype Inference Introduction

Haplotype Inference

Each person inherits two haplotypes (from the mother and from thefather) for most regions of the genome.

· · · G A T C T G T A C T G A G T · · ·· · · G A T C T G T A C T G A A T · · ·

⇑ ⇑ ⇑

In some (typical) points, the bases can be different.If this is the case, we say that there is a Single NucleotidePolymorphism (SNP).

Changes are always C ↔ T and A↔ G

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 5 / 20

Haplotype Inference Introduction

Haplotype Inference

A good introduction in http://csiflabs.cs.ucdavis.edu/~gusfield/gusfieldorzack.pdf

The Haplotype Inference problem(s) is(are) introduced toinvestigate genetic variations in a population.Some particular points of the DNA where typically mutations areconcentrated are selected (SNPs).These lists of points are analyzed.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 6 / 20

Haplotype Inference Introduction

Haplotype InferenceSingle Nucleotide Polymorphism (SNP)

Each person has two haplotypes (from the mother and from the father)for most regions of the genome:

G A A T C T T C G T A C T G A G TG A A T C T T C G T A C T G A A TLet us focus on the SNPs:

A C T GA C T A

0 0 1 2We know that at a location (site/locus) there is a SNP.We know whether the SNP is C ↔ T and A↔ G.We assign a SNP a 0-2 value in the following way:C,C 7→ 0 T ,T 7→ 1 C,T 7→ 2 T ,C 7→ 2A,A 7→ 0 G,G 7→ 1 A,G 7→ 2 G,A 7→ 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 7 / 20

Haplotype Inference Introduction

Haplotype InferenceSingle Nucleotide Polymorphism (SNP)

Each person has two haplotypes (from the mother and from the father)for most regions of the genome:

G A A T C T T C G T A C T G A G TG A A T C T T C G T A C T G A A TLet us focus on the SNPs:

A C T GA C T A0 0 1 2

We know that at a location (site/locus) there is a SNP.

We know whether the SNP is C ↔ T and A↔ G.We assign a SNP a 0-2 value in the following way:C,C 7→ 0 T ,T 7→ 1 C,T 7→ 2 T ,C 7→ 2A,A 7→ 0 G,G 7→ 1 A,G 7→ 2 G,A 7→ 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 7 / 20

Haplotype Inference Introduction

Haplotype InferenceSingle Nucleotide Polymorphism (SNP)

Each person has two haplotypes (from the mother and from the father)for most regions of the genome:

G A A T C T T C G T A C T G A G TG A A T C T T C G T A C T G A A TLet us focus on the SNPs:

A C T GA C T A0 0 1 2

We know that at a location (site/locus) there is a SNP.We know whether the SNP is C ↔ T and A↔ G.

We assign a SNP a 0-2 value in the following way:C,C 7→ 0 T ,T 7→ 1 C,T 7→ 2 T ,C 7→ 2A,A 7→ 0 G,G 7→ 1 A,G 7→ 2 G,A 7→ 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 7 / 20

Haplotype Inference Introduction

Haplotype InferenceSingle Nucleotide Polymorphism (SNP)

Each person has two haplotypes (from the mother and from the father)for most regions of the genome:

G A A T C T T C G T A C T G A G TG A A T C T T C G T A C T G A A TLet us focus on the SNPs:

A C T GA C T A0 0 1 2

We know that at a location (site/locus) there is a SNP.We know whether the SNP is C ↔ T and A↔ G.We assign a SNP a 0-2 value in the following way:C,C 7→ 0 T ,T 7→ 1 C,T 7→ 2 T ,C 7→ 2A,A 7→ 0 G,G 7→ 1 A,G 7→ 2 G,A 7→ 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 7 / 20

Haplotype Inference Introduction

Haplotype Inference

A string of {0,1,2}∗ is called a genotypeA string of {0,1}∗ is called a haplotypeTwo equal length haplotypes generate a unique genotype

E.g., 0010,0101⇒ 0222If we have a genotype, we can only conjecture haplotypes thatgenerated it(observe that, e.g., 0110,0001⇒ 0222)Biological experiments allow us to know genotypes!Investigating sets of genotypes for a population we canunderstand the relationships between SNPs and physical featuresas well as medical informationSince genotypes are introduced in evolution, is is reasonable tofind minimal sets of haplotypes explaining the known genotypes.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 8 / 20

Haplotype Inference Introduction

Haplotype Inference

A string of {0,1,2}∗ is called a genotypeA string of {0,1}∗ is called a haplotypeTwo equal length haplotypes generate a unique genotypeE.g., 0010,0101⇒ 0222

If we have a genotype, we can only conjecture haplotypes thatgenerated it(observe that, e.g., 0110,0001⇒ 0222)Biological experiments allow us to know genotypes!Investigating sets of genotypes for a population we canunderstand the relationships between SNPs and physical featuresas well as medical informationSince genotypes are introduced in evolution, is is reasonable tofind minimal sets of haplotypes explaining the known genotypes.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 8 / 20

Haplotype Inference Introduction

Haplotype Inference

A string of {0,1,2}∗ is called a genotypeA string of {0,1}∗ is called a haplotypeTwo equal length haplotypes generate a unique genotypeE.g., 0010,0101⇒ 0222If we have a genotype, we can only conjecture haplotypes thatgenerated it

(observe that, e.g., 0110,0001⇒ 0222)Biological experiments allow us to know genotypes!Investigating sets of genotypes for a population we canunderstand the relationships between SNPs and physical featuresas well as medical informationSince genotypes are introduced in evolution, is is reasonable tofind minimal sets of haplotypes explaining the known genotypes.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 8 / 20

Haplotype Inference Introduction

Haplotype Inference

A string of {0,1,2}∗ is called a genotypeA string of {0,1}∗ is called a haplotypeTwo equal length haplotypes generate a unique genotypeE.g., 0010,0101⇒ 0222If we have a genotype, we can only conjecture haplotypes thatgenerated it(observe that, e.g., 0110,0001⇒ 0222)

Biological experiments allow us to know genotypes!Investigating sets of genotypes for a population we canunderstand the relationships between SNPs and physical featuresas well as medical informationSince genotypes are introduced in evolution, is is reasonable tofind minimal sets of haplotypes explaining the known genotypes.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 8 / 20

Haplotype Inference Introduction

Haplotype Inference

A string of {0,1,2}∗ is called a genotypeA string of {0,1}∗ is called a haplotypeTwo equal length haplotypes generate a unique genotypeE.g., 0010,0101⇒ 0222If we have a genotype, we can only conjecture haplotypes thatgenerated it(observe that, e.g., 0110,0001⇒ 0222)Biological experiments allow us to know genotypes!Investigating sets of genotypes for a population we canunderstand the relationships between SNPs and physical featuresas well as medical informationSince genotypes are introduced in evolution, is is reasonable tofind minimal sets of haplotypes explaining the known genotypes.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 8 / 20

Haplotype Inference Introduction

Haplotype Inference

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 9 / 20

Haplotype Inference Introduction

Haplotype Inference

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 9 / 20

Haplotype Inference Introduction

Haplotype Inference

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 9 / 20

Haplotype Inference Definition

Haplotype Inference

Let H = {0,1}∗ be the set of haplotypes andG = {0,1,2}∗ be the set of genotypes.Given h1,h2 ∈ H and g ∈ G, {h1,h2} explains g if and only if|h1| = |h2| = |g| (let’s say n = |g|) and ∀i ∈ [1..n]:

g[i] ≤ 1 −→ h1[i] = h2[i] = g[i]g[i] = 2 −→ h1[i] 6= h2[i]

A set of haplotypes H ⊆ H explains a set of genotypes G ⊆ G if forall g ∈ G there are h1,h2 ∈ H such that {h1,h2} explains g.Given a set of genotypes G ⊆ G and an integer k , the haplotypeinference problem (HIP) by pure parsimony is the problem offinding a set H ⊆ H that explains G and such that |H| = k(decision version).

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 10 / 20

Haplotype Inference Definition

Haplotype InferenceThe ILP modeling

0, 1, 2 are arbitrary valuesLet us swap the roles of 1 and 2 in genotypes, namely 1 is used ofmismatch (and 2 stands for 1)Given h1,h2 ∈ H and g ∈ G, {h1,h2} explains g if and only if|h1| = |h2| = |g| = n and ∀i ∈ [1..n]:

g[i] = 0 −→ h1[i] = h2[i] = 0g[i] = 2 −→ h1[i] = h2[i] = 1g[i] = 1 −→ {h1[i],h2[i]} = {0,1}

Therefore g[i] = h1[i] + h2[i]This simplifies an ILP encoding. Experimentally it does notimprove CP speed-up. Just forget it in this school.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 11 / 20

Haplotype Inference Definition

Haplotype InferenceThe ILP modeling

0, 1, 2 are arbitrary valuesLet us swap the roles of 1 and 2 in genotypes, namely 1 is used ofmismatch (and 2 stands for 1)Given h1,h2 ∈ H and g ∈ G, {h1,h2} explains g if and only if|h1| = |h2| = |g| = n and ∀i ∈ [1..n]:

g[i] = 0 −→ h1[i] = h2[i] = 0g[i] = 2 −→ h1[i] = h2[i] = 1g[i] = 1 −→ {h1[i],h2[i]} = {0,1}

Therefore g[i] = h1[i] + h2[i]This simplifies an ILP encoding. Experimentally it does notimprove CP speed-up. Just forget it in this school.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 11 / 20

Haplotype Inference Definition

Haplotype Inference by Pure Parsimony

Use of such a parsimony criterion is consistent with the factthat the number of distinct haplotypes observed in mostnatural populations is vastly smaller than the number ofpossible haplotypes; this is expected given the plausibleassumptions that the mutation rate at each site is small andrecombinations rate are low.

[Gusfield and Orzack, 2006]

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 12 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 13 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover of cardinality 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 14 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

1 2 3 4 51 0 1 1 1 1 0

2 1 0 1 1 1 03 1 1 0 1 1 04 1 1 1 0 1 05 1 1 1 1 0 0

(1,2) 2 2 1 1 1 2(1,3) 2 1 2 1 1 2(1,4) 2 1 1 2 1 2(1,5) 2 1 1 1 2 2(2,5) 1 2 1 1 2 2(3,5) 1 1 2 1 2 2(4,5) 1 1 1 2 2 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 15 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

1 2 3 4 51 0 1 1 1 1 02 1 0 1 1 1 03 1 1 0 1 1 04 1 1 1 0 1 05 1 1 1 1 0 0

(1,2) 2 2 1 1 1 2(1,3) 2 1 2 1 1 2(1,4) 2 1 1 2 1 2(1,5) 2 1 1 1 2 2(2,5) 1 2 1 1 2 2(3,5) 1 1 2 1 2 2(4,5) 1 1 1 2 2 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 15 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

1 2 3 4 51 0 1 1 1 1 02 1 0 1 1 1 03 1 1 0 1 1 04 1 1 1 0 1 05 1 1 1 1 0 0

(1,2) 2 2 1 1 1 2

(1,3) 2 1 2 1 1 2(1,4) 2 1 1 2 1 2(1,5) 2 1 1 1 2 2(2,5) 1 2 1 1 2 2(3,5) 1 1 2 1 2 2(4,5) 1 1 1 2 2 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 15 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

1 2 3 4 51 0 1 1 1 1 02 1 0 1 1 1 03 1 1 0 1 1 04 1 1 1 0 1 05 1 1 1 1 0 0

(1,2) 2 2 1 1 1 2(1,3) 2 1 2 1 1 2

(1,4) 2 1 1 2 1 2(1,5) 2 1 1 1 2 2(2,5) 1 2 1 1 2 2(3,5) 1 1 2 1 2 2(4,5) 1 1 1 2 2 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 15 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

1 2 3 4 51 0 1 1 1 1 02 1 0 1 1 1 03 1 1 0 1 1 04 1 1 1 0 1 05 1 1 1 1 0 0

(1,2) 2 2 1 1 1 2(1,3) 2 1 2 1 1 2(1,4) 2 1 1 2 1 2(1,5) 2 1 1 1 2 2(2,5) 1 2 1 1 2 2(3,5) 1 1 2 1 2 2(4,5) 1 1 1 2 2 2

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 15 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover for G = 〈N,E〉 of cardinality k ⇒ |H| = |N|+ k .

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 16 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover for G = 〈N,E〉 of cardinality k ⇒ |H| = |N|+ k .

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 16 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover for G = 〈N,E〉 of cardinality k ⇒ |H| = |N|+ k .

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 16 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover for G = 〈N,E〉 of cardinality k ⇒ |H| = |N|+ k .

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 16 / 20

Haplotype Inference Complexity

Haplotype Inference by Pure ParsimonyNP-completeness (sketch — see [LPR04])

Vertex cover for G = 〈N,E〉 of cardinality k ⇒ |H| = |N|+ k .

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 16 / 20

Haplotype Inference Encoding

Haplotype Inference1st CP encoding

Let us focus on the decisional version: Is there an explanation forG with k haplotypes?Generate k vectors of 0-1 FD variables H1, . . . ,Hk of length nAdd a lexicographical constraint on H1, . . . ,Hk .Build a constraint of the form:

∀Gi ∈ G ∃Hi1∃Hi2 s.t. 〈Hi1 ,Hi2〉 explain Gi

Basically, for each i , i1, i2 we have a flag F ii1,i2

true iff

n∧j=1

(Gi [j] ≤ 1→ (Hi1 [j] = Hi2 [j] = Gi [j])∧Gi [j] = 2→ (Hi1 [j] 6= Hi2 [j])

)

Then forall i ∈ [1..|G|]:∑

i1,i2 F ii1,i2≥ 1

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 17 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

Let us focus on the decisional version: Is there an explanation forG with k haplotypes?Generate m = 2|G| vectors of 0-1 FD variables H1, . . . ,Hm oflength nAdd a lexicographical constraint on pairs(H1,H2), (H3,H4), . . . , (Hm−1,Hm) (we can have repetitions now!)Build a constraint of the form:

(∀Gi ∈ G) (〈H2i−1,H2i〉 explain G)

Namely, again,n∧

j=1

(Gi [j] ≤ 1→ (H2i1 [j] = Hi2 [j] = G2i [j])∧Gi [j] = 2→ (H2i1 [j] 6= H2i [j])

)We need to state (using constraints!) that |{H1, . . . ,Hm}| = k .

This is a good constraint exercise.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 18 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

Let us focus on the decisional version: Is there an explanation forG with k haplotypes?Generate m = 2|G| vectors of 0-1 FD variables H1, . . . ,Hm oflength nAdd a lexicographical constraint on pairs(H1,H2), (H3,H4), . . . , (Hm−1,Hm) (we can have repetitions now!)Build a constraint of the form:

(∀Gi ∈ G) (〈H2i−1,H2i〉 explain G)

Namely, again,n∧

j=1

(Gi [j] ≤ 1→ (H2i1 [j] = Hi2 [j] = G2i [j])∧Gi [j] = 2→ (H2i1 [j] 6= H2i [j])

)We need to state (using constraints!) that |{H1, . . . ,Hm}| = k .This is a good constraint exercise.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 18 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

For a,b ∈ [1..m] we set Fa,b ↔∧n

i=1 Ha[i] = Hb[i].Namely Fa,b is a Boolean variable that is true iff Ha and Hb will beequal in the solution

Then define Ma ↔∨m

b=a+1 Fa,b

Ma is again a Boolean variable that is true if and only if there isanother vector in Ha+1,Ha+2, . . . ,Hm equal to Ha

The size of H can be therefore expressed as∑n

a=1(1−Ma)(viewing Boolean truth values as 0/1)What is the best encoding? Try codes clp_direct.pl andclp_second.pl (0, 1, and 2 have the original meaning here).Search space: O(2nk ), with k < 2|G| (1), O(22n|G|)) (2).Call goals like :- input(1,Gs),haplo_decision(Gs,5). or:- haplo_decision([[0,1,2],[0,2,1]],3).

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 19 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

For a,b ∈ [1..m] we set Fa,b ↔∧n

i=1 Ha[i] = Hb[i].Namely Fa,b is a Boolean variable that is true iff Ha and Hb will beequal in the solutionThen define Ma ↔

∨mb=a+1 Fa,b

Ma is again a Boolean variable that is true if and only if there isanother vector in Ha+1,Ha+2, . . . ,Hm equal to Ha

The size of H can be therefore expressed as∑n

a=1(1−Ma)(viewing Boolean truth values as 0/1)What is the best encoding? Try codes clp_direct.pl andclp_second.pl (0, 1, and 2 have the original meaning here).Search space: O(2nk ), with k < 2|G| (1), O(22n|G|)) (2).Call goals like :- input(1,Gs),haplo_decision(Gs,5). or:- haplo_decision([[0,1,2],[0,2,1]],3).

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 19 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

For a,b ∈ [1..m] we set Fa,b ↔∧n

i=1 Ha[i] = Hb[i].Namely Fa,b is a Boolean variable that is true iff Ha and Hb will beequal in the solutionThen define Ma ↔

∨mb=a+1 Fa,b

Ma is again a Boolean variable that is true if and only if there isanother vector in Ha+1,Ha+2, . . . ,Hm equal to Ha

The size of H can be therefore expressed as∑n

a=1(1−Ma)(viewing Boolean truth values as 0/1)

What is the best encoding? Try codes clp_direct.pl andclp_second.pl (0, 1, and 2 have the original meaning here).Search space: O(2nk ), with k < 2|G| (1), O(22n|G|)) (2).Call goals like :- input(1,Gs),haplo_decision(Gs,5). or:- haplo_decision([[0,1,2],[0,2,1]],3).

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 19 / 20

Haplotype Inference Encoding

Haplotype Inference2nd CP encoding

For a,b ∈ [1..m] we set Fa,b ↔∧n

i=1 Ha[i] = Hb[i].Namely Fa,b is a Boolean variable that is true iff Ha and Hb will beequal in the solutionThen define Ma ↔

∨mb=a+1 Fa,b

Ma is again a Boolean variable that is true if and only if there isanother vector in Ha+1,Ha+2, . . . ,Hm equal to Ha

The size of H can be therefore expressed as∑n

a=1(1−Ma)(viewing Boolean truth values as 0/1)What is the best encoding? Try codes clp_direct.pl andclp_second.pl (0, 1, and 2 have the original meaning here).Search space: O(2nk ), with k < 2|G| (1), O(22n|G|)) (2).Call goals like :- input(1,Gs),haplo_decision(Gs,5). or:- haplo_decision([[0,1,2],[0,2,1]],3).

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 19 / 20

Haplotype Inference References

Haplotype InferenceSome References

Gusfield and Orzack. Haplotype Inference (Survey, and ILPformulations) In CRC Handbook on Bioinformatics, 2006Lancia, Pinotti, Rizzi. [LPR04] Haplotyping Populations by PureParsimony: Complexity of Exact and Approximation Algorithms.INFORMS Journal on Computing 16(4):348–359, 2004.Graça, Marques-Silva, Lynce, Oliveira. Several works onSAT-based and specialized 0-1 ILP for Haplotype Inference. (e.g.WCB 08, WCB 09)Di Gaspero, Roli. Stochastic local search for large-scale instancesof the haplotype inference problem by pure parsimony. J.Algorithms 63(1-3): 55-69 (2008) (also in WCB08).Erdem, Erdem, Türe. HAPLO-ASP: Haplotype Inference UsingAnswer Set Programming. LPNMR 2009: 573–578James Cussens Maximum likelihood pedigree reconstructionusing integer programming. WCB 10.

Agostino Dovier (DIMI, UDINE Univ.) CP and Biology Wrocław, September 2012 20 / 20