Optimization Problems for Polymorphisms of Single Nucleotides

Polymorphisms

A polymorphism is a feature that is
- common to everybody
- not identical in everybody
- limited to just a few possible variants (alleles)

E.g. think of eye color

Or blood type, for a feature not visible from outside

At the DNA level, a polymorphism is a sequence of nucleotides varying in a population.

The shortest possible sequence has only 1 nucleotide, hence

Single Nucleotide Polymorphism (SNP)

Example: the same region on 14 chromosomes. The sequences differ only at position 6 (a or c) and at position 26 (g or t).

    atcggattagttagggcacaggacggac
    atcggattagttagggcacaggacgtac
    atcggcttagttagggcacaggacgtac
    atcggattagttagggcacaggacggac
    atcggcttagttagggcacaggacgtac
    atcggcttagttagggcacaggacggac
    atcggattagttagggcacaggacgtac
    atcggattagttagggcacaggacgtac
    atcggattagttagggcacaggacggac
    atcggcttagttagggcacaggacggac
    atcggattagttagggcacaggacggac
    atcggcttagttagggcacaggacggac
    atcggattagttagggcacaggacggac
    atcggattagttagggcacaggacggac

- SNPs are the predominant form of human variation

- On average, one every 1,000 bases

- Used in drug design, disease studies, forensics, evolutionary studies, ...

- Multimillion-dollar SNP Consortium project

- Goal: associate SNPs (or groups of SNPs) with genetic diseases

- 1st step: build maps of several thousand SNPs

HOMOZYGOUS: same allele on both chromosomes

HETEROZYGOUS: different alleles

HAPLOTYPE: chromosome content at SNP sites

Keeping only the two SNP sites, the 14 chromosomes shown earlier reduce to seven pairs of haplotypes, one pair per individual:

    ag at
    ct ag
    ct cg
    at at
    ag cg
    ag cg
    ag ag

HOMOZYGOUS: same allele on both chromosomes

HETEROZYGOUS: different alleles

HAPLOTYPE: chromosome content at SNP sites

GENOTYPE: "union" of 2 haplotypes

    haplotypes    genotype
    ag at         OaE
    ct ag         EE
    ct cg         OcE
    at at         OaOt
    ag cg         EOg
    ag cg         EOg
    ag ag         OaOg

(O followed by an allele marks a homozygous site; E marks a heterozygous site.)

CHANGE OF SYMBOLS: each SNP takes only two values in a population (a biological fact).

Call them 1 and O. Also, write * at a site that is heterozygous.

HAPLOTYPE: string over {1, O}
GENOTYPE: string over {1, O, *}

    haplotypes    genotype
    1O 11         1*
    O1 1O         **
    O1 OO         O*
    11 11         11
    1O OO         *O
    1O OO         *O
    1O 1O         1O
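
The "union" of two haplotypes can be sketched in a few lines of Python. This is only an illustration: the function name `genotype` is hypothetical, and the encoding used for the seven individuals (a -> 1, c -> O at the first SNP; t -> 1, g -> O at the second) is an assumption read off the example, not something the slides state.

```python
def genotype(h1: str, h2: str) -> str:
    """Combine two haplotypes over {1, O} into a genotype over {1, O, *}."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    # A site keeps its symbol when homozygous and becomes * when heterozygous.
    return "".join(a if a == b else "*" for a, b in zip(h1, h2))


# Haplotype pairs of the seven individuals under the assumed encoding.
pairs = [("1O", "11"), ("O1", "1O"), ("O1", "OO"), ("11", "11"),
         ("1O", "OO"), ("1O", "OO"), ("1O", "1O")]
for h1, h2 in pairs:
    print(h1, h2, "->", genotype(h1, h2))
```

Going in the other direction (from a genotype back to its two haplotypes) is ambiguous as soon as two or more sites are heterozygous, which is what makes haplotyping a problem.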

THE HAPLOTYPING PROBLEM

Single Individual: Given genomic data of one individual, determine 2 haplotypes (one per chromosome)

Population: Given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome per individual)

For the individual problem, the input is erroneous haplotype data, from sequencing

For the population problem, the input is ambiguous genotype data, from screening

The objective is led by Occam's razor: find a minimum explanation of the observed data under a given hypothesis (a.k.a. the parsimony principle)

Theory and Results

Single individual

- Polynomial algorithms for gapless haplotyping (L, Bafna, Istrail, Lippert, Schwartz 01 & Bafna, L, Istrail, Rizzi 02)

- Polynomial algorithms for bounded-length gapped haplotyping (BLIR 02)

- NP-hardness for general gapped haplotyping (LBILS 01)

Population

- APX-hardness (Gusfield 00)

- Reduction to graph-theoretic model and I.P. approach (Gusfield 01)

- New formulations and Disease Detection (L, Ravi, Rizzi 02)

- Exact algorithms for min-size solution (L, Serafini 2011)

- Heuristics (Tininini, L, Bertolazzi 2010)

The Single-Individual Haplotyping Problem

Shotgun Assembly of a Chromosome [Weber and Myers, 1997]

[Figure: copies of the chromosome ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT go through fragmentation into short pieces (TGAGCCTAG, GATTT, GCCTAG, CTATCTT, ATAGATA, GAGATTTCTAGAAATC, ACTGA, TAGAGATTTC, TCCTAAAGAT, CGCATAGATA), then sequencing, then assembly back into the full sequence]

MAIN ERROR SOURCES

- Sequencing errors:

    ACTGCCTGGCCAATGGAACGGACAAG
         CTGGCCAAT
              CATTGGAAC
               AATGGAACGGA

- Contaminants

Given errors, the data may be inconsistent with exactly 2 haplotypes

Hence, the assembler is unable to build 2 chromosomes

PROBLEM: Find and remove the errors so that the data becomes consistent with exactly 2 haplotypes

The data: a SNP matrix

[Figure: sequenced fragments aligned along the chromosome; the alleles read at the SNP sites are encoded as 1 or O, e.g. 1 1 O O O 1 1 1 1 1 O]

SNPs 1,..,n (columns), fragments 1,..,m (rows). A "-" means that the fragment does not cover that SNP.

        1 2 3 4 5 6 7 8 9
    1   - - - O X X O O -
    2   - O - O X - - - X
    3   X X O X X - - - -
    4   O O X - - - - O -
    5   - - - - - - - X O
    6   - - - - O O O X -

Fragment conflict: two fragments that differ at a SNP they both cover can't be on the same haplotype

Fragment Conflict Graph GF(M): one node per fragment (here 1,..,6), one edge per pair of conflicting fragments

We have 2 haplotypes iff GF is BIPARTITE

PROBLEM (Fragment Removal): remove fragments to make GF bipartite

Example: after removing fragment 6, the remaining fragments split into two conflict-free groups, each merging into one haplotype.

        1 2 3 4 5 6 7 8 9
    1   - - - O X X O O -
    2   - O - O X - - - X
    4   O O X - - - - O -
        O O X O X X O O X

    3   X X O X X - - - -
    5   - - - - - - - X O
        X X O X X - - X O
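
The fragment conflict graph and the bipartiteness test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`conflict`, `conflict_graph`, `two_colour`, `merge`) are hypothetical, and the matrix is the 6 x 9 example from the slides with "-" for an uncovered SNP.

```python
from collections import deque
from itertools import combinations

# SNP matrix from the slides: one string per fragment, "-" = SNP not covered.
M = {
    1: "---OXXOO-",
    2: "-O-OX---X",
    3: "XXOXX----",
    4: "OOX----O-",
    5: "-------XO",
    6: "----OOOX-",
}


def conflict(r, s):
    """Two fragments conflict if they differ at a SNP that both cover."""
    return any(a != b and "-" not in (a, b) for a, b in zip(r, s))


def conflict_graph(rows):
    """Adjacency sets of the fragment conflict graph GF(M)."""
    adj = {f: set() for f in rows}
    for f, g in combinations(rows, 2):
        if conflict(rows[f], rows[g]):
            adj[f].add(g)
            adj[g].add(f)
    return adj


def two_colour(adj):
    """Return a 2-colouring {node: 0 or 1}, or None if the graph is not bipartite."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # an odd cycle exists
    return colour


def merge(rows):
    """Consensus haplotype of mutually compatible fragments."""
    return "".join(next((c for c in col if c != "-"), "-") for col in zip(*rows))


print(two_colour(conflict_graph(M)))           # None: GF(M) is not bipartite
kept = {f: r for f, r in M.items() if f != 6}  # remove fragment 6
colour = two_colour(conflict_graph(kept))
for side in (0, 1):
    print(merge([kept[f] for f in kept if colour[f] == side]))
```

Breadth-first 2-colouring only decides whether a given set of fragments is consistent with two haplotypes; choosing which fragments to remove is the optimization problem discussed next.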

Removing the fewest fragments is equivalent to finding a maximum induced bipartite subgraph

- NP-complete [Yannakakis, 1978a, 1978b; Lewis, 1978]
- $O(|V| (\log\log|V| / \log|V|)^2)$-approximable [Halldórsson, 1999]
- not $O(|V|^{\epsilon})$-approximable for some $\epsilon > 0$ [Lund and Yannakakis, 1993]

Are there cases of M for which GF(M) is easier?

YES: the gapless M

    ---OXXOO---OXOOX---    1 gap
    ---OXXOOXOXOXOOX---    gapless
    ---OXX--XO----OX---    2 gaps
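
As a small illustration of the three rows above, a gap can be counted as a maximal run of "-" strictly inside a fragment. The function name `count_gaps` is hypothetical, and treating leading and trailing "-" as "outside the fragment" is an assumption consistent with the labels on the slide.

```python
import re


def count_gaps(fragment: str) -> int:
    """Number of maximal runs of "-" strictly inside the fragment.

    Leading and trailing "-" only mark SNPs outside the fragment,
    so they are stripped before counting.
    """
    body = fragment.strip("-")
    return len(re.findall(r"-+", body))


for row in ("---OXXOO---OXOOX---", "---OXXOOXOXOXOOX---", "---OXX--XO----OX---"):
    print(row, count_gaps(row))
```

A matrix is gapless when every row has zero gaps in this sense.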

Why gaps?

- Sequencing errors (bases with low confidence are not called)

    ---OOXX?XX--- ===> ---OOXX-XX---

- Celera's mate pairs (both ends of a fragment are read, the middle is not)

    [Figure: the sequence attcgttgtagtggtagcctaaatgtcggtagaccttga, shown twice to illustrate a mate pair]

THEOREM

For a gapless M, the Min Fragment Removal Problem is polynomial

NOTE: M does not need to be gapless. It is enough that its columns can be reordered to make it gapless (Consecutive Ones Property, Booth and Lueker, 1976)

An $O(nm + n^3)$ D.P. algorithm

    1   - O O X X O O - -
    2   - - X O X X O - -
    3   - - - X X O - - -
    4   - - - - O O X O -
    5   - - - - - X O X O

LFT(i), RGT(i): leftmost and rightmost SNP covered by fragment i

Sort the fragments according to LFT

D(i; h,k) := min cost to solve up to row i, with k, h not removed and put in different haplotypes, and maximizing RGT(k), RGT(h)

$$
D(i;h,k) =
\begin{cases}
D(i-1;h,k) & \text{if ($i,k$ compatible and $RGT(i) \le RGT(k)$) or ($i,h$ compatible and $RGT(i) \le RGT(h)$)} \\
1 + D(i-1;h,k) & \text{otherwise}
\end{cases}
$$

OPT is $\min_{h,k} D(n; h, k)$ and can be found in time $O(nm + n^3)$, where n here is the number of fragments (rows)

WITH GAPS...

Th: NP-hard if 2 gaps per fragment
proof: (simple) use the fact that for every G there is M s.t. G = GF(M), and reduce from Max Bipartite Induced Subgraph on 3-regular graphs

Th: NP-hard even if 1 gap per fragment
proof: technical; reduction from MAX2SAT

But gaps must be long for the problem to be difficult.

We have an $O(2^{2L} mn + 2^{3L} n^3)$ D.P. for MFR on a matrix with total gap length L

What for MFR with gaps? Why not ILP...

$$
\min \sum_{f} x_f
\quad \text{s.t.} \quad
\sum_{f \in C} x_f \ge 1 \ \text{ for all odd cycles } C \in \mathcal{C},
\qquad x \in \{0,1\}^n
$$

[Figure: example graph on nodes 1,..,5 with fractional values 1/2, 1/3, 1/4, 1/2, 0 and, in a later step, 5/12 and 5/12]

Randomized rounding heuristic: round and repeat. Worked well at Celera

Fragment removal is good for getting rid of contaminants.

However, we may want to keep all fragments and correct errors otherwise.

A dual point of view is to disregard some SNPs and keep the largest subset sufficient to reconstruct the haplotypes.

All fragments get assigned to one of the two haplotypes. We describe the min SNP removal problem: remove the fewest columns from M so that the fragment graph becomes bipartite.

SNP conflicts

        1 2 3 4 5 6 7 8 9
        - - - O X X O O -
        - O X O X - - - X
        X X O X X - - - -
        O O X - - - O O -
        - - - - - - X X O
        - - - - O O O X -

Two SNPs (columns) are in CONFLICT if some two fragments that cover both of them show three entries of one symbol and one of the other in the resulting 2x2 submatrix; otherwise the pair is OK.

SNP conflict graph GS(M): 1 node for each SNP (column, here 1,..,9), edge between conflicting SNPs
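
A sketch of how GS(M) could be computed. The slides show conflicts only by example, so the definition coded here (an odd number of O's in the 2x2 submatrix of two fragments covering both SNPs) is an assumption taken from the usual formulation of the problem; the names `snp_conflict` and `snp_conflict_graph` are hypothetical.

```python
from itertools import combinations


def snp_conflict(rows, i, j):
    """Columns i and j (0-based) conflict if two rows covering both give a
    2x2 submatrix with an odd number of O's (3 of one symbol, 1 of the other)."""
    covering = [r[i] + r[j] for r in rows if "-" not in (r[i], r[j])]
    return any((u + v).count("O") % 2 == 1 for u, v in combinations(covering, 2))


def snp_conflict_graph(rows):
    """Edge set of GS(M), with SNPs numbered from 1."""
    n = len(rows[0])
    return {(i + 1, j + 1) for i, j in combinations(range(n), 2)
            if snp_conflict(rows, i, j)}


# The 6 x 9 matrix used for the SNP-conflict slides.
M = ["---OXXOO-", "-OXOX---X", "XXOXX----", "OOX---OO-", "------XXO", "----OOOX-"]
edges = snp_conflict_graph(M)
print(sorted(edges))

# A small matrix with gaps: no two rows share two columns, so GS has no edges.
gapped = ["O-O", "-OX", "XX-"]
print(snp_conflict_graph(gapped))
```

With the edge set in hand, min SNP removal amounts to keeping a largest set of pairwise non-conflicting columns, at least in the gapless case covered by the theorems that follow.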

THEOREM 1

For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set

THEOREM 2

For a gapless M, GS(M) is a perfect graph

COROLLARY

For a gapless M, the min SNP removal problem is polynomial

THEOREM 1
For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set

PROOF (sketch): by minimal counterexample

Assume M gapless, GS(M) an independent set, but GF(M) not bipartite.

Take an odd cycle in GF:

    --OOXXOO---------
    ----OOXOOXOXXO---
    --------XXOXOXXX-
    ----XXOOXOXXO----
    -------XOOOX-----
    ------XXXXXO-----
    --XXOXXOXOO------

There is a generic structure of a horizontal-vertical cycle, with the conflicts between consecutive fragments lying on "vertical lines":

    --O?X???---------
    ----O????????O---
    --------??O??X??-
    ----??????X??----
    -------???O?-----
    ------????X?-----
    --X???????O------

- There cannot be only one vertical line in an odd cycle
- Some entries are forced (e.g. one "?" must be X), so we can merge the rightmost vertical line with the next one, reducing their number by 1
- After the merge we still have a counterexample!
- Hence, there cannot be a minimal (in number of vertical lines) counterexample

Note: the theorem is not true if there are gaps. For the matrix M

        1 2 3
    1   O - O
    2   - O X
    3   X X -

GF(M) is a triangle on fragments 1, 2, 3 (not bipartite), while GS(M) on SNPs 1, 2, 3 has no edges (an independent set).

THEOREM 2
For a gapless M, GS(M) is a perfect graph

PROOF: GS(M) is the complement of a comparability graph A

Comparability graphs are perfect, and so are their complements

Comparability graphs: undirected graphs whose edges can be oriented to form a partial order

LEMMA: If i < j < k and (i,k) is a SNP conflict, then either (i,j) or (j,k) is also a SNP conflict

[Figure: two fragments covering columns i < j < k, with columns i and k in conflict and the two entries of column j unknown (?)]

- If the two entries of column j are equal (e.g. O O), j conflicts with i
- If they are different (e.g. O X), j conflicts with k

I.e. if (i,j) is not a conflict and (j,k) is not a conflict, then (i,k) is not a conflict either

So the graph A, with an edge (u,v) whenever u < v and u does not conflict with v, is a comparability graph, and GS is the complement of A

NOTE: max independent set on a perfect graph is in P (Grötschel, Lovász, Schrijver, 84)

Hence gapless MSR is polynomial (max stable set on a perfect graph).

There are better D.P. algorithms, $O(mn + m^2)$

What if gaps?

THEOREM: The min SNP removal problem is NP-hard if there can be gaps (reduction from MAXCUT)

Again, gaps must be long for the problem to be difficult.

We have an $O(mn^{2L+1} + n^{2L+2})$ D.P. for MSR on a matrix with total gap length L

The Population Haplotyping Problem

The input is GENOTYPE data

INPUT: G = { xx??x, ????x, xxoxx, ?x??x, oooxx }

OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }

Each genotype is explained by two haplotypes

    genotype    haplotypes
    xx??x       xxoxx  xxxox
    ????x       oooxx  xxxox
    xxoxx       xxoxx  xxoxx
    ?x??x       xxoxx  oxxox
    oooxx       oooxx  oooxx

We will define some objectives for H
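
That a set H explains every genotype in G can be checked by brute force over pairs of haplotypes. The sketch below is only illustrative; the names `genotype` and `explain` are hypothetical, and "?" marks a heterozygous site as in the slide.

```python
from itertools import combinations_with_replacement


def genotype(h1, h2):
    """Genotype of two haplotypes over {x, o}: "?" marks heterozygous sites."""
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))


def explain(G, H):
    """Map each genotype to one pair of haplotypes from H that yields it (or None)."""
    found = {g: None for g in G}
    for h1, h2 in combinations_with_replacement(sorted(H), 2):
        g = genotype(h1, h2)
        if g in found and found[g] is None:
            found[g] = (h1, h2)
    return found


G = ["xx??x", "????x", "xxoxx", "?x??x", "oooxx"]
H = {"xxoxx", "xxxox", "oooxx", "oxxox"}
for g, pair in explain(G, H).items():
    print(g, "<-", pair)
```

Dropping any haplotype from H can leave a genotype unexplained; for instance, without oxxox no pair yields ?x??x.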

1st Objective (parsimony; open research problem):

minimize |H|

An easy SQRT(n) approximation: k haplotypes can explain at most k(k-1)/2 ambiguous genotypes; hence, to explain n genotypes, we need at least LB = SQRT(n) haplotypes.

BUT a trivial greedy algorithm can find 2 haplotypes to explain each genotype, giving a solution of <= 2n haplotypes, i.e. <= 2 SQRT(n) * LB

It's difficult, but not impossible, to come up with better approximations, such as constant-factor ones (Lancia, Pinotti, Rizzi '02)

2nd Objective: based on an inference rule
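
The counting argument behind the SQRT(n) approximation can be written out as follows. This is a sketch under the assumption that n counts distinct ambiguous genotypes and k = |H|; the symbols H_greedy and H_opt are introduced here only for illustration.

```latex
\begin{align*}
n \le \binom{k}{2} = \frac{k(k-1)}{2} < \frac{k^2}{2}
  &\;\Longrightarrow\; k > \sqrt{2n} \ge \sqrt{n} = LB, \\
|H_{\text{greedy}}| \le 2n = 2\sqrt{n}\cdot\sqrt{n}
  &\;\Longrightarrow\; |H_{\text{greedy}}| \le 2\sqrt{n}\cdot LB \le 2\sqrt{n}\cdot |H_{\text{opt}}|.
\end{align*}
```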

Inference Rule

      xoxxooxoxx    known haplotype h
    + xxoxooxxxo    new (derived) haplotype h'
    = x??xoox?x?    known (ambiguous) genotype g

We write h + h' = g

g and h must be compatible to derive h'

2nd Objective (Clark, 1990)

1. Start with H = non-ambiguous genotypes
2. while there exists an ambiguous genotype g in G
3.    take h in H compatible with g and let h + h' = g
4.    set H = H + {h'} and G = G - {g}
5. end while

If, at the end, G is empty, SUCCESS; otherwise FAILURE

Step 3 is non-deterministic

Example: G = { oooo, xooo, ??oo, xx?? }, so initially H = { oooo, xooo }

- Resolve ??oo with oooo, deriving xxoo; then resolve xx?? with xxoo, deriving xxxx: SUCCESS
- Resolve ??oo with xooo, deriving oxoo; then nothing in H is compatible with xx??: FAILURE (can't resolve xx??)

OBJ: find the order of application of the rule that leaves the fewest elements in G
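
The inference rule and its order dependence can be sketched as below. This is an illustrative reading of the five steps, not Clark's original code: the names `compatible`, `derive` and `clark` are hypothetical, and the `choose` callback is an assumption introduced to make the non-deterministic step 3 explicit.

```python
def compatible(h, g):
    """h can be one haplotype of g: it agrees with g at every unambiguous site."""
    return all(b == "?" or a == b for a, b in zip(h, g))


def derive(h, g):
    """The haplotype h' with h + h' = g: complement h at the ambiguous sites of g."""
    flip = {"x": "o", "o": "x"}
    return "".join(flip[a] if b == "?" else b for a, b in zip(h, g))


def clark(G, choose):
    """Apply the inference rule until no genotype can be resolved.

    choose(g, candidates) picks the haplotype used in step 3.
    Returns (H, list of unresolved genotypes).
    """
    H = [g for g in G if "?" not in g]
    todo = [g for g in G if "?" in g]
    progress = True
    while todo and progress:
        progress = False
        for g in list(todo):
            candidates = [h for h in H if compatible(h, g)]
            if candidates:
                h2 = derive(choose(g, candidates), g)
                if h2 not in H:
                    H.append(h2)
                todo.remove(g)
                progress = True
    return H, todo


G = ["oooo", "xooo", "??oo", "xx??"]
print(clark(G, lambda g, cands: cands[0]))   # uses oooo for ??oo
print(clark(G, lambda g, cands: cands[-1]))  # uses xooo for ??oo
```

The two calls differ only in which compatible haplotype is picked, yet one resolves every genotype and the other leaves xx?? unresolved, which is why the order of application becomes an optimization problem.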

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

Page 90: Optimization Problems for Polymorphisms of Single Nucleotides

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

x??o?

1. expand genotypes

Page 91: Optimization Problems for Polymorphisms of Single Nucleotides

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

x??o?

xxxox

xxxoo

xxoox

xxooo

xoxox

xooox

xoxoo

xoooo

1. expand genotypes

Page 92: Optimization Problems for Polymorphisms of Single Nucleotides

- Problem is APX-hard (Gusfield,00)

- Graph-Model + Integer Programming for practical solution (G.,01)

1. Expand each ambiguous genotype into its compatible haplotypes, e.g.

x??o? -> xxxox, xxxoo, xxoox, xxooo, xoxox, xooox, xoxoo, xoooo

2. Create an arc (h, h') if there exists a genotype g such that h' can be derived from g and h

3. The largest number of nodes in a forest rooted at the unambiguous genotypes = the largest number of ambiguous genotypes resolved

Hence, find the largest number of nodes in a forest rooted at the unambiguous genotypes. Use an I.P. model with variables x(ij).

This reduction is exponential. Is there a better practical approach?
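The expansion step can be sketched as follows; the function name `expand` is an assumption made for illustration. A genotype with k ambiguous sites expands into 2^k haplotypes, which is where the exponential size of the reduction comes from.

```python
from itertools import product


def expand(genotype):
    """List every haplotype compatible with a genotype ('?' may be 'x' or 'o')."""
    choices = [("x", "o") if c == "?" else (c,) for c in genotype]
    return ["".join(p) for p in product(*choices)]


haplotypes = expand("x??o?")
print(len(haplotypes), haplotypes)
```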

Page 93: Optimization Problems for Polymorphisms of Single Nucleotides

3rd Objective (open research problem)

Disease Detection:

INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }

OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }

Each genotype is resolved by a pair of haplotypes in H:

xx??x = xxoxx + xxxox
????x = oooxx + xxxox
?x??x = xxoxx + oxxox
??oxx = xxoxx + oooxx
oooxx = oooxx + oooxx

H contains H', s.t. each diseased individual has one haplotype in H' and each healthy individual has none

minimize |H|
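As a sanity check on the example, the sketch below verifies that every genotype in G is the combination of two haplotypes from H. The helper names `combine` and `explanations` are assumptions for illustration; the disease-related condition on H' is not modelled here.

```python
def combine(h1, h2):
    """Genotype produced by two haplotypes: equal sites keep the allele, others become '?'."""
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))


def explanations(g, H):
    """All unordered pairs of haplotypes from H that resolve genotype g."""
    return [(h1, h2) for k, h1 in enumerate(H) for h2 in H[k:] if combine(h1, h2) == g]


G = ["xx??x", "????x", "??oxx", "?x??x", "oooxx"]
H = ["xxoxx", "xxxox", "oooxx", "oxxox"]
for g in G:
    print(g, explanations(g, H))
```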

Page 95: Optimization Problems for Polymorphisms of Single Nucleotides

Genome Rearrangements and Evolutionary Distances

Page 96: Optimization Problems for Polymorphisms of Single Nucleotides

Each species has a genome (organized in pairs of chromosomes)

tcgtgatggat………………ttgatggattga

tcgattatggat………………ttttgatatcca

Genomes evolve by means of

• Insertions
• Deletions
• Inversions
• Transpositions
• Translocations

of DNA regions

Page 97: Optimization Problems for Polymorphisms of Single Nucleotides

[Figure: examples of deletion, insertion, translocation, inversion and transposition of DNA regions]

Page 103: Optimization Problems for Polymorphisms of Single Nucleotides

Combinatorial problem: given two permutations P, Q and a set F of operators, find a shortest sequence f1, …, fk of operators such that Q = fk(fk-1(…(f1(P))))

Very difficult problem! We focus on operators all of the same type (e.g. inversions) (…still difficult…)

Wlog we can take Q = (1 2 … n). Hence we talk of sorting by … (inversions, transpositions, …)

We focus on inversions, which are the most important in Nature

Example:

5 6 4 8 3 2 1 9 7
1 2 3 8 4 6 5 9 7
1 2 3 8 4 5 6 9 7
1 2 3 6 5 4 8 9 7
1 2 3 6 5 4 8 7 9
1 2 3 4 5 6 8 7 9
1 2 3 4 5 6 7 8 9

Page 104: Optimization Problems for Polymorphisms of Single Nucleotides

There is also a SIGNED VERSION of the problem! An inversion reverses a segment and flips the sign of each of its elements.

Example:

+5 +6 -4 -8 -3 -2 -1 -9 +7
+1 +2 +3 +8 +4 -6 -5 -9 +7
+1 +2 +3 +8 +4 +5 +6 -9 +7
+1 +2 +3 -6 -5 -4 -8 -9 +7
+1 +2 +3 -6 -5 -4 -8 -7 +9
+1 +2 +3 +4 +5 +6 -8 -7 +9
+1 +2 +3 +4 +5 +6 +7 +8 +9
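The two examples can be replayed with a small sketch of the inversion operator. The function name `invert` and the 0-based interval list are assumptions made for illustration; the same six intervals sort both the unsigned and the signed permutation.

```python
def invert(perm, i, j, signed=False):
    """Reverse perm[i..j] (0-based, inclusive); in the signed version also flip signs."""
    segment = perm[i:j + 1][::-1]
    if signed:
        segment = [-v for v in segment]
    return perm[:i] + segment + perm[j + 1:]


# Intervals read off the example: each one turns a row into the next row.
steps = [(0, 6), (5, 6), (3, 6), (7, 8), (3, 5), (6, 7)]

unsigned = [5, 6, 4, 8, 3, 2, 1, 9, 7]
signed = [5, 6, -4, -8, -3, -2, -1, -9, 7]
for i, j in steps:
    unsigned = invert(unsigned, i, j)
    signed = invert(signed, i, j, signed=True)

print(unsigned)
print(signed)
```

This only replays a given sequence of six inversions; it does not prove that six is the minimum.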

Page 105: Optimization Problems for Polymorphisms of Single Nucleotides

(Unsigned) Sorting by Inversions is NP-hard (longstanding question, settled by Caprara ‘98)

Surprisingly, Signed Sorting by Inversions is Polynomial (beautiful theory, by Hannenhalli and Pevzner)

The complexity of Sorting by Transpositions, e.g., is unknown

Page 106: Optimization Problems for Polymorphisms of Single Nucleotides

The concept of breakpoint

π = 0 5 7 8 2 1 4 3 6 9 10 (the permutation 5 7 8 2 1 4 3 6 9 framed by 0 and n+1 = 10)

Breakpoint at position i if |π(i) - π(i+1)| > 1

d(π) = inversion distance
b(π) = # breakpoints

TRIVIAL BOUND: d(π) >= b(π) / 2

Example: d(π) >= 6 / 2 = 3
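A short sketch of the breakpoint count and the trivial bound for the example permutation. The function name `breakpoints` is an assumption; the permutation is framed with 0 and n+1 as on the slide.

```python
def breakpoints(perm):
    """Number of adjacent pairs, in the framed permutation, that differ by more than 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) > 1)


pi = [5, 7, 8, 2, 1, 4, 3, 6, 9]
b = breakpoints(pi)
trivial_bound = (b + 1) // 2   # an inversion removes at most 2 breakpoints, so d >= ceil(b / 2)
print(b, trivial_bound)
```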

Page 108: Optimization Problems for Polymorphisms of Single Nucleotides

The Breakpoint Graph

[Figure: breakpoint graph on the nodes 0 5 7 8 2 1 4 3 6 9 10]

Each node has degree 0, 2 or 4, hence the graph can be decomposed into cycles!

Alternating cycle decomposition

c(π) = max # cycles in an alternating cycle decomposition

VERY STRONG BOUND: d(π) >= b(π) - c(π)

Example: c(π) = 2 and d(π) >= 6 - 2 = 4

Page 116: Optimization Problems for Polymorphisms of Single Nucleotides

The Breakpoint Graph

The best algorithm for this problem is based on an Integer Programming formulation of the max cycle decomposition

A variable $x_C$ for each alternating cycle $C$ (exponential # of vars…)

A constraint $\sum_{C \ni e} x_C = 1$ for each edge $e$

Objective: maximize $\sum_C x_C$

Page 117: Optimization Problems for Polymorphisms of Single Nucleotides

PRIMAL

$\max \sum_C x_C$
$\sum_{C \ni e} x_C = 1$ for all edges $e$
$x_C \in \{0,1\}$ for all alt. cycles $C$

DUAL

$\min \sum_e y_e$
$\sum_{e \in C} y_e \ge 1$ for all alt. cycles $C$
$y_e \in \mathbb{R}$ for all edges $e$

Page 119: Optimization Problems for Polymorphisms of Single Nucleotides

Pricing out the cycles for which y*(C) < 1

[Figure: breakpoint graph on the nodes 0 5 7 8 2 1 4 3 6 9 10]

Page 120: Optimization Problems for Polymorphisms of Single Nucleotides

Split the graph in two copies

[Figure: two copies of the breakpoint graph]

Page 121: Optimization Problems for Polymorphisms of Single Nucleotides

Connect twins

[Figure: the two copies, with each node joined to its twin]

Page 122: Optimization Problems for Polymorphisms of Single Nucleotides

A perfect matching corresponds to (a set of) alternating cycles

[Figure: a perfect matching in the two-copy graph]

Page 127: Optimization Problems for Polymorphisms of Single Nucleotides

The weight of the matching is the y*-weight of the cycles

[Figure: matching edges labelled with the weights .2, .4, .5, 1, .6, 0]

Page 128: Optimization Problems for Polymorphisms of Single Nucleotides

Forcing a cycle to use a certain node

[Figure: matching edges labelled with the weights .2, .4, .5, 1, .6, 100000]

Page 129: Optimization Problems for Polymorphisms of Single Nucleotides

- These cycles would not use the same node twice, but with a simple trick it is possible to model this (details omitted)

BRANCH&PRICE algorithm by Caprara, Lancia, Ng (1999, 2001)

BRANCH&BOUND combinatorial algorithm by Kececioglu, Sankoff (1996)

KS can solve up to n = 40. It takes days for n = 50

CLN can solve n = 200. It takes a few seconds (say 5) for n = 100

NP-hard problem practically solved to optimality!

Page 130: Optimization Problems for Polymorphisms of Single Nucleotides

Statistical view of evolution

• Genomes evolve by random inversions

• It's like a random walk on a huge graph with a node for each permutation and an edge for each inversion

• It is not clear why the shortest solution should be the one followed by Nature (in fact, often it isn't)

• We want to find the most likely number of inversions that lead from (1 2 … n) to π

• We use the expected number of breakpoints after k inversions as a way to guess the # of inversions

Page 131: Optimization Problems for Polymorphisms of Single Nucleotides

Let B(k) be the (r.v.) number of breakpoints after k random inversions from (1 … n)

Given a π obtained by h random inversions from (1 … n), we want to estimate h

The inversion distance is only a lower bound: h >= d(π), but the gap could be big

We estimate E[B(k)]. Then, faced with some π, we pick h such that E[B(h)] is as close as possible to b(π) (maximum likelihood). CL, 2000, have shown:

$$E[B(k)] = (n-1)\left(1 - \left(\frac{n-3}{n-1}\right)^{k}\right)$$

Question: estimate E[D(k)], the (r.v.) inversion distance after k random inversions
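A sketch of the estimate, assuming the closed form E[B(k)] = (n - 1)(1 - ((n - 3)/(n - 1))^k) stated above. Solving E[B(h)] = b for a real-valued h and rounding is a shortcut chosen here for illustration; the function names are assumptions.

```python
import math


def expected_breakpoints(n, k):
    """E[B(k)] after k random inversions applied to the identity on n elements."""
    return (n - 1) * (1 - ((n - 3) / (n - 1)) ** k)


def estimate_inversions(n, b):
    """Real-valued h with E[B(h)] = b; only defined for b < n - 1."""
    if b >= n - 1:
        raise ValueError("b must be smaller than n - 1")
    return math.log(1 - b / (n - 1)) / math.log((n - 3) / (n - 1))


n = 200
for b in (16, 34, 98, 168):
    print(b, round(estimate_inversions(n, b)))
```

As the number of breakpoints approaches n - 1 the curve flattens, so the estimate becomes very sensitive to small changes in b.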

Page 132: Optimization Problems for Polymorphisms of Single Nucleotides

Example: n = 200, k (u.a.r. in 1…n) inversions

k     k'    d(π)  b(π)
8     8     8     16
19    19    19    34
68    67    67    98
69    73    68    104
73    79    73    109
85    91    83    120
86    85    83    115
87    90    84    119
118   117   109   138
184   184   135   168

Page 133: Optimization Problems for Polymorphisms of Single Nucleotides

Protein Structure Alignments: the Maximum Contact Map Overlap Problem

Page 134: Optimization Problems for Polymorphisms of Single Nucleotides

A Protein is a complex molecule with a primary, linear structure (a sequence of amino acids) and a 3-dimensional structure (the protein fold).

Protein STRUCTURE determines its FUNCTION

For instance, the Drug Design problem calls for constructing peptides with a 3D shape complementary to a protein, so as to dock onto it.

Page 135: Optimization Problems for Polymorphisms of Single Nucleotides

Motivation:

Structure Alignment is Important for:

- Discovery of Protein Function (shape determines function)

- Search in 3D data bases

- Protein Classification and Evolutionary Studies

- ...

Problem: Align two 3D protein structures

Page 136: Optimization Problems for Polymorphisms of Single Nucleotides

Contact Maps

Page 137: Optimization Problems for Polymorphisms of Single Nucleotides

CONTACT MAPS

Unfolded protein

Folded protein = contacts

Contact map = graph

OBJECTIVE: align 3d folds of proteins = align contact maps

Page 141: Optimization Problems for Polymorphisms of Single Nucleotides

Contact Map Alignments

Page 142: Optimization Problems for Polymorphisms of Single Nucleotides

Non-crossing Alignments

Protein 1

Protein 2

non-crossing map of residues in protein 1 and protein 2

Page 143: Optimization Problems for Polymorphisms of Single Nucleotides

The value of an alignment

[Figure: two contact maps and a non-crossing alignment of their residues]

Value = 3 (the number of contacts shared under the alignment)

We want to maximize the value

Page 148: Optimization Problems for Polymorphisms of Single Nucleotides

The value of an alignment

Maximizing it is NP-Hard
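A sketch of how the value of an alignment can be computed, assuming contacts are stored as pairs of residue indices and the alignment as a map from residues of protein 1 to residues of protein 2. The data below is a made-up toy instance, not taken from the slides.

```python
def alignment_value(contacts1, contacts2, alignment):
    """Count contacts of protein 1 whose two endpoints are aligned to a contact of protein 2."""
    contacts2 = {frozenset(c) for c in contacts2}
    value = 0
    for i, p in contacts1:
        if i in alignment and p in alignment:
            if frozenset((alignment[i], alignment[p])) in contacts2:
                value += 1
    return value


# Hypothetical toy instance: residues are numbered from 0 along each chain.
contacts1 = [(0, 2), (1, 3), (2, 4)]
contacts2 = [(0, 2), (1, 4), (2, 5)]
alignment = {0: 0, 1: 1, 2: 2, 3: 4, 4: 5}   # increasing, hence non-crossing

print(alignment_value(contacts1, contacts2, alignment))
```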

Page 149: Optimization Problems for Polymorphisms of Single Nucleotides

Integer Programming Formulation

Page 150: Optimization Problems for Polymorphisms of Single Nucleotides

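Integer Programming Formulation

0-1 VARIABLES

$y_{ef}$ for e and f contacts (e in the first contact map, f in the second)

CONSTRAINTS

$y_{ef} + y_{e'f'} \le 1$ for each pair of sharings (e, f), (e', f') that cannot both occur in a non-crossing alignment

OBJECTIVE

$\max \sum_{e,f} y_{ef}$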

Page 153: Optimization Problems for Polymorphisms of Single Nucleotides

Independent Set Problem

It's just a huge max independent set problem in Gy:

• a node for each sharing
• an edge for each pair of incompatible sharings

[Figure: contacts e, e', e'' and f, f', f'' with the corresponding nodes ef, e'f', e''f'' of Gy]

|Gy|=|E1|*|E2| (approximately 5000 for two proteins with 50 residues and 75 contacts each)

The best exact algorithm for independent set can solve for at most a few hundred nodes

Page 155: Optimization Problems for Polymorphisms of Single Nucleotides

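Node to Node Variables

New variables x provide an easy check for the non-crossing conditions

NEW VARIABLES

$x_{ij}$ for i and j residues (i in the first protein, j in the second)

NEW CONSTRAINTS

$x_{ij} + x_{i'j'} \le 1$ for each pair of crossing lines [i, j], [i', j']

$y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ for each contact (i, p) of the first protein and each contact (j, q) of the second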
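The non-crossing condition on the x variables can be sketched as a pairwise test on lines [i, j]. It is assumed here that two distinct lines sharing an endpoint also count as crossing, since a residue can be aligned to at most one residue; the function names are illustrative.

```python
from itertools import combinations


def crossing(a, b):
    """True if the distinct lines a = (i, j) and b = (i', j') cannot both be in an alignment."""
    (i, j), (ip, jp) = a, b
    if a == b:
        return False
    return not ((i < ip and j < jp) or (ip < i and jp < j))


def is_noncrossing(lines):
    """Check x_ij + x_i'j' <= 1 for every crossing pair among the selected lines."""
    return not any(crossing(a, b) for a, b in combinations(lines, 2))


print(is_noncrossing([(1, 1), (2, 3), (4, 4)]))   # order preserved on both sides
print(is_noncrossing([(1, 3), (2, 2)]))           # the two lines cross
```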

Page 158: Optimization Problems for Polymorphisms of Single Nucleotides

Clique Constraints

Variables x define a graph Gx:

• A node for each line
• An edge between each pair of crossing lines

• Gx is much smaller than Gy
• Gx has nice properties (it's a perfect graph)
• It's easier to find large independent sets in Gx

[Figure: crossing lines [i, j] and [i', j'] and the corresponding adjacent nodes ij, i'j' of Gx]

Page 160: Optimization Problems for Polymorphisms of Single Nucleotides

Clique Constraints

Non-crossing constraints can be extended to

CLIQUE CONSTRAINTS

$\sum_{[i,j] \in M} x_{ij} \le 1$

for all sets M of mutually incompatible (i.e. crossing) lines

All clique constraints satisfied (and Gx perfect) imply a strong bound!

Page 161: Optimization Problems for Polymorphisms of Single Nucleotides

Structure of Maximal Cliques in Gx

1. Pick two subsets of same size

2. Connect them in a zig-zag fashion

3. Throw in all lines included in a zig or a zag

The result is a maximal clique in Gx

Page 175: Optimization Problems for Polymorphisms of Single Nucleotides

Separation of Clique Inequalities

Page 176: Optimization Problems for Polymorphisms of Single Nucleotides

Separation of Clique Inequalities

PROBLEM

There exist exponentially many such cliques (O(2^{2n}) inequalities).

We need to generate in polynomial time a clique inequality when needed, i.e., when violated by the current LP solution x*:

$\sum_{[i,j] \in M} x^*_{ij} > 1$

THEOREM

We can find the most violated clique inequality in time O(n^2)

Page 177: Optimization Problems for Polymorphisms of Single Nucleotides

Separation of Clique Inequalities

PROOF (sketch)

1) Clique = zigzag path

2) Flip one graph: the zigzag becomes a left-to-right path

[Figure: one protein with residues in the order 1 2 3 4 5 6 7 8, the other flipped to 8 7 6 5 4 3 2 1]

3) Define a grid with lengths for arcs so that length(P) = x*(clique(P)). Use Dynamic Programming to find the longest path in the grid, in time O(n^2)

Page 181: Optimization Problems for Polymorphisms of Single Nucleotides

Separation of cliques

Create an n1 x n2 grid
Orient all edges and give weights (x*iu at grid point (i, u))
There is a violated clique iff the longest A,B path has length > 1

A=(1,n2)

B=(n1,1)
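A sketch of the longest-path computation, under assumptions the slides do not spell out: every grid point (i, u) carries the weight x*[i][u], a path from A to B moves one step at a time either to the next i or to the previous u, and lines sharing an endpoint count as crossing. Under these assumptions the points on any such staircase path are pairwise crossing lines, so the path weight is the left-hand side of a clique inequality. Indices are 0-based here, so A = (0, n2-1) and B = (n1-1, 0).

```python
def longest_staircase_path(x):
    """Max total weight of a monotone path from A = (0, n2-1) to B = (n1-1, 0)."""
    n1, n2 = len(x), len(x[0])
    best = [[0.0] * n2 for _ in range(n1)]
    for i in range(n1):
        for u in range(n2 - 1, -1, -1):
            candidates = []
            if i > 0:
                candidates.append(best[i - 1][u])     # arrived by increasing i
            if u < n2 - 1:
                candidates.append(best[i][u + 1])     # arrived by decreasing u
            best[i][u] = x[i][u] + (max(candidates) if candidates else 0.0)
    return best[n1 - 1][0]


# Hypothetical fractional LP solution on a 2 x 2 instance: lines [0,1] and [1,0] cross.
x_star = [[0.0, 0.6],
          [0.7, 0.0]]
length = longest_staircase_path(x_star)
print(length, "violated" if length > 1 + 1e-9 else "not violated")
```

The double loop visits each grid point once, which matches the O(n^2) running time claimed in the theorem.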

Page 184: Optimization Problems for Polymorphisms of Single Nucleotides

Gx is a Perfect Graph

We show why polynomial separation is possible:

Gx is weakly triangulated (no chordless cycles of length >= 5 in Gx or in its complement)

=> Gx is perfect (Hayward, 1985)

Page 185: Optimization Problems for Polymorphisms of Single Nucleotides

Gx is a Perfect Graph

PROOF (Sketch, for Gx)

[Figure: a chordless cycle of lines L1, L2, L3, L4, L5, L6, L7]

L1 and L3 don't cross. Wlog RIGHT(L3, L1)

For i=4,5,… Li crosses Li-1 but not L1 => RIGHT(Li, L1)

We get LEFT(L1, {L3, L4, L5, L6})

A symmetric argument started at L6, with LEFT(L1, L6), implies LEFT(Li, L6) for i=2,3,4,5

Then {L3, L4, L5} are between L1 and L6

But L7 crosses L1 and L6, and so should cross them all!

Page 196: Optimization Problems for Polymorphisms of Single Nucleotides

The approach just seen is due to Lancia, Carr, Istrail, Walenz (2001). It can be applied to small or moderate proteins (up to 80 residues/150 contacts)

In 2002, a new approach by Caprara and Lancia, based on LAGRANGIAN RELAXATION, was introduced. The approach is borrowed from Quadratic Assignment. With the new approach we can solve important proteins (up to 150 residues/300 contacts)

Page 197: Optimization Problems for Polymorphisms of Single Nucleotides

What about Heuristics?

E.g., genetic algorithms…

Page 198: Optimization Problems for Polymorphisms of Single Nucleotides

Genetic Algorithm Overview

• A population of candidate solutions that evolve (improve) over time

• Recombination creates new candidate solutions via crossover and mutation

[Figure: population at time t, evaluation function, recombination operators, population at time t+1]

Page 199: Optimization Problems for Polymorphisms of Single Nucleotides

Crossover

• Crossover selects pieces from both parents and creates two offspring solutions
– Select a set of edges in one parent to copy to the child
– Copy as many edges as possible from the other parent (edges that conflict with existing edges are not copied)
– Add random edges to fill any remaining space

[Figure: blue parent, red parent and offspring]

Page 206: Optimization Problems for Polymorphisms of Single Nucleotides

Mutation

• Mutation introduces small changes to existing solutions by shifting edge endpoints
– Select a set of endpoints to shift (an edge that "falls off" the end of the contact map is removed)
– Randomly add new edges

Page 212: Optimization Problems for Polymorphisms of Single Nucleotides

Computational Results

Page 213: Optimization Problems for Polymorphisms of Single Nucleotides

Computational Results

• 269 proteins
– 70 to 100 residues
– 80 to 140 contacts

• Picked 10,000 pairs of proteins out of 36,046 possible

• Took a weekend on a PC

• 500 were solved to optimality

• 2500 had a gap <= 10 contacts

Page 214: Optimization Problems for Polymorphisms of Single Nucleotides

Skolnick Clustering Test

Page 215: Optimization Problems for Polymorphisms of Single Nucleotides

Skolnick Results

• Four Families
1 Flavodoxin-like fold Che-Y related
2 Plastocyanin
3 TIM Barrel
4 Ferritin

Family  Style       Structures  Residues   Seq. Sim.  RMSD   Proteins
1       alpha-beta  8           up to 124  15-30%     < 3Å   1b00, 1dbw, 1nat, 1ntr, 1qmp, 1rnl, 3cah, 4tmy
2       beta        8           up to 99   35-90%     < 2Å   1baw, 1byo, 1kdi, 1nin, 1pla, 3b3i, 2pcy, 2plt
3       alpha-beta  11          up to 250  30-90%     < 2Å   1amk, 1aw2, 1b9b, 1btm, 1hti, 1tmh, 1tre, 1tri, 1ydv, 3ypi, 8tim
4       alpha       6           up to 170  7-70%      < 4Å   1b71, 1bcf, 1dps, 1fha, 1ier, 1rcd

Page 220: Optimization Problems for Polymorphisms of Single Nucleotides

Clustering

Define score(P1, P2) as

$0 \le \dfrac{\#\text{ shared contacts}}{\min \#\text{ of contacts of } P_1, P_2} \le 1$

Put P1, P2 in same family if score(P1, P2) >= threshold

If P1, P2 are too big, use G.A. and local search to compute the score

L.P. then gives bounds:

HEUR score <= OPT score <= LP bound

and we know how far off OPT we are
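The score and the family test can be sketched as below. The function names are assumptions, and the default threshold of 0.35 is only an illustrative value, not a parameter fixed by the slides.

```python
def score(shared_contacts, contacts_p1, contacts_p2):
    """Shared contacts divided by the smaller number of contacts; always in [0, 1]."""
    return shared_contacts / min(contacts_p1, contacts_p2)


def same_family(shared_contacts, contacts_p1, contacts_p2, threshold=0.35):
    """Put P1 and P2 in the same family when the score reaches the threshold."""
    return score(shared_contacts, contacts_p1, contacts_p2) >= threshold


print(score(60, 120, 150), same_family(60, 120, 150))
print(score(12, 120, 150), same_family(12, 120, 150))
```

When only a heuristic value and an LP bound are known, calling `score` on both gives an interval that contains the optimal score.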

Page 222: Optimization Problems for Polymorphisms of Single Nucleotides

Clustering validation

We got some known families from biologists, PDB.

Experiment: Take a family F of proteins and align them against each other and against the remaining.

TYPICAL BEHAVIOUR

score   proteins were…
0.05    MISMATCH
0.1     MISMATCH
0.15    MISMATCH
0.2     MISMATCH
0.25    MISMATCH
0.3     MISMATCH
0.35    MATCH
……      ……
1.0     MATCH

Page 224: Optimization Problems for Polymorphisms of Single Nucleotides

Skolnick Results

• Performance
– 528 alignments
– 1.3% false negative
– 0.0% false positive

Page 225: Optimization Problems for Polymorphisms of Single Nucleotides

Clustering

Computed, for the 1st time, provably optimal alignments for 150 pairs (inter-family)

Used the CMO value to cluster: it retrieves the clusters.

Set S(i,j) = 1 if CMO >= threshold, S(i,j) = 0 otherwise

Use TSP to find a block diagonal structure for S

Page 226: Optimization Problems for Polymorphisms of Single Nucleotides

Clustering

Page 227: Optimization Problems for Polymorphisms of Single Nucleotides

Last Open Problem

? ?