280
Algorithms and Bioinformatics Part II — Comparative Genomics Laurent Bulteau

Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Algorithms and Bioinformatics

Part II — Comparative Genomics

II.1 — Models and Distances

Laurent Bulteau

Slides from Anthony Labarre

Page 2: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Algorithms and Bioinformatics

Part II — Comparative Genomics

II.1 — Models and Distances

Laurent BulteauSlides from Anthony Labarre

Page 3: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

Context and motivations

I Deoxyribonucleic acid: double helix ofnucleotides (A, C, G, T);

I Complementarity (A-T, C-G): one stripis enough;

I Gene = sequence of nucleotides (thatcodes for a specific protein);

I Chromosome = ordered set of genes;

I Genome = set of chromosomes;

I Goal: compare genomes;

2

Page 4: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

Context and motivations

I Biologists are interested in comparing species, for example:I in order to classify them;I in order to explain evolution by reconstructing scenarios;

I (Dis)similarity measures are needed;

I Usually based on the sequenced genomes;

3

Page 5: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

At the nucleotide level

I Most comparisons take place at the nucleotide level;

Example (sequence alignment)

S1 : · · · T C C G C C A − − C T A · · ·| | | | | |

S2 : · · · T C G G A C T G G C − A · · ·

I Matches, substitutions, insertions and deletions;

I Correspond to mutations;

4

Page 6: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

At the “gene” level

I Some mutations act on segments of nucleotides;

I Those large-scale mutations are called genome rearrangements;

I Sequence alignment becomes unfit;

Example (genomes as sequences of segments)

(A)

(B)

genome rearrangements

5

Page 7: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

At the “gene” level

I Some mutations act on segments of nucleotides;

I Those large-scale mutations are called genome rearrangements;

I Sequence alignment becomes unfit;

Example (genomes as sequences of segments)

(A)

(B)

genome rearrangements

6

Page 8: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

At the “gene” level

I Some mutations act on segments of nucleotides;

I Those large-scale mutations are called genome rearrangements;

I Sequence alignment becomes unfit;

Example (genomes as sequences of segments)

(A)

(B)

genome rearrangements

7

Page 9: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

Context and motivationsComparing genomes

Genome rearrangements

I Our problem:

Problem (pairwise genome rearrangement)

Input: genomes G1, G2, a set S of mutations;Goal: find a shortest sequence of elements of S that transforms G1

into G2.

I Related, simpler problem: compute the evolutionary distancedS(G1,G2) (i.e. just the length of a shortest sequence);

I Many variants, depending on how genomes are modelled, what(and how) mutations are taken into account, etc.;

8

Page 10: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Modelling genomes as permutations

I Genomes are seen as permutations if:

1. they form ordered sequences of genes (or other segments), and2. they only differ by order (no duplications or deletions).

Example (genomes → permutations)

5 1 2 4 7 3 6

(A)

1 2 3 4 5 6 7

(B)

genome rearrangements

9

Page 11: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Modelling genomes as permutations

I Genomes are seen as permutations if:

1. they form ordered sequences of genes (or other segments), and2. they only differ by order (no duplications or deletions).

Example (genomes → permutations)

5 1 2 4 7 3 6

(A)

1 2 3 4 5 6 7

(B)

genome rearrangements

10

Page 12: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Modelling genomes as permutations

I Genomes are seen as permutations if:

1. they form ordered sequences of genes (or other segments), and2. they only differ by order (no duplications or deletions).

Example (genomes → permutations)

5 1 2 4 7 3 6

(A)

1 2 3 4 5 6 7 (B)

genome rearrangements

11

Page 13: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Modelling genomes as permutations

I Genomes are seen as permutations if:

1. they form ordered sequences of genes (or other segments), and2. they only differ by order (no duplications or deletions).

Example (genomes → permutations)

5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

12

Page 14: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Modelling genomes as permutations

I Genomes are seen as permutations if:

1. they form ordered sequences of genes (or other segments), and2. they only differ by order (no duplications or deletions).

Example (genomes → permutations)

5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

13

Page 15: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Genome rearrangements for permutations

I Segments can be numbered as we wish, so we assume eithergenome is the identity permutation ι = 〈1 2 · · · n〉 and we wishto sort the other genome:

Problem (genome rearrangement (permutations))

Input: a permutation π in Sn, a set S ⊆ Sn of (per)mutations;Goal: find a shortest sorting sequence of elements of S for π.

I Again, we can also focus on merely computing dS(π) – the lengthof an optimal sorting sequence;

I S must generate Sn for any pair of permutations to be a finitedistance apart;

14

Page 16: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Notation and definitions pertaining to permutations

I Permutations can be written in one- or two-row notation:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= 〈4 1 6 2 5 3〉.

I We deal exclusively with [n] = 1, 2, . . . , n;I All permutations of [n] with composition form the symmetric

group Sn;

I Composition: the usual , which means that in π σ, σ is appliedfirst;

15

Page 17: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Notation and definitions pertaining to permutations

I Permutations can be written in one- or two-row notation:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= 〈4 1 6 2 5 3〉.

I We deal exclusively with [n] = 1, 2, . . . , n;

I All permutations of [n] with composition form the symmetricgroup Sn;

I Composition: the usual , which means that in π σ, σ is appliedfirst;

16

Page 18: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Notation and definitions pertaining to permutations

I Permutations can be written in one- or two-row notation:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= 〈4 1 6 2 5 3〉.

I We deal exclusively with [n] = 1, 2, . . . , n;I All permutations of [n] with composition form the symmetric

group Sn;

I Composition: the usual , which means that in π σ, σ is appliedfirst;

17

Page 19: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Notation and definitions pertaining to permutations

I Permutations can be written in one- or two-row notation:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= 〈4 1 6 2 5 3〉.

I We deal exclusively with [n] = 1, 2, . . . , n;I All permutations of [n] with composition form the symmetric

group Sn;

I Composition: the usual , which means that in π σ, σ is appliedfirst;

18

Page 20: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by adjacent exchanges

I Simple operation : exchange any two adjacent elements:

Example

4 1 6 2 5 3

1 4 6 2 5 3

I So we want to sort a permutation by performing as few suchexchanges as possible;

I Solution:

I sorting: use Bubble Sort;I distance: the number of swaps used by Bubble Sort;

19

Page 21: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by adjacent exchanges

I Simple operation : exchange any two adjacent elements:

Example

4 1 6 2 5 3

1 4 6 2 5 3

I So we want to sort a permutation by performing as few suchexchanges as possible;

I Solution:I sorting: use Bubble Sort;

I distance: the number of swaps used by Bubble Sort;

20

Page 22: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by adjacent exchanges

I Simple operation : exchange any two adjacent elements:

Example

4 1 6 2 5 3

1 4 6 2 5 3

I So we want to sort a permutation by performing as few suchexchanges as possible;

I Solution:I sorting: use Bubble Sort;I distance: the number of swaps used by Bubble Sort;

21

Page 23: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

A closed formula for the adjacent exchange distance

I An inversion in a permutation is a pair of misplaced elements: (πi , πj)with i < j and πi > πj ;

I The permutation graph of π has [n] as vertex set and the inversions ofπ as edges;

Example (the permutation graph of 〈4 1 6 2 5 3〉)

1

3

2 5

4 6

22

Page 24: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

A closed formula for the adjacent exchange distance

Theorem

The adjacent exchange distance of π is |Inv(π)| = |E (PG (π))|.

Proof sketch.

Each adjacent swap “fixes” at most one inversion, which isequivalent to removing an edge from PG , and we can always findsuch a move at every step if π 6= ι.

23

Page 25: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Simple operation : exchange any two elements:

Example

4 1 6 2 5 3

2 1 6 4 5 3

I So we want to sort a permutation by performing as few suchexchanges as possible;

24

Page 26: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Here’s a more complete example:

Example (a sorting sequence for 〈4 1 6 2 5 3〉)4 1 6 2 5 3

1 4 6 2 5 3

1 2 6 4 5 3

1 2 3 4 5 6

I It works... but can we do better?

25

Page 27: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Here’s a more complete example:

Example (a sorting sequence for 〈4 1 6 2 5 3〉)4 1 6 2 5 3

1 4 6 2 5 3

1 2 6 4 5 3

1 2 3 4 5 6

I It works... but can we do better?

26

Page 28: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Here’s a more complete example:

Example (a sorting sequence for 〈4 1 6 2 5 3〉)4 1 6 2 5 3

1 4 6 2 5 3

1 2 6 4 5 3

1 2 3 4 5 6

I It works... but can we do better?

27

Page 29: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Here’s a more complete example:

Example (a sorting sequence for 〈4 1 6 2 5 3〉)4 1 6 2 5 3

1 4 6 2 5 3

1 2 6 4 5 3

1 2 3 4 5 6

I It works... but can we do better?

28

Page 30: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Here’s a more complete example:

Example (a sorting sequence for 〈4 1 6 2 5 3〉)4 1 6 2 5 3

1 4 6 2 5 3

1 2 6 4 5 3

1 2 3 4 5 6

I It works... but can we do better?

29

Page 31: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I Our goal: each element should be “at the right place”;

I Some elements are already where they should be, so they won’tmove;

I Strategy: read permutation from left to right, and:I if πi = i , pass;I otherwise, exchange πi with i ;

30

Page 32: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Sorting permutations by exchanges

I The algorithm obviously terminates;

I At every step, we “fix” one or two positions;

I We use the minimum number of exchanges;

I On the other hand, we’d like to be able to compute the distancewithout sorting;

31

Page 33: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Cycles

I Computing the distance requires using the cycles of thepermutation;

I Those cycles are obtained by iterating the permutation’s action on1, 2, . . . , n, stopping when all elements have been visited;

Example (cycles of 〈4 1 6 2 5 3〉)1 2 3 4 5 6

4 1 6 2 5 3

32

Page 34: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Cycles

I Computing the distance requires using the cycles of thepermutation;

I Those cycles are obtained by iterating the permutation’s action on1, 2, . . . , n, stopping when all elements have been visited;

Example (cycles of 〈4 1 6 2 5 3〉)1 2 3 5 6

4 6 5 3

33

Page 35: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Cycles

I Computing the distance requires using the cycles of thepermutation;

I Those cycles are obtained by iterating the permutation’s action on1, 2, . . . , n, stopping when all elements have been visited;

Example (cycles of 〈4 1 6 2 5 3〉)1 2 3 5

4 6 5

34

Page 36: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Cycles

I Computing the distance requires using the cycles of thepermutation;

I Those cycles are obtained by iterating the permutation’s action on1, 2, . . . , n, stopping when all elements have been visited;

Example (cycles of 〈4 1 6 2 5 3〉)1 2 3 5

4 6

35

Page 37: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Disjoint cycle decomposition of permutations

I Each permutation decomposes into disjoint cycles:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= (1, 4, 2)(3, 6)(5).

I The graph of the permutation π, denoted by Γ(π), pictures thisdecomposition:

4 1 6 2 5 3

I The number of cycles of π is written c(π);

I 1-cycles are sometimes omitted;

36

Page 38: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Disjoint cycle decomposition of permutations

I Each permutation decomposes into disjoint cycles:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= (1, 4, 2)(3, 6)(5).

I The graph of the permutation π, denoted by Γ(π), pictures thisdecomposition:

4 1 6 2 5 3

I The number of cycles of π is written c(π);

I 1-cycles are sometimes omitted;

37

Page 39: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Disjoint cycle decomposition of permutations

I Each permutation decomposes into disjoint cycles:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= (1, 4, 2)(3, 6)(5).

I The graph of the permutation π, denoted by Γ(π), pictures thisdecomposition:

4 1 6 2 5 3

I The number of cycles of π is written c(π);

I 1-cycles are sometimes omitted;

38

Page 40: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Disjoint cycle decomposition of permutations

I Each permutation decomposes into disjoint cycles:

π =

⟨1 2 3 4 5 64 1 6 2 5 3

⟩= (1, 4, 2)(3, 6)(5).

I The graph of the permutation π, denoted by Γ(π), pictures thisdecomposition:

4 1 6 2 5 3

I The number of cycles of π is written c(π);

I 1-cycles are sometimes omitted;

39

Page 41: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Cycles and sorting

I Cycles of length 1 correspond to sorted elements; all other cyclesconsist of elements that are misplaced;

4 1 6 2 5 3 1 2 3 4 5 6

I Sorting comes down to splitting cycles until we only have cycles oflength 1;

I Our algorithm repeatedly splits k-cycles into a 1-cycle and a(k − 1)-cycle;

40

Page 42: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the “exchange distance” exc(·)

I At each step, we can always create a new cycle if π 6= ι, so:

exc(π) ≤ n − c(π)

I And we can’t do better, so:

exc(π) ≥ n − c(π)

I Therefore:

Theorem ([Cayley, 1849])

The exchange distance of π in Sn is n − c(π).

41

Page 43: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the “exchange distance” exc(·)

I At each step, we can always create a new cycle if π 6= ι, so:

exc(π) ≤ n − c(π)

I And we can’t do better, so:

exc(π) ≥ n − c(π)

I Therefore:

Theorem ([Cayley, 1849])

The exchange distance of π in Sn is n − c(π).

42

Page 44: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the “exchange distance” exc(·)

I At each step, we can always create a new cycle if π 6= ι, so:

exc(π) ≤ n − c(π)

I And we can’t do better, so:

exc(π) ≥ n − c(π)

I Therefore:

Theorem ([Cayley, 1849])

The exchange distance of π in Sn is n − c(π).

43

Page 45: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the “exchange distance” exc(·)

I At each step, we can always create a new cycle if π 6= ι, so:

exc(π) ≤ n − c(π)

I And we can’t do better, so:

exc(π) ≥ n − c(π)

I Therefore:

Theorem ([Cayley, 1849])

The exchange distance of π in Sn is n − c(π).

44

Page 46: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Lessons from sorting by exchanges

I Note that ι is the only permutation with n cycles;I The formula exc(π) = n − c(π) expresses:

I the difference between the number of cycles we have and thenumber of cycles we want;

I and the fact that at each step, we can obtain exactly one new cycle.

I This point of view will be crucial to sorting problems;

45

Page 47: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

For the algorithms below, a permutation is represented as a size-n array

with values and indices ranging from 1 to n.

1. Give a linear-time algorithm computing the inverse of apermutation.

2. Give a linear-time algorithm computing an optimal sequenceof swaps sorting a permutation.

3. Give a linear-time algorithm computing the decomposition ofa permutation into disjoint cycles

4. Let S be a set of permutations defining distance dS over Sn,such that S is stable by inversion (π ∈ S ⇒ π−1 ∈ S).

I Prove that dS(π) = dS(π−1) for every permutation π.I The stability by inversion is a sufficient condition to have the

above property, but is it necessary?

Page 48: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Block-interchanges and transpositions

I The mutations we observe in evolution (may) act on intervals;I We’ll now look at two generalisations of exchanges:

1. block-interchanges;2. transpositions;

I Block-interchanges exchange two disjoint intervals in apermutation;

1 2 34 5 6 7 8 9 1 6 7 5 2 3 4 8 9

I Transpositions displace an interval of the permutation;

1 2 34 5 6 7 8 9 10 1 5 67 8 2 3 4 9 10

46

Page 49: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Block-interchanges and transpositions

I The mutations we observe in evolution (may) act on intervals;I We’ll now look at two generalisations of exchanges:

1. block-interchanges;2. transpositions;

I Block-interchanges exchange two disjoint intervals in apermutation;

1 2 34 5 6 7 8 9 1 6 7 5 2 3 4 8 9

I Transpositions displace an interval of the permutation;

1 2 34 5 6 7 8 9 10 1 5 67 8 2 3 4 9 10

47

Page 50: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Block-interchanges and transpositions

I The mutations we observe in evolution (may) act on intervals;I We’ll now look at two generalisations of exchanges:

1. block-interchanges;2. transpositions;

I Block-interchanges exchange two disjoint intervals in apermutation;

1 2 34 5 6 7 8 9 1 6 7 5 2 3 4 8 9

I Transpositions displace an interval of the permutation;

1 2 34 5 6 7 8 9 10 1 5 67 8 2 3 4 9 10

48

Page 51: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the associated distances

I Does the disjoint cycle technique “work”?

I Unlikely: the following permutation has n/2 cycles of length two,but bid(π) = td(π) = 1:

n2 + 1 n

2 + 2 n2 + 3 n 1 2 3 n

2· · · · · ·

I We’ll need something else since we cannot bound the effect of anoperation in that setting;

49

Page 52: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the associated distances

I Does the disjoint cycle technique “work”?

I Unlikely: the following permutation has n/2 cycles of length two,but bid(π) = td(π) = 1:

n2 + 1 n

2 + 2 n2 + 3 n 1 2 3 n

2· · · · · ·

I We’ll need something else since we cannot bound the effect of anoperation in that setting;

50

Page 53: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Computing the associated distances

I Does the disjoint cycle technique “work”?

I Unlikely: the following permutation has n/2 cycles of length two,but bid(π) = td(π) = 1:

n2 + 1 n

2 + 2 n2 + 3 n 1 2 3 n

2· · · · · ·

I We’ll need something else since we cannot bound the effect of anoperation in that setting;

51

Page 54: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

The “directed breakpoint graph”

I (the term “breakpoint” will be explained later);

I Let’s build the directed breakpoint graph of π = 〈4 1 6 2 5 7 3〉:

1

4

0

3

7

5

2

6

1. build the ordered vertex set(π0 = 0, π1, π2, . . . , πn);

2. add black arcs for every orderedpair (πi , πi−1 (mod n+1));

3. add grey arcs for every ordered pair(i , i + 1 (mod n + 1));

DBG (π) decomposes in a unique way into alternating cycles

52

Page 55: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

The “directed breakpoint graph”

I (the term “breakpoint” will be explained later);

I Let’s build the directed breakpoint graph of π = 〈4 1 6 2 5 7 3〉:

1

4

0

3

7

5

2

6

1. build the ordered vertex set(π0 = 0, π1, π2, . . . , πn);

2. add black arcs for every orderedpair (πi , πi−1 (mod n+1));

3. add grey arcs for every ordered pair(i , i + 1 (mod n + 1));

DBG (π) decomposes in a unique way into alternating cycles

53

Page 56: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

The “directed breakpoint graph”

I (the term “breakpoint” will be explained later);

I Let’s build the directed breakpoint graph of π = 〈4 1 6 2 5 7 3〉:

1

4

0

3

7

5

2

6

1. build the ordered vertex set(π0 = 0, π1, π2, . . . , πn);

2. add black arcs for every orderedpair (πi , πi−1 (mod n+1));

3. add grey arcs for every ordered pair(i , i + 1 (mod n + 1));

DBG (π) decomposes in a unique way into alternating cycles

54

Page 57: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

The “directed breakpoint graph”

I (the term “breakpoint” will be explained later);

I Let’s build the directed breakpoint graph of π = 〈4 1 6 2 5 7 3〉:

1

4

0

3

7

5

2

6

1. build the ordered vertex set(π0 = 0, π1, π2, . . . , πn);

2. add black arcs for every orderedpair (πi , πi−1 (mod n+1));

3. add grey arcs for every ordered pair(i , i + 1 (mod n + 1));

DBG (π) decomposes in a unique way into alternating cycles

55

Page 58: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

The “directed breakpoint graph”

I (the term “breakpoint” will be explained later);

I Let’s build the directed breakpoint graph of π = 〈4 1 6 2 5 7 3〉:

1

4

0

3

7

5

2

6

1. build the ordered vertex set(π0 = 0, π1, π2, . . . , πn);

2. add black arcs for every orderedpair (πi , πi−1 (mod n+1));

3. add grey arcs for every ordered pair(i , i + 1 (mod n + 1));

DBG (π) decomposes in a unique way into alternating cycles

56

Page 59: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Formal definition of the directed breakpoint graph

Definition ([Bafna and Pevzner, 1998])

The directed breakpoint graph of 〈π1 π2 · · · πn〉 is defined by:

1. an ordered vertex set V = (π0 = 0, π1, π2, . . . , πn);

2. a bicoloured arc set A = AB ∪ AG , where:

2.1 AB = (πi , πi−1 (mod n+1)) : 0 ≤ i ≤ n;2.2 AG = (i , i + 1 (mod n + 1)) : 0 ≤ i ≤ n;

57

Page 60: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Circular vs. linear layout

I One can also represent the directed breakpoint graph using a linearlayout without affecting the cycle structure:

π2

π1

π0

π7

π6

π5

π4

π3

1

4

0

3

7

5

2

60π0

4π1

1π2

6π3

2π4

5π5

7π6

3π7

8π8

I Circular → linear: split 0 into 0 and n + 1;

I Linear → circular: merge 0 and n + 1 into 0;

I (in both cases: adapt arcs accordingly);

58

Page 61: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Intuitions behind DBG (π) – 1

I Both “monochromatic” cycles represent an ordering:

1. the black one represents the one we have (“reality”);2. the grey one represents the one we want to obtain (“desire”);

1

4

0

3

7

5

2

6

1

4

0

3

7

5

2

6

59

Page 62: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Intuitions behind DBG (π) – 2

I The alternating cycles are a blend of “reality” and “desire”, andwe must act on those cycles to turn “reality” into “desire”;

I When we are done, we have the largest number of cycles;

π2

π1

π0

π7

π6

π5

π4

π3

1

4

0

3

7

5

2

6

ι2

ι1

ι0

ι7

ι6

ι5

ι4

ι3

2

1

0

7

6

5

4

3

60

Page 63: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Proving lower bounds

I One way of proving lower bounds on distances: be optimistic:

1. find out the “best case”;2. pretend we’re always in that case;

I Techniques based on breakpoint graphs follow the same spirit asthose based on the disjoint cycle decomposition, but less freedomis allowed (details later);

I Usually: case analysis to determine by how much a parameter ofthe graph can change with one operation;

61

Page 64: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Lower bounding the block-interchange distance

A block-interchange β(i , j , k , `) increases the number of cycles inDBG by at most 2:

πi−1 πi πj−1 πj πk−1 πk π`−1 π` πi−1 πi πj−1πj πk−1πk π`−1 π`

Theorem ([Christie, 1996])

For all π in Sn: bid(π) ≥ n+1−c(DBG(π))2 .

62

Page 65: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Lower bounding the transposition distanceA transposition τ(i , j , k) increases the number of odd cycles in DBG byat most 2:

πi−1 πi πj−1 πj πk−1 πk

πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk

πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk

πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk

πi−1 πi πj−1πj πk−1 πk

Theorem ([Bafna and Pevzner, 1998])

For all π in Sn: td(π) ≥ n+1−codd (DBG(π))2 .

63

Page 66: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Lower bounding the transposition distanceA transposition τ(i , j , k) increases the number of odd cycles in DBG byat most 2:

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

Theorem ([Bafna and Pevzner, 1998])

For all π in Sn: td(π) ≥ n+1−codd (DBG(π))2 .

64

Page 67: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

IntroductionPermutations

The modelExchangesLarger-scale transformationsThe directed breakpoint graph

Lower bounding the transposition distanceA transposition τ(i , j , k) increases the number of odd cycles in DBG byat most 2:

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

πi−1 πi πj−1 πj πk−1 πk πi−1 πi πj−1πj πk−1 πk

Theorem ([Bafna and Pevzner, 1998])

For all π in Sn: td(π) ≥ n+1−codd (DBG(π))2 .

65

Page 68: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

5. Give sorting sequences for the following permutations, andprove they are optimal

I 〈654321〉, using block-interchanges

I 〈3254761〉, using transpositions

Page 69: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Proving upper bounds

I One way of proving upper bounds on distances: be pessimistic:

1. find out the “worst case”;2. pretend we’re always in that case;

I For better upper bounds: be less pessimistic:I case analyses of varying difficulty;I look at sequences of moves;

2

Page 70: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Upper bounds

I For block-interchanges, we have:

Theorem ([Christie, 1996])

For all π in Sn: bid(π) ≤ n+1−c(DBG(π))2 .

I Which equals the lower bound and therefore the exact distance;

3

Page 71: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Upper bounds

I For transpositions:

Theorem ([Bafna and Pevzner, 1998])

For all π in Sn: td(π) ≤ 34 (n + 1− codd(DBG (π))) = 3

2OPT .

I Current best approximation: 11/8 [Elias and Hartman, 2006]

4

Page 72: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

6. Show that td(π) ≤ n − LIS(π), where LIS denotes the lengthof the longest increasing subsequence.

7. Give a polynomial-time 2-approximation algorithm for theTransposition Distance problem.

Hint 1: Use the lower bound based on the number of cycles:

td(π) ≥ n+1−c(DBG(π))2

Hint 2: If π displays a cycle of the right form, give a transpositioncreating two cycles. Otherwise, turn a cycle into the right formwith one transposition.

Page 73: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

6. Show that td(π) ≤ n − LIS(π), where LIS denotes the lengthof the longest increasing subsequence.

7. Give a polynomial-time 2-approximation algorithm for theTransposition Distance problem.

Hint 1: Use the lower bound based on the number of cycles:

td(π) ≥ n+1−c(DBG(π))2

Hint 2: If π displays a cycle of the right form, give a transpositioncreating two cycles. Otherwise, turn a cycle into the right formwith one transposition.

Page 74: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

6. Show that td(π) ≤ n − LIS(π), where LIS denotes the lengthof the longest increasing subsequence.

7. Give a polynomial-time 2-approximation algorithm for theTransposition Distance problem.

Hint 1: Use the lower bound based on the number of cycles:

td(π) ≥ n+1−c(DBG(π))2

Hint 2: If π displays a cycle of the right form, give a transpositioncreating two cycles. Otherwise, turn a cycle into the right formwith one transposition.

Page 75: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Reversals

I Another type of mutation occurs frequently: reversals, whichreverse the order of elements on an interval of the permutation;

Example

4 1 6 2 5 3

4 5 2 6 1 3

5

Page 76: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Reversals

I Here’s an optimal sorting sequence1:

Example (optimal sorting sequence of reversals)

I Bad news: stuff seen so far doesn’t work;

1Obtained using GRIMM: http://grimm.ucsd.edu/cgi-bin/grimm.cgi

6

Page 77: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

Definition ([Watterson et al., 1982])

A breakpoint in a permutation π is an ordered pair (πi , πi+1) with|πi+1 − πi | 6= 1 (otherwise it’s an adjacency).

Example (breakpoints of 〈3 1 5 4 2 8 6 7〉)3 1 5 4 2 8 6 7• • • • •

I This notion characterises elements that are “relatively misplaced”:they’re not consecutive in ι, nor in 〈n n − 1 · · · 2 1〉

7

Page 78: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

Definition ([Watterson et al., 1982])

A breakpoint in a permutation π is an ordered pair (πi , πi+1) with|πi+1 − πi | 6= 1 (otherwise it’s an adjacency).

Example (breakpoints of 〈3 1 5 4 2 8 6 7〉)3 1 5 4 2 8 6 7• • • • •

I This notion characterises elements that are “relatively misplaced”:they’re not consecutive in ι, nor in 〈n n − 1 · · · 2 1〉

8

Page 79: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

Definition ([Watterson et al., 1982])

A breakpoint in a permutation π is an ordered pair (πi , πi+1) with|πi+1 − πi | 6= 1 (otherwise it’s an adjacency).

Example (breakpoints of 〈3 1 5 4 2 8 6 7〉)3 1 5 4 2 8 6 7• • • • •

I This notion characterises elements that are “relatively misplaced”:they’re not consecutive in ι, nor in 〈n n − 1 · · · 2 1〉

9

Page 80: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

10

Page 81: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;

I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

11

Page 82: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

12

Page 83: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

13

Page 84: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

14

Page 85: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Breakpoints

I Note that ι = 〈1 2 · · · n〉 has no breakpoint;

I That’s also the case of 〈n n − 1 · · · 2 1〉;I To distinguish them, we frame permutations:

〈π1 π2 · · · πn〉 7→ 〈0 π1 π2 · · · πn n + 1〉

I Those artificial elements are denoted by π0 and πn+1;

I Which leads to the following definition:

Definition

The number of breakpoints of a permutation π in Sn is

b(π) = |(πi , πi+1) | 0 ≤ i ≤ n and |πi+1 − πi | 6= 1|.

I Example: 〈0 • 3 • 1 • 5 4 • 2 • 8 • 6 7 • 9〉 ⇒ b(π) = 7;

15

Page 86: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Usefulness of breakpoints

I Observation: reversals can “fix” breakpoints:

Example

0 3 1 5 4 2 8 6 7 9• • • • • • • 0 3 1 5 4 8 2 6 7 9• • • • • • •

0 3 1 2 8 4 5 6 7 9• • • • •0 3 1 2 8 7 6 5 4 9• • • •

16

Page 87: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Lower bound

I A reversal can fix at most two breakpoints;

I If we’re lucky, we are always in that case;

I This yields the following lower bound:

Theorem ([Kececioglu and Sankoff, 1995])

For every permutation π:

rd(π) ≥ b(π)/2.

17

Page 88: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Upper bound

I On the other hand we can always fix at least one breakpoint(details: [Kececioglu and Sankoff, 1995])

I So the algorithm is a 2-approximation:

b(π)/2 ≤ rd(π) ≤ b(π)

I Can we do better?

18

Page 89: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph

I Yes, but we need a more appropriate structure:

Definition ([Bafna and Pevzner, 1996])

The undirected breakpoint graph of the permutation π in Sn,written UBG (π) = (V ,E ), is defined by:

I V = (π0 = 0, π1, π2, . . . , πn, πn+1 = n + 1);

I E = πi , πi+1 | 0 ≤ i ≤ n︸ ︷︷ ︸black edges

∪i , i + 1 | 0 ≤ i ≤ n︸ ︷︷ ︸grey edges

.

19

Page 90: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph: example

I Let us build the undirected breakpoint graph π = 〈3 1 5 4 2 8 6 7〉:

Example

3 1 5 4 2 8 6 7

0 9

1. frame the permutation;

2. build ordered vertex set (π0 = 0, π1, π2, . . ., πn+1 = n + 1);

3. add black edges for every pair πi , πi+1;4. add grey edges for every pair i , i + 1;

20

Page 91: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph: example

I Let us build the undirected breakpoint graph π = 〈3 1 5 4 2 8 6 7〉:

Example

3 1 5 4 2 8 6 70 9

1. frame the permutation;

2. build ordered vertex set (π0 = 0, π1, π2, . . ., πn+1 = n + 1);

3. add black edges for every pair πi , πi+1;4. add grey edges for every pair i , i + 1;

21

Page 92: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph: example

I Let us build the undirected breakpoint graph π = 〈3 1 5 4 2 8 6 7〉:

Example

3 1 5 4 2 8 6 70 9

1. frame the permutation;

2. build ordered vertex set (π0 = 0, π1, π2, . . ., πn+1 = n + 1);

3. add black edges for every pair πi , πi+1;4. add grey edges for every pair i , i + 1;

22

Page 93: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph: example

I Let us build the undirected breakpoint graph π = 〈3 1 5 4 2 8 6 7〉:

Example

3 1 5 4 2 8 6 70 9

1. frame the permutation;

2. build ordered vertex set (π0 = 0, π1, π2, . . ., πn+1 = n + 1);

3. add black edges for every pair πi , πi+1;

4. add grey edges for every pair i , i + 1;

23

Page 94: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

The undirected breakpoint graph: example

I Let us build the undirected breakpoint graph π = 〈3 1 5 4 2 8 6 7〉:

Example

3 1 5 4 2 8 6 70 9

1. frame the permutation;

2. build ordered vertex set (π0 = 0, π1, π2, . . ., πn+1 = n + 1);

3. add black edges for every pair πi , πi+1;4. add grey edges for every pair i , i + 1;

24

Page 95: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I That graph decomposes into cycles:

Example

0 3 1 5 4 2 8 6 7 9

I ... but the decomposition is no longer unique!

Example

0 2 1 3

has either one 3-cycle or a 1-cycle and a 2-cycle.

25

Page 96: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I That graph decomposes into cycles:

Example

0 3 1 5 4 2 8 6 7 9

I ... but the decomposition is no longer unique!

Example

0 2 1 3

has either one 3-cycle or a 1-cycle and a 2-cycle.

26

Page 97: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I That graph decomposes into cycles:

Example

0 3 1 5 4 2 8 6 7 9

I ... but the decomposition is no longer unique!

Example

0 2 1 3

has either one 3-cycle or a 1-cycle and a 2-cycle.

27

Page 98: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Cycle decompositions

I We use decompositions to derive lower bounds;

I A reversal acts on one or two cycles and can split / merge cycles;

πi−1 πi πj πj+1· · ·

πi−1 πj πi πj+1· · ·

I So we’re tempted to say:

rd(π) ≥ n + 1− c(UBG (π))

28

Page 99: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Cycle decompositions

I We use decompositions to derive lower bounds;

I A reversal acts on one or two cycles and can split / merge cycles;

πi−1 πi πj πj+1· · ·

πi−1 πj πi πj+1· · ·

I So we’re tempted to say:

rd(π) ≥ n + 1− c(UBG (π))

29

Page 100: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Cycle decompositions

I We use decompositions to derive lower bounds;

I A reversal acts on one or two cycles and can split / merge cycles;

πi−1 πi πj πj+1· · ·

πi−1 πj πi πj+1· · ·

I So we’re tempted to say:

rd(π) ≥ n + 1− c(UBG (π))

30

Page 101: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I ... but recall that the decomposition is not unique!!!

I The more cycles we have, the closer we are to UBG (ι);

I Therefore, we have in fact:

Theorem ([Bafna and Pevzner, 1996])

For all π in Sn:

rd(π) ≥ n + 1− c∗(UBG (π)),

where c∗(UBG (π)) is the number of cycles in a maximumcardinality decomposition.

I Unfortunately, finding a maximum cardinality decomposition isNP-hard [Caprara, 1999];

31

Page 102: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I ... but recall that the decomposition is not unique!!!

I The more cycles we have, the closer we are to UBG (ι);

I Therefore, we have in fact:

Theorem ([Bafna and Pevzner, 1996])

For all π in Sn:

rd(π) ≥ n + 1− c∗(UBG (π)),

where c∗(UBG (π)) is the number of cycles in a maximumcardinality decomposition.

I Unfortunately, finding a maximum cardinality decomposition isNP-hard [Caprara, 1999];

32

Page 103: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I ... but recall that the decomposition is not unique!!!

I The more cycles we have, the closer we are to UBG (ι);

I Therefore, we have in fact:

Theorem ([Bafna and Pevzner, 1996])

For all π in Sn:

rd(π) ≥ n + 1− c∗(UBG (π)),

where c∗(UBG (π)) is the number of cycles in a maximumcardinality decomposition.

I Unfortunately, finding a maximum cardinality decomposition isNP-hard [Caprara, 1999];

33

Page 104: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I ... but recall that the decomposition is not unique!!!

I The more cycles we have, the closer we are to UBG (ι);

I Therefore, we have in fact:

Theorem ([Bafna and Pevzner, 1996])

For all π in Sn:

rd(π) ≥ n + 1− c∗(UBG (π)),

where c∗(UBG (π)) is the number of cycles in a maximumcardinality decomposition.

I Unfortunately, finding a maximum cardinality decomposition isNP-hard [Caprara, 1999];

34

Page 105: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Decomposition

I ... but recall that the decomposition is not unique!!!

I The more cycles we have, the closer we are to UBG (ι);

I Therefore, we have in fact:

Theorem ([Bafna and Pevzner, 1996])

For all π in Sn:

rd(π) ≥ n + 1− c∗(UBG (π)),

where c∗(UBG (π)) is the number of cycles in a maximumcardinality decomposition.

I Unfortunately, finding a maximum cardinality decomposition isNP-hard [Caprara, 1999];

35

Page 106: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

BreakpointsThe undirected breakpoint graph

Results on sorting permutations

I Here’s a nonexhaustive summary on sorting permutations usingvarious operations:

Operation Sorting Distance Best approximation

exchange O(n) [Knuth, 1995] 1block-interchange O(n) [Christie, 1996] 1double cut-and-joins NP-hard [Chen, 2010] ?reversal NP-hard [Caprara, 1999] 11/8 [Berman et al., 2002]transposition NP-hard [Bulteau et al., 2012] 11/8 [Elias and Hartman, 2006]

prefi

x exchange O(n) [Akers et al., 1987] 1reversal NP-hard [Bulteau et al., 2015] 2 [Fischer and Ginzinger, 2005]transposition ? ? 2 [Dias and Meidanis, 2002]

I Let us move on to our next model: signed permutations;

36

Page 107: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I Permutations lack realism: DNAsegments are oriented;

I We need to take orientation intoaccount;

I Therefore, two DNA segmentsmatch if:

I they are the same, orI one is the reverse

complement of the other.

(picture by Madeleine Price Ball, taken from Wikimedia)

37

Page 108: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)

−5 +1 +2 +4 −7 −3 +6

(A)

+1 +2 +3 +4 +5 +6 +7

(B)

genome rearrangements

38

Page 109: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)

−5 +1 +2 +4 −7 −3 +6

(A)

+1 +2 +3 +4 +5 +6 +7

(B)

genome rearrangements

39

Page 110: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)−

5

+

1

+

2

+

4

7

3

+

6

(A)

+

1

+

2

+

3

+

4

+

5

+

6

+

7

(B)

genome rearrangements

40

Page 111: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)−

5

+

1

+

2

+

4

7

3

+

6

(A)

+1 +2 +3 +4 +5 +6 +7 (B)

genome rearrangements

41

Page 112: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)−5 +1 +2 +4 −7 −3 +6 (A)

+1 +2 +3 +4 +5 +6 +7 (B)

genome rearrangements

42

Page 113: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Motivation

I So instead of permutations:

Example (genomes → permutations)5 1 2 4 7 3 6 (A)

1 2 3 4 5 6 7 (B)

genome rearrangements

I We now have signed permutations:

Example (genomes → signed permutations)−5 +1 +2 +4 −7 −3 +6 (A)

+1 +2 +3 +4 +5 +6 +7 (B)

genome rearrangements

43

Page 114: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Signed (per)mutations

I Note: this does not mean that everything you know aboutunsigned comparisons is useless:

1. orientation information is not always available;2. ideas from unsigned comparisons lead to ideas for signed

comparisons;

I Mutations may now act on a segment’s place and orientation;

44

Page 115: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Tools

I As expected, tools we’ve seen previously cannot be used herebecause they do not take signs into account;

I Then again, some ideas can be adapted;

45

Page 116: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;

I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

46

Page 117: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

47

Page 118: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

48

Page 119: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

49

Page 120: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

50

Page 121: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Notation and definitions pertaining to signed permutations

I We deal exclusively with ±1,±2, . . . ,±n;I Convention: π(−i) = −π(i) for 1 ≤ i ≤ n;

I Permutations can be written in one- or two-row notation:

π =

⟨−4 −3 −2 −1 1 2 3 4

2 4 −1 3 −3 1 −4 −2

⟩= 〈−3 1 − 4 − 2〉.

I We will restrict ourselves to the mapping of positive elements;

I Composition works as before;

I The corresponding group is the hyperoctahedral group S±n ;

51

Page 122: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Signed reversals

I Reversals can be generalised to signed reversals, which not onlyreverse an interval but also flip signs along the interval;

Example (signed reversal)

-5 1 2 4 -7 -3 6

-5 1 3 7 -4 -2 6

I As before, we’re interested in sorting a given signed permutationusing as few signed reversals as possible (or merely computing thelength of a shortest sequence);

52

Page 123: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Signed reversals

I Reversals can be generalised to signed reversals, which not onlyreverse an interval but also flip signs along the interval;

Example (signed reversal)

-5 1 2 4 -7 -3 6

-5 1 3 7 -4 -2 6

I As before, we’re interested in sorting a given signed permutationusing as few signed reversals as possible (or merely computing thelength of a shortest sequence);

53

Page 124: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Signed reversals

I Reversals can be generalised to signed reversals, which not onlyreverse an interval but also flip signs along the interval;

Example (signed reversal)

-5 1 2 4 -7 -3 6

-5 1 3 7 -4 -2 6

I As before, we’re interested in sorting a given signed permutationusing as few signed reversals as possible (or merely computing thelength of a shortest sequence);

54

Page 125: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

55

Page 126: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

56

Page 127: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

57

Page 128: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

58

Page 129: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

59

Page 130: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

60

Page 131: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

61

Page 132: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

62

Page 133: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

63

Page 134: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

64

Page 135: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

65

Page 136: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Example: sorting by signed reversals

π = −5 +1 +2 +4 −7 −3 +6

−5 +1 +2 −4 −7 −3 +6

−5 +1 +2 −4 −7 −6 +3

−5 +1 +2 −4 −3 +6 +7

−5 +1 +2 +3 +4 +6 +7

−5 −4 −3 −2 −1 +6 +7

σ = +1 +2 +3 +4 +5 +6 +7

srd(π, σ) ≤ 6

66

Page 137: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Solving the problem

I How do we attack this problem?

I Breakpoints can be generalised to the signed setting ...

I ... but you already know / guess that this will at best provide anapproximation;

I Instead, we’re going to adapt the breakpoint graph to the signedsetting;

67

Page 138: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Solving the problem

I How do we attack this problem?

I Breakpoints can be generalised to the signed setting ...

I ... but you already know / guess that this will at best provide anapproximation;

I Instead, we’re going to adapt the breakpoint graph to the signedsetting;

68

Page 139: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Solving the problem

I How do we attack this problem?

I Breakpoints can be generalised to the signed setting ...

I ... but you already know / guess that this will at best provide anapproximation;

I Instead, we’re going to adapt the breakpoint graph to the signedsetting;

69

Page 140: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Solving the problem

I How do we attack this problem?

I Breakpoints can be generalised to the signed setting ...

I ... but you already know / guess that this will at best provide anapproximation;

I Instead, we’re going to adapt the breakpoint graph to the signedsetting;

70

Page 141: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The breakpoint graph

I The breakpoint graph in the signed case is slightly different:

−5 +1 +2 +4 −7 −3 +6π =

π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15

1. double π’s elements(i 7→ 2|i | − 1, 2|i |) and add 0and 2n + 1

2. elements of π′ = vertices

3. black edges connect distinctadjacent genes

4. grey edges connect distinctconsecutive genes

010

9

1

2

3478

14

13

6

5

1112 15

π′0

π′1

π′2

π′3

π′4

π′5

π′6π′

7π′

8

π′9

π′10

π′11

π′12

π′13

π′14

π′15

71

Page 142: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The breakpoint graph

I The breakpoint graph in the signed case is slightly different:

−5 +1 +2 +4 −7 −3 +6π =

π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15

1. double π’s elements(i 7→ 2|i | − 1, 2|i |) and add 0and 2n + 1

2. elements of π′ = vertices

3. black edges connect distinctadjacent genes

4. grey edges connect distinctconsecutive genes

010

9

1

2

3478

14

13

6

5

1112 15

π′0

π′1

π′2

π′3

π′4

π′5

π′6π′

7π′

8

π′9

π′10

π′11

π′12

π′13

π′14

π′15

72

Page 143: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The breakpoint graph

I The breakpoint graph in the signed case is slightly different:

−5 +1 +2 +4 −7 −3 +6π =

π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15

1. double π’s elements(i 7→ 2|i | − 1, 2|i |) and add 0and 2n + 1

2. elements of π′ = vertices

3. black edges connect distinctadjacent genes

4. grey edges connect distinctconsecutive genes

010

9

1

2

3478

14

13

6

5

1112 15

π′0

π′1

π′2

π′3

π′4

π′5

π′6π′

7π′

8

π′9

π′10

π′11

π′12

π′13

π′14

π′15

73

Page 144: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The breakpoint graph

I The breakpoint graph in the signed case is slightly different:

−5 +1 +2 +4 −7 −3 +6π =

π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15

1. double π’s elements(i 7→ 2|i | − 1, 2|i |) and add 0and 2n + 1

2. elements of π′ = vertices

3. black edges connect distinctadjacent genes

4. grey edges connect distinctconsecutive genes

010

9

1

2

3478

14

13

6

5

1112 15

π′0

π′1

π′2

π′3

π′4

π′5

π′6π′

7π′

8

π′9

π′10

π′11

π′12

π′13

π′14

π′15

74

Page 145: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The breakpoint graph

I The breakpoint graph in the signed case is slightly different:

−5 +1 +2 +4 −7 −3 +6π =

π′ = 0 10 9 1 2 3 4 7 8 14 13 6 5 11 12 15

1. double π’s elements(i 7→ 2|i | − 1, 2|i |) and add 0and 2n + 1

2. elements of π′ = vertices

3. black edges connect distinctadjacent genes

4. grey edges connect distinctconsecutive genes

010

9

1

2

3478

14

13

6

5

1112 15

π′0

π′1

π′2

π′3

π′4

π′5

π′6π′

7π′

8

π′9

π′10

π′11

π′12

π′13

π′14

π′15

75

Page 146: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Using the breakpoint graph

I The breakpoint graph is 2-regular and decomposes as such intoalternating cycles in a unique way;

I The breakpoint graph of 〈1 2 · · · n〉 contains the largest numberof cycles:

0

10

9

1

2

3

47

8

14

13

6

5

11

1215

0

1

2

3

4

5

67

8

9

10

11

12

13

1415

I ⇒ goal: create new cycles in as few moves as possible;

76

Page 147: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

8. Draw the breakpoint graph for the signed permutation

〈−4, 3,−2,−5, 1〉

Page 148: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);

2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);2.2 identify “good” and “bad” components in BG(π);2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

77

Page 149: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);2.2 identify “good” and “bad” components in BG(π);2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

78

Page 150: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);2.2 identify “good” and “bad” components in BG(π);2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

79

Page 151: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);

2.2 identify “good” and “bad” components in BG(π);2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

80

Page 152: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);2.2 identify “good” and “bad” components in BG(π);

2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

81

Page 153: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Overview of Hannenhalli and Pevzner’s solution

I [Hannenhalli and Pevzner, 1999] came up with the firstpolynomial-time algorithm for this problem:

1. Transform π into π (simpler, does not affect distance);2. Find an optimal sorting sequence for π;

2.1 identify “good” and “bad” cycles in BG(π);2.2 identify “good” and “bad” components in BG(π);2.3 “sort” those components to optimality;

3. Convert it back to an optimal sorting sequence for π;

82

Page 154: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Transformation into simple permutations

I A permutation π is simple if BG (π) contains only cycles of length≤ 2;

I The transformation was introduced to simplify analysis andpreserves the distance: if π is the “simplified” version of π, thensrd(π) = srd(π);

I So we can assume from now on that the permutation to sort issimple;

I [Gog and Bader, 2008] give fast algorithms to achieve theconversions;

83

Page 155: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Transformation into simple permutations

I A permutation π is simple if BG (π) contains only cycles of length≤ 2;

I The transformation was introduced to simplify analysis andpreserves the distance: if π is the “simplified” version of π, thensrd(π) = srd(π);

I So we can assume from now on that the permutation to sort issimple;

I [Gog and Bader, 2008] give fast algorithms to achieve theconversions;

84

Page 156: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Transformation into simple permutations

I A permutation π is simple if BG (π) contains only cycles of length≤ 2;

I The transformation was introduced to simplify analysis andpreserves the distance: if π is the “simplified” version of π, thensrd(π) = srd(π);

I So we can assume from now on that the permutation to sort issimple;

I [Gog and Bader, 2008] give fast algorithms to achieve theconversions;

85

Page 157: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Transformation into simple permutations

I A permutation π is simple if BG (π) contains only cycles of length≤ 2;

I The transformation was introduced to simplify analysis andpreserves the distance: if π is the “simplified” version of π, thensrd(π) = srd(π);

I So we can assume from now on that the permutation to sort issimple;

I [Gog and Bader, 2008] give fast algorithms to achieve theconversions;

86

Page 158: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

A lower bound on the signed reversal distance

I A signed reversal involves black edges belonging to at most twocycles;

I The only way to increase c(BG (π)) is to split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Therefore, for all π in S±n :

srd(π) ≥ n + 1− c(BG (π)).

87

Page 159: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

A lower bound on the signed reversal distance

I A signed reversal involves black edges belonging to at most twocycles;

I The only way to increase c(BG (π)) is to split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Therefore, for all π in S±n :

srd(π) ≥ n + 1− c(BG (π)).

88

Page 160: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

A lower bound on the signed reversal distance

I A signed reversal involves black edges belonging to at most twocycles;

I The only way to increase c(BG (π)) is to split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Therefore, for all π in S±n :

srd(π) ≥ n + 1− c(BG (π)).

89

Page 161: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Good and bad cycles

I However, we cannot always split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Hence the inequality: we can split “good” cycles, and we cannotsplit “bad” cycles;

I standard terminology: “good” = oriented, “bad” = unoriented;

90

Page 162: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Good and bad cycles

I However, we cannot always split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Hence the inequality: we can split “good” cycles, and we cannotsplit “bad” cycles;

I standard terminology: “good” = oriented, “bad” = unoriented;

91

Page 163: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Good and bad cycles

I However, we cannot always split cycles:

π′2i π′2i+1 π′2j π′2j+1

· · ·π′2i π′2j π′2i+1 π

′2j+1

· · ·

I Hence the inequality: we can split “good” cycles, and we cannotsplit “bad” cycles;

I standard terminology: “good” = oriented, “bad” = unoriented;

92

Page 164: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Handling good cycles

I Things are actually more complicated than that;

I Even when we don’t have bad cycles, the order in which we dothings matters!

Example (careless and careful cycle splitting)

0 6 5 10 9 2 1 7 8 3 4 11

(bad)−→ 0 1 2 9 10 5 6 7 8 3 4 11

0 6 5 10 9 2 1 7 8 3 4 11

(good)−→ 0 6 5 10 9 8 7 1 2 3 4 11

93

Page 165: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Handling good cycles

I Things are actually more complicated than that;

I Even when we don’t have bad cycles, the order in which we dothings matters!

Example (careless and careful cycle splitting)

0 6 5 10 9 2 1 7 8 3 4 11

(bad)−→ 0 1 2 9 10 5 6 7 8 3 4 11

0 6 5 10 9 2 1 7 8 3 4 11

(good)−→ 0 6 5 10 9 8 7 1 2 3 4 11

94

Page 166: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Handling good cycles

I Things are actually more complicated than that;

I Even when we don’t have bad cycles, the order in which we dothings matters!

Example (careless and careful cycle splitting)

0 6 5 10 9 2 1 7 8 3 4 11

(bad)−→ 0 1 2 9 10 5 6 7 8 3 4 11

0 6 5 10 9 2 1 7 8 3 4 11

(good)−→ 0 6 5 10 9 8 7 1 2 3 4 11

95

Page 167: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Handling good cycles

I Things are actually more complicated than that;

I Even when we don’t have bad cycles, the order in which we dothings matters!

Example (careless and careful cycle splitting)

0 6 5 10 9 2 1 7 8 3 4 11

(bad)−→ 0 1 2 9 10 5 6 7 8 3 4 11

0 6 5 10 9 2 1 7 8 3 4 11

(good)−→ 0 6 5 10 9 8 7 1 2 3 4 11

96

Page 168: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Handling bad cycles

I “Bad” cycles are not so bad if we can make them “good” (seeprevious example);

I But sometimes we can’t:

Example (a minimal permutation with only bad cycles)

0 5 6 3 4 1 2 7

97

Page 169: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Components

I Although reversals only modify the “contents” of a single cycle,they may also modify the configuration of some other cycles;

I This suggests that cycles are not the right “unit” to deal with;

I We need to consider collections of cycles, or components instead;

98

Page 170: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Interleaving cycles

I Grey edge π′i , π′j spans the interval [i , j ] in π′;I Two grey edges interleave if their spans intersect properly;

I Two cycles interleave if they contain interleaving edges;I The interleaving graph Iπ is defined by:

I V (Iπ) = cycles of BG (π);I E (Iπ) = pairs of interleaving cycles in BG (π);

I A component of the breakpoint graph is a connected componentof the interleaving graph;

99

Page 171: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

The interleaving graph Iπ

(source: [Hannenhalli and Pevzner, 1999])

100

Page 172: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Good and bad components

I A component is bad if it contains only bad cycles, and goodotherwise;

I Although we must be careful (as seen before), good componentsare not a problem:

I they contain good cycles, which can be split;I applying a signed reversal on a 2-cycle C reverses the orientation of

the cycles interleaving with C (and also changes their interleavingrelationships);

I so we just need to make sure at each step that we can keep splittingcycles afterwards;

101

Page 173: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

Hurdles

I “Bad” (unoriented) components are called hurdles and are aproblem;

I There are two ways of getting rid of them:

1. “cutting” them;2. “merging” them;

I Either way, one move must be wasted for each hurdle to turn theminto “good” components;

I Therefore, for all π in S±n :

srd(π) ≥ n + 1− c(BG (π)) + h(π).

102

Page 174: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

9. In the breakpoint graph of

〈3,−6,−9, 8,−7, 4,−10,−5, 11, 2, 1〉 :

I Identify good and bad cycles, components and hurdlesI Give a lower-bound for its signed reversal distance and an

optimal sorting scenario

0 1 23 45 6 7 8 9101112 131415 161718 1920 21 22 23

Page 175: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

An actual formula

I All this (and many concealed details) leads to a formula for thesigned reversal distance:

Theorem ([Hannenhalli and Pevzner, 1999])

For all π in S±n :

srd(π) = n + 1− c(BG (π)) + h(π)︸︷︷︸number of hurdles

+ f (π)︸︷︷︸special “fortress” case

.

103

Page 176: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Motivation

I We saw a model for representing genomes without directionality;

I We saw another model for taking directionality into account;

I Both of them lack realism in a crucial way: they don’t allowduplications;

I And duplications / insertions / deletions account for a very largepart of what happens in evolution [Ohno, 1970];

2

Page 177: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Two examples of duplications

Example (tandem duplications)

(source: K. Aainsqatsi on Wikimedia)

Example (whole genome duplication)

(source: Eric Lyons on CoGePedia)

3

Page 178: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Strings

I Since duplications pervade genomes, we should take them intoaccount;

I We now see genomes as strings on an alphabet Σ;

I Be careful: similar segments have been identified, soΣ = segments and not A,C ,G ,T;

I Our goal is still to explain evolution using most parsimoniousscenarios made of fixed transformations;

5

Page 179: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Strings

I Note: the restriction to sorting problems does not work anymore;I if you have two A’s, which one should be “number one”?

I So we really are interested in transforming one string into another,which is not equivalent to sorting another string;

I Sorting problems have been considered in that model, but they’rejust a special case of a more general problem;

6

Page 180: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Strings

I We can distinguish between several approaches based on genecontents;

I Either we have exactly the same contents in both genomes (andduplications are of course allowed);

I Or we have duplications but with different amounts of repetitions(e.g. three 1’s in genome A but only two in genome B);

I This time the breakpoint graph cannot save us anymore, since wewould not know how to connect elements or decompose the graph;

7

Page 181: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Balanced strings

I The number of occurrences of a character c in a string S isdenoted by occ(c ,S);

Definition (balanced strings)

Two strings S and T on an alphabet Σ are balanced if:

∀ c ∈ Σ : occ(c ,S) = occ(c ,T ).

I Basically, S and T are anagrams;

I Straightforward generalisation of permutations: we haveduplications, but we actually still have the same content in bothgenomes;

8

Page 182: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Comparing balanced strings

I One way of relating genomes’ contents is to identify commonsegments;

I In other words, we want to partition genomes into the same set ofsegments;

I this is how we obtained (signed) permutations;I but now we want to partition the resulting sequences;

9

Page 183: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Generalising breakpoints

I Recall that, for permutations:I adjacencies are pairs of adjacent elements in π that are also

adjacent in ι = 〈1 2 · · · n〉 (or χ = 〈n n − 1 · · · 1〉 for reversals);I breakpoints are pairs that are not adjacencies;

I Recall that, for signed permutations:I adjacencies are pairs of adjacent elements in π that are also

adjacent in ι = 〈1 2 · · · n〉 (or χ = 〈−n − (n − 1) · · · − 1〉 forsigned reversals);

I breakpoints are pairs that are not adjacencies;

I Those can be generalised to any pair of permutations;

I And we can do the same thing for strings;

10

Page 184: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Minimum common string partition

I A partition of a string S is a set of strings that can beconcatenated to obtain S ;

I A common partition of two strings S and T is a set of stringsthat can be concatenated to obtain both S and T ;

Example (common string partitions)

Here’s a common partition of “dictionary” and “indicatory”:

d i c t i o n a r y

S1 S2 S3 S4 S5 S6 S7

i n d i c a t o r y

S3 S5 S1 S6 S2 S4 S7

11

Page 185: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Minimum common string partition

I A common string partition is minimum if there is no smallercommon string partition for the two strings under consideration;

I This leads to the following decision problem:

Problem (minimum common string partition (mcsp))

Instance: balanced strings S and T , a bound k ∈ N;Question: is there a common partition of S and T with at most kblocks?

12

Page 186: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

Relation(s) to rearrangement problems

I Recall that breakpoints were pairs of elements adjacent in onegenome but not in the other;

I Common string partitions generalise that point of view to anarbitrary number of elements in each part;

I So if we have a minimum common string partition for S and T , weget the number of breakpoints between strings S and T ;

13

Page 187: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

StringsOther models

Alternative approachesBeyond pairwise comparisons

Duplications in evolutionBalanced stringsGeneral strings

About mcsp

I Bad news about mcsp:I NP-hard, even if only one gene family is

nontrivial [Blin et al., 2004];I APX-hard, even if no character appears more than

twice [Goldstein et al., 2005];

I Good news about mcsp:I fixed parameter tractable: a solution of size k can be found in time

f (k) · poly(n) (n = |S | = |T |) [Bulteau and Komusiewicz, 2014];

I Greedy approach [Goldstein and Lewenstein, 2011]: repeatedlyselect an LCS without any marked letter;

X simple and fast (runs in O(n) time);× approximation ratio between Ω(n0.43) and O(n0.69)

[Kaplan and Shafrir, 2006];

14

Page 188: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Practice

10. Give pairwise MCSP distances between the following strings(ignore capitals and whitespaces)

I Arrange A StringI A Staring RangerI A Garnering Tsar

11. Find a pair of strings for which Greedy behaves as badly aspossible (i.e. maximizing the approximation ratio)

Page 189: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

References I

Akers, S. B., Krishnamurthy, B., and Harel, D. (1987).

The star graph: An attractive alternative to the n-cube.In ICPP’87, pages 393–400. Pennsylvania State University Press.

Bafna, V. and Pevzner, P. A. (1996).

Genome rearrangements and sorting by reversals.SIAM Journal on Computing, 25(2):272–289.

Bafna, V. and Pevzner, P. A. (1998).

Sorting by transpositions.SIAM Journal on Discrete Mathematics, 11(2):224–240.

Berman, P., Hannenhalli, S., and Karpinski, M. (2002).

1.375-approximation algorithm for sorting by reversals.In ESA’02, volume 2461 of LNCS, pages 200–210. Springer-Verlag.

Bulteau, L., Fertin, G., and Rusu, I. (2012).

Sorting by transpositions is difficult.SIAM Journal on Discrete Mathematics, 26(3):1148–1180.

Bulteau, L., Fertin, G., and Rusu, I. (2015).

Pancake flipping is hard.Journal of Computer and System Sciences, 81(8):1556–1574.

Caprara, A. (1999).

Sorting permutations by reversals and eulerian cycle decompositions.SIAM Journal on Discrete Mathematics, 12(1):91–110 (electronic).

104

Page 190: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

References IIChen, X. (2010).

On sorting permutations by double-cut-and-joins.In COCOON’10, volume 6196 of LNCS, pages 439–448. Springer-Verlag.

Christie, D. A. (1996).

Sorting permutations by block-interchanges.Information Processing Letters, 60(4):165–169.

Dias, Z. and Meidanis, J. (2002).

Sorting by prefix transpositions.In SPIRE’02, volume 2476 of LNCS, pages 65–76. Springer-Verlag.

Elias, I. and Hartman, T. (2006).

A 1.375-approximation algorithm for sorting by transpositions.IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(4):369–379.

Fischer, J. and Ginzinger, S. W. (2005).

A 2-approximation algorithm for sorting by prefix reversals.In ESA’05, volume 3669 of LNCS, pages 415–425. Springer-Verlag.

Gog, S. and Bader, M. (2008).

Fast algorithms for transforming back and forth between a signed permutation and its equivalent simplepermutation.Journal of Computational Biology, 15(8):1–13.

Hannenhalli, S. and Pevzner, P. A. (1999).

Transforming cabbage into turnip: Polynomial algorithm for sorting signed permutations by reversals.Journal of the ACM, 46(1):1–27.

105

Page 191: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

PermutationsSigned permutations

Signed reversals

References III

Kececioglu, J. and Sankoff, D. (1995).

Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement.Algorithmica, 13(1-2):180–210.

Knuth, D. E. (1995).

Sorting and Searching, volume 3 of The art of Computer Programming.Addison-Wesley.

Watterson, G. A., Ewens, W. J., Hall, T. E., and Morgan, A. (1982).

The chromosome inversion problem.Journal of Theoretical Biolooy, 99:1–7.

106

Page 192: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

Algorithms and Bioinformatics

Part II — Comparative Genomics

II.2 — Focus on MCSP and FPT

Laurent BulteauPresented at WABI 2013

Workshop on Algorithms in Bioinformatics

Page 193: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Fixed-Parameter Algorithm forMinimum Common String Partition with

Few Duplications

Laurent Bulteau, Christian Komusiewicz,

Guillaume Fertin, Irena Rusu

WABI 2013

Full version available at arxiv.org/abs/1307.7842

Page 194: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Outline

1 MCSP Problem

2 O(d2kn) FPT algorithm

3 Implementation

4 Conclusion

2/21

Page 195: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

MCSP Problem

3/21

Page 196: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solution

d e e a b d a b d d e b d e a be a b a b d d e e a b d d e b d4/21

Page 197: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solution

d e e a b d | a b d | d e b d | e a be a b | a b d | d e e a b d | d e b d4/21

Page 198: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solutiond e e a b d | a b d | d e b d | e a be a b | a b d | d e e a b d | d e b d4/21

Page 199: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solutiond e e a b d | a b d | d e b d | e a be a b | a b d | d e e a b d | d e b dn = 18 elements, d = 5, k = 4

4/21

Page 200: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solutiond e e a b d a b d d e b d e a be a b a b d d e e a b d d e b dCan be seen as a matching problem

4/21

Page 201: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solution

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

Unbalanced strings: allow deletion of abundant elements between

blocks (all rare elements are kept)

4/21

Page 202: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solution

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

abundantrare

Unbalanced strings: allow deletion of abundant elements between

blocks (all rare elements are kept)

4/21

Page 203: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

Input: two strings of length nOutput: min. set of blocks partitioning both sequences

Parameters:

d = max. number of occurences of each letter

k = number of blocks in an optimal solution

d e e a b c d | a | b c d | d e b d | e a | a

e a | e c | b c d | d e e a b c d | d e b d

4/21

Page 204: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

NP-hard

FPT algorithms d!kpoly(n)k = number of blocks; d = number of duplications

FPT algorithm with parameter k only k21k2poly(n)

Improved FPT algorithm with k and d d2kpoly(n)+ Allows for unbalanced strings

+ Implemented

Assignment of orthologous genes via genome rearrangement

X. Chen, J. Zheng, Z. Fu, P. Nan, Y. Zhong, S. Lonardi and T. Jiang 2005

5/21

Page 205: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

NP-hard

FPT algorithms d!kpoly(n)k = number of blocks; d = number of duplications

FPT algorithm with parameter k only k21k2poly(n)

Improved FPT algorithm with k and d d2kpoly(n)+ Allows for unbalanced strings

+ Implemented

Minimum common string partition problem: Hardness and

approximations

A. Goldstein, P. Kolman and J. Zheng 2005

5/21

Page 206: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

NP-hard

FPT algorithms d!kpoly(n)k = number of blocks; d = number of duplications

FPT algorithm with parameter k only k21k2poly(n)

Improved FPT algorithm with k and d d2kpoly(n)+ Allows for unbalanced strings

+ Implemented

Minimum common string partition parameterized

P. Damaschke 2008

Minimum common string partition revisited

H. Jiang, B. Zhu, D. Zhu and H. Zhu 2012

5/21

Page 207: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

NP-hard

FPT algorithms d!kpoly(n)k = number of blocks; d = number of duplications

FPT algorithm with parameter k only k21k2poly(n)

Improved FPT algorithm with k and d d2kpoly(n)+ Allows for unbalanced strings

+ Implemented

Minimum Common String Partition Parameterized by

Partition Size is Fixed-Parameter Tractable

L. Bulteau, C. Komusiewicz Arxiv 2013

5/21

Page 208: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Minimum Common String Partition

NP-hard

FPT algorithms d!kpoly(n)k = number of blocks; d = number of duplications

FPT algorithm with parameter k only k21k2poly(n)

Improved FPT algorithm with k and d d2kpoly(n)+ Allows for unbalanced strings

+ Implemented

This talk

5/21

Page 209: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

O(d 2kn) FPT algorithm

6/21

Page 210: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 211: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 212: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 213: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 214: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 215: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 216: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 217: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Seeds

Optimal solution OPT, seen as a matching

One block ↔ one seed

Goal: nd a good set of seeds

Every rare element must be in the same block as a seed:

the set is complete

Two seeds cannot be in the same block:

the set is non-redundant

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

7/21

Page 218: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅

Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 219: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 220: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 221: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 222: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 223: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 224: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 225: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 226: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 227: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 228: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 229: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 230: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 231: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 232: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 233: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Algorithm Outline

Start with set of seeds T = ∅Identify set of pairs containing 1 correct seed

Try each candidate in the set as a new seed, start again

No more possible set:

The set is complete, create a CSP corresponding to the seeds.

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

8/21

Page 234: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (1)

Non-redundancy

In OPT, every (x , y) which is not in the same block as an already

present seed can be a new seed...

Every x which cannot be in the same block as any seed can be

part of a new seed.

y is one of the d copies of x in the other sequence: each such

pair is a candidate

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

9/21

Page 235: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (1)

Non-redundancy

In OPT, every (x , y) which is not in the same block as an already

present seed can be a new seed...

Every x which cannot be in the same block as any seed can be

part of a new seed.

y is one of the d copies of x in the other sequence: each such

pair is a candidate

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

9/21

Page 236: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (1)

Non-redundancy

In OPT, every (x , y) which is not in the same block as an already

present seed can be a new seed...

Every x which cannot be in the same block as any seed can be

part of a new seed.

y is one of the d copies of x in the other sequence: each such

pair is a candidate

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

9/21

Page 237: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (1)

Non-redundancy

In OPT, every (x , y) which is not in the same block as an already

present seed can be a new seed...

Every x which cannot be in the same block as any seed can be

part of a new seed.

y is one of the d copies of x in the other sequence: each such

pair is a candidate

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

9/21

Page 238: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 239: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 240: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 241: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 242: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 243: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 244: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 245: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Finding new seeds (2)

Association graph

Link together pairs (x , y) that can be in the same block as a seed

Green edge: (x , y) can be part of the right-side of a seed

Red edge: (x , y) can be part of the left-side of a seed

Abundant vertices can be ignored.

Degree-0 vertices can be part of new seeds (rule 1).

What about degree-1 and degree-2 vertices?

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

10/21

Page 246: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ dOther half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2.

1 3 5 7 9 2 4 6 8x x11/21

Page 247: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ d

Other half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2.

c1

c3

c5

c7

c9

c2

c4

c6

c8

x x

11/21

Page 248: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ d

Other half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2.

c1

c3

c5

c7

c9

c2

c4

c6

c8

x x

11/21

Page 249: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ d

Other half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2.

1 3 5 7 9 2 4 6 8XXX Xx x11/21

Page 250: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ dOther half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2.

1 3 5 7 9 2 4 6 8 XXx x11/21

Page 251: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Odd Paths

Path of odd length

If (c1, c2, c3, . . . , c2p+1) is an odd path with c1 being rare

One of c2i+1 | i = 1..p is part of a seed.

NB: p ≤ dOther half of the seed: any letter c in the other sequence.

Total number of seeds to try: at most d2. 1 3 5 7 9 2 4 6 8 11/21

Page 252: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

So far...

We know how to deal with single elements and rare odd paths.

What if our association graph has only even paths, cycles, and

abundant odd paths?

Then we can create a CSP based on our seeds!

12/21

Page 253: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

So far...

We know how to deal with single elements and rare odd paths.

What if our association graph has only even paths, cycles, and

abundant odd paths?

Then we can create a CSP based on our seeds!

12/21

Page 254: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Constructing the solution

No more seeds can be found...

Create one block per seed

Extend the blocks to the left and right (details omitted)

Delete remaining (abundant) elements

Output the solution!

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

13/21

Page 255: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Constructing the solution

No more seeds can be found...

Create one block per seed

Extend the blocks to the left and right (details omitted)

Delete remaining (abundant) elements

Output the solution!

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

13/21

Page 256: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Constructing the solution

No more seeds can be found...

Create one block per seed

Extend the blocks to the left and right (details omitted)

Delete remaining (abundant) elements

Output the solution!

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

13/21

Page 257: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Constructing the solution

No more seeds can be found...

Create one block per seed

Extend the blocks to the left and right (details omitted)

Delete remaining (abundant) elements

Output the solution!

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

13/21

Page 258: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Constructing the solution

No more seeds can be found...

Create one block per seed

Extend the blocks to the left and right (details omitted)

Delete remaining (abundant) elements

Output the solution!

d e e a b c d a b c d d e b d e a a

e a e c b c d d e e a b c d d e b d

13/21

Page 259: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Theoretical Time Complexity

Time complexity:d or d2 choices for each of the k seeds: d2k−1 branches

Computing the graph, looking for paths, etc.: O(kn)Overall: O(d2kn)

The algorithm may try to optimize a second objective value.

Many duplicates, few blocks: from d2 to 3dkIf an odd path visits many times the same intervals, keep only

≤ 3k candidate seeds.

x u v a1 b a2 b a3 b a4 b a5 b a6 b a7 w y

x u v a b a b a b a b a b a w y

14/21

Page 260: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Theoretical Time Complexity

Time complexity:d or d2 choices for each of the k seeds: d2k−1 branches

Computing the graph, looking for paths, etc.: O(kn)Overall: O(d2kn)

The algorithm may try to optimize a second objective value.

Many duplicates, few blocks: from d2 to 3dkIf an odd path visits many times the same intervals, keep only

≤ 3k candidate seeds.

x u v a1 b a2 b a3 b a4 b a5 b a6 b a7 w y

x u v a b a b a b a b a b a w y

14/21

Page 261: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Theoretical Time Complexity

Time complexity:d or d2 choices for each of the k seeds: d2k−1 branches

Computing the graph, looking for paths, etc.: O(kn)Overall: O(d2kn)

The algorithm may try to optimize a second objective value.

Many duplicates, few blocks: from d2 to 3dkIf an odd path visits many times the same intervals, keep only

≤ 3k candidate seeds.

x u v a1 b a2 b a3 b a4 b a5 b a6 b a7 w y

x u v a b a b a b a b a b a w y

14/21

Page 262: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Implementation

15/21

Page 263: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a u v b

a u v b

16/21

Page 264: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a u v b

a u v b

a’

a’

16/21

Page 265: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

u a v

w a x

16/21

Page 266: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a

a

u v

w x

Decrease k by 116/21

Page 267: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a

a

u v x w

u v w x

Decrease k by 216/21

Page 268: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a

a

u v w x

u v w x

Decrease k by 316/21

Page 269: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a

a

u v

u v

Leave k unchanged16/21

Page 270: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

| a | a

a

16/21

Page 271: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

| | a

a

a’ a

a

16/21

Page 272: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a a

| a | | a |

16/21

Page 273: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

a

| a | | |

a a’

| a | |a’ |

16/21

Page 274: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Reduction rules

Reduction Rule 1Merge strings with unique letters

Reduction Rule 3Keep best match (one unique letter)

Reduction Rule 2Remove obvious size-1 blocks

Reduction Rule 4Keep best match (≤ 2 occurences)

Reduction of genome size: from 40% to 85%

16/21

Page 275: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Running time on biological data

Genomic sequences from: Borrelia burgdorferi, Treponema pallidum,

Escherichia coli, Bacillus subtilis, and Bacillus thuringiensis

Average letter occurences: from 1.02 to 1.24

Species 1 Species 2 n k d t (s)

B. burg. T. pall. 93 68 3 0.06

B. burg. E. coli 72 59 6 0.22

B. burg. B. sub. 91 63 6 0.15

B. burg. B. thur. 71 51 5 0.09

T. pall. E coli 93 78 5 0.35

T. pall. B. sub. 144 82 7 0.18

T. pall. B. thur. 128 76 6 0.15

E. coli B. sub. 287 234 7 41.06

E. coli B. thur. 282 221 10 18.64

B. sub. B. thur. 693 340 8 249.71

EnsemblBacteria database17/21

Page 276: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Running time on biological data

Genomic sequences from: Borrelia burgdorferi, Treponema pallidum,

Escherichia coli, Bacillus subtilis, and Bacillus thuringiensis

Average letter occurences: from 1.02 to 1.24

Species 1 Species 2 n k d t (s)

B. burg. T. pall. 93 68 3 0.06

B. burg. E. coli 72 59 6 0.22

B. burg. B. sub. 91 63 6 0.15

B. burg. B. thur. 71 51 5 0.09

T. pall. E coli 93 78 5 0.35

T. pall. B. sub. 144 82 7 0.18

T. pall. B. thur. 128 76 6 0.15

E. coli B. sub. 287 234 7 41.06

E. coli B. thur. 282 221 10 18.64

B. sub. B. thur. 693 340 8 249.71

EnsemblBacteria database17/21

Page 277: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Running time on synthetic data

Generated sequences of size n = 1000

Average letter occurences: from 2 to 4

k t (s)

d = 6 d = 8

50 0.06 0.07

60 0.06 0.06

70 0.07 0.08

80 0.09 0.09

90 0.10 0.12

100 0.12 0.16

110 0.13 0.26

120 0.18 1.62

130 0.21 30.42

18/21

Page 278: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Conclusion

19/21

Page 279: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Conclusion

Practical FPT algorithms for MCSP with association graph:

O(d2kn

)

O((3dk)kn

)

Open questions:

Extend algorithm to signed strings?

Practical running time with high number of duplications?

Constant-ratio approximation?

20/21

Page 280: Algorithms and Bioinformaticsigm.univ-mlv.fr/~bulteau/teaching/ens/slides1.pdf · Introduction Permutations Context and motivations Comparing genomes Genome rearrangements I Our problem:

MCSP Problem O(d2kn) FPT algorithm Implementation Conclusion

Conclusion

Practical FPT algorithms for MCSP with association graph:

O(d2kn

)

O((3dk)kn

)

Open questions:

Extend algorithm to signed strings?

Practical running time with high number of duplications?

Constant-ratio approximation?

20/21