12
UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism RECOMB2004 João Meidanis

UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Embed Size (px)

Citation preview

Page 1: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMPINSTITUTO DE COMPUTAÇÃO

Cleber V. G. Mira

Analysis of Sorting by Transpositions based on Algebraic Formalism

RECOMB2004

João Meidanis

Page 2: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Genomes as Permutations

● Permutation● Genome

1 2n

12 n

( 1 2 ... n)(n ... 2 1)

Reverse Complementary Strands

Reverse Complementary Cycles

Page 3: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Working with Transpositions

● Since we are woking with transpositions, we will consider only one of the strands:

= ( 1 2 ... n)

● Circular order:

(i) = i+1 (n) = 1

1 2 ... k =

● Sorting by transpositions:

Page 4: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Product of Permutations

= ( 3 2 5 1)

= (6 4 2 )

E = {1, 2, 3, 4, 5, 6}

(1) = 1 (1) = 3

(3) = 3 (3) = 2 (2) = 6 (6) = 6 (6) = 4 (4) = 4

} (4) = 2 (2) = 5

(5) = 5 (5) = 1

= (1 3 2 6 4 5)

Page 5: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Applying a Transposition

(i j k) (1 ... i ... j-1 j ... k-1 k ... n) =

(1 ... i-1 j ... k-i i ... j-1 k ... n)

In the Algebraic approach:

(i, j, k) = (i j k)

(1, 4, 5) = (4 1 5) = ( 4 3 2 1 5 )

= (4 1 5) ( 4 3 2 1 5 ) = ( 1 4 3 2 5 )

Page 6: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

2-cycle decomposition

● Every permutation has a 2-cycle decomposition.

= (4 3 2 1 5) = (4 3)(3 2)(2 1)(1 5)

● Odd cycles have an even number of 2-cycles in their 2-cycle decomposition.

● The norm of is the minimum number of cycles in the 2-cycle decomposition of .

Page 7: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

3-cycle Decompositions

● Permutations whose norm is even has a minimum decomposition on 3-cycles.

● The 3-norm is the minimum number of cycles int 3-cycles decomposition of .

= (0 3 4 6 2 7 1 5 8) = (0 3 4)(4 6 2)(2 7 1)(1 5 8)

||3 = 4

Page 8: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Building a 3-cycle Decomposition

● It is possible to find a 3-cycle decomposition of through its 2-cycle decomposition.

= (0 3 4 6 2 7 1 5 8) = (0 3)(3 4)(4 6)(6 2)(2 7)(7 1)(1 5)(5 8)

= (0 3 4 6 2 7 1 5 8) = (0 3 4)(4 6 2)(2 7 1)(1 5 8)

(0 3)(3 4) = (0 3 4) (4 6)(6 2) = (4 6 2)

(2 7)(7 1) = (2 7 1) (1 5)(5 8) = (1 5 8)

Page 9: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Lower Bound

● The 3-norm of a permutation is lower bound for the transposition distance of .

1 2 ... k =

1 2 ... k = -1

k ≥ | -1 |3

Dt( , ) ≥ | -1 |

3

Page 10: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Splits● A split is a transposition which is not aplicable to

the genome , i.e. the product of this transposition and the genome is not a genome.

Ex.: (1 2 3) is not applicable to (0 3 4 6 2 7 1 5 8) since:

(1 2 3)(0 3 4 6 2 7 1 5 8) = (0 1 5 8)(2 7)(3 4 6)

Page 11: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Split+Transposition Distance

● If we consider the problem of sorting genomes by splits besides transpositions then the split+transposition distance of a genome to is:

Dst( , ) = | |

3

Page 12: UNIVERSIDADE ESTADUAL DE CAMPINAS - UNICAMP INSTITUTO DE COMPUTAÇÃO Cleber V. G. Mira Analysis of Sorting by Transpositions based on Algebraic Formalism

Bibliography

● V. Bafna and P. A. Pevzner, 1995, Sorting by Transpositions. In: “Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms”, San Francisco, USA, pp. 614-623

● J. Meidanis and Z. Dias, 2000, An Alternative Algebraic Formalism for Genome Rearrangements. In: “Comparative Genomics:Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment and Evolution of Gene Families” D. Sankoff and J.H. Nadeau, editors, pp. 213-223