Whole mirror duplication-random loss model and pattern avoiding permutations

  • Published on
    26-Jun-2016


Transcript

  • Information Processing Letters 110 (2010) 474-480

    www.elsevier.com/locate/ipl

    Jean-Luc Baril *, Rémi Vernay

    LE2I UMR-CNRS 5158, Université de Bourgogne, B.P. 47 870, 21078 Dijon Cedex, France

    Article history: Received 29 June 2009; received in revised form 10 February 2010; accepted 20 April 2010; available online 21 April 2010. Communicated by A.A. Bertossi.

    Keywords: Algorithms; Combinatorial problems; Pattern avoiding permutation; Whole duplication-random loss model; Genome; Generating algorithm; Binary reflected Gray code

    Abstract

    In this paper we study the problem of the whole mirror duplication-random loss model in terms of pattern avoiding permutations. We prove that the class of permutations obtained with this model after a given number $p$ of duplications of the identity is the class of permutations avoiding the alternating permutations of length $2^p + 1$. We also compute the number of duplications necessary and sufficient to obtain any permutation of length $n$. We provide two efficient algorithms to reconstitute a possible scenario of whole mirror duplications from the identity to any permutation of length $n$. One of them uses the well-known binary reflected Gray code (Gray, 1953) [10]. Other related models are also considered.

    © 2010 Elsevier B.V. All rights reserved. 0020-0190/$ - see front matter. doi:10.1016/j.ipl.2010.04.016

    * Corresponding author. E-mail addresses: barjl@u-bourgogne.fr (J.-L. Baril), remi.vernay@u-bourgogne.fr (R. Vernay).

    1. Introduction and notation

    The well-known genome duplication consists in copying a part of the original genome inserted into itself, followed by the loss of one copy of each of the duplicated genes (see [2,9,11,14,17] for an explanation of different methods of duplication). From a formal point of view, a genome of $n$ genes is represented by a permutation of length $n$. In a previous article, Chaudhuri et al. [7] investigated a variant called the tandem duplication-random loss model: the duplicated part (of size $K$) of the genome is inserted immediately after the original portion, followed by the loss procedure. This model comes from evolutionary biology where it has been applied to the vertebrate mitochondrial genomes. Chaudhuri et al. introduce a notion of distance between two genomes and they provide an algorithm to compute it efficiently for certain regions of the parameter space. Bouvel and Rossin [5] have also studied this model. They proved that the class of permutations obtained from the identity after $p$ steps (of width $K$) is also a class of pattern avoiding permutations. More particularly, they investigate the restricted case of a whole duplication (W-duplication for short): the whole duplication consists in copying the entire permutation on its right, and the loss procedure consists in deleting one of the two copies of each gene. Here, we give an example of the process of a W-duplication followed by the loss procedure on the permutation 123456:

    123456 → 1 2 3 4 5 6 1 2 3 4 5 6 (duplication)
           → 124635 (loss procedure).

    So, they prove that the class of permutations obtained after $p$ W-duplications is the class of permutations avoiding all minimal permutations with $2^p$ descents, minimal in the sense of the pattern-involvement relation on permutations. Moreover, they computed the number of duplication-loss steps

  • J.-L. Baril, R. Vernay / Information Processing Letters 110 (2010) 474-480

    of width $K$ necessary and sufficient to obtain any permutation. More recently, Bouvel and Pergola [4] showed a local and simpler characterization and several properties of the set of minimal permutations with $2^p$ descents.

    In this article, we focus on the whole mirror duplication-random loss model (WM-duplication for short): it consists in copying the mirror of the permutation on its right followed by the loss procedure. This model very likely occurs in one half of eubacterial genomes, and possibly in most chromosomes [8,15]. Fig. 1 illustrates a WM-duplication for the permutation $\pi = 5421673$:

    5421673 → 5 4 2 1 6 7 3 3 7 6 1 2 4 5 (mirror duplication)
            → 5463712 (loss procedure).

    Fig. 1. A WM-duplication of the permutation 5421673. Encircled points are deleted by the loss procedure.

    Let $\mathcal{S}_n$ denote the set of $n$-length sequences $s = s_1 s_2 \ldots s_n$ of positive integers. The mirror of $s$ is $\bar{s} = s_n s_{n-1} \ldots s_1$. A subsequence of $s$ is a sequence $s_{i_1} s_{i_2} \ldots s_{i_m}$ for $1 \le i_1 < i_2 < \cdots < i_m \le n$; a substring is a subsequence of consecutive elements. A descent of $s$ is a position $i$ such that $s_i > s_{i+1}$. A run up (resp. a run down) of $s$ is a substring (of length at least one) in which the elements are in increasing order (resp. decreasing order). More generally, a run up (or run down) is said to be maximal when it cannot be extended into a longer run up (resp. run down) in the sequence. We refer the reader to Rodney and Wilf [6] for results concerning the enumeration of permutations that have a given number of runs up and down. For instance, if $s = 5467312$ then position 4 is a descent, the substring 46 is a run up, and 467 is a maximal run up. Note that 5 is also a maximal run up. A valley of $s$ is a substring which is a run down followed by a run up, each of length at least two, i.e. a substring $s_k s_{k+1} \ldots s_\ell$, $1 \le k < \ell \le n$, such that there is $j$ ($k < j < \ell$) verifying $s_k > s_{k+1} > \cdots > s_j$ and $s_j < s_{j+1} < \cdots < s_\ell$. A valley is maximal if the substring is maximal for this property. In the above example, the substrings 5467 and 7312 are the only maximal valleys of $s$. We denote by $\mathrm{val}(s)$ the number of maximal valleys in $s$, i.e. the cardinality of the set $\{j : s_j < \min\{s_{j-1}, s_{j+1}\}\}$. A recursive formula enumerating permutations with a given number of valleys can be found in [16].

    On the other hand, a sequence $s$ of length $n$ is a permutation whenever each $s_i$ is a distinct member of the set $[n] = \{1, 2, \ldots, n\}$. In the sequel, permutations will be denoted by Greek letters: $\pi, \sigma, \tau, \ldots$. Let $S_n$ be the set of all permutations of length $n$ ($n \ge 1$). In relation to the previous definition, any permutation contains at most $\lfloor (n-1)/2 \rfloor$ valleys. A permutation $\pi \in S_n$ is alternating if $\pi_1 > \pi_2 < \pi_3 > \pi_4 < \pi_5 > \cdots$. In the literature [1], alternating permutations are also called down-up permutations and are enumerated by the Euler numbers (A000111 [19]). For instance, the permutation 324165 is alternating. Notice that an alternating permutation of length $n$ contains exactly $\lfloor (n-1)/2 \rfloor$ valleys.

    A permutation $\tau$ of length $k$, $k \le n$, is a pattern of a permutation $\pi \in S_n$ if there is a subsequence of $\pi$ which is order-isomorphic to $\tau$; i.e., if there is a subsequence $\pi_{i_1} \pi_{i_2} \ldots \pi_{i_k}$ of $\pi$ (with $1 \le i_1 < i_2 < \cdots < i_k \le n$) such that $\pi_{i_\ell} < \pi_{i_m}$ whenever $\tau_\ell < \tau_m$. We write $\tau \prec \pi$ to denote that $\tau$ is a pattern of $\pi$. A permutation $\pi$ that does not contain $\tau$ as a pattern is said to avoid $\tau$. For example, $\pi = 1423$ contains the patterns 132, 312 and 123, but avoids the pattern 321. The class of all permutations avoiding the patterns $\tau_1, \tau_2, \ldots, \tau_k$ is denoted $S(\tau_1, \tau_2, \ldots, \tau_k)$, and $S_n(\tau_1, \tau_2, \ldots, \tau_k)$ denotes the set of permutations of length $n$ avoiding $\tau_1, \tau_2, \ldots$ and $\tau_k$. We also say that $S(\tau_1, \tau_2, \ldots, \tau_k)$ is a class of pattern-avoiding permutations of basis $\{\tau_1, \tau_2, \ldots, \tau_k\}$. A class $C$ of permutations is stable for $\prec$ if, for any $\pi \in C$ and any $\tau \prec \pi$, we also have $\tau \in C$. We now formulate a remark that is crucial for the present study.

    Remark 1. If a class $C$ of permutations is stable for $\prec$ then $C$ is also a class of pattern avoiding permutations of basis $B = \{\sigma \notin C : \tau \in C \text{ for all } \tau \prec \sigma,\ \tau \ne \sigma\}$.

    The paper is organized as follows. In Section 2, we prove that the class of permutations obtained from the identity after a given number $p$ of whole mirror duplications is the class of permutations with at most $2^{p-1} - 1$ valleys. This is the class of permutations avoiding the alternating permutations of length $2^p + 1$. Moreover, we obtain the length of a shortest path between any permutation and the identity. In Section 3, we give two algorithms (and their complexity) which construct such a shortest path. One of them uses an efficient algorithm for generating the well-known binary reflected Gray code [3,10]. In Section 4, we give some results about other models of duplications using W- and WM-duplications.

    Fig. 2. The three non-isomorphic configurations in the proof of Lemma 1.
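The definitions above (mirror, W- and WM-duplication with loss, valleys) can be made concrete with a short Python sketch. It is not taken from the paper: the function names and the `keep_left` parameter, which encodes the loss procedure as the set of genes whose first copy survives, are assumptions made for illustration. The kept sets used in the usage lines are one choice consistent with the two results quoted in the text (124635 and 5463712); the encircled points of Fig. 1 are not recoverable from the transcript.

```python
def mirror(seq):
    """Return the mirror of a sequence: s_n s_{n-1} ... s_1."""
    return list(reversed(seq))


def w_duplication(perm, keep_left):
    """Whole duplication followed by the loss procedure.

    keep_left (hypothetical encoding): values kept in the first copy;
    every other value is kept in the second copy.
    """
    first_copy = [v for v in perm if v in keep_left]
    second_copy = [v for v in perm if v not in keep_left]
    return first_copy + second_copy


def wm_duplication(perm, keep_left):
    """Whole mirror duplication followed by the loss procedure.

    Same encoding as above, but the second copy is the mirror of perm.
    """
    first_copy = [v for v in perm if v in keep_left]
    mirror_copy = [v for v in mirror(perm) if v not in keep_left]
    return first_copy + mirror_copy


def valleys(seq):
    """val(s): number of interior positions j with s_j < min(s_{j-1}, s_{j+1})."""
    return sum(1 for j in range(1, len(seq) - 1)
               if seq[j] < min(seq[j - 1], seq[j + 1]))


if __name__ == "__main__":
    print(w_duplication([1, 2, 3, 4, 5, 6], {1, 2, 4, 6}))    # [1, 2, 4, 6, 3, 5]
    print(wm_duplication([5, 4, 2, 1, 6, 7, 3], {5, 4, 6}))   # [5, 4, 6, 3, 7, 1, 2]
    print(valleys([5, 4, 6, 7, 3, 1, 2]))                     # 2
```

Encoding the loss procedure as a set keeps each function a pure transformation of the permutation, so every possible outcome of one duplication step corresponds to exactly one subset of the genes.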

    2. Pattern avoiding permutations and the mirror duplication-random loss model

    In this section we study the WM-duplication random loss model in terms of pattern avoiding permutations. We establish that the class $C(p)$ obtained from the identity after $p$ WM-duplications is exactly the class of permutations avoiding all alternating permutations of length $2^p + 1$.
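The link between the valley bound and the avoided alternating patterns can be checked by brute force on small cases. The sketch below is illustrative only, and all function names are hypothetical. It relies on the observation that a subsequence is order-isomorphic to an alternating permutation exactly when the subsequence itself has the down-up shape, so the alternating patterns never need to be listed explicitly.

```python
from itertools import combinations, permutations


def valleys(seq):
    """Number of interior positions j with seq[j] < min(seq[j-1], seq[j+1])."""
    return sum(1 for j in range(1, len(seq) - 1)
               if seq[j] < min(seq[j - 1], seq[j + 1]))


def is_alternating(seq):
    """Down-up shape: seq[0] > seq[1] < seq[2] > seq[3] < ..."""
    return all((seq[i] > seq[i + 1]) if i % 2 == 0 else (seq[i] < seq[i + 1])
               for i in range(len(seq) - 1))


def contains_pattern(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        sub = [perm[i] for i in idx]
        if all((sub[a] < sub[b]) == (pattern[a] < pattern[b])
               for a in range(k) for b in range(a + 1, k)):
            return True
    return False


def avoids_alternating(perm, m):
    """True if perm avoids every alternating permutation of length m."""
    return not any(is_alternating([perm[i] for i in idx])
                   for idx in combinations(range(len(perm)), m))


def check_characterisation(n, p):
    """Compare 'at most 2^(p-1) - 1 valleys' with 'avoids the alternating
    permutations of length 2^p + 1' on every permutation of length n."""
    bound = 2 ** (p - 1) - 1
    return all((valleys(perm) <= bound) == avoids_alternating(perm, 2 ** p + 1)
               for perm in permutations(range(1, n + 1)))


if __name__ == "__main__":
    print(contains_pattern([1, 4, 2, 3], [3, 1, 2]))
    print(all(check_characterisation(n, p) for n in range(1, 7) for p in (1, 2)))
```

The exhaustive loop is exponential in the length and is only meant for small sanity checks, not as an efficient membership test.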

    Lemma 1. Let $\pi$ and $\sigma$ be two different permutations such that $\sigma \prec \pi$. Then $\pi$ contains at least as many valleys as $\sigma$ does.

    Proof. It is sufficient to check the result for $\pi \in S_n$ and $\sigma \in S_{n-1}$ since a straightforward induction will complete the proof. Let $\pi \in S_n$ and $\sigma \in S_{n-1}$ be such that $\sigma \prec \pi$; then $\sigma$ is order-isomorphic to a subsequence of $\pi$ obtained from $\pi$ by deleting only one entry of $\pi$. We distinguish three non-isomorphic configurations illustrated in Fig. 2. More precisely, if we delete the encircled value in configuration (b), this reduces the number of valleys of $\pi$. This does not occur for (a) and (c), where the number of valleys remains invariant after deletion of the encircled value. In all cases, deleting a value in $\pi$ cannot increase the number of valleys.

    Lemma 2. A permutation obtained from the identity after a given number $p$ of WM-duplications contains at most $2^{p-1} - 1$ valleys.

    Proof. We obtain the proof by induction. The result holds for $p = 1$; indeed a mirror duplication of the identity cannot create a valley. Now, let us assume that each permutation obtained from the identity after $(p - 1)$ mirror duplications contains at most $2^{p-2} - 1$ valleys. Let $\pi$ be a permutation obtained from the identity after $p$ mirror duplications. Then $\pi$ is obtained from a permutation $\sigma$ with at most $2^{p-2} - 1$ valleys after exactly one mirror duplication. Therefore $\pi$ can be written as the concatenation of two subsequences of $\sigma$ and $\bar{\sigma}$: i.e. $\pi = \alpha\beta$ where $\alpha$ (resp. $\beta$) is a subsequence of $\sigma$ (resp. $\bar{\sigma}$). According to Lemma 1, $\alpha$ and $\beta$ each contain at most $2^{p-2} - 1$ valleys. As the concatenation of $\alpha$ and $\beta$ can possibly create one valley between them, $\pi$ contains at most $2 \cdot (2^{p-2} - 1) + 1 = 2^{p-1} - 1$ valleys, which completes the induction.

    Theorem 1. The class $C(p)$ of permutations obtained from the identity after a given number $p$ of mirror duplications is the class of permutations with at most $2^{p-1} - 1$ valleys.

    Proof. After considering Lemma 2, it suffices to prove that any permutation $\pi$ with at most $2^{p-1} - 1$ valleys can be obtained from the identity after $p$ mirror duplications. We proceed by induction on $p$. Indeed, let $\pi$ be a permutation with $k$ valleys, $2^{p-2} - 1 < k \le 2^{p-1} - 1$. Then $\pi$ can be written $\pi = \alpha\beta$ where $\alpha$ corresponds to the longest prefix of $\pi$ containing exactly $2^{p-2} - 1$ valleys and $\beta$ is the remaining suffix. We decompose the permutation $\alpha = u_1 d_1 u_2 d_2 \ldots u_\ell d_\ell$, where $u_i$ and $d_i$ are respectively runs up and down defined as follows: $u_1$ is the first run up except its top value; $d_1$ is the run down just after $u_1$; $u_2$ is the run up just after $d_1$ except its top value, and so on. Notice that $u_1$ can be empty, which does not occur for $d_\ell$. Remark that we have $\ell = 2^{p-2}$ and thus $k < 2\ell$. For example, $\alpha = 5421673$ has the decomposition: $u_1$ is empty, $d_1 = 5421$, $u_2 = 6$ and $d_2 = 73$. Let also $u_{\ell+1} d_{\ell+1} \ldots u_k d_k \ldots u_{2\ell} d_{2\ell}$ be the similar decomposition for $\beta$. In this decomposition, the runs $u_i$ and $d_i$ are empty for $i > k$.

    Now let us perform the following process: we sort in decreasing order the values that appear in $d_\ell$ or $u_{\ell+1}$, which creates a run down $D_\ell$; we sort in increasing order the values in $u_\ell$ or $d_{\ell+1}$, which creates a run up $U_\ell$; we construct the sequence $S = U_\ell D_\ell$. We sort into a decreasing sequence $D_{\ell-1}$ the values in $d_{\ell-1}$ or $u_{\ell+2}$, and so on. At each step $j$, we insert the obtained ordered sequences $U_j$ and $D_j$ at the beginning of $S$. The permutation $S = U_1 D_1 \ldots U_{\ell-1} D_{\ell-1} U_\ell D_\ell$ obtained at the end of this process contains at most $2^{p-2} - 1$ valleys. See Fig. 3 for an example of construction of $S$. Thus, by induction, $S$ can be obtained from the identity after $(p - 1)$ mirror duplications. By construction $\pi$ is reached by one mirror duplication of $S$. Indeed, from any $U_i$ (resp. $D_i$), $i \le \ell$, we can reconstitute the corresponding $u_i$ and $d_{2\ell-i+1}$ (resp. $d_i$ and $u_{2\ell-i+1}$). Therefore $\pi$ can be constructed from the identity with $p$ mirror duplications.

    Notice that the permutation $\pi$ is decomposed into a partition of runs up and down $u_j$ and $d_j$. We will use this decomposition in Section 3.2 to reconstitute a path of WM-duplications from the identity to $\pi$.

    Corollary 1. Let $\pi$ be a permutation and $\mathrm{val}(\pi)$ the number of its valleys. In the mirror duplication model, $\lceil \log_2(\mathrm{val}(\pi) + 1) \rceil + 1$ steps are necessary and sufficient in order to obtain $\pi$ from the identity permutation.

    Proof. Let $p$ be the integer such that $2^{p-2} - 1 < \mathrm{val}(\pi) \le 2^{p-1} - 1$, i.e. $p = \lceil \log_2(\mathrm{val}(\pi) + 1) \rceil + 1$. According to Theorem 1, $p$ steps are sufficient to obtain $\pi$ from the identity. They are also necessary since $\pi$ contains at least $2^{p-2}$ valleys, which means that $\pi$ cannot be obtained from the identity in at most $(p - 1)$ steps.

    In the next part we will provide algorithms in order to reconstitute a shortest path between the identity and a given permutation.
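The step count of Corollary 1 can be compared with an exhaustive search on short permutations. The following Python sketch is a sanity check under stated assumptions, not one of the paper's algorithms: `successors` enumerates every outcome of one WM-duplication by trying all subsets of genes kept in the first copy, and `reachable_levels`, `steps_needed` and `check_corollary` are hypothetical helper names. The integer identity $\lceil \log_2(v+1) \rceil = $ `v.bit_length()` for $v \ge 0$ is used to avoid floating-point logarithms.

```python
from itertools import permutations


def valleys(seq):
    """Number of interior positions j with seq[j] < min(seq[j-1], seq[j+1])."""
    return sum(1 for j in range(1, len(seq) - 1)
               if seq[j] < min(seq[j - 1], seq[j + 1]))


def steps_needed(perm):
    """ceil(log2(val(perm) + 1)) + 1, computed with integers only."""
    return valleys(perm).bit_length() + 1


def successors(perm):
    """All permutations obtained from perm by one WM-duplication."""
    n = len(perm)
    result = set()
    for mask in range(1 << n):
        # Bit i of mask set: the first copy of perm[i] survives the loss.
        left = tuple(v for i, v in enumerate(perm) if (mask >> i) & 1)
        kept = set(left)
        right = tuple(v for v in reversed(perm) if v not in kept)
        result.add(left + right)
    return result


def reachable_levels(n, max_steps):
    """levels[p] = permutations obtained from the identity after p steps."""
    levels = [{tuple(range(1, n + 1))}]
    for _ in range(max_steps):
        levels.append(set().union(*(successors(s) for s in levels[-1])))
    return levels


def check_corollary(n):
    """Compare the brute-force step count with the formula for length n."""
    max_steps = ((n - 1) // 2).bit_length() + 1
    levels = reachable_levels(n, max_steps)
    for perm in permutations(range(1, n + 1)):
        first = next((p for p in range(1, max_steps + 1) if perm in levels[p]),
                     None)
        if first != steps_needed(perm):
            return False
    return True


if __name__ == "__main__":
    print(steps_needed([5, 4, 6, 3, 7, 1, 2]))
    print(all(check_corollary(n) for n in range(1, 6)))
```

Because keeping every gene in the first copy leaves a permutation unchanged, the levels are nested, so the first level containing a permutation is its distance from the identity counted from one step upwards.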

    [...] for the class of permutations having at most $p$ valleys. Notice that, with [16,12,13], a bivariate generating function for this class is:

    1

    y(1 1

    y+ 1

    y

    y 1

    tan(x

    y 1+ arctan(

    1y 1

    ))),

    where the coefficient of $x^n y^k$ is the number of permutations of length $n$ with at most $k$ valleys.

    Theorem 3. The class of permutations having at most $p$ valleys is the class of permutations avoiding the alternating permutations of length $2p + 3$.

    Proof. The proof is obtained mutatis mutandis.

    [...]

    Here we explain how we can establish an algorithm to construct a scenario of WM-duplications from the identity to $\pi \in S_n$ by reconstructing this path from $\pi$, i.e. at each step we find the predecessor of the current permutation, and the process finishes when the identity is reached. We partition the permutation $\pi$ as follows: $\pi = u_1 d_1 u_2 d_2 \ldots u_k d_k$ where $u_1 = \pi_1 \ldots \pi_{i_1}$ is the maximal run up containing the first entry $\pi_1$; $d_k$ is the maximal run down containing $\pi_n$; $d_1$ is the maximal run down containing $\pi_{i_1+1}$; $u_k$ is the maximal run up just before $d_k$, i.e. the value just before $d_k$ in $\pi$ belongs to $u_k$; we continue this process by alternating runs up and down until the permutation $\pi$ is entirely partitioned. Notice that this decomposition is not the same as the one used in the proof of Theorem 1. For $i$ from 1 to $\ell = \lfloor (k+1)/2 \rfloor$, we denote by $U_i$ (resp. $D_i$) the increasing (resp. decreasing) order sorting of $u_i$ and $d_{k-i+1}$ (resp. $d_i$ and $u_{k-i+1}$). Let $\sigma = U_1 D_1 U_2 D_2 \ldots U_\ell D_\ell$ be the concatenation of all sorted [...]

    Fig. 3. Decomposition $\pi = \alpha\beta$ and the permutation $S$ obtained after the process [...]: $u_1 = 10$, $d_1 = 13\ 9\ 6$, $u_2$ is empty, $d_2 =$...
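The two-ended partition described above can be sketched in Python as follows. This is one reading of the truncated description, not the authors' algorithm: the peeling order (run up on the left, run down on the right, run down on the left, run up on the right, repeated) and all function names are assumptions. Each round pairs a left run with a right run and sorts them into $U_i$ and $D_i$; the set `kept` records the genes taken from the left runs, so that one WM-duplication of the returned predecessor gives back the input.

```python
from itertools import permutations


def valleys(seq):
    """Number of interior positions j with seq[j] < min(seq[j-1], seq[j+1])."""
    return sum(1 for j in range(1, len(seq) - 1)
               if seq[j] < min(seq[j - 1], seq[j + 1]))


def take_prefix(seq, increasing):
    """Split seq into (maximal monotone prefix, remainder)."""
    i = 1
    while i < len(seq) and (seq[i - 1] < seq[i]) == increasing:
        i += 1
    return seq[:i], seq[i:]


def take_suffix(seq, increasing):
    """Split seq into (remainder, maximal monotone suffix)."""
    if not seq:
        return [], []
    i = len(seq) - 1
    while i > 0 and (seq[i - 1] < seq[i]) == increasing:
        i -= 1
    return seq[:i], seq[i:]


def predecessor(perm):
    """Return (sigma, kept): perm is one WM-duplication of sigma, keeping
    the first copy of the genes in kept and the mirror copy of the others."""
    rest = list(perm)
    sigma, kept = [], set()
    while rest:
        u, rest = take_prefix(rest, True)         # run up on the left
        rest, d_right = take_suffix(rest, False)  # run down on the right
        d, rest = take_prefix(rest, False)        # run down on the left
        rest, u_right = take_suffix(rest, True)   # run up on the right
        kept.update(u, d)
        sigma += sorted(u + d_right) + sorted(d + u_right, reverse=True)
    return sigma, kept


def wm_duplication(perm, keep_left):
    """One WM-duplication with the loss procedure encoded by keep_left."""
    return ([v for v in perm if v in keep_left]
            + [v for v in reversed(perm) if v not in keep_left])


def scenario(perm):
    """Permutations visited from the identity up to perm."""
    path = [list(perm)]
    while path[-1] != sorted(perm):
        path.append(predecessor(path[-1])[0])
    return path[::-1]


def check_all(n):
    """Check every step and the path length on all permutations of length n."""
    identity = list(range(1, n + 1))
    for perm in map(list, permutations(identity)):
        sigma, kept = predecessor(perm)
        if wm_duplication(sigma, kept) != perm:
            return False
        steps = len(scenario(perm)) - 1
        if perm != identity and steps != valleys(perm).bit_length() + 1:
            return False
    return True


if __name__ == "__main__":
    for step in scenario([5, 4, 6, 3, 7, 1, 2]):
        print(step)
    print(all(check_all(n) for n in range(1, 7)))
```

In this reading, every round except the last consumes at least two valleys of the input while creating at most one valley in the predecessor, which is why the number of valleys is at least halved at each step.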
