Pattern matching for permutations

  • Published on
    02-Jul-2016

  • View
    215

  • Download
    0

Embed Size (px)

Transcript

<ul><li><p>Information :z+ezssing </p><p>ELSEVIER Information Processing Letters 65 ( 1998) 277-283 </p><p>Pattern matching for permutations Prosenjit Bose a,2, Jonathan F. Buss b~3, Anna Lubiw b,* </p><p>a Department of Computer Science, McGill University, Montreal, Canada H3A 2A7 b Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3GI </p><p>Received 26 January 1996 Communicated by F. Dehne </p><p>Abstract </p><p>Given a permutation T of 1 to n, and a permutation P of 1 to k, for k &lt; n, we wish to find a k-element subsequence of T whose elements are ordered according to the permutation P. For example, if P is ( 1,2, . . . , k), then we wish to find an increasing subsequence of length k in T; this special case was done in time O( n log log n) by Chang and Wang. We prove that the general problem is NP-complete. We give a polynomial time algorithm for the decision problem, and the corresponding counting problem, in the case that P is separable - i.e., contains neither the subpattem (3,1,4,2) nor its reverse, the subpattem (2,4, 1,3). @ 1998 Published by Elsevier Science B.V. </p><p>Keywords: Algorithms; Computational complexity; Pattern matching; Permutation; Separable permutation </p><p>1. Introduction </p><p>The pattern matching problem for permutations </p><p>is the following: We are given a permutation T = </p><p>(t1,t2,..* , tn) of 1 to n, which we call the text, and a permutation P = (~1, ~2, . . . , pk) of 1 to k, k &lt; n, which we call the pattern. We wish to know whether there is a length k subsequence of T, say </p><p>T = (fil,fi2, . . ..tik). with il &lt; i2 &lt; . &lt; ik, such that the elements of T are ordered according to the permutation P - i.e., tir &lt; ti, iff pr &lt; ps. If T does contain such a subsequence, we will say </p><p>that T contains P, or that P matches into T. In the special case when P is the identity, we are asking for an increasing subsequence of length k in T. This case can be solved in time 0( n logn) </p><p>[Il. Herb Wilf spoke at the 1992 SIAM Discrete Math </p><p>meeting about counting pattern matchings of permu- tations. He asked whether there is an algorithm to de- cide the pattern matching problem for permutations that runs faster than exponential time. </p><p>This note contains one negative and one positive result. We show that the general decision problem is NP-complete, and the counting problem is #P- complete. (See [5, Section 7.31, for the notion of #P-completeness.) We give a polynomial time algo- rithm for the decision and counting problems in case the oattern P contains neither the Dattern (3. 1.4.2) </p><p>* Corresponding author. Email: alubiw@uwaterloo.ca. Work supported in part by NSERC. A preliminary version ap- </p><p>peared in: E Dehne, J.-R. Sack, N. Santoro, S. Whitesides (Eds.), Algorithms and Data Structures, Lecture Notes in Computer Sci- ence, vol. 709, Springer, Berlin, 1993, pp. 200-209. </p><p>* Email: jit@muff.cs.mcgill.ca. 3 Email: jfbuss@uwaterloo.ca. </p><p>_ I nor the pattern (2,4,1,3). Such a pattern P is called separable, for reasons given in Section 3. </p><p>0020-0190/98/$19.00 @ 1998 Published by Elsevier Science B.V. All rights reserved PII s0020-0190(97)00209-3 </p></li><li><p>278 P: Bose et al. /Information Processing Letters 65 (1998) 277-283 </p><p>There has been considerable interest in counting the number of permutations T that do not contain a smaller permutation P. Knuth [ 8, Section 2.2.11 showed that the number of permutations of 1, . . , R without the pattern (3, 1,2) is equal to the nth Catalan number. Lovasz [ 9, p. 271 considered permutations without the pattern (2, 1,3). Rotem [ 111 considered permutations without either of the patterns (2,3, 1) or (3, 1,2). Simion and Schmidt [ 131 go further, counting the number of permutations avoiding all the patterns in any subset of the permutations on 1, 2, 3. </p><p>More recently, there have been results on 4 element patterns: West [ 161 showed that the number of sepa- rable n-permutations is the (II - 1) st Schrijder num- ber. He used a general technique of generating trees, which also allowed him to count the number of permu- tations that do not contain various pairs of 3- and/or 4-permutations [ 171. Further work has been done by Stankova [ 141. </p><p>There has also been interest in characterizing the permutations T that do not contain a smaller per- mutation P in terms of how T can be sorted. Knuth [ 8, Section 2.2.11, showed that permutations without the pattern (2,3,1) are precisely the permutations sortable using one stack. This idea of characterizing permutations sortable via some network of queues and stacks was pursued further by Tarjan [ 151, Even and Itai [4], and Pratt [ lo]. An analogous characterization of separable permutations will be given. </p><p>2. The NP-completeness result </p><p>Theorem 1. The pattern matching problem for per- mutations is NP-complete. </p><p>Proof. We will reduce 3-SATISFIABILITY to this problem. Suppose we have an instance of 3-SATIS- FIABILITY with variables ~1,. . . , ZQ and clauses </p><p>Cl,. . . , cm where each clause Ci is the disjunction of three literals, and each literal is a variable or the negation of a variable. Cl </p><p>We will first construct a pattern and a text with repeated elements, and later show how to eliminate the need for the repetitions. When the pattern has some element repeated, then copies of the repeated </p><p>element must match to the copies of some repeated element in the text. In other words, using the notation of the introduction, we add the condition that ti, = ti, iff pr = ps. For example, the pattern ( 1,3, 1,2) matches into the text ( 1,3,2,6,3,4,5) (in particular, it matches to (3,6,3,4) ) but does not match into the text (1,4,2,3). </p><p>The pattern P will have a substring (i.e., contiguous subsequence) PV for the variables, and a substring PC, for each of the clauses. Similarly, the text T will have a substring TV for the variables, and a substring T, for each clause. We will join these substrings of P and of T one after another, with guards in-between to ensure that elements of one of the substrings of the pattern can only match to elements of the corresponding substring of the text. The construction of these guards will be described later. </p><p>Construct PV = (1,2,. . . ,I + l), and TV = (2,1,4,3,6,5,. ..,21,21 - 1,2Z + 1). Note that a match of PV into TV will force i in PV to match with either 2i or 2i - 1 in TV, for 1 6 i 6 1. The former will correspond with setting the variable ui TRUE, and the latter with setting it FALSE. Also, 1 + 1 in PV must match with 21 + 1 in TV; these values will separate the variables from other values in the other substrings. </p><p>Consider the clause Ci containing variables Ui, , uh, Uij in either positive or negated form, with il &lt; i2 &lt; is.ConstructPc,=(Z+i+l,it,i2,i3,Z+i+l).For j = 1,2,3, let tj be 2ij if variable ui, appears in Ci positively, and 2ij - 1 otherwise; and let fj be the other of the two values (i.e., fj is 2ij - 1 if variable ui, appears in Ci positively, and 2ij otherwise). Let bi = 2Zt 7 (i - 1) + 1. There are seven truth assign- ments to the variables that make the clause true; these assignments correspond to the seven sequences si,i = </p><p>(t1,t2,t3), si,2 = (tlvt29f3), si,3 = (t1,f2,13). si,4 = </p><p>(tlyfzvf3), Si,5 = (f13t2,t3), si,6= (fl,tz,f3), and si,7 = (fl, f2, t3). The text string Tc, comprises these seven sequences, each bracketed by a pair of equal values; i.e., Tc, is the sequence </p><p>(bi+ l,si,t,bi+ l,bi+2,Si,2,bi+2,bi+3, </p><p>si,39 bi + 3,. . . 3 bi +7,Si,7,bi +7). </p><p>For example, consider the clauses Ct = ( u1 vu2VUq) and Cz = (I&amp; V 23 V us) over variables (~1, . . _ , us}. The corresponding patterns and text strings are </p></li><li><p>l? Bose et al. /Information Processing Letters 65 (1998) 277-283 219 </p><p>Pv=(1,2,3,4,5,6), </p><p>T=(2,1,4,3,6,5,8,7,10,9,11), </p><p>PC, = (7,1,2,4,7), </p><p>Tc,=(12,2,4,8,12,13,2,4,7,13,14,2, </p><p>3,8,14,15,2,3,7,15,16,1,4,8, </p><p>16,17,1,4,7,17,18,1,3,8,18), </p><p>Pc,=(8,2,3,5,8), </p><p>Tc2 = ( 19,4,6,10,19,20,4,6,9,20,21,4, </p><p>5,10,21,22,4,5,9,22,23,3,6,10, </p><p>23,24,3,6,9,24,25,3,5,10,25). </p><p>In any matching of PC, into Tc, that respects a match- ing of Pv into Tv, the occurrences of 1 + i + 1 must match with one of the bracket pairs, and thus (it, i2, i3) must be one of the sequences Si,k. </p><p>Claim 2. The set of clauses {Ci} is satisjable if and only if there is a simultaneous matching of Pv into Tv and each PC, into Tc8. </p><p>Proof. A matching of Pv into Tv corresponds to a truth-value assignment to the variables. By the above construction, this matching can be extended to a matching of PC, into Tc, if and only if the correspond- ing assignment satisfies C;. Cl </p><p>Two issues remain: how to put guards between the segments of P and of T to ensure that they remain separate; and how to eliminate the need for repeated elements in P and T. </p><p>We consider the guards first. Recall that m is the number of clauses and 1 the number of variables. Let P be the sequence (1 + m + 2, Pv, t + m + 2, PC,, I+m+2,Pc,,1+m+2,. . . ,Z+m+2,Pc,), followed by 3m + 1 copies of 1 + m + 2. Let T be the sequence (21+7m + 2,Tv,21+ 7m +2,Tc,,21+7m+ ~,Tc,, 21+ 7m + 2,. . . , Tc,, ) , followed by 3m + 1 copies of 21+ 7m + 2. Observe that any matching of P into T must match the element 1+ m + 2 of P to the element 21+ 7m + 2 of T, because there are 4m + 2 copies of 1+ m + 2 in P, and no value except 21-t 7m + 2 appears more than 4m + 1 times in T. Since there are the same number of occurrences of Z+ m + 2 in P as there are of 2tf 7m + 2 in T, they do perform as guards, ensuring that Pv matches into Tv and each PC, matches into Tc,. </p><p>Finally, let us turn to the question of eliminating re- peated elements. It is easier to describe the construc- tion if we allow the pattern and the next to each be a sequence of distinct numbers, not necessarily integers. Then to obtain a permutation, simply replace each el- ement by its ordinal. </p><p>Instead of using a singleton in Pv for each vari- able, which must map to one of two possible values in Tv, we will use a pair in Pv which must map to one of two possible pairs in Tv. Occurrences of the vari- able in clauses will then be modelled as values inter- mediate to the corresponding pair, thus avoiding the need to duplicate values. Construct Pv = ( 1,2, . . . , 2Z- 1,21,2l+ l&gt;,andTv= (3,4,1,2,7,8,5,6 ,..., 4Z-1,4Z-3,4Z-2,4Z+l).ObservethatifP~matches into Tv, then the pair 2i - 1, 2i must match with either 4i - 1,4i or 4i - 3,4i - 2, which we will interpret as setting variable u; TRUE or FALSE respectively. </p><p>Consider a clause Ci containing variables ui,, ui2, Ui3 in either positive or negated form, with il &lt; i2 &lt; i3. Construct PC; = (21 + 2i + 1, &amp;,I, Ui,2, Ui,3,21 + 2i), where Ui,_/ is a number strictly between 2ij - 1 and 2ij, larger than all values previously chosen in that interval For j = 1,2,3, let 6 be the open interval (4ij - 1,4ij) if variable Uij appears in Ci positively, and the interval (4ij - 3,4ij - 2) otherwise; and let </p><p>fli be the other of the two intervals. Let tj,l, tj,2, tj,3 and tj,4 be values in the interval ?k and let fj,t, fj,2 </p><p>and fj,3 be values in the interval fj; all these values are to be distinct from each other and larger than all values previously chosen in their respective intervals. Let bi=4Z+ 14(i- 1) + 1. </p><p>For the seven satisfying assignments of Ci we choose the seven sequences si,t = (tl.1, t2.1, t3,1), </p><p>~2 = (t1.2, t2,2, f3,1), si,3 = (t1,3, f2,1, t3.21, . . ., and Si.7 = ( f 1,3, f2,3, t3,4). The text segment Tci is the sequence </p><p>(bi + 2, si,l T bi + 1, bt + 4, si.2, bi + 3, bi + 6, </p><p>Si,s~bi+5~... ,bi+ 14,Si,T,bi+ 13). </p><p>In any matching of PC, into Tc, that respects a match- ing of Pv into Tv, the occurrences of 21+ 2i + 1 and 21+ 2i must match with one of the bracket pairs, and thus (il, i2, i3) must be one of the sequences Si,k, </p><p>Claim 3. The set of clauses { Ci} is satisfiable @there is a matching of Pv into Tv and each Pc, into Tc,. </p></li><li><p>280 I? Bose et al. /Information Processing Letters 65 (1998) 277-283 </p><p>Separating Tree of Permutation: 34128567 </p><p>Permutation Graph </p><p>Positive Node </p><p>165 4 3 3 41 285 6 7 </p><p>Fig. 1. </p><p>Proof. It remains to implement the guards without the use of duplicate elements. Note that the parts of P so far, (Pv,Pc,,.. . , PC, ) , form a permutation of 1, . . . ,21+ 5m + 1, and the corresponding parts of T form a permutation of 1, . . . ,4Z + 35m + 1. Augment P to a permutation of 1, . . . ,4Z + 36m + 3 by inserting the decreasing subsequence of 21 + 31m + 2 values 41+ 36m + 3,. . . ,2Z + 5m + 2, using the first m + 1 values as guards - the first before Pv, the second before PC,, etc. - and putting the remaining 21 + 31m + 2 values at the end of P. Similarly, augment T to a permutation of 1,. . . ,6Z + 66m + 3 by inserting the decreasing subsequence of 2E + 31m + 2 values 61+ 66m+3,. . . ,41+ 35m + 2, using the first m + 1 values as guards, and putting the remaining 21 + 31m + 2 values at the end of T. </p><p>If the last element of P maps to the last element of T, then, since there are an equal number of larger elements in P as in T, these large elements act as guards. If the last element of P maps to a different one of the large trailing elements of T, then there are insufficiently many larger elements in T to ac- commodate the larger elements of P. The final pos- sibility is that the last element of P maps to one of the smaller 4Z+ 35m + 1 values of T. But then the trailing 2Zf 31m + 2 values of T cannot be used in the match, leaving only 41 + 36m + 2 values - not enough to match P, which has 41 + 36m + 3 ele- ments. 0 </p><p>Corollary 4. Counting the number of matchings of a pattern into a text is #P-complete. </p><p>Proof. In the reduction above, the matchings of P into T are in one-to-one correspondence with the satisfying assignments of the original formula. 0 </p><p>3. A good algorithm for separable permutations </p><p>A pattern P is said to be separable if it contains neither the subpattern (3, 1,4,2) nor the subpattern (2,4, 1,3). We begin with a lemma that clarifies the sense of the term separable, and a discussion of the relationship between separable permutations and a class of graphs called Pd-free graphs. </p><p>3.1. Separable permutations </p><p>A separating tree for a permutation (~1, . _ . , pk) of 1,. . . , k is an ordered binary tree 7 with leaves (~1, . . . ,pk) in that order, such that for each node V, if the leaves of the subtree rooted at V are pi,pi+l, . . . ,pi+j, then the set of num- </p><p>bers {PivPi+t, *. . ,pi+j} is a subrange of the range 1 li </p><p>..I k - i.e., is of the form {Z,Z + 1,. . . ,E + j}, 1, 1 + j &lt; k. This subrange is called the range of </p><p>the node V. If V is a node of the tree with left child VL and right child VR, then the above condition implies that either the range of V just precedes the range of VR, or vice versa. In the first case V is called a positive node, and in the second case a negative node. </p><p>Lemma 5. A pattern P = (~1, . . . , pk) is separable iff it has a separating tree. </p><p>The separating tree need to be unique. For example, the permutation (4,5,3,1,2) has the two separating </p><p>trees (((4,5),3), (1,2)) and ((4,5), (3, (1,2))). See Fig. 1 for an example of a separating tree. </p><p>A graph-theoretic formulation of this result is well- kn...</p></li></ul>