Parameterized Pattern Matching by Boyer-Moore-type Algorithms, Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550. Brenda S. Baker. Advisor: Prof. R. C. T. Lee. Speaker: Kuei-hao Chen.


Parameterized Pattern Matching by Boyer-Moore-type Algorithms

Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550

Brenda S. Baker

Advisor: Prof. R. C. T. Lee

Speaker: Kuei-hao Chen


Let us consider two strings:

A=a1a2a3a4a5=xaxby

B=b1b2b3b4b5=bacbc

If the edit distance concept is used, A may be transformed to B by substituting a1 by b1, a3 by b3 and a5 by b5.


In this paper, a new transformation is defined in which a character may be substituted by another character, but the substitution is global. That is, if x in A is substituted by a, then every x in A is substituted by a.


A=a1a2a3a4a5=xaxby

B=b1b2b3b4b5=bacbc

Consider the above example again. To transform A to B, the first x must be substituted by b. But this substitution is global. Thus, A becomes

A' = babby

The third character of A' is b, while b3 = c. It can easily be seen that, if this kind of substitution is used, A = xaxby cannot be transformed to B.


For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
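A global substitution of this kind can be sketched in a few lines of Python. The helper name `substitute` is chosen for this illustration only. It relies on the standard `str.translate`, which replaces every occurrence of each listed character in a single pass.

```python
def substitute(s: str, mapping: dict[str, str]) -> str:
    """Replace every occurrence of each key character by its image."""
    return s.translate(str.maketrans(mapping))


A = "xaxby"

# Substituting only x by b changes both occurrences of x at once.
print(substitute(A, {"x": "b"}))            # babby

# Substituting x by b and y by c turns A into B = babbc.
print(substitute(A, {"x": "b", "y": "c"}))  # babbc
```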


We define a bijection to be a global substitution of a set of distinct characters into another set of characters.

A string P p-matches a string Q if P can be transformed to Q by a bijection.


Let

A=ababc

B=bcbcd

Then A p-matches B because there is a bijection, namely a → b, b → c, c → d, which transforms A to B.


On the other hand, for A=ababc and B=bcbdc, A does not p-match B.

It is actually easy to determine whether A p-matches B. Let A=a1a2…aN and B=b1b2…bN. A p-matches B if and only if, for every i with ai=x and bi=y, every j with aj=x also has bj=y.


For A=ababc and B=bcbcc, it can be seen that every a in A is matched with b and every b is matched with c. This is not true for A=ababc and B=bcbdc, where the b in position 2 is matched with c but the b in position 4 is matched with d.

Thus, given a string A and a string B which are of the same length, it is trivial to determine whether A p-matches B.
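The condition above can be checked in one left-to-right pass by recording the image of each character of A the first time it appears. The following is a minimal sketch; the name `p_matches` and the `one_to_one` flag are chosen for this illustration. The flag is an addition of the sketch, not part of the condition stated above: when set, it also rejects two different characters of A sharing the same image, as a bijection in the strict sense requires.

```python
def p_matches(a: str, b: str, one_to_one: bool = False) -> bool:
    """Check that equal characters of a always face equal characters of b."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}   # image of each character of a
    backward: dict[str, str] = {}  # used only when one_to_one is requested
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y:
            return False           # x was already matched with another character
        if one_to_one and backward.setdefault(y, x) != x:
            return False           # y is already the image of another character
    return True


print(p_matches("ababc", "bcbcd"))  # True
print(p_matches("ababc", "bcbdc"))  # False
print(p_matches("ababc", "bcbcc"))  # True
print(p_matches("ababc", "bcbcc", one_to_one=True))  # False
```

Each character is looked up a constant number of times, so the check is linear in the length of the strings when dictionary operations are treated as constant time.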


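There is another important property: if A p-matches B and B p-matches C, then A p-matches C. This holds because applying the substitution from A to B and then the substitution from B to C gives a single global substitution from A to C.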


This paper considers the following problem:

Given a text T and a pattern P, find all occurrences where P p-matches a substring of T.

For example, let

T=abcadbcbdabccacbd

and

P=abaec

We can see that P p-matches two substrings of T: S1=bcbda (positions 6 to 10) and S2=cacbd (positions 13 to 17).
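Before turning to Boyer-Moore-type shifts, the problem can be solved naively by testing every window of T. The sketch below is only a baseline for comparison, not the algorithm of the paper; the names `consistent` and `naive_p_search` are chosen for this illustration, and the test used is the one-pass check described earlier.

```python
def consistent(a: str, b: str) -> bool:
    """Equal characters of a always face equal characters of b."""
    image: dict[str, str] = {}
    return len(a) == len(b) and all(
        image.setdefault(x, y) == y for x, y in zip(a, b)
    )


def naive_p_search(text: str, pattern: str) -> list[int]:
    """Return the 1-based start positions where pattern p-matches text."""
    m = len(pattern)
    return [
        i + 1
        for i in range(len(text) - m + 1)
        if consistent(pattern, text[i:i + m])
    ]


T = "abcadbcbdabccacbd"
P = "abaec"
for start in naive_p_search(T, P):
    print(start, T[start - 1:start - 1 + len(P)])
# 6 bcbda
# 13 cacbd
```

This baseline examines every window, so it takes time proportional to the product of the two lengths in the worst case.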


For P=abaec and S2=cacbd, the substitution a → c, b → a, c → d, e → b will transform P to S2.

For S2=cacbd and S1=bcbda, the substitution c → b, b → d, d → a, a → c transforms S2 to S1.

It can be seen that P=abaec will be transformed to S1=bcbda by a → b, e → d, c → a, b → c.
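The third substitution is the composition of the first two, which is the transitivity property in action. A small sketch, with the substitution from P to S2 taken as a → c, b → a, c → d, e → b and the one from S2 to S1 as c → b, b → d, d → a, a → c; the helper names `compose` and `apply` are chosen for this illustration.

```python
def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Substitution obtained by applying first and then second."""
    return {src: second[mid] for src, mid in first.items()}


def apply(mapping: dict[str, str], s: str) -> str:
    """Apply a global substitution to every character of s."""
    return "".join(mapping[ch] for ch in s)


p_to_s2 = {"a": "c", "b": "a", "c": "d", "e": "b"}
s2_to_s1 = {"c": "b", "b": "d", "d": "a", "a": "c"}

p_to_s1 = compose(p_to_s2, s2_to_s1)
print(p_to_s1)                  # {'a': 'b', 'b': 'c', 'c': 'a', 'e': 'd'}
print(apply(p_to_s1, "abaec"))  # bcbda
```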


The substitution can be visualized as follows:

[Figure: the pattern P shown below the text T, with the substrings S1 and S2 of T marked.]


This paper is based upon Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm.


Good Suffix Rule 1 for p-match

Let T1 be the longest suffix of the window which p-matches a suffix P1 of P, and let y be the character preceding P1 in P. If there is a substring zP2 of P, the rightmost such one, where P2 p-matches P1 and z ≠ y (that is, zP2 does not p-match yP1), we can move P as follows:

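[Figure: before the shift, P1 is aligned with T1 at the right end of the window of T, with x preceding T1 and y preceding P1, and zP2 lies further left in P. After the shift, P2, preceded by z, is aligned with T1.]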


Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T: v v x x v v v v x x w v v w w

P: u u u v v v w w v v (P1 to P10)

P' (P transformed by v → x and w → v): u u u x x x v v x x

A p-mismatch occurs, so P is shifted.


Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T: v v x x v v v v x x w v v w w

P: u u u v v v w w v v (P1 to P10)

P' (P transformed): v v v x x w v v w w

After moving, we compare T and P from right to left and find that T6,15≡P1,10.


Good Suffix Rule 2 for p-match

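[Figure: before the shift, the suffix T1 of the window in T, preceded by x, is aligned with the suffix P1 of P, preceded by y. T1' and P1' are the corresponding suffixes of T1 and P1, and P2' is a prefix of P.]

Let T1 be the longest suffix of the window which p-matches a suffix P1 of P.

Let P1' be a suffix of P1 which p-matches a prefix P2' of P. If P1' exists, we move P as follows:

[Figure: after the shift, the prefix P2' of P is aligned with T1', the suffix of T1.]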


Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13

T: x x v v v v x x w v v w w

P: u v v v w w v v (positions 1 to 8)

P' (P transformed by v → x and w → v): u x x x v v x x

A p-mismatch occurs, so P is shifted to positions 3 to 10.


Position: 1 2 3 4 5 6 7 8 9 10 11 12 13

T: x x v v v v x x w v v w w

P: u v v v w w v v (positions 3 to 10)

P' (P transformed): u x x x v v x x


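The shift function Δ is

[Equation: piecewise definition of Δ combining Good Suffix Rule 1 and Good Suffix Rule 2, with a min and a max taken over positions in P, in terms of the pattern length m and the mismatch position j.]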
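One way to read the two rules together is as a brute-force search for the smallest shift that is still compatible with what has already been matched. The sketch below is written under that reading and is not the preprocessing algorithm of the paper, which is far more efficient. The names `p_match` and `good_suffix_shift` are hypothetical, positions are 1-based as on the slides, and p-matching is taken here to be a one-to-one renaming, which is an assumption of the sketch.

```python
def p_match(a: str, b: str) -> bool:
    """True if a one-to-one renaming of characters turns a into b."""
    if len(a) != len(b):
        return False
    fwd: dict[str, str] = {}
    bwd: dict[str, str] = {}
    for x, y in zip(a, b):
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def good_suffix_shift(p: str, j: int) -> int:
    """Shift after a p-mismatch at position j, with p[j+1..m] already p-matched."""
    m = len(p)
    for s in range(1, m):
        lo = j - s  # position of p that would move under the mismatch
        if lo >= 1:
            # Rule 1: the moved block p-matches the matched suffix, but the
            # two no longer p-match once the preceding character is included.
            if (p_match(p[lo:m - s], p[j:])
                    and not p_match(p[lo - 1:m - s], p[j - 1:])):
                return s
        elif p_match(p[:m - s], p[s:]):
            # Rule 2: a prefix of p p-matches a suffix of p.
            return s
    return m  # nothing is compatible: move p past the window


print(good_suffix_shift("ATCACATCATCA", 9))  # 5
```

For the pattern ATCACATCATCA with a mismatch at j = 9, the sketch returns a shift of 5, which moves the block ending at position 7 under the matched suffix. This is consistent with the values j = 9 and j' = 7 in the example that follows.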


Example

Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

T: G A T C G A T C A A T C A T A T C A T C A T

P: A T C A C A T C A T C A (P1 to P12)

Transform: A → C, T → A, C → T

P': C A T C T C A T C A T C

A p-mismatch occurs, with j=9 and j'=7, so P is shifted.


Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

T: G A T C G A T C A A T C A T A T C A T C A T

P: A T C A C A T C A T C A (P1 to P12)

Transform: A → C, T → A, C → T

P': C A T C T C A T C A T C

After the p-mismatch (j=9, j'=7), P is shifted and compared with T again.


Position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

T: G A T C G A T C A A T C A T A T C A T C A T

P: A T C A C A T C A T C A (P1 to P12)

Transform: A → T, T → C, C → A

P': T C A T A T C A T C A T

P' is identical to T11,22, so P p-matches T at positions 11 to 22.
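The whole example can be replayed with a small Boyer-Moore-type search loop. This is a sketch under stated assumptions, not the algorithm of the paper: the shift is recomputed by brute force at every mismatch instead of being preprocessed, p-matching is taken to be a one-to-one renaming, and all function names are hypothetical.

```python
def p_match(a: str, b: str) -> bool:
    """True if a one-to-one renaming of characters turns a into b."""
    if len(a) != len(b):
        return False
    fwd: dict[str, str] = {}
    bwd: dict[str, str] = {}
    for x, y in zip(a, b):
        if fwd.setdefault(x, y) != y or bwd.setdefault(y, x) != x:
            return False
    return True


def good_suffix_shift(p: str, j: int) -> int:
    """Brute-force shift after a p-mismatch at 1-based position j."""
    m = len(p)
    for s in range(1, m):
        lo = j - s
        if lo >= 1:
            # Rule 1: block p-matches the suffix, extended block does not.
            if (p_match(p[lo:m - s], p[j:])
                    and not p_match(p[lo - 1:m - s], p[j - 1:])):
                return s
        elif p_match(p[:m - s], p[s:]):
            # Rule 2: prefix of p p-matches suffix of p.
            return s
    return m


def mismatch_position(window: str, p: str) -> int:
    """First p-mismatch scanning right to left (1-based), or 0 if none."""
    for j in range(len(p), 0, -1):
        if not p_match(p[j - 1:], window[j - 1:]):
            return j
    return 0


def bm_p_search(t: str, p: str) -> list[int]:
    """1-based start positions in t where p p-matches."""
    m, i, hits = len(p), 0, []
    while i + m <= len(t):
        j = mismatch_position(t[i:i + m], p)
        if j == 0:
            hits.append(i + 1)
            i += 1  # conservative step after a full match
        else:
            i += good_suffix_shift(p, j)
    return hits


T = "GATCGATCAATCATATCATCAT"
P = "ATCACATCATCA"
print(bm_p_search(T, P))  # [11]
```

On this input the loop inspects only three windows, starting at positions 1, 6 and 11, and reports the match at position 11.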


Time Complexity

• In the average case, the preprocessing phase has O(m log min(m, Π)) time complexity and O(n) space complexity, and the searching phase has O(n log min(m, Π)) time complexity.


THANK YOU