174
String Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011

String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

  • Upload
    others

  • View
    28

  • Download
    1

Embed Size (px)

Citation preview

Page 1: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

String Matching Algorithms

Antonio Carzaniga

Faculty of InformaticsUniversità della Svizzera italiana

December 22, 2011

Page 2: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Outline

Problem definition

Naïve algorithm

Knuth-Morris-Pratt algorithm

Boyer-Moore algorithm

Page 3: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Problem

Page 4: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Problem

Given the text

Nel mezzo del cammin di nostra vitami ritrovai per una selva oscurache la dritta via era smarrita. . .

Find the string “trova”

Page 5: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Problem

Given the text

Nel mezzo del cammin di nostra vitami ritrovai per una selva oscurache la dritta via era smarrita. . .

Find the string “trova”

A more challenging example: How many times does the string“110011” appear in the following text

001111010101101001100011010111101101011101101110010010101010111110111101100001011011000010111111011110011000011111000100100101001011101110101101111010100110010100101110010000111111100100110111010110100110011011101001010010101000010100111110

Page 6: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Problem

Given the text

Nel mezzo del cammin di nostra vitami ritrovai per una selva oscurache la dritta via era smarrita. . .

Find the string “trova”

A more challenging example: How many times does the string“110011” appear in the following text

001111010101101001100011010111101101011101101110010010101010111110111101100001011011000010111111011110011000011111000100100101001011101110101101111010100110010100101110010000111111100100110111010110100110011011101001010010101000010100111110

Page 7: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

String Matching: Definitions

Given a text T

T ∈ Σ∗: finite alphabet Σ

|T | = n: the length of T is n

Page 8: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

String Matching: Definitions

Given a text T

T ∈ Σ∗: finite alphabet Σ

|T | = n: the length of T is n

Given a pattern P P ∈ Σ

∗: same finite alphabet Σ |P| = m: the length of P ism

Page 9: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

String Matching: Definitions

Given a text T

T ∈ Σ∗: finite alphabet Σ

|T | = n: the length of T is n

Given a pattern P P ∈ Σ

∗: same finite alphabet Σ |P| = m: the length of P ism

Both T and P can be modeled as arrays T[1 . . . n] and P[1 . . .m]

Page 10: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

String Matching: Definitions

Given a text T

T ∈ Σ∗: finite alphabet Σ

|T | = n: the length of T is n

Given a pattern P P ∈ Σ

∗: same finite alphabet Σ |P| = m: the length of P ism

Both T and P can be modeled as arrays T[1 . . . n] and P[1 . . .m]

Pattern P occurs with shift s in T iff 0 ≤ s ≤ n −m T[s + i] = P[i] for all positions 1 ≤ i ≤ m

Page 11: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

Page 12: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

n = 14

T a b c a a b a a b a b a c a

Page 13: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

n = 14

T a b c a a b a a b a b a c a

P

m = 3

a b a

Result

Page 14: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

n = 14

T a b c a a b a a b a b a c a

Ps = 4

a b a

Results = 4

Page 15: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

n = 14

T a b c a a b a a b a b a c a

Ps = 7

a b a

Results = 4s = 7

Page 16: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Example

Problem: find all s such that

0 ≤ s ≤ n −m

T[s + i] = P[i] for 1 ≤ i ≤ m

n = 14

T a b c a a b a a b a b a c a

Ps = 9

a b a

Results = 4s = 7s = 9

Page 17: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Naïve Algorithm

Page 18: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Naïve Algorithm

For each position s in 0 . . . n −m, see if T[s + i] = P[i] for all1 ≤ i ≤ m

Page 19: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Naïve Algorithm

For each position s in 0 . . . n −m, see if T[s + i] = P[i] for all1 ≤ i ≤ m

NAIVE-STRING-MATCHING(T, P)

1 n = length(T)2 m = length(P)3 for s = 0 to n −m4 if SUBSTRING-AT(T, P, s)5 OUTPUT(s)

Page 20: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Naïve Algorithm

For each position s in 0 . . . n −m, see if T[s + i] = P[i] for all1 ≤ i ≤ m

NAIVE-STRING-MATCHING(T, P)

1 n = length(T)2 m = length(P)3 for s = 0 to n −m4 if SUBSTRING-AT(T, P, s)5 OUTPUT(s)

SUBSTRING-AT(T, P, s)

1 for i = 1 to length(P)2 if T[s + i] , P[i]3 return FALSE

4 return TRUE

Page 21: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of the Naïve Algorithm

Page 22: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of the Naïve Algorithm

Complexity of NAIVE-STRING-MATCH is O((n −m + 1)m)

Page 23: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of the Naïve Algorithm

Complexity of NAIVE-STRING-MATCH is O((n −m + 1)m)

Worst case example

T = an, P = am

i.e.,

T =

n︷ ︸︸ ︷

aa · · · a, P =

m︷ ︸︸ ︷

aa · · · a

Page 24: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of the Naïve Algorithm

Complexity of NAIVE-STRING-MATCH is O((n −m + 1)m)

Worst case example

T = an, P = am

i.e.,

T =

n︷ ︸︸ ︷

aa · · · a, P =

m︷ ︸︸ ︷

aa · · · a

So, (n −m + 1)m is a tight bound, so the (worst-case)complexity of NAIVE-STRING-MATCH is

Θ((n −m + 1)m)

Page 25: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Page 26: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

Page 27: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

=

Page 28: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= =

Page 29: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= = ,

Page 30: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= = ,

What now?

Page 31: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= = ,

What now?

the naïve algorithm goes back to the second position in T andstarts from the beginning of P

Page 32: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= = ,

What now?

the naïve algorithm goes back to the second position in T andstarts from the beginning of P

can’t we simply move along through T?

Page 33: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy

Observation

T a b c a a b a a b a b a c a

P a b a

= = ,

What now?

the naïve algorithm goes back to the second position in T andstarts from the beginning of P

can’t we simply move along through T?

why?

Page 34: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (2)

Page 35: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (2)

Here’s a wrong but insightful strategy

Page 36: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (2)

Here’s a wrong but insightful strategy

WRONG-STRING-MATCHING(T, P)

1 n = length(T)2 m = length(P)3 q = 0 // number of characters matched in P4 s = 15 while s ≤ n6 s = s + 17 if T[s] == P[q + 1]8 q = q + 19 if q == m10 OUTPUT(s −m)11 q = 012 else q = 0

Page 37: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

Page 38: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

Page 39: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 40: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 41: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 42: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 43: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 44: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 45: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 46: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 47: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 48: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 49: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 50: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 51: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 52: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 53: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Page 54: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

Page 55: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o

s

q+1

Output: 10

Page 56: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o Output: 10

s

q+1

Page 57: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o Output: 10

s

q+1

Page 58: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o Output: 10

Done. Perfect!

Page 59: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (3)

Example run ofWRONG-STRING-MATCHING

T p a g l i a i o b a g o r d o

P a g o Output: 10

Done. Perfect!

Complexity: Θ(n)

Page 60: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

Page 61: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

Page 62: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 63: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 64: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 65: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

OUTPUT(0)

s

Page 66: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 67: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 68: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 69: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 70: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 71: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 72: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 73: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

missed!

Page 74: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (4)

What is wrong withWRONG-STRING-MATCHING?

T a a b a a a b a b a b a c a

P a a b

s

q+1

missed!

SoWRONG-STRING-MATCHING doesn’t work, but it tells ussomething useful

Page 75: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

Page 76: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

Page 77: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 78: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 79: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 80: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Page 81: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (5)

Where didWRONG-STRING-MATCHING go wrong?

T a a b a a a b a b a b a c a

P a a b

s

q+1

Wrong: by going all the way back to q = 0 we throw away agood prefix of P that we already matched

Page 82: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

Page 83: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

Page 84: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 85: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 86: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 87: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 88: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 89: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

Page 90: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa”

Page 91: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 92: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 93: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 94: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 95: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

s

q+1

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 96: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Improvement Strategy (6)

Another example

T a b a b a b a c b a c b c a

P a b a b a c

OUTPUT(2)

We have matched “ababa” suffix “aba” can be reused as a prefix

Page 97: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Page 98: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

Page 99: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

Page 100: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1

Page 101: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1π = 3

Page 102: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1π = 3

Page 103: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1

Restart from q = π

Page 104: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1

Restart from q = π

Iterate as usual

Page 105: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

New Strategy

P[1 . . . q] is the prefix of Pmatched so far

Find the longest prefix of P that is also a suffix of P[2 . . . q]

i.e., find 0 ≤ π < q such that P[q − π + 1 . . . q] = P[1 . . . π] π = 0 means that such a prefix does not exist

P a b a b a c

q+1

Restart from q = π

Iterate as usual

In essence, this is the Knuth-Morris-Pratt algorithm

Page 106: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Prefix Function

Given a pattern prefix P[1 . . . q], the longest prefix of P that isalso a suffix of P[2 . . . q] depends only on P and q

Page 107: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Prefix Function

Given a pattern prefix P[1 . . . q], the longest prefix of P that isalso a suffix of P[2 . . . q] depends only on P and q

This prefix is identified by its length π(q)

Page 108: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Prefix Function

Given a pattern prefix P[1 . . . q], the longest prefix of P that isalso a suffix of P[2 . . . q] depends only on P and q

This prefix is identified by its length π(q)

Because π(q) depends only on P (and q), π can be computedat the beginning by PREFIX-FUNCTION

we represent π as an array of lengthm

Page 109: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Prefix Function

Given a pattern prefix P[1 . . . q], the longest prefix of P that isalso a suffix of P[2 . . . q] depends only on P and q

This prefix is identified by its length π(q)

Because π(q) depends only on P (and q), π can be computedat the beginning by PREFIX-FUNCTION

we represent π as an array of lengthm

Example

P a b a b a c

Page 110: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Prefix Function

Given a pattern prefix P[1 . . . q], the longest prefix of P that isalso a suffix of P[2 . . . q] depends only on P and q

This prefix is identified by its length π(q)

Because π(q) depends only on P (and q), π can be computedat the beginning by PREFIX-FUNCTION

we represent π as an array of lengthm

Example

P a b a b a c

π 0 0 1 2 3 0

Page 111: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

The Knuth-Morris-Pratt Algorithm

KMP-STRING-MATCHING(T, P)

1 n = length(T)2 m = length(P)3 π = PREFIX-FUNCTION(P)4 q = 0 // number of character matched5 for i = 1 to n // scan the text left-to-right6 while q > 0 and P[q + 1] , T[i]7 q = π[q] // no match: go back using π8 if P[q + 1] == T[i]9 q = q + 110 if q == m11 OUTPUT(i −m)12 q = π[q] // go back for the next match

Page 112: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function Algorithm

Page 113: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function Algorithm

Computing the prefix function amounts to finding all theoccurrences of a pattern P in itself

Page 114: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function Algorithm

Computing the prefix function amounts to finding all theoccurrences of a pattern P in itself

In fact, PREFIX-FUNCTION is remarkably similar toKMP-STRING-MATCHING

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 tom5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

Page 115: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π

Page 116: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0

Page 117: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0

q

k+1

Page 118: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0

q

k+1

0

Page 119: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0

k+1

0

q

Page 120: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0

q

k+1

Page 121: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0

q

k+1

1

Page 122: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0

k+1

1

q

Page 123: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1

q

k+1

Page 124: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1

q

k+1

2

Page 125: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1

k+1

2

q

Page 126: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2

q

k+1

Page 127: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2

q

k+1

3

Page 128: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2

k+1

3

q

Page 129: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2 3

q

k+1

Page 130: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2 3

q

k+1

Page 131: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Prefix Function at Work

PREFIX-FUNCTION(P)

1 m = length(P)2 π[1] = 03 k = 04 for q = 2 to m5 while k > 0 and P[k + 1] , P[q]6 k = π[k]7 if P[k + 1] == P[q]8 k = k + 19 π[q] = k

P a b a b a c

π 0 0 1 2 3

q

k+1

0

Page 132: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of KMP

Page 133: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of KMP

O(n) for the search phase

Page 134: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of KMP

O(n) for the search phase

O(m) for the pre-processing of the pattern

Page 135: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of KMP

O(n) for the search phase

O(m) for the pre-processing of the pattern

The complexity analysis is non-trivial

Page 136: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Complexity of KMP

O(n) for the search phase

O(m) for the pre-processing of the pattern

The complexity analysis is non-trivial

Can we do better?

Page 137: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on KMP

Knuth-Morris-Pratt is Ω(n)

KMP will always go through at least n character comparisons

it fixes our “wrong” algorithm in the case of periodic patternsand texts

Page 138: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on KMP

Knuth-Morris-Pratt is Ω(n)

KMP will always go through at least n character comparisons

it fixes our “wrong” algorithm in the case of periodic patternsand texts

Perhaps there’s another algorithm that works better on theaverage case

e.g., in the absence of periodic patterns

Page 139: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

Page 140: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

Page 141: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

Page 142: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

Page 143: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

Page 144: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 145: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 146: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 147: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 148: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 149: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 150: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 151: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 152: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 153: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 154: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 155: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

Page 156: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 157: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 158: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 159: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 160: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 161: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 162: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 163: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 164: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 165: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 166: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 167: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

Page 168: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

A New Strategy

h e r e i s a s i m p l e e x a m p l e

e x a m p l e

We match the pattern right-to-left

If we find a bad character α in the text, we can shift

so that the pattern skips α , if α is not in the pattern

so that the pattern lines up with the rightmost occurrence of αin the pattern, if the pattern contains α

so that a pattern prefix lines up with a suffix of the currentpartial (or complete) match

In essence, this is the Boyer-Moore algorithm

Page 169: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Page 170: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Like KMP, Boyer-Moore includes a pre-processing phase

Page 171: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Like KMP, Boyer-Moore includes a pre-processing phase

The pre-processing is O(m)

Page 172: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Like KMP, Boyer-Moore includes a pre-processing phase

The pre-processing is O(m)

The search phase is O(nm)

Page 173: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Like KMP, Boyer-Moore includes a pre-processing phase

The pre-processing is O(m)

The search phase is O(nm)

The search phase can be as low as O(n/m) in common cases

Page 174: String Matching Algorithms - USI InformaticsString Matching Algorithms Antonio Carzaniga Faculty of Informatics Università della Svizzera italiana December 22, 2011. Outline Problem

Comments on Boyer-Moore

Like KMP, Boyer-Moore includes a pre-processing phase

The pre-processing is O(m)

The search phase is O(nm)

The search phase can be as low as O(n/m) in common cases

In practice, Boyer-Moore is the fastest string-matchingalgorithm for most applications