Transcript
Page 1: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 1/16

String Matching Problem

• Given a text string T of length n and a pattern string  P of length m, the exact stringmatching problem is to find all occurrences

of  P in T .• Example: T=“AGCTTGA” P=“GCT” 

• Applications:

 – Searching keywords in a file – Searching engines (like Google and Openfind)

 – Database searching (GenBank)

Page 2: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 2/16

• Problem/issue

Finding occurrence of a pattern (string)

„P‟ in String „S‟ and also finding theposition in „S‟ where the pattern match

occurs

What is pattern matching?

Page 3: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 3/16

Brute Force algorithm

The brute-force pattern matching algorithm comparesthe pattern P with the text T  for each possible shift of P  

relative to T  ,

*until either a match is found, or 

*all placements of the pattern have been tried

Page 4: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 4/16

Brute-force

• Worst O(m*n) 

• Best O(n)

algorithm brute-force:

input: an array of characters, T (the string to be analyzed) , length n

an array of characters, P (the pattern to be searched for), length m

for  i := 0 to n-m do

for  j := 0 to m-1 do

compare T[j] with P[i+j]if not equal, exit the inner loop

Page 5: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 5/16

Compare each character of P with S if 

match continue else shift one position

String S

a

b

a

a

a

b

c

a

b

a

a

b

c

a

b

a

c

Pattern p

Example

Page 6: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 6/16

Step 1:compare p[1] with S[1]

a

b

c

a

b

a

a

b

c

a

b

a

c

a

b

a

a

Step 2: compare p[2] with S[2]

a

b

c

a

b

a

a

b

c

a

b

a

c

a

b

a

a

S

p

p

Page 7: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 7/16

Step 3: compare p[3] with S[3]

S a b c a b a a b c a b a c

a b a ap

 Mismatch occurs here..

“Since mismatch is detected, shift ‘P’ one position to the Right andperform steps analogous to those from step 1 to step 3. At position where mismatch is detected, shift ‘P’ one position to the right andrepeat matching procedure. “ 

Page 8: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 8/16

The Knuth-Morris-Pratt Algorithm

Knuth, Morris and Pratt proposed a linear time algorithm for the string matchingproblem.

 A matching time of O(n) is achieved byavoiding comparisons with elements of  „S‟ that have previously been involved incomparison with some element of thepattern „p‟ to be matched. i.e.,backtracking on the string „S‟ never occurs

Page 9: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 9/16

Components of KMP algorithm

• The prefix function, Π 

The prefix function,Π for a pattern encapsulatesknowledge about how the pattern matches against shiftsof itself. This information can be used to avoid uselessshifts of the pattern „p‟. In other words, this enablesavoiding backtracking on the string „S‟. 

• The KMP Matcher 

With string „S‟, pattern „p‟ and prefix function „Π‟ asinputs, finds the occurrence of „p‟ in „S‟ and returns the

number of shifts of „p‟ after which occurrence is found.

Page 10: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 10/16

• Knuth-Morris-Pratt algorithm-Algorithm

Compute-Prefix-Function(P )

1. m length[T ]

2. [1]  0

3. k   0

4. for  q 2 to m 5. do while k > 0 and P [k + 1]  P [q]

6. do k  [k ] /*if k = 0 or P [k + 1]

= P [q],

7. if  P [k + 1] = P [q] going out of thewhile-loop.*/ 

8. then k  k + 1 

9. [q]  k 

10.return  

Page 11: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 11/16

• Knuth-Morris-Pratt algorithm-Algorithm

KMP-Matcher(T , P )

1. n length[T ]

2. m length[P ]

3.   Compute-Prefix-Function(P )

4. q 05. for  i  1 to n 

6. do while q > 0 and P [q + 1]  T [i ]

7. do q  [q]

8. if  P [q + 1] = T [i ] 9. then q q + 1 

10. if  q = m 

11. then print “pattern occurs with shift” i  – m

12. q  [q]

Page 12: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 12/16

Compute prefix function

P = ababababca, T = ababaababababca[1] = 0

k = 0

q = 2, P [k + 1] = P [1] = a, P [q] = P [2] = b, P [k + 1]  P [q]

[q]  k ([2]  0)q = 3, P [k + 1] = P [1] = a, P [q] = P [3] = a, P [k + 1] = P [q]

k  k + 1, [q]  k ([3]  1)

k = 1

q = 4, P [k + 1] = P [2] = b, P [q] = P [4] = b, P [k + 1] = P [q]

k  k + 1, [q]  k ([4]  2)

Page 13: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 13/16

 

k = 2 

q = 5, P [k + 1] = P [3] = a, P [q] = P [5] = a, P [k + 1] = P [q]

k  k + 1, [q]  k ([5]  3)k = 3 

q = 6, P [k + 1] = P [4] = b, P [q] = P [6] = b, P [k + 1] = P [q]

k  k + 1, [q]  k ([6]  4)

k = 4 q = 7, P [k + 1] = P [5] = a, P [q] = P [7] = a, P [k + 1] = P [q]

k  k + 1, [q]  k ([7]  5)

k = 5 

q = 8, P [k + 1] = P [6] = b, P [q] = P [8] = b, P [k + 1] = P [q]

k  k + 1, [q]  k ([8]  6)

Page 14: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 14/16

 

k = 6 

q = 9, P [k + 1] = P [6] = b, P [q] = P [9] = c, P [k + 1]  P [q]k  [k ] (k  [6] = 4)

P [k + 1] = P [5] = a, P [q] = P [9] = c, P [k + 1]  P [q]

k  [k ] (k  [4] = 2)

P [k + 1] = P [3] = a, P [q] = P [9] = c, P [k + 1]  P [q]

k  [k ] (k  [2] = 0)

k = 0 

q = 9, P [k + 1] = P [1] = a, P [q] = P [9] = c, P [k + 1]  P [q]

[q]  k ([9]  0)

q = 10, P [k + 1] = P [1] = a, P [q] = P [10] = a, P [k + 1] = P [q]k  k + 1, [q]  k ([10]  1)

Page 15: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 15/16

 After prefix computation, the table is shown below

P = ababababca

1 2 3 4 5 6 7 8 9 10

a b a b a b a b c a0 0 1 2 3 4 5 6 0 1

i

 P [i]   [i] 

a b a b a b a b c a

a b a b a b

a b a b

a b

a b c a

a b a b c a

a b a b a b c a

a b a b a b a b c a

 P 8

 P 6

 P 4

 P 2

 P 0  

  [8] = 6 

  [6] = 4

  [4] = 2 

  [2] = 0 

Page 16: String Matching Problem.ppt

7/27/2019 String Matching Problem.ppt

http://slidepdf.com/reader/full/string-matching-problemppt 16/16

Another Example for KMP Algorithm

Phase 1

Phase 2

 f (4 – 1)+1= f (3)+1=0+1=1

 f (13-1)+1= 4+1=5matched

First finish the prefix

computation

Next, Search phase computation