String Matching

String Matching Finite Automata & KMP Algorithm


Page 1: String Matching Finite Automata & KMP Algorithm

String Matching

Page 2: String Matching Finite Automata & KMP Algorithm

String Matching

String matching with finite automata

• The string-matching automaton is a very effective tool used in string-matching algorithms. Once built, it examines each character in the text exactly once and reports all the valid shifts in O(n) time, where n is the length of the text.

Page 3: String Matching Finite Automata & KMP Algorithm

The basic idea is to build an automaton in which:

• Each character in the pattern has a state.
• Each match sends the automaton into a new state.
• If all the characters in the pattern have been matched, the automaton enters the accepting state.
• Otherwise, the automaton returns to a suitable state according to the current state and the input character.
• The matching takes O(n) time, since each character is examined once.

Page 4: String Matching Finite Automata & KMP Algorithm

• The construction of the string-matching automaton is based on the given pattern. This construction may take O(m^3 |Σ|) time, where m is the length of the pattern and Σ is the input alphabet.

The finite automaton begins in state q0 and reads the characters of its input string one at a time. If the automaton is in state q and reads input character a, it moves from state q to state δ(q, a).
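The construction step can be sketched in Python as follows. This is a minimal, deliberately naive version that assumes the usual definition of the transition function (the next state is the length of the longest pattern prefix that is a suffix of what has been read); the function name `compute_transition_function` and the example pattern "ab" are illustrative choices, not taken from the slides. The three nested levels of work (states, alphabet characters, candidate prefixes with an O(m) comparison each) are where the O(m^3 |Σ|) bound comes from.

```python
def compute_transition_function(pattern, alphabet):
    """Naive construction of delta for the string-matching automaton.

    delta[q][a] is the length of the longest prefix of `pattern`
    that is also a suffix of pattern[:q] + a.
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):                 # one row per state 0..m
        row = {}
        for a in alphabet:                 # one entry per input character
            seen = pattern[:q] + a
            k = min(m, q + 1)              # longest prefix that could fit
            # Shrink k until pattern[:k] is a suffix of what was read.
            # The empty prefix always matches, so the loop terminates.
            while not seen.endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


# Illustrative pattern "ab" over the alphabet {a, b}.
for state, row in enumerate(compute_transition_function("ab", "ab")):
    print(state, row)
```

Each printed row lists, for one state, the next state on every input character.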

Page 5: String Matching Finite Automata & KMP Algorithm

Figure 1: An automaton.

Given pattern: a^(2k+1)
Input string = abaaa
Start state: 0
Terminate state: 1

Transition table:

        input
State   a   b
0       1   0
1       0   1

Page 6: String Matching Finite Automata & KMP Algorithm

Finite automata: A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where

• Q is a finite set of states.
• q0 ∈ Q is the start state.
• A ⊆ Q is a distinguished set of accepting states.
• Σ is a finite input alphabet.
• δ is a function from Q × Σ into Q, called the transition function of M.

Page 7: String Matching Finite Automata & KMP Algorithm

The automaton accepts the following inputs (an odd number of a's together with any number of b's):
- "aaa"
- "abb"
- "bababab"

Rejected (an even number of a's is not accepted):
- "aabb"
- "aaaa"
- "babababa"
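A short Python sketch can check membership mechanically. It assumes the two-state transition table shown in the figure (state 0 is the start state, state 1 is the only accepting state); the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative.

```python
# Transition table from the figure: DELTA[state][character] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 1},
}
START = 0
ACCEPTING = {1}


def accepts(text):
    """Run the automaton over `text` and report whether it ends in an accepting state."""
    state = START
    for ch in text:
        state = DELTA[state][ch]
    return state in ACCEPTING


for s in ["aaa", "abb", "bababab", "aabb", "aaaa"]:
    print(s, "accepted" if accepts(s) else "rejected")
```

Because reading b never changes the state, only the parity of the number of a's decides the outcome.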

Page 8: String Matching Finite Automata & KMP Algorithm

        input
State   a   b
0       1   0
1       0   1

(a) Transition table (b) Finite automaton

The automaton can also be represented as a state-transition diagram, as shown on the right-hand side of the figure.

Page 9: String Matching Finite Automata & KMP Algorithm

FINITE-AUTOMATON-MATCHER(T, δ, m)

n ← length[T]
q ← 0
for i ← 1 to n
    do q ← δ(q, T[i])
       if q = m
          then print("Pattern matches with shift", i − m)
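The matcher above translates almost line for line into Python. The sketch below returns the shifts instead of printing them; the transition function `DELTA_AB` is hand-built for the illustrative pattern "ab" over the alphabet {a, b} and is an assumption for the example, not something given in the slides.

```python
def finite_automaton_matcher(text, delta, m):
    """Return every shift at which the pattern occurs in `text`.

    `delta` maps (state, character) to the next state, and `m` is the
    pattern length, which doubles as the accepting state.
    """
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):   # i is 1-based, as in the pseudocode
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts


# Hand-built transition function for the pattern "ab" (assumption for this example).
DELTA_AB = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}

print(finite_automaton_matcher("aabab", DELTA_AB, 2))  # [1, 3]
```

The loop touches each text character once, which is the O(n) matching time claimed earlier.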

Page 10: String Matching Finite Automata & KMP Algorithm

Example

Build DFA from pattern. Run DFA on text.

[Figure: DFA for the pattern a a b a a a, with states 0 to 6 and state 6 as the accept state, run on the search text a a a b a a b a a a b.]

Page 11: String Matching Finite Automata & KMP Algorithm

[Figures, pages 11 to 21: successive frames of the same DFA reading the search text a a a b a a b a a a b one character at a time.]
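The run shown in the frames can be replayed in a few lines of Python. This is a sketch under two assumptions: that the pattern is "aabaaa" and the search text is "aaabaabaaab", as read from the figure, and that the transition table below (reconstructed from the figure's edges, so treat it as a reconstruction) is the intended DFA. The names `DFA` and `trace` are illustrative.

```python
PATTERN = "aabaaa"
TEXT = "aaabaabaaab"

# Reconstructed transition table: DFA[state][character] -> next state.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 2, "b": 3},
    3: {"a": 4, "b": 0},
    4: {"a": 5, "b": 0},
    5: {"a": 6, "b": 3},
    6: {"a": 2, "b": 3},   # state 6 is the accept state
}


def trace(text, dfa, accept):
    """Return the visited states and the 0-based start index of every match."""
    state = 0
    states = [state]
    matches = []
    for i, ch in enumerate(text):
        state = dfa[state][ch]
        states.append(state)
        if state == accept:
            matches.append(i + 1 - accept)
    return states, matches


states, matches = trace(TEXT, DFA, len(PATTERN))
print(states)   # [0, 1, 2, 2, 3, 4, 5, 3, 4, 5, 6, 3]
print(matches)  # [4]
```

The automaton never backs up in the text: on the mismatch at the seventh character it falls back from state 5 to state 3 and keeps reading.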

Page 22: String Matching Finite Automata & KMP Algorithm

Knuth-Morris-Pratt

Page 23: String Matching Finite Automata & KMP Algorithm

KNUTH-MORRIS-PRATT ALGORITHM.

• This algorithm was conceived by Donald Knuth and Vaughan Pratt and, independently, by James H. Morris. The three published it jointly in 1977.

Page 24: String Matching Finite Automata & KMP Algorithm

KNUTH-MORRIS-PRATT ALGORITHM.

Text: abcabb
Pattern: abd

• Prefix: A prefix of a string is any number of leading symbols of that string.
o Ex (for the text abcabb): λ, a, ab, abc, abca, abcab, abcabb.

• Suffix: A suffix of a string is any number of trailing symbols of that string.
o Ex (for the text abcabb): λ, b, bb, abb, cabb, bcabb, abcabb.

Page 25: String Matching Finite Automata & KMP Algorithm

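• Proper Prefix and Proper Suffix: A prefix or suffix other than the string itself is called a proper prefix or a proper suffix.

o Ex: Proper Prefix: λ, a, ab, abc, abca, abcab
      Proper Suffix: λ, b, bb, abb, cabb, bcabb.

• Border: A border of the given string is any string in the intersection of its proper prefixes and its proper suffixes, that is, a string that is both a proper prefix and a proper suffix.

o Ex: The intersection is {λ}, so the only border of abcabb is λ, which has length 0.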
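These definitions are easy to check by enumeration. The Python sketch below lists the proper prefixes, proper suffixes and borders of a string; the empty string λ appears as `""`, and the function names and the second example string "abab" are illustrative.

```python
def proper_prefixes(s):
    """All prefixes of s except s itself, shortest first (starts with the empty string)."""
    return [s[:k] for k in range(len(s))]


def proper_suffixes(s):
    """All suffixes of s except s itself, shortest first (starts with the empty string)."""
    return [s[len(s) - k:] for k in range(len(s))]


def borders(s):
    """Strings that are both a proper prefix and a proper suffix of s."""
    suffixes = set(proper_suffixes(s))
    return [p for p in proper_prefixes(s) if p in suffixes]


print(proper_prefixes("abcabb"))  # ['', 'a', 'ab', 'abc', 'abca', 'abcab']
print(proper_suffixes("abcabb"))  # ['', 'b', 'bb', 'abb', 'cabb', 'bcabb']
print(borders("abcabb"))          # ['']
print(borders("abab"))            # ['', 'ab']
```

For "abab" the widest border is "ab", of length 2, which is the kind of overlap KMP exploits.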

Page 26: String Matching Finite Automata & KMP Algorithm

• Shift Distance = Length of Pattern – Length of Border.
o Ex: Length of Pattern = 3 & Length of Border = 1

So, Shift Distance = 3 – 1 = 2.
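A small Python sketch makes the arithmetic concrete. It rests on an assumption about how the formula is applied: "length of pattern" is read here as the length of the part of the pattern matched so far, and "border" as the widest border of that matched part. The function names are illustrative.

```python
def widest_border_length(s):
    """Length of the longest string that is both a proper prefix and a proper suffix of s."""
    for k in range(len(s) - 1, 0, -1):   # try the longest candidate first
        if s[:k] == s[-k:]:
            return k
    return 0


def shift_distance(pattern, matched):
    """Shift after `matched` leading characters of `pattern` have matched.

    Assumption: with nothing matched, the pattern simply moves one position.
    """
    if matched == 0:
        return 1
    return matched - widest_border_length(pattern[:matched])


# Text abcabb, pattern abd: "ab" matches, then d != c, so 2 - 0 = 2.
print(shift_distance("abd", 2))      # 2
# Pattern aabaaa with "aabaa" matched: widest border "aa", so 5 - 2 = 3.
print(shift_distance("aabaaa", 5))   # 3
```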

Page 27: String Matching Finite Automata & KMP Algorithm

ALGORITHM:

Step 1: Initialize the input variables:
    m = length of the pattern.
    u = prefix function of the pattern (p).
    q = number of characters matched.
Step 2: Define the variable: q = 0, the beginning of the match.
Step 3: Compare the first character of the pattern with the first character of the text.
    If a match is not found, substitute the value of u[q] for q.
    If a match is found, increment the value of q by 1.
Step 4: Check whether all the pattern elements are matched with the text elements.
    If not, repeat the search process.
    If yes, print the number of shifts taken by the pattern.
Step 5: Look for the next match.
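The steps above can be sketched in Python as follows. This is one common formulation rather than the slides' own code: it assumes u[q] holds the length of the widest border of the first q pattern characters, uses 0-based indexing, and returns the shifts as a list instead of printing them. The function names are illustrative; `u`, `m` and `q` follow the step list.

```python
def compute_prefix_function(pattern):
    """u[q] = length of the widest border of pattern[:q], for q = 0..m."""
    m = len(pattern)
    u = [0] * (m + 1)
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = u[k]                     # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        u[q + 1] = k
    return u


def kmp_search(text, pattern):
    """Return the 0-based shifts at which `pattern` occurs in `text`."""
    m = len(pattern)
    if m == 0:
        return []
    u = compute_prefix_function(pattern)
    shifts = []
    q = 0                                # Step 2: nothing matched yet
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = u[q]                     # Step 3: mismatch, substitute u[q] for q
        if pattern[q] == ch:
            q += 1                       # Step 3: match, increment q
        if q == m:                       # Step 4: whole pattern matched
            shifts.append(i - m + 1)
            q = u[q]                     # Step 5: look for the next match
    return shifts


print(kmp_search("abcabb", "abd"))           # []
print(kmp_search("aaabaabaaab", "aabaaa"))   # [4]
print(kmp_search("abababa", "aba"))          # [0, 2, 4]
```

The text index `i` only moves forward; all backtracking happens in `q` through the precomputed table, which is what keeps the search linear in the text length.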

Page 28: String Matching Finite Automata & KMP Algorithm

TIME COMPLEXITY:

• O(m) to compute the prefix function values, where m is the length of the pattern.
• O(n) to compare the pattern to the text, where n is the length of the text.
• Total running time: O(n + m).