String Matching Finite Automata & KMP Algorithm


String Matching


String matching with finite automata

• The string-matching automaton is a very effective tool used in string-matching algorithms. It examines each character of the text exactly once and reports all valid shifts in O(n) time.

The basic idea is to build an automaton in which:

• Each character in the pattern has a state.
• Each match sends the automaton into a new state.
• If all the characters in the pattern have been matched, the automaton enters the accepting state. Otherwise, the automaton returns to a suitable state according to the current state and the input character.

Matching takes O(n) time since each character is examined once.

• The construction of the string-matching automaton is based on the given pattern. The time of this construction may be O(m^3 |Σ|), where m is the length of the pattern and Σ is the input alphabet.
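To make the O(m^3 |Σ|) bound concrete, here is a minimal Python sketch of a naive construction. It assumes the usual definition that state q on input a moves to the length of the longest pattern prefix that is a suffix of the q characters matched so far followed by a. The function name and the dictionary layout are illustrative choices, not taken from the slides.

```python
def compute_transition_function(pattern, alphabet):
    """Naive construction: delta[(q, a)] is the length of the longest
    prefix of `pattern` that is a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):                # m + 1 states: 0 .. m
        for a in alphabet:                # one entry per input symbol
            seen = pattern[:q] + a
            k = min(m, q + 1)             # longest prefix that could fit
            # Up to m + 1 candidates, each compared in O(m) time.
            while not seen.endswith(pattern[:k]):
                k -= 1                    # stops at k = 0 (empty prefix)
            delta[(q, a)] = k
    return delta


delta = compute_transition_function("aabaaa", "ab")
print(delta[(5, "a")], delta[(5, "b")])   # prints: 6 3
```

The three nested costs (states, candidate prefix lengths, and the suffix comparison) together with the loop over the alphabet are where the m^3 |Σ| factor comes from.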

The finite automaton begins in state q0 and reads the characters of its input string one at a time. If the automaton is in state q and reads input character a, it moves from state q to state δ(q, a).

Figure 1: An automaton.

Given pattern: a^(2k+1)
Input string = abaaa
Start state: 0
Terminate (accepting) state: 1

          input
State     a    b
0         1    0
1         0    1

Finite automata: A finite automaton M is a 5-tuple (Q, q0, A, Σ, δ), where

• Q is a finite set of states.
• q0 ∈ Q is the start state.
• A ⊆ Q is a distinguished set of accepting states.
• Σ is a finite input alphabet.
• δ is a function from Q × Σ into Q, called the transition function of M.

This automaton accepts exactly the strings that contain an odd number of a's; any number of b's may appear anywhere.

Accepted: "aaa", "abb", "bababab".
Rejected (even number of a's): "aabb", "aaaa", "babababa".
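The two-state automaton can be simulated directly from its transition table. The sketch below is only an illustration: the names DELTA, START, ACCEPTING and accepts are hypothetical, and the table values are copied from the transition table in the figure.

```python
# Transition table from the figure: DELTA[state][symbol] -> next state
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 1},
}
START = 0
ACCEPTING = {1}


def accepts(word):
    """Run the automaton over `word` and report whether it ends in an
    accepting state."""
    state = START
    for ch in word:
        state = DELTA[state][ch]
    return state in ACCEPTING


for w in ["aaa", "abb", "bababab", "aabb", "aaaa"]:
    print(w, accepts(w))
```

Reading b never changes the state and reading a always flips it, so the final state depends only on whether the number of a's is odd.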

Figure: (a) transition table and (b) state-transition diagram of the two-state automaton.

The automaton can also be represented as a state-transition diagram, as shown on the right-hand side of the figure.

FINITE-AUTOMATON-MATCHER(T, δ, m)

n ← length[T]
q ← 0
for i ← 1 to n do
    q ← δ(q, T[i])
    if q = m then
        print("Pattern matches with shift", i − m)
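The matcher above translates almost line for line into Python. This is a minimal sketch: the transition function is assumed to be a dictionary keyed by (state, character), and the small table delta_ab for the pattern "ab" is written by hand purely for illustration.

```python
def finite_automaton_matcher(text, delta, m):
    """Return every shift at which the pattern occurs in `text`.

    delta maps (state, character) to the next state; m is the pattern
    length, which is also the number of the accepting state.
    """
    shifts = []
    q = 0
    for i, ch in enumerate(text, start=1):   # i = characters read so far
        q = delta[(q, ch)]
        if q == m:
            shifts.append(i - m)
    return shifts


# Hand-built table for the pattern "ab" over the alphabet {a, b}
delta_ab = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
print(finite_automaton_matcher("abab", delta_ab, 2))   # prints: [0, 2]
```

Collecting the shifts in a list instead of printing them is a design choice made here so the result can be checked; the loop itself still touches each text character exactly once.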

Build DFA from pattern. Run DFA on text.

Figure: DFA for the pattern a a b a a a, with states 0 to 6 and state 6 as the accept state.

Example

Figure (animation frames): the DFA reads the search text a a a b a a b a a a b one character at a time, with the pattern a a b a a a aligned beneath the text.
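The run shown in the example frames can be replayed in a few lines of Python. The pattern and search text are read off the figure; the transition table NEXT is an assumption, worked out by hand from the pattern rather than copied from the slides, and the names are illustrative.

```python
PATTERN = "aabaaa"
TEXT = "aaabaabaaab"

# NEXT[symbol][state] -> next state (hand-derived, assumed table)
NEXT = {
    "a": [1, 2, 2, 4, 5, 6, 2],
    "b": [0, 0, 3, 0, 0, 3, 3],
}


def trace(text):
    """Return the sequence of states visited and the shifts at which
    the accept state is reached."""
    state, states, shifts = 0, [0], []
    for i, ch in enumerate(text, start=1):
        state = NEXT[ch][state]
        states.append(state)
        if state == len(PATTERN):            # accept state reached
            shifts.append(i - len(PATTERN))
    return states, shifts


states, shifts = trace(TEXT)
print(states)   # prints: [0, 1, 2, 2, 3, 4, 5, 3, 4, 5, 6, 3]
print(shifts)   # prints: [4]
```

The automaton reaches state 6 after reading the tenth character, so the pattern occurs with shift 4.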

Knuth-Morris-Pratt

KNUTH-MORRIS-PRATT ALGORITHM.

• This algorithm was conceived by Donald Knuth and Vaughan Pratt and independently by James H. Morris; the three published it jointly in 1977.


Text: abcabb
Pattern: abd

• Prefix: A prefix of a string is any number of leading symbols of that string.
o Ex: λ, a, ab, abc, abca, abcab, abcabb.

• Suffix: A suffix is any number of trailing symbols of that string.
o Ex: λ, b, bb, abb, cabb, bcabb, abcabb.

• Proper Prefix and Proper Suffix: A prefix or suffix other than the string itself is called a proper prefix or proper suffix.
o Ex: Proper prefixes: λ, a, ab, abc, abca, abcab. Proper suffixes: λ, b, bb, abb, cabb, bcabb.

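• Border: A border of the given string is any string in the intersection of its proper prefixes and proper suffixes.
o Ex: For abcabb the only common string is λ, so there is 1 border, the empty string, and its length is 0.

• Shift Distance = Length of Pattern − Length of Border.
o Ex: Length of a pattern = 3 and length of its border = 1, so shift distance = 3 − 1 = 2.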
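The definitions of border and shift distance can be checked with a brute-force Python sketch that builds both sets explicitly. It assumes a non-empty input string; the function names are illustrative, and the string "aba" is a made-up example of a length-3 string whose widest border has length 1.

```python
def borders(s):
    """All borders of a non-empty string s, shortest first."""
    proper_prefixes = {s[:k] for k in range(len(s))}          # includes ""
    proper_suffixes = {s[k:] for k in range(1, len(s) + 1)}   # includes ""
    return sorted(proper_prefixes & proper_suffixes, key=len)


def shift_distance(matched):
    """Length of the matched string minus the length of its widest border."""
    widest = borders(matched)[-1]
    return len(matched) - len(widest)


print(borders("abcabb"))        # prints: ['']
print(borders("aba"))           # prints: ['', 'a']
print(shift_distance("aba"))    # prints: 2
```

The empty string λ appears in both sets for every non-empty input, so there is always at least one border and the shift distance is never larger than the length of the string.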

ALGORITHM:

Step 1: Initialize the input variables:
m = length of the pattern.
u = prefix function of the pattern p.
q = number of characters matched.

Step 2: Set q = 0, the beginning of the match.

Step 3: Compare the next unmatched character of the pattern with the current character of the text, starting with the first character of each. If they do not match, replace q with u[q]. If they match, increment q by 1.

Step 4: Check whether all the pattern characters have been matched with the text (q = m). If not, repeat the search process. If yes, print the shift at which the pattern occurs.

Step 5: Look for the next match.
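The steps above can be sketched in Python as follows. This is one common formulation of the prefix function and the search loop, offered as an illustration rather than as the exact code behind the slides. It assumes a non-empty pattern, and u[q] is taken to be the length of the widest border of the first q pattern characters.

```python
def prefix_function(p):
    """u[q] = length of the widest border of p[:q], for q = 0 .. len(p)."""
    u = [0] * (len(p) + 1)
    k = 0
    for q in range(2, len(p) + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = u[k]                  # fall back to the next shorter border
        if p[k] == p[q - 1]:
            k += 1
        u[q] = k
    return u


def kmp_search(text, p):
    """Return all shifts at which the non-empty pattern p occurs in text."""
    m, u, q, shifts = len(p), prefix_function(p), 0, []
    for i, ch in enumerate(text):
        while q > 0 and p[q] != ch:
            q = u[q]                  # Step 3: mismatch, replace q with u[q]
        if p[q] == ch:
            q += 1                    # Step 3: match, increment q
        if q == m:                    # Step 4: whole pattern matched
            shifts.append(i - m + 1)
            q = u[q]                  # Step 5: look for the next match
    return shifts


print(kmp_search("abcabb", "abd"))    # prints: []
print(kmp_search("abababa", "aba"))   # prints: [0, 2, 4]
```

The first call uses the text and pattern from the slides, where the pattern does not occur; the second uses a made-up text to show overlapping matches.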

TIME COMPLEXITY:

• O(m): time to compute the prefix function values.
• O(n): time to compare the pattern to the text.
• Total: O(n + m) run time.
