String Matching Algorithms

Page 1: String Matching Algorithms

String Matching Algorithms

• Introduction (32)
• Naïve String Matching Algorithm (32.1)

Page 2: String Matching Algorithms

Introduction

Page 3: String Matching Algorithms

• Definitions

32

- Formal Definition of String Matching Problem

- Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m ≤ n

Explanation:

There is a character array T whose length is at least the length of the character array P. P is called the pattern because it contains the sequence of characters to be searched for in the text T.

Page 4: String Matching Algorithms

• Definitions

32

- Alphabet

- It is assumed that the elements of P and T are drawn from a finite alphabet Σ.

- Example

Σ = {a, b, …, z}

Σ = {0, 1}

Explanation:

Σ defines which characters are allowed in both the text to be searched and the pattern to be searched for.

Page 5: String Matching Algorithms

32

• Definitions
- Strings

- Σ* denotes the set of all finite-length strings formed using characters from the alphabet Σ
- The zero-length empty string is denoted by ε and is a member of Σ*
- The length of a string x is denoted by |x|

- The concatenation of two strings x and y, denoted xy, has length |x| + |y| and consists of the characters of x followed by the characters of y
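
The string definitions can be tried out in a few lines of Python. This is only an illustrative sketch: the alphabet `SIGMA`, the helper names `in_sigma_star` and `strings_up_to`, and the sample strings are assumptions made for the example and do not come from the slides.

```python
from itertools import product

SIGMA = {"a", "b"}  # hypothetical two-letter alphabet, chosen for illustration
EPSILON = ""        # the zero-length empty string


def in_sigma_star(x: str, sigma: set) -> bool:
    """Return True if every character of x is drawn from sigma."""
    return all(ch in sigma for ch in x)


def strings_up_to(sigma: set, max_len: int) -> list:
    """List the members of sigma* of length 0..max_len.

    sigma* itself is infinite, so only a finite slice can be enumerated.
    """
    out = []
    for k in range(max_len + 1):
        out.extend("".join(t) for t in product(sorted(sigma), repeat=k))
    return out


x, y = "ab", "bba"
xy = x + y  # concatenation: the characters of x followed by those of y

print(strings_up_to(SIGMA, 2))
print(len(xy) == len(x) + len(y))
```

Note that the empty string passes the membership check, which matches the statement that ε is a member of Σ*.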

Page 6: String Matching Algorithms

32

• Definitions
- Shift

- P occurs with shift s in T if 0 ≤ s ≤ n − m and T[s+1..s+m] = P[1..m]

- If P occurs with shift s in T, then we call s a valid shift

- If P does not occur with shift s in T, we call s an invalid shift
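
A valid shift can be checked directly from the definition. The sketch below assumes Python's 0-based indexing, so the slice `text[s:s + m]` plays the role of T[s+1..s+m]; the function name `is_valid_shift` and the sample text and pattern are illustrative choices, not part of the slides.

```python
def is_valid_shift(text: str, pattern: str, s: int) -> bool:
    """Return True if the pattern occurs in the text with shift s."""
    n, m = len(text), len(pattern)
    # A shift outside 0..n-m cannot be valid: the pattern would not fit.
    if s < 0 or s > n - m:
        return False
    # 0-based slice text[s:s+m] corresponds to T[s+1..s+m] in 1-based notation.
    return text[s:s + m] == pattern


text = "abcabaabcabac"
pattern = "abaa"
valid = [s for s in range(len(text)) if is_valid_shift(text, pattern, s)]
print(valid)
```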

Page 7: String Matching Algorithms

32

• Definitions
- String Concatenation Example

The Concatenator

Σ = {A, B, C, D, E, H, 1, 2, 5, 6, 9}

String X = A125, |X| = 4
String Y = HE69D, |Y| = 5

String Z = XY = A125HE69D, |Z| = 9

Page 8: String Matching Algorithms

The Concatenator

32

• Definitions
- String Concatenation Example

Σ = {A, B, C, D, E, H, 1, 2, 5, 6, 9}

String X = A125, |X| = 4
String Y = HE69D, |Y| = 5

String Z = XY = A125HE69D, |Z| = 9

Page 9: String Matching Algorithms

• Definitions
- Prefix

- String w is a prefix of a string x if x = wy for some string y ∈ Σ*
- w ⊏ x means that string w is a prefix of string x

Explanation:

If a string w is a prefix of string x, then there exists some string y that, when appended to the end of w, gives wy = x.

[Diagram: String W followed by String Y equals String X]

32

Page 10: String Matching Algorithms

32

• Definitions
- Prefix Examples

To Prefix Or Not To Prefix

Σ = {A, B}    Σ* = {ε, A, B, AA, AB, BA, BB, …}

Examples:

String x = AABBAABBABAB

String w = AABBAA

Is w ⊏ x? Why?

Page 11: String Matching Algorithms

32

• Definitions

To Prefix Or Not To Prefix

Σ = {A, B}    Σ* = {ε, A, B, AA, AB, BA, BB, …}

Examples:

String x = AABBAABBABAB

String w = AABABAA

Is w ⊏ x? Why?
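
The two prefix questions can be checked mechanically. The following is a minimal sketch; the helper name `is_prefix` is an assumption for illustration, and Python's built-in `str.startswith` performs the same test.

```python
def is_prefix(w: str, x: str) -> bool:
    """Return True if x = w + y for some (possibly empty) string y."""
    return len(w) <= len(x) and x[:len(w)] == w


x = "AABBAABBABAB"

# First example: the first six characters of x are AABBAA.
print(is_prefix("AABBAA", x))   # True

# Second example: x begins AABBAAB, which differs from AABABAA at the 4th character.
print(is_prefix("AABABAA", x))  # False
```

The empty string ε is a prefix of every string, since x = εx.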

Page 12: String Matching Algorithms

• Definitions
- Suffix

- String w is a suffix of a string x if x = yw for some string y ∈ Σ*
- w ⊐ x means that string w is a suffix of string x

Explanation:

If a string w is a suffix of string x, then there exists some string y that, when added onto the front of w, gives yw = x.

[Diagram: String Y followed by String W equals String X]

32

Page 13: String Matching Algorithms

32

• Definitions
- Suffix Examples

Et Tu Suffix?

Σ = {A, B}    Σ* = {ε, A, B, AA, AB, BA, BB, …}

Examples:

String x = AABBAABBABAB

String w = BABBA

Is w ⊐ x? Why?

Page 14: String Matching Algorithms

32

• Definitions
- Suffix Examples

Et Tu Suffix?

Σ = {A, B}    Σ* = {ε, A, B, AA, AB, BA, BB, …}

Examples:

String x = AABBAABBABAB

String w = BABAB

Is w ⊐ x? Why?
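
The suffix questions can be checked the same way. This is a minimal sketch; the helper name `is_suffix` is an assumption for illustration, and Python's built-in `str.endswith` performs the same test.

```python
def is_suffix(w: str, x: str) -> bool:
    """Return True if x = y + w for some (possibly empty) string y."""
    return len(w) <= len(x) and x[len(x) - len(w):] == w


x = "AABBAABBABAB"

# First example: the last five characters of x are BABAB, not BABBA.
print(is_suffix("BABBA", x))  # False

# Second example: BABAB matches the last five characters of x.
print(is_suffix("BABAB", x))  # True
```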

Page 15: String Matching Algorithms

Naïve String Matching Algorithm

Page 16: String Matching Algorithms

• Definitions

32

- Formal Definition of String Matching Problem

- Assume the text is an array T[1..n] of length n and the pattern is an array P[1..m] of length m ≤ n

Explanation:

There is a character array T whose length is at least the length of the character array P. P is called the pattern because it contains the sequence of characters to be searched for in the text T.

Page 17: String Matching Algorithms

32.1

• Basic Explanation

The Naïve String Matching Algorithm takes the pattern that is being searched for and slides it across the “base” string (the text), looking for a match at each position. It keeps track of how far the pattern has been shifted in the variable s, and when a match is found it prints the statement “Pattern occurs with shift s”. This algorithm is also sometimes known as the Brute Force algorithm.

Page 18: String Matching Algorithms

32.1

• Algorithm Pseudo Code

NAÏVE-STRING-MATCHER(T, P)
n ← length[T]
m ← length[P]
for s ← 0 to n − m
    do if P[1..m] = T[s+1..s+m]
        then print “Pattern occurs with shift” s

- Examples

- Demonstration 1

- Demonstration 2
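
The pseudocode translates almost line for line into Python. The version below is a sketch under two assumptions: it uses 0-based indexing, so `text[s:s + m]` stands in for T[s+1..s+m], and it collects the valid shifts in a list instead of printing them. The function name `naive_string_matcher` and the sample inputs are illustrative.

```python
def naive_string_matcher(text: str, pattern: str) -> list:
    """Return every shift s at which the pattern occurs in the text."""
    n = len(text)
    m = len(pattern)
    shifts = []
    # Try each candidate shift 0..n-m; the range is empty when m > n.
    for s in range(n - m + 1):
        # Compare the whole pattern against the window starting at s.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


for s in naive_string_matcher("abcabaabcabac", "abaa"):
    print("Pattern occurs with shift", s)
```

Returning a list rather than printing makes the result easier to test; overlapping occurrences are all reported because every shift is tried independently.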

Page 19: String Matching Algorithms

• Algorithm Time Analysis

NAÏVE-STRING-MATCHER(T, P)
n ← length[T]
m ← length[P]
for s ← 0 to n − m
    do if P[1..m] = T[s+1..s+m]
        then print “Pattern occurs with shift” s

- The worst case arises when the pattern is repeated throughout the whole text, so the comparison at every shift runs through all m characters of the pattern. An example is searching for the pattern aᵐ (m copies of a) in the text aⁿ (n copies of a).
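
The worst case can be observed by counting character comparisons. The sketch below assumes the comparison at each shift proceeds left to right and stops at the first mismatch; the function name `count_comparisons` and the sizes n = 10, m = 3 are illustrative choices. For the pattern aᵐ in the text aⁿ, every one of the n − m + 1 shifts costs m comparisons.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count character comparisons made by the naive matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            # Stop comparing at this shift as soon as a character differs.
            if text[s + j] != pattern[j]:
                break
    return comparisons


n, m = 10, 3
worst = count_comparisons("a" * n, "a" * m)  # pattern matches at every shift
best = count_comparisons("b" * n, "a" * m)   # mismatch on the first character
print(worst, (n - m + 1) * m)
print(best, n - m + 1)
```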

Page 20: String Matching Algorithms

• Algorithm Time Analysis

The algorithm is O((n − m + 1) · m)

n = length of the string being searched
m = length of the substring being compared

Inclusive subtraction: the shifts run from 0 through n − m, which gives n − m + 1 shifts

Comments:

- The Naïve String Matcher is not an optimal solution
- It is inefficient because information gained about the text for one value of s is entirely ignored in considering other values of s.

Page 21: String Matching Algorithms

32.1

• Questions