# 11 String Matching

• Author
tieppv

### Text of 11 String Matching

String Matching

String Matching Algorithms
Finding patterns in a given text

• Data structures: tries, suffix tries, suffix arrays
• Algorithms: naive approach, Boyer-Moore, Rabin-Karp, Knuth-Morris-Pratt (KMP)

Literature:
• Dan Gusfield, Algorithms on Strings, Trees, and Sequences
• CLRS (Cormen, ...), Introduction to Algorithms

Naive Approach

    n = text.size(); m = pattern.size();
    for s = 0 to n - m {
        if (pattern[1 .. m] == text[s+1 .. s+m])
            add_result(s);
    }

The worst case occurs for $T = a^n$, $P = a^m$ and $m = n/2$: each of the $n - m + 1$ shifts compares all $m$ characters, yielding a running time of $\Theta(n^2)$.
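The naive approach can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the function name `naive_search` is hypothetical, and indices are 0-based, so a reported shift `s` means `text[s:s+m]` equals the pattern.

```python
def naive_search(text, pattern):
    """Return every shift s at which pattern occurs in text (0-based)."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible alignments in turn.
    for s in range(n - m + 1):
        # The slice comparison costs up to m character comparisons.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(naive_search("abracadabra", "abra"))
```

For the worst-case input named above (a text and a pattern made of a single repeated character), every alignment compares all `m` characters before succeeding.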

Rabin-Karp

    n = text.length(); m = pattern.length();
    hpattern = hash(pattern[0 .. m-1]);
    htext = hash(text[0 .. m-1]);
    for s = 0 to n - m {
        if (htext == hpattern)
            if (pattern[0 .. m-1] == text[s .. s+m-1])
                add_result(s);
        if (s < n - m)
            htext = hash(text[s+1 .. s+m]);   // rolling update from the previous htext in O(1)
    }
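The slide leaves the hash function open. The sketch below assumes a polynomial rolling hash; the function name `rabin_karp` and the parameters `base` and `mod` are illustrative choices, not taken from the source. It shows how the hash of the next window can be derived from the previous one in constant time.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Weight of the leading character of a window: base^(m-1) mod mod.
    high = pow(base, m - 1, mod)
    hpattern = htext = 0
    for i in range(m):
        hpattern = (hpattern * base + ord(pattern[i])) % mod
        htext = (htext * base + ord(text[i])) % mod
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes are only a hint; verify to rule out collisions.
        if htext == hpattern and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop text[s], shift the window, append text[s + m].
            htext = ((htext - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return shifts


if __name__ == "__main__":
    print(rabin_karp("abracadabra", "abra"))
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction.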

Properties of the Rabin-Karp Algorithm

• Worst-case running time (as for the naive approach) is $O((n - m + 1)\,m)$, because every hash hit still has to be verified character by character.
• Good on average, i.e. $O(n + m)$.

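Boyer-Moore

• Compare the pattern with the text from right to left.
• It is possible that some text characters are never compared.
• Good explanation in Dan Gusfield, Algorithms on Strings, Trees, and Sequences.
• Bad character shifts: after a mismatch, shift the pattern right until the rightmost occurrence of the mismatched text character in the pattern lines up with it, but always by at least one position.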
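A minimal sketch of right-to-left matching with bad character shifts only follows. It is a simplification of Boyer-Moore, not the full algorithm (the good suffix rule is left out), and the name `bad_character_search` is hypothetical.

```python
def bad_character_search(text, pattern):
    """Right-to-left matching using only the bad character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of every character that occurs in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    shifts = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare from the right end of the pattern towards the left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1
        else:
            # Align the rightmost occurrence of the mismatched text
            # character with it, moving by at least one position.
            s += max(1, j - last.get(text[s + j], -1))
    return shifts


if __name__ == "__main__":
    print(bad_character_search("abracadabra", "abra"))
```

When the mismatched text character does not occur in the pattern at all, the pattern jumps completely past it, which is why some text characters are never inspected.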

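Boyer-Moore, strong good suffix rule

After a mismatch, let t be the suffix of P that has already matched. Shift P to the right until the rightmost other copy of t in P that is preceded by a character different from the mismatched pattern character lines up with t in T.

    T: prstabstubabvqxrst
                *
    P:   qcabdabdab
    P:         qcabdabdab

Here t = ab, and the mismatch (marked *) is between P[8] = d and T[10] = b. The copy of ab at P[6..7] is preceded by d and is therefore skipped; the copy at P[3..4] is preceded by c, so P is shifted by 6 positions.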
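The shift in this example can be checked with a brute-force sketch of the strong good suffix rule. Real implementations precompute all shifts in linear time; the function below is only an illustration, its name `strong_good_suffix_shift` is hypothetical, and it uses 0-based indices, so the mismatch at P[8] becomes index 7.

```python
def strong_good_suffix_shift(pattern, mismatch):
    """Shift for a mismatch at 0-based index `mismatch`, assuming
    pattern[mismatch + 1:] has already matched the text."""
    m = len(pattern)
    t = pattern[mismatch + 1:]
    # Case 1: rightmost other copy of t that is preceded by a character
    # different from the mismatched one (or that starts the pattern).
    for j in range(mismatch, -1, -1):
        if pattern[j:j + len(t)] == t and (
            j == 0 or pattern[j - 1] != pattern[mismatch]
        ):
            return mismatch + 1 - j
    # Case 2: longest proper suffix of t that is a prefix of the pattern.
    for k in range(len(t) - 1, 0, -1):
        if pattern[:k] == pattern[m - k:]:
            return m - k
    # Case 3: nothing can be reused, move past the whole window.
    return m


if __name__ == "__main__":
    print(strong_good_suffix_shift("qcabdabdab", 7))
```

The weaker form of the rule, which ignores the preceding character, would stop at the nearer copy of ab and shift by only 3.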

Properties of the Boyer-Moore Algorithm

• Worst case O(n) if the pattern does not occur in the text (this bound relies on the good suffix rule).
• Best case O(n/m) running time.
• In practice one of the best known algorithms for string matching.

For details see e.g. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string

Properties of the KMP Algorithm

• Worst-case running time is O(n) for the search, plus O(m) for preprocessing the pattern.
• In practice usually slower than Boyer-Moore, but easier to code.
• No details here.
• Extension: Aho-Corasick for matching multiple strings in one pass.
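Although the slides give no details, a compact sketch may illustrate why KMP is considered easy to code. It follows the common prefix-function formulation; the names `prefix_function` and `kmp_search` are illustrative and not from the source.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text, pattern):
    """Return all 0-based shifts at which pattern occurs in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    shifts = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back to the next shorter border.
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            shifts.append(i - k + 1)
            k = pi[k - 1]
    return shifts


if __name__ == "__main__":
    print(kmp_search("abracadabra", "abra"))
```

The text index never moves backwards, which is where the linear search time comes from.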

Tries

• Data structure for a set of strings.
• Each node corresponds to a prefix of some string.
• Each edge corresponds to a character.
• Example taken from Wikipedia: to, tea, ten, i, in, and inn.

[Figure: trie with the nodes t, to, te, tea, ten, i, in, inn; the keys carry the values to = 7, tea = 3, ten = 12, i = 11, in = 5, inn = 9.]
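A trie for the example keys can be sketched as follows. The class and method names (`TrieNode`, `Trie`, `insert`, `get`, `keys_with_prefix`) are illustrative assumptions, and the values attached to the keys are those of the Wikipedia example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # edge label (one character) -> child node
        self.value = None   # set only if the node's prefix is a stored key


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.value = value

    def _find(self, prefix):
        """Follow the edges spelled by prefix; None if the path is missing."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def get(self, key):
        node = self._find(key)
        return None if node is None else node.value

    def keys_with_prefix(self, prefix):
        node = self._find(prefix)
        if node is None:
            return []
        found, stack = [], [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.value is not None:
                found.append(word)
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return sorted(found)


trie = Trie()
for key, value in [("to", 7), ("tea", 3), ("ten", 12),
                   ("i", 11), ("in", 5), ("inn", 9)]:
    trie.insert(key, value)
```

A lookup walks one edge per character, so its cost depends on the key length and not on the number of stored strings.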

Suffix Trees/Tries/Arrays

Suffix tries/trees
• Preprocess the text, not the pattern.
• Tree containing every suffix of a text (size?).
• Fast searching for any substring.
• Trie → tree: one edge for each path without branches.
• There are linear-time construction algorithms for suffix trees (clearly linear size).

Suffix arrays
• Array of length |S| listing the suffixes of S in ascending order.
• (Simple) search in O(m log n) time.
• A simple implementation in O(n^2 log n) time and O(n) space is often sufficient.
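The simple suffix array construction and the binary search can be sketched as below. The function names `build_suffix_array` and `find_occurrences` are hypothetical. Note one deviation from the stated bounds: sorting with a slice as the key is the O(n^2 log n) approach, but Python materializes the suffix copies while sorting, so this version temporarily uses more than O(n) memory.

```python
def build_suffix_array(s):
    """Start positions of all suffixes of s, in ascending suffix order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, pattern):
    """All start positions of pattern in s, via two binary searches."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        return s[sa[rank]:sa[rank] + m]

    # Lower bound: first suffix whose m-prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa, find_occurrences(text, sa, "ana"))
```

Each binary search step compares at most `m` characters, which gives the O(m log n) search time.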
