Author: tieppv
String Matching
String Matching Algorithms
Finding patterns in a given text
Data structures: tries, suffix tries, suffix arrays
Algorithms: naive approach, Boyer-Moore, Rabin-Karp, Knuth-Morris-Pratt (KMP)
Literature: Dan Gusfield, Algorithms on Strings, Trees, and Sequences; CLRS (Cormen, ...), Introduction to Algorithms
Naive Approach
  n = text.size(); m = pattern.size();
  for s = 0 to n - m {
    if (pattern[1 .. m] = text[s+1 .. s+m])
      add_result(s);
  }
For T = a^n, P = a^m and m = n/2 the worst case occurs: all n - m + 1 shifts are tried and every comparison runs over all m characters, giving a running time of Θ((n - m + 1)m) = Θ(n^2).
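The pseudocode above can be made runnable as a short Python sketch. It uses 0-based indices (so the slide's `pattern[1 .. m] = text[s+1 .. s+m]` becomes `pattern[0:m] == text[s:s+m]`), and the function name `naive_search` is hypothetical.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    result = []
    for s in range(n - m + 1):
        # compare character by character, stopping at the first mismatch
        j = 0
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            result.append(s)
    return result


print(naive_search("abracadabra", "abra"))  # [0, 7]
print(naive_search("a" * 8, "a" * 4))       # worst-case input: every shift matches
```

With T = a^n and P = a^m the inner `while` loop never exits early, which is exactly the Θ((n - m + 1)m) case from the slide.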
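Rabin-Karp
  n = text.length(); m = pattern.length();
  hpattern = hash(pattern[0 .. m-1]);
  htext = hash(text[0 .. m-1]);
  for s = 0 to n - m {
    if (htext == hpattern)
      if (pattern[0 .. m-1] = text[s .. s+m-1])   // rule out hash collisions
        add_result(s);
    if (s < n - m)
      htext = hash(text[s+1 .. s+m]);   // computed in O(1) from the previous htext (rolling hash)
  }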
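The slide leaves `hash` unspecified. A minimal Python sketch, assuming a polynomial rolling hash with base 256 and modulus 1_000_000_007 (both arbitrary choices, as is the name `rabin_karp`), shows how the window hash is updated in O(1) by removing the leading character and appending the next one:

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return all 0-based shifts where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    if m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leftmost character in a window
    h_pattern = h_text = 0
    for i in range(m):
        h_pattern = (h_pattern * base + ord(pattern[i])) % mod
        h_text = (h_text * base + ord(text[i])) % mod
    result = []
    for s in range(n - m + 1):
        # equal hashes are only a filter; compare the strings to exclude collisions
        if h_text == h_pattern and text[s:s + m] == pattern:
            result.append(s)
        if s < n - m:
            # roll the window: drop text[s], append text[s + m]
            h_text = ((h_text - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return result


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction in the rolling step needs no extra correction. A small modulus (e.g. `mod=3`) produces many spurious hash hits; the explicit comparison keeps the result correct but pushes the running time toward the O((n - m + 1)m) worst case.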
Properties of the Rabin-Karp algorithm
Worst-case running time (as for the naive approach) is O((n - m + 1)m).
On average it is good, i.e. O(n + m), provided spurious hash hits are rare.
Boyer-Moore
Compares the pattern with the text from right to left.
Some text characters may never be compared at all.
Good explanation in Dan Gusfield, Algorithms on Strings, Trees, and Sequences.
Bad character shifts
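A minimal Python sketch of the right-to-left comparison with bad character shifts only (no good suffix rule), assuming the simple variant that stores the rightmost position of each character in the pattern. The names `bad_char_table` and `bm_bad_char_search` are hypothetical. On a mismatch the pattern is moved so that the rightmost occurrence of the mismatched text character lines up with it; if that occurrence lies to the right of the mismatch, the sketch falls back to a shift of 1.

```python
def bad_char_table(pattern: str) -> dict[str, int]:
    """Map each character to its rightmost 0-based index in pattern."""
    # later indices overwrite earlier ones, so the rightmost position wins
    return {c: i for i, c in enumerate(pattern)}


def bm_bad_char_search(text: str, pattern: str) -> list[int]:
    """Boyer-Moore style search using only the bad character rule."""
    n, m = len(text), len(pattern)
    last = bad_char_table(pattern)
    result = []
    s = 0
    while s <= n - m:
        j = m - 1
        # compare right to left
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            result.append(s)
            s += 1  # simple safe shift after a full match
        else:
            # characters absent from the pattern give last = -1: jump past them
            s += max(1, j - last.get(text[s + j], -1))
    return result


print(bm_bad_char_search("abracadabra", "abra"))  # [0, 7]
```

In the example, the window "brac" mismatches on `c`, which does not occur in "abra", so the whole pattern jumps past it; this is how some text characters are never compared.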
Boyer-Moore, (strong) good suffix rule
T: prstabstubabvqxrst
            *
P:   qcabdabdab
P:         qcabdabdab
After "ab" has matched, P position 8 (d) mismatches T position 10 (b). The rightmost copy of "ab" in P that is not preceded by d starts at P position 3 (preceded by c), so P is shifted right by 6 places. The weak rule would use the copy at positions 6-7, which is preceded by d again.
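The shift in this example can be checked mechanically. The Python sketch below computes strong good suffix shifts straight from the definition: for a mismatch at pattern index `j` it tries d = 1, 2, ... and returns the first shift that agrees with the already matched suffix and puts a character different from `pattern[j]` under the mismatch (the "strong" condition). This is a deliberately simple cubic-time preprocessing written for clarity, not Gusfield's linear-time method; the function names are hypothetical, and indices are 0-based, so the slide's mismatch at P position 8 is `j = 7`.

```python
def good_suffix_shift(pattern: str, j: int) -> int:
    """Strong good suffix shift after a mismatch at index j (j = -1: full match)."""
    m = len(pattern)
    for d in range(1, m + 1):
        # the shifted pattern must agree with the matched suffix pattern[j+1:]
        agrees = all(pattern[i - d] == pattern[i]
                     for i in range(j + 1, m) if i - d >= 0)
        # strong rule: the character now under the mismatch must differ from pattern[j]
        strong = j - d < 0 or pattern[j - d] != pattern[j]
        if agrees and strong:
            return d
    return m  # only reached for an empty pattern


def bm_good_suffix_search(text: str, pattern: str) -> list[int]:
    """Right-to-left search that shifts by the strong good suffix rule only."""
    n, m = len(text), len(pattern)
    if m == 0:
        return list(range(n + 1))
    # shifts[j + 1] is the shift for a mismatch at j; shifts[0] follows a full match
    shifts = [good_suffix_shift(pattern, j) for j in range(-1, m)]
    result, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            result.append(s)
        s += shifts[j + 1]
    return result


print(good_suffix_shift("qcabdabdab", 7))  # 6, as on the slide
```

Every rejected shift d would place a character under some already inspected text position that cannot match, so skipping it never misses an occurrence.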
Properties of the Boyer-Moore algorithm
If the pattern does not occur in the text, the worst-case running time is O(n) (with the strong good suffix rule).
Best case: O(n/m) running time.
In practice one of the fastest known algorithms for string matching.
For details see, e.g., http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string
Properties of the KMP algorithm
Worst-case running time is O(n) for the search, plus O(m) to preprocess the pattern.
In practice usually slower than Boyer-Moore, but easier to code.
No details here.
Extension: Aho-Corasick for matching multiple patterns in one pass.
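The slide skips the details, so here is a minimal Python sketch of KMP in the usual prefix-function formulation (as in CLRS). `pi[q]` is the length of the longest proper prefix of `pattern[:q+1]` that is also a suffix of it; on a mismatch the search falls back along `pi` instead of moving backwards in the text, which is why each text character is read only once. The function names are hypothetical.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper border of pattern[:q + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all 0-based shifts where pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    pi = prefix_function(pattern)
    result, k = [], 0  # k = number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and pattern[k] != c:
            k = pi[k - 1]
        if pattern[k] == c:
            k += 1
        if k == m:
            result.append(i - m + 1)
            k = pi[k - 1]  # continue, allowing overlapping matches
    return result


print(prefix_function("ababaca"))    # [0, 0, 1, 2, 3, 0, 1]
print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```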
Tries
Data structure for a set of strings
Each node corresponds to a prefix of some string
Each edge corresponds to a character
Example stolen from Wikipedia: to, tea, ten, i, in, and inn
[Figure: trie for these keys; the root has children t and i, t has children to and te, te has children tea and ten, i has child in, in has child inn]
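The trie on this slide can be sketched in Python with a dictionary of children per node. The class and method names (`Trie`, `insert`, `contains`, `has_prefix`) are hypothetical; the point is that each node stands for one prefix and each dictionary key for one edge character, so a lookup costs one step per character of the query.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}  # edge character -> child node
        self.is_key = False  # True if the prefix ending here is a stored string


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()  # the root represents the empty prefix

    def insert(self, word: str) -> None:
        node = self.root
        for c in word:
            node = node.children.setdefault(c, TrieNode())
        node.is_key = True

    def _find(self, prefix: str) -> "TrieNode | None":
        node = self.root
        for c in prefix:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_key

    def has_prefix(self, prefix: str) -> bool:
        return self._find(prefix) is not None


trie = Trie()
for w in ["to", "tea", "ten", "i", "in", "inn"]:
    trie.insert(w)
print(trie.contains("tea"), trie.contains("te"), trie.has_prefix("te"))  # True False True
```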
Suffix Trees/Tries/Arrays
Suffix tries/trees
Preprocess the text, not the pattern
Tree containing every suffix of a text (size?)
Fast search for any substring
Trie to tree: one edge for each path without branches
There are linear-time algorithms for suffix trees (clearly linear size)
Suffix arrays
Array of length |S| listing the suffixes of S in ascending (lexicographic) order
(Simple) search in O(m log n) time
Simple implementation in O(n^2 log n) time and O(n) space, often sufficient
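A minimal Python sketch of the simple suffix array approach, assuming the suffixes are sorted with a character-by-character comparison: O(n log n) comparisons of O(n) each gives O(n^2 log n) time, and only the index array is stored, which matches the O(n) space on the slide. Searching uses two binary searches over the sorted suffixes, comparing only the first m characters each time, for O(m log n). The names `suffix_array` and `find_occurrences` are hypothetical.

```python
from functools import cmp_to_key


def suffix_array(s: str) -> list[int]:
    """Start indices of the suffixes of s in ascending lexicographic order."""
    n = len(s)

    def compare(i: int, j: int) -> int:
        # compare s[i:] with s[j:] without building the slices
        while i < n and j < n:
            if s[i] != s[j]:
                return -1 if s[i] < s[j] else 1
            i += 1
            j += 1
        # one suffix ran out: it is a prefix of the other and sorts first
        if i == n and j == n:
            return 0
        return -1 if i == n else 1

    return sorted(range(n), key=cmp_to_key(compare))


def find_occurrences(s: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in s, via binary search on sa."""
    m = len(pattern)
    # first suffix whose length-m prefix is >= pattern
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    # first suffix whose length-m prefix is > pattern
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = suffix_array("banana")
print(sa)                                   # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

Both binary searches are valid because truncating lexicographically sorted suffixes to their first m characters keeps them in (non-strictly) sorted order, so all suffixes starting with the pattern form one contiguous block.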