11 String Matching

Author: tieppv

String Matching Algorithms
Finding patterns in a given text
  • Data structures: tries, suffix tries, suffix arrays
  • Algorithms: naive approach, Boyer-Moore, Rabin-Karp, Knuth-Morris-Pratt (KMP)

Literature:
  • Dan Gusfield, Algorithms on Strings, Trees, and Sequences
  • CLRS (Cormen, . . .), Introduction to Algorithms

Naive Approach
  n = text.size(); m = pattern.size();
  for s = 0 to n - m {
      if (pattern[1 .. m] = text[s+1 .. s+m])
          add_result(s);
  }
For $T = a^n$, $P = a^m$ and $m = n/2$ the worst case occurs: every one of the $n - m + 1$ shifts compares all $m$ characters, yielding a running time of $\Theta((n - m + 1)m) = \Theta(n^2)$.
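
A minimal Python sketch of the naive approach, assuming 0-indexed strings (the pseudocode is 1-indexed) and a hypothetical function name `naive_search`. It also counts character comparisons, so the worst case can be checked directly.

```python
def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return (shifts where pattern occurs in text, number of character comparisons).

    Shifts are 0-indexed: shift s means pattern == text[s : s + len(pattern)].
    """
    n, m = len(text), len(pattern)
    matches = []
    comparisons = 0
    for s in range(n - m + 1):
        j = 0
        # Compare left to right until a mismatch or a full match.
        while j < m:
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(s)
    return matches, comparisons
```

For example, `naive_search("a" * 8, "a" * 4)` returns `([0, 1, 2, 3, 4], 20)`: all 5 shifts compare all 4 characters, i.e. $(n - m + 1)m$ comparisons.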

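Rabin-Karp
  n = text.length(); m = pattern.length();
  hpattern = hash(pattern[0 .. m-1]);
  htext = hash(text[0 .. m-1]);
  for s = 0 to n - m {
      if (htext == hpattern)                          // equal hashes may be a spurious hit,
          if (pattern[0 .. m-1] = text[s .. s+m-1])   // so verify the characters
              add_result(s);
      if (s < n - m)
          htext = hash(text[s+1 .. s+m]);             // rolling update from the old htext in O(1)
  }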
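
The slide leaves the hash function open. A common choice, assumed in this Python sketch, is the polynomial hash $h(x_0 \dots x_{m-1}) = \sum_i x_i d^{m-1-i} \bmod q$, which can be rolled in O(1) via $h_{s+1} = (d\,(h_s - T[s]\,d^{m-1}) + T[s+m]) \bmod q$. The base $d = 256$, the prime $q = 1\,000\,003$ and the name `rabin_karp` are illustrative assumptions.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return all 0-indexed shifts where pattern occurs in text.

    Assumes a non-empty pattern; base and mod are illustrative choices.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character: base^(m-1) mod q
    h_pattern = h_text = 0
    for i in range(m):
        h_pattern = (h_pattern * base + ord(pattern[i])) % mod
        h_text = (h_text * base + ord(text[i])) % mod
    matches = []
    for s in range(n - m + 1):
        # Equal hashes can be a spurious hit, so verify the characters.
        if h_text == h_pattern and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Rolling update in O(1): drop text[s], append text[s + m].
            h_text = ((h_text - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches
```

With a deliberately tiny modulus such as `mod=3`, most windows collide with the pattern hash; the result stays correct because every hit is verified, but the verification work pushes the running time toward the worst case.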

Properties of the Rabin-Karp Algorithm
  • Worst-case running time (as for the naive approach) is $O((n - m + 1)m)$.
  • On average it is good, i.e. $O(n + m)$, as long as spurious hash hits are rare.

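Boyer-Moore
  • Compare right to left.
  • Some text characters may never be compared at all.
  • Good explanation in Dan Gusfield, Algorithms on Strings, Trees, and Sequences.
  • Bad character shifts: on a mismatch, shift the pattern right so that the mismatched text character lines up with its rightmost occurrence in the pattern, or move the pattern past it if the character does not occur (always shifting by at least one).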
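
A Python sketch of Boyer-Moore restricted to the simple bad character rule. It deliberately leaves out the good suffix rule, so after a full match it conservatively shifts by 1. The name `boyer_moore_bad_char` is hypothetical; indices are 0-based.

```python
def boyer_moore_bad_char(text: str, pattern: str) -> list[int]:
    """Return all 0-indexed shifts where pattern occurs (bad character rule only)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # last[c] = rightmost index of character c in pattern (later indices overwrite earlier).
    last = {c: i for i, c in enumerate(pattern)}
    matches = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1  # no good suffix rule here, so take the safe shift of 1
        else:
            # Align the rightmost occurrence of the mismatched text character,
            # or jump past it if it does not occur in the pattern (index -1).
            s += max(1, j - last.get(text[s + j], -1))
    return matches
```

When the mismatched text character does not occur in the pattern at all, a mismatch at the last pattern position moves the pattern by $m$; this is the situation behind the $O(n/m)$ best case.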

Boyer-Moore, strong good suffix rule
  T: prstabstubabvqxrst
              *
  P:   qcabdabdab
The mismatch (*) is between P[8] = d and T[10] = b, so the suffix t = ab of P has matched. The rightmost copy of ab in P that is not a suffix of P and is not preceded by d starts at P[3]; shifting P right by 6 aligns that copy with t:
  T: prstabstubabvqxrst
  P:         qcabdabdab
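
To check the shift in the example, here is a naive Python sketch that computes the good suffix shift for a single mismatch. It assumes 0-indexed positions and that the example's mismatch is at the 8th pattern character (`d`, index 7) with matched suffix t = ab. Practical implementations precompute these shifts for every position in O(m) time; this O(m²) version only illustrates the rule. The name `good_suffix_shift` and the `strong` flag are hypothetical.

```python
def good_suffix_shift(pattern: str, mismatch: int, strong: bool = True) -> int:
    """Shift prescribed by the good suffix rule (naive O(m^2) computation).

    mismatch: 0-indexed position in pattern where the right-to-left comparison
    failed, so t = pattern[mismatch + 1:] matched the text. Assumes t is non-empty.
    """
    m = len(pattern)
    t = pattern[mismatch + 1:]
    k = len(t)
    # Rightmost copy t' = pattern[start:end] of t that is not a suffix of pattern.
    for end in range(m - 1, k - 1, -1):
        start = end - k
        if pattern[start:end] != t:
            continue
        # Strong rule: the character left of t' must differ from pattern[mismatch].
        if strong and start > 0 and pattern[start - 1] == pattern[mismatch]:
            continue
        return m - end  # moves t' under the text occurrence of t
    # No such copy: align the longest prefix of pattern that is a suffix of t.
    for length in range(k, -1, -1):
        if pattern[:length] == t[k - length:]:
            return m - length
    return m  # not reached: length 0 always matches
```

`good_suffix_shift("qcabdabdab", 7)` gives 6. With `strong=False` the rule stops at the copy of ab at index 5, which is preceded by the same `d`, and yields only 3; that shift puts the same `d` opposite the same text `b` again, which is exactly the mismatch the strong rule avoids.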

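Properties of the Boyer-Moore Algorithm
  • Worst case $O(n)$ if the pattern does not occur in the text (using the strong good suffix rule).
  • If the pattern does occur, the basic algorithm can degrade to $\Theta(nm)$ (e.g. $T = a^n$, $P = a^m$); the Galil rule restores a linear bound.
  • Best case $O(n/m)$ running time.
  • In practice one of the best known algorithms for string matching.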

For details, see e.g. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string

Properties of the KMP Algorithm
  • Worst-case running time is $O(n)$ to scan the text, plus $O(m)$ to preprocess the pattern.
  • In practice usually slower than Boyer-Moore, but easier to code.
  • No details here.
  • Extension: Aho-Corasick for matching multiple patterns in one pass.
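
Since the slide skips the details, a compact Python sketch of KMP may help: `prefix_function` computes the failure function (CLRS calls it the prefix function π) in O(m), and `kmp_search` scans the text once in O(n) without ever moving backwards in the text. Function names are hypothetical; indices are 0-based.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back through shorter borders until pattern[i] can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all 0-indexed shifts where pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = pi[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = pi[k - 1]  # continue, allowing overlapping matches
    return matches
```

For example, `prefix_function("ababaca")` returns `[0, 0, 1, 2, 3, 0, 1]`.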

Tries
  • Data structure for a set of strings.
  • Each node corresponds to a prefix of some string.
  • Each edge corresponds to a character.
  • Example stolen from Wikipedia: to, tea, ten, i, in, and inn.
[Figure: trie storing to, tea, ten, i, in, inn; besides the root it has one node per prefix: t, to, te, tea, ten, i, in, inn]
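
A minimal Python sketch of a trie, assuming a dictionary of children per node (one entry per outgoing edge/character) and hypothetical method names. It builds the example above; the root stands for the empty prefix.

```python
class TrieNode:
    """One node per prefix; children maps a character (edge label) to a child node."""

    def __init__(self) -> None:
        self.children = {}
        self.is_key = False  # True if the prefix spelled by this node is a stored string


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()  # corresponds to the empty prefix

    def insert(self, word: str) -> None:
        node = self.root
        for c in word:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
        node.is_key = True

    def _find(self, prefix: str):
        """Return the node for prefix, or None if no stored string starts with it."""
        node = self.root
        for c in prefix:
            node = node.children.get(c)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_key

    def has_prefix(self, prefix: str) -> bool:
        return self._find(prefix) is not None

    def node_count(self) -> int:
        count, stack = 0, [self.root]
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count
```

After inserting to, tea, ten, i, in, inn, `contains("tea")` is true, `contains("te")` is false (te is only a prefix), `has_prefix("te")` is true, and `node_count()` is 9: the root plus t, to, te, tea, ten, i, in, inn.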

Suffix Trees/Tries/Arrays
Suffix Tries/Trees
  • Preprocess the text, not the pattern.
  • A tree containing every suffix of a text (size?)
  • Fast searching for any substring.
  • Trie → tree: one edge for paths without branches.
  • There are linear-time algorithms for suffix trees (clearly of linear size).
Suffix Arrays
  • An array of length |S| listing the suffixes of S in ascending order.
  • (Simple) search in $O(m \log n)$ time.
  • A simple implementation in $O(n^2 \log n)$ time and $O(n)$ space is often sufficient.
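
A Python sketch of the simple suffix array approach, with hypothetical function names. Suffixes are sorted by direct character comparison without copying them ($O(n \log n)$ comparisons of $O(n)$ each, so $O(n^2 \log n)$ time while storing only the $n$ start indices), and a pattern is located with two binary searches that compare at most $m$ characters per step, i.e. $O(m \log n)$.

```python
from functools import cmp_to_key


def compare_suffixes(s: str, i: int, j: int) -> int:
    """Lexicographically compare suffixes s[i:] and s[j:] without slicing them."""
    n = len(s)
    while i < n and j < n:
        if s[i] != s[j]:
            return -1 if s[i] < s[j] else 1
        i += 1
        j += 1
    # One suffix is a prefix of the other: the shorter one sorts first.
    return (n - i) - (n - j)


def build_suffix_array(s: str) -> list[int]:
    return sorted(range(len(s)), key=cmp_to_key(lambda i, j: compare_suffixes(s, i, j)))


def find_occurrences(s: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in s, using the suffix array sa."""
    m = len(pattern)
    # Length-m prefixes of sorted suffixes are sorted too, so matches form one block.
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For S = banana the suffix array is `[5, 3, 1, 0, 4, 2]` (a, ana, anana, banana, na, nana), and `find_occurrences` for ana returns `[1, 3]`.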