Author: marguerite-lefevre
Introduction to Bioinformatics
Exact String Matching Algorithms
Presented By Dr. Shazzad Hosain
Asst. Prof. EECS, NSU

Exact Matching: What's the Problem
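T = bbabaxababay
P = aba
i:   1  2  3  4  5  6  7  8  9 10 11 12
T:   b  b  a  b  a  x  a  b  a  b  a  y
P occurs in T starting at locations 3, 7, and 9. Occurrences of P may overlap, as the ones at 7 and 9 do.

The Naive Method
The problem is to find whether a pattern P[1..m] occurs within a text T[1..n].
Let P = abxyabxz and T = xabxyabxyabxz, where m = 8 and n = 13.
If P = aaa and T = aaaaaaaaaa, then m = 3 and n = 10.
In the worst case (as here) the naive method makes exactly m(n - m + 1) comparisons: it tries n - m + 1 alignments and compares all m characters at each one.
In this case that is 3 × (10 - 3 + 1) = 24 comparisons, so the running time is O(mn).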
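To make the comparison count concrete, here is a minimal C sketch of the naive method (an illustration, not the slides' own code). The function name `naive_match`, its output array and the `comparisons` counter are assumptions introduced here; positions are reported 1-based to match the slides, although the loop uses C's 0-based indexing internally.

```c
#include <assert.h>
#include <string.h>

/* Naive exact matching (illustrative sketch, not the slides' code).
 * Tries every alignment of pat against text, comparing left to right
 * until a mismatch or a full match.
 * Stores up to max_out 1-based start positions in out, sets
 * *comparisons to the number of character comparisons made, and
 * returns the total number of occurrences found. */
static int naive_match(const char *text, const char *pat,
                       int *out, int max_out, long *comparisons)
{
    int n = (int)strlen(text);
    int m = (int)strlen(pat);
    int count = 0;

    assert(m > 0);
    *comparisons = 0;
    for (int s = 0; s + m <= n; s++) {      /* n - m + 1 alignments */
        int j = 0;
        while (j < m) {
            (*comparisons)++;               /* compare T(s+j+1) with P(j+1) */
            if (text[s + j] != pat[j])
                break;
            j++;
        }
        if (j == m) {                       /* all m characters matched */
            if (count < max_out)
                out[count] = s + 1;         /* report 1-based position */
            count++;
        }
    }
    return count;
}
```

For T = bbabaxababay and P = aba this reports positions 3, 7 and 9; for P = aaa and T = aaaaaaaaaa it makes 3 × 8 = 24 comparisons, matching the worst-case count.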
The Naive Algorithm
    char text[], pat[];
    int n, m;
    {
        int i, j, k, lim;
        lim = n - m + 1;
        for (i = 1; i

Basic String Definitions/Notations
i:   1  2  3  4  5  6  7  8  9 10 11 12
S:   b  b  a  b  a  x  a  b  a  b  a  y
S[3..7] = abaxa
S[1..4] = bbab
|S| is the length of the string. Here, |S| = 12.
Prefix: S[1..i] is the prefix of S that ends at position i.
Suffix: S[i..|S|] is the suffix of S that begins at position i. For example, S[9..12] = abay.
A proper prefix, suffix, or substring of S is, respectively, a prefix, suffix, or substring that is neither the entire string S nor the empty string.
For any string S, S(i) denotes the ith character of S.

Preprocessing
Goal: To gather the information needed to speed up the algorithm.
Definitions:
Zi: For i > 1, the length of the longest substring of S that starts at i and matches a prefix of S.
Z-box: For any position i > 1 where Zi > 0, the Z-box at i starts at i and ends at i + Zi - 1.
ri: For every i > 1, ri is the right-most endpoint of the Z-boxes that begin at or before position i.
li: For every i > 1, li is the left endpoint of the Z-box that ends at ri.

Zi(S) = the length of the longest prefix of S[i..|S|] that matches a prefix of S, where i > 1.
i:   1  2  3  4  5  6  7  8  9 10 11
S:   a  a  b  c  a  a  b  x  a  a  z
Z5(S) = 3 (aabc...aabx)
Z6(S) = 1 (aa...ab)
Z7(S) = Z8(S) = 0
Z9(S) = 2 (aab...aaz)
We will use Zi in place of Zi(S).

Z Box (defined for i > 1, where Zi is greater than zero)
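The definition of Zi can be checked directly with a brute-force routine. This is a quadratic reference sketch, assuming 1-based positions as on the slides; `z_naive` is a hypothetical name, and this is not the linear-time Z algorithm.

```c
#include <assert.h>
#include <string.h>

/* Zi(S) computed directly from the definition (brute force).
 * i is 1-based and must satisfy 1 < i <= |S|. Each call costs up to
 * O(|S|), so filling all Zi this way is O(|S|^2). */
static int z_naive(const char *s, int i)
{
    int n = (int)strlen(s);
    int k = 0;

    assert(1 < i && i <= n);
    /* S(i+k) is s[i-1+k] and S(1+k) is s[k] in C's 0-based indexing */
    while (i - 1 + k < n && s[i - 1 + k] == s[k])
        k++;
    return k;
}
```

On S = aabcaabxaaz it gives Z5 = 3, Z6 = 1, Z7 = Z8 = 0 and Z9 = 2, as listed above.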
Figure 1.2: From Gusfield

The li and ri of Z-Box
[Figure: Z-boxes of S, with boundary positions marked between 40 and 95]
ri = the right-most endpoint of the Z-boxes that begin at or before position i.
li = the left end of the Z-box that ends at ri.
r78 = 95, l78 = 78
r82 = 95, l82 = 78
r52 = 50, l52 = 40
r75 = 85, l75 = 70

Z-box
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z:   0  1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
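Reading the Z-boxes off the Z row of the table is mechanical: each position i > 1 with Zi > 0 contributes the interval [i, i + Zi - 1]. A small C sketch, assuming the Z values are stored 1-based in an array `z` (the name `z_boxes` and the output layout are our own choices):

```c
#include <assert.h>

/* Collect the Z-boxes [i, i + Zi - 1] for every position i > 1 with
 * Zi > 0. z[] is indexed 1..n as on the slides (z[0] and z[1] are
 * ignored). box must have room for n - 1 pairs; returns the count. */
static int z_boxes(const int *z, int n, int box[][2])
{
    int count = 0;

    assert(n >= 1);
    for (int i = 2; i <= n; i++) {
        if (z[i] > 0) {
            box[count][0] = i;              /* box starts at i */
            box[count][1] = i + z[i] - 1;   /* and ends at i + Zi - 1 */
            count++;
        }
    }
    return count;
}
```

For the 17-character example it yields eight boxes: [2,2], [4,6], [5,5], [8,8], [10,16], [11,11], [13,15] and [14,14].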
i:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S:   a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
ri:  0  2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
li:  0  2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10

Z-Algorithm
Goal: To calculate Zi for an input string S in linear time.
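The ri and li rows can likewise be derived from the Z values by one left-to-right scan that remembers the Z-box reaching farthest right. This sketch makes two assumptions: positions with no Z-box yet get 0, as in the table, and when two Z-boxes end at the same ri the earlier one is kept (the definition allows either). `r_and_l` is a hypothetical name.

```c
#include <assert.h>

/* Compute ri and li (1-based, i = 1..n) from the Z values z[1..n].
 * ri = right-most end of the Z-boxes beginning at or before i;
 * li = left end of a Z-box ending at ri. r and l need n + 1 slots.
 * Positions not yet covered by any Z-box get ri = li = 0. */
static void r_and_l(const int *z, int n, int *r, int *l)
{
    int cur_r = 0, cur_l = 0;

    assert(n >= 1);
    r[1] = l[1] = 0;                        /* undefined for i = 1 */
    for (int i = 2; i <= n; i++) {
        /* strict '>' keeps the earlier box on ties */
        if (z[i] > 0 && i + z[i] - 1 > cur_r) {
            cur_r = i + z[i] - 1;           /* new right-most endpoint */
            cur_l = i;                      /* the box that reaches it */
        }
        r[i] = cur_r;
        l[i] = cur_l;
    }
}
```

Applied to the Z row of the 17-character example, it reproduces the ri and li rows of the table.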
Starting from i = 2, calculate Z2, r2 and l2.
For i=3; ir
Compare the characters starting at k+1 with those starting at 1.
Update r and l if necessary.

Input: Pattern P
Output: Zi
Calculate Z2, r2 and l2 specifically by comparisons. Set r = r2 and l = l2.
for i=3; i
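The steps above can be fleshed out along the lines of the Z algorithm in Gusfield's book, which the slides cite. The C sketch below is our own rendering of that case analysis, not the slides' pseudocode: Case 1 (k > r) compares explicitly from position k; Case 2a (Zk' < |β|, with k' = k - l + 1 and β = S[k..r]) copies Zk'; Case 2b extends the match past r. The function name `z_algorithm` and the array conventions are assumptions.

```c
#include <assert.h>
#include <string.h>

/* Linear-time Z algorithm, sketched after Gusfield's case analysis.
 * Fills z[2..n] for s with n = strlen(s); z needs n + 1 slots
 * (z[0] and z[1] are set to 0 and unused).
 * Positions are 1-based as on the slides: S(p) is s[p - 1]. */
static void z_algorithm(const char *s, int *z)
{
    int n = (int)strlen(s);
    int l = 0, r = 0;                       /* current Z-box [l, r]; r = 0: none yet */

    assert(n >= 1);
    z[0] = z[1] = 0;
    for (int k = 2; k <= n; k++) {
        if (k > r) {
            /* Case 1: compare S[k..] with S[1..] explicitly */
            int q = 0;
            while (k + q <= n && s[k + q - 1] == s[q])
                q++;
            z[k] = q;
            if (q > 0) { l = k; r = k + q - 1; }
        } else {
            int kp = k - l + 1;             /* k': position of S(k) inside the prefix copy */
            int beta = r - k + 1;           /* |beta| = |S[k..r]| */
            if (z[kp] < beta) {
                /* Case 2a: answer already known from k' */
                z[k] = z[kp];
            } else {
                /* Case 2b: S[k..r] matches; compare S(q) with S(q - k + 1)
                 * for q = r + 1, r + 2, ... until a mismatch */
                int q = r + 1;
                while (q <= n && s[q - 1] == s[q - k])
                    q++;
                z[k] = q - k;
                l = k;
                r = q - 1;
            }
        }
    }
}
```

Each explicit match either ends a scan or moves r to the right, and r never decreases, so the total work is O(|S|). On the 17-character example it produces the Z row of the table, and on aabcaabxaaz it gives Z5 = 3, Z6 = 1, Z7 = Z8 = 0, Z9 = 2.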