In the name of God, the Most Gracious, the Most Merciful

Approximate String Matching


1393

Parallel Algorithm for Approximate String Matching with k Differences


Levenshtein distance [2].

The k-differences problem.

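Definition 1 (edit distance):

Let X = X[0..n-1] and Y = Y[0..m-1] be two strings. Edit(X, Y) is the minimum number of edit operations needed to transform X into Y. The allowed operations are: inserting a character into X, deleting a character from X, and substituting a character of X.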
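As an illustration of Definition 1, here is a minimal Python sketch of the standard dynamic-programming computation of Edit(X, Y). The function name `edit_distance` is a choice made for this example and does not come from the slides.

```python
def edit_distance(x: str, y: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    n, m = len(x), len(y)
    prev = list(range(m + 1))      # empty prefix of x -> y[:j] costs j insertions
    for i in range(1, n + 1):
        curr = [i] + [0] * m       # x[:i] -> empty string costs i deletions
        for j in range(1, m + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete x[i-1]
                          curr[j - 1] + 1,      # insert y[j-1]
                          prev[j - 1] + cost)   # substitute, or keep on a match
        prev = curr
    return prev[m]


print(edit_distance("ACT", "TAGT"))  # 2
```

Only two rows are kept in memory because each row depends solely on the row before it.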


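Definition 2 (k-differences problem): Given a pattern P[0..m-1], a text T[0..n-1] and an integer k, find every end position j in T for which some start position i gives Edit(P[0..m-1], T[i..j-1]) ≤ k.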


Example 1: P = ACTACG, T = TAGTACG.


Boundary conditions:

If i = 0, then D[i,j] = 0.
If j = 0, then D[i,j] = i, because turning p[1..i] into the empty string t[1..0] takes i deletions.

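If P[i-1] = T[j-1] (that is, p[i] = t[j] in 1-based notation), then D[i,j] = D[i-1,j-1]: when p[1..i-1] matches t[l..j-1] with k differences and p[i] = t[j], p[1..i] matches t[l..j] with k differences as well.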

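If p[i] ≠ t[j], one more difference is needed, in one of three ways:

  • If p[1..i] matches t[l..j-1] with k differences, then inserting t[j] makes it match t[l..j] with k+1 differences.
  • If p[1..i-1] matches t[l..j] with k differences, then deleting p[i] makes p[1..i] match t[l..j] with k+1 differences.
  • If p[1..i-1] matches t[l..j-1] with k differences, then substituting t[j] for p[i] makes p[1..i] match t[l..j] with k+1 differences.

D[i,j] = 1 + min{D[i-1,j], D[i-1,j-1], D[i,j-1]}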


Example: T = TAGT, P = ACT

[Figure: worked computation of Edit(P, T) as a minimum]

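% Pseudocode with 0-based indices: p has length m, t has length n.
for j = 0:n
    d(0,j) = 0;                  % an empty pattern matches at any text position
end
for i = 0:m
    d(i,0) = i;                  % i deletions turn p(0..i-1) into the empty string
end
for i = 1:m
    for j = 1:n
        if p(i-1) == t(j-1)
            d(i,j) = d(i-1,j-1);
        else
            d(i,j) = min([d(i,j-1), d(i-1,j), d(i-1,j-1)]) + 1;
        end
    end
end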

Nested loops: O(nm)

Initialization loops: O(n) and O(m)

Running time of the sequential k-differences algorithm: T(n) = O(n) + O(m) + O(mn) = O(mn)
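The pseudocode above can be made runnable, for instance in Python. This is a minimal sketch that assumes the pattern p has length m and the text t has length n. The names `k_difference_table` and `match_ends` are chosen for this example only.

```python
def k_difference_table(p, t):
    """Fill D with D[0][j] = 0 and D[i][0] = i, using the recurrence above."""
    m, n = len(p), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 stays all zeros
    for i in range(1, m + 1):
        d[i][0] = i
        for j in range(1, n + 1):
            if p[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                d[i][j] = 1 + min(d[i][j - 1], d[i - 1][j], d[i - 1][j - 1])
    return d


def match_ends(p, t, k):
    """1-based end positions j in t where p matches with at most k differences."""
    last_row = k_difference_table(p, t)[len(p)]
    return [j for j in range(1, len(t) + 1) if last_row[j] <= k]


print(k_difference_table("ACT", "TAGT")[-1])  # [3, 2, 2, 2, 1]
print(match_ends("ACT", "TAGT", 1))           # [4]
```

The two nested loops visit every cell once, which is where the O(mn) bound comes from.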


D[i,j] = 1 + min{D[i-1,j-1], D[i-1,j], D[i,j-1]}

D[i , j] N 1 N j-1 1 :

D[i-1, j-1] D[i , j-1] < D[i-1, j-1]


Step 1: compute table X.

FOR i = 0 to |Σ|-1 PARALLEL DO
    FOR j = 0 to n DO
        compute X[i,j] according to formula (3);
    END FOR
END FOR PARALLEL

Inner sequential loop: O(n)

Total for this step: O(n)
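Formula (3) is not reproduced in the text above. As an illustration only, the sketch below assumes a commonly used definition: for a character c, X[c][j] is the last 1-based position l ≤ j with t[l] = c, or 0 if c does not occur in t[1..j]. The function name `build_x` and the dictionary layout are choices made for this example.

```python
def build_x(alphabet, t):
    """Assumed formula (3): X[c][j] = j if t[j] == c (1-based), else X[c][j-1]; X[c][0] = 0."""
    n = len(t)
    x = {}
    for c in alphabet:                 # rows are independent: this is the PARALLEL loop
        row = [0] * (n + 1)
        for j in range(1, n + 1):      # sequential scan over the text, O(n)
            row[j] = j if t[j - 1] == c else row[j - 1]
        x[c] = row
    return x


x = build_x("ACGT", "TAGT")
print(x["T"])  # [0, 1, 1, 1, 4]
print(x["A"])  # [0, 0, 2, 2, 2]
```

Each row can be filled by its own processor, so the step takes O(n) time with one processor per alphabet character.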

Step 2: compute table D.

FOR i = 0 to m DO
    FOR j = 0 to n PARALLEL DO
        compute D[i,j] according to formula (4);
    END FOR PARALLEL
    Barrier synchronization
END FOR

Each parallel row update: O(1)

Total for this step: O(m)
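Formula (4) is likewise not reproduced in the text above. The sketch below assumes one known way of removing the dependency on D[i,j-1], so that every cell of row i reads only row i-1. With l the last position ≤ j where the text equals p[i], it takes D[i,j] = min{D[i-1,j] + 1, D[i-1,j-1] + 1, D[i-1,l-1] + (j - l)}, dropping the last term when l = 0. This relies on horizontally adjacent entries of D differing by at most 1. All function names are hypothetical, and the list comprehension stands in for the parallel loop.

```python
def last_occurrence_rows(chars, t):
    """Assumed formula (3): last 1-based position <= j where t holds c, else 0."""
    rows = {}
    for c in chars:
        row = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            row[j] = j if t[j - 1] == c else row[j - 1]
        rows[c] = row
    return rows


def compute_cell(i, j, prev, x_row):
    """Assumed formula (4): D[i][j] computed from row i-1 (prev) only."""
    if j == 0:
        return i
    best = 1 + min(prev[j], prev[j - 1])
    last = x_row[j]
    if last > 0:                       # p[i] occurs in t[1..j], last time at `last`
        best = min(best, prev[last - 1] + (j - last))
    return best


def row_parallel_table(p, t):
    n = len(t)
    x = last_occurrence_rows(set(p), t)
    rows = [[0] * (n + 1)]             # row 0: the empty pattern matches everywhere
    for i in range(1, len(p) + 1):
        prev, x_row = rows[-1], x[p[i - 1]]
        # Cells of one row are mutually independent (the PARALLEL loop);
        # finishing the list before the next i plays the role of the barrier.
        rows.append([compute_cell(i, j, prev, x_row) for j in range(n + 1)])
    return rows


print(row_parallel_table("ACT", "TAGT")[-1])  # [3, 2, 2, 2, 1]
```

For P = ACT and T = TAGT this gives the same table as the sequential recurrence, while needing only m + 1 synchronized row updates.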

Step 3: report the matches.

FOR j = 0 to n-1 PARALLEL DO
    IF (D[m,j+1] ≤ K)
        Result[j] ← j;
    ELSE
        Result[j] ← -1;
END FOR PARALLEL

Total for this step: O(1)
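The reporting loop can be sketched in Python as follows. It assumes the last row D[m, 0..n] is already available as a list. The name `report_matches` is chosen for this example.

```python
def report_matches(last_row, k):
    """Result[j] = j if D[m][j+1] <= k, else -1, for j = 0 .. n-1."""
    n = len(last_row) - 1
    # Every entry is decided independently, so the loop is trivially parallel.
    return [j if last_row[j + 1] <= k else -1 for j in range(n)]


print(report_matches([3, 2, 2, 2, 1], 1))  # [-1, -1, -1, 3]
```

Here the input is the last row for P = ACT and T = TAGT, so with k = 1 the only reported match ends at text index 3 (0-based).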

T(n) = time of step 1 + time of step 2 + time of step 3

T(n) = t(1) + t(2) + t(3) = O(n) + O(m) + O(1) = O(m+n)

O(m+n)  O(n)

: n (n )

: m+1 (m )

: m+1

: n+m+1

[Figure: table X]

[Figure: table D]

K . . K m+1 n n+m+1 m+1 . DNA 45

[1] L. Z., B. J., and J. T. A software system for gene sequence database construction. Engineering in Medicine and Biology Society, 2005.
[2] L. V.I. Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Dokl., 1966. 10.
[3] G. Navarro and R. Baeza-Yates. A hybrid indexing method for approximate string matching. Journal of Discrete Algorithms, 1(1):21-49, 2000.
[4] Z. C and C. GL. Parallel algorithms for approximate string matching on PRAM and LARPBS. Journal of Software, 15:159-169, 2004.
[5] S. P. The theory and computation of evolutionary distance: pattern recognition. Journal of Algorithms, pages 359-373, 1980. 1.
[6] G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(1):31-88, 2000.
[7] B.-Y. Z. Y. S. Jayram and R. Krauthgamer. Approximating Edit Distance Efficiently. Computer Science, 2004. 10.
[8] K. A. T. Miura and I. Shioya. Approximate String Matching Using Markovian Distance. Algorithms and Programming, 2010.
[9] D. S. J. Zibert and N. Pavesic. An edit-distance model for the approximate matching of timed strings. Pattern Analysis and Machine Intelligence, 31(4):736-741, 2009.
[10] L. D. S. Wang and Z. Mei. Approximate Address Matching. International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2010. 10.
[11] H.-C. Lee and E. F. RMESH algorithms for parallel string matching. Los Alamitos: IEEE Computer Society Press, 1997.
[12] A. H. Wright and Y. Jiang. O(k) parallel algorithms for approximate string matching. Journal of Neural Parallel and Scientific Computation, 1993. 1.
[13] S. Xiao and W. chun Feng. Inter-Block GPU communication via fast barrier synchronization. 24th IEEE International Parallel Distributed Processing Symposium, 2010.
[14] K. C. K. G. Margaritis. String Matching on a Multicore GPU using CUDA. 13th Panhellenic Conference on Informatics, 2009.

Thank you very much.