
Finding approximate palindromes in strings

Alexandre H. L. Porto and Valmir C. Barbosa
Pattern Recognition, vol. 35, pp. 2581-2591, 2002

Advisor: Prof. R. C. T. Lee
Speaker: L. C. Chen


Definition

• S: a string of n characters.
• S[i]: the i-th character of S.
• S[i..j]: the substring of S whose first and last characters are S[i] and S[j].
• S^R: the reverse of S.

Example: S = abcab, S^R = bacba.


Definition

• An even (odd) palindrome is a string of the form S^R S (S^R a S), where a is a single character. Thus abaccaba is an even palindrome, because abac is the reverse of caba.
• S[c] is the center of the palindrome S[i..j] in S, where c = i + (j - i + 1)/2 - 1.

Example:

  position: 1 2 3 4 5 6 7 8
  S:        c b a c c a b a

S[2..7] = baccab is an even palindrome with center c = 4.


Edit distance

• In edit distance, there are three types of differences between two strings X and Y:
• Insertion: a symbol of Y is missing in X at a corresponding position.
• Substitution: symbols at corresponding positions are distinct.
• Deletion: a symbol of X is missing in Y at a corresponding position.

  Insertion:     X: A - T
                 Y: A G T

  Substitution:  X: A C C
                 Y: T C C

  Deletion:      X: G C A
                 Y: G - A


• ED(A, B) denotes the edit distance between two strings A and B: the minimum number of substitutions, insertions and deletions of characters needed to transform B into A.

Example:

  A = a b c a b - a
  B = c b - a b b c

This alignment uses 1 insertion, 2 substitutions and 1 deletion, so ED(A, B) = 4.
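The definition can be checked with a minimal dynamic-programming sketch. The function name `edit_distance` is an illustrative choice, not something taken from the paper, and unit costs are assumed for all three operations.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (row-by-row dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + (ca != cb),  # substitution or match
                prev[j] + 1,               # skip one character of a
                cur[j - 1] + 1,            # skip one character of b
            ))
        prev = cur
    return prev[-1]


# The two strings of the example above, with the gap symbols removed.
print(edit_distance("abcaba", "cbabbc"))
```

Only two rows are kept at a time, which is enough when just the final distance is needed.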


Approximate palindromes

• An approximate palindrome with up to k errors: a string of the form XY (XaY) such that ED(X^R, Y) ≤ k, that is, the reversed left part matches the right part within k edit operations.

• An approximate palindrome is maximal if no other approximate palindrome for the same c and k has strictly greater size, or the same size but strictly fewer errors.


• To simplify our discussion, we only discuss even approximate palindromes here.

• S = aabaabcd and k = 1.

  position: 1 2 3 4 5 6 7 8
  S:        a a b a a b c d

At c = 3, abaa (substitute b with a) and aabaa (delete b) are even approximate palindromes, and aabaa is a maximal approximate palindrome.
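As a hedged illustration of the example, the sketch below counts the errors of a candidate S[i..j] around a center c by comparing the reversed left part with the right part. The helper names `edit_distance` and `even_palindrome_errors` are hypothetical, and positions are 1-based as on the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j - 1] + (ca != cb), prev[j] + 1, cur[j - 1] + 1))
        prev = cur
    return prev[-1]


def even_palindrome_errors(s: str, i: int, j: int, c: int) -> int:
    """Errors of s[i..j] as an even approximate palindrome whose center lies
    between positions c and c+1 (all positions 1-based, inclusive)."""
    left = s[i - 1:c]   # s[i..c]
    right = s[c:j]      # s[c+1..j]
    return edit_distance(left[::-1], right)


s = "aabaabcd"
print(even_palindrome_errors(s, 2, 5, 3))  # abaa
print(even_palindrome_errors(s, 1, 5, 3))  # aabaa
print(even_palindrome_errors(s, 1, 6, 3))  # aabaab, too many errors for k = 1
```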


Problem

• Given a string T of size n, we want to find all maximal approximate palindromes in T with up to k errors.

• For each center c, we find the largest i' and j' such that ED(T[c+1..i'], T^R[1..j']) ≤ k, where T[c+1..i'] is a prefix of T[c+1..n] and T^R[1..j'] is a prefix of T^R[1..c], the reverse of T[1..c].


• Let S2 = T^R[1..c] and S1 = T[c+1..n], where 1 ≤ c ≤ n.

• In the dynamic programming approach, we construct a matrix D of size (n'+1) × (m'+1), where D[i,j] is the minimum edit distance between S1[1..i] and S2[1..j], and n' and m' are the lengths of S1 and S2 respectively.


• T = dbcaabac and k = 2.
• At c = 3, S2 = T^R[1..3] = cbd and S1 = T[4..8] = aabac.

              a  a  b  a  c
       i   0  1  2  3  4  5
  j=0      0  1  2  3  4  5
  j=1  c   1  1  2  3  4  4
  j=2  b   2  2  2  2  3  4
  j=3  d   3  3  3  3  3  4

↖: substitution or match   ↑: deletion   ←: insertion

The largest cell with D[i,j] ≤ 2 is D[3,2] = 2, so the maximal approximate palindrome is bcaab.
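The full-table approach for a single center can be sketched as follows. This is only an illustration under stated assumptions: the function name `maximal_even_palindrome_dp` is hypothetical, c is the 1-based center used on the slides, and ties in size are broken in favour of fewer errors, as in the definition of maximality.

```python
def maximal_even_palindrome_dp(t: str, c: int, k: int) -> str:
    """Maximal even approximate palindrome of t around center c (1-based),
    found by filling the whole edit-distance table (quadratic time per center)."""
    s1 = t[c:]          # T[c+1..n]
    s2 = t[:c][::-1]    # reverse of T[1..c]
    n1, m1 = len(s1), len(s2)
    # d[j][i] = ED(S1[1..i], S2[1..j]); rows follow S2, columns follow S1.
    d = [[0] * (n1 + 1) for _ in range(m1 + 1)]
    for i in range(n1 + 1):
        d[0][i] = i
    for j in range(1, m1 + 1):
        d[j][0] = j
        for i in range(1, n1 + 1):
            d[j][i] = min(
                d[j - 1][i - 1] + (s1[i - 1] != s2[j - 1]),  # substitution or match
                d[j - 1][i] + 1,                             # deletion
                d[j][i - 1] + 1,                             # insertion
            )
    # Keep the cell with the largest i + j among those with at most k errors.
    best = (0, 0, 0, 0)  # (size, -errors, i, j)
    for j in range(m1 + 1):
        for i in range(n1 + 1):
            if d[j][i] <= k:
                best = max(best, (i + j, -d[j][i], i, j))
    _, _, i, j = best
    return t[c - j:c + i]


print(maximal_even_palindrome_dp("dbcaabac", 3, 2))
```

Filling the whole table for every center is what the diagonal method discussed next is meant to avoid.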


• How can we compute the table faster?
• In this paper, the method of [LV89] (L. Y. Huang) was used.


• We shall make heavy use of the concept of a diagonal.

• Diagonal d is defined as the set of all D[i,j] with d = i - j.

• The diagonal property: D[i,j] - D[i-1,j-1] = 0 or 1. That is, along a diagonal the values never decrease. [U85]

Example with S1 = abc and S2 = bc:

              a  b  c
       i   0  1  2  3
  j=0      0  1  2  3
  j=1  b   1  1  1  2
  j=2  c   2  2  2  1

Diagonal 0 holds D[0,0], D[1,1], D[2,2] = 0, 1, 2. Diagonal 2 holds D[2,0], D[3,1] = 2, 2.
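The diagonal property can be checked numerically with a short sketch. The helpers `edit_table` and `diagonal` are illustrative names; the table is stored as `table[j][i]`, with rows following S2 and columns following S1, to match the layout of the slides.

```python
def edit_table(s1: str, s2: str) -> list[list[int]]:
    """table[j][i] = ED(S1[1..i], S2[1..j])."""
    table = [list(range(len(s1) + 1))]
    for j in range(1, len(s2) + 1):
        row = [j]
        for i in range(1, len(s1) + 1):
            row.append(min(
                table[j - 1][i - 1] + (s1[i - 1] != s2[j - 1]),
                table[j - 1][i] + 1,
                row[i - 1] + 1,
            ))
        table.append(row)
    return table


def diagonal(table: list[list[int]], d: int) -> list[int]:
    """Values D[i,j] with i - j = d, listed by increasing row j."""
    n1 = len(table[0]) - 1
    return [table[j][j + d] for j in range(len(table)) if 0 <= j + d <= n1]


small = edit_table("abc", "bc")
print(diagonal(small, 0), diagonal(small, 2))

# Every step along every diagonal of a larger table should be 0 or 1.
big = edit_table("gggtcta", "gttc")
steps = {b - a for d in range(-4, 8)
         for a, b in zip(diagonal(big, d), diagonal(big, d)[1:])}
print(steps)
```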


• Consider diagonal d = 0. Let us find the largest j, if it exists, such that (i, j) is on Diagonal d (i - j = d) and D[i,j] = 0.

• Let us now label all of these locations.

S1 = gggtcta, S2 = gttc

              g  g  g  t  c  t  a
       i   0  1  2  3  4  5  6  7
  j=0      0  1  2  3  4  5  6  7
  j=1  g   1  0
  j=2  t   2
  j=3  t   3
  j=4  c   4

On Diagonal 0, D[0,0] = D[1,1] = 0.


• Having found the above locations (i, j) where D[i,j] = 0, we can further find the largest j, if it exists, such that (i, j) is on Diagonal d and D[i,j] = 1.

• To do this, we use the following observation: Each element in Diagonal d can only influence elements in Diagonals d-1, d and d+1.


• Let us consider any location (i, j) on Diagonal d.

– D[i,j] can only be influenced by three neighbors:

    D[i-1,j-1] on Diagonal d     (substitution or match)
    D[i-1,j]   on Diagonal d-1   (insertion)
    D[i,j-1]   on Diagonal d+1   (deletion)

• Thus, we conclude that we only need to consider Diagonals d-1, d and d+1 for each D[i,j].


• Observe the following two strings, where A1 = T1[1..i], x = T1[i+1], A2 = T2[1..j] and y = T2[j+1]:

  T1: | A1 | x | ...
  T2: | A2 | y | ...

• If i and j are the largest i and j such that ED(T1[1..i], T2[1..j]) = k and T1[i+1] ≠ T2[j+1], then ED(A1+x, A2+y) = k+1.


• Consider T1 = abcd and T2 = cbde. ED(T1[1..i], T2[1..j]) = 2. The largest such i and j are 2 and 3 respectively, and T1[i+1] = c ≠ T2[j+1] = e. Thus ED(ab+c, cbd+e) = 2+1 = 3.

  T1: | ab  | c | ...
  T2: | cbd | e | ...


• Based upon the above discussion, on a diagonal d, we can find the largest i and j such that D[i,j] = e.

• How can we find the largest row containing a value smaller than or equal to k?

• Let L[d,e] denote the largest row j such that D[i,j] is on Diagonal d (i - j = d) and D[i,j] = e ≤ k.


• Based upon the definition of L[d,e], e is the edit distance between S1[1..i] and S2[1..j], where i and j are the largest such indices on the diagonal, and S2[j+1] ≠ S1[i+1].

• At d = 0: L[0,0] = 1, L[0,1] = 2, L[0,2] = 3 and L[0,3] = 4.

S1 = gggtcta, S2 = gttc

              g  g  g  t  c  t  a
       i   0  1  2  3  4  5  6  7
  j=0      0  1  2  3  4  5  6  7
  j=1  g   1  0  1  2  3  4  5  6
  j=2  t   2  1  1  2  2  3  4  5
  j=3  t   3  2  2  2  2  3  3  4
  j=4  c   4  3  3  3  3  2  3  4

Diagonal d = 0 runs through D[0,0], D[1,1], D[2,2], D[3,3], D[4,4] = 0, 0, 1, 2, 3.


• How can we compute the value of L[d,e]?

• We define

  row[d,e] = max( L[d,e-1] + 1,    L[d-1,e-1],    L[d+1,e-1] + 1 )
                  (substitution)   (insertion)    (deletion)

  L[d,e] = row[d,e] + t, where t is the length of the longest common prefix of S1[d+row[d,e]+1..n'] and S2[row[d,e]+1..m'].

• If t = 0, it means that S1[d+row[d,e]+1] ≠ S2[row[d,e]+1].
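One step of this recurrence can be sketched as below. The function `compute_entry` and the dictionary representation of L are assumptions made for illustration. Entries that have not been computed are treated as minus infinity, which is a simplification and not the sentinel initialization given in the algorithm, and the common prefix is found by direct character comparison instead of an LCA query.

```python
NEG = float("-inf")  # stands in for "no such entry" (an assumption of this sketch)


def compute_entry(L: dict, d: int, e: int, s1: str, s2: str):
    """L[d,e] from the entries with e-1 errors on diagonals d-1, d and d+1."""
    row = max(
        L.get((d, e - 1), NEG) + 1,      # substitution
        L.get((d - 1, e - 1), NEG),      # insertion
        L.get((d + 1, e - 1), NEG) + 1,  # deletion
    )
    if row == NEG:
        return NEG
    row = min(row, len(s2), len(s1) - d)  # stay inside the table
    # t = longest common prefix of S1[d+row+1..n'] and S2[row+1..m'],
    # found here by plain character comparison (0-based indices).
    while row < len(s2) and row + d < len(s1) and s1[row + d] == s2[row]:
        row += 1
    return row


s1, s2 = "gggtcta", "gttc"
L = {(0, 0): 1}  # longest common prefix of S1 and S2 is "g"
for e in (1, 2):
    for d in range(-e, e + 1):
        L[(d, e)] = compute_entry(L, d, e, s1, s2)
print(L[(-1, 1)], L[(0, 1)], L[(1, 1)], L[(1, 2)], L[(2, 2)])
```

Each entry costs constant time plus the length of the prefix extension, which is the part the LCA technique later reduces to a constant-time query.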


• Consider D[3,2]. L[1,1] = 1: the largest j on d = 1 with D[i,j] = 1 is j = 1. In this case, d = 1 and e = 2. L[d,e-1] = L[1,1] = 1, L[d-1,e-1] = L[0,1] = 2 and L[d+1,e-1] = L[2,1] = 0. Thus row[d,e] = row[1,2] = max(L[1,1]+1, L[0,1], L[2,1]+1) = max(1+1, 2, 0+1) = max(2, 2, 1) = 2.

[Figure: the D table for S1 = gggtcta and S2 = gttc, with Diagonals d = 0, d = 1 and d = 2 marked.]


• How to compute L[-1,1]? (e = 1, d = -1)

• row[-1,1] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = max( L[-1,0]+1, L[-2,0], L[0,0]+1 ) = max(0+1, 0, 1+1) = max(1, 0, 2) = 2.

• Since S1[d+row[d,e]+1] = S1[-1+2+1] = g ≠ S2[row[d,e]+1] = S2[2+1] = t, L[-1,1] = row[-1,1] + 0 = 2.

[Figure: partial D table for S1 = gggtcta and S2 = gttc with D[1,1] = 0 and D[1,2] = 1 filled in; Diagonal d = -1 marked.]


• How to compute L[1,2]? (e = 2, d = 1)

• row[1,2] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = max( L[1,1]+1, L[0,1], L[2,1]+1 ) = max(1+1, 2, 0+1) = max(2, 2, 1) = 2.

• Since the longest common prefix of S1[d+row[1,2]+1..n'] = S1[4..7] = tcta and S2[row[1,2]+1..m'] = S2[3..4] = tc has length 2, L[1,2] = row[1,2] + 2 = 4.

[Figure: partial D table for S1 = gggtcta and S2 = gttc with the cells computed so far; Diagonal d = 1 marked, ending at D[5,4] = 2.]


• L[d,e] = row[d,e] + t, where t is the length of the longest common prefix of S1[d+row[d,e]+1..n'] and S2[row[d,e]+1..m'].

• How can we compute t?

• In this paper, LCA (lowest common ancestor) is used.


• Consider two strings T1 = A1 + S1 and T2 = A2 + S2, as shown below:

  T1: | A1 | S1 |
  T2: | A2 | S2 |

• If ED(A1, A2) = k and S1 = S2, then ED(A1+S1, A2+S2) = k.


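• Once we find ED(A1, A2) = k, we want to determine the longest common prefix S of B1 and B2, if it exists, where B1 and B2 are the parts of S1 and S2 that follow A1 and A2.

• This paper uses LCA (lowest common ancestor) to find S.

  S1: | A1 | S | x ... |     (B1 = S x ...)
  S2: | A2 | S | y ... |     (B2 = S y ...)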


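• To find such an S, if it exists, we may concatenate S1 and S2 into a new string.

• Obviously, the suffixes S1' and S2' of the new string have the common prefix S.

  New string: | A1 | S | x ... | A2 | S | y ... |

  S1' is the suffix starting at the first S; S2' is the suffix starting at the second S.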


• Let us concatenate S1 = gggtcta and S2 = gttc into the new string gggtctagttc.

• Consider D[3,2]. The substring after ggg is tctagttc = S1'. The substring after gt is tc = S2'. Note that S2' and S1' have a common prefix of length 2. Thus we have D[3,2] = D[4,3] = D[5,4] = 2.

[Figure: the D table for S1 = gggtcta and S2 = gttc with Diagonal d = 1 marked.]


• S1 = gggtcta, S2 = gttc. Let us concatenate S1 and S2 into the new string gggtctagttc, and then construct its suffix tree.

• The substring after ggg is tctagttc = S1'. The substring after gt is tc = S2'. Note that the lowest common ancestor of S2' and S1' in the suffix tree spells tc, of length 2.

[Figure: suffix tree of gggtctagttc$.]
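The quantity that the LCA query returns is the length of the longest common prefix of a suffix of S1 and a suffix of S2. The sketch below is not the suffix-tree and LCA construction used in the paper; it is a simpler stand-in that precomputes all such lengths in a table with O(n'm') time and space, after which each query is a single lookup. The name `build_lcp_table` is hypothetical.

```python
def build_lcp_table(s1: str, s2: str) -> list[list[int]]:
    """lcp[i][j] = length of the longest common prefix of s1[i:] and s2[j:]
    (0-based suffix starts). A plain table, not a suffix tree with LCA."""
    n1, m1 = len(s1), len(s2)
    lcp = [[0] * (m1 + 1) for _ in range(n1 + 1)]
    # Fill from the ends of both strings towards the start.
    for i in range(n1 - 1, -1, -1):
        for j in range(m1 - 1, -1, -1):
            if s1[i] == s2[j]:
                lcp[i][j] = lcp[i + 1][j + 1] + 1
    return lcp


lcp = build_lcp_table("gggtcta", "gttc")
print(lcp[3][2])  # S1[4..7] = tcta against S2[3..4] = tc
print(lcp[1][2])  # S1[2..7] = ggtcta against S2[3..4] = tc
print(lcp[3][1])  # S1[4..7] = tcta against S2[2..4] = ttc
```

A suffix tree with constant-time LCA queries gives the same answers after linear preprocessing, which is why the paper prefers it for long strings.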


Algorithm

Initialization:
  for all d, 1 ≤ d ≤ k+1, d > e: L[d,e] = -1.
  for all d, -(k+1) ≤ d ≤ -1: L[d,|d|-1] = -1, L[d,|d|-2] = |d|-2.
  for all e, -1 ≤ e ≤ k: L[n'+1,e] = -1.

Find L[0,0] = the length of the longest common prefix of S1 and S2.

For e = 1 to k do
  For d = -e to e do
    row[d,e] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 )
    row[d,e] = min( row[d,e], m' )
    if row[d,e] < m' and row[d,e]+d < n' then
      find t = the length of the longest common prefix of S1[d+row[d,e]+1..n'] and S2[row[d,e]+1..m'];
      row[d,e] = row[d,e] + t;
    L[d,e] = row[d,e].
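The loop above can be turned into a runnable sketch for one center, and then applied to every center. Several choices here are assumptions of the sketch and not the paper's formulation: the names `maximal_at_center` and `maximal_even_palindromes` are hypothetical, missing entries of L are treated as minus infinity instead of using the sentinel initialization, rows are also clamped so that the column index stays within S1, and the common prefix is extended character by character instead of by an LCA query.

```python
NEG = float("-inf")  # "no entry" marker (assumption of this sketch)


def maximal_at_center(t: str, c: int, k: int) -> tuple[str, int]:
    """(substring, errors) of the maximal even approximate palindrome of t
    whose center lies between positions c and c+1 (1-based)."""
    s1, s2 = t[c:], t[:c][::-1]  # S1 = T[c+1..n], S2 = reverse of T[1..c]
    n1, m1 = len(s1), len(s2)
    L = {}
    best_size, best_err, best_i, best_j = -1, 0, 0, 0
    for e in range(k + 1):
        for d in range(-e, e + 1):
            if d > n1 or -d > m1:
                continue  # diagonal lies outside the table
            if e == 0:
                row = 0
            else:
                row = max(L.get((d, e - 1), NEG) + 1,      # substitution
                          L.get((d - 1, e - 1), NEG),      # insertion
                          L.get((d + 1, e - 1), NEG) + 1)  # deletion
            if row == NEG:
                continue
            row = min(row, m1, n1 - d)
            # Extend along the diagonal while characters match.
            while row < m1 and row + d < n1 and s1[row + d] == s2[row]:
                row += 1
            L[(d, e)] = row
            size = 2 * row + d  # i + j with i = row + d and j = row
            if size > best_size:  # strict: ties keep the smaller error count
                best_size, best_err, best_i, best_j = size, e, row + d, row
    return t[c - best_j:c + best_i], best_err


def maximal_even_palindromes(t: str, k: int) -> dict[int, tuple[str, int]]:
    """Maximal even approximate palindrome for every center c = 1..n-1."""
    return {c: maximal_at_center(t, c, k) for c in range(1, len(t))}


print(maximal_at_center("cttggggtcta", 4, 2))
print(maximal_at_center("dbcaabac", 3, 2))
```

Only O(k^2) entries of L are computed per center, compared with the full (n'+1) × (m'+1) table of the plain dynamic programming approach.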


Example:

T = cttggggtcta and k = 2.

At c = 4, T[1..4] = cttg, S2 = T^R[1..4] = gttc and S1 = T[5..11] = gggtcta.

              g  g  g  t  c  t  a     (S1)
       i   0  1  2  3  4  5  6  7
  j=0      0  1  2  3  4  5  6  7
  j=1  g   1
  j=2  t   2
  j=3  t   3
  j=4  c   4
     (S2)


• At d = 0, find the largest j such that S2[1..j] is equal to S1[1..j], and set L[0,0] = j.

• S2[1] = S1[1] and S2[2] ≠ S1[2], so L[0,0] = 1.

[Figure: partial D table for S1 = gggtcta and S2 = gttc with D[1,1] = 0 filled in; Diagonal d = 0 marked.]


• e = 1, d = -1

• row[-1,1] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = max[0, 0, 2] = 2.

• The length of the longest common prefix of ggtctagttc and tc is 0.

• L[-1,1] = 2.

[Figure: partial D table with D[1,1] = 0 and D[1,2] = 1 filled in; Diagonal d = -1 marked.]


The LCA of the suffixes ggtctagttc and tc has string length 0.

[Figure: suffix tree of gggtctagttc$ with the two suffixes marked.]


• e = 1, d = 0

• row[0,1] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = max[2, 0, 1] = 2.

• The length of the common prefix of gtctagttc and tc is 0.

• L[0,1] = 2.

[Figure: partial D table with D[1,1] = 0, D[1,2] = 1 and D[2,2] = 1 filled in; Diagonal d = 0 marked.]


The LCA of the suffixes gtctagttc and tc has string length 0.

[Figure: suffix tree of gggtctagttc$ with the two suffixes marked.]


• e = 1, d = 1

• row[1,1] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = 1.

• The length of the common prefix of gtctagttc and ttc is 0.

• L[1,1] = 1.

[Figure: partial D table with D[1,1] = 0, D[1,2] = 1, D[2,2] = 1 and D[2,1] = 1 filled in; Diagonal d = 1 marked.]


The LCA of the suffixes gtctagttc and ttc has string length 0.

[Figure: suffix tree of gggtctagttc$ with the two suffixes marked.]


• e = 2, d = 1

• row[1,2] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = 2.

[Figure: partial D table with the cells of value 2 reached so far filled in; Diagonal d = 1 marked at D[3,2] = 2.]


• e = 2, d = 1

• We find that the longest common prefix of S2' = tc and S1' = tctagttc is tc.

• L[1,2] = row[1,2] + 2 = 2 + 2 = 4.

[Figure: partial D table with Diagonal d = 1 extended through D[3,2] = D[4,3] = D[5,4] = 2.]


The LCA of the suffixes tctagttc and tc has string length 2.

[Figure: suffix tree of gggtctagttc$ with the two suffixes marked.]


• e = 2, d = 2

• row[2,2] = max( L[d,e-1]+1, L[d-1,e-1], L[d+1,e-1]+1 ) = 1.

• We find that the length of the common prefix of S2' = ttc and S1' = tctagttc is 1.

• L[2,2] = row[2,2] + 1 = 1 + 1 = 2.

[Figure: partial D table with Diagonal d = 2 marked through D[3,1] = D[4,2] = 2.]


The LCA of the suffixes ttc and tctagttc has string length 1.

[Figure: suffix tree of gggtctagttc$ with the two suffixes marked.]


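T = cttggggtcta and k = 2.

At c = 4, T[1..4] = cttg, S2 = T^R[1..4] = gttc and S1 = T[5..11] = gggtcta.

The furthest cell with at most 2 errors is D[5,4] = 2 on Diagonal 1 (L[1,2] = 4), so cttggggtc is the maximal approximate palindrome.

[Figure: partial D table for S1 = gggtcta and S2 = gttc showing only the cells computed by the diagonal method.]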


References

• [U85] E. Ukkonen, Finding approximate patterns in strings, Journal of Algorithms, Vol. 6, 1985, pp. 132-137.

• [LV89] G. Landau and U. Vishkin, Fast parallel and serial approximate string matching, Journal of Algorithms, Vol. 10, 1989, pp. 157-169.