1 String Matching of Bit Parallel Suffix Automata

Page 1: 1 String Matching of Bit Parallel Suffix Automata

String Matching of Bit Parallel Suffix Automata

Page 2: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata

Based on a Deterministic Acyclic Word Graph (DAWG)
Makes it easy to compare equivalent suffix strings
Nondeterministic suffix automaton
Deterministic suffix automaton, obtained by subset construction

Page 3: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search

Also called Backward DAWG Matching (BDM)
Build an automaton that recognizes the factors x of pattern p
endpos(x): the set of all pattern positions where an occurrence of x ends
Ex: Pattern = baabbaa, endpos(aa) = {3,7}

Safe shift if no equivalent suffix exists in the pattern
Text: the window shifts left to right
Fail to match a factor
Shift window
Window size = pattern length
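
The endpos definition above can be checked directly with a few lines of Python. This is a brute-force sketch for illustration only (the helper name `endpos` and the 1-based positions follow the slide's example; a real suffix automaton never enumerates the sets this way). Factors with the same endpos set are equivalent and share one automaton state.

```python
def endpos(pattern, x):
    """Return the 1-based positions in `pattern` where an occurrence of `x` ends."""
    return {
        i + len(x)                      # 1-based index of the last character of the occurrence
        for i in range(len(pattern) - len(x) + 1)
        if pattern.startswith(x, i)
    }


if __name__ == "__main__":
    p_rev = "baabbaa"
    print(endpos(p_rev, "aa"))          # {3, 7}, as in the slide
    # "bb" and "abb" end at the same positions, so they are equivalent factors.
    print(endpos(p_rev, "bb") == endpos(p_rev, "abb"))  # True
```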

Page 4: 1 String Matching of Bit Parallel Suffix Automata

BDM Algorithm

Build the automaton
Reached the final state
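
Since the pseudocode itself is not preserved here, the following is a minimal Python sketch of BDM under stated assumptions: the automaton is built with the standard online suffix-automaton construction (not necessarily the construction used on the slide), and the names `build_suffix_automaton` and `bdm_search` are hypothetical. The window is read backward; every time a terminal state is reached, the characters read so far form a prefix of the pattern, so `last` is updated, and on failure the window shifts by `last`.

```python
def build_suffix_automaton(s):
    """Deterministic suffix automaton of s: (transitions, set of terminal states)."""
    trans, link, length = [{}], [-1], [0]
    last = 0
    for ch in s:
        cur = len(trans)
        trans.append({}); link.append(-1); length.append(length[last] + 1)
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split state q: the clone keeps the shorter strings of its class.
                clone = len(trans)
                trans.append(dict(trans[q])); link.append(link[q]); length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    terminal = set()
    p = last
    while p != -1:                      # states on the suffix-link path accept suffixes of s
        terminal.add(p)
        p = link[p]
    return trans, terminal


def bdm_search(pattern, text):
    """Return the 0-based start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    trans, terminal = build_suffix_automaton(pattern[::-1])
    occurrences, pos = [], 0
    while pos <= n - m:
        j, last, state = m, m, 0
        while j > 0:
            state = trans[state].get(text[pos + j - 1])
            if state is None:           # what was read is no longer a factor of p^r
                break
            j -= 1
            if state in terminal:       # what was read is a prefix of the pattern
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos)
        pos += last                     # safe shift
    return occurrences


if __name__ == "__main__":
    # Pattern and text taken from the search example slides.
    print(bdm_search("aabbaab", "abbabaabbaab"))  # [5]
```

On the example text, the first window fails after reading four characters with last = 5, and the second window, starting at 0-based position 5, reports the occurrence.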

Page 5: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

1. Build the deterministic suffix automaton of the reversed pattern
2. Use endpos(x) to find a factor
3. On failing to find a factor, do a safe shift

Page 6: 1 String Matching of Bit Parallel Suffix Automata

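Suffix Automata Search Example

1. T = [abbaba a]bbaab
a is a factor of p^r and a reverse prefix of p. last = 6

[Figure: deterministic suffix automaton of p^r = baabbaa; states labeled with the endpos sets 01234567, 145, 26, 4, 5, 6, 2367, 7, 37; transitions labeled a and b]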

Page 7: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

2. T = [abbab aa]bbaab
aa is a factor of p^r and a reverse prefix of p. last = 5

[Figure: suffix automaton of p^r, same as on Page 6]

Page 8: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

3. T = [abba baa]bbaab
aab is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 9: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

4. T = [abb abaa]bbaab
We fail to recognize the next a, so we shift the window to last.
We search again at position: T = abbab[aabbaab]. last = 7

[Figure: suffix automaton of p^r, same as on Page 6]

Page 10: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

5. T = abbab[aabbaa b]
b is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 11: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

6. T = abbab[aabba ab]
ba is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 12: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

7. T = abbab[aabb aab]
baa is a factor of p^r and a reverse prefix of p. last = 4

[Figure: suffix automaton of p^r, same as on Page 6]

Page 13: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

8. T = abbab[aab baab]
baab is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 14: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

9. T = abbab[aa bbaab]
baabb is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 15: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

10. T = abbab[a abbaab]
baabba is a factor of p^r

[Figure: suffix automaton of p^r, same as on Page 6]

Page 16: 1 String Matching of Bit Parallel Suffix Automata

Suffix Automata Search Example

11. T = abbab[ aabbaab]
We recognize the word aabbaab and report an occurrence.

[Figure: suffix automaton of p^r, same as on Page 6]

Page 17: 1 String Matching of Bit Parallel Suffix Automata

BNDM Algorithm

Backward Nondeterministic DAWG Matching (BNDM)
Handles classes, multiple patterns, and allows errors
Uses bit parallelism; combines Shift-Or and BDM
20% to 25% faster than BDM, 10% to 40% faster than BM
Update function

Page 18: 1 String Matching of Bit Parallel Suffix Automata

BNDM Algorithm
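
The algorithm listing is not preserved in this transcript, so here is a minimal Python sketch of BNDM under the usual formulation, assumed here rather than taken from the slide: `B[c]` has a bit set for each pattern position holding `c` (first pattern character in the highest bit), `D` holds the active states of the nondeterministic automaton, and each backward step computes `D = (D & B[c]) << 1`. The function name `bndm_search` is hypothetical. Python integers are unbounded, so `D` is masked to m bits after each shift; in C this corresponds to a pattern that fits in one machine word.

```python
def bndm_search(pattern, text):
    """Return the 0-based start positions of pattern in text (bit-parallel BNDM sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # B[c]: bit (m - 1 - i) is set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    full = (1 << m) - 1                 # 1^m: every factor position is active
    high = 1 << (m - 1)                 # 10^(m-1): set when a pattern prefix was read
    occurrences, pos = [], 0
    while pos <= n - m:
        j, last, D = m, m, full
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)   # read the window backward
            j -= 1
            if D & high:
                if j > 0:
                    last = j            # remember the prefix for the next shift
                else:
                    occurrences.append(pos)
            D = (D << 1) & full
        pos += last                     # safe shift, as in BDM
    return occurrences


if __name__ == "__main__":
    print(bndm_search("aabbaab", "abbabaabbaab"))  # [5]
```

The control flow matches BDM; only the automaton is replaced by the bit vector `D`, which is why no subset construction is needed.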

Page 19: 1 String Matching of Bit Parallel Suffix Automata

BNDM Example

Page 20: 1 String Matching of Bit Parallel Suffix Automata

BNDM Example

Page 21: 1 String Matching of Bit Parallel Suffix Automata

BNDM Further Improvement

Handle long patterns
  Partition pattern p into subpatterns p_i
  Build an array of D and B, and process each part with the basic algorithm
  If p_i is found, then process p_{i+1}, and so on
Handle classes
  Modify the B table only
  Set the i-th bit for all characters belonging to the i-th position in the pattern
Multiple patterns
  Two methods
    Interleave the patterns, and shift r bits for each D update
    Just concatenate and shift 1 bit, but with the modified update D = (D << 1) & (1^{m-1}0)^r
  where r is the number of patterns
Approximate matching
  Use Wu's method
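
The class-handling item can be illustrated with a short Python sketch. It assumes a pattern given as a list of character sets, one per position (a hypothetical input format; `build_class_table` and `bndm_search_classes` are illustrative names), and the same bit layout as the basic algorithm. Only the construction of `B` changes; the search loop is the plain BNDM loop.

```python
def build_class_table(classes):
    """B[c] has the bit for position i set for every character c allowed at position i."""
    m = len(classes)
    B = {}
    for i, allowed in enumerate(classes):
        for c in allowed:
            B[c] = B.get(c, 0) | (1 << (m - 1 - i))
    return B


def bndm_search_classes(classes, text):
    """Return 0-based start positions where text matches the class pattern."""
    m, n = len(classes), len(text)
    if m == 0 or m > n:
        return []
    B = build_class_table(classes)
    full, high = (1 << m) - 1, 1 << (m - 1)
    occurrences, pos = [], 0
    while pos <= n - m:
        j, last, D = m, m, full
        while D != 0:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & high:
                if j > 0:
                    last = j
                else:
                    occurrences.append(pos)
            D = (D << 1) & full
        pos += last
    return occurrences


if __name__ == "__main__":
    # The class pattern [Aa] n n [ae] matches both "Anna" and "anne".
    print(bndm_search_classes(["Aa", "n", "n", "ae"], "Anna met anne"))  # [0, 9]
```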

Page 22: 1 String Matching of Bit Parallel Suffix Automata

Performance Comparison

Times in hundredths of a second per megabyte

Page 23: 1 String Matching of Bit Parallel Suffix Automata

Reference

Gonzalo Navarro and Mathieu Raffinot. A Bit-parallel Approach to Suffix Automata: Fast Extended String Matching. In M. Farach (editor), Proc. CPM'98, LNCS 1448, pages 14-33, 1998.

Gonzalo Navarro and Mathieu Raffinot. Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata. 1998.

Page 24: 1 String Matching of Bit Parallel Suffix Automata

Reverse Pattern?