Parallel String Matching Algorithm(s) Using Associative Processors
Original work by Mary Esenwein and Dr. Johnnie Baker
Presented by Shannon Steinfadt
April 18, 2007


Page 2: String Matching Problem

• Also known as pattern matching or string searching
• Useful in many applications, such as text editing, information retrieval, DNA analysis, and homeland security

Page 3: What are we doing?

• Given a pattern and some text, find out whether the pattern occurs in the text
• Example: is the pattern AB in the text ABAA? If so, where?

Pattern: AB
Text: ABAA

Page 4: What’s the notation?

• P is a pattern string of length m
• T is a text string of length n, usually n ≥ m

Page 5: Goal of String Matching

• To find all occurrences of a pattern string in the text string
• Locate all positions i in T such that T[i+j-1] = P[j] for all j, 1 ≤ j ≤ m
• Why use P[j]? How does it relate to T[i+j-1]?
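
One way to read the definition: j walks through the pattern, and T[i+j-1] is the text character that lines up with P[j] when the pattern is placed at position i. The sketch below is a naive sequential check of that condition, not one of the ASC algorithms; the function name `find_all` is made up for illustration, and the index arithmetic only converts the slide's 1-based positions to Python's 0-based strings.

```python
def find_all(text, pattern):
    """Return every 1-based position i with T[i+j-1] == P[j] for all 1 <= j <= m."""
    n, m = len(text), len(pattern)
    positions = []
    for i in range(1, n - m + 2):              # candidate start positions 1..n-m+1
        # T[i+j-1] is text[i+j-2] and P[j] is pattern[j-1] in 0-based Python.
        if all(text[i + j - 2] == pattern[j - 1] for j in range(1, m + 1)):
            positions.append(i)
    return positions


print(find_all("ABAA", "AB"))            # [1]
print(find_all("ABBBABBBABA", "BBA"))    # [3, 7]
```

This runs in O(mn) time, the cost of the naive sequential method.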

Page 6: Pattern Variations

• An exact pattern
• A “Don’t Care” character (*) in the pattern
  • Gives flexibility in matching
  • * indicates character(s) of the text that are irrelevant to the matching process

Page 7: General “Don’t Care” Character (*) Characteristics

In general, a * in the pattern can stand for:

• A single character of text
• Multiple consecutive text characters
• No characters
• A combination of the above three

Example: pattern AB*CD could match ABBCD, ABBBBBCD, or ABCD (* is null)
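
For readers who know regular expressions, the general "don't care" behaves like `.*` between literal pieces. The sketch below checks the slide's example with Python's standard `re` module; the helper name `vldc_to_regex` is hypothetical, and this is only a sequential illustration of the semantics, not the associative algorithm.

```python
import re


def vldc_to_regex(pattern):
    """Translate a pattern with '*' don't cares into a compiled regular expression."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(literal_parts))


rx = vldc_to_regex("AB*CD")
for candidate in ["ABBCD", "ABBBBBCD", "ABCD", "ABDC"]:
    # fullmatch requires the whole candidate string to fit the pattern.
    print(candidate, bool(rx.fullmatch(candidate)))
```

The first three candidates are the ones listed on the slide and are accepted; ABDC is an extra negative example and is rejected.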

Page 8: String Matching using ASC

Three parallel algorithms using associative computing (using a 1-D mesh):

• String matching for exact match
• String matching with fixed-length “don’t care”, i.e., exactly 1 character
• String matching with variable-length “don’t care”, where a “don’t care” can have any length or be null

Page 9: ASC Exact Match Algorithm

    for (j = patt_length - 1; j >= 0; j--)
    {
        Responders are text[$] == patt_string[j]
            and counter[$] == patt_counter;
        Responders add 1 to counter[$] and store result
            in counter[$] of preceding cell;
        patt_counter++;
    }

    /* When pattern has been processed */
    Responders are counter[$] == patt_length;
    Responders set match[$] = 1 in next cell;
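
The exact match pseudocode can be tried out with a small sequential simulation. The sketch below is an assumption-laden illustration, not the authors' ASC code: Python lists stand in for the parallel fields text[$], counter[$], and match[$], cell 0 holds the '@' sentinel, and the function name `asc_exact_match` is made up. Responders are collected before any counter is written, to mimic a single parallel search step.

```python
def asc_exact_match(text, pattern):
    """Sequentially simulate the ASC exact match loop; returns (match, counter)."""
    cells = "@" + text                     # cell 0 is the '@' sentinel cell
    size = len(cells)
    counter = [0] * size
    match = [0] * size
    patt_counter = 0

    for j in range(len(pattern) - 1, -1, -1):
        # Select all responders first, as one associative search would.
        responders = [i for i in range(1, size)
                      if cells[i] == pattern[j] and counter[i] == patt_counter]
        # Every responder holds patt_counter, so it stores patt_counter + 1
        # in the counter of the preceding cell.
        for i in responders:
            counter[i - 1] = patt_counter + 1
        patt_counter += 1

    # Cells whose counter reached the pattern length flag a match in the next cell.
    for i in range(size - 1):
        if counter[i] == len(pattern):
            match[i + 1] = 1
    return match, counter


match, counter = asc_exact_match("ABBBABBBABA", "BBA")
print(counter)  # [1, 0, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0]
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

The loop body runs m times and each iteration is a constant number of parallel steps on the associative machine, which is where the O(m) time on O(n) PEs comes from; the Python version of course pays O(n) per iteration.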

Page 10

Pattern: BBA
Text: ABBBABBBABA

m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0

Page 11: (no text recovered from this slide)

Page 12: Final State of Exact Match Algorithm

Pattern: BBA
Text: ABBBABBBABA

m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0

Match[$] = 1 marks the first cell of each occurrence of the pattern (match bits 1 0 0 across the cells B B A).

Page 13: Algorithm for unit length "don't cares" using ASC

    for (j = patt_length - 1; j >= 0; j--)
    {
        if (pattern[j] == '*')
            Responders are counter[$] == patt_counter;
        else // pattern[j] is not the “don’t care” character
            Responders are text[$] == pattern[j]
                and counter[$] == patt_counter;

        If no Responders are detected, exit;

        Responders add 1 to counter[$] and store result
            in counter[$] of preceding cell;
        patt_counter++;
    }

    /* When pattern has been processed */
    Responders are counter[$] == patt_length;
    Responders set match[$] = 1 in next cell;
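
The only changes from the exact match loop are the '*' branch, which drops the text comparison, and the early exit. A minimal sequential sketch follows, under the same assumptions as any list-based simulation of the parallel fields: cell 0 is the '@' sentinel, the function name `asc_unit_dont_care_match` is hypothetical, and a responder in cell 0 is skipped because it has no preceding cell.

```python
def asc_unit_dont_care_match(text, pattern, wildcard="*"):
    """Simulate the unit-length don't care loop; returns (match, counter)."""
    cells = "@" + text                     # cell 0 is the '@' sentinel cell
    size = len(cells)
    counter = [0] * size
    match = [0] * size
    patt_counter = 0

    for j in range(len(pattern) - 1, -1, -1):
        if pattern[j] == wildcard:
            # A don't care accepts any text character: only the counter is tested.
            responders = [i for i in range(size) if counter[i] == patt_counter]
        else:
            responders = [i for i in range(size)
                          if cells[i] == pattern[j] and counter[i] == patt_counter]
        if not responders:
            return match, counter          # "If no Responders are detected, exit"
        for i in responders:
            if i > 0:                      # cell 0 has no preceding cell
                counter[i - 1] = patt_counter + 1
        patt_counter += 1

    for i in range(size - 1):
        if counter[i] == len(pattern):
            match[i + 1] = 1
    return match, counter


match, counter = asc_unit_dont_care_match("ABBBABBBABA", "B*A")
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

For the pattern B*A on this text the flagged cells are the same two cells found for BBA, because the only B?A substrings in the text happen to be BBA.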

Page 14: ASC Exact Match Algorithm (again)

    for (j = patt_length - 1; j >= 0; j--)
    {
        Responders are text[$] == patt_string[j]
            and counter[$] == patt_counter;
        Responders add 1 to counter[$] and store result
            in counter[$] of preceding cell;
        patt_counter++;
    }

    /* When pattern has been processed */
    Responders are counter[$] == patt_length;
    Responders set match[$] = 1 in next cell;

Page 15

Pattern: B*A
Text: ABBBABBBABA

m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3

Text[$]  Match[$]  Counter[$]
@        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
B        0         0
B        0         0
A        0         0
B        0         0
A        0         0

Page 16: (no text recovered from this slide)

Page 17: Final State of Exact Match Algorithm

Pattern: B*A
Text: ABBBABBBABA

m = pattern length, n = text length, j = pattern index, i = text index

Text[$]  Match[$]  Counter[$]
@        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         0
B        0         3
B        1         2
B        0         1
A        0         2
B        0         1
A        0         0

Match[$] = 1 marks the first cell of each matched substring (match bits 1 0 0 across the cells B B A).

Page 18: VLDC Algorithm (added)

• VLDC stands for variable-length “don’t care”
• Works on each “segment” of the pattern broken up by the * character
  • AB*BB*A has three segments
  • Consecutive * characters (**) are not necessary and not allowed
• This VLDC algorithm is unique: it provides the information needed to find all continuation points of all matches following each “*”
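
As a small illustration of what "segment" means here, the sketch below splits a pattern at each * and rejects consecutive * characters. The helper name `split_segments` is hypothetical; the ASC algorithm itself scans the pattern in place rather than building a list like this.

```python
def split_segments(pattern, wildcard="*"):
    """Split a VLDC pattern into its literal segments, in left-to-right order."""
    if wildcard * 2 in pattern:
        raise ValueError("consecutive don't care characters are not allowed")
    # A leading or trailing '*' produces an empty piece, which is dropped.
    return [piece for piece in pattern.split(wildcard) if piece]


print(split_segments("AB*BB*A"))   # ['AB', 'BB', 'A']
```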

Page 19: VLDC ALGORITHM USING ASC

    int patt_length = m;
    int maxcell = n + 2;

    /* Special handling for '*' at end of pattern */
    if (pattern[m-1] == '*')
    {
        Responders are cell index > 1;
        Responders set segment$[0] = 1;
        patt_counter = 1;
        k = 1;    /* Reset initial segment index */
    }

    while ((patt_length -= patt_counter) > 0 && maxcell > 0)
    {
        patt_counter = 0;
        for (I = patt_length - 1; I >= 0 && pattern[I] != '*'; I--)
        {
            Responders are text$ == pattern[I]
                and counter$ == patt_counter
                and cell index < maxcell;
            Responders add 1 to counter$ and store result
                in counter$ of preceding cell;
            patt_counter++;
        }
        Responders are counter$ == patt_counter;

Page 20: VLDC continued

        Responders set segment$[k] = patt_counter in next cell;
        Responders are segment$[k] > 0;
        maxcell = maximum cell index value of Responders
            else if no Responders maxcell = 0;
        All cells become Responders and set counter$ = 0;
        patt_counter++;
        k++;
    }

    /* When pattern has been processed */
    Responders are segment$[--k] > 0;
    Responders set match$ = 1;

    /* Special handling for '*' at start of pattern */
    if (pattern[0] == '*')
    {
        Responders are cell index < maxcell and cell index > 1;
        Responders set match$ = 1;
    }
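
The two-slide VLDC pseudocode can be followed more easily with a sequential simulation. The sketch below is one possible reading of it, not the authors' implementation: lists indexed from 1 stand in for the parallel fields (cell 1 holds '@', matching the cell numbering in the traces), the starting values `patt_counter = 0` and `k = 0` are assumed because the slides do not show them, the pattern is assumed to be non-empty and free of '@', and the name `asc_vldc_match` is made up.

```python
def asc_vldc_match(text, pattern):
    """Sequentially simulate the VLDC pseudocode; returns (match, segment)."""
    m, n = len(pattern), len(text)
    cells = [None, "@"] + list(text)       # 1-based: cell 1 is '@', cells 2..n+1 hold the text
    last = n + 1
    counter = [0] * (n + 2)
    match = [0] * (n + 2)
    segment = [[0] * (n + 2) for _ in range(pattern.count("*") + 1)]
    patt_length, maxcell = m, n + 2
    patt_counter, k = 0, 0                 # assumed initial values (not shown on the slides)

    if pattern[m - 1] == "*":              # special handling for '*' at end of pattern
        for c in range(2, last + 1):
            segment[0][c] = 1
        patt_counter, k = 1, 1

    while True:
        patt_length -= patt_counter
        if patt_length <= 0 or maxcell <= 0:
            break
        patt_counter = 0
        i = patt_length - 1
        while i >= 0 and pattern[i] != "*":    # scan one segment right to left
            responders = [c for c in range(2, last + 1)
                          if cells[c] == pattern[i]
                          and counter[c] == patt_counter and c < maxcell]
            for c in responders:
                counter[c - 1] = patt_counter + 1
            patt_counter += 1
            i -= 1
        # Cells just before a full segment match store its length in the next cell.
        for c in range(1, last):
            if counter[c] == patt_counter:
                segment[k][c + 1] = patt_counter
        starts = [c for c in range(1, last + 1) if segment[k][c] > 0]
        maxcell = max(starts) if starts else 0
        counter = [0] * (n + 2)
        patt_counter += 1                  # also step over the '*' left of this segment
        k += 1

    k -= 1
    for c in range(1, last + 1):
        if segment[k][c] > 0:
            match[c] = 1
    if pattern[0] == "*":                  # special handling for '*' at start of pattern
        for c in range(2, maxcell):
            match[c] = 1
    return match, segment


match, segment = asc_vldc_match("ABBBABBBABA", "AB*BB*A")
print([c for c, v in enumerate(match) if v])        # [2, 6]
print([c for c, v in enumerate(segment[1]) if v])   # [3, 4, 7, 8]
```

On the deck's example this reproduces the values in the final-state trace: matches start in cells 2 and 6, and maxcell shrinks from 13 to 12, 8, and 6 as the segments A, BB, and AB are processed from right to left.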

Page 21: After third pattern segment in VLDC Algorithm

Pattern: AB*BB*A
Text: ABBBABBBABA

Cell  T$  M$  C$  S0$  S1$  S2$
1     @   0   0   0    0    0
2     A   0   0   1    0    0
3     B   0   0   0    0    0
4     B   0   0   0    0    0
5     B   0   0   0    0    0
6     A   0   0   1    0    0
7     B   0   0   0    0    0
8     B   0   0   0    0    0
9     B   0   0   0    0    0
10    A   0   0   1    0    0
11    B   0   0   0    0    0
12    A   0   0   1    0    0

Maxcell: 13, then 12
Patt_counter: 0, 1, then 2

During this step C$ held 1 in cells 1, 5, 9, and 11 before all counters were reset to 0.

Page 22: After second pattern segment in VLDC Algorithm

Pattern: AB*BB*A
Text: ABBBABBBABA

Cell  T$  M$  Counter$  S0$  S1$  S2$
1     @   0   0         0    0    0
2     A   0   0         1    0    0
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   0   0         1    0    0
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, then 8 (used to keep pattern segments in order, i.e. AB occurs before BB)
Patt_counter: 0, 1, 2, then 3

Page 23: After first pattern segment in VLDC Algorithm

Pattern: AB*BB*A
Text: ABBBABBBABA

Cell  T$  M$  Counter$  S0$  S1$  S2$
1     @   0   0         0    0    0
2     A   0   0         1    0    2
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   0   0         1    0    2
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, 8, then 6 (used to keep pattern segments in order, i.e. AB occurs before BB)
Patt_counter: 0, 1, 2, then 3

Page 24: Final State in VLDC Algorithm

Pattern: AB*BB*A
Text: ABBBABBBABA

Cell  T$  M$  Counter$  S0$  S1$  S2$
1     @   0   0         0    0    0
2     A   1   0         1    0    2
3     B   0   0         0    2    0
4     B   0   0         0    2    0
5     B   0   0         0    0    0
6     A   1   0         1    0    2
7     B   0   0         0    2    0
8     B   0   0         0    2    0
9     B   0   0         0    0    0
10    A   0   0         1    0    0
11    B   0   0         0    0    0
12    A   0   0         1    0    0

Maxcell: 13, 12, 8, then 6 (used to keep pattern segments in order, i.e. AB occurs before BB)

Page 25: Finding All Continuation Points

• A match starts where M$ = 1
• A match to any pattern segment begins where S$[x] == segment length, i.e. where any S$[x] > 0
• A match continues at any S$[x-1] entry whose cell/PE index is >= (the cell/PE index of the S$[x] entry + the segment size stored in S$[x])

Page 26: Using the Final State in VLDC Algorithm

Pattern: AB*BB*A
Text: ABBBABBBABA

Cell  T$  M$  C$  S0$  S1$  S2$
1     @   0   0   0    0    0
2     A   1   0   1    0    2
3     B   0   0   0    2    0
4     B   0   0   0    2    0
5     B   0   0   0    0    0
6     A   1   0   1    0    2
7     B   0   0   0    2    0
8     B   0   0   0    2    0
9     B   0   0   0    0    0
10    A   0   0   1    0    0
11    B   0   0   0    0    0
12    A   0   0   1    0    0

• Start with index 2, where there is a match (M$ = 1)
• Work from S2$ down and left: count down 2 values and move into S1$, then count down 2 values and move into S0$
• That produces cells 2, 4, 6: ABBBA
• Any index >= 4 in S1$ whose value is > 0 will also produce a correct match:
  • 2, 7, 10: ABBBABBBA
  • 2, 8, 10: ABBBABBBA

Some of the additional matches are:

• 2, 4, 10: ABBBABBBA
• 2, 4, 12: ABBBABBBABA
• 2, 8, 12: ABBBABBBABA
• 6, 8, 10: ABBBA
• 6, 8, 12: ABBBABA
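
The continuation rule can be applied mechanically to list every chain of segment starts. The sketch below assumes the S2$, S1$, and S0$ values from the final-state table have been copied into dictionaries that map a starting cell to the stored segment length; the function name `continuation_chains` and this dictionary layout are illustrative choices, not part of the original algorithm.

```python
def continuation_chains(segments):
    """segments: dicts in pattern order, each mapping start cell -> segment length."""
    chains = []

    def extend(chain, min_index, depth):
        if depth == len(segments):
            chains.append(tuple(chain))
            return
        for start, length in sorted(segments[depth].items()):
            # The next segment may start at any cell at or after the end of this one.
            if start >= min_index:
                extend(chain + [start], start + length, depth + 1)

    extend([], 1, 0)
    return chains


# Values copied from the final-state table (pattern AB*BB*A).
s2 = {2: 2, 6: 2}                  # starts of segment AB
s1 = {3: 2, 4: 2, 7: 2, 8: 2}      # starts of segment BB
s0 = {2: 1, 6: 1, 10: 1, 12: 1}    # starts of segment A
cells = "@ABBBABBBABA"             # cell 1 is '@', so cell c is cells[c - 1]

for chain in continuation_chains([s2, s1, s0]):
    first, last = chain[0], chain[-1] + s0[chain[-1]] - 1
    print(chain, cells[first - 1:last])
```

For this table the enumeration yields nine chains, which include the eight listed on the slide plus 2, 7, 12.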

Page 27: Existing Algorithms

Sequential algorithms:

• Naïve algorithm: O(mn)
• Knuth, Morris, & Pratt, or Boyer-Moore: O(m+n)

Parallel algorithms:

• A PRAM exact string matching: O(n)
• On a reconfigurable mesh: O(1) on n(n-m+1) PEs
• On a SIMD hypercube (limited to {0,1}): O(lg n) on n/lg n PEs
• On a neural network: O(1) on nm PEs
• ASC algorithms: O(m) time on O(n) PEs

Page 28: Question to consider

The “don’t care” character allows non-matching for an arbitrary length. This is discussed on slide 13. Instead, consider “*” to allow a non-match for two characters, and make the necessary changes to the trace on slides 15-16.