Pattern Recognition, Vol. 20, No. 1, pp. 125-141, 1987. Printed in Great Britain.

0031-3203/87 $3.00 + .00 Pergamon Journals Ltd

Pattern Recognition Society

VLSI ARCHITECTURES FOR STRING MATCHING AND PATTERN MATCHING*

H. D. CHENG

Electrical and Computer Engineering Department, University of California, Davis, CA 95616, U.S.A.

and

K. S. Fu

School of Electrical Engineering, Purdue University, West Lafayette, IN 47907, U.S.A.

(Received 3 December 1985; in revised form 11 February 1986; received for publication 17 April 1986)

Abstract. In this paper, we discuss string-matching and dynamic time-warp pattern-matching. The string-matching problem arises in a number of applications such as artificial intelligence, pattern recognition and information retrieval. The method of dynamic time-warping is a well-established technique for time alignment and comparison of speech and image patterns. It has found extensive application in speech recognition and related areas of pattern-matching.

We propose a VLSI architecture based on the space-time domain expansion approach which can compute the string distance and also give the matching index-pairs which correspond to the edit sequence. The time complexity is O(max(m, n)) using an m × n processing-element array, where m is the length of the input string and n is the length of the reference string. With a uniprocessor the matching process will have time complexity O(m × n). If there are p reference strings, using the proposed architecture the string-matching problem can be solved in time O(max(m, n, p)). With a uniprocessor the time complexity will be O(m × n × p). We also propose a VLSI architecture for dynamic time-warping based on the space-time expansion method which can obtain high throughput by using extensive pipelining and parallelism. It can measure the dissimilarity between two patterns in time O(max(m, n, N)). Using a uniprocessor the time complexity will be O(m × n × N), where m and n are the numbers of feature vectors of the unknown input pattern and the reference template respectively, and N is the number of elements of the feature vector. If there are p reference templates, the time complexity will be O(max(m, n, N × p)), and using a uniprocessor the time complexity will be O(m × n × p × N). The algorithm partition problems are discussed. Verifications of the proposed VLSI architectures are also given. The backtracking procedures are discussed in detail and their hardware implementations are also given. The proposed architectures can be applied to many areas such as pattern recognition, information retrieval, image processing, speech processing, remote sensing, robotics, computer vision, artificial intelligence and office automation. They are useful for real-time information processing.

String-matching Algorithm partition Backtracking procedure

Dynamic time-warp pattern-matching Very large scale integration (VLSI)

Dynamic programming VLSI architecture verification

1. INTRODUCTION

The theory of dynamic programming was introduced by Bellman to solve mathematical problems arising from multistage decision processes. It has wide applications in computer science. Based on the dynamic programming path-finding algorithm, the technique is both mathematically sound and computationally efficient. Many researchers have attempted to speed up the dynamic programming procedure to obtain real-time information processing by using pipeline and

*This work was supported by the NSF Grant ECS 80-16580.

parallel techniques. The recent advent of very large scale integration (VLSI) technology has created a new architectural horizon for implementing parallel algorithms directly in hardware. The use of VLSI architectures to implement dynamic programming procedures has been investigated for several particular applications. Guibas et al. describe a VLSI implementation of a class of dynamic programming problems characterized by optimal parenthesization. Clarke and Dyer(20) describe four VLSI designs for a line and curve detection chip. Chiang and Fu(17) describe a VLSI implementation of Earley's algorithm for parsing general context-free languages, which is essentially a dynamic programming procedure. Liu and Fu describe a VLSI implementation of string-distance computation.


Cheng and Fu(7) describe algorithm partition and parallel recognition of general context-free languages using a fixed-size VLSI architecture. Cheng and Fu have proposed a VLSI architecture for hand-written symbol recognition. Cheng and Fu have also proposed a method to partition computation problems so they can be solved on fixed-size VLSI architectures.(6) In this paper, we propose new VLSI architectures for string-matching and dynamic time-warp pattern-matching. The algorithm partition problem, a very important issue in VLSI design, is also discussed. The backtracking procedures are discussed in much detail and their hardware implementations are also given. The formal verifications for both VLSI architectures are given.

2. PRELIMINARIES

2.1. String matching

The string-matching problem arises in a number of applications such as artificial intelligence, pattern recognition and information retrieval. The problem of string matching can generally be classified into exact matching and approximate matching. For exact matching, a single string is matched against a set of strings. A VLSI implementation of exact pattern matching for strings containing wild cards has been proposed by Foster and Kung.(1)

For approximate string matching, given a string s of some set S of possible strings, we want to find a string t which approximately matches this string, where t belongs to a subset T of S. Approximate string matching is based on string distances, which are computed using the editing operations substitution, insertion and deletion. A good survey of approximate string matching is given in Ref. (3).

Definition 1. Let A be a finite string of m symbols, where m is the length of the string. A(i) is the ith symbol of A. A(i, j) is the ith through jth symbols of A, and A(i, j) = Λ, the null string, if i > j.

Definition 2. An edit operation is a pair (a, b) ≠ (Λ, Λ) of strings, each of length less than or equal to 1, and is usually written a → b.

An edit operation a → b is a substitution if a ≠ Λ and b ≠ Λ, a deletion if b = Λ, and an insertion if a = Λ.

Definition 3. String B results from the application of the edit operation a → b to string A, written A ⇒ B via a → b, if A = σaτ and B = σbτ for some strings σ and τ.

Definition 4. Let S be a sequence s1, s2, ..., sm of edit operations (or an edit sequence for short).

Definition 5. An S-derivation from A to B is a sequence of strings A0, A1, ..., Am such that A = A0, B = Am and A(i-1) ⇒ Ai via si for 1 ≤ i ≤ m. We say S takes A to B if there is some S-derivation from A to B.

Definition 6. Let γ be an arbitrary cost function which assigns to each edit operation a → b a non-negative real number γ(a → b). Extend γ to a sequence of edit operations S = s1, s2, ..., sm by letting γ(S) = Σ_{i=1}^{m} γ(si) (if m = 0, we define γ(S) = 0).

Definition 7. We define the edit distance δ(A, B) from string A to string B to be the minimum cost of all edit sequences taking A to B, i.e. δ(A, B) = min {γ(S) | S is an edit sequence taking A to B}.

We will assume that the cost of each edit operation a → b is

γ(a → b) = 0 if a = b, and 1 otherwise.

The key operation for string matching is the computation of the edit distance. Let A and B be strings, and let D(i, j) = δ(A(1, i), B(1, j)), 0 ≤ i ≤ m, 0 ≤ j ≤ n (where m = |A|, n = |B|); then

D(i, j) = min {D(i - 1, j - 1) + γ(A(i) → B(j)),
               D(i - 1, j) + γ(A(i) → Λ),
               D(i, j - 1) + γ(Λ → B(j))}          (1)

for all i, j, 1 ≤ i ≤ m, 1 ≤ j ≤ n. This can be done by Wagner's algorithm,(2) which is essentially a dynamic programming procedure and has time complexity O(m × n). If there are p reference strings, for finding the approximate matching the time complexity is O(m × n × p). This method can be represented as a problem of finding the shortest path in a lattice, and the path can be recovered by a backtracking procedure, as shown in Fig. 1.

Definition 8. We define a pair of integers i and j as an index-pair (i, j) and use i to indicate the ith symbol of string A and j to indicate the jth symbol of string B.

Definition 9. The sequence of index-pairs of string matching (or index-pair sequence for short) is an ordered set in which the element (s, t) is in front of the element (p, q) if s < p and t ≤ q, or s ≤ p and t < q; the first element is (0, 0), the last element is (m, n), and the others are (i, j)'s (0 ≤ i ≤ m, 0 ≤ j ≤ n).

Fig. 1. An example of computing the string distance and index-pair sequence of the two strings "state" and "seat". Each node of the graph is labeled (i, j) and below that label the value of D(i, j) is shown. The best matching occurs with a difference of 2. The index-pair sequence for the best matching is {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (4, 5)}.


There is a correspondence between the index-pair sequence and the edit sequence. Each index-pair corresponds to an edit operation (A(i) → B(j)); it means that the ith symbol of string A is transformed into the jth symbol of string B. If the integer i repeats k + 1 times, i.e. there is a subsequence (i, j), (i, j + 1), ..., (i, j + k), then k insertion operations have to be applied to insert the symbols B(j + 1), B(j + 2), ..., B(j + k). If the integer j repeats k + 1 times, i.e. there is a subsequence (i, j), (i + 1, j), ..., (i + k, j), then k deletion operations are needed to delete the symbols A(i + 1), A(i + 2), ..., A(i + k). The index-pair sequence for example 1 is also shown in Fig. 1.
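The correspondence can be made explicit in software. The sketch below (our illustration; the function name is ours) turns an index-pair sequence into edit operations: a diagonal step is a substitution (or a match), a repeated i is an insertion, and a repeated j is a deletion.

    def edits_from_index_pairs(pairs, A, B):
        # Interpret consecutive index-pairs as edit operations transforming A into B.
        ops = []
        for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]):
            if i1 == i0 + 1 and j1 == j0 + 1:            # diagonal step
                ops.append(("match" if A[i1 - 1] == B[j1 - 1] else "substitute",
                            A[i1 - 1], B[j1 - 1]))
            elif i1 == i0 and j1 == j0 + 1:              # i repeats: insert B(j1)
                ops.append(("insert", "", B[j1 - 1]))
            else:                                        # j repeats: delete A(i1)
                ops.append(("delete", A[i1 - 1], ""))
        return ops

    # Index-pair sequence of Fig. 1 for A = "seat", B = "state": one substitution and one insertion.
    print(edits_from_index_pairs([(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (4, 5)], "seat", "state"))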

2.2. Pattern matching using dynamic time-warp

The technique of dynamic time-warping has found extensive applications in speech recognition and image processing. Based on the dynamic programming path-finding algorithm, the technique is both mathematically sound and computationally efficient. However, its cost most often limits real-time processing, especially with a large number of patterns. For example, the ability to rapidly compare large dictionaries is particularly important in speaker-independent recognition, and would bring unrestricted vocabulary recognition closer to reality. Both in speech recognition and in image processing, we can compare an unknown input pattern with the reference template (or templates) after preprocessing and feature extraction.

Definition 10. Let U be a finite template of m feature vectors (or time intervals). U_i is the ith feature vector of U. Each feature vector has N elements.

Definition 11. Let D(i, j) be the distance (or dissimilarity) of the two feature vectors U_i and R_j; then

D(i, j) = Σ_{k=1}^{N} |U_i^k - R_j^k|.          (2)

There are other distance measures that could be used; in this paper we only use this definition.

Fig. 2. Pattern matching using dynamic time warp.

The template matching process is illustrated in Fig. 2. The horizontal axis represents the unknown input

pattern U and the vertical axis represents the reference template R. Each grid intersection point (i, j) represents a possible matching between U_i and R_j. Any monotonic path from (1, 1) to (m, n) represents a possible mapping or warp of the unknown input pattern onto the reference template. From dynamic programming, we know that if points (i, j) and (k, l) both lie on the optimum path, then the subpath from (i, j) to (k, l) is locally optimum. This means that the optimum global path can be found by optimizing local paths, one grid at a time. The computation of the summation distance is the key operation for dynamic time-warp pattern-matching. The summation distance S(i, j) for U_i and R_j is defined as

S(i, j) = D(i, j) + min(S(i - 1, j - 1), S(i - 1, j), S(i, j - 1))          (3)

where

S(1, 1) = D(1, 1),
S(i, 1) = D(i, 1) + S(i - 1, 1),
S(1, j) = D(1, j) + S(1, j - 1)

for all 2 ≤ i ≤ m and 2 ≤ j ≤ n.
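For reference, here is a minimal sequential sketch of equations (2) and (3) (our illustration, with feature vectors given as lists of N numbers); the array described in Section 4 evaluates the same recurrence in parallel.

    def dtw_dissimilarity(U, R):
        # Sequential evaluation of equations (2) and (3): returns S(m, n).
        m, n = len(U), len(R)
        INF = float("inf")
        S = [[INF] * (n + 1) for _ in range(m + 1)]
        S[0][0] = 0.0                                     # S(0, 0) = 0, borders = infinity
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D_ij = sum(abs(u - r) for u, r in zip(U[i - 1], R[j - 1]))       # equation (2)
                S[i][j] = D_ij + min(S[i - 1][j - 1], S[i - 1][j], S[i][j - 1])  # equation (3)
        return S[m][n]

    # Two short templates whose feature vectors have N = 2 elements each.
    U = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
    R = [[0.0, 1.0], [2.0, 3.0]]
    print(dtw_dissimilarity(U, R))                        # -> 2.0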

2.3. Space-time domain expansion and the computational model

The space-time expansion method has been used for building VLSI architectures for vector inner-product, matrix-matrix multiplication, convolution computation, comparison operations in relational databases, the Fast Fourier Transform (FFT), hierarchical scene-matching, context-free language recognition and pattern-matching. By using the computational model based on the space-time domain expansion method, we can partition recursive algorithms and solve them on fixed-size VLSI architectures.(5-7, 19) Here we summarize the space-time domain expansion approach, which will be used in this paper to design VLSI architectures for string-matching and pattern-matching.

Let S represent the space domain and T represent the time domain. The space-time domain is then the set {S, T} which, in general, is an (n + 1)-dimensional vector {x1, x2, ..., xn, t | xi ∈ S, i = 1, 2, ..., n, and t ∈ T}. But for the real world, we are only concerned with the case of at most 4-dimensional vectors {x1, x2, x3, t}.

Along x_j, a k-space expansion means that the processing unit repeats uniformly k times along the x_j direction, as shown in Fig. 3. The structure in Fig. 3(b) is called the k-space expansion of the structure in Fig. 3(a); conversely, the structure in Fig. 3(a) is called the k-space condensation of the structure in Fig. 3(b). The space expansion (condensation) can be performed at several levels; each processing unit could be a gate, a processing element (PE), a group of PEs, a processor, a group of processors or a processing system, etc. In this paper, we only consider a processing unit consisting of a PE or a group of PEs.

Fig. 3. (a) Symbolic representation of the processing element, (b) k-space expansion of the structure in (a).

A K-time expansion means that K events occur sequentially and each adjacent pair of events has an equal time interval (1 time-unit). Its configuration is shown in Fig. 4. The structure in Fig. 4(b) is called the K-time expansion of the structure in Fig. 4(a); conversely, the structure in Fig. 4(a) is called the K-time condensation of the structure in Fig. 4(b). The time expansion (condensation) can also be performed at several levels; each event could be a datum, a block of data, a task, or a group of tasks, etc. We only consider an event as a block of data in this paper.

We use a 4-tuple (I, D, O, V) to describe a recursive task which can be transformed into a VLSI algorithm, where

I: index set, I = {i1, ..., id}, in the recursive formula,
D: input data set, D = {d1, ..., dk}, in the recursive formula,
O: operation set, O = {o1, ..., oq}, in the recursive formula,
V: variable set, V = {v1, ..., vr}, in the recursive formula.

The dependency is defined as follows: if variables v' ∈ V' ⊆ V are the inputs of a variable v, we say that there is a dependency between v and the v's.

A VLSI algorithm basically has a linear recursive formula and V = {v}, so we can specifically define the dependency as follows.

Let I - i be the set operation of removing the index i from I and f(i) be a function of the index i. For any i ∈ I, if the variables v(I - i, f1(i)), ..., v(I - i, fp(i)) are the inputs of v(I - i, f(i)) in the recursive formula, then we say that there is a dependency of the variable v related to the index i and use an integer δ to express it, where δ = f(i) - max {f1(i), ..., fp(i)}; in general, δ = 1 because of the locality of VLSI communication; otherwise δ = 0.

The space-time domain expansion approach is defined as follows. For any index i ∈ I, we can make either a space expansion or a time expansion, guided by the space-expansion rule or the time-expansion rule.

(1) Space-expansion rule. If the index i (1 ≤ i ≤ m) is used for the space expansion, then for any k ≤ m, all of the input data D(I - i, i(k)) of the kth processing element have to be spatially arranged to maintain the time consistency, i.e. the input data of the kth processing element have to spend k × δ × t_e time-units to arrive at the processing element, where t_e is the time difference between the time at which the processing element produces the output v(I - i, f(i)) and the time at which v(I - i, f_r(i)) (f_r(i) = max {f1(i), ..., fp(i)}) is input into the processing element.

(2) Time-expansion rule. For any i ∈ I (1 ≤ i ≤ m), we can make a time expansion. The input data D of a given processing element have to be arranged in the time sequence to maintain the space consistency, i.e. the input data D(I - i, i(k)) need k more steps (each step spends t_d time-units, where t_d is the longest time of the operations performed among the data set D) to reach the processing element than the input data D(I - i, i(1)) do.

Fig. 4. (a) Symbolic representation of the processing element, (b) K-time expansion of the structure in (a).

(3) Rule of mapping a recursive algorithm into a VLSI architecture. For the index (indices) in the recursive formula (or of the program loop) we can apply space and time expansions such that each index corresponds to one expansion and the processing element performs all operations o ∈ O in the recursive formula. In general, when space and time expansions are applied alternately, we obtain better performance.

Time expansion can improve the utilization of a system, since it is essentially a pipelining technique. Space expansion is essentially a multiprocessing technique. Space-time domain expansion refers to an expansion which involves space expansion and/or time expansion.

There are several measurements for a computational task. We will use the problem size N, which is the number of operations needed to solve the given task, to measure the computational task. The computational model of a VLSI architecture obtained by the space-time domain expansion can be described by the tuple (K1, K2, K3, Q1, ..., Qt). Here Ki denotes a Ki-space expansion along the xi direction and Qj denotes a Qj-time expansion in the jth time expansion. Ki equals one if there is no space expansion along the xi direction and Qj equals one if there is no expansion in the jth time expansion. A necessary condition for solving a recursive task by using the space-time domain expansion is

Π_i Ki × Π_j Qj ≥ N.          (4)

This equation indicates that the VLSI architecture based on the space-time expansion can perform a number of operations equal to the product of all the space-time domain expansions.

(4) Partitioning rule. When the computational task size is larger than the VLSI architecture size, we have to partition the algorithm to solve it on a fixed-size VLSI architecture. According to equation (4), if we make a k-space condensation along the xi direction, then we also have to make a k-time expansion. Multi-dimensional condensations need multiple time expansions, which may require some input data to be used repeatedly. See Ref. (6) for details.
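A small numerical sketch of condition (4) and the partitioning rule (our illustration; the helper names are hypothetical): condensing an m × n task onto a fixed k × l array must be repaid by a matching time expansion.

    import math

    def required_time_expansion(m, n, k, l):
        # Time-expansion factor needed when an m x n task is condensed onto a k x l array:
        # a ceil(m/k)-condensation along x1 and a ceil(n/l)-condensation along x2
        # are compensated by ceil(m/k) * ceil(n/l) time expansions.
        return math.ceil(m / k) * math.ceil(n / l)

    def satisfies_condition_4(space_factors, time_factors, problem_size):
        # Equation (4): the product of all space and time expansions must cover the problem size.
        return math.prod(space_factors) * math.prod(time_factors) >= problem_size

    q = required_time_expansion(12, 10, 4, 5)              # a 12 x 10 task on a 4 x 5 array
    print(q, satisfies_condition_4([4, 5], [q], 12 * 10))  # -> 6 True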

3. NEW ALGORITHM FOR STRING MATCHING AND ITS VLSI IMPLEMENTATION

3.1. The algorithm and its VLSI implementation

A VLSI implementation of string-distance computation has been proposed. However, the structure and the dataflow are quite complicated and the edit sequence S cannot be found. Also, the algorithm partition problem has not been considered. Here we propose a VLSI architecture based on the space-time domain expansion approach(5-7, 19) which has a very natural and regular configuration. It can compute the string distance and find the edit sequence S. The algorithm partition problem is also solved. We propose Algorithm 1 as follows.

Algorithm 1. Algorithm for computing the string distance and index-pairs.

Let m be the length of the input string and n be the length of the reference string.

begin
  D(0, 0) = 0;
  for i = 1 to m do
    D(i, 0) = D(i - 1, 0) + 1;
  for j = 1 to n do
    D(0, j) = D(0, j - 1) + 1;
  for i = 1 to m do
    for j = 1 to n do
    begin
      D1 = D(i - 1, j - 1) + γ(A(i) → B(j));
      D2 = D(i - 1, j) + 1;
      D3 = D(i, j - 1) + 1;
      D(i, j) = min(D1, D2, D3);
      output(D(i, j));

(We use "output" to indicate that the value of D(i, j) is put into the output belt. If we do not want to know the partial results, it is not needed, except for D(m, n).)

      if D(i, j) = D1 then
        append {(i - 1, j - 1), (i, j)}

(We use "append" to represent forming an index-pair and outputting it into the output belt; the first pair, that is (i - 1, j - 1), will be in the output sequence, and the second pair (i, j) will be used as the tag for extracting the edit sequence if it is required.)

      else if D(i, j) = D2 then
        append {(i - 1, j), (i, j)}
      else
        append {(i, j - 1), (i, j)}
    end
end.
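In software terms, Algorithm 1 can be sketched as below (our illustration); the output belt is modelled here as an ordinary table of index-pairs keyed by the tag (i, j), which is the role the second pair of each "append" plays in the hardware.

    def algorithm_1(A, B):
        # Compute D(i, j) and, for each (i, j), the appended predecessor index-pair.
        m, n = len(A), len(B)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = D[i - 1][0] + 1
        for j in range(1, n + 1):
            D[0][j] = D[0][j - 1] + 1
        index_pairs = {}                                   # tag (i, j) -> first pair of the append
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D1 = D[i - 1][j - 1] + (0 if A[i - 1] == B[j - 1] else 1)
                D2 = D[i - 1][j] + 1
                D3 = D[i][j - 1] + 1
                D[i][j] = min(D1, D2, D3)
                if D[i][j] == D1:
                    index_pairs[(i, j)] = (i - 1, j - 1)   # append {(i-1, j-1), (i, j)}
                elif D[i][j] == D2:
                    index_pairs[(i, j)] = (i - 1, j)       # append {(i-1, j), (i, j)}
                else:
                    index_pairs[(i, j)] = (i, j - 1)       # append {(i, j-1), (i, j)}
        return D, index_pairs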

According to the space-time expansion method, we could make an m-space expansion along the x1 direction and an n-space expansion along the x2 direction in the space domain. The inputs should be arranged according to the space-expansion rule. For the (i, j)th processing element in the array, its symbolic representation is shown in Fig. 5(a). The structure of the processing element is shown in Fig. 5(b). For clarity, we do not show the output belt and the index moving channel in Fig. 5. The functions performed by each processing element are as follows.

Input: A(i), B(j), D(i - 1, j - 1), D(i, j - 1), D(i - 1, j), (i - 1, j - 1), (i, j - 1), (i - 1, j), i and j. (Note the i's are the same for a column and the j's are the same for a row. We can input them constantly or we can even preset the index-pairs since they never change.)


Fig. 5. (a) Symbolic representation of the processing element for string matching, (b) the structure of the processing element. (In (b), the γ function unit computes γ(a_i, b_j) = 0 if a_i = b_j and 1 otherwise, C is a comparator, and the identification signal channels are not shown.)

Output: D(i, j), (i, j) and the index-pairs, where

index-pairs = {(i - 1, j - 1), (i, j)} if D(i, j) = D1,
              {(i - 1, j), (i, j)} if D(i, j) = D2,
              {(i, j - 1), (i, j)} otherwise.

Operations: each processing element has local outputs which are connected to the inputs of the adjacent processing elements, and also has an output belt which is connected to the output belt of the adjacent up-diagonal processing element. For example, the output belt of the (i, j) processing element is connected to the output belt of the (i + 1, j + 1) processing element, forming the global connection. The data along the output belts move one processing element per time-unit.

(1) When A(i) and B(j) arrive, the γ computation unit of the (i, j) processing element performs the computation γ(A(i) → B(j)), which requires one time-unit. A(i) and B(j) will move one processing element per time-unit without change.

(2) Compute D1 = D(i - 1, j - 1) + γ(A(i) → B(j)) and delay the result one time-unit by the delay component (here we assume addition is performed by a combinational circuit which can perform the operation without delay or with a delay much shorter than the period of the clock). The indices (i - 1, j - 1) will also be output with the D(i, j)'s, although we do not show this in Fig. 5, for clarity.

(3) Compute D2 = D(i - 1, j) + 1 and D3 = D(i, j - 1) + 1 and compare D1, D2 and D3; then, according to the result of the comparison, output D(i, j), which is the minimum of D1, D2 and D3, and the index-pairs into the output belt, and also output D(i, j) and the index (i, j) to the adjacent processing elements. This requires one time-unit.

From the above computation, we conclude that the outputs D(i, j), the index (i, j) and the index-pairs will be produced three time-units after the arrival of the inputs A(i) and B(j). After establishing the structure of the processing element and making two space expansions, we obtain the structure shown in Fig. 6(a). Now we can present the VLSI implementation of Algorithm 1.

Algorithm 2. VLSI implementation of Algorithm 1.

Input: string A with m symbols, string B with n symbols, indices i's and j's. The initial values of D(0, 0), D(0, j) and D(i, 0) are given in Fig. 6(a) and can be fixed connections.

Output: the edit distances D(i, j) and the corresponding index-pairs, where 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Method: move input A and the indices i's from the bottom of the VLSI structure in Fig. 6(a) upwards to the top, one processing element per time-unit. Move B and the indices j's from the left to the right of the VLSI structure in Fig. 6(a), one processing element per time-unit. Send the identification signal, two time-units after the start, to permit the corresponding processing element to output its result into the output belt (a data bus controlled by the identification signal); the identification signal moves along the channels which connect to the right neighbor and to the top processing element, one processing element per time-unit. For instance, in example 1, at the second time-unit, (0, 0) and D(0, 0) are put into the output belt. At the fourth time-unit, {(0, 0), (1, 1)} and D(1, 1) are put into the output belt, and so on. After one pass, we obtain D(i, j) and the index-pairs from the (m + n) output belts as shown in Fig. 6(a). It requires m + n + 2 time-units.

Fig. 6. (a) VLSI implementation of Algorithm 2 for computing the string distance and index-pair sequence, (b) p-time expansion of the structure in (a) for string matching with p reference strings. Starting at the (m + n + 2)nd time-unit, it outputs one comparison distance per time-unit.


3.2. Finding the edit sequence

To obtain the edit sequence S, we have to perform a

backtracking procedure. It can be done in several ways.

(1) Output the entire matrix D and/or index-pairs to the host computer and the backtracking procedure can be done by the host machine.

(2) Attach extra processing elements and use the tag of the index-pairs as the search key to perform the backtracking procedure.

(3) Expand the "append" operation to one which appends an index to the index-list of its ancestor. An index-list is formed by appending an index to an index or to an index-list. For instance, in example 1, (1, 1) appends to (0, 0), (2, 2) appends to {(0, 0), (1, 1)}, and so on. Finally, (4, 5) appends to {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)}. We can use (4, 5) as the tag to find the edit sequence. This makes all operations forward, but it requires a large output channel capacity, especially for the processing elements located at the upper-right corner. Actually, the (i, j)th processing element will need at most (i + j + 1) output channel capacity.

(4) Add an index-pair register, which can be divided into two parts (the first part for the first index and the second part for the second index), to each processing element. The second part of the index-pair register is compared with the tag. If they match, then this part is output into the output channel and the first part is output as the tag to its three lower-left neighbors. The basic structure is shown in Fig. 6(a). When using this method, we should input the index-pairs into the index-pair registers instead of outputting them to the output channels. At the (m + n + 2)nd time-unit, send a backtracking signal which moves along the channels connecting to the left neighbor and the one beneath it, one processing element per time-unit. The index (m, n) is used as the tag of the (m, n)th processing element. It needs at most (m + n) time-units to complete this procedure.

We use example 1 to explain how to find the edit sequence by using index-pairs. From Algorithm 1, we know that each index-pair (i, j) appends to one of the three indices (i - 1, j), (i, j - 1) and (i - 1, j - 1), and only appends once. First we use (4, 5) as the tag to find the index-pair {(4, 4), (4, 5)} and output the index (4, 5). Then we use the index (4, 4) as the tag to find the index-pair {(3, 3), (4, 4)} and output (4, 4), and so on. Finally, we obtain the index sequence of example 1, which is {(4, 5), (4, 4), (3, 3), (2, 2), (1, 1), (0, 0)}.
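In software, the tag-driven backtracking of the fourth method reduces to following the stored predecessor of each index-pair from (m, n) back to (0, 0). A sketch (our illustration) using the index_pairs table of the Algorithm 1 sketch given earlier:

    def backtrack(index_pairs, m, n):
        # Follow the index-pair tags from (m, n) back to (0, 0), as in the fourth method.
        path = [(m, n)]
        while path[-1] != (0, 0):
            i, j = path[-1]
            if (i, j) in index_pairs:
                path.append(index_pairs[(i, j)])           # stored first pair = predecessor tag
            elif i > 0:                                    # boundary column: only deletions remain
                path.append((i - 1, j))
            else:                                          # boundary row: only insertions remain
                path.append((i, j - 1))
        return path

    D, pairs = algorithm_1("seat", "state")
    print(D[4][5], backtrack(pairs, 4, 5))
    # -> 2 [(4, 5), (4, 4), (3, 3), (2, 2), (1, 1), (0, 0)], as in example 1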

From Section 3.1 and the above discussion, we can conclude that the VLSI architecture in Fig. 6(a) can perform the string-matching problem and find the edit sequence S. Of course, a different implementation of backtracking may lead to a different time requirement. The third method is the best from the time point of view; its time complexity is (m + n + 2) time-units. Here we give the time complexity of string matching using the fourth method for backtracking: m + n + 2 ≤ time complexity ≤ (m + n + 2) + (m + n). It is obviously O(max(m, n)).

If there are p reference strings, we can make a p-time expansion of the structure of Fig. 6(a) and obtain the structure in Fig. 6(b). In general, we only need the edit sequence of the input string with a specific string (for instance, the most matched string) in the reference set. This can be done by performing two passes through the structure in Fig. 6(b). In the first pass, we find the most matched string. In the second pass, we input the unknown string and the most matched string to find the edit sequence. If using the third or fourth method for backtracking, the time complexity will be O(max(m, n, p)). If using a uniprocessor, the time complexity will be O(m x n x p).

To indicate the most matched string in the reference set with respect to the input string, we number the reference strings and add a register consisting of two parts. One part is for the distance and the other part is for the index of the reference string. We also add a counter which is initially set to zero and starts at the (m + n + 2)nd time-unit. Initially, distance.register = ∞ and index.register = don't care. The operation of the register is as follows.

if distance.register > distance.array then
    distance.register = distance.array and index.register = counter;
otherwise the register is unchanged.

The final result of index.register indicates the index of the most matched string in the reference set. The entire structure is shown in Fig. 6(b). In many applications, only the distance measure is required; if so, we can greatly simplify the structure of the processing element and the entire VLSI architecture, since only the local outputs D(i, j) are required. From now on, for simplicity, we are only concerned with the computation of the string distance.
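The distance/index register described above behaves as a running minimum over the stream of comparison distances; a sketch of the same logic (our illustration, with hypothetical names):

    def best_reference_index(distances):
        # Streaming version of the distance/index register pair.
        distance_register, index_register = float("inf"), None
        for counter, d in enumerate(distances, start=1):
            if distance_register > d:                      # update only on a strictly smaller distance
                distance_register, index_register = d, counter
        return index_register, distance_register

    print(best_reference_index([3, 1, 4, 1, 5]))           # -> (2, 1): reference string 2 matches best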

3.3. Verification of the proposed algorithm

To verify Algorithm 2, we need the following lemma and theorem.

Lemma 1. It needs i + j - 1 time-units to make A(i) and B(j) arrive at the (i, j)th processing element.

Proof. It follows from the data arrangement of the structure in Fig. 6(a).

Theorem 1. After receiving its inputs, the (i, j)th processing element will output D(i, j) at the (i + j + 2)th time-unit, for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Proof. We prove the theorem by induction on i and j.

Basis. First we consider the case i = j = 1. Since D(0, 0), D(0, j) and D(i, 0) are fixed values, they exist already. According to the structure of the processing element, when A(1) and B(1) arrive at the (1, 1) processing element, it will produce γ(A(1) → B(1)) at the second time-unit, compute D(0, 0) + γ(A(1) → B(1)) and delay the output one time-unit by the delay component. Thus the output will arrive at the comparator at the third time-unit, and D(0, 1) + 1 and D(1, 0) + 1 also arrive. Finally, the outputs are obtained at the fourth time-unit, and 4 = 1 + 1 + 2 = i + j + 2.

Induction step. Our induction hypothesis is that all (p, q)th processing elements can produce outputs and index-pairs at the (p + q + 2)th time-unit, where (1 ≤ p ≤ i and 1 ≤ q ≤ j + 1) or (1 ≤ p ≤ i + 1 and 1 ≤ q ≤ j).

Now consider the (i + 1, j + 1) processing element. According to Lemma 1, A(i + 1) and B(j + 1) arrive at the (i + 1, j + 1) processing element at the (i + j + 1)th time-unit, and the γ computation unit outputs γ(A(i + 1) → B(j + 1)) at the (i + j + 2)th time-unit. From the hypothesis, D(i, j) is produced at the (i + j + 2)th time-unit; it can be added to γ(A(i + 1) → B(j + 1)) and, after the one time-unit delay, the result reaches the comparator at the (i + j + 3)th time-unit. At the same time, according to the hypothesis, D(i + 1, j) and D(i, j + 1) are produced at the (i + j + 3)th time-unit and the computations D(i + 1, j) + 1 and D(i, j + 1) + 1 can be performed, with the results arriving at the comparator. Following the comparison, D(i + 1, j + 1) and the index-pair will be obtained at the (i + j + 4) = ((i + 1) + (j + 1) + 2)th time-unit. Therefore the proof is completed.

Corollary 1. The string distance D(m, n) and the index-pairs can be obtained at the (m + n + 2)th time-unit.

Proof. Follow Theorem 1 and let i = m and j = n. The outputs D(i, j) and the index-pairs will be taken from the (m + n) output belts as shown in Fig. 6(a), and D(m, n) is obtained from the (m, n) processing element.

Corollary 2. The computations of the proposed VLSI architecture are carried out on a diagonal-by-diagonal basis.

Proof. According to Theorem 1, consider the processing elements (1, i + j - 1), (2, i + j - 2), ..., (i, j), (i + 1, j - 1), ..., (i + j - 2, 2), (i + j - 1, 1). Since they have the same summation of indices, which equals i + j, they will output their results at the (i + j + 2)th time-unit. Parallel computation then proceeds in waves from lower left to upper right.
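Corollary 2 suggests a software analogue of the hardware schedule: all cells with i + j = d depend only on earlier diagonals, so they could be computed in the same time step. A sketch (our illustration) that evaluates equation (1) in this order:

    def wavefront_distance(A, B):
        # Evaluate equation (1) anti-diagonal by anti-diagonal, as the array does in hardware.
        m, n = len(A), len(B)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            D[i][0] = i
        for j in range(n + 1):
            D[0][j] = j
        for d in range(2, m + n + 1):                      # diagonal d holds the cells with i + j = d
            for i in range(max(1, d - n), min(m, d - 1) + 1):
                j = d - i
                cost = 0 if A[i - 1] == B[j - 1] else 1
                D[i][j] = min(D[i - 1][j - 1] + cost, D[i - 1][j] + 1, D[i][j - 1] + 1)
        return D[m][n]

    assert wavefront_distance("seat", "state") == 2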

We have discussed the VLSI implementation of string matching. If there are p reference strings, we can make a p-time expansion of the structure in Fig. 6(a) and obtain the structure in Fig. 6(b). After (m + n + 2) time-units, we will obtain one comparison result per time-unit, so in total (m + n + p + 1) time-units are needed and the time complexity will be O(max(m, n, p)). If a uniprocessor is used, the time complexity will be O(m × n × p).

4. VLSI IMPLEMENTATION OF DYNAMIC TIME-WARP PATTERN-MATCHING

4.1. New algorithm for dynamic time-warp pattern-matching

Dynamic time-warp pattern-matching using an integrated multiprocessing array has been investigated. However, the structure is quite complicated and the computation is performed only one feature vector at a time (i.e. it has to wait until the first vector is computed before starting to compute the second vector), and this is not very efficient. We propose a VLSI architecture based on the space-time domain expansion approach(5-7) which has a very natural and regular configuration. It can perform dynamic time-warp pattern-matching and find the warp path in a much more efficient manner by using extensive pipelining and parallelism techniques. We also solve the algorithm partition problem. We propose Algorithm 3 as follows.

Algorithm 3. Algorithm for dynamic time-warp pattern-matching.

Let m be the number of feature vectors of the input pattern, n be the number of feature vectors of the reference pattern and N be the number of elements of the feature vector.

begin
  S(0, 0) = 0;
  for i = 1 to m do
    S(i, 0) = ∞;
  for j = 1 to n do
    S(0, j) = ∞;
  for i = 1 to m do
    for j = 1 to n do
    begin
      D(i, j) = 0;
      for k = 1 to N do
        D(i, j) = D(i, j) + |U_i^k - R_j^k|;
      if min(S(i - 1, j - 1), S(i - 1, j), S(i, j - 1)) = S(i - 1, j - 1) then
        append {(i - 1, j - 1), (i, j)}

(We use "append" to represent forming an index-pair and outputting it into the output belt; the first pair, that is (i - 1, j - 1), will be in the output sequence, and the second pair (i, j) will be used as the tag for extracting the warp path if it is required.)

      else if min(S(i - 1, j - 1), S(i - 1, j), S(i, j - 1)) = S(i - 1, j) then
        append {(i - 1, j), (i, j)}
      else
        append {(i, j - 1), (i, j)};
      S(i, j) = D(i, j) + min(S(i - 1, j - 1), S(i - 1, j), S(i, j - 1));
      output(S(i, j));

(We use "output" to indicate that the value of S(i, j) is put into the output belt; if we do not want to know the partial results, it is not needed, except for S(m, n).)

    end
end.
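A sequential sketch of Algorithm 3 (our illustration): the N-term inner sum is exactly what the N-time expansion pipelines in hardware, and the recorded predecessors play the same role as the appended index-pairs.

    def algorithm_3(U, R):
        # Compute S(i, j) of equation (3) and the warp-path predecessors of Algorithm 3.
        m, n = len(U), len(R)
        INF = float("inf")
        S = [[INF] * (n + 1) for _ in range(m + 1)]
        S[0][0] = 0.0
        pred = {}
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                D_ij = sum(abs(u - r) for u, r in zip(U[i - 1], R[j - 1]))  # N-term sum, equation (2)
                candidates = {(i - 1, j - 1): S[i - 1][j - 1],              # checked in the same order
                              (i - 1, j): S[i - 1][j],                      # as the if/else of Algorithm 3
                              (i, j - 1): S[i][j - 1]}
                best = min(candidates, key=candidates.get)
                pred[(i, j)] = best                                         # append {best, (i, j)}
                S[i][j] = D_ij + candidates[best]                           # equation (3)
        return S, pred

The warp path itself can then be recovered with the same kind of tag-following backtracking as used for string matching.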

According to the space-time expansion method, we could make an m-space expansion along the x1 direction and an n-space expansion along the x2 direction in the space domain, and one N-time expansion in the time domain. The inputs should be


arranged according to the space-expansion rule. For the (i, j) processing element in the array, its symbolic representation is shown in Fig. 7(a). The structure of the processing element is shown in Fig. 7(b). The functions performed by each processing element are as follows.

Input: U_i^k and R_j^k, where 1 ≤ k ≤ N, and S(i - 1, j - 1), S(i, j - 1), S(i - 1, j), (i - 1, j - 1), (i, j - 1), (i - 1, j), i and j. (Note the i's are the same for a column and the j's are the same for a row. We can input them constantly or we can even preset the index-pairs since they never change.)

Output: S(i, j), (i, j) and the index-pairs, where

index-pairs = {(i - 1, j - 1), (i, j)} if S(i, j) = D(i, j) + S(i - 1, j - 1),
              {(i - 1, j), (i, j)} if S(i, j) = D(i, j) + S(i - 1, j),
              {(i, j - 1), (i, j)} otherwise.

Operations: each processing element has local outputs which are connected to the inputs of adjacent processing elements, and also has an output belt which is connected to the output belt of the adjacent up-diagonal processing element. For example, the output belt of the (i, j) processing element is connected with the output belt of the (i + 1, j + 1) processing element, forming the global connection. The data along the output belts move one processing element per time-unit.

(1) S(i - 1, j - 1) arrives and is delayed one time-unit by the delay component.

(2) S(i - 1, j) and S(i, j - 1) arrive and are compared with S(i - 1, j - 1) to find min(S(i - 1, j - 1), S(i - 1, j), S(i, j - 1)).

(3) When U_i^k and R_j^k arrive, the processing element performs the computation |U_i^k - R_j^k|, which requires one time-unit. U_i^k and R_j^k will move one processing element per time-unit without change. When D(i, j) is finished, which requires N time-units, the result is added to the result of the above comparison. (Here we assume addition is performed by a combinational circuit which can perform the operation without delay or with a delay much shorter than the period of the clock.)

Fig. 7. (a) Symbolic representation of the processing element for dynamic time warp, (b) the structure of the processing element. (A is an accumulator and C is a comparator; A is reset every N time-units to start the next computation when multiple-pair comparisons are made.)

The structure for dynamic time-warp pattern-matching is shown in Fig. 8(a).

Algorithm 4. VLSI implementation of Algorithm 3.

Input: m feature vectors of template U, n feature vectors of template R, indices i's and j's. The initial values of S(0, 0), S(i, 0) and S(0, j) are given in Fig. 8(a) and can be fixed connections.

Output: the dissimilarities S(i, j) and the corresponding index-pairs, where 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Method: move the elements of the input feature vectors and the indices i's from the bottom of the VLSI architecture in Fig. 8(a) upwards to the top, one processing element per time-unit, and move the elements of the reference feature vectors and the indices j's from the left to the right of the VLSI architecture in Fig. 8(a), one processing element per time-unit. Send the identification signal N time-units after the start; it moves to the right and to the top, passing one processing element per time-unit along the channels. When the signal arrives, the processing element outputs S(i, j) and the index-pairs into the output belt. After one pass, we obtain S(i, j) and the index-pairs from the (m + n) output belts as shown in Fig. 8(a). It requires m + n + N - 1 time-units.

To obtain the warp path, we have to perform a backtracking procedure and it can be done in several ways.

(1) Output the entire matrix S to the host machine and the backtracking procedure can be done by the host machine.


(2) Attach extra processing elements and use the tag of the index-pairs as the search key to perform the backtracking procedure.

(3) Expand the "append" operation to one which appends an index to the index-list of its ancestor. This makes all operations forward, but it requires a large output channel capacity, especially for the processing elements located at the upper-right corner; it requires linear channel capacity.

(4) Use a method similar to the fourth method for the string-matching backtracking. Details are omitted here.

From the above discussion, we can conclude that the architecture in Fig. 8(a) can perform dynamic time-warp pattern-matching and find the actual warp path. In many applications, only the dissimilarity measure is required; if so, we can greatly simplify the structure of the processing element and the entire VLSI architecture, and only the local S(i, j) outputs are required. From now on, for simplicity, we are only concerned with the computation of the dissimilarities of patterns.

4.2. Verification of the proposed algorithm

To verify Algorithm 4, we need the following lemma and theorem.

Lemma 2. It needs i + j + k - 2 time-units to make U_i^k and R_j^k arrive at the (i, j)th processing element.

Proof. It follows from the data arrangement of the structure in Fig. 8(a).


Fig. 8. (a) VLSI implementation of Algorithm 3 for computing the dissimilarity of two patterns, (b) p-time expansion of the structure in (a) for computing the dissimilarities of p pairs of templates.

Theorem 2. After receiving its inputs, the (i, j)th processing element will output S(i, j) at the (i + j + N - 1)th time-unit, for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Proof. We prove the theorem by induction on i and j.

Basis. First we consider the case i = j = 1. Since S(0, 0), S(0, j) and S(i, 0) are fixed values, they exist already and the result of the comparison exists too. According to the structure of the processing element, U_1^N and R_1^N arrive at the (1, 1)th processing element at the Nth time-unit and the computation finishes at the (N + 1)st time-unit, so S(1, 1) is obtained at the (N + 1) = (N + 1 + 1 - 1) = (N + i + j - 1)th time-unit for i = j = 1.

Induction step. Our induction hypothesis is that all (p, q)th processing elements can produce outputs and index-pairs at the (p + q + N - 1)th time-unit, where (1 ≤ p ≤ i and 1 ≤ q ≤ j + 1) or (1 ≤ p ≤ i + 1 and 1 ≤ q ≤ j).

Now consider the (i + 1, j + 1) processing element. According to Lemma 2, U_{i+1}^N and R_{j+1}^N arrive at the (i + 1, j + 1) processing element at the (i + j + N)th time-unit, and D(i + 1, j + 1) is produced at the (i + j + N + 1)th time-unit. From the hypothesis, S(i, j) is produced at the (i + j + N - 1)th time-unit; it is delayed one time-unit by the delay component and arrives at the comparator at the (i + j + N)th time-unit. At the same time, according to the hypothesis, S(i + 1, j) and S(i, j + 1) are produced at the (i + j + N)th time-unit. The comparison can then be performed and the result is output to the combinational addition circuit at the (i + j + N + 1)th time-unit, and S(i + 1, j + 1) and the index-pairs are obtained at the (i + j + N + 1) = ((i + 1) + (j + 1) + N - 1)th time-unit. Therefore the proof is completed.

Corollary 3. The dissimilarity S(m, n) and the index-pairs can be obtained at the (m + n + N - 1)th time-unit.

Proof. Follow Theorem 2 and let i = m and j = n. The outputs S(i, j) will be taken from the (m + n) output belts as shown in Fig. 8(a), and S(m, n) is obtained from the (m, n) processing element.

Corollary 4. The computations of the proposed VLSI architecture are carried out on a diagonal-by-diagonal basis.

Proof. According to Theorem 2, consider the processing elements (1, i + j - 1), (2, i + j - 2), ..., (i, j), (i + 1, j - 1), ..., (i + j - 2, 2), (i + j - 1, 1). Since they have the same summation of indices, which equals i + j, they will output their results at the (i + j + N - 1)th time-unit. Parallel computation then proceeds in waves from lower left to upper right.

We have discussed the VLSI implementation of

dynamic time-warp pattern-matching. If there are p reference templates, we can make a p-time expansion of the structure in Fig. 8(a) and obtain the structure in Fig. 8(b). After (m + n + N - 1) time-units, we will obtain one comparison result per N time-units, so in total (m + n + p × N - 1) time-units are needed. The identification signals, with period N, are sent by the host machine or generated by a self-timing circuit. Certainly, we may make an N-space expansion along the x3 direction, and the time complexity will then be (m + n + N + p - 1) time-units. It is obviously O(max(m, n, N, p)). If using a uniprocessor, the time complexity will be (m × n × N × p) time-units. We can use a structure similar to the one in Fig. 6(b) to indicate the pattern in the reference set which is the most matched pattern with the input pattern. And we can use one of the backtracking methods to find the warp path. Details are omitted here.

5. ALGORITHM PARTITION

In addition to using a two-dimensional array to solve the string-matching and dynamic time-warp pattern-matching problems, we could use a one-dimensional array or a two-dimensional array whose size differs from the problem size, by performing time expansions following the partition rule. We discuss the case of string matching in detail. For dynamic time-warp pattern-matching, we can easily obtain similar conclusions. When the VLSI architecture size is smaller than the computational task size, we can consider that there are one or more space condensations, and according to the partition rule, we have to make corresponding time expansions.

Fig. 9. String matching using a one-dimensional VLSI array along the x1 direction. (a) Comparing the input string with one reference string, (b) comparing the input string with p reference strings.

5.1. Using a one-dimensional array along the x1 direction

First we assume that the size of the array is m. We can consider it as an m-space expansion along the x1 direction. The architecture of each processing element is the same as the one in Fig. 5. The structure of the processing array and the interconnection between adjacent processing elements are shown in Fig. 9. It needs some OR gates, AND gates, inverters, delays and a counter. According to equation (4) and the partition rule, we have to make an n-time expansion to solve the string-matching problem. We input string B from left to right and string A from the bottom to the top of the structure in Fig. 9(a), and repeat string A n times. The time complexity for comparing two strings is also (m + n + 2) time-units. But when comparing string A with p reference strings, we have to make a p-time expansion and the time complexity will be (m + n × p + 2) time-units. If we use a two-dimensional array of size m × n to solve this problem, the time complexity is only (m + n + p + 2) time-units. If the number of processing elements k does not equal m, we need a ⌈m/k⌉-time expansion according to the partition rule and equation (4). Refer to Ref. (6) for details.
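To see the trade-off numerically, the following rough sketch (our illustration; the formulas are the time-unit counts quoted above, and the ceil(m/k) scaling is only an approximate model of the extra time expansion) compares the two-dimensional array with the one-dimensional array along x1 for p reference strings:

    import math

    def time_2d_array(m, n, p):
        # m x n array compared with p reference strings: about (m + n + p + 2) time-units.
        return m + n + p + 2

    def time_1d_array_x1(m, n, p, k=None):
        # One-dimensional array of k PEs along x1 (k = m by default): the n-time expansion
        # is paid once per reference string, giving about (m + n * p + 2) time-units,
        # scaled by the extra ceil(m/k)-time expansion when k < m (an approximation).
        k = m if k is None else k
        return math.ceil(m / k) * (m + n * p + 2)

    for p in (1, 10, 100):
        print(p, time_2d_array(20, 20, p), time_1d_array_x1(20, 20, p))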

5.2. Using a one-dimensional array along the x2 direction

First we assume that the size of the array is n. We can consider it as an n-space expansion along the x2 direction. The architecture of each processing element is the same as the one in Fig. 5. The structure of the processing array and the interconnection between adjacent processing elements are shown in Fig. 10. According to equation (4) and the partition rule, we have to make an m-time expansion to solve the string-matching problem. We input string B from left to right, repeated or fed back m times, and input string A from the bottom to the top. The time complexity for comparing two strings is also (m + n + 2) time-units. But when comparing string A with p reference strings, a p-time expansion is needed and the time complexity will be (m × p + n + 2) time-units. If we use a two-dimensional array of size m × n to solve this problem, the time complexity is only (m + n + p + 2) time-units. If the number of processing elements k of the array does not equal n, we need to make a ⌈n/k⌉-time expansion according to the partition rule and equation (4). Refer to Ref. (6) for details.

Fig. 10. String matching using a one-dimensional VLSI array along the x2 direction. (a) Comparing the input string with one reference string, (b) comparing the input string with p reference strings.

5.3. Using a two-dimensional array with dimensions k × l

If k = m and l = n, it is the case which has already been discussed. We now consider the other cases. According to equation (4) and the partition rule, a ⌈m/k⌉ × ⌈n/l⌉-time expansion is needed. Queues are also needed for feeding back the data. The structure is shown in Fig. 11. See Ref. (6) for details. The lengths of the queues have to change with the values of m and n to make the right data meet at the right places; this causes much difficulty for the control system and the queue structures. Hence we either use a sufficiently large VLSI architecture or use a one-dimensional array to solve the partition problem.

Fig. 11. String matching using a k × l two-dimensional VLSI array.

6. CONCLUDING REMARKS

We have proposed new algorithms and their VLSI implementations based on the space-time domain expansion approach for string-matching and dynamic time-warp pattern-matching. The time complexity is O(max(m, n)) using an m × n array of processing elements, where m is the length of the input string and n is the length of the reference string. With a uniprocessor the matching process has time complexity O(m × n). If there are p reference strings, using the

proposed architecture the string-matching problem can be solved in time O(max(m, n, p)); with a uniprocessor the time complexity is O(m × n × p). We have also proposed a VLSI architecture for dynamic time-warping based on the space-time domain expansion method which obtains high throughput through extensive pipelining and parallelism. It can measure the dissimilarity between two patterns in time O(max(m, n, N)); with a uniprocessor the time complexity is O(m × n × N), where m and n are the numbers of feature vectors of the unknown input pattern and the reference template respectively, and N is the number of elements of each feature vector. If there are p reference templates, the time complexity becomes O(max(m, n, N × p)); with a uniprocessor it is O(m × n × N × p). If we use an m × n × N VLSI array designed by performing three space expansions, the time complexity is O(max(m, n, N, p)). The algorithm partition problems have also been discussed. Formal verifications of the proposed VLSI structures have been given. The backtracking procedures have been discussed in detail and their hardware implementations have also been given. Even though we discuss both VLSI architectures only at the word-processing level, we can easily extend them to the bit-processing level by using a t-time expansion, where t is the number of bits of each word. This requires only minor modifications of both architectures; the details are omitted here. Both string-matching and dynamic time-warp pattern-matching use the dynamic programming technique, and they have wide applications in artificial intelligence, pattern recognition, information retrieval, image processing, speech processing, remote sensing, robotics and computer vision. Using the proposed approach, we can speed up the processing procedure toward real-time information processing.
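To relate the uniprocessor figure O(m × n × N) for dynamic time-warping to code, the sketch below computes a warp distance between two sequences of N-dimensional feature vectors using a city-block local distance and a simple symmetric local path. The local constraints and slope weights actually adopted in this paper follow Refs (9, 10) and are not reproduced here, so this recurrence is an illustrative assumption rather than the exact form used by the architecture.

```c
/* Uniprocessor dynamic time-warp distance: m x n grid cells, each needing an
 * N-element local distance, hence O(m x n x N) operations.  The local path
 * (minimum over three predecessors) is one common choice and only stands in
 * for the constraints of Refs (9, 10). */
#include <stdlib.h>
#include <math.h>

double dtw_distance(const double *a, int m, const double *b, int n, int N)
{
    /* a holds m feature vectors of dimension N in row-major order; b holds n. */
    double *D = malloc((size_t)(m + 1) * (size_t)(n + 1) * sizeof *D);
    if (D == NULL) return -1.0;                      /* allocation failure */
#define AT(i, j) D[(size_t)(i) * (n + 1) + (j)]

    AT(0, 0) = 0.0;
    for (int i = 1; i <= m; i++) AT(i, 0) = HUGE_VAL;
    for (int j = 1; j <= n; j++) AT(0, j) = HUGE_VAL;

    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++) {
            double cost = 0.0;                       /* city-block local distance: N steps */
            for (int q = 0; q < N; q++)
                cost += fabs(a[(i - 1) * N + q] - b[(j - 1) * N + q]);

            double best = AT(i - 1, j - 1);          /* diagonal, vertical, horizontal */
            if (AT(i - 1, j) < best) best = AT(i - 1, j);
            if (AT(i, j - 1) < best) best = AT(i, j - 1);
            AT(i, j) = cost + best;
        }

    double result = AT(m, n);
    free(D);
    return result;
#undef AT
}
```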

REFERENCES

1. M. J. Foster and H. T. Kung, The design of special-purpose VLSI chips, IEEE Comput. 13 (1980).
2. R. A. Wagner and M. J. Fischer, The string to string correction problem, J. Ass. Comput. Mach. 21, No. 1 (1974).
3. P. A. V. Hall and G. R. Dowling, Approximate string matching, ACM Comput. Surveys 12 (1980).
4. H. H. Lin and K. S. Fu, VLSI arrays for minimum-distance classifications, VLSI for Pattern Recognition and Image Processing, K. S. Fu, Ed. Springer, Berlin (1984).
5. H. D. Cheng, W. C. Lin and K. S. Fu, Space-time domain expansion approach to VLSI and its application to hierarchical scene matching, IEEE Trans. Pattern Anal. Mach. Intell., May (1985); summary published in Proc. Pattern Recognition Conf., Montreal, July (1984).
6. H. D. Cheng and K. S. Fu, Algorithm partition for a fixed-size VLSI architecture using space-time domain expansion, Proc. of the Seventh Symp. on Computer Arithmetic, Urbana, Illinois, June (1985).
7. H. D. Cheng and K. S. Fu, Algorithm partition and parallel recognition of general context-free languages using fixed-size VLSI architecture, Proc. of Midwest VLSI Workshop, Ohio State University, Jan. (1985). Another version accepted for publication in Proc. of First Int. Conf. on Supercomputing Systems, St. Petersburg, Florida, USA (1985).
8. N. Weste, D. J. Burr and B. D. Ackland, Dynamic time-warp pattern-matching using an integrated multiprocessing array, IEEE Trans. Comput. C-32 (1983).
9. L. R. Rabiner, A. E. Rosenberg and S. E. Levinson, Considerations in dynamic time warping algorithms for discrete word recognition, IEEE Trans. Acoust. Speech Signal Proc. ASSP-26 (1978).
10. H. Sakoe and S. Chiba, Dynamic programming optimization for spoken word recognition, IEEE Trans. Acoust. Speech Signal Proc. ASSP-26 (1978).
11. C. S. Myers and L. R. Rabiner, A level building dynamic time warping algorithm for connected word recognition, IEEE Trans. Acoust. Speech Signal Proc. ASSP-29 (1981).
12. R. K. Moore, A dynamic programming algorithm for the distance between two finite areas, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-1 (1979).
13. D. H. Ballard and C. M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, NJ.
14. R. E. Bellman, Dynamic Programming. Princeton Univ. Press, Princeton, NJ (1957).
15. K. Q. Brown, Dynamic programming in computer science, CMU Tech. Rep. (1979).
16. L. J. Guibas, H. T. Kung and C. D. Thompson, Direct VLSI implementation of combinatorial algorithms, Proc. Caltech Conf. VLSI (1979).
17. Y. T. Chiang and K. S. Fu, Parallel parsing algorithms and VLSI implementations for syntactic pattern recognition, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (1984).
18. C. Mead and L. Conway, Introduction to VLSI Systems. Addison-Wesley, Reading, MA (1980).
19. H. D. Cheng and K. S. Fu, VLSI architectures for pattern matching using space-time domain expansion approach, Proc. of IEEE Int. Conf. on Computer Design: VLSI in Computers, Rye Town Hilton, Port Chester, New York (1985).
20. M. J. Clarke and C. R. Dyer, Curve detection in VLSI, VLSI for Pattern Recognition and Image Processing, K. S. Fu, Ed. Springer, Berlin (1984).
21. H. D. Cheng and K. S. Fu, VLSI architecture for dynamic time-warp recognition of hand-written symbols, IEEE Trans. Acoust. Speech Signal Proc. (to appear).

About the Author--HENG-DA CHENG received the B.S. degree in Computer Science from Harbin Polytechnique University, Harbin, China in 1967, the M.S. degree in Electrical Engineering from Wayne State University, Detroit, MI in 1981 and the Ph.D. degree in Electrical Engineering from Purdue University, West Lafayette, IN in 1985.

He was a Teaching and Research Staff Member of the Computer Science Department of Harbin Shipbuilding Institute, Harbin from 1971 to 1976. He worked as a Technician at the Harbin Railway Science and Technique Research Institute, Harbin, from 1976 to 1978. He was a graduate student of the Chinese Academy of Sciences from 1978 to 1980. He was also a Research Assistant in the Advanced Automation Research Laboratory, School of Electrical Engineering, Purdue University.

Dr. Cheng is now on the faculty of the Electrical and Computer Engineering Department, University of California, Davis, CA 95616. His research interests include parallel processing, parallel algorithms, VLSI architectures and advanced computer architectures for pattern recognition, image processing and artificial intelligence.

Dr. Cheng is a member of the IEEE Computer Society and the Association for Computing Machinery.

About the Author--KING-SUN FU received the Ph.D. degree from the University of Illinois, Urbana, in 1959. He was Goss Distinguished Professor of Engineering and Professor of Electrical Engineering at Purdue University. He was author of the books Sequential Methods in Pattern Recognition, Syntactic Methods in Pattern Recognition, Statistical Pattern Classification Using Contextual Information and Syntactic Pattern Recognition and Applications. In addition, he was editor or coeditor of over 15 books and author or coauthor of more than 100 journal papers and 200 conference papers. His research interests included pattern analysis, machine intelligence and image database systems.

Dr. Fu was the Editor of Information Sciences, an Associate Editor of Pattern Recognition, Computer Vision, Graphics and Image Processing, Journal of Cybernetics, Journal of Parallel and Distributed Computing, Pattern Recognition Letters and International Journal of Computer and Information Sciences. He was the first President of the International Association for Pattern Recognition (IAPR). He was a member of the National Academy of Engineering and Academia Sinica and a Guggenheim Fellow. He received a Certificate of Appreciation (1977, 1979), Honor Roll (1973), Special Award (1982) and the Outstanding Paper Award (1977) of the IEEE Computer Society. He also received the 1981 ASEE Senior Research Award, the 1982 IEEE Education Medal and the 1982 AFIPS Harry Goode Memorial Award.

Dr. Fu died from a sudden heart attack while attending a National Science Foundation meeting on 29 April 1985 in Washington, D.C.