
[IEEE Comput. Soc. Press 1993 5th IEEE Symposium on Parallel and Distributed Processing - Dallas, TX, USA (1-4 Dec. 1993)] Proceedings of 1993 5th IEEE Symposium on Parallel and Distributed


A CIRCULANT MATRIX BASED APPROACH TO STORAGE SCHEMES FOR PARALLEL MEMORY SYSTEMS

Cengiz Erbas, Murat M. Tanik, and V.S.S. Nair

Department of Computer Science and Engineering Southern Methodist University, Dallas, TX 75275-0122

Abstract -- In this paper, we introduce a memory storage scheme allowing conflict-free parallel access to rows, columns, square blocks, distributed blocks, and positive and negative diagonals of two-dimensional arrays. Unlike the existing schemes, the proposed scheme can be used for an arbitrary number of memory modules and an arbitrary size of the arrays. We develop a systematic procedure for the memory allocation based on a placement matrix constructed using circulant matrices.

Keywords -- Parallel memory systems, conflict-free access, data alignment, address generation, circulant matrices.

I. INTRODUCTION


A Single Instruction Multiple Data (SIMD) computer model consists of four basic components: a parallel memory system, a set of processing elements, an interconnection network, and a control unit, as shown in Fig. 1.

A parallel memory system comprises a set of memory modules, each of which functions independently. The processing elements access data stored in the parallel memory system through an interconnection network, which manages the data alignment requirements. In other words, the interconnection network rearranges the data read from the memory modules and delivers them to the processing elements in some specified order, and vice versa.

In such a system, the level of parallelism that can be achieved is limited by the memory storage scheme, and the interconnection network. In order to exploit the parallelism, one should deal with the following two problems.

The first problem is related to the storage schemes. In a parallel memory system, a vector consisting of N elements may be fetched from the memory in one access. However, sometimes the data required by the processing elements cannot be fetched simultaneously due to memory conflicts. A memory conflict occurs if more than one processing element requests words from the same memory module in the same fetch cycle.

Fig. 1. An SIMD computer model.

1063-637U93 $03.00 © 1993 IEEE

The second problem is attributed to the interconnection network. After the data are fetched from the memory modules, the data elements must be permuted so that each one is transmitted to its corresponding processing element. Hence, the interconnection network should be capable of meeting the necessary data alignment requirements.

Parallel memory systems and storage schemes allowing conflict-free access to various portions of two (or more) dimensional arrays, i.e., rows, columns, diagonals, square blocks, and distributed blocks, have been studied, and several models have been proposed [1,3,6,7,8]. In [3], Budnik and Kuck introduced the linear skewing schemes. Their method allows conflict-free access to rows, columns, diagonals, and square blocks of size 6. However, this method requires the number of memory modules to be prime, which is a restrictive requirement. Lee [8] introduced a scrambled storage scheme allowing conflict-free access to rows, columns, square blocks, and distributed blocks. However, this scheme does not provide parallel access to diagonals. In [1], Balakrishnan et al. proposed a scheme allowing parallel access to rows, columns, and main diagonals. Their scheme cannot provide parallel access to all the diagonals. Further, square blocks and distributed blocks are not considered. Kim et al. [6] introduced another scheme based on perfect latin squares. In this scheme, rows, columns, and square blocks are parallel accessible. It also provides parallel access to main diagonals; however, the other diagonals are not accessible in parallel.

In [5], we proposed two storage schemes for parallel memory systems based on our star polygon and compound polygon solutions of the N-Queens problem. In this paper, we generalize the N-Queens based memory storage schemes and propose a scheme based on circulant matrices that allows conflict-free access to rows, columns, diagonals, square blocks, and distributed square blocks. The scheme we proposed in [5] is a special case of the general result presented in this paper. The data alignment requirements of the proposed scheme are analyzed in another paper.

The rest of the paper is organized as follows. Section 2 introduces the terminology used in the subsequent sections. Section 3 develops the memory allocation scheme based on a placement matrix, and describes the procedure to construct the placement matrix together with its properties. Finally, Section 4 concludes with suggestions for future research.

II. TERMINOLOGY AND NOTATIONS

Definition 1: Let A be a square matrix of order N. The element at the i-th row and j-th column of A is represented by the notation $x_{i,j}$, where $0 \le i, j \le N-1$.

Definition 2: The i-th row of A is the set of elements defined as

$R_i = \{ x_{i,k} \mid 0 \le k \le N-1 \}$, where $0 \le i \le N-1$.

Definition 3: The i-th column of A is the set of elements defined as

$C_i = \{ x_{k,i} \mid 0 \le k \le N-1 \}$, where $0 \le i \le N-1$.

Definition 4: The i-th positive diagonal of A is the set of elements defined as

$U_i = \{ x_{k,l} \mid 0 \le k, l \le N-1 \text{ and } k - l = i \}$, where $1-N \le i \le N-1$.

Definition 5: The i-th negative diagonal of A is the set of elements defined as

$L_i = \{ x_{k,l} \mid 0 \le k, l \le N-1 \text{ and } k + l = i \}$, where $0 \le i \le 2N-2$.

Definition 6: The (i,j)-th square block of size s of A is the set of elements defined as

$B_{ij} = \{ x_{i+k,\,j+l} \mid 0 \le k, l \le s-1 \}$, where $0 \le i, j \le N-s$.

Definition 7: The (i,j)-th distributed block of distance s of A is the set of elements defined as

$D_{ij} = \{ x_{i+ks,\,j+ls} \mid 0 \le k, l \le s-1 \}$, where $0 \le i, j \le N-s$.

Definition 8: Let A be an M×N matrix made up of N distinct integers 0, 1, 2, ..., N-1. If each of these integers occurs at most once in each row and each column of A, then A is called a latin rectangle of order M×N [2].

Definition 9: If A is a latin rectangle of order N×N, it is called a latin square of order N.

Definition 10: A circulant matrix of order N is an N×N matrix whose subsequent rows are generated by shifting the previous row to the right by one position [4]. A circulant matrix is represented by the notation $C = \mathrm{circ}(c_0, c_1, \ldots, c_{N-1})$.

Example 1: A circulant matrix of order N is illustrated in Fig. 2.

Fig. 2. A circulant matrix of order N.

Definition 11: A g-circulant of order N is defined as an N×N matrix in which each row is generated from the preceding row rotated g places to the right [4]. Similarly, a g-circulant can be represented by the notation $C = g\text{-circ}(c_0, c_1, c_2, \ldots, c_{N-1})$.

Example 2: A 2-circulant, $2\text{-circ}(c_0, c_1, c_2, \ldots, c_{N-1})$, is illustrated in Fig. 3.

Fig. 3. A 2-circulant of order N.

Lemma 1: Let g and N be two positive integers. If g and N are relatively prime, and the elements $c_0, c_1, \ldots, c_{N-1}$ are distinct, then a g-circulant $C = g\text{-circ}(c_0, c_1, \ldots, c_{N-1})$ forms a latin square. If g and N are not relatively prime, $C = g\text{-circ}(c_0, c_1, \ldots, c_{N-1})$ can be partitioned into $h = \mathrm{GCD}(N,g)$ identical submatrices, each of which corresponds to the same latin rectangle of order M×N, where $M = \frac{N}{\mathrm{GCD}(N,g)}$. (Here, GCD(N,g) is the greatest common divisor of N and g.)

Proof: By definition, every row of C is a permutation of $c_0, c_1, \ldots, c_{N-1}$, which implies that no two elements in a row are the same. Further, if g and N are relatively prime (that is, if GCD(N,g) = 1), then every column is also a permutation of the N distinct elements. If g and N are not relatively prime (that is, h = GCD(N,g) > 1), the elements belonging to any column of C can be partitioned into h identical sequences. Q.E.D.
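Lemma 1 can be checked numerically on small cases. The following Python sketch is an illustration only; the helper names `g_circulant` and `is_latin_square` are not from the paper, and the indexing rule is inferred from Definition 11 (row r is the first row rotated r·g places to the right).

```python
from math import gcd

def g_circulant(first_row, g):
    """Return the g-circulant whose first row is `first_row`."""
    n = len(first_row)
    # Row r is the first row rotated r*g places to the right, so
    # column c of row r holds first_row[(c - r*g) mod n].
    return [[first_row[(c - r * g) % n] for c in range(n)] for r in range(n)]

def is_latin_square(m):
    """True if no value repeats within any row or any column."""
    n = len(m)
    rows_ok = all(len(set(row)) == n for row in m)
    cols_ok = all(len(set(col)) == n for col in zip(*m))
    return rows_ok and cols_ok

# Case GCD(N, g) > 1: the circulant splits into h identical submatrices.
n, g = 16, 14
c = g_circulant(list(range(n)), g)
h = gcd(n, g)            # number of identical submatrices
rows = n // h            # rows per submatrix, M = N / GCD(N, g)
print(is_latin_square(c))
print(all(c[k * rows:(k + 1) * rows] == c[:rows] for k in range(h)))

# Case GCD(N, g) = 1: the circulant is a latin square.
print(is_latin_square(g_circulant(list(range(5)), 2)))
```

With N = 16 and g = 14 the circulant is not a latin square but consists of two identical 8×16 pieces, which is the situation exploited in Section III.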

III. MEMORY STORAGE SCHEMES

We describe the memory storage scheme in three steps. First, we demonstrate a general method using circulant matrices to construct the placement matrices Q(N,d). Then, we analyze some of the properties of the placement matrices that will eventually be used for developing parallel memory storage schemes. Finally, based on the concept of placement matrices, we describe a memory storage scheme that will allow conflict-free parallel access to rows, columns, diagonals, square blocks, and distributed blocks of two dimensional arrays.

A. Constructing the Placement Matrices Q(N,d)

We start by defining the integer set S(d). The members of the set S(d) will be used in the construction of the placement matrix Q(N,d).

Definition 12: For a given positive integer d, the integer set S(d) is defined as follows:

$S(d) = \{ k \mid k = ad, \text{ where } a > d+1 \text{ and } \mathrm{GCD}(a, d-1) = \mathrm{GCD}(a, d+1) = 1 \}$.

Example 3: The sets S(2), S(4) and S(8) are defined as follows:

S(2) = { 8, 10, 14, 16, 20, 22, 26, 28, 32, ... }.
S(4) = { 28, 32, 44, 52, 56, 64, 68, 76, 88, ... }.
S(8) = { 80, 88, 104, 128, 136, 152, 160, ... }.
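The membership rule of Definition 12 is easy to enumerate. The sketch below is illustrative: the names `s_members` and `in_s` are hypothetical, and the restriction to d ≥ 2 is an assumption of this sketch (for d = 1 the condition GCD(a, 0) = 1 can never hold for a > 2, so the generator would not terminate).

```python
from itertools import count, islice
from math import gcd

def s_members(d):
    """Yield the members of S(d) in increasing order (assumes d >= 2)."""
    if d < 2:
        raise ValueError("this sketch assumes d >= 2")
    for a in count(d + 2):                      # a > d + 1
        if gcd(a, d - 1) == 1 and gcd(a, d + 1) == 1:
            yield a * d                         # k = a * d

def in_s(n, d):
    """Test whether n belongs to S(d)."""
    a, rem = divmod(n, d)
    return (rem == 0 and a > d + 1
            and gcd(a, d - 1) == 1 and gcd(a, d + 1) == 1)

for d in (2, 4, 8):
    print(d, list(islice(s_members(d), 7)))
```

A caller would typically use `in_s(N, d)` to validate a choice of N and d before building the placement matrix.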

Let N and d be positive integers, and $N \in S(d)$. The following steps construct the placement matrix Q(N,d):

Step 1: Construct $L_i(N,d)$ for i = 0, 1, 2, ..., d-1, where $L_i(N,d)$ is defined as follows:

Definition 13: $L_i(N,d)$ is an $\frac{N}{d} \times N$ matrix formed by the first $\frac{N}{d}$ rows of $C_i(N,d) = g\text{-circ}(ij, ij+1, ij+2, \ldots, ij-1)$, where $j = (d-1)(d+1)$, and $g = N-d$.

Lemma 2: $L_i(N,d)$ is a latin rectangle.

Proof: Since N is a multiple of d, GCD(N,g) = GCD(N,N-d) = d. From Lemma 1, we know that if d = 1, then $C_0(N,d)$ corresponds to a latin square. Otherwise, $C_i(N,d)$ does not form a latin square. However, if $C_i(N,d)$ is partitioned into d identical submatrices $L_i(N,d)$ of order $\frac{N}{d} \times N$, as shown in Fig. 4, each of these submatrices corresponds to a latin rectangle.

Fig. 4. $C_i(N,d)$ and its submatrices $L_i(N,d)$.

Example 4: Assume that N = 16 and d = 2. Then, g = 14, and j = 3. Therefore, $C_0(16,2) = 14\text{-circ}(0,1,2,\ldots,15)$, and $C_1(16,2) = 14\text{-circ}(3,4,5,\ldots,15,0,1,2)$. The circulants $C_0(16,2)$ and $C_1(16,2)$ are illustrated in Fig. 5 and Fig. 6, respectively.

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 2  3  4  5  6  7  8  9 10 11 12 13 14 15  0  1
 4  5  6  7  8  9 10 11 12 13 14 15  0  1  2  3
 6  7  8  9 10 11 12 13 14 15  0  1  2  3  4  5
 8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
10 11 12 13 14 15  0  1  2  3  4  5  6  7  8  9
12 13 14 15  0  1  2  3  4  5  6  7  8  9 10 11
14 15  0  1  2  3  4  5  6  7  8  9 10 11 12 13
. . . . . . . . . . . . . . . . . . . . . . . .
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 2  3  4  5  6  7  8  9 10 11 12 13 14 15  0  1
 4  5  6  7  8  9 10 11 12 13 14 15  0  1  2  3
 6  7  8  9 10 11 12 13 14 15  0  1  2  3  4  5
 8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
10 11 12 13 14 15  0  1  2  3  4  5  6  7  8  9
12 13 14 15  0  1  2  3  4  5  6  7  8  9 10 11
14 15  0  1  2  3  4  5  6  7  8  9 10 11 12 13

Fig. 5. $C_0(16,2) = 14\text{-circ}(0,1,2,\ldots,15)$, and its partitions $L_0(16,2)$.


 3  4  5  6  7  8  9 10 11 12 13 14 15  0  1  2
 5  6  7  8  9 10 11 12 13 14 15  0  1  2  3  4
 7  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6
 9 10 11 12 13 14 15  0  1  2  3  4  5  6  7  8
11 12 13 14 15  0  1  2  3  4  5  6  7  8  9 10
13 14 15  0  1  2  3  4  5  6  7  8  9 10 11 12
15  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15  0
. . . . . . . . . . . . . . . . . . . . . . . .
 3  4  5  6  7  8  9 10 11 12 13 14 15  0  1  2
 5  6  7  8  9 10 11 12 13 14 15  0  1  2  3  4
 7  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6
 9 10 11 12 13 14 15  0  1  2  3  4  5  6  7  8
11 12 13 14 15  0  1  2  3  4  5  6  7  8  9 10
13 14 15  0  1  2  3  4  5  6  7  8  9 10 11 12
15  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15  0

Fig. 6. $C_1(16,2) = 14\text{-circ}(3,4,5,\ldots,2)$, and its partitions $L_1(16,2)$.
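Step 1 can be reproduced in a few lines. The following Python sketch is an illustration under one stated assumption: rotating a row right by g = N - d places is the same as rotating it left by d places, so the entry in local row r and column c of $L_i(N,d)$ is (ij + rd + c) mod N. The function names are hypothetical.

```python
def latin_rectangle(n, d, i):
    """First n/d rows of C_i(n, d) = g-circ(ij, ij+1, ..., ij-1), g = n - d."""
    j = (d - 1) * (d + 1)
    rows = n // d
    # Row r starts at i*j + r*d and increases by one per column (mod n).
    return [[(i * j + r * d + c) % n for c in range(n)] for r in range(rows)]

def is_latin_rectangle(m):
    """True if no value repeats within any row or any column."""
    rows_ok = all(len(set(row)) == len(row) for row in m)
    cols_ok = all(len(set(col)) == len(col) for col in zip(*m))
    return rows_ok and cols_ok

l0 = latin_rectangle(16, 2, 0)
l1 = latin_rectangle(16, 2, 1)
for row in l1:
    print(" ".join(f"{v:2d}" for v in row))
print(is_latin_rectangle(l0), is_latin_rectangle(l1))
```

The printed rows correspond to the upper half of Fig. 6.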

Step 2: Take the latin rectangles $L_0(N,d), L_1(N,d), \ldots, L_{h-1}(N,d)$, where h = GCD(N,d) = d, and construct a square matrix LS(N,d) of order N by placing the latin rectangles from top to bottom, as shown in Fig. 7.

Fig. 7. Submatrices $L_0(N,d), L_1(N,d), \ldots, L_{h-1}(N,d)$, and the corresponding LS(N,d).

Lemma 3: LS(N,d) is a latin square.

Proof: From Lemma 2, we know that each $L_i(N,d)$ is a latin rectangle, indicating that no two elements in a row are the same. Now, in order to demonstrate that no two column elements are equal, consider the k-th column, whose elements can be written as $ij + rd + k \pmod N$, where $0 \le i \le d-1$ and $0 \le r \le \frac{N}{d}-1$. Since GCD(j,d) = 1 by definition, no two elements in the same column can be equal.

Example 5: LS(16,2), which is composed of $L_0(16,2)$ and $L_1(16,2)$, is given in Fig. 8.

 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 2  3  4  5  6  7  8  9 10 11 12 13 14 15  0  1
 4  5  6  7  8  9 10 11 12 13 14 15  0  1  2  3
 6  7  8  9 10 11 12 13 14 15  0  1  2  3  4  5
 8  9 10 11 12 13 14 15  0  1  2  3  4  5  6  7
10 11 12 13 14 15  0  1  2  3  4  5  6  7  8  9
12 13 14 15  0  1  2  3  4  5  6  7  8  9 10 11
14 15  0  1  2  3  4  5  6  7  8  9 10 11 12 13
. . . . . . . . . . . . . . . . . . . . . . . .
 3  4  5  6  7  8  9 10 11 12 13 14 15  0  1  2
 5  6  7  8  9 10 11 12 13 14 15  0  1  2  3  4
 7  8  9 10 11 12 13 14 15  0  1  2  3  4  5  6
 9 10 11 12 13 14 15  0  1  2  3  4  5  6  7  8
11 12 13 14 15  0  1  2  3  4  5  6  7  8  9 10
13 14 15  0  1  2  3  4  5  6  7  8  9 10 11 12
15  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15  0

Fig. 8. LS(16,2).

Step 3: Remove (d-1)(d+1) rows from the bottom, and (d-1)(d+1) columns from the right side of LS(N,d), as shown in Fig. 9. The remaining square matrix of order N - (d-1)(d+1) in the upper left corner of LS(N,d) is called Q(N,d).

Fig. 9. Obtaining Q(N,d) from LS(N,d).
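Steps 1 to 3 can be combined into one short routine. The Python sketch below is one possible reading of the construction, not a definitive implementation: `build_ls` and `build_q` are hypothetical names, and the entry formula (ij + rd + c) mod N is inferred from Definition 13 using the fact that a right rotation by g = N - d equals a left rotation by d.

```python
def build_ls(n, d):
    """Stack L_0(n,d), ..., L_{d-1}(n,d) from top to bottom (Steps 1 and 2)."""
    if n % d != 0:
        raise ValueError("n must be a multiple of d")
    j = (d - 1) * (d + 1)
    rows = n // d                       # rows per latin rectangle
    return [[(i * j + r * d + c) % n for c in range(n)]
            for i in range(d) for r in range(rows)]

def build_q(n, d):
    """Drop (d-1)(d+1) rows from the bottom and columns from the right (Step 3)."""
    size = n - (d - 1) * (d + 1)
    return [row[:size] for row in build_ls(n, d)[:size]]

q = build_q(16, 2)
print(len(q), len(q[0]))
for row in q:
    print(" ".join(f"{v:2d}" for v in row))
```

For N = 16 and d = 2 this yields a 13×13 matrix whose entries are module indices in the range 0 to 15.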

B. Properties of the Placement Matrix Q(N,d)

The significant properties of Q(N,d) are discussed in the following theorems:

Theorem 1: The placement matrix Q(N,d) satisfies the following two conditions:

(i) All the integers belonging to any row $R_i$ of Q(N,d) are distinct.

(ii) All the integers belonging to any column $C_i$ of Q(N,d) are distinct.


Proof: Q(N,d) is a square submatrix of the latin square LS(N,d), which implies that no integer repeats within any row or any column of Q(N,d). Q.E.D.

Theorem 2: The placement matrix Q(N,d) satisfies the following two conditions:

(i) All the integers belonging to any positive diagonal Ui of Q(N,d) are distinct.

(ii) All the integers belonging to any negative diagonal Li of Q(N,d) are distinct.

Proof: (i) Let $U_p$ be a positive diagonal of LS(N,d), where $0 \le p \le 2N-2$. Since $L_0(N,d)$ is constructed using a g-circulant, where g = N-d, any two consecutive diagonal elements $x_{i,j}$ and $x_{i+1,j-1}$ belonging to $L_0(N,d)$ satisfy the following condition:

$x_{i+1,j-1} = x_{i,j} + (d-1) \pmod N$.

The same argument is also valid for the other latin rectangles $L_1(N,d), L_2(N,d), \ldots, L_{h-1}(N,d)$. Furthermore, the boundary elements $x_{k,j}$ and $x_{k+1,j-1}$ (between $L_a(N,d)$ and $L_{a+1}(N,d)$) will satisfy the condition:

$x_{k+1,j-1} = x_{k,j} + (d-1) + j \pmod N$,

where $k = \frac{aN}{d} - 1$, for every $0 \le a \le d-1$. Since j = (d-1)(d+1), the equation can be written as

$x_{k+1,j-1} = x_{k,j} + (d-1) + (d-1)(d+1) \pmod N$.

Since GCD(N,d-1) = 1, the operation $x_{i+1,j-1} = x_{i,j} + (d-1) \pmod N$ generates all the elements of the set {0, 1, 2, ..., N-1}. However, some of the integers are skipped at each latin rectangle boundary. The number of elements skipped at each boundary is

$\frac{(d-1)(d+1)}{d-1} = d+1$.

Further, we know that there are d-1 boundaries. Hence, the total number of elements skipped is (d-1)(d+1). As a result of the skipped elements, LS(N,d) contains two identical square matrices of order (d-1)(d+1) in the upper right and lower left corners of LS(N,d), as illustrated in Fig. 10. Q(N,d) is constructed by removing these two identical submatrices.

Fig. 10. LS(N,d), and identical submatrices.

(ii) Let $L_q$ be a negative diagonal of LS(N,d), where $-N+1 \le q \le N-1$. Since $L_0(N,d)$ is constructed using a g-circulant, where g = N-d, any two consecutive diagonal elements $x_{i,j}$ and $x_{i+1,j+1}$ belonging to $L_0(N,d)$ satisfy the following:

$x_{i+1,j+1} = x_{i,j} + (d+1) \pmod N$.

The same argument also applies to the elements of the other latin rectangles $L_1(N,d), L_2(N,d), \ldots, L_{h-1}(N,d)$. Furthermore, the boundary elements between latin rectangles satisfy the following:

$x_{k+1,j+1} = x_{k,j} + (d+1) + j \pmod N$,

where $k = \frac{aN}{d} - 1$, for every $0 \le a \le h-1$. Since j = (d-1)(d+1), the equation can be written as

$x_{k+1,j+1} = x_{k,j} + (d+1) + (d-1)(d+1) \pmod N$.

Since GCD(N,d+1) = 1, the operation $x_{i+1,j+1} = x_{i,j} + (d+1) \pmod N$ generates all the elements of the set {0, 1, 2, ..., N-1}. However, some of the integers are skipped at each latin rectangle boundary. So, at each latin rectangle boundary, the number of missing elements is calculated as

$\frac{(d-1)(d+1)}{d+1} = d-1$.

Further, we know that there are d-1 boundaries. Hence, the total number of missing elements is (d-1)(d-1). As a result of the skipped elements, LS(N,d) contains two identical square matrices of order (d-1)(d-1) in the upper left and lower right corners of LS(N,d), as illustrated in Fig. 10. Since (d-1)(d-1) does not exceed (d-1)(d+1), removing (d-1)(d+1) rows and columns also removes these repetitions. It is apparent that LS(N,d) contains N - (d-1)(d+1) non-repeated consecutive elements on each positive diagonal, and all the positive diagonal elements of Q(N,d) are distinct. Q.E.D.

Example 6: Consider LS(16,2) which was constructed in the previous section. The identical submatrices are demonstrated in Fig. 11.

[Figure: the matrix LS(16,2) of Fig. 8, with the identical corner submatrices and the conflict-free region outlined by dashed lines.]

Fig. 11. Identification of the conflict-free regions in LS(16,2).


The element in the upper left corner and the element in the lower right corner, which lie on the same diagonal, correspond to the same integer. Similarly, the corresponding elements of the 3×3 matrix in the upper right corner of LS(16,2) and of the one in the lower left corner are identical. The 13×13 matrix marked with dashed lines corresponds to Q(16,2), which contains no two identical integers on the same diagonal.
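The diagonal property of Theorem 2 can be checked numerically for this example. The Python sketch below is illustrative; the helper names are hypothetical, and the entry formula (ij + rd + c) mod N is inferred from Definition 13. It enumerates every diagonal in both directions and compares LS(16,2) with Q(16,2).

```python
def build_ls(n, d):
    """LS(n, d): the latin rectangles L_0 ... L_{d-1} stacked top to bottom."""
    j = (d - 1) * (d + 1)
    rows = n // d
    return [[(i * j + r * d + c) % n for c in range(n)]
            for i in range(d) for r in range(rows)]

def build_q(n, d):
    """Q(n, d): the upper left square of order n - (d-1)(d+1) of LS(n, d)."""
    size = n - (d - 1) * (d + 1)
    return [row[:size] for row in build_ls(n, d)[:size]]

def diagonals(m):
    """Yield every diagonal of a square matrix, in both directions."""
    n = len(m)
    for k in range(-(n - 1), n):        # row - col = k
        yield [m[r][r - k] for r in range(n) if 0 <= r - k < n]
    for k in range(2 * n - 1):          # row + col = k
        yield [m[r][k - r] for r in range(n) if 0 <= k - r < n]

def conflict_free(groups):
    """True if no module index repeats within any group."""
    return all(len(set(g)) == len(g) for g in groups)

ls, q = build_ls(16, 2), build_q(16, 2)
print(conflict_free(diagonals(ls)))     # full LS(16,2)
print(conflict_free(diagonals(q)))      # trimmed Q(16,2)
print(ls[0][0], ls[15][15])             # the two ends of the main diagonal of LS
```

The check fails for LS(16,2) because of the repeated corner entries and succeeds once the last three rows and columns are removed.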

Theorem 3: The placement matrix Q(N,d) satisfies the following two conditions:

(i) All the integers belonging to any square block $B_{ij}$ of size d of Q(N,d) are distinct, provided that $d \le \sqrt{N/2}$.

(ii) All the integers belonging to any distributed block $D_{ij}$ of distance d of Q(N,d) are distinct, provided that $d \le \sqrt{N/2}$.

Proof: (i) Let $B_{ij}$ be a square block of size d. Then, $B_{ij}$ corresponds to a square submatrix of order d of LS(N,d). The elements of any two consecutive rows k and k+1 of $B_{ij}$ belonging to the same latin rectangle $L_l(N,d)$ can be given as

a, a+1, a+2, ..., a+d-1
a+d, a+d+1, a+d+2, ..., a+2d-1

where $a = x_{k,j}$. Further, the boundary rows between two latin rectangles $L_l(N,d)$ and $L_{l+1}(N,d)$ generate the following sequences:

a, a+1, a+2, ..., a+d-1
a+j+d, a+j+d+1, a+j+d+2, ..., a+j+2d-1

The number of elements skipped is given by

$j = (d-1)(d+1) = d^2 - 1$.

So, the total number of integers is $d^2 + d^2 - 1 = 2d^2 - 1$. For all the integers to be distinct, $2d^2 - 1 \le N$ should be satisfied. By definition, $d \le \sqrt{N/2}$, indicating that $2d^2 \le N$.

(ii) Let $D_{ij}$ be a distributed block of distance d. Then, the elements of $D_{ij}$ can be written in the form of a square submatrix of order d. The elements of any two consecutive rows k and k+1 of $D_{ij}$ belonging to the same latin rectangle $L_l(N,d)$ can be given as

a, a+d, a+2d, ..., a+(d-1)d
a+d^2, a+d^2+d, a+d^2+2d, ..., a+d^2+(d-1)d

where $a = x_{k,j}$. The number of rows belonging to the same latin rectangle $L_l(N,d)$ is $\frac{N}{d}$. Since $d \le \sqrt{N/2}$, we have $\frac{N}{d} \ge 2d$. Further, the boundary rows between two latin rectangles $L_l(N,d)$ and $L_{l+1}(N,d)$ generate the following sequence:

a, a+d, a+2d, ..., a+(d-1)d
a+j+d^2, a+j+d^2+d, a+j+d^2+2d, ..., a+j+d^2+(d-1)d

Since j = (d-1)(d+1) is relatively prime to d, the elements of $D_{ij}$ belonging to different latin rectangles are distinct. The elements of $D_{ij}$ are distributed into at most d latin rectangles, given that $2d^2 \le N$. Q.E.D.
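Theorem 3 can be spot-checked on Q(16,2), where d = 2 satisfies $d \le \sqrt{16/2}$. The following Python sketch is illustrative only. The helper names are hypothetical, the entry formula is inferred from Definition 13, and the anchor range of the distributed blocks is restricted here so that every block lies inside the matrix, which is an assumption of this sketch.

```python
def build_q(n, d):
    """Q(n, d): upper left square of order n - (d-1)(d+1) of LS(n, d)."""
    j = (d - 1) * (d + 1)
    rows, size = n // d, n - j
    ls = [[(i * j + r * d + c) % n for c in range(n)]
          for i in range(d) for r in range(rows)]
    return [row[:size] for row in ls[:size]]

def square_blocks(m, s):
    """Yield the entries of every s-by-s block of contiguous rows and columns."""
    n = len(m)
    for i in range(n - s + 1):
        for j in range(n - s + 1):
            yield [m[i + k][j + l] for k in range(s) for l in range(s)]

def distributed_blocks(m, s):
    """Yield the entries of every s-by-s block whose rows/columns are s apart."""
    n = len(m)
    span = (s - 1) * s                  # offset of the last row and column
    for i in range(n - span):
        for j in range(n - span):
            yield [m[i + k * s][j + l * s] for k in range(s) for l in range(s)]

def conflict_free(groups):
    """True if no module index repeats within any group."""
    return all(len(set(g)) == len(g) for g in groups)

q = build_q(16, 2)
print(conflict_free(square_blocks(q, 2)))
print(conflict_free(distributed_blocks(q, 2)))
```

Both checks pass for this example, including the blocks that straddle the boundary between $L_0(16,2)$ and $L_1(16,2)$.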

C. Memory Storage Schemes Based on Q(N,d)

In the previous section, we demonstrated that the integers belonging to any row, column, diagonal, square block, or distributed square block of the placement matrix Q(N,d) are distinct. Based on the placement matrix Q(N,d), we define the memory storage scheme as follows: Consider a parallel memory system consisting of N memory modules. Let A be a matrix of order N-(d-1)(d+1), where $N \in S(d)$, and h = GCD(N,d). Then, store every element $a_{ij}$ of A into the memory module $q_{ij}$, where $q_{ij}$ is the (i,j)-th element of the placement matrix Q(N,d).

Theorem 4: Let the elements of a matrix A be stored into a parallel memory system using Q(N,d). Then, the index of the corresponding memory module $m_{ij}$ for every element $a_{ij}$ of A is generated as follows:

$m_{ij} = \left\lfloor \frac{i}{N/d} \right\rfloor \bigl((d-1)(d+1) - N\bigr) + di + j \bmod N$.

Proof: The number of rows in each component of LS(N,d) is given by $\frac{N}{d}$. Therefore, $\left\lfloor \frac{i}{N/d} \right\rfloor$ gives the component number to which $a_{ij}$ belongs, and $\left(i - \frac{N}{d}\left\lfloor \frac{i}{N/d} \right\rfloor\right)$ gives the index of row i within that component. Each component is shifted (d-1)(d+1) positions to the right, and each row within each component is shifted d positions to the right. Therefore,

$m_{ij} = \left(i - \frac{N}{d}\left\lfloor \frac{i}{N/d} \right\rfloor\right) d + \left\lfloor \frac{i}{N/d} \right\rfloor (d-1)(d+1) + j \bmod N$,

$m_{ij} = \left\lfloor \frac{i}{N/d} \right\rfloor \bigl((d-1)(d+1) - N\bigr) + di + j \bmod N$.

Q.E.D.

Example 7: The placement matrix Q(16,2) has been constructed in Example 5. Here, $\frac{N}{d} = \frac{16}{2} = 8$. The data element $a_{3,5}$ is stored in the memory module with index

$\left(2(3 \bmod 8) + \left\lfloor \frac{3}{8} \right\rfloor (2-1)(2+1) + 5\right) \bmod 16 = (6 + 0 + 5) \bmod 16 = 11$.
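The address computation of Theorem 4 translates directly into code. The Python sketch below is illustrative; the function names are hypothetical. It implements both the simplified closed form and the unsimplified form from the proof so that the two can be compared.

```python
def module_index(i, j, n, d):
    """Closed form of Theorem 4: memory module holding element a[i][j]."""
    block = i // (n // d)               # component (latin rectangle) number
    return (block * ((d - 1) * (d + 1) - n) + d * i + j) % n

def module_index_stepwise(i, j, n, d):
    """Unsimplified form used in the proof of Theorem 4."""
    block, r = divmod(i, n // d)        # component number, row within it
    return (r * d + block * (d - 1) * (d + 1) + j) % n

print(module_index(3, 5, 16, 2))        # the element a(3,5) of Example 7
order = 16 - (2 - 1) * (2 + 1)          # order of Q(16,2)
print(all(module_index(i, j, 16, 2) == module_index_stepwise(i, j, 16, 2)
          for i in range(order) for j in range(order)))
```

Python's `%` operator returns a non-negative result for a positive modulus, so the negative constant (d-1)(d+1) - N needs no special handling; in languages where the remainder can be negative, an extra adjustment would be required.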


It can be observed that for a particular storage scheme the values of $\frac{N}{d}$ and $((d-1)(d+1) - N)$ are constants. Therefore, the memory module address of the element $a_{ij}$ can be given as:

$m_{ij} = \left\lfloor \frac{i}{c_1} \right\rfloor c_2 + di + j \bmod N$,

where $c_1 = \frac{N}{d}$, and $c_2 = (d-1)(d+1) - N$.

In accessing the rows, columns, diagonals, square blocks, and distributed square blocks of two-dimensional matrices, the address calculation is done only for the first element of the vector. Once the correspondence between one element of the matrix and its memory module is established, the memory modules of the remaining elements are implicitly specified by the allocation procedure. The addressing and the alignment of the data elements are handled by the interconnection network [6].

Corollary 1: If the elements of a square matrix A of order N - (d-1)(d+1) are placed into a parallel memory system of N memory modules based on the placement matrix Q(N,d), then the following conditions are satisfied:

(i) All the elements of any row $R_i$ of A are accessible in parallel.

(ii) All the elements of any column $C_i$ of A are accessible in parallel.

(iii) All the elements of any positive diagonal $U_i$ of A are accessible in parallel.

(iv) All the elements of any negative diagonal $L_i$ of A are accessible in parallel.

(v) All the elements of any square block $B_{ij}$ of A are accessible in parallel, provided that $d \le \sqrt{N/2}$.

(vi) All the elements of any distributed block $D_{ij}$ of A are accessible in parallel, provided that $d \le \sqrt{N/2}$.

Proof: Follows from Theorems 1, 2, and 3.

Example 8: Consider a parallel memory system having 16 memory modules. Let A be a square matrix of order 13. According to the storage scheme suggested, the elements of A are placed into the memory modules using Q(16,2), as shown in Fig. 12.

[Figure: the elements of A laid out across memory modules 0 through 15.]

Fig. 12. Placing the elements of a matrix using LS(16,2).

Corollary 2: If the size of the matrix is greater than N - (d-1)(d+1), then our memory storage scheme guarantees that each row, column, positive and negative diagonal of A can be accessed in

$\left\lceil \frac{M}{N - (d-1)(d+1)} \right\rceil$

memory fetch cycles, where M is the size of the matrix. (It should be noticed that theoretically the best possible time would have been $\left\lceil \frac{M}{N} \right\rceil$.)

Example 9: Consider a memory system with N memory modules. Let A be a matrix of size (N-3). Then, the matrix can be placed into the memory modules using Q(N,2). However, if the size of A is greater than N-3, then the matrix is partitioned into submatrices of size (N-3)×(N-3), as shown in Fig. 13, and the elements of each submatrix are placed into the memory modules using the storage scheme introduced in this section. This guarantees that each row, column, positive and negative diagonal of A can be accessed in $\left\lceil \frac{M}{N-3} \right\rceil$ memory accesses, where M is the size of A.

Fig. 13. Partitioning a matrix into submatrices of size (N-3)×(N-3).
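The access cost stated in Corollary 2 and Example 9 can be computed as follows. This is a small illustrative sketch; the function names `fetch_cycles` and `ideal_cycles` are hypothetical.

```python
def fetch_cycles(m, n, d):
    """Fetch cycles for one row, column, or diagonal of an m-by-m matrix
    stored in tiles of order n - (d-1)(d+1), as in Corollary 2."""
    q_order = n - (d - 1) * (d + 1)
    return -(-m // q_order)             # ceiling division on integers

def ideal_cycles(m, n):
    """Lower bound ceil(m / n) when all n modules are busy in every cycle."""
    return -(-m // n)

for m in (13, 14, 26, 100):
    print(m, fetch_cycles(m, 16, 2), ideal_cycles(m, 16))
```

With N = 16 and d = 2 each tile has order 13, so a matrix of size 100 needs 8 cycles per row against a theoretical minimum of 7.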

IV. CONCLUSION

We have introduced a memory storage scheme allowing conflict-free parallel access to rows, columns, diagonals, square blocks, and distributed square blocks of two dimensional arrays, based on circulant matrices. As opposed to the previous approaches, the proposed scheme can be used for an arbitrary number of memory modules and arbitrary size of matrices. The allocation and access schemes have been illustrated with examples. The data alignment requirements of the proposed storage scheme will be analyzed in a subsequent paper.

REFERENCES

[1] M. Balakrishnan, R. Jain, and C.S. Raghavendra, "On array storage for conflict-free memory for parallel processors," in Proc. International Conference on Parallel Processing, Illinois, August 15-19, 1988.

[2] R.A. Brualdi, Introductory combinatorics, North-Holland, pp. 205-213, 1979.

[3] P. Budnik and D.J. Kuck, "The organization and use of parallel memories," IEEE Trans. Computers, pp. 1566-1569, (Dec. 1971).

[4] P.J. Davis, Circulant matrices, Wiley, N.Y., 1979.

[5] C. Erbas and M.M. Tanik, "Storage Schemes for Parallel Memory Systems and the N-Queens problem," in Proc. 15th Annual ASME ETCE Conference, Computer Applications Symposium, Houston, Texas, Jan. 26-30, 1992.

[6] K. Kim and V.K.P. Kumar, "Perfect latin squares and parallel array access," in Proc. 16th Annual Symposium on Computer Architecture, Washington, pp. 372-379, 1989.

[7] D.H. Lawrie and C.R. Vora, "The prime memory system for array access," IEEE Trans. Computers, C-31 (5) (May, 1982), pp. 435-442.

[8] D. Lee, "Scrambled Storage for Parallel Memory Systems," in Proc. IEEE 15th Annual International Symposium on Computer Architecture, Hawaii, May 30-June 2, 1988.