25
ElSEVlER to PARALLEL COMPUTING Spanning subgraphs with applications communication on the multidimensional torus network Paraskevi Fragopoulou *, Selim G. Akl * J Department of Computingand InformationScience, Queen’s University. Kingston, Ontario, Canada K7L 3N6 Received 12 August 1994; revised 24 February19959 March 1996 Abstract In this paper we develop a special framework on the multidimensional torus (or multidimen- sional wraparound mesh), a network that is currently receiving considerable attention. Subse- quently, we show how this framework can be used to construct a spanning tree, denoted by BST, and a spanning subgraph, denoted by BSG, that possess several special properties, on the network under consideration. BST is a shortest path, balanced spanning tree. BSG is a subgraph derived as an extension of BST. The importance of the spanning subgraphs is demonstrated with the development of algorithms for four fundamental communication problems on interconnection networks, namely, the single-node and multinode broadcasting and the single-node and multin- ode scattering (or total exchange). These problems are studied under the store-and-forward, all-port, and bidirectional communication model and are analyzed according to the constant cost model. BSG is used to develop optimal algorithms that achieve 100% utilization of the network edges, given that the communication model allows the splitting and recombining of messages. If this is prohibitively expensive, near optimal algorithms are developed, using BST. Similar techniques have been previously used for the development of communication algorithms on the binary hypercube network (Bertsekas et al., 199 1). Keywords: Communication algorithm; Interconnection network; Multidimensionaltorus; Spanning subgraph * Corresponding author. Email: (fragopou,akl)@qucis.queensu.ca. 0167-8191/%/$15.00 0 1996 Elscvier Science B.V. All rights reserved PfI SOl67-8191(96)00029-4

Spanning subgraphs with applications to communication on the multidimensional torus network

Embed Size (px)

Citation preview

ElSEVlER

to

PARALLEL COMPUTING

Spanning subgraphs with applications communication on the multidimensional

torus network

Paraskevi Fragopoulou *, Selim G. Akl * J

Department of Computing and Information Science, Queen’s University. Kingston, Ontario, Canada K7L 3N6

Received 12 August 1994; revised 24 February 19959 March 1996

Abstract

In this paper we develop a special framework on the multidimensional torus (or multidimen-

sional wraparound mesh), a network that is currently receiving considerable attention. Subse- quently, we show how this framework can be used to construct a spanning tree, denoted by BST, and a spanning subgraph, denoted by BSG, that possess several special properties, on the network under consideration. BST is a shortest path, balanced spanning tree. BSG is a subgraph derived as an extension of BST. The importance of the spanning subgraphs is demonstrated with the development of algorithms for four fundamental communication problems on interconnection networks, namely, the single-node and multinode broadcasting and the single-node and multin- ode scattering (or total exchange). These problems are studied under the store-and-forward, all-port, and bidirectional communication model and are analyzed according to the constant cost model. BSG is used to develop optimal algorithms that achieve 100% utilization of the network edges, given that the communication model allows the splitting and recombining of messages. If this is prohibitively expensive, near optimal algorithms are developed, using BST. Similar techniques have been previously used for the development of communication algorithms on the binary hypercube network (Bertsekas et al., 199 1).

Keywords: Communication algorithm; Interconnection network; Multidimensional torus; Spanning subgraph

* Corresponding author. ’ Email: (fragopou,akl)@qucis.queensu.ca.

0167-8191/%/$15.00 0 1996 Elscvier Science B.V. All rights reserved PfI SOl67-8191(96)00029-4

992 P. Fragopoulou, S.G. Akl/Parallel Computing 22 (1996) 991-1015

1. Introduction

We develop a shortest path, balanced spanning tree, denoted by BST, on the multidimensional torus network. BST is balanced, meaning that the imbalance among its subtrees diminishes rapidly as the number of nodes in the network increases. To our knowledge it is the first time a spanning tree of this type is constructed on the torus network of any dimension. BST is extended to a spanning subgraph, denoted by BSG. BSG has a single root node, and there are multiple paths between the root node and some other nodes of the network through BSG. Following the construction of the subgraphs we show how they can be used to develop algorithms for some fundamental communication problems, namely, the single-node and multinode broadcasting, and the single-node and multinode scattering. If the underlying communication model allows the splitting and recombining of messages, BSG can be used to develop optimal algorithms for some of the above communication problems that also achieve 100% utilization of the network edges. However, if splitting and recombining of messages is prohibitively costly, the BST spanning tree is used to derive near optimal algorithms. Using BST, only near optimal algorithms are developed since this is not a perfectly balanced spanning tree. However, since the imbalance among the subtrees of BST diminishes rapidly, these algorithms are very close to being optimal.

Broadcasting is the distribution of the same group of messages from a source processor to all other processors, and scattering is the distribution of distinct groups of messages from a source processor to each other processors. We consider broadcasting and scattering from a single source processor of the network (single-node broadcasting and scattering) and simultaneously executed from all processors of the network (multi- node broadcasting and scattering). The multinode scattering problem is often referred to as rotal exchange [41.

All of the communication problems am studied under the store-and-forward commu- nication model, i.e., a processor must receive the entire message before it can process it and retransmit it. Furthermore, the algorithms are derived for the all-port model (as opposed to the one-port model), meaning that in one time step a processor can exchange messages of fixed length with all of its neighbors simultaneously. Whenever the one-port assumption applies, the algorithms are derived from those under the all-port model. The communication is bidirectional, so that an edge can be used for message transmission in both directions at each time step and can be viewed as two unidirectional edges. We consider a model where splitting and recombining of messages is allowed, and one in which it is prohibited. However, even for the algorithms developed under the first model only a small overhead is introduced since a limited amount of message splitting is required. In general, it takes one time step for a message to be transmitted on an edge, i.e., the unit cost model is assumed. If a message is split into s equal parts, each one of them requires l/s time to be transmitted on an edge.

The interconnection network under consideration is the multidimensional torus (or multidimensional wraparound mesh), which has been proven to be a flexible topology for the interconnection of processors [5,15]. An n-dimensional, k-at-y multidimensional torus, denoted by MT,,, , has N = k” processors, each one labeled by an n-digit number in radix k arithmetic. Two processors are connected if their labels differ in exactly one

P. Fragopoulou, S.G. Akl/Parallel Computing 22 (1996) 991-1015 993

Fig. 1. The MT2,4 network.

digit by j in modulo k arithmetic, j E { - 1, l), i.e., processor v,_ 1 . . . vi+ ,vivi_ , . . . vg is connected to processors v,_ , . . . vi+ , vi vi_ , . . . v,,, vi=(vi+j)mod k, jE{-1, I), for all 0 Q i Q n - 1. The MT2,4 network can be seen in Fig. 1.

The multinode broadcasting and scattering problems are of special interest. A special technique is used on the multidimensional torus (Lemma 17) so that messages originat- ing at individual nodes are interleaved in such a manner that no two messages contend for the same edge at any given time during the execution of the algorithm. The technique for the interleaving of messages was first presented in [4] for the binary hypercube network and is modified here for the torus network. For the single-node scattering problem, where the edges incident to the source node constitute a bottleneck for the transmission of the messages, the BSG and BST spanning subgraphs offer the capability to transmit an equal, or almost equal, respectively, number of messages over each edge incident to the source node. These, along with the fact that each message follows a shortest path to its destination help achieve the minimum number of time steps required for the algorithms to complete. The single-node and multinode broadcasting algorithms can be used for the single-node and multinode reduction problems, respec- tively, by inverting the transmission of messages. Furthermore, the single-node and multinode scattering algorithms can be used for the single-node and multinode gathering also by inverting the transmission of messages. Single-node gathering is the collection in one processor of the values stored in each other processor. Single-node reduction is similar to the single-node gathering problem except that the values are reduced over an associative operator, i.e., add, multiply, exclusive-OR, maximum, minimum, etc. The simultaneous execution of a single-node gathering or reduction from all processors give rise to the multinode gathering or reduction problems, respectively.

The method of constructing spanning subgraphs for the development of communica- tion algorithms has been used extensively in the past. We refer the reader to the following nonexhaustive list of references [17,3,4,24,6-9,12,25]. Subgraphs with the

properties of BST and BSG have been previously constructed on the binary hypercube network [17]. In [3,4], multinode broadcasting and scattering algorithms that achieve

994 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (1996) 991-1015

100% edge utilization were developed on the binary hypercube network. The develop- ment of these algorithms was based on the construction of spanning trees. In [24], optimal algorithms that also achieve full edge utilization for the multinode broadcasting and scattering problems were developed on the torus network based on a novel technique of matrix decomposition. These algorithms use the two communication models considered here, the one that allows and the one that prohibits message splitting and recombining. In this paper we develop optimal algorithms for the same problems using spanning subgraph construction. Although spanning subgraphs of this type have been previously constructed on the binary hypercube, their construction on the torus network is nontrivial and considerable effort was required. We believe that the BST and BSG subgraphs are interesting on their own right, independently of the communication algorithms. To our knowledge it is the first time a spanning tree, such as BST, with an imbalance among its subtrees that diminishes as the number of nodes increases, is constructed on the torus. Furthermore, for the first time a subgraph, such as BSG, that can be used to achieve full edge utilization for algorithms that proceed symmetrically from all nodes is constructed on the torus. The constructions are general enough to be valid on toruses of any dimensions as opposed to other results that apply to toruses of two or three dimensions.

The remainder of this paper is organized as follows. Notations and definitions that are used throughout the paper are introduced in Section 2. Sections 3 and 4 present the construction of a spanning tree and a spanning subgraph, respectively, on the multidi- mensional torus network. Section 5 is devoted to the derivation of lower bounds and the design of optimal algorithms based on the spanning subgraphs,.for all of the communica- tion problems under consideration. Finally, we conclude in section 6 along with a summary of the results obtained in this paper and some suggestions for further research.

2. Notations and definitions

An n-dimensional k-ary multidimensional torus (or wraparound mesh) network, denoted by MT,,k, is an undirected graph of N = k” nodes, each one labeled by an n digit number in radix k arithmetic. Each node v is connected to 2n other nodes with which it differs in only one digit by j in modulo k arithmetic, j E { - 1, l), i.e., node v=v n_, . . . vi+ ]ViVi_ * . . . v. is connected to nodes U’ = v,_ ,..‘vi+,v~Ui_ I... vg, rJ:= (vi + j) mod k, j E { - 1, l), for all 0 d i < n - 1. The MT,,, network can be seen in Fig. 1. The network is edge and node symmetric with degree D = 2 n (number of edges at each node) and diameter nl k/21 ( maximum shortest distance between any pair of nodes). MT,,, belongs to the class of Cayley graphs [2,18]. For networks in this class, nodes correspond to the elements of a finite group and edges correspond to a set of generators that act on the elements of the group [21. In this context, the 2n generators that define the edges of MT,,, are denoted by g,‘, 0 < i < n - 1, j E { - 1, 1). Generator g/ connects node v = v,_, . . . vi+,vivi_, . . . u0 to node v’= v,_, . . . vi+,((ui + j)mod k)vi_,...vo, which results by adding j in modulo k arithmetic to the ith digit of v. In this case we say that edge (v, v’) is of dimension g,‘, or dim(v, v’) = g,‘. Thus,

each node of MT,,, is connected to 2n other nodes through dimensions g/, 0 < i < n -

P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (19%) 991- 1015 995

1, j E { - 1, l}. In what follows, node 00 . . . 0 of MT,,, that contains only zero digits is denoted by 0”. It can be easily observed that the network is a generalization of the popular binary hypercube. The binary hypercube contains a pair of connected nodes in each dimension, while the multidimensional torus contains a ring of k nodes in each dimension. The multidimensional torus network is also referred to as the generalized hyperrectangulur network [ 1.51.

We now present the definition of an operation on nodes of the multidimensional torus network, namely the translation operation, that will be of primary importance for the construction of the spanning subgraphs and the description of the communication algorithms. This is a well-known automorphism that is derived from the node symmetry property of the multidimensional torus network. Having a spanning subgraph rooted at node 0” of MT,,,, we can derive an isomorphic spanning subgraph, with the same properties, rooted at any other node s of MT,.,, using a translation of the subgraph rooted at node 0” with respect to S. As a consequence, it is sufficient to construct a spanning subgraph rooted at node 0” of MT,,,. The translation operation on MT,,, is analogous to the exclusive-OR operation on nodes of the binary hypercube [17,3,4].

Definition 1. The trunslution of a node u with respect to node S, denoted by T,(v), is defined to be node t = T,(u), so that ti = (vi + si> mod k, 0 < i Q n - 1. The inverse translation of a node v with respect to node S, denoted by T,-‘(v), is defined to be node t=T,-‘(v),sothatti=(vi-si)mod k,O,<i<n-l.Bytranslationofanetworkwith respect to s we mean that each node of the network is translated with respect to S. For example, for nodes v = 231 and s = 132 of MT3,4, T,(v) = 323 and TSW’(v) = 103.

Lemma 2. The translation operation is an automorphism on the MT,,, network that preserves the dimension of each edge. If edge (v, u) is of dimension g,‘, then edge (T,(v), T,(u)) is also of dimension g,‘.

The proof for this lemma is omitted due to its simplicity [lo]. We now define another operation on MT,,,, namely the rotation operation, that will

also be of primary importance for the construction of the spanning trees, and for the development of the multinode broadcasting and scattering algorithms. As emphasized in the introduction, these algorithms are designed so that messages originating at individual nodes are interleaved in such a manner that no two messages contend for the same edge at any given time. The properties of the rotation operation as explained below, will help achieve this attribute. The rotation operation on nodes of the multidimensional torus has properties similar to those of the right cyclic shift operation on nodes of the binary hypercube [ 17,3,4].

Definition 3. Consider the function r(i) = (k - i) mod k from the set (0, 1,. . . , k - l} to itself. The rotation of a node u = v,_ , . . . ui+ ,uiui_ , . . . vo, denoted by R(v), is defined to be node r(vO)v,_ ,v,_2.. . vi+ ,vivi_, . . . v,. This can be viewed as a right cyclic shift of the digits of v with the wraparound digit being mapped through function r. By rotation of a network we mean that the rotation operation is applied to each node of the network. By R’ = R 0 R’- ’ we denote i application of rotation. For example, for nodes u = 321 and u = 012 of MT3,4, R(v) = 332 and R(u) = 201.

996 P. Fragopoulou, S.G. Akl/Parallel Computing 22 (19%) 991-1015

Lemma 4. The rotation operation has the following properties: (1) Zf (v, u) is an edge of dimension g,‘. 0 < i < n - 1, j E { - 1, l}, then edge

(R(v), R(u)) is of dimension g,!‘. so that:

i’= (i- 1) mod n, J”= (

-j ifi=O,

j otherwise’

(2) The rotation operation preserves the distance of each node from node 0”.

Proof. The proof for part 1 of this lemma is omitted due to its simplicity [lo]. The second part of the lemma can be proved as follows: The distance, d(v), of a node u of IVT,,~ from node 0” equals

n-1

d( v) = c I vi I, i-0

From the definition of the rotation operation we know that R(v) = r(vo)v,_ , . . . u,. It can be easily verified that Ir(i)l = Ii! for all i in (1, 2,...,k- l] and as a conse- quence d(v) = d( R( v)). 0

Lemma 5. If (v, u) is a directed edge of dimension g{, then the 2n directed edges derived from (v, u) by consecutive applications of the rotation operation are ail of diflerent dimensions [lo].

Proof. This is derived in a straightforward manner from the first part of Lemma 4, which describes the impact of the rotation operation on the dimension of an edge. For example, for edge (01, 31) of MT2,4 the 2n = 4 edges reduced b consecutive applications of the rotation operation are (01, 31) -% (30, 33) s (03, 13) 1 (10, 11) and their corresponding dimensions are g; ’ 3 go ’ 3 gf 4 g,& Cl

To summarize, the translation and the rotation operations are automorphisms on MT, k that preserve the distance between its nodes. The translation operation preserves the dimension of each edge (Lemma 2), while the rotation operation alters it in a regular fashion (Lemmas 4 and 5). The following definition shows how the nodes of the MT,,, network can be grouped into equivalence classes under the operation of rotation.

Definition 6. An ordered group of nodes, each one derived from its preceding one, cyclically, by the application of a rotation is called a necklace.

Lemma 7. Necklaces have the following properties: (1) A necklace contains at most 2n nodes. (2) The size of a necklace always divides 2n.

Proof. We prove each property separately. (1) From the definition of rotation it can be verified that R’“(v) = v for every node u

of MT,,,. However, we say at most 2n rotations because the same node can emerge after less than 2n rotations. For example, for node v = 131 of MT3,4, R’(131) = 131 and the same node emerges after only 2 and not 2n = 6 rotations.

P. Fragopoulou. S.G. Akl/ Parallel Computing 22 (1996) 991-1015 997

(2) The proof for this property can be found in group theory. A rotation operation is an automorphism on MT,,, of order 2n. A necklace is an orbit under the action of rotation. The size of an orbit always divides the order of the automorphism [14,20]. 0

From part 2 of Lemma 4 it is straightforwardly implied that all nodes of a necklace are at the same distance from node 0”.

In what follows a fill necklace is a necklace that contains 2n distinct nodes. A nonfill necklace is a necklace that contains less than 2n nodes.

Lemma 8. A node of MT,,, belongs to a nonfill necklace ifit has a nontrivial symmetry with respect to the rotation operation. Nodes of this type consist of a substring of q digits vq_, . . . v,vO which is repeated n/q times as follows:

vq-I... v+%r(v,-l)J(vl)r(%)

X . ..vq_ ,... v,vOr(vp_,)...r(v,)r(vO)vg_I...v,vO

Proof. We deal only with the case k > 2, because MTnz is equivalent to the binary hypercube and the nodes that belong to nonfull necklaces in this network were characterized in [ 171.

If a node v of MT,,k, k > 2, belongs to a nonfull necklace that contains m nodes then R”(v) = U. From Lemma 7 we know that m divides 2n. Furthermore, it can be easily verified that m < n and that m does not divide n (otherwise r(i) = i for all i in 0, z..., k - 1) thus a contradiction). Assume that p = (n div m) and q = (n mod m). Since m divides 2n and it does not divide n, it is obvious that 2q = m. Below we analytically express R”(v) and v as follows (notice that these is an one to one correspondence between the symbols of v and R”‘(v)):

v = v,_ , . . . vpm . . . v(p-l)m+qv(p-l)m+q-l~~~v2~+q”‘UZm.”Un+q

X . ..v....v . ..vg.

Rm(v)=r(v~_,g)...r(vq)...r(vO)v~_ ,... ~~~+~...v~,...v~~+~

X .‘.v2m...vm+q...v,

(upm = r(vq) because 2q = m). Since Rm(v) = v we notice that v0 = v, = v2m = . . . =v = r(u I= r(vm+q)= r(v2m+q) = . . - = r(vCP_ ,Im+q) = r’(v,). The same can be lzd for ill vi, 0 < i < q. From this discussion we conclude that v consists of a substring of q digits which is repeated as stated in the lemma. •I

For example, node 3 1133 1 of MT6,4 which belongs to a nonfull necklace that contains 4 nodes consists of substring 31 of two digits which is repeated three times as follows: 31r(3)r(l)31 = 311331.

From the properties of the rotation operation we conclude that the nodes of MT,,k at each distance from node 0” are collections of necklaces. In table 1, the necklaces of MT3,3, and those of MT,,, at each distance d, 0 ,< d G nlk/2], from node 0” are given enclosed in parentheses.

The following definition aims to distinguish one particular node of each necklace.

998 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (1996) 991-1015

Table 1 The necklaces of MT,., and MT,.,

d=O: 0380>

d= 1: (2&l, oso, oh2,180,010,061)

d=2: (2~0,012,162,130,0~1,2d1)

(2100, Oh, 262,1~0,0t2,161)

d= 3: (22Di, li2,172,1-i 1,2fl,2il)

(2102, lh)

d= 0:

d= 1:

d= 2:

d= 3:

d=4:

d= 5:

d=6:

(Oh,

(3~Oo,010, 43, Ido, olo, 061,

(3300,0~3,1~3,1i0,041,361)

(3100, Oil, 383,IlO, 013,161)

(Zo(jo, 0~0,062)

(393.113.173, lh, 3il.331)

(3200.092.263, liO,O42,2~1>

(2300,053, 162,210, Ok, 362)

(3103,111)

(233, 113, Ii,23 1,3h, 312)

(2%. 3i3, 112,2i3, lh,3;2)

(2200. Oi2,262)

(233, 112,272,2il, 322,232)

(252)

Definition 9. The binary correspondent of a node II, denoted by b(v) is defined to be node U’ = b(u) so that

u! = 0 if vi = 0, I

i 1 otherwise’

for 0 < i < n - 1. In other words, b(u) is derived by substituting each nonzero digit in u with 1. One particular node of each necklace is now distinguished as follows:

(1) Select the nodes of the necklace that have the largest binary correspondent. (2) Choose the largest among the nodes selected in step (I), if the k digits that label

the nodes of MT,,, are ordered as follows: 0 < 1 < k - 1 < . . * < i < k - i < . . . < [k/2]. This ordering of the digits is adopted in order to reflect how each digit contributes to the distance of a node from node 0”.

The node selected in step (2) is defined to be the generator node of the necklace.

For example, for necklace (230, 023, 102, 210,021, 302) of MT3,4 nodes 230, and 210 have the largest binary correspondent among the nodes of the necklace. The generator node of the necklace is node 230 because this is the largest from nodes 230, 210, if the digits used to label the nodes of MT,,, are ordered as 0 < 1 < 3 < 2.

The BST,. and BSG,. (which are rooted at node 0” of MT,,,) will each be composed of D subtrees. The first subtree of BST,, and BSG,. will contain the generator nodes of the necklaces of MT,,,. As a consequence, the generator nodes of the necklaces were chosen so that the generator node of a necklace at distance d from node 0” of MT,,, is connected with an edge to the generator node of a necklace at distance

P. Fragopoulou, S.G. AW/ParalleI Computing 22 (1996) 991-1015 999

d - 1 of node 0”. This is necessary in order for the generator nodes of the necklaces to form a subtree rooted an node 0”. The proof that the above statement is true will be given along with the construction of the BST,. and the BSG,. spanning subgraphs.

Definition 10. The displacement of a node u, denoted by D(u), is defined to be the minimum number of rotations that have to be applied on the generator node of the necklace of u, in order to derive node U.

Definition 11. The period of a node u, denoted by P(u), is defined to be the number of nodes contained in the necklace to which it belongs.

In Table 1, where the necklaces of MT,,, and MT,,, are given, the generator node of each necklace is underlined and the displacement of each node is marked on top of its label.

Definition 12. An unfolded necklace is an ordered group of exactly 2n nodes, not necessarily distinct, each one obtained from its preceding one cyclically, by the application of a rotation.

Each necklace has a corresponding unfolded necklace. For full necklaces, the corresponding unfolded necklace is the necklace itself. For nonfull necklaces that contain P nodes, the corresponding unfolded necklace is the necklace repeated 2n/P times. This is possible since the size of a necklace is always a divisor of 2n (Lemma 7). In Table 2, the unfolded necklaces of MT3,3, and those of IVT~,~ are given. A comparison with Table 1 will help clarify the differences between a necklace and its corresponding unfolded necklace.

The property of the rotation operation that 2n directed edges each of which is obtained as a rotation of its preceding one are all of different dimensions (Lemma 5),

Table 2

The unfolded necklaces of MT,., and MT,,,

d= 0: (ooo, 000. 000. 000, 000,000) d= 1: (200,020,002, 100,010,001) d= 2: (220,022,102, 110,011,201)

(210,021,202, 120,012,101) d= 3: (222,122,112,111,211,221)

(212, 121,212, 121,212,121)

d=O:

d= 1:

d=2:

d= 3:

d=4:

d=5:

d=6:

(ooo, OCO, 000, 000,000,000) (300,030,003,100,010,001)

(330,033, 103, 1 IO,01 1,301) (310,031,303,130,013, 101)

(200,020,002,200,020,002)

(333,133,113,111,311,331)

(320,032,203, 120,012,201)

(230,023, 102,210,021,302)

(313, 133,313, 131.313.131)

(233, 133, 113,211,321,332)

(231,323, 132,213, 121,312)

(220,022,202,220,022,202)

(223, 122,212,221,322,232)

(222,222,222,222,222,222)

1000 P. Fragopoulou. S.G. AW/Parallel Computing 22 (1996) 991-1015

along with the property of edge dimension preservation of the translation operation (Lemma 2) will be used extensively in the development of the multinode broadcasting and scattering algorithms. These properties will help guarantee that messages originating at individual nodes will be interleaved in such a manner that no two messages will contend for the same edge at any given time.

We are now ready to proceed to the construction of the spanning subgraphs. We start by constructing a shortest-path, balanced spanning tree using the framework defined in this section. Subsequently, we extend the spanning tree to a shortest-path spanning subgraph.

3. Spanning tree construction

We define a shortest-path, balanced spanning tree, rooted at node 0” of MT,,,, and denoted by BST,.. The spanning tree is balanced, meaning that the imbalance among its subtrees rapidly diminishes as the number of nodes in the network increases. The framework developed in the previous section will be the basic tool for the construction of the spanning tree with the stated properties. The ith, 0 < i < 2n, subtree of BST,,. is defined to be the subtree that contains all nodes u of MT,,, with displacement D(u) = i. Furthermore, an isomorphic spanning tree rooted at any other node s of MTn,k, and denoted by BST,, can be easily derived from BST,,. using the operation of translation with respect to s. We are now ready to proceed to a formal definition of BST,..

Definition 13. A shortest-path spanning tree, balanced to within a constant factor, rooted at node 0” of MT,,,, and denoted by BST,., is defined as follows: The 0th subtree of BST,. contains all nodes of MT,,, that have displacement zero and is defined through the following parent function. For node u with displacement zero, let p be the position of its lowest order nonzero digit.

parent(u) = pl if u=O”,

U”_, . ..u.+,u&- ,“. ug otherwise,

where up = UP - 1 ifO<u,G[k/2],

(u, + 1) mod k otherwise.

Any other subtree i, 0 < i < 2n, of BST,. is defined as a rotation of subtree i - 1, and it includes only those nodes of MT,,, that have displacement i.

The BST,. spanning tree on the MT3,3 network can be seen in Fig. 2.

Lemma 14. BST,. has the following properties: (1) BST,. is a shortest-path spanning tree rooted at node 0” of MT,,,. (2) BST,. is balanced, meaning that the imbalance among the subtrees diminishes

rapidly as the number of nodes in the network increases.

P. Fragopoulou, S.G. Akl/Parallel Computing 22 (19%) 991-1015

ool

Fig. 2. The BSr,., spanning tree on the MT,,, network.

1001

Proof. We prove each property separately. (1) We start by proving that the parent(u) function of Definition 13 constructs a

shortest-path tree on those nodes of MT,,, that have displacement zero (these are the generators nodes of necklaces). For node u = u,,_ , . . . up0 . . .O of MT,,, with displace- ment zero, its parent node u’ is obtained by changing the value of digit up (least significant nonzero digit of u) to up , or u’ = u,_ , . . . u; 0. . . 0. We have to prove that u’ also has displacement zero.

(a) Firstly we have to show that u’ is one of the nodes that have the largest binary correspondent among the nodes of its necklace (Definition 9).

If u; St: 0 then the binary correspondent of u’ is equal to the binary correspondent of u thus maximum.

Assume up = 0 and that u’ is not one of the nodes in its necklace with the largest binary correspondent. In this case there is a node R’(u’), obtained from u after i rotations, that has the largest binary correspondent. In this case the position of digit up or r(u;) in R’(u’) will be greater than p, otherwise node R’(u’) would start with digit 0. However, in this case node R’(u) would have largest binary correspondent than u in its necklace thus a contradiction.

(b) It can be similarly proven that u’ is the largest among the nodes of its necklace that have the largest binary correspondent, with respect to tbe ordering of the digits given in Definition 12. Assume again that node R’(u’) has the same binary correspon- dent as u’ but larger value. Again the position of digit u; or r(u;) in R’(u’) will be greater than p, otherwise node R’(u’) would start with digit 0. Furthermore, from the ordering of the digits in Definition 9 we can verify that the values of u; and r( u;) are both smaller than both the values of up and r(u$. As a consequence, R’(u) would have largest value than u in its necklace, thus a contradiction.

Furthermore, the parent(u) function generates a shortest path from node u to node 0”. The distance, d(u), of node u = u,_ 1 . . . ui+ ,uiui_ 1 . . . u. from node 0” equals

n-l

d(u) = c [oil, where Iuil = i-0

Ii_.. I

ifm~~/~‘2J’.

1002 P. Fragopoulou, S.G. Akl/Parallel Computing 22 (19%) 991-1015

The parent of node u is u’ = v,_ , . . . up+ ,uPup_, . . . u. and its distance from node 0” is:

n-l n-l n-l

d(d)= c Iuil+lu,l= c IuiI+Iu,I--l= ~Iz+l=d(u)-1. i=O,i#p i= O,i#p i=O

Consequently, the parent of node u is closer to node 0” than the node itself, hence a shortest-path tree.

The ith subtree of MT,. is obtained as a rotation of its preceding one or from the 0th subtree by the application of i rotations, after excluding nodes that do not have displacement i. The nodes that are excluded are always nodes that belong to nonfull necklaces. In order to show that the ith subtree of BSZ’o. is connected we have to prove that nodes that belong to nonfull necklaces and have displacement zero are always leaf nodes of the 0th subtree of BST,.. According to Lemma 8, a node u with displacement zero that belongs to a nonfull necklace consists of a substring s of q digits, s = uq_ , . . . u ,q,, which is repeated n/q times as follows: u = ST(S). . . sr( s>s (assuming that r(s)=r(uq_,)... r(u,)r(v,)). Assume that up is the last nonzero digit of u or one of its final zero digits. Obviously r~; belongs to the last substring of q digits of u. If u had a child node, then this would be node u’ = ST(S). . . sr(s>u,_ , s’ so that d = vq_ , . . . VP.. . u,u,,. We will prove that node u’ does not have displacement zero, it does not belong to the 0th subtree of BSr,. , and as a consequence it cannot be a child of node v.

(a) If up = 0 and up # 0 then node U’ is not one of the nodes of its necklace that have the largest binary correspondent because node U” = r( s’)sr( s) . . . ST(S) produced by U’ after q rotations has largest binary correspondent than u’ (Definition 9).

(b) If up # 0 and up # 0 then node u’ is one of the nodes in its necklace that have the largest binary correspondent. However, it is not the largest of those nodes because node U” = s’T(s)s.. . sr( s) which is obtained by U’ after n + q rotations is larger than U’ with respect to the ordering of the digits given in Definition 9.

(2) We must prove that the imbalance among the subtrees of BST rapidly diminishes as the number of nodes in the network increases. From the definition of BST,., its ith, 0 G i < 2n, subtree contains nodes u for which D(u) = i. From the 2n nodes that belong to a full necklace, each one belongs to a different subtree. Nodes that create the imbalance among the subtrees are the ones that belong to nonfull necklaces. We now derive an upper bound for the number of nodes that belong to nonfull necklaces. These nodes consist of a substring of q digits, which is repeated n/q times as explained in Lemma 8. Also from Lemma 8 we know that n/q is odd. So for n/q odd and q prime divisor of n (all the other divisors are included in this case> an estimate for the number of nodes that belong to nonfull necklaces is:

n/3

c kq Q 2k"j3.

As a consequence each subtree contains at least k”/2n - k”j3/n nodes. From Table 3 we notice that the ratio of the number of nodes between the largest subtree of MT,,, over k”/2n rapidly converges to one as the number of nodes increases. From the construction of the BST,. spanning tree we now that each subtree contains at most one

P. Fragopoulou, S.G. Akl/ParaNel Computing 22 (19%) 991-1015 1003

Table 3

Comparison between tbe smallest and the largest subtrees of SSr,. in IW,,~ for sample values of n and k. A:

Number of nodes that belong to nonfull necklaces. B: Number of nonfull necklaces. C: Number of necklaces.

D: Size of maximum subtree of BST. E: size of minimum subtree of BST. F: Ratio of the number of nodes

that belong to the maximum subtree of BST with 1 k” - 1/2nl

n k k” A B C D E [k”-1/2n] F

2 3 9 1 1 3 2 2 2 1.00

2 4 16 4 3 6 5 3 4 1.25

2 5 25 1 1 7 6 6 6 1.00

2 6 36 4 3 11 10 8 9 1.11

2 7 49 1 1 13 12 12 12 1.00

2 8 64 4 3 18 17 15 16 1.06

2 9 81 I 1 21 20 20 20 1.00

2 10 100 4 3 27 26 24 25 1.04

2 11 121 1 1 31 30 30 30 1.00

3 3 27 3 2 6 5 4 5 1.00

3 4 64 10 5 14 13 9 11 1.18

3 5 125 5 3 23 22 20 21 1.05

3 6 216 12 6 40 39 34 36 1.08

3 7 343 7 4 60 59 56 57 1.04

3 8 512 14 7 90 89 83 86 1.03

3 9 729 9 5 125 124 120 122 1.02

3 10 1000 16 8 172 171 164 167 I .02

3 11 1331 11 6 226 225 220 222 1.01

4 3 81 1 1 11 10 10 10 1.00

4 4 256 16 6 36 35 30 32 1.09

4 5 625 1 1 79 78 78 78 1.00

4 6 12% 16 6 166 165 160 162 1.02

4 7 2401 1 1 301 300 300 300 1.00

5 3 243 3 2 26 25 24 25 1.00

5 4 1024 34 9 108 107 99 103 1.04

5 5 3125 5 3 315 314 312 313 1.00

5 6 7776 36 10 784 783 774 778 1.01

5 7 16807 7 4 1684 1683 1680 1681 1.00 6 3 729 9 3 63 62 60 61 1.02 6 4 40% 76 17 352 351 335 342 1.03 6 5 15625 25 7 1307 1306 1300 1302 1.00

6 6 46656 96 12 3902 3901 3880 3888 1.00 6 7 117649 49 13 9813 9812 9800 9804 1.00

node from each necklace excluding the necklace that contains node 0” which is the root node. The first subtree of BST,. contains all nodes with displacement zero and as a consequence, it contains exactly one node from each necklace. As a consequence, the size of the maximum subtree is one less the number of necklaces. Cl

From the definition of BST,,. and the fact that each subtree is obtained as a rotation of its preceding one with some nodes excluded, we conclude that corresponding nodes of the subtrees form necklaces (Definition 6) and corresponding directed edges of the subtrees are of different dimensions (Lemma 5). The properties of BST,. are apparent in Fig. 2.

1004 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (19%) 991-1015

Using the BST,. spanning tree we can easily derive a SST,, rooted at any other node s of MT,,,. This spanning tree is isomorphic to BST,., and has the same properties as it. To derive BST,, we simply apply the operation of translation with respect to s, on BST,.. If edge (u, u> belongs to the ith subtree of BST,., then edge (T,(u), T,(u)) belongs to the ith subtree of BST,. Since the dimension of each edge is preserved under translation (Lemma 21, these edges are of the same dimension.

4. Spanning subgraph construction

The definition of BST,, is extended to a spanning subgraph, rooted at node 0” of MT,,,, and denoted by BSG,.. This is a special type of subgraph, which is composed of 2n subtrees, rooted at the nodes adjacent to node 0”. The ith, 0 Q i < 2n, subtree of BSG,. contains nodes u of MT,,, that have displacement D(u) = i mod P(u). All subtrees of BSG,. are isomorphic, and each one is derived from its preceding one cyclically, by the application of a rotation. Furthermore, an isomorphic spanning subgraph rooted at any other node s of IVT”,~, and denoted by BSG,, can be easily derived from BSG,, using the operation of translation with respect to s. We are now ready to proceed to a formal definition of BSG,..

Definition 15. A shortest-path spanning subgraph, rooted at node 0” of MT, k, and denoted by BSG,., is defined as follows: The 0th spanning tree of BSG, contains all nodes of MT,,, that have displacement zero and is defined through the following parent function. For node u with displacement zero, let p be the position of its lowest order nonzero digit.

parent(u) = B if u = 0”, u _, n . ..u.+,upup_, . ..Q otherwise ’

where up = up - 1 if O< up< [k/2],

(u,+l)modk otherwise .

Any other subtree i, 0 < i < 2n, of BSG,. is derived as a rotation of subtree i - 1.

The BSG,. spanning subgraph on the MT,,, network can be seen in Fig. 2.

Lemma 16. BSG,. has the following properties: (1) BSG,. is a shortest-path subgraph rooted at node 0” of MT,,,. (2) Nodes with period P that belong to nonfull necklaces haue 2n/P paths to node

0” through BSG,..

Proof. We prove each property separately. (1) The 0th subtrees of BSG,. and BST,. are the same and as a consequence the

proof of this property for BSG,. is the same as the proof of the same property for BST,. (Lemma 14).

P. Fragopovlou, S.G. Akl/ Parallel Computing 22 (1996) 991-1015 1005

Fig. 3. The LUG,,. spanning subgraph on the MT,., network.

(2) A node u with displacement zero and period P that belongs to a nonfull necklace belongs to the 0th subtree of BSG,.. Since each subtree is obtained as a rotation of its preceding one, node 1) also belongs to subtrees iP, 0 Q i < 2n/P. Consequently, u has 2n/P paths to node 0” through BSG,., one path through each one of the 2n/P subtrees it belongs to. 0

From the definition of BSG,. and the fact that each subtree of BSG,. is obtained as a rotation of its preceding one we conclude that corresponding nodes of the subtrees form unfolded necklaces (Definition 12) and corresponding directed edges are of different dimensions (Lemma 5). The properties of BSG,. are apparent in Fig. 3. Using the BSG,. subgraph we can easily derive a BSG,, rooted at any other node s of mnpt. This subgraph is isomorphic to BSG,. and has the same properties as it. To derive BSG,, we simply apply the operation of translation with respect to s, on BSG,.. If edge (u, u) belongs to the ith subtree of BSG,., then edge (T,(u), T,(u)) belongs to the ith subtree of BSG,. Since the dimension of each edge is preserved under translation, these edges are of the same dimension.

The importance of the BSG, subgraph lies in several different properties it possesses. The fact that each of the 2n subtrees of BSG, contains the same number of nodes is used in the single-node and multinode scattering algorithms in order for each source node to transmit an equal number of its messages over each one of its incident edges. A node that belongs to a number of different subtrees of BSG, receives an equal part of its messages from s through the edges of each subtree. Furthermore, as mentioned in Section 2, messages originating at individual nodes in a multinode broadcasting or scattering algorithm will be interleaved in such a manner that no two messages will contend for the same edge at any time during the execution of the algorithm. Under the all-port communication model 2nk” directed edges are available on MT,,, for message transmission at each time step. The algorithm proceeds symmetrically from all nodes and as a consequence messages originating at each one of the k” nodes of MT, t are transmitted through at most 2n directed edges at each time step. Let us denote by $0”) the set of 2n directed edges on which messages originating at node 0” are transmitted at time step i of the algorithm. Since a multinode algorithm proceeds symmetrically from each node of the network, the 2n directed edges on which messages originating at node

1006 P. Fragopoulou. S.G. Akl/Parallel Computing 22 (1996) 991-1015

s are transmitted at time step i, denoted by Ei(s), is obtained from @O”) using the operation of translation with respect to S, i.e., if (v, U) E EJO”) then (T,(u), T,(u)) E E,(s). The following lemma is enough to guarantee that no conflicts arise during the execution of an algorithm. The proof of this lemma is similar to the one given in [4] for the binary hypercube network and is omitted here.

Lemma 17. Al each time step i, if the 2n directed edges in Ei(O”) are all of different dimensions, then the sets of 2n directed edges Ei(s), where s ranges over all nodes of MT,,, , are disjoint.

We conclude that in order to avoid conflicts of messages originating at individual nodes during a multinode broadcasting or scattering algorithm, it is enough to use 2n corresponding directed edges of the subtrees of BSG,..

5. Communication algorithms

5.1. Lower bounds

In a single-node broadcasting problem on MT,,,, each of the k” - 1 destination nodes receives a message of length M from the source node and a lower bound for the number of message transmissions is M( kn - 1). The lower bound for the number of time step depends on the length of the message M, that the source node wishes to broadcast. It can be easily verified that for M < k/2 a shortest path spanning tree is sufficient and a lower bound for the number of time steps is M -I- nl k/21 - 1. However, if M > k/2 we need to use the edge-disjoint paths that exist between the source node and each destination node.

Lemma 18. There are 2n edge-disjoint paths of length at most (n - lNkk/2] + k - 1 between a pair of nodes in the MT,+k torus network.

Proof. We will explain how the 2n disjoint paths are constructed between node 0” and any other node u = u,_ 1.. . ui+ ,uiui_ 1 . . . u,, of the MT, k torus network. Assume that node u has j nonzero digits which are uj_ , . . . u. (all other cases are symmetric). Furthermore, without loss of generality assume that vi < k - ui. Below we describe how the 2n disjoint paths from node 0” to node u are constructed.

(1) A group of j paths is created as follows: The lth, 0 sz 1 < j, of these paths is created by increasing the digits of node 0” to thoseofnodeuintheorder:l,l+l,..., j-l,0 ,..., l-l.

(2) Another group of j paths is created as follows: ‘Ihe lth, 0 G 1 <j, of these paths is created by changing the lth bit of node 0” to k - 1, then increasing the digits of the resulting node to those of node u in the order: 1+ 1 ,..., j-l,0 ,..., 1 - 1. Finally node u is reached by decreasing the lth digit to the value of uI.

P. Fragopoulou, S.G. AW/ Parallel Computing 22 (19%) 991-1015 1007

(3) The final group of 2(n - j) paths is created as follows: The MI, j G 1< n, pair of these paths is created by changing the Itb digit of node 0” to 1 (or k - 1 for the other path in the pair), then increasing the digits of the resulting node to those of u in the order 0, 1,. . . , j - 1. Finally, node u is reached by increasing (decreasing for the other path in the pair) the Ith digit to the value of u,.

It can be easily proven that these paths are all edge disjoint. If ui z 0 the two paths created by first changing the value of ui move in the tearings of the torus for which ui = 0 and ui = k - 1, respectively. If ui = 0, then the two paths move in the tearings of the torus for which ui = 1 and ui = k - 1, respectively. These paths are shortest in all but one dimension. The worst case is reached for a node with all except one digit equal to 1 k/2] and a digit equal to 1 or k - 1. As a consequence, the value of the largest path is atmostcn- l)lk/2j+k- 1. 0

If M > k/2, in order to achieve the minimum number of time steps for the single-node broadcasting problem, the message of length M is separated into 2n submessages, each of length M/2n, which are pipelined in the network. Each of the 2n submessages reaches each destination node through a different edge-disjoint path. As a consequence, a lower bound for the number of time steps required for this problem is [M/2n] + (n - l>ik/2] + k - 2.

In a multinode broadcasting problem on MT,,k. each node receives a total of M( k” - 1) messages, a message of length M from each one of the k” - 1 other nodes. As a consequence, a lower bound for the number of message transmissions is M(k” - 1)k”. Since each node of MT,,, has 2n incident edges, a lower bound for the number of time steps required for this problem is [M(k” - 1)/2nl.

In a single-node scattering problem on MT,+, the source node transmits a total of M(k” - 1) messages, a message of length M to each one of the k” - 1 other nodes. Since each node of MT,,, has 2n incident edges, a lower bound for the number of time steps required for this problem is 1 M(k” - 1)/2nl. A message destined to a specific node must be transmitted on a number of edges at least equal to the shortest distance between that node and the source node. Therefore, a lower bound for the number of message transmissions required is the sum of the shortest distances from all nodes to the source node, multiplied by M, since each node receives a message of length M from the source node. Let us denote by Nd the number of nodes at distance d from the source node. A lower bound for the number of message transmissions required for a single-node scattering problem is the following:

nW2J

M c dIVd=Mk” d= I

. (1)

The expression enclosed in parentheses is the average distance per nude pair of the MT,,k network. In 1241 it was proven that it equals nk/4 for k even, and n(k2 - 1)/4k for k odd. If we substitute the average distance per node pair of MT, k in expression (11, we conclude that a lower bound for the number of message transmissions required for

1008 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (1996) 991-1015

Table 4 Lower bounds on the MT,,, network

Problem Tie steps Message transmissions

Single-node broadcasting M + nl k /2] - 1 ( M 6 k/2) M(k” - 1) ~hfM/2n]+h--l~k/2j+ k-2(M> k/2)

Multinode broadcasting [M(k” - 1)/2n1 Mk”(k” - 1) Single-node scattering [M(k” - 1)/2nl [Mnk”+‘/4l(k even)

fMn(k* - l)k”- ‘/41 (k odd) Multinode scattering 1 Mk”+ ’ /81 (k even) [Mnk*“+‘/41 (k even)

[M(k* - l)k”- ‘/8] (k odd) [Mtdk2 - l)k2n- ‘/41 (k odd)

the single-node scattering problem on MT,,, is [ Mnk”+ l/41 for k even, and [Mn(k* - l)k”- l/4] for k odd.

A multinode scattering problem can be viewed as k” single-node scattering problems, one from each node of MT,,,. A lower bound for the number of message transmissions is derived from the lower bound for the number of message transmissions required ,for the single-node scattering problem, multiplied by k”. This lower bound is equal to [Mnk2”“/41 for k even, and fMn(k’- l)k2”-‘/4] for k odd. Each node has 2n incident edges and at most 2nk” message transmissions can be performed at each time step. Consequently, a lower bound for the number of time steps required for this problem is [Mk'+ l/8) for k even, and [M(k2 - l)k”-‘/8] for k odd.

Table 4 summarizes the lower bounds for all of the above problems.

5.2. Single-node broadcasting

In a single-node broadcasting, a source node s wishes to transmits the same message of length M to each other node. It was shown in [3] that the problem of developing a single-node broadcasting algorithm on a network under the communication model considered in this paper is equivalent to the problem of constructing a balanced spanning tree on this network. We use BST, to develop the single-node broadcasting algorithm from node s of the torus. The message of length M that node s wishes to transmit is simply pipelined down the BST, sparming tree [3]. The number of message transmis- sions performed by this algorithm is M(k” - 11, which is optimal. However, the number of time steps required is M + nl k/21 - 1, which is optimal only for M sz k/2, because the algorithm does not take advantage of the edge-disjoint paths that exist between s and the other nodes of the network. The algorithm is only asymptotically optimal with respect to n for M > k/2.

5.3. Multinode broadcasting

In a multinode broadcasting algorithm, each node of the network transmits a message of length M to all the other nodes. Each node s uses BSG, for the transmission of its message. We assume that each node has a copy of the first subtree of BST,. which is precomputed according to the algorithm given in Definition 15. For each node that belongs to this first subtree a pointer is stored for each one its children nodes. These

P. Fragopoulou, S.G. Akl/Parallel Computing 22 (1996) 991-1015 1009

pointers are stored in increasing order of their edge dimensions. This requires a table of approximately k”/2n entries since this subtree contains only those nodes that have displacement zero. Furthermore, we assume that each node knows the period of every node that belongs to a nonfull necklace. According to Lemma 14, only a small number of such nodes exists and the amount of storage required is not essential.

The multinode broadcasting algorithm (under the model that allows splitting and recombining of messages) proceeds simultaneously for all nodes of the network as follows:

(1) Each source node s transmits its message to all of its neighbors simultaneously. Each message header carries the identity of the source node s, along with a number to indicate the spanning tree of SSG, in which it is transmitted.

(2) When an intermediate node u in subtree T,, 0 < 1< 2n, of BSG, receives the message broadcast by node s, it stores a copy, and performs the following procedures. This message has to be forwarded to the first child of node u in SSG,. This child node can be found as follows. Node u computes node U’ = R”‘- ‘(T’(u)). If u’ is the first child node of u’ in the first subtree of KS’,., then i = T,( R’(d)) is the first child node of u in subtree T, of BSG,.

If node U’ has period P then the message of length M is split into 2n/P

submessages of size MP/2n each. Node u of the ith spanning tree of BSG,

transmits the (i div P)th submessage to its first child node in BSG,.

(3) When a leaf node of BSG, receives the message broadcast by node s, it transmits an acknowledgment with the identity of s to its parent node in BSG, from which it received this message. When an intermediate node u receives an acknowledgment from one of its children nodes in BSG,, it forwards the message it received in the past from node s to its next child in BSG, following the splitting technique described above. When an acknowledgment is received from the last child node of u in BSG,,

node u transmits an acknowledgment with the identity of s to its parent node in BSG, .

The algorithm terminates when each source node receives acknowledgments from all its neighbors. In this algorithm, the transmission of messages in each BSG, corresponds to a simultaneous depth first traversal of its spanning trees. At each time step, one edge of each subtree of BSG,. is used for message transmission and corresponding edges of all its subtrees are used simultaneously. From the properties of BSG,. (Lemma 16) a group of 2n corresponding edges of the BSG,. are rotations of each other, thus of different dimensions. As a consequence, the requirement of Lemma 8 is satisfied and perfect interleaving of messages originating at individual nodes is achieved. An example of a multinode broadcasting algorithm on the MT,,, network that illustrates the message splitting technique can be seen in Fig. 4.

Given that the communication model allows the splitting of messages the above algorithms terminates in f M(k” - 1)/2nl time steps, which is optimal. The number of message transmissions performed is M(k” - 1)k” which is also optimal.

If the splitting of messages is prohibited by the communication model, then BST, is

used. The algorithm is similar to the one described above excluding the message splitting technique. Although this algorithm achieves the minimum number of message

1010 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (1996) 991-1015

Fig. 4. An instance of the multinode broadcasting algorithm on the MT,,, network using BSG,..

transmissions, the time required for its completion is M multiplied by the number of nodes in the maximum subtree of BST. It was shown in Lemma 14, that the ratio of the number of nodes in the maximum subtree of BST over k” - l/2 n rapidly converges to one as the number of nodes in the network increases. As a consequence, the algorithm derived using BST is near optimal.

5.4. Single-node scattering

In a single-node scattering algorithm, a source node s transmits distinct messages of length M to each other node. Node s uses BSG, for the transmission of its messages. Each source node keeps a table of approximately k”/2n nodes. The table includes the nodes in the first subtree of BSG,., sorted in reverse ordering of their distance from node 0”. The nodes in the table correspond to the transmission order in the first subtree of BSG,., and each one is accompanied by a number to indicate its period P. Recall that nodes with period P that belong to nonfull necklaces have 2n/P paths to node 0” through BSG,.. The interesting property of this algorithm is that nodes that belong to nonfull necklaces with period P receive a submessage of length MP/2n, from the M messages that node 0” broadcasts, through each one of the 2n/P paths from node 0” through BSG,. .

The single-node scattering algorithm (under the communication model that allows the splitting of messages) proceeds as follows:

For each node u in the table of k”/2n entries do the following: (1) If the source node is node O”, then it transmits messages destined to nodes U,

R(v), R*(v), . . . , R*“- l(u), simultaneously. If u belong to a full necklace then all of these nodes are distinct and node 0” transmits the message destined to node R’(u), 0 < i < 2n, through its ith subtree. However, if node u has period P < 2n and belongs to a nonfull necklace, then these nodes are not distinct but they are P distinct nodes repeated 2n/P times, in other words it is the unfolded necklace of a nonfull necklace that contains P nodes. In this case each of the P messages of length M node 0” has to transmit (to nodes u, R(u), R*(u), . . . , R*“- ‘(u)> is split into 2n/P submessages, each of length MP/2n. The ith, 0 < i < 2n/P, submessage of the jth, 0 Q j < P, message is transmitted over subtree iP + j of

P. Fragopoulou, S.G. AW/Parallel Computing 22 (1996) 991-1015 1011

(2)

(3)

BSG,.. As a consequence, each of the P nodes of a nonfull necklace receives a submessage of length MP/2n, from the message of length M originating at node 0”, through each of the 2n/P paths from node 0” through BSG,.. If the source is any other node s of MT,,,, then s transmits messages destined to nodes T,(v), T,(R(u)), 7’&R2(u)),. . . ,T,(R2”-‘(u)), simultaneously, using the same technique of message splitting described above for node 0”. We have to mention that each message header includes the identity of its destination node and a number to indicates the subtree of BSG, in which it is transmitted. As soon as an intermediate node u receives a new message header, it performs the following procedures. If node u is the destination of the message it stores a copy and removes it from the network. If u is not the destination of the message, the identity of the child node to which the message will be forward has to be determined. Node u of the ith, 0 d i < 2n, subtree of BSG, identifies the first digit to the left of digit n - 1 - i in its label that is not equal to the correspond- ing digit of the destination node. The message is forwarded to the neighbor of u in which the value of this digit is closer to the value of the same numbered digit in the destination node (with respect to the ordering of the digits in Definition 9). As soon as a source node has transmitted message to nodes T,(R’(u)), 0 Q i < 2n, through its incident edges, it starts transmitting messages to nodes T,(R’(u)), 0 Q i < 2n, for the next entry u in the table.

An instance of the single-node scattering on MT, 3 for messages transmitted from node 0” to nodes 212 and 121 is shown in Fig. 5, in order to demonstrate the message splitting technique described above.

Since BSG, is a shortest-path spanning subgraph, each message follows a shortest path to its destination node and as a consequence the minimum number of message transmissions, Mnk”+ ’ /4 for k even and M&k2 - l)k”- ‘/4 for k odd, is achieved. Given that the communication model allows the splitting of messages, the number of time steps required for this algorithm to complete is optimal, 1 M(k” - 1)/2n]. In this case, all edges incident to the source node contribute equally to the transmission of its messages.

Fig. 5. An instance of the single-node scattering algorithm on the MT,,, network using BSG,..

1012 P. Fragopoulou, S.G. Akl/ Parallel Computing 22 (1996) 991-l 015

If the communication model does not allow the splitting of messages, then BST, is used. The algorithm is similar to the one described above excluding the message splitting technique. Although this algorithm achieves the minimum number of message transmissions, the time required for its completion is M multiplied by the number of nodes in the maximum subtree of BST. It was shown in Lemma 14 that the ratio of the number of nodes in the maximum subtree of BST over k” - 1/2n rapidly converges to one as the number of nodes in the network increases. As a consequence, the algorithm derived using BST is near optimal.

5.5. Multinode scattering

In a multinode scattering, or total exchange, algorithm each node transmits distinct messages of length M to each other node. The algorithm proceeds symmetrically from all nodes of the network and each node s uses BSG, for the transmission of its messages. Each node keeps a table of approximately k”/2n nodes. The nodes in the table correspond to the transmission order of the first subtree of BSG,., and each one is accompanied by a number to indicate its period P.

The multinode scattering algorithm (under the communication model that allows the splitting of messages) proceeds simultaneously from all nodes of the network as follows:

For each node u in the table of k”/2n entries do the following: (1) Source node s determines the destination of the message to be transmitted over

its ith, 0 < i < 2n, subtree as T&R’(u)). For node u with period P, each of the P messages of length M that have to be transmitted by the source node is split into 2n/P submessages of size MP/2n each. The ith, 0 < i < 2n/P, submes- sage of the jth, 0 < j < P, message is transmitted over the (iP + jhh subtree of the source. We have to mention that the identity of the destination node and a number that indicates the subtree of BSG,. in which the message is transmitted are included in the message header.

(2) As soon as an intermediate node u receives a new message header, it has to wait until it receives the entire message that follows it. If node u is the destination node of the message, it stores a copy and removes it from the network. If node u is not the destination node of the message, it has to identify the child node to which this message has to be forwarded. Node u of the ith, 0 d i < 2n, subtree of BSG,, locates the first digit to the left of digit n - 1 - i in its label that is not equal to the corresponding digit of the destination node. The message is forwarded to the neighbor of u in which the value of this digit is closer to the value of the same numbered digit in the destination node (with respect to the ordering of the digits in Definition 9).

(3) When the messages transmitted from a source node s have reached their destination nodes T,(R’(u)), 0 < i < 2n, then s can transmit messages to nodes T,( I?‘( u)), 0 < i < 2 n, for the next entry u in the table.

From the properties of BSG,., we know that the 2n paths that lead to nodes Z?‘(u), 0 Q i < 2n, through its subtrees i, 0 Q i < 2n, respectively, are rotations of each other (Lemma 16), and as a consequence, the 2n directed edges at each level of these paths are of different dimensions. Each node in a path receives the entire message from

P. Fragopoulou. S.G. Akl/ Parallel Computing 22 (19%) 991-1015 1013

its parent node before it starts transmitting it to the next node down the path. As a consequence, at each time step, 2n directed edges that are all at the same level of the paths are used. Since these edges are all of different dimensions the requirement of Lemma 17 is satisfied, and no two messages contend for the same edge during the execution of the algorithm.

Each message follows a shortest path to its destination node and the minimum number of message transmissions, Mnk”‘+ ‘/4 for k even and Mn(k2 - l)k2”- ‘/4 for k odd, is achieved. Given that the model allows the splitting of messages, the number of time steps required for this algorithm to complete is Mk”+‘/8 for k even and M( k2 - l)k”- ‘/8 for k odd, which is optimal. In this case, all edges of the network contribute equally to the transmission of the messages at each time step of the algorithm.

If the splitting of messages is prohibited by the communication model, then the BST’ spanning tree is used. The algorithm is similar to the one described above excluding the message splitting technique. Although this algorithm achieves the minimum number of message transmissions, the time required for its completion is M multiplied by the number of nodes in the maximum subtree of BST and by the average distance per node pair of the MT,,, network (defined in Subsection 5.1). Since the ratio of the number of nodes in the maximum subtree of BST oven k” - 1/2n rapidly converges to one (Lemma 61, the algorithm derived using BST is near optimal.

6. Conclusions

A general framework was developed on the multidimensional torus network, that led to the construction of a shortest-path, balanced spanning tree, and a spanning subgraph. Definitions such as the ones for the rotation operation and the grouping of the nodes into necklaces, were developed.

The application of the spanning subgraphs to the development of optimal communica- tion algorithms was demonstrated by giving algorithms for the single-node and multin- ode broadcasting, and the single-node and multinode scattering problems, under the store-and-forward, all-port communication model. These are problems in which all nodes of the network know in advance the communication pattern. ‘The method is mostly useful for communication problems that require a group or all nodes of the network to be sources of messages, such as the multinode broadcasting and scattering problems. The property that corresponding edges of the subtrees are of different dimensions, along with Lemma 17, give a sufficient condition for conflict avoidance. The spanning subgraphs can be used for the development of algorithms for a number of other communication problems, or under a variety of communication models, such as the one-port model.

We are confident that a general framework that leads to the construction of spanning subgraphs with similar properties can be potentially developed on networks that belong to a subclass of the Cayley graphs. This will offer a uniform solution to a wide range of communication problems on a wide range of networks. Future research could move towards various directions, the most important being the generalization of the developed

1014 P. Fragopouiou, S.G. Akl/Parallel Computing 22 (1996) 991-1015

framework to a class of interconnection networks that exhibit specific characteristics, and the application of this framework to the solution of other types of problems.

Acknowledgments

This research was supported by the Telecommunications Research Institute of Ontario and by the Natural Sciences and Engineering Research Council of Canada.

References

[l] S.B. Akers, D. Hare1 and B. Krishnamurthy, The star graph An attractive alternative to the hypercube, in: Proc. Internat. Conf on Parallel Processing, St. Charles, IL (1987) 393-400.

[2] S.B. Akers and B. Krishnamurdty, A group theoretic model for symmetric interconnection networks, IEEE Trans. Cotnput. 38 (4) (1989) 555-566.

[3] D.P. Bertsekas and J.N. Tsitsiklis, Parallel and Distribured Computation: Numerical Methods (Prentice- Hall, Englewood Cliffs, NJ, 1989).

[4] D.P. Bertsekas, C. Ozveren, G.D. Stamoulis, P. Tseng and J.N. Tsitsiklis, Optimal communication algorithms for hypercubes, J. Parallel Distributed Comput. 11 (4) (1991) 263-275.

[5] L.N. Bhuyan and D.P. Agrawal, Generalized hypercube and hyperbus structures for a computer network, fEEE Trans. Comput. 33 (4) (1984) 323-333.

[6] P. Fragopoulou and S.G. Akl, Optimal communication algorithms on the star interconnection network, in: Proc. IEEE Symp. on Parallel and Distributed Processing, Dallas, TX ( 1993) 702-7 11.

[7] P. Fragopoulou and S.G. Akl, Optimal communication algorithms on star graphs using spanning tree constructions, J. Parallel Distributed Comput. 24 (1) (1995) 55-71.

[8] P. Fragopoulou and S.G. Akl, Edge-disjoint spanning trees on the star network with applications to fault tolerance, IEEE Trans. Comput. (19951, to appear.

[9] P. Fragopoulou, S.G. Akl and H. Meijer, Optimal communication primitives on the generalized hypercube network, J. Parallel Distributed Comput. (19951, to appear.

[lo] P. Fragopoulou and S.G. Akl, A framework for optimal communication on the multidimensional torus network, Tech. Rep. 94-363, Dept. of Computing and Information Science, Queen’s University, Kingston, ON, Canada, 1994.

[ 1 l] P. Fragopoulou and S.G. Akl, Spanning graphs with applications to communication on a subclass of the Cayley graph based networks, Manuscript, 1995.

[12] P. Fraigniaud and E. Lazard, Methods and models of communication in usual networks, Discrete Appl. Math. 53 (l-3) (1994) 79-133.

[13] P. Fraigniaud, Complexity analysis of broadcasting in hypercubes with restricted communication capabih- ties, J. Parallel Distributed Comput. 16 (1) (1992) 15-26.

[ 141 L. Garding and T. Tambour, Algebra for Computer Science (Springer, Berlin, 1988). [15] P.T. Gaughan and S. Yalamanchili, Adaptive routing protocols for hypercuhe interconnection networks,

Computer 26 (5) (1993) 12-23. [ 161 S.L. Johnson, Communication efficient basic linear algebra computations on hypercube architectures, J.

Parallel Distributed Comput. 4 (2) (1987) 133- 172. [17] S.L. Johnson and CT. Ho, Optimum broadcasting and personalized communication in hypercubes, IEEE

Trans. Comput. 38 (9) ( 1989) 1249- 1268. 1183 S. Lakshmivarahan, J.S. Jwo and SK. Dhall, Symmetry in interconnection networks based on Cayley

graphs of permutation groups: A survey, Parallel Computing 19 (4) (1993) 361-407. [19] F.T. Leighton, Complexity Issues in VLSI: Optimal Layouts for rhe Shuffle Exchange Graph and orher

Networks (MIT-PI%, Cambridge, MA, 1983). [20] K.A. Ross and C.R.B. Wright, Discrete Mathematics (Prentice-Hall, Englewocd Cliffs, NJ, 1985).

P. Fragopoulou, S.G. Akl/Parallel Computing 22 (1996) 991-1015 1015

[Zl] Y. Saad and M.H. Schultz, Data communication in parallel architectures, Parallel Computing 11 (2)

(1989) 131-150.

[22] Y. Saad and M.H. Schultz, Data communication in hypercubes, /. Parallel Distributed Comput. 6 (1989)

11.5-135. (231 Q.F. Stout and B. Wagar, Intensive hypercube communication, J. Parullel Distributed Comput. 10 (2)

(1993) 167-181. [24] E.A. Varvarigos and D.P. Bertsekas, Communication algorithms for isotropic tasks in hypercubes and

wraparound meshes, Parallel Computing 18 (11) ( 1992) 1233- 1257. [25] E.A. Varvarigos and D.P. Bertsekas, Partial multinode broadcast and partial exchange algorithms for

d-dimensional meshes, J. Parallel Distributed Comput. 23 (2) (1994) 177- 189.