J. Parallel Distrib. Comput. 64 (2004) 16–28

The hierarchical cliques interconnection network

Stuart Campbell (a,1,2), Mohan Kumar (b,*), and Stephan Olariu (c)

(a) Fair Isaac Corporation, 901 Marquette Avenue Suite 3200 Minneapolis, MN 55402, USA
(b) Department of Computer Science and Engineering, The University of Texas at Arlington, Box 19015, Arlington, TX 76120, USA
(c) Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0162, USA

Received 7 February 2000; revised 15 July 2003

(*) Corresponding author. E-mail address: [email protected] (M. Kumar).
(1) This work was supported by a Curtin University Postgraduate Scholarship.
(2) Thanks to Dr J. Simpson for his invaluable assistance.

0743-7315/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2003.08.005

Abstract

The fully connected network possesses extremely good topological, fault-tolerant, and embedding properties. However, due to its high degree, the fully connected network has not been an attractive candidate for building parallel computers. On the other hand, tree-based networks are popular as parallel computer networks, even though they suffer from poor fault-tolerance and embedding properties. The hierarchical cliques interconnection network described in this paper incorporates positive features of the fully connected network and the tree network. In other words, the hierarchical cliques possess such desirable properties as low diameter, low degree, self routing, versatile embedding, good fault-tolerance and strong resilience. Hierarchical cliques can efficiently embed most important networks and possess a scalable, modular structure. Further, by combining hierarchical cliques with fat trees, congestion in the upper levels can be alleviated.

© 2003 Elsevier Inc. All rights reserved.

Keywords: Cliques; Trees; Interconnection networks; Network embedding; Fault-tolerance

1. Introduction

In recent years several interconnection networks have been proposed and studied for their suitability in parallel computers. In general, some of the desirable properties of multiprocessor topologies include low diameter, low degree, high bisection width, versatile embedding properties and robustness to faults. The fully connected network possesses all the desirable properties except low degree. As the degree of a fully connected network of size $N$ is $(N-1)$, it is very expensive to construct multiprocessor networks with this topology. Less costly topologies, such as meshes and $k$-ary $n$-cubes, have been the most popular for building commercial parallel computers.

Mesh topologies map well onto a plane, and are well suited to VLSI implementation, but suffer from high diameter and correspondingly high latency. Methods such as wormhole routing [4] have been used in mesh architectures to reduce latency. These methods are very successful when contention is low, but are less effective when network congestion rises [1]. Under these circumstances meshes are only effective if the algorithm exhibits communication locality. Hence they are suitable for implementation of applications such as computational fluid dynamics and low-level image processing. Other topologies which allow better use to be made of communication locality offer greater performance under these circumstances. Meshes also suffer from poor embedding properties and fault-tolerance.

The binary $n$-cube topology possesses such desirable properties as symmetry, regularity, robustness, logarithmic diameter and self routing schemes, among many others. On the other hand, the hypercube topology has high degree for large networks and it does not have a modular structure.

The tree network has low degree and is suitable for VLSI implementation. However, the tree network is notorious for message congestion at the root. Several interconnection networks that combine the good features of the hypercube, the mesh, the tree, and the fully connected network have been proposed [5,8–10,12,16].

The fat tree has been popular; its topology was adopted in the Connection Machine-5 (CM-5), a


commercial machine. Further, Leiserson [9] and later Ohring et al. [13] have demonstrated the versatility of the fat tree network. Unlike in the tree network, the intermediate nodes of the fat tree network have multiple parent nodes and hence there are multiple communication paths among all pairs of pendant nodes.

Fig. 1. Part of a HiC with $k = 4$ and $h = 3$. Legend: one symbol represents a single connection between two nodes; the other represents connections forming a clique among the enclosed nodes. Node labels shown: 0; 1 to 4; 11 to 14, 21 to 24, 31 to 34, 41 to 44; 141 to 144, 241 to 244, 341 to 344, 441 to 444.

In this paper we describe the hierarchical cliques (HiC) network, which combines positive features of the fully connected network, the extended hypercube network, and the tree network. The HiC is a $k$-ary tree, modified to enhance local connectivity in a hierarchical, modular fashion. The HiC structure can be used to define a number of different useful multiprocessor architectures. We define the leaf nodes, on level 0, as processor elements (PEs). All other nodes are switching elements (SEs). The cost effectiveness of the HiC has already been analysed in [2], where it was referred to as the Reliable Hierarchical Cliques (RHiC). At the risk of causing some confusion with a similar, but inferior topology referred to in [2] as HiC, we have changed the name to prevent the misapprehension that reliability is the main feature of this topology. Here we analyse some of its topological features relevant to performance and fault-tolerance. We demonstrate the simplicity of various routing algorithms using the proposed addressing scheme and finally define some embeddings of popular topologies onto the HiC to demonstrate its versatility.

The paper is organised as follows. Section 2 describes the structure and addressing scheme of the HiC. Some topological parameters of the HiC are described in Section 3. Section 4 demonstrates the fault-tolerant properties of the HiC. Various message routing algorithms for the HiC are described in Section 5. Embeddings of some important topologies onto the HiC are discussed in Section 6. Section 7 concludes the paper.

2. Structure of the hierarchical cliques

HiC$(k,h)$ is a $k$-ary tree of height $h$, modified so that groups of nodes on the same level form cliques. Members of a clique are referred to as neighbours. The root node is at level $h$ and has address 0. The $k$ children of the root node form a clique at level $h-1$. They have addresses consisting of a single unique digit in the range 1 to $k$. In general, let $m$ be a node at level $l$ of HiC$(k,h)$, where $0 \le l < h-1$. Then $m$ has address $M$ consisting of a sequence of digits $\langle M_l, \ldots, M_{h-1} \rangle$, where each digit is in the range 1 to $k$. Consider a second node $n$ in the same HiC as $m$. If $n$'s address $N$ is a proper suffix of $M$ then $n$ is an ancestor of $m$, which is a descendant of $n$. If $M = \langle M_l, N \rangle$ then $n$ is the parent of $m$ and $m$ is a child of $n$. If a sequence $P$ exists such that $N = \langle P, N_{h-1} \rangle$ and $M = \langle P, M_{h-1} \rangle$ then $m$ and $n$ are neighbours.

We refer the reader to Fig. 1, illustrating the structure and addressing scheme of an HiC with $k = 4$ and $h = 3$. (Only one-quarter of the level 0 nodes are shown.) The root node has address 0 and is at level 3. The root node has four children at level 2, with addresses 1, 2, 3 and 4. The nodes at level 2 form a clique; in Fig. 1 nodes of a clique are shown enclosed in a dashed oval. Each level 2 node has four children at level 1. The address of a node at level 1 consists of a digit between 1 and 4, which distinguishes the node from its siblings, followed by the address of its parent node. Thus, nodes 12, 22, 32 and 42 are all children of node 2. Nodes at level 1 are neighbours if their parents are neighbours and the first digit of their address is the same; for example, nodes 21, 22, 23 and 24 form a clique. The address of a node at level 0 likewise consists of a digit between 1 and 4, which distinguishes the node from its siblings, followed by the address of its parent node. Thus nodes 141, 241, 341 and 441 are all children of node 41. Nodes at level 0 are neighbours if their parents are neighbours and the first digit of their address is the same; for example, nodes 241, 242, 243 and 244 form a clique.
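The addressing rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an address is stored as a tuple of digits $\langle M_l, \ldots, M_{h-1} \rangle$, the root (address 0 in the text) is represented by the empty tuple, and the function names `parent`, `children`, `neighbours`, `is_ancestor` and `level` are hypothetical, not taken from the paper.

```python
def parent(addr):
    """Parent address: drop the leading (lowest-level) digit."""
    return addr[1:]


def children(addr, k):
    """The k children: one digit in 1..k placed in front of the parent's address."""
    return [(d,) + addr for d in range(1, k + 1)]


def neighbours(addr, k):
    """Other members of the clique: same digits everywhere except the last one."""
    if not addr:  # assumption: the root, stored as (), belongs to no clique
        return []
    return [addr[:-1] + (x,) for x in range(1, k + 1) if x != addr[-1]]


def is_ancestor(n, m):
    """True if n's address is a proper suffix of m's address."""
    return len(n) < len(m) and m[len(m) - len(n):] == n


def level(addr, h):
    """Level of a node in HiC(k, h); leaves have h digits and sit at level 0."""
    return h - len(addr)


if __name__ == "__main__":
    # Children of node 41 in Fig. 1: nodes 141, 241, 341 and 441.
    print(children((4, 1), 4))
    # Clique of node 241: nodes 242, 243 and 244.
    print(neighbours((2, 4, 1), 4))
```

Storing the address as a tuple keeps the suffix test for ancestry and the "all digits but the last" test for neighbours as simple slice operations.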

3. Parameters of the HiC

Assume that $k \neq 1$; then the nodes of HiC$(k,h)$ occur at $h + 1$ levels, with the leaves at level 0 and the root at level $h$. The total number of nodes in HiC$(k,h)$ is $N = \frac{k^{h+1}-1}{k-1}$. Recall that we use only leaf nodes as processors (PEs), with all other nodes acting as switches (SEs). We wish to determine the number of SEs ($N_S$), the number of PEs ($N_P$) and the number of communication links ($L$). An easy inductive argument shows that at level $l$ ($0 \le l \le h$) there are $N_l = k^{h-l}$ nodes. Therefore, the number of leaf nodes is $N_P = k^h$. The number of switching nodes is $N_S = \sum_{l=1}^{h} k^{h-l} = \frac{k^h-1}{k-1}$. The number of links is $L = L_T + L_C$, where $L_T$ represents the number of links in a $k$-ary tree of height $h$ and $L_C$ represents the number of links connecting nodes into $k$-cliques. The total number of links within a $k$-ary tree of height $h$ is given by $L_T = \sum_{l=1}^{h} k^l$. The number of links within a $k$-clique is given by $k(k-1)/2$, and the number of $k$-cliques within HiC$(k,h)$ is given by $\sum_{l=0}^{h-1} k^{h-1-l}$, so $L_C = \frac{k-1}{2} \times \sum_{l=1}^{h} k^l$. Therefore

\[ L = \frac{k(k+1)(k^h-1)}{2(k-1)}. \]
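As a sanity check, the closed forms for $N$, $N_P$, $N_S$ and $L$ can be compared with counts taken from an explicitly constructed graph. The sketch below is an assumption-laden illustration: `build_hic`, `counted` and `closed_form` are hypothetical helper names, and nodes are represented by address tuples with the root as the empty tuple.

```python
from itertools import product


def build_hic(k, h):
    """Return {address: set of adjacent addresses} for HiC(k, h)."""
    digits = range(1, k + 1)
    adj = {a: set() for n in range(h + 1) for a in product(digits, repeat=n)}
    for a in adj:
        if a:                                  # every node except the root ()
            adj[a].add(a[1:])                  # tree link to the parent
            adj[a[1:]].add(a)
            for x in digits:                   # clique links: only last digit differs
                if x != a[-1]:
                    adj[a].add(a[:-1] + (x,))
    return adj


def counted(k, h):
    """(N, N_P, N_S, L) obtained by counting nodes and links in the graph."""
    adj = build_hic(k, h)
    n_total = len(adj)
    n_pe = sum(1 for a in adj if len(a) == h)
    links = sum(len(v) for v in adj.values()) // 2
    return n_total, n_pe, n_total - n_pe, links


def closed_form(k, h):
    """(N, N_P, N_S, L) from the formulas in the text (k != 1)."""
    n_total = (k ** (h + 1) - 1) // (k - 1)
    n_pe = k ** h
    n_se = (k ** h - 1) // (k - 1)
    links = k * (k + 1) * (k ** h - 1) // (2 * (k - 1))
    return n_total, n_pe, n_se, links


if __name__ == "__main__":
    print(counted(4, 3), closed_form(4, 3))
```

For the network of Fig. 1 ($k = 4$, $h = 3$) both functions give 85 nodes, 64 PEs, 21 SEs and 210 links.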

One of the most important properties of a multiprocessor interconnection network is the diameter. We define and prove two related lemmas before deriving the expression for the diameter of an HiC$(k,h)$.

Lemma 1. In any HiC$(k,h)$, if leaf node $m$ has address $M = \langle M_0, \ldots, M_{h-1} \rangle$ and leaf node $n$ has address $N = \langle N_0, \ldots, N_{h-1} \rangle$ such that $M_x \neq N_x$ for $0 \le x \le h-1$, then $d(m,n) = 2h-1$.

Proof. Since $M_x \neq N_x$ for $0 \le x \le h-1$, no prefix $P$ can exist such that $M = \langle P, M_{h-1} \rangle$ and $N = \langle P, N_{h-1} \rangle$. Therefore $m$ and $n$ cannot be neighbours, nor can their ancestors at level $l < h-1$ be neighbours. Now consider the ancestors of $m$ and $n$ at level $l$. Their addresses will be $\langle M_l, \ldots, M_{h-1} \rangle$ and $\langle N_l, \ldots, N_{h-1} \rangle$, respectively. Then, since $M_x \neq N_x$ for $0 \le x \le h-1$, $m$ and $n$ can have no common ancestor nearer than the root. Therefore, the shortest path between $m$ and $n$ passes through the link between their respective ancestors at level $h-1$ and has a length of $2h-1$. $\square$

Lemma 2. In HiC$(k,h)$ there exist at least two nodes, $m$ with address $M = \langle M_0, \ldots, M_{h-1} \rangle$ and $n$ with address $N = \langle N_0, \ldots, N_{h-1} \rangle$, such that $M_x \neq N_x$ for $0 \le x \le h-1$.

Proof. Nodes at level 0 have an address made up of $h$ digits in the range 1 to $k$. The number of unique addresses which can be generated with a sequence of $h$ digits, where each digit is in the range 1 to $k$, is $k^h$. The number of nodes at level 0 is $k^h$, and each node has a unique address. $\square$

Theorem 3. A hierarchical clique HiC$(k,h)$ with $k > 1$ and $h > 0$ has diameter $D = 2h-1$.

Proof. An HiC$(k,h)$ is a $k$-ary tree of height $h$, modified with extra links. A complete tree of height $h$ has diameter $D = 2h$. Since the nodes at level $h-1$ form a clique, the distance from any node at level $h-1$ to any other node at level $h-1$ is $d = 1$. Therefore, the maximum possible distance between two nodes in an HiC$(k,h)$ is $2h-1$. From Lemmas 1 and 2 there exists at least one pair of nodes $m, n$ with distance $d(m,n) = 2h-1$. $\square$
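Theorem 3 and Lemma 1 can be checked on small instances with a breadth-first search over an explicitly built graph. The following is a minimal sketch, assuming the tuple-based address representation (root as the empty tuple); `build_hic`, `distances_from` and `diameter` are hypothetical names introduced for illustration.

```python
from collections import deque
from itertools import product


def build_hic(k, h):
    """Return {address: set of adjacent addresses} for HiC(k, h)."""
    digits = range(1, k + 1)
    adj = {a: set() for n in range(h + 1) for a in product(digits, repeat=n)}
    for a in adj:
        if a:                                  # every node except the root ()
            adj[a].add(a[1:])                  # tree link to the parent
            adj[a[1:]].add(a)
            for x in digits:                   # clique links
                if x != a[-1]:
                    adj[a].add(a[:-1] + (x,))
    return adj


def distances_from(adj, src):
    """Hop counts from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(k, h):
    """Largest shortest-path length over all pairs of nodes."""
    adj = build_hic(k, h)
    return max(max(distances_from(adj, v).values()) for v in adj)


if __name__ == "__main__":
    for k, h in [(2, 2), (3, 3), (4, 3)]:
        print(k, h, diameter(k, h), 2 * h - 1)
```

The two leaves with addresses $\langle 1, \ldots, 1 \rangle$ and $\langle 2, \ldots, 2 \rangle$ differ in every digit, so they form one pair of the kind described in Lemma 1.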

Another important parameter of a multiprocessor interconnection network is the average distance between processors. The average inter-PE distance is often used in dynamic network topologies rather than the average inter-node distance, as it is a more useful indicator of performance. However, the symbol $\bar{d}$ is used to represent the average inter-PE distance, as it is frequently compared with the average inter-node distance of static networks.

Theorem 4. A hierarchical clique HiC$(k,h)$ with $k > 1$ and $h > 0$ has average inter-PE distance

\[ \bar{d} = \frac{\frac{2k}{k-1}\left(1 - k^{h}\right) + (2h+1)k^{h} - k^{h-1}}{k^{h} - 1}. \]

Proof. Observe that the HiC topology is symmetric from the point of view of the leaves. Without loss of generality then, $\bar{d}$ can be determined by considering one leaf node, $m$. The distance from $m$ to another leaf node $n$ is $d(m,n)$. We wish to determine $\sum d(m,n)$ for all $n$ at level 0.

* The sum of distances to $m$'s $(k-1)$ neighbours is $(k-1)$.
* At level $l$ ($0 < l < h$), $m$ has an ancestor $u$, which is the least common ancestor of $m$ and $(k-1)k^{l-1}$ leaf nodes. The distance from any of these leaf nodes to $m$ is $2l$, so the sum of the distances equals $2l(k-1)k^{l-1}$. Considering all leaf nodes which have a least common ancestor, other than the root, with $m$, the sum of distances is $\sum_{l=1}^{h-1} 2l(k-1)k^{l-1}$.
* Each of $u$'s $(k-1)$ neighbours at level $l$ is the least common ancestor of one of $m$'s neighbours and $(k-1)k^{l-1}$ leaf nodes. The distance from any of these leaf nodes to $m$ is $2l+1$, so the sum of the distances equals $(k-1)^2 k^{l-1}(2l+1)$. Considering all leaf nodes which have a least common ancestor, other than the root, with one of $m$'s neighbours, the sum of distances is $\sum_{l=1}^{h-1} (k-1)^2 k^{l-1}(2l+1)$.

These three cases cover all possible leaf nodes. Their partial results can be combined to give the expression for the total sum of distances.

\[
\begin{aligned}
\sum d(m,n) &= (k-1) + (k-1)\sum_{l=1}^{h-1} 2l\,k^{l-1} + (k-1)^2 \sum_{l=1}^{h-1} k^{l-1}(2l+1) && (1)\\
&= (k-1)\left(k^{h-1} + 2\sum_{l=1}^{h-1} l\,k^{l}\right) && (2)\\
&= (k-1)k^{h-1} + \frac{2\left((h-1)k^{h+1} - hk^{h} + k\right)}{k-1} && (3)\\
&= -\frac{2}{k-1}k^{h+1} + (2h+1)k^{h} - k^{h-1} + \frac{2}{k-1}k. && (4)
\end{aligned}
\]

The total number of distances is $N_P - 1 = k^h - 1$, therefore

\[ \bar{d} = \frac{-\frac{2}{k-1}k^{h+1} + (2h+1)k^{h} - k^{h-1} + \frac{2}{k-1}k}{k^{h} - 1}. \quad \square \]
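The algebra from the three-case sum (1) to the closed form of Theorem 4 can be checked with exact rational arithmetic. The sketch below assumes nothing beyond the formulas in the text; the function names `sum_by_cases` and `average_distance` are hypothetical.

```python
from fractions import Fraction


def sum_by_cases(k, h):
    """Total distance from one leaf to all other leaves, summed case by case."""
    to_neighbours = k - 1
    # Leaves whose least common ancestor with m lies at level l (distance 2l).
    same_subtree = sum(2 * l * (k - 1) * k ** (l - 1) for l in range(1, h))
    # Leaves reached through one clique hop (distance 2l + 1).
    via_clique = sum((2 * l + 1) * (k - 1) ** 2 * k ** (l - 1) for l in range(1, h))
    return to_neighbours + same_subtree + via_clique


def average_distance(k, h):
    """Closed form of Theorem 4 as an exact fraction (k > 1, h > 0)."""
    numerator = Fraction(2 * k, k - 1) * (1 - k ** h) + (2 * h + 1) * k ** h - k ** (h - 1)
    return numerator / (k ** h - 1)


if __name__ == "__main__":
    for k, h in [(2, 2), (3, 4), (4, 3), (5, 5)]:
        by_cases = Fraction(sum_by_cases(k, h), k ** h - 1)
        print(k, h, by_cases, average_distance(k, h))
```

Using `Fraction` rather than floating point avoids rounding noise, so the two columns can be compared for exact equality.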

Values of average inter-PE distance for various configurations of HiC$(k,h)$ are plotted in Fig. 2. The plots clearly show the reduced average distance which is obtained in a system with a given number of PEs by increasing the value of $k$. This improvement is gained at the cost of extra links and extra switch complexity, which naturally increases the cost of any system. To justify the extra cost, algorithms must perform better with the extra PE connectivity provided by the extra links.

Fig. 2. Average distance between PEs in HiC. (Average distance plotted against number of PEs, with curves for $k = 3$, $k = 4$ and $k = 5$.)

4. Fault-tolerance

In this section we study the fault-tolerance of the HiC interconnection network. The connectivity, fault diameter, two-terminal reliability and average two-terminal reliability of the HiC topology are determined. The results are compared with those of the binary hypercube. The binary hypercube is a fault-tolerant topology and its characteristics have been widely studied and reported [6,14,15]. This makes it an ideal topology to use as a benchmark for fault-tolerance. In general, any system with fault-tolerance greater than or near to that of the binary hypercube can be considered to have good fault-tolerance.

4.1. Connectivity

We first prove some theorems regarding basic graph theoretic properties of HiCs.

Theorem 5. The degree $\delta(\mathrm{HiC}(k,h)) = 2k$.

Proof. $\delta(G) = \max(\delta_m) : \forall m \in V(G)$. In HiC$(k,h)$, degree $\delta_m = k$ when $m$ is the root node or any of the leaf nodes and $\delta_m = 2k$ when $m$ is any other node. $\square$

Theorem 6. The connectivity $\kappa(\mathrm{HiC}(k,h)) = k$.

Proof. In HiC$(k,h)$, $k$ node-disjoint paths exist from any leaf $m$ to any other node via $m$'s parent and $k-1$ neighbours. The root node has $k$ node-disjoint paths to any other node via its $k$ children. Any node $n$ at level $l$, where $0 < l < h$, has $k$ node-disjoint paths to descendants, or descendants of $n$'s neighbours, via $n$'s $k-1$ neighbours and the appropriate one of $n$'s children. Node $n$ has $2k$ node-disjoint paths to its neighbours via its $k$ children, $k-1$ neighbours and single parent. Node $n$ has $k$ node-disjoint paths to all other nodes via $n$'s parent and $k-1$ neighbours. In HiC$(k,h)$, therefore, a minimum of $k$ node-disjoint paths exist between any two nodes. The HiC$(k,h)$ is therefore $(k-1)$ node fault-tolerant. $\square$
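For small networks, Theorem 6 can be checked exhaustively: no set of $k-1$ faulty nodes should disconnect the surviving nodes, while some set of $k$ faulty nodes should. The sketch below is a brute-force illustration only, practical for tiny instances; `build_hic`, `is_connected` and `tolerates` are hypothetical names, and nodes are address tuples with the root as the empty tuple.

```python
from itertools import combinations, product


def build_hic(k, h):
    """Return {address: set of adjacent addresses} for HiC(k, h)."""
    digits = range(1, k + 1)
    adj = {a: set() for n in range(h + 1) for a in product(digits, repeat=n)}
    for a in adj:
        if a:                                  # every node except the root ()
            adj[a].add(a[1:])                  # tree link to the parent
            adj[a[1:]].add(a)
            for x in digits:                   # clique links
                if x != a[-1]:
                    adj[a].add(a[:-1] + (x,))
    return adj


def is_connected(adj, dead):
    """True if the nodes outside `dead` form a single connected component."""
    alive = [v for v in adj if v not in dead]
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in dead and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def tolerates(adj, faults):
    """True if every fault set of the given size leaves the network connected."""
    return all(is_connected(adj, set(c)) for c in combinations(adj, faults))


if __name__ == "__main__":
    for k, h in [(2, 2), (3, 2), (2, 3)]:
        adj = build_hic(k, h)
        print(k, h, tolerates(adj, k - 1), tolerates(adj, k))
```

A disconnecting set of size $k$ always exists because removing a leaf's parent together with its $k-1$ neighbours isolates that leaf.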

Theorem 7. The link-connectivity $\lambda(\mathrm{HiC}(k,h)) = k$.

Proof. In HiC$(k,h)$, $k$ link-disjoint paths exist from any leaf $m$ to any other node via $m$'s parent and $k-1$ neighbours. The root node has $k$ link-disjoint paths to any other node via its $k$ children. Any node $n$ at level $l$, where $0 < l < h$, has $k$ link-disjoint paths to descendants, or descendants of $n$'s neighbours, via $n$'s $k-1$ neighbours and the appropriate one of $n$'s children. Node $n$ has $2k$ link-disjoint paths to its neighbours via its $k$ children, $k-1$ neighbours and single parent. Node $n$ has $k$ link-disjoint paths to all other nodes via $n$'s parent and $k-1$ neighbours. In HiC$(k,h)$, therefore, a minimum of $k$ link-disjoint paths exist between any two nodes. The HiC$(k,h)$ is therefore $(k-1)$ link fault-tolerant. $\square$

A more accurate reflection of the practical fault-tolerance of the network is given by determining the probability of any fault set of $\kappa$ nodes or $\lambda$ links disconnecting the network [5].

Theorem 8. The probability $p$ of an HiC$(k,h)$ with $N$ nodes being disconnected by a $k$-node fault set is

\[ p = \frac{k^{h}(2^{k}-2) + k^{h-1} - k(2^{k}-2) - 1}{k-1} \times \frac{(N-k)!\,k!}{N!}. \]

Proof. The total number $F$ of $k$-node fault sets possible is $F = \binom{N}{k}$, which can be represented as $F = \frac{N!}{(N-k)!\,k!}$. Let $T_l$ be the number of $k$-node fault sets which can occur at level $l$ which cause disconnection of the network. Let $T = \sum_{l=0}^{h} T_l$ be the total number of $k$-node fault sets which cause disconnection of the network. At level $l = h$ there is only the root node, therefore $T_h = 0$. At level $l = h-1$ disconnection will occur only if all $k$ nodes in a clique are faulty. There is only one clique at this level, therefore $T_{h-1} = 1$. At level $l$, where $0 < l < h-1$, disconnection of the network will occur if $n$ ($0 < n \le k$) nodes in a clique are faulty, and the parents of the $(k-n)$ non-faulty nodes in the clique are faulty. The number of ways such a failure can occur at a particular clique is $2^k - 1$. The total number of cliques for $0 < l < h-1$ is $\sum_{l=1}^{h-2} k^{h-l-1} = \frac{k^{h-1}-k}{k-1}$ for $k \neq 1$. Therefore $\sum_{l=1}^{h-2} T_l = \frac{(2^k-1)(k^{h-1}-k)}{k-1}$. At level $l = 0$, disconnection of the network will occur if $n$ ($0 < n < k$) nodes in a clique are faulty, and the parents of the $(k-n)$ non-faulty nodes in the clique are faulty. The number of ways such a failure can occur at a particular clique is $2^k - 2$. The total number of cliques for $l = 0$ is $k^{h-1}$. Therefore $T_0 = k^{h-1}(2^k - 2)$. Summing these terms we obtain

\[
\begin{aligned}
T &= T_h + T_{h-1} + \sum_{l=1}^{h-2} T_l + T_0 && (5)\\
&= 1 + \frac{(2^{k}-1)(k^{h-1}-k)}{k-1} + k^{h-1}(2^{k}-2) && (6)\\
&= \frac{(k^{h-1}-1)(2^{k}k - 2k + 1)}{k-1}. && (7)
\end{aligned}
\]

The probability of any given $k$-node fault set causing disconnection is given by $T/F$. $\square$
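The count $T$ in Eq. (7) and the probability $T/F$ can be compared with a direct enumeration on a very small network. The sketch below makes two assumptions that are not in the original text: a fault set counts as disconnecting when the surviving nodes fall into more than one component, and the helper names (`build_hic`, `is_connected`, `count_disconnecting`, `t_formula`, `p_disconnect`) are hypothetical. Enumeration grows as $\binom{N}{k}$, so it is only meant for tiny instances such as HiC$(2,2)$.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def build_hic(k, h):
    """Return {address: set of adjacent addresses} for HiC(k, h)."""
    digits = range(1, k + 1)
    adj = {a: set() for n in range(h + 1) for a in product(digits, repeat=n)}
    for a in adj:
        if a:                                  # every node except the root ()
            adj[a].add(a[1:])                  # tree link to the parent
            adj[a[1:]].add(a)
            for x in digits:                   # clique links
                if x != a[-1]:
                    adj[a].add(a[:-1] + (x,))
    return adj


def is_connected(adj, dead):
    """True if the nodes outside `dead` form a single connected component."""
    alive = [v for v in adj if v not in dead]
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in dead and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(alive)


def count_disconnecting(k, h):
    """Brute-force number of k-node fault sets that disconnect HiC(k, h)."""
    adj = build_hic(k, h)
    return sum(1 for c in combinations(adj, k) if not is_connected(adj, set(c)))


def t_formula(k, h):
    """T from Eq. (7)."""
    return (k ** (h - 1) - 1) * (2 ** k * k - 2 * k + 1) // (k - 1)


def p_disconnect(k, h):
    """p = T / F with F = C(N, k), as an exact fraction."""
    n_nodes = (k ** (h + 1) - 1) // (k - 1)
    return Fraction(t_formula(k, h), comb(n_nodes, k))


if __name__ == "__main__":
    print(count_disconnecting(2, 2), t_formula(2, 2), p_disconnect(2, 2))
```

For HiC$(2,2)$ the five disconnecting pairs are the level-1 clique itself plus, for each of the four leaves, the leaf's parent together with its single neighbour.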

4.2. Fault diameter

From Theorem 6 we have $\kappa(\mathrm{HiC}(k,h)) = k$. The fault diameter $f(\mathrm{HiC}(k,h))$ is the largest diameter of the network in the presence of a fault set of $k-1$ nodes. Determining $f(\mathrm{HiC}(k,h))$ involves computing $\max d(m,n)$ for each pair of PEs in HiC$(k,h)$ given any $k-1$ node fault set.

Let $S(G)$ be the set of all pairs of nodes in $V(G)$. As we are concerned with the ability of a network to provide communication between PEs, let $S(\mathrm{HiC}(k,h))$ be the set of all pairs of leaf nodes in $V(\mathrm{HiC}(k,h))$. The pairs of leaf nodes within $S(\mathrm{HiC}(k,h))$ can be classified according to the distance $d(m,n)$. We divide $S(\mathrm{HiC}(k,h))$ into three classes and prove a lemma on each.

Lemma 9. If $d(m,n) = 1$ there are $k$ node and link disjoint paths between $m$ and $n$ of length at most $d(m,n) + 2$.

Proof. If $d(m,n) = 1$ then $m$ and $n$ are in the same clique. Node $m$ has address $M = \langle M_0, \ldots, M_{h-1} \rangle$ and node $n$ has address $N = \langle N_0, \ldots, N_{h-1} \rangle$. Let $u$ be a leaf node which is a neighbour of $m$ and of $n$.

(1) There is one path of length 1: $m \to n$.
(2) There are $k-2$ paths of length 2: $m \to$ one of $k-2$ possible nodes $u \to n$.
(3) There is one path of length 3: $m \to m$'s parent $\to n$'s parent $\to n$. $\square$

Lemma 10. If $d(m,n) > 1$ and $d(m,n)$ is odd there are $k$ node and link disjoint paths between $m$ and $n$ of length at most $d(m,n) + 1$.

Proof. If $d(m,n) > 1$ and $d(m,n)$ is odd then $m$ and $n$ are not in the same clique and do not share a common ancestor other than the root. Node $m$ has address $M = \langle M_0, \ldots, M_{h-1} \rangle$ and node $n$ has address $N = \langle N_0, \ldots, N_{h-1} \rangle$. Let node $u$ with address $U = \langle M_0, \ldots, M_{h-2}, N_{h-1} \rangle$ and node $v$ with address $V = \langle N_0, \ldots, N_{h-2}, M_{h-1} \rangle$ be two other leaf nodes. Then $u$ is a neighbour of $m$ and the least common ancestor of $u$ and $n$, denoted by $\mathrm{lca}(u,n)$, exists at level $l$, where $0 < l < h$. Also, $v$ is a neighbour of $n$ and $\mathrm{lca}(v,m)$ exists at level $l$, where $0 < l < h$. Let $u'$ be any leaf node other than $u$ which is a neighbour of $m$. Let $v_{u'}$ be that leaf node which is a neighbour of $n$ such that $\mathrm{lca}(v_{u'}, u')$ exists at level $l$.

(1) There is one path of length $d(m,n)$: $m \to \mathrm{lca}(m,v) \to v \to n$.
(2) There is one path of length $d(m,n)$: $m \to u \to \mathrm{lca}(u,n) \to n$.
(3) There are $k-2$ paths of length $d(m,n) + 1$, each including: $m \to$ one of $k-2$ possible nodes $u' \to v_{u'} \to n$. $\square$

Lemma 11. If $d(m,n)$ is even there are $k$ node and link disjoint paths between $m$ and $n$ of length at most $d(m,n) + 2$.

Proof. If $d(m,n)$ is even then $m$ and $n$ are not in the same clique but share a common ancestor other than the root, that is, $\mathrm{lca}(m,n)$ exists at level $l$, where $1 \le l < h$. Let $u$ be any leaf node which is a neighbour of $m$. Let $v_u$ be that leaf node which is a neighbour of $n$ such that $\mathrm{lca}(v_u, u)$ exists at level $l$.

(1) There is one path of length $d(m,n)$: $m \to \mathrm{lca}(m,n) \to n$.
(2) There are $k-1$ paths of length $d(m,n) + 2$, each including: $m \to$ one of $k-1$ possible nodes $u \to v_u \to n$. $\square$

Lemma 12. In an HiC$(k,h)$ with $h \ge 2$, any two leaf nodes have at least $k$ node disjoint paths between them of length $2h$ or less.

Proof. From Lemma 9, if $d(m,n) = 1$ there are $k$ node disjoint paths of maximum length 3 between $m$ and $n$. For $h \ge 2$, $3 \le 2h-1$.

From Theorem 3, $D(\mathrm{HiC}(k,h)) = 2h-1$, which implies that $D(\mathrm{HiC})$ is always odd. Therefore $\max(d(m,n)) = D(\mathrm{HiC})$ if $d(m,n)$ is odd. From Lemma 10, if $d(m,n)$ is odd, there are $k$ node disjoint paths of maximum length $d(m,n) + 1$ between $m$ and $n$. Combining these two results, we see that nodes $m$ and $n$ with $d(m,n)$ odd have $k$ node disjoint paths of maximum length $2h$ between them.

From Lemma 11, if $d(m,n)$ is even there are $k$ node disjoint paths between $m$ and $n$ of length at most $d(m,n) + 2$. But distance $d(m,n)$ is even if and only if $\mathrm{lca}(m,n)$ exists at level $l$, where $1 \le l < h$. For $l < h$ the path length $2l + 2$ is never greater than $2h$. Therefore there are $k$ node disjoint paths of maximum length $2h$.

These three cases include all possible pairs of leaf nodes, hence the proof. $\square$

Theorem 13. A hierarchical clique HiC$(k,h)$ has a fault diameter $f = 2h$ for $k > 1$ and $h > 1$.

Proof. Let $m$ and $n$ be two leaf nodes at distance $d(m,n)$. From Theorem 3 we know that at least one pair of nodes $m, n$ exists with distance $d(m,n) = D(\mathrm{HiC}(k,h)) = 2h-1$.

If $k > 2$ then a fault set of $k-1$ nodes can break all the paths between $m$ and $n$ of length $2h-1$.

If $k = 2$ then a fault set of $k-1$ nodes cannot break all the paths between $m$ and $n$ of length $2h-1$, since the proof for Lemma 10 shows that there are always two paths of length $d(m,n)$ if $d(m,n)$ is odd. However, if $d(m,n) = 2h-2$ then the proof for Lemma 11 shows that only a single path of length $2h-2$ exists, with a further $k-1$ paths of length $2h$. Thus a fault set of $k-1$ nodes can increase $d(m,n)$ from $2h-2$ to $2h$.

From Lemma 12 we see that a fault set of $k-1$ nodes cannot increase $d(m,n)$ to more than $2h$ regardless of the value of $k$. The largest diameter in the presence of a $k-1$ node fault set is therefore $2h$. $\square$
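Theorem 13 can be illustrated on small networks by trying every fault set of $k-1$ nodes and recording the largest distance between surviving leaf nodes. This is a brute-force sketch under stated assumptions: nodes are address tuples with the root as the empty tuple, faulty leaves are simply excluded from the pairs considered, and `build_hic`, `distances_from` and `fault_diameter` are hypothetical names.

```python
from collections import deque
from itertools import combinations, product


def build_hic(k, h):
    """Return {address: set of adjacent addresses} for HiC(k, h)."""
    digits = range(1, k + 1)
    adj = {a: set() for n in range(h + 1) for a in product(digits, repeat=n)}
    for a in adj:
        if a:                                  # every node except the root ()
            adj[a].add(a[1:])                  # tree link to the parent
            adj[a[1:]].add(a)
            for x in digits:                   # clique links
                if x != a[-1]:
                    adj[a].add(a[:-1] + (x,))
    return adj


def distances_from(adj, src, dead):
    """Breadth-first hop counts from src, never entering a faulty node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dead and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def fault_diameter(k, h):
    """Largest leaf-to-leaf distance over all fault sets of k - 1 nodes."""
    adj = build_hic(k, h)
    leaves = [a for a in adj if len(a) == h]
    worst = 0
    for faults in combinations(adj, k - 1):
        dead = set(faults)
        alive = [a for a in leaves if a not in dead]
        for m in alive:
            dist = distances_from(adj, m, dead)
            worst = max(worst, max(dist[n] for n in alive))
    return worst


if __name__ == "__main__":
    for k, h in [(2, 2), (3, 2)]:
        print(k, h, fault_diameter(k, h), 2 * h)
```

In HiC$(2,2)$, for example, removing a level-1 switch forces its two children to communicate through their neighbours and the other switch, a path of length 4.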

In [7] two classes of graphs based on fault diameter were distinguished.

Definition 14. A class of graphs $G_i$ is strongly resilient if, for all $i$, there exists a constant $t$ such that $f(G_i) \le D(G_i) + t$.

Definition 15. A class of graphs $G_i$ is weakly resilient if, for all $i$, there exists a constant $t$ such that $f(G_i) \le D(G_i) \times t$.

The HiC$(k,h)$ is strongly resilient, since $f = D + 1$. This indicates that even under maximally faulty conditions the performance of the HiC$(k,h)$ will not be severely degraded.

In Fig. 3 the fault diameter of the binary $n$-cube is compared with that of HiC$(4,h)$ and HiC$(3,h)$ for a range of network sizes. The fault diameter of the HiC$(3,h)$ increases most rapidly with increasing network size, followed by the $n$-cube. The HiC$(4,h)$ clearly has the lowest fault diameter for all network sizes considered. Given HiC$(k_1,h_1)$ and HiC$(k_2,h_2)$ with identical numbers of PEs, if $k_1 > k_2$ then $f(\mathrm{HiC}(k_1,h_1)) < f(\mathrm{HiC}(k_2,h_2))$.

Fig. 3. The fault-diameter of $n$-cube and HiC. (Fault diameter plotted against number of PEs, with curves for HiC$(3,h)$, the $n$-cube and HiC$(4,h)$.)

4.3. Two-terminal reliability

The two-terminal reliability or path reliability $R_2(m,n)$ between a pair of nodes $m$ and $n$ in $V(G)$ is defined as the probability of finding a path entirely composed of operational links and nodes between $m$ and $n$ [3]. Let $S(G)$ be the set of all pairs of nodes in $V(G)$; then the two-terminal reliability $R_2(G)$ of graph $G$ is defined as $\min R_2(m,n) : \forall (m,n) \in S(G)$. As we are concerned with the ability of a network to provide communication between processors, the two-terminal reliability for the HiC deals only with leaf nodes. Let $S(\mathrm{HiC}(k,h))$ be the set of all pairs of leaf nodes in HiC$(k,h)$. In this section a lower bound for $R_2(\mathrm{HiC}(k,h))$ will be determined.

From Theorem 7, there are at least $k$ link-disjoint paths between any given pair of leaf nodes $m$ and $n$ in HiC$(k,h)$. Assume that link failures are statistically independent and occur randomly in time. A lower bound $\lfloor R_2(m,n) \rfloor$ can be determined by establishing the probability of at least one of the $k$ link-disjoint paths between leaf nodes $m$ and $n$ being completely operational. Using Lemmas 9–11, $\min(\lfloor R_2(m,n) \rfloor) : \forall (m,n) \in S(\mathrm{HiC}(k,h))$ can be obtained, which is a lower bound on $R_2(\mathrm{HiC}(k,h))$. Assume that all links have a probability $p$ of being operational. Then the probability of any path of length $l$ being operational is $p^l$ and the probability of failure of such a path is $1 - p^l$.

The probability of failure of $i$ link disjoint paths, all of length $l$, is given by $(1 - p^l)^i$. Therefore, the probability that at least one of $i$ link disjoint paths of length $l$ is operational is $1 - (1 - p^l)^i$.

Theorem 16. If $h > 1$ then $\lfloor R_2(\mathrm{HiC}(k,h)) \rfloor = 1 - (1 - p^{D(\mathrm{HiC})+1})^k$.

Proof. From Theorem 3, $D(\mathrm{HiC}(k,h)) = 2h-1$, which implies that $D(\mathrm{HiC})$ is always odd. Therefore, $\max(d) = D(\mathrm{HiC})$ if $d$ is odd and $\max(d) = D(\mathrm{HiC}) - 1$ if $d$ is even. From Lemma 11 we know that if $d(m,n)$ is even there are $k$ link-disjoint paths between $m$ and $n$ of length at most $d(m,n) + 2$. Therefore a lower bound $\lfloor R_2(m,n) \rfloor$ is given by $1 - (1 - p^{d(m,n)+2})^k$. But $\max(d) = D(\mathrm{HiC}) - 1$ if $d$ is even. Therefore $\min(1 - (1 - p^{d+2})^k) = 1 - (1 - p^{D(\mathrm{HiC})+1})^k$.

From Lemma 10 we know that if $d(m,n) > 1$ is odd there are $k$ link-disjoint paths between $m$ and $n$ of length at most $d(m,n) + 1$. Therefore a lower bound $\lfloor R_2(m,n) \rfloor$ is given by $1 - (1 - p^{d(m,n)+1})^k$. But $\max(d) = D(\mathrm{HiC})$ if $d$ is odd. Therefore $\min(1 - (1 - p^{d+1})^k) = 1 - (1 - p^{D(\mathrm{HiC})+1})^k$.

A lower bound $\lfloor R_2(\mathrm{HiC}(k,h)) \rfloor$ is given by $\min(\lfloor R_2(m,n) \rfloor) : \forall (m,n) \in S(\mathrm{HiC}(k,h))$. Since $\min(\lfloor R_2(m,n) \rfloor)$ is given by $1 - (1 - p^{D(\mathrm{HiC})+1})^k$ if $d(m,n)$ is even or if $d(m,n)$ is odd and greater than one, and if $h > 1$, we conclude that $\lfloor R_2(\mathrm{HiC}(k,h)) \rfloor$ is given by $1 - (1 - p^{D(\mathrm{HiC})+1})^k$. $\square$
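The bound of Theorem 16 is straightforward to evaluate numerically. The sketch below substitutes $D(\mathrm{HiC}) + 1 = 2h$ from Theorem 3; the function names `parallel_paths_reliability` and `hic_reliability_lower_bound` are hypothetical, and the link reliability value 0.99 used in the demonstration is an arbitrary illustrative choice, not a figure from the paper.

```python
def parallel_paths_reliability(p, length, paths):
    """Probability that at least one of `paths` link-disjoint paths works.

    Each path has `length` links and each link is operational with
    independent probability p, so one path works with probability p**length.
    """
    return 1.0 - (1.0 - p ** length) ** paths


def hic_reliability_lower_bound(k, h, p):
    """Lower bound of Theorem 16 for HiC(k, h) with h > 1: D + 1 = 2h."""
    if h <= 1:
        raise ValueError("Theorem 16 assumes h > 1")
    return parallel_paths_reliability(p, 2 * h, k)


if __name__ == "__main__":
    # Illustrative link reliability; any value in [0, 1] may be used.
    for k, h in [(3, 3), (4, 3), (5, 3)]:
        print(k, h, hic_reliability_lower_bound(k, h, 0.99))
```

For a fixed height $h$ the bound improves as $k$ grows, since more link-disjoint paths of the same maximum length become available.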

4.4. Average two-terminal reliability

While two-terminal reliability can give an indication of the likelihood of failure of communications between specific pairs of nodes, or even classes of pairs of nodes, it gives no indication of the average path reliability of a pair of nodes in the graph. A graph with a few short, reliable links and many long, unreliable ones is indistinguishable from a graph with many short, reliable links and few long, unreliable ones. Average two-terminal reliability overcomes this shortcoming. Average two-terminal reliability $R_2(G)$ is defined as

$$R_2(G) = \frac{1}{|S(G)|} \sum R_2(m,n) : \forall (m,n) \in S(G), \qquad \text{where } |S(G)| = \binom{|V(G)|}{2}.$$

In this section we derive a lower bound on average two-terminal reliability for the $HiC(k,h)$. A lower bound on the average two-terminal reliability of the binary hypercube will also be derived, and compared with that of the HiC for a range of system sizes.

A network is symmetric if it is isomorphic to itself with any node labelled as the origin. A symmetric network appears the same if viewed from the perspective of any node. Therefore, all values of $R_2(m,n)$ can be determined by considering only those sets of nodes in $S(G)$ which contain a given node $m$. Define $S(G,m) \subset S(G)$ as the set of all pairs of nodes in $V(G)$ which contain node $m$. For symmetric networks, then,

we can redefine $R_2(G)$ as $\frac{1}{|V(G)|-1} \sum R_2(m,n) : \forall (m,n) \in S(G,m)$. Symmetric networks have a further property, however: for any given distance $d$ there is a constant $c$ such that $R_2(m,n) = c : \forall (m,n) \in S(G)$ at distance $d$.

Define $S(G,m,d) \subset S(G)$ as the set of all pairs of nodes in $V(G)$ which contain node $m$ and have a given value of distance $d$. We can now redefine $R_2(G)$ as

$$R_2(G) = \frac{\sum \big( (R_2(m,n) : \forall (m,n) \in S(G,m,d)) \times |S(G,m,d)| \big) : \forall S(G,m,d) \subseteq S(G,m)}{|V(G)| - 1} \qquad (8)$$

Since for the HiC we are concerned only with the PEs, we can replace $|V(G)| - 1$ with $k^h - 1$.

Recall that Theorem 16 established lower bounds $\lfloor R_2(m,n) \rfloor$ for all pairs of distinct nodes $(m,n) \in S(HiC, m)$:

(1) $\lfloor R_2(m,n) \rfloor$ is $1 - (1 - p^3)^k : \forall (m,n) \in S(HiC, m, d : 1)$.

(2) $\lfloor R_2(m,n) \rfloor$ is $1 - (1 - p^{d+2})^k : \forall (m,n) \in S(HiC, m, d : \text{EVEN})$.

(3) $\lfloor R_2(m,n) \rfloor$ is $1 - (1 - p^{d+1})^k : \forall (m,n) \in S(HiC, m, d : \text{ODD} > 1)$.

It remains to determine $|S(HiC, m, d)|$ for each $S(HiC, m, d) \subseteq S(HiC, m)$:

(1) If $d(m,n) = 1$ then $m$ and $n$ are neighbours, and $|S(HiC, m, d : 1)| = k - 1$.

(2) If $d(m,n)$ is even then $m$ and $n$ share a common ancestor, and $|S(HiC, m, d : \text{EVEN})| = k^{d/2} - k^{(d-2)/2}$.

(3) If $d(m,n) > 1$ is odd then $m$ and $n$ are not in the same clique and do not share a common ancestor, so $|S(HiC, m, d : \text{ODD} > 1)| = (k^{(d-1)/2} - k^{(d-3)/2})(k - 1)$.
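A quick consistency check on the three counts is that, summed over all distances up to the diameter $2h - 1$, they account for every other PE, that is $k^h - 1$ nodes. The sketch below tabulates the counts; the function name is illustrative and not taken from the paper.

```python
def pair_counts(k: int, h: int) -> dict:
    """Number of PEs at each distance d from a fixed PE in HiC(k, h),
    using the three closed forms listed above."""
    counts = {1: k - 1}                       # neighbours (d = 1)
    for j in range(1, h):                     # j = d/2 for even d, (d - 1)/2 for odd d
        ring = k**j - k**(j - 1)
        counts[2 * j] = ring                  # even distance d = 2j
        counts[2 * j + 1] = ring * (k - 1)    # odd distance d = 2j + 1 > 1
    return counts


counts = pair_counts(4, 3)
print(counts)
print(sum(counts.values()) == 4**3 - 1)
```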

Thus a lower bound $\lfloor R_2(HiC(k,h)) \rfloor$ is made up of the sum of three terms. The first term comes from node pairs with a distance of 1.

$$Term1 = \frac{k-1}{k^h - 1} \left( 1 - (1 - p^3)^k \right).$$

The second term comes from node pairs with an even distance.

$$Term2 = \sum_{d/2 = 1}^{h-1} \left( 1 - (1 - p^{d+2})^k \right) \frac{k^{d/2} - k^{(d-2)/2}}{k^h - 1} \qquad (9)$$

$$= \frac{k-1}{k(k^h - 1)} \sum_{d/2 = 1}^{h-1} k^{d/2} \left( 1 - (1 - p^{d+2})^k \right). \qquad (10)$$

The third term comes from node pairs with an odd distance greater than 1.

$$Term3 = \sum_{(d-1)/2 = 1}^{h-1} \left( 1 - (1 - p^{d+1})^k \right) \frac{(k^{(d-1)/2} - k^{(d-3)/2})(k - 1)}{k^h - 1} \qquad (11)$$

$$= \frac{(k-1)^2}{k(k^h - 1)} \sum_{(d-1)/2 = 1}^{h-1} k^{(d-1)/2} \left( 1 - (1 - p^{d+1})^k \right). \qquad (12)$$

$\lfloor R_2(HiC(k,h)) \rfloor$ is $Term1 + Term2 + Term3$.

We now determine a lower bound $\lfloor R_2(Q) \rfloor$ on the average two-terminal reliability for a hypercube $Q$ of dimension $n$. Since the hypercube is symmetric, the expression for $R_2(G)$ given in (8) is applicable. The total number of node pairs considered is given by $|V(Q)| - 1 = 2^n - 1$. We can also determine that $|S(Q, m, d)| = \frac{n!}{(n-d)! \times d!}$. A lower bound on the two-terminal reliability $\lfloor R_2(Q) \rfloor$ was determined in [5]. If the probability of a link being operational is $p$, then

$$\lfloor R_2(Q) \rfloor = \min\left( 1 - (1 - p^d)^d (1 - p^{d+2})^{n-d} \right)$$

over all distances $0 < d \le n$. Therefore, a lower bound on the average two-terminal reliability is given by

$$\lfloor R_2(Q) \rfloor = \frac{n!}{2^n - 1} \sum_{d=1}^{n} \frac{1 - (1 - p^d)^d (1 - p^{d+2})^{n-d}}{(n-d)! \times d!} \qquad (13)$$

$$= 1 - \frac{1}{2^n - 1} \sum_{d=1}^{n} \binom{n}{d} (1 - p^d)^d (1 - p^{d+2})^{n-d}, \qquad (14)$$

where the last step uses $\sum_{d=1}^{n} \binom{n}{d} = 2^n - 1$.
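Both averages are straightforward to evaluate numerically. The following Python sketch sums the per-distance bounds weighted by the pair counts for the HiC and evaluates (13) for the hypercube. The function names are illustrative assumptions, and the parameter values in the example are the ones quoted in the comparison of Section 4.5.

```python
from math import comb


def hic_avg_reliability_bound(k: int, h: int, p: float) -> float:
    """Term1 + Term2 + Term3 for HiC(k, h)."""
    def at_least_one(length: int) -> float:
        # At least one of k link-disjoint paths of this length survives.
        return 1.0 - (1.0 - p**length) ** k

    total = (k - 1) * at_least_one(3)                       # d = 1
    for j in range(1, h):
        ring = k**j - k**(j - 1)
        total += ring * at_least_one(2 * j + 2)             # even d = 2j, length d + 2
        total += ring * (k - 1) * at_least_one(2 * j + 2)   # odd d = 2j + 1, length d + 1
    return total / (k**h - 1)


def hypercube_avg_reliability_bound(n: int, p: float) -> float:
    """Equation (13) for an n-dimensional binary hypercube."""
    total = 0.0
    for d in range(1, n + 1):
        all_fail = (1.0 - p**d) ** d * (1.0 - p ** (d + 2)) ** (n - d)
        total += comb(n, d) * (1.0 - all_fail)
    return total / (2**n - 1)


# 256 PEs in both cases, operational link probability 0.9.
print(hic_avg_reliability_bound(4, 4, 0.9))
print(hypercube_avg_reliability_bound(8, 0.9))
```

Because each value is a weighted average of probabilities, with weights summing to one, both results necessarily lie between 0 and 1.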

4.5. Comparison

The lower bounds determined for the average two-terminal reliability of the hypercube, $HiC(4,h)$ and $HiC(3,h)$ are plotted against the number of PEs in Fig. 4. The lower bound on the two-terminal reliability of the hypercube rises with network dimension. This is as expected, since the connectivity and node degree of a hypercube also increase with dimension. The $HiC(k,h)$ network, by contrast, has fixed connectivity and node degree; consequently the lower bound decreases with network size. The rate of decrease is much lower in the $HiC(4,h)$ than in the $HiC(3,h)$. Further increasing the value of $k$ leads to higher bounds for the value of average two-terminal reliability, but at the cost of increased node degree. An $HiC(4,4)$ has 256 PEs, node degree of 8 and an average two-terminal reliability of over 0.9 with an operational link probability of $p = 0.9$. With the same operational link probability, a hypercube of the same size has an average two-terminal reliability of nearly 1 but node degree of 16.

[Figure 4 plot: average reliability (0 to 1) against number of PEs (0 to 250) for the n-cube, HiC(4,h) and HiC(3,h).]

Fig. 4. The average two-terminal reliability of hypercube and HiC.

5. Message routing algorithms

5.1. One to one communication

PEs and SEs handle message routing in different ways. PEs either source or receive messages; SEs serve only as intermediate nodes for PE to PE communication. Let the address of the current node be $\langle C_l, \ldots, C_{h-1} \rangle$ where $0 \le l < h$ and the destination address be $\langle D_0, \ldots, D_{h-1} \rangle$.

A PE sending a message compares $\langle C_0, \ldots, C_{h-2} \rangle$ with $\langle D_0, \ldots, D_{h-2} \rangle$. If they are not the same the message is routed to the PE's parent. If they are the same then the destination PE is a neighbour and the message is passed directly. This is expressed formally in Algorithm 1.

    if <C_0, ..., C_{h-2}> = <D_0, ..., D_{h-2}> then
        Route message to neighbouring PE <D_0, ..., D_{h-1}>
    else
        Route message to parent

Algorithm 1. HiC PE message routing algorithm.

An SE at level $l$ has address $\langle C_l, \ldots, C_{h-1} \rangle$ for $0 < l < h - 1$. Upon receipt of a message an SE compares its own address with the corresponding digits from the destination node address. If they are the same the SE is an ancestor of the destination node, and the message can be routed to the child node which is either the destination or an ancestor of the destination. If the addresses differ only in the rightmost digit, then a neighbour of the SE is an ancestor of the destination node, and the message is routed via that neighbour. In any other case the message is routed further up the hierarchy, to the SE's parent. An SE at level $l = h - 1$ has address $(C_l)$. If $(C_l) = (D_l)$ then the SE is an ancestor of the destination node, and the message is routed to the child node which is the destination or an ancestor of the destination. Otherwise the message is routed to the SE's neighbour which is an ancestor of the destination node. This is expressed formally in Algorithm 2.

    if (0 < l < h - 1) then
        if <C_l, ..., C_{h-1}> = <D_l, ..., D_{h-1}> then
            Route message to child <D_{l-1}, ..., D_{h-1}>
        else
            if <C_l, ..., C_{h-2}> = <D_l, ..., D_{h-2}> then
                Route message to neighbouring SE <D_l, ..., D_{h-1}>
            else
                Route message to parent
    else
        if (l = h - 1) then
            if <C_l> = <D_l> then
                Route message to child <D_{l-1}, D_l>
            else
                Route message to neighbouring SE <D_l>

Algorithm 2. HiC SE message routing algorithm.

5.2. One to many communication

The HiC broadcast algorithms are very simple. The broadcast algorithm used by the source PE, with address $\langle S_0, \ldots, S_{h-1} \rangle$, is trivial. The source PE routes the message only to its parent. For an SE at level $l$, where $0 < l \le h - 1$, the broadcast algorithm is shown in Algorithm 3.

    if l = h - 1 then
        Send message to all neighbours
    else
        if <C_l, ..., C_{h-1}> = <S_l, ..., S_{h-1}> then
            Send message to parent and all children such that
                <C_{l-1}, ..., C_{h-1}> != <S_{l-1}, ..., S_{h-1}>
        else
            Send message to all children

Algorithm 3. HiC SE broadcast algorithm.

Note that the message is transmitted to neighbours only at level $l = h - 1$. At all other levels only parent or child links are used. This is to prevent PEs from receiving duplicate messages.
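The one-to-one routing rules of Section 5.1 (Algorithms 1 and 2) can be traced with a short Python sketch. It rests on assumptions made only for this illustration: a node is represented as a pair (level, address tuple), digits are numbered from 0 to k - 1, and the names `next_hop` and `route` are hypothetical. It follows the address comparisons as described and is not a definitive implementation.

```python
from itertools import product


def next_hop(level, addr, dest):
    """One routing step; addr holds digits C_l..C_{h-1}, dest holds D_0..D_{h-1}."""
    target = dest[level:]                    # D_l..D_{h-1}
    if level == 0:                           # PE rule (Algorithm 1)
        if addr[:-1] == dest[:-1]:
            return 0, dest                   # destination is a neighbouring PE
        return 1, addr[1:]                   # otherwise route to the parent
    if addr == target:                       # SE is an ancestor of the destination
        return level - 1, dest[level - 1:]   # route to the child
    if addr[:-1] == target[:-1]:             # differs only in the rightmost digit
        return level, target                 # route to the neighbouring SE
    return level + 1, addr[1:]               # route to the parent


def route(src, dest):
    """Nodes visited, as (level, address) pairs, from PE src to PE dest."""
    node = (0, src)
    path = [node]
    while node != (0, dest):
        node = next_hop(node[0], node[1], dest)
        path.append(node)
    return path


k, h = 3, 3
pes = list(product(range(k), repeat=h))
longest = max(len(route(s, d)) - 1 for s in pes for d in pes)
print(route((0, 0, 0), (1, 1, 1)))
print(longest)
```

At the top level the two address prefixes compared are empty, so the sketch always routes across to a neighbour there and never above level $h - 1$; the longest route found should equal the diameter $2h - 1$.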

6. Network embeddings

The ability of a parallel computer's interconnection network to emulate other networks allows it to effectively employ algorithms and data structures developed for different parallel architectures. The potential quality of such an emulation can be determined from the parameters of the embedding. Let $G$ and $H$ be undirected graphs. $G$ represents a guest graph and $H$ represents a host graph. Using the terminology of [11], an embedding of $G$ into $H$ is a mapping $F$ from the nodes of $G$ to the nodes of $H$. The mapping from any node $g$ in $G$ to a node $h$ in $H$ is represented by $F(g) \to (h)$. The dilation of an embedding $F$ is the maximum distance in the host between the images of adjacent guest nodes. The expansion of the embedding $F$ is the ratio of the number of nodes in the host graph to the number of nodes in the guest graph, i.e. $|V(H)| / |V(G)|$. Since expansion is primarily a measure of how effectively an embedding utilises the available computing resources, we derive it using only the number of PEs in a system and disregard the SEs. The number of edges of $G$ routed through edge $e \in E(H)$ in embedding $F$ is $c(e)$. The edge congestion of $F$ is $\max(c(e)) : \forall e \in E(H)$.

We have chosen to study embeddings of two of the most popular interconnection networks, two-dimensional meshes and binary hypercubes. A wide variety of algorithms have been developed for these two topologies, so efficient embeddings ensure easy availability of well tested, high-performance algorithms for use on the hierarchical cliques network when resources are not available to develop custom algorithms. Binary structures such as meshes or binary hypercubes map most naturally onto the $HiC(k,h)$ when $k$ is a power of two. In the following work, all mappings are made to $HiC(4,h)$.

6.1. Two-dimensional meshes

A mesh $M$ of dimension $2^h \times 2^h$ can be embedded into $HiC(4,h)$. A node of mesh $M$ is identified as $(x, y)$, where $0 \le x, y < 2^h$. For $HiC(4,h)$ the address of a leaf node $m$ at level $l$ is represented by $M = \langle M_0, \ldots, M_{h-1} \rangle$. For any mesh node $(x, y)$ where $x$ is even, mesh nodes $(x, y)$ and $(x+1, y)$ will map to neighbours in the $HiC(4,h)$. Similarly, for any mesh node $(x, y)$ where $y$ is even, mesh nodes $(x, y)$ and $(x, y+1)$ will map to neighbours in the $HiC(4,h)$. For any mesh node $(x, y)$ where $x$ is odd, mesh nodes $(x, y)$ and $(x+1, y)$ share a common ancestor at level $l$. Similarly for any mesh node $(x, y)$ where $y$ is odd, mesh nodes $(x, y)$ and $(x, y+1)$ share a common ancestor at level $l$. The level $l$ of the common ancestor is determined by the value of $x$ or $y$. The embedding is described in Algorithm 4.

    F(0, 0) -> m with address <M_0, ..., M_{h-1}> such that M_i = 1 for 0 <= i <= h - 1
    for x = 0 -> 2^h - 1 do
        if x < 2^h - 1 then
            CALL MAPX(x, 0)
        for y = 0 -> 2^h - 2 do
            CALL MAPY(x, y)

Algorithm 4. 2D mesh embedding.

This embedding has dilation $2(h-1)$, expansion 1 and edge congestion $2^{h-1}$.

    PROCEDURE MAPX(x, y)
    Given that F(x, y) -> m with address <M_0, ..., M_{h-1}>
    if x is EVEN then
        if M_{h-1} is EVEN then
            F(x + 1, y) -> n with address <M_0, ..., M_{h-1} - 1>
        else
            F(x + 1, y) -> n with address <M_0, ..., M_{h-1} + 1>
    else
        if M_{h-1} is EVEN then
            F(x + 1, y) -> n with address <M_0 + 1, M_1, ..., M_{h-1}>
        else
            F(x + 1, y) -> n with address <N_0, ..., N_{h-1}>
            k = 0; i = 1
            repeat
                if (x + 1) / 4^i mod 4 is ODD then
                    k = 2i
                else
                    if (x + 1) / 4^i mod 4 = 2 then
                        k = 2i + 1
                    else
                        i = i + 1
            until k != 0
            for i = 0 -> k - 1 do
                if M_i is EVEN then
                    N_i = M_i - 1
                else
                    N_i = M_i + 1
            for i = k -> h - 1 do
                N_i = M_i

Algorithm 5. 2D mesh embedding procedure MAPX.

    PROCEDURE MAPY(x, y)
    Given that F(x, y) -> m with address <M_0, ..., M_{h-1}>
    if y is EVEN then
        if M_{h-1} <= 2 then
            F(x, y + 1) -> n with address <M_0, ..., M_{h-1} + 2>
        else
            F(x, y + 1) -> n with address <M_0, ..., M_{h-1} - 2>
    else
        if M_{h-1} > 2 then
            F(x, y + 1) -> n with address <M_0 + 2, M_1, ..., M_{h-1}>
        else
            F(x, y + 1) -> n with address <N_0, ..., N_{h-1}>
            k = 0; i = 1
            repeat
                if (y + 1) / 4^i mod 4 is ODD then
                    k = 2i
                else
                    if (y + 1) / 4^i mod 4 = 2 then
                        k = 2i + 1
                    else
                        i = i + 1
            until k != 0
            for i = 0 -> k - 1 do
                if M_i > 2 then
                    N_i = M_i - 2
                else
                    N_i = M_i + 2
            for i = k -> h - 1 do
                N_i = M_i

Algorithm 6. 2D mesh embedding procedure MAPY.

6.2. Binary hypercubes

A binary hypercube $Q(n)$ can be embedded into $HiC(4, \lfloor (n+1)/2 \rfloor)$. For any $Q(n)$ the address of a node $q$ is represented as a binary string $(q_{n-1}, \ldots, q_0)$ [11]. For any binary string $(q_x, \ldots, q_{x-y})$, where $y \le x$, we represent the decimal value of the binary number $(q_x, \ldots, q_{x-y})$ by $(q_x, \ldots, q_{x-y})_{10}$. For $HiC(4,h)$ the address of a leaf node $m$ is represented by $M = \langle M_0, \ldots, M_{h-1} \rangle$. A 2-cube can be trivially embedded into $HiC(4,1)$ with dilation, edge congestion and expansion all equal to one. A 3-cube can be mapped onto $HiC(4,2)$ as two 2-cubes on separate cliques, with each node also connected to the appropriate node in the other clique via their common parent. This mapping has dilation and expansion of two and congestion of one. For each further increment in the value of $n$, if $n$ becomes odd the height of the $HiC(4,h)$ into which it is embedded will increase by one and only half the PEs will be used. If $n$ becomes even the height will remain the same and the other half of the PEs will be used. The embedding is described in Algorithm 7.

    for all q in Q(n) do
        if n is EVEN then
            F(q) -> m
            for i = 0 to floor(n/2 - 1) do
                M_i = (q_{n-2i}, q_{n-2i-1})_10 + 1
        else
            F(q) -> m
            M_{h-1} = (q_1, q_0)_10 + 1
            M_{h-2} = (q_{n-1})_10 + 1
            for i = 1 to floor(n/2 - 1) do
                M_{i-1} = (q_{n-2i}, q_{n-2i-1})_10 + 1

Algorithm 7. Binary hypercube embedding.

The dilation of this embedding is derived in the following way. A node $q$ of $Q(n)$ has address $(q_{n-1}, \ldots, q_0)$. If a node $p$ of $Q(n)$ is one of the $n$ nodes directly connected to $q$ then $p$'s address differs from that of $q$ in that one bit $q_x$ is replaced by $\bar{q}_x$, where $0 \le x \le n - 1$. From Algorithm 7 it is clear that, if $F(q) \to m$ and $F(p) \to n$, then the address of $n$ differs from that of $m$ in that one digit $M_y$ is replaced by $M_y \pm 1$, where $0 \le y \le h - 1$. If $y = h - 1$ then $m$ and $n$ are neighbours. If $0 \le y < h - 1$ then $lca(m, n)$ occurs at level $y + 1$. The maximum distance between adjacent nodes of $Q(n)$ is therefore $2(h - 2 + 1) = 2(h - 1)$.

The congestion of this embedding is derived as follows. For $n \le 3$ congestion is equal to one; that is, for $n = 3$ congestion is $2^{(n-3)}$. For each subsequent increment in the value of $n$, if $n$ becomes even the height of the $HiC(4,h)$ into which it is embedded remains the same, but the number of edges routed through the links from nodes 1–4 doubles as the number of embedded nodes doubles. If $n$ becomes odd the height of the $HiC(4,h)$ into which it is embedded will increase by one, but the number of edges routed through the links from nodes 1–4 still doubles, as once again the number of embedded nodes doubles. Thus for each increase in $n$ the congestion doubles, and the system congestion can be expressed as $2^{(n-3)}$ for $n \ge 3$.

This embedding has:
    dilation $2(h - 1)$;
    edge congestion $2^{n-3}$;
    expansion 1 if $n$ is EVEN, 2 if $n$ is ODD.
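The closed forms above can be tabulated for a given hypercube order. The sketch below is illustrative only: the function name is hypothetical, the host height is taken as $\lfloor (n+1)/2 \rfloor$ as in Section 6.2, and expansion is computed as host PEs divided by guest nodes.

```python
def hypercube_embedding_parameters(n: int) -> dict:
    """Parameters quoted in the text for embedding Q(n) into HiC(4, h), n >= 3."""
    if n < 3:
        raise ValueError("the closed forms are quoted for n >= 3")
    h = (n + 1) // 2                       # host height, floor((n + 1) / 2)
    return {
        "h": h,
        "dilation": 2 * (h - 1),
        "edge_congestion": 2 ** (n - 3),
        "expansion": 4**h // 2**n,         # host PEs / guest nodes
    }


for n in (3, 4, 5, 6):
    print(n, hypercube_embedding_parameters(n))
```

For $n = 3$ this reproduces the values stated for the 3-cube on $HiC(4,2)$: dilation and expansion of two and congestion of one.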

6.3. Embeddings into the FatHiC

In the embeddings considered above the edge congestion grows rapidly as the size of the embedded network increases, due to congestion in higher-level communication channels. Since congestion in the upper levels is possibly the most fundamental problem with tree-based networks it has already been extensively studied. A well-accepted solution has been proposed: the fat tree of [9]. In order to prevent congestion in the upper levels dominating communication time for regular algorithms involving a large proportion of PEs transmitting across the diameter of the network simultaneously, the bandwidth of links in upper layers is increased. Applying the same principle, we define a $FatHiC(k,h)$ as an extension of the $HiC(k,h)$, where the bandwidth of parent–child links increases with the level of the nodes incident to them. The increase of bandwidth required is dictated by the cost and performance requirements of the system; the higher the bandwidth of the links the higher the expected performance, but at a higher cost. While considering embeddings into the $FatHiC(k,h)$ the link bandwidth will be assumed to increase by a factor of $k$ at each level of the network. For example, a $FatHiC(4,3)$ would have links of bandwidth $B$ from leaf nodes at level zero to their parents at level one, links of bandwidth $4B$ from nodes at level one to their parents at level two, and links of bandwidth $16B$ from nodes at level two to the root node. Assuming that the cost of the extra bandwidth is approximated by representing each link of bandwidth $kB$ as $k$ separate links, the number of links in this version of a $FatHiC(k,h)$ is

$$L = k^h \left( \frac{k}{2} + h \right) - \frac{k}{2}.$$

Since $FatHiC(k,h)$ is topologically identical to $HiC(k,h)$, any network which can be embedded into $HiC(k,h)$ can be embedded into $FatHiC(k,h)$ in exactly the same manner, with the same dilation and expansion. The congestion may be reduced. All embeddings considered in the previous sections were into $HiC(4,h)$; these same embeddings are used here into $FatHiC(4,h)$, and the congestion is determined.

Considering the two-dimensional mesh embedding of Section 6.1, the congestion of the embedding is 1.

Considering the binary hypercube embedding of Section 6.2, the congestion of the embedding is 2.

Table 1
Parameters of Embeddings into HiC and GFT

Guest graph                  Host graph                             Dilation    Expansion   Edge congestion
2D-mesh, $2^h \times 2^h$    $HiC(4,h)$                             $2(h-1)$    1           $2^{h-1}$
                             $FatHiC(4,h)$                          $2(h-1)$    1           1
                             $GFT(2h,2,2)$                          $4h$        1           3
Hypercube, order $n$         $HiC(4,\lfloor (n+1)/2 \rfloor)$       $2(h-1)$    1 or 2      $2^{n-3}$
                             $FatHiC(4,\lfloor (n+1)/2 \rfloor)$    $2(h-1)$    1 or 2      2
                             $GFT(n,2,2)$                           $2n$        1           $\lceil n/2 \rceil$

Table 2
Hardware requirements of $HiC(4,3)$, $FatHiC(4,3)$ and $GFT(6,2,2)$

         HiC(4,3)    FatHiC(4,3)    GFT(6,2,2)
PEs      64          64             64
SEs      21          21             384
Links    210         318            768
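The link formula can be cross-checked by counting links level by level. The sketch below makes assumptions that are inferred from the closed form and from Table 2 rather than stated in this section: level $l$ (for $0 \le l < h$) holds $k^{h-l}$ nodes partitioned into cliques of size $k$, every such node has one parent link, and a single root sits at level $h$. Function names are illustrative.

```python
def clique_links(k: int, h: int) -> int:
    """Links inside cliques: k**(h - l) nodes at level l form k**(h - l - 1) cliques."""
    cliques = sum(k ** (h - l - 1) for l in range(h))
    return cliques * k * (k - 1) // 2


def hic_links(k: int, h: int) -> int:
    """HiC(k, h): clique links plus one unit link from every non-root node to its parent."""
    parent_links = sum(k ** (h - l) for l in range(h))
    return clique_links(k, h) + parent_links


def fat_hic_links(k: int, h: int) -> int:
    """FatHiC(k, h): a parent link leaving level l has bandwidth k**l * B,
    counted as k**l unit links."""
    fat_parent_links = sum(k ** (h - l) * k**l for l in range(h))
    return clique_links(k, h) + fat_parent_links


print(hic_links(4, 3), fat_hic_links(4, 3))
```

Under these assumptions each level contributes $k^h$ unit links of parent bandwidth in the FatHiC, which is where the $h \cdot k^h$ part of the closed form comes from.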

6.4. Quality of embeddings

In order to provide an indication of the relative quality of these embeddings, Table 1 shows embedding parameters for $HiC(k,h)$, $FatHiC(k,h)$ and another hierarchical network, the generalised fat trees $GFT(h,m,w)$ [13]. All the results given here are for $GFT(h,2,2)$. This configuration has a total number of nodes (PEs and SEs) $N(GFT) = (h+1) \times 2^h$ and a number of links $L(GFT) = h(2^{h+1})$.

For each of the guest graphs studied, the dilation and expansion of the embeddings into each of the host graphs are of the same order. In general, the dilation of the GFT is higher by some small factor, indicating possible higher latency for communications between mapped nodes. For each embedding the edge congestion of the HiC grows rapidly, the edge congestion of the GFT is either constant or grows slowly, while the edge congestion of the FatHiC is constant for each case. This indicates congestion would occur rapidly under uniform communication conditions with these embeddings into the HiC. Such a result is not surprising: both the FatHiC used and the $GFT(h,2,2)$ maintain constant bandwidth between all levels of the hierarchy while the bandwidth of the HiC decreases in the upper levels of the hierarchy. However, the potential performance benefit yielded by the lower edge congestion of the FatHiC and GFT must be weighed against the cost of the extra hardware required to achieve it. For example, a binary hypercube of order 6 maps onto $HiC(4,3)$, $FatHiC(4,3)$ or $GFT(6,2,2)$. Table 2 shows that while all three of these networks have an equal number of PEs, the number of both SEs and links required by the GFT is significantly higher than for either the HiC or FatHiC.

Of course using different members of either family of networks would yield different results for these mappings. In general the HiC emphasises local communication over global communication and allows excellent, cost-effective local connectivity at the cost of poorer global performance. The FatHiC has superior global communication and allows higher quality mappings than the GFT, with fewer switches and fewer links. The FatHiC is clearly the best of the hierarchical networks considered, offering high-quality embeddings for reasonable cost.
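The GFT column of Table 2 follows from the two formulas quoted above. The short sketch below evaluates them for $h = 6$; it assumes, as Table 2 suggests, that the PEs of $GFT(h,2,2)$ are its $2^h$ leaves, and the function names are illustrative.

```python
def gft_nodes(h: int) -> int:
    """Total nodes (PEs and SEs) of GFT(h, 2, 2): (h + 1) * 2^h."""
    return (h + 1) * 2**h


def gft_links(h: int) -> int:
    """Links of GFT(h, 2, 2): h * 2^(h + 1)."""
    return h * 2 ** (h + 1)


h = 6
pes = 2**h                     # assumption: the PEs are the 2^h leaves
ses = gft_nodes(h) - pes
print(pes, ses, gft_links(h))
```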

7. Discussion and conclusion

We presented the HiC interconnection network for multiprocessor systems. This network, a combination of the tree and fully connected networks, possesses topological and architectural properties superior to most other hierarchical networks in a variety of problem domains. Some features of the hierarchical cliques include: (1) flexibility in terms of degree of connectivity and the height of the tree, (2) fault-tolerance, (3) the ability to make maximum use of communication locality in parallel algorithms, (4) logarithmic diameter, (5) versatile embedding properties. In addition, the proposed addressing scheme facilitates self routing and broadcasting.

References

[1] A. Agarwal, Limits on interconnection network performance, IEEE Trans. Parallel Distrib. Systems 2 (4) (1991) 398–412.

[2] S.M. Campbell, M.J. Kumar, Hierarchical cliques: cost effective multiprocessor interconnection network, IASTED International Conference on Parallel and Distributed Computing and Systems, October 16–19, Chicago, Illinois, USA, IASTED, 1996, pp. 134–138.

[3] C.J. Colbourn, The Combinatorics of Network Reliability, Oxford University Press, New York, 1987.

[4] W.J. Dally, C.L. Seitz, Deadlock-free message routing in multiprocessor interconnection networks, IEEE Trans. Comput. C-36 (5) (1987) 547–553.

[5] A. El-Amawy, S. Latifi, Properties and performance of folded hypercubes, IEEE Trans. Parallel Distrib. Systems 2 (1) (1991) 31–42.

[6] A-H. Esfahanian, Generalized measures of fault tolerance with application to n-cube networks, IEEE Trans. Comput. 38 (11) (1989) 1586–1591.

[7] M.S. Krishnamoorthy, B. Krishnamurthy, Fault diameter of interconnection networks, Comput. Math. Appl. 13 (5/6) (1987) 577–582.

[8] J.M. Kumar, L.M. Patnaik, Extended hypercube: a hierarchical interconnection network of hypercubes, IEEE Trans. Parallel Distrib. Systems 3 (1) (1992) 45–57.

[9] C.E. Leiserson, Fat-trees: universal networks for hardware-efficient supercomputing, IEEE Trans. Comput. C-34 (10) (1985) 892–900.

[10] Q.M. Malluhi, M.A. Bayoumi, The hierarchical hypercube: a new interconnection topology for massively parallel systems, IEEE Trans. Parallel Distrib. Systems 5 (1) (1994) 17–30.

[11] B. Monien, H. Sudborough, Embedding one interconnection network in another, in: G. Tinhofer, E. Mayr, H. Noltemeier, M. Syslo (Eds.), Computational Graph Theory, Springer, Berlin, 1990.

[12] S.R. Ohring, S.K. Das, Folded Petersen cube networks: new competitors for the hypercube, The International Parallel Processing Symposium, Anaheim, CA, USA, April 1993.

[13] S.R. Ohring, M. Ibel, S.K. Das, M.J. Kumar, On generalized fat trees, International Parallel Processing Symposium, Santa Barbara, CA, April 1995, pp. 37–45.

[14] Y. Saad, M.H. Schultz, Topological properties of hypercubes, IEEE Trans. Comput. 37 (7) (1988) 867–872.

[15] S-B. Tien, C.S. Raghavendra, Algorithms and bounds for shortest paths and diameter in faulty hypercubes, IEEE Trans. Parallel Distrib. Systems 4 (6) (1993) 713–718.

[16] N-F. Tzeng, S. Wei, Enhanced hypercubes, IEEE Trans. Comput. 40 (3) (1991) 284–293.

Dr. Campbell received his bachelor of engineering from the Western Australian Institute of Technology in 1986 and the Ph.D. degree in computer science from Curtin University of Technology in 1999. In 2002 he joined Fair Isaac Inc., where he works developing analytic platforms. His research interests include design of parallel algorithms, graph theory and applications, and medical computing.

Dr. Mohan Kumar received his Ph.D. and MTech degrees from the Indian Institute of Science in 1985 and 1992 respectively and the BE degree from Bangalore University, India in 1982. In 2001 he joined the Department of Computer Science and Engineering as Associate Professor. He was on the faculty of the School of Computing at the Curtin University of Technology, Perth, Australia during 1992–2000. His interests include pervasive computing, mobile computing, caching for mobile Internet, active network technologies and distributed systems. He has published over 80 articles in these areas. He served as the Program Chair for PerCom 2003 and is the General Vice Chair for PerCom 2004. He has guest edited special issues of leading journals and is on the editorial board of The Computer Journal.

Dr. Olariu received the M.Sc. and Ph.D. degrees in computer science from McGill University, Montreal in 1983 and 1986, respectively. In 1986 he joined Old Dominion University where he is a Professor of Computer Science. Dr. Olariu has published extensively in various journals, book chapters, and conference proceedings. His research interests include image processing and machine vision, parallel architectures, design and analysis of parallel algorithms, computational graph theory, computational geometry, and mobile computing. Dr. Olariu serves on the Editorial Board of IEEE Transactions on Parallel and Distributed Systems, Journal of Parallel and Distributed Computing, VLSI Design, Parallel Algorithms and Applications, International Journal of Computer Mathematics, and International Journal of Foundations of Computer Science.