J. Parallel Distrib. Comput. 64 (2004) 16–28
ARTICLE IN PRESS
*Corresponding author.
E-mail address: kumar@cse.uta.edu (M. Kumar).
1 This work was supported by a Curtin University Postgraduate Scholarship.
2 Thanks Dr. J. Simpson for his invaluable assistance.
0743-7315/$ - see front matter © 2003 Elsevier Inc. All rights reserved.
doi:10.1016/j.jpdc.2003.08.005
The hierarchical cliques interconnection network
Stuart Campbell,a,1,2 Mohan Kumar,b,* and Stephan Olariuc

aFair Isaac Corporation, 901 Marquette Avenue Suite 3200, Minneapolis, MN 55402, USA
bDepartment of Computer Science and Engineering, The University of Texas at Arlington, Box 19015, Arlington, TX 76120, USA
cDepartment of Computer Science, Old Dominion University, Norfolk, VA 23529-0162, USA
Received 7 February 2000; revised 15 July 2003
Abstract
The fully connected network possesses extremely good topological, fault-tolerant, and embedding properties. However, due to its
high degree, the fully connected network has not been an attractive candidate for building parallel computers. On the other hand,
tree-based networks are popular as parallel computer networks, even though they suffer from poor fault-tolerance and embedding
properties. The hierarchical cliques interconnection network described in this paper incorporates positive features of the fully
connected network and the tree network. In other words, the hierarchical cliques possess such desirable properties as low diameter,
low degree, self routing, versatile embedding, good fault-tolerance and strong resilience. Hierarchical cliques can efficiently embed
most important networks and possess a scalable, modular structure. Further, by combining hierarchical cliques with fat trees,
congestion in the upper levels can be alleviated.
© 2003 Elsevier Inc. All rights reserved.
Keywords: Cliques; Trees; Interconnection networks; Network embedding; Fault-tolerance
1. Introduction
In recent years several interconnection networks have been proposed and studied for their suitability in parallel computers. In general, some of the desirable properties of multiprocessor topologies include low diameter, low degree, high bisection width, versatile embedding properties and robustness to faults. The fully connected network possesses all the desirable properties except low degree. As the degree of a fully connected network of size N is (N − 1), it is very expensive to construct multiprocessor networks with this topology. Less costly topologies, such as meshes and k-ary n-cubes, have been the most popular for building commercial parallel computers.

Mesh topologies map well onto a plane, and are well suited to VLSI implementation, but suffer from high diameter and correspondingly high latency. Methods such as wormhole routing [4] have been used in mesh
architectures to reduce latency. These methods are very successful when contention is low, but are less effective when network congestion rises [1]. Under these circumstances meshes are only effective if the algorithm exhibits communication locality. Hence they are suitable for implementation of applications such as computational fluid dynamics and low-level image processing. Other topologies which allow better use to be made of communication locality offer greater performance under these circumstances. Meshes also suffer from poor embedding properties and fault-tolerance.

The binary n-cube topology possesses such desirable properties as symmetry, regularity, robustness, logarithmic diameter and self-routing schemes, among many others. On the other hand, the hypercube topology has high degree for large networks and it does not have a modular structure.

The tree network has low degree and is suitable for VLSI implementation. However, the tree network is notorious for message congestion at the root. Several interconnection networks that combine the good features of the hypercube, the mesh, the tree, and the fully connected network have been proposed [5,8–10,12,16].

The fat tree has been popular; its topology was adopted in the Connection Machine-5 (CM-5), a
Fig. 1. Part of a HiC with k = 4 and h = 3.
commercial machine. Further, Leiserson [9] and later Ohring et al. [13] have demonstrated the versatility of the fat tree network. Unlike the tree network, the intermediate nodes of the fat tree network have multiple parent nodes and hence there are multiple communication paths among all pairs of pendant nodes.

In this paper we describe the hierarchical cliques (HiC) network, which combines positive features of the fully connected network, the extended hypercube network, and the tree network. The HiC is a k-ary tree, modified to enhance local connectivity in a hierarchical, modular fashion. The HiC structure can be used to define a number of different useful multiprocessor architectures. We define the leaf nodes, on level 0, as processor elements (PEs). All other nodes are switching elements (SEs). The cost effectiveness of the HiC has already been analysed in [2], where it was referred to as the Reliable Hierarchical Cliques (RHiC). At the risk of causing some confusion with a similar, but inferior topology referred to in [2] as HiC, we have changed the name to prevent the misapprehension that reliability is the main feature of this topology. Here we analyse some of its topological features relevant to performance and fault-tolerance. We demonstrate the simplicity of various routing algorithms using the proposed addressing scheme and finally define some embeddings of popular topologies onto the HiC to demonstrate its versatility.

The paper is organised as follows. Section 2 describes the structure and addressing scheme of the HiC. Some topological parameters of the HiC are described in Section 3. Section 4 demonstrates the fault-tolerant properties of the HiC. Various message routing algorithms for the HiC are described in Section 5. Embeddings of some important topologies onto the HiC are discussed in Section 6. Section 7 concludes the paper.
2. Structure of the hierarchical cliques
HiC(k,h) is a k-ary tree of height h modified so that groups of nodes on the same level form cliques. Members of a clique are referred to as neighbours. The root node is at level h, and has address 0. The k children of the root node form a clique at level h − 1. They have addresses consisting of a single unique digit in the range 1 to k. In general, let m be a node at level l of HiC(k,h), where 0 ≤ l < h − 1. Then m has address M consisting of a sequence of digits ⟨M_l, …, M_{h−1}⟩, where each digit is in the range 1 to k. Consider a second node n in the same HiC as m. If n's address N is a proper suffix of M then n is an ancestor of m, which is a descendant of n. If M = ⟨M_l, N⟩ then n is the parent of m and m is a child of n. If a sequence P exists such that N = ⟨P, N_{h−1}⟩ and M = ⟨P, M_{h−1}⟩ then m and n are neighbours.

We refer the reader to Fig. 1, illustrating the structure and addressing scheme of an HiC with k = 4 and h = 3. (Only one-quarter of the level 0 nodes are shown.) The root node has address 0 and is at level 3. The root node has four children at level 2, with addresses 1, 2, 3 and 4. The nodes at level 2 form a clique; in Fig. 1 nodes of a clique are shown enclosed in a dashed oval. Each level 2 node has four children at level 1. The address of a node at level 1 consists of the address of its parent node appended to a digit between 1 and 4, which distinguishes the level 1 node from its siblings. Thus, nodes 12, 22, 32 and 42 are all children of node 2. Nodes at level 1 are neighbours if their parents are neighbours and the first digit of their address is the same; for example nodes 21, 22, 23 and 24 form a clique. The address of a node at level 0 consists of the address of its parent node appended to a digit between 1 and 4. The digit distinguishes the level 0 node from its siblings. Thus nodes 141, 241, 341 and 441 are all children of node 41. Nodes at level 0 are neighbours if their parents are neighbours and the first digit of their address is the same; for example nodes 241, 242, 243 and 244 form a clique.
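The addressing rules can be made concrete with a few helpers (a Python sketch; the function names are ours, and an address is represented most-significant digit first, so node 141 of Fig. 1 is the tuple (1, 4, 1)):

```python
def parent(addr):
    """The parent address is the proper suffix obtained by dropping the first digit."""
    return addr[1:]

def is_ancestor(anc, addr):
    """n is an ancestor of m iff n's address is a proper suffix of m's address."""
    return len(anc) < len(addr) and addr[len(addr) - len(anc):] == anc

def are_neighbours(a, b):
    """Neighbours share every digit except the last: M = <P, M_{h-1}>, N = <P, N_{h-1}>."""
    return len(a) == len(b) >= 1 and a[:-1] == b[:-1] and a[-1] != b[-1]

# Examples taken from Fig. 1 (k = 4, h = 3):
print(parent((1, 4, 1)))                      # (4, 1): node 141 is a child of node 41
print(are_neighbours((2, 4, 1), (2, 4, 3)))   # True: 241 and 243 lie in the same clique
print(is_ancestor((1,), (2, 2, 1)))           # True: node 1 is an ancestor of node 221
```

Note that siblings such as 141 and 241 are not neighbours under this scheme; they differ in the first digit, not the last.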
3. Parameters of the HiC
Assume that k ≠ 1; then the nodes of HiC(k,h) occur at h + 1 levels, with the leaves at level 0 and the root at level h. The total number of nodes in HiC(k,h) is N = (k^{h+1} − 1)/(k − 1). Recall that we use only leaf nodes as processors (PEs), with all other nodes acting as switches (SEs). We wish to determine the number of SEs (N_S), the number of PEs (N_P) and the number of communication links (L). An easy inductive argument shows that at level l, (0 ≤ l ≤ h), there are N_l = k^{h−l} nodes. Therefore, the number of leaf nodes is N_P = k^h. The number of switching nodes N_S = Σ_{l=1}^{h} k^{h−l} = (k^h − 1)/(k − 1). The number of links L = L_T + L_C, where L_T represents the number of links in a k-ary tree of height h and L_C represents the number of links connecting nodes into k-cliques. The total number of links within a k-ary tree of height h is given by L_T = Σ_{l=1}^{h} k^l. The number of links within a k-clique is given by k(k − 1)/2, and the number of k-cliques within HiC(k,h) is given by Σ_{l=0}^{h−1} k^{h−1−l}, so L_C = ((k − 1)/2) Σ_{l=1}^{h} k^l. Therefore

L = k(k + 1)(k^h − 1) / (2(k − 1)).
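These counts can be cross-checked by building the graph directly from the Section 2 addressing rules. The sketch below is our construction, not the authors' code; a node is represented by its digit tuple, with the root the empty tuple:

```python
from itertools import product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h); a node is its digit tuple, the root the empty tuple."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)      # tree link to the parent (address suffix)
            for d in range(1, k + 1):                 # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

for k, h in [(2, 2), (3, 3), (4, 2)]:
    adj = build_hic(k, h)
    N  = len(adj)
    NP = sum(1 for v in adj if len(v) == h)           # leaves serve as PEs
    NS = N - NP                                       # every other node is an SE
    L  = sum(len(s) for s in adj.values()) // 2
    assert N  == (k**(h + 1) - 1) // (k - 1)
    assert NP == k**h and NS == (k**h - 1) // (k - 1)
    assert L  == k * (k + 1) * (k**h - 1) // (2 * (k - 1))
```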
One of the most important properties of a multiprocessor interconnection network is the diameter. We define and prove two related lemmas before deriving the expression for the diameter of an HiC(k,h).
Lemma 1. In any HiC(k,h), if leaf node m has address M = ⟨M_0, …, M_{h−1}⟩ and leaf node n has address N = ⟨N_0, …, N_{h−1}⟩ such that M_x ≠ N_x for 0 ≤ x ≤ h − 1, then d(m,n) = 2h − 1.

Proof. Since M_x ≠ N_x for 0 ≤ x ≤ h − 1, no prefix P can exist such that M = ⟨P, M_{h−1}⟩ and N = ⟨P, N_{h−1}⟩. Therefore m and n cannot be neighbours, nor can their ancestors at level l < h − 1 be neighbours. Now consider the ancestors of m and n at level l. Their addresses will be ⟨M_l, …, M_{h−1}⟩ and ⟨N_l, …, N_{h−1}⟩, respectively. Then, since M_x ≠ N_x for 0 ≤ x ≤ h − 1, m and n can have no common ancestor nearer than the root. Therefore, the shortest path between m and n passes through the link between their respective ancestors at level h − 1 and has a length of 2h − 1. □
Lemma 2. In HiC(k,h) there exist at least two nodes, m with address M = ⟨M_0, …, M_{h−1}⟩ and n with address N = ⟨N_0, …, N_{h−1}⟩, such that M_x ≠ N_x for 0 ≤ x ≤ h − 1.

Proof. Nodes at level 0 have an address made up of h digits in the range 1 to k. The number of unique addresses which can be generated with a sequence of h digits, where each digit is in the range 1 to k, is k^h. The number of nodes at level 0 is k^h, and each node has a unique address. Since k > 1, for any such M an address N with N_x ≠ M_x in every position can be chosen, and both addresses occur among the leaves. □
Theorem 3. A hierarchical clique HiC(k,h) with k > 1 and h > 0 has diameter D = 2h − 1.

Proof. An HiC(k,h) is a k-ary tree of height h, modified with extra links. A complete tree of height h has diameter D = 2h. Since the nodes at level h − 1 form a clique, the distance from any node at level h − 1 to any other node at level h − 1 is d = 1. Therefore, the maximum possible distance between two nodes in an HiC(k,h) is 2h − 1. From Lemmas 1 and 2 there exists at least one pair of nodes m, n with distance d(m,n) = 2h − 1. □
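Theorem 3 can be confirmed by breadth-first search on small instances (a sketch under our tuple-address construction of the HiC; helper names are ours):

```python
from collections import deque
from itertools import product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h) under the Section 2 addressing scheme."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)   # tree link to the parent (suffix)
            for d in range(1, k + 1):              # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

def distances(adj, src):
    """Hop counts from src to every node, by breadth-first search."""
    dist, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1; q.append(u)
    return dist

def diameter(adj):
    return max(max(distances(adj, v).values()) for v in adj)

for k, h in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    assert diameter(build_hic(k, h)) == 2 * h - 1   # Theorem 3
```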
Another important parameter of a multiprocessor interconnection network is the average distance between processors. The average inter-PE distance is often used in dynamic network topologies rather than the average inter-node distance, as it is a more useful indicator of performance. However, the symbol (d̄) is used to represent the average inter-PE distance, as it is frequently compared with the average inter-node distance of static networks.
Theorem 4. A hierarchical clique HiC(k,h) with k > 1 and h > 0 has average inter-PE distance

d̄ = [ (2k/(k − 1))(1 − k^h) + (2h + 1)k^h − k^{h−1} ] / (k^h − 1).
Proof. Observe that the HiC topology is symmetric from the point of view of the leaves. Without loss of generality then, d̄ can be determined by considering one leaf node, m. The distance from m to another leaf node n is d(m,n). We wish to determine Σ d(m,n) for all n at level 0.

* The sum of distances to m's (k − 1) neighbours is (k − 1).
* At level l (0 < l < h), m has an ancestor u, which is the least common ancestor of m and (k − 1)k^{l−1} leaf nodes. The distance from any of these leaf nodes to m is 2l, so the sum of the distances equals 2l(k − 1)k^{l−1}. Considering all leaf nodes which have a least common ancestor, other than the root, with m, the sum of distances is Σ_{l=1}^{h−1} 2l(k − 1)k^{l−1}.
* Each of u's (k − 1) neighbours at level l is the least common ancestor of one of m's neighbours and (k − 1)k^{l−1} leaf nodes. The distance from any of these leaf nodes to m is 2l + 1, so the sum of the distances equals (k − 1)^2 k^{l−1}(2l + 1). Considering all leaf nodes which have a least common ancestor, other than the root, with one of m's neighbours, the sum of distances is Σ_{l=1}^{h−1} (k − 1)^2 k^{l−1}(2l + 1).

These three cases cover all possible leaf nodes. Their partial results can be combined to give the expression for the total sum of distances.

Σ d(m,n) = (k − 1) + (k − 1) Σ_{l=1}^{h−1} 2l k^{l−1} + (k − 1)^2 Σ_{l=1}^{h−1} k^{l−1}(2l + 1)   (1)
         = (k − 1) ( k^{h−1} + 2 Σ_{l=1}^{h−1} l k^l )   (2)
         = (k − 1)k^{h−1} + 2((h − 1)k^{h+1} − hk^h + k)/(k − 1)   (3)
         = −(2/(k − 1))k^{h+1} + (2h + 1)k^h − k^{h−1} + (2/(k − 1))k.   (4)

The total number of distances is N_P − 1 = k^h − 1,

∴ d̄ = [ −(2/(k − 1))k^{h+1} + (2h + 1)k^h − k^{h−1} + (2/(k − 1))k ] / (k^h − 1). □
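The closed form for d̄ can be compared with a brute-force count from a single leaf, using exact rational arithmetic (a sketch; `build_hic` and the helper names are our construction, not the authors'):

```python
from collections import deque
from fractions import Fraction
from itertools import product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h) under the Section 2 addressing scheme."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)   # tree link to the parent (suffix)
            for d in range(1, k + 1):              # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

def distances(adj, src):
    dist, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1; q.append(u)
    return dist

def avg_inter_pe_distance(k, h):
    """Brute-force d-bar: the HiC is leaf-symmetric, so one source leaf suffices."""
    adj = build_hic(k, h)
    leaves = [v for v in adj if len(v) == h]
    d = distances(adj, leaves[0])
    return Fraction(sum(d[n] for n in leaves[1:]), len(leaves) - 1)

for k, h in [(2, 2), (3, 2), (3, 3), (4, 2)]:
    closed = (Fraction(2 * k, k - 1) * (1 - k**h)
              + (2 * h + 1) * k**h - k**(h - 1)) / Fraction(k**h - 1)
    assert avg_inter_pe_distance(k, h) == closed   # Theorem 4
```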
Values of average inter-PE distance for various configurations of HiC(k,h) are plotted in Fig. 2. The plots clearly show the reduced average distance which is obtained in a system with a given number of PEs by increasing the value of k. This improvement is gained at the cost of extra links and extra switch complexity, which naturally increases the cost of any system. To justify the extra cost, algorithms must perform better with the extra PE connectivity provided by the extra links.
4. Fault-tolerance
In this section we study the fault-tolerance of the HiC
interconnection network. The connectivity, fault diameter, two-terminal reliability and average two-terminal
Fig. 2. Average distance between PEs in HiC (average distance vs. number of PEs, for k = 3, 4, 5).
reliability of the HiC topology are determined. The results are compared with those of the binary hypercube. The binary hypercube is a fault-tolerant topology and its characteristics have been widely studied and reported [6,14,15]. This makes it an ideal topology to use as a benchmark for fault-tolerance. In general, any system with fault-tolerance greater than or near to that of the binary hypercube can be considered to have good fault-tolerance.
4.1. Connectivity
We first prove some theorems regarding basic graph-theoretic properties of HiCs.
Theorem 5. The degree d(HiC(k,h)) = 2k.

Proof. d(G) = max(d_m) : ∀m ∈ V(G). In HiC(k,h), degree d_m = k when m is the root node or any of the leaf nodes, and d_m = 2k when m is any other node. □
Theorem 6. The connectivity κ(HiC(k,h)) = k.

Proof. In HiC(k,h), k node-disjoint paths exist from any leaf m to any other node via m's parent and k − 1 neighbours. The root node has k node-disjoint paths to any other node via its k children. Any node n at level l, where 0 < l < h, has k node-disjoint paths to descendants, or descendants of n's neighbours, via n's k − 1 neighbours and the appropriate one of n's children. Node n has 2k node-disjoint paths to its neighbours via its k
children, k − 1 neighbours and single parent. Node n has k node-disjoint paths to all other nodes via n's parent and k − 1 neighbours. In HiC(k,h), therefore, a minimum of k node-disjoint paths exist between any two nodes. The HiC(k,h) is therefore (k − 1) node fault-tolerant. □
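Theorem 6 can be checked exhaustively on a small instance. The sketch below (our construction, with `build_hic` following the Section 2 addressing rules) confirms that no fault set of fewer than k nodes disconnects HiC(3,2), while the k switches at level h − 1 do form a cut set:

```python
from itertools import combinations, product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h); a node is its digit tuple, the root the empty tuple."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)   # tree link to the parent (suffix)
            for d in range(1, k + 1):              # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

def connected_after(adj, removed):
    """True if the graph remains connected once the nodes in `removed` are deleted."""
    removed = set(removed)
    alive = [v for v in adj if v not in removed]
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        for u in adj[stack.pop()]:
            if u not in removed and u not in seen:
                seen.add(u); stack.append(u)
    return len(seen) == len(alive)

k, h = 3, 2
adj = build_hic(k, h)
for r in range(1, k):   # every fault set of fewer than k nodes leaves the HiC connected ...
    assert all(connected_after(adj, c) for c in combinations(adj, r))
# ... but the level-(h-1) clique {1, 2, 3} isolates the root, so kappa = k
assert not connected_after(adj, [(1,), (2,), (3,)])
```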
Theorem 7. The link-connectivity λ(HiC(k,h)) = k.

Proof. In HiC(k,h), k link-disjoint paths exist from any leaf m to any other node via m's parent and k − 1 neighbours. The root node has k link-disjoint paths to any other node via its k children. Any node n at level l, where 0 < l < h, has k link-disjoint paths to descendants, or descendants of n's neighbours, via n's k − 1 neighbours and the appropriate one of n's children. Node n has 2k link-disjoint paths to its neighbours via its k children, k − 1 neighbours and single parent. Node n has k link-disjoint paths to all other nodes via n's parent and k − 1 neighbours. In HiC(k,h), therefore, a minimum of k link-disjoint paths exist between any two nodes. The HiC(k,h) is therefore (k − 1) link fault-tolerant. □
A more accurate reflection of the practical fault-tolerance of the network is given by determining the probability that a fault set of κ nodes or λ links disconnects the network [5].
Theorem 8. The probability p of an HiC(k,h) with N nodes being disconnected by a k-node fault set is

p = [ (k^h(2^k − 2) + k^{h−1} − k(2^k − 2) − 1) / (k − 1) ] × (N − k)! k! / N!.

Proof. The total number F of k-node fault sets possible is F = (N choose k), which can be represented as F = N!/((N − k)! k!). Let T_l be the number of k-node fault sets which can occur at level l which cause disconnection of the network. Let T = Σ_{l=0}^{h} T_l be the total number of k-node fault sets which cause disconnection of the network. At level l = h there is only the root node, therefore T_h = 0. At level l = h − 1 disconnection will occur only if all k nodes in a clique are faulty. There is only one clique at this level, therefore T_{h−1} = 1. At level l, where 0 < l < h − 1, disconnection of the network will occur if n (0 < n ≤ k) nodes in a clique are faulty, and the parents of the (k − n) non-faulty nodes in the clique are faulty. The number of ways such a failure can occur at a particular clique is 2^k − 1. The total number of cliques for 0 < l < h − 1 is Σ_{l=1}^{h−2} k^{h−l−1} = (k^{h−1} − k)/(k − 1) for k ≠ 1. Therefore Σ_{l=1}^{h−2} T_l = (2^k − 1)(k^{h−1} − k)/(k − 1). At level l = 0, disconnection of the network will occur if n (0 < n < k) nodes in a clique are faulty, and the parents of the (k − n) non-faulty nodes in the clique are faulty. The number of ways such a failure can occur at a particular clique is 2^k − 2. The total number of cliques for l = 0 is k^{h−1}. Therefore T_0 = k^{h−1}(2^k − 2). Summing these terms we obtain

T = T_h + T_{h−1} + Σ_{l=1}^{h−2} T_l + T_0   (5)
  = 1 + (2^k − 1)(k^{h−1} − k)/(k − 1) + k^{h−1}(2^k − 2)   (6)
  = (k^{h−1} − 1)(2^k k − 2k + 1)/(k − 1). □   (7)

The probability of any given k-node fault set causing disconnection is given by T/F.
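Both closed forms of T, and an exhaustive count of disconnecting k-node fault sets, can be compared on small instances (our sketch, reusing the Section 2 construction; function names are ours):

```python
from itertools import combinations, product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h); a node is its digit tuple, the root the empty tuple."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)   # tree link to the parent (suffix)
            for d in range(1, k + 1):              # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

def connected_after(adj, removed):
    removed = set(removed)
    alive = [v for v in adj if v not in removed]
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        for u in adj[stack.pop()]:
            if u not in removed and u not in seen:
                seen.add(u); stack.append(u)
    return len(seen) == len(alive)

def T_closed(k, h):
    """Equation (7): the number of disconnecting k-node fault sets."""
    return (k**(h - 1) - 1) * (2**k * k - 2 * k + 1) // (k - 1)

for k, h in [(2, 2), (3, 2)]:
    # the numerator in the theorem statement agrees with the factored form (7) ...
    num = k**h * (2**k - 2) + k**(h - 1) - k * (2**k - 2) - 1
    assert num // (k - 1) == T_closed(k, h)
    # ... and both agree with an exhaustive count over all k-node fault sets
    adj = build_hic(k, h)
    count = sum(1 for c in combinations(adj, k) if not connected_after(adj, c))
    assert count == T_closed(k, h)
```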
4.2. Fault diameter
From Theorem 6 we have κ(HiC(k,h)) = k. The fault diameter f(HiC(k,h)) is the largest diameter of the network in the presence of a fault set of k − 1 nodes. Determining f(HiC(k,h)) involves computing max d(m,n) for each pair of PEs in HiC(k,h) given any k − 1 node fault set.

Let S(G) be the set of all pairs of nodes in V(G). As we are concerned with the ability of a network to provide communication between PEs, let S(HiC(k,h)) be the set of all pairs of leaf nodes in V(HiC(k,h)). The pairs of leaf nodes within S(HiC(k,h)) can be classified according to the distance d(m,n). We divide S(HiC(k,h)) into three classes and prove a lemma on each.
Lemma 9. If d(m,n) = 1 there are k node- and link-disjoint paths between m and n of length at most d(m,n) + 2.

Proof. If d(m,n) = 1 then m and n are in the same clique. Node m has address M = ⟨M_0, …, M_{h−1}⟩ and node n has address N = ⟨N_0, …, N_{h−1}⟩. Let u be a leaf node which is a neighbour of m and of n.

(1) There is one path of length 1: m → n.
(2) There are k − 2 paths of length 2: m → one of k − 2 possible nodes u → n.
(3) There is one path of length 3: m → m's parent → n's parent → n. □
Lemma 10. If d(m,n) > 1 and d(m,n) is ODD there are k node- and link-disjoint paths between m and n of length at most d(m,n) + 1.

Proof. If d(m,n) > 1 and d(m,n) is ODD then m and n are not in the same clique and do not share a common ancestor other than the root. Node m has address M = ⟨M_0, …, M_{h−1}⟩ and node n has address N = ⟨N_0, …, N_{h−1}⟩. Let node u with address U = ⟨M_0, …, M_{h−2}, N_{h−1}⟩ and node v with address V = ⟨N_0, …, N_{h−2}, M_{h−1}⟩ be two other leaf nodes. Then u is a neighbour of m and the least common ancestor of u and n, denoted by lca(u,n), exists at level l, where 0 < l < h. Also, v is a neighbour of n and lca(v,m) exists at level l, where 0 < l < h. Let u′ be any leaf node other than u which is a neighbour of m. Let v_{u′} be that leaf node which is a neighbour of n such that lca(v_{u′}, u′) exists at level l.

(1) There is one path of length d(m,n): m → lca(m,v) → v → n.
(2) There is one path of length d(m,n): m → u → lca(u,n) → n.
(3) There are k − 2 paths of length d(m,n) + 1, each including: m → one of k − 2 possible nodes u′ → v_{u′} → n. □
Lemma 11. If d(m,n) is EVEN there are k node- and link-disjoint paths between m and n of length at most d(m,n) + 2.

Proof. If d(m,n) is EVEN then m and n are not in the same clique but share a common ancestor other than the root; that is, lca(m,n) exists at level l, where 1 ≤ l < h. Let u be any leaf node which is a neighbour of m. Let v_u be that leaf node which is a neighbour of n such that lca(v_u, u) exists at level l.

(1) There is one path of length d(m,n): m → lca(m,n) → n.
(2) There are k − 1 paths of length d(m,n) + 2, each including: m → one of k − 1 possible nodes u → v_u → n. □
Lemma 12. In an HiC(k,h) with h ≥ 2, any two leaf nodes have at least k node-disjoint paths between them of length 2h or less.

Proof. From Lemma 9, if d(m,n) = 1 there are k node-disjoint paths of maximum length 3 between m and n. For h ≥ 2, 3 ≤ 2h − 1.

From Theorem 3, D(HiC(k,h)) = 2h − 1, which implies that D(HiC) is always ODD. Therefore max(d(m,n)) = D(HiC) if d(m,n) is ODD. From Lemma 10, if d(m,n) is ODD, there are k node-disjoint paths of maximum length d(m,n) + 1 between m and n. Combining these two results, we see that nodes m and n with d(m,n) ODD have k node-disjoint paths of maximum length 2h between them.

From Lemma 11, if d(m,n) is EVEN there are k node-disjoint paths between m and n of length at most d(m,n) + 2. But distance d(m,n) is EVEN if and only if lca(m,n) exists at level l, where 1 ≤ l < h. For l < h path length 2l + 2 is never greater than 2h. Therefore there are k node-disjoint paths of maximum length 2h.

These three cases include all possible pairs of leaf nodes, hence the proof. □
Theorem 13. A hierarchical clique HiC(k,h) has a fault diameter f = 2h for k > 1 and h > 1.

Proof. Let m and n be two leaf nodes at distance d(m,n). From Theorem 3 we know that at least one pair of nodes m, n exists with distance d(m,n) = D(HiC(k,h)) = 2h − 1.

If k > 2 then a fault set of k − 1 nodes can break all the paths between m and n of length 2h − 1.

If k = 2 then a fault set of k − 1 nodes cannot break all the paths between m and n of length 2h − 1, since the proof of Lemma 10 shows that there are always two paths of length d(m,n) if d(m,n) is ODD. However, if d(m,n) = 2h − 2 then the proof of Lemma 11 shows that only a single path of length 2h − 2 exists, with a further k − 1 paths of length 2h. Thus a fault set of k − 1 nodes can increase d(m,n) from 2h − 2 to 2h.

From Lemma 12 we see that a fault set of k − 1 nodes cannot increase d(m,n) to more than 2h regardless of the value of k. The largest diameter in the presence of a k − 1 node fault set is therefore 2h. □
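For the smallest interesting case, k = 2, this can be verified exhaustively. The sketch below (our code, reusing the Section 2 construction) removes every possible single node (k − 1 = 1) from HiC(2,3) and measures the worst surviving leaf-to-leaf distance:

```python
from collections import deque
from itertools import product

def build_hic(k, h):
    """Adjacency sets of HiC(k,h); a node is its digit tuple, the root the empty tuple."""
    nodes = [t for n in range(h + 1) for t in product(range(1, k + 1), repeat=n)]
    adj = {v: set() for v in nodes}
    for v in nodes:
        if v:
            adj[v].add(v[1:]); adj[v[1:]].add(v)   # tree link to the parent (suffix)
            for d in range(1, k + 1):              # clique links: same digits except the last
                if d != v[-1]:
                    adj[v].add(v[:-1] + (d,))
    return adj

def distances(adj, src, blocked=frozenset()):
    """BFS hop counts from src, never entering a blocked (faulty) node."""
    dist, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist and u not in blocked:
                dist[u] = dist[v] + 1; q.append(u)
    return dist

k, h = 2, 3
adj = build_hic(k, h)
leaves = [v for v in adj if len(v) == h]

def worst_leaf_distance(fault):
    worst = 0
    for m in leaves:
        if m in fault:
            continue
        d = distances(adj, m, blocked=fault)
        worst = max(worst, max(d[n] for n in leaves if n not in fault))
    return worst

fault_diam = max(worst_leaf_distance({f}) for f in adj)   # all (k - 1)-node fault sets
assert fault_diam == 2 * h   # Theorem 13: f = 2h
```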
In [7] two classes of graphs based on fault diameter were distinguished.

Definition 14. A class of graphs G_i is strongly resilient if, for all i, there exists a constant t such that f(G_i) ≤ D(G_i) + t.

Definition 15. A class of graphs G_i is weakly resilient if, for all i, there exists a constant t such that f(G_i) ≤ D(G_i) × t.

The HiC(k,h) is strongly resilient, since f = D + 1. This indicates that even under maximally faulty conditions the performance of the HiC(k,h) will not be severely degraded.

In Fig. 3 the fault diameter of the binary n-cube is compared with that of HiC(4,h) and HiC(3,h) for a range of network sizes. The fault diameter of the HiC(3,h) increases most rapidly with increasing network size, followed by the n-cube. The HiC(4,h) clearly has the lowest fault diameter for all network sizes considered.
Fig. 3. The fault-diameter of n-cube and HiC (fault diameter vs. number of PEs, for HiC(3,h), the n-cube, and HiC(4,h)).
Given HiC(k1,h1) and HiC(k2,h2) with identical numbers of PEs, if k1 > k2 then f(HiC(k1,h1)) < f(HiC(k2,h2)).
4.3. Two-terminal reliability
The two-terminal reliability or path reliability R2(m,n) between a pair of nodes m and n in V(G) is defined as the probability of finding a path entirely composed of operational links and nodes between m and n [3]. Let S(G) be the set of all pairs of nodes in V(G); then the two-terminal reliability R2(G) of graph G is defined as min R2(m,n) : ∀(m,n) ∈ S(G). As we are concerned with the ability of a network to provide communication between processors, the two-terminal reliability for the HiC deals only with leaf nodes. Let S(HiC(k,h)) be the set of all pairs of leaf nodes in HiC(k,h). In this section a lower bound for R2(HiC(k,h)) will be determined.

From Theorem 7, there are at least k link-disjoint paths between any given pair of leaf nodes m and n in HiC(k,h). Assume that link failures are statistically independent and occur randomly in time. A lower bound ⌊R2(m,n)⌋ can be determined by establishing the probability of at least one of the k link-disjoint paths between leaf nodes m and n being completely operational. Using Lemmas 9–11, min(⌊R2(m,n)⌋) : ∀(m,n) ∈ S(HiC(k,h)) can be obtained; this is a lower bound on R2(HiC(k,h)). Assume that all links have a probability p of being operational. Then the probability of any path of length l being operational is p^l and the probability of failure of such a path is 1 − p^l. The probability of failure of i link-disjoint paths, all of length l, is given by (1 − p^l)^i. Therefore, the probability that at least one of i link-disjoint paths of length l is operational is 1 − (1 − p^l)^i.
Theorem 16. If h > 1 then ⌊R2(HiC(k,h))⌋ = 1 − (1 − p^{D(HiC)+1})^k.

Proof. From Theorem 3, D(HiC(k,h)) = 2h − 1, which implies that D(HiC) is always ODD. Therefore, max(d) = D(HiC) if d is ODD and max(d) = D(HiC) − 1 if d is EVEN. From Lemma 11 we know that if d(m,n) is EVEN there are k link-disjoint paths between m and n of length at most d(m,n) + 2. Therefore a lower bound ⌊R2(m,n)⌋ is given by 1 − (1 − p^{d(m,n)+2})^k. But max(d) = D(HiC) − 1 if d is EVEN. Therefore min(1 − (1 − p^{d+2})^k) = 1 − (1 − p^{D(HiC)+1})^k.

From Lemma 10 we know that if d(m,n) > 1 is ODD there are k link-disjoint paths between m and n of length at most d(m,n) + 1. Therefore a lower bound ⌊R2(m,n)⌋ is given by 1 − (1 − p^{d(m,n)+1})^k. But max(d) = D(HiC) if d is ODD. Therefore min(1 − (1 − p^{d+1})^k) = 1 − (1 − p^{D(HiC)+1})^k.

A lower bound ⌊R2(HiC(k,h))⌋ is given by min(⌊R2(m,n)⌋) : ∀(m,n) ∈ S(HiC(k,h)). Since min(⌊R2(m,n)⌋) is given by 1 − (1 − p^{D(HiC)+1})^k if d(m,n) is EVEN or if d(m,n) is ODD and greater than one, and if h > 1, we conclude that ⌊R2(HiC(k,h))⌋ is given by 1 − (1 − p^{D(HiC)+1})^k. □
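Under the stated independence assumption the bound of Theorem 16 is a one-line function; the sketch below (function name ours) evaluates it:

```python
def r2_lower_bound(k, h, p):
    """Theorem 16 (h > 1): the probability that at least one of the k link-disjoint
    paths of length at most D + 1 = 2h is fully operational, each link being
    independently operational with probability p."""
    D = 2 * h - 1                      # diameter, from Theorem 3
    return 1.0 - (1.0 - p ** (D + 1)) ** k

# The bound improves both with link reliability p and with clique size k:
assert r2_lower_bound(3, 2, 0.95) > r2_lower_bound(3, 2, 0.90)
assert r2_lower_bound(4, 2, 0.90) > r2_lower_bound(2, 2, 0.90)
```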
4.4. Average two-terminal reliability
While two-terminal reliability can give an indication of the likelihood of failure of communications between specific pairs of nodes, or even classes of pairs of nodes, it gives no indication of the average path reliability of a pair of nodes in the graph. A graph with a few short, reliable links and many long, unreliable ones is indistinguishable from a graph with many short, reliable links and few long, unreliable ones. Average two-terminal reliability overcomes this shortcoming. Average two-terminal reliability R̄2(G) is defined as (1/|S(G)|) Σ R2(m,n) : ∀(m,n) ∈ S(G), where |S(G)| = (|V(G)| choose 2). In this section we derive a lower bound on average two-terminal reliability for the HiC(k,h). A lower bound on the average two-terminal reliability of the binary hypercube will also be derived, and compared with that of the HiC for a range of system sizes.

A network is symmetric if it is isomorphic to itself with any node labelled as the origin. A symmetric network appears the same if viewed from the perspective of any node. Therefore, all values of R2(m,n) can be determined by considering only those sets of nodes in S(G) which contain a given node m. Define S(G,m) ⊆ S(G) as the set of all pairs of nodes in V(G) which contain node m. For symmetric networks, then, we can redefine R̄2(G) as (1/(|V(G)| − 1)) Σ R2(m,n) : ∀(m,n) ∈ S(G,m). Symmetric networks have a further property, however: there is a constant κ such that R2(m,n) = κ : ∀(m,n) ∈ S(G) with a given distance d. Define S(G,m,d) ⊆ S(G) as the set of all pairs of nodes in V(G) which contain node m and have a given value of distance d. We can now redefine R̄2(G) as

R̄2(G) = [ Σ ((R2(m,n) : ∀(m,n) ∈ S(G,m,d)) × |S(G,m,d)|) : ∀S(G,m,d) ∈ S(G,m) ] / (|V(G)| − 1).   (8)
Since for the HiC we are concerned only with the PEs, we can replace |V(G)| − 1 with k^h − 1.

Recall that Theorem 16 established lower bounds ⌊R2(m,n)⌋ for all pairs of distinct nodes (m,n) ∈ S(HiC,m):

(1) ⌊R2(m,n)⌋ is 1 − (1 − p^3)^k : ∀(m,n) ∈ S(HiC, m, d = 1).
(2) ⌊R2(m,n)⌋ is 1 − (1 − p^{d+2})^k : ∀(m,n) ∈ S(HiC, m, d EVEN).
(3) ⌊R2(m,n)⌋ is 1 − (1 − p^{d+1})^k : ∀(m,n) ∈ S(HiC, m, d ODD > 1).

It remains to determine |S(HiC, m, d)| for each S(HiC, m, d) ∈ S(HiC, m).

(1) If d(m,n) = 1 then m and n are neighbours, and |S(HiC, m, d = 1)| = k − 1.
(2) If d(m,n) is even then m and n share a common ancestor, and |S(HiC, m, d EVEN)| = k^{d/2} − k^{(d−2)/2}.
(3) If d(m,n) > 1 is odd then m and n are not in the same clique and do not share a common ancestor, so |S(HiC, m, d ODD > 1)| = (k^{(d−1)/2} − k^{(d−3)/2})(k − 1).

Thus a lower bound ⌊R̄2(HiC(k,h))⌋ is made up of the sum of three terms. The first term comes from node pairs with a distance of 1.

Term1 = ((k − 1)/(k^h − 1)) (1 − (1 − p^3)^k).

The second term comes from node pairs with an even distance.

Term2 = Σ_{d/2=1}^{h−1} (1 − (1 − p^{d+2})^k) (k^{d/2} − k^{(d−2)/2}) / (k^h − 1)   (9)
      = ((k − 1)/(k(k^h − 1))) Σ_{d/2=1}^{h−1} k^{d/2} (1 − (1 − p^{d+2})^k).   (10)

The third term comes from node pairs with an odd distance greater than 1.

Term3 = Σ_{(d−1)/2=1}^{h−1} (1 − (1 − p^{d+1})^k) (k^{(d−1)/2} − k^{(d−3)/2})(k − 1) / (k^h − 1)   (11)
      = ((k − 1)^2/(k(k^h − 1))) Σ_{(d−1)/2=1}^{h−1} k^{(d−1)/2} (1 − (1 − p^{d+1})^k).   (12)
⌊R̄2(HiC(k,h))⌋ is Term1 + Term2 + Term3.

We now determine a lower bound ⌊R̄2(Q)⌋ on the average two-terminal reliability for a hypercube Q of dimension n. Since the hypercube is symmetric, the expression for R̄2(G) given in (8) is applicable. The total number of node pairs considered is given by |V(Q)| − 1 = 2^n − 1. We can also determine that |S(Q,m,d)| = n!/((n − d)! d!). A lower bound on the two-terminal reliability ⌊R2(Q)⌋ was determined in [5]. If the probability of a link being operational is p, then

⌊R2(Q)⌋ = min(1 − ((1 − p^d)^d (1 − p^{d+2})^{n−d}))

over all distances 0 < d ≤ n. Therefore, a lower bound is given by

⌊R̄2(Q)⌋ = (n!/(2^n − 1)) Σ_{d=1}^{n} (1 − ((1 − p^d)^d (1 − p^{d+2})^{n−d})) / ((n − d)! d!)   (13)
         = 2^n/(2^n − 1) − (1/(2^n − 1)) Σ_{d=1}^{n} (n choose d) (1 − p^d)^d (1 − p^{d+2})^{n−d}.   (14)
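The three HiC terms and the hypercube expression can be transcribed directly; in the sketch below the function names are ours and the hypercube form implements (14) exactly as printed:

```python
from math import comb

def hic_avg_r2_lower_bound(k, h, p):
    """Term1 + Term2 + Term3 of Eqs. (9)-(12); p is the per-link operational probability."""
    denom = k**h - 1
    term1 = (k - 1) / denom * (1 - (1 - p**3) ** k)
    # Even distances d = 2j use p^(d+2) and odd distances d = 2j + 1 use p^(d+1),
    # so both reduce to the same inner factor p^(2j+2):
    inner = sum(k**j * (1 - (1 - p ** (2 * j + 2)) ** k) for j in range(1, h))
    term2 = (k - 1) / (k * denom) * inner
    term3 = (k - 1) ** 2 / (k * denom) * inner
    return term1 + term2 + term3

def qn_avg_r2_lower_bound(n, p):
    """Equation (14) for the binary n-cube, as printed."""
    s = sum(comb(n, d) * (1 - p**d) ** d * (1 - p ** (d + 2)) ** (n - d)
            for d in range(1, n + 1))
    return 2**n / (2**n - 1) - s / (2**n - 1)

# Section 4.5 quotes over 0.9 for HiC(4,4) at p = 0.9:
assert hic_avg_r2_lower_bound(4, 4, 0.9) > 0.9
```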
4.5. Comparison
The lower bounds determined for the average two-terminal reliability of the hypercube, HiC(4,h) and HiC(3,h) are plotted against the number of PEs in Fig. 4. The lower bound on the two-terminal reliability of the hypercube rises with network dimension. This is as expected, since the connectivity and node degree of a hypercube also increase with dimension. The HiC(k,h) network, by contrast, has fixed connectivity and node degree; consequently the lower bound decreases with network size. The rate of decrease is much lower in the HiC(4,h) than in the HiC(3,h). Further increasing the value of k leads to higher bounds for the value of average two-terminal reliability, but at the cost of increased node degree. An HiC(4,4) has 256 PEs, node degree of 8 and an average two-terminal reliability of over 0.9 with an operational link probability of p = 0.9. With the same operational link probability, a hypercube
Fig. 4. The average two-terminal reliability of hypercube and HiC.
if ð0oloh � 1Þ then
if /Cl ;y;Ch�1S ¼ /Dl ;y;Dh�1S then
Route message to child /Dl�1;y;Dh�1Selse
if /Cl ;y;Ch�2S ¼ /Dl ;y;Dh�2S then
Route message to neighbouring SE/Dl ;y;Dh�1S
else
S. Campbell et al. / J. Parallel Distrib. Comput. 64 (2004) 16–2824
of the same size has an average two-terminal reliabilityof nearly 1 but node degree of 16.
Route message to parentelse
if ðl ¼ h � 1Þ then
if /ClS ¼ /DlS then
Route message to child /Dl�1;DlSelse
Route message to neighbouring SE /DlS
5. Message routing algorithms
5.1. One to one communication
PEs and SEs handle message routing in differentways. PEs either source or receive messages, SEs serveonly as intermediate nodes for PE to PE communica-tion. Let the address of the current node be/Cl ;y;Ch�1S where ð0plohÞ and the destinationaddress be /D0;y;Dh�1S:A PE sending a message compares /C0;y;Ch�2S
with /D0;y;Dh�2S: If they are not the same themessage is routed to the PE’s parent. If they are thesame then the destination PE is a neighbour and themessage is passed directly. This is expressed formally inAlgorithm 1.
if /C0;y;Ch�2S ¼ /D0;y;Dh�2S then
Route message to neighbouring PE /D0;y;Dh�1Selse
Route message to parentif l ¼ h � 1 then
Send message to all neighbourselse
if /Cl ;y;Ch�1S ¼ /Sl ;y;Sh�1S then
Send message to parent and all children suchthat/Cl�1;y;Ch�1Sa/Sl�1;y;Sh�1S
else
Send message to all children
Algorithm 1. HiC PE message routing algorithm.
A SE at level l has address /Cl ;y;Ch�1S forð0oloh � 1Þ:Upon receipt of a message a SE comparesits own address with the corresponding digits from thedestination node address. If they are the same the SE isan ancestor of the destination node, and the message canbe routed to the child node which is either the
destination or an ancestor of the destination. If theaddresses differ only in the rightmost digit, then aneighbour of the SE is an ancestor of the destinationnode, and the message is routed via that neighbour. Inany other case the message is routed further up thehierarchy, to the SE’s parent. An SE at level l ¼ ðh � 1Þhas address ðClÞ: If ðClÞ ¼ ðDlÞ then the SE is anancestor of the destination node, and the message isrouted to the child node which is the destination or anancestor of the destination. Otherwise the message isrouted to the SE’s neighbour which is an ancestor of thedestination node. This is expressed formally in Algo-rithm 2.
Algorithm 2. HiC SE message routing algorithm.
5.2. One to many communication
The HiC broadcast algorithms are very simple. Thebroadcast algorithm used by the source PE, with address/S0;y;Sh�1S; is trivial. The source PE routes themessage only to its parent. For a SE at level l; whereð0olph � 1Þ; the broadcast algorithm is shown inAlgorithm 3.
Algorithm 3. HiC SE broadcast algorithm.
ARTICLE IN PRESSS. Campbell et al. / J. Parallel Distrib. Comput. 64 (2004) 16–28 25
Note that the message is transmitted to neighboursonly at level l ¼ h � 1: At all other levels only parent orchild links are used. This is to prevent PEs fromreceiving duplicate messages.
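To illustrate Algorithms 1 and 2 concretely, the sketch below simulates one-to-one routing on an HiC(4,3). It rests on our reading of the addressing scheme (a reconstruction, not taken verbatim from the paper): a node at level l is identified by the digit tuple <C_l,...,C_{h-1}> with digits drawn from 1..k, a parent is obtained by dropping the leading digit, and clique neighbours differ only in the trailing digit.

```python
from itertools import product

K, H = 4, 3  # an HiC(4,3): clique size k = 4, height h = 3

def route(src, dst):
    """Hop count from PE src to PE dst, following Algorithms 1 and 2."""
    if src[:H - 1] == dst[:H - 1]:        # Algorithm 1: dst is a clique neighbour
        return 1
    node, level, hops = src[1:], 1, 1     # otherwise route to the parent SE
    while not (level == 0 and node == dst):
        if level < H - 1:                 # Algorithm 2, SE below the top level
            if node == dst[level:]:               # ancestor of dst: descend
                node, level = dst[level - 1:], level - 1
            elif node[:-1] == dst[level:H - 1]:   # a clique neighbour is an ancestor
                node = dst[level:]
            else:                                 # keep climbing
                node, level = node[1:], level + 1
        else:                             # Algorithm 2, top-level SE (l = h-1)
            if node == dst[H - 1:]:
                node, level = dst[H - 2:], H - 2
            else:
                node = dst[H - 1:]
        hops += 1
    return hops

# Every PE pair should be reachable in at most 2h - 1 hops.
pes = list(product(range(1, K + 1), repeat=H))
assert all(1 <= route(s, d) <= 2 * H - 1 for s in pes for d in pes if s != d)
```

The 2h - 1 bound mirrors the structure of the algorithm: at most h - 1 parent hops up, one clique hop across, and h - 1 child hops down.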
6. Network embeddings

The ability of a parallel computer's interconnection network to emulate other networks allows it to effectively employ algorithms and data structures developed for different parallel architectures. The potential quality of such an emulation can be determined from the parameters of the embedding. Let G and H be undirected graphs, where G represents a guest graph and H represents a host graph. Using the terminology of [11], an embedding of G into H is a mapping F from the nodes of G to the nodes of H. The mapping from any node g in G to a node h in H is represented by F(g) -> h. The dilation of an embedding F is the maximum distance in the host between the images of adjacent guest nodes. The expansion of the embedding F is the ratio of the number of nodes in the host graph to the number of nodes in the guest graph, i.e. |V(H)|/|V(G)|. Since expansion is primarily a measure of how effectively an embedding utilises the available computing resources, we derive it using only the number of PEs in a system and disregard the SEs. The number of edges of G routed through edge e in E(H) in embedding F is c(e). The edge congestion of F is max(c(e)) over all e in E(H).

We have chosen to study embeddings of two of the most popular interconnection networks: two-dimensional meshes and binary hypercubes. A wide variety of algorithms have been developed for these two topologies, so efficient embeddings ensure easy availability of well-tested, high-performance algorithms for use on the hierarchical cliques network when resources are not available to develop custom algorithms. Binary structures such as meshes or binary hypercubes map most naturally onto the HiC(k,h) when k is a power of two. In the following work, all mappings are made to HiC(4,h).

6.1. Two-dimensional meshes

A mesh M of dimension 2^h x 2^h can be embedded into HiC(4,h). A node of mesh M is identified as (x, y), where 0 <= x, y < 2^h. For HiC(4,h) the address of a leaf node m is represented by M = <M_0,...,M_{h-1}>. For any mesh node (x, y) where x is even, mesh nodes (x, y) and (x+1, y) will map to neighbours in the HiC(4,h). Similarly, for any mesh node (x, y) where y is even, mesh nodes (x, y) and (x, y+1) will map to neighbours in the HiC(4,h). For any mesh node (x, y) where x is odd, mesh nodes (x, y) and (x+1, y) share a common ancestor at level l. Similarly, for any mesh node (x, y) where y is odd, mesh nodes (x, y) and (x, y+1) share a common ancestor at level l. The level l of the common ancestor is determined by the value of x or y. The embedding is described in Algorithm 4.

Algorithm 4. 2D mesh embedding.
  F(0,0) -> m with address <M_0,...,M_{h-1}> such that M_i = 1 for 0 <= i <= h-1
  for x = 0 -> 2^h - 1 do
    if x < 2^h - 1 then
      CALL MAPX(x, 0)
    for y = 0 -> 2^h - 2 do
      CALL MAPY(x, y)

Algorithm 5. 2D mesh embedding procedure MAPX.
  PROCEDURE MAPX(x, y)
  Given that F(x, y) -> m with address <M_0,...,M_{h-1}>
  if x is EVEN then
    if M_{h-1} is EVEN then
      F(x+1, y) -> n with address <M_0,...,M_{h-1} - 1>
    else
      F(x+1, y) -> n with address <M_0,...,M_{h-1} + 1>
  else
    if M_{h-1} is EVEN then
      F(x+1, y) -> n with address <M_0 + 1, M_1,...,M_{h-1}>
    else
      F(x+1, y) -> n with address <N_0,...,N_{h-1}>, where
        k = 0; i = 1
        repeat
          if floor((x+1)/4^i) mod 4 is ODD then
            k = 2i
          else if floor((x+1)/4^i) mod 4 = 2 then
            k = 2i + 1
          else
            i = i + 1
        until k != 0
        for i = 0 -> k-1 do
          if M_i is EVEN then
            N_i = M_i - 1
          else
            N_i = M_i + 1
        for i = k -> h-1 do
          N_i = M_i

Algorithm 6. 2D mesh embedding procedure MAPY.
  PROCEDURE MAPY(x, y)
  Given that F(x, y) -> m with address <M_0,...,M_{h-1}>
  if y is EVEN then
    if M_{h-1} <= 2 then
      F(x, y+1) -> n with address <M_0,...,M_{h-1} + 2>
    else
      F(x, y+1) -> n with address <M_0,...,M_{h-1} - 2>
  else
    if M_{h-1} > 2 then
      F(x, y+1) -> n with address <M_0 + 2, M_1,...,M_{h-1}>
    else
      F(x, y+1) -> n with address <N_0,...,N_{h-1}>, where
        k = 0; i = 1
        repeat
          if floor((y+1)/4^i) mod 4 is ODD then
            k = 2i
          else if floor((y+1)/4^i) mod 4 = 2 then
            k = 2i + 1
          else
            i = i + 1
        until k != 0
        for i = 0 -> k-1 do
          if M_i > 2 then
            N_i = M_i - 2
          else
            N_i = M_i + 2
        for i = k -> h-1 do
          N_i = M_i

This embedding has dilation 2(h-1), expansion 1 and edge congestion 2^{h-1}.

6.2. Binary hypercubes

A binary hypercube Q(n) can be embedded into HiC(4,floor((n+1)/2)). For any Q(n) the address of a node q is represented as a binary string (q_{n-1},...,q_0) [11]. For any binary string (q_x,...,q_{x-y}), where y <= x, we represent the decimal value of the binary number (q_x,...,q_{x-y}) by (q_x,...,q_{x-y})_10. For HiC(4,h) the address of a leaf node m is represented by M = <M_0,...,M_{h-1}>. A 2-cube can be trivially embedded into HiC(4,1) with dilation, edge congestion and expansion all equal to one. A 3-cube can be mapped onto HiC(4,2) as two 2-cubes on separate cliques, with each node also connected to the appropriate node in the other clique via their common parent. This mapping has dilation and expansion of two and congestion of one. For each further increment in the value of n, if n becomes odd the height of the HiC(4,h) into which it is embedded will increase by one and only half the PEs will be used. If n becomes even the height will remain the same and the other half of the PEs will be used. The embedding is described in Algorithm 7.

Algorithm 7. Binary hypercube embedding.
  for all q in Q(n) do
    if n is EVEN then
      F(q) -> m, where
        for i = 0 to floor(n/2) - 1 do
          M_i = (q_{n-2i-1}, q_{n-2i-2})_10 + 1
    else
      F(q) -> m, where
        M_{h-1} = (q_1, q_0)_10 + 1
        M_{h-2} = (q_{n-1})_10 + 1
        for i = 1 to floor(n/2) - 1 do
          M_{i-1} = (q_{n-2i}, q_{n-2i-1})_10 + 1

The dilation of this embedding is derived in the following way. A node q of Q(n) has address (q_{n-1},...,q_0). If a node p of Q(n) is one of the n nodes directly connected to q, then p's address differs from that of q in that one bit q_x is replaced by its complement, where 0 <= x <= n-1. From Algorithm 7 it is clear that, if F(q) -> m and F(p) -> n, then the address of n differs from that of m in exactly one digit M_y, where 0 <= y <= h-1. If y = h-1 then m and n are neighbours. If 0 <= y < h-1 then lca(m, n) occurs at level y+1. The maximum distance between adjacent nodes of Q(n) is therefore 2((h-2)+1) = 2(h-1).

The congestion of this embedding is derived as follows. For n <= 3 the congestion is equal to one; that is, for n = 3 the congestion is 2^{n-3}. For each subsequent increment in the value of n, if n becomes even the height of the HiC(4,h) into which it is embedded remains the same, but the number of edges routed through the links from nodes 1-4 doubles as the number of embedded nodes doubles. If n becomes odd the height of the HiC(4,h) into which it is embedded will increase by one, but the number of edges routed through the links from nodes 1-4 still doubles, as once again the number of embedded nodes doubles. Thus for each increase in n the congestion doubles, and the system congestion can be expressed as 2^{n-3} for n >= 3.

This embedding has:
  dilation 2(h-1);
  edge congestion 2^{n-3};
  expansion 1 if n is even, 2 if n is odd.
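For even n, the address computation of Algorithm 7 and the dilation bound above can be checked mechanically. The sketch below follows our reading of the extraction-damaged subscripts (digit M_i encodes the bit pair (q_{n-2i-1}, q_{n-2i-2})); it maps every node of Q(6) into HiC(4,3) and verifies that neighbouring hypercube nodes map to addresses differing in exactly one digit, giving dilation 2(h-1) = 4:

```python
from itertools import product

N = 6                    # an even-order hypercube Q(6)
H = (N + 1) // 2         # height of the host HiC(4,h); here h = 3

def embed(q):
    """Even-n branch of Algorithm 7: M_i = (q_{n-2i-1}, q_{n-2i-2})_10 + 1."""
    return tuple(2 * q[N - 2 * i - 1] + q[N - 2 * i - 2] + 1 for i in range(N // 2))

images = {embed(q) for q in product((0, 1), repeat=N)}
assert len(images) == 2 ** N          # the mapping is one-to-one (expansion 1)

dilation = 0
for q in product((0, 1), repeat=N):
    for b in range(N):                # flip each bit to visit every Q(6) edge
        p = list(q); p[b] ^= 1
        diff = [i for i in range(H) if embed(q)[i] != embed(tuple(p))[i]]
        assert len(diff) == 1         # images differ in exactly one digit M_y
        y = diff[0]                   # neighbours if y = h-1, else lca at level y+1
        dilation = max(dilation, 1 if y == H - 1 else 2 * (y + 1))

assert dilation == 2 * (H - 1)        # dilation 2(h-1) = 4 for n = 6
```

The per-edge distance formula encodes the lca argument from the derivation above: leaves differing only in digit y meet at a common ancestor at level y+1, except at y = h-1 where they are clique neighbours.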
6.3. Embeddings into the FatHiC
In the embeddings considered above the edgecongestion grows rapidly as the size of the embeddednetwork increases, due to congestion in higher-level
Table 1
Parameters of embeddings into HiC and GFT

Guest graph          Host graph                    Dilation   Expansion   Edge congestion
2D-mesh 2^h x 2^h    HiC(4,h)                      2(h-1)     1           2^{h-1}
                     FatHiC(4,h)                   2(h-1)     1           1
                     GFT(2h,2,2)                   4h         1           3
Hypercube order n    HiC(4,floor((n+1)/2))         2(h-1)     1 or 2      2^{n-3}
                     FatHiC(4,floor((n+1)/2))      2(h-1)     1 or 2      2
                     GFT(n,2,2)                    2n         1           ceil(n/2)
Table 2
Hardware requirements of HiC(4,3), FatHiC(4,3) and GFT(6,2,2)

         HiC(4,3)   FatHiC(4,3)   GFT(6,2,2)
PEs      64         64            64
SEs      21         21            384
Links    210        318           768
communication channels. Since congestion in the upper levels is possibly the most fundamental problem with tree-based networks, it has already been extensively studied, and a well-accepted solution has been proposed: the fat tree of [9]. To prevent congestion in the upper levels from dominating communication time for regular algorithms in which a large proportion of PEs transmit across the diameter of the network simultaneously, the bandwidth of links in the upper layers is increased. Applying the same principle, we define a FatHiC(k,h) as an extension of the HiC(k,h) in which the bandwidth of parent-child links increases with the level of the nodes incident to them. The increase of bandwidth required is dictated by the cost and performance requirements of the system; the higher the bandwidth of the links, the higher the expected performance, but at a higher cost. While considering embeddings into the FatHiC(k,h), the link bandwidth will be assumed to increase by a factor of k at each level of the network. For example, a FatHiC(4,3) would have links of bandwidth B from leaf nodes at level zero to their parents at level one, links of bandwidth 4B from nodes at level one to their parents at level two, and links of bandwidth 16B from nodes at level two to the root node. Assuming that the cost of the extra bandwidth is approximated by representing each link of bandwidth kB as k separate links, the number of links in this version of a FatHiC(k,h) is L = k^h (k/2 + h) - k/2.
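Under this cost model the link count can be checked directly against Table 2 (a sketch; the function name is ours):

```python
def fathic_links(k, h):
    """Links in FatHiC(k,h) when a bandwidth-kB link is counted as k unit links:
    L = k^h (k/2 + h) - k/2."""
    return k ** h * (k / 2 + h) - k / 2

# FatHiC(4,3): 4^3 * (2 + 3) - 2 = 318 links, matching Table 2.
assert fathic_links(4, 3) == 318
```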
Since FatHiC(k,h) is topologically identical to HiC(k,h), any network which can be embedded into HiC(k,h) can be embedded into FatHiC(k,h) in exactly the same manner, with the same dilation and expansion; only the congestion may be reduced. All embeddings considered in the previous sections were into HiC(4,h); the same embeddings are used here into FatHiC(4,h), and the congestion is determined. For the two-dimensional mesh embedding of Section 6.1, the congestion of the embedding into the FatHiC is 1; for the binary hypercube embedding of Section 6.2, the congestion is 2.
6.4. Quality of embeddings
In order to provide an indication of the relative quality of these embeddings, Table 1 shows embedding parameters for HiC(k,h), FatHiC(k,h) and another hierarchical network, the generalised fat tree GFT(h,m,w) [13]. All the results given here are for GFT(h,2,2). This configuration has a total number of nodes (PEs and SEs) N(GFT) = (h+1) * 2^h and a number of links L(GFT) = h * 2^{h+1}.

For each of the guest graphs studied, the dilation and expansion of the embeddings into each of the host graphs are of the same order. In general, the dilation of the GFT is higher by some small factor, indicating possibly higher latency for communications between mapped nodes. For each embedding, the edge congestion of the HiC grows rapidly, the edge congestion of the GFT is either constant or grows slowly, while the edge congestion of the FatHiC is constant in every case. This indicates that congestion would build up rapidly under uniform communication conditions with these embeddings into the HiC. Such a result is not surprising: both the FatHiC used and the GFT(h,2,2) maintain constant bandwidth between all levels of the hierarchy, while the bandwidth of the HiC decreases in the upper levels of the hierarchy. However, the potential performance benefit yielded by the lower edge congestion of the FatHiC and GFT must be weighed against the cost of the extra hardware required to achieve it. For example, a binary hypercube of order 6 maps onto HiC(4,3), FatHiC(4,3) or GFT(6,2,2). Table 2 shows that while all three of these networks have an equal number of PEs, the numbers of both SEs and links required by the GFT are significantly higher than for either the HiC or FatHiC.

Of course, using different members of either family of networks would yield different results for these mappings. In general the HiC emphasises local communication over global communication, and allows excellent, cost-effective local connectivity at the cost of poorer global performance. The FatHiC has superior global communication and allows higher-quality mappings than the GFT, with fewer switches and fewer links. The FatHiC is clearly the best of the hierarchical networks considered, offering high-quality embeddings at a reasonable cost.
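The hardware comparison in Table 2 follows directly from the GFT(h,2,2) formulas quoted in Section 6.4 (a sketch; function names ours):

```python
def gft_nodes(h):
    """Total PEs + SEs in GFT(h,2,2): N = (h + 1) * 2^h."""
    return (h + 1) * 2 ** h

def gft_links(h):
    """Links in GFT(h,2,2): L = h * 2^(h+1)."""
    return h * 2 ** (h + 1)

# GFT(6,2,2): 64 PEs + 384 SEs = 448 nodes and 768 links, as in Table 2.
assert gft_nodes(6) == 64 + 384
assert gft_links(6) == 768
```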
7. Discussion and conclusion
We presented the HiC interconnection network for multiprocessor systems. This network, a combination of the tree and fully connected networks, possesses topological and architectural properties superior to those of most other hierarchical networks in a variety of problem domains. Some features of the hierarchical cliques include: (1) flexibility in terms of the degree of connectivity and the height of the tree; (2) fault-tolerance; (3) the ability to make maximum use of communication locality in parallel algorithms; (4) logarithmic diameter; and (5) versatile embedding properties. In addition, the proposed addressing scheme facilitates self-routing and broadcasting.
References
[1] A. Agarwal, Limits on interconnection network performance,
IEEE Trans. Parallel Distrib. Systems 2 (4) (1991) 398–412.
[2] S.M. Campbell, M.J. Kumar, Hierarchical cliques: cost effective multiprocessor interconnection network, IASTED International Conference on Parallel and Distributed Computing and Systems, October 16–19, Chicago, Illinois, USA, IASTED, 1996, pp. 134–138.
[3] C.J. Colbourn, The Combinatorics of Network Reliability,
Oxford University Press, New York, 1987.
[4] W.J. Dally, C.L. Seitz, Deadlock-free message routing in multi-
processor interconnection networks, IEEE Trans. Comput. C-36
(5) (1987) 547–553.
[5] A. El-Amawy, S. Latifi, Properties and performance of folded
hypercubes, IEEE Trans. Parallel Distrib. Systems 2 (1) (1991)
31–42.
[6] A-H. Esfahanian, Generalized measures of fault tolerance with
application to n-cube networks, IEEE Trans. Comput. 38 (11)
(1989) 1586–1591.
[7] M.S. Krishnamoorthy, B. Krishnamurthy, Fault diameter of
interconnection networks, Comput. Math. Appl. 13 (5/6) (1987)
577–582.
[8] J.M. Kumar, L.M. Patnaik, Extended hypercube: a hierarchical
interconnection network of hypercubes, IEEE Trans. Parallel
Distrib. Systems 3 (1) (1992) 45–57.
[9] C.E. Leiserson, Fat-trees: universal networks for hardware-
efficient supercomputing, IEEE Trans. Comput. C-34 (10)
(1985) 892–900.
[10] Q.M. Malluhi, M.A. Bayoumi, The hierarchical hypercube: a new
interconnection topology for massively parallel systems, IEEE
Trans. Parallel Distrib. Systems 5 (1) (1994) 17–30.
[11] B. Monien, H. Sudborough, Embedding one interconnection
network in another, in: G. Tinhofer, E. Mayr, H. Noltemeier, M.
Syslo (Eds.), Computational Graph Theory, Springer, Berlin,
1990.
[12] S.R. Ohring, S.K. Das, Folded Petersen cube networks: new
competitors for the hypercube, The International Parallel
Processing Symposium, Anaheim, CA, USA, April 1993.
[13] S.R. Ohring, M. Ibel, S.K. Das, M.J. Kumar, On generalized fat trees, International Parallel Processing Symposium, Santa Barbara, CA, April 1995, pp. 37–45.
[14] Y. Saad, M.H. Schultz, Topological properties of hypercubes,
IEEE Trans. Comput. 37 (7) (1988) 867–872.
[15] S-B. Tien, C.S. Raghavendra, Algorithms and bounds for shortest
paths and diameter in faulty hypercubes, IEEE Trans. Parallel
Distrib. Systems 4 (6) (1993) 713–718.
[16] N-F. Tzeng, S. Wei, Enhanced hypercubes, IEEE Trans. Comput.
40 (3) (1991) 284–293.
Dr. Campbell received his bachelor of engineering degree from the Western Australian Institute of Technology in 1986 and the Ph.D. degree in computer science from Curtin University of Technology in 1999. In 2002 he joined Fair Isaac Inc., where he works on developing analytic platforms. His research interests include the design of parallel algorithms, graph theory and applications, and medical computing.
Dr. Mohan Kumar received his Ph.D. and M.Tech. degrees from the Indian Institute of Science in 1985 and 1992 respectively, and the B.E. degree from Bangalore University, India, in 1982. In 2001 he joined the Department of Computer Science and Engineering at The University of Texas at Arlington as Associate Professor. He was on the faculty of the School of Computing at Curtin University of Technology, Perth, Australia, during 1992-2000. His interests include pervasive computing, mobile computing, caching for the mobile Internet, active network technologies and distributed systems. He has published over 80 articles in these areas. He served as the Program Chair for PerCom 2003 and is the General Vice Chair for PerCom 2004. He has guest edited special issues of leading journals and is on the editorial board of The Computer Journal.
Dr. Olariu received the M.Sc. and Ph.D. degrees in computer science
from McGill University, Montreal in 1983 and 1986, respectively. In
1986 he joined Old Dominion University where he is a Professor of
Computer Science. Dr. Olariu has published extensively in various
journals, book chapters, and conference proceedings. His research
interests include image processing and machine vision, parallel
architectures, design and analysis of parallel algorithms, computa-
tional graph theory, computational geometry, and mobile computing.
Dr. Olariu serves on the Editorial Board of IEEE Transactions on
Parallel and Distributed Systems, Journal of Parallel and Distributed
Computing, VLSI Design, Parallel Algorithms and Applications,
International Journal of Computer Mathematics, and International
Journal of Foundations of Computer Science.