

Design of a high performance nonblocking copy network for multicast ATM switching

X. Liu H.T. Mouftah

Indexing terms: Copy networks, Multicast switching

Abstract: Multicast switches have found many applications in communication networks. A common design of multicast switches is a tandem of a copy network and a point-to-point routing network. The occurrence of overflow is an inherent problem in copy networks. The authors study overflow control mechanisms in copy networks and present a novel copy-network architecture, which is based on T.T. Lee's design (1988). The network consists of three parts: an input cell-splitting stage, a shared-memory buffering stage and a copy distribution network. The proposed design provides a maximum throughput for the copy network with FIFO capability. A queueing model is also presented, to evaluate and compare the performance of this architecture with other previous work.

1 Introduction

A switch is one of the important components in telecommunication networks. The basic function of a switch is to provide the connection between two users. Multicast switching refers to the function of a switch that can provide a point-to-multiple-points connection. This function is found in many applications, such as teleconferencing, entertainment video, LAN bridging, and distributed data processing [1].

The copy-network-based multicast switch is commonly used because of the flexible control mechanism for easing the network blocking, and the simple architecture for implementation. It is a tandem of a copy network and a point-to-point routing network, as illustrated in Fig. 1. The copy network replicates the inputs to the required number of outputs in an ordered fashion, and the point-to-point routing network then routes each replicated cell to the required destination.

In such a design, one inherent problem is the overflow which occurs when the total number of copy requests exceeds the number of output ports. Overflow causes cell loss, and becomes a serious problem as input load increases. Another problem is access fairness, which refers to the input cell selection algorithms when overflow does occur. Assuming all inputs have the same priority, an ideal switch should provide equal access capability to all input ports.

© IEE, 1994. Paper 13831, first received 3rd December 1993 and in revised form 10th June 1994. The authors are with the Department of Electrical Engineering, Queen's University at Kingston, Ontario, Canada K7L 3N6

IEE Proc.-Commun., Vol. 141, No. 5, October 1994

Fig. 1 A copy-network-based multicast network (a copy network in tandem with a point-to-point routing network)

There are several strategies to deal with overflow, such as input buffering [2, 3, 4], HOL (head-of-line) cell selection mechanisms [5, 6], and cell splitting [7, 8].

Input buffering: Buffering is a natural solution to reduce cell loss caused by overflow. Theoretically speaking, as the buffer length approaches infinity, the cell loss tends to zero. Two buffering schemes are mainly used: the dedicated input buffer [3] and the shared-memory input buffer [2, 4]. The dedicated buffering scheme is simple and easy to implement, but its memory utilisation is not as high as that of the shared-memory scheme, provided that the total memory size of both schemes is the same. Under nonuniform input traffic conditions, the heavily loaded inputs suffer from a serious cell-loss problem, while the lightly loaded inputs may have empty buffers. The shared-memory scheme compensates for this uneven use of memory, so that it can achieve the optimal buffer utilisation. It is particularly good at handling nonuniform or bursty input traffic patterns. The drawback of the shared-memory scheme is the extra hardware complexity.

HOL cell selection: In an input-buffering copy-network architecture, several HOL input cells could be blocked if the total number of copies required exceeds the size of the network, and a blocked cell, due to the FIFO queueing discipline, may then hinder the delivery of the next cell in the queue, which may request a smaller number of copies and not cause overflow. HOL cell selection policies [5, 6] are the rules used to choose the proper inputs at each time slot, such that the total number of outputs is equal to, or as close as possible to, the network size. The selection region can be limited to the HOL cells, or expanded to the cells behind them. This mechanism can improve the network performance but, since the selection requires global knowledge of all inputs, it is not fast enough for an ATM switch. Also, cells may be delivered out of order. Moreover, the throughput is still not the best if there is no cell splitting: owing to the irregular combination of copy requests in each time slot, it may happen that the output ports are not fully occupied, even though the inputs overflow. All of these limit the use of this approach.

The authors would like to thank the anonymous reviewer for providing constructive comments. This work was partially supported by grants from the Natural Science and Engineering Research Council of Canada and the Telecommunications Research Institute of the Ontario Centre of Excellence.

Cell splitting: Traditionally, to schedule a multicast cell, the input is connected to all the requested outputs in one identical time-slot position. However, a multicast cell which cannot be allocated within one time-slot position can split and transmit its copies over several time slots. This is called cell splitting; it is an effective method of reducing cell loss and cell delay. One cell-splitting algorithm is based on the maximum-throughput criterion [8]. Let us denote the network size by N, the copy number requested from input line i by CN_i, and the running sum of the copy numbers by S_i = CN_1 + CN_2 + ... + CN_i. The running sums are calculated on a time-slot basis. Whenever the current running sum S_i > N, the encountered input cell is split into two subcells with copy numbers CN_i1 and CN_i2, respectively, where CN_i1 = N - S_(i-1) and CN_i2 = CN_i - CN_i1. The subcell with CN_i1 copies is transmitted to the outputs, while the subcell with CN_i2 copies, along with the inputs at input ports i + 1, ..., N, are sent to the input queues. This algorithm guarantees full use of the network under overflow conditions.
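The running-sum splitting rule described above can be sketched as follows. This is an illustrative sketch only; the function and variable names are ours, not the paper's, and the real fabric computes the running sums in hardware rather than in a loop.

```python
def split_cells(copy_requests, N):
    """Maximum-throughput cell splitting [8]: whenever the running sum
    of copy requests exceeds the network size N, split the offending
    cell so the current time slot's N outputs are fully used."""
    sent, deferred = [], []
    s_prev = 0                      # running sum S_(i-1) before this cell
    for i, cn in enumerate(copy_requests):
        s = s_prev + cn             # running sum S_i
        if s <= N:
            sent.append((i, cn))    # fits entirely in this time slot
        elif s_prev < N:
            cn1 = N - s_prev        # first subcell fills the remaining outputs
            sent.append((i, cn1))
            deferred.append((i, cn - cn1))  # residual subcell waits a slot
        else:
            deferred.append((i, cn))        # overflow already occurred: queue
        s_prev = s
    return sent, deferred
```

For example, with N = 8 and copy requests (3, 4, 5), the third cell is split into subcells of 1 and 4 copies, so that exactly 8 copies leave in the current slot.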

Based on these strategies, this paper proposes a new copy-network architecture, which has a modular architecture, a fully distributed control mechanism and a maximum throughput of 100%. The performance evaluation of this architecture, and a comparison with other previous related work, are also provided.

2 The proposed copy-network architecture

In this Section, a new copy-network architecture is presented based on the ideas mentioned above. The general block diagram is shown in Fig. 2. It consists of three parts: an input cell-splitting stage, a shared-memory buffering stage, and a copy distribution network. In the input cell-splitting stage, all inputs are issued a departure time (DT) upon arrival, by counting the running sum of the total copy numbers in round-robin order. If overflow occurs, the encountered cell i is immediately split into two subcells. The first has a required number of copies equal to the number of empty output ports, and its departure time is the same as the previously issued DT_(i-1). The other subcell, called the residual cell, is arranged to depart one time slot later. A cyclically ordered output address, and the routing information, are also generated in this stage. All this information forms an internal header, which is added to each cell. This control information is used for self-routing the cell to a distributively located shared-memory queue. Cells are then replicated to the required number of copies in the copy distribution network on a FIFO discipline. The detailed designs are described in the following Sections.

2.1 Input cell-splitting stage architecture
The input cell-splitting stage consists of two parts: a running-adder fabric and N cell-splitting units. The running adder generates the running sums S_i of the copy numbers (CN) specified in the headers of the input cells. Based on the running sums, the cell-splitting processors perform three functions: optimal cell splitting, cell departure-time scheduling, and generation of the new copy-routing address header. Since the maximum CN for each cell is N, the maximum running sum is less than or equal to N^2, which means 2 log2 N bits/output. It is noticed that the higher-order log2 N bits of S_i give not only the occurrence of overflow but also the degree of overflow. Therefore, in the design, we use the lower log2 N bits of S_i, denoted RS_i, to generate the output addresses of the copy network, and the higher log2 N bits as the departure time, denoted DT_i. If DT_i = 0, the cell i is routed to the output port immediately; if DT_i > 0, the cell i waits in the queue for DT_i time slots.

Fig. 3 shows an example of input cell processing in this stage. From the results of S_i, it is clear that the higher log2 N bits (log2 N = 2 in this particular case), called the overhead, reflect the occurrence of overflow and also indicate how many time cycles are needed for the replication of the above input cells. Let L be the size of the DT field in bits. L can be extended or reduced to (log2 N ± m) bits to satisfy different delay criteria.
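The bit-field interpretation of the running sum described above can be sketched as follows (an illustrative sketch; the function name is ours):

```python
import math

def schedule(running_sum, N):
    """Split the running sum S_i into the departure time DT_i
    (higher-order log2 N bits) and the output address RS_i
    (lower-order log2 N bits), as the cell-splitting processors do."""
    n_bits = int(math.log2(N))      # N assumed to be a power of two
    dt = running_sum >> n_bits      # degree of overflow = slots to wait
    rs = running_sum & (N - 1)      # output address, i.e. S_i modulo N
    return dt, rs
```

For N = 4 (log2 N = 2), a running sum of 5 yields DT = 1 and RS = 1: the cell waits one time slot and is then addressed to output 1.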

Fig. 2 The proposed copy-network architecture

Fig. 3 Input cell-splitting stage structure
FBU = feedback unit, CLZ = clear to zero, CP = comparator, IHG = internal header generator

The feedback unit (FBU) is designed to send S_N back to the first input line. At the next time slot, the new S_i, i = 1, 2, ..., N, are issued based on this feedback value. It is possible that S_i overflows; in this case, the cell is lost.

At each cell-splitting unit, DT_(i-1) and DT_i are compared, and the output of the comparator, shown in Table 1, is used to control the cell-splitting process.

Table 1: Output of the comparator in the cell-splitting unit

Input condition        out1          out2
DT_(i-1) = DT_i        0             0
DT_(i-1) < DT_i        1             0
DT_(i-1) > DT_i        don't care    1

The cell-splitting rules are described as follows:

DT_i = DT_(i-1): No cell-splitting operation takes place. Gate 1 is open and gate 2 is closed. The internal header generated in IHG1, shown in Fig. 4a, along with the original cell information, is sent to the following registers.

DT_i > DT_(i-1): A cell-splitting operation takes place; gate 2 is open and gate 1 is closed. The internal header generated in IHG2, along with the original cell information, is sent to the following registers (see Fig. 4b), where CN_i1 = N - RS_(i-1), CN_i2 = CN_i - CN_i1, DT_1 = DT_(i-1) and DT_2 = DT_i.

DT_i < DT_(i-1): S_i has overflowed. The out2 signal equals '1', which inhibits input i from writing to the registers. The ith cell is lost.

Each copy number is then translated into an address interval represented by two numbers, min and max:

min_i = RS_(i-1)
max_i = RS_i - 1

(for i = 0, min_0 = 0 and max_0 = CN_0 - 1). The index reference (IR) is set by

IR_i1 = min_i
IR_i2 = min_i + CN_i1

and is used later by the TNT (trunk number translator) to determine the output address for each replicated cell [1]. The newly formed cells are then put into the 2N registers.
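The translation of a copy number into its cyclic address interval can be sketched as follows (an illustrative helper under our naming, not the paper's circuit):

```python
def address_interval(rs_prev, cn, N):
    """Translate a copy number cn into the cyclic output-address
    interval [min, max] (mod N) used by the copy distribution network,
    where rs_prev is RS_(i-1), the running sum before this cell mod N."""
    lo = rs_prev % N                 # min_i = RS_(i-1)
    hi = (rs_prev + cn - 1) % N      # max_i = RS_i - 1 (mod N)
    return lo, hi
```

For N = 8, a cell with RS_(i-1) = 6 and CN_i = 4 receives the wrap-around interval [6, 1], i.e. outputs 6, 7, 0, 1.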

2.2 Shared-memory buffering stage
This stage is a cascaded combination of a 2N x 2N running adder, a reverse banyan network, and N interleaving buffers (see Fig. 5). It performs four functions: cell compacting, sorting, concentrating and buffering.

Fig. 4 The header format of the cell-splitting unit
a When DT_i = DT_(i-1): BCN | copy number (CN_i) | departure time (DT_i) | Active ID (= 0/1)
b When DT_i > DT_(i-1): BCN | copy number (CN_i1) | departure time (DT_i1) | Active ID (= 1), followed by BCN | copy number (CN_i2) | departure time (DT_i2) | Active ID (= 0/1)


The running adder adds up all active cells (specified in the active-ID field) recursively to form 2N queueing addresses. The bottom running sum is fed back to the top line of the running adder at the end of each time cycle as the starting point of the next round of calculation. The queueing addresses are therefore compact and cyclically sequential (these are the nonblocking cell-duplication conditions required by the next stage; see the next Section for detail). The shared-memory buffer is physically N distributed memories located at each output port of the reverse banyan network. This shared-memory buffer can store the blocked cells efficiently.

Fig. 5 Shared-memory buffering stage structure

A reverse banyan network is used to route the cells cyclically to the queues. In order to multiplex the 2N inputs to N interleaving queues, the switching elements in the last stage are modified to be 2 to 1 time division multiplexers, each of which delivers the two input cells to the output port, one by one. The cell with smaller input address has a higher priority because of the round robin DT issuing discipline.

It may happen that consecutive cells from the same input line are delivered in the same time slot. It is therefore required that the point-to-point switch be capable of priority switching, in order to preserve the cell sequence: the earlier the arrival time of a cell (the time at which the cell arrives at the input of the whole multicast switch), the earlier the cell is delivered. There are several switch designs which can achieve this goal [9, 10].

2.3 Copy distribution network
The copy distribution network is a banyan network with broadcast switching nodes capable of cell replication, based on 2-bit header information. The shared-memory buffers deliver head-of-line cells into the copy distribution network based on their DTs, and the Boolean interval-splitting algorithm [1] is used to replicate each input cell to the required number of outputs. An extended nonblocking multicast condition of a banyan network [4], found independently by the authors [11], is used in this network and is now described.

It has been stated that an N x N reverse banyan network is nonblocking when the input cells have consecutive output addresses, modulo N [12]. Consider a reverse banyan network with the input-output connections satisfying the nonblocking condition, and let each input-output cell stream go through the network in the reverse direction. One can see that the nonblocking condition of the banyan network can be restated as follows:

Theorem 1: A banyan network is nonblocking when the input cells are cyclically compact and the output addresses are cyclically monotone.


Fig. 6a illustrates the nonblocking multicast condition in a reverse banyan network, and Fig. 6b shows its image.

Fig. 6 Nonblocking condition illustration of a banyan network
a reverse banyan network  b banyan network  c multicast banyan network (compact inputs, monotone outputs)

Theorem 2: A multicast banyan network is nonblocking if the inputs are compact with cyclical shifts and the outputs are monotone with cyclical shifts.

The proof of theorem 2 is straightforward, as in Reference 1. According to theorem 1, each input-output path in different copy groups is link-independent, since the set of input-output trees embedded in a copy group must be link-independent; theorem 2 is then verified. Fig. 6c illustrates the nonblocking multicast routing conditions presented in Reference 1 and the extended results.

Since the departure times and the output addresses of cells are all issued by the round-robin method, and the cells are routed to the queues cyclically, the inputs of the copy distribution network are always compact, with cyclically sequential output addresses (modulo N). This guarantees the nonblocking condition of the network. An example of the copy distribution process is shown in Fig. 7.
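The replication performed by the broadcast banyan nodes can be sketched as below. This is a simplified, non-cyclic sketch of the Boolean interval-splitting rule of Reference 1 (wrap-around intervals, handled by the extended condition above, are omitted), and all names are ours: each node inspects one address bit of the interval [mn, mx]; if the bits agree, the cell is forwarded on that link, and if they differ, the cell is split into two subintervals.

```python
def replicate(mn, mx, stage, L, outputs):
    """Boolean interval splitting (after Lee [1]) over L = log2 N stages,
    examining address bits most-significant first."""
    if stage == L:                       # reached an output port
        outputs.append(mn)
        return
    shift = L - 1 - stage                # position of the current bit
    b_min = (mn >> shift) & 1
    b_max = (mx >> shift) & 1
    if b_min == b_max:                   # whole interval on one link
        replicate(mn, mx, stage + 1, L, outputs)
    else:                                # bit is 0 in mn, 1 in mx: split
        replicate(mn, mn | ((1 << shift) - 1), stage + 1, L, outputs)
        replicate(mx & ~((1 << shift) - 1), mx, stage + 1, L, outputs)

def copy_cell(mn, mx, N):
    """Return the output ports reached by a cell addressed to [mn, mx]."""
    out = []
    replicate(mn, mx, 0, N.bit_length() - 1, out)
    return out
```

For N = 8, a cell with interval [2, 5] is split at the first stage into [2, 3] and [4, 5], and finally reaches outputs 2, 3, 4 and 5.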

Fig. 7 An example of the replication of input cells in the copy distribution network (cell headers carry BCN, min, max and DT; a TNT follows each output)


A trunk number translator (TNT) at each output port of the copy distribution network determines the outgoing trunk number for each copy cell [1].

3 Performance evaluation

The performance of a switching network is usually characterised by three parameters: throughput (TP), delay (D), and cell loss (L). TP is defined here as the average number of cells delivered by the network per time slot per output port; D is defined as the average number of time slots a cell spends in the queue; and L refers to the number of cells not accepted by the network upon arrival.

In the following part, we present a queueing model for the performance evaluation of the copy network, and illustrate some numerical and simulation results under the independently and identically distributed traffic pattern assumption. Performance comparisons with other related work are also discussed.

3.1 Copy-network performance model
The performance of the proposed copy-network architecture has been investigated using a finite-length queueing model. Consider an N x N network. As described in the previous Section, each input cell reserves an output port upon arrival by attaching a time stamp to its header. Cells may be split as necessary, so as to be sent out as soon as possible. All cells are then sent to the shared-memory FIFO queue, to be served according to the time-stamp information carried in the header of each cell. Even though cells are stored on a per-input-cell basis, cells duplicated from the same input may suffer a one-time-slot delay difference. To simplify the analysis, we model the shared buffer as copy-oriented; that is, the inputs are duplicated to the required number of copies first, and then stored in the shared-memory queue.

The buffer states can be described by a discrete-time finite-state Markov chain, in which

Q_(m+1) = min {max {0, Q_m - N + A_m}, B}   (1)

where Q_m is the queue length at time slot m, A_m the total number of input arrivals at time slot m, and N the size of the network. Knowing the buffer size b per input line, the equivalent buffer size for the copied cells, denoted B, can be estimated by B = N x b x E(C_i), where E(C_i) is the average number of copies requested by each input cell. Denoting a_k = Pr[A = k], k = 0, 1, ..., MN, the queue lengths P(Q = k), k = 0, 1, ..., B, form a Markov chain with state transition probabilities P_ij = Pr[Q_m = j | Q_(m-1) = i] given by

P_ij = { sum_(k=0..N-i) a_k      0 <= i <= N, j = 0
       { a_(N+j-i)               0 <= i <= j + N, 1 <= j <= B - 1
       { sum_(k>=N+B-i) a_k      0 <= i <= B, j = B
       { 0                       otherwise          (2)

Knowing the input traffic distribution a_k, the stationary probability distribution of the queue length can be obtained from the standard Markov-chain formulas:

p_j = sum_(i=0..B) p_i P_ij  and  sum_(j=0..B) p_j = 1,   j = 0, 1, 2, ..., B   (3)
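The stationary distribution of eqns. 1-3 can be computed numerically as sketched below. This is an illustrative pure-Python sketch under our naming: the transition matrix is built directly from the queue dynamics of eqn. 1, and the fixed point of eqn. 3 is obtained by power iteration.

```python
def stationary_queue_dist(a, N, B, iters=2000):
    """Stationary queue-length distribution of the Markov chain in
    eqns. 1-3, given a[k] = Pr[A = k] arrivals per slot, network
    size N and equivalent buffer size B."""
    # build the (B+1) x (B+1) transition matrix from eqn. 1
    P = [[0.0] * (B + 1) for _ in range(B + 1)]
    K = len(a) - 1                              # maximum arrivals per slot
    for i in range(B + 1):
        for k in range(K + 1):
            j = min(max(0, i - N + k), B)       # next-slot queue length
            P[i][j] += a[k]
    # power iteration on p = p P (eqn. 3), starting from uniform
    p = [1.0 / (B + 1)] * (B + 1)
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(B + 1))
             for j in range(B + 1)]
    return p
```

For small B this converges quickly; for a symmetric arrival distribution such as a = (0.25, 0.5, 0.25) with N = 1 and B = 1, the queue is empty or full with equal probability.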

An important factor that affects the performance of the network is the traffic pattern according to which cells arrive at its inputs. The traffic pattern is determined by:

(i) the process which describes the arrival of cells at the inputs of the switch

(ii) the distribution of copy numbers.

For simplicity, let us assume that the network operates under a uniform traffic pattern, that is, input arrivals are independent and identically distributed. Let C_i be the number of copies requested by an input cell. The C_i are independent and identically distributed with the density function

Pr (C_i = k) = (1 - q) q^(k-1) / (1 - q^M)   k = 1, 2, ..., M   (4)

where M is the maximum number of copies. The effective input load per input line can then be represented by

p_eff = p sum_(k=1..M) k Pr (C_i = k)   (5)

where p is the cell arrival probability per input line.
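The truncated-geometric copy distribution of eqn. 4 and the effective load of eqn. 5 can be sketched numerically as follows (function names are ours; eqn. 5 is our reading of the garbled original, namely arrival probability times mean copy number):

```python
def copy_number_pmf(q, M):
    """Truncated-geometric copy-number distribution of eqn. 4:
    Pr(C = k) = (1 - q) q^(k-1) / (1 - q^M), k = 1..M."""
    norm = 1.0 - q ** M
    return [(1.0 - q) * q ** (k - 1) / norm for k in range(1, M + 1)]

def effective_load(rho, q, M):
    """Effective input load per line: cell arrival probability rho
    times the mean number of copies E(C), per eqn. 5."""
    pmf = copy_number_pmf(q, M)
    return rho * sum(k * pk for k, pk in enumerate(pmf, start=1))
```

For q = 0.5 and M = 2 the probabilities are 2/3 and 1/3, so E(C) = 4/3 and an arrival probability of 0.75 already gives an effective load of 1.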

3.2 Throughput
The throughput of the network is related to the state of the queue and the input traffic, and can be calculated as

TP = (1/N) sum_(i=0..N-1) i p_i + P(Q >= N)   (6)

Here the throughput is defined in two parts: the first gives the throughput of the network under the condition Q < N, and the second, P(Q >= N), equals the throughput of the network under the condition Q >= N.

Theoretically, as the buffer size increases to infinity and the effective input load is equal to or larger than 1, one can achieve a maximum throughput of TP_max = 1.
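Our reading of eqn. 6 can be sketched as the following helper (the name is ours), which evaluates the throughput from a queue-length distribution p[i] = P(Q = i):

```python
def throughput(p, N):
    """Throughput per output port per time slot, per eqn. 6:
    (1/N) * sum_{i<N} i*p[i]  +  P(Q >= N).
    When fewer than N cells are queued only i of the N outputs are
    busy; when Q >= N every output delivers a cell."""
    partial = sum(i * p[i] for i in range(min(N, len(p)))) / N
    saturated = sum(p[i] for i in range(N, len(p)))
    return partial + saturated
```

For example, with N = 2 and P(Q = 0, 1, 2) = (0.2, 0.3, 0.5), the throughput is 0.3/2 + 0.5 = 0.65.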

Fig. 8 illustrates the throughput performance as a function of p_eff, for various maximum copy numbers, under buffer sizes of 1 and 5, respectively. It can be seen that the throughput performance is very close to the upper bound.

Fig. 8 Throughput vs. effective input load (network size 16 x 16)
Solid lines: numerical results; dashed lines: simulation results. The first number in parentheses is the maximum copy number/input; the second is the buffer size/input.

Note that the maximum copy number M affects the throughput performance. As the maximum copy number increases, TP decreases. This is because, when the maximum number of copies increases, the traffic becomes more bursty; so, if the buffer size is small, the large fluctuation of the total number of copies at the inputs causes the buffer to be frequently full, and cell losses become unavoidable. This limits the network throughput. Increasing the buffer size improves the throughput performance. Theoretically, as the buffer size increases to infinity, the maximum throughput given in eqn. 6 can be achieved. In fact, by increasing the buffer size of each input to only 5, TP can almost reach the upper bound, though at the expense of an increase in the average delay.

3.3 Cell delay
By using Little's law, one can calculate the average waiting time of cells, or cell delay D, in the queue, as the mean queue length divided by the mean number of cells served per time slot:

D = (1 / (N TP)) sum_(i=0..B) i p_i   (7)

Fig. 9 illustrates the average cell-delay performance as a function of p_eff, for various maximum copy numbers, under buffer sizes of 1 and 5, respectively. It indicates that, when p_eff is less than 0.6, the average delay is negligibly small; but, as p_eff increases, the average delay increases, as the cells spend a longer time in the queue before they are served. An important and sensitive factor influencing the delay performance is the buffer size: increasing the buffer size increases the average delay. Especially under heavy input load conditions, a large buffer size will cause too much delay and seriously degrade the quality of service.

Fig. 9 Average delay vs. effective input load (network size 16 x 16)
Dashed lines: simulation results; solid lines: numerical results; filled circles: infinite buffer size. The first number in parentheses is the maximum copy number/input; the second is the buffer size/input.

As the network size N -> infinity, all the output streams become indistinguishable. We can roughly model the total arrivals to a single output port as a Poisson process, so that each output forms an M/D/1 queue. The average waiting time D then becomes [13]

D = p_eff / (2 (1 - p_eff))   (8)

which gives the upper bound for the cell delay.
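Eqn. 8 is the standard mean waiting time of an M/D/1 queue with a one-slot service time, and can be evaluated as follows (the function name is ours):

```python
def md1_delay_bound(rho):
    """Mean M/D/1 waiting time (in time slots) of eqn. 8, used as
    the large-N upper bound on cell delay: D = rho / (2 (1 - rho)).
    Valid only for offered load 0 <= rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("load must satisfy 0 <= rho < 1")
    return rho / (2.0 * (1.0 - rho))
```

At half load the bound is half a slot; at p_eff = 0.8 it grows to two slots, and it diverges as the load approaches 1.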

3.4 Cell loss probability
Dividing the throughput TP by the effective input load p_eff, we obtain the cell success probability [14]. Therefore

Pr [cell loss] = 1 - TP / p_eff   (9)

The relation between p_eff and the cell loss probability, for different maximum copy numbers and buffer sizes, is shown in Fig. 10. It indicates that, under low input load, the cell loss probabilities are all negligible; but, as the input load increases, the cell loss probability becomes very sensitive to p_eff. Especially when p_eff > 0.9, the cell loss probability increases quickly.

Fig. 10 Cell loss probability vs. effective input load (network size 16 x 16)
Dashed lines: simulation results; solid lines: numerical results. The first number in parentheses is the maximum copy number/input; the second is the buffer size/input.

Finally, we would like to point out that the network size N also has an important influence on the network performance. As N increases, the simulation results show that both TP and the average delay increase, and the average cell loss probability decreases, even though the buffer size/input is fixed. This is predictable because, under these conditions, the probability of a large number of inputs becoming active or idle at the same time is very small, and the whole input stream becomes quite uniform. In other words, the overflow probability of the inputs decreases.

4 Comparisons with other related work

A simulation program has been developed to compare the proposed architecture with three other architectures under uniform traffic patterns. The four models considered, along with their characteristics, are listed in Table 2.

Simulation results for a 16 x 16 switch are shown in Figs. 11, 12 and 13. Fig. 11 shows the network throughput of the four models as a function of p_eff when the buffer size of each input is equal to 1. It shows that both the shared-memory and cell-splitting mechanisms are efficient in improving the throughput of the copy network.

Fig. 11 Throughput vs. effective input load (network size 16 x 16); buffer size/input = 1. Curves: model A, model B, model C, model D.

Fig. 12 Average delay vs. effective input load (network size 16 x 16)
a buffer size/input = 1; b buffer size/input = 5. Curves: model A, model B, model C, model D.

Table 2: Simulation models

Model                      Input queueing mechanism   Cell splitting
Model A [paper's approach] shared memory              y
Model B [2]                shared memory              n
Model C [3]                dedicated buffer           y
Model D [1]                dedicated buffer           n

In fact, if we compare the approaches with and without the cell-splitting mechanism, we find that TP_A > TP_B and TP_C > TP_D, which indicates the contribution of the cell-splitting mechanism to the copy-network throughput. In the same way, if we compare the shared-memory and dedicated-memory approaches, we see that TP_A > TP_C and TP_B > TP_D, which indicates the contribution of the shared-memory mechanism to the copy-network throughput. In general, model A has the best throughput performance.

The average cell delay increases as ρ_eff becomes larger. Under the same ρ_eff conditions, the shared-memory architectures cause longer delay, owing to the larger effective buffer, as seen in Fig. 12 (Delay_A > Delay_C and Delay_B > Delay_D). However, by introducing the cell-splitting scheme, the delay can be reduced (Delay_A < Delay_B and Delay_C < Delay_D).

Fig. 13 shows the cell loss probability of the four models as a function of ρ_eff.

[Fig. 13: Average cell loss probability vs. effective input load (network size 16 x 16); models A-D, a: buffer size/input = 1, b: buffer size/input = 5]

The two sets of curves illustrate the results with average buffer sizes per input of 1 and 5, respectively. It can be seen that, under the same routing mechanism, the shared-memory buffering approaches result in a lower cell loss probability than those not using the mechanism. Also, as the buffer size increases, the average cell loss probability is reduced. It is clear that Model A gives the lowest cell loss probability.

5 Conclusions

This paper presents a new copy-network architecture with many features. These include access fairness, maximum copy-network throughput, and shared-memory buffering, which provides an efficient method for storing blocked cells. The design also preserves cell sequence, owing to the adopted pure forward network architecture and the FIFO service discipline. The distributed control mechanism is desirable for VLSI implementation and scales well to large switching applications.

Modelling and performance evaluation show that the proposed architecture has very good performance compared with previous designs.

The method used in the performance evaluation of the proposed copy-network architecture can also be applied to any copy network based on shared-memory buffering with a cell-splitting scheme.

6 References

1 LEE, T.T.: 'Nonblocking copy network for multicast packet switching', IEEE J. Sel. Areas Commun., 1988, 6, (9), pp. 1455-1467

2 BIANCHINI, R.P., Jr., and KIM, H.S.: 'Design of a nonblocking shared-memory copy network for ATM'. Proceedings of IEEE Infocom '92, Florence, Italy, 1992, pp. 876-885

3 CHANG, C.-J., and LING, C.J.: 'Overflow controller in copy network of broadband packet switch', Electron. Lett., 1991, 27, (11), pp. 937-939

4 ZHONG, W., ONOZATO, Y., and KANIYIL, J.: 'A copy network with shared buffers for large-scale multicast ATM switching', IEEE/ACM Trans. Networking, 1993, 1, (2), pp. 157-165

5 PATTAVINA, A.: 'Nonblocking architectures for ATM switching', IEEE Commun. Mag., 1993, 31, (2), pp. 38-48

6 HUI, J.Y.: 'Switching and traffic theory for integrated broadband networks' (Kluwer Academic Publishers, Boston, Mass., 1990)

7 KIM, C.-K., and LEE, T.T.: 'Call scheduling algorithms in a multicast switch', IEEE Trans. Commun., 1992, 40, (3), pp. 625-635

8 LIU, X., and MOUFTAH, H.T.: 'Overflow control in multicast networks'. Proc. of Canadian Conf. on Electrical and Computer Engineering, Vancouver, B.C., 1993, pp. 542-545

9 HUANG, A., and KNAUER, S.: 'Starlite: a wideband digital switch'. Proc. GLOBECOM '84, 1984, pp. 121-125

10 HUI, J.Y., and ARTHURS, E.: 'A broadband packet switch for integrated transport', IEEE J. Sel. Areas Commun., 1987, 5, (8), pp. 1261-1273

11 LIU, X., and MOUFTAH, H.T.: 'Overflow control architecture in copy networks'. Poster in TRIO/TRC Researcher Retreat, Peterborough, Ontario, Canada, 1993

12 KIM, H.S.: 'The nonblocking property of reverse banyan network', IEEE Trans. Commun., 1992, 40, (3), pp. 472-476

13 KLEINROCK, L.: 'Queueing systems, Vol. 1: Theory' (Wiley, New York, 1975)

14 HLUCHYJ, M.G., and KAROL, M.J.: 'Queueing in high-performance packet switching', IEEE J. Sel. Areas Commun., 1988, 6, (9), pp. 1587-1597

