
Computer Communications 29 (2006) 3609–3624

A predictive dynamic output buffer reconfiguration (PDOBR) architecture for ATM networks

Sumit Ghosh

Computer Science Department, The University of Texas, Tyler, TX 75799, USA

Received 11 July 2005; received in revised form 4 June 2006; accepted 6 June 2006. Available online 27 June 2006.

Abstract

The traditional output buffer architecture in ATM switches is realized in one of two forms – shared or separate. In the shared design, all of the N output links utilize a large common buffer, of size S cells, in an effort to achieve efficient use of the total buffer space. Under the separate buffer architecture, distinct buffers, each of size S/N cells, are assigned to every output link, the goal being to realize fair buffer usage by every individual output link. For both architectures, however, the buffer organization is determined, permanently, at the time of the switch fabric design, and may not be altered during actual operation. The highly dynamic and stochastic nature of ATM traffic poses a key difficulty for this tradition. This paper introduces a radically new approach, termed the Predictive Dynamic Output Buffer Reconfiguration (PDOBR) architecture, wherein the output buffer organization in the switch fabric is reconfigured dynamically, i.e., during network operation, under the call processor's control, such that the network incurs minimal cell drop stemming from buffer overflow. Under PDOBR, the output buffer at every node of the ATM network is organized in the form of separate buffers for each output link, of size S/(N+1) cells, plus a "floating" buffer of the same size, that may be appended, at runtime, to any one of the output links to augment its net buffer capacity. In contrast to the shared buffer, which experiences severe congestion and gross unfairness under bursty traffic, separate buffers result in fairness in buffer availability between the output links, yielding efficient behavior. Utilizing its knowledge of the successful user call requests and the magnitudes of the corresponding sustained cell rate (SCR) bandwidth requests, the call processor at every ATM node computes the net bandwidth commitment for each of its output links through a simple summation of the corresponding SCR values. The call processor then compares the net bandwidth commitment for every output link against an empirical threshold, obtained through systematic analysis, to predict and identify a single output link, if any, that is likely to incur relatively high cell traffic. The "floating" buffer is then appended to the output link prior to launching the corresponding user's traffic. Experiments confirm the logical expectation that an increase in the size of the "floating" buffer should result in superior efficiency for PDOBR. Further analysis reveals that a combination of faster technology, FAST Schottky TTL, and a threshold setting of 70 Mb/s yields a very high throughput of 85.05% and a low cell drop rate of 14.95%. However, the highest performance – a high throughput of 91.56% coupled with a low cell drop rate of 8.44% – is achieved when the "floating" buffer size is increased by 18.98%, the difference, ΔL2, in the absolute cell drop rates resulting from the 70 Mb/s and 100 Mb/s threshold scenarios, and the threshold is set at 70 Mb/s.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Output buffer management; Dynamic buffer reconfiguration; Buffer architecture; ATM network; Call setup; PNNI; Packet switching networks; Modeling and simulation

1. Introduction

Given the statistical nature of the traffic, buffering in any packet switch is unavoidable [1]. The underlying reason is that two or more packets or cells within the same time slot may be simultaneously destined for the same output port, even where the switch is characterized as output non-blocking. Under these circumstances, to avoid cell drop from buffer overflow, buffers must be associated with every output port of the switch. In addition, when the traffic arrival is bursty, i.e., abrupt changes in the number of cell arrivals occur for very short durations, the issue of cell drop is further aggravated and requires even larger buffers to achieve the same cell drop probability, compared to a scenario of uniform traffic arrival. Although the introduction of buffers of infinite capacity promises to eliminate all cell drop and imply superior performance for any traffic type, in reality, buffer sizes are finite. The traditional output buffer architecture comes in two forms – separate buffer and shared buffer. Under the separate buffer architecture, distinct and dedicated buffers are assigned to each output link of any switch. The result is fairness in buffer availability between the output links, yielding efficient behavior in the presence of bursty traffic. However, under intense traffic, when one or more buffers of specific output links are full, cells are dropped despite available space in the buffers of other output links. In contrast, the use of a shared buffer architecture, where all output links share a common buffer, significantly reduces the amount of memory required to achieve a given cell loss performance under uniform traffic [2–7], regardless of the buffer management strategy employed. Pashan, Soneru, and Martin [7] observe that for an 8 × 8 switch with output buffering and a net buffer of capacity 8000 cells, under input traffic with an average burst length limited to five cells and cell loss probability limited to 10⁻⁵, while the shared buffer architecture sustains up to 88% of the load, the separate buffer design can only carry up to 45% of the load. However, Causey and Kim [8] report that when the ratio of the average burst length to the number of buffer spaces per output link increases, the shared buffer is often filled up to 80–90% with cells corresponding to a single output link, resulting in severe congestion and gross unfairness. In conclusion, while the shared and separate buffer architectures appear to function well for non-bursty and relatively bursty traffic, respectively, neither one of them is an ideal match for bursty input traffic.

Kwon, Kim, Park, Yoon, and Cho [9] propose an analytical model for output buffer analysis in multiple path networks relative to non-uniform traffic distribution and show that the performance of the output buffer improves with increasing non-uniformity of the output traffic distribution. Lee and Ahn [10] derive an exact solution for the packet loss performance of an N × N non-blocking space division ATM switch with finite-sized input and output buffers with speed-up. For a given packet loss probability of 10⁻⁹ and offered load up to 0.8, their studies show that optimal packet loss performance is achieved only at a speed-up of 2 and not at larger values, for moderate input and output buffer sizes (fewer than 30 packets). Utilizing simple and accurate Markov chain performance analysis, Shi and Wing [11] emphasize that a good compromise between performance, cost of implementation, and complexity is achieved through an input and output buffered switch architecture. Shi, Abbasi, Zukowski, and Wing [12] claim further that their studies on the non-linear and complex relationship between packet loss probability and average packet delay for an input–output buffered ATM switch yield insights into switch behavior. Qigang and Shujing [13] report analytical studies on a partially shared output buffer architecture for non-priority, high-priority, and low-priority service schemes. Their results reveal that the cell loss probabilities for all services descend rapidly while the delays descend slowly with increasing size of the shared output buffer. Kraimeche [14] utilizes the cell loss priority bit to organize voice and data cells at the output into two virtual queues, computes the waiting time and loss performance for the two queues, and claims that the strategy is effective in maintaining a balance between the quality of service (QoS) objectives of the two traffic types. Minkenberg, Engbersen, and Colmant [15] combine output buffering with N separate queues at every input port, each queue corresponding to an output port. They claim high, generally traffic-insensitive performance in their simulations and report an available implementation in the form of the PRIZMA-E switch.

Fan, Ishii, Mark, Ramamurthy, and Ren [16] propose a dynamic queue threshold algorithm wherein, for a given shared memory buffer, dynamic traffic considerations dictate the allocation for each output port. They claim that, unlike Choudhury and Hahne's approach [17], their scheme utilizes the buffer completely. In both studies, the results are limited to a single buffer. Rajan and Chu's [18] study focuses on variable-sized packets and applies to the edge nodes, i.e., those located at the periphery of a network that directly intercept user traffic; its simulation results are likewise limited to a single buffer.

To achieve an efficient output buffer management strategy, logically, the best elements of the separate and shared buffer policies need to be combined into a new strategy. The ideal solution would be one that permits the buffer to be shared among different output links while maintaining fairness under intense bursty traffic. This paper investigates such a new approach.

The remainder of this paper is organized as follows. Section 2 motivates the need for a new approach, logically deduces its key characteristics, and then introduces the Predictive Dynamic Output Buffer Reconfiguration (PDOBR) technique. To validate PDOBR, Section 3 describes the modeling of PDOBR for a representative ATM network, and Section 4 details the stochastic generation of realistic input traffic stimuli. Section 5 presents the design of the simulation experiments, the results of the simulations, and a comparative performance analysis of PDOBR versus the traditional output buffer architecture, for the representative ATM network under stochastically generated realistic input traffic stimuli. Finally, Section 6 presents some conclusions.

2. Predictive Dynamic Output Buffer Reconfiguration (PDOBR)

The motivation underlying the PDOBR design may be expressed as follows. To successfully integrate the best elements of the shared and separate buffer architectures, a proposed design may start with the dedicated output buffer structure to ensure fairness and introduce a single additional "floating" buffer that may be accessed, upon need, by any of the outgoing links. By augmenting its dedicated buffer with the "floating" buffer, a specific outgoing link may lower its cell loss due to buffer overflow when confronted with excess cells. Thus, the "floating" buffer incorporates the characteristic behavior of the shared buffer. Since any of the outgoing links must be able to utilize the "floating" buffer, multiplexers must be positioned at each outgoing link to control cell redirection while the network is under operation. The underlying control must possess two characteristics. First, it must be inherently efficient so as to minimize the inevitable cell loss during cell redirection; it is pointed out that the switch fabric operates in the time scale of nanoseconds (ns). Second, the control must be governed by a mechanism that is capable of predicting, with reasonable accuracy, the link utilization, i.e., which of the output links is likely to incur an excessive number of cells in the near future. The more precise the prediction, the greater the excess of cells at the output link in question, and the faster the logic technology underlying the buffer reorganization, the greater the likelihood of PDOBR's contribution. Predictions invariably involve a high level of computational intelligence, and the processing time scale is in ms, significantly slower than that of the switch fabric. Thus, the prediction mechanism must necessarily transcend the ATM switch fabric.

The Predictive Dynamic Output Buffer Reconfiguration (PDOBR) strategy, introduced in this paper, is a radically new approach to output buffer management. It integrates a dynamically reconfigurable, hardware buffer architecture in the switch fabric with a new management scheme that is driven by the high-level knowledge about the immediate future utilization of the links, acquired by the corresponding call processor during call establishment. Under PDOBR, the output buffer at every node of the ATM network is organized in the form of separate buffers for each output link, of size S/(N+1) cells, plus a "floating" buffer of the same size, that may be appended, at runtime, to any one of the output links to augment its net buffer capacity. The parameters N and S refer to the number of output links and the maximum number of buffer slots for the switch. Utilizing its knowledge of the successful user call requests and the magnitudes of the corresponding sustained cell rate (SCR) or peak cell rate (PCR) bandwidth requests, the call processor at every ATM node computes the net bandwidth commitment for each of its output links through a simple summation of the corresponding SCR or PCR values. The call processor then compares the net bandwidth commitment for every output link against an empirical threshold, obtained through systematic analysis, to predict and identify a single output link, if any, that is likely to incur relatively high cell traffic. PDOBR then appends the "floating" buffer to the output link prior to launching the corresponding user's traffic. The reorganization remains in effect until, triggered by the termination of previously initiated user calls and new successful user call requests, the call processor computes a subsequent reconfiguration. The thinking is that, despite stress from heavy traffic, the increased buffer capacity will lower cell loss due to buffer overflow for the output link. Even though the reorganization occurs at call processing speeds, rated at ms or slower, the stochastic nature of the call requests and the highly bursty nature of the cell traffic are likely to yield a significant reduction in cell drop in the overall network. This thinking is assessed scientifically through modeling of a representative ATM network under PDOBR and simulation with realistic traffic stimuli on a testbed that closely resembles an operational network. On the other hand, during buffer reorganization, the switch can neither add new cells into the buffer structure nor transport cells, held in the buffer, across the output links. Thus, cells may be lost as well as delayed. The problem is serious since the switch fabric operates at speeds of μs or faster. To determine which of the two opposing effects bears the stronger influence, a systematic, scientific methodology, based on modeling and simulation, is undertaken and detailed subsequently in this paper.
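To make the two time scales concrete, the following minimal Python sketch (all names hypothetical; the paper specifies no implementation language) outlines the call-processor side of PDOBR: reconfiguration is evaluated only on call establishment and termination events, at ms scale, while the mux/demux switchover itself costs only the ns-scale redirection delays quantified later in Table 1. The selection rule is formalized as Eq. (2) in Section 3.

# Hypothetical sketch of PDOBR's call-processor control loop.
class PDOBRCallProcessor:
    def __init__(self, num_links, threshold_bps):
        self.commitment = [0.0] * num_links  # net SCR commitment per link (b/s)
        self.threshold = threshold_bps       # empirical threshold U_T
        self.floating_owner = None           # link holding the "floating" buffer

    def on_call_setup(self, link, scr_bps):
        self.commitment[link] += scr_bps
        self._reconfigure()

    def on_call_teardown(self, link, scr_bps):
        self.commitment[link] -= scr_bps
        self._reconfigure()

    def _reconfigure(self):
        # Append the "floating" buffer to the single link, if any, whose
        # commitment exceeds the threshold by the widest margin (Eq. (2)).
        excess, link = max((c - self.threshold, l)
                           for l, c in enumerate(self.commitment))
        self.floating_owner = link if excess >= 0 else None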

3. Modeling the PDOBR approach

The PDOBR technique is modeled for a representative, 15-node ATM network that, in turn, is derived from the very high performance Backbone Network Service (vBNS) topology implemented by MCI [19] but not shown here. The 15-node network is shown in Fig. 1, with Fig. 1a representing the high-level hierarchy with the major US cities constituting the peer group leader nodes and Fig. 1b presenting the constituent nodes of the individual peer groups. Each peer group consists of three nodes resembling a major city flanked by minor cities within a 40 mile radius, and the nodes are connected through intra-group links. The shaded node represents the designated peer group leader.

Fig. 2 presents an overview of the PDOBR architecture, with the switching fabric and input buffers represented by the oval block labeled "Switching Fabric with Input Buffers." While the outgoing links of the PDOBR correspond to those of the underlying ATM switch, the inputs to the PDOBR from the oval block represent the multiplexed output lines from the switching fabric to the output buffers. The underlying ATM node consists of N outgoing links, each with its associated separate buffer, and the "floating" buffer. At the front end of PDOBR, a demultiplexer determines, for each one of the N output links from the switch, whether the incoming cell is forwarded to the separate buffer corresponding to the target outgoing link of the cell or to the "floating" buffer. In the event the cell is forwarded to the "floating" buffer, a multiplexer then connects the "floating" buffer to the separate buffer corresponding to the target outgoing link to forward cells from the "floating" buffer. While the redirection circuitry may influence the arrival of cells in the separate buffers, the rate at which cells are extracted is determined by the link speed and is independent of the redirection circuitry.

Fig. 1. Representative 15-node ATM network: (a) the high-level hierarchy, with the major US cities – San Francisco, Los Angeles, Denver, Chicago, and Houston – as peer group leaders, connected by inter-group links of 380–1566 mi; (b) the constituent nodes 1–15 of the individual peer groups, connected by 40 mi intra-group links, with minor cities including San Jose, Berkeley, Santa Cruz, Irvine, Santa Monica, Pasadena, Aurora, Evanston, Joliet, Englewood, Lakewood, Richmond, and Baytown.


Since the underlying switch consists of N incoming and N outgoing links, each rated at a speed of V b/s in this paper, the core of the switch must sustain a cell transfer time given by

$$T_t = \frac{\text{cell size}}{N \cdot V} = \frac{2.73\ \mu\text{s}}{N}, \qquad (1)$$

where the cell size is 53 bytes and the link speed, V, is 155.5 Mb/s. For an 8 × 8 switch fabric, the time for a cell to propagate from an incoming link to an outgoing link is 342 ns.
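As an arithmetic check of Eq. (1) and of the 342 ns figure for the 8 × 8 fabric:

$$\frac{53 \times 8\ \text{bits}}{155.5 \times 10^{6}\ \text{b/s}} \approx 2.73\ \mu\text{s}, \qquad \frac{2.73\ \mu\text{s}}{8} \approx 341\ \text{ns} \approx 342\ \text{ns}.$$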

In the ATM node architecture, multiple processors [20,21] are engaged in processing calls. Since the multiplexers are controlled by the call processors, and as more than one call processor may desire the "floating" buffer to be associated with a given outgoing link, an arbiter is necessary to arbitrate between multiple requests.

Fig. 2. An overview of the PDOBR architecture: multiplexed output lines from the block labeled "Switching Fabric with Input Buffers" pass through demultiplexers (Dmux) either to the N separate output buffers (L1'–Ln') serving the outgoing links L1–Ln, or to the shared overflow ("floating") buffer, from which a multiplexer (mux) forwards cells to the target link's buffer; the Dmux and mux selectors are governed by the output buffer controller, which receives its mux control from the call processor.

The link for which the difference between the projected, dynamic bandwidth utilization and the predetermined threshold is the highest is selected as the candidate for redirection. Following a user call request, call setup is initiated and, upon successful establishment, the call processor notes the bandwidth utilization for the link that is slated to carry the call. Where the projected utilization is very high, exceeding the empirically determined threshold, the link is likely to incur significant cell drop. To address the potential cell drop, the call processor may instruct the multiplexer and demultiplexer in the PDOBR interface to augment the capacity of the separate buffer for the link with the "floating" buffer. Conceivably, cells may be lost during buffer reorganization. However, given that the time scale of buffer redirection is likely to be in μs or even ns, relative to the arrival rate of user call requests in ms, the savings from the augmented buffer over a much longer time period are expected to significantly surpass the number of cells lost during redirection.

The multiplexer and demultiplexer designs used in this research reflect conservative, easily available, off-the-shelf components, and their electrical characteristics, including the maximum average propagation delay values, are obtained from the manufacturer data sheets [22–26]. A total of four technologies – TTL, CMOS, FACT, and FAST – are used to yield a comparative analysis of the most efficient PDOBR design. The TTL technology is the most ubiquitous bipolar digital integrated circuit family, and its advantages include high speed and low cost. The CMOS technology leads in low power consumption, a wide range of power supply voltages, and high noise immunity. Furthermore, standard libraries for automated designs are available from manufacturers. The FACT technology, in essence, is advanced, low-power CMOS. It offers speeds rivaling advanced low-power Schottky technology without compromising the low-power characteristic of CMOS. The FAST Schottky TTL logic improves over standard Schottky TTL by reducing the power consumption by 75–80%. Also, in FAST, the use of a lower input current translates into higher fanout.

The "floating" buffer is organized as a first-in-first-out (FIFO) storage structure and may be implemented utilizing either a dual-port RAM or a register file that permits simultaneous and synchronous read and write. The control unit and the addressing logic underlying the buffer design are based on [27]. The FIFO operations are as follows. A stream write into the FIFO occurs when the WRITE clock is asserted, and continues until the FIFO is full, when a flag – FULL – is raised. A read stream from the FIFO commences when the READ clock is asserted and continues until the FIFO is empty, at which point the flag – EMPTY – is raised. Two counters – the write pointer (WP) and the read pointer (RP) – provide the addresses during the write and read phases, respectively. While WP is incremented following a write, RP is decremented on a read operation. A ZERO detect on the counters indicates whether the buffer is EMPTY or FULL. To ensure a representative and practical size for the "floating" buffer, the state-of-the-art ForeRunner ASX-200 and ASX-1000 [28] ATM switches are analyzed to reveal an output buffer size of 13,312 ATM cells. This implies a total output buffer capacity of (N × 13,312) cells, given that the total number of buffers equals the number of outgoing links, N. Given that PDOBR utilizes a total of N separate buffers and a single "floating" buffer, logic and reason dictate choosing an identical size for each of the (N + 1) buffers. Thus, the size of the "floating" and separate buffers is given by N × 13,312/(N + 1); for the 8 × 8 switch considered here, this yields 8 × 13,312/9 ≈ 11,832 cells, the buffer size used in the experiments of Section 5. Furthermore, in this study, the FIFO is assumed to be a 60 ns DRAM.
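A minimal Python sketch of the FIFO behavior just described may help fix the mechanism; it is illustrative only – the paper's design uses a dual-port RAM with a control unit based on [27], and for simplicity the sketch tracks occupancy with a single count register (whose zero/capacity detect plays the role of the EMPTY and FULL flags) rather than the exact up/down counter scheme of the hardware.

# Illustrative model of the "floating" buffer FIFO (not the hardware design).
class FloatingBufferFIFO:
    def __init__(self, capacity):
        self.mem = [None] * capacity  # stands in for the dual-port RAM
        self.capacity = capacity
        self.wp = 0                   # write pointer, incremented on write
        self.rp = 0                   # read pointer, advanced on read
        self.count = 0                # occupancy; drives the flags below

    def full(self):                   # FULL flag
        return self.count == self.capacity

    def empty(self):                  # EMPTY flag
        return self.count == 0

    def write(self, cell):
        if self.full():
            return False              # caller drops the cell: buffer overflow
        self.mem[self.wp] = cell
        self.wp = (self.wp + 1) % self.capacity
        self.count += 1
        return True

    def read(self):
        if self.empty():
            return None
        cell = self.mem[self.rp]
        self.rp = (self.rp + 1) % self.capacity
        self.count -= 1
        return cell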

Utilizing the propagation delays for the constituent elements – FIFO register, multiplexer, and demultiplexers – from the manufacturer's data sheets, the total average redirection delay for the output buffer architecture, under each of the four technologies, is shown in Table 1. Given its excellent characteristics, especially its low-current and low-voltage amenability and its low power consumption, the CMOS technology is studied extensively in this research despite exhibiting worse delay than the TTL, FACT, and FAST technologies. The thinking is that although redirection speed is faster with TTL, FAST, and FACT, implying lower cell loss, with CMOS technology the results not only serve as a conservative reference point for industry to consider adopting the PDOBR approach in future switch designs but also focus on a technology that is popular in industry.

For the call processor, the decision to augment the separate buffer of a specific outgoing link with the "floating" buffer is followed by the initiation of the multiplexer redirection control signals, and is governed by the following inequality:

$$\sum_{i=1}^{K} \mathit{SCR}_{ir} \ge U_T, \qquad (2)$$

where SCR_ir is the sustained cell rate requested by the user call with reference "i" on outgoing link "r," K is the total number of user calls currently utilizing link "r," and U_T is the threshold for the link bandwidth utilization, the determination of which is discussed in detail in Section 5.

Table 1
Average redirection time for the output buffer architecture under PDOBR

Technology | DMUX (1-to-2) (ns) | DMUX (1-to-N) (ns) | MUX (N-to-1) (ns) | FIFO (ns) | TOTAL (ns)
CMOS       | 88                 | 111                | 128               | 579       | 906
FACT       | 33                 | 38                 | 25                | 150       | 246
FAST       | 27                 | 32                 | 20                | 158.5     | 237
TTL        | 72                 | 55.6               | 63                | 252       | 442.6
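The decision rule of Eq. (2), together with the arbiter policy described above (select the link whose projected utilization exceeds the threshold by the largest margin), might be sketched as follows in Python; the function and variable names are illustrative, not from the paper:

# Eq. (2): a link qualifies when the sum of its calls' SCRs reaches U_T;
# among qualifying links, the arbiter picks the largest excess.
def select_overloaded_link(scr_by_link, u_t):
    # scr_by_link[r] lists the SCR (b/s) of each active call i on link r.
    best_link, best_excess = None, 0.0
    for r, scrs in enumerate(scr_by_link):
        excess = sum(scrs) - u_t          # sum_{i=1..K} SCR_ir - U_T
        if excess >= 0 and (best_link is None or excess > best_excess):
            best_link, best_excess = r, excess
    return best_link                      # None: leave the buffer unassigned

# Example with U_T = 70 Mb/s: link 0 carries 75 Mb/s and qualifies,
# while link 1 carries 60 Mb/s and does not.
assert select_overloaded_link([[20e6, 30e6, 25e6], [40e6, 20e6]], 70e6) == 0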

4. Input traffic modeling and synthesis of input traffic distributions

The choice of the input traffic and other network input parameters in network simulation is crucial towards obtaining realistic performance data and useful insights into the network behavior. The call processing architecture is distributed and employs 10 processors. Traffic generation represents a careful tradeoff between the goal of exposing the network to worst-case stress to examine its behavior, and the need to ensure a stable operational network, one that executes continuously, 24 h a day, 365 days a year, without failing. While much work has been reported in ATM traffic modeling, the literature on call setup requests is sparse. Given the lack of large-scale, operational ATM networks in the public domain operating under the mode of switched virtual circuits, actual data on call setup requests from operational networks are difficult to obtain. In this research, while the synthetic traffic is stochastic, it is designed to resemble an operational system. The key input traffic parameters include (1) the call arrival distribution, (2) the bandwidth distribution in the calls, (3) the call duration distributions in the calls, (4) the traffic mix, i.e., the relative percentage of inter- and intra-group calls in the entire network, and (5) the low-level traffic models. With the exception of item 4, the distributions in items 1 through 5 are generated stochastically, and the choice of the key parameters is explained subsequently. Following their generation, traffic stimuli are saved in files for use during simulation.

4.1. Call cluster arrival distribution and network stability criterion

To emulate a network under intense stress, calls are assumed to arrive in clusters, at times defined by the inter-arrival interval. The number of calls in each cluster is stochastically generated, subject to a maximum of 25. To determine an appropriate call arrival distribution for the representative network modeled, call clusters are generated through a Poisson distribution function and the network is simulated for different choices of the mean value of the distribution. For each simulation experiment, graphs of the call setup times as a function of simulation time, for every pair of source and destination nodes, are obtained and analyzed. Where any of the graphs exhibits non-uniform behavior, with the call setup time increasing consistently with the progress of simulation, the network is considered to be driven into an unstable region by the excessive call arrival density. The argument is that, for networks to remain operationally stable, a graph of the call setup time as a function of simulation time, for any given pair of source and destination nodes, must remain uniform. Where all of the graphs exhibit uniform behavior, the network is considered within the stable operating region. Through trial and error, i.e., by executing a number of simulations for different choices of the mean of the Poisson distribution, this study yields a call cluster arrival distribution that stresses the network to the edge of stability [29,30,20].
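A sketch of the call-cluster generation just described, in Python; the per-cluster cap of 25 and the fixed inter-arrival interval follow the text, while the Poisson draw uses Knuth's multiplicative method since the standard library offers no direct Poisson generator:

import math
import random

def generate_call_clusters(mean_cluster_size, inter_arrival_ms, sim_end_ms, seed=0):
    # Yields (arrival_time_ms, num_calls); cluster sizes are Poisson with the
    # given mean (the value tuned in the stability search), capped at 25.
    rng = random.Random(seed)
    limit = math.exp(-mean_cluster_size)
    t = 0.0
    while t < sim_end_ms:
        k, p = 0, 1.0
        while p > limit:       # Knuth's method for one Poisson variate
            k += 1
            p *= rng.random()
        yield t, min(k - 1, 25)
        t += inter_arrival_ms  # e.g., the 18 ms stability-point interval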

In this study, user calls constitute one of two forms of traffic – data and voice/audio. While the parameters for audio traffic are obtained from [31], the choice of the parameters for the data traffic is explained as follows.

This paper argues that the trend of short call durations, of the ftp, http, and email message type, will gain increasing prominence in the future, fueled by our impatience and the rapid growth in link bandwidths that is already out-pacing the expected increase in the average message size transported across the networks. Trace analysis shows that the current Internet traffic is already dominated by ftp and http transfers of data and images and that their dominance is increasing. This paper reasons that while the ubiquitous T-1 line of the recent past, rated at 1.5 Mb/s, constitutes the lower bound of the user bandwidth requirement, the 25–80 Mb/s bandwidth rating of high definition TV (HDTV) constitutes the upper bound. Between these two extremes, the assumption of 20 Mb/s bandwidth availability for the average user call in the future is logical and reasonable. Assuming a typical 1 Mbit file size for ftp [32], the 20 Mb/s bandwidth translates into a 50 ms call duration. In this study, the representative call duration for ftp is assumed to be 50 ms. However, to achieve a higher call density in the network, i.e., to accommodate more user calls while maintaining the backbone link bandwidth at 155.5 Mb/s, the user bandwidth distribution is set at 1–8 Mb/s, yielding an average of 4 Mb/s. While calls originate at all nodes in the network with uniform distribution, the destinations of the calls are stochastically generated. Last, 90% of all call requests involve data traffic while 10% relate to audio, and user calls follow a distribution of 80% intra-group and 20% inter-group calls. Stability analysis, utilizing the above parameters, reveals an inter-cluster arrival interval of 18 ms for the stability criterion point, which is utilized in the remainder of the experiments.

4.2. Cell level traffic parameters

The lower-level traffic models utilized in this study may be explained as follows. A two-state Markov process describes the audio traffic model, with ON/OFF periods of activity that utilize the basic parameters provided in [33]. The mean of the active period interval (ON) is 352 ms, while that of the silent period interval (OFF) is 650 ms. The intercell duration within the ON period is 2112 μs and the maximum burst length is 150 cells. The ftp-type data traffic generator utilizes an ON/OFF Markov chain model with the basic parameters extracted from [31]. The maximum burst length is 2000 cells.
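A sketch of the two-state ON/OFF audio source under the parameters quoted above (mean ON 352 ms, mean OFF 650 ms, intercell gap 2112 μs, burst cap 150 cells); drawing the period lengths from exponential distributions is one common reading of a two-state Markov model, though the paper does not state the distributions explicitly:

import random

def audio_cell_times(mean_on_ms=352.0, mean_off_ms=650.0, intercell_us=2112.0,
                     max_burst=150, sim_end_ms=10000.0, seed=0):
    # Yields cell emission times (ms) from an ON/OFF source: cells are spaced
    # by the intercell gap during ON periods; OFF periods emit nothing.
    rng = random.Random(seed)
    t = 0.0
    while t < sim_end_ms:
        on_len = rng.expovariate(1.0 / mean_on_ms)        # active (talk) spurt
        emitted, cell_t = 0, t
        while cell_t < t + on_len and emitted < max_burst:
            yield cell_t
            cell_t += intercell_us / 1000.0               # 2112 us between cells
            emitted += 1
        t += on_len + rng.expovariate(1.0 / mean_off_ms)  # append silent period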

5. Simulation experiments, results, and performance analysis

The key objectives of this section are to assess the practical feasibility of PDOBR and to identify scenarios, if any, under which PDOBR may exhibit superior performance over the traditional ATM switch fabric design. For a comparative analysis, two simulators are synthesized – first, a simulation corresponding to the traditional ATM node that is intended to serve as a reference, and second, a simulation that corresponds to the representative ATM network with the ATM nodes augmented with PDOBR. The synthetic input traffic is based on the two assumptions of lower session durations and higher average user bandwidths and, as both of these trends are likely to be widely prevalent in the future, the results and analysis reported here reflect the practical value of PDOBR.

The independent parameters for this study include (1) a set of three choices for the threshold – 50 Mb/s, 70 Mb/s, and 100 Mb/s; (2) a set of three choices for the buffer size – 11,832 cells for every separate and "floating" buffer, an increase of 13.53% for all buffers, and an increase of 18.98% over the size of 11,832 cells for the "floating" buffer only; and (3) a set of four underlying technologies implying different redirection durations, as listed in Table 1. The reasons underlying these choices are detailed subsequently. The output metrics in this study constitute the dependent variables and include (1) the throughput, i.e., the percentage of user cells successfully transported through each node of the representative ATM network; (2) the average throughput over all nodes of the representative ATM network; (3) the cell loss at the multiplexer and demultiplexers during redirection, both at each node and averaged over all nodes of the network; (4) the occurrences and timing of the redirections as a function of the simulation time at specific links of given nodes of the network; (5) the cumulative cells dropped at the separate buffers and the floating buffer, for each of the nodes of the ATM network; (6) the average number of cells dropped at the separate and "floating" buffers throughout the entire network; (7) the cumulative cells serviced at the separate buffers and the floating buffer, for each of the nodes of the network; and (8) the average number of cells serviced at the separate and "floating" buffers throughout the entire network.

This study is structured through a series of nine experiments, organized into three parts. Part 1 aims at a comparative and systematic analysis of the traditional output buffer architecture versus PDOBR, and at determining the impact of the choice of the threshold value, if appropriate. Experiment 1 consists of a set of simulations of the representative network in Fig. 1 where the traditional output buffer architecture defines the switch fabric of every ATM node.

Fig. 3. Throughput, i.e., cells successfully transported across the switch fabric, in percentage, of every node, for each of Experiments 1 through 4: the traditional buffer architecture versus PDOBR with threshold values of 50 Mb/s, 70 Mb/s, and 100 Mb/s.

In addition to computing the key output metrics, enumerated earlier in this section, Experiment 1 yields the average link bandwidth utilization across all links in the network: 63 Mb/s. Recognizing the fact that this link utilization value corresponds to a synthetic input traffic that reflects a representative though highly stressed network, this paper selects a set of three threshold values – (i) 50 Mb/s as a low value, appreciably lower than the experimental finding of 63 Mb/s; (ii) 100 Mb/s as a high value, much higher than 63 Mb/s; and (iii) 70 Mb/s as a reasonable intermediate choice. Logically, the choice of a low threshold value is likely to trigger frequent multiplexer redirections, probably causing a greater loss of cells during redirection than the reduction in cell loss from buffer overflow stemming from the increased available buffer space. In contrast, selecting a high value for the threshold is likely to trigger multiplexer redirections only infrequently, implying inefficient use of the "floating" buffer and significant cell loss from buffer overflow. Clearly, the determination of an appropriate threshold value to yield efficiency is important. Experiments 2 through 4 consist of three separate sets of simulations of the representative network, utilizing the same synthetic input traffic. While the switch fabric in every ATM node employs PDOBR, the threshold values employed in Experiments 2, 3, and 4 are 50 Mb/s, 70 Mb/s, and 100 Mb/s, respectively.

Fig. 3 plots the throughput, expressed as a percentage, of the cells successfully transported across the switch fabric, for every node of the network, for each of Experiments 1 through 4. While the throughput behavior differs for each node in each of the four experiments, it is clear that the performance is worst for the traditional buffer architecture and superior under PDOBR. The graph corresponding to the choice of the intermediate threshold value of 70 Mb/s reveals that while the highest throughput value, exceeding 90%, occurs for node 8, the average throughput behavior exceeds those corresponding to the choices of 50 Mb/s and 100 Mb/s. Fig. 3 also reveals that the range of variation of the throughput is the highest – 75% − 35% = 40% – for the traditional buffer architecture and much lower for each of Experiments 2 through 4 under PDOBR, implying the role of the "floating" buffer in smoothing out the effects of bursty traffic.

To obtain greater insight into the overall behavior of the network under the traditional buffer architecture and PDOBR, the throughput, expressed in percentage and averaged over all nodes of the representative network, is examined for each of Experiments 1 through 4. Under the traditional buffer architecture, the average percentage throughput over all nodes of the network equals 57.69%, implying that nearly half of the cells encountered by the respective switch fabrics are lost due to buffer overflow. For the given bursty traffic, this reflects poor throughput. A gradual increase in the throughput is observed under the use of PDOBR, first for the case where the threshold is set to the highest value of 100 Mb/s, followed by the scenario where the threshold is set to the lowest value of 50 Mb/s, and the highest performance occurs for the use of the intermediate threshold value of 70 Mb/s. For the 100 Mb/s threshold, the average percentage throughput over all nodes of the network equals 64.11%, a slight improvement over that for the traditional output buffer architecture. For a threshold choice of 50 Mb/s, the average percentage throughput over all nodes of the network equals 69.56%. While an appreciable 12% improvement over the traditional output buffer architecture, the loss of over 30% of the cells in the switch fabric is not completely satisfactory. Threshold value choices less than 50 Mb/s are likely to incur a significant increase in the frequency of multiplexer redirection. Under these circumstances, while PDOBR may incur significant cell loss, a commensurate improvement from the increased buffer capacity of the "floating" buffer may not be realized. For threshold value choices exceeding 100 Mb/s, redirections are likely to occur rarely, and the "floating" buffer may remain largely unused. The situation will resemble the traditional output buffer architecture scenario, with the exception that the separate buffers under PDOBR are smaller in size. For a choice of 70 Mb/s as the threshold, the average percentage throughput over all nodes of the network equals 83.09%, a dramatic 44% improvement over the use of the traditional output buffer architecture. The graphs in Fig. 3 establish a clear rationale: while the PDOBR architecture constitutes an effective improvement in cell throughput across the switch fabric in the ATM nodes, the choice of the right, intermediate threshold value is critical in achieving significant improvement.

During buffer reconfiguration, the "floating" buffer is disassociated from the former outgoing link and connected to a new outgoing link, and the cells in transit through the multiplexer and demultiplexers may be lost. The graphs in Fig. 4a present the cumulative cell loss incurred in the multiplexer and demultiplexers during redirection, for threshold choices of 50 Mb/s, 70 Mb/s, and 100 Mb/s, as a function of every node of the network. The bar graph in Fig. 4b presents the average cumulative cell loss over all nodes of the network, for threshold choices of 50 Mb/s, 70 Mb/s, and 100 Mb/s, respectively. Fig. 4a reveals that the cell loss is highest for the lowest threshold choice of 50 Mb/s and that the loss gradually diminishes as the threshold is increased. Clearly, this reflects the occurrence of frequent buffer reorganizations when the threshold is low and the infrequent redirections corresponding to a high threshold value. The results in Fig. 4b further strengthen the findings. Recall, as stated earlier, that the incoming cells from the core switch that are lost due to buffer unavailability are not included under cell loss in the redirection circuitry.

It may be observed in Fig. 4a that nodes 2, 8, 12, and 14 incur a high degree of cell loss in the redirection circuitry. The key reason is that these nodes lie at the group boundaries and incur heavier traffic from both within the group and from other groups. This paper argues that the behavior of PDOBR may be better understood by focusing on one of these highly stressed nodes, and selects node 2. Figs. 5a through 5c present the cumulative redirection time in the demultiplexers and multiplexer, in μs, for each of the three links of node 2, under a 50 Mb/s threshold, as a function of the progress of the simulation. The number of cells lost during each redirection is also marked on the graphs. While the average redirection time is 1.12 μs for link 1, 1.26 μs for link 2, and 1.24 μs for link 3, the number of cells lost during each redirection ranges between 4 and 5.

Fig. 4. Cumulative cell loss incurred under PDOBR in the demultiplexers and multiplexer during buffer reconfigurations, for different choices of threshold values (50 Mb/s, 70 Mb/s, and 100 Mb/s), as a function of (a) each node of the network and (b) averaged over all nodes of the entire network.

The total number of cells lost for node 2 is measured at 788. Although the average redirection time of 1.12 μs is less than the ATM cell width of 2.73 μs at 155.5 Mb/s, when a redirection occurs it may affect the cells being written into the separate buffers and the "floating" buffer, plus any cells from the core switch that may be in transit through the demultiplexer.

When the threshold is increased to 70 Mb/s, the frequency of redirection decreases and the total number of cells lost for node 2 is measured at 615. With the threshold set at 100 Mb/s, the frequency of redirection decreases even further and the number of cells lost for node 2 drops to 471 cells. The average redirection time remains relatively uniform, within the ranges {1.16 μs, 1.18 μs} and {1.19 μs, 1.17 μs} for the threshold settings of 70 Mb/s and 100 Mb/s, respectively.

Analysis of the results for node 8 reveals that the total numbers of cells lost for threshold settings of 50 Mb/s, 70 Mb/s, and 100 Mb/s are 484, 480, and 476, respectively. Clearly, node 8 is less congested than node 2. Also, the closeness of the cell loss values for the different threshold settings implies that during the time intervals when the link bandwidth utilization is high, the actual value exceeds even the high threshold setting of 100 Mb/s. As expected, the average redirection time remains relatively uniform.

In contrast to node 8, node 12 incurs a higher level of congestion than node 2. The numbers of cells lost for threshold settings of 50 Mb/s, 70 Mb/s, and 100 Mb/s are 892, 791, and 719, respectively. The noticeable variation in the number of cells lost, corresponding to the different threshold settings, reflects the fact that the links of node 12 incur a greater variation in their bandwidth utilization factors.


Fig. 5. Cumulative redirection time in the demultiplexers and multiplexer, as a function of the simulation time and under a threshold of 50 Mb/s, for each of the three links of node 2 – (a) link 1 (Avg 1.12 μs, Std 0.385 μs), (b) link 2 (Avg 1.26 μs, Std 0.45 μs), and (c) link 3 (Avg 1.24 μs, Std 0.44 μs); the number of cells dropped per redirection (4 or 5) is marked on each graph.


To gain insights into PDOBR's role in reducing the cell drop rate, Fig. 6a plots the cumulative number of cells dropped in the separate buffers of the non-redirected links, averaged over all separate buffers of non-redirected links, as a function of the node identifier. In contrast, Fig. 6b presents the averaged cell drop rate for a redirected link minus the averaged cell drop rate over the non-redirected links of the node, for every node of the network. In essence, the graph reflects the incremental influence of the "floating" buffer in the redirected links relative to cell drop. The first key observation in Fig. 6a is that the cell drop rate in the links in Experiment 1 is the highest, higher than those of the non-redirected links in Experiments 2 through 4. Thus, despite larger separate buffer sizes than those in PDOBR, the traditional buffer architecture incurs higher cell loss, attesting to PDOBR's superior performance. The second key observation in Fig. 6a is that the cell drop rate for Experiment 4 is slightly lower than that for Experiment 1 but much higher than those for Experiments 2 and 3. This implies that under the 100 Mb/s threshold setting, the "floating" buffer is utilized sparingly, leading to behavior similar to the traditional buffer architecture. For threshold settings of 50 Mb/s and 70 Mb/s, the "floating" buffer is utilized extensively, reducing the stress in the non-redirected links of the network nodes.

Fig. 6. Cumulative number of cells (a) dropped in the separate buffer in the non-redirected links, averaged over all separate buffers of non-redirected links, under the traditional output buffer architecture and under PDOBR at thresholds of 50 Mb/s, 70 Mb/s, and 100 Mb/s, and (b) incrementally dropped in the "floating" buffer of redirected links, as a function of the node identifier.

Fig. 6b yields two key observations. First, the reason for the lower incremental cell drop in the redirected links under the 100 Mb/s threshold, compared to the 50 Mb/s threshold, is clearly the sparing use of the "floating" buffer stemming from infrequent redirections under the 100 Mb/s threshold setting. Second, the reason underlying the significantly low cell drop rate for the 70 Mb/s threshold setting, in contrast to the 50 Mb/s threshold choice, is subtle and complex, as explained subsequently. Consider that the call processor instructs the PDOBR to deassign the "floating" buffer from the previous outgoing link Oi and reassign it to the link Oj. Clearly, the incoming cells from the core switch corresponding to Oj will encounter a much larger buffer, consisting of the "floating" buffer and the separate buffer corresponding to Oj. At the same time, the serial connection of the buffers will cause the cells to incur a higher transit delay through the redirected link. For a low threshold setting of 50 Mb/s, the likelihood is high that more outgoing links will qualify for redirection, including those links in which the resulting traffic may not necessarily benefit from the increased buffer capacity. Furthermore, the frequency of redirection is likely to be high, implying that no sooner is the "floating" buffer filled with cells corresponding to Oj than the call processor instructs PDOBR to redirect to the subsequent link, say Ok. As a result, before the traffic associated with Oj is able to derive any gain from the enhanced buffer capacity, it is cut off from the "floating" buffer. In addition, while the "floating" buffer is being emptied of its previous content of cells corresponding to Oj, the incoming cells corresponding to Oj are denied access to the separate buffer, leading to cell loss.

Thus, the superior cell drop performance for the 70 Mb/s threshold setting reflects a reasonable balance between the frequency of redirection, which services the highly stressed links, and the interval over which a given redirection remains active and lowers cell loss by providing adequate buffer capacity.

Fig. 7 summarizes the cell drop performance. It presents the average cell drop rate in the non-redirected and redirected links and displays the differential influence of the "floating" buffer on cell drop. Fig. 7 reveals two key behaviors. First, as the threshold setting is increased beyond the 50 Mb/s choice, the cell drop rate in the non-redirected links under PDOBR continues to rise. The use of the "floating" buffer gradually diminishes with increasingly higher threshold values, leading to higher stress in the non-redirected links. Second, in the graph of the differential impact of the "floating" buffer, the cell drop rate is highest for 50 Mb/s, gradually decreasing for higher threshold settings. The greater use of the "floating" buffer at the low threshold setting of 50 Mb/s causes a high cell drop rate, which gradually decreases as the threshold setting is increased and redirection becomes infrequent. The combination of these two opposing behaviors leads to the inverse-bell-shaped graph for the redirected links, wherein the cell drop rate is observed to reach its lowest value for the intermediate threshold setting of 70 Mb/s. Fig. 7 underscores an important inference: modeling and simulation constitute a practical vehicle for deriving an empirical value for the threshold in PDOBR.

Table 2 summarizes the simulation results for cells lost and dropped in Experiments 1 through 4, averaged over all outgoing links of every node in the representative network. It is observed that the net cell drop rate of 16.90%, for the 70 Mb/s threshold setting, is the lowest, significantly below the 42.30% of the traditional output buffer architecture.

Fig. 7. Cumulative number of cells dropped in the redirected links, non-redirected links, and the differential influence of the "floating" buffers, averaged over all links in every node (×10³), as a function of the threshold setting (50, 70, and 100 Mb/s) under PDOBR, with the traditional buffer architecture shown for reference.


The key idea in PDOBR is to temporarily enhance the separate buffer of a link, identified by the call processor as under excessive stress, with the floating buffer, so as to minimize cell loss and enhance throughput. For the duration of its association with the "floating" buffer, the link with the resulting enhanced buffer capacity is termed a redirected link. Thus, for a given node, at any time instant, only one outgoing link may incur redirection while the remaining links remain non-redirected. In the two subsequent Figs. 8 and 9, the goal is to examine the throughput under Experiment 1 as opposed to Experiments 2 through 4, focusing on the behavior of the redirected links under PDOBR. Fig. 8a presents the cumulative number of cells serviced by the separate buffer (averaged over all of the separate buffers of any given node) in the non-redirected links, as a function of the node identifiers of the network.

Table 2
Summary of simulation results for cells lost in the redirection circuitry and cells dropped in the redirected and non-redirected link buffers in Experiments 1 through 4

Threshold setting | Avg. no. cells lost per node in redirection circuitry | Cumulative cells dropped per node at non-redirected links | Cumulative cells dropped per node at redirected links | Differential impact of "floating" buffer | Net cells dropped
Traditional output buffer architecture | 0 | 567,523 | 0 | – | 42.30%
PDOBR, 50 Mb/s | 684.46 | 121,085 | 414,063 | 292,978 | 30.43%
PDOBR, 70 Mb/s | 600 | 159,175 | 210,210 | 81,035 | 16.90%
PDOBR, 100 Mb/s | 459 | 477,489 | 489,861 | 12,372 | 35.88%

Fig. 8b presents the throughput for a redirected link at a node minus the average throughput over the non-redirected links of the node, for every node of the network. Thus, the data reflect the cells that are incrementally serviced by the "floating" buffer of the redirected links.

In Fig. 8a, the key observation is that the throughput for a non-redirected link in the absence of PDOBR is higher than that under PDOBR. The reasons are twofold. First, each separate buffer is slightly bigger in the absence of PDOBR than under PDOBR. Recall from Sections 2 and 3 that the cumulative number of buffer slots in the switch under PDOBR is kept identical to that in the absence of PDOBR. Second, under PDOBR, the redirected links are expected to bear a greater share of the traffic cells than the non-redirected links, implying higher throughput at the redirected links. In essence, only the separate buffers influence the behavior of the non-redirected links. Fig. 8b reveals the differential impact of the ‘‘floating’’ buffer on the throughput. The highest throughput is observed for the 70 Mb/s threshold value, followed by that for 50 Mb/s, with the lowest throughput observed for 100 Mb/s. For the threshold value of 100 Mb/s, the rate of redirection is the lowest, implying that the ‘‘floating’’ buffer is least utilized and yields the least throughput improvement. Although the rate of redirection is the highest for the 50 Mb/s threshold, the excessive use of the ‘‘floating’’ buffer causes the improvement in throughput to be limited by an increase in the cell drop rate.

Fig. 9 summarizes the throughput performance. It presents the average throughput in the non-redirected and redirected links and displays the differential influence of the ‘‘floating’’ buffer on throughput. A key observation is the significantly higher throughput at the redirected links, even higher than that under Experiment 1, clearly confirming the thinking underlying PDOBR. The dynamically augmented buffer space for a link under stress implies superior throughput for the overall network. Fig. 9 reveals that, under PDOBR, the graph of the redirected links, as well as the one representing the differential influence of the ‘‘floating’’ buffer, both assume bell-shaped forms, with the highest throughput achieved in the neighborhood of the 70 Mb/s threshold setting. In addition, the non-redirected links also incur their highest throughput in the neighborhood of the 70 Mb/s threshold setting.


[Fig. 8 appears here. Caption: Cumulative number of cells (a) serviced by the separate buffer in the non-redirected links, averaged over all separate buffers of non-redirected links, and (b) incrementally serviced by the ‘‘floating’’ buffer of redirected links, as a function of the node identifier. Axes: cells serviced (×10⁶) versus node ID, with curves for the 50, 70, and 100 Mb/s threshold values under PDOBR and for the traditional output buffer architecture.]

[Fig. 9 appears here. Caption: Cumulative number of cells serviced at the redirected links, non-redirected links, and the incremental influence of the ‘‘floating’’ buffers, averaged over all links in every node, as a function of the threshold value. Axes: throughput averaged over outgoing links in every node (×10⁶) versus threshold setting (Mb/s) under PDOBR, with curves for the non-redirected links (separate buffers), the redirected links (‘‘floating’’ and separate buffers), the differential impact of the ‘‘floating’’ buffer, and the traditional buffer architecture.]

The findings in Fig. 9 corroborate those in Fig. 7: a threshold setting in the neighborhood of 70 Mb/s yields efficient utilization of PDOBR and constitutes a judicious choice.

Table 3 summarizes the simulation results relative to throughput in Experiments 1 through 4. The differential throughput in the last column equals the throughput at the redirected links minus that at the non-redirected links.

Part 2 of this paper focuses on studying the influence of increased buffer capacities on PDOBR's performance. Experiments 5 through 7 underlie part 2, but only the key inferences are presented here. First, as expected, PDOBR appears to benefit from increased buffer size. Second, by choosing reasonably larger buffer sizes, one may compensate for a poor choice of the threshold setting and still achieve high performance. Third, the experiments offer an objective and pragmatic methodology to determine new buffer sizes.

Part 3 of this paper focuses on a comparative analysis of the impact of hardware technology and ‘‘floating’’ buffer size on PDOBR's performance. With the choice of the higher-performance FAST technology for the multiplexers, as opposed to CMOS, an appreciable reduction in the redirection time is expected to yield a commensurate increase in PDOBR's performance. Whereas CMOS yields an average redirection delay of 906 ns, FAST offers a redirection delay as low as 237 ns. Under part 3, Experiment 8 is characterized by a threshold choice of 70 Mb/s and an increase in the ‘‘floating’’ buffer size by a factor of 18.98%, the difference between the percentage of cells dropped for the 100 Mb/s and 70 Mb/s threshold choices (35.88% − 16.90% = 18.98%; see Table 2). Experiment 9 employs FAST technology for the multiplexer redirection circuitry with the threshold held at 70 Mb/s.
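The 18.98% figure is simply the gap between the net cell drop rates of the two threshold scenarios in Table 2, as the brief Python check below illustrates. The floating-buffer size of 1000 cells is a hypothetical example, since absolute buffer sizes are set elsewhere in the paper.

# Worked check of the "floating" buffer enlargement used in Experiment 8.
# The net cell drop rates come from Table 2; the buffer size in cells is
# a hypothetical example value.

drop_70, drop_100 = 16.90, 35.88   # net cells dropped (%), Table 2
delta_L2 = drop_100 - drop_70      # = 18.98 percentage points
S_float = 1000                     # hypothetical floating-buffer size (cells)
enlarged = round(S_float * (1 + delta_L2 / 100.0))
print(f"delta_L2 = {delta_L2:.2f}%, enlarged buffer = {enlarged} cells")
# -> delta_L2 = 18.98%, enlarged buffer = 1190 cells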

Table 3
Summary of simulation results for cell throughput in Experiments 1 through 4

Threshold | Throughput at the non-redirected links | Throughput at the redirected links | Differential throughput (incremental influence of ‘‘floating’’ buffer)
Traditional output buffer architecture | 774,139 | 0 | Nil
PDOBR, 50 Mb/s | 443,249 | 948,209 | 504,960
PDOBR, 70 Mb/s | 583,794 | 1,184,103 | 600,309
PDOBR, 100 Mb/s | 569,972 | 876,236 | 306,264


Fig. 10 presents the throughput (in %), i.e., the percentage of cells successfully transported across the switch fabric, at every node, for each of Experiments 8 and 9. The graphs for Experiments 3 and 7 are superimposed in Fig. 10 to facilitate comparative analysis. While the throughputs for both Experiments 8 and 9 exceed those of Experiments 3 and 7, Experiment 8 reveals the highest throughput for PDOBR. The average throughputs over all nodes for Experiments 8 and 9 are 91.56% and 85.05%, respectively. Thus, although the use of a faster technology to underwrite the redirection circuitry enhances PDOBR's performance, a modest increase in the ‘‘floating’’ buffer capacity yields a dramatic improvement. The underlying reasons are as follows. With faster technology, the demultiplexers and multiplexers are switched quickly, implying that fewer of the incoming cells from the core switch are lost in the redirection circuitry. However, before resources can be fully turned over to the newly redirected outgoing link, the cells in the ‘‘floating’’ buffer corresponding to the previously redirected outgoing link must first be emptied. Given that this emptying process derives little to no benefit from the superior multiplexer technology, and that it is slow and consumes the bulk of the redirection time, the overall improvement resulting from Experiment 9 is modest. Experiment 8 represents a coupling of two parameters, the threshold choice of 70 Mb/s and an increase in the ‘‘floating’’ buffer size, each of which has already been found to improve PDOBR's performance. Thus, as expected, Experiment 8 reveals a dramatic improvement in throughput, at 91.56%. Data on cell drop behavior from Experiments 8 and 9 were also studied in detail but are not presented here. The key inference is that the employment of FAST technology, relative to CMOS, succeeds in lowering the number of cells dropped in the redirection circuitry.
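The dominance of the emptying phase can be illustrated with a back-of-the-envelope model, sketched below in Python. Only the 906 ns and 237 ns multiplexer switching delays come from the text; the assumed floating-buffer occupancy and the assumption that the buffer drains at the full 155.5 Mb/s link rate are hypothetical.

# Back-of-the-envelope model of one redirection event. Only the
# multiplexer switching delays (906 ns CMOS, 237 ns FAST) come from
# the text; the occupancy and drain rate are illustrative assumptions.

CELL_BITS = 53 * 8                 # one ATM cell = 53 octets
drain_rate_bps = 155.5e6           # assume draining at full link rate
cells_to_empty = 500               # hypothetical floating-buffer occupancy

empty_time = cells_to_empty * CELL_BITS / drain_rate_bps  # seconds
for tech, switch_delay in (("CMOS", 906e-9), ("FAST", 237e-9)):
    total = empty_time + switch_delay
    print(f"{tech}: switch {switch_delay * 1e9:.0f} ns, "
          f"empty {empty_time * 1e6:.1f} us, total {total * 1e6:.1f} us")
# CMOS: switch 906 ns, empty 1363.3 us, total 1364.2 us
# FAST: switch 237 ns, empty 1363.3 us, total 1363.6 us

Under these assumptions the emptying phase exceeds either switching delay by roughly three orders of magnitude, consistent with the modest gain observed for Experiment 9.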

[Fig. 10 appears here. Caption: Throughput (in %), i.e., cells successfully transported across the switch fabric, of every node, for each of Experiments 8 (‘‘floating’’ buffer size is increased by 18.98%, the threshold is set at 70 Mb/s, and CMOS is the underlying technology) and 9 (the threshold is set at 70 Mb/s, and FAST replaces CMOS). Axes: throughput (in %) versus node ID; the curves for Experiments 3 (70 Mb/s) and 7 (50 Mb/s) are superimposed for comparison.]


6. Conclusions

This paper has introduced a radically new approach, the PDOBR architecture, wherein the output buffer organization in the switch fabric is reconfigured dynamically, i.e., during network operation, under the call processor's control, such that the network incurs minimal cell drop stemming from buffer overflow. PDOBR is modeled for a 15-node representative ATM network and simulated with realistic timing and other hardware parameters under realistic and intense input traffic stimulus. The simulation utilizes asynchronous distributed techniques and is executed on a testbed of 15+ networked workstations, thereby closely resembling a real-world operational ATM network. For the stochastically generated intense input traffic, the representative network, utilizing the traditional static buffer architecture, yields a throughput of 57.69% and a cell drop rate of 42.30%. Under PDOBR, a total of three discrete threshold values are utilized, 50 Mb/s, 70 Mb/s, and 100 Mb/s, with the fastest link speed in the network rated at 155.5 Mb/s. Performance analysis of PDOBR reveals that a threshold choice of 70 Mb/s yields a superior combination of high throughput and low cell drop, where the initial size of the ‘‘floating’’ buffer is set equal to that of each of the separate buffers. Experiments confirm the expectation that an increase in the size of the ‘‘floating’’ buffer results in superior efficiency for PDOBR. Further analysis reveals that a combination of the faster FAST Schottky TTL technology and a threshold setting of 70 Mb/s yields a very high throughput of 85.05% and a low cell drop rate of 14.95%. The highest performance, however, a high throughput of 91.56% coupled with a low cell drop rate of 8.44%, is achieved when the ‘‘floating’’ buffer size is increased by 18.98%, the difference in the absolute cell drop rates resulting from the 70 Mb/s and 100 Mb/s threshold scenarios, and the threshold is set at 70 Mb/s. Transcending the goal of designing efficient ATM switch architectures, PDOBR's success inspires confidence in the thinking that, even in complex systems with subsystems that operate at vastly different execution speeds, significant efficiency can be achieved, as follows. The slower but high-level, complex decision-making process is permitted to synthesize simple and effective rules that are fired quickly and guide a very fast but far less intelligent, lower-level process. The result is significant benefit for the total system in the long term. Asynchronous distributed simulation techniques play a key role in both validating the approach and determining its key parameters. Conceivably, higher operational efficiency may be achieved, despite system delays and latencies, by utilizing two additional flags, ALMOST-FULL and ALMOST-EMPTY, in the proposed floating buffer architecture. These flags would represent hardwired signals from the buffer control unit to convey the anticipated state of the FIFO to the call processor. The ALMOST-FULL flag informs the call processor that cells are likely to be dropped soon, at which point the processor may choose to discard cells or collect statistical information relative to the health of the network. Conversely, the ALMOST-EMPTY flag implies that the floating buffer is underutilized, at which point the call processor may choose to re-assign outgoing cells back to the original output buffer, overwrite the current redirection stage, or infer that the net bandwidth for the link has fallen below the empirical threshold.
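As a closing illustration of the proposed flags, the following minimal Python sketch models a floating-buffer FIFO that raises ALMOST-FULL and ALMOST-EMPTY. The 7/8 and 1/8 watermarks and the class interface are assumptions for illustration, since the paper specifies only the flags and their intended use.

# Minimal sketch of a floating-buffer FIFO with the two proposed flags.
# The watermarks (7/8 full, 1/8 full) are illustrative assumptions.

from collections import deque

class FloatingBuffer:
    def __init__(self, capacity_cells: int):
        self.capacity = capacity_cells
        self.fifo = deque()

    @property
    def almost_full(self) -> bool:
        # Would be a hardwired signal from the buffer control unit,
        # warning the call processor that cell drop is imminent.
        return len(self.fifo) >= self.capacity * 7 // 8

    @property
    def almost_empty(self) -> bool:
        # Signals under-utilization; the call processor may undo the
        # current redirection or re-evaluate the link's net bandwidth.
        return len(self.fifo) <= self.capacity // 8

    def enqueue(self, cell) -> bool:
        if len(self.fifo) >= self.capacity:
            return False  # buffer overflow: cell dropped
        self.fifo.append(cell)
        return True

    def dequeue(self):
        return self.fifo.popleft() if self.fifo else None

buf = FloatingBuffer(64)
for i in range(60):
    buf.enqueue(i)
print(buf.almost_full, buf.almost_empty)  # -> True False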

Acknowledgment

The author acknowledges Drs. Ricardo Citro and Tony Lee for their assistance with this research effort.


Sumit Ghosh presently serves as the chair of the Computer Science Department at The University of Texas at Tyler. He is the recipient of the IEEE Computer Society's Technical Achievement Award for 2004. Sumit received his B.Tech. degree (1980) from the Indian Institute of Technology at Kanpur, India, and his M.S. and Ph.D. (1984) degrees from Stanford University, California. Prior to UT Tyler, Sumit held the title of Thomas E. Hattrick Endowed Chaired Professor of Information Systems Engineering in the Department of Electrical and Computer Engineering at Stevens Institute of Technology in Hoboken, New Jersey. At Stevens, he served as the director of the computer engineering program, founded the Secure Network Systems Design Laboratory (SENDLAB), architected the Graduate Certificate Program in Secure Network Systems Design, served on the Advisory Board of the Executive Leadership Institute (ELI) of the School of Technology Management, and co-organized the ‘‘Management of Technologies’’ (MOT) Annual Symposium Series. Sumit also co-organized two NSF-sponsored workshops: Modeling and Simulation for Design of Large Software-Intensive Systems: Challenges and New Research Directions (DLS'03) and Secure Ultra Large Networks: Capturing User Requirements with Advanced Modeling and Simulation Tools (ULN'03). Prior to Stevens, he had served as the associate chairman for research and graduate programs in the Computer Science and Engineering Department at Arizona State University. At ASU, he also chaired the faculty search committee for three consecutive years, was responsible for assessing the quality and content of graduate courses, and founded the Networking and Distributed Algorithms Laboratory. Before ASU, Sumit had been on the faculty of Computer Engineering at Brown University, Rhode Island, and before that he had been a member of technical staff (principal investigator) in the VLSI Systems Research Department at Bell Laboratories Research (Area 11) in Holmdel, New Jersey. His additional industrial experience includes Silvar-Lisco in Menlo Park, CA, Fairchild Advanced Research and Development, and Schlumberger Palo Alto Research Center. Sumit was the founding VP for Education in the Society for Computer Simulation and Modeling International (SCS) and presently serves on the Board of Directors. As VP for Education, his efforts included (1) architecting an undergraduate curriculum in modeling and simulation and developing the accreditation procedures and (2) developing a list of graduate-level courses along with their contents and a comprehensive certification examination to be administered by the SCS. He is the primary author of five reference books: Hardware Description Languages: Concepts and Principles (IEEE Press, 2000); Modeling and Asynchronous Distributed Simulation of Complex Systems (IEEE Press, 2000); Intelligent Transportation Systems: New Principles and Architectures (CRC Press, 2000; first reprint 2002); Principles of Secure Network Systems Design (Springer-Verlag, 2002); and Algorithm Design for Networked Information Technology Systems: Principles and Applications (Springer-Verlag, 2003). The Principles of Secure Network Systems Design book was translated into Simplified Chinese by Chongqing University, China, in 2003–2004. He has written five invited book chapters and co-edited the book titled Guarding Your Business: A Management Approach to Security (Kluwer Academic Publishers, 2004). He has written 95+ transactions/journal papers and 95 refereed conference papers. He is currently co-editing a book titled Cybercrimes with the former Assistant US Attorney of the US Department of Justice and is completing his sixth reference book, Innovations in Network Analysis and Design: A Modeling and Simulation Approach, for Springer-Verlag, which will present the discipline of networking from an entirely new perspective and provide new research directions. Sumit's research focuses on fundamental and challenging yet practical problems that are of potential benefit to society. For a list of his research pursuits, please visit http://cs.uttyler.edu/Faculty/Ghosh/index.html. Sumit's current professional aspirations are two-fold. First, he is developing an original interdisciplinary research area that integrates networked computational systems, which underlie all of engineering and computer science, with the disciplines of chemistry, biology, medicine, business, and law, at a fundamental level. While the research problems will be exceptionally challenging yet practical, the promise that they will lead to a quantum leap in our civilization's advancement is undeniable. Second, with inspiration and guidance from Emeritus Prof. C.V. Ramamoorthy of UC Berkeley and others, Sumit is developing the foundation of an entirely new educational infrastructure, grounded in creativity and humbleness, that will enable future citizens to thread through multiple careers in vastly different professional disciplines throughout their very long working lives as they grow from young adults into their 80s, 90s, and even 100s, still in sound mental and physical health, all the while actively contributing toward the progress of society. These two efforts underlie a new model of interaction between academia and industry that will help train a new cadre of engineers for the high-tech industry at the next level.