
Efficient Fair Queueing for ATM Networks using Uniform Round Robin*

Norio Matsufuru
Graduate School of Engineering
Hiroshima University
1-4-1 Kagamiyama, Higashi-Hiroshima

739-8527, [email protected]. hiroshima-u.ac.jp

Abstract— In this paper, we investigate scheduling algorithms that are suitable for ATM networks. In ATM networks, all packets have a fixed small length of 53 bytes and are transmitted at a very high rate. The time complexity of a scheduling algorithm is therefore quite important. Most scheduling algorithms proposed so far have a complexity of O(log N) per packet, where N denotes the number of connections sharing the link. In contrast, weighted round robin (WRR) has the advantage of O(1) complexity; however, it is known that its delay property gets worse as N increases. To solve this problem, we propose two new variants of WRR, uniform round robin (URR) and idling uniform round robin (I-URR). Both disciplines provide end-to-end delay and fairness bounds that are independent of N. The complexity of URR, however, slightly increases as N increases, while I-URR has a complexity of O(1) per packet. I-URR also works as a traffic shaper, so it can significantly alleviate congestion in the network. We also introduce a hierarchical WRR discipline (H-WRR), which consists of different WRR servers using I-URR as the root server. H-WRR efficiently accommodates both guaranteed and best-effort connections while maintaining O(1) complexity per packet. If several connections reserve the same bandwidth, the delay bounds achieved by H-WRR are close to those of weighted fair queueing.

I. INTRODUCTION

Future broadband packet networks will enable a single communication link to integrate services with a wide range of quality of service (QoS) requirements. In order to satisfy the QoS requirements of various applications such as audio, video, and ftp, the switching nodes of the network must control the use of network resources, e.g. bandwidth and buffer space, by properly scheduling them. Scheduling algorithms provide bandwidth guarantees for each connection with bounded packet delay if the source conforms to a certain traffic specification.

The best-known scheduling algorithm is weighted fair queueing (WFQ)[1], or packet-by-packet generalized processor sharing (PGPS)[2][3], which provides the shortest end-to-end delay bound as well as fair allocation of bandwidth. Unfortunately, the time required for selecting a packet to be transmitted under PGPS is O(N), where N denotes the number of connections sharing the link. To reduce the complexity, several scheduling algorithms have been proposed so far[4]-[14]. Most of these algorithms are based on virtual time functions and their operations are similar: they calculate a timestamp for each arriving packet, and transmit the packets in increasing order of their timestamps. Each scheduling algorithm adopts its own algorithm to calculate

* This work was supported in part by the Research for the Future Program of the Japan Society for the Promotion of Science under the project "Integrated Network Architecture for Advanced Multimedia Application Systems" (JSPS-RFTF97R16301).

Reiji Aibara
Information Processing Center
Hiroshima University
1-4-2 Kagamiyama, Higashi-Hiroshima
739-8526, Japan
ray@hiroshima-u.ac.jp

the timestamps. Some algorithms[4][7][9] successfully reduce the complexity to O(log N) while maintaining the same end-to-end delay bound as PGPS. Recently, leap forward virtual clock (LFVC)[10] was proposed, which has O(log log N) complexity with a slight increase in the end-to-end delay and unfairness. To the best of our knowledge, deficit round robin (DRR)[11] is the only discipline that achieves fair scheduling of bandwidth with O(1) complexity. The end-to-end delay bound of DRR, however, increases significantly with N.

Asynchronous Transfer Mode (ATM) is widely acknowledged as the base technology for integrated communication networks. In ATM networks, all packets, called cells, have a small fixed length of 53 bytes. Cells are normally transmitted at about 155 Mbps, which means that the cycle time of the switch has to be about 2.7 μs. Since a commercial switch may have from 16 to 1024 input lines, each output port must be prepared to accept a batch of 16 to 1024 cells every 2.7 μs to achieve no cell loss under congested situations. At 622 Mbps, a new batch of cells arrives about every 680 ns. Thus the implementation complexity of the scheduling algorithm is significant for ATM switches.
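The cycle times quoted above follow directly from the cell size and the link rate. The short sketch below reproduces the arithmetic; the function name `cell_time_seconds` is ours, and the link rates are taken as exactly 155 and 622 Mbps for illustration.

```python
CELL_BITS = 53 * 8  # one ATM cell: 53 bytes = 424 bits


def cell_time_seconds(link_rate_bps: float) -> float:
    """Time needed to transmit one cell on a link of the given rate."""
    return CELL_BITS / link_rate_bps


for rate_mbps in (155, 622):
    t = cell_time_seconds(rate_mbps * 1e6)
    print(f"{rate_mbps} Mbps: {t * 1e6:.2f} us per cell")
# 155 Mbps: 2.74 us per cell
# 622 Mbps: 0.68 us per cell
```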

The design goal of our scheduling algorithm is to guarantee acceptable delay and fairness bounds with O(1) complexity in ATM switches. Since the length of a cell is constant, we focus on the weighted round robin discipline (WRR)[12], which has especially low implementation complexity. However, it is known that WRR has the drawback that its end-to-end delay bound increases with N[17]. Since DRR is an extension of WRR and inherits this property, its delay also depends on N. To solve this problem, in this paper we propose two new variants of WRR, uniform round robin (URR) and idling uniform round robin (I-URR), both of which provide end-to-end delay bounds that are independent of N. The complexity of URR, however, slightly increases as N increases, while I-URR has O(1) complexity with an end-to-end delay bound comparable to that of URR. We also show the fairness properties of URR and I-URR in terms of throughput fairness[5] and worst-case fair index[6], which are also independent of N. Although both algorithms have sufficient fairness properties, their end-to-end delay bounds are rather large compared with PGPS. To improve the delay bounds, we introduce a hierarchical WRR discipline (H-WRR), which consists of different WRR servers using I-URR as the root server. If several connections are reserving the same bandwidth, H-WRR provides an end-to-end delay bound that is close to that of PGPS.

0-7803-5420-6/99/$10.00 (c) 1999 IEEE


In Section 2, we define the URR and I-URR disciplines. Section 3 is devoted to the analysis of URR and I-URR to derive their performance. Comparative evaluation and discussions are given in Section 4. Section 5 describes a hierarchical WRR discipline. In Section 6 we present several simulation experiments. Concluding remarks are made in Section 7.

Fig. 1. Slot assignment algorithms for R = 8: (a) batch assignment, (b) ideal assignment.

II. WEIGHTED ROUND ROBIN DISCIPLINES

We refer to the sequence of cells transmitted in a certain connection as a flow. Each flow is stored in a distinct queue, so that each flow is served separately. The WRR server cyclically serves the flows regardless of the pattern of cell arrivals. We refer to the cycle as a round, in which the flows are served in a fixed sequence that is computed in advance. The round is divided into a fixed number of sub-intervals, called slots, each of which corresponds to one cell transmission time $L_c/C$, where $L_c$ and $C$ denote the size of a cell and the capacity of the server, respectively. The number of slots in a round is referred to as the round size, denoted by $R$. Let slot $j$ ($0 \le j \le R-1$) denote the $j$th slot in a round, and assume that a round starts from slot 0. Each slot is assigned to a flow that has priority to be served at the slot. Let $f_j$ denote the flow to which slot $j$ is assigned; then $f_j$ represents the service sequence in a round. The WRR discipline is fully characterized by $R$ and $f_j$. We refer to a slot at which the prior flow has no backlog as an idle slot. If the WRR server encounters an idle slot, the server immediately skips over the slot. Therefore, WRR is a work-conserving discipline.

Now we consider how to determine $f_j$. Let $N$, $\rho_i$, and $w_i$ be the number of flows sharing the link, the service rate allocated to flow $i$, and the number of slots assigned to flow $i$ in a round, respectively, where $\sum_{i=1}^{N} w_i = R$ holds. The WRR server attempts to transmit cells of flow $i$ at the rate of $C w_i/R$. Thus $w_i/R$ must be equal to the normalized service rate allocated to flow $i$, $r_i = \rho_i/C$. The simplest way to decide $f_j$ is to assign $w_i$ successive slots to flow $i$ from $i = 1$ to $N$. We refer to this assignment as a batch assignment. In a network where all packets have the same size, DRR is identical to WRR adopting the batch assignment. This assignment, however, results in poor performance, since a packet of flow $i$ arriving just after the $w_i$ successive slots can be delayed by at least $R - w_i + 1$ slots. In addition, the traffic of each flow may become more bursty. In contrast, if we assign the $w_i$ slots to flow $i$ so that the intervals between two successive assigned slots are close to $R/w_i$, the delay will be improved. Fig. 1 illustrates the difference in delay between these two assignments for $N = 5$, $R = 8$, $w_1 = 4$, and $w_i = 1$ ($2 \le i \le 5$).
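The effect of the batch assignment on delay can be checked with a small sketch for the example of Fig. 1. The helper names are ours, and the evenly spaced sequence `spread` is an assumed illustration of an ideal assignment rather than a sequence read from the figure. The delay is counted from an arrival at the start of a slot until the end of the flow's next assigned slot, assuming all other flows stay backlogged so that no slot is skipped.

```python
def batch_assignment(weights):
    """Service sequence f_0..f_{R-1} when flow i gets w_i successive slots."""
    seq = []
    for flow, w in enumerate(weights, start=1):   # flows are numbered from 1
        seq.extend([flow] * w)
    return seq


def worst_case_delay(seq, flow):
    """Worst delay, in slots, until a cell of `flow` has been transmitted.

    The cell arrives at the start of some slot and finds its own queue empty;
    every other flow is assumed to be backlogged, so no slot is skipped.
    `flow` must appear in `seq`.
    """
    R = len(seq)
    worst = 0
    for arrival in range(R):
        wait = 0
        while seq[(arrival + wait) % R] != flow:
            wait += 1
        worst = max(worst, wait + 1)   # +1 slot for the transmission itself
    return worst


weights = [4, 1, 1, 1, 1]              # N = 5, R = 8, as in Fig. 1
batch = batch_assignment(weights)      # [1, 1, 1, 1, 2, 3, 4, 5]
spread = [1, 2, 1, 3, 1, 4, 1, 5]      # assumed evenly spaced sequence
print(worst_case_delay(batch, 1))      # 5, i.e. R - w_1 + 1
print(worst_case_delay(spread, 1))     # 2
```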

The scheduling style of Fig. 1(b) was first adopted by Katevenis et al.[12]; however, no practical algorithm to decide $f_j$ is presented in that study. Itai and Rosberg[15] proposed an algorithm to uniformly assign the slots for a multiple access channel, which they call the golden ratio control policy (GRCP). GRCP employs the golden ratio $\phi^{-1} = (\sqrt{5} - 1)/2$ for the assignment and has a special property if $R$ is a Fibonacci number. However, if $R$ is not a Fibonacci number, no property is shown for GRCP. In [16] we proposed an algorithm to decide $f_j$, the uniform assignment algorithm, which has a special property for any value of $R$ and any set of $w_i$s.

A. Uniform Round Robin Discipline

Our slot assignment algorithm is based on the PGPS discipline. Suppose that, for $1 \le i \le N$, cells of flow $i$ arrive at a PGPS server at times $0, L_c/\rho_i, 2L_c/\rho_i, \ldots, (w_i - 1)L_c/\rho_i$. Let $c_j$ ($0 \le j \le R-1$) be the $j$th cell to depart under PGPS. Then we set $f_j = k$, where $c_j$ belongs to flow $k$. Clearly, $w_i$ slots are assigned to flow $i$ and the assigned slots are distributed uniformly. Now we define our algorithm in another way, without employing PGPS. Let $v_i^j$ be the number of slots assigned to flow $i$ between slot 0 and slot $j$. For convenience, we assume $v_i^{-1} = 0$ for any $i$. Then $v_i^j$ for $0 \le j \le R-1$ is obtained by

\[ v_i^j = \begin{cases} v_i^{j-1} + 1 & \text{if } f_j = i, \\ v_i^{j-1} & \text{otherwise.} \end{cases} \tag{1} \]

Using $v_i^j$, our algorithm is described as follows.

Uniform assignment algorithm: Repeat the following step from $j = 0$ to $j = R-1$.

Let the eligible set $E_j$ be the set of flows $i$ that satisfy $v_i^{j-1}/r_i \le j$. Then $f_j = k \in E_j$, where $(v_k^{j-1} + 1)/r_k = \min_{i \in E_j} \{(v_i^{j-1} + 1)/r_i\}$. If several flows have the same smallest value, the tie-breaking rule is to select the flow $k$ with the smallest value of $k$.

Actually, we can use $\lceil v_i^{j-1}/r_i \rceil$ and $\lceil (v_i^{j-1} + 1)/r_i \rceil$ instead of $v_i^{j-1}/r_i$ and $(v_i^{j-1} + 1)/r_i$, so the algorithm can be calculated using only integer values. We refer to the WRR discipline adopting the above uniform assignment algorithm as Uniform Round Robin (URR). We now present important properties of the uniform assignment algorithm. The proof is presented in the appendix. Let slot $s_{i,n}$ be the $n$th slot ($1 \le n \le w_i$) assigned to flow $i$, that is, $f_{s_{i,n}} = i$ and $v_i^{s_{i,n}} = n$.

Theorem 1: Suppose that the slots in a round are assigned using the uniform assignment algorithm. The following properties hold for any number of flows $N$ and any weights $w_i$.

1. The number of slots assigned to flow $i$ in a round is equal to $w_i$ for $1 \le i \le N$.

2. For $1 \le i \le N$ and $1 \le n \le w_i$, $s_{i,n}$ is bounded as

\[ (n-1)/r_i \le s_{i,n} < n/r_i. \tag{2} \]

Fig. 2 illustrates the bounds of $s_{i,n}$ for $R = 8$ and $w_i = 3$ ($r_i = 3/8$). In the figure, for example, slot $s_{i,1}$ must be one of the slots 0, 1, and 2, while slot $s_{i,2}$ must be one of the slots 3, 4, and 5. From the figure, we can see that the slots assigned to flow $i$ are uniformly placed in a round.
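The uniform assignment algorithm and the two properties of Theorem 1 can be sketched in a few lines. This is an illustrative reading of the algorithm, not the authors' implementation: eligibility is taken as $v_i^{j-1}/r_i \le j$, exact rational arithmetic is used in place of the integer variant, and the function names are ours.

```python
from fractions import Fraction


def uniform_assignment(weights):
    """Service sequence f_0..f_{R-1}; weights[i-1] is w_i (positive integers)."""
    R = sum(weights)
    rates = [Fraction(w, R) for w in weights]      # r_i = w_i / R
    v = [0] * len(weights)                         # v_i^{j-1}, slots given so far
    seq = []
    for j in range(R):
        # Eligible set E_j: flows whose next slot may not start later than j.
        eligible = [i for i, r in enumerate(rates) if v[i] / r <= j]
        # Smallest (v_i + 1) / r_i wins; ties go to the smallest flow index.
        k = min(eligible, key=lambda i: ((v[i] + 1) / rates[i], i))
        v[k] += 1
        seq.append(k + 1)                          # flows are numbered from 1
    return seq


def satisfies_theorem_1(weights, seq):
    """Check both properties of Theorem 1 for a given service sequence."""
    R = len(seq)
    for flow, w in enumerate(weights, start=1):
        slots = [j for j, f in enumerate(seq) if f == flow]
        if len(slots) != w:                        # property 1
            return False
        r = Fraction(w, R)
        for n, s in enumerate(slots, start=1):     # property 2, inequality (2)
            if not ((n - 1) / r <= s < n / r):
                return False
    return True


seq = uniform_assignment([4, 1, 1, 1, 1])          # example of Fig. 1
print(seq)                                         # [1, 2, 1, 3, 1, 4, 1, 5]
print(satisfies_theorem_1([4, 1, 1, 1, 1], seq))   # True
```

For the weights of Fig. 1 the sketch yields the alternating sequence in which flow 1 receives every second slot.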

B. Idling URR discipline

Fig. 2. Range of the beginning of the assigned slots.

Fig. 3. Boundaries of the rounds.

The URR server is realized as a simple state machine with $R$ states; therefore, it is easy to implement URR in hardware[12]. However, the time complexity of selecting a flow to be served increases slightly with $R$, since the URR server has to skip over $R - 1$ slots in the worst case, when only one flow is backlogged. To solve this problem, we propose a modified version of URR, which we call Idling URR (I-URR). The behavior of the I-URR server is the same as that of URR when there is no idle slot in a round. However, when the server encounters an idle slot, it does not skip over the slot but selects a flow to be served using another policy. Although we may adopt an arbitrary policy for the idle slots, it should be quite simple so that the resulting I-URR server can be easily implemented. A simple policy is to skip over as many slots as possible. Another practical policy is presented later in Section 5. In the following section, however, we assume that when the I-URR server encounters an idle slot, the server simply becomes idle during the slot, since this policy gives the lower bound on the performance of I-URR.
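The difference between the two servers can be illustrated with a small simulation. This is a sketch under simplifying assumptions: there are no arrivals during the run, `backlog` maps a flow to its number of queued cells, and the I-URR branch uses the idle-slot policy assumed for the analysis (stay idle for the slot). The function name `run_server` is ours.

```python
def run_server(seq, backlog, cell_times, idling):
    """Return the flow served in each cell time (None if the link is idle)."""
    backlog = dict(backlog)          # work on a copy
    R = len(seq)
    pos = 0                          # next slot of the round to examine
    sent = []
    for _ in range(cell_times):
        chosen = None
        if idling:
            # I-URR: examine exactly one slot, so selection is O(1).
            flow = seq[pos]
            pos = (pos + 1) % R
            if backlog.get(flow, 0) > 0:
                chosen = flow
        else:
            # URR: skip over idle slots, examining up to R slots.
            for _step in range(R):
                flow = seq[pos]
                pos = (pos + 1) % R
                if backlog.get(flow, 0) > 0:
                    chosen = flow
                    break
        if chosen is not None:
            backlog[chosen] -= 1
        sent.append(chosen)
    return sent


seq = [1, 2, 1, 3, 1, 4, 1, 5]
print(run_server(seq, {5: 3}, 3, idling=False))  # [5, 5, 5]
print(run_server(seq, {5: 3}, 8, idling=True))   # seven idle slots, then flow 5
```

With only flow 5 backlogged, URR scans the whole round before every transmission, whereas I-URR does constant work per slot and leaves the link idle.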

III. PERFORMANCE ANALYSIS

In this section, we analyze the performance of URR and I-URR in ATM networks with respect to the end-to-end delay bound, the maximum backlog, the throughput fairness, and the worst-case fair index. Consider a queueing system with a single server of rate $C$. Let $\delta$ and $\delta_i$ be the interval of a slot, $L_c/C$, and the average service interval of flow $i$, $L_c/\rho_i$, respectively.

Definition 1: A backlogged period for flow $i$ is any period of time during which flow $i$ is continuously backlogged in the system.

Definition 2: A maximal backlogged period for flow $i$ is a maximal interval of time during which flow $i$ is continuously backlogged in the system.

Consider a maximal backlogged period for flow $i$ and let $p_i^k$ be the $k$th cell of flow $i$ to depart under URR in the period. Also, let $a_i^k$ and $d_i^k$ denote the time that $p_i^k$ arrives and the time that $p_i^k$ departs under URR, respectively. We first show the following lemma.

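Lemma 1: Under URR, for $1 \le l, n$ and $l < n$, the departure times of cells are bounded as:

\[ d_i^n - a_i^1 \le (n+1)\delta_i - e(a_i^1, d_i^n)\delta, \tag{3} \]

\[ d_i^n - d_i^l \le (n-l+1)\delta_i - e(d_i^l, d_i^n)\delta, \tag{4} \]

\[ d_i^n - d_i^l \ge (n-l-1)\delta_i - e(d_i^l, d_i^n)\delta, \tag{5} \]

where $e(t_1, t_2)$ denotes the number of idle slots in the interval $[t_1, t_2]$.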

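Proof: Suppose that the first packet $p_i^1$ arrives within slot $\alpha$ in a round, and let the time that the round starts be time zero (Fig. 3). Let $k_\alpha$ denote the maximal integer which satisfies $s_{i,k_\alpha} < \alpha$; if no such integer exists, $k_\alpha = 0$. We first prove (3). Suppose that $p_i^n$ is served at slot $s_{i,n^*}$ in the $o_n$th round; then

\[ n = w_i(o_n - 1) + n^* - k_\alpha. \tag{6} \]

Let $e(t_1, t_2)$ denote the number of idle slots during the interval $[t_1, t_2]$. Since $d_i^n$ corresponds to the end of slot $s_{i,n^*}$ in the $o_n$th round,

\[ d_i^n = \bigl(R(o_n - 1) + (s_{i,n^*} + 1) - e(0, d_i^n)\bigr)\delta. \tag{7} \]

From (2), $s_{i,n^*} < n^*/r_i$, and $a_i^1 \ge (\alpha - e(0, a_i^1))\delta$, so

\[ d_i^n - a_i^1 < \Bigl(R(o_n - 1) + \frac{n^*}{r_i} + 1 - \alpha - e(a_i^1, d_i^n)\Bigr)\delta. \tag{8} \]

$d_i^n - a_i^1$ becomes maximal if the first packet just missed a slot allocated to flow $i$. Let slot $s_{i,m}$ be the missed slot. We have two cases.

Case 1: $\alpha = s_{i,m} + 1$ and $k_\alpha = m$. From (2), $\alpha = s_{i,k_\alpha} + 1 \ge (k_\alpha - 1)/r_i + 1$. Thus, from (8),

\[
\begin{aligned}
d_i^n - a_i^1 &< \Bigl(R(o_n - 1) + \frac{n^*}{r_i} + 1 - \Bigl(\frac{k_\alpha - 1}{r_i} + 1\Bigr) - e(a_i^1, d_i^n)\Bigr)\delta \\
&= \Bigl(\frac{w_i(o_n - 1) + n^* - k_\alpha}{r_i} + \frac{1}{r_i} - e(a_i^1, d_i^n)\Bigr)\delta \\
&= (n+1)\delta_i - e(a_i^1, d_i^n)\delta.
\end{aligned}
\tag{9}
\]

The first equality holds from $R = w_i/r_i$, and the last equality holds from (6) and $\delta/r_i = \delta_i$.

Case 2: $\alpha = 0$ ($s_{i,m} = R - 1$) and $k_\alpha = 0$. In this case,

\[
\begin{aligned}
d_i^n - a_i^1 &< \Bigl(R(o_n - 1) + \frac{n^*}{r_i} + 1 - e(a_i^1, d_i^n)\Bigr)\delta \\
&= \Bigl(\frac{w_i(o_n - 1) + n^*}{r_i} + 1 - e(a_i^1, d_i^n)\Bigr)\delta \\
&< (n+1)\delta_i - e(a_i^1, d_i^n)\delta.
\end{aligned}
\tag{10}
\]

The last inequality holds from $1/r_i > 1$. From (9) and (10), we obtain (3). Inequalities (4) and (5) are straightforward from (6), (7), and (2). □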

Now we consider the case of I-URR. The only difference between I-URR and URR is that the I-URR server is left idle during the idle slots instead of skipping over them. Therefore, from (6), (7), and (2), inequalities (4) and (5) are also satisfied for I-URR by setting $e(d_i^l, d_i^n) = 0$. However, (3) is slightly modified, since if $p_i^1$ just missed slot $s_{i,m}$, the server becomes idle instead of skipping over it. In this case, $\alpha = s_{i,m}$ and $d_i^n - a_i^1$ increases by one slot, or $\delta$. Consequently we obtain the following results.

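Lemma 2: Under I-URR, for $1 \le l, n$ and $l < n$, the departure times of cells are bounded as:

\[ d_i^n - a_i^1 \le (n+1)\delta_i + \delta, \tag{11} \]

\[ d_i^n - d_i^l \le (n-l+1)\delta_i, \tag{12} \]

\[ d_i^n - d_i^l \ge (n-l-1)\delta_i. \tag{13} \]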


Using Lemmas 1 and 2, we show bounds on the service offered to flow $i$ in any backlogged period. Let $W_i^k(\tau, t)$ be the amount of service offered to flow $i$ by a server $k$ in the interval $[\tau, t]$.

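Lemma 3: Let $[\tau, t]$ be any backlogged period for flow $i$ under URR; then $W_i^{\mathrm{URR}}(\tau, t)$ is bounded as:

\[ W_i^{\mathrm{URR}}(\tau, t) \ge \rho_i\bigl(t - \tau + e(\tau, t)\delta\bigr) - 2L_c, \tag{14} \]

\[ W_i^{\mathrm{URR}}(\tau, t) \le \rho_i\bigl(t - \tau + e(\tau, t)\delta\bigr) + 2L_c. \tag{15} \]

Proof: We first prove (14). Consider any backlogged period $[\tau, t]$ for flow $i$. Let $d_i^l$ and $d_i^u$ be the maximal departure time that satisfies $d_i^l \le \tau$ and the minimal departure time that satisfies $t \le d_i^u$, respectively. If no such $d_i^l$ exists, we set $d_i^l = d_i^0 \equiv a_i^1$. Since flow $i$ is backlogged at time $t$, $d_i^u$ always exists. From the definitions of $d_i^l$ and $d_i^u$, $W_i^{\mathrm{URR}}(\tau, t) \ge (u - l - 1)L_c$ holds. Due to (3), (4), and $e(d_i^l, d_i^u) \ge e(\tau, t)$,

\[ t - \tau \le d_i^u - d_i^l \le (u - l + 1)\delta_i - e(\tau, t)\delta. \tag{16} \]

Therefore,

\[
\begin{aligned}
W_i^{\mathrm{URR}}(\tau, t) &\ge (u - l + 1)L_c - 2L_c \\
&\ge \frac{t - \tau + e(\tau, t)\delta}{\delta_i} L_c - 2L_c \\
&= \rho_i\bigl(t - \tau + e(\tau, t)\delta\bigr) - 2L_c.
\end{aligned}
\tag{17}
\]

The second inequality holds from (16) and the last equality holds from $L_c/\delta_i = \rho_i$.

Next, we prove (15). Consider any backlogged period $[\tau, t]$ of flow $i$. Let $d_i^l$ and $d_i^u$ be the minimal departure time that satisfies $\tau \le d_i^l \le t$ and the maximal departure time that satisfies $\tau \le d_i^u \le t$, respectively. Notice that the definitions of $d_i^l$ and $d_i^u$ are different from those in the above paragraph. We have three cases.

Case 1: There are no such $d_i^l$ and $d_i^u$. In this case, $W_i^{\mathrm{URR}}(\tau, t) = 0$. Thus (15) is satisfied.

Case 2: $d_i^l = d_i^u$. In this case, $W_i^{\mathrm{URR}}(\tau, t) = L_c$. Thus (15) is also satisfied.

Case 3: $d_i^l < d_i^u$. In this case, $W_i^{\mathrm{URR}}(\tau, t) = (u - l + 1)L_c$. From (5) and $e(d_i^l, d_i^u) \le e(\tau, t)$,

\[ t - \tau \ge d_i^u - d_i^l \ge (u - l - 1)\delta_i - e(\tau, t)\delta. \tag{18} \]

Thus,

\[
\begin{aligned}
W_i^{\mathrm{URR}}(\tau, t) &= (u - l - 1)L_c + 2L_c \\
&\le \frac{t - \tau + e(\tau, t)\delta}{\delta_i} L_c + 2L_c \\
&= \rho_i\bigl(t - \tau + e(\tau, t)\delta\bigr) + 2L_c. \quad \Box
\end{aligned}
\tag{19}
\]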

In the case of I-URR, we obtain the following lemma from Lemma 2.

Lemma 4: Let $[\tau, t]$ be any backlogged period for flow $i$ under I-URR; then $W_i^{\mathrm{I\text{-}URR}}(\tau, t)$ is bounded as:

\[ W_i^{\mathrm{I\text{-}URR}}(\tau, t) \ge \rho_i(t - \tau - \delta) - 2L_c, \tag{20} \]

\[ W_i^{\mathrm{I\text{-}URR}}(\tau, t) \le \rho_i(t - \tau) + 2L_c. \tag{21} \]

Proof: The proof is the same as that of Lemma 3 except for using (11), (12), and (13) instead of (3), (4), and (5). □

Notice that inequality (21) indicates that the output traffic of I-URR conforms to a leaky bucket process with parameters $(2L_c, \rho_i)$. This means that the I-URR server works as a traffic shaper. Thus I-URR significantly alleviates network congestion caused by the burstiness of the traffic.


A. End-to-end Delay and Backlog Bounds

In order to obtain the bounds on the end-to-end delay and the backlog of URR and I-URR, we introduce the concept of the class of $\mathcal{LR}$ servers proposed by Stiliadis and Varma[17]. Many servers, e.g. PGPS, VC, SCFQ, and WRR, belong to this class. We show that URR and I-URR servers also belong to the $\mathcal{LR}$ class. A server $k$ in $\mathcal{LR}$ is characterized by two parameters: latency $\Theta_i^k$ and allocated rate $\rho_i$. From the results of Stiliadis and Varma, the upper bounds on both the end-to-end delay and the backlog can be derived from the latency $\Theta_i^k$.

The following two lemmas give us the latencies of URR and I-URR servers, respectively.

Lemma 5: (URR belongs to $\mathcal{LR}$) In a network where all packets have the same size $L_c$, the URR server belongs to $\mathcal{LR}$ with a latency less than or equal to $2L_c/\rho_i$.

Proof: The proof is straightforward from (14) and Lemma 6 in [18]¹. □

Lemma 6: (I-URR belongs to $\mathcal{LR}$) In a network where all packets have the same size $L_c$, the I-URR server belongs to $\mathcal{LR}$ with a latency less than or equal to $2L_c/\rho_i + L_c/C$.

Proof: The proof is straightforward from (20) and Lemma 6 in [18]². □

Based on the latencies, we obtain the following results³:

Theorem 2: (End-to-end Delay and Backlog Bounds of URR) In a network where all packets have the same size $L_c$, suppose that the traffic of flow $i$ conforms to a leaky bucket process with parameters $(\sigma_i, \rho_i)$ and the scheduling algorithm at each of the $K$ servers on the path of flow $i$ is URR. Then,

1. the end-to-end delay of packets belonging to flow $i$, denoted by $D_i^{\mathrm{URR}}$, is bounded as

\[ D_i^{\mathrm{URR}} \le \frac{\sigma_i}{\rho_i} + (2K - 1)\frac{L_c}{\rho_i}. \tag{22} \]

2. The maximum backlog of flow $i$ in the $k$th node, $Q_{i,k}^{\mathrm{URR}}(t)$, is bounded as

\[ Q_{i,k}^{\mathrm{URR}}(t) \le \sigma_i + 2kL_c. \]

Proof: The proof is straightforward from Lemma 5 and the results of Stiliadis and Varma[17]. □

For I-URR, since the output traffic conforms to a leaky bucket process with parameters $(2L_c, \rho_i)$ as shown by (21), we obtain the following tight bound on the backlog.

Theorem 3: (End-to-end Delay and Backlog Bounds of I-URR) In a network where all packets have the same size $L_c$, suppose that the traffic of flow $i$ conforms to a leaky bucket process with parameters $(\sigma_i, \rho_i)$ and the scheduling algorithm at each of the $K$ servers on the path of flow $i$ is I-URR. Then,

1. the end-to-end delay of packets belonging to flow $i$, denoted by $D_i^{\mathrm{I\text{-}URR}}$, is bounded as

\[ D_i^{\mathrm{I\text{-}URR}} \le \frac{\sigma_i}{\rho_i} + (2K - 1)\frac{L_c}{\rho_i} + \sum_{k=1}^{K} \frac{L_c}{C_k}, \tag{23} \]

where $C_k$ is the capacity of the $k$th server along the path.

2. The maximum backlog of flow $i$ in the $k$th node, $Q_{i,k}^{\mathrm{I\text{-}URR}}(t)$, is bounded as:

\[ Q_{i,k}^{\mathrm{I\text{-}URR}}(t) \le \sigma_i + 2L_c + \left\lfloor \frac{\rho_i}{C_k} \right\rfloor L_c \quad \text{for } k = 1, \tag{24} \]

\[ Q_{i,k}^{\mathrm{I\text{-}URR}}(t) \le 4L_c + \left\lfloor \frac{\rho_i}{C_k} \right\rfloor L_c \quad \text{for } k \ge 2. \tag{25} \]

Proof: The proof is straightforward from the inequality (21), Lemma 6, and the results of Stiliadis and Varma[17]. □

¹ From the proof of Lemma 6 in [18], Lemma 5 can actually be proved from $W_i^{\mathrm{URR}}(a_i^1, t) \ge \rho_i(t - a_i^1) - 2L_c$, which can be derived from (3) alone.

² As described in footnote 1, Lemma 6 is actually derived from (11) alone.

³ The formal bounds on the end-to-end delay for $\mathcal{LR}$ servers are greater than (22) and (23) by $L_c/\rho_i$; however, they can be improved using the idea presented in [17].

In practice $\rho_i < C_k$ is satisfied; therefore, the third term of (24) and the second term of (25) usually become zero. Theorem 3 indicates that I-URR significantly saves the buffer space allocated for each flow, especially for a flow with a large bucket depth $\sigma_i$.
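The buffer saving can be made concrete with a short sketch that evaluates the backlog bounds in the practical case $\rho_i < C_k$, where the rate-dependent terms vanish. The function names and the default cell size are ours; the I-URR values used are $\sigma_i + 2L_c$ at the first node and $4L_c$ at later nodes, as summarized in Table I.

```python
CELL_BITS = 424  # L_c, one ATM cell


def urr_backlog_bound(sigma_bits, k, cell_bits=CELL_BITS):
    """Backlog bound at the k-th node under URR: sigma_i + 2 k L_c."""
    return sigma_bits + 2 * k * cell_bits


def iurr_backlog_bound(sigma_bits, k, cell_bits=CELL_BITS):
    """Backlog bound at the k-th node under I-URR, assuming rho_i < C_k."""
    if k == 1:
        return sigma_bits + 2 * cell_bits
    # From the second node on, the input is already shaped to (2 L_c, rho_i).
    return 4 * cell_bits


sigma = 100 * CELL_BITS                  # bucket depth of 100 cells
for k in (1, 2, 16):
    urr = urr_backlog_bound(sigma, k) // CELL_BITS
    iurr = iurr_backlog_bound(sigma, k) // CELL_BITS
    print(f"node {k}: URR {urr} cells, I-URR {iurr} cells")
# node 1: URR 102 cells, I-URR 102 cells
# node 2: URR 104 cells, I-URR 4 cells
# node 16: URR 132 cells, I-URR 4 cells
```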

B. Fairness Properties

We show throughput fairness, introduced by Golestani[5], in order to evaluate the short-term unfairness of URR and I-URR in terms of the difference in service offered to two flows. For any two flows $i$ and $j$ that are continuously backlogged during $[\tau, t]$, the throughput fairness of server $k$, $F^k$, is a constant which satisfies

\[ \bigl| W_i^k(\tau, t)/\rho_i - W_j^k(\tau, t)/\rho_j \bigr| \le F^k. \tag{26} \]

From Lemma 3 and Lemma 4, the following theorems are straightforward.

Theorem 4: (Throughput Fairness of URR) In a network where all packets have the same length $L_c$, the throughput fairness of URR is $F^{\mathrm{URR}} = 2(L_c/\rho_i + L_c/\rho_j)$.

Theorem 5: (Throughput Fairness of I-URR) In a network where all packets have the same length $L_c$, the throughput fairness of I-URR is $F^{\mathrm{I\text{-}URR}} = 2(L_c/\rho_i + L_c/\rho_j) + L_c/C$.
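The step from the service bounds to these fairness values can be written out as follows. This is our sketch of the argument, assuming that flows $i$ and $j$ are both backlogged throughout $[\tau, t]$; it divides the upper bound (15) for flow $i$ and the lower bound (14) for flow $j$ by the respective rates and subtracts, then repeats the step with (21) and (20) for I-URR.

```latex
\begin{align*}
\frac{W_i^{\mathrm{URR}}(\tau,t)}{\rho_i}
  &\le (t-\tau) + e(\tau,t)\,\delta + \frac{2L_c}{\rho_i}, \\
\frac{W_j^{\mathrm{URR}}(\tau,t)}{\rho_j}
  &\ge (t-\tau) + e(\tau,t)\,\delta - \frac{2L_c}{\rho_j}, \\
\frac{W_i^{\mathrm{URR}}(\tau,t)}{\rho_i} - \frac{W_j^{\mathrm{URR}}(\tau,t)}{\rho_j}
  &\le 2\left(\frac{L_c}{\rho_i} + \frac{L_c}{\rho_j}\right), \\[6pt]
\frac{W_i^{\mathrm{I\text{-}URR}}(\tau,t)}{\rho_i} - \frac{W_j^{\mathrm{I\text{-}URR}}(\tau,t)}{\rho_j}
  &\le \left((t-\tau) + \frac{2L_c}{\rho_i}\right)
     - \left((t-\tau) - \delta - \frac{2L_c}{\rho_j}\right)
   = 2\left(\frac{L_c}{\rho_i} + \frac{L_c}{\rho_j}\right) + \frac{L_c}{C}.
\end{align*}
```

Exchanging the roles of $i$ and $j$ gives the same bounds for the opposite sign, which yields the absolute value in (26). The last equality uses $\delta = L_c/C$.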

We also describe another fairness measure called the worst-case fair index (WFI), introduced by Bennett and Zhang[6]. Intuitively, a backlog of size $Q_i$ of flow $i$ should be served within $Q_i/\rho_i$. The WFI indicates the worst-case discrepancy from this desired time.

A service discipline $k$ is called worst-case fair for flow $i$ if the delay of any packet $p_i^j$ of flow $i$ is bounded by

\[ d_i^{j,k} \le a_i^j + Q_i(a_i^j)/\rho_i + \omega_i^k, \tag{27} \]

where $d_i^{j,k}$ is the time that $p_i^j$ departs under the server $k$, $a_i^j$ is the time that $p_i^j$ arrives, $Q_i(a_i^j)$ is the queue size of flow $i$ at time $a_i^j$, and $\omega_i^k$ is a constant independent of the queues of the other flows sharing the server, which is called the WFI for flow $i$. The normalized WFI is defined as $\omega^k = \max_i \rho_i \omega_i^k / C$. The following theorems show the WFIs of URR and I-URR.

Theorem 6: (WFI of URR) In a network where all packets have the same size $L_c$, URR is worst-case fair for flow $i$ with the normalized WFI $\omega^{\mathrm{URR}} = L_c/C$.

Proof: Consider a maximal backlogged period for flow $i$ and let $p_i^j$, $a_i^j$, and $d_i^j$ denote the $j$th packet to depart under URR in the period, the time that $p_i^j$ arrives, and the time that $p_i^j$ departs under URR. Also, let $Q_i(a_i^j)$ and $p_i^{h(a_i^j)}$ be the queue size of flow $i$ at time $a_i^j$ including $p_i^j$ and the packet at the head of the flow $i$ queue at time $a_i^j$, where $Q_i(a_i^j) = (j - h(a_i^j) + 1)L_c$ is satisfied. We have two cases.

Case 1: If $p_i^{h(a_i^j)} = p_i^1$, from (3),

\[ d_i^j - a_i^j \le d_i^j - a_i^1 \le (j + 1)\delta_i = \bigl(Q_i(a_i^j) + L_c\bigr)/\rho_i, \tag{28} \]

where the last equality holds from $Q_i(a_i^j) = jL_c$ and $\delta_i = L_c/\rho_i$.

Case 2: If $p_i^{h(a_i^j)} \ne p_i^1$, from (4),

\[ d_i^j - a_i^j \le d_i^j - d_i^{h(a_i^j)-1} \le \bigl(j - h(a_i^j) + 2\bigr)\delta_i = \bigl(Q_i(a_i^j) + L_c\bigr)/\rho_i, \tag{29} \]

where the last equality holds from $Q_i(a_i^j) = (j - h(a_i^j) + 1)L_c$. From (28) and (29), we obtain $\omega^{\mathrm{URR}} = L_c/C$. □

Theorem 7: (WFI of I-URR) In a network where all packets have the same size $L_c$, I-URR is worst-case fair for flow $i$ with the normalized WFI $\omega^{\mathrm{I\text{-}URR}} = (L_c/C)(1 + \max_i \rho_i/C)$.

Proof: The proof is the same as that of Theorem 6 except that we use (11) and (12) instead of (3) and (4). □

IV. PERFORMANCE COMPARISON

A. Performance

In this section, we compare the performance of URR and I-URR with other disciplines in a network where all packets have the same length $L_c$. Table 1 summarizes the performance of several scheduling disciplines including URR and I-URR. In the table, $K$ is the number of switching nodes along the path of flow $i$ between end hosts. For each switching node $k$, $C_k$ and $N_k$ represent the capacity of the output link and the maximum number of flows on the link, respectively. In the table, BRR (Batch-assigned Round Robin) denotes the WRR discipline adopting the batch assignment mentioned in Section 2, and $R_k$ denotes its round size at node $k$. The performance of BRR is derived from [17]. The bottom row indicates the best value of each performance measure achieved by the scheduling disciplines proposed so far, which are given in parentheses.

We can first see from the table that all four performance measures of URR and I-URR are independent of the number of flows on the link, $N_k$, while those of BRR strongly depend on $R_k$. Since $L_c/C_k$ and $\rho_i/C_k$ are quite small in practice, the performance of I-URR is close to that of URR except for the maximum backlog. The delay bound of I-URR is larger than that of PGPS by $KL_c/\rho_i$. When $\sigma_i$ is quite large compared with $K$, this difference is not significant. Even if $\sigma_i$ is negligible, the delay bound is less than twice that of PGPS. Note that unlike the timestamp-based disciplines, the actual delay of I-URR is independent of the load on the network, but depends on how uniformly slots are assigned in a round. In the next section, we present a method to improve the delay bounds in the case that several flows are reserving the same service rate. The maximum backlog of I-URR is the small constant $4L_c$ for $k \ge 2$ regardless of the burstiness of the flow. This means that I-URR considerably reduces the buffer space allocated for guaranteed connections, so that best-effort connections can utilize a large buffer space in order to minimize their cell loss rates. Although the throughput fairness of I-URR is about twice as large as that of SCFQ, it is independent of $N_k$ and is a small constant. Furthermore, the WFI


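TABLE I

PERFORMANCE OF SEVERAL SCHEDULING DISCIPLINES

Server | End-to-end delay bound | Maximum backlog at the kth server | Throughput fairness at the kth server | Normalized WFI at the kth server
BRR | $\frac{\sigma_i}{\rho_i} + \sum_{k=1}^{K} \frac{L_c(R_k - w_i + 1)}{C_k}$ | $\sigma_i + \sum_{j=1}^{k} \frac{\rho_i L_c(R_j - w_i + 1)}{C_j}$ | $\frac{L_c}{C_k} O(R_k)$ | ...
URR | $\frac{\sigma_i}{\rho_i} + (2K - 1)\frac{L_c}{\rho_i}$ | $\sigma_i + 2kL_c$ | $2\left(\frac{L_c}{\rho_i} + \frac{L_c}{\rho_j}\right)$ | $\frac{L_c}{C_k}$
I-URR | $\frac{\sigma_i}{\rho_i} + (2K - 1)\frac{L_c}{\rho_i} + \sum_{k=1}^{K} \frac{L_c}{C_k}$ | $\sigma_i + 2L_c$ for $k = 1$; $4L_c$ for $k \ge 2$ | $2\left(\frac{L_c}{\rho_i} + \frac{L_c}{\rho_j}\right) + \frac{L_c}{C_k}$ | $\frac{L_c}{C_k}\left(1 + \max_i \frac{\rho_i}{C_k}\right)$
Best value | $\frac{\sigma_i}{\rho_i} + (K - 1)\frac{L_c}{\rho_i} + \sum_{k=1}^{K} \frac{L_c}{C_k}$ (PGPS[3], VC[4], FFQ[9], SPFQ[9], WF²Q+[7]) | $\sigma_i + kL_c$ (PGPS, VC, FFQ, SPFQ, WF²Q+) | $\frac{L_c}{\rho_i} + \frac{L_c}{\rho_j}$ (SCFQ[5], SFQ[8]) | $\frac{L_c}{C_k}$ (WF²Q+)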

of I-URR is close to that of WF2Q, which is the optimal valueof WFI[6].

Now we present numerical examples of the delay bounds of URR ($D^{\mathrm{URR}}$) and I-URR ($D^{\mathrm{I\text{-}URR}}$) in comparison with that of PGPS ($D^{\mathrm{PGPS}}$) for two cases: 1) $\sigma_i \gg L_c$ and $\rho_i \approx C_k$, and 2) $\sigma_i \approx L_c$ and $\rho_i \ll C_k$. Consider an ATM network where all links have the same capacity $C_k = 155$ Mbps. The packet size in ATM networks is a constant $L_c = 424$ bits. The number of switches between end hosts is assumed to be $K = 16$. We first assume case 1), where $\rho_i$ is half of $C_k$, or 77.5 Mbps, and $\sigma_i = 100L_c$. In this case, we obtain $D^{\mathrm{PGPS}} = 673$ μs, while $D^{\mathrm{URR}} \approx 717$ μs and $D^{\mathrm{I\text{-}URR}} \approx 760$ μs. In case 2), where $\rho_i$ is 1.5 Mbps and $\sigma_i = 5L_c$, we obtain $D^{\mathrm{PGPS}} = 5.7$ ms, while $D^{\mathrm{URR}} \approx D^{\mathrm{I\text{-}URR}} \approx 10.2$ ms.
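These figures can be reproduced from the end-to-end delay bounds of Table I. The sketch below is our own evaluation of those expressions; the function names are hypothetical, rates are in bit/s, and sizes are in bits.

```python
CELL_BITS = 424  # L_c


def delay_pgps(sigma, rho, capacities, cell=CELL_BITS):
    """sigma/rho + (K - 1) L_c/rho + sum_k L_c/C_k."""
    K = len(capacities)
    return sigma / rho + (K - 1) * cell / rho + sum(cell / c for c in capacities)


def delay_urr(sigma, rho, capacities, cell=CELL_BITS):
    """sigma/rho + (2K - 1) L_c/rho, as in bound (22)."""
    K = len(capacities)
    return sigma / rho + (2 * K - 1) * cell / rho


def delay_iurr(sigma, rho, capacities, cell=CELL_BITS):
    """The URR bound plus one slot time L_c/C_k per node, as in bound (23)."""
    return delay_urr(sigma, rho, capacities, cell) + sum(cell / c for c in capacities)


links = [155e6] * 16                                  # K = 16 links of 155 Mbps
case1 = dict(sigma=100 * CELL_BITS, rho=77.5e6)
case2 = dict(sigma=5 * CELL_BITS, rho=1.5e6)

for bound in (delay_pgps, delay_urr, delay_iurr):
    d1 = bound(capacities=links, **case1)
    d2 = bound(capacities=links, **case2)
    print(f"{bound.__name__}: {d1 * 1e6:.0f} us, {d2 * 1e3:.1f} ms")
# delay_pgps: 673 us, 5.7 ms
# delay_urr: 717 us, 10.2 ms
# delay_iurr: 760 us, 10.2 ms
```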

B. Complexity

In Table 2, we summarize the time complexity of selecting a packet to be served for several scheduling disciplines. From the table, BRR and I-URR have the lowest complexity of O(1). The performance of BRR, however, gets worse as $N_k$ increases, as shown in Table 1. Thus I-URR is the only discipline which has O(1) complexity while providing end-to-end delay and fairness bounds that are independent of $N_k$. I-URR, however, requires the calculation of the slot assignment; therefore, we evaluate its complexity taking the slot assignment into consideration.

We refer to the set of service rates reserved by flows, $(\rho_1, \rho_2, \ldots, \rho_N)$, as a flow state on the link. The slot assignment is done when the flow state on the link is changed, e.g. when new connections are established. Since the set of connections varies over time, the slot assignment must be done repeatedly as long as the flow state changes. There is a certain time lag between the time that a request for a connection setup is accepted and the time that the first cell of the connection can be served. In the worst case, the time lag is estimated at twice as long as the time that the uniform slot assignment takes. Like the timestamp-based disciplines, the uniform slot assignment is achieved using a priority queue. In the algorithm, insertion and deletion operations are repeated $R$ times. If each operation has a complexity of O(log N), the complexity of the uniform slot assignment is O(R log N). Supposing that the insertion and the deletion operations take a time corresponding to $n$ slots, the algorithm takes $n$ times as long as the length of a round. Note that to implement timestamp-based scheduling, the value of $n$ must be one. For instance, if the minimum bandwidth that can be reserved is 64 kbps and $L_c = 424$ bits, the length of a round is about 6.6 ms regardless of the link speed. Thus for $n = 5$, the uniform assignment takes 33 ms and the time lag becomes 66 ms. Note that when a new connection is set up, ATM switches perform admission control and routing as well as signaling for the connection setup. We believe the computational complexity of the slot assignment is not significant compared with these processes.
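The round length and time lag quoted above can be checked with a few lines. The sketch assumes that one slot per round corresponds to the minimum reservable bandwidth, so that $R = C/\rho_{\min}$ and the round lasts $R \cdot L_c/C = L_c/\rho_{\min}$, which is why the link speed cancels out. The function names are ours.

```python
def round_length_seconds(min_rate_bps, cell_bits=424):
    """Length of a round when one slot per round equals the minimum rate."""
    return cell_bits / min_rate_bps


def setup_time_lag_seconds(min_rate_bps, n, cell_bits=424):
    """Worst-case lag: twice the time needed to compute the slot assignment,
    where each of the R steps costs the equivalent of n slot times."""
    assignment_time = n * round_length_seconds(min_rate_bps, cell_bits)
    return 2 * assignment_time


print(f"{round_length_seconds(64e3) * 1e3:.1f} ms")       # 6.6 ms
print(f"{setup_time_lag_seconds(64e3, 5) * 1e3:.0f} ms")  # 66 ms
```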

V. HIERARCHICAL WRR DISCIPLINE

In this section, we consider a way of hierarchically organizing the WRR servers[7][19], using I-URR as the root server. The purposes of introducing hierarchical WRR (H-WRR) are: 1) to present an efficient policy for idle slots, and 2) to improve the end-to-end delay bound.

A. Policy for idle slots

I-URR can meet various QoS requirements by properly choosing the policy for idle slots. Since connections with guaranteed QoS are expected to have no congestion in the network, we prefer to assign the idle slots to best-effort connections. Typically, ATM connections are grouped into four classes based on their traffic characteristics: constant bit rate (CBR), variable bit rate (VBR), available bit rate (ABR), and unspecified bit rate (UBR)[21]. A connection that belongs to CBR or VBR requires guaranteed bandwidth and bounded delay. We refer to a flow of a connection that belongs to either the ABR or UBR class as a best-effort

TABLE II

COMPARISON OF COMPLEXITY

Server | Complexity
PGPS, WF²Q, URR | O(N)
VC, SCFQ, SFQ, FFQ, SPFQ, WF²Q+ | O(log N)
LFVC | O(log log N)
BRR, I-URR | O(1)


flow that should be assigned the idle slots, where "best-effort" does not mean no QoS guarantees. Since ABR connections with minimum cell rate (MCR) > 0 require guaranteed bandwidth, the server must serve each of them at a rate greater than its MCR.

We consider the aggregated flow consisting of all of the best-effort flows to be a super flow $b$. The I-URR server allocates the aggregated service rate (slots) to super flow $b$, where the allocated service rate must be greater than the sum of the MCRs of the ABR connections. When the I-URR server encounters either a slot assigned to super flow $b$ or an idle slot, the slot is assigned to a flow selected by a second-level server from the flows belonging to super flow $b$. Since the second-level server should be implemented with low complexity, and the best-effort flows are tolerant of delay, we can employ BRR as the second-level server.
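The idle-slot policy can be sketched as a two-level selection. The code below is an illustration under simplifying assumptions, not the authors' implementation: there are no arrivals during the run, the root sequence marks slots of super flow $b$ with the string `"b"`, and the second-level server is a plain round robin with one slot per best-effort flow (the simplest batch assignment). The linear scan for a backlogged best-effort flow is kept for brevity; a real server would track the backlogged flows in a list. All names are hypothetical.

```python
def run_hwrr(root_seq, guaranteed, best_effort, cell_times):
    """Return the flow served in each cell time (None if nothing is queued).

    root_seq    -- I-URR service sequence; entries are guaranteed flow ids
                   or "b" for the best-effort super flow
    guaranteed  -- dict: guaranteed flow id -> queued cells
    best_effort -- dict: best-effort flow id -> queued cells
    """
    guaranteed = dict(guaranteed)
    best_effort = dict(best_effort)
    be_order = list(best_effort)      # service order of the second-level server
    be_pos = 0
    sent = []
    for t in range(cell_times):
        owner = root_seq[t % len(root_seq)]
        if owner != "b" and guaranteed.get(owner, 0) > 0:
            guaranteed[owner] -= 1
            sent.append(owner)
            continue
        # Slot of super flow b, or an idle slot: ask the second-level server.
        chosen = None
        for step in range(len(be_order)):
            cand = be_order[(be_pos + step) % len(be_order)]
            if best_effort[cand] > 0:
                chosen = cand
                be_pos = (be_pos + step + 1) % len(be_order)
                break
        if chosen is not None:
            best_effort[chosen] -= 1
        sent.append(chosen)
    return sent


# Flow 1 has one cell queued, flow 2 none; x and y are best-effort flows.
print(run_hwrr([1, "b", 1, 2], {1: 1, 2: 0}, {"x": 2, "y": 2}, 4))
# [1, 'x', 'y', 'x']
```

In the example, the second and fourth slots of the guaranteed flows are idle and are handed to the best-effort flows instead of being wasted.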

B. Improvement of End-to-end Delay Bound

Typically, flows of the same application require the same amount of bandwidth. Consider a situation where $M$ flows are reserving the same service rate of $\rho$ (bit/sec) on a link. In this case, instead of assigning the slots to each flow, we consider the set of $M$ flows to be a super flow $s$ and assign the aggregated slots to super flow $s$. When the I-URR server encounters a slot assigned to super flow $s$, the slot is assigned to a flow belonging to super flow $s$ in the round robin manner. This algorithm improves the end-to-end delay bounds of the $M$ flows as follows.

Consider a maximal backlogged period $[\tau, t]$ for flow $i$ belonging to super flow $s$. Also, let $[\tau_s, t_s]$ be the maximal backlogged period of super flow $s$ with $\tau_s \le \tau < t \le t_s$. A super flow is called backlogged if at least one flow belonging to the super flow is backlogged. Let $p_i^k$ ($p_s^k$), $a_i^k$ ($a_s^k$), and $d_i^k$ ($d_s^k$) denote the $k$th cell of flow $i$ (super flow $s$) to depart under I-URR in the period $[\tau, t]$ ($[\tau_s, t_s]$), the time that $p_i^k$ ($p_s^k$) arrives, and the time that $p_i^k$ ($p_s^k$) departs under I-URR, respectively. Let $d_s^j$ be the maximal departure time that satisfies $d_s^j \le a_i^1$; if no such $d_s^j$ exists, we set $d_s^j = d_s^0 \triangleq a_s^1$. Since flows in super flow $s$ are served in the round robin manner, in the backlogged period for flow $i$, at least one flow $i$ cell is served every $M$ cell transmissions of super flow $s$. Therefore, from (11) and (12),

$$d_i^n - a_i^1 \le d_s^{j+nM} - d_s^j \le (nM+1)\frac{L_C}{M\rho} + \frac{L_C}{C} = \left(n + \frac{1}{M}\right)\frac{L_C}{\rho} + \frac{L_C}{C}. \qquad (30)$$

This inequality results in a latency of at most $(1 + 1/M) L_C/\rho_i + L_C/C$ for flow $i$. Therefore, the end-to-end delay of flow $i$ is bounded by

If $M$ is sufficiently large, this bound is close to that of PGPS. Note that this improvement can be applied to URR in the same way. Since the round robin server is implemented with O(1) complexity, H-WRR still maintains O(1) complexity. Fig. 4 shows an example of the structure of H-WRR presented in this section. In the figure, each leaf node corresponds to a queue,

Fig. 4. A structure of the hierarchical WRR.

Fig. 5. Simulation model. Host 1 sends flows 1-5 through switches A, B, C, and D to host 2; 10 additional flows share the output link of each switch (flows 6-15 at A, flows 16-25 at B, flows 26-35 at C, and flows 36-45 at D).

where two queues are added at best-effort node B, one for UBR connections and the other for a subclass of ABR connections (with MCR = 0) [13]. All CBR connections are assumed to be reserving the same bandwidth, so that they are served in the round robin manner at node A.
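To give a feel for the latency expression $(1 + 1/M) L_C/\rho + L_C/C$ of Section V-B, the following Python sketch evaluates it for several group sizes $M$. The function name `hwrr_latency` is hypothetical; the 53-byte cell, the 155 Mbps link, and the 5% reservation are taken from the simulation setting of Section VI and serve only as example inputs.

```python
def hwrr_latency(cell_bits, rate_bps, link_bps, group_size):
    """Latency (1 + 1/M) * L_C / rho + L_C / C, in seconds."""
    return (1 + 1 / group_size) * cell_bits / rate_bps + cell_bits / link_bps


CELL_BITS = 53 * 8          # L_C: one ATM cell
LINK_BPS = 155e6            # C: link capacity
RATE_BPS = 0.05 * LINK_BPS  # rho: each flow reserves 5% of the link

for m in (1, 5, 100):
    # The first term shrinks from 2 * L_C / rho (M = 1) toward L_C / rho.
    print(m, hwrr_latency(CELL_BITS, RATE_BPS, LINK_BPS, m))
```

The printed values decrease monotonically in $M$, which is the effect exploited by aggregating equal-rate flows into a super flow.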

VI. SIMULATION RESULTS

In this section, we present simulation experiments to illustrate the delay properties of I-URR and the improvement of delay achieved by H-WRR. We compare the delay properties of I-URR with those of BRR and SCFQ. Fig. 5 shows our simulation setting, where five flows, each of which is called flow i (1 ≤ i ≤ 5), are established from host 1 to host 2 through four ATM switches A, B, C, and D. The four ATM switches implement the same scheduling discipline in their output buffers, where the buffer size is adequate for achieving no cell loss for all flows. Each of the five flows generates cells at a rate of 5% of the link capacity, where the link capacity is assumed to be 155 Mbps. At each ATM switch, flows 1-5 share the output link with the other 10 flows, each of which generates cells at an average rate selected randomly under the condition that the sum of the rates over the 10 flows is equal to 75% of the link capacity. In every switch, each flow reserves the same amount of bandwidth as its average rate. Thus the bandwidth of each link is fully reserved by the flows.
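The text does not state how the random background rates are drawn. The following Python sketch shows one possible way, under the assumption that rates are whole percentages of the link capacity with a minimum of 1%; the function name `background_rates_percent` and the cut-point method are illustrative choices, not the procedure used in the experiments.

```python
import random


def background_rates_percent(num_flows=10, total_percent=75, seed=0):
    """Split total_percent into num_flows positive integer percentages."""
    rng = random.Random(seed)
    # Choose distinct cut points inside (0, total_percent); the gaps between
    # consecutive cut points are the per-flow rates, each at least 1%.
    cuts = sorted(rng.sample(range(1, total_percent), num_flows - 1))
    bounds = [0] + cuts + [total_percent]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


foreground = [5] * 5                    # flows 1-5, 5% each
background = background_rates_percent() # the other 10 flows at one switch
total_reserved = sum(foreground) + sum(background)
print(background, total_reserved)
```

With any seed, the five foreground flows (25%) and the ten background flows (75%) reserve the whole link, as stated above.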

In the figure, each of flows 6-45 is an on/off source that continuously generates cells at the link rate during an on-period and generates no cell during an off-period. The number of cells generated during an on-period is randomly selected between 1 and 30, and the length of an off-period is determined by the number of cells generated in the previous on-period so that the on/off source conforms to a leaky bucket process with parameters (30 $L_C$, average rate). Each simulation runs for $10^5$ slots (about 270 ms). For lack of space, we present only a few experiments which typically show the properties of I-URR and H-WRR.
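The on/off source described above can be sketched in Python as follows. This is one possible reading, stated as an assumption: after a burst of n cells the source stays silent until the whole cycle lasts at least n divided by the average rate, so that a token bucket of depth 30 cells refills before the next burst. The name `on_off_cycles` is hypothetical, and the rate is passed as a `Fraction` to avoid rounding errors.

```python
import math
import random
from fractions import Fraction


def on_off_cycles(avg_rate, num_cycles, max_burst=30, seed=0):
    """Yield (on_slots, off_slots) pairs for one on/off source.

    avg_rate is the average rate as a fraction of the link capacity,
    given as a Fraction with 0 < avg_rate <= 1.
    """
    rng = random.Random(seed)
    for _ in range(num_cycles):
        burst = rng.randint(1, max_burst)    # cells sent back to back
        cycle = math.ceil(burst / avg_rate)  # slots needed to earn `burst` tokens
        yield burst, cycle - burst


SLOT_SECONDS = 53 * 8 / 155e6  # time to send one cell at 155 Mbps
SIM_SECONDS = 10**5 * SLOT_SECONDS

for on_slots, off_slots in on_off_cycles(Fraction(5, 100), 3):
    print(on_slots, off_slots)
print(SIM_SECONDS)
```

`SIM_SECONDS` evaluates to roughly 0.27 s, which agrees with the simulation length quoted above.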


Fig. 6. Cell delay of a constant bit rate source (horizontal axis: time in ms).

Fig. 7. Comparison between I-URR and H-WRR (horizontal axis: time in ms).

A. Constant Bit Rate Sources

We assume that each of flows 1-5 is a constant bit rate source. Fig. 6 shows the cell delay of flow 1 at host 2 under I-URR, BRR, and SCFQ. The round size of BRR is set to 100, since we assume the minimum service rate is 1% of the link capacity. From the figure, the cell delays under I-URR and SCFQ are much smaller than that of BRR. The variation of cell delay under I-URR is periodic and smaller than that of SCFQ, since the cell delay under I-URR in each switch depends only on when the first cell of the flow arrives in a round. Fig. 7 shows the cell delay of each of flows 1-5 under I-URR. In the figure, we also show results of H-WRR where all of flows 1-5 are aggregated into a super flow and served in the round robin manner as described in the previous section. Since under H-WRR flows 1-5 share the service rate of 25% of the link capacity, not only the maximum delay but also the average delay of each flow is significantly improved, as shown in the figure.

B. On/off sources

Next, we assume each of flows 1-5 is an on/off source as described above. Fig. 8 shows the cell delay of flow 1 at host 2 for I-URR, BRR, and SCFQ. The cell delay under I-URR is rather large compared with SCFQ. However, the output traffic of each flow is shaped under I-URR, so that the queue lengths of each flow at switches B, C, and D become much smaller than those of SCFQ and BRR. For instance, Fig. 9 shows the queue length of flow 1 at switch B, where the queue length of I-URR is at most one or two cells, while those of BRR and SCFQ reach about 10 cells. This implies that I-URR can achieve no cell loss even if only a few buffers are available for the flow. Note that the queue lengths of BRR and SCFQ increase as the burstiness of

Fig. 8. Cell delay of an on/off source (curves: I-URR, BRR, and SCFQ for flow 1; horizontal axis: time in ms).

Fig. 9. Queue length at switch B (curves: I-URR, BRR, and SCFQ for flow 1; horizontal axis: time in ms).

the source increases, while that of I-URR is constant, as given by the analysis.

VII. CONCLUDING REMARKS

In this paper, we proposed and analyzed two new variants of the WRR discipline, URR and I-URR, both of which adopt the uniform assignment algorithm to determine the service sequence in a round. The main advantage of both algorithms compared with timestamp-based algorithms such as PGPS is that they require neither timestamp calculation nor sorting of packets according to their timestamps. I-URR successfully guarantees the delay and fairness bounds with O(1) time complexity per packet. Recently, several works have sought to improve the implementation efficiency (both algorithmic and hardware) of timestamp-based scheduling [9][10][20]. Their complexities, however, still increase with N. I-URR is applicable to high-speed ATM networks where timestamp-based scheduling cannot be implemented. Note that in an S x S nonblocking output-buffered ATM switch, a maximum of S cells arrive at an output buffer in one cell transmission cycle. Thus the complexity of enqueueing is much more significant than that of dequeueing. Since both URR and I-URR require no particular operations when arriving cells are queued, both algorithms easily support switches with a large number of input ports.
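The remark on enqueueing cost can be illustrated with the following Python sketch, which contrasts the two arrival paths. Both classes are hypothetical and simplified: `PerFlowFifo` models the append-only enqueue assumed for URR and I-URR, while `TimestampQueue` models a generic timestamp-sorted queue and does not correspond to any specific scheduler cited here.

```python
import heapq
from collections import deque


class PerFlowFifo:
    """Arrival path without timestamps: append to the flow's own FIFO."""

    def __init__(self):
        self.queues = {}

    def enqueue(self, flow_id, cell):
        # Constant-time append; no timestamp is computed and nothing is sorted.
        self.queues.setdefault(flow_id, deque()).append(cell)


class TimestampQueue:
    """Arrival path of a generic timestamp-based scheduler (illustrative)."""

    def __init__(self):
        self.heap = []
        self.seq = 0  # tie-breaker so equal timestamps keep arrival order

    def enqueue(self, timestamp, cell):
        # The heap insertion costs O(log n) in the number of queued cells.
        heapq.heappush(self.heap, (timestamp, self.seq, cell))
        self.seq += 1


# Up to S cells may arrive in one cell time at an output buffer (here S = 4).
fifo = PerFlowFifo()
for port in range(4):
    fifo.enqueue("flow1", ("cell", port))
```

Because the arrival path must run up to S times per cell time, keeping it free of sorting matters more than the cost of the single dequeue in the same interval.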

As mentioned above, however, there are two drawbacks: (1) the end-to-end delay bound is greater than that of PGPS by $K L_C/\rho_i$, and (2) there is a certain time lag between the time the flow state is changed and the time the change is reflected in the actual scheduling. To improve the delay bound and efficiency of I-URR, we also presented a way of organizing different WRR disciplines in a hierarchical structure, using I-URR as its root


server. If several connections are reserving the same bandwidth, their delay bounds achieved by H-WRR are close to those of PGPS, while H-WRR still maintains O(1) complexity.

ACKNOWLEDGMENTS

The authors would like to thank Mr. K. Nishimura of Hiroshima University for his helpful guidance. They also wish to thank Professor M. Yamashita, Mr. J. N. Rodríguez, and Mr. N. Johnson of Hiroshima University for their helpful advice. They would also like to thank the anonymous referees for their insightful comments.

REFERENCES

[1] A. Demers, S. Keshav, and S. Shenker, "Analysis and simulation of a fair queueing algorithm," in Proc. ACM SIGCOMM'89, pp. 1–12, 1989.

[2] Abhay K. Parekh and Robert G. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: The single-node case," IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 344–357, 1993.

[3] Abhay K. Parekh and Robert G. Gallager, "A generalized processor sharing approach to flow control in integrated services networks: The multiple node case," IEEE/ACM Trans. Networking, vol. 2, no. 2, pp. 137–150, 1994.

[4] L. Zhang, "VirtualClock: A new traffic control algorithm for packet switching networks," ACM Trans. Comput. Syst., vol. 9, pp. 101–124, 1991.

[5] S. Jamaloddin Golestani, "A self-clocked fair queueing scheme for broadband applications," in Proc. IEEE INFOCOM'94, pp. 636–646, 1994.

[6] Jon C. R. Bennett and Hui Zhang, "WF2Q: Worst-case fair weighted fair queueing," in Proc. IEEE INFOCOM'96, pp. 120–128, 1996.

[7] Jon C. R. Bennett and Hui Zhang, "Hierarchical packet fair queueing algorithms," IEEE/ACM Trans. Networking, vol. 5, no. 5, pp. 675–689, 1997.

[8] Pawan Goyal, Harrick M. Vin, and Haichen Cheng, "Start-time fair queueing: A scheduling algorithm for integrated services packet switching networks," IEEE/ACM Trans. Networking, vol. 5, no. 5, pp. 690–704, 1997.

[9] Dimitrios Stiliadis and Anujan Varma, "Efficient fair queueing algorithms for packet-switched networks," IEEE/ACM Trans. Networking, vol. 6, no. 2, pp. 175–185, 1998.

[10] Subhash Suri, George Varghese, and Girish Chandranmenon, "Leap forward virtual clock: A new fair queuing scheme with guaranteed delays and throughput fairness," in Proc. IEEE INFOCOM'97, 1997.

[11] M. Shreedhar and George Varghese, "Efficient fair queuing using deficit round-robin," IEEE/ACM Trans. Networking, vol. 4, no. 3, pp. 375–385, 1996.

[12] Manolis Katevenis, Stefanos Sidiropoulos, and Costas Courcoubetis, "Weighted round-robin cell multiplexing in a general-purpose ATM switch chip," IEEE J. Select. Areas Commun., vol. 9, no. 8, pp. 1265–1279, 1991.

[13] Anthony Hung and George Kesidis, "Bandwidth scheduling for wide-area ATM networks using virtual finishing times," IEEE/ACM Trans. Networking, vol. 4, no. 1, pp. 49–54, 1996.

[14] Charles R. Kalmanek, Hemant Kanakia, and Srinivasan Keshav, "Rate controlled servers for very high-speed networks," in IEEE Global Telecommunications Conference, 1990.

[15] Alon Itai and Zvi Rosberg, "Golden ratio control policy for a multiple-access channel," IEEE Trans. Automatic Control, vol. AC-29, no. 8, pp. 712–718, 1984.

[16] Norio Matsufuru, Kouji Nishimura, and Reiji Aibara, "A study on performance improvement for weighted round-robin scheduler," Technical Report of IEICE, CQ98-7, 1998.

[17] Dimitrios Stiliadis and Anujan Varma, "Latency-rate servers: A general model for analysis of traffic scheduling algorithms," in Proc. IEEE INFOCOM'96, pp. 111–119, 1996.

[18] Dimitrios Stiliadis and Anujan Varma, "Latency-rate servers: A general model for analysis of traffic scheduling algorithms," U.C. Santa Cruz, Tech. Rep. UCSC-CRL-95-38, July 1995, http://www.cse.ucsc.edu/research/hsnlab/publications/.

[19] Sally Floyd and Van Jacobson, "Link-sharing and resource management models for packet networks," IEEE/ACM Trans. Networking, vol. 3, no. 4, 1995.

[20] H. Jonathan Chao, Hsiling Cheng, Yau-Ren Jenq, and Daein Jeong, "Design of a generalized priority queue manager for ATM switches," IEEE J. Select. Areas Commun., vol. 15, no. 5, pp. 867–880, 1997.

[21] The ATM Forum, ATM User-Network Interface (UNI) Specification Version 3.1. Upper Saddle River, NJ: Prentice-Hall, 1995.

APPENDIX

I. PROOF OF THEOREM 1

Proof: We first prove property 1. From the definition of $E_j$ and $w_i/r_i = R$, the number of slots assigned to flow $i$ cannot exceed $w_i$. Therefore, if $E_j \neq \emptyset$ holds for any slot $j$ ($0 \le j \le R-1$), $w_i$ slots are assigned to flow $i$ for all flows. We prove the result by contradiction. Assume that $E_j = \emptyset$ holds for a slot $j$. From the definition of $E_j$, $v_i^{j-1}/r_i > j$ holds for any flow $i$. By multiplying both sides by $r_i$ and then summing over $i$, we have $\sum_i v_i^{j-1} > j$. The left side represents the total number of slots assigned between slot 0 and slot $j-1$; therefore, it must be equal to $j$. Thus the above inequality does not hold. Consequently, $E_j \neq \emptyset$ holds for any slot $j$.

Next, we prove property 2. For any slot $s$, (2) can be written as

$$\frac{v_{f_s}^{s} - 1}{r_{f_s}} \le s < \frac{v_{f_s}^{s}}{r_{f_s}}. \qquad (32)$$

We prove the above inequality. The left side of the inequality is satisfied as follows. Since $f_s \in E_s$, $(v_{f_s}^{s} - 1)/r_{f_s} = v_{f_s}^{s-1}/r_{f_s} \le s$. Next, we show the right side of the inequality. Let $m$ be the largest integer that satisfies $0 \le m \le s-1$ and $v_{f_m}^{m}/r_{f_m} > v_{f_s}^{s}/r_{f_s}$. We have two cases.

Case 1: There is no such $m$. In this case,

$$\frac{v_{f_s}^{s}}{r_{f_s}} = \max_{0 \le j \le s} \frac{v_{f_j}^{j}}{r_{f_j}} = \max_{1 \le i \le N} \frac{v_i^{s}}{r_i}. \qquad (33)$$

Here the following lemma is useful.

Lemma A1: If both $\sum_{i=1}^{N} r_i \le 1$ and $\sum_{i=1}^{N} l_i = x$ are satisfied, $\max_{1 \le i \le N} (l_i/r_i) \ge x$ holds.

Proof: We prove the lemma by contradiction. Assume that for any $i$, $l_i/r_i < x$. By multiplying both sides by $r_i$ and then summing over $i$, we obtain $\sum_i l_i < x \cdot \sum_i r_i \le x$. This inequality contradicts the assumption $\sum_i l_i = x$. $\Box$

From (33), Lemma A1, and $\sum_i v_i^{s} = s + 1$,

$$\frac{v_{f_s}^{s}}{r_{f_s}} \ge s + 1 > s. \qquad (34)$$

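Case 2: There is such an $m \ge 0$. In this case,

$$\frac{v_{f_m}^{m}}{r_{f_m}} > \frac{v_{f_s}^{s}}{r_{f_s}} \ge \frac{v_{f_j}^{j}}{r_{f_j}} \quad \text{for } m+1 \le j \le s-1. \qquad (35)$$

Let $l_i$ be the number of slots assigned to flow $i$ between slot $m+1$ and slot $s$, that is, $l_i = v_i^{s} - v_i^{m}$, and let $H$ be the set of flows $i$ with $l_i \neq 0$, where $\sum_{i \in H} l_i = s - m$ holds. From (35), no flow $i \in H$ is included in $E_m$. That is, $\min_{i \in H} (v_i^{m-1}/r_i) > m$.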

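Consequently,

$$\frac{v_{f_s}^{s}}{r_{f_s}} = \max_{i \in H} \frac{v_i^{m-1} + l_i}{r_i} > m + \max_{i \in H} \frac{l_i}{r_i}. \qquad (36)$$

Since $\sum_{i \in H} r_i \le 1$ and $\sum_{i \in H} l_i = s - m$, $\max_{i \in H} (l_i/r_i) \ge s - m$ holds from Lemma A1. Thus,

$$\frac{v_{f_s}^{s}}{r_{f_s}} > m + (s - m) = s. \qquad (37)$$

$\Box$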
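The two properties proved above can also be checked numerically. The following Python sketch builds one round and asserts both properties for every slot. It rests on an assumed reading of the assignment rule, which is defined earlier in the paper and not restated in this appendix: flow $i$ is eligible at slot $j$ if $v_i^{j-1}/r_i \le j$, and the eligible flow with the smallest $(v_i^{j-1} + 1)/r_i$ is served, with ties going to the lowest index. The name `uniform_assignment` is hypothetical, flows are indexed from 0, and the weights are assumed to sum to the round size $R$.

```python
from fractions import Fraction


def uniform_assignment(weights):
    """Return the service sequence of one round of R = sum(weights) slots."""
    R = sum(weights)
    served = [0] * len(weights)  # v_i: slots given to flow i so far
    sequence = []
    for j in range(R):
        # r_i = w_i / R, so v_i / r_i <= j is equivalent to v_i * R <= j * w_i.
        eligible = [i for i, w in enumerate(weights) if served[i] * R <= j * w]
        if not eligible:
            raise RuntimeError("no eligible flow at slot %d" % j)
        # Smallest (v_i + 1) / w_i first; min() keeps the lowest index on ties.
        chosen = min(eligible, key=lambda i: Fraction(served[i] + 1, weights[i]))
        served[chosen] += 1
        # Property 2, inequality (32): (v - 1) / r <= j < v / r.
        assert (served[chosen] - 1) * R <= j * weights[chosen] < served[chosen] * R
        sequence.append(chosen)
    # Property 1: every flow receives exactly w_i slots in the round.
    assert served == list(weights)
    return sequence


print(uniform_assignment([1, 2, 3]))
```

For weights (1, 2, 3) the sketch yields the sequence 2, 1, 2, 0, 1, 2, in which the slots of each flow are spread over the round rather than served back to back.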
