BLACK BOX MODELLING OF CONGESTION
CONTROL PROTOCOLS FOR COMPUTER
NETWORKS
S. Ravi Jagannathan
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
University of Western Sydney
August 2009
© Copyright by S. Ravi Jagannathan 2009
All Rights Reserved
DEDICATION
This thesis is dedicated to my wife Deepa and to my daughter Priyanka. Their support
and encouragement has been critical to the completion of this project.
ACKNOWLEDGEMENTS
I would like to thank my principal supervisor, Dr. Kenan Matawie at the School of
Computing and Mathematics of the University of Western Sydney, for his incredible
support, understanding, empathy, constant encouragement and great patience. His
guidance has been invaluable in leading me through my thesis research. I also
would like to deeply thank my wife Deepa and my daughter Priyanka
who have provided emotional support throughout my studies in Australia. Without
them, I would not have had the courage to carry this project through to completion.
TABLE OF CONTENTS
Abstract
Acknowledgements
1. INTRODUCTION 1
1.1 Thesis Overview 1
1.2.1 Tiered Architecture 2
1.2.2 LANs & WANs 3
1.2.3 Internetworking 6
1.2.4 Black Box vs. White Box 8
1.2.5 Stochastic Modeling 9
1.2.6 Quantitative Modeling & Simulation in Software 10
2. INTERNETWORKING ISSUES 11
2.1 Introduction 11
2.2 Overview of Internetworking Concepts 11
2.3 Switching Overview 13
2.4 The Tiered Approach 16
2.5 Evaluating Backbone Capabilities 17
2.5.1 Path Optimization 18
2.5.2 Traffic Prioritizing 18
2.5.3 Load Splitting 21
2.5.4 Alternative Paths 21
2.5.5 Encapsulation (tunnels) 22
2.6 Distribution Services 22
2.6.1 Backbone Bandwidth Management 22
2.6.2 Area and Service Filtering 23
2.6.3 Policy-Based Distribution 23
2.6.4 Inter-Protocol Route Redistribution 24
2.6.5 Media Translation 24
2.7 Local Access Services 25
2.7.1 Value Added Addressing 25
2.7.2 Network Segmentation 26
2.7.3 Broadcast vs. Multicast 26
2.7.4 Naming, Proxy & Local Cache 27
2.7.5 Media Access Security 28
2.7.6 Router Discovery 28
2.7.7 ICMP 28
2.7.8 Proxy ARP 29
2.7.9 RIP 29
2.8 Constructing Internets By Design 29
2.9 Using Switches (Revisited) 30
2.9.1 Switches vs. Routers 30
2.9.2 Key Issues 31
3. NETWORK PERFORMANCE CHARACTERISTICS 32
3.1 Introduction 32
3.2 Frame Operations 33
3.2.1 Ethernet Frames 33
3.2.2 Fast Ethernet Frames 38
3.2.3 Gigabit Ethernets 39
3.2.4 Frame Overhead 41
3.3 Availability Levels 41
3.4 Network Traffic Estimation 43
3.5 An Excursion into Queuing theory 46
3.5.1 Buffer Memory Considerations 47
3.6 Ethernet Performance Details 49
3.6.1 Network Frame Rate 50
3.6.2 GE Considerations 51
3.6.3 Actual Operating Rate 52
3.7 Bridging a Network 52
4. ISSUES AT THE NETWORK, TRANSPORT AND APPLICATION LAYERS 54
4.1 Internetworking Overview 54
4.2 Protocol Architecture 57
4.3 Design Issues 58
4.3.1 Addressing 58
4.3.2 Routing 59
4.3.3 Datagram Lifetime 60
4.3.4 Fragmentation/ Reassembly 60
4.4 Routing and Route Protocols 61
4.5 Routing Revisited 62
4.5.1 Routing Protocols 65
4.5.2 DV Protocols 67
4.5.3 LS Protocols 69
4.6 Excursion into the Transport Layer 71
4.7 Multimedia Service 72
4.8 Delay Calculations 74
4.8.1 10/100/1000 Mbps Ethernets 74
4.8.2 Switches 75
5. ETHERNET LANs REVISITED 77
5.1 Introduction 77
5.2 Transmission Media 78
5.2.1 Twisted Pair Comes in Two Varieties 79
5.2.2 Coaxial Cable 80
5.2.3 Optical Fibre Cable 81
5.3 An Excursion into the Ethernet Family 84
5.3.1 10 Mbps LAN 85
5.3.2 Fast Ethernet (100 Mbps) 87
5.3.3 Gigabit Ethernet (1000 Mbps) 90
5.3.4 10 Gigabit Ethernet 94
5.4 LAN Ethernet Design 97
5.4.1 Campus-Wide VLANs with Multilayer Switching 99
5.5 Switches Revisited 100
5.5.1 Scalability, Latency, Global Effect of Failures/Collisions 101
5.5.2 Encoding Schemes 101
6. BLACK BOX CONGESTION CONTROL 106
6.1 The Basic Problem 106
6.2 Black Box Approach Described 108
6.3 TCP Tahoe & Reno 109
6.4 ACK’ing & ACK Clocking 111
6.5 TCP New Reno 113
6.6 Sack & D-Sack 114
6.7 Fack 115
6.8 Limited Transit 115
6.9 TCP Vegas 116
6.10 Sierra 117
6.11 TCP Friendly Rate Control 122
6.12 Mo-Walrand Algorithm 124
6.13 Packet Pair 124
6.14 Balakrishnan & Seshan’s Congestion Manager 125
6.15 The “Goodness” of any Black Box Solution 126
7. STOCHASTIC MODELLING OF CONGESTION CONTROL ALGORITHMS 127
7.1 Motivation 127
7.2 Introduction 127
7.3 TCP/IP Stack Overview 128
7.4 Common Algorithms 130
7.5 A New Approach to Constrained Optimization 136
7.5.1 Introduction 136
7.5.2 Lagrange – Kuhn – Tucker Sufficiency 141
7.5.3 Penalty Methods 142
7.5.4 Exact Penalty Methods 145
7.5.5 Barrier Methods 146
7.5.6 Utility Functions 148
7.6 Stochastic Models 149
7.6.1 Deterministic Limits 149
7.6.2 Per Source Dynamics 150
7.6.3 Explicit Utility Feedback 151
7.7 Stochastic Models (Concluded) 152
7.7.1 Queue - Width Marking 153
8. QUANTITATIVE MODELING AND SOFTWARE SIMULATION 156
8.1 A Quick Review 156
8.1.1 Tahoe/ Reno 156
8.1.2 SACK 158
8.1.3 New Reno 158
8.1.4 TCP Vegas 158
8.1.5 Sierra 160
8.2 Quantitative Modeling 161
8.2.1 Modeling Tahoe/ Reno 162
8.2.2 Modeling Vegas 164
8.2.3 Modeling Sierra 165
8.3 The Simulation - in - Software Project 168
8.3.1 NS 2 168
8.3.2 Opnet Simulator Tool Suite 171
8.3.3 Simulation Theoreticals 174
8.4 Sierra Simulation Project 177
8.5 Conclusions 182
LIST OF TABLES
3.1 Frame Overhead 42
3.2 Probabilities 48
3.3 Ethernet Frame Processing (Frames/Sec) 51
4.1 Major Network Layer Protocols 64
4.2 IP Addressing Overview 65
4.3 The Major Distance Vector Protocols 69
4.4 The Major Link State Protocols 71
4.5 Some Ethernet Delay Calculations 74
LIST OF FIGURES
2.1 Internetworking Scenario 12
2.2 Routers and Switches 13
2.3 Flow of Inter-subnet traffic with Layer 3 Switching 16
2.4 Priority Queuing 19
2.5 Custom Queuing 20
2.6 WFQ 21
2.7 Policy-based distribution: SAP Filtering 24
2.8 SR/TL Bridging Topology 25
3.1 Ethernet and IEEE 802.3 Frame Formats 33
3.2 Source & Destination Address Field Formats 36
3.3 Fast Ethernet Frame Formats 38
3.4 GE Frame Formats with Carrier Extension 40
3.5 GE Packet Bursting 40
3.6 Subdivided Networks 45
3.7 Typical LAN Information Distribution 46
3.8 Linking LANs with Different Operating Rates 53
4.1 The IP Header 56
4.2 The IPv6 Header 57
4.3 IP Operation 58
4.4 The Count-to-Infinity Problem 67
5.1 The Fast Ethernet Tree 88
5.2 Server/Switch Connection 90
5.3 GE Architecture 91
5.4 GE with Carrier Extension 93
5.5 GE with Packet Bursting 94
5.6 10 GE Architecture 95
5.7 10GE Serial & Parallel Implementations 97
5.8 Traditional Hub & Router Campus Networks 97
5.9 Interconnecting 10Base-T & 10Base-5 Networks 98
5.10 Campus Wide VLAN Design 99
5.11 Multilayer Switching 99
5.12 Connecting coaxial cable NIC to a wire hub 100
5.13 Hubs & Switches 101
5.14 Some basic encoding schemes 103
5.15 8B/10B Encoding 105
8.1 ns2 Class Hierarchy 169
8.2 Reference Simulation Topology 178
8.3 Reno Throughput 179
8.4 New Reno Throughput 179
8.5 Sierra Throughput 180
8.6 Sierra vs. Tahoe, Relative Throughput 180
8.7 Sierra vs. Reno, Relative Throughput 181
8.8 Sierra vs. New Reno, Relative Throughput 181
ABBREVIATIONS
AAA Authentication, Authorization & Accounting
APPN Advanced Peer-to-Peer Networking
AOR Actual Operating Rate
ARP Address Resolution Protocol
AS Autonomous System
ATM Asynchronous Transfer Mode
BGP Border Gateway Protocol
CA Congestion Avoidance
CD Collision Detect
CIDR Classless Inter-Domain Routing
CM Congestion Manager
CRC Cyclic Redundancy Check
CSMA Carrier Sense Multiple Access
DHCP Dynamic Host Configuration Protocol
DLSW Data Link SWitching
DNS Domain Name System
D-SACK Duplicate SACK
DV Distance Vector
EBCC Equation Based Congestion Control
ECN Explicit Congestion Notification
ES End System
ESD End of Stream Delimiter
FACK Forward ACK
FCS Frame Check Sequence
FTP File Transfer Protocol
HSSG High Speed Study Group
ICMP Internet Control Message Protocol
IEEE Institute of Electrical and Electronics Engineers
IETF Internet Engineering Task Force
IGRP Interior Gateway Routing Protocol
IHL Internet Header Length
IP Internet Protocol
IPDU Internet Protocol Data Unit
IS Intermediate System
ISO International Organization for Standardization
LAN Local Area Network
LLC Logical Link Control
LS Link State
MAC Media Access Control
MTBF Mean Time between Failures
MTTR Mean Time to Repair
MTU Maximum Transmission Unit
NAPA Network Attachment Point Address
NetBIOS Network Basic Input/Output System
NLSP Netware Link State Protocol
NS Network Simulator
OFC Optical Fiber Cable
OSPF Open Shortest Path First
PBX Private Branch eXchange
PCS Physical Coding Sublayer
PDF Probability Density Function
PDU Protocol Data Unit
PMA Physical Medium Attachment
PMD Physical Medium Dependent
PSTN Public Switched Telephone Network
QOS Quality of Service
RED Random Early Detection
RFC Request for Comment
RIP Routing Information Protocol
RSVP Resource Reservation Protocol
RTMP Routing Table Maintenance Protocol
RTT Round Trip Time
SACK Selective ACKnowledgement
SAP Service Access Point
SCAM Sierra Congestion Avoidance Method
SNA Systems Network Architecture
SNMP Simple Network Management Protocol
SOF Start of Frame
SQS Sierra Quick Start
SS Slow Start
SSD Start of Stream Delimiter
SSTHRESH Slow Start Threshold
STD State Transition Diagram
STP Shielded Twisted Pair
TCP Transmission Control Protocol
TFRC TCP Friendly Rate Control
Thruput Throughput
TTL Time to Live
UDP User Datagram Protocol
UTP Unshielded Twisted Pair
VLAN Virtual LAN
VOIP Voice over IP
VTP VLAN Trunking Protocol
WFQ Weighted Fair Queuing
ABSTRACT
In this thesis, we look at some fundamental problems facing computer networking
technology. An extensive treatment of these areas is presented in the first instance. A
number of putative concepts, terminology and techniques, as pertinent to numerous
schools of thought, are presented, investigated and critiqued.
We then narrow our focus to some basic issues and trends in the management of
Internet congestion control, as well as many (now) traditional attempts to address
these problems. Key formulations are laid out which set up the problem at hand,
and many more fundamental questions are raised. Key trends observable in the
literature are discussed. This provides a relatively smooth introduction to the
subject.
Reference is then made to Sierra, a novel “Black Box” congestion control
algorithm/protocol, which itself is the subject of serious ongoing refinement, having
already been baselined in five research papers on the subject. The “Black Box”
terminology was in essence conceived many years ago by van Jacobson, and is
revived in this thesis.
A framework for the comparative, stochastic (theoretical) analysis of various
congestion control algorithms/protocols is taken up and investigated. From a
theoretical, quantitative perspective, it is shown that Sierra offers relatively superior
throughput related performance levels.
Finally, we take up the matter of comparative simulation of Sierra, vis-à-vis its
“competitors”. For this project, the popular network simulator OPNET is taken up
and deployed. We present and analyze the results of numerous simulation
experiments. The outcome is that Sierra enhances throughput as against other,
more traditional Black Box algorithms (Vegas, Reno, New Reno, etc.).
1
INTRODUCTION
1.1 THESIS OVERVIEW
Quality of Service (QoS) can be defined as the problem of allocating scarce network
resources to a set of users/applications (web, media, voice, video) in a manner that
best meets their individual needs and willingness to pay (for the resources). The
Internet is not a QoS network; instead it distributes resources approximately equally
in the face of congestion (Miller, 2004; see Turner, 1986 for a retrospective view).
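The notion of distributing resources approximately equally under congestion can be illustrated with the classical max-min fair allocation rule. The sketch below is not from the thesis; it is a minimal Python illustration, with hypothetical flow names, of dividing a single link's capacity so that no flow can be given more without taking from a flow with an equal or smaller share.

```python
def max_min_fair(capacity, demands):
    """Max-min fair division of link capacity among flows (illustrative).
    Serve the smallest demands first; at each step, split the remaining
    capacity equally among the flows not yet served, capped by demand."""
    alloc = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        flow, demand = pending[0]
        share = remaining / len(pending)   # equal split of what is left
        give = min(demand, share)          # never exceed the flow's demand
        alloc[flow] = give
        remaining -= give
        pending.pop(0)
    return alloc
```

For example, three flows demanding 2, 4 and 10 units on a 10-unit link receive 2, 4 and 4 respectively: the small demands are fully met, and the leftover goes to the unsatisfied flow.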
In this thesis, we make a crucial distinction between “Black Box” and “White Box”
congestion management, following terminology introduced by van Jacobson
(Jacobson, 1988), which was not universally adopted by the research community.
Black Box methods are end-to-end, and respond to network congestion without
paying any heed to what is inside the network, in particular the innards of the
in-between routers in a connection’s path. Having optimized network performance
end to end, we can then proceed to optimize the routers themselves, which is White
Box congestion management. White Box congestion management is specifically
excluded from discussion in this thesis, and is the subject of future research and
investigation. We focus on Black Box methods; in particular, we introduce our novel
Black Box congestion management algorithm, Sierra (Jagannathan & Matawie,
2005). Good general references are (CCIE Fundamentals, 2000; Cerf & Kahn, 1974).
Sierra is analytically compared to other Black Box methods, namely
• TCP Tahoe (Jacobson, 1988)
• TCP Reno (Jain, 1990)
• TCP New Reno (Floyd & Henderson, 1999)
• TCP Vegas (Brakmo & Peterson, 1995).
The comparison of Sierra to other Black Box methods such as Mo-Walrand,
Balakrishnan-Seshan Congestion Manager, Keshav’s Packet Pair and TFRC (TCP
Friendly Rate Control) is the subject of future investigation and research (Keshav,
1991; Mahdavi & Floyd, 1997).
We place the discussion of Black Box methods in the context of Local Area
Networks (LANs) and LAN Design.
It is estimated that the great majority (and an increasing number) of installed LANs
are Ethernet based, vis-à-vis Token Ring and Asynchronous Transfer Mode (ATM)
(Jain, 1995). One of the reasons for this may be the significantly less expensive
hardware, combined with acceptable performance. Other arguments may also be
made; in any case, Token Ring/ATM LANs have gone the way of the dodo, and so in
this thesis we specifically exclude them from discussion (Braden, 1999).
We will have occasion to discuss two evolution mechanisms for LAN selection
within an organization (Held & Jagannathan, 2004).
• Bottom Up
• Top Down.
1.1.1 Tiered Architecture
In the former, a three-tiered architecture results, whereas in the latter, an enterprise
LAN strategy drives a centralized approach to the network family.
We also provide an extensive tutorial on the Ethernet family. Commencing with a
review of the various versions of the 10BASE family of networks, we move on to:
• the Fast Ethernet family (Black, 1998)
• the Gigabit Ethernet family (Martin & Chapman, 1989)
• the recently standardized 10 Gbps version of Ethernet (Kousky, 2000).
We return to the introduction of this matter presently (Johnson, 1996; Tanenbaum,
2003).
We look at the constraints associated with each member of the Ethernet family,
including the so-called 5-4-3 rule and cabling limitations. We also examine
transmission media and their characteristics, including twisted pair, coaxial cable as
well as both single mode and multimode fiber optics. (IEEE 802.8 committee)
We then turn our attention to Ethernet performance characteristics. Topics in this
chapter include the various issues at the data link layer, including framing, the
interframe gap, frame overhead, and their effect on performance and information
transfer, as well as reliable data exchange and error management.
We then concern ourselves with issues at the network, transport and application
layers with a further look at frames and their effect on processing, delays, and
latency considerations. Because the Transmission Control Protocol/Internet Protocol
(TCP/IP) is by far the most popular protocol transported by Ethernet, we will note
the delays required when packets are transported within an Ethernet frame. In doing
so, we can determine if it is practical to transport delay-sensitive information, such as
voice/video, over Ethernet.
1.1.2 LANs & WANs
No LAN is an island. There is always the ineluctable need to hook up your LAN with
other networks, either locally to another LAN, or to the Internet, or by a WAN to
another LAN. Chapter 2 deals with internetworking and the problems associated with
interconnecting geographically separated LANs (Cisco Systems, 2004). We note, in
passing, that in certain quarters Network Management (Kousky, 2000) is deemed, at
least partly, a LAN Design issue. Our position is that it is rather an operational tool,
since a network needs to be up and running in the first place before Network
Management techniques can be deployed. Accordingly, we specifically exclude a
detailed discussion of this topic from this thesis (Krol, 1999).
A LAN is a set of locally interconnected devices, connected via the same type or
different types of transmission media. Different LAN devices include (Perlman,
2001):
• stations and segments
• repeaters
• hubs
• bridges (different types)
• LAN switches
• routers
• brouters
• gateways
• network interface cards
• file servers.
We need to consider the different types of network topology, and the overall structure
or architecture of popular LAN solutions (Stallings, 1993). Key topologies include:
• loop
• ring
• bus
• token bus
• tree
• star
• hybrid (mixed).
Ring topologies are of diminishing importance, as they are being phased out by
current networking trends. By topology, we mean the geometry and geography of
interconnected LAN stations and segments.
Transmission media that interconnect devices, together with encoding techniques,
are the rightful subject of an entire chapter in their own right.
The members of the Ethernet family of networks include, as alluded earlier:
• 10 Mbps Ethernet
• Fast Ethernet
• Gigabit Ethernet
• 10 Gigabit Ethernet.
Each member has its own immediate relatives.
For instance, within 10 Mbps, there are (Stallings, 1997):
• 10BASE-T
• 10BASE-2
• 10BASE-5
• 10BROAD-36, and
• 10BASE-F.
Within 100 Mbps Fast Ethernet, there are (Johnson, 1996):
• 100BASE-TX
• 100BASE-FX, and
• 100BASE-T4.
Similarly, within 1000 Mbps Ethernet, better known as Gigabit Ethernet, there occur
(Stallings, 1997):
• 1000BASE-LX
• 1000BASE-SX, and
• 1000BASE-CX.
However, in 10 Gbps operations, better known as 10 Gigabit Ethernet, only one
standardized LAN is defined.
Recall that a LAN solution consists of a transmission medium, a MAC protocol and
an encoding mechanism, all operating over a predefined topology.
The transmission medium is the physical carrier that bears the signals from source
to destination.
The MAC protocol governs the method by which stations access the transmission
medium.
The encoding mechanism defines how data and control codes are coded. Because all
of the LAN technologies that we discuss are baseband (digital transmission of digital
data), we will be concerned with only digital encoding. Different signal elements are
used to represent a binary 1 and a binary 0. A number of different encoding schemes
are discussed.
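As an illustration of digital encoding, the following sketch implements Manchester encoding, the scheme used by 10 Mbps Ethernet, in which each bit becomes two half-bit signal levels with a mandatory mid-bit transition. This is an illustrative Python sketch, not drawn from the thesis; representing the line signal as 0/1 levels is an assumption made for readability.

```python
def manchester_encode(bits):
    """Manchester encoding (IEEE 802.3 convention, illustrative):
    a 0 is sent as high-then-low, a 1 as low-then-high, so every bit
    cell contains a transition that the receiver can use for clocking."""
    table = {0: (1, 0), 1: (0, 1)}   # bit -> (first half, second half)
    out = []
    for b in bits:
        out.extend(table[b])
    return out
```

Note that the encoded stream is twice as long as the data stream, which is why Manchester encoding was abandoned for the higher-speed Ethernet variants in favour of more efficient schemes.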
1.1.3 Internetworking
One uses the term “internetwork” or simply “internet” to denote an arbitrary
collection of networks interconnected in some fashion to provide host-to-host
connectivity and deliver a service. For instance, an organization may have a number
of sites, each implementing a LAN solution, and they might decide to interconnect
these LANs using point-to-point links. (Shenker, 1995; Shenker & Wroclawski,
1997)
The term internet needs to be distinguished from the term Internet, which represents
the global interconnection of many existing networks, including 802.3, 802.5 and
even ATM. In certain circles, the preceding networks are termed as “physical
networks” whereas a collection of connected physical networks is termed as a
“logical network”. In this context, a collection of LANs connected by switches and
bridges is still one network, whereas a collection of networks connected by routers is
called an internet. The key tool for managing internets is IP.
We also have occasion to treat network performance. We first look at the issue of
frame sizes, the length of the information field, and the overhead of a frame.
Accordingly, we deal in detail with the composition of a LAN Ethernet frame.
Can the length of LAN frames or their information carrying capability be adjusted to
achieve enhancement in performance? Likewise, the effect of frame length on bridge
and router operations is investigated.
If we have an up and running LAN and wish to enhance or expand it, we can monitor
current LAN traffic to predict the effect of the expansion or the enhancement on a
similar planned network. But, when a brand new network is being put into place, we
lack a prior baseline. In this situation we need a theoretical framework to estimate
network traffic. That framework occurs through the use of a LAN traffic estimation
technique. We will explore the use of this technique to predict future network growth
and the effect of such expansion on the future planned network as well as the
segmentation of a LAN to improve network performance (Braden, 1999).
We will have a discussion of issues at the network, transport and application layers,
with an overview of internetworking at large. We have observed that some of the key
functions of the ‘router’ include linking different networks, and routing and
delivering data between processes and applications in End Systems (ESs, an ISO
terminology for end hosts) and edge devices on different networks, doing all this
seamlessly and transparently with respect to the network architectures of the
attached networks.
IP (IPv4) is the predominantly popular protocol supporting these functions. We note
that a new standard for IP addressing, variously called IPv6 and IPng, was initially
specified by the Internet Engineering Task Force (IETF) (Turner, 1996; Deering &
Hinden, 1998). IPv6 source and destination addresses are 128 bits in length;
basically, there was a need for more addresses to assign to all conceivable devices.
IPv6 also supports the higher speeds of today’s networks and the mix of multimedia
data streams. It is expected that all TCP/IP installations will eventually migrate to
IPv6, although this process may take several years, if not decades, to be achieved.
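The 32-bit versus 128-bit address lengths can be checked directly with Python's standard `ipaddress` module. The snippet below is illustrative only; the two sample addresses are drawn from the reserved documentation ranges, not from the thesis.

```python
import ipaddress

# IPv4 addresses are 32 bits long; IPv6 addresses are 128 bits long.
v4 = ipaddress.ip_address("192.0.2.1")      # documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # documentation-range IPv6 address

print(v4.max_prefixlen)   # 32
print(v6.max_prefixlen)   # 128

# The address space grows by a factor of 2**96 -- the core motivation
# for migration: enough addresses for all conceivable devices.
growth = 2 ** (v6.max_prefixlen - v4.max_prefixlen)
```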
1.1.4 Black Box vs. White Box
In Chapter 6, we return to Black Box congestion management in the context of
LANs. What is unicast congestion management? Consider a set of hosts connecting
one-to-one with another set of hosts via the Internet using TCP/IP. The sending host
is called the “ingress” and the receiver is called the “egress”. Each ingress injects a
certain volume of traffic (packets) into the Net, which is duly routed inside the Net
by the in-between routers. Now interpose the Network: the Net is able to handle a
certain amount of traffic in an overall sense. The Net is comprised of ISs (an ISO
terminology for routers), a.k.a. routers. If the traffic volume arriving at a router is
“too high”, then the router, in accordance with its own White Box methods, will
discard or drop packets. As the number of dropped packets rises in the Net, the
situation becomes one of “congestion” (Lin & Morris, 1997).
In this thesis, we examine many traditional mechanisms to handle congestion at end
nodes and then critique them. We introduce a new algorithm (called Sierra) and close
by performing a comparative analysis of all algorithms using experimental
simulation. What we do not do is to optimize router behaviour within the network
itself - this is the White Box problem, and will be the subject of future investigation
and research (Hashem, 1988; Low & Lapsley, 1999; May et al. 1999).
Specifically, we look at (Jagannathan & Matawie, 2005):
• TCP Tahoe (Jacobson, 1988)
• TCP Reno (Jain, 1990)
• TCP New Reno (Floyd, 1999)
• TCP Vegas (Brakmo & Peterson, 1995)
• Mo-Walrand mechanism (Mo & Walrand, 2000)
• Balakrishnan-Seshan Congestion Manager (Balakrishnan et al. 1999)
• Equation Based Congestion Control (Mahdavi & Floyd, 1997)
• Keshav’s Packet Pair (Keshav, 1991)
• Sierra (Jagannathan & Matawie, 2005).
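To give a flavour of the window-based mechanisms listed above, the following Python sketch traces a schematic Tahoe-style congestion window: exponential slow start below ssthresh, linear congestion avoidance above it, and a reset to one segment on loss. It is a deliberate caricature (one window update per "ack" event, i.e., per round trip), not an implementation of any of the listed protocols, and certainly not of Sierra.

```python
def aimd_window(events, ssthresh=16, cwnd=1):
    """Schematic slow-start / congestion-avoidance window trace
    (Tahoe style). 'events' is a sequence of 'ack' or 'loss';
    the window is measured in segments. A sketch, not a TCP stack."""
    trace = []
    for ev in events:
        if ev == "ack":
            if cwnd < ssthresh:
                cwnd *= 2        # slow start: doubling (per-RTT caricature)
            else:
                cwnd += 1        # congestion avoidance: linear growth
        elif ev == "loss":
            ssthresh = max(cwnd // 2, 2)   # halve the threshold
            cwnd = 1                        # Tahoe: restart from one segment
        trace.append(cwnd)
    return trace
```

Running five acknowledgements, a loss, and one more acknowledgement from the defaults yields the familiar sawtooth: the window climbs 2, 4, 8, 16, 17, collapses to 1 on loss, then begins climbing again.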
1.1.5 Stochastic Modeling
Chapter 7 is concerned with the stochastic modeling of congestion control
mechanisms. This chapter is a compilation of results from many diverse sources,
specifically convex optimization as applied to the field of mathematical congestion
control as such. No claim is made to the originality of most of the results in this
chapter; however, we have reorganized most of the theory in a manner that will lend
itself to future exploitation for the purposes of evaluating congestion control
methods. We use the basic “buffer overflow model” paradigm (Kelly, 2000), which
is the cornerstone of our analysis of congestion control algorithms. (Hoe, 1995). It is
true that, unlike systems in modern physics, all aspects of TCP and its progressive
behavioural evolution over time are fully under our control. However, the sheer
scale of TCP’s operation and behaviour is tremendous, and it is probably the very
largest and most complex man-made control system ever evolved. We need
mathematical models to “capture” such a system. We also introduce a new
systematic approach to constrained optimization, using Lagrange-Kuhn-Tucker
multipliers as well as Penalty and Barrier methods. It remains to apply these
extended theorems and propositions to congestion control itself viewed as an
optimization problem (Bertsekas, 2003).
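As a toy instance of the penalty approach mentioned above, the sketch below minimizes x² subject to x ≥ 1 by adding a quadratic penalty μ·max(0, 1−x)² and running plain gradient descent; the penalized minimizer μ/(1+μ) approaches the constrained optimum x = 1 as μ grows. The particular objective, constraint and step size are illustrative assumptions, not taken from Chapter 7.

```python
def penalty_minimize(mu, x=0.0, iters=1000):
    """Quadratic-penalty method for: minimize x**2 subject to x >= 1.
    We minimize the unconstrained surrogate
        x**2 + mu * max(0, 1 - x)**2
    by gradient descent. As mu -> infinity, the minimizer mu/(1+mu)
    tends to the true constrained solution x = 1. Toy sketch only."""
    lr = 1.0 / (2.0 * (1.0 + mu))   # safe step for this smooth objective
    for _ in range(iters):
        grad = 2 * x - 2 * mu * max(0.0, 1.0 - x)
        x -= lr * grad
    return x
```

With μ = 100 the method settles at 100/101 ≈ 0.990, already within one percent of the constrained optimum, illustrating the standard trade-off: larger penalties give better constraint satisfaction but a more ill-conditioned surrogate.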
Finally, we look at stochastic models and their deterministic limits, offering a
Central Limit-like theorem for per-flow dynamics (Kelly, 1979). We also provide a
law of large numbers for stochastic congestion flows, as well as computing the
variance of per-source rates. Among other things, it is demonstrated that Sierra is
superior to Reno and Vegas in terms of delivered throughput.
1.1.6 Quantitative Modeling & Simulation in Software
In Chapter 8 we are concerned with demonstrating the relative superiority of the
novel Sierra congestion control algorithm. For this, we deploy two techniques:
• Quantitative modeling, at a level similar to the theory found in (Jain & Hassan,
2001). New results are obtained for the relative performance of Sierra against
other competitors like Tahoe, Reno, New Reno, etc.
• Software simulation - first, a tool selection project is described, along with our
experience with two popular tools, i.e., ns2 and OPNET. The innards of both
tools are described, and the rationale is given for why we selected OPNET.
Using the latter, a simulation project was designed, constructed and implemented.
The final result is that Sierra provides superior throughput compared to its
predecessors. Sierra is also fair. When coupled with the other algorithms, it does not
“hog bandwidth”. These results give us confidence that Sierra is a superior
innovation. Further experimental study is indicated there; this will concern the
effect of adjustments to Sierra parameters on protocol performance.
2
INTERNETWORKING ISSUES
2.1 INTRODUCTION
One uses the term “internetwork” or simply “internet” to denote an arbitrary
collection of networks interconnected in some fashion to provide host-to-host
connectivity and deliver a service. For instance, an organization might have a
number of sites, each implementing a LAN solution, and they might decide to
interconnect these LANs using point-to-point links. (Stallings, 1997)
This term “internet” needs to be distinguished from the term Internet, which
represents the global interconnection of many existing networks, including 802.3,
802.5, and even ATM. In certain circles, the preceding networks are termed
“physical networks”, whereas a collection of connected physical networks is termed
a “logical network”. In this context, a collection of LANs connected by switches and
bridges is still one network, whereas a collection of networks connected by routers is
called an internet. The key tool for managing internets is IP. (Cisco Systems, 2000).
Also, we study the characteristics of network devices, such as switches, routers, their
various flavours and interactions, vis-à-vis their impact on congestion management.
2.2 OVERVIEW OF INTERNETWORKING CONCEPTS
Network designers are faced with a daunting task when constructing an internetwork,
because it is possible to use a mixture of four hardware devices (Perlman, 2001):
1. hubs (concentrators)
2. bridges
3. switches
4. routers.
Figure 2.1 Internetworking Scenario
These are discussed in detail elsewhere, but their key properties are quickly recalled
here, for the sake of continuity and completeness.
Hubs are used to link multiple users to a single physical unit, in turn connecting them
to the network. They simply regenerate incoming signals out of all ports other than
the one the data was received on, reaching all attached stations.
Bridges serve to subdivide segments within the same network. They function at
Layer 2, independent of the network layer and other higher layer protocols (Layer 3
and above) (Lippins, 2001).
Switches have more ports than bridges and can be considered to represent multiport
bridges with added intelligence. If the number of ports is N, each operating at
10Mbps, then the switch separates collision domains and provides an overall
throughput of 10 x N/2 Mbps. Thus, while protecting the existing cabling
infrastructure, switches also increase performance and bandwidth.
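The 10 x N/2 Mbps figure can be expressed as a one-line calculation: with collision domains separated, up to N/2 disjoint port pairs can converse simultaneously, each at the full port rate. The helper below is an illustrative upper bound, assuming a non-blocking switch and strictly pairwise traffic; it is not a formula from the cited sources.

```python
def switch_aggregate_mbps(ports, port_rate_mbps=10):
    """Upper bound on the aggregate throughput of a non-blocking switch:
    the ports pair off into ports // 2 simultaneous conversations, each
    running at the full port rate (the text's 10 x N/2 Mbps figure)."""
    return port_rate_mbps * (ports // 2)
```

For example, an 8-port 10 Mbps switch tops out at 40 Mbps of aggregate traffic, against the 10 Mbps ceiling of a shared hub serving the same stations.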
Routers separate broadcast domains and connect disparate networks. Driven
primarily by the IP protocol, routers make forwarding decisions based on IPv4
addresses rather than link-layer MAC addresses.
The trend today is to move away from bridges and hubs and on to routers and
switches when designing internets.
2.3 SWITCHING OVERVIEW
Figure 2.2 Routers and Switches
Switching data frames can occur via one of the following techniques (Stallings,
1991; Held, 2000):
• cut-through
• store and forward
• hybrid
• fragment-free.
With cut-through switches, only the destination MAC address of an incoming frame
is examined, and a forwarding decision is made based on that address alone. No
other checks occur. Thus, a cut-through switch has the lowest delay and should be
considered for supporting real-time applications, such as VoIP and streaming media.
With a store and forward switch, the entire frame is copied into the switch’s internal
memory, examined for occurrence of any errors, then sent out the right port. Because
the entire frame is stored and the frame is variable in length, the delay is also
variable.
Because errored frames are discarded by the destination device on a LAN, the
necessity of such error checking by a LAN switch has been questioned. However,
the filtering capability can still be useful for directing routing protocols carried in
frames to destination ports, more readily than by frame destination address alone,
especially when hundreds or thousands of devices are attached to a large switch.
Hybrid switches represent a combination of cut-through and store and forward
switches. They work as cut-through switches until a certain level or threshold of
errors is reached, at which point they revert to performing as store and forward
switches. This means that the efficiency of these switches is also variable. The
major advantage of a hybrid switch is its minimal latency when error rates are low,
combined with its ability, once it reverts to store and forward operation, to discard
errored frames when error rates rise (Krol, 1999).
A fourth type of switch, fragment-free, examines the first 64 bytes of incoming
frames for errors. If none are found, it pushes out the entire frame (without storing
it), on the premise that most errors are likely to occur in the first 64 bytes.
Fragment-free switches have a slightly longer delay than cut-through switches, but
the delay is uniform, so they too can usually be used for VoIP and streaming media
applications (Braden, 1989).
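The latency ordering among these switching techniques follows directly from how many bytes must arrive before a forwarding decision can be made. The sketch below computes the serialization delay up to each decision point; hybrid switches are omitted since their behaviour varies with the error rate. The 14-byte (cut-through header) and 64-byte (fragment-free) decision points are illustrative assumptions, not figures from the cited sources.

```python
def forwarding_latency_us(frame_bytes, rate_mbps, mode):
    """Serialization delay, in microseconds, before a switch can begin
    forwarding a frame. Assumed decision points (illustrative):
    cut-through after a 14-byte header, fragment-free after 64 bytes,
    store-and-forward only after the entire frame has arrived."""
    bytes_needed = {
        "cut-through": 14,
        "fragment-free": 64,
        "store-and-forward": frame_bytes,
    }[mode]
    # bits divided by (megabits per second) yields microseconds
    return bytes_needed * 8 / rate_mbps
```

For a maximum-length 1518-byte frame at 100 Mbps, the store-and-forward delay is over 100 microseconds and varies with frame length, while the cut-through and fragment-free delays are both small and, crucially, uniform.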
Maintaining switch operations denotes the building and maintenance of switching
tables, route tables and service tables.
Switching occurs at Layer 2 and routing at Layer 3. In other words, switches work
based on the contents of 6-byte MAC addresses, while routers use 4-byte IPv4 or
16-byte IPv6 addresses (Held, 2000).
Switches automatically build and maintain Layer 2 switching tables to track and
learn MAC addresses. If a destination MAC address is not known, the switch
broadcasts that frame out all ports other than the port the frame arrived on. By noting
the source address of the frame and the port it arrived on, the switch updates its
internal tables via a backward learning process. In comparison, routers are
configured with the IP address of the networks attached to their ports and operate
based on 4- or 16-byte IP addresses.
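The backward-learning behaviour described above can be sketched in a few lines. This is an illustrative model rather than any vendor's implementation; the class name, port numbers and MAC strings are invented for the example.

```python
class LearningSwitch:
    """Minimal sketch of a Layer 2 switch's backward-learning process."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was learned on

    def handle_frame(self, src_mac, dst_mac, in_port):
        # Backward learning: record which port the source address arrived on.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # forward out the known port
        # Unknown destination: flood out all ports except the arrival port.
        return sorted(self.ports - {in_port})


sw = LearningSwitch(ports=[1, 2, 3, 4])
print(sw.handle_frame("AA", "BB", in_port=1))  # BB unknown: flood ports 2, 3, 4
print(sw.handle_frame("BB", "AA", in_port=3))  # AA was learned on port 1
```

The second call succeeds without flooding precisely because the first frame's source address populated the table, which is the essence of backward learning.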
With increasing bandwidth-hungry applications on the market, hubs in wiring closets
are rapidly being replaced by LAN switches. There is also an increasing demand for
intersubnet communications, which must flow through a router. In this connection
Figure 2.2 depicts a typical situation showing the relationship between switches and
routers. Switches primarily move data within a local geographical area, such as a
building. In comparison, routers provide long-distance and global interconnectivity.
Data flows from many hosts pass serially through routers, which means that if
there is significant traffic for a server accessed remotely via routers, there is the
ineluctable possibility of congested bottlenecks at the routers.
To partly alleviate this problematic situation, Layer 3 capabilities are being added
throughout many networks, typically within formerly Layer 2 switches. Figure 2.3
depicts such a scenario.
Figure 2.3 Flow of Intersubnet Traffic with Layer 3 Switches
2.4 THE TIERED (LAYERED) APPROACH
Both ISO OSI and TCP/IP reference models are instances of a hierarchical approach
to designing networks. Each layer is ascribed a set of functions or responsibilities
that it provides as services to the layer above. Internetwork design uses a hierarchical
tiered approach to help simplify the overall task. The advantage of a hierarchical
design is modularity, which allows different elements in the tier to be independently
constructed by different vendors and used mutually in an interoperable fashion. This
also facilitates management of change in the internet by containing the cost and
complexity thereof (Cisco Systems, 2000).
Traditionally, hierarchical internet design uses three tiers:
1. Backbone (core) tier – optimal inter-site communication
2. Distribution tier – policy-based connectivity
3. Local-access tier – user access to the network.
The core tier provides high-speed packet switching without any time-consuming
packet manipulation (e.g., filtering, error checks).
The distribution tier interfaces with the core and local-access tiers. It manipulates
data from the local-access tier and passes it on to the backbone.
Some of the functions of the distribution tier include:
• address or area aggregation
• department or workgroup access
• broadcast (and multicast) domain definition
• VLAN routing
• media translation
• security.
Some of the functions of the local-access tier include:
• shared bandwidth
• switched bandwidth
• MAC layer filtering
• microsegmentation
2.5 EVALUATING BACKBONE CAPABILITIES
The evaluation of the backbone capability of a tiered network is extremely important,
because it represents the primary data path. In this section, we will discuss the
following:
• path optimization (Coltum, 1999)
• traffic priorities (Garcia-Luna-Aceves, 1993)
• load splitting (Garcia-Luna-Aceves, 1993)
• alternative paths (Braden, 1999)
• tunneling (Stallings, 1993).
2.5.1 Path Optimization
Recall that in computer networks there are two types of protocols:
1. route protocols
2. routing protocols.
The former have essentially to do with addressing techniques, whereas the latter
pertain to trajectory selection from a fabric of paths available via routers and other
networks.
Convergence occurs when there is a change in the network properties and all routers
subsequently agree upon the optimal routes. This action takes place by means of
neighbour greeting and autoconfiguration.
Routing protocols, discussed elsewhere, come in two varieties:
1. metric optimizing protocols
2. policy-based protocols.
Examples of the former are RIP (Perlman, 2001) and OSPF (Coltum, 1999). An
example of the latter is BGP. IGRP uses a hybrid metric based on bandwidth, load
and delay. Link state protocols like OSPF and IS-IS (Perlman, 2001) minimize the
cost associated with the selected path.
2.5.2 Traffic Prioritizing
Whereas some networks can prioritize homogeneous internal traffic, routers must
prioritize heterogeneous flows. Such categorization provides differentiated treatment,
which ensures that critical data are given an edge over less important flows (Golestani,
1994; Goyal et al. 1996; Shreedhar & Varghese, 1995; Stoica et al. 1998).
There are three main types of queuing (Demers et al. 1990):
• Priority
• Custom
• WFQ.
2.5.2.1 Priority Queuing
Traffic is categorized by a specific metric, such as protocol type. Typically, four
output queues are used:
1. high
2. medium
3. normal
4. low priority.
Figure 2.4 illustrates an example of priority queuing. Note that UDP, which is
typically represented by small segment lengths, such as DNS queries, is shown to
receive high priority in this example. Most Layer 3 routers and switches permit the
administrator to easily define data assigned to different queues.
Traffic        Priority
UDP            High
BLAT           High
DECnet         Medium
Vines          Medium
TCP            Medium
Other          Normal
AppleTalk      Low

Figure 2.4 Priority Queuing
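Strict priority queuing of this kind can be sketched as follows. The protocol-to-priority mapping follows the Figure 2.4 example; the class name and frame identifiers are illustrative, not taken from any router implementation.

```python
from collections import deque

# Protocol-to-priority assignments, following the Figure 2.4 example.
PRIORITY = {"UDP": "high", "BLAT": "high", "DECnet": "medium",
            "Vines": "medium", "TCP": "medium", "AppleTalk": "low"}
ORDER = ["high", "medium", "normal", "low"]


class PriorityQueuing:
    def __init__(self):
        self.queues = {level: deque() for level in ORDER}

    def enqueue(self, frame, protocol):
        # Unlisted protocols fall into the normal queue.
        self.queues[PRIORITY.get(protocol, "normal")].append(frame)

    def dequeue(self):
        # Strict priority: always drain higher-priority queues first.
        for level in ORDER:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


pq = PriorityQueuing()
for frame, proto in [("f1", "TCP"), ("f2", "UDP"), ("f3", "HTTP")]:
    pq.enqueue(frame, proto)
print([pq.dequeue() for _ in range(3)])  # ['f2', 'f1', 'f3']
```

Note the well-known weakness of strict priority: a steady stream of high-priority arrivals can starve the lower queues indefinitely, which motivates the custom and weighted fair variants below.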
2.5.2.2 Custom Queuing
Custom queuing provides more granularity than priority queuing and supports
multiple higher-layer protocols. It reserves a portion of the bandwidth for a given
protocol, guaranteeing that protocol a pre-determined bandwidth.
Figure 2.5 illustrates an example of custom queuing.
Figure 2.5 Custom Queuing (bandwidth reserved per protocol queue: APPN 40%, TCP/IP 20%, NetBIOS 20%, miscellaneous 20%)
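A byte-count round robin is one common way to realize custom queuing's bandwidth guarantees. The sketch below uses the 40/20/20/20 shares from the Figure 2.5 example; the frame identifiers, lengths, and cycle size are invented for illustration.

```python
from collections import deque

# Bandwidth shares per protocol queue, as in the Figure 2.5 example.
SHARES = {"APPN": 40, "TCP/IP": 20, "NetBIOS": 20, "MISC": 20}


def custom_queue_service(queues, cycle_bytes=1000):
    """One service cycle: each queue may send up to its byte allowance."""
    sent = []
    for name, pct in SHARES.items():
        allowance = cycle_bytes * pct // 100
        q = queues[name]
        while q and allowance >= q[0][1]:  # entries are (frame_id, length)
            frame_id, length = q.popleft()
            allowance -= length
            sent.append(frame_id)
    return sent


queues = {
    "APPN":    deque([("a1", 300), ("a2", 300)]),
    "TCP/IP":  deque([("t1", 150), ("t2", 150)]),
    "NetBIOS": deque([("n1", 250)]),
    "MISC":    deque([("m1", 100)]),
}
print(custom_queue_service(queues))  # ['a1', 't1', 'm1']
```

Each protocol consumes at most its share of the cycle (APPN gets 400 bytes here, the others 200), so no queue can monopolize the link the way strict priority allows.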
2.5.2.3 Weighted Fair Queuing (Keshav, 1997)
The WFQ method uses TDM to segment the available bandwidth among the several
clients on the interface. By assigning weights, each client receives treatment
weighted by a defined metric, such as ToS or arrival rate. Note that if
all arrivals are assigned equal weights, low-volume traffic gets an edge over high-
volume traffic. Figure 2.6 illustrates WFQ (Crowcroft & Oechslin, 1998; Demers et
al. 1990).
WFQ uses an algorithm to dynamically identify data streams at an interface and sort
them into logical queues. Note that in certain cases, such as with SNA (Clark, 1992),
one cannot distinguish between sessions. In DLSw+, SNA traffic is multiplexed over
a single TCP session; in APPN (Joyce & Walker, 1992), it is multiplexed onto
one LLC2 session. Because WFQ treats these sessions as a single conversation, the
algorithm does not lend itself to SNA.
In priority queuing and custom queuing, access lists need to be pre-installed.
However, this is not the case with WFQ, which sorts among specific traffic streams
in real-time.
Figure 2.6 Weighted Fair Queuing
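The per-stream sorting that WFQ performs can be approximated by computing a virtual finish time for each packet: with all packets arriving together, finish time reduces to length divided by weight. This is a simplified sketch of the idea, not a router's actual scheduler, and the flow names are invented.

```python
import heapq


def wfq_order(packets):
    """Order packets by virtual finish time (length / weight).

    packets: list of (flow, length, weight) tuples, one packet per flow,
    all treated as arriving at virtual time zero.
    """
    heap = [(length / weight, flow) for flow, length, weight in packets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# With equal weights, low-volume (short) packets are served first,
# matching the observation in the text.
print(wfq_order([("bulk", 1500, 1), ("query", 64, 1), ("voice", 200, 1)]))
```

Raising a flow's weight shrinks its finish times proportionally, which is how a ToS-style metric would buy a flow earlier service in this model.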
2.5.3 Load Splitting
This is exactly what the name implies: load balancing over different paths. Load
splitting can be done with:
• IP (using equal cost paths)
• (E)IGRP (with unequal cost alternatives).
Up to four paths may be used for one destination network. Load splitting of bridged
traffic over serial lines is also possible.
2.5.4 Alternative Paths
The aim here is to provide alternative paths to a destination, in case of
link failures on active networks. End-to-end reliability is achieved only when there is
redundancy throughout the network. Because redundancy is so expensive, most
providers support redundancy on segments carrying mission-critical data.
Routers are the key to reliable internetworking. However, merely making hardware
at the nodes more available does not make the internet more reliable.
Instead, it is necessary to have redundant links as well. Unless all backbone routers
are fault tolerant, it is necessary also to ensure that redundant links should terminate
at different routers. Thus, a fully fault tolerant router situation is not only
prohibitively expensive, it does not address the link reliability issue. We will return
to reliability options later.
2.5.5 Encapsulation (Tunneling) (Stallings, 1993)
Encapsulation or tunneling is a simple operation, which takes packets or frames
from one network and hides them within frames from another protocol.
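At the byte level, tunneling amounts to prefixing (and possibly suffixing) the original frame with the carrier protocol's framing. The sketch below uses invented header and trailer markers; real tunnel headers (GRE, for instance) carry protocol-specific fields.

```python
def encapsulate(inner_frame: bytes, outer_header: bytes,
                outer_trailer: bytes = b"") -> bytes:
    """Tunnel an entire frame as the payload of another protocol's frame."""
    return outer_header + inner_frame + outer_trailer


def decapsulate(outer_frame: bytes, header_len: int,
                trailer_len: int = 0) -> bytes:
    """Recover the original frame at the far end of the tunnel."""
    end = len(outer_frame) - trailer_len
    return outer_frame[header_len:end]


original = b"\x01\x02payload\x03"
tunneled = encapsulate(original, outer_header=b"HDR:", outer_trailer=b":TRL")
assert decapsulate(tunneled, header_len=4, trailer_len=4) == original
```

The round trip illustrates the key property of a tunnel: the inner frame is opaque to the carrier network and emerges unchanged.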
2.6 DISTRIBUTION SERVICES
We include a discussion of the following functionalities (Black, 1998):
• backbone bandwidth management
• area and service filtering
• policy-based distribution
• gateway service
• inter-protocol route redistribution
• media translation.
2.6.1 Backbone Bandwidth Management
To optimize use of the backbone, routers are able to offer features such as:
• priority queuing
• routing protocol metrics
• termination of local sessions.
Queue metrics, overflow mechanisms, and routing protocol metrics are all adjustable,
giving more control over the forwarding of packets through the internet. If a local
session terminates, a router can proxy for it instead of passing all session control
through to the multi-protocol backbone (Fenner, 1997).
2.6.2 Area and Service Filtering (Cisco Systems, 2000)
This functionality is achieved by the use of access lists, which control the movement
of data based on, among other things, network addresses. Service filtering applies
to specific protocols.
2.6.3 Policy-Based Distribution (Coltum, 1999)
A policy in our context is a set of rules governing end-to-end traffic to a backbone
network. For example (Davin & Heybey, 1990):
• A LAN department may send traffic to the backbone using three different
protocols, whereas it may wish to expedite one specific protocol through the
backbone as it contains mission-critical data.
• Another department may wish to exclude all but remote login and e-mail
from entering its LAN.
These are departmental policies, and organizational policies can exist as well. For
example, an organization might decree that no Web-based e-mail should enter or
leave its intranet.
Different policies may require different internetworking technologies, which may all
need to be integrated and co-exist harmoniously.
One possible way to implement policies is via SAPs (Service Access Points). This
situation is depicted in Figure 2.7.
In Figure 2.7, SAPs from the NetWare servers advertise services to clients.
Depending on whether services are provided locally or remotely, SAP filters prevent
SAP traffic from leaving the router interface.
Figure 2.7 Policy-Based Distribution: SAP Filtering
2.6.4 Inter-protocol Route Redistribution (Krol, 1999)
The section above on gateway services concerned enabling two end nodes that use
different route protocols to communicate. By contrast, routers can redistribute
information among different routing protocols (RIP, OSPF, IGRP, etc.), exchanging
routing information at the router. Static routing information can also be redistributed.
2.6.5 Media Translation
These are techniques that translate frames from one network system to another. If
there are attributes in one system with no counterpart in the other, a problem
arises, and different vendors will resolve it differently. For example, when direct
bridging is sought between Token Ring and Ethernet, one uses either SR/TL or SRT
bridging.
SRT allows the router to use both source-route and transparent bridging algorithms.
There is a standard way to convert between SR and translational bridges, as
illustrated in Figure 2.8.
Figure 2.8 SR/TL Bridging Topology (RIFs are lost crossing to the transparent side and gained on the return path)
2.7 LOCAL ACCESS SERVICES
Topics we consider here include:
• value-added addressing
• network segmentation
• broadcast or multicast capability
• naming, proxy, and local cache
• media access security
• router discovery.
2.7.1 Value-Added Addressing (Held, 2000)
When different addressing schemes exist for LANs, such as IP and NetWare, they
interoperate less than perfectly over multi-segmented LANs/WANs.
Helper addressing is a method of forwarding traffic that normally would not be
allowed to transit a router. For example, a client may search for a server by
broadcasting a message that must transit many routers. Normally such frames would be
dropped, but helper addresses allow these messages to pass through routers.
Multiple helper addresses are supported on each router interface to allow forwarding
to remote destinations.
2.7.2 Network Segmentation (Stallings, 1997)
This is an instance of the usage of local access routers to implement local policies
and thus limit unnecessary traffic by segmenting traffic within component segments.
One way to accomplish this is by strategically positioning routers and building in
specific segmentation policies.
For example, a large LAN might be subdivided into segments, such that traffic on a
segment might be limited to:
• local broadcasts
• unicast intra-segment traffic
• traffic for another specific router.
Careful distribution of hosts and clients leads to reduced congestion in the network.
2.7.3 Broadcast or Multicast Capabilities
Routers intrinsically drop broadcast messages. But broadcasts are quite commonplace
and need to be curbed to keep traffic at a manageable level and to reduce broadcast
storms. Again, helper addresses assist in forwarding multicasts and broadcasts.
To fully support IP multicast, the IGMP (Internet Group Management Protocol) must
be deployed on hosts. IGMP enables hosts to dynamically report their multicast
group memberships to a multicast router (Miller, 2004; Fenner, 1997). Multicast
routers send IGMP queries to their attached LANs, and stations respond with their
membership information. The multicast router attached to the LAN then takes
responsibility for sending multicast datagrams from one attached network to all
other networks with multicast membership. If an IGMP query brings no response,
that group is deemed to have no members and no further messages are sent to it.
2.7.4 Naming, Proxy and Local Cache Capabilities
These three router capabilities reduce traffic and enable efficient internet operation.
They include (Held & Jagannathan, 2004):
• naming service support
• proxies
• local caching of network information.
Naming is a well-known mechanism used to resolve names to addresses. Common
examples of naming schemes include:
• IP Domain Name System (DNS) (RFC 1034)
• Network Basic Input Output System (NetBIOS)
• IPX.
A router can act as proxy for a name server. For instance, a list of NetBIOS addresses
can be maintained, avoiding the overhead of transmitting client/server broadcasts in
an SR bridge environment.
In that case the router does the following (Martin & Chapman, 1989):
• Only one (duplicate) query frame is allowed per configured time period.
• A cache of NetBIOS server addresses with client names (and MAC addresses) is
maintained, limiting broadcasts across the network.
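The two behaviours above (duplicate-query suppression per configured time period, plus a name cache) can be sketched together. The class name, period value, and return conventions are illustrative, not drawn from any actual router software.

```python
class NameProxyCache:
    """Sketch of a router's name cache with duplicate-query suppression.

    Forwards at most one identical query frame per configured time period
    and answers from the cache when a name-to-address mapping is known.
    """

    def __init__(self, period=1.0):
        self.period = period
        self.cache = {}       # name -> MAC address
        self.last_query = {}  # name -> time the last query was forwarded

    def learn(self, name, mac):
        self.cache[name] = mac

    def handle_query(self, name, now):
        if name in self.cache:
            return ("answer", self.cache[name])
        last = self.last_query.get(name)
        if last is not None and now - last < self.period:
            return ("suppressed", None)   # duplicate within the period
        self.last_query[name] = now
        return ("forwarded", None)        # broadcast across the network


proxy = NameProxyCache(period=1.0)
print(proxy.handle_query("SRV1", now=0.0))   # ('forwarded', None)
print(proxy.handle_query("SRV1", now=0.5))   # ('suppressed', None)
proxy.learn("SRV1", "00:AA:BB:CC:DD:EE")
print(proxy.handle_query("SRV1", now=2.0))   # answered from the cache
```

Only the first query within each period crosses the network; once the mapping is cached, no query leaves the router at all.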
2.7.5 Media Access Security
This serves to
• keep local traffic from inappropriately accessing the backbone
• keep backbone traffic from inappropriately entering the LAN.
Both problems are alleviated by packet filtering, which reduces traffic levels to
improve performance and, as its name implies, also improves security and reduces
congestion. The most popular filtering mechanism is the access list approach.
2.7.6 Router Discovery (Cisco Systems, 2000)
As its name implies, this service is a process by which hosts discover routers; in
ISO terminology it is known as ES-IS [ES (end system) denotes host stations and IS
(intermediate system) denotes routers]. Limited to exchanges between hosts and
routers, Hello messages are sent by ESs to all routers on the subnet, and by routers
in reverse. Both carry the subnet and Layer 3 addresses of their generating systems.
2.7.7 Internet Control Message Protocol (RFC 1256)
RFC 1256 outlines a router discovery process based on ICMP. There is no single,
standardized protocol for this mechanism.
2.7.8 Proxy ARP
A proxy-ARP-enabled router responds to ARP requests on behalf of the hosts it has a
connection with. This allows hosts to behave as if all other hosts were on the same
network.
2.7.9 Routing Information Protocol
RIP is commonly available on hosts and is used to find the most suitable router for
a given address.
2.8 CONSTRUCTING INTERNETS BY DESIGN
We start with backbone considerations (Held & Jagannathan, 2004):
• multi-protocol routing backbone
• uni-protocol backbone.
When several Layer 3 network protocols are routed through a common backbone,
without encapsulation, the situation is that of a multi-protocol routing backbone. Two
strategies are available:
1. Integrated routing – uses one preferred routing protocol that determines
the metric-minimizing path.
2. Ships in the night – uses a different routing protocol for each route
protocol.
All routers support one specific routing protocol per specific route protocol. They
then encapsulate all other routing protocols within the preferred supported routing
protocol.
2.9 USING SWITCHES (REVISITED) (Held, 2000)
Vendors and implementations are moving away from hubs and bridges to switches
and routers. All switches operate at Layer 2 and have the following benefits:
• superior segmentation
• increased aggregate forwarding
• increased backbone throughput.
LAN switches address end-host requirements for greater bandwidth. By deploying
switches rather than hubs, designers can increase performance and better exploit
existing media. Previously unavailable functionality, such as VLANs, may also become
available when it is incorporated into a switch. In addition, by delivering links to
interconnect existing shared hubs in wiring closets and server farms, scalable
bandwidth becomes available.
2.9.1 Comparison of Switches and Routers (Perlman, 2001)
To conclude our discussion of internetworking, we will summarize the major
differences between switches and routers.
Key features of switches include:
• high bandwidth
• high performance
• low cost
• easy configuration.
Key features of routers include:
• broadcast firewalling
• hierarchical addressing
• inter-LAN communication
• quick convergence
• policy routing
• QoS routing
• security
• redundancy
• load splitting
• traffic flow management.
Note that, as switch technology gains momentum, switches of the future will address
all these router functionalities. Although routers currently have more features than
switches, a new series of switches that include built-in routing capability deserve the
consideration of network designers and analysts when constructing or revising a
network (Jacobson & Nichols, 1999; Jacobson et al. 1999, 2000).
The material in this chapter serves as background for the congestion management
criteria introduced later in this thesis, and has been included to make the thesis
more self-contained. Various details included herein will be drawn upon in later
chapters.
3. NETWORK PERFORMANCE CHARACTERISTICS
3.1 INTRODUCTION
In this chapter, we turn our attention to Ethernet LAN performance, using general,
well-understood characteristics of Ethernet that are introduced in Appendix A.
First we look at the issue of frame sizes, the length of the information field and
the overhead of a frame; we therefore deal in detail with the composition of an
Ethernet LAN frame. Can the length of LAN frames, or their information-carrying
capability, be adjusted to enhance performance? Likewise, the effect
of frame length on bridge and router operations is investigated (Sahu et al. 2000).
If we have an up and running LAN and wish to expand or enhance it, we can monitor
current LAN traffic to predict the effect of an expansion or the enhancement on a
similar planned network. But, when a brand new network is being put in place, we
lack a prior baseline. In this situation, we need a theoretical framework to estimate
network traffic. That framework occurs through the use of a LAN traffic estimation
technique. We will explore the use of this technique to predict future network growth
and the effect of such expansion on the future planned network, as well as the
segmentation of a LAN to improve network performance (Davidson, 1992).
3.2 FRAME OPERATIONS
The key to understanding Ethernet LAN performance is to first appreciate how an
Ethernet frame is constructed (RFC 1534, RFC 826). Within a frame, there are
essential informational elements (data bits) and non-essential control or padding
information. We first examine the fields in a typical LAN frame. When expanding or
establishing a network or when connecting disparate LANs, we will need to consider
the overheads associated with the frame format.
ETHERNET:
Preamble (8 bytes) | Destination Address (6 bytes) | Source Address (6 bytes) | Type (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes)

IEEE 802.3:
Preamble (7 bytes) | Start-of-Frame Delimiter (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | Length (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes)

Figure 3.1 Ethernet and IEEE 802.3 Frame Formats
3.2.1 Ethernet Frames
Figure 3.1 depicts the standard composition of both IEEE 802.3 and Ethernet frames.
Note that the preamble in IEEE 802.3 is seven bytes, whereas in Ethernet it is eight
bytes. Both are used for synchronization and consist of a repeating sequence of
binary 1s and binary 0s. The IEEE 802.3 standard replaced the last byte of the
Ethernet preamble field with a one-byte Start-Of-Frame (SOF) delimiter. That byte
has a sequence of 1s and 0s, but terminates with two set bits. Another difference
between the two frames concerns the protocol field in Ethernet, which was replaced by
the length field in the IEEE 802.3 frame. The two-byte protocol field contains a
value that identifies the protocol transported in the frame, such as IP or IPX. In
comparison, the length field identifies the length of the frame in an IEEE 802.3
environment. This means that only one protocol can theoretically be transported in an
IEEE 802.3 frame.
Because most organizations need to transport multiple protocols, the data field of the
IEEE frame was used to convey several subfields that allowed multiple protocols to
be transported. Referred to as a SubNetwork Access Protocol (SNAP), this frame
retains the IEEE 802.3 frame format, but inserts special codes within the beginning
of the data field to indicate the type of data that is transported.
Whereas some vendors produce dually functioning IEEE 802.3 and Ethernet
hardware, this is done mostly to preclude the wholesale replacement of idiosyncratic
(IEEE 802.3 versus Ethernet) NICs in their workstations.
We now discuss the frame fields in order.
3.2.1.1 Preamble
The preamble field consists of alternating 1s and 0s that announce the arrival of a
frame and allow all listeners on the network to synchronize themselves.
Furthermore, this field serves to ensure a minimum 9.6 microsecond (µs) frame
spacing at 10 Mbps, used for error control and recovery.
3.2.1.2 SOF Delimiter
This field, which only applies to IEEE 802.3 frames, consists of a format identical to
the preamble, with alternating 1s and 0s for the first six bits. The seventh and eighth
bits are both set to 1, which breaks the synchronization pattern and alerts the listener
that the data is coming.
A controller strips off the preamble and SOF delimiter from incoming frames before
buffering them. Accordingly, minimum and maximum frame lengths can be computed
differently according to whether the preamble and SOF delimiter are included: minimum
and maximum length frames are eight bytes longer on the media than in a computer's NIC.
3.2.1.3 Source and Destination Addresses
Both source and destination addresses occur in IEEE 802.3 and Ethernet frames. The
destination address indicates the recipient of the frame and the source address
indicates the originator. Two-byte source and destination addresses apply only to
802.3 and, although designed for use by small LANs, were never seriously
implemented. In comparison, six-byte addresses apply to IEEE 802.3 and Ethernet
and are de facto addressing standards. They exist within two special fields, as
depicted in Figure 3.2.
Those fields are:
• I/G (Individual/Group) bit – this is 0 for unicast frames and 1 for multicast
frames.
• U/L (Universal/Local) bit – applies only to six-byte addresses
o 0 � universally assigned by IEEE
o 1 � locally administered by vendor
(a) 2-byte field (IEEE 802.3): I/G bit | 15 address bits
(b) 6-byte field (IEEE 802.3 and Ethernet): I/G bit | U/L bit | 46 address bits

Figure 3.2 Source and Destination Address Field Formats
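Because the I/G bit is the least significant bit of the first address byte and the U/L bit is the next bit, both can be decoded directly from a MAC address string. This is an illustrative helper, not part of any standard API.

```python
def mac_address_bits(mac: str):
    """Decode the I/G and U/L bits from a six-byte MAC address string.

    The I/G bit is the least significant bit of the first byte
    (0 = individual/unicast, 1 = group/multicast); the U/L bit is the
    next bit (0 = universally administered by the IEEE, 1 = locally
    administered).
    """
    first_byte = int(mac.split(":")[0], 16)
    ig = first_byte & 0x01
    ul = (first_byte >> 1) & 0x01
    return ("group" if ig else "individual",
            "local" if ul else "universal")


print(mac_address_bits("00:1A:2B:3C:4D:5E"))  # ('individual', 'universal')
print(mac_address_bits("01:00:5E:00:00:01"))  # ('group', 'universal')
print(mac_address_bits("02:00:00:00:00:01"))  # ('individual', 'local')
```

The second example is an IP multicast MAC address, the third a locally administered one; both are distinguished purely by those two bits of the first byte.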
3.2.1.4 Type
The type field applies only to Ethernet and identifies the network layer protocol
carried in the frame. The corresponding field has a different connotation in IEEE
802.3, which rules out interoperability between the two formats.
3.2.1.5 Length
The length field is two bytes and identifies the number of data bytes in the data
field. As noted, whether the preamble and SOF delimiter are included in length
calculations is a matter of convention, but this does not affect the length field's
value.
Short frames have an effect on reliable MAC delivery. It is possible for a short
frame to have collided with, and been corrupted by, another frame while the sender
still believes the transmission was successful. To preclude this possibility, the
transmission time of any frame on an Ethernet must be at least twice the media's
propagation delay. For instance, in a 10 Mbps coax-based LAN with a maximum
length of 2500 m, the minimum time per the IEEE 802.3 standard is 51.2 µs. In turn,
that time corresponds to 64 bytes, because 64 bytes x 8 bits/byte x 0.1 µs/bit =
51.2 µs. As network speed rises, either the minimum frame size must also rise or the
maximum segment length must fall.
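The 64-byte figure can be reproduced directly from the slot-time arithmetic above; the function name is illustrative.

```python
def min_frame_bytes(bit_rate_bps, slot_time_s):
    """Minimum frame length (bytes) so a transmission lasts the slot time."""
    return bit_rate_bps * slot_time_s / 8


# IEEE 802.3 example from the text: 51.2 us slot time at 10 Mbps -> 64 bytes.
assert abs(min_frame_bytes(10e6, 51.2e-6) - 64) < 1e-6

# At 1 Gbps a 64-byte frame occupies the wire for only 0.512 us, which is
# why Gigabit Ethernet extends the slot to 512 bytes (see 3.2.3.1 below).
assert abs(min_frame_bytes(1e9, 0.512e-6) - 64) < 1e-6
```

The same formula makes the trade-off explicit: multiplying the bit rate by 100 while holding the slot time fixed would demand a 6400-byte minimum frame, so in practice the segment length (and hence the slot time) is reduced instead.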
3.2.1.6 Data Field
The data field has a minimum value of 46 bytes, ensuring that the frame is at least
72 bytes in length. Information shorter than 46 bytes must therefore be padded to
reach the minimum length; in certain documentation this padding is referred to as a
PAD subfield. Regardless of the terminology, fill characters are added when
necessary to ensure that the data field is at least 46 bytes.
The maximum length of the data field is 1500 bytes. The implication of this is that
data-intensive applications such as multimedia imaging and file transfers must use
multiple frames.
SSD (1 byte) | Preamble (7 bytes) | SFD (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | L/T (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes) | ESD (1 byte)

Figure 3.3 Fast Ethernet Frame Format
3.2.1.7 Frame Check Sequence
This FCS field is four bytes in length. A Cyclic Redundancy Check (CRC) is
calculated over both address fields, the type/length field and the data field.
This CRC is placed in the four-byte FCS field by the sender. The receiver then
recalculates the CRC at the other end. If the CRC sent and the CRC received match,
the frame is accepted; otherwise the receiver simply drops the frame.
There are two other possibilities that will result in a frame being dropped:
1. Length of data field does not match the value in the length field.
2. Frame length is not a multiple of eight.
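The FCS check can be sketched with a CRC-32. Python's zlib.crc32 uses the same generator polynomial as Ethernet's FCS, though the real wire format's bit ordering and complementing details are simplified here; the helper names are illustrative.

```python
import zlib


def append_fcs(frame_body: bytes) -> bytes:
    """Sender side: compute CRC-32 over the frame body and append it."""
    fcs = zlib.crc32(frame_body) & 0xFFFFFFFF
    return frame_body + fcs.to_bytes(4, "little")


def check_fcs(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the trailing FCS."""
    body, fcs = frame[:-4], int.from_bytes(frame[-4:], "little")
    return (zlib.crc32(body) & 0xFFFFFFFF) == fcs


# Addresses, type field and data all fall under the CRC, as in the text.
frame = append_fcs(b"dst6__src6__" + b"\x08\x00" + b"payload")
assert check_fcs(frame)                      # intact frame is accepted
corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]
assert not check_fcs(corrupted)              # corrupted frame is dropped
```

Flipping even a single bit anywhere under the CRC changes the computed value, which is what lets the receiver silently discard damaged frames and leave recovery to higher layers.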
3.2.2 Fast Ethernet Frames
Figure 3.3 illustrates the Fast Ethernet frame in detail. We see that the frame format
for Fast Ethernet is similar to the IEEE 802.3 frame, except for the Start of Stream
Delimiter (SSD) and End of Stream Delimiter (ESD). SSD signals the arrival of a
frame, whereas ESD indicates that the frame has been successfully transmitted.
The other notable point about Fast Ethernet frames concerns encoding: Ethernet and
IEEE 802.3 are Manchester encoded, with an interframe time gap of 9.6 µs between
frames (Johnson, 1996).
In comparison, Fast Ethernet is transmitted using 4B5B encoding and an interframe
gap of 0.96 µs. Both SSD and ESD fall within this gap.
3.2.3 Gigabit Ethernet Frames
Recall that an increase in the operating speed of an Ethernet network must be
reflected in either an increase in frame length or a decrease in maximum segment length.
At 1 Gbps, if a minimum frame length of 64 bytes is maintained, the network
separation falls to 20 m. In structured cabling within an office building, horizontal
cabling takes up to 10 m from the wall socket to the desktop. Therefore, to
accomplish an increase of network cabling to around 200 m, two special techniques
are employed:
• carrier extension
• packet bursting.
These are quickly discussed here.
3.2.3.1 Carrier Extension
Carrier extension extends the Ethernet slot time from 64 bytes to 512 bytes. This is
achieved by padding frames shorter than 512 bytes with special carrier extension
symbols, resulting in the carrier signal being placed on the network for up to 512
byte times. At the receiver end, extension symbols are stripped off prior to the FCS
check. Note that carrier extension applies only to half-duplex Ethernet. It
significantly degrades performance, especially with short packets; to address this
issue, packet bursting was introduced.
Figure 3.4 Gigabit Ethernet Frame Format with Carrier Extension (the 64-byte minimum frame is extended to the 512-byte slot)
3.2.3.2 Packet Bursting
If a station has multiple frames to transmit, it does so after the first (padded)
frame is successfully transmitted. Subsequent frames are not padded, but are limited
by the maximum frame length. Figure 3.5 depicts Gigabit Ethernet with packet bursting.
We note therein that the first two packets transmitted were less than 512 bytes in
length and were extended. Subsequent packets within the burst time of 1500 bytes are
transmitted to completion, as indicated by Packet 3. The interframe gap between
frames is reduced from 9.6 µs on a 10 Mbps LAN to 0.096 µs on Gigabit Ethernet.
Figure 3.5 Gigabit Ethernet Packet Bursting (512-byte slot time within a 1500-byte burst time)
3.2.4 Frame Overhead (Held & Jagannathan, 2004)
Table 3.1 summarizes the frame overhead percentage associated with transporting
information in Ethernet and Gigabit Ethernet frames as the number of bytes of
information varies from 1 to 1500. As indicated, the percent overhead varies from
1.70 percent to 98.61 percent, so performance can degrade considerably with
interactive traffic. The information in Table 3.1 can be important for network
performance, especially in client/server situations, where it is preferable to send
fewer frames carrying information for several transactions at once; that reduces the
number of interframe gaps, which in turn improves the efficiency of the data flow.
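The overhead figures in Table 3.1 follow from a fixed 26 bytes of per-frame overhead (8-byte preamble, 12 bytes of addresses, 2-byte type/length, 4-byte FCS) plus padding to the 46-byte data-field minimum. The sketch below reproduces several table entries; slot_bytes=520 models Gigabit Ethernet's extended slot as tabulated, and the function name is illustrative.

```python
def overhead_percent(info_bytes, slot_bytes=0):
    """Percent overhead of one Ethernet frame carrying info_bytes of data.

    Overhead per frame: 8 (preamble) + 12 (addresses) + 2 (type/length)
    + 4 (FCS) = 26 bytes, plus any padding of the data field up to 46
    bytes. slot_bytes=520 models Gigabit Ethernet's carrier extension.
    """
    data_field = max(info_bytes, 46)     # pad short data fields to 46 bytes
    frame = 26 + data_field
    frame = max(frame, slot_bytes)       # carrier extension, if applicable
    return round((frame - info_bytes) / frame * 100, 2)


assert overhead_percent(1) == 98.61                    # 71/72 in Table 3.1
assert overhead_percent(46) == 36.11                   # 26/72
assert overhead_percent(1500) == 1.70                  # 26/1526
assert overhead_percent(128, slot_bytes=520) == 75.38  # 392/520
```

Running the function across the table's range shows why interactive traffic (a few bytes per frame) is so much more expensive than bulk transfers: the 26-byte framing cost is fixed regardless of payload.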
3.3 AVAILABILITY LEVELS
We first define availability (A):
A% = [operational time / total time] x 100
expressed as a percentage.
Consider a bridge that works round the clock. Over a year’s time, assume that the
bridge failed once and took 8 hours to repair. So out of 8760 hours per annum, the
device was operational for 8752 hours.
42
Table 3.1 Frame Overhead

                          Ethernet                    Gigabit Ethernet
Bytes of Info      Overhead/      Percentage     Overhead/      Percentage
in Data Field      Frame Length   Overhead       Frame Length   Overhead
      1             71/72          98.61          519/520        99.81
     10             62/72          86.11          510/520        98.08
     20             52/72          72.22          500/520        96.15
     30             42/72          58.33          490/520        94.23
     45             27/72          37.50          475/520        91.35
     46             26/72          36.11          474/520        91.15
     64             26/90          28.89          456/520        87.69
    128             26/154         16.88          392/520        75.38
    256             26/282          9.22          264/520        50.77
    512             26/538          4.83          26/538          4.83
   1024             26/1050         2.48          26/1050         2.48
   1500             26/1526         1.70          26/1526         1.70
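The figures in Table 3.1 can be reproduced with a short sketch. The 26 bytes of fixed overhead (preamble, addresses, length/type and FCS) and the 520-byte extended minimum for Gigabit Ethernet follow from the frame format discussed later in this chapter; the function name is ours:

```python
# Sketch of the Table 3.1 calculation: frame overhead as a function of the
# number of information bytes carried in the data field.
OVERHEAD = 26            # preamble (8) + addresses (12) + length/type (2) + FCS (4)
MIN_DATA = 46            # the data field is padded to at least 46 bytes
GIGABIT_MIN_FRAME = 520  # carrier extension enforces 520 bytes incl. preamble

def overhead_pct(info_bytes: int, gigabit: bool = False) -> float:
    """Percentage of the transmitted frame that is not user information."""
    data = max(info_bytes, MIN_DATA)           # pad short data fields
    frame = data + OVERHEAD
    if gigabit:
        frame = max(frame, GIGABIT_MIN_FRAME)  # apply carrier extension
    return 100.0 * (frame - info_bytes) / frame

print(round(overhead_pct(1), 2))           # 98.61  (71/72)
print(round(overhead_pct(256, True), 2))   # 50.77  (264/520)
print(round(overhead_pct(1500), 2))        # 1.7    (26/1526)
```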
Then the availability of the bridge becomes (Held, 2000):
A% = (8752/8760) x 100 = 99.9%
There are two options to increase the availability of devices: deploy redundant
devices, or use devices with multiple ports. What are the implications?
Reliability is typically measured in terms of the mean time between failures
(MTBF) and the mean time to repair (MTTR). These parameters can be used to
better understand availability levels via the formula:
A% = [ MTBF/(MTBF + MTTR) ] x 100
When applying this formula, it is important to remember that MTBF and MTTR are
mean values; using single observations rather than averages will make the
calculations erroneous. The mean times should be measured across a range of
installed devices; alternatively, the MTBF figures provided by device vendors
may be used in place of in-house determined averages.
We note that, if devices are connected in series (so all must be operational),
the system availability As is the product of the individual availabilities Ai,
expressed as fractions:
As = Π Ai
whereas if devices are connected in parallel (redundant, so only one need be
operational):
As = 1 - Π ( 1 - Ai )
with each result multiplied by 100 to express it as a percentage. For hybrid
topologies of devices, one can consider the system as a combination of series
and parallel elements and compute the overall level of availability in the same
way one computes the resistance of a network of series and parallel resistors.
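The availability formulas above can be sketched as follows; the function names are ours, and the two-bridge combinations are illustrative:

```python
from math import prod

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), as a fraction."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(avails):
    """Devices in series: all must be up, so availabilities multiply."""
    return prod(avails)

def parallel(avails):
    """Redundant devices in parallel: the system fails only if all fail."""
    return 1 - prod(1 - a for a in avails)

a = availability(8752, 8)        # the bridge example: one 8-hour outage per year
print(round(100 * a, 1))         # 99.9
print(100 * series([a, a]))      # two such bridges in series (lower)
print(100 * parallel([a, a]))    # two in parallel, i.e. redundant (higher)
```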
3.4 NETWORK TRAFFIC ESTIMATION (Held, 2000)
We now consider the use of an a priori scheme to estimate or predict network
performance. If high utilization is predicted, plan for segmentation, using a
local bridge, switching device, or similar device to enhance performance
(Awduche et al. 1999).
In this case, one estimates traffic by considering the required functions of each
network user. Group or classify similar network users into a workstation class. Do
the calculations for one typical member of the workstation class, then multiply by the
number of workstations in that class to obtain an estimate of traffic for the entire
class. Repeat the procedure for all workstation classes and add the results to arrive at
the average traffic for the entire network (Jain, 1995).
For a typical workstation class, activities performed may include:
• load application
• load graphic image
• save graphic image
• send e-mail message
• receive e-mail message
• print graphic image
• print text data
• invoke a client/server database.
After selecting the activities, determine:
• message size
• number of frames per message
• frame size
• frequency per hour.
Subsequently, use the following formula:
Bit Rate (bps) = (frames/message x frame size in bytes x 8 bits/byte x
frequency/hour) / (3600 s/hour)
Calculate this figure for each activity performed by the class representative,
then add up the bit rates (bps) to obtain the total for that representative.
If there are N stations in the class, then the bit rate per class = N x (bit rate per
representative). Typical classes may include (project) managers, architects,
secretaries, engineers, programmers and system administration staff.
Finally, add the computed bit rates for all classes to arrive at the total estimated bit
rate for the entire network. The projected growth rates for workstation classes may
then be estimated to arrive at the projected bit rate for future utilizations.
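The per-class procedure above can be sketched as follows. The activity profile (frames per message, frame sizes, frequencies) and the class size are hypothetical, purely to illustrate the arithmetic:

```python
# Estimate traffic for one workstation class, following the procedure above.
def activity_bps(frames_per_msg: int, frame_size_bytes: int,
                 freq_per_hour: int) -> float:
    """Average bit rate contributed by one activity of one workstation."""
    return frames_per_msg * frame_size_bytes * 8 * freq_per_hour / 3600.0

# Hypothetical profile for one representative of a "programmer" class:
activities = [
    (10, 1024, 4),    # load application
    (2,  256,  20),   # send e-mail message
    (50, 1500, 2),    # save graphic image
]
per_station = sum(activity_bps(f, s, q) for f, s, q in activities)
programmers = 25 * per_station    # N = 25 stations in the class
print(round(per_station), round(programmers))
```

The same computation is repeated for every class, and the class totals are summed to give the network-wide estimate.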
When projecting the traffic load, it is important to note that Ethernet
utilization levels beyond 50 percent result in performance degradation that
becomes observable. At that point, you should consider segmentation using
two-port local bridges, which are less expensive than routers or switches.
Figure 3.6 illustrates this situation by example, placing selected user classes
within separate bridged segments.
[Figure: a host on Ethernet segment A and a host on Ethernet segment B, the two
segments joined by a bridge]
Figure 3.6 A Subdivided Network
For the example shown in Figure 3.6, let us assume that the busiest workstation
class is the programmers. We may then consider placing all users of that class
in a separate segment connected to the other segments via a local bridge. If,
for instance, the overall network utilization is 65 percent, with the
programmers responsible for 54 percent of it, then segmenting the network
results in a utilization of (65 x 0.54) percent, or roughly 35 percent, on the
programmers' segment. Although not a complete solution, it is a clear
improvement.
From this example, we can also see why server farms should not all be placed in
one segment: doing so puts all one's eggs in the same basket, concentrating most
if not all client/server transactions in a single segment.
Figure 3.7 illustrates the so-called 80-20 rule. When interconnecting two or
more separate networks, this rule applies: typically 80 percent of traffic is
intra-LAN and the remaining 20 percent inter-LAN.
[Figure: traffic distribution, 80 percent departmental, 10 percent intranet,
10 percent Internet]
Figure 3.7 Typical LAN Information Distribution
3.5 AN EXCURSION INTO QUEUING THEORY (Bertsekas, 2003)
We now need a procedure to estimate waiting times in the system and to select
equipment with sufficient memory to meet the specific requirements of the
organization. To do so, we need some information about arrival and service times
for frames entering a network; classical results from queuing theory then yield
the characteristics of the system. Queuing theory is concerned with managing
both delays and buffer memory in the remote bridges and routers used to link
networks. If the delay is too high or the memory too small, performance in these
devices degrades, necessitating retransmissions.
Queuing theory affords us models to determine and manage delays, to investigate the
effects of modifying operating rates of circuits, as well as to determine the minimum
acceptable memory requirements for devices to maintain a satisfactory level of
performance. We now turn our attention to these features, as determined by queuing
theory (Aggarwal et al. 2000).
Consider the scenario where two LANs are inter-connected via remote bridges or
routers. Assuming a single-channel, single-phase queuing model with Poisson
arrivals and exponentially distributed service times (the classical M/M/1 model,
to which the formulas below apply), the following results yield information on
waiting line characteristics (Held, 2000). With
λ = mean arrival rate
μ = mean service rate
Utilization: ρ = λ/μ
Probability the system is empty: P0 = 1 - λ/μ
Mean number in system: L = λ/(μ - λ)
Mean number in queue: Lq = λ^2 / [μ(μ - λ)]
Mean waiting time in queue: Wq = Lq/λ
Mean time in system: W = Wq + 1/μ
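The formulas above can be collected into a small sketch; the arrival and service rates used below are illustrative, not measured values:

```python
def mm1(lmbda: float, mu: float) -> dict:
    """Classical M/M/1 waiting-line characteristics (requires lmbda < mu)."""
    assert lmbda < mu, "queue is unstable unless arrival rate < service rate"
    rho = lmbda / mu                      # utilization
    L = lmbda / (mu - lmbda)              # mean number in system
    Lq = lmbda**2 / (mu * (mu - lmbda))   # mean number in queue
    Wq = Lq / lmbda                       # mean wait in queue
    W = Wq + 1 / mu                       # mean time in system
    return {"rho": rho, "P0": 1 - rho, "L": L, "Lq": Lq, "Wq": Wq, "W": W}

# e.g. 300 frames/s arriving at a bridge that services 800 frames/s:
m = mm1(300.0, 800.0)
print(round(m["rho"], 3), round(m["W"] * 1000, 2))   # 0.375 2.0 (W in ms)
```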
3.5.1 Buffer Memory Considerations
For n units in the system, the steady-state probability is:
Pn = (λ/μ)^n ( 1 - λ/μ ) = ρ^n ( 1 - ρ )
and the probability of k or more units in the system is:
P(N ≥ k) = (λ/μ)^k = ρ^k
where μ is the service rate and λ the arrival rate. Table 3.2 tabulates these
probabilities for n = 0 to 20 and k = 0 to 20, at a utilization of ρ = 0.375.
Table 3.2 shows these probabilities, computed numerically.
Table 3.2 Probabilities of N Units and of K or More Units in the System

     Probability of N Units        Probability of K or More Units
 N       P(N)                   K       P(N ≥ K)
 0    0.62500000                0    1.00000000
 1    0.23437500                1    0.37500000
 2    0.08789063                2    0.14062500
 3    0.03295898                3    0.05273438
 4    0.01235962                4    0.01977539
 5    0.00463486                5    0.00741577
 6    0.00173807                6    0.00278091
 7    0.00065178                7    0.00104284
 8    0.00024442                8    0.00039107
 9    0.00009166                9    0.00014665
10    0.00003437               10    0.00005499
11    0.00001289               11    0.00002062
12    0.00000483               12    0.00000773
13    0.00000181               13    0.00000290
14    0.00000068               14    0.00000109
15    0.00000025               15    0.00000041
16    0.00000010               16    0.00000015
17    0.00000004               17    0.00000006
18    0.00000001               18    0.00000002
19    0.00000001               19    0.00000001
20    0.00000000               20    0.00000000
Using the data in Table 3.2, you can make the following computations.
To handle 99.9 percent of occurrences, equivalently leaving 0.1 percent
unmanageable, note first from the table that when k = 7, P(N ≥ 7) is 0.00104284,
which still exceeds 0.001. So one selects k = 8 to satisfy the requirement of
buffering frames in 99.9 percent of the occasions on which arrivals momentarily
outpace the servicing rate of the bridge or router.
So, if the frame length is 1200 bytes, the memory requirement is:
Memory = 1200 bytes/frame x 8 frames = 9600 bytes
This procedure generalizes to a nine-step approach for determining storage
requirements:
1. Set λ = mean arrival rate.
2. Set µ = mean servicing rate.
3. Determine the utilization level ρ = λ/µ.
4. Determine the level of service required when arrivals temporarily outpace
servicing.
5. Set N = number of units in the system.
6. Set K = level of service the server is required to provide.
7. Find P(N ≥ K).
8. Extract K = the number of frames to be queued.
9. Multiply the average frame length by the number of frames K to be queued.
This yields the memory requirement for any situation with a pre-determined
probability level.
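The buffer-sizing steps above can be sketched directly; the utilization ρ = 0.375 and the 1200-byte frame length come from the worked example:

```python
def buffer_frames(rho: float, tolerance: float = 0.001) -> int:
    """Smallest k with P(N >= k) = rho**k <= tolerance."""
    k = 0
    while rho**k > tolerance:
        k += 1
    return k

rho = 0.375                  # utilization from the worked example
k = buffer_frames(rho)       # handle 99.9 percent of occurrences
print(k)                     # 8
print(k * 1200)              # 9600 bytes of memory at 1200 bytes/frame
```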
3.6 ETHERNET PERFORMANCE DETAILS
Because of the random nature of collisions in the CSMA/CD protocol, Ethernet bus
performance is nondeterministic. As a result, performance characteristics and
delays are not precisely predictable. What we do have are average and peak
utilizations, and this information may be used to segment an existing network to
enhance performance (Held, 2003).
3.6.1 Network Frame Rate
The frame rate for Fast Ethernet is ten times the value for 10 Mbps Ethernet.
Similarly, the rate for Gigabit Ethernet is nominally ten times that for Fast
Ethernet, but this holds only for frames long enough not to require carrier
extension.
Let us quickly revisit the IEEE 802.3 frame format (Held & Jagannathan, 2004):
• Preamble (8 bytes) [7-byte preamble and 1-byte SOF delimiter]
• Destination Address (6 bytes)
• Source Address (6 bytes)
• Length / Type (2 bytes)
• Data (46 to 1500 bytes)
• FCS (4 bytes)
• Total = 72 bytes to 1526 bytes.
Under Ethernet and IEEE 802.3 frames operating at 10 Mbps, there is a dead gap
of 9.6 µs between frames. This can be used to determine the frame rate on the
network. For example, consider a 10 Mbps LAN.
The bit time is 10^-7 s, or 100 ns.
If we assume the maximum frame length of 1526 bytes, then the time per frame is
9.6 µs + 1526 bytes x 8 bits/byte x 100 ns/bit = 1.23 ms
Because one 1526-byte frame requires 1.23 ms, in 1 s there are 1 s / 1.23 ms, or
approximately 812, maximally sized frames. Such a situation using maximal frames
occurs when doing data-intensive file transfers.
Table 3.3 shows frame rates for Ethernet, Fast Ethernet and Gigabit Ethernet. It
summarizes the frame processing requirements for these networks under 50 percent
and 100 percent load conditions, based on minimum and maximum frame sizes.
These rates indicate the number of frames that a bridge connected to a LAN must
be capable of handling.
If a bridge, switch or router is used on a LAN, the data contained in Table 3.3
can be used to determine the minimum required processing speed for the device.
Table 3.3 Ethernet Frame Processing (Frames per Second)

                      Average Frame      Frames per Second
Network Type          Size (Bytes)      50% Load    100% Load
Ethernet                  1526              406          812
                            72            7,440       14,880
Fast Ethernet             1526            4,060        8,120
                            72           74,400      148,800
Gigabit Ethernet           520          117,481      234,962
                          1526           40,637       81,274
3.6.2 Gigabit Ethernet Considerations
In Gigabit Ethernet, carrier extension is used to ensure a minimum frame length
of 512 bytes (or 520 bytes including the preamble and SOF delimiter). The
carrier extension runs from 0 to 448 bytes, according to the length of the
frame's own data content. The interframe gap is 0.096 µs (Johnson, 1996).
This in turn entails that the minimum frame time is:
0.096 µs + 520 bytes x 8 bits/byte x 1 ns/bit = 4.256 µs
So, in 1 s there are a maximum of
1 s / 4.256 µs = 234,962 minimum sized frames.
In turn, this means that the transmission of a maximum length 1526-byte frame
takes:
0.096 µs + (1526 bytes x 8 bits/byte x 1 ns/bit) = 12.304 µs
Therefore, in 1 s there can be a maximum of 81,274 maximum length frames.
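The frame rate calculations above generalize to a one-line formula, sketched here for all three network types (the function name is ours):

```python
def frames_per_second(frame_bytes: int, gap_us: float, bit_time_ns: float) -> int:
    """Frames/s = 1 s divided by (interframe gap + frame transmission time)."""
    frame_time_us = gap_us + frame_bytes * 8 * bit_time_ns / 1000.0
    return int(1_000_000 / frame_time_us)

print(frames_per_second(1526, 9.6, 100))   # 10 Mbps Ethernet, max frame: 812
print(frames_per_second(520, 0.096, 1))    # Gigabit, minimum (extended): 234962
print(frames_per_second(1526, 0.096, 1))   # Gigabit, maximum frame: 81274
```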
If we look again at Table 3.3, we see that the minimum-frame performance of
Gigabit Ethernet is actually only 1.579 times that of Fast Ethernet and 15.79
times that of 10 Mbps Ethernet, despite the 10:1 and 100:1 increases in data
rate.
3.6.3 Actual Operating Rate (Held, 2000)
To estimate the actual operating rate of an Ethernet network, you need to deduct
the dead time spent in interframe gaps from the maximum throughput. For example,
at 10 Mbps our computation would be as follows:
10 Mbps – (9.6 µs / 100 ns/bit x 812 frames/s) = 9,922,048 bps = 9.922 Mbps
Therefore, even at maximum utilization of an Ethernet, the actual data rate
cannot exceed 9.922 Mbps.
Similarly, for Fast Ethernet:
Operational Rate = 100 Mbps – (0.96 µs / 10 ns/bit x 8127 frames/s)
= 99.22 Mbps
For Gigabit Ethernet, the operational rate is 992.19 Mbps.
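Since the interframe gap is always 96 bit times, these operating rates reduce to subtracting 96 bits per frame from the line rate; a sketch (function name ours):

```python
def operating_rate_bps(line_rate_bps: int, gap_bits: int, fps: int) -> int:
    """Line rate minus the bits of dead time spent in interframe gaps."""
    return line_rate_bps - gap_bits * fps

# The 9.6 us gap at 100 ns/bit is 96 bit times; likewise 0.96 us at 10 ns/bit
# and 0.096 us at 1 ns/bit.
print(operating_rate_bps(10_000_000, 96, 812))       # 9,922,048 bps
print(operating_rate_bps(100_000_000, 96, 8127))     # ~99.22 Mbps
print(operating_rate_bps(1_000_000_000, 96, 81274))  # ~992.2 Mbps
```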
3.7 BRIDGING A NETWORK
When utilization levels reach 50 to 60 percent for extended periods, LAN
modification is in order. One way to do this is to use local bridges to segment
the larger LAN. Once the decision has been made to bridge the network, the next
step is to verify that the filtering and forwarding rate of the bridge is at
least equal to the actual operating rate (frames/s) of the network, as measured
or estimated.
Until now, we assumed that two bridged segments have the same operating rates.
This is not always the case. For example, one department may be working at 10
Mbps, whereas another uses Fast Ethernet. What is the throughput between the
LANs? Figure 3.8 below depicts the situation.
[Figure: LAN A at rate R1 and LAN B at rate R2, joined by a bridge]
Figure 3.8 Linking LANs with Different Operating Rates
To compute the effective rate of information transfer from one network to the
other, let us define the following parameters:
RT = effective frame rate from A to B
R1 = frame rate from A to the bridge
R2 = frame rate from the bridge to B.
Then the effective rate of transmission between the two networks becomes:
RT = (R1 x R2) / (R1 + R2).
For instance, if we estimated:
R1 = 812 frames/s @ 10 Mbps
R2 = 8130 frames/s @ 100 Mbps
then:
RT = 738 frames/s
assuming that the sending station has full access to the bandwidth and other
resources of the bridge.
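The combined-rate formula above can be sketched and checked against the example (function name ours):

```python
def bridged_throughput(r1: float, r2: float) -> float:
    """Effective end-to-end frame rate across a bridge joining two LANs."""
    return (r1 * r2) / (r1 + r2)

# 812 frames/s on the 10 Mbps side, 8130 frames/s on the Fast Ethernet side:
print(round(bridged_throughput(812, 8130)))   # 738 frames/s
```

Note that the result is dominated by the slower LAN: the effective rate can never exceed R1.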
4. ISSUES AT THE NETWORK, TRANSPORT AND APPLICATION LAYERS
4.1 INTERNETWORKING OVERVIEW
We begin our discussion of issues at the upper layers with an overview of
internetworking at large. We introduced the router elsewhere and noted that its
key functions include linking different networks and routing and delivering data
between processes and applications in End Systems (ESs), the ISO terminology for
edge devices on different networks, and doing all this seamlessly and
transparently with respect to the architecture of the attached networks (Clark,
1988; Balakrishnan et al. 1995). This chapter deals with the higher TCP/IP
layers and places the further analysis in perspective. This discussion is
important because Black Box methods work at the Transport Layer, whereas White
Box methods work at the Network Layer.
The predominant protocol supporting these functions is IP, specifically IPv4.
Figure 4.1 shows the IP header, which is a minimum of 20 octets. The fields in
this depiction are as follows (Held, 2000):
• Version (4 bits) – indicates the version number, to enable evolution of the
protocol.
• Internet header length (IHL) (4 bits) – length of the header in units of
32-bit words. The minimum value of IHL is 5 (a 20-octet header) and the maximum
is 15 (60 octets).
• Type of service (8 bits) – yields guidance to ES IP modules and to routers en
route about the relative priority of packets.
• Total length (16 bits) – total IP packet length in octets.
• Identification (ID) (16 bits) – a sequence number that, together with the
source and destination addresses and the user protocol, identifies a packet
uniquely.
• Flags (3 bits) – only two bits are currently defined. The more fragments bit,
when clear, indicates that this is the last fragment of the original packet. The
do not fragment bit prohibits further fragmentation when set. Note that if this
bit is set and the packet exceeds the maximum transmission unit (MTU) size of a
subnetwork en route, the packet is simply discarded. Hence, source routing is
preferable when this bit is set, to avoid subnetworks with too small an MTU.
• Fragment offset (13 bits) – indicates where in the original datagram this
fragment belongs, in units of 64 bits. By implication, all fragments except the
last must contain a data field whose length is a multiple of 64 bits.
• Time to live (TTL) (8 bits) – how long, in seconds, a packet is allowed to
remain on the internet. Each router en route to the destination must decrement
the TTL by at least one, so this field is similar to a hop count.
• Protocol (8 bits) – indicates the next higher layer protocol that is to
receive the data at the destination, essentially identifying the type of the
next header following the IP header.
• Header checksum (16 bits) – an error detection code applied only to the
header. Because some header fields change in transit, this checksum is
recalculated and reverified at each router en route.
• Source address (32 bits) – formulated to allow a variable allocation of bits
to specify the network and the ES attached to that network.
• Destination address (32 bits) – same format as the source address.
• Options (variable) – encodes requested options such as security, source
routing, route recording and timestamps.
• Padding (variable) – used to ensure that the header length is a multiple of
32 bits.
[Figure: IPv4 header layout, 32 bits wide]
Version | IHL | Type of Service | Total Length
Identification | Flags | Fragment Offset
Time to Live | Protocol | Header Checksum
Source Address
Destination Address
Options + Padding

Figure 4.1 The IP Header
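The fixed 20-octet portion of this header can be unpacked with a minimal parser sketch; the sample header bytes below are fabricated purely for illustration:

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header fields described above."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,             # header length in 32-bit words
        "total_length": total_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,  # in 64-bit (8-byte) units
        "ttl": ttl,
        "protocol": proto,                       # e.g. 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# A hand-built header: version 4, IHL 5, DF set, TTL 64, protocol 6 (TCP).
hdr = bytes([0x45, 0, 0, 40, 0, 1, 0x40, 0, 64, 6, 0, 0,
             192, 168, 0, 1, 10, 0, 0, 9])
p = parse_ipv4_header(hdr)
print(p["version"], p["ttl"], p["src"], p["dst"])   # 4 64 192.168.0.1 10.0.0.9
```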
We observe that a newer standard for IP addressing was specified by the Internet
Engineering Task Force (IETF), variously called IPv6 or IPng. This scheme is
depicted in Figure 4.2, where we see that IPv6 uses addresses 128 bits in length
(Deering & Hinden, 1998; Shenker, 1995).
[Figure: IPv6 header layout, 32 bits wide, 40 octets total]
Version | Traffic Class | Flow Label
Payload Length | Next Header | Hop Limit
Source Address (128 bits)
Destination Address (128 bits)

Figure 4.2 The IPv6 Header
IPv6 supports the higher speeds of today's networks and the mix of multimedia
data streams. Fundamentally, there was a need for more addresses to assign to
all conceivable devices. As noted, the source and destination addresses are 128
bits in length. It is expected that all TCP/IP installations will eventually
migrate to IPv6, although this process will take many years, if not decades, to
complete.
4.2 PROTOCOL ARCHITECTURE
Figure 4.3 shows two LANs interconnected over an X.25 network. Therein, we see
the operation of IP for data exchange between ESs, A and B, attached to the two
LANs.
The IP at A receives blocks of data from the higher layer protocol software at A. It
then attaches an IP header with the global IP address of B. This address consists of a
network ID and an ES identifier and the resulting unit is called an Internet Protocol
data unit (IPDU) or simply a datagram.
[Figure: protocol stacks for ES (A) and ES (B), each running TCP / IP / LLC /
MAC / Phy; Routers X and Y each run IP over LLC / MAC / Phy on the LAN side and
X.25.3 / X.25.2 / Phy on the WAN side, joined by the X.25 packet switched WAN]

Figure 4.3 IP Operation
The datagram is encapsulated within the LAN protocol and sent to Router X, which
promptly decapsulates the LAN fields to examine the IP header. The same router
then re-encapsulates the datagram with X.25 protocol fields and sends it across
the WAN to the remote Router Y. That router decapsulates the X.25 fields,
recovers the datagram, wraps it with Layer 2 fields suitable for LAN 2 and sends
it off to device B.
4.3 DESIGN ISSUES
We now turn our attention to some design issues in more detail. These include:
• addressing (Held, 2000)
• routing (Perlman, 2001)
• datagram lifetime (Krol, 1999)
• fragmentation or reassembly (Stallings, 1993).
4.3.1 Addressing
A unique address is associated with each ES and intermediate system (IS) router
within a configuration. This is called an IP address and is used to route a datagram
through an internet to the target system indicated by the destination address.
Once data arrives at the remote ES, it must be processed and delivered to some
process or application therein. Typically, multiple applications will be
supported concurrently, and one application may support several users.
Each application, and possibly each user of an application, is assigned a port
in the system architecture. Minimally, each application has a port number that
is unique within that system. Furthermore, a File Transfer Protocol (FTP)
application, for instance, may support several concurrent data transfers, in
which case each transfer is dynamically assigned a port number.
There are two levels of IP addressing. First, there is the globally applicable
IPv4 address. Second, each device has an address on its own network, unique
within that network; examples are the MAC address (802 networks) and the X.25
host address. These are sometimes referred to as network attachment point
addresses (NAPAs) (Held, 2000).
The issue of addressing scope is relevant only for global IP addresses. Port
numbers, on the other hand, are unique only within a given system. Hence, if
there are two systems, A and B, the ports A.1 and B.1 are distinct even though
both carry the number 1.
4.3.2 Routing
Generally, routing is achieved by maintaining a table within each router. For a
given target system, the routing table yields the next hop (next router) to
which the datagram should be sent (Perlman, 2001).
Routing tables can be static or dynamic. A static table can include alternate
routes to use when a router is unavailable, but a dynamic routing table is more
flexible. "Neighbour greeting" is used to determine the next hop. This helps
with congestion control and also addresses the mismatch between LAN and WAN
transmission rates.
Source routing occurs when a sending station dictates a sequential list of routers that
the datagram must traverse; this specification is inserted within the datagram itself by
the sending station. Route recording is when each router in the trajectory of the
datagram appends its IP address to the datagram. This often helps with network
maintenance and trouble-shooting.
4.3.3 Datagram Lifetime
There is a possibility, especially with dynamic routing, that a datagram or some of its
fragments keeps circulating endlessly in the internet, especially when there are
sudden, significant changes in the network traffic or when there is a flaw in the
system’s routing tables. To preclude this problem, datagrams are sometimes marked
with a lifetime field similar to a hop count. The hop count is initially set to N and
decremented by one as the datagram passes through each router en route. When the
hop count reaches zero, the datagram is discarded.
4.3.4 Fragmentation or Reassembly
Subnetworks within an internet may specify different MTUs, the largest size of
datagram each will carry. It is not feasible to dictate one uniform maximum
packet size across networks, so when the next subnetwork has a smaller MTU than
the previous one, the only option is to fragment the packet (unless the do not
fragment flag is set, in which case the packet is discarded).
In IP, reassembly of fragmented datagrams happens in the ES at the destination. The
following fields in the IP header are used for handling fragmentation and reassembly:
• Data unit identifier (ID) – a composite of source and destination addresses, an
identifier of higher level protocol that generated the data (e.g. TCP), and a
sequence number supported by that protocol layer.
• Data length - the length of the user data field.
• Offset - the position of a fragment of user data in the data field of the original
datagram in multiples of 64 bits.
• More flag – specifies that more fragments follow.
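The interplay of these fields can be sketched with a toy fragmenter; the function name and parameters are ours, and offsets are reported in the header's 8-byte (64-bit) units:

```python
def fragment(payload_len: int, mtu: int, header: int = 20):
    """Split a datagram payload into fragments for a link with the given MTU.
    Every fragment except the last must carry a multiple of 8 bytes (64 bits)."""
    max_data = (mtu - header) // 8 * 8   # round down to a 64-bit multiple
    frags, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        frags.append((offset // 8, size, more))  # (offset in 8-byte units, length, More flag)
        offset += size
    return frags

# A 4000-byte payload crossing a subnetwork with a 1500-byte MTU:
for off, size, mf in fragment(4000, 1500):
    print(off, size, mf)
# 0 1480 True
# 185 1480 True
# 370 1040 False
```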
One problematic issue that must be dealt with is that of lost fragments. IP does not
guarantee delivery. There are two mechanisms to deal with this issue.
First, one mechanism is for the reassembly function to run a local real-time
clock. If the clock expires prior to complete reassembly, the entire effort is
abandoned and the received fragments are discarded.
A second approach uses the datagram lifetime carried in the header of each
incoming fragment. The lifetime field continues to be decremented by the
reassembly function; if the lifetime expires before reassembly completes, the
received fragments are discarded.
4.4 ROUTING AND ROUTE PROTOCOLS
In essence, a router is primarily a packet switch. In this connection, the term
packet refers to the protocol data unit (PDU) that travels from the Network
Layer (Layer 3) software in one system across a network to the corresponding
layer in another system. The packet contains, among other information, the
addresses of the source and target ESs. The ESs are the devices that generate or
receive the overwhelming majority of all packets traversing the network (Miller,
2004).
Another piece of terminology is the "subnetwork". This refers to a collection of
network resources that can be reached without going through a router. If a
router is involved in going through a network from one ES to another, those ESs
are in different subnetworks.
Finally, an internetwork (or simply an internet) refers to a collection of two or more
subnetworks interconnected by routers.
In this context, the role of the router is simple. It receives packets from ESs (and
possibly other routers) and routes them through the internet to the appropriate
destination network. Once the packet has arrived at the destination subnetwork, the
last router in the path traversed forwards the packet to the intended ES recipient.
Some of the other more advanced functions possibly supported by routers include
fragmentation, congestion control and fairly sophisticated packet filters providing a
modicum of security.
The frame is the mechanism used to transport data; packets ride within frames to
get from one ES to another across the internet. If two ESs are within the same
subnetwork, the packet is framed by the transmitter and can pass through
bridges. If the two ESs are in different subnetworks, packets go from the sender
to the local router, which decapsulates the frame, examines the Layer 3 header
and sends the packet on to the next IS in a new frame.
It is worth mentioning that whereas bridges maintain forwarding tables and track
virtually all devices on the subnetwork, there are no entries in these tables for ESs on
different subnetworks and how to reach them. In the latter case, data must pass
through routers.
4.5 ROUTING REVISITED
The network layer protocol in an internet is responsible for all end-to-end routing of
packets. There are many such protocols and they all share some common features.
They all define a packet structure and an address format, and all of them
address such issues as type of service (ToS), fragmentation, connectionless
versus connection-oriented service, and packet prioritization (Kousky, 2000).
The fundamental concept for the network layer protocol is the Layer 3 address. This
address is hierarchical, with at least two addressing segments defined. The first
identifies a subnetwork and the second an ES within that subnetwork. These two
fields are always present, whereas some specific Layer 3 protocols define additional
fields (Bennett & Zhang, 1996).
Note that each router and device interface must have a unique Layer 3 address, a
situation not unlike an area code and local phone number in the telephone industry.
This unique IP address is the basis for all routing within the internet. (Egevang &
Francis, 1994).
The vast majority of all Layer 3 protocols are connectionless and datagram (best-
effort) based. We note that connectionless service is one wherein the upper layer
protocol or architecture has no means of requesting an end-to-end relationship or
connection with another ES. All that the Layer 3 protocol can do is to provide data
with a destination address. All acknowledgements, flow control, and sequencing of
messages are managed by the upper layer protocols or applications.
A datagram network is one wherein routers are unable to establish an end-to-end
circuit for traffic. Every packet received by a router is routed independently of earlier
or later packets. So no guaranteed QoS can be provided. On the other hand, if the
network supports end-to-end circuits, intermediate routers would know that packets
will arrive on an established circuit and expected loads can be defined at the time of
circuit setup (Cerf et al. 1974).
Table 4.1 summarizes four of the major network layer protocols in use today, along
with their key features. All four are connectionless and datagram based (Bajko et al.
1999).
Table 4.1 Major Network Layer Route Protocols

Network Layer      Address    Address
Protocol           Length     Fields            Additional
                   (octets)   (octets)          Capabilities        Used in
Internet              4       NETID (var)       Fragmentation,      Internet, most
Protocol (IP)                 HOSTID (var)      nondelivery         network
                                                notice,             environments
                                                subnetting
Internetwork         12       Network (4)       Automatic client    NetWare
Packet Exchange               Node (6)          addressing
Protocol (IPX)                Socket (2)
Datagram              4       Network (2)       Automatic client    AppleTalk
Delivery                      Node (1)          addressing
Protocol (DDP)                Socket (1)
VINES Internet        6       Network (4)       Fragmentation,      VINES
Protocol (VIP)                Subnetwork (2)    nondelivery
                                                notice,
                                                automatic
                                                addressing
IP addressing is the most complex of all. The boundary between the IP subnetwork
number (NETID) and ES number (HOSTID) is not rigidly fixed. The boundary
varies depending on the address class and the subnet mask being used. In Table 4.2,
we see that there are five address classes, three of which are used for deploying
subnetworks.
The more difficult aspect of the IP addressing mechanism is the notion of the subnet
mask. The purpose of this mask is to take a NETID and divide it into smaller
subnetworks connected by routers. For instance, the Class B address 128.13.0.0 can
be subdivided into 256 smaller networks designated as 128.13.1.0, 128.13.2.0,
128.13.3.0 and so on, using the 255.255.255.0 subnet mask. This process is called
subnetting. The mask may also be used by routers to summarize routes. For example,
all the Class C networks from 199.12.0.0 to 199.12.255.0 can be advertised as
199.12.0.0 using the 255.255.0.0 mask. This process is called supernetting or
Classless Inter Domain Routing (CIDR).
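Both operations can be sketched with Python's standard ipaddress module, using the two examples from the text:

```python
import ipaddress

# Subnetting: split the Class B network 128.13.0.0 using mask 255.255.255.0,
# yielding 256 smaller networks 128.13.0.0/24, 128.13.1.0/24, and so on.
class_b = ipaddress.ip_network("128.13.0.0/16")
subnets = list(class_b.subnets(new_prefix=24))
print(len(subnets), subnets[1])        # 256 128.13.1.0/24

# Supernetting (CIDR): summarize 199.12.0.0/24 .. 199.12.255.0/24 as a single
# advertised route 199.12.0.0 with mask 255.255.0.0.
summary = ipaddress.ip_network("199.12.0.0/16")
print(ipaddress.ip_network("199.12.37.0/24").subnet_of(summary))  # True
```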
Table 4.2 IP Addressing Overview

                 1st Octet    Length of    # of          # of
Address Class    Value        NETID        NETIDs        HOSTIDs
Class A          1-126        1 octet      126           16,777,214
Class B          128-191      2 octets     16,382        65,534
Class C          192-223      3 octets     2,097,150     254
Class D          224-239      Multicast    N/A           N/A
Class E          240-255      Reserved     N/A           N/A
Usually, IP addresses are assigned to devices manually, but there can be exceptions.
For example, the Dynamic Host Configuration Protocol (DHCP) permits a DHCP
server to dynamically lease addresses to ESs when they come on line. IP can
fragment packets if required, with reassembly at the destination ES.
4.5.1 Routing Protocols
Routing protocols are responsible for maintaining routing tables dynamically. The
routing protocols monitor the network and accordingly update the routing tables
when network changes occur. Most Network Layer Route protocols use at least two
routing protocols. (Miller, 2004).
These protocols can be evaluated according to a number of criteria:
• bandwidth
• metrics
• convergence time
• memory space
• processing power.
Bandwidth is the first criterion for appraisal. Maintaining routing tables
requires routers to exchange greetings, which consumes bandwidth. The more
bandwidth consumed for such administrative purposes, the less is available for
carrying user traffic.
The next point is the metric that the routing protocol minimizes. Some use simple
hop count, whereas other more sophisticated protocols use such metrics as delay,
bandwidth, packet loss, or a combination of these metrics.
The third assessment point is convergence time: the delay between the occurrence of a network change and the moment when all routers have updated themselves with the most current state of affairs and altered their routing tables accordingly.
Finally, routing protocols use up memory space and processing power within the
routers. With ever more powerful devices, this becomes less of an issue.
Regardless of the specific routing protocol used, these protocols can all be grouped under the banner of distributed protocols, meaning that route recalculation occurs at the routers themselves. We note that centralized routing, wherein a single system makes all routing decisions and then downloads routing tables to all routers, is a relatively recent phenomenon and is being adopted only by degrees. There are two distinct types of distributed routing protocols (Cisco Systems, 2000; Blake et al. 1998):
1. Distance Vector (DV)
2. Link State (LS).
4.5.2 DV Protocols
DV protocols are also called Vector Distance or Bellman-Ford protocols. They have
three important features (Tanenbaum, 2003):
1. Routing updates produced contain a list of destination/cost pairs.
2. Updates are sent to all neighboring devices.
3. Re-routing calculations are performed within each system.
In essence, DV protocols extract a list of learned destinations and the costs to reach them, and pass this knowledge on to neighbouring devices. These neighbours then use the information to identify better routes than currently exist in their routing tables, at which point the tables are updated.
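The update rule just described can be sketched as follows; the table layout and the names are illustrative, not drawn from any particular protocol:

```python
# Distance-vector merge: adopt a neighbour's advertised route whenever
# the path through that neighbour is cheaper than what we already know.
def dv_update(table, neighbour_table, link_cost):
    """table maps destination -> (cost, next_hop); returns True if changed."""
    changed = False
    for dest, (adv_cost, _) in neighbour_table.items():
        candidate = link_cost + adv_cost
        if dest not in table or candidate < table[dest][0]:
            table[dest] = (candidate, "neighbour")
            changed = True
    return changed

a = {"net1": (0, "direct")}
b = {"net2": (0, "direct"), "net1": (5, "direct")}
dv_update(a, b, link_cost=1)
print(a)   # {'net1': (0, 'direct'), 'net2': (1, 'neighbour')}
```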
Elsewhere, we noted that DV protocols have a count-to-infinity problem. Figure 4.4
illustrates this situation. We see therein that Router A has a route to Subnetwork 1
and Router B uses router A to reach this subnetwork.
If Router A loses the route and Router B advertises this fact before Router A can
advertise the loss, Router A will accept Router B as a new route to Subnetwork 1,
forming a loop. Any packet destined for Subnetwork 1 and arriving at one of these
routers will endlessly thrash between them.
[Figure: Routers A and B connect Subnetworks 1, 2 and 3. (1) Router A loses its link to Subnetwork 1; (2) Router B advertises its route to Subnetwork 1; (3) Router A learns the route from Router B, forming a loop.]
Figure 4.4 The Count-to-Infinity Problem
To address this problem, most DV protocols employ a split horizon: a rule that prevents a router from advertising routes back out the interface through which they were learned. Split horizon with poison reverse is a variant that permits advertisement of these routes but sets their cost to infinity, preventing other routers from adopting them. Most DV protocols send complete updates periodically, which seriously affects convergence time; an improvement is to send event-driven (triggered) updates. However, if a table update was caused by a failed route, a router that has not yet received news of the change may reintroduce the old route into the network. To preclude this occurrence, routers are required to place failed routes into a hold-down state, typically for three times the normal update interval. As a result, the first news (good or bad) travels quickly, but subsequent good news (new routes) is learned more slowly.
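Split horizon with poison reverse can be sketched in a few lines; the table layout is illustrative, and 16 is used as "infinity", as in RIP:

```python
# Split horizon with poison reverse: routes learned from a neighbour are
# advertised back to that neighbour with an infinite cost (16 in RIP).
INFINITY = 16

def advertise(table, to_neighbour):
    """table maps destination -> (cost, learned_from)."""
    update = {}
    for dest, (cost, learned_from) in table.items():
        if learned_from == to_neighbour:
            update[dest] = INFINITY      # poison reverse
        else:
            update[dest] = cost
    return update

table = {"subnet1": (2, "routerA"), "subnet3": (1, "direct")}
print(advertise(table, "routerA"))   # {'subnet1': 16, 'subnet3': 1}
```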
In summary, DV protocols are simple to design and implement with little demand for
memory and processing power. However, convergence is a major problem and can
consume a fair amount of network resources. Table 4.3 summarizes the key attributes
of the major DV protocols, including RIP, RIP II, IGRP, EIGRP, RTMP, RTP and
BGP 4.
Table 4.3 The Major Distance Vector Protocols
Protocol                                    Used to Route   Metric(s)                              Update Interval   Documented by
Routing Information Protocol (RIP)          IP              Hop count                              30 s              RFC 1058
RIP v2 (RIP II)                             IP              Hop count                              30 s              RFC 1388
Routing Information Protocol (for IPX)      IPX             Delay, hop count                       60 s              Novell/Xerox
Interior Gateway Routing Protocol (IGRP)    IP              Delay, bandwidth (reliability, load)   90 s              Cisco
Enhanced IGRP (EIGRP)                       IP, IPX, DDP    Delay, bandwidth (reliability, load)   Event driven      Cisco
Routing Table Maintenance Protocol (RTMP)   DDP             Hop count                              10 s              Apple Computer
Routing Table Protocol (RTP)                VIP             Delay                                  90 s              Banyan
Border Gateway Protocol version 4 (BGP 4)   IP              Hybrid (also policy based)             Event driven      RFC 1771
4.5.3 LS Protocols (Cisco Systems, 2000)
The other family of routing protocols, LS (Link State), has three distinguishing
features:
1. routing update broadcast
2. LS database
3. full route recalculation.
Routing updates are broadcast, in a manner similar to flooding. Full route recalculation is then performed at every router running the protocol. The central feature of an LS protocol is the LS database: all flooded routing updates are stored in this local database, which contains enough information to graph the entire network, calculate alternative paths, and construct a routing table. All the databases must synchronise. A new router in the network can obtain a copy of the database from nearby routers, whereas existing routers periodically verify the integrity of their databases. The size of the database depends on the size and complexity of the network. Although updates are kept small and are event-driven, the flooding still consumes bandwidth, so certain techniques were developed to reduce its effect.
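Once the databases are synchronised, each router can run a shortest-path computation over the LS database to build its routing table. A minimal sketch, using an illustrative database rather than one from the text:

```python
# Dijkstra's shortest-path computation over a link-state database.
import heapq

def shortest_paths(lsdb, source):
    """lsdb maps router -> {neighbour: link_cost}; returns cost to each router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue               # stale heap entry
        for neighbour, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

lsdb = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2}, "C": {"A": 4, "B": 2}}
print(shortest_paths(lsdb, "A"))   # {'A': 0, 'B': 1, 'C': 3}
```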
First, only selected routers are required to forward a flooded update, first checking it against their database to avoid re-sending updates already seen. Another mechanism is to segment a network running LS into areas, with updates flooded only within an area. To handle interarea routing information, specific routers are assigned to summarize interarea routes to the other areas.
LS protocols converge more quickly. They can be more bandwidth-friendly than DV protocols and are less susceptible to routing loops. However, they can be more complex to design, configure and implement, and they consume more router resources, such as memory and processing power.
The Internet is the collection of all existing internets. Each internet is locally administered and referred to as an autonomous system (AS). Any routing protocol working within an AS is called an Interior Gateway Protocol (IGP). However, there is also a need to route across ASs; protocols that do so are called Exterior Gateway Protocols (EGPs).
Table 4.4 summarizes the key attributes of the major LS routing protocols that exist
today, including OSPF, IS-IS, and NLSP.
Table 4.4 The Major Link State Protocols
LS Routing Protocol                                  Used to Route   Metric(s)       Documented by
Open Shortest Path First (OSPF)                      IP              Dimensionless   RFC 2178
Intermediate System to Intermediate System (IS-IS)   IP, CLNS        Dimensionless   RFC 1142 & ISO DP 10589
Netware Link Services Protocol (NLSP)                IPX             Dimensionless   Novell
4.6 EXCURSION INTO THE TRANSPORT LAYER
TCP/IP has two fundamental transport layer protocols (Clark et al. 1987; Floyd,
1994):
1. TCP (Davidson, 1992)
2. User Datagram Protocol (UDP) (Tanenbaum, 2003).
TCP is a reliable two-way byte stream protocol and is rather complex. It guarantees in-sequence, accurate delivery of data by building a checksummed virtual circuit connection on top of IP's unreliable, connectionless, best-effort service. It also deploys flow control and congestion control mechanisms that allow for efficient use of bandwidth. In this context, a window of packets is sent pending acknowledgement; this windowing mechanism is the basis for TCP's flow and congestion management capabilities (Aggarwal et al. 2000).
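The windowing idea can be illustrated with a toy model (not TCP's actual state machine): up to a window's worth of unacknowledged segments may be outstanding, and each acknowledgement slides the window forward.

```python
# Toy sliding-window model: count the round trips needed to deliver
# num_segments when at most `window` segments may be unacknowledged.
def simulate_window(num_segments, window):
    sent, acked, in_flight = 0, 0, 0
    rounds = 0
    while acked < num_segments:
        while sent < num_segments and in_flight < window:
            sent += 1
            in_flight += 1
        # one round trip: everything in flight is acknowledged
        acked += in_flight
        in_flight = 0
        rounds += 1
    return rounds

print(simulate_window(num_segments=10, window=4))   # 3 round trips
```

A larger window means fewer round trips for the same data, which is exactly why the window size governs throughput on a given path.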
TCP also supports multiplexing, whereby messages can be sent to different processes on the same host. It does this by means of a port abstraction, wherein every process on an ES is assigned a locally unique port number; the TCP header carries both a source and a destination port number. Services such as Telnet use this port abstraction to allow multiple clients to connect to the same service (Barford & Crovella, 1998, 1999).
UDP is a best-effort unreliable connectionless protocol for applications without any
sequencing/flow control requirements. It is often used when promptness, rather than
accurate delivery, is sought – for example, when sending speech or video.
4.7 MULTIMEDIA SERVICE
We now turn our attention to the transport of multimedia applications, which is an
instance of issues at the application layer.
VoIP is the enabling technology for multimedia service integration. Savings from
integration can be significant and the technology is available.
Companies with a private voice network retain one or more PBXs to implement the
integrated service. A PBX is the switching element linking two users in a voice or
video connection. PBXs have three basic components (Keshav, 1997):
1. wiring
2. hardware
3. software.
Wiring is dedicated to each phone in use. This allows employees to call each other.
To gain access to the PSTN, PBXs need phone lines purchased from the telephone
company.
Hardware includes a switched network connecting two phones and servers for PBX
software. Software controls such functions as call setup, forwarding, call transfer,
call hold, as well as generating per-call statistics. Organizations with multiple
locations require PBXs dedicated at each site connected by public-leased lines (Joyce
& Walker, 1992).
Moving towards full VoIP deployment, we will gradually see PSTNs replaced by
public data networks, the introduction of VoIP gateways, as well as VoIP
gatekeepers, and the customer premises equipment (CPE). To fully support call management, the International Telecommunication Union (ITU-T) standardized the H.323 protocol, which describes terminals, equipment and services for multimedia connections over a LAN. Voice is only one of the services supported.
The H.323 recommendations are proving to be the basis by which many backbone,
access and CPE vendors are developing VoIP components with assured
interoperability. H.323 VoIP products can be broken down into the following
categories, mapping loosely to network layers:
• CPE – includes such devices as Microsoft NetMeeting conferencing software,
Intel Proshare conferencing software, the Selsius Ethernet phone, as well as
the Symbol Netvision phone, an H.323 telephone that plugs into an Ethernet
port.
• Network infrastructure equipment - includes standard routers, hubs, and switches.
Because voice is sensitive to delays and losses, a number of router features such
as Random Early Detection (RED) (Floyd & Jacobson, 1993; Lin & Morris,1997;
Lin et al. 1999), weighted fair queuing (WFQ) (Keshav, 1991), Resource
Reservation Protocol (RSVP), compressed RTP, and multiclass, multilink PPP,
have evolved over the years to address these issues (Demers et al. 1990).
• Servers – provide a major VoIP benefit of utilizing the Internet model, with clear
demarcation between network infrastructure and network applications. The
H.323 gatekeeper service, for example, supports call control; an Authentication, Authorization and Accounting (AAA) server provides billing and accounting; and a Simple Network Management Protocol (SNMP) server provides for network management.
• Gateways – represent an evolutionary step for organizations moving towards
VoIP. Given that it will be a while for the data network to handle all multimedia
communication, gateways are an interim measure to link the new VoIP services
with existing public or private voice networks.
4.8 SOME DELAY CALCULATIONS
In this section, we turn our attention to computing the delay times and latency that occur in the transmission of small pieces of multimedia across an internet (Cruz, 1998; Bennett et al. 2001). Because UDP is used to transport digitized voice, we first note that while the UDP header itself is 8 bytes long, the calculations below follow Held (2000) in allowing 16 bytes for UDP-related overhead.
4.8.1 10 Mbps Ethernet, 100 Mbps Fast Ethernet, and 1000 Mbps Gigabit
Ethernet
The delay time calculations for the 10 Mbps Ethernet, 100 Mbps Fast Ethernet and
1000 Mbps Gigabit Ethernet are presented in Table 4.5 below (Johnson, 1996).
Table 4.5 Some Ethernet Delay Calculations
10 Mbps Ethernet (interframe gap 9.6 µs):
∆ = 9.6 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-7 s/bit
∆max = 9.6 µs + 1500 bytes x 8 bits/byte x 10^-7 s/bit
100 Mbps Fast Ethernet (interframe gap 0.96 µs):
∆ = 0.96 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-8 s/bit
∆max = 0.96 µs + 1500 bytes x 8 bits/byte x 10^-8 s/bit
1000 Mbps Gigabit Ethernet (interframe gap 0.096 µs):
∆ = 0.096 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-9 s/bit
∆max = 0.096 µs + 1500 bytes x 8 bits/byte x 10^-9 s/bit
Key parameters for the calculations include the interframe gap (varies, given in µs),
the MAC level preamble (8 bytes), destination MAC address (6 bytes), source
address (6 bytes), type/length field (2 bytes), and the data field (minimum 46 bytes
and maximum 1500 bytes). Within the data field is the encapsulated IP header. The
entire 20 bytes of the IP header need to be read to get to the UDP header, because the
UDP port must also be read. We also factor roughly 7 bytes of RTP information, plus
a small quantity (approximately 100 bytes) of multimedia data. For a maximum
length Ethernet field, the frame length increases to 1500 bytes.
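The expressions in Table 4.5 can be evaluated with a short sketch; the overhead constants are taken from the table, and the function name is ours:

```python
# Per-frame delay per Table 4.5: interframe gap plus serialization time
# of preamble (8) + destination (6) + source (6) + type (2) + IP header
# (20) + UDP-related overhead (16, per the table) + RTP (~7) + payload.
def frame_delay_us(rate_bps, gap_us, payload_bytes=100):
    overhead = 8 + 6 + 6 + 2 + 20 + 16 + 7           # bytes, per the table
    bits = (overhead + payload_bytes) * 8
    return gap_us + bits * 1e6 / rate_bps            # microseconds

print(round(frame_delay_us(10e6, 9.6), 1))     # 10 Mbps Ethernet: 141.6 µs
print(round(frame_delay_us(100e6, 0.96), 2))   # Fast Ethernet
print(round(frame_delay_us(1e9, 0.096), 3))    # Gigabit Ethernet
```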
Note that the maximum delay time only applies to data transported between
multimedia carrying packets. This explains why file transfers and similar operations
that have frames inserted between two voice packets can cause distortion on slow
Ethernet LANs.
By computing multimedia delay times, as well as the effect of the insertion of
packets transporting data between digitized voice packets, it becomes possible to
determine if your LANs can handle multimedia prior to implementing a new
application.
4.8.2 Switches (Perlman, 2001)
For a store and forward Ethernet switch, the entire frame must be received, stored,
and processed.
In this context, we can compute minimum and maximum delay times for a 10 Mbps
switch as follows:
∆min = 9.6 µs + (72 bytes x 8 bits/byte x 10^-7 s/bit)
= 67.2 µs
∆max = 9.6 µs + (1526 bytes x 8 bits/byte x 10^-7 s/bit)
= 1230.4 µs ≈ 1.23 ms
Similar considerations apply for switches at higher speeds and computations may be
made with the appropriate interframe gap and processing speeds at the higher rates.
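The store-and-forward computation above can be reproduced as follows; the frame lengths of 72 and 1526 bytes include the 8-byte preamble:

```python
# Store-and-forward switch latency: the whole frame must be clocked in
# before forwarding, so latency scales with frame length.
def switch_delay_us(frame_bytes, rate_bps=10e6, gap_us=9.6):
    return gap_us + frame_bytes * 8 * 1e6 / rate_bps

d_min = switch_delay_us(72)      # minimum frame (64 bytes + preamble)
d_max = switch_delay_us(1526)    # maximum frame (1518 bytes + preamble)
print(round(d_min, 1))   # 67.2 µs
print(round(d_max, 1))   # 1230.4 µs, i.e. about 1.23 ms
```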
When examining vendor specifications for latency, one needs to be cautious: vendors sometimes do not state the frame length used for their latency measurements. The frame length has been factored explicitly into the computations carried out above.
As mentioned earlier, Black Box congestion control solutions have typically operated at the Transport Layer, so it has been important to understand this layer in detail. [Note that White Box congestion control also operates at the Network Layer, hence the discussion of that layer, both for the sake of completeness and to place future research in perspective]. The treatment of the higher layers introduced in this chapter is taken up at later points in this thesis.
5
THE ETHERNET FAMILY OF LANs REVISITED
5.1 INTRODUCTION
In this chapter we turn our attention to the several members of the Ethernet family,
including (Stallings, 1993 & 1997):
• 10 Mbps Ethernet
• Fast Ethernet
• Gigabit Ethernet
• 10 Gigabit Ethernet.
Each member of the family has its immediate relatives - for instance, within 10 Mbps
Ethernet, we find 10BASE-T, 10BASE-2, 10BASE-5, 10BROAD-36, and 10BASE-F. Within 100 Mbps Fast Ethernet (Johnson, 1996), we find 100BASE-TX,
100BASE-FX, and 100BASE-T4. Also in 1000 Mbps Ethernet, better known as
Gigabit Ethernet (Krol, 1999), there are 1000BASE-LX, 1000BASE-SX, 1000BASE-CX, and 1000BASE-T. However, in 10 Gbps operation, better known as 10-Gigabit Ethernet, only one standardized LAN is defined (Martin & Chapman, 1989).
Recall that a LAN solution consists of a transmission medium, MAC protocol, and
encoding mechanism, and operates using a predefined topology. The transmission
medium is concerned with the properties of the physical carrier that bears the signals
from source to destination. The MAC protocol governs the method by which signals
access the medium. The encoding mechanism defines how data and control codes are
encoded. Because all of the LAN technologies we discuss are baseband (digital
transmission of digital data), we will be concerned with only digital encoding.
Different signal elements are used to represent binary 1 and binary 0. A number of
encoding schemes are discussed later in this chapter.
Elsewhere, in other chapters, we look at all these components, paying particular
attention to topology and MAC protocols. The other two components of LANs –
transmission media and encoding schemes – were mentioned there only in passing,
primarily for the sake of continuity and completeness. We are able to afford a more
detailed treatment of these aspects of LANs later in this chapter. (Plummer, 1982)
5.2 TRANSMISSION MEDIA
Let us first turn our attention to the transmission media. There are primarily five
possibilities (Braden, 1999; Black, 1998):
1. UTP
2. STP
3. coaxial cable
4. OFC
5. unguided (wireless).
The last medium will not be treated here, being a topic in its own right.
Each medium has its benefits and limitations. Whatever the medium, the following
key considerations and characteristics must be considered (Stallings, 1997):
• bandwidth (note that the higher the bandwidth, the higher the data rate)
• transmission impairments
• mutual interference
• cost
• ease of installation
• geographic scope supported
• maximum speed of communication supported.
During the ensuing discussion, we will have an opportunity to make observations
about each of these criteria.
5.2.1 Twisted Pair
Twisted pair comes in two varieties:
1. UTP
2. STP.
UTP is the least expensive of twisted pair wiring. Office buildings come pre-wired
with a lot of excess 100 ohm voice grade UTP wires. Because twisted pair easily
bends around corners and is commonly located near office desks, this medium is
both readily available and easy to install. However, UTP is susceptible to
considerable interference from external fields and picks up a lot of noise as well.
To improve the performance of UTP, STP was developed. This medium shields each
pair of twisted wire using a metallic sheath to reduce interference. This improves
performance, but is more expensive than UTP. Also, it is not as easy as UTP to bend
around corners.
The initial aim with voice-grade UTP media was to provide service at 1 to 16 Mbps, which was adequate and suitable for the applications extant at that time.
However, since that time, users have migrated to higher bandwidth applications, and
their requirements of the LAN have moved up considerably. Typically, 100 Mbps
Fast Ethernet and 1000 Mbps Gigabit Ethernet have become necessary, so ways had
to be found to upgrade LAN performance to these levels. To deal with these new
requirements, three types of UTP cable were initially standardized jointly by the
Electronics Industry Association (EIA) and the Telecommunications Industry
Association (TIA) within their joint EIA/TIA-568 standard. The three types of cable
are:
1. Cat(egory) 3
2. Cat(egory) 4
3. Cat(egory) 5.
Along with the cables and the associated hardware, the speeds supported by these
media include:
• Cat 3 – up to 16 MHz
• Cat 4 – up to 20 MHz
• Cat 5 – up to 100 MHz.
Note that the ability of these categories of UTP to support different types of LAN transmission will depend on the signalling method used by different LANs. For instance, consider an encoding that carries 6 data bits in every 4 signal elements. In this case, a Cat 5 cable will support a transmission rate of (6/4) x 100 = 150 Mbps.
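The arithmetic above can be sketched directly; the assumption, matching the text's example, is an encoding that carries 6 data bits in every 4 signal elements on 100 MHz cable:

```python
# Data rate from signalling rate: if an encoding carries `data_bits` in
# every `signal_elements` symbols, the data rate is the signalling rate
# scaled by that ratio.
def data_rate_mbps(signal_rate_mhz, data_bits, signal_elements):
    return signal_rate_mhz * data_bits / signal_elements

print(data_rate_mbps(100, 6, 4))   # 150.0, the text's Cat 5 example
```

The same formula covers schemes that expand rather than compress: 4B5B on a 125 MHz link, for instance, yields (4/5) x 125 = 100 Mbps.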
5.2.2 Coaxial Cable
Coaxial cable consists of a pair of conductors, but is constructed differently to allow
the operation of a broader spectrum of frequencies.
Within a coaxial cable is a concentric pair of conductors, with the inner conductor
providing a single wire surrounded by a dielectric insulator. The insulator is in turn
covered by the second hollow conductor ring. The second ring conductor is protected
by a jacket, which forms a shield. Because of the shielding, coaxial cable is less
susceptible to noise and interference than twisted pair. Greater distances are possible
as well, as is the support for more attached stations. For example, coaxial cable
supports hundreds of Mbps over transmission distances of 1 km. Although coaxial
cable is more expensive than STP, it provides greater capacity. However, coaxial
cable is less flexible than twisted pair when it must bend at the point of connection.
5.2.2.1 Coaxial Adapters
This device connects an existing (thin) coaxial cable bus network (up to 29 stations)
with a wire hub. The maximum network span is 100 m. The purpose of the adapter is
to connect the BNC T-connector used with 10BASE-2 to the UTP cable whose other
end is attached to the wire hub.
Essentially, the coaxial adapter is a two-port repeater connecting one 10BASE-T port and one thin coaxial BNC port (10BASE-2). Through its use, a thin coaxial cable (200 m length and 29 stations) can be integrated into a 10BASE-T network without any modification to the existing coaxial network infrastructure.
The 5-4-3 rule applies to each member of the 10Mbps Ethernet family. As a
refresher, the rule indicates that data frames can traverse a maximum of three (3)
populated segments, four (4) repeater hops, and five (5) total segments. Any segment
with a workstation represents a populated segment.
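The 5-4-3 rule lends itself to a small validity check; the path representation here is our own:

```python
# 5-4-3 rule: along any path, at most five segments, four repeaters,
# and three populated segments.
def path_ok(segments):
    """segments is a list of booleans: True if the segment has workstations."""
    total = len(segments)
    repeaters = total - 1            # one repeater joins each pair of segments
    populated = sum(segments)
    return total <= 5 and repeaters <= 4 and populated <= 3

print(path_ok([True, False, True, False, True]))          # True
print(path_ok([True, True, True, True, False, False]))    # False (6 segments)
```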
5.2.3 Optical Fibre Cable (RFC 2127)
OFCs are made of three possible substances, in order of decreasing cost and
performance:
• ultra pure fused silica
• multicomponent glass
• plastic fibre.
An OFC consists of three concentric rings:
• core
• cladding
• jacket.
The core is the central section composed of many fibres. Each fibre is surrounded by
a cladding consisting of either a glass or plastic coating. The outermost ring is
composed of plastic-like materials to protect the cladded fibres.
OFC is being used for long-haul telecommunications, as well as in LAN
environments. The ongoing improvements in technology and the dropping cost to
manufacture are making OFC more and more popular as an alternative medium for
LANs. The key advantages for the use of optical fibre include a significantly high
bandwidth, which permits a much higher data rate than obtainable on copper-based
media, its micro compact size, and immunity to electronic interference from external
sources.
OFC systems cover both the infrared and visible spectra. The different types of OFC
technologies include step index multimode, single mode, and graded index
multimode. Once a light pulse enters an OFC, it behaves according to the physical
properties of the core and cladding.
In multimode step index fibre, some light rays bounce off the cladding at different angles and continue down the core, whereas others are absorbed by the cladding. This type of OFC supports data rates up to approximately 200 Mbps over distances of up to 1 km.
By gradually decreasing the refractive index of the core away from its centre, reflected rays are focused along the core more efficiently, yielding data transmission rates of up to 3 Gbps over many kilometres. This type of optical fibre is referred to as graded index multimode.
The last type of optical fibre narrows the core still further, so that only one mode propagates through the core. This type of fibre is referred to as single mode and is the most expensive, as well as the best performing. Lasers can be coupled to single mode fibre, permitting extremely high data rates over long distances.
5.2.3.1 Fibre Optic Technology
A significant enhancement to the 10BASE-T Ethernet technology is Fibre Optic
Repeater Link (FORL). Transmission of data occurs along dual fibre cable (1 for
transmission, 1 for reception). OFC technology enables the support of multiple
Ethernet segments at distances of up to 2 km. Using a fibre transceiver, one can
connect remote stations, connect a wire hub and a fibre hub, and support multiple
stations (Davidson, 1992).
When an optical transceiver is used on a wire hub, one can connect to dual fibre
cable. Basic OFC devices are briefly described here mainly for the sake of
completeness and continuity of the discussion.
5.2.3.1.1 Optical Transceiver
This device consists of electronics and circuitry that translate ON and OFF indicators
into the presence and absence of light signals, which in turn are mapped to an
encoding scheme.
5.2.3.1.2 Fibre Hubs (Perlman, 2001)
Fibre hubs consist of many FORL ports, one AUI port, and one or more 10BASE-T ports. The FORL ports link fibre hubs to fibre adapters, or fibre adapters to fibre NAUs in PCs.
5.2.3.1.3 Fibre Adapters
A fibre adapter is a media conversion device, translating between coaxial and optical
fibre. The fibre adapter extends the transmission distance between a wire hub and a
station from 100 m to 2 km, with an adapter required at each end of the fibre link,
unless a station is directly connected to a fibre hub. When attached to a fibre hub, the
distance separation is 2 km. When attached to a wire hub, the maximum transmission
distance is reduced to 15 m. When attached to a PC’s NAU, the separation is again 2
km.
5.3 AN EXCURSION INTO THE ETHERNET FAMILY
In this section, we look at the various members of the Ethernet family, including the
10 Mbps LAN series of standards, 100 Mbps Fast Ethernet, 1000 Mbps Gigabit
Ethernet, and the 10 Gigabit network. The common IEEE notation for these network
solutions is:
<Data Rate in Mbps><Signalling method><Maximum segment length in hundreds of metres>
Therefore, using this notation, a 10BASE-2 network represents a 10 Mbps baseband network with a nominal maximum segment length of 200 m. In actuality, the maximum length of a 10BASE-2 segment is 185 m, but the IEEE nomenclature rounds this up to 200 m.
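The naming convention can be parsed mechanically; this small sketch (the parser and its field names are our own) treats a numeric suffix as a segment length in hundreds of metres and anything else as a media designator:

```python
# Parse IEEE designations of the form <rate><signalling>-<suffix>,
# e.g. "10BASE-2", "10BROAD-36", "100BASE-TX".
import re

def parse_designation(name):
    m = re.fullmatch(r"(\d+)(BASE|BROAD)-(\w+)", name.replace(" ", ""))
    if not m:
        raise ValueError(name)
    rate, signalling, suffix = m.groups()
    segment_m = int(suffix) * 100 if suffix.isdigit() else None
    return {"rate_mbps": int(rate), "signalling": signalling,
            "nominal_segment_m": segment_m, "suffix": suffix}

print(parse_designation("10BASE-2"))
# {'rate_mbps': 10, 'signalling': 'BASE', 'nominal_segment_m': 200, 'suffix': '2'}
print(parse_designation("10BASE-T")["nominal_segment_m"])   # None (media letter)
```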
5.3.1 10 Mbps LAN (Held, 2000)
There are five 10 Mbps LANs standardized by the IEEE:
• 10BASE-5
• 10BASE-2
• 10BASE-T
• 10BROAD-36
• 10BASE-F.
5.3.1.1 10BASE-5
The media cable used for this network is 50 ohm coaxial cable, which means less
interference (low noise) and less reflections. The data rate supported is 10 Mbps. The
maximum segment length is 500 m. 10BASE-5 is extensible using repeaters, with a
maximum of four repeaters between any two stations on the LAN based on the
previously mentioned 5-4-3 rule. The maximum LAN length is itself 2.5 km. The
topology is a bus structure resulting in stations contending for access to the bus. The
cable diameter is 10 mm and, at most, 100 nodes are permitted per segment. The
maximum number of nodes per network is 1024 and the maximum node spacing is
1000 m. The cable and wire type is more formally a 50 ohm thick coaxial cable (commonly designated RG-8), with N-type connectors.
5.3.1.2 10BASE-2 (Stallings, 1997)
A 10BASE-2 network represents a less expensive and less capable LAN in
comparison to the 10BASE-5. In a 10BASE-2 network, the electronics are attached
to the station without an AUI. Although this network uses a 50 ohm bus topology
cable with the same data rate of 10 Mbps as a 10BASE-5 network, there are
significant differences between the two. First, the 10BASE-2 cable diameter is 5 mm,
which means it is significantly thinner than the cable used in a 10BASE-5 network.
Because of this, the cable used in a 10BASE-2 network is sometimes referred to as
thinnet or cheapnet cable. Because of the thinner cable, the maximum segment length
is 185 m, with up to 15 m allowed for the tap, resulting in a segment length of 200 m.
The 10BASE-2 network permits a maximum network span of 1000 m, with 30 nodes
per segment allowed, as well as 1024 maximum nodes per network. Minimum and
maximum node spacing on a 10BASE-2 network are 0.5 m and 200 m, respectively.
The maximum number of segments is three.
Note that 10BASE-5 and 10BASE-2 can be interconnected. However, because
10BASE-2 is less noise resistant, unexpected problems can occur when these two
types of network are bridged. The type of cable used for a 10BASE-2 network is more formally referred to as RG-58 thin coaxial cable, with BNC connectors used to attach to the cable.
5.3.1.3 10BASE-T (Martin & Chapman, 1989)
The 10BASE-T LAN is a twisted wire hub centric network that permits the use of the
prewired installed base of UTP cable in most organizations. Because the 10BASE-T
network is hub based, the topology can be viewed as a star.
The data rate of a 10BASE-T network is again 10 Mbps, and the encoding is
Manchester. The maximum link length is 100 m. 10BASE-T is interoperable with
10BASE-2 or 10BASE-5 networks. The maximum number of coaxial cable segments
in a path between stations is three. The maximum number of repeaters is four and the
maximum number of segments is five. Thus, this follows the 5-4-3 rule previously
mentioned in this chapter.
5.3.1.4 10BROAD-36
As previously noted, 10BROAD-36 uses radio frequency modems and does not
encode data digitally. Because this technology is largely superseded and does not use
digital encoding, it will not be discussed further.
5.3.1.5 10BASE-F
The 10Mbps version of Ethernet that operates over optical fibre is referred to as
10BASE-F. This network provides many benefits associated with the use of optical
fibre and can be implemented in three versions:
• 10BASE-FP – a star topology linking stations and repeaters at up to 1 km per link.
• 10BASE-FL – point-to-point connections of either stations or repeaters up to
2 km.
• 10BASE-FB – defines a point-to-point link that can be used to connect
repeaters at up to 2 km.
The media for all three versions of 10BASE-F is an optical fibre pair, with encoding
occurring using Manchester encoding. 10BASE-FP supports 33 stations per star.
5.3.2 Fast Ethernet (100 Mbps) (Johnson, 1996)
Fast Ethernet provides a low-cost Ethernet networking capability at a data rate of 100
Mbps. This version of Ethernet uses the same frame format as Ethernet 802.3 LANs,
as well as the same MAC protocol. The topology is a star, because it is based on the
use of hubs.
Figure 5.1 illustrates what we refer to as the Fast Ethernet tree and indicates the
different versions of this networking technology, as well as the different media the
versions support. As indicated in Figure 5.1, Fast Ethernet comes in three basic
versions:
• 100BASE-TX – the topology is a star, and the encoding method is MLT-3.
The media used are either two pairs STP or two pairs Cat 5 UTP. The
maximum segment length (link) is 100 m. The total network span is 2500 m.
• 100BASE-FX – the topology is a star, with data encoded using 4B5B with NRZI, carried over two optical fibres, one for each direction.
• 100BASE-T4 – the topology is a star, with the encoding being 8B6T. It is the most popular version of Fast Ethernet as it supports the use of four pairs of either Cat 3 or Cat 5 UTP cable.
[Figure: 100BASE-T branches into 100BASE-X (100BASE-TX over 2 pairs of Cat 5 UTP or 2 pairs of STP; 100BASE-FX over 2 optical fibres) and 100BASE-T4 over 4 pairs of Cat 3 or Cat 5 UTP.]
Figure 5.1 The Fast Ethernet Tree
Each version of Fast Ethernet uses the same MAC protocol (CSMA/CD) and framing as its 10 Mbps cousins. The only difference between the two families concerns the framing envelope at 100 Mbps: Fast Ethernet uses special codes referred to as starting and ending delimiters. Because those delimiters are ignored by network adapters, the resulting framing is considered to be the same as on a 10 Mbps Ethernet network.
One key difference between 100 Mbps Fast Ethernet and its 10 Mbps cousins concerns the 5-4-3 rule: it does not apply to Fast Ethernet. Cable distance is restricted to a maximum of 100 m, and without optical fibre technology the maximum distance between nodes is 205 m. If two Fast Ethernets are connected, the distance between the networks is restricted to a maximum of 5 m. Thus, Fast Ethernet networks cannot be cascaded.
There are two key aspects to selecting the architecture of a Fast Ethernet network:
• backbone operation
• switch segmentation.
5.3.2.1 Backbone Operation
In this context, a 100 Mbps Fast Ethernet hub is used with autoconfiguring 10/100
Mbps ports to connect two or more 10BASE-T networks. Unfortunately, this
scenario results in one big collision domain, without any performance improvement
for users on the horizontal axis.
If some users are relocated from the 10BASE-T hub to the backbone hub, those users can avail themselves of the 100 Mbps rate in the backbone. This is done, in particular, when servers are placed on the backbone.
5.3.2.2 Switch Segmentation
For true performance improvement, replace the 100BASE-T hub with a LAN switch, which functions broadly as described elsewhere, providing N/2 simultaneous connections, each at 100 Mbps, where N represents the number of ports on the switch.
You can use a switch to provide connectivity with both departmental servers and
individual stations. The former are attached directly to the switch and are serviced at
100 Mbps, whereas the latter are placed on 10 Mbps LAN segments. This design
technique provides additional bandwidth as appropriate to departmental servers,
whereas the total bandwidth is shared on the 10 Mbps LAN segments.
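The N/2 figure quoted above is easy to evaluate directly. The sketch below is ours (the function name is not from any vendor API); it simply computes the aggregate bandwidth a non-blocking switch can carry when its ports pair off into simultaneous conversations.

```python
def switch_aggregate_mbps(n_ports: int, port_rate_mbps: int = 100) -> int:
    """Aggregate bandwidth of a non-blocking switch: N ports can form
    at most N // 2 simultaneous full-rate connections."""
    return (n_ports // 2) * port_rate_mbps

# A 16-port 100 Mbps switch can carry 8 simultaneous conversations.
print(switch_aggregate_mbps(16))       # 800
print(switch_aggregate_mbps(24, 100))  # 1200
```

By contrast, a shared hub with the same port count offers only the single 100 Mbps medium to all stations combined.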
5.3.3 Gigabit Ethernet (1000 Mbps) (RFC 2128)
Gigabit Ethernet is a LAN technology that allows for transmission at data rates of
1000 Mbps. Figure 5.2 illustrates the architecture of a Gigabit Ethernet LAN, with a
1 Gbps LAN switch servicing three 100 Mbps Fast Ethernet hubs, while a server or
perhaps a server farm is supported directly at a 1 Gbps data rate.
Figure 5.2 Server/ Switch Connection
Gigabit Ethernet uses the same CSMA/CD MAC protocols as Fast Ethernet and
Ethernet, which makes all three interoperable.
There are four physical layers supported by the Gigabit Ethernet series of standards. Those standards and the physical media they use are summarized below:
• 1000BASE-LX – 1300 nm laser on single/multimode fibre
• 1000BASE-SX – 850 nm laser on multimode fibre
• 1000BASE-CX – short-haul copper twinax STP cable
• 1000BASE-T – long-haul copper UTP.
Data transferred to a specific type of Gigabit media are encoded using either 8B/10B
or 4D PAM5. This is important to know from a design standpoint as it has a bearing
on the data rates.
Figure 5.3 Gigabit Ethernet Architecture
The basic Gigabit Ethernet architecture is depicted in Figure 5.3. Encoded data are
transmitted to the MAC layer, with the exception of the UTP Gigabit Ethernet where
a special GMII (Gigabit Media Independent Interface) is defined to connect the
physical and MAC layers.
The GMII is an interface that provides 1-byte-parallel receive and transmit as a chip-to-chip synchronous interface. Below it, the physical layer is divided into three sublayers:
• Physical coding sublayer (PCS) – provides a uniform interface to the
reconciliation layer for all physical media. 8B/10B encoding is used. Carrier
sense and collision detection are functions of the PCS. It also supports the
autonegotiation process for NICs.
• Physical medium attachment (PMA) sublayer – provides a medium-
independent means for the PCS to support various serial bit-oriented physical
media.
• Physical medium dependent (PMD) sublayer – maps the physical medium to
the PCS. It defines the physical layer signaling for various media.
A 1 Gbps LAN switch does not require a change to the MAC protocol, because full-duplex operation is available. When Gigabit Ethernet is used on shared LAN interconnecting devices, however, an enhancement to the basic CSMA/CD scheme is required. This takes two forms (Held & Jagannathan, 2004):
• carrier extension
• packet bursting.
Carrier extension represents a way of maintaining the IEEE 802.3 minimum and
maximum frame sizes plus decent cabling distances.
Figure 5.4 illustrates the Gigabit Ethernet frame format, including carrier extension. Carrier extension consists of nondata extension symbols appended as padding within the collision window to ensure a minimum frame length of 512 bytes. The entire padded frame is considered for collision detection, whereas only the original data, without the extension, are used for frame check sequence (FCS) error checking.
Figure 5.4 Gigabit Ethernet with Carrier Extension
Legend: SFD = Start of Frame Delimiter; FCS = Frame Check Sequence.
Note that carrier extension wastes bandwidth: many extension symbols may be transmitted per unit of meaningful information sent. Small packets may require up to 448 pad bytes.
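The 448-byte figure follows directly from the 512-byte collision window and the classic 64-byte minimum frame. A minimal sketch (names are ours, not from any standard):

```python
SLOT_BYTES = 512       # Gigabit Ethernet collision window (carrier extension)
MIN_FRAME_BYTES = 64   # classic IEEE 802.3 minimum frame size

def extension_bytes(frame_len: int) -> int:
    """Nondata extension symbols appended so that the carrier event
    spans at least the 512-byte collision window."""
    return max(0, SLOT_BYTES - frame_len)

# A minimum-size 64-byte frame needs the full 448 pad bytes noted above.
print(extension_bytes(MIN_FRAME_BYTES))  # 448
print(extension_bytes(512))              # 0
```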
With packet bursting, by contrast, a burst of packets is sent. The first of these is padded as before; subsequent packets are sent back-to-back without extensions, subject to the interframe gap and the burst time limit. This procedure substantially improves throughput.
Figure 5.5 shows how packet bursting works.
Figure 5.5 Gigabit Ethernet with Packet Bursting
5.3.4 10 Gigabit Ethernet
As the demand for high-speed networks continued to grow, a need for faster Ethernet
technology became apparent. In early 1999, the IEEE 802.3 committee chartered the
High Speed Study Group (HSSG) to standardize what has come to be known as 10
Gigabit Ethernet. Some of the objectives of the HSSG included:
• support for 10 G Ethernet at 2 to 3 times the cost of Gigabit Ethernet
• maintain earlier frame formats
• meet IEEE 802.3 functional requirements
• provide compatibility with IEEE 802.3 flows
• media independent interface
• full-duplex only
• speed independent MAC
• support for star LAN topologies
• support for existing and new cabling infrastructure.
Some of the benefits of 10 Gigabit Ethernet include a low-cost solution for enhanced
bandwidth and faster switching. Because there is no need for fragmentation,
reassembly or address translations, and switches are faster than routers, using 10
Gigabit as a backbone technology provides a mechanism to remove bottlenecks with
a scalable upgrade path.
Figure 5.6 10 Gigabit Ethernet Architecture
Figure 5.6 illustrates the architecture of 10 Gigabit Ethernet. Similar to other
versions of Ethernet, several layers are defined, some specific to the use of different
types of media. Let us examine some of those layers:
• MAC layer – provides a logical connection between its own MAC client and that of its peer station. Functions include initializing, controlling and managing the connection.
• Reconciliation sublayer – acts as a command translator. It maps MAC layer
terms and commands into the electrical formats for the physical layer.
• 10 Gigabit media independent interface (10GMII) – functions as the standard
interface between the MAC layer and the physical layer.
• PCS sublayer – codes and encodes data to and from the MAC sublayer. No
standard encoding scheme is defined for this layer.
• PMA sublayer – serializes code groups into bit stream suitable for serial bit-
oriented physical devices and vice versa.
• PMD sublayer – responsible for signal transmission. Amplification,
modulation and wave shaping functions are performed by this sublayer.
Different PMD devices support different media.
• Media dependent interface (MDI) sublayer – references a connector. The
sublayer defines different connector types that attach to different media.
• Physical layer architecture – there are two structures for the physical layer implementation of 10 Gigabit Ethernet:
o serial implementation
o parallel implementation.
The former uses one high-speed (10 Gbps) PCS/PMA/PMD circuit block, whereas the latter uses multiple blocks at lower speeds. Figure 5.7 depicts these two arrangements.
Currently, the preferred media adopted for 10 Gigabit Ethernet is optical fibre.
Because of its high data rate, it is doubtful that a copper version of the technology
can be developed.
Figure 5.7 10 Gigabit Ethernet Serial and Parallel Implementations
5.4 LAN ETHERNET DESIGN
The first step in the Ethernet LAN design process is to select a concentrating device, such as a wire hub, bridge, LAN switch, or router; all are available at speeds from 10 Mbps to 1 Gbps. The next step is to select a transmission medium (and encoding scheme) appropriate to the chosen speed, from 10 to 1000 Mbps.
Let us first consider a wire hub and router model, as illustrated in Figure 5.8 where
an I/O port on the wire hub enables an extended 10BASE-T network, basically a star
topology (extended).
Figure 5.8 Traditional Hub and Router Campus Network
What are the advantages of 10BASE-T? In comparison to coaxial cable, it is less expensive and more flexible; there is an extensive installed base of spare wiring; and, being point-to-point, any breakage impacts only one user.
When using 10BASE-T, connectivity to other types of IEEE 802.3 LANs, such as
10BASE-5 and 10BASE-2, occurs via AUI on the hub. There is one AUI port per
wire hub, which is used to connect to other IEEE 802.3 networks. Figure 5.9 depicts
the interconnectivity between 10BASE-T and 10BASE-5 networks.
Figure 5.9 Interconnecting 10Base-T and 10Base-5 Networks
The next step in designing LANs is depicted in Figure 5.10, wherein Layer 2
switching is used in the core, distribution and access layers. There are four
workgroups attached to the access layer switches. Router X connects to all four
virtual LANs (VLANs). Layer 3 switching and services are concentrated in Router X
as well. Enterprise servers are connected logically to Router X. Router X is typically
referred to as a “Router on a Stick”, serving many VLAN connections.
Figure 5.10 Campuswide VLAN Design
5.4.1 Campuswide VLANs with Multilayer Switching (Cisco Systems, 2000)
This type of networking structure makes it possible for configured stations to
relocate to a different floor or even a different building, e.g., a mobile user plugs a
laptop into a different LAN port in a different building. Such a situation is typically
handled by the use of a VLAN Trunking Protocol (VTP) and is illustrated in Figure
5.11.
Figure 5.11 Multilayer Switching
We can now see how a computer on a coaxial-cable-based network connects to a wire hub. On a coaxial network, the NIC is connected via a transceiver cable to the transceiver on the LAN. When a coaxial-cable NIC is instead attached to a wire hub, an AUI adaptor interfaces the NIC with the UTP cable that connects to the wire hub. Figure 5.12 illustrates the use of an AUI adaptor so that such a NIC can be connected to a wire hub port using UTP cable.
Figure 5.12 Connecting a Coaxial Cable NIC to a Wire Hub
5.5 SWITCHES REVISITED (Perlman, 2001)
Switches are a fundamental aspect of most networks. They allow source and target nodes to communicate over a network at full rate, without slowing each other down, by supporting multiple simultaneous connections. Switches have been discussed in detail elsewhere and are revisited here briefly for the sake of continuity.
The following problems were observed with a hub-based network configuration.
(Zhang, 1989) provides a historical perspective.
5.5.1 Scalability, Latency, Global Effect of Failures and Collisions
By adopting the schematic configuration in Figure 5.13, it becomes possible to
alleviate some of the problems associated with a hub-based network configuration.
Switches alleviate many hub-based problems by dedicating bandwidth to individual
unicast communication paths or connections. So, if there are N ports on the switch,
with each connection at 10 Mbps, the switch itself delivers N/2 x 10 Mbps to the
configuration.
Figure 5.13 Hubs and Switches
5.5.2 Encoding Schemes
In concluding this chapter, we briefly review LAN encoding schemes, as they determine how a given data rate can be achieved on a particular medium that supports a given signaling rate. Various common LAN encoding schemes are described below in order of increasing complexity.
5.5.2.1 Nonreturn to Zero Level
NRZ-L is perhaps the simplest encoding method (see Figure 5.14). Under this scheme, a high voltage indicates a one bit, and the absence of voltage indicates a zero bit. Because two or more successive set (or nonset) bit positions require external clocking to be differentiated, this coding would be expensive to use on a LAN.
5.5.2.2 Nonreturn to Zero Invert on 1s
Under an NRZ-I encoding scheme, one maintains a constant voltage pulse per bit
time. Data is encoded as the presence or absence of a transition at the beginning of a
bit time. A transition at bit start signifies a binary 1. No transition is a binary 0. This
is a case of differential coding. Its main benefit is that it may be more reliable to
detect a transition in the presence of noise than to compare voltage threshold values.
Another consideration is that it is easy to lose the polarity of the signal. Because of the problems with NRZ - in particular the loss of synchronization across long runs of identical bits, which leads to a drift in timing and ensuing corruption of the signal - other coding schemes were introduced.
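The NRZ-I rule above - a transition at the start of a bit time for a 1, no transition for a 0 - can be sketched in a few lines. This is an illustrative encoder of ours (not from any library), with line levels represented as 0/1:

```python
def nrzi_encode(bits, start_level=0):
    """NRZ-I (differential): invert the line level at the start of a
    bit time to encode a binary 1; hold the level to encode a 0."""
    level, out = start_level, []
    for b in bits:
        if b == 1:
            level ^= 1  # a transition encodes a binary 1
        out.append(level)
    return out

print(nrzi_encode([1, 0, 1, 1, 0]))  # [1, 1, 0, 1, 1]
```

Note that a long run of 0s produces no transitions at all, which is exactly the synchronization weakness discussed above.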
5.5.2.3 Manchester
In Manchester coding there is a transition at the middle of each bit period. A high-to-
low transition is a binary 0, whereas a low-to-high transition is coded as a binary 1.
Notice that this coding method provides a self-clocking function: every bit period contains a transition, so individual bits within runs of set or nonset bits can be distinguished.
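A sketch of the Manchester rule just described, using the same 0/1 level representation as above (per the convention in the text, low-to-high encodes a binary 1; each bit becomes two half-bit levels):

```python
def manchester_encode(bits):
    """Manchester: every bit period has a mid-bit transition.
    Low-to-high encodes a binary 1; high-to-low encodes a binary 0."""
    out = []
    for b in bits:
        out += [0, 1] if b == 1 else [1, 0]
    return out

print(manchester_encode([1, 0, 1]))  # [0, 1, 1, 0, 0, 1]
```

Because every bit period contains a transition, the output is self-clocking regardless of the input pattern, at the cost of doubling the signaling rate.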
5.5.2.4 Differential Manchester
Under Differential Manchester encoding, the mid-bit transition is used only for clocking. The presence of a transition at the beginning of a bit period means a binary 0; no transition at the beginning of a bit period means a binary 1.
Figure 5.14 Some Basic Encoding Schemes
Both Manchester and Differential Manchester coding techniques are popular for LANs. They are sometimes called biphase codes, because there may be as many as two transitions per bit time. The maximum modulation rate is thus twice that of NRZ, which in turn requires more bandwidth.
The advantages of Manchester and Differential Manchester coding include synchronization (based on the transitions), no DC component, and a built-in error detection capability, because noise would have to invert the signal both before and after a transition, which is unlikely.
5.5.2.5 4B/5B-NRZ-I
Under 4B/5B, data are encoded four bits at a time, with every four bits translating into five code bits; the efficiency is thus 80 percent. For synchronization, each code bit is then treated as a binary value and further encoded with NRZ-I. This scheme lends itself to optical fibre transmission.
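The mapping and the 80 percent efficiency can be illustrated with a few entries of the code table as commonly tabulated for FDDI/100BASE-X (the partial table below is illustrative, not the full standard, and the function name is ours):

```python
# A few entries of the 4B/5B code table: each 4 data bits map to a
# 5-bit code group chosen to guarantee enough transitions for the
# subsequent NRZ-I stage to stay synchronized.
FOUR_B_FIVE_B = {
    '0000': '11110', '0001': '01001', '0010': '10100', '0011': '10101',
    '0100': '01010', '0101': '01011', '0110': '01110', '0111': '01111',
}

def encode_4b5b(bits: str) -> str:
    """Encode a bit string (length a multiple of 4) nibble by nibble."""
    return ''.join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))

coded = encode_4b5b('00000001')
print(coded)                        # '1111001001'
print(len('00000001') / len(coded)) # 0.8, i.e. the 80 percent efficiency
```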
5.5.2.6 MLT-3
Under MLT-3, three signal levels are used:
• a positive (+ve) voltage
• no voltage
• a negative (-ve) voltage.
The steps involved are:
• If the next input bit is 0, the next output value is unchanged.
• If the next input bit is 1, there is a transition:
o if the preceding output was +ve or -ve, the next output is 0
o if the preceding output was 0, the next output is nonzero, and opposite in sign to the last nonzero output.
Note that this in turn implies that if the signaling rate is one-third of the operating rate, a baud rate of 33.33 MHz will support a LAN at 100 Mbps.
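The transition rules above can be captured in a short illustrative encoder (ours, not a library routine; output levels are represented as -1, 0 and +1):

```python
def mlt3_encode(bits, start=0, last_nonzero=-1):
    """MLT-3 per the steps above: hold the level on a 0; on a 1, step
    through the cycle ... +1, 0, -1, 0, +1 ... (go to 0 after a nonzero
    level, otherwise to the opposite sign of the last nonzero level)."""
    level, out = start, []
    for b in bits:
        if b == 1:
            if level != 0:
                last_nonzero, level = level, 0
            else:
                level = -last_nonzero
        out.append(level)
    return out

# A run of five 1s walks through the full cycle; a run of 0s holds level.
print(mlt3_encode([1, 1, 1, 1, 1]))  # [1, 0, -1, 0, 1]
print(mlt3_encode([0, 0, 0]))        # [0, 0, 0]
```

Note how four consecutive 1s are needed for one full signal cycle, which is why MLT-3 concentrates the signal energy at a small fraction of the bit rate.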
5.5.2.7 8B/10B
8B/10B encoding is popularly used in Fibre Channel and Gigabit Ethernet. Under this technique, each eight bits of data translates into ten bits of output. 8B/10B was developed and patented by IBM; it is more powerful than 4B/5B in terms of transmission features and error detection. Figure 5.15 illustrates an example of the generic mB/nB encoding, mapping m source bits into n output bits.
Figure 5.15 8B/ 10B Encoding
Note the use of a functionality called disparity control, which addresses an excess of 0s over 1s, or vice versa. An excess in either direction is called a disparity. If one exists, the disparity control block complements the 10-bit block to redress the problem.
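A much-simplified sketch of the complementing idea just described follows. This is not the actual 8B/10B algorithm (which selects between precomputed alternate code groups from a fixed table), and the bit pattern in the example is illustrative only, not a valid 8B/10B code group:

```python
def select_code_group(block: str, running_disparity: int = -1):
    """Simplified disparity control: if a 10-bit block's 1s/0s imbalance
    would push the running disparity further from zero, transmit the
    complemented block instead, keeping the line DC-balanced."""
    d = block.count('1') - block.count('0')
    if d != 0 and (d > 0) == (running_disparity > 0):
        block = ''.join('1' if c == '0' else '0' for c in block)
        d = -d
    return block, running_disparity + d

# A 1s-heavy block sent after positive running disparity is complemented.
print(select_code_group('1110101100', running_disparity=+1))
# After negative running disparity the same block passes unchanged.
print(select_code_group('1110101100', running_disparity=-1))
```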
6. BLACK BOX CONGESTION CONTROL

6.1 THE BASIC PROBLEM
In one picture of the Internet, we see a constellation of switching elements (a.k.a.
routers/switches/ISs) [IS = Intermediate Systems]. So there are a number of
potentially different paths from one ingress host to another egress target host. This is
the key notion to appreciate: a host at one end of the Internet communicating with
another host on the “other side” of the Internet via packets of information sent across
links connecting a sequence of routers. “Flow control” is the consideration that a
very fast or “aggressive” ingress end-host should not overwhelm a lower-capacity
egress end-host which would then be forced to drop fast arriving packets which it
cannot process at their incoming speeds. This then is flow control, an end-host to
end-host consideration. “Congestion control”, by contrast, brings to bear a slightly different consideration: can the ingress host overwhelm the network itself (the Internet)? Can a host inject packets into the network so fast that in-between routers are unable to cope with the volume of incoming packets, and so are forced to drop some? This is “congestion”, and we need provisions to deal with it. As will be seen, there are two
aspects to this problem (Clark & Fang, 1998; Clark & Tennenhouse, 1988; Clark,
1988; Anjum & Tassiulas, 1999; Bajko et al. 1999):
• Given a pair of end-hosts in communication, regardless of the innards of the
routers in the network, how is it possible to optimize network usage? This
may be called the Black Box approach to congestion management. Examples
are TCP Tahoe, TCP Reno and TCP Vegas. We also introduce a novel Black
Box mechanism, tentatively called “Sierra”. Black Box congestion control is
the subject of this chapter (Balakrishnan et al. 1999; Jain, 1990).
• The related Inter-Network counterpart, White Box modeling, then attempts to
optimize the innate workings of the (sequence of) routers (switching
elements) that connect the communicating hosts. Examples are FIFO, Tail
Drop, and the RED family (Charny et al. 1995) of routers. This White Box
approach is out of scope for this thesis. (Bennett & Zhang, 1996; Athuraliya
et al. 2001)
We observe here that we are in the “Unicast” situation, where one end-host is
communicating with one other end-host across the Internet. In our arrangement, one
first optimizes the network usage, then proceeds to optimize the network itself
(internally). Black Box followed by White Box. By this method, one gets an optimal
solution to the congestion management problem itself. The terminology of “Black Box” and “White Box” models was first introduced by van Jacobson (1988), one of the pioneering figures in the field of congestion control, but it would appear that his overtures have not been generally followed through to a logical conclusion in the literature since (Clark et al. 1988; Benmohammed & Meerkov, 1993; Morris, 1999 provide a general treatment).
6.2 THE BLACK BOX APPROACH DESCRIBED
Fundamental to all the known Black Box models (except perhaps TFRC [TCP
Friendly Rate Control] and Sierra) is the notion of a congestion “window”. The
window is believed to encapsulate the innate “capacity” of the network to deal with
incoming packet flows between pairs (Unicast) of communicating end-hosts. Realize
that the congestion window is a key design concept for most Black Box models. For
a general discussion see (Tanenbaum, 2003; Stevens, 1997).
The congestion window value is based on the so-called Delay & Bandwidth product,
i.e.,
Window = Delay x Bandwidth
In this equation, Delay is the time (RTT) taken for packets to be delivered and acked
from ingress to egress. And Bandwidth is the number of packets that the network can
deliver per unit time to the egress. If one sent a single packet and waited for an ack, one could send at most one packet per RTT. However, if one discharged a window's worth of packets, one can send that many packets per RTT (Cruz, 1987; Keshav & Morgan, 1997; Kalambi et al. 2002; Morris, 1997). Also see (Wang, 1999; Wang & Crowcroft, 1991) for an interesting discussion.
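The Window = Delay x Bandwidth computation is straightforward. The sketch below uses hypothetical numbers of our own choosing, simply to make the units concrete:

```python
def congestion_window_packets(rtt_s: float, bandwidth_pps: float) -> float:
    """Window = Delay x Bandwidth: the number of packets that must be
    'in flight' to keep the path full for one round-trip time."""
    return rtt_s * bandwidth_pps

# A hypothetical path: 100 ms RTT, delivering 1250 packets/s
# (roughly 10 Mbps of 1000-byte packets) is kept full by 125 packets.
print(congestion_window_packets(0.1, 1250))  # 125.0
```

A window smaller than this leaves the path idle part of the time; a larger one queues packets inside the network, which is precisely what congestion control tries to avoid.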
But it remains to be seen if this is the best way to do the Black Box model, i.e., in
terms of the macro notions of Delay and Bandwidth. Most of the well known
congestion control protocols in existence are window-based and assume that Delay
and Bandwidth are key descriptors of network flow. In the sequel, as promised, we
will look at some alternatives to window-based congestion control, most notably:
• TFRC (TCP Friendly Rate Control) (Mahdavi & Floyd, 1997), and
• Sierra (introduced in Jagannathan & Matawie, 2005).
6.3 TCP TAHOE AND RENO
TCP Tahoe and Reno were introduced by van Jacobson as window-based, end-to-end congestion management schemes.
Tahoe operates in three stages: slow-start, congestion avoidance and timeout. In slow-start, the window increases by one unit for every acknowledgement received, so the window effectively doubles (multiplicatively) each round-trip time. When TCP Tahoe starts up, it enters slow-start and stays there till a loss occurs (under the presumption that a loss is almost always caused by congestion overflow at in-between router buffers). [This loss is detected only after a timeout while waiting for an acknowledgement.] When loss is detected, TCP Tahoe sets Ssthresh (the slow-start threshold) to cwnd/2 and then sets cwnd to 1. After a timeout the system remains in slow-start till cwnd reaches Ssthresh, at which instant the system enters congestion avoidance. In congestion avoidance TCP slowly, additively, increases the window size with each ack as follows:
cwnd ← cwnd + 1/cwnd
This has the effect that, after receiving a full window of acks, the window has
increased by exactly 1. This is additive increase. Note that on the other hand there is
multiplicative decrease in window size, because Ssthresh decreases multiplicatively
from cwnd. This process is sometimes called AIMD [Additive Increase &
Multiplicative Decrease] (Chiu & Jain, 1989; Vojnovic et al. 2002).
If a retransmitted packet is lost, Tahoe calls up the exponential retransmit timer
backoff algorithm. This algorithm dictates that with each successive retransmission
of a packet, TCP should double its timeout value. This procedure allows TCP flows
to send at far less than 1 packet per RTT, so that many flows are able to share a
bottleneck without loss of stability. When a packet is finally transmitted successfully,
TCP Tahoe returns to slow-start with cwnd = 1.
SS:
cwnd += 1 per ack, and so
cwnd += cwnd per window of acks
CA:
cwnd += 1/cwnd per ack, and so
cwnd += 1 per window of acks
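The per-window rules above yield the familiar cwnd trajectory. The following loss-free sketch (function name ours) steps the window once per RTT: doubling while below Ssthresh, then adding one segment per round:

```python
def tahoe_window_trace(rounds: int, ssthresh: float, cwnd: float = 1.0):
    """Per-RTT evolution of the window: multiplicative growth in
    slow-start (SS) up to ssthresh, then additive growth in congestion
    avoidance (CA). Losses and timeouts are not modelled here."""
    trace = [cwnd]
    for _ in range(rounds):
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        trace.append(cwnd)
    return trace

print(tahoe_window_trace(6, ssthresh=16))  # [1, 2, 4, 8, 16, 17, 18]
```

The sharp change of slope at Ssthresh is the SS-to-CA transition described above.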
One problem with TCP Tahoe is that it regresses too much in response to an isolated (single-packet) loss. Such a loss may indicate transient burstiness rather than persistent overload, so a less stringent backoff may be posited.
van Jacobson fixed this problem rather quickly by introducing TCP Reno with its
two new procedures:
• Fast retransmit
• Fast recovery.
Continuing our discussion of AIMD above, consider the following alternative behaviour (Floyd et al. 1999) for controlling congestion, introduced, as noted, because Tahoe is too stringent after timeouts.
Duplicate Acks (discussed below in the section on Acking and Ack clocking) are sent that mention only the last correctly received segment. When 3 duplicate Acks <k> are received, this is regarded as a symptom of congestion due to the non-receipt of packet <k+1>, which is then retransmitted. The receiver, when it receives <k+1>, sends a cumulative Ack acknowledging all packets received after <k+1>, which then do not have to be re-transmitted. This behaviour is called “Fast Retransmit”. In Reno, Fast Retransmit is coupled with another related behaviour, “Fast Recovery”, that
overcomes Tahoe’s harsh penalty of dropping cwnd to 1 after congestion is detected.
This behaviour works as follows, once congestion is detected:
Ssthresh = cwnd / 2
cwnd = Ssthresh (Thus re-enter CA, not SS).
This is Fast Recovery.
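Fast Retransmit and Fast Recovery together can be summarized in one small sketch. This is a simplification of ours (real Reno also inflates cwnd by one segment for each additional duplicate Ack while in recovery, which we omit):

```python
DUP_ACK_THRESHOLD = 3  # third duplicate Ack triggers fast retransmit

def on_duplicate_ack(dup_count: int, cwnd: float, ssthresh: float):
    """On the third duplicate Ack: retransmit the presumed-lost segment
    (fast retransmit), halve the window and resume in congestion
    avoidance (fast recovery), rather than Tahoe's reset to cwnd = 1."""
    retransmit = False
    if dup_count >= DUP_ACK_THRESHOLD:
        retransmit = True
        ssthresh = cwnd / 2
        cwnd = ssthresh  # re-enter CA, not SS
    return retransmit, cwnd, ssthresh

print(on_duplicate_ack(3, cwnd=32, ssthresh=64))  # (True, 16.0, 16.0)
print(on_duplicate_ack(2, cwnd=32, ssthresh=64))  # (False, 32, 64)
```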
A further discussion of these concepts may be found in (Stevens, 1997). One
problem with Reno is manifest when multiple packets are lost within a single
window (despite fast retransmit/recovery). One solution to this, New Reno, was
offered in (Floyd & Henderson, 1999), and is discussed below after Ack / Ack
clocking.
6.4 ACK’ING AND ACK CLOCKING
Recall that the congestion window cwnd determines how many packets can be
launched into the network, pending acknowledgement. Packets launched into the
network are qualified by the following key settings in the TCP header:
• Sequence #
• Ack #.
Each segment of data sent into the network has a certain length and consists of a
certain number of bytes. Each byte has a “sequence #”, with the seq # of a segment
being the seq # of the first byte of that segment. In other words
Seq # <2> = Seq # <1> + Len [seg <1> ]
Seq # <k> = Seq # <k-1> + Len [seg<k-1>]
If IP fragments a segment, it places the same, segment specific < IDENTIFICATION
# > for all packets in that segment. This enables the egress node to successfully
reassemble fragmented segments (Floyd & Fall, 1998; Floyd & Jacobson, 1991).
At the destination, the receiver acknowledges received segments. Here there are two
mechanisms.
Assume that the source TCP has sent out the following segments with their
associated Seq #’s :
<Seqk> , k = 1,2,3,4, etc.
Suppose that segment <k> is “lost” en route due to congestion. Then there are two behaviours possible at the egress TCP. Every successfully re-assembled segment is
acknowledged by an Ack # bearing its Seq #. If a segment is not fully received and
assembled, the target is silent, and eventually the sender times out and re-transmits
the lost segment(s). For a leisurely discussion of TCP/IP as relating to our problem,
see (Held, 2002; Blake et al. 1998).
Consider the following scenario. Segment <k> is lost, whereas segments <k+1> and
<k+2> arrive intact. The latter two are not acknowledged by the receiver and so the
sender TCP does not know that they have been delivered. After segment <k> times
out it is re-transmitted. So segments <k>, <k+1> and <k+2> arrive out of order at the
destination. The next Ack from the egress then specifies cumulatively that up to
segment <k+2> have been received, so (unless it has timed out <k+1> and <k+2> as
well) the sender is aware that they have been delivered.
There is yet another variant here: why not Ack out-of-order segments as well, instead of being silent when an earlier segment is not fully received? This is partly because of the TCP header, which has provision only for a PAR (positive acknowledgement with retransmission), i.e., if Ack # = N, all segments up to and including N have been successfully received. TCP can thus handle only one problematic segment per window. If there is more than one problematic segment per window, TCP's RTT timer value backs off exponentially.
This simple observation leads to two mechanisms referred to in the literature as
SACK (Selective ACK) and D-SACK (Duplicate SACK). SACK notes which
segments have been successfully received (instead of silence when an earlier
segment is in error), and therefore which were not received, so that the sender is able
to proactively determine and transmit precisely the missing segments, rather than
having to wait for timeouts and re-transmitting the entire window again. Having
motivated these concepts, SACK and D-SACK are treated in more detail in the
sequel.
6.5 TCP NEW RENO
Timeouts affect TCP performance in two ways. Firstly, a flow has to wait for the timeout to occur and cannot send data in that period. Secondly, after a retransmission timeout occurs, cwnd goes back down to 1. These facts can adversely affect the flow's throughput and performance (Floyd, 1999; Floyd & Henderson, 1999).
In Reno, partial Acks cause an exit from Fast Recovery, which results in a timeout in the case of multiple segment losses. In New Reno, when a partial Ack is received at a sender, it does not come out of Fast Recovery. The presumption instead is that the segment immediately after the most recently acked one was lost, and hence the lost segment is re-transmitted. Therefore, when multiple segments are lost, New Reno does not wait for a retransmission timeout and continues to re-transmit lost segments every time a partial Ack is received. In New Reno, then, Fast Recovery starts when 3 dup-Acks are received and terminates either when a re-transmission timeout occurs or when an Ack arrives acknowledging all the data outstanding when Fast Recovery began. Partial Acks deflate the congestion window by the volume of newly acknowledged data, then add one segment and re-enter fast recovery.
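New Reno's partial-Ack handling, as just described, amounts to the following sketch (ours; the window is expressed in units of segments for simplicity):

```python
def on_partial_ack(cwnd: float, newly_acked_segments: int):
    """New Reno on a partial Ack: deflate cwnd by the amount of newly
    acknowledged data, add back one segment, retransmit the
    presumed-lost next segment, and remain in fast recovery."""
    cwnd = cwnd - newly_acked_segments + 1
    retransmit_next = True  # segment after the acked one is presumed lost
    return cwnd, retransmit_next

# A partial Ack covering 4 segments deflates a 20-segment window to 17.
print(on_partial_ack(20, newly_acked_segments=4))  # (17, True)
```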
6.6 SACK AND D-SACK
The philosophy behind SACK is to transcend TCP's limitation of handling at most one problematic segment per window of data. What SACK does is have the receiver use some of the “Options” fields in the TCP header to “selectively” lay out to the sender exactly which segments have arrived successfully (and so which have not), so that the sender can then send precisely those segments.
There have been a number of protocols using the SACK idea. However, we believe that SACK has not been widely standardized in the Internet, probably because of a lack of consensus on how the Options field is to be utilized. Also not universally agreed is how re-transmission is avoided for SACK'ed segments in the sender's re-transmission queues (Floyd et al. 2000). Also see (Jacobson & Braden, 1988) in this connection.
As with other end-to-end mechanisms, the SACK method requires both end-points to
concur that SACK is being used, i.e., both ends are SACK compatible. For one
possible implementation of SACK, see (Mathis et al. 1996).
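The essential gain from SACK - computing exactly which outstanding segments to retransmit - can be sketched as follows. Representing SACK blocks as (first, last) segment-number ranges is our simplification (real SACK options carry byte-sequence ranges):

```python
def missing_segments(highest_sent: int, sack_blocks):
    """Given SACK information -- here, inclusive (first, last) ranges of
    segment numbers the receiver reports as received -- the sender can
    determine precisely which segments to retransmit."""
    received = set()
    for first, last in sack_blocks:
        received.update(range(first, last + 1))
    return [k for k in range(1, highest_sent + 1) if k not in received]

# Receiver SACKs segments 1-2 and 4-6 out of 1..7: 3 and 7 are missing.
print(missing_segments(7, [(1, 2), (4, 6)]))  # [3, 7]
```

Without SACK, the sender would learn only the cumulative Ack (here, 2) and might retransmit segments 4-6 needlessly after a timeout.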
D-SACK (Duplicate-SACK) combines SACK information from the receiver with
additional data acknowledging receipt of duplicate segments, which (the duplicate
segments) are thus identified for the sender. Then the sender (at most once per
window) takes corrective action to remediate the re-transmission by “un-halving”
Ssthresh, the Slow Start threshold, and reverting to either Slow Start or Fast
Recovery. D-SACK can be useful in environments with persistent re-ordering of
packets. D-SACK is also discussed in (Mathis et al. 1996).
6.7 FACK
Forward Acknowledgements (FACK) also attempt to solve the problem of better
recovery from multiple losses. FACK derives its name from the protocol keeping
track of the highest sequence number for correctly received data. (Mathis &
Mahdavi, 1996)
With FACK, TCP registers two more variables:
• F_ack : for the most forward segment acknowledged by the receiver using
SACK
• Re_data : for the total amount of outstanding, re-sent data in the network.
With these variables, it is possible to calculate the amount of outstanding data during
recovery as
F_ack - Re_data
FACK TCP moderates this value (total outstanding data in network) within one
segment of the congestion window cwnd. The latter itself is constant during fast
recovery. Also the F_ack variable is used to more readily trigger fast retransmit
(Floyd & Jacobson, 1991).
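The transmit decision during recovery, under the formulation above, can be sketched as follows. This is a hedged illustration: the variable names (`mss` and the rest) are assumptions, and a real FACK implementation derives the outstanding data from snd.nxt, snd.fack and the retransmitted data.

```python
def fack_outstanding(f_ack, re_data):
    """Outstanding data during recovery, per the formulation above:
    most-forward SACKed point minus total re-sent data in the network."""
    return f_ack - re_data

def may_transmit(f_ack, re_data, cwnd, mss):
    # FACK keeps the outstanding data within one segment (mss) of cwnd,
    # which itself stays constant during fast recovery.
    return fack_outstanding(f_ack, re_data) < cwnd + mss
```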
6.8 LIMITED TRANSMIT
It has been observed in many places in the literature that in the archetypal scenario of
the Internet:
• 56% of all retransmissions are due to timeouts, and
• only 44% are due to triple duplicates.
This observation, coupled with the fact that timeouts are so expensive, has led a
number of researchers to propose the Limited Transmit (LT) (Allman et al. 2000)
mechanism.
LT works very simply, by allowing a sender to send new segments after each of 2
duplicate Acks, instead of waiting for 3 duplicate Acks.
By this method, it is noted that:
• over 25% of the above (56% timeouts) can be avoided.
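The Limited Transmit rule itself is tiny. The sketch below (with assumed bookkeeping names, not taken from any particular implementation) shows the decision made on each duplicate Ack:

```python
def on_duplicate_ack(dupacks, flightsize, rwnd):
    """Limited Transmit decision on receipt of one duplicate Ack.
    On the first and second duplicates, send a previously unsent
    segment (if the receiver window permits) without touching cwnd;
    the third duplicate still triggers fast retransmit."""
    dupacks += 1
    if dupacks >= 3:
        return dupacks, "fast_retransmit"
    if dupacks in (1, 2) and flightsize + 1 <= rwnd:
        return dupacks, "send_new_segment"
    return dupacks, "wait"
```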
6.9 TCP VEGAS
The Vegas algorithm is also a window-based, TCP Protocol (Brakmo & Peterson,
1995). It operates in a manner using variation in round trip delays to sense and
counter congestion in the Internet. The basic paradigm is to increase the congestion
window size when delays decrease, and to decrease it when delays rise. Vegas is
further developed in (Low et al. 2001; Mo et al. 1999).
The algorithm works as follows:
• Set two control (design) parameters, α and β.
• Calculate the expected throughput, E, where
E = [Current window size] / [BaseRTT]
where BaseRTT is the smallest RTT (Round Trip Time) seen by the source
up to time t. This is estimated to be the real propagation delay.
• Calculate the Actual throughput, A, where
A = [ # acks received ] / RTT
[by sending a distinguished control packet, and computing RTT as the difference
between the Ack reception time and the transmission time for the distinguished
control packet]
During this RTT, count the total number of Acks received.
• Use the control equations
o α ≤ E – A ≤ β ⇒ Leave Window unchanged
o α > E – A ⇒ W = W + 1
o β < E – A ⇒ W = W – 1
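One Vegas congestion-avoidance step, following the control equations above, can be sketched as follows (window measured in segments; `acks_in_rtt` is the Ack count taken over the distinguished RTT):

```python
def vegas_update(cwnd, base_rtt, rtt, acks_in_rtt, alpha, beta):
    """One TCP Vegas window update, a sketch of the control equations."""
    expected = cwnd / base_rtt      # E: expected throughput
    actual = acks_in_rtt / rtt      # A: actual throughput
    diff = expected - actual
    if diff < alpha:                # alpha > E - A : grow the window
        cwnd += 1
    elif diff > beta:               # beta < E - A : shrink the window
        cwnd -= 1
    return cwnd                     # otherwise leave it unchanged
```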
Whereas it is difficult to directly interpret the dynamics of TCP Vegas in terms of the
more traditional congestion control algorithms presented earlier, we later (Chapter 5)
offer a simple calculation of a lower bound to the throughput offered by TCP Vegas.
We then compare this with the throughputs offered by Tahoe/Reno and Sierra. Also
in the simulation exercise presented in the appendix, we compare, both individually
and jointly, the throughputs offered by all the three major congestion control
algorithms considered in this thesis, i.e., Reno, Vegas and Sierra. We then are able to
compare the relative merits (in terms of throughput) of the algorithms (Ahn et al.
1995; Boutremans & Le Boudec, 2000).
6.10 SIERRA
Sierra was alluded to in (Jagannathan & Matawie, 2004). Here we present more
details.
Sierra uses three basic parameters to optimize network usage:
• RTTs (Round Trip Times)
• Ro,α (T) (Rate, o = egress/output, α = connection)
• Ri,α (T) (Rate, i = ingress/input, α = connection).
Unlike in earlier analyses, we do not perform any exponential smoothing of these
parameters. The key contention is:
• Neither Timeouts, nor Duplicate Acks, nor RTT fluctuations are ideal descriptors
of network congestion.
• Use instead RTT measurements coupled with information about ingress/egress
rates to monitor network congestion.
One reason why Timeouts, D-Acks and RTT+/- are not used is because there can be
potential re-routing within the Net, causing these parameters to fluctuate.
So, how do we use RTT, Ro,α (T), and Ri,α (T) to achieve improvement in
performance?
The basic Sierra algorithm can be summarized in a simple manner:
Ro,α (T + RTT(T)/2) < Ri,α (T) ⇒ Ri,α (T + ∆T) = Ri,α (T) - ∆
Ro,α (T + RTT(T)/2) ≥ Ri,α (T) ⇒ Ri,α (T + ∆T) = Ri,α (T) + ∆
The key issues are therefore:
Q: What is the granularity of time intervals?
A: This is dictated by the averaged out Round Trip Times as measured at the egress.
Q: What is the granularity of Packet Sizes?
A: This is dictated by the maximal screen window that we wish to convey.
With Sierra there is no need to acknowledge every packet, as with conventional TCP
congestion control algorithms. Specifically
Ro,α (T + RTT(T)/2) > Ri,α (T) ⇒ No losses or congestion.
Sierra’s policy regarding Acks is to use the Ack space within the TCP header to carry
information about:
• RTT(T)
• Ro,α (T) and
• Ri,α (T).
In a nutshell, here is how Sierra works.
At T = To the ingress sends R packets, each timestamped with the sending epoch, i.e.,
1,2,3…
Shortly afterwards, this window arrives at the egress, where the following are
computed using an averaging process:
• Ro,α (To)
• RTT (0)
These are averaged out within a window of ∆o, which is the maximum jitter we are
able to tolerate. These two parameters are sent back to the ingress, using several
control packets (to guard against loss of control information).
At the ingress, the following computations are performed:
• T1 = T0 + RTT(0)
• Ri (T1) = Ri (T0) +/- ∆
According as
Ro (T0) >/< Ri (T0) [where >/< means greater or lesser than]
The process continues as shown in the table below, with
• Tk = Tk-1 + RTT(k-1)
• Ri (Tk) = Ri (Tk-1) +/- ∆
According as
Ro (Tk-1) >/< Ri (Tk-1) [ where >/< means greater or lesser than]
As the process continues, the following questions arise:
1. How to determine Ro (Tk) and RTT(k)? Our suggested averaging process is more
robust than using raw RTT fluctuations, which, as mentioned, are subject to
re-routing variations.
2. What if control information is lost? How much redundancy to be built in?
3. What about Sierra’s “slow” starting?
4. What is ∆ ?
These are design questions that will be tackled later, using simulation variations.
What QoS guarantees can Sierra commit to communicating ESs?
Firstly, from a theoretical perspective, as mentioned, maximally efficient network
utilization. Given a TCP connection, it is not possible to use the Black Box network
more efficiently. Sierra starts with a slow start like mechanism, dubbed “Sierra
Quick Start” (SQS) to attain near optimal ingress rates. After that it’s a question of
using the control equations to maintain the sending rate at a high level of stability.
Here is the Sierra algorithm in detail:
SCAM (Sierra Congestion Avoidance Method)
1. Send Ro Bytes around I = (To - ∆o , To + ∆o).
2. Mark 2N of these as control packets associated with Epoch 0. These are to be
spread evenly around I. We assume that N of these will be lost. So the egress
waits till N of these are received.
3. At the egress, using N control packets, compute RTTo.
4. Also at the egress, compute Ro (0) by summing over Epoch 0 packets over RTTo
+/- ∆o.
5. Send this information, again using 2N control packets, back to the ingress.
6. At ingress, using N control packets (N lost), compute RTTi .
7. Set RTT = RTTo + RTTi.
8. Apply the control equation for future epochs.
9. Note that key Sierra design parameters are
a. What is Ro ?
b. What is ∆o ?
c. What is ∆ ?
d. What is N ?
We suggest, in the first instance,
a. Ro = 1.5 MBps
b. ∆o = 5 ms
c. ∆ = 10 KB
d. N = 5.
N.B. The SCAM algorithm itself builds into “Sierra Quick Start (SQS)” which itself
works as follows:
SQS (Sierra Quick Start)
• Identify the optimal level of service desired from the network. Call this Ro
(bytes/sec).
• Set Ri (0) = Ro.
• Launch Ri (k = 0) packets within a ∆o window.
• At egress, as in SCAM, compute Ro (k = 0) as well as RTTo (k = 0). Send this
information back to the ingress.
• At ingress, compute RTTi (k = 0). Set RTT(0) = RTTi + RTTo.
• If Ri (0) > Ro (0), then Ri (1) = Ri (0) / 2.
• By induction,
o If Ri (k) > Ro (k) then Ri (k+1) = Ri (k) / 2.
• When Ri (n) ≤ Ro (n)
o exit SQS
o proceed to SCAM.
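Putting SQS and SCAM together, the rate dynamics reduce to a halving phase followed by additive ±∆ steps. The sketch below abstracts the egress feedback loop into a callable `r_out_measured`, which is an assumption for illustration:

```python
def sqs(r_in, r_out_measured):
    """Sierra Quick Start: halve the ingress rate until it no longer
    exceeds the measured egress rate, then hand over to SCAM."""
    while r_in > r_out_measured(r_in):
        r_in /= 2
    return r_in

def scam_step(r_in, r_out, delta):
    """One SCAM epoch: additive increase when the egress keeps up,
    additive decrease when it does not."""
    return r_in + delta if r_out >= r_in else r_in - delta
```

A toy bottleneck of capacity C can be modelled as `lambda r: min(r, C)`; SQS then converges to the first halved rate at or below C.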
6.11 TCP FRIENDLY RATE CONTROL (TFRC)
It is well known that when TCP and UDP connections share a communications link,
UDP tends to hog the bandwidth, because it does not back off under congestion like
TCP does. AQM mechanisms like FRED tend to provide a fairer share of bandwidth
distribution between TCP and UDP, unlike the case of plain RED and Tail Drop
routers (Bennett et al. 1994; Karandikar et al. 2000; Lo Monaco et al. 2001).
A notion of TCP Friendly Rate Control has been proposed, whereby “a non-TCP
connection should receive the same share of bandwidth as a TCP connection, if they
traverse the same path” (Deb & Srikant 2003; Tsang & Wong, 1996).
Such friendly connections estimate the bandwidth of TCP connections and
subsequently regulate their own ingress rates (Bonomi & Fendick, 1995; Charny et
al. 1995).
TFRC has been developed in a number of investigations (Mahdavi & Floyd, 1997). It
uses a simple stochastic model of Internet behaviour to analytically model congestion
behaviour. In the final analysis, the method reduces to using a control equation that
limits the sending rate of an EBCC (TFRC) [Equation Based Congestion Control]
sender to the steady-state throughput (not goodput) of the connection. By capping the
sending rate of the ingress to a maximum of
λ (Throughput) = min( Wmax / RTT , s / [ RTT·√(2bp/3) + T0·min(1, 3·√(3bp/8))·p·(1 + 32p²) ] )
it becomes possible for connections to avoid the flapping behaviour of AIMD in
window sizes, thus maintaining a smoother behaviour.
In the above equation:
λ = the cap on the ingress sending rate, i.e., the steady state throughput, not goodput (bytes/sec)
Wmax = maximum window size
RTT = Round Trip Time
p = steady state loss event rate
b = number of packets acknowledged by a single Ack
T0 = TCP retransmit timeout value
s = packet size.
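A direct transcription of the control equation (assuming the standard TCP response function of Padhye et al., on which TFRC is based) might look like the following sketch:

```python
import math

def tfrc_rate_cap(s, rtt, p, t0, b=1, wmax=65535):
    """TFRC sending-rate cap in bytes/sec (a sketch of the standard TCP
    response function).  s: packet size (bytes), rtt: round trip time (s),
    p: steady-state loss event rate, t0: retransmit timeout (s),
    b: packets acknowledged per Ack, wmax: maximum window (bytes)."""
    if p <= 0:
        return wmax / rtt           # no observed loss: window-limited
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8))
                  * p * (1 + 32 * p * p))
    return min(wmax / rtt, s / denom)
```

Higher loss rates p produce a lower cap, which is the smooth back-off behaviour the TFRC proposal aims for.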
It may be noted that the control equation above is only one particular model of the
stochastic behaviour of TCP connections and what it does is to maintain the sending
rate close to the “steady state” of Tahoe and Reno. Such “friendly” flows perform
TCP friendly rate control, by observing network parameters, and capping the sending
rate using the above formula. This approach has implications for transmitting
multimedia over the Internet (Et, 1994; Kung et al. 1994; Mishra & Kanakia, 1992).
We believe that our proposal, Sierra, arrives at a maximal upper bound for ingress
rates into the network, without presuming the validity of an underlying windowing
mechanism. As a result, it improves performance at least in terms of throughput, as
will be demonstrated. (Zhang & Ferrari, 1993, 1994).
6.12 MO-WALRAND ALGORITHM
This is an encouraging development from our point of view, as it clearly
differentiates the Black Box and White Box approaches to congestion control.
In (Mo & Walrand, 2000), the authors use a backlog estimator, not unlike Vegas, but
their (Mo-Walrand) connections constantly adapt their window size (MWcwnd),
proportionally with the separation from a target backlog. They demonstrate that their
scheme (Mo-Walrand scheme) is “proportionately fair” in the sense of Kelly. But we
observe at the outset that their scheme has some of the same problems as Vegas, due
to sub-optimal RTTD estimation.
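The window adaptation just described can be sketched as below. The Vegas-style backlog estimator and the gain constant are assumptions for illustration, not the authors’ exact scheme:

```python
def mw_window_update(cwnd, rtt, base_rtt, target_backlog, gain=0.1):
    """One Mo-Walrand-style step: estimate the packets this connection
    has queued in the path (as Vegas does) and move the window
    proportionally to its distance from a target backlog."""
    backlog = cwnd * (rtt - base_rtt) / rtt   # estimated queued packets
    return cwnd + gain * (target_backlog - backlog)
```

Note that an inflated base_rtt estimate (the standing-queue problem discussed above) biases the backlog estimate low, so the connection settles on too large a window.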
The authors observe that when connections arrive at a bottleneck which is
maintaining an ongoing queue, later arrivals sense a bloated RTTD so they then
adjust their window sizes to suffer a backlog between A and B, in addition to the
extant queue. If the system is not tampered with, the standing queue persists forever.
They suggest using RED to alleviate this, but this is against our two-pronged
philosophy at large, and can be suboptimal. (Ramakrishan et al. 1988, 1990, 1999)
Elsewhere we simulate the performance of the Mo-Walrand mechanism, in the
presence/absence of competing protocols.
6.13 PACKET PAIR
Packet Pair is an ingenious mechanism proposed by S. Keshav (Morgan & Keshav,
1999), and it works as follows.
Each source probes its available bandwidth in the network by transmitting
back-to-back pairs of probe packets and measuring the distance (separation) between
the resulting pairs of acknowledgements, while periodically adapting the send rate in a
manner as to avoid overflow/underflow at the bottleneck switch’s buffer. There is a
timeout and re-transmission mechanism to cope with losses. (Keshav, 1991, 1997)
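The core of the packet-pair estimate is that two back-to-back packets leave the bottleneck separated by the time needed to serialize one of them, and that spacing survives in the Ack spacing. A single-sample sketch (a real implementation filters many samples):

```python
def packet_pair_bandwidth(packet_size, ack_gap):
    """Bottleneck-bandwidth estimate (bytes/sec) from one probe pair:
    packet size divided by the inter-Ack spacing in seconds."""
    return packet_size / ack_gap
```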
Keshav has tested this approach with simulation experiments. These appear to show
that packet-pair sources are retransmit stable, at least up to 500% nominal offered
loads. A buffer management of “drop entire longest queue” appears to give packet-
pair a significant advantage over non-packet-pair sources in times of overload.
In summary, under congestion, packet-pair is stable and can manage with a fraction
of the buffer capacity at switches (vis-à-vis round trip window) and offers
asymptotically good throughput of 83-92% (83% for “drop longest entire queues”,
and 92% for “drop last packets”).
When packet-pair and non-packet-pair sources share a congested connection path,
the former appear to have a decisive advantage over the latter.
Elsewhere we evaluate the relative merits and demerits of packet-pair performance,
in the presence of competing mechanisms.
6.14 BALAKRISHNAN & SESHAN’S CONGESTION MANAGER
In (Balakrishnan et al. 1999) the authors offer a Congestion Manager (CM), which
works at the end systems, to manage congestion state between connections with the
same end to end path. CM leaves error control to higher layers.
CM has a congestion control protocol which works as follows:
Twice per RTT, the source end-system inserts a (probe) control packet into the net,
armed with a probe sequence number, and the number of bytes sent since the last
probe. If the target detects a difference in the number of bytes received vis-à-vis the
number reported in a probe, then they report a loss back to the source. Lost probes
are handled via the sequence number mechanism. As long as a probe is not reordered
with data packets, this CM mechanism works well to detect information loss. If a
probe is re-ordered, later probes will remediate the byte count, and thus CM recovers
from a false loss detection, by undoing the backoff.
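The receiver-side probe check can be sketched as follows (the function and field names are assumptions for illustration, not the CM API):

```python
def cm_check_probe(probe_seq, probe_bytes_reported, bytes_received,
                   last_probe_seq):
    """Check one Congestion Manager probe at the receiver.
    probe_bytes_reported: bytes the source says it sent since the last
    probe; bytes_received: bytes actually seen since the last probe."""
    if probe_seq <= last_probe_seq:
        return "duplicate_or_reordered_probe"
    if bytes_received < probe_bytes_reported:
        return "report_loss"        # some bytes went missing in transit
    return "no_loss"
```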
6.15 THE ‘GOODNESS’ OF ANY BLACK BOX SOLUTION
The following criteria may be applied:
• Throughput over time (a plot of the rate at which data is delivered, including
incorrect/unacceptable data).
• Goodput over time (we need to differentiate acceptable delivered data, versus
unacceptable data, for instance data within and outside of the jitter window. What
is the delivery rate of good data?).
• Fairness behaviour (for instance, when Sierra and Vegas share a bottleneck, how
is the bandwidth distributed between them? Similarly, when two Sierra
connections, with different design parameters, share a path, how is bandwidth
distributed?) (Anjum et al. 1999; Kelly, 2003; Boutremans et al. 2000; Hasegawa
et al. 2000).
• Steady state behaviours (limiting behaviour when the time factor becomes very
large) (Cruz, 1987).
• Stability (Bennett et al. 2001; Georgiadis et al. 1997; Jain 1989, 1995).
In this thesis, we have studied the first three aspects, namely:
• throughput
• goodput
• fairness.
Further study of steady state behaviour and long term stability, as well as the direct
effect of Sierra parameters (on performance), will be studied in future research.
7 STOCHASTIC MODELLING OF CONGESTION CONTROL ALGORITHMS
7.1 MOTIVATION
Consider an Internet traffic source sending packets into a single link connected to
(an)other source(s). The source’s “window” is the maximum number of packets
discharged without waiting for acknowledgement, at any point in time. Such a
window concept is the cornerstone of a number of congestion control algorithms,
such as Tahoe, Reno, New Reno, Vegas and Sierra (partly rate-based). All these are
briefly treated. We then develop quantitative models for the performance
(throughput) levels of all these algorithms. It is shown that Sierra is by far,
analytically, the most superior among all these algorithms. A goal of this chapter is
to introduce the neophyte in congestion management to the discipline and then be in
a position to contribute meaningfully to the literature. As such, all concepts
introduced are motivated and developed from the ground up.
7.2 INTRODUCTION
We begin the discussion by briefly describing the TCP/IP protocol stack, for the sake
of completeness and overview. This is followed by a quick treatment of some
popular congestion control algorithms, including TCP Tahoe/Reno, TCP Vegas and
our Sierra. Sierra was developed and elucidated in detail in a series of publications
by Jagannathan and Matawie. We have also included in this thesis all the various
details from these articles. Our position is that researchers can gain further insight
into the behaviour of TCP through the use of mathematical modeling. The basic
paradigm is the “buffer overflow model” which is the cornerstone of all our analysis
of key congestion control algorithms. Unlike, say, systems in modern physics, it is
true that all aspects of TCP and its progressive evolution in behaviour over time are
fully under our control. However, the sheer scale of TCP’s operation and domain is
tremendous, and is probably the very largest and most complex man-made control
system ever deployed. To attempt to “capture” any system of this magnitude, we
need mathematical models. Indeed there are, in the literature, a number of such
(complex) models. On the other hand, the tools we use in this chapter are relatively
simple in nature, and revolve around the buffer overflow model (Floyd & Jacobson,
1991,1993). We provide proofs on select occasions, for the sole purpose that these
techniques may be adapted and applied to the study of congestion management.
7.3 TCP/IP STACK OVERVIEW
The TCP/IP (Transmission Control Protocol/Internet Protocol) protocol offers a
structured, layered architecture for Internet communications, and has widely
superseded its open peer, the OSI (Open Systems Interconnection) protocol. Whereas
the OSI stack had 7 layers, TCP/IP has only 5 layers (Tanenbaum, 2004; Braden,
1998; Davidson, 1992). These five layers are
• Physical Layer
• Data Link Layer
• Network Layer
• Transport Layer
• Application Layer.
We briefly describe these layers now, for the sake of overview and completeness, to
place our further discussions in perspective.
The Physical Layer deals with the transmission of raw bits over a communications
channel. Here it is ensured that if a “high” bit is sent, then it is received as a “high”
bit, not a “low” bit. Such matters as voltage representation of 1s and 0s, the time
duration of individual bits, how an initial connection is established and then torn
down after the completion of the communication, are all matters that are handled
at the Physical layer. We are primarily concerned with the interfaces (mechanical,
physical and procedural) with the basic underlying physical transmission medium
underneath. (Black, 1998).
The Data Link Layer is concerned with all the protocols that collect a number of bits
into a “frame” and delivers that frame from one end of a communications channel to
the other. The pertinent issues here are medium access control as well as per hop
error detection and correction. Also of relevance is to keep a fast sender from
overwhelming a slow receiver with data. Frequently this “flow control” and error
management are integrated.
The Network Layer’s basic concern is the routing of data packets across the subnet
from source to destination. Routing can be either static (wired-in) or highly dynamic,
and determined afresh per new packet (Chapman & Kung, 1999; Deerin & Hinden,
1998).
A key design issue at this layer is addressing, which is used critically by the network
layer for packet delivery, but the latter may not be entirely reliable. Packet losses can
occur, especially when node buffers (at intermediate routers) overflow and source
transmission rates are greater than processing rates at in-between routers (Braden et
al. 1998; Coltum, 1999; Kousky, 2000; Ramakrishnan et al, 1987).
The Transport Layer’s critical function is to add reliability to the network layer’s
capability. Depending on the protocol used, the transport layer can detect and
retransmit lost packets, if that is necessary. Congestion control algorithms, as
initiated by van Jacobson (ca. 1988) operate at this level. Note that while the
predominant transport protocol in today’s Internet is TCP (Transmission Control
Protocol) - a connection-oriented, end to end, reliable protocol - there do exist other
candidates like UDP (User Datagram Protocol) which is used by some real-time
applications such as video transmission. UDP is a connection-less and non reliable
protocol (Kalampoukas et al. 1995; Kolarov, 2001; Clark et al. 1987; Crowcroft &
Oechslin, 1998; Lin & Kung, 1999; Fraser, 1983).
The Application Layer is the topmost layer and consists of such protocols as ftp, http,
etc., which utilize lower layers to transfer files or other data across the Internet.
7.4 COMMON CONGESTION CONTROL ALGORITHMS
The congestion control algorithms presented here all work at the Transport Layer.
Firstly we define “congestion”.
“Flow control” is the consideration that a very fast or aggressive ingress end host
should not overwhelm a lower-capacity egress end host, which would otherwise be
forced to drop/discard fast-arriving packets that it does not have the capability to
process. Flow control is thus an end to end consideration. “Congestion control”, by
contrast, brings to
bear a slightly different set of notions. Can the ingress host overwhelm the network
(Internet) itself? In other words, can a sending host introduce packets into the
network so fast that in-between routers are incapable of coping with the volume of
incoming traffic, and are forced to drop some packets? This is congestion and one
needs provisions to deal with it. (see for example, Jagannathan & Matawie, 2005 for
a leisurely discussion; Parekh, 1992, 1994).
We have proposed a two-pronged approach to managing (unicast) Internet
congestion.
1. Given a pair of end hosts in communication, regardless of the internals of in-
between routers, can we maximize network utilization? This we call the Black
Box approach to congestion management. Popular examples are Tahoe, Reno and
Vegas. Elsewhere (see the suite of publications by Jagannathan & Matawie) we
have proposed “Sierra”. The motif of this chapter is the quantitative comparison of
the performance of these algorithms. (Zhang et al. 1991).
2. The related Inter-Network counterpart of Black Box is White Box modeling,
wherein one attempts to optimize the internals of routers-in-the-path between
communicating hosts. Examples include FIFO, Tail-Drop and the RED Family.
(Bennett & Zhang, 1996; Braden et al. 1998).
Whereas this dichotomy of Black Box and White Box models was first introduced by
van Jacobson, a leading figure in the area of congestion control, in the late eighties,
his overtures have not been generally and readily adopted by the congestion
management community. (Vishweswariah & Heidemann, 1997).
In this chapter we have been concerned largely with the Black Box solutions and we
attempt to quantitatively compare the relative performance of Tahoe/Reno, Vegas
and Sierra. These were elaborated earlier in Chapter 6, but are revisited here for the
sake of continuity.
TCP Tahoe/Reno
Introduced in the late 1980’s by van Jacobson, TCP Tahoe was the precursor to many
congestion control algorithms, and has been remarkably successful in managing
“congestion collapse”. Tahoe, initially, very quickly and exponentially, probes the
available bandwidth to determine an optimal window size, denoted cwnd. This
process is (strangely) called Slow Start. Once the congestion window reaches the
value of a certain Slow Start threshold, Ssthresh, Slow Start terminates and yields
way to “Congestion Avoidance”, wherein the congestion window is increased
linearly. When an exception occurs (timeouts/triple duplicate) the window size is
halved. In the literature, this is often referred to as AIMD (Additive Increase &
Multiplicative Decrease) (Jain, 1990). As noted, Tahoe has spawned a number of
variants, such as:
• TCP Reno (Jacobson, 1988)
• TCP New Reno (Floyd et al. 1999)
• TCP SACK (Mathis et al. 1996).
For the purpose of our discussion in this chapter, we disregard the differences in the
way these Tahoe-inspired protocols respond to exceptions.
Slow Start: start with a window size of 1. Dispatch a window’s worth of packets and
for every ack received, increment the congestion window by 1, till the window
reaches the value of “Ssthresh”. Should an exception (Timeout/Triple Duplicate)
occur before then, halve Ssthresh, set the window to 1, and re-enter Slow Start. If the
window has reached Ssthresh without an exception, exit Slow Start and enter the
“Congestion Avoidance” phase.
The control equations for SS (Slow Start) and CA (Congestion Avoidance) are as
follows:
• SS:
cwnd = cwnd + 1, per ack
cwnd = 2 * cwnd per window of acks
• CA:
cwnd = cwnd + 1/ cwnd, per ack
cwnd = cwnd + 1, per window of acks
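The SS/CA control equations translate directly into per-Ack updates; the exception case follows the description above (halve Ssthresh, reset the window to 1). A minimal sketch, with cwnd in segments:

```python
def on_ack(cwnd, ssthresh):
    """Per-Ack window growth for Slow Start / Congestion Avoidance."""
    if cwnd < ssthresh:
        return cwnd + 1            # SS: doubles per window of acks
    return cwnd + 1.0 / cwnd       # CA: +1 segment per window of acks

def on_exception(cwnd, ssthresh):
    """Timeout / triple duplicate: halve Ssthresh, restart at cwnd = 1.
    The floor of 2 segments is a common safeguard, assumed here."""
    return 1, max(ssthresh / 2, 2)
```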
As observed earlier, different congestion control algorithms (Tahoe, Reno, New
Reno, and SACK) respond slightly differently when exceptions occur, i.e., they
reduce the window size in different ways. But, for the purposes of our quantitative
modeling, we disregard these differences and presuppose only the control equations
presented above.
TCP Vegas (Brakmo & Peterson, 1995; Ahn et al. 1995)
The Vegas algorithm is also a window-based TCP protocol. It operates in a manner
using the variation in round trip delays to sense and counter congestion in the
network. The point is that by measuring the fluctuations in RTT (Round Trip Time),
Vegas tries to estimate the number of packets queued in the routers for that
connection and tries to keep that number between certain limits.
Vegas will redefine the retransmission method as implemented in Reno by:
• reading/recording the system clock when a segment is sent and when an ack is
received, and uses this RTT information to make retransmit decisions.
• Duplicate acks received make Vegas retransmit if the segment took longer than
the estimated RTT, which is interpreted as a lost segment (no waiting for triple
duplicates).
• After 1 or 2 non-duplicate acks are received, again the time interval is checked. If
greater than the estimated RTT, a retransmit is done (no waiting for duplicate
acks).
Set two control (design) parameters, A and B.
Calculate the Expected Throughput, E, where
E = [Current Window Size] / [BaseRTT].
BaseRTT is the smallest RTT seen by the ingress, up to the time t. This is estimated
to be the real propagation delay.
Calculate the Observed Throughput, O, where
O = [# acks received] / RTT.
by sending a distinguished control packet, and computing RTT as the difference
between the ack reception time and the transmission time of that distinguished
control packet. During this RTT, count the total number of acks received.
Use these control equations:
• A ≤ E – O ≤ B ⇒ Leave window unchanged
• A > E – O ⇒ W ++
• B < E – O ⇒ W --
Also a modified Slow Start is incorporated, whereby Vegas allows exponential
growth of the window every other RTT instead of every RTT. In this way, it
becomes possible to compare Expected and Observed throughputs.
Whereas it is difficult to directly interpret the dynamics of TCP Vegas in terms of the
more traditional congestion control algorithms presented earlier, we are able to offer
a simple calculation of a lower bound to the throughput offered by TCP Vegas. We
will then compare this with the throughput delivered by Tahoe/Reno and Sierra.
Sierra (Jagannathan & Matawie, 2005)
Sierra is a simple, yet novel, Black Box algorithm that uses three basic parameters to
optimize network usage. Sierra is further developed in other work in progress.
• RTTs (Round Trip Times), introduced earlier
• ingress rates (Rin)
• egress Rates (Rout).
No exponential smoothing is performed.
Briefly, the control equations are, for SQS (Sierra Quick Start) and SCAM (Sierra
Congestion Avoidance Mechanism), as follows.
SQS :
Rin = Rin / 2
till Rin ≤ Rout.
SCAM :
Rin = Rin +1
If Rin ≤ Rout.
And
Rin = Rin – 1
If Rin > Rout.
We use this intuitively appealing approach to determine the average throughput in
the steady state for Sierra.
Further research is needed when we deviate from the single source - single link
model that was assumed in this thesis, to carry over to the multi-source and shared-
link model wherein many of these protocols compete for bandwidth, and notions
such as fairness (Hasegawa et al. 2000) etc. come into play. Whereas there have been
some such, if complex, analyses, the buffer overflow model should provide a simple
method for the relative comparison of competing algorithms.
7.5 A CONSOLIDATED APPROACH TO CONSTRAINED
OPTIMIZATION
7.5.1 Introduction
Lagrange multiplier preliminaries: (Bertsekas, 2003; Kunniyur & Srikant, 2000;
Kelly, 1979 to 2003):
Lemma 1: [see Bertsekas, 2003]
Let f : Rn � R
gi : Rn � R ( i = 1, 2,…,m )
hi : Rn � R ( i = 1, 2,….,l )
Consider the optimization problem:
Minimize f(x) such that
gi (x) ≤ 0 ( i = 1,2,………..,m)
hi (x) = 0 ( i = 1,2,………...,l)
x ∈ X, X open in Rn
Then ∇hi (x~) linearly independent ⇒
Fo ∩ Go ∩ Ho = φ
where
Fo = { d : ∇ f(x~)T d < 0 }
Go = { d : ∇ g(x~)T d < 0 ; i ∈ I }
Ho = { d : ∇ h(x~)T d = 0 ; i = 1,2,…l}
Theorem 2 (Fritz John Necessary Conditions) (Low & Lapsley, 1999):
Let the conditions of the Lemma obtain. Let x~ be a feasible solution and
I = { i : gi (x~) = 0 }
Let f, gi ( i ∈ I ) be differentiable at x~
Let hi ( i = 1,2,…..l) be continuously differentiable at x~
Also let gi ( i ∉ I ) be continuous at x~
If x~ is a local optimum then
∃ uo , ui (i ∈ I), vi ( i = 1,2,… l)
such that
uo ∇f (x~) + ∑ ui ∇gi (x~) + ∑ vi ∇hi (x~) = 0
with uo, ui ≥ 0
uo, ui, vi not all zero.
Proof :
If the ∇hi (x~) are linearly dependent then the solution is trivial.
Hence assume the ∇hi (x~) are linearly independent.
Let A1 = a matrix with rows ∇f(x~) and ∇gi(x~) [ i ∈ I ]
And A2 = a matrix with rows ∇hi(x~)
By the Lemma, the system
A1d < 0
A2d = 0
has no solution.
Consider
S1 = { (A1d, A2d) : d ∈ Rn } and S2 = { (z1, z2) : z1 < 0, z2 = 0 }
Note S1 ∩ S2 = φ and
S1 , S2 are convex.
So ∃ p non-zero, pT = ( p1T , p2T ), such that pT s1 ≥ pT s2 for all s1 ∈ S1, s2 ∈ S2.
⇒ p1T A1 d + p2T A2 d ≥ p1T z1 + p2T z2
⇒ Letting RHS → 0
We have
p1T A1 d + p2T A2 d ≥ 0 ∀ d
Let d = − ( A1T p1 + A2T p2 )
We then have,
− || A1T p1 + A2T p2 ||2 ≥ 0
⇒ A1T p1 + A2T p2 = 0
Letting
uo = p1(0)
ui = p1(i) ( i ∈ I )
vj = p2(j) ( j = 1,2,…,l )
the result follows.
QED.
Fritz John Sufficiency
Lemma 3 [see Bertsekas, 2003]
Let Fo, Go, Ho as above.
Then, Fo ∩ Go ∩ Ho = φ
And f pseudo-convex at x~ ,
gi (i ∈ I) strictly pseudo-convex over Nε (x~)
hi (i = 1,2,….,l) affine (convex and concave)
⇒ x~ a local minimum
Theorem 4:
Let x~ a Fritz John solution.
Let S = { x : gi (x) ≤ 0 ∀ i ∈ I } ∩ { x : hi (x) = 0, ∀ i = 1,2,…. l }
Further let hi (x~) affine (convex & concave), and
∇hi (x~) linearly independent, and there exists Nε (x~) such that
over S ∩ Nε (x~) we have
f pseudo-convex and
gi strictly pseudoconvex, then
⇒ x~ a local minimum for the problem
Proof:
Let Fo ∩ Go ∩ Ho ≠ φ, so
∃ d ∈ Fo ∩ Go ∩ Ho
We also have
uo ∇f (x~) + ∑ ui ∇gi (x~) + ∑ vi ∇hi (x~) = 0
uo ∇f (x~) d + ∑ ui ( ∇gi (x~)) d = 0
uo, ui ≥ 0, and d ∈ Fo ∩ Go ⇒ uo = 0 = ui
Hence,
vi ∇hi (x~) = 0
which contradicts the linear independence of the ∇hi
⇒ Fo ∩ Go ∩ Ho = φ
hi affine ⇒ d a feasible direction iff d ∈ Ho
Since gi is strictly pseudo-convex over S ∩ Nε (x~) we have
D = Go ∩ Ho
where D is the cone of feasible directions at x~
Hence, Fo ∩ D = φ
We have S ∩ Nε (x~) convex and f pseudo-convex at x~, hence
d = x – x~ ∈ D ∀ x ∈ S ∩ Nε (x~)
whence we must have
∇ f(x~)T d < 0
Hence if x~ is not a local minimum, we have
∃ direction d ∈ Fo ∩ D, which is a contradiction. Hence x~ is a local minimum. QED.
Lagrange-Kuhn-Tucker necessity
Theorem 5:
Let X ≠ φ be open in Rn, and let
f: Rn → R
gi: Rn → R ( i = 1,2,…,m )
hj: Rn → R ( j = 1,2,…,l )
The problem is to minimize f(x) such that
gi(x) ≤ 0
hj(x) = 0
x ∈ X
Let I = { i : gi(x~) = 0 }. Let further
f, gi (i ∈ I) be differentiable at x~,
gi (i ∉ I) continuous at x~, and
hj continuously differentiable at x~.
Also suppose the ∇gi(x~) (i ∈ I) and ∇hj(x~) are linearly independent. Then (Lemma 5)
x~ is a local minimum ⇒
∃ ui ( i ∈ I ), ∃ vj ( j = 1,2,…,l ) such that
∇f(x~) + ∑ ui ∇gi(x~) + ∑ vj ∇hj(x~) = 0
with ui ≥ 0
Proof :
This result follows readily from the Fritz John necessity conditions.
7.5.2 Lagrange-Kuhn-Tucker Sufficiency:
Theorem 1:
We address the same optimization problem. Suppose the LKT conditions hold at x~ ∈ X, and
• f is pseudo-convex over Nε(x~) ∩ S
• gi is strictly pseudo-convex over Nε(x~) ∩ S ( i ∈ I )
• hi is affine
• the ∇hi(x~) are linearly independent.
Then x~ is a local minimum.
Proof : Follows readily from Fritz John sufficiency.
Theorem 2:
We address the same optimization problem. If LKT hold at x~ ∈ X, let
J = { i: vi > 0 }
K = { i : vi < 0 }
Let
• f be pseudo-convex at x~
• gi quasi-convex at x~ ( i ∈ I )
• hi quasi-convex at x~ ∀ i ∈ J
• hi quasi-concave at x~ ∀ i ∈ K
Then x~ is a global minimum.
Proof:
∀ i ∈ I and x ∈ S, quasi-convexity of gi at x~ gives, for 0 ≤ λ ≤ 1,
gi( x~ + λ(x − x~) ) = gi( λx + (1−λ)x~ )
≤ max { gi(x), gi(x~) }
= 0
= gi(x~)
Hence,
∇gi(x~)ᵀ(x − x~) ≤ 0
Similarly,
∇hi(x~)ᵀ(x − x~) ≤ 0 ∀ i ∈ J
∇hi(x~)ᵀ(x − x~) ≥ 0 ∀ i ∈ K
Hence,
[ ∑ ui ∇gi(x~) + ∑ vi ∇hi(x~) ]ᵀ(x − x~) ≤ 0
From LKT, we then have
∇f(x~)ᵀ(x − x~) ≥ 0
⇒ f(x) ≥ f(x~) ∀ x ∈ S (by pseudo-convexity of f)
QED.
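As a concrete illustration of these conditions, consider the toy convex programme minimize f(x) = (x − 2)² subject to g(x) = x − 1 ≤ 0 (this example is ours, not from the sources above). Its optimum is x~ = 1, and the LKT multiplier can be recovered and checked numerically:

```python
# Toy LKT check: minimize f(x) = (x - 2)^2  s.t.  g(x) = x - 1 <= 0.
# The optimum is x~ = 1; stationarity requires f'(x~) + u g'(x~) = 0 with u >= 0.

def grad_f(x):
    return 2.0 * (x - 2.0)        # f'(x)

def grad_g(x):
    return 1.0                    # g'(x)

x_opt = 1.0
u = -grad_f(x_opt) / grad_g(x_opt)       # solve stationarity for u

assert u >= 0.0                                          # dual feasibility
assert abs(grad_f(x_opt) + u * grad_g(x_opt)) < 1e-12    # stationarity
assert abs(u * (x_opt - 1.0)) < 1e-12                    # complementary slackness
print("multiplier u =", u)               # u = 2.0
```

Here f is convex (hence pseudo-convex) and g is affine, so the sufficiency direction of the theorems above certifies x~ = 1 as the global minimum.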
7.5.3 Penalty Methods: (Freund, 2004)
Penalty methods replace a constrained optimization problem by a series of unconstrained problems that are more easily solvable. Solutions to the unconstrained problems in turn converge to a solution of the constrained problem. In this context we define
Definition: A function p(x) : Rn → R is called a penalty function for the usual
constrained optimization problem (with equality and inequality constraints) if
• p(x) = 0 if g(x) ≤ 0 and h(x) = 0
• p(x) > 0 if g(x) > 0 or h(x) ≠ 0
Definition: A penalty optimization programme is
P(k) : minimize f(x) + k p(x) over x ∈ Rn
We have the following
Penalty Convergence Lemma (without proof; see Bertsekas, 2000): Let f, g, h, and p be continuous functions. Let further { xk : k ∈ N } be a sequence of solutions to the penalty programmes P(ck), with ck → ∞. Then any limit point of { xk } is an optimal solution for the original constrained problem.
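A minimal numerical sketch of this scheme (the problem instance, step sizes and iteration counts are our own illustrative choices, not from the sources cited): minimize f(x) = (x − 2)² subject to x ≤ 1, using the quadratic penalty k·max(0, x − 1)² and plain gradient descent.

```python
# Penalty method sketch: minimize f(x) = (x - 2)^2 s.t. x - 1 <= 0 (true optimum x = 1)
# via the unconstrained programme P(k): minimize f(x) + k * max(0, x - 1)^2.

def solve_penalty(k, x0=0.0, iters=20000):
    x = x0
    step = 1.0 / (2.0 + 2.0 * k)          # safe step size for this quadratic
    for _ in range(iters):
        grad = 2.0 * (x - 2.0) + 2.0 * k * max(0.0, x - 1.0)  # d/dx [f + k p]
        x -= step * grad
    return x

for k in (1.0, 10.0, 1000.0):
    print(k, solve_penalty(k))    # minimizers (2 + k)/(1 + k): 1.5, 1.09..., ~1.001 -> 1
```

The unconstrained minimizers are infeasible for every finite k but converge to the constrained optimum as k → ∞, exactly as the lemma asserts.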
We now prove a key result for penalty methods. Let
p(x) = ∑ [ max { 0, gi(x) } ]^q + ∑ | hi(x) |^q
be a penalty function for the general constrained problem with equality and inequality constraints, so that
p(x) = θ( g+(x), h(x) ), where g+(x) = max { 0, g(x) }
Let ∂θ(y)/∂yi = 0 at yi = 0 ( i = 1,2,…,m ) (*)
Then p(x) is differentiable whenever the functions gi and hi are, and
∇p(x) = ∑ ∂θ/∂yi (g+(x)) ∇gi(x) + ∑ ∂θ/∂yi (h(x)) ∇hi(x)
Now let xk minimize f(x) + ck p(x), so that ∇f(xk) + ck ∇p(xk) = 0. Letting
uik = ck ∂θ/∂yi (g+(xk)) and (**)
vik = ck ∂θ/∂yi (h(xk))
we see that the LKT stationarity condition is satisfied at xk.
From the Penalty Convergence Lemma, we have that if xk → x~, then x~ is an optimal solution of the original optimization problem.
Let I = { i : gi(x~) = 0 }
and N = { i : gi(x~) < 0 }
i ∈ N ⇒ uik = 0 ∀ k sufficiently large
i ∈ I ⇒ uik ≥ 0
Let uk → u~ and vk → v~. We have
∇f(xk) + ∑ uik ∇gi(xk) + ∑ vik ∇hi(xk) = 0
⇒ by continuity, we have
∇f(x~) + ∑ ui~ ∇gi(x~) + ∑ vi~ ∇hi(x~) = 0
and ui~ ≥ 0
So the LKT conditions are satisfied at x~ with multipliers u~, v~, and it remains to show that the multiplier sequences do in fact converge:
uk → u~ and vk → v~
Suppose, to the contrary, that { (uk, vk) } has no limit points, i.e. || (uk, vk) || → ∞. For large k we have uik = 0 ∀ i ∈ N, and dividing the stationarity condition by || (uk, vk) || gives
∇f(xk)/||(uk, vk)|| + ∑ [ uik/||(uk, vk)|| ] ∇gi(xk) + ∑ [ vik/||(uk, vk)|| ] ∇hi(xk) = 0
Taking limits along a convergent subsequence of the normalized multipliers, the first term vanishes and we obtain
∑ ui′ ∇gi(x~) + ∑ vi′ ∇hi(x~) = 0, with (u′, v′) ≠ 0,
which violates the linear independence of the gradients. It is easy to now show that the limit points are unique, again invoking the linear independence of the gradients.
We thus have:
Theorem 7: Let θ(y) satisfy (*) and be continuously differentiable. Assume further that f, g and h are differentiable. Define uk, vk using (**). Then if xk → x~ and the gradient vectors are linearly independent at x~, we have uk → u~ and vk → v~, where u~, v~ are LKT multipliers for the optimal solution x~ of the original constrained optimization problem.
7.5.4 Exact Penalty Methods: (Freund, 2004; Boyd & Vandenberghe, 2006)
Here we choose a penalty function such that the solution to the penalty programme is
also a solution to the original constrained optimization problem (with equality and
inequality constraints).
Let p(x) = ∑ (gi(x))+ + ∑ | hi(x) |
and let x^ solve the original (convex) problem P, with LKT multipliers (u*, v*). For k ≥ max { u*i, |v*i| } we have
q(k, x) = f(x) + k ∑ (gi(x))+ + k ∑ |hi(x)|
≥ f(x) + ∑ u*i (gi(x))+ + ∑ |v*i| |hi(x)|
≥ f(x) + ∑ u*i gi(x) + ∑ v*i hi(x)
≥ f(x) + ∑ u*i [ gi(x^) + ∇gi(x^)ᵀ(x − x^) ] + ∑ v*i [ hi(x^) + ∇hi(x^)ᵀ(x − x^) ]
= f(x) + ∑ u*i ∇gi(x^)ᵀ(x − x^) + ∑ v*i ∇hi(x^)ᵀ(x − x^)
= f(x) − ∇f(x^)ᵀ(x − x^)
≥ f(x^) = q(k, x^)
using, in order, the choice of k; convexity of the gi and affineness of the hi; complementary slackness with hi(x^) = 0; LKT stationarity; and convexity of f. So x^ solves P(k).
Now let x~ solve P(k), and suppose x~ is not feasible for P, so that p(x~) > 0. With k strictly greater than max { u*i, |v*i| }, the first inequality in the chain above is strict at x~, giving q(k, x~) > f(x^). But x~ solves P(k), so
q(k, x~) ≤ q(k, x^) = f(x^)
which is a contradiction. So x~ is feasible, and since q(k, x~) ≤ f(x^) we have f(x~) ≤ f(x^). We have proven
Theorem 8: Suppose P is a generalized convex programme with inequality and
equality constraints, for which LKT are necessary, and let P(k) be a penalty
programme wherein
p(x) = ∑ (gi(x))+ + ∑ |hi(x)|
If k > max { u*i, |v*i| }, where { u*, v* } are the LKT multipliers, then the optimal solutions for P(k) and P coincide.
7.5.5 Barrier Methods: (Boyd & Vandenberghe, 2006)
This method acts to place a very high cost at the boundary of the feasible region to
dissuade solution points from ever approaching the boundary. We have, akin to the
penalty programme, the barrier programme, formulated as in:
B(k) : minimize f(x) + 1/k b(x)
s.t. g(x) < 0,
h(x) = 0,
x ∈ Rn
We have (without proof) the
Barrier Convergence Lemma: Let f, g, h, and b be continuous functions. Let { xi : i ∈ N } be a sequence of solutions to B(ki). Suppose ∃ an optimal solution x^ of P for which ∀ ε > 0, Nε(x^) ∩ { x : g(x) < 0 and h(x) = 0 } ≠ φ. Then any limit point x~ of { xi } solves P.
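A minimal numerical sketch (again with an illustrative problem of our own, not from the sources cited): minimize f(x) = (x − 2)² subject to x < 1, with the logarithmic barrier b(x) = −log(1 − x). Unlike the penalty iterates, the barrier iterates stay strictly feasible and approach the boundary optimum x = 1 from inside as k grows.

```python
import math

# Barrier method sketch: minimize f(x) = (x - 2)^2 s.t. x - 1 < 0, via
# B(k): minimize f(x) + (1/k) * b(x), with the log barrier b(x) = -log(1 - x).

def solve_barrier(k, lo=-3.0, hi=1.0 - 1e-12, iters=200):
    def q(x):  # the barrier objective, convex on (-inf, 1)
        return (x - 2.0) ** 2 - (1.0 / k) * math.log(1.0 - x)
    for _ in range(iters):     # ternary search for the minimizer
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if q(m1) < q(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for k in (1.0, 10.0, 1000.0):
    print(k, solve_barrier(k))   # strictly feasible iterates approaching x = 1
```

The barrier term blows up at the boundary, which is precisely the "very high cost" described above; its weight 1/k is then driven to zero.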
Let b(x) = γ (g(x)) + γ (h(x))
We have
∇b(x) = ∑ ∂γ/∂yi (g(x)) ∇gi(x) + ∑ ∂γ/∂yi (h(x)) ∇hi(x)
xk solves B(ck) ⇒ ∇f(xk) + (1/ck) ∇b(xk) = 0
i.e., ∇f(xk) + (1/ck) ∑ ∂γ/∂yi (g(xk)) ∇gi(xk) + (1/ck) ∑ ∂γ/∂yi (h(xk)) ∇hi(xk) = 0
Define uik = (1/ck) ∂γ/∂yi (g(xk)) and
vik = (1/ck) ∂γ/∂yi (h(xk))
We then have
∇f(xk) + ∑ uik ∇gi(xk) + ∑ vik ∇hi(xk) = 0
Note uik → 0 ∀ i ∉ I as k → ∞, since ck → ∞, gi(xk) → gi(x~) < 0 (i ∉ I), and the corresponding partial derivatives of γ remain finite. Also uik ≥ 0 ∀ i for k sufficiently large.
Suppose uk → u~ as k → ∞. Then u~ ≥ 0 and ui~ = 0 ∀ i ∉ I. Also suppose vk → v~ as k → ∞. From continuity of all the functions,
∇f(x~) + ∑ ui~ ∇gi(x~) + ∑ vi~ ∇hi(x~) = 0
with u~ ≥ 0 and u~ᵀ g(x~) = 0
Hence u~, v~ are Lagrange-Kuhn-Tucker multiplier vectors. In a manner identical to the corresponding theorem for penalty functions, it can be shown that ∃! u~, v~ such that uk → u~ and vk → v~.
Hence we have,
Theorem 8: Suppose there exists an optimal solution x~ of P such that
∀ ε > 0, Nε(x~) ∩ { x : g(x) < 0 } ∩ { x : h(x) = 0 } ≠ φ
Let γ(y) be continuously differentiable, with uk, vk as defined above. If xk → x~ and the ∇gi(x~), ∇hi(x~) are linearly independent, then uk → u~ and vk → v~, where u~, v~ are vectors of Lagrange-Kuhn-Tucker multipliers for the optimal solution x~ of P.
7.5.6 Utility Functions: (Kelly, 2000)
Most congestion control algorithms can be cast in terms of an optimization problem
of utility functions, for example:
Maximize ∑ Ur(xr) over the rates xr ≥ 0
such that ∑ xr ≤ cl for each link l
For instance, Ur(xr) = arctan( xr Tr √β ) / ( √β Tr ) for TCP Reno. A well behaved limit is Ur(xr) = −1/[ xr Tr ].
For TCP Vegas, we have
Ur(xr) = α Tr log xr s.t. ∑ xr ≤ cl
We also have, for Sierra,
Ur(xr) = − E { [ Wor ± 2n ] / 2n }
where E denotes the expected value.
It now remains to solve all three problems using Penalty/Barrier methods.
As mentioned earlier, we provide proofs in this chapter to introduce techniques that
may be readily adapted for the sake of studying congestion control problems. This
section is an example of this policy.
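For the single-link Vegas-style problem the optimum in fact has a closed form: maximizing ∑ α Tr log xr subject to ∑ xr ≤ cl gives the proportionally fair rates xr = (α Tr / ∑ α Ts) cl, with the link's Lagrange multiplier acting as a "price". A short check (the RTT values and capacity below are illustrative choices of our own):

```python
# Proportionally fair allocation for U_r(x_r) = a * T_r * log(x_r) on one link.
# At the optimum, U_r'(x_r) = w_r / x_r equals the link price for every source.

alpha = 2.0
T = [0.010, 0.020, 0.070]          # illustrative round trip times (sec)
c = 100.0                          # illustrative link capacity (packets/sec)

w = [alpha * Tr for Tr in T]       # utility weights a * T_r
x = [wr * c / sum(w) for wr in w]  # closed-form optimal rates
price = sum(w) / c                 # Lagrange multiplier of the capacity constraint

for wr, xr in zip(w, x):
    assert abs(wr / xr - price) < 1e-9   # KKT: marginal utility = link price
assert abs(sum(x) - c) < 1e-9            # capacity constraint is tight
print(x)                                 # rates proportional to the weights a*T_r
```

The same optimum is what the penalty/barrier machinery of the previous sections recovers iteratively when no closed form is available.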
7.6 STOCHASTIC MODELS (Kelly, 1979, 2000, 2003)
7.6.1 Deterministic Limits
Consider N sources accessing a single link. The dynamics of the rth source are given by
xr(N)(k+1) = xr(N)(k) + κ ( w − Mr(k) )
where xr(N)(k) is the window size W(k) at time k of the rth connection. Recall the earlier discussion in Chapter 6 about congestion windows.
We assume that W(k) is a Poisson random variable with mean (and variance) xr(N)(k).
The marking process at each link marks every packet with probability κN(k) at time k. We assume
κN(k) = pN( yN(k) )
with pN(z) = 1 − exp( −γz/N )
Next define the average rate
x(N)(k) = (1/N) ∑ xi(N)(k)
We have
x(N)(k+1) = x(N)(k) + κ ( w − (1/N) ∑ Mr(k) )
We show that if
E [ x(N)(0) − x(0) ]² → 0 as N → ∞
then
P [ | x(N)(k) − x(k) | > ε ] → 0 as N → ∞
where x(k+1) = x(k) + κ ( w − x(k) p(x(k)) ), with p(x) as defined above.
The proof is by induction on k: assuming the result true for k, we prove it for k+1.
E [ x(N)(k+1) − x(k+1) ]²
= E [ x(N)(k) + κ( w − (1/N) ∑ Mr(N)(k) ) − x(k) − κ( w − x(k) p(x(k)) ) ]²
Expanding, the additional terms
E [ (1/N) ∑ Mr(N)(k) ]² − 2 E [ ( (1/N) ∑ Mr(N)(k) ) x(k) p(x(k)) ] + ( x(k) p(x(k)) )²
→ 0 as N → ∞
Hence the result, and a weak law of large numbers obtains for the average source rates.
7.6.2 Per Source Dynamics (Ott, 1999)
Here we study the dynamics of individual sources. Whereas the average source rate
obeys a weak law of large numbers, individual sources display random variations. In
this section, we provide models for this random behaviour. Specifically, we will
compute their variance.
Proposition 9: Let xi(N)(k) be the series of input rates, with x(N)(k) → x(k) in probability. Then (after normalizing to zero mean and unit variance)
P { (1/√N) [ ∑ xi(N)(k) − N x(k) ] ≤ a } → (1/√(2π)) ∫ exp( −x²/2 ) dx
with the integral taken from −∞ to a.
Assume this lemma (without proof): Let z1, z2, … be a sequence of random variables with distribution functions FZn and moment generating functions φZn ( n ≥ 1 ), and let z be a random variable with distribution function Fz and moment generating function φz. Then
φZn(t) → φz(t) ∀ t ⇒ FZn(t) → Fz(t) at every t where Fz(t) is continuous.
Proof of Proposition 9 (Ott, 1999)
The moment generating function of xi(N)/√n is
φ_{xi/√n}(t) = E [ exp{ t xi/√n } ] = φ( t/√n )
So the m.g.f. of ∑ xi(N)/√n is given by
φ_{∑xi/√n}(t) = [ φ( t/√n ) ]^n
Letting L(t) = log φ(t), we note
L′(0) = µ = 0
L″(0) = E[ x² ] = 1
By the Lemma, and two applications of L'Hôpital's rule,
Lim_{n→∞} n L( t/√n ) = Lim_{n→∞} L( t/√n ) / n⁻¹ = Lim_{n→∞} L″( t/√n ) t²/2 = t²/2
so [ φ( t/√n ) ]^n → exp( t²/2 ), the m.g.f. of the standard normal.
Hence we have proven the result for µ = 0 and σ = 1. The general result follows by considering (X − µ)/σ. Q.E.D.
7.6.3 Explicit Utility Feedback: (Benmohamed & Meerkov, 1993; Kelly, 1997)
When explicit feedback is available, it is possible to study its effect on the standard
deviation of individual flows.
As before, the dynamics of the rth source are given by
Xr(N)(k+1) = Xr(N)(k) + τ ( w − Poisson( Xr(N)(k) ( 1 − exp( −γ yN(k)/N ) ) ) )
As before we can show that
X(N)(k) = (1/N) ∑ Xr(N)(k) converges as N → ∞ to X(k) such that
X(k+1) = X(k) + τ ( w − X(k) p(X(k)) )
i.e., Xr(k+1) = Xr(k) + τ ( w − Poisson( Xr(k) ( 1 − exp( −γX(k) ) ) ) )
We thus have
σ²(k+1) = ( 1 − τ p(X(k)) ) σ²(k) + τ² p²(X(k)) E[ Xr(k) ]
So σ∞ = [ τ p²(X∞) X∞ ] / [ 1 − τ p(X∞) ]²
We note by comparison that this standard deviation is lower than the case of
probabilistic feedback by a factor of √p(X∞).
7.7 STOCHASTIC MODELS (concluded)
We start by motivating the “memory-less” or “Markov” property of statistical
distributions.
Definition: Assume one has a distribution of failure times of an item which are
statistically distributed. The distribution is MEMORY-LESS iff an item which has
been in use for some time is as good as a new item with regards to the amount of
time remaining until that item fails.
Proposition 10: If a statistical distribution has the memory-less property, then it is the
exponential distribution (i.e., the only distribution with the memory-less property is
the exponential distribution).
Proof: The memory-less condition is equivalent to
P ( X ≤ x + t, X > t) / P (X > t) = P (X ≤ x)
i.e., P (X ≤ x + t, X > t) = P (X ≤ x) P (X > t )
If X is a non-negative and continuous random variable, this equation becomes
P(t < X ≤ x + t) = P (0 < X ≤ x) P (X > t)
i.e.,
Fx (x + t) – Fx (t) = [Fx (x) - Fx (0) ] [ 1 - Fx (t) ]
Noting that Fx (0) = 0 and rearranging,
[ Fx (x+t) - Fx(x) ] / t = Fx(t) [ 1 - Fx(x) ] / t
Taking the limit as t → 0, we obtain
Fx′(x) = Fx′(0) [ 1 − Fx(x) ]
where Fx′(x) denotes the derivative of Fx(x).
Let Rx(x) = 1 − Fx(x)
We have
Rx′(x) = Rx′(0) Rx(x)
This differential equation has the solution
Rx(x) = k e^{ Rx′(0) x }
where k is an integration constant.
Noting that
k = Rx(0) = 1 and
letting
Rx’ (0) = - Fx’(0) = - fx (0) = - λ, we get
Rx(x) = e^{−λx}
Hence
Fx(x) = 1 − Rx(x) = 1 − e^{−λx} ( x > 0 )
We thus conclude that X is an exponential r.v. with parameter
λ = fx(0) ( > 0 )
Note that the memory-less property may be equivalently expressed as
P ( X > x + t | X > t ) = P ( X > x ), x > 0, t > 0, OR
P ( X > x + t ) = P ( X > x ) P ( X > t ), x, t > 0
These equations are satisfied when X is an exponential r.v.
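The product identity can also be checked by simulation (the rate λ, the points x and t, and the sample size below are arbitrary illustrative choices):

```python
import random

# Monte Carlo check of memory-lessness for the exponential distribution:
# P(X > x + t) should equal P(X > x) * P(X > t).
random.seed(1)
lam, x, t, n = 0.5, 1.0, 2.0, 200_000
samples = [random.expovariate(lam) for _ in range(n)]

p_x  = sum(s > x     for s in samples) / n
p_t  = sum(s > t     for s in samples) / n
p_xt = sum(s > x + t for s in samples) / n

print(p_xt, p_x * p_t)        # both close to exp(-lam * (x + t)) = exp(-1.5)
assert abs(p_xt - p_x * p_t) < 0.01
```

Any distribution with a non-constant hazard rate (e.g. Pareto) would fail this check, which is exactly Proposition 10.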
7.7.1 Queue-width Marking
Consider N proportionally fair primal-dual controllers sharing one link of capacity Nc.
The dynamics of the nth source are described by
∂xn/∂t = κ ( w − xn PN(b^) )
where b^ is the queue-width of the link. We have
∂b^/∂t = ∑ xi − Nc
Let b = b^/N and suppose that
PN(b^) = P(b)
In that case, if x = ( ∑ xi )/N, then
∂x/∂t = κ ( w − x P(b) )
∂b/∂t = [ x − c ]b+
Examples of PN(b^) and P(b) are given by the exponential distribution.
Discrete time versions of the above equations are
Xn(k+1) = Xn(k) + κ ( w − Xn(k) PN(b(k)) ) (**)
and b(k+1) = b(k) + [ x(k) − c ]b+
Given the dynamics of the nth source, the stochastic models are given by
Xn(N)(k+1) = Xn(N)(k) + κ ( w − Mn(N)(k+1) )
and b^(N)(k+1) = b^(N)(k) + [ ∑ Xn(N)(k) − Nc ]b+
respectively. We can show (Tinnakornsrisuphap & Makowski, 2003; Deb & Srikant, 2003) that
X(N)(k) := (1/N) ∑ Xn(N)(k)
and b(N)(k) := b^(N)(k) / N
converge in probability to the solutions of (**) as N → ∞.
Consider this stochastic model for a TCP-type congestion controller:
Xn(N)(k+1) = Xn(N)(k) + κ ( w − Xn(N)(k) Mn(N)(k) )
Recall that this corresponds to the utility function − w / Xn
Defining, as earlier,
X(N)(k) = (1/N) ∑ Xn(N)(k)
we get
X(N)(k+1) = X(N)(k) + κ [ w − (1/N) ∑ Xn(N)(k) Mn(N)(k) ]
We expect a law of large numbers:
Lim_{N→∞} (1/N) ∑ Xn(N)(k) Mn(N)(k)
= Lim_{N→∞} E [ Xn(N)(k) Mn(N)(k) ]
= E [ Xn²(k) ( 1 − e^{−σX(k)} ) ]
with
Mn(k) = Poisson [ Xn(k) ( 1 − e^{−σX(k)} ) ]
A similar result appears in (Tinnakornsrisuphap & Makowski, 2003).
Note that X(k) is deterministic and in fact equal to E[ Xn(k) ]. But Xn(k) is a r.v. whose second moment explicitly affects the dynamics of X(k).
A satisfactory resolution of the average behaviour of the TCP source and the correct limiting behaviour is lacking at this time.
This chapter has included a presentation and discussion of numerous results and
theories from the fields of convex optimization, control theory as well as advanced
probability theory. These results, which have been gleaned from numerous sources,
appear for the first time in one place as such (in this chapter), and will be deployed in
future advanced investigations into the nature and relative behavior and performance
of Sierra vis-à-vis its predecessors and competitors. We reference our fifth research
paper on Sierra, where the quantitative superiority of Sierra is discussed in detail.
However, no claim is made as to the originality of the various propositions (and
proofs) presented in this chapter. This serves as a concentrated collation of results
from optimization/control theories adapted from the perspective of mathematical
congestion control and management.
8 QUANTITATIVE MODELING AND SOFTWARE SIMULATION
This chapter involves two complementary aspects of the modeling of various
congestion control algorithms, including our novel Sierra, in terms of both:
• Quantitative modeling, at a level that is comparable to what can be found in the
literature on the subject (Jain & Hassan, 2001)
• Simulation-in-Software, using the OPNET Modeler simulator.
Most of the work reported in this chapter is original in nature.
8.1 A QUICK REVIEW
We begin with a quick discussion of various popular models. These are presented
mainly for completeness of the discussion, from a modeling/simulation perspective.
8.1.1 TCP Tahoe/Reno
Introduced in the late 1980's by Van Jacobson (Jacobson, 1988), TCP Tahoe was
the precursor to many congestion control algorithms, and has been remarkably
successful in managing “congestion collapse”. Tahoe, initially, very quickly and
exponentially, probes the available bandwidth to determine an optimal window size,
denoted cwnd. This process has been called Slow Start, though in fact it is quite
“quick”. Once the congestion window size reaches the value of a certain Slow Start
threshold, sthresh, Slow Start relents to “Congestion Avoidance”, wherein the
congestion window is increased (approximately) linearly. When an exception occurs
157
(timeouts/triple duplicate acks), the window size is halved. In the literature, this is
often referred to as AIMD (Additive Increase & Multiplicative Decrease).
As observed, Tahoe has spawned a number of variants, such as:
• TCP Reno • TCP New Reno
• TCP SACK.
For present purposes, we briefly consider Slow Start & Congestion Avoidance.
Slow Start: Start with a window size of 1. Dispatch a window's worth of packets and, for every ack received, increment the congestion window by 1, until the window reaches the value of ssthresh. Should an exception (timeout/triple duplicate) occur before then, halve ssthresh, set the window to 1, and re-enter Slow Start. If the window reaches ssthresh without any exception, exit Slow Start and enter the "Congestion Avoidance" phase.
The control equations for SS (Slow Start) and CA (Congestion Avoidance) are as
follows:
• SS:
cwnd = cwnd + 1, per ack
cwnd = 2* cwnd, per window of acks.
• CA:
cwnd = cwnd + 1/cwnd, per ack
cwnd = cwnd + 1, per window of acks.
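Viewed per window of acks, the two phases give exponential then linear growth; the control equations above can be exercised in a short sketch (the initial window and ssthresh values are illustrative):

```python
# Per-RTT view of the control equations above: the window doubles each RTT in
# Slow Start and grows by one packet per RTT in Congestion Avoidance.

def grow(cwnd, ssthresh, rtts):
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # SS: cwnd += 1 per ack (doubling per RTT)
        else:
            cwnd += 1                        # CA: cwnd += 1/cwnd per ack (+1 per RTT)
        trace.append(cwnd)
    return trace

print(grow(1, 16, 8))   # [1, 2, 4, 8, 16, 17, 18, 19, 20]
```

Capping the doubling exactly at ssthresh is a simplification on our part; the phase change is what matters here.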
We quickly consider some of the major differences between Tahoe, Reno, New Reno
and SACK (Selective Ack).
Tahoe is a combination of:
• Slow Start
• Congestion Avoidance
• Fast Retransmit.
Reno's contribution is to follow the Fast Retransmit phase with Congestion Avoidance, rather than Slow Start, since the triple duplicate acks indicate that data is indeed still flowing in the connection.
8.1.2 SACK
SACK is negotiated through the use of the TCP header option fields: the receiver offers feedback to the sender in the form of selective acknowledgement options, reporting which contiguous blocks of data have arrived. This tells the sender exactly which byte ranges have been received. In this manner, the receiver queues and re-orders out-of-order data without the need for it to be retransmitted.
8.1.3 New Reno
New Reno is a technique which operates at a time when SACK is not negotiated, and
allows TCP to recover faster from a multitude of lost packets from within one
congestion window (a performance drawback of Reno). The technique is an
improvement of the way that TCP Reno deals with the loss of several packets from
within one window.
Generally, SACK performs better than New Reno, all things considered.
8.1.4 TCP Vegas
The Vegas algorithm is also a window-based TCP protocol. It uses the variation in round trip delays to sense and counter congestion in the network. The point is that, by measuring the fluctuations in RTT, Vegas tries to estimate the number of packets queued in the routers for that connection and tries to keep that number between certain limits.
Vegas redefines the retransmission method implemented in Reno as follows:
• The system clock is read and recorded when a segment is sent and when an ack is received, and this RTT information is used to make retransmit decisions.
• On receipt of a duplicate ack, Vegas retransmits if the segment has been outstanding longer than the estimated RTT, which is interpreted as a lost segment (no waiting for triple duplicates).
• After 1 or 2 non-duplicate acks are received, the time interval is again checked. If greater than the estimated RTT, a retransmit is done (no waiting for duplicate acks).
Set two control (design) parameters, A and B.
Calculate the Expected Throughput, E, where
E = [Current Window Size] / [BaseRTT]
BaseRTT is the smallest RTT (Round Trip Time) seen by the ingress, up to the time
t. This is estimated to be the real propagation delay.
Calculate the Observed Throughput, O, where
O = [# acks received] / RTT
by sending a distinguished control packet, and computing RTT as the difference between the ack reception time and the transmission time of that distinguished control packet. During this RTT, count the total number of acks received.
Use these control equations:
• A ≤ E − O ≤ B ⇒ leave the window unchanged
• E − O < A ⇒ W ++
• E − O > B ⇒ W −−
Also a modified Slow Start is incorporated, whereby Vegas allows exponential
growth of the window every other RTT instead of every RTT. In this way, it
becomes possible to compare Expected and Observed throughputs.
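One step of the Vegas window logic can be sketched as follows. Note one simplification on our part: the Observed throughput is approximated as W/RTT rather than counted from acks as described above; the thresholds and sample numbers are likewise illustrative.

```python
# One Vegas window decision, following the control equations above.

def vegas_update(w, base_rtt, rtt, a=1.0, b=3.0):
    expected = w / base_rtt      # E = window / BaseRTT
    observed = w / rtt           # O approximated as window / current RTT
    diff = expected - observed   # estimated backlog in the network
    if diff < a:
        return w + 1             # too little queued: probe for more bandwidth
    if diff > b:
        return w - 1             # too much queued: back off
    return w                     # within [A, B]: leave the window unchanged

print(vegas_update(10, 0.100, 0.105))   # diff ~ 4.76 > B, so the window shrinks to 9
print(vegas_update(10, 0.100, 0.101))   # diff ~ 0.99 < A, so the window grows to 11
```

The quantity E − O is, up to units, the per-connection router backlog estimate discussed above.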
8.1.5 Sierra
Sierra is a simple, yet novel, Black Box algorithm that uses three basic parameters to optimize network usage:
• RTTs (Round Trip Times), introduced earlier
• Ingress Rates (Rin)
• Egress Rates (Rout).
No exponential smoothing is performed. Sierra is further explicated in [7] and other work in preparation.
Briefly, the control equations for SSS (Sierra Smart Start) and SCAM (Sierra
Congestion Avoidance Mechanism) are as follows:
SSS:
Rin /= 2, until Rin ≤ Rout
SCAM:
Rin ++, if Rin ≤ Rout
Rin −−, if Rin > Rout
Rin and Rout pertain to the transmission rates at ingress and egress. Egress rates are
continually calculated at the receiver and relayed back to the sender, using control
packets.
We use this intuitively appealing approach to determine the average steady state
throughput for Sierra.
8.2 QUANTITATIVE MODELING
As a general scenario for all further quantitative analysis, we initially consider a
simple setting wherein a single host accesses a single link, en route to a single
destination. We will assume that "L" is the capacity of the link, in packets/sec, and let "t" be the one-way propagation delay (at the speed of light). It is very easy to see that the transmission (service) delay of one packet at the link is 1/L. Hence the round trip time is given by
T = 2 (t + 1/L)
We see that if the link capacity is, say, 50 Mbps, with a packet size of 500 bytes, then
L = 12,500 packets/sec
1/L = 0.08 msecs
If the link length is, say, 1000 km, then the one-way propagation delay is
t = [ 10^6 m ] / [ 3 × 10^8 m/s ] ≈ 3.3 msecs
Therefore,
T = 2 (3.3 + 0.08) ≈ 6.8 msecs.
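A quick check of this arithmetic, taking t as the one-way propagation delay as in the formula T = 2(t + 1/L) (1000 km at 3 × 10^8 m/s):

```python
# Round trip time T = 2 * (t + 1/L) for a 50 Mbps link, 500-byte packets,
# and 1000 km of one-way propagation at the speed of light.

link_bps = 50e6
pkt_bits = 500 * 8
L = link_bps / pkt_bits          # link capacity in packets/sec
service = 1.0 / L                # per-packet service (transmission) delay, sec
t = 1.0e6 / 3.0e8                # one-way propagation delay, sec

T = 2.0 * (t + service)
print(L)               # 12500.0 packets/sec
print(service * 1e3)   # 0.08 msec
print(T * 1e3)         # ~6.83 msec
```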
The factor LT is usually called the Bandwidth-Delay product.
Let B stand for the buffer size at the link. With a propagation delay of t in each direction, (Lt) packets are in transit each way. Clearly, the total number of un-acknowledged packets the path and buffer can hold is
# = LT + B
It is clear that if
W > LT + B
then there is buffer overflow and the attendant packet loss.
The above inequality will be the cornerstone for all our further analysis of the performance of the different congestion control algorithms.
In a nutshell, we will presume that if
W > LT + B
there will be exceptions in terms of undelivered packets.
8.2.1 Modeling Tahoe/Reno
Slow Start
Packets are discharged every ( nT + m/L ) seconds. It is easy to see that
W( t = nT + m/L ) = 2^{n−1} + m + 1, with 0 ≤ m ≤ 2^{n−1} − 1
Similarly, the queue length is
Q( t = nT + m/L ) = m + 2, with 0 ≤ m ≤ 2^{n−1} − 1
Therefore,
Qmax = 2^{n−1} + 1
Wmax = 2^n
We denote the time between <t = nT> and <t = (n+1)T> as a “sub-epoch” wherein
the window size doubles.
As a further approximation, ssthresh (introduced earlier) reaches
ssthresh = [ LT + B ] / 2
This implies that
2^{n−1} ≤ [ LT + B ] / 2
n ≤ log₂( LT + B )
where
n = the number of sub-epochs in SS
L = the link capacity
T = the average round trip time
B = the link buffer size.
At its peak, 2^{n−1} = ( LT + B )/2, a realized upper bound.
In principle, Q is upper bounded by B, which implies
Qmax ≤ B
wherefrom
( LT + B + 2 ) / 2 ≤ B
whence B ≥ LT + 2
This is our simple condition for the non-occurrence of exceptions (timeouts/TD). If this inequality is satisfied, there is no buffer overflow.
At the extreme,
Qmax = B
2^{n−1} + 1 = B
so
n = 1 + log₂( B − 1 )
Wmax = 2 ( B − 1 )
Notice that
W ~ 2^{t/T}
whence
Tss = T log₂ [ 2( B − 1 ) ] = T ( 1 + log₂ [ B − 1 ] )
Similarly,
Nss (packets delivered in SS) = 2B − 3
Congestion Avoidance
We have, measuring time from the start of the CA phase,
W(t) = ssthresh + at = ( LT + B )/2 + at
where a is the slope of the CA line, an approximation to the (square root) growth curve of roughly one packet per RTT. The window grows until W = LT + B, so the duration of the phase is
Tca = [ LT + B ] / ( 2a )
and the packets delivered are
Nca = ∫ W(t)/T dt = [ ( LT + B ) / 2T ] Tca + a Tca² / ( 2T )
Thus,
Throughput (Tahoe) = [ Nss + Nca ] / [ Tss + Tca ]
where
Nss = 2B − 3
Tss = T [ 1 + log₂( B − 1 ) ]
Nca = [ ( LT + B ) / 2T ] Tca + a Tca² / ( 2T )
Tca = ( LT + B ) / ( 2a ).
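Plugging illustrative numbers into these expressions (with B chosen at the no-overflow threshold B = LT + 2, the CA slope taken as a = 1/T, i.e. one packet per RTT, and Nca evaluated as the integral of W(t)/T with W(t) = (LT + B)/2 + at; all three choices are ours):

```python
import math

# Sample Tahoe throughput from the SS/CA expressions above. The model is
# deliberately crude (it ignores queueing growth), so treat this as indicative.

L = 12500.0              # link capacity, packets/sec
T = 0.0068               # round trip time, sec
B = L * T + 2            # buffer at the no-overflow threshold
a = 1.0 / T              # CA slope: ~1 packet per RTT, in packets/sec

Nss = 2 * B - 3                              # packets delivered in Slow Start
Tss = T * (1 + math.log2(B - 1))             # duration of Slow Start
Tca = (L * T + B) / (2 * a)                  # CA: window grows (LT+B)/2 -> LT+B
Nca = (L * T + B) / (2 * T) * Tca + a * Tca ** 2 / (2 * T)

throughput = (Nss + Nca) / (Tss + Tca)       # packets/sec over the SS + CA epoch
print(round(throughput))
```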
8.2.2 Modeling Vegas
The control equations were presented earlier. With W = LT + B, we have
E − O = { [ LT + B ][ T + n/L ] − T [ LT + B + n ] } / { T [ T + n/L ] }
= [ Bn/L ] / [ T( T + n/L ) ] (Brakmo & Peterson, 1995)
where n is the number of packets queued at the link. The lower control threshold gives
a ≥ ( B/L ) / ( T²/n + T/L )
n ≤ a L T² / ( B − Ta )
So,
E[n] ≤ a L E[T²] / ( B − a E[T] ) = a L T² / ( B − aT )
Hence,
Throughput = E[ LT + B + n ] / E[ T + n/L ]
= { LT + B + E[n] } / { T + E[n]/L }
With E[n] = [ a L T² ] / [ B − aT ], the
Vegas throughput = { ( LT + B )( B − aT ) + a L T² } / { T ( B − aT ) + a T² }
8.2.3 Modeling Sierra
As will be noted, the ingress rate goes up, goes down, or stays flat, in unit steps. Let the probabilities of these be as follows:
• P (+) = 1/m
• P (−) = 1/l
• P (flat) = 1/s.
Let K be a sample ingress rate, at discrete time t. Being interested in steady-state performance, we skip the Jump Start state completely, and assume a starting rate of K0 at time t = 0. To reach K from K0, we will need steps P (Plus), L (Less), and S (Same).
We readily have:
P + L + S = t
P − L = K − K0
We have also the following probability:
P [ Rate = K @ Time = t ] = (1/m)^P (1/l)^L (1/s)^S
Therefore, the average throughput is the mathematical expectation of the Ingress
Rate, as follows:
Throughput (at Time = t) = E [Rate (t)] = ∑ K P [Rate = K @ Time = t]
Average Throughput = Lim t -> ∞ [Throughput (at Time = t)]
Continuing, with P = [ t − S + K − K0 ]/2 and L = [ t − S − K + K0 ]/2, we have
P [ Rate = K @ Time = t ] = (1/m)^{[t − S + K − K0]/2} (1/l)^{[t − S − K + K0]/2} (1/s)^S
= (1/m)^{t/2} (1/l)^{t/2} × (1/m)^{[K − K0 − S]/2} (1/l)^{[K0 − K − S]/2} (1/s)^S
Note that the first two factors do not depend on K and are absorbed in the normalization; hence we may discard them in the first instance. So we have
E [ Rate(t) ] = ∑ K (1/m)^{[K − K0 − S]/2} (1/l)^{[K0 − K − S]/2} (1/s)^S
= (1/m)^{[−K0 − S]/2} (1/l)^{[K0 − S]/2} (1/s)^S × ∑ K ( l/m )^{K/2}
Letting S → 0, we have
E [ Rate(t) ] = (1/m)^{−K0/2} (1/l)^{K0/2} × ∑ K ( l/m )^{K/2}
wherefrom, summing the series ∑ K y^K = y/(1 − y)² with y = (l/m)^{1/2},
Average Throughput (Sierra) = (1/m)^{−K0/2} (1/l)^{K0/2} ( l/m )^{1/2} / [ 1 − (l/m)^{1/2} ]²
This is our final expression for the average throughput of Sierra.
It is readily seen that this yields a better performance than that of Tahoe, Reno, New Reno and Vegas. We have used mathematical machinery to a level that is consistent with what can be observed in the literature on the subject; various references are cited (Hasegawa et al. 1999; Allman et al. 2000).
We have now introduced a number of concepts, starting from a basic discussion of the Transport Layer. We presented a number of popular congestion control algorithms, including Tahoe/Reno, New Reno, Vegas and Sierra. In the first half of the chapter, we developed basic mathematical models to analytically study the behavior and performance of these protocols. In the second half of the chapter, we deal with OPNET Modeler simulation results for all the protocols. Sample calculations were also presented, which appear to indicate the relative superiority of Sierra (Jagannathan & Matawie, 2009).
Further research is needed when we deviate from the one-flow unicast model, to a
multi-source, shared-link model, wherein many protocols compete for bandwidth,
and notions such as fairness (Hasegawa et al. 1999) come into play. Whereas there
have been such, if complex, analyses, the buffer overflow paradigm should provide a
simple tool to relatively compare the performance.
8.3 THE SIMULATION-IN-SOFTWARE PROJECT
The focus of this section is on (discrete event) simulation, using the popular OPNET Modeler software tool. The simulation exercise in software, as a project, is much more involved than just a computer programming task using the software tool (Ibanez & Nichols, 1998).
Two major simulation tools were evaluated as part of this study, as follows:
• NS 2 (Network Simulator)
• OPNET Modeler.
To gain further insight into the project, we set out some salient aspects of both these
tools as such.
8.3.1 NS 2
The Network Simulator NS 2 tool is a widely adopted and deployed software tool used for the simulation of advanced TCP/IP protocols and algorithms. Like other tools, it is an object-oriented system which can emulate many very complex and real-life network topologies, characteristics, programs and algorithms.
The tool itself was developed many years ago by the Internet and congestion control community at the University of California, Berkeley, and is maintained by academia. The software system that is NS 2 is written in C++ and Object Oriented Tcl (interpreted) scripts (Fall & Floyd, 1996).
The following well known diagram depicts the class hierarchy that is embedded into
the NS 2 system.
An NS network model is constructed by interconnecting components (a.k.a. NS
objects).
Some of these objects include:
Figure 8.1 NS 2 Class Hierarchy
• Nodes: These represent clients, hosts, routers and switches, also known as
Intermediate systems.
• Classifiers: These help determine the outward interface objects, depending on
destination (and on occasion, source) address. Various types of classifiers
include:
o address classifiers
o multicast classifiers
o multipath classifiers
o replicators.
[Figure 8.1 (diagram): the NS 2 class hierarchy, rooted at TclObject → NsObject, branching into Connector (Queue: DropTail, RED; Delay; Agent: AgentTcp with Reno, New Reno, Vegas, Fack, Sack, FullTcp; Trace: Enq, Deq, Drop, Hop) and Classifier (Address, Multicast, Replicator).]
• Links: These are used to connect nodes to construct the network topology. One
defines a pair of head & tail, with an interconnecting queue. Links can be
simplex or duplex. Queues can be of the types:
o DropTail
o fair queuing
o deficit round robin
o RED
o class based queuing.
• Agents: These are the transport layer end points. Broadly, these can be classified
as TCP or UDP agents. We are mainly concerned with TCP agents. These occur
in the following flavors (in NS 2):
o TCP/Tahoe
o TCP/Reno
o TCP/New Reno
o TCP/Sack 1
o TCP/Vegas
o TCP/Fack
o TCP/Sink
o And others, which we will not be concerned with here.
• Applications: These entities sit on top of the transport layer and produce data to
model simulation features. They are activated to the transport endpoints as such.
At the transport layer, we will mainly use TCP and Sierra.
Some main applications include:
o FTP (File Transfer Protocol)
o Telnet (Remote Login).
Traffic Generators:
These can be of the following types:
o Exponential ON-OFF: generates packets at a fixed rate during the ON periods; no packets are sent during OFF periods.
o Pareto ON-OFF: same as exponential ON-OFF, except that the ON-OFF periods are drawn from a Pareto distribution.
o Constant Bit Rate (CBR): generates packets at a constant rate. Random noise may be introduced.
OUR EXPERIENCE WITH NS 2
Our experience with NS 2, over the course of several months, was as follows:
ADVANTAGES:
o TCP Vegas is incorporated.
MAIN DISADVANTAGES:
o not menu driven (procedural, with complex object orientation)
o rather user-unfriendly (too many ad hoc macros)
o can be non-robust at times (very frequent core dumps)
o presentation of results/graphs is not automated (graphs are relatively hard to generate)
o limited applications supported.
8.3.2 OPNET Simulator Tool Suite
This suite of software products was developed by OPNET Inc. (formerly MIL3
Inc.) and has been very well received by academics and professional IT
practitioners alike. It is commonly deployed for TCP/IP network performance
simulation projects. The suite includes:
• OPNET Modeler
• IT Guru
• other products (which we will not use here).
Our experience was confined to Modeler, which provided all of the
functionality we needed for our study.

Unlike NS 2, OPNET Modeler is entirely menu driven, with user-friendly and
natural GUI interfaces for all stages of a simulation project. A slew of
built-in libraries model popular TCP/IP protocols and applications (such as IP
QoS, MPLS, RSVP, etc.). A variety of network hardware (routers, switches,
links, etc.) available in the networking marketplace can also be emulated by
the tool.
Opnet Modeler has a three level hierarchy:
• network level
• node level
• process level.
The network level is the highest level of the hierarchy and is used to build a
network by interconnecting nodes with links. The node level allows one to
specify the internal architecture of nodes. Finally, process-level modeling
allows one to specify the functional behavior of objects using Finite State
Machines and State Transition Diagrams (STDs).
The first two levels of the modeling exercise are carried out mainly by either “drag
and drop” or menu selection (drop down choices), and hence we will not discuss
them any further. But a few more words on the process-level modeler are in order.
As a simulation enabler, OPNET Modeler graphically depicts all processes using
State Transition Diagrams (STDs). Actions in each state are called
“executives”, and can be:
• enter executives - upon entering the state
• exit executives - when leaving the state.
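The enter/exit executive idea can be illustrated with a toy finite state machine in Python (purely conceptual; OPNET Modeler itself generates C code, and the state names here are invented):

```python
class State:
    def __init__(self, name, on_enter=None, on_exit=None):
        self.name = name
        self.on_enter = on_enter or (lambda: None)  # "enter executive"
        self.on_exit = on_exit or (lambda: None)    # "exit executive"

class FSM:
    def __init__(self, start):
        self.log = []
        self._enter(start)

    def _enter(self, state):
        self.current = state
        state.on_enter()                  # run the enter executive
        self.log.append(("enter", state.name))

    def transition(self, target):
        self.current.on_exit()            # run the exit executive first
        self.log.append(("exit", self.current.name))
        self._enter(target)               # then enter the new state

idle = State("idle")
send = State("send")
fsm = FSM(idle)
fsm.transition(send)   # idle -> send
fsm.transition(idle)   # send -> idle
```

Each transition thus runs the exit executive of the old state followed by the enter executive of the new one, mirroring the STD semantics described above.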
The Process Editor is used to create Finite State Machines.
Apart from the executive code blocks, the following additional code blocks are also
used in the Process Editor:
• State Variables (SV)
• Temporary Variables (TV)
• Header Block (HB)
• Function Block (FB)
• Diagnostic Block (DB)
• Termination Block (TB).
Each of these code blocks serves the purpose suggested by its name.
Finally, in OPNET Modeler, object parameters are configured using “Parameter
Editors”. These come in four flavors:
• Link Model: models link object parameters. The link is specified in the Link
Editor, which allows one to set its parameters.
• Packet Format Model: This is useful for setting data format structure (TCP, IP,
etc.) for packets that are generated. The Packet Editor is used for these. As
before, the Editor allows one to set attributes.
• ICI Model: This is handy for supporting interrupt-driven communication between
processes. These are generally used to exchange control information between
protocol layers. The ICI Editor is used for this purpose.
• PDF Model: defines statistical distributions, used to set the parameters of
traffic generators. The traffic profile can be set per application; the PDF
Editor is deployed for this purpose.
OUR EXPERIENCE WITH OPNET MODELER
ADVANTAGES:
• robust
• totally menu driven
• user friendly
• advanced support for presentation (graphs, tables, etc.) of simulation results.
DISADVANTAGES:
• TCP Vegas not yet incorporated.
For these reasons, weighing the pros and cons of both OPNET Modeler and NS 2,
a strategic decision was made to deploy OPNET Modeler as the software
simulation tool for this thesis, to evaluate the relative merits of our novel
congestion control algorithm, Sierra.
8.3.3 Simulation Theoreticals
Software-based simulation is widely used in industry for evaluating the
relative performance of complex TCP/IP protocols, algorithms, and models. Its
role and contribution in the field of TCP/IP performance evaluation cannot be
overstated. This process is discussed here, where we deal with simulation
basics, concepts and fundamental issues.
Simulation has many obvious benefits such as:
• convenient alternative when the physical network of choice is not available or
absent
• a variety of workloads and network conditions can be emulated
• repeatable comparison of alternative protocols/architectures
• fine details can be incorporated
• complements quantitative analysis, as in this thesis.
Simulation has been introduced in various works on the subject, including (Jain et al.
2004; Law & Kelton, 2001).
For the simulation process, the following 10-step approach is generally
recommended:
1. Define the Objectives of the study.
2. Construct reference NW model.
3. Select the fixed parameters.
4. Select performance metrics.
5. Determine variable parameters.
6. Embed all of the above in software.
7. Construct simulation software programs.
8. Execute the simulation program.
9. Collect performance data.
10. Present and interpret results.
We will return shortly to the implementation of these 10 steps in our project.
It cannot be overstated that there is far more to simulation than programming
the chosen simulator tool; it would be a grave mistake to plunge directly and
headlong into programming.
For now, we continue with our discussion of simulation technology.
Continuous vs. DES
The system modeled can be either continuous (e.g., flow of fluids in a
reactor) or discrete (e.g., packet queues in a router); depending on the
nature of the model, one has either a continuous or a discrete simulation.
Note that the main component of DES (Discrete Event Simulation) is a
time-ordered list of events waiting to happen.
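A minimal DES kernel can be sketched in Python: the event list is kept as a priority queue ordered by timestamp, and executing an event may schedule further events (an illustrative toy, not the internal design of NS 2 or OPNET):

```python
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._events = []    # heap of (time, seq, action)
        self._seq = 0        # tie-breaker for events at equal times

    def schedule(self, delay, action):
        """Schedule a zero-argument callable to fire after `delay` seconds."""
        heapq.heappush(self._events, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until=float("inf")):
        """Pop and execute events in timestamp order up to `until`."""
        while self._events and self._events[0][0] <= until:
            self.now, _, action = heapq.heappop(self._events)
            action()         # an event may schedule further events

# Example: a self-scheduling arrival process, one arrival per second.
sim = Simulator()
arrivals = []

def arrival(n):
    arrivals.append((sim.now, n))
    if n < 3:
        sim.schedule(1.0, lambda: arrival(n + 1))  # next arrival in 1 s

sim.schedule(0.5, lambda: arrival(1))
sim.run()
```

The simulation clock jumps from event to event rather than advancing continuously, which is the essential difference between discrete and continuous simulation.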
Terminating vs. Steady State
A terminating simulation is used to study a system for a clearly defined period of
time, or a number of events, e.g., performance of a new protocol stack between 9 am
and 5 pm. Peak hour traffic is simulated, with the simulation terminating after
exactly 8 hours.
A steady-state simulation is a different matter: it cannot simply be
terminated after a fixed period; one must run it until the system reaches
steady state. For example, we may want to study the long-term packet loss
rates in a congested router.
Synthetic vs. Trace Driven
All simulation activity needs input traffic patterns. Often, traffic is
“synthetically” generated by random number generators; for this, a pre-defined
model (Poisson, exponential, ON-OFF, self-similar) must first be selected.

Although random generators are a popular choice, they can never fully model
the actual traffic in a given network. To come closer, one performs
“trace-driven” simulation.
The steps involved are:
• Capture traces of packet arrival times (from an operational network).
• Process the traces.
• Use these traces as simulation inputs.
Note that no random traffic generation is involved. The relative performance of
several algorithms and models can be compared.
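These steps can be sketched in Python as follows; the trace file name and its format (one arrival time per line, in seconds) are assumptions for illustration:

```python
def load_trace(path):
    """Step 1-2: read and process packet arrival times from a trace file."""
    with open(path) as f:
        return sorted(float(line) for line in f if line.strip())

def replay(times, on_packet):
    """Step 3: feed each traced arrival to the simulator, in time order."""
    for t in times:
        on_packet(t)

# Example: write a tiny synthetic trace file, then replay it.
with open("arrivals.txt", "w") as f:
    f.write("0.12\n0.05\n0.30\n")

seen = []
replay(load_trace("arrivals.txt"), seen.append)
```

Because the arrival times come from a captured trace rather than a random generator, repeated runs against different algorithms see exactly the same input workload.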
8.4 SIERRA SIMULATION PROJECT
Here we set out how the 10 steps of the recommended approach discussed above
were carried out in the actual simulation exercise involving Sierra, our novel
transport-layer congestion control algorithm.
The objective of the study was to compare the relative performance of various
congestion control schemes, as follows:
1. TCP Tahoe alone
2. TCP Reno alone
3. TCP New Reno alone
4. Sierra alone
5. Sierra vs. Tahoe
6. Sierra vs. Reno
7. Sierra vs. New Reno.
The reference network model is as follows:
Figure 8.2 Reference simulation topology: four nodes attached to two routers,
interconnected through an IP cloud
The only performance metric studied was average throughput, observed at the
end of each run. The fixed parameters included the traffic profile for the
steady-state DES simulation exercise.
The network model was constructed and fixed parameters incorporated into the
Modeler simulation tool system. Sierra was programmed into the various process
level modelers (discussed above), and the simulation program was exercised.
All simulation exercises were repeated using different random seeds.
Simulation Results
The simulation results appear below. Note that while Sierra delivers
additional throughput compared to the others (see Figures 8.3, 8.4 and 8.5),
it also appears to be fair, in that it does not “hog bandwidth” at the expense
of more compliant and conservative protocols. This is an encouraging reading
of the graphs presented (see Figures 8.6, 8.7 and 8.8).
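Fairness of this kind is commonly quantified with Jain's fairness index (Chiu & Jain, 1989), which equals 1 when all flows receive identical throughput; a small sketch with hypothetical throughput values:

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    s = sum(throughputs)
    return s * s / (n * sum(x * x for x in throughputs))

equal = jain_index([50.0, 50.0])    # perfectly fair allocation
skewed = jain_index([90.0, 10.0])   # one flow hogging bandwidth
```

A value near 1 for two competing flows indicates the bandwidth is being shared evenly, while values well below 1 indicate one flow is starving the other.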
Figure 8.3 Reno throughput
Figure 8.4 New Reno throughput
Figure 8.5 Sierra throughput
Figure 8.6 Sierra vs. Tahoe, relative throughput
Figure 8.7 Sierra vs. Reno, relative throughput
Figure 8.8 Sierra vs. New Reno, relative throughput
8.5 CONCLUSIONS
In this thesis, we reviewed a number of technologies deployed in today’s computer
networks. All aspects and layers involved in constructing such networks were
presented, discussed and analyzed.
We then focused our attention on the design and development of models for the
Black Box aspect of the congestion control problem, which is so critical today
as applications contend for bandwidth at the edges of the network, whether the
Internet itself or private wide area networks.
We introduced the Black Box vs. White Box dichotomy for congestion control
models, which is so critical for any appropriate treatment of this problem.
Specifically we excluded treatment of White Box modeling, which is deferred to
future work.
Black Box modeling has been the subject of numerous studies, analyses,
refinements and improvements over the years. A comprehensive analysis of these
developments was given in Chapter 6, with a critical eye toward further
refinement.
Our novel protocol, Sierra (the subject of five comprehensive research papers), was
presented and developed from the ground up. Further work in this project, from a
simulation/mathematical perspective, was planned out in detail.
Later, we carried out a comprehensive consolidation of results from control
theory and convex optimization theory, following the lead of F.P. Kelly (one
of the founding fathers of mathematical congestion control). We then collated
a range of techniques and results generally applicable to the stochastic
analysis of congestion management.
Finally, in Chapter 8, we applied the programme laid out in earlier chapters
to show, quantitatively, that Sierra offers better throughput than its main
competitors. We also carried out a software simulation project using the
popular OPNET tool, which again demonstrates the fairness and superiority of
Sierra. As a result, we are able to demonstrate the merits of our novel
protocol, Sierra, from both an analytical and an experimental perspective.
Further research is indicated in the following areas:
• incorporation of White Box methods into this project (RED, BLUE, etc.)
• further simulation, using ECN and WHITE BOX techniques
• the fit of ECN into our project
• investigation of the results of tweaking Sierra parameters on the ensuing
available throughput and fairness
• application of comprehensive stochastic modeling techniques to White Box
protocols.
We expect this work to be carried out over the next few years, and we hope to
be involved in that exploration.
REFERENCES
Aggarwal, A., Savage, S. and Anderson, T. Understanding the performance of TCP
pacing. Proceedings of INFOCOM 2000 (March 2000) vol. 3, 1157-1165
Ahn, J., Danzig, P., Liu, Z, and Yan, L. Evaluation of TCP Vegas: emulation and
experiment. SIGCOMM Symposium on Communications Architectures and Protocols
(1995)
Allman, M., Balakrishnan, H., and Floyd, S. Enhancing TCP’s loss recovery using
limited transmit. IETF RFC, August 2000
Anjum, F., & Tassiulas, L. Fair bandwidth sharing among adaptive and non-adaptive
flows in the Internet. Proceedings of IEEE INFOCOM ’99 (March 1999)
Athuraliya, S., Li, V.H., Low, S, and Yin, Q. REM: Active Queue Management.
IEEE Network (2001)
Awduche, D.O., Malcolm, J., Agogbua, J, O’Dell, M, and McManus, J.
Requirements for Traffic Engineering over MPLS. IETF RFC 2702 (September
1999)
Bajko, G., Moldovan, I., Pop, O., Biro, J. TCP flow control algorithms for routers.
Technical Report, Technical University of Budapest, Hungary, 1999
Balakrishnan, H., Rahul, H., and Seshan, S. An integrated congestion management
architecture for Internet hosts. Proceedings of ACM SIGCOMM (August 1999) 175-
187
Balakrishnan, H., Seshan, S., Amir, E., and Katz, R. Improving TCP/IP performance
over wireless networks. Proceedings of 1st ACM conference on Mobile
communications and Networking (mobicom) (November 1995)
Barford, P. & Crovella, M. Generating representative web workloads for network
and server performance evaluation. Proceedings of ACM SIGMETRICS ’98 (June
1998)
Barford, P., & Crovella, M. A performance evaluation of hyper text transfer
protocols. Proceedings of ACM SIGMETRICS ’99 (March 1999)
Benmohammed, L. & Meerkov, S.M. Feedback control of congestion in packet
switching networks: the case of a single congested node. IEEE/ACM Transactions on
Networking 1,6, (1993) 693-707
Bennett, J., Benson, K., Charny, A, Courteney, W., and LeBoudec, J-Y. Delay jitter
bounds and packet scale rate guarantee for expedited forwarding. INFOCOM 2001
(April 2001) vol. 3, 1502-1509
Bennett, J. & Zhang, H. WF2Q: Worst-case fair weighted fair queueing.
Proceedings of IEEE INFOCOM ’96 (San Francisco, 1996), vol. 1, 120-128
Bertsekas, D. Nonlinear Programming, Athena Press, 2003
Black, U. Physical Layer Interfaces and Protocols, IEEE Computer Society Press,
1998
Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W. An
architecture for differentiated services. IETF RFC 2474, December 1998
Bonomi, F., & Fendick, K. The rate-based flow control framework for the available
bit rate ATM service. IEEE Network Magazine (March/April 1995), 25-39
Boutremans, C., & LeBoudec, J-Y. A note on the fairness of TCP Vegas. Broadband
communications (2000)
Boyd, S. & Vanderberghe, L. Convex Optimization. CUP, 2006
Braden, B., Clark, D., Crowcroft, J., Davie, B, Deering, S., Estrin, D., Floyd, S.,
Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wrowclawski, J., and Zhang, L. Recommendations on queue management and
congestion avoidance in the Internet. Internet RFC 2309, 1998
Braden, R. Requirements for Internet hosts: Communication Layers. RFC 1122, 1989
Brakmo, L. & Peterson, L. TCP Vegas: End to End congestion avoidance on a global
Internet. IEEE Journal on Selected Areas on Communication 13,8 (October 1995)
1465-1480
CCIE Fundamentals, Cisco Systems, 2000, Cisco Press
Cerf, V., & Kahn, R. A protocol for packet network intercommunication.
Transactions on Communications 22,5 (May 1974) 637-648
Chapman, A., & Kung, H.T., Traffic management for aggregate IP streams.
Proceedings of CCBR (Ottawa, March 1999)
Charny, A., Clark, D., and Jain, R. Congestion control with explicit rate indication.
Proceedings of the IEEE International Communications Conference (ICC) (June
1995) 1954-1963
Chiu, D., & Jain, R. Analysis of the increase/decrease algorithms for congestion
avoidance in computer networks. Journal of Computer Networks 17,1 (June 1989)
1-14
Clark, D., & Fang, W. Explicit allocation of best-effort packet delivery service.
IEEE/ACM Transactions on Networking 6,4 (August 1998) 362-373
Clark, D., & Tennenhouse, D. Architectural considerations for a new generation of
protocols. ACM SIGCOMM ’88 (1988)
Clark, D.D. The design philosophy of the DARPA Internet. ACM SIGCOMM ’88
(1988)
Clark, D.D., Lambert, M.L., and Zhang, L. NETBLT: A high throughput transport
protocol. SIGCOMM Symposium on Communications Architectures and Protocols
(August 1987), 353-359
Coltum, R. OSPF: An Internet Routing Protocol. ConneXions: The Interoperability
Report, 3,8, 1999
Crowcroft, J., & Oechslin, P. Differentiated end-to-end Internet services using a
weighted proportional fair sharing TCP. Computer Communications Review 28,3
(July 1998)
Cruz, R.L. A calculus for Network Delay and a Note on Topologies of
Interconnection Networks. PhD thesis, University of Illinois, July 1987.
Davidson, J. An Introduction to TCP/IP, Springer 1992
Davin, J., & Heybey, A. A simulation study of fair queuing and policy enforcement.
Computer Communication Review 20,5 (October 1990), 23-29
Deb, S. & Srikant, R. Rate-based vs. queue-based models of congestion
control.University of Illinois Technical Report, 2003
Deering, S., & Hinden, R. Internet protocol, version 6 (IPv6). Internet RFC 2460,
December 1998
Demers, A., Keshav, S., and Shenker, S. Analysis and simulation of a fair queueing
algorithm. Internetworking: Research and Experience 1 (1990), 3-26
Egevang, K., & Francis, P. The IP Network Address Translator (NAT). IETF RFC
1631, May 1994. Covers NAT, but does not cover Port Address Translation (PAT).
Et, A.L. Closed-Loop Rate-Based Traffic Management. ATM Forum 94-0211R3,
April 1994
Fall, K., & Floyd, S. Simulation based comparisons of Tahoe, Reno and SACK TCP.
ACM Computer Communications Review 26,3 (July 1996), 5-21
Fenner, W. Internet Group Management Protocol, ver. 2. RFC 2236, 1997
Floyd, S. TCP and Explicit Congestion Notification. ACM Computer Communication
Review 24,5 (October 1994), 10-23
Floyd, S. The New Reno modification to TCP’s fast recovery algorithm.
http://www.aciri.org/floyd/papers/rfc2582.txt, April 1999
Floyd, S., & Fall, K. Promoting the use of end-to-end congestion control in the
Internet. IEEE/ACM Transactions on Networking (February 1998)
Floyd, S. & Henderson, T. The New Reno modification to TCP’s Fast Recovery
Algorithm. RFC 2582 (1999)
Floyd, S., & Jacobson, V. Connections with multiple congested gateways in packet-
switched networks part 1: One-way traffic. Computer Communications Review 21,5
(October 1991), 30-47
Floyd, S., & Jacobson, V. Traffic phase effects in packet-switched gateways. ACM
Computer Communication Review 21,2 (April 1991)
Floyd, S., & Jacobson, V. Random early detection gateways for congestion
avoidance. IEEE/ACM Transactions on Networking 1,4 (August 1993)
Floyd, S., Mahdavi, J., Mathis, M., and Podolsky,M. Extension to the SACK option
for TCP. RFC 2883 (2000)
Fraser, A.G. Towards a universal data transport system. IEEE Journal on Selected
Areas in Communication 1,5 (November 1983)
Garcia-Luna-Aceves, J.J. Loop free routing using Diffusing Computations.
IEEE/ACM Transactions in Networking, 1,4, 1993
Georgiadis, L., Guerin, R., and Parekh, A. Optimal multiplexing on a single link:
Delay and buffer requirements. IEEE Transactions on Information Theory (1997)
Golestani, S.J. A self-clocked fair queuing scheme for broadband applications.
Proceedings of INFOCOM (Toronto, Canada, June 1994) 636-646
Goyal, P., Vin, H., and Chen, H. Start-time fair queuing: A scheduling algorithm for
integrated services packet switching networks. Proceedings of ACM SIGCOMM ’96
(Palo Alto, CA, 1996), 167-168
Hasegawa, G., Murata, M., and Miyahara, H. Fairness and stability of congestion
control mechanisms of TCP. INFOCOM ’99 (March 1999) vol.3, 1329-1336
Hashem, E. Analysis of random drop for gateway congestion control. Technical
Report, Laboratory for Computer Science, MIT, 1988
Held, G. & Jagannathan, S.R. Practical Network Design Techniques, 2nd ed. CRC
Press, 2004
Held, G. ABCs of IP Addressing. CRC Press 2000
Held, G. Enhancing LAN Performance. Wiley Press, 2003
Hoe, J. Start-up dynamics of TCP’s congestion control and avoidance schemes.
Master’s thesis, MIT, 1995
Ibanez, J., & Nichols, K. Preliminary simulation evaluation of an assured service.
Internet draft, August 1998
IETF. Integrated services in the Internet architecture: an overview. IETF RFC 1633,
June 1994
Jagannathan, S. & Matawie, K. Issues and trends in unicast congestion management.
IWSM proceedings, Leuven, Belgium (2003), 201-206
Jagannathan, S. & Matawie, K. A survey of issues and recent trends in unicast
congestion control for the Internet. Journal of American Academy of Business,
Cambridge, (2005), 328-335
Jagannathan, S. & Matawie, K. Stochastic modeling of congestion control
algorithms. Global Management & Information Technology Research Conference,
New York, May 2006
Jacobson, V. Congestion avoidance and control. Proceedings of the SIGCOMM ’88
Symposium (August 1988) 314-332
Jacobson, V., & Braden, R. TCP extensions for long-delay paths. IETF RFC 1072,
October 1988, Discusses TCP SACK
Jacobson, V., Nichols, K., and Poduri, K. An expedited forwarding phb. IETF RFC
2598, June 1999
Jacobson, V., Nichols, K., and Poduri, K. The “virtual wire” behaviour aggregate.
Internet draft, March 2000
Jacobson, V., Nichols, K., and Poduri, K. The virtual wire per-domain behaviour.
Internet draft, July 2000
Jain, R. A delay-based approach for congestion avoidance in interconnected
heterogeneous computer networks. Computer Communications Review 19,5 (1989)
56-71
Jain, R. Congestion control in computer networks: Issues and trends. IEEE Network,
(1990) 24-30
Jain, R. Congestion control and traffic management in ATM networks: Recent
advances and a survey. Invited Submission to Computer Networks and ISDN systems
(February 1995) vol. 28, 1723-1738
Johnson, H.W. Fast Ethernet. Prentice Hall, 1996
Katabi, D., Handley, M. and Rohrs, C. Internet congestion control for future
high bandwidth-delay product environments. Proceedings of ACM SIGCOMM, 2002
Karandikar, S., Kalyanaraman, S., Bagal, P., and Packer, B. TCP rate control.
Computer Communications Review 30,1 (2000)
Kelly, F.P. Charging and rate control of elastic traffic. European Transactions on
Telecommunications (1997)
Kelly, F.P. Maulloo, A., and Tan, D. Rate control in communication networks:
Shadow prices, proportional fairness and stability. Journal of the Operations
Research Society (1998)
Kelly, F.P. Reversibility and Stochastic Networks. Wiley, 1979

Kelly, F.P. Models for a self-managed Internet. Philosophical Transactions of
the Royal Society, 2000, 2335-2348

Kelly, F.P. Fairness and stability of end-to-end congestion control. European
Journal of Control, 9 (2003) 149-165
Kent, S., & Atkinson, R. Security Architecture for the Internet protocol. RFC 2401
November 1998
Keshav, S. A control-theoretic approach to flow control. Proceedings of ACM
SIGCOMM ’91 (September 1991)
Keshav, S. On the efficient implementation of fair queueing. Journal of
Internetworking Research and Experience (1991)
Keshav, S. An engineering approach to computer networking. Addison Wesley, 1997
Keshav, S., & Morgan, S.P. Smart: Performance with overload and random losses.
IEEE INFOCOM ’97 (April 1997)
Kolarov, A. Study of the TCP/UDP Fairness issue for assured forwarding per-hop
behaviour in differentiated services networks. 2001 IEEE workshop on High
Performance Switching and Routing (May 2001) 190-196
Kousky, K. Bridging the Network Gap. LAN Technology 6,1 2000

Krol, E. The Hitchhiker’s guide to the Internet. RFC 1118, Sep 1989

Kung, H., Blackwell, T. and Chapman, A. Credit-based flow control for ATM
networks: Credit update protocol, adaptive credit allocation, and statistical
multiplexing. Proceedings of ACM SIGCOMM ’94 (September 1994) 101-114
Kunniyur, S. & Srikant, R. End-to-end congestion control: Utility functions, random
losses and ECN marks. Proceedings INFOCOM 2000 (March 2000)
Lakshman, T.V. & Madhow, U. The performance of TCP/IP for networks with high
bandwidth-delay products and random loss. IEEE/ACM Transactions on Networking
1,4 (October 1997)
Lin, D. & Kung, H.T. TCP trunking: Design, Implementation and Performance IEEE
ICNP (October 1999), 222-231
Lin, D., and Morris, R. Dynamics of Random Early Detection. SIGCOMM ’97
(August 1997)
Lippins, N. The Internetwork Decade. Data Communications 21,14 (October 2001)
Lo Monaco, G., Feroz, A., and Kalyanaraman, S. TCP Friendly marking for scalable
best-effort services on the Internet. Computer Communications Review, 2001
Low, S. & Lapsley, D.E. Optimization flow control: Basic algorithm and
convergence. IEEE/ACM Transactions on Networking 7,6 (December 1999)
Low, S., Peterson, L., and Wang, L. Understanding Vegas: A duality model.
Proceedings ACM Sigmetrics 2001 (June 2001)
Mahdavi, J. & Floyd, S. TCP friendly unicast rate-based flow control. Technical
Note, Jan 8, 1997
Martin, J. & Chapman, K.K. LANs: Architectures and Implementations. Prentice
Hall, 1989
Mathis, M. & Mahdavi, J. Forward Acknowledgement: Refining TCP congestion
control. Proceedings of ACM SIGCOMM ’96 (Palo Alto, CA, August 1996), 281-291
Mathis, M., Mahdavi, J., Floyd, S., and Romanow, A. TCP Selective
Acknowledgement options. Internet RFC 2018 (October 1996)
May, M., Bolot, J., Diot, C., and Lyles, B. Reasons not to deploy RED. Proceedings
IEEE/IFIP IWQoS ’99 (June 1999)
Miller, M.A. Internet Technologies Handbook. Wiley, 2004
Mishra, P.P. & Kanakia, H.R. A hop-by-hop rate-based congestion control scheme.
Proceedings of ACM SIGCOMM ’92 (August 1992)
Mo, J., La., R.J., Anantharam, V., and Walrand, J. Analysis and comparison of TCP
Reno and Vegas. Proceedings of IEEE INFOCOM ’99 (March 1999)
Mo, J. & Walrand, J. Fair end-to-end window-based congestion control. IEEE/ACM
Transactions on Networking (2000)
Morgan, S.P. & Keshav, S. Packet pair rate control-buffer requirements and overload
performance. Technical Note, 1998
Morris, R. TCP behaviour with many flows. Proceedings of ICNP ’97 (October
1997)
Morris, R. Scalable TCP congestion control. PhD Thesis, Harvard University,
January 1999
Nichols, K., Jacobson, V., and Zhang, L. A two-bit differentiated services
architecture for the Internet. Internet Draft ftp://ftp.ee.lbl.gov/papers/dsarch.pdf
November 1997
Ott, T.J. ECN Protocols and the TCP Paradigm
http://web.njit.edu/mlt/papers/index.html
Ott, T.J., Lakshman, T.V., and Wong, L.H. SRED: stabilized RED. Proceedings of
IEEE INFOCOM ’99 (March 1999)
Parekh, A.K. A generalized processor sharing approach to flow control in Integrated
Services Networks. PhD thesis, MIT, February 1992
Parekh, A.K. & Gallagher, R.G. A generalized processor sharing approach to flow
control in Integrated Services Networks - the multiple node case. IEEE/ACM
Transactions on Networking (April 1994) 137-150
Perlman, R. Interconnections: Bridges, Switches and Routers. Addison Wesley, 2001
Plummer, D. An Ethernet Address Resolution Protocol. RFC 826, 1982
Ramakrishnan, K.K. & Floyd, S. A proposal to add Explicit Congestion Notification
(ECN) to IP. Internet RFC 2481 January 1999
Ramakrishnan, K.K. & Jain, R. A binary feedback scheme for congestion avoidance
in computer networks with a connectionless network layer. Proceedings of ACM
SIGCOMM ’88 (August 1988), 303-313
Ramakrishnan, K.K. & Jain, R. A binary feedback scheme for congestion avoidance
in computer networks. ACM Transactions on Computer Systems 8,2 (1990) 158-181
Ramakrishnan, K.K., Jain. R., and Chiu, D.M. Congestion avoidance in computer
networks with a connectionless network layer. Part iv: A selective binary feedback
scheme for general topologies methodology. Technical Report DEC-TR-510, Digital
Equipment Corporation, 1987
Sahu, S., Nain, P., Towsely, D, Diot, C. and Firiou, V. On achievable service
differentiation with token bucket marking for tcp. ACM Sigmetrics ’00 (June 2000)
Seddigh, N., Nandy, B., and Pieda, P. Bandwidth assurance issues for tcp flows in a
differentiated services network. IEEE GLOBECOM ’99 (December 1999) 1792-1798
Shenker, S. Fundamental design issues for the future Internet. IEEE Journal on
Selected Areas in Communication (1995), 1176-1188
Shenker, S. & Wroclawski, J. General characterization parameters for integrated
service network elements. IETF RFC 2212 (September 1997)
Shreedhar, M. & Varghese, G. Efficient fair queueing using deficit round robin.
Proceedings of ACM SIGCOMM (September 1995)
Stallings, W. Local and Metropolitan Area Networks. Prentice Hall, 1997
Stallings, W. Networking Standards. Addison Wesley, 1993
Stevens, W.R. TCP slow start, congestion avoidance, fast retransmit and fast
recovery algorithms. Internet RFC 2001 (January 1997)
Stoica, I., Shenker, S., and Zhang, H. Core-stateless Fair queueing: Achieving
approximately fair bandwidth allocations in high speed networks. Proceedings of
ACM SIGCOMM ’98 (1998) 118-130
Tanenbaum, A. Computer Networks. 4th ed., Prentice Hall, 2004
Tinnakornsrisuphap, P. & Makowski, A.M. Limit behavior of ECN/RED gateways
under a large number of TCP flows. Proceedings of INFOCOM (San Francisco, CA)
Apr. 2003, 873-83.
Tsang, D., & Wong, W. A new rate-based switch algorithm for ABR traffic to
achieve max-min fairness with analytical approximation and delay adjustment.
Proceedings of IEEE INFOCOM ’96 (March 1996) 1174-1181
Turner, J. New directions in communications, or which way to the information age?
IEEE Communications Magazine (1986)
Visweswaraiah, V. & Heidemann, J. Improving restart of idle TCP connections.
Technical Report TR-97-661, University of Southern California, November 1997
Vojnovic, M., Le Boudec, J-Y., and Boutremans, C. Global fairness of additive-
increase and multiplicative-decrease with heterogeneous round trip times. IEEE
INFOCOM 2000 (March 2000) 1303-1312
Wang, S.Y. Decoupling control from data for TCP congestion control. PhD
thesis, Harvard University, 1999
Wang, Z. & Crowcroft, J. A new congestion control scheme: Slow Start and search
(tri-S). ACM Computer Communications Review SIGCOMM 21, 1 (1991) 32-43
Zhang, H. & Ferrari, D. Rate-controlled static priority queuing. Proceedings of IEEE
INFOCOM ’93 (San Francisco, 1993)
Zhang, H. & Ferrari, D. Rate-controlled service disciplines. Journal of High Speed
networking (1994)
Zhang, L. A new architecture for packet switching network protocols. Technical
Report MIT LCS TR-45, Laboratory of Computer Science, MIT, August 1989
Zhang, L., Shenker, S., and Clark, D. Observations on the dynamics of a congestion
control algorithm: the effects of two-way traffic. ACM Computer Communications
Review (September 1991)
BIBLIOGRAPHY
Aalto, S. and Lassila, P. Impact of size based scheduling on flow level performance
analysis in wireless down-link data channels. Proceedings of the 20th International
Teletraffic Congress (ITC-20) 1096-1107, 2007, Ottawa, Canada
Ahmed, T., Mehaoua, A., Boutaba, B. and Iraqi, Y. Adaptive Packet Video
Streaming over IP Networks: A cross layer approach. IEEE Journal on Selected
Areas in Communications, vol. 23, no. 2, 385-401, 2005
Chen, C., Li, Z-G., and Soh, Y-C. TCP friendly source adaptation for
multimedia applications over the Internet. Proceedings of the 15th
International Packet Video Workshop (PV ’06), Hangzhou, China, Apr. 2006
Chen, M. and Zakhor, A. Multiple TFRC Connection Based Rate Control for
Wireless Networks. IEEE Transactions on Multimedia, vol. 8, no. 5, 1045-1062,
2006
Cheung, G. and Yoshimura, T. Streaming Agent: A network proxy for media
streaming in 3G wireless networks. IEEE International Packet Video Workshop,
Pittsburgh, PA, USA, Apr 2002
Floyd, S. and McCanne, S. Network Simulator, LBNL Public Domain Software,
http://www.isi.edu/nsnam/ns/.
Hyytia, E., Lassila, P. and Virtamo, J. Spatial Node Distribution of the Random
Waypoint Mobility Model with Applications. IEEE Transactions on Mobile
Computing, vol. 5, no. 6, 680-694, 2006
Jacobs, S. and Eleftheriadis, A. Streaming video using TCP flow control and
dynamic rate shaping. Journal of Visual Communication and Image Representation,
vol. 9, no. 3, 211-222, 1998
Kalman, M., Steinbach, E. and Girod, B. Adaptive media playout for low-delay video
streaming channels. IEEE Transactions on Circuits and Systems for Video
Technology, vol. 14, no. 6, 841-851, 2004
Kilpi, J. and Lassila, P. Micro- and macroscopic analysis of RTT variability in GPRS
and UMTS networks. Proceedings of Networking 2006, 1176-1181, Coimbra,
Portugal
Kim, Y-G., Kim, J. and Jay-Kuo, C.C. TCP-Friendly Internet video with Network
Aware error control. IEEE Transactions on Circuits and Systems for Video
Technology vol. 14, no. 2, 256-268, 2004
Lassila, P. and Kuusela, P. Performance of TCP on low bandwidth wireless links
with delay spikes. European Transactions on Telecommunications, 2007, to appear.
Luna, C.E., Eisenberg, Y. et al. Joint Source coding and data rate adaptation for
energy efficient wireless video streaming. IEEE Transactions on Selected Areas in
Communications, vol. 21, no. 10, pp 1710-1720, 2003
Sastry, N.R. and Lam, S.S. CYRF: A theory of window-based unicast congestion
control. IEEE/ACM Transactions on Networking, vol. 13, no. 2, 330-342, 2005
Schaar, M. van der, and Shankar, S. Cross layer wireless multimedia transmission:
Challenges, principles and a new paradigm. IEEE Wireless Communications, vol.
12, no. 4, 50-58, 2005
Sharma, V., Virtamo, J. and Lassila, P. Performance analysis of the Random Early
Detection algorithm. Probability in the Engineering and Informational Sciences,
vol. 16, no. 3, 367-388, 2002
Sisalem, D. TCP-friendly congestion control for multimedia communication in the
Internet. PhD thesis, Technical University of Berlin, Berlin, Germany, 2000
Vieron, J. and Guillemot, C. Real Time constrained TCP compatible rate control for
Video over the Internet. IEEE Transactions on Multimedia, vol. 6, no. 4, 634-646,
2004
Yan, Y., Katrinis, K. et al. Media- and TCP-Friendly congestion control for scalable
video streams. IEEE Transactions on Multimedia, vol. 8, no. 2, 196-206, 2006
Zhu, P. et al. Joint design of source rate control and QoS aware congestion control
for video streaming over the Internet. IEEE Transactions on Multimedia, vol. 9, no.
2, 336-376, 2007
GLOSSARY
Most of the specialized technical terms used in this thesis are topics in their own
right and do not lend themselves to concise glossary definitions. Accordingly, each
technical term is elaborated on and explained at the point where it is first
introduced in the text.