BLACK BOX MODELLING OF CONGESTION
CONTROL PROTOCOLS FOR COMPUTER
NETWORKS
S. Ravi Jagannathan
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
University of Western Sydney
August 2009
© Copyright by S. Ravi Jagannathan 2009
All Rights Reserved
DEDICATION
This thesis is dedicated to my wife Deepa and to my daughter Priyanka. Their support
and encouragement has been critical to the completion of this project.
ACKNOWLEDGEMENTS
I would like to thank my principal supervisor, Dr. Kenan Matawie at the School of
Computing and Mathematics of the University of Western Sydney, for his incredible
support, understanding, empathy, constant encouragement and great patience. His
guidance has been invaluable in leading me through my thesis research. I also
would like to deeply thank my wife Deepa and my daughter Priyanka
who have provided emotional support throughout my studies in Australia. Without
them, I would not have had the courage to carry this project through to completion.
TABLE OF CONTENTS
Abstract
Acknowledgements
1. INTRODUCTION 1
1.1 Thesis Overview 1
1.2.1 Tiered Architecture 2
1.2.2 LANs & WANs 3
1.2.3 Internetworking 6
1.2.4 Black Box vs. White Box 8
1.2.5 Stochastic Modeling 9
1.2.6 Quantitative Modeling & Simulation in Software 10
2. INTERNETWORKING ISSUES 11
2.1 Introduction 11
2.2 Overview of Internetworking Concepts 11
2.3 Switching Overview 13
2.4 The Tiered Approach 16
2.5 Evaluating Backbone Capabilities 17
2.5.1 Path Optimization 18
2.5.2 Traffic Prioritizing 18
2.5.3 Load Splitting 21
2.5.4 Alternative Paths 21
2.5.5 Encapsulation (tunnels) 22
2.6 Distribution Services 22
2.6.1 Backbone Bandwidth Management 22
2.6.2 Area and Service Filtering 23
2.6.3 Policy-Based Distribution 23
2.6.4 Inter-Protocol Route Redistribution 24
2.6.5 Media Translation 24
2.7 Local Access Services 25
2.7.1 Value Added Addressing 25
2.7.2 Network Segmentation 26
2.7.3 Broadcast vs. Multicast 26
2.7.4 Naming, Proxy & Local Cache 27
2.7.5 Media Access Security 28
2.7.6 Router Discovery 28
2.7.7 ICMP 28
2.7.8 Proxy ARP 29
2.7.9 RIP 29
2.8 Constructing Internets By Design 29
2.9 Using Switches (Revisited) 30
2.9.1 Switches vs. Routers 30
2.9.2 Key Issues 31
3. NETWORK PERFORMANCE CHARACTERISTICS 32
3.1 Introduction 32
3.2 Frame Operations 33
3.2.1 Ethernet Frames 33
3.2.2 Fast Ethernet Frames 38
3.2.3 Gigabit Ethernets 39
3.2.4 Frame Overhead 41
3.3 Availability Levels 41
3.4 Network Traffic Estimation 43
3.5 An Excursion into Queuing theory 46
3.5.1 Buffer Memory Considerations 47
3.6 Ethernet Performance Details 49
3.6.1 Network Frame Rate 50
3.6.2 GE Considerations 51
3.6.3 Actual Operating Rate 52
3.7 Bridging a Network 52
4. ISSUES AT THE NETWORK, TRANSPORT AND APPLICATION LAYERS 54
4.1 Internetworking Overview 54
4.2 Protocol Architecture 57
4.3 Design Issues 58
4.3.1 Addressing 58
4.3.2 Routing 59
4.3.3 Datagram Lifetime 60
4.3.4 Fragmentation/ Reassembly 60
4.4 Routing and Route Protocols 61
4.5 Routing Revisited 62
4.5.1 Routing Protocols 65
4.5.2 DV Protocols 67
4.5.3 LS Protocols 69
4.6 Excursion into the Transport Layer 71
4.7 Multimedia Service 72
4.8 Delay Calculations 74
4.8.1 10/100/1000 Mbps Ethernets 74
4.8.2 Switches 75
5. ETHERNET LANs REVISITED 77
5.1 Introduction 77
5.2 Transmission Media 78
5.2.1 Twisted Pair Comes in Two Varieties 79
5.2.2 Coaxial Cable 80
5.2.3 Optical Fibre Cable 81
5.3 An Excursion into the Ethernet Family 84
5.3.1 10 Mbps LAN 85
5.3.2 Fast Ethernet (100 Mbps) 87
5.3.3 Gigabit Ethernet (1000 Mbps) 90
5.3.4 10 Gigabit Ethernet 94
5.4 LAN Ethernet Design 97
5.4.1 Campus-Wide VLANs with Multilayer Switching 99
5.5 Switches Revisited 100
5.5.1 Scalability, Latency, Global Effect of Failures/Collisions 101
5.5.2 Encoding Schemes 101
6. BLACK BOX CONGESTION CONTROL 106
6.1 The Basic Problem 106
6.2 Black Box Approach Described 108
6.3 TCP Tahoe & Reno 109
6.4 ACK’ing & ACK Clocking 111
6.5 TCP New Reno 113
6.6 Sack & D-Sack 114
6.7 Fack 115
6.8 Limited Transit 115
6.9 TCP Vegas 116
6.10 Sierra 117
6.11 TCP Friendly Rate Control 122
6.12 Mo-Walrand Algorithm 124
6.13 Packet Pair 124
6.14 Balakrishnan & Seshan’s Congestion Manager 125
6.15 The “Goodness” of any Black Box Solution 126
7. STOCHASTIC MODELLING OF CONGESTION CONTROL ALGORITHMS 127
7.1 Motivation 127
7.2 Introduction 127
7.3 TCP/IP Stack Overview 128
7.4 Common Algorithms 130
7.5 A New Approach to Constrained Optimization 136
7.5.1 Introduction 136
7.5.2 Lagrange – Kuhn – Tucker Sufficiency 141
7.5.3 Penalty Methods 142
7.5.4 Exact Penalty Methods 145
7.5.5 Barrier Methods 146
7.5.6 Utility Functions 148
7.6 Stochastic Models 149
7.6.1 Deterministic Limits 149
7.6.2 Per Source Dynamics 150
7.6.3 Explicit Utility Feedback 151
7.7 Stochastic Models (Concluded) 152
7.7.1 Queue - Width Marking 153
8. QUANTITATIVE MODELING AND SOFTWARE SIMULATION 156
8.1 A Quick Review 156
8.1.1 Tahoe/ Reno 156
8.1.2 SACK 158
8.1.3 New Reno 158
8.1.4 TCP Vegas 158
8.1.5 Sierra 160
8.2 Quantitative Modeling 161
8.2.1 Modeling Tahoe/ Reno 162
8.2.2 Modeling Vegas 164
8.2.3 Modeling Sierra 165
8.3 The Simulation - in - Software Project 168
8.3.1 NS 2 168
8.3.2 Opnet Simulator Tool Suite 171
8.3.3 Simulation Theoreticals 174
8.4 Sierra Simulation Project 177
8.5 Conclusions 182
LIST OF TABLES
3.1 Frame Overhead 42
3.2 Probabilities 48
3.3 Ethernet Frame Processing (Frames/Sec) 51
4.1 Major Network Layer Protocols 64
4.2 IP Addressing Overview 65
4.3 The Major Distance Vector Protocols 69
4.4 The Major Link State Protocols 71
4.5 Some Ethernet Delay Calculations 74
LIST OF FIGURES
2.1 Internetworking Scenario 12
2.2 Routers and Switches 13
2.3 Flow of Inter-subnet traffic with Layer 3 Switching 16
2.4 Priority Queuing 19
2.5 Custom Queuing 20
2.6 WFQ 21
2.7 Policy-based distribution: SAP Filtering 24
2.8 SR/TL Bridging Topology 25
3.1 Ethernet and IEEE 802.3 Frame Formats 33
3.2 Source & Destination Address Field Formats 36
3.3 Fast Ethernet Frame Formats 38
3.4 GE Frame Formats with Carrier Extension 40
3.5 GE Packet Bursting 40
3.6 Subdivided Networks 45
3.7 Typical LAN Information Distribution 46
3.8 Linking LANs with Different Operating Rates 53
4.1 The IP Header 56
4.2 The IPv6 Header 57
4.3 IP Operation 58
4.4 The Count-to-Infinity Problem 67
5.1 The Fast Ethernet Tree 88
5.2 Server/Switch Connection 90
5.3 GE Architecture 91
5.4 GE with Carrier Extension 93
5.5 GE with Packet Bursting 94
5.6 10 GE Architecture 95
5.7 10GE Serial & Parallel Implementations 97
5.8 Traditional Hub & Router Campus Networks 97
5.9 Interconnecting 10Base-T & 10Base-5 Networks 98
5.10 Campus Wide VLAN Design 99
5.11 Multilayer Switching 99
5.12 Connecting coaxial cable NIC to a wire hub 100
5.13 Hubs & Switches 101
5.14 Some basic encoding schemes 103
5.15 8B/10B Encoding 105
8.1 ns2 Class Hierarchy 169
8.2 Reference Simulation Topology 178
8.3 Reno Throughput 179
8.4 New Reno Throughput 179
8.5 Sierra Throughput 180
8.6 Sierra vs. Tahoe, Relative Throughput 180
8.7 Sierra vs. Reno, Relative Throughput 181
8.8 Sierra vs. New Reno, Relative Throughput 181
ABBREVIATIONS
AAA Authentication, Authorization & Accounting
APPN Advanced Peer-to-Peer Networking
AOR Actual Operating Rate
ARP Address Resolution Protocol
AS Autonomous System
ATM Asynchronous Transfer Mode
BGP Border Gateway Protocol
CA Congestion Avoidance
CD Collision Detect
CIDR Classless Inter-Domain Routing
CM Congestion Manager
CRC Cyclic Redundancy Check
CSMA Carrier Sense Multiple Access
DHCP Dynamic Host Configuration Protocol
DLSW Data Link SWitching
DNS Domain Name System
D-SACK Duplicate SACK
DV Distance Vector
EBCC Equation Based Congestion Control
ECN Explicit Congestion Notification
ES End System
ESD End of Stream Delimiter
FACK Forward ACK
FCS Frame Check Sequence
FTP File Transfer Protocol
HSSG High Speed Study Group
ICMP Internet Control Message Protocol
IEEE Institute of Electrical and Electronics Engineers
IETF Internet Engineering Task Force
IGRP Interior Gateway Routing Protocol
IHL Internet Header Length
IP Internet Protocol
IPDU Internet Protocol Data Unit
IS Intermediate System
ISO International Organization for Standardization
LAN Local Area Network
LLC Logical Link Control
LS Link State
MAC Media Access Control
MTBF Mean Time between Failures
MTTR Mean Time to Repair
MTU Maximum Transmission Unit
NAPA Network Attachment Point Address
NetBIOS Network Basic Input/Output System
NLSP Netware Link State Protocol
NS Network Simulator
OFC Optical Fiber Cable
OSPF Open Shortest Path First
PBX Private Branch eXchange
PCS Physical Coding Sublayer
PDF Probability Density Function
PDU Protocol Data Unit
PMA Physical Medium Attachment
PMD Physical Medium Dependent
PSTN Public Switched Telephone Network
QOS Quality of Service
RED Random Early Detection
RFC Request for Comment
RIP Routing Information Protocol
RSVP Resource Reservation Protocol
RTMP Routing Table Maintenance Protocol
RTT Round Trip Time
SACK Selective ACKnowledgement
SAP Service Access Point
SCAM Sierra Congestion Avoidance Method
SNA Systems Network Architecture
SNMP Simple Network Management Protocol
SOF Start of Frame
SQS Sierra Quick Start
SS Slow Start
SSD Start of Stream Delimiter
SSTHRESH Slow Start Threshold
STD State Transition Diagram
STP Shielded Twisted Pair
TCP Transmission Control Protocol
TFRC TCP Friendly Rate Control
Thruput Throughput
TTL Time to Live
UDP User Datagram Protocol
UTP Unshielded Twisted Pair
VLAN Virtual LAN
VOIP Voice over IP
VTP VLAN Trunking Protocol
WFQ Weighted Fair Queuing
ABSTRACT
In this thesis, we look at some fundamental problems facing computer networking
technology. An extensive treatment of these areas is presented in the first instance. A
number of putative concepts, terminology and techniques, as pertinent to numerous
schools of thought, are presented, investigated and critiqued.
We then narrow our focus to some basic issues and trends in the management of
Internet congestion control, as well as many (now) traditional attempts to address
these problems. Key formulations are laid out which set up the problem at hand,
and many more fundamental questions are raised. Key trends observable in the
literature are discussed. This provides a relatively smooth introduction to the
subject.
Reference is then made to Sierra, a novel “Black Box” congestion control
algorithm/protocol, which itself is the subject of serious ongoing refinement, having
already been baselined in five research papers on the subject. The “Black Box”
terminology was in essence conceived many years ago by van Jacobson, and is
revived in this thesis.
A framework for the comparative, stochastic (theoretical) analysis of various
congestion control algorithms/protocols is taken up and investigated. From a
theoretical, quantitative perspective, it is shown that Sierra offers relatively superior
throughput related performance levels.
Finally, we take up the matter of comparative simulation of Sierra, vis-à-vis its
“competitors”. For this project, the popular network simulator OPNET is taken up
and deployed. We present and analyze the results of numerous simulation
experiments. The outcome is that Sierra enhances throughput as against other,
more traditional Black Box algorithms (Vegas, Reno, New Reno, etc.).
1
INTRODUCTION
1.1 THESIS OVERVIEW
Quality of Service (QoS) can be defined as the problem of allocating scarce network
resources to a set of users/applications (web, media, voice, video) in a manner that
best meets their individual needs and willingness to pay (for the resources). The
Internet is not a QoS network; instead it distributes resources approximately equally
in the face of congestion (Miller, 2004; see Turner, 1986 for a retrospective view).
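The notion of distributing resources approximately equally under congestion can be illustrated with the classical max-min fair allocation rule. The sketch below is not from the thesis; it is a minimal Python illustration, with hypothetical flow names, of dividing a single link's capacity so that no flow can be given more without taking from a flow with an equal or smaller share.

```python
def max_min_fair(capacity, demands):
    """Max-min fair division of link capacity among flows (illustrative).
    Serve the smallest demands first; at each step, split the remaining
    capacity equally among the flows not yet served, capped by demand."""
    alloc = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        flow, demand = pending[0]
        share = remaining / len(pending)   # equal split of what is left
        give = min(demand, share)          # never exceed the flow's demand
        alloc[flow] = give
        remaining -= give
        pending.pop(0)
    return alloc
```

For example, three flows demanding 2, 4 and 10 units on a 10-unit link receive 2, 4 and 4 respectively: the small demands are fully met, and the leftover goes to the unsatisfied flow.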
In this thesis, we make a crucial distinction between “Black Box” and “White Box”
congestion management, following terminology introduced by van Jacobson
(Jacobson, 1988), which was not universally adopted by the research community.
Black Box methods are end-to-end, and respond to network congestion without
paying any heed to what is inside the network, in particular the innards of the
in-between routers in a connection’s path. Having optimized network performance
end to end, we can then proceed to optimize the routers themselves, which is White
Box congestion management. White Box congestion management is specifically
excluded from discussion in this thesis, and is the subject of future research and
investigation. We focus on Black Box methods; in particular, we introduce our novel
Black Box congestion management algorithm, Sierra (Jagannathan & Matawie,
2005). Good general references are (CCIE Fundamentals, 2000; Cerf & Kahn, 1974).
Sierra is analytically compared to other Black Box methods, namely
• TCP Tahoe (Jacobson, 1988)
• TCP Reno (Jain, 1990)
• TCP New Reno (Floyd & Henderson, 1999)
• TCP Vegas (Brakmo & Peterson, 1995).
The comparison of Sierra to other Black Box methods such as Mo-Walrand,
Balakrishnan-Seshan Congestion Manager, Keshav’s Packet Pair and TFRC (TCP
Friendly Rate Control) is the subject of future investigation and research (Keshav,
1991; Mahdavi & Floyd, 1997).
We place the discussion of Black Box methods in the context of Local Area
Networks (LANs) and LAN Design.
It is estimated that the great majority (and an increasing number) of installed LANs
are Ethernet based, vis-à-vis Token Ring and Asynchronous Transfer Mode (ATM)
(Jain, 1995). One of the reasons for this may be the significantly less expensive
hardware, combined with acceptable performance. Other arguments may also be
made; in any case, Token Ring/ATM LANs have gone the way of the dodo, and so in
this thesis we specifically exclude them from discussion (Braden, 1999).
We will have occasion to discuss two evolution mechanisms for LAN selection
within an organization (Held & Jagannathan, 2004).
• Bottom Up
• Top Down.
1.1.1 Tiered Architecture
In the former, a three-tiered architecture results, whereas in the latter, an enterprise
LAN strategy drives a centralized approach to the network family.
We also provide an extensive tutorial on the Ethernet family. Commencing with a
review of the various versions of the 10BASE family of networks, we move on to:
• the Fast Ethernet family (Black, 1998)
• the Gigabit Ethernet family (Martin & Chapman, 1989)
• the recently standardized 10 Gbps version of Ethernet (Kousky, 2000).
We return to the introduction of this matter presently (Johnson, 1996; Tanenbaum,
2003).
We look at the constraints associated with each member of the Ethernet family,
including the so-called 5-4-3 rule and cabling limitations. We also examine
transmission media and their characteristics, including twisted pair, coaxial cable as
well as both single mode and multimode fiber optics. (IEEE 802.8 committee)
We then turn our attention to Ethernet performance characteristics. Topics in this
chapter include the various issues at the data link layer, including framing, the
interframe gap, frame overhead, and their effect on performance and information
transfer, as well as reliable data exchange and error management.
We then concern ourselves with issues at the network, transport and application
layers with a further look at frames and their effect on processing, delays, and
latency considerations. Because the Transmission Control Protocol/Internet Protocol
(TCP/IP) is by far the most popular protocol transported by Ethernet, we will note
the delays required when packets are transported within an Ethernet frame. In doing
so, we can determine if it is practical to transport delay-sensitive information, such as
voice/video, over Ethernet.
1.1.2 LANs & WANs
No LAN is an island. There is always the ineluctable need to hook up your LAN with
other networks, either locally to another LAN, or to the Internet, or by a WAN to
another LAN. Chapter 2 deals with internetworking and the problems associated with
interconnecting geographically separated LANs (Cisco Systems, 2004). We note, in
passing, that in certain quarters Network Management (Kousky, 2000) is deemed, at
least partly, a LAN Design issue. Our position is that it is rather an operational tool,
since a network needs to be up and running in the first place before Network
Management techniques can be deployed. Accordingly, we specifically exclude a
detailed discussion of this topic from this thesis (Krol, 1999).
A LAN is a set of locally interconnected devices, connected via the same type or
different types of transmission media. Different LAN devices include (Perlman,
2001):
• stations and segments
• repeaters
• hubs
• bridges (different types)
• LAN switches
• routers
• brouters
• gateways
• network interface cards
• file servers.
We need to consider the different types of network topology, and the overall structure
or architecture of popular LAN solutions (Stallings, 1993). Key topologies include:
• loop
• ring
• bus
• token bus
• tree
• star
• hybrid (mixed).
Ring topologies are of diminishing importance, as they are being phased out by
current networking trends. By topology, we mean the geometry and geography of
interconnected LAN stations and segments.
Transmission media that interconnect devices, together with encoding techniques,
are the rightful subject of an entire chapter in their own right.
The members of the Ethernet family of networks include, as alluded earlier:
• 10 Mbps Ethernet
• Fast Ethernet
• Gigabit Ethernet
• 10 Gigabit Ethernet.
Each member has its own immediate relatives.
For instance, within 10 Mbps, there are (Stallings, 1997):
• 10BASE-T
• 10BASE-2
• 10BASE-5
• 10BROAD-36, and
• 10BASE-F.
Within 100 Mbps Fast Ethernet, there are (Johnson, 1996):
• 100BASE-TX
• 100BASE-FX, and
• 100BASE-T4.
Similarly, within 1000 Mbps Ethernet, better known as Gigabit Ethernet, there occur
(Stallings, 1997):
• 1000BASE-LX
• 1000BASE-SX, and
• 1000BASE-CX.
However, in 10 Gbps operations, better known as 10 Gigabit Ethernet, only one
standardized LAN is defined.
Recall that a LAN solution consists of a transmission medium, a MAC protocol and
an encoding mechanism, all operating over a predefined topology.
The transmission medium is the physical carrier that bears the signals from source
to destination.
The MAC protocol governs the method by which stations access the transmission
medium.
The encoding mechanism defines how data and control codes are coded. Because all
of the LAN technologies that we discuss are baseband (digital transmission of digital
data), we will be concerned with only digital encoding. Different signal elements are
used to represent a binary 1 and a binary 0. A number of different encoding schemes
are discussed.
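As an illustration of digital encoding, the following sketch implements Manchester encoding, the scheme used by 10 Mbps Ethernet, in which each bit becomes two half-bit signal levels with a mandatory mid-bit transition. This is an illustrative Python sketch, not drawn from the thesis; representing the line signal as 0/1 levels is an assumption made for readability.

```python
def manchester_encode(bits):
    """Manchester encoding (IEEE 802.3 convention, illustrative):
    a 0 is sent as high-then-low, a 1 as low-then-high, so every bit
    cell contains a transition that the receiver can use for clocking."""
    table = {0: (1, 0), 1: (0, 1)}   # bit -> (first half, second half)
    out = []
    for b in bits:
        out.extend(table[b])
    return out
```

Note that the encoded stream is twice as long as the data stream, which is why Manchester encoding was abandoned for the higher-speed Ethernet variants in favour of more efficient schemes.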
1.1.3 Internetworking
One uses the term “internetwork” or simply “internet” to denote an arbitrary
collection of networks interconnected in some fashion to provide host-to-host
connectivity and deliver a service. For instance, an organization may have a number
of sites, each implementing a LAN solution, and they might decide to interconnect
these LANs using point-to-point links. (Shenker, 1995; Shenker & Wroclawski,
1997)
The term internet needs to be distinguished from the term Internet, which represents
the global interconnection of many existing networks, including 802.3, 802.5 and
even ATM. In certain circles, the preceding networks are termed as “physical
networks” whereas a collection of connected physical networks is termed as a
“logical network”. In this context, a collection of LANs connected by switches and
bridges is still one network, whereas a collection of networks connected by routers is
called an internet. The key tool for managing internets is IP.
We also have occasion to treat network performance. We first look at the issue of
frame sizes, the length of the information field, and the overhead of a frame.
Accordingly, we deal in detail with the composition of a LAN Ethernet frame.
Can the length of LAN frames or their information carrying capability be adjusted to
achieve enhancement in performance? Likewise, the effect of frame length on bridge
and router operations is investigated.
If we have an up and running LAN and wish to enhance or expand it, we can monitor
current LAN traffic to predict the effect of the expansion or the enhancement on a
similar planned network. But, when a brand new network is being put into place, we
lack a prior baseline. In this situation we need a theoretical framework to estimate
network traffic. That framework occurs through the use of a LAN traffic estimation
technique. We will explore the use of this technique to predict future network growth
and the effect of such expansion on the future planned network as well as the
segmentation of a LAN to improve network performance (Braden, 1999).
We will have a discussion of issues at the network, transport and application layers,
with an overview of internetworking at large. We have observed that some of the key
functions of the ‘router’ include linking different networks, and routing and
delivering data between processes and applications in End Systems (ESs, an ISO
terminology for end hosts) and edge devices on different networks, doing all this
seamlessly and transparently with respect to the network architectures of the
attached networks.
IP (IPv4) is the predominantly popular protocol supporting these functions. We note
that a new standard for IP addressing, variously called IPv6 and IPng, was initially
specified by the Internet Engineering Task Force (IETF) (Turner, 1996; Deering &
Hinden, 1998). IPv6 source and destination addresses are 128 bits in length;
basically, there was a need for more addresses to assign to all conceivable devices.
IPv6 also supports the higher speeds of today’s networks and the mix of multimedia
data streams. It is expected that all TCP/IP installations will eventually migrate to
IPv6, although this process may take several years, if not decades, to be achieved.
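The 32-bit versus 128-bit address lengths can be checked directly with Python's standard `ipaddress` module. The snippet below is illustrative only; the two sample addresses are drawn from the reserved documentation ranges, not from the thesis.

```python
import ipaddress

# IPv4 addresses are 32 bits long; IPv6 addresses are 128 bits long.
v4 = ipaddress.ip_address("192.0.2.1")      # documentation-range IPv4 address
v6 = ipaddress.ip_address("2001:db8::1")    # documentation-range IPv6 address

print(v4.max_prefixlen)   # 32
print(v6.max_prefixlen)   # 128

# The address space grows by a factor of 2**96 -- the core motivation
# for migration: enough addresses for all conceivable devices.
growth = 2 ** (v6.max_prefixlen - v4.max_prefixlen)
```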
1.1.4 Black Box vs. White Box
In Chapter 6, we return to Black Box congestion management in the context of
LANs. What is unicast congestion management? Consider a set of hosts connecting
one-to-one with another set of hosts via the Internet using TCP/IP. The sending host
is called the “ingress” and the receiver is called the “egress”. Each ingress injects a
certain volume of traffic (packets) into the Net, which is duly routed inside the Net
by the in-between routers. Now interpose the Network: the Net is able to handle a
certain amount of traffic in an overall sense. The Net is comprised of ISs (an ISO
terminology for routers), a.k.a. routers. If the traffic volume arriving at a router is
“too high”, then the router, in accordance with its own White Box methods, will
discard or drop packets. As the number of dropped packets rises in the Net, the
situation becomes one of “congestion” (Lin & Morris, 1997).
In this thesis, we examine many traditional mechanisms to handle congestion at end
nodes and then critique them. We introduce a new algorithm (called Sierra) and close
by performing a comparative analysis of all algorithms using experimental
simulation. What we do not do is to optimize router behaviour within the network
itself - this is the White Box problem, and will be the subject of future investigation
and research (Hashem, 1988; Low & Lapsley, 1999; May et al. 1999).
Specifically, we look at (Jagannathan & Matawie, 2005):
• TCP Tahoe (Jacobson, 1988)
• TCP Reno (Jain, 1990)
• TCP New Reno (Floyd, 1999)
• TCP Vegas (Brakmo & Peterson, 1995)
• Mo-Walrand mechanism (Mo & Walrand, 2000)
• Balakrishnan-Seshan Congestion Manager (Balakrishnan et al. 1999)
• Equation Based Congestion Control (Mahdavi & Floyd, 1997)
• Keshav’s Packet Pair (Keshav, 1991)
• Sierra (Jagannathan & Matawie, 2005).
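To give a flavour of the window-based mechanisms listed above, the following Python sketch traces a schematic Tahoe-style congestion window: exponential slow start below ssthresh, linear congestion avoidance above it, and a reset to one segment on loss. It is a deliberate caricature (one window update per "ack" event, i.e., per round trip), not an implementation of any of the listed protocols, and certainly not of Sierra.

```python
def aimd_window(events, ssthresh=16, cwnd=1):
    """Schematic slow-start / congestion-avoidance window trace
    (Tahoe style). 'events' is a sequence of 'ack' or 'loss';
    the window is measured in segments. A sketch, not a TCP stack."""
    trace = []
    for ev in events:
        if ev == "ack":
            if cwnd < ssthresh:
                cwnd *= 2        # slow start: doubling (per-RTT caricature)
            else:
                cwnd += 1        # congestion avoidance: linear growth
        elif ev == "loss":
            ssthresh = max(cwnd // 2, 2)   # halve the threshold
            cwnd = 1                        # Tahoe: restart from one segment
        trace.append(cwnd)
    return trace
```

Running five acknowledgements, a loss, and one more acknowledgement from the defaults yields the familiar sawtooth: the window climbs 2, 4, 8, 16, 17, collapses to 1 on loss, then begins climbing again.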
1.1.5 Stochastic Modeling
Chapter 7 is concerned with the stochastic modeling of congestion control
mechanisms. This chapter is a compilation of results from many diverse sources,
specifically convex optimization as applied to the field of mathematical congestion
control as such. No claim is made to the originality of most of the results in this
chapter; however, we have reorganized most of the theory in a manner that will lend
itself to future exploitation for the purposes of evaluating congestion control
methods. We use the basic “buffer overflow model” paradigm (Kelly, 2000), which
is the cornerstone of our analysis of congestion control algorithms. (Hoe, 1995). It is
true that, unlike systems in modern physics, all aspects of TCP and its progressive
behavioural evolution over time are fully under our control. However, the sheer
scale of TCP’s operation and behaviour is tremendous, and it is probably the very
largest and most complex man-made control system ever evolved. We need
mathematical models to “capture” such a system. We also introduce a new
systematic approach to constrained optimization, using Lagrange-Kuhn-Tucker
multipliers as well as Penalty and Barrier methods. It remains to apply these
extended theorems and propositions to congestion control itself viewed as an
optimization problem (Bertsekas, 2003).
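As a toy instance of the penalty approach mentioned above, the sketch below minimizes x² subject to x ≥ 1 by adding a quadratic penalty μ·max(0, 1−x)² and running plain gradient descent; the penalized minimizer μ/(1+μ) approaches the constrained optimum x = 1 as μ grows. The particular objective, constraint and step size are illustrative assumptions, not taken from Chapter 7.

```python
def penalty_minimize(mu, x=0.0, iters=1000):
    """Quadratic-penalty method for: minimize x**2 subject to x >= 1.
    We minimize the unconstrained surrogate
        x**2 + mu * max(0, 1 - x)**2
    by gradient descent. As mu -> infinity, the minimizer mu/(1+mu)
    tends to the true constrained solution x = 1. Toy sketch only."""
    lr = 1.0 / (2.0 * (1.0 + mu))   # safe step for this smooth objective
    for _ in range(iters):
        grad = 2 * x - 2 * mu * max(0.0, 1.0 - x)
        x -= lr * grad
    return x
```

With μ = 100 the method settles at 100/101 ≈ 0.990, already within one percent of the constrained optimum, illustrating the standard trade-off: larger penalties give better constraint satisfaction but a more ill-conditioned surrogate.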
Finally, we look at stochastic models and their deterministic limits, offering a
Central Limit-like theorem for per-flow dynamics (Kelly, 1979). We also provide a
law of large numbers for stochastic congestion flows, as well as computing the
variance of per-source rates. Among other things, it is demonstrated that Sierra is
superior to Reno and Vegas in terms of delivered throughput.
1.1.6 Quantitative Modeling & Simulation in Software
In Chapter 8 we are concerned with demonstrating the relative superiority of the
novel Sierra congestion control algorithm. For this, we deploy two techniques:
• Quantitative modeling, at a level similar to the theory found in (Jain & Hassan,
2001). New results are obtained for the relative performance of Sierra against
other competitors like Tahoe, Reno, New Reno, etc.
• Software simulation - first, a tool selection project is described, along with our
experience with two popular tools, i.e., ns2 and OPNET. The innards of both
tools are described, and the rationale is given for why we selected OPNET.
Using the latter, a simulation project was designed, constructed and implemented.
The final result is that Sierra provides superior throughput compared to its
predecessors. Sierra is also fair. When coupled with the other algorithms, it does not
“hog bandwidth”. These results give us confidence that Sierra is a superior
innovation. Further experimental study is indicated there; this will concern the
effect of adjustments to Sierra parameters on protocol performance.
2
INTERNETWORKING ISSUES
2.1 INTRODUCTION
One uses the term “internetwork” or simply “internet” to denote an arbitrary
collection of networks interconnected in some fashion to provide host-to-host
connectivity and deliver a service. For instance, an organization might have a
number of sites, each implementing a LAN solution, and they might decide to
interconnect these LANs using point-to-point links. (Stallings, 1997)
This term “internet” needs to be distinguished from the term Internet, which
represents the global interconnection of many existing networks, including 802.3,
802.5, and even ATM. In certain circles, the preceding networks are termed
“physical networks”, whereas a collection of connected physical networks is termed
a “logical network”. In this context, a collection of LANs connected by switches and
bridges is still one network, whereas a collection of networks connected by routers is
called an internet. The key tool for managing internets is IP. (Cisco Systems, 2000).
Also, we study the characteristics of network devices, such as switches, routers, their
various flavours and interactions, vis-à-vis their impact on congestion management.
2.2 OVERVIEW OF INTERNETWORKING CONCEPTS
Network designers are faced with a daunting task when constructing an internetwork,
because it is possible to use a mixture of four hardware devices (Perlman, 2001):
1. hubs (concentrators)
2. bridges
3. switches
4. routers.
Figure 2.1 Internetworking Scenario
These are discussed in detail elsewhere, but their key properties are quickly recalled
here, for the sake of continuity and completeness.
Hubs are used to link multiple users to a single physical unit, in turn connecting them
to the network. They simply regenerate incoming signals out of all ports other than
the one the data was received on, reaching all attached stations.
Bridges serve to subdivide segments within the same network. They function at
Layer 2, independent of the network layer and other higher layer protocols (Layer 3
and above) (Lippins, 2001).
Switches have more ports than bridges and can be considered to represent multiport
bridges with added intelligence. If the number of ports is N, each operating at
10Mbps, then the switch separates collision domains and provides an overall
throughput of 10 x N/2 Mbps. Thus, while protecting the existing cabling
infrastructure, switches also increase performance and bandwidth.
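The 10 x N/2 Mbps figure can be expressed as a one-line calculation: with collision domains separated, up to N/2 disjoint port pairs can converse simultaneously, each at the full port rate. The helper below is an illustrative upper bound, assuming a non-blocking switch and strictly pairwise traffic; it is not a formula from the cited sources.

```python
def switch_aggregate_mbps(ports, port_rate_mbps=10):
    """Upper bound on the aggregate throughput of a non-blocking switch:
    the ports pair off into ports // 2 simultaneous conversations, each
    running at the full port rate (the text's 10 x N/2 Mbps figure)."""
    return port_rate_mbps * (ports // 2)
```

For example, an 8-port 10 Mbps switch tops out at 40 Mbps of aggregate traffic, against the 10 Mbps ceiling of a shared hub serving the same stations.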
Routers separate broadcast domains and connect disparate networks. Driven
primarily by the IP protocol, routers make forwarding decisions based on IPv4
addresses rather than link-layer MAC addresses.
The trend today is to move away from bridges and hubs and on to routers and
switches when designing internets.
2.3 SWITCHING OVERVIEW
Figure 2.2 Routers and Switches
Switching data frames can occur via one of the following techniques (Stallings,
1991; Held, 2000):
• cut-through
• store and forward
• hybrid
• fragment-free.
With cut-through switches, only the destination MAC address of an incoming frame
is examined, and a forwarding decision is made based on that address alone. No
other checks occur. Thus, a cut-through switch has the lowest delay and should be
considered for supporting real-time applications, such as VoIP and streaming media.
With a store and forward switch, the entire frame is copied into the switch’s internal
memory, examined for occurrence of any errors, then sent out the right port. Because
the entire frame is stored and the frame is variable in length, the delay is also
variable.
Because errored frames are discarded by the destination device on a LAN, the
necessity of such error checking by a LAN switch has been questioned. However,
the filtering capability can still be useful for directing routing protocols carried in
frames to destination ports, more readily than by frame destination address alone,
especially when hundreds or thousands of devices are attached to a large switch.
Hybrid switches represent a combination of cut-through and store and forward
switches. They work as cut-through switches until a certain level or threshold of
errors is reached, at which point they revert to performing as store and forward
switches. This means that the efficiency of these switches is also variable. The
major advantage of a hybrid switch is its minimal latency when error rates are low,
combined with its ability, once it reverts to store and forward operation, to discard
errored frames when error rates rise (Krol, 1999).
A fourth type of switch, fragment-free, examines the first 64 bytes of incoming
frames for errors. If none are found, it pushes out the entire frame (without storing
it), on the premise that most errors are likely to occur in the first 64 bytes.
Fragment-free switches have a slightly longer delay than cut-through switches, but
the delay is uniform, so they too can usually be used for VoIP and streaming media
applications (Braden, 1989).
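The latency ordering among these switching techniques follows directly from how many bytes must arrive before a forwarding decision can be made. The sketch below computes the serialization delay up to each decision point; hybrid switches are omitted since their behaviour varies with the error rate. The 14-byte (cut-through header) and 64-byte (fragment-free) decision points are illustrative assumptions, not figures from the cited sources.

```python
def forwarding_latency_us(frame_bytes, rate_mbps, mode):
    """Serialization delay, in microseconds, before a switch can begin
    forwarding a frame. Assumed decision points (illustrative):
    cut-through after a 14-byte header, fragment-free after 64 bytes,
    store-and-forward only after the entire frame has arrived."""
    bytes_needed = {
        "cut-through": 14,
        "fragment-free": 64,
        "store-and-forward": frame_bytes,
    }[mode]
    # bits divided by (megabits per second) yields microseconds
    return bytes_needed * 8 / rate_mbps
```

For a maximum-length 1518-byte frame at 100 Mbps, the store-and-forward delay is over 100 microseconds and varies with frame length, while the cut-through and fragment-free delays are both small and, crucially, uniform.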
Maintaining switch operations denotes the building and maintenance of switching
tables, route tables and service tables.
Switching occurs at Layer 2 and routing at Layer 3. In other words, switches work
based on the contents of 6-byte MAC addresses, while routers use 4-byte IPv4 or
16-byte IPv6 addresses (Held, 2000).
Switches automatically build and maintain Layer 2 switching tables to track and
learn MAC addresses. If a destination MAC address is not known, the switch
broadcasts that frame out all ports other than the port the frame arrived on. By noting
the source address of the frame and the port it arrived on, the switch updates its
internal tables via a backward learning process. In comparison, routers are
configured with the IP address of the networks attached to their ports and operate
based on 4- or 16-byte IP addresses.
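The backward-learning behaviour described above can be sketched in a few lines. This is an illustrative model rather than any vendor's implementation; the class name, port numbers and MAC strings are invented for the example.

```python
class LearningSwitch:
    """Minimal sketch of a Layer 2 switch's backward-learning process."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port it was learned on

    def handle_frame(self, src_mac, dst_mac, in_port):
        # Backward learning: record which port the source address arrived on.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # forward out the known port
        # Unknown destination: flood out all ports except the arrival port.
        return sorted(self.ports - {in_port})


sw = LearningSwitch(ports=[1, 2, 3, 4])
print(sw.handle_frame("AA", "BB", in_port=1))  # BB unknown: flood ports 2, 3, 4
print(sw.handle_frame("BB", "AA", in_port=3))  # AA was learned on port 1
```

The second call succeeds without flooding precisely because the first frame's source address populated the table, which is the essence of backward learning.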
With increasing bandwidth-hungry applications on the market, hubs in wiring closets
are rapidly being replaced by LAN switches. There is also an increasing demand for
intersubnet communications, which must flow through a router. In this connection
Figure 2.2 depicts a typical situation showing the relationship between switches and
routers. Switches primarily move data within a local geographical area, such as a
building. In comparison, routers provide long-distance and global interconnectivity.
Data flows from many hosts pass serially through routers, which means that if
there is significant traffic for a server accessed remotely via routers, there is the
ineluctable possibility of congested bottlenecks at the routers.
To partly alleviate this problematic situation, Layer 3 capabilities are being added
throughout many networks, typically within formerly Layer 2 switches. Figure 2.3
depicts such a scenario.
Figure 2.3 Flow of Intersubnet Traffic with Layer 3 Switches
2.4 THE TIERED (LAYERED) APPROACH
Both ISO OSI and TCP/IP reference models are instances of a hierarchical approach
to designing networks. Each layer is ascribed a set of functions or responsibilities
that it provides as services to the layer above. Internetwork design uses a hierarchical
tiered approach to help simplify the overall task. The advantage of a hierarchical
design is modularity, which allows different elements in the tier to be independently
constructed by different vendors and used mutually in an interoperable fashion. This
also facilitates management of change in the internet by containing the cost and
complexity thereof (Cisco Systems, 2000).
Traditionally, hierarchical internet design uses three tiers:
1. Backbone (core) tier – optimal inter-site communication
2. Distribution tier – policy-based connectivity
3. Local-access tier – user access to the network.
The core tier provides high-speed packet switching without any time-consuming
packet manipulation (e.g., filtering, error checks).
The distribution tier interfaces with the core and local-access tiers. It manipulates
data from the local-access tier and passes it on to the backbone.
Some of the functions of the distribution tier include:
• address or area aggregation
• department or workgroup access
• broadcast (and multicast) domain definition
• VLAN routing
• media translation
• security.
Some of the functions of the local-access tier include:
• shared bandwidth
• switched bandwidth
• MAC layer filtering
• microsegmentation
2.5 EVALUATING BACKBONE CAPABILITIES
The evaluation of the backbone capability of a tiered network is extremely important,
because it represents the primary data path. In this section, we will discuss the
following:
• path optimization (Coltum, 1999)
• traffic priorities (Garcia-Luna-Aceves, 1993)
• load splitting (Garcia-Luna-Aceves, 1993)
• alternative paths (Braden, 1999)
• tunneling (Stallings, 1993).
2.5.1 Path Optimization
Recall that in computer networks there are two types of protocols:
1. route protocols
2. routing protocols.
The former have essentially to do with addressing techniques, whereas the latter
pertain to trajectory selection from a fabric of paths available via routers and other
networks.
Convergence occurs when there is a change in the network properties and all routers
subsequently agree upon the optimal routes. This action takes place by means of
neighbour greeting and autoconfiguration.
Routing protocols, discussed elsewhere, come in two varieties:
1. metric optimizing protocols
2. policy-based protocols.
Examples of the former are RIP (Perlman, 2001) and OSPF (Coltum, 1999). An
example of the latter is BGP. IGRP uses a hybrid metric based on bandwidth, load
and delay. Link state protocols like OSPF and IS-IS (Perlman, 2001) minimize the
cost associated with the selected path.
2.5.2 Traffic Prioritizing
Whereas some networks can prioritize homogeneous internal traffic, routers must
prioritize heterogeneous flows. Such categorization provides differentiated treatment,
which ensures that critical data are given an edge over less important flows (Golestani,
1994; Goyal et al. 1996; Shreedhar & Varghese, 1995; Stoica et al. 1998).
There are three main types of queuing (Demers et al. 1990):
• Priority
• Custom
• WFQ.
2.5.2.1 Priority Queuing
Traffic is categorized by a specific metric, such as protocol type. Typically, four
output queues are used:
1. high
2. medium
3. normal
4. low priority.
Figure 2.4 illustrates an example of priority queuing. Note that UDP, which is
typically represented by small segment lengths, such as DNS queries, is shown to
receive high priority in this example. Most Layer 3 routers and switches permit the
administrator to easily define data assigned to different queues.
Traffic        Priority
UDP            High
BLAT           High
DECnet         Medium
Vines          Medium
TCP            Medium
Other          Normal
AppleTalk      Low

Figure 2.4 Priority Queuing
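Strict priority queuing of this kind can be sketched as follows. The protocol-to-priority mapping follows the Figure 2.4 example; the class name and frame identifiers are illustrative, not taken from any router implementation.

```python
from collections import deque

# Protocol-to-priority assignments, following the Figure 2.4 example.
PRIORITY = {"UDP": "high", "BLAT": "high", "DECnet": "medium",
            "Vines": "medium", "TCP": "medium", "AppleTalk": "low"}
ORDER = ["high", "medium", "normal", "low"]


class PriorityQueuing:
    def __init__(self):
        self.queues = {level: deque() for level in ORDER}

    def enqueue(self, frame, protocol):
        # Unlisted protocols fall into the normal queue.
        self.queues[PRIORITY.get(protocol, "normal")].append(frame)

    def dequeue(self):
        # Strict priority: always drain higher-priority queues first.
        for level in ORDER:
            if self.queues[level]:
                return self.queues[level].popleft()
        return None


pq = PriorityQueuing()
for frame, proto in [("f1", "TCP"), ("f2", "UDP"), ("f3", "HTTP")]:
    pq.enqueue(frame, proto)
print([pq.dequeue() for _ in range(3)])  # ['f2', 'f1', 'f3']
```

Note the well-known weakness of strict priority: a steady stream of high-priority arrivals can starve the lower queues indefinitely, which motivates the custom and weighted fair variants below.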
2.5.2.2 Custom Queuing
Custom queuing provides more granularity than priority queuing and supports
multiple higher-layer protocols. It reserves a portion of the bandwidth for a given
protocol, guaranteeing that protocol a pre-determined bandwidth.
Figure 2.5 illustrates an example of custom queuing.
Figure 2.5 Custom Queuing (bandwidth reserved per protocol queue: APPN 40%, TCP/IP 20%, NetBIOS 20%, miscellaneous 20%)
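A byte-count round robin is one common way to realize custom queuing's bandwidth guarantees. The sketch below uses the 40/20/20/20 shares from the Figure 2.5 example; the frame identifiers, lengths, and cycle size are invented for illustration.

```python
from collections import deque

# Bandwidth shares per protocol queue, as in the Figure 2.5 example.
SHARES = {"APPN": 40, "TCP/IP": 20, "NetBIOS": 20, "MISC": 20}


def custom_queue_service(queues, cycle_bytes=1000):
    """One service cycle: each queue may send up to its byte allowance."""
    sent = []
    for name, pct in SHARES.items():
        allowance = cycle_bytes * pct // 100
        q = queues[name]
        while q and allowance >= q[0][1]:  # entries are (frame_id, length)
            frame_id, length = q.popleft()
            allowance -= length
            sent.append(frame_id)
    return sent


queues = {
    "APPN":    deque([("a1", 300), ("a2", 300)]),
    "TCP/IP":  deque([("t1", 150), ("t2", 150)]),
    "NetBIOS": deque([("n1", 250)]),
    "MISC":    deque([("m1", 100)]),
}
print(custom_queue_service(queues))  # ['a1', 't1', 'm1']
```

Each protocol consumes at most its share of the cycle (APPN gets 400 bytes here, the others 200), so no queue can monopolize the link the way strict priority allows.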
2.5.2.3 Weighted Fair Queuing (Keshav, 1997)
The WFQ method uses TDM to segment the available bandwidth among the several
clients on the interface. By assigning weights, each client receives treatment
weighted by a defined metric, such as ToS or arrival rate. Note that if
all arrivals are assigned equal weights, low-volume traffic gets an edge over high-
volume traffic. Figure 2.6 illustrates WFQ (Crowcroft & Oechslin, 1998; Demers et
al. 1990).
WFQ uses an algorithm to dynamically identify data streams at an interface and sort
them into logical queues. Note that in certain cases, such as with SNA (Clark, 1992),
one cannot distinguish between sessions. In DLSw+, SNA traffic is multiplexed over
a single TCP session; in APPN (Joyce & Walker, 1992), it is multiplexed onto
one LLC2 session. Because WFQ treats these sessions as a single conversation, the
algorithm does not lend itself to SNA.
In priority queuing and custom queuing, access lists need to be pre-installed.
However, this is not the case with WFQ, which sorts among specific traffic streams
in real-time.
Figure 2.6 Weighted Fair Queuing
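The per-stream sorting that WFQ performs can be approximated by computing a virtual finish time for each packet: with all packets arriving together, finish time reduces to length divided by weight. This is a simplified sketch of the idea, not a router's actual scheduler, and the flow names are invented.

```python
import heapq


def wfq_order(packets):
    """Order packets by virtual finish time (length / weight).

    packets: list of (flow, length, weight) tuples, one packet per flow,
    all treated as arriving at virtual time zero.
    """
    heap = [(length / weight, flow) for flow, length, weight in packets]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


# With equal weights, low-volume (short) packets are served first,
# matching the observation in the text.
print(wfq_order([("bulk", 1500, 1), ("query", 64, 1), ("voice", 200, 1)]))
```

Raising a flow's weight shrinks its finish times proportionally, which is how a ToS-style metric would buy a flow earlier service in this model.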
2.5.3 Load Splitting
This is exactly what the name implies: load balancing over different paths. Load
splitting can be done with:
• IP (using equal cost paths)
• (E)IGRP (with unequal cost alternatives).
Up to four paths may be used for one destination network. Load splitting of bridged
traffic over serial lines is also possible.
2.5.4 Alternative Paths
The aim here is to provide alternative paths to a destination, in case of
link failures on active networks. End-to-end reliability is achieved only when there is
redundancy throughout the network. Because redundancy is so expensive, most
providers support redundancy on segments carrying mission-critical data.
Routers are the key to reliable internetworking. However, merely making hardware
at the nodes more available does not make the internet more reliable.
Instead, it is necessary to have redundant links as well. Unless all backbone routers
are fault tolerant, it is necessary also to ensure that redundant links should terminate
at different routers. Thus, a fully fault tolerant router situation is not only
prohibitively expensive, it does not address the link reliability issue. We will return
to reliability options later.
2.5.5 Encapsulation (Tunneling) (Stallings, 1993)
Encapsulation or tunneling is a simple operation, which takes packets or frames
from one network and hides them within frames from another protocol.
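At the byte level, tunneling amounts to prefixing (and possibly suffixing) the original frame with the carrier protocol's framing. The sketch below uses invented header and trailer markers; real tunnel headers (GRE, for instance) carry protocol-specific fields.

```python
def encapsulate(inner_frame: bytes, outer_header: bytes,
                outer_trailer: bytes = b"") -> bytes:
    """Tunnel an entire frame as the payload of another protocol's frame."""
    return outer_header + inner_frame + outer_trailer


def decapsulate(outer_frame: bytes, header_len: int,
                trailer_len: int = 0) -> bytes:
    """Recover the original frame at the far end of the tunnel."""
    end = len(outer_frame) - trailer_len
    return outer_frame[header_len:end]


original = b"\x01\x02payload\x03"
tunneled = encapsulate(original, outer_header=b"HDR:", outer_trailer=b":TRL")
assert decapsulate(tunneled, header_len=4, trailer_len=4) == original
```

The round trip illustrates the key property of a tunnel: the inner frame is opaque to the carrier network and emerges unchanged.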
2.6 DISTRIBUTION SERVICES
We include a discussion of the following functionalities (Black, 1998):
• backbone bandwidth management
• area and service filtering
• policy-based distribution
• gateway service
• inter-protocol route redistribution
• media translation.
2.6.1 Backbone Bandwidth Management
To optimize use of the backbone, routers are able to offer features such as:
• priority queuing
• routing protocol metrics
• termination of local sessions.
Queue metrics, overflow mechanisms, and routing protocol metrics are all adjustable,
giving more control over the forwarding of packets through the internet. If a local
session terminates, a router can proxy for it instead of passing all session control
through to the multi-protocol backbone (Fenner, 1997).
2.6.2 Area and Service Filtering (Cisco Systems, 2000)
This functionality is achieved by the use of access lists, which control the movement
of data based on, among other things, network addresses. Service filtering applies
to specific protocols.
2.6.3 Policy-Based Distribution (Coltum, 1999)
A policy in our context is a set of rules governing end-to-end traffic to a backbone
network. For example (Davin & Heybey, 1990):
• A LAN department may send traffic to the backbone using three different
protocols, whereas it may wish to expedite one specific protocol through the
backbone as it contains mission-critical data.
• Another department may wish to exclude all but remote login and e-mail
from entering its LAN.
These are departmental policies, and organizational policies can exist as well. For
example, an organization might decree that no Web-based e-mail should enter or
leave its intranet.
Different policies may require different internetworking technologies, which may all
need to be integrated and co-exist harmoniously.
One possible way to implement policies is via SAPs (Service Access Points). This
situation is depicted in Figure 2.7.
In Figure 2.7, SAPs from the NetWare servers advertise services to clients.
Depending on whether services are provided locally or remotely, SAP filters prevent
SAP traffic from leaving the router interface.
Figure 2.7 Policy-Based Distribution: SAP Filtering
2.6.4 Inter-protocol Route Redistribution (Krol, 1999)
The section above on gateway services concerned enabling two end nodes that use
different route protocols to communicate. By contrast, routers can redistribute
information among different routing protocols (RIP, OSPF, IGRP, etc.), exchanging
routing information at the router. Static routing information can also be redistributed.
2.6.5 Media Translation
These are techniques that translate frames from one network system to another. If
there are attributes in one system with no counterpart in the other, a problem
arises, and different vendors will resolve it differently. For example, when direct
bridging is sought between Token Ring and Ethernet, one uses either SR/TL or SRT
bridging.
SRT allows the router to use both source-route and transparent bridging algorithms.
There is a standard way to convert between SR and translational bridges, as
illustrated in Figure 2.8.
Figure 2.8 SR/TL Bridging Topology (RIFs are lost crossing to the transparent side and gained on the return path)
2.7 LOCAL ACCESS SERVICES
Topics we consider here include:
• value-added addressing
• network segmentation
• broadcast or multicast capability
• naming, proxy, and local cache
• media access security
• router discovery.
2.7.1 Value-Added Addressing (Held, 2000)
When different addressing schemes exist for LANs, such as IP and NetWare, they
interoperate less than perfectly over multi-segmented LANs/WANs.
Helper addressing is a method of forwarding traffic that normally would not be
allowed to transit a router. For example, a client may search for a server by
broadcasting a message that must transit many routers. Normally such frames would be
dropped, but helper addresses allow these messages to pass through routers.
Multiple helper addresses are supported on each router interface to allow forwarding
to remote destinations.
2.7.2 Network Segmentation (Stallings, 1997)
This is an instance of the usage of local access routers to implement local policies
and thus limit unnecessary traffic by segmenting traffic within component segments.
One way to accomplish this is by strategically positioning routers and building in
specific segmentation policies.
For example, a large LAN might be subdivided into segments, such that traffic on a
segment might be limited to:
• local broadcasts
• unicast intra-segment traffic
• traffic for another specific router.
Careful distribution of hosts and clients leads to reduced congestion in the network.
2.7.3 Broadcast or Multicast Capabilities
Routers intrinsically drop broadcast messages. But broadcasts are quite commonplace
and need to be curbed to keep traffic at a manageable level and to reduce broadcast
storms. Again, helper addresses assist in forwarding multicasts and broadcasts.
To fully support IP multicast, the IGMP (Internet Group Management Protocol) must
be deployed on hosts. IGMP enables hosts to dynamically report their multicast
group memberships to a multicast router (Miller, 2004; Fenner, 1997). Multicast
routers send IGMP queries to their attached LANs, and stations respond with their
membership information. The multicast router attached to the LAN then takes
responsibility for sending multicast datagrams from one attached network to all
other networks with multicast membership. If an IGMP query brings no response,
that group is deemed to have no members and no further messages are sent to it.
2.7.4 Naming, Proxy and Local Cache Capabilities
These three router capabilities reduce traffic and enable efficient internet operation.
They include (Held & Jagannathan, 2004):
• naming service support
• proxies
• local caching of network information.
Naming is a well-known mechanism used to resolve names to addresses. Common
examples of naming schemes include:
• IP Domain Name System (DNS) (RFC 1034)
• Network Basic Input Output System (NetBIOS)
• IPX.
A router can act as proxy for a name server. For instance, a list of NetBIOS addresses
can be maintained, avoiding the overhead of transmitting client/server broadcasts in
an SR bridge environment.
In that case the router does the following (Martin & Chapman, 1989):
• Only one (duplicate) query frame is allowed per configured time period.
• A cache of NetBIOS server addresses with client names (and MAC addresses) is
maintained, limiting broadcasts across the network.
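The two behaviours above (duplicate-query suppression per configured time period, plus a name cache) can be sketched together. The class name, period value, and return conventions are illustrative, not drawn from any actual router software.

```python
class NameProxyCache:
    """Sketch of a router's name cache with duplicate-query suppression.

    Forwards at most one identical query frame per configured time period
    and answers from the cache when a name-to-address mapping is known.
    """

    def __init__(self, period=1.0):
        self.period = period
        self.cache = {}       # name -> MAC address
        self.last_query = {}  # name -> time the last query was forwarded

    def learn(self, name, mac):
        self.cache[name] = mac

    def handle_query(self, name, now):
        if name in self.cache:
            return ("answer", self.cache[name])
        last = self.last_query.get(name)
        if last is not None and now - last < self.period:
            return ("suppressed", None)   # duplicate within the period
        self.last_query[name] = now
        return ("forwarded", None)        # broadcast across the network


proxy = NameProxyCache(period=1.0)
print(proxy.handle_query("SRV1", now=0.0))   # ('forwarded', None)
print(proxy.handle_query("SRV1", now=0.5))   # ('suppressed', None)
proxy.learn("SRV1", "00:AA:BB:CC:DD:EE")
print(proxy.handle_query("SRV1", now=2.0))   # answered from the cache
```

Only the first query within each period crosses the network; once the mapping is cached, no query leaves the router at all.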
2.7.5 Media Access Security
This serves to
• keep local traffic from inappropriately accessing the backbone
• keep backbone traffic from inappropriately entering the LAN.
Both problems are alleviated by packet filtering, which reduces traffic levels to
improve performance and, as its name implies, also improves security and reduces
congestion. The most popular filtering mechanism is the access list approach.
2.7.6 Router Discovery (Cisco Systems, 2000)
As its name implies, this service is a process by which hosts discover routers; in
ISO terminology it is known as ES-IS [ES (end system) denotes host stations and IS
(intermediate system) denotes routers]. Limited to exchanges between hosts and
routers, Hello messages are sent by ESs to all routers on the subnet, and by routers
in reverse. Both carry the subnet and Layer 3 addresses of their generating systems.
2.7.7 Internet Control Message Protocol (RFC 1256)
RFC 1256 outlines a router discovery process based on ICMP. There is no single,
standardized protocol for this mechanism.
2.7.8 Proxy ARP
A proxy-ARP-enabled router responds to ARP requests on behalf of the hosts it has a
connection with. This allows hosts to behave as if all other hosts were on the same
network.
2.7.9 Routing Information Protocol
RIP is commonly available on hosts and is used to find the most suitable router for
a given address.
2.8 CONSTRUCTING INTERNETS BY DESIGN
We start with backbone considerations (Held & Jagannathan, 2004):
• multi-protocol routing backbone
• uni-protocol backbone.
When several Layer 3 network protocols are routed through a common backbone,
without encapsulation, the situation is that of a multi-protocol routing backbone. Two
strategies are available:
1. Integrated routing – uses one preferred routing protocol that determines
the metric-minimizing path.
2. Ships in the night – uses a different routing protocol for each route
protocol.
All routers support one specific routing protocol per specific route protocol. They
then encapsulate all other routing protocols within the preferred supported routing
protocol.
2.9 USING SWITCHES (REVISITED) (Held, 2000)
Vendors and implementations are moving away from hubs and bridges to switches
and routers. All switches operate at Layer 2 and have the following benefits:
• superior segmentation
• increased aggregate forwarding
• increased backbone throughput.
LAN switches address end-host requirements for greater bandwidth. By deploying
switches rather than hubs, designers can increase performance and better exploit
existing media. Previously unavailable functionality, such as VLANs, may also become
available when it is incorporated into a switch. In addition, by delivering links to
interconnect existing shared hubs in wiring closets and server farms, scalable
bandwidth becomes available.
2.9.1 Comparison of Switches and Routers (Perlman, 2001)
To conclude our discussion of internetworking, we will summarize the major
differences between switches and routers.
Key features of switches include:
• high bandwidth
• high performance
• low cost
• easy configuration.
Key features of routers include:
• broadcast firewalling
• hierarchical addressing
• inter-LAN communication
• quick convergence
• policy routing
• QoS routing
• security
• redundancy
• load splitting
• traffic flow management.
Note that, as switch technology gains momentum, switches of the future will address
all these router functionalities. Although routers currently have more features than
switches, a new series of switches that include built-in routing capability deserve the
consideration of network designers and analysts when constructing or revising a
network (Jacobson & Nichols, 1999; Jacobson et al. 1999, 2000).
The material in this chapter serves as background for the congestion management
criteria introduced later in this thesis, and has been included to make the thesis
more self-contained. Various details included herein will be drawn upon in later
chapters.
3. NETWORK PERFORMANCE CHARACTERISTICS
3.1 INTRODUCTION
In this chapter, we turn our attention to Ethernet LAN performance, using general,
well-understood characteristics of Ethernet that are introduced in Appendix A.
First we look at the issue of frame sizes, the length of the information field and
the overhead of a frame; we therefore deal in detail with the composition of an
Ethernet LAN frame. Can the length of LAN frames, or their information-carrying
capability, be adjusted to enhance performance? Likewise, the effect
of frame length on bridge and router operations is investigated (Sahu et al. 2000).
If we have an up and running LAN and wish to expand or enhance it, we can monitor
current LAN traffic to predict the effect of an expansion or the enhancement on a
similar planned network. But, when a brand new network is being put in place, we
lack a prior baseline. In this situation, we need a theoretical framework to estimate
network traffic. That framework occurs through the use of a LAN traffic estimation
technique. We will explore the use of this technique to predict future network growth
and the effect of such expansion on the future planned network, as well as the
segmentation of a LAN to improve network performance (Davidson, 1992).
3.2 FRAME OPERATIONS
The key to understanding Ethernet LAN performance is to first appreciate how an
Ethernet frame is constructed (RFC 1534, RFC 826). Within a frame, there are
essential informational elements (data bits) and non-essential control or padding
information. We first examine the fields in a typical LAN frame. When expanding or
establishing a network or when connecting disparate LANs, we will need to consider
the overheads associated with the frame format.
ETHERNET:
Preamble (8 bytes) | Destination Address (6 bytes) | Source Address (6 bytes) | Type (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes)

IEEE 802.3:
Preamble (7 bytes) | Start-of-Frame Delimiter (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | Length (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes)

Figure 3.1 Ethernet and IEEE 802.3 Frame Formats
3.2.1 Ethernet Frames
Figure 3.1 depicts the standard composition of both IEEE 802.3 and Ethernet frames.
Note that the preamble in IEEE 802.3 is seven bytes, whereas in Ethernet it is eight
bytes. Both are used for synchronization and consist of a repeating sequence of
binary 1s and binary 0s. The IEEE 802.3 standard replaced the last byte of the
Ethernet preamble field with a one-byte Start-Of-Frame (SOF) delimiter. That byte
has a sequence of 1s and 0s, but terminates with two set bits. Another difference
between the two frames concerns the protocol field in Ethernet, which was replaced by
the length field in the IEEE 802.3 frame. The two-byte protocol field contains a
value that identifies the protocol transported in the frame, such as IP or IPX. In
comparison, the length field identifies the length of the frame in an IEEE 802.3
environment. This means that only one protocol can theoretically be transported in an
IEEE 802.3 frame.
Because most organizations need to transport multiple protocols, the data field of the
IEEE frame was used to convey several subfields that allowed multiple protocols to
be transported. Referred to as a SubNetwork Access Protocol (SNAP), this frame
retains the IEEE 802.3 frame format, but inserts special codes within the beginning
of the data field to indicate the type of data that is transported.
Whereas some vendors produce dually functioning IEEE 802.3 and Ethernet
hardware, this is done mostly to preclude the wholesale replacement of idiosyncratic
(IEEE 802.3 versus Ethernet) NICs in their workstations.
We now discuss the frame fields in order.
3.2.1.1 Preamble
The preamble field consists of alternating 1s and 0s that announce the arrival of a
frame and allow all listeners on the network to synchronize themselves.
Furthermore, this field serves to ensure a minimum 9.6 microsecond (µs) frame
spacing at 10 Mbps, used for error control and recovery.
3.2.1.2 SOF Delimiter
This field, which only applies to IEEE 802.3 frames, consists of a format identical to
the preamble, with alternating 1s and 0s for the first six bits. The seventh and eighth
bits are both set to 1, which breaks the synchronization pattern and alerts the listener
that the data is coming.
A controller strips off the preamble and SOF delimiter from incoming frames before
buffering them. Accordingly, minimum and maximum frame lengths can be computed
differently according to whether the preamble and SOF delimiter are included: minimum
and maximum length frames are eight bytes longer on the media than in a computer's NIC.
3.2.1.3 Source and Destination Addresses
Both source and destination addresses occur in IEEE 802.3 and Ethernet frames. The
destination address indicates the recipient of the frame and the source address
indicates the originator. Two-byte source and destination addresses apply only to
802.3 and, although designed for use by small LANs, were never seriously
implemented. In comparison, six-byte addresses apply to IEEE 802.3 and Ethernet
and are de facto addressing standards. They exist within two special fields, as
depicted in Figure 3.2.
Those fields are:
• I/G (Individual/Group) bit – this is 0 for unicast frames and 1 for multicast
frames.
• U/L (Universal/Local) bit – applies only to six-byte addresses
o 0 � universally assigned by IEEE
o 1 � locally administered by vendor
(a) 2-byte field (IEEE 802.3): I/G bit | 15 address bits
(b) 6-byte field (IEEE 802.3 and Ethernet): I/G bit | U/L bit | 46 address bits

Figure 3.2 Source and Destination Address Field Formats
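Because the I/G bit is the least significant bit of the first address byte and the U/L bit is the next bit, both can be decoded directly from a MAC address string. This is an illustrative helper, not part of any standard API.

```python
def mac_address_bits(mac: str):
    """Decode the I/G and U/L bits from a six-byte MAC address string.

    The I/G bit is the least significant bit of the first byte
    (0 = individual/unicast, 1 = group/multicast); the U/L bit is the
    next bit (0 = universally administered by the IEEE, 1 = locally
    administered).
    """
    first_byte = int(mac.split(":")[0], 16)
    ig = first_byte & 0x01
    ul = (first_byte >> 1) & 0x01
    return ("group" if ig else "individual",
            "local" if ul else "universal")


print(mac_address_bits("00:1A:2B:3C:4D:5E"))  # ('individual', 'universal')
print(mac_address_bits("01:00:5E:00:00:01"))  # ('group', 'universal')
print(mac_address_bits("02:00:00:00:00:01"))  # ('individual', 'local')
```

The second example is an IP multicast MAC address, the third a locally administered one; both are distinguished purely by those two bits of the first byte.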
3.2.1.4 Type
The type field applies only to Ethernet and identifies the network layer protocol
carried in the frame. The corresponding field has a different connotation in IEEE
802.3, which rules out interoperability between the two formats.
3.2.1.5 Length
The length field is two bytes and identifies the number of data bytes in the data
field. As noted, whether the preamble and SOF delimiter are included in length
calculations is a matter of convention, but this does not affect the length field's
value.
Short frames have an effect on reliable MAC delivery. It is possible for a short
frame to have collided with, and been corrupted by, another frame while the sender
still believes the transmission was successful. To preclude this possibility, the
transmission time of any frame on an Ethernet must be at least twice the media's
propagation delay. For instance, in a 10 Mbps coax-based LAN with a maximum
length of 2500 m, the minimum time per the IEEE 802.3 standard is 51.2 µs. In turn,
that time corresponds to 64 bytes, because 64 bytes x 8 bits/byte x 0.1 µs/bit =
51.2 µs. As network speed rises, either the minimum frame size must also rise or the
maximum segment length must fall.
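The 64-byte figure can be reproduced directly from the slot-time arithmetic above; the function name is illustrative.

```python
def min_frame_bytes(bit_rate_bps, slot_time_s):
    """Minimum frame length (bytes) so a transmission lasts the slot time."""
    return bit_rate_bps * slot_time_s / 8


# IEEE 802.3 example from the text: 51.2 us slot time at 10 Mbps -> 64 bytes.
assert abs(min_frame_bytes(10e6, 51.2e-6) - 64) < 1e-6

# At 1 Gbps a 64-byte frame occupies the wire for only 0.512 us, which is
# why Gigabit Ethernet extends the slot to 512 bytes (see 3.2.3.1 below).
assert abs(min_frame_bytes(1e9, 0.512e-6) - 64) < 1e-6
```

The same formula makes the trade-off explicit: multiplying the bit rate by 100 while holding the slot time fixed would demand a 6400-byte minimum frame, so in practice the segment length (and hence the slot time) is reduced instead.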
3.2.1.6 Data Field
The data field has a minimum value of 46 bytes, ensuring that the frame is at least
72 bytes in length. Information shorter than 46 bytes must therefore be padded to
reach the minimum length; in certain documentation this padding is referred to as a
PAD subfield. Regardless of the terminology, fill characters are added when
necessary to ensure that the data field is at least 46 bytes.
The maximum length of the data field is 1500 bytes. The implication of this is that
data-intensive applications such as multimedia imaging and file transfers must use
multiple frames.
SSD (1 byte) | Preamble (7 bytes) | SFD (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | L/T (2 bytes) | Data (46 to 1500 bytes) | FCS (4 bytes) | ESD (1 byte)

Figure 3.3 Fast Ethernet Frame Format
3.2.1.7 Frame Check Sequence
This FCS field is four bytes in length. A Cyclic Redundancy Check (CRC) is
calculated over both address fields, the type/length field and the data field.
This CRC is placed in the four-byte FCS field by the sender. The receiver then
recalculates the CRC at the other end. If the CRC sent and the CRC received match,
the frame is accepted; otherwise the receiver simply drops the frame.
There are two other possibilities that will result in a frame being dropped:
1. Length of data field does not match the value in the length field.
2. Frame length is not a multiple of eight.
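The FCS check can be sketched with a CRC-32. Python's zlib.crc32 uses the same generator polynomial as Ethernet's FCS, though the real wire format's bit ordering and complementing details are simplified here; the helper names are illustrative.

```python
import zlib


def append_fcs(frame_body: bytes) -> bytes:
    """Sender side: compute CRC-32 over the frame body and append it."""
    fcs = zlib.crc32(frame_body) & 0xFFFFFFFF
    return frame_body + fcs.to_bytes(4, "little")


def check_fcs(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the trailing FCS."""
    body, fcs = frame[:-4], int.from_bytes(frame[-4:], "little")
    return (zlib.crc32(body) & 0xFFFFFFFF) == fcs


# Addresses, type field and data all fall under the CRC, as in the text.
frame = append_fcs(b"dst6__src6__" + b"\x08\x00" + b"payload")
assert check_fcs(frame)                      # intact frame is accepted
corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]
assert not check_fcs(corrupted)              # corrupted frame is dropped
```

Flipping even a single bit anywhere under the CRC changes the computed value, which is what lets the receiver silently discard damaged frames and leave recovery to higher layers.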
3.2.2 Fast Ethernet Frames
Figure 3.3 illustrates the Fast Ethernet frame in detail. We see that the frame format
for Fast Ethernet is similar to the IEEE 802.3 frame, except for the Start of Stream
Delimiter (SSD) and End of Stream Delimiter (ESD). SSD signals the arrival of a
frame, whereas ESD indicates that the frame has been successfully transmitted.
The other notable point about Fast Ethernet frames concerns encoding: Ethernet and
IEEE 802.3 are Manchester encoded, with an interframe time gap of 9.6 µs between
frames (Johnson, 1996).
In comparison, Fast Ethernet is transmitted using 4B5B encoding and an interframe
gap of 0.96 µs. Both SSD and ESD fall within this gap.
3.2.3 Gigabit Ethernet Frames
Recall that an increase in the operating speed of an Ethernet network must be
reflected in either an increase in frame length or a decrease in maximum segment length.
At 1 Gbps, if a minimum frame length of 64 bytes is maintained, the network
separation falls to 20 m. In structured cabling within an office building, horizontal
cabling takes up to 10 m from the wall socket to the desktop. Therefore, to
accomplish an increase of network cabling to around 200 m, two special techniques
are employed:
• carrier extension
• packet bursting.
These are quickly discussed here.
3.2.3.1 Carrier Extension
Carrier extension extends the Ethernet slot time from 64 bytes to 512 bytes. This is
achieved by padding frames shorter than 512 bytes with special carrier extension
symbols, resulting in the carrier signal being placed on the network for up to 512
byte times. At the receiver end, extension symbols are stripped off prior to the FCS
check. Note that carrier extension applies only to half-duplex Ethernet. It
significantly degrades performance, especially with short packets; to address this
issue, packet bursting was introduced.
Figure 3.4 Gigabit Ethernet Frame Format with Carrier Extension (the 64-byte minimum frame is extended to the 512-byte slot)
3.2.3.2 Packet Bursting
If a station has multiple frames to transmit, it does so after the first (padded)
frame is successfully transmitted. Subsequent frames are not padded, but are limited
by the maximum frame length. Figure 3.5 depicts Gigabit Ethernet with packet bursting.
We note therein that the first two packets transmitted were less than 512 bytes in
length and were extended. Subsequent packets within the burst time of 1500 bytes are
transmitted to completion, as indicated by Packet 3. The interframe gap between
frames is reduced from 9.6 µs on a 10 Mbps LAN to 0.096 µs on Gigabit Ethernet.
Figure 3.5 Gigabit Ethernet Packet Bursting (512-byte slot time within a 1500-byte burst time)
3.2.4 Frame Overhead (Held & Jagannathan, 2004)
Table 3.1 summarizes the frame overhead percentage associated with transporting
information in Ethernet and Gigabit Ethernet frames as the number of bytes of
information varies from 1 to 1500. As indicated, the percent overhead varies from
1.70 percent to 98.61 percent, so performance can degrade considerably with
interactive traffic. The information in Table 3.1 can be important for network
performance, especially in client/server situations, where it is preferable to send
fewer frames carrying information for several transactions at once; that reduces the
number of interframe gaps, which in turn improves the efficiency of the data flow.
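The overhead figures in Table 3.1 follow from a fixed 26 bytes of per-frame overhead (8-byte preamble, 12 bytes of addresses, 2-byte type/length, 4-byte FCS) plus padding to the 46-byte data-field minimum. The sketch below reproduces several table entries; slot_bytes=520 models Gigabit Ethernet's extended slot as tabulated, and the function name is illustrative.

```python
def overhead_percent(info_bytes, slot_bytes=0):
    """Percent overhead of one Ethernet frame carrying info_bytes of data.

    Overhead per frame: 8 (preamble) + 12 (addresses) + 2 (type/length)
    + 4 (FCS) = 26 bytes, plus any padding of the data field up to 46
    bytes. slot_bytes=520 models Gigabit Ethernet's carrier extension.
    """
    data_field = max(info_bytes, 46)     # pad short data fields to 46 bytes
    frame = 26 + data_field
    frame = max(frame, slot_bytes)       # carrier extension, if applicable
    return round((frame - info_bytes) / frame * 100, 2)


assert overhead_percent(1) == 98.61                    # 71/72 in Table 3.1
assert overhead_percent(46) == 36.11                   # 26/72
assert overhead_percent(1500) == 1.70                  # 26/1526
assert overhead_percent(128, slot_bytes=520) == 75.38  # 392/520
```

Running the function across the table's range shows why interactive traffic (a few bytes per frame) is so much more expensive than bulk transfers: the 26-byte framing cost is fixed regardless of payload.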
3.3 AVAILABILITY LEVELS
We first define availability (A):
A% = [operational time / total time] x 100
expressed as a percentage.
Consider a bridge that works round the clock. Over a year’s time, assume that the
bridge failed once and took 8 hours to repair. So out of 8760 hours per annum, the
device was operational for 8752 hours.
42
Table 3.1 Frame Overhead

                          Ethernet                    Gigabit Ethernet
Bytes of Info      Overhead/      Percentage     Overhead/      Percentage
in Data Field      Frame Length   Overhead       Frame Length   Overhead
      1             71/72          98.61          519/520        99.81
     10             62/72          86.11          510/520        98.08
     20             52/72          72.22          500/520        96.15
     30             42/72          58.33          490/520        94.23
     45             27/72          37.50          475/520        91.35
     46             26/72          36.11          474/520        91.15
     64             26/90          28.89          456/520        87.69
    128             26/154         16.88          392/520        75.38
    256             26/282          9.22          264/520        50.77
    512             26/538          4.83          26/538          4.83
   1024             26/1050         2.48          26/1050         2.48
   1500             26/1526         1.70          26/1526         1.70
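The figures in Table 3.1 can be reproduced with a short sketch. The 26 bytes of fixed overhead (preamble, addresses, length/type and FCS) and the 520-byte extended minimum for Gigabit Ethernet follow from the frame format discussed later in this chapter; the function name is ours:

```python
# Sketch of the Table 3.1 calculation: frame overhead as a function of the
# number of information bytes carried in the data field.
OVERHEAD = 26            # preamble (8) + addresses (12) + length/type (2) + FCS (4)
MIN_DATA = 46            # the data field is padded to at least 46 bytes
GIGABIT_MIN_FRAME = 520  # carrier extension enforces 520 bytes incl. preamble

def overhead_pct(info_bytes: int, gigabit: bool = False) -> float:
    """Percentage of the transmitted frame that is not user information."""
    data = max(info_bytes, MIN_DATA)           # pad short data fields
    frame = data + OVERHEAD
    if gigabit:
        frame = max(frame, GIGABIT_MIN_FRAME)  # apply carrier extension
    return 100.0 * (frame - info_bytes) / frame

print(round(overhead_pct(1), 2))           # 98.61  (71/72)
print(round(overhead_pct(256, True), 2))   # 50.77  (264/520)
print(round(overhead_pct(1500), 2))        # 1.7    (26/1526)
```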
Then the availability of the bridge becomes (Held, 2000):
A% = (8752/8760) x 100 = 99.9%
There are two options to increase the availability of devices: deploy redundant
devices, or use devices with multiple ports. What are the implications?
Reliability is typically measured in terms of the mean time between failures
(MTBF) and the mean time to repair (MTTR). These parameters can be used to
better understand availability levels via the formula:
A% = [ MTBF/(MTBF + MTTR) ] x 100
When applying this formula, it is important to remember that MTBF and MTTR are
mean values; using single observations rather than averages will make the
calculations erroneous. The mean times should be measured across a range of
installed devices; alternatively, the MTBF figures provided by device vendors
may be used in place of in-house determined averages.
We note that, if devices are connected in series (so all must be operational),
the system availability As is the product of the individual availabilities Ai,
expressed as fractions:
As = Π Ai
whereas if devices are connected in parallel (redundant, so only one need be
operational):
As = 1 - Π ( 1 - Ai )
with each result multiplied by 100 to express it as a percentage. For hybrid
topologies of devices, one can consider the system as a combination of series
and parallel elements and compute the overall level of availability in the same
way one computes the resistance of a network of series and parallel resistors.
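The availability formulas above can be sketched as follows; the function names are ours, and the two-bridge combinations are illustrative:

```python
from math import prod

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR), as a fraction."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def series(avails):
    """Devices in series: all must be up, so availabilities multiply."""
    return prod(avails)

def parallel(avails):
    """Redundant devices in parallel: the system fails only if all fail."""
    return 1 - prod(1 - a for a in avails)

a = availability(8752, 8)        # the bridge example: one 8-hour outage per year
print(round(100 * a, 1))         # 99.9
print(100 * series([a, a]))      # two such bridges in series (lower)
print(100 * parallel([a, a]))    # two in parallel, i.e. redundant (higher)
```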
3.4 NETWORK TRAFFIC ESTIMATION (Held, 2000)
We now consider the use of an a priori scheme to estimate or predict network
performance. If high utilization is predicted, plan for segmentation, using a
local bridge, switching device, or similar device to enhance performance
(Awduche et al. 1999).
In this case, one estimates traffic by considering the required functions of each
network user. Group or classify similar network users into a workstation class. Do
the calculations for one typical member of the workstation class, then multiply by the
number of workstations in that class to obtain an estimate of traffic for the entire
class. Repeat the procedure for all workstation classes and add the results to arrive at
the average traffic for the entire network (Jain, 1995).
For a typical workstation class, activities performed may include:
• load application
• load graphic image
• save graphic image
• send e-mail message
• receive e-mail message
• print graphic image
• print text data
• invoke a client/server database.
After selecting the activities, determine:
• message size
• number of frames per message
• frame size
• frequency per hour.
Subsequently, use the following formula:
Bit Rate (bps) = (frames/message x frame size in bytes x 8 bits/byte x
frequency/hour) / (3600 s/hour)
Calculate this figure for each activity performed by the class representative,
then add up the bit rates (bps) to obtain the total for that representative.
If there are N stations in the class, then the bit rate per class = N x (bit rate per
representative). Typical classes may include (project) managers, architects,
secretaries, engineers, programmers and system administration staff.
Finally, add the computed bit rates for all classes to arrive at the total estimated bit
rate for the entire network. The projected growth rates for workstation classes may
then be estimated to arrive at the projected bit rate for future utilizations.
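The per-class procedure above can be sketched as follows. The activity profile (frames per message, frame sizes, frequencies) and the class size are hypothetical, purely to illustrate the arithmetic:

```python
# Estimate traffic for one workstation class, following the procedure above.
def activity_bps(frames_per_msg: int, frame_size_bytes: int,
                 freq_per_hour: int) -> float:
    """Average bit rate contributed by one activity of one workstation."""
    return frames_per_msg * frame_size_bytes * 8 * freq_per_hour / 3600.0

# Hypothetical profile for one representative of a "programmer" class:
activities = [
    (10, 1024, 4),    # load application
    (2,  256,  20),   # send e-mail message
    (50, 1500, 2),    # save graphic image
]
per_station = sum(activity_bps(f, s, q) for f, s, q in activities)
programmers = 25 * per_station    # N = 25 stations in the class
print(round(per_station), round(programmers))
```

The same computation is repeated for every class, and the class totals are summed to give the network-wide estimate.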
When projecting the traffic load, it is important to note that Ethernet
utilization levels beyond 50 percent result in performance degradation that
becomes observable. At that point, you should consider segmentation using
two-port local bridges, which are less expensive than routers or switches.
Figure 3.6 illustrates this situation by example, placing selected user classes
within separate bridged segments.
[Figure: a host on Ethernet segment A and a host on Ethernet segment B, the two
segments joined by a bridge]
Figure 3.6 A Subdivided Network
For the example shown in Figure 3.6, let us assume that the busiest workstation
class is the programmers. We may then consider placing all users of that class
in a separate segment connected to the other segments via a local bridge. If,
for instance, the overall network utilization is 65 percent, with the
programmers responsible for 54 percent of it, then segmenting the network
results in a utilization of (65 x 0.54) percent, or roughly 35 percent, on the
programmers' segment. Although not a complete solution, it is a clear
improvement.
From this example, we can also see why server farms should not all be placed in
one segment: doing so puts all one's eggs in the same basket, concentrating most
if not all client/server transactions in a single segment.
Figure 3.7 illustrates the so-called 80-20 rule. When interconnecting two or
more separate networks, this rule applies: typically 80 percent of traffic is
intra-LAN and the remaining 20 percent inter-LAN.
[Figure: traffic distribution, 80 percent departmental, 10 percent intranet,
10 percent Internet]
Figure 3.7 Typical LAN Information Distribution
3.5 AN EXCURSION INTO QUEUING THEORY (Bertsekas, 2003)
We now need a procedure to estimate waiting times in the system and to select
equipment with sufficient memory to meet the specific requirements of the
organization. To do so, we need some information about arrival and service times
for frames entering a network; classical results from queuing theory then yield
the characteristics of the system. Queuing theory is concerned with managing
both delays and buffer memory in the remote bridges and routers used to link
networks. If the delay is too high or the memory too small, performance in these
devices degrades, necessitating retransmissions.
Queuing theory affords us models to determine and manage delays, to investigate the
effects of modifying operating rates of circuits, as well as to determine the minimum
acceptable memory requirements for devices to maintain a satisfactory level of
performance. We now turn our attention to these features, as determined by queuing
theory (Aggarwal et al. 2000).
Consider the scenario where two LANs are inter-connected via remote bridges or
routers. Assuming a single-channel, single-phase queuing model with Poisson
arrivals and exponentially distributed service times (the classical M/M/1 model,
to which the formulas below apply), the following results yield information on
waiting line characteristics (Held, 2000). With
λ = mean arrival rate
μ = mean service rate
Utilization: ρ = λ/μ
Probability the system is empty: P0 = 1 - λ/μ
Mean number in system: L = λ/(μ - λ)
Mean number in queue: Lq = λ^2 / [μ(μ - λ)]
Mean waiting time in queue: Wq = Lq/λ
Mean time in system: W = Wq + 1/μ
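The formulas above can be collected into a small sketch; the arrival and service rates used below are illustrative, not measured values:

```python
def mm1(lmbda: float, mu: float) -> dict:
    """Classical M/M/1 waiting-line characteristics (requires lmbda < mu)."""
    assert lmbda < mu, "queue is unstable unless arrival rate < service rate"
    rho = lmbda / mu                      # utilization
    L = lmbda / (mu - lmbda)              # mean number in system
    Lq = lmbda**2 / (mu * (mu - lmbda))   # mean number in queue
    Wq = Lq / lmbda                       # mean wait in queue
    W = Wq + 1 / mu                       # mean time in system
    return {"rho": rho, "P0": 1 - rho, "L": L, "Lq": Lq, "Wq": Wq, "W": W}

# e.g. 300 frames/s arriving at a bridge that services 800 frames/s:
m = mm1(300.0, 800.0)
print(round(m["rho"], 3), round(m["W"] * 1000, 2))   # 0.375 2.0 (W in ms)
```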
3.5.1 Buffer Memory Considerations
For n units in the system, the steady-state probability is:
Pn = (λ/μ)^n ( 1 - λ/μ ) = ρ^n ( 1 - ρ )
and the probability of k or more units in the system is:
P(N ≥ k) = (λ/μ)^k = ρ^k
where μ is the service rate and λ the arrival rate. Table 3.2 tabulates these
probabilities for n = 0 to 20 and k = 0 to 20, at a utilization of ρ = 0.375.
Table 3.2 shows these probabilities, computed numerically.
Table 3.2 Probabilities of N Units and of K or More Units in the System

     Probability of N Units        Probability of K or More Units
 N       P(N)                   K       P(N ≥ K)
 0    0.62500000                0    1.00000000
 1    0.23437500                1    0.37500000
 2    0.08789063                2    0.14062500
 3    0.03295898                3    0.05273438
 4    0.01235962                4    0.01977539
 5    0.00463486                5    0.00741577
 6    0.00173807                6    0.00278091
 7    0.00065178                7    0.00104284
 8    0.00024442                8    0.00039107
 9    0.00009166                9    0.00014665
10    0.00003437               10    0.00005499
11    0.00001289               11    0.00002062
12    0.00000483               12    0.00000773
13    0.00000181               13    0.00000290
14    0.00000068               14    0.00000109
15    0.00000025               15    0.00000041
16    0.00000010               16    0.00000015
17    0.00000004               17    0.00000006
18    0.00000001               18    0.00000002
19    0.00000001               19    0.00000001
20    0.00000000               20    0.00000000
Using the data in Table 3.2, you can make the following computations.
To handle 99.9 percent of occurrences, equivalently leaving 0.1 percent
unmanageable, note first from the table that when k = 7, P(N ≥ 7) is 0.00104284,
which still exceeds 0.001. So one selects k = 8 to satisfy the requirement of
buffering frames in 99.9 percent of the occasions on which arrivals momentarily
outpace the servicing rate of the bridge or router.
So, if the frame length is 1200 bytes, the memory requirement is:
Memory = 1200 bytes/frame x 8 frames = 9600 bytes
This procedure generalizes to a nine-step approach for determining storage
requirements:
1. Set λ = mean arrival rate.
2. Set µ = mean servicing rate.
3. Determine the utilization level ρ = λ/µ.
4. Determine the level of service required when arrivals temporarily outpace
servicing.
5. Set N = number of units in the system.
6. Set K = level of service the server is required to provide.
7. Find P(N ≥ K).
8. Extract K = the number of frames to be queued.
9. Multiply the average frame length by the number of frames K to be queued.
This yields the memory requirement for any situation with a pre-determined
probability level.
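The buffer-sizing steps above can be sketched directly; the utilization ρ = 0.375 and the 1200-byte frame length come from the worked example:

```python
def buffer_frames(rho: float, tolerance: float = 0.001) -> int:
    """Smallest k with P(N >= k) = rho**k <= tolerance."""
    k = 0
    while rho**k > tolerance:
        k += 1
    return k

rho = 0.375                  # utilization from the worked example
k = buffer_frames(rho)       # handle 99.9 percent of occurrences
print(k)                     # 8
print(k * 1200)              # 9600 bytes of memory at 1200 bytes/frame
```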
3.6 ETHERNET PERFORMANCE DETAILS
Because of the random nature of collisions in the CSMA/CD protocol, Ethernet bus
performance is nondeterministic. As a result, performance characteristics and
delays are not precisely predictable. What we do have are average and peak
utilizations, and this information may be used to segment an existing network to
enhance performance (Held, 2003).
3.6.1 Network Frame Rate
The frame rate for Fast Ethernet is ten times the value for 10 Mbps Ethernet.
Similarly, the rate for Gigabit Ethernet is nominally ten times that for Fast
Ethernet, but this holds only for frames long enough not to require carrier
extension.
Let us quickly revisit the IEEE 802.3 frame format (Held & Jagannathan, 2004):
• Preamble (8 bytes) [7-byte preamble and 1-byte SOF delimiter]
• Destination Address (6 bytes)
• Source Address (6 bytes)
• Length / Type (2 bytes)
• Data (46 to 1500 bytes)
• FCS (4 bytes)
• Total = 72 bytes to 1526 bytes.
Under Ethernet and IEEE 802.3 frames operating at 10 Mbps, there is a dead gap
of 9.6 µs between frames. This can be used to determine the frame rate on the
network. For example, consider a 10 Mbps LAN.
The bit time is 10^-7 s, or 100 ns.
If we assume the maximum frame length of 1526 bytes, then the time per frame is
9.6 µs + 1526 bytes x 8 bits/byte x 100 ns/bit = 1.23 ms
Because one 1526-byte frame requires 1.23 ms, in 1 s there are 1 s / 1.23 ms, or
approximately 812, maximally sized frames. Such a situation using maximal frames
occurs when doing data-intensive file transfers.
Table 3.3 shows frame rates for Ethernet, Fast Ethernet and Gigabit Ethernet. It
summarizes the frame processing requirements for these networks under 50 percent
and 100 percent load conditions, based on minimum and maximum frame sizes.
These rates indicate the number of frames that a bridge connected to a LAN must
be capable of handling.
If a bridge, switch or router is used on a LAN, the data contained in Table 3.3
can be used to determine the minimum required processing speed for the device.
Table 3.3 Ethernet Frame Processing (Frames per Second)

                      Average Frame      Frames per Second
Network Type          Size (Bytes)      50% Load    100% Load
Ethernet                  1526              406          812
                            72            7,440       14,880
Fast Ethernet             1526            4,060        8,120
                            72           74,400      148,800
Gigabit Ethernet           520          117,481      234,962
                          1526           40,637       81,274
3.6.2 Gigabit Ethernet Considerations
In Gigabit Ethernet, carrier extension is used to ensure a minimum frame length
of 512 bytes (or 520 bytes including the preamble and SOF delimiter). The
carrier extension runs from 0 to 448 bytes, according to the length of the
frame's own data content. The interframe gap is 0.096 µs (Johnson, 1996).
This in turn entails that the minimum frame time is:
0.096 µs + 520 bytes x 8 bits/byte x 1 ns/bit = 4.256 µs
So, in 1 s there are a maximum of
1 s / 4.256 µs = 234,962 minimum sized frames.
In turn, this means that the transmission of a maximum length 1526-byte frame
takes:
0.096 µs + (1526 bytes x 8 bits/byte x 1 ns/bit) = 12.304 µs
Therefore, in 1 s there can be a maximum of 81,274 maximum length frames.
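The frame rate calculations above generalize to a one-line formula, sketched here for all three network types (the function name is ours):

```python
def frames_per_second(frame_bytes: int, gap_us: float, bit_time_ns: float) -> int:
    """Frames/s = 1 s divided by (interframe gap + frame transmission time)."""
    frame_time_us = gap_us + frame_bytes * 8 * bit_time_ns / 1000.0
    return int(1_000_000 / frame_time_us)

print(frames_per_second(1526, 9.6, 100))   # 10 Mbps Ethernet, max frame: 812
print(frames_per_second(520, 0.096, 1))    # Gigabit, minimum (extended): 234962
print(frames_per_second(1526, 0.096, 1))   # Gigabit, maximum frame: 81274
```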
If we look again at Table 3.3, we see that the minimum-frame performance of
Gigabit Ethernet is actually only 1.579 times that of Fast Ethernet and 15.79
times that of 10 Mbps Ethernet, despite the 10:1 and 100:1 increases in data
rate.
3.6.3 Actual Operating Rate (Held, 2000)
To estimate the actual operating rate of an Ethernet network, you need to deduct
the dead time spent in interframe gaps from the maximum throughput. For example,
at 10 Mbps our computation would be as follows:
10 Mbps – (9.6 µs / 100 ns/bit x 812 frames/s) = 9,922,048 bps = 9.922 Mbps
Therefore, even at maximum utilization of an Ethernet, the actual data rate
cannot exceed 9.922 Mbps.
Similarly, for Fast Ethernet:
Operational Rate = 100 Mbps – (0.96 µs / 10 ns/bit x 8127 frames/s)
= 99.22 Mbps
For Gigabit Ethernet, the operational rate is 992.19 Mbps.
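Since the interframe gap is always 96 bit times, these operating rates reduce to subtracting 96 bits per frame from the line rate; a sketch (function name ours):

```python
def operating_rate_bps(line_rate_bps: int, gap_bits: int, fps: int) -> int:
    """Line rate minus the bits of dead time spent in interframe gaps."""
    return line_rate_bps - gap_bits * fps

# The 9.6 us gap at 100 ns/bit is 96 bit times; likewise 0.96 us at 10 ns/bit
# and 0.096 us at 1 ns/bit.
print(operating_rate_bps(10_000_000, 96, 812))       # 9,922,048 bps
print(operating_rate_bps(100_000_000, 96, 8127))     # ~99.22 Mbps
print(operating_rate_bps(1_000_000_000, 96, 81274))  # ~992.2 Mbps
```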
3.7 BRIDGING A NETWORK
When utilization levels reach 50 to 60 percent for extended periods, LAN
modification is in order. One way to do this is to use local bridges to segment
the larger LAN. Once the decision has been made to bridge the network, the next
step is to verify that the filtering and forwarding rate of the bridge is at
least equal to the actual operating rate (frames/s) of the network, as measured
or estimated.
Until now, we assumed that two bridged segments have the same operating rates.
This is not always the case. For example, one department may be working at 10
Mbps, whereas another uses Fast Ethernet. What is the throughput between the
LANs? Figure 3.8 below depicts the situation.
[Figure: LAN A at rate R1 and LAN B at rate R2, joined by a bridge]
Figure 3.8 Linking LANs with Different Operating Rates
To compute the effective rate of information transfer from one network to the
other, let us define the following parameters:
RT = effective frame rate from A to B
R1 = frame rate from A to the bridge
R2 = frame rate from the bridge to B.
Then the effective rate of transmission between the two networks becomes:
RT = (R1 x R2) / (R1 + R2).
For instance, if we estimated:
R1 = 812 frames/s @ 10 Mbps
R2 = 8130 frames/s @ 100 Mbps
then:
RT = 738 frames/s
assuming that the sending station has full access to the bandwidth and other
resources of the bridge.
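The combined-rate formula above can be sketched and checked against the example (function name ours):

```python
def bridged_throughput(r1: float, r2: float) -> float:
    """Effective end-to-end frame rate across a bridge joining two LANs."""
    return (r1 * r2) / (r1 + r2)

# 812 frames/s on the 10 Mbps side, 8130 frames/s on the Fast Ethernet side:
print(round(bridged_throughput(812, 8130)))   # 738 frames/s
```

Note that the result is dominated by the slower LAN: the effective rate can never exceed R1.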
4. ISSUES AT THE NETWORK, TRANSPORT AND APPLICATION LAYERS
4.1 INTERNETWORKING OVERVIEW
We begin our discussion of issues at the upper layers with an overview of
internetworking at large. We introduced the router elsewhere and noted that its
key functions include linking different networks and routing and delivering data
between processes and applications in End Systems (ESs), the ISO terminology for
edge devices on different networks, and doing all this seamlessly and
transparently with respect to the architecture of the attached networks (Clark,
1988; Balakrishnan et al. 1995). This chapter deals with the higher TCP/IP
layers and places the further analysis in perspective. This discussion is
important because Black Box methods work at the Transport Layer, whereas White
Box methods work at the Network Layer.
The predominant protocol supporting these functions is IP, specifically IPv4.
Figure 4.1 shows the IP header, which is a minimum of 20 octets. The fields in
this depiction are as follows (Held, 2000):
• Version (4 bits) – indicates the version number, to enable evolution of the
protocol.
• Internet header length (IHL) (4 bits) – length of the header in units of
32-bit words. The minimum value of IHL is 5 (a 20-octet header) and the maximum
is 15 (60 octets).
• Type of service (8 bits) – yields guidance to ES IP modules and to routers en
route about the relative priority of packets.
• Total length (16 bits) – total IP packet length in octets.
• Identification (ID) (16 bits) – a sequence number that, together with the
source and destination addresses and the user protocol, identifies a packet
uniquely.
• Flags (3 bits) – only two bits are currently defined. The more fragments bit,
when clear, indicates that this is the last fragment of the original packet. The
do not fragment bit prohibits further fragmentation when set. Note that if this
bit is set and the packet exceeds the maximum transmission unit (MTU) size of a
subnetwork en route, the packet is simply discarded. Hence, source routing is
preferable when this bit is set, to avoid subnetworks with too small an MTU.
• Fragment offset (13 bits) – indicates where in the original datagram this
fragment belongs, in units of 64 bits. By implication, all fragments except the
last must contain a data field whose length is a multiple of 64 bits.
• Time to live (TTL) (8 bits) – how long, in seconds, a packet is allowed to
remain on the internet. Each router en route to the destination must decrement
the TTL by at least one, so this field is similar to a hop count.
• Protocol (8 bits) – indicates the next higher layer protocol that is to
receive the data at the destination, essentially identifying the type of the
next header following the IP header.
• Header checksum (16 bits) – an error detection code applied only to the
header. Because some header fields change in transit, this checksum is
recalculated and reverified at each router en route.
• Source address (32 bits) – formulated to allow a variable allocation of bits
to specify the network and the ES attached to that network.
• Destination address (32 bits) – same format as the source address.
• Options (variable) – encodes requested options such as security, source
routing, route recording and timestamps.
• Padding (variable) – used to ensure that the header length is a multiple of
32 bits.
[Figure: IPv4 header layout, 32 bits wide]
Version | IHL | Type of Service | Total Length
Identification | Flags | Fragment Offset
Time to Live | Protocol | Header Checksum
Source Address
Destination Address
Options + Padding

Figure 4.1 The IP Header
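The fixed 20-octet portion of this header can be unpacked with a minimal parser sketch; the sample header bytes below are fabricated purely for illustration:

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header fields described above."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl_words": ver_ihl & 0x0F,             # header length in 32-bit words
        "total_length": total_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,  # in 64-bit (8-byte) units
        "ttl": ttl,
        "protocol": proto,                       # e.g. 6 = TCP, 17 = UDP
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# A hand-built header: version 4, IHL 5, DF set, TTL 64, protocol 6 (TCP).
hdr = bytes([0x45, 0, 0, 40, 0, 1, 0x40, 0, 64, 6, 0, 0,
             192, 168, 0, 1, 10, 0, 0, 9])
p = parse_ipv4_header(hdr)
print(p["version"], p["ttl"], p["src"], p["dst"])   # 4 64 192.168.0.1 10.0.0.9
```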
We observe that a newer standard for IP addressing was specified by the Internet
Engineering Task Force (IETF), variously called IPv6 or IPng. This scheme is
depicted in Figure 4.2, where we see that IPv6 uses addresses 128 bits in length
(Deering & Hinden, 1998; Shenker, 1995).
[Figure: IPv6 header layout, 32 bits wide, 40 octets total]
Version | Traffic Class | Flow Label
Payload Length | Next Header | Hop Limit
Source Address (128 bits)
Destination Address (128 bits)

Figure 4.2 The IPv6 Header
IPv6 supports the higher speeds of today's networks and the mix of multimedia
data streams. Fundamentally, there was a need for more addresses to assign to
all conceivable devices. As noted, the source and destination addresses are 128
bits in length. It is expected that all TCP/IP installations will eventually
migrate to IPv6, although this process will take many years, if not decades, to
complete.
4.2 PROTOCOL ARCHITECTURE
Figure 4.3 shows two LANs interconnected over an X.25 network. Therein, we see
the operation of IP for data exchange between ESs, A and B, attached to the two
LANs.
The IP at A receives blocks of data from the higher layer protocol software at A. It
then attaches an IP header with the global IP address of B. This address consists of a
network ID and an ES identifier and the resulting unit is called an Internet Protocol
data unit (IPDU) or simply a datagram.
[Figure: protocol stacks for ES (A) and ES (B), each running TCP / IP / LLC /
MAC / Phy; Routers X and Y each run IP over LLC / MAC / Phy on the LAN side and
X.25.3 / X.25.2 / Phy on the WAN side, joined by the X.25 packet switched WAN]

Figure 4.3 IP Operation
The datagram is encapsulated within the LAN protocol and sent to Router X, which
promptly decapsulates the LAN fields to examine the IP header. The same router
then re-encapsulates the datagram with X.25 protocol fields and sends it across
the WAN to the remote Router Y. That router decapsulates the X.25 fields,
recovers the datagram, wraps it with Layer 2 fields suitable for LAN 2 and sends
it off to device B.
4.3 DESIGN ISSUES
We now turn our attention to some design issues in more detail. These include:
• addressing (Held, 2000)
• routing (Perlman, 2001)
• datagram lifetime (Krol, 1999)
• fragmentation or reassembly (Stallings, 1993).
4.3.1 Addressing
A unique address is associated with each ES and intermediate system (IS) router
within a configuration. This is called an IP address and is used to route a datagram
through an internet to the target system indicated by the destination address.
Once data arrives at the remote ES, it must be processed and delivered to some
process or application therein. Typically, multiple applications will be
supported concurrently, and one application may support several users.
Each application, and possibly each user of an application, is assigned a port
in the system architecture. Minimally, each application has a port number that
is unique within that system. Furthermore, a File Transfer Protocol (FTP)
application, for instance, may support several concurrent data transfers, in
which case each transfer is dynamically assigned a port number.
There are two levels of IP addressing. First, there is the globally applicable
IPv4 address. Second, each device has an address on its own network, unique
within that network; examples are the MAC address (802 networks) and the X.25
host address. These are sometimes referred to as network attachment point
addresses (NAPAs) (Held, 2000).
The issue of addressing scope is relevant only for global IP addresses. Port
numbers, on the other hand, are unique only within a given system. Hence, if
there are two systems, A and B, the ports A.1 and B.1 are distinct even though
both carry the number 1.
4.3.2 Routing
Generally, routing is achieved by maintaining a table within each router. For a
given target system, the routing table yields the next hop (next router) to
which the datagram should be sent (Perlman, 2001).
Routing tables can be static or dynamic. A static table can include alternate
routes to use when a router is unavailable, but a dynamic routing table is more
flexible. "Neighbour greeting" is used to determine the next hop. This helps
with congestion control and also addresses the mismatch between LAN and WAN
transmission rates.
Source routing occurs when a sending station dictates a sequential list of routers that
the datagram must traverse; this specification is inserted within the datagram itself by
the sending station. Route recording is when each router in the trajectory of the
datagram appends its IP address to the datagram. This often helps with network
maintenance and trouble-shooting.
4.3.3 Datagram Lifetime
There is a possibility, especially with dynamic routing, that a datagram or some of its
fragments keeps circulating endlessly in the internet, especially when there are
sudden, significant changes in the network traffic or when there is a flaw in the
system’s routing tables. To preclude this problem, datagrams are sometimes marked
with a lifetime field similar to a hop count. The hop count is initially set to N and
decremented by one as the datagram passes through each router en route. When the
hop count reaches zero, the datagram is discarded.
4.3.4 Fragmentation or Reassembly
Subnetworks within an internet may specify different MTUs, the largest size of
datagram each will carry. It is not feasible to dictate one uniform maximum
packet size across networks, so when the next subnetwork has a smaller MTU than
the previous one, the only option is to fragment the packet (unless the do not
fragment flag is set, in which case the packet is discarded).
In IP, reassembly of fragmented datagrams happens in the ES at the destination. The
following fields in the IP header are used for handling fragmentation and reassembly:
• Data unit identifier (ID) – a composite of source and destination addresses, an
identifier of higher level protocol that generated the data (e.g. TCP), and a
sequence number supported by that protocol layer.
• Data length - the length of the user data field.
• Offset - the position of a fragment of user data in the data field of the original
datagram in multiples of 64 bits.
• More flag – specifies that more fragments follow.
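The interplay of these fields can be sketched with a toy fragmenter; the function name and parameters are ours, and offsets are reported in the header's 8-byte (64-bit) units:

```python
def fragment(payload_len: int, mtu: int, header: int = 20):
    """Split a datagram payload into fragments for a link with the given MTU.
    Every fragment except the last must carry a multiple of 8 bytes (64 bits)."""
    max_data = (mtu - header) // 8 * 8   # round down to a 64-bit multiple
    frags, offset = [], 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len
        frags.append((offset // 8, size, more))  # (offset in 8-byte units, length, More flag)
        offset += size
    return frags

# A 4000-byte payload crossing a subnetwork with a 1500-byte MTU:
for off, size, mf in fragment(4000, 1500):
    print(off, size, mf)
# 0 1480 True
# 185 1480 True
# 370 1040 False
```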
One problematic issue that must be dealt with is that of lost fragments. IP does not
guarantee delivery. There are two mechanisms to deal with this issue.
First, one mechanism is for the reassembly function to run a local real-time
clock. If the clock expires prior to complete reassembly, the entire effort is
abandoned and the received fragments are discarded.
A second approach uses the datagram lifetime carried in the header of each
incoming fragment. The lifetime field continues to be decremented by the
reassembly function; if the lifetime expires before reassembly completes, the
received fragments are discarded.
4.4 ROUTING AND ROUTE PROTOCOLS
In essence, a router is primarily a packet switch. In this connection, the term
packet refers to the protocol data unit (PDU) that travels from the Network
Layer (Layer 3) software in one system across a network to the corresponding
layer in another system. The packet contains, among other information, the
addresses of the source and target ESs. The ESs are the devices that generate or
receive the overwhelming majority of all packets traversing the network (Miller,
2004).
Another piece of terminology is the "subnetwork". This refers to a collection of
network resources that can be reached without going through a router. If a
router is involved in going through a network from one ES to another, those ESs
are in different subnetworks.
Finally, an internetwork (or simply an internet) refers to a collection of two or more
subnetworks interconnected by routers.
In this context, the role of the router is simple. It receives packets from ESs (and
possibly other routers) and routes them through the internet to the appropriate
destination network. Once the packet has arrived at the destination subnetwork, the
last router in the path traversed forwards the packet to the intended ES recipient.
Some of the other more advanced functions possibly supported by routers include
fragmentation, congestion control and fairly sophisticated packet filters providing a
modicum of security.
The frame is the mechanism used to transport data; packets ride within frames to
get from one ES to another across the internet. If two ESs are within the same
subnetwork, the packet is framed by the transmitter and can pass through
bridges. If the two ESs are in different subnetworks, packets go from the sender
to the local router, which decapsulates the frame, examines the Layer 3 header
and sends the packet on to the next IS in a new frame.
It is worth mentioning that whereas bridges maintain forwarding tables and track
virtually all devices on the subnetwork, there are no entries in these tables for ESs on
different subnetworks and how to reach them. In the latter case, data must pass
through routers.
4.5 ROUTING REVISITED
The network layer protocol in an internet is responsible for all end-to-end routing of
packets. There are many such protocols and they all share some common features.
They all define a packet structure and an address format, and all of them
address such issues as type of service (ToS), fragmentation, connectionless
versus connection-oriented service, and packet prioritization (Kousky, 2000).
The fundamental concept for the network layer protocol is the Layer 3 address. This
address is hierarchical, with at least two addressing segments defined. The first
identifies a subnetwork and the second an ES within that subnetwork. These two
fields are always present, whereas some specific Layer 3 protocols define additional
fields (Bennett & Zhang, 1996).
Note that each router and device interface must have a unique Layer 3 address, a
situation not unlike an area code and local phone number in the telephone industry.
This unique IP address is the basis for all routing within the internet. (Egevang &
Francis, 1994).
The vast majority of all Layer 3 protocols are connectionless and datagram (best-
effort) based. We note that connectionless service is one wherein the upper layer
protocol or architecture has no means of requesting an end-to-end relationship or
connection with another ES. All that the Layer 3 protocol can do is to provide data
with a destination address. All acknowledgements, flow control, and sequencing of
messages are managed by the upper layer protocols or applications.
A datagram network is one wherein routers are unable to establish an end-to-end
circuit for traffic. Every packet received by a router is routed independently of earlier
or later packets. So no guaranteed QoS can be provided. On the other hand, if the
network supports end-to-end circuits, intermediate routers would know that packets
will arrive on an established circuit and expected loads can be defined at the time of
circuit setup (Cerf et al. 1974).
Table 4.1 summarizes four of the major network layer protocols in use today, along
with their key features. All four are connectionless and datagram based (Bajko et al.
1999).
Table 4.1 Major Network Layer Route Protocols

Network Layer      Address    Address
Protocol           Length     Fields            Additional
                   (octets)   (octets)          Capabilities        Used in
Internet              4       NETID (var)       Fragmentation,      Internet, most
Protocol (IP)                 HOSTID (var)      nondelivery         network
                                                notice,             environments
                                                subnetting
Internetwork         12       Network (4)       Automatic client    NetWare
Packet Exchange               Node (6)          addressing
Protocol (IPX)                Socket (2)
Datagram              4       Network (2)       Automatic client    AppleTalk
Delivery                      Node (1)          addressing
Protocol (DDP)                Socket (1)
VINES Internet        6       Network (4)       Fragmentation,      VINES
Protocol (VIP)                Subnetwork (2)    nondelivery
                                                notice,
                                                automatic
                                                addressing
IP addressing is the most complex of all. The boundary between the IP subnetwork
number (NETID) and ES number (HOSTID) is not rigidly fixed. The boundary
varies depending on the address class and the subnet mask being used. In Table 4.2,
we see that there are five address classes, three of which are used for deploying
subnetworks.
The more difficult aspect of the IP addressing mechanism is the notion of the subnet
mask. The purpose of this mask is to take a NETID and divide it into smaller
subnetworks connected by routers. For instance, the Class B address 128.13.0.0 can
be subdivided into 256 smaller networks designated as 128.13.1.0, 128.13.2.0,
128.13.3.0 and so on, using the 255.255.255.0 subnet mask. This process is called
subnetting. The mask may also be used by routers to summarize routes. For example,
all the Class C networks from 199.12.0.0 to 199.12.255.0 can be advertised as
199.12.0.0 using the 255.255.0.0 mask. This process is called supernetting or
Classless Inter Domain Routing (CIDR).
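Both operations can be sketched with Python's standard ipaddress module, using the two examples from the text:

```python
import ipaddress

# Subnetting: split the Class B network 128.13.0.0 using mask 255.255.255.0,
# yielding 256 smaller networks 128.13.0.0/24, 128.13.1.0/24, and so on.
class_b = ipaddress.ip_network("128.13.0.0/16")
subnets = list(class_b.subnets(new_prefix=24))
print(len(subnets), subnets[1])        # 256 128.13.1.0/24

# Supernetting (CIDR): summarize 199.12.0.0/24 .. 199.12.255.0/24 as a single
# advertised route 199.12.0.0 with mask 255.255.0.0.
summary = ipaddress.ip_network("199.12.0.0/16")
print(ipaddress.ip_network("199.12.37.0/24").subnet_of(summary))  # True
```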
Table 4.2 IP Addressing Overview

                 1st Octet    Length of    # of          # of
Address Class    Value        NETID        NETIDs        HOSTIDs
Class A          1-126        1 octet      126           16,777,214
Class B          128-191      2 octets     16,382        65,534
Class C          192-223      3 octets     2,097,150     254
Class D          224-239      Multicast    N/A           N/A
Class E          240-255      Reserved     N/A           N/A
Usually, IP addresses are assigned to devices manually, but there can be exceptions.
For example, the Dynamic Host Configuration Protocol (DHCP) permits a DHCP
server to dynamically lease addresses to ESs when they come on line. IP can
fragment packets if required, with reassembly at the destination ES.
4.5.1 Routing Protocols
Routing protocols are responsible for maintaining routing tables dynamically. The
routing protocols monitor the network and accordingly update the routing tables
when network changes occur. Most Network Layer Route protocols use at least two
routing protocols. (Miller, 2004).
These protocols can be evaluated according to a number of criteria:
• bandwidth
• metrics
• convergence time
• memory space
• processing power.
Bandwidth is the first criterion for appraisal. Maintaining routing tables
requires routers to exchange greetings, which consumes bandwidth. The more
bandwidth consumed for such administrative purposes, the less is available for
carrying user traffic.
The next point is the metric that the routing protocol minimizes. Some use simple
hop count, whereas other more sophisticated protocols use such metrics as delay,
bandwidth, packet loss, or a combination of these metrics.
The third assessment point is convergence time: the delay between the occurrence of a network change and the moment when all routers have updated themselves with the most current state of affairs and altered their routing tables accordingly.
Finally, routing protocols use up memory space and processing power within the
routers. With ever more powerful devices, this becomes less of an issue.
Regardless of the specific routing protocol used, these protocols can all be grouped under the banner of distributed protocols, meaning that route recalculation occurs at the routers themselves. We note that centralized routing, wherein a single system makes all routing decisions and then downloads routing tables to all routers, is a relatively recent phenomenon and is being adopted only by degrees. There are two distinct types of distributed routing protocols (Cisco Systems, 2000; Blake et al. 1998):
1. Distance Vector (DV)
2. Link State (LS).
4.5.2 DV Protocols
DV protocols are also called Vector Distance or Bellman-Ford protocols. They have
three important features (Tanenbaum, 2003):
1. Routing updates produced contain a list of destination/cost pairs.
2. Updates are sent to all neighboring devices.
3. Re-routing calculations are performed within each system.
In essence, DV protocols extract a list of learned destinations and the costs to reach them, and pass this knowledge on to neighbouring devices. These neighbours then use the information to identify better routes than currently exist in their routing tables, at which point the tables are updated.
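The update rule just described can be sketched as follows; the table layout and the names are illustrative, not drawn from any particular protocol:

```python
# Distance-vector merge: adopt a neighbour's advertised route whenever
# the path through that neighbour is cheaper than what we already know.
def dv_update(table, neighbour_table, link_cost):
    """table maps destination -> (cost, next_hop); returns True if changed."""
    changed = False
    for dest, (adv_cost, _) in neighbour_table.items():
        candidate = link_cost + adv_cost
        if dest not in table or candidate < table[dest][0]:
            table[dest] = (candidate, "neighbour")
            changed = True
    return changed

a = {"net1": (0, "direct")}
b = {"net2": (0, "direct"), "net1": (5, "direct")}
dv_update(a, b, link_cost=1)
print(a)   # {'net1': (0, 'direct'), 'net2': (1, 'neighbour')}
```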
Elsewhere, we noted that DV protocols have a count-to-infinity problem. Figure 4.4
illustrates this situation. We see therein that Router A has a route to Subnetwork 1
and Router B uses router A to reach this subnetwork.
If Router A loses the route and Router B advertises this fact before Router A can
advertise the loss, Router A will accept Router B as a new route to Subnetwork 1,
forming a loop. Any packet destined for Subnetwork 1 and arriving at one of these
routers will endlessly thrash between them.
[Figure: Routers A and B connect Subnetworks 1, 2 and 3. (1) Router A loses its link to Subnetwork 1; (2) Router B advertises its route to Subnetwork 1; (3) Router A learns the route from Router B, forming a loop.]
Figure 4.4 The Count-to-Infinity Problem
To address this problem, most DV protocols employ a split horizon: a rule that prevents a router from advertising routes back out the interface through which they were learned. Split horizon with poison reverse is a variant that permits advertisement of these routes but sets their cost to infinity, preventing other routers from adopting them. Most DV protocols send complete updates periodically, which seriously affects convergence time; an improvement is to send event-driven (triggered) updates. However, if a table update was caused by a failed route, a router that has not yet received news of the change may reintroduce the old route into the network. To preclude this occurrence, routers are required to place failed routes into a hold-down state, typically for three times the normal update interval. As a result, the first news (good or bad) travels quickly, but subsequent good news (new routes) is learned more slowly.
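Split horizon with poison reverse can be sketched in a few lines; the table layout is illustrative, and 16 is used as "infinity", as in RIP:

```python
# Split horizon with poison reverse: routes learned from a neighbour are
# advertised back to that neighbour with an infinite cost (16 in RIP).
INFINITY = 16

def advertise(table, to_neighbour):
    """table maps destination -> (cost, learned_from)."""
    update = {}
    for dest, (cost, learned_from) in table.items():
        if learned_from == to_neighbour:
            update[dest] = INFINITY      # poison reverse
        else:
            update[dest] = cost
    return update

table = {"subnet1": (2, "routerA"), "subnet3": (1, "direct")}
print(advertise(table, "routerA"))   # {'subnet1': 16, 'subnet3': 1}
```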
In summary, DV protocols are simple to design and implement with little demand for
memory and processing power. However, convergence is a major problem and can
consume a fair amount of network resources. Table 4.3 summarizes the key attributes
of the major DV protocols, including RIP, RIP II, IGRP, EIGRP, RTMP, RTP and
BGP 4.
Table 4.3 The Major Distance Vector Protocols
Protocol                                    Used to Route   Metric(s)                              Update Interval   Documented by
Routing Information Protocol (RIP)          IP              Hop count                              30 s              RFC 1058
RIP v2 (RIP II)                             IP              Hop count                              30 s              RFC 1388
Routing Information Protocol (for IPX)      IPX             Delay, hop count                       60 s              Novell/Xerox
Interior Gateway Routing Protocol (IGRP)    IP              Delay, bandwidth (reliability, load)   90 s              Cisco
Enhanced IGRP (EIGRP)                       IP, IPX, DDP    Delay, bandwidth (reliability, load)   Event driven      Cisco
Routing Table Maintenance Protocol (RTMP)   DDP             Hop count                              10 s              Apple Computer
Routing Table Protocol (RTP)                VIP             Delay                                  90 s              Banyan
Border Gateway Protocol version 4 (BGP 4)   IP              Hybrid (also policy based)             Event driven      RFC 1771
4.5.3 LS Protocols (Cisco Systems, 2000)
The other family of routing protocols, LS (Link State), has three distinguishing
features:
1. routing update broadcast
2. LS database
3. full route recalculation.
Routing updates are broadcast, in a manner similar to flooding. Full route recalculation is then performed at every router running the protocol. The central feature of an LS protocol is the LS database: all flooded routing updates are stored in this local database, which contains enough information to graph the entire network, calculate alternative paths, and construct a routing table. All the databases must synchronise. A new router in the network can obtain a copy of the database from nearby routers, whereas existing routers periodically verify the integrity of their databases. The size of the database depends on the size and complexity of the network. Although updates are kept small and are event-driven, the flooding still consumes bandwidth, so certain techniques were developed to reduce its effect.
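Once the databases are synchronised, each router can run a shortest-path computation over the LS database to build its routing table. A minimal sketch, using an illustrative database rather than one from the text:

```python
# Dijkstra's shortest-path computation over a link-state database.
import heapq

def shortest_paths(lsdb, source):
    """lsdb maps router -> {neighbour: link_cost}; returns cost to each router."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue               # stale heap entry
        for neighbour, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

lsdb = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2}, "C": {"A": 4, "B": 2}}
print(shortest_paths(lsdb, "A"))   # {'A': 0, 'B': 1, 'C': 3}
```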
First, only selected routers are required to forward a flooded update, first checking it against their database to avoid re-sending updates already seen. Another mechanism is to segment a network running LS into areas, with updates flooded only within an area. To handle interarea routing information, specific routers are assigned to summarize interarea routes to the other areas.
LS protocols converge more quickly. They can be more bandwidth-friendly than DV protocols and are less susceptible to routing loops. However, they can be more complex to design, configure and implement, and they consume more router resources, such as memory and processing power.
The Internet is the collection of all existing internets. Each internet is locally administered and referred to as an autonomous system (AS). Any routing protocol working within an AS is called an Interior Gateway Protocol (IGP). However, there is also a need to route across ASs; protocols that do so are called Exterior Gateway Protocols (EGPs).
Table 4.4 summarizes the key attributes of the major LS routing protocols that exist
today, including OSPF, IS-IS, and NLSP.
Table 4.4 The Major Link State Protocols
LS Routing Protocol                                  Used to Route   Metric(s)       Documented by
Open Shortest Path First (OSPF)                      IP              Dimensionless   RFC 2178
Intermediate System to Intermediate System (IS-IS)   IP, CLNS        Dimensionless   RFC 1142 & ISO DP 10589
Netware Link Services Protocol (NLSP)                IPX             Dimensionless   Novell
4.6 EXCURSION INTO THE TRANSPORT LAYER
TCP/IP has two fundamental transport layer protocols (Clark et al. 1987; Floyd,
1994):
1. TCP (Davidson, 1992)
2. User Datagram Protocol (UDP) (Tanenbaum, 2003).
TCP is a reliable two-way byte stream protocol and is rather complex. It guarantees in-sequence, accurate delivery of data by building a checksummed virtual circuit connection on top of IP's unreliable, connectionless, best-effort service. It also deploys flow control and congestion control mechanisms that allow for efficient use of bandwidth. In this context, a window of packets is sent pending acknowledgement; this windowing mechanism is the basis for TCP's flow and congestion management capabilities (Aggarwal et al. 2000).
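The windowing idea can be illustrated with a toy model (not TCP's actual state machine): up to a window's worth of unacknowledged segments may be outstanding, and each acknowledgement slides the window forward.

```python
# Toy sliding-window model: count the round trips needed to deliver
# num_segments when at most `window` segments may be unacknowledged.
def simulate_window(num_segments, window):
    sent, acked, in_flight = 0, 0, 0
    rounds = 0
    while acked < num_segments:
        while sent < num_segments and in_flight < window:
            sent += 1
            in_flight += 1
        # one round trip: everything in flight is acknowledged
        acked += in_flight
        in_flight = 0
        rounds += 1
    return rounds

print(simulate_window(num_segments=10, window=4))   # 3 round trips
```

A larger window means fewer round trips for the same data, which is exactly why the window size governs throughput on a given path.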
TCP also supports multiplexing, whereby messages can be sent to different processes on the same host. It does this by means of a port abstraction, wherein every process on an ES is assigned a locally unique port number; the TCP header carries both a source and a destination port number. Services such as Telnet use this port abstraction to allow multiple clients to connect to the same service (Barford & Crovella, 1998, 1999).
UDP is a best-effort unreliable connectionless protocol for applications without any
sequencing/flow control requirements. It is often used when promptness, rather than
accurate delivery, is sought – for example, when sending speech or video.
4.7 MULTIMEDIA SERVICE
We now turn our attention to the transport of multimedia applications, which is an
instance of issues at the application layer.
VoIP is the enabling technology for multimedia service integration. Savings from
integration can be significant and the technology is available.
Companies with a private voice network retain one or more PBXs to implement the
integrated service. A PBX is the switching element linking two users in a voice or
video connection. PBXs have three basic components (Keshav, 1997):
1. wiring
2. hardware
3. software.
Wiring is dedicated to each phone in use. This allows employees to call each other.
To gain access to the PSTN, PBXs need phone lines purchased from the telephone
company.
Hardware includes a switched network connecting two phones and servers for PBX
software. Software controls such functions as call setup, forwarding, call transfer,
call hold, as well as generating per-call statistics. Organizations with multiple
locations require PBXs dedicated at each site connected by public-leased lines (Joyce
& Walker, 1992).
Moving towards full VoIP deployment, we will gradually see PSTNs replaced by
public data networks, the introduction of VoIP gateways, as well as VoIP
gatekeepers, and the customer premises equipment (CPE). To fully support call management, the International Telecommunication Union (ITU-T) standardized the H.323 protocol, which describes terminals, equipment and services for multimedia connections over a LAN. Voice is only one of the services supported.
The H.323 recommendations are proving to be the basis by which many backbone,
access and CPE vendors are developing VoIP components with assured
interoperability. H.323 VoIP products can be broken down into the following
categories, mapping loosely to network layers:
• CPE – includes such devices as Microsoft NetMeeting conferencing software,
Intel Proshare conferencing software, the Selsius Ethernet phone, as well as
the Symbol Netvision phone, an H.323 telephone that plugs into an Ethernet
port.
• Network infrastructure equipment - includes standard routers, hubs, and switches.
Because voice is sensitive to delays and losses, a number of router features such
as Random Early Detection (RED) (Floyd & Jacobson, 1993; Lin & Morris,1997;
Lin et al. 1999), weighted fair queuing (WFQ) (Keshav, 1991), Resource
Reservation Protocol (RSVP), compressed RTP, and multiclass, multilink PPP,
have evolved over the years to address these issues (Demers et al. 1990).
• Servers – provide a major VoIP benefit of utilizing the Internet model, with clear
demarcation between network infrastructure and network applications. The
H.323 gatekeeper service, for example, supports call control; an Authentication, Authorization and Accounting (AAA) server provides billing and accounting; and a Simple Network Management Protocol (SNMP) server provides for network management.
• Gateways – represent an evolutionary step for organizations moving towards
VoIP. Given that it will be a while for the data network to handle all multimedia
communication, gateways are an interim measure to link the new VoIP services
with existing public or private voice networks.
4.8 SOME DELAY CALCULATIONS
In this section, we turn our attention to computing the delay times and latency that occur in the transmission of small pieces of multimedia across an internet (Cruz, 1998; Bennett et al. 2001). Because UDP is used to transport digitized voice, we first note that while the UDP header itself is 8 bytes long, the calculations below follow Held (2000) in allowing 16 bytes for UDP-related overhead.
4.8.1 10 Mbps Ethernet, 100 Mbps Fast Ethernet, and 1000 Mbps Gigabit
Ethernet
The delay time calculations for the 10 Mbps Ethernet, 100 Mbps Fast Ethernet and
1000 Mbps Gigabit Ethernet are presented in Table 4.5 below (Johnson, 1996).
Table 4.5 Some Ethernet Delay Calculations
10 Mbps Ethernet (interframe gap 9.6 µs):
∆ = 9.6 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-7 s/bit
∆max = 9.6 µs + 1500 bytes x 8 bits/byte x 10^-7 s/bit
100 Mbps Fast Ethernet (interframe gap 0.96 µs):
∆ = 0.96 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-8 s/bit
∆max = 0.96 µs + 1500 bytes x 8 bits/byte x 10^-8 s/bit
1000 Mbps Gigabit Ethernet (interframe gap 0.096 µs):
∆ = 0.096 µs + (8 + 6 + 6 + 2 + 20 + 16 + 100 + 7) bytes x 8 bits/byte x 10^-9 s/bit
∆max = 0.096 µs + 1500 bytes x 8 bits/byte x 10^-9 s/bit
Key parameters for the calculations include the interframe gap (varies, given in µs),
the MAC level preamble (8 bytes), destination MAC address (6 bytes), source
address (6 bytes), type/length field (2 bytes), and the data field (minimum 46 bytes
and maximum 1500 bytes). Within the data field is the encapsulated IP header. The
entire 20 bytes of the IP header need to be read to get to the UDP header, because the
UDP port must also be read. We also factor roughly 7 bytes of RTP information, plus
a small quantity (approximately 100 bytes) of multimedia data. For a maximum
length Ethernet field, the frame length increases to 1500 bytes.
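The expressions in Table 4.5 can be evaluated with a short sketch; the overhead constants are taken from the table, and the function name is ours:

```python
# Per-frame delay per Table 4.5: interframe gap plus serialization time
# of preamble (8) + destination (6) + source (6) + type (2) + IP header
# (20) + UDP-related overhead (16, per the table) + RTP (~7) + payload.
def frame_delay_us(rate_bps, gap_us, payload_bytes=100):
    overhead = 8 + 6 + 6 + 2 + 20 + 16 + 7           # bytes, per the table
    bits = (overhead + payload_bytes) * 8
    return gap_us + bits * 1e6 / rate_bps            # microseconds

print(round(frame_delay_us(10e6, 9.6), 1))     # 10 Mbps Ethernet: 141.6 µs
print(round(frame_delay_us(100e6, 0.96), 2))   # Fast Ethernet
print(round(frame_delay_us(1e9, 0.096), 3))    # Gigabit Ethernet
```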
Note that the maximum delay time only applies to data transported between
multimedia carrying packets. This explains why file transfers and similar operations
that have frames inserted between two voice packets can cause distortion on slow
Ethernet LANs.
By computing multimedia delay times, as well as the effect of the insertion of
packets transporting data between digitized voice packets, it becomes possible to
determine if your LANs can handle multimedia prior to implementing a new
application.
4.8.2 Switches (Perlman, 2001)
For a store and forward Ethernet switch, the entire frame must be received, stored,
and processed.
In this context, we can compute minimum and maximum delay times for a 10 Mbps
switch as follows:
∆min = 9.6 µs + (72 bytes x 8 bits/byte x 10^-7 s/bit)
= 67.2 µs
∆max = 9.6 µs + (1526 bytes x 8 bits/byte x 10^-7 s/bit)
= 1230.4 µs ≈ 1.23 ms
Similar considerations apply for switches at higher speeds and computations may be
made with the appropriate interframe gap and processing speeds at the higher rates.
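The store-and-forward computation above can be reproduced as follows; the frame lengths of 72 and 1526 bytes include the 8-byte preamble:

```python
# Store-and-forward switch latency: the whole frame must be clocked in
# before forwarding, so latency scales with frame length.
def switch_delay_us(frame_bytes, rate_bps=10e6, gap_us=9.6):
    return gap_us + frame_bytes * 8 * 1e6 / rate_bps

d_min = switch_delay_us(72)      # minimum frame (64 bytes + preamble)
d_max = switch_delay_us(1526)    # maximum frame (1518 bytes + preamble)
print(round(d_min, 1))   # 67.2 µs
print(round(d_max, 1))   # 1230.4 µs, i.e. about 1.23 ms
```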
When examining vendor specifications for latency, one needs to be cautious: vendors sometimes do not state the frame length used for their latency measurements. The frame length has been factored explicitly into the computations carried out above.
As mentioned earlier, Black Box congestion control solutions have typically operated at the Transport Layer, so it has been important to understand this layer in detail. [Note that White Box congestion control also operates at the Network Layer, hence the discussion of that layer, both for the sake of completeness and to place future research in perspective]. The treatment of the higher layers introduced in this chapter is taken up at later points in this thesis.
5
THE ETHERNET FAMILY OF LANs REVISITED
5.1 INTRODUCTION
In this chapter we turn our attention to the several members of the Ethernet family,
including (Stallings, 1993 & 1997):
• 10 Mbps Ethernet
• Fast Ethernet
• Gigabit Ethernet
• 10 Gigabit Ethernet.
Each member of the family has its immediate relatives - for instance, within 10 Mbps
Ethernet, we find 10BASE-T, 10BASE-2, 10BASE-5, 10BROAD-36, and 10BASE-F. Within 100 Mbps Fast Ethernet (Johnson, 1996), we find 100BASE-TX,
100BASE-FX, and 100BASE-T4. Also in 1000 Mbps Ethernet, better known as
Gigabit Ethernet (Krol, 1999), there are 1000BASE-LX, 1000BASE-SX, 1000BASE-CX, and 1000BASE-T. However, in 10 Gbps operation, better known as 10-Gigabit Ethernet, only one standardized LAN is defined (Martin & Chapman, 1989).
Recall that a LAN solution consists of a transmission medium, MAC protocol, and
encoding mechanism, and operates using a predefined topology. The transmission
medium is concerned with the properties of the physical carrier that bears the signals
from source to destination. The MAC protocol governs the method by which signals
access the medium. The encoding mechanism defines how data and control codes are
encoded. Because all of the LAN technologies we discuss are baseband (digital
transmission of digital data), we will be concerned with only digital encoding.
Different signal elements are used to represent binary 1 and binary 0. A number of
encoding schemes are discussed later in this chapter.
Elsewhere, in other chapters, we look at all these components, paying particular
attention to topology and MAC protocols. The other two components of LANs –
transmission media and encoding schemes – were mentioned there only in passing,
primarily for the sake of continuity and completeness. We are able to afford a more
detailed treatment of these aspects of LANs later in this chapter. (Plummer, 1982)
5.2 TRANSMISSION MEDIA
Let us first turn our attention to the transmission media. There are primarily five
possibilities (Braden, 1999; Black, 1998):
1. UTP
2. STP
3. coaxial cable
4. OFC
5. unguided (wireless).
The last medium will not be treated here, being a topic in its own right.
Each medium has its benefits and limitations. Whatever the medium, the following
key considerations and characteristics must be considered (Stallings, 1997):
• bandwidth (note that the higher the bandwidth, the higher the data rate)
• transmission impairments
• mutual interference
• cost
• ease of installation
• geographic scope supported
• maximum speed of communication supported.
During the ensuing discussion, we will have an opportunity to make observations
about each of these criteria.
5.2.1 Twisted Pair
Twisted pair comes in two varieties:
1. UTP
2. STP.
UTP is the least expensive of twisted pair wiring. Office buildings come pre-wired
with a lot of excess 100 ohm voice grade UTP wires. Because twisted pair easily
bends around corners and is commonly located near office desks, this medium is
both readily available and easy to install. However, UTP is susceptible to
considerable interference from external fields and picks up a lot of noise as well.
To improve the performance of UTP, STP was developed. This medium shields each
pair of twisted wire using a metallic sheath to reduce interference. This improves
performance, but is more expensive than UTP. Also, it is not as easy as UTP to bend
around corners.
The initial aim with voice-grade UTP media was to provide service at 1 to 16 Mbps, which was adequate and suitable for the applications extant at that time.
However, since that time, users have migrated to higher bandwidth applications, and
their requirements of the LAN have moved up considerably. Typically, 100 Mbps
Fast Ethernet and 1000 Mbps Gigabit Ethernet have become necessary, so ways had
to be found to upgrade LAN performance to these levels. To deal with these new
requirements, three types of UTP cable were initially standardized jointly by the
Electronics Industry Association (EIA) and the Telecommunications Industry
Association (TIA) within their joint EIA/TIA-568 standard. The three types of cable
are:
1. Cat(egory) 3
2. Cat(egory) 4
3. Cat(egory) 5.
Along with the cables and the associated hardware, the speeds supported by these
media include:
• Cat 3 – up to 16 MHz
• Cat 4 – up to 20 MHz
• Cat 5 – up to 100 MHz.
Note that the ability of these categories of UTP to support different types of LAN transmission will depend on the signalling method used by different LANs. For instance, consider an encoding that carries 6 data bits in every 4 signal elements. In this case, a Cat 5 cable will support a transmission rate of (6/4) x 100 = 150 Mbps.
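The arithmetic above can be sketched directly; the assumption, matching the text's example, is an encoding that carries 6 data bits in every 4 signal elements on 100 MHz cable:

```python
# Data rate from signalling rate: if an encoding carries `data_bits` in
# every `signal_elements` symbols, the data rate is the signalling rate
# scaled by that ratio.
def data_rate_mbps(signal_rate_mhz, data_bits, signal_elements):
    return signal_rate_mhz * data_bits / signal_elements

print(data_rate_mbps(100, 6, 4))   # 150.0, the text's Cat 5 example
```

The same formula covers schemes that expand rather than compress: 4B5B on a 125 MHz link, for instance, yields (4/5) x 125 = 100 Mbps.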
5.2.2 Coaxial Cable
Coaxial cable consists of a pair of conductors, but is constructed differently to allow
the operation of a broader spectrum of frequencies.
Within a coaxial cable is a concentric pair of conductors, with the inner conductor
providing a single wire surrounded by a dielectric insulator. The insulator is in turn
covered by the second hollow conductor ring. The second ring conductor is protected
by a jacket, which forms a shield. Because of the shielding, coaxial cable is less
susceptible to noise and interference than twisted pair. Greater distances are possible
as well, as is the support for more attached stations. For example, coaxial cable
supports hundreds of Mbps over transmission distances of 1 km. Although coaxial
cable is more expensive than STP, it provides greater capacity. However, coaxial
cable is less flexible than twisted pair when it must bend at the point of connection.
5.2.2.1 Coaxial Adapters
This device connects an existing (thin) coaxial cable bus network (up to 29 stations)
with a wire hub. The maximum network span is 100 m. The purpose of the adapter is
to connect the BNC T-connector used with 10BASE-2 to the UTP cable whose other
end is attached to the wire hub.
Essentially, the coaxial adapter is a two-port repeater connecting one 10BASE-T port and one thin coaxial BNC port (10BASE-2). Through its use, a thin coaxial cable (200 m length and 29 stations) can be integrated into a 10BASE-T network without any modification to the existing coaxial network infrastructure.
The 5-4-3 rule applies to each member of the 10Mbps Ethernet family. As a
refresher, the rule indicates that data frames can traverse a maximum of three (3)
populated segments, four (4) repeater hops, and five (5) total segments. Any segment
with a workstation represents a populated segment.
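The 5-4-3 rule lends itself to a small validity check; the path representation here is our own:

```python
# 5-4-3 rule: along any path, at most five segments, four repeaters,
# and three populated segments.
def path_ok(segments):
    """segments is a list of booleans: True if the segment has workstations."""
    total = len(segments)
    repeaters = total - 1            # one repeater joins each pair of segments
    populated = sum(segments)
    return total <= 5 and repeaters <= 4 and populated <= 3

print(path_ok([True, False, True, False, True]))          # True
print(path_ok([True, True, True, True, False, False]))    # False (6 segments)
```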
5.2.3 Optical Fibre Cable (RFC 2127)
OFCs are made of three possible substances, in order of decreasing cost and
performance:
• ultra pure fused silica
• multicomponent glass
• plastic fibre.
An OFC consists of three concentric rings:
• core
• cladding
• jacket.
The core is the central section composed of many fibres. Each fibre is surrounded by
a cladding consisting of either a glass or plastic coating. The outermost ring is
composed of plastic-like materials to protect the cladded fibres.
OFC is being used for long-haul telecommunications, as well as in LAN
environments. The ongoing improvements in technology and the dropping cost to
manufacture are making OFC more and more popular as an alternative medium for
LANs. The key advantages for the use of optical fibre include a significantly high
bandwidth, which permits a much higher data rate than obtainable on copper-based
media, its micro compact size, and immunity to electronic interference from external
sources.
OFC systems cover both the infrared and visible spectra. The different types of OFC
technologies include step index multimode, single mode, and graded index
multimode. Once a light pulse enters an OFC, it behaves according to the physical
properties of the core and cladding.
In multimode step index fibre, some light rays bounce off the cladding at different angles and continue down the core, whereas others are absorbed by the cladding. This type of OFC supports data rates up to approximately 200 Mbps over distances of up to 1 km.
By gradually decreasing the refractive index of the core away from its centre, reflected rays are focused along the core more efficiently, yielding data transmission rates of up to 3 Gbps over many kilometres. This type of optical fibre is referred to as graded index multimode.
The last type of optical fibre narrows the core still further, so that only one mode propagates through the core. This type of fibre is referred to as single mode and is the most expensive, as well as the best performing. Lasers can be coupled to single mode fibre, permitting extremely high data rates over long distances.
5.2.3.1 Fibre Optic Technology
A significant enhancement to the 10BASE-T Ethernet technology is Fibre Optic
Repeater Link (FORL). Transmission of data occurs along dual fibre cable (1 for
transmission, 1 for reception). OFC technology enables the support of multiple
Ethernet segments at distances of up to 2 km. Using a fibre transceiver, one can
connect remote stations, connect a wire hub and a fibre hub, and support multiple
stations (Davidson, 1992).
When an optical transceiver is used on a wire hub, one can connect to dual fibre
cable. Basic OFC devices are briefly described here mainly for the sake of
completeness and continuity of the discussion.
5.2.3.1.1 Optical Transceiver
This device consists of electronics and circuitry that translate ON and OFF indicators
into the presence and absence of light signals, which in turn are mapped to an
encoding scheme.
5.2.3.1.2 Fibre Hubs (Perlman, 2001)
Fibre hubs consist of many FORL ports, one AUI port, and one or more 10BASE-T ports. The FORL ports link fibre hubs to fibre adapters, or fibre adapters to fibre NAUs in PCs.
5.2.3.1.3 Fibre Adapters
A fibre adapter is a media conversion device, translating between coaxial and optical
fibre. The fibre adapter extends the transmission distance between a wire hub and a
station from 100 m to 2 km, with an adapter required at each end of the fibre link,
unless a station is directly connected to a fibre hub. When attached to a fibre hub, the
distance separation is 2 km. When attached to a wire hub, the maximum transmission
distance is reduced to 15 m. When attached to a PC’s NAU, the separation is again 2
km.
5.3 AN EXCURSION INTO THE ETHERNET FAMILY
In this section, we look at the various members of the Ethernet family, including the
10 Mbps LAN series of standards, 100 Mbps Fast Ethernet, 1000 Mbps Gigabit
Ethernet, and the 10 Gigabit network. The common IEEE notation for these network
solutions is:
<Data Rate in Mbps><Signalling method><Maximum segment length in hundreds of metres>
Therefore, using this notation, a 10BASE-2 network represents a 10 Mbps baseband network with a nominal maximum segment length of 200 m. In actuality, the maximum length of a 10BASE-2 segment is 185 m, but the IEEE nomenclature rounds this up to 200 m.
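The naming convention can be parsed mechanically; this small sketch (the parser and its field names are our own) treats a numeric suffix as a segment length in hundreds of metres and anything else as a media designator:

```python
# Parse IEEE designations of the form <rate><signalling>-<suffix>,
# e.g. "10BASE-2", "10BROAD-36", "100BASE-TX".
import re

def parse_designation(name):
    m = re.fullmatch(r"(\d+)(BASE|BROAD)-(\w+)", name.replace(" ", ""))
    if not m:
        raise ValueError(name)
    rate, signalling, suffix = m.groups()
    segment_m = int(suffix) * 100 if suffix.isdigit() else None
    return {"rate_mbps": int(rate), "signalling": signalling,
            "nominal_segment_m": segment_m, "suffix": suffix}

print(parse_designation("10BASE-2"))
# {'rate_mbps': 10, 'signalling': 'BASE', 'nominal_segment_m': 200, 'suffix': '2'}
print(parse_designation("10BASE-T")["nominal_segment_m"])   # None (media letter)
```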
5.3.1 10 Mbps LAN (Held, 2000)
There are five 10 Mbps LANs standardized by the IEEE:
• 10BASE-5
• 10BASE-2
• 10BASE-T
• 10BROAD-36
• 10BASE-F.
5.3.1.1 10BASE-5
The media cable used for this network is 50 ohm coaxial cable, which means less
interference (low noise) and less reflections. The data rate supported is 10 Mbps. The
maximum segment length is 500 m. 10BASE-5 is extensible using repeaters, with a
maximum of four repeaters between any two stations on the LAN based on the
previously mentioned 5-4-3 rule. The maximum LAN length is itself 2.5 km. The
topology is a bus structure resulting in stations contending for access to the bus. The
cable diameter is 10 mm and, at most, 100 nodes are permitted per segment. The
maximum number of nodes per network is 1024 and the maximum node spacing is
1000 m. The cable and wire type is more formally a 50 ohm thick coaxial cable (commonly designated RG-8), with N-type connectors.
5.3.1.2 10BASE-2 (Stallings, 1997)
A 10BASE-2 network represents a less expensive and less capable LAN in
comparison to the 10BASE-5. In a 10BASE-2 network, the electronics are attached
to the station without an AUI. Although this network uses a 50 ohm bus topology
cable with the same data rate of 10 Mbps as a 10BASE-5 network, there are
significant differences between the two. First, the 10BASE-2 cable diameter is 5 mm,
which means it is significantly thinner than the cable used in a 10BASE-5 network.
Because of this, the cable used in a 10BASE-2 network is sometimes referred to as
thinnet or cheapnet cable. Because of the thinner cable, the maximum segment length
is 185 m, with up to 15 m allowed for the tap, resulting in a segment length of 200 m.
The 10BASE-2 network permits a maximum network span of 1000 m, with 30 nodes
per segment allowed, as well as 1024 maximum nodes per network. Minimum and
maximum node spacing on a 10BASE-2 network are 0.5 m and 200 m, respectively.
The maximum number of segments is three.
Note that 10BASE-5 and 10BASE-2 can be interconnected. However, because
10BASE-2 is less noise resistant, unexpected problems can occur when these two
types of network are bridged. The type of cable used for a 10BASE-2 network is more formally referred to as RG-58 thin coaxial cable, with BNC connectors used to attach to the cable.
5.3.1.3 10BASE-T (Martin & Chapman, 1989)
The 10BASE-T LAN is a twisted wire hub centric network that permits the use of the
prewired installed base of UTP cable in most organizations. Because the 10BASE-T
network is hub based, the topology can be viewed as a star.
The data rate of a 10BASE-T network is again 10 Mbps, and the encoding is
Manchester. The maximum link length is 100 m. 10BASE-T is interoperable with
10BASE-2 or 10BASE-5 networks. The maximum number of coaxial cable segments
in a path between stations is three. The maximum number of repeaters is four and the
maximum number of segments is five. Thus, this follows the 5-4-3 rule previously
mentioned in this chapter.
5.3.1.4 10BROAD-36
As previously noted, 10BROAD-36 uses radio frequency modems and does not
encode data digitally. Because this technology is largely superseded and does not use
digital encoding, it will not be discussed further.
5.3.1.5 10BASE-F
The 10Mbps version of Ethernet that operates over optical fibre is referred to as
10BASE-F. This network provides many benefits associated with the use of optical
fibre and can be implemented in three versions:
• 10BASE-FP – a star topology linking stations and repeaters at up to 1 km per link.
• 10BASE-FL – point-to-point connections of either stations or repeaters up to
2 km.
• 10BASE-FB – defines a point-to-point link that can be used to connect
repeaters at up to 2 km.
The media for all three versions of 10BASE-F is an optical fibre pair, with encoding
occurring using Manchester encoding. 10BASE-FP supports 33 stations per star.
5.3.2 Fast Ethernet (100 Mbps) (Johnson, 1996)
Fast Ethernet provides a low-cost Ethernet networking capability at a data rate of 100
Mbps. This version of Ethernet uses the same frame format as Ethernet 802.3 LANs,
as well as the same MAC protocol. The topology is a star, because it is based on the
use of hubs.
Figure 5.1 illustrates what we refer to as the Fast Ethernet tree and indicates the
different versions of this networking technology, as well as the different media the
versions support. As indicated in Figure 5.1, Fast Ethernet comes in three basic
versions:
• 100BASE-TX – the topology is a star, and the encoding method is MLT-3.
The media used are either two pairs STP or two pairs Cat 5 UTP. The
maximum segment length (link) is 100 m. The total network span is 2500 m.
• 100BASE-FX – the topology is a star, with data encoded using 4B5B with NRZI, carried over two optical fibres, one for each direction.
• 100BASE-T4 – the topology is a star, with the encoding being 8B6T. It is the most popular version of Fast Ethernet as it supports the use of four pairs of either Cat 3 or Cat 5 UTP cable.
[Figure: 100BASE-T branches into 100BASE-X (100BASE-TX over 2 pairs of Cat 5 UTP or 2 pairs of STP; 100BASE-FX over 2 optical fibres) and 100BASE-T4 over 4 pairs of Cat 3 or Cat 5 UTP.]
Figure 5.1 The Fast Ethernet Tree
Each version of Fast Ethernet uses the same MAC protocol (CSMA/CD) and framing as its 10 Mbps cousins. The only difference between the two families concerns the framing envelope at 100 Mbps: Fast Ethernet uses special codes referred to as starting and ending delimiters. Because those delimiters are ignored by network adapters, the resulting framing is considered to be the same as on a 10 Mbps Ethernet network.
One key difference between 100 Mbps Fast Ethernet and its 10 Mbps cousins concerns the 5-4-3 rule: it does not apply to Fast Ethernet. Cable distance is restricted to a maximum of 100 m, and without optical fibre technology the maximum distance between nodes is 205 m. If two Fast Ethernets are connected, the distance between the networks is restricted to a maximum of 5 m. Thus, Fast Ethernet networks cannot be cascaded.
There are two key aspects to selecting the architecture of a Fast Ethernet network:
• backbone operation
• switch segmentation.
5.3.2.1 Backbone Operation
In this context, a 100 Mbps Fast Ethernet hub is used with autoconfiguring 10/100
Mbps ports to connect two or more 10BASE-T networks. Unfortunately, this
scenario results in one big collision domain, without any performance improvement
for users on the horizontal axis.
If some users are relocated from the 10BASE-T hub to the backbone hub, those users can avail themselves of the 100 Mbps rate in the backbone. This is done, in particular, when servers are placed on the backbone.
5.3.2.2 Switch Segmentation
For true performance improvement, replace the 100BASE-T hub with a LAN switch, which functions broadly as described elsewhere, providing N/2 simultaneous connections, each at 100 Mbps, where N represents the number of ports on the switch.
You can use a switch to provide connectivity with both departmental servers and
individual stations. The former are attached directly to the switch and are serviced at
100 Mbps, whereas the latter are placed on 10 Mbps LAN segments. This design
technique provides additional bandwidth as appropriate to departmental servers,
whereas the total bandwidth is shared on the 10 Mbps LAN segments.
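The N/2 figure quoted above is easy to evaluate directly. The sketch below is ours (the function name is not from any vendor API); it simply computes the aggregate bandwidth a non-blocking switch can carry when its ports pair off into simultaneous conversations.

```python
def switch_aggregate_mbps(n_ports: int, port_rate_mbps: int = 100) -> int:
    """Aggregate bandwidth of a non-blocking switch: N ports can form
    at most N // 2 simultaneous full-rate connections."""
    return (n_ports // 2) * port_rate_mbps

# A 16-port 100 Mbps switch can carry 8 simultaneous conversations.
print(switch_aggregate_mbps(16))       # 800
print(switch_aggregate_mbps(24, 100))  # 1200
```

By contrast, a shared hub with the same port count offers only the single 100 Mbps medium to all stations combined.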
5.3.3 Gigabit Ethernet (1000 Mbps) (RFC 2128)
Gigabit Ethernet is a LAN technology that allows for transmission at data rates of
1000 Mbps. Figure 5.2 illustrates the architecture of a Gigabit Ethernet LAN, with a
1 Gbps LAN switch servicing three 100 Mbps Fast Ethernet hubs, while a server or
perhaps a server farm is supported directly at a 1 Gbps data rate.
Figure 5.2 Server/ Switch Connection
Gigabit Ethernet uses the same CSMA/CD MAC protocols as Fast Ethernet and
Ethernet, which makes all three interoperable.
There are four physical layers supported by the Gigabit Ethernet series of standards. Those standards and the physical media they use are summarized below:
• 1000BASE-LX – 1300 nm laser on single/multimode fibre
• 1000BASE-SX – 850 nm laser on multimode fibre
• 1000BASE-CX – short-haul copper twinax STP cable
• 1000BASE-T – long-haul copper UTP.
Data transferred to a specific type of Gigabit media are encoded using either 8B/10B
or 4D PAM5. This is important to know from a design standpoint as it has a bearing
on the data rates.
Figure 5.3 Gigabit Ethernet Architecture
The basic Gigabit Ethernet architecture is depicted in Figure 5.3. Encoded data are
transmitted to the MAC layer, with the exception of the UTP Gigabit Ethernet where
a special GMII (Gigabit Media Independent Interface) is defined to connect the
physical and MAC layers.
The GMII is an interface that provides 1-byte-parallel receive and transmit as a chip-to-chip synchronous interface. Below it, the physical layer is divided into three sublayers:
• Physical coding sublayer (PCS) – provides a uniform interface to the
reconciliation layer for all physical media. 8B/10B encoding is used. Carrier
sense and collision detection are functions of the PCS. It also supports the
autonegotiation process for NICs.
• Physical medium attachment (PMA) sublayer – provides a medium-
independent means for the PCS to support various serial bit-oriented physical
media.
• Physical medium dependent (PMD) sublayer – maps the physical medium to
the PCS. It defines the physical layer signaling for various media.
A 1 Gbps LAN switch does not require a change to the MAC protocol, because full-duplex operation is available. When Gigabit Ethernet is used on shared LAN interconnecting devices, however, an enhancement to the basic CSMA/CD scheme is required. This takes two forms (Held & Jagannathan, 2004):
• carrier extension
• packet bursting.
Carrier extension represents a way of maintaining the IEEE 802.3 minimum and
maximum frame sizes plus decent cabling distances.
Figure 5.4 illustrates the Gigabit Ethernet frame format, including carrier extension. Carrier extension consists of nondata extension symbols appended as padding within the collision window to ensure a minimum frame length of 512 bytes. The entire padded frame is considered for collision detection, whereas only the original data, without the extension, are used for frame check sequence (FCS) error checking.
Figure 5.4 Gigabit Ethernet with Carrier Extension
Legend: SFD = Start of Frame Delimiter; FCS = Frame Check Sequence.
Note that carrier extension wastes bandwidth: many extension symbols may be transmitted per unit of meaningful information sent. Small packets may require up to 448 pad bytes.
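The 448-byte figure follows directly from the 512-byte collision window and the classic 64-byte minimum frame. A minimal sketch (names are ours, not from any standard):

```python
SLOT_BYTES = 512       # Gigabit Ethernet collision window (carrier extension)
MIN_FRAME_BYTES = 64   # classic IEEE 802.3 minimum frame size

def extension_bytes(frame_len: int) -> int:
    """Nondata extension symbols appended so that the carrier event
    spans at least the 512-byte collision window."""
    return max(0, SLOT_BYTES - frame_len)

# A minimum-size 64-byte frame needs the full 448 pad bytes noted above.
print(extension_bytes(MIN_FRAME_BYTES))  # 448
print(extension_bytes(512))              # 0
```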
With packet bursting, by contrast, a burst of packets is sent. The first of these is padded as before; subsequent packets are sent back-to-back without extensions, subject to the interframe gap and the burst time limit. This procedure substantially improves throughput.
Figure 5.5 shows how packet bursting works.
Figure 5.5 Gigabit Ethernet with Packet Bursting
5.3.4 10 Gigabit Ethernet
As the demand for high-speed networks continued to grow, a need for faster Ethernet
technology became apparent. In early 1999, the IEEE 802.3 committee chartered the
High Speed Study Group (HSSG) to standardize what has come to be known as 10
Gigabit Ethernet. Some of the objectives of the HSSG included:
• support for 10 G Ethernet at 2 to 3 times the cost of Gigabit Ethernet
• maintain earlier frame formats
• meet IEEE 802.3 functional requirements
• provide compatibility with IEEE 802.3 flows
• media independent interface
• full-duplex only
• speed independent MAC
• support for star LAN topologies
• support for existing and new cabling infrastructure.
Some of the benefits of 10 Gigabit Ethernet include a low-cost solution for enhanced
bandwidth and faster switching. Because there is no need for fragmentation,
reassembly or address translations, and switches are faster than routers, using 10
Gigabit as a backbone technology provides a mechanism to remove bottlenecks with
a scalable upgrade path.
Figure 5.6 10 Gigabit Ethernet Architecture
Figure 5.6 illustrates the architecture of 10 Gigabit Ethernet. Similar to other
versions of Ethernet, several layers are defined, some specific to the use of different
types of media. Let us examine some of those layers:
• MAC layer – provides a logical connection between its own MAC client and that of its peer station. Functions include initializing, controlling and managing the connection.
• Reconciliation sublayer – acts as a command translator. It maps MAC layer
terms and commands into the electrical formats for the physical layer.
• 10 Gigabit media independent interface (10GMII) – functions as the standard
interface between the MAC layer and the physical layer.
• PCS sublayer – codes and encodes data to and from the MAC sublayer. No
standard encoding scheme is defined for this layer.
• PMA sublayer – serializes code groups into bit stream suitable for serial bit-
oriented physical devices and vice versa.
• PMD sublayer – responsible for signal transmission. Amplification,
modulation and wave shaping functions are performed by this sublayer.
Different PMD devices support different media.
• Media dependent interface (MDI) sublayer – references a connector. The
sublayer defines different connector types that attach to different media.
• Physical layer architecture – there are two structures for the physical layer implementation of 10 Gigabit Ethernet:
o serial implementation
o parallel implementation.
The former uses one high-speed (10 Gbps) PCS/PMA/PMD circuit block, whereas the latter uses multiple blocks at lower speeds. Figure 5.7 depicts these two arrangements.
Currently, the preferred media adopted for 10 Gigabit Ethernet is optical fibre.
Because of its high data rate, it is doubtful that a copper version of the technology
can be developed.
Figure 5.7 10 Gigabit Ethernet Serial and Parallel Implementations
5.4 LAN ETHERNET DESIGN
The first step in the Ethernet LAN design process is to select a concentrating device, such as a wire hub, bridge, LAN switch, or router; all are available at speeds from 10 Mbps to 1 Gbps. The next step is to select a transmission medium (and encoding scheme) appropriate to the chosen speed, from 10 to 1000 Mbps.
Let us first consider a wire hub and router model, as illustrated in Figure 5.8 where
an I/O port on the wire hub enables an extended 10BASE-T network, basically a star
topology (extended).
Figure 5.8 Traditional Hub and Router Campus Network
What are the advantages of 10BASE-T? In comparison to coaxial cable, it is less expensive and more flexible; there is an extensive installed base of spare wiring; and, being point-to-point, any breakage impacts only one user.
When using 10BASE-T, connectivity to other types of IEEE 802.3 LANs, such as
10BASE-5 and 10BASE-2, occurs via AUI on the hub. There is one AUI port per
wire hub, which is used to connect to other IEEE 802.3 networks. Figure 5.9 depicts
the interconnectivity between 10BASE-T and 10BASE-5 networks.
Figure 5.9 Interconnecting 10Base-T and 10Base-5 Networks
The next step in designing LANs is depicted in Figure 5.10, wherein Layer 2
switching is used in the core, distribution and access layers. There are four
workgroups attached to the access layer switches. Router X connects to all four
virtual LANs (VLANs). Layer 3 switching and services are concentrated in Router X
as well. Enterprise servers are connected logically to Router X. Router X is typically
referred to as a “Router on a Stick”, serving many VLAN connections.
Figure 5.10 Campuswide VLAN Design
5.4.1 Campuswide VLANs with Multilayer Switching (Cisco Systems, 2000)
This type of networking structure makes it possible for configured stations to
relocate to a different floor or even a different building, e.g., a mobile user plugs a
laptop into a different LAN port in a different building. Such a situation is typically
handled by the use of a VLAN Trunking Protocol (VTP) and is illustrated in Figure
5.11.
Figure 5.11 Multilayer Switching
We can now see how a computer on a coaxial-cable-based network connects to a wire hub. On a coaxial network, the NIC is connected via a transceiver cable to the transceiver on the LAN. When a coaxial-cable NIC is instead attached to a wire hub, an AUI adaptor interfaces the NIC with the UTP cable that connects to the wire hub. Figure 5.12 illustrates the use of an AUI adaptor so that such a NIC can be connected to a wire hub port using UTP cable.
Figure 5.12 Connecting a Coaxial Cable NIC to a Wire Hub
5.5 SWITCHES REVISITED (Perlman, 2001)
Switches are a fundamental aspect of most networks. They allow source and target nodes to communicate over a network at full rate, without slowing each other down, by supporting multiple simultaneous connections. Switches have been discussed in detail elsewhere and are revisited here briefly for the sake of continuity.
The following problems were observed with a hub-based network configuration.
(Zhang, 1989) provides a historical perspective.
5.5.1 Scalability, Latency, Global Effect of Failures and Collisions
By adopting the schematic configuration in Figure 5.13, it becomes possible to
alleviate some of the problems associated with a hub-based network configuration.
Switches alleviate many hub-based problems by dedicating bandwidth to individual
unicast communication paths or connections. So, if there are N ports on the switch,
with each connection at 10 Mbps, the switch itself delivers N/2 x 10 Mbps to the
configuration.
Figure 5.13 Hubs and Switches
5.5.2 Encoding Schemes
In concluding this chapter, we briefly review LAN encoding schemes, as they determine how a given data rate can be achieved on a particular medium that supports a given signaling rate. Various common LAN encoding schemes are described below in order of increasing complexity.
5.5.2.1 Nonreturn to Zero Level
NRZ-L is perhaps the simplest encoding method (see Figure 5.14). Under this scheme, a high voltage indicates a one bit, and the absence of voltage indicates a zero bit. Because two or more successive set (or nonset) bit positions require external clocking to be differentiated, this coding would be expensive to use on a LAN.
5.5.2.2 Nonreturn to Zero Invert on 1s
Under an NRZ-I encoding scheme, one maintains a constant voltage pulse per bit
time. Data is encoded as the presence or absence of a transition at the beginning of a
bit time. A transition at bit start signifies a binary 1. No transition is a binary 0. This
is a case of differential coding. Its main benefit is that it may be more reliable to
detect a transition in the presence of noise than to compare voltage threshold values.
Another consideration is that it is easy to lose the polarity of the signal. Because of the problems with NRZ - in particular the loss of synchronization across long runs of identical bits, which leads to a drift in timing and ensuing corruption of the signal - other coding schemes were introduced.
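The NRZ-I rule above - a transition at the start of a bit time for a 1, no transition for a 0 - can be sketched in a few lines. This is an illustrative encoder of ours (not from any library), with line levels represented as 0/1:

```python
def nrzi_encode(bits, start_level=0):
    """NRZ-I (differential): invert the line level at the start of a
    bit time to encode a binary 1; hold the level to encode a 0."""
    level, out = start_level, []
    for b in bits:
        if b == 1:
            level ^= 1  # a transition encodes a binary 1
        out.append(level)
    return out

print(nrzi_encode([1, 0, 1, 1, 0]))  # [1, 1, 0, 1, 1]
```

Note that a long run of 0s produces no transitions at all, which is exactly the synchronization weakness discussed above.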
5.5.2.3 Manchester
In Manchester coding there is a transition at the middle of each bit period. A high-to-
low transition is a binary 0, whereas a low-to-high transition is coded as a binary 1.
Notice that this coding method provides a self-clocking function: every bit period contains a transition, so individual bits within runs of set or nonset bits can be distinguished.
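A sketch of the Manchester rule just described, using the same 0/1 level representation as above (per the convention in the text, low-to-high encodes a binary 1; each bit becomes two half-bit levels):

```python
def manchester_encode(bits):
    """Manchester: every bit period has a mid-bit transition.
    Low-to-high encodes a binary 1; high-to-low encodes a binary 0."""
    out = []
    for b in bits:
        out += [0, 1] if b == 1 else [1, 0]
    return out

print(manchester_encode([1, 0, 1]))  # [0, 1, 1, 0, 0, 1]
```

Because every bit period contains a transition, the output is self-clocking regardless of the input pattern, at the cost of doubling the signaling rate.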
5.5.2.4 Differential Manchester
Under Differential Manchester encoding, the mid-bit transition is used only for clocking. The presence of a transition at the beginning of a bit period means a binary 0; no transition at the beginning of a bit period means a binary 1.
Figure 5.14 Some Basic Encoding Schemes
Both Manchester and Differential Manchester coding techniques are popular for LANs. They are sometimes called biphase codes, because there may be as many as two transitions per bit time. The maximum modulation rate is thus twice that of NRZ, which in turn requires more bandwidth.
The advantages of Manchester and Differential Manchester coding include synchronization (based on the transitions), no DC component, and a built-in error detection capability, because noise would have to invert the signal both before and after a transition, which is unlikely.
5.5.2.5 4B/5B-NRZ-I
Under 4B/5B, data are encoded four bits at a time, with every four bits translating into five code bits; the efficiency is thus 80 percent. For synchronization, each code bit is then treated as a binary value and further encoded with NRZ-I. This scheme lends itself to optical fibre transmission.
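The mapping and the 80 percent efficiency can be illustrated with a few entries of the code table as commonly tabulated for FDDI/100BASE-X (the partial table below is illustrative, not the full standard, and the function name is ours):

```python
# A few entries of the 4B/5B code table: each 4 data bits map to a
# 5-bit code group chosen to guarantee enough transitions for the
# subsequent NRZ-I stage to stay synchronized.
FOUR_B_FIVE_B = {
    '0000': '11110', '0001': '01001', '0010': '10100', '0011': '10101',
    '0100': '01010', '0101': '01011', '0110': '01110', '0111': '01111',
}

def encode_4b5b(bits: str) -> str:
    """Encode a bit string (length a multiple of 4) nibble by nibble."""
    return ''.join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))

coded = encode_4b5b('00000001')
print(coded)                        # '1111001001'
print(len('00000001') / len(coded)) # 0.8, i.e. the 80 percent efficiency
```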
5.5.2.6 MLT-3
Under MLT-3, three signal levels are used:
• a positive (+ve) voltage
• no voltage
• a negative (-ve) voltage.
The steps involved are:
• If the next input bit is 0, the next output value is unchanged.
• If the next input bit is 1, there is a transition:
o if the preceding output was +ve or -ve, the next output is 0
o if the preceding output was 0, the next output is nonzero, and opposite in sign to the last nonzero output.
Note that this in turn implies that if the signaling rate is one-third of the operating rate, a baud rate of 33.33 MHz will support a LAN at 100 Mbps.
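The transition rules above can be captured in a short illustrative encoder (ours, not a library routine; output levels are represented as -1, 0 and +1):

```python
def mlt3_encode(bits, start=0, last_nonzero=-1):
    """MLT-3 per the steps above: hold the level on a 0; on a 1, step
    through the cycle ... +1, 0, -1, 0, +1 ... (go to 0 after a nonzero
    level, otherwise to the opposite sign of the last nonzero level)."""
    level, out = start, []
    for b in bits:
        if b == 1:
            if level != 0:
                last_nonzero, level = level, 0
            else:
                level = -last_nonzero
        out.append(level)
    return out

# A run of five 1s walks through the full cycle; a run of 0s holds level.
print(mlt3_encode([1, 1, 1, 1, 1]))  # [1, 0, -1, 0, 1]
print(mlt3_encode([0, 0, 0]))        # [0, 0, 0]
```

Note how four consecutive 1s are needed for one full signal cycle, which is why MLT-3 concentrates the signal energy at a small fraction of the bit rate.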
5.5.2.7 8B/10B
8B/10B encoding is popularly used in Fibre Channel and Gigabit Ethernet. Under this technique, each eight bits of data translates into ten bits of output. 8B/10B was developed and patented by IBM; it is more powerful than 4B/5B in terms of transmission features and error detection. Figure 5.15 illustrates an example of the generic mB/nB encoding, mapping m source bits into n output bits.
Figure 5.15 8B/ 10B Encoding
Note the use of a functionality called disparity control, which addresses an excess of 0s over 1s, or vice versa. An excess in either direction is called a disparity. If one exists, the disparity control block complements the 10-bit block to redress the problem.
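A much-simplified sketch of the complementing idea just described follows. This is not the actual 8B/10B algorithm (which selects between precomputed alternate code groups from a fixed table), and the bit pattern in the example is illustrative only, not a valid 8B/10B code group:

```python
def select_code_group(block: str, running_disparity: int = -1):
    """Simplified disparity control: if a 10-bit block's 1s/0s imbalance
    would push the running disparity further from zero, transmit the
    complemented block instead, keeping the line DC-balanced."""
    d = block.count('1') - block.count('0')
    if d != 0 and (d > 0) == (running_disparity > 0):
        block = ''.join('1' if c == '0' else '0' for c in block)
        d = -d
    return block, running_disparity + d

# A 1s-heavy block sent after positive running disparity is complemented.
print(select_code_group('1110101100', running_disparity=+1))
# After negative running disparity the same block passes unchanged.
print(select_code_group('1110101100', running_disparity=-1))
```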
6. BLACK BOX CONGESTION CONTROL

6.1 THE BASIC PROBLEM
In one picture of the Internet, we see a constellation of switching elements (a.k.a.
routers/switches/ISs) [IS = Intermediate Systems]. So there are a number of
potentially different paths from one ingress host to another egress target host. This is
the key notion to appreciate: a host at one end of the Internet communicating with
another host on the “other side” of the Internet via packets of information sent across
links connecting a sequence of routers. “Flow control” is the consideration that a
very fast or “aggressive” ingress end-host should not overwhelm a lower-capacity
egress end-host which would then be forced to drop fast arriving packets which it
cannot process at their incoming speeds. This then is flow control, an end-host to
end-host consideration. “Congestion control”, by contrast, brings to bear a slightly different consideration: can the ingress host overwhelm the network itself (the Internet)? Can a host inject packets into the network so fast that in-between routers are unable to cope with the volume of incoming packets, and so are forced to drop some? This is “congestion”, and we need provisions to deal with it. As will be seen, there are two
aspects to this problem (Clark & Fang, 1998; Clark & Tennenhouse, 1988; Clark,
1988; Anjum & Tassiulas, 1999; Bajko et al. 1999):
• Given a pair of end-hosts in communication, regardless of the innards of the
routers in the network, how is it possible to optimize network usage? This
may be called the Black Box approach to congestion management. Examples
are TCP Tahoe, TCP Reno and TCP Vegas. We also introduce a novel Black
Box mechanism, tentatively called “Sierra”. Black Box congestion control is
the subject of this chapter (Balakrishnan et al. 1999; Jain, 1990).
• The related Inter-Network counterpart, White Box modeling, then attempts to
optimize the innate workings of the (sequence of) routers (switching
elements) that connect the communicating hosts. Examples are FIFO, Tail
Drop, and the RED family (Charny et al. 1995) of routers. This White Box
approach is out of scope for this thesis. (Bennett & Zhang, 1996; Athuraliya
et al. 2001)
We observe here that we are in the “Unicast” situation, where one end-host is
communicating with one other end-host across the Internet. In our arrangement, one
first optimizes the network usage, then proceeds to optimize the network itself
(internally). Black Box followed by White Box. By this method, one gets an optimal
solution to the congestion management problem itself. The terminology of “Black Box” and “White Box” models was first introduced by van Jacobson (1988), one of the pioneering figures in the field of congestion control, but it would appear that his overtures have not been generally followed through to a logical conclusion in the literature since (Clark et al. 1988; Benmohammed & Meerkov, 1993; Morris, 1999 provide a general treatment).
6.2 THE BLACK BOX APPROACH DESCRIBED
Fundamental to all the known Black Box models (except perhaps TFRC [TCP
Friendly Rate Control] and Sierra) is the notion of a congestion “window”. The
window is believed to encapsulate the innate “capacity” of the network to deal with
incoming packet flows between pairs (Unicast) of communicating end-hosts. Realize
that the congestion window is a key design concept for most Black Box models. For
a general discussion see (Tanenbaum, 2003; Stevens, 1997).
The congestion window value is based on the so-called Delay & Bandwidth product,
i.e.,
Window = Delay x Bandwidth
In this equation, Delay is the time (RTT) taken for packets to be delivered and acked
from ingress to egress. And Bandwidth is the number of packets that the network can
deliver per unit time to the egress. If one sent a single packet and waited for an ack, one could send at most one packet per RTT. However, if one discharged a window's worth of packets, one can send that many packets per RTT (Cruz, 1987; Keshav & Morgan, 1997; Kalambi et al. 2002; Morris, 1997). Also see (Wang, 1999; Wang & Crowcroft, 1991) for an interesting discussion.
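The Window = Delay x Bandwidth computation is straightforward. The sketch below uses hypothetical numbers of our own choosing, simply to make the units concrete:

```python
def congestion_window_packets(rtt_s: float, bandwidth_pps: float) -> float:
    """Window = Delay x Bandwidth: the number of packets that must be
    'in flight' to keep the path full for one round-trip time."""
    return rtt_s * bandwidth_pps

# A hypothetical path: 100 ms RTT, delivering 1250 packets/s
# (roughly 10 Mbps of 1000-byte packets) is kept full by 125 packets.
print(congestion_window_packets(0.1, 1250))  # 125.0
```

A window smaller than this leaves the path idle part of the time; a larger one queues packets inside the network, which is precisely what congestion control tries to avoid.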
But it remains to be seen if this is the best way to do the Black Box model, i.e., in
terms of the macro notions of Delay and Bandwidth. Most of the well known
congestion control protocols in existence are window-based and assume that Delay
and Bandwidth are key descriptors of network flow. In the sequel, as promised, we
will look at some alternatives to window-based congestion control, most notably:
• TFRC (TCP Friendly Rate Control) (Mahdavi & Floyd, 1997), and
• Sierra (introduced in Jagannathan & Matawie, 2005).
6.3 TCP TAHOE AND RENO
TCP Tahoe and Reno were introduced by van Jacobson as window-based, end-to-end congestion management schemes.
Tahoe operates in three stages: slow-start, congestion avoidance and timeout. In slow-start, the window increases by one unit for every acknowledgement received, so the window effectively doubles (multiplicatively) each round-trip time. When TCP Tahoe starts up, it enters slow-start and stays there till a loss occurs (under the presumption that a loss is almost always caused by congestion overflow at in-between router buffers). [This loss is detected only after a timeout while waiting for an acknowledgement.] When loss is detected, TCP Tahoe sets Ssthresh (the slow-start threshold) to cwnd/2 and then sets cwnd to 1. After a timeout the system remains in slow-start till cwnd reaches Ssthresh, at which instant the system enters congestion avoidance. In congestion avoidance TCP slowly, additively, increases the window size with each ack as follows:
cwnd ← cwnd + 1/cwnd
This has the effect that, after receiving a full window of acks, the window has
increased by exactly 1. This is additive increase. Note that on the other hand there is
multiplicative decrease in window size, because Ssthresh decreases multiplicatively
from cwnd. This process is sometimes called AIMD [Additive Increase &
Multiplicative Decrease] (Chiu & Jain, 1989; Vojnovic et al. 2002).
If a retransmitted packet is lost, Tahoe calls up the exponential retransmit timer
backoff algorithm. This algorithm dictates that with each successive retransmission
of a packet, TCP should double its timeout value. This procedure allows TCP flows
to send at far less than 1 packet per RTT, so that many flows are able to share a
bottleneck without loss of stability. When a packet is finally transmitted successfully,
TCP Tahoe returns to slow-start with cwnd = 1.
SS:
cwnd += 1 per ack, and so
cwnd += cwnd per window of acks
CA:
cwnd += 1/cwnd per ack, and so
cwnd += 1 per window of acks
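The per-window rules above yield the familiar cwnd trajectory. The following loss-free sketch (function name ours) steps the window once per RTT: doubling while below Ssthresh, then adding one segment per round:

```python
def tahoe_window_trace(rounds: int, ssthresh: float, cwnd: float = 1.0):
    """Per-RTT evolution of the window: multiplicative growth in
    slow-start (SS) up to ssthresh, then additive growth in congestion
    avoidance (CA). Losses and timeouts are not modelled here."""
    trace = [cwnd]
    for _ in range(rounds):
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
        trace.append(cwnd)
    return trace

print(tahoe_window_trace(6, ssthresh=16))  # [1, 2, 4, 8, 16, 17, 18]
```

The sharp change of slope at Ssthresh is the SS-to-CA transition described above.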
One problem with TCP Tahoe is that it regresses too much in response to an isolated (single-packet) loss. Such a loss may indicate transient burstiness rather than persistent overload, so a less stringent backoff may be posited.
van Jacobson fixed this problem rather quickly by introducing TCP Reno with its
two new procedures:
• Fast retransmit
• Fast recovery.
Continuing our discussion of AIMD above, consider the following alternative behaviour (Floyd et al. 1999) for controlling congestion, introduced, as noted, because Tahoe is too stringent after timeouts.
Duplicate Acks (discussed below in the section on Acking and Ack clocking) are sent that mention only the last correctly received segment. When 3 duplicate Acks <k> are received, this is regarded as a symptom of congestion due to the non-receipt of packet <k+1>, which is then retransmitted. The receiver, when it receives <k+1>, sends a cumulative Ack acknowledging all packets received after <k+1>, which then do not have to be re-transmitted. This behaviour is called “Fast Retransmit”. In Reno, Fast Retransmit is coupled with another related behaviour, “Fast Recovery”, that
overcomes Tahoe’s harsh penalty of dropping cwnd to 1 after congestion is detected.
This behaviour works as follows, once congestion is detected:
Ssthresh = cwnd / 2
cwnd = Ssthresh (Thus re-enter CA, not SS).
This is Fast Recovery.
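Fast Retransmit and Fast Recovery together can be summarized in one small sketch. This is a simplification of ours (real Reno also inflates cwnd by one segment for each additional duplicate Ack while in recovery, which we omit):

```python
DUP_ACK_THRESHOLD = 3  # third duplicate Ack triggers fast retransmit

def on_duplicate_ack(dup_count: int, cwnd: float, ssthresh: float):
    """On the third duplicate Ack: retransmit the presumed-lost segment
    (fast retransmit), halve the window and resume in congestion
    avoidance (fast recovery), rather than Tahoe's reset to cwnd = 1."""
    retransmit = False
    if dup_count >= DUP_ACK_THRESHOLD:
        retransmit = True
        ssthresh = cwnd / 2
        cwnd = ssthresh  # re-enter CA, not SS
    return retransmit, cwnd, ssthresh

print(on_duplicate_ack(3, cwnd=32, ssthresh=64))  # (True, 16.0, 16.0)
print(on_duplicate_ack(2, cwnd=32, ssthresh=64))  # (False, 32, 64)
```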
A further discussion of these concepts may be found in (Stevens, 1997). One
problem with Reno is manifest when multiple packets are lost within a single
window (despite fast retransmit/recovery). One solution to this, New Reno, was
offered in (Floyd & Henderson, 1999), and is discussed below after Ack / Ack
clocking.
6.4 ACK’ING AND ACK CLOCKING
Recall that the congestion window cwnd determines how many packets can be
launched into the network, pending acknowledgement. Packets launched into the
network are qualified by the following key settings in the TCP header:
• Sequence #
• Ack #.
Each segment of data sent into the network has a certain length and consists of a
certain number of bytes. Each byte has a “sequence #”, with the seq # of a segment
being the seq # of the first byte of that segment. In other words
Seq # <2> = Seq # <1> + Len [seg <1> ]
Seq # <k> = Seq # <k-1> + Len [seg<k-1>]
If IP fragments a segment, it places the same, segment specific < IDENTIFICATION
# > for all packets in that segment. This enables the egress node to successfully
reassemble fragmented segments (Floyd & Fall, 1998; Floyd & Jacobson, 1991).
At the destination, the receiver acknowledges received segments. Here there are two
mechanisms.
Assume that the source TCP has sent out the following segments with their
associated Seq #’s :
<Seqk> , k = 1,2,3,4, etc.
Suppose that segment <k> is “lost” en route due to congestion. Then there are two behaviours possible at the egress TCP. Every successfully re-assembled segment is
acknowledged by an Ack # bearing its Seq #. If a segment is not fully received and
assembled, the target is silent, and eventually the sender times out and re-transmits
the lost segment(s). For a leisurely discussion of TCP/IP as relating to our problem,
see (Held, 2002; Blake et al. 1998).
Consider the following scenario. Segment <k> is lost, whereas segments <k+1> and
<k+2> arrive intact. The latter two are not acknowledged by the receiver and so the
sender TCP does not know that they have been delivered. After segment <k> times
out it is re-transmitted. So segments <k>, <k+1> and <k+2> arrive out of order at the
destination. The next Ack from the egress then specifies cumulatively that up to
segment <k+2> have been received, so (unless it has timed out <k+1> and <k+2> as
well) the sender is aware that they have been delivered.
There is yet another variant here: why not Ack out-of-order segments as well, instead of being silent when an earlier segment is not fully received? This is partly because of the TCP header, which has provision only for a PAR (positive acknowledgement with retransmission), i.e., if Ack # = N, all segments up to and including N have been successfully received. TCP can thus handle only one problematic segment per window. If there is more than one problematic segment per window, TCP's RTT timer value backs off exponentially.
This simple observation leads to two mechanisms referred to in the literature as
SACK (Selective ACK) and D-SACK (Duplicate SACK). SACK notes which
segments have been successfully received (instead of silence when an earlier
segment is in error), and therefore which were not received, so that the sender is able
to proactively determine and transmit precisely the missing segments, rather than
having to wait for timeouts and re-transmitting the entire window again. Having
motivated these concepts, SACK and D-SACK are treated in more detail in the
sequel.
6.5 TCP NEW RENO
Timeouts affect TCP performance in two ways. Firstly, a flow has to wait for the timeout to occur and cannot send data in that period. Secondly, after a retransmission timeout occurs, cwnd goes back down to 1. These facts can adversely affect the flow's throughput and performance (Floyd, 1999; Floyd & Henderson, 1999).
In Reno, partial Acks cause an exit from Fast Recovery, which results in a timeout in the case of multiple segment losses. In New Reno, when a partial Ack is received at a sender, it does not come out of Fast Recovery. The presumption instead is that the segment immediately after the most recently acked one was lost, and hence the lost segment is re-transmitted. Therefore, when multiple segments are lost, New Reno does not wait for a retransmission timeout and continues to re-transmit lost segments every time a partial Ack is received. In New Reno, then, Fast Recovery starts when 3 dup-Acks are received and terminates either when a re-transmission timeout occurs or when an Ack arrives acknowledging all the data outstanding when Fast Recovery began. Partial Acks deflate the congestion window by the volume of newly acknowledged data, then add one segment and re-enter fast recovery.
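New Reno's partial-Ack handling, as just described, amounts to the following sketch (ours; the window is expressed in units of segments for simplicity):

```python
def on_partial_ack(cwnd: float, newly_acked_segments: int):
    """New Reno on a partial Ack: deflate cwnd by the amount of newly
    acknowledged data, add back one segment, retransmit the
    presumed-lost next segment, and remain in fast recovery."""
    cwnd = cwnd - newly_acked_segments + 1
    retransmit_next = True  # segment after the acked one is presumed lost
    return cwnd, retransmit_next

# A partial Ack covering 4 segments deflates a 20-segment window to 17.
print(on_partial_ack(20, newly_acked_segments=4))  # (17, True)
```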
6.6 SACK AND D-SACK
The philosophy behind SACK is to transcend TCP's limitation of handling at most one problematic segment per window of data. What SACK does is have the receiver use some of the “Options” fields in the TCP header to “selectively” lay out to the sender exactly which segments have arrived successfully (and so which have not), so that the sender can then send precisely those segments.
There have been a number of protocols using the SACK idea. However, we believe that SACK has not been widely standardized in the Internet, probably because of a lack of consensus on how the Options field is to be utilized. Also not universally agreed is how re-transmission is avoided for SACK'ed segments in the sender's re-transmission queues (Floyd et al. 2000). Also see (Jacobson & Braden, 1988) in this connection.
As with other end-to-end mechanisms, the SACK method requires both end-points to
concur that SACK is being used, i.e., both ends are SACK compatible. For one
possible implementation of SACK, see (Mathis et al. 1996).
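The essential gain from SACK - computing exactly which outstanding segments to retransmit - can be sketched as follows. Representing SACK blocks as (first, last) segment-number ranges is our simplification (real SACK options carry byte-sequence ranges):

```python
def missing_segments(highest_sent: int, sack_blocks):
    """Given SACK information -- here, inclusive (first, last) ranges of
    segment numbers the receiver reports as received -- the sender can
    determine precisely which segments to retransmit."""
    received = set()
    for first, last in sack_blocks:
        received.update(range(first, last + 1))
    return [k for k in range(1, highest_sent + 1) if k not in received]

# Receiver SACKs segments 1-2 and 4-6 out of 1..7: 3 and 7 are missing.
print(missing_segments(7, [(1, 2), (4, 6)]))  # [3, 7]
```

Without SACK, the sender would learn only the cumulative Ack (here, 2) and might retransmit segments 4-6 needlessly after a timeout.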
D-SACK (Duplicate-SACK) combines SACK information from the receiver with
additional data acknowledging receipt of duplicate segments, which (the duplicate
segments) are thus identified for the sender. Then the sender (at most once per
window) takes corrective action to remediate the re-transmission by “un-halving”
Ssthresh, the Slow Start threshold, and reverting to either Slow Start or Fast
Recovery. D-SACK can be useful in environments with persistent re-ordering of
packets. D-SACK is also discussed in (Mathis et al. 1996).
6.7 FACK
Forward Acknowledgements (FACK) also attempt to solve the problem of better
recovery from multiple losses. FACK derives its name from the protocol keeping
track of the highest sequence number for correctly received data. (Mathis &
Mahdavi, 1996)
With FACK, TCP registers two more variables:
• F_ack : for the most forward segment acknowledged by the receiver using
SACK
• Re_data : for the total amount of outstanding, re-sent data in the network.
With these variables, it is possible to calculate the amount of outstanding data during
recovery as
F_ack - Re_data
FACK TCP moderates this value (total outstanding data in network) within one
segment of the congestion window cwnd. The latter itself is constant during fast
recovery. Also the F_ack variable is used to more readily trigger fast retransmit
(Floyd & Jacobson, 1991).
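The transmit decision during recovery, under the formulation above, can be sketched as follows. This is a hedged illustration: the variable names (`mss` and the rest) are assumptions, and a real FACK implementation derives the outstanding data from snd.nxt, snd.fack and the retransmitted data.

```python
def fack_outstanding(f_ack, re_data):
    """Outstanding data during recovery, per the formulation above:
    most-forward SACKed point minus total re-sent data in the network."""
    return f_ack - re_data

def may_transmit(f_ack, re_data, cwnd, mss):
    # FACK keeps the outstanding data within one segment (mss) of cwnd,
    # which itself stays constant during fast recovery.
    return fack_outstanding(f_ack, re_data) < cwnd + mss
```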
6.8 LIMITED TRANSMIT
It has been observed in many places in the literature that in the archetypal scenario of
the Internet:
• 56% of all retransmissions are due to timeouts, and
• only 44% are due to triple duplicates.
This observation, coupled with the fact that timeouts are so expensive, has led a
number of researchers to propose the Limited Transmit (LT) (Allman et al. 2000)
mechanism.
LT works very simply, by allowing a sender to send new segments after each of 2
duplicate Acks, instead of waiting for 3 duplicate Acks.
By this method, it is noted that:
• over 25% of the above (56% timeouts) can be avoided.
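The Limited Transmit rule itself is tiny. The sketch below (with assumed bookkeeping names, not taken from any particular implementation) shows the decision made on each duplicate Ack:

```python
def on_duplicate_ack(dupacks, flightsize, rwnd):
    """Limited Transmit decision on receipt of one duplicate Ack.
    On the first and second duplicates, send a previously unsent
    segment (if the receiver window permits) without touching cwnd;
    the third duplicate still triggers fast retransmit."""
    dupacks += 1
    if dupacks >= 3:
        return dupacks, "fast_retransmit"
    if dupacks in (1, 2) and flightsize + 1 <= rwnd:
        return dupacks, "send_new_segment"
    return dupacks, "wait"
```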
6.9 TCP VEGAS
The Vegas algorithm is also a window-based, TCP Protocol (Brakmo & Peterson,
1995). It operates in a manner using variation in round trip delays to sense and
counter congestion in the Internet. The basic paradigm is to increase the congestion
window size when delays decrease, and to decrease it when delays rise. Vegas is
further developed in (Low et al. 2001; Mo et al. 1999).
The algorithm works as follows:
• Set two control (design) parameters, α and β.
• Calculate the expected throughput, E, where
E = [Current window size] / [BaseRTT]
where BaseRTT is the smallest RTT (Round Trip Time) seen by the source
up to time t. This is estimated to be the real propagation delay.
• Calculate the Actual throughput, A, where
A = [ # acks received ] / RTT
[by sending a distinguished control packet, and computing RTT as the difference
between the Ack reception time and the transmission time for the distinguished
control packet]
During this RTT, count the total number of Acks received.
• Use the control equations
o α ≤ E – A ≤ β ⇒ Leave Window unchanged
o α > E – A ⇒ W = W + 1
o β < E – A ⇒ W = W – 1
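One Vegas congestion-avoidance step, following the control equations above, can be sketched as follows (window measured in segments; `acks_in_rtt` is the Ack count taken over the distinguished RTT):

```python
def vegas_update(cwnd, base_rtt, rtt, acks_in_rtt, alpha, beta):
    """One TCP Vegas window update, a sketch of the control equations."""
    expected = cwnd / base_rtt      # E: expected throughput
    actual = acks_in_rtt / rtt      # A: actual throughput
    diff = expected - actual
    if diff < alpha:                # alpha > E - A : grow the window
        cwnd += 1
    elif diff > beta:               # beta < E - A : shrink the window
        cwnd -= 1
    return cwnd                     # otherwise leave it unchanged
```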
Whereas it is difficult to directly interpret the dynamics of TCP Vegas in terms of the
more traditional congestion control algorithms presented earlier, we later (Chapter 5)
offer a simple calculation of a lower bound to the throughput offered by TCP Vegas.
We then compare this with the throughputs offered by Tahoe/Reno and Sierra. Also
in the simulation exercise presented in the appendix, we compare, both individually
and jointly, the throughputs offered by all the three major congestion control
algorithms considered in this thesis, i.e., Reno, Vegas and Sierra. We then are able to
compare the relative merits (in terms of throughput) of the algorithms (Ahn et al.
1995; Boutremans & Le Boudec, 2000).
6.10 SIERRA
Sierra was alluded to in (Jagannathan & Matawie, 2004). Here we present more
details.
Sierra uses three basic parameters to optimize network usage:
• RTTs (Round Trip Times)
• Ro,α (T) (Rate, o = egress/output, α = connection)
• Ri,α (T) (Rate, i = ingress/input, α = connection).
Unlike in earlier analyses, we do not perform any exponential smoothing of these
parameters. The key contention is:
• Neither Timeouts, nor Duplicate Acks, nor RTT fluctuations are ideal descriptors
of network congestion.
• Use instead RTT measurements coupled with information about ingress/egress
rates to monitor network congestion.
One reason why Timeouts, D-Acks and RTT+/- are not used is because there can be
potential re-routing within the Net, causing these parameters to fluctuate.
So, how do we use RTT, Ro,α (T), and Ri,α (T) to achieve improvement in
performance?
The basic Sierra algorithm can be summarized in a simple manner:
Ro,α (T + RTT(T)/2) < Ri,α (T) ⇒ Ri,α (T + ∆T) = Ri,α (T) - ∆
Ro,α (T + RTT(T)/2) ≥ Ri,α (T) ⇒ Ri,α (T + ∆T) = Ri,α (T) + ∆
The key issues are therefore:
Q: What is the granularity of time intervals?
A: This is dictated by the averaged out Round Trip Times as measured at the egress.
Q: What is the granularity of Packet Sizes?
A: This is dictated by the maximal screen window that we wish to convey.
With Sierra there is no need to acknowledge every packet, as with conventional TCP
congestion control algorithms. Specifically
Ro,α (T + RTT(T)/2) > Ri,α (T) ⇒ No losses or congestion.
Sierra’s policy regarding Acks is to use the Ack space within the TCP header to carry
information about:
• RTT(T)
• Ro,α (T) and
• Ri,α (T).
In a nutshell, here is how Sierra works.
At T = To the ingress sends R packets, each timestamped with the sending epoch, i.e.,
1,2,3…
Shortly afterwards, this window arrives at the egress, where the following are
computed using an averaging process:
• Ro,α (To)
• RTT (0)
These are averaged out within a window of ∆o, which is the maximum jitter we are
able to tolerate. These two parameters are sent back to the ingress, using several
control packets (to guard against loss of control information).
At the ingress, the following computations are performed:
• T1 = T0 + RTT(0)
• Ri (T1) = Ri (T0) +/- ∆
According as
Ro (T0) >/< Ri (T0) [where >/< means greater or lesser than]
The process continues as shown in the table below, with
• Tk = Tk-1 + RTT(k-1)
• Ri (Tk) = Ri (Tk-1) +/- ∆
According as
Ro (Tk-1) >/< Ri (Tk-1) [ where >/< means greater or lesser than]
As the process continues, the following questions arise:
1. How to determine Ro (Tk) and RTT(k)? Our suggested averaging process is more
robust than using raw RTT fluctuations, which, as mentioned, are subject to
re-routing variations.
2. What if control information is lost? How much redundancy to be built in?
3. What about Sierra’s “slow” starting?
4. What is ∆ ?
These are design questions that will be tackled later, using simulation variations.
What QoS guarantees can Sierra commit to communicating ESs?
Firstly, from a theoretical perspective, as mentioned, maximally efficient network
utilization. Given a TCP connection, it is not possible to use the Black Box network
more efficiently. Sierra starts with a slow start like mechanism, dubbed “Sierra
Quick Start” (SQS) to attain near optimal ingress rates. After that it’s a question of
using the control equations to maintain the sending rate at a high level of stability.
Here is the Sierra algorithm in detail:
SCAM (Sierra Congestion Avoidance Method)
1. Send Ro Bytes around I = (To - ∆o , To + ∆o).
2. Mark 2N of these as control packets associated with Epoch 0. These are to be
spread evenly around I. We assume that N of these will be lost. So the egress
waits till N of these are received.
3. At the egress, using N control packets, compute RTTo.
4. Also at the egress, compute Ro (0) by summing over Epoch 0 packets over RTTo
+/- ∆o.
5. Send this information, again using 2N control packets, back to the ingress.
6. At ingress, using N control packets (N lost), compute RTTi .
7. Set RTT = RTTo + RTTi.
8. Apply the control equation for future epochs.
9. Note that key Sierra design parameters are
a. What is Ro ?
b. What is ∆o ?
c. What is ∆ ?
d. What is N ?
We suggest, in the first instance,
a. Ro = 1.5 MBps
b. ∆o = 5 ms
c. ∆ = 10 KB
d. N = 5.
N.B. The SCAM algorithm itself builds into “Sierra Quick Start (SQS)” which itself
works as follows:
SQS (Sierra Quick Start)
• Identify the optimal level of service desired from the network. Call this Ro
(bytes/sec).
• Set Ri (0) = Ro.
• Launch Ri (k = 0) packets within a ∆o window.
• At egress, as in SCAM, compute Ro (k = 0) as well as RTTo (k = 0). Send this
information back to the ingress.
• At ingress, compute RTTi (k = 0). Set RTT(0) = RTTi + RTTo.
• If Ri (0) > Ro (0), then Ri (1) = Ri (0) / 2.
• By induction,
o If Ri (k) > Ro (k) then Ri (k+1) = Ri (k) / 2.
• When Ri (n) ≤ Ro (n)
o exit SQS
o proceed to SCAM.
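Putting SQS and SCAM together, the rate dynamics reduce to a halving phase followed by additive ±∆ steps. The sketch below abstracts the egress feedback loop into a callable `r_out_measured`, which is an assumption for illustration:

```python
def sqs(r_in, r_out_measured):
    """Sierra Quick Start: halve the ingress rate until it no longer
    exceeds the measured egress rate, then hand over to SCAM."""
    while r_in > r_out_measured(r_in):
        r_in /= 2
    return r_in

def scam_step(r_in, r_out, delta):
    """One SCAM epoch: additive increase when the egress keeps up,
    additive decrease when it does not."""
    return r_in + delta if r_out >= r_in else r_in - delta
```

A toy bottleneck of capacity C can be modelled as `lambda r: min(r, C)`; SQS then converges to the first halved rate at or below C.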
6.11 TCP FRIENDLY RATE CONTROL (TFRC)
It is well known that when TCP and UDP connections share a communications link,
UDP tends to hog the bandwidth, because it does not back off under congestion like
TCP does. AQM mechanisms like FRED tend to provide a fairer share of bandwidth
distribution between TCP and UDP, unlike the case of plain RED and Tail Drop
routers (Bennett et al. 1994; Karandikar et al. 2000; Lo Monaco et al. 2001).
A notion of TCP Friendly Rate Control has been proposed, whereby “a non-TCP
connection should receive the same share of bandwidth as a TCP connection, if they
traverse the same path” (Deb & Srikant 2003; Tsang & Wong, 1996).
Such friendly connections estimate the bandwidth of TCP connections and
subsequently regulate their own ingress rates (Bonomi & Fendick, 1995; Charny et
al. 1995).
TFRC has been developed in a number of investigations (Mahdavi & Floyd, 1997). It
uses a simple stochastic model of Internet behaviour to analytically model congestion
behaviour. In the final analysis, the method reduces to using a control equation that
limits the sending rate of an EBCC (TFRC) [Equation Based Congestion Control]
sender to the steady-state throughput (not goodput) of the connection. By capping the
sending rate of the ingress to a maximum of
λ (Throughput) = min( Wmax / RTT , s / [ RTT·√(2bp/3) + T0·min(1, 3·√(3bp/8))·p·(1 + 32p²) ] )
it becomes possible for connections to avoid the flapping behaviour of AIMD in
window sizes, thus maintaining a smoother behaviour.
In the above equation:
λ = the cap on the ingress sending rate, i.e., the steady state throughput, not goodput (bytes/sec)
Wmax = maximum window size
RTT = Round Trip Time
p = steady state loss event rate
b = number of packets acknowledged by a single Ack
T0 = TCP retransmit timeout value
s = packet size.
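A direct transcription of the control equation (assuming the standard TCP response function of Padhye et al., on which TFRC is based) might look like the following sketch:

```python
import math

def tfrc_rate_cap(s, rtt, p, t0, b=1, wmax=65535):
    """TFRC sending-rate cap in bytes/sec (a sketch of the standard TCP
    response function).  s: packet size (bytes), rtt: round trip time (s),
    p: steady-state loss event rate, t0: retransmit timeout (s),
    b: packets acknowledged per Ack, wmax: maximum window (bytes)."""
    if p <= 0:
        return wmax / rtt           # no observed loss: window-limited
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t0 * min(1.0, 3 * math.sqrt(3 * b * p / 8))
                  * p * (1 + 32 * p * p))
    return min(wmax / rtt, s / denom)
```

Higher loss rates p produce a lower cap, which is the smooth back-off behaviour the TFRC proposal aims for.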
It may be noted that the control equation above is only one particular model of the
stochastic behaviour of TCP connections and what it does is to maintain the sending
rate close to the “steady state” of Tahoe and Reno. Such “friendly” flows perform
TCP friendly rate control, by observing network parameters, and capping the sending
rate using the above formula. This approach has implications for transmitting
multimedia over the Internet (Et, 1994; Kung et al. 1994; Mishra & Kanakia, 1992).
We believe that our proposal, Sierra, arrives at a maximal upper bound for ingress
rates into the network, without presuming the validity of an underlying windowing
mechanism. As a result, it improves performance at least in terms of throughput, as
will be demonstrated. (Zhang & Ferrari, 1993, 1994).
6.12 MO-WALRAND ALGORITHM
This is an encouraging development from our point of view, as it clearly
differentiates the Black Box and White Box approaches to congestion control.
In (Mo & Walrand, 2000), the authors use a backlog estimator, not unlike Vegas, but
their (Mo-Walrand) connections constantly adapt their window size (MWcwnd),
proportionally with the separation from a target backlog. They demonstrate that their
scheme (Mo-Walrand scheme) is “proportionately fair” in the sense of Kelly. But we
observe at the outset that their scheme has some of the same problems as Vegas, due
to sub-optimal RTTD estimation.
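The window adaptation just described can be sketched as below. The Vegas-style backlog estimator and the gain constant are assumptions for illustration, not the authors’ exact scheme:

```python
def mw_window_update(cwnd, rtt, base_rtt, target_backlog, gain=0.1):
    """One Mo-Walrand-style step: estimate the packets this connection
    has queued in the path (as Vegas does) and move the window
    proportionally to its distance from a target backlog."""
    backlog = cwnd * (rtt - base_rtt) / rtt   # estimated queued packets
    return cwnd + gain * (target_backlog - backlog)
```

Note that an inflated base_rtt estimate (the standing-queue problem discussed above) biases the backlog estimate low, so the connection settles on too large a window.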
The authors observe that when connections arrive at a bottleneck which is
maintaining an ongoing queue, later arrivals sense a bloated RTTD so they then
adjust their window sizes to suffer a backlog between A and B, in addition to the
extant queue. If the system is not tampered with, the standing queue persists forever.
They suggest using RED to alleviate this, but this is against our two-pronged
philosophy at large, and can be suboptimal. (Ramakrishan et al. 1988, 1990, 1999)
Elsewhere we simulate the performance of the Mo-Walrand mechanism, in the
presence/absence of competing protocols.
6.13 PACKET PAIR
Packet Pair is an ingenious mechanism proposed by S. Keshav (Morgan & Keshav,
1999), and it works as follows.
Each source probes its available bandwidth in the network by transmitting
back-to-back pairs of probe packets and measuring the distance (separation) between
the resulting pairs of acknowledgements, while periodically adapting the send rate in a
manner as to avoid overflow/underflow at the bottleneck switch’s buffer. There is a
timeout and re-transmission mechanism to cope with losses. (Keshav, 1991, 1997)
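The core of the packet-pair estimate is that two back-to-back packets leave the bottleneck separated by the time needed to serialize one of them, and that spacing survives in the Ack spacing. A single-sample sketch (a real implementation filters many samples):

```python
def packet_pair_bandwidth(packet_size, ack_gap):
    """Bottleneck-bandwidth estimate (bytes/sec) from one probe pair:
    packet size divided by the inter-Ack spacing in seconds."""
    return packet_size / ack_gap
```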
Keshav has tested this approach with simulation experiments. These appear to show
that packet-pair sources are retransmit stable, at least up to 500% nominal offered
loads. A buffer management of “drop entire longest queue” appears to give packet-
pair a significant advantage over non-packet-pair sources in times of overload.
In summary, under congestion, packet-pair is stable and can manage with a fraction
of the buffer capacity at switches (vis-à-vis round trip window) and offers
asymptotically good throughput of 83-92% (83% for “drop longest entire queues”,
and 92% for “drop last packets”).
When packet-pair and non-packet-pair sources share a congested connection path,
the former appear to have a decisive advantage over the latter.
Elsewhere we evaluate the relative merits and demerits of packet-pair performance,
in the presence of competing mechanisms.
6.14 BALAKRISHNAN & SESHAN’S CONGESTION MANAGER
In (Balakrishnan et al. 1999) the authors offer a Congestion Manager (CM), which
works at the end systems, to manage congestion state between connections with the
same end to end path. CM leaves error control to higher layers.
CM has a congestion control protocol which works as follows:
Twice per RTT, the source end-system inserts a (probe) control packet into the net,
armed with a probe sequence number, and the number of bytes sent since the last
probe. If the target detects a difference in the number of bytes received vis-à-vis the
number reported in a probe, then they report a loss back to the source. Lost probes
are handled via the sequence number mechanism. As long as a probe is not reordered
with data packets, this CM mechanism works well to detect information loss. If a
probe is re-ordered, later probes will remediate the byte count, and thus CM recovers
from a false loss detection, by undoing the backoff.
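The receiver-side probe check can be sketched as follows (the function and field names are assumptions for illustration, not the CM API):

```python
def cm_check_probe(probe_seq, probe_bytes_reported, bytes_received,
                   last_probe_seq):
    """Check one Congestion Manager probe at the receiver.
    probe_bytes_reported: bytes the source says it sent since the last
    probe; bytes_received: bytes actually seen since the last probe."""
    if probe_seq <= last_probe_seq:
        return "duplicate_or_reordered_probe"
    if bytes_received < probe_bytes_reported:
        return "report_loss"        # some bytes went missing in transit
    return "no_loss"
```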
6.15 THE ‘GOODNESS’ OF ANY BLACK BOX SOLUTION
The following criteria may be applied:
• Throughput over time (a plot of the rate at which data is delivered, including
incorrect/unacceptable data).
• Goodput over time (we need to differentiate acceptable delivered data, versus
unacceptable data, for instance data within and outside of the jitter window. What
is the delivery rate of good data?).
• Fairness behaviour (for instance, when Sierra and Vegas share a bottleneck, how
is the bandwidth distributed between them? Similarly, when two Sierra
connections, with different design parameters, share a path, how is bandwidth
distributed?) (Anjum et al. 1999; Kelly, 2003; Boutremans et al. 2000; Hasegawa
et al. 2000).
• Steady state behaviours (limiting behaviour when the time factor becomes very
large) (Cruz, 1987).
• Stability (Bennett et al. 2001; Georgiadis et al. 1997; Jain 1989, 1995).
In this thesis, we have studied the first three aspects, namely:
• throughput
• goodput
• fairness.
Further study of steady state behaviour and long term stability, as well as the direct
effect of Sierra parameters (on performance), will be studied in future research.
7 STOCHASTIC MODELLING OF CONGESTION CONTROL ALGORITHMS
7.1 MOTIVATION
Consider an Internet traffic source sending packets into a single link connected to
(an)other source(s). The source’s “window” is the maximum number of packets
discharged without waiting for acknowledgement, at any point in time. Such a
window concept is the cornerstone of a number of congestion control algorithms,
such as Tahoe, Reno, New Reno, Vegas and Sierra (partly rate-based). All these are
briefly treated. We then develop quantitative models for the performance
(throughput) levels of all these algorithms. It is shown that Sierra is by far,
analytically, the most superior among all these algorithms. A goal of this chapter is
to introduce the neophyte in congestion management to the discipline and then be in
a position to contribute meaningfully to the literature. As such, all concepts
introduced are motivated and developed from the ground up.
7.2 INTRODUCTION
We begin the discussion by briefly describing the TCP/IP protocol stack, for the sake
of completeness and overview. This is followed by a quick treatment of some
popular congestion control algorithms, including TCP Tahoe/Reno, TCP Vegas and
our Sierra. Sierra was developed and elucidated in detail in a series of publications
by Jagannathan and Matawie. We have also included in this thesis all the various
details from these articles. Our position is that researchers can gain further insight
into the behaviour of TCP through the use of mathematical modeling. The basic
paradigm is the “buffer overflow model” which is the cornerstone of all our analysis
of key congestion control algorithms. Unlike, say, systems in modern physics, it is
true that all aspects of TCP and its progressive evolution in behaviour over time are
fully under our control. However, the sheer scale of TCP’s operation and domain is
tremendous, and is probably the very largest and most complex man-made control
system ever deployed. To attempt to “capture” any system of this magnitude, we
need mathematical models. Indeed there are, in the literature, a number of such
(complex) models. On the other hand, the tools we use in this chapter are relatively
simple in nature, and revolve around the buffer overflow model (Floyd & Jacobson,
1991,1993). We provide proofs on select occasions, for the sole purpose that these
techniques may be adapted and applied to the study of congestion management.
7.3 TCP/IP STACK OVERVIEW
The TCP/IP (Transmission Control Protocol/Internet Protocol) protocol offers a
structured, layered architecture for Internet communications, and has widely
superseded its open peer, the OSI (Open Systems Interconnection) protocol. Whereas
the OSI stack had 7 layers, TCP/IP has only 5 layers (Tanenbaum, 2004; Braden,
1998; Davidson, 1992). These five layers are
• Physical Layer
• Data Link Layer
• Network Layer
• Transport Layer
• Application Layer.
We briefly describe these layers now, for the sake of overview and completeness, to
place our further discussions in perspective.
The Physical Layer deals with the transmission of raw bits over a communications
channel. Here it is ensured that if a “high” bit is sent, then it is received as a “high”
bit, not a “low” bit. Such matters as voltage representation of 1s and 0s, the time
duration of individual bits, how an initial connection is established and then torn
down after the completion of the communication, are all matters that are handled
at the Physical layer. We are primarily concerned with the interfaces (mechanical,
physical and procedural) with the basic underlying physical transmission medium
underneath. (Black, 1998).
The Data Link Layer is concerned with all the protocols that collect a number of bits
into a “frame” and delivers that frame from one end of a communications channel to
the other. The pertinent issues here are medium access control as well as per hop
error detection and correction. Also of relevance is to keep a fast sender from
overwhelming a slow receiver with data. Frequently this “flow control” and error
management are integrated.
The Network Layer’s basic concern is the routing of data packets across the subnet
from source to destination. Routing can be either static (wired-in) or highly dynamic,
and determined afresh per new packet (Chapman & Kung, 1999; Deerin & Hinden,
1998).
A key design issue at this layer is addressing, which is used critically by the network
layer for packet delivery, but the latter may not be entirely reliable. Packet losses can
occur, especially when node buffers (at intermediate routers) overflow and source
transmission rates are greater than processing rates at in-between routers (Braden et
al. 1998; Coltum, 1999; Kousky, 2000; Ramakrishnan et al, 1987).
The Transport Layer’s critical function is to add reliability to the network layer’s
capability. Depending on the protocol used, the transport layer can detect and
retransmit lost packets, if that is necessary. Congestion control algorithms, as
initiated by van Jacobson (ca. 1988) operate at this level. Note that while the
predominant transport protocol in today’s Internet is TCP (Transmission Control
Protocol) - a connection-oriented, end to end, reliable protocol - there do exist other
candidates like UDP (User Datagram Protocol) which is used by some real-time
applications such as video transmission. UDP is a connection-less and non reliable
protocol (Kalampoukas et al. 1995; Kolarov, 2001; Clark et al. 1987; Crowcroft &
Oechslin, 1998; Lin & Kung, 1999; Fraser, 1983).
The Application Layer is the topmost layer and consists of such protocols as ftp, http,
etc., which utilize lower layers to transfer files or other data across the Internet.
7.4 COMMON CONGESTION CONTROL ALGORITHMS
The congestion control algorithms presented here all work at the Transport Layer.
Firstly we define “congestion”.
“Flow control” is the consideration that a very fast or aggressive ingress end host
should not overwhelm a lower-capacity egress end host, which would otherwise be
forced to drop/discard fast-arriving packets that it does not have the capability to
process. Flow control is thus an end to end consideration. “Congestion control”, by
contrast, brings to
bear a slightly different set of notions. Can the ingress host overwhelm the network
(Internet) itself? In other words, can a sending host introduce packets into the
network so fast that in-between routers are incapable of coping with the volume of
incoming traffic, and are forced to drop some packets? This is congestion and one
needs provisions to deal with it. (see for example, Jagannathan & Matawie, 2005 for
a leisurely discussion; Parekh, 1992, 1994).
We have proposed a two-pronged approach to managing (unicast) Internet
congestion.
1. Given a pair of end hosts in communication, regardless of the internals of in-
between routers, can we maximize network utilization? This we call the Black
Box approach to congestion management. Popular examples are Tahoe, Reno and
Vegas. Elsewhere (see the suite of publications by Jagannathan & Matawie) we
have proposed “Sierra”. The motif of this chapter is the quantitative comparison of
the performance of these algorithms. (Zhang et al. 1991).
2. The related Inter-Network counterpart of Black Box is White Box modeling,
wherein one attempts to optimize the internals of routers-in-the-path between
communicating hosts. Examples include FIFO, Tail-Drop and the RED Family.
(Bennett & Zhang, 1996; Braden et al. 1998).
Whereas this dichotomy of Black Box and White Box models was first introduced by
van Jacobson, a leading figure in the area of congestion control, in the late eighties,
his overtures have not been generally and readily adopted by the congestion
management community. (Vishweswariah & Heidemann, 1997).
In this chapter we have been concerned largely with the Black Box solutions and we
attempt to quantitatively compare the relative performance of Tahoe/Reno, Vegas
and Sierra. These were elaborated earlier in Chapter 6, but are revisited here for the
sake of continuity.
TCP Tahoe/Reno
Introduced in the late 1980’s by van Jacobson, TCP Tahoe was the precursor to many
congestion control algorithms, and has been remarkably successful in managing
“congestion collapse”. Tahoe, initially, very quickly and exponentially, probes the
available bandwidth to determine an optimal window size, denoted cwnd. This
process is (strangely) called Slow Start. Once the congestion window reaches the
value of a certain Slow Start threshold, Ssthresh, Slow Start terminates and yields
way to “Congestion Avoidance”, wherein the congestion window is increased
linearly. When an exception occurs (timeouts/triple duplicate) the window size is
halved. In the literature, this is often referred to as AIMD (Additive Increase &
Multiplicative Decrease) (Jain, 1990). As noted, Tahoe has spawned a number of
variants, such as:
• TCP Reno (Jacobson, 1988)
• TCP New Reno (Floyd et al. 1999)
• TCP SACK (Mathis et al. 1996).
For the purpose of our discussion in this chapter, we disregard the differences in the
way these Tahoe-inspired protocols respond to exceptions.
Slow Start: start with a window size of 1. Dispatch a window’s worth of packets and
for every ack received, increment the congestion window by 1, till the window
reaches the value of “Ssthresh”. Should an exception (Timeout/Triple Duplicate)
occur before then, halve Ssthresh, set the window to 1, and re-enter Slow Start. If the
window has reached Ssthresh without an exception, exit Slow Start and enter the
“Congestion Avoidance” phase.
The control equations for SS (Slow Start) and CA (Congestion Avoidance) are as
follows:
• SS:
cwnd = cwnd + 1, per ack
cwnd = 2 * cwnd per window of acks
• CA:
cwnd = cwnd + 1/ cwnd, per ack
cwnd = cwnd + 1, per window of acks
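The SS/CA control equations translate directly into per-Ack updates; the exception case follows the description above (halve Ssthresh, reset the window to 1). A minimal sketch, with cwnd in segments:

```python
def on_ack(cwnd, ssthresh):
    """Per-Ack window growth for Slow Start / Congestion Avoidance."""
    if cwnd < ssthresh:
        return cwnd + 1            # SS: doubles per window of acks
    return cwnd + 1.0 / cwnd       # CA: +1 segment per window of acks

def on_exception(cwnd, ssthresh):
    """Timeout / triple duplicate: halve Ssthresh, restart at cwnd = 1.
    The floor of 2 segments is a common safeguard, assumed here."""
    return 1, max(ssthresh / 2, 2)
```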
As observed earlier, different congestion control algorithms (Tahoe, Reno, New
Reno, and SACK) respond slightly differently when exceptions occur, i.e., they
reduce the window size in different ways. But, for the purposes of our quantitative
modeling, we disregard these differences and presuppose only the control equations
presented above.
TCP Vegas (Brakmo & Peterson, 1995; Ahn et al. 1995)
The Vegas algorithm is also a window-based TCP protocol. It operates in a manner
using the variation in round trip delays to sense and counter congestion in the
network. The point is that by measuring the fluctuations in RTT (Round Trip Time),
Vegas tries to estimate the number of packets queued in the routers for that
connection and tries to keep that number between certain limits.
Vegas will redefine the retransmission method as implemented in Reno by:
• reading/recording the system clock when a segment is sent and when an ack is
received, and uses this RTT information to make retransmit decisions.
• Duplicate acks received make Vegas retransmit if the segment took longer than
the estimated RTT, which is interpreted as a lost segment (no waiting for triple
duplicates).
• After 1 or 2 non-duplicate acks are received, again the time interval is checked. If
greater than the estimated RTT, a retransmit is done (no waiting for duplicate
acks).
Set two control (design) parameters, A and B.
Calculate the Expected Throughput, E, where
E = [Current Window Size] / [BaseRTT].
BaseRTT is the smallest RTT seen by the ingress, up to the time t. This is estimated
to be the real propagation delay.
Calculate the Observed Throughput, O, where
O = [# acks received] / RTT.
by sending a distinguished control packet, and computing RTT as the difference
between the ack reception time and the transmission time of that distinguished
control packet. During this RTT, count the total number of acks received.
Use these control equations:
• A ≤ E – O ≤ B ⇒ Leave window unchanged
• A > E – O ⇒ W ++
• B < E – O ⇒ W --
Also a modified Slow Start is incorporated, whereby Vegas allows exponential
growth of the window every other RTT instead of every RTT. In this way, it
becomes possible to compare Expected and Observed throughputs.
Whereas it is difficult to directly interpret the dynamics of TCP Vegas in terms of the
more traditional congestion control algorithms presented earlier, we are able to offer
a simple calculation of a lower bound to the throughput offered by TCP Vegas. We
will then compare this with the throughput delivered by Tahoe/Reno and Sierra.
Sierra (Jagannathan & Matawie, 2005)
Sierra is a simple, yet novel, Black Box algorithm that uses three basic parameters to
optimize network usage. Sierra is further developed in other work in progress.
• RTTs (Round Trip Times), introduced earlier
• ingress rates (Rin)
• egress Rates (Rout).
No exponential smoothing is performed.
Briefly, the control equations are, for SQS (Sierra Quick Start) and SCAM (Sierra
Congestion Avoidance Mechanism), as follows.
SQS :
Rin = Rin / 2
till Rin ≤ Rout.
SCAM :
Rin = Rin +1
If Rin ≤ Rout.
And
Rin = Rin – 1
If Rin > Rout.
We use this intuitively appealing approach to determine the average throughput in
the steady state for Sierra.
Further research is needed when we deviate from the single source - single link
model that was assumed in this thesis, to carry over to the multi-source and shared-
link model wherein many of these protocols compete for bandwidth, and notions
such as fairness (Hasegawa et al. 2000) etc. come into play. Whereas there have been
some such, if complex, analyses, the buffer overflow model should provide a simple
method for the relative comparison of competing algorithms.
7.5 A CONSOLIDATED APPROACH TO CONSTRAINED
OPTIMIZATION
7.5.1 Introduction
Lagrange multiplier preliminaries: (Bertsekas, 2003; Kunniyur & Srikant, 2000;
Kelly, 1979 to 2003):
Lemma 1: [see Bertsekas, 2003]
Let f : Rn � R
gi : Rn � R ( i = 1, 2,…,m )
hi : Rn � R ( i = 1, 2,….,l )
Consider the optimization problem:
Minimize f(x) such that
gi (x) ≤ 0 ( i = 1,2,………..,m)
hi (x) = 0 ( i = 1,2,………...,l)
x ∈ X, X open in Rn
Then ∇hi (x~) linearly independent ⇒
Fo ∩ Go ∩ Ho = φ
where
Fo = { d : ∇ f(x~)T d < 0 }
Go = { d : ∇ g(x~)T d < 0 ; i ∈ I }
Ho = { d : ∇ h(x~)T d = 0 ; i = 1,2,…l}
Theorem 2 (Fritz John Necessary Conditions) (Low & Lapsley, 1999):
Let the conditions of the Lemma obtain. Let x~ be a feasible solution and
I = { i : gi (x~) = 0 }
Let f, gi ( i ∈ I ) be differentiable at x~
Let hi ( i = 1,2,…..l) be continuously differentiable at x~
Also let gi ( i ∉ I ) be continuous at x~
If x~ is a local optimum then
∃ uo , ui (i ∈ I), vi ( i = 1,2,… l)
such that
uo ∇f (x~) + ∑ ui ∇gi (x~) + ∑ vi ∇hi (x~) = 0
with uo, ui ≥ 0
uo, ui, vi not all zero.
Proof :
If the ∇hi (x~) are linearly dependent then the solution is trivial.
Hence assume the ∇hi (x~) are linearly independent.
Let A1 = a matrix with rows ∇f(x~) and ∇gi(x~) [ i ∈ I ]
And A2 = a matrix with rows ∇hi(x~)
By the Lemma, the system
A1d < 0
A2d = 0
has no solution.
Consider
S1 = { (A1d, A2d) : d ∈ Rn } and S2 = { (z1, z2) : z1 < 0, z2 = 0 }
Note S1 ∩ S2 = φ and
S1 , S2 are convex.
So ∃ p non-zero, pT = ( p1T , p2T ), such that pT s1 ≥ pT s2 for all s1 ∈ S1, s2 ∈ S2.
⇒ p1T A1 d + p2T A2 d ≥ p1T z1 + p2T z2
⇒ Letting RHS → 0
We have
p1T A1 d + p2T A2 d ≥ 0 ∀ d
Let d = − ( A1T p1 + A2T p2 )
We then have,
− || A1T p1 + A2T p2 ||2 ≥ 0
⇒ A1T p1 + A2T p2 = 0
Letting
uo = p1(0)
ui = p1(i) ( i ∈ I )
vj = p2(j) ( j = 1,2,…,l )
the result follows.
QED.
Fritz John Sufficiency
Lemma 3 [see Bertsekas, 2003]
Let Fo, Go, Ho as above.
Then, Fo ∩ Go ∩ Ho = φ
And f pseudo-convex at x~ ,
gi (i ∈ I) strictly pseudo-convex over Nε (x~)
hi (i = 1,2,….,l) affine (convex and concave)
⇒ x~ a local minimum
Theorem 4:
Let x~ a Fritz John solution.
Let S = { x : gi (x) ≤ 0 ∀ i ∈ I } ∩ { x : hi (x) = 0, ∀ i = 1,2,…. l }
Further let hi (x~) affine (convex & concave), and
∇hi (x~) linearly independent, and there exists Nε (x~) such that
over S ∩ Nε (x~) we have
f pseudo-convex and
gi strictly pseudoconvex, then
⇒ x~ a local minimum for the problem
Proof:
Let Fo ∩ Go ∩ Ho ≠ φ, so
∃ d ∈ Fo ∩ Go ∩ Ho
We also have
uo ∇f (x~) + ∑ ui ∇gi (x~) + ∑ vi ∇hi (x~) = 0
uo ∇f (x~) d + ∑ ui ( ∇gi (x~)) d = 0
uo, ui ≥ 0, and d ∈ Fo ∩ Go ⇒ uo = 0 = ui
Hence,
vi ∇hi (x~) = 0
which contradicts the linear independence of the ∇hi
⇒ Fo ∩ Go ∩ Ho = φ
hi affine ⇒ d a feasible direction iff d ∈ Ho
Since gi is strictly pseudo-convex over S ∩ Nε (x~) we have
D = Go ∩ Ho
where D is the cone of feasible directions at x~
Hence, Fo ∩ D = φ
We have S ∩ Nε (x~) convex and f pseudo-convex at x~, hence
d = x – x~ ∈ D ∀ x ∈ S ∩ Nε (x~)
whence we must have
∇ f(x~)T d < 0
Hence if x~ is not a local minimum, we have
∃ direction d ∈ Fo ∩ D, which is a contradiction. Hence x~ is a local minimum. QED.
Lagrange-Kuhn-Tucker necessity
Theorem 5:
Let X ≠ φ be open in Rn, and let
f: Rn → R
gi: Rn → R ( i = 1,2,…,m )
hj: Rn → R ( j = 1,2,…,l )
The problem is to minimize f(x) such that
gi(x) ≤ 0
hj(x) = 0
x ∈ X
Let I = { i : gi(x~) = 0 }. Let further
f, gi (i ∈ I) be differentiable at x~,
gi (i ∉ I) continuous at x~, and
hj continuously differentiable at x~.
Also suppose the ∇gi(x~) (i ∈ I) and ∇hj(x~) are linearly independent. Then (Lemma 5)
x~ is a local minimum ⇒
∃ ui ( i ∈ I ), ∃ vj ( j = 1,2,…,l ) such that
∇f(x~) + ∑ ui ∇gi(x~) + ∑ vj ∇hj(x~) = 0
with ui ≥ 0
Proof :
This result follows readily from the Fritz John necessity conditions.
7.5.2 Lagrange-Kuhn-Tucker Sufficiency:
Theorem 1:
We address the same optimization problem. Suppose the LKT conditions hold at x~ ∈ X, and
• f is pseudo-convex over Nε(x~) ∩ S
• gi is strictly pseudo-convex over Nε(x~) ∩ S ( i ∈ I )
• hi is affine
• the ∇hi(x~) are linearly independent.
Then x~ is a local minimum.
Proof : Follows readily from Fritz John sufficiency.
Theorem 2:
We address the same optimization problem. If LKT hold at x~ ∈ X, let
J = { i: vi > 0 }
K = { i : vi < 0 }
Let
• f be pseudo-convex at x~
• gi quasi-convex at x~ ( i ∈ I )
• hi quasi-convex at x~ ∀ i ∈ J
• hi quasi-concave at x~ ∀ i ∈ K
Then x~ is a global minimum.
Proof:
∀ i ∈ I and x ∈ S, quasi-convexity of gi at x~ gives, for 0 ≤ λ ≤ 1,
gi( x~ + λ(x − x~) ) = gi( λx + (1−λ)x~ )
≤ max { gi(x), gi(x~) }
= 0
= gi(x~)
Hence,
∇gi(x~)ᵀ(x − x~) ≤ 0
Similarly,
∇hi(x~)ᵀ(x − x~) ≤ 0 ∀ i ∈ J
∇hi(x~)ᵀ(x − x~) ≥ 0 ∀ i ∈ K
Hence,
[ ∑ ui ∇gi(x~) + ∑ vi ∇hi(x~) ]ᵀ(x − x~) ≤ 0
From LKT, we then have
∇f(x~)ᵀ(x − x~) ≥ 0
⇒ f(x) ≥ f(x~) ∀ x ∈ S (by pseudo-convexity of f)
QED.
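As a concrete illustration of these conditions, consider the toy convex programme minimize f(x) = (x − 2)² subject to g(x) = x − 1 ≤ 0 (this example is ours, not from the sources above). Its optimum is x~ = 1, and the LKT multiplier can be recovered and checked numerically:

```python
# Toy LKT check: minimize f(x) = (x - 2)^2  s.t.  g(x) = x - 1 <= 0.
# The optimum is x~ = 1; stationarity requires f'(x~) + u g'(x~) = 0 with u >= 0.

def grad_f(x):
    return 2.0 * (x - 2.0)        # f'(x)

def grad_g(x):
    return 1.0                    # g'(x)

x_opt = 1.0
u = -grad_f(x_opt) / grad_g(x_opt)       # solve stationarity for u

assert u >= 0.0                                          # dual feasibility
assert abs(grad_f(x_opt) + u * grad_g(x_opt)) < 1e-12    # stationarity
assert abs(u * (x_opt - 1.0)) < 1e-12                    # complementary slackness
print("multiplier u =", u)               # u = 2.0
```

Here f is convex (hence pseudo-convex) and g is affine, so the sufficiency direction of the theorems above certifies x~ = 1 as the global minimum.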
7.5.3 Penalty Methods: (Freund, 2004)
Penalty methods replace a constrained optimization problem by a series of unconstrained problems that are more easily solvable. Solutions to the unconstrained problems in turn converge to a solution of the constrained problem. In this context we define
Definition: A function p(x) : Rn → R is called a penalty function for the usual
constrained optimization problem (with equality and inequality constraints) if
• p(x) = 0 if g(x) ≤ 0 and h(x) = 0
• p(x) > 0 if g(x) > 0 or h(x) ≠ 0
Definition: A penalty optimization programme is
P(k) : minimize f(x) + k p(x) over x ∈ Rn
We have the following
Penalty Convergence Lemma (without proof; see Bertsekas, 2000): Let f, g, h, and p be continuous functions. Let further { xk : k ∈ N } be a sequence of solutions to the penalty programmes P(ck), with ck → ∞. Then any limit point of { xk } is an optimal solution for the original constrained problem.
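A minimal numerical sketch of this scheme (the problem instance, step sizes and iteration counts are our own illustrative choices, not from the sources cited): minimize f(x) = (x − 2)² subject to x ≤ 1, using the quadratic penalty k·max(0, x − 1)² and plain gradient descent.

```python
# Penalty method sketch: minimize f(x) = (x - 2)^2 s.t. x - 1 <= 0 (true optimum x = 1)
# via the unconstrained programme P(k): minimize f(x) + k * max(0, x - 1)^2.

def solve_penalty(k, x0=0.0, iters=20000):
    x = x0
    step = 1.0 / (2.0 + 2.0 * k)          # safe step size for this quadratic
    for _ in range(iters):
        grad = 2.0 * (x - 2.0) + 2.0 * k * max(0.0, x - 1.0)  # d/dx [f + k p]
        x -= step * grad
    return x

for k in (1.0, 10.0, 1000.0):
    print(k, solve_penalty(k))    # minimizers (2 + k)/(1 + k): 1.5, 1.09..., ~1.001 -> 1
```

The unconstrained minimizers are infeasible for every finite k but converge to the constrained optimum as k → ∞, exactly as the lemma asserts.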
We now prove a key result for penalty methods. Let
p(x) = ∑ [ max { 0, gi(x) } ]^q + ∑ | hi(x) |^q
be a penalty function for the general constrained problem with equality and inequality constraints, so that
p(x) = θ( g+(x), h(x) ), where g+(x) = max { 0, g(x) }
Let ∂θ(y)/∂yi = 0 at yi = 0 ( i = 1,2,…,m ) (*)
Then p(x) is differentiable whenever the functions gi and hi are, and
∇p(x) = ∑ ∂θ/∂yi (g+(x)) ∇gi(x) + ∑ ∂θ/∂yi (h(x)) ∇hi(x)
Now let xk minimize f(x) + ck p(x), so that ∇f(xk) + ck ∇p(xk) = 0. Letting
uik = ck ∂θ/∂yi (g+(xk)) and (**)
vik = ck ∂θ/∂yi (h(xk))
we see that the LKT stationarity condition is satisfied at xk.
From the Penalty Convergence Lemma, we have that if xk → x~, then x~ is an optimal solution of the original optimization problem.
Let I = { i : gi(x~) = 0 }
and N = { i : gi(x~) < 0 }
i ∈ N ⇒ uik = 0 ∀ k sufficiently large
i ∈ I ⇒ uik ≥ 0
Let uk → u~ and vk → v~. We have
∇f(xk) + ∑ uik ∇gi(xk) + ∑ vik ∇hi(xk) = 0
⇒ by continuity, we have
∇f(x~) + ∑ ui~ ∇gi(x~) + ∑ vi~ ∇hi(x~) = 0
and ui~ ≥ 0
So the LKT conditions are satisfied at x~ with multipliers u~, v~, and it remains to show that the multiplier sequences do in fact converge:
uk → u~ and vk → v~
Suppose, to the contrary, that { (uk, vk) } has no limit points, i.e. || (uk, vk) || → ∞. For large k we have uik = 0 ∀ i ∈ N, and dividing the stationarity condition by || (uk, vk) || gives
∇f(xk)/||(uk, vk)|| + ∑ [ uik/||(uk, vk)|| ] ∇gi(xk) + ∑ [ vik/||(uk, vk)|| ] ∇hi(xk) = 0
Taking limits along a convergent subsequence of the normalized multipliers, the first term vanishes and we obtain
∑ ui′ ∇gi(x~) + ∑ vi′ ∇hi(x~) = 0, with (u′, v′) ≠ 0,
which violates the linear independence of the gradients. It is easy to now show that the limit points are unique, again invoking the linear independence of the gradients.
We thus have:
Theorem 7: Let θ(y) satisfy (*) and be continuously differentiable. Assume further that f, g and h are differentiable. Define uk, vk using (**). Then if xk → x~ and the gradient vectors are linearly independent at x~, we have uk → u~ and vk → v~, where u~, v~ are LKT multipliers for the optimal solution x~ of the original constrained optimization problem.
7.5.4 Exact Penalty Methods: (Freund, 2004; Boyd & Vandenberghe, 2006)
Here we choose a penalty function such that the solution to the penalty programme is
also a solution to the original constrained optimization problem (with equality and
inequality constraints).
Let p(x) = ∑ (gi(x))+ + ∑ | hi(x) |
and let x^ solve the original (convex) problem P, with LKT multipliers (u*, v*). For k ≥ max { u*i, |v*i| } we have
q(k, x) = f(x) + k ∑ (gi(x))+ + k ∑ |hi(x)|
≥ f(x) + ∑ u*i (gi(x))+ + ∑ |v*i| |hi(x)|
≥ f(x) + ∑ u*i gi(x) + ∑ v*i hi(x)
≥ f(x) + ∑ u*i [ gi(x^) + ∇gi(x^)ᵀ(x − x^) ] + ∑ v*i [ hi(x^) + ∇hi(x^)ᵀ(x − x^) ]
= f(x) + ∑ u*i ∇gi(x^)ᵀ(x − x^) + ∑ v*i ∇hi(x^)ᵀ(x − x^)
= f(x) − ∇f(x^)ᵀ(x − x^)
≥ f(x^) = q(k, x^)
using, in order, the choice of k; convexity of the gi and affineness of the hi; complementary slackness with hi(x^) = 0; LKT stationarity; and convexity of f. So x^ solves P(k).
Now let x~ solve P(k), and suppose x~ is not feasible for P, so that p(x~) > 0. With k strictly greater than max { u*i, |v*i| }, the first inequality in the chain above is strict at x~, giving q(k, x~) > f(x^). But x~ solves P(k), so
q(k, x~) ≤ q(k, x^) = f(x^)
which is a contradiction. So x~ is feasible, and since q(k, x~) ≤ f(x^) we have f(x~) ≤ f(x^). We have proven
Theorem 8: Suppose P is a generalized convex programme with inequality and
equality constraints, for which LKT are necessary, and let P(k) be a penalty
programme wherein
p(x) = ∑ (gi(x))+ + ∑ |hi(x)|
If k > max { u*i, |v*i| }, where { u*, v* } are the LKT multipliers, then the optimal solutions for P(k) and P coincide.
7.5.5 Barrier Methods: (Boyd & Vandenberghe, 2006)
This method acts to place a very high cost at the boundary of the feasible region to
dissuade solution points from ever approaching the boundary. We have, akin to the
penalty programme, the barrier programme, formulated as in:
B(k) : minimize f(x) + 1/k b(x)
s.t. g(x) < 0,
h(x) = 0,
x ∈ Rn
We have (without proof) the
Barrier Convergence Lemma: Let f, g, h, and b be continuous functions. Let { xi : i ∈ N } be a sequence of solutions to B(ki). Suppose ∃ an optimal solution x^ of P for which ∀ ε > 0, Nε(x^) ∩ { x : g(x) < 0 and h(x) = 0 } ≠ φ. Then any limit point x~ of { xi } solves P.
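A minimal numerical sketch (again with an illustrative problem of our own, not from the sources cited): minimize f(x) = (x − 2)² subject to x < 1, with the logarithmic barrier b(x) = −log(1 − x). Unlike the penalty iterates, the barrier iterates stay strictly feasible and approach the boundary optimum x = 1 from inside as k grows.

```python
import math

# Barrier method sketch: minimize f(x) = (x - 2)^2 s.t. x - 1 < 0, via
# B(k): minimize f(x) + (1/k) * b(x), with the log barrier b(x) = -log(1 - x).

def solve_barrier(k, lo=-3.0, hi=1.0 - 1e-12, iters=200):
    def q(x):  # the barrier objective, convex on (-inf, 1)
        return (x - 2.0) ** 2 - (1.0 / k) * math.log(1.0 - x)
    for _ in range(iters):     # ternary search for the minimizer
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if q(m1) < q(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

for k in (1.0, 10.0, 1000.0):
    print(k, solve_barrier(k))   # strictly feasible iterates approaching x = 1
```

The barrier term blows up at the boundary, which is precisely the "very high cost" described above; its weight 1/k is then driven to zero.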
Let b(x) = γ (g(x)) + γ (h(x))
We have
∇b(x) = ∑ ∂γ/∂yi (g(x)) ∇gi(x) + ∑ ∂γ/∂yi (h(x)) ∇hi(x)
xk solves B(ck) ⇒ ∇f(xk) + (1/ck) ∇b(xk) = 0
i.e., ∇f(xk) + (1/ck) ∑ ∂γ/∂yi (g(xk)) ∇gi(xk) + (1/ck) ∑ ∂γ/∂yi (h(xk)) ∇hi(xk) = 0
Define uik = (1/ck) ∂γ/∂yi (g(xk)) and
vik = (1/ck) ∂γ/∂yi (h(xk))
We then have
∇f(xk) + ∑ uik ∇gi(xk) + ∑ vik ∇hi(xk) = 0
Note uik → 0 ∀ i ∉ I as k → ∞, since ck → ∞, gi(xk) → gi(x~) < 0 (i ∉ I), and the corresponding partial derivatives of γ remain finite. Also uik ≥ 0 ∀ i for k sufficiently large.
Suppose uk → u~ as k → ∞. Then u~ ≥ 0 and ui~ = 0 ∀ i ∉ I. Also suppose vk → v~ as k → ∞. From continuity of all the functions,
∇f(x~) + ∑ ui~ ∇gi(x~) + ∑ vi~ ∇hi(x~) = 0
with u~ ≥ 0 and u~ᵀ g(x~) = 0
Hence u~, v~ are Lagrange-Kuhn-Tucker multiplier vectors. In a manner identical to the corresponding theorem for penalty functions, it can be shown that ∃! u~, v~ such that uk → u~ and vk → v~.
Hence we have,
Theorem 8: Suppose there exists an optimal solution x~ of P such that
∀ ε > 0, Nε(x~) ∩ { x : g(x) < 0 } ∩ { x : h(x) = 0 } ≠ φ
Let γ(y) be continuously differentiable, with uk, vk as defined above. If xk → x~ and the ∇gi(x~), ∇hi(x~) are linearly independent, then uk → u~ and vk → v~, where u~, v~ are vectors of Lagrange-Kuhn-Tucker multipliers for the optimal solution x~ of P.
7.5.6 Utility Functions: (Kelly, 2000)
Most congestion control algorithms can be cast in terms of an optimization problem
of utility functions, for example:
Maximize ∑ Ur(xr) over the rates xr ≥ 0
such that ∑ xr ≤ cl for each link l
For instance, Ur(xr) = arctan( xr Tr √β ) / ( √β Tr ) for TCP Reno. A well behaved limit is Ur(xr) = −1/[ xr Tr ].
For TCP Vegas, we have
Ur(xr) = α Tr log xr s.t. ∑ xr ≤ cl
We also have, for Sierra,
Ur(xr) = − E { [ Wor ± 2n ] / 2n }
where E denotes the expected value.
It now remains to solve all three problems using Penalty/Barrier methods.
As mentioned earlier, we provide proofs in this chapter to introduce techniques that
may be readily adapted for the sake of studying congestion control problems. This
section is an example of this policy.
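For the single-link Vegas-style problem the optimum in fact has a closed form: maximizing ∑ α Tr log xr subject to ∑ xr ≤ cl gives the proportionally fair rates xr = (α Tr / ∑ α Ts) cl, with the link's Lagrange multiplier acting as a "price". A short check (the RTT values and capacity below are illustrative choices of our own):

```python
# Proportionally fair allocation for U_r(x_r) = a * T_r * log(x_r) on one link.
# At the optimum, U_r'(x_r) = w_r / x_r equals the link price for every source.

alpha = 2.0
T = [0.010, 0.020, 0.070]          # illustrative round trip times (sec)
c = 100.0                          # illustrative link capacity (packets/sec)

w = [alpha * Tr for Tr in T]       # utility weights a * T_r
x = [wr * c / sum(w) for wr in w]  # closed-form optimal rates
price = sum(w) / c                 # Lagrange multiplier of the capacity constraint

for wr, xr in zip(w, x):
    assert abs(wr / xr - price) < 1e-9   # KKT: marginal utility = link price
assert abs(sum(x) - c) < 1e-9            # capacity constraint is tight
print(x)                                 # rates proportional to the weights a*T_r
```

The same optimum is what the penalty/barrier machinery of the previous sections recovers iteratively when no closed form is available.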
7.6 STOCHASTIC MODELS (Kelly, 1979, 2000, 2003)
7.6.1 Deterministic Limits
Consider N sources accessing a single link. The dynamics of the rth source are given by
xr(N)(k+1) = xr(N)(k) + κ ( w − Mr(k) )
where xr(N)(k) is the window size W(k) at time k of the rth connection. Recall the earlier discussion in Chapter 6 about congestion windows.
We assume that W(k) is a Poisson random variable with mean (and variance) xr(N)(k).
The marking process at each link marks every packet with probability κN(k) at time k. We assume
κN(k) = pN( yN(k) )
with pN(z) = 1 − exp( −γz/N )
Next define the average rate
x(N)(k) = (1/N) ∑ xi(N)(k)
We have
x(N)(k+1) = x(N)(k) + κ ( w − (1/N) ∑ Mr(k) )
We show that if
E [ x(N)(0) − x(0) ]² → 0 as N → ∞
then
P [ | x(N)(k) − x(k) | > ε ] → 0 as N → ∞
where x(k+1) = x(k) + κ ( w − x(k) p(x(k)) ), with p(x) as defined above.
The proof is by induction on k: assuming the result true for k, we prove it for k+1.
E [ x(N)(k+1) − x(k+1) ]²
= E [ x(N)(k) + κ( w − (1/N) ∑ Mr(N)(k) ) − x(k) − κ( w − x(k) p(x(k)) ) ]²
Expanding, the additional terms
E [ (1/N) ∑ Mr(N)(k) ]² − 2 E [ ( (1/N) ∑ Mr(N)(k) ) x(k) p(x(k)) ] + ( x(k) p(x(k)) )²
→ 0 as N → ∞
Hence the result, and a weak law of large numbers obtains for the average source rates.
7.6.2 Per Source Dynamics (Ott, 1999)
Here we study the dynamics of individual sources. Whereas the average source rate
obeys a weak law of large numbers, individual sources display random variations. In
this section, we provide models for this random behaviour. Specifically, we will
compute their variance.
Proposition 9: Let xi(N)(k) be the series of input rates, with x(N)(k) → x(k) in probability. Then (after normalizing to zero mean and unit variance)
P { (1/√N) [ ∑ xi(N)(k) − N x(k) ] ≤ a } → (1/√(2π)) ∫ exp( −x²/2 ) dx
with the integral taken from −∞ to a.
Assume this lemma (without proof): Let z1, z2, … be a sequence of random variables with distribution functions FZn and moment generating functions φZn ( n ≥ 1 ), and let z be a random variable with distribution function Fz and moment generating function φz. Then
φZn(t) → φz(t) ∀ t ⇒ FZn(t) → Fz(t) at every t where Fz(t) is continuous.
Proof of Proposition 9 (Ott, 1999)
The moment generating function of xi(N)/√n is
φ_{xi/√n}(t) = E [ exp{ t xi/√n } ] = φ( t/√n )
So the m.g.f. of ∑ xi(N)/√n is given by
φ_{∑xi/√n}(t) = [ φ( t/√n ) ]^n
Letting L(t) = log φ(t), we note
L′(0) = µ = 0
L″(0) = E[ x² ] = 1
By the Lemma, and two applications of L'Hôpital's rule,
Lim_{n→∞} n L( t/√n ) = Lim_{n→∞} L( t/√n ) / n⁻¹ = Lim_{n→∞} L″( t/√n ) t²/2 = t²/2
so [ φ( t/√n ) ]^n → exp( t²/2 ), the m.g.f. of the standard normal.
Hence we have proven the result for µ = 0 and σ = 1. The general result follows by considering (X − µ)/σ. Q.E.D.
7.6.3 Explicit Utility Feedback: (Benmohamed & Meerkov, 1993; Kelly, 1997)
When explicit feedback is available, it is possible to study its effect on the standard
deviation of individual flows.
As before, the dynamics of the rth source are given by
Xr(N)(k+1) = Xr(N)(k) + τ ( w − Poisson( Xr(N)(k) ( 1 − exp( −γ yN(k)/N ) ) ) )
As before we can show that
X(N)(k) = (1/N) ∑ Xr(N)(k) converges as N → ∞ to X(k) such that
X(k+1) = X(k) + τ ( w − X(k) p(X(k)) )
i.e., Xr(k+1) = Xr(k) + τ ( w − Poisson( Xr(k) ( 1 − exp( −γX(k) ) ) ) )
We thus have
σ²(k+1) = ( 1 − τ p(X(k)) ) σ²(k) + τ² p²(X(k)) E[ Xr(k) ]
So σ∞ = [ τ p²(X∞) X∞ ] / [ 1 − τ p(X∞) ]²
We note by comparison that this standard deviation is lower than the case of
probabilistic feedback by a factor of √p(X∞).
7.7 STOCHASTIC MODELS (concluded)
We start by motivating the “memory-less” or “Markov” property of statistical
distributions.
Definition: Assume one has a distribution of failure times of an item which are
statistically distributed. The distribution is MEMORY-LESS iff an item which has
been in use for some time is as good as a new item with regards to the amount of
time remaining until that item fails.
Proposition 10: If a statistical distribution has the memory-less property, then it is the
exponential distribution (i.e., the only distribution with the memory-less property is
the exponential distribution).
Proof: The memory-less condition is equivalent to
P ( X ≤ x + t, X > t) / P (X > t) = P (X ≤ x)
i.e., P (X ≤ x + t, X > t) = P (X ≤ x) P (X > t )
If X is a non-negative and continuous random variable, this equation becomes
P(t < X ≤ x + t) = P (0 < X ≤ x) P (X > t)
i.e.,
Fx (x + t) – Fx (t) = [Fx (x) - Fx (0) ] [ 1 - Fx (t) ]
Noting that Fx (0) = 0 and rearranging,
[ Fx (x+t) - Fx(x) ] / t = Fx(t) [ 1 - Fx(x) ] / t
Taking the limit as t → 0, we obtain
Fx′(x) = Fx′(0) [ 1 − Fx(x) ]
where Fx′(x) denotes the derivative of Fx(x).
Let Rx(x) = 1 − Fx(x)
We have
Rx′(x) = Rx′(0) Rx(x)
This differential equation has the solution
Rx(x) = k e^{ Rx′(0) x }
where k is an integration constant.
Noting that
k = Rx(0) = 1 and
letting
Rx’ (0) = - Fx’(0) = - fx (0) = - λ, we get
Rx(x) = e^{−λx}
Hence
Fx(x) = 1 − Rx(x) = 1 − e^{−λx} ( x > 0 )
We thus conclude that X is an exponential r.v. with parameter
λ = fx(0) ( > 0 )
Note that the memory-less property may be equivalently expressed as
P ( X > x + t | X > t ) = P ( X > x ), x > 0, t > 0, OR
P ( X > x + t ) = P ( X > x ) P ( X > t ), x, t > 0
These equations are satisfied when X is an exponential r.v.
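The product identity can also be checked by simulation (the rate λ, the points x and t, and the sample size below are arbitrary illustrative choices):

```python
import random

# Monte Carlo check of memory-lessness for the exponential distribution:
# P(X > x + t) should equal P(X > x) * P(X > t).
random.seed(1)
lam, x, t, n = 0.5, 1.0, 2.0, 200_000
samples = [random.expovariate(lam) for _ in range(n)]

p_x  = sum(s > x     for s in samples) / n
p_t  = sum(s > t     for s in samples) / n
p_xt = sum(s > x + t for s in samples) / n

print(p_xt, p_x * p_t)        # both close to exp(-lam * (x + t)) = exp(-1.5)
assert abs(p_xt - p_x * p_t) < 0.01
```

Any distribution with a non-constant hazard rate (e.g. Pareto) would fail this check, which is exactly Proposition 10.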
7.7.1 Queue-width Marking
Consider N proportionally fair primal-dual controllers sharing one link of capacity Nc.
The dynamics of the nth source are described by
∂xn/∂t = κ ( w − xn PN(b^) )
where b^ is the queue-width of the link. We have
∂b^/∂t = ∑ xi − Nc
Let b = b^/N and suppose that
PN(b^) = P(b)
In that case, if x = ( ∑ xi )/N, then
∂x/∂t = κ ( w − x P(b) )
∂b/∂t = [ x − c ]b+
Examples of PN(b^) and P(b) are given by the exponential distribution.
Discrete time versions of the above equations are
Xn(k+1) = Xn(k) + κ ( w − Xn(k) PN(b(k)) ) (**)
and b(k+1) = b(k) + [ x(k) − c ]b+
Given the dynamics of the nth source, the stochastic models are given by
Xn(N)(k+1) = Xn(N)(k) + κ ( w − Mn(N)(k+1) )
and b^(N)(k+1) = b^(N)(k) + [ ∑ Xn(N)(k) − Nc ]b+
respectively. We can show (Tinnakornsrisuphap & Makowski, 2003; Deb & Srikant, 2003) that
X(N)(k) := (1/N) ∑ Xn(N)(k)
and b(N)(k) := b^(N)(k) / N
converge in probability to the solutions of (**) as N → ∞.
Consider this stochastic model for a TCP-type congestion controller:
Xn(N)(k+1) = Xn(N)(k) + κ ( w − Xn(N)(k) Mn(N)(k) )
Recall that this corresponds to the utility function − w / Xn
Defining, as earlier,
X(N)(k) = (1/N) ∑ Xn(N)(k)
we get
X(N)(k+1) = X(N)(k) + κ [ w − (1/N) ∑ Xn(N)(k) Mn(N)(k) ]
We expect a law of large numbers:
Lim_{N→∞} (1/N) ∑ Xn(N)(k) Mn(N)(k)
= Lim_{N→∞} E [ Xn(N)(k) Mn(N)(k) ]
= E [ Xn²(k) ( 1 − e^{−σX(k)} ) ]
with
Mn(k) = Poisson [ Xn(k) ( 1 − e^{−σX(k)} ) ]
A similar result appears in (Tinnakornsrisuphap & Makowski, 2003).
Note that X(k) is deterministic and in fact equal to E[ Xn(k) ]. But Xn(k) is a r.v. whose second moment explicitly affects the dynamics of X(k).
A satisfactory resolution of the average behaviour of the TCP source and the correct limiting behaviour is lacking at this time.
This chapter has included a presentation and discussion of numerous results and
theories from the fields of convex optimization, control theory as well as advanced
probability theory. These results, which have been gleaned from numerous sources,
appear for the first time in one place as such (in this chapter), and will be deployed in
future advanced investigations into the nature and relative behavior and performance
of Sierra vis-à-vis its predecessors and competitors. We reference our fifth research
paper on Sierra, where the quantitative superiority of Sierra is discussed in detail.
However, no claim is made as to the originality of the various propositions (and
proofs) presented in this chapter. This serves as a concentrated collation of results
from optimization/control theories adapted from the perspective of mathematical
congestion control and management.
8 QUANTITATIVE MODELING AND SOFTWARE SIMULATION
This chapter involves two complementary aspects of the modeling of various
congestion control algorithms, including our novel Sierra, in terms of both:
• Quantitative modeling, at a level that is comparable to what can be found in the
literature on the subject (Jain & Hassan, 2001)
• Simulation-in-Software, using the OPNET Modeler simulator.
Most of the work reported in this chapter is original in nature.
8.1 A QUICK REVIEW
We begin with a quick discussion of various popular models. These are presented
mainly for completeness of the discussion, from a modeling/simulation perspective.
8.1.1 TCP Tahoe/Reno
Introduced in the late 1980's by Van Jacobson (Jacobson, 1988), TCP Tahoe was
the precursor to many congestion control algorithms, and has been remarkably
successful in managing “congestion collapse”. Tahoe, initially, very quickly and
exponentially, probes the available bandwidth to determine an optimal window size,
denoted cwnd. This process has been called Slow Start, though in fact it is quite
“quick”. Once the congestion window size reaches the value of a certain Slow Start
threshold, sthresh, Slow Start relents to “Congestion Avoidance”, wherein the
congestion window is increased (approximately) linearly. When an exception occurs
157
(timeouts/triple duplicate acks), the window size is halved. In the literature, this is
often referred to as AIMD (Additive Increase & Multiplicative Decrease).
As observed, Tahoe has spawned a number of variants, such as:
• TCP Reno • TCP New Reno
• TCP SACK.
For present purposes, we briefly consider Slow Start & Congestion Avoidance.
Slow Start: Start with a window size of 1. Dispatch a window's worth of packets and, for every ack received, increment the congestion window by 1, until the window reaches the value of ssthresh. Should an exception (timeout/triple duplicate) occur before then, halve ssthresh, set the window to 1, and re-enter Slow Start. If the window reaches ssthresh without any exception, exit Slow Start and enter the "Congestion Avoidance" phase.
The control equations for SS (Slow Start) and CA (Congestion Avoidance) are as
follows:
• SS:
cwnd = cwnd + 1, per ack
cwnd = 2* cwnd, per window of acks.
• CA:
cwnd = cwnd + 1/cwnd, per ack
cwnd = cwnd + 1, per window of acks.
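Viewed per window of acks, the two phases give exponential then linear growth; the control equations above can be exercised in a short sketch (the initial window and ssthresh values are illustrative):

```python
# Per-RTT view of the control equations above: the window doubles each RTT in
# Slow Start and grows by one packet per RTT in Congestion Avoidance.

def grow(cwnd, ssthresh, rtts):
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # SS: cwnd += 1 per ack (doubling per RTT)
        else:
            cwnd += 1                        # CA: cwnd += 1/cwnd per ack (+1 per RTT)
        trace.append(cwnd)
    return trace

print(grow(1, 16, 8))   # [1, 2, 4, 8, 16, 17, 18, 19, 20]
```

Capping the doubling exactly at ssthresh is a simplification on our part; the phase change is what matters here.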
We quickly consider some of the major differences between Tahoe, Reno, New Reno
and SACK (Selective Ack).
Tahoe is a combination of:
• Slow Start
• Congestion Avoidance
• Fast Retransmit.
Reno's contribution is to follow the Fast Retransmit phase with Congestion Avoidance, rather than Slow Start, since the triple duplicate acks indicate that data is indeed still flowing in the connection.
8.1.2 SACK
SACK is negotiated through the use of the TCP header option fields: the receiver offers feedback to the sender in the form of selective acknowledgement options, reporting which contiguous blocks of data have arrived. This tells the sender exactly which byte ranges have been received. In this manner, the receiver queues and re-orders out-of-order data without the need for it to be retransmitted.
8.1.3 New Reno
New Reno is a technique which operates at a time when SACK is not negotiated, and
allows TCP to recover faster from a multitude of lost packets from within one
congestion window (a performance drawback of Reno). The technique is an
improvement of the way that TCP Reno deals with the loss of several packets from
within one window.
Generally, SACK performs better than New Reno, all things considered.
8.1.4 TCP Vegas
The Vegas algorithm is also a window-based TCP protocol. It uses the variation in round trip delays to sense and counter congestion in the network. The point is that, by measuring the fluctuations in RTT, Vegas tries to estimate the number of packets queued in the routers for that connection and tries to keep that number between certain limits.
Vegas redefines the retransmission method implemented in Reno as follows:
• The system clock is read and recorded when a segment is sent and when an ack is received, and this RTT information is used to make retransmit decisions.
• On receipt of a duplicate ack, Vegas retransmits if the segment has been outstanding longer than the estimated RTT, which is interpreted as a lost segment (no waiting for triple duplicates).
• After 1 or 2 non-duplicate acks are received, the time interval is again checked. If greater than the estimated RTT, a retransmit is done (no waiting for duplicate acks).
Set two control (design) parameters, A and B.
Calculate the Expected Throughput, E, where
E = [Current Window Size] / [BaseRTT]
BaseRTT is the smallest RTT (Round Trip Time) seen by the ingress, up to the time
t. This is estimated to be the real propagation delay.
Calculate the Observed Throughput, O, where
O = [# acks received] / RTT
by sending a distinguished control packet, and computing RTT as the difference between the ack reception time and the transmission time of that distinguished control packet. During this RTT, count the total number of acks received.
Use these control equations:
• A ≤ E − O ≤ B ⇒ leave the window unchanged
• E − O < A ⇒ W ++
• E − O > B ⇒ W −−
Also a modified Slow Start is incorporated, whereby Vegas allows exponential
growth of the window every other RTT instead of every RTT. In this way, it
becomes possible to compare Expected and Observed throughputs.
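One step of the Vegas window logic can be sketched as follows. Note one simplification on our part: the Observed throughput is approximated as W/RTT rather than counted from acks as described above; the thresholds and sample numbers are likewise illustrative.

```python
# One Vegas window decision, following the control equations above.

def vegas_update(w, base_rtt, rtt, a=1.0, b=3.0):
    expected = w / base_rtt      # E = window / BaseRTT
    observed = w / rtt           # O approximated as window / current RTT
    diff = expected - observed   # estimated backlog in the network
    if diff < a:
        return w + 1             # too little queued: probe for more bandwidth
    if diff > b:
        return w - 1             # too much queued: back off
    return w                     # within [A, B]: leave the window unchanged

print(vegas_update(10, 0.100, 0.105))   # diff ~ 4.76 > B, so the window shrinks to 9
print(vegas_update(10, 0.100, 0.101))   # diff ~ 0.99 < A, so the window grows to 11
```

The quantity E − O is, up to units, the per-connection router backlog estimate discussed above.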
8.1.5 Sierra
Sierra is a simple, yet novel, Black Box algorithm that uses three basic parameters to optimize network usage:
• RTTs (Round Trip Times), introduced earlier
• Ingress Rates (Rin)
• Egress Rates (Rout).
No exponential smoothing is performed. Sierra is further explicated in [7] and other work in preparation.
Briefly, the control equations for SSS (Sierra Smart Start) and SCAM (Sierra
Congestion Avoidance Mechanism) are as follows:
SSS:
Rin /= 2, until Rin ≤ Rout
SCAM:
Rin ++, if Rin ≤ Rout
Rin −−, if Rin > Rout
Rin and Rout pertain to the transmission rates at ingress and egress. Egress rates are
continually calculated at the receiver and relayed back to the sender, using control
packets.
We use this intuitively appealing approach to determine the average steady state
throughput for Sierra.
8.2 QUANTITATIVE MODELING
As a general scenario for all further quantitative analysis, we initially consider a
simple setting wherein a single host accesses a single link, en route to a single
destination. We will assume that "L" is the capacity of the link, in packets/sec, and let "t" be the one-way propagation delay (at the speed of light). It is very easy to see that the transmission (service) delay of one packet at the link is 1/L. Hence the round trip time is given by
T = 2 (t + 1/L)
We see that if the link capacity is, say, 50 Mbps, with a packet size of 500 bytes, then
L = 12,500 packets/sec
1/L = 0.08 msecs
If the link length is, say, 1000 km, then the one-way propagation delay is
t = [ 10^6 m ] / [ 3 × 10^8 m/s ] ≈ 3.3 msecs
Therefore,
T = 2 (3.3 + 0.08) ≈ 6.8 msecs.
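A quick check of this arithmetic, taking t as the one-way propagation delay as in the formula T = 2(t + 1/L) (1000 km at 3 × 10^8 m/s):

```python
# Round trip time T = 2 * (t + 1/L) for a 50 Mbps link, 500-byte packets,
# and 1000 km of one-way propagation at the speed of light.

link_bps = 50e6
pkt_bits = 500 * 8
L = link_bps / pkt_bits          # link capacity in packets/sec
service = 1.0 / L                # per-packet service (transmission) delay, sec
t = 1.0e6 / 3.0e8                # one-way propagation delay, sec

T = 2.0 * (t + service)
print(L)               # 12500.0 packets/sec
print(service * 1e3)   # 0.08 msec
print(T * 1e3)         # ~6.83 msec
```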
The factor LT is usually called the Bandwidth-Delay product.
Let B stand for the buffer size at the link. With a propagation delay of t in each direction, (Lt) packets are in transit each way. Clearly, the total number of un-acknowledged packets the path and buffer can hold is
# = LT + B
It is clear that if
W > LT + B
then there is buffer overflow and the attendant packet loss.
The above inequality will be the cornerstone for all our further analysis of the performance of the different congestion control algorithms.
In a nutshell, we will presume that if
W > LT + B
there will be exceptions in terms of undelivered packets.
8.2.1 Modeling Tahoe/Reno
Slow Start
Packets are discharged every ( nT + m/L ) seconds. It is easy to see that
W( t = nT + m/L ) = 2^{n−1} + m + 1, with 0 ≤ m ≤ 2^{n−1} − 1
Similarly, the queue length is
Q( t = nT + m/L ) = m + 2, with 0 ≤ m ≤ 2^{n−1} − 1
Therefore,
Qmax = 2^{n−1} + 1
Wmax = 2^n
We denote the time between <t = nT> and <t = (n+1)T> as a “sub-epoch” wherein
the window size doubles.
As a further approximation, ssthresh (introduced earlier) reaches
ssthresh = [ LT + B ] / 2
This implies that
2^{n−1} ≤ [ LT + B ] / 2
n ≤ log₂( LT + B )
where
n = the number of sub-epochs in SS
L = the link capacity
T = the average round trip time
B = the link buffer size.
At its peak, 2^{n−1} = ( LT + B )/2, a realized upper bound.
In principle, Q is upper bounded by B, which implies
Qmax ≤ B
wherefrom
( LT + B + 2 ) / 2 ≤ B
whence B ≥ LT + 2
This is our simple condition for the non-occurrence of exceptions (timeouts/TD). If this inequality is satisfied, there is no buffer overflow.
At the extreme,
Qmax = B
2^{n−1} + 1 = B
so
n = 1 + log₂( B − 1 )
Wmax = 2 ( B − 1 )
Notice that
W ~ 2^{t/T}
whence
Tss = T log₂ [ 2( B − 1 ) ] = T ( 1 + log₂ [ B − 1 ] )
Similarly,
Nss (packets delivered in SS) = 2B − 3
Congestion Avoidance
We have, measuring time from the start of the CA phase,
W(t) = ssthresh + at = ( LT + B )/2 + at
where a is the slope of the CA line, an approximation to the (square root) growth curve of roughly one packet per RTT. The window grows until W = LT + B, so the duration of the phase is
Tca = [ LT + B ] / ( 2a )
and the packets delivered are
Nca = ∫ W(t)/T dt = [ ( LT + B ) / 2T ] Tca + a Tca² / ( 2T )
Thus,
Throughput (Tahoe) = [ Nss + Nca ] / [ Tss + Tca ]
where
Nss = 2B − 3
Tss = T [ 1 + log₂( B − 1 ) ]
Nca = [ ( LT + B ) / 2T ] Tca + a Tca² / ( 2T )
Tca = ( LT + B ) / ( 2a ).
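Plugging illustrative numbers into these expressions (with B chosen at the no-overflow threshold B = LT + 2, the CA slope taken as a = 1/T, i.e. one packet per RTT, and Nca evaluated as the integral of W(t)/T with W(t) = (LT + B)/2 + at; all three choices are ours):

```python
import math

# Sample Tahoe throughput from the SS/CA expressions above. The model is
# deliberately crude (it ignores queueing growth), so treat this as indicative.

L = 12500.0              # link capacity, packets/sec
T = 0.0068               # round trip time, sec
B = L * T + 2            # buffer at the no-overflow threshold
a = 1.0 / T              # CA slope: ~1 packet per RTT, in packets/sec

Nss = 2 * B - 3                              # packets delivered in Slow Start
Tss = T * (1 + math.log2(B - 1))             # duration of Slow Start
Tca = (L * T + B) / (2 * a)                  # CA: window grows (LT+B)/2 -> LT+B
Nca = (L * T + B) / (2 * T) * Tca + a * Tca ** 2 / (2 * T)

throughput = (Nss + Nca) / (Tss + Tca)       # packets/sec over the SS + CA epoch
print(round(throughput))
```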
8.2.2 Modeling Vegas
The control equations were presented earlier. With W = LT + B, we have
E − O = { [ LT + B ][ T + n/L ] − T [ LT + B + n ] } / { T [ T + n/L ] }
= [ Bn/L ] / [ T( T + n/L ) ] (Brakmo & Peterson, 1995)
where n is the number of packets queued at the link. The lower control threshold gives
a ≥ ( B/L ) / ( T²/n + T/L )
n ≤ a L T² / ( B − Ta )
So,
E[n] ≤ a L E[T²] / ( B − a E[T] ) = a L T² / ( B − aT )
Hence,
Throughput = E[ LT + B + n ] / E[ T + n/L ]
= { LT + B + E[n] } / { T + E[n]/L }
With E[n] = [ a L T² ] / [ B − aT ], the
Vegas throughput = { ( LT + B )( B − aT ) + a L T² } / { T ( B − aT ) + a T² }
8.2.3 Modeling Sierra
As will be noted, the ingress rate goes up, goes down, or stays flat, in unit steps. Let the probabilities of these be as follows:
• P (+) = 1/m
• P (−) = 1/l
• P (flat) = 1/s.
Let K be a sample ingress rate, at discrete time t. Being interested in steady-state performance, we skip the Jump Start state completely, and assume a starting rate of K0 at time t = 0. To reach K from K0, we will need steps P (Plus), L (Less), and S (Same).
We readily have:
P + L + S = t
P − L = K − K0
We have also the following probability:
P [ Rate = K @ Time = t ] = (1/m)^P (1/l)^L (1/s)^S
Therefore, the average throughput is the mathematical expectation of the Ingress
Rate, as follows:
Throughput (at Time = t) = E [Rate (t)] = ∑ K P [Rate = K @ Time = t]
Average Throughput = Lim t -> ∞ [Throughput (at Time = t)]
Continuing, with P = [ t − S + K − K0 ]/2 and L = [ t − S − K + K0 ]/2, we have
P [ Rate = K @ Time = t ] = (1/m)^{[t − S + K − K0]/2} (1/l)^{[t − S − K + K0]/2} (1/s)^S
= (1/m)^{t/2} (1/l)^{t/2} × (1/m)^{[K − K0 − S]/2} (1/l)^{[K0 − K − S]/2} (1/s)^S
Note that the first two factors do not depend on K and are absorbed in the normalization; hence we may discard them in the first instance. So we have
E [ Rate(t) ] = ∑ K (1/m)^{[K − K0 − S]/2} (1/l)^{[K0 − K − S]/2} (1/s)^S
= (1/m)^{[−K0 − S]/2} (1/l)^{[K0 − S]/2} (1/s)^S × ∑ K ( l/m )^{K/2}
Letting S → 0, we have
E [ Rate(t) ] = (1/m)^{−K0/2} (1/l)^{K0/2} × ∑ K ( l/m )^{K/2}
wherefrom, summing the series ∑ K y^K = y/(1 − y)² with y = (l/m)^{1/2},
Average Throughput (Sierra) = (1/m)^{−K0/2} (1/l)^{K0/2} ( l/m )^{1/2} / [ 1 − (l/m)^{1/2} ]²
This is our final expression for the average throughput of Sierra.
It is readily seen that this yields a better performance than that of Tahoe, Reno, New Reno and Vegas. We have used mathematical machinery to a level that is consistent with what can be observed in the literature on the subject; various references are cited (Hasegawa et al. 1999; Allman et al. 2000).
We have now introduced a number of concepts, starting from a basic discussion of the Transport Layer. We presented a number of popular congestion control algorithms, including Tahoe/Reno, New Reno, Vegas and Sierra. In the first half of the chapter, we developed basic mathematical models to analytically study the behavior and performance of these protocols. In the second half of the chapter, we deal with OPNET Modeler simulation results for all the protocols. Sample calculations were also presented, which appear to indicate the relative superiority of Sierra (Jagannathan & Matawie, 2009).
Further research is needed when we deviate from the one-flow unicast model, to a
multi-source, shared-link model, wherein many protocols compete for bandwidth,
and notions such as fairness (Hasegawa et al. 1999) come into play. Whereas there
have been such, if complex, analyses, the buffer overflow paradigm should provide a
simple tool to relatively compare the performance.
8.3 THE SIMULATION-IN-SOFTWARE PROJECT
The focus of this section is on (discrete event) simulation, using the popular OPNET Modeler software tool. The simulation exercise in software, as a project, is much more involved than just a computer programming task using the software tool (Ibanez & Nichols, 1998).
Two major simulation tools were evaluated as part of this study, as follows:
• NS 2 (Network Simulator)
• OPNET Modeler.
To gain further insight into the project, we set out some salient aspects of both these
tools as such.
8.3.1 NS 2
The Network Simulator NS 2 tool is a widely adopted and deployed software tool used for the simulation of advanced TCP/IP protocols and algorithms. Like other tools, it is an object-oriented system which can emulate many very complex and real-life network topologies, characteristics, programs and algorithms.
The tool itself was developed many years ago by the Internet and congestion control community at the University of California, Berkeley, and is maintained by academia. The software system that is NS 2 is written in C++ and Object Oriented Tcl (interpreted) scripts (Fall & Floyd, 1996).
The following well known diagram depicts the class hierarchy that is embedded into
the NS 2 system.
An NS network model is constructed by interconnecting components (a.k.a. NS
objects).
Some of these objects include:
Figure 8.1 NS 2 Class Hierarchy
• Nodes: These represent clients, hosts, routers and switches, also known as
Intermediate systems.
• Classifiers: These help determine the outward interface objects, depending on
destination (and on occasion, source) address. Various types of classifiers
include:
o address classifiers
o multicast classifiers
o multipath classifiers
o replicators.
[Figure 8.1 (diagram): the NS 2 class hierarchy, rooted at TclObject → NsObject, branching into Connector (Queue: DropTail, RED; Delay; Agent: AgentTcp with Reno, New Reno, Vegas, Fack, Sack, FullTcp; Trace: Enq, Deq, Drop, Hop) and Classifier (Address, Multicast, Replicator).]
• Links: These are used to connect nodes to construct the network topology. One
defines a pair of head & tail, with an interconnecting queue. Links can be
simplex or duplex. Queues can be of the types:
o DropTail
o fair queuing
o deficit round robin
o RED
o class based queuing.
• Agents: These are the transport layer end points. Broadly, these can be classified
as TCP or UDP agents. We are mainly concerned with TCP agents. These occur
in the following flavors (in NS 2):
o TCP/Tahoe
o TCP/Reno
o TCP/New Reno
o TCP/Sack 1
o TCP/Vegas
o TCP/Fack
o TCP/Sink
o And others, which we will not be concerned with here.
• Applications: These entities sit on top of the transport layer and produce data to
model simulation features. They are activated to the transport endpoints as such.
At the transport layer, we will mainly use TCP and Sierra.
Some main applications include:
o FTP (File Transfer Protocol)
o Telnet (Remote Login).
Traffic Generators:
These can be of the following types:
o Exponential ON-OFF: generates packets at a fixed rate during the ON periods; no packets are sent during OFF periods.
o Pareto ON-OFF: same as exponential ON-OFF, except that the ON-OFF periods are drawn from a Pareto distribution.
o Constant Bit Rate (CBR): generates packets at a constant rate. Random noise may be introduced.
OUR EXPERIENCE WITH NS 2
Our experience with NS 2, over the course of several months, was as follows:
ADVANTAGES:
o TCP Vegas is incorporated.
MAIN DISADVANTAGES:
o not menu driven (procedural, with complex object orientation)
o rather user-unfriendly (too many ad hoc macros)
o can be non-robust at times (very frequent core dumps)
o presentation of results/graphs is not automated (graphs are relatively hard to generate)
o limited applications supported.
8.3.2 OPNET Simulator Tool Suite
This suite of software products was developed by OPNET Inc. (formerly MIL3
Inc.) and has been very well received by academics and professional IT
practitioners alike. It is commonly deployed for TCP/IP network performance
simulation projects. The suite includes:
• OPNET Modeler
• IT Guru
• other products (which we will not use here).
Our experience was confined to Modeler, which provided all of the
functionality we needed for our study.

Unlike NS 2, OPNET Modeler is entirely menu driven, with user-friendly and
natural GUI interfaces for all stages of a simulation project. A slew of
built-in libraries model popular TCP/IP protocols and applications (such as IP
QoS, MPLS, RSVP, etc.). A variety of network hardware (routers, switches,
links, etc.) available in the networking marketplace can also be emulated by
the tool.
Opnet Modeler has a three level hierarchy:
• network level
• node level
• process level.
The network level is the highest level of the hierarchy and is used to build a
network by interconnecting nodes with links. The node level allows one to
specify the internal architecture of nodes. Finally, process-level modeling
allows one to specify the functional behavior of objects using Finite State
Machines and State Transition Diagrams (STDs).
The first two levels of the modeling exercise are carried out mainly by either “drag
and drop” or menu selection (drop down choices), and hence we will not discuss
them any further. But a few more words on the process-level modeler are in order.
As a simulation enabler, OPNET Modeler graphically depicts all processes using
State Transition Diagrams (STDs). Actions in each state are called
“executives”, and can be:
• enter executives - upon entering the state
• exit executives - when leaving the state.
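The enter/exit executive idea can be illustrated with a toy finite state machine in Python (purely conceptual; OPNET Modeler itself generates C code, and the state names here are invented):

```python
class State:
    def __init__(self, name, on_enter=None, on_exit=None):
        self.name = name
        self.on_enter = on_enter or (lambda: None)  # "enter executive"
        self.on_exit = on_exit or (lambda: None)    # "exit executive"

class FSM:
    def __init__(self, start):
        self.log = []
        self._enter(start)

    def _enter(self, state):
        self.current = state
        state.on_enter()                  # run the enter executive
        self.log.append(("enter", state.name))

    def transition(self, target):
        self.current.on_exit()            # run the exit executive first
        self.log.append(("exit", self.current.name))
        self._enter(target)               # then enter the new state

idle = State("idle")
send = State("send")
fsm = FSM(idle)
fsm.transition(send)   # idle -> send
fsm.transition(idle)   # send -> idle
```

Each transition thus runs the exit executive of the old state followed by the enter executive of the new one, mirroring the STD semantics described above.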
The Process Editor is used to create Finite State Machines.
Apart from the executive code blocks, the following additional code blocks are also
used in the Process Editor:
• State Variables (SV)
• Temporary Variables (TV)
• Header Block (HB)
• Function Block (FB)
• Diagnostic Block (DB)
• Termination Block (TB).
Each of these code blocks serves the purpose suggested by its name.
Finally, in OPNET Modeler, object parameters are configured using “Parameter
Editors”. These come in four flavors:
• Link Model: models link object parameters. The link is specified in the Link
Editor, which allows one to set its parameters.
• Packet Format Model: This is useful for setting data format structure (TCP, IP,
etc.) for packets that are generated. The Packet Editor is used for these. As
before, the Editor allows one to set attributes.
• ICI Model: This is handy for supporting interrupt-driven communication between
processes. These are generally used to exchange control information between
protocol layers. The ICI Editor is used for this purpose.
• PDF Model: defines statistical distributions, used to set the parameters of
traffic generators. The traffic profile can be set per application; the PDF
Editor is deployed for this purpose.
OUR EXPERIENCE WITH OPNET MODELER
ADVANTAGES:
• robust
• totally menu driven
• user friendly
• advanced support for presentation (graphs, tables, etc.) of simulation results.
DISADVANTAGES:
• TCP Vegas not yet incorporated.
For these reasons, weighing the pros and cons of both OPNET Modeler and NS 2,
a strategic decision was made to deploy OPNET Modeler as the software
simulation tool for this thesis, to evaluate the relative merits of our novel
congestion control algorithm, Sierra.
8.3.3 Simulation Theoreticals
Software-based simulation is widely used in industry for evaluating the
relative performance of complex TCP/IP protocols, algorithms, and models. Its
role and contribution in the field of TCP/IP performance evaluation cannot be
overstated. This process is discussed here, where we deal with simulation
basics, concepts and fundamental issues.
Simulation has many obvious benefits such as:
• convenient alternative when the physical network of choice is not available or
absent
• a variety of workloads and network conditions can be emulated
• repeatable comparison of alternative protocols/architectures
• fine details can be incorporated
• complements quantitative analysis, as in this thesis.
Simulation has been introduced in various works on the subject, including (Jain et al.
2004; Law & Kelton, 2001).
For the simulation process, the following 10-step approach is generally
recommended:
1. Define the Objectives of the study.
2. Construct reference NW model.
3. Select the fixed parameters.
4. Select performance metrics.
5. Determine variable parameters.
6. Embed all of the above in software.
7. Construct simulation software programs.
8. Execute the simulation program.
9. Collect performance data.
10. Present and interpret results.
We will return shortly to the implementation of these 10 steps in our project.
It cannot be overstated that there is far more to simulation than programming
the chosen simulator tool; it would be a grave mistake to plunge directly and
headlong into programming.
For now, we continue with our discussion of simulation technology.
Continuous vs. DES
The system modeled can be either continuous (e.g., flow of fluids in a
reactor) or discrete (e.g., packet queues in a router); depending on the
nature of the model, one has either a continuous or a discrete simulation.
Note that the main component of DES (Discrete Event Simulation) is a
time-ordered list of events waiting to happen.
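A minimal DES kernel can be sketched in Python: the event list is kept as a priority queue ordered by timestamp, and executing an event may schedule further events (an illustrative toy, not the internal design of NS 2 or OPNET):

```python
import heapq

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._events = []    # heap of (time, seq, action)
        self._seq = 0        # tie-breaker for events at equal times

    def schedule(self, delay, action):
        """Schedule a zero-argument callable to fire after `delay` seconds."""
        heapq.heappush(self._events, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until=float("inf")):
        """Pop and execute events in timestamp order up to `until`."""
        while self._events and self._events[0][0] <= until:
            self.now, _, action = heapq.heappop(self._events)
            action()         # an event may schedule further events

# Example: a self-scheduling arrival process, one arrival per second.
sim = Simulator()
arrivals = []

def arrival(n):
    arrivals.append((sim.now, n))
    if n < 3:
        sim.schedule(1.0, lambda: arrival(n + 1))  # next arrival in 1 s

sim.schedule(0.5, lambda: arrival(1))
sim.run()
```

The simulation clock jumps from event to event rather than advancing continuously, which is the essential difference between discrete and continuous simulation.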
Terminating vs. Steady State
A terminating simulation is used to study a system for a clearly defined period of
time, or a number of events, e.g., performance of a new protocol stack between 9 am
and 5 pm. Peak hour traffic is simulated, with the simulation terminating after
exactly 8 hours.
A steady-state simulation is a different matter: it cannot simply be
terminated after a fixed period; one must run it until the system reaches
steady state. For example, we may want to study the long-term packet loss
rates in a congested router.
Synthetic vs. Trace Driven
All simulation activity needs input traffic patterns. Often, traffic is
“synthetically” generated by random number generators; for this, a pre-defined
model (Poisson, exponential, ON-OFF, self-similar) must first be selected.

Although random generators are a popular choice, they can never fully model
the actual traffic in a given network. To come closer, one performs
“trace-driven” simulation.
The steps involved are:
• Capture traces of packet arrival times (from an operational network).
• Process the traces.
• Use these traces as simulation inputs.
Note that no random traffic generation is involved. The relative performance of
several algorithms and models can be compared.
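These steps can be sketched in Python as follows; the trace file name and its format (one arrival time per line, in seconds) are assumptions for illustration:

```python
def load_trace(path):
    """Step 1-2: read and process packet arrival times from a trace file."""
    with open(path) as f:
        return sorted(float(line) for line in f if line.strip())

def replay(times, on_packet):
    """Step 3: feed each traced arrival to the simulator, in time order."""
    for t in times:
        on_packet(t)

# Example: write a tiny synthetic trace file, then replay it.
with open("arrivals.txt", "w") as f:
    f.write("0.12\n0.05\n0.30\n")

seen = []
replay(load_trace("arrivals.txt"), seen.append)
```

Because the arrival times come from a captured trace rather than a random generator, repeated runs against different algorithms see exactly the same input workload.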
8.4 SIERRA SIMULATION PROJECT
Here we set out how the 10 steps of the recommended approach discussed above
were carried out in the actual simulation exercise involving Sierra, our novel
transport-layer congestion control algorithm.
The objective of the study was to compare the relative performance of various
congestion control schemes, as follows:
1. TCP Tahoe alone
2. TCP Reno alone
3. TCP New Reno alone
4. Sierra alone
5. Sierra vs. Tahoe
6. Sierra vs. Reno
7. Sierra vs. New Reno.
The reference network model is as follows:
Figure 8.2 Reference simulation topology: four nodes attached to two routers,
interconnected through an IP cloud
The only performance metric studied was average throughput, observed at the
end of each run. The fixed parameters included the traffic profile for the
steady-state DES simulation exercise.
The network model was constructed and fixed parameters incorporated into the
Modeler simulation tool system. Sierra was programmed into the various process
level modelers (discussed above), and the simulation program was exercised.
All simulation exercises were repeated using different random seeds.
Simulation Results
The simulation results appear below. Note that while Sierra delivers
additional throughput compared to the others (see Figures 8.3, 8.4 and 8.5),
it also appears to be fair, in that it does not “hog bandwidth” at the expense
of more compliant and conservative protocols. This is an encouraging reading
of the graphs presented (see Figures 8.6, 8.7 and 8.8).
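Fairness of this kind is commonly quantified with Jain's fairness index (Chiu & Jain, 1989), which equals 1 when all flows receive identical throughput; a small sketch with hypothetical throughput values:

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in (0, 1]."""
    n = len(throughputs)
    s = sum(throughputs)
    return s * s / (n * sum(x * x for x in throughputs))

equal = jain_index([50.0, 50.0])    # perfectly fair allocation
skewed = jain_index([90.0, 10.0])   # one flow hogging bandwidth
```

A value near 1 for two competing flows indicates the bandwidth is being shared evenly, while values well below 1 indicate one flow is starving the other.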
Figure 8.3 Reno throughput
Figure 8.4 New Reno throughput
Figure 8.5 Sierra throughput
Figure 8.6 Sierra vs. Tahoe, relative throughput
Figure 8.7 Sierra vs. Reno, relative throughput
Figure 8.8 Sierra vs. New Reno, relative throughput
8.5 CONCLUSIONS
In this thesis, we reviewed a number of technologies deployed in today’s computer
networks. All aspects and layers involved in constructing such networks were
presented, discussed and analyzed.
We then focused our attention on the design and development of models for the
Black Box aspect of the congestion control problem, which is so critical today
as applications contend for bandwidth at the edges of the network, whether the
Internet itself or private wide area networks.
We introduced the Black Box vs. White Box dichotomy for congestion control
models, which is so critical for any appropriate treatment of this problem.
Specifically we excluded treatment of White Box modeling, which is deferred to
future work.
Black Box modeling has been the subject of numerous studies, analyses,
refinements and improvements over the years. A comprehensive analysis of these
developments was given in Chapter 6, with a critical eye toward further
refinement.
Our novel protocol, Sierra (the subject of five comprehensive research papers), was
presented and developed from the ground up. Further work in this project, from a
simulation/mathematical perspective, was planned out in detail.
Later, we carried out a comprehensive consolidation of results from control
theory and convex optimization theory, following the lead of F.P. Kelly (one
of the founding fathers of mathematical congestion control). We then collated
a range of techniques and results generally applicable to the stochastic
analysis of congestion management.
Finally, in Chapter 8, we applied the programme laid out in earlier chapters
to show, quantitatively, that Sierra offers better throughput than its main
competitors. We also carried out a software simulation project using the
popular OPNET tool, which again demonstrates the fairness and superiority of
Sierra. As a result, we are able to demonstrate the merits of our novel
protocol, Sierra, from both an analytical and an experimental perspective.
Further research is indicated in the following areas:
• incorporation of White Box methods into this project (RED, BLUE, etc.)
• further simulation, using ECN and WHITE BOX techniques
• the fit of ECN into our project
• investigation of the results of tweaking Sierra parameters on the ensuing
available throughput and fairness
• application of comprehensive stochastic modeling techniques to White Box
protocols.
We expect this work to be carried out over the next few years, and we hope to
be involved in that exploration.
REFERENCES
Aggarwal, A., Savage, S. and Anderson, T. Understanding the performance of TCP
pacing. Proceedings of INFOCOM 2000 (March 2000) vol. 3, 1157-1165
Ahn, J., Danzig, P., Liu, Z, and Yan, L. Evaluation of TCP Vegas: emulation and
experiment. SIGCOMM Symposium on Communications Architectures and Protocols
(1995)
Allman, M., Balakrishnan, H., and Floyd, S. Enhancing TCP’s loss recovery using
limited transmit. IETF RFC, August 2000
Anjum, F., & Tassiulas, L. Fair bandwidth sharing among adaptive and non-adaptive
flows in the Internet. Proceedings of IEEE INFOCOM ’99 (March 1999)
Athuraliya, S., Li, V.H., Low, S, and Yin, Q. REM: Active Queue Management.
IEEE Network (2001)
Awduche, D.O., Malcolm, J., Agogbua, J, O’Dell, M, and McManus, J.
Requirements for Traffic Engineering over MPLS. IETF RFC 2702 (September
1999)
Bajko, G., Moldovan, I., Pop, O., Biro, J. TCP flow control algorithms for routers.
Technical Report, Technical University of Budapest, Hungary, 1999
Balakrishnan, H., Rahul, H., and Seshan, S. An integrated congestion management
architecture for Internet hosts. Proceedings of ACM SIGCOMM (August 1999) 175-
187
Balakrishnan, H., Seshan, S., Amir, E., and Katz, R. Improving TCP/IP performance
over wireless networks. Proceedings of 1st ACM conference on Mobile
communications and Networking (mobicom) (November 1995)
Barford, P. & Crovella, M. Generating representative web workloads for network
and server performance evaluation. Proceedings of ACM SIGMETRICS ’98 (June
1998)
Barford, P., & Crovella, M. A performance evaluation of hyper text transfer
protocols. Proceedings of ACM SIGMETRICS ’99 (March 1999)
Benmohammed, L. & Meerkov, S.M. Feedback control of congestion in packet
switching networks: the case of a single congested node. IEEE/ACM Transactions on
Networking 1,6, (1993) 693-707
Bennett, J., Benson, K., Charny, A, Courteney, W., and LeBoudec, J-Y. Delay jitter
bounds and packet scale rate guarantee for expedited forwarding. INFOCOM 2001
(April 2001) vol. 3, 1502-1509
Bennett, J. & Zhang, H. WF2Q: Worst-case fair weighted fair queueing.
Proceedings of IEEE INFOCOM ’96 (San Francisco, 1996), vol. 1, 120-128
Bertsekas, D. Nonlinear Programming, Athena Press, 2003
Black, U. Physical Layer Interfaces and Protocols, IEEE Computer Society Press,
1998
Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., and Weiss, W. An
architecture for differentiated services. IETF RFC 2474, December 1998
Bonomi, F., & Fendick, K. The rate-based flow control framework for the available
bit rate ATM service. IEEE Network Magazine (March/April 1995), 25-39
Boutremans, C., & LeBoudec, J-Y. A note on the fairness of TCP Vegas. Broadband
communications (2000)
Boyd, S. & Vanderberghe, L. Convex Optimization. CUP, 2006
Braden, B., Clark, D., Crowcroft, J., Davie, B, Deering, S., Estrin, D., Floyd, S.,
Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wrowclawski, J., and Zhang, L. Recommendations on queue management and
congestion avoidance in the Internet. Internet RFC 2309, 1998
Braden, R. Requirements for Internet hosts: Communication Layers. RFC 1122, 1989
Brakmo, L. & Peterson, L. TCP Vegas: End to End congestion avoidance on a global
Internet. IEEE Journal on Selected Areas on Communication 13,8 (October 1995)
1465-1480
CCIE Fundamentals, Cisco Systems, 2000, Cisco Press
Cerf, V., & Kahn, R. A protocol for packet network intercommunication.
Transactions on Communications 22,5 (May 1974) 637-648
Chapman, A., & Kung, H.T., Traffic management for aggregate IP streams.
Proceedings of CCBR (Ottawa, March 1999)
Charny, A., Clark, D., and Jain, R. Congestion control with explicit rate indication.
Proceedings of the IEEE International Communications Conference (ICC) (June
1995) 1954-1963
Chiu, D., & Jain, R. Analysis of the increase/decrease algorithms for congestion
avoidance in computer networks. Journal of Computer Networks 17,1 (June 1989)
1-14
Clark, D., & Fang, W. Explicit allocation of best-effort packet delivery service.
IEEE/ACM Transactions on Networking 6,4 (August 1998) 362-373
Clark, D., & Tennenhouse, D. Architectural considerations for a new generation of
protocols. ACM SIGCOMM ’88 (1988)
Clark, D.D. The design philosophy of the DARPA Internet. ACM SIGCOMM ’88
(1988)
Clark, D.D., Lambert, M.L., and Zhang, L. NETBLT: A high throughput transport
protocol. SIGCOMM Symposium on Communications Architectures and Protocols
(August 1987), 353-359
Coltum, R. OSPF: An Internet Routing Protocol. ConneXions: The Interoperability
Report, 3,8, 1999
Crowcroft, J., & Oechslin, P. Differentiated end-to-end Internet services using a
weighted proportional fair sharing TCP. Computer Communications Review 28,3
(July 1998)
Cruz, R.L. A calculus for Network Delay and a Note on Topologies of
Interconnection Networks. PhD thesis, University of Illinois, July 1987.
Davidson, J. An Introduction to TCP/IP, Springer 1992
Davin, J., & Heybey, A. A simulation study of fair queuing and policy enforcement.
Computer Communication Review 20,5 (October 1990), 23-29
Deb, S. & Srikant, R. Rate-based vs. queue-based models of congestion
control.University of Illinois Technical Report, 2003
Deering, S., & Hinden, R. Internet protocol, version 6 (IPv6). Internet RFC 2460,
December 1998
Demers, A., Keshav, S., and Shenker, S. Analysis and simulation of a fair queueing
algorithm. Internetworking: Research and Experience 1 (1990), 3-26
Egevang, K., & Francis, P. The IP Network Address Translator (NAT). IETF RFC
1631, May 1994. Covers NAT, but does not cover Port Address Translation (PAT).
Et, A.L. Closed-Loop Rate-Based Traffic Management. ATM Forum 94-0211R3,
April 1994
Fall, K., & Floyd, S. Simulation based comparisons of Tahoe, Reno and SACK TCP.
ACM Computer Communications Review 26,3 (July 1996), 5-21
Fenner, W. Internet Group Management Protocol, ver. 2. RFC 2236, 1997
Floyd, S. TCP and Explicit Congestion Notification. ACM Computer Communication
Review 24,5 (October 1994), 10-23
Floyd, S. The New Reno modification to TCP’s fast recovery algorithm.
http://www.aciri.org/floyd/papers/rfc2582.txt, April 1999
Floyd, S., & Fall, K. Promoting the use of end-to-end congestion control in the
Internet. IEEE/ACM Transactions on Networking (February 1998)
Floyd, S. & Henderson, T. The New Reno modification to TCP’s Fast Recovery
Algorithm. RFC 2582 (1999)
Floyd, S., & Jacobson, V. Connections with multiple congested gateways in packet-
switched networks part 1: One-way traffic. Computer Communications Review 21,5
(October 1991), 30-47
Floyd, S., & Jacobson, V. Traffic phase effects in packet-switched gateways. ACM
Computer Communication Review 21,2 (April 1991)
Floyd, S., & Jacobson, V. Random early detection gateways for congestion
avoidance. IEEE/ACM Transactions on Networking 1,4 (August 1993)
Floyd, S., Mahdavi, J., Mathis, M., and Podolsky,M. Extension to the SACK option
for TCP. RFC 2883 (2000)
Fraser, A.G. Towards a universal data transport system. IEEE Journal on Selected
Areas in Communication 1,5 (November 1983)
Garcia-Luna-Aceves, J.J. Loop free routing using Diffusing Computations.
IEEE/ACM Transactions in Networking, 1,4, 1993
Georgiadis, L., Guerin, R., and Parekh, A. Optimal multiplexing on a single link:
Delay and buffer requirements. IEEE Transactions on Information Theory (1997)
Golestani, S.J. A self-clocked fair queuing scheme for broadband applications.
Proceedings of INFOCOM (Toronto, Canada, June 1994) 636-646
Goyal, P., Vin, H., and Chen, H. Start-time fair queuing: A scheduling algorithm for
integrated services packet switching networks. Proceedings of ACM SIGCOMM ’96
(Palo Alto, CA, 1996), 167-168
Hasegawa, G., Murata, M., and Miyahara, H. Fairness and stability of congestion
control mechanisms of TCP. INFOCOM ’99 (March 1999) vol.3, 1329-1336
Hashem, E. Analysis of random drop for gateway congestion control. Technical
Report, Laboratory for Computer Science, MIT, 1988
Held, G. & Jagannathan, S.R. Practical Network Design Techniques, 2nd ed. CRC
Press, 2004
Held, G. ABCs of IP Addressing. CRC Press 2000
Held, G. Enhancing LAN Performance. Wiley Press, 2003
Hoe, J. Start-up dynamics of TCP’s congestion control and avoidance schemes.
Master’s thesis, MIT, 1995
Ibanez, J., & Nichols, K. Preliminary simulation evaluation of an assured service.
Internet draft, August 1998
IETF. Integrated services in the Internet architecture: an overview. IETF RFC 1633,
June 1994
Jagannathan, S. & Matawie, K. Issues and trends in unicast congestion management.
IWSM proceedings, Leuven, Belgium (2003), 201-206
Jagannathan, S. & Matawie, K. A survey of issues and recent trends in unicast
congestion control for the Internet. Journal of American Academy of Business,
Cambridge, (2005), 328-335
Jagannathan, S. & Matawie, K. Stochastic modeling of congestion control
algorithms. Global Management & Information Technology Research Conference,
New York, May 2006
Jacobson, V. Congestion avoidance and control. Proceedings of the SIGCOMM ’88
Symposium (August 1988) 314-332
Jacobson, V., & Braden, R. TCP extensions for long-delay paths. IETF RFC 1072,
October 1988, Discusses TCP SACK
Jacobson, V., Nichols, K., and Poduri, K. An expedited forwarding phb. IETF RFC
2598, June 1999
Jacobson, V., Nichols, K., and Poduri, K. The “virtual wire” behaviour aggregate.
Internet draft, March 2000
Jacobson, V., Nichols, K., and Poduri, K. The virtual wire per-domain behaviour.
Internet draft, July 2000
Jain, R. A delay-based approach for congestion avoidance in interconnected
heterogeneous computer networks. Computer Communications Review 19,5 (1989)
56-71
Jain, R. Congestion control in computer networks: Issues and trends. IEEE Network,
(1990) 24-30
Jain, R. Congestion control and traffic management in ATM networks: Recent
advances and a survey. Invited Submission to Computer Networks and ISDN systems
(February 1995) vol. 28, 1723-1738
Johnson, H.W. Fast Ethernet. Prentice Hall, 1996
Katabi, D., Handley, M. and Rohrs, C. Internet congestion control for future
high bandwidth-delay product environments. Proceedings of ACM SIGCOMM, 2002
Karandikar, S., Kalyanaraman, S., Bagal, P., and Packer, B. TCP rate control.
Computer Communications Review 30,1 (2000)
Kelly, F.P. Charging and rate control of elastic traffic. European Transactions on
Telecommunications (1997)
Kelly, F.P. Maulloo, A., and Tan, D. Rate control in communication networks:
Shadow prices, proportional fairness and stability. Journal of the Operations
Research Society (1998)
Kelly, F.P. Reversibility and Stochastic Networks. Wiley, 1979

Kelly, F.P. Models for a self-managed Internet. Philosophical Transactions of
the Royal Society, 2000, 2335-2348

Kelly, F.P. Fairness and stability of end-to-end congestion control. European
Journal of Control, 9 (2003) 149-165
Kent, S., & Atkinson, R. Security Architecture for the Internet protocol. RFC 2401
November 1998
Keshav, S. A control-theoretic approach to flow control. Proceedings of ACM
SIGCOMM ’91 (September 1991)
Keshav, S. On the efficient implementation of fair queueing. Journal of
Internetworking Research and Experience (1991)
Keshav, S. An engineering approach to computer networking. Addison Wesley, 1997
Keshav, S., & Morgan, S.P. Smart: Performance with overload and random losses.
IEEE INFOCOM ’97 (April 1997)
Kolarov, A. Study of the TCP/UDP Fairness issue for assured forwarding per-hop
behaviour in differentiated services networks. 2001 IEEE workshop on High
Performance Switching and Routing (May 2001) 190-196
Kousky, K. Bridging the Network Gap. LAN Technology 6,1 2000

Krol, E. The Hitchhiker’s guide to the Internet. RFC 1118, Sep 1989

Kung, H., Blackwell, T. and Chapman, A. Credit-based flow control for ATM
networks: Credit update protocol, adaptive credit allocation, and statistical
multiplexing. Proceedings of ACM SIGCOMM ’94 (September 1994) 101-114
Kunniyur, S. & Srikant, R. End-to-end congestion control: Utility functions, random
losses and ECN marks. Proceedings INFOCOM 2000 (March 2000)
Lakshman, T.V. & Madhow, U. The performance of TCP/IP for networks with high
bandwidth-delay products and random loss. IEEE/ACM Transactions on Networking
1,4 (October 1997)
Lin, D. & Kung, H.T. TCP trunking: Design, Implementation and Performance IEEE
ICNP (October 1999), 222-231
Lin, D., and Morris, R. Dynamics of Random Early Detection. SIGCOMM ’97
(August 1997)
Lippins, N. The Internetwork Decade. Data Communications 21,14 (October 2001)
Lo Monaco, G., Feroz, A., and Kalyanaraman, S. TCP Friendly marking for scalable
best-effort services on the Internet. Computer Communications Review, 2001
Low, S. & Lapsley, D.E. Optimization flow control: Basic algorithm and
convergence. IEEE/ACM Transactions on Networking 7,6 (December 1999)
Low, S., Peterson, L., and Wang, L. Understanding Vegas: A duality model.
Proceedings ACM Sigmetrics 2001 (June 2001)
Mahdavi, J. & Floyd, S. TCP friendly unicast rate-based flow control. Technical
Note, Jan 8, 1997
Martin, J. & Chapman, K.K. LANs: Architectures and Implementations. Prentice
Hall, 1989
Mathis, M. & Mahdavi, J. Forward Acknowledgement: Refining TCP congestion
control. Proceedings of ACM SIGCOMM ’96 (Palo Alto, CA, August 1996), 281-291
Mathis, M., Mahdavi, J., Floyd, S., and Romanow, A. TCP Selective
Acknowledgement options. Internet RFC 2018 (October 1996)
May, M., Bolot, J., Diot, C., and Lyles, B. Reasons not to deploy RED. Proceedings
IEEE/IFIP IWQoS ’99 (June 1999)
Miller, M.A. Internet Technologies Handbook. Wiley, 2004
Mishra, P.P. & Kanakia, H.R. A hop-by-hop rate-based congestion control scheme.
Proceedings of ACM SIGCOMM ’92 (August 1992)
Mo, J., La., R.J., Anantharam, V., and Walrand, J. Analysis and comparison of TCP
Reno and Vegas. Proceedings of IEEE INFOCOM ’99 (March 1999)
Mo, J. & Walrand, J. Fair end-to-end window-based congestion control. IEEE/ACM
Transactions on Networking (2000)
Morgan, S.P. & Keshav, S. Packet pair rate control-buffer requirements and overload
performance. Technical Note, 1998
Morris, R. TCP behaviour with many flows. Proceedings of ICNP ’97 (October
1997)
Morris, R. Scalable TCP congestion control. PhD Thesis, Harvard University,
January 1999
Nichols, K., Jacobson, V., and Zhang, L. A two-bit differentiated services
architecture for the Internet. Internet Draft ftp://ftp.ee.lbl.gov/papers/dsarch.pdf
November 1997
Ott, T.J. ECN Protocols and the TCP Paradigm
http://web.njit.edu/mlt/papers/index.html
Ott, T.J., Lakshman, T.V., and Wong, L.H. SRED: stabilized RED. Proceedings of
IEEE INFOCOM ’99 (March 1999)
Parekh, A.K. A generalized processor sharing approach to flow control in Integrated
Services Networks. PhD thesis, MIT, February 1992
Parekh, A.K. & Gallagher, R.G. A generalized processor sharing approach to flow
control in Integrated Services Networks - the multiple node case. IEEE/ACM
Transactions on Networking (April 1994) 137-150
Perlman, R. Interconnections: Bridges, Switches and Routers. Addison Wesley, 2001
Plummer, D. An Ethernet Address Resolution Protocol. RFC 826, 1982
Ramakrishnan, K.K. & Floyd, S. A proposal to add Explicit Congestion Notification
(ECN) to IP. Internet RFC 2481 January 1999
Ramakrishnan, K.K. & Jain, R. A binary feedback scheme for congestion avoidance
in computer networks with a connectionless network layer. Proceedings of ACM
SIGCOMM ’88 (August 1988), 303-313
Ramakrishnan, K.K. & Jain, R. A binary feedback scheme for congestion avoidance
in computer networks. ACM Transactions on Computer Systems 8,2 (1990) 158-181
Ramakrishnan, K.K., Jain. R., and Chiu, D.M. Congestion avoidance in computer
networks with a connectionless network layer. Part iv: A selective binary feedback
scheme for general topologies methodology. Technical Report DEC-TR-510, Digital
Equipment Corporation, 1987
Sahu, S., Nain, P., Towsely, D, Diot, C. and Firiou, V. On achievable service
differentiation with token bucket marking for tcp. ACM Sigmetrics ’00 (June 2000)
Seddigh, N., Nandy, B., and Pieda, P. Bandwidth assurance issues for tcp flows in a
differentiated services network. IEEE GLOBECOM ’99 (December 1999) 1792-1798
Shenker, S. Fundamental design issues for the future Internet. IEEE Journal on
Selected Areas in Communication (1995), 1176-1188
Shenker, S. & Wroclawski, J. General characterization parameters for integrated
service network elements. IETF RFC 2212 (September 1997)
Shreedhar, M. & Varghese, G. Efficient fair queueing using deficit round robin.
Proceedings of ACM SIGCOMM (September 1995)
Stallings, W. Local and Metropolitan Area Networks. Prentice Hall, 1997
Stallings, W. Networking Standards. Addison Wesley, 1993
Stevens, W.R. TCP slow start, congestion avoidance, fast retransmit and fast
recovery algorithms. Internet RFC 2001 (January 1997)
Stoica, I., Shenker, S., and Zhang, H. Core-stateless Fair queueing: Achieving
approximately fair bandwidth allocations in high speed networks. Proceedings of
ACM SIGCOMM ’98 (1998) 118-130
Tanenbaum, A. Computer Networks. 4th ed., Prentice Hall, 2004
Tinnakornsrisuphap, P. & Makowski, A.M. Limit behavior of ECN/RED gateways
under a large number of TCP flows. Proceedings of INFOCOM (San Francisco, CA)
Apr. 2003, 873-83.
Tsang, D., & Wong, W. A new rate-based switch algorithm for ABR traffic to
achieve max-min fairness with analytical approximation and delay adjustment.
Proceedings of IEEE INFOCOM ’96 (March 1996) 1174-1181
Turner, J. New directions in communications, or which way to the information age?
IEEE Communications Magazine (1986)
Visweswaraiah, V. & Heidemann, J. Improving restart of idle TCP connections.
Technical Report TR-97-661, University of Southern California, November 1997
Vojnovic, M., Le Boudec, J-Y., and Boutremans, C. Global fairness of additive-
increase and multiplicative-decrease with heterogeneous round trip times. IEEE
INFOCOM 2000 (March 2000) 1303-1312
Wang, S.Y. Decoupling control from data for TCP congestion control. PhD
thesis, Harvard University, 1999
Wang, Z. & Crowcroft, J. A new congestion control scheme: Slow Start and search
(tri-S). ACM Computer Communications Review SIGCOMM 21, 1 (1991) 32-43
Zhang, H. & Ferrari, D. Rate-controlled static priority queuing. Proceedings of IEEE
INFOCOM ’93 (San Francisco, 1993)
Zhang, H. & Ferrari, D. Rate-controlled service disciplines. Journal of High Speed
networking (1994)
Zhang, L. A new architecture for packet switching network protocols. Technical
Report MIT LCS TR-45, Laboratory of Computer Science, MIT, August 1989
Zhang, L., Shenker, S., and Clark, D. Observations on the dynamics of a congestion
control algorithm: the effects of two-way traffic. ACM Computer Communications
Review (September 1991)
BIBLIOGRAPHY
Aalto, S. and Lassila, P. Impact of size based scheduling on flow level performance
analysis in wireless down-link data channels. Proceedings of the 20th International
Teletraffic Congress (ITC-20) 1096-1107, 2007, Ottawa, Canada
Ahmed, T., Mehaoua, A., Boutaba, B. and Iraqi, Y. Adaptive Packet Video
Streaming over IP Networks: A cross layer approach. IEEE Journal on Selected
Areas in Communications, vol. 23, no. 2, 385-401, 2005
Chen, C., Li, Z-G., and Soh, Y-C. TCP friendly source adaptation for
multimedia applications over the Internet. Proceedings of the 15th
International Packet Video Workshop (PV ’06), Hangzhou, China, Apr. 2006
Chen, M. and Zakhor, A. Multiple TFRC Connection Based Rate Control for
Wireless Networks. IEEE Transactions on Multimedia, vol. 8, no. 5, 1045-1062,
2006
Cheung, G. and Yoshimura, T. Streaming Agent: A network proxy for media
streaming in 3G wireless networks. IEEE International Packet Video Workshop,
Pittsburgh, PA, USA, Apr 2002
Floyd, S. and McCanne, S. Network Simulator, LBNL Public Domain Software,
http://www.isi.edu/nsnam/ns/.
Hyytia, E., Lassila, P. and Virtamo, J. Spatial Node Distribution of the Random
Waypoint Mobility Model with Applications. IEEE Transactions on Mobile
Computing, vol. 5, no. 6, 680-694, 2006
Jacobs, S. and Eleftheriadis, A. Streaming video using TCP flow control and
dynamic rate shaping. Journal of Visual Communication and Image Representation,
vol. 9, no. 3, 211-222, 1998
Kalman, M., Steinbach, E. and Girod, B. Adaptive media playout for low-delay video
streaming channels. IEEE Transactions on Circuits and Systems for Video
Technology, vol. 14, no. 6, 841-851, 2004
Kilpi, J. and Lassila, P. Micro- and macroscopic analysis of RTT variability in GPRS
and UMTS networks. Proceedings of Networking 2006, 1176-1181, Coimbra,
Portugal
Kim, Y-G., Kim, J. and Jay-Kuo, C.C. TCP-Friendly Internet video with Network
Aware error control. IEEE Transactions on Circuits and Systems for Video
Technology vol. 14, no. 2, 256-268, 2004
Lassila, P. and Kuusela, P. Performance of TCP on low bandwidth wireless links
with delay spikes. European Transactions on Telecommunications, 2007, to appear.
Luna, C.E., Eisenberg, Y. et al. Joint Source coding and data rate adaptation for
energy efficient wireless video streaming. IEEE Transactions on Selected Areas in
Communications, vol. 21, no. 10, pp 1710-1720, 2003
Sastry, N.R. and Lam, S.S. CYRF: A theory of window-based unicast congestion
control. IEEE/ACM Transactions on Networking, vol. 13, no. 2, 330-342, 2005
Schaar, M. van der, and Shankar, S. Cross layer wireless multimedia transmission:
Challenges, principles and a new paradigm. IEEE Wireless Communications, vol.
12, no. 4, 50-58, 2005
Sharma, V., Virtamo, J. and Lassila, P. Performance analysis of the Random Early
Detection algorithm. Probability in the Engineering and Informational Sciences,
vol. 16, no. 3, 367-388, 2002
Sisalem, D. TCP-friendly congestion control for multimedia communication in the
Internet. PhD thesis, Technical University of Berlin, Berlin, Germany, 2000
Vieron, J. and Guillemot, C. Real Time constrained TCP compatible rate control for
Video over the Internet. IEEE Transactions on Multimedia, vol. 6, no. 4, 634-646,
2004
Yan, Y., Katrinis, K. et al. Media- and TCP-Friendly congestion control for scalable
video streams. IEEE Transactions on Multimedia, vol. 8, no. 2, 196-206, 2006
Zhu, P. et al. Joint design of source rate control and QoS aware congestion control
for video streaming over the Internet. IEEE Transactions on Multimedia, vol. 9, no.
2, 336-376, 2007
GLOSSARY
Most of the specialized technical terms used in this thesis are topics in their own
right and do not lend themselves to concise glossary definitions. Accordingly, each
technical term is elaborated on and explained at the point where it is first
introduced in the text.