CS716 Advanced Computer Networks By Dr. Amir Qayyum


2

Lecture No. 25

Review Lecture

4

Switched Networks

• A network can be defined recursively as...

– Two or more nodes connected by a link

– Circular nodes (switches) implement the network

– Square nodes (hosts) use the network

5

Switched Networks

• A network can be defined recursively as...

– Two or more networks connected by one or more nodes: internetworks

– Circular nodes (routers or gateways) interconnect the networks

– A cloud denotes “any type of independent network”

6

Switching Strategies

• Circuit switching: Carry bit streams

a. establishes a dedicated circuit

b. links reserved for use by communication channel

c. send/receive bit stream at constant rate

d. example: original telephone network

• Packet switching: Store-and-forward messages

a. operates on discrete blocks of data

b. utilizes resources dynamically according to traffic demand

c. send/receive messages at variable rate

d. example: Internet

7

Multiplexing

• Physical links/switches must be shared among users

– (Synchronous) Time-Division Multiplexing (TDM)

– Frequency-Division Multiplexing (FDM)

[Figure: hosts L1, L2, L3 and R1, R2, R3 connected through Switch 1 and Switch 2. Multiple flows on a single link]

Do you see any problem with TDM / FDM ?

8

Statistical Multiplexing

• On-demand time-division, possibly synchronous (ATM)

• Schedule link on a per-packet basis

• Buffer packets in switches that are contending for the link

• Packets from different sources interleaved on link

…

Do you see any problem ?

9

Inter-Process Communication

• Turn host-to-host connectivity into process-to-process communication, making the communication meaningful.

• Fill gap between what applications expect and what the underlying technology provides.

[Figure: applications on different hosts communicating over a channel. Abstraction for application-level communication]

10

Abstract Channel Functionality

• What functionality does a channel provide ?

– Smallest set of abstract channel types adequate for largest number of applications

• Where is the functionality implemented ?

– Network as a simple bit-pipe with all high-level communication semantics at the hosts

– More intelligent switches allowing hosts to be “dumb” devices (telephone network)

11

Performance Metrics

• … and to do so while delivering “good” performance

• Bandwidth (throughput)

– Data transmitted per unit time, e.g. 10 Mbps

– Link bandwidth versus end-to-end bandwidth

– Notation

• KB = 2^10 bytes

• Kbps = 10^3 bits per second

12

Performance Metrics

• Latency / Delay

– Time to send message from point A to point B

– One-way versus Round-Trip Time (RTT)

– Components

Latency = Propagation + Transmit + Queue

Propagation = Distance / c

Transmit = Size / Bandwidth

• Note:

– No queuing delay in direct (point-to-point) link

– Bandwidth irrelevant if size = 1 bit

– Process-to-process latency includes software processing overhead (dominates over shorter distances)
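The three latency components can be added up directly. The sketch below is only an illustration of the formula on this slide; the function name, the default propagation speed (speed of light in vacuum) and the sample link are assumptions, not values from the lecture.

```python
# Assumption: signals travel at the speed of light in vacuum; real media are slower.
SPEED_OF_LIGHT_M_PER_S = 3.0e8


def one_way_latency(distance_m, size_bits, bandwidth_bps,
                    queue_s=0.0, speed_m_per_s=SPEED_OF_LIGHT_M_PER_S):
    """Latency = Propagation + Transmit + Queue (all in seconds)."""
    propagation = distance_m / speed_m_per_s   # Distance / c
    transmit = size_bits / bandwidth_bps       # Size / Bandwidth
    return propagation + transmit + queue_s


# Hypothetical example: a 1 KB message over a 3000 km, 10 Mbps point-to-point link.
example_latency_s = one_way_latency(3_000_000, 8 * 1024, 10e6)
```

With these sample numbers the propagation term (10 ms) dominates the transmit term (about 0.8 ms), which is why bandwidth matters little for small messages.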

13

Delay x Bandwidth Product

• Amount of data “in flight” or “in the pipe”

• Example: 100 ms RTT x 45 Mbps BW ≈ 560 KB

• This much data must be buffered before the sender responds to a request to slow down

[Figure: the link drawn as a pipe whose length is the delay and whose width is the bandwidth]
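The example figure can be checked with a few lines of arithmetic. This is a minimal sketch; the function name is an assumption, and the conversion shows why the result is quoted as roughly 560 KB.

```python
def bits_in_flight(delay_s, bandwidth_bps):
    """Delay x bandwidth product: how many bits fit 'in the pipe'."""
    return delay_s * bandwidth_bps


product_bits = bits_in_flight(0.100, 45e6)   # 100 ms RTT x 45 Mbps
product_bytes = product_bits / 8
product_kb_decimal = product_bytes / 1000    # about 562.5 with KB = 10^3 bytes
product_kb_binary = product_bytes / 1024     # about 549.3 with KB = 2^10 bytes
```

Either unit convention lands near the 560 KB quoted in the example.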

14

Network Architecture

• The challenge is to fill the gap between hardware capabilities and application expectations, and to do so while delivering “good” performance

• Designers cope with this complex task by developing a network architecture as a guideline

– Layering, protocols, standards

15

Layering

• Alternative abstractions at each layer

• Manageable network components

• Modify layers independently

[Figure: layered system, top to bottom: Application programs; Request/reply channel and Message stream channel; Host-to-host connectivity; Hardware]

16

Protocols

• Building blocks of a network architecture

• Each protocol object has two different interfaces

– service interface: operations on this protocol

– peer-to-peer interface: messages exchanged with peer

• Term “protocol” is overloaded

– Specification of peer-to-peer interface

– Module that implements this interface

– Peer modules are interoperable if both accurately follow the specifications

17

Protocol Interfaces

[Figure: on Host 1 and Host 2, a high-level object uses a protocol through the service interface; the two protocol peers communicate through the peer-to-peer interface]

18

Protocol Graph – Network Architecture

• Collection of protocols and their dependencies

– Most peer-to-peer communication is indirect

– Peer-to-peer is direct only at hardware level

[Figure: identical protocol graphs on Host 1 and Host 2. File application, Digital library application and Video application sit above RRP and MSP, which sit above HHP]

RRP: Request Reply Protocol

MSP: Message Stream Protocol

HHP: Host-to-Host Protocol

19

Protocol Machinery

• Multiplexing and Demultiplexing (demux key)

• Encapsulation (header/body) in peer-to-peer interfaces

– Indirect communication (except at hardware level)

– Each protocol adds a header

– Part of header includes demultiplexing field (e.g., pass up to request/reply or to message stream?)

20

Encapsulation

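[Figure: encapsulation between Host 1 and Host 2. The application program hands Data to RRP; RRP passes "RRP | Data" to HHP; HHP sends "HHP | RRP | Data" to its peer, and each layer on the receiving host strips its own header]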
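Encapsulation and demultiplexing can be sketched with the RRP and HHP names used in these slides. The header layouts below (a 1-byte demux key for HHP, a 2-byte request id for RRP) and the key values are purely hypothetical, chosen only to show each layer adding and stripping its own header.

```python
import struct

# Hypothetical demux key values carried in the HHP header.
HHP_DEMUX_RRP = 1
HHP_DEMUX_MSP = 2


def rrp_encapsulate(data: bytes, request_id: int) -> bytes:
    """RRP prepends its (assumed) 2-byte header to the application data."""
    return struct.pack("!H", request_id) + data


def hhp_encapsulate(payload: bytes, demux_key: int) -> bytes:
    """HHP prepends its (assumed) 1-byte header holding the demux key."""
    return struct.pack("!B", demux_key) + payload


def hhp_deliver(frame: bytes):
    """Peer HHP strips its header and reports which protocol gets the body."""
    return frame[0], frame[1:]


def rrp_deliver(message: bytes):
    """Peer RRP strips its header and returns (request_id, data)."""
    (request_id,) = struct.unpack("!H", message[:2])
    return request_id, message[2:]


# Host 1 sends: Data -> RRP | Data -> HHP | RRP | Data
frame_on_wire = hhp_encapsulate(rrp_encapsulate(b"Data", 7), HHP_DEMUX_RRP)
# Host 2 receives: HHP uses the demux key to pick RRP, which recovers the data.
demux_key, rrp_message = hhp_deliver(frame_on_wire)
```

The lower layer never interprets the body it carries; it only reads its own header, which is what lets layers be modified independently.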

21

Standard Architectures

• Open Systems Interconnection (OSI) Architecture

– International Organization for Standardization (ISO)

– International Telecommunication Union (ITU), formerly CCITT

– “X dot” series: X.25, X.400, X.500

– Primarily a reference model

22

OSI Architecture

[Figure: OSI architecture. Each end host implements all seven layers; one or more nodes within the network implement only the Network, Data link and Physical layers. The figure also marks which layers run at user level and which in the OS kernel]

Layer: function

Application: application

Presentation: data formatting

Session: connection management

Transport: process-to-process communication channel

Network: host-to-host packet delivery

Data link: framing of data bits

Physical: transmission of raw bits

23

Internet Architecture

• TCP/IP Architecture

– Developed with ARPANET and NSFNET

– Internet Engineering Task Force (IETF)

• Culture: implement, then standardize

• OSI culture: standardize, then implement

– Became popular with release of Berkeley Software Distribution (BSD) Unix; i.e. free software

– Standard suggestions traditionally debated publicly through “Requests For Comments” (RFCs)

24


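• Implementation and design done together

• Hourglass Design (bottleneck is IP)

• Application vs Application Protocol (FTP, HTTP)

[Figure: Internet protocol graph. FTP, HTTP, NV and TFTP at the top; TCP and UDP below them; IP in the middle; NET1, NET2, …, NETn at the bottom]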

25


• Layering is not very strict

Application

TCP UDP

IP

Network

26

Networking in the Internet Age

27

Network Application Programming Interface (API)

• Interface that the OS provides to its networking subsystem

– Most network protocols are implemented in software

– All systems implement network protocols as part of the OS

– Each OS is free to define its own network API

– Applications can be ported from one OS to another if APIs are similar

• *IF* application program does not interact with other parts of the OS other than the network (file system, fork processes, display …)

28

Protocols and API

• Protocols provide a certain set of services

• API provides a syntax by which those services can be invoked

• Implementation is responsible for mapping API syntax onto protocol services

29

Socket API

• Use sockets as “abstract endpoints” of communication

• Issues

– Creating & identifying sockets

– Sending & receiving data

• Mechanisms

– UNIX system calls and library routines

[Figure: a process attached to a socket]

30

Protocol-to-Protocol Interface

• A protocol interacts with a lower level protocol like an application interacts with underlying network

• Why not use available network APIs for PPI ?

– Inefficiencies built into the socket interface

• Application programmers tolerate them to simplify their task

– inefficiency at one level

• Protocol implementers do not tolerate them

– inefficiencies at several layers of protocols

31

Protocol-to-Protocol Interface Issues

• Configure Multiple Layers

– Static vs Extensible

• Process Model

– Avoid context switches

• Buffer Model

– Avoid data copies

32

Process Model

[Figure: (a) Process-per-Protocol, where layers interact through inter-process communication; (b) Process-per-Message, where layers interact through procedure calls]

33

Buffer Model

[Figure: buffer copies between the application process and the topmost protocol, on send() and on deliver()]

34

Network Programming

• Things to Learn

– Internet protocols (IP, TCP, UDP, …)

– Sockets API (Application Programming Interface)

• Why IP and Sockets

– IP (Internet Protocol) is standard

• Allows a common name space across most of Internet

• Reduces number of translations, which incur overhead

– Sockets: reasonably simple and elegant Unix interface (most servers run Unix)

35

Socket Programming

• Reading: Stevens 2nd edition, Chapters 1-6

• Sockets API: A transport layer service interface

– Introduced in the early 1980s with BSD Unix (4.2BSD, 1983)

– Implemented as library and/or system calls

– Similar interfaces to TCP and UDP

– Can also serve as interface to IP (for super-user) known as “raw sockets”

– Linux also provides interface to MAC layer (for super-user) known as “data-link sockets”

36

Client-Server Model

• Asymmetric relationship

• Server/Daemon

– Well-known name

– Waits for contact

– Processes requests, sends replies

• Client

– Initiates contact

– Waits for response

[Figure: one server contacted by several clients]

37

Client-Server Model

• Bidirectional communication channel

• Service models

– Sequential: server processes only one client’s requests at a time

– Concurrent: server processes multiple clients’ requests simultaneously

– Hybrid: server maintains multiple connections, but processes requests sequentially

• Server and client categories not disjoint

– Server can be client of another server

– Server as client of its own client (peer-to-peer architecture)

38

TCP Connections

• TCP connection setup via 3-way handshake

– J and K are sequence numbers for messages

Client → Server: SYN J

Server → Client: SYN K, ACK J+1

Client → Server: ACK K+1

Hmmm … RTT is important!

39

TCP Connections

• TCP connection teardown (4 steps) (either client or server can initiate connection teardown)

Client → Server: FIN J (active close)

Server → Client: ACK J+1 (passive close)

Server → Client: FIN K (closes connection)

Client → Server: ACK K+1

Hmmm … Latency matters!

40

UDP - Aspects of Services

• Unit of transfer is a datagram (variable length packet)

• Unreliable, drops packets silently

• No ordering guarantees

• No flow control

• 16-bit port space (distinct from TCP ports) allows multiple recipients on a single host

41

Addresses and Data

• Internet domain names: human readable

– Mnemonic

– Variable Length

• e.g. www.case.edu.pk, www.carepvtltd.com (FQDN)

• IP addresses: easily handled by routers/computers

– Fixed Length

– Tied (loosely) to geography

• e.g. 131.126.143.82 or 212.0.0.1

42

Endianness

• Machines on Internet have different endianness

• Little-endian (Intel, DEC): least significant byte of word stored in lowest memory address

• Big-endian (Sun, SGI, HP): most significant byte...
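Byte order is easy to see by packing the same integer both ways. This is a small sketch using Python's standard `struct` and `socket` modules; the sample value is arbitrary, and network byte order is big-endian.

```python
import socket
import struct
import sys

value = 0x0A0B0C0D

little = struct.pack("<I", value)    # least significant byte first
big = struct.pack(">I", value)       # most significant byte first
network = struct.pack("!I", value)   # network byte order (big-endian)

# sys.byteorder reports the endianness of the machine running this code.
host_order = sys.byteorder

# htons/ntohs convert a 16-bit value (e.g. a port) between host and network order.
port_on_wire = socket.htons(80)
port_back = socket.ntohs(port_on_wire)
```

On a little-endian host `htons(80)` swaps the two bytes; on a big-endian host it is a no-op, which is why portable code always calls it.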

43

Socket Address Structures

• Socket address structures (all fields in network byte order except sin_family)

IP address

struct in_addr {
    in_addr_t s_addr;          /* 32-bit IP address */
};

TCP or UDP address

struct sockaddr_in {
    short sin_family;          /* e.g., AF_INET */
    ushort sin_port;           /* TCP / UDP port */
    struct in_addr sin_addr;   /* IP address */
};

44

Address Conversion

• All binary values used and returned by these functions are network byte ordered

struct hostent* gethostbyname (const char* hostname);

translates English host name to IP address (uses DNS)

struct hostent* gethostbyaddr (const char* addr, size_t len, int family);

translates IP address to English host name (not secure)

int gethostname (char* name, size_t namelen);

reads host’s name (use with gethostbyname to find local IP)

45

Address Conversion

in_addr_t inet_addr (const char* strptr);

translate dotted-decimal notation to IP address; returns -1 on failure, thus cannot handle broadcast value “255.255.255.255”

int inet_aton (const char* strptr, struct in_addr* inaddr);

translate dotted-decimal notation to IP address; returns 1 on success, 0 on failure

char* inet_ntoa (struct in_addr inaddr);

translate IP address to ASCII dotted-decimal notation (e.g., “128.32.36.37”); not thread-safe
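The same conversions are exposed by Python's standard `socket` module, which makes them easy to try out. In this sketch `socket.inet_aton` and `socket.inet_ntoa` are the real library calls; the helper `parses_as_ipv4` is a hypothetical name added for illustration.

```python
import socket

# Dotted-decimal text -> 4 bytes in network byte order.
packed = socket.inet_aton("128.32.36.37")

# 4 bytes in network byte order -> dotted-decimal text.
text = socket.inet_ntoa(packed)

# Unlike inet_addr, inet_aton reports failure separately from the result,
# so the broadcast address converts without ambiguity.
broadcast = socket.inet_aton("255.255.255.255")


def parses_as_ipv4(candidate: str) -> bool:
    """Hypothetical helper: True if inet_aton accepts the string."""
    try:
        socket.inet_aton(candidate)
        return True
    except OSError:
        return False
```

In Python a failed conversion raises `OSError` rather than returning 0, but the underlying behaviour is the same as the C call.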

46

Socket API

• Creating a socket

int socket(int domain, int type, int protocol)

• domain (family) = AF_INET, PF_UNIX, AF_OSI

• type = SOCK_STREAM, SOCK_DGRAM

• protocol = TCP, UDP, UNSPEC

• return value is a handle for the newly created socket

47

Sockets (cont)

• Passive Open (on server)

int bind(int socket, struct sockaddr *addr, int addr_len)

int listen(int socket, int backlog)

int accept(int socket, struct sockaddr *addr, int *addr_len)

• Active Open (on client)

int connect(int socket, struct sockaddr *addr, int addr_len)

48

Sockets (cont)

• Sending Messages

int send(int socket, char *msg, int mlen, int flags)

• Receiving Messages

int recv(int socket, char *buf, int blen, int flags)
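The calls above fit together as a passive open on the server and an active open on the client. The following is a minimal sketch using Python's `socket` module, which wraps the same system calls; the function names, the loopback address and the single-message echo behaviour are assumptions for illustration.

```python
import socket
import threading


def run_echo_server(listener: socket.socket) -> None:
    """Accept one connection, read one message and send it back."""
    conn, _peer = listener.accept()          # accept(): completes the passive open
    with conn:
        data = conn.recv(1024)               # recv()
        conn.sendall(data)                   # send()


def echo_once(message: bytes) -> bytes:
    """Run a one-shot echo exchange over TCP on the loopback interface."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    listener.bind(("127.0.0.1", 0))          # bind(); port 0 lets the OS choose
    listener.listen(1)                       # listen()
    port = listener.getsockname()[1]

    server = threading.Thread(target=run_echo_server, args=(listener,))
    server.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))      # connect(): the active open
    client.sendall(message)
    reply = client.recv(1024)
    client.close()

    server.join()
    listener.close()
    return reply
```

Calling `echo_once(b"hello")` exercises every call on the last three slides once. A real server would loop on `accept()` and on `recv()`, since TCP may deliver a message in several pieces.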

49

Point-to-Point Links

Reading: Peterson and Davie, Ch. 2

Outline

Hardware building blocks

Encoding

Framing

Error Detection

Reliable transmission

• Sliding Window Algorithm

50

Direct Link Issues in the OSI and Hardware/Software Contexts

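[Figure: the seven OSI layers (Application, Presentation, Session, Transport, Network, Data Link, Physical) annotated with the direct-link issues handled at the lower layers (reliability; framing, error detection, MAC; encoding) and with where each part runs: user-level software, kernel software (device drivers), hardware (network adapter)]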

51

Hardware Building Blocks

• Nodes

– Hosts: general-purpose computers

– Switches: typically special-purpose hardware

– Routers (connecting networks): varies

• Links

– Copper wire with electronic signaling

– Glass fiber with optical signaling

– Wireless with electromagnetic (radio, infrared, microwave) signaling

52

Links

• Physical Media

– Twisted pair cable

– Coaxial cable

– Optical fiber

– Space

• Media is used to propagate signals

• Signals are electromagnetic waves of certain frequency, traveling at speed of light

53

Signals Over a Link

• Signal is modulated for transmission

– Varying frequency/amplitude/phase to receive distinguishable signals

• Binary data (0s and 1s) is encoded in a signal

– Make it understandable by the receiving host

54

Bits Over a Link

• Bit streams may be transmitted both ways at a time on a point-to-point link

– Full Duplex

• Sometimes two nodes must alternate link usage

– Half Duplex

55

Encoding

• Signals propagate over a physical medium

– Modulate electromagnetic waves

– e.g. vary voltage

• Encode binary data onto signals that propagate

[Figure: two nodes, each with an adaptor containing a signalling component. Bits flow between node and adaptor; the signal flows between the adaptors]

56

Encoding

• Problems with signal transmission

– Attenuation: signal power absorbed by medium

– Dispersion: a discrete signal spreads in space

– Noise: random background “signals”

[Figure: digital data (a string of symbols) → modulator → a string of signals → demodulator → digital data (a string of symbols)]

57

RS-232(-C)

• Communication between computer and modem

• Uses two voltage levels (+15V, -15V), a binary voltage encoding

• Data rate limited to 19.2 kbps (RS-232-C); raised in later standards

58

Binary Voltage Encoding

• NRZ (Non-Return to Zero)

• NRZI (NRZ Inverted)

• Manchester (used by IEEE 802.3, 10 Mbps Ethernet)

• 4B/5B (8B/10B) in Fast Ethernet

59

Non-Return to Zero (NRZ)

• Encode binary data onto signals

– e.g. 0 as low signal and 1 as high signal

– Voltage does not return to zero between bits

• Known as Non-Return to Zero (NRZ)

[Figure: NRZ signal for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0]

60

Problem: Consecutive 1s or 0s

• Low signal (0) may be interpreted as no signal

• High signal (1) leads to baseline wander

• Unable to recover clock

– Sender’s and receiver’s clocks have to be precisely synchronized

– Receiver resynchronizes on each signal transition

– Clock drift in long periods without transition

[Figure: sender’s clock and receiver’s clock drifting apart]

61

Alternative Encodings

• Non-Return to Zero Inverted (NRZI)

• Make a transition from current signal (switch voltage level) to encode/transmit a “one”

• Stay at current signal (maintain voltage level) to encode/transmit a “zero”

• Solves the problem of consecutive 1s (but shifts it to consecutive 0s)

62

Alternative Encodings

• Manchester (in IEEE 802.3 – 10 Mbps Ethernet)

• Split cycle into two parts

– Send high--low for “1”, low--high for “0”

– Transmit XOR of NRZ encoded data and the clock

• Only 50% efficient (1/2 bit per transition)
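The three encodings seen so far can be compared on the same bit string. This is a rough sketch that models signal levels as 0 (low) and 1 (high); the function names and the assumption that NRZI starts from the low level are illustrative choices, not part of the lecture.

```python
def nrz(bits):
    """NRZ: 0 is sent as low, 1 as high."""
    return list(bits)


def nrzi(bits, start_level=0):
    """NRZI: toggle the level to send a 1, hold the level to send a 0."""
    level = start_level          # assumed initial line level
    signal = []
    for bit in bits:
        if bit == 1:
            level ^= 1           # transition encodes a one
        signal.append(level)
    return signal


def manchester(bits):
    """Manchester: XOR each bit with a clock that is low, then high.

    Produces two half-bit levels per data bit: high-low for 1, low-high for 0.
    """
    signal = []
    for bit in bits:
        signal.extend([bit ^ 0, bit ^ 1])
    return signal


sample_bits = [0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
```

A run of 1s gives NRZI a transition on every bit, while a run of 0s still leaves the line flat; Manchester guarantees a transition in every bit at the cost of doubling the signalling rate.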

63

4B/5B Encoding

• Every 4 consecutive bits of data encoded in a 5-bit code (symbol)

– 4-bit pattern is “translated” to a 5-bit pattern (not addition)

• 5-bit codes selected to have no more than one leading 0 and no more than two trailing 0s

– 00xxx (8 symbols) and xx000 (4 symbols) are illegal

– 5 free symbols (non-data)

• Thus, never gets more than three consecutive 0s

• Resulting 5-bit codes are transmitted using NRZI

• Achieves 80% efficiency
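The run-length property can be checked mechanically. The sketch below uses the commonly published 4B/5B data-symbol table (as used by FDDI and Fast Ethernet), which is not reproduced in these slides; the function names are assumptions.

```python
# Commonly published 4B/5B data symbols (assumed here; not listed in the slides).
FOUR_B_FIVE_B = {
    "0000": "11110", "0001": "01001", "0010": "10100", "0011": "10101",
    "0100": "01010", "0101": "01011", "0110": "01110", "0111": "01111",
    "1000": "10010", "1001": "10011", "1010": "10110", "1011": "10111",
    "1100": "11010", "1101": "11011", "1110": "11100", "1111": "11101",
}


def encode_4b5b(bits: str) -> str:
    """Translate each 4-bit group into its 5-bit code."""
    if len(bits) % 4 != 0:
        raise ValueError("input length must be a multiple of 4 bits")
    return "".join(FOUR_B_FIVE_B[bits[i:i + 4]] for i in range(0, len(bits), 4))


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive 0s in a bit string."""
    return max(len(run) for run in bits.split("1"))


# Worst case over every pair of adjacent symbols.
worst_run = max(longest_zero_run(a + b)
                for a in FOUR_B_FIVE_B.values()
                for b in FOUR_B_FIVE_B.values())
```

Because NRZI already handles runs of 1s, bounding the runs of 0s is all the code has to achieve; the price is 5 signalled bits per 4 data bits, hence 80% efficiency.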

64

Binary Voltage Encoding

• Problem: wide frequency range required, implying

– Significant dispersion

– Uneven attenuation

• Prefer to use narrow frequency band (carrier frequency)

• Types of modulation

– Amplitude Modulation (AM)

– Frequency Modulation (FM)

– Phase/Phase Shift

– Combination of these (e.g. QAM)

65

Phase Modulation Algorithm

• Send carrier frequency for one period

• Perform phase shift

• Shift value encodes symbol

– Value in range [0°, 360°]

– Multiple values for multiple symbols

– Represent as circle

[Figure: 8-symbol example, with points on a circle at 0°, 45°, 90°, 135°, 180°, 225°, 270° and 315°]

66

Constellation Pattern for V.32 QAM

For a given symbol:

1. Perform phase shift

2. Change to new amplitude

[Figure: constellation diagram with angles of 45° and 15° marked]

• Points in constellation diagram

– Chosen to maximize error detection

– Process called trellis coding

67

Bit Rate and Baud Rate

• Bit rate is bits per second

• Baud rate is “symbols” per second

• If each symbol contains 4 bits then data rate is 4 times the baud rate

68

What Limits Baud Rate ?

• Baud rates are typically limited by electrical signaling properties

• No matter how small the voltage or how short the wire, changing voltages takes time

• Electronics are slow as compared to optics

69

Summary of Encoding

• Problems: attenuation, dispersion, noise

• Digital transmission allows periodic regeneration

• Variety of binary voltage encodings

– High frequency components limit to short range

– More voltage levels provide higher data rate

• Carrier frequency and modulation

– Amplitude, frequency, phase, and combination (QAM)

• Nyquist (noiseless) and Shannon (noisy) limits on data rates

70

Framing

• Breaks continuous stream/sequence of bits into a frame and demarcates units of transfer

• Typically implemented by network adaptor

– Adaptor fetches/deposits frames out of or into host memory

[Figure: Node A and Node B, each with an adaptor. Bits flow between the adaptors; frames flow between the nodes]

71

Advantages of Framing

• Synchronization recovery

– Consider continuous stream of unframed bytes

– Recall RS-232 start and stop bits

• Multiplexing of link

– Multiple hosts on shared medium

– Simplifies multiplexing of logical channels

• Efficient error detection

– Frame serves as unit of detection (valid or invalid)

– Error detection overhead scales as log N

72

Approaches

• Organized by end of frame detection method

• Approaches to framing

– Sentinel (marker, like C strings)

– Length-based (like Pascal strings)

– Clock-based

73

Approaches

• Other aspects of a particular approach

– Bit-oriented or byte-oriented

– Fixed or variable length

– Data-dependent or data-independent length

74

Framing with Sentinels

• End of frame: special byte or bit pattern

• Choice of end of frame marker

– Valid data byte or bit sequence e.g. 01111110

– Physical signal not used by valid data symbol

[Frame layout, field widths in bits: Beginning sequence (8) | Header (16) | Body | CRC (16) | Ending sequence (8)]

75

Sentinel Based Approach

• Problem: equal size frames are not possible

– Frame length is data-dependent

• Sentinel based framing examples

– High-Level Data Link Control (HDLC) protocol

– Point-to-Point Protocol (PPP)

– ARPANET IMP-IMP protocol

– IEEE 802.4 (token bus)

76

Length-based Framing

• Include payload length in header

• e.g., DDCMP (byte-oriented, variable-length)

• e.g., RS-232 (bit-oriented, implicit fixed length)

• Problem: count field corrupted

• Solution: catch when CRC fails

[Frame layout (DDCMP), field widths in bits: SYN (8) | SYN (8) | Class (8) | Length (14) | Header (42) | Body | CRC (16)]

77

Clock-based Framing

• Continuous stream of fixed-length frames

– Each frame is 125 µs long (all STS formats) (why?)

• Clocks must remain synchronized

• e.g. SONET: Synchronous Optical NETwork

– Dominant standard for long distance transmission

– Multiplexing of low-speed links onto one high-speed link

– Byte-interleaved multiplexing

– Payload bytes are scrambled (data XOR 127-bit pattern)

– STS-n (STS-1 = 51.84 Mbps)

78

SONET Frame Format (STS-1)

Overhead Payload

90 columns

9 rows

79

Clock-based Framing

• Problem: how to recover frame synchronization

– 2-byte synchronization pattern starts each frame (unlikely to occur in data)

– Wait until pattern appears in same place repeatedly

80

Clock-based Framing

• Problem: how to maintain clock synchronization

– NRZ encoding, data scrambled (XOR’d) with 127-bit pattern

– Creates transitions

– Also reduces chance of finding false sync pattern

81

Error Detection

• Validates correctness of each frame

• Errors checked at many levels

• Demodulation of signals into symbols (analog)

• Bit error detection/correction (digital)—our main focus– Within network adapter (CRC check)

– Within IP layer (IP checksum)

– Possibly within application as well

82

Error Detection and Correction

• Possible binary voltage encoding symbol

• Neighborhoods and erasure region

[Figure: voltage axis from -15 to +15, with one neighborhood decoded as 0, one decoded as 1, and an erasure region (?) between them]

• Possible QAM symbol

• Neighborhoods in green

• All other space results in erasure

Input to digital level: valid symbols or erasures

83

Error Detection: How ?

• How to detect error ?

– Add redundant information to a frame to determine errors

• Transmit two complete copies of data

– n redundant bits for n-bit message

– Errors at the same position in both copies go undetected

84

Error Detection: How ?

• We want only k redundant bits for an n-bit message, where k << n

– In Ethernet, 32-bit CRC for 12,000 bits (1500 bytes)

• k bits are derived from the original message

• Both the sender and receiver know the algorithm

85

Hamming Distance (1950 Paper)

• Minimum number of bit flips between code words

– 2 flips for parity

– 3 flips for voting

• n-bit error detection

– No code word changed into another code word

– Requires Hamming distance of n+1

86

Hamming Distance (1950 Paper)

• n-bit error correction

– n-bit neighborhood: all code words within n bit flips

– No overlap between n-bit neighborhoods

– Requires Hamming distance of 2n+1

87

Digital Error Detection Techniques

• Two-dimensional parity

– Detects up to 3-bit errors

– Good for burst errors

• Internet checksum (used as backup to CRC)

– Simple addition

– Simple in software

• Cyclic redundancy check (CRC)

– Powerful mathematics

– Tricky in software, simple in hardware

– Used in network adapter

88

Two-Dimensional Parity

• Adding one extra bit to a 7-bit code to balance 1s

• Extra parity byte for the entire frame

• Catches all 1, 2 and 3 bit errors and most 4 bit errors

• 14 redundant bits for a 42-bit message, in the example

Data      Parity bits

1011110   1

1101001   0

0101001   1

1011111   0

0110100   1

0001110   1

1111011   0   (parity byte)
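The example on this slide can be reproduced in a few lines. This is a minimal sketch assuming even parity (which is what the slide's numbers follow); the function names are illustrative.

```python
def even_parity(bits: str) -> str:
    """Return the bit that makes the total number of 1s even."""
    return str(bits.count("1") % 2)


def two_dimensional_parity(rows):
    """Append a parity bit to every row, then a parity row over all columns."""
    with_row_parity = [row + even_parity(row) for row in rows]
    parity_row = "".join(even_parity("".join(column))
                         for column in zip(*with_row_parity))
    return with_row_parity + [parity_row]


# The six 7-bit data rows from the slide (42 data bits).
slide_data = ["1011110", "1101001", "0101001", "1011111", "0110100", "0001110"]
protected_frame = two_dimensional_parity(slide_data)
```

The result has 6 row-parity bits plus an 8-bit parity row, i.e. the 14 redundant bits quoted for the 42-bit message.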

89

Internet Checksum Algorithm

• Not used at the link level but provides same sort of functionality as CRC and parity

• Idea:

– Add up all words (16-bit integers) that are transmitted

– Transmit the result (checksum) of that sum

– Receiver performs the same calculation on received data and compares the result with the received checksum

– If the results do not match, an error is detected

• 16 redundant bits for a message of any length

• Weak protection, accepted as a last line of defense
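A sketch of the idea in code, with one detail the slide leaves out: the Internet checksum as specified in RFC 1071 adds the 16-bit words using ones' complement arithmetic (carries wrap around) and transmits the complement of the sum. The function names are assumptions.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Ones' complement sum of the data taken as big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum to transmit: the complement of the ones' complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)

# Receiver side: running the same calculation over data plus checksum gives 0.
verification = internet_checksum(sample + checksum.to_bytes(2, "big"))
```

Sending the complement lets the receiver check for a single constant (zero) instead of comparing two values.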

90

Cyclic Redundancy Check

Theory

• Based on finite-field (binary-valued) arithmetic

• Bit string represented as polynomial

• Coefficients are binary-valued

• Divide bit string polynomial by generator polynomial to generate CRC

Practice

• Bitwise XOR’s

91


• Add k bits of redundant data to an n-bit message

– Want k << n

– e.g. k = 32 and n = 12,000 (1500 bytes)

• Represent n-bit message as n-1 degree polynomial

– e.g. MSG=10011010 as M(x) = x^7 + x^4 + x^3 + x^1

– Sender and receiver exchange polynomials

• Let k be the degree of some agreed-upon divisor/generator polynomial

– e.g. C(x) = x^3 + x^2 + 1

92


• Transmit polynomial P(x) that is evenly divisible by C(x)

– Shift left k bits, i.e. M(x)·x^k

– Add remainder of M(x)·x^k / C(x) into M(x)·x^k

• Receiver receives polynomial P(x) + E(x)

– E(x) = 0 implies no errors

• Receiver divides (P(x) + E(x)) by C(x); remainder will be zero ONLY if:

– E(x) was zero (no error), or

– E(x) is exactly divisible by C(x)
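The sender and receiver steps can be traced with the example polynomials from the previous slide, MSG = 10011010 and a generator with bit pattern 1101. This is a bit-string sketch of modulo-2 long division for illustration; real adapters do this with shift registers, and the function names are assumptions.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of modulo-2 (XOR) long division of bits by generator."""
    k = len(generator) - 1                    # degree of the generator
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - k):
        if work[i]:                           # leading 1: subtract (XOR) the generator
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-k:])


def crc_transmit(message: str, generator: str) -> str:
    """Sender: shift the message left k bits and add in the remainder."""
    k = len(generator) - 1
    remainder = mod2_remainder(message + "0" * k, generator)
    return message + remainder


transmitted = crc_transmit("10011010", "1101")            # P(x)
clean_check = mod2_remainder(transmitted, "1101")         # E(x) = 0
corrupted = transmitted[:4] + ("1" if transmitted[4] == "0" else "0") + transmitted[5:]
error_check = mod2_remainder(corrupted, "1101")           # single-bit error
```

Adding the remainder is the same as appending it, because the shifted message ends in k zeros and addition modulo 2 is XOR.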

93

Reliable Transmission

• Error-correcting codes are not advanced enough to handle the range of bit and burst errors

– Corrupt frames generally must be discarded

– A reliable link-level protocol must recover from discarded frames

• Goals for reliable transmission

– Make channel appear reliable

– Maintain packet order (usually)

– Impose low overhead / allow full use of link

94


• Reliability accomplished using acknowledgments and timeouts

– ACK is a small control frame confirming reception of an earlier frame

– If no ACK arrives, sender retransmits after a timeout

95


• Automatic Repeat reQuest (ARQ) algorithms

– Stop-and-wait

– Concurrent logical channels

– Sliding window

• Go-back-n, or selective repeat

• Alternative: Forward Error Correction (FEC)

96

Automatic Repeat reQuest

• Acknowledgement (ACK)

– Receiver tells sender when frame received

– Cumulative ACK (used by TCP): have received specified frame and all previous

– Selective ACK (SACK): specifies set of frames received

– Negative ACK (NACK or NAK): receiver refuses to accept frame now, e. g. when out of buffer space

97

Automatic Repeat reQuest

• Timeout: sender decides that frame was lost and tries again

• ARQ also called Positive Acknowledgement with Retransmission (PAR)

98

Stop-and-Wait

• Send a single frame

• Wait for ACK or timeout

– If ACK received, continue with next frame

– If timeout occurred, send again (and wait)

• Frame lost in transit; or corrupted and discarded

[Figure: timeline between Sender and Receiver: Frame 0, ACK 0, Frame 1, ACK 1]

99

Stop-and-Wait

• Frames delivered reliably and in order

• Is that enough ?

– No, we need performance, too.

• Problem: keeping the pipe full … ?

• Example

– 1.5 Mbps link x 45 ms RTT = 67.5 Kb (~8 KB)

– 1 KB frames implies 182 Kbps (1/8th link utilization)

– Want the sender to transmit 8 frames before waiting for ACK

– Throughput remains 182 Kbps regardless of the link bandwidth !!
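The numbers in this example follow from one division each. The sketch below ignores the frame's own transmit time, as the slide does, and treats 1 KB as 1024 bytes; the function names are assumptions.

```python
def stop_and_wait_throughput_bps(frame_bits, rtt_s):
    """At most one frame is delivered per round trip."""
    return frame_bits / rtt_s


def frames_to_fill_pipe(bandwidth_bps, rtt_s, frame_bits):
    """How many frames fit into one delay x bandwidth product."""
    return bandwidth_bps * rtt_s / frame_bits


FRAME_BITS = 8 * 1024          # 1 KB frame
RTT_S = 0.045                  # 45 ms
LINK_BPS = 1.5e6               # 1.5 Mbps

throughput_bps = stop_and_wait_throughput_bps(FRAME_BITS, RTT_S)
pipe_bits = LINK_BPS * RTT_S
frames_needed = frames_to_fill_pipe(LINK_BPS, RTT_S, FRAME_BITS)
```

The link bandwidth does not appear in `stop_and_wait_throughput_bps` at all, which is the point of the last bullet: a faster link only lowers the utilization.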

100

Concurrent Logical Channels

• Multiplex several logical channels over a single p-to-p physical link (include channel ID in header)

• Use stop-and-wait for each logical channel

• Maintain three bits of state for each logical channel:

– Boolean saying whether channel is currently busy

– Sequence number for frames sent on this channel

– Next sequence number to expect on this channel

• ARPANET IMP-IMP supported 8 logical channels over each ground link (16 over each satellite link)

101

Concurrent Logical Channels

• Header for each frame includes 3-bit channel number and 1-bit sequence number

– Same number of bits (4) as the sliding window requires to support up to 8 outstanding frames on the link

102

Sliding Window

• Allow sender to transmit multiple frames before receiving an ACK, thereby keeping the pipe full

• Upper bound on outstanding un-ACKed frames

• Also used at the transport layer (by TCP)

[Figure: timeline of frames and ACKs exchanged between Sender and Receiver]

103

Sliding Window Concepts

• Consider ordered stream of data

– Broken into frames

– Stop-and-Wait

• Window of one frame

• Slides along stream over time

• Sliding window algorithms generalize this notion– Multiple-frame send window

– Multiple-frame receive window

time

104

Sliding Window - Sender

• Assign sequence number to each frame (SeqNum)

• Maintain three state variables:

– Send Window Size (SWS)

– Last Acknowledgment Received (LAR)

– Last Frame Sent (LFS)

• Maintain invariant: LFS – LAR ≤ SWS

• Advance LAR when ACK arrives

• Buffer up to SWS frames and associate timeouts

[Figure: frames 11 to 20 on a timeline, with LAR=13, LFS=18 and LFS – LAR ≤ SWS]

105

Sliding Window - Receiver

• Maintain three state variables

– Receive Window Size (RWS)

– Largest Frame Acceptable (LFA)

– Next Frame Expected (NFE)

• Maintain invariant: LFA – NFE+1 ≤ RWS

• Frame SeqNum arrives:

– If NFE ≤ SeqNum ≤ LFA accept

– If SeqNum < NFE or SeqNum > LFA discard

• Send cumulative ACKs

[Figure: frames 11 to 20 on a timeline, with NFE=13, LFA=17 and LFA – NFE+1 ≤ RWS]
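The receiver rules translate almost line for line into code. This is a simplified sketch: sequence numbers are unbounded integers (no wrap-around), frames are buffered in a dictionary, and the class and method names are assumptions rather than anything defined in the lecture.

```python
class SlidingWindowReceiver:
    """Receiver side of the sliding window algorithm (no sequence wrap-around)."""

    def __init__(self, rws: int, first_seq: int = 0):
        self.rws = rws              # Receive Window Size
        self.nfe = first_seq        # Next Frame Expected
        self.buffered = {}          # out-of-order frames waiting for a gap to fill
        self.delivered = []         # payloads handed up in order

    @property
    def lfa(self) -> int:
        """Largest Frame Acceptable, so that LFA - NFE + 1 == RWS."""
        return self.nfe + self.rws - 1

    def receive(self, seq_num: int, payload) -> int:
        """Process one frame and return the cumulative ACK.

        The ACK is the highest sequence number such that it and all earlier
        frames have been received (first_seq - 1 if nothing has arrived yet).
        """
        if self.nfe <= seq_num <= self.lfa:       # inside the window: accept
            self.buffered[seq_num] = payload
            while self.nfe in self.buffered:      # slide over in-order frames
                self.delivered.append(self.buffered.pop(self.nfe))
                self.nfe += 1
        # Frames with seq_num < NFE or seq_num > LFA are simply discarded.
        return self.nfe - 1


receiver = SlidingWindowReceiver(rws=4, first_seq=13)
acks = [receiver.receive(15, "c"), receiver.receive(13, "a"),
        receiver.receive(14, "b"), receiver.receive(30, "x")]
```

Frame 15 arrives early and is buffered, so the ACK only jumps to 15 once frames 13 and 14 have filled the gap; frame 30 is outside the window and is dropped.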

106

Sliding Window Issues

• When a timeout occurs, data in transit decreases

– Pipe is no longer full when packet losses occur

– Problem gets worse the longer it takes to detect a packet loss

• Early detection of packet losses improves performance:

– Negative Acknowledgements (NACKs)

– Duplicate Acknowledgements

– Selective Acknowledgements (SACKs)

• Adds complexity but helps keep the pipe full

107

Sliding Window Classification

• Stop-and-wait: SWS=1, RWS=1

• Go-back-N: SWS=N, RWS=1

• Selective repeat: SWS=N, RWS=M (usually M = N)

[Figure: relationship between Stop-and-Wait, Go-back-N and Selective Repeat]

108

Sequence Number Space

• SeqNum field is finite; sequence numbers wrap around

• Sequence number space must be larger than number of outstanding frames (SWS)

• SWS <= MaxSeqNum-1 is not sufficient

– Suppose 3-bit SeqNum field (0..7); SWS=RWS=7

– Sender transmits frames 0..6, which arrive successfully (receiver window advances)

– ACKs are lost; sender retransmits 0..6

– Receiver is expecting 7, 0..5, but receives the second incarnation of 0..5 and takes them for the 8th to 13th frames

109

Required Sequence Number Space ?

• Assume SWS=RWS (simplest, and typical)

– Sender transmits full SWS

– Two extreme cases at receiver

• None received (waiting for 0…SWS – 1)

• All received (waiting for SWS…2 × SWS – 1)

• All possible packets must have unique SeqNum

• SWS < (MaxSeqNum+1)/2 or SWS+RWS < MaxSeqNum+1 is the correct rule

• Intuitively, SeqNum “slides” between two halves of sequence number space
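The rule can be cross-checked against the failure scenario from the previous slide. In this sketch MaxSeqNum is taken to be the number of distinct sequence numbers (8 for a 3-bit field), matching the example above; both function names are assumptions.

```python
def window_sizes_are_safe(sws: int, rws: int, max_seq_num: int) -> bool:
    """The rule from the slide: SWS + RWS < MaxSeqNum + 1."""
    return sws + rws < max_seq_num + 1


def retransmissions_look_new(sws: int, rws: int, max_seq_num: int) -> bool:
    """Replay the worst case: all SWS frames arrive, all ACKs are lost.

    The sender retransmits frames 0..SWS-1 while the receiver's window has
    advanced to SWS..SWS+RWS-1. If the two sets share a sequence number
    (modulo the sequence space), an old frame is mistaken for a new one.
    """
    retransmitted = {s % max_seq_num for s in range(sws)}
    expected = {s % max_seq_num for s in range(sws, sws + rws)}
    return bool(retransmitted & expected)


# The slide's example: 3-bit field, SWS = RWS = 7.
slide_case_is_safe = window_sizes_are_safe(7, 7, 8)
slide_case_confuses = retransmissions_look_new(7, 7, 8)
```

With 8 sequence numbers the largest safe symmetric window is SWS = RWS = 4, i.e. half the sequence space.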

110

Shared Media: Problems

• Problem: demands can conflict, e.g. two hosts send simultaneously

– STDM does not address this problem - centralized

– Solution is a medium access control (MAC) algorithm

111

Shared Media: Solutions

• Three solutions (out of many)

– Carrier Sense Multiple Access with Collision Detection (CSMA / CD)

• Send only if medium is idle

• Stop sending immediately if collision detected

– Token Ring/FDDI pass a token around a ring; only token holder sends

– Radio / wireless (IEEE 802.11)

112

History of Ethernet

• Developed by Xerox PARC in mid-1970s

• Roots in Aloha packet-radio network

• Standardized by Xerox/DEC/Intel in 1978

• Similar to IEEE 802.3 standard

• IEEE 802.3u standard defines Fast Ethernet (100 Mbps)

• New switched Ethernet now popular

113

Ethernet – Alternative Technologies

• Can be constructed from a thinner cable (10Base2) rather than 50-ohm coax cable (10Base5)

• Newer technology uses 10BaseT (twisted pair)

– Several point-to-point segments coming out of a multiway repeater called “hub”

[Figure: hosts connected to hubs]

114

Ethernet – Multiple Segments

• Repeaters forward the broadcast signal on all outgoing segments (10Base5)

• Maximum of 4 repeaters (2500m), 1024 hosts

[Figure: Ethernet segments with hosts, joined by repeaters]

115

Ethernet Packet Frame

• Preamble allows the receiver to synchronize with signal

• Frame must contain at least 46 bytes of data to detect collision

• 802.3 standard substitutes the type field with a length field

– Type field (demux key) is the first thing in data portion

– A device can accept both frame formats: a value > 1500 in this field is a type, otherwise it is a length

[Frame layout, field widths in bits: Preamble (64) | Dest addr (48) | Src addr (48) | Type (16) | Body | CRC (32)]

116

Ethernet MAC – CSMA/CD

• Multiple access

– Nodes send and receive frames over a shared link

• Carrier sense

– Nodes can distinguish between an idle and busy link

• Collision detection

– A node listens as it transmits to detect collision

117

CSMA/CD MAC Algorithm

• If line is idle (no carrier sensed)

– Send immediately

– Upper bound message size of ~1500 bytes

– Must wait 9.6µs between back-to-back frames

118

CSMA/CD MAC Algorithm

• If line is busy (carrier sensed) …
– Wait until the line becomes idle and then transmit immediately
– Called 1-persistent (special case of p-persistent)
• If collision detected
– Stop sending data and send a jam signal
– Try again later

119

Constraints on Collision Detection

• In our example, consider
– my-machine’s message reaches your-machine at T
– your-machine’s message reaches my-machine at 2T

• Thus, my-machine must still be transmitting at 2T

120

Ethernet Min. Frame Size

• RTT on a maximally configured Ethernet of 2500 m, with 4 repeaters, is about 51.2 μs
– 2500 m / (2 × 10^8 m/s) = 12.5 μs
– 2 × 12.5 μs = 25 μs, plus repeater delays

• 51.2 μs on 10 Mbps corresponds to 512 bits (64 bytes)

• Therefore, the minimum frame length for Ethernet is 64 bytes (header + 46 bytes data)
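
The minimum-frame arithmetic can be checked with a few lines of Python. This is only an illustrative sketch: the function name `min_frame_bits` is hypothetical, and the 18 bytes of header plus CRC (6 + 6 + 2 + 4) are read off the frame layout on the earlier slide.

```python
# Worst-case round-trip time and link rate, as quoted on the slide.
ROUND_TRIP_US = 51.2      # microseconds
LINK_RATE_MBPS = 10       # megabits per second, i.e. bits per microsecond


def min_frame_bits(rtt_us, rate_mbps):
    """Bits a sender must emit to still be transmitting when a collision returns."""
    return round(rtt_us * rate_mbps)


bits = min_frame_bits(ROUND_TRIP_US, LINK_RATE_MBPS)
frame_bytes = bits // 8
header_and_crc = 6 + 6 + 2 + 4   # dest + src + type + CRC, in bytes (assumed split)
min_data_bytes = frame_bytes - header_and_crc

print(bits, frame_bytes, min_data_bytes)
```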

121

Retry After the Collision

• How long should a host wait to retry after a collision ?
– Binary exponential backoff
• Maximum backoff doubles with each failure (exponential)
• After N failures, pick an N-bit number
• 2^N discrete possibilities from 0 to maximum
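
A minimal sketch of the backoff choice, assuming one slot equals the 51.2 μs round-trip time from the previous slide. The names `backoff_slots` and `backoff_delay_us` are hypothetical, and the cap that real adaptors place on the exponent is left out for brevity.

```python
import random

SLOT_TIME_US = 51.2  # assumed slot length: worst-case RTT on 10 Mbps Ethernet


def backoff_slots(failures, rng):
    """After `failures` collisions, pick an N-bit number: 0 .. 2**N - 1."""
    return rng.randrange(2 ** failures)


def backoff_delay_us(failures, rng):
    """Waiting time before the next retry, in microseconds."""
    return backoff_slots(failures, rng) * SLOT_TIME_US


rng = random.Random(0)  # seeded only to make the demonstration repeatable
for n in range(1, 5):
    print(n, backoff_delay_us(n, rng))
```

The range of possible waits doubles with every failure, which spreads retries out quickly when many hosts are contending.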

122

Ethernet Frame Reception

• Sender handles all access control
• Receiver simply pulls frames from network
• Ethernet controller/card
– Sees all frames
– Selectively passes frames to host processor

123

Experience With Ethernet

• Number of hosts limited to 200 in practice, standard allows 1024

• Range much shorter than 2.5 km limit in standard

• Round-trip time is typically 5 or 10 μs, not 50μs

124

Token Ring Overview

• Token Ring network “was” a candidate to replace Ethernet; used in some MAN backbones
– 16Mbps IEEE 802.5 (based on earlier 4Mbps IBM ring)

– 100Mbps Fiber Distributed Data Interface (FDDI)

125

IBM Token Ring – IEEE 802.5

• Ring is viewed as a single shared medium
– Each node is allowed to transmit according to some distributed algorithm for medium access

– All nodes see all frames; destination saves a copy of frame as it flows past

• The term “token” indicates the way the access to shared channel is managed

126

Token in a Token Ring

• Token is a special bit pattern that rotates around the ring
– A node must capture token before transmitting
– A node releases token after done transmitting
• Immediate release: token follows last frame (FDDI)
• Delayed release: after last frame returns to sender

127

Token in a Token Ring

• Remove your frame when it comes back around
– Transmit another frame or re-insert the token

• Stations get round-robin service as the token circulates around the ring

128

Physical Properties

• Data rate can be 4 Mbps or 16 Mbps

• Encoding of bits uses differential Manchester

• Ring may have up to 250 (802.5) or 260 (IBM) nodes

• Physical medium is twisted pair (IBM Token Ring)

129

Token Ring MAC

• Network adaptor contains receiver, transmitter and some storage of bits between them

• Token circulates if no station has anything to send
– Ring must have enough capacity to store entire token

– At least 24 stations with 1-bit storage for 24-bit long token (if propagation delay is negligible)

– This situation is avoided by designating a monitor

130

Token Ring MAC

• Any station that has data to send can seize the token

• In 802.5, simply 1 bit in the second byte of the token is modified

• First two bytes of modified token become preamble for the next frame

131

Frame Format

• “Illegal” Manchester codes in the start and end delimiters

• Frame priority and reservation bits in access control byte

• Demux key in frame control byte
• A and C bits for reliable delivery, in status byte
Frame fields (width in bits): Start delimiter (8) | Access control (8) | Frame control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End delimiter (8) | Frame status (8)

132

Timed Token Algorithm

• Token Holding Time (THT)
– Upper limit on how long a station can hold the token
– Before putting each frame on the ring, a node checks that transmitting it would not exceed the THT

– Long THT achieves better utilization with few senders

– Short THT helps when multiple nodes have data to send

133

Reliable Delivery

• The A and C bits in the packet trailer provide reliability

• Both bits are initially set to 0

• Destination sets A bit if it sees the frame and sets C bit if it copies the frame into its adaptor

134

Token Ring Packet Priorities

• A station wanting to send a priority n packet can set the reservation bits to n, if they are not already set to a higher value
– It captures the token when the current sender releases it with priority set to n

• Strict priority scheme: no lower-priority packets get sent when higher priority packets are waiting

135

Token Maintenance

• Token rings have a designated monitor node

• Any station can become the monitor according to a well defined procedure

• Monitor is elected when the ring is first connected, or when the current monitor fails

136

Token Maintenance

• Monitor periodically announces its presence

• Claim token sent by a station seeing no monitor
– If the sender receives back the claim token, it becomes monitor

– If another station is also contending for monitor, some rule defines the monitor

137

Fiber Distributed Data Interface

• Similar to 802.5/IBM token rings but runs on fiber
• Consists of a dual ring: two independent rings that transmit data in opposite directions at 100Mbps
• Tolerates a single link break or node failure (self-healing ring)
[Figure: dual-fiber ring, panels (a) and (b)]

138

FDDI – Physical Properties

• Variable size buffer (9 – 80 bits) between input and output interfaces (10ns bit time)
– Not required to fill buffer before starting transmission

• Maximum 500 stations, maximum 2 km distance between any pair of stations

139

FDDI – Physical Properties

• Total 200 km fiber: dual nature implies 100 km cable connecting all stations

• Physical media can also be coax or twisted pair cable

• Uses 4B/5B encoding

140


• Token Holding Time (THT)
– Upper limit on how long a station can hold the token
– Configured to some suitable value
• Token Rotation Time (TRT)
– How long it takes the token to traverse the ring (time since a host released the token)
– TRT <= ActiveNodes x THT + RingLatency

141


• Target Token Rotation Time (TTRT)
– “Agreed-upon” or negotiated upper bound on TRT

142

MAC Algorithm

• Each node measures TRT between successive token arrivals

• If measured-TRT > TTRT
– Token is late
– Cannot send data

143

FDDI Traffic Classes

• Synchronous traffic
– Latency sensitive
– Gets higher priority
– Can always send data

144

Bounded Priority Traffic

• If a node has a large amount of synchronous data
– It will send regardless of measured TRT

– TTRT will become meaningless !!!

• Therefore, total synchronous data during one token rotation is bounded by TTRT
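
The timed-token rules from the last few slides can be summarized in a short sketch. The function names are hypothetical, and treating a measured TRT exactly equal to TTRT as "not late" is an assumption made here; the slides only say that a TRT greater than TTRT means the token is late.

```python
def trt_upper_bound(active_nodes, tht, ring_latency):
    """Bound from the slide: TRT <= ActiveNodes x THT + RingLatency."""
    return active_nodes * tht + ring_latency


def may_send(measured_trt, ttrt, synchronous):
    """Decide whether a node may transmit when the token arrives."""
    if synchronous:
        # Synchronous traffic can always be sent; its total per rotation
        # is bounded separately by TTRT.
        return True
    # Other traffic may be sent only when the token is not late.
    return measured_trt <= ttrt


print(trt_upper_bound(5, 2.0, 1.0))
print(may_send(12.0, 10.0, synchronous=False))
```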

145

Token Maintenance

• The procedure when a node
– Joins the ring (startup)
– Suspects a failure
• Claim frame is used in order to
– Generate a new token
– Agree on TTRT (so that an application can meet its timing constraints)
• A node can send a claim frame without holding the token

146

Frame Format

• 4B/5B control symbols for start and end of frame
• Control Field
– 1st bit: asynchronous (0) versus synchronous (1) data
– 2nd bit: 16-bit (0) versus 48-bit (1) addresses
– Last 6 bits: demux key (includes reserved patterns for token and claim frame)
• Status Field
– From receiver back to sender; error in frame
– Recognized address; accepted frame (flow control)
Frame fields (width in bits): Start of frame (8) | Control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End of frame (8) | Status (24)

147

Wireless LANs

• IEEE 802.11 standard
– Designed for use in a small area (offices, campuses)
• Bandwidth: 1, 2 or 11 Mbps
– Up to 54 Mbps in newer 802.11a standard
• Targets three physical media
– Two spread spectrum radio (2.4 GHz freq)
– One diffused infrared (10 m range, 850 nm band)

148

802.11 MAC: CSMA/CA

• Similar to Ethernet …
– Defer the transmission until the link becomes idle
– Back off if a collision occurs
• Is it sufficient ?
• Not all nodes are always within reach of (able to hear) each other

149

Hidden and Exposed Nodes

• Hidden nodes
– Sender thinks it's OK to send when it's not (false +ve)
– A-C and B-D are hidden nodes in the figure below
• Exposed nodes
– Sender does not send when it's OK to send (false –ve)
– B and C are exposed nodes in the figure below

A B C D

150

Multiple Access with Collision Avoidance (MACA)

• Sender transmits RequestToSend (RTS) frame
– Contains intended time to hold the medium

• Receiver replies with ClearToSend (CTS) frame

151

MACA for Wireless (MACAW)

• Collision detection
– No active collision detection
– Known only if CTS or ACK is not received
– Binary exponential backoff (BEB) is used in case of collision, as in Ethernet

152

802.11 - Distribution System

• Nodes roam freely but operate within a structure
– Tethered by wired network infrastructure (Ethernet ?)

– Each Access Point (AP) services nodes in some region

– Each mobile node associates itself with an AP

153

Managing Connectivity/Roaming

• How do wireless nodes select an Access Point ?

• Scanning (active search for an AP)
– Node sends Probe frame
– All APs within reach reply with Probe Response frame
– Node selects one AP; sends it Association Request frame
– AP replies with Association Response
– New AP informs old AP via wired backbone

154

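Managing Connectivity

• Active scanning: when a node joins or moves
• Passive scanning: AP periodically sends Beacon frame, advertising its capabilities
[Figure: nodes A to H associated with access points AP-1, AP-2 and AP-3, which are joined by a distribution system; node C is shown in two positions]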

155

Frame Format

• Control field contains three subfields:
– 6-bit Type field (data, RTS, CTS, scanning);
– 1-bit ToDS; and
– 1-bit FromDS
• A single frame contains up to 2312 bytes of data
Frame fields (width in bits): Control (16) | Duration (16) | Addr1 (48) | Addr2 (48) | Addr3 (48) | SeqCtrl (16) | Addr4 (48) | Payload (0 – 18,496) | CRC (32)
Address examples:
ToDS=0, FromDS=0: Addr1 = C, Addr2 = A
ToDS=1, FromDS=1: Addr1 = E, Addr2 = AP-3, Addr3 = AP-1, Addr4 = A

156

Overview

• Also called network interface card (NIC)
• Components (high-level overview)
• Options for use

– Data motion

– Event notification

• Potential performance bottlenecks
• Programming device drivers

157

Typical Workstation Architecture

[Figure: typical workstation: CPU with cache ($) and memory on the memory bus; network adaptor on the I/O bus, connected to the network. How do they communicate? The adaptor is typically where data link functionality is implemented.]

158

Components of a Network Adaptor

• Bus interface communicates with a specific host
– Bus defines protocol for CPU-adaptor communication
• Link interface speaks correct protocol on network
– Implemented by a chip set, in software or on FPGA
• Buffering between different speed bus and link
[Figure: network adaptor with a bus interface facing the host I/O bus and a link interface facing the network]

159

Host Perspective

• Adaptor is ultimately programmed by CPU

• Adaptor exports a Control Status Register (CSR)

• CSR is readable and writable from CPU at some memory address

160

Data Motion Options for Network Adaptor Use

• Transfer frames between adaptor and host memory

• Programmed input/output (PIO)
– Processor itself manages each access (loads/stores)
– Faster than DMA for small amounts of data

161

Data Motion Options for Network Adaptor Use

• Direct memory access (DMA)
– Host gives the adaptor buffer descriptor lists for read/write
– Processor is not involved: free to do other things
– Can be faster than memory copy through CPU
– Start-up cost

162

Data Motion

[Figure: the same workstation architecture, showing the data movement path using PIO and the data movement path using DMA]

163

Network Adaptor: Event Notification

• Hardware interrupts
– Processor free to do other things
– Events delivered “immediately”
– State (register) save/restore expensive
– Context switches more expensive

164

Network Adaptor: Event Notification

• Event polling
– Processor must periodically check

– Events wait until next check

– No extra state changes

165

Device Drivers

• Operating system routines anchoring protocol stack to network hardware

• Initialize device, transmit frames, field interrupts

• Code contains device-specific details
– Difficult to read but simple in logic

166

Performance Bottlenecks

• Link capacity

• Processor computing power

• I/O bus bandwidth
– Overhead involved in each bus transfer

167

Performance Bottlenecks

• Memory bus bandwidth
– Memory hierarchy with cache levels
– Memory accesses result in multiple memory copies in different buffers

168

Packet Switches

• A multi-input multi-output device
• Local star topology
• Performance independent of connectivity
– (e.g. adding new host) if switch is designed with enough aggregate capacity

• Maximum degree < physical network limit

169

Forwarding

• Packets arrive at one of the several inputs and have to be forwarded/switched to one of the available outputs
– Connectionless and connection-oriented approach to determine the correct output

Which way should it go ?

First challenge: forwarding

170

Routing

• Forwarding requires information

Second challenge:

routing

How to maintain forwarding information ?

171

Contention and Congestion

• If arrival rate for a certain output is greater than the output capacity, then contention occurs

• If arrival rate of packets is so high that buffers overflow, then congestion occurs

Who goes first ?

Any one is dropped ?

172

Network Layers and Switches

[Figure: protocol layers at a host (Application, Presentation, Session, Transport, Network, Data Link, Physical, split between user level and OS kernel) and at one or more nodes within the network (Network, Data Link, Physical), which switch between different physical layers]

173

Packet Switching / Forwarding

• Three approaches
– Datagram or connectionless approach

– Virtual circuit or connection-oriented approach

– Source routing

• Important notion: unique global address per host

174

Datagram Switching / Forwarding

• Every packet contains enough information
– Enables switch to decide how to forward it

• Switch translates global address to output port
– Maintains forwarding table for translations

• Each packet forwarded and travels independently

175

Datagram Switching

• Managing tables in large, complex networks with dynamically changing topologies is a real challenge for the routing protocol
[Figure: hosts A to H attached to switches 1, 2 and 3, each switch with ports 0 to 3]
At switch 1:
Dest   Port#/Interface
A      2
B      1
C      3
D      0
E      1
…      …
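
A datagram switch's forwarding step is a plain table lookup. The sketch below uses the entries shown for switch 1; the names `FORWARDING_TABLE` and `forward` are hypothetical, and returning `None` for an unknown destination is an assumption, since the slide does not say what happens in that case.

```python
from typing import Dict, Optional

# Destination address -> output port, as listed for switch 1.
FORWARDING_TABLE: Dict[str, int] = {"A": 2, "B": 1, "C": 3, "D": 0, "E": 1}


def forward(dest: str, table: Dict[str, int]) -> Optional[int]:
    """Return the output port for `dest`, or None if the switch has no entry."""
    return table.get(dest)


for host in ("C", "E", "Z"):
    print(host, forward(host, FORWARDING_TABLE))
```

Each packet is looked up independently, which is why no connection setup is needed.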

176

Datagram Model

• No round trip time delay waiting for connection setup
– Host can send data anywhere, anytime as soon as it is ready
– Source has no way of knowing if the network is capable of delivering a packet or if the destination host is even up
• Packets are treated independently
– Possible to route around link and node failures dynamically

177

Virtual Circuit Switching

• Explicit connection setup (and tear-down) phase from source to destination: connection-oriented model
– Subsequent packets follow established circuit

• Supporting “connections” in network layer may be useful for service notions

178

VC Tables in VC Switching

• Setup message in signaling process (to create VC table) is forwarded like a datagram

• Acknowledgment of connection setup is passed back to upstream neighbors to complete signaling
– Data transfer phase can start after ACK is received

179

Signaling in VC Switching

• Setup message is forwarded from Host A to Host B

• On connection request, each switch creates an entry in VC table with a VCI for the connection

[Figure: setup message for B travelling from Host A through switches 1, 2 and 3 to Host B; each switch on the path records a VC table entry]
I/F in   VCI in   I/F out   VCI out
2        5        1
2        7        3
3        9        0

180

Virtual Circuit Model

• Typically wait full RTT for connection setup before sending first data packet
– Cannot avoid failures dynamically, must re-establish connection (old one is torn down to free storage space)

181

Source Routing

• Packet header contains sequence of address/ports on path from source to destination
– One direction per switch: port, next switch (absolute)

– Switches read, use, and then discard directions

182

Data Transfer in Source Routing

• Analogous to following directions

[Figure: a data packet travelling from Host A to Host B through switches 1, 2 and 3; the list of ports in its header (e.g. 0 1 3, then 3 0 1, then 1 3 0) is rotated at each switch]
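
One way to handle the header, suggested by the figure, is to rotate it at every switch. The sketch below assumes the port for the current switch sits at the front of the header; the function name `next_hop` and that ordering are illustrative choices, not something the slides specify.

```python
from collections import deque


def next_hop(header):
    """Read the port for this switch, then rotate it to the back of the header."""
    port = header[0]
    header.rotate(-1)   # left-rotate: the next switch's port is now in front
    return port


header = deque([1, 0, 3])          # hypothetical route: port 1, then 0, then 3
path = [next_hop(header) for _ in range(3)]
print(path, list(header))
```

After the last switch the header is back in its original order, so the receiver still sees the complete route; the alternative on the slide, where switches discard directions after using them, would leave the header empty instead.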

183

Source Routing Model

• Source host needs to know the correct and complete topology of the network
– Changes must propagate to all hosts

• Packet headers may be large and variable in size: the length is unpredictable

184

Implementation and Performance

• Packet arriving at interface 1 has to go on interface 2
• Point of contention for packets: I/O and memory bus
[Figure: CPU and main memory on an I/O bus shared with interfaces 1, 2 and 3]

185

Building Extended LANs

• Traditional LAN
– Shared medium (e.g. Ethernet)
– Cheap, easy to administer
– Supports broadcast traffic
• Problem
– Want to scale LAN concept
• Larger geographic area (greater than O(1 km))
• More hosts (greater than O(100))
– But retain LAN-like functionality

• Solution: bridges

186

Bridges

• Connect two or more LANs with a bridge
– Transparently extends a LAN over multiple networks
– Accept & forward strategy (in promiscuous mode)
– Level 2 connection (does not add packet header)
[Figure: bridge with hosts A, B, C on port 1 and hosts X, Y, Z on port 2]

187

Learning Bridges

• Learn table entries based on source address
– Timeout entries to allow movement of hosts
• Table is an optimization; it need not be complete
• Always forward broadcast frames
• Uses datagram or connectionless forwarding
[Figure: bridge with hosts A, B, C on port 1 and hosts X, Y, Z on port 2]
Host   Port
A      1
B      1
C      1
X      2
Y      2
Z      2
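
The learning behaviour can be sketched as a small class. Everything named here (`LearningBridge`, `receive`, the `"FF"` broadcast marker and the 300-second timeout) is a hypothetical illustration of the rules on the slide, not a description of any particular bridge implementation.

```python
BROADCAST = "FF"  # stand-in for the broadcast address (assumption)


class LearningBridge:
    def __init__(self, ports, timeout=300.0):
        self.ports = set(ports)
        self.timeout = timeout   # seconds before an entry expires (assumed value)
        self.table = {}          # host address -> (port, time last seen)

    def receive(self, src, dst, in_port, now):
        """Return the list of ports on which the frame should be forwarded."""
        self.table[src] = (in_port, now)       # learn from the source address
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] > self.timeout:
            del self.table[dst]                # stale: the host may have moved
            entry = None
        if dst == BROADCAST or entry is None:
            return sorted(self.ports - {in_port})   # flood on all other ports
        out_port = entry[0]
        return [] if out_port == in_port else [out_port]


bridge = LearningBridge([1, 2])
print(bridge.receive("A", "X", 1, now=0.0))   # X unknown, so flood
print(bridge.receive("X", "A", 2, now=1.0))   # A was learned on port 1
```

Because unknown destinations are flooded, the table only has to be an optimization: forwarding stays correct even when entries are missing.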

188

Learning Bridges

• Problem
– Redundancy (desirable to handle failures, but …)
– Makes extended LAN structure cyclic
– Frames may cycle forever
• Solution: spanning tree
[Figure: extended LAN with loops: LANs A to K interconnected by bridges B1 to B7]

189

Spanning Tree

• Subset of forwarding possibilities
• All LANs reachable, but
• Acyclic
• Bridges run a distributed algorithm to calculate the spanning tree
– Select which bridges actively forward
– Developed by Radia Perlman of DEC
– Now IEEE 802.1 specification
– Reconfigurable algorithm

190

Spanning Tree Algorithm

• All designated bridges forward frames
– On all designated ports
– On preferred port (path leading to root)
[Figure: the extended LAN (LANs A to K, bridges B1 to B7) with the designated bridge, designated port and preferred port marked for each LAN]

191

Distributed Spanning Tree Algorithm

• Bridges exchange configuration messages
– ID for bridge sending the message
– ID for what the sending bridge believes to be root bridge
– Distance (hops) from sending bridge to root bridge
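
A hedged sketch of how a bridge might use these three fields. The ordering assumed here (smaller root ID wins, then shorter distance, then smaller sender ID) is the usual textbook rule rather than something stated on the slide, and the names `Config` and `on_receive` are hypothetical.

```python
from typing import NamedTuple


class Config(NamedTuple):
    root: int       # ID of the bridge believed to be the root
    distance: int   # hops from the sender to that root
    sender: int     # ID of the bridge sending the message


def on_receive(current, msg):
    """Return the configuration this bridge should advertise after hearing msg."""
    # Going through the sender costs one extra hop.
    candidate = Config(msg.root, msg.distance + 1, current.sender)
    # Tuples compare field by field, which matches the assumed ordering.
    return min(current, candidate)


me = Config(root=3, distance=0, sender=3)      # bridge 3 starts as its own root
me = on_receive(me, Config(root=1, distance=0, sender=1))
print(me)
```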

192

Limitations of Bridges

• Do not scale– Spanning tree algorithm does not scale

– Broadcast does not scale

• Do not accommodate heterogeneity
– Only supports networks with same address formats

193

ATM (Asynchronous Transfer Mode)

• Common in WANs, can also be used in LANs
– Competing technology with Ethernet, but areas of application only partially overlap
• Connection-oriented packet-switched network
– Virtual-circuit routing

• Typically implemented on SONET (other physical layers possible)

194

ATM Signaling

• Connection setup called signaling (standard Q.2931)

• Route discovery, resource reservation, QoS, ...
• Send through network
– Request setup circuit
– Send setup frame on setup circuit
• Establish locally
– No intermediate switch involvement
– Requires pre-established virtual path

195

Cell Switching (ATM)

• Fixed length (53 bytes) frames are called cells
– 5-byte header (including 1-byte CRC-8) + 48-byte payload
• Standard defines 3 layers (5 sublayers)
– Layers interface to physical media and to higher layers (e.g. encapsulating variable-length frames)

196

Cell Switching (ATM)

• 2-level connection hierarchy
–Virtual circuits

–Virtual paths

•Bundles of virtual circuits

•Travel along common route

•Reduces forwarding information

197

ATM Cell Format

• User-Network Interface (UNI)
– Host-to-switch format
– GFC: Generic Flow Control (still being defined)
– VCI/VPI: Virtual Circuit/Path Identifier
– Type: management, congestion control, AAL5 (later)
– CLP: Cell Loss Priority
– HEC: Header Error Check (CRC-8)
• Network-Network Interface (NNI)
– Switch-to-switch format
– GFC becomes part of VPI field
Cell fields (width in bits): GFC (4) | VPI (8) | VCI (16) | Type (3) | CLP (1) | HEC (CRC-8) (8) | Payload (384 = 48 bytes)

198

Segmentation and Reassembly

• ATM Adaptation Layer (AAL)
– Application to ATM cell mapping
– AAL header contains information for reassembly
– AAL1, AAL2 for applications needing guaranteed rate
– AAL3/4 designed for variable-length packet data
– AAL5 is an alternative standard for packet data
[Figure: AAL layered above ATM at each end of the connection]

199

ATM Layers

• ATM Adaptation Layer (AAL)
– Convergence Sublayer (CS) supports different application service models
– Segmentation and Reassembly (SAR) supports variable-length frames
• ATM Layer
– Handles virtual circuits, cell header generation, flow control
• Physical layer
– Transmission Convergence (TC) handles error detection, framing
– Physical medium dependent (PMD) sublayer handles encoding
[Figure: layer stack: AAL (CS, SAR) over ATM over PHY (TC, PMD)]

200

AAL 3/4

• Provides information to allow variable size packets to be sent in fixed-size ATM cells
• Convergence Sublayer Protocol Data Unit (CS-PDU)
– CPI: Common Part Indicator (version field)
– Btag/Etag: beginning and ending tags (same)
– BAsize: hint on reassembly buffer space to allocate
– Length: size of whole PDU
• Segmented into cells: header/trailer + 44-byte data
CS-PDU fields (width in bits unless noted): CPI (8) | Btag (8) | BAsize (16) | Payload (< 64 KB) | Pad (0 – 24) | 0 (8) | Etag (8) | Length (16)

201

ATM Cell Format for AAL 3/4

• Type (is-start? and is-end? bits)
– BOM (10): Beginning Of Message
– COM (00): Continuation Of Message
– EOM (01): End Of Message
– SSM (11): Single-Segment Message
• SEQ: Sequence Number (for cell loss/reordering)
• MID: multiplexing ID (mux onto virtual circuits)
• Length: number of bytes of PDU in this cell
Cell fields (width in bits): ATM header (40) | Type (2) | SEQ (4) | MID (10) | Payload (352 = 44 bytes) | Length (6) | CRC-10 (10)

202

Encapsulation and Segmentation for AAL3/4

[Figure: a CS-PDU (4-byte header, < 64 KB of user data, 4 to 7 byte trailer including padding) is cut into 44-byte pieces, the last possibly shorter than 44 bytes and padded; each piece gets an AAL header, an AAL trailer and an ATM header to form a cell]

203

AAL 5 CS-PDU

• CS-PDU Format

– Pad so trailer always falls at the end of ATM cell
– Length: size of PDU (data only)
– CRC-32 (detects missing or misordered cells)
• Cell Format
– End-of-PDU bit in Type field of ATM header
CS-PDU fields: Data (< 64 KB) | Pad (0 – 47 bytes) | Reserved (2 bytes) | Length (2 bytes) | CRC-32 (32 bits)
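
The padding rule can be worked through numerically. This sketch assumes the 8-byte trailer implied by the field widths above (2 reserved + 2 length + 4 CRC bytes); the function name `aal5_layout` is hypothetical.

```python
CELL_PAYLOAD = 48   # bytes of payload per ATM cell
TRAILER = 8         # reserved (2) + length (2) + CRC-32 (4), in bytes


def aal5_layout(data_len):
    """Return (pad_bytes, cell_count) so the trailer ends exactly on a cell boundary."""
    pad = -(data_len + TRAILER) % CELL_PAYLOAD
    cells = (data_len + pad + TRAILER) // CELL_PAYLOAD
    return pad, cells


for n in (40, 41, 100):
    print(n, aal5_layout(n))
```

A 40-byte packet fits one cell with no padding, while a 41-byte packet needs the maximum 47 bytes of padding and a second cell.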

204

Encapsulation and Segmentation for AAL 5

[Figure: user data followed by padding and the CS-PDU trailer, cut into 48-byte cell payloads, each prefixed with an ATM header]

205

Virtual Paths with ATM

• Two level hierarchy of virtual connection: 8-bit VPI and 16-bit VCI
– Switches in the public network use 8-bit VPI
– Corporate sites use full 24-bit address (VPI + VCI)
– Much less connection-state info in switches
– Virtual path: fat pipe with bundle of virtual circuits
[Figure: Network A and Network B connected across a public network]

206

ATM as a LAN Backbone

• Different from traditional LANs, no native support for broadcast or multicast

[Figure: hosts H1 to H7 and Ethernet switches E1 to E3 around an ATM backbone; legend: ATM links, Ethernet links, Ethernet switch, ATM switch, ATM-attached host]

207

Shared Ethernet Emulation with LANE

• All hosts think they are on the same Ethernet

[Figure: hosts (H) attached to Ethernet switches and, through LANE / Ethernet adaptor cards, to ATM switches]

208

ATM / LANE Protocol Layers

[Figure: LANE protocol stacks. Each host: higher-layer protocols (IP, ARP, ...) over an Ethernet-like interface, then Signalling + LANE, AAL5, ATM, PHY. Switch between the hosts: ATM over a PHY on each side.]

209

Clients and Servers in LANE

• LAN Emulation Client (LEC)
– Host, bridge, router or switch
• LAN Emulation Server (LES)
– Maintains clients’ MAC and ATM addresses
– Maintains ATM address of BUS

210


• LAN Emulation Configuration Server (LECS)
– High-level network management when LEC starts up

– Reachable by preset VC (recall known server port#)

– Maintains mapping of ATM address to LANE type

211


• Broadcast and Unknown Server (BUS)
– Emulates broadcast and multicast; critical to LANE
– Uses point-to-multipoint VC with all clients
• Servers physically located in one or more devices
[Figure: hosts H1 and H2 and the LECS, LES and BUS on an ATM network, linked by point-to-point and point-to-multipoint VCs]

212

LANE Registration

1. Client contacts LECS on predefined VC, and sends ATM address to it

2. LECS returns LAN type, MTU and ATM address of LES

3. Client signals connection to LES, and registers MAC and ATM addresses with LES

4. LES returns ATM address of BUS
5. Client signals connection to BUS
6. BUS adds client to point-to-multipoint VC
[Figure: hosts H1, H2, H3 and the LECS, LES and BUS servers attached to the ATM network]

213

LANE Circuit Setup

1. Client (H1) knows destination MAC address of receiver (H2)

2. Client (H1) sends 1st packet to BUS

3. BUS sends address resolution request to LES

4. LES returns ATM address to client (H1)

5. Client (H1) signals connection to H2 for subsequent packets

[Figure: hosts H1, H2, H3 and the LECS, LES and BUS servers attached to the ATM network]

214

Contention in Switches

• Some packets destined for same output
– One goes first
– Others delayed or dropped
• Delaying packets requires buffering
– Finite capacity, some packets must still drop
– At inputs
• Increases/adds false contention
• Sometimes necessary
– At outputs
– Can also exert “backpressure”

215

Output Buffering

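[Figure: airline check-in analogy for output buffering at a 1x6 switch. You are trying to check in at the standard check-in lines; Mr. X (writing a complaint letter) and Mr. A (waiting to claim a refund of Rs. 100) are queued for customer service.]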

216

Input Buffering: Head-of-line Blocking

[Figure: the same check-in analogy for input buffering at a 1x6 switch. You, trying to check in, wait behind Mr. X (writing a complaint letter) and Mr. A (waiting to claim a refund of Rs. 100) for customer service, although the check-in agents are standing by.]

217

Backpressure

• Propagation delay requires that switch 2 exert backpressure at high-water mark rather than when buffer completely full

• It is thus typically only used in networks with small propagation delays (e.g. switch fabrics)

Switch 1 Switch 2

“no more, please”

218

Switching Fabric

• Special-purpose (switching) hardware
• General problem
– Connect N inputs to M outputs (NxM switch)
– Often N=M (bidirectional links)
• Design goals
– High throughput: want aggregate close to MIN (sum of inputs, sum of outputs)
– Avoid contention (fabric faster than ports)
– Good scalability: linear size/cost growth in N/M

219

Switch: Fabric and Ports

Fabric has a job to deliver packets to the right output
[Figure: four input ports and four output ports connected through a switch fabric (with small internal buffering)]

220

Ports and Fabric

• Ports deal with the complexity of the real world
– Virtual circuit management is handled in ports
– Determine output port using forwarding tables

• Input port is the first of the performance bottlenecks
– Header processing and handing packet to fabric

221

Design Goals - Throughput

• An n x m switch can provide max ideal throughput of:

S = S1 + S2 + ... + Sn

– Only possible if traffic at inputs is evenly distributed across all outputs

– Sustained throughput higher than link speed of output is not possible

222

Design Goals - Scalability

• Cost of hardware rises fast with increasing the number of ports n
– Adding ports increases hardware & design complexity
– Scalability in terms of rate of increase in cost
• Design complexity determines maximum switch size
– Switch designs run into problems at some maximum number of inputs and outputs

223

Switch Performance

• Avoid contention with buffering
– Use output buffering when possible
– Apply backpressure through fabric
– Input buffering with “peeking” (non-FIFO semantics) to reduce head-of-line blocking problems
– Drop packets if input buffer overflows
• Good scalability
– O(N) ports
– Port design complexity O(N) gives O(N^2) for switch
– Port design complexity O(1) gives O(N) for switch

224

Crossbar (“Perfect”) Switch

• Problem: hardware scales as O(N^2)

225

Knockout Switch: Pick L from N

• Problem: what if more than L arrive?

[Figure: 8-to-4 concentrator built from 2 × 2 random selectors and delay units (D); outputs numbered 1 to 4]

226

Shared Memory Switch

[Figure: inputs feed a mux into buffer memory under write control; a demux under read control feeds the outputs]

227

Self-Routing Fabrics

• Use source routing on “network” within switch

• Input port attaches output port number as header

• Fabric routes packet based on output port

• Types
– Banyan Network

– Batcher-Banyan Network

– Sunshine Switch

228

Banyan Network

• Sends 0 bit up, 1 bit down

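[Figure: Banyan network routing packets addressed to outputs 001, 011, 110 and 111; successive stages examine the address bits from MSB to LSB]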
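
The "0 up, 1 down" rule can be traced for any output port number. This is a sketch under the assumption that stage i looks at bit i of the output port, starting from the most significant bit; the function name `banyan_path` is hypothetical.

```python
def banyan_path(output_port, stages=3):
    """List the direction taken at each stage: bit 0 goes up, bit 1 goes down."""
    bits = format(output_port, "0{}b".format(stages))   # MSB first
    return ["up" if bit == "0" else "down" for bit in bits]


for port in (0b001, 0b011, 0b110, 0b111):
    print(format(port, "03b"), banyan_path(port))
```

No forwarding table is needed inside the fabric: the output port number attached by the input port is the route.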

229

Batcher (Merge Sort) Network

Routing packets through a Batcher network

• Batcher-Banyan Network
– Attach the two back-to-back
– Arbitrary unique permutations routed without contention
[Figure: Batcher network ordering the values 7, 3, 6 and 1 through Sort, Merge and Merge stages]

230

Batcher-Banyan Network

[Figure: Batcher-Banyan network; legend: one element type sends 1 bit up and 0 bit down, the other sends 0 bit up and 1 bit down]

231

Sunshine Switch

• Like a Knockout switch

• Re-circulates overflow packets i.e. when more than L arrive in one cycle

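[Figure: n inputs plus k recirculated packets enter a Batcher network (n + k lines), followed by a trap (marks overflow packets) and a selector; l banyans deliver to the n outputs, and overflow packets return to the input through a delay]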

232

What we understand …

• Concepts of networking and network programming
– Elements of networks: nodes and links
– Building a packet abstraction on a link
• Transmission, and units of communication data
– How to detect transmission errors in a frame after encoding and framing it
– How to simulate a reliable channel (sliding window)
– How to arbitrate access to shared media in any network
• Design issues of direct link networks
– Functionality of network adaptors

233

We also understand …

• How switches may provide indirect connectivity
– Different ways to move through a network (forwarding)
– Bridge approach to extending LAN concept
– Example of a real virtual circuit network (ATM)
– How switches are built and contention within switches
• Next: letting different networks “work together”

234

Internetworking

• Reading: Peterson and Davie, Ch. 4
• Basics of Internetworking – Heterogeneity
– The IP protocol, address resolution, control messages
• Dealing with simple heterogeneity issues
– Defining a service model
– Defining a global namespace
– Structuring the namespace to simplify forwarding
– Hiding variations in frame size limits

235

Internetworking

• Routing – moving forward with IP
– Building forwarding information
• Dealing with global-scale internets
– Virtual geography and addresses
– Hierarchical routing
– Name translation and lookup: translating between global and local (physical) names
– Multicast traffic

• Future internetworking: IPv6

236

Internet Protocol (IP)

• Network protocol for the Internet
• Operates on all hosts and routers (routers connect distinct networks into the Internet)
[Figure: protocol graph: FTP, HTTP, NV, TFTP, … over TCP and UDP, over IP, over FDDI, Ethernet, ATM, …]

237

IP Service Model

• Provided to transport layer (TCP, UDP)
– Global name space
– Host-to-host connectivity (connectionless)
– “Best effort” packet delivery (datagram-based)
• No delivery guarantees on bandwidth, delay, etc.
– Packet delayed for very long time
– Packet lost
– Packet delivered more than once
– Packets delivered out of order

• Simplest model: ability of IP to “run over anything”

238

Internetwork

• Concatenation of networks

• Protocol stack

[Figure: internetwork of Network 1 (Ethernet), Network 2 (point-to-point), Network 3 (FDDI) and Network 4 (Ethernet), joined by routers R1, R2 and R3 and serving hosts H1 to H8]
Protocol stack on the path from H1 to H8:
H1: TCP | IP | ETH
R1: IP over ETH and PPP
R2: IP over PPP and FDDI
R3: IP over FDDI and ETH
H8: TCP | IP | ETH

239

IP Addresses

– 18.10.5.22 host in class A network (MIT)
– 130.126.143.254 host in class B network (UIUC)
– 192.12.70.111 host in class C network
• More recent classes
– Multicast (class D): starts with 1110
– Future expansions (class E): starts with 1111
Class A: prefix 0 | Network: 7 bits (126 nets) | Host: 24 bits (16 million hosts)
Class B: prefix 1 0 | Network: 14 bits (16k nets) | Host: 16 bits (64K hosts)
Class C: prefix 1 1 0 | Network: 21 bits (2 million nets) | Host: 8 bits (256)
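
The class of an address follows from the leading bits of its first byte. A minimal sketch, with the hypothetical function name `address_class`, checked against the three example addresses on the slide:

```python
def address_class(addr):
    """Classify a dotted-decimal IPv4 address by the leading bits of its first byte."""
    first = int(addr.split(".")[0])
    if first >> 7 == 0b0:
        return "A"
    if first >> 6 == 0b10:
        return "B"
    if first >> 5 == 0b110:
        return "C"
    if first >> 4 == 0b1110:
        return "D"   # multicast
    return "E"       # starts with 1111: future expansion


for example in ("18.10.5.22", "130.126.143.254", "192.12.70.111"):
    print(example, address_class(example))
```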

240

Datagram Format

• 4-bit version (4 for IPv4, 6 for IPv6)

• 4-bit header length (in 32-bit words, minimum of 5)

• 8-bit type of service (TOS) more or less unused

• 16-bit datagram length (in bytes)

• 8-bit protocol (e.g. TCP=6 or UDP=17)

[Figure: IPv4 header layout, 32 bits per row; bit positions 0, 4, 8, 16, 19, 31]
Version | HLen | TOS | Length
Ident | Flags | Offset
TTL | Protocol | Checksum
SourceAddr
DestinationAddr
Options (variable) | Pad (variable)
Data
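To make the field layout concrete, here is a minimal sketch that unpacks the fixed 20-byte part of an IPv4 header with Python's standard `struct` module. The function name `parse_ipv4_header` and the dictionary keys are assumptions for illustration; options and checksum verification are deliberately left out.

```python
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header (options are not parsed)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header needs at least 20 bytes")
    # "!" selects network (big-endian) byte order.
    (ver_hlen, tos, length, ident, flags_off,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_hlen >> 4,          # upper 4 bits
        "hlen_words": ver_hlen & 0x0F,     # header length in 32-bit words
        "tos": tos,
        "length": length,                  # whole datagram, in bytes
        "ident": ident,
        "flags": flags_off >> 13,          # upper 3 bits of the 16-bit field
        "offset": flags_off & 0x1FFF,      # lower 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": protocol,              # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }
```

A header whose first byte is 0x45 decodes to version 4 with a header length of 5 words, the minimum noted above.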

241

Internet Protocol (IP)

• Service model: global addressing, host-to-host connectivity, best effort
• Overview of message transmission
• Host addressing and address translation
• Datagram forwarding
• Fragmentation and reassembly
• Error reporting/control messages
• Dynamic configuration
• Protocol extensions through tunneling
• Note: congestion control not handled by IP

242

Fragmentation and Reassembly Example

[Figure: datagram path H1 → R1 → R2 → R3 → H8]
– H1 to R1 (ETH): IP (1400)
– R1 to R2 (FDDI): IP (1400)
– R2 to R3 (PPP): IP (512), IP (512), IP (376)
– R3 to H8 (ETH): IP (512), IP (512), IP (376)

[Figure: header of the unfragmented packet (Ident = x, Offset = 0) and of its three fragments; the flag shown after "Start of header" marks whether more fragments follow]

Packet          Flag   Data bytes
Unfragmented    0      1400
Fragment 1      1      512
Fragment 2      1      512
Fragment 3      0      376
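The split of 1400 data bytes into 512 + 512 + 376 can be reproduced with a short sketch. It assumes the per-fragment limit is given as a number of data bytes (header excluded) and uses the standard IPv4 rule that offsets are counted in 8-byte units; the function name `fragment` is hypothetical.

```python
def fragment(data_len: int, max_data: int) -> list:
    """Split data_len bytes into fragments of at most max_data data bytes.

    Returns a list of (offset_in_8_byte_units, more_fragments_flag, size).
    Every fragment except the last carries a multiple of 8 data bytes,
    because the IPv4 Offset field counts 8-byte units.
    """
    per_fragment = (max_data // 8) * 8
    if per_fragment <= 0:
        raise ValueError("max_data must allow at least 8 data bytes")
    fragments = []
    sent = 0
    while sent < data_len:
        size = min(per_fragment, data_len - sent)
        more = 1 if sent + size < data_len else 0
        fragments.append((sent // 8, more, size))
        sent += size
    return fragments
```

Under these assumptions, `fragment(1400, 512)` yields three fragments whose sizes and flags match the table above, at offsets 0, 64 and 128.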

243

Datagram Forwarding

Network #     Netmask        Next hop / port
18.0.0.0      255.0.0.0      1
128.32.0.0    255.255.0.0    2
0.0.0.0       0.0.0.0        3

dest: 18.26.10.0
– mask with 255.0.0.0: matched! send to port 1

dest: 128.16.14.0
– mask with 255.0.0.0: not matched
– mask with 255.255.0.0: not matched
– mask with 0.0.0.0: matched! send to port 3
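The lookup procedure in this example can be sketched in a few lines. The code below walks the table in order and returns the first entry whose masked destination equals the network number, exactly as the two worked examples do; the names `FORWARDING_TABLE` and `lookup` are assumptions for illustration. Note that production routers typically prefer the longest matching prefix rather than the first match; the two agree here because the table is ordered from most to least specific.

```python
import ipaddress

# (network number, netmask, next hop / port), as in the slide's table
FORWARDING_TABLE = [
    ("18.0.0.0", "255.0.0.0", 1),
    ("128.32.0.0", "255.255.0.0", 2),
    ("0.0.0.0", "0.0.0.0", 3),       # default route: matches everything
]


def to_int(dotted: str) -> int:
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def lookup(dest: str, table=FORWARDING_TABLE):
    """Return the port of the first entry matching dest, or None."""
    dest_bits = to_int(dest)
    for network, mask, port in table:
        if dest_bits & to_int(mask) == to_int(network):
            return port
    return None
```

With this table, `lookup("18.26.10.0")` selects port 1 and `lookup("128.16.14.0")` falls through to the default entry on port 3.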

244

ARP Packet Format

[Figure: ARP packet layout, 32 bits per row; bit positions 0, 8, 16, 31]
Hardware type = 1 | Protocol Type = 0x0800
HLen = 48 | PLen = 32 | Operation
SourceHardwareAddr (bytes 0 – 3)
SourceHardwareAddr (bytes 4 – 5) | SourceProtocolAddr (bytes 0 – 1)
SourceProtocolAddr (bytes 2 – 3) | TargetHardwareAddr (bytes 0 – 1)
TargetHardwareAddr (bytes 2 – 5)
TargetProtocolAddr (bytes 0 – 3)

245

Internet Control Message Protocol (ICMP)

• IP companion protocol (not necessary)
• Handles error and control messages

[Figure: Internet protocol graph (FTP, HTTP, NV, TFTP, … over TCP and UDP, over IP, over FDDI, Ethernet, ATM) with ICMP shown alongside IP]

246

ICMP Message
• Sent to the source when a node is unable to process an IP datagram successfully
• Error messages
– Destination unreachable (protocol, port, or host)
– Reassembly failed
– IP checksum failed, or invalid header
– TTL exceeded (so datagrams don’t cycle forever)
– Cannot fragment
• Control messages
– Echo (ping) request and reply
– Redirect (from router to source host, to change route)

247

Dynamic Host Configuration Protocol - DHCP

• A DHCP server is required to provide configuration information to each host
– Each host retrieves this information on bootup

• The DHCP server can be configured manually, or it may allocate addresses on demand
– Addresses are “leased” for some period of time

• Hosts are not configured with the DHCP server's address; each host performs DHCP server discovery
– A broadcast discovery message is sent by the host and a unicast reply is sent by the server

248

Virtual Private Networks - VPN

• Controlled connectivity
– Restrict forwarding to authorized hosts

• Controlled capacity
– Change router drop and priority policies
– Provide guarantees on bandwidth, delay, etc.

• Virtual net replaces leased line with shared net

• Unwanted connectivity is prevented on this logical link using an IP tunnel

249

IP Tunnel in VPNs

• Virtual point-to-point link between a pair of nodes separated by many networks

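[Figure: Network 1 and Network 2 connected through routers R1 and R2 across an internetwork; R2's address is 10.0.0.1. On Network 1 the packet is (IP header, Destination = 2.x | IP payload). Inside the tunnel it carries an additional outer IP header with Destination = 10.0.0.1. On Network 2 the outer header has been removed again.]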

250

IP Tunneling for Multicast
• Set up a tunnel between each pair of universities
• Multicast packets
– Received by tunnel entry node
– Encapsulated (another IP header added for tunnel exit)
– Travel through the Internet (the tunnel)
– Received by tunnel exit node
– Unwrapped and delivered to another multicast-capable university campus
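The encapsulate-and-unwrap steps listed above can be illustrated with a toy model. This is a sketch only: the `Packet` class and both function names are hypothetical, and a real tunnel operates on byte-level IP headers rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: str        # destination address in this packet's IP header
    payload: object  # application data, or an inner Packet when tunneled


def encapsulate(inner: Packet, tunnel_exit: str) -> Packet:
    """Tunnel entry: wrap the packet in an outer header addressed to the exit."""
    return Packet(dest=tunnel_exit, payload=inner)


def decapsulate(outer: Packet) -> Packet:
    """Tunnel exit: strip the outer header and recover the original packet."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("packet does not carry an encapsulated packet")
    return outer.payload
```

Routers between entry and exit see only the outer destination, so they need no multicast support; the inner packet re-emerges unchanged at the exit node.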

251

What is Routing ?
• Definition: task of constructing and maintaining forwarding information (in hosts or in switches)

• Goals for routing– Capture notion of “best” routes

– Propagate changes effectively

– Require limited information exchange

– Admit efficient implementation

• Important notion: graph representation of network

252

Routing Overview
• Hierarchical routing infrastructure defines routing domains
– Where all routers are under same administrative control
• Network as a Graph
– Nodes are routers
– Edges are links
– Each link has a cost
• Problem: Find lowest cost path between two nodes
– Maintain information about each link
– Static: topology changes are not incorporated
– Dynamic (or distributed): complex algorithms

[Figure: example network graph with nodes A to F and link costs 4, 3, 6, 2, 1, 9, 1, 1]

253

Routing Outline
• Algorithms
– Static shortest path algorithms
  • Bellman-Ford: all pairs shortest paths to destination
  • Dijkstra’s algorithm: single source shortest path
– Distributed, dynamic routing algorithms
  • Distance Vector routing (based on Bellman-Ford)
  • Link State routing (Dijkstra’s algorithm at each node)
• Metrics (from ARPANET, with informative names)
– Original
– New
– Revised

254

Bellman-Ford Algorithm
• Static, centralized algorithm (local iterations/destination)
• Requires: directed graph with edge weights (cost)
• Calculates: shortest paths for all directed pairs
• Check use of each node as successor in all paths
• For every node N
– for each directed pair (B,C)
  • is the path B → N → … → C better than B → C ?
  • is cost B → N → destination smaller than previously known?
• For N nodes
– Uses an NxN matrix of (distance, successor) values
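The procedure on this slide (try every node N as an intermediate for every directed pair, keeping an NxN matrix of distance and successor) can be sketched as a triple loop. In algorithms textbooks this all-pairs form is usually named Floyd-Warshall; the slide's own name is kept in the heading above. The function name and the edge-dictionary input format are assumptions for illustration.

```python
INF = float("inf")


def all_pairs_shortest_paths(nodes, edges):
    """Compute (distance, successor) matrices for all directed pairs.

    nodes: list of node names.
    edges: dict mapping (from_node, to_node) to a non-negative cost.
    """
    dist = {b: {c: (0 if b == c else INF) for c in nodes} for b in nodes}
    succ = {b: {c: None for c in nodes} for b in nodes}
    for (b, c), cost in edges.items():
        dist[b][c] = cost
        succ[b][c] = c
    for n in nodes:                      # candidate intermediate node N
        for b in nodes:                  # each directed pair (B, C)
            for c in nodes:
                via_n = dist[b][n] + dist[n][c]
                if via_n < dist[b][c]:   # is B -> N -> ... -> C better?
                    dist[b][c] = via_n
                    succ[b][c] = succ[b][n]
    return dist, succ
```

For a hypothetical directed graph with edges A→B (1), B→C (2), A→C (5) and C→A (1), the sketch finds that A reaches C at cost 3 with successor B, rather than using the direct edge of cost 5.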

255

Dijkstra’s Algorithm
• Static, centralized algorithm, build tree from source
• Requires directed graph with edge weights (distance)
• Calculates: shortest paths from 1 node to all other
• Greedily grow set S of known minimum paths
• From node N
– Start with S = {N} and one-hop paths from N
– Loop n-1 times
  • add closest outside node M to S
  • for each node P not in S
    – is the path N → … → M → P better than N → P ?

256

Distance Vector Routing
• Distributed, dynamic version of Bellman-Ford
• Each node maintains distance vector: set of triples
– (Destination, Cost, NextHop)
– Edge weights starting at a node assumed known by that node
• Exchange updates of distance vector (Destination, Cost) with directly connected neighbors (known as advertising the routes)
– Periodically (on the order of several seconds to minutes)
– Whenever vector changes (called triggered update)

257

Distance Vector Routing Example

Information in routing table of each node (Iteration 3):

[Figure: example network with nodes A to G]

           Distance to reach node
At node    A  B  C  D  E  F  G
A          0  1  1  2  1  1  2
B          1  0  1  2  2  2  3
C          1  1  0  1  2  2  2
D          2  2  1  0  3  2  1
E          1  2  2  3  0  2  3
F          1  2  2  2  2  0  1
G          2  3  2  1  3  1  0
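The table can be reproduced by simulating the exchange of distance vectors. The sketch below assumes unit link costs and takes the links to be the node pairs at distance 1 in the table (A-B, A-C, A-E, A-F, B-C, C-D, D-G, F-G), since the figure itself is not recoverable here; the function name and the synchronous-rounds model are also assumptions.

```python
INF = float("inf")

# Links inferred from the distance-1 entries of the slide's table.
EDGES = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"),
         ("B", "C"), ("C", "D"), ("D", "G"), ("F", "G")]


def distance_vector(edges):
    """Run synchronous distance-vector rounds until no table changes.

    Returns table[node][dest] = (cost, next_hop), with unit link costs.
    """
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Initially each node knows only itself and its direct neighbors.
    table = {n: {n: (0, None)} for n in nodes}
    for n in nodes:
        for m in neighbors[n]:
            table[n][m] = (1, m)
    changed = True
    while changed:
        changed = False
        # Each node advertises (Destination, Cost) pairs to its neighbors.
        adverts = {n: {d: cost for d, (cost, _) in table[n].items()}
                   for n in nodes}
        for n in nodes:
            for m in neighbors[n]:
                for dest, cost in adverts[m].items():
                    new_cost = cost + 1
                    if new_cost < table[n].get(dest, (INF, None))[0]:
                        table[n][dest] = (new_cost, m)
                        changed = True
    return table
```

Under these assumptions the converged costs agree with every row of the table; for instance B reaches G in 3 hops.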

258

Distance Vector Routing: Link Failure
• F detects that link to G has failed
• F sets distance to G to infinity and sends update to A
• A sets distance to G to infinity since it uses F to reach G
• A receives periodic update from C with 2-hop path to G
• A sets distance to G to 3 and sends update to F
• F decides it can reach G in 4 hops via A

[Figure: example network with nodes A to G]

259

Count to Infinity Problem

• Link from A to E fails
• A advertises distance of infinity to E, but B and C advertise a distance of 2 to E !
• B decides it can reach E in 3 hops; advertises this to all
• A decides it can reach E in 4 hops; advertises this to all
• C decides that it can reach E in 5 hops…
• We are counting to infinity …

[Figure: example network with nodes A to G]

260

Split Horizon

• Avoid counting to infinity by solving “mutual deception” problem

• When sending an update to node X, do not include destinations that you would route through X
– If X thinks route is not through you, no effect
– If X thinks route is through you, X will timeout route

[Figure: nodes A, B, C, D with routing entries for destination C written as Destination : Cost : NextHop, including C : 1 : C, C : 2 : B, and C : ∞ : -]

Loop of > 2 nodes fails split horizon !!!

261

Split Horizon with Poison Reverse

• When sending update to node X, include destinations that you would route through X with distance set to infinity

• Don’t need to wait for X to timeout
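Both variants differ only in how the update for a given neighbor is built. The following minimal sketch assumes a routing table stored as a dictionary from destination to (cost, next hop); the function name `build_update` and its `poison_reverse` parameter are hypothetical.

```python
INF = float("inf")


def build_update(table, neighbor, poison_reverse=False):
    """Build the (Destination -> Cost) update to send to one neighbor.

    table: dict mapping destination to (cost, next_hop).
    Plain split horizon omits routes whose next hop is the neighbor;
    with poison_reverse they are advertised with infinite cost instead.
    """
    update = {}
    for dest, (cost, next_hop) in table.items():
        if next_hop == neighbor:
            if poison_reverse:
                update[dest] = INF   # tell the neighbor: not via me
            continue                 # split horizon: leave the route out
        update[dest] = cost
    return update
```

For a node that reaches C through B, the update sent to B either omits C (split horizon) or lists C at infinite cost (poison reverse), so B never learns a route to C that loops back through itself.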

262

Link State Routing
• Distributed, dynamic form of Dijkstra’s algorithm
• Strategy
– Send to all nodes (not just neighbors) information about directly connected nodes (not entire route table) in LSP
• Basic data structure: Link State Packet (LSP)
– ID of the node that created the LSP
– Cost of link to each directly connected neighbor: vector of (distance, successor) values
– Sequence number (SEQNO)
– Time-to-live (TTL) for this packet

263

Link State Routing
• Each node maintains a list of (ideally all) LSPs
– Runs Dijkstra’s algorithm on list
– May discover its neighbors by “Hello” messages
• Information acquisition via reliable flooding
– Create new LSP periodically; send to 1-hop neighbors
  • Increment SEQNO (start SEQNO at 0 on reboot)
– Store most recent (higher SEQNO) LSP from each node
– Forward new LSP to all nodes but the one that sent it
  • Decrement TTL of each LSP; discard when TTL=0
– Try to minimize routing traffic “overhead”
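The store-and-forward rule of flooding can be sketched as follows. This is an illustration under simplifying assumptions: acknowledgements and retransmission (the "reliable" part) are omitted, and the class and method names (`LSP`, `LinkStateNode`, `receive`) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class LSP:
    node_id: str     # ID of the node that created the LSP
    neighbors: dict  # directly connected neighbor -> link cost
    seqno: int       # sequence number
    ttl: int         # time-to-live


class LinkStateNode:
    def __init__(self, node_id, neighbor_ids):
        self.node_id = node_id
        self.neighbor_ids = list(neighbor_ids)
        self.database = {}   # most recent LSP seen from each node

    def receive(self, lsp, from_neighbor):
        """Process an incoming LSP; return a list of (neighbor, LSP) to send."""
        if lsp.ttl <= 0:
            return []                          # expired: discard
        stored = self.database.get(lsp.node_id)
        if stored is not None and stored.seqno >= lsp.seqno:
            return []                          # old or duplicate: discard
        self.database[lsp.node_id] = lsp       # keep the newer copy
        forwarded = replace(lsp, ttl=lsp.ttl - 1)
        if forwarded.ttl <= 0:
            return []                          # TTL exhausted: do not forward
        # Forward to every neighbor except the one that sent it.
        return [(n, forwarded) for n in self.neighbor_ids if n != from_neighbor]
```

Comparing sequence numbers is what stops the flood: a copy that arrives a second time over another path is not newer than the stored one, so it is dropped rather than forwarded again.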

264

Route Calculation

At node D

Step   Confirmed list                           Tentative list
1.     (D,0,-)
2.     (D,0,-)                                  (C,2,C), (B,11,B)
3.     (D,0,-), (C,2,C)                         (B,11,B)
4.     (D,0,-), (C,2,C)                         (B,5,C), (A,12,C)
5.     (D,0,-), (C,2,C), (B,5,C)                (A,12,C)
6.     (D,0,-), (C,2,C), (B,5,C)                (A,10,C)
7.     (D,0,-), (C,2,C), (B,5,C), (A,10,C)

[Figure: network with nodes A, B, C, D and link costs 5, 3, 2, 11, 10]
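The confirmed/tentative bookkeeping in these steps can be sketched as a small forward-search routine. The link costs used below (D-C 2, D-B 11, C-B 3, C-A 10, B-A 5) are inferred from the entries in the steps rather than read from the figure, and the function name `forward_search` is an assumption.

```python
# Undirected link costs inferred from the steps of the example.
GRAPH = {
    "A": {"B": 5, "C": 10},
    "B": {"A": 5, "C": 3, "D": 11},
    "C": {"A": 10, "B": 3, "D": 2},
    "D": {"B": 11, "C": 2},
}


def forward_search(graph, source):
    """Dijkstra-style forward search; returns dest -> (cost, next_hop)."""
    confirmed = {source: (0, None)}
    tentative = {}
    current = source
    while True:
        cost_to_current, hop_to_current = confirmed[current]
        # Examine the links of the node just confirmed.
        for neighbor, link_cost in graph[current].items():
            if neighbor in confirmed:
                continue
            new_cost = cost_to_current + link_cost
            # Next hop is the neighbor itself when leaving the source,
            # otherwise it is inherited from the route to `current`.
            next_hop = neighbor if current == source else hop_to_current
            if neighbor not in tentative or new_cost < tentative[neighbor][0]:
                tentative[neighbor] = (new_cost, next_hop)
        if not tentative:
            return confirmed
        # Move the cheapest tentative entry to the confirmed list.
        current = min(tentative, key=lambda n: tentative[n][0])
        confirmed[current] = tentative.pop(current)
```

Running the sketch from D with these costs ends with the same confirmed list as step 7: C at cost 2, B at cost 5 and A at cost 10, all reached through next hop C.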
