CS716
Advanced Computer Networks
By Dr. Amir Qayyum
Lecture No. 25
Review Lecture
Switched Networks
• A network can be defined recursively as...
– Two or more nodes connected by a link
– Circular nodes (switches) implement the network
– Square nodes (hosts) use the network
Switched Networks
• A network can be defined recursively as...
– Two or more networks connected by one or more nodes: internetworks
– Circular nodes (routers or gateways) interconnect the networks
– A cloud denotes “any type of independent network”
Switching Strategies
• Circuit switching: carry bit streams
a. establishes a dedicated circuit
b. links reserved for use by communication channel
c. send/receive bit stream at constant rate
d. example: original telephone network
• Packet switching: store-and-forward messages
a. operates on discrete blocks of data
b. utilizes resources dynamically according to traffic demand
c. send/receive messages at variable rate
d. example: Internet
Multiplexing
• Physical links/switches must be shared among users
– (Synchronous) Time-Division Multiplexing (TDM)
– Frequency-Division Multiplexing (FDM)
[Figure: multiple flows (L1, L2, L3 to R1, R2, R3) sharing a single link between Switch 1 and Switch 2]
• Do you see any problem with TDM / FDM?
Statistical Multiplexing
• On-demand time-division, possibly synchronous (ATM)
• Schedule link on a per-packet basis
• Buffer packets in switches that are contending for the link
• Packets from different sources interleaved on link
• Do you see any problem?
Inter-Process Communication
• Turn host-to-host connectivity into process-to-process communication, making the communication meaningful
• Fill the gap between what applications expect and what the underlying technology provides
[Figure: abstraction for application-level communication; applications on two hosts communicate over an abstract channel that spans the intermediate hosts]
Abstract Channel Functionality
• What functionality does a channel provide?
– Smallest set of abstract channel types adequate for the largest number of applications
• Where is the functionality implemented?
– Network as a simple bit-pipe with all high-level communication semantics at the hosts
– More intelligent switches allowing hosts to be “dumb” devices (telephone network)
Performance Metrics
• … and to do so while delivering “good” performance
• Bandwidth (throughput)
– Data transmitted per unit time, e.g. 10 Mbps
– Link bandwidth versus end-to-end bandwidth
– Notation
• KB = $2^{10}$ bytes
• Kbps = $10^{3}$ bits per second
Performance Metrics
• Latency / Delay
– Time to send a message from point A to point B
– One-way versus Round-Trip Time (RTT)
– Components:
Latency = Propagation + Transmit + Queue
Propagation = Distance / c (c: speed of light in the medium)
Transmit = Size / Bandwidth
• Note:
– No queuing delay in a direct (point-to-point) link
– Bandwidth is irrelevant if size = 1 bit
– Process-to-process latency includes software processing overhead (dominates over shorter distances)
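The latency formula above can be evaluated with a small C helper. This is a sketch that assumes a propagation speed of $2 \times 10^{8}$ m/s (the figure used later for Ethernet cable) and takes the queuing delay as a given input; `latency_s` is a hypothetical name.

```c
/* Assumed signal propagation speed in copper/fiber (about 2/3 of c in vacuum). */
#define PROPAGATION_SPEED_MPS 2.0e8

/* One-way latency in seconds = propagation + transmit + queue. */
double latency_s(double distance_m, double size_bits,
                 double bandwidth_bps, double queue_s)
{
    double propagation = distance_m / PROPAGATION_SPEED_MPS;
    double transmit = size_bits / bandwidth_bps;
    return propagation + transmit + queue_s;
}
```

For 2000 km and a $10^{6}$-bit message on a 10 Mbps link, `latency_s(2.0e6, 1.0e6, 1.0e7, 0.0)` gives 0.01 s of propagation plus 0.1 s of transmit time, about 0.11 s. With a 1-bit message the result is essentially the 10 ms propagation delay, which is why bandwidth is irrelevant in that case.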
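Delay × Bandwidth Product
• Amount of data “in flight” or “in the pipe”
• Example: 100 ms RTT × 45 Mbps BW = 4.5 Mb ≈ 560 KB
• This much data may already be in the pipe, and must be buffered, by the time the sender responds to a request to slow down
[Figure: the pipe drawn as a cylinder whose length is the delay and whose diameter is the bandwidth]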
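A one-line sketch of the product in bytes, using the RTT as the delay as in the example above; `bdp_bytes` is a hypothetical helper.

```c
/* Delay x bandwidth product in bytes, with the round-trip time as the delay. */
double bdp_bytes(double rtt_s, double bandwidth_bps)
{
    return rtt_s * bandwidth_bps / 8.0;   /* bits in flight -> bytes */
}
```

`bdp_bytes(0.100, 45.0e6)` returns 562500 bytes, the ≈ 560 KB quoted on the slide.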
Network Architecture
• The challenge is to fill the gap between hardware capabilities and application expectations, and to do so while delivering “good” performance
• Designers cope with this complex task by developing a network architecture as a guideline
– Layering, protocols, standards
Layering
• Alternative abstractions at each layer
• Manageable network components
• Modify layers independently
[Figure: layers from bottom to top: hardware; host-to-host connectivity; request/reply channel and message stream channel; application programs]
Protocols
• Building blocks of a network architecture
• Each protocol object has two different interfaces
– Service interface: operations on this protocol
– Peer-to-peer interface: messages exchanged with peer
• Term “protocol” is overloaded
– Specification of peer-to-peer interface
– Module that implements this interface
– Peer modules are interoperable if both accurately follow the specification
Protocol Interfaces
[Figure: on Host 1 and Host 2, a high-level object uses its protocol through the service interface; the two protocols communicate with each other through the peer-to-peer interface]
Protocol Graph – Network Architecture
• Collection of protocols and their dependencies
– Most peer-to-peer communication is indirect
– Peer-to-peer communication is direct only at the hardware level
[Figure: protocol graph on Host 1 and Host 2; file, digital library, and video applications run over RRP and MSP, which run over HHP]
RRP: Request Reply Protocol
MSP: Message Stream Protocol
HHP: Host-to-Host Protocol
Protocol Machinery
• Multiplexing and demultiplexing (demux key)
• Encapsulation (header/body) in peer-to-peer interfaces
– Indirect communication (except at hardware level)
– Each protocol adds a header
– Part of the header includes a demultiplexing field (e.g., pass up to request/reply or to message stream?)
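Encapsulation
[Figure: on Host 1, the application program's Data is passed to RRP, which prepends its header (RRP | Data); HHP then prepends its own header (HHP | RRP | Data) for transmission. Host 2 strips the HHP and RRP headers in reverse order before delivering Data to its application program]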
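The encapsulation and demultiplexing steps can be sketched in C. This assumes a header consisting only of a one-byte demux key, and the key values `DEMUX_RRP` and `DEMUX_MSP` are hypothetical; real RRP/HHP headers in the figure would carry more fields.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical demux keys naming the protocol above HHP. */
enum { DEMUX_RRP = 1, DEMUX_MSP = 2 };

/* Prepend a one-byte header holding the demux key; returns the new length.
   `out` must have room for len + 1 bytes. */
size_t encapsulate(uint8_t key, const uint8_t *payload, size_t len,
                   uint8_t *out)
{
    out[0] = key;                      /* header */
    memcpy(out + 1, payload, len);     /* body = higher-layer message */
    return len + 1;
}

/* Strip the header: return the demux key and point *payload at the body. */
uint8_t decapsulate(const uint8_t *frame, size_t len,
                    const uint8_t **payload, size_t *payload_len)
{
    *payload = frame + 1;
    *payload_len = len - 1;
    return frame[0];                   /* tells the receiver whom to pass up to */
}
```

Applying `encapsulate` once per layer builds the nested HHP | RRP | Data layout of the figure; the receiver calls `decapsulate` once per layer and uses each returned key to choose the next protocol up.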
Standard Architectures
• Open Systems Interconnection (OSI) Architecture
– International Organization for Standardization (ISO)
– International Telecommunication Union (ITU), formerly CCITT
– “X dot” series: X.25, X.400, X.500
– Primarily a reference model
OSI Architecture
• End hosts implement all seven layers; one or more nodes within the network implement only the Network, Data link, and Physical layers
• Layer functions:
– Application
– Presentation: data formatting
– Session: connection management
– Transport: process-to-process communication channel
– Network: host-to-host packet delivery
– Data link: framing of data bits
– Physical: transmission of raw bits
[Figure also marks the split between user-level and OS-kernel layers]
Internet Architecture
• TCP/IP Architecture
– Developed with ARPANET and NSFNET
– Internet Engineering Task Force (IETF)
• Culture: implement, then standardize
• OSI culture: standardize, then implement
– Became popular with release of Berkeley Software Distribution (BSD) Unix, i.e. free software
– Standard suggestions traditionally debated publicly through “Request For Comments” (RFCs)
Internet Architecture
• Implementation and design done together
• Hourglass design (bottleneck is IP)
• Application vs. application protocol (FTP, HTTP)
[Figure: protocol graph with FTP, HTTP, NV, and TFTP over TCP and UDP, over IP, over NET1, NET2, …, NETn]
Internet Architecture
• Layering is not very strict
[Figure: Application over TCP/UDP over IP over Network, with some applications bypassing the layers directly below them]
Networking in the Internet Age
Network Application Programming Interface (API)
• Interface that the OS provides to its networking subsystem
– Most network protocols are implemented in software
– Nearly all systems implement network protocols as part of the OS
– Each OS is free to define its own network API
– Applications can be ported from one OS to another if the APIs are similar
• *IF* the application program does not interact with parts of the OS other than the network (file system, fork processes, display …)
Protocols and API
• Protocols provide a certain set of services
• API provides a syntax by which those services can be invoked
• Implementation is responsible for mapping API syntax onto protocol services
Socket API
• Use sockets as “abstract endpoints” of communication
• Issues
– Creating & identifying sockets
– Sending & receiving data
• Mechanisms
– UNIX system calls and library routines
[Figure: a process accessing the network through a socket]
Protocol-to-Protocol Interface
• A protocol interacts with a lower-level protocol like an application interacts with the underlying network
• Why not use available network APIs for the PPI?
– Inefficiencies built into the socket interface
• Application programmers tolerate them to simplify their task
– Inefficiency at one level
• Protocol implementers do not tolerate them
– Inefficiencies at several layers of protocols
Protocol-to-Protocol Interface Issues
• Configure multiple layers
– Static vs. extensible
• Process model
– Avoid context switches
• Buffer model
– Avoid data copies
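Process Model
[Figure: (a) process-per-protocol, where protocols in separate processes communicate by inter-process communication; (b) process-per-message, where one process carries a message through the protocols by procedure calls]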
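Buffer Model
[Figure: the application process calls send() and the topmost protocol calls deliver(); each crossing between the application process and the topmost protocol involves a buffer copy]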
Network Programming
• Things to learn
– Internet protocols (IP, TCP, UDP, …)
– Sockets API (Application Programming Interface)
• Why IP and sockets?
– IP (Internet Protocol) is standard
• Allows a common name space across most of the Internet
• Reduces the number of translations, which incur overhead
– Sockets: reasonably simple and elegant Unix interface (most servers run Unix)
Socket Programming
• Reading: Stevens 2nd edition, Chapters 1–6
• Sockets API: a transport layer service interface
– Introduced in 1981 by BSD 4.1
– Implemented as library and/or system calls
– Similar interfaces to TCP and UDP
– Can also serve as interface to IP (for super-user), known as “raw sockets”
– Linux also provides interface to MAC layer (for super-user), known as “data-link sockets”
Client-Server Model
• Asymmetric relationship
• Server/Daemon
– Well-known name
– Waits for contact
– Processes requests, sends replies
• Client
– Initiates contact
– Waits for response
[Figure: one server communicating with several clients]
Client-Server Model
• Bidirectional communication channel
• Service models
– Sequential: server processes only one client’s requests at a time
– Concurrent: server processes multiple clients’ requests simultaneously
– Hybrid: server maintains multiple connections, but processes requests sequentially
• Server and client categories not disjoint
– Server can be client of another server
– Server as client of its own client (peer-to-peer architecture)
TCP Connections
• TCP connection setup via 3-way handshake
– J and K are the initial sequence numbers chosen by client and server
– Client → Server: SYN J
– Server → Client: SYN K, ACK J+1
– Client → Server: ACK K+1
• Hmmm … RTT is important!
TCP Connections
• TCP connection teardown (4 steps); either client or server can initiate the teardown
– Client (active close) → Server: FIN J
– Server → Client: ACK J+1
– Server (passive close, once its application closes the connection) → Client: FIN K
– Client → Server: ACK K+1
• Hmmm … Latency matters!
UDP - Aspects of Services
• Unit of transfer is a datagram (variable length packet)
• Unreliable, drops packets silently
• No ordering guarantees
• No flow control
• 16-bit port space (distinct from TCP ports) allows multiple recipients on a single host
Addresses and Data
• Internet domain names: human readable
– Mnemonic
– Variable length
– e.g. www.case.edu.pk, www.carepvtltd.com (FQDN)
• IP addresses: easily handled by routers/computers
– Fixed length
– Tied (loosely) to geography
– e.g. 131.126.143.82 or 212.0.0.1
Endianness
• Machines on the Internet have different endianness
• Little-endian (Intel, DEC): least significant byte of word stored in lowest memory address
• Big-endian (Sun, SGI, HP): most significant byte of word stored in lowest memory address
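The sketch below shows how a C program can detect its own byte order and write a 32-bit value in network byte order, which is big-endian. It uses the standard `htonl`/`ntohl` conversions from `<arpa/inet.h>`; `host_is_little_endian` and `put_be32` are hypothetical helper names.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host is little-endian, 0 if big-endian. */
int host_is_little_endian(void)
{
    uint32_t word = 0x01020304;
    uint8_t first;
    memcpy(&first, &word, 1);      /* byte stored at the lowest address */
    return first == 0x04;          /* least significant byte first */
}

/* Store a 32-bit host value in network (big-endian) byte order. */
void put_be32(uint8_t out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);   /* no-op on big-endian hosts */
    memcpy(out, &net, 4);
}
```

Whatever the host's endianness, `put_be32(buf, 0x01020304)` leaves the bytes 01 02 03 04 in `buf`, which is what a receiver expects on the wire.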
Socket Address Structures
• Socket address structures (all fields in network byte order except sin_family)
IP address
struct in_addr {
    in_addr_t s_addr;          /* 32-bit IP address */
};
TCP or UDP address
struct sockaddr_in {
    short          sin_family; /* e.g., AF_INET */
    unsigned short sin_port;   /* TCP / UDP port */
    struct in_addr sin_addr;   /* IP address */
};
Address Conversion
• All binary values used and returned by these functions are in network byte order
struct hostent* gethostbyname (const char* hostname);
– translates English host name to IP address (uses DNS)
struct hostent* gethostbyaddr (const char* addr, size_t len, int family);
– translates IP address to English host name (not secure)
int gethostname (char* name, size_t namelen);
– reads host’s name (use with gethostbyname to find local IP)
Address Conversion
in_addr_t inet_addr (const char* strptr);
– translates dotted-decimal notation to IP address; returns -1 (INADDR_NONE) on failure, thus cannot handle the broadcast value “255.255.255.255”
int inet_aton (const char* strptr, struct in_addr* inaddr);
– translates dotted-decimal notation to IP address; returns 1 on success, 0 on failure
char* inet_ntoa (struct in_addr inaddr);
– translates IP address to ASCII dotted-decimal notation (e.g., “128.32.36.37”); not thread-safe
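A sketch of the conversion functions in use, assuming a POSIX system (glibc needs `_DEFAULT_SOURCE` to declare `inet_aton` under a strict `-std` flag). `dotted_roundtrip` is a hypothetical helper. Unlike `inet_addr`, `inet_aton` handles “255.255.255.255”, because success or failure is reported separately from the address.

```c
#define _DEFAULT_SOURCE          /* glibc: expose inet_aton under strict -std */
#include <arpa/inet.h>           /* inet_aton, inet_ntoa */
#include <netinet/in.h>          /* struct in_addr */
#include <string.h>              /* strcmp */

/* Parse dotted-decimal text with inet_aton and convert it back with
   inet_ntoa. Returns 1 if the round trip reproduces the input exactly,
   0 if parsing fails or the text differs. */
int dotted_roundtrip(const char *text)
{
    struct in_addr addr;          /* s_addr ends up in network byte order */
    if (inet_aton(text, &addr) == 0)
        return 0;                 /* 0 means the string was not valid */
    /* inet_ntoa returns a pointer to a static buffer: copy the result
       before calling it again, and do not share it between threads. */
    return strcmp(inet_ntoa(addr), text) == 0;
}
```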
Socket API
• Creating a socket
int socket(int domain, int type, int protocol)
– domain (family) = AF_INET, PF_UNIX, AF_OSI
– type = SOCK_STREAM, SOCK_DGRAM
– protocol = TCP, UDP, UNSPEC (in the C API: IPPROTO_TCP, IPPROTO_UDP, or 0 for the default)
– return value is a handle for the newly created socket
Sockets (cont)
• Passive Open (on server)
int bind(int socket, struct sockaddr *addr, int addr_len)
int listen(int socket, int backlog)
int accept(int socket, struct sockaddr *addr, int *addr_len)
• Active Open (on client)
int connect(int socket, struct sockaddr *addr, int addr_len)
Sockets (cont)
• Sending Messages
int send(int socket, char *msg, int mlen, int flags)
• Receiving Messages
int recv(int socket, char *buf, int blen, int flags)
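The calls above can be combined into a single-process test on the loopback interface. This sketch assumes a POSIX system and uses the standard prototypes (which take `socklen_t` lengths and `void *` buffers rather than the simplified `int`/`char *` types on the slides); it lets the OS pick the port by binding to port 0, and `loopback_send_recv_ok` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htonl, htons */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_LOOPBACK */
#include <string.h>      /* memset, memcmp, strlen */
#include <sys/socket.h>  /* socket, bind, listen, accept, connect, send, recv */
#include <unistd.h>      /* close */

/* Returns 1 if a message sent by a client socket arrives intact at the
   server side of a TCP connection over the loopback interface. */
int loopback_send_recv_ok(const char *msg)
{
    struct sockaddr_in addr;
    socklen_t addr_len = sizeof addr;
    char buf[256];
    size_t mlen = strlen(msg);
    int ok = 0;

    if (mlen > sizeof buf)
        return 0;

    int server = socket(AF_INET, SOCK_STREAM, 0);   /* 0: default protocol (TCP) */
    if (server < 0)
        return 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1, network order */
    addr.sin_port = htons(0);                        /* let the OS choose a port */

    /* Passive open, then ask which port was assigned. */
    if (bind(server, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(server, 1) == 0 &&
        getsockname(server, (struct sockaddr *)&addr, &addr_len) == 0) {

        /* Active open: the kernel completes the handshake and queues the
           connection in the listen backlog until accept() is called. */
        int client = socket(AF_INET, SOCK_STREAM, 0);
        if (client >= 0) {
            if (connect(client, (struct sockaddr *)&addr, sizeof addr) == 0) {
                int conn = accept(server, NULL, NULL);
                if (conn >= 0) {
                    if (send(client, msg, mlen, 0) == (ssize_t)mlen) {
                        /* MSG_WAITALL: block until all mlen bytes arrive. */
                        ssize_t n = recv(conn, buf, mlen, MSG_WAITALL);
                        ok = (n == (ssize_t)mlen && memcmp(buf, msg, mlen) == 0);
                    }
                    close(conn);
                }
            }
            close(client);
        }
    }
    close(server);
    return ok;
}
```

A real server would loop on `accept()` and a real client would `connect()` to the server's well-known port; both ends run in one process here only so the example is self-contained.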
Point-to-Point Links
Reading: Peterson and Davie, Ch. 2
Outline
• Hardware building blocks
• Encoding
• Framing
• Error detection
• Reliable transmission
– Sliding Window Algorithm
Direct Link Issues in the OSI and Hardware/Software Contexts
[Figure: the seven OSI layers (Application, Presentation, Session, Transport, Network, Data Link, Physical) mapped onto user-level software, kernel software (device drivers), and hardware (network adapter); the direct-link issues shown are reliability; framing, error detection, and MAC; and encoding]
Hardware Building Blocks
• Nodes
– Hosts: general-purpose computers
– Switches: typically special-purpose hardware
– Routers (connecting networks): varies
• Links
– Copper wire with electronic signaling
– Glass fiber with optical signaling
– Wireless with electromagnetic (radio, infrared, microwave) signaling
Links
• Physical media
– Twisted pair cable
– Coaxial cable
– Optical fiber
– Space
• Media are used to propagate signals
• Signals are electromagnetic waves of a certain frequency, traveling at or near the speed of light
Signals Over a Link
• Signal is modulated for transmission
– Varying frequency/amplitude/phase so that the receiver gets distinguishable signals
• Binary data (0s and 1s) is encoded in a signal
– Make it understandable by the receiving host
Bits Over a Link
• Bit streams may be transmitted in both directions at the same time on a point-to-point link
– Full duplex
• Sometimes two nodes must alternate link usage
– Half duplex
Encoding
• Signals propagate over a physical medium
– Modulate electromagnetic waves
– e.g. vary voltage
• Encode binary data onto signals that propagate
[Figure: bits pass from each node to its adaptor, whose signalling component carries them across the link as a signal to the adaptor of the other node]
Encoding
• Problems with signal transmission
– Attenuation: signal power absorbed by medium
– Dispersion: a discrete signal spreads in space
– Noise: random background “signals”
[Figure: digital data (a string of symbols) enters a modulator, travels as a string of signals, and a demodulator recovers the digital data (a string of symbols)]
RS-232(-C)
• Communication between computer and modem
• Uses two voltage levels (+15V, -15V), a binary voltage encoding
• Data rate limited to 19.2 kbps (RS-232-C); raised in later standards
Binary Voltage Encoding
• NRZ (Non-Return to Zero)
• NRZI (NRZ Inverted)
• Manchester (used by IEEE 802.3, 10 Mbps Ethernet)
• 4B/5B in Fast Ethernet (8B/10B in Gigabit Ethernet)
Non-Return to Zero (NRZ)
• Encode binary data onto signals
– e.g. 0 as low signal and 1 as high signal
– Voltage does not return to zero between bits
• Known as Non-Return to Zero (NRZ)
[Figure: NRZ waveform for the bit sequence 0 0 1 0 1 1 1 1 0 1 0 0 0 0 1 0]
Problem: Consecutive 1s or 0s
• Low signal (0) may be interpreted as no signal
• High signal (1) leads to baseline wander
• Unable to recover clock
– Sender’s and receiver’s clocks have to be precisely synchronized
– Receiver resynchronizes on each signal transition
– Clock drift in long periods without transition
[Figure: sender’s clock compared with receiver’s clock]
Alternative Encodings
• Non-Return to Zero Inverted (NRZI)
– Make a transition from current signal (switch voltage level) to encode/transmit a “one”
– Stay at current signal (maintain voltage level) to encode/transmit a “zero”
– Solves the problem of consecutive 1s, but long runs of 0s still produce no transitions
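The NRZ and NRZI rules can be sketched as two small encoders, with signal levels written as 0 (low) and 1 (high). The starting line level for NRZI is an assumption supplied by the caller, and the function names are hypothetical.

```c
#include <stddef.h>

/* NRZ: the signal level simply follows the bit (0 = low, 1 = high). */
void nrz_encode(const int *bits, size_t n, int *levels)
{
    for (size_t i = 0; i < n; i++)
        levels[i] = bits[i];
}

/* NRZI: a 1 toggles the current level, a 0 keeps it.
   `start` is the line level before the first bit. */
void nrzi_encode(const int *bits, size_t n, int start, int *levels)
{
    int level = start;
    for (size_t i = 0; i < n; i++) {
        if (bits[i])
            level = !level;   /* transition encodes a 1 */
        levels[i] = level;    /* no transition encodes a 0 */
    }
}
```

Encoding the 16-bit sequence from the NRZ slide with `nrzi_encode(bits, 16, 0, out)` gives 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1: every 1 produces a transition, while the run of four 0s still produces none.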
Alternative Encodings
• Manchester (in IEEE 802.3 – 10 Mbps Ethernet)
• Split cycle into two parts
– Send high-low for “1”, low-high for “0”
– Transmit XOR of NRZ encoded data and the clock
• Only 50% efficient (1/2 bit per transition)
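A sketch of Manchester encoding as the XOR of the NRZ bit and a clock that is low in the first half of each bit period and high in the second, which reproduces the slide's high-low for 1 and low-high for 0. This polarity follows the slide; some references use the opposite convention. The function name is hypothetical.

```c
#include <stddef.h>

/* Each bit period is split into two half-levels, each equal to
   (NRZ bit) XOR (clock), with the clock low then high.
   `halves` must have room for 2 * n entries. */
void manchester_encode(const int *bits, size_t n, int *halves)
{
    for (size_t i = 0; i < n; i++) {
        halves[2 * i]     = bits[i] ^ 0;  /* first half: clock low   */
        halves[2 * i + 1] = bits[i] ^ 1;  /* second half: clock high */
    }
}
```

Every bit period contains a transition in the middle, which is what lets the receiver recover the clock, at the cost of two signal elements per bit.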
4B/5B Encoding
• Every 4 consecutive bits of data are encoded in a 5-bit code (symbol)
– The 4-bit pattern is “translated” to a 5-bit pattern (not addition)
• 5-bit codes selected to have no more than one leading 0 and no more than two trailing 0s
– 00xxx (8 symbols) and xx000 (4 symbols, one of which is 00000) are illegal, leaving 21 legal codes
– 5 free symbols (non-data)
• Thus, never more than three consecutive 0s
• Resulting 5-bit codes are transmitted using NRZI
• Achieves 80% efficiency
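A sketch of the 4B/5B data-symbol table (the 16 data codes as tabulated in Peterson and Davie) together with the design rule above. The helper names, and sending the high nibble first, are assumptions made for illustration.

```c
#include <stdint.h>

/* 4B/5B data symbols, indexed by the 4-bit data value. */
static const uint8_t FOUR_B_FIVE_B[16] = {
    0x1E, /* 0000 -> 11110 */  0x09, /* 0001 -> 01001 */
    0x14, /* 0010 -> 10100 */  0x15, /* 0011 -> 10101 */
    0x0A, /* 0100 -> 01010 */  0x0B, /* 0101 -> 01011 */
    0x0E, /* 0110 -> 01110 */  0x0F, /* 0111 -> 01111 */
    0x12, /* 1000 -> 10010 */  0x13, /* 1001 -> 10011 */
    0x16, /* 1010 -> 10110 */  0x17, /* 1011 -> 10111 */
    0x1A, /* 1100 -> 11010 */  0x1B, /* 1101 -> 11011 */
    0x1C, /* 1110 -> 11100 */  0x1D, /* 1111 -> 11101 */
};

/* Encode one byte as two 5-bit symbols packed into the low 10 bits
   (high nibble first, an assumed ordering). */
uint16_t encode_byte_4b5b(uint8_t byte)
{
    return (uint16_t)((FOUR_B_FIVE_B[byte >> 4] << 5) |
                      FOUR_B_FIVE_B[byte & 0x0F]);
}

/* Design rule: at most one leading 0 and at most two trailing 0s. */
int symbol_is_legal(uint8_t sym)
{
    int leading_ok  = (sym & 0x18) != 0;  /* not 00xxx */
    int trailing_ok = (sym & 0x07) != 0;  /* not xx000 */
    return leading_ok && trailing_ok;
}
```

Counting the legal 5-bit values with `symbol_is_legal` gives 21: the 16 data codes plus the 5 free (non-data) symbols mentioned above.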
Binary Voltage Encoding
• Problem: wide frequency range required, implying
– Significant dispersion
– Uneven attenuation
• Prefer to use narrow frequency band (carrier frequency)
• Types of modulation
– Amplitude Modulation (AM)
– Frequency Modulation (FM)
– Phase Modulation/Phase Shift
– Combination of these (e.g. QAM)
Phase Modulation Algorithm
• Send carrier frequency for one period
• Perform phase shift
• Shift value encodes symbol
– Value in range [0°, 360°)
– Multiple values for multiple symbols
– Represent as circle
[Figure: 8-symbol example with phase shifts of 0°, 45°, 90°, 135°, 180°, 225°, 270°, and 315° arranged on a circle]
Constellation Pattern for V.32 QAM
• For a given symbol:
1. Perform phase shift
2. Change to new amplitude
[Figure: V.32 QAM constellation, with angles of 45° and 15° marked]
• Points in constellation diagram
– Chosen to maximize error detection
– Process called trellis coding
Bit Rate and Baud Rate
• Bit rate is bits per second
• Baud rate is “symbols” per second
• If each symbol contains 4 bits then data rate is 4 times the baud rate
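A small sketch of the relationship: with $V$ distinguishable symbols (assumed to be a power of two), each symbol carries $\log_2 V$ bits, so the 8-phase example earlier carries 3 bits per symbol. The function names are hypothetical.

```c
/* Bits carried by one symbol when there are `levels` distinguishable
   symbols (levels assumed to be a power of two): log2(levels). */
int bits_per_symbol(unsigned levels)
{
    int bits = 0;
    while (levels > 1) {
        levels >>= 1;
        bits++;
    }
    return bits;
}

/* Bit rate = baud rate x bits per symbol. */
double bit_rate_bps(double baud, unsigned levels)
{
    return baud * bits_per_symbol(levels);
}
```

`bit_rate_bps(2400, 16)` returns 9600: 2400 baud with 16 symbols (4 bits each) gives four times the baud rate, as stated above.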
What Limits Baud Rate?
• Baud rates are typically limited by electrical signaling properties
• No matter how small the voltage or how short the wire, changing voltages takes time
• Electronics are slow as compared to optics
Summary of Encoding
• Problems: attenuation, dispersion, noise
• Digital transmission allows periodic regeneration
• Variety of binary voltage encodings
– High frequency components limit to short range
– More voltage levels provide higher data rate
• Carrier frequency and modulation
– Amplitude, frequency, phase, and combination (QAM)
• Nyquist (noiseless) and Shannon (noisy) limits on data rates
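For reference, the two limits in their usual textbook form. Here $B$ is the channel bandwidth in Hz, $V$ the number of discrete signal levels, and $S/N$ the signal-to-noise power ratio (as a ratio, not in dB); the telephone-channel numbers below are an illustrative assumption, not from the slides.

$$C_{\text{Nyquist}} = 2B \log_2 V \qquad\qquad C_{\text{Shannon}} = B \log_2\left(1 + \frac{S}{N}\right)$$

For example, a 3000 Hz channel with $S/N = 1000$ (30 dB) gives $C_{\text{Shannon}} = 3000 \log_2 1001 \approx 30$ kbps, however many voltage levels are used.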
Framing
• Breaks a continuous stream/sequence of bits into frames and demarcates units of transfer
• Typically implemented by network adaptor
– Adaptor fetches/deposits frames out of or into host memory
[Figure: frames exchanged between Node A and Node B through their adaptors, which exchange bits over the link]
Advantages of Framing
• Synchronization recovery
– Consider continuous stream of unframed bytes
– Recall RS-232 start and stop bits
• Multiplexing of link
– Multiple hosts on shared medium
– Simplifies multiplexing of logical channels
• Efficient error detection
– Frame serves as unit of detection (valid or invalid)
– Error detection overhead scales as log N
Approaches
• Organized by end-of-frame detection method
• Approaches to framing
– Sentinel (marker, like C strings)
– Length-based (like Pascal strings)
– Clock-based
Approaches
• Other aspects of a particular approach
– Bit-oriented or byte-oriented
– Fixed or variable length
– Data-dependent or data-independent length
Framing with Sentinels
• End of frame: special byte or bit pattern
• Choice of end-of-frame marker
– Valid data byte or bit sequence, e.g. 01111110
– Physical signal not used by valid data symbol
[Figure: frame format with fields Beginning sequence (8 bits), Header (16), Body, CRC (16), Ending sequence (8)]
Sentinel Based Approach
• Problem: equal-size frames are not possible
– Frame length is data-dependent
• Sentinel-based framing examples
– High-Level Data Link Control (HDLC) protocol
– Point-to-Point Protocol (PPP)
– ARPANET IMP-IMP protocol
– IEEE 802.4 (token bus)
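Because the sentinel 01111110 is also a valid bit sequence, a sentinel protocol must keep it out of the body. HDLC does this with bit stuffing: after five consecutive 1s the sender inserts a 0 and the receiver removes it, which is also why frame length is data-dependent. This sketch stores one bit per `int` for clarity; the function names are hypothetical, and a real receiver would also treat a sixth 1 as a flag or an error, which is omitted here.

```c
#include <stddef.h>

/* Sender: after five consecutive 1s, insert a 0 so that 01111110 can
   never appear inside the body. Returns the stuffed length;
   `out` needs room for n + n/5 bits. */
size_t bit_stuff(const int *in, size_t n, int *out)
{
    size_t m = 0;
    int ones = 0;
    for (size_t i = 0; i < n; i++) {
        out[m++] = in[i];
        ones = in[i] ? ones + 1 : 0;
        if (ones == 5) {
            out[m++] = 0;      /* stuffed bit */
            ones = 0;
        }
    }
    return m;
}

/* Receiver: drop the 0 that follows five consecutive 1s. */
size_t bit_unstuff(const int *in, size_t n, int *out)
{
    size_t m = 0;
    int ones = 0;
    for (size_t i = 0; i < n; i++) {
        if (ones == 5) {       /* this bit was stuffed by the sender */
            ones = 0;
            continue;
        }
        out[m++] = in[i];
        ones = in[i] ? ones + 1 : 0;
    }
    return m;
}
```

For the body 0 11111111 0, the sender transmits 0 11111 0 111 0 (one extra bit), and unstuffing restores the original 10 bits.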
Length-based Framing
• Include payload length in header
– e.g., DDCMP (byte-oriented, variable-length)
– e.g., RS-232 (bit-oriented, implicit fixed length)
• Problem: count field corrupted
• Solution: catch when CRC fails
[Figure: DDCMP frame format with fields SYN (8 bits), SYN (8), Class (8), Length (14), Header (42), Body, CRC (16)]
Clock-based Framing
• Continuous stream of fixed-length frames
– Each frame is 125 µs long (all STS formats) (why?)
• Clocks must remain synchronized
• e.g. SONET: Synchronous Optical NETwork
– Dominant standard for long-distance transmission
– Multiplexing of low-speed links onto one high-speed link
– Byte-interleaved multiplexing
– Payload bytes are scrambled (data XOR’d with a 127-bit pattern)
– STS-n (STS-1 = 51.84 Mbps)
SONET Frame Format (STS-1)
[Figure: STS-1 frame of 9 rows × 90 columns; the first 3 columns of each row carry overhead and the rest carry payload]
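A worked check of the STS-1 rate, using the 9 × 90 frame above and one frame every 125 µs (8000 frames per second):

$$9 \times 90 = 810\ \text{bytes/frame}$$

$$810 \times 8\ \text{bits} \times 8000\ \text{frames/s} = 51.84 \times 10^{6}\ \text{bits/s}$$

STS-n multiplies this by n, e.g. STS-3 = 155.52 Mbps.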
Clock-based Framing
• Problem: how to recover frame synchronization
– 2-byte synchronization pattern starts each frame (unlikely to occur in data)
– Wait until pattern appears in same place repeatedly
Clock-based Framing
• Problem: how to maintain clock synchronization
– NRZ encoding, data scrambled (XOR’d) with 127-bit pattern
– Creates transitions
– Also reduces chance of finding false sync pattern
Error Detection
• Validates correctness of each frame
• Errors checked at many levels
– Demodulation of signals into symbols (analog)
– Bit error detection/correction (digital), our main focus
• Within network adapter (CRC check)
• Within IP layer (IP checksum)
• Possibly within application as well
Error Detection and Correction
• Possible binary voltage encoding symbol
– Neighborhoods and erasure region
[Figure: voltage axis from -15 V to +15 V; the neighborhood of each voltage level decodes to 0 or 1, and the region between them is an erasure (?)]
• Possible QAM symbol
– Neighborhoods in green
– All other space results in erasure
• Input to digital level: valid symbols or erasures
Error Detection: How?
• How to detect errors?
– Add redundant information to a frame to determine errors
• Transmit two complete copies of data
– n redundant bits for n-bit message
– Errors at the same position in both copies go undetected
Error Detection: How?
• We want only k redundant bits for an n-bit message, where $k \ll n$
– In Ethernet, 32-bit CRC for 12,000 bits (1500 bytes)
• k bits are derived from the original message
• Both the sender and receiver know the algorithm
Hamming Distance (1950 Paper)
• Minimum number of bit flips between code words
– 2 flips for parity
– 3 flips for voting
• n-bit error detection
– No code word changed into another code word
– Requires Hamming distance of n+1
Hamming Distance (1950 Paper)
• n-bit error correction
– n-bit neighborhood: all words within n bit flips of a code word
– No overlap between n-bit neighborhoods
– Requires Hamming distance of 2n+1
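A sketch that computes the Hamming distance between two words and the capability implied by the two rules above (distance n+1 detects n errors; distance 2n+1 corrects n). The function names are hypothetical.

```c
#include <stdint.h>

/* Number of bit positions in which a and b differ (popcount of XOR). */
int hamming_distance(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    int count = 0;
    while (diff) {
        diff &= diff - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* For a code with minimum distance d. */
int detectable_errors(int d)  { return d - 1; }
int correctable_errors(int d) { return (d - 1) / 2; }
```

For the voting code (000 and 111) the distance is 3, so it detects 2-bit errors and corrects 1-bit errors; for parity (distance 2) it detects 1-bit errors and corrects none.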
Digital Error Detection Techniques
• Two-dimensional parity
– Detects up to 3-bit errors
– Good for burst errors
• Internet checksum (used as backup to CRC)
– Simple addition
– Simple in software
• Cyclic redundancy check (CRC)
– Powerful mathematics
– Tricky in software, simple in hardware
– Used in network adapter
Two-Dimensional Parity
• One extra bit is added to each 7-bit code to make the number of 1s even (even parity)
• An extra parity byte covers the entire frame
• Catches all 1-, 2-, and 3-bit errors and most 4-bit errors
• 14 redundant bits for a 42-bit message, in the example
Example (data rows followed by their parity bits; the last row is the parity byte):
1011110 1
1101001 0
0101001 1
1011111 0
0110100 1
0001110 1
1111011 0 (parity byte)
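The example can be checked with a short C routine. This sketch assumes even parity, stores each 7-bit row in the low bits of a byte, and places the parity bit to the right of the data as in the example; `parity_2d` is a hypothetical name.

```c
#include <stdint.h>

/* 1 if the low `width` bits of v hold an odd number of 1s, so that
   appending the result makes the count even. */
static int even_parity(unsigned v, int width)
{
    int ones = 0;
    for (int i = 0; i < width; i++)
        ones += (v >> i) & 1;
    return ones & 1;
}

/* Two-dimensional parity over `rows` 7-bit data words. Each out[r] is
   the data word followed by its row parity bit (8 bits); the return
   value is the parity byte (column parities, including the parity of
   the row-parity column). */
uint8_t parity_2d(const uint8_t *data, int rows, uint8_t *out)
{
    uint8_t column = 0;
    for (int r = 0; r < rows; r++) {
        out[r] = (uint8_t)((data[r] << 1) | even_parity(data[r], 7));
        column ^= out[r];   /* XOR of a column = its even parity bit */
    }
    return column;
}
```

For the six data rows above it returns 0xF6, i.e. 1111011 0, matching the parity byte, and the first row becomes 10111101.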
Internet Checksum Algorithm
• Not used at the link level but provides the same sort of functionality as CRC and parity
• Idea:
– Add up all words (16-bit integers) that are transmitted, using ones’ complement arithmetic
– Transmit the result (checksum) of that sum
– Receiver performs the same calculation on received data and compares the result with the received checksum
– If the results do not match, an error is detected
• 16 redundant bits for a message of any length
• Weak protection, accepted as a last line of defense
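A sketch of the algorithm in C, following the form used by IP (RFC 1071): words are added with ones' complement arithmetic (carries wrap around into the low bit) and the complement of the sum is transmitted. Byte-order handling is left out by assuming the words are already 16-bit integers; `inet_checksum` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of 16-bit words with end-around carry,
   complemented at the end. */
uint16_t inet_checksum(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += words[i];
        sum = (sum & 0xFFFF) + (sum >> 16);   /* fold the carry back in */
    }
    return (uint16_t)~sum;
}
```

For the sample IPv4 header words 4500 0073 0000 4000 4011 0000 C0A8 0001 C0A8 00C7 (checksum field zeroed) it returns 0xB861; with 0xB861 placed in the checksum field, the receiver's computation returns 0, meaning no error is detected.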
Cyclic Redundancy Check
Theory
• Based on finite-field (binary-valued) arithmetic
• Bit string represented as polynomial
• Coefficients are binary-valued
• Divide bit string polynomial by generator polynomial to generate CRC
Practice
• Bitwise XORs
Cyclic Redundancy Check
• Add k bits of redundant data to an n-bit message
– Want $k \ll n$
– e.g. k = 32 and n = 12,000 (1500 bytes)
• Represent n-bit message as a polynomial of degree n−1
– e.g. MSG = 10011010 as $M(x) = x^7 + x^4 + x^3 + x^1$
– Sender and receiver exchange messages as polynomials
• Let k be the degree of some agreed-upon divisor (generator) polynomial
– e.g. $C(x) = x^3 + x^2 + 1$
Cyclic Redundancy Check
• Transmit polynomial P(x) that is evenly divisible by C(x)
– Shift left k bits, i.e. $M(x)x^k$
– Add remainder of $M(x)x^k / C(x)$ into $M(x)x^k$
• Receiver receives polynomial P(x) + E(x)
– E(x) = 0 implies no errors
• Receiver divides (P(x) + E(x)) by C(x); the remainder will be zero ONLY if:
– E(x) was zero (no error), or
– E(x) is exactly divisible by C(x)
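The procedure can be sketched with XOR long division. This version packs bits into a 32-bit integer (so message length plus k must not exceed 32), and its function names are hypothetical; table-driven or hardware shift-register CRCs compute the same remainder faster.

```c
#include <stdint.h>

/* Remainder of the nbits-bit polynomial `value` divided by `gen`, a
   degree-k generator whose bits include the x^k term
   (e.g. 0xD = 1101 for x^3 + x^2 + 1). Subtraction is XOR. */
uint32_t poly_mod(uint32_t value, int nbits, uint32_t gen, int k)
{
    for (int i = nbits - 1; i >= k; i--)
        if (value & (1u << i))          /* leading term present */
            value ^= gen << (i - k);    /* subtract shifted C(x) */
    return value;                       /* degree < k */
}

/* Sender: P(x) = M(x)x^k + (M(x)x^k mod C(x)). */
uint32_t crc_encode(uint32_t msg, int nbits, uint32_t gen, int k)
{
    uint32_t shifted = msg << k;        /* M(x) * x^k: k zero bits appended */
    return shifted | poly_mod(shifted, nbits + k, gen, k);
}
```

With the slides' example, M(x) = 10011010 and $C(x) = x^3 + x^2 + 1$ (1101): the remainder is 101, so 10011010101 is transmitted, and the receiver's division leaves remainder 0. Flipping any single bit gives a nonzero remainder.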
Reliable Transmission
• Error-correcting codes are not advanced enough to handle the range of bit and burst errors
– Corrupt frames generally must be discarded
– A reliable link-level protocol must recover from discarded frames
• Goals for reliable transmission
– Make channel appear reliable
– Maintain packet order (usually)
– Impose low overhead / allow full use of link
Reliable Transmission
• Reliability accomplished using acknowledgments and timeouts
– ACK is a small control frame confirming reception of an earlier frame
– If no ACK arrives, the sender retransmits after a timeout
Reliable Transmission
• Automatic Repeat reQuest (ARQ) algorithms
– Stop-and-wait
– Concurrent logical channels
– Sliding window
• Go-back-N, or selective repeat
• Alternative: Forward Error Correction (FEC)
Automatic Repeat reQuest
• Acknowledgement (ACK)
– Receiver tells sender when frame received
– Cumulative ACK (used by TCP): have received specified frame and all previous
– Selective ACK (SACK): specifies set of frames received
– Negative ACK (NACK or NAK): receiver refuses to accept frame now, e.g. when out of buffer space
Automatic Repeat reQuest
• Timeout: sender decides that frame was lost and tries again
• ARQ also called Positive Acknowledgement with Retransmission (PAR)
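Stop-and-Wait
• Send a single frame
• Wait for ACK or timeout
– If ACK received, continue with next frame
– If timeout occurred, send again (and wait)
• A timeout means the frame (or its ACK) was lost in transit, or the frame was corrupted and discarded
[Figure: sender transmits Frame 0 and receiver returns ACK 0; sender transmits Frame 1 and receiver returns ACK 1]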
Stop-and-Wait
• Frames delivered reliably and in order
• Is that enough?
– No, we need performance, too.
• Problem: keeping the pipe full …?
• Example
– 1.5 Mbps link × 45 ms RTT = 67.5 Kb (~8 KB)
– 1 KB frames imply 182 Kbps (1/8th link utilization)
– Want the sender to transmit 8 frames before waiting for ACK
– Throughput remains 182 Kbps regardless of the link bandwidth!!
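The example's numbers can be reproduced with two helpers. This sketch ignores transmit time, as the slide does, and the function names are hypothetical.

```c
/* Stop-and-wait delivers one frame per RTT, so throughput is
   frame size / RTT, independent of link bandwidth. */
double stop_and_wait_bps(double frame_bits, double rtt_s)
{
    return frame_bits / rtt_s;
}

/* Frames that must be in flight to fill the pipe:
   (delay x bandwidth) / frame size. */
double frames_to_fill_pipe(double bandwidth_bps, double rtt_s,
                           double frame_bits)
{
    return bandwidth_bps * rtt_s / frame_bits;
}
```

`stop_and_wait_bps(8192, 0.045)` is about 182 Kbps whatever the link speed, while `frames_to_fill_pipe(1.5e6, 0.045, 8192)` is about 8.2, hence the goal of 8 frames in flight.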
Concurrent Logical Channels
• Multiplex several logical channels over a single point-to-point physical link (include channel ID in header)
• Use stop-and-wait for each logical channel
• Maintain three bits of state for each logical channel:
– Boolean saying whether channel is currently busy
– Sequence number for frames sent on this channel
– Next sequence number to expect on this channel
• ARPANET IMP-IMP supported 8 logical channels over each ground link (16 over each satellite link)
Concurrent Logical Channels
• Header for each frame includes a 3-bit channel number and a 1-bit sequence number
– Same number of bits (4) as the sliding window requires to support up to 8 outstanding frames on the link
Sliding Window
• Allow sender to transmit multiple frames before receiving an ACK, thereby keeping the pipe full
• Upper bound on outstanding un-ACKed frames
• Also used at the transport layer (by TCP)
[Figure: timeline in which the sender transmits several frames before the first ACK returns from the receiver]
Sliding Window Concepts
• Consider ordered stream of data
– Broken into frames
– Stop-and-Wait
• Window of one frame
• Slides along stream over time
• Sliding window algorithms generalize this notion
– Multiple-frame send window
– Multiple-frame receive window
[Figure: a window sliding along the stream of frames over time]
Sliding Window - Sender
• Assign sequence number to each frame (SeqNum)
• Maintain three state variables:
– Send Window Size (SWS)
– Last Acknowledgment Received (LAR)
– Last Frame Sent (LFS)
• Maintain invariant: LFS – LAR ≤ SWS
• Advance LAR when ACK arrives
• Buffer up to SWS frames and associate timeouts
[Figure: frames 11 to 20 on a timeline, with LAR = 13 and LFS = 18, so that LFS – LAR ≤ SWS]
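A sketch of the sender's bookkeeping, assuming unbounded sequence numbers (wraparound is discussed later) and leaving out the frame buffers and timers; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Sender state, named as on the slide. */
typedef struct {
    uint32_t sws;  /* Send Window Size */
    uint32_t lar;  /* Last Acknowledgment Received */
    uint32_t lfs;  /* Last Frame Sent */
} sender_t;

/* Send the next frame only if it keeps LFS - LAR <= SWS.
   Returns the SeqNum assigned, or -1 if the window is full. */
long sender_send(sender_t *s)
{
    if (s->lfs + 1 - s->lar > s->sws)
        return -1;                 /* would violate the invariant */
    s->lfs++;                      /* frame would be buffered with a timer */
    return (long)s->lfs;
}

/* A cumulative ACK advances LAR, opening the window. */
void sender_ack(sender_t *s, uint32_t ack)
{
    if (ack > s->lar && ack <= s->lfs)
        s->lar = ack;
}
```

Starting from LAR = LFS = 13 with SWS = 5, five frames (14 to 18) can be sent, reaching the LFS = 18 of the figure; a sixth is refused until an ACK advances LAR.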
Sliding Window - Receiver
• Maintain three state variables
– Receive Window Size (RWS)
– Largest Frame Acceptable (LFA)
– Next Frame Expected (NFE)
• Maintain invariant: LFA – NFE + 1 ≤ RWS
• Frame SeqNum arrives:
– If NFE ≤ SeqNum ≤ LFA, accept
– If SeqNum < NFE or SeqNum > LFA, discard
• Send cumulative ACKs
[Figure: frames 11 to 20 on a timeline, with NFE = 13 and LFA = 17, so that LFA – NFE + 1 ≤ RWS]
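A matching receiver sketch with RWS = 5 as in the figure, again assuming unbounded sequence numbers; LFA is derived from NFE so that the invariant holds with equality. The names are hypothetical.

```c
#include <stdint.h>

#define RWS 5   /* assumed receive window size, as in the figure */

typedef struct {
    uint32_t nfe;        /* Next Frame Expected */
    int      got[RWS];   /* got[i] != 0: frame nfe + i is buffered */
} receiver_t;

/* Handle arrival of frame `seq`; returns the cumulative ACK, i.e. the
   highest SeqNum such that it and all earlier frames have arrived. */
uint32_t receiver_arrive(receiver_t *r, uint32_t seq)
{
    uint32_t lfa = r->nfe + RWS - 1;        /* LFA - NFE + 1 = RWS */
    if (seq >= r->nfe && seq <= lfa)
        r->got[seq - r->nfe] = 1;           /* accept and buffer */
    /* otherwise the frame is outside the window: discard it */

    while (r->got[0]) {                     /* deliver in order */
        for (int i = 0; i < RWS - 1; i++)
            r->got[i] = r->got[i + 1];      /* slide the window */
        r->got[RWS - 1] = 0;
        r->nfe++;
    }
    return r->nfe - 1;
}
```

With NFE = 13, frame 14 is buffered but the ACK stays at 12; when frame 13 arrives, both are delivered and the cumulative ACK jumps to 14.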
Sliding Window Issues
• When a timeout occurs, data in transit decreases
– Pipe is no longer full when packet losses occur
– Problem aggravated by delay in packet loss detection
• Early detection of packet losses improves performance:
– Negative Acknowledgements (NACKs)
– Duplicate Acknowledgements
– Selective Acknowledgements (SACKs)
• Adds complexity but helps keep the pipe full
Sliding Window Classification
• Stop-and-wait: SWS = 1, RWS = 1
• Go-back-N: SWS = N, RWS = 1
• Selective repeat: SWS = N, RWS = M (usually M = N)
[Figure: stop-and-wait, go-back-N, and selective repeat compared]
Sequence Number Space
• SeqNum field is finite; sequence numbers wrap around
• Sequence number space must be larger than number of outstanding frames (SWS)
• SWS ≤ MaxSeqNum − 1 is not sufficient
– Suppose 3-bit SeqNum field (0..7); SWS = RWS = 7
– Sender transmits frames 0..6, which arrive successfully (receiver window advances)
– ACKs are lost; sender retransmits 0..6
– Receiver expects 7, 0..5, but receives the second incarnation of 0..5 and mistakes them for the 8th to 13th frames
Required Sequence Number Space?
• Assume SWS = RWS (simplest, and typical)
– Sender transmits full SWS
– Two extreme cases at receiver
• None received (waiting for 0 … SWS − 1)
• All received (waiting for SWS … 2 × SWS − 1)
• All possible packets must have unique SeqNum
• SWS < (MaxSeqNum + 1)/2, or more generally SWS + RWS < MaxSeqNum + 1, is the correct rule (MaxSeqNum = number of available sequence numbers)
• Intuitively, SeqNum “slides” between two halves of sequence number space
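The rule can be stated as a one-line check, taking MaxSeqNum as the number of distinct sequence numbers ($2^{b}$ for a $b$-bit field), which matches the 3-bit example on the previous slide; the names are hypothetical.

```c
/* nseq = MaxSeqNum = number of distinct sequence numbers.
   Rule: SWS + RWS < nseq + 1. */
int window_ok(unsigned sws, unsigned rws, unsigned nseq)
{
    return sws + rws < nseq + 1;
}

/* Largest window when SWS = RWS and the SeqNum field is `bits` wide. */
unsigned max_equal_window(unsigned bits)
{
    return (1u << bits) / 2;
}
```

`window_ok(7, 7, 8)` fails (the scenario above), `window_ok(4, 4, 8)` passes, and go-back-N with SWS = 7, RWS = 1 also fits in a 3-bit field.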
Shared Media: Problems
• Problem: demands can conflict, e.g. two hosts send simultaneously
– STDM does not address this problem (centralized)
– Solution is a medium access control (MAC) algorithm
Shared Media: Solutions
• Three solutions (out of many)
– Carrier Sense Multiple Access with Collision Detection (CSMA/CD)
• Send only if medium is idle
• Stop sending immediately if collision detected
– Token Ring/FDDI: pass a token around a ring; only the token holder sends
– Radio / wireless (IEEE 802.11)
History of Ethernet
• Developed by Xerox PARC in mid-1970s
• Roots in Aloha packet-radio network
• Standardized by Xerox/DEC/Intel in 1978
• Similar to IEEE 802.3 standard
• IEEE 802.3u standard defines Fast Ethernet (100 Mbps)
• New switched Ethernet now popular
Ethernet – Alternative Technologies
• Can be constructed from a thinner cable (10Base2) rather than 50-ohm coax cable (10Base5)
• Newer technology uses 10BaseT (twisted pair)
– Several point-to-point segments coming out of a multiway repeater called a “hub”
[Figure: hosts attached to hubs]
Ethernet – Multiple Segments
• Repeaters forward the broadcast signal on all outgoing segments (10Base5)
• Maximum of 4 repeaters (2500 m), 1024 hosts
[Figure: Ethernet segments joined by a repeater, with hosts attached to each segment]
Ethernet Packet Frame
• Preamble allows the receiver to synchronize with the signal
• Frame must contain at least 46 bytes of data to detect collisions
• 802.3 standard substitutes a length field for the type field
– Type field (demux key) is then the first thing in the data portion
– A device can accept both frame formats: type values are > 1500, while valid lengths are ≤ 1500
[Figure: frame format with fields Preamble (64 bits), Dest addr (48), Src addr (48), Type (16), Body, CRC (32)]
Ethernet MAC – CSMA/CD
• Multiple access
– Nodes send and receive frames over a shared link
• Carrier sense
– Nodes can distinguish between an idle and a busy link
• Collision detection
– A node listens as it transmits to detect collision
CSMA/CD MAC Algorithm
• If line is idle (no carrier sensed)
– Send immediately
– Upper bound message size of ~1500 bytes
– Must wait 9.6 µs between back-to-back frames
CSMA/CD MAC Algorithm
• If line is busy (carrier sensed) …
– Wait until the line becomes idle and then transmit immediately
– Called 1-persistent (special case of p-persistent)
• If collision detected
– Stop sending data and send a jam signal
– Try again later
119
Constraints on Collision Detection
• In our example, consider
– my-machine’s message reaches your-machine at T
– your-machine’s message reaches my-machine at 2T
• Thus, my-machine must still be transmitting at 2T to detect the collision
120
Ethernet Min. Frame Size
• RTT on a maximally configured Ethernet of 2500 m with 4 repeaters is about 51.2 μs
– 2500 m / (2 × 10^8 m/s) = 12.5 µs
– 2 × 12.5 µs = 25 µs, plus repeater delays
• 51.2 μs at 10 Mbps corresponds to 512 bits (64 bytes)
• Therefore, the minimum frame length for Ethernet is 64 bytes (14-byte header + 46 bytes data + 4-byte CRC)
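The minimum-frame arithmetic above can be checked in a few lines of Python. This sketch takes the slide's numbers as given (2 × 10^8 m/s, 2500 m, 10 Mbps, 51.2 µs) and assumes a 14-byte header and a 4-byte CRC, with the preamble not counted.

```python
# Minimum Ethernet frame size from the collision-detection constraint.
PROPAGATION_SPEED = 2e8   # m/s, signal speed used on the slide
MAX_LENGTH_M = 2500       # maximally configured network (4 repeaters)
BIT_RATE = 10e6           # 10 Mbps
SLOT_TIME = 51.2e-6       # s, round-trip bound including repeater delays

one_way = MAX_LENGTH_M / PROPAGATION_SPEED   # about 12.5 us on the wire
round_trip_wire = 2 * one_way                # about 25 us before repeater delays

min_frame_bits = round(SLOT_TIME * BIT_RATE) # bits sent during one slot time
min_frame_bytes = min_frame_bits // 8

HEADER_BYTES = 6 + 6 + 2  # dest addr + src addr + type (preamble excluded)
CRC_BYTES = 4
min_data_bytes = min_frame_bytes - HEADER_BYTES - CRC_BYTES

print(f"one-way {one_way * 1e6:.1f} us, wire RTT {round_trip_wire * 1e6:.1f} us")
print(f"minimum frame {min_frame_bits} bits = {min_frame_bytes} bytes, "
      f"minimum data {min_data_bytes} bytes")
```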
121
Retry After the Collision
• How long should a host wait to retry after a collision?
– Binary exponential backoff
• Maximum backoff doubles with each failure (exponential)
• After N failures, pick an N-bit number
• 2^N discrete possibilities from 0 to the maximum
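A minimal sketch of binary exponential backoff as described above. The 51.2 µs slot comes from the minimum-frame slide; the cap of 10 on the exponent and the limit of 16 attempts are assumptions taken from classic 802.3 practice, not from this slide.

```python
import random

SLOT_TIME_US = 51.2   # microseconds per backoff slot on 10 Mbps Ethernet
MAX_EXPONENT = 10     # assumption: N stops growing after 10 failures
MAX_ATTEMPTS = 16     # assumption: give up after this many failures

def backoff_delay_us(failures: int, rng: random.Random) -> float:
    """Delay before the next retry after `failures` collisions on this frame."""
    if failures >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions; report an error to the host")
    n = min(failures, MAX_EXPONENT)
    k = rng.randrange(2 ** n)        # one of 2**n values: 0 .. 2**n - 1
    return k * SLOT_TIME_US

rng = random.Random(1)
print([backoff_delay_us(n, rng) for n in range(1, 5)])
```

Passing an explicit `random.Random` keeps the sketch reproducible in tests.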
122
Ethernet Frame Reception
• Sender handles all access control
• Receiver simply pulls frames from the network
• Ethernet controller/card
– Sees all frames
– Selectively passes frames to the host processor
123
Experience With Ethernet
• Number of hosts limited to 200 in practice, standard allows 1024
• Range much shorter than 2.5 km limit in standard
• Round-trip time is typically 5 or 10 μs, not 50μs
124
Token Ring Overview
• Token Ring network “was” a candidate to replace Ethernet; used in some MAN backbones
– 16 Mbps IEEE 802.5 (based on earlier 4 Mbps IBM ring)
– 100 Mbps Fiber Distributed Data Interface (FDDI)
125
IBM Token Ring – IEEE 802.5
• Ring is viewed as a single shared medium
– Each node is allowed to transmit according to some distributed algorithm for medium access
– All nodes see all frames; the destination saves a copy of the frame as it flows past
• The term “token” indicates the way access to the shared channel is managed
126
Token in a Token Ring
• Token is a special bit pattern that rotates around the ring
– A node must capture the token before transmitting
– A node releases the token after it is done transmitting
• Immediate release: token follows the last frame (FDDI)
• Delayed release: token is released after the last frame returns to the sender
127
Token in a Token Ring
• Remove your frame when it comes back around
– Transmit another frame or re-insert the token
• Stations get round-robin service as the token circulates around the ring
128
Physical Properties
• Data rate can be 4 Mbps or 16 Mbps
• Encoding of bits uses differential Manchester
• Ring may have up to 250 (802.5) or 260 (IBM) nodes
• Physical medium is twisted pair (IBM Token Ring)
129
Token Ring MAC
• Network adaptor contains receiver, transmitter and some storage of bits between them
• Token circulates if no station has anything to send
– Ring must have enough capacity to store the entire token
– At least 24 stations with 1-bit storage are needed for a 24-bit token (if propagation delay is negligible)
– This situation is avoided by designating a monitor, which adds extra bits of delay when needed
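The storage requirement can be estimated by adding one bit per station to the bits in flight on the wire. A sketch, assuming 1 bit of delay per station and a propagation speed of 2 × 10^8 m/s (the figure used on the Ethernet slides); the helper names are hypothetical.

```python
import math

TOKEN_BITS = 24  # token length given on the slide

def ring_latency_bits(stations: int, bits_per_station: int, ring_length_m: float,
                      rate_bps: float, propagation_speed: float = 2e8) -> float:
    """Bits the ring can hold: per-station buffering plus bits on the wire."""
    on_wire = ring_length_m / propagation_speed * rate_bps
    return stations * bits_per_station + on_wire

def monitor_padding_bits(latency_bits: float) -> int:
    """Extra delay (in bits) a monitor would add so the whole token fits."""
    return max(0, math.ceil(TOKEN_BITS - latency_bits))

# 10 stations, negligible cable length, 4 Mbps ring
print(monitor_padding_bits(ring_latency_bits(10, 1, 0, 4e6)))  # 14
```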
130
Token Ring MAC
• Any station that has data to send can seize the token
• In 802.5, the station simply modifies 1 bit in the second byte of the token
• First two bytes of the modified token become the preamble for the next frame
131
Frame Format
• “Illegal” Manchester codes in the start and end delimiters
• Frame priority and reservation bits in the access control byte
• Demux key in the frame control byte
• A and C bits for reliable delivery, in the frame status byte
Frame layout (field sizes in bits): Start delimiter (8) | Access control (8) | Frame control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End delimiter (8) | Frame status (8)
132
Timed Token Algorithm
• Token Holding Time (THT)
– Upper limit on how long a station can hold the token
– Before putting each frame on the ring, a node checks that its transmit time would not cause THT to be exceeded
– Long THT achieves better utilization with few senders
– Short THT helps when multiple nodes have data to send
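The THT rule amounts to a check before each frame. A sketch with a hypothetical function name and an arbitrary time unit; it assumes each queued frame's transmit time is known in advance.

```python
def frames_within_tht(frame_times, tht):
    """Count queued frames a station may send before the THT would be exceeded."""
    elapsed = 0.0
    sent = 0
    for t in frame_times:
        # Check before putting each frame on the ring.
        if elapsed + t > tht:
            break
        elapsed += t
        sent += 1
    return sent

print(frames_within_tht([3, 3, 3], 8))  # 2
```

A longer THT lets a lone sender push more frames per token visit; a shorter one returns the token sooner when several stations are waiting.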
133
Reliable Delivery
• The A and C bits in the frame trailer (frame status byte) provide reliability
• Both bits are initially set to 0
• Destination sets A bit if it sees the frame and sets C bit if it copies the frame into its adaptor
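A sketch of how a sender might interpret the A and C bits when its frame returns around the ring. The A = 1, C = 0 case (frame seen but not copied, for example for lack of buffer space) follows the usual 802.5 reading; the function name is hypothetical.

```python
def delivery_status(a_bit: int, c_bit: int) -> str:
    """Interpret the A and C bits of a frame that has returned to its sender."""
    if a_bit == 0:
        return "destination absent or not functioning"
    if c_bit == 0:
        return "destination saw the frame but did not copy it"
    return "frame delivered"

print(delivery_status(1, 0))
```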
134
Token Ring Packet Priorities
• A station wishing to send a priority-n packet can set the reservation bits to n if their current value is lower than n
– It captures the token when the current sender releases it with priority set to n
• Strict priority scheme: no lower-priority packets get sent when higher-priority packets are waiting
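The reservation rule can be sketched as two small checks. This assumes priorities are plain integers (802.5 uses 3-bit values, 0 to 7) and that a station may seize a token whose priority is no higher than that of its packet; the function names are hypothetical.

```python
def reserve(reservation_bits: int, my_priority: int) -> int:
    """Raise the reservation field to my_priority only if its current value is lower."""
    return my_priority if reservation_bits < my_priority else reservation_bits

def can_seize(token_priority: int, my_priority: int) -> bool:
    """A station may seize the token only for a packet of at least the token's priority."""
    return my_priority >= token_priority

r = reserve(0, 4)   # a priority-4 station reserves
r = reserve(r, 2)   # a priority-2 station cannot lower the reservation
print(r)            # 4
```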
135
Token Maintenance
• Token rings have a designated monitor node
• Any station can become the monitor according to a well defined procedure
• Monitor is elected when the ring is first connected, or when the current monitor fails
136
Token Maintenance
• Monitor periodically announces its presence
• Claim token is sent by a station that sees no monitor
– If the sender receives back its claim token, it becomes the monitor
– If another station is also contending to be monitor, some well-defined rule (such as “highest address wins”) decides
137
Fiber Distributed Data Interface
• Similar to 802.5/IBM token rings but runs on fiber
• Consists of a dual ring: two independent rings that transmit data in opposite directions at 100 Mbps
• Tolerates a single link break or node failure (self-healing ring)
[Figure (a), (b): dual ring in normal operation and after a failure]
138
FDDI – Physical Properties
• Variable-size buffer (9–80 bits) between input and output interfaces (10 ns bit time)
– Not required to fill the buffer before starting transmission
• Maximum of 500 stations, maximum 2 km distance between any pair of stations
139
FDDI – Physical Properties
• Total 200 km fiber: dual nature implies 100 km cable connecting all stations
• Physical media can also be coax or twisted pair cable
• Uses 4B/5B encoding
140
Timed Token Algorithm
• Token Holding Time (THT)
– Upper limit on how long a station can hold the token
– Configured to some suitable value
• Token Rotation Time (TRT)
– How long it takes the token to traverse the ring (time since a host released the token)
– TRT ≤ ActiveNodes × THT + RingLatency
141
Timed Token Algorithm
• Target Token Rotation Time (TTRT)
– “Agreed-upon” or negotiated upper bound on TRT
142
MAC Algorithm
• Each node measures TRT between successive token arrivals
• If measured-TRT > TTRT
– Token is late
– Cannot send (asynchronous) data
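A sketch of the per-node timed-token decision for asynchronous data. The slide shows only the late-token case; the early-token branch (hold the token for TTRT minus measured TRT) is the usual FDDI rule and is included here as an assumption, as is treating the first token visit as on time. The class name is hypothetical.

```python
class TimedTokenNode:
    """Hypothetical FDDI node measuring TRT between successive token arrivals."""

    def __init__(self, ttrt: float):
        self.ttrt = ttrt
        self.last_arrival = None

    def on_token(self, now: float) -> float:
        """Return the time this node may spend sending asynchronous data."""
        if self.last_arrival is None:
            measured_trt = self.ttrt      # assumption: first visit counts as on time
        else:
            measured_trt = now - self.last_arrival
        self.last_arrival = now
        if measured_trt > self.ttrt:
            return 0.0                    # token is late: no asynchronous data
        return self.ttrt - measured_trt   # token is early: hold it for the slack

node = TimedTokenNode(ttrt=4.0)
print([node.on_token(t) for t in (0.0, 3.0, 8.0)])  # [0.0, 1.0, 0.0]
```

Synchronous traffic is not gated by this check, as the next slide notes.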
143
FDDI Traffic Classes
• Synchronous traffic
– Latency sensitive
– Gets higher priority
– Can always send data
144
Bounded Priority Traffic
• If a node has a large amount of synchronous data
– It will send regardless of measured TRT
– TTRT will become meaningless
• Therefore, the total synchronous data sent during one token rotation is bounded by TTRT
145
Token Maintenance
• The procedure used when a node
– Joins the ring (startup)
– Suspects a failure
• Claim frame is used in order to
– Generate a new token
– Agree on TTRT (so that an application can meet its timing constraints)
• A node can send a claim frame without holding the token
146
Frame Format
• 4B/5B control symbols for start and end of frame
• Control field
– 1st bit: asynchronous (0) versus synchronous (1) data
– 2nd bit: 16-bit (0) versus 48-bit (1) addresses
– Last 6 bits: demux key (includes reserved patterns for token and claim frame)
• Status field
– From receiver back to sender; error in frame
– Recognized address; accepted frame (flow control)
Frame layout (field sizes in bits): Start of frame (8) | Control (8) | Dest addr (48) | Src addr (48) | Body (variable) | CRC (32) | End of frame (8) | Status (24)
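Decoding the FDDI control field as described on this slide. This sketch assumes the “1st bit” is the most significant bit of the byte; the function name is hypothetical.

```python
def decode_fddi_control(control: int) -> dict:
    """Split the 8-bit FDDI control field (assumes '1st bit' = most significant)."""
    if not 0 <= control <= 0xFF:
        raise ValueError("control field is one byte")
    return {
        "synchronous": bool(control & 0x80),     # 0 = asynchronous, 1 = synchronous
        "long_addresses": bool(control & 0x40),  # 0 = 16-bit, 1 = 48-bit addresses
        "demux_key": control & 0x3F,             # last 6 bits
    }

print(decode_fddi_control(0b1100_0101))
```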
147
Wireless LANs
• IEEE 802.11 standard
– Designed for use in a small area (offices, campuses)
• Bandwidth: 1, 2 or 11 Mbps
– Up to 54 Mbps in the newer 802.11a standard
• Targets three physical media
– Two spread spectrum radio (2.4 GHz frequency)
– One diffused infrared (10 m range, 850 nm band)
148
802.11 MAC: CSMA/CA
• Similar to Ethernet …
– Defer the transmission until the link becomes idle
– Back off if a collision occurs
• Is it sufficient?
• All nodes are not always within reach of (able to hear) each other
149
Hidden and Exposed Nodes
• Hidden nodes
– Sender thinks it’s OK to send when it’s not (false positive)
– A-C and B-D are hidden nodes in the figure below
• Exposed nodes
– Sender does not send when it’s OK to send (false negative)
– B and C are exposed nodes in the figure below
[Figure: wireless nodes A, B, C and D placed in a row]
150
Multiple Access with Collision Avoidance (MACA)
• Sender transmits a RequestToSend (RTS) frame
– Contains the intended time to hold the medium
• Receiver replies with a ClearToSend (CTS) frame
151
MACA for Wireless (MACAW)
• Collision detection
– No active collision detection
– Known only if CTS or ACK is not received
– Binary exponential backoff (BEB) is used in case of collision, as in Ethernet
152
802.11 - Distribution System
• Nodes roam freely but operate within a structure
– Tethered by wired network infrastructure (Ethernet?)
– Each Access Point (AP) services nodes in some region
– Each mobile node associates itself with an AP
153
Managing Connectivity/Roaming
• How do wireless nodes select an Access Point?
• Scanning (active search for an AP)
– Node sends a Probe frame
– All APs within reach reply with a Probe Response frame
– Node selects one AP and sends it an Association Request frame
– AP replies with an Association Response
– New AP informs old AP via the wired backbone
154
Managing Connectivity
• Active scanning: when a node joins or moves
• Passive scanning: AP periodically sends a Beacon frame, advertising its capabilities
[Figure: access points AP-1, AP-2 and AP-3 on a distribution system, with mobile nodes A–H]
155
Frame Format
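• Control field contains three subfields:
– 6-bit Type field (data, RTS, CTS, scanning);
– 1-bit ToDS; and
– 1-bit FromDS
• A single frame contains up to 2312 bytes of data
Frame layout (field sizes in bits): Control (16) | Duration (16) | Addr1 (48) | Addr2 (48) | Addr3 (48) | SeqCtrl (16) | Addr4 (48) | Payload (0–18,496) | CRC (32)
Addresses in the example:
– ToDS=0, FromDS=0: Addr1 = C, Addr2 = A
– ToDS=1, FromDS=1: Addr1 = E, Addr2 = AP-3, Addr3 = AP-1, Addr4 = A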
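The two DS-bit cases in the example can be captured as a lookup of address roles. A sketch: the role names (ultimate destination, immediate sender, intermediate destination, original source) follow the common textbook reading of the four-address case, and the mixed ToDS/FromDS cases are left out because the slide does not cover them.

```python
def address_roles(to_ds: int, from_ds: int, addrs: list) -> dict:
    """Map Addr1..Addr4 to their roles for the two cases shown on the slide."""
    if (to_ds, from_ds) == (0, 0):
        # Direct node-to-node transfer: only Addr1 and Addr2 are meaningful.
        return {"destination": addrs[0], "source": addrs[1]}
    if (to_ds, from_ds) == (1, 1):
        # Frame crossed the distribution system between two APs.
        return {
            "ultimate_destination": addrs[0],
            "immediate_sender": addrs[1],
            "intermediate_destination": addrs[2],
            "original_source": addrs[3],
        }
    raise NotImplementedError("mixed ToDS/FromDS cases are not covered here")

print(address_roles(1, 1, ["E", "AP-3", "AP-1", "A"]))
```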
156
Overview
• Also called network interface card (NIC)
• Components (high-level overview)
• Options for use
– Data motion
– Event notification
• Potential performance bottlenecks
• Programming device drivers
157
Typical Workstation Architecture
[Figure: CPU with cache and memory on the memory bus; network adaptor on the I/O bus, connecting to the network. The adaptor is typically where data link functionality is implemented.]
158
Components of a Network Adaptor
• Bus interface communicates with a specific host
– Bus defines the protocol for CPU-adaptor communication
• Link interface speaks the correct protocol on the network
– Implemented by a chip set, in software or on an FPGA
• Buffering between different-speed bus and link
[Figure: network adaptor with a bus interface on the host I/O bus and a link interface on the network]
159
Host Perspective
• Adaptor is ultimately programmed by CPU
• Adaptor exports a Control Status Register (CSR)
• CSR is readable and writable from CPU at some memory address
160
Data Motion Options for Network Adaptor Use
• Transfer frames between adaptor and host memory
• Programmed input/output (PIO)
– Processor itself manages each access (loads/stores)
– Faster than DMA for small amounts of data
161
Data Motion Options for Network Adaptor Use
• Direct memory access (DMA)
– Adaptor is given buffer descriptor lists by the host for read/write
– Processor is not involved: free to do other things
– Can be faster than memory copy through the CPU
– Has a start-up cost
162
Data Motion
[Figure: CPU, cache, memory and network adaptor on the memory and I/O buses, showing the data movement path using PIO and the data movement path using DMA]
163
Network Adaptor: Event Notification
• Hardware interrupts
– Processor free to do other things
– Events delivered “immediately”
– State (register) save/restore expensive
– Context switches more expensive
164
Network Adaptor: Event Notification
• Event polling
– Processor must periodically check
– Events wait until next check
– No extra state changes
165
Device Drivers
• Operating system routines anchoring protocol stack to network hardware
• Initialize device, transmit frames, field interrupts
• Code contains device-specific details
– Difficult to read but simple in logic
166
Performance Bottlenecks
• Link capacity
• Processor computing power
• I/O bus bandwidth
– Overhead involved in each bus transfer
167
Performance Bottlenecks
• Memory bus bandwidth
– Memory hierarchy with cache levels
– Memory accesses result in multiple memory copies in different buffers
168
Packet Switches
• A multi-input, multi-output device
• Local star topology
• Performance independent of connectivity (e.g. adding a new host) if the switch is designed with enough aggregate capacity
• Maximum degree < physical network limit
169
Forwarding
• Packets arrive at one of several inputs and have to be forwarded/switched to one of the available outputs
– Connectionless and connection-oriented approaches to determine the correct output
Which way should it go ?
First challenge: forwarding
170
Routing
• Forwarding requires information
Second challenge: routing
How to maintain forwarding information ?
171
Contention and Congestion
• If the arrival rate for a certain output is greater than the output capacity, contention occurs
• If the arrival rate of packets is so high that it causes buffer overflow, congestion occurs
Who goes first?
Is anyone dropped?
172
Network Layers and Switches
[Figure: full protocol stack on the host (Application, Presentation, Session, Transport, Network, Data Link, Physical; user level and OS kernel) and Network/Data Link/Physical layers on one or more switches within the network, which may sit between different physical layers]
173
Packet Switching / Forwarding
• Three approaches
– Datagram or connectionless approach
– Virtual circuit or connection-oriented approach
– Source routing
• Important notion: unique global address per host
174
Datagram Switching / Forwarding
• Every packet contains enough information
– Enables the switch to decide how to forward it
• Switch translates global address to output port
– Maintains a forwarding table for translations
• Each packet is forwarded and travels independently
175
Datagram Switching
• Managing tables in large, complex networks with dynamically changing topologies is a real challenge for the routing protocol
[Figure: Switches 1, 2 and 3 (ports 0–3) interconnecting hosts A–H]
Forwarding table at switch 1:
Dest   Port#/Interface
A      2
B      1
C      3
D      0
E      1
…      …
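A minimal sketch of datagram forwarding at switch 1, using only the table entries listed on this slide; the elided “…” rows are not filled in, so lookups for other hosts fail. The function name is hypothetical.

```python
# Forwarding table at switch 1, copied from the slide (elided rows omitted).
FORWARDING_TABLE = {"A": 2, "B": 1, "C": 3, "D": 0, "E": 1}

def output_port(dest_host: str) -> int:
    """Translate a destination host's global address to an output port."""
    if dest_host not in FORWARDING_TABLE:
        raise LookupError(f"no forwarding entry for host {dest_host}")
    return FORWARDING_TABLE[dest_host]

print(output_port("C"))  # 3
```

Every packet carries its full destination address, so each one is looked up independently.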
176
Datagram Model
• No round-trip time delay waiting for connection setup
– Host can send data anywhere, anytime, as soon as it is ready
– Source has no way of knowing if the network is capable of delivering a packet or if the destination host is even up
• Packets are treated independently
– Possible to route around link and node failures dynamically
177
Virtual Circuit Switching
• Explicit connection setup (and tear-down) phase from source to destination: connection-oriented model
– Subsequent packets follow the established circuit
• Supporting “connections” in the network layer may be useful for service notions
178
VC Tables in VC Switching
• Setup message in the signaling process (to create VC tables) is forwarded like a datagram
• Acknowledgment of connection setup is passed back hop by hop toward the source to complete signaling
– Data transfer phase can start after the ACK is received
179
Signaling in VC Switching
• Setup message is forwarded from Host A to Host B
• On connection request, each switch creates an entry in VC table with a VCI for the connection
[Figure: setup message for host B forwarded from Host A through Switch 1, Switch 2 and Switch 3. VC table entries (incoming I/F, incoming VCI, outgoing I/F) created along the way: (2, 5, 1), (2, 7, 3), (3, 9, 0); outgoing VCIs not yet shown]
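VC forwarding reduces to a lookup on (incoming interface, incoming VCI) followed by a VCI rewrite. A sketch with a hypothetical class; the outgoing VCI 11 in the usage example is made up, since the figure does not show outgoing VCIs.

```python
class VCSwitch:
    """Hypothetical VC switch: (in_if, in_vci) -> (out_if, out_vci)."""

    def __init__(self):
        self.table = {}

    def setup(self, in_if, in_vci, out_if, out_vci):
        # Entry created during signaling.
        self.table[(in_if, in_vci)] = (out_if, out_vci)

    def forward(self, in_if, in_vci, payload):
        # Look up the circuit and rewrite the VCI for the next hop.
        out_if, out_vci = self.table[(in_if, in_vci)]
        return out_if, out_vci, payload

    def teardown(self, in_if, in_vci):
        # Free the table entry when the connection is torn down.
        del self.table[(in_if, in_vci)]

s = VCSwitch()
s.setup(2, 5, 1, 11)             # outgoing VCI 11 is hypothetical
print(s.forward(2, 5, b"data"))  # (1, 11, b'data')
```

Because VCIs are only locally significant, each switch can pick its own values, which keeps headers short compared with global addresses.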
180
Virtual Circuit Model
• Typically wait a full RTT for connection setup before sending the first data packet
– Cannot route around failures dynamically; must re-establish the connection (the old one is torn down to free storage space)
181
Source Routing
• Packet header contains the sequence of addresses/ports on the path from source to destination
– One direction per switch: port, next switch (absolute)
– Switches read, use, and then discard directions
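The read, use, discard behavior can be sketched as stripping the first entry of the header at each hop. The port numbers below are hypothetical, and stripping is only one option (rotating the list is another).

```python
def switch_forward(route):
    """Read the first direction, use it as the output port, discard it."""
    if not route:
        raise ValueError("source route exhausted")
    out_port, *remaining = route
    return out_port, remaining

def walk(route):
    """Return the output port chosen at each switch along the path."""
    ports = []
    while route:
        port, route = switch_forward(route)
        ports.append(port)
    return ports

print(walk([3, 0, 1]))  # [3, 0, 1]
```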
182
Data Transfer in Source Routing
• Analogous to following directions
[Figure: data sent from Host A to Host B through Switches 1, 2 and 3, with the list of output ports carried in the packet header and consumed hop by hop]
183
Source Routing Model
• Source host needs to know the correct and complete topology of the network
– Changes must propagate to all hosts
• Packet headers may be large and variable in size: the length is unpredictable
184
Implementation and Performance
• Packet arriving at interface 1 has to go out on interface 2
• Point of contention for packets: I/O and memory bus
[Figure: CPU and main memory connected over the I/O bus to interfaces 1, 2 and 3]
185
Building Extended LANs
• Traditional LAN
– Shared medium (e.g. Ethernet)
– Cheap, easy to administer
– Supports broadcast traffic
• Problem
– Want to scale the LAN concept
• Larger geographic area (greater than O(1 km))
• More hosts (greater than O(100))
– But retain LAN-like functionality
• Solution: bridges
186
Bridges
• Connect two or more LANs with a bridge
– Transparently extends a LAN over multiple networks
– Accept & forward strategy (in promiscuous mode)
– Level 2 connection (does not add a packet header)
[Figure: bridge with hosts A, B, C on port 1 and hosts X, Y, Z on port 2]
187
Learning Bridges
• Learn table entries based on source address
– Time out entries to allow movement of hosts
• Table is an optimization; it need not be complete
• Always forward broadcast frames
• Uses datagram or connectionless forwarding
[Figure: bridge with hosts A, B, C on port 1 and hosts X, Y, Z on port 2]
Host  Port
A     1
B     1
C     1
X     2
Y     2
Z     2
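A minimal learning-bridge sketch following the bullets above: learn from source addresses, time entries out, flood unknown and broadcast destinations, and filter frames whose destination sits on the arrival port. The 300-second timeout and the BROADCAST marker are assumptions; the clock is injectable so the behavior can be tested.

```python
import time

BROADCAST = "ff:ff:ff:ff:ff:ff"  # assumption: broadcast address marker

class LearningBridge:
    def __init__(self, ports, timeout_s=300.0, clock=time.monotonic):
        self.ports = set(ports)
        self.timeout_s = timeout_s       # assumption: entry lifetime
        self.clock = clock
        self.table = {}                  # host -> (port, time last seen)

    def receive(self, src, dst, in_port):
        """Return the list of ports the frame should be sent out on."""
        now = self.clock()
        self.table[src] = (in_port, now)           # learn from the source address
        entry = self.table.get(dst)
        stale = entry is not None and now - entry[1] > self.timeout_s
        if dst == BROADCAST or entry is None or stale:
            return sorted(self.ports - {in_port})  # flood on all other ports
        out_port = entry[0]
        return [] if out_port == in_port else [out_port]  # filter or forward
```

An incomplete table only costs extra flooding, which is why the slide calls it an optimization.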
188
Learning Bridges
• Problem
– Redundancy (desirable to handle failures, but …)
– Makes extended LAN structure cyclic
– Frames may cycle forever
• Solution: spanning tree
[Figure: extended LAN with LANs A–K interconnected by bridges B1–B7, containing loops]
189
Spanning Tree
• Subset of forwarding possibilities
• All LANs reachable, but
• Acyclic
• Bridges run a distributed algorithm to calculate the spanning tree
– Select which bridges actively forward
– Developed by Radia Perlman of DEC
– Now IEEE 802.1 specification
– Reconfigurable algorithm
190
Spanning Tree Algorithm
• All designated bridges forward frames
– On all designated ports
– On the preferred port (path leading to root)
[Figure: spanning tree over LANs A–K and bridges B1–B7, marking designated ports, preferred ports and designated bridges]
191
Distributed Spanning Tree Algorithm
• Bridges exchange configuration messages
– ID of the bridge sending the message
– ID of what the sending bridge believes to be the root bridge
– Distance (hops) from the sending bridge to the root bridge
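Each bridge keeps the best configuration message seen on each port. A sketch of the comparison, assuming the usual ordering: smaller root ID wins, then shorter distance, then smaller sender ID. Names are hypothetical.

```python
from typing import NamedTuple

class Config(NamedTuple):
    root_id: int     # ID of the bridge believed to be root
    distance: int    # hops from the sending bridge to that root
    sender_id: int   # ID of the bridge sending the message

def is_better(new: Config, old: Config) -> bool:
    """True if `new` should replace `old` as the best configuration seen."""
    return ((new.root_id, new.distance, new.sender_id)
            < (old.root_id, old.distance, old.sender_id))

def initial_config(bridge_id: int) -> Config:
    # Every bridge starts out believing it is the root.
    return Config(bridge_id, 0, bridge_id)

print(is_better(Config(1, 3, 5), initial_config(2)))  # True
```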
192
Limitations of Bridges
• Do not scale
– Spanning tree algorithm does not scale
– Broadcast does not scale
• Do not accommodate heterogeneity
– Only supports networks with the same address formats
193
ATM (Asynchronous Transfer Mode)
• Common in WANs, can also be used in LANs
– Competing technology with Ethernet, but areas of application only partially overlap
• Connection-oriented packet-switched network
– Virtual-circuit routing
• Typically implemented on SONET (other physical layers possible)
194
ATM Signaling
• Connection setup called signaling (standard Q.2931)
• Route discovery, resource reservation, QoS, ...
• Send through network
– Request setup circuit
– Send setup frame on setup circuit
• Establish locally
– No intermediate switch involvement
– Requires pre-established virtual path
195
Cell Switching (ATM)
• Fixed-length (53-byte) frames are called cells
– 5-byte header (including a 1-byte CRC-8) + 48-byte payload
• Standard defines 3 layers (5 sublayers)
– Layers interface to physical media and to higher layers (e.g. encapsulating variable-length frames)
196
Cell Switching (ATM)
• 2-level connection hierarchy
– Virtual circuits
– Virtual paths
• Bundles of virtual circuits
• Travel along common route
• Reduces forwarding information
197
ATM Cell Format
• User-Network Interface (UNI)
– Host-to-switch format
– GFC: Generic Flow Control (still being defined)
– VCI/VPI: Virtual Circuit/Path Identifier
– Type: management, congestion control, AAL5 (later)
– CLP: Cell Loss Priority
– HEC: Header Error Check (CRC-8)
• Network-Network Interface (NNI)
– Switch-to-switch format
– GFC becomes part of VPI field
UNI cell layout (field sizes in bits): GFC (4) | VPI (8) | VCI (16) | Type (3) | CLP (1) | HEC (8) | Payload (384 = 48 bytes)
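A sketch that packs the UNI header fields into 5 bytes. The HEC here is a CRC-8 with generator x^8 + x^2 + x + 1 over the first four header bytes; ITU-T I.432 additionally XORs the result with 0x55, which this sketch assumes. Function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, generator x^8 + x^2 + x + 1 (0x07), MSB first, zero init."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def pack_uni_header(gfc: int, vpi: int, vci: int, pt: int, clp: int) -> bytes:
    """Pack a 5-byte UNI cell header: GFC(4) VPI(8) VCI(16) Type(3) CLP(1) HEC(8)."""
    for value, bits, name in ((gfc, 4, "GFC"), (vpi, 8, "VPI"), (vci, 16, "VCI"),
                              (pt, 3, "Type"), (clp, 1, "CLP")):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pt << 1) | clp
    first4 = word.to_bytes(4, "big")
    hec = crc8(first4) ^ 0x55   # assumption: I.432 coset added to the CRC
    return first4 + bytes([hec])

print(pack_uni_header(0, 1, 5, 0, 1).hex())
```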
198
Segmentation and Reassembly
• ATM Adaptation Layer (AAL)
– Application to ATM cell mapping
– AAL header contains information for reassembly
– AAL1, AAL2 for applications needing guaranteed rate
– AAL3/4 designed for variable-length packet data
– AAL5 is an alternative standard for packet data
[Figure: AAL over ATM at sender and receiver; a packet is segmented into a stream of cells and reassembled]
199
ATM Layers
• ATM Adaptation Layer (AAL)
– Convergence Sublayer (CS) supports different application service models
– Segmentation and Reassembly (SAR) supports variable-length frames
• ATM Layer
– Handles virtual circuits, cell header generation, flow control
• Physical layer
– Transmission Convergence (TC) handles error detection, framing
– Physical medium dependent (PMD) sublayer handles encoding
[Figure: layer stack: AAL (CS, SAR) over ATM over PHY (TC, PMD)]
200
AAL 3/4
• Provides information to allow variable-size packets to be sent in fixed-size ATM cells
• Convergence Sublayer Protocol Data Unit (CS-PDU)
– CPI: Common Part Indicator (version field)
– Btag/Etag: beginning and ending tags (same value)
– BAsize: hint on reassembly buffer space to allocate
– Length: size of the whole PDU
• Segmented into cells: header/trailer + 44-byte data
CS-PDU layout (field sizes in bits): CPI (8) | Btag (8) | BAsize (16) | User data (< 64 KB) | Pad (0–24) | 0 (8) | Etag (8) | Length (16)
201
ATM Cell Format for AAL 3/4
• Type (is-start? and is-end? bits)
– BOM (10): Beginning Of Message
– COM (00): Continuation Of Message
– EOM (01): End Of Message
– SSM (11): Single-Segment Message
• SEQ: Sequence Number (for cell loss/reordering)
• MID: multiplexing ID (mux onto virtual circuits)
• Length: number of bytes of PDU in this cell
Cell layout (field sizes in bits): ATM header (40) | Type (2) | SEQ (4) | MID (10) | Payload (352 = 44 bytes) | Length (6) | CRC-10 (10)
202
Encapsulation and Segmentation for AAL3/4
[Figure: the CS-PDU (header, user data < 64 KB, trailer) is cut into 44-byte pieces, the last one shorter than 44 bytes; each piece is carried in a cell with an ATM header, an AAL header, an AAL trailer and, if needed, padding]
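The number of cells needed for one AAL3/4 CS-PDU can be estimated as follows. A sketch assuming a 4-byte CS-PDU header (CPI, Btag, BAsize), a 4-byte trailer (0, Etag, Length), and a pad that aligns the trailer on a 4-byte boundary, consistent with the 0–24-bit pad field; 44 bytes of PDU ride in each cell.

```python
import math

CS_HEADER_BYTES = 4   # CPI (8) + Btag (8) + BAsize (16) bits
CS_TRAILER_BYTES = 4  # 0 (8) + Etag (8) + Length (16) bits
CELL_DATA_BYTES = 44  # CS-PDU bytes carried per AAL3/4 cell

def aal34_cells(user_bytes: int) -> tuple:
    """Return (pad_bytes, cells) for one CS-PDU."""
    if not 0 < user_bytes < 64 * 1024:
        raise ValueError("user data must be between 1 byte and 64 KB")
    pad = (-user_bytes) % 4          # assumption: align trailer to 4 bytes
    pdu = CS_HEADER_BYTES + user_bytes + pad + CS_TRAILER_BYTES
    return pad, math.ceil(pdu / CELL_DATA_BYTES)

print(aal34_cells(1000))  # (0, 23)
```

Each of those cells also carries a 5-byte ATM header and 4 bytes of AAL header/trailer, which is why AAL3/4 overhead is often criticized.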
203
AAL 5 CS-PDU
• CS-PDU format
– Pad so the trailer always falls at the end of an ATM cell
– Length: size of PDU (data only)
– CRC-32 (detects missing or misordered cells)
• Cell format
– End-of-PDU bit in Type field of ATM header
CS-PDU layout: Data (< 64 KB) | Pad (0–47 bytes) | Reserved (2 bytes) | Length (2 bytes) | CRC-32 (4 bytes)
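For AAL5, the pad is chosen so that data, pad and the 8-byte trailer fill a whole number of 48-byte cells. A sketch with a hypothetical function name:

```python
AAL5_TRAILER_BYTES = 8  # reserved (2) + length (2) + CRC-32 (4)
CELL_PAYLOAD_BYTES = 48

def aal5_layout(user_bytes: int) -> tuple:
    """Return (pad_bytes, cells) so the trailer ends exactly at a cell boundary."""
    if not 0 < user_bytes < 64 * 1024:
        raise ValueError("user data must be between 1 byte and 64 KB")
    pad = (-(user_bytes + AAL5_TRAILER_BYTES)) % CELL_PAYLOAD_BYTES  # 0..47
    cells = (user_bytes + pad + AAL5_TRAILER_BYTES) // CELL_PAYLOAD_BYTES
    return pad, cells

print(aal5_layout(41))  # (47, 2)
```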
204
Encapsulation and Segmentation for AAL 5
[Figure: user data plus padding and the CS-PDU trailer is cut into 48-byte cell payloads, each sent with an ATM header]
205
Virtual Paths with ATM
• Two-level hierarchy of virtual connections: 8-bit VPI and 16-bit VCI
– Switches in the public network use the 8-bit VPI
– Corporate sites use the full 24-bit address (VPI + VCI)
– Much less connection-state information in switches
– Virtual path: fat pipe with a bundle of virtual circuits
[Figure: Network A and Network B connected through a public network]
206
ATM as a LAN Backbone
• Different from traditional LANs: no native support for broadcast or multicast
[Figure: hosts H1–H7 connected via Ethernet switches E1–E3 and an ATM switch, showing ATM links, Ethernet links and an ATM-attached host]
207
Shared Ethernet Emulation with LANE
• All hosts think they are on the same Ethernet
[Figure: groups of hosts on Ethernet switches, each connected through a LANE/Ethernet adaptor card to an ATM switch]
208
ATM / LANE Protocol Layers
[Figure: protocol layers. Each host runs higher-layer protocols (IP, ARP, ...) over an Ethernet-like interface, then Signalling + LANE, AAL5, ATM and PHY; the switch between them runs ATM over PHY]
209
Clients and Servers in LANE
• LAN Emulation Client (LEC)
– Host, bridge, router or switch
• LAN Emulation Server (LES)
– Maintains clients’ MAC and ATM addresses
– Maintains ATM address of the BUS
210
Clients and Servers in LANE
• LAN Emulation Configuration Server (LECS)
– High-level network management when an LEC starts up
– Reachable by a preset VC (recall well-known server port numbers)
– Maintains mapping of ATM address to LANE type
211
Clients and Servers in LANE
• Broadcast and Unknown Server (BUS)
– Emulates broadcast and multicast; critical to LANE
– Uses a point-to-multipoint VC with all clients
• Servers physically located in one or more devices
[Figure: hosts H1 and H2, LECS, LES and BUS on an ATM network, linked by point-to-point VCs and a point-to-multipoint VC]
212
LANE Registration
1. Client contacts LECS on a predefined VC, and sends its ATM address to it
2. LECS returns LAN type, MTU and ATM address of LES
3. Client signals connection to LES, and registers MAC and ATM addresses with LES
4. LES returns ATM address of BUS
5. Client signals connection to BUS
6. BUS adds client to the point-to-multipoint VC
[Figure: hosts H1, H2, H3 with LECS, LES and BUS on an ATM network]
213
LANE Circuit Setup
1. Client (H1) knows the destination MAC address of the receiver (H2)
2. Client (H1) sends the 1st packet to the BUS
3. BUS sends an address resolution request to the LES
4. LES returns H2’s ATM address to the client (H1)
5. Client (H1) signals a connection to H2 for subsequent packets
[Figure: hosts H1, H2, H3 with LECS, LES and BUS on an ATM network]
Contention in Switches
• Some packets destined for the same output
– One goes first
– Others delayed or dropped
• Delaying packets requires buffering
– Finite capacity, some packets must still drop
– At inputs
• Increases/adds false contention
• Sometimes necessary
– At outputs
– Can also exert “backpressure”
215
Output Buffering
[Figure: 1x6 switch analogy: standard check-in lines and a customer service line; you are trying to check in, Mr. X is writing a complaint letter, and Mr. A is waiting to claim a refund of Rs. 100]
216
Input Buffering: Head-of-line Blocking
[Figure: the same 1x6 switch analogy with input queuing: you are trying to check in behind Mr. X writing a complaint letter and Mr. A waiting to claim a refund of Rs. 100, while check-in agents are standing by!]
217
Backpressure
• Propagation delay requires that switch 2 exert backpressure at a high-water mark rather than when its buffer is completely full
• It is thus typically only used in networks with small propagation delays (e.g. switch fabrics)
[Figure: switch 2 tells switch 1 “no more, please”]
218
Switching Fabric
• Special-purpose (switching) hardware
• General problem
– Connect N inputs to M outputs (NxM switch)
– Often N = M (bidirectional links)
• Design goals
– High throughput: want aggregate close to MIN(sum of inputs, sum of outputs)
– Avoid contention (fabric faster than ports)
– Good scalability: linear size/cost growth in N/M
219
Switch: Fabric and Ports
Fabric has the job of delivering packets to the right output
[Figure: four input ports feeding a switch fabric (with small internal buffering) that connects to four output ports]
220
Ports and Fabric
• Ports deal with the complexity of the real world
– Virtual circuit management is handled in ports
– Determine output port using forwarding tables
• Input port is the first potential performance bottleneck
– Header processing and handing the packet to the fabric
221
Design Goals - Throughput
• An n × m switch can provide a maximum ideal throughput of:
$S = S_1 + S_2 + \cdots + S_n$, where $S_i$ is the link speed of input $i$
– Only possible if traffic at inputs is evenly distributed across all outputs
– Sustained throughput higher than the link speed of an output is not possible
222
Design Goals - Scalability
• Cost of hardware rises fast as the number of ports n increases
– Adding ports increases hardware & design complexity
– Scalability in terms of rate of increase in cost
• Design complexity determines maximum switch size
– Switch designs run into problems at some maximum number of inputs and outputs
223
Switch Performance
• Avoid contention with buffering
– Use output buffering when possible
– Apply backpressure through fabric
– Input buffering with “peeking” (non-FIFO semantics) to reduce head-of-line blocking problems
– Drop packets if input buffer overflows
• Good scalability
– O(N) ports
– Port design complexity O(N) gives O(N^2) for switch
– Port design complexity O(1) gives O(N) for switch
224
Crossbar (“Perfect”) Switch
• Problem: hardware scales as O(N^2)
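To put O(N^2) in perspective, a sketch comparing crossbar crosspoints with the number of 2 × 2 elements in a banyan of the kind shown on a later slide, assuming log2 N columns of N/2 elements and N a power of two. A banyan is far cheaper but, unlike a crossbar, is not contention-free for arbitrary traffic.

```python
def crossbar_crosspoints(n: int) -> int:
    """Every input can be connected to every output."""
    return n * n

def banyan_elements(n: int) -> int:
    """Number of 2 x 2 switching elements in an n-input banyan."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1      # log2(n) columns
    return (n // 2) * stages         # n/2 elements per column

for n in (8, 64, 1024):
    print(n, crossbar_crosspoints(n), banyan_elements(n))
```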
225
Knockout Switch: Pick L from N
• Problem: what if more than L arrive?
[Figure: knockout switch with inputs 1–4 and outputs; each output has an 8-to-4 concentrator built from 2 × 2 random selectors and delay units (D)]
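The “pick L from N” behavior at one output can be sketched as below. This is a simplification: the real concentrator runs a tournament of 2 × 2 random selectors, whereas the sketch draws L winners uniformly at random; packets beyond L are dropped. The function name is hypothetical.

```python
import random

def knockout(arrivals, L, rng):
    """Keep at most L of the packets arriving for one output this cycle."""
    if len(arrivals) <= L:
        return list(arrivals), []
    winners = set(rng.sample(range(len(arrivals)), L))
    kept = [p for i, p in enumerate(arrivals) if i in winners]
    dropped = [p for i, p in enumerate(arrivals) if i not in winners]
    return kept, dropped

kept, dropped = knockout(list(range(8)), 4, random.Random(0))
print(kept, dropped)
```

The design bets that more than L simultaneous arrivals for one output are rare, so a small L keeps loss low at much lower cost than a full N-way path.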
226
Shared Memory Switch
[Figure: inputs multiplexed into a shared buffer memory under write control, then demultiplexed to outputs under read control]
227
Self-Routing Fabrics
• Use source routing on a “network” within the switch
• Input port attaches output port number as header
• Fabric routes packet based on output port
• Types
– Banyan Network
– Batcher-Banyan Network
– Sunshine Switch
228
Banyan Network
• Sends 0 bit up, 1 bit down
[Figure: packets addressed to outputs 001, 011, 110 and 111 routed through the banyan, examining address bits from MSB to LSB]
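Self-routing in the banyan means each column inspects one bit of the output-port number. A sketch that lists the per-stage decisions (0 = up, 1 = down), assuming the first column looks at the most significant bit as the MSB/LSB labels suggest; it does not model the wiring between columns, and the function name is hypothetical.

```python
def banyan_route(dest: int, stages: int) -> list:
    """Per-stage decisions for a packet addressed to output `dest`."""
    if not 0 <= dest < (1 << stages):
        raise ValueError("destination does not fit in the given number of bits")
    decisions = []
    for shift in range(stages - 1, -1, -1):        # most significant bit first
        bit = (dest >> shift) & 1
        decisions.append("down" if bit else "up")  # 0 goes up, 1 goes down
    return decisions

print(banyan_route(0b110, 3))  # ['down', 'down', 'up']
```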
229
Batcher (Merge Sort) Network
Routing packets through a Batcher network
• Batcher-Banyan Network
– Attach the two back to back
– Arbitrary unique permutations routed without contention
[Figure: sort and merge stages of a Batcher network putting the inputs 7, 3, 6, 1 into order 1, 3, 6, 7]
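A sketch of a Batcher sorting network in its bitonic form, written as recursive compare-exchange steps. It assumes the number of inputs is a power of two. In a Batcher-Banyan switch, sorting packets by destination first is what lets the banyan route any set of unique destinations without contention.

```python
def bitonic_sort(values, ascending=True):
    """Sort with Batcher's bitonic network; len(values) must be a power of two."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("number of inputs must be a power of two")
    half = n // 2
    first = bitonic_sort(values[:half], True)     # ascending half
    second = bitonic_sort(values[half:], False)   # descending half -> bitonic sequence
    return _bitonic_merge(first + second, ascending)

def _bitonic_merge(values, ascending):
    n = len(values)
    if n <= 1:
        return list(values)
    half = n // 2
    v = list(values)
    for i in range(half):                         # one column of compare-exchange elements
        if (v[i] > v[i + half]) == ascending:
            v[i], v[i + half] = v[i + half], v[i]
    return _bitonic_merge(v[:half], ascending) + _bitonic_merge(v[half:], ascending)

print(bitonic_sort([7, 3, 6, 1]))  # [1, 3, 6, 7]
```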
230
Batcher-Banyan Network
[Figure: Batcher-Banyan network; some switching elements send 1 bit up and 0 bit down, others send 0 bit up and 1 bit down]
231
Sunshine Switch
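• Like a Knockout switch
• Re-circulates overflow packets, i.e. when more than L arrive in one cycle
[Figure: sunshine switch: inputs feed a Batcher network, a trap network that marks overflow packets, a selector, and L parallel banyans leading to the outputs; overflow packets return to the inputs through a delay]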
232
What we understand …
• Concepts of networking and network programming
– Elements of networks: nodes and links
– Building a packet abstraction on a link
• Transmission, and units of communication data
– How to detect transmission errors in a frame after encoding and framing it
– How to simulate a reliable channel (sliding window)
– How to arbitrate access to shared media in any network
• Design issues of direct link networks
– Functionality of network adaptors
233
We also understand …
• How switches may provide indirect connectivity
– Different ways to move through a network (forwarding)
– Bridge approach to extending the LAN concept
– Example of a real virtual circuit network (ATM)
– How switches are built and contention within switches
• Next: letting different networks “work together”
234
Internetworking
• Reading: Peterson and Davie, Ch. 4
• Basics of Internetworking
– Heterogeneity
– The IP protocol, address resolution, control messages
• Dealing with simple heterogeneity issues
– Defining a service model
– Defining a global namespace
– Structuring the namespace to simplify forwarding
– Hiding variations in frame size limits
235
Internetworking
• Routing: moving forward with IP
– Building forwarding information
• Dealing with global Internet scale
– Virtual geography and addresses
– Hierarchical routing
– Name translation and lookup: translating between global and local (physical) names
– Multicast traffic
• Future internetworking: IPv6
236
Internet Protocol (IP)
• Network protocol for the Internet
• Operates on all hosts and routers (routers connect distinct networks into the Internet)
[Figure: protocol graph: FTP, HTTP, NV, TFTP, … over TCP and UDP, over IP, over FDDI, Ethernet, ATM]
237
IP Service Model
• Provided to transport layer (TCP, UDP)
– Global name space
– Host-to-host connectivity (connectionless)
– “Best effort” packet delivery (datagram-based)
• No delivery guarantees on bandwidth, delay, etc.
– Packet delayed for a very long time
– Packet lost
– Packet delivered more than once
– Packets delivered out of order
• Simplest model: ability of IP to “run over anything”
238
Internetwork
• Concatenation of networks
• Protocol stack
[Figure: hosts H1–H8 on Network 1 (Ethernet), Network 2 (point-to-point), Network 3 (FDDI) and Network 4 (Ethernet), joined by routers R1, R2 and R3. Protocol stack along the path from H1 to H8: H1 runs TCP/IP over ETH; R1 runs IP over ETH and PPP; R2 runs IP over PPP and FDDI; R3 runs IP over FDDI and ETH; H8 runs TCP/IP over ETH]
239
IP Addresses
– 18.10.5.22: host in a class A network (MIT)
– 130.126.143.254: host in a class B network (UIUC)
– 192.12.70.111: host in a class C network
• More recent classes
– Multicast (class D): starts with 1110
– Future expansions (class E): starts with 1111
Address formats:
Class A: 0 | Network: 7 bits (126 nets) | Host: 24 bits (16 million hosts)
Class B: 1 0 | Network: 14 bits (16K nets) | Host: 16 bits (64K hosts)
Class C: 1 1 0 | Network: 21 bits (2 million nets) | Host: 8 bits (256)
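Classifying an address from its leading bits, as in the formats above. A sketch using Python's standard `ipaddress` module; the function name is hypothetical.

```python
import ipaddress

def address_class(addr: str) -> str:
    """Return the classful category of an IPv4 address from its leading bits."""
    first_byte = int(ipaddress.IPv4Address(addr)) >> 24
    if first_byte >> 7 == 0b0:
        return "A"        # 0...
    if first_byte >> 6 == 0b10:
        return "B"        # 10...
    if first_byte >> 5 == 0b110:
        return "C"        # 110...
    if first_byte >> 4 == 0b1110:
        return "D"        # 1110... multicast
    return "E"            # 1111... reserved for future use

for a in ("18.10.5.22", "130.126.143.254", "192.12.70.111"):
    print(a, address_class(a))
```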
240
Datagram Format
• 4-bit version (4 for IPv4, 6 for IPv6)
• 4-bit header length (in 32-bit words, minimum of 5)
• 8-bit type of service (TOS), more or less unused
• 16-bit datagram length (in bytes)
• 8-bit protocol (e.g. TCP = 6 or UDP = 17)
Header layout (32-bit rows; bit positions 0, 4, 8, 16, 19, 31):
Version (4) | HLen (4) | TOS (8) | Length (16)
Ident (16) | Flags (3) | Offset (13)
TTL (8) | Protocol (8) | Checksum (16)
SourceAddr (32)
DestinationAddr (32)
Options (variable) | Pad (variable)
Data
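Parsing the fixed 20-byte part of the header laid out above. A sketch using `struct` with network byte order; options are not decoded, and the function name is hypothetical.

```python
import ipaddress
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options ignored)."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes")
    (ver_hlen, tos, length, ident, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    hlen_words = ver_hlen & 0x0F
    if hlen_words < 5:
        raise ValueError("header length below the 5-word minimum")
    return {
        "version": ver_hlen >> 4,
        "hlen_bytes": hlen_words * 4,
        "tos": tos,
        "length": length,
        "ident": ident,
        "flags": flags_offset >> 13,        # top 3 bits
        "offset": flags_offset & 0x1FFF,    # low 13 bits, in 8-byte units
        "ttl": ttl,
        "protocol": protocol,               # e.g. 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }
```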
241
Internet Protocol (IP)
• Service model: global addressing, host-to-host connectivity, best effort
• Overview of message transmission
• Host addressing and address translation
• Datagram forwarding
• Fragmentation and reassembly
• Error reporting/control messages
• Dynamic configuration
• Protocol extensions through tunneling
• Note: congestion control not handled by IP
242
Fragmentation and Reassembly Example
[Figure: datagram path H1 → R1 → R2 → R3 → H8. The 1400-byte datagram crosses the Ethernet (H1–R1) and FDDI (R1–R2) links whole, is fragmented at R2 into 512-, 512- and 376-byte fragments for the PPP link (R2–R3), and the fragments cross the final Ethernet (R3–H8) unchanged]
Header fields (Ident, more-fragments flag, Offset, data size):
– Unfragmented datagram: Ident = x, flag = 0, Offset = 0, 1400 data bytes
– Fragment 1: Ident = x, flag = 1, Offset = 0, 512 data bytes
– Fragment 2: Ident = x, flag = 1, Offset = 64, 512 data bytes
– Fragment 3: Ident = x, flag = 0, Offset = 128, 376 data bytes
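The fragment headers above follow from a simple rule: every fragment except the last carries a multiple of 8 data bytes, and Offset counts 8-byte units. A sketch, assuming 512 data bytes per fragment fit on the PPP link as in the example; the function name is hypothetical.

```python
def fragment(data_len: int, max_data_per_fragment: int):
    """Return (offset_in_8_byte_units, more_fragments_flag, data_bytes) per fragment."""
    step = max_data_per_fragment - max_data_per_fragment % 8  # multiple of 8 bytes
    if step <= 0:
        raise ValueError("fragment size too small")
    fragments = []
    start = 0
    while start < data_len:
        size = min(step, data_len - start)
        more = 1 if start + size < data_len else 0
        fragments.append((start // 8, more, size))
        start += size
    return fragments

print(fragment(1400, 512))  # [(0, 1, 512), (64, 1, 512), (128, 0, 376)]
```

All fragments keep Ident = x so the destination host can reassemble them; routers along the way do not.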
243
Datagram Forwarding
Network # | Netmask | Next hop / port
18.0.0.0 | 255.0.0.0 | 1
128.32.0.0 | 255.255.0.0 | 2
0.0.0.0 | 0.0.0.0 | 3
• dest: 18.26.10.0
– mask with 255.0.0.0: matched! send to port 1
• dest: 128.16.14.0
– mask with 255.0.0.0: not matched
– mask with 255.255.0.0: not matched
– mask with 0.0.0.0: matched! send to port 3
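The walk-through above can be sketched in Python with the standard `ipaddress` module. Entries are tried in table order, as on the slide, so the default entry 0.0.0.0 / 0.0.0.0 has to stay last; the names `FORWARDING_TABLE` and `lookup` are hypothetical.

```python
import ipaddress

# The forwarding table from the slide: (network, netmask, next hop / port).
FORWARDING_TABLE = [
    ("18.0.0.0", "255.0.0.0", 1),
    ("128.32.0.0", "255.255.0.0", 2),
    ("0.0.0.0", "0.0.0.0", 3),        # default entry: every address matches
]


def to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def lookup(dest: str, table=FORWARDING_TABLE) -> int:
    """Return the port of the first entry whose network equals dest AND netmask."""
    d = to_int(dest)
    for network, netmask, port in table:
        if d & to_int(netmask) == to_int(network):
            return port
    raise LookupError(f"no route to {dest}")


print(lookup("18.26.10.0"), lookup("128.16.14.0"))
```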
ARP Packet Format
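Bit positions 0, 8, 16, 31; one 32-bit row per line:
Hardware type = 1 | Protocol Type = 0x0800
HLen = 48 | PLen = 32 | Operation
SourceHardwareAddr (bytes 0–3)
SourceHardwareAddr (bytes 4–5) | SourceProtocolAddr (bytes 0–1)
SourceProtocolAddr (bytes 2–3) | TargetHardwareAddr (bytes 0–1)
TargetHardwareAddr (bytes 2–5)
TargetProtocolAddr (bytes 0–3)
(HLen and PLen are shown in bits: a 48-bit Ethernet address and a 32-bit IP address. The fields themselves carry the lengths in bytes, 6 and 4.)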
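A sketch of the same layout with Python's `struct` module, assuming Ethernet hardware addresses and IPv4 protocol addresses. The helper names and field keys are hypothetical; Operation values 1 (request) and 2 (reply) follow the ARP specification.

```python
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"      # 2+2+1+1+2+6+4+6+4 = 28 bytes, rows as above
ARP_REQUEST, ARP_REPLY = 1, 2      # Operation field values


def pack_arp(op: int, source_hw: bytes, source_proto: bytes,
             target_hw: bytes, target_proto: bytes) -> bytes:
    """Build an ARP packet for Ethernet (hardware type 1) and IPv4 (0x0800)."""
    return struct.pack(ARP_FORMAT,
                       1, 0x0800,  # hardware type, protocol type
                       6, 4,       # address lengths in bytes (48 and 32 bits)
                       op,
                       source_hw, source_proto, target_hw, target_proto)


def unpack_arp(packet: bytes) -> dict:
    """Decode the fields of a 28-byte Ethernet/IPv4 ARP packet."""
    (htype, ptype, hlen, plen, op,
     shw, sproto, thw, tproto) = struct.unpack(ARP_FORMAT, packet[:28])
    return {"htype": htype, "ptype": ptype, "hlen": hlen, "plen": plen,
            "operation": op, "source_hw": shw, "source_proto": sproto,
            "target_hw": thw, "target_proto": tproto}


# Hypothetical request: who has 10.0.0.2? (target hardware address still unknown)
request = pack_arp(ARP_REQUEST, bytes.fromhex("0a0b0c0d0e0f"), bytes([10, 0, 0, 1]),
                   bytes(6), bytes([10, 0, 0, 2]))
```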
Internet Control Message Protocol (ICMP)
• IP companion protocol (not needed to deliver datagrams, though every IP implementation is required to support it)
• Handles error and control messages
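[Figure: protocol graph with application protocols (FTP, HTTP, NV, TFTP, …) over TCP and UDP, over IP with ICMP alongside it, over FDDI, Ethernet and ATM]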
ICMP Message
• Sent to the source when a node is unable to process an IP datagram successfully
• Error messages
– Destination unreachable (protocol, port, or host)
– Reassembly failed
– IP checksum failed, or invalid header
– TTL exceeded (so datagrams don’t cycle forever)
– Cannot fragment
• Control messages
– Echo (ping) request and reply
– Redirect (from router to source host, to change route)
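As an illustration of the echo (ping) control message, here is a minimal Python sketch that builds an echo request (type 8, code 0) and fills in the standard Internet checksum. The function names and the sample identifier, sequence number and payload are hypothetical; nothing is sent on the network.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones'-complement of the ones'-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd length with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16) # fold the carry back in
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum zeroed first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


packet = icmp_echo_request(0x1234, 1, b"ping")
```

A receiver can check the message by recomputing the checksum over the whole packet, checksum field included; an intact packet gives 0.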
Dynamic Host Configuration Protocol - DHCP
• A DHCP server is required to provide configuration information to each host
– Each host retrieves this information on bootup
• The DHCP server can be configured manually, or it may allocate addresses on demand
– Addresses are “leased” for some period of time
• If a host is not configured with the address of a DHCP server, it performs DHCP server discovery
– The host sends a broadcast discovery message and the server sends a unicast reply
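To make the idea of on-demand allocation with leases concrete, here is a hypothetical in-memory address pool in Python. It models only leasing (allocate, renew, reclaim on expiry), not the DHCP messages themselves; the class `LeasePool`, its methods and the example network are all assumptions.

```python
import ipaddress


class LeasePool:
    """Hypothetical on-demand address allocator with time-limited leases."""

    def __init__(self, network: str, lease_seconds: int):
        self.free = list(ipaddress.IPv4Network(network).hosts())
        self.lease_seconds = lease_seconds
        self.leases = {}                    # client id -> (address, expiry time)

    def _expire(self, now: float) -> None:
        for client, (addr, expiry) in list(self.leases.items()):
            if expiry <= now:               # lease ran out: reclaim the address
                del self.leases[client]
                self.free.append(addr)

    def request(self, client: str, now: float):
        """Return the client's address, renewing its lease if it already has one."""
        self._expire(now)
        if client in self.leases:
            addr = self.leases[client][0]
        elif self.free:
            addr = self.free.pop(0)
        else:
            raise RuntimeError("address pool exhausted")
        self.leases[client] = (addr, now + self.lease_seconds)
        return addr


pool = LeasePool("192.168.1.0/30", lease_seconds=60)   # two usable addresses
```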
Virtual Private Networks - VPN
• Controlled connectivity
– Restrict forwarding to authorized hosts
• Controlled capacity
– Change router drop and priority policies
– Provide guarantees on bandwidth, delay, etc.
• A virtual network replaces leased lines with a shared network
• Unwanted connectivity is prevented on this logical link by using an IP tunnel
IP Tunnel in VPNs
• Virtual point-to-point link between a pair of nodes separated by many networks
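[Figure: Network 1 → R1 → Internetwork → R2 (10.0.0.1) → Network 2. Inside Network 1 the packet is (IP header, Destination = 2.x | IP payload). R1 prepends an outer header, so across the internetwork it travels as (IP header, Destination = 10.0.0.1 | IP header, Destination = 2.x | IP payload). R2 removes the outer header and forwards (IP header, Destination = 2.x | IP payload) into Network 2.]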
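The encapsulation step can be sketched in Python by modeling a packet as a destination plus a payload, where the payload of a tunneled packet is the inner packet. The class `IPPacket`, the helpers and the host address 2.0.0.5 (standing in for "2.x") are hypothetical; real headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPPacket:
    destination: str
    payload: object          # raw bytes, or an inner IPPacket when tunneled


def encapsulate(packet: IPPacket, tunnel_exit: str) -> IPPacket:
    """Tunnel entry (R1): wrap the packet in an outer header addressed to the exit."""
    return IPPacket(destination=tunnel_exit, payload=packet)


def decapsulate(packet: IPPacket, my_address: str) -> IPPacket:
    """Tunnel exit (R2): strip the outer header addressed to this router."""
    if packet.destination != my_address or not isinstance(packet.payload, IPPacket):
        raise ValueError("not a tunneled packet for this router")
    return packet.payload


inner = IPPacket(destination="2.0.0.5", payload=b"data")   # hypothetical host in network 2
outer = encapsulate(inner, "10.0.0.1")                     # R2's address from the figure
```

Routers inside the internetwork only ever look at `outer.destination`, which is why the tunnel behaves like a point-to-point link between R1 and R2.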
IP Tunneling for Multicast
• Set up a tunnel between each pair of universities
• Multicast packets
– Received by tunnel entry node
– Encapsulated (another IP header added for tunnel exit)
– Travel through the Internet (the tunnel)
– Received by tunnel exit node
– Unwrapped and delivered to another multicast-capable university campus
What is Routing?
• Definition: task of constructing and maintaining forwarding information (in hosts or in switches)
• Goals for routing
– Capture notion of “best” routes
– Propagate changes effectively
– Require limited information exchange
– Admit efficient implementation
• Important notion: graph representation of network
Routing Overview
• Hierarchical routing infrastructure defines routing domains
– Where all routers are under the same administrative control
• Network as a graph
– Nodes are routers
– Edges are links
– Each link has a cost
• Problem: find the lowest-cost path between two nodes
– Maintain information about each link
– Static: topology changes are not incorporated
– Dynamic (or distributed): complex algorithms
[Figure: example network graph with nodes A–F and integer link costs]
Routing Outline
• Algorithms
– Static shortest path algorithms
• Bellman-Ford: all pairs shortest paths to destination
• Dijkstra’s algorithm: single source shortest path
– Distributed, dynamic routing algorithms
• Distance Vector routing (based on Bellman-Ford)
• Link State routing (Dijkstra’s algorithm at each node)
• Metrics (from ArpaNet, with informative names)
– Original
– New
– Revised
Bellman-Ford Algorithm
• Static, centralized algorithm (local iterations per destination)
• Requires: directed graph with edge weights (cost)
• Calculates: shortest paths for all directed pairs
• Checks the use of each node as a successor in all paths
• For every node N
– for each directed pair (B,C)
• is the path B → N → … → C better than the current path B → … → C?
• is the cost B → N → destination smaller than previously known?
• For N nodes
– Uses an N×N matrix of (distance, successor) values
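A Python sketch of the loop structure on this slide: for every node N and every pair (B,C), keep the path through N if it is cheaper, storing (distance, successor) in N×N tables. This all-pairs triple loop is also commonly known as the Floyd–Warshall algorithm. The function name and the four-node example graph are hypothetical.

```python
INF = float("inf")


def all_pairs(nodes, edges):
    """Shortest paths for all directed pairs.

    edges maps (u, v) -> cost for each directed link.
    Returns dist[b][c] and succ[b][c] (first hop from b toward c).
    """
    dist = {b: {c: 0 if b == c else edges.get((b, c), INF) for c in nodes} for b in nodes}
    succ = {b: {c: c if (b, c) in edges else None for c in nodes} for b in nodes}
    for n in nodes:                         # try n as an intermediate node
        for b in nodes:
            for c in nodes:
                via_n = dist[b][n] + dist[n][c]
                if via_n < dist[b][c]:      # is B -> N -> ... -> C better?
                    dist[b][c] = via_n
                    succ[b][c] = succ[b][n] # first hop is the first hop toward N
    return dist, succ


# Hypothetical undirected example: A-B 1, B-C 2, A-C 4, C-D 1.
NODES = ["A", "B", "C", "D"]
EDGES = {}
for u, v, w in [("A", "B", 1), ("B", "C", 2), ("A", "C", 4), ("C", "D", 1)]:
    EDGES[(u, v)] = w
    EDGES[(v, u)] = w
dist, succ = all_pairs(NODES, EDGES)
```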
Dijkstra’s Algorithm
• Static, centralized algorithm; builds a tree from the source
• Requires: directed graph with edge weights (distance)
• Calculates: shortest paths from one node to all others
• Greedily grows a set S of nodes with known minimum paths
• From node N
– Start with S = {N} and one-hop paths from N
– Loop n−1 times
• add the closest outside node M to S
• for each node P not in S
– is the path N → … → M → P better than the current path N → … → P?
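A minimal Python sketch of the greedy step above. It swaps the linear scan for "closest outside node" for a binary heap (`heapq`), a common implementation choice; the function name and the example graph are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> {neighbor: cost}."""
    dist = {source: 0}
    in_s = set()                        # the set S of nodes with known minimum paths
    heap = [(0, source)]
    while heap:
        d, m = heapq.heappop(heap)      # closest node not yet in S
        if m in in_s:
            continue                    # stale heap entry, already settled
        in_s.add(m)
        for p, cost in graph[m].items():
            # is the path source -> ... -> m -> p better than the best known?
            if p not in in_s and d + cost < dist.get(p, float("inf")):
                dist[p] = d + cost
                heapq.heappush(heap, (dist[p], p))
    return dist


GRAPH = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2},
         "C": {"A": 4, "B": 2, "D": 1}, "D": {"C": 1}}
print(dijkstra(GRAPH, "A"))
```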
Distance Vector Routing
• Distributed, dynamic version of Bellman-Ford
• Each node maintains a distance vector: a set of triples
– (Destination, Cost, NextHop)
– Edge weights starting at a node are assumed known by that node
• Exchange updates of the distance vector (Destination, Cost) with directly connected neighbors (known as advertising the routes)
– Periodically (on the order of several seconds to minutes)
– Whenever the vector changes (called a triggered update)
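A sketch of what a node might do with one received advertisement, keeping (Cost, NextHop) per destination. The rule shown, take a cheaper route or always accept the new cost from the neighbor already used as next hop, is a common formulation assumed here; the function name and the finite "infinity" of 16 are also assumptions.

```python
INF = 16  # assumed finite "infinity" so costs stay integers


def apply_advertisement(table, neighbor, link_cost, advertised):
    """Merge `neighbor`'s advertised {destination: cost} vector into `table`.

    `table` maps destination -> (cost, next_hop). Returns True if any entry
    changed, which is the condition for sending a triggered update.
    """
    changed = False
    for dest, cost in advertised.items():
        new_cost = min(INF, cost + link_cost)
        old = table.get(dest)
        if old is None:
            if new_cost < INF:              # learn a new reachable destination
                table[dest] = (new_cost, neighbor)
                changed = True
        elif new_cost < old[0] or old[1] == neighbor:
            # take a cheaper route, or always believe the neighbor we already use
            if (new_cost, neighbor) != old:
                table[dest] = (new_cost, neighbor)
                changed = True
    return changed


table = {"B": (1, "B")}                     # hypothetical: one direct neighbor B
```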
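Distance Vector Routing Example
Information in the routing table of each node (iteration 3):
[Figure: seven-node network with nodes A–G]
Rows: the node holding the table; columns: distance to reach each node
At node | A | B | C | D | E | F | G
A | 0 | 1 | 1 | 2 | 1 | 1 | 2
B | 1 | 0 | 1 | 2 | 2 | 2 | 3
C | 1 | 1 | 0 | 1 | 2 | 2 | 2
D | 2 | 2 | 1 | 0 | 3 | 2 | 1
E | 1 | 2 | 2 | 3 | 0 | 2 | 3
F | 1 | 2 | 2 | 2 | 2 | 0 | 1
G | 2 | 3 | 2 | 1 | 3 | 1 | 0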
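The table can be reproduced with a small synchronous simulation in Python. The unit-cost links below are read off the table (the node pairs at distance 1), and the lockstep rounds are a simplifying assumption: real nodes exchange vectors asynchronously. Names are hypothetical.

```python
INF = float("inf")
NODES = "ABCDEFG"
# Unit-cost links: exactly the node pairs at distance 1 in the table above.
LINKS = [("A", "B"), ("A", "C"), ("A", "E"), ("A", "F"),
         ("B", "C"), ("C", "D"), ("D", "G"), ("F", "G")]
NEIGHBORS = {n: [v for u, v in LINKS if u == n] + [u for u, v in LINKS if v == n]
             for n in NODES}


def run_distance_vector():
    """Synchronous rounds: each node reads its neighbors' vectors, then recomputes."""
    # Start: every node knows only itself (cost 0) and its direct neighbors (cost 1).
    vec = {n: {m: 0 if m == n else (1 if m in NEIGHBORS[n] else INF) for m in NODES}
           for n in NODES}
    while True:
        new = {}
        for n in NODES:
            new[n] = {}
            for m in NODES:
                # Bellman-Ford step: best neighbor estimate plus the unit link cost
                new[n][m] = 0 if m == n else min(1 + vec[x][m] for x in NEIGHBORS[n])
        if new == vec:          # no vector changed: routing has converged
            return vec
        vec = new


final = run_distance_vector()
for n in NODES:
    print(n, [final[n][m] for m in NODES])
```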
Distance Vector Routing: Link Failure
• F detects that its link to G has failed
• F sets its distance to G to infinity and sends an update to A
• A sets its distance to G to infinity, since it uses F to reach G
• A receives a periodic update from C with a 2-hop path to G
• A sets its distance to G to 3 and sends an update to F
• F decides it can reach G in 4 hops via A
[Figure: the seven-node network A–G]
Count to Infinity Problem
• Link from A to E fails
• A advertises a distance of infinity to E, but B and C advertise a distance of 2 to E!
• B decides it can reach E in 3 hops and advertises this to all
• A decides it can reach E in 4 hops and advertises this to all
• C decides that it can reach E in 5 hops …
• We are counting to infinity …
[Figure: the seven-node network A–G]
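A Python sketch of the problem, restricted to A, B and C (E is reachable only through A in the example graph). It assumes lockstep rounds, so all three nodes climb together rather than in the staggered order on the slide, and a finite "infinity" of 16 so the count stops. Names are hypothetical.

```python
INF = 16   # assumed finite "infinity" so the count terminates

# A, B and C are all linked to each other; the A-E link has failed.
NEIGHBORS = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}


def count_to_infinity(max_rounds=50):
    """Track each node's cost to E after the A-E failure (no split horizon)."""
    cost = {"A": INF, "B": 2, "C": 2}     # A saw the failure; B and C still say 2
    history = [dict(cost)]
    for _ in range(max_rounds):
        # every node recomputes from its neighbors' previous advertisements
        cost = {n: min(INF, min(1 + cost[x] for x in NEIGHBORS[n])) for n in NEIGHBORS}
        history.append(dict(cost))
        if all(c == INF for c in cost.values()):
            break
    return history


for step in count_to_infinity()[:4]:
    print(step)
```

Each round the nodes believe each other's stale routes and the cost to E rises by one, until it reaches the assumed infinity.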
Split Horizon
• Avoid counting to infinity by solving “mutual deception” problem
• When sending an update to node X, do not include destinations that you would route through X
– If X thinks the route is not through you, no effect
– If X thinks the route is through you, X will time out the route
[Figure: nodes A, B, C and D with routing entries written as destination : cost : next hop: C : 1 : C, C : 2 : B, C : ∞ : -, C : 2 : B]
• A loop of more than 2 nodes defeats split horizon!
Split Horizon with Poison Reverse
• When sending update to node X, include destinations that you would route through X with distance set to infinity
• X does not need to wait for the route to time out
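Both rules can be sketched as one Python function that builds the vector sent to a neighbor X: plain split horizon leaves out routes whose next hop is X, and poison reverse sends them with cost infinity instead. The function name, the table layout and the finite "infinity" of 16 are assumptions.

```python
INF = 16   # assumed finite "infinity"


def advertisement_for(table, neighbor, poison_reverse=False):
    """Vector {destination: cost} to send to `neighbor`.

    table maps destination -> (cost, next_hop).
    """
    vector = {}
    for dest, (cost, next_hop) in table.items():
        if next_hop == neighbor:
            if poison_reverse:
                vector[dest] = INF      # tell X explicitly: not reachable via me
            # plain split horizon: leave the route out entirely
        else:
            vector[dest] = cost
    return vector


table = {"C": (2, "B"), "D": (1, "D")}   # hypothetical routing table
```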
Link State Routing
• Distributed, dynamic form of Dijkstra’s algorithm
• Strategy
– Send to all nodes (not just neighbors) information about directly connected nodes (not the entire route table) in an LSP
• Basic data structure: Link State Packet (LSP)
– ID of the node that created the LSP
– Cost of the link to each directly connected neighbor: vector of (distance, successor) values
– Sequence number (SEQNO)
– Time-to-live (TTL) for this packet
Link State Routing
• Each node maintains a list of (ideally all) LSPs
– Runs Dijkstra’s algorithm on the list
– May discover its neighbors by “Hello” messages
• Information acquisition via reliable flooding
– Create a new LSP periodically; send it to 1-hop neighbors
– Increment SEQNO (start SEQNO at 0 after a reboot)
– Store the most recent (higher SEQNO) LSP from each node
– Forward a new LSP to all neighbors except the one that sent it
– Decrement the TTL of each stored LSP; discard it when TTL = 0
– Try to minimize routing traffic “overhead”
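A Python sketch of the flooding rules above: keep an LSP only if its SEQNO is higher than the stored one, forward it to every neighbor except the sender, and age stored LSPs by TTL. Sequence-number wraparound is ignored, and the class `LSP` and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LSP:
    origin: str    # ID of the node that created the LSP
    seqno: int
    ttl: int
    links: dict    # neighbor -> link cost


def receive_lsp(store, lsp, from_neighbor, my_neighbors):
    """Handle an incoming LSP; return the neighbors it should be forwarded to.

    store maps origin -> most recent LSP seen from that node.
    """
    known = store.get(lsp.origin)
    if known is not None and lsp.seqno <= known.seqno:
        return []                                  # not newer: drop, do not flood
    store[lsp.origin] = lsp                        # keep the most recent LSP
    return [n for n in my_neighbors if n != from_neighbor]


def age_lsps(store, elapsed=1):
    """Decrement the TTL of every stored LSP and discard those that reach 0."""
    for origin in list(store):
        store[origin].ttl -= elapsed
        if store[origin].ttl <= 0:
            del store[origin]


store = {}
```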
Route Calculation
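At node D; entries are (Destination, Cost, NextHop)
Step | Confirmed list | Tentative list
1 | (D,0,-) | (empty)
2 | (D,0,-) | (C,2,C), (B,11,B)
3 | (D,0,-), (C,2,C) | (B,11,B)
4 | (D,0,-), (C,2,C) | (B,5,C), (A,12,C)
5 | (D,0,-), (C,2,C), (B,5,C) | (A,12,C)
6 | (D,0,-), (C,2,C), (B,5,C) | (A,10,C)
7 | (D,0,-), (C,2,C), (B,5,C), (A,10,C) | (empty)
[Figure: four-node network with links A–B = 5, B–C = 3, C–D = 2, B–D = 11, A–C = 10]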
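The Confirmed/Tentative procedure in the table can be sketched in Python as follows. The link costs are read off the figure and the trace; the function name and the use of `None` for the "-" next hop are assumptions.

```python
def forward_search(graph, me):
    """Route calculation with Confirmed and Tentative lists.

    graph maps node -> {neighbor: cost}; returns destination -> (cost, next_hop).
    """
    confirmed = {me: (0, None)}     # (D,0,-) on the slide
    tentative = {}
    nxt = me
    while True:
        cost_to_next, hop_to_next = confirmed[nxt]
        for neighbor, link_cost in graph[nxt].items():
            if neighbor in confirmed:
                continue
            cost = cost_to_next + link_cost
            # leaving `me`, the next hop is the neighbor itself; otherwise inherit it
            hop = neighbor if nxt == me else hop_to_next
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, hop)
        if not tentative:
            return confirmed
        # move the lowest-cost Tentative entry to Confirmed
        nxt = min(tentative, key=lambda n: tentative[n][0])
        confirmed[nxt] = tentative.pop(nxt)


GRAPH = {"A": {"B": 5, "C": 10},
         "B": {"A": 5, "C": 3, "D": 11},
         "C": {"A": 10, "B": 3, "D": 2},
         "D": {"B": 11, "C": 2}}
print(forward_search(GRAPH, "D"))
```

The result matches step 7 of the table: every destination is reached through C.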