Part 3: Transport Layer
CSE 3461/5461
Reading: Chapter 3, Kurose and Ross
Chapter goals:
• Understand principles behind transport layer services:
– Multiplexing/demultiplexing
– Reliable data transfer
– Flow control
– Congestion control
• Instantiation and implementation in the Internet
Chapter overview:
• Transport layer services
• Multiplexing/demultiplexing
• Connectionless transport: UDP
• Principles of reliable data transfer
• Connection-oriented transport: TCP
– Reliable transfer
– Flow control
– Connection management
• Principles of congestion control
• TCP congestion control
Transport Services and Protocols
• Provide logical communication between application processes running on different hosts
• Transport protocols run in end systems
• Transport layer vs. network layer services:
– Network layer: data transfer between end systems
– Transport layer: data transfer between processes; relies on, enhances, network layer services
[Figure: logical end-end transport between two end systems, each running the full stack (application, transport, network, data link, physical); the routers in between run only the network, data link, and physical layers]
Transport-Layer Protocols
Internet transport services:
• Reliable, in-order unicast delivery: TCP
– Congestion control
– Flow control
– Connection setup
• Unreliable (“best-effort”), unordered unicast or multicast delivery: UDP
• Services not available:
– Real-time
– Bandwidth guarantees
– Reliable multicast
Multiplexing/Demultiplexing (1)
Recall: Segment is a unit of data exchanged between transport layer entities
– Aka TPDU: transport protocol data unit
Demultiplexing: Delivering received segments to correct app layer processes
[Figure: hosts with application, transport, and network layers; application processes P1 to P4 exchange messages M; each segment consists of a segment header (Ht) plus application-layer data and travels inside a network-layer packet (header Hn) to the receiver]
Multiplexing/Demultiplexing (2)
Multiplexing: Gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
Multiplexing/demultiplexing:
• Based on sender, receiver port numbers, IP addresses
– Source, destination port numbers in each segment
– Recall: Well-known port numbers for specific applications
• Internal action between layers: in the network, messages from different applications will be in different packets.
[Figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #; other header fields; application data (message)]
Multiplexing/Demultiplexing: Examples
Port use: Simple Telnet app
• Host A → Server B: Source port: X, Dest. port: 23
• Server B → Host A: Source port: 23, Dest. port: X
Port use: Web server
• Web client Host C → Web server B: Source IP: C, Dest IP: B, Source port: X, Dest. port: 80
• Web client Host C → Web server B: Source IP: C, Dest IP: B, Source port: Y, Dest. port: 80
• Web client Host A → Web server B: Source IP: A, Dest IP: B, Source port: X, Dest. port: 80
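The Web server example can be sketched as a lookup table keyed by the four addressing fields of an arriving segment. This is only an illustration: the class `Demultiplexer`, the socket labels, and the concrete values chosen for the ports X and Y are assumptions, not part of the slides.

```python
from typing import Dict, Optional, Tuple

# (source IP, source port, dest IP, dest port)
FourTuple = Tuple[str, int, str, int]


class Demultiplexer:
    """Hypothetical table mapping a segment's addressing fields to a socket label."""

    def __init__(self) -> None:
        self.table: Dict[FourTuple, str] = {}

    def register(self, key: FourTuple, socket_label: str) -> None:
        self.table[key] = socket_label

    def demux(self, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
        # Return the socket that should receive the segment, or None if nothing matches.
        return self.table.get((src_ip, src_port, dst_ip, dst_port))


X, Y = 5000, 5001  # placeholder values standing in for the client ports X and Y

demuxer = Demultiplexer()
demuxer.register(("A", X, "B", 80), "socket for Host A, port X")
demuxer.register(("C", X, "B", 80), "socket for Host C, port X")
demuxer.register(("C", Y, "B", 80), "socket for Host C, port Y")

# Same destination port 80 and same source port X, but different source IPs:
print(demuxer.demux("A", X, "B", 80))
print(demuxer.demux("C", X, "B", 80))
```

All three segments carry destination port 80, so the source IP address and source port are what keep the three conversations apart in this sketch.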
User Datagram Protocol (UDP) (1)
• “No frills,” “bare bones” Internet transport protocol
• “Best effort” service, UDP segments may be:
– Lost
– Delivered out of order to app
• Connectionless:
– No handshaking between UDP sender, receiver
– Each UDP segment handled independently of others
Why is there a UDP?
• No connection establishment (which can add delay)
• Simple: no connection state at sender, receiver
• Small segment header
• No congestion control: UDP can blast away as fast as desired
UDP (2)
• Often used for streaming multimedia apps
– Loss tolerant
– Rate sensitive
• Other UDP uses (why?):
– DNS
– SNMP
• Reliable transfer over UDP: add reliability at application layer
– Application-specific error recovery!
[Figure: UDP segment format, 32 bits wide: source port #, dest port #; length, checksum; application data (message). Length is the length in bytes of the UDP segment, including header]
UDP Checksum
Goal: Detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• Treat segment contents as sequence of 16-bit integers
• Checksum: one’s complement of the one’s complement sum of segment contents
• Sender puts checksum value into UDP checksum field
Receiver:
• Compute checksum of received segment
• Check if computed checksum equals checksum field value:
– NO: Error detected
– YES: No error detected. But maybe errors nonetheless? More later ….
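A minimal sketch of the 16-bit one's complement checksum follows. It is simplified: real UDP also covers a pseudo-header and the UDP header fields, which are left out here, and the function names are illustrative.

```python
def ones_complement_sum16(data: bytes) -> int:
    """Add the data as 16-bit big-endian words, wrapping any carry back in."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum = one's complement (bit inversion) of the one's complement sum."""
    return ~ones_complement_sum16(data) & 0xFFFF


def looks_intact(data: bytes, checksum: int) -> bool:
    """Receiver check: sum of data plus checksum must be all ones."""
    total = ones_complement_sum16(data) + checksum
    total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


payload = b"hello world"
checksum = internet_checksum(payload)
flipped = bytes([payload[0] ^ 0x01]) + payload[1:]   # one flipped bit
swapped = b"\x00\x02\x00\x01"                        # two 16-bit words swapped
print(looks_intact(payload, checksum), looks_intact(flipped, checksum))
```

The `swapped` value hints at why "no error detected" is not the same as "no error": reordering 16-bit words leaves the sum, and therefore the checksum, unchanged.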
Principles of Reliable Data Transfer (1)
• Important in application, transport, link layers
– Top-10 list of important networking topics!
• Characteristics of unreliable channel will determine complexity of reliable data transfer (RDT) protocol
Reliable Data Transfer: Getting Started (1)
Send side:
• rdt_send(): Called from above (e.g., by app.). Passed data to deliver to receiver upper layer
• udt_send(): Called by RDT to transfer packet over unreliable channel to receiver
Receive side:
• rdt_rcv(): Called when packet arrives on receiver side of channel
• deliver_data(): Called by RDT to deliver data to upper layer
Reliable Data Transfer: Getting Started (2)
We will:
• Incrementally develop sender, receiver sides of Reliable Data Transfer protocol (RDT)
• Consider only unidirectional data transfer
– But control info will flow in both directions!
• Use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation. An arrow from state 1 to state 2 is labeled with the event causing the state transition (above the line) and the actions taken on the state transition (below the line)]
State: when in this “state”, next state uniquely determined by next event
RDT1.0: Reliable Transfer over Reliable Channel
• Underlying channel perfectly reliable
– No bit errors
– No loss of packets
• Separate FSMs for sender, receiver:
– Sender sends data into underlying channel
– Receiver reads data from underlying channel
Sender (single state “Wait for call from above”):
– rdt_send(data): packet = make_pkt(data); udt_send(packet)
Receiver (single state “Wait for call from below”):
– rdt_rcv(packet): extract(packet, data); deliver_data(data)
RDT2.0: Channel with Bit Errors
• Underlying channel may flip bits in packet
– Checksum to detect bit errors
• The question: how to recover from errors? (How do humans recover from “errors” during conversation?)
– Acknowledgements (ACKs): receiver explicitly tells sender that pkt received OK
– Negative acknowledgements (NAKs): receiver explicitly tells sender that pkt had errors
– Sender retransmits pkt on receipt of NAK
• New mechanisms in RDT2.0 (beyond RDT1.0):
– Error detection
– Receiver feedback: control msgs (ACK, NAK) from receiver to sender
RDT2.0: FSM Specification
Sender:
• State “Wait for call from above”
– rdt_send(data): sndpkt = make_pkt(data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK”
• State “Wait for ACK or NAK”
– rdt_rcv(rcvpkt) && isNAK(rcvpkt): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && isACK(rcvpkt): Λ (no action); go to “Wait for call from above”
Receiver:
• State “Wait for call from below”
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): udt_send(NAK)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt): extract(rcvpkt, data); deliver_data(data); udt_send(ACK)
RDT2.0: Operation with no Errors
[Figure: the RDT2.0 FSM with the error-free path highlighted. Sender: rdt_send(data), make and send sndpkt. Receiver: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt), deliver data, udt_send(ACK). Sender: rdt_rcv(rcvpkt) && isACK(rcvpkt), back to “Wait for call from above”]
RDT2.0: Error Scenario
[Figure: the RDT2.0 FSM with the error path highlighted. Receiver: rdt_rcv(rcvpkt) && corrupt(rcvpkt), udt_send(NAK). Sender: rdt_rcv(rcvpkt) && isNAK(rcvpkt), udt_send(sndpkt) again and stay in “Wait for ACK or NAK”]
RDT2.0 has a Fatal Flaw!
What happens if ACK/NAK corrupted?
• Sender doesn’t know what happened at receiver!
• Can’t just retransmit: possible duplicate
Handling duplicates:
• Sender retransmits current pkt if ACK/NAK corrupted
• Sender adds sequence number to each pkt
• Receiver discards (doesn’t deliver up) duplicate pkt
Stop and wait: Sender sends one packet, then waits for receiver response
RDT2.1: Sender, Handles Garbled ACK/NAKs
• State “Wait for call 0 from above”
– rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 0”
• State “Wait for ACK or NAK 0”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to “Wait for call 1 from above”
• State “Wait for call 1 from above”
– rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); go to “Wait for ACK or NAK 1”
• State “Wait for ACK or NAK 1”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isNAK(rcvpkt)): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt): Λ; go to “Wait for call 0 from above”
RDT2.1: Receiver, Handles Garbled ACK/NAKs
• State “Wait for 0 from below”
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 1 from below”
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
• State “Wait for 1 from below”
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt); go to “Wait for 0 from below”
– rdt_rcv(rcvpkt) && corrupt(rcvpkt): sndpkt = make_pkt(NAK, chksum); udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt): sndpkt = make_pkt(ACK, chksum); udt_send(sndpkt)
RDT2.1: Discussion
Sender:
• Seq # added to pkt
• Two seq. numbers (0, 1) will suffice. Why?
• Must check if received ACK/NAK corrupted
• Twice as many states
– State must “remember” whether “expected” pkt should have seq # 0 or 1
Receiver:
• Must check if received packet is duplicate
– State indicates whether 0 or 1 is expected pkt seq #
• Note: receiver cannot know if its last ACK/NAK received OK at sender
RDT2.2: A NAK-Free Protocol
• Same functionality as RDT2.1, using ACKs only
• Instead of NAK, receiver sends ACK for last pkt received OK
– Receiver must explicitly include seq # of pkt being ACKed
• Duplicate ACK at sender results in same action as NAK: retransmit current pkt
RDT2.2: Sender, Receiver Fragments
Sender FSM fragment:
• State “Wait for call 0 from above”
– rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); go to “Wait for ACK 0”
• State “Wait for ACK 0”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): udt_send(sndpkt)
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): Λ
Receiver FSM fragment:
• Transition into state “Wait for 0 from below”
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt): extract(rcvpkt, data); deliver_data(data); sndpkt = make_pkt(ACK1, chksum); udt_send(sndpkt)
• State “Wait for 0 from below”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || has_seq1(rcvpkt)): udt_send(sndpkt)
RDT3.0: Channels with Errors and Loss
New assumption: Underlying channel can also lose packets (data, ACKs)
– Checksum, seq. #, ACKs, retransmissions will help … but not enough
Approach: sender waits “reasonable” amount of time for ACK
• Retransmits if no ACK received in this time
• If pkt (or ACK) just delayed (not lost):
– Retransmission will be duplicate, but seq. #s already handle this
– Receiver must specify seq # of pkt being ACKed
• Requires countdown timer
RDT3.0 Sender
• State “Wait for call 0 from above”
– rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to “Wait for ACK0”
– rdt_rcv(rcvpkt): Λ
• State “Wait for ACK0”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,1)): Λ
– timeout: udt_send(sndpkt); start_timer
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer; go to “Wait for call 1 from above”
• State “Wait for call 1 from above”
– rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to “Wait for ACK1”
– rdt_rcv(rcvpkt): Λ
• State “Wait for ACK1”
– rdt_rcv(rcvpkt) && (corrupt(rcvpkt) || isACK(rcvpkt,0)): Λ
– timeout: udt_send(sndpkt); start_timer
– rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer; go to “Wait for call 0 from above”
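The alternating-bit behavior of the RDT3.0 sender and receiver can be sketched with a small deterministic simulation. It is deliberately simplified: there is no corruption, and a timeout fires only when a packet or ACK was actually lost (so premature timeouts are not modeled). The function name and the `drop_data`/`drop_ack` parameters are illustrative assumptions.

```python
def stop_and_wait(messages, drop_data=frozenset(), drop_ack=frozenset()):
    """Simulate alternating-bit transfer over a lossy channel.

    drop_data / drop_ack hold the 0-based transmission numbers whose data
    packet / ACK is lost. Returns (delivered messages, transmissions used).
    """
    delivered = []
    expected = 0      # receiver: sequence number it is waiting for
    seq = 0           # sender: sequence number of the current packet
    transmission = 0  # counts every (re)transmission of a data packet

    for data in messages:
        while True:
            n = transmission
            transmission += 1
            if n in drop_data:
                continue              # packet lost: sender times out and resends
            if seq == expected:       # new packet: deliver it exactly once
                delivered.append(data)
                expected ^= 1
            # A duplicate is not delivered, but it is ACKed again with its seq #.
            if n in drop_ack:
                continue              # ACK lost: sender times out and resends
            seq ^= 1                  # ACK for current seq # received
            break
    return delivered, transmission


result, sent = stop_and_wait(["a", "b", "c"], drop_data={1}, drop_ack={3})
print(result, sent)
```

In this run the second packet is lost once and the ACK for the third packet is lost once, so five transmissions are needed, and the retransmitted duplicate is recognized by its sequence number and not delivered twice.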
RDT3.0 in Action (1)
(a) no loss:
• Sender: send pkt0 → Receiver: rcv pkt0, send ack0
• Sender: rcv ack0, send pkt1 → Receiver: rcv pkt1, send ack1
• Sender: rcv ack1, send pkt0 → Receiver: rcv pkt0, send ack0
(b) packet loss:
• Sender: send pkt0 → Receiver: rcv pkt0, send ack0
• Sender: rcv ack0, send pkt1 → pkt1 lost (X)
• Sender: timeout, resend pkt1 → Receiver: rcv pkt1, send ack1
• Sender: rcv ack1, send pkt0 → Receiver: rcv pkt0, send ack0
RDT3.0 in Action (2)
(c) ACK loss:
• Sender: send pkt0 → Receiver: rcv pkt0, send ack0
• Sender: rcv ack0, send pkt1 → Receiver: rcv pkt1, send ack1 → ack1 lost (X)
• Sender: timeout, resend pkt1 → Receiver: rcv pkt1 (detect duplicate), send ack1
• Sender: rcv ack1, send pkt0 → Receiver: rcv pkt0, send ack0
(d) premature timeout/delayed ACK:
• Sender: send pkt0 → Receiver: rcv pkt0, send ack0
• Sender: rcv ack0, send pkt1 → Receiver: rcv pkt1, send ack1 (ack1 delayed)
• Sender: timeout, resend pkt1 → Receiver: rcv pkt1 (detect duplicate), send ack1
• Sender: rcv ack1, send pkt0 → Receiver: rcv pkt0, send ack0
• Sender: rcv ack1 (second copy), send pkt0 → Receiver: rcv pkt0 (detect duplicate), send ack0
Performance of RDT3.0
• RDT3.0 is correct, but performance stinks
• E.g.: 1 Gbps link, 15 ms prop. delay, 8000 bit packet:
– Usender: utilization, the fraction of time sender is busy sending
– If RTT = 30 msec, 1 KB pkt every 30 msec: 33 kB/sec throughput over 1 Gbps link
– Network protocol limits use of physical resources!
RDT3.0: Stop-and-Wait Operation
[Figure: sender/receiver timeline spanning one RTT]
• First packet bit transmitted, t = 0
• Last packet bit transmitted, t = L / R
• First packet bit arrives
• Last packet bit arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Pipelined Protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
– Range of sequence numbers must be increased
– Buffering at sender and/or receiver
• Two generic forms of pipelined protocols: Go-Back-N, Selective Repeat
Pipelining: Increased Utilization
[Figure: sender/receiver timeline with three packets in flight during one RTT]
• First packet bit transmitted, t = 0
• Last bit transmitted, t = L / R
• First packet bit arrives
• Last packet bit arrives, send ACK
• Last bit of 2nd packet arrives, send ACK
• Last bit of 3rd packet arrives, send ACK
• ACK arrives, send next packet, t = RTT + L / R
Packet pipelining increases utilization by a factor of 3!
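The timelines give a sender cycle of RTT + L / R, during which the sender is busy for L / R per packet in flight. A small sketch of that arithmetic, using the 1 Gbps, 8000-bit, RTT = 30 msec numbers from the performance example (the function names are illustrative):

```python
def sender_utilization(packet_bits: float, link_bps: float, rtt_s: float, in_flight: int = 1) -> float:
    """Fraction of time the sender is busy: in_flight * (L/R) / (RTT + L/R), capped at 1."""
    t_transmit = packet_bits / link_bps
    return min(1.0, in_flight * t_transmit / (rtt_s + t_transmit))


def throughput_bytes_per_s(packet_bits: float, link_bps: float, rtt_s: float, in_flight: int = 1) -> float:
    """Effective throughput: utilization times the link rate, in bytes per second."""
    return sender_utilization(packet_bits, link_bps, rtt_s, in_flight) * link_bps / 8


L, R, RTT = 8000, 1e9, 0.030          # bits, bits/sec, seconds
u_stop_and_wait = sender_utilization(L, R, RTT)
u_pipelined = sender_utilization(L, R, RTT, in_flight=3)
print(f"stop-and-wait utilization: {u_stop_and_wait:.5f}")
print(f"3-packet pipeline utilization: {u_pipelined:.5f}")
print(f"stop-and-wait throughput: {throughput_bytes_per_s(L, R, RTT) / 1000:.1f} kB/sec")
```

With these inputs the stop-and-wait utilization is roughly 0.00027 and the throughput roughly 33 kB/sec, and three packets in flight triple the utilization.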
Pipelined Protocols: Overview
Go-Back-N:
• Sender can have up to N unACKed packets in pipeline
• Receiver only sends cumulative ACK
– Doesn’t ACK packet if there’s a gap
• Sender has timer for oldest unACKed packet
– When timer expires, retransmit all unACKed packets
Selective Repeat:
• Sender can have up to N unACKed packets in pipeline
• Receiver sends individual ACK for each packet
• Sender maintains timer for each unACKed packet
– When timer expires, retransmit only that unACKed packet
Go-Back-N: Sender
• k-bit seq # in pkt header
• “Window” of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to, including seq # n (“cumulative ACK”)
– May receive duplicate ACKs (see receiver)
• Timer for oldest in-flight pkt
• Timeout(n): retransmit packet n and all higher seq # pkts in window
GBN: Sender Extended FSM
Initially (Λ): base = 1; nextseqnum = 1
Single state “Wait”:
• rdt_send(data):
    if (nextseqnum < base+N) {
        sndpkt[nextseqnum] = make_pkt(nextseqnum, data, chksum)
        udt_send(sndpkt[nextseqnum])
        if (base == nextseqnum)
            start_timer
        nextseqnum++
    }
    else
        refuse_data(data)
• timeout:
    start_timer
    udt_send(sndpkt[base])
    udt_send(sndpkt[base+1])
    . . .
    udt_send(sndpkt[nextseqnum-1])
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt):
    base = getacknum(rcvpkt)+1
    if (base == nextseqnum)
        stop_timer
    else
        start_timer
• rdt_rcv(rcvpkt) && corrupt(rcvpkt): Λ
GBN: Receiver Extended FSM
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
– May generate duplicate ACKs
– Need only remember expectedseqnum
• Out-of-order pkt:
– Discard (don’t buffer): no receiver buffering!
– Re-ACK pkt with highest in-order seq #
Initially (Λ): expectedseqnum = 1; sndpkt = make_pkt(expectedseqnum, ACK, chksum)
Single state “Wait”:
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt, expectedseqnum):
    extract(rcvpkt, data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum, ACK, chksum)
    udt_send(sndpkt)
    expectedseqnum++
• default: udt_send(sndpkt)
GBN in Action
Sender window (N = 4) sliding over seq #s 0 1 2 3 4 5 6 7 8
• Sender: send pkt0, send pkt1, send pkt2 (X loss), send pkt3, (wait)
• Receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, discard, (re)send ack1
• Sender: rcv ack0, send pkt4; rcv ack1, send pkt5; ignore duplicate ACK
• Receiver: receive pkt4, discard, (re)send ack1; receive pkt5, discard, (re)send ack1
• Sender: pkt 2 timeout; send pkt2, send pkt3, send pkt4, send pkt5
• Receiver: rcv pkt2, deliver, send ack2; rcv pkt3, deliver, send ack3; rcv pkt4, deliver, send ack4; rcv pkt5, deliver, send ack5
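The sender side of this scenario can be replayed with a small class modeled on the sender extended FSM. This is a sketch under assumptions: sequence numbers are unbounded integers starting at 0 (no k-bit wraparound), the timer is a boolean flag, an ACK below `base` is ignored as a duplicate, and the class and method names are illustrative.

```python
class GBNSender:
    """Go-Back-N sender bookkeeping (window, base, nextseqnum); no real channel."""

    def __init__(self, window_size: int, first_seq: int = 0) -> None:
        self.N = window_size
        self.base = first_seq
        self.nextseqnum = first_seq
        self.timer_running = False
        self.sent = []  # log of sequence numbers handed to the channel

    def rdt_send(self) -> bool:
        """Try to send one new packet; False means the window is full (refuse_data)."""
        if self.nextseqnum >= self.base + self.N:
            return False
        self.sent.append(self.nextseqnum)
        if self.base == self.nextseqnum:
            self.timer_running = True
        self.nextseqnum += 1
        return True

    def on_ack(self, acknum: int) -> None:
        """Cumulative ACK: everything up to and including acknum is acknowledged."""
        if acknum < self.base:
            return  # duplicate ACK: ignore
        self.base = acknum + 1
        self.timer_running = self.base != self.nextseqnum

    def on_timeout(self) -> list:
        """Retransmit every sent-but-unACKed packet, oldest first."""
        resend = list(range(self.base, self.nextseqnum))
        self.sent.extend(resend)
        self.timer_running = bool(resend)
        return resend


sender = GBNSender(window_size=4)
first_burst = [sender.rdt_send() for _ in range(5)]  # pkt0..pkt3 accepted, 5th refused
sender.on_ack(0); sender.rdt_send()                  # window slides, pkt4 sent
sender.on_ack(1); sender.rdt_send()                  # window slides, pkt5 sent
sender.on_ack(1)                                     # duplicate ack1 is ignored
resent = sender.on_timeout()                         # pkt2 timeout
print(resent)
```

After the duplicate ACKs for pkt1, the timeout makes the sender go back and resend pkt2 through pkt5.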
Selective Repeat
• Receiver individually acknowledges all correctly received pkts
– Buffers pkts, as needed, for eventual in-order delivery to upper layer
• Sender only resends pkts for which ACK not received
– Sender timer for each unACKed pkt
• Sender window
– N consecutive seq #s
– Limits seq #s of sent, unACKed pkts
Selective Repeat: Sender, Receiver Windows
Selective Repeat
Sender:
Data from above:
• If next available seq # in window, send pkt
Timeout(n):
• Resend pkt n, restart timer
ACK(n) in sender window:
• Mark pkt n as received
• If n smallest unACKed pkt, advance window base to next unACKed seq #
Receiver:
Pkt n in receiver window:
• Send ACK(n)
• Out-of-order: buffer
• In-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
Pkt n below receiver window (already received):
• ACK(n)
Otherwise:
• Ignore
Selective Repeat in Action
Sender window (N = 4) sliding over seq #s 0 1 2 3 4 5 6 7 8
• Sender: send pkt0, send pkt1, send pkt2 (X loss), send pkt3, (wait)
• Receiver: receive pkt0, send ack0; receive pkt1, send ack1; receive pkt3, buffer, send ack3
• Sender: rcv ack0, send pkt4; rcv ack1, send pkt5; record ack3 arrived
• Receiver: receive pkt4, buffer, send ack4; receive pkt5, buffer, send ack5
• Sender: pkt 2 timeout; send pkt2; record ack4 arrived; record ack5 arrived
• Receiver: receive pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2
Q: What happens when ack2 arrives?
Selective Repeat: Dilemma
Example:
• Seq #s: 0, 1, 2, 3
• Window size = 3
• Receiver sees no difference in two scenarios!
• Duplicate data accepted as new in (b)!
[Figure: sender window and receiver window (after receipt) over seq #s 0 1 2 3 0 1 2, for two scenarios]
(a) No problem: sender sends pkt0, pkt1, pkt2; the ACKs arrive and the sender window advances; sender sends pkt3 (lost, X) and then a new pkt0; receiver will accept packet with seq number 0
(b) Oops!: sender sends pkt0, pkt1, pkt2; all three ACKs are lost (X X X); sender times out and retransmits the old pkt0; receiver will accept packet with seq number 0
Receiver can’t see sender side. Receiver behavior identical in both cases! Something’s (very) wrong!
Q: What relationship between seq # size and window size needed to avoid problem in (b)?
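One way to explore the question is to check, for a given sequence-number space and window size, whether the worst case of scenario (b) can occur: the receiver has accepted a full window and moved on, but every ACK was lost, so the sender may still retransmit the whole old window. The helper below is an illustrative sketch (its name and parameters are not from the slides).

```python
def old_and_new_windows_overlap(seq_space: int, window: int) -> bool:
    """True if a retransmitted old packet could be mistaken for new data."""
    # Sender may still retransmit any packet of the old window ...
    old_window = {s % seq_space for s in range(0, window)}
    # ... while the receiver already accepts the next window as new data.
    new_window = {s % seq_space for s in range(window, 2 * window)}
    return bool(old_window & new_window)


for n in (1, 2, 3):
    print(f"seq #s 0..3, window {n}: ambiguous = {old_and_new_windows_overlap(4, n)}")
```

With four sequence numbers, a window of 3 is ambiguous (the slide's example) while a window of 2 is not; in this model the two windows stay disjoint exactly when the window size is at most half the sequence-number space.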
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• Point-to-point:
– One sender, one receiver
• Reliable, in-order byte stream:
– No “message boundaries”
• Pipelined:
– TCP congestion and flow control set window size
• Send and receive buffers
• Full duplex data:
– Bi-directional data flow in same connection
– MSS: Maximum segment size
• Connection-oriented:
– Handshaking (exchange of control msgs) inits sender, receiver state before data exchange
• Flow controlled:
– Sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments carry it to the TCP receive buffer, from which the application reads data through its socket door]
TCP Segment Structure
[Figure: TCP segment format, 32 bits wide]
• Source port #, dest port #
• Sequence number, acknowledgement number: counting by bytes of data (not segments!)
• Head len, not used, flag bits U A P R S F, rcvr window size
– URG: urgent data (generally not used)
– ACK: ACK # valid
– PSH: push data now (generally not used)
– RST, SYN, FIN: connection establishment (setup, teardown cmds)
– Rcvr window size: # bytes receiver willing to accept
• Checksum (Internet checksum, as in UDP), ptr urgent data
• Options (variable length)
• Application data (variable length)
TCP Sequence #s and ACKs
Seq. #s:
– Byte stream “number” of first byte in segment’s data
ACKs:
– Seq. # of next byte expected from other side
– Cumulative ACK
Q: How does receiver handle out-of-order segments?
– A: TCP spec doesn’t say, up to implementer
Simple Telnet scenario (time runs downward):
• Host A → Host B: Seq=42, ACK=79, data = ‘C’ (user types ‘C’)
• Host B → Host A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)
• Host A → Host B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
TCP: Reliable Data Transfer (1)
Simplified sender, assuming:
• One way data transfer
• No flow, congestion control
Sender waits for event:
• Event: Data received from application above → Create, send segment
• Event: Timer timeout for segment with seq # y → Retransmit segment
• Event: ACK received, with ACK # y → ACK processing
TCP: Reliable Data Transfer (2)
TCP ACK Generation (RFC 1122, RFC 2581)
Event → TCP receiver action
• In-order segment arrival, no gaps, everything else already ACKed → Delayed ACK. Wait up to 500 msec for next segment. If no next segment, send ACK.
• In-order segment arrival, no gaps, one delayed ACK pending → Immediately send single cumulative ACK.
• Out-of-order segment arrival, higher-than-expected seq. #, gap detected → Send duplicate ACK, indicating seq. # of next expected byte.
• Arrival of segment that partially or completely fills gap → Immediate ACK if segment starts at lower end of gap.
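TCP: Retransmission Scenarios
Lost ACK scenario:
• Host A → Host B: Seq=92, 8 bytes data
• Host B → Host A: ACK=100, lost (X)
• Host A: timeout; resends Seq=92, 8 bytes data
• Host B → Host A: ACK=100
Premature timeout, cumulative ACKs:
• Host A → Host B: Seq=92, 8 bytes data; then Seq=100, 20 bytes data
• Host B → Host A: ACK=100; then ACK=120
• Host A: Seq=92 timeout expires before ACK=100 arrives; resends Seq=92, 8 bytes data
• Host B → Host A: ACK=120 (cumulative), which arrives before the Seq=100 timeout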
TCP Flow Control
Flow control: Sender won’t overrun receiver’s buffers by transmitting too much, too fast
Receiver: Explicitly informs sender of (dynamically changing) amount of free buffer space
– RcvWindow field in TCP segment
Sender: Keeps the amount of transmitted, unACKed data less than most recently received RcvWindow
Receiver buffering:
• RcvBuffer = Size of TCP receive buffer
• RcvWindow = Amount of spare room in buffer
TCP Round Trip Time and Timeout
Q: How to set TCP timeout value?
• Longer than RTT
– Note: RTT will vary
• Too short: premature timeout
– Unnecessary retransmissions
• Too long: slow reaction to segment loss
Q: How to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– Ignore retransmissions, cumulatively ACKed segments
• SampleRTT will vary, want estimated RTT “smoother”
– Use several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
Estimating the RTT:
EstimatedRTT = (1 − x) · EstimatedRTT + x · SampleRTT
• Exponential weighted moving average
• Influence of given sample decreases exponentially fast
• Typical value of x: 1/8
Setting the timeout:
• EstimatedRTT plus “safety margin”
• Large variation in EstimatedRTT ⇒ larger safety margin
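A minimal sketch of the estimator: an exponential weighted moving average of SampleRTT with weight x = 1/8, plus a safety margin. The slides do not say how the margin is computed; the version below assumes the common choice of four times a smoothed deviation with weight 1/4 (as in RFC 6298), and the names `dev_rtt` and `beta` are assumptions.

```python
def update_rtt(estimated_rtt: float, dev_rtt: float, sample_rtt: float,
               x: float = 1 / 8, beta: float = 1 / 4):
    """Fold one SampleRTT into the estimate; return (EstimatedRTT, DevRTT, timeout)."""
    # Assumed safety-margin ingredient: smoothed |SampleRTT - EstimatedRTT|.
    dev_rtt = (1 - beta) * dev_rtt + beta * abs(sample_rtt - estimated_rtt)
    # Exponential weighted moving average of the samples.
    estimated_rtt = (1 - x) * estimated_rtt + x * sample_rtt
    # Timeout = estimate plus safety margin (assumed: 4 * deviation).
    timeout = estimated_rtt + 4 * dev_rtt
    return estimated_rtt, dev_rtt, timeout


est, dev, timeout = 100.0, 0.0, 100.0   # msec; starting values chosen for illustration
for sample in (108.0, 96.0, 130.0, 104.0):
    est, dev, timeout = update_rtt(est, dev, sample)
    print(f"sample={sample:6.1f}  EstimatedRTT={est:7.2f}  timeout={timeout:7.2f}")
```

A single outlier such as the 130 msec sample moves EstimatedRTT only slightly but widens the margin, which is the "larger variation, larger safety margin" behavior described above.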
TCP Connection Management (1)
Recall: TCP sender, receiver establish “connection” before exchanging data segments
• Initialize TCP variables:
– Sequence #s
– Buffers, flow ctrl info (e.g. RcvWindow)
• Client: Connection initiator
    s = socket(AF_INET, SOCK_STREAM)
    s.connect(('hostname', port_num))
• Server: Contacted by client
    s = socket(AF_INET, SOCK_STREAM)
    s.bind(('', port_num))
    s.listen(1)
    conn_sock, addr = s.accept()
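The client and server calls can be tried end to end on one machine. The sketch below assumes the loopback address 127.0.0.1 and lets the operating system pick a free port; the uppercase echo behavior and the helper name `serve_one_client` are made up for the example.

```python
import socket
import threading


def serve_one_client(server_sock: socket.socket) -> None:
    """Accept one connection, echo its data back in uppercase, then close it."""
    conn_sock, addr = server_sock.accept()   # returns once a client has connected
    with conn_sock:
        data = conn_sock.recv(1024)
        conn_sock.sendall(data.upper())


# Server: contacted by client
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))                # port 0: let the OS choose a free port
server.listen(1)
port_num = server.getsockname()[1]
worker = threading.Thread(target=serve_one_client, args=(server,))
worker.start()

# Client: connection initiator
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port_num))      # the TCP handshake happens here
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

worker.join()
server.close()
print(reply)
```

Both `bind` and `connect` take a single (host, port) tuple, which is why the address is wrapped in an extra pair of parentheses.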
TCP Connection Management (2)
Three way handshake:
Step 1: Client end system sends TCP SYN control segment to server
– Specifies initial seq # (client_isn)
Step 2: Server end system receives SYN, replies with SYNACK control segment
– ACKs received SYN
– Allocates buffers
– Specifies server → receiver initial seq. # (server_isn)
Step 3: Client allocates buffers and variables upon receiving SYNACK
– SYN = 0
– Seq = client_isn + 1
– Ack = server_isn + 1
Message sequence:
• Client (connect) → Server: SYN (client_isn, 0)
• Server → Client: SYNACK (server_isn, client_isn + 1)
• Client → Server (accept): ACK (client_isn + 1, server_isn + 1)
• Connection established!
TCP Connection Management (3)
Closing a connection:
Client closes socket: conn_sock.close()
Step 1: Client end system sends TCP FIN control segment to server
Step 2: Server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: message sequence. Client (close) sends FIN; server replies with ACK, then (close) sends FIN; client replies with ACK and enters timed wait; closed]
TCP Connection Management (4)
Step 3: Client receives FIN, replies with ACK.
– Enters “timed wait”: will respond with ACK to received FINs
Step 4: Server receives ACK. Connection closed.
Note: With small modification, can handle simultaneous FINs.
[Figure: message sequence. Client (closing) sends FIN; server replies with ACK, then (closing) sends FIN; client replies with ACK and enters timed wait; server closed on receiving the ACK; client closed after the timed wait]
TCP State Machine
[Figure: TCP state transition diagram]
• Solid line: client side
• Dashed line: server side
Source: G.R. Wright and W.R. Stevens, TCP/IP Illustrated, Vol. 2: The Implementation, Addison-Wesley, 1994.
Principles of Congestion Control
Congestion:
• Informally: “too many sources sending too much data too fast for network to handle”
• Different from flow control!
• Manifestations:
– Lost packets (buffer overflow at routers)
– Long delays (queueing in router buffers)
• Top-10 problem!
Causes/Costs of Congestion: Scenario 1
• Two senders, two receivers
• One router, infinite buffers
• No retransmission
• Network capacity R
• Large delays when congested
• Maximum achievable throughput
[Figure: throughput and delay plots, axes marked at R/2]
Causes/Costs of Congestion: Scenario 2 (1)
• One router, finite buffers
• Sender retransmission of lost packet
Causes/Costs of Congestion: Scenario 2 (2)
• Retransmission overhead limits throughput!
[Figure: two throughput plots with axes marked at R/2. Scenario 1: No retransmissions (throughput up to R/2). Scenario 2: Retransmissions (throughput R/3)]
Approaches Towards Congestion Control
Two broad approaches towards congestion control:
End-end congestion control:
• No explicit feedback from network
• Congestion inferred from end-system observed loss, delay
• Approach taken by TCP
Network-assisted congestion control:
• Routers provide feedback to end systems
– Single bit indicating congestion (SNA, DECbit, TCP/IP Explicit Congestion Notification, ATM)
– Explicit rate sender should send at
TCP Congestion Control (1)
• End-end control (no network assistance)
• Transmission rate limited by congestion window size, cwnd, over segments
[Figure: congestion window cwnd spanning a run of segments]
• Maximum segment size (MSS): largest possible segment size (in bytes)
• Recall: round trip time (RTT) (in seconds)
• Throughput: w segments, each with MSS bytes, sent per RTT:
throughput = (w × MSS) / RTT bytes/sec
TCP Congestion Control (2)
• Acceptable RTTs: ≤ 300 msec (based on geographic distance between sender, receiver; smaller is better!)
Route                      Distance (km)  Time, light in vacuum (msec)  Time, light in fiber (msec)  RTT in fiber (msec)
New York to San Francisco  4,148          14                            21                           42
New York to London         5,585          19                            28                           56
New York to Sydney         15,993         53                            80                           160
Equatorial circumference   40,075         133.7                         200                          400
Source: I. Grigorik, High Performance Browser Networking, O’Reilly Media, 2017, Table 1-1.
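Since w segments of MSS bytes are sent per RTT, the fiber RTTs in the table translate directly into a throughput ceiling for a fixed window. The sketch below assumes a window of w = 10 segments and an MSS of 1460 bytes; both values are illustrative and not taken from the slides.

```python
def window_limited_throughput(w_segments: int, mss_bytes: int, rtt_s: float) -> float:
    """Bytes per second when w segments of MSS bytes are sent once per RTT."""
    return w_segments * mss_bytes / rtt_s


W, MSS = 10, 1460   # assumed window (segments) and segment size (bytes)
fiber_rtt_msec = {
    "New York to San Francisco": 42,
    "New York to London": 56,
    "New York to Sydney": 160,
}
for route, rtt in fiber_rtt_msec.items():
    rate = window_limited_throughput(W, MSS, rtt / 1000)
    print(f"{route}: {rate / 1000:.1f} kB/sec")
```

For the same window, the longer route gets proportionally less throughput, which is why a smaller RTT is better.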
TCP Congestion Control (3)
• “Probing” for usable bandwidth:
– Ideally: transmit as fast as possible (cwnd as large as possible) without loss
– Increase cwnd until loss (congestion)
– Loss: decrease cwnd, then begin probing (increasing) again
• Two “phases”
– Slow start
– Congestion avoidance
• Important variables:
– cwnd
– ssthresh: defines threshold between slow start phase and congestion avoidance phase
TCP Slow Start
• cwnd grows by 1 MSS for each new ACK ⟹ doubles every RTT, i.e., grows exponentially with number of RTTs (not so slow!)
• Loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[Figure: Host A sends 1 segment to Host B in RTT 1, 2 segments in RTT 2, and 4 segments in RTT 3; time runs downward]
TCP Congestion Avoidance
• TCP Reno skips slow start (fast recovery) after 3 duplicate ACKs. In fast recovery:
TCP Congestion Control FSM
TCP Fairness: Problem Setting
• N users share a link with capacity R
• Objectives: Each user uses 1/N of the bandwidth
Example for N = 2 users. If R = 10 Gbps, each user should use 5 Gbps of bandwidth.
[Figure: User 2's throughput vs. User 1's throughput (both in % of bandwidth capacity R, 0 to 100), showing the feasible bandwidth allocation, the fairness line, and the optimal point]
Source: A.S. Tanenbaum and D.J. Wetherall, Computer Networks, 5th ed., Prentice Hall, 2011 (Figure 6.24).
TCP Fairness: Control Laws
• Additive increase:
• Multiplicative increase:
• Control laws:
– AIAD: Additive increase, additive decrease
– AIMD: Additive increase, multiplicative decrease
– MIAD: Multiplicative increase, additive decrease
– MIMD: Multiplicative increase, multiplicative decrease
• Question: Which control law achieves objective? Why?
TCP Fairness: AIMD Congestion Control
• AIAD, MIMD oscillate around starting point
• TCP uses AIMD: achieves fairness, converges to optimum
• Chiu and Jain showed (in 1989) that AIMD is “best” for congested networks (ref. in book)
[Figure: feasible bandwidth region]
Source: Tanenbaum and Wetherall, Computer Networks, Fig. 6-24.
Why is TCP Fair?
Assume we have 2 competing sessions:
• Additive increase gives slope of 1, as throughput increases
• Multiplicative decrease decreases throughput proportionally
[Figure: Connection 2 throughput vs. Connection 1 throughput, both axes up to capacity R, with the equal bandwidth share line. The trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
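The geometric argument can be checked numerically. The sketch below is a simplified model in which both connections see every loss at the same time: whenever their combined throughput exceeds the capacity, both halve; otherwise both add the same increment. The function name, the starting rates, and the step sizes are illustrative assumptions.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rounds: int,
                   increase: float = 1.0, decrease: float = 0.5):
    """Run synchronized AIMD for two flows and return their final throughputs."""
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # Loss: multiplicative decrease shrinks the gap between the flows.
            x1 *= decrease
            x2 *= decrease
        else:
            # Congestion avoidance: additive increase keeps the gap unchanged.
            x1 += increase
            x2 += increase
    return x1, x2


start = (80.0, 10.0)                      # a deliberately unfair starting point
x1, x2 = aimd_two_flows(*start, capacity=100.0, rounds=500)
print(f"start gap: {abs(start[0] - start[1]):.1f}, final gap: {abs(x1 - x2):.6f}")
```

Additive increase moves the pair along a slope-1 line (gap unchanged) and each halving cuts the gap in half, so the two throughputs approach the equal bandwidth share line.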
Summary
• Principles behind transport layer services:
– Multiplexing/demultiplexing
– Reliable data transfer
– Flow control
– Congestion control
• Instantiation and implementation in the Internet
– UDP
– TCP
Next:
• Leaving the network “edge” (application, transport layers)
• Entering the network “core”
Full TCP Congestion Control FSM
Initially (Λ): cwnd = 1 MSS; ssthresh = 64 KB; dupACKcount = 0; go to “Slow start”
• State “Slow start”
– New ACK: cwnd = cwnd + (1 MSS); dupACKcount = 0; transmit new segment(s), as allowed
– duplicate ACK: dupACKcount++
– cwnd ≥ ssthresh: Λ; go to “Congestion avoidance”
– dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to “Fast recovery”
– timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• State “Congestion avoidance”
– New ACK: cwnd = cwnd + (1 MSS) / (cwnd); dupACKcount = 0; transmit new segment(s), as allowed
– duplicate ACK: dupACKcount++
– dupACKcount == 3: ssthresh = cwnd / 2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to “Fast recovery”
– timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to “Slow start”
• State “Fast recovery”
– duplicate ACK: cwnd = cwnd + 1 MSS; transmit new segment(s), as allowed
– New ACK: cwnd = ssthresh; dupACKcount = 0; go to “Congestion avoidance”
– timeout: ssthresh = cwnd / 2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to “Slow start”
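The three states and their cwnd updates can be sketched as a small event-driven class. It tracks only cwnd, ssthresh, and dupACKcount; no segments are actually sent. As assumptions for the example, cwnd and ssthresh are counted in units of MSS (so the 64 KB initial ssthresh becomes a parameter), and the class and method names are illustrative.

```python
class RenoCongestionControl:
    """Window bookkeeping for slow start, congestion avoidance, and fast recovery."""

    def __init__(self, ssthresh: float = 64.0) -> None:
        self.state = "slow start"
        self.cwnd = 1.0            # in MSS
        self.ssthresh = ssthresh   # in MSS (assumed unit)
        self.dup_ack_count = 0

    def on_new_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd = self.ssthresh
            self.state = "congestion avoidance"
        elif self.state == "slow start":
            self.cwnd += 1.0                       # +1 MSS per ACK: doubles per RTT
            if self.cwnd >= self.ssthresh:
                self.state = "congestion avoidance"
        else:
            self.cwnd += 1.0 / self.cwnd           # about +1 MSS per RTT
        self.dup_ack_count = 0

    def on_duplicate_ack(self) -> None:
        if self.state == "fast recovery":
            self.cwnd += 1.0
            return
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:                # triple duplicate ACK
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3.0
            self.state = "fast recovery"

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.dup_ack_count = 0
        self.state = "slow start"


cc = RenoCongestionControl(ssthresh=8.0)
for _ in range(7):
    cc.on_new_ack()                # slow start: cwnd 1 -> 8, reaches ssthresh
cc.on_new_ack()                    # congestion avoidance: cwnd 8 -> 8.125
for _ in range(3):
    cc.on_duplicate_ack()          # third duplicate ACK triggers fast recovery
print(cc.state, cc.cwnd, cc.ssthresh)
```

A timeout in any state falls back to slow start with cwnd = 1 MSS, while three duplicate ACKs only halve the window, which reflects the milder reaction to a loss that still lets ACKs through.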