Upload
vodung
View
236
Download
0
Embed Size (px)
Citation preview
Topic 5 1
Computer Networking
Lent Term M/W/F 11-‐midday LT1 in Gates Building
Slide Set 5
Andrew W. Moore [email protected]
February 2014
1
Topic 5b – Transport
2
Our goals: • understand principles
behind transport layer services: – mulNplexing/
demulNplexing
– reliable data transfer – flow control – congesNon control
• learn about transport layer protocols in the Internet: – UDP: connecNonless transport – TCP: connecNon-‐oriented
transport
– TCP congesNon control
AutomaNc Repeat Request (ARQ)
+ Self-‐clocking (AutomaNc)
+ AdapNve
+ Flexible
-‐ Slow to start / adapt consider high Bandwidth/Delay product
Next lets move from the generic to the specific….
TCP arguably the most successful protocol in the Internet…..
its an ARQ protocol
3 4
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Used to mux and demux
Last Nme: Components of a soluNon for reliable transport
• Checksums (for error detecNon) • Timers (for loss detecNon) • Acknowledgments
– cumulaNve – selecNve
• Sequence numbers (duplicates, windows) • Sliding Windows (for efficiency)
– Go-‐Back-‐N (GBN) – SelecNve Replay (SR)
5
What does TCP do?
Many of our previous ideas, but some key differences • Checksum
6
Topic 5 2
7
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Computed over header and data
What does TCP do?
Many of our previous ideas, but some key differences • Checksum
• Sequence numbers are byte offsets
TCP: Segments and Sequence Numbers"
9
TCP “Stream of Bytes” Service…"
Byte 0
Byte 1
Byte 2
Byte 3
Byte 0
Byte 1
Byte 2
Byte 3
ApplicaNon @ Host A
ApplicaNon @ Host B
Byte 80
Byte 80
10
… Provided Using TCP “Segments”"
Byte 0 Byte 1 Byte 2 Byte 3
Byte 0 Byte 1 Byte 2 Byte 3
Host A
Host B
Byte 80
TCP Data
TCP Data
Byte 80
Segment sent when: 1. Segment full (Max Segment Size), 2. Not full, but Nmes out
11
TCP Segment"
• IP packet – No bigger than Maximum Transmission Unit (MTU) – E.g., up to 1500 bytes with Ethernet
• TCP packet – IP packet with a TCP header and data inside – TCP header ≥ 20 bytes long
• TCP segment – No more than Maximum Segment Size (MSS) bytes – E.g., up to 1460 consecuNve bytes from the stream – MSS = MTU – (IP header) – (TCP header)
IP Hdr
IP Data
TCP Hdr TCP Data (segment)
12
Topic 5 3
Sequence Numbers"
Host A
ISN (iniNal sequence number)
Sequence number = 1st byte in segment =
ISN + k
k bytes
13
Sequence Numbers"
Host B
TCP Data
TCP Data
TCP HDR
TCP HDR
ACK sequence number = next expected byte = seqno + length(data)
Host A
ISN (iniNal sequence number)
Sequence number = 1st byte in segment =
ISN + k
k
14
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Starting byte offset of data carried in this segment
15
• What does TCP do?
16
What does TCP do?
Most of our previous tricks, but a few differences • Checksum • Sequence numbers are byte offsets
• Receiver sends cumulaNve acknowledgements (like GBN)
ACKing and Sequence Numbers"• Sender sends packet
– Data starts with sequence number X – Packet contains B bytes [X, X+1, X+2, ….X+B-1]
• Upon receipt of packet, receiver sends an ACK – If all data prior to X already received:
• ACK acknowledges X+B (because that is next expected byte) – If highest in-order byte received is Y s.t. (Y+1) < X
• ACK acknowledges Y+1 • Even if this has been ACKed before
18
Topic 5 4
Normal Pakern
• Sender: seqno=X, length=B • Receiver: ACK=X+B • Sender: seqno=X+B, length=B • Receiver: ACK=X+2B • Sender: seqno=X+2B, length=B
• Seqno of next packet is same as last ACK field
19 20
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Acknowledgment gives seqno just beyond highest seqno received in order (“What Byte is Next”)
What does TCP do?
Most of our previous tricks, but a few differences • Checksum • Sequence numbers are byte offsets
• Receiver sends cumulaNve acknowledgements (like GBN)
• Receivers can buffer out-‐of-‐sequence packets (like SR)
21
Loss with cumulaNve ACKs
• Sender sends packets with 100B and seqnos.: – 100, 200, 300, 400, 500, 600, 700, 800, 900, …
• Assume the finh packet (seqno 500) is lost, but no others
• Stream of ACKs will be: – 200, 300, 400, 500, 500, 500, 500,…
22
What does TCP do?
Most of our previous tricks, but a few differences • Checksum • Sequence numbers are byte offsets
• Receiver sends cumulaNve acknowledgements (like GBN)
• Receivers may not drop out-‐of-‐sequence packets (like SR) • Introduces fast retransmit: opNmizaNon that uses duplicate
ACKs to trigger early retransmission
23
Loss with cumulaNve ACKs
• “Duplicate ACKs” are a sign of an isolated loss – The lack of ACK progress means 500 hasn’t been delivered – Stream of ACKs means some packets are being delivered
• Therefore, could trigger resend upon receiving k duplicate ACKs
• TCP uses k=3
• But response to loss is trickier…. 24
Topic 5 5
Loss with cumulaNve ACKs
• Two choices: – Send missing packet and increase W by the number of dup ACKs
– Send missing packet, and wait for ACK to increase W
• Which should TCP do?
25
What does TCP do?
Most of our previous tricks, but a few differences • Checksum • Sequence numbers are byte offsets
• Receiver sends cumulaNve acknowledgements (like GBN)
• Receivers do not drop out-‐of-‐sequence packets (like SR) • Introduces fast retransmit: opNmizaNon that uses duplicate
ACKs to trigger early retransmission
• Sender maintains a single retransmission Nmer (like GBN) and retransmits on Nmeout
26
Retransmission Timeout
• If the sender hasn’t received an ACK by Nmeout, retransmit the first packet in the window
• How do we pick a Nmeout value?
27
Timing IllustraNon
1
1
Timeout too long ! inefficient
1
1
Timeout too short ! duplicate packets
RTT
Timeout
Timeout
RTT
28
Retransmission Timeout
• If haven’t received ack by Nmeout, retransmit the first packet in the window
• How to set Nmeout? – Too long: connecNon has low throughput – Too short: retransmit packet that was just delayed
• SoluNon: make Nmeout proporNonal to RTT
• But how do we measure RTT?
29
RTT EsNmaNon
• Use exponenNal averaging of RTT samples
SampleRTT= AckRcvdTime− SendPacketTimeEstimatedRTT =α ×EstimatedRTT + (1−α)× SampleRTT0 <α ≤1
Estim
ated
RTT
Time
SampleRTT
30
Topic 5 6
ExponenNal Averaging Example
RTT
Nme
Es)matedRTT = α*Es)matedRTT + (1 – α)*SampleRTT
Assume RTT is constant ! SampleRTT = RTT
0 1 2 3 4 5 6 7 8 9
EstimatedRTT (α = 0.8)
EstimatedRTT (α = 0.5)
31
Problem: Ambiguous Measurements
• How do we differenNate between the real ACK, and ACK of the retransmiked packet?
ACK
Retransmission
Original Transmission
SampleR
TT
Sender Receiver
ACK Retransmission
Original Transmission
SampleR
TT
Sender Receiver
32
Karn/Partridge Algorithm"
• Measure SampleRTT only for original transmissions – Once a segment has been retransmitted, do not use it for any
further measurements • Computes EstimatedRTT using α = 0.875
• Timeout value (RTO) = 2 × EstimatedRTT • Employs exponential backoff
– Every time RTO timer expires, set RTO ← 2·RTO – (Up to maximum ≥ 60 sec) – Every time new measurement comes in (= successful original
transmission), collapse RTO back to 2 × EstimatedRTT
33
Karn/Partridge in acNon
from Jacobson and Karels, SIGCOMM 1988 34
Jacobson/Karels Algorithm"
• Problem: need to better capture variability in RTT – Directly measure deviation
• Deviation = | SampleRTT – EstimatedRTT | • EstimatedDeviation: exponential average of Deviation
• RTO = EstimatedRTT + 4 x EstimatedDeviation
35
With Jacobson/Karels
36
Topic 5 7
What does TCP do?
Most of our previous ideas, but some key differences • Checksum • Sequence numbers are byte offsets • Receiver sends cumulaNve acknowledgements (like GBN) • Receivers do not drop out-‐of-‐sequence packets (like SR) • Introduces fast retransmit: opNmizaNon that uses duplicate
ACKs to trigger early retransmission • Sender maintains a single retransmission Nmer (like GBN) and
retransmits on Nmeout
37
TCP Header: What’s left?"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
“Must Be Zero” 6 bits reserved
Number of 4-byte words in TCP header; 5 = no options
38
TCP Header: What’s left?"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Used with URG flag to indicate urgent data (not discussed further)
39
TCP Header: What’s left?"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
40
TCP Connection Establishment and Initial Sequence Numbers"
41
Initial Sequence Number (ISN)"• Sequence number for the very first byte • Why not just use ISN = 0? • Practical issue
– IP addresses and port #s uniquely identify a connection – Eventually, though, these port #s do get used again – … small chance an old packet is still in flight
• TCP therefore requires changing ISN • Hosts exchange ISNs when they establish a connection
42
Topic 5 8
Establishing a TCP Connection"
• Three-way handshake to establish connection – Host A sends a SYN (open; “synchronize sequence numbers”) to
host B – Host B returns a SYN acknowledgment (SYN ACK) – Host A sends an ACK to acknowledge the SYN ACK
SYN
SYN ACK
ACK
A B
Data Data
Each host tells its ISN to the other host.!
43
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Flags: SYN ACK FIN RST PSH URG
44
Step 1: A’s Initial SYN Packet"
A’s port B’s port
A’s Initial Sequence Number
(Irrelevant since ACK not set)
Advertised window 5 Flags 0
Checksum Urgent pointer
Options (variable)
Flags: SYN ACK FIN RST PSH URG
A tells B it wants to open a connection…!
45
Step 2: B’s SYN-ACK Packet"
B’s port A’s port
B’s Initial Sequence Number
ACK = A’s ISN plus 1
Advertised window 5 0
Checksum Urgent pointer
Options (variable)
Flags: SYN ACK FIN RST PSH URG
B tells A it accepts, and is ready to hear the next byte…!
… upon receiving this packet, A can start sending data!
Flags
46
Step 3: A’s ACK of the SYN-ACK"
A’s port B’s port
B’s ISN plus 1
Advertised window 20B Flags 0
Checksum Urgent pointer
Options (variable)
Flags: SYN ACK FIN RST PSH URG
A tells B it’s likewise okay to start sending!
A’s Initial Sequence Number
… upon receiving this packet, B can start sending data! 47
Timing Diagram: 3-Way Handshaking"
Client (initiator)
Server
SYN, SeqNum = x
SYN + ACK, SeqNum = y, Ack = x + 1
ACK, Ack = y + 1
Active Open
Passive Open
connect()!listen()!
48
Topic 5 9
What if the SYN Packet Gets Lost?"
• Suppose the SYN packet gets lost – Packet is lost inside the network, or: – Server discards the packet (e.g., it’s too busy)
• Eventually, no SYN-ACK arrives
– Sender sets a timer and waits for the SYN-ACK – … and retransmits the SYN if needed
• How should the TCP sender set the timer?
– Sender has no idea how far away the receiver is – Hard to guess a reasonable length of time to wait – SHOULD (RFCs 1122 & 2988) use default of 3 seconds
• Some implementations instead use 6 seconds
49
SYN Loss and Web Downloads"• User clicks on a hypertext link
– Browser creates a socket and does a “connect” – The “connect” triggers the OS to transmit a SYN
• If the SYN is lost… – 3-6 seconds of delay: can be very long – User may become impatient – … and click the hyperlink again, or click “reload”
• User triggers an “abort” of the “connect” – Browser creates a new socket and another “connect” – Essentially, forces a faster send of a new SYN packet! – Sometimes very effective, and the page comes quickly
50
Tearing Down the Connection"
51
Normal Termination, One Side At A Time"
• Finish (FIN) to close and receive remaining bytes – FIN occupies one byte in the sequence space
• Other host acks the byte to confirm • Closes A’s side of the connection, but not B’s
– Until B likewise sends a FIN – Which A then acks
SYN
SYN
ACK
ACK
D
ata
FIN
ACK
ACK
time A
B
FIN A
CK
TIME_WAIT:
Avoid reincarnation
B will retransmit FIN if ACK is lost
Connection now half-closed
Connection now closed
52
Normal Termination, Both Together"
• Same as before, but B sets FIN with their ack of A’s FIN
SYN
SYN
ACK
ACK
D
ata
FIN
FIN + A
CK
ACK
time A
B
ACK
Connection now closed
TIME_WAIT: Avoid reincarnation Can retransmit FIN ACK if ACK lost
53
Abrupt Termination"
• A sends a RESET (RST) to B – E.g., because application process on A crashed
• That’s it – B does not ack the RST – Thus, RST is not delivered reliably – And: any data in flight is lost – But: if B sends anything more, will elicit another RST
SYN
SYN
ACK
ACK
D
ata
RST A
CK
time A
B
Data RS
T
54
Topic 5 10
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Flags: SYN ACK FIN RST PSH URG
55
TCP State TransiNons
CLOSED
LISTEN
SYN_RCVD SYN_SENT
ESTABLISHED
CLOSE_WAIT
LAST_ACKCLOSING
TIME_WAIT
FIN_WAIT_2
FIN_WAIT_1
Passive open Close
Send/SYNSYN/SYN + ACK
SYN + ACK/ACK
SYN/SYN + ACK
ACK
Close/FIN
FIN/ACKClose/FIN
FIN/ACKACK + FIN/ACKTimeout after twosegment lifetimesFIN/ACK
ACK
ACK
ACK
Close/FIN
Close
CLOSED
Active open /SYN
Data, ACK exchanges are in here
56
An Simpler View of the Client Side
CLOSED
TIME_WAIT
FIN_WAIT2
FIN_WAIT1
ESTABLISHED
SYN_SENT
SYN (Send)
Rcv. SYN+ACK, Send ACK
Send FIN Rcv. ACK, Send Nothing
Rcv. FIN, Send ACK
57
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
Used to negotiate use of additional features (details in section)
58
TCP Header"
Source port Destination port
Sequence number
Acknowledgment
Advertised window HdrLen Flags 0
Checksum Urgent pointer
Options (variable)
Data
59
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP
60
Topic 5 11
Recap: Sliding Window (so far)"
• Both sender & receiver maintain a window
• Left edge of window: – Sender: beginning of unacknowledged data – Receiver: beginning of undelivered data
• Right edge: Left edge + constant – constant only limited by buffer size in the
transport layer 61
Sliding Window at Sender (so far)"
Sending process!
First unACKed byte"
Last byte can send"
TCP"Last byte written"Previously"
ACKed bytes"
Buffer size (B)
62
Sliding Window at Receiver (so far)"
Receiving process!
Next byte needed (1st byte not received)"
Last byte read"
Last byte received"
Received and ACKed"
Buffer size (B)
Sender might overrun the receiver’s buffer
63
Solution: Advertised Window (Flow Control)"
• Receiver uses an “Advertised Window” (W) to prevent sender from overflowing its window – Receiver indicates value of W in ACKs – Sender limits number of bytes it can have in
flight <= W
64
Sliding Window at Receiver"
Receiving process!
Next byte needed (1st byte not received)"
Last byte read"
Last byte received"
Buffer size (B)
W= B -‐ (LastByteReceived -‐ LastByteRead)
65
Sliding Window at Sender (so far)"
Sending process!
First unACKed byte"
Last byte can send"
TCP"
Last byte written"W
66
Topic 5 12
Sliding Window w/ Flow Control"
• Sender: window advances when new data ack’d
• Receiver: window advances as receiving process consumes data
• Receiver advertises to the sender where the receiver window currently ends (“righthand edge”) – Sender agrees not to exceed this amount
67
Advertised Window Limits Rate"• Sender can send no faster than W/RTT
bytes/sec
• Receiver only advertises more space when it has consumed old arriving data
• In original TCP design, that was the sole protocol mechanism controlling sender’s rate
• What’s missing?
68
TCP
• The concepts underlying TCP are simple – acknowledgments (feedback) – Nmers
– sliding windows – buffer management – sequence numbers
69
TCP
• The concepts underlying TCP are simple • But tricky in the details
– How do we set Nmers? – What is the seqno for an ACK-‐only packet? – What happens if adverNsed window = 0? – What if the adverNsed window is ½ an MSS? – Should receiver acknowledge packets right away? – What if the applicaNon generates data in units of 0.1 MSS? – What happens if I get a duplicate SYN? Or a RST while I’m in FIN_WAIT, etc., etc., etc.
70
TCP
• The concepts underlying TCP are simple • But tricky in the details • Do the details maker?
71
Sizing Windows for CongesNon Control
• What are the problems? • How might we address them?
72
Topic 5 13
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
73
We have seen: – Flow control: adjusting the sending rate to
keep from overwhelming a slow receiver
Now lets attend… – Congestion control: adjusting the sending rate
to keep from overloading the network
74
• If two packets arrive at the same time – A router can only transmit one – … and either buffers or drops the other
• If many packets arrive in a short period of time – The router cannot keep up with the arriving traffic – … delays traffic, and the buffer may eventually overflow
• Internet traffic is bursty
StaNsNcal MulNplexing ! CongesNon
75
Congestion is undesirable"
Average Packet delay
Load
Typical queuing system with bursty arrivals
Must balance utilization versus delay and loss!
Average Packet loss
Load
76
Who Takes Care of CongesNon?
• Network? End hosts? Both?
• TCP’s approach: – End hosts adjust sending rate – Based on implicit feedback from network
• Not the only approach – A consequence of history rather than planning
77
Some History: TCP in the 1980s
• Sending rate only limited by flow control – Packet drops ! senders (repeatedly!) retransmit a full window’s worth of packets
• Led to “congesNon collapse” starNng Oct. 1986 – Throughput on the NSF network dropped from 32Kbits/s to 40bits/sec
• “Fixed” by Van Jacobson’s development of TCP’s congesNon control (CC) algorithms
78
Topic 5 14
Jacobson’s Approach
• Extend TCP’s exisNng window-‐based protocol but adapt the window size in response to congesNon – required no upgrades to routers or applicaNons! – patch of a few lines of code to TCP implementaNons
• A pragmaNc and effecNve soluNon – but many other approaches exist
• Extensively improved on since – topic now sees less acNvity in ISP contexts – but is making a comeback in datacenter environments
79
Three Issues to Consider
• Discovering the available (bokleneck) bandwidth
• AdjusNng to variaNons in bandwidth
• Sharing bandwidth between flows
80
Abstract View
• Ignore internal structure of router and model it as having a single queue for a parNcular input-‐output pair
Sending Host Buffer in Router Receiving Host
A B
81
Discovering available bandwidth
• Pick sending rate to match bokleneck bandwidth – Without any a priori knowledge – Could be gigabit link, could be a modem
A B 100 Mbps
82
AdjusNng to variaNons in bandwidth
• Adjust rate to match instantaneous bandwidth
– Assuming you have rough idea of bandwidth
A B BW(t)
83
MulNple flows and sharing bandwidth
Two Issues: • Adjust total sending rate to match bandwidth
• AllocaNon of bandwidth between flows
A2 B2 BW(t)
A1
A3 B3
B1
84
Topic 5 15
Reality
CongesNon control is a resource allocaNon problem involving many flows, many links, and complicated global dynamics
85
View from a single flow
• Knee – point aner which – Throughput increases slowly – Delay increases fast
• Cliff – point aner which – Throughput starts to drop to zero
(congesNon collapse)
– Delay approaches infinity
Load
Load
Throughp
ut
Delay
knee cliff
congesNon collapse
packet loss
86
General Approaches
(0) Send without care – Many packet drops
87
General Approaches
(0) Send without care (1) ReservaNons
– Pre-‐arrange bandwidth allocaNons – Requires negoNaNon before sending packets – Low uNlizaNon
88
General Approaches
(0) Send without care (1) ReservaNons
(2) Pricing – Don’t drop packets for the high-‐bidders – Requires payment model
89
General Approaches
(0) Send without care (1) ReservaNons
(2) Pricing
(3) Dynamic Adjustment – Hosts probe network; infer level of congesNon; adjust – Network reports congesNon level to hosts; hosts adjust – CombinaNons of the above – Simple to implement but subopNmal, messy dynamics
90
Topic 5 16
General Approaches
(0) Send without care (1) ReservaNons (2) Pricing (3) Dynamic Adjustment All three techniques have their place • Generality of dynamic adjustment has proven powerful • Doesn’t presume business model, traffic characterisNcs,
applicaNon requirements; does assume good ciNzenship
91
TCP’s Approach in a Nutshell
• TCP connecNon has window – Controls number of packets in flight
• Sending rate: ~Window/RTT
• Vary window size to control sending rate
92
All These Windows…
• CongesNon Window: CWND – How many bytes can be sent without overflowing routers
– Computed by the sender using congesNon control algorithm
• Flow control window: AdverNsedWindow (RWND)
– How many bytes can be sent without overflowing receiver’s buffers
– Determined by the receiver and reported to the sender
• Sender-side window = minimum{CWND,RWND} • Assume for this lecture that RWND >> CWND
93
Note
• This lecture will talk about CWND in units of MSS – (Recall MSS: Maximum Segment Size, the amount of payload data in a TCP packet)
– This is only for pedagogical purposes
• Keep in mind that real implementaNons maintain CWND in bytes
94
Two Basic QuesNons
• How does the sender detect congesNon?
• How does the sender adjust its sending rate? – To address three issues
• Finding available bokleneck bandwidth • AdjusNng to bandwidth variaNons • Sharing bandwidth
95
Detecting Congestion"• Packet delays
– Tricky: noisy signal (delay often varies considerably)
• Router tell endhosts they’re congested
• Packet loss – Fail-safe signal that TCP already has to detect – Complication: non-congestive loss (checksum errors)
• Two indicators of packet loss – No ACK aner certain Nme interval: Nmeout – MulNple duplicate ACKs
96
Topic 5 17
Not All Losses the Same
• Duplicate ACKs: isolated loss – Still getting ACKs
• Timeout: much more serious – Not enough dupacks – Must have suffered several losses
• Will adjust rate differently for each case
97
Rate Adjustment
• Basic structure: – Upon receipt of ACK (of new data): increase rate – Upon detecNon of loss: decrease rate
• How we increase/decrease the rate depends on the phase of congesNon control we’re in: – Discovering available bokleneck bandwidth vs. – AdjusNng to bandwidth variaNons
98
Bandwidth Discovery with Slow Start"
• Goal: estimate available bandwidth – start slow (for safety) – but ramp up quickly (for efficiency)
• Consider – RTT = 100ms, MSS=1000bytes – Window size to fill 1Mbps of BW = 12.5 packets – Window size to fill 1Gbps = 12,500 packets – Either is possible!
99
“Slow Start” Phase"• Sender starts at a slow rate but increases
exponentially until first loss
• Start with a small congestion window – Initially, CWND = 1 – So, initial sending rate is MSS/RTT
• Double the CWND for each RTT with no loss
100
Slow Start in Action"
• For each RTT: double CWND
• Simpler implementaNon: for each ACK, CWND += 1
D A D D A A D D
Src
Dest
D D
1 2 4 3
A A A A
8
101
AdjusNng to Varying Bandwidth
• Slow start gave an esNmate of available bandwidth
• Now, want to track variaNons in this available bandwidth, oscillaNng around its current value – Repeated probing (rate increase) and backoff (rate decrease)
• TCP uses: “AddiNve Increase MulNplicaNve Decrease” (AIMD) – We’ll see why shortly…
102
Topic 5 18
AIMD
• Additive increase – Window grows by one MSS for every RTT with no
loss – For each successful RTT, CWND = CWND + 1 – Simple implementation:
• for each ACK, CWND = CWND+ 1/CWND
• Multiplicative decrease – On loss of packet, divide congestion window in half – On loss, CWND = CWND/2
103
Leads to the TCP “Sawtooth”"
Loss
Exponential “slow start”
t
Window
104
Slow-‐Start vs. AIMD
• When does a sender stop Slow-‐Start and start AddiNve Increase?
• Introduce a “slow start threshold” (ssthresh) – IniNalized to a large value – On Nmeout, ssthresh = CWND/2
• When CWND = ssthresh, sender switches from slow-‐start to AIMD-‐style increase
105 105
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD
106
Why AIMD?"
107
Recall: Three Issues
• Discovering the available (bokleneck) bandwidth – Slow Start
• AdjusNng to variaNons in bandwidth – AIMD
• Sharing bandwidth between flows 108
Topic 5 19
Goals for bandwidth sharing
• Efficiency: High uNlizaNon of link bandwidth
• Fairness: Each flow gets equal share
109
Why AIMD?
• Some rate adjustment opNons: Every RTT, we can – MulNplicaNve increase or decrease: CWND→ a*CWND
– AddiNve increase or decrease: CWND→ CWND + b
• Four alternaNves: – AIAD: gentle increase, gentle decrease – AIMD: gentle increase, drasNc decrease – MIAD: drasNc increase, gentle decrease – MIMD: drasNc increase and decrease
110
Simple Model of Congestion Control"
User 1’s rate (x1)
User 2’s rate (x
2)
Fairness line
Efficiency line
1
1
• Two users – rates x1 and x2
• Congestion when x1+x2 > 1
• Unused capacity when x1+x2 < 1
• Fair when x1 =x2
congested !
" inefficient
111
Example"
User 1: x1
Use
r 2: x
2
fairness line
efficiency line
1
1
Inefficient: x1+x2=0.7
(0.2, 0.5)
Congested: x1+x2=1.2
(0.7, 0.5)
Efficient: x1+x2=1 Fair
(0.5, 0.5)
congested !
" inefficient
Efficient: x1+x2=1 Not fair
(0.7, 0.3)
112
AIAD"
User 1: x1
Use
r 2: x
2
fairness line
efficiency line
(x1,x2)
(x1-aD,x2-aD)
(x1-aD+aI), x2-aD+aI)) • Increase: x + aI
• Decrease: x - aD
• Does not converge to fairness
congested !
" inefficient
113
MIMD"
User 1: x1
Use
r 2: x
2
fairness line
efficiency line
(x1,x2)
(bdx1,bdx2)
(bIbDx1, bIbDx2)
• Increase: x*bI • Decrease: x*bD
• Does not converge to fairness
congested !
" inefficient
114
Topic 5 20
(bDx1+aI, bDx2+aI)
AIMD"
User 1: x1
Use
r 2: x
2
fairness line
efficiency line
(x1,x2)
(bDx1,bDx2)
• Increase: x+aI • Decrease: x*bD
• Converges to fairness
congested !
" inefficient
115 116
Why is AIMD fair? (a preky animaNon…)
Two compeNng sessions: • AddiNve increase gives slope of 1, as throughout increases
• mulNplicaNve decrease decreases throughput proporNonally
R
R
equal bandwidth share
Bandwidth for ConnecNon 1
Ban
dwid
th fo
r Con
necNon
2
congesNon avoidance: addiNve increase
loss: decrease window by factor of 2
congesNon avoidance: addiNve increase loss: decrease window by factor of 2
117
AIMD Sharing Dynamics"A
50 pkts/sec B x1
D E
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
Rates equalize ! fair share
x2
117 118
AIAD Sharing Dynamics"
A B x1
D E
0
10
20
30
40
50
60
1 28 55 82 109
136
163
190
217
244
271
298
325
352
379
406
433
460
487
x2
118
TCP Congestion Control Details"
119
ImplementaNon
• State at sender – CWND (iniNalized to a small constant) – ssthresh (iniNalized to a large constant) – [Also dupACKcount and Nmer, as before]
• Events – ACK (new data) – dupACK (duplicate ACK for old data) – Timeout
120
Topic 5 21
Event: ACK (new data)
• If CWND < ssthresh – CWND += 1
• CWND packets per RTT • Hence aIer one RTT
with no drops: CWND = 2xCWND
121
Event: ACK (new data)
• If CWND < ssthresh – CWND += 1
• Else – CWND = CWND + 1/CWND
Slow start phase
• CWND packets per RTT • Hence aIer one RTT
with no drops: CWND = CWND + 1
“Conges)on Avoidance” phase (addi)ve increase)
122
Event: TimeOut
• On Timeout – ssthresh " CWND/2 – CWND " 1
123
Event: dupACK
• dupACKcount ++
• If dupACKcount = 3 /* fast retransmit */ – ssthresh = CWND/2 – CWND = CWND/2
124
Example"
t
Window
Slow-start restart: Go back to CWND = 1 MSS, but take advantage of knowing the previous value of CWND
Slow start in operation until it reaches half of previous CWND, I.e., SSTHRESH
Timeout Fast Retransmission
SSThresh Set to Here
125
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD, Fast-‐Recovery
126 126
Topic 5 22
One Final Phase: Fast Recovery
• The problem: congesNon avoidance too slow in recovering from an isolated loss
127
Example (in units of MSS, not bytes)
• Consider a TCP connecNon with: – CWND=10 packets – Last ACK was for packet # 101
• i.e., receiver expecNng next packet to have seq. no. 101
• 10 packets [101, 102, 103,…, 110] are in flight – Packet 101 is dropped – What ACKs do they generate? – And how does the sender respond?
128
Timeline
• ACK 101 (due to 102) cwnd=10 dupACK#1 (no xmit) • ACK 101 (due to 103) cwnd=10 dupACK#2 (no xmit) • ACK 101 (due to 104) cwnd=10 dupACK#3 (no xmit) • RETRANSMIT 101 ssthresh=5 cwnd= 5 • ACK 101 (due to 105) cwnd=5 + 1/5 (no xmit) • ACK 101 (due to 106) cwnd=5 + 2/5 (no xmit) • ACK 101 (due to 107) cwnd=5 + 3/5 (no xmit) • ACK 101 (due to 108) cwnd=5 + 4/5 (no xmit) • ACK 101 (due to 109) cwnd=5 + 5/5 (no xmit) • ACK 101 (due to 110) cwnd=6 + 1/5 (no xmit) • ACK 111 (due to 101) # only now can we transmit new packets • Plus no packets in flight so ACK “clocking” (to increase CWND) stalls for
another RTT
129 129
SoluNon: Fast Recovery
Idea: Grant the sender temporary “credit” for each dupACK so as to keep packets in flight • If dupACKcount = 3
– ssthresh = cwnd/2 – cwnd = ssthresh + 3
• While in fast recovery – cwnd = cwnd + 1 for each addiNonal duplicate ACK
• Exit fast recovery aner receiving new ACK – set cwnd = ssthresh
130
Example
• Consider a TCP connecNon with: – CWND=10 packets – Last ACK was for packet # 101
• i.e., receiver expecNng next packet to have seq. no. 101
• 10 packets [101, 102, 103,…, 110] are in flight – Packet 101 is dropped
131
Timeline
• ACK 101 (due to 102) cwnd=10 dup#1 • ACK 101 (due to 103) cwnd=10 dup#2 • ACK 101 (due to 104) cwnd=10 dup#3 • REXMIT 101 ssthresh=5 cwnd= 8 (5+3) • ACK 101 (due to 105) cwnd= 9 (no xmit) • ACK 101 (due to 106) cwnd=10 (no xmit) • ACK 101 (due to 107) cwnd=11 (xmit 111) • ACK 101 (due to 108) cwnd=12 (xmit 112) • ACK 101 (due to 109) cwnd=13 (xmit 113) • ACK 101 (due to 110) cwnd=14 (xmit 114) • ACK 111 (due to 101) cwnd = 5 (xmit 115) # exiNng fast recovery • Packets 111-‐114 already in flight • ACK 112 (due to 111) cwnd = 5 + 1/5 " back in congesNon avoidance
Topic 5 23
Pu�ng it all together: The TCP State Machine (parNal)
• How are ssthresh, CWND and dupACKcount updated for each event that causes a state transiNon?
slow start
congstn. avoid.
fast recovery
cwnd > ssthresh
)meout
dupACK=3
)meout
dupACK=3 new ACK
dupACK
new ACK
)meout new ACK
TCP Flavors
• TCP-‐Tahoe – cwnd =1 on triple dupACK
• TCP-‐Reno – cwnd =1 on Nmeout – cwnd = cwnd/2 on triple dupack
• TCP-‐newReno – TCP-‐Reno + improved fast recovery
• TCP-‐SACK – incorporates selecNve acknowledgements
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD, Fast-‐Recovery, Throughput
135
TCP Throughput Equation"
136
A
A Simple Model for TCP Throughput
Timeouts
t
cwnd
1
RTT
maxW
2maxW
½ Wmax RTTs between drops
Avg. ¾ Wmax packets per RTTs
A
A Simple Model for TCP Throughput
Timeouts
t
cwnd
maxW
2maxW
Packet drop rate, p =1/ A, where A = 38Wmax
2
Throughput, B = AWmax
2!
"#
$
%&RTT
=32
1RTT p
138
Topic 5 24
Some implicaNons: (1) Fairness
• Flows get throughput inversely proporNonal to RTT – Is this fair?
Throughput, B = 32
1RTT p
Some ImplicaNons: (2) How does this look at high speed?
• Assume that RTT = 100ms, MSS=1500bytes
• What value of p is required to go 100Gbps? – Roughly 2 x 10-‐12
• How long between drops? – Roughly 16.6 hours
• How much data has been sent in this Nme? – Roughly 6 petabits
• These are not pracNcal numbers!
140
Some implicaNons: (3) Rate-‐based CongesNon Control
• One can dispense with TCP and just match eqtn: – EquaNon-‐based congesNon control – Measure drop percentage p, and set rate accordingly – Useful for streaming applicaNons
Throughput, B = 32
1RTT p
Some ImplicaNons: (4) Lossy Links
• TCP assumes all losses are due to congesNon
• What happens when the link is lossy?
• Throughput ~ 1/sqrt(p) where p is loss prob.
• This applies even for non-‐congesNon losses!
142
Other Issues: CheaNng
• CheaNng pays off
• Some favorite approaches to cheaNng: – Increasing CWND faster than 1 per RTT – Using large iniNal CWND
– Opening many connecNons
Increasing CWND Faster
A B x
D E y
Limit rates: x = 2y
C
x
y
x increases by 2 per RTT y increases by 1 per RTT
144
Topic 5 25
A Closer look at problems with
TCP CongesNon Control
145
TCP State Machine
slow start
congstn. avoid.
fast recovery
cwnd > ssthresh
)meout
dupACK=3
)meout
dupACK=3 new ACK
dupACK
new ACK
)meout
new ACK dupACK
dupACK
146
TCP State Machine
slow start
congstn. avoid.
fast recovery
cwnd > ssthresh
)meout
dupACK=3
)meout
dupACK=3 new ACK
dupACK
new ACK
)meout
new ACK dupACK
dupACK
TCP State Machine
slow start
congstn. avoid.
fast recovery
cwnd > ssthresh
)meout
dupACK=3
)meout
dupACK=3 new ACK
dupACK
new ACK
)meout
new ACK dupACK
dupACK
148
TCP State Machine
slow start
congstn. avoid.
fast recovery
cwnd > ssthresh
)meout
dupACK=3
)meout
dupACK=3 new ACK
dupACK
new ACK
)meout
new ACK dupACK
dupACK
TCP Flavors
• TCP-‐Tahoe – CWND =1 on triple dupACK
• TCP-‐Reno – CWND =1 on Nmeout – CWND = CWND/2 on triple dupack
• TCP-‐newReno – TCP-‐Reno + improved fast recovery
• TCP-‐SACK – incorporates selecNve acknowledgements
Our default assumpNon
150
Topic 5 26
Interoperability
• How can all these algorithms coexist? Don’t we need a single, uniform standard?
• What happens if I’m using Reno and you are using Tahoe, and we try to communicate?
151
TCP Throughput Equation"
152
A
A Simple Model for TCP Throughput
Loss
t
cwnd
1
RTT
maxW
2maxW
½ Wmax RTTs between drops
Avg. ¾ Wmax packets per RTTs 153
A
A Simple Model for TCP Throughput
Loss
t
cwnd
maxW
2maxW
Packet drop rate, p =1/ A, where A = 38Wmax
2
Throughput, B = AWmax
2!
"#
$
%&RTT
=32
1RTT p
154
ImplicaNons (1): Different RTTs
• Flows get throughput inversely proporNonal to RTT • TCP unfair in the face of heterogeneous RTTs!
Throughput = 32
1RTT p
A1
A2 B2
B1
boZleneck link
100ms
200ms
155
ImplicaNons (2): High Speed TCP
• Assume RTT = 100ms, MSS=1500bytes
• What value of p is required to reach 100Gbps throughput – ~ 2 x 10-‐12
• How long between drops? – ~ 16.6 hours
• How much data has been sent in this Nme? – ~ 6 petabits
• These are not pracNcal numbers!
Throughput = 32
1RTT p
156
Topic 5 27
AdapNng TCP to High Speed
– Once past a threshold speed, increase CWND faster – A proposed standard [Floyd’03]: once speed is past some threshold, change equaNon to p-‐.8 rather than p-‐.5
– Let the addiNve constant in AIMD depend on CWND
• Other approaches? – MulNple simultaneous connecNons (hack but works today)
– Router-‐assisted approaches (will see shortly)
157
ImplicaNons (3): Rate-‐based CC
• TCP throughput is “choppy” – repeated swings between W/2 to W
• Some apps would prefer sending at a steady rate – e.g., streaming apps
• A soluNon: “EquaNon-‐Based CongesNon Control” – ditch TCP’s increase/decrease rules and just follow the equaNon – measure drop percentage p, and set rate accordingly
• Following the TCP equaNon ensures we’re “TCP friendly” – i.e., use no more than TCP does in similar se�ng
Throughput = 32
1RTT p
158
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD, Fast-‐Recovery, Throughput
• LimitaNons of TCP CongesNon Control
159 159
Other LimitaNons of TCP CongesNon Control
160
(4) Loss not due to congesNon?
• TCP will confuse any loss event with congesNon
• Flow will cut its rate – Throughput ~ 1/sqrt(p) where p is loss prob. – Applies even for non-‐congesNon losses!
• We’ll look at proposed soluNons shortly… 161
(5) How do short flows fare?
• 50% of flows have < 1500B to send; 80% < 100KB
• ImplicaNon (1): short flows never leave slow start! – short flows never akain their fair share
• ImplicaNon (2): too few packets to trigger dupACKs – Isolated loss may lead to Nmeouts
– At typical Nmeout values of ~500ms, might severely impact flow compleNon Nme
162 162
Topic 5 28
(6) TCP fills up queues ! long delays
• A flow deliberately overshoots capacity, unNl it experiences a drop
• Means that delays are large for everyone – Consider a flow transferring a 10GB file sharing a bokleneck link with 10 flows transferring 100B
163
(7) CheaNng
• Three easy ways to cheat – Increasing CWND faster than +1 MSS per RTT
164
Increasing CWND Faster
Limit rates: x = 2y
C
x
y x increases by 2 per RTT y increases by 1 per RTT
165
(7) CheaNng
• Three easy ways to cheat – Increasing CWND faster than +1 MSS per RTT – Opening many connecNons
166
Open Many ConnecNons
A B x
D E y
Assume • A starts 10 connecNons to B • D starts 1 connecNon to E • Each connecNon gets about the same throughput Then A gets 10 Nmes more throughput than D
167
(7) CheaNng
• Three easy ways to cheat – Increasing CWND faster than +1 MSS per RTT – Opening many connecNons
– Using large iniNal CWND
• Why hasn’t the Internet suffered a congesNon collapse yet?
168
Topic 5 29
(8) CC intertwined with reliability
$ Mechanisms for CC and reliability are Nghtly coupled $ CWND adjusted based on ACKs and Nmeouts
$ CumulaNve ACKs and fast retransmit/recovery rules
$ Complicates evoluNon $ Consider changing from cumulaNve to selecNve ACKs
$ A failure of modularity, not layering
$ SomeNmes we want CC but not reliability $ e.g., real-‐Nme applicaNons
$ SomeNmes we want reliability but not CC (?) 169 169
Recap: TCP problems
• Misled by non-‐congesNon losses • Fills up queues leading to high delays • Short flows complete before discovering available capacity • AIMD impracNcal for high speed links
• Sawtooth discovery too choppy for some apps
• Unfair under heterogeneous RTTs • Tight coupling with reliability mechanisms
• Endhosts can cheat
Could fix many of these with some help from routers!
Routers tell endpoints if they’re congested
Routers tell endpoints what rate to send at
Routers enforce fair sharing
170
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD, Fast-‐Recovery, Throughput
• LimitaNons of TCP CongesNon Control
• Router-‐assisted CongesNon Control
171 171
Router-‐Assisted CongesNon Control
• Three tasks for CC: – IsolaNon/fairness – Adjustment
– DetecNng congesNon
172
How can routers ensure each flow gets its “fair share”?
173
Fairness: General Approach
• Routers classify packets into “flows” – (For now) flows are packets between same source/desNnaNon
• Each flow has its own FIFO queue in router
• Router services flows in a fair fashion – When line becomes free, take packet from next flow in a fair order
• What does “fair” mean exactly?
174
Topic 5 30
Max-Min Fairness"• Given set of bandwidth demands ri and total bandwidth
C, max-min bandwidth allocations are: ai = min(f, ri)
where f is the unique value such that Sum(ai) = C
r1
r2
r3
? ?
? C bits/s
175 175
Example"• C = 10; r1 = 8, r2 = 6, r3 = 2; N = 3 • C/3 = 3.33 →
– Can service all of r3 – Remove r3 from the accounting: C = C – r3 = 8; N = 2
• C/2 = 4 → – Can’t service all of r1 or r2 – So hold them to the remaining fair share: f = 4
8 6 2
4 4
2
f = 4: min(8, 4) = 4 min(6, 4) = 4 min(2, 4) = 2
10
176
Max-Min Fairness"• Given set of bandwidth demands ri and total bandwidth
C, max-min bandwidth allocations are: ai = min(f, ri)
• where f is the unique value such that Sum(ai) = C
• Property: – If you don’t get full demand, no one gets more than you
• This is what round-robin service gives if all packets are
the same size
177
How do we deal with packets of different sizes?
• Mental model: Bit-‐by-‐bit round robin (“fluid flow”)
• Can you do this in pracNce?
• No, packets cannot be preempted
• But we can approximate it – This is what “fair queuing” routers do
178
Fair Queuing (FQ)
• For each packet, compute the Nme at which the last bit of a packet would have len the router if flows are served bit-‐by-‐bit
• Then serve packets in the increasing order of their deadlines
179
Example
1 2 3 4 5
1 2 3 4
1 2 3
1 2 4
3 4 5
5 6
1 2 1 3 2 3 4 4
5 6
5 5 6
Flow 1 (arrival traffic)
Flow 2 (arrival traffic)
Service in fluid flow system
FQ Packet system
Nme
Nme
Nme
Nme
180
Topic 5 31
Fair Queuing (FQ)
• Think of it as an implementation of round-robin generalized to the case where not all packets are equal sized
• Weighted fair queuing (WFQ): assign different flows different shares
• Today, some form of WFQ implemented in almost all routers – Not the case in the 1980-90s, when CC was being developed – Mostly used to isolate traffic at larger granularities (e.g., per-prefix)
181
FQ vs. FIFO
• FQ advantages: – IsolaNon: cheaNng flows don’t benefit – Bandwidth share does not depend on RTT – Flows can pick any rate adjustment scheme they want
• Disadvantages:
– More complex than FIFO: per flow queue/state, addiNonal per-‐packet book-‐keeping
FQ in the big picture
• FQ does not eliminate congesNon ! it just manages the congesNon
1Gbps 100M
bps
1Gbps
5Gbps 1Gbps
Blue and Green get 0.5Gbps; any excess will be dropped
Will drop an addiNonal 400Mbps from the green flow
If the green flow doesn’t drop its sending rate to 100Mbps, we’re wasNng 400Mbps that could be
usefully given to the blue flow
FQ in the big picture
• FQ does not eliminate congesNon ! it just manages the congesNon – robust to cheaNng, variaNons in RTT, details of delay, reordering, retransmission, etc.
• But congesNon (and packet drops) sNll occurs
• And we sNll want end-‐hosts to discover/adapt to their fair share!
• What would the end-‐to-‐end argument say w.r.t. congesNon control?
Fairness is a controversial goal
• What if you have 8 flows, and I have 4? – Why should you get twice the bandwidth
• What if your flow goes over 4 congested hops, and mine only goes over 1? – Why shouldn’t you be penalized for using more scarce bandwidth?
• And what is a flow anyway? – TCP connecNon – Source-‐DesNnaNon pair? – Source?
Router-‐Assisted CongesNon Control
• CC has three different tasks: – IsolaNon/fairness – Rate adjustment
– DetecNng congesNon
Topic 5 32
Why not just let routers tell endhosts what rate they should use?
• Packets carry “rate field”
• Routers insert “fair share” f in packet header – Calculated as with FQ
• End-hosts set sending rate (or window size) to f – hopefully (still need some policing of endhosts!)
• This is the basic idea behind the “Rate Control Protocol” (RCP) from Dukkipati et al. ’07
188
Flow Completion Time: TCP vs. RCP (Ignore XCP)
Flow DuraNon (secs) vs. Flow Size # AcNve Flows vs. Nme
RCP
RCP
Why the improvement?
189
Router-‐Assisted CongesNon Control
• CC has three different tasks: – IsolaNon/fairness – Rate adjustment
– DetecNng congesNon
190
Explicit CongesNon NoNficaNon (ECN)
• Single bit in packet header; set by congested routers – If data packet has bit set, then ACK has ECN bit set
• Many opNons for when routers set the bit – tradeoff between (link) uNlizaNon and (packet) delay
• CongesNon semanNcs can be exactly like that of drop – I.e., endhost reacts as though it saw a drop
• Advantages: – Don’t confuse corrupNon with congesNon; recovery w/ rate adjustment
– Can serve as an early indicator of congesNon to avoid delays – Easy (easier) to incrementally deploy
• defined as extension to TCP/IP in RFC 3168 (uses diffserv bits in the IP header) 191
One final proposal: Charge people for congesNon!
• Use ECN as congesNon markers
• Whenever I get an ECN bit set, I have to pay $$
• Now, there’s no debate over what a flow is, or what fair is…
• Idea started by Frank Kelly here in Cambridge – “opNmal” soluNon, backed by much math
– Great idea: simple, elegant, effecNve
– Unclear that it will impact pracNce – although London congesNon works
192
Topic 5 33
Synchronized Flows Many TCP Flows • Aggregate window has same
dynamics • Therefore buffer occupancy has
same dynamics • Rule-of-thumb still holds.
• Independent, desynchronized • Central limit theorem says the
aggregate becomes Gaussian • Variance (buffer size) decreases
as N increases
Some TCP issues outstanding…
Probability DistribuNon
t
Buffer Size
t
193
• What does TCP do? – ARQ windowing, set-‐up, tear-‐down
• Flow Control in TCP • CongesNon Control in TCP
– AIMD, Fast-‐Recovery, Throughput
• LimitaNons of TCP CongesNon Control
• Router-‐assisted CongesNon Control
194
TCP in detail
Recap
• TCP: – somewhat hacky – but pracNcal/deployable – good enough to have raised the bar for the deployment of new, more opNmal, approaches
– though the needs of datacenters might change the status quos
195