Congestion
[Figure: Source1 (100-Mbps FDDI) and Source2 (10-Mbps Ethernet) converge at a router whose outgoing 1.5-Mbps T1 link to the destination is the bottleneck]
• Can’t sustain input rate >> output rate
• Issues:
  - Avoid congestion
  - Control congestion
  - Prioritize who gets limited resources
Taxonomy of approaches
• Router-centric vs. host-centric
  - hosts at the edges of the network (transport protocol)
  - routers inside the network (queuing discipline)
• Reservation based vs. feedback based
  - pre-allocate resources so as to avoid congestion
  - control congestion if (and when) it occurs
• Window based vs. rate based
• Best-effort (today) vs. multiple QoS (Thursday)
Router design issues
• Scheduling discipline
  - Which of multiple packets should you send next?
  - May want to achieve some notion of fairness
  - May want some packets to have priority
• Drop policy
  - When should you discard a packet?
  - Which packet to discard?
  - Some packets more important (perhaps BGP)
  - Some packets useless w/o others (cells in AAL5 CS-PDU)
  - Need to balance throughput & delay
Example: FIFO tail drop
[Figure: (a) an arriving packet takes the next free buffer behind the queued packets, ahead of the next packet to transmit; (b) with no free buffers left, the arriving packet is dropped]
• Differentiates packets only by when they arrive
• Might not provide useful feedback for sending hosts
What to optimize for?
• Fairness (in two slides)
• High throughput – queue should never be empty
• Low delay – so want short queues
• Crude combination: power = Throughput/Delay
  - Want to convince hosts to offer optimal load
[Figure: throughput/delay ("power") plotted against load, peaking at the optimal load]
Connectionless flows
[Figure: Source1, Source2, and Source3 send through a chain of routers to Destination1 and Destination2; each source-destination pair forms a flow]
• Even in Internet, routers can have a notion of flows
  - E.g., based on IP addresses & TCP ports (or hash of those)
  - Soft state—doesn’t have to be correct
  - But if often correct, can use to form router policies
Fairness
• What is fair in this situation?
  - Each flow gets 1/2 link b/w? Long flow gets less?
• Usually fair means equal
  - For flow bandwidths (x1, . . . , xn), fairness index:
f(x1, . . . , xn) = (∑_{i=1}^n xi)² / (n · ∑_{i=1}^n xi²)
  - If all xis are equal, fairness is one
• So what policy should routers follow?
  - First, we have to understand what TCP is doing
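This fairness index (due to Jain) is easy to compute directly; a short Python sketch (the function name is mine):

```python
def fairness_index(xs):
    """Jain's fairness index: (sum xi)^2 / (n * sum xi^2).
    Equals 1.0 when all flows get equal bandwidth, and approaches
    1/n when a single flow takes everything."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))
```

For example, four equal flows give `fairness_index([10, 10, 10, 10]) == 1.0`, while one flow hogging the link gives `fairness_index([20, 0, 0, 0]) == 0.25`.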
TCP Congestion Control
• Idea
  - Assumes best-effort network
  - Each source determines network capacity for itself
  - Uses implicit feedback (delay, drops)
  - ACKs pace transmission (self-clocking)
• Challenge
  - Determining the available capacity in the first place
  - Adjusting to changes in the available capacity
Detecting congestion
• Question: how does the source determine whether or not the network is congested?
• Answer: a timeout occurs
  - Timeout signals that a packet was lost
  - Packets are seldom lost due to transmission error
  - Lost packet implies congestion
Dealing with congestion
• TCP keeps congestion & flow control windows
  - Max packets in flight is lesser of two
• After a packet loss, must reduce cong. window
  - This controls the congestion
  - But how much to reduce?
• Idea: conservation of packets at equilibrium
  - Want to keep roughly same number of packets in network
  - By analogy with water in fixed-size pipe
  - Put new packet into network when one exits
How much to reduce window?
• Let’s build a crude model of network
  - Let Li be load of network (# pkts it contains) at time i
  - If network uncongested, roughly constant Li = N
• Now what happens under congestion?
  - Some fraction γ of packets can’t exit network
  - So now Li = N + γ · Li−1, or Li ≈ g^i · L0 for some g > 1
  - Congestion increases exponentially (w. infinite buffers)
• Requires multiplicative decrease of window size
  - TCP chooses to cut window in half
How to use extra capacity?
• Must adjust as extra capacity becomes available
  - Unlike drops for congestion, no explicit signal
  - Instead, try to send slightly faster, see if it works
  - So need to increase window when no losses – how much?
• Multiplicative increase
  - But easier to saturate net than to recover (rush-hour effect)
  - Multiplicative increase is so fast it will inevitably lead to saturation
• Additive increase won’t saturate net
  - So Additive Increase, Multiplicative Decrease, AIMD
Additive Increase
[Figure: source-to-destination timeline; the window grows by one packet per RTT as ACKs return]
Implementation
• In practice, sending MSS-sized frames
  - Let window size in bytes be w, should be multiple of MSS
• Increase:
  - After a window’s worth (w bytes) ACKed, could set w ← w + MSS
  - Smoother to increment window on each ACK received: w ← w + MSS · MSS/w
• Decrease:
  - After a packet loss, w ← w/2
  - But don’t want w < MSS
  - So react differently to multiple consecutive losses
  - Back off exponentially (pause with no packets in flight)
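A minimal Python sketch of these per-ACK and per-loss window updates (function names and the MSS value are illustrative, not TCP's actual implementation):

```python
MSS = 1460  # bytes; a typical Ethernet-derived MSS, used here for illustration

def on_ack(w):
    """Additive increase, smoothed per ACK: after a full window's
    worth of ACKs, the increments sum to roughly one MSS."""
    return w + MSS * MSS / w

def on_loss(w):
    """Multiplicative decrease: halve the window, but never below one MSS."""
    return max(w / 2, MSS)
```

Starting from `w = 14600` (ten segments), one ACK yields `on_ack(14600) == 14746.0`, i.e. a tenth of an MSS; a loss yields `on_loss(14600) == 7300.0`.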
AIMD trace
• Window trace produces sawtooth pattern:
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-10.0); the window climbs linearly and halves on each loss]
Slow start
• Question: Where to set w initially?
  - Should start at 1 MSS (to avoid overloading network)
  - But additive ramp-up too slow on fast net
• Start by doubling window each RTT
  - Then at most will dump one extra window into network
• Slow start? This sounds like fast start?
  - In contrast to what happened before Jacobson/Karels work
  - Sender would dump an entire flow control window into net
• Slow start used in multiple situations
  - Connection start time & after timeout
Slow start picture
[Figure: source-to-destination timeline; the number of packets sent doubles each RTT (1, 2, 4, ...) as ACKs return]
Slow start implementation
• We are doubling w after each RTT
  - But receiving w packets each RTT
  - So can set w ← w + MSS on every ACK received
• Now implementation has to keep track of three limits
  - AvailableWindow – for flow control
  - CongestionThreshold – old congestion window
  - CongestionWindow – smaller than threshold during slow start
• Slow start only up to CongestionThreshold
  - Remember last value
  - When reached, go back to additive increase
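The interplay of window, threshold, and the two growth modes can be sketched as a small state machine; class and field names are mine, and the initial threshold is an arbitrary assumption:

```python
MSS = 1460  # bytes; illustrative segment size

class CongestionState:
    """Toy model of TCP window growth: slow start below the
    threshold, additive increase above it, multiplicative decrease
    plus a restart of slow start on timeout."""
    def __init__(self):
        self.cwnd = MSS            # CongestionWindow: start at 1 MSS
        self.ssthresh = 64 * 1024  # CongestionThreshold: initially large (assumed)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                     # slow start: doubles each RTT
        else:
            self.cwnd += MSS * MSS // self.cwnd  # additive increase

    def on_timeout(self):
        # Remember half the window where trouble hit, then restart slow start
        self.ssthresh = max(self.cwnd // 2, 2 * MSS)
        self.cwnd = MSS
```

A usage example: after one ACK the window is 2 MSS; a timeout then drops it back to 1 MSS with the threshold floored at 2 MSS.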
Fast retransmit & fast recovery
• Problem: Coarse-grain TCP timeouts
  - Have to be conservative about RTT
  - Net will sit idle while waiting for a timeout
  - Worse, TCP intentionally keeps bumping head against limit
• Solution: Fast retransmit
  - Use 3 duplicate ACKs to trigger retransmission
  - If more than one packet was lost, still need timeout
  - Else, halve w, but otherwise keep sending
  - No need to set w ← MSS and use slow start
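The duplicate-ACK counting rule can be sketched as follows; this is a simplification (names are mine, and it is not a full TCP state machine, which must also handle SACK, reordering, etc.):

```python
DUP_ACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit

def check_fast_retransmit(last_ack, new_ack, dup_count):
    """Return (new_dup_count, should_retransmit) after receiving new_ack.
    The caller tracks last_ack, the highest cumulative ACK seen so far."""
    if new_ack == last_ack:
        dup_count += 1
        # Fire exactly once, on the third duplicate
        return dup_count, dup_count == DUP_ACK_THRESHOLD
    return 0, False  # new data ACKed; reset the duplicate counter
```

Three ACKs for segment 2 in a row cause a retransmission of the segment after 2; a later ACK for new data resets the count.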
Fast retransmit picture
[Figure: sender transmits packets 1-6; packet 3 is lost, so the receiver keeps sending ACK 2; after three duplicate ACK 2s the sender retransmits packet 3, and the receiver then ACKs 6]
Before fast retransmit
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-9.0) when losses are recovered only by timeout]
With fast retransmit
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-7.0); with fast retransmit the same transfer completes sooner]
Congestion Avoidance
• TCP’s strategy
  - Control congestion once it happens
  - Repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
• Alternative strategy
  - Predict when congestion is about to happen
  - Reduce rate before packets start being discarded
  - Call this congestion avoidance, instead of congestion control
• Two possibilities
  - Host-centric: TCP Vegas
  - Router-centric: DECbit and RED Gateways
TCP Vegas
Idea: source watches for some sign that the router’s queue is building up and congestion will happen—e.g., RTT grows or sending rate flattens.
[Figure: three aligned traces vs. time (seconds): congestion window (KB), sending rate (KBps), and queue size in the bottleneck router (packets)]
TCP Vegas picture
[Figure: congestion window (KB) and sending rate (KBps) traces vs. time (seconds) for TCP Vegas]
Fair Queuing (FQ)
• Explicitly segregates traffic based on flows
• Ensures no flow consumes more than its share
• Variation: weighted fair queuing (WFQ)
[Figure: four flows (Flow 1-Flow 4) arrive on separate queues and receive round-robin service onto the output link]
FQ Algorithm
• Suppose clock ticks each time a bit is transmitted
• Let Pi denote the length of packet i
• Let Si denote the time when the router starts to transmit packet i
• Let Fi denote the time when the router finishes transmitting packet i
• Fi = Si + Pi
• When does the router start transmitting packet i?
  - If it arrived before the router finished packet i−1 from this flow, then immediately after the last bit of i−1 (i.e., at Fi−1)
  - If no current packets for this flow, then start transmitting when it arrives (call this Ai)
• Thus: Fi = max(Fi−1, Ai) + Pi
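The finish-time recurrence translates directly into code; a sketch with all times in bit-transmission ticks (the function name is mine):

```python
def finish_time(prev_finish, arrival, length):
    """Per-flow finish timestamp: Fi = max(F(i-1), Ai) + Pi.
    prev_finish is F(i-1) for this flow (0 if none), arrival is Ai,
    and length is Pi in bits."""
    return max(prev_finish, arrival) + length
```

For example, a 100-bit packet arriving at tick 5 on an idle flow finishes at 105; a second 100-bit packet arriving at tick 50 (while the first is still "in service") finishes at 205.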
FQ Algorithm (cont)
• For multiple flows
  - Calculate Fi for each packet that arrives on each flow
  - Treat all Fis as timestamps
  - Next packet to transmit is one with lowest timestamp
• Not perfect: can’t preempt current packet
• Example:
[Figure: (a) packets with finish times F = 8 and F = 10 queued on two flows; the one with the lower F is sent first. (b) A packet with a lower F arrives on Flow 1 while Flow 2 is already transmitting; it must wait, since the packet in progress is not preempted]
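Putting the per-flow finish times together with lowest-timestamp-first service gives a small scheduler; this is a sketch (class and method names are mine), using a heap to pick the lowest timestamp:

```python
import heapq

class FairQueue:
    """Sketch of fair queuing: stamp each packet with its finish time
    Fi = max(F(i-1), Ai) + Pi per flow, then transmit in timestamp order."""
    def __init__(self):
        self.last_finish = {}  # flow -> finish time of its latest packet
        self.heap = []         # (finish_time, seq, flow)
        self.seq = 0           # tie-breaker so heap ordering stays well-defined

    def enqueue(self, flow, arrival, length):
        f = max(self.last_finish.get(flow, 0), arrival) + length
        self.last_finish[flow] = f
        heapq.heappush(self.heap, (f, self.seq, flow))
        self.seq += 1

    def dequeue(self):
        # Lowest timestamp is sent next; note a packet already being
        # transmitted would not be preempted in a real router.
        f, _, flow = heapq.heappop(self.heap)
        return flow, f
```

With a 100-bit packet on flow "f1" and a 50-bit packet on flow "f2" both arriving at tick 0, the shorter f2 packet (finish time 50) is served before f1's (finish time 100).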
Random Early Detection (RED)
• Notification is implicit
  - just drop the packet (TCP will timeout)
  - could make explicit by marking the packet
• Early random drop
  - rather than wait for queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level
RED Details
• Compute average queue length:
  AvgLen = (1 − Weight) · AvgLen + Weight · SampleLen
  - 0 < Weight < 1 (usually 0.002)
  - SampleLen is the queue length each time a packet arrives
[Figure: instantaneous queue length fluctuates sharply over time while AvgLen tracks it smoothly; MinThreshold and MaxThreshold mark levels on AvgLen]
• Smooths out AvgLen over time
  - Don’t want to react to instantaneous fluctuations
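The averaging step is a standard exponentially weighted moving average; a one-line sketch using the Weight value from the slide:

```python
WEIGHT = 0.002  # typical RED weight, per the slide

def update_avg(avg_len, sample_len, weight=WEIGHT):
    """EWMA of queue length: AvgLen = (1 - Weight)*AvgLen + Weight*SampleLen.
    A small weight means bursts barely move the average."""
    return (1 - weight) * avg_len + weight * sample_len
```

With weight 0.5 and a jump from 100 to a sample of 200, the average moves only to 150; with the default 0.002 it barely moves at all, which is the point.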
RED Details (cont)
• Two queue length thresholds:
if AvgLen <= MinThreshold then
    enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
    calculate probability P
    drop arriving packet with probability P
if MaxThreshold <= AvgLen then
    drop arriving packet
RED Details (cont)
• Computing probability P
  - TempP = MaxP · (AvgLen − MinThreshold) / (MaxThreshold − MinThreshold)
  - P = TempP / (1 − count · TempP)
• Drop probability curve:
[Figure: P(drop) vs. AvgLen; zero below MinThresh, rising linearly to MaxP at MaxThresh, then jumping to 1.0]
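The drop decision might be sketched as follows; the threshold and MaxP values here are placeholders for illustration, not recommendations:

```python
def drop_probability(avg_len, count, min_th=5, max_th=15, max_p=0.02):
    """RED drop probability sketch. count is the number of packets
    enqueued since the last drop; dividing by (1 - count*TempP)
    spreads drops out more evenly over time."""
    if avg_len <= min_th:
        return 0.0          # below MinThreshold: always enqueue
    if avg_len >= max_th:
        return 1.0          # at or above MaxThreshold: always drop
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    return temp_p / (1 - count * temp_p)
```

Halfway between the thresholds with count = 0 this gives MaxP/2 = 0.01, matching the "one out of 50 packets at MaxP = 0.02" rule of thumb on the next slide only when AvgLen reaches MaxThreshold.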
Tuning RED
- Probability of dropping a particular flow’s packet(s) is roughly proportional to the share of the bandwidth that flow is currently getting
- MaxP is typically set to 0.02, meaning that when the average queue size is halfway between the two thresholds, the gateway drops roughly one out of 50 packets
- If traffic is bursty, then MinThreshold should be sufficiently large to allow link utilization to be maintained at an acceptably high level
- Difference between the two thresholds should be larger than the typical increase in the calculated average queue length in one RTT; setting MaxThreshold to twice MinThreshold is reasonable for traffic on today’s Internet
FPQ
• Problem: Tuning RED can be slightly tricky
• Observations:
  - TCP performs badly with window size under 4 packets: need 4 packets for 3 duplicate ACKs and fast retransmit
  - Can supply feedback through delay as well as through drops
• Solution: Make buffer size proportional to # flows
  - Few flows =⇒ low delay; many flows =⇒ low loss rate
  - Router automatically adjusts, far less tricky tuning required
  - Window size is a function of loss rate; keep min size
  - Transmit rate = Window size / RTT, RTT ∼ Qlen
• Clever algorithm estimates number of flows
  - Hash flow info, set bits, decay
  - Requires reasonable amount of storage
XCP
• New proposed IP protocol: XCP
  - Not compatible with TCP; requires router support
  - Idea: Have router tell us exactly what we want to know!
• Packets contain: cwnd, RTT, feedback field
• Router tells you whether to increase or decrease rate
  - Gives explicit rates for increase/decrease amounts
  - Later routers don’t override bottleneck router
  - Feedback returned to sender in ACKs