Congestion
[Figure: Source1 (100-Mbps FDDI) and Source2 (10-Mbps Ethernet) converge at a router whose outgoing 1.5-Mbps T1 link to the destination is the bottleneck]
• Can’t sustain input rate >> output rate
• Issues:
  - Avoid congestion
  - Control congestion
  - Prioritize who gets limited resources
Taxonomy of approaches
• Router-centric vs. host-centric
  - hosts at the edges of the network (transport protocol)
  - routers inside the network (queuing discipline)
• Reservation based vs. feedback based
  - pre-allocate resources so as to avoid congestion
  - control congestion if (and when) it occurs
• Window based vs. rate based
• Best-effort (today) vs. multiple QoS (Thursday)
Router design issues
• Scheduling discipline
  - Which of multiple packets should you send next?
  - May want to achieve some notion of fairness
  - May want some packets to have priority
• Drop policy
  - When should you discard a packet?
  - Which packet to discard?
  - Some packets more important (perhaps BGP)
  - Some packets useless w/o others (cells in AAL5 CS-PDU)
  - Need to balance throughput & delay
Example: FIFO tail drop
[Figure: (a) an arriving packet takes the next free buffer behind the queued packets, ahead of the next packet to transmit; (b) with no free buffers left, the arriving packet is dropped]
• Differentiates packets only by when they arrive
• Might not provide useful feedback for sending hosts
What to optimize for?
• Fairness (in two slides)
• High throughput – queue should never be empty
• Low delay – so want short queues
• Crude combination: power = Throughput/Delay
  - Want to convince hosts to offer optimal load
[Figure: throughput/delay ("power") plotted against load, peaking at the optimal load]
Connectionless flows
[Figure: Source1, Source2, and Source3 send through a chain of routers to Destination1 and Destination2; each source-destination pair forms a flow]
• Even in Internet, routers can have a notion of flows
  - E.g., based on IP addresses & TCP ports (or hash of those)
  - Soft state—doesn’t have to be correct
  - But if often correct, can use to form router policies
Fairness
• What is fair in this situation?
  - Each flow gets 1/2 link b/w? Long flow gets less?
• Usually fair means equal
  - For flow bandwidths (x1, . . . , xn), fairness index:
f(x1, . . . , xn) = (∑_{i=1}^n xi)² / (n · ∑_{i=1}^n xi²)
  - If all xis are equal, fairness is one
• So what policy should routers follow?
  - First, we have to understand what TCP is doing
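This fairness index (due to Jain) is easy to compute directly; a short Python sketch (the function name is mine):

```python
def fairness_index(xs):
    """Jain's fairness index: (sum xi)^2 / (n * sum xi^2).
    Equals 1.0 when all flows get equal bandwidth, and approaches
    1/n when a single flow takes everything."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))
```

For example, four equal flows give `fairness_index([10, 10, 10, 10]) == 1.0`, while one flow hogging the link gives `fairness_index([20, 0, 0, 0]) == 0.25`.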
TCP Congestion Control
• Idea
  - Assumes best-effort network
  - Each source determines network capacity for itself
  - Uses implicit feedback (delay, drops)
  - ACKs pace transmission (self-clocking)
• Challenge
  - Determining the available capacity in the first place
  - Adjusting to changes in the available capacity
Detecting congestion
• Question: how does the source determine whether or not the network is congested?
• Answer: a timeout occurs
  - Timeout signals that a packet was lost
  - Packets are seldom lost due to transmission error
  - Lost packet implies congestion
Dealing with congestion
• TCP keeps congestion & flow control windows
  - Max packets in flight is lesser of two
• After a packet loss, must reduce cong. window
  - This controls the congestion
  - But how much to reduce?
• Idea: conservation of packets at equilibrium
  - Want to keep roughly same number of packets in network
  - By analogy with water in fixed-size pipe
  - Put new packet into network when one exits
How much to reduce window?
• Let’s build a crude model of network
  - Let Li be load of network (# pkts it contains) at time i
  - If network uncongested, roughly constant Li = N
• Now what happens under congestion?
  - Some fraction γ of packets can’t exit network
  - So now Li = N + γ · Li−1, or Li ≈ g^i · L0 for some g > 1
  - Congestion increases exponentially (w. infinite buffers)
• Requires multiplicative decrease of window size
  - TCP chooses to cut window in half
How to use extra capacity?
• Must adjust as extra capacity becomes available
  - Unlike drops for congestion, no explicit signal
  - Instead, try to send slightly faster, see if it works
  - So need to increase window when no losses – how much?
• Multiplicative increase
  - But easier to saturate net than to recover (rush-hour effect)
  - Multiplicative increase is so fast it will inevitably lead to saturation
• Additive increase won’t saturate net
  - So Additive Increase, Multiplicative Decrease, AIMD
Additive Increase
[Figure: source-to-destination timeline; the window grows by one packet per RTT as ACKs return]
Implementation
• In practice, sending MSS-sized frames
  - Let window size in bytes be w, should be multiple of MSS
• Increase:
  - After a window’s worth (w bytes) ACKed, could set w ← w + MSS
  - Smoother to increment window on each ACK received: w ← w + MSS · MSS/w
• Decrease:
  - After a packet loss, w ← w/2
  - But don’t want w < MSS
  - So react differently to multiple consecutive losses
  - Back off exponentially (pause with no packets in flight)
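A minimal Python sketch of these per-ACK and per-loss window updates (function names and the MSS value are illustrative, not TCP's actual implementation):

```python
MSS = 1460  # bytes; a typical Ethernet-derived MSS, used here for illustration

def on_ack(w):
    """Additive increase, smoothed per ACK: after a full window's
    worth of ACKs, the increments sum to roughly one MSS."""
    return w + MSS * MSS / w

def on_loss(w):
    """Multiplicative decrease: halve the window, but never below one MSS."""
    return max(w / 2, MSS)
```

Starting from `w = 14600` (ten segments), one ACK yields `on_ack(14600) == 14746.0`, i.e. a tenth of an MSS; a loss yields `on_loss(14600) == 7300.0`.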
AIMD trace
• Window trace produces sawtooth pattern:
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-10.0); the window climbs linearly and halves on each loss]
Slow start
• Question: Where to set w initially?
  - Should start at 1 MSS (to avoid overloading network)
  - But additive ramp-up too slow on fast net
• Start by doubling window each RTT
  - Then at most will dump one extra window into network
• Slow start? This sounds like fast start?
  - In contrast to what happened before Jacobson/Karels work
  - Sender would dump an entire flow control window into net
• Slow start used in multiple situations
  - Connection start time & after timeout
Slow start picture
[Figure: source-to-destination timeline; the number of packets sent doubles each RTT (1, 2, 4, ...) as ACKs return]
Slow start implementation
• We are doubling w after each RTT
  - But receiving w packets each RTT
  - So can set w ← w + MSS on every ACK received
• Now implementation has to keep track of three limits
  - AvailableWindow – for flow control
  - CongestionThreshold – old congestion window
  - CongestionWindow – smaller than threshold during slow start
• Slow start only up to CongestionThreshold
  - Remember last value
  - When reached, go back to additive increase
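The interplay of window, threshold, and the two growth modes can be sketched as a small state machine; class and field names are mine, and the initial threshold is an arbitrary assumption:

```python
MSS = 1460  # bytes; illustrative segment size

class CongestionState:
    """Toy model of TCP window growth: slow start below the
    threshold, additive increase above it, multiplicative decrease
    plus a restart of slow start on timeout."""
    def __init__(self):
        self.cwnd = MSS            # CongestionWindow: start at 1 MSS
        self.ssthresh = 64 * 1024  # CongestionThreshold: initially large (assumed)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                     # slow start: doubles each RTT
        else:
            self.cwnd += MSS * MSS // self.cwnd  # additive increase

    def on_timeout(self):
        # Remember half the window where trouble hit, then restart slow start
        self.ssthresh = max(self.cwnd // 2, 2 * MSS)
        self.cwnd = MSS
```

A usage example: after one ACK the window is 2 MSS; a timeout then drops it back to 1 MSS with the threshold floored at 2 MSS.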
Fast retransmit & fast recovery
• Problem: Coarse-grain TCP timeouts
  - Have to be conservative about RTT
  - Net will sit idle while waiting for a timeout
  - Worse, TCP intentionally keeps bumping head against limit
• Solution: Fast retransmit
  - Use 3 duplicate ACKs to trigger retransmission
  - If more than one packet was lost, still need timeout
  - Else, halve w, but otherwise keep sending
  - No need to set w ← MSS and use slow start
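The duplicate-ACK counting rule can be sketched as follows; this is a simplification (names are mine, and it is not a full TCP state machine, which must also handle SACK, reordering, etc.):

```python
DUP_ACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit

def check_fast_retransmit(last_ack, new_ack, dup_count):
    """Return (new_dup_count, should_retransmit) after receiving new_ack.
    The caller tracks last_ack, the highest cumulative ACK seen so far."""
    if new_ack == last_ack:
        dup_count += 1
        # Fire exactly once, on the third duplicate
        return dup_count, dup_count == DUP_ACK_THRESHOLD
    return 0, False  # new data ACKed; reset the duplicate counter
```

Three ACKs for segment 2 in a row cause a retransmission of the segment after 2; a later ACK for new data resets the count.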
Fast retransmit picture
[Figure: sender transmits packets 1-6; packet 3 is lost, so the receiver keeps sending ACK 2; after three duplicate ACK 2s the sender retransmits packet 3, and the receiver then ACKs 6]
Before fast retransmit
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-9.0) when losses are recovered only by timeout]
With fast retransmit
[Figure: congestion window (KB, 10-70) vs. time (seconds, 1.0-7.0); with fast retransmit the same transfer completes sooner]
Congestion Avoidance
• TCP’s strategy
  - Control congestion once it happens
  - Repeatedly increase load in an effort to find the point at which congestion occurs, and then back off
• Alternative strategy
  - Predict when congestion is about to happen
  - Reduce rate before packets start being discarded
  - Call this congestion avoidance, instead of congestion control
• Two possibilities
  - Host-centric: TCP Vegas
  - Router-centric: DECbit and RED Gateways
TCP Vegas
Idea: source watches for some sign that the router’s queue is building up and congestion will happen—e.g., RTT grows or sending rate flattens.
[Figure: three aligned traces vs. time (seconds): congestion window (KB), sending rate (KBps), and queue size in the bottleneck router (packets)]
TCP Vegas picture
[Figure: congestion window (KB) and sending rate (KBps) traces vs. time (seconds) for TCP Vegas]
Fair Queuing (FQ)
• Explicitly segregates traffic based on flows
• Ensures no flow consumes more than its share
• Variation: weighted fair queuing (WFQ)
[Figure: four flows (Flow 1-Flow 4) arrive on separate queues and receive round-robin service onto the output link]
FQ Algorithm
• Suppose clock ticks each time a bit is transmitted
• Let Pi denote the length of packet i
• Let Si denote the time when the router starts to transmit packet i
• Let Fi denote the time when the router finishes transmitting packet i
• Fi = Si + Pi
• When does the router start transmitting packet i?
  - If it arrived before the router finished packet i−1 from this flow, then immediately after the last bit of i−1 (i.e., at Fi−1)
  - If no current packets for this flow, then start transmitting when it arrives (call this Ai)
• Thus: Fi = max(Fi−1, Ai) + Pi
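The finish-time recurrence translates directly into code; a sketch with all times in bit-transmission ticks (the function name is mine):

```python
def finish_time(prev_finish, arrival, length):
    """Per-flow finish timestamp: Fi = max(F(i-1), Ai) + Pi.
    prev_finish is F(i-1) for this flow (0 if none), arrival is Ai,
    and length is Pi in bits."""
    return max(prev_finish, arrival) + length
```

For example, a 100-bit packet arriving at tick 5 on an idle flow finishes at 105; a second 100-bit packet arriving at tick 50 (while the first is still "in service") finishes at 205.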
FQ Algorithm (cont)
• For multiple flows
  - Calculate Fi for each packet that arrives on each flow
  - Treat all Fis as timestamps
  - Next packet to transmit is one with lowest timestamp
• Not perfect: can’t preempt current packet
• Example:
[Figure: (a) packets with finish times F = 8 and F = 10 queued on two flows; the one with the lower F is sent first. (b) A packet with a lower F arrives on Flow 1 while Flow 2 is already transmitting; it must wait, since the packet in progress is not preempted]
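Putting the per-flow finish times together with lowest-timestamp-first service gives a small scheduler; this is a sketch (class and method names are mine), using a heap to pick the lowest timestamp:

```python
import heapq

class FairQueue:
    """Sketch of fair queuing: stamp each packet with its finish time
    Fi = max(F(i-1), Ai) + Pi per flow, then transmit in timestamp order."""
    def __init__(self):
        self.last_finish = {}  # flow -> finish time of its latest packet
        self.heap = []         # (finish_time, seq, flow)
        self.seq = 0           # tie-breaker so heap ordering stays well-defined

    def enqueue(self, flow, arrival, length):
        f = max(self.last_finish.get(flow, 0), arrival) + length
        self.last_finish[flow] = f
        heapq.heappush(self.heap, (f, self.seq, flow))
        self.seq += 1

    def dequeue(self):
        # Lowest timestamp is sent next; note a packet already being
        # transmitted would not be preempted in a real router.
        f, _, flow = heapq.heappop(self.heap)
        return flow, f
```

With a 100-bit packet on flow "f1" and a 50-bit packet on flow "f2" both arriving at tick 0, the shorter f2 packet (finish time 50) is served before f1's (finish time 100).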
Random Early Detection (RED)
• Notification is implicit
  - just drop the packet (TCP will timeout)
  - could make explicit by marking the packet
• Early random drop
  - rather than wait for queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level
RED Details
• Compute average queue length:
  AvgLen = (1 − Weight) · AvgLen + Weight · SampleLen
  - 0 < Weight < 1 (usually 0.002)
  - SampleLen is the queue length each time a packet arrives
[Figure: instantaneous queue length fluctuates sharply over time while AvgLen tracks it smoothly; MinThreshold and MaxThreshold mark levels on AvgLen]
• Smooths out AvgLen over time
  - Don’t want to react to instantaneous fluctuations
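The averaging step is a standard exponentially weighted moving average; a one-line sketch using the Weight value from the slide:

```python
WEIGHT = 0.002  # typical RED weight, per the slide

def update_avg(avg_len, sample_len, weight=WEIGHT):
    """EWMA of queue length: AvgLen = (1 - Weight)*AvgLen + Weight*SampleLen.
    A small weight means bursts barely move the average."""
    return (1 - weight) * avg_len + weight * sample_len
```

With weight 0.5 and a jump from 100 to a sample of 200, the average moves only to 150; with the default 0.002 it barely moves at all, which is the point.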
RED Details (cont)
• Two queue length thresholds:
if AvgLen <= MinThreshold then
    enqueue the packet
if MinThreshold < AvgLen < MaxThreshold then
    calculate probability P
    drop arriving packet with probability P
if MaxThreshold <= AvgLen then
    drop arriving packet
RED Details (cont)
• Computing probability P
  - TempP = MaxP · (AvgLen − MinThreshold) / (MaxThreshold − MinThreshold)
  - P = TempP / (1 − count · TempP)
• Drop probability curve:
[Figure: P(drop) vs. AvgLen; zero below MinThresh, rising linearly to MaxP at MaxThresh, then jumping to 1.0]
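The drop decision might be sketched as follows; the threshold and MaxP values here are placeholders for illustration, not recommendations:

```python
def drop_probability(avg_len, count, min_th=5, max_th=15, max_p=0.02):
    """RED drop probability sketch. count is the number of packets
    enqueued since the last drop; dividing by (1 - count*TempP)
    spreads drops out more evenly over time."""
    if avg_len <= min_th:
        return 0.0          # below MinThreshold: always enqueue
    if avg_len >= max_th:
        return 1.0          # at or above MaxThreshold: always drop
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    return temp_p / (1 - count * temp_p)
```

Halfway between the thresholds with count = 0 this gives MaxP/2 = 0.01, matching the "one out of 50 packets at MaxP = 0.02" rule of thumb on the next slide only when AvgLen reaches MaxThreshold.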
Tuning RED
- Probability of dropping a particular flow’s packet(s) is roughly proportional to the share of the bandwidth that flow is currently getting
- MaxP is typically set to 0.02, meaning that when the average queue size is halfway between the two thresholds, the gateway drops roughly one out of 50 packets
- If traffic is bursty, then MinThreshold should be sufficiently large to allow link utilization to be maintained at an acceptably high level
- Difference between the two thresholds should be larger than the typical increase in the calculated average queue length in one RTT; setting MaxThreshold to twice MinThreshold is reasonable for traffic on today’s Internet
FPQ
• Problem: Tuning RED can be slightly tricky
• Observations:
  - TCP performs badly with window size under 4 packets: need 4 packets for 3 duplicate ACKs and fast retransmit
  - Can supply feedback through delay as well as through drops
• Solution: Make buffer size proportional to # flows
  - Few flows =⇒ low delay; many flows =⇒ low loss rate
  - Router automatically adjusts, far less tricky tuning required
  - Window size is a function of loss rate; keep min size
  - Transmit rate = Window size / RTT, RTT ∼ Qlen
• Clever algorithm estimates number of flows
  - Hash flow info, set bits, decay
  - Requires reasonable amount of storage
XCP
• New proposed IP protocol: XCP
  - Not compatible with TCP; requires router support
  - Idea: Have router tell us exactly what we want to know!
• Packets contain: cwnd, RTT, feedback field
• Router tells you whether to increase or decrease rate
  - Gives explicit rates for increase/decrease amounts
  - Later routers don’t override bottleneck router
  - Feedback returned to sender in ACKs