
Congestion - scs.stanford.edu


Page 1: Congestion - scs.stanford.edu

Congestion

[Figure: Source 1 (100-Mbps FDDI) and Source 2 (10-Mbps Ethernet) feed a router whose outgoing link is a 1.5-Mbps T1 to the destination]

• Can’t sustain input rate >> output rate

• Issues:

- Avoid congestion

- Control congestion

- Prioritize who gets limited resources

Page 2: Congestion - scs.stanford.edu

Taxonomy of approaches

• Router-centric vs. host-centric

- hosts at the edges of the network (transport protocol)

- routers inside the network (queuing discipline)

• Reservation-based vs. feedback-based

- pre-allocate resources so as to avoid congestion

- control congestion if (and when) it occurs

• Window based vs. rate based

• Best-effort (today) vs. multiple QoS (Thursday)

Page 3: Congestion - scs.stanford.edu

Router design issues

• Scheduling discipline

- Which of multiple packets should you send next?

- May want to achieve some notion of fairness

- May want some packets to have priority

• Drop policy

- When should you discard a packet?

- Which packet to discard?

- Some packets more important (perhaps BGP)

- Some packets useless w/o others (cells in AAL5 CS-PDU)

- Need to balance throughput & delay

Page 4: Congestion - scs.stanford.edu

Example: FIFO tail drop

[Figure: (a) an arriving packet is placed in the next free buffer, behind the queued packets and the next packet to transmit; (b) with no free buffers left, the arriving packet is dropped]

• Differentiates packets only by when they arrive

• Might not provide useful feedback for sending hosts

Page 5: Congestion - scs.stanford.edu

What to optimize for?

• Fairness (in two slides)

• High throughput – queue should never be empty

• Low delay – so want short queues

• Crude combination: power = Throughput/Delay

- Want to convince hosts to offer optimal load

[Figure: throughput/delay (power) vs. load, peaking at the optimal load]

Page 6: Congestion - scs.stanford.edu

Connectionless flows

[Figure: Sources 1–3 connected through a chain of routers to Destinations 1 and 2]

• Even in Internet, routers can have a notion of flows

- E.g., based on IP addresses & TCP ports (or hash of those)

- Soft state—doesn’t have to be correct

- But if often correct, can use to form router policies

Page 7: Congestion - scs.stanford.edu

Fairness

• What is fair in this situation?

- Each flow gets 1/2 link b/w? Long flow gets less?

• Usually fair means equal

- For flow bandwidths (x1, . . . , xn), fairness index:

f(x1, . . . , xn) = (∑i xi)² / (n · ∑i xi²)

- If all xi are equal, fairness is one

• So what policy should routers follow?

- First, we have to understand what TCP is doing
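The fairness index above is straightforward to compute; a minimal Python sketch (the example rate vectors are illustrative):

```python
def fairness_index(rates):
    """Jain's fairness index: (sum of x_i)^2 / (n * sum of x_i^2).
    Equals 1 when all flows get equal bandwidth, and approaches 1/n
    when a single flow takes everything."""
    n = len(rates)
    total = sum(rates)
    return total * total / (n * sum(x * x for x in rates))

print(fairness_index([10, 10, 10, 10]))   # equal shares -> 1.0
print(fairness_index([40, 0, 0, 0]))      # one flow hogs the link -> 0.25
```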

Page 8: Congestion - scs.stanford.edu

TCP Congestion Control

• Idea

- Assumes best-effort network

- Each source determines network capacity for itself

- Uses implicit feedback (delay, drops)

- ACKs pace transmission (self-clocking)

• Challenge

- Determining the available capacity in the first place

- Adjusting to changes in the available capacity

Page 9: Congestion - scs.stanford.edu

Detecting congestion

• Question: how does the source determine whether or not the network is congested?

• Answer: a timeout occurs

- Timeout signals that a packet was lost

- Packets are seldom lost due to transmission error

- Lost packet implies congestion

Page 10: Congestion - scs.stanford.edu

Dealing with congestion

• TCP keeps congestion & flow control windows

- Max packets in flight is lesser of two

• After a packet loss, must reduce cong. window

- This brings the congestion under control

- But how much to reduce?

• Idea: conservation of packets at equilibrium

- Want to keep roughly same number of packets in network

- By analogy with water in fixed-size pipe

- Put new packet into network when one exits

Page 11: Congestion - scs.stanford.edu

How much to reduce window?

• Let’s build a crude model of network

- Let Li be load of network (# pkts it contains) at time i

- If network uncongested, roughly constant Li = N

• Now what happens under congestion?

- Some fraction γ of packets can’t exit network

- So now Li = N + γ · Li−1, or Li ≈ γ^i · L0

- Congestion increases exponentially (w. infinite buffers)

• Requires multiplicative decrease of window size

- TCP chooses to cut window in half

Page 12: Congestion - scs.stanford.edu

How to use extra capacity?

• Must adjust as extra capacity becomes available

- Unlike drops for congestion, no explicit signal

- Instead, try to send slightly faster, see if it works

- So need to increase window when no losses – how much?

• Multiplicative increase

- But easier to saturate net than to recover (rush-hour effect)

- Multiplicative increase is so fast it will inevitably lead to saturation

• Additive increase won’t saturate net

- So Additive Increase, Multiplicative Decrease: AIMD

Page 13: Congestion - scs.stanford.edu

Additive Increase

[Figure: source–destination timeline of packets and ACKs under additive increase]

Page 14: Congestion - scs.stanford.edu

Implementation

• In practice, sending MSS-sized frames

- Let window size in bytes be w, should be multiple of MSS

• Increase:

- After w bytes (one full window) ACKed, could set w ← w + MSS

- Smoother to increment window on each ACK received: w ← w + MSS · MSS/w

• Decrease:

- After a packet loss, w ← w/2

- But don’t want w < MSS

- So react differently to multiple consecutive losses

- Back off exponentially (pause with no packets in flight)
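The two window updates above fit in a few lines; a sketch of the per-ACK increase and the halving on loss (the MSS value and the simple loop are illustrative):

```python
MSS = 1460  # illustrative maximum segment size in bytes

def on_ack(w):
    # Additive increase spread over a window: each ACK adds MSS*MSS/w,
    # so a full window's worth of ACKs adds roughly one MSS.
    return w + MSS * MSS / w

def on_loss(w):
    # Multiplicative decrease: halve the window, but never go below one MSS.
    return max(w / 2, MSS)

w = 10 * MSS
for _ in range(10):       # about one window's worth of ACKs arrives
    w = on_ack(w)         # w has grown by a bit less than one MSS
w = on_loss(w)            # a loss cuts it back to roughly 5 MSS
```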

Page 15: Congestion - scs.stanford.edu

AIMD trace

• Window trace produces sawtooth pattern:

[Figure: congestion window (KB, 10–70) vs. time (0–10 seconds), showing the AIMD sawtooth]

Page 16: Congestion - scs.stanford.edu

Slow start

• Question: Where to set w initially?

- Should start at 1 MSS (to avoid overloading network)

- But additive ramp-up too slow on fast net

• Start by doubling window each RTT

- Then at most will dump one extra window into network

• Slow start? This sounds like fast start?

- In contrast to what happened before Jacobson/Karels work

- Sender would dump an entire flow control window into net

• Slow start used in multiple situations

- Connection start time & after timeout

Page 17: Congestion - scs.stanford.edu

Slow start picture

[Figure: source–destination timeline of packets and ACKs under slow start]

Page 18: Congestion - scs.stanford.edu

Slow start implementation

• We are doubling w after each RTT

- But receiving w/MSS ACKs each RTT

- So can set w ← w + MSS on every ACK received

• Now implementation has to keep track of three limits

- AvailableWindow – for flow control

- CongestionThreshold – old congestion window

- CongestionWindow – smaller than threshold during slow start

• Slow start only up to CongestionThreshold

- Remember last value

- When reached, go back to additive increase
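Both regimes fit in one per-ACK update. A sketch of the switch at CongestionThreshold (the segment size, threshold, and loop below are illustrative, not from the slides):

```python
MSS = 1460  # illustrative segment size in bytes

def on_ack(cwnd, threshold):
    """Per-ACK growth of CongestionWindow (all sizes in bytes).
    Below CongestionThreshold: slow start, +MSS per ACK (doubles each RTT).
    At or above: additive increase, +MSS*MSS/cwnd per ACK (+MSS each RTT)."""
    if cwnd < threshold:
        return cwnd + MSS
    return cwnd + MSS * MSS / cwnd

cwnd, threshold = MSS, 8 * MSS
for _ in range(16):
    cwnd = on_ack(cwnd, threshold)
# cwnd climbed quickly to the threshold, then crept up additively;
# the amount actually sent is min(cwnd, AvailableWindow) for flow control
```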

Page 19: Congestion - scs.stanford.edu

Fast retransmit & fast recovery

• Problem: Coarse-grain TCP timeouts

- Have to be conservative about RTT

- Net will sit idle while waiting for a timeout

- Worse, TCP intentionally keeps bumping head against limit

• Solution: Fast retransmit

- Use 3 duplicate ACKs to trigger retransmission

- If more than one packet was lost, still need timeout

- Else, halve w, but otherwise keep sending

- No need to set w ← MSS and use slow start
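The trigger can be simulated in a few lines. This toy sender (a made-up helper, not kernel TCP) scans an ACK stream and, on the third duplicate, retransmits and halves the window:

```python
def fast_retransmit_sim(acks, w):
    """Toy sender: returns (sequence numbers retransmitted, final window).
    Three duplicate ACKs for the same number trigger one retransmission
    and a halving of w -- no timeout, no slow start."""
    last, dups, retransmitted = None, 0, []
    for a in acks:
        if a == last:
            dups += 1
            if dups == 3:              # third duplicate: assume next segment lost
                retransmitted.append(a)
                w = max(w // 2, 1)     # halve, but keep sending
        else:
            last, dups = a, 0
    return retransmitted, w

# ACK stream like the picture on the next slide: packet 3 is lost, so
# packets 4-6 each elicit a duplicate ACK 2 until 3 is retransmitted.
print(fast_retransmit_sim([1, 2, 2, 2, 2, 6], 8))  # -> ([2], 4)
```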

Page 20: Congestion - scs.stanford.edu

Fast retransmit picture

[Figure: sender transmits packets 1–6; the receiver ACKs 1 and 2, then sends three duplicate ACK 2s as packets 4–6 arrive; the sender retransmits packet 3, and the receiver responds with ACK 6]

Page 21: Congestion - scs.stanford.edu

Before fast retransmit

[Figure: window trace (KB, 10–70) vs. time (0–9 seconds) without fast retransmit]

Page 22: Congestion - scs.stanford.edu

With fast retransmit

[Figure: window trace (KB, 10–70) vs. time (0–7 seconds) with fast retransmit]

Page 23: Congestion - scs.stanford.edu

Congestion Avoidance

• TCP’s strategy

- Control congestion once it happens

- Repeatedly increase load in an effort to find the point at which congestion occurs, and then back off

• Alternative strategy

- Predict when congestion is about to happen

- Reduce rate before packets start being discarded

- Call this congestion avoidance, instead of congestion control

• Two possibilities

- Host-centric: TCP Vegas

- Router-centric: DECbit and RED Gateways

Page 24: Congestion - scs.stanford.edu

TCP Vegas

Idea: source watches for some sign that the router’s queue is building up and congestion will happen, e.g., RTT grows or sending rate flattens.

[Figure: three time-aligned traces over 8 seconds – congestion window (KB), sending rate (KBps), and queue size in the router – showing the queue building up as the sending rate flattens]

Page 25: Congestion - scs.stanford.edu

TCP Vegas picture

Page 26: Congestion - scs.stanford.edu

[Figure: TCP Vegas trace over 8 seconds – congestion window (KB, 10–70) in the top panel, sending rate (KBps, 40–240) in the bottom panel]

Page 27: Congestion - scs.stanford.edu

Fair Queuing (FQ)

• Explicitly segregates traffic based on flows

• Ensures no flow consumes more than its share

• Variation: weighted fair queuing (WFQ)

[Figure: flows 1–4, each in its own queue, served round-robin]

Page 28: Congestion - scs.stanford.edu

FQ Algorithm

• Suppose clock ticks each time a bit is transmitted

• Let Pi denote the length of packet i

• Let Si denote the time when the router starts transmitting packet i

• Let Fi denote the time when the router finishes transmitting packet i

• Fi = Si + Pi

• When does the router start transmitting packet i?

- If it arrived before the router finished packet i − 1 from this flow, then immediately after the last bit of i − 1 (Fi−1)

- If no current packets for this flow, then start transmitting when it arrives (call this Ai)

• Thus: Fi = max(Fi−1, Ai) + Pi
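The rule Fi = max(Fi−1, Ai) + Pi is a one-liner given per-flow state; a sketch with clock time in bit ticks (the flow names and packet sequence are made up):

```python
# Per-flow fair-queuing finish tags, time measured in bit ticks.
last_finish = {}   # flow id -> finish tag of its previous packet

def tag(flow, arrival, length):
    """F_i = max(F_{i-1}, A_i) + P_i for one packet of one flow."""
    F = max(last_finish.get(flow, 0), arrival) + length
    last_finish[flow] = F
    return F

# The router then transmits whichever queued packet has the lowest tag.
print(tag("f1", 0, 100))    # -> 100
print(tag("f1", 50, 100))   # arrived before f1's previous packet finished: 200
print(tag("f2", 120, 60))   # idle flow: 120 + 60 = 180, so f2 goes first
```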

Page 29: Congestion - scs.stanford.edu

FQ Algorithm (cont)

• For multiple flows

- Calculate Fi for each packet that arrives on each flow

- Treat all Fis as timestamps

- Next packet to transmit is one with lowest timestamp

• Not perfect: can’t preempt current packet

• Example:

[Figure: (a) queued packets from two flows with tags F = 8, F = 5, and F = 10 – the lowest finish tag goes to the output first; (b) a flow 1 packet arriving with F = 2 must wait for the flow 2 packet (F = 10) already being transmitted]

Page 30: Congestion - scs.stanford.edu

Random Early Detection (RED)

• Notification is implicit

- just drop the packet (TCP will timeout)

- could make explicit by marking the packet

• Early random drop

- rather than wait for the queue to become full, drop each arriving packet with some drop probability whenever the queue length exceeds some drop level

Page 31: Congestion - scs.stanford.edu

RED Details

• Compute average queue length:

AvgLen = (1 − Weight) · AvgLen + Weight · SampleLen

- 0 < Weight < 1 (usually 0.002)

- SampleLen is the queue length each time a packet arrives

[Figure: queue with MinThreshold and MaxThreshold levels marked against AvgLen]

Page 32: Congestion - scs.stanford.edu

[Figure: instantaneous queue length vs. the smoothed AvgLen over time]

• Smooths out AvgLen over time

- Don’t want to react to instantaneous fluctuations
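The averaging step is a standard exponentially weighted moving average. A sketch with the slide's suggested Weight of 0.002 (the initial value and the burst below are made up):

```python
def update_avg(avg_len, sample_len, weight=0.002):
    # AvgLen = (1 - Weight) * AvgLen + Weight * SampleLen
    return (1 - weight) * avg_len + weight * sample_len

avg = 10.0
for _ in range(100):          # a 100-packet burst sees the queue at 50
    avg = update_avg(avg, 50.0)
print(round(avg, 1))          # still well below 50: bursts barely move it
```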

Page 33: Congestion - scs.stanford.edu

RED Details (cont)

• Two queue length thresholds:

if AvgLen <= MinThreshold then

enqueue the packet

if MinThreshold < AvgLen < MaxThreshold then

calculate probability P

drop arriving packet with probability P

if MaxThreshold <= AvgLen then

drop arriving packet

Page 34: Congestion - scs.stanford.edu

RED Details (cont)

• Computing probability P

- TempP = MaxP · (AvgLen − MinThreshold)/(MaxThreshold − MinThreshold)

- P = TempP/(1 − count · TempP)

• Drop probability curve:

[Figure: P(drop) vs. AvgLen – zero up to MinThresh, rising linearly to MaxP at MaxThresh, then jumping to 1.0]
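Putting the thresholds and the count correction together (threshold values and MaxP below are illustrative; the formula is only valid while count · TempP < 1):

```python
def red_drop_prob(avg_len, count, min_th=5.0, max_th=15.0, max_p=0.02):
    """Drop probability for an arriving packet, per the RED rules:
    0 below MinThreshold, 1 above MaxThreshold, and in between
    TempP scaled up by count, the packets enqueued since the last drop."""
    if avg_len <= min_th:
        return 0.0
    if avg_len >= max_th:
        return 1.0
    temp_p = max_p * (avg_len - min_th) / (max_th - min_th)
    return temp_p / (1 - count * temp_p)

# Halfway between the thresholds, TempP is MaxP/2 = 0.01; the probability
# then grows with count, which spreads drops out evenly over time.
print(red_drop_prob(10.0, 0))    # ~0.01
print(red_drop_prob(10.0, 50))   # ~0.02
```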

Page 35: Congestion - scs.stanford.edu

Tuning RED

- Probability of dropping a particular flow’s packet(s) is roughly proportional to the share of the bandwidth that flow is currently getting

- MaxP is typically set to 0.02, meaning that when the average queue size is halfway between the two thresholds, the gateway drops roughly one out of 50 packets

- If traffic is bursty, then MinThreshold should be sufficiently large to allow link utilization to be maintained at an acceptably high level

- Difference between the two thresholds should be larger than the typical increase in the calculated average queue length in one RTT; setting MaxThreshold to twice MinThreshold is reasonable for traffic on today’s Internet

Page 36: Congestion - scs.stanford.edu

FPQ

• Problem: Tuning RED can be slightly tricky

• Observations:

- TCP performs badly with window size under 4 packets: need 4 packets for 3 duplicate ACKs and fast retransmit

- Can supply feedback through delay as well as through drops

• Solution: Make buffer size proportional to # flows

- Few flows =⇒ low delay; many flows =⇒ low loss rate

- Router automatically adjusts, far less tricky tuning required

- Window size is a function of loss rate, keep min size

- Transmit rate = Window size / RTT, RTT ∼ Qlen

• Clever algorithm estimates number of flows

- Hash flow info, set bits, decay

- Requires reasonable amount of storage
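One way to estimate flow counts with bounded storage is a hash bitmap ("linear counting"). This is an illustrative stand-in for the mechanism the slide alludes to, not FPQ's actual algorithm (which also decays the bits over time):

```python
import hashlib
import math

def estimate_flows(packets, nbits=64):
    """Hash each packet's flow id into a small bitmap, then estimate the
    number of distinct flows from the fraction of bits still zero:
    estimate = nbits * ln(nbits / zero_bits)."""
    bits = [0] * nbits
    for flow_id in packets:
        h = int(hashlib.md5(str(flow_id).encode()).hexdigest(), 16)
        bits[h % nbits] = 1
    zeros = bits.count(0)
    if zeros == 0:
        return float("inf")    # bitmap saturated; would need more bits
    return nbits * math.log(nbits / zeros)

# Many packets from the same flow set the same bit, so the estimate
# tracks distinct flows rather than packet volume.
print(estimate_flows([1] * 1000))                 # ~1
print(estimate_flows(list(range(10)) * 100))      # ~10
```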

Page 37: Congestion - scs.stanford.edu

XCP

• New proposed IP protocol: XCP

- Not compatible with TCP; requires router support

- Idea: Have router tell us exactly what we want to know!

• Packets contain: cwnd, RTT, feedback field

• Router tells you whether to increase or decrease rate

- Gives explicit amounts for increase/decrease

- Later routers don’t override bottleneck router

- Feedback returned to sender in ACKs