Congestion Control

Congestion Control

EE 122: Intro to Communication Networks

Fall 2010 (MW 4-5:30 in 101 Barker)

Scott Shenker

TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula

http://inst.eecs.berkeley.edu/~ee122/

Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxsonand other colleagues at Princeton and UC Berkeley

2

Goals of Today’s Lecture

• Congestion Control: Overview– End system adaptation controlling network load

• Congestion Control: Some Details– Additive-increase, multiplicative-decrease (AIMD)– How to begin transmitting: Slow Start (really “fast start”)

• Why AIMD?

• Listing of Additional Congestion Control Topics

Will repeat myself several times in this lecture

Easy to think you understand congestion control

Trust me, you don’t……because I don’t either!

3

Course So Far….

We know:

• How to process packets in a switch

• How to route packets in the network

• How to send packets reliably

• How to not overload a receiver

• How to split infinitives……

We don’t know:

• How to not overload the network!

4

Congestion Control Overview

5

It’s Not Just The Sender & Receiver

• Flow control keeps one fast sender from overwhelming a slow receiver

• Congestion control keeps a set of senders from overloading the network

• Huge academic literature on congestion control– Advances in mid-80s by Jacobson “saved” the Internet– Well-defined performance question: easy to write papers– Less of a focus now (bottlenecked access links?)…– …but still far from academically settled

o E.g. battle over “fairness” with Bob Briscoe…

• Because Internet traffic is bursty!

• If two packets arrive at the same time– The node can only transmit one– … and either buffers or drops the other

• If many packets arrive in a short period of time– The node cannot keep up with the arriving traffic– … delays, and the buffer may eventually overflow

6

Congestion is Natural

7

Load and Delay

AveragePacket delay

Load

Typical behavior of queuing systems with bursty arrivals:

Power

Load

Ideal: low delays and high utilizationReality: must balance the two

Maximizing “power” is an example

“optimalload”

Question: How do we balance utilization and delay?

8

Dynamic Adjustment by End systems

• Probe network to test level of congestion– Speed up when no congestion– Slow down when congestion

• Drawbacks:– Suboptimal (always above or below optimal point)– Messy dynamics– Relies on end system cooperation

• Might seem complicated to implement– But algorithms are pretty simple (Jacobson/Karels ‘88)– Understanding global behavior is not simple….

9

Basics of TCP Congestion Control

• Each source determines the available capacity– … so it knows how many packets to have in flight

• Congestion window (CWND)– Maximum # of unacknowledged bytes to have in flight– Congestion-control equivalent of receiver window– MaxWindow = min{congestion window, receiver window}– Send at the rate of the slowest component

• Adapting the congestion window– Decrease upon detecting congestion– Increase upon lack of congestion: optimistic exploration

• Note: TCP congestion control done only by end systems, not by mechanisms inside the network

10

Detecting Congestion

• How can a TCP sender determine that the network is under stress?

• Network could tell it (ICMP Source Quench)– Risky, because during times of overload the signal itself

could be dropped (and add to congestion)!

• Packet delays go up (knee of load-delay curve)– Tricky: noisy signal (delay often varies considerably)

• Packet loss– Fail-safe signal that TCP already has to detect– Complication: non-congestive loss (checksum errors)

11

How to Adjust CWND?

• Increase linearly, decrease multiplicatively (AIMD)– Consequences of over-sized window much worse than

having an under-sized windowo Over-sized window: packets dropped and retransmittedo Under-sized window: somewhat lower throughput

– Will come back to this later

• Additive increase– On success for last window of data, increase linearly

o TCP uses an increase of one packet (MSS) per RTT

• Multiplicative decrease– On loss of packet, TCP divides congestion window in half

12

Leads to the TCP “Sawtooth”

t

Window

halved

Loss

13

AIMD Starts Too Slowly!

t

Window

It could take a long time to get started!

Need to start with a small CWND to avoid overloading the network.

14

“Slow Start” Phase

• Start with a small congestion window–Initially, CWND is 1 MSS–So, initial sending rate is MSS/RTT

• That could be pretty wasteful–Might be much less than the actual bandwidth–Linear increase takes a long time to accelerate

• Slow-start phase (actually “fast start”)–Sender starts at a slow rate (hence the name)–… but increases the rate exponentially–… until the first loss event

15

Slow Start in Action

Double CWND per round-trip time

Simple implementation:on each ack, CWND += MSS

D A D D A A D D

Src

Dest

D D

1 2 43

A A A A

8

• Each ACK releases two packets

cwnd = 1

cwnd = 2

16

Another Picture of Slow-Start

segment 1

segment 2segment 3

cwnd = 4 segment 4

segment 5segment 6segment 7

cwnd = 7

cwnd = 3

cwnd = 8

cwnd = 6

cwnd = 5

17

Slow Start and the TCP Sawtooth

Loss

Exponential“slow start”

t

Window

Why is it called slow-start? Because TCP originally hadno congestion control mechanism. The source would just

start by sending a whole window’s worth of data.

18

This has been incredibly successful

• Leads to the theoretical puzzle:

If TCP congestion control is the answer,

then what was the question?

• Not about optimizing, but about robustness– Hard to capture…

19

Congestion Control Details

20

Increasing CWND

• Increase by MSS for every successful window

• Increase a fraction of MSS per received ACK

• # packets (thus ACKs) per window: CWND / MSS

• Increment per ACK:

CWND += MSS * (MSS / CWND)

• Termed: Congestion Avoidance– Very gentle increase

Detecting Packet Loss

• If retransmitting, already assuming packet loss– Two criteria for retransmissions

• Packet can timeout (RTO timer expires)– But RTO is often ~ 500msec

• Fast retransmit– When same packet gets acked 4 times (3 DupACKs)

• Congestion control treats these events differently

21

22

Fast Retransmission

• Packet n is lost, but packets n+1, n+2, …, arrive

• On each arrival of a packet not in sequence, receiver generates an ACK– ACK is for seq.no. just beyond highest in-sequence

• So as n+1, n+2, … arrive, receiver generates repeated ACKs for seq.no. n– “duplicate” acknowledgments since they look the same

• Sender sees 3 of these dupACKs and immediately retransmits packet n (and only n)

• Multiplicative decrease and keep going– CWND halved

23

CWND with Fast Retransmit

segment 1cwnd = 1

ACK 2

cwnd = 2 segment 2segment 3

ACK 4

cwnd = 4 segment 4segment 5segment 6segment 7

ACK 4

ACK 4

ACK 3cwnd = 3

ACK 4 segment 4

3 duplicateACKs

cwnd = 2

24

Loss Detected by Timeout

• Sender starts a timer that runs for RTO seconds

• Every time ack for new data arrives, restart timer

• If timer expires:– Set SSTHRESH CWND / 2 (“Slow-Start Threshold”)– Set CWND MSS (avoid a burst)– Retransmit first lost packet– Execute Slow Start until CWND > SSTHRESH– After which switch to Additive Increase

25

Summary of Decrease

• Cut CWND half on loss detected by NACK – “fast retransmit”

• Cut CWND all the way to 1 MSS on timeout– Set ssthresh to cwnd/2

• Never drop CWND below 1 MSS

Summary of Increase

• “Slow-start”: increase cwnd by MSS for each ack

• Leave slow-start regime when either:– cwnd > SSThresh– Packet drop

• Enter AIMD regime– Increase by MSS for each window’s worth of acked data

26

27

Repeating Slow Start After Timeout

t

Window

Slow-start restart: Go back to CWND of 1 MSS, but take advantage of knowing the previous value of CWND.

Slow start in operation until it reaches half of previous CWND, I.e.,

SSTHRESH

TimeoutFast

Retransmission

SSThreshSet to Here

28

Slow Start/AIMD Pseudocode (Not FR)

Initially:cwnd = MSS;ssthresh = infinite;

New ack received:if (cwnd < ssthresh) /* Slow Start*/ cwnd = cwnd + MSS;else /* Congestion Avoidance */ cwnd = cwnd + MSS/cwnd;

Timeout:/* Multiplicative decrease */ssthresh = cwnd/2;cwnd = MSS;

29

5 Minute Break

Questions Before We Proceed?

Announcements

• No office hours today

• HW3a grades released

• On Monday Igor will review Networking Libraries– For 45 minutes

• Then I will cover Overlays and P2P….

30

31

Why AIMD?

In what follows refer to cwnd in units of MSS

Three Congestion Control Challenges

• Single flow adjusting to bottleneck bandwidth– Without any a priori knowledge– Could be a Gbps link; could be a modem

• Single flow adjusting to variations in bandwidth– When bandwidth decreases, must lower sending rate– When bandwidth increases, must increase sending rate

• Multiple flows sharing the bandwidth– Must avoid overloading network– And share bandwidth “fairly” among the flows

32

33

Problem #1: Single Flow, Fixed BW• Want to get a first-order estimate of the available

bandwidth– Assume bandwidth is fixed– Ignore presence of other flows

• Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”)

• Adjustment: – cwnd initially set to 1 (MSS)– cwnd++ upon receipt of ACK

34

Problems with Slow-Start

• Slow-start can result in many losses– Roughly the size of cwnd ~ BW*RTT

• Example:– At some point, cwnd is enough to fill “pipe”– After another RTT, cwnd is double its previous value– All the excess packets are dropped!

• Need a more gentle adjustment algorithm once have rough estimate of bandwidth

Problem #2: Single Flow, Varying BW

Want to track available bandwidth

• Oscillate around its current value

• If you never send more than your current rate, you won’t know if more bandwidth is available

Possible variations: (in terms of change per RTT)

• Multiplicative increase or decrease: cwnd cwnd * / a

• Additive increase or decrease: cwnd cwnd +- b 35

Four alternatives

• AIAD: gentle increase, gentle decrease

• AIMD: gentle increase, drastic decrease

• MIAD: drastic increase, gentle decrease– too many losses: eliminate

• MIMD: drastic increase and decrease

36

Why AIMD?

• What’s wrong with AIAD?

• What’s wrong with MIMD?

37

38

Problem #3: Multiple Flows

• Want steady state to be “fair”

• Many notions of fairness, but here just require two identical flows to end up with the same bandwidth

• This eliminates MIMD and AIAD– As we shall see…

• AIMD is the only remaining solution!– Not really, but close enough….

39

Buffer and Window Dynamics

• No congestion x increases by one packet/RTT every RTT

• Congestion decrease x by factor 2

A BC = 50 pkts/RTT

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

Backlog in router (pkts)Congested if > 20

Rate (pkts/RTT)

x

40

AIMD Sharing Dynamics

A Bx1

D E

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

No congestion rate increases by one packet/RTT every RTT Congestion decrease rate by factor 2

Rates equalize fair share

x2

41

AIAD Sharing Dynamics

A Bx1

D E No congestion x increases by one packet/RTT every RTT Congestion decrease x by 1

0

10

20

30

40

50

60

1 28 55 82 109

136

163

190

217

244

271

298

325

352

379

406

433

460

487

x2

42

Simple Model of Congestion Control

• Two TCP connections– Rates x1 and x2

• Congestion when sum>1

• Efficiency: sum near 1

• Fairness: x’s convergeUser 1: x1

Use

r 2:

x2

Efficiencyline

2 user example

overload

underload

43

Example

User 1: x1

Use

r 2:

x2

fairnessline

efficiencyline

1

1

• Total bandwidth 1

Inefficient: x1+x2=0.7

(0.2, 0.5)

Congested: x1+x2=1.2

(0.7, 0.5)

Efficient: x1+x2=1Not fair

(0.7, 0.3)

Efficient: x1+x2=1Fair

(0.5, 0.5)

44

AIAD

User 1: x1

Use

r 2:

x2

fairnessline

efficiencyline

(x1h,x2h)

(x1h-aD,x2h-aD)

(x1h-aD+aI),x2h-aD+aI))• Increase: x + aI

• Decrease: x - aD

• Does not converge to fairness

45

MIMD

User 1: x1

Use

r 2:

x2

fairnessline

efficiencyline

(x1h,x2h)

(bdx1h,bdx2h)

(bIbDx1h,bIbDx2h)

• Increase: x*bI

• Decrease: x*bD

• Does not converge to fairness

46

(bDx1h+aI,bDx2h+aI)

AIMD

User 1: x1

Use

r 2:

x2

fairnessline

efficiencyline

(x1h,x2h)

(bDx1h,bDx2h)

• Increase: x+aD

• Decrease: x*bD

• Converges to fairness

47

Other Congestion Control Topics

Additional Congestion Control Topics

• TCP compatibility– Everyone has to use congestion control that shares fairly

• Equation-based congestion control [ r ~ 1/sqrt(d) ]

• Extending TCP to higher speeds

• Signaling congestion without packet drops (ECN)

• Router involvement with congestion control– Ensuring fairness (no need for single standard)– Better than AIMD

• Economic models (Kelly)48

49

Summary

• Congestion is natural– Internet does not reserve resources in advance– TCP actively tries to grab capacity

• Congestion control critically important– AIMD: Additive Increase, Multiplicative Decrease– Congestion detected via packet loss (fail-safe)– Slow start to find initial sending rate & to restart after

timeout

• Next class:– Igor: Networking Libraries– Scott: Overlays and P2P

Documents

Congestion Control