Reducing the Buffer Size in Backbone Routers
Yashar Ganjali
High Performance Networking Group
Stanford University
February 23, 2005
http://www.stanford.edu/~yganjali
23 February 2005 High Performance Networking Group 2
Motivation
Problem
– Internet traffic doubles every year
– Disparity between traffic growth and router growth (space, power, cost)
Possible solution
– All-optical networking
Consequences
– Large capacity → large traffic
– Very small buffers
Outline of the Talk
Buffer sizes in today’s Internet
From huge to small (Guido’s results)
– 2-3 orders of magnitude reduction
From small to tiny
– Constant buffer sizes?
Backbone Router Buffers
Universally applied rule-of-thumb
– A router needs a buffer size: $B = 2T \times C$
– 2T is the two-way propagation delay
– C is the capacity of the bottleneck link
Known to the inventors of TCP
Mandated in backbone routers
Appears in RFPs and IETF architectural guidelines
[Figure: Source, Router, and Destination; bottleneck link of capacity C at the router; round-trip delay 2T]
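A quick worked example of the rule of thumb. The 250 ms round-trip time is an assumption, chosen because it is the value implied by the 10Gb/s linecard figures later in the talk (2.5Gbits / 10Gb/s); the 1500-byte packet size is likewise assumed, and the function name is hypothetical.

```python
def rule_of_thumb_buffer_bits(rtt_s: float, capacity_bps: float) -> float:
    """Rule-of-thumb buffer B = 2T x C, where rtt_s is the two-way delay 2T in seconds."""
    return rtt_s * capacity_bps


# Assumed example: a 10 Gb/s bottleneck and a 250 ms round-trip time.
buffer_bits = rule_of_thumb_buffer_bits(0.25, 10e9)
print(f"{buffer_bits / 1e9:.1f} Gbit")  # 2.5 Gbit
# Assumed 1500-byte packets, to express the same buffer in packets.
print(f"{buffer_bits / (1500 * 8):,.0f} packets")  # 208,333 packets
```

A buffer of a couple of hundred thousand full-size packets per 10Gb/s port is what pushes line cards toward large, slow DRAM.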
Review: TCP Congestion Control
Only W packets may be outstanding
Rule for adjusting W
– If an ACK is received: W ← W + 1/W
– If a packet is lost: W ← W/2
[Figure: Source and Dest; window size vs. time t, a sawtooth between W_max/2 and W_max]
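The two window rules can be traced directly to reproduce the sawtooth in the figure. This is a minimal sketch, assuming a loss occurs exactly when W reaches a fixed W_max (a stand-in for the buffer overflowing); real losses depend on the network, and `aimd_window_trace` is a hypothetical helper.

```python
def aimd_window_trace(w_max: float, n_acks: int, w0: float = 1.0) -> list[float]:
    """Apply W <- W + 1/W per ACK and W <- W/2 per loss, one value per ACK."""
    w = w0
    trace = [w]
    for _ in range(n_acks):
        w += 1.0 / w        # additive increase: about +1 packet per RTT
        if w >= w_max:      # assumption: a single loss whenever W reaches w_max
            w /= 2.0        # multiplicative decrease
        trace.append(w)
    return trace


trace = aimd_window_trace(w_max=20.0, n_acks=5000)
print(min(trace[1000:]), max(trace[1000:]))  # stays within [W_max/2, W_max)
```

Between losses W grows by about one packet per RTT (W ACKs, each adding 1/W), which is why the trace climbs linearly from roughly W_max/2 back to W_max.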
Multiplexing Effect in the Core
[Figure: probability distribution of the aggregate window W across many flows, plotted against buffer size from 0 to B]
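The intuition behind this slide can be checked numerically: when many flows' windows are desynchronized, their sum fluctuates much less, relative to its mean, than any single window does. The sketch below assumes each window is an independent snapshot of a linear sawtooth, i.e. uniform on [W_max/2, W_max]; under that assumption the relative spread should shrink roughly as 1/√n. It is an illustration only, not the analysis behind the result, and the function name is hypothetical.

```python
import random
import statistics


def window_sum_cv(n_flows: int, w_max: float = 20.0, samples: int = 500, seed: int = 1) -> float:
    """Coefficient of variation (std / mean) of the sum of n independent sawtooth windows."""
    rng = random.Random(seed)
    totals = []
    for _ in range(samples):
        # Assumption: flows are desynchronized, so each window is independent
        # and uniform on [w_max/2, w_max] (a time-uniform point on the sawtooth).
        totals.append(sum(rng.uniform(w_max / 2, w_max) for _ in range(n_flows)))
    return statistics.pstdev(totals) / statistics.mean(totals)


for n in (1, 100, 1000):
    print(n, round(window_sum_cv(n), 4))
```

A smaller relative spread of the aggregate window is what lets a smaller buffer keep the link busy.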
Backbone router buffers
It turns out that
– The rule of thumb is wrong for core routers today
– Required buffer is $\frac{2T \times C}{\sqrt{n}}$ instead of $2T \times C$, where n is the number of long-lived flows sharing the link
Required Buffer Size
$B = \frac{2T \times C}{\sqrt{n}}$
[Figure: simulation results for the required buffer size]
Impact on Router Design
10Gb/s linecard with 200,000 × 56kb/s flows
– Rule-of-thumb: Buffer = 2.5Gbits
  Requires external, slow DRAM
– Becomes: Buffer = 6Mbits
  Can use on-chip, fast SRAM
  Completion time halved for short flows
40Gb/s linecard with 40,000 × 1Mb/s flows
– Rule-of-thumb: Buffer = 10Gbits
– Becomes: Buffer = 50Mbits
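The linecard numbers can be reproduced from the two formulas. This sketch assumes a 250 ms round-trip time (the value that makes the rule of thumb give 2.5Gbits at 10Gb/s and 10Gbits at 40Gb/s) and takes n to be the number of flows listed on the slide; the function name is hypothetical.

```python
import math


def small_buffer_bits(rtt_s: float, capacity_bps: float, n_flows: int) -> float:
    """Buffer 2T x C / sqrt(n) for n long-lived flows sharing the link."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


RTT_S = 0.25  # assumption: the 2T implied by 2.5 Gbit / 10 Gb/s
for capacity_bps, flows in [(10e9, 200_000), (40e9, 40_000)]:
    rule_of_thumb = RTT_S * capacity_bps
    reduced = small_buffer_bits(RTT_S, capacity_bps, flows)
    print(f"{capacity_bps / 1e9:.0f} Gb/s, {flows:,} flows: "
          f"rule of thumb {rule_of_thumb / 1e9:.1f} Gbit, "
          f"2TC/sqrt(n) {reduced / 1e6:.1f} Mbit")
```

With these assumptions the 10Gb/s case comes out at about 5.6 Mbit, which the slide rounds to 6Mbits, and the 40Gb/s case at exactly 50 Mbit.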
How small can buffers be?
Imagine you want to build an all-optical router for a backbone network…
…and you can buffer only a few dozen packets in delay lines.
Conventional wisdom: It’s a routing problem (hence deflection routing, burst-switching, etc.)
Our belief: First, think about congestion control.
TCP with ALMOST No Buffers
Utilization of bottleneck link = 75%
Problem Solved?
75% utilization with only one unit of buffering
More flows → less buffer
Therefore, one unit of buffering is enough
TCP Throughput with Small Buffers
[Figure: TCP throughput vs. number of flows; throughput axis 0 to 0.9, flows axis 0 to 1000]
TCP Reno Performance
Buffer Size = 10; Load = 80%
[Figure: throughput (0 to 1) vs. bottleneck capacity (0 to 6000 Mb/s)]
Two Concurrent TCP Flows
Simplified Model
Flow 1 sends W packets during each RTT
Bottleneck capacity = C packets per RTT
Example: C = 15, W = 5
Flow 2 sends two consecutive packets during each RTT
Drop probability increases with W
[Figure: packet slots within 1 RTT]
Simplified Model (Cont’d)
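$W(t+1) = p(t) \cdot \frac{W(t)}{2} + [1 - p(t)] \cdot [W(t) + 1]$
(with probability p(t) the window halves; otherwise it grows by one packet)
But p grows linearly with W
Setting the expected change to zero gives $p \cdot W/2 \approx 1 - p$; with $p \propto W/C$ this means $W^2 \propto C$, so $E[W] = O(C^{1/2})$
Link utilization = W/C
As C increases, link utilization goes to zero.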
Snow model!!!
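The recursion can be simulated directly to see the $O(C^{1/2})$ behaviour. The slide only says p grows linearly with W, so this sketch assumes p(t) = min(1, W(t)/C); that constant, the run length, and the function name are illustrative choices, not part of the original model.

```python
import math
import random


def simplified_model(capacity: int, rtts: int = 200_000, seed: int = 0) -> tuple[float, float]:
    """Simulate W(t+1) = W/2 w.p. p(t), else W + 1; return (mean W, mean W / C)."""
    rng = random.Random(seed)
    w = 1.0
    total = 0.0
    for _ in range(rtts):
        p = min(1.0, w / capacity)  # assumption: drop probability p(t) = W(t) / C
        if rng.random() < p:
            w = w / 2               # a drop during this RTT halves the window
        else:
            w = w + 1               # no drop: the window grows by one packet
        total += w
    mean_w = total / rtts
    return mean_w, mean_w / capacity  # (E[W], link utilization W/C)


for c in (100, 1_000, 10_000):
    mean_w, util = simplified_model(c)
    print(f"C={c}: E[W]~{mean_w:.1f}, E[W]/sqrt(C)~{mean_w / math.sqrt(c):.2f}, "
          f"utilization~{util:.3f}")
```

The ratio E[W]/√C should stay roughly constant across C while the utilization column falls, which is the slide's conclusion in numerical form.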
Q&A
Q. What happens if flow 2 never sends any consecutive packets?
A. No packet drops unless utilization = 100%.
Q. How much space do we need between the two packets?
A. At least the size of a packet.
Q. What if we have more than two flows?
Per-flow Queueing
Let us assume we have a queue for each flow; and
Serve those queues in a round-robin manner.
Does this solve the problem?
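A minimal sketch of per-flow queueing with round-robin service, assuming a fixed per-flow buffer of `per_flow_capacity` packets (a hypothetical parameter, as is the class name). It only illustrates the mechanism; the slides that follow argue that it does not remove drops when a flow is temporarily idle.

```python
from collections import deque


class RoundRobinScheduler:
    """Hypothetical per-flow FIFO queues, served one packet per call in round-robin order."""

    def __init__(self, n_flows: int, per_flow_capacity: int):
        self.queues = [deque() for _ in range(n_flows)]
        self.capacity = per_flow_capacity
        self.next_flow = 0  # flow to check first on the next dequeue
        self.drops = 0

    def enqueue(self, flow: int, packet) -> bool:
        """Append to the flow's own queue; drop (return False) if that queue is full."""
        queue = self.queues[flow]
        if len(queue) >= self.capacity:
            self.drops += 1
            return False
        queue.append(packet)
        return True

    def dequeue(self):
        """Serve the next non-empty queue after the last one served; None if all are empty."""
        n = len(self.queues)
        for offset in range(n):
            idx = (self.next_flow + offset) % n
            if self.queues[idx]:
                self.next_flow = (idx + 1) % n
                return self.queues[idx].popleft()
        return None


# Three flows, one packet of buffer each; flow 3 (index 2) is idle.
sched = RoundRobinScheduler(n_flows=3, per_flow_capacity=1)
sched.enqueue(0, "f1-a")
sched.enqueue(1, "f2-a")
sched.enqueue(0, "f1-b")  # flow 1's queue is already full, so this packet is dropped
print([sched.dequeue() for _ in range(3)])  # ['f1-a', 'f2-a', None]
print(sched.drops)  # 1
```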
Per-flow Buffering
Per-Flow Buffering
Flow 3 does not have a packet at time t; flows 1 and 2 do.
At time t + RTT we will see a drop.
[Figure: flow 3 temporarily idle; queues at time t and at time t + RTT]
Ideal Solution
If packets are spaced out perfectly, and
the starting times of flows are chosen randomly,
then we only need a small buffer for contention resolution.
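A toy slotted model of this "ideal" case: every flow sends exactly one packet per period (perfect spacing), each flow's phase is drawn at random (random starting times), and the bottleneck serves one packet per slot. The model, its parameters, and the function name are assumptions for illustration; it simply reports the largest backlog, i.e. the buffer that contention resolution would need in this setting.

```python
import random


def max_backlog_periodic(n_flows: int, period: int, slots: int, seed: int = 0) -> int:
    """Largest queue seen when n perfectly spaced flows start at random phases."""
    if n_flows >= period:
        raise ValueError("offered load n_flows / period must be below 1")
    rng = random.Random(seed)
    # Assumption: each flow sends exactly one packet every `period` slots,
    # starting at a uniformly random phase (its random start time).
    arrivals = [0] * period
    for _ in range(n_flows):
        arrivals[rng.randrange(period)] += 1
    backlog = max_backlog = 0
    for t in range(slots):
        backlog += arrivals[t % period]  # packets arriving in this slot
        backlog = max(0, backlog - 1)    # the bottleneck sends one packet per slot
        max_backlog = max(max_backlog, backlog)
    return max_backlog


# 80 flows sharing a link that can carry 100 packets per period (80% load).
print(max_backlog_periodic(n_flows=80, period=100, slots=10_000))
```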
Randomization
Mimic an M/M/1 queue
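Under low load, queue size is small with high probability
Loss can be bounded
$P(X \ge k) = \rho^k, \qquad E[X] = \frac{\rho}{1 - \rho}$ (X = occupancy of an M/M/1 queue with load $\rho < 1$)
[Figure: packet loss P(Q > b) vs. buffer size B]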
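The bound can be turned into a simple buffer-sizing estimate for Poisson-like arrivals. This sketch assumes the standard M/M/1 results above and uses the infinite-buffer tail P(X ≥ k) as an approximation of the loss rate with a k-packet buffer; the helper names are hypothetical.

```python
def mm1_tail(rho: float, k: int) -> float:
    """P(X >= k) = rho**k for the occupancy X of an M/M/1 queue with load rho < 1."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("M/M/1 load rho must satisfy 0 <= rho < 1")
    return rho ** k


def mm1_mean(rho: float) -> float:
    """E[X] = rho / (1 - rho)."""
    return rho / (1.0 - rho)


def min_buffer_for_loss(rho: float, target: float) -> int:
    """Smallest k with P(X >= k) <= target (tail used as a loss approximation)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    k = 0
    while mm1_tail(rho, k) > target:
        k += 1
    return k


print(mm1_mean(0.8))                   # about 4 packets on average at 80% load
print(min_buffer_for_loss(0.8, 1e-2))  # 21
```

Because ρ^k decays geometrically, each extra packet of buffer cuts the estimated loss by a constant factor.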
TCP Pacing
Current TCP:
– Send packets when ACKs are received.
Paced TCP:
– Send one packet every RTT/W time units.
– Update W and RTT as in TCP.
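For contrast with ACK clocking, which can release a whole window back-to-back when ACKs arrive bunched together, paced TCP spreads the same W packets across the RTT. A minimal sketch of the paced departure times, assuming W and RTT are already known from the usual TCP updates; the function name and example values are hypothetical.

```python
def paced_send_times(window: int, rtt: float, start: float = 0.0) -> list[float]:
    """Paced TCP (sketch): spread one window of packets evenly, one every rtt/window."""
    if window < 1:
        raise ValueError("window must be at least one packet")
    interval = rtt / window
    return [start + i * interval for i in range(window)]


# Assumed example: W = 4 packets, RTT = 100 ms.
print(paced_send_times(4, 100.0))  # [0.0, 25.0, 50.0, 75.0]
```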
CWND: Reno vs. Paced TCP
TCP Reno: Throughput vs. Buffer Size
Paced TCP: Throughput vs. Buffer Size
Early Results
Congested core router with 10 packet buffers
Average offered load = 80%
RTT = 100ms; each flow limited to 2.5Mb/s
[Figure: topology with sources, a router, and a server; 10Gb/s bottleneck link, >10Gb/s access links]
What We Know
Arbitrary injection process
– Theory: any rate > 0 needs unbounded buffers
Poisson process with load < 1
– Theory: need a buffer size of approx. O(log D + log W), i.e. 20-30 pkts (D = # of hops, W = window size) [Goel 2004]
– Experiment: TCP Pacing: results as good as or better than for Poisson
Complete centralized control
– Theory: constant fraction throughput with constant buffers [Leighton]
Limited Congestion Window
[Figure: RENO vs. PACING, with limited window and with unlimited window]
Slow Access Links
[Figure: topology with sources, a router, and a server; 10Gb/s bottleneck link, 5Mb/s access links]
Congested core router with 10 packet buffers
RTT = 100ms; each flow limited to 2.5Mb/s
Conclusion
We can reduce 1,000,000 packet buffers to 10,000 today.
We can “probably” reduce to 10-20 packet buffers:
– With many small flows, no change needed.
– With some large flows, need pacing in the access routers or at the edge devices.
Need more work!
Extra Slides
Pathological Example
Flow 1: S1 → D; Load = 50%
Flow 2: S2 → D; Load = 50%
If S1 sends a packet at time t, S2 cannot send any packets at times t and t+1.
To achieve 100% throughput we need at least one unit of buffering.
[Figure: S1 and S2 both sending to D]