The Crosspoint Queued Switch
Yossi Kanizo (Technion, Israel)
Joint work with Isaac Keslassy (Technion, Israel) and David Hay (Politecnico di Torino, Italy)
Typical Switch Architectures
IQ – Input Queued
CICQ – Combined Input and Crosspoint Queued
[Figure: linecards and switch fabric for each architecture]
Assumes Instantaneous Closed Loop
Single-Rack Router
Instantaneous closed loop → works in a single rack
Problem: multi-rack routers
[Figure: linecards and switch fabric]
Current Router Architectures
Optical links
10s of meters
[Source: N. McKeown]
Is the closed loop still instantaneous?
[Figure: Time Trends. Time-slot and RTT in ns (log scale, 1 to 1000) versus year (1998 to 2010)]
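To see why the loop stops being instantaneous, compare the round-trip time of a link with one time-slot. The sketch below is back-of-the-envelope arithmetic, not data from the talk: the 50 m link length and the 40 Gb/s line rate are hypothetical values, the 64-byte cell size is the one used later in the talk, and 5 ns/m is the usual approximation for propagation delay in optical fiber.

```python
import math

FIBER_DELAY_NS_PER_M = 5.0  # approximate propagation delay of light in fiber


def time_slot_ns(cell_bytes, line_rate_gbps):
    """Time to transmit one cell, in ns (bits divided by Gb/s gives ns)."""
    return cell_bytes * 8 / line_rate_gbps


def rtt_ns(link_m):
    """Round-trip propagation time over a link of the given length, in ns."""
    return 2 * link_m * FIBER_DELAY_NS_PER_M


def slots_per_rtt(link_m, cell_bytes, line_rate_gbps):
    """Number of time-slots that elapse during one round trip (rounded up)."""
    return math.ceil(rtt_ns(link_m) / time_slot_ns(cell_bytes, line_rate_gbps))


# Hypothetical example: 50 m link, 64-byte cells, 40 Gb/s line rate.
print(time_slot_ns(64, 40), rtt_ns(50), slots_per_rtt(50, 64, 40))
```

With these assumed numbers one round trip spans about 40 time-slots, so any request and grant exchange between linecard and fabric acts on information that is many slots old.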
Hiding Propagation Delays
Traditional solutions:
  Increase time-slot → poor switch performance
  Hide propagation delays using buffers → impractical amount of buffering
Proposed solution: closed loop → open loop
  Performance degradation vs. instantaneous closed loop
Outline
CQ: Open-loop switch architecture
Performance Evaluation
  Analytical results
  Simulations
CQ performance degradation is not significant
Proposed Architecture: The Crosspoint-Queued (CQ) Switch
No queues in the linecards
Buffering only inside the fabric
Independent output schedulers
Drops with full buffers
[Figure: linecards connected to the switch core over 10s of meters]
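The properties listed on this slide can be sketched as a small data structure. This is a toy model, not the authors' implementation: the class name `CQSwitch` and its methods are hypothetical, and longest-queue-first (LQF, the policy named later in the talk) is used here only as one possible output scheduler, with ties broken by lowest input index as an assumption.

```python
from collections import deque


class CQSwitch:
    """Toy N x N crosspoint-queued switch: one FIFO of capacity B per crosspoint."""

    def __init__(self, n, b):
        self.n = n
        self.b = b
        # queues[i][j] holds cells from input i waiting for output j
        self.queues = [[deque() for _ in range(n)] for _ in range(n)]
        self.dropped = 0

    def arrive(self, i, j, cell):
        """Open loop: the linecard just sends; a full crosspoint drops the cell."""
        q = self.queues[i][j]
        if len(q) >= self.b:
            self.dropped += 1
            return False
        q.append(cell)
        return True

    def serve_output(self, j):
        """Output j inspects only its own column and serves the longest queue."""
        i = max(range(self.n), key=lambda k: len(self.queues[k][j]))
        q = self.queues[i][j]
        return q.popleft() if q else None

    def tick(self):
        """One time-slot of departures: every output schedules independently."""
        return [self.serve_output(j) for j in range(self.n)]


sw = CQSwitch(n=2, b=1)
sw.arrive(0, 1, "a")
sw.arrive(0, 1, "b")  # crosspoint (0, 1) is already full, so "b" is dropped
sw.arrive(1, 0, "c")
print(sw.tick(), sw.dropped)  # prints ['c', 'a'] 1
```

Note that `arrive` never consults the scheduler and `serve_output` never consults the linecards, which is the open-loop property in miniature.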
CQ Properties
Open loop: No communication overhead
No linecard queues: No linecard queue management
“Router on a chip”: Buffering and switch fabric on same chip
Why not 10 years ago?
No need: single rack
No technology: SRAM density
  Moore’s law: density doubling every 2.5 years
  Aggressive 128x128 CQ switch: 4 cells of 64 bytes per crosspoint → 64 cells today
Conservative buffer requirements
  TCP Stanford model with smaller buffer needs [Appenzeller, Keslassy and McKeown ’04]
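The step from 4 cells to 64 cells per crosspoint follows from the slide's own numbers: 10 years at one doubling every 2.5 years is 4 doublings, a factor of 16. The short check below reproduces that arithmetic; the total on-chip buffer size it prints is derived here from the slide's parameters and is not a figure quoted in the talk.

```python
def doublings(years, years_per_doubling=2.5):
    """Number of density doublings over the given span."""
    return years / years_per_doubling


def cells_per_crosspoint(cells_then, years):
    """Cells per crosspoint after scaling by SRAM density growth."""
    return int(cells_then * 2 ** doublings(years))


def total_buffer_bytes(n, cells, cell_bytes=64):
    """Total buffering of an n x n CQ switch with the given cells per crosspoint."""
    return n * n * cells * cell_bytes


cells_now = cells_per_crosspoint(4, 10)
print(cells_now)                                    # prints 64
print(total_buffer_bytes(128, 4) / 2**20)           # prints 4.0  (MiB, 10 years ago)
print(total_buffer_bytes(128, cells_now) / 2**20)   # prints 64.0 (MiB, today)
```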
Outline
CQ: Our open-loop switch architecture
Performance Evaluation
  Analytical results
  Simulations
100% Throughput as B→∞
Throughput bounds: OQ(2B-1) ≤ CQ(B) ≤ OQ(NB)
Buffer size B, LQF scheduling algorithm
[Figure: the three switches OQ(2B-1), CQ(B) and OQ(NB), each labeled "100% Throughput"]
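The limit follows from the bounds by a squeeze argument. The sketch below assumes that OQ(x) denotes the throughput of an output-queued switch with x cells of buffering per output, and that this throughput tends to 100% as x grows; both are readings of the slide rather than statements on it.

```latex
\mathrm{OQ}(2B-1) \;\le\; \mathrm{CQ}(B) \;\le\; \mathrm{OQ}(NB),
\qquad
\lim_{B\to\infty}\mathrm{OQ}(2B-1) \;=\; \lim_{B\to\infty}\mathrm{OQ}(NB) \;=\; 100\%
\;\;\Longrightarrow\;\;
\lim_{B\to\infty}\mathrm{CQ}(B) \;=\; 100\%.
```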
Uniform Traffic, B=1
Uniform traffic model:
  At each time-slot, at each of the N inputs: Bernoulli IID packet arrivals with probability ρ
  Each packet is destined for one of the N outputs uniformly at random
Theorem: Under uniform traffic and B=1, the performance of the switch is independent of the specific work-conserving scheduling algorithm
Intuition: Symmetry
[Figure: queues, each labeled ρ/N]
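The theorem can be probed with a small Monte Carlo experiment. The sketch below is not the authors' simulator: it assumes that within a time-slot arrivals are written before each output serves one cell, uses `rho` for the per-input arrival probability, and compares two work-conserving schedulers chosen here purely for illustration (lowest index first and uniformly random).

```python
import random


def simulate_cq_b1(n, rho, slots, pick, seed=0):
    """Throughput (departures / arrivals) of an n x n CQ switch with B=1."""
    rng = random.Random(seed)
    occupied = [[False] * n for _ in range(n)]  # occupied[i][j]: crosspoint holds a cell
    arrivals = departures = 0
    for _ in range(slots):
        # Arrival phase: each input receives a cell with probability rho,
        # destined for an output chosen uniformly at random.
        for i in range(n):
            if rng.random() < rho:
                j = rng.randrange(n)
                arrivals += 1
                if not occupied[i][j]:
                    occupied[i][j] = True  # otherwise the cell is dropped
        # Departure phase: each output serves one non-empty crosspoint, if any.
        for j in range(n):
            candidates = [i for i in range(n) if occupied[i][j]]
            if candidates:
                occupied[pick(candidates, rng)][j] = False
                departures += 1
    return departures / arrivals


def lowest_index(candidates, rng):
    return candidates[0]


def uniform_random(candidates, rng):
    return rng.choice(candidates)


t_low = simulate_cq_b1(8, 0.9, 5000, lowest_index, seed=1)
t_rand = simulate_cq_b1(8, 0.9, 5000, uniform_random, seed=2)
print(t_low, t_rand)
```

If the theorem holds under these assumptions, the two printed throughputs should agree up to sampling noise, even though the schedulers make different choices in individual slots.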
Uniform Traffic, B=1
Theorem: The throughput and waiting time of a CQ switch with B=1 are:
[Equations for throughput and waiting time, in terms of q = 1 - ρ/N]
Proof: Based on Z-transform
Throughput goes to 100% as N goes to infinity
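The trend in N can be explored numerically without the closed-form expressions. The sketch below does not reproduce the theorem's formulas or its Z-transform proof. It is a separate numerical model of one output column with B=1, built on two assumptions that are not from the talk: in each slot arrivals are written first and then one cell departs, and each of the N crosspoints of the column receives a cell with probability `rho / n`. The state is the number of occupied crosspoints, and the stationary distribution is found by power iteration.

```python
from math import comb


def cq_b1_throughput(n, rho, iters=2000):
    """Throughput (departures / arrivals) of one output column with B=1."""
    p = rho / n  # arrival probability at each crosspoint of the column
    # pmf[f][a] = P(a of the f empty crosspoints receive a cell in one slot)
    pmf = [[comb(f, a) * p**a * (1 - p) ** (f - a) for a in range(f + 1)]
           for f in range(n + 1)]
    # pi[k] = P(k crosspoints are occupied at the start of a slot), k = 0..n-1
    pi = [1.0] + [0.0] * (n - 1)
    for _ in range(iters):
        nxt = [0.0] * n
        for k, mass in enumerate(pi):
            for a, prob in enumerate(pmf[n - k]):
                # a arrivals land on empty crosspoints, then one cell departs
                nxt[max(k + a - 1, 0)] += mass * prob
        pi = nxt
    idle = pi[0] * pmf[n][0]  # output idles only if the column is empty and nothing arrives
    return (1 - idle) / rho


for n in (2, 8, 32):
    print(n, cq_b1_throughput(n, 1.0))
```

For N=2 and full load this chain can be solved by hand: the stationary distribution is (2/3, 1/3), the output idles with probability 1/6, and the throughput is 5/6. Larger N can be compared against the claim that throughput approaches 100%.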
Models for larger buffers
Approximate Performance Analysis
Model for exhaustive round-robin scheduling
  Based on modifications to polling system with zero switch-over times
Model for random scheduling algorithm
Show 100% throughput as N→∞
[Figure: queues, each labeled 1/N]
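Exhaustive round-robin is the scheduling rule behind the polling-system model: the output keeps serving the current crosspoint queue until it is empty, then moves on to the next non-empty queue in cyclic order, and with zero switch-over times that move costs no time-slot. A minimal sketch of the rule follows; the function names are hypothetical and the demo drains fixed queue lengths with no new arrivals.

```python
def exhaustive_rr_pick(lengths, current):
    """Index of the queue to serve this slot, or None if every queue is empty.

    The search starts at `current`, so a non-empty current queue keeps being
    served (exhaustive service); otherwise the next non-empty queue is chosen.
    """
    n = len(lengths)
    for step in range(n):
        i = (current + step) % n
        if lengths[i] > 0:
            return i
    return None


def service_order(lengths, start=0):
    """Drain the given queue lengths and return the order in which queues are served."""
    lengths = list(lengths)
    current, order = start, []
    while True:
        i = exhaustive_rr_pick(lengths, current)
        if i is None:
            return order
        lengths[i] -= 1
        order.append(i)
        current = i


print(service_order([2, 0, 1]))           # prints [0, 0, 2]
print(service_order([1, 2, 1], start=1))  # prints [1, 1, 2, 0]
```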
Trace-Driven Simulation
Buffers of size 64 suffice to ensure 99% throughput for N=32.
[Figure: 32x32 CQ switch with different buffer sizes (in units of 64-byte packets)]
Conclusions
CQ is open loop → allows multi-rack configuration
CQ provides easy scheduling
CQ is feasible to implement in a single chip
CQ shows good performance in simulations
Thank You