Towards an Efficient Switch Architecture for High-Radix Switches
G. Mora¹, J. Flich¹, J. Duato¹, P. López¹, E. Baydal¹, O. Lysne²
¹Department of Computer Engineering, Technical University of Valencia, Spain
²Simula Lab, Oslo, Norway
ACM/IEEE Symposium on Architectures for Networking and Communications Systems 2006
G. Mora, J. Flich, J. Duato, P. López, E. Baydal, O. Lysne Towards an Efficient Switch Architecture for High-Radix Switches
Outline
1 Motivation
2 PCIQ Switch Architecture
  Description
  Evaluation
  Enhancements and Cost Analysis
3 Summary
Motivation
HPC requires efficient interconnection networks.
The efficiency of the interconnection network largely depends on the switch design.
How should those switches be built?
In particular, how should the pin bandwidth of such switches be used?
  Low-radix switches with wide channels.
  High-radix switches with narrow channels.
Switch Solutions to Build Large Networks
Networks using low-radix switches with wide channels:
  ↑ Network latency
  ↑ Network cost
  ↑ Power consumption
Networks using high-radix switches with narrow channels:
  ↓ Network latency
  ↓ Network cost
  ↓ Power consumption
Cost of Building a Switch
We need to keep a high switch efficiency at an affordable cost.
The cost depends on:
  Memory resources.
  Arbiter logic.
  Internal connection logic.
Head of Line Blocking
It largely affects the switch efficiency.
It appears when the packet at the head of a queue is blocked while packets behind it are requesting free output ports.
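A minimal sketch of the situation described above, assuming a single FIFO input queue in which each packet is represented only by the output port it requests. The function name and the example values are hypothetical and chosen for illustration.

```python
from collections import deque

def forward_one_cycle(queue, free_outputs):
    """Return the packets a FIFO input queue can forward in one cycle.

    Only the head packet is eligible: if its output is busy, nothing
    behind it may leave, even when its own output is free.
    """
    if queue and queue[0] in free_outputs:
        return [queue.popleft()]
    return []

# Hypothetical example: packets are labelled by their requested output port.
fifo = deque([2, 0, 1])      # head packet requests output 2
free = {0, 1}                # output 2 is busy; outputs 0 and 1 are free
sent = forward_one_cycle(fifo, free)

# Packets behind the head whose outputs are free, yet which cannot advance.
blocked_behind_head = [p for p in list(fifo)[1:] if p in free]
```

Here nothing is sent, although two of the three queued packets request free outputs: those two packets are the victims of HOL blocking.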
Switch Organizations
OQ - Output Queueing
[Figure: output-queued N × N switch with crossbar (XBar).]
Memory requirements ∼ N
No HOL
Speedup of N
Switch Organizations
IQ - Input Queueing
[Figure: input-queued N × N switch with crossbar (XBar).]
Memory requirements ∼ N
No speedup
HOL blocking limits maximum throughput to about 58%
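The 58% figure matches the classical saturation limit of 2 − √2 ≈ 0.586 for FIFO input queues under uniform traffic and large N. The sketch below estimates it by simulation. It assumes fixed-size packets, one time slot per packet, and random selection among contenders; none of this is taken from the simulator used in the talk.

```python
import random

def iq_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate saturated throughput of a FIFO input-queued switch.

    Every input always has a packet at its head, with a uniformly random
    destination. Each slot, every requested output serves one contender
    chosen at random; losers keep their head packet (HOL blocking).
    """
    rng = random.Random(seed)
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their head packet requests.
        contenders = {}
        for inp, out in enumerate(heads):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            heads[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
    return delivered / (n_slots * n_ports)

throughput = iq_saturation_throughput(n_ports=32, n_slots=5000)
```

For a finite number of ports the estimate sits slightly above the asymptotic limit and approaches it as the port count grows.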
Switch Organizations
IQ - Input Queueing with VOQ
[Figure: input-queued N × N switch with VOQ and crossbar (XBar).]
No HOL at switch level
No speedup
Memory requirements ∼ N²
Switch Organizations
CIOQ - Combined Input-Output Queueing
[Figure: combined input-output queued N × N switch with crossbar (XBar).]
Memory requirements ∼ 2N
HOL at switch level
Max. speedup of 2 or 3
Switch Organizations
BC - Buffered Crossbar
[Figure: buffered crossbar N × N switch.]
No HOL
Low-cost arbiters
Memory requirements ∼ N²
Switch Organizations
HC - Hierarchical Crossbar
[Figure: p-hierarchical crossbar N × N switch; label: CIOQ.]
It is an intermediate solution between CIOQ and BC.
A buffered crossbar (with N ports) is substituted by smaller switches (with p ports).
Memory requirements ∼ N²/p
Speedup
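The memory trends quoted for the organizations above can be compared side by side. This is a small sketch: the function name is hypothetical, and the values are scaling trends (the "∼" on the slides), not exact memory counts.

```python
def memory_requirements(n, p):
    """Scaling trend of the number of memories per organization, per the slides."""
    return {
        "OQ": n,
        "IQ": n,
        "IQ+VOQ": n * n,
        "CIOQ": 2 * n,
        "BC": n * n,
        "HC": n * n // p,   # N^2 / p for sub-switches with p ports
    }

# Example values: a 24-port switch with p = 12.
table = memory_requirements(n=24, p=12)
```

With these example values the quadratic organizations need 576 memories, while CIOQ and HC both come out at 48.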
Switch Organizations
None of these architectures scales (because of high cost or low switch efficiency).
We need a better proposal for high-radix switches!
  A proposal that scales,
  that achieves high switch efficiency,
  and that eliminates the HOL blocking problem.
PCIQ fulfills all these requirements.
Starting Point
To stay aware of the implementation cost, we present PCIQ in a constructive way.
We monitor aspects such as switch efficiency, memory requirements, and crossbar complexity.
We start from the CIOQ switch organization without speedup.
[Figure: CIOQ switch: input links, input memories, crossbar, output memories, output links, and the routing & arbitration unit.]
First Modification: Increasing Read Bandwidth
In order to increase the switch efficiency, let's increase the read bandwidth of the input memories.
[Figure: CIOQ switch (input links, input memories, crossbar, output memories, output links, routing & arbitration unit), rated on switch efficiency, memory requirements, and crossbar complexity.]
Doubling SRAM Read Bandwidth
How to double the SRAM read bandwidth?
Split the SRAM into two independent modules.
  Doubles silicon area (and SRAM size).
  Requires extra logic to select the SRAM.
Implement two read ports.
  Increases silicon area by 25%.
  With full-custom designs or HMA techniques this extra area can be reduced.
[Figures: two CIOQ switch diagrams (input links, input memories, crossbar, output memories, output links, routing & arbitration unit).]
First Modification: Increasing Read Bandwidth
[Figure: CIOQ and CIOQ-2rp switch organizations side by side, rated on switch efficiency, memory requirements, and crossbar complexity.]
Second Modification: Splitting Crossbars
We need to keep arbiter complexity constant.
We split the crossbar into two separate crossbars.
[Figure: switch before and after splitting the crossbar: input links, input memories, crossbar(s), output memories, output links, and the routing & arbitration unit.]
Second Modification: Splitting Crossbars
[Figure: CIOQ, CIOQ-2rp, and PC switch organizations side by side, rated on switch efficiency, memory requirements, and crossbar complexity.]
Third Modification: Removing the Output Memories
[Figure: switch without output memories: input links, input memories, crossbar, output links, and the routing & arbitration unit.]
Each read port is used to forward packets to a different set of output links.
Output memories receive data at the link rate.
Output memories can be removed.
This compensates for the extra cost of the dual-ported memory.
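The compensation argument can be checked with rough arithmetic. This sketch uses the 25% area overhead quoted earlier for a second read port. It assumes, as a simplification not stated in the slides, that input and output memories have the same capacity. The function name is hypothetical.

```python
def relative_memory_area(n_ports, dual_port_overhead=0.25):
    """Compare total memory area, in units of one single-ported memory.

    Assumes (not stated in the slides) that input and output memories
    have the same capacity, so only the port count changes the area.
    """
    cioq = 2 * n_ports                          # N input + N output memories
    pciq = n_ports * (1 + dual_port_overhead)   # N dual-read-port input memories
    return cioq, pciq

cioq_area, pciq_area = relative_memory_area(24)
```

Under these assumptions a 24-port switch needs 48 area units with input and output memories, against 30 with dual-read-port input memories only.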
PCIQ Switch Organization
Overview
[Figure: CIOQ, CIOQ-2rp, PC, and PCIQ switch organizations side by side, rated on switch efficiency, memory requirements, and crossbar complexity.]
PCIQ Switch Organization
Routing and Flow Control
[Figure: PCIQ switch: input links, input memories, crossbar, output links, and the routing & arbitration unit.]
Packets must be stored in the correct queue.
Credit-based flow control at memory level.
Xon/Xoff flow control at queue level.
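The two flow-control levels named above can be sketched as follows. This is an illustration under assumptions, not the mechanism as implemented by the authors: one credit is taken to mean one free packet slot, and the Xon/Xoff thresholds are hypothetical parameters.

```python
class CreditLink:
    """Sender-side view of credit-based flow control for one input memory.

    One credit stands for one free packet slot in the downstream memory
    (an assumed granularity).
    """

    def __init__(self, memory_slots):
        self.credits = memory_slots

    def can_send(self):
        return self.credits > 0

    def send_packet(self):
        if not self.can_send():
            raise RuntimeError("no credits: downstream memory is full")
        self.credits -= 1

    def credit_returned(self):
        # Called when the downstream switch frees a slot.
        self.credits += 1


class XonXoffQueue:
    """Queue-level Xon/Xoff: stop at a high mark, resume at a low mark."""

    def __init__(self, xoff_mark, xon_mark):
        self.xoff_mark, self.xon_mark = xoff_mark, xon_mark
        self.stopped = False

    def update(self, occupancy):
        """Record the new occupancy and return True while the sender must stop."""
        if occupancy >= self.xoff_mark:
            self.stopped = True
        elif occupancy <= self.xon_mark:
            self.stopped = False
        return self.stopped


link = CreditLink(memory_slots=2)
link.send_packet()
link.send_packet()
stalled = not link.can_send()      # both slots are in use
link.credit_returned()

queue = XonXoffQueue(xoff_mark=4, xon_mark=2)
states = [queue.update(occ) for occ in (4, 3, 2)]
```

The gap between the two marks gives hysteresis, so a queue hovering around one threshold does not toggle its state on every packet.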
PCIQ Switch Organization
Arbiter
Two identical arbiters are required.
  One per crossbar, each associated with one read port from each input memory.
  Implemented as a hierarchical round-robin arbiter.
Arbiter (not new)
  1st level: q×1 round-robin arbiter (among q queues).
  2nd level: N×1 round-robin arbiter (among N memories).
  It arbitrates asynchronously at packet level.
  So, efficiency increases with traffic load.
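A possible software model of the two-level round-robin scheme for a single output is sketched below. Class and method names are hypothetical. The code picks the memory first and then the queue inside it, so that only the winning memory's queue pointer advances; the hardware is described the other way round (queues first), and treating the two orders as equivalent is an assumption of this sketch.

```python
class RoundRobin:
    """Round-robin arbiter over `size` requesters."""

    def __init__(self, size):
        self.size = size
        self.pointer = 0  # index with the highest priority

    def grant(self, requests):
        """Return the granted index, or None if nobody requests."""
        for offset in range(self.size):
            idx = (self.pointer + offset) % self.size
            if requests[idx]:
                self.pointer = (idx + 1) % self.size
                return idx
        return None


class HierarchicalArbiter:
    """Two-level arbiter for one output: q queues per memory, N memories."""

    def __init__(self, n_memories, q_queues):
        self.queue_level = [RoundRobin(q_queues) for _ in range(n_memories)]
        self.memory_level = RoundRobin(n_memories)

    def grant(self, requests):
        """requests[m][q] is True if queue q of memory m wants the output."""
        memory_requests = [any(row) for row in requests]
        m = self.memory_level.grant(memory_requests)
        if m is None:
            return None
        return m, self.queue_level[m].grant(requests[m])


arbiter = HierarchicalArbiter(n_memories=3, q_queues=2)
requests = [[False, True], [True, True], [False, False]]
grants = [arbiter.grant(requests) for _ in range(4)]
idle = arbiter.grant([[False, False]] * 3)
```

With the same requests repeated, the grants alternate between memories 0 and 1, and memory 1 alternates between its two queues.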
PCIQ Switch Organization
Arbiter Efficiency
Arbiter efficiency increases with traffic load, especially for asymmetrical crossbars.
N × N crossbar (symmetrical)
  8 inputs requesting 8 outputs (ratio 1 to 1)
  7 inputs, 7 outputs (ratio 1 to 1)
  6 inputs, 6 outputs (ratio 1 to 1)
  5 inputs, 5 outputs (ratio 1 to 1)
N × N/2 crossbar (asymmetrical)
  8 inputs requesting 4 outputs (ratio 2 to 1)
  7 inputs, 3 outputs (ratio 2.33 to 1)
  6 inputs, 2 outputs (ratio 3 to 1)
  5 inputs, 1 output (ratio 5 to 1)
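The ratios listed above follow from removing one input and one output per match. The sketch below reproduces them, assuming each step pairs exactly one input with one output; the function name is hypothetical.

```python
def contention_ratios(n_inputs, n_outputs, n_steps):
    """Ratio of requesting inputs to free outputs as matches accumulate.

    Assumes each step matches one input to one output, removing both.
    """
    ratios = []
    for matched in range(n_steps):
        inputs_left = n_inputs - matched
        outputs_left = n_outputs - matched
        ratios.append(round(inputs_left / outputs_left, 2))
    return ratios

symmetric = contention_ratios(8, 8, 4)    # N x N crossbar, N = 8
asymmetric = contention_ratios(8, 4, 4)   # N x N/2 crossbar, N = 8
```

In the asymmetrical case the number of candidate inputs per free output grows as matching proceeds. One reading of the slide is that this makes it more likely for every remaining output to find a requester.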
PCIQ Switch Organization
Extending PCIQ to More than Two Subcrossbars
PCIQ can be further partitioned into more than 2 subcrossbars.
[Figure: PCIQ switch diagrams with input links, input memories, crossbars, output links, and the routing & arbitration unit.]
PCIQ is a new family of switch architectures. It fills the gap between CIOQ and BC.
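One way to picture the partitioning is as a mapping from output ports to subcrossbars. The sketch assumes contiguous blocks of N/k outputs per subcrossbar; the slides do not specify the actual assignment, and both function names are hypothetical.

```python
def subcrossbar_of(output_port, n_ports, k):
    """Index of the subcrossbar (and read port) serving `output_port`.

    Assumes contiguous blocks of N/k outputs per subcrossbar; the slides
    do not specify the actual assignment.
    """
    if n_ports % k != 0:
        raise ValueError("k must divide the number of ports")
    return output_port // (n_ports // k)

def partition(n_ports, k):
    """List the output ports attached to each of the k subcrossbars."""
    groups = [[] for _ in range(k)]
    for out in range(n_ports):
        groups[subcrossbar_of(out, n_ports, k)].append(out)
    return groups

groups_2 = partition(24, 2)   # two 24 x 12 subcrossbars
groups_4 = partition(24, 4)   # four 24 x 6 subcrossbars
```

Under this mapping a packet's destination port determines the subcrossbar it crosses, and therefore the input-memory read port it uses.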
Evaluation of PCIQ
Configurations Analyzed
Switch
  24 ports
Organizations
  Basic CIOQ
  PCIQ - 2-xbar
  PCIQ - 4-xbar
  HC (p = 12)
Packets
  Size: 256 bytes
Evaluation of PCIQ
Simulator
Event-driven simulator
  Works at the clock level.
  VCT (virtual cut-through) and flow control are modeled.
  Each link of the switch is attached to an end node.
  Injection by nodes at maximum link rate (1 byte/cycle).
  A three-phase arbiter is modeled.
  The switch forwards a byte from an input to an output in one cycle.
Evaluation of PCIQ
Results for Uniform Traffic
[Figure: Throughput, 256 bytes. Accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port).]
[Figure: Latency, 256 bytes. Network latency (cycles) vs. injected traffic (bytes/cycle/port).]
Curves in both plots: HC, PCIQ-4xbar, PCIQ-2xbar, CIOQ-2Q, CIOQ-1Q.
Evaluation of PCIQ
Results for Hot-spot plus Uniform Traffic
[Figure: Throughput, 256 bytes. Accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port).]
[Figure: Latency, 256 bytes. Network latency (cycles) vs. injected traffic (bytes/cycle/port).]
Curves in both plots: HC, PCIQ-4xbar, PCIQ-2xbar, CIOQ-2Q, CIOQ-1Q.
Hot spot: each input sends 10% of its traffic to a hot-spot destination and the rest randomly.
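A destination generator for this traffic pattern could look like the sketch below. It assumes that the random part is uniform over all ports, including the hot spot itself; the slides do not say so. The names, the seed, and the port numbers are illustrative.

```python
import random

def pick_destination(rng, n_ports, hot_spot, hot_fraction=0.10):
    """Pick a packet destination for hot-spot plus uniform traffic."""
    if rng.random() < hot_fraction:
        return hot_spot
    return rng.randrange(n_ports)   # uniform part may also hit the hot spot

rng = random.Random(1)
n_ports, hot_spot, n_packets = 24, 0, 100_000
hits = sum(pick_destination(rng, n_ports, hot_spot) == hot_spot
           for _ in range(n_packets))
hot_share = hits / n_packets
```

Under this assumption the hot-spot port receives a share of about 0.10 + 0.90/24 ≈ 0.14 of all packets, several times the 1/24 share of any other port.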
Enhancements to PCIQ
Solutions to the HOL blocking problem
VOQ
  Queue requirements grow quadratically.
  Does not solve network HOL.
Increase speedup
  Increases the cost of the switch.
  Does not solve network HOL.
RECN (Regional Explicit Congestion Notification)
  RECN is a congestion management technique that dynamically detects congestion and separates congested packets from non-congested ones.
  Requires a very limited set of extra queues (known as SAQs).
  Solves switch and network HOL blocking.
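The separation idea can be illustrated with a deliberately simplified model. This is not the RECN protocol: congestion detection, notification between switches, and queue deallocation are all omitted, and keying the extra queues by destination is an assumption made for brevity. All names are hypothetical.

```python
class PortQueues:
    """One normal queue plus a small pool of extra queues (SAQs).

    Illustration only: congestion detection, notifications between
    switches, and SAQ deallocation are omitted.
    """

    def __init__(self, max_saqs):
        self.normal = []
        self.saqs = {}          # congested destination -> its own queue
        self.max_saqs = max_saqs

    def mark_congested(self, destination):
        """Reserve a SAQ for a congested destination; False if none is free."""
        if destination not in self.saqs and len(self.saqs) < self.max_saqs:
            self.saqs[destination] = []
        return destination in self.saqs

    def enqueue(self, packet_destination):
        """Store a packet so that congested traffic cannot block the rest."""
        if packet_destination in self.saqs:
            self.saqs[packet_destination].append(packet_destination)
        else:
            self.normal.append(packet_destination)


port = PortQueues(max_saqs=2)
port.mark_congested(5)
for destination in (5, 1, 5, 3):
    port.enqueue(destination)
```

Packets headed for the congested destination 5 wait in their own queue, so the packets for destinations 1 and 3 in the normal queue are not stuck behind them.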
Evaluation of PCIQ with RECN
Results for Uniform Traffic
[Figure: Throughput, 256 bytes. Accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port).]
[Figure: Latency, 256 bytes. Network latency (cycles) vs. injected traffic (bytes/cycle/port).]
Curves in both plots: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, HC, PCIQ-4xbar.
Evaluation of PCIQ with RECN
Results for Hot-spot plus Uniform Traffic
[Figure: Throughput, 256 bytes. Accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port).]
[Figure: Latency, 256 bytes. Network latency (cycles) vs. injected traffic (bytes/cycle/port).]
Curves in both plots: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, HC, PCIQ-4xbar.
Hot spot: each input sends 10% of its traffic to a hot-spot destination and the rest randomly.
Evaluation of PCIQ
Special Results
RECN added to CIOQ architecture.
[Figure: accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port). Curves: PCIQ-2xbar-2saqs, CIOQ-2saqs.]
Worst case traffic for PCIQ.
[Figure: accepted traffic (bytes/cycle/port) vs. injected traffic (bytes/cycle/port). Curves: PCIQ-2xbar-4saqs, PCIQ-2xbar-2saqs, CIOQ-2Q, HC, PCIQ-2xbar, CIOQ-1Q.]
Cost Analysis
Cost depends on...
  Memory resources.
  Crossbar.
  Arbiter complexity.
Memory Resources
[Figure: number of required memories vs. switch radix. Curves: CIOQ, minimum-cost HC, PCIQ-2xbar.]
Crossbar
  CIOQ → one N × N crossbar: N² crosspoints
  HC → (N/p)² crossbars of size p × p: N² crosspoints
  PCIQ → k crossbars of size N × N/k: N² crosspoints
Arbiter Complexity
  Deduced from the crossbar complexity.
  Thus, similar for all architectures.
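The three crosspoint counts above can be verified numerically. This is a small sketch; the function name is hypothetical, and it assumes that p and k divide N.

```python
def crosspoints(n, p=None, k=None):
    """Crosspoint counts for the three organizations in the cost analysis."""
    counts = {"CIOQ": n * n}                        # one N x N crossbar
    if p is not None:
        counts["HC"] = (n // p) ** 2 * (p * p)      # (N/p)^2 crossbars of p x p
    if k is not None:
        counts["PCIQ"] = k * (n * (n // k))         # k crossbars of N x N/k
    return counts

example = crosspoints(n=24, p=12, k=2)
small = crosspoints(n=4, p=2, k=2)
```

Both calls give the same count for all three organizations (576 for N = 24, 16 for N = 4), in line with the claim that crossbar cost is N² in every case.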
Cost Analysis
Overview
[Figure: example CIOQ, HC, and PCIQ switches drawn with their crossbars (xbar) and arbiters (Arb.).]
Organization   Crosspoints   Arbiters   Wires from arbiters
CIOQ           16            4          16
HC             16            8          16
PCIQ           16            4          16
Summary
High-radix switches are becoming a necessity.
Current switch organizations suffer from low efficiency or high cost.
PCIQ relies on...
  A partitioned crossbar that allows the read bandwidth to be increased without increasing the cost.
  Two round-robin packet-based arbiters (one for each crossbar).
  A congestion management technique (RECN) to eliminate the HOL problem.
PCIQ achieves...
  Cost similar to or lower than that of basic organizations like CIOQ.
  Maximum switch efficiency for uniform traffic distribution.
  Complete elimination of switch and network-wide HOL blocking...
  Thus, maximum switch efficiency for non-uniform traffic.
Thank you very much for your attention.