Hardware Implementation and Evaluation of Flexible Router Architecture for NoCs
Hossam El-Sayed Abdel-Fadeel
M.Sc. Student, ECE department, E-JUST; Research Assistant, NTI
email: [email protected]
Prof. M. Ragab, Assoc. Prof. Maha El-Sabarouty, Assoc. Prof. V. Goulart, and Assist. Prof. Mohammed Sharaf
December 2, 2013
Outline
• Motivation
• Related Work
• Base Router Architecture
• Flexible Router Architecture
• Evaluation and Experiments
• Conclusion
Why Network on Chip?
• Process technology scales
– Transistor density increases: many Processing Elements (PEs) fit on a single chip.
– BUT global wiring delays also increase (wire speed is not scaling).
– Performance of digital systems increases in terms of computation.
• Design concept
– Many PEs need to be interconnected.
– Need a structured and scalable on-chip communication architecture.
– Shift from computation-centric design to communication-centric design.
• To combat these issues, researchers have proposed the Network on Chip (NoC).
What is Network on Chip (NoC)?
• NoC = Routers + Links.
– Network topology (how the nodes are connected to each other)
– Routing algorithm (how packets move from source to destination)
– Flow control (controls the transmission of packets between routers)
– Router architecture (buffers, arbiters, crossbars, etc.)
• Buffer requirements in a router
– Stores arriving packets or flits.
Buffers in NoC Routers
• Why buffering?
– Wait for routing decisions.
– Contention for the same output channel.
– Congested downstream router.
• Large buffers improve throughput and latency, BUT at the cost of
– Area: high hardware resource overhead
– Power: buffers are large energy consumers, about 64% of the total router leakage power.
• Need efficient ways to use buffer resources
– Through careful management of the available buffers.
• Several architectures and implementations have been proposed.
Related Work
• Central Buffer Sharing Method
– All ports share a central buffer
– Improves performance, but at the cost of
  • Area overhead
  • Complexity of control
• Distributed Shared Buffer
– Shows improvement in throughput, but at the cost of
  • Power and
  • Area overhead.
Flexible Router Approach (1/3)
• Improve the performance of the overall network.
– By modifying the router architecture.
• Use the same amount of available buffers in a more efficient way.
– If there is contention at any input port, the Flexible Router tries to allocate any suitable free buffer at the other input ports of the router.
– No need to increase the size of the buffers or to use extra virtual channels (VCs).
Flexible Router Approach (2/3)
Base Router Congestion Problem
[Figure: base router with N, E, S, and W ports; two buffers are marked Busy.]
Packets requesting a busy buffer will be blocked.
Flexible Router Approach (3/3)
Instead of waiting for a busy buffer to become free, look for another one. This increases the number of packets moving through the router.
[Figure: Flexible Router with N, E, S, and W ports; three buffers are marked Busy.]
Features of Flexible Router
• Efficient buffer utilization
• Enhanced packet throughput
• Low hardware resource overhead
The design of the Flexible Router is similar to that of the base router, except for the functionality and modules added to the input ports.
Base Router Architecture
Input Port Module
[Figure: base router input port, built from a Routing Computation (RC) unit, a FIFO buffer, and a FIFO Controller. Signals shown: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, PacketInCnt, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr.]
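The FIFO buffer and its controller in the input port can be pictured with a small behavioural model. This is only a sketch, not the Verilog used in this work: the class name `Fifo` and its methods are hypothetical, and the only things taken from the slide are the read/write addresses and the Empty/Full flags.

```python
class Fifo:
    """Behavioural model of a circular FIFO buffer (hypothetical, for illustration)."""

    def __init__(self, depth):
        self.depth = depth            # buffer size in packets (2, 4 or 8 in this work)
        self.mem = [None] * depth
        self.read_addr = 0            # plays the role of ReadAddr
        self.write_addr = 0           # plays the role of the write address
        self.count = 0                # number of stored packets

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def write(self, packet):
        """Store a packet; returns False (packet refused) when the FIFO is full."""
        if self.full:
            return False
        self.mem[self.write_addr] = packet
        self.write_addr = (self.write_addr + 1) % self.depth
        self.count += 1
        return True

    def read(self):
        """Remove and return the oldest packet, or None when the FIFO is empty."""
        if self.empty:
            return None
        packet = self.mem[self.read_addr]
        self.read_addr = (self.read_addr + 1) % self.depth
        self.count -= 1
        return packet
```

A full FIFO refusing a write is the situation in which an upstream packet would be blocked in the base router.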
Output Port Module
[Figure: base router output port. A Round Robin Arbiter drives the select input gnt[1:0] of a Mux that chooses among PacketIn 0 to PacketIn 3 and produces PacketOut. Signals shown: ReqInt(3:0), GntInt(3:0), ReqDnStr, GntDnStr, fullDnStr.]
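The round-robin arbitration used in the output port can be sketched as follows. This is a generic round-robin model under stated assumptions, not the arbiter RTL of this work: the class `RoundRobinArbiter` and the four-requester default are illustrative, chosen to match the four ReqInt/GntInt lines in the figure.

```python
class RoundRobinArbiter:
    """Generic round-robin arbiter model (hypothetical names, for illustration)."""

    def __init__(self, n=4):
        self.n = n
        # Start so that requester 0 has the highest priority on the first grant.
        self.last = n - 1

    def grant(self, requests):
        """requests: list of n booleans. Returns the granted index, or None."""
        # Search starts just after the last granted requester, which gives fairness.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None
```

Because the search always starts after the most recently granted input, no requesting input can be starved by another.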
Basic operation of Base Router
[Figure: receiving flowchart of the Down Stream router and sending flowchart of the Up Stream router.]
[Figure: handshake between the output port of the Up Stream Router (US) and the input port of the Down Stream Router (DS), using Full_US, Request_US, Grant_US, and PacketIn_US.]
[Figure: handshake between the input port of the Up Stream Router (US) and the output port of the Down Stream Router (DS), using Full_DS, Request_DS, Grant_DS, and PacketOut_DS.]
Flexible Router Architecture
Input Port Module
[Figure: Flexible Router input port (East port shown), built from a Routing Computation (RC) unit, a FIFO buffer, a FIFO Flexibility Controller, and a MUX that selects between EastPacket and packets from other ports. Signals shown: ReqUpStr, GntUpStr, ReqInt(3:0), GntInt(3:0), ReqInCnt, GntInCnt, Empty, Full, PacketIn, IntPacket, ReadEn, WriteEn, ReadAddr, WriteIAddr, Req_FFCE_FIFO_W,N,S, Gnt_FFCE_FIFO_W,N,S.]
Basic operation of Flexible Router
The FIFO Flexibility Controller (FFC) requests other FIFOs in a sequential order.
Pseudo code for the East FFC:
    if (FIFO West is not full) {
        Send Request and wait Grant;
    } else if (FIFO North is not full) {
        Send Request and wait Grant;
    } else if (FIFO South is not full) {
        Send Request and wait Grant;
    }
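The sequential search of the East FFC can be written as a small function. This is a minimal sketch of the selection step only: the function name `pick_alternative_fifo` and the dictionary of Full flags are hypothetical, and the request/grant handshake itself is not modelled.

```python
# Search order for the East FFC, as given in the pseudo code: West, North, South.
EAST_FFC_ORDER = ("West", "North", "South")


def pick_alternative_fifo(fifo_full, order=EAST_FFC_ORDER):
    """Return the first FIFO in `order` that is not full, or None if all are full.

    fifo_full: dict mapping port name -> True when that port's FIFO is full.
    """
    for port in order:
        if not fifo_full[port]:
            return port   # the FFC would send a request here and wait for the grant
    return None           # no free buffer: the packet stays blocked
```

For example, with West full and North free, the East FFC would request the North FIFO.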
Possible Packet Directions
• Applying the turn model to the Flexible Router working under XY routing lets us avoid deadlock.
• Under XY routing, the packet directions that each buffer in the Flexible Router can store are as follows:
– North buffer: packets directed to Local or South.
– South buffer: packets directed to Local or North.
– East buffer: packets directed to Local, North, South, or West.
– West buffer: packets directed to Local, North, South, or East.
• Packets directed to the local port have reached their destination and are absorbed directly by the local port.
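The direction rules above can be captured as a lookup table. This is a sketch of the rule as listed on the slide, not of the hardware check; the names `ALLOWED_OUTPUTS` and `can_store` are hypothetical.

```python
# Output directions that a packet stored in each input buffer may request
# under XY routing, copied from the list above.
ALLOWED_OUTPUTS = {
    "North": {"Local", "South"},
    "South": {"Local", "North"},
    "East": {"Local", "North", "South", "West"},
    "West": {"Local", "North", "South", "East"},
}


def can_store(buffer_port, output_direction):
    """True if a packet heading to `output_direction` may sit in `buffer_port`'s buffer."""
    return output_direction in ALLOWED_OUTPUTS[buffer_port]
```

A packet that still has to travel East or West, for instance, would not be placed in the North or South buffer.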
System Architecture
• NoC parameters used in this work: a 64-bit, 5-input-buffer router.

Parameter           Selected Logic                                   Why used
Arbitration         Round Robin                                      Fairness
Switching           Store-And-Forward                                For simplicity and proof of concept
Routing Algorithm   XY-DOR Routing                                   Minimizes area and control overhead; deadlock-free routing
Topology            Mesh                                             Most common for 2D chips
Packet/Flit size    64 bits                                          Can vary from 32 to 264 bits
Buffer Size         2, 4, 8 packets                                  Small, to see the utilization
Traffic Patterns    Uniform Random, Hot-Spot, and Nearest-Neighbor   For performance evaluation
XY - Dimension-Ordered Routing (XY - DOR)
[Figure: mesh showing the XY route from a source node S to a destination node D.]
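XY dimension-ordered routing moves a packet along the X dimension first and only then along Y. The sketch below illustrates this; the coordinate convention (x grows towards East, y grows towards North) and the function names are assumptions, since the slides do not state them.

```python
def xy_route(cur, dst):
    """Return the output port chosen by XY routing at node `cur` for destination `dst`.

    Assumed convention: nodes are (x, y); x grows to the East, y grows to the North.
    """
    cx, cy = cur
    dx, dy = dst
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    # X offset is zero: only now may the packet move in Y.
    if dy > cy:
        return "North"
    if dy < cy:
        return "South"
    return "Local"


def xy_path(src, dst):
    """List of nodes visited from `src` to `dst` under XY routing."""
    step = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}
    path = [src]
    cur = src
    while cur != dst:
        mx, my = step[xy_route(cur, dst)]
        cur = (cur[0] + mx, cur[1] + my)
        path.append(cur)
    return path
```

Since a packet never turns from Y back to X, the turns that could close a dependency cycle never occur, which is why XY routing is deadlock free on a mesh.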
Performance Parameters
• Latency is the time elapsed from when a packet enters the network until its last flit reaches its destination.
• Throughput is the rate at which packets are delivered by the network for a particular traffic pattern.
• Many factors affect these parameters:
– Topology: determines how the system is connected and its size, i.e. the number of nodes.
– Injection rate: the rate at which packets are injected into the simulator, i.e. how many packets are injected per simulation cycle per node on average.
– Flow control: refers to the number of virtual channels per physical channel and the depth of each virtual channel, measured in flits.
Evaluation Approach
• A cycle-accurate NoC simulation system was developed in Verilog HDL to evaluate the performance of the Flexible Router.
• Synthesis environment:
– XILINX ISE 14.1; target platform: XILINX Virtex-5 xc5vfx70t-1ff1136 FPGA.
– Cadence SoC Encounter® Digital Implementation System with 180nm technology (Encounter RTL Compiler®).
Simulation Platform (1/3)

PE Information          Where                     Function
Send Time               Sender Log                Cycle counter of each sent packet
Receive Time            Receiver Log              Cycle counter of each received packet
PE Module Sender ID     Sender and Receiver Log   The PE Module ID of the Sender
PE Module Receiver ID   Receiver Log              The PE Module ID of the Receiver
Packet ID               Sender and Receiver Log   The ID of the transmitted Packet

[Figure: Packet Injector Flow Chart]
Simulation Platform (2/3)
[Figure: simulation flow. The Verilog RTL Model and Verilog Testbench go into the Simulation Compiler (Modelsim or ISim); the Simulation Results (Log Files, Waveform) are passed to Matlab, which produces the Simulation Graphs.]
Matlab calculates the following:
• Average latency for all the packets in the simulation system.
• Average throughput for all the packets in the simulation system.
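The post-processing of the sender and receiver logs can be sketched as below. This is not the Matlab script used in this work: the log layout (packet ID mapped to a cycle count) and the throughput definition (delivered packets per cycle per PE, matching the units on the plots) are assumptions.

```python
def average_latency(send_log, recv_log):
    """Mean of (receive time - send time) over all delivered packets, in cycles.

    send_log, recv_log: dicts mapping packet ID -> cycle counter value (assumed layout).
    """
    latencies = [recv_log[pid] - send_log[pid] for pid in recv_log if pid in send_log]
    return sum(latencies) / len(latencies) if latencies else 0.0


def average_throughput(recv_log, total_cycles, num_pes):
    """Delivered packets per cycle per PE (assumed definition)."""
    return len(recv_log) / (total_cycles * num_pes)
```

Packets that were sent but never received within the simulation window appear only in the sender log and are left out of the latency average.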
Simulation Platform (3/3)
• Most performance analyses use synthetic traffic patterns with different characteristics.
• Simulation was done under 3 different traffic patterns:
– Uniform (UNI): all the traffic is equally distributed between all nodes. This is the most commonly used traffic pattern for network evaluation because it is straightforward to implement and makes no assumptions about the application.
– Nearest-Neighbor (NN): each node sends only to its neighbor nodes.
– Hotspot (HS): 90% of the traffic is directed to the hotspot node at (2, 2) and the rest of the traffic is equally distributed between all other nodes.
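The three patterns can be illustrated with a destination picker. This is a sketch, not the packet injector of this work: the mesh dimensions are parameters because the slides do not give them, and excluding the source from the UNI and HS candidates is an assumption.

```python
import random


def neighbors(node, width, height):
    """Mesh neighbors of `node` (up to four, fewer at edges and corners)."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < width and 0 <= b < height]


def pick_destination(src, pattern, width, height, rng,
                     hotspot=(2, 2), hotspot_fraction=0.9):
    """Pick a destination for one packet injected at `src`."""
    nodes = [(x, y) for x in range(width) for y in range(height)]
    if pattern == "UNI":
        # Every other node is equally likely (assumption: no self-traffic).
        return rng.choice([n for n in nodes if n != src])
    if pattern == "NN":
        return rng.choice(neighbors(src, width, height))
    if pattern == "HS":
        # 90% of the traffic goes to the hotspot, the rest is spread uniformly.
        if src != hotspot and rng.random() < hotspot_fraction:
            return hotspot
        return rng.choice([n for n in nodes if n not in (src, hotspot)])
    raise ValueError("unknown traffic pattern: " + pattern)
```

A call such as `pick_destination((0, 0), "HS", 4, 4, random.Random(0))` uses a hypothetical 4x4 mesh; passing a seeded `random.Random` keeps a simulation run repeatable.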
Uniform Random Traffic
[Figure: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE) for the Base Router and the Flexible Router, with buffer sizes 2, 4, and 8.]
Nearest Neighbor Traffic
[Figure: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE) for the Base Router and the Flexible Router, with buffer sizes 2, 4, and 8.]
Under Nearest Neighbor traffic each injector sends packets only to its neighbors, so throughput grows linearly with the injection rate: all injected packets are served by the routers, and no congestion occurs to affect the throughput.
Hot Spot Traffic
[Figure: average latency (cycles) and throughput (packets/cycle/PE) versus packet injection rate (packets/cycle/PE) for the Base Router (BR) and the Flexible Router (FR), with buffer sizes 2, 4, and 8.]
The improvement under HS is slight, apart from a higher saturation point, because HS packets are injected faster than they can be collected and they occupy all the network buffer space. Modifying the FR architecture to suit this type of traffic could be part of our future work.
Synthesis Results (1/2)
• Using Xilinx ISE® Synthesis Tool (XST) targeting Virtex-5 FPGA, xc5vfx70t-1ff1136.
– The tables give the area and maximum frequency results of both the Flexible and Base Routers.
– The increase in area is accepted, given the logic added for flexibility and the available FPGA resources.

AREA RESULTS OF XILINX FPGA (number of resources used)
FPGA resources        Base                  Flexible
                      BUF2   BUF4   BUF8    BUF2   BUF4   BUF8
LUTs                  657    776    836     1078   1111   1112
FFs                   425    430    440     473    474    493

FREQUENCY RESULTS OF XILINX FPGA
FPGA resources        Base                  Flexible
                      BUF2   BUF4   BUF8    BUF2   BUF4   BUF8
Max Frequency (MHz)   164    150    150     141    139    141

The maximum clock frequency decreases in the Flexible Router because of the flexibility units, but its gains in throughput and latency outweigh this impact.
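The relative cost of the Flexible Router on the FPGA can be derived directly from the two tables. The sketch below only does that arithmetic on the reported numbers; the helper name `overhead_percent` and the dictionary layout are illustrative.

```python
# Values copied from the FPGA area and frequency tables (keys are buffer sizes).
LUTS = {"base": {2: 657, 4: 776, 8: 836}, "flexible": {2: 1078, 4: 1111, 8: 1112}}
FFS = {"base": {2: 425, 4: 430, 8: 440}, "flexible": {2: 473, 4: 474, 8: 493}}
FMAX_MHZ = {"base": {2: 164, 4: 150, 8: 150}, "flexible": {2: 141, 4: 139, 8: 141}}


def overhead_percent(base, flexible):
    """Relative change of the Flexible Router with respect to the Base Router, in percent."""
    return 100.0 * (flexible - base) / base


for name, table in (("LUTs", LUTS), ("FFs", FFS), ("Fmax", FMAX_MHZ)):
    for size in (2, 4, 8):
        change = overhead_percent(table["base"][size], table["flexible"][size])
        print(f"{name:5s} BUF{size}: {change:+.1f} %")
```

Positive values mean the Flexible Router uses more of a resource; a negative Fmax value means a lower maximum clock frequency.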
Synthesis Results (2/2)
• Using the Cadence Encounter RTL Compiler tool and a 180nm standard cell library.
– The power dissipation and area overhead are obtained for each case at typical operating conditions for 180nm technology: 25 °C, 1.8 V, typical transistor model.
– Both dynamic and leakage power estimates were extracted from the synthesized router implementation, assuming a 50% uniform switching activity on all router input ports.

AREA AND POWER RESULTS FOR 180NM TECHNOLOGY
Configuration   Area in cells   Power in µW
                Cell area       Leakage   Switching
Base            557963.68       1.015     51421.04
Flexible        661936.7        1.15      56372.29
Overhead        18 %            13 %      9.6 %
Analysis
• Experimental results show that the Flexible Router
– increases the throughput
– reduces the latency
• At low injection rates, the Base and Flexible routers have nearly the same performance.
• At high injection rates, the Flexible Router performs better, because the flexibility property is then used.
• The Flexible Router has a higher saturation point than the Base Router.
• For UNI traffic, it allows a 15% higher injection rate, in addition to improved performance at higher rates.
• For HS and NN the improvement is small (a higher saturation point), especially for HS.
– For HS, this is because HS packets are injected faster than they can be collected and they occupy all the network buffer space.
– For NN, each injector sends packets only to its neighbors, so throughput grows linearly with the injection rate: all injected packets are served by the routers and no congestion occurs to affect the throughput.
Future Work
• Decrease the communication overhead due to the FFC.
• Support hot-spot traffic by modifying the FFC.
• Implement and evaluate the Flexible Router for:
– Virtual channels.
– Other switching techniques such as Virtual Cut-Through and Wormhole.
• Explore the Flexible Router to support 3-D Network on Chip.
• More real-world example implementations.
• Support for dynamically reconfigurable systems.
Published Papers
[1] Hossam El-Sayed, Mohammed Ragab, Mohammed S. Sayed, and Victor Goulart, “Hardware Implementation and Evaluation of the Flexible Router Architecture for NoCs,” 20th IEEE-ICECS International Conference on Electronics, Circuits, and Systems, UAE, Dec. 2013. (Accepted as Lecture).
[2] Hossam El-Sayed, Ahmed Shalaby, Mostafa Said, Mohammed S. Sayed, Mohammed Ragab, and Victor Goulart, “Performance Evaluation of Flexible Router Architecture for NoCs,” 24th International Conference on Field Programmable Logic and Applications, Munich, Germany, September 2 - 4, 2014. (Submitted).
Acknowledgment
Finally, I would like to thank all the people who helped me, especially:
Maher Abdelrasoul, Ahmed Shalaby, and Mostafa Said
Thank You