An NoC Architecture for Inductive Coupling Wireless Interconnect
H. Amano
Keio University
Special thanks to Prof. Kuroda and Prof. Matsutani
Outline: Wireless 3D NoC
• 3D IC technologies
– Wired approach vs. wireless approach
– Inductive-coupling technology
• Design Examples
– MuCCRA-Cube
– Cube-1
• Simple wireless 3D NoC
– Ring-based 3D network
– Bubble flow control
• CoC (Castle of Chips)
– Large scale system by wireless links
Design cost of LSI increasing…
• System-on-Chip (SoC)
– Required components are integrated on a single chip
– Different LSI must be developed for each application
• System-in-Package (SiP) or 3D IC
– Required components are stacked for each application
By changing the chips in a package, we can provide a wider chip family at a modest design cost
3D IC technology for going vertical
• Wired
– Microbump: two chips (face-to-face)
– Through silicon via: more than three chips
• Wireless
– Capacitive coupling: two chips (face-to-face)
– Inductive coupling: more than three chips
[Figure axes: scalability, flexibility]
Inductive coupling link for 3D ICs
• Stacking after chip fabrication
• Only known-good dies are selected
• More than 3 chips
• Bonding wires for power supply
• Inductor for transceiver: implemented as a square coil with metal in common CMOS
• Footprint of inductor: not a serious problem, since only metal layers are occupied
Inductive-coupling I/F: An Example
[Figure: block diagram of the interface. Labels: system clock, oscillator (local clock), digital Txdata/Rxdata (8 bits), Tx, Rx, tMUX, tDEMUX, tTx-Rx, data link (8ch), clock link, from upper chip]
• Phase control unit (PCU): generates the Rx enable and Tx enable signals based on the counter value
Prof. Kuroda’s recent projects
• Non-contact memory cards with wireless data/power supply [Chung2012]
• Digital Rosetta Stone [Yuan2010]
• Non-contact wafer-level testing [Radecki2012]
Today, I focus on our joint project for developing systems using wireless inductive coupling.
MuCCRA-Cube (2008)
• 4 MuCCRA chips are stacked on a PCB [Saito,FPL’09]
• Technology: 90nm, Chip thickness: 85um, Glue: 10um
[Figure: chip layout with 16 PEs, data memory, an inductive-coupling up link, and an inductive-coupling down link; dimension labels 5.0 mm and 2.5 mm]
MuCCRA-Cube using inductive coupling
• MuCCRA: a dynamically reconfigurable processor
• The number of MuCCRA chips stacked in a package can be changed
MuCCRA-Cube: Application mapping
[Figure: Chip0 and Chip1, each with a row of four memories (Mem) and a 4 x 4 PE array; the chips are labeled 0 and 180]
• Left side PEs have uplinks
• Right side PEs have downlinks
MuCCRA-Cube: Application mapping
[Figure: a data-flow graph mapped onto Chip0 and Chip1 (labeled 0 and 180). Inputs: Data1 to Data4, DataA, DataB. Operations: ADD, SUB, MULT, SHIFT, OR]
Cube-1 (2012)
• Wireless links are used as a packet switching network rather than static links
• A ring-based packet switching network is formed
• The number of accelerators can be changed
[Figure: a Geyser-Cube chip and CMA-Cube chips stacked and connected by inductive coupling]
Implementing JPEG decoder
• DMA transfer between accelerators
• The flow of JPEG decoder: JPEG image → Header analysis → Huffman decode → Inverse quantization → Inverse DCT → YUV to RGB convert → RGB image
• Mapping on Cube-1: Huffman decoder and task control; inverse quantization; inverse DCT; YUV to RGB convert
[Figure legend: MCU, image data]
Evaluation Results
• Execution of the JPEG decoder on Cube-1 Quad-Core
• Using three accelerators
• The target image block: 128 x 96 pixels
• 3.15 times speedup
[Figure: stacked bar chart of execution cycles (axis from 0 to 800000). x-axis labels: Cube-1 Quad-Core, No. Legend: other, convert yuv to rgb, store intermediate, inverse dct, inverse quantization, huffman decode, collect result, dma trans, processing, image data trans]
Chip stacking method: Slide & stack
• Inductor has TX/RX/Idle modes (1-cycle switch)
[Figure: slide & stack arrangement; each chip has bonding wires and inductors marked TX or RX]
Wireless 3D NoC
• Arbitrary chips are stacked to form a single system
– Each chip has vertical links at pre-specified locations, but we do not know the number and types of chips.
– An example (4 chips): a CPU chip from a CPU maker, a memory chip from a memory maker, and a GPU chip from a GPU maker
• Required chips are stacked for given applications
• A ring network is the simplest approach for adding, removing, and swapping nodes
Ring network: Deadlock problems
Ring is the simplest approach to add, remove, and swap the chips in a package without any modifications. But…
• Structure deadlock
– Ring network inherently includes a cycle
– Cyclic dependency causes packet deadlocks
• Protocol deadlock
– Coherence protocol has multiple message classes
– Request-reply deadlocks
Deadlock-free packet transfer is mandatory for NoCs
Ring network: VC-based approach
• VC-based approach
– Two VCs for each message class
– Packets move from one VC to the other at the dateline
• Merit
– Conventional VC router can be used
• Demerit
– The number of VCs grows with the number of message classes
– 6 VCs for 3 classes
Cyclic dependency is cut by the VC transition at the dateline
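The dateline scheme can be illustrated with a minimal sketch. It assumes a unidirectional ring whose dateline sits on the link from the last router back to router 0, and it numbers the VCs as two consecutive indices per message class; the function name `ring_route` and this numbering are illustrative choices, not taken from the slides.

```python
NUM_CLASSES = 3                 # message classes (the slide's "3 classes" case)
VCS_PER_CLASS = 2               # one VC before the dateline, one after it
TOTAL_VCS = NUM_CLASSES * VCS_PER_CLASS


def ring_route(src, dst, msg_class, num_routers):
    """Return the hops (from_router, to_router, vc) of a packet on the ring.

    Assumption: the dateline is the link (num_routers - 1) -> 0. A packet
    uses the lower VC of its class until it crosses that link, then the
    upper VC, which breaks the cyclic buffer dependency.
    """
    hops = []
    crossed = False
    cur = src
    while cur != dst:
        nxt = (cur + 1) % num_routers
        if nxt == 0:                       # this hop crosses the dateline
            crossed = True
        vc = msg_class * VCS_PER_CLASS + int(crossed)
        hops.append((cur, nxt, vc))
        cur = nxt
    return hops


if __name__ == "__main__":
    print(TOTAL_VCS)
    print(ring_route(6, 1, msg_class=2, num_routers=8))
```

Because each class needs its own pair of VCs, the VC count in this sketch is always twice the number of message classes, which is the cost listed under the demerits.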
Ring network: Bubble flow approach
• Bubble flow approach
– A single buffer (single VC) can store more than 2 packets
– Buffer space for a single packet is always reserved in each router
• Merit
– No VC; simple flow control
• Demerit
– Misrouting when packets cannot exit the ring
– Scalability problem
Deadlock does not occur because the flow control never allows all buffers to become fully occupied
[Puente,ICPP’99] [Abad,ISCA’07]
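A small sketch of the bubble condition may help. It assumes the usual formulation of bubble flow control: a packet entering the ring needs two free packet slots in the target buffer, while a packet already in the ring needs only one. The class and function names are hypothetical, and the capacity of 3 packets per buffer is an assumption chosen to match a 15-flit buffer holding 5-flit packets.

```python
from collections import deque


class RingBuffer:
    """Input buffer of one ring router, measured in whole packets."""

    def __init__(self, capacity_packets):
        self.capacity = capacity_packets
        self.queue = deque()

    def free_slots(self):
        return self.capacity - len(self.queue)


def can_accept(buf, entering_ring):
    # Injection must leave one slot (the "bubble") free; in-ring moves need one slot.
    needed = 2 if entering_ring else 1
    return buf.free_slots() >= needed


def try_inject(ring, node, pkt):
    """Inject a new packet at `node` if the bubble condition holds."""
    if can_accept(ring[node], entering_ring=True):
        ring[node].queue.append(pkt)
        return True
    return False


def try_advance(ring, node):
    """Move the head packet at `node` to the next router on the ring."""
    src, dst = ring[node], ring[(node + 1) % len(ring)]
    if src.queue and can_accept(dst, entering_ring=False):
        dst.queue.append(src.queue.popleft())
        return True
    return False


if __name__ == "__main__":
    ring = [RingBuffer(3) for _ in range(4)]
    accepted = sum(try_inject(ring, n, (n, k)) for k in range(5) for n in range(4))
    print("accepted injections:", accepted)
    print("free slots left:", sum(b.free_slots() for b in ring))
```

Since injection never takes the last slot of a buffer, at least one free slot always remains somewhere on the ring, so some in-ring packet can always advance.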
Evaluations: Simulation environments
• Two network sizes are simulated by GEMS/Simics: 4 chips (4-CPU) and 8 chips (8-CPU)
[Figure: chip stacks for the 4-chip and 8-chip configurations; each chip has a CPU and L2$ banks]

Table 1: Architectural parameters (4-chip / 8-chip)
# of chips        4 / 8
# of CPUs         4 / 8
# of routers      8 / 16
# of L2$ banks    16 / 32
Packet sizes      1 or 5 flits

Table 2: Software environments
OS             Sun Solaris 9
Compiler       Sun Studio 12
Application    NAS Parallel Bench (OpenMP ver): BT, CG, DC, EP, FT, IS, LU, MG, SP, UA (Total 10)
For more details, refer to the paper
• Three communication schemes are compared: Ring + VC flow (2VC with a dateline), Ring + Bubble flow, and Vertical bus
Results: Network throughput @ 4 chips
• RTL simulations of wireless 3D NoC model (8 routers)
• Bubble outperforms 2VC (15-flit) and is comparable to 2VC (30-flit)
[Figure: throughput plots for Vertical bus, Ring + VC flow (2VC, 15-flit), and Ring + Bubble (15-flit)]
Results: Network throughput @ 8 chips
• RTL simulations of wireless 3D NoC model (16 routers)
• Bubble outperforms 2VC (15-flit) and is comparable to 2VC (30-flit)
[Figure: throughput plots for Vertical bus, Ring + VC flow (2VC, 15-flit), and Ring + Bubble (15-flit)]
Results: Application performance @ 4 chips
• Execution times of NAS parallel bench (4 CPUs)
• Bubble approach outperforms the VC-based one by 12.5% @ 4 chips
[Figure: execution times for Vertical bus, Ring + VC flow (6VC, 30-flit), Ring + VC flow (6VC, 18-flit), and Ring + Bubble (15-flit), annotated with -12.5%]
The limitation of stacking
[Figure: slide & stack arrangement with bonding wires; inductors marked TX and RX]
Castle of Chips (CoC)
• Chips with multiple wireless ports are used as bridges between stacks.
• The stacking can be extended in the horizontal direction.
→ A large number of chips can be connected using only wireless inductive coupling links.
• However, power supply still requires bonding wires with the current state of the art.
Examples of Wireless Coupling Links
[Figure: a) Uni-directional links (separate transmitter and receiver); b) Bi-directional links]
Linear Stacking: The simplest CoC
[Figure: chips in Layer 0 and Layer 1 connected by up links and down links]
Stacking using bi-directional links
[Figure: the case of using bi-directional links, with chips in Layer 0 to Layer 3]
Circular Stacking
The central space is used for power supply bonding wires.
The network consisting of CoC
• Tightly coupled interconnection is assumed between links in the same chip.
– Bus, crossbar, direct links, etc.
– Here, a chip with 4 links = a node with 4 links
[Figure: a) Chip stacking with up links and down links; b) Corresponding network]
[Figure: network formed by linear stacking. Nodes are labeled with (x, y) coordinates, e.g. (0,0), (1,1), ..., (5,5) along one boundary, and are arranged in Level 0 to Level 4. Width: n*2+1, Height: m*2+1]
[Figure: interconnection network formed with the circular stacking, with corner nodes (0,0), (10,0), (5,-5), and (5,5)]
Stairway Boundary Mesh (SBM)
[Figure: three SBMs with their corner coordinates]
– m=1, n=5: (0,0), (1,-1), (5,5), (6,4)
– m=2, n=5: (0,0), (2,-2), (5,5), (7,3)
– m=3, n=5: (0,0), (3,-3), (5,5), (8,2)
Extension of Dimension Order Routing
• Go in the X direction first.
• On the boundary, go around it.
• When X is the same as that of the destination, go in the Y direction.
[Figure: routing examples on the SBM with nodes labeled by (x, y) coordinates, compared with the original DOR]
DOR vs. Extended DOR
• The hop counts are the same as those of the original DOR.
• If n > m, diameter = 2n (independent of m)
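The hop-count claim can be checked with a minimal sketch of one reading of the extended routing rule. Two assumptions are made here and are not stated on the slides: the SBM node set is taken to be the integer points with 0 <= x+y <= 2n and 0 <= x-y <= 2m (consistent with the corner coordinates shown for n=5), with m, n >= 1, and "going around the boundary" is interpreted as taking one Y step toward the destination whenever the next X step would leave the mesh. The function names are illustrative. Deadlock freedom is not modeled.

```python
def in_sbm(x, y, m, n):
    """Assumed SBM node set: 0 <= x+y <= 2n and 0 <= x-y <= 2m."""
    return 0 <= x + y <= 2 * n and 0 <= x - y <= 2 * m


def sign(v):
    return (v > 0) - (v < 0)


def extended_dor(src, dst, m, n):
    """Return the list of nodes visited from src to dst (both included)."""
    if not (in_sbm(*src, m, n) and in_sbm(*dst, m, n)):
        raise ValueError("source and destination must be SBM nodes")
    x, y = src
    path = [(x, y)]
    while (x, y) != tuple(dst):
        dx, dy = sign(dst[0] - x), sign(dst[1] - y)
        if dx != 0 and in_sbm(x + dx, y, m, n):
            x += dx      # ordinary DOR step in the X direction
        else:
            y += dy      # X is done, or the stairway boundary blocks the X step
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = extended_dor((0, 0), (8, 2), m=3, n=5)
    print(route)
    print("hops:", len(route) - 1)
```

In this reading every step moves one hop closer to the destination in either X or Y, so the path length equals the Manhattan distance, as with the original DOR on a rectangular mesh.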
Explanation of deadlock avoidance by Turn model
[Figure: possible turns and forbidden turns (marked X) in the x-y plane]
The number of stacked chips and diameter
m          Height of stacking   n=3   n=4   n=5   n=6   n=7
2          5                    18    23    28    33    38
3          7                    25    32    39    46    53
4          9                    32    41    50    59    68
Diameter                        6     8     10    12    14
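The table entries can be reproduced with a short sketch, under the assumption (not stated on the slides) that the SBM consists of the integer points with 0 <= x+y <= 2n and 0 <= x-y <= 2m, linked to their four mesh neighbours. The closed form (n+1)(m+1) + nm in the comment follows from that assumption; the function names are illustrative.

```python
from collections import deque


def sbm_nodes(m, n):
    """Assumed SBM node set; its size is (n + 1) * (m + 1) + n * m."""
    return [(x, y)
            for x in range(0, m + n + 1)
            for y in range(-m, n + 1)
            if 0 <= x + y <= 2 * n and 0 <= x - y <= 2 * m]


def sbm_diameter(m, n):
    """Longest shortest path, found by breadth-first search from every node."""
    nodes = set(sbm_nodes(m, n))
    best = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in nodes and nb not in dist:
                    dist[nb] = dist[(x, y)] + 1
                    queue.append(nb)
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, 2 * m + 1, [len(sbm_nodes(m, n)) for n in range(3, 8)])
    print("diameter", [sbm_diameter(2, n) for n in range(3, 8)])
```

The first loop prints m, the stacking height 2m+1, and the chip counts for n = 3 to 7; the last line prints the diameters for m = 2, where n > m holds for every column.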
Average distance vs. the number of chips
Better than those of rectangular mesh
Circular Stacking is not so good because of the central space.
Summary
• Wireless 3D interconnect techniques will broaden the possibilities of system integration.
• Wireless power supply is coming into sight.
• Research on CoC (Castle of Chips) has just started.
– There are many possible structures, especially extensions of circular stacking.