Upload
others
View
15
Download
0
Embed Size (px)
Citation preview
Interleaved Architectures for High-ThroughputSynthesizable Synchronization FIFOsAmeer Abdelhadi Mark GreenstreetMcGill University University of British Columbia
[email protected] [email protected]
May 22, 2017
Synchronization FIFOsThe Interleaved FIFO ArchitectureDesign FlowHazardsResults
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 1 / 17
Multi-Synchronous Designs
...
core core
core core
L2
$
NOC
DSP crypto
DDR4 PCIe 3.0 h.264
Multi-Synchronous Design
Typical chip designs consist ofI A large number of synchronous, synthesized blocks –
this is the easy part!I There are many different clocks.I Designers need easy to use, synthesizable interfaces.
The big challenges are ones of design integration.I This is where asynchronous methods excel.
FIFOs are a commonly used for these interfaces
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 2 / 17
Prior Work: Gray-Code FIFOs
get_datar_data
r_addr
sync sync Q
compare
space_avail w_en
w_data
r_en
get_req
data_valid
w_addr
Dual−Port SRAM
gray
counter
en
controlread
gray2bin
gray2bin
gray
counter
en
controlwrite
gray2bin
gray2binQ
compare
put_data
put_req
Legend:
code
clocked by put_clk clocked by get_clk
code
critical path (for cycle time)
+ Textbook solution – familiar to many designs.− Gray-to-binary conversion a bottleneck.• See [Cummings+Alfke2002].
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 3 / 17
Prior Work: One-Hot FIFOs
sync
gf ge
sync
empty full
stage 1
stage N−1..
.
... get
con
trol
pu
t co
ntr
ol Q
D D enQ
en
Note: data-store not shown.
+ Simple, fast control – very modular design.− Per-stage synchronizers and control tends to dominate area and
power.• See [Chelcea+Nowick2004,Ono+Greenstreet2009].
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 4 / 17
Contributions
A novel, interleaved FIFO architectureI Area and power efficientI High throughput, minimal latency.
Synthesizable designI Open-source Verilog
2 Supports standard, SynopsisTM design flow.2 You can use it in your designs!
I Highly parameterized – synthesize the FIFO you want.I Provides benchmark design for comparisons for further
research.Identify a glitch hazard in synchronization FIFOsI Many published designs have this hazard.I We present our solution.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 5 / 17
FIFO Architecture
clk_get
DQ
sync
sync
getcontrol
DQ
N
putcontrol
Nv
Nv
D Q
a few gates
wQD
a few gates
w
a few gates
spaceav D Q
data_out
dl_oe
N
data_in
dl_we
data store
D Qreq_put
data_in
clk_put
data_out
req_get
datav
Similar to Gray-Code FIFOsI Separate write and read pointers in put and get interfaces.I Our design uses a pair of one-hot counters in each interface.
2 Avoids conversions to/from Gray code⇒ simple and fast2 Has a small number of synchronizers between put and get⇒ area and power efficient.
Simple interface to sender and receiverI Looks like flip-flop (in each domain) with standard flow control.I Minimal exposure of internal timing details.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 6 / 17
A Thermometer Counter
clk
enableD Qen
q[2]
D Qen
q[3]q[1]
D Qen
D Qen
q[0]
Initialized to all zeros.Fills left-to-right with ones, then fills with zeros, then with ones, . . .Current position marked by state with transition (or 0 if all stagesare the same).
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 7 / 17
Dual Thermometer Counters
qv[0]
qv[1]
qv[2]
qv[3]
D Qen
qh[2]
D Qen
qh[3]
D Qen
D Qen
qh[0]
DQ
en
DQ
en
DQ
en
DQ
en
qh[1]
clk
enable
Increment the horizontal counter witheach wrap-around of the vertical counter.Sequence length is the product of thosefor the two counters.Avoids the O(N ) control-complexity thatis the bane of one-hot designs.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 8 / 17
Decode to produce full-thermometer code
0
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableD Qen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
1
0 0 0
0 0 0
00 0 0
0
00 0 0
0
00 0 0
0
0
1
0
0
a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableD Qen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
a1a0a0a1 a1a0a0
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
0
0
00 0 0
0
0 0 0
00 0
0
0
0
1
00 0 0
1
1 0 0
0
a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableD Qen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
a1a0a0a1 a1a0a0
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
0
00 0
0
0
0
1
01 0 0
1
1 0 0
00 0 0
00 0 0
0
1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableDen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
Q
a1a0a0a1 a1a0a0a1
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
0
00 0
0
0
0
1
01 0 0
1
1 0 0
01 0 0
00 0 0
1
1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableDen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
Q
a1a0a0a1 a1a0a0a1
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
1
00
0
1
0
1
01 0 0
1
1 0 0
01 0 0
00 0
1
1
1
0
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableDen
D Qen
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
Q
a1a0a0a1 a1a0a0a1
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Decode to produce full-thermometer code
1
0
1
0
1
01 0 0
0
1 1 0
01 0 0
00 0
1
1
0
1
0 0
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
a1a0a0a1 a1a0a0a1
clk
enableD Qen
Den
D Qen
D Qen
DQ
en
DQ
en
DQ
en
DQ
en
Q
a1a0a0a1 a1a0a0a1
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 9 / 17
Detecting Data Availability
clk_getneeds to be synchronized to
full[2,*]full[0,*] full[1,*] full[3,*]
full[i,j]
[i,j]therm_put
[i,j]therm_get
data_available
A stage holds valid data if the value of the get-thermometer differs fromthe value of the put-thermometer.An OR-tree can determine if any stage holds valid data.But we need to synchronize.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 10 / 17
Synchronization with Interleaving
D
sync
Q
D
sync
Q
D
sync
Q
D
sync
Qclk_get
full[2,*]full[0,*] full[1,*] full[3,*]
data_available
One synchronizer per row.Nvertical ≥ Latencysync required for full throughput.Can use fewer synchronizers than Gray-Code designs – if the synchro-nizer latency is less than the number of address bits.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 11 / 17
Synchronization: Achieving Full Throughput
data_available
!ohH_get[*]
full[0,*]
b
clk_get
D Qclr
sync
D Qclr
D Qclr
D Qclr
do_geteven
odd
d
c
e
...
...
...
...
...
...
Nh
Nh
ohV[i]
do_get a
data available is registered, we need to know if there will be dataavailable after the next clk get↑.I If any row has a valid value and there isn’t a req get on this cycle,
then data will be available on the next cycle.I If two consecutive rows have valid data (one even, the other odd),
then data will be available on the next cycle.We clear the synchronizer after removing a data word from that row.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 12 / 17
Synchronization: Achieving Full Throughput
data_available
!ohH_get[*]
full[0,*]
b
clk_get
D Qclr
sync
D Qclr
D Qclr
D Qclr
do_geteven
odd
d
c
e
...
...
...
...
...
...
Nh
Nh
ohV[i]
do_get a
data available is registered, we need to know if there will be dataavailable after the next clk get↑.We clear the synchronizer after removing a data word from that row.I If there is data available in another column, we allow the first stage
of the synchronizer to record that on clk get↑
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 12 / 17
The Data Path
w data_out
datav
req_get
w
w
put control
get control
D Qen
wdata_in
space_av
clk_put
req_put
clk_get
data_even
data_odd
sel_odd
en
oeD Q
en
oeD Q
en
oeD Q
D Q
D Q
sel_even
w
w
w
w
w
Input latch gives FIFO the same set-up and hold timing as a flip-flop.Output path uses even/odd interleaving:I We found that the data output path could not maintain GHz clock fre-
quencies.I Key observation: data is available on the output of the data latches
for nearly the full synchronizer latency. This ensures data validity.I Alternating paths provides an extra clock-cycle for the path to settle,
and the timing requirements are easily satisfied.The output logic is to block glitches (see next slide).
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 13 / 17
FIFOs and CDC Hazards
...
g
state
local
state
local
data_out
FIFO
data_valid
get_req
f
a1
a2
D Q...
control
What the designer intended
A designer might (reasonably) assume thatassign foo = data valid ? f(data out, . . .) : g(. . .);
is safe.
BUT, synthesis can introduce glitches the designer never imagined.This hazard is present in many published clock-domain-crossing (CDC) FIFOs.Our design ensures data out is all 0s when no valid data is available.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 14 / 17
FIFOs and CDC Hazards
f, g, & control
merged and
optimized by
synthesis
...data_out
data_valid
get_req
D Q
...
statelocal
What synthesis can do
A designer might (reasonably) assume thatassign foo = data valid ? f(data out, . . .) : g(. . .);
is safe.BUT, synthesis can introduce glitches the designer never imagined.
This hazard is present in many published clock-domain-crossing (CDC) FIFOs.Our design ensures data out is all 0s when no valid data is available.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 14 / 17
FIFOs and CDC Hazards
1
...
state
local
0
1
1
...
...
......data_out
data_valid
get_req
D Q
0
01
...
What synthesis can do
A designer might (reasonably) assume thatassign foo = data valid ? f(data out, . . .) : g(. . .);
is safe.BUT, synthesis can introduce glitches the designer never imagined.
This hazard is present in many published clock-domain-crossing (CDC) FIFOs.Our design ensures data out is all 0s when no valid data is available.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 14 / 17
FIFOs and CDC Hazards
...
state
local
...
......
...
data_out
data_valid
get_req
D Q
0
111
1
1...
0
What synthesis can do
A designer might (reasonably) assume thatassign foo = data valid ? f(data out, . . .) : g(. . .);
is safe.BUT, synthesis can introduce glitches the designer never imagined.
This hazard is present in many published clock-domain-crossing (CDC) FIFOs.Our design ensures data out is all 0s when no valid data is available.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 14 / 17
FIFOs and CDC Hazards
...
state
local
...
......
...
data_out
data_valid
get_req
D Q
0
X1
X
X X... X
What synthesis can do
A designer might (reasonably) assume thatassign foo = data valid ? f(data out, . . .) : g(. . .);
is safe.BUT, synthesis can introduce glitches the designer never imagined.This hazard is present in many published clock-domain-crossing (CDC) FIFOs.Our design ensures data out is all 0s when no valid data is available.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 14 / 17
Design Flow
Design Generator & Run-in-Batch Manager (shell/tcl/perl scripts)
Interleaved FIFO VERILOG Modules
Gates Synthesis(Synopsys Design Compiler)
Pre-layout gates netlist
Place & Route(Synopsys IC Compiler)
Back-annotated Gate-level-simulation (GLS)
(Synopsys VCS simulator)
Timing
Power Estimates(Synopsys PrimeTime)
CAD Tools Setup
Cell-based PDK
User s Design Requirements:Number of vertical/horizontal stages, data width,
synchronizer depth
Design Constraints
Verilog Testbench
RC ExtractionPost-layout gates netlist
Area/cell#/wire length
Performance /Latency
Equiv-alence
Delays(sdf)
Nodes Activity
(VCD)
Switching/Dynamic/Leakage Power Dissipation
Static Timing (STA)(Synopsys PrimeTime)
Complete SynopsisTM design flowI Includes synthesis, place-and-
route, back-annotation, simula-tion, performance analysis, andpower estimation
Highly parameterizedI FIFO word-size, depth, and in-
terleaving specified by user.I Number of synchronizer stagesI Target clock frequency
Open sourceI The FIFO itself is about 200
lines of VerilogI The test bench is about 300
lines of VerilogI Plus about 2400 lines for 10
scripts.I You can use it, you can build on
it!I Get it from Github – details on
the last slide.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 15 / 17
PerformanceN Nh Nv w S freq power area8 4 2 8 2 1300 1.2mw 1745µ2
16 4 4 8 3 1400 1.3mw 2618µ2
16 8 2 8 3 1500 1.9mw 3268µ2
32 8 4 8 4 1200 1.8mw 4875µ2
8 4 2 32 2 1400 2.2mw 2979µ2
16 4 4 32 2 1200 2.0mw 6725µ2
32 8 4 32 4 1100 2.9mw 12712µ2
A few examples from 144 test cases.
Frequency dominated by “tree structures”. Clock period is roughlylogarithmic in Nh, Nv , and w .Power is dominated by the control circuitry and is roughly linear inthe number of control flip-flops.Area is dominated by data storage and is roughly linear in thenumber of data latches.Results using a 65nm commercial library.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 16 / 17
ConclusionsWe have presented a fully synthesizable, general purpose,clock-domain crossing FIFO.I open source: https://ghithub.com/AmerAbdelhadi/
Interleaved-Synthesizable-Synchronization-FIFOsI Supports complete Synopsis ASIC design flow.I Highly configurable.I Thoroughly evaluated for a wide range of configuration
parameters.Novel interleaved FIFO architectureI avoids the Gray-code conversion bottlenecks of Gray-code
FIFOs.I smaller control logic and far fewer synchronizers than one-hot
FIFOs.Identified a glitch hazard that is common in published designs.I and our design avoids it.
Thanks: Tarik Ono, Brad Quinton, NSERC Canada.
Abdelhadi & Greenstreet Interleaved Synthesizable FIFOs ASYNC – May 22, 2017 17 / 17