Selective Flexibility: Breaking the Rigidity of Datapath Merging
Mirjana Stojilović, Institute Mihailo Pupin, University of Belgrade
David Novo, École Polytechnique Fédérale de Lausanne (EPFL)
Lazar Saranovac, School of Electrical Engineering, University of Belgrade
Philip Brisk, University of California Riverside
Paolo Ienne, École Polytechnique Fédérale de Lausanne (EPFL)
Zuluaga and Topham, TCAD 2009
The Rigidity of Datapath Merging
Brisk, Kaplan, and Sarrafzadeh, DAC 2004
Datapath merging generates a single reconfigurable datapath from a set of input data-flow graphs (DFGs),
reusing resources among the DFGs to save area.
Motivation
• Improve the efficiency through specialization
• Area savings by merging datapaths
• But what about flexibility?
We want to fill this gap!
Selective Flexibility
flexibility = ability to capture and implement computational structures that are characteristic of a specific application domain
selective = the computational structures are characterized, and thus restricted, by:
(1) type of operations, (2) their number, and (3) their interconnections
Path Fusion
Creating a SUPERPATH – the (minimum area) super-sequence of all sequences
of operators found in input DFGs
Path Fusion
STEP 1: Enumerate all paths from the inputs to the outputs of each DFG.
A path in a graph is a sequence of vertices such that from each of its vertices there is an edge to the next vertex in the sequence.
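The path enumeration of STEP 1 can be sketched as a depth-first walk over a DAG. This is only an illustrative sketch: the adjacency-dictionary representation, the function name `enumerate_paths`, and the tiny example DFG are assumptions, not taken from the slides.

```python
from typing import Dict, List


def enumerate_paths(succ: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from a source (no predecessor) to a sink (no successor)."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    has_pred = {v for vs in succ.values() for v in vs}
    sources = sorted(n for n in nodes if n not in has_pred)
    paths: List[List[str]] = []

    def walk(node: str, prefix: List[str]) -> None:
        prefix = prefix + [node]
        successors = succ.get(node, [])
        if not successors:          # reached an output: one complete path
            paths.append(prefix)
            return
        for nxt in successors:
            walk(nxt, prefix)

    for src in sources:
        walk(src, [])
    return paths


# Hypothetical DFG: inputs a and b feed a multiplier, whose result is added to input c.
dfg = {"a": ["mul1"], "b": ["mul1"], "c": ["add1"], "mul1": ["add1"], "add1": []}
paths = enumerate_paths(dfg)
print(paths)
```

The number of paths can grow exponentially with graph depth, so a sketch like this is only practical for small DFGs.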
A subsequence is a sequence that can be derived from another sequence by deleting some elements
without changing the order of the remaining elements.
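As a small illustration of the subsequence definition, the check below tests whether one operator sequence can be obtained from another by deletions only. The function name and the example sequences are hypothetical.

```python
from typing import Sequence


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if `short` can be derived from `long` by deleting elements only."""
    it = iter(long)
    # `op in it` advances the iterator, so the relative order is enforced.
    return all(op in it for op in short)


print(is_subsequence(["MUL", "ADD"], ["MUL", "SUB", "ADD"]))
print(is_subsequence(["ADD", "MUL"], ["MUL", "SUB", "ADD"]))
```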
Path Fusion
STEP 3: Perform a greedy search for the maximum-area common subsequence (MACSeq)
STEP 4: Fuse the pair of paths (sequence alignment by Needleman/Wunsch)
REPEAT steps 3-4 UNTIL a single path is left in the set
Assumption: MUL > SUB > ADD
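A minimal sketch of fusing one pair of paths follows. It assumes that the alignment rewards only exact operator matches, weighted by operator area, which reduces the Needleman/Wunsch alignment to a weighted longest-common-subsequence dynamic program. The area values in `AREA` are made up and merely respect MUL > SUB > ADD; the greedy choice of which pair to fuse next is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical relative areas, chosen only to satisfy MUL > SUB > ADD.
AREA: Dict[str, int] = {"MUL": 3, "SUB": 2, "ADD": 1}


def fuse(p: Sequence[str], q: Sequence[str],
         area: Dict[str, int] = AREA) -> Tuple[List[str], int]:
    """Return (fused path, area of the maximum-area common subsequence)."""
    n, m = len(p), len(q)
    # best[i][j]: largest shared area between the suffixes p[i:] and q[j:].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best[i][j] = max(best[i + 1][j], best[i][j + 1])
            if p[i] == q[j]:
                best[i][j] = max(best[i][j], area[p[i]] + best[i + 1][j + 1])

    # Trace back: shared operators are emitted once, all others are interleaved.
    fused: List[str] = []
    i = j = 0
    while i < n and j < m:
        if p[i] == q[j] and best[i][j] == area[p[i]] + best[i + 1][j + 1]:
            fused.append(p[i])
            i += 1
            j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            fused.append(p[i])
            i += 1
        else:
            fused.append(q[j])
            j += 1
    fused.extend(p[i:])
    fused.extend(q[j:])
    return fused, best[0][0]


fused, shared = fuse(["MUL", "ADD", "SUB"], ["ADD", "MUL", "SUB"])
print(fused, shared)
```

In this example the two paths share MUL and SUB (area 5) rather than ADD and SUB (area 3), so the expensive multiplier is instantiated only once in the fused path.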
Brisk, Kaplan, and Sarrafzadeh, DAC 2004
Path Fusion
STEP 5: Proceed by moving the path to the set with shorter paths
REPEAT steps 3-5 UNTIL a single path is left: THE SUPERPATH
Array Generation
Superpath replication to create regular array of operators. How many columns?
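The replication itself is simple to picture. In the sketch below each superpath position becomes one row of identical operators; the column count is left as a free parameter because the slide poses it as an open question. The function name `make_array` and the example superpath are assumptions.

```python
from typing import List, Sequence


def make_array(superpath: Sequence[str], columns: int) -> List[List[str]]:
    """Replicate the superpath side by side: row r holds `columns` copies of superpath[r]."""
    if columns < 1:
        raise ValueError("the array needs at least one column")
    return [[op] * columns for op in superpath]


array = make_array(["MUL", "ADD", "SUB"], columns=4)
for row in array:
    print(" ".join(row))
```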
Interconnect Dimensioning
Adding FPGA-like interconnections:
two I/O ports per column, horizontal and vertical channels
Interconnect Dimensioning
• To decide on the number of word-size tracks we do P&R
• Placement:
  • Assign nodes to rows (top-down)
  • When assigning to columns:
    • keep distances between nodes short
    • emphasize graph regularity
    • emphasize symmetry
dot, tool for laying out hierarchical drawings of directed graphs
Yoon, Shrivastava, Park, Ahn, Jeyapaul, and Paek, ASP-DAC 2008
Cong and Jiang, FPGA 2008
Interconnect Dimensioning
Top-down greedy placement approach: place each node in the first row that holds the correct operator and lies below the rows of all its predecessor nodes.
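The row assignment described above can be sketched as follows. The data layout (a `preds` dictionary, an `ops` dictionary, and rows indexed from the top of the superpath) is an assumption made for illustration, and the binary-tree exception is not handled.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Sequence


def place_rows(superpath: Sequence[str], ops: Dict[str, str],
               preds: Dict[str, List[str]]) -> Dict[str, int]:
    """Greedy top-down row assignment.

    ops maps a node to its operator type; preds maps a node to its predecessors.
    """
    row_of: Dict[str, int] = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        first_allowed = 1 + max((row_of[p] for p in preds.get(node, [])), default=-1)
        for row in range(first_allowed, len(superpath)):
            if superpath[row] == ops[node]:
                row_of[node] = row
                break
        else:
            raise ValueError(f"no suitable row left for node {node!r}")
    return row_of


# Hypothetical superpath and DFG: two multiplications feed an addition,
# whose result feeds a second addition.
superpath = ["MUL", "ADD", "SUB", "ADD"]
ops = {"m1": "MUL", "m2": "MUL", "a1": "ADD", "a2": "ADD"}
preds = {"m1": [], "m2": [], "a1": ["m1", "m2"], "a2": ["a1"]}
rows = place_rows(superpath, ops, preds)
print(rows)
```

In this example the SUB row stays empty, which is the kind of row that could be dropped after placement if no DFG in the set ever uses it.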
Interconnect Dimensioning
Placement exception: if a node is part of a binary tree, first minimize the tree height and then place the node as early as possible.
Rows never used are potentially removed after placement to conserve area!
Interconnect Dimensioning
dot is forced to place nodes having the same rank within the same row.
dot outputs:
• Vertical coordinates of nodes
• Horizontal coordinates of nodes
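One way to force dot to keep nodes of the same rank in one row is to wrap them in a `rank=same` subgraph. The generator below is a sketch under that assumption; the slides do not show the actual dot input used, and the node names and row assignment are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def to_dot(edges: Sequence[Tuple[str, str]], row_of: Dict[str, int]) -> str:
    """Emit a DOT digraph in which nodes sharing a row are grouped with rank=same."""
    by_row: Dict[int, List[str]] = {}
    for node, row in row_of.items():
        by_row.setdefault(row, []).append(node)

    lines = ["digraph dfg {"]
    for row in sorted(by_row):
        members = "; ".join(f'"{n}"' for n in sorted(by_row[row]))
        lines.append(f"  {{ rank=same; {members}; }}")
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(
    edges=[("m1", "a1"), ("m2", "a1"), ("a1", "a2")],
    row_of={"m1": 0, "m2": 0, "a1": 1, "a2": 3},
)
print(dot_text)
```

Feeding the generated text to `dot -Tplain` prints one `node` line per vertex with its x and y position, from which the horizontal and vertical coordinates can be read.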
Interconnect Dimensioning
• To decide on the number of word-size tracks we do P&R
• Placement defined by dot
VPR, an FPGA architectural simulator and tool for P&R
• FPGA-like routing:
  • horizontal and vertical routing channels
  • two-IN one-OUT operators
  • two IN/OUT ports per column
  • word-size tracks (constant bitwidth)
Betz and Rose, FPGA 2000
Ye and Rose, Transactions on VLSI systems, 2006
Interconnect Dimensioning
Inputs for VPR:
• DFG Netlist
• DFG Placement (dot)
• Architectural description
VPR does the routing and reports the minimum channel width needed to achieve a legal routing
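The idea of searching for a minimum channel width can be illustrated with a generic binary search. This is not VPR's interface or its exact search strategy: `can_route` is a hypothetical stand-in for a complete routing attempt at a given width, and the sketch assumes that routability is monotone (if a width routes, every larger width routes too).

```python
from typing import Callable


def min_channel_width(can_route: Callable[[int], bool], upper: int) -> int:
    """Smallest width in [1, upper] for which can_route(width) is True."""
    if not can_route(upper):
        raise ValueError("not routable even at the upper bound")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if can_route(mid):
            hi = mid          # mid works, so the answer is mid or smaller
        else:
            lo = mid + 1      # mid fails, so the answer is larger
    return lo


# Toy oracle: pretend the design becomes routable at 5 tracks per channel.
width = min_channel_width(lambda w: w >= 5, upper=64)
print(width)
```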
Experimental Results
1. Measuring area/delay with respect to ASIC and FPGA
2. Measuring generality
Where do our domain-specific datapaths fit?
Experimental Results
• 19 DFGs covering various classical signal and image processing computations
(FFT, FFTr4, DCT, IDCT, FIR, IIR, autocorrelation, sobel, complex dot product, …)
• DFGs extracted from applications available in EEMBC, TMS320C64x DSP library, TMS320C64x Image/Video processing library, and
ExpressDFG
• Loop unrolling with different factors
• Groupings: GP1 contains all DFGs, while GP2x, GP3x and GP4x regroup DFGs into different and increasingly smaller clusters
Generality
GENERALITY: the ratio of the number of successfully mapped excluded DFGs to the total number of DFGs in the group
For all DFGs in the group
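The metric can be sketched as a leave-one-out loop, assuming (from the wording "excluded DFGs" and the later reference to a "learning set") that each DFG is excluded in turn, a datapath is built from the remaining DFGs, and the excluded DFG is then mapped onto it. `build_datapath` and `can_map` are hypothetical callbacks; the toy versions below reduce a DFG to its set of operator types and bear no relation to the real flow.

```python
from typing import Callable, List, Sequence, TypeVar

D = TypeVar("D")  # a DFG, in whatever representation the caller uses
P = TypeVar("P")  # a datapath


def generality(group: Sequence[D],
               build_datapath: Callable[[List[D]], P],
               can_map: Callable[[D, P], bool]) -> float:
    """Percentage of DFGs that map onto a datapath built without them."""
    mapped = 0
    for excluded in group:
        learning_set = [d for d in group if d is not excluded]
        datapath = build_datapath(learning_set)
        if can_map(excluded, datapath):
            mapped += 1
    return 100.0 * mapped / len(group)


# Toy stand-ins: a "datapath" is the union of operator types it has seen.
toy_group = [{"ADD"}, {"ADD", "MUL"}, {"MUL"}, {"SUB"}]
score = generality(
    toy_group,
    build_datapath=lambda dfgs: set().union(*dfgs),
    can_map=lambda dfg, datapath: dfg <= datapath,
)
print(score)
```

Here three of the four toy DFGs map onto the datapath built from the others; the one using SUB does not, because no other DFG in the group contributes that operator.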
Generality
Group   # of DFGs   Generality [%]
1       19          87.5
2A      8           75.0
2B      11          72.7
3A      10          90.0
3B      8           87.5
4A      6           83.3
4B      4           50.0
4C      4           75.0
4D      5           60.0
• Generality 50-90%
• In most cases higher than 75%
• Lower generality when the learning set is small (4B, 4D)
• No extra columns in the array to potentially accommodate bigger DFGs
Area/Delay compared to ASIC and FPGA
• We synthesized, placed, and routed individually all the operations found in the DFG using the gate implementations of a commercial 65 nm library
• Conservatively, we ignored the routing area and delay in the ASIC implementation
• VPR estimates the routing area in the datapath and the routing delay of a DFG when it is placed and routed on the datapath
• Conservatively, VPR considers all wires as individual wires, not buses
Area/Delay compared to ASIC and FPGA
Kuon and Rose, “Measuring the gap between FPGAs and ASICs”, FPGA 2006
[Figure: area/delay comparison plots; outliers marked]
Conservatively, ASIC area/delay refers to a single DFG, rather than all DFGs merged
In most cases, delay cost < 2 and area cost < 10-12
Conclusions
• A novel way to merge DFGs into an application-domain-specific CGRA
• A new tradeoff between generality and efficiency
• Future directions:
  • Specialize the bitwidth of the operators
  • Customize the shape of the datapath to better fit the domain
Thank you.
Mirjana Stojilović, David Novo, Lazar Saranovac, Philip Brisk, Paolo Ienne