35
Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004 ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

  • Upload
    baby

  • View
    30

  • Download
    0

Embed Size (px)

DESCRIPTION

ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs. Overview. Logic synthesis LUT Clustering LUT capacity Chortle – example technology mapper Architecture-specific optimization. Boolean network. - PowerPoint PPT Presentation

Citation preview

Page 1: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

ECE 697F

Reconfigurable Computing

Lecture 5

Technology Mapping: Packing Logic into LUTs

Page 2: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Overview

• Logic synthesis

• LUT Clustering

• LUT capacity

• Chortle – example technology mapper

• Architecture-specific optimization

Page 3: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Boolean network

° A Boolean network is the main representation of the logic functions for technology independent optimizations.

° Each node can be represented as sum-of-products (or product-of-sums).

° Provides multi-level structure, but functions in the network need not correspond to logic gates.

Page 4: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Boolean network example

out1 = k2 + x2’ out2 = k3 + x1

k2 = x1’ x2 x4 + k1k3 = k1 x4’

k1 = x2 + x3

x1 x2 x3 x4

primary outputs

primary inputs

Page 5: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Terms

° Support: set of variables used by a function.

° Transitive fanout: all the primary outputs and intermediate variables of a function.

° Transitive fanin: all the primary inputs and intermediate variables used by a function. Transistive fanin determines a cone of logic.

coneprimary inputs output

Page 6: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Partially-specified function

x1

x2

x3

0

1

1

1

don’t care

Page 7: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Optimizations

° Simplification.• Changing the way a function is represented.

° Network restructuring.• Adding and removing nodes.

° Delay restructuring.• Optimizations that reduce the height of critical paths.

Page 8: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Partial collapsing

before after

f1

f2 f3

f4 F

f3

f4

Page 9: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Technology mapping

° Cover the function:

Page 10: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

FPGA tech mapping

° Cost (number of inputs) doesn’t always increase with added functions:

Page 11: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

FPGAs vs. custom logic

° Cost metric for static gates is literal:• ax + bx’ has four literals, requires 8 transistors.

° Cost metric for FPGAs is logic element:• All functions that fit in an LE have the same cost.

Page 12: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

LUT-based logic synthesis

° Find the largest logic cone that will fit into the LUT:

r = q + s’

q = g’ + h s = d’

d = a + b

Page 13: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

How much fits in a LUT?

• One 2-input NAND gate frequently used for comparison.

• Approximately 12 ~ 15 gates per four-input LUT.

• 216 functions -> 80 after IO swapping

14 after IO inversion

• 4-input determined to be optimal

[Rose 1990]

AB

CD

AB

CD

Page 14: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Technology-Independent Logic Optimization

• Improve circuit based on cost

- Keep same functionality

• Boolean Evaluation/decomposition

• Simple factoring -> minimizing literals

f = ac + ad + bc + bd

g = a + b + c

e = a + b g = e + c

f = e(c + d)

Page 15: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Factorization

° Based on division:• formulate candidate divisor;

• test how it divides into the function;

• if g = f/c, we can use c as an intermediate function for f.

° Algebraic division: don’t take into account Boolean simplification. Less expensive then Boolean division.

Page 16: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Library-based Technology Mapping – MIS II

• Three steps: decomposition, matching, covering

• Circuit first decomposed into NAND representations

• Different collections of NANDs can be implemented differently in VLSI

Inv, cost 2 NAND2, cost 3AOI-21, cost 4

Page 17: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

MIS II

• Decompose into NAND-2 using Boolean techniques

• Use dynamic programming to match subtrees with libraries

• Choose lowest cost implementation that covers all primitives.

Cost = Cost =

Page 18: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Tech Mapping for LUTs

• Minimize total number of LUTs

• Minimize the number of levels of LUTs

• Many different approaches

- Partitioning -> Flowmap

- BDDs -> XMAP

- Chortle -> Covering

• Basic Xilinx tech mapping follows Chortle with modification to handle registers.

Page 19: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Chortle-crf• Dynamic programming approach

• Minimize # LUTs – primary goal

• Minimize # input circuit root uses

- Secondary goal• Operates on AND-OR circuits.

Locate boundaries

AB C DE F

w

x

GHI J K L M

y

z

Page 20: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Chortle-crf

• Major innovation is bin packing

• Simultaneously addresses decomposition and matching

• Goal: Find decomposition of every node in the network that minimizes # LUTs in final circuit

Without decomp 4-LUTs

With decomposition 2-LUTs

Page 21: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Mapping Each Tree

• Dynamically visit each node in the graph

- Fanin nodes drive the node under evaluation

Boxes -> fanin LUTs, cost is number of inputs

Bins -> N input LUT (in this case 5)

First Fit Decreasing /* construct 2-level decomp */

box list <- fanin LUTs sorted by size

bin list <- 0

while (box list is not 0) {

box <- largest LUT

find bin that will contain LUT

if bin doesn’t exist

bin <- box /* create new bin */

else

bin <- box /* pack in exisiting */

Page 22: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Multi-Level Decomposition

• Chain LUTs together

• Output of largest second level LUT connected to LUT with unused input

• May need to add a new LUT

• Leads to min LUTs and fanout LUT with smallest # input

• This fanout LUT used as input to next stage

Page 23: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Examples

a) Fanin LUTs

b) Two-level Decomposition

c) Multi-level Decomposition

u vw

x y

y

x

z.2

z.1

wvu

y

u v

w xz.1

Page 24: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Optimality

• For LUTs with fewer than 6 inputs Chortle will create an optimal result for subtree

• Combination of sub-trees is not optimized.

• Local optimizations needed to ensure global optimality.

Reconvergent paths -> net drives multiple gates.

Replicating logic -> creating additional fanout

Page 25: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Translating a Design to an FPGA

• Improve 2-level decomposition to take fanout into account

• Replace FFD with an exhaustive search that repeatedly invokes FFD.

• Try both with and without reconvergent path and select best mapping (forced merging)

• Inputs must reconverge at node being decomposed.

Page 26: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Reconvergent Paths

• Frequently, more than one pair of fan-in LUTs share inputs

• For each combination of pairs that share inputs, perform FFD.

• Two-level decomp with fewest bins and smallest least filled bin retained

Reconverge

pair list <- all pairs of fanin LUTs with shared inputs

best LUTs <- 0

for all possible pairs from pair list {

merged LUTs <- copy of fanin LUTs with forced merge

FFD(merged LUTs) /* best combo */

}

Page 27: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Maximum Share Decreasing

• Exhaustive search prohibitive

• Select box using following criteria

1. Greatest # inputs

2. Shares greatest # inputs with any existing bin

3. Shares greatest # of inputs with existing (remaining) boxes

• Reduces to FFD for no input sharing

• Points 2 and 3 optimize network sharing

Page 28: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Node Replication

• Apply replication to fanout nodes

• Map without replication first

• Locally decompose fanout nodes to determine savings

• Ordering important

Without Replication With Replication

Page 29: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Results – Chortle-crf

• 20 netlists mapped to 5-input LUTs

• Reconvergence reduced LUTs by 2.7%

• Replication reduced LUTs by 3.7%

• Combined 14% reduction achieved

• Replication exposes reconvergent paths creating additional opportunities for optimization.

Page 30: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Chortle-d

• Minimize delay through circuit

• Generally increases hardware required

- Reduced logic levels by 38%

- Increased # LUTs by 79%

• Note most delay in FPGA in interconnect

Page 31: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Other Approaches

• MIS-PGA

- Groups inputs into LUTs

- Decompose into 4-LUTs (Roth-Karp)

- 47 times slower than Chortle

- 14% fewer LUTs

• XMAP

- Represent circuit as BDDs

- Effective for multiplexer based devices.

- Also, BDS-PGA

Page 32: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Flowmap

1. Use network flow to partition circuit.

2. Determine point where minimum flow achieved for minimum cut

3. Cut until LUTs of size N achieved.

Page 33: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Taking Flip flops into Account

• FPGA devices contain fixed resources – FFs

• Technology mapping should take these into account

• Consider fanout nodes.

FF

FF

Page 34: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

LUT Packing - VPACK

• Seed BLE – choose BLE with most inputs.

• Select next BLE -> BLE which shares most inputs and outputs with cluster

• Continue until cluster is full or adding any BLE will overflow I -> # inputs

• Hill Climbing – exceed I limit temporarily to find better minimum.

Page 35: ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs

Lecture 5: Technology Mapping: Packing Logic Into LUTs September 22, 2004

Summary

• Many tech mapping algorithms exist to minimize delay/area

• Chortle use dynamic programming heuristic to perform mapping

• Largely a solved problem

• More sophisticated techniques evaluated recently