29
Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1 , Satyaki Das 2 , Steve Trimberger 2 , and Lei He 1 1. Electrical Engineering Dept., UCLA 2. Research Labs, Xilinx Inc. Presented by Yu Hu Presented by Yu Hu Address comments to [email protected] Address comments to [email protected]

Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Embed Size (px)

Citation preview

Page 1: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Design, Synthesis and Evaluation of Heterogeneous FPGA

with Mixed LUTs and Macro-Gates

Yu Hu1, Satyaki Das2, Steve Trimberger2, and Lei He1

1. Electrical Engineering Dept., UCLA

2. Research Labs, Xilinx Inc.

Presented by Yu HuPresented by Yu Hu

Address comments to [email protected] comments to [email protected]

Page 2: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Introduction

Design of the Macro-gates

Synthesis for the Proposed FPGA Architecture

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 3: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Heterogeneity in FPGA Architectures

Heterogeneity among SLICEs Programmable logic and routing Tiles are not identical

soft logic fabric [Kaviani, FPGA’96]] hard structures [Jamieson, FPL’05]

Dedicated hard structures e.g. DSP e.g memory block

Heterogeneity within a SLICE Programmable logic and routing Tiles (SLICEs) are identical Different logics exist within a SLICE

e.g. LUTs with different size [Cong, FPGA’99] e.g. mixed PLAs and LUTs [Cong, TODAES’05] e.g. mixed macro-gates and LUTs

(source: Jamieson@FPL’05)

Page 4: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Heterogeneous FPGA with Macro-Gates

There exists programmability and cost trade-off between LUTs and macrogates Xilinx V4 benefits from small gates (MUX2, XOR2) built in

SLICEs.

The benefit of wider macro-gates Effectiveness of the incorporation of wider logic functions (macro

gates) is not clear.

Our contributions Design a new FPGA architecture with mixed LUTs and macro-

gates Propose a new automatic synthesis flow for mapping a circuit to

the proposed FPGA architecture Evaluate the architecture and show that the proposed

architecture reduces delay and area by 16.5% and 30%, respective, compared to the LUT-only architecture.

Page 5: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Introduction

Design of the Macro-gates

Synthesis for the Proposed FPGA Architecture

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 6: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Overview of Macro-Gate Design

Key problem Select the logic functions for the macro-gate

Problem formulation: Input: a set of training circuits, which have been

mapped to K-input LUTs Output: N K-input Boolean functions: f1 , … , fN Objective: Maximize the number of logics (in the

training circuit set) which can be implemented by f1 , … , fN

The proposed solution Ranking of the logic functions for a set of training

circuits

Page 7: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

NPN-Class Diagram: Organization of Logics Canonical and efficient representation of all NPN classes

NPN-Equivalent: functional equivalency under inputs negation, permutation or output negation

E.g., f(a,b,c)=a+bc, g(a,b,c)=b’a+b’c

NPN-Cofactor relationship is indicated

DAG: easy to manipulate

It becomes impractical to compute for more than 6-input functions! Solution: Utilization NPN-Class Diagram

Level3: 3-inputLevel2: 2-input

Level1: 1-input

Level0: constant

Wider inputs

Page 8: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

UND: Utilization NPN-Class Diagram UND is an DAG, sub-graph of NCD

Help for scoring and ranking functions ab’c’+a’bc’

ab’c’+a’bc’ / 1 / xx% abc

ab / 0 / xx%

a / 0 / xx%

ab’+a’b / 0 / xx%

-0- / 0 / xx%

abc/ 1 / xx%

ab’+a’b

a

Implementation capability

Appearance frequency

functionality

Page 9: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

UND: Utilization NPN-Class Diagram

ab’c’+a’bc’

ab’c’+a’bc’ / 1 / xx% abc

ab / 0 / xx%

a / 0 / xx%

ab’+a’b / 0 / xx%

-0- / 0 / xx%

abc/ 1 / xx%

ab’+a’b

aab’+a’b / 1 / xx%

a / 1 / xx%

Page 10: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

a / 1 / 25%

ab’+a’b / 1 / 50%

UND: Utilization NPN-Class Diagram

Calculate Implementation Capability

ab’c’+a’bc’

ab’c’+a’bc’ / 1 / 75% abc

ab / 0 / 25%

-0- / 0 / xx%

abc/ 1 / 50%

ab’+a’b

a

Fanout cone ofab’c+a’bc’

The topology property (DAG) of UND enables us to efficiently explore

different metrics for functionality ranking, e.g., utilization rate.

Page 11: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Recap: Overall Flow for Macro-Gate Design

Map with LUT-N

Extract logic functions

Generate UtilizationNPN Diagram

Calculate score For logic functions

Rank logic functions

ab’c’+a’bc’ / 1 / xx%

ab / 0 / xx%

a / 0 / xx%

ab’+a’b / 0 / xx%

-0- / 0 / xx%

abc/ 1 / xx%

ab’+a’b / 1 / xx%

a / 1 / xx%

F

f

g

d

e

h

b

a

c

LUTLUT

LUTLUT

LUTLUT

and2(3)

inv(1)

nand2(2)

000000100000000000000100000000000000100000000000000100000000000000100000000000000100000000000000

……

a / 1 / 25%

ab’+a’b / 1 / 50%

ab’c’+a’bc’ / 1 / 75%

ab / 0 / 25%

-0- / 0 / xx%

abc/ 1 / 50%

1+1*1/2=1.5

1

1*1/2=0.5

1+1*1/3=1.33 1+1*2/3+1*1/3=2

Best function: ab’c’+a’bc’

Page 12: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Proposed Macro-Gates and FPGA Architecture

For IWLS’05 benchmarks, the following four 6-input functions have the highest ranks GI1=a b c d e f (AND-6) GI2=a’ b’ c’ + b c f’ + b c’ d’ + b’ c e (MUX-4) GI3=a b' c d' e + b c e f + d e f GI4=a b' + a' c d' + b' c' + e' + f‘

It can implement over 50% of logic functions in IWLS’05 benchmarks.

The architecture of the proposed macro-gate and FPGA SLICE are

Page 13: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Design of the Embedded Macro-gates

Synthesis for the Proposed FPGA Architecture

Technology Mapping for Heterogeneous FPGAs

SAT-based Packing

Place and Routing

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 14: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Functional & Structural Cut Enumeration

ab

d

zyx

c

w

a=(x+y)’b=y+wz

d=ab=(x+y)’(y+wz)=x’y’wz

Is x’v’wz in library?

4-input macro gate lib

000000100000000000000100000000000000100000000000000100000000000000100000000000000100000000000000

……Yes

Phase1:Enumerate and label cuts from PIs to Pos Check the feasibility of a cut w.r.t. the macro-gate

Phase2:Select best choice from POs to Pis

A general yet efficient solution is SAT based Boolean matching Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA Technology

Mapping , Session 5C.1, ICCAD 07

Page 15: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Key in Technology Mapping: Balance Resource Utilization

Asymmetric architecture causes problem to resource utilization

Exclusively use of one logic resource leads to lots of unused fabric

Simple yet effective solution : Change LUT-MG ratio by adjusting their area weights. Precise calibration is hard to reach by this approach.

0

1000

2000

3000

4000

5000

6000

1:1 1:0.95 1:0.9 1:0.8 1:0.5 1:0.1

MG# LUT6#

Total# too large!

Hard to obtain precise calibrationObjective

architecture:LUT6:MacroGate6

=1:1

Best LUT-MG ratio= 1:1

LUT-MG ratio = LUT#/MG#

Page 16: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Post-Mapping Area Recovery (motivation example)

Given: Target architecture = LUT6 + MG6 LUT-MG ratio in target architecture = 1:1 LUT# < MG# in the mapped design Intrinsic delay (LUT6 : MG6) = 5:4

Objective: balance LUT MG number without increasing delay

LUT6

MG6

MG6

MG6

MG6

5 / 5

4 / 5

9 / 9

9 / 13

17 / 17

13 / 13

PI POMG6

MG68 / 9

Page 17: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Post-Mapping Area Recovery (motivation example)

Given: Target architecture = LUT6 + MG6 LUT-MG ratio in target architecture = 1:1 LUT# < MG# in the mapped design Intrinsic delay (LUT6 : MG6) = 5:4

Objective: balance LUT MG number without increasing delay

LUT6

MG6

MG6

MG6

5 / 5

4 / 5

9 / 9

10 / 13

17 / 17

13 / 13

PI POMG6

MG68 / 9

LUT6

Page 18: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Post-Mapping Area Recovery (motivation example)

Given: Target architecture = LUT6 + MG6 LUT-MG ratio in target architecture = 1:1 LUT# < MG# in the mapped design Intrinsic delay (LUT6 : MG6) = 5:4

Objective: balance LUT MG number without increasing delay

LUT6

MG6 MG6

5 / 5

5 / 5

9 / 9

10 / 13

18 / 17

14 / 13

PI POMG6

10 / 9

LUT6

LUT6 LUT6 Timing target violation!

Timing slack budgeting is necessary!

Page 19: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Post Mapping Area Recovery by Timing Budgeting

Formulated as an Integer Linear Programming (ILP) Problem

Objective (minimize gap between target and actual LUT-MG ratios): min |m2+…+m7-7/2|

Arrival time constraints: ai+dj+bj<=aj

Clock period target: ai<=17

LUT assignment with given timing slack: (5-4)*mj<=bj, mj={0,1}

LUT6

MG6 MG6

a1

a6

a5

a2

a3

a4

PI POMG6

a7

MG6

MG6 MG6

Easy to be generalized to handle arch with multiple macro gates with different input pin numbers

Page 20: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Design of the Embedded Macro-gates

Synthesis for the Proposed FPGA Architecture

Technology Mapping for Heterogeneous FPGAs

SAT-based Packing

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 21: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

SAT-Based Packing

Motivation Traditional packing tools, e.g., T-VPack, hard-codes the architecture

specification of a SLICEs…. Re-impalement from scratch when architecture changes

Propose a unified implementation of the packers for different architectures: easy to perform architecture exploration!

The architecture dependent sub-problem in packing Structural feasibility checking for a sub-circuit to the SLICE

Solution Solve the problem of validating SLICE packing as a local

place&route problem A SAT solver is used to carry out the validation checking

Page 22: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Example of SAT-Based SLICE Packing Examples of constraints: (for each classes of constraint…)

Placement and routing choice variables: X@A, X@B, U5@N10

Exclusively constraint: (¬X@A) (¬X@B)∨

Presence constraint: (X@A) (¬X@B)∨

Input/Output constraint: X@A → U5@N10

Routing constraint: G0 →out U∧ 5@N10) → U5@N12

Page 23: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Recap: Overall Synthesis Flow

Area weightSetting

Cut-based Mapping

Area-BalanceTrade-off?

Y

NPost-mappingArea recovery

LUT6

MG6

MG6

MG6

MG6

MG6

MG6

LUT6

MG6

MG6

MG6

MG6LUT6

LUT6

MG6

MG6

MG6

LUT6

LUT6

packing

F

f

g

d

e

h

b

a

c

LUTLUT

LUTLUT

LUTLUT

LUT

LUT

LUT

Page 24: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Motivation and Objectives

Methodology for Logic Function Exploration

Technology Mapping for Heterogeneous FPGAs

Evaluation of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 25: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Experimental Setting Design library parameters [Cong, TODAES’05]

Benchmark set: IWLS 2005

Four architectures are compared: LUT4, LUT4 + macro gate, LUT6, and LUT6 + macro gate Synthesize the proposed macro-gate by SIS1.2 Delay and area model

Interconnect delay is igonired

Page 26: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Delay Comparisons

Compared to LUT4, LUT4+MG reduces both logic depth and delay by 9.2%.

Compared to LUT6, LUT6+MG reduces delay by 30% while increasing logic depth by 36.5%. A LUT6 can implement more logics than a macro-gate

Logic depth

7.867.14

5.48

7.48

0

2

4

6

8

10

delay

7.86 7.14

10.959.14

02468

1012

Page 27: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Logic Area Comparisons

Compared to LUT4, LUT4+MG reduces logic area by 12.5%.

Compared to LUT6, LUT6+MG reduces logic area by 16.9%.

PLB#

6406

29853711

2142

01000200030004000500060007000

Area

7346 6408

16816

11849

02000400060008000

1000012000140001600018000

Page 28: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Outline

Motivation and Objectives

Methodology for Logic Function Exploration

Technology Mapping for Heterogeneous FPGAs

Comparison of Heterogeneous FPGA Architectures

Conclusions and Future Work

Page 29: Design, Synthesis and Evaluation of Heterogeneous FPGA with Mixed LUTs and Macro-Gates Yu Hu 1, Satyaki Das 2, Steve Trimberger 2, and Lei He 1 1. Electrical

Conclusions

Conclusions A novel FPGA architecture with the mixed LUTs and macro-

gates is proposed A synthesis flow for the proposed architecture is implemented The preliminary experimental results show the effectiveness of

the proposed architecture for the area and delay reduction

Future Work Perform the physical design for the synthesized circuits and

compare the routing costs, architecture evaluation considering interconnect delay

Study the effectiveness of the power reduction for the proposed architecture

Macro-gates with wider inputs will be examined