CSBP: A Fast Circuit Similarity-Based Placement for FPGA Incremental Design and Design Space Exploration

Xiaoyu Shi (1), Dahua Zeng (1), Yu Hu (2), Guohui Lin (1), Osmar R. Zaiane (1)

(1) Dept. of Computing Science, University of Alberta

(2) Dept. of Electrical and Computer Engineering, University of Alberta

Presented by Xiaoyu Shi

Please address comments to [email protected]

Outline

Introduction

Circuit Similarity-Based Placement

Experimental Results

Conclusion and Future Work

Introduction

Field Programmable Gate Array (FPGA)

Ease of design, low start-up costs and fast manufacturing turnaround time.

The size of FPGAs has reached the million-gate level.

Modern FPGA designs suffer from long compilation times.

FPGA placement

Determines which logic block within an FPGA should implement each of the logic blocks required by the circuit.

Has a significant impact on performance and routability in nanometer circuit designs.

The optimization goal is to minimize criteria such as wire length, critical delay and area.

Has become the bottleneck of modern FPGA circuit design [Chen’06].

Up-to-date fast placement algorithms

Improving placement efficiency as a single synthesis phase has been studied extensively for decades.

State-of-the-art work includes multi-core [Ludwin’08], embedding-based [Gopalakrishnan’06], partitioning-based [Maidee’05], multi-level [Sankar’99] and simulated annealing [Betz’97] approaches.

Figure: Xilinx SPARTAN-6 board

Reusable Info in CAD

Incremental design for FPGAs

Design preservation is the key to incremental design.

Similarity among circuits exists because functional changes or optimizations are small and generally leave the topology of the modified circuit similar to that of the original circuit [Krishnaswamy’09].

Figure: Incremental design process for FPGAs. The initial design passes through Iteration 1, Iteration 2, Iteration 3 … up to the final iteration and the final design; the iterations apply changes due to verification, timing, optimizations, etc.

Reusable Info in CAD (Cont.)

Design space exploration for FPGAs

FPGA design offers a variety of customizations by varying design parameters.

Both local similarity and global similarity exist in design space exploration.

Figure: Constant multiplier blocks by CMU SPIRAL [Puschel’04]

Data Mining

Overview

The key to data mining is extracting patterns and useful information from data, including text, graphs and circuits.

It has been studied extensively since the 1950s and has been applied to many domains, such as business, science and health care.

Graph mining, including graph pattern mining, graph classification and graph compression, is an active research area in data mining [Borgwardt’08].

Graph similarity

It quantitatively defines the topological similarity between two graphs.

It has been used in many applications, such as web searching [Kleinberg’99], social network mapping [Watts’99] and chemical structure matching [Hattori’03].

Graph Similarity

Summary of graph similarity measures

| Measure | Description | Time complexity | Global topo |
|---|---|---|---|
| Isomorphism [Pelillo’02] | Identifying a bijection between the nodes of two graphs which preserves (directed) adjacency | NP-Hard | Yes |
| Edit distance [Bunke’99] | Given a cost function on edit operations, determine the minimum cost transformation from one graph to another | NP-Hard | Yes |
| Common subgraph [Fernandez’01] | Identifying the largest isomorphic subgraphs of two graphs | NP-Hard | Yes |
| Iterative methods [Blondel’04] | Two graph elements are similar if their neighborhoods are similar | Cubic | Yes |
| Statistical methods [Alberta’02] | Assessing aggregate measures of graph structure: degree distribution, diameter, betweenness measures | Linear | No |

Iterative methods

They have lower computational complexity than the NP-Hard measures and still consider global topological information.

They take advantage of graph sparsity.

Circuit Similarity

Circuit similarity

We define circuit similarity to describe the similar topological structures between two circuits.

We adapt the iterative methods from graph similarity.

It exists in several CAD phases, such as placement, routing and verification.

It can be widely used to accelerate FPGA designs, for example in incremental design and design space exploration.

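Motivating Example

Circuit similarity algorithm

Figure: Graph G and Graph G’

Similarity score matrix for G and G’ (rows: nodes of G’; columns: nodes of G)

| | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 |
|---|---|---|---|---|---|---|---|---|---|---|
| V’7 | 0.92 | 0.25 | 0.48 | 0.15 | 0 | 0 | 0 | 0.42 | 0.06 | 0 |
| V’8 | 0 | 0.73 | 0 | 0 | 0.05 | 0 | 0.39 | 0 | 0.17 | 0.06 |
| V’9 | 0 | 0.39 | 0 | 0 | 0.4 | 0 | 0.73 | 0 | 0.06 | 0.48 |
| V’10 | 0.48 | 0 | 0.89 | 0.25 | 0.3 | 0.12 | 0.14 | 0.06 | 0.33 | 0.09 |
| V’11 | 0 | 0 | 0.11 | 0.48 | 0 | 0.86 | 0 | 0.36 | 0.17 | 0 |
| V’12 | 0 | 0 | 0.3 | 0.34 | 0.64 | 0.25 | 0.39 | 0.34 | 0.15 | 0.42 |
| V’13 | 0.48 | 0.25 | 0.07 | 0.4 | 0 | 0.36 | 0 | 0.88 | 0.06 | 0 |
| V’14 | 0.4 | 0.39 | 0.29 | 0.15 | 0.15 | 0.18 | 0.12 | 0.46 | 0.59 | 0.06 |
| V’15 | 0 | 0.12 | 0.09 | 0 | 0.63 | 0 | 0.36 | 0 | 0.27 | 0.82 |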

Motivating Example (Cont.)

Circuit similarity-based placement

The initial placement of the new circuit design (G’) is generated by computing the similarity between the original (G) and modified circuits and then finding the corresponding node matching.

A low-temperature simulated annealing is applied to further refine the results.

The proposed circuit similarity algorithm can be used to speed up placement, which allows faster incremental design and design space exploration.
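
The node-matching step can be illustrated with the similarity scores of the motivating example. This is a minimal sketch rather than the CSBP implementation: the slides state that the CS2 package [Goldberg’97] solves the matching problem, while the example below swaps in a small exhaustive search with memoization that is only practical for a handful of nodes. The function name `best_matching` and the 0-based indexing are assumptions made for the illustration.

```python
from functools import lru_cache

# Similarity scores from the motivating example.
# Rows: V'7..V'15 (new circuit G'); columns: V7..V16 (original circuit G).
SIM = [
    [0.92, 0.25, 0.48, 0.15, 0, 0, 0, 0.42, 0.06, 0],
    [0, 0.73, 0, 0, 0.05, 0, 0.39, 0, 0.17, 0.06],
    [0, 0.39, 0, 0, 0.4, 0, 0.73, 0, 0.06, 0.48],
    [0.48, 0, 0.89, 0.25, 0.3, 0.12, 0.14, 0.06, 0.33, 0.09],
    [0, 0, 0.11, 0.48, 0, 0.86, 0, 0.36, 0.17, 0],
    [0, 0, 0.3, 0.34, 0.64, 0.25, 0.39, 0.34, 0.15, 0.42],
    [0.48, 0.25, 0.07, 0.4, 0, 0.36, 0, 0.88, 0.06, 0],
    [0.4, 0.39, 0.29, 0.15, 0.15, 0.18, 0.12, 0.46, 0.59, 0.06],
    [0, 0.12, 0.09, 0, 0.63, 0, 0.36, 0, 0.27, 0.82],
]


def best_matching(sim):
    """Return (total score, column chosen per row) maximizing the total.

    Each column is used at most once. Assumes len(sim) <= len(sim[0]).
    """
    n_rows, n_cols = len(sim), len(sim[0])

    @lru_cache(maxsize=None)
    def solve(row, used):
        # `used` is a bitmask of the columns already taken by earlier rows.
        if row == n_rows:
            return 0.0, ()
        best = (float("-inf"), ())
        for col in range(n_cols):
            if used & (1 << col):
                continue
            rest_score, rest_cols = solve(row + 1, used | (1 << col))
            total = sim[row][col] + rest_score
            if total > best[0]:
                best = (total, (col,) + rest_cols)
        return best

    return solve(0, 0)


total, cols = best_matching(SIM)
for row, col in enumerate(cols):
    print(f"V'{row + 7} -> V{col + 7}")
```

On this matrix the highest score of every row falls in a different column, so the matching pairs each V’ node with its top-scoring partner and leaves V10 unmatched.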

Motivating Example (Cont.)

A real example

For circuit “des”, the reference configuration (synthesized using the “resyn3” script in ABC) has 1245 CLBs and 1501 nets, while the new configuration (synthesized using the “rwsat2” script in ABC) has 1215 CLBs and 1471 nets.

The results show that CSBP successfully finds the internal node correspondence.

Figure: Placement layouts comparison of circuit “des”: (a) placement of reference config, (b) initial placement using CS, (c) final placement using CS, (d) initial placement using VPR, (e) final placement using VPR

Status of placement results of circuit “des”

| | Wire | Delay (E-05) | Critical Delay (E-08) | Runtime (s) |
|---|---|---|---|---|
| CS-init | 306 | 5.93 | - | - |
| VPR-init | 1087 | 14.00 | - | - |
| CS-final | 237 | 5.08 | 8.28 | 13.38 |
| VPR-final | 221 | 4.98 | 10.10 | 28.42 |
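
The numbers reported for “des” can be turned into relative figures with a few lines of arithmetic. The sketch below only restates the values from the results table; reading the four value columns as wire, delay (E-05), critical delay (E-08) and runtime (s) is an assumption, as are the dictionary keys and the helper name `relative_change`.

```python
# Values copied from the "des" results table (units as given in its headers).
RESULTS = {
    "CS-init": {"wire": 306, "delay": 5.93},
    "VPR-init": {"wire": 1087, "delay": 14.00},
    "CS-final": {"wire": 237, "delay": 5.08, "crit_delay": 8.28, "runtime_s": 13.38},
    "VPR-final": {"wire": 221, "delay": 4.98, "crit_delay": 10.10, "runtime_s": 28.42},
}


def relative_change(new, old):
    """Signed change of `new` relative to `old` (negative means a reduction)."""
    return (new - old) / old


init_wire = relative_change(RESULTS["CS-init"]["wire"], RESULTS["VPR-init"]["wire"])
init_delay = relative_change(RESULTS["CS-init"]["delay"], RESULTS["VPR-init"]["delay"])
final_wire = relative_change(RESULTS["CS-final"]["wire"], RESULTS["VPR-final"]["wire"])
final_crit = relative_change(
    RESULTS["CS-final"]["crit_delay"], RESULTS["VPR-final"]["crit_delay"]
)
speedup = RESULTS["VPR-final"]["runtime_s"] / RESULTS["CS-final"]["runtime_s"]

print(f"initial wire, CS vs VPR:         {init_wire:+.1%}")
print(f"initial delay, CS vs VPR:        {init_delay:+.1%}")
print(f"final wire, CS vs VPR:           {final_wire:+.1%}")
print(f"final critical delay, CS vs VPR: {final_crit:+.1%}")
print(f"placement runtime speedup:       {speedup:.2f}x")
```

For this circuit the similarity-based initial placement starts roughly 72% lower in the wire metric than VPR's initial placement, and the final CS placement takes a little under half of VPR's placement runtime.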

Circuit Similarity CAD Flow

Figure: CAD flow for design space exploration

Figure: CAD flow for incremental design

Circuit Similarity Algorithm

Iterative similarity algorithm

We employ the iterative similarity algorithm for undirected molecular graphs [Rupp’07].

We adapt the iterative similarity algorithm to handle directed circuit graphs, fix the I/O pins, and compute the similarity of fanin and fanout nodes separately, based on unique circuit constraints.

If (|in(vi)| < |in(v’j)| and |out(vi)| < |out(v’j)|)

Table: Summary of variables
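
The idea of scoring node pairs iteratively from their fanin and fanout neighborhoods, with the I/O pins held fixed, can be sketched as follows. This is an illustrative simplification and not the adapted [Rupp’07] update used by CSBP, whose exact formula is not reproduced in these slides. The update rule is an assumption made for the example: each pair's score is the average of a fanin term and a fanout term, where a term sums, for every neighbor of the first node, its best current score against a neighbor of the second node, divided by the larger neighbor count. All names (`iterative_similarity`, `pinned`, the toy graphs) are hypothetical.

```python
def fanin_of(fanout):
    """Invert a fanout adjacency dict into a fanin adjacency dict."""
    fanin = {v: [] for v in fanout}
    for u, outs in fanout.items():
        for w in outs:
            fanin[w].append(u)
    return fanin


def neighborhood_term(nbrs_a, nbrs_b, score):
    """Similarity of two neighbor lists under the current pair scores."""
    if not nbrs_a and not nbrs_b:
        return 1.0
    if not nbrs_a or not nbrs_b:
        return 0.0
    total = sum(max(score[p, q] for q in nbrs_b) for p in nbrs_a)
    return total / max(len(nbrs_a), len(nbrs_b))


def iterative_similarity(g, g_new, pinned, iterations=5):
    """g, g_new: node -> list of fanout nodes; pinned: fixed I/O matches."""
    fanin_g, fanin_new = fanin_of(g), fanin_of(g_new)
    pinned_targets = set(pinned.values())

    def pin(score):
        # Pinned nodes score 1 with their fixed partner and 0 with all others.
        for a in g:
            for b in g_new:
                if a in pinned or b in pinned_targets:
                    score[a, b] = 1.0 if pinned.get(a) == b else 0.0
        return score

    score = pin({(a, b): 1.0 for a in g for b in g_new})
    for _ in range(iterations):
        updated = {}
        for a in g:
            for b in g_new:
                s_in = neighborhood_term(fanin_g[a], fanin_new[b], score)
                s_out = neighborhood_term(g[a], g_new[b], score)
                updated[a, b] = (s_in + s_out) / 2
        score = pin(updated)
    return score


# Toy circuits with the same structure and different internal node names.
G = {"i1": ["n1", "n2"], "i2": ["n2"], "n1": ["n3"], "n2": ["n3"],
     "n3": ["o1"], "o1": []}
G_NEW = {"j1": ["m1", "m2"], "j2": ["m2"], "m1": ["m3"], "m2": ["m3"],
         "m3": ["p1"], "p1": []}
PINNED = {"i1": "j1", "i2": "j2", "o1": "p1"}  # fixed I/O pin matches

sim = iterative_similarity(G, G_NEW, PINNED)
for a in ("n1", "n2", "n3"):
    print(a, {b: sim[a, b] for b in ("m1", "m2", "m3")})
```

In the toy example each internal node scores highest against its structural counterpart (n1 with m1, n2 with m2, n3 with m3), which is the kind of score matrix the matching step then consumes.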

Performance Enhancement

Support constraint

The support of a node is the set of nodes with predefined matchings in the transitive fanin or fanout cone of this node.

Formally, if v ∈ G and v’ ∈ G’, the support constraint requires:

where β ∈ (0,1].

Level constraint

A topological sort and a reverse topological sort can label each internal node with two values.

Formally, if v ∈ G and v’ ∈ G’, the level constraint requires:

where Bl and Br are two nonnegative integers.

Figure: Effectiveness of the pruning techniques
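
The two level labels can be computed with one pass in topological order and one pass in reverse order. The sketch below is a hedged illustration: it takes the two labels to be the longest distance from any input and the longest distance to any output, and it assumes that the level constraint bounds the absolute difference of each label by Bl and Br respectively. The constraint formula itself is not preserved in these slides, so that inequality and the names `levels` and `level_compatible` are assumptions. It uses `graphlib`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def levels(fanout):
    """Label each node with (longest distance from an input,
    longest distance to an output). `fanout` maps node -> fanout nodes."""
    fanin = {v: [] for v in fanout}
    for u, outs in fanout.items():
        for w in outs:
            fanin[w].append(u)

    # TopologicalSorter expects node -> predecessors, so inputs come first.
    order = list(TopologicalSorter(fanin).static_order())

    from_inputs = {}
    for v in order:
        from_inputs[v] = 1 + max((from_inputs[p] for p in fanin[v]), default=-1)

    to_outputs = {}
    for v in reversed(order):
        to_outputs[v] = 1 + max((to_outputs[s] for s in fanout[v]), default=-1)

    return {v: (from_inputs[v], to_outputs[v]) for v in fanout}


def level_compatible(label, label_new, b_l, b_r):
    """Assumed form of the level constraint: both labels differ by at most
    the given bounds."""
    return (abs(label[0] - label_new[0]) <= b_l
            and abs(label[1] - label_new[1]) <= b_r)


G = {"i1": ["n1", "n2"], "i2": ["n2"], "n1": ["n3"], "n2": ["n3"],
     "n3": ["o1"], "o1": []}
G_NEW = {"j1": ["m1", "m2"], "j2": ["m2"], "m1": ["m3"], "m2": ["m3"],
         "m3": ["p1"], "p1": []}

lv, lv_new = levels(G), levels(G_NEW)
print(lv["n1"], lv_new["m3"])
print(level_compatible(lv["n1"], lv_new["m3"], 0, 0))
print(level_compatible(lv["n1"], lv_new["m3"], 1, 1))
```

Under this reading, Bl = Br = 0 only lets nodes with identical labels be compared, which prunes the most candidate pairs, while Bl = Br = 1 tolerates a one-level shift in either direction.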

Incremental Design

CAD flow

Two-iteration CAD flow.

The CSBP flow (a) and the from-scratch flow (b) are compared.

Optimization “imfs” reduces the number of CLBs by 2%.

Settings

Two versions of CSBP are compared: a high-quality version (CS) with β = 0.5, inner_num = 1 and Bl = Br = 1, and a turbo version (CS-t) with β = 1, inner_num = 0.1 and Bl = Br = 0.

CSBP is implemented in C and evaluated on the 20 largest MCNC benchmarks.

The results are averaged over 5 runs on a Linux server with a dual-core 2.19GHz CPU and 5GB of memory.

The CS2 package [Goldberg’97] is used for the maximum matching problem.

Figure: CAD flow for incremental design

Results

Initial placement results

Bounding box cost (bb cost) and delay cost are compared.

The initial placement results generated using CS are much better than VPR’s initial results and very close to VPR’s final results.

Figure: Comparisons of initial bb cost (y axis: percentage, 0% to 100%; x axis: the 20 benchmarks alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584, seq, spla, tseng; series: CS-init, VPR-final, VPR-init). CS reduces bb cost by 72% on avg. compared to VPR.

Figure: Comparisons of initial delay cost (same axes and series). CS reduces delay cost by 53% on avg. compared to VPR.

Results (Cont.)

Post-routing results comparison

A low-temperature annealing is applied to the initial results.

Wire length, critical delay and area are compared.

The results demonstrate the effectiveness of the pruning techniques, which do not affect the quality significantly.

Figure: Wire length per benchmark (series: CS-t, CS, VPR). CS increases the wire length by 3% on avg.

Figure: Critical delay per benchmark (series: CS-t, CS, VPR). CS increases the crit. delay by 6% on avg.

Figure: Area per benchmark (series: CS-t, CS, VPR). CS increases the area by 2% on avg.
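
The refinement step, a low-temperature annealing applied to an already good initial placement, can be sketched as below. This is a generic swap-based simulated annealing and not VPR's placer: the cost is plain half-perimeter wire length used as a stand-in for the bb cost, and the parameter names (`t_start`, `t_end`, `alpha`, `moves_per_temp`) and their values are assumptions chosen for the illustration. Starting at a low temperature means most uphill moves are rejected, so the search stays close to the initial solution.

```python
import math
import random


def hpwl(placement, nets):
    """Sum of half-perimeter bounding boxes over all nets."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def refine(placement, nets, t_start=0.5, t_end=0.01, alpha=0.9,
           moves_per_temp=50, seed=0):
    """Refine a placement by random pairwise swaps; return the best one seen."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = hpwl(current, nets)
    best, best_cost = dict(current), cost
    blocks = sorted(current)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            a, b = rng.sample(blocks, 2)
            current[a], current[b] = current[b], current[a]
            new_cost = hpwl(current, nets)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost  # accept the swap
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                current[a], current[b] = current[b], current[a]  # undo
        t *= alpha  # geometric cooling
    return best, best_cost


# Four blocks on a 2x2 grid; each net connects two diagonal blocks at first.
start = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}
nets = [("a", "b"), ("c", "d")]

refined, refined_cost = refine(start, nets)
print("initial cost:", hpwl(start, nets))
print("refined cost:", refined_cost)
```

Because the function returns the best placement it has seen, the refined cost is never worse than the cost of the initial placement it was given.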

Results (Cont.)

Runtime comparison

Only placement time is compared.

CS-t achieves a 31x speedup on average, and up to 91x.

More speedup is expected as circuits become larger.

Figure: Speedups compared to VPR (series: CS-t, CS, VPR)

Design Space Exploration

CAD flow

The logic-level and algorithm-level design spaces are studied separately.

The CSBP flow (a) and the from-scratch flow (b) are compared.

Settings

The logic-level design space consists of 19 configurations generated by 19 ABC (1) synthesis scripts in abc.rc.

The algorithm-level design space consists of 18 configurations of a constant multiplier generated by CMU SPIRAL [Puschel’04], varying bits from 7 to 25 (2).

Both CS and CS-t are evaluated.

The benchmarking environments are the same as for logic-level design space exploration.

Figure: CAD flow for design space exploration

(1) http://www.eecs.berkeley.edu/~alanmi/abc/

(2) Bit = 16 is abandoned due to ABC crash

Logic-level Sample Synthesis Scripts

| Alias | Script |
|---|---|
| resyn | "b; rw; rwz; b; rwz; b" |
| resyn2 | "b; rw; rf; b; rw; rwz; b; rfz; rwz; b" |
| resyn2a | "b; rw; b; rw; rwz; b; rwz; b" |
| src_rw | "st; rw -l; rwz -l; rwz -l" |
| src_rs | "st; rs -K 6 -N 2 -l; rs -K 9 -N 2 -l; rs -K 12 -N 2 -l" |
| choice | "fraig_store; resyn; fraig_store; resyn2; fraig_store; fraig_restore" |
| rwsat | "st; rw -l; b -l; rw -l; rf -l" |
| compress | "b -l; rw -l; rwz -l; b -l; rwz -l; b -l" |
| share | "st; multi -m; fx; resyn2" |

Source: http://www.eecs.berkeley.edu/~alanmi/abc/

Logic Level Results

Initial results comparison

The numbers of CLBs and levels vary widely in the logic-level design space.

Circuit “dsip” is shown as an example.

Bounding box cost and delay cost are compared for the initial placement results.

Figure: Characteristics of logic-level design space

Figure: Initial bb cost of “dsip” (x axis: the 19 synthesis scripts resyn, resyn2, resyn2a, resyn3, compress, compress2, choice, choice2, rwsat, rwsat2, shake, share, src_rw, src_rs, src_rws, resyn2rs, compress2rs, resyn2rsdc, compress2rsdc; series: CS, CS-t, VPR). CS reduces bb cost by 76% on avg.

Figure: Initial delay cost of “dsip” (same x axis and series). CS reduces delay cost by 48% on avg.

Logic Level Results (Cont.)

Final placement results

Wire length and critical delay of circuit “dsip” are compared.

The final results produced by CS and CS-t are very close to or better than VPR’s, with 32% overhead for wire length and 20% improvement for critical delay.

Figure: Final wire length comparison of “dsip” (y axis: percentage; x axis: synthesis scripts; series: CS-t, CS, VPR)

Figure: Final critical delay comparison of “dsip” (y axis: percentage; x axis: synthesis scripts; series: CS-t, CS, VPR)

Logic Level Results (Cont.)

Design space shape characterization

We compare the minimal, median and maximal wire length and critical delay produced by CS, CS-t and VPR.

We also compare the shapes of each configuration over 19 designs.

The almost identical curves show that CSBP is able to accurately depict the shape of a design space.

Figure: Shape of minimal wire length of 20 circuits over 19 designs (x axis: benchmarks; series: vpr-min, cs-min, cs-t-min)

Figure: Shape of minimal crit. delay of 20 circuits over 19 designs (x axis: benchmarks; series: vpr-min, cs-min, cs-t-min)

Figure: Shape of final wire length of circuit “dsip” (x axis: synthesis scripts; series: vpr, cs, cs-t)

Logic Level Results (Cont.)

Runtime comparison

Only placement time is compared.

CS-t achieves a 30x speedup on average, and up to 100x.

In practice, one can take advantage of the significant speedup of CS-t to perform quick design space exploration.

Table: Runtime comparison (“*” marked time is measured with a timeout)

Figure: Speedups compared to VPR (x axis: benchmarks; series: CS, CS-t, VPR)

Algorithm Level Results

Experimental settings

The algorithm-level design is a constant multiplier.

The design parameter explored in our experiments is the number of fractional bits, varying from 7 to 25 (1).

CMU SPIRAL is used to generate the RTL design based on the Hcub algorithm [Voronenko’07].

Experimental results

The initial and final placement results are similar to those of logic-level design space exploration.

CS and CS-t achieve 7x and 30x speedups compared to VPR, respectively.

Figure: Characteristics of algorithm-level design space generated by CMU SPIRAL

Figure: An example of a constant parallel multiplier

(1) Bit = 16 is abandoned due to ABC crash

Algorithm Level Results (Cont.)

Wire length-delay space comparison

The Pareto points, which are the optimal configurations in a design space, are of most interest to IC designers.

CS and VPR find the same Pareto points.

Bits = 24 is used as the reference circuit.
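
Identifying the Pareto points of a wire length-delay space amounts to discarding every configuration that another configuration beats or ties on both metrics while being strictly better on at least one. The sketch below shows that filter; the configuration values are made-up numbers for illustration only, not measurements from the slides, and the function name `pareto_points` is an assumption.

```python
def pareto_points(configs):
    """Return the names of non-dominated configurations.

    `configs` maps a name to (wire_length, critical_delay); smaller is better
    for both metrics.
    """
    front = []
    for name, (wire, delay) in configs.items():
        dominated = any(
            (w2 <= wire and d2 <= delay) and (w2 < wire or d2 < delay)
            for other, (w2, d2) in configs.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (wire length, estimated critical delay) pairs.
EXAMPLE = {
    "B7": (100, 4.0e-7),
    "B8": (150, 3.0e-7),
    "B9": (200, 3.5e-7),   # dominated by B8: longer wires and slower
    "B10": (300, 2.0e-7),
}

print(pareto_points(EXAMPLE))
```

Running the same filter on the results of two placers and comparing the returned sets is one simple way to check whether both expose the same Pareto points.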

Figure: Wire length-delay space of VPR (x axis: wire length; y axis: estimated critical delay; labeled points: B7, B8, B9, B10, B12, B14, B15, B17, B18, B19, B21, B22, B23, B25)

Figure: Wire length-delay space of CS (same axes and labeled points)

Future Work

Improvement to CSBP

Integrate predefined matchings, for example name matching, into CSBP to further enhance both the efficiency and the quality of the design.

Other applications

Study the effectiveness of applying the circuit similarity algorithm to other applications, such as routing and sequential verification for FPGAs.

Conclusion

Proposed an efficient circuit similarity algorithm.

Developed CSBP, a fast circuit similarity-based placement for FPGAs.

Applied CSBP to incremental design and design space exploration.

Open-source tool available at: http://webdocs.cs.ualberta.ca/~xshi/soft.html

Applied CSBP to incremental design for FPGAs

CSBP is able to reduce engineering effort by capturing the similarity from the previous design iterations.

The turbo version (CS-t) is 31x faster than VPR on average.

Applied CSBP to design space exploration for FPGAs

CSBP can precisely depict the shape of a design space and pinpoint the optimal designs.

The turbo version (CS-t) is 30x faster than VPR on average.
