Application of Specific Delay Window Routing for Timing Optimization in FPGA Designs. Evan Wegley, Qinhai Zhang, Lattice Semiconductor Corporation. 23rd ACM/SIGDA International Symposium on FPGAs

Page 1: Application of Specific Delay Window Routing for Timing

Application of Specific Delay Window Routing for Timing Optimization in FPGA Designs

Evan Wegley, Qinhai Zhang

Lattice Semiconductor Corporation

23rd ACM / SIGDA International Symposium on FPGAs

Evan Wegley, Qinhai Zhang Page 2 FPGA 2015

FPGA Routing

• FPGA routers must optimize for competing goals

• Routability

• Timing performance

• Timing constraints come in many forms

• Setup timing

• Hold timing

• Other timing constraints

• Maximum skew, for example

Setup Timing

• Constrains data paths between registers to arrive before the next clock cycle starts

• Produces an upper bound on delay of the data path

0.1 ns ≤ Delay ≤ 3.0 ns

Hold Timing

• Constrains data paths between registers to arrive after the previous cycle has been captured

• Produces a lower bound on delay of the data path

• At routing phase, we can correct hold timing violations by rerouting the data path to have additional delay

0.1 ns ≤ Delay ≤ 3.0 ns

More Complex Hold Timing

• Example: constrain the data paths to 2.0 ns for maximum setup time and 0.4 ns for minimum hold time with respect to clock

$\text{Slack}_{A,\text{Hold}} = (1.0 + 0.3 - 1.2) - 0.4 = -0.3\ \text{ns}$

Hold violation!

$\text{Slack}_{B,\text{Setup}} = 2.0 - (1.0 + 2.1 - 1.2) = 0.1\ \text{ns}$

Close to setup threshold

Note: ignoring LUT logic delays
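
The two slack values above can be reproduced with a short calculation. The sketch below assumes (the slide does not label its terms) that 1.0 ns is the clock delay to the launching register, 1.2 ns is the clock delay to the capturing register, and 0.3 ns and 2.1 ns are the data-path delays of connections A and B. All function and parameter names are illustrative.

```python
def relative_arrival(launch_clk, data, capture_clk):
    """Data arrival time relative to the capturing clock edge, in ns."""
    return launch_clk + data - capture_clk

def hold_slack(launch_clk, data, capture_clk, hold_req):
    # Positive when the data arrives later than the hold requirement.
    return relative_arrival(launch_clk, data, capture_clk) - hold_req

def setup_slack(launch_clk, data, capture_clk, setup_req):
    # Positive when the data arrives earlier than the setup requirement.
    return setup_req - relative_arrival(launch_clk, data, capture_clk)

# Assumed reading of the slide's numbers (see the lead-in).
slack_a_hold = hold_slack(1.0, 0.3, 1.2, hold_req=0.4)
slack_b_setup = setup_slack(1.0, 2.1, 1.2, setup_req=2.0)
print(f"{slack_a_hold:.1f} {slack_b_setup:.1f}")  # -0.3 0.1
```

Adding delay to connection A fixes the hold violation, but any shared delay also eats into the 0.1 ns of setup slack on B, which is why both bounds have to be tracked per connection.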

More Complex Hold Timing

• Need awareness of both setup and hold requirements for each connection

• We can do so by setting lower and upper bounds on delay for each connection: slack allocation

Slack Allocation

• Process of distributing slack to all connections of a circuit (Youssef 1990, Frankle 1992, Fung 2008)

• Allocate slack on setup timing to get upper bound

[Figure labels, delay windows: [1.00, 1.03] [0.56, 2.17] [0.57, 2.17]]

• Allocate slack on hold timing to get lower bound

[Figure labels, delay windows: [1.00, 1.03] [0.60, 2.17] [0.60, 2.17]]

Reroute

Note: ignoring LUT logic delays

Maximum Skew

• Constrains the range of delays on the loads of some net(s) or buses

• Example: constrain the net to have a maximum skew of 0.5 ns

• Initial skew is 1.0 – 0.2 = 0.8 ns

• Violates constraint of 0.5 ns maximum skew by 0.3 ns

• Setting delay bounds on each connection

• Step 1: Use the largest delay value as the upper bound

• Step 2: Find the lower bound by subtracting the constraint value from the upper bound

• Step 3: Reroute any connections falling outside the bounds

[0.50, 1.00]
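
The three steps can be written down directly. This is a minimal sketch using the example's delays of 1.0 ns and 0.2 ns; the load names and function names are hypothetical.

```python
def skew_window(delays, max_skew):
    """Return the (lower, upper) delay window for the loads of one net."""
    upper = max(delays.values())   # Step 1: largest delay is the upper bound
    lower = upper - max_skew       # Step 2: subtract the constraint value
    return lower, upper

def needs_reroute(delays, window):
    """Step 3: names of connections whose delay falls outside the window."""
    lower, upper = window
    return sorted(name for name, d in delays.items() if not lower <= d <= upper)

delays = {"load0": 1.0, "load1": 0.2}   # hypothetical load names, delays in ns
window = skew_window(delays, max_skew=0.5)
print(window, needs_reroute(delays, window))  # (0.5, 1.0) ['load1']
```

Only the fast connection is rerouted: it must be slowed from 0.2 ns to at least 0.5 ns, which is a lower-bound requirement that a purely delay-minimizing router would not try to meet.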

Specific Delay Window Routing

• Definition: routing a connection with a target delay between some lower and upper bounds

• Lower and upper delay bounds form a “window”

• Allows us to optimize for various timing constraints constituting both lower and upper bounds on delay

Current FPGA Routing Technology

• PathFinder based (McMurchie 1995)

• Basis of VPR router (Betz 1997) and many other academic and commercial FPGA routers

• Effective at balancing routability and timing performance

• Delay cost is the total delay of the connection

• Traditional single-wave search: total delay contains estimation

$\text{Cost}_n = A_{ij}\, d_n + (1 - A_{ij})\, c_n$

where $A_{ij}$ is the criticality factor, $d_n$ is the delay cost, and $c_n$ is the congestion cost
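
As a small illustration of the cost formula, the sketch below blends the delay and congestion terms with the criticality factor. The function name and the sample values are illustrative, and the range check assumes the criticality factor lies in [0, 1], as is usual for PathFinder-style routers.

```python
def node_cost(criticality, delay_cost, congestion_cost):
    """Cost_n = A_ij * d_n + (1 - A_ij) * c_n for one routing node."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * congestion_cost

# A fully critical connection sees only delay; a non-critical one only congestion.
print(node_cost(1.0, 2.0, 5.0))  # 2.0
print(node_cost(0.0, 2.0, 5.0))  # 5.0
print(node_cost(0.5, 2.0, 5.0))  # 3.5
```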

Single-Wave Expansion

• One approach to performing Specific Delay Window Routing

• Single-wave expansion using delay estimation to direct search towards the target delay window

Total Delay = Known Delay + Estimated Delay

Accuracy Issues

• Estimation can be inaccurate

• Sparse crossbar
  • Manhattan distance: 4
  • Estimate: 2 “X2” wires
  • Switch between “X2” wires is missing!

• Swappable pins
  • LUT input pins have different delays
  • Pins are logically equivalent
  • Actual target pin not known during estimation

• Congestion adds variability

[Figure label: Congested Area]
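
The sparse-crossbar case can be made concrete with a toy model. In the sketch below the estimator counts only the “X2” wires needed to cover the Manhattan distance, while the real route also pays for the switch between them. The delay constants are hypothetical values chosen for illustration, not figures from the talk.

```python
X2_WIRE_DELAY = 0.20   # ns per "X2" wire (hypothetical value)
SWITCH_DELAY = 0.05    # ns per switch between wires (hypothetical value)

def wires_needed(distance, span=2):
    """Number of span-2 wires covering a Manhattan distance (ceiling division)."""
    return -(-distance // span)

def estimated_delay(distance):
    # Estimator that counts wires only and ignores switches.
    return wires_needed(distance) * X2_WIRE_DELAY

def actual_delay(distance):
    # Real route: the same wires plus one switch between consecutive wires.
    wires = wires_needed(distance)
    return wires * X2_WIRE_DELAY + max(wires - 1, 0) * SWITCH_DELAY

def total_delay(known, distance):
    """Total Delay = Known Delay + Estimated Delay (single-wave view)."""
    return known + estimated_delay(distance)

error = actual_delay(4) - estimated_delay(4)
print(f"estimation error at distance 4: {error:.2f} ns")
```

An error of this size is harmless when only minimizing delay, but it can push a route outside a narrow delay window.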

Dual-Wave Expansion

• To address these accuracy issues, we propose to use dual-wave search

• Instead of directing one search towards the target, we can expand from both the source and target

• Each time the waves intersect, we check if the resulting path meets the target delay window

• This eliminates estimation from the selection process

Wave from both source and target

Total Delay = Known Delay + Known Delay

Intersection: total delay does not meet target window

Intersection: total delay does meet target window
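
The idea can be sketched on a tiny routing graph. This is a simplified illustration, not the authors' implementation: it runs both waves to completion with Dijkstra's algorithm and then checks every node where they meet, and it does not guard against the two half-paths sharing a node. The graph, node names, and delays are hypothetical.

```python
import heapq

def wave(graph, start):
    """Dijkstra expansion: lowest known delay from start to each reachable node."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > best[node]:
            continue  # stale heap entry
        for nxt, delay in graph.get(node, []):
            nd = d + delay
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return best

def reverse(graph):
    """Reverse every edge so a wave can expand backwards from the target."""
    rev = {}
    for u, edges in graph.items():
        for v, delay in edges:
            rev.setdefault(v, []).append((u, delay))
    return rev

def dual_wave_candidates(graph, source, target, lo, hi):
    """Meeting nodes whose known + known delay lies inside [lo, hi]."""
    from_src = wave(graph, source)
    to_tgt = wave(reverse(graph), target)
    hits = []
    for node in from_src.keys() & to_tgt.keys():   # wave intersections
        total = from_src[node] + to_tgt[node]      # no estimate involved
        if lo <= total <= hi:
            hits.append((node, total))
    return sorted(hits)

# Hypothetical graph: a fast route through "a" and a slow route through "b".
GRAPH = {"s": [("a", 0.2), ("b", 0.5)], "a": [("t", 0.2)], "b": [("t", 0.4)]}
print(dual_wave_candidates(GRAPH, "s", "t", lo=0.5, hi=1.0))
```

With the window [0.5, 1.0] only the intersection at "b" qualifies; the 0.4 ns route through "a" is too fast and is rejected.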

Routing Flow

Normal Routing

• Global Routing
  • Clock routing, other architecture-specific routing

• Detail Routing
  • PathFinder-based

Specific Delay Window Routing

• Skew Optimization
  • Calculate delay windows for constrained connections
  • Perform specific delay window routing on connections in violation of skew constraint

• Hold Timing Optimization
  • Calculate delay windows using slack allocation
  • Perform specific delay window routing on connections in violation of hold timing

Experimental Methodology

• Compared two generations of commercial tools

• Older one uses single-wave expansion with delay estimation

• Newer one uses dual-wave expansion without delay estimation

• Ten designs each tested for hold timing and skew correction

• Customer designs with known violations

• Designs with strict skew constraints

Design  Device Family  LUT4s  Utilization    Design  Device Family  LUT4s  Utilization
N0      MachXO2        4K     70%            H0      LatticeEC      33K    9%
N1      LatticeECP3    150K   11%            H1      LatticeEC      33K    9%
N2      LatticeECP2    20K    45%            H2      MachXO         2K     85%
N3      LatticeECP3    150K   66%            H3      MachXO         2K     85%
N4      ECP5           85K    67%            H4      LatticeECP3    150K   83%
B0      MachXO2        2K     67%            H5      LatticeECP3    150K   14%
B1      LatticeECP2    20K    74%            H6      LatticeECP3    150K   82%
B2      LatticeECP3    70K    41%            H7      LatticeECP3    150K   87%
B3      LatticeECP3    150K   28%            H8      LatticeECP3    150K   76%
B4      ECP5           45K    23%            H9      LatticeECP3    150K   84%

Experimental Results: Skew Correction

• Varied the skew constraint values from 4.0 ns (loose) to 0.0625 ns (very strict) and compared ability to find a solution

[Figure: number of designs passed (out of 10) vs. constraint value (ns), for single-wave and dual-wave expansion]

Constraint Value (ns)  4.0000  2.0000  1.0000  0.5000  0.2500  0.1250  0.0625
Single                 10      10      10      7       3       1       0
Dual                   10      10      10      10      10      8       4

Experimental Results: Skew Correction

• Runtime comparison between dual-wave and single-wave approaches

[Figure: runtime ratio (dual-wave runtime / single-wave runtime), logarithmic axis from 0.1 to 100, vs. constraint value (ns) from 4.0000 to 0.0625; series: Individual Designs, Mean]

Experimental Results: Hold Correction

• Compared ability to solve hold timing violations in 10 designs

• Single-wave approach corrected only 5 of 10

• Dual-wave approach corrected all 10 of 10

[Figure: worst negative slack (ns), axis from -3 to 0, for designs H0 to H9; series: No Correction, Single, Dual]

Experimental Results: Hold Correction

• Runtime not significantly increased using dual-wave approach

• Older tool generation used heuristic method for setup-awareness

• Newer tool generation uses slack allocation, which uses more time

[Figure: runtime ratio (dual-wave runtime / single-wave runtime), axis from 0.0 to 2.0, for designs H0 to H9; series: Individual Designs, Mean]

Conclusion

• Presented specific delay window routing as a generic framework to optimize for constraints constituting lower and upper bounds on delay

• Proposed dual-wave expansion to address accuracy issues with single-wave expansion when targeting a specific delay window

• Future work

• Applying specific delay window routing to more constraints

• Clock-to-output

• Cycle stealing

Multiple Speed Grades

• We consider multiple speed grades for slack allocation

• For setup timing, we use the design speed grade

• For hold timing, we use the fastest speed grade

• Inverted delay windows

• If the lower bound exceeds the upper bound, then we cannot satisfy the setup and hold requirements for the connection
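
The inverted-window check amounts to a single comparison. In this minimal sketch the lower bound is assumed to come from hold slack allocation at the fastest speed grade and the upper bound from setup slack allocation at the design speed grade; the function name and sample values are illustrative.

```python
def delay_window(hold_lower_bound, setup_upper_bound):
    """Combine both bounds (ns); return None when the window is inverted."""
    if hold_lower_bound > setup_upper_bound:
        return None   # no delay can satisfy both setup and hold
    return (hold_lower_bound, setup_upper_bound)

print(delay_window(0.60, 2.17))  # (0.6, 2.17)
print(delay_window(1.20, 1.00))  # None
```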

Slack Allocation

LOOP {
  1. Update timing
  2. Compute slack on each connection
       slack(c) = min(slack(p)), where p is any path containing c
     Stop if the cumulative slack is near zero
  3. Compute the weight for each connection
       weight(c) = delay(c) / max(delay(p)), where p is any path containing c
  4. Allocate slack for each connection
       delay(c) = delay(c) + weight(c) * slack(c)
}
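
The loop above can be sketched in runnable form for setup timing. This is a minimal illustration under stated assumptions, not the tool's implementation: paths are given as explicit lists of connection names, every connection lies on at least one path, all paths share one required delay, and the starting slack is non-negative. The names `allocate_slack`, `paths`, `required`, and `tol` are hypothetical.

```python
def allocate_slack(delays, paths, required, tol=1e-3, max_iters=50):
    """Grow each connection's delay until the slack on its paths is used up.

    The returned delays act as per-connection upper bounds.
    """
    delays = dict(delays)
    for _ in range(max_iters):
        # 1. Update timing: delay and slack of every path.
        path_delay = [sum(delays[c] for c in p) for p in paths]
        path_slack = [required - d for d in path_delay]

        # 2. slack(c) = min slack over paths containing c; also record
        #    the largest path delay through c for the weight in step 3.
        slack, longest = {}, {}
        for p, d, s in zip(paths, path_delay, path_slack):
            for c in p:
                slack[c] = min(slack.get(c, float("inf")), s)
                longest[c] = max(longest.get(c, 0.0), d)
        if sum(slack.values()) < tol:   # cumulative slack near zero
            break

        for c in slack:
            weight = delays[c] / longest[c]    # 3. weight(c)
            delays[c] += weight * slack[c]     # 4. allocate slack
    return delays

start = {"a": 1.0, "b": 0.5, "c": 0.3}         # hypothetical delays in ns
paths = [["a", "b"], ["a", "c"]]
bounds = allocate_slack(start, paths, required=2.0)
print({name: round(d, 2) for name, d in bounds.items()})
```

Because each weight divides by the longest path through the connection, the delay added along any path in one iteration never exceeds that path's slack, so the required delay is approached from below rather than overshot.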

Slack Allocation

[Figure: slack allocation on setup timing for the circled connection, over iterations 1 to 11. One plot shows setup slack (ns) and connection weight vs. iteration; the other shows connection delay (ns) vs. iteration]

Results Analysis

• Why does design “B0” take so much longer with dual-wave expansion?

• Both single-wave and dual-wave approaches fail for this design with a 0.125 ns constraint

• Architectural corner case involving three connections

• Stuck in a scenario where two have equal skew, but another is faster

Future Work

• Clock-to-output constraint

• Relative timing constraint between paths

• Example: 0.5 ns < clock-to-out path – clock-out path < 4 ns

• Cycle stealing

• Add extra delay to clock path to alleviate setup violations