23
1 Fundamental Algorithms for System Modeling, Analysis, and Optimization Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros Tripakis UC Berkeley EECS 144/244 Fall 2014 Copyright © 2010-14, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved Lecture: Timing Analysis, Retiming Thanks to Kurt Keutzer for several slides EECS 144/244, UC Berkeley: 2 RTL Synthesis Flow RTL Synthesis HDL netlist logic optimization netlist Library/ module generators physical design layout HDL Simulation/ Verification K. Keutzer a b s q 0 1 d clk a b s q 0 1 d clk FSM, Verilog, VHDL Boolean circuit/network Boolean circuit/network Graph / Rectangles Timing Analysis

Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

1

Fundamental Algorithms for System Modeling, Analysis, and Optimization

Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros TripakisUC BerkeleyEECS 144/244Fall 2014

Copyright © 2010-14, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved

Lecture: Timing Analysis, Retiming

Thanks to Kurt Keutzer for several slides

EECS 144/244, UC Berkeley: 2

RTL Synthesis Flow

RTLSynthesis

HDL

netlist

logicoptimization

netlist

Library/modulegenerators

physicaldesign

layout

HDLSimulation/ Verification

K. Keutzer

a

b

s

q0

1

d

clk

a

b

s

q0

1

d

clk

FSM,Verilog,VHDL

Boolean circuit/network

Boolean circuit/network

Graph / RectanglesTiming Analysis

Page 2: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

2

EECS 144/244, UC Berkeley: 3

Timing Analysis / Verification

Verifying a property about system timing

Arises in many settings: Integrated circuits

Embedded software

Distributed embedded systems

Biological systems

Here we focus on circuits.

Chapter 15 of Lee-Seshia book contains a more broad discussion.

EECS 144/244, UC Berkeley: 4

Timing Analysis for Digital ICs

(Clock) Speed is one of the major performance metrics for digital circuits

Timing Analysis = the process of verifying that a chip meets its speed requirementE.g., 1 GHz means that next-state function must be

computed within 1 ns

Page 3: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

3

EECS 144/244, UC Berkeley: 5

Determine fastest permissible clock speed (e.g. 1 GHz) by determining delay of longest path from register to register (e.g. 1ns.)

Static Timing Analysis for Circuits

clk

Combinationallogic

clk

Combinationallogic

clk

Combinationallogic

EECS 144/244, UC Berkeley: 6

Cycle Time - Critical Path Delay

Cycle time (T) cannot be smaller than longest path delay (Tmax)

Longest (critical) path delay is a function of:

Total gate, wire delays

clock

Q1 Q2

Tclock1 Tclock2critical path,

~5 logic levels

Tclock1

data

cycle time

maxT T

Page 4: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

4

EECS 144/244, UC Berkeley: 7

Cycle Time - Setup Time

For FFs to correctly latch data, input data must be stable during:

Setup time (Tsetup) beforeclock arrives

clock

Q1 Q2

Tclock1 Tclock2critical path,

~5 logic levels

Tclock1

data

setup time

max setupT T T

EECS 144/244, UC Berkeley: 8

Cycle Time - Clock-skew

clock

Q1 Q2

Tclock1 Tclock2

Tclock2

Tclock1

Q2

data

clock skewQ2

If clock network has unbalanced delay – clock skew

Cycle time is also a function of clock skew (Tskew)

max setup skewT T T T

critical path, ~5 logic levels

Page 5: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

5

EECS 144/244, UC Berkeley: 9

Cycle Time - Clock to Q

Cycle time is also a function of propagation delay of FF (Tclk-to-Q)

Tclk-to-Q : time from arrival of clock signal till change at FF output)

clock

Q1 Q2

Tclock1 Tclock2

Tclock1

Tclock2

Q2clock-to-Q

data

Q2

max setup skew clk to QT T T T T

critical path, ~5 logic levels

EECS 144/244, UC Berkeley: 10

Min Path Delay - Hold Time

For FFs to correctly latch data, input data must be stable during Hold time (Thold) after clock arrives

Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)

clock

Q1 Q2

Tclock1 Tclock2short path, ~3

logic levels

Tclock1

data

hold time

min hold skewT T T

Page 6: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

6

EECS 144/244, UC Berkeley: 11

Summary

Need to be able to compute

Tmin: delay of “fastest” path in the circuit

Tmax: delay of “slowest” (critical) path in the circuit

EECS 144/244, UC Berkeley: 13

Modeling Timing in a Combinational Circuit

Arrival time in green A

C

B

f

2

2

2

1

0

1

0

.20

.20

.20

.10

X

YZ

W

.15

.05

1

.05

Interconnect delay in red

Gate delay in blue

What’s the right mathematical object to use to represent this physical object?

Page 7: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

7

EECS 144/244, UC Berkeley: 14

Modeling - 1

C

B

f

X

Y

W

0.05

.05.1

1

.2

A

.15.20

.20

12

2

2

Z

Use a labeled directed graph

G = <V,E>

Vertices represent gates, primary inputs and primary outputs

Edges represent wires

Labels represent delays

What about arrival times?

A

C

B

f

2

2

2

1

0

1

0

.20

.20

.20

.10

X

YZ

W

.15

.05

1

.05

0

0

1

s

EECS 144/244, UC Berkeley: 15

Modeling - 2

C

B

f

X

Y

W

0

.05.1

1

.2

0

0

1

A

.15.20

.20

12

2

2

Z

Find longest/shortest paths in a directedgraph G = <V,E>

What sort of directed graph do we have?

Is this in the standard form for a longest/shortest path problem?

s

Page 8: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

8

EECS 144/244, UC Berkeley: 16

Split Nodes into Edges

C

B

f

X

Y

W

0

.05.1

1

.2

0

0

1

A

.15.20

.20

12

2

2

Z

22

2 1

0.5 2.5

3

s

EECS 144/244, UC Berkeley: 17

DAG with Weighted Edges

C

B

f

X

Y

W

0

1.050.1

1

0.2

0

0

1

A

2.15

2.2

2.2Z

Problem: Find the longest (critical) path from source s to sink f.

s

Page 9: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

9

EECS 144/244, UC Berkeley: 18

Naïve Approach: Enumerate Paths

C

B

f

X

Y

W

0

1.05.1

1

.2

0

0

1

A

2.15

2.2

2.2Z

Problem: Find the longest path from source s to sink f.

How many paths in this example?In the worst case?

s

EECS 144/244, UC Berkeley: 19

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD](dynamic programming)

Let w(u,v) denote weight of edge from u to v

Steps:

1. Topologically sort vertices (graph is acyclic)

order: v1, v2, …, vn v1 = s, vn = ?

2. For each vertex v, compute

d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)

Page 10: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

10

EECS 144/244, UC Berkeley: 20

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD]

Let w(u,v) denote weight of edge from u to v

Steps:

1. Topologically sort vertices

order: v1, v2, …, vn v1 = s, vn = f

2. For each vertex v, compute

d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)

Time Complexity?O(m+n)

Run the CPM on our example

EECS 144/244, UC Berkeley: 21

Graph vs. Circuit

Delay of a combinational circuit depends on Circuit topology (graph model)

Delay model (e.g., fixed vs. variable)

Boolean behavior of gates

We have only considered circuit topology so far.

Longest/shortest path found on the graph can be very pessimistic: Paths can be FALSE

Delay values are BOUNDS

Page 11: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

11

EECS 144/244, UC Berkeley: 22

False Paths (consider Transition Mode)

Graph model implies path of length 6

A path is false if it cannot be responsible for the delay of a circuit

EECS 144/244, UC Berkeley: 23

False Paths

Graph model implies path of length 6

A path is false if it cannot be responsible for the delay of a circuit

Page 12: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

12

EECS 144/244, UC Berkeley: 24

False vs. True Paths

TRUE path = one that can be responsible for the delay of a circuit

Need techniques to find whether a path is TRUE or FALSE

EECS 144/244, UC Berkeley: 25

The Fixed Delay Model: Constant delay for each gate (or wire)

Page 13: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

13

EECS 144/244, UC Berkeley: 26

Paradoxical Behavior with the Transition Model?

non-monotonicw.r.t. time delays!

EECS 144/244, UC Berkeley: 27

Problems with Fixed Delay + Transition Model

1. Transition model can be tricky to reason about

2. Fixed gate delays are unrealistic, due to manufacturing process variations

More realistic delay model: Lower and upper bounds

Perform timing analysis for a whole family of circuits that share the same lower/upper bounds

Page 14: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

14

EECS 144/244, UC Berkeley: 28

Fixed Delays Bounded Delays

Want algorithms that report the critical path delay

of the slowest circuit in the circuit family

Paper: Computation of Floating Mode Delay in Combinational

Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993

(on bcourses)

EECS 144/244, UC Berkeley: 29

RETIMING

Page 15: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

15

EECS 144/244, UC Berkeley: 30

Retiming Tradeoffs

[Shenoy, 1997]

a

b

6

4

1

1

12

2

2

clock period =

# registers =

EECS 144/244, UC Berkeley: 31

Retiming Tradeoffs

a

b

5

4

1

1

12

2

2

clock period =

# registers =

[Shenoy, 1997]

Page 16: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

16

EECS 144/244, UC Berkeley: 32

Retiming Tradeoffs

a

b 1

1

12

2

2

clock period =

# registers =

4

3

[Shenoy, 1997]

EECS 144/244, UC Berkeley: 33

Retiming Tradeoffs

a

b 1

1

12

2

2

clock period =

# registers =

2

4

[Shenoy, 1997]

Page 17: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

17

EECS 144/244, UC Berkeley: 34

Goals of Retiming

Possible goals:

• Minimize clock period (min-period retiming)

• Minimize number of registers (min-area retiming)

• Minimize number of registers for a target clock period (constrained min-area retiming)

[Shenoy, 1997]

EECS 144/244, UC Berkeley: 35

Abstraction:Circuit Graph

Nodes: circuit elementsNode weights: delay

Dummy node for fanout

Page 18: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

18

EECS 144/244, UC Berkeley: 36

Abstraction:Registers?

EECS 144/244, UC Berkeley: 37

Abstraction - Registers

Arc weights indicate registers

1

1

1

1

0

0

00 0

0 0 0

0

000

00

Page 19: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

19

EECS 144/244, UC Berkeley: 38

Abstraction - Environment

1

1

1

1

0

0

00 0

0 0 0

0

000

0

0

EECS 144/244, UC Berkeley: 39

Cutset – Divides the Graph in Two

1

1

1

1

0

0

00 0

0 0 0

0

000

0

0

Page 20: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

20

EECS 144/244, UC Berkeley: 40

Retiming: Add a register on all arcs crossing the cutset in one direction, and subtract a register from all arcs crossing the cutset in the other direction.

1

1

1-1=0

1-1=0

0

0

00

00+1=1

0

0

000

0

0

0+1=1

EECS 144/244, UC Berkeley: 41

Recall: Retiming Tradeoffs

[Shenoy, 1997]

a

b

6

4

1

1

12

2

2

clock period =

# registers =

Page 21: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

21

EECS 144/244, UC Berkeley: 42

Recall: Retiming Tradeoffs

a

b

5

4

1

1

12

2

2

clock period =

# registers =

[Shenoy, 1997]

EECS 144/244, UC Berkeley: 43

Simplest cutset surrounds one node

1

1

1-1=0

1-1=00

0

00

0

0+1=1 0

0

000

0

0

0

Page 22: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

22

EECS 144/244, UC Berkeley: 44

Recall: Retiming Tradeoffs

a

b

5

4

1

1

12

2

2

clock period =

# registers =

[Shenoy, 1997]

EECS 144/244, UC Berkeley: 45

Recall: Retiming Tradeoffs

a

b 1

1

12

2

2

clock period =

# registers =

4

3

[Shenoy, 1997]

Page 23: Fundamental Algorithms for System Modeling, Analysis, and ...€¦ · Cycle time is also a function of propagation delay of FF (Tclk-to-Q) Tclk-to-Q: time from arrival of clock signal

23

EECS 144/244, UC Berkeley: 46

For each node v, define r (v) = # of registers moved from the outputs to the inputs.

1

1

1-1=0

1-1=00

0

00

0

0+1=1 0

0

000

0

0

0

r (v) = -1

v

A retiming is now an assignment r (v) for every node v such that the weight of every arc is non-negative.

EECS 144/244, UC Berkeley: 47

Rest of the story

Retiming operations = graph transformations

Each graph has a vector of costs: (clock period, # registers, …)

Formulate and solve optimization problem.

Papers (on bcourses):1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems."

Journal of VLSI and Computer Systems: pp. 41-67.

2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry." Algorithmica 6(1): pp. 5-35.

3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI Journal 22: pp. 1-21.