27
ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Embed Size (px)

Citation preview

Page 1: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

ELEC692 VLSI Signal Processing Architecture

Lecture 3Retiming

Page 2: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming - Introduction• Retiming is a transformation technique to change

the locations of delay element such that– The input/output characteristics are kept– The latency of the system is not changed– The critical path of the system is reduced– The number of registers is reduced– The power consumption is reduced

D

D

D

D

retiming

A B(2) (4)

2D

retimingA B

(2) (4)

D

D

Critical path=6, loop bound=6/2=3 Critical path=4, loop bound=6/2=3

Page 3: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

An IIR Example

)()3()2(

)()1()(

)2()1()(

nxnbynay

nxnwny

nbynaynw

)()3()2(

)()1()1()(

)2()(

)1()(

21

2

1

nxnbynay

nxnwnwny

nbynw

naynw

retiming

(1)

(2)(1)

(2)

x(n) y(n)

a

b

w(n) 2DDD

(1)

(2)(1)

(2)

x(n) y(n)

a

bw1(n)

2DD

D

w2(n)

D

Critical path delay = 3 unit# of register = 4 Critical path delay = 2 unit

# of register = 5

retiming

Page 4: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for power consumption

• Placing registers at the input of nodes with large capacitances can reduce the switching activities at these nodes, which can lead to low-power solution.

Page 5: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Quantitative Description of retiming• Retiming maps a circuit (graph) G to a retimed circuit (graph)

Gr.• A retiming solution is characterized by a value r(V) for each

node V in the graph G.– R(V): the number of delays moved from the output edge of the

node V to each of its input edges• Let w(e) denote the weight of the edge e in the original graph G,

and let wr(e) denote the weight of the edge e in the retimed graph Gr. The weight of the edge U->V in the retimed graph is computed by– wr(e) = w(e) + r(V) – r(U)

• The IIR example1

2 3

4

(1)

(1) (2)

(2)

2DDD

1

2 3

4

(1)

(1) (2)

(2)

2DD

D

D

r(1)=0,r(2)=1r(3)=0r(4)=0

A retiming solution is feasible if wr(e) >=0 holds for all edges

If r(1)=0,r(2)=-1,r(3)=0 and r(4)=0, the retiming solution is not feasible as wr(3->2) will be equal to -1.

Page 6: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Properties of Retiming• Paths: a sequence of nodes and edges,

ke

keee VVVV kk

1210

110

• Cycle: also a sequence of nodes and edges,

Weight of the path p:

1

0

)()(k

iiewpw

011012110 VVVV kk e

keee

Weight of the path p:

1

0

)()(k

iiewpw

delay of the path p:

k

iiVtpt

0

)()(

delay of the path p:

k

iiVtpt

0

)()(

Page 7: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Properties of Retiming

• Property 1: The weight of the retimed path p=V0->V1->…->Vk is given by wr(p)=w(p)+r(Vk)-r(V0).

1

00

1

0

1

01

1

01

1

0

)()()()()()(

))()()(()()(

k

ik

k

i

k

iiii

k

iiii

k

iirr

VrVrpwVrVrew

VrVrewewpw

• Property 2: Retiming does not change the number of delays in a cycle.

• Property 3: Retiming does not alter the iteration bound in a DFG.

• Property 4: Adding the constant value j to the retiming value of each node does not change the mapping from G to Gr.

Page 8: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming Technique

• Cutset retiming and pipelining

• Retiming to minimize the clock period and retiming to minimize the number of registers

Page 9: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Cutset Retiming and Pipelining

• A cutset is a set of edges that can be removed from the graph to create 2 disconnected subgraphs.

• Only affects the weights of the edges in the cutset.

• If the 2 disconnected subgraphs are G1 and G2, cutset retiming consists of adding k delays to each edge from G1 to G2 and removing k delays from each edge from G2 to G1.

+kD

+kD

-kD

1

2 3

4

(1)

(1) (2)

(2)

2DDD

1

2 3

4

D

G1

1

2 3

4

(1)

(1) (2)

(2)

3DD

D

Page 10: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Feasibility of cutset retiming

• Feasibility: for the retimed graph, wr(e) >=0 must hold for all edges e in Fr.

• Cutset retiming by adding k delay, from edges from G1 to G2, we have

• From edges from G2 to G1, we have

• Combining two, we have

0)(0)( 2,12,1 kewewr

0)(0)( 1,21,2 kewewr

)()( minmin1221

ewkewGGGG ee

Page 11: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Node retiming• Cutset retiming – If one of the subgraph G2 is a

single node and the other subgraph is the rest of the graph minus the edges going into and out of the chosen node. We call this node retiming

1

2 3

4

(1)

(1) (2)

(2)

2DDD

1

2 3

4

2DD

G2

G1

1

2 3

4

(1)

(1) (2)

(2)

2DD

D

D

Subtracting one delay from each edge outgoing from the nodeAdding one delay to each edge incident into the node

Noderetiming

Page 12: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Node Retiming• To recap the feasibility:

– Feasibility constraints: for each edge U->V of the retimed graph, the number of delay elements, i.e. the weight of the edge must be non-negative, we have

– Where wr(e) and w(e) are the number of delay elements on the edge U->V after and before retiming, respectively

– For a circuit graph, there is a retiming design space defined by the system of inequalities (each edge forms an inequality.

– This requires solving systems of inequalities

)()()(

0)()()()(

ewVrUr

UrVrewewr

Page 13: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Example again

024

023

112

241

131

1

2 3

4

(1)

(1) (2)

(2)

2DDD

1

2 3

4

2DD

G2

G1

Original weights

Choosing retiming values:r(1)=0,r(2)=1,r(3)=0,r(4)=0, G2=1,G1=0

wr(e)=w(e)+r(V)-r(U)

101024

101023

010112

200241

100131

1

2 3

4

(1)

(1) (2)

(2)

2DD

D

D

Page 14: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Pipelining – Feedforward Cutset• Pipelining – special case of cutset retiming, where

there are not edges in the cutset from the subgraph G2 to the subgraph G1, termed as feedforward cutset.

a a a

In D D

cutset a a a

In D D G1

G2

Retime with k=2

a a a

In D D

2D2D 2D

Page 15: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Example-Lattice Filter

D D

D

Critical path

Tcritical = 2Tmult+(N+1)Tadd where N is the number of stages

Cutset retiming

D D

D

D

New critical path

Tcritical = 4Tmult+4Tadd , a constant independent of # of stages, N

Page 16: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

N slow-down with retiming• Cutset retiming is used in combination with slow-

down.– Replace each delay in the graph with N delays to create an

N-slow version of the graph– Perform cutset retiming on the N-slow graph– N-1 null operations must be interleaved after each useful

signal sample to preserve the functionality of the algorithm

1 2(1) (1)

D1 2

(1) (1)

2D2-slowtransformation

Tclk=2Titer=2

Tclk=2Titer=2*2 = 4

1 2(1) (1)

D

DRetiming of the2-slow graph

Tclk=1Titer=2

Page 17: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Another example- A 100-stage lattice filter

D D

D

Critical path

……

……D

Stage 1 Stage 2 Stage 100

Tcritical = 2Tmult+(N+1)Tadd where N is the number of stagesIn this case it is 2Tmult+101*Tadd

2D 2D

2D

Critical path

……

……2D

Stage 1 Stage 2 Stage 100

2-slow transformation

Page 18: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Another example- A 100-stage lattice filter

2D 2D

2D

Critical path

……

……2D

Stage 1 Stage 2 Stage 100

Inserting null operation to preserve behavior

Cutsetretiming

D D

2D

Critical path

……

……D

Stage 1 Stage 2 Stage 100

D D D

Tcritical = 2Tmult+2 Tadd Since the circuit is 2-slow, the sample period is 2*Tcritical

Page 19: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for clock period minimization

• Minimum feasible clock period (G):– Computation time of the critical path, which is the path with the

longest computation time among all paths with no delays (G)= max{t(p):w(p)=0} (w(p) is the # of delay of the path.

• The problem: Given a circuit graph G, find the best retimed circuit graph Gr, for all possible and feasible retiming choices, such that (Gr) is minimum, i.e. find a retiming solution r0 such that (Gr0) <= (Gr) for any other retiming solution r.

• Definition:– W(U,V)- minimum # of registers on any path from node U to node

V

– D(U,V)- maximum computation time among all paths from U to V with weight W(U,V) }:)(min{),( VUpwVUW p

)},()(:)(max{),( VUWpwandVUptVUD

Page 20: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for clock period minimization (Cont.)

• Overall algorithm:– Compute W(U,V) and D(U,V)– From the W(U,V) and D(U,V), determie if there is a retiming

solution that can achieve a desired clock period. Given a clock period c, find a feasible retiming solution such that (G)< c if the following constraints hold

• Feasibility constraint: r(U)-r(V) <=w(e) for every edge U->V of G, (make sure # of delays on each edge in the retimed graph is nonnegative

• Critical path constraint: r(U)-r(V) <= W(U,V)-1 for all vertices U,V in G such that D(U,V)>c,

– If there is a retimed solution, we can try a smaller c until there is not retimed solution.

Page 21: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

How to compute W(U,V) and D(U,V)

1. Let M=tmaxn, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G.

2. Form a new graph Gn which is the same as G except the edge weights are replaced by w’(e)=Mw(e)-t(U) for all edges U->V.

3. Solve the all-pair shortest path problem on G’. Let S’UV be the shortest path from U->V.

4. If U /= V, then and D(U,V)= MW(U,V)-S’UV+t(V). If U=V, then W(U,V)=0 and D(U,V)=t(U).

M

SVUW UV

'

),(

Page 22: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Example

W(U,V) 1 2 3 4

1 0 1 1 2

2 1 0 2 3

3 1 0 0 3

4 1 0 2 0

S’UV 1 2 3 4

1 12 5 7 15

2 7 12 14 22

3 5 -2 12 20

4 5 -2 12 20

1

2 3

4

(1)

(1) (2)

(2)

2DDD

Step 1: tmax = 2, n = 4, hence M = 8

1

2 3

4

(1)

(1) (2)

(2)

1577

-2

-2

Step 2: The new graph G’ is shown here

Step 3: The shortest path can be found by standard shortest path algorihtm and the value of S’UV is as follows:

Step 4: The W(U,V) and D(U,V) can be found as follows

D(U,V) 1 2 3 4

1 1 4 3 3

2 2 1 4 4

3 4 3 2 6

4 4 3 6 2

Page 23: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Example• Given the values of W(U,V) and D(U,V), determine

whether there is a retiming solution that can achieved a desired clock period c.

• Let c = 3 in the previous example:

0)2()4(

0)2()3(

1)1()2(

2)4()1(

1)3()1(

rr

rr

rr

rr

rr

Feasibility constraints for all edges: Critical path constraints for all edges:

1)3()4(

0)1()4(

2)4()3(

0)1()3(

2)4()2(

1)3()2(

0)2()1(

rr

rr

rr

rr

rr

rr

rr

Feasiblesolution

r(1)=r(2)=r(3)=r(4)=0i.e. no retiming is needed

Page 24: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Example (cont.)• Now we try c = 2

0)2()4(

0)2()3(

1)1()2(

2)4()1(

1)2()1(

rr

rr

rr

rr

rr

Feasibility constraints for all edges:

Critical path constraints for all edges:

1)3()4(

1)2()4(

0)1()4(

2)4()3(

1)2()3(

0)1()3(

2)4()2(

1)3()2(

1)4()1(

0)3()1(

0)2()1(

rr

rr

rr

rr

rr

rr

rr

rr

rr

rr

rr

Feasiblesolution

r(1)=-1,r(2)=0,r(3)=-1, r(4)=-1

The retimedsolution1

2 3

4

(1)

(1) (2)

(2)

2DD

D

D

Page 25: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for register minimization• Find a retiming solution that uses the minimum # of

registers while satisfying the clock period constraints, i.e. find a feasible solution with minimum # of registers in the design space constrained by the feasibility and critical path constraints.

• Maximum fanout observation: if a node has several output edges carrying the same signal, the number of registers to implement these edge is the maximum number of registers on any one of the edge.

U

V1D

V23D

V37D

U

V1

D V32D

V21

4D

Page 26: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for register minimization (cont.)• # of registers required to implement the output edges

of the node V in the retimed graph is

• The total register cost in the retimed circuit is

• The formulation of retiming to minimize the # of register under the constriant that the clock period is not greater than c is: Minimize COST subject to1. (fanout constraint) RV>= wr(e) for all V and all edges coming

from V.2. (feasibility constraint): r(U)-r(V) <=w(e) for every edge U->V3. (clock period constraint): r(U)-r(V) <= w(U,V)-1 for all nodes

U,V such that D(U,V) >c

)}({max?

ewR rV

V e

VRCOST

Page 27: ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming for register minimization (cont.)

• The above formulation can be solved by using Interger Linear Programming (ILP)

• The details of solving ILP will not be covered in this course.