Upload
rosemary-osborne
View
309
Download
7
Embed Size (px)
Citation preview
ELEC692 VLSI Signal Processing Architecture
Lecture 3Retiming
Retiming - Introduction• Retiming is a transformation technique to change
the locations of delay element such that– The input/output characteristics are kept– The latency of the system is not changed– The critical path of the system is reduced– The number of registers is reduced– The power consumption is reduced
D
D
D
D
retiming
A B(2) (4)
2D
retimingA B
(2) (4)
D
D
Critical path=6, loop bound=6/2=3 Critical path=4, loop bound=6/2=3
An IIR Example
)()3()2(
)()1()(
)2()1()(
nxnbynay
nxnwny
nbynaynw
)()3()2(
)()1()1()(
)2()(
)1()(
21
2
1
nxnbynay
nxnwnwny
nbynw
naynw
retiming
(1)
(2)(1)
(2)
x(n) y(n)
a
b
w(n) 2DDD
(1)
(2)(1)
(2)
x(n) y(n)
a
bw1(n)
2DD
D
w2(n)
D
Critical path delay = 3 unit# of register = 4 Critical path delay = 2 unit
# of register = 5
retiming
Retiming for power consumption
• Placing registers at the input of nodes with large capacitances can reduce the switching activities at these nodes, which can lead to low-power solution.
Quantitative Description of retiming• Retiming maps a circuit (graph) G to a retimed circuit (graph)
Gr.• A retiming solution is characterized by a value r(V) for each
node V in the graph G.– R(V): the number of delays moved from the output edge of the
node V to each of its input edges• Let w(e) denote the weight of the edge e in the original graph G,
and let wr(e) denote the weight of the edge e in the retimed graph Gr. The weight of the edge U->V in the retimed graph is computed by– wr(e) = w(e) + r(V) – r(U)
• The IIR example1
2 3
4
(1)
(1) (2)
(2)
2DDD
1
2 3
4
(1)
(1) (2)
(2)
2DD
D
D
r(1)=0,r(2)=1r(3)=0r(4)=0
A retiming solution is feasible if wr(e) >=0 holds for all edges
If r(1)=0,r(2)=-1,r(3)=0 and r(4)=0, the retiming solution is not feasible as wr(3->2) will be equal to -1.
Properties of Retiming• Paths: a sequence of nodes and edges,
ke
keee VVVV kk
1210
110
• Cycle: also a sequence of nodes and edges,
Weight of the path p:
1
0
)()(k
iiewpw
011012110 VVVV kk e
keee
Weight of the path p:
1
0
)()(k
iiewpw
delay of the path p:
k
iiVtpt
0
)()(
delay of the path p:
k
iiVtpt
0
)()(
Properties of Retiming
• Property 1: The weight of the retimed path p=V0->V1->…->Vk is given by wr(p)=w(p)+r(Vk)-r(V0).
1
00
1
0
1
01
1
01
1
0
)()()()()()(
))()()(()()(
k
ik
k
i
k
iiii
k
iiii
k
iirr
VrVrpwVrVrew
VrVrewewpw
• Property 2: Retiming does not change the number of delays in a cycle.
• Property 3: Retiming does not alter the iteration bound in a DFG.
• Property 4: Adding the constant value j to the retiming value of each node does not change the mapping from G to Gr.
Retiming Technique
• Cutset retiming and pipelining
• Retiming to minimize the clock period and retiming to minimize the number of registers
Cutset Retiming and Pipelining
• A cutset is a set of edges that can be removed from the graph to create 2 disconnected subgraphs.
• Only affects the weights of the edges in the cutset.
• If the 2 disconnected subgraphs are G1 and G2, cutset retiming consists of adding k delays to each edge from G1 to G2 and removing k delays from each edge from G2 to G1.
+kD
+kD
-kD
1
2 3
4
(1)
(1) (2)
(2)
2DDD
1
2 3
4
D
G1
1
2 3
4
(1)
(1) (2)
(2)
3DD
D
Feasibility of cutset retiming
• Feasibility: for the retimed graph, wr(e) >=0 must hold for all edges e in Fr.
• Cutset retiming by adding k delay, from edges from G1 to G2, we have
• From edges from G2 to G1, we have
• Combining two, we have
0)(0)( 2,12,1 kewewr
0)(0)( 1,21,2 kewewr
)()( minmin1221
ewkewGGGG ee
Node retiming• Cutset retiming – If one of the subgraph G2 is a
single node and the other subgraph is the rest of the graph minus the edges going into and out of the chosen node. We call this node retiming
1
2 3
4
(1)
(1) (2)
(2)
2DDD
1
2 3
4
2DD
G2
G1
1
2 3
4
(1)
(1) (2)
(2)
2DD
D
D
Subtracting one delay from each edge outgoing from the nodeAdding one delay to each edge incident into the node
Noderetiming
Node Retiming• To recap the feasibility:
– Feasibility constraints: for each edge U->V of the retimed graph, the number of delay elements, i.e. the weight of the edge must be non-negative, we have
– Where wr(e) and w(e) are the number of delay elements on the edge U->V after and before retiming, respectively
– For a circuit graph, there is a retiming design space defined by the system of inequalities (each edge forms an inequality.
– This requires solving systems of inequalities
)()()(
0)()()()(
ewVrUr
UrVrewewr
Example again
024
023
112
241
131
1
2 3
4
(1)
(1) (2)
(2)
2DDD
1
2 3
4
2DD
G2
G1
Original weights
Choosing retiming values:r(1)=0,r(2)=1,r(3)=0,r(4)=0, G2=1,G1=0
wr(e)=w(e)+r(V)-r(U)
101024
101023
010112
200241
100131
1
2 3
4
(1)
(1) (2)
(2)
2DD
D
D
Pipelining – Feedforward Cutset• Pipelining – special case of cutset retiming, where
there are not edges in the cutset from the subgraph G2 to the subgraph G1, termed as feedforward cutset.
a a a
In D D
cutset a a a
In D D G1
G2
Retime with k=2
a a a
In D D
2D2D 2D
Example-Lattice Filter
D D
D
Critical path
Tcritical = 2Tmult+(N+1)Tadd where N is the number of stages
Cutset retiming
D D
D
D
New critical path
Tcritical = 4Tmult+4Tadd , a constant independent of # of stages, N
N slow-down with retiming• Cutset retiming is used in combination with slow-
down.– Replace each delay in the graph with N delays to create an
N-slow version of the graph– Perform cutset retiming on the N-slow graph– N-1 null operations must be interleaved after each useful
signal sample to preserve the functionality of the algorithm
1 2(1) (1)
D1 2
(1) (1)
2D2-slowtransformation
Tclk=2Titer=2
Tclk=2Titer=2*2 = 4
1 2(1) (1)
D
DRetiming of the2-slow graph
Tclk=1Titer=2
Another example- A 100-stage lattice filter
D D
D
Critical path
……
……D
Stage 1 Stage 2 Stage 100
Tcritical = 2Tmult+(N+1)Tadd where N is the number of stagesIn this case it is 2Tmult+101*Tadd
2D 2D
2D
Critical path
……
……2D
Stage 1 Stage 2 Stage 100
2-slow transformation
Another example- A 100-stage lattice filter
2D 2D
2D
Critical path
……
……2D
Stage 1 Stage 2 Stage 100
Inserting null operation to preserve behavior
Cutsetretiming
D D
2D
Critical path
……
……D
Stage 1 Stage 2 Stage 100
D D D
Tcritical = 2Tmult+2 Tadd Since the circuit is 2-slow, the sample period is 2*Tcritical
Retiming for clock period minimization
• Minimum feasible clock period (G):– Computation time of the critical path, which is the path with the
longest computation time among all paths with no delays (G)= max{t(p):w(p)=0} (w(p) is the # of delay of the path.
• The problem: Given a circuit graph G, find the best retimed circuit graph Gr, for all possible and feasible retiming choices, such that (Gr) is minimum, i.e. find a retiming solution r0 such that (Gr0) <= (Gr) for any other retiming solution r.
• Definition:– W(U,V)- minimum # of registers on any path from node U to node
V
– D(U,V)- maximum computation time among all paths from U to V with weight W(U,V) }:)(min{),( VUpwVUW p
)},()(:)(max{),( VUWpwandVUptVUD
Retiming for clock period minimization (Cont.)
• Overall algorithm:– Compute W(U,V) and D(U,V)– From the W(U,V) and D(U,V), determie if there is a retiming
solution that can achieve a desired clock period. Given a clock period c, find a feasible retiming solution such that (G)< c if the following constraints hold
• Feasibility constraint: r(U)-r(V) <=w(e) for every edge U->V of G, (make sure # of delays on each edge in the retimed graph is nonnegative
• Critical path constraint: r(U)-r(V) <= W(U,V)-1 for all vertices U,V in G such that D(U,V)>c,
– If there is a retimed solution, we can try a smaller c until there is not retimed solution.
How to compute W(U,V) and D(U,V)
1. Let M=tmaxn, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G.
2. Form a new graph Gn which is the same as G except the edge weights are replaced by w’(e)=Mw(e)-t(U) for all edges U->V.
3. Solve the all-pair shortest path problem on G’. Let S’UV be the shortest path from U->V.
4. If U /= V, then and D(U,V)= MW(U,V)-S’UV+t(V). If U=V, then W(U,V)=0 and D(U,V)=t(U).
M
SVUW UV
'
),(
Example
W(U,V) 1 2 3 4
1 0 1 1 2
2 1 0 2 3
3 1 0 0 3
4 1 0 2 0
S’UV 1 2 3 4
1 12 5 7 15
2 7 12 14 22
3 5 -2 12 20
4 5 -2 12 20
1
2 3
4
(1)
(1) (2)
(2)
2DDD
Step 1: tmax = 2, n = 4, hence M = 8
1
2 3
4
(1)
(1) (2)
(2)
1577
-2
-2
Step 2: The new graph G’ is shown here
Step 3: The shortest path can be found by standard shortest path algorihtm and the value of S’UV is as follows:
Step 4: The W(U,V) and D(U,V) can be found as follows
D(U,V) 1 2 3 4
1 1 4 3 3
2 2 1 4 4
3 4 3 2 6
4 4 3 6 2
Example• Given the values of W(U,V) and D(U,V), determine
whether there is a retiming solution that can achieved a desired clock period c.
• Let c = 3 in the previous example:
0)2()4(
0)2()3(
1)1()2(
2)4()1(
1)3()1(
rr
rr
rr
rr
rr
Feasibility constraints for all edges: Critical path constraints for all edges:
1)3()4(
0)1()4(
2)4()3(
0)1()3(
2)4()2(
1)3()2(
0)2()1(
rr
rr
rr
rr
rr
rr
rr
Feasiblesolution
r(1)=r(2)=r(3)=r(4)=0i.e. no retiming is needed
Example (cont.)• Now we try c = 2
0)2()4(
0)2()3(
1)1()2(
2)4()1(
1)2()1(
rr
rr
rr
rr
rr
Feasibility constraints for all edges:
Critical path constraints for all edges:
1)3()4(
1)2()4(
0)1()4(
2)4()3(
1)2()3(
0)1()3(
2)4()2(
1)3()2(
1)4()1(
0)3()1(
0)2()1(
rr
rr
rr
rr
rr
rr
rr
rr
rr
rr
rr
Feasiblesolution
r(1)=-1,r(2)=0,r(3)=-1, r(4)=-1
The retimedsolution1
2 3
4
(1)
(1) (2)
(2)
2DD
D
D
Retiming for register minimization• Find a retiming solution that uses the minimum # of
registers while satisfying the clock period constraints, i.e. find a feasible solution with minimum # of registers in the design space constrained by the feasibility and critical path constraints.
• Maximum fanout observation: if a node has several output edges carrying the same signal, the number of registers to implement these edge is the maximum number of registers on any one of the edge.
U
V1D
V23D
V37D
U
V1
D V32D
V21
4D
Retiming for register minimization (cont.)• # of registers required to implement the output edges
of the node V in the retimed graph is
• The total register cost in the retimed circuit is
• The formulation of retiming to minimize the # of register under the constriant that the clock period is not greater than c is: Minimize COST subject to1. (fanout constraint) RV>= wr(e) for all V and all edges coming
from V.2. (feasibility constraint): r(U)-r(V) <=w(e) for every edge U->V3. (clock period constraint): r(U)-r(V) <= w(U,V)-1 for all nodes
U,V such that D(U,V) >c
)}({max?
ewR rV
V e
VRCOST
Retiming for register minimization (cont.)
• The above formulation can be solved by using Interger Linear Programming (ILP)
• The details of solving ILP will not be covered in this course.