CS 150 – Spring 2008 – Lec #15: Ckt Performance - 1
Circuit Performance and Adders
Recap from last time
• Hardware design is complicated because we want circuits to go fast
• Combinational logic: used a simple model of delay
  • Integer delay on each gate
  • Reduction of circuit to a directed acyclic graph
  • Delay of circuit (= clock period) is the longest path in the graph
• Making circuits go fast = shortening the longest path
  • Exploit asymmetry between path lengths
  • Shorten the longest path by introducing redundant logic, or by moving logic from long to short paths
We will see a different technique today!
Delay Model of a Circuit
Translate circuit into graph
Weights on nodes are delay through gates
Delay through circuit is longest path through graph
Easy, linear-time algorithm
[Figure: an example circuit with nodes A, B, C, D and the equivalent graph, weighted by gate delays of 1 and 2]
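The longest-path computation can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `circuit_delay`, the dictionary encoding of the graph, and the four-gate example are all assumptions made here (the example is hypothetical and is not a transcription of the slide's figure).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def circuit_delay(gate_delay, fanin):
    """Longest node-weighted path through a combinational DAG.

    gate_delay: gate name -> integer delay of that gate
    fanin:      gate name -> list of gates that feed it
    Returns (delay of the circuit, arrival time at each gate output).
    """
    graph = {g: fanin.get(g, []) for g in gate_delay}
    arrival = {}
    # static_order() yields every gate after all of its predecessors,
    # so each gate and each edge is visited once (linear time).
    for g in TopologicalSorter(graph).static_order():
        latest_input = max((arrival[p] for p in fanin.get(g, [])), default=0)
        arrival[g] = latest_input + gate_delay[g]
    return max(arrival.values()), arrival

# Hypothetical circuit: A (delay 2) and B (delay 1) feed C, which feeds D.
delay, arrival = circuit_delay(
    {"A": 2, "B": 1, "C": 1, "D": 1},
    {"C": ["A", "B"], "D": ["C"]},
)
print(delay)  # 4, along the path A -> C -> D
```

Processing gates in topological order is what makes the algorithm linear: the arrival time of a gate is final by the time any of its fanouts needs it.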
Circuit Performance Model
[Figure: latches feeding combinational logic, which feeds a second set of latches]
Inputs stabilize at t = 0
Logic finishes when last output stabilizes
Circuit Performance Model
Outputs of latches are stable only at clock edge
Inputs to latches must be stable by next clock edge
Time between clock edges must be > delay of combinational logic
Adders
Highly-studied circuit, so a case study in design
“Ripple-carry” adder: standard adder where carry ripples from one bit to another
• Longest path for n-bit adder is O(n)
• Number of gates for n-bit adder is O(n)
“Carry Lookahead”: accelerate carry chain
• Collapse carry into all bits
• O(log n) delay (optimal!)
• O(n^3) gates (terrible!)
Practical compromise is block-accelerated adders
• Block-carry lookahead
• Carry-select adder
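As a baseline for the faster adders, a bit-level ripple-carry adder might look like the sketch below. The names `full_adder` and `ripple_add` are illustrative choices, not from the lecture; the loop makes the O(n) carry chain explicit, since bit i cannot finish until the carry from bit i-1 arrives.

```python
def full_adder(a, b, cin):
    """One-bit full adder built from propagate (p) and generate (g)."""
    p = a ^ b          # propagate: carry-in passes through this bit
    g = a & b          # generate: this bit produces a carry by itself
    return p ^ cin, g | (p & cin)   # (sum bit, carry-out)

def ripple_add(a, b, n, cin=0):
    """Add two n-bit unsigned values; the carry ripples bit by bit."""
    total, carry = 0, cin
    for i in range(n):  # n full adders in series: O(n) gates, O(n) delay
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_add(0xFFD, 0x011, 12))  # (14, 1): sum 0x00e, carry-out 1
```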
Hierarchical Carry lookahead
PG, GG used as propagate, generate inputs to hierarchical block
[Figure: three m-bit CLA adders, producing (PG0, GG0), (PG1, GG1), and (PG2, GG2), all feeding a carry-lookahead block]
Synopsis of Hierarchical Carry-Lookahead
n-bit adder, m-bit blocks, n/m blocks
Delay is 2 log n + 2 log m
Size is max(nm^2, (n/m)^3)
Best is m = n^(2/5)
Delay is 14/5 log n, size is O(n^(9/5))
Analysis of the Carry-Lookahead Adder
n bit adder, m-bit blocks, n/m blocks
Delay through the adder: 2 * delay through the lookahead block + delay through the super-lookahead block
• Lookahead block: 2 log m
• Super-block: 2 log (n/m) = 2 log n – 2 log m
• Total: 2 log n + 2 log m
Logic: scales like the lookahead blocks
• Size p block: O(p^3) from before
• Two sizes of blocks: n/m blocks of size m, one block of size n/m
• Total: n/m * m^3 = nm^2, plus (n/m)^3
• Choose m to minimize max(nm^2, (n/m)^3)
• Solution at m = n^(2/5). Total is O(n^(9/5))
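The choice of block size can be checked with a short derivation. This is a sketch that only uses the cost model stated on the slide (lookahead delay 2 log p and size p^3 for a p-bit block); nothing else is assumed.

```latex
\begin{align*}
\text{delay}(m) &= 2\,(2\log m) + 2\log\frac{n}{m} = 2\log n + 2\log m,\\
\text{size}(m)  &= \max\!\left(n m^{2},\ \left(\tfrac{n}{m}\right)^{3}\right).
\end{align*}
% n m^2 grows with m while (n/m)^3 shrinks, so the maximum is
% smallest where the two terms are equal:
\begin{align*}
n m^{2} = \frac{n^{3}}{m^{3}}
  \;\Longrightarrow\; m^{5} = n^{2}
  \;\Longrightarrow\; m = n^{2/5},\\
\text{size} = n \cdot n^{4/5} = n^{9/5},\qquad
\text{delay} = 2\log n + \tfrac{4}{5}\log n = \tfrac{14}{5}\log n.
\end{align*}
```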
Carry Select Adder
“Combinational Speculative Execution”
Basic intuition: Adders spend time waiting to see what carry-in is
Therefore:
• Go ahead and guess each way
• Pick the right answer when the carry comes by
Carry-Select adder
Each block is doubled
• One copy computes with carry-in = 0, the other with carry-in = 1
• Actual carry-in (carry-out from previous block) selects the result
[Figure: a chain of m-bit blocks; each block produces m sum bits and 1 carry-out bit for both carry-in guesses, and a 0/1 multiplexer controlled by the previous block's carry picks between them]
Analysis of Carry-Select Adder
Delay analysis:
• Worst-case path is through Block 0, then the control of the multiplexer chain
• O(m) gates in Block 0
• O(p = n/m) gates in multiplexer chain
[Figure: Block 0 followed by doubled blocks: Block 1 (carry-in 0 and 1), Block 2 (carry-in 0 and 1), ..., Block p (carry-in 0 and 1)]
Choose m to minimize max(n/m, m)
Minimum is to choose m = √n
Twelve-bit Carry-Select Example
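Problem: add -3 (0xffd, 111111111101) to 17 (0x011, 000000010001)
Use 4-bit carry-select blocks
• Block 0 (bits 3-0): 0xd + 0x1 = 0xe, carry-out 0
• Block 1 (bits 7-4): 0xf + 0x1 gives sum 0x0, carry-out 1 for carry-in 0, and sum 0x1, carry-out 1 for carry-in 1; the actual carry-in is 0, so select sum 0x0, carry-out 1
• Block 2 (bits 11-8): 0xf + 0x0 gives sum 0xf, carry-out 0 for carry-in 0, and sum 0x0, carry-out 1 for carry-in 1; the actual carry-in is 1, so select sum 0x0
Result is 0x00e (14)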
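The twelve-bit example can be replayed with a small behavioural model of a carry-select adder. This is an illustrative sketch: `add_block` and `carry_select_add` are names invented here, and the model doubles every block (including block 0) for simplicity, whereas real hardware needs only one copy of the first block.

```python
def add_block(a, b, cin, m):
    """m-bit block adder: returns (m-bit sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << m) - 1), total >> m

def carry_select_add(a, b, n, m):
    """Add two n-bit unsigned values using n/m carry-select blocks."""
    mask = (1 << m) - 1
    result, carry = 0, 0
    for k in range(n // m):
        a_k = (a >> (k * m)) & mask
        b_k = (b >> (k * m)) & mask
        # Speculate: both results are computed without waiting for the carry.
        guess0 = add_block(a_k, b_k, 0, m)
        guess1 = add_block(a_k, b_k, 1, m)
        # The incoming carry only drives the select line of a multiplexer.
        s, carry = guess1 if carry else guess0
        result |= s << (k * m)
    return result, carry

a = -3 & 0xFFF   # 0xffd, two's-complement encoding of -3 in 12 bits
b = 17           # 0x011
print(carry_select_add(a, b, 12, 4))  # (14, 1): sum 0x00e, carry-out discarded
```

In hardware the two guesses of every block are evaluated in parallel, so the only serial work after block 0 is the chain of multiplexer selects.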
Hardware for the Carry Select Adder
√n blocks, each of √n gates
Additional hardware is √n multiplexers + an additional adder for each block but the first
• n − √n additional adder bits
Therefore √n + 2n − √n = 2n gates
Exactly twice the size of an ordinary adder, but delay is √n instead of n
Carry-Bypass Adder
Like the carry-select adder, has O(√n) delay
But even more efficient (in terms of gates) than the carry-select
Has only n + √n log n gates
However, it broke every timing analyzer…
Instead of shortening the longest path, made it longer!
How can this be? Isn’t the delay of the circuit the length of the longest path?...
What is the delay of the Circuit?
The delay of a circuit is the time that the last output settles
This can be the length of the longest path, but sometimes isn’t
The longest path is an upper bound on the delay of the circuit, but sometimes this isn’t tight
Example
Long paths are from x, y -> out through the bottom of the circuit
But no signal can travel down these paths!
[Figure: example circuit with inputs x, y, z and gate delays of 1 and 2]
Example
[Figure: the same circuit annotated with signal values and the times at which they settle, from t = 0 to t = 6]
Timing Analysis
[Figure: the same example circuit with inputs x, y, z]
x  y  z  delay
0  0  0  6 (z->out)
0  0  1  5 (z->z’->out)
0  1  0  6 (z->out)
0  1  1  5 (z->z’->out)
1  0  0  6 (z->out)
1  0  1  6 (y->out)
1  1  0  6 (z->out)
1  1  1  6 (x,y->out)
Longest path is 8, but no signal ever travels down it!
What happened?
Long paths are false
• A->B requires z=1
• B->C requires z=0
• Conflict! No signal can propagate down this path
This analysis doesn’t quite work
• Analysis has to take into account delays
• Complete theory not understood till 1993
This is good enough for carry-bypass adder
[Figure: the example circuit with points A, B, C marked along the long path]
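The gap between the longest path and the real delay can be reproduced on a toy circuit. The sketch below uses a hypothetical circuit (two multiplexers sharing the select z, not the circuit in the slide figure) and a deliberately simplified settling model in which a multiplexer output settles one delay after its select and its selected data input. All names are assumptions made for illustration.

```python
from itertools import product

# (name, kind, inputs, delay), listed in topological order.
# mux inputs are (select, data chosen when select=1, data chosen when select=0).
CIRCUIT = [
    ("slow1", "buf", ("x",), 3),
    ("m1", "mux", ("z", "slow1", "x"), 1),    # z=1 takes the slow branch
    ("slow2", "buf", ("m1",), 3),
    ("out", "mux", ("z", "m1", "slow2"), 1),  # z=0 takes the slow branch
]

def longest_path(circuit, inputs):
    """Topological longest path: ignores what the gates compute."""
    arrival = {name: 0 for name in inputs}
    for name, _kind, fanin, delay in circuit:
        arrival[name] = delay + max(arrival[f] for f in fanin)
    return arrival

def settle_times(circuit, vector):
    """Settling time of each signal for one input vector (simplified model)."""
    value = dict(vector)
    time = {name: 0 for name in vector}
    for name, kind, fanin, delay in circuit:
        if kind == "buf":
            (src,) = fanin
            value[name], time[name] = value[src], time[src] + delay
        else:
            sel, d1, d0 = fanin
            chosen = d1 if value[sel] else d0   # the unselected input is ignored
            value[name] = value[chosen]
            time[name] = max(time[sel], time[chosen]) + delay
    return time

topological = longest_path(CIRCUIT, ("x", "z"))["out"]
true_delay = max(
    settle_times(CIRCUIT, {"x": x, "z": z})["out"]
    for x, z in product((0, 1), repeat=2)
)
print(topological, true_delay)  # 8 5
```

The path through both slow branches needs z=1 at the first multiplexer and z=0 at the second, so no input vector ever exercises it, and the real delay is 5 rather than 8.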
Announcements
Prof. Pister will lecture on wireless protocol Thursday Need this for your project
Spring Break
Tuesday 4/1 – TBD
Thursday 4/3 – MT review
Tuesday 4/8 – MT 2
False Paths and Adders
Key idea: Don’t make critical paths in adder short
• Shortening them is the idea behind Carry Lookahead and Carry-Select adders
• Instead, make long paths false
Critical path is through the carry chain
• Only exercised when the propagate bit through every block is set (Question: is this likely?)
• Therefore: when a signal would propagate through the carry chain, skip the block!
Recall from block carry-lookahead adder:
• Group Propagate PG = P0·P1·P2·P3
• When PG=1, have the carry skip the whole block!
Carry-Skip Block
[Figure: carry-skip block. An m-bit ripple-carry adder drives the 0 input of a multiplexer, the block's carry-in drives the 1 input, PG is the select, and the multiplexer output is the carry-in to the next block]
Suppose Carry-in Propagates to Carry-Out…
[Figure: the same carry-skip block]
Then PG=1
[Figure: the same carry-skip block]
So Path goes Through the 1-port of the MUX
[Figure: the same carry-skip block]
Delay is 1-MUX delay, not 4 propagate delays!
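The reason the ripple path through a block is false can be checked exhaustively: whenever PG=0, the block's ripple carry-out does not depend on the carry-in at all, and whenever PG=1 the multiplexer bypasses the ripple chain. The sketch below is a behavioural model with names (`ripple_carry_out`, `skip_block_carry`, `check_skip_block`) chosen here for illustration.

```python
from itertools import product

def ripple_carry_out(a_bits, b_bits, cin):
    """Carry-out of a ripple-carry block (bit 0 first)."""
    carry = cin
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)   # generate OR (propagate AND carry)
    return carry

def group_propagate(a_bits, b_bits):
    """PG = P0 * P1 * ... * P(m-1), with Pi = Ai xor Bi."""
    return all(a ^ b for a, b in zip(a_bits, b_bits))

def skip_block_carry(a_bits, b_bits, cin):
    """Carry-skip block: the multiplexer passes carry-in straight through when PG=1."""
    if group_propagate(a_bits, b_bits):
        return cin
    return ripple_carry_out(a_bits, b_bits, cin)

def check_skip_block(m=4):
    for bits in product((0, 1), repeat=2 * m):
        a_bits, b_bits = bits[:m], bits[m:]
        for cin in (0, 1):
            # The bypass never changes the logical result.
            assert skip_block_carry(a_bits, b_bits, cin) == ripple_carry_out(a_bits, b_bits, cin)
        if not group_propagate(a_bits, b_bits):
            # With PG=0 the carry-in cannot reach the carry-out through the ripple chain.
            assert ripple_carry_out(a_bits, b_bits, 0) == ripple_carry_out(a_bits, b_bits, 1)
    return True

print(check_skip_block(4))  # True
```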
Full Carry-Bypass Adder
[Figure: Block 0, Block 1, ..., Block n/m in a chain; each block's carry passes through a 0/1 bypass multiplexer selected by that block's PG]
As before, n/m array of m-bit blocks
Full Carry-Bypass Adder: Worst-case path
[Figure: the same chain of blocks, Block 0 through Block n/m − 1, with bypass multiplexers]
Worst-case path goes through m − 1 bits of block 0, n/m − 2 multiplexers (1 gate each), and m − 1 bits of block n/m − 1
Timing and Size Analysis
Delay = 2 * (m – 1) + n/m – 2
Choose m to minimize delay => m = √n (to within a constant factor)
We have Delay = 2 * (√n – 1) + √n – 2 = 3√n – 4
What’s the additional circuitry?
• log m gates to build PG (1 per block)
• 1 two-input multiplexer per block
• n/m blocks => n/m (log m + 1) gates
• m = √n => √n ((log n)/2 + 1) gates
Same delay as carry-select, but much smaller: roughly n + √n (up to the log factor above) vs 2n
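The delay formula is easy to tabulate for concrete sizes. The following sketch simply evaluates the slide's expression 2(m − 1) + n/m − 2 for every block size that divides n; the helper names are illustrative.

```python
import math

def bypass_delay(n, m):
    """Worst-case delay from the slide: 2(m - 1) + n/m - 2 (in gate delays)."""
    return 2 * (m - 1) + n // m - 2

def sweep(n):
    """Delay for every block size m that divides n evenly."""
    return {m: bypass_delay(n, m) for m in range(1, n + 1) if n % m == 0}

table = sweep(64)
print(table)
# {1: 62, 2: 32, 4: 20, 8: 20, 16: 32, 32: 62, 64: 125}
print(3 * math.isqrt(64) - 4)  # 20, the slide's 3*sqrt(n) - 4 at m = sqrt(n) = 8
```

For n = 64, m = 8 reaches the minimum of 20 gate delays, and m = 4 does equally well, which is consistent with the optimum being on the order of √n rather than exactly √n.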
Full Carry-Bypass Adder: Longest path
[Figure: the same chain of blocks, Block 0 through Block n/m − 1, with bypass multiplexers]
Longest path goes through all blocks and all multiplexers: m * n/m + n/m = n + n/m
Longest Path vs Circuit Delay
Longest path is n + √n
Worst-case path is O(√n)
Worst-case path for ripple-carry is n
Made things better, but a timing analyzer thinks it’s worse!
Stimulated tremendous interest in timing analyzers!
Adder Summary
Adder                     Delay         Size
Ripple-Carry              n             n
Carry-Lookahead (full)    log n         n^3
Carry-Lookahead (block)   14/5 log n    n^(9/5)
Carry-Select              √n            2n
Carry-Bypass              √n            n + √n
A comment on n
Asymptotic results tell us what happens at infinity
For our purposes, n = 16, 32, 64
• Means: √n = 4 to 8
• Means: log n = 4 to 6
For the sizes we are interested in, carry-select and carry-bypass are as fast as block CLA
Remaining Questions (just for fun)
How often does worst-case delay path occur in Carry-bypass adder?
How do we automatically analyze for false paths?
How often does (near) worst-case delay occur?
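Worst-case delay: Pi = 1 for all i > j, for small j, where Pi = Ai ⊕ Bi
How often is Pi = Ai ⊕ Bi = 1?
[Table: 3 × 3 grid of cases. Columns: A large; A small, negative; A small, positive. Rows: B large; B small, negative; B small, positive]
Only two of nine cases (one operand small and negative, the other small and positive), but they happen frequently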
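The earlier -3 + 17 example is one of these cases, and the propagate bits can be inspected directly. This is a small sketch; `propagate_mask` and `longest_propagate_run` are names invented here, and operands are taken as two's-complement values of the given width.

```python
def propagate_mask(a, b, width):
    """Bit i of the result is Pi = Ai xor Bi (two's complement, `width` bits)."""
    return (a ^ b) & ((1 << width) - 1)

def longest_propagate_run(a, b, width):
    """Length of the longest run of consecutive bits with Pi = 1."""
    p = propagate_mask(a, b, width)
    best = run = 0
    for i in range(width):
        run = run + 1 if (p >> i) & 1 else 0
        best = max(best, run)
    return best

# Small negative plus small positive: the sign-extension bits of -3 are all 1
# and the upper bits of 17 are all 0, so every upper bit propagates.
print(format(propagate_mask(-3, 17, 12), "012b"))  # 111111101100
print(longest_propagate_run(-3, 17, 12))           # 7
print(longest_propagate_run(5, 17, 12))            # 1 (both small and positive)
```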
How hard is it to analyze false paths?
Hard!
• Problem noticed in early timing verifiers in the 1970s
• Early researchers (Hitchcock, Jouppi, Ousterhout) used hand-done rules
• Often wrong (if it’s hard to analyze automatically, it’s hard to guess right by hand)
Next: “Static sensitization”
• Assert “non-controlling” values on side inputs (0 for OR/NOR, 1 for AND/NAND)
• Make sure assignments are consistent
• Problem: values are changing!
Example
To sensitize a->d->f->g, note: a->d requires b=1
But b=1 => e=0, and f->g requires e=1
Similar argument says you can’t sensitize b->d->f->g
[Figure: circuit with inputs a, b, c and unit-delay gates d, e, f, g]
But…
[Figure: the same circuit with inputs a, b, c and unit-delay gates d, e, f, g]
t  a  b  c  d  e  f  g
0  0  0  0
1  0  0  0  0  1
2  0  0  0  0  1  0
3  0  0  0  0  1  0  0
Delay of the circuit is 3!
Path a->d->f->g really was true
Key Problem
All inputs are changing…
• “a->d requires b=1” means b=1 stable at t=0
• But b changes to 0 at t=0
• Therefore, value of b is unknown (X)
Also, delays of gates are unknown
• “1” really means [0,1]
Key Idea: Derive Function for each time
[Figure: the same circuit, with an empty table of the values of a, b, c, d, e, f, g at times 0 through 3]
For each signal and each time, derive three functions, e.g. for d at time 1: “d = 1 at 1”, “d = 0 at 1”, “d = X at 1”
(d= 1 at 1) = (a=1 at 0) and (b = 1 at 0)
(d= 0 at 1) = (a=0 at 0) or (b = 0 at 0)
(d = X at 1) = (d = 1 at 1) nor (d = 0 at 1)
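The three functions for d can be written out for a single unit-delay AND gate. In this sketch each signal at a given time is encoded as a pair (is_one, is_zero), with X meaning neither is known yet; the encoding and the name `and_gate_at` are assumptions made here for illustration.

```python
ONE = (True, False)    # signal is known to be 1
ZERO = (False, True)   # signal is known to be 0
X = (False, False)     # signal has not settled yet

def and_gate_at(a_prev, b_prev):
    """Status of d = AND(a, b) at time t, given a and b at time t - 1.

    Returns (d = 1 at t, d = 0 at t, d = X at t).
    """
    a1, a0 = a_prev
    b1, b0 = b_prev
    d1 = a1 and b1        # both inputs must already be 1
    d0 = a0 or b0         # a single controlling 0 is enough
    dx = not (d1 or d0)   # X exactly when neither of the above holds
    return d1, d0, dx

print(and_gate_at(ONE, ONE))   # (True, False, False)
print(and_gate_at(ZERO, X))    # (False, True, False): the 0 decides the output early
print(and_gate_at(ONE, X))     # (False, False, True): still waiting on b
```

The asymmetry between the 1 case (needs all inputs) and the 0 case (needs any one input) is what lets an output settle before its slowest input arrives.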
Delay of the Circuit
Delay of the circuit is the latest t such that the function (“output = X at t”) is not identically 0
Problem is NP-complete
Size of problem is linear in (number of time slices) × (number of gates)
Mathematical machinery fairly massive
• “Special Theory”: 1989. Handled symmetric gates, zero-lower-bounded delays (all signals were X until they hit their final values). Other cases were conservatively approximated
• “General Theory”: 1993. Handled all gates, general delay models. Gave exact answers for all delay types
Still hasn’t quite reached industrial practice!