Upload
hacong
View
242
Download
0
Embed Size (px)
Citation preview
ECE553 ECE553 1
Reading: CH 11.1 – 11.3.1
Lecture 7: Adders
ECE553 ECE553
The Latest News in EE world
2
20nm NAND, Hynix, 2010/2/10 R&D, Third quarter mass production
20nm NAND, Samsung, 2010/4/19 50% more density than 30nm 1st product level Write speed: 10MB/s
$~180
ECE553 ECE553
A Generic Digital Processor
3
MEMORY
DATAPATH
CONTROL
INPUT-OUTPUT
<Building Blocks> Arithmetic unit - bit-sliced datapath (adder,
multiplier, shifter, comparator, etc)
Memory - RAM, ROM, Buffers, Shift
registers Control - Finite state machine (PLA,
random logic) - Counters
Interconnect - Switches - Arbiters - Bus
ECE553 ECE553
An Intel Microprocessor
4
9-1
Mux
9-1
Mux
5-1
Mux
2-1
Mux
ck1
CARRYGEN
SUMGEN+ LU
1000um
b
s0
s1
g64
sum sumb
LU : LogicalUnit
SUM
SEL
a
to Cachenode1
REG
Itanium has 6 integer execution units like this
ECE553 ECE553
Bit-Sliced Design
5
Bit 3
Bit 2
Bit 1
Bit 0
Reg
ister
Add
er
Shift
er
Mul
tiple
xer
ControlD
ata-
In
Dat
a-O
ut
Tile identical processing elementsDatapath: all computations are performed word-based
Tile identical processing elements
ECE553 ECE553
Bit-Sliced Datapath
6
Adder stage 1
Wiring
Adder stage 2
Wiring
Adder stage 3
Bit slice 0
Bit slice 2
Bit slice 1
Bit slice 63
Sum Select
Shifter
Multiplexers
Loopback Bus
From register files / Cache / Bypass
To register files / CacheLoopback B
us
Loopback Bus
ECE553 ECE553
Itanium Integer Datapath
7
Fetzer, Orton, ISSCC’02
ECE553 ECE553
Full-Adder
8
A B
Cout
Sum
Cin Fulladder
Half adder
P: Cout = Cin G: Cout = 1 D: Cout = 0
ECE553 ECE553
The Binary Adder
9
S A B Ci⊕ ⊕=
A= BCi ABCi ABCi ABCi+ + +
Co AB BCi ACi+ +=
A B
Cout
Sum
Cin Fulladder
( ) ( ( ))out inC A B C A B= ⋅ + ⋅ ⊕
ECE553 ECE553
Express Sum and Carry as a function of P, G, D
10
Define 3 new variable which ONLY depend on A, B, not Ci: why?
Generate (G) = AB
Propagate (P) = A + B
Delete = A B
Can also derive expressions for S and C o based on D and P
Propagate (P) = A + B Note that we will be sometimes using an alternate definition for
( ) ( ( ))i
o i
S A B CC A B C A B= ⊕ ⊕= ⋅ + ⋅ ⊕
CO (G, P) =G + PCi
S(G, P) = P⊕Ci
ECE553 ECE553
The Ripple-Carry Adder
11
Worst case delay linear with the number of bits(N)
Goal: Make the fastest possible carry path circuit
FA FA FA FA
A0 B0
S0
A1 B1
S1
A2 B2
S2
A3 B3
S3
Ci,0 Co,0
(= Ci,1)
Co,1 Co,2 Co,3
td = O(N) tadder = (N-1)tcarry + tsum
Simple but slow !!
Critical path: the worst case delay over all possible input patterns
ECE553 ECE553
Static Adder Circuit
12
CO = AB + BCi + ACi = AB +Ci( A+ B)
S = ABCi +CO ( A+ B +Ci )
S = A⊕ B⊕Ci
= ABCi + ABCi + ABCi + ABCi
Co = AB + BCi + ACi
ECE553 ECE553
Complimentary Static CMOS Full Adder
13
28 Transistors
A B
B
A
Ci
Ci A
X
VDD
VDD
A B
Ci BA
B VDD
A
B
Ci
Ci
A
B
A CiB
Co
VDD
S
Implementation is ugly!!
Sizing?
O i iC AB BC AC= + +( )i O iS ABC C A B C= + + +
ECE553 ECE553
Inversion Property
14
A B
S
CoCi FA
A B
S
CoCi FA
S A B Ci, ,( ) S A B Ci, ,( )=
Co A B Ci, ,( ) Co A B Ci, ,( )=
Inverted inputs inverted outputs
ECE553 ECE553
Minimize Critical Path by Reducing Inverting Stages
15
Exploit Inversion Property
A3
FA FA FA
Even cell Odd cell
FA
A0 B0
S0
A1 B1
S1
A2 B2
S2
B3
S3
Ci,0 Co,0 Co,1 Co,3Co,2
Allows us to remove inverter in carry chain at what cost?
ECE553 ECE553
A Better Structure: The Mirror Adder
16
VDD
CiA
BBA
B
A
A BKill
Generate"1"-Propagate
"0"-Propagate
VDD
Ci
A B Ci
Ci
B
A
Ci
A
BBA
VDD
SCo
24 transistors
ECE553 ECE553
The Mirror Adder Details u The NMOS and PMOS chains are completely symmetrical.
A maximum of 2 series transistors can be observed in the carry-generation circuitry.
u In layout critical to minimize the capacitance at node Co. The reduction of the diffusion (junction) capacitances is particularly important.
u The capacitance at node Co is composed of 4 diffusion (junction) capacitances, 2 internal gate capacitances, and 6 gate capacitances in the connecting adder cell .
u The transistors connected to Ci are placed closest to the output.
u Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.
17
ECE553 ECE553
Mirror Adder
18
Stick Diagram
CiA B
VDD
GND
B
Co
A Ci Co Ci A B
S
ECE553 ECE553
Layout of Mirror Adder
19
l 2 possible layouts of mirror adder - (a) Corresponds
roughly to last slide’s stick diagram
- (b) Layout is datapath-oriented (ex. M2 can easily run horizontally across cell)
(a)
(b)
ECE553 ECE553
Sizing Mirror Adder
20
ECE553 ECE553
Transmission Gate Full Adder
21
A
B
P
Ci
VDD A
A A
VDD
Ci
A
P
AB
VDD
VDD
Ci
Ci
Co
S
Ci
P
P
P
P
P
Sum Generation
Carry Generation
Setup
24 transistors Similar delay for Sum and Carry
CO (G, P) =G + PCi
S(G, P) = P⊕Ci
ECE553 ECE553
Manchester Carry Chain
22
CoCi
Gi
DiPi
PiVDD CoCi
Gi
PiVDD
φ
φ
Static Dynamic
ECE553 ECE553
Manchester Carry Chain
23
G2
φ
C3
G3Ci,0
P0
G1
VDD
φ
G0
P1 P2 P3
C3C2C1C0
• Implement P with pass-transistors (1 0 transition), not TG • Implement G with pull-down OR delete with pull-down (note inversion) • Use dynamic logic to reduce complexity and speed up
Worst delay path?
ECE553 ECE553
Manchester Carry Chain
24
Pi + 1 Gi + 1 φ
Ci
Inverter/Sum Row
Propagate/Generate Row
Pi Gi φ
Ci - 1Ci + 1
VDD
GND
Stick Diagram
ECE553 ECE553
Carry-Bypass Adder
25
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,3Co,2Co,1Co,0Ci,0
FA FA FA FA
P0 G1 P0 G1 P2 G2 P3 G3
Co,2Co,1Co,0Ci,0
Co,3
Multiplexer
BP=PoP1P2P3
Idea: If (P0 and P1 and P2 and P3 = 1)then Co3 = C0, else “kill” or “generate”.
Also called Carry-Skip
Small area overhead (10~20%) But, not regular bit-slice structure
ECE553 ECE553
Carry-Bypass Adder (cont.)
26
Carrypropagation
SetupBit 0–3
Sum
M bits
tsetup
tsum
Carrypropagation
SetupBit 4–7
Sum
tbypass
Carrypropagation
SetupBit 8–11
Sum
Carrypropagation
SetupBit 12–15
Sum
tadder = tsetup + Mtcarry + (N/M-1)tbypass + (M-1)tcarry + tsum
• Inner blocks do not contribute to worst-case delay since they have time to compute while bits 0-3 are propagating (assuming they have a generate or delete)
• Block sizes can be made non-uniform (HOW?)
Total N-bit adder
ECE553 ECE553
Carry Ripple versus Carry Bypass
27
N
tp
ripple adder
bypass adder
4..8
Why?
ECE553 ECE553
Carry-Select Adder
28
Setup
"0" Carry Propagation
"1" Carry Propagation
Multiplexer
Sum Generation
Co,k-1 Co,k+3
"0"
"1"
P,G
Carry Vector
Basic idea: Pre-compute both possibilities (0 or 1) à choose when Cin available
0 1
Area overhead: additional carry path + mux ~ 30%
ECE553 ECE553
Carry Select Adder: Critical Path
29
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
Ci,0 Co,3 Co,7 Co,11 Co,15
S0–3
Bit 0–3 Bit 4–7 Bit 8–11 Bit 12–15
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry
Setup
S4–7
0
1
Sum Generation
Multiplexer
1-Carry
0-Carry 0-Carry
Setup
S8–11
0
1
Sum Generation
Multiplexer
1-Carry
Setup
S12–15
( )add setup carry mux sumNt t Mt t tM
= + + + Still O(N)
ECE553 ECE553
Linear Carry Select
30
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Bit 0-3 Bit 4-7 Bit 8-11 Bit 12-15
S0-3 S4-7 S8-11 S12-15
Ci,0
(1)
(1)
(5)(6) (7) (8)
(9)
(10)
(5) (5) (5)(5)
Still ~ O(N)
( )add setup carry mux sumNt t Mt t tM
= + + +
ECE553 ECE553
Square Root Carry Select
31
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Setup
"0" Carry
"1" Carry
Multiplexer
Sum Generation
"0"
"1"
Bit 0-1 Bit 2-4 Bit 5-8 Bit 9-13
S0-1 S2-4 S5-8 S9-13
Ci,0
(4) (5) (6) (7)
(1)
(1)
(3) (4) (5) (6)
Mux
Sum
S14-19
(7)
(8)
Bit 14-19
(9)
(3)
~O(sqrt(N))
( 2 )add setup carry mux sumt t P t N t t= + ⋅ + +
ECE553 ECE553
Adder Delays - Comparison
32
Square root select
Linear select
Ripple adder
20 40N
t p(in
uni
t del
ays)
600
10
0
20
30
40
50
ECE553 ECE553
Carry Lookahead Adder - Basic Idea
33
AN-1, BN-1A1, B1
P1
S1
• • •
• • • SN-1
PN-1Ci, N-1
S0
P0Ci,0 Ci,1
A0, B0
To remove the rippling effect of the carry
, 1
1
, , 1
, ,01 1 0 0
( , , )( ( ( )))
o k o k
o k
k k o k k k
k k k k i
f A B CC CG PG P G P P GC CP
− −
− −
= = + ⋅+ + + + ⋅= L
ECE553 ECE553
Lookahead Adder
34
Carry and sum are independent of the previous bits Addition time is independent of the #of bits
ECE553 ECE553
Lookahead Adder
35
u Unrolling of carry recurrence can be continued u If unrolled to level k, resulting in two-level AND str
ucture u AND Fan-in = k + 1, OR Fan-in = k + 1 u k + 1 transistors in the MOS stack u Limits k to 3-4
ECE553 ECE553
Look-Ahead: Topology – 4 bits
36
Expanding Lookahead equations:
All the way:
Co,3Ci,0
VDD
P0
P1
P2
P3
G0
G1
G2
G3
Mirror Implementation
Co,k =Gk + Pk (Gk−1 + Pk−1(+ P1(G0 + P0 ⋅Ci ,0 )))
, , 1
1 1 , 2( )o k k k o k
k k k k o k
C G PCG P G P C
−
− − −
= += + +
Co,3 =G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Ci ,0= (G3 + P3G2 ) + (P3P2 )Co,1=G3:2 + P3:2Co,1
k=3
ECE553 ECE553
Block (Logarithmic) Lookahead
37
ECE553 ECE553
Block lookahead
38
Can create groups of groups, or ‘super-groups’
ECE553 ECE553
Summary
39
Many topologies for adders l Extensions of carry lookahead are dominant today (why?)
For 16-bit addition, complex techniques such as carry lookahead do not offer much benefit l Carry lookahead based techniques are much faster for N>16, lots more
area/complexity l Carry select and carry bypass yield good performance in this case l Can string together 4-bit lookahead structures in a ripple fashion to
achieve decent performance
Adder cells may use mirror structure or transmission gates