Programmable Logic Circuits:
Computer Arithmetic: Introduction
Dr. Eng. Amr T. Abdel-Hamid
ELECT 90X
Fall 2009
Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.
Dr. Amr Talaat, ELECT 90X, Programmable Logic Circuits
What is Computer Arithmetic?
Pentium Division Bug (1994-95): the Pentium's radix-4 SRT division algorithm occasionally gave an incorrect quotient. The bug was first noted in 1994 by T. Nicely, who was computing sums of reciprocals of twin primes:
$1/5 + 1/7 + 1/11 + 1/13 + \cdots + 1/p + 1/(p + 2) + \cdots$
Worst-case example of division error in Pentium:
A Motivating Example
Using a calculator with √, x², and x^y functions, compute:
$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ = 1.000 677 131 ("1024th root of 2")
$v = 2^{1/1024}$ = 1.000 677 131
Save u and v; if you can't save them, recompute the values when needed.
$x = (((u^2)^2)\ldots)^2$ = 1.999 999 963
$x' = u^{1024}$ = 1.999 999 973
$y = (((v^2)^2)\ldots)^2$ = 1.999 999 983
$y' = v^{1024}$ = 1.999 999 994
Perhaps v and u are not really the same value:
$w = v - u = 1 \times 10^{-11}$, nonzero due to hidden digits
$(u - 1) \times 1000$ = 0.677 130 680 [hidden ... (0) 68]
$(v - 1) \times 1000$ = 0.677 130 690 [hidden ... (0) 69]
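To see how finite precision produces this effect, the computation can be emulated roughly with Python's `decimal` module. This sketch rests on one assumption: every intermediate result is rounded to 10 significant digits, standing in for a calculator that keeps no hidden digits. The printed values will therefore not match the slide's calculator exactly. The function name `root_then_square` is illustrative only.

```python
from decimal import Decimal, localcontext

def root_then_square(steps: int = 10, digits: int = 10):
    """Take `steps` successive square roots of 2, then square the result
    `steps` times, rounding every intermediate result to `digits`
    significant digits (a rough stand-in for a calculator display)."""
    with localcontext() as ctx:
        ctx.prec = digits
        u = Decimal(2)
        for _ in range(steps):      # u = 2 ** (1 / 2**steps); the 1024th root for steps = 10
            u = u.sqrt()
        x = u
        for _ in range(steps):      # x = (((u^2)^2)...)^2
            x = x * x
    return u, x

u, x = root_then_square()
print(u, x)   # x lands close to 2, but rounding errors are amplified by each squaring
```

Each squaring roughly doubles the relative error already present in u. After ten squarings, an error in the tenth digit of u therefore shows up around the seventh digit of x.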
Finite Range Can Lead to Disaster
Example: Explosion of Ariane Rocket (1996 June 4)
Unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion
Cause: "software error in the inertial reference system"
Problem specifics: a 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer.
An SRI* software exception arose during the conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).
*SRI = Inertial Reference System (from the French Système de Référence Inertielle)
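The sketch below shows three ways a float-to-16-bit-signed conversion can behave when the value is out of range:

- raise an error, which corresponds to the exception described above;
- silently wrap around;
- saturate to the nearest representable value.

This is an illustration, not the flight software, and the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising on out-of-range input."""
    n = int(value)                          # truncate toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n

def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits and reinterpret them as two's complement."""
    n = int(value) & 0xFFFF
    return n - 0x10000 if n & 0x8000 else n

def to_int16_saturating(value: float) -> int:
    """Clamp to the most positive or most negative representable value."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

For an input of 40000.0, the three functions respectively raise `OverflowError`, return -25536, and return 32767.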
Encoding Numbers in 4 Bits
Figure: Some of the possible ways of assigning 16 distinct codes to represent numbers, shown on a number line spanning −16 to 16. Formats shown: unsigned integers; signed-magnitude; 3 + 1 fixed-point, xxx.x; signed fraction, .xxx; 2's-complement fraction, x.xxx; 2 + 2 floating-point, $s \times 2^e$ with e in [−2, 1] and s in [0, 3]; 2 + 2 logarithmic (log x = xx.xx).
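To make the figure concrete, the following sketch decodes a 4-bit code under several of the listed formats. The bit layout of the 2 + 2 floating-point format is an assumption: bits 3..2 hold a two's-complement exponent e, and bits 1..0 hold an unsigned significand s. The function name `decode4` is hypothetical.

```python
def decode4(code: int, fmt: str) -> float:
    """Value of a 4-bit code (0..15) under one of the formats in the figure."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    sign = code >> 3          # most significant bit
    low3 = code & 0b111       # remaining three bits
    if fmt == "unsigned":
        return code
    if fmt == "signed-magnitude":
        return -low3 if sign else low3
    if fmt == "fixed 3+1":            # xxx.x
        return code / 2
    if fmt == "signed fraction":      # sign bit, then magnitude .xxx
        return -low3 / 8 if sign else low3 / 8
    if fmt == "2's-compl fraction":   # x.xxx, leading bit has weight -1
        return (code - 16) / 8 if sign else code / 8
    if fmt == "float 2+2":            # assumed layout: e = bits 3..2 (2's compl), s = bits 1..0
        e = (code >> 2) - 4 if sign else code >> 2
        s = code & 0b11
        return s * 2.0 ** e
    raise ValueError(f"unknown format: {fmt}")

for fmt in ("unsigned", "signed-magnitude", "fixed 3+1", "2's-compl fraction"):
    print(f"{fmt:20}", [decode4(c, fmt) for c in range(16)])
```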
The Binary Number System
In conventional digital computers, integers are represented as binary numbers of fixed length n: an ordered sequence $(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$ of binary digits.
Each digit $x_i$ (bit) is 0 or 1.
The above sequence represents the integer value
$X = x_{n-1} 2^{n-1} + x_{n-2} 2^{n-2} + \cdots + x_1 2^1 + x_0 2^0 = \sum_{i=0}^{n-1} x_i 2^i$
Upper case letters represent numerical values or sequences of digits.
Lower case letters, usually indexed, represent individual digits.
Radix of a Number System
The weight of the digit $x_i$ is the i-th power of 2; 2 is the radix of the binary number system.
Binary numbers are radix-2 numbers: the allowed digits are 0 and 1.
Decimal numbers are radix-10 numbers: the allowed digits are 0, 1, 2, …, 9.
The radix is indicated as a subscript, written as a decimal number. Example:
$(101)_{10}$: decimal value 101
$(101)_2$: decimal value 5
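The positional rule generalizes to any radix r, with digit $x_i$ weighted by $r^i$. A minimal sketch evaluates a digit string with Horner's rule, which avoids computing each power separately. The function name `radix_value` is illustrative only. Python's built-in `int(s, base)` gives the same answer for radices 2 to 36 and can serve as a cross-check.

```python
def radix_value(digits: str, radix: int) -> int:
    """Evaluate a digit string, most significant digit first, with Horner's rule:
    X = (...((x_{n-1}) r + x_{n-2}) r + ...) r + x_0."""
    value = 0
    for ch in digits:
        d = int(ch, 36)                  # '0'-'9', then 'a'-'z' for digit values >= 10
        if d >= radix:
            raise ValueError(f"digit {ch!r} not allowed in radix {radix}")
        value = value * radix + d
    return value

print(radix_value("101", 10), radix_value("101", 2))   # 101 5
```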
Range of Representations
Operands and results are stored in registers of fixed length n, so only a finite number of distinct values can be represented within an arithmetic unit.
Xmin, Xmax: the smallest and largest representable values
[Xmin, Xmax]: the range of the representable numbers
A result larger than Xmax or smaller than Xmin is incorrectly represented.
The arithmetic unit should indicate that the generated result is in error: an overflow indication.
Example - Overflow in Binary System
Unsigned integers with 5 binary digits (bits):
Xmax = $(31)_{10}$, represented by $(11111)_2$; Xmin = $(0)_{10}$, represented by $(00000)_2$.
Increasing Xmax by 1 gives $(32)_{10} = (100000)_2$. In a 5-bit representation only the last five digits are retained, yielding $(00000)_2 = (0)_{10}$.
In general, a number X not in the range [Xmin, Xmax] = [0, 31] is represented by X mod 32. If X + Y exceeds Xmax, the result is S = (X + Y) mod 32.
Example:
    X      10001    17
  + Y      10010    18
         1 00011     3 = 35 mod 32
The result has to be stored in a 5-bit register, so the most significant bit (with weight $2^5 = 32$) is discarded.
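The wraparound rule S = (X + Y) mod 32 can be sketched as follows. The carry out of the top bit serves as the overflow indication for unsigned operands. The function name `add_unsigned` is illustrative.

```python
def add_unsigned(x: int, y: int, n: int = 5):
    """Add two n-bit unsigned integers and keep only n bits, as a register would.
    Returns (S, overflow), where S = (x + y) mod 2**n and overflow is the carry out."""
    full = x + y
    s = full % (1 << n)              # the bit of weight 2**n is discarded
    overflow = full > (1 << n) - 1   # the true result exceeds Xmax
    return s, overflow

print(add_unsigned(17, 18))          # (3, True): the slide's 17 + 18 = 35 mod 32
```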
Fixed Radix Systems
r is the radix of the number system. Conventional number systems are also called fixed-radix systems.
With no redundancy, $0 \le x_i \le r - 1$.
Allowing $x_i \ge r$ introduces redundancy into the fixed-radix number system. How?
If $x_i \ge r$ is allowed, there are two machine representations for the same value:
$(\ldots, x_{i+1}, x_i, \ldots)$ and $(\ldots, x_{i+1} + 1, x_i - r, \ldots)$
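A two-line check makes the redundancy concrete. Once digits of r or more are allowed, moving r units from position i to one unit in position i + 1 leaves the value unchanged. The helper name `value` is illustrative, and it deliberately does not check digits against r.

```python
def value(digits, r):
    """Value of a digit vector (most significant first) in radix r.
    Digits are not checked against r, so redundant digits >= r are accepted."""
    v = 0
    for d in digits:
        v = v * r + d
    return v

# Radix 10 with the digit 12 allowed: (1, 12) and (2, 2) denote the same value.
print(value([1, 12], 10), value([2, 2], 10))   # 22 22
```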
Representation of Mixed Numbers
A sequence of n digits in a register does not necessarily represent an integer.
It can represent a mixed number with a fractional part and an integral part.
The n digits are partitioned into two parts: k in the integral part and m in the fractional part (k + m = n).
The value of an n-tuple with a radix point between the k most significant digits and the m least significant digits, $(x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-m})_r$,
is
$X = x_{k-1} r^{k-1} + \cdots + x_0 r^0 + x_{-1} r^{-1} + \cdots + x_{-m} r^{-m} = \sum_{i=-m}^{k-1} x_i r^i$
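A short sketch evaluates the mixed-number formula for a digit vector, given the number m of fractional digits. The weight of the last digit, $r^{-m}$, is the ulp discussed below. The function name `fixed_point_value` is hypothetical.

```python
def fixed_point_value(digits, r, m):
    """Value of an n-digit vector (most significant first) with the radix point
    between the k = n - m integral digits and the m fractional digits:
    X = sum_{i=-m}^{k-1} x_i r**i."""
    k = len(digits) - m
    # digits[0] has weight r**(k-1); digits[-1] has weight r**(-m), i.e. one ulp
    return sum(d * r ** (k - 1 - j) for j, d in enumerate(digits))

print(fixed_point_value([1, 0, 1, 1], 2, 1))   # (101.1)_2 = 5.5
```

The same four bits mean 11 with m = 0 (pure integer) and 0.6875 with m = 4 (pure fraction). Only the assumed position of the radix point changes.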
Fixed Point Representations
The radix point is not stored in the register; it is understood to be in a fixed position between the k most significant digits and the m least significant digits. These are called fixed-point representations.
The programmer is not restricted to the predetermined position of the radix point: operands can be scaled, with the same scaling for all operands.
Add and subtract operations are correct: $aX \pm aY = a(X \pm Y)$, where a is the scaling factor.
Corrections are required for multiplication and division: $aX \times aY = a^2 (X \times Y)$; $aX / aY = X / Y$.
Commonly used positions for the radix point:
the rightmost side of the number (pure integers, m = 0)
the leftmost side of the number (pure fractions, k = 0)
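The scaling corrections can be seen in a small sketch that stores values as integers scaled by $a = 2^8$. The choice of 8 fractional bits, and all names below, are assumptions made for illustration:

- Addition needs no correction.
- A product carries $a^2$ and must be divided by a once.
- A quotient loses the factor a and must be multiplied by a once.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS              # scaling factor a = 2**8 (assumed for this example)

def to_fixed(v: float) -> int:
    return round(v * SCALE)          # store aX as an integer

def from_fixed(n: int) -> float:
    return n / SCALE

def fx_add(ax: int, ay: int) -> int:
    return ax + ay                   # aX + aY = a(X + Y): no correction

def fx_mul(ax: int, ay: int) -> int:
    return (ax * ay) // SCALE        # aX * aY = a^2 XY: divide by a once (floor rounding)

def fx_div(ax: int, ay: int) -> int:
    return (ax * SCALE) // ay        # aX / aY = X/Y: multiply by a to restore the scale

print(from_fixed(fx_mul(to_fixed(1.5), to_fixed(2.25))))   # 3.375
```

The floor divisions round toward minus infinity. A real design would choose its rounding mode explicitly, especially for negative operands.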
ULP - Unit in Last Position
Given the length n of the operands, the weight $r^{-m}$ of the least significant digit indicates the position of the radix point.
Unit in the last position (ulp): the weight of the least significant digit, ulp = $r^{-m}$.
This notation simplifies the discussion: there is no need to distinguish between the different partitions of numbers into fractional and integral parts.
Representation of Negative Numbers
Fixed-point numbers in a radix r system have two ways of representing negative numbers:
Sign and magnitude representation (or signed-magnitude representation)
Complement representation, with two alternatives:
• Radix complement (two's complement in the binary system)
• Diminished-radix complement (one's complement in the binary system)
Signed-Magnitude Representation
Sign and magnitude are represented separately.
The first digit is the sign digit; the remaining n − 1 digits represent the magnitude.
Binary case: the sign bit is 0 for positive and 1 for negative numbers.
Non-binary case: the sign digits 0 and r − 1 indicate positive and negative numbers.
Only $2r^{n-1}$ out of the $r^n$ possible sequences are utilized.
There are two representations for zero, positive and negative.
This is inconvenient when implementing an arithmetic unit: when testing for zero, both representations must be checked.
Disadvantage of the Signed-Magnitude Representation
The operation may depend on the signs of the operands.
Example: adding a positive number X and a negative number −Y, X + (−Y). If Y > X, the final result is −(Y − X). Calculation:
switch the order of the operands
perform subtraction rather than addition
attach the minus sign
A sequence of decisions must be made, costing excess control logic and execution time.
This is avoided in the complement representation methods
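The decisions can be spelled out in a small sketch of signed-magnitude addition. Numbers are hypothetical (sign, magnitude) pairs with sign 0 for positive and 1 for negative. Magnitude overflow is ignored, and a zero result is given the positive sign by choice.

```python
def sm_add(x, y):
    """Add two signed-magnitude numbers given as (sign, magnitude) pairs.
    Every branch below is a decision the hardware must also make."""
    (sx, mx), (sy, my) = x, y
    if sx == sy:                       # same signs: add magnitudes, keep the sign
        return (sx, mx + my)
    if mx >= my:                       # different signs: subtract smaller from larger
        return (sx if mx != my else 0, mx - my)   # an exact cancellation gives +0
    return (sy, my - mx)               # swap the operands and take the other sign

print(sm_add((0, 3), (1, 5)))          # (1, 2), i.e. 3 + (-5) = -2
```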
Complement Representations of Negative Numbers
Two alternatives:
Radix complement (called two's complement in the binary system)
Diminished-radix complement (called one's complement in the binary system)
In both complement methods - positive numbers represented as in the signed-magnitude method
A negative number -Y is represented by R-Y where R is a constant
This representation satisfies -(-Y )=Y since R-(R-Y)=Y
Advantage of Complement Representation
No decisions made before executing addition or subtraction
Example: X − Y = X + (−Y), where −Y is represented by R − Y.
Addition is performed by X + (R − Y) = R − (Y − X).
If Y > X, −(Y − X) is already represented as R − (Y − X).
There is no need to interchange the order of the two operands.
Two's Complement
r = 2, k = n = 4, m = 0, ulp = $2^0$ = 1.
The radix complement (called two's complement in the binary case) of a number X is $2^4 - X$.
It can instead be calculated as $\bar{X} + 1$ (complement every bit, then add 1).
0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$.
The two's complement of 0111 is 1000 + 1 = 1001; it represents the value $(-7)_{10}$.
The two's complement of 0000 is 1111 + 1 = 10000 = 0 mod $2^4$: a single representation of zero.
Each positive number has a corresponding negative number that starts with a 1.
1000, representing $(-8)_{10}$, has no corresponding positive number.
The range of representable numbers is $-8 \le X \le 7$.
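A sketch of the "complement every bit, then add 1" rule, checked against the definition $2^n - X$ (taken mod $2^n$ so that zero maps to zero). The function names are illustrative.

```python
def twos_complement(x: int, n: int = 4) -> int:
    """n-bit two's complement: invert all bits, add 1, keep n bits.
    Equal to (2**n - x) mod 2**n."""
    mask = (1 << n) - 1
    return ((~x & mask) + 1) & mask

def to_signed(code: int, n: int = 4) -> int:
    """Interpret an n-bit code as a two's-complement value."""
    return code - (1 << n) if code >> (n - 1) else code

print(f"{twos_complement(0b0111):04b}", to_signed(0b1001))   # 1001 -7
```

Applying the rule to 1000 gives 1000 back, which matches the observation that $-8$ has no positive counterpart in 4 bits.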
The Two’s Complement Representation
Example - Addition in Two's Complement
Calculating X + (−Y) with Y > X, e.g. 3 + (−5):
    0011     3
  + 1011    −5
    1110    −2
The correct result is represented in the two's complement method, with no need for preliminary decisions or post-corrections.
Calculating X + (−Y) with X > Y, e.g. 5 + (−3):
    0101     5
  + 1101    −3
  1 0010     2
Only the four least significant digits are retained, yielding 0010.
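The "no decisions" property shows up directly in code: the sketch below adds the bit patterns, drops the carry out, and reinterprets the result. It contains no sign tests. The function name `add_2c` is illustrative. Overflow, for example 7 + 1 in 4 bits, silently wraps to −8.

```python
def add_2c(x: int, y: int, n: int = 4) -> int:
    """Add two n-bit two's-complement values without any sign tests:
    add the bit patterns, keep n bits, and reinterpret the result."""
    mask = (1 << n) - 1
    s = ((x & mask) + (y & mask)) & mask      # x & mask is the n-bit pattern of x
    return s - (1 << n) if s >> (n - 1) else s

print(add_2c(3, -5), add_2c(5, -3))           # -2 2
```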
One's Complement in Binary System
r = 2, k = n = 4, m = 0, ulp = $2^0$ = 1.
The diminished-radix complement (called one's complement in the binary case) of a number X is $(2^4 - 1) - X = \bar{X}$.
As before, the sequences 0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$.
The one's complement of 0111 is 1000, representing $(-7)_{10}$.
The one's complement of zero is 1111: there are two representations of zero.
The range of representable numbers is $-7 \le X \le 7$.
Comparing the Three Representations in a Binary System
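The comparison table on this slide was an image and is not reproduced in the text. A short sketch can regenerate an equivalent comparison for all 16 four-bit codes. The function name `interpretations` is illustrative.

```python
def interpretations(code: int, n: int = 4):
    """Values of an n-bit code in signed-magnitude, one's and two's complement."""
    msb = code >> (n - 1)
    magnitude = code & ((1 << (n - 1)) - 1)
    signed_mag = -magnitude if msb else magnitude
    ones = code - ((1 << n) - 1) if msb else code      # negative: code - (2**n - 1)
    twos = code - (1 << n) if msb else code            # negative: code - 2**n
    return signed_mag, ones, twos

print("code  s-m  1's  2's")
for code in range(16):
    sm, oc, tc = interpretations(code)
    print(f"{code:04b}  {sm:3}  {oc:3}  {tc:3}")
```

In the output, 1000 and 1111 both mean zero in two of the formats (signed-magnitude and one's complement respectively), while two's complement uses 1000 for the extra value −8.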
5.1 Bit-Serial and Ripple-Carry Adders
Half-adder (HA): truth table
Inputs    Outputs
x  y      c  s
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0
Full-adder (FA): truth table
Inputs        Outputs
x  y  c_in    c_out  s
0  0  0       0      0
0  0  1       0      1
0  1  0       0      1
0  1  1       1      0
1  0  0       0      1
1  0  1       1      0
1  1  0       1      0
1  1  1       1      1
Half-Adder Implementations
Figure: Three implementations of a half-adder: (a) AND/XOR half-adder; (b) NOR-gate half-adder; (c) NAND-gate half-adder with complemented carry.
Full-Adder Implementations
Figure: Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates: (a) built of half-adders; (b) built as an AND-OR circuit; (c) suitable for CMOS realization (multiplexer-based).
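Design (a) can be sketched at the bit level:

- The first half-adder adds x and y.
- The second half-adder adds its sum bit and $c_{in}$.
- An OR gate merges the two carries, which can never both be 1.

The loop checks every row against the arithmetic definition $x + y + c_{in} = 2c_{out} + s$ and prints the truth table. The function names are illustrative.

```python
def half_adder(x: int, y: int):
    """Return (carry, sum) = (x AND y, x XOR y)."""
    return x & y, x ^ y

def full_adder(x: int, y: int, cin: int):
    """Full-adder built of two half-adders and an OR gate."""
    c1, s1 = half_adder(x, y)
    c2, s = half_adder(s1, cin)
    return c1 | c2, s                 # (cout, s)

for x in (0, 1):
    for y in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(x, y, cin)
            assert x + y + cin == 2 * cout + s
            print(x, y, cin, "|", cout, s)
```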
Full-Adder Details
Figure: CMOS transmission gate and its use in a 2-to-1 mux: (a) CMOS transmission gate (TG): circuit and symbol; (b) two-input mux built of two transmission gates.
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in}$ (odd parity function)
$\;\;\; = \bar{x}\bar{y}c_{in} \vee \bar{x}y\bar{c}_{in} \vee x\bar{y}\bar{c}_{in} \vee xyc_{in}$
$c_{out} = xy \vee xc_{in} \vee yc_{in}$ (majority function)
Simple Adders Built of Full-Adders
Figure: Using full-adders in building bit-serial and ripple-carry adders: (a) bit-serial adder (one FA, a carry flip-flop, and shift registers for x, y, and s, driven by a clock); (b) 32-bit ripple-carry adder (FA stages 0 to 31, carries $c_0 = c_{in}$ through $c_{32} = c_{out}$, sum bits $s_0$ to $s_{31}$).
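Both adders perform the same computation. The ripple-carry adder unrolls it in space, using k full-adders. The bit-serial adder unrolls it in time, using one full-adder and a carry flip-flop over k clock cycles. The sketch makes the correspondence explicit. The function names are illustrative, and the two loops are deliberately identical apart from what the loop index stands for.

```python
def full_adder(x, y, cin):
    s = x ^ y ^ cin                            # odd parity
    cout = (x & y) | (x & cin) | (y & cin)     # majority
    return cout, s

def ripple_carry_add(x: int, y: int, k: int, cin: int = 0):
    """k-bit ripple-carry adder: stage i receives c_i from stage i-1."""
    c, s = cin, 0
    for i in range(k):                         # i = index of the FA stage
        c, si = full_adder((x >> i) & 1, (y >> i) & 1, c)
        s |= si << i
    return s, c                                # (sum, c_k = c_out)

def bit_serial_add(x: int, y: int, k: int):
    """Bit-serial adder: one FA plus a carry flip-flop; operand bits are
    shifted in LSB first, one bit per clock cycle."""
    carry_ff, s = 0, 0
    for clock in range(k):                     # clock = cycle number
        carry_ff, si = full_adder((x >> clock) & 1, (y >> clock) & 1, carry_ff)
        s |= si << clock                       # shift the sum bit into its place
    return s, carry_ff

print(ripple_carry_add(0b1011, 0b0110, 4))     # (1, 1): 11 + 6 = 17 = 1 0001
```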
Critical Path Through a Ripple-Carry Adder
Figure: Critical path in a k-bit ripple-carry adder (FA stages 0 to k − 1). The path runs from $x_0, y_0$ through the carry chain $c_1, \ldots, c_{k-1}$ to $s_{k-1}$.
$T_{\text{ripple-add}} = T_{FA}(x, y \to c_{out}) + (k - 2)\, T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
Binary Adders as Versatile Building Blocks
Figure: Four-bit binary adder used to realize the logic function f = w ∨ xyz and its complement.
Set one input to 0: c_out = AND of the other inputs
Set one input to 1: c_out = OR of the other inputs
Set one input to 0 and another to 1: s = NOT of the third input
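The three rules follow from the majority and parity equations of the full-adder. They can be confirmed exhaustively with a few assertions. `full_adder` here is a hypothetical helper returning $(c_{out}, s)$.

```python
def full_adder(x, y, cin):
    return (x & y) | (x & cin) | (y & cin), x ^ y ^ cin   # (cout, s)

for a in (0, 1):
    for b in (0, 1):
        assert full_adder(a, b, 0)[0] == (a & b)    # one input 0: cout = AND
        assert full_adder(a, b, 1)[0] == (a | b)    # one input 1: cout = OR
    assert full_adder(a, 0, 1)[1] == 1 - a          # inputs 0 and 1: s = NOT a
print("all three identities hold")
```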
Conditions and Exceptions
Figure: Two's-complement adder with provisions for detecting conditions and exceptions: a k-bit chain of FAs whose outputs include Overflow, Negative, and Zero.
$\text{overflow}_{\text{2's-compl}} = c_k \oplus c_{k-1} = c_k \bar{c}_{k-1} \vee \bar{c}_k c_{k-1}$
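A bit-level sketch of such an adder returns the sum together with the three flags. It keeps the last two carries so that overflow can be formed as $c_k \oplus c_{k-1}$. One assumption: "negative" is taken here simply as the sign bit $s_{k-1}$ of the stored result. A variant could correct it by XOR-ing with overflow. The function name is illustrative.

```python
def add_with_flags(x: int, y: int, k: int, cin: int = 0):
    """k-bit two's-complement adder returning (sum, overflow, negative, zero)."""
    s, c, c_prev = 0, cin, cin
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s |= (xi ^ yi ^ c) << i                           # s_i = x_i XOR y_i XOR c_i
        c_prev, c = c, (xi & yi) | (xi & c) | (yi & c)    # c_{i+1} (majority)
    overflow = c ^ c_prev                    # c_k XOR c_{k-1}
    negative = (s >> (k - 1)) & 1            # sign bit of the k-bit result (assumption)
    zero = int(s == 0)
    return s, overflow, negative, zero

print(add_with_flags(0b0111, 0b0001, 4))    # (8, 1, 1, 0): 7 + 1 overflows in 4 bits
```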
Manchester Carry Chains and Adders
Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries $c_i$ is thus our central problem. For this, the actual operand digits are not important; what matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed).
For binary addition:
$g_i = x_i y_i \qquad p_i = x_i \oplus y_i \qquad a_i = \bar{x}_i \bar{y}_i = \overline{(x_i \vee y_i)}$
It is also helpful to define a transfer signal:
$t_i = g_i \vee p_i = \bar{a}_i = x_i \vee y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \vee c_i p_i = g_i \vee c_i g_i \vee c_i p_i = g_i \vee c_i t_i$
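A sketch computes the four signals per position and runs both forms of the recurrence side by side: one with $p_i$, one with $t_i$. Both must produce the same carries because $g_i$ already covers the case $x_i = y_i = 1$. The function name `carries` is illustrative.

```python
def carries(x: int, y: int, k: int, c0: int = 0):
    """Return [c_1..c_k] from c_{i+1} = g_i OR c_i p_i, and the same list
    from c_{i+1} = g_i OR c_i t_i."""
    c_p, c_t, out_p, out_t = c0, c0, [], []
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g = xi & yi                 # generate
        p = xi ^ yi                 # propagate
        a = 1 - (xi | yi)           # annihilate (absorb)
        t = xi | yi                 # transfer
        assert t == (g | p) == 1 - a
        c_p = g | (c_p & p)
        c_t = g | (c_t & t)
        out_p.append(c_p)
        out_t.append(c_t)
    return out_p, out_t

print(carries(0b1011, 0b0110, 4)[0])   # [0, 1, 1, 1]
```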
Carry Network is the Essence of a Fast Adder
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.
Figure: Generic k-bit adder. Each position produces $g_i$ and $p_i$ from $x_i$ and $y_i$. The carry network maps $c_0$ and $(g_0, p_0) \ldots (g_{k-1}, p_{k-1})$ to $c_1 \ldots c_k$. Each sum bit $s_i$ is formed from $p_i$ and $c_i$.
g_i  p_i    Carry is:
0    0      annihilated or killed
0    1      propagated
1    0      generated
1    1      (impossible)
$g_i = x_i y_i \qquad p_i = x_i \oplus y_i$
Carry network types: ripple; skip; lookahead; parallel-prefix.
Ripple-Carry Adder Revisited
Figure: The carry propagation network of a ripple-carry adder: a chain from $c_0$ through $c_1, c_2, \ldots, c_{k-1}$ to $c_k$, fed by the $(g_i, p_i)$ signals.
The carry recurrence: $c_{i+1} = g_i \vee p_i c_i$
The latency of a k-bit adder is roughly 2k gate delays:
1 gate delay for production of the p and g signals, plus
2(k − 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
The Complete Design of a Ripple-Carry Adder
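Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal        Radix r                          Binary
g_i is 1 iff  $x_i + y_i \ge r$                $x_i y_i$
p_i is 1 iff  $x_i + y_i = r - 1$              $x_i \oplus y_i$
a_i is 1 iff  $x_i + y_i < r - 1$              $\bar{x}_i \bar{y}_i = \overline{(x_i \vee y_i)}$
t_i is 1 iff  $x_i + y_i \ge r - 1$            $x_i \vee y_i$
s_i           $(x_i + y_i + c_i) \bmod r$      $x_i \oplus y_i \oplus c_i$
The carry recurrence can be unrolled to obtain each carry signal directly from the inputs, rather than through propagation:
$c_i = g_{i-1} \vee c_{i-1} p_{i-1}$
$\;\;\; = g_{i-1} \vee (g_{i-2} \vee c_{i-2} p_{i-2}) p_{i-1}$
$\;\;\; = g_{i-1} \vee g_{i-2} p_{i-1} \vee c_{i-2} p_{i-2} p_{i-1}$
$\;\;\; = g_{i-1} \vee g_{i-2} p_{i-1} \vee g_{i-3} p_{i-2} p_{i-1} \vee c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$\;\;\; = g_{i-1} \vee g_{i-2} p_{i-1} \vee g_{i-3} p_{i-2} p_{i-1} \vee g_{i-4} p_{i-3} p_{i-2} p_{i-1} \vee c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$\;\;\; = \cdots$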
Full Carry Lookahead
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it
Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits
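Figure: Each sum bit $s_0, \ldots, s_3$ derived directly from the inputs $x_0 \ldots x_3$, $y_0 \ldots y_3$, and $c_{in}$ that affect it.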
Four-Bit Carry-Lookahead Adder
Complexity reduced by deriving the carry-out indirectly
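Figure: Four-bit carry network with full lookahead (inputs $g_0 \ldots g_3$, $p_0 \ldots p_3$, $c_0$; outputs $c_1 \ldots c_4$).
Full carry lookahead is quite practical for a 4-bit adder:
$c_1 = g_0 \vee c_0 p_0$
$c_2 = g_1 \vee g_0 p_1 \vee c_0 p_0 p_1$
$c_3 = g_2 \vee g_1 p_2 \vee g_0 p_1 p_2 \vee c_0 p_0 p_1 p_2$
$c_4 = g_3 \vee g_2 p_3 \vee g_1 p_2 p_3 \vee g_0 p_1 p_2 p_3 \vee c_0 p_0 p_1 p_2 p_3$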
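The four lookahead equations translate directly into two-level AND-OR expressions. The sketch below builds a 4-bit adder from them and, in the test, compares it with ordinary addition for all 512 input combinations. The function names are illustrative.

```python
def cla4_carries(g, p, c0):
    """Carries c1..c4 of a 4-bit adder with full lookahead."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c1, c2, c3, c4]

def cla4_add(x: int, y: int, c0: int = 0):
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]     # g_i = x_i y_i
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]   # p_i = x_i XOR y_i
    c = [c0] + cla4_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))       # s_i = p_i XOR c_i
    return s, c[4]
```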
Carry Lookahead Beyond 4 Bits
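Consider a 32-bit adder:
$c_1 = g_0 \vee c_0 p_0$
$c_2 = g_1 \vee g_0 p_1 \vee c_0 p_0 p_1$
$c_3 = g_2 \vee g_1 p_2 \vee g_0 p_1 p_2 \vee c_0 p_0 p_1 p_2$
. . .
$c_{31} = g_{30} \vee g_{29} p_{30} \vee g_{28} p_{29} p_{30} \vee g_{27} p_{28} p_{29} p_{30} \vee \cdots \vee c_0 p_0 p_1 p_2 p_3 \cdots p_{29} p_{30}$
The last term needs a 32-input AND, and the whole expression a 32-input OR. High fan-ins necessitate tree-structured circuits.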
Solutions to the Fan-in Problem
• Multilevel lookahead
• Block adders
• High-radix addition (i.e., radix $2^h$): increases the latency for generating the g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Example: 16-bit addition
Radix-16 (four digits)
Two-level carry lookahead (four 4-bit blocks)
Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first.
Figure: carries $c_{16}$ ($c_{out}$) down to $c_0$ ($c_{in}$) of a 16-bit adder, with $c_4$, $c_8$, and $c_{12}$ marked as the carries to be found first.
Block Ripple Adder
Larger Carry-Lookahead Adder Design
Block generate and propagate signals:
$g_{[i,i+3]} = g_{i+3} \vee g_{i+2} p_{i+3} \vee g_{i+1} p_{i+2} p_{i+3} \vee g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and it can be propagated to the MSB, the block generates a carry.
Figure: 4-bit lookahead carry generator with inputs $c_i$ and $(g_i, p_i) \ldots (g_{i+3}, p_{i+3})$, and outputs $c_{i+1}$, $c_{i+2}$, $c_{i+3}$, $g_{[i,i+3]}$, and $p_{[i,i+3]}$.
A Building Block for Carry-Lookahead Addition
Figure: Four-bit lookahead carry generator. It is the four-bit carry network with full lookahead, relabeled for positions i to i + 3 and split into block signal generation ($g_{[i,i+3]}$, $p_{[i,i+3]}$) and intermediate carries ($c_{i+1}$, $c_{i+2}$, $c_{i+3}$). It is shown alongside a four-bit adder.
Combining Block g and p Signals
Combining of g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block
A Two-Level Carry-Lookahead Adder
Figure: Building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators. Each 16-bit section is a 16-bit carry-lookahead adder: one generator takes the block signals for [0,3], [4,7], [8,11], [12,15] and produces $c_4$, $c_8$, $c_{12}$. A second-level generator combines the section signals for [0,15], [16,31], [32,47], [48,63]. It produces $c_{16}$, $c_{32}$, $c_{48}$ and the overall $g_{[0,63]}$, $p_{[0,63]}$.
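The same two-level idea can be sketched at 16 bits with four 4-bit blocks and one second-level generator, which is one section of the 64-bit design. This is a behavioral model, not a gate netlist. The block signals are computed by calling the generator with a carry-in of 0, which is valid because $g_{[i,i+3]}$ and $p_{[i,i+3]}$ do not depend on the carry-in. All names are illustrative.

```python
def lookahead4(g, p, c0):
    """4-bit lookahead carry generator: from four (g, p) pairs and an incoming
    carry, produce [c1, c2, c3] and the block signals (G, P)."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    G = g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
    P = p[0] & p[1] & p[2] & p[3]
    return [c1, c2, c3], G, P

def cla16(x: int, y: int, c0: int = 0):
    """16-bit two-level carry-lookahead adder."""
    g = [(x >> i) & (y >> i) & 1 for i in range(16)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(16)]
    blk = [slice(4 * j, 4 * j + 4) for j in range(4)]
    # Level 1: block signals g[4j,4j+3], p[4j,4j+3] (independent of the block carry-in)
    GP = [lookahead4(g[b], p[b], 0)[1:] for b in blk]
    # Level 2: carries entering blocks 1..3, plus the 16-bit block signals
    (c4, c8, c12), G16, P16 = lookahead4([G for G, _ in GP], [P for _, P in GP], c0)
    # Back inside each block: inner carries from the block's own carry-in
    c = []
    for b, cin in zip(blk, (c0, c4, c8, c12)):
        c += [cin] + lookahead4(g[b], p[b], cin)[0]
    s = sum((p[i] ^ c[i]) << i for i in range(16))
    return s, G16 | (P16 & c0)         # c16 = g[0,15] OR c0 p[0,15]
```

Adding one more level of `lookahead4` over four such sections would give the 64-bit organization in the figure.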
Ling Adder and Related Designs
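Consider the carry recurrence and its unrolling by 4 steps:
$c_i = g_{i-1} \vee c_{i-1} t_{i-1}$
$\;\;\; = g_{i-1} \vee g_{i-2} t_{i-1} \vee g_{i-3} t_{i-2} t_{i-1} \vee g_{i-4} t_{i-3} t_{i-2} t_{i-1} \vee c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$
Ling's modification: propagate $h_i = c_i \vee c_{i-1}$ instead of $c_i$:
$h_i = g_{i-1} \vee h_{i-1} t_{i-2}$
$\;\;\; = g_{i-1} \vee g_{i-2} \vee g_{i-3} t_{i-2} \vee g_{i-4} t_{i-3} t_{i-2} \vee c_{i-4} t_{i-4} t_{i-3} t_{i-2}$
(the last term equals $h_{i-4} t_{i-5} t_{i-4} t_{i-3} t_{i-2}$, because $c_j = h_j t_{j-1}$)
CLA: 5 gates, max 5 inputs, 19 gate inputs
Ling: 4 gates, max 5 inputs, 14 gate inputs
The advantage of $h_i$ over $c_i$ is even greater with wired-OR:
CLA: 4 gates, max 5 inputs, 14 gate inputs
Ling: 3 gates, max 4 inputs, 9 gate inputs
Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:
$s_i = (t_i \oplus h_{i+1}) \vee h_i g_i t_{i-1}$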
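A behavioral sketch of a Ling adder:

- It propagates $h_i$ with the recurrence $h_i = g_{i-1} \vee h_{i-1} t_{i-2}$.
- It forms each sum bit with $s_i = (t_i \oplus h_{i+1}) \vee h_i g_i t_{i-1}$.
- It recovers the carry-out as $c_k = h_k t_{k-1}$.

Two boundary conventions are assumptions made for this sketch: $c_{-1} = 0$, so that $h_0 = c_0$, and $t_{-1} = 1$, so that $c_0 = h_0 t_{-1}$. The function name `ling_add` is hypothetical.

```python
def ling_add(x: int, y: int, k: int, c0: int = 0):
    """k-bit adder that propagates Ling's h_i = c_i OR c_{i-1} instead of c_i."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]      # generate
    t = [((x >> i) | (y >> i)) & 1 for i in range(k)]    # transfer

    def t_at(i):
        return t[i] if i >= 0 else 1       # assumed convention: t_{-1} = 1

    h = [c0] + [0] * k                     # h_0 = c_0 (taking c_{-1} = 0)
    for i in range(1, k + 1):
        h[i] = g[i - 1] | (h[i - 1] & t_at(i - 2))      # h_i = g_{i-1} OR h_{i-1} t_{i-2}
    s = 0
    for i in range(k):
        s_i = (t[i] ^ h[i + 1]) | (h[i] & g[i] & t_at(i - 1))
        s |= s_i << i
    return s, h[k] & t[k - 1]              # c_k = h_k t_{k-1}
```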
Carry Determination as Prefix Computation
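Figure: Combining two adjacent blocks. The less significant block B' has signals (g', p') and the more significant block B" has signals (g", p"). Together they form block B, whose signals are (g, p) = (g', p') ¢ (g", p"), where
$g = g'' \vee g' p'' \qquad p = p' p''$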
Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given $(g_0, p_0)\; (g_1, p_1)\; \ldots\; (g_{k-2}, p_{k-2})\; (g_{k-1}, p_{k-1})$
Find $(g_{[0,0]}, p_{[0,0]})\; (g_{[0,1]}, p_{[0,1]})\; \ldots\; (g_{[0,k-2]}, p_{[0,k-2]})\; (g_{[0,k-1]}, p_{[0,k-1]})$
whose g components are the carries $c_1\; c_2\; \ldots\; c_{k-1}\; c_k$ (taking $c_0 = 0$).
The desired pairs are found by evaluating all prefixes of
$(g_0, p_0) ¢ (g_1, p_1) ¢ \cdots ¢ (g_{k-2}, p_{k-2}) ¢ (g_{k-1}, p_{k-1})$
The carry operator ¢ is associative, but not commutative:
$[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]$
Prefix sums analogy:
Given $x_0\; x_1\; x_2\; \ldots\; x_{k-1}$
Find $x_0,\; x_0 + x_1,\; x_0 + x_1 + x_2,\; \ldots,\; x_0 + x_1 + \cdots + x_{k-1}$
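A sketch of the carry operator ¢, with the lower-order block as the left operand, plus a sequential scan that evaluates all prefixes. With $c_0 = 0$, the g component of each prefix is the next carry. The exhaustive associativity check in the test, together with a concrete pair whose order matters, illustrates the "associative but not commutative" remark. The function names are illustrative.

```python
def carry_op(lo, hi):
    """Carry operator: combine (g', p') of a less significant block with
    (g", p") of the adjacent more significant block: g = g" OR g' p", p = p' p"."""
    g1, p1 = lo
    g2, p2 = hi
    return g2 | (g1 & p2), p1 & p2

def prefix_carries(x: int, y: int, k: int):
    """Sequential scan over (g_i, p_i); g[0,i] equals c_{i+1} when c_0 = 0."""
    acc, out = None, []
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        gp = (xi & yi, xi ^ yi)
        acc = gp if acc is None else carry_op(acc, gp)
        out.append(acc[0])
    return out                     # [c1, c2, ..., ck]

print(carry_op((1, 0), (0, 0)), carry_op((0, 0), (1, 0)))   # (0, 0) (1, 0): order matters
```

Because the operator is associative, the sequential scan can be replaced by any of the parallel prefix networks that follow. Each of them simply groups the ¢ operations differently.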
Example Prefix-Based Carry Network
Figure: A four-input prefix sums network (with its scan order) beside the corresponding four-bit carry lookahead network built of ¢ cells. The latter maps $(g_0, p_0), (g_1, p_1), (g_2, p_2), (g_3, p_3)$ to $(g_{[0,0]}, p_{[0,0]}) = (c_1, \text{--})$, $(g_{[0,1]}, p_{[0,1]}) = (c_2, \text{--})$, $(g_{[0,2]}, p_{[0,2]}) = (c_3, \text{--})$, $(g_{[0,3]}, p_{[0,3]}) = (c_4, \text{--})$.
Alternative Parallel Prefix Networks
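Figure: Parallel prefix sums network built of two k/2-input networks and k/2 adders (Ladner-Fischer). The lower inputs $x_0 \ldots x_{k/2-1}$ and the upper inputs $x_{k/2} \ldots x_{k-1}$ each go through a k/2-input prefix sums network. The last lower output $s_{k/2-1}$ is then added to every upper output to give $s_{k/2} \ldots s_{k-1}$.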
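A recursive sketch of this construction works with any associative operator: integer `+` for prefix sums, or the carry operator for carries. It counts operators and logic levels along the way. For k = 16 it should report 32 operators, which is $(k/2)\log_2 k$, and 4 levels. Non-commutative string concatenation confirms that the lower half is always the left operand. The function name is illustrative.

```python
import operator

def ladner_fischer(xs, op=operator.add):
    """Recursive prefix computation: solve both halves independently (in
    parallel in hardware), then combine the last prefix of the lower half
    with every prefix of the upper half. Returns (prefixes, op_count, depth)."""
    n = len(xs)
    if n == 1:
        return list(xs), 0, 0
    half = n // 2
    lo, c_lo, d_lo = ladner_fischer(xs[:half], op)
    hi, c_hi, d_hi = ladner_fischer(xs[half:], op)
    carry = lo[-1]                       # full prefix of the lower half
    hi = [op(carry, h) for h in hi]      # k/2 extra operators, one extra level
    return lo + hi, c_lo + c_hi + len(hi), max(d_lo, d_hi) + 1

print(ladner_fischer(list("abcd"))[0])   # ['a', 'ab', 'abc', 'abcd']
```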
Brent-Kung Recursive Construction
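Figure: Parallel prefix sums network built of one k/2-input network and k − 1 adders. Adjacent inputs are first added in pairs ($x_0 + x_1$, $x_2 + x_3$, …). The k/2 pair sums go through a k/2-input prefix sums network, giving the odd-indexed outputs $s_1, s_3, \ldots, s_{k-1}$. Each even-indexed output is then $s_{2j} = s_{2j-1} + x_{2j}$ for $j \ge 1$, while $s_0 = x_0$.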
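A sketch of the Brent-Kung recursion for power-of-2 input counts. Each value is tagged with its logic level, so the reported depth is the longest dependency chain rather than the recursion depth. Some even-indexed outputs become ready early and overlap with deeper work. For k = 16 it should report 26 operators ($2k - 2 - \log_2 k$) and 6 levels, matching the 16-input graph that follows. The function name is illustrative.

```python
import operator

def brent_kung(xs, op=operator.add):
    """Brent-Kung prefix computation (len(xs) a power of 2).
    Returns (prefixes, op_count, depth)."""
    ops = 0

    def combine(a, b):                   # a, b are (value, level) pairs
        nonlocal ops
        ops += 1
        return op(a[0], b[0]), max(a[1], b[1]) + 1

    def rec(items):
        n = len(items)
        if n == 1:
            return list(items)
        pairs = [combine(items[2 * j], items[2 * j + 1]) for j in range(n // 2)]
        odd = rec(pairs)                 # prefixes ending at positions 1, 3, 5, ...
        out = [items[0]]
        for j in range(n // 2):
            if j > 0:
                out.append(combine(odd[j - 1], items[2 * j]))   # positions 2, 4, ...
            out.append(odd[j])
        return out

    res = rec([(x, 0) for x in xs])
    return [v for v, _ in res], ops, max(level for _, level in res)
```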
Brent-Kung Carry Network (8-Bit Adder)
Figure: Brent-Kung carry network for an 8-bit adder. ¢ cells map the inputs [7, 7] … [0, 0] to the prefixes [0, 7] … [0, 0], passing through intermediate blocks [0, 1], [2, 3], [4, 5], [6, 7], then [0, 3] and [4, 7].
Brent-Kung Carry Network (16-Bit Adder)
Figure: Brent-Kung parallel prefix graph for 16 inputs ($x_0 \ldots x_{15}$ to $s_0 \ldots s_{15}$, 6 levels).
Kogge-Stone Carry Network (16-Bit Adder)
Figure: Kogge-Stone parallel prefix graph for 16 inputs ($x_0 \ldots x_{15}$ to $s_0 \ldots s_{15}$).
$\log_2 k$ levels (minimum possible)
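A sketch of the Kogge-Stone scheme. At the level with distance d = 1, 2, 4, …, every position i ≥ d combines the value at i − d with its own, using only the previous level's values. For k = 16 it should report 4 levels and $k \log_2 k - k + 1 = 49$ operators. The function name is illustrative.

```python
import operator

def kogge_stone(xs, op=operator.add):
    """Kogge-Stone prefix computation. Returns (prefixes, op_count, levels)."""
    s = list(xs)
    n, dist, ops, levels = len(s), 1, 0, 0
    while dist < n:
        # every update at this level reads only the previous level's values
        s = [s[i] if i < dist else op(s[i - dist], s[i]) for i in range(n)]
        ops += n - dist
        levels += 1
        dist *= 2
    return s, ops, levels

print(kogge_stone(list("abcd"))[0])   # ['a', 'ab', 'abc', 'abcd']
```

Compared with Brent-Kung, every output here is finished after exactly $\log_2 k$ levels, at the price of many more cells and longer wires.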
Speed-Cost Tradeoffs in Carry Networks
Method            Delay              Cost
Ladner-Fischer    log2 k             (k/2) log2 k
Kogge-Stone       log2 k             k log2 k − k + 1
Brent-Kung        2 log2 k − 2       2k − 2 − log2 k
Hybrid B-K/K-S Carry Network (16-Bit Adder)
Figure: A hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs ($x_0 \ldots x_{15}$ to $s_0 \ldots s_{15}$): Brent-Kung levels at the top and bottom, Kogge-Stone levels in the middle.
Brent-Kung: 6 levels, 26 cells
Kogge-Stone: 4 levels, 49 cells
Hybrid: 5 levels, 32 cells
Simple Carry-Skip Adders
Figure: Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks. (a) Ripple-carry adder made of four 4-bit blocks [0,3], [4,7], [8,11], [12,15], with carries $c_0$, $c_4$, $c_8$, $c_{12}$, $c_{16}$. (b) Simple carry-skip adder, in which each block gets skip logic (2 gates) driven by its block propagate signal $p_{[4j,4j+3]}$.
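A behavioral sketch of the simple carry-skip adder. Within each block the carry ripples. The block carry-out is the ripple output ORed with $p_{[\text{block}]} c_{in}$, which is the 2-gate skip logic. Functionally, the skip path never changes the result: when all $p_i = 1$ the ripple output already equals $c_{in}$. Its benefit is purely in delay, which this model does not capture. The parameter names k and b are illustrative.

```python
def carry_skip_add(x: int, y: int, k: int = 16, b: int = 4, c0: int = 0):
    """Simple carry-skip adder with b-bit blocks (behavioral model)."""
    s, c = 0, c0
    for lo in range(0, k, b):
        cin, ripple, p_block = c, c, 1
        for i in range(lo, lo + b):          # ripple-carry stages of this block
            xi, yi = (x >> i) & 1, (y >> i) & 1
            p = xi ^ yi
            s |= (p ^ ripple) << i
            ripple = (xi & yi) | (p & ripple)
            p_block &= p                     # block propagate p[lo, lo+b-1]
        c = ripple | (p_block & cin)         # skip logic: bypass the block if it propagates
    return s, c
```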
Another View of Carry-Skip Addition
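Figure: Street/freeway analogy for carry-skip adder. Within block j, the carry travels a one-way street from $c_{4j}$ through $c_{4j+1}$, $c_{4j+2}$, $c_{4j+3}$ to $c_{4j+4}$, steered by the $(g, p)$ signals of positions 4j to 4j + 3. The skip path is a freeway from $c_{4j}$ directly to $c_{4j+4}$.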
Multilevel Carry-Skip Adders
Figure: One-level carry-skip adder (blocks with S1 skip circuits between $c_{in}$ and $c_{out}$).
Figure: Example of a two-level carry-skip adder (S2 skip circuits spanning groups of S1-level blocks).
Figure: Two-level carry-skip adder optimized by removing the short-block skip circuits.
Using Two-Operand Adders
Some applications of multioperand addition:
Figure: Multioperand addition problems for multiplication or inner-product computation in dot notation. Left: a 4-bit multiplicand a times a 4-bit multiplier x, formed as the sum of the partial products $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$, giving the 8-bit product p. Right: seven 6-bit values $p^{(0)}, \ldots, p^{(6)}$ summed into a 9-bit result s.
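The multiplication case can be sketched directly. Each multiplier bit $x_j$ selects a shifted copy of a, and the product is the multioperand sum of these rows. Here a plain `sum` stands in for whatever adder organization is used. The function names are illustrative.

```python
def partial_products(a: int, x: int, k: int = 4):
    """Rows x_j * a * 2**j of a k-bit by k-bit multiplication."""
    return [((x >> j) & 1) * a << j for j in range(k)]

def multiply(a: int, x: int, k: int = 4) -> int:
    return sum(partial_products(a, x, k))    # multioperand addition of k rows

print(partial_products(0b1011, 0b1101), multiply(0b1011, 0b1101))   # [11, 0, 44, 88] 143
```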
Serial Implementation with One Adder
Figure: Serial implementation of multi-operand addition with a single 2-operand adder. Each k-bit operand $x^{(i)}$ is added to a partial sum register of $k + \log_2 n$ bits holding $\sum_{j=0}^{i-1} x^{(j)}$.
Pipelined Implementation for Higher Throughput
Figure: Serial multi-operand addition when each adder is a 4-stage pipeline. Delay elements align partial sums such as $x^{(i)} + x^{(i-1)}$ with later operands.
Parallel Implementation as Tree of Adders
Figure: Adding 7 numbers in a binary tree of adders. The inputs are k bits wide, and adder widths grow to k + 1, k + 2, and k + 3 bits toward the root.
$\lceil \log_2 n \rceil$ adder levels
n − 1 adders
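A sketch that sums a list level by level with two-operand additions and reports the adder and level counts. For n = 7 it should report 6 adders (n − 1) and 3 levels ($\lceil \log_2 7 \rceil$). The function name `adder_tree` is illustrative.

```python
def adder_tree(values):
    """Sum with a binary tree of two-operand adders. Returns (total, adders, levels)."""
    level, adders, levels = list(values), 0, 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        adders += len(nxt)
        if len(level) % 2:              # an odd operand passes to the next level unchanged
            nxt.append(level[-1])
        level, levels = nxt, levels + 1
    return level[0], adders, levels

print(adder_tree(range(1, 8)))          # (28, 6, 3)
```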
Carry-Save Adders
Figure: Cutting the carry chain of a row of full-adders.
A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
Carry-propagate adder (CPA), with $c_{in}$ and $c_{out}$
Carry-save adder (CSA), also called a (3; 2)-counter or 3-to-2 reduction circuit
Figure: Carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation.
Figure: Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.
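At the word level, a CSA is one full-adder per bit position with no connection between positions. The sum bits stay in place and the carry bits move one position to the left, so no carry propagates. A sketch, with the illustrative name `carry_save_add`:

```python
def carry_save_add(x: int, y: int, z: int):
    """(3; 2) reduction of three words into two, with sum + carry == x + y + z."""
    s = x ^ y ^ z                                  # per-bit sum (parity)
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit carry (majority), weight doubled
    return s, c

s, c = carry_save_add(13, 7, 9)
print(s, c, s + c)                                 # 3 26 29
```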
Multioperand Addition Using Carry-Save Adders
Figure: Tree of carry-save adders reducing seven numbers to two (five CSAs), followed by a carry-propagate adder.
Figure: Serial carry-save addition using a single CSA. Each input is combined with the contents of a sum register and a carry register, and a CPA produces the final output.
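A sketch of the tree reduction:

- At each level, operands are grouped into as many triples as possible.
- Each triple goes through a CSA.
- Leftover operands pass through unchanged.

The grouping rule is one simple Wallace-style choice, assumed here. For seven operands it should use five CSAs in four levels, as in the figure, before the final carry-propagate addition. The function names are illustrative.

```python
def csa(x, y, z):
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def csa_tree(operands):
    """Reduce non-negative integers to two with CSAs, then add them with a CPA.
    Returns (total, csa_count, csa_levels)."""
    ops, n_csa, levels = list(operands), 0, 0
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3          # operands forming complete triples
        for i in range(0, full, 3):
            nxt.extend(csa(*ops[i:i + 3]))
            n_csa += 1
        nxt.extend(ops[full:])                   # 0-2 leftovers pass through
        ops, levels = nxt, levels + 1
    return sum(ops), n_csa, levels               # final carry-propagate addition

print(csa_tree([5, 17, 33, 2, 9, 60, 41]))       # (167, 5, 4)
```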
Example Reduction by a CSA Tree
Figure: Addition of seven 6-bit numbers in dot notation. The reduction uses 12 FAs, then 6 FAs, then 6 FAs, then 4 FAs + 1 HA, then a final 7-bit adder. Total cost = 7-bit adder + 28 FAs + 1 HA.
Representing a seven-operand addition in tabular form (number of dots in each bit position):
Bit position    8  7  6  5  4  3  2  1  0
Initial                  7  7  7  7  7  7    6 × 2 = 12 FAs
Level 1               2  5  5  5  5  5  3    6 FAs
Level 2               3  4  4  4  4  4  1    6 FAs
Level 3            1  2  3  3  3  3  2  1    4 FAs + 1 HA
Level 4            2  2  2  2  2  1  2  1    7-bit adder
--Carry-propagate adder--
Result          1  1  1  1  1  1  1  1  1
A full-adder compacts 3 dots into 2 (compression ratio of 1.5).
A half-adder rearranges 2 dots (no compression, but still useful).
Width of Adders in a CSA Tree
Figure: Adding seven k-bit numbers and the CSA/CPA widths required. Five k-bit CSAs are followed by a k-bit CPA. The operands occupy bit positions [0, k−1], intermediate sum and carry words shift to [1, k] and [2, k+1], and the final result spans [0, k+2].
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.
The index pair [i, j] means that bit positions from i up to j are involved.
Wallace Tree Multiplier
Dadda Tree Multiplier
Saturating Adders
Saturating (saturation) arithmetic:
When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format.
Saturating arithmetic is desirable in many DSP applications.
Designing saturating adders:
Figure: The adder's overflow signal drives a 2-to-1 mux that selects either the sum (input 0) or the saturation value (input 1).
Unsigned (quite easy)
Signed (slightly harder)
Example: in 8-bit 2's-complement format, 120 + 26 → −110 (wraparound, since 146 − 256 = −110); 120 +sat 26 → 127 (saturating).
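A sketch of both cases in 8 bits.

- Unsigned: a carry out means overflow, and the all-ones value is selected.
- Signed: overflow can only occur when both operands have the same sign. The saturation value is then chosen by that sign, which is the extra logic that makes the signed case slightly harder.

Operands are assumed to already lie in the 8-bit range. The function names are illustrative.

```python
def wrap_add(x: int, y: int, n: int = 8) -> int:
    """n-bit two's-complement addition with wraparound."""
    s = (x + y) & ((1 << n) - 1)
    return s - (1 << n) if s >> (n - 1) else s

def sat_add_signed(x: int, y: int, n: int = 8) -> int:
    """Signed saturating addition for operands in [-2**(n-1), 2**(n-1) - 1]."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    s = wrap_add(x, y, n)
    if x >= 0 and y >= 0 and s < 0:     # positive overflow
        return hi
    if x < 0 and y < 0 and s >= 0:      # negative overflow
        return lo
    return s

def sat_add_unsigned(x: int, y: int, n: int = 8) -> int:
    """Unsigned saturating addition: a carry out selects the all-ones value."""
    s = x + y
    return (1 << n) - 1 if s >> n else s

print(wrap_add(120, 26), sat_add_signed(120, 26))   # -110 127
```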
Readings:
Main reference for the above slides: Chapters 5, 6, 7, & 8, B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.