Programmable Logic Circuits: Computer Arithmetic: Introduction

Programmable Logic Circuits:

Computer Arithmetic: Introduction

Dr. Eng. Amr T. Abdel-Hamid

ELECT 90X

Fall 2009

Slides based on slides prepared by:
• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.

Dr. Amr Talaat, ELECT 90X, Programmable Logic Circuits

What is Computer Arithmetic?

Pentium Division Bug (1994-95): Pentium’s radix-4 SRT algorithm occasionally gave an incorrect quotient. First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:

1/5 + 1/7 + 1/11 + 1/13 + . . . + 1/p + 1/(p + 2) + . . .

Worst-case example of division error in Pentium:

A Motivating Example

Using a calculator with √, x², and x^y functions, compute:

u = √√ … √2 = 1.000 677 131 (“1024th root of 2”, i.e., ten successive square roots)
v = 2^(1/1024) = 1.000 677 131

Save u and v; if you can’t save, recompute values when needed.

x = (((u²)²)...)² = 1.999 999 963
x' = u^1024 = 1.999 999 973
y = (((v²)²)...)² = 1.999 999 983
y' = v^1024 = 1.999 999 994

Perhaps v and u are not really the same value:

w = v – u = 1 × 10^–11 (nonzero due to hidden digits)
(u – 1) × 1000 = 0.677 130 680 [Hidden ... (0) 68]
(v – 1) × 1000 = 0.677 130 690 [Hidden ... (0) 69]
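The calculator experiment can be imitated in software. The following is a minimal sketch, assuming a hypothetical calculator that keeps 10 significant decimal digits (the `digits` parameter and the function name are illustrative, not taken from the slides). The exact digits it prints depend on that assumed precision, so they need not match the values quoted above.

```python
from decimal import Decimal, localcontext

def root_and_power_experiment(digits=10):
    """Repeat the 1024th-root experiment with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits              # assumed calculator precision
        u = Decimal(2)
        for _ in range(10):            # ten square roots give 2 ** (1/1024)
            u = u.sqrt()
        v = Decimal(2) ** (Decimal(1) / Decimal(1024))
        x, y = u, v
        for _ in range(10):            # ten squarings raise to the 1024th power
            x = x * x
            y = y * y
    return u, v, x, y

if __name__ == "__main__":
    u, v, x, y = root_and_power_experiment()
    print("u =", u, " v =", v)
    print("x =", x, " y =", y)
```

Rounding after every step is what makes x and y drift away from 2; raising `digits` shrinks the drift but does not remove it.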

Finite Range Can Lead to Disaster

Example: Explosion of Ariane Rocket (1996 June 4)

Unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m)

The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion

Cause: “software error in the inertial reference system”

Problem specifics: A 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer.

An SRI* software exception arose during conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).

*SRI = Inertial Reference System
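The failure mode is easy to reproduce in miniature. The sketch below is an illustration only, not the flight software: the function names are hypothetical, and it simply contrasts a conversion that raises on out-of-range input with one that clamps to the 16-bit limits.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1    # -32768 .. 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    n = int(value)                           # truncates toward zero
    if n < INT16_MIN or n > INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n

def to_int16_saturating(value: float) -> int:
    """Convert to a 16-bit signed integer, clamping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

Whether to raise, clamp, or widen the target type is a design decision; the point is that the narrowing conversion needs an explicit policy for values above 32 767.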

Encoding Numbers in 4 Bits

Some of the possible ways of assigning 16 distinct codes to represent numbers.

[Figure: the values representable in each number format, marked on a number line from –16 to 16]

Number format:
• Unsigned integers
• Signed-magnitude
• 3 + 1 fixed-point, xxx.x
• Signed fraction, ±.xxx
• 2’s-compl. fraction, x.xxx
• 2 + 2 floating-point, s × 2^e, e in [–2, 1], s in [0, 3]
• 2 + 2 logarithmic (log = xx.xx)
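To make the formats concrete, the sketch below decodes one 4-bit code under five of them. The field order for the floating-point format (two exponent bits, biased by 2, followed by two significand bits) is an assumption made for illustration; the slide does not fix it.

```python
def decode_all(code: int) -> dict:
    """Interpret one 4-bit code (0..15) under several 4-bit number formats."""
    sign, low3 = code >> 3, code & 0b111
    exp_field, sig = code >> 2, code & 0b11      # assumed layout: e e s s
    return {
        "unsigned": code,
        "signed-magnitude": -low3 if sign else low3,
        "3+1 fixed-point": code / 2,                          # xxx.x
        "2's-compl. fraction": (code - 16 if sign else code) / 8,   # x.xxx
        "2+2 floating-point": sig * 2.0 ** (exp_field - 2),   # e in [-2, 1]
    }

if __name__ == "__main__":
    for code in range(16):
        print(format(code, "04b"), decode_all(code))
```

Several codes map to the same floating-point value (every code with s = 0 is zero), so that format covers fewer than 16 distinct numbers.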

The Binary Number System

In conventional digital computers, integers are represented as binary numbers of fixed length n.

An ordered sequence of n binary digits: $(x_{n-1}, x_{n-2}, \ldots, x_1, x_0)$

Each digit $x_i$ (bit) is 0 or 1.

The above sequence represents the integer value

$X = \sum_{i=0}^{n-1} x_i 2^i$

Upper case letters represent numerical values or sequences of digits.

Lower case letters, usually indexed, represent individual digits.

Radix of a Number System

The weight of the digit $x_i$ is the $i$-th power of 2.

2 is the radix of the binary number system.

Binary numbers are radix-2 numbers: allowed digits are 0, 1.

Decimal numbers are radix-10 numbers: allowed digits are 0, 1, 2, …, 9.

Radix indicated in subscript as a decimal number. Example:

$(101)_{10}$ - decimal value 101

$(101)_2$ - decimal value 5

Range of Representations

Operands and results are stored in registers of fixed length n - finite number of distinct values that can be represented within an arithmetic unit

Xmin ; Xmax - smallest and largest representable values

[Xmin,Xmax] - range of the representable numbers

A result larger than Xmax or smaller than Xmin is incorrectly represented

The arithmetic unit should indicate that the generated result is in error - an overflow indication

Example - Overflow in Binary System

Unsigned integers with 5 binary digits (bits):

$X_{max} = (31)_{10}$, represented by $(11111)_2$
$X_{min} = (0)_{10}$, represented by $(00000)_2$

Increasing $X_{max}$ by 1 gives $(32)_{10} = (100000)_2$. In a 5-bit representation only the last five digits are retained, yielding $(00000)_2 = (0)_{10}$.

In general, a number X not in the range $[X_{min}, X_{max}] = [0, 31]$ is represented by X mod 32. If X+Y exceeds $X_{max}$, the result is S = (X+Y) mod 32.

Example:

      X     10001    17
    + Y     10010    18
          1 00011     3 = 35 mod 32

The result has to be stored in a 5-bit register, so the most significant bit (with weight $2^5 = 32$) is discarded.
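The wraparound in the example can be checked with a few lines of code. This is a minimal sketch; the function name and the explicit overflow flag are illustrative additions.

```python
def add_unsigned(x: int, y: int, n: int = 5):
    """Add two n-bit unsigned integers; return (stored sum, overflow flag)."""
    full = x + y
    stored = full % (1 << n)       # only the n low-order bits fit in the register
    overflow = (full >> n) != 0    # a discarded carry-out signals overflow
    return stored, overflow

if __name__ == "__main__":
    print(add_unsigned(17, 18))    # the slide's example: 35 mod 32
```

Returning the flag alongside the sum mirrors the overflow indication that the arithmetic unit is expected to provide.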

Fixed Radix Systems

r is the radix of the number system.

Conventional number systems are also called fixed-radix systems.

With no redundancy: $0 \le x_i \le r-1$

Allowing $x_i \ge r$ introduces redundancy into the fixed-radix number system. How?

If $x_i \ge r$ is allowed, there are two machine representations for the same value:

$(\ldots, x_{i+1}, x_i, \ldots)$ and $(\ldots, x_{i+1}+1, x_i - r, \ldots)$

Representation of Mixed Numbers

A sequence of n digits in a register does not necessarily represent an integer.

It can represent a mixed number with a fractional part and an integral part.

The n digits are partitioned into two: k in the integral part and m in the fractional part (k+m=n).

The value of an n-tuple with a radix point between the k most significant digits and the m least significant digits, $(x_{k-1} x_{k-2} \cdots x_1 x_0 \,.\, x_{-1} x_{-2} \cdots x_{-m})_r$, is

$X = \sum_{i=-m}^{k-1} x_i r^i$
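The positional value of a digit string with a radix point can be evaluated directly from the weights $r^i$, with i running from k–1 down to –m. The sketch below uses exact fractions; the function name and argument order are illustrative.

```python
from fractions import Fraction

def mixed_value(digits, k, r=2):
    """Value of a digit list (most significant first) with k integral digits.

    The remaining len(digits) - k digits form the fractional part.
    """
    total = Fraction(0)
    for pos, d in enumerate(digits):
        i = k - 1 - pos                 # weight exponent of this digit
        total += d * Fraction(r) ** i
    return total

if __name__ == "__main__":
    print(mixed_value([1, 0, 1, 1], k=3))   # binary 101.1
```

With k = n the same function evaluates a pure integer, and with k = 0 a pure fraction.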

Fixed Point Representations

Radix point not stored in register; it is understood to be in a fixed position between the k most significant digits and the m least significant digits. These are called fixed-point representations.

Programmer not restricted to the predetermined position of the radix point: operands can be scaled, with the same scaling for all operands.

Add and subtract operations are correct: $aX \pm aY = a(X \pm Y)$ (a is the scaling factor)

Corrections required for multiplication and division: $aX \cdot aY = a^2 \cdot X \cdot Y$; $aX / aY = X / Y$

Commonly used positions for the radix point:
• rightmost side of the number (pure integers, m=0)
• leftmost side of the number (pure fractions, k=0)

ULP - Unit in Last Position

Given the length n of the operands, the weight $r^{-m}$ of the least significant digit indicates the position of the radix point.

Unit in the last position (ulp): the weight of the least significant digit, $ulp = r^{-m}$

This notation simplifies the discussion: no need to distinguish between the different partitions of numbers into fractional and integral parts.

Representation of Negative Numbers

Fixed-point numbers in a radix r system. Two ways of representing negative numbers:
• Sign and magnitude representation (or signed-magnitude representation)
• Complement representation, with two alternatives:
  - Radix complement (two's complement in the binary system)
  - Diminished-radix complement (one's complement in the binary system)

Signed-Magnitude Representation

Sign and magnitude are represented separately.

First digit is the sign digit; the remaining n-1 digits represent the magnitude.

Binary case: sign bit is 0 for positive, 1 for negative numbers.

Non-binary case: 0 and r-1 indicate positive and negative numbers.

Only $2r^{n-1}$ out of the $r^n$ possible sequences are utilized.

Two representations for zero: positive and negative.

Inconvenient when implementing an arithmetic unit: when testing for zero, the two different representations must be checked.

Disadvantage of the Signed-Magnitude Representation

Operation may depend on the signs of the operands.

Example: adding a positive number X and a negative number -Y: X+(-Y)

If Y>X, final result is -(Y-X). Calculation:
• switch order of operands
• perform subtraction rather than addition
• attach the minus sign

A sequence of decisions must be made, costing excess control logic and execution time

This is avoided in the complement representation methods

Complement Representations of Negative Numbers

Two alternatives:

Radix complement (called two's complement in the binary system)

Diminished-radix complement (called one's complement in the binary system)

In both complement methods - positive numbers represented as in the signed-magnitude method

A negative number -Y is represented by R-Y where R is a constant

This representation satisfies -(-Y )=Y since R-(R-Y)=Y

Advantage of Complement Representation

No decisions made before executing addition or subtraction

Example: X-Y=X+(-Y)

-Y is represented by R-Y

Addition is performed by X+(R-Y) = R-(Y-X)

If Y>X, -(Y-X) is already represented as R-(Y-X)

No need to interchange the order of the two operands

Two’s Complement

r=2, k=n=4, m=0, $ulp = 2^0 = 1$

Radix complement (called two's complement in the binary case) of a number X is $2^4 - X$. It can instead be calculated by $\bar{X} + 1$.

0000 to 0111 represent positive numbers $0_{10}$ to $7_{10}$.

The two's complement of 0111 is 1000+1=1001; it represents the value $(-7)_{10}$.

The two's complement of 0000 is 1111+1=10000=0 mod $2^4$: a single representation of zero.

Each positive number has a corresponding negative number that starts with a 1.

1000 representing $(-8)_{10}$ has no corresponding positive number.

Range of representable numbers is $-8 \le X \le 7$.

The Two’s Complement Representation

Example - Addition in Two’s complement

Calculating X+(-Y) with Y>X: 3+(-5)

      0011     3
    + 1011    -5
      1110    -2

Correct result represented in the two's complement method; no need for preliminary decisions or post corrections.

Calculating X+(-Y) with X>Y: 5+(-3)

      0101     5
    + 1101    -3
    1 0010     2

Only the four least significant digits are retained, yielding 0010.
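Both additions above can be reproduced with modular arithmetic, since discarding the carry-out is the same as reducing mod $2^n$. The helper names in this sketch are illustrative.

```python
def to_twos(value: int, n: int = 4) -> int:
    """Encode a signed integer as an n-bit two's-complement pattern."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise OverflowError(f"{value} is outside the {n}-bit range")
    return value % (1 << n)            # a negative -Y maps to 2**n - Y

def from_twos(pattern: int, n: int = 4) -> int:
    """Decode an n-bit two's-complement pattern into a signed integer."""
    return pattern - (1 << n) if pattern >> (n - 1) else pattern

def add_twos(a: int, b: int, n: int = 4) -> int:
    """Add two patterns, discarding the carry out of the top bit."""
    return (a + b) % (1 << n)

if __name__ == "__main__":
    s = add_twos(to_twos(3), to_twos(-5))
    print(format(s, "04b"), from_twos(s))
```

No sign test is made before the addition; the same `add_twos` handles both the Y>X and the X>Y case.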

One’s Complement in Binary System

r=2, k=n=4, m=0, $ulp = 2^0 = 1$

Diminished-radix complement (called one's complement in the binary case) of a number X is $(2^4 - 1) - X = \bar{X}$.

As before, the sequences 0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$.

The one's complement of 0111 is 1000, representing $(-7)_{10}$.

The one's complement of zero is 1111: two representations of zero.

Range of representable numbers is $-7 \le X \le 7$.

Comparing the Three Representations in a Binary System
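A comparison of the three representations can be generated by reading every 4-bit pattern under each of them. The sketch below follows the definitions from the preceding slides (sign bit plus magnitude, $2^n - X$, and $(2^n - 1) - X$); the function name is illustrative.

```python
def interpret(pattern: int, n: int = 4):
    """Return (signed-magnitude, two's-complement, one's-complement) values."""
    sign = pattern >> (n - 1)
    magnitude = pattern & ((1 << (n - 1)) - 1)
    signed_mag = -magnitude if sign else magnitude
    twos = pattern - (1 << n) if sign else pattern
    ones = pattern - ((1 << n) - 1) if sign else pattern
    return signed_mag, twos, ones

if __name__ == "__main__":
    print("bits  sign-mag  two's  one's")
    for p in range(16):
        sm, tc, oc = interpret(p)
        print(f"{p:04b}  {sm:8d}  {tc:5d}  {oc:5d}")
```

The printout shows the two zeros of signed-magnitude (0000, 1000) and one's complement (0000, 1111), and the extra value –8 in two's complement.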

5.1 Bit-Serial and Ripple-Carry Adders

Half-adder (HA): Truth table and block diagram

    Inputs    Outputs
     x  y      c  s
    -----------------
     0  0      0  0
     0  1      0  1
     1  0      0  1
     1  1      1  0

Full-adder (FA): Truth table and block diagram

    Inputs          Outputs
     x  y  c_in     c_out  s
    -------------------------
     0  0  0        0      0
     0  0  1        0      1
     0  1  0        0      1
     0  1  1        1      0
     1  0  0        0      1
     1  0  1        1      0
     1  1  0        1      0
     1  1  1        1      1

Half-Adder Implementations

Three implementations of a half-adder.

(a) AND/XOR half-adder.
(b) NOR-gate half-adder.
(c) NAND-gate half-adder with complemented carry.

Full-Adder Implementations

Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates.

(a) Built of half-adders.
(b) Built as an AND-OR circuit.
(c) Suitable for CMOS realization.

Full-Adder Details

CMOS transmission gate and its use in a 2-to-1 mux.

(a) CMOS transmission gate: circuit and symbol
(b) Two-input mux built of two transmission gates

Logic equations for a full-adder:

$s = x \oplus y \oplus c_{in}$ (odd parity function)
$\;\; = x\,y\,c_{in} \lor \bar{x}\,\bar{y}\,c_{in} \lor \bar{x}\,y\,\bar{c}_{in} \lor x\,\bar{y}\,\bar{c}_{in}$

$c_{out} = x\,y \lor x\,c_{in} \lor y\,c_{in}$ (majority function)
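The parity and majority equations can be checked against ordinary addition. The sketch below also builds the full-adder from two half-adders and an OR gate, as in design (a) of the previous slide; the function names are illustrative.

```python
from itertools import product

def half_adder(x: int, y: int):
    """Return (carry, sum) for two input bits."""
    return x & y, x ^ y

def full_adder(x: int, y: int, c_in: int):
    """Return (c_out, s) using the parity and majority equations."""
    s = x ^ y ^ c_in                               # odd parity
    c_out = (x & y) | (x & c_in) | (y & c_in)      # majority
    return c_out, s

def full_adder_from_half_adders(x: int, y: int, c_in: int):
    """Same function built from two half-adders and an OR gate."""
    c1, s1 = half_adder(x, y)
    c2, s = half_adder(s1, c_in)
    return c1 | c2, s

if __name__ == "__main__":
    for x, y, c in product((0, 1), repeat=3):
        print(x, y, c, full_adder(x, y, c))
```

Iterating over all eight input combinations reproduces the full-adder truth table.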

Simple Adders Built of Full-Adders

Using full-adders in building bit-serial and ripple-carry adders.

(a) Bit-serial adder: one FA with a carry flip-flop, fed by shift registers for x and y and writing into a shift register for s, all driven by a common clock.
(b) Ripple-carry adder: a chain of 32 FAs for bit positions 0 to 31, with $c_0 = c_{in}$ entering the first stage and $c_{32} = c_{out}$ leaving the last.

Critical Path Through a Ripple-Carry Adder

Critical path in a k-bit ripple-carry adder.

$T_{\text{ripple-add}} = T_{FA}(x, y \to c_{out}) + (k - 2)\,T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$

Binary Adders as Versatile Building Blocks

Four-bit binary adder used to realize the logic function f = w + xyz and its complement.

Set one input to 0: cout = AND of other inputs

Set one input to 1: cout = OR of other inputs

Set one input to 0 and another to 1: s = NOT of third input

Conditions and Exceptions

Two’s-complement adder with provisions for detecting conditions and exceptions.

The adder is a chain of full-adders for bit positions 0 to k–1, with three condition outputs: Overflow, Negative, and Zero.

$\text{overflow}_{2\text{'s-compl}} = c_k \oplus c_{k-1} = c_k\,\bar{c}_{k-1} \lor \bar{c}_k\,c_{k-1}$
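The overflow rule compares the carry into the sign position ($c_{k-1}$) with the carry out of it ($c_k$). The sketch below computes both carries for k-bit two's-complement patterns and derives the three flags; the function name and the dictionary layout are illustrative.

```python
def add_with_flags(x: int, y: int, k: int = 4):
    """Add two k-bit two's-complement patterns; return (sum pattern, flags)."""
    mask = (1 << k) - 1
    low_mask = mask >> 1                       # the k-1 low-order bits
    c_km1 = ((x & low_mask) + (y & low_mask)) >> (k - 1)   # carry into sign bit
    total = (x & mask) + (y & mask)
    c_k = total >> k                           # carry out of sign bit
    s = total & mask
    flags = {
        "overflow": c_k ^ c_km1,
        "negative": s >> (k - 1),
        "zero": int(s == 0),
    }
    return s, flags

if __name__ == "__main__":
    print(add_with_flags(0b0101, 0b0100))      # 5 + 4 does not fit in 4 bits
```

Comparing against signed arithmetic over all 4-bit operand pairs confirms that the XOR of the two carries is set exactly when the true sum lies outside –8 to 7.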

Manchester Carry Chains and Adders

Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$

Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$

Computing the carries $c_i$ is thus our central problem. For this, the actual operand digits are not important. What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed).

For binary addition:

$g_i = x_i\,y_i \qquad p_i = x_i \oplus y_i \qquad a_i = \bar{x}_i\,\bar{y}_i = \overline{x_i \lor y_i}$

It is also helpful to define a transfer signal:

$t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$

Using these signals, the carry recurrence is written as

$c_{i+1} = g_i \lor c_i\,p_i = g_i \lor c_i\,g_i \lor c_i\,p_i = g_i \lor c_i\,t_i$

Carry Network is the Essence of a Fast Adder

The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

[Figure: the operand bits $x_i$, $y_i$ produce $g_i$ and $p_i$; the carry network takes all $(g_i, p_i)$ pairs and $c_0$ and produces $c_1$ through $c_k$; each sum bit $s_i$ is formed from $p_i$ and $c_i$]

    g_i  p_i   Carry is:
     0    0    annihilated or killed
     0    1    propagated
     1    0    generated
     1    1    (impossible)

$g_i = x_i\,y_i \qquad p_i = x_i \oplus y_i$

Ripple; Skip; Lookahead; Parallel-prefix

Ripple-Carry Adder Revisited

The carry propagation network of a ripple-carry adder.

The carry recurrence: $c_{i+1} = g_i \lor p_i\,c_i$

Latency of k-bit adder is roughly 2k gate delays:
• 1 gate delay for production of p and g signals, plus
• 2(k – 1) gate delays for carry propagation, plus
• 1 XOR gate delay for generation of the sum bits
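The recurrence translates directly into a bit-by-bit loop. This is a behavioral sketch of a ripple-carry adder built on the g and p signals, not a gate-level model; the function name and return layout are illustrative.

```python
def ripple_carry_add(x: int, y: int, k: int, c_in: int = 0):
    """Add two k-bit operands; return (sum, c_out, list of carries c_0..c_k)."""
    carries = [c_in]
    s = 0
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p = xi & yi, xi ^ yi                  # generate and propagate
        s |= (p ^ carries[i]) << i               # s_i = p_i XOR c_i
        carries.append(g | (p & carries[i]))     # c_{i+1} = g_i OR p_i c_i
    return s, carries[k], carries

if __name__ == "__main__":
    print(ripple_carry_add(0b10001, 0b10010, 5))
```

Each loop iteration depends on the carry produced by the previous one, which is the software counterpart of the roughly 2k gate delays on the critical path.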

The Complete Design of a Ripple-Carry Adder

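Unrolling the Carry Recurrence

Recall the generate, propagate, annihilate (absorb), and transfer signals:

    Signal   Radix r                              Binary
    g_i      is 1 iff $x_i + y_i \ge r$            $x_i\,y_i$
    p_i      is 1 iff $x_i + y_i = r - 1$          $x_i \oplus y_i$
    a_i      is 1 iff $x_i + y_i < r - 1$          $\bar{x}_i\,\bar{y}_i = \overline{x_i \lor y_i}$
    t_i      is 1 iff $x_i + y_i \ge r - 1$        $x_i \lor y_i$
    s_i      $(x_i + y_i + c_i) \bmod r$           $x_i \oplus y_i \oplus c_i$

The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation:

$c_i = g_{i-1} \lor c_{i-1}\,p_{i-1}$

$\;\; = g_{i-1} \lor (g_{i-2} \lor c_{i-2}\,p_{i-2})\,p_{i-1}$

$\;\; = g_{i-1} \lor g_{i-2}\,p_{i-1} \lor c_{i-2}\,p_{i-2}\,p_{i-1}$

$\;\; = g_{i-1} \lor g_{i-2}\,p_{i-1} \lor g_{i-3}\,p_{i-2}\,p_{i-1} \lor c_{i-3}\,p_{i-3}\,p_{i-2}\,p_{i-1}$

$\;\; = g_{i-1} \lor g_{i-2}\,p_{i-1} \lor g_{i-3}\,p_{i-2}\,p_{i-1} \lor g_{i-4}\,p_{i-3}\,p_{i-2}\,p_{i-1} \lor c_{i-4}\,p_{i-4}\,p_{i-3}\,p_{i-2}\,p_{i-1}$

$\;\; = \ldots$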

Full Carry Lookahead

Theoretically, it is possible to derive each sum digit directly from the inputs that affect it

Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits

Four-Bit Carry-Lookahead Adder

Complexity reduced by deriving the carry-out indirectly

Four-bit carry network with full lookahead.

Full carry lookahead is quite practical for a 4-bit adder:

$c_1 = g_0 \lor c_0\,p_0$

$c_2 = g_1 \lor g_0\,p_1 \lor c_0\,p_0\,p_1$

$c_3 = g_2 \lor g_1\,p_2 \lor g_0\,p_1\,p_2 \lor c_0\,p_0\,p_1\,p_2$

$c_4 = g_3 \lor g_2\,p_3 \lor g_1\,p_2\,p_3 \lor g_0\,p_1\,p_2\,p_3 \lor c_0\,p_0\,p_1\,p_2\,p_3$
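The four lookahead equations can be written out literally and checked against ordinary addition. In this sketch every carry is computed directly from the g and p signals and $c_0$, with no carry feeding another; the function names are illustrative.

```python
def cla4_carries(g, p, c0):
    """Carries c0..c4 from four (g, p) bits, index 0 least significant."""
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c0, c1, c2, c3, c4]

def cla4_add(x: int, y: int, c0: int = 0):
    """Add two 4-bit operands with full carry lookahead; return (sum, c4)."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    c = cla4_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]

if __name__ == "__main__":
    print(cla4_add(0b0101, 0b1101))
```

The widest term in `c4` has five inputs, which is why the same approach becomes impractical for much wider adders.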

Carry Lookahead Beyond 4 Bits

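Consider a 32-bit adder:

$c_1 = g_0 \lor c_0\,p_0$

$c_2 = g_1 \lor g_0\,p_1 \lor c_0\,p_0\,p_1$

$c_3 = g_2 \lor g_1\,p_2 \lor g_0\,p_1\,p_2 \lor c_0\,p_0\,p_1\,p_2$

. . .

$c_{31} = g_{30} \lor g_{29}\,p_{30} \lor g_{28}\,p_{29}\,p_{30} \lor g_{27}\,p_{28}\,p_{29}\,p_{30} \lor \ldots \lor c_0\,p_0\,p_1\,p_2\,p_3 \cdots p_{29}\,p_{30}$

The last term of $c_{31}$ needs a 32-input AND, and the expression as a whole needs a 32-input OR.

High fan-ins necessitate tree-structured circuits.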

Solutions to the Fan-in Problem

• Multilevel lookahead
• Block adders
• High-radix addition (i.e., radix $2^h$): increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)

Example: 16-bit addition
• Radix-16 (four digits)
• Two-level carry lookahead (four 4-bit blocks)

Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first:

c16 c15 c14 c13 c12 c11 c10 c9 c8 c7 c6 c5 c4 c3 c2 c1 c0
(c16 = cout; c12, c8, c4 = ?; c0 = cin)

Block Ripple Adder

Larger Carry-Lookahead Adder Design

Block generate and propagate signals:

$g_{[i,i+3]} = g_{i+3} \lor g_{i+2}\,p_{i+3} \lor g_{i+1}\,p_{i+2}\,p_{i+3} \lor g_i\,p_{i+1}\,p_{i+2}\,p_{i+3}$

$p_{[i,i+3]} = p_i\,p_{i+1}\,p_{i+2}\,p_{i+3}$

• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and it can be propagated to the MSB, the block generates a carry.

[Figure: 4-bit lookahead carry generator. Inputs: $c_i$ and the pairs $(g_i, p_i)$ through $(g_{i+3}, p_{i+3})$. Outputs: $c_{i+1}$, $c_{i+2}$, $c_{i+3}$, $g_{[i,i+3]}$, and $p_{[i,i+3]}$.]

A Building Block for Carry-Lookahead Addition

Four-bit lookahead carry generator.

[Figure: the generator has two parts, block signal generation (producing $g_{[i,i+3]}$ and $p_{[i,i+3]}$) and intermediate carries (producing $c_{i+1}$, $c_{i+2}$, and $c_{i+3}$ from $c_i$ and the pairs $(g_i, p_i)$ through $(g_{i+3}, p_{i+3})$).]

Combining Block g and p Signals

Combining of g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block

A Two-Level Carry-Lookahead Adder

Building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators.

Ling Adder and Related Designs

Consider the carry recurrence and its unrolling by 4 steps:

$c_i = g_{i-1} \lor c_{i-1}\,t_{i-1}$

$\;\; = g_{i-1} \lor g_{i-2}\,t_{i-1} \lor g_{i-3}\,t_{i-2}\,t_{i-1} \lor g_{i-4}\,t_{i-3}\,t_{i-2}\,t_{i-1} \lor c_{i-4}\,t_{i-4}\,t_{i-3}\,t_{i-2}\,t_{i-1}$

Ling’s modification: propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$

$h_i = g_{i-1} \lor h_{i-1}\,t_{i-2}$

$\;\; = g_{i-1} \lor g_{i-2} \lor g_{i-3}\,t_{i-2} \lor g_{i-4}\,t_{i-3}\,t_{i-2} \lor h_{i-4}\,t_{i-4}\,t_{i-3}\,t_{i-2}$

    Design   Gates   Max inputs   Gate inputs
    CLA      5       5            19
    Ling     4       5            14

The advantage of $h_i$ over $c_i$ is even greater with wired-OR:

    Design   Gates   Max inputs   Gate inputs
    CLA      4       5            14
    Ling     3       4            9

Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:

$s_i = (t_i \oplus h_{i+1}) \lor h_i\,g_i\,t_{i-1}$

Carry Determination as Prefix Computation

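[Figure: blocks B' (signals g', p') and B'' (signals g'', p''), with B'' covering the more significant positions, merge into block B (signals g, p) through the carry operator ¢.]

(g, p) = (g', p') ¢ (g'', p'')

$g = g'' \lor g'\,p''$

$p = p'\,p''$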

Formulating the Prefix Computation Problem

The problem of carry determination can be formulated as:

Given (g0, p0) (g1, p1) . . . (gk–2, pk–2) (gk–1, pk–1)
Find (g[0,0], p[0,0]) (g[0,1], p[0,1]) . . . (g[0,k–2], p[0,k–2]) (g[0,k–1], p[0,k–1])

The g components of the results are the carries c1, c2, . . . , ck–1, ck.

The desired pairs are found by evaluating all prefixes of
(g0, p0) ¢ (g1, p1) ¢ . . . ¢ (gk–2, pk–2) ¢ (gk–1, pk–1)

The carry operator ¢ is associative, but not commutative:

[(g1, p1) ¢ (g2, p2)] ¢ (g3, p3) = (g1, p1) ¢ [(g2, p2) ¢ (g3, p3)]

Prefix sums analogy:

Given x0 x1 x2 . . . xk–1
Find x0 x0+x1 x0+x1+x2 . . . x0+x1+...+xk–1
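The prefix formulation can be exercised in software. The sketch below defines the carry operator on (g, p) pairs, using $g = g'' \lor g'p''$ and $p = p'p''$ with the less significant block as the left operand. It evaluates all prefixes twice: serially, and with a doubling scheme in the style of a Kogge-Stone network. It assumes $c_0 = 0$, and all function names are illustrative.

```python
def carry_op(low, high):
    """(g', p') ¢ (g'', p''): `low` covers the less significant positions."""
    g1, p1 = low
    g2, p2 = high
    return g2 | (g1 & p2), p1 & p2

def serial_prefix(pairs):
    """All prefixes, computed one after another (ripple-like)."""
    out = [pairs[0]]
    for gp in pairs[1:]:
        out.append(carry_op(out[-1], gp))
    return out

def kogge_stone_prefix(pairs):
    """All prefixes in about log2(k) rounds, doubling the span each round."""
    out = list(pairs)
    d = 1
    while d < len(out):
        out = [out[i] if i < d else carry_op(out[i - d], out[i])
               for i in range(len(out))]
        d *= 2
    return out

def carries(x: int, y: int, k: int):
    """Carries c_1..c_k of x + y, assuming c_0 = 0."""
    pairs = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1)
             for i in range(k)]
    return [g for g, _ in kogge_stone_prefix(pairs)]

if __name__ == "__main__":
    print(carries(0b0101, 0b1101, 4))
```

Both evaluation orders give the same result because ¢ is associative; operand order inside `carry_op` still matters because it is not commutative.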

Example Prefix-Based Carry Network

[Figure: a four-input prefix sums network (built of adders) next to a four-bit carry lookahead network (built of ¢ cells), both in scan order. The carry network takes (g0, p0), (g1, p1), (g2, p2), (g3, p3) and produces:]

(g[0,0], p[0,0]) = (c1, --)
(g[0,1], p[0,1]) = (c2, --)
(g[0,2], p[0,2]) = (c3, --)
(g[0,3], p[0,3]) = (c4, --)

Alternative Parallel Prefix Networks

Parallel prefix sums network built of two k/2-input networks and k/2 adders. (Ladner-Fischer)

Brent-Kung Recursive Construction

Parallel prefix sums network built of one k/2-input network and k – 1 adders.

Brent-Kung Carry Network (8-Bit Adder)

[Figure: eight-input Brent-Kung network of ¢ cells. Inputs are the (g, p) pairs for [7, 7] down to [0, 0]; intermediate results cover [6, 7], [4, 5], [2, 3], [0, 1], then [4, 7] and [0, 3]; outputs are the pairs for [0, 7] down to [0, 0].]

Brent-Kung Carry Network (16-Bit Adder)

Brent-Kung parallel prefix graph for 16 inputs.

Reason for latency

Kogge-Stone Carry Network (16-Bit Adder)

Kogge-Stone parallel prefix graph for 16 inputs.

$\log_2 k$ levels (minimum possible)

Speed-Cost Tradeoffs in Carry Networks

    Method           Delay   Cost
    Ladner-Fischer   ?       $(k/2)\log_2 k$
    Kogge-Stone      ?       $k \log_2 k - k + 1$
    Brent-Kung       ?       $2k - 2 - \log_2 k$
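The cost formulas are easy to tabulate for a given width. The sketch below evaluates them for k a power of 2; the function name is illustrative.

```python
def prefix_network_cells(k: int) -> dict:
    """Cell counts of three parallel prefix networks for k inputs."""
    lg = k.bit_length() - 1
    if k < 2 or (1 << lg) != k:
        raise ValueError("the formulas assume k is a power of 2")
    return {
        "Ladner-Fischer": (k // 2) * lg,
        "Kogge-Stone": k * lg - k + 1,
        "Brent-Kung": 2 * k - 2 - lg,
    }

if __name__ == "__main__":
    for k in (8, 16, 32, 64):
        print(k, prefix_network_cells(k))
```

For k = 16 this gives 49 cells for Kogge-Stone and 26 for Brent-Kung.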

Hybrid B-K/K-S Carry Network (16-Bit Adder)

[Figure: three 16-input parallel prefix graphs with inputs x0 to x15 and outputs s0 to s15: Brent-Kung, Kogge-Stone, and a hybrid that combines Brent-Kung and Kogge-Stone stages.]

A Hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs.

    Network       Levels   Cells
    Brent-Kung    6        26
    Kogge-Stone   4        49
    Hybrid        5        32

Simple Carry-Skip Adders

Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

(a) Ripple-carry adder: four 4-bit blocks of ripple-carry stages, with carries c0, c4, c8, c12, and c16 at the block boundaries.

(b) Simple carry-skip adder: the same four blocks, each bypassed by skip logic (2 gates) controlled by the block propagate signals p[0,3], p[4,7], p[8,11], and p[12,15].

Another View of Carry-Skip Addition

Street/freeway analogy for carry-skip adder.

Multilevel Carry-Skip Adders

One-level carry-skip adder.

Example of a two-level carry-skip adder.

Two-level carry-skip adder optimized by removing the short-block skip circuits.

Using Two-Operand Adders

Some applications of multioperand addition

[Figure, left: a 4-bit multiplication a × x in dot notation, in which the partial products $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, and $x_3 a 2^3$ are added to form the product p. Right: seven operands $p^{(0)}$ to $p^{(6)}$ added to form the sum s.]

Multioperand addition problems for multiplication or inner-product computation in dot notation.

Serial Implementation with One Adder

[Figure: the adder combines the k-bit operand $x^{(i)}$ with the partial sum register, which holds $\sum_{j=0}^{i-1} x^{(j)}$ in $k + \log_2 n$ bits.]

Serial implementation of multi-operand addition with a single 2-operand adder.

Pipelined Implementation for Higher Throughput

Serial multi-operand addition when each adder is a 4-stage pipeline.

Parallel Implementation as Tree of Adders

Adding 7 numbers in a binary tree of adders.

[Figure: six adders in three levels; operand widths grow from k bits at the inputs to k+3 bits at the output.]

$\lceil \log_2 n \rceil$ adder levels

n – 1 adders

Carry-Save Adders

A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.

[Figure: a row of full-adders connected as a carry-propagate adder, and the same row with the carry chain cut to form a carry-save adder.]

A carry-save adder (CSA) is also called a (3; 2)-counter or a 3-to-2 reduction circuit.

Carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation.

Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.
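A carry-save adder is a row of independent full-adders, so on whole words it is one XOR and one majority, with the carry word shifted left by one position. The sketch below reduces any number of operands to two with repeated 3-to-2 steps and finishes with a single carry-propagate addition; the function names are illustrative and the operands are assumed to be nonnegative integers.

```python
def carry_save_add(a: int, b: int, c: int):
    """3-to-2 reduction: return (sum word, carry word) with a+b+c == s+carry."""
    s = a ^ b ^ c                                  # per-bit sum, no propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, weight 2
    return s, carry

def add_many(operands):
    """Add nonnegative integers using CSA steps and one final CPA."""
    ops = list(operands)
    while len(ops) > 2:
        s, carry = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, carry]
    return sum(ops)                                # final carry-propagate add

if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(add_many([63] * 7))
```

Only the last step involves carry propagation; every earlier step has the delay of a single full-adder regardless of word width.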

Multioperand Addition Using Carry-Save Adders

Tree of carry-save adders reducing seven numbers to two: five CSAs followed by a carry-propagate adder.

Serial carry-save addition using a single CSA: each input is combined with the contents of the sum register and the carry register, and a final CPA produces the output.

Example Reduction by a CSA Tree

Addition of seven 6-bit numbers in dot notation.

Total cost = 7-bit adder + 28 FAs + 1 HA

Representing a seven-operand addition in tabular form (number of dots in each bit position):

    Bit position:  8  7  6  5  4  3  2  1  0
                            7  7  7  7  7  7    6 × 2 = 12 FAs
                         2  5  5  5  5  5  3    6 FAs
                         3  4  4  4  4  4  1    6 FAs
                      1  2  3  3  3  3  2  1    4 FAs + 1 HA
                      2  2  2  2  2  1  2  1    7-bit adder
                   ----- Carry-propagate adder -----
                   1  1  1  1  1  1  1  1  1

A full-adder compacts 3 dots into 2 (compression ratio of 1.5)

A half-adder rearranges 2 dots (no compression, but still useful)

Width of Adders in a CSA Tree

Adding seven k-bit numbers and the CSA/CPA widths required.

Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels

The index pair [i, j] means that bit positions from i up to j are involved.

[Figure: tree of five k-bit CSAs feeding a k-bit CPA. The seven inputs occupy positions [0, k–1]; CSA outputs shift to [1, k] and then [2, k+1]; the final result occupies positions 0 to k+2.]

Wallace Tree Multiplier

Dadda Tree Multiplier

Saturating Adders

Saturating (saturation) arithmetic:

When a result’s magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format.

Example – In 8-bit 2’s-complement format, we have: 120 + 26 → –110 (wraparound); 120 +sat 26 → 127 (saturating)

Saturating arithmetic is desirable in many DSP applications.

Designing saturating adders:
• Unsigned (quite easy)
• Signed (slightly harder)

[Figure: a mux controlled by the adder's overflow signal selects either the adder output (0) or the saturation value (1).]
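The two behaviors can be compared side by side. In 8-bit two's complement the true sum 120 + 26 = 146 exceeds 127, so a wrapping adder returns 146 – 256 = –110 while a saturating adder returns 127. This is a behavioral sketch with illustrative function names, not a hardware design.

```python
def add_wrap(x: int, y: int, k: int = 8) -> int:
    """k-bit two's-complement addition with wraparound."""
    s = (x + y) & ((1 << k) - 1)              # keep the k low-order bits
    return s - (1 << k) if s >> (k - 1) else s

def add_sat(x: int, y: int, k: int = 8) -> int:
    """k-bit two's-complement addition that clamps on overflow."""
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    return max(lo, min(hi, x + y))

if __name__ == "__main__":
    print(add_wrap(120, 26), add_sat(120, 26))
```

The signed case needs both limits because overflow can occur in either direction; an unsigned saturating adder only ever clamps to the maximum.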

Readings:

Main reference for the above slides: Chapters 5, 6, 7, and 8 of B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
