55:035 Computer Architecture and Organization Lecture 5

55:035 Computer Architecture and Organization

Lecture 5

Outline
  Adders
  Comparators
  Shifters
  Multipliers
  Dividers
  Floating Point Numbers

Binary Representations of Numbers

To find negative numbers:
  Sign and magnitude: set the msb to 1
  1's complement: complement each bit to change sign
  2's complement: 2^n - positive number

  b2 b1 b0   Unsigned   Sign and Magnitude   1's Complement   2's Complement
  0  1  1    3          +3                   +3               +3
  0  1  0    2          +2                   +2               +2
  0  0  1    1          +1                   +1               +1
  0  0  0    0          +0                   +0               +0
  1  0  0    4          -0                   -3               -4
  1  0  1    5          -1                   -2               -3
  1  1  0    6          -2                   -1               -2
  1  1  1    7          -3                   -0               -1
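The table of 3-bit patterns can be reproduced with a short sketch. The function name `decode` and the dictionary keys are illustrative choices, not part of the lecture. Python integers have no negative zero, so the two "-0" entries decode to plain 0.

```python
def decode(bits: int, n: int = 3) -> dict:
    """Interpret an n-bit pattern under the four conventions in the table."""
    mask = (1 << n) - 1
    msb = (bits >> (n - 1)) & 1
    magnitude = bits & ((1 << (n - 1)) - 1)   # all bits below the msb
    return {
        "unsigned": bits,
        # msb is the sign, the remaining bits are the magnitude
        "sign_magnitude": -magnitude if msb else magnitude,
        # a negative number is the bitwise complement of its magnitude
        "ones_complement": -(~bits & mask) if msb else bits,
        # a negative number -x is stored as 2^n - x
        "twos_complement": bits - (1 << n) if msb else bits,
    }


for pattern in range(8):
    print(format(pattern, "03b"), decode(pattern))
```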


Single-Bit Addition

Half Adder

  A B | Cout S
  0 0 |  0   0
  0 1 |  0   1
  1 0 |  0   1
  1 1 |  1   0

Full Adder

  A B C | Cout S
  0 0 0 |  0   0
  0 0 1 |  0   1
  0 1 0 |  0   1
  0 1 1 |  1   0
  1 0 0 |  0   1
  1 0 1 |  1   0
  1 1 0 |  1   0
  1 1 1 |  1   1

[Figure: block symbols for the half adder (inputs A, B; outputs S, Cout) and the full adder (inputs A, B, C; outputs S, Cout)]

Carry-Ripple Adder

Simplest design: cascade full adders
  Critical path goes from Cin to Cout
  Design full adder to have fast carry delay

[Figure: 4-bit carry-ripple adder built from four full adders, with inputs A4..A1, B4..B1 and Cin, internal carries C1..C3, and outputs S4..S1 and Cout]
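A minimal behavioral sketch of the cascade, assuming bit 0 of each Python integer is the least significant adder stage. The names `full_adder` and `ripple_add` are illustrative, not from the lecture.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (S, Cout)."""
    s = a ^ b ^ c
    cout = (a & b) | (a & c) | (b & c)   # majority of the three inputs
    return s, cout


def ripple_add(a: int, b: int, cin: int = 0, n: int = 4) -> tuple[int, int]:
    """Cascade n full adders; the carry ripples from Cin towards Cout."""
    carry, total = cin, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_add(0b1111, 0b0000, cin=1))
```

The loop makes the serial dependency visible: stage i cannot produce its outputs until the carry from stage i-1 is known.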

Carry Propagate Adders

N-bit adder called CPA
  Each sum bit depends on all previous carries
  How do we compute all these carries quickly?

[Figure: N-bit CPA block with inputs A_N...1, B_N...1 and Cin, and outputs S_N...1 and Cout]

Examples (carries are written above the operands, with Cout at the left end and Cin at the right end):

  00000   carries        11111   carries
   1111   A4...1          1111   A4...1
  +0000   B4...1         +0000   B4...1
   1111   S4...1          0000   S4...1

Propagate and Generate - Define

An n-bit adder is just a combinational circuit. In "sum of products" form:
  s_i = a_i XOR b_i XOR c_i = a_i b_i' c_i' + a_i' b_i c_i' + a_i' b_i' c_i + a_i b_i c_i
  c_{i+1} = MAJ(a_i, b_i, c_i) = a_i b_i + a_i c_i + b_i c_i

Rewrite the carry as c_{i+1} = g_i + p_i c_i, where g_i = a_i b_i and p_i = a_i + b_i
  If g_i is true, then c_{i+1} is true: a carry is generated
  If p_i is true and c_i is true, then c_i is propagated

Note that the p_i equation can also be written as p_i = a_i XOR b_i. The two forms differ only when a_i = b_i = 1, and in that case g_i = 1 (a generate occurs), so propagate is a "don't care".
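The claim that either definition of p_i yields the same carry can be checked exhaustively over the eight input combinations. This is a small sketch; `carry_out` and its `use_xor` flag are hypothetical names introduced for illustration.

```python
from itertools import product


def carry_out(a: int, b: int, c: int, use_xor: bool = False) -> int:
    """c_{i+1} = g_i + p_i c_i with p_i = a_i + b_i (OR) or a_i XOR b_i."""
    g = a & b
    p = (a ^ b) if use_xor else (a | b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    majority = (a & b) | (a & c) | (b & c)
    # Both propagate definitions must agree with MAJ(a, b, c).
    assert carry_out(a, b, c) == majority
    assert carry_out(a, b, c, use_xor=True) == majority
```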

Propagate and Generate - Lookahead

Recursively apply to eliminate carry terms:
  c_{i+1} = g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + ...
            + p_i p_{i-1} ... p_1 g_0 + p_i p_{i-1} ... p_1 p_0 c_0

This is a carry-lookahead adder
  Note large fan-in of OR gate and last AND gate
  Too big! Build p's and g's in steps:
    c_1 = g_0 + p_0 c_0
    c_2 = G_{0,1} + P_{0,1} c_0
    where G_{0,1} = g_1 + p_1 g_0 and P_{0,1} = p_1 p_0
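The expanded carry equation can be evaluated directly, one product term at a time. The sketch below is a software model only (it says nothing about gate fan-in), and `lookahead_carries` is an illustrative name. It returns the list [c_0, c_1, ..., c_n].

```python
def lookahead_carries(a: int, b: int, c0: int = 0, n: int = 4) -> list[int]:
    """Evaluate c_{i+1} from the fully expanded generate/propagate expression."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    carries = [c0]
    for i in range(n):
        c = 0
        prefix = 1                      # running product p_i p_{i-1} ... p_{j+1}
        for j in range(i, -1, -1):
            c |= prefix & g[j]          # term p_i ... p_{j+1} g_j
            prefix &= p[j]
        c |= prefix & c0                # last term p_i ... p_0 c_0
        carries.append(c)
    return carries


print(lookahead_carries(0b1111, 0b0000, c0=1))
```

Each carry depends only on the g's, the p's and c_0, never on another carry, which is what removes the ripple.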

Propagate and Generate - Blocks

In the general case, for any j with i <= j < k:
  c_{k+1} = G_{i,k} + P_{i,k} c_i
  G_{i,k} = G_{j+1,k} + P_{j+1,k} G_{i,j}
  P_{i,k} = P_{i,j} P_{j+1,k}

G_{i,k} equation in words: a carry is generated out of the block consisting of bits i through k inclusive if it is generated in the high-order part of the block (j+1, k), or it is generated in the low-order part (i, j) and then propagated through the high part.
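The block recurrences lend themselves to a recursive sketch that splits a block at its midpoint j. The split point, the single-bit base case (G = a_i b_i, P = a_i + b_i), and the names `combine` and `block_pg` are assumptions made for illustration.

```python
def combine(low: tuple[int, int], high: tuple[int, int]) -> tuple[int, int]:
    """Merge (G, P) of a low block (i, j) with (G, P) of a high block (j+1, k)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    # Generated in the high part, or generated low and propagated through high.
    return g_hi | (p_hi & g_lo), p_lo & p_hi


def block_pg(a: int, b: int, i: int, k: int) -> tuple[int, int]:
    """Return (G_{i,k}, P_{i,k}) for bits i through k inclusive."""
    if i == k:
        ai, bi = (a >> i) & 1, (b >> i) & 1
        return ai & bi, ai | bi
    j = (i + k) // 2                    # any j with i <= j < k would do
    return combine(block_pg(a, b, i, j), block_pg(a, b, j + 1, k))


G, P = block_pg(0b1011, 0b0101, 0, 3)
c0 = 0
print("carry out of bit 3:", G | (P & c0))
```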

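PG Logic

[Figure: 4-bit adder drawn as three layers. 1: Bitwise PG logic forms G_i and P_i from A_i and B_i for bits 1 through 4, with Cin entering as G_0, P_0. 2: Block PG logic forms the group generate signals G_{0,0} through G_{0,3}, which supply the carries C0 through C3; C4 is Cout. 3: Sum logic produces S1 through S4.]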

Carry-Ripple Revisited

[Figure: the three-layer PG diagram of a 4-bit adder (inputs A1..A4, B1..B4, Cin; bitwise signals G_i, P_i; group generates G_{0,0} through G_{0,3}; carries C0..C4; sums S1..S4), with the block PG layer built as a ripple chain]

  G_{0,3} = G_3 + P_3 G_{0,2}

Carry-Skip Adder

Carry-ripple is slow through all N stages
Carry-skip allows carry to skip over groups of n bits
  Decision based on n-bit propagate signal

[Figure: 16-bit carry-skip adder made of four 4-bit adder groups (bits 4:1, 8:5, 12:9, 16:13). Each group produces a group propagate signal (P4:1, P8:5, P12:9, P16:13) that controls a 2-input mux on the carry passed to the next group (C4, C8, C12, Cout).]

Carry-Lookahead Adder

Carry-lookahead adder computes G_{0,i} for many bits in parallel
Uses higher-valency cells with more than two inputs

[Figure: 16-bit carry-lookahead adder made of four 4-bit groups (bits 4:1, 8:5, 12:9, 16:13). Each group produces group generate and propagate signals (G4:1, P4:1 through G16:13, P16:13), which are combined with Cin to form the carries C4, C8, C12 and Cout.]

Carry-Select Adder

Trick for critical paths dependent on late input X
  Precompute two possible outputs for X = 0, 1
  Select proper output when X arrives
Carry-select adder precomputes n-bit sums for both possible carries into n-bit group

[Figure: 16-bit carry-select adder. Bits 4:1 use a single adder driven by Cin. Each of the groups 8:5, 12:9 and 16:13 has two adders, one with carry-in 0 and one with carry-in 1, and muxes controlled by the incoming carry (C4, C8, C12) select the sum and the carry out.]
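A behavioral sketch of the select trick follows. For simplicity it duplicates the adder in every group, including the first, whereas the figure uses a single adder for bits 4:1. The function names are illustrative.

```python
def add_group(a: int, b: int, cin: int, n: int = 4) -> tuple[int, int]:
    """n-bit group adder: returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n


def carry_select_add(a: int, b: int, cin: int = 0,
                     width: int = 16, n: int = 4) -> tuple[int, int]:
    mask = (1 << n) - 1
    result, carry = 0, cin
    for lo in range(0, width, n):
        ga, gb = (a >> lo) & mask, (b >> lo) & mask
        # Both candidates depend only on the operands, so in hardware they
        # can be computed before the incoming carry is known.
        sum0, c0 = add_group(ga, gb, 0, n)
        sum1, c1 = add_group(ga, gb, 1, n)
        # The late-arriving carry only drives the mux.
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result |= s << lo
    return result, carry


print(carry_select_add(0xFFFF, 0x0001))
```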

Comparators

0's detector: A = 00...000
1's detector: A = 11...111
Equality comparator: A = B
Magnitude comparator: A < B

1's & 0's Detectors

1's detector: N-input AND gate
0's detector: NOTs + 1's detector (N-input NOR)

[Figure: gate-level allones detectors on A0 through A7 and an allzeros detector on A0 through A3]

Equality Comparator

Check if each bit is equal (XNOR, aka equality gate)
1's detect on bitwise equality

[Figure: 4-bit equality comparator. XNOR gates on A[3]B[3] through A[0]B[0] feed a 1's detector whose output is A = B.]
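The detectors and the equality comparator compose naturally, as this sketch shows. It models the wide AND as a reduction over the bits; the helper names are illustrative and n is the word width.

```python
from functools import reduce


def bits(a: int, n: int) -> list[int]:
    return [(a >> i) & 1 for i in range(n)]


def all_ones(a: int, n: int) -> int:
    """1's detector: N-input AND."""
    return reduce(lambda x, y: x & y, bits(a, n))


def all_zeros(a: int, n: int) -> int:
    """0's detector: invert every bit, then reuse the 1's detector."""
    return all_ones(~a & ((1 << n) - 1), n)


def equal(a: int, b: int, n: int) -> int:
    """Equality comparator: bitwise XNOR followed by a 1's detector."""
    xnor = ~(a ^ b) & ((1 << n) - 1)
    return all_ones(xnor, n)


print(all_ones(0xFF, 8), all_zeros(0b0000, 4), equal(0b1011, 0b1011, 4))
```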

Magnitude Comparator

Compute B-A and look at sign
B-A = B + ~A + 1
For unsigned numbers, carry out is sign bit

Signed vs. Unsigned

For signed numbers, comparison is harder
  C: carry out
  Z: zero (all bits of A-B are 0)
  N: negative (MSB of result)
  V: overflow (inputs had different signs, output sign differs from B)
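The flags can be computed from B + ~A + 1 and then combined into comparison results. The sketch below assumes 4-bit operands by default. The relations printed at the end (unsigned A <= B when C = 1, signed B < A when N differs from V) are standard consequences of the flag definitions rather than something stated on the slide.

```python
def compare_flags(a: int, b: int, n: int = 4) -> dict:
    """Flags produced by computing B - A as B + ~A + 1 on n-bit patterns."""
    mask = (1 << n) - 1

    def sign(x: int) -> int:
        return (x >> (n - 1)) & 1

    total = (b & mask) + (~a & mask) + 1
    result = total & mask
    return {
        "C": total >> n,                       # carry out of the adder
        "Z": int(result == 0),                 # all result bits are 0
        "N": sign(result),                     # MSB of the result
        "V": int(sign(a) != sign(b) and sign(result) != sign(b)),
    }


flags = compare_flags(0b0011, 0b1100)          # A = 3, B = 12 (or -4 if signed)
print(flags)
print("unsigned A <= B:", flags["C"] == 1)
print("signed   B <  A:", flags["N"] != flags["V"])
```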


Shifters

Logical Shift: shifts number left or right and fills with 0's
  1011 LSR 1 = 0101    1011 LSL 1 = 0110
Arithmetic Shift: shifts number left or right; right shift sign extends
  1011 ASR 1 = 1101    1011 ASL 1 = 0110
Rotate: shifts number left or right and fills with lost bits
  1011 ROR 1 = 1101    1011 ROL 1 = 0111
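The six shifts can be modelled on 4-bit words to reproduce the answers above. This sketch assumes a fixed width N = 4 and a shift amount k with 0 <= k < N. The function names mirror the mnemonics on the slide.

```python
N = 4
MASK = (1 << N) - 1


def lsr(a: int, k: int) -> int:
    return (a & MASK) >> k


def lsl(a: int, k: int) -> int:
    return (a << k) & MASK


def asr(a: int, k: int) -> int:
    sign = (a >> (N - 1)) & 1
    fill = (MASK << (N - k)) & MASK if sign else 0   # k copies of the sign bit
    return ((a & MASK) >> k) | fill


def asl(a: int, k: int) -> int:
    return lsl(a, k)                                 # same bits as a logical left shift


def ror(a: int, k: int) -> int:
    return ((a >> k) | (a << (N - k))) & MASK


def rol(a: int, k: int) -> int:
    return ((a << k) | (a >> (N - k))) & MASK


for op in (lsr, lsl, asr, asl, ror, rol):
    print(op.__name__.upper(), format(op(0b1011, 1), "04b"))
```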

Funnel Shifter

A funnel shifter can do all six types of shifts
Selects N-bit field Y from 2N-bit input
  Shift by k bits (0 <= k < N)

[Figure: 2N-bit input formed by B (high half, bits 2N-1 down to N) and C (low half, bits N-1 down to 0); the output Y is the N-bit field from bit offset + N-1 down to bit offset]

Funnel Shifter Operation

Computing N-k requires an adder
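A sketch of how one field selector can serve all six shifts. The assignments of B, C and offset per shift type follow the usual textbook formulation and are an assumption here; the names `funnel` and `shift` are illustrative. Left shifts use offset N - k, which is where the adder comes in.

```python
N = 4
MASK = (1 << N) - 1


def funnel(b: int, c: int, offset: int) -> int:
    """Select the N-bit field starting at `offset` from the 2N-bit word B:C."""
    z = ((b & MASK) << N) | (c & MASK)
    return (z >> offset) & MASK


def shift(kind: str, a: int, k: int) -> int:
    sign_fill = MASK if (a >> (N - 1)) & 1 else 0
    # kind -> (B, C, offset); assumed operand table, see the lead-in
    table = {
        "LSR": (0, a, k),
        "ASR": (sign_fill, a, k),
        "ROR": (a, a, k),
        "LSL": (a, 0, N - k),
        "ASL": (a, 0, N - k),
        "ROL": (a, a, N - k),
    }
    return funnel(*table[kind])


for kind in ("LSR", "LSL", "ASR", "ASL", "ROR", "ROL"):
    print(kind, format(shift(kind, 0b1011, 1), "04b"))
```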


Simplified Funnel Shifter

Optimize down to 2N-1 bit input


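Funnel Shifter Design 1

N N-input multiplexers
  Use 1-of-N hot select signals for shift amount

[Figure: 4-bit funnel shifter built from four 4-input multiplexers. Inputs Z0 through Z6, outputs Y0 through Y3; the shift amount k[1:0] and the left control pass through inverters and a decoder to form the one-hot selects s0 through s3.]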

Funnel Shifter Design 2

Log N stages of 2-input muxes
  No select decoding needed

[Figure: 4-bit funnel shifter built from two stages of 2-input muxes. Inputs Z0 through Z6, outputs Y0 through Y3; the selects are driven from k1, k0 and left.]


Multi-input Adders

Suppose we want to add k N-bit words
  Ex: 0001 + 0111 + 1101 + 0010 = 10111
Straightforward solution: k-1 N-input CPAs
  Large and slow

[Figure: chain of three CPAs. 0001 + 0111 = 1000, then 1000 + 1101 = 10101, then 10101 + 0010 = 10111.]

Carry Save Addition

A full adder sums 3 inputs and produces 2 outputs
  Carry output has twice weight of sum output
N full adders in parallel are called carry save adder
  Produce N sums and N carry outs

[Figure: four independent full adders with inputs X_i, Y_i, Z_i and outputs S_i, C_i for i = 4..1, and the equivalent n-bit CSA block with inputs X_N...1, Y_N...1, Z_N...1 and outputs S_N...1, C_N...1]


CSA Application

Use k-2 stages of CSAs
  Keep result in carry-save redundant form
Final CPA computes actual result

Example: 0001 + 0111 + 1101 + 0010 (the trailing "_" marks a carry word, which is shifted left one place)

  4-bit CSA        5-bit CSA         CPA
  X    0001        X    0101_        A   01010_
  Y    0111        Y     1011        B    00011
  Z    1101        Z     0010        S    10111
  S    1011        S    00011
  C   0101_        C   01010_
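The worked example can be replayed with a small sketch. A CSA is modelled as bitwise XOR for the sum word and bitwise majority, shifted left once, for the carry word. `carry_save_add` and `add_words` are illustrative names, and the final `sum` call stands in for the CPA.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One CSA stage: three words in, (sum word, carry word) out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries have twice the weight
    return s, c


def add_words(words: list[int]) -> int:
    """Reduce k words with k-2 CSA stages, then one carry-propagate add."""
    words = list(words)
    while len(words) > 2:
        s, c = carry_save_add(*words[:3])
        words = [s, c] + words[3:]
    return sum(words)                        # final CPA


s1, c1 = carry_save_add(0b0001, 0b0111, 0b1101)
print(format(s1, "04b"), format(c1, "05b"))
print(format(add_words([0b0001, 0b0111, 0b1101, 0b0010]), "05b"))
```

No carry travels sideways inside a CSA stage, so each stage costs one full-adder delay regardless of word width.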


Multiplication

Example:

      1100 : 12 (base 10)   multiplicand
    x 0101 :  5 (base 10)   multiplier
      ----
      1100                  partial products
     0000
    1100
   0000
  --------
  00111100 : 60 (base 10)   product

M x N-bit multiplication
  Produce N M-bit partial products
  Sum these to produce M+N-bit product
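The same procedure as a sketch: each multiplier bit selects either the multiplicand or zero, shifted into position, and the partial products are summed. `multiply` is an illustrative name, and n is the multiplier width N.

```python
def multiply(y: int, x: int, n: int) -> tuple[list[int], int]:
    """Return the N shifted partial products of y * x and their sum."""
    partial_products = [(y if (x >> i) & 1 else 0) << i for i in range(n)]
    return partial_products, sum(partial_products)


pps, product = multiply(0b1100, 0b0101, 4)
for pp in pps:
    print(format(pp, "08b"))
print(format(product, "08b"), "=", product)
```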

General Form

Multiplicand: Y = (y_{M-1}, y_{M-2}, ..., y_1, y_0)
Multiplier: X = (x_{N-1}, x_{N-2}, ..., x_1, x_0)
Product:

  P = \left( \sum_{j=0}^{M-1} y_j 2^j \right) \left( \sum_{i=0}^{N-1} x_i 2^i \right) = \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} x_i y_j 2^{i+j}

Partial product array for M = N = 6:

                                y5   y4   y3   y2   y1   y0     multiplicand
                                x5   x4   x3   x2   x1   x0     multiplier
                                x0y5 x0y4 x0y3 x0y2 x0y1 x0y0   partial products
                           x1y5 x1y4 x1y3 x1y2 x1y1 x1y0
                      x2y5 x2y4 x2y3 x2y2 x2y1 x2y0
                 x3y5 x3y4 x3y3 x3y2 x3y1 x3y0
            x4y5 x4y4 x4y3 x4y2 x4y1 x4y0
       x5y5 x5y4 x5y3 x5y2 x5y1 x5y0
  p11  p10  p9   p8   p7   p6   p5   p4   p3   p2   p1   p0     product

Dot Diagram

Each dot represents a bit

[Figure: dot diagram of the partial products, one row of dots per multiplier bit x0 through x15]

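Array Multiplier

[Figure: 4 x 4 array multiplier. Multiplicand bits y3..y0 run across the columns and multiplier bits x0..x3 down the rows. Each array cell combines a partial-product bit with a full adder (inputs A, B, Sin, Cin; outputs Sout, Cout). The upper rows form a CSA array and the final row is a CPA, producing p7..p0. The critical path is marked.]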

Rectangular Array

Squash array to fit rectangular floorplan

[Figure: the 4 x 4 array multiplier redrawn as a rectangle, with inputs y3..y0 and x0..x3 and product bits p0 through p7]

Fewer Partial Products

Array multiplier requires N partial products
If we looked at groups of r bits, we could form N/r partial products
  Faster and smaller?
  Called radix-2^r encoding
Ex: r = 2: look at pairs of bits
  Form partial products of 0, Y, 2Y, 3Y
  First three are easy, but 3Y requires adder

Booth Encoding

Instead of 3Y, try -Y, then increment next partial product to add 4Y
Similarly, for 2Y, try -2Y + 4Y in next partial product
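A sketch of radix-4 Booth recoding for an unsigned multiplier. It uses the common overlapping-triplet rule, digit = -2 x_{2i+1} + x_{2i} + x_{2i-1} with x_{-1} = 0 and zeros above the top bit. That rule and the function names are assumptions for illustration, not taken from the slide.

```python
def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Radix-4 Booth digits (least significant first) of an unsigned n-bit x.

    Every digit lies in {-2, -1, 0, 1, 2}, so each partial product is
    0, +/-Y or +/-2Y and the hard 3Y multiple never appears.
    """
    padded = x << 1                               # append x_{-1} = 0
    digits = []
    for i in range(n // 2 + 1):
        x_hi = (padded >> (2 * i + 2)) & 1        # x_{2i+1}
        x_mid = (padded >> (2 * i + 1)) & 1       # x_{2i}
        x_lo = (padded >> (2 * i)) & 1            # x_{2i-1}
        digits.append(-2 * x_hi + x_mid + x_lo)
    return digits


def booth_multiply(y: int, x: int, n: int) -> int:
    """Sum the Booth partial products digit * Y * 4^i."""
    return sum(d * y * 4 ** i for i, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(0b11, 2))   # the pair 11 (3Y) becomes -Y, then +4Y
print(booth_radix4_digits(0b10, 2))   # the pair 10 (2Y) becomes -2Y, then +4Y
```

For a 16-bit unsigned multiplier this yields nine digits, one per partial product.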

Booth Hardware

Booth encoder generates control lines for each PP
  Booth selectors choose PP bits

Sign Extension

Partial products can be negative
  Require sign extension, which is cumbersome
  High fanout on most significant bit

[Figure: dot diagram of partial products PP0 through PP8 for a 16-bit multiplier x (bits x0 through x15, padded with x-1 = 0 below and x16 = x17 = 0 above). Each partial product is extended to the left with copies of its sign bit s: 16 copies for PP0 and two fewer for each later row, down to 2 for PP7.]

Simplified Sign Ext.

Sign bits are either all 0's or all 1's
  Note that all 0's is all 1's + 1 in proper column
  Use this to reduce loading on MSB

[Figure: the same dot diagram of PP0 through PP8, with each run of sign bits replaced by a run of constant 1's bracketed by single s terms (s111...1s); the runs shrink by two bits per row]

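Even Simpler Sign Ext.

No need to add all the 1's in hardware
  Precompute the answer!

[Figure: the dot diagram of PP0 through PP8 after precomputing the sum of the constant 1's. The first row keeps a short sign field (ssss), the following six rows keep ss1, and the last sign-extended row keeps ss.]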

Division - Restoring

Do the following n times:
  Shift A and Q left one bit
  Subtract M from A, put answer in A
  If the sign of A is 1, set q0 to 0 and add M back to A
  If the sign of A is 0, set q0 to 1

[Figure: circuit for binary division. An (n+1)-bit register A (a_n ... a_0) and the n-bit dividend register Q (q_{n-1} ... q_0) shift left together. An (n+1)-bit adder adds or subtracts the divisor M (m_{n-1} ... m_0, with a leading 0). A control sequencer drives Add/Subtract and the quotient setting of q0.]

Division - Restoring Example

[Figure: worked restoring-division trace over four cycles (first through fourth). Each cycle shows the Shift, Subtract, Set q0 and, where needed, Restore steps on registers A and Q; the final row labels the Remainder and the Quotient.]
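The restoring procedure can be sketched in software. The registers A and Q are modelled as Python integers, with A allowed to go negative instead of carrying an explicit sign bit. The 8 / 3 test case and the name `restoring_divide` are illustrative choices.

```python
def restoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide an n-bit dividend by a positive divisor: (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    for _ in range(n):
        # Shift A and Q left one bit: the MSB of Q moves into the LSB of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= m                          # subtract M from A
        if a < 0:                       # sign of A is 1
            a += m                      # restore; q0 stays 0
        else:
            q |= 1                      # sign of A is 0: set q0 to 1
    return q, a


print(restoring_divide(0b1000, 0b0011, 4))   # 8 / 3
```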

Division - Nonrestoring

Do the following n times:
  If the sign of A is 0, shift A and Q left and subtract M from A
  Else shift A and Q left and add M to A
  Now if the sign of A is 0, set q0 to 1
  Else set q0 to 0
Finally, if the sign of A is 1, add M to A

[Figure: circuit for binary division. An (n+1)-bit register A (a_n ... a_0) and the n-bit dividend register Q (q_{n-1} ... q_0) shift left together. An (n+1)-bit adder adds or subtracts the divisor M (m_{n-1} ... m_0, with a leading 0). A control sequencer drives Add/Subtract and the quotient setting of q0.]

Division - Nonrestoring Example

[Figure: worked nonrestoring-division trace over four cycles, starting from A = 00000, Q = 1000 with M = 00011. Each cycle shows a Shift followed by Subtract or Add and then Set q0. A final Add restores the remainder, and the last rows label the Remainder and the Quotient.]
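A software sketch of the nonrestoring procedure, using a signed Python integer for A so that "sign of A" becomes a comparison with zero. The name `nonrestoring_divide` is illustrative. The call at the end uses the same starting values as the example, Q = 1000 and M = 00011.

```python
def nonrestoring_divide(dividend: int, divisor: int, n: int) -> tuple[int, int]:
    """Divide an n-bit dividend by a positive divisor: (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        msb = (q >> (n - 1)) & 1
        q = (q << 1) & mask
        if a >= 0:                      # sign of A is 0: shift, then subtract M
            a = 2 * a + msb - m
        else:                           # sign of A is 1: shift, then add M
            a = 2 * a + msb + m
        if a >= 0:
            q |= 1                      # set q0 to 1, otherwise it stays 0
    if a < 0:
        a += m                          # final step: restore the remainder
    return q, a


print(nonrestoring_divide(0b1000, 0b0011, 4))
```

Compared with the restoring version, each cycle does exactly one add or subtract, and at most one extra add is needed at the very end.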

Floating Point - Single Precision

IEEE-754, 854
  Decimal point can "move", hence it's "floating"
  Floating point is useful for scientific calculations
Can represent
  Very large integers and
  Very small fractions
  ~10^(±38)

(a) Single precision: 32 bits
  S: sign of number (0 signifies +, 1 signifies -)
  E': 8-bit signed exponent in excess-127 representation
  M: 23-bit mantissa fraction
  Value represented = ±1.M x 2^(E'-127)

(b) Example of a single-precision number
  S = 0, E' = 00101000, M = 001010...0
  Value represented = +1.0010100 x 2^(-87)
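The example word can be decoded field by field and cross-checked against Python's own IEEE single-precision conversion in the `struct` module. The sketch handles normalized numbers only; `decode_single` is an illustrative name.

```python
import struct


def decode_single(word: int) -> float:
    """Decode a normalized 32-bit single-precision pattern as +/-1.M x 2^(E'-127)."""
    s = word >> 31
    e = (word >> 23) & 0xFF
    m = word & 0x7FFFFF
    if not 0 < e < 255:
        raise ValueError("this sketch handles normalized numbers only")
    return (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e - 127)


# S = 0, E' = 00101000, M = 0010100 followed by sixteen zeros
word = (0 << 31) | (0b00101000 << 23) | (0b0010100 << 16)
print(decode_single(word))
print(struct.unpack(">f", word.to_bytes(4, "big"))[0])
```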

Floating Point - Double Precision

Double precision can represent ~10^(±308)

64 bits
  S: sign
  E': 11-bit excess-1023 exponent
  M: 52-bit mantissa fraction
  Value represented = ±1.M x 2^(E'-1023)

Floating Point

The IEEE Standard requires these operations, at a minimum
  Add
  Subtract
  Multiply
  Divide
  Remainder
  Square Root
  Decimal/Binary Conversion

Special Values

  E'     M      Value
  0      0      ±0
  255    0      ±∞
  0      ≠ 0    ±0.M x 2^(-126)
  255    ≠ 0    Not a Number (NaN)

Exceptions
  Underflow, Overflow, divide by 0, inexact, invalid

FP Arithmetic Operations

Add/Subtract
  Shift mantissa of smaller exponent number right by the difference in exponents
  Set the exponent of the result = the larger exponent
  Add/Sub mantissas, get sign
  Normalize
Multiply/Divide
  Add/Sub exponents, Subtract/Add 127
  Multiply/Divide mantissas, determine sign
  Normalize
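The add steps can be sketched on a toy format. The assumptions, none of which come from the slide: both operands are positive and normalized, exponents are unbiased integers, the significand is an integer of `bits` bits whose top bit is the leading 1, and shifted-out bits are simply chopped. The names `fp_add` and `value` are illustrative.

```python
def fp_add(ea: int, ma: int, eb: int, mb: int, bits: int = 24) -> tuple[int, int]:
    """Add two positive normalized numbers given as (exponent, significand)."""
    if ea < eb:                        # swap so that A has the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb                     # shift the smaller number's mantissa right
    e, m = ea, ma + mb                 # result exponent = the larger exponent
    if m >> bits:                      # carry out of the top bit: normalize
        m >>= 1
        e += 1
    return e, m


def value(e: int, m: int, bits: int = 24) -> float:
    """Value of a toy number: m x 2^(e - (bits - 1))."""
    return m * 2.0 ** (e - (bits - 1))


# 1.100 x 2^3 (12) plus 1.000 x 2^1 (2) with 4-bit significands
e, m = fp_add(3, 0b1100, 1, 0b1000, bits=4)
print(e, format(m, "04b"), value(e, m, bits=4))
```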

FP Guard Bits and Truncation

Guard bits
  Extra bits during intermediate steps to yield maximum accuracy in the final result
They need to be removed when generating the final result
  Chopping: simply remove guard bits
  Von Neumann rounding: if all guard bits are 0, chop; else set the LSB of the result to 1
  Rounding: add 1 to LSB if guard MSB = 1
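The three truncation methods can be compared on integer mantissas whose low `guard` bits are the guard bits. This is a sketch with illustrative names; a carry out of the rounded mantissa, which would require renormalizing, is not handled.

```python
def chop(value: int, guard: int) -> int:
    """Chopping: drop the guard bits."""
    return value >> guard


def von_neumann(value: int, guard: int) -> int:
    """Von Neumann rounding: if any guard bit is 1, force the LSB to 1."""
    kept = value >> guard
    return (kept | 1) if value & ((1 << guard) - 1) else kept


def round_nearest(value: int, guard: int) -> int:
    """Rounding: add 1 to the LSB when the most significant guard bit is 1."""
    kept = value >> guard
    return kept + ((value >> (guard - 1)) & 1)


for v in (0b0110_101, 0b0110_011, 0b0111_000):
    print(format(v, "07b"), format(chop(v, 3), "04b"),
          format(von_neumann(v, 3), "04b"), format(round_nearest(v, 3), "04b"))
```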

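FP Add-Subtract Unit

[Figure: floating-point add-subtract unit for 32-bit operands A: S_A, E_A, M_A and B: S_B, E_B, M_B, producing the 32-bit result R = A + B with fields S_R, E_R, M_R. An 8-bit subtractor forms n = E_A - E_B and its sign. A SWAP stage sends the M of the number with the smaller E to a SHIFTER, which shifts it n bits to the right, and the M of the number with the larger E to the mantissa adder/subtractor. A combinational CONTROL network takes S_A, S_B, Add/Sub and the sign, selects Add or Subtract, and sets S_R. A MUX selects E_A or E_B as E. A leading zeros detector produces X, the magnitude M is normalized and rounded to give M_R, and a second 8-bit subtractor forms E_R = E - X.]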