Computer Architecture Lecture Notes, Spring 2005
Dr. Michael P. Frank

Competency Area 4: Computer Arithmetic
• In previous chapters we've discussed:
  — Performance (execution time, clock cycles, instructions, MIPS, etc.)
  — Abstractions: Instruction Set Architecture, Assembly Language and Machine Language
• In this chapter:
  — Implementing the Architecture:
    – How does the hardware really add, subtract, multiply and divide?
    – Signed and unsigned representations
    – Constructing an ALU (Arithmetic Logic Unit)
Introduction
• Humans naturally represent numbers in base 10; computers, however, operate in base 2.

Example: the 8-bit pattern (1111 1111)two represents
  −1   (signed representation)
  255  (unsigned representation)

Note: Signed representations include sign-magnitude, one's complement, and two's complement.
• Possible 3-bit representations:

  Sign Magnitude    One's Complement    Two's Complement
  000 = +0          000 = +0            000 = +0
  001 = +1          001 = +1            001 = +1
  010 = +2          010 = +2            010 = +2
  011 = +3          011 = +3            011 = +3
  100 = -0          100 = -3            100 = -4
  101 = -1          101 = -2            101 = -3
  110 = -2          110 = -1            110 = -2
  111 = -3          111 = -0            111 = -1
• Sign Magnitude: first bit is the sign bit, the others give the magnitude
• Two's Complement: to negate, invert all bits and add 1
• One's Complement: first bit is the sign bit; invert the other bits for the magnitude
• NOTE: Computers today use two’s complement binary representations for signed numbers.
Possible Representations
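The three representations in the table above can be sketched in a few lines of plain Python. This is a minimal illustration; the function names are mine, not from the notes.

```python
# Sketch: interpreting an n-bit pattern under each signed representation.

def unsigned_val(bits: int, n: int) -> int:
    return bits & ((1 << n) - 1)

def twos_complement_val(bits: int, n: int) -> int:
    bits &= (1 << n) - 1
    # If the sign bit is set, the value is bits - 2^n.
    return bits - (1 << n) if bits & (1 << (n - 1)) else bits

def ones_complement_val(bits: int, n: int) -> int:
    bits &= (1 << n) - 1
    if bits & (1 << (n - 1)):              # sign bit set: invert for magnitude
        return -((~bits) & ((1 << n) - 1))
    return bits

def sign_magnitude_val(bits: int, n: int) -> int:
    bits &= (1 << n) - 1
    mag = bits & ((1 << (n - 1)) - 1)      # drop the sign bit
    return -mag if bits & (1 << (n - 1)) else mag

print(twos_complement_val(0b11111111, 8))  # -1
print(unsigned_val(0b11111111, 8))         # 255
```

Note how the same bit pattern (here 1111 1111) maps to different values depending only on the interpretation applied to it.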
• 32-bit signed numbers (MIPS, two's complement):

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten  (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = –2,147,483,648ten  (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = –2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = –2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = –3ten
1111 1111 1111 1111 1111 1111 1111 1110two = –2ten
1111 1111 1111 1111 1111 1111 1111 1111two = –1ten
Two’s Complement Representations
• The hardware need only test the first (most significant) bit to determine the sign.
• Negating a two's complement number:
  — invert all bits and add 1
  — or, preserve the rightmost 1 and the 0's to its right, and flip all bits to the left of that rightmost 1
• Converting n-bit numbers into m-bit numbers with m > n:
  — "sign extension" is used: the most significant bit is copied into the new high-order bits of the wider word. For unsigned numbers, the new leftmost bits are instead filled with 0's.
  — Example: Convert a 4-bit signed number into an 8-bit number.
      0010 → 0000 0010  (+2ten)
      1010 → 1111 1010  (−6ten)
  — Example instructions: lbu/lb, slt/sltu, etc.
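The sign-extension rule above can be sketched directly with bit operations. This is an illustrative sketch; the function names are mine.

```python
# Sketch: extending an n-bit value to m bits (m > n).

def sign_extend(bits: int, n: int, m: int) -> int:
    bits &= (1 << n) - 1
    if bits & (1 << (n - 1)):              # sign bit set:
        bits |= ((1 << (m - n)) - 1) << n  # copy 1s into the new high bits
    return bits

def zero_extend(bits: int, n: int, m: int) -> int:
    return bits & ((1 << n) - 1)           # unsigned case: high bits stay 0

print(format(sign_extend(0b0010, 4, 8), '08b'))  # 00000010 (+2)
print(format(sign_extend(0b1010, 4, 8), '08b'))  # 11111010 (-6)
```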
Two’s Complement Operations
• Just like in grade school (carry/borrow 1s):

      0111      0111      0110
    + 0110    - 0110    - 0101

• Two's complement makes operations easy:
  — subtraction becomes addition of the negated number:

      0111           0111
    - 0110   =>    + 1010

• Overflow (result too large for the finite computer word):
  — e.g., adding two n-bit numbers does not always yield an n-bit number:

      0111
    + 0001
      1000    Note that the term "overflow" is somewhat misleading;
              it does not mean a carry "overflowed" out of the word.
Addition and Subtraction
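The overflow case above (0111 + 0001 in 4 bits) can be detected by comparing operand and result signs: overflow occurs when two operands of the same sign produce a result of the opposite sign. A minimal sketch, with an assumed helper name:

```python
# Sketch: 4-bit two's complement addition with overflow detection.
N = 4

def add_n(a: int, b: int):
    mask = (1 << N) - 1
    raw = (a + b) & mask              # keep only N bits of the sum
    sign = 1 << (N - 1)
    # Same-sign operands, opposite-sign result => signed overflow.
    overflow = (a & sign) == (b & sign) and (raw & sign) != (a & sign)
    return raw, overflow

result, ovf = add_n(0b0111, 0b0001)   # +7 + +1
print(format(result, '04b'), ovf)     # 1000 True (the 4-bit signed sum overflows)
```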
32-bit ALU with Zero Detect:

[Figure: 32 one-bit ALUs (ALU0..ALU31) chained through CarryIn/CarryOut, with
inputs a0..a31 and b0..b31 and outputs Result0..Result31. A shared Operation /
Bnegate control selects the function, a Less input supports slt, the last
stage produces Overflow, and a Zero output is formed from all the result bits.]
* Recall that given the following control lines, we get these functions:

  000 = and    001 = or    010 = add    110 = subtract    111 = slt
* We’ve learned how to build each of these functions in hardware.
• We’ve studied how to implement a 1-bit ALU in hardware that supports the MIPS instruction set:
— key idea: use multiplexor to select desired output function
— we can efficiently perform subtraction using two’s complement
— we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important issues about hardware:
— all of the gates are always working
— the speed of a gate is affected by the number of inputs to the gate
— the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
• Changes in hardware organization can improve performance:
  — we'll look at examples for addition (the carry-lookahead adder), multiplication, and division
So far…
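The multiplexor-selected ALU summarized above can be sketched behaviorally. This is a software model, not the gate-level design; it assumes the control encodings listed earlier (000 = and, 001 = or, 010 = add, 110 = subtract, 111 = slt), with the high control bit acting as Bnegate.

```python
# Behavioral sketch of the n-bit MIPS-style ALU.
def alu(a: int, b: int, control: int, n: int = 32) -> int:
    mask = (1 << n) - 1
    bnegate = (control >> 2) & 1
    op = control & 0b11
    if bnegate:                        # subtract/slt: add two's complement of b
        b = (~b) & mask
        carry_in = 1
    else:
        carry_in = 0
    if op == 0b00:
        return a & b                   # and
    if op == 0b01:
        return a | b                   # or
    s = (a + b + carry_in) & mask      # add (a - b when Bnegate is set)
    if op == 0b11:                     # slt: 1 if a - b is negative
        return 1 if s & (1 << (n - 1)) else 0
    return s

print(alu(7, 6, 0b110, 4))  # 1  (7 - 6)
print(alu(2, 5, 0b111, 4))  # 1  (2 < 5)
```

As a sketch, the slt case simply tests the sign bit of a - b; a full hardware design would also account for overflow in that comparison.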
• For adder design:
  — Problem: the ripple-carry adder is slow, due to the sequential evaluation of the carry-in/carry-out bits.
• Consider the carry-in inputs:
Better adder design
    cin1 = cout0 = (b0 · cin0) + (a0 · cin0) + (a0 · b0)
    cin2 = cout1 = (b1 · cin1) + (a1 · cin1) + (a1 · b1)
    cin3 = cout2 = (b2 · cin2) + (a2 · cin2) + (a2 · b2)

• Using substitution, we can see the "ripple" effect:

    cin2 = (b1 · b0 · cin0) + (b1 · a0 · cin0) + (b1 · a0 · b0)
         + (a1 · b0 · cin0) + (a1 · a0 · cin0) + (a1 · a0 · b0)
         + (a1 · b1)
• Faster carry schemes exist that improve the speed of adders in hardware and reduce complexity in equations, namely the carry lookahead adder.
• Let cin_i represent the ith carry-in bit. Then:

    cin(i+1) = (a_i · cin_i) + (b_i · cin_i) + (a_i · b_i)
             = (a_i · b_i) + (a_i + b_i) · cin_i

• We can now define the terms generate and propagate:

    generate:  g_i = a_i · b_i
    propagate: p_i = a_i + b_i

• Then,

    cin(i+1) = g_i + (p_i · cin_i)

Carry-Lookahead Adder
• Suppose g_i is 1. Then the adder generates a carry-out independent of the value of the carry-in, i.e.

    cin(i+1) = g_i + (p_i · cin_i) = 1 + (p_i · cin_i) = 1

• Now suppose g_i is 0 and p_i is 1:

    cin(i+1) = g_i + (p_i · cin_i) = 0 + (1 · cin_i) = cin_i

• The adder propagates a carry-in to a carry-out. In summary, cout_i is 1 if either g_i is 1 or both p_i and cin_i are 1.
• This new approach creates the first level of abstraction. Expanding the recurrence gives each carry directly in terms of cin0:

Carry-Lookahead Adder

    cin1 = g0 + (p0 · cin0)
    cin2 = g1 + (p1 · g0) + (p1 · p0 · cin0)
    cin3 = g2 + (p2 · g1) + (p2 · p1 · g0) + (p2 · p1 · p0 · cin0)
    cin4 = g3 + (p3 · g2) + (p3 · p2 · g1) + (p3 · p2 · p1 · g0) + (p3 · p2 · p1 · p0 · cin0)
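The generate/propagate recurrence can be sketched as follows. Note that this loop evaluates the carries sequentially for clarity; the point of the expanded equations is that hardware can compute every carry in parallel, directly from the g's, p's and cin0.

```python
# Sketch: first-level carry-lookahead signals for an n-bit adder.
# g_i = a_i AND b_i (generate), p_i = a_i OR b_i (propagate),
# cin_{i+1} = g_i + p_i * cin_i.

def lookahead_carries(a_bits, b_bits, cin0):
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate bits
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate bits
    carries = [cin0]
    for i in range(len(a_bits)):
        carries.append(g[i] | (p[i] & carries[i]))
    return carries                                # [cin0, cin1, ..., cin_n]

# 0111 + 0001, bit 0 listed first:
print(lookahead_carries([1, 1, 1, 0], [1, 0, 0, 0], 0))  # [0, 1, 1, 1, 0]
```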
• Sometimes the first level of abstraction produces large equations. It is then beneficial to move to a second level of abstraction, produced by treating the adder as 4-bit groups whose propagate and generate signals are combined at a higher level:
Carry-Lookahead Adder
    P0 = p3 · p2 · p1 · p0
    P1 = p7 · p6 · p5 · p4
    P2 = p11 · p10 · p9 · p8
    P3 = p15 · p14 · p13 · p12

• We're representing a 16-bit adder, with a "super" propagate signal and a "super" generate signal for each 4-bit group.
• So P_i is true only if each of the bits in the group propagates a carry.
• For the "super" generate signals, all that matters is whether a carry emerges from the most significant bit of the group:
Carry-Lookahead Adder
    G0 = g3 + (p3 · g2) + (p3 · p2 · g1) + (p3 · p2 · p1 · g0)
    G1 = g7 + (p7 · g6) + (p7 · p6 · g5) + (p7 · p6 · p5 · g4)
    G2 = g11 + (p11 · g10) + (p11 · p10 · g9) + (p11 · p10 · p9 · g8)
    G3 = g15 + (p15 · g14) + (p15 · p14 · g13) + (p15 · p14 · p13 · g12)
• Now we can represent the carry-out signals for the 16-bit adder with two levels of abstraction as:

    Cin1 = G0 + (P0 · cin0)
    Cin2 = G1 + (P1 · G0) + (P1 · P0 · cin0)
    Cin3 = G2 + (P2 · G1) + (P2 · P1 · G0) + (P2 · P1 · P0 · cin0)
    Cin4 = G3 + (P3 · G2) + (P3 · P2 · G1) + (P3 · P2 · P1 · G0) + (P3 · P2 · P1 · P0 · cin0)
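The two-level scheme above can be sketched by first combining each 4-bit group's bit-level signals into "super" (P, G) pairs, then applying the same recurrence at the group level. Function names here are mine, for illustration.

```python
# Sketch: "super" propagate/generate for a 16-bit adder (four 4-bit groups).

def group_pg(p, g):
    """Combine four bit-level (p, g) signals into one group (P, G)."""
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def second_level_carries(a, b, cin0):
    p = [x | y for x, y in zip(a, b)]        # bit-level propagate
    g = [x & y for x, y in zip(a, b)]        # bit-level generate
    groups = [group_pg(p[4*i:4*i+4], g[4*i:4*i+4]) for i in range(4)]
    carries = [cin0]
    for P, G in groups:                      # Cin_{i+1} = G_i + P_i * Cin_i
        carries.append(G | (P & carries[-1]))
    return carries                           # [cin0, Cin1, Cin2, Cin3, Cin4]

# 0xFFFF + 1 (bit 0 listed first): the carry ripples through every group.
print(second_level_carries([1] * 16, [1] + [0] * 15, 0))  # [0, 1, 1, 1, 1]
```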
Carry-Lookahead Adder Design, 2nd Level of Abstraction:

[Figure: a 16-bit adder built from four 4-bit ALUs (ALU0..ALU3, producing
Result0-3, Result4-7, Result8-11, Result12-15 from inputs a0..a15 and
b0..b15). Each ALU supplies its group signals (P0/G0 .. P3/G3) to a
carry-lookahead unit, which computes the group carries C1..C4 from the
bit-level p_i, g_i signals and the external CarryIn, and feeds each ALU's
CarryIn; the last stage produces CarryOut.]
O(log n)-time carry-skip adder (8-bit segment shown):

[Figure: a tree of P/G combining blocks built over 1-bit adder cells
(inputs A, B; outputs S, P, G; Cin/Cout), with MS (most significant) and
LS (least significant) halves merged at each level (Pms, Gls, Pls signals);
carries arrive in successive "ticks" (1st through 4th carry tick).]

With this structure, we can do a 2^n-bit add in 2(n+1) logic stages. The
hardware overhead is less than 2x that of a regular ripple-carry adder.
• Recall that multiplication is accomplished via shifting and addition.
• Example:

         0010   (multiplicand)
       x 0110   (multiplier)
         0000   (multiply by LSB of multiplier)
      + 0010    (shift multiplicand left 1 bit)
        00100   (intermediate product)
     + 0010     (shift multiplicand left again)
      0001100   (product)

Multiplication Algorithms
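The shift-and-add scheme above translates almost line-for-line into code: test the multiplier's LSB, conditionally add the multiplicand into the product, then shift the multiplicand left and the multiplier right. A minimal sketch (function name is mine):

```python
# Sketch: shift-and-add multiplication (the Algorithm 1 scheme).
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 4) -> int:
    product = 0
    for _ in range(n):
        if multiplier & 1:            # LSB of multiplier is 1?
            product += multiplicand   # add multiplicand into the product
        multiplicand <<= 1            # shift multiplicand left 1 bit
        multiplier >>= 1              # shift multiplier right 1 bit
    return product

print(format(shift_add_multiply(0b0010, 0b0110), '07b'))  # 0001100 (2 x 6 = 12)
```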
Multiplication Algorithm 1
Hardware implementation of Algorithm 1:

[Figure: a 64-bit ALU adds the 64-bit Multiplicand register (shift left) to
the 64-bit Product register (write); a control test examines the 32-bit
Multiplier register (shift right).]

For each bit:

  Start
  1. Test Multiplier0 (the LSB of the Multiplier register)
     — Multiplier0 = 1: 1a. Add multiplicand to product and place the result in the Product register
     — Multiplier0 = 0: skip the add
  2. Shift the Multiplicand register left 1 bit
  3. Shift the Multiplier register right 1 bit
  32nd repetition? No (< 32 repetitions): return to step 1. Yes: Done.
Example (4-bit): 0010two x 0011two = 0000 0110two

Iteration  Step                          Multiplier  Multiplicand  Product
0          Initial values                0011        0000 0010     0000 0000
1          1a: LSB of multiplier = 1     0011        0000 0010     0000 0010
           2: Shift Mcand left           0011        0000 0100     0000 0010
           3: Shift Multiplier right     0001        0000 0100     0000 0010
2          1a: LSB of multiplier = 1     0001        0000 0100     0000 0110
           2: Shift Mcand left           0001        0000 1000     0000 0110
           3: Shift Multiplier right     0000        0000 1000     0000 0110
3          1: LSB of multiplier = 0      0000        0000 1000     0000 0110
           2: Shift Mcand left           0000        0001 0000     0000 0110
           3: Shift Multiplier right     0000        0001 0000     0000 0110
4          1: LSB of multiplier = 0      0000        0001 0000     0000 0110
           2: Shift Mcand left           0000        0010 0000     0000 0110
           3: Shift Multiplier right     0000        0010 0000     0000 0110
• For Algorithm 1, we initialize the left half of the Multiplicand register to 0 to accommodate its left shifts, and all adds are 64 bits wide. This is wasteful and slow.
• Algorithm 2: instead of shifting the multiplicand left, shift the product register right. This halves the widths of the ALU and the multiplicand register.
Multiplication Algorithms
Hardware implementation of Algorithm 2:

[Figure: a 32-bit ALU adds the 32-bit Multiplicand register to the left half
of the 64-bit Product register (shift right, write); a control test examines
the 32-bit Multiplier register (shift right).]

For each bit:

  Start
  1. Test Multiplier0
     — Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
     — Multiplier0 = 0: skip the add
  2. Shift the Product register right 1 bit
  3. Shift the Multiplier register right 1 bit
  32nd repetition? No (< 32 repetitions): return to step 1. Yes: Done.

Multiplication Algorithm 2
Example (4-bit): 0010two x 0011two = 0000 0110two

Iteration  Step                          Multiplier  Multiplicand  Product
0          Initial values                0011        0010          0000 0000
1          1a: LSB of multiplier = 1     0011        0010          0010 0000
           2: Shift Product right        0011        0010          0001 0000
           3: Shift Multiplier right     0001        0010          0001 0000
2          1a: LSB of multiplier = 1     0001        0010          0011 0000
           2: Shift Product right        0001        0010          0001 1000
           3: Shift Multiplier right     0000        0010          0001 1000
3          1: LSB of multiplier = 0      0000        0010          0001 1000
           2: Shift Product right        0000        0010          0000 1100
           3: Shift Multiplier right     0000        0010          0000 1100
4          1: LSB of multiplier = 0      0000        0010          0000 1100
           2: Shift Product right        0000        0010          0000 0110
           3: Shift Multiplier right     0000        0010          0000 0110
Multiplication Algorithm 3

• The third multiplication algorithm combines the right half of the Product register with the multiplier.
• This reduces the number of steps needed to implement the multiply, and it also saves space.

Hardware implementation of Algorithm 3:

[Figure: a 32-bit ALU adds the 32-bit Multiplicand register to the left half
of the 64-bit Product register (shift right, write); the multiplier occupies
the right half of the Product register, and a control test examines Product0.]

For each bit:

  Start
  1. Test Product0
     — Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
     — Product0 = 0: skip the add
  2. Shift the Product register right 1 bit
  32nd repetition? No (< 32 repetitions): return to step 1. Yes: Done.
Example (4-bit): 0010two x 0011two = 0000 0110two

Iteration  Step                       Multiplicand  Product
0          Initial values             0010          0000 0011
1          1a: LSB of product = 1     0010          0010 0011
           2: Shift Product right     0010          0001 0001
2          1a: LSB of product = 1     0010          0011 0001
           2: Shift Product right     0010          0001 1000
3          1: LSB of product = 0      0010          0001 1000
           2: Shift Product right     0010          0000 1100
4          1: LSB of product = 0      0010          0000 1100
           2: Shift Product right     0010          0000 0110
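Algorithm 3's trick of sharing one register is easy to mimic with a single integer whose low n bits start out holding the multiplier. A minimal sketch (function name is mine):

```python
# Sketch of Algorithm 3: the multiplier occupies the right half of the
# product register; each step tests Product0, conditionally adds the
# multiplicand into the left half, then shifts the whole register right.

def multiply_alg3(multiplicand: int, multiplier: int, n: int = 4) -> int:
    product = multiplier                 # right half holds the multiplier
    for _ in range(n):
        if product & 1:                  # Product0 = 1: add into the left half
            product += multiplicand << n
        product >>= 1                    # shift the combined register right
    return product

print(format(multiply_alg3(0b0010, 0b0011), '08b'))  # 00000110 (2 x 3 = 6)
```

Each right shift simultaneously discards a consumed multiplier bit and moves the partial product into place, which is why the separate Multiplier register (and its shift step) can be eliminated.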
• Example (long division): 166 ÷ 12 = 13 remainder 10

            13       (Quotient)
         ______
    12  )  166       (Divisor, Dividend)
         - 12
           46
         - 36
           10        (Remainder)

• Hardware implementations are similar to the multiplication algorithms:
  + Algorithm 1 implements the conventional division method
  + Algorithm 2 reduces the divisor register and ALU by half
  + Algorithm 3 eliminates the quotient register completely

Division Algorithms
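In the spirit of multiplication Algorithm 3, binary division can share one register between the shrinking dividend and the growing quotient. The sketch below is an assumed restoring-style scheme (the test happens before subtracting, so no explicit restore step is committed); the function name is mine.

```python
# Sketch: binary division with a shared remainder/quotient register.
# Each step shifts left, compares the upper half against the divisor,
# subtracts when possible, and records one quotient bit in the LSB.

def divide_restoring(dividend: int, divisor: int, n: int = 8):
    rq = dividend                     # low n bits: dividend, then quotient
    for _ in range(n):
        rq <<= 1                      # shift the remainder-quotient left
        if (rq >> n) >= divisor:      # does the divisor fit the upper half?
            rq -= divisor << n        # subtract it from the upper half...
            rq |= 1                   # ...and set the new quotient bit
    quotient = rq & ((1 << n) - 1)
    remainder = rq >> n
    return quotient, remainder

print(divide_restoring(166, 12))  # (13, 10), matching the example above
```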
• We need a way to represent:
  — numbers with fractions, e.g., 3.1416
  — very small numbers, e.g., 0.000000001
  — very large numbers, e.g., 3.15576 x 10^9
• Representation:
  — sign, exponent, significand: (−1)^sign x significand x 2^exponent
  — more bits for the significand gives more accuracy
  — more bits for the exponent increases dynamic range
• IEEE 754 floating point standard:
  — Single precision: 8-bit exponent, 23-bit significand, 1-bit sign
  — Double precision: 11-bit exponent, 52-bit significand, 1-bit sign
Floating Point Numbers
SIGN EXPONENT SIGNIFICAND
• The leading "1" bit of the significand is implicit.
• The exponent is "biased" to make sorting easier:
  — all 0s is the smallest exponent, all 1s is the largest
  — bias of 127 for single precision, 1023 for double precision
• Summary: value = (−1)^sign x (1 + fraction) x 2^(exponent − bias)
• Example: −0.75ten = −1.1two x 2^−1
  — Single precision: (−1)^1 x (1 + .1000...) x 2^(126 − 127)
    – 1 | 01111110 | 10000000000000000000000
  — Double precision: (−1)^1 x (1 + .1000...) x 2^(1022 − 1023)
    – 1 | 01111111110 | 1000...0 (a 1 followed by 51 zeros)
IEEE 754 floating-point standard
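The single-precision encoding of −0.75 worked out above can be checked with Python's standard struct module, which packs a float into its IEEE 754 bit pattern:

```python
# Sketch: extracting the IEEE 754 single-precision fields of -0.75.
import struct

bits = struct.unpack('>I', struct.pack('>f', -0.75))[0]  # raw 32-bit pattern
sign = bits >> 31
exponent = (bits >> 23) & 0xFF           # biased exponent field
fraction = bits & ((1 << 23) - 1)        # 23 fraction bits

print(sign, format(exponent, '08b'), format(fraction, '023b'))
# 1 01111110 10000000000000000000000

# Reconstruct the value: (-1)^sign * (1 + fraction) * 2^(exponent - 127)
print((-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127))  # -0.75
```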
FP Addition Algorithm
• The number with the smaller exponent must be shifted right before adding, so that the "binary points" align.
• After adding, the sum must be normalized; then it is rounded, and possibly re-normalized.
• Possible errors include:
  — Overflow (exponent too big)
  — Underflow (exponent too small)
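The align/add/normalize steps above can be sketched on toy (sign, significand, exponent) triples with integer significands. This is a simplified model of my own construction: it ignores rounding, sticky bits, and special cases.

```python
# Toy sketch of FP addition: align, add, normalize.
def fp_add(x, y, prec: int = 24):
    (sx, mx, ex), (sy, my, ey) = x, y
    # Step 1: shift the number with the smaller exponent right to align.
    if ex < ey:
        mx >>= (ey - ex); ex = ey
    else:
        my >>= (ex - ey); ey = ex
    # Step 2: add (or effectively subtract) the aligned significands.
    m = (-mx if sx else mx) + (-my if sy else my)
    s, m = (1, -m) if m < 0 else (0, m)
    # Step 3: normalize so the significand uses exactly `prec` bits.
    while m and m < (1 << (prec - 1)):
        m <<= 1; ex -= 1
    while m >= (1 << prec):
        m >>= 1; ex += 1
    return (s, m, ex)

# 1.100two x 2^0 + 1.000two x 2^1 (4-bit significands): 1.5 + 2.0 = 3.5
print(fp_add((0, 0b1100, 0), (0, 0b1000, 1), prec=4))  # (0, 0b1110, 1)
```

Here (0, 0b1110, 1) reads as +1.110two x 2^1 = 3.5, as expected.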
Floating-Point Addition Hardware
• Implements the algorithm from the previous slide.
• Note the high complexity compared with integer addition hardware.
FP Multiplication Algorithm
• Add the exponents, adjusting for the bias.
• Multiply the significands.
• Normalize the result, checking for overflow/underflow.
• Then round, repeating normalization if necessary.
• Compute the sign.
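These steps can be sketched on toy triples as well. In this simplified model of my own, significands are Python floats in [1, 2) and exponents are unbiased, so the bias adjustment from the slide is folded away; rounding and special cases are ignored.

```python
# Toy sketch of FP multiplication: add exponents, multiply significands,
# normalize, compute the sign.
def fp_mul(x, y):
    (sx, mx, ex), (sy, my, ey) = x, y
    sign = sx ^ sy                 # signs differ => negative result
    exp = ex + ey                  # add the (unbiased) exponents
    sig = mx * my                  # product of significands lies in [1, 4)
    if sig >= 2.0:                 # normalize back into [1, 2)
        sig /= 2.0
        exp += 1
    return (sign, sig, exp)

# 1.5 x 2^0  *  1.5 x 2^0  =  2.25  =  1.125 x 2^1
print(fp_mul((0, 1.5, 0), (0, 1.5, 0)))  # (0, 1.125, 1)
```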
Ethics Addendum: Intel Pentium FP bug

• In July 1994, Intel discovered there was a bug in the Pentium's FP division hardware...
  — But they decided not to announce the bug, and went ahead and shipped chips having the flaw anyway, to save time & money.
    – Based on their analysis, they thought errors would arise only rarely.
• Even after the bug was discovered by users, Intel initially refused to replace the bad chips on request!
  — They got a lot of bad PR from this...
• Lesson: Good, ethical engineers fix problems when they first find them, and don't cover them up!
• Computer arithmetic is constrained by limited precision.
• Bit patterns have no inherent meaning but standards do exist:
– two’s complement– IEEE 754 floating point
• Computer instructions determine “meaning” of the bit patterns.
• Performance and accuracy are important, so there are many complexities in real machines (i.e., in the algorithms and their implementations).

* Please read the remainder of the chapter on your own. However, for the exams you will only be responsible for the material that has been covered in the lectures.
Summary