Upload
others
View
15
Download
0
Embed Size (px)
Citation preview
1ECE369
ECE 369
Chapter 3
2ECE369
Our Goal
3ECE369
Lets Build a Processor, Introduction to Instruction Set Architecture
• First Step Into Your Project !!!• How could we build a 1-bit ALU for add, and, or?
• Need to support the set-on-less-than instruction (slt)
– slt is an arithmetic instruction
– produces a 1 if a < b and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, $t7)
– use subtraction: (a-b) = 0 implies a = b
• How could we build a 32-bit ALU? 32
32
32
operation
result
a
b
ALU
Must Read Appendix B
4ECE369
One-bit adder
• Takes three input bits and generates two output bits• Multiple bits can be cascaded
cout = a.b + a.cin + b.cinsum = a <xor> b <xor> cin
5ECE369
Building a 32 bit ALU
b
0
2
Result
Operation
a
1
CarryIn
CarryOut
R e su lt31a3 1
b3 1
R e su lt0
C arryIn
a0
b0
R e su lt1a1
b1
R e su lt2a2
b2
O pe ra t io n
A LU 0
C arry In
C arry O u t
A LU 1
C arry In
C arry O u t
A LU 2
C arry In
C arry O u t
A LU 3 1
C arry In
6ECE369
• Two's complement approach: just negate b and add.• How do we negate?
• A very clever solution:
What about subtraction (a – b) ?
b
0
2
Result
Operation
a
1
CarryIn
CarryOut
000 = and001 = or010 = add
0
2
Result
Operation
a
1
CarryIn
CarryOut
0
1
Binvert
b000 = and001 = or010 = add110 = subtract
7ECE369
Supporting Slt
• Can we figure out the idea?
000 = and001 = or010 = add110 = subtract111 = slt
8ECE369
Seta31
0
Result0a0
Result1a1
0
Result2a2
0
Operation
b31
b0
b1
b2
Result31
Overflow
Bnegate
Zero
ALU0Less
CarryIn
CarryOut
ALU1Less
CarryIn
CarryOut
ALU2Less
CarryIn
CarryOut
ALU31Less
CarryIn
Test for equality
• Notice control lines
000 = and001 = or010 = add110 = subtract111 = slt
• Note: Zero is a 1 if result is zero!
9ECE369
How about “a nor b”
000 = and001 = or010 = add110 = subtract111 = slt
10ECE369
Big Picture
11ECE369
Conclusion
• We can build an ALU to support an instruction set– key idea: use multiplexor to select the output we want– we can efficiently perform subtraction using two’s complement– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware– all of the gates are always working– speed of a gate is affected by the number of inputs to the gate– speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)• Our primary focus: comprehension, however,
– Clever changes to organization can improve performance(similar to using better algorithms in software)
• How about my instruction smt (set if more than)???
12ECE369
ALU Summary
• We can build an ALU to support addition• Our focus is on comprehension, not performance• Real processors use more sophisticated techniques for arithmetic• Where performance is not critical, hardware description languages
allow designers to completely automate the creation of hardware!
13ECE369
Overflow
14ECE369
Formulation
15ECE369
A Simpler Formula ?
16ECE369
Problem: Ripple carry adder is slow!
• Is a 32-bit ALU as fast as a 1-bit ALU?• Is there more than one way to do addition?
• Can you see the ripple? How could you get rid of it?
c1 = a0b0 + a0c0 + b0c0c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 =
• Not feasible! Why?
17ECE369
Carry Bit
ininout bcacabc ++=
iiiiiii cbcabac ++=+1
0000001 cbcabac ++=
1111112 cbcabac ++=( ) ( )0000001000000111 cbcababcbcabaaba ++++++=
00100100100100100111 cbbcabbabcbacaabaaba ++++++=
2222223 cbcabac ++=
18ECE369
Generate/Propagate
ai bi ci+1
0 00 11 01 1
0000001 cbcabac ++=
000001 )( cbabac ++=
1111112 cbcabac ++=
11111 )( cbaba ++=
[ ]000001111 )()( cbabababa ++++=
},{ iiii babacommon +=ai bi ci+1
0 00 11 01 1
iibagenerate = ii bapropagate +=
01
00
1
0
1
1
19ECE369
Generate/Propagate (Ctd.)
iibagenerate =ii bapropagate +=
000001 )( cbabac ++=
000 cpg +=
111112 )( cbabac ++=
)( 00011 cpgpg ++=
001011 cppgpg ++=
iiii cpgc +=+ 1
20ECE369
Carry-look-ahead adder
• Motivation: – If we didn't know the value of carry-in, what could we do?– When would we always generate a carry? gi = ai . bi– When would we propagate the carry? pi = ai + bi
• Did we get rid of the ripple?
c1 = g0 + p0c0c2 = g1 + p1c1 c2 = g1 + p1g0 + p1p0c0c3 = g2 + p2c2 c3 = g2 + p2g1 + p2p1g0 + p2p1p0c0c4 = g3 + p3c3 c4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0
• Feasible! Why?c1 = a0b0 + a0c0 + b0c0c2 = a1b1 + a1c1 + b1c1 c2 = c3 = a2b2 + a2c2 + b2c2 c3 = c4 = a3b3 + a3c3 + b3c3 c4 =
a3 a2 a1 a0b3 b2 b1 b0
21ECE369
A 4-bit carry look-ahead adder
• Generate g and p term for each bit
• Use g’s, p’s and carry in to generate all C’s
• Also use them to generate block G and P
• CLA principle can be used recursively
22ECE369
16 Bit CLA
23ECE369
Gate Delay for 16 bit Adder
iibagenerate =
ii bapropagate +=1
1+2
1+2+2
24ECE369
Multiplication
• More complicated than addition– Accomplished via shifting and addition
• More time and more area
25ECE369
Done
1. Test�Multiplier0
1a. Add multiplicand to product and�place the result in Product register
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
64-bit ALU
Control test
MultiplierShift right
ProductWrite
MultiplicandShift left
64 bits
64 bits
32 bits
Multiplication: Implementation
26ECE369
Example
64-bit ALU
Control test
MultiplierShift right
ProductWrite
MultiplicandShift left
64 bits
64 bits
32 bits
27ECE369
MultiplierShift right
Write
32 bits
64 bits
32 bits
Shift right
Multiplicand
32-bit ALU
Product Control test
Done
1. Test�Multiplier0
1a. Add multiplicand to the left half of�the product and place the result in�the left half of the Product register
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
Start
Multiplier0 = 0Multiplier0 = 1
No: < 32 repetitions
Yes: 32 repetitions
Second version
64-bit ALU
Control test
MultiplierShift right
ProductWrite
MultiplicandShift left
64 bits
64 bits
32 bits
28ECE369
Example
MultiplierShift right
Write
32 bits
64 bits
32 bits
Shift right
Multiplicand
32-bit ALU
Product Control test
29ECE369
Control�testWrite
32 bits
64 bits
Shift rightProduct
Multiplicand
32-bit ALU
Done
1. Test�Product0
1a. Add multiplicand to the left half of�the product and place the result in�the left half of the Product register
2. Shift the Product register right 1 bit
32nd repetition?
Start
Product0 = 0Product0 = 1
No: < 32 repetitions
Yes: 32 repetitions
Final version
30ECE369
Example
Control�testWrite
32 bits
64 bits
Shift rightProduct
Multiplicand
32-bit ALU
31ECE369
Division
• Even more complicated– Can be accomplished via shifting and addition/subtraction
• More time and more area• Negative numbers: Even more difficult• There are better techniques, we won’t look at them
32ECE369
Division: First version
33ECE369
Example
34ECE369
Division: Second version
35ECE369
Improved Division
36ECE369
Division (7÷2)
37ECE369
Floating point (a brief look)
• We need a way to represent– Numbers with fractions, e.g., 3.1416– Very small numbers, e.g., 0.000000001– Very large numbers, e.g., 3.15576 x 109
• Representation:– Sign, exponent, fraction: (–1)sign x fraction x 2exponent
– More bits for fraction gives more accuracy– More bits for exponent increases range
• IEEE 754 floating point standard: – single precision: 8 bit exponent, 23 bit fraction– double precision: 11 bit exponent, 52 bit fraction
38ECE369
IEEE 754 floating-point standard
• 1.f x 2e
• 1.s1s2s3s4…. snx2e
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier– All 0s is smallest exponent, all 1s is largest– Bias of 127 for single precision and 1023
for double precision
39ECE369
Single Precision
–summary: (–1)sign x (1+significand) x 2(exponent – bias)
• Example:• 11/100 = 11/102= 0.11 = 1.1x10-1
–Decimal: -.75 = -3/4 = -3/22
–Binary: -.11 = -1.1 x 2-1
–IEEE single precision: 1 01111110 10000000000000000000000
–exponent-bias=-1 => exponent = 126 = 01111110
40ECE369
Opposite Way
Sign Exponent Fraction
- 129 0x2-1+1x2-2=0.25
41ECE369
Floating point addition
1.610x10-1 + 9.999x101
0.01610x101 + 9.999x101
10.015x101
1.0015x102
1.002x102
42ECE369
Floating point addition
•
Still normalized?
4. Round the significand to the appropriatenumber of bits
YesOverflow orunderflow?
Start
No
Yes
Done
1. Compare the exponents of the two numbers.Shift the smaller number to the right until itsexponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right andincrementing the exponent or shifting left
and decrementing the exponent
No Exception
Small ALU
Exponentdifference
Control
ExponentSign Fraction
Big ALU
ExponentSign Fraction
0 1 0 1 0 1
Shift right
0 1 0 1
Increment ordecrement
Shift left or right
Rounding hardware
ExponentSign Fraction
43ECE369
Add 0.510 and -0.437510
44ECE369
Multiplication
45ECE369
Floating point multiply
• To multiply two numbers – Add the two exponent (remember access 127 notation)– Produce the result sign as exor of two signs– Multiply significand portions– Results will be 1x.xxxxx… or 01.xxxx….– In the first case shift result right and adjust exponent– Round off the result– This may require another normalization step
46ECE369
Multiplication 0.510 and -0.437510
47ECE369
Floating point divide
• To divide two numbers – Subtract divisor’s exponent from the dividend’s exponent
(remember access 127 notation)– Produce the result sign as exor of two signs– Divide dividend’s significand by divisor’s significand portions– Results will be 1.xxxxx… or 0.1xxxx….– In the second case shift result left and adjust exponent– Round off the result– This may require another normalization step
48ECE369
Floating point complexities
• Operations are somewhat more complicated (see text)• In addition to overflow we can have “underflow”• Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round– Four rounding modes– Positive divided by zero yields “infinity”– Zero divide by zero yields “not a number”– Other complexities
• Implementing the standard can be tricky• Not using the standard can be even worse
– See text for description of 80x86 and Pentium bug!
49ECE369
Chapter Three Summary
• Computer arithmetic is constrained by limited precision– Read pages 213-215
• Bit patterns have no inherent meaning but standards do exist– two’s complement– IEEE 754 floating point
• Operations are somewhat more complicated (see text)• In addition to overflow we can have “underflow”• Implementing the standard can be tricky• Not using the standard can be even worse (3.10)
– See text for description of 80x86 and Pentium bug!