7/28/2019 Chapter 3 Arithmetic for Computers(Revised) (1)
Chapter 3: Arithmetic for Computers
Arithmetic for Computers
Operations on integers:
  Addition and subtraction
  Multiplication and division
  Dealing with overflow
Floating-point real numbers:
  Representation and operations
3.1 Introduction
Arithmetic
Where we've been: Performance (seconds, cycles, instructions)
Abstractions: Instruction Set Architecture; Assembly Language and Machine Language
What's up ahead:
Implementing the Architecture
[Diagram: ALU with 32-bit inputs a and b, an operation select, and a 32-bit result]
Bits are just bits (no inherent meaning); conventions define the relationship between bits and numbers.
Binary numbers (base 2): 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 ... decimal: 0 to 2^n - 1
Of course it gets more complicated:
  numbers are finite (overflow)
  fractions and real numbers
  negative numbers (e.g., no MIPS subi instruction; addi can add a negative number)
How do we represent negative numbers? i.e., which bit patterns will represent which numbers?
Numbers
Sign Magnitude    One's Complement    Two's Complement
000 = +0          000 = +0            000 = +0
001 = +1          001 = +1            001 = +1
010 = +2          010 = +2            010 = +2
011 = +3          011 = +3            011 = +3
100 = -0          100 = -3            100 = -4
101 = -1          101 = -2            101 = -3
110 = -2          110 = -1            110 = -2
111 = -3          111 = -0            111 = -1
Issues: balance, number of zeros, ease of operations
Which one is best? Why?
Possible Representations
32-bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
0000 0000 0000 0000 0000 0000 0000 0010two = +2ten
...
0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten   (maxint)
1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten   (minint)
1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
1000 0000 0000 0000 0000 0000 0000 0010two = -2,147,483,646ten
...
1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
1111 1111 1111 1111 1111 1111 1111 1111two = -1ten
MIPS
Negating a two's complement number: invert all bits and add 1
  Remember: negate (+/-) and invert (1/0) are quite different!
Converting n-bit numbers into numbers with more than n bits:
  MIPS 16-bit immediate gets converted to 32 bits for arithmetic
  Copy the most significant bit (the sign bit) into the other bits:
    0010 -> 0000 0010
    1010 -> 1111 1010
  "sign extension" (lbu vs. lb)
Two's Complement Operations
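The copy-the-sign-bit rule above can be sketched in C; the helper names below are hypothetical, not MIPS instructions:

```c
#include <stdint.h>

/* Sign extension: widen a 16-bit immediate to 32 bits by replicating
   bit 15 (the sign bit) into the upper half, as MIPS does for addi. */
int32_t sign_extend16(uint16_t imm) {
    return (int32_t)(int16_t)imm;   /* the int16_t cast replicates bit 15 */
}

/* Zero extension, the unsigned analogue (what lbu does for bytes). */
uint32_t zero_extend16(uint16_t imm) {
    return (uint32_t)imm;           /* upper 16 bits become 0 */
}
```

0x0002 stays 2 either way; 0xFFFA (-6 as a 16-bit value) sign-extends to -6 but zero-extends to 65530.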
Just like in grade school (carry/borrow 1s):

    0111        0111        0110
  + 0110      - 0110      - 0101
  ------      ------      ------
    1101        0001        0001

Two's complement operations are easy: do subtraction using addition of the negative number:

    0111
  + 1010
  ------
    0001   (the carry out of the top bit is discarded)

Overflow (result too large for the finite computer word): e.g., adding two n-bit numbers does not always yield an n-bit number:

    0111
  + 0001
  ------
    1000   (note that the term "overflow" is somewhat misleading: it does not mean a carry overflowed)
Addition & Subtraction
No overflow when adding a positive and a negative number
No overflow when signs are the same for subtraction
Overflow occurs when the value affects the sign:
overflow when adding two positives yields a negative
or, adding two negatives gives a positive
or, subtract a negative from a positive and get a negative
or, subtract a positive from a negative and get a positive
Consider the operations A + B and A - B:
Can overflow occur if B is 0 ?
Can overflow occur if A is 0 ?
Detecting Overflow
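The sign rules above can be sketched in C; the unsigned casts make the wraparound arithmetic well defined, and converting back to int32_t assumes the usual two's-complement targets:

```c
#include <stdint.h>
#include <stdbool.h>

/* Overflow checks for A + B and A - B, following the sign rules above. */
bool add_overflows(int32_t a, int32_t b) {
    int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b);
    /* overflow iff a and b share a sign and the sum's sign differs */
    return ((a < 0) == (b < 0)) && ((sum < 0) != (a < 0));
}

bool sub_overflows(int32_t a, int32_t b) {
    int32_t diff = (int32_t)((uint32_t)a - (uint32_t)b);
    /* overflow only possible when the operand signs differ */
    return ((a < 0) != (b < 0)) && ((diff < 0) != (a < 0));
}
```

This also answers the slide's questions: if B is 0, neither A + B nor A - B can overflow; if A is 0, A + B cannot overflow, but A - B can (0 - minint).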
An exception (interrupt) occurs
Control jumps to predefined address for exception
Interrupted address is saved for possible resumption
Details based on software system / language
example: flight control vs. homework assignment
Don't always want to detect overflow
new MIPS instructions: addu, addiu, subu
note: addiu still sign-extends!
note: sltu, sltiu for unsigned comparisons
Roll over: circular buffers
Saturation: pixel lightness control
Effects of Overflow
Problem: Consider a logic function with three inputs: A, B, and C.
Output D is true if at least one input is true
Output E is true if exactly two inputs are true
Output F is true only if all three inputs are true
Show the truth table for these three functions.
Show the Boolean equations for these three functions.
Show an implementation consisting of inverters, AND, and OR gates.
Review: Boolean Algebra & Gates
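One possible answer to the review problem, sketched in C rather than gates (the equations are a sum-of-products reading of the truth table, not the only correct form):

```c
#include <stdbool.h>

/* One solution to the review problem (sum-of-products form):
     D = A + B + C            (at least one input true)
     E = ABC' + AB'C + A'BC   (exactly two inputs true)
     F = ABC                  (all three inputs true)            */
bool D(bool a, bool b, bool c) { return a || b || c; }
bool E(bool a, bool b, bool c) {
    return (a && b && !c) || (a && !b && c) || (!a && b && c);
}
bool F(bool a, bool b, bool c) { return a && b && c; }
```

A gate implementation follows directly: D is one 3-input OR; F is one 3-input AND; E is three 3-input ANDs (each using one inverter) feeding a 3-input OR.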
Let's build an ALU to support the andi and ori instructions
we'll just build a 1 bit ALU, and use 32 of them
Possible Implementation (sum-of-products):
[Diagram: sum-of-products gate implementation with inputs a, b and an operation select producing result; truth table with columns op, a, b, res]
An ALU (arithmetic logic unit)
Appendix: C.5
Selects one of the inputs to be the output, based on a control input
Let's build our ALU using a MUX:
[Diagram: two-input multiplexor: select S chooses input A or B as output C]
Review: The Multiplexor
note: we call this a 2-input mux even though it has 3 inputs!
Not easy to decide the best way to build something
Don't want too many inputs to a single gate
Don't want to have to go through too many gates
for our purposes, ease of comprehension is important
Let's look at a 1-bit ALU for addition:
How could we build a 1-bit ALU for add, and, and or?
How could we build a 32-bit ALU?
Different Implementations
cout = a b + a cin + b cin
sum = a xor b xor cin
[Diagram: 1-bit full adder with inputs a, b, CarryIn and outputs Sum, CarryOut]
Building a 32 bit ALU
[Diagram: 32-bit ALU built from 32 one-bit ALUs (ALU0 ... ALU31); each stage takes a_i, b_i, and a CarryIn and produces Result_i and a CarryOut; the CarryOut of each stage ripples into the CarryIn of the next, and a shared Operation select controls all stages]
R0 = a & b;
R1 = a | b;
R2 = a + b;
case (Op) {
  0: R = R0;
  1: R = R1;
  2: R = R2;
}

which is equivalent to:

case (Op) {
  0: R = a & b;
  1: R = a | b;
  2: R = a + b;
}
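The case statement above can be modeled directly in C: compute every candidate result and let the operation code select one, just as the mux does. Applied to 32-bit words, one call models all 32 one-bit ALUs at once:

```c
#include <stdint.h>

/* Mux-style ALU: Op encoding (0 = AND, 1 = OR, 2 = ADD) matches the
   case statement above. */
uint32_t alu(unsigned op, uint32_t a, uint32_t b) {
    uint32_t r0 = a & b;          /* AND */
    uint32_t r1 = a | b;          /* OR  */
    uint32_t r2 = a + b;          /* ADD, wraps modulo 2^32 */
    switch (op) {
        case 0:  return r0;
        case 1:  return r1;
        case 2:  return r2;
        default: return 0;        /* unused op codes */
    }
}
```

Note the hardware really does compute all three candidates every cycle; the mux merely discards two of them.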
Two's complement approach: just negate b and add.
How do we negate?
A very clever solution:
Result = A + (~B) + 1
What about subtraction (a - b)?
[Diagram: 1-bit ALU with a Binvert control: a mux selects b or its complement before the adder; Operation selects AND (0), OR (1), or ADD (2)]
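The "clever solution" is easy to check in C:

```c
#include <stdint.h>

/* Subtract by adding the bitwise complement of b plus 1, i.e. the
   two's complement negation of b. */
uint32_t subtract(uint32_t a, uint32_t b) {
    return a + ~b + 1u;    /* same bit pattern as a - b, modulo 2^32 */
}
```

In the hardware, Binvert supplies ~b and setting the low-order CarryIn to 1 supplies the "+ 1", so no extra adder is needed.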
Need to support the set-on-less-than instruction (slt)
  Remember: slt is an arithmetic instruction
  Produces a 1 if rs < rt and 0 otherwise
  Use subtraction: (a - b) < 0 implies a < b
Need to support test for equality (beq $t5, $t6, $t7)
  Use subtraction: (a - b) = 0 implies a = b
Tailoring the ALU to the MIPS
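Both ideas reduce to reusing the subtractor, which can be sketched in C:

```c
#include <stdint.h>

/* slt looks at the sign of a - b; beq tests whether a - b is zero. */
int slt(int32_t a, int32_t b) {
    int64_t diff = (int64_t)a - (int64_t)b;  /* widened: cannot overflow */
    return diff < 0;                         /* 1 if a < b, else 0 */
}

int equal(int32_t a, int32_t b) {
    /* unsigned subtraction avoids signed-overflow undefined behavior */
    return (uint32_t)a - (uint32_t)b == 0u;
}
```

In the real ALU the raw sign bit of a - b is wrong when the subtraction overflows, which is why the Set signal must be corrected by overflow detection; the widened subtraction above sidesteps that issue.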
Supporting slt
Can we figure out the idea?
[Diagram a: 1-bit ALU extended with a Less input routed to a fourth mux input, selected by Operation = 3]
[Diagram b: the most-significant 1-bit ALU, which additionally produces a Set output (the sign bit of a - b) and includes overflow detection producing Overflow]
[Diagram: 32-bit ALU supporting slt: the Set output of ALU31 (the sign bit of a - b) is routed back to the Less input of ALU0; the Less inputs of ALU1..ALU31 are tied to 0; Binvert and the low-order CarryIn perform the subtraction; ALU31 also produces Overflow]
Test for equality
Notice control lines:
000 = and
001 = or
010 = add
110 = subtract
111 = slt
Note: Zero is a 1 when the result is zero!
[Diagram: final 32-bit ALU: a single Bnegate line drives both Binvert and the low-order CarryIn; all 32 Result bits are combined (NOR) to produce Zero; ALU31 supplies Set, routed to the Less input of ALU0, and Overflow]
Conclusion
We can build an ALU to support the MIPS instruction set
key idea: use multiplexor to select the output we want
we can efficiently perform subtraction using two's complement
we can replicate a 1-bit ALU to produce a 32-bit ALU
Important points about hardware:
  all of the gates are always working
  the speed of a gate is affected by the number of inputs to the gate
  the speed of a circuit is affected by the number of gates in series (on the critical path, or the deepest level of logic)
Our primary focus: comprehension; however,
  clever changes to organization can improve performance (similar to using better algorithms in software)
  we'll look at two examples, for addition and multiplication
Is a 32-bit ALU as fast as a 1-bit ALU?
Is there more than one way to do addition?
two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0
c2 = b1c1 + a1c1 + a1b1
c3 = b2c2 + a2c2 + a2b2
c4 = b3c3 + a3c3 + a3b3

Expanding the carry chains:
c2 = a1a0b0 + a1a0c0 + a1b0c0 + b1a0b0 + b1a0c0 + b1b0c0 + a1b1
c3 = ???  c4 = ???
Not feasible! Why?
cn has sum of 2^p terms for p = 0..n, i.e. 2^(n+1) - 1 terms
Problem: ripple carry adder is slow
Appendix: C.6
An approach in-between our two extremes
Motivation:
If we didn't know the value of carry-in, what could we do?
When would we always generate a carry? gi = ai bi
When would we propagate the carry? pi = ai + bi
Did we get rid of the ripple?
c1 = g0 + p0c0
c2 = g1 + p1c1
c3 = g2 + p2c2
c4 = g3 + p3c3
Expanding the carry chains: c2 = g1 + p1g0 + p1p0c0
Feasible! Why?
The carry chain does not disappear, but is much smaller:
Cn has n+1 terms
Carry-lookahead adder
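The flattened equations can be checked in C. The sketch below computes all four carries of a 4-bit CLA directly from c0 (two gate levels, no ripple) and, for easy testing, returns them packed into one word:

```c
/* One level of 4-bit carry lookahead: generate g_i = a_i b_i,
   propagate p_i = a_i + b_i, then every carry directly from c0.
   Returns the carries packed as bits: bit 0 = c0, ..., bit 4 = c4. */
unsigned cla4_carries(unsigned a, unsigned b, unsigned c0) {
    unsigned g[4], p[4], c[5];
    for (int i = 0; i < 4; i++) {
        g[i] = (a >> i) & (b >> i) & 1u;        /* generate  */
        p[i] = ((a >> i) | (b >> i)) & 1u;      /* propagate */
    }
    c[0] = c0 & 1u;
    c[1] = g[0] | (p[0] & c[0]);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c[0]);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c[0]);
    return c[0] | (c[1] << 1) | (c[2] << 2) | (c[3] << 3) | (c[4] << 4);
}
```

For example, adding 1111 + 0001 generates a carry out of every position, so the packed carries are 11110 (0x1E).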
How about constructing a 16-bit adder in CLA way?
Can't build a 16-bit adder this way... (too big)
Could use ripple carry of 4-bit CLA adders
Better: use the CLA principle again!
Use principle to build bigger adders
[Diagram: 16-bit adder built from four 4-bit CLA ALUs; each block passes its group propagate Pi and group generate Gi to a second-level carry-lookahead unit, which produces the block carries C1..C4]
P0 = p3 p2 p1 p0
P1 = p7 p6 p5 p4
P2 = p11 p10 p9 p8
P3 = p15 p14 p13 p12

G0 = g3 + p3g2 + p3p2g1 + p3p2p1g0
G1 = g7 + p7g6 + p7p6g5 + p7p6p5g4
G2 = g11 + p11g10 + p11p10g9 + p11p10p9g8
G3 = g15 + p15g14 + p15p14g13 + p15p14p13g12

C1 = G0 + (P0 c0)
C2 = G1 + (P1 G0) + (P1P0 c0)
C3 = G2 + (P2 G1) + (P2P1 G0) + (P2P1P0 c0)
C4 = G3 + (P3 G2) + (P3P2 G1) + (P3P2P1 G0) + (P3P2P1P0 c0)
Multiplication
Start with long-multiplication approach
    1000    (multiplicand)
  x 1001    (multiplier)
  ------
    1000
   0000
  0000
 1000
 -------
 1001000    (product)

Length of product is the sum of operand lengths.

3.3 Multiplication
Multiplication Hardware
[Figure: multiplication hardware; the product register starts at 0]
Optimized Multiplier
Perform steps in parallel: add/shift
One cycle per partial-product addition
That's OK, if the frequency of multiplications is low
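The add/shift loop the hardware performs can be sketched in C, one loop iteration per hardware step: examine the low multiplier bit, conditionally add the (shifted) multiplicand, then shift:

```c
#include <stdint.h>

/* Shift-and-add multiplication of two 32-bit operands. */
uint64_t multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t mcand   = multiplicand;  /* shifts left one bit per step */
    uint64_t product = 0;
    while (multiplier != 0) {
        if (multiplier & 1u)          /* low bit decides: add or not */
            product += mcand;
        mcand      <<= 1;             /* multiplicand goes left ...   */
        multiplier >>= 1;             /* ... multiplier goes right    */
    }
    return product;                   /* 64-bit product, as in HI/LO */
}
```

Because the partial sums only grow toward the final 64-bit product, no intermediate value overflows 64 bits.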
Faster Multiplier
Uses multiple adders
Cost/performance tradeoff
Can be pipelined
Several multiplications performed in parallel
MIPS Multiplication
Two 32-bit registers for product
HI: most-significant 32 bits
LO: least-significant 32 bits
Instructions
mult rs, rt / multu rs, rt
  64-bit product in HI/LO
mfhi rd / mflo rd
Move from HI/LO to rd
Can test HI value to see if product overflows 32 bits
mul rd, rs, rt
  Least-significant 32 bits of product -> rd
Division
Check for 0 divisor
Long division approach
  If divisor <= dividend bits: 1 bit in quotient, subtract
  Otherwise: 0 bit in quotient, bring down next dividend bit
Restoring division
  Do the subtract, and if remainder goes < 0, add divisor back
Signed division
  Divide using absolute values
  Adjust sign of quotient and remainder as required
          1001    (quotient)
       _______
1000 | 1001010    (dividend; divisor = 1000)
       -1000
          10
          101
          1010
         -1000
            10    (remainder)

n-bit operands yield (n+1)-bit quotient and n-bit remainder

3.4 Division
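The restoring long-division steps above can be sketched in C. Mirroring the MIPS HI/LO convention, the remainder is returned in the high 32 bits and the quotient in the low 32 bits; the caller must check for a zero divisor:

```c
#include <stdint.h>

/* Restoring division for unsigned 32-bit operands.
   Precondition: divisor != 0. */
uint64_t divide(uint32_t dividend, uint32_t divisor) {
    uint64_t rem = 0;
    uint32_t quo = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u); /* bring down next bit */
        quo <<= 1;
        if (rem >= divisor) {   /* subtraction succeeds: quotient bit 1 */
            rem -= divisor;
            quo |= 1u;
        }                       /* else restore: nothing was subtracted */
    }
    return (rem << 32) | quo;
}
```

divide(74, 8) is the slide's example 1001010 / 1000: quotient 9 (1001), remainder 2 (10).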
Division Hardware
[Figure: division hardware; the remainder register initially holds the dividend, and the divisor starts in the left half of the divisor register]
Optimized Divider
One cycle per partial-remainder subtraction
Looks a lot like a multiplier!
  Same hardware can be used for both
Faster Division
Can't use parallel hardware as in the multiplier
  Subtraction is conditional on sign of remainder
Faster dividers (e.g., SRT division) generate multiple quotient bits per step
Still require multiple steps
MIPS Division
Use HI/LO registers for result
HI: 32-bit remainder
LO: 32-bit quotient
Instructions
div rs, rt / divu rs, rt
No overflow or divide-by-0 checking
Software must perform checks if required
Use mfhi, mflo to access result
Floating Point
Representation for non-integral numbers
Including very small and very large numbers
Like scientific notation:
  -2.34 x 10^56      (normalized)
  +0.002 x 10^-4     (not normalized)
  +987.02 x 10^9     (not normalized)
In binary:
  +-1.xxxxxxx_2 x 2^yyyy
Types float and double in C

3.5 Floating Point
Floating Point Standard
Defined by IEEE Std 754-1985
Developed in response to divergence of representations
  Portability issues for scientific code
Now almost universally adopted
Two representations
  Single precision (32-bit)
  Double precision (64-bit)
IEEE Floating-Point Format
S: sign bit (0 = non-negative, 1 = negative)
Normalized significand: 1.0 <= |significand| < 2.0
  Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  Significand is the Fraction with the "1." restored
Exponent: excess representation: actual exponent + Bias
  Ensures exponent is unsigned
  Single: Bias = 127; Double: Bias = 1023

Field layout:
  S | Exponent | Fraction
  single: 1 | 8 bits  | 23 bits
  double: 1 | 11 bits | 52 bits

x = (-1)^S x (1 + Fraction) x 2^(Exponent - Bias)
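The formula can be applied directly to a single-precision bit pattern in C. The sketch below ignores the reserved exponents 0 and 255 (denormals, infinity, NaN) to stay short:

```c
#include <stdint.h>
#include <math.h>

/* Decode a normalized single-precision pattern:
   x = (-1)^S * (1 + Fraction) * 2^(Exponent - 127). */
double decode_single(uint32_t bits) {
    int      s    = (int)(bits >> 31);        /* 1-bit sign      */
    int      e    = (int)(bits >> 23) & 0xFF; /* 8-bit exponent  */
    uint32_t frac = bits & 0x7FFFFFu;         /* 23-bit fraction */
    double significand = 1.0 + frac / 8388608.0;  /* 1 + frac / 2^23 */
    return (s ? -1.0 : 1.0) * ldexp(significand, e - 127);
}
```

For instance, the pattern 0xBF400000 (sign 1, exponent 01111110, fraction 100...0) decodes to -0.75, matching the worked example later in the chapter.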
Single-Precision Range
Exponents 00000000 and 11111111 reserved
Smallest value
  Exponent: 00000001 => actual exponent = 1 - 127 = -126
  Fraction: 000...00 => significand = 1.0
  +-1.0 x 2^-126 ~ +-1.2 x 10^-38
Largest value
  Exponent: 11111110 => actual exponent = 254 - 127 = +127
  Fraction: 111...11 => significand ~ 2.0
  +-2.0 x 2^+127 ~ +-3.4 x 10^+38
Double-Precision Range
Exponents 00000000000 and 11111111111 reserved
Smallest value
  Exponent: 00000000001 => actual exponent = 1 - 1023 = -1022
  Fraction: 000...00 => significand = 1.0
  +-1.0 x 2^-1022 ~ +-2.2 x 10^-308
Largest value
  Exponent: 11111111110 => actual exponent = 2046 - 1023 = +1023
  Fraction: 111...11 => significand ~ 2.0
  +-2.0 x 2^+1023 ~ +-1.8 x 10^+308
Floating-Point Precision
Relative precision
  all fraction bits are significant
  Single: approx 2^-23
    Equivalent to 23 x log10(2) ~ 23 x 0.3 ~ 6 decimal digits of precision
  Double: approx 2^-52
    Equivalent to 52 x log10(2) ~ 52 x 0.3 ~ 16 decimal digits of precision
Floating-Point Example
Represent -0.75
  -0.75 = (-1)^1 x 1.1_2 x 2^-1
  S = 1
  Fraction = 1000...00_2
  Exponent = -1 + Bias
    Single: -1 + 127 = 126 = 01111110_2
    Double: -1 + 1023 = 1022 = 01111111110_2
  Single: 1 01111110 1000...00
  Double: 1 01111111110 1000...00
Floating-Point Example
What number is represented by the single-precision float 1 10000001 01000...00?
  S = 1
  Fraction = 01000...00_2
  Exponent = 10000001_2 = 129
  x = (-1)^1 x (1 + 0.01_2) x 2^(129 - 127)
    = (-1) x 1.25 x 2^2
    = -5.0
IEEE 754 encoding of floating-point numbers
Single precision        Double precision        Object represented
Exponent  Fraction      Exponent  Fraction
0         0             0         0             +-0
0         nonzero       0         nonzero       denormalized number
1-254     anything      1-2046    anything      floating-point number
255       0             2047      0             +-infinity
255       nonzero       2047      nonzero       NaN (Not a Number)
Floating-Point Addition
Consider a 4-digit decimal example: 9.999 x 10^1 + 1.610 x 10^-1
1. Align decimal points: shift the number with the smaller exponent
   9.999 x 10^1 + 0.016 x 10^1
2. Add significands
   9.999 x 10^1 + 0.016 x 10^1 = 10.015 x 10^1
3. Normalize result & check for over/underflow
   1.0015 x 10^2
4. Round and renormalize if necessary
   1.002 x 10^2
Floating-Point Addition
Now consider a 4-digit binary example: 1.000_2 x 2^-1 + (-1.110_2 x 2^-2)   (0.5 + -0.4375)
1. Align binary points: shift the number with the smaller exponent
   1.000_2 x 2^-1 + (-0.111_2 x 2^-1)
2. Add significands
   1.000_2 x 2^-1 + (-0.111_2 x 2^-1) = 0.001_2 x 2^-1
3. Normalize result & check for over/underflow
   1.000_2 x 2^-4, with no over/underflow
4. Round and renormalize if necessary
   1.000_2 x 2^-4 (no change) = 0.0625
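The four steps can be followed in C on a toy representation (a sketch, not IEEE 754): a value is sig * 2^(exp - 3), so sig in [8, 15] corresponds to the 1.xxx_2 form, and negative numbers use a negative sig. Step 4 is reduced to truncation by the integer divisions:

```c
/* Toy 4-bit-significand floating point for tracing the algorithm. */
typedef struct { int sig; int exp; } toyfp;

toyfp toyfp_add(toyfp a, toyfp b) {
    /* Step 1: align binary points (shift smaller-exponent number right) */
    while (a.exp < b.exp) { a.sig /= 2; a.exp++; }
    while (b.exp < a.exp) { b.sig /= 2; b.exp++; }
    toyfp r = { a.sig + b.sig, a.exp };            /* Step 2: add */
    /* Step 3: normalize so 8 <= |sig| <= 15, i.e. 1.xxx_2 form */
    while (r.sig >= 16 || r.sig <= -16) { r.sig /= 2; r.exp++; }
    while (r.sig != 0 && r.sig < 8 && r.sig > -8) { r.sig *= 2; r.exp--; }
    return r;                                      /* Step 4: truncated */
}
```

In this encoding 1.000_2 x 2^-1 is {8, -1} and -1.110_2 x 2^-2 is {-14, -2}; adding them yields {8, -4}, i.e. 1.000_2 x 2^-4 = 0.0625, matching the example above.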
FP Adder Hardware
Much more complex than integer adder
Doing it in one clock cycle would take too long
  Much longer than integer operations
Slower clock would penalize all instructions
FP adder usually takes several cycles
Can be pipelined
FP Adder Hardware
[Figure: FP adder hardware, organized into four steps corresponding to align, add, normalize, and round]
FP Arithmetic Hardware
FP multiplier is of similar complexity to FP adder
  But uses a multiplier for significands instead of an adder
FP arithmetic hardware usually does
  Addition, subtraction, multiplication, division, reciprocal, square root
  FP <-> integer conversion
Operations usually take several cycles
  Can be pipelined
FP Instructions in MIPS
FP hardware is coprocessor 1
  Adjunct processor that extends the ISA
Separate FP registers
  32 single-precision: $f0, $f1, ..., $f31
  Paired for double-precision: $f0/$f1, $f2/$f3, ...
  Release 2 of the MIPS ISA supports 32 x 64-bit FP regs
FP instructions operate only on FP registers
  Programs generally don't do integer ops on FP data, or vice versa
  More registers with minimal code-size impact
FP load and store instructions: lwc1, ldc1, swc1, sdc1
  e.g., ldc1 $f8, 32($sp)
FP Instructions in MIPS
Single-precision arithmetic: add.s, sub.s, mul.s, div.s
  e.g., add.s $f0, $f1, $f6
Double-precision arithmetic: add.d, sub.d, mul.d, div.d
  e.g., mul.d $f4, $f4, $f6
Single- and double-precision comparison: c.xx.s, c.xx.d (xx is eq, lt, le, ...)
  Sets or clears FP condition-code bit
  e.g., c.lt.s $f3, $f4
Branch on FP condition code true or false: bc1t, bc1f
  e.g., bc1t TargetLabel
FP Example: F to C
C code:
float f2c (float fahr) {
  return ((5.0/9.0)*(fahr - 32.0));
}

fahr in $f12, result in $f0, literals in global memory space
Compiled MIPS code:

f2c: lwc1  $f16, const5($gp)
     lwc1  $f18, const9($gp)
     div.s $f16, $f16, $f18
     lwc1  $f18, const32($gp)
     sub.s $f18, $f12, $f18
     mul.s $f0, $f16, $f18
     jr    $ra
FP Example: Array Multiplication
X = X + Y x Z
  All 32 x 32 matrices, 64-bit double-precision elements
C code:

void mm (double x[][32], double y[][32], double z[][32]) {
  int i, j, k;
  for (i = 0; i != 32; i = i + 1)
    for (j = 0; j != 32; j = j + 1)
      for (k = 0; k != 32; k = k + 1)
        x[i][j] = x[i][j] + y[i][k] * z[k][j];
}

Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2
FP Example: Array Multiplication
MIPS code:

      li   $t1, 32         # $t1 = 32 (row size/loop end)
      li   $s0, 0          # i = 0; initialize 1st for loop
L1:   li   $s1, 0          # j = 0; restart 2nd for loop
L2:   li   $s2, 0          # k = 0; restart 3rd for loop
      sll  $t2, $s0, 5     # $t2 = i * 32 (size of row of x)
      addu $t2, $t2, $s1   # $t2 = i * size(row) + j
      sll  $t2, $t2, 3     # $t2 = byte offset of [i][j]
      addu $t2, $a0, $t2   # $t2 = byte address of x[i][j]
      l.d  $f4, 0($t2)     # $f4 = 8 bytes of x[i][j]
L3:   sll  $t0, $s2, 5     # $t0 = k * 32 (size of row of z)
      addu $t0, $t0, $s1   # $t0 = k * size(row) + j
      sll  $t0, $t0, 3     # $t0 = byte offset of [k][j]
      addu $t0, $a2, $t0   # $t0 = byte address of z[k][j]
      l.d  $f16, 0($t0)    # $f16 = 8 bytes of z[k][j]
FP Example: Array Multiplication
      sll   $t0, $s0, 5       # $t0 = i * 32 (size of row of y)
      addu  $t0, $t0, $s2     # $t0 = i * size(row) + k
      sll   $t0, $t0, 3       # $t0 = byte offset of [i][k]
      addu  $t0, $a1, $t0     # $t0 = byte address of y[i][k]
      l.d   $f18, 0($t0)      # $f18 = 8 bytes of y[i][k]
      mul.d $f16, $f18, $f16  # $f16 = y[i][k] * z[k][j]
      add.d $f4, $f4, $f16    # $f4 = x[i][j] + y[i][k] * z[k][j]
      addiu $s2, $s2, 1       # k = k + 1
      bne   $s2, $t1, L3      # if (k != 32) go to L3
      s.d   $f4, 0($t2)       # x[i][j] = $f4
      addiu $s1, $s1, 1       # j = j + 1
      bne   $s1, $t1, L2      # if (j != 32) go to L2
      addiu $s0, $s0, 1       # i = i + 1
      bne   $s0, $t1, L1      # if (i != 32) go to L1
Interpretation of Data
Bits have no inherent meaning
  Interpretation depends on the instructions applied
Computer representations of numbers
  Finite range and precision
  Need to account for this in programs
The BIG Picture
Associativity
Parallel programs may interleave operations in unexpected orders
Assumptions of associativity may fail
3.6 Parallelism and Computer Arithmetic: Associativity
(x + y) + z vs. x + (y + z), in single precision:

  x = -1.50E+38, y = 1.50E+38, z = 1.0

  (x + y) + z = 0.00E+00 + 1.0      = 1.00E+00
  x + (y + z) = -1.50E+38 + 1.50E+38 = 0.00E+00
Need to validate parallel programs under varying degrees of parallelism
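The table above is easy to reproduce in C: with single-precision floats, 1.0 is absorbed when added to 1.5e38, so the grouping changes the answer.

```c
/* Same operands, two groupings: single-precision + is not associative. */
float sum_left(float x, float y, float z)  { return (x + y) + z; }
float sum_right(float x, float y, float z) { return x + (y + z); }
```

With x = -1.5e38f, y = 1.5e38f, z = 1.0f, sum_left gives 1.0 while sum_right gives 0.0 (y + z rounds back to 1.5e38, which then cancels x exactly).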
x86 FP Architecture
Originally based on 8087 FP coprocessor
  8 x 80-bit extended-precision registers
  Used as a push-down stack
  Registers indexed from TOS: ST(0), ST(1), ...
FP values are 32-bit or 64-bit in memory
  Converted on load/store of memory operand
  Integer operands can also be converted on load/store
Very difficult to generate and optimize code
  Result: poor FP performance
3.7 Real Stuff: Floating Point in the x86
x86 FP Instructions
Optional variations
  I: integer operand
  P: pop operand from stack
  R: reverse operand order
  But not all combinations allowed

Data transfer:    FILD mem/ST(i), FISTP mem/ST(i), FLDPI, FLD1, FLDZ
Arithmetic:       FIADDP mem/ST(i), FISUBRP mem/ST(i), FIMULP mem/ST(i), FIDIVRP mem/ST(i), FSQRT, FABS, FRNDINT
Compare:          FICOMP, FIUCOMP, FSTSW AX/mem
Transcendental:   FPATAN, F2XM1, FCOS, FPTAN, FPREM, FSIN, FYL2X
Streaming SIMD Extension 2 (SSE2)
Adds 4 x 128-bit registers
  Extended to 8 registers in AMD64/EM64T
Can be used for multiple FP operands
  2 x 64-bit double precision
  4 x 32-bit single precision
Instructions operate on them simultaneously
  Single-Instruction Multiple-Data (SIMD)
Right Shift and Division
Left shift by i places multiplies an integer by 2^i
Right shift divides by 2^i?
  Only for unsigned integers
For signed integers
  Arithmetic right shift: replicate the sign bit
  e.g., -5 / 4:
    11111011_2 >> 2 = 11111110_2 = -2
    Rounds toward -infinity
  c.f. logical right shift: 11111011_2 >>> 2 = 00111110_2 = +62
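The pitfall is visible in C. Note that >> on a negative signed value is implementation-defined in C; on the usual two's-complement targets it is an arithmetic shift, which is what this sketch assumes:

```c
#include <stdint.h>

/* Arithmetic vs. logical right shift on an 8-bit value. */
int8_t arith_shr(int8_t x, int n) {
    return (int8_t)(x >> n);      /* sign bit replicated (typically) */
}

uint8_t logical_shr(uint8_t x, int n) {
    return (uint8_t)(x >> n);     /* zeros shifted in */
}
```

arith_shr(-5, 2) yields -2, not the -1 that -5 / 4 truncates to in C, so a right shift is not a drop-in replacement for signed division.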
3.8 Fallacies and Pitfalls
Who Cares About FP Accuracy?
Important for scientific code
But for everyday consumer use?
My bank balance is out by 0.0002!
The Intel Pentium FDIV bug
The market expects accuracy
See Colwell, The Pentium Chronicles
3.9 Concluding Remarks
ISAs support arithmetic
Signed and unsigned integers
Floating-point approximation to reals
Bounded range and precision
Operations can overflow and underflow
MIPS ISA
Core instructions: 54 most frequently used
100% of SPECINT, 97% of SPECFP
Other instructions: less frequent