Spring 2013 1
Floating Point Computation
Jyun-Ming Chen
Spring 2013 2
Contents
• Sources of Computational Error
• Computer Representation of (floating-point) Numbers
• Efficiency Issues
Spring 2013 3
Sources of Computational Error
• Converting a mathematical problem into a numerical problem introduces errors due to limited computational resources:
– round-off error (limited precision of representation)
– truncation error (limited time for computation)
• Misc.:
– error in the original data
– blunder: a mistake made through stupidity, ignorance, or carelessness; a programming or data-input error
– propagated error
Spring 2013 4
Supplement: Error Classification (Hildebrand)
• Gross error: caused by human or mechanical mistakes
• Roundoff error: the consequence of using a number specified by n correct digits to approximate a number which requires more than n digits (generally infinitely many digits) for its exact specification.
• Truncation error: any error which is neither a gross error nor a roundoff error.
• Frequently, a truncation error corresponds to the fact that, whereas an exact result would be afforded (in the limit) by an infinite sequence of steps, the process is truncated after a certain finite number of steps.
Spring 2013 5
Common Measures of Error
• Definitions:
– total error = round-off error + truncation error
– absolute error = | numerical − exact |
– relative error = absolute error / | exact |
• If the exact value is zero, the relative error is not defined.
Spring 2013 6
Ex: Round off error
A representation consists of a finite number of digits, so the floating-point approximation of the real number line R is discrete!
Spring 2013 7
Watch out for printf !!
• By default, “%f” prints 6 digits after the decimal point.
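A minimal sketch of the point above (the variable and requested precision are illustrative only): the default "%f" output shows 6 digits after the decimal point regardless of how many of them are actually significant for a float.

#include <stdio.h>

int main(void)
{
    float x = 1.0f / 3.0f;     /* a float carries only ~7 significant decimal digits */

    printf("%f\n", x);         /* default: 6 digits after the decimal point          */
    printf("%.10f\n", x);      /* digits beyond the ~7th are not meaningful for 1/3  */
    return 0;
}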
Spring 2013 8
Ex: Numerical Differentiation
• Evaluating the first derivative of f(x) from the Taylor expansion:

$f(x+h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi)$

$f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2} f''(\xi)$

$f'(x) \approx \frac{f(x+h) - f(x)}{h}$ for small h

The dropped term $-\frac{h}{2} f''(\xi)$ is the truncation error.
Spring 2013 9
Numerical Differentiation (cont)
• Select a problem with a known answer, so that we can evaluate the error!

$f(x) = x^3, \quad f'(x) = 3x^2, \quad f'(10) = 300$
Spring 2013 10
Numerical Differentiation (cont)
• Error analysis: the truncation error decreases in proportion to h
• What happened at h = 0.00001?!
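A minimal sketch of this experiment (my own choice of step sizes, in double precision): the error first decreases in proportion to h (truncation), then grows again once round-off in f(x+h) − f(x) dominates. In double precision the turnaround shows up near h = 1e-8, the analogue of the h = 0.00001 anomaly seen on the slide.

#include <stdio.h>
#include <math.h>

static double f(double x) { return x * x * x; }    /* exact: f'(10) = 300 */

int main(void)
{
    double x = 10.0;

    for (double h = 1e-1; h >= 1e-12; h /= 10.0) {
        double approx = (f(x + h) - f(x)) / h;     /* forward difference  */
        printf("h = %.0e   error = %.3e\n", h, fabs(approx - 300.0));
    }
    return 0;
}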
Spring 2013 11
Ex: Polynomial Deflation
• F(x) is a polynomial with 20 real roots
• Use any method to numerically find one root, then deflate the polynomial to degree 19
• Solve another root, and deflate again, and again, …
• The accuracy of the roots obtained is getting worse each time due to error propagation
$f(x) = (x-1)(x-2)\cdots(x-20)$
Spring 2013 12
Computer Representation of Floating Point Numbers
Decimal-binary conversion
Floating point vs. fixed point
Standard: IEEE 754 (1985)
Spring 2013 13
Decimal-Binary Conversion
• Ex: 29 (base 10)
• An integer $N = a_n 2^n + \cdots + a_2 2^2 + a_1 2 + a_0$; each binary digit is a remainder: $a_0 = N \bmod 2$, $a_1 = (N \div 2) \bmod 2$, and so on.
• Repeatedly divide by 2 and record the remainders:

29 = 2·14 + 1
14 = 2·7  + 0
 7 = 2·3  + 1
 3 = 2·1  + 1
 1 = 2·0  + 1

Reading the remainders from last to first: 29₁₀ = 11101₂
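A minimal sketch of this procedure in C (function name and buffer handling are my own choices): collect the remainders from repeated division by 2, then print them in reverse order.

#include <stdio.h>

/* Print the binary digits of a non-negative integer by repeated division by 2.
   The remainders come out least-significant first, so collect them and print
   them in reverse. */
static void print_binary(unsigned int n)
{
    char digits[sizeof(n) * 8];
    int  k = 0;

    do {
        digits[k++] = (char)('0' + n % 2);   /* remainder = next binary digit */
        n /= 2;
    } while (n > 0);

    while (k > 0)
        putchar(digits[--k]);                /* most significant digit first  */
    putchar('\n');
}

int main(void)
{
    print_binary(29);   /* prints 11101 */
    return 0;
}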
Spring 2013 14
Fraction Binary Conversion
• Ex: 0.625 (base 10)
• A fraction $0.a_1a_2a_3\ldots$ (base 2) $= a_1 2^{-1} + a_2 2^{-2} + a_3 2^{-3} + \cdots$; each digit is obtained by multiplying the remaining fraction by 2 and taking the integer part:

0.625 × 2 = 1.250  →  $a_1 = 1$
0.250 × 2 = 0.500  →  $a_2 = 0$
0.500 × 2 = 1.000  →  $a_3 = 1$, and $a_4 = a_5 = \cdots = 0$
Spring 2013 15
• Computing: 0.625₁₀ = 0.101₂ (from the steps on the previous page)
• How about 0.1₁₀?
• 0.1₁₀ = 0.000110011…₂ (the block 0011 repeats forever, so 0.1 has no exact binary representation)
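A minimal sketch of the multiply-by-2 procedure (function name and digit counts are my own choices). Note that 0.1 itself is already stored inexactly as a double, so the printed digits only follow the true repeating pattern for the first few dozen bits.

#include <stdio.h>

/* Print the first `bits` binary digits of a fraction 0 <= x < 1 by repeatedly
   multiplying by 2 and peeling off the integer part. */
static void print_fraction_binary(double x, int bits)
{
    printf("0.");
    for (int i = 0; i < bits; i++) {
        x *= 2.0;
        int digit = (int)x;              /* integer part = next binary digit */
        putchar((char)('0' + digit));
        x -= digit;
    }
    putchar('\n');
}

int main(void)
{
    print_fraction_binary(0.625, 8);    /* 0.10100000                 */
    print_fraction_binary(0.1,   20);   /* 0.00011001100110011001...  */
    return 0;
}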
Spring 2013 16
Floating vs. Fixed Point
• Decimal, 6 digits (positive numbers)
– fixed point, with 5 digits after the decimal point: 0.00001, … , 9.99999
– floating point, with 2 digits for the (base-10) exponent and 4 digits for the mantissa (accuracy): 0.001×10⁰, … , 9.999×10⁹⁹
• Comparison:
– fixed point: fixed accuracy; simple math for computation (used in systems without an FPU)
– floating point: trades accuracy for a much larger range of representation
Spring 2013 17
Floating Point Representation
• A floating-point number has the form $\pm f \times \beta^e$
• Fraction (mantissa), f
– usually normalized so that $(0.1)_\beta \le f < 1$, i.e. $1/\beta \le f < 1$
• Base, $\beta$
– 2 for personal computers
– 16 for mainframes
– …
• Exponent, e
Spring 2013 18
IEEE 754-1985
• Purpose: make floating-point systems portable
• Defines the number representation, how calculations are performed, exceptions, …
• Single-precision (32-bit)
• Double-precision (64-bit)
Spring 2013 19
Number Representation
• S: sign of the mantissa
• Range (roughly)
– Single: 10⁻³⁸ to 10³⁸
– Double: 10⁻³⁰⁷ to 10³⁰⁷
• Precision (roughly)
– Single: 7-8 significant decimal digits
– Double: 15 significant decimal digits
• Where the double range comes from: the 11-bit exponent allows magnitudes up to about $2^{1024}$, and $\log_{10} 2^{1024} = 1024 \log_{10} 2 \approx 308.25$, so the range tops out near $10^{308}$
• The precision follows from the mantissa width p: the relative spacing of adjacent numbers is about $2^{1-p}$ (p = 24 for single, 53 for double)
Spring 2013 20
Significant Digits
• In binary sense, 24 bits are significant (with implicit one – next page)
• In decimal sense, roughly 7-8 decimal significant digits
• When you write your program, make sure the results you print carry the meaningful significant digits.
• The mantissa bit weights run from the implicit leading 1 down to 2⁻²³.
Spring 2013 21
Implicit One
• A normalized mantissa always begins with a leading 1 (it lies in [1.0, 2.0))
– only the fractional part is stored, which buys one extra bit of precision
• Ex: 3.5
$3.5_{10} = 11.1_2 = 1.11_2 \times 2^1$; only the fraction .11 is stored
Spring 2013 22
Exponent Bias
• Ex: in single precision, the exponent has 8 bits
– 0000 0000 (0) to 1111 1111 (255)
• Add an offset (bias) so that both positive and negative exponents can be represented
– effective exponent = biased exponent − bias
– bias value: 32-bit (127); 64-bit (1023)
– Ex, 32-bit: 1000 0000 (128) → effective exponent = 128 − 127 = 1
Spring 2013 23
Ex: Convert −3.5 to a 32-bit FP Number
$-3.5_{10} = -11.1_2 = -1.11_2 \times 2^1$
s = 1
e = 1 + 127 = 128 = 1000 0000₂
m = 1100 0000 0000 0000 0000 000₂ (the fraction .11, padded with zeros)
Bit pattern: 1 10000000 11000000000000000000000, i.e. 11000000 01100000 00000000 00000000
In memory on a little-endian PC the bytes appear reversed: 00000000 00000000 01100000 11000000
Spring 2013 24
Examine Bits of FP Numbers
• Explain how this program works
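The program itself is not reproduced on this slide; a minimal sketch of such a bit examiner, assuming float is the IEEE 754 32-bit format and unsigned int is 32 bits wide (as on typical PCs):

#include <stdio.h>
#include <string.h>

/* Print the bits of a float, grouped as sign | exponent | fraction. */
static void examine(float x)
{
    unsigned int bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret the 4 bytes as an integer   */

    printf("%12g = ", x);
    for (int i = 31; i >= 0; i--) {
        putchar('0' + (int)((bits >> i) & 1u));
        if (i == 31 || i == 23)            /* spaces between sign, exponent, fraction */
            putchar(' ');
    }
    putchar('\n');
}

int main(void)
{
    examine(3.5f);
    examine(-3.5f);
    examine(1.0f);
    return 0;
}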
Spring 2013 25
The “Examiner”
• Use the previous program to
– observe how ME works
– test subnormal behavior on your computer/compiler
– convince yourself why the subtraction of two nearly equal numbers produces lots of error
– explore NaN: Not-a-Number !?
Spring 2013 26
Design Philosophy of IEEE 754
• Layout: [ s | e | m ]
• S first: whether the number is positive or negative can be tested easily
• E before M: simplifies sorting
• Negative exponents are represented with a bias (not 2's complement) for ease of sorting
– [biased rep] −1, 0, 1 = 126, 127, 128
– [2's compl.] −1, 0, 1 = 0xFF, 0x00, 0x01, which would require more complicated logic for sorting and increment/decrement
Spring 2013 27
Exceptions
• Overflow
– ±INF: when a number exceeds the range of representation
• Underflow
– when numbers are too close to zero, they are treated as zero
• Dwarf
– the smallest representable number in the FP system
• Machine Epsilon (ME)
– a number with computational significance (more later)
Spring 2013 28
Extremities
• E = (1…1)
– M = (0…0): infinity
– M not all zeros: NaN (Not a Number)
• E = (0…0)
– M = (0…0): clean zero
– M not all zeros: dirty zero (see next page)
More later
Spring 2013 29
Not-a-Number
• Numerical exceptions
– square root of a negative number
– invalid domain of trigonometric functions
– …
• Often causes the program to stop running
Spring 2013 30
Extremities (32-bit)
• Max: s = 0, e = 1111 1110 (254), m all ones
$(1.111\ldots1)_2 \times 2^{254-127} = (10 - 0.00\ldots01)_2 \times 2^{127} \approx 2^{128} \approx 3.4 \times 10^{38}$
• Min (without stepping into dirty zero): s = 0, e = 0000 0001, m all zeros
$(1.000\ldots0)_2 \times 2^{1-127} = 2^{-126} \approx 1.2 \times 10^{-38}$
Spring 2013 31
Dirty-Zero (a.k.a. denormals)
• No “implicit one”
• IEEE 754 did not specify compatibility requirements for denormals
• If you are not sure how to handle them, stay away from them; scale your problem properly
– “Many problems can be solved by pretending they do not exist”
a.k.a.: also known as
Spring 2013 32
Dirty-Zero (cont)
00000000 10000000 00000000 00000000 = 2⁻¹²⁶ (smallest normalized number)
00000000 01000000 00000000 00000000 = 2⁻¹²⁷
00000000 00100000 00000000 00000000 = 2⁻¹²⁸
00000000 00010000 00000000 00000000 = 2⁻¹²⁹
(Dwarf: the smallest representable number)
[number line R: the denormals fill the gap between 0 and 2⁻¹²⁶; the dwarf is the smallest of them]
Spring 2013 33
Dwarf (32-bit)
Bit pattern: 00000000 00000000 00000000 00000001; value: 2⁻¹⁴⁹ (= 2⁻²³ × 2⁻¹²⁶)
Spring 2013 34
Machine Epsilon (ME)
• Definition
– the smallest non-zero number that makes a difference when added to 1.0 on your working platform
• This is not the same as the dwarf!
Spring 2013 35
Computing ME (32-bit)
Start from a candidate eps and keep halving it as long as 1 + eps is still distinguishable from 1.0.
ME = (00111111 10000000 00000000 00000001)₂ − 1.0
= 2⁻²³ ≈ 1.19 × 10⁻⁷
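A minimal sketch of this computation (the volatile qualifier is my own addition, to keep intermediates from being held in wider registers on x87 hardware, which can otherwise distort the result):

#include <stdio.h>

int main(void)
{
    volatile float one_plus_eps;   /* force the sum to be stored as a float   */
    float eps = 1.0f;

    for (;;) {
        one_plus_eps = 1.0f + eps / 2.0f;
        if (one_plus_eps == 1.0f)  /* eps/2 no longer makes a difference      */
            break;
        eps /= 2.0f;
    }

    printf("machine epsilon (float) = %e\n", eps);   /* about 1.19e-07 = 2^-23 */
    return 0;
}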
Spring 2013 36
Effect of ME
Spring 2013 37
Significance of ME
• Never terminate an iteration by testing whether two FP numbers are exactly equal.
• Instead, test whether |x − y| < ME (or a tolerance derived from it).
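A minimal sketch of this convergence test (Newton's iteration for √2 is my own example; DBL_EPSILON from <float.h> plays the role of ME for doubles):

#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void)
{
    double x = 1.0, prev;

    /* Newton iteration for sqrt(2).  Stop when successive iterates agree to
       within a small relative tolerance; testing prev == x might never hold. */
    do {
        prev = x;
        x = 0.5 * (x + 2.0 / x);
    } while (fabs(x - prev) > DBL_EPSILON * fabs(x));

    printf("sqrt(2) = %.17g\n", x);
    return 0;
}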
Machine Epsilon (Wikipedia)
Spring 2013 38
Machine epsilon gives an upper bound on the relative error due to rounding in floating point arithmetic.
Spring 2013 39
Numerical Scaling
• Number density: there are as many IEEE 754 numbers between [1.0, 2.0] as there are between [256, 512]
• Revisit:
– “round-off” error
– ME: a measure of the density of representable numbers near 1.0
• Implication:
– scale your problem so that intermediate results lie between 1.0 and 2.0 (where the numbers are dense and the round-off error is smallest)
Spring 2013 40
Scaling (cont)
• Performing computation on denser portions of the real line minimizes the round-off error
– but don't overdo it; switching to double precision is an easier way to increase precision
– the densest part is near the subnormals, if density is defined as numbers per unit length
Spring 2013 41
How Subtraction is Performed on Your PC
• Steps:
– convert both operands to base 2
– equalize the exponents by adjusting the mantissa values; truncate the bits that do not fit
– subtract the mantissas
– normalize the result
Spring 2013 42
Subtraction of Nearly Equal Numbers
• Base 10: 1.24446 – 1.24445
• Base 2: the two operands share the same leading mantissa bits, so the subtraction cancels them and only a few trailing bits (e.g. …11101110100011101…) survive
• Significant loss of accuracy (most of the remaining bits are unreliable)
Spring 2013 43
Theorem on Loss of Precision
• Let x and y be normalized floating-point machine numbers with x > y > 0.
• If $2^{-p} \le 1 - \frac{y}{x} \le 2^{-q}$, then at most p and at least q significant binary bits are lost in the subtraction x − y.
• Interpretation:
– “When two numbers are very close, their subtraction introduces a lot of numerical error.”
Spring 2013 44
Implications
• When you program, avoid writing:
$f(x) = \sqrt{x^2 + 1} - 1$
$g(x) = \ln(x) - 1$
• You should write these algebraically equivalent forms instead:
$f(x) = \left(\sqrt{x^2+1} - 1\right)\frac{\sqrt{x^2+1} + 1}{\sqrt{x^2+1} + 1} = \frac{x^2}{\sqrt{x^2+1} + 1}$
$g(x) = \ln(x) - \ln(e) = \ln\!\left(\frac{x}{e}\right)$
• Every FP operation introduces error, but the subtraction of nearly equal numbers is the worst and should be avoided whenever possible
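A minimal sketch contrasting the two forms of f(x) for a small argument (x = 1e-8 is my own choice): the naive form cancels to zero, while the rewritten form keeps full accuracy.

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = 1e-8;

    double f_naive = sqrt(x * x + 1.0) - 1.0;              /* nearly equal numbers subtracted */
    double f_safe  = (x * x) / (sqrt(x * x + 1.0) + 1.0);  /* algebraically equivalent form   */

    printf("naive: %.17g\n", f_naive);   /* 0: all significant digits lost   */
    printf("safe : %.17g\n", f_safe);    /* about 5e-17, close to x^2 / 2    */
    return 0;
}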
Spring 2013 45
Efficiency Issues
• Horner Scheme
• program examples
Spring 2013 46
Horner Scheme
• For polynomial evaluation
• Compare its efficiency with naive term-by-term evaluation (see the sketch below)
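A minimal sketch of Horner evaluation (coefficients and degree are illustrative): the polynomial is folded up from the highest coefficient with one multiply and one add per term, instead of recomputing powers of x.

#include <stdio.h>

/* Evaluate p(x) = c[0] + c[1]*x + ... + c[n]*x^n in Horner form:
   p = (...((c[n]*x + c[n-1])*x + c[n-2])*x + ...) + c[0]                      */
static double horner(const double c[], int n, double x)
{
    double p = c[n];
    for (int i = n - 1; i >= 0; i--)
        p = p * x + c[i];          /* one multiply and one add per coefficient */
    return p;
}

int main(void)
{
    double c[] = { 1.0, -2.0, 0.5, 3.0 };      /* 1 - 2x + 0.5x^2 + 3x^3 */
    printf("p(2) = %g\n", horner(c, 3, 2.0));  /* prints 23              */
    return 0;
}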
Spring 2013 47
Accuracy vs. Efficiency
Spring 2013 48
Good Coding Practice
Spring 2013 49
Storing Multidimensional Array in Linear Memory
• C and others: row-major order (the last index varies fastest in memory)
• Fortran, MATLAB: column-major order (the first index varies fastest)
Spring 2013 50
On Accessing Arrays …
• Which loop ordering is more efficient? (see the sketch below)
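A minimal sketch of the comparison (array size and type are illustrative): since C stores a[i][j] row by row, the version whose inner loop varies the last index touches consecutive memory locations and is typically faster on large arrays.

#include <stdio.h>

#define N 1000

static double a[N][N];

int main(void)
{
    double sum = 0.0;

    /* Row-major friendly: the inner loop walks consecutive elements of a row. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Swapping the loops (j outer, i inner) strides N doubles per access and
       tends to be much slower because of poor cache behavior.                 */
    printf("%f\n", sum);
    return 0;
}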
Spring 2013 51
Issues of PI
• 3.14 is often not accurate enough
– 4.0*atan(1.0) is a good substitute
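A minimal sketch of the comparison: atan(1.0) is π/4, so 4.0*atan(1.0) yields π to full double precision, while the literal 3.14 already differs from π in the third decimal place.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = 4.0 * atan(1.0);   /* pi to full double precision */

    printf("3.14            = %.15f\n", 3.14);
    printf("4.0 * atan(1.0) = %.15f\n", PI);
    return 0;
}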
Spring 2013 52
Compare:
Spring 2013 53
Exercise
• Explain why $\sum_{i=1}^{100{,}000} 0.1 \ne 10{,}000$ when the sum is computed in floating point.
• Explain why the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$ appears to converge when implemented numerically.
Spring 2013 54
Exercise
• Why does Me( ) not work as advertised?
• Construct the 64-bit version of everything
– Bit-Examiner
– Dme( );
• 32-bit: int and float are both 32 bits wide. Can every int be represented exactly by a float (if converted)?
Spring 2013 55
Understanding Your Platform
[sizeof() results for the basic types on a typical 32-bit platform: char 1, short 2, int 4, long 4, float 4, double 8]
Memory word: 4 bytes on 32-bit machines
Spring 2013 56
Padding
• How about structures that mix members of different sizes?
Spring 2013 57
Data Alignment (data structure padding)
• Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure.
• Alignment requirement: each type is typically aligned on a boundary equal to its own size (char 1, short 2, int 4, …)
Spring 2013 58
Ex: Padding
// padding inserted here so that Data2 aligns on a 2-byte boundary
// no padding required; already on a 4-byte boundary
// final padding to round the structure size up to a 4-byte boundary
sizeof (struct MixedData) = 12 bytes (a sketch of such a struct follows)
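The struct itself is not shown on the slide; a minimal sketch consistent with the comments above (the member names and types are assumptions):

#include <stdio.h>

struct MixedData {
    char  Data1;    /* 1 byte, then 1 byte of padding so Data2 starts on a 2-byte boundary */
    short Data2;    /* 2 bytes                                                             */
    int   Data3;    /* 4 bytes; no padding needed, already on a 4-byte boundary            */
    char  Data4;    /* 1 byte, then 3 bytes of final padding up to a 4-byte boundary       */
};

int main(void)
{
    printf("sizeof (struct MixedData) = %d bytes\n", (int) sizeof(struct MixedData));  /* 12 */
    return 0;
}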
Spring 2013 59
Data Alignment (cont)
• By changing the ordering of members in a structure, it is possible to change the amount of padding required to maintain alignment.
• Direct the compiler to ignore data alignment, i.e. align everything on a 1-byte boundary: #pragma pack(1)
• #pragma pack(push) / #pragma pack(pop): push the current alignment setting onto a stack and restore it afterwards
Spring 2013 60
#include <stdio.h>

struct pad1 { char data1; short data2; int data3; char data4; };

struct pad2 { int data3; short data2; char data1; char data4; };

#pragma pack(push)
#pragma pack(1)
struct pad3 { char data1; short data2; int data3; char data4; };
#pragma pack(pop)

int main(void)
{
    printf("pad1 size: %d\n", (int) sizeof(struct pad1));
    printf("pad2 size: %d\n", (int) sizeof(struct pad2));
    printf("pad3 size: %d\n", (int) sizeof(struct pad3));
    return 0;
}
Output (on a typical 32-bit platform):
pad1 size: 12
pad2 size: 8
pad3 size: 8