Victor Kuliamin
Institute for System Programming, Russian Academy of Sciences, Moscow
Astrophysics Geosciences Biosciences Social Sciences
Confidence?
PSI 2009, June 19
Floating-point numbers + arithmetic
Basic math functions
  Elementary: sqrt, pow, exp, log, sin, atan, cosh, …
  Special: erf, tgamma, j0, y1, …
Specialized libraries
  Linear algebra
  Number theory
  Numeric calculus
  Dynamic systems
  Optimization
  …
Normal: $0 < E < 2^k - 1$, $X = (-1)^S \cdot 2^{E-B} \cdot (1 + M/2^{n-k-1})$
Denormal: $E = 0$, $X = (-1)^S \cdot 2^{-B+1} \cdot (M/2^{n-k-1})$
Exceptional: $E = 2^k - 1$
  $M = 0$: +∞, –∞
  $M \ne 0$: NaN
[Figure: bit layout of an n-bit number with a k-bit exponent: sign S (bit 0), exponent E (bits 1 to k), mantissa M (bits k+1 to n-1)]
Bias: $B = 2^{k-1} - 1$
Example: $2^{-1} \cdot 1.101_2 = 13/16 = 0.8125$
Zeros: 0, -0
1/0 = +∞, (–1)/0 = –∞, 0/0 = NaN
Formats
  n = 32, k = 8: float (single precision)
  n = 64, k = 11: double
  n = 79, k = 15: extended double
  n = 128, k = 15: quadruple
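The field definitions above can be tried out with a minimal sketch. The helper names `decode` and `field_value` are hypothetical (not from the slides), and only the float and double layouts are covered, since those are the ones Python's `struct` module can pack directly.

```python
import math
import struct

# Assumed helper table: struct codes for the two layouts handled here.
# n = total bits, k = exponent bits, n - k - 1 = mantissa bits.
FORMATS = {(32, 8): ("<f", "<I"), (64, 11): ("<d", "<Q")}

def decode(x, n=64, k=11):
    """Split x into the integer fields (S, E, M) of the n-bit format."""
    float_fmt, int_fmt = FORMATS[(n, k)]
    bits = struct.unpack(int_fmt, struct.pack(float_fmt, x))[0]
    mbits = n - k - 1
    S = bits >> (n - 1)
    E = (bits >> mbits) & ((1 << k) - 1)
    M = bits & ((1 << mbits) - 1)
    return S, E, M

def field_value(S, E, M, n=64, k=11):
    """Rebuild the number from (S, E, M) using the slide's formulas."""
    mbits = n - k - 1
    B = 2 ** (k - 1) - 1              # bias
    sign = -1.0 if S else 1.0
    if E == 2 ** k - 1:               # exceptional: infinity or NaN
        return math.nan if M else sign * math.inf
    if E == 0:                        # denormal: 2^(-B+1) * M / 2^mbits
        return sign * math.ldexp(float(M), 1 - B - mbits)
    # normal: 2^(E-B) * (1 + M / 2^mbits)
    return sign * math.ldexp(float(2 ** mbits + M), E - B - mbits)
```

For the slide's example, `decode(0.8125, 32, 8)` gives exponent field 126 (that is, E - B = -1) and a mantissa starting with the bits 101.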
1 ulp = $1/2^{n-k-1}$
Correct rounding: 4 rounding modes
  to +∞
  to –∞
  to 0
  to the nearest
Exception flags
  INVALID: incorrect arguments (NaN result)
  DIVISION-BY-ZERO: infinite result (precise ±∞)
  OVERFLOW: too big result (approximate ±∞)
  UNDERFLOW: too small (or denormal) result
  INEXACT: inexact result
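To make the four rounding modes concrete, here is a small sketch that rounds an exact rational value to double in each mode. It assumes a positive value in the normal double range, and relies on CPython's integer true division being correctly rounded to nearest; the function names are illustrative only.

```python
import struct
from fractions import Fraction

def next_up(x):
    """Next double above a positive finite x (bit pattern + 1)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits + 1))[0]

def next_down(x):
    """Next double below a positive finite x (bit pattern - 1)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits - 1))[0]

def round_all_modes(exact):
    """Round a positive Fraction to double in the four modes."""
    nearest = exact.numerator / exact.denominator  # assumed round-to-nearest
    as_fraction = Fraction(nearest)
    if as_fraction == exact:          # exactly representable: modes agree
        down = up = nearest
    elif as_fraction < exact:
        down, up = nearest, next_up(nearest)
    else:
        down, up = next_down(nearest), nearest
    # For positive values, rounding to 0 coincides with rounding to -inf.
    return {"nearest": nearest, "to_zero": down,
            "to_neg_inf": down, "to_pos_inf": up}
```

For `Fraction(1, 10)` the two directed results differ by exactly 1 ulp of that binade, while an exactly representable value such as 262144.25 gives the same result in all modes.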
ID       Processor arch   Library            OS
x86      i686             glibc 2.5          Linux Fedora
ia64     ia64             glibc 2.4          Linux Debian
x86_64   x86_64           glibc 2.3.4        Linux RHEL
s390     s390             glibc 2.4          Linux Debian
ppc64    ppc64            glibc 2.7          Linux Debian
ppc32    ppc32            glibc 2.3.5        Linux SLES
sparc    UltraSparc III   Solaris libc       Solaris 10
VC8      x86_64           MS Visual C 2005   Windows XP
VC6      i686             MS Visual C 6.0    Windows XP
[Figure: test results for each function on each platform (x86, ia64, x86_64, s390, ppc64, ppc32, sparc, VC6, VC8) in four rounding modes (to nearest, to –∞, to +∞, to 0). Functions: ceil, floor, round, trunc, rint, nearbyint, fabs, logb, sqrt, cbrt, exp, exp2, expm1, log, log10, log2, log1p, sinh, cosh, tanh, asinh, acosh, atanh, sin, cos, tan, asin, acos, atan, erf, erfc, tgamma, lgamma, j0, j1, y0, y1]

Legend
  Exact
  1 ulp errors*
  2-5 ulp errors
  6-$2^{10}$ ulp errors
  $2^{10}$-$2^{20}$ ulp errors
  >$2^{20}$ ulp errors
  Errors in exceptional cases
  Errors for denormals
  Completely buggy
  Unsupported

Examples of detected errors
  rint(262144.25)↑ = 262144
  logb($2^{-1074}$) = −1022
  expm1(2.2250738585072e−308) = 5.421010862427522e−20
  exp(−6.453852113757105e−02) = 2.255531908873594e+15
  sinh(29.22104351584205) = −1.139998423128585e+12
  cosh(627.9957549410666) = −1.453242606709252e+272
  sin(33.63133354799544) = 7.99995094799809616e+22
  sin(−1.793463141525662e−76) = 9.801714032956058e−2
  acos(−1.0) = −3.141592653589794
  cos(917.2279304172412) = −13.44757421002838
  erf(3.296656889776298) = 8.035526204864467e+8
  erfc(−5.179813474865007) = −3.419501182737284e+287
  exp(553.8042397037792) = −1.710893968937284e+239
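Several of the reported results can be rejected without knowing the correct value, because they fall outside the mathematical range of the function. The sketch below shows such a range check; the `RANGES` table and function names are illustrative assumptions, and the `REPORTED` triples are copied from the examples on the slide.

```python
import math

# Mathematical ranges of a few functions (closed intervals, an assumed table).
RANGES = {
    "sin": (-1.0, 1.0),
    "cos": (-1.0, 1.0),
    "acos": (0.0, math.pi),
    "cosh": (1.0, math.inf),
    "exp": (0.0, math.inf),
    "erf": (-1.0, 1.0),
    "erfc": (0.0, 2.0),
}

# (function, argument, value returned by a tested library), from the slide.
REPORTED = [
    ("sin", 33.63133354799544, 7.99995094799809616e+22),
    ("acos", -1.0, -3.141592653589794),
    ("cos", 917.2279304172412, -13.44757421002838),
    ("cosh", 627.9957549410666, -1.453242606709252e+272),
    ("erf", 3.296656889776298, 8.035526204864467e+8),
    ("erfc", -5.179813474865007, -3.419501182737284e+287),
    ("exp", 553.8042397037792, -1.710893968937284e+239),
]

def in_range(name, y):
    """True if y lies inside the mathematical range of the function."""
    lo, hi = RANGES[name]
    return lo <= y <= hi

def range_violations(results):
    """Keep the (name, x, y) triples whose result is out of range."""
    return [(name, x, y) for name, x, y in results if not in_range(name, y)]
```

A range check is only a necessary condition: a result such as exp(−6.453852113757105e−02) = 2.255531908873594e+15 stays inside the range of exp and needs a comparison against a reference value.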
[Figure: support matrix of functions by platform (x86, ia64, x86_64, s390, ppc64, ppc32, sparc, VC6, VC8); marked cells: Unsupported]
  First panel: ceil, floor, round, trunc, rint, nearbyint, fabs, logb, sqrt, cbrt, exp, exp2, expm1, log, log10, log2, log1p
  Second panel: sinh, cosh, tanh, asinh, acosh, atanh, sin, cos, tan, asin, acos, atan, erf, erfc, tgamma, lgamma, j0, j1, y0, y1
Standards
  IEEE 754 (Floating-point arithmetic)
    FP numbers, basic operations
  ISO 9899 (C language and libraries)
    56 real + 16 complex functions
  IEEE 1003.1 (POSIX)
    63 real + 22 complex functions
  ISO 10967.1-3 (Language independent arithmetic)
    Elementary real and complex functions
IEEE 754 operations: type conversions, +, –, *, /, sqrt, remainder, fma (2008)
Correctly rounded results
  4 rounding modes
Infinite results in overflow and precise infinity cases
  In overflow, rounding to 0 returns the biggest finite number
NaN results outside of function domain (and for NaN args)
Exception flags: INVALID, DIVISION-BY-ZERO, OVERFLOW, UNDERFLOW, INEXACT
ISO/IEC 9899 (C language): 54 real functions
  Exact values: sin(0) = 0, log(1) = 0, …
  DIVISION-BY-ZERO flag: log(0), atanh(1), pow(0,x), Γ(-n)
  NaN results and INVALID flag outside of domains
IEEE 1003.1 (POSIX): 63 real + 22 complex
  All IEEE 754 flags (except for INEXACT) for real functions
  errno setting
    Domain error ~ INVALID or DIVISION-BY-ZERO
    Range error ~ OVERFLOW or UNDERFLOW
  If x is denormal, f(x) = x for each f with f(x) ~ x near 0 (sin, asin, sinh, expm1, …)
  In overflow HUGE_VAL should be returned (value of HUGE_VAL unspecified)
Callouts
  Inconsistency with rounding modes
  Source of non-interoperability
    glibc: +∞
    MSVCRT: max double (1.797693134862316e+308)
    Solaris libc: max float (3.402823466385289e+38)
ISO 10967 (Language independent arithmetic)
  Real and complex elementary functions (no erf, gamma, j0, y1, …)
  Only symmetric rounding modes (no rounding to +∞ or to –∞)
  Preservation of sign
  Preservation of monotonicity
  Inaccuracy 0.5-2.0 ulp (for sin, cos, tan: small arguments only)
  Evenness and oddness
  Exact values: cosh(0) = 1, log(1) = 0, …
  Asymptotics near 0: cos(x) ~ 1, sin(x) ~ x, …
  Relations: expm1 <= exp, cosh >= sinh, atan <= ↓(π/2), …
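Properties such as oddness, evenness, monotonicity and relations between functions can be checked on sample points without reference values. The following is a minimal sketch with hypothetical helper names; each helper takes any Python callable and returns the points where the property fails.

```python
def oddness_violations(f, xs):
    """Points x where f(-x) != -f(x)."""
    return [x for x in xs if f(-x) != -f(x)]

def evenness_violations(f, xs):
    """Points x where f(-x) != f(x)."""
    return [x for x in xs if f(-x) != f(x)]

def monotonicity_violations(f, xs, increasing=True):
    """Neighbouring sample pairs (a, b), a < b, that break monotonicity."""
    points = sorted(xs)
    bad = []
    for a, b in zip(points, points[1:]):
        fa, fb = f(a), f(b)
        if (fa > fb) if increasing else (fa < fb):
            bad.append((a, b))
    return bad

def relation_violations(f, g, xs):
    """Points x where the relation f(x) <= g(x) fails."""
    return [x for x in xs if not f(x) <= g(x)]
```

With `import math`, a call such as `relation_violations(math.expm1, math.exp, xs)` checks the relation expm1 <= exp, and `oddness_violations(math.sinh, xs)` checks oddness of sinh; the outcome depends on the underlying math library.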
Domain boundaries and poles (+ flags)
Exact values, limits and asymptotics
Preservation of sign and monotonicity
Symmetries
  Evenness, periodicity, others: Γ(1+x) = x·Γ(x)
Relations and range boundaries
Precision
  Correct rounding (according to mode)
    Computational accuracy
    Interoperability and portability of libraries and applications
    Feasible: ~ia64 (Intel), crlibm (INRIA)
[Figure labels]
  Correct rounding
  Oddness (symmetry with –x, 1/x)
  Range boundaries
  POSIX: f(x) = x for denormal x and f(x) ~ x near 0
  POSIX: HUGE_VAL instead of +∞
Extension of IEEE 754 to all library functions
  Correctly rounded results for 4 modes
    Except for ones contradicting range boundaries
  Infinite results in overflow and precise infinity cases
    In overflow, rounding to 0 returns the biggest finite number
  NaN results outside of function domain (and for NaN args)
  Exception flags
    INVALID (and EDOM for errno): incorrect arguments
    DIVISION-BY-ZERO (and ERANGE for errno): infinite result
    OVERFLOW (and ERANGE for errno): too big result
    UNDERFLOW (and ERANGE for errno): too small result (+ denormal results)
    INEXACT: inexact result
Bit structure of FP numbers
  Boundaries
    o 0, -0, +∞, -∞, NaN
    o Least and greatest positive and negative, normal and denormal
  Mantissa patterns
    FFFFFFFFFFFFF_16, FFFFF11110000_16, 555550000FFFF_16
  Both arguments and values of a function
Intervals of uniform function behavior
Points hard to compute correctly rounded result
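Test arguments with a given bit structure can be assembled directly from their fields. This is a small sketch for doubles; `make_double` and `pattern_points` are hypothetical names, and the three mantissa patterns are the ones listed on the slide.

```python
import struct

# 52-bit mantissa patterns from the slide (hexadecimal).
MANTISSA_PATTERNS = [0xFFFFFFFFFFFFF, 0xFFFFF11110000, 0x555550000FFFF]

def make_double(sign, exponent, mantissa):
    """Assemble a double from sign (1 bit), exponent (11), mantissa (52)."""
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def pattern_points(exponents):
    """All doubles with the listed mantissa patterns for the given
    exponent field values, in both signs."""
    return [make_double(s, e, m)
            for s in (0, 1)
            for e in exponents
            for m in MANTISSA_PATTERNS]

# Boundary values: least denormal, least normal, greatest finite double.
BOUNDARIES = [make_double(0, 0, 1),
              make_double(0, 1, 0),
              make_double(0, 2046, 0xFFFFFFFFFFFFF)]
```

For example, `pattern_points(range(1, 2047))` would enumerate the three patterns over every normal exponent in both signs.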
Example: rint(262144.25)↑ = 262144
  Argument bits:   0 10000010001 0000000000000000000100000000000000000000000000000000
  Failing pattern: x 10000010001 xxxxxxxxxxxxxxxxxx0100000000000000000000000000000000
Neighbourhoods of 0, ±∞
Poles and overflow points
Zeroes and extremes
Tangents and asymptotics: horizontal and diagonal
$\tan(1.1101111111111111111111111111111111111111111100011111_2 \cdot 2^{-22}) = 1.1110000000000000000000000000000000000000000101010001\,0\,1^{78}\,010\ldots_2 \cdot 2^{-22}$
$\sin(1.1110000000000000000000000000000000000000011100001000_2 \cdot 2^{-19}) = 1.1101111111111111111111111111111111111100000010111000\,0^{67}\,11101\ldots_2 \cdot 2^{-19}$
$j_1(1.1000000000000000000000000000000000000000000000000011_2 \cdot 2^{-23}) = 1.0111111111111111111111111111111111111111111111101000\,0^{94}\,11001\ldots_2 \cdot 2^{-22}$
Rounding to the nearest (exact value close to 0.5 ulp)
  f = x.xxxxxxxxxx|011111111...1xx...
  f = x.xxxxxxxxxx|100000000...0xx...
Rounding to 0, +∞, -∞
  f = x.xxxxxxxxxx|00000000...0xx...
  f = x.xxxxxxxxxx|11111111...1xx...
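The bit patterns above can be measured for an exactly known value. The sketch below takes a positive rational, extracts the bits that follow the p-bit mantissa, and counts the leading run of equal bits for directed rounding (000… or 111…) and the run after the first bit for rounding to the nearest (0111… or 1000…). The function names and the `extra` parameter are assumptions made for this illustration.

```python
from fractions import Fraction

def tail_bits(value, p=53, extra=64):
    """The `extra` bits that follow the first p significant bits of a
    positive Fraction, as a string of '0' and '1'."""
    num, den = value.numerator, value.denominator
    e = num.bit_length() - den.bit_length()   # 2**e <= value, after fix-up
    if Fraction(2) ** e > value:
        e -= 1
    shift = p - 1 + extra - e
    if shift >= 0:
        t = (num << shift) // den             # floor(value * 2**shift)
    else:
        t = num // (den << -shift)
    return format(t & ((1 << extra) - 1), "0{}b".format(extra))

def leading_run(bits):
    """Length of the run of identical bits at the start of the string."""
    return len(bits) - len(bits.lstrip(bits[0])) if bits else 0

def hardness(value, p=53, extra=64):
    """Run lengths relevant to directed and to-nearest rounding."""
    tail = tail_bits(value, p, extra)
    directed = leading_run(tail)
    nearest = leading_run(tail[1:]) if tail[1] != tail[0] else 0
    return {"directed": directed, "nearest": nearest}
```

The value passed in must be exact, for instance a rational enclosure produced by a multiprecision library; a double approximation of f(x) carries no information about the bits beyond the mantissa.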
Probabilistic evaluation
  Uniform independent bits distribution
  Total $N = 2^{n-k-1}$ values
  ~$N \cdot 2^{-m}$ have m consecutive equal bits

Real data for sin on exponent -16

Bits   Eval.   To 0, +∞, -∞   To nearest
54     0.5     0              1
53     1       1              2
52     2       4              4
51     4       6              6
50     8       10             12
49     16      19             21
48     32      32             37
47     64      70             67
46     128     142            106
45     256     280            239
44     512     547            518
43     1024    1073           996
42     2048    2103           1985
41     4096    4187           4040
40     8192    8325           8142
Exhaustive search (feasible only for single precision numbers)
Continued fractions (Kahan, 1983)
  $\pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \dots}}}}$
Dyadic method (Tang, 1989; Kahan, 1994)
Reduced search (Lefevre, 1997)
Lattice reduction (Gonnet, 2002; Stehle, Lefevre, Zimmermann, 2003)
Integer secants method (2007)
$X \approx N\pi$; $X = M \cdot 2^m$; $2^{n-k-1} \le M < 2^{n-k}$
$\pi \approx (2^m \cdot M)/N$
$3386417804515981120643892082331156599120239393299838035242121518428537554064774221620930267583474709602068045686026362989271814411863708499869721322715946622634302011697632972907922558892710830616034038541342154669787134871905353772776431251615694251273653 \cdot \pi/2 = 1.0110101011000101101100100110001011001010000111111110\,1^{857}\,011\ldots_2 \cdot 2^{849}$
$\sin(1.0110101011000101101100100110001011001010000111111111_2 \cdot 2^{849}) = 1.11111111111111111111111111111111111111111111111111\,1^{69}\,0110\ldots_2 \cdot 2^{-1}$
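Rational approximations of π with small denominators come from the convergents of its continued fraction. The sketch below computes partial quotients and convergents of any rational; the function names are illustrative, and the usage starts from `Fraction(math.pi)`, which is only the double nearest to π. Finding doubles as close to a multiple of π/2 as the example above would need π to many hundreds of bits, which this sketch does not provide.

```python
import math
from fractions import Fraction

def cf_terms(x, count):
    """First `count` partial quotients of the continued fraction of a
    non-negative Fraction x."""
    terms = []
    for _ in range(count):
        a = x.numerator // x.denominator
        terms.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x
    return terms

def convergents(terms):
    """Convergents h/k built with the standard recurrence
    h_i = a_i*h_(i-1) + h_(i-2), k_i = a_i*k_(i-1) + k_(i-2)."""
    h_prev, h = 1, terms[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in terms[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

terms = cf_terms(Fraction(math.pi), 5)
approximations = convergents(terms)
```

Here `terms` starts 3, 7, 15, 1, 292, and `approximations` contains 22/7, 333/106 and 355/113.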
$\sqrt{N \cdot 2^m} \approx M + \tfrac{1}{2}$; $2^{n-k-1} \le M, N < 2^{n-k}$
$2^{m+2} \cdot N = (2M + 1)^2 - j$
$(2M + 1)^2 \equiv j \pmod{2^{m+2}}$
$j = 15$
$\sqrt{1.0010010101100101011001011100101011011100101111110100_2} = 1.0001001000001111100110011001111010011001001101110100\,0\,1^{50}\,000\ldots_2$
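The integer relation for sqrt can be evaluated exactly with big integers. The sketch below computes M and j for a given mantissa N and scale m, following $2^{m+2} N = (2M+1)^2 - j$; a small |j| means that the square root lies very close to the midpoint M + 1/2. The name `midpoint_gap` is hypothetical, and the choice m = 52 for a double argument in [1, 2) is an assumption of this example.

```python
import math

def midpoint_gap(N, m):
    """Return (M, j) with M = floor(sqrt(N * 2**m)) and
    j = (2*M + 1)**2 - N * 2**(m + 2).
    j > 0: the square root is below the midpoint M + 1/2;
    j < 0: it is above."""
    M = math.isqrt(N << m)
    j = (2 * M + 1) ** 2 - (N << (m + 2))
    return M, j

# Argument from the slide: 53-bit mantissa with the leading 1 included.
ARG_BITS = "1" + "0010010101100101011001011100101011011100101111110100"
N_EXAMPLE = int(ARG_BITS, 2)
M_EXAMPLE, J_EXAMPLE = midpoint_gap(N_EXAMPLE, 52)
```

As a toy check, `midpoint_gap(3, 2)` looks at sqrt(12) ≈ 3.464 and returns M = 3, j = 1, since 3.5² − 12 = 1/4 = j/4. `J_EXAMPLE` can be compared with the j value quoted on the slide.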
$F(x) = f(x) - a \cdot x - b = c_1 x^2 + c_2 x^3 + c_3 x^4 + \dots$
$F(x) = c_1 (G(x))^2$, $G(x) = x + d_1 x^2 + d_2 x^3 + \dots$
$G(x) = y \Leftrightarrow x = H(y)$, where H is the reversed series
$x_m = H(\sqrt{m/(c_1 2^z)})$, so that $f(x_m) - a \cdot x_m - b = m/2^z$
Hard points
  double
    o Some hard points with ≥ 48 additional bits can be found in crlibm tests
      http://lipforge.ens-lyon.fr/projects/crlibm
    o Calculated (some) hard points with ≥ 40 additional bits for sqrt, cbrt, sin, asin, cos, acos, tan, atan, sinh, asinh, cosh, tanh, atanh, exp, log, exp2, expm1, log1p, erf, erfc, j0, j1
  float (single precision)
    o All hard points with ≥ 17 additional bits for sqrt, cbrt, exp, sin, cos
  extended double
    o All with ≥ 53 additional bits for sqrt, some for sin, exp
Test suites developed
  double: all 37 single real variable POSIX functions
  Correct values calculated by Maple and MPFR
Test points     sqrt     exp      sin      atan     lgamma   j1
Boundary        20       20       20       20       20       20
Intervals       106      1622     3674     4242     11680    24538
Patterns        141009   138451   331744   155008   121502   109036
Hard points     170170   28587    62342    95512    0        29436
Other           84820    0        4616     0        229      5664
Total           396125   168680   402396   254782   133431   168694
No adequate standards for math libraries
  Several standards, sometimes inconsistent, highly incomplete
Correct rounding is needed for interoperability
Test suites are useful even without a standard
? Complete set of hard points for some function
? Multiple variable functions
Contact E-mail: [email protected] Web: www.ispras.ru/~kuliamin
Thank you! Questions?