Victor Kuliamin Institute for System Programming, Russian Academy of Sciences Moscow


Astrophysics Geosciences Biosciences Social Sciences

Confidence?


PSI 2009, June 19

Floating-point numbers and arithmetic
Basic math functions
  Elementary: sqrt, pow, exp, log, sin, atan, cosh, …
  Special: erf, tgamma, j0, y1, …
Specialized libraries: linear algebra, number theory, numeric calculus, dynamic systems, optimization, …

Floating-point number of n bits: sign S (bit 0), exponent E (bits 1 to k), mantissa M (bits k+1 to n-1)

Bias: $B = 2^{k-1} - 1$

Normal: $0 < E < 2^k - 1$, $X = (-1)^S \cdot 2^{E-B} \cdot (1 + M/2^{n-k-1})$
Denormal: $E = 0$, $X = (-1)^S \cdot 2^{-B+1} \cdot (M/2^{n-k-1})$
  0, -0
Exceptional: $E = 2^k - 1$
  M = 0: +∞, –∞
  M ≠ 0: NaN

Example: 0 01111110 10100000000000000000000 = $2^{-1} \cdot 1.101_2$ = 13/16 = 0.8125

1/0 = +∞, (–1)/0 = –∞, 0/0 = NaN

n = 32, k = 8: float (single precision)
n = 64, k = 11: double
n = 79, k = 15: extended double
n = 128, k = 15: quadruple
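The format definition above can be tried out with a small decoder. This is only a sketch: the function name `decode` and the choice to return exact `Fraction` values (and plain strings for the exceptional cases) are assumptions made for illustration, not part of the talk.

```python
from fractions import Fraction

def decode(bits: int, n: int, k: int):
    """Interpret an n-bit word as a binary FP number with a k-bit exponent."""
    m_bits = n - k - 1                       # width of the mantissa field
    s = bits >> (n - 1)                      # sign S
    e = (bits >> m_bits) & ((1 << k) - 1)    # exponent field E
    m = bits & ((1 << m_bits) - 1)           # mantissa field M
    bias = (1 << (k - 1)) - 1                # B = 2^(k-1) - 1
    sign = -1 if s else 1
    if e == (1 << k) - 1:                    # exceptional: infinities and NaN
        return "NaN" if m else ("-inf" if s else "+inf")
    if e == 0:                               # denormal (the sign of zero is lost here)
        return sign * Fraction(2) ** (1 - bias) * Fraction(m, 1 << m_bits)
    return sign * Fraction(2) ** (e - bias) * (1 + Fraction(m, 1 << m_bits))

# Single precision word 0 01111110 1010...0 written in hex
print(decode(0x3F500000, 32, 8))   # 13/16
```

Passing `n = 64, k = 11` decodes doubles in the same way, for example `decode(1, 64, 11)` gives the smallest positive denormal.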

1 ulp (unit in the last place): $1/2^{n-k-1}$

Correct rounding: 4 rounding modes
  to +∞
  to –∞
  to 0
  to the nearest

Exception flags
  INVALID: incorrect arguments (NaN result)
  DIVISION-BY-ZERO: infinite result (precise ±∞)
  OVERFLOW: too big result (approximate ±∞)
  UNDERFLOW: too small (or denormal) result
  INEXACT: inexact result
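To make the four rounding modes and the ulp concrete, here is a minimal sketch that rounds an exact rational value to a double in each mode and measures an error in ulps. The names `round_to_double` and `ulp_error` are hypothetical helpers. The sketch assumes Python 3.9 or later (for `math.nextafter` and `math.ulp`) and values well inside the normal double range.

```python
import math
from fractions import Fraction

def round_to_double(q: Fraction, mode: str) -> float:
    """Round the exact rational q to a double in the given mode."""
    x = float(q)                   # Fraction -> float rounds to the nearest double
    exact = Fraction(x)            # exact value of that double
    if mode == "nearest" or exact == q:
        return x
    if mode == "up":               # to +inf
        return x if exact > q else math.nextafter(x, math.inf)
    if mode == "down":             # to -inf
        return x if exact < q else math.nextafter(x, -math.inf)
    if mode == "zero":             # to 0
        return round_to_double(q, "down" if q > 0 else "up")
    raise ValueError(mode)

def ulp_error(computed: float, exact: Fraction) -> Fraction:
    """Distance from the exact result, in ulps of the computed double."""
    return abs(Fraction(computed) - exact) / Fraction(math.ulp(computed))

q = Fraction(1, 10)
for mode in ("nearest", "up", "down", "zero"):
    print(mode, repr(round_to_double(q, mode)))
```

A correctly rounded result in the nearest mode has an error of at most 0.5 ulp, and in the directed modes less than 1 ulp.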

ID      Processor arch  Library           OS
x86     i686            glibc 2.5         Linux Fedora
ia64    ia64            glibc 2.4         Linux Debian
x86_64  x86_64          glibc 2.3.4       Linux RHEL
s390    s390            glibc 2.4         Linux Debian
ppc64   ppc64           glibc 2.7         Linux Debian
ppc32   ppc32           glibc 2.3.5       Linux SLES
sparc   UltraSparc III  Solaris libc      Solaris 10
VC8     x86_64          MS Visual C 2005  Windows XP
VC6     i686            MS Visual C 6.0   Windows XP

[Figure: test results per function and platform]
Platforms: x86, ia64, x86_64, s390, ppc64, ppc32, sparc, VC6, VC8
Functions: ceil, floor, round, trunc, rint, nearbyint, fabs, logb, sqrt, cbrt, exp, exp2, expm1, log, log10, log2, log1p, sinh, cosh, tanh, asinh, acosh, atanh, sin, cos, tan, asin, acos, atan, erf, erfc, tgamma, lgamma, j0, j1, y0, y1

Legend
  Exact
  1 ulp errors*
  2-5 ulp errors
  6-$2^{10}$ ulp errors
  $2^{10}$-$2^{20}$ ulp errors
  >$2^{20}$ ulp errors
  Errors in exceptional cases
  Errors for denormals
  Completely buggy
  Unsupported

Example: rint(262144.25)↑ = 262144

logb($2^{-1074}$) = −1022
expm1(2.2250738585072e−308) = 5.421010862427522e−20
exp(−6.453852113757105e−02) = 2.255531908873594e+15
sinh(29.22104351584205) = −1.139998423128585e+12
cosh(627.9957549410666) = −1.453242606709252e+272
sin(33.63133354799544) = 7.99995094799809616e+22
sin(−1.793463141525662e−76) = 9.801714032956058e−2
acos(−1.0) = −3.141592653589794
cos(917.2279304172412) = −13.44757421002838
erf(3.296656889776298) = 8.035526204864467e+8
erfc(−5.179813474865007) = −3.419501182737284e+287
exp(553.8042397037792) = −1.710893968937284e+239

Rounding modes: to nearest, to –∞, to +∞, to 0
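Most of the results listed above break a simple range boundary (a sine above 1, a negative exponential). A check of that kind can be sketched as follows. The table `CASES` and the helper `range_violations` are illustrative names only. Python's `math` module is used as a stand-in for the C library under test, in the default rounding mode, so on a healthy platform the list of violations should be empty.

```python
import math

# (name, function, argument, predicate the result must satisfy)
CASES = [
    ("exp",  math.exp,  -6.453852113757105e-02, lambda y: 0.0 < y < 1.0),
    ("exp",  math.exp,  553.8042397037792,      lambda y: y > 0.0),
    ("sinh", math.sinh, 29.22104351584205,      lambda y: y > 0.0),
    ("cosh", math.cosh, 627.9957549410666,      lambda y: y >= 1.0),
    ("sin",  math.sin,  33.63133354799544,      lambda y: -1.0 <= y <= 1.0),
    ("cos",  math.cos,  917.2279304172412,      lambda y: -1.0 <= y <= 1.0),
    ("acos", math.acos, -1.0,                   lambda y: y >= 0.0),
    ("erf",  math.erf,  3.296656889776298,      lambda y: -1.0 <= y <= 1.0),
    ("erfc", math.erfc, -5.179813474865007,     lambda y: 0.0 <= y <= 2.0),
]

def range_violations(cases):
    """Return (name, argument, result) for every result outside its range."""
    bad = []
    for name, func, x, in_range in cases:
        y = func(x)
        if not in_range(y):
            bad.append((name, x, y))
    return bad

print(range_violations(CASES))
```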


[Figure: table of functions by platform]
Functions: ceil, floor, round, trunc, rint, nearbyint, fabs, logb, sqrt, cbrt, exp, exp2, expm1, log, log10, log2, log1p, sinh, cosh, tanh, asinh, acosh, atanh, sin, cos, tan, asin, acos, atan, erf, erfc, tgamma, lgamma, j0, j1, y0, y1
Platforms: x86, ia64, x86_64, s390, ppc64, ppc32, sparc, VC6, VC8
Legend: Unsupported

Standards
  IEEE 754 (floating-point arithmetic): FP numbers, basic operations
  ISO/IEC 9899 (C language and libraries): 56 real + 16 complex functions
  IEEE 1003.1 (POSIX): 63 real + 22 complex functions
  ISO/IEC 10967, parts 1-3 (language independent arithmetic): elementary real and complex functions

Operations: type conversions, +, –, *, /, sqrt, remainder, fma (2008)
Correctly rounded results, 4 rounding modes
Infinite results in overflow and precise infinity cases
  In overflow, rounding to 0 returns the biggest finite number
NaN results outside of function domain (and for NaN arguments)
Exception flags: INVALID, DIVISION-BY-ZERO, OVERFLOW, UNDERFLOW, INEXACT

ISO/IEC 9899 (C language): 54 real functions
  Exact values: sin(0) = 0, log(1) = 0, …
  DIVISION-BY-ZERO flag: log(0), atanh(1), pow(0,x), Γ(–n)
  NaN results and INVALID flag outside of domains

IEEE 1003.1 (POSIX): 63 real + 22 complex
  All IEEE 754 flags (except for INEXACT) for real functions
  errno setting
    Domain error ~ INVALID or DIVISION-BY-ZERO
    Range error ~ OVERFLOW or UNDERFLOW
  If x is denormal, f(x) = x for each f with f(x) ~ x at 0 (sin, asin, sinh, expm1, …)
    Inconsistency with rounding modes
  In overflow HUGE_VAL should be returned (value of HUGE_VAL unspecified)
    Source of non-interoperability
    glibc: +∞
    MSVCRT: max double (1.797693134862316e+308)
    Solaris libc: max float (3.402823466385289e+38)

Real and complex elementary functions (no erf, gamma, j0, y1, …)
Only symmetric rounding modes (no rounding to +∞ or to –∞)
Preservation of sign
Preservation of monotonicity
Inaccuracy 0.5-2.0 ulp (for sin, cos, tan: small arguments only)
Evenness and oddness
Exact values: cosh(0) = 1, log(1) = 0, …
Asymptotics near 0: cos(x) ~ 1, sin(x) ~ x, …
Relations: expm1 <= exp, cosh >= sinh, atan <= ↓(π/2), …
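Requirements of this kind (exact values, symmetries, relations, monotonicity) translate directly into executable checks. The following sketch shows one possible shape. The function `property_violations` and its property labels are illustrative assumptions, Python's `math` module stands in for the library under test, and `math.nextafter` needs Python 3.9 or later.

```python
import math

def property_violations(xs):
    """Return (property, x) pairs that fail for the sample points xs.

    xs should hold moderate positive doubles (no overflow in cosh or exp).
    """
    bad = []
    # Exact values
    if math.cosh(0.0) != 1.0:
        bad.append(("cosh(0) = 1", 0.0))
    if math.log(1.0) != 0.0:
        bad.append(("log(1) = 0", 1.0))
    # math.pi lies slightly below pi, so math.pi / 2 is pi/2 rounded down
    half_pi_down = math.pi / 2
    for x in xs:
        # Evenness and oddness
        if math.sin(-x) != -math.sin(x):
            bad.append(("sin is odd", x))
        if math.cosh(-x) != math.cosh(x):
            bad.append(("cosh is even", x))
        # Relations
        if math.expm1(x) > math.exp(x):
            bad.append(("expm1 <= exp", x))
        if math.cosh(x) < math.sinh(x):
            bad.append(("cosh >= sinh", x))
        if math.atan(x) > half_pi_down:
            bad.append(("atan <= pi/2 rounded down", x))
        # Monotonicity of exp between neighbouring doubles
        if math.exp(math.nextafter(x, math.inf)) < math.exp(x):
            bad.append(("exp is monotone", x))
    return bad

print(property_violations([0.5, 1.0, 2.0, 10.0]))
```

Whether the symmetry and monotonicity checks pass depends on the library, which is exactly what such a test is meant to reveal.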

Domain boundaries and poles (+ flags)
Exact values, limits and asymptotics
Preservation of sign and monotonicity
Symmetries
  Evenness, periodicity, others: Γ(1+x) = x·Γ(x)
Relations and range boundaries
Precision
  Correct rounding (according to mode)
    Computational accuracy
    Interoperability and portability of libraries and applications
    Feasible: ~ia64 (Intel), crlibm (INRIA)

Correct rounding
Oddness (symmetry with –x, 1/x)
Range boundaries
POSIX: f(x) = x for denormal x and f(x) ~ x at 0
POSIX: HUGE_VAL instead of +∞

Extension of IEEE 754 to all library functions
  Correctly rounded results for 4 modes
    Except for ones contradicting range boundaries
  Infinite results in overflow and precise infinity cases
    In overflow, rounding to 0 returns the biggest finite number
  NaN results outside of function domain (and for NaN arguments)
  Exception flags
    INVALID (and EDOM for errno): incorrect arguments
    DIVISION-BY-ZERO (and ERANGE for errno): infinite result
    OVERFLOW (and ERANGE for errno): too big result
    UNDERFLOW (and ERANGE for errno): too small result (+ denormal)
    INEXACT: inexact result

Bit structure of FP numbers
  Boundaries
    o 0, -0, +∞, -∞, NaN
    o Least and greatest positive and negative, normal and denormal
  Mantissa patterns
    $FFFFFFFFFFFFF_{16}$, $FFFFF11110000_{16}$, $555550000FFFF_{16}$
  Both arguments and values of a function
Intervals of uniform function behavior
Points hard to compute correctly rounded result

rint(262144.25)↑ = 262144
  262144.25 = 0 10000010001 0000000000000000000100000000000000000000000000000000
  pattern:    x 10000010001 xxxxxxxxxxxxxxxxxx0100000000000000000000000000000000
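Test inputs based on bit structure can be assembled directly from the sign, exponent and mantissa fields. The sketch below is one way to do this for doubles. The helper names and the idea of combining every pattern with a list of exponents are assumptions for illustration, and the three mantissa patterns are the hexadecimal examples given in the talk.

```python
import struct

def double_from_fields(sign: int, exponent: int, mantissa: int) -> float:
    """Assemble a double from its 1-bit sign, 11-bit exponent, 52-bit mantissa."""
    bits = (sign << 63) | (exponent << 52) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def bit_string(x: float) -> str:
    """Show a double as 'sign exponent mantissa' in binary."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = format(bits, "064b")
    return f"{s[0]} {s[1:12]} {s[12:]}"

PATTERNS = [0xFFFFFFFFFFFFF, 0xFFFFF11110000, 0x555550000FFFF]

def pattern_inputs(exponents):
    """Combine every mantissa pattern with every exponent and both signs."""
    return [double_from_fields(s, e, m)
            for e in exponents for m in PATTERNS for s in (0, 1)]

print(bit_string(262144.25))
print(pattern_inputs([1, 1023, 2046])[:3])
```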

Neighbourhoods of 0, ±∞
Poles and overflow points
Zeroes and extremes
Tangents and asymptotes: horizontal and diagonal

tan($1.1101111111111111111111111111111111111111111100011111_2 \cdot 2^{-22}$) = $1.1110000000000000000000000000000000000000000101010001\,0\,1^{78}\,010\ldots_2 \cdot 2^{-22}$

sin($1.1110000000000000000000000000000000000000011100001000_2 \cdot 2^{-19}$) = $1.1101111111111111111111111111111111111100000010111000\,0^{67}\,11101\ldots_2 \cdot 2^{-19}$

j1($1.1000000000000000000000000000000000000000000000000011_2 \cdot 2^{-23}$) = $1.0111111111111111111111111111111111111111111111101000\,0^{94}\,11001\ldots_2 \cdot 2^{-22}$

Here $1^{78}$ denotes 78 consecutive 1 bits following the 52 mantissa bits.

Rounding to the nearest
  f = x.xxxxxxxxxx|011111111...1xx...
  f = x.xxxxxxxxxx|100000000...0xx...
Rounding to 0, +∞, –∞
  f = x.xxxxxxxxxx|00000000...0xx...
  f = x.xxxxxxxxxx|11111111...1xx...
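For a function whose exact value is easy to obtain, the bits beyond the mantissa can be inspected directly. The sketch below does this for sqrt with exact integer arithmetic and measures how long the critical run of bits is for each kind of rounding. All function names are hypothetical, and sqrt is chosen only because `math.isqrt` gives an exact reference. For other functions a multiprecision library would be needed.

```python
import math
from fractions import Fraction

def sqrt_bits(x: float, extra: int = 64):
    """Return (first 53 significant bits, next `extra` bits) of the exact sqrt(x)."""
    _, e = math.frexp(x)                     # x = f * 2**e with 0.5 <= f < 1
    s = 55 + extra - e // 2                  # scale so that enough bits survive
    scaled = Fraction(x) * Fraction(4) ** s  # x * 2**(2*s), exact
    r = math.isqrt(scaled.numerator // scaled.denominator)  # floor(sqrt(x) * 2**s)
    bits = bin(r)[2:]
    return bits[:53], bits[53:53 + extra]

def run_directed(extra_bits: str) -> int:
    """Length of the leading 000...0 or 111...1 run (rounding to 0, +inf, -inf)."""
    return len(extra_bits) - len(extra_bits.lstrip(extra_bits[0]))

def run_nearest(extra_bits: str) -> int:
    """Length of the leading 0111...1 or 1000...0 run (rounding to the nearest)."""
    other = "1" if extra_bits[0] == "0" else "0"
    rest = extra_bits[1:]
    return 1 + len(rest) - len(rest.lstrip(other))

mantissa, tail = sqrt_bits(2.0)
print(mantissa, tail[:16], run_directed(tail), run_nearest(tail))
```

The longer the run, the more extra precision a library needs before it can decide the correctly rounded result.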

Probabilistic evaluation
  Uniform independent bits distribution
  Total $N = 2^{n-k-1}$ values
  ~$N \cdot 2^{-m}$ have m consecutive equal bits

Real data for sin on exponent -16

Bits  Eval.  To 0, +∞, –∞  To nearest
54    0.5    0             1
53    1      1             2
52    2      4             4
51    4      6             6
50    8      10            12
49    16     19            21
48    32     32            37
47    64     70            67
46    128    142           106
45    256    280           239
44    512    547           518
43    1024   1073          996
42    2048   2103          1985
41    4096   4187          4040
40    8192   8325          8142
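The estimate can be reproduced with a few lines of code. Note the assumption: a factor of 2 is included because each rounding mode has two critical bit patterns. That factor is inferred from the "Eval." column (0.5 for 54 bits with N = 2^52) and is not stated explicitly in the talk. The function name is hypothetical.

```python
def expected_count(m: int, n: int = 64, k: int = 11) -> float:
    """Expected number of values on one exponent with m critical extra bits."""
    total = 2 ** (n - k - 1)          # N mantissa values for one exponent
    return 2 * total * 2.0 ** (-m)    # two patterns per rounding mode (assumed)

for m in range(54, 39, -1):
    print(m, expected_count(m))
```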

Exhaustive search (feasible only for single precision numbers)
Continued fractions (Kahan, 1983)
Dyadic method (Tang, 1989; Kahan, 1994)
Reduced search (Lefevre, 1997)
Lattice reduction (Gonnet, 2002; Stehle, Lefevre, Zimmermann, 2003)
Integer secants method (2007)

$\pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \dots}}}}$

$X \approx N \cdot \pi$; $X = M \cdot 2^m$; $2^{n-k-1} \le M < 2^{n-k}$

$\pi \approx (2^m \cdot M)/N$

3386417804515981120643892082331156599120239393299838035242121518428537554064774221620930267583474709602068045686026362989271814411863708499869721322715946622634302011697632972907922558892710830616034038541342154669787134871905353772776431251615694251273653 · π/2 = $1.0110101011000101101100100110001011001010000111111110\,1^{857}\,011\ldots_2 \cdot 2^{849}$

sin($1.0110101011000101101100100110001011001010000111111111_2 \cdot 2^{849}$) = $1.11111111111111111111111111111111111111111111111111\,1^{69}\,0110\ldots_2 \cdot 2^{-1}$
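The building block of the continued fractions method is the sequence of convergents, the best rational approximations of π. The sketch below computes them from a 50-digit decimal value of π, so only the first dozen or so terms are reliable. The names `PI_50`, `continued_fraction` and `convergents` are illustrative. The actual search for hard points additionally needs numerators of the form $M \cdot 2^m$, which this sketch does not attempt.

```python
from fractions import Fraction

# pi to 50 decimal places, enough for the first few continued fraction terms
PI_50 = Fraction("3.14159265358979323846264338327950288419716939937510")

def continued_fraction(q: Fraction, terms: int):
    """Return the first `terms` partial quotients of the rational q."""
    out = []
    for _ in range(terms):
        a = q.numerator // q.denominator   # integer part
        out.append(a)
        q -= a
        if q == 0:
            break
        q = 1 / q                          # continue with the reciprocal of the rest
    return out

def convergents(cf):
    """Return the convergents h/k of a continued fraction as Fractions."""
    h_prev, h = 1, cf[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in cf[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

cf = continued_fraction(PI_50, 5)
print(cf)                # [3, 7, 15, 1, 292]
print(convergents(cf))
```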

$\sqrt{N \cdot 2^m} \approx M + 1/2$; $2^{n-k-1} \le M, N < 2^{n-k}$
$2^{m+2} \cdot N = (2M + 1)^2 - j$
$(2M + 1)^2 \equiv j \pmod{2^{m+2}}$

j = 15:
sqrt($1.0010010101100101011001011100101011011100101111110100_2$) = $1.0001001000001111100110011001111010011001001101110100\,0\,1^{50}\,000\ldots_2$

$F(x) = f(x) - a \cdot x - b = c_1 x^2 + c_2 x^3 + c_3 x^4 + \dots$
$F(x) = c_1 (G(x))^2$, $G(x) = x + d_1 x^2 + d_2 x^3 + \dots$
$G(x) = y \Rightarrow x = H(y)$, where H is the reversed series
$x_m = H(\sqrt{m/(c_1 2^z)})$, so $f(x_m) - a \cdot x_m - b = m/2^z$

Hard points
  double
    o Some hard points with ≥ 48 additional bits can be found in crlibm tests: http://lipforge.ens-lyon.fr/projects/crlibm
    o Calculated (some) hard points with ≥ 40 additional bits for sqrt, cbrt, sin, asin, cos, acos, tan, atan, sinh, asinh, cosh, tanh, atanh, exp, log, exp2, expm1, log1p, erf, erfc, j0, j1
  float (single precision)
    o All hard points with ≥ 17 additional bits for sqrt, cbrt, exp, sin, cos
  extended double
    o All with ≥ 53 additional bits for sqrt, some for sin, exp

Test suites developed
  double: all 37 single real variable POSIX functions

Correct values calculated by Maple and MPFR

             sqrt    exp     sin     atan    lgamma  j1
Boundary     20      20      20      20      20      20
Intervals    106     1622    3674    4242    11680   24538
Patterns     141009  138451  331744  155008  121502  109036
Hard points  170170  28587   62342   95512   0       29436
Other        84820   0       4616    0       229     5664
Total        396125  168680  402396  254782  133431  168694

No adequate standards for math libraries
  Several standards, sometimes inconsistent, highly incomplete

Correct rounding is needed for interoperability

Test suites are useful even without a standard

? Complete set of hard points for some function

? Multiple variable functions

Contact
  E-mail: [email protected]
  Web: www.ispras.ru/~kuliamin

Thank you! Questions?
