02 DSD-NL 2016 - Simona User Afternoon - Floating-point inaccuracy and sensitivity of computational models - Werner Kramer, VORtech

VORtech

Floating-point arithmetic: the consistency, accuracy and performance dilemma

Werner Kramer

17-05-2016

VORtech turns 20 (and is treating you to knowledge)

http://www.meetup.com/VORtech-Scientific-Software-Engineering/

Floating-point arithmetic

• IEEE 754 binary floating-point standard

• Pick two:

• Reproducibility

• Accuracy

• Performance

• Testing results with finite accuracy

SIMONA-4315: Results depend on the location of the grid file

SIMONA-4328: Unexpected differences when cleaning roughcombination files

P.S. The RMM model is known to be slightly unstable.

SIMONA-4256: Test models give different results for release2014 and trunk

Change in revision r5784 of waquaref.tab:

    #
    14 RESTART NCHAR = 7 OPT
    #
    IREP = 1 NAME = EXP_RESTART NCHAR = 3 JREP = 1 TYPE = 3 OPT
    IREP = 1 NAME = SDS_RESTART NCHAR = 3 JREP = 1 TYPE = 3 MAND
    +IREP = 1 NAME = TIME_EPS NCHAR = 8 JREP = 1 TYPE = 2 DEF = 1e-4
    #

This one-line addition triggered a difference in a matrix factorization (with duplicate eigenvalues), which resulted in water-level differences of up to 10 cm.

Differences disappear when compiling with debug flags.

IEEE floating-point number

32-bit memory value for a decimal number, e.g. 0.15625:

• sign bit: 0 = positive, 1 = negative

• biased exponent = exponent + 127

• significand between 1 and 2; its most significant bit (MSB) is not stored, only the fraction

Binary digits of 0.15625 by repeated doubling (the integer part of each product is the next bit):

0.15625 × 2 = 0.31250 -> 0
0.31250 × 2 = 0.62500 -> 0
0.62500 × 2 = 1.25000 -> 1
0.25000 × 2 = 0.50000 -> 0
0.50000 × 2 = 1.00000 -> 1

$0.15625 = 0.00101_2 = 1.01_2 \times 2^{-3}$
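The repeated-doubling procedure and the normalization step can be sketched in Python (the language choice is an assumption; the talk itself uses Fortran). The helper names `fraction_bits` and `normalize` are hypothetical; exact `Fraction` arithmetic is used so that generating the digits introduces no rounding of its own.

```python
from fractions import Fraction

def fraction_bits(decimal: str, max_bits: int = 32) -> str:
    """Binary digits after the point of a decimal fraction 0 <= x < 1, by repeated doubling."""
    f = Fraction(decimal)       # exact, so no rounding error while generating digits
    bits = []
    while f != 0 and len(bits) < max_bits:
        f *= 2
        bit = int(f)            # integer part of the product is the next bit
        bits.append(str(bit))
        f -= bit                # continue with the fractional part
    return "".join(bits)

def normalize(bits: str) -> tuple[str, int]:
    """Rewrite 0.<bits> (base 2) as 1.<fraction> x 2**exponent; bits must contain a 1."""
    first_one = bits.index("1")
    return bits[first_one + 1:], -(first_one + 1)

bits = fraction_bits("0.15625")
fraction, exponent = normalize(bits)
print(f"0.15625 = 0.{bits}b = 1.{fraction}b x 2^{exponent}")
print(f"biased exponent = {exponent} + 127 = {exponent + 127:08b}")
```

For 0.15625 this gives the fraction bits `01` (padded with zeros to 23 bits) and the biased exponent 124 = `01111100`. For 0.1 the loop never terminates on its own, which is why `max_bits` caps it: 0.1 has no finite binary representation.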

Some examples

decimal      S  exponent  fraction
 1.000000    0  01111111  00000000000000000000000
-1.000000    1  01111111  00000000000000000000000
 0.5000000   0  01111110  00000000000000000000000
 0.1000000   0  01111011  10011001100110011001101
 0.7578125   0  01111110  10000100000000000000000
 0.0000E+00  0  00000000  00000000000000000000000
-0.0000E+00  1  00000000  00000000000000000000000
 Infinity    0  11111111  00000000000000000000000
-Infinity    1  11111111  00000000000000000000000
 (s)NaN      *  11111111  (0)1**********************

(* = any bit; a quiet NaN has fraction MSB 1, a signaling NaN has MSB 0 and a nonzero fraction.)

• 1.0/0.0 = ∞

• -1.0/0.0 = -∞

• 0.0/0.0 = 0.0 × ∞ = NaN

• sqrt(-1.) = NaN
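The table rows can be reproduced by packing a value as IEEE single precision and splitting the 32-bit word into its fields. This is a Python sketch (an assumption, not the tool used for the slide); `float32_fields` is a hypothetical helper. Note that Python raises `ZeroDivisionError` for `1.0/0.0` and `ValueError` for `math.sqrt(-1.0)` instead of returning ∞ or NaN, so the sketch builds infinity from `math.inf` and NaN from ∞ × 0.0.

```python
import math
import struct

def float32_fields(x: float) -> tuple[int, str, str]:
    """Sign bit, 8 exponent bits and 23 fraction bits of x rounded to IEEE single precision."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    return sign, f"{exponent:08b}", f"{fraction:023b}"

# inf * 0.0 gives NaN, as in the 0.0 x inf example; its sign bit is platform dependent.
nan = math.inf * 0.0
for value in (1.0, -1.0, 0.5, 0.1, 0.7578125, 0.0, -0.0, math.inf, -math.inf, nan):
    sign, exponent, fraction = float32_fields(value)
    print(f"{value!r:>10} {sign} {exponent} {fraction}")
```

The 0.1 row shows that the stored value is only the nearest single-precision number: the repeating pattern `1001 1001 ...` is cut off after 23 bits and rounded up in the last place.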

Distribution of values

• The spacing between neighbouring values depends on the exponent: it doubles each time the exponent increases by one

• Without denormals (subnormal numbers) there would be a gap around zero
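A small Python sketch (an assumption; helper names are hypothetical) measures the single-precision spacing by stepping the 32-bit pattern by one. Above 1.0 the gap is 2^-23, above 1024.0 it is 2^-13, and both above the smallest normal number 2^-126 and above zero it is 2^-149, the smallest subnormal. Without subnormals the first value after zero would be 2^-126, a gap far larger than the spacing just above it.

```python
import struct

def f32_bits(x: float) -> int:
    """32-bit IEEE single-precision pattern of x (x is first rounded to float32)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def f32_from_bits(word: int) -> float:
    """float32 value encoded by a 32-bit pattern, returned as a Python float."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def f32_spacing(x: float) -> float:
    """Distance from a finite float32 value 0 <= x < max float32 to the next larger float32."""
    word = f32_bits(x)
    return f32_from_bits(word + 1) - f32_from_bits(word)

for x in (1.0, 1024.0, 2.0**-126, 0.0):
    print(f"spacing above {x!r}: {f32_spacing(x)!r}")
```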

Rounding errors

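Addition with a 3-bit significand: 3.0 + 6.0

$3.0 = 1.10_2 \times 2^{1}$, $6.0 = 1.10_2 \times 2^{2}$

$0.11_2 \times 2^{2} + 1.10_2 \times 2^{2} = 10.01_2 \times 2^{2} = 1.001_2 \times 2^{3}$ (= 9, which needs four significant bits: how to round?)

a) round to nearest (roundTiesToEven): $1.00_2 \times 2^{3} = 8$

b) round down (towards $-\infty$): $1.00_2 \times 2^{3} = 8$

c) round up (towards $+\infty$): $1.01_2 \times 2^{3} = 10$

d) round towards zero: $1.00_2 \times 2^{3} = 8$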
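The four rounding modes above can be checked with a toy binary format. This Python sketch (an assumption; `round_significand` is a hypothetical helper) rounds a positive integer to `p` significant bits and has no exponent range limits. It handles positive values only, which is why "down" and "towards zero" coincide here; for negative values they differ.

```python
def round_significand(n: int, p: int, mode: str) -> int:
    """Round a positive integer n to p significant bits (toy binary format, no exponent limit)."""
    if n <= 0:
        raise ValueError("sketch handles positive integers only")
    shift = max(n.bit_length() - p, 0)  # low-order bits that do not fit in p bits
    if shift == 0:
        return n                        # already representable
    q, r = divmod(n, 1 << shift)        # q: kept significand, r: discarded bits
    half = 1 << (shift - 1)
    if mode == "nearest_even":          # roundTiesToEven
        if r > half or (r == half and q % 2 == 1):
            q += 1
    elif mode == "up":                  # towards +infinity
        if r > 0:
            q += 1
    elif mode in ("down", "toward_zero"):
        pass                            # for positive n both simply truncate
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return q << shift

exact = 3 + 6  # 1.001b x 2^3 needs four significant bits
for mode in ("nearest_even", "down", "up", "toward_zero"):
    print(mode, round_significand(exact, 3, mode))
```

9 lies exactly halfway between 8 and 10, so round-to-nearest needs the ties-to-even rule: 8 = 1.00b × 2^3 has an even last bit.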

Factors that affect reproducibility

• Floating-point semantics

• Use of higher-precision intermediate results

    fused multiply-add instruction (FMA): A*x + y, computed with a single rounding

• Differences in math libraries (e.g. the sin function): -fimf-precision=(high, medium, low)

• Data alignment changing vectorization

• Parallelism changing operation order

• Implementation differences between processors

    -fimf-arch-consistency=true: the math library gives the same results across processors
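The effect of a fused multiply-add can be shown without FMA hardware. This Python sketch emulates FMA with exact rational arithmetic (a swapped-in technique, not a hardware instruction; `fma_emulated` is a hypothetical helper). With a = 1 + 2^-27, the exact product a·a = 1 + 2^-26 + 2^-54 loses its last term when rounded to double, so the separate multiply-add returns 0.0 while the single-rounding version keeps 2^-54.

```python
from fractions import Fraction

def fma_emulated(a: float, b: float, c: float) -> float:
    """a*b + c with a single final rounding, emulated with exact rational arithmetic."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)
separate = a * a + c            # a*a is rounded to double first, then added: 0.0
fused = fma_emulated(a, a, c)   # exact product, one final rounding: 2**-54
print(separate, fused)
```

Whether a compiler emits FMA instructions therefore changes the last bits of results; this is why the strict floating-point model disables them.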

Reassociation

• Addition and multiplication are mathematically associative, but not computationally associative

• (a+b)+c ≠ a+(b+c) in general

• (a*b)*c ≠ a*(b*c) in general

• Division replaced by multiplication with the reciprocal: x/y => x*(1/y)

• C and C++ disallow reassociation and specify left-to-right evaluation order

• Fortran allows reordering as long as parentheses are honored (Intel: -assume protect_parens)

• Compilers may not obey these rules by default
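Both kinds of rewrite can change results, as this Python sketch (an assumption; Python evaluates strictly left to right and does not reassociate) shows in double precision. Adding 1.0 to -1e20 is absorbed because the spacing of doubles near 1e20 is 16384. For x = y = 49, multiplying by the rounded reciprocal 1/49 gives 1 - 2^-53 instead of exactly 1.

```python
a, b, c = 1e20, -1e20, 1.0
left = (a + b) + c    # a + b cancels exactly to 0.0, then 1.0 is added
right = a + (b + c)   # b + c rounds back to -1e20, so 1.0 is lost
print(left, right)    # 1.0 0.0

x = y = 49.0
print(x / y, x * (1.0 / y))  # 1.0 0.9999999999999999
```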

Reassociation

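Original code:

    integer :: i, n
    real, dimension(n) :: A = 1.0
    real :: C = -1.0, tiny = 1e-20

    do i = 1, n
       A(i) = A(i) + C + tiny
    end do

Optimized code:

    integer :: i, n
    real, dimension(n) :: A = 1.0
    real :: C = -1.0, tiny = 1e-20

    C = C + tiny        ! computed once, outside the loop
    do i = 1, n
       A(i) = A(i) + C
    end do

In single precision, (A(i) + C) + tiny gives approximately 1e-20, whereas C + tiny rounds to -1.0, so the optimized loop gives A(i) = 0.0.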

-fp-model keyword (Intel compilers)

• fast: value-unsafe optimizations (default)

• precise (source): value-safe optimizations only

• strict: precise + disable FMA
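The two loop bodies can be replayed in single precision from Python (an assumption; the slide's code is Fortran). The hypothetical helper `f32` rounds each double result to the nearest float32 via `struct`. For a single addition of two float32 operands this gives the correctly rounded float32 sum, because double has more than twice the float32 precision.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

A, C, tiny = f32(1.0), f32(-1.0), f32(1e-20)  # real (single precision) variables

original = f32(f32(A + C) + tiny)  # A(i) + C + tiny evaluated left to right
hoisted = f32(C + tiny)            # C + tiny moved out of the loop by the optimizer
optimized = f32(A + hoisted)       # A(i) + (C + tiny)

print(original, optimized)
```

With -fp-model precise (or source) the compiler must keep the left-to-right order of the original, so the hoisting above is not allowed.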

Vectorization

• A vector operation works on multiple data at once (e.g. a 16-byte block = 4 reals)

• Vectorized math functions are very slightly less accurate, but faster than the scalar versions

• Unaligned data -> both scalar and vector versions are called

• Can change results run-to-run!

• OS stack alignment

• Address Space Layout Randomization

Vectorization

https://software.intel.com/en-us/articles/what-are-peel-and-remainder-loops-fortran-vectorization-support

[Figure: array a in memory at addresses 0x00 to 0x28 (4-byte elements), divided into 16-byte blocks]

    real(kind=4), dimension(DIM_A) :: a

    do i = 1, DIM_A
       a(i) = sin(a(i))
    end do

• Peel loop: loop iterations in scalar mode until the data reaches a 16-byte boundary

• Vectorized loop: a vector operation works on a 16-byte block at once (SSE2)

• Remainder: scalar operation on the remaining array elements
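A simplified model of the peel/vector/remainder split makes the run-to-run effect concrete. In this Python sketch (an assumption; `split_loop` is hypothetical, and real compilers may choose differently whether and how far to peel) the same 10-element loop is split differently depending only on the array's start address. A given element may therefore be computed by the scalar sin in one run and by the vectorized sin in another.

```python
def split_loop(base_address: int, n: int, elem_size: int = 4,
               vector_bytes: int = 16) -> tuple[int, int, int]:
    """Return (peel, vector_iterations, remainder) for a loop over n contiguous elements.

    Simplified model: peel scalar iterations until the next element address is
    vector_bytes aligned, run full vectors, then finish the tail in scalar mode.
    Assumes base_address is a multiple of elem_size.
    """
    lanes = vector_bytes // elem_size                 # e.g. 16 bytes / 4 bytes = 4 reals
    misalignment = base_address % vector_bytes
    peel = 0 if misalignment == 0 else (vector_bytes - misalignment) // elem_size
    peel = min(peel, n)                               # very short loops may never align
    vector_iterations = (n - peel) // lanes
    remainder = n - peel - vector_iterations * lanes
    return peel, vector_iterations, remainder

# The same 10-element loop, with the array placed at two different addresses:
for base in (0x00, 0x04):
    print(hex(base), split_loop(base, 10))
```

At 0x00 elements 1 to 8 go through the vector path; at 0x04 only elements 4 to 7 do, so stack alignment or address space layout randomization can change which math routine computes each value.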

SIMONA testing & improvements

Field                   max(dif)  time     99%(dif)  rms(dif)  mean(dif)
solution_flow.sep.sep   0.001162  1395.00  0.000525  0.000170  0.000119
solution_flow.up.up     0.006302   975.00  0.001386  0.000284  0.000106
solution_flow.vp.vp     0.011412  1185.00  0.002075  0.000657  0.000127

• Test suite containing a large number of models

• The test suite has a "quantify random" option (-qr) that inverts the loop when solving the matrix system

• Use -fp-model source to validate code modifications

• Change sensitive parts to double precision

• Release with -fp-model source?
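Summary statistics like those in the table can be computed per field from two runs. This Python sketch is hypothetical: it assumes "dif" means the absolute difference per value and that "99%" is the 99th percentile (nearest-rank definition); the time column is not reproduced.

```python
import math

def diff_stats(baseline: list[float], candidate: list[float]) -> dict[str, float]:
    """max, 99th percentile, rms and mean of |baseline - candidate| over all values of a field."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("fields must be non-empty and of equal length")
    diffs = sorted(abs(b - c) for b, c in zip(baseline, candidate))
    n = len(diffs)
    rank = (99 * n + 99) // 100                   # nearest rank: ceil(0.99 * n), in integers
    return {
        "max": diffs[-1],
        "99%": diffs[rank - 1],
        "rms": math.sqrt(sum(d * d for d in diffs) / n),
        "mean": sum(diffs) / n,
    }

print(diff_stats([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

Looking at the percentile and rms next to the maximum helps to tell a single outlier cell apart from a difference spread over the whole domain.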

Why reproducibility?

Technical/legacy: Software correctness is determined by comparison to previous (baseline) results.

Debugging/porting: When developing and debugging, a higher degree of run-to-run stability is required to find potential problems.

Legal: Accreditation or approval of software might require exact reproduction of previously defined results.

Customer perception: Developers may understand the technical issues with reproducibility but still require reproducible results, since end users or customers will be disconcerted by the inconsistencies.

Science
