IEEE Fast Square Root Ref: Graphics Gems III; 2 Spring 2003


Page 1: IEEE Fast Square Root

IEEE Fast Square Root

Ref: Graphics Gems III; 2

Spring 2003

Page 2: IEEE Fast Square Root


Motivation

• Square root operations are frequently used in many applications (e.g., computer graphics)

• Usually speed is more important than accuracy

• Is there any way faster than sqrt()?
  – Idea: a tabulated sqrt!

Page 3: IEEE Fast Square Root


Math Background

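$x = m \cdot 2^{e}, \quad 1 \le m < 2$ (m: mantissa part, e: exponent part)

If e is even (say, 4): $\sqrt{m \cdot 2^{e}} = \sqrt{m} \cdot 2^{e/2}$, e.g. $\sqrt{m \cdot 2^{4}} = \sqrt{m} \cdot 2^{2}$

If e is odd (say, 3): $\sqrt{m \cdot 2^{e}} = \sqrt{2m} \cdot 2^{(e-1)/2}$, e.g. $\sqrt{m \cdot 2^{3}} = \sqrt{2m} \cdot 2^{1}$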

Since 1 ≤ m < 2, the root depends only on m and on the parity of e. For a 52-bit mantissa (double), only a limited number of cases need to be computed: 2 × 2^52 entries, each entry with 52 bits.

Negative exponents: the same rules apply (e.g., $e = -3$ gives $\sqrt{2m} \cdot 2^{-2}$).

Page 4: IEEE Fast Square Root


Abridged Table

• Sacrifice accuracy for smaller tables

• Indexed by the first 13 bits of the mantissa only (plus the exponent's parity)
  – Only 2 × 2^13 entries, each with 20 significant binary bits

• Further accuracy, if required, can be obtained by one or two Newton iterations, using the tabulated value as the initial guess

Try this yourself!

Page 5: IEEE Fast Square Root


How Numbers are Stored in Memory

SEEEEEEEEEEEMMMM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM

Conceptually: B0 B1 B2 B3 B4 B5 B6 B7

Stored (on PC): byte-swapped, least significant byte first: B7 B6 B5 B4 B3 B2 B1 B0

Byte swapping: be cautious when exchanging binary files and when accessing the data directly. When we read and operate on the data as its declared type, there is no need to worry (the bytes are read back in reverse order).

This is why the examiner works.
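The following sketch shows the "conceptual" S/E/M view from C. It assumes 64-bit IEEE doubles and that `double` and `uint64_t` share the same byte order, which holds on mainstream platforms. `split_double` and `DoubleFields` are hypothetical names. The code reads the 64 bits as one integer via `memcpy`, so the shifts see the conceptual order no matter how the bytes sit in memory. This matches the point that operating on the declared type needs no byte-swapping care.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 double (S, E, M). */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, biased by 1023 */
    uint64_t mantissa;  /* 52 bits; the leading 1 is implicit, not stored */
} DoubleFields;

/* Hypothetical helper: memcpy reads the 64 bits as one integer, so the
   shifts below see S, E, M in order regardless of byte order in memory. */
DoubleFields split_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    DoubleFields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For 3.5 = 1.75 × 2^1, the biased exponent is 1 + 1023 = 1024 and the stored fraction is 0.75.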

Page 6: IEEE Fast Square Root


Byte Swapping (cont)

float 3.5: 0x4060 0000, stored as 00 00 60 40

double 3.5: 0x400c 0000 0000 0000, stored as 00 00 00 00 00 00 0c 40

short 1029: 0x0405, stored as 05 04

int 2^22 + 5: 0x0040 0005, stored as 05 00 40 00

Use this program to see for yourself.
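The program referred to on this slide is not included in this copy. A minimal stand-in is sketched below. `format_bytes` is a hypothetical name. It writes an object's bytes in the order they sit in memory, as lowercase hex.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes of an object in memory order as
   "xx xx ..." (lowercase hex). out needs room for 3 * n + 1 chars. */
void format_bytes(const void *obj, size_t n, char *out) {
    const unsigned char *p = obj;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++)
        snprintf(out + 3 * i, 4, (i + 1 < n) ? "%02x " : "%02x", (unsigned)p[i]);
}
```

Usage: `float f = 3.5f; char buf[32]; format_bytes(&f, sizeof f, buf); puts(buf);`. On a little-endian PC this prints `00 00 60 40`, matching the float 3.5 line of the slide.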

Page 7: IEEE Fast Square Root


Implementation

Setup Table

Evaluation

Page 8: IEEE Fast Square Root


Details: access M.S.Byte

B7 B6 B5 B4 B3 B2 B1 B0

[Figure labels: f, fi]

Page 9: IEEE Fast Square Root


MMMMMMMMMMMMMMMM MMMMMMMMMMMMMMMM SEEEEEEEEEEEMMMM MMMMMMMMMMMMMMMM

[Figure labels: f, fi; field widths 13 and 7]
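The slides appear to reach the most significant word through an aliased pointer (the f/fi labels). The sketch below extracts the same 14-bit index in a portable way. It takes the high word, shifts out the low 7 of its 20 mantissa bits, and keeps the exponent's lowest bit plus the first 13 mantissa bits. `table_index` is a hypothetical name. It assumes 64-bit IEEE doubles with the same byte order as `uint64_t`. `memcpy` is used instead of a pointer cast to sidestep strict-aliasing and endianness issues.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 14-bit table index from a double's high word,
   i.e. the lowest exponent bit followed by the first 13 mantissa bits. */
uint32_t table_index(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* portable alternative to a pointer cast */
    uint32_t hi = (uint32_t)(bits >> 32);  /* S EEEEEEEEEEE + top 20 mantissa bits   */
    return (hi >> 7) & 0x3FFFu;            /* drop the low 7 of those 20, keep 1 + 13 */
}
```

For 1.0 (biased exponent 1023, odd; mantissa 0) the index is 2^13 = 8192. For 1.5 the first mantissa bit adds 2^12, giving 12288.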

Page 10: IEEE Fast Square Root


Evaluation

Page 11: IEEE Fast Square Root


Time a function

Page 12: IEEE Fast Square Root


Example

Page 13: IEEE Fast Square Root


Twice as fast, but note the overhead of building up the tables.