
[dsp TIPS&TRICKS] Mark Allie and Richard Lyons

A Root of Less Evil

IEEE SIGNAL PROCESSING MAGAZINE [93] MARCH 2005

In this article, we discuss several useful “tricks” for estimating the square root of a number. Our focus will be on high-speed (minimum computations) techniques for approximating the square root of a single value as well as the square root of a sum of squares for quadrature (I/Q) vector magnitude estimation.

In the world of DSP, the computation of a square root is found in many applications: calculating root mean squares, computing the magnitudes of fast Fourier transform (FFT) results, implementing automatic gain control (AGC) techniques, estimating the instantaneous envelope of a signal (AM demodulation) in digital receivers, and implementing three-dimensional (3-D) graphics algorithms. The fundamental tradeoff when choosing a particular square root algorithm is execution speed versus algorithm accuracy. In this article, we disregard the advice of legendary lawman Wyatt Earp (1848–1929), “Fast is important, but accuracy is everything . . . ,” and concentrate on various high-speed square root approximations. In particular, we focus on algorithms that can be implemented efficiently in fixed-point arithmetic. We have also set other constraints: no divisions are allowed; only a small number of computational iterations are permitted; and only a small look-up table (LUT), if necessary, is allowed.

The first two methods below describe ways to estimate the square root of a single value using iterative methods. The last two techniques are methods for estimating the magnitude of a complex number.

NEWTON-RAPHSON INVERSE METHOD

A venerable technique for computing the square root of x is the so-named “Newton-Raphson square root” iterative technique to find y(n), the approximation of √x:

    √x ≈ y(n + 1) = [y(n) + x/y(n)]/2.    (1)

Variable n is the iteration index, and y(0) is an initial guess of √x used to compute y(1). Successive iterations of (1) yield more accurate approximations to √x.

However, to avoid the computationally expensive division by y(n) in (1), we can use the iterative Newton-Raphson inverse (NRI) method. First, we find the approximation to the inverse square root of x using (2), where p = 1/√x. Next, we compute the square root of x by multiplying the final p by x:

    p(n + 1) = 0.5 p(n)[3 − x p(n)^2].    (2)

Two iterations of (2) provide surprisingly good accuracy. The initial inverse square root value p(0), as defined by (3), is used in the first iteration:

    p(0) = 1/(2x/3 + 0.354167).    (3)

Indeed, (3) was selected because it is the reciprocal of the initial value expression used in the next iterative square root method presented below.

The square root function looks more linear when we restrict the range of our x input. As it turns out, it’s convenient to limit the range of x to 0.25 ≤ x < 1. So if x < 0.25, then it must be normalized and, fortunately, we have a slick way to do so. When x needs normalization, x is multiplied by 4^n until 0.25 ≤ x < 1. A factor of four is two bits of arithmetic left shift. The final square root result is denormalized by a factor of 2^n (the square root of 4^n). A denormalizing factor of two is a single arithmetic right shift. Implementing this normalization ensures 0.25 ≤ x < 1. The error curve for this two-iteration NRI square root method is shown in Figure 1, where we see the maximum error is roughly 0.0008%. The curve is a plot of the estimated square root divided by the true square root of x. That maximum error value is impressively small; it is almost worth writing home about because no arithmetic division is needed. But what if our data throughput requirements only allow us to perform one iteration of (2)? In that case, the maximum NRI method error is 0.24%. We see a truism of signal processing here: improved accuracy requires additional computations. There’s no free lunch in DSP.
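To make the recipe concrete, here is a floating-point Python sketch of the method (our own function name and loop structure, not the article’s fixed-point code; in a fixed-point implementation, the multiplications by 4 and the division by 2^n below become arithmetic shifts):

```python
def nri_sqrt(x, iterations=2):
    """Approximate sqrt(x), 0 < x < 1, with the Newton-Raphson
    inverse (NRI) iteration (2) and initial value (3)."""
    # Normalize: multiply by 4 (two left shifts in fixed-point)
    # until 0.25 <= x < 1, counting the n scalings applied.
    n = 0
    while x < 0.25:
        x *= 4.0
        n += 1
    p = 1.0 / (2.0 * x / 3.0 + 0.354167)   # p(0) from (3)
    for _ in range(iterations):
        p = 0.5 * p * (3.0 - x * p * p)    # NRI update (2)
    # sqrt(x) ~= p * x; denormalize by 2^n (n right shifts).
    return (p * x) / (2.0 ** n)
```

With two iterations, `nri_sqrt(0.01)` returns 0.1 to within the roughly 0.0008% maximum error quoted above.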

“DSP Tips and Tricks” introduces practical tips and tricks of design and implementation of signal processing algorithms so that you may be able to incorporate them into your designs. We welcome readers who enjoy reading this column to submit their contributions. Contact Associate Editors Rick Lyons ([email protected]) or Amy Bell ([email protected]).

1053-5888/05/$20.00 © 2005 IEEE

[FIG2] NIIRF implementation (input x, an LUT supplying β, two adders, and a z^−1 delay producing output y).


This NRI method is not recommended for fixed-point format, with its sign and fractional bits. The coefficients, p(n), and intermediate values in (2) are greater than one. In fixed-point math, using bits to represent internal results greater than one increases the error by a factor of two per bit. (Certainly, using more bits for the integer part of intermediate values without taking them away from the fractional part can be done, but only at the cost of more CPU cycles and memory.)

NONLINEAR IIR FILTER METHOD

The next iterative technique, by Mikami et al. [1], is specifically aimed at fixed-point implementation. It is configured as a nonlinear IIR filter (NIIRF) as depicted in Figure 2, where again input x has been normalized to the range 0.25 ≤ x < 1. The output is given by (4), where the constant β, called the iteration acceleration factor, depends on the value of input x:

    y(n + 1) = β[x − y(n)^2] + y(n).    (4)

The magnitude of the error of (4) is known in advance when β = 1. The β acceleration factor scales the square root update term, x − y(n)^2, so the new estimate of y has less error. The acceleration factor results in reduced error for a given number of iterations.

In its standard form, this technique uses an LUT to provide β and performs two iterations of (4). For the first iteration, y(0) is initialized using

    y(0) = 2x/3 + 0.354167.    (5)

The constants in (5) were determined empirically. The acceleration constant β, based on x, is stored in an LUT made up of 12 subintervals of x with an equal width of 1/16. This square root method is implemented with the range of x set between 4/16 and 16/16 (12 regions), as was done for the NRI method. This convenient interval directly yields the offset for the LUT when using the four most significant mantissa bits of the normalized x value as a pointer, as shown in Table 1.

(A listing of fixed-point DSP assembly code, simulated as an ADSP218x part, implementing this NIIRF is available at http://www.cspl.umd.edu/spm/tips-n-tricks/. Various MATLAB square root modeling routines are also available at this Web site.)

Figure 1 shows the normalized NIIRF error curves as a function of x. (Normalized, in this instance, means the estimated square root is divided by the true square root of x.) The NRI square root method is the most accurate in floating-point math, followed by the NIIRF method. This is not surprising considering the greater number of operations needed to implement the NRI method. The effect of the LUT for β is clearly seen in Figure 1, where the curve shows discontinuities (spikes) at the table look-up boundaries. These are, of course, missing from the NRI error curve.

One sensible step, when presented with algorithms such as the NIIRF square root method, is to experiment with variations of the algorithm to investigate accuracy versus computational cost. (We did that for the above NRI method by determining the maximum error when only one iteration of (2) was performed.) With this explore-and-investigate thought in mind, we examined seven variations of this NIIRF algorithm. The first variation, a simplification, is the original NIIRF algorithm using only one iteration. We then studied the following quadratic function of x to find β:

    β = 0.763x^2 − 1.5688x + 1.314.    (6)

Next, we used a linear function of x to find β, as defined by

    β = −0.61951x + 1.0688.    (7)

For the quadratic and linear variations used to compute β, we investigated their performance when using both one and two iterations of (4).

[FIG1] Normalized error for the NRI and NIIRF methods (Error/SQRT(x), in percent ×10^−3, versus x from 1/8 to 1).

4 MSBs OF x   β FIXED-POINT   β FLT-POINT
0100          0x7b20          0.961914
0101          0x6b90          0.840332
0110          0x6430          0.782715
0111          0x5e10          0.734869
1000          0x5880          0.691406
1001          0x53c0          0.654297
1010          0x4fa0          0.622070
1011          0x4c30          0.595215
1100          0x4970          0.573731
1101          0x4730          0.556152
1110          0x4210          0.516113
1111          0x4060          0.502930

[TABLE1] β VERSUS NORMALIZED-x LUT.
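Using the flt-point β column of Table 1, the two-iteration NIIRF recipe can be sketched in floating-point Python (the function name and list layout are ours; a real implementation would use the fixed-point column and fractional arithmetic):

```python
# Floating-point beta values from Table 1, indexed by the four most
# significant bits of normalized x (patterns 0100 through 1111).
BETA_LUT = [
    0.961914, 0.840332, 0.782715, 0.734869,
    0.691406, 0.654297, 0.622070, 0.595215,
    0.573731, 0.556152, 0.516113, 0.502930,
]

def niirf_sqrt(x, iterations=2):
    """Approximate sqrt(x) for 0.25 <= x < 1 using the NIIRF
    update (4), the initial value (5), and an LUT-supplied beta."""
    beta = BETA_LUT[int(x * 16) - 4]   # 4 MSBs of x as LUT pointer
    y = 2.0 * x / 3.0 + 0.354167       # initial estimate (5)
    for _ in range(iterations):
        y = beta * (x - y * y) + y     # NIIRF iteration (4)
    return y
```

With two iterations this tracks the true square root to roughly the 0.004% maximum normalized error reported for the LUT variant in Table 2.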


For the final NIIRF method variation, we set β equal to the constants 0.633 and 0.64 for the two- and one-iteration variations, respectively. Table 2 provides a comparison of the error magnitude behavior of the original two-iteration (LUT) NIIRF algorithm and the variations detailed above.

For all the variations listed in Table 2, the range of the input x was limited to 0.25 ≤ x < 1. (The term normalized, as used in Table 2, means the estimated square root divided by the true square root of x.)

The error data in Table 2 was generated using double-precision floating-point math. Comparing the accuracy and complexity of the NRI and NIIRF methods, we might ask: why ever use the NIIRF method? Oftentimes, double-precision floating-point math is not available to us. Fixed-point data is one of the assumptions of this article. This fixed-point constraint requires that the algorithms be reevaluated for the error when fixed-point math is used. It turns out this is the reason why Mikami et al. developed the NIIRF square root method in the first place [1].

In fixed-point math, most of the internal values are close to each other and are less than one. Thus, these values result in a lower error using the NIIRF method compared to the NRI method when both are implemented in fixed-point format. In fact, there is less error in the fixed-point implementation of the NIIRF method than in the floating-point implementation. All of the NIIRF variations are well suited to fixed-point math.

Next, we look at high-speed square root approximations used to estimate the magnitude of a complex number.

BINARY-SHIFT MAGNITUDE ESTIMATION

When we want to compute the magnitude M of the complex number I + jQ, the exact solution is

    M = √(I^2 + Q^2).    (8)

To approximate the square root operation in (8), the following binary-shift magnitude estimation algorithm can be used to estimate M using

    M ≈ A·Max + B·Min    (9)

where A and B are constants, Max is the maximum of either |I| or |Q|, and Min is the minimum of |I| or |Q|.

Many combinations of A and B are provided in [2], yielding various accuracies for estimating M. However, of special interest is the combination A = 15/16 and B = 15/32 using

    M ≈ (15/16)Max + (15/32)Min.    (10)

This algorithm’s appeal is that it requires no explicit multiplies because the A and B values are binary fractions, and the formula is implemented with simple binary right-shifts, additions, and subtractions. (For example, a 15/32 times z multiply can be performed by first shifting z right by four bits and subtracting that number from z to obtain 15/16 times z. That result is then shifted right one bit.) This algorithm estimates the magnitude M with

[FIG3] Error of binary-shift and equiripple-error methods (percent error versus phase angle, 0–45°).

ALGORITHM              MAX. NORMALIZED   MEAN NORMALIZED
                       ERROR (%)         ERROR (%)

NR INVERSE (NRI)
NRI, 2 Iters           8.4E-4            8.3E-5
NRI, 1 Iter            0.24              0.057

NIIRF FLOATING-PT
LUT β, 2 Iters         0.004             5.4E-4
LUT β, 1 Iter          0.099             0.026
Quad. β, 2 Iters       0.0013            2.8E-4
Quad. β, 1 Iter        0.056             0.019
Linear β, 2 Iters      0.024             0.0061
Linear β, 1 Iter       0.28              0.088
β = 0.633, 2 Iters     0.53              0.05
β = 0.64, 1 Iter       1.44              0.23

NIIRF FIXED-PT
NIIRF, 2 Iters         0.0035            5.1E-4
Quad. β, 2 Iters       0.0019            4.1E-4
Linear β, 2 Iters      0.011             0.0029

[TABLE2] ITERATIVE ALGORITHM NORMALIZED ABSOLUTE ERROR.



a maximum error of 6.2% and a mean error of 3.1%. The percent error of this binary-shift magnitude estimation scheme is shown by the dashed curve in Figure 3 as a function of the angle between I and Q. (The curves in Figure 3 repeat every 45°.)
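In integer Python (standing in for fixed-point hardware; the function name is our own), the shift-and-add form of (10) might look like:

```python
def binary_shift_mag(i, q):
    """Estimate |i + jq| via (10): M ~= (15/16)Max + (15/32)Min,
    using only shifts, adds, and subtracts (no multiplies)."""
    mx, mn = max(abs(i), abs(q)), min(abs(i), abs(q))
    a_term = mx - (mx >> 4)         # (15/16)*Max: shift right 4, subtract
    b_term = (mn - (mn >> 4)) >> 1  # (15/32)*Min: same trick, then >> 1
    return a_term + b_term
```

For example, `binary_shift_mag(3000, 4000)` returns 5156 against a true magnitude of 5000, about a 3.1% error, within the 6.2% worst case.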

At the expense of a compare operation, we can improve the accuracy of (10) [3]. If Min ≤ Max/4, we use the coefficients A = 1 and B = 0 to compute (9); otherwise, if Min > Max/4, we use A = 7/8 and B = 1/2. This dual-coefficient (and still multiplier-free) version of the binary-shift square root algorithm has a maximum error of 3.0% and a mean error of 0.95%, as shown by the dotted curve in Figure 3.
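The dual-coefficient rule can be sketched the same way (again our own naming, not code from [3]):

```python
def dual_coeff_mag(i, q):
    """Estimate |i + jq| with the dual-coefficient rule of [3]:
    A = 1, B = 0 when Min <= Max/4, else A = 7/8, B = 1/2."""
    mx, mn = max(abs(i), abs(q)), min(abs(i), abs(q))
    if mn <= (mx >> 2):                  # Min <= Max/4: M ~= Max alone
        return mx
    # Otherwise M ~= (7/8)Max + (1/2)Min, still multiplier-free.
    return (mx - (mx >> 3)) + (mn >> 1)
```

Note that the one extra compare buys real accuracy: `dual_coeff_mag(3000, 4000)` returns exactly 5000, and the worst case drops from 6.2% to 3.0%.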

EQUIRIPPLE-ERROR MAGNITUDE ESTIMATION

Another noteworthy scheme for computing the magnitude of the complex number I + jQ is the equiripple-error magnitude estimation method by Filip [4]. This technique, with a maximum error of roughly 1%, is sweet in its simplicity: if Min ≤ 0.4142135·Max, the complex number’s magnitude is estimated using

    M ≈ 0.99Max + 0.197Min.    (11)

On the other hand, if Min > 0.4142135·Max, M is estimated by

    M ≈ 0.84Max + 0.561Min.    (12)

This algorithm is so named because its maximum error is 1.0% for both (11) and (12). Its mean error is 0.6%. Because its coefficients are not simple binary fractions, this method is best suited for implementations on programmable hardware. The percent error of this equiripple-error magnitude estimation method is also shown in Figure 3, where we see the more computationally intensive method, equiripple-error, is more accurate. As usual, accuracy comes at the cost of computations. Although these methods are less accurate than the iterative square root techniques, they can be useful in those applications where high accuracy is not needed, such as when M is used to scale (control the amplitude of) a system’s input signal.
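A sketch of the two-region rule in (11) and (12), with the coefficients exactly as given above (the function name is ours):

```python
def equiripple_mag(i, q):
    """Estimate |i + jq| with Filip's equiripple-error method [4]:
    maximum error ~1.0%, mean error ~0.6%."""
    mx, mn = max(abs(i), abs(q)), min(abs(i), abs(q))
    if mn <= 0.4142135 * mx:           # threshold ~= tan(22.5 degrees)
        return 0.99 * mx + 0.197 * mn  # region of (11)
    return 0.84 * mx + 0.561 * mn      # region of (12)
```

For example, `equiripple_mag(3, 4)` returns about 5.043 against a true magnitude of 5, an error under 1%.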

One implementation issue to keep in mind when using integer arithmetic is that, even though values |I| and |Q| may be within your binary word width range, the estimated magnitude value may be too large to be contained within the numeric range. The practitioner must limit |I| and |Q| in some way to ensure that the estimated M value does not cause overflows.

SUMMARY

We have discussed several square root and complex vector magnitude approximation algorithms, with a focus on high-throughput (high-speed) algorithms as opposed to high-accuracy methods. We investigated the performance of several variations of iterative NRI and NIIRF square root methods and found, not surprisingly, that the number of iterations has the most profound effect on accuracy. The NRI method is not appropriate for implementation using fixed-point fractional binary arithmetic, while the NIIRF technique and the two magnitude estimation schemes lend themselves nicely to fixed-point implementation. The three magnitude estimation methods are less accurate but more computationally efficient than the NRI and NIIRF schemes. All algorithms described here are open to modification and experimentation. This will make them more accurate and computationally expensive, or less accurate and computationally “cheap.” Your accuracy and data throughput needs determine the path you take to a root of less evil.

AUTHORS

Mark Allie is presently an assistant faculty associate in the Department of Electrical and Computer Engineering at the University of Wisconsin at Madison. He received his B.S. and M.S. degrees in 1981 and 1983, respectively, in electrical engineering from the same department. He has been employed as an engineer and consultant in the design of analog and digital signal processing systems for over 20 years.

Richard Lyons is a consulting systems engineer and lecturer with Besser Associates in Mt. View, California. He is the author of Understanding Digital Signal Processing, 2/E (Prentice-Hall, 2004) and an associate editor for IEEE Signal Processing Magazine.

REFERENCES

[1] N. Mikami et al., “A new DSP-oriented algorithm for calculation of square root using a nonlinear digital filter,” IEEE Trans. Signal Processing, pp. 1663–1669, July 1992.

[2] R. Lyons, Understanding Digital Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2004, pp. 481–482.

[3] W. Adams and J. Brady, “Magnitude approximations for microprocessor implementation,” IEEE Micro, pp. 27–31, Oct. 1983.

[4] A. Filip, “Linear approximations to √(x^2 + y^2) having equiripple error characteristics,” IEEE Trans. Audio Electroacoust., pp. 554–556, Dec. 1973.

[SP]