Algorithms meeting the lower bounds on the multiplicative complexity of length-$2^n$ DFT's and their connection with practical algorithms

1504 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 38, NO. 9, SEPTEMBER 1990


PIERRE DUHAMEL,

Abstract-In a previous paper, the author described an algorithm that computes a length-$2^n$ discrete Fourier transform using $2^{n+1} - 2n^2 + 4n - 8$ nontrivial (i.e., $\neq \pm j = \pm\sqrt{-1}$) complex multiplications. In this paper, it is first shown that this algorithm actually provides the attainable lower bound on the number of complex multiplications. A slight modification of the last step of this algorithm is also shown to provide the attainable lower bound on the number of real multiplications.

A connection with the split-radix algorithm (SRFFT) is then explained, showing that SRFFT is another variation of these optimal algorithms, where the last step is computed recursively from shorter FFT’s in a suboptimal manner.

Finally, once the connection between the minimal complexity and SRFFT (which is the best known practical algorithm) is understood, it provides useful information on the possibility of further improvements of the SRFFT.

I. INTRODUCTION

THE decimation-in-frequency fast Fourier transform (FFT) algorithm was published in 1965 by Cooley and Tukey [14]. Their work initiated many studies along the same lines, such as decimation-in-time versions of the same algorithm, higher radices (radix 4, 8, mixed radix), and application to real data [15].

The first significantly different approach was proposed by Rader and Brenner [2] in 1976, giving birth to another family of so-called "real factor" FFT algorithms.

Still another approach, based on the theory of complexity of computation, was also proposed at the same time by Winograd. This approach had mainly theoretical consequences, as far as length-$2^n$ algorithms are concerned (although it also allowed the derivation of powerful algorithms for other lengths): Winograd [7] was able to obtain a lower bound on the number of complex multiplications (including trivial ones by $j$) necessary to compute length-$2^n$ DFT's.

Additional results can be found in [10], [11], which provide upper bounds on the number of nontrivial complex multiplications, and in [1], which provides an attainable lower bound on the number of real multiplications necessary to compute a length-$2^n$ DFT.

Manuscript received June 30, 1086; revised September 27, 1989. The author is with CNET, 92131 Issy-les-Moulineaux, France. IEEE Log Number 9036900.

SENIOR MEMBER, IEEE

Note that throughout this paper, the term upper bound denotes a number of multiplications that can be reached, even by impractical algorithms; the term lower bound refers to a number of multiplications which is necessary; while the terms minimal bound, attainable lower bound, and multiplicative complexity denote the minimal number of multiplications that is both necessary and sufficient for computing the desired function.

At the same time, new practical algorithms with regular structure and reduced arithmetic complexity were found. They use the same number of multiplications as "real factor" algorithms, a lower number of additions, and were independently proposed in [3] (the split-radix FFT (SRFFT)), [4] (the fast Fourier-cosine transform algorithm (FFCT)), and [5] (the recursive cyclotomic factorization algorithm (RCFA)), the last being ultimately recognized as a decimation-in-frequency SRFFT [16].

Nevertheless, both theoretical and practical approaches were performed independently, and the knowledge of optimum algorithms was not used for a better understanding of practical ones. This paper tries to fill this gap, by deriving from theoretical considerations as many useful consequences as possible for practical length-$2^n$ DFT algorithms.

The first purpose of this work is to put in the same framework two types of optimum algorithms: those minimizing nontrivial complex multiplications, and those minimizing nontrivial real multiplications. In fact, both results are so different that it was not clear that the upper bound published in [10] also gave the multiplicative complexity of the problem. The second purpose is to emphasize the connection with the SRFFT, a precise comparison of practical and optimum algorithms being possible due to the availability of a result on the number of nontrivial complex multiplications.

This framework will also enable us to answer, at least partially, the following questions that arise when comparing the results obtained in this paper and those obtained elsewhere.

1) The minimum number of real multiplications necessary to compute a length-$2^n$ DFT is slightly more than twice the minimum number of complex multiplications needed to compute the same DFT. Since a complex multiplication usually requires 3 real multiplications to be computed, the question is: why twice, instead of three times? While dealing with complex multiplications, have we hidden some of the structure of the problem?

0096-3518/90/0900-1504\$01.00 © 1990 IEEE

2) The SRFFT meets the lower bound on the number of nontrivial real multiplications up to length 16, and the number of nontrivial complex multiplications up to length 64. Is there any relationship between the SRFFT and the algorithms meeting these achievable (but useless as such, in practice) lower bounds?

Can the comparison of the practical algorithms with the lower bounds give some direction in the search for new improved algorithms? For example:

3) Real factor algorithms have significantly improved radix-2 FFT algorithms, by replacing the complex multiplications (the twiddle factors) by purely real or purely imaginary multiplications, at the cost of some additions. Are the same improvements feasible on other algorithms (radix 4, SRFFT)?

4) The Hartley transform was recently proposed [17] as a substitute for the Fourier transform to compute cyclic convolutions on real data. Since the Hartley transform is very simply related to the Fourier transform, can the multiplicative lower bounds reveal something about the usefulness of the Hartley approach?

Besides a better understanding of the problem of minimum multiplication algorithms, these results have other important consequences: they not only give an estimation of how far the practical algorithms are from the optimum, but also indicate directions for future research. This will be addressed in Section V.

Our approach allows this paper to be self-contained: all demonstrations are based on a theorem by Auslander and Winograd [12], and are more concise in the complex case than in the real one, so they are provided.

Some results concerning the additive complexity are also recalled in Section V.

II. THE "SPLIT-RADIX" DECOMPOSITIONS

Let us consider both decimation-in-frequency (DIF) and decimation-in-time (DIT) versions of the split-radix (SR) decomposition of a DFT:

$$X_k = \sum_{n=0}^{N-1} x_n W_N^{nk} \tag{1}$$

($W_N$ being the $N$th root of unity: $W_N = \cos(2\pi/N) - j\sin(2\pi/N)$).

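The DIF split-radix decomposition considers separately the outputs $X_{2k}$, $X_{4k+1}$, and $X_{4k+3}$:

$$X_{2k} = \sum_{n=0}^{N/2-1} (x_n + x_{n+N/2})\, W_N^{2nk} \tag{2}$$

$$X_{4k+1} = \sum_{n=0}^{N/4-1} \left[(x_n - x_{n+N/2}) - j\,(x_{n+N/4} - x_{n+3N/4})\right] W_N^{n}\, W_N^{4nk} \tag{3}$$

$$X_{4k+3} = \sum_{n=0}^{N/4-1} \left[(x_n - x_{n+N/2}) + j\,(x_{n+N/4} - x_{n+3N/4})\right] W_N^{3n}\, W_N^{4nk} \tag{4}$$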

On the other hand, the DIT split-radix decomposition cuts the sequence $x_n$ into 3 subsets: $\{x_{2n}\}$, $\{x_{4n+1}\}$, $\{x_{4n+3}\}$:

$$X_k = \sum_{n=0}^{N/2-1} x_{2n} W_N^{2nk} + W_N^{k} \sum_{n=0}^{N/4-1} x_{4n+1} W_N^{4nk} + W_N^{3k} \sum_{n=0}^{N/4-1} x_{4n+3} W_N^{4nk} \tag{5}$$

the redundancy in the computations being more apparent when computing 4 outputs at a time [IS].

These decompositions can be seen to act as a radix 4 on the odd indices, and as a radix 2 on the even indices. It is also straightforward to notice that, if a DIF split-radix decomposition is applied to the initial DFT, it is possible to further apply a DIT SR decomposition, since the result obtained has the same symmetries on index $n$ as the initial problem. This will be used in the next section to provide a decomposition of the DFT into polynomial products.
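As an illustration, the DIT decomposition (5) can be turned into a short recursive program. The following is only a sketch under our own assumptions: the function names `dft` and `srfft_dit` are hypothetical, the twiddle factors are recomputed on the fly, and no attempt is made to skip the trivial multiplications. It merely checks that applying (5) recursively reproduces the direct evaluation of (1).

```python
import cmath

def dft(x):
    """Direct evaluation of (1) with W_N = exp(-2j*pi/N); O(N^2) reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def srfft_dit(x):
    """Recursive DIT split-radix FFT following (5); len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [x[0]]
    if N == 2:
        return [x[0] + x[1], x[0] - x[1]]
    even = srfft_dit(x[0::2])   # length-N/2 DFT of x_{2n}   (radix-2 part)
    odd1 = srfft_dit(x[1::4])   # length-N/4 DFT of x_{4n+1} (radix-4 part)
    odd3 = srfft_dit(x[3::4])   # length-N/4 DFT of x_{4n+3} (radix-4 part)
    X = [0j] * N
    for k in range(N // 4):
        a = cmath.exp(-2j * cmath.pi * k / N) * odd1[k]       # W_N^k term
        b = cmath.exp(-2j * cmath.pi * 3 * k / N) * odd3[k]   # W_N^{3k} term
        s, d = a + b, -1j * (a - b)
        # four outputs at a time, using W_N^{N/4} = -j and W_N^{N/2} = -1
        X[k] = even[k] + s
        X[k + N // 2] = even[k] - s
        X[k + N // 4] = even[k + N // 4] + d
        X[k + 3 * N // 4] = even[k + N // 4] - d
    return X

x = [complex(n % 5, -(n % 3)) for n in range(16)]
err = max(abs(a - b) for a, b in zip(srfft_dit(x), dft(x)))
```

Computing four outputs per loop iteration makes visible the sharing of the two twiddle products `a` and `b` between the outputs $X_k$, $X_{k+N/4}$, $X_{k+N/2}$, and $X_{k+3N/4}$.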

III. DECOMPOSITION OF A LENGTH-$2^n$ DFT INTO POLYNOMIAL PRODUCTS

This section briefly recalls the results of [10], for the sake of clarity.

As seen in the preceding section, the DIF split-radix decomposition transforms the length-$N = 2^n$ DFT of (1) into a length-$N/2$ DFT (2) plus two terms in (3) and (4), which can be rewritten as:

Simple calculations show that $U_k$ and $V_k$ are of the same type:

with

h;' = h',

h,': = jh,, (9 )

and, since our aim is to find a decomposition of the DFT into a number of polynomial products, one can see that such a decomposition for $U_k(N, h_i)$ will also provide, recursively, the required result for the initial DFT.

Since $U_k(N, h_i)$ has the same symmetries on index $i$ as the initial DFT, we can further apply a DIT decomposition:

$$U_k(N, h_i) = \sum_{l=0}^{2^{n-3}-1} h_{2l}\, W_N^{2l(4k+1)} + \sum_{l=0}^{2^{n-4}-1} h_{4l+1}\, W_N^{(4l+1)(4k+1)} + \sum_{l=0}^{2^{n-4}-1} h_{4l+3}\, W_N^{(4l+3)(4k+1)}$$

$$= U_k(N/2, h_{2l}) + P_k(W_N, h_{4l+1}) + Q_k(W_N, h_{4l+3}). \tag{10}$$

Once again, $Q_k$ can be expressed in terms of a function of the type $P_k$ by making the substitution $i = 2^{n-4} - l - 1$. With $W_N$ as defined in (1), $W_N^{N/4} = -j$, so that

$$Q_k(W_N, h_{4l+3}) = -j\,P_k(W_N^{-1}, h_{2^{n-2}-4i-1}). \tag{11}$$

An additional property has to be noticed:

$$P_{k+2^{n-4}}(W_N, h_{4l+1}) = -j\,P_k(W_N, h_{4l+1}). \tag{12}$$

From (11) and (12), we can conclude that, in order to decompose $U_k$ (hence the DFT) into polynomial products, it is sufficient to find such a decomposition for the $\{P_k\}$.

Now, the $P_k$'s can indeed be written as a polynomial product modulo $z^{2^{n-4}} \pm j$ (the proof is given in the Appendix).

To summarize, we have shown that a DFT of length $N = 2^n$ can be obtained through the computation of 4 polynomial products of length $2^{n-4}$, 8 polynomial products of length $2^{n-5}$, and so on.

Since we know the minimum number of multiplications required by the computation of a polynomial product, established by Winograd [6], we are now able to evaluate the number of multiplications this algorithm requires to compute the DFT. This is done by simply adding the number of multiplications required for computing each of the polynomial products found in the above decomposition (the proof of optimality will be given in the next section). This number will be denoted by $M_c\{\mathrm{DFT}_{2^n}\}$ to mark the difference with $\mu_c\{\mathrm{DFT}_{2^n}\}$, the multiplicative complexity of this function (i.e., the minimal bound).

In the above algorithm, all polynomial products are performed modulo $z^{2^i} \pm j$, which is irreducible in $Q[j]$ (the field of rational numbers extended by $j = \sqrt{-1}$). Thus, we know by the work of Winograd [6] that each of these polynomial products modulo $z^{2^i} \pm j$ can be computed with $2^{i+1} - 1$ complex multiplications [10]. Then, with the initial conditions that $M_c\{U_k(2^3, h_i)\} = 1$ and $M_c\{\mathrm{DFT}_{2^2}\} = 0$, it is easy to obtain the following results, using the recursions as described above:

$$M_c\{U_k(N, h_i)\} = 2^{n-1} - 2n + 3 \tag{13}$$

and

$$M_c\{\mathrm{DFT}_{2^n}\} = 2^{n+1} - 2n^2 + 4n - 8. \tag{14}$$

IV. ACHIEVABLE LOWER BOUNDS ON THE NUMBER OF COMPLEX AND REAL MULTIPLICATIONS TO COMPUTE A LENGTH-$2^n$ DFT

A. Bound on the Number of Complex Multiplications

In fact, it can be shown that the above number of complex multiplications $M_c\{\mathrm{DFT}_{2^n}\}$ is not only an upper bound on the minimum number of complex multiplications needed to compute a length-$2^n$ DFT. It is, in fact, the minimal bound. The demonstration is a direct application of a theorem by Auslander and Winograd [12, Theorem 3].

First, we shall restate this theorem in our case, in a more restricted version.

Section III provided a decomposition of the initial DFT into polynomial products modulo irreducible polynomials in $Q[j]$. Let $f_j(z)$, $j = 1, \ldots, l$, be the different polynomials involved (note that no individual coefficient of $f_j(z)$ lies in $Q[j]$). Each polynomial $f_j(z)$ is of degree $n_j$ (which is a power of 2) and is used within polynomial products modulo $z^{n_j} \pm j = R_j(z)$.

Let $y_i(z)$ be the different polynomials of variables involved. Several of them will be used within polynomial products with the same $f_j(z)$. Let $t_j$ be this number.

Following the notations of [12], the system of polynomial products representing the initial length-$2^n$ DFT will then be noted:

$$\sum_{j=1}^{l} t_j\,(R_j, f_j, y). \tag{15}$$

Reference [12, Theorem 3] states that the multiplicative complexity of the set of polynomial products in (15) is the sum of the individual multiplicative complexities of the polynomial products involved, provided that the following conditions are met:

1) on $\{y_i\}$, $j = 1, \ldots, l$: the collection of all the coordinates (there are $\sum_j n_j t_j$ of them) are distinct indeterminates over $Q[j]$;

2) on $\{f_j\}$, $j = 1, \ldots, l$: no linear combination of the coefficients of the polynomials $f_j(z)$ (all taken together) lies in $Q[j]$.

The first condition is clearly met, since the total number of unknowns involved in the polynomial products is always $2^n$, all of them being obtained by invertible linear combinations of the $2^n$ initial distinct indeterminates.

The second condition is a little more intricate: careful inspection of the $\{f_j\}$ shows that there are $N/4 - 1$ different constants involved in the polynomials $f_j(z)$, all of them being powers of $W_N$, the $N$th root of unity. Furthermore, they can be multiplied by a suitable power of $j = \sqrt{-1}$, such that the resulting constants are $W_N^i$, where $i = 1, \ldots, N/4 - 1$.

Let us suppose for a moment that there exists a linear combination of these constants belonging to $Q[j]$. In that case, we can find some $\alpha_i$ such that:

$$\sum_{i=0}^{N/4-1} \alpha_i W_N^i = 0 \tag{16}$$

with $\alpha_i \in Q[j]$.

$W_N$ is then a root of the polynomial $\sum_{i=0}^{N/4-1} \alpha_i z^i$. This is impossible, since the irreducible cyclotomic polynomials of order $N = 2^n$ in $Q[j]$ are of degree $N/4$. This contradicts our assumption, hence condition 2 is met.

The two above conditions are thus verified, and by application of [12, Theorem 3] the number of multiplications given in (14) is the minimum number of complex multiplications ($\neq \pm j$) needed to compute a length-$2^n$ DFT:

$$\mu_c\{\mathrm{DFT}_{2^n}\} = 2^{n+1} - 2n^2 + 4n - 8. \tag{17}$$
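The closed forms (13), (14), and (17) can be checked against the recursion they summarize. The sketch below encodes our reading of that recursion, which is an assumption on our part: a length-$2^n$ DFT costs a length-$2^{n-1}$ DFT plus two terms of type $U_k$, and each $U_k(2^n, \cdot)$ costs one $U_k(2^{n-1}, \cdot)$ plus two polynomial products of length $2^{n-4}$, each requiring $2^{n-3} - 1$ complex multiplications. The function names are hypothetical.

```python
def mc_U(n):
    """Complex multiplications for U_k(2^n, h_i), n >= 3, by recursion."""
    if n == 3:
        return 1                        # initial condition M_c{U_k(2^3, h_i)} = 1
    # one half-length U_k plus two products modulo z^(2^(n-4)) +/- j
    return mc_U(n - 1) + 2 * (2 ** (n - 3) - 1)

def mc_dft(n):
    """Complex multiplications for the length-2^n DFT, n >= 2, by recursion."""
    if n == 2:
        return 0                        # initial condition M_c{DFT_4} = 0
    # one half-length DFT plus the two odd-indexed terms, both of type U_k
    return mc_dft(n - 1) + 2 * mc_U(n)

def closed_form(n):
    """Right-hand side of (14) and (17)."""
    return 2 ** (n + 1) - 2 * n * n + 4 * n - 8

table = [(2 ** n, mc_dft(n)) for n in range(3, 11)]
```

For $N = 8, \ldots, 1024$ the recursion gives 2, 8, 26, 72, 178, 408, 890, 1880, in agreement with the closed form.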

This result is not the same one as provided by Winograd [7], [8], who did not discard trivial multiplications by $\pm j = \pm\sqrt{-1}$, thus making difficult any precise comparison with practical algorithms (which make heavy use of such simplifications). In fact, taking as such the result of [8] would lead to the conclusion that SRFFT is more efficient than the lower bound for some lengths, which is clearly impossible. On the other hand, taking into account trivial multiplications by $j$ in SRFFT would result in an excessive number of multiplications, tending to indicate that there is still plenty of room for possible improvement of SRFFT, which we show in this paper is not true.

B. Bound on the Number of Real Multiplications

The above result can be interestingly compared with the result of Heideman and Burrus [1] on the minimum number of real multiplications needed to compute a length-$2^n$ DFT:

$$\mu_r\{\mathrm{DFT}_{2^n}\} = 2^{n+2} - 2n^2 - 2n - 4. \tag{18}$$

A first question arises when considering that

$$\mu_r\{\mathrm{DFT}_{2^n}\} \approx 2\,\mu_c\{\mathrm{DFT}_{2^n}\} \tag{19}$$

which seems strange at first glance, since a complex multiplication requires a minimum of 3 real multiplications, so that $3\mu_c\{\mathrm{DFT}_{2^n}\}$ would have been a more likely candidate for $\mu_r\{\mathrm{DFT}_{2^n}\}$.
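For reference, the 3-multiplication complex product alluded to here can be written as follows. This is one standard arrangement, given only as a sketch; the function name `cmul3` is ours. When the factor $c + jd$ is a known constant such as a twiddle factor, the sums $d - c$ and $c + d$ can be precomputed, which leaves 3 real multiplications and 3 real additions per product.

```python
def cmul3(a, b, c, d):
    """Return (re, im) of (a + jb)(c + jd) using 3 real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)      # d - c can be precomputed when c + jd is a constant
    k3 = b * (c + d)      # c + d can be precomputed when c + jd is a constant
    return k1 - k3, k1 + k2   # (ac - bd, ad + bc)

re, im = cmul3(1.0, 2.0, 3.0, 4.0)   # (1 + 2j)(3 + 4j)
```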

A first explanation could be that, dealing explicitly with complex multiplications, we missed some of the structure of the problem, or, equivalently, that the solutions minimizing complex multiplications and those minimizing real multiplications do not have the same structure.

We show in the following that this is not a correct explanation, since it is possible to derive an algorithm minimizing the number of real multiplications with exactly the same structure as the decomposition given in Section III, and differing only at the very last step (computation of the polynomial products).

Let us see how this works: all the polynomial products involved here are computed mod $z^{2^L} + j$:

$$P(z) = Y(z)\,F(z) \bmod (z^{2^L} + j) = \Big(\sum_i y_i z^i\Big)\Big(\sum_i f_i z^i\Big) \bmod (z^{2^L} + j) \tag{20}$$

but $z^{2^L} + j$ is a factor of $z^{2^{L+1}} + 1$, which implies the following: if we define $Y'(z)$ and $F'(z)$ as real polynomials of degree $2^{L+1}$ by

$$y'_i = \mathrm{Re}\{y_i\}, \qquad y'_{i+2^L} = -\mathrm{Im}\{y_i\},$$

$$f'_i = \mathrm{Re}\{f_i\}, \qquad f'_{i+2^L} = -\mathrm{Im}\{f_i\}, \qquad i = 0, \ldots, 2^L - 1 \tag{21}$$

we can compute $P(z)$ as:

$$P(z) = \left[F'(z)\,Y'(z) \bmod (z^{2^{L+1}} + 1)\right] \bmod (z^{2^L} + j). \tag{22}$$

This costs $2^{L+2} - 1$ real multiplications instead of $2^{L+1} - 1$ complex multiplications.

Let us now evaluate the number of real multiplications $M_r$ involved in the computation of the length-$2^n$ DFT by the decomposition given in Section III, if the polynomial products are computed by this method (22).
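The embedding of a complex product modulo $z^{2^L} + j$ into a real product modulo $z^{2^{L+1}} + 1$ can be checked numerically. The sketch below is ours and rests on stated assumptions: both products are computed by the schoolbook method, so it verifies only the algebraic identity and not the minimal multiplication count; the modulus is taken with the sign $+j$, so that $z^{\text{len}} \equiv -j$ and the imaginary parts enter the real polynomials with a minus sign; the variable `n_coef` stands for the number of coefficients $2^L$.

```python
import random

def mul_mod_complex(y, f):
    """Schoolbook product of complex polynomials modulo z^n_coef + j."""
    n_coef = len(y)
    p = [0j] * n_coef
    for a in range(n_coef):
        for b in range(n_coef):
            if a + b < n_coef:
                p[a + b] += y[a] * f[b]
            else:
                p[a + b - n_coef] += -1j * y[a] * f[b]   # z^n_coef = -j
    return p

def embed(y):
    """Real coefficient vector: Re parts followed by minus the Im parts."""
    return [c.real for c in y] + [-c.imag for c in y]

def mul_via_real(y, f):
    """Real product modulo z^(2 n_coef) + 1, then reduction modulo z^n_coef + j."""
    n_coef = len(y)
    yr, fr = embed(y), embed(f)
    q = [0.0] * (2 * n_coef)
    for a in range(2 * n_coef):
        for b in range(2 * n_coef):
            if a + b < 2 * n_coef:
                q[a + b] += yr[a] * fr[b]
            else:
                q[a + b - 2 * n_coef] -= yr[a] * fr[b]   # z^(2 n_coef) = -1
    # final reduction: q_i + q_{i+n_coef} z^n_coef = q_i - j q_{i+n_coef}
    return [complex(q[i], -q[i + n_coef]) for i in range(n_coef)]

rng = random.Random(0)
y = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
f = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(4)]
err = max(abs(a - b) for a, b in zip(mul_mod_complex(y, f), mul_via_real(y, f)))
```

Only the inner real product involves multiplications; the embedding and the final reduction are free of them, which is why the real count depends solely on the length of the real product.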

Since $M_r\{U_k(2^3, h_i)\} = 2$ and $M_r\{\mathrm{DFT}_{2^2}\} = 0$, we obtain

$$M_r\{\mathrm{DFT}_{2^n}\} = 2^{n+2} - 2n^2 - 2n - 4 \tag{23}$$

which is exactly the lower bound (18). Hence, the algorithm obtained by a decomposition of the DFT into complex polynomial products and computation of these products as indicated in (22) is also an optimal algorithm in the sense of minimizing the number of real multiplications.

We have now the answer to the question we raised at the beginning of this paragraph: the difference between algorithms minimizing the number of complex and real multiplications is not in the structure of the algorithm, but only in the way the polynomial products are computed.
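The real count can be checked in the same way as the complex one. The sketch below uses our reading of the recursion, which is an assumption: the structure is unchanged, but each polynomial product of length $2^{n-4}$ now costs $2^{n-2} - 1$ real multiplications, and the initial condition for $U_k(2^3, \cdot)$ is 2. The function names are hypothetical.

```python
def mr_U(n):
    """Real multiplications for U_k(2^n, h_i), n >= 3, by recursion."""
    if n == 3:
        return 2                        # initial condition M_r{U_k(2^3, h_i)} = 2
    # two products of length 2^(n-4), each costing 2^(n-2) - 1 real multiplications
    return mr_U(n - 1) + 2 * (2 ** (n - 2) - 1)

def mr_dft(n):
    """Real multiplications for the length-2^n DFT, n >= 2, by recursion."""
    if n == 2:
        return 0                        # initial condition M_r{DFT_4} = 0
    return mr_dft(n - 1) + 2 * mr_U(n)

def mu_r(n):
    """Closed form for the real-multiplication count."""
    return 2 ** (n + 2) - 2 * n * n - 2 * n - 4

def mu_c(n):
    """Closed form for the complex-multiplication count."""
    return 2 ** (n + 1) - 2 * n * n + 4 * n - 8

# mu_r - 2*mu_c = 2(n - 2)(n - 3): "slightly more than twice"
excess = [mu_r(n) - 2 * mu_c(n) for n in range(3, 11)]
```

The list `excess` grows only quadratically in $n$, while both bounds grow like $2^n$, which quantifies how close the real bound is to twice the complex one.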

V. CONNECTION WITH PRACTICAL ALGORITHMS

A. Comparison of the Multiplicative Complexities

Unfortunately, these algorithms computing the DFT with minimum number of complex or real multiplications are of theoretical interest only, except for small N : the reason for this is that algorithms for computing polynomial products using the minimum number of multiplications are much too costly in additions, even for moderate degrees of the polynomials involved.

However, an interesting point is that in both cases we count only nontrivial multiplications ($\neq \pm 1, \pm j$). We are thus now able to compare in a realistic manner practical FFT algorithms (with multiplicative complexity of $O(n2^n)$) with the FFT algorithms meeting the lower bounds (with multiplicative complexity of $O(2^n)$). This comparison is provided in Fig. 1, while a comparison of the SRFFT with optimum algorithms is provided in Table I for the complex multiplications, and in Table II for the real multiplications.

As far as real multiplications are concerned, we note that the multiplicative complexity improves when the radix increases, while the lowest complexity among practical algorithms is met for the SRFFT, which follows the bound (together with radix 4) up to and including $N = 16$.

A much more interesting conclusion comes from observation of Table I , concerning the number of complex multiplications.


[Figure: multiplications per output point, $M/N$ (vertical axis), versus $n = \log_2 N$ for $n = 3$ to $10$ (horizontal axis); curves for radix 2, radix 4, split-radix, and the lower bound.]

Fig. 1. Number of nontrivial real ($M_r$) or complex ($M_c$) multiplications per output point required by several algorithms.

TABLE I
NUMBER OF NONTRIVIAL COMPLEX MULTIPLICATIONS REQUIRED FOR THE COMPUTATION OF A LENGTH-$2^n$ FFT ALGORITHM

Closed forms: radix 2: $(n-3)2^{n-1} + 2$; radix 4: $[(9n-26)2^{n-3} + 4]/3$; split-radix: $[(3n-8)2^n - (-1)^n]/9 + 1$; minimum multiply: $2^{n+1} - 2n^2 + 4n - 8$.

N = 2^n    Radix 2    Radix 4    Split-radix    Minimum multiply
8          2          -          2              2
16         10         8          8              8
32         34         -          26             26
64         98         76         72             72
128        258        -          186            178
256        642        492        456            408
512        1538       -          1082           890
1024       3586       2732       2504           1880

TABLE II
NUMBER OF NONTRIVIAL REAL MULTIPLICATIONS REQUIRED FOR THE COMPUTATION OF A LENGTH-$N$ COMPLEX DFT

N       Radix 2    Radix 4    Radix 8    Rader-Brenner    Split-radix    Minimum multiply
16      24         20         -          20               20             20
32      88         -          -          68               68             64
64      264        208        204        196              196            168
128     712        -          -          516              516            396
256     1800       1392       -          1284             1284           876
512     4360       -          3204       3076             3076           1864
1024    10248      7856       -          7172             7172           3872

Observation of Table I shows that the split-radix algorithm is optimum up to and including N = 64, and is very near the lower bound for N = 128. The other practical algorithms are diverging earlier from the optimum.
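The point at which the split-radix algorithm leaves the lower bound can be located with a few lines of code. The sketch below makes two assumptions that should be read as ours: the split-radix count is generated by the usual recursion $M(N) = M(N/2) + 2M(N/4) + N/2 - 2$ (two twiddle multiplications per butterfly, the two for $n = 0$ being trivial), and it is compared with the closed form $((3n-8)2^n - (-1)^n)/9 + 1$ of (25) and with the bound $2^{n+1} - 2n^2 + 4n - 8$ of (17). Function names are hypothetical.

```python
def sr_rec(n):
    """Nontrivial complex multiplications of a length-2^n split-radix FFT."""
    if n <= 2:
        return 0
    # one half-length FFT, two quarter-length FFTs, N/2 - 2 nontrivial twiddles
    return sr_rec(n - 1) + 2 * sr_rec(n - 2) + 2 ** (n - 1) - 2

def sr_closed(n):
    """Closed form ((3n - 8) 2^n - (-1)^n) / 9 + 1; the division is exact."""
    return ((3 * n - 8) * 2 ** n - (-1) ** n) // 9 + 1

def mu_c(n):
    """Lower bound on the number of nontrivial complex multiplications."""
    return 2 ** (n + 1) - 2 * n * n + 4 * n - 8

gap = {2 ** n: sr_rec(n) - mu_c(n) for n in range(3, 11)}
first_gap = min(N for N, g in gap.items() if g > 0)
```

The dictionary `gap` is zero for $N = 8, 16, 32, 64$ and equals 8 at $N = 128$ (186 against 178), where the two counts first separate.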

Furthermore, the divergence between the performance of the split-radix algorithm and the optimum occurs precisely at the very length where, in both cases, the optimum polynomial product algorithms are not practical algorithms any more: a degree $d = 2$ allows real polynomial products to be performed with a minimum number of real multiplications, but a degree $d = 4$ involves too many additions to be useful, and this corresponds to the transition from $N = 16$ to $N = 32$ in Table II. Also, in the case of complex polynomials, a degree $d = 4$ corresponds to practical minimum multiplication algorithms (it involves interpolation points $0, \pm 1, \pm j, \pm 1 + j$), and a degree $d = 8$ will involve too many additions for the optimum polynomial products to be of interest. And this corresponds precisely to the divergence between the SRFFT and the lower bound at $N = 128$ (see Fig. 1).

The above considerations show that no approach based on polynomial products will be able to improve the SRFFT, except at the cost of an unreasonable number of additions. Note that the situation was different for radix 4, in which the number of complex multiplications can be efficiently reduced by polynomial products.

Let us also insist on the complementarity of these bounds. The study of the lower bound on the number of real multiplications only would tend to show that there is possibly enough room for easily improving practical algorithms. It is only by studying the bounds on the number of complex multiplications that such an improvement appears to be very difficult to achieve.

B. Connection with SRFFT

A stronger connection with SRFFT can also be established by noting that the SRFFT has almost the same mathematical structure as the optimum algorithms. Indeed, the decompositions (2)-(4) and (10) are split-radix decompositions, and are only rearrangements of decompositions performed in actual SRFFT (in a DIF SRFFT, a DIT SR decomposition is actually performed on the output, due to the duality between time and frequency indexes). Let us assume that, at the very last step (computation of the polynomial products), $P_k(W_N, h_{4l+1})$ as defined in (10) is computed in a recursive manner by

$$P_k(W_N, h_{4l+1}) = \underbrace{W_N^{4k+1}}_{2^{n-4}\ \text{complex mult}} \; \underbrace{\sum_{l=0}^{2^{n-4}-1} \underbrace{\left(h_{4l+1} W_N^{4l}\right)}_{2^{n-4}-1\ \text{complex mult}} W_{N/16}^{lk}}_{\text{DFT length } 2^{n-4}}. \tag{24}$$

It is easily checked that iterating the number of multiplications indicated in (24) results in the very number of multiplications that is required by SRFFT:

$$M_c^{\mathrm{SRFFT}} = \frac{(3n-8)2^n - (-1)^n}{9} + 1. \tag{25}$$

It is even possible to isolate in an actual SRFFT graph the sequence of operations suggested by (24), by following the first DIF decomposition on the input, as given in (3), and the first DIT decomposition from the outputs, as provided by (10).

To summarize, we have shown that the two optimum algorithms minimizing the number of complex and real multiplications, together with the SRFFT, have basically the same structure, and that these algorithms differ only in the way the polynomial products resulting from the SR decompositions are computed.

C. Consequences for Future Research

Many interesting conclusions can be derived from the considerations above. The most straightforward one is that it is useless to look for new algorithms using either fewer complex multiplications than SRFFT for $N < 128$, or fewer real multiplications than SRFFT for $N < 32$.

Another interesting remark is the following: the same number of multiplications as in SRFFT could also be obtained by the so-called "real factor radix-2 FFT" [2]. These algorithms were obtained by making use of a computational trick to replace the complex twiddle factors by purely real or purely imaginary ones. Now, the question is: is it possible to use the same kind of trick with radix 4, or even SRFFT? Such a result would provide algorithms with still fewer multiplications. The knowledge of the lower bound tells us that this is impossible since, for some lengths ($N = 16$, for example), this would produce an algorithm with better performance than the lower bound.

When wondering how to improve SRFFT, one must consider the following.

Comparison of SRFFT with $\mu_c\{\mathrm{DFT}_{2^n}\}$ tells us that no algorithm using complex multiplications will be able to improve significantly on SRFFT for lengths $< 512$. Furthermore, the trick allowing real factor algorithms to be obtained cannot be applied to radices greater than 2 (or at least not in the same manner).

Our conclusion is that there remain very few (yet unknown) approaches that could possibly improve the best known algorithms for length-$2^n$ FFT's.

D. Real Data FFT's and FHT's

Deriving lower bounds for real-data FFT's is also straightforward from the derivations of Section III: if $\{x_n\}$ is real, $X_k$ and $X_{N-k}$ are complex conjugates. This means that $\{X_{4k+1}\}$ and $\{X_{4k+3}\}$ are redundant, since

$$X_{4(N/4-k-1)+3} = X_{N-(4k+1)} = X^{*}_{4k+1}.$$

Hence, it is useless to compute (4) if (3) is computed, which divides by two the number of polynomial products involved, and will also cut by a factor of two the lower bounds.

Therefore, the real-data SRFFT, as described in [16], [18], will also be optimum for $N < 128$ (complex multiplications) and $N < 32$ (real multiplications), and the consequences stated above for complex data FFT's also hold for both real data FFT's and Hartley transforms (FHT's): since FHT's can be obtained from a real-data FFT [18] by a simple multiplication by $(1 + j)$, they have exactly the same multiplicative complexity as the corresponding real-data FFT's.
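The two facts used in the discussion of real data in Section V-D, namely the conjugate symmetry $X_{N-k} = X_k^{*}$ and the relation between the Hartley and Fourier transforms, are easy to check numerically. The sketch below is ours and assumes the usual definition of the discrete Hartley transform, $H_k = \sum_n x_n [\cos(2\pi nk/N) + \sin(2\pi nk/N)]$, which is not restated in this paper; with $W_N$ as in (1), it gives $H_k = \mathrm{Re}\{(1 + j) X_k\}$. The function names are hypothetical and both transforms are evaluated directly.

```python
import cmath
import math

def dft(x):
    """Direct DFT with W_N = exp(-2j*pi/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def dht(x):
    """Direct discrete Hartley transform (cos + sin kernel) of a real sequence."""
    N = len(x)
    return [sum(x[n] * (math.cos(2 * math.pi * n * k / N)
                        + math.sin(2 * math.pi * n * k / N)) for n in range(N))
            for k in range(N)]

def dht_from_dft(x):
    """Hartley transform obtained from the DFT through multiplication by (1 + j)."""
    return [((1 + 1j) * X).real for X in dft(x)]

x = [float((3 * n * n + 1) % 7) for n in range(16)]     # arbitrary real data
X = dft(x)
sym_err = max(abs(X[(16 - k) % 16] - X[k].conjugate()) for k in range(16))
dht_err = max(abs(a - b) for a, b in zip(dht(x), dht_from_dft(x)))
```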

E. Additive Complexity

Of course, when speaking of practical algorithms, the number of additions and the number of memory transfers have also to be considered.


The number of memory transfers is highly dependent on the precise implementation of the algorithm (and even on the compiler which is used). This is the reason why very few comparisons were reported ([16] compares the number of floating point loads and stores in a radix-4 algorithm versus a split-radix FFT).

Concerning the number of additions, the situation is as follows. The number of real additions required by classical algorithms can be separated in two parts: a first one is due to the butterflies, while the other part consists of the real additions due to the complex multiplications. The first number is exactly the same in any classical algorithm (radix 2, radix 4, radix 8, SRFFT), and has been proven in [19] to be optimal for length-$2^n$ FFT's. Furthermore, if the 3-mult, 3-add complex multiplication scheme is used, the second part is equal to the number of real multiplications required by the algorithm. Hence, diminishing the number of additions in such algorithms is equivalent to minimizing the number of complex multiplications (note that this conclusion is valid only for classical algorithms), which is why the multiplicative complexity is of such importance in length-$2^n$ FFT's. As a consequence, since the SRFFT follows the lower bound on the number of complex multiplications up to "reasonable" lengths, we have here an argument tending to show that SRFFT is the best possible tradeoff for length-$2^n$ FFT's.

VI. CONCLUSION

In this paper, we used two successive split-radix decompositions to derive lower bounds for the multiplicative complexity of length-$2^n$ DFT algorithms in a constructive manner, for both the number of real multiplications and the number of complex multiplications.

We then showed that both algorithms, together with the SRFFT had essentially the same structure, and differed only at the last step (computation of polynomial products).

We also showed that, besides their theoretical importance, these results give indications about further research for improving $2^n$ FFT algorithms: it is explained why any improvement on the SRFFT must be obtained by the use of some "real factor" algorithm, although a transformation of radix-4 or SRFFT algorithms into their corresponding "real factor" versions, in the same manner as was performed in the radix-2 case, is impossible.

APPENDIX

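In this appendix, we shall prove that

$$P_k(W_N, h_{4l+1}) = \sum_{l=0}^{2^{n-4}-1} h_{4l+1}\, W_N^{(4l+1)(4k+1)} \tag{A1}$$

can be computed as a polynomial product.

It is known in number theory that 5 has period $2^{n-4}$ modulo $2^{n-2}$, and runs through the numbers congruent to 1 mod 4 [20].

Hence, there is a one-to-one correspondence between $s_l$ and $l$ on the interval $\{0, 1, \ldots, 2^{n-4} - 1\}$:

$$5^{s_l} = 1 + 4l \bmod 2^{n-2}. \tag{A2}$$

Let us also define $\epsilon_l$ ($\in \{0, 1, 2, 3\}$), and $\lambda$ by

$$5^{s_l} = 1 + 4l + \epsilon_l 2^{n-2} \bmod 2^n \tag{A3}$$

$$5^{2^{n-4}} = 1 + \lambda 2^{n-2} \bmod 2^n. \tag{A4}$$

Then, we can write the following permutation:

$$h'_{s_l} = h_{4l+1} \qquad (0 \le l, s_l < 2^{n-4}) \tag{A5}$$

and also, for $n \ge 4$, since $W_N^{2^{n-2}} = -j$ with $W_N$ as defined in (1):

$$W_N^{(4l+1)(4k+1)} = W_N^{(5^{s_l} - \epsilon_l 2^{n-2})(5^{s_k} - \epsilon_k 2^{n-2})} = W_N^{5^{s_l+s_k} - (\epsilon_l + \epsilon_k) 2^{n-2}} = j^{\epsilon_l + \epsilon_k}\, W_N^{5^{s_l+s_k}}. \tag{A6}$$

Furthermore, when $s_l + s_k \ge 2^{n-4}$, we have

$$W_N^{5^{s_l+s_k}} = \left[W_N^{5^{2^{n-4}}}\right]^{5^{s_l+s_k-2^{n-4}}}$$

and, applying (A4), we find

$$W_N^{5^{2^{n-4}}} = W_N^{1 + \lambda 2^{n-2}} = j^{-\lambda}\, W_N.$$

Hence

$$W_N^{5^{s_l+s_k}} = \left[j^{-\lambda} W_N\right]^{5^{s_l+s_k-2^{n-4}}} = j^{-\lambda}\, W_N^{5^{s_l+s_k-2^{n-4}}}. \tag{A7}$$

So, if we define

$$w_u = W_N^{5^u} \qquad (u = 0, 1, \ldots, 2^{n-4} - 1) \tag{A8}$$

then, from (A6) to (A8), we have

$$W_N^{(4l+1)(4k+1)} = j^{\epsilon_l + \epsilon_k}\, w_{s_l+s_k}, \qquad s_l + s_k \le 2^{n-4} - 1$$

$$W_N^{(4l+1)(4k+1)} = j^{\epsilon_l + \epsilon_k - \lambda}\, w_{s_l+s_k-2^{n-4}}, \qquad s_l + s_k \ge 2^{n-4}$$

and, if we state

$$X_{s_k} = P_k \cdot j^{-\epsilon_k}, \qquad h_{s_l} = h'_{s_l} \cdot j^{\epsilon_l}$$

(A1) becomes

$$X_{s_k} = \sum_{s_l + s_k \le 2^{n-4}-1} h_{s_l}\, w_{s_l+s_k} + j^{-\lambda} \sum_{s_l + s_k \ge 2^{n-4}} h_{s_l}\, w_{s_l+s_k-2^{n-4}} \tag{A9}$$

and, if

$$X(z) = \sum_{s_k} X_{s_k} z^{s_k}, \qquad W(z) = \sum_{u} w_u z^{u}, \qquad H(z) = \sum_{s_l} h_{s_l} z^{s_l}$$

then, (A9) reads

$$X(z) = H(z^{-1}) \cdot W(z) \bmod (z^{2^{n-4}} - j^{\lambda}). \qquad \text{Q.E.D.} \tag{A10}$$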
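The index mapping of this appendix lends itself to a direct numerical check. The sketch below is ours and rests on stated assumptions: $W_N = \exp(-j2\pi/N)$ as in (1), so that $W_N^{N/4} = -j$; inputs are rescaled by $j^{\epsilon_l}$ and outputs by $j^{\epsilon_k}$; and the wrap-around of the product is weighted by $j^{-\lambda}$, which corresponds to a reduction modulo $z^{2^{n-4}} - j^{\lambda}$. The function name `check_appendix` and the use of a schoolbook product are illustrative choices only.

```python
import cmath

J = [1, 1j, -1, -1j]                            # J[e % 4] equals j**e

def check_appendix(n, h):
    """Max error between P_k from (A1) and its polynomial-product form (n >= 4)."""
    N, L = 1 << n, 1 << (n - 4)                 # L = 2^(n-4) coefficients
    Q = N >> 2                                  # Q = 2^(n-2)
    W = lambda e: cmath.exp(-2j * cmath.pi * (e % N) / N)
    s, eps = [0] * L, [0] * L
    for u in range(L):                          # discrete logarithm in base 5
        p = pow(5, u, N)
        l = (p % Q - 1) // 4                    # 5^u = 1 + 4l mod 2^(n-2)
        s[l], eps[l] = u, (p - 1 - 4 * l) // Q  # 5^u = 1 + 4l + eps*2^(n-2) mod 2^n
    assert sorted(s) == list(range(L))          # l -> s_l is one-to-one
    lam = (pow(5, L, N) - 1) // Q               # 5^L = 1 + lam*2^(n-2) mod 2^n
    hs = [0j] * L
    for l in range(L):
        hs[s[l]] = h[l] * J[eps[l] % 4]         # permuted and rescaled inputs
    w = [W(pow(5, u, N)) for u in range(L)]     # w_u = W_N^(5^u)
    X = [0j] * L
    for a in range(L):                          # coefficients of H(1/z) W(z)
        for u in range(L):
            if u >= a:
                X[u - a] += hs[a] * w[u]
            else:                               # negative power: z^(-L) = j^(-lam)
                X[u - a + L] += J[(-lam) % 4] * hs[a] * w[u]
    err = 0.0
    for k in range(L):
        direct = sum(h[l] * W((4 * l + 1) * (4 * k + 1)) for l in range(L))
        err = max(err, abs(direct - J[eps[k] % 4] * X[s[k]]))
    return err

errors = {n: check_appendix(n, [complex(l + 1, -0.5 * l) for l in range(1 << (n - 4))])
          for n in range(4, 9)}
```

For each $n$ tested, the outputs recovered from the polynomial product agree with the direct sum (A1) up to rounding errors.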

ACKNOWLEDGMENT

The author would like to thank H. Hollmann and S. Mayrargue for useful discussions, M. T. Heideman and C. S. Burrus for private communications on the subject of minimum multiply algorithms, and O. Rioul for careful reading of the manuscript.

REFERENCES

[1] M. T. Heideman and C. S. Burrus, "On the number of multiplications necessary to compute a length-$2^n$ DFT," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 1, pp. 91-95, Feb. 1986.

[2] C. M. Rader and N. M. Brenner, "A new principle for fast Fourier transformation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, no. 3, pp. 264-265, June 1976.

[3] P. Duhamel and H. Hollmann, "Split-radix FFT algorithm," Electron. Lett., vol. 20, no. 1, pp. 14-16, Jan. 1984.

[4] M. Vetterli and H. J. Nussbaumer, "Simple FFT and DCT algorithms with reduced number of operations," Signal Processing, vol. 6, no. 4, pp. 267-278, July 1984.

[5] J. B. Martens, "Recursive cyclotomic factorization-A new algorithm for calculating the discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 4, pp. 750-761, Aug. 1984.

[6] S. Winograd, "Some bilinear forms whose multiplicative complexity depends on the field of constants," Math. Syst. Theory, vol. 10, no. 2, pp. 169-180, 1977.

[7] S. Winograd, "On the multiplicative complexity of the discrete Fourier transform," Adv. Math., vol. 32, pp. 83-117, May 1979.

[8] S. Winograd, "Signal processing and complexity of computation," in Proc. ICASSP'80 (Denver, CO), Apr. 9-11, 1980, pp. 94-101.

[9] H. J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms. Berlin: Springer, 1981.

[10] P. Duhamel and H. Hollmann, "On the existence of a $2^n$ FFT algorithm with a number of multiplications lower than $2^{n+1}$," Electron. Lett., vol. 20, no. 17, pp. 690-692, Aug. 1984.

[11] B. Mescheder, "On the number of active *-operations needed to compute the discrete Fourier transform," Acta Inform., vol. 13, no. 4, pp. 383-408, May 1980.

[12] L. Auslander and S. Winograd, "The multiplicative complexity of certain semilinear systems defined by polynomials," Adv. Applied Math., vol. 1, no. 3, pp. 257-299, 1980.

[13] L. Auslander, E. Feig, and S. Winograd, "The multiplicative complexity of the discrete Fourier transform," Adv. Applied Math., vol. 5, no. 5, pp. 87-109, 1984.

[14] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, no. 2, pp. 297-301, 1965.

[15] G. D. Bergland, "A fast Fourier transform algorithm for real valued series," IEEE Trans. Audio Electroacoust., vol. AU-20, no. 5, pp. 353-356, Dec. 1972.

[16] P. Duhamel, "Implementation of 'split-radix' FFT algorithm for complex, real, and real-symmetric data," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 2, pp. 285-295, Apr. 1986.

[17] R. N. Bracewell, "The fast Hartley transform," Proc. IEEE, vol. 72, no. 8, pp. 1010-1018, Aug. 1984.

[18] P. Duhamel and M. Vetterli, "Improved Fourier and Hartley transform algorithms: Application to cyclic convolution of real data," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 6, pp. 818-824, June 1987.

[19] C. H. Papadimitriou, "Optimality of the fast Fourier transform," J. Ass. Comput. Mach., vol. 26, no. 1, pp. 95-102, Jan. 1979.

[20] R. E. Blahut, Fast Algorithms for Signal Processing. Reading, MA: Addison-Wesley, 1986.

Pierre Duhamel (M'87-SM'87) was born in France in 1953. He received the Ingénieur degree in electrical engineering from the National Institute for Applied Sciences (INSA), Rennes, France, in 1975, the Dr.Ing. degree in 1978, and the Doctorat ès Sciences in 1986, both from Orsay University, France.

From 1975 to 1980, he was with Thomson-CSF, Paris, France, where his research interests were in circuit theory and signal processing, including digital filtering and analog fault diagnosis. In 1980, he joined the National Research Center in Telecommunications (CNET), Issy-les-Moulineaux, France, where his activities were first concerned with the design of recursive CCD filters. He is now working on fast convolution algorithms, and on the application of the same techniques to adaptive filtering, spectral analysis, and wavelet transforms. He is also an Associate Editor for IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING.
