1975 Number Theoretic Transforms to Implement Fast Digital Convolution


550 PROCEEDINGS OF THE IEEE, VOL. 63, NO. 4, APRIL 1975

Number Theoretic Transforms to Implement Fast Digital Convolution

Invited Paper

Abstract-Transforms using number theoretic concepts are developed as a method for fast and error-free calculation of finite digital convolution. The transforms are defined on finite fields and rings of integers with arithmetic carried out modulo an integer and it is shown that under certain conditions this gives the same results as conventional digital convolution. Because of these characteristics they are ideally suited to digital computation by taking into account quantization of amplitude as well as time in their definition. When the modulus is chosen as a Fermat number a transform results that requires only on the order of N log N additions and word shifts but no multiplications. In addition to being efficient, they have no roundoff error and do not require storage of basis functions. There is a restriction on sequence length imposed by word length and a problem with overflow but methods for overcoming these are presented. Results of an implementation on an IBM 370/155 are presented and compared with the fast Fourier transform showing a substantial improvement in efficiency and accuracy. Variations on the basic number theoretic transforms are also presented.

I. INTRODUCTION

FINITE DIGITAL convolution is a numerical procedure defined by

$$y(n) = \sum_{m=0}^{N-1} h(n-m)\,x(m), \qquad n = 0, 1, 2, \ldots \tag{1}$$

and symbolically denoted

$$y(n) = h(n) * x(n)$$

where $x(n)$, $h(n)$, and $y(n)$ are digital number sequences. This operation has many very powerful applications. It is used to implement nonrecursive or finite-impulse-response digital filters either directly or with sectioning or block techniques [1], [2] and recursive or infinite-impulse-response digital filters by block methods [3]. It also is used to carry out auto and cross correlation as well as for computations such as polynomial multiplication and multiplication of very large integers [4]-[6].

There are several methods to implement finite convolution that differ in the amount of computation required, the effects of arithmetic roundoff, and the amount of storage required. It is somewhat difficult to compare various algorithms because of the tradeoffs between these various factors that depend on the hardware or software that is available. However, because of the complexity of performing multiplication, the number of multiplications necessary to implement convolutions is often an important factor to be minimized.

Manuscript received September 5, 1974; revised October 15, 1974. This work was supported in part by the National Science Foundation under Grant GK-23697.
R. C. Agarwal was with the Department of Electrical Engineering, Rice University, Houston, Tex. He is now with the IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y. 10598.
C. S. Burrus is with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.

The use of transform methods has proven to be useful when an application allows sequences to be processed in blocks. The most versatile transform is the discrete Fourier transform (DFT) defined by

$$\mathrm{DFT}[x] \triangleq X(k) = \sum_{n=0}^{N-1} x(n)\exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1 \tag{2}$$

and the inverse transform

$$\mathrm{IDFT}[X] \triangleq x(n) = N^{-1}\sum_{k=0}^{N-1} X(k)\exp(j2\pi nk/N), \qquad n = 0, 1, \ldots, N-1. \tag{3}$$

The property of this transform that is important here is the cyclic convolution property (CCP) which states that

$$\mathrm{DFT}[h * x] = \mathrm{DFT}[h]\,\mathrm{DFT}[x].$$

This implies that a convolution can be calculated by

$$y(n) = \mathrm{IDFT}\{\mathrm{DFT}[h]\,\mathrm{DFT}[x]\} \tag{4}$$

using two transforms, $N$ multiplications, and one inverse transform. The convolution implemented by (4) is called cyclic convolution since it evaluates (1) as if $h(n)$ and $x(n)$ were periodically extended outside of the range from 0 to $(N - 1)$ or, equivalently, the indices are evaluated mod $N$. Normal finite convolution can be calculated by cyclic convolution if zeros are appended to $x(n)$ and $h(n)$ to prevent folding or aliasing [1], [2].
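The difference between cyclic and ordinary finite convolution can be illustrated with a short sketch. The code below is only an illustration in Python using direct summation (no transform); the function names `cyclic_convolve` and `linear_convolve` are our own and do not come from the paper.

```python
def cyclic_convolve(h, x):
    """Cyclic convolution of two equal-length sequences; indices wrap mod N."""
    n_len = len(x)
    if len(h) != n_len:
        raise ValueError("cyclic convolution needs equal-length sequences")
    return [
        sum(h[(n - m) % n_len] * x[m] for m in range(n_len))
        for n in range(n_len)
    ]


def linear_convolve(h, x):
    """Ordinary finite convolution: append zeros, then convolve cyclically."""
    out_len = len(h) + len(x) - 1
    h_pad = list(h) + [0] * (out_len - len(h))
    x_pad = list(x) + [0] * (out_len - len(x))
    return cyclic_convolve(h_pad, x_pad)


if __name__ == "__main__":
    print(cyclic_convolve([1, 1, 1], [1, 2, 3]))  # wraparound mixes all terms
    print(linear_convolve([1, 1, 1], [1, 2, 3]))  # zero padding prevents folding
```

With enough appended zeros the wrapped terms only ever multiply zeros, which is why the padded cyclic result agrees with the ordinary convolution.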

This transform approach became useful only when Cooley and Tukey [7] introduced a very efficient algorithm known as the fast Fourier transform (FFT) for calculating the DFT and its inverse in (2) and (3). The number of multiplications necessary to calculate the FFT of a number sequence of length $N$ is on the order of $N \log_2 N$. Implementation of convolution using the FFT results in a considerable savings in multiplication when lengths are approximately above $N = 32$. The disadvantage of this approach is in the form of significant amounts of roundoff error [8], storage or generation of the complex basis functions that have to be rounded, and still a considerable amount of multiplying.

If one looks for the properties that a general transform with the DFT structure

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{5}$$

must have to have the CCP, it is found [9], [10] that $\alpha$ is a root of unity of order $N$, i.e., $N$ is the least positive integer such that

$$\alpha^N = 1. \tag{6}$$

This analysis shows that in the complex number field, the conventional DFT with $\alpha = \exp(-j2\pi/N)$ is the only transform given by (5) with the CCP. If, however, other fields and arithmetic systems are used, new transforms become possible with very interesting properties. This is pursued by considering mathematical systems that are fundamentally compatible with digital computing capability.

In any practical situation, or when working with digital machines, the data are available only with some finite precision, and therefore, without loss of generality, the data can be considered to be integers with some upper bound. To compute convolution in this digital domain, operations in the complex number field of the continuous domain can be imitated in a finite field or, more generally, in a finite ring of integers under additions and multiplications modulo some integer $M$. An integer $\alpha$ of order $N$ replaces $\exp(-j2\pi/N)$ used in a DFT. In this ring, when two integer sequences $x(n)$ and $h(n)$ are convolved, the output integer sequence $y(n)$ is congruent to the conventional convolution of $x(n)$ and $h(n)$ mod $M$. In the ring of integers mod $M$, conventional integers can be unambiguously represented if their absolute value is less than $M/2$. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds $M/2$, we would get the same results by implementing convolution in the ring of integers mod $M$ as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known.
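The role of the bound $|y(n)| < M/2$ can be made concrete with a small sketch. It is only an illustration, assuming an odd modulus; the helper names `to_signed` and `cyclic_convolve_mod` are hypothetical and not from the paper. The convolution is carried out entirely mod $M$ and the residues are then mapped back to signed integers.

```python
def to_signed(residue, modulus):
    """Map a residue in 0..M-1 to the representative with |value| < M/2.

    Assumes an odd modulus, so the signed range is symmetric.
    """
    r = residue % modulus
    return r - modulus if r > modulus // 2 else r


def cyclic_convolve_mod(h, x, modulus):
    """Direct cyclic convolution with every output reduced mod M."""
    n_len = len(x)
    return [
        sum(h[(n - m) % n_len] * x[m] for m in range(n_len)) % modulus
        for n in range(n_len)
    ]


if __name__ == "__main__":
    M = 17
    residues = cyclic_convolve_mod([1, 2, 0, 0], [2, -2, 1, 0], M)
    print(residues, [to_signed(r, M) for r in residues])
    # When the true outputs exceed M/2 the amplitude "folds":
    folded = cyclic_convolve_mod([3, 3], [3, 3], M)   # true value is 18
    print(folded, [to_signed(r, M) for r in folded])
```

The second call shows the failure mode: the true outputs are 18, which is outside the representable range for $M = 17$, so the recovered values are wrong even though every step was exact.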

By working in a finite field or ring of integers with arithmetic carried out modulo an integer $M$, a large class of transforms exist that have the CCP. By special choices of the length $N$, the modulus $M$, and the value $\alpha$ it is possible to have transforms that need only word shifts and additions but no multiplications, that have an FFT type fast algorithm, that do not require storage of complex values for $\alpha$, and that have no roundoff errors. These transforms are called number theoretic transforms (NTT) and they look very promising in the evaluation of finite convolutions. Their main disadvantage seems to be a relation of the sequence length $N$ to the required word length that can require long word lengths for long sequence lengths.

These number theoretic transforms are truly digital transforms, taking into account the quantization in amplitude and the finite precision of digital signals. They bear the same relation to digital signals as the DFT does to discrete-time or sampled data signals and the Fourier or Laplace transforms do to continuous-time signals. In the same manner that the relation of discrete-time signals to continuous-time signals through sampling involves a possible folding or aliasing in the frequency domain, the relation of calculations with the DFT to calculations with the number theoretic transforms involves possible folding of the amplitude that must be taken into account.

The literature on transforms of these types is fairly recent. Knuth [4] has proposed the use of transforms in finite fields. Pollard [5] discussed transforms having the CCP in a finite field and also gives conditions for having transforms with the CCP in a finite ring of integers. Good [11] also mentioned the use of transforms in a finite ring of integers. Schonhage and Strassen [6] defined transforms having the CCP modulo a Fermat number and discussed their application to fast multiplication of very large integers. Knuth [12] elaborated on the work of Schonhage and Strassen. Nicholson [10] presented an algebraic theory of FFT's in any ring and established fast FFT-type algorithms to compute these transforms. Rader [13], [14] proposed number theoretic transforms in rings of integers modulo both Mersenne and Fermat numbers. He first proposed the application to digital signal processing, showed that the transforms could be calculated using only additions and bit shifting, showed the word length constraint, and suggested two-dimensional transforms as a possible relaxation of that constraint. Agarwal and Burrus [9], [15] discussed number theoretic transforms in detail, defined Fermat number transforms and also proposed their application for fast digital convolution. They also suggested possible hardware and software implementations. Their implementation on the IBM 370/155 showed a factor of 3 to 5 speed improvement over efficient FFT implementations of cyclic convolution for lengths up to 256. An earlier article by Takahasi and Ishibashi [30] was recently brought to our attention by Dr. J. W. Cooley of IBM.

II. MODULAR ARITHMETIC

In this section, some of the basic concepts of modular arithmetic from number theory relevant to NTT will be discussed. This can be found in most basic books on number theory [16], [17].

Two integers $a$ and $b$ are said to be congruent mod $M$ if

$$a = b + kM \tag{7}$$

where $k$ is some integer and $M$ is the modulus. This is written as

$$a \equiv b \pmod{M}. \tag{8}$$

All integers are congruent mod $M$ to some integer in the finite set $\{0, 1, 2, \ldots, M-1\}$ which is called the set of integers mod $M$ and denoted by $Z_M$. $Z_M$ is also known as the ring of integers mod $M$. If in a ring of integers multiplicative inverses exist for all nonzero integers, this ring becomes a field and it can be shown that $Z_M$ is a field if and only if $M$ is a prime. We will use the symbol "$Z_M$" and the expression "the ring of integers mod $M$" for rings as well as fields since a field is also a ring. The following basic arithmetic operations are permissible with modular arithmetic.

Addition: Example, $7 + 12 = 19 \equiv 2 \pmod{17}$.

Negation: Example, $-7 \equiv -7 + 17 = 10 \pmod{17}$.

Subtraction: Example, $7 - 12 = 7 + (-12) \equiv 7 + 5 = 12 \pmod{17}$.

Multiplication: Example, $7 \times 12 = 84 \equiv 16 \pmod{17}$.

Multiplicative Inverse: The multiplicative inverse of an integer $b$ in $Z_M$ exists if and only if $b$ and $M$ are relatively prime. In that case $b^{-1}$ is an integer such that $b \times b^{-1} \equiv 1 \pmod{M}$. Example, $7^{-1} \equiv 5 \pmod{17}$; $7 \times 5 = 35 \equiv 1 \pmod{17}$.

Division: $a/b$ exists if and only if $b$ has an inverse. In that case $a/b = a \times b^{-1}$. Example, $12/7 \equiv 12 \times 5 \equiv 9 \pmod{17}$; $7 \times 9 \equiv 12 \pmod{17}$.
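The six operations above map directly onto integer arithmetic in most programming languages. The following is a minimal sketch in Python (version 3.8 or later is assumed, since it relies on the built-in `pow` accepting an exponent of -1 to compute a modular inverse); the variable names are ours.

```python
M = 17

addition = (7 + 12) % M          # 19 reduced mod 17
negation = (-7) % M              # Python's % always returns a value in 0..M-1
subtraction = (7 - 12) % M
multiplication = (7 * 12) % M    # 84 reduced mod 17
inverse_of_7 = pow(7, -1, M)     # exists because gcd(7, 17) = 1
division = (12 * inverse_of_7) % M

if __name__ == "__main__":
    print(addition, negation, subtraction, multiplication, inverse_of_7, division)
    # Check the division: multiplying the quotient by 7 must give back 12.
    print((7 * division) % M)
```

When the inverse does not exist, for example `pow(6, -1, 15)`, Python raises `ValueError`, which mirrors the condition that $b$ and $M$ must be relatively prime.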

This may seem like a rather peculiar way to do arithmetic but it is used quite often by everyone. In discussing the day of the week, one uses an arithmetic mod 7 or in stating the time, one is calculating mod 12 or perhaps 24. Indeed the mantissa of a number in scientific notation is evaluated mod 10.



Because of the nature of modular arithmetic, numbers do not have sizes or magnitude. We cannot say that a particular number is larger than another or that two numbers are close. Tuesday may not be close to Wednesday or come before Wednesday if they occur in different weeks.

As was mentioned in the introduction, for the existence of transforms with the DFT structure given in (5) and having the CCP, it is necessary that an integer exist that is the $N$th root of unity. We will now consider this problem using modular arithmetic. First Euler's $\varphi$ function is defined as $\varphi(M)$, the number of integers in $Z_M$ that are relatively prime to $M$. For $M$ a prime, $\varphi(M) = M - 1$. If $M$ is composite and its prime factored form is denoted by $M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}$ then the general expression for $\varphi$ is

$$\varphi(M) = M(1 - 1/p_1)(1 - 1/p_2) \cdots (1 - 1/p_l).$$

An important theorem known as Euler's theorem states that for every $\alpha$ relatively prime to $M$

$$\alpha^{\varphi(M)} \equiv 1 \pmod{M}. \tag{9}$$

For $M$ prime this reduces to Fermat's theorem

$$\alpha^{M-1} \equiv 1 \pmod{M} \tag{10}$$

which holds for all nonzero elements of $Z_M$ since they are all relatively prime to $M$ if $M$ is prime.
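Euler's function and Euler's theorem are easy to check numerically for small moduli. The sketch below counts the integers in $Z_M$ that are relatively prime to $M$ (a brute-force reading of the definition, not an efficient method) and then tests (9); the function names are our own.

```python
from math import gcd


def euler_phi(m):
    """Number of integers in {0, 1, ..., m-1} that are relatively prime to m."""
    return sum(1 for a in range(m) if gcd(a, m) == 1)


def euler_theorem_holds(m):
    """Check a**phi(m) == 1 (mod m) for every a relatively prime to m."""
    phi = euler_phi(m)
    return all(pow(a, phi, m) == 1 for a in range(1, m) if gcd(a, m) == 1)


if __name__ == "__main__":
    print([euler_phi(m) for m in range(1, 8)])
    print(all(euler_theorem_holds(m) for m in (7, 15, 17, 257)))
```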

There are certain roots of unity that are of particular interest. If $N$ is the least positive integer such that

$$\alpha^N \equiv 1 \pmod{M} \tag{11}$$

then $\alpha$ is said to be a root of unity of order $N$, or simply of order $N$. In some of the literature $\alpha$ is said to belong to the exponent $N$ or $N$ is the exponent to which $\alpha$ belongs. Another terminology says $\alpha$ is a primitive $N$th root of unity.

If the order of $\alpha$ (the exponent to which $\alpha$ belongs) is equal to $\varphi(M)$, then $\alpha$ is called a primitive root (do not confuse with a primitive $N$th root of unity). If $M$ is prime and $\alpha$ is a primitive root, the set of integers $\alpha^k \pmod{M}$, $k = 0, 1, 2, \ldots, M-2$, is the total set of nonzero elements in $Z_M$. Thus all nonzero integers in $Z_M$ can be generated by powers of a primitive root. This characterizes the entire field.

Euler's theorem implies that if $\alpha$ is of order $N$ then $N$ must divide $\varphi(M)$, denoted by $N \mid \varphi(M)$. If $M$ is prime it can be shown that roots of order $N$ exist if and only if $N \mid (M - 1)$ and the roots are given by

$$\alpha = \alpha_p^{(M-1)/N} \tag{12}$$

where $\alpha_p$ denotes a primitive root. More generally, if $\alpha$ is a root of order $N$ then

$$\begin{aligned}
&\alpha^k \text{ is of order } N/k \text{ if } k \mid N \\
&\alpha^k \text{ is of order } N \text{ if } N \text{ and } k \text{ are relatively prime.}
\end{aligned} \tag{13}$$

This implies the number of roots of order $N$ is given by $\varphi(N)$ and, therefore, the number of primitive roots is $\varphi(\varphi(M))$. These relations will allow one to calculate all of the roots of all possible orders from one primitive root. Tables will often list primes and the smallest primitive root for each.

These ideas will become clearer by looking at an example. Consider the field $Z_7$ with arithmetic mod 7. First we will give the first few evaluations of Euler's function.

$$\varphi(1) = 1 \quad \varphi(2) = 1 \quad \varphi(3) = 2 \quad \varphi(4) = 2 \quad \varphi(5) = 4 \quad \varphi(6) = 2 \quad \varphi(7) = 6. \tag{14}$$

Consider raising each element of $Z_7$ to powers from 1 to 6 (mod 7).

  N  =  0  1  2  3  4  5  6
 1^N =  1  1  1  1  1  1  1
 2^N =  1  2  4  1  2  4  1
 3^N =  1  3  2  6  4  5  1
 4^N =  1  4  2  1  4  2  1
 5^N =  1  5  4  6  2  3  1
 6^N =  1  6  1  6  1  6  1

This illustrates several very interesting features. Consider the various roots of order $N$.

  N    Roots of order N
  1    1
  2    6
  3    2, 4
  6    3, 5

Only those $N$ that divide $\varphi(M) = \varphi(7) = 6$ have roots that belong to them. The number of roots is given by $\varphi(N)$ and the number of primitive roots is $\varphi(\varphi(M)) = 2$ and they are 3 and 5. Note that both of the primitive roots generate all the nonzero elements of the field while the other roots generate cyclic subsets with $N$ distinct members. Also note that Euler's theorem (Fermat's theorem in this case) is indeed satisfied in that all elements raised to the 6th power are congruent to unity and (13) does generate all the roots of order $N$ from the primitive roots. Also note that every nonzero integer $\alpha$ has an inverse $\alpha^{M-2}$. For a nonprime $M$, $\alpha$ has an inverse given by $\alpha^{\varphi(M)-1}$ if $\alpha$ and $M$ are relatively prime.
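The table of roots for $Z_7$ can be regenerated mechanically. The following sketch finds the order of each element by repeated multiplication and groups the elements by order; `multiplicative_order` and `roots_by_order` are illustrative names of our own, and the brute-force search is only sensible for small moduli.

```python
from math import gcd


def multiplicative_order(a, m):
    """Least positive N with a**N == 1 (mod m), or None if gcd(a, m) != 1."""
    if gcd(a, m) != 1:
        return None
    power, order = a % m, 1
    while power != 1:
        power = (power * a) % m
        order += 1
    return order


def roots_by_order(m):
    """Group the invertible elements of Z_m by their order."""
    table = {}
    for a in range(1, m):
        order = multiplicative_order(a, m)
        if order is not None:
            table.setdefault(order, []).append(a)
    return table


if __name__ == "__main__":
    for order, roots in sorted(roots_by_order(7).items()):
        print(order, roots)
    # Inverse of each nonzero element of Z_7 as the (M-2)th power.
    print([(a, pow(a, 7 - 2, 7)) for a in range(1, 7)])
```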

By considering a similar example with $M$ a composite rather than a prime, one observes several differences. First $Z_M$ is not a field since all elements will not have inverses. There is no primitive root that will generate the entire ring, only subsets with $\varphi(M)$ elements.

When considering a nonprime modulus $M$, $Z_M$ is a ring and inverses exist only for integers relatively prime to $M$. Let $M$ have the following unique prime power factorization.

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}. \tag{15}$$

When the arithmetic is done mod $M$, it is in effect done modulo each prime power $p_i^{r_i}$ simultaneously [4], [18]. A set of arithmetic operations can be done either modulo each $p_i^{r_i}$ separately and the final result mod $M$ obtained using the Chinese remainder theorem [4], [16], [18], or alternatively all the operations may be done mod $M$, but they must be valid operations mod each $p_i^{r_i}$. An integer $\alpha$ is said to be of order $N$ in $Z_M$ if and only if it is of order $N$ in each $Z_{p_i^{r_i}}$. Here we present some basic results.

$$a \equiv b \pmod{M} \tag{16}$$

is true if and only if

$$a \equiv b \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{17}$$

If we know the residues of an integer $a$ modulo each $p_i^{r_i}$, we can uniquely reconstruct the integer $a \pmod{M}$ using the Chinese remainder theorem given in the following.

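Let

$$M_i = M / p_i^{r_i}, \qquad i = 1, 2, \ldots, l$$

and let $d_i$ be the inverse of $M_i$ modulo $p_i^{r_i}$,

$$M_i d_i \equiv 1 \pmod{p_i^{r_i}},$$

then

$$a \equiv \sum_{i=1}^{l} M_i d_i \,(a \bmod p_i^{r_i}) \pmod{M}. \tag{18}$$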
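The reconstruction of an integer from its residues can be sketched as follows. This is the standard Chinese remainder construction written in Python (3.8 or later, for `math.prod` and the modular inverse via `pow`); the names `crt_reconstruct`, `m_i`, and `d_i` are our own notation and may differ from the symbols used in the paper.

```python
from math import prod


def crt_reconstruct(residues, moduli):
    """Recover a (mod M) from its residues modulo pairwise coprime moduli.

    `moduli` plays the role of the prime powers p_i**r_i and M is their
    product. Each term uses m_i = M / modulus and d_i = m_i**(-1) mod modulus.
    """
    big_m = prod(moduli)
    total = 0
    for residue, modulus in zip(residues, moduli):
        m_i = big_m // modulus
        d_i = pow(m_i, -1, modulus)   # exists because the moduli are coprime
        total += residue * m_i * d_i
    return total % big_m


if __name__ == "__main__":
    moduli = [4, 9, 5]                # M = 180
    a = 97
    residues = [a % m for m in moduli]
    print(residues, crt_reconstruct(residues, moduli))
```

Each term $m_i d_i$ is congruent to 1 modulo its own modulus and to 0 modulo all the others, which is why the sum reproduces every residue at once.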

III. NUMBER THEORETIC TRANSFORMS

In this section, the definition and basic conditions for the existence of the NTT will be presented and in particular the allowed relations between the modulus $M$ and the transform length $N$ and basis function $\alpha$ are spelled out.

If we have a length $N$ sequence of numbers, then a transform pair of the form given by

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{19}$$

$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \tag{20}$$

is said to have a DFT structure. By requiring that application of the transform method in (4) results in cyclic convolution, the following theorem can be proven.

Theorem 1: A length $N$ transform having the DFT structure will implement cyclic convolution if and only if there exists an inverse of $N$ and an element $\alpha$, a root of unity of order $N$, i.e., $N$ is the least positive integer such that $\alpha^N = 1$.

This is a very general result applying to both rings and fields that are finite or infinite and it has been developed from a variety of points of view [5], [9], [10]. In addition to the CCP, transforms of this type also allow fast computation algorithms of the FFT type when $N$ is highly composite [10].
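The transform pair (19), (20) and the cyclic convolution property can be tried out directly for a small prime modulus. The sketch below evaluates the sums in $O(N^2)$ operations (no fast algorithm) and is only an illustration; the function names are ours, and the example parameters $M = 7$, $N = 6$, $\alpha = 3$ are chosen because 3 is a primitive root of 7 and 6 has an inverse mod 7.

```python
def ntt(x, alpha, modulus):
    """Forward transform (19): X(k) = sum_n x(n) * alpha**(n*k) mod M."""
    n_len = len(x)
    return [
        sum(x[n] * pow(alpha, n * k, modulus) for n in range(n_len)) % modulus
        for k in range(n_len)
    ]


def intt(big_x, alpha, modulus):
    """Inverse transform (20); needs inverses of N and alpha mod M."""
    n_len = len(big_x)
    n_inv = pow(n_len, -1, modulus)
    alpha_inv = pow(alpha, -1, modulus)
    return [
        n_inv * sum(big_x[k] * pow(alpha_inv, n * k, modulus)
                    for k in range(n_len)) % modulus
        for n in range(n_len)
    ]


def convolve_via_ntt(h, x, alpha, modulus):
    """Cyclic convolution mod M via transform, pointwise product, inverse."""
    big_h = ntt(h, alpha, modulus)
    big_x = ntt(x, alpha, modulus)
    product = [(a * b) % modulus for a, b in zip(big_h, big_x)]
    return intt(product, alpha, modulus)


if __name__ == "__main__":
    M, ALPHA = 7, 3                      # 3 has order 6 in Z_7
    x = [1, 2, 3, 4, 5, 6]
    h = [2, 0, 1, 0, 0, 3]
    print(intt(ntt(x, ALPHA, M), ALPHA, M))
    print(convolve_via_ntt(h, x, ALPHA, M))
```

If $\alpha$ is replaced by an element whose order is smaller than $N$ (for example 2, which has order 3 in $Z_7$), the inverse no longer recovers the input, in line with the "order $N$" requirement of Theorem 1.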

Theorem 1 is somewhat difficult to use when investigating various possible moduli with modular arithmetic, so an alternate set of conditions will be developed. Let $Z_M$ represent the ring of integers $\{0, 1, 2, \ldots, M-1\}$ with arithmetic carried out mod $M$. Let $M$ have the following unique prime power factorization

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l} \tag{21}$$

where the $p_i$'s are distinct primes. As pointed out in Section II, when we carry out arithmetic mod $M$, we are in effect doing it modulo each $p_i^{r_i}$ simultaneously.

Therefore, the length $N$ number theoretic transform having the CCP in $Z_M$ must also have the CCP in $Z_{p_i^{r_i}}$ for $i = 1, 2, \ldots, l$. This requires that an integer $\alpha \pmod{p_i^{r_i}}$ of order $N$ must exist in $Z_{p_i^{r_i}}$, i.e., $N$ is the least positive integer such that

$$\alpha^N \equiv 1 \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{22}$$

Furthermore, since the inverse transform requires $N^{-1}$, the inverse of $N$ should exist in $Z_M$, or, $N$ should be relatively prime to $M$. Now we investigate the existence of an $\alpha$ of order $N$ in each $Z_{p_i^{r_i}}$. By Euler's theorem (9) and (22), we have

$$N \mid \varphi(p_i^{r_i}), \qquad i = 1, 2, \ldots, l \tag{23}$$

or, since $\varphi(p_i^{r_i}) = p_i^{r_i} - p_i^{r_i - 1} = p_i^{r_i - 1}(p_i - 1)$ and $N$ is relatively prime to $M$ (or the $p_i$'s),

$$N \mid (p_i - 1), \qquad i = 1, 2, \ldots, l$$

$$N \mid \gcd(p_1 - 1, p_2 - 1, \ldots, p_l - 1).$$

We define $O(M)$ as the greatest common divisor (gcd) of the $(p_i - 1)$

$$O(M) \triangleq \gcd\{p_1 - 1, p_2 - 1, \ldots, p_l - 1\}. \tag{24}$$

Therefore,

$$N \mid O(M). \tag{25}$$

Equation (25) gives the necessary condition for the existence of a transform of length $N$ in the arithmetic mod $M$. Now consider the converse of it. If $N \mid O(M)$ or $N \mid \varphi(p_i^{r_i})$, then there exist integers $\alpha_i \pmod{p_i^{r_i}}$ of order $N$ in $Z_{p_i^{r_i}}$. Using these $\alpha_i$ we can construct transforms (mod $p_i^{r_i}$) which have the DFT structure of (19) and are invertible. Combining these transforms by the Chinese remainder theorem (18) one can obtain a transform (mod $M$) having the CCP in $Z_M$. Alternatively, one can combine the $\alpha_i$'s by the Chinese remainder theorem to obtain an $\alpha$ (mod $M$) of order $N$ in $Z_M$ and construct the final transform using this $\alpha$. The results will be identical. Therefore, (25) is the necessary and sufficient condition for the existence of an invertible transform of length $N$ which has the CCP mod $M$. This is stated in the form of a theorem [9], [15].

Theorem 2: A length $N$ transform having the DFT structure will implement cyclic convolution mod $M$ if and only if

$$N \mid O(M). \tag{26}$$

This also establishes the maximum transform length in $Z_M$ as $N_{\max} = O(M)$. This is a very important theorem that states exactly what the possible transform lengths for a given modulus are.
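Theorem 2 reduces the search for admissible lengths to a gcd computation over the prime factors of $M$. The sketch below computes $O(M)$ from (24) using trial division, which is adequate only for moduli with small factors; the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def distinct_prime_factors(m):
    """Distinct prime factors of m >= 2 by trial division."""
    factors = []
    p = 2
    while p * p <= m:
        if m % p == 0:
            factors.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        factors.append(m)
    return factors


def max_transform_length(m):
    """O(M) = gcd of (p_i - 1) over the distinct prime factors p_i of M."""
    return reduce(gcd, (p - 1 for p in distinct_prime_factors(m)))


if __name__ == "__main__":
    for modulus in (17, 15, 10, 127, 2**32 + 1):
        print(modulus, max_transform_length(modulus))
```

Every admissible transform length is a divisor of the value returned, so an even modulus (which contributes the factor $p - 1 = 1$) allows only the trivial length 1.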

Although both theorems as stated here assume the DFT structure of (19), they hold for any general transform having the CCP [10]. It is possible in a ring with a composite modulus to have a transform with the CCP but not the DFT structure.¹ Transforms of this sort do not allow an FFT-type fast algorithm and, therefore, do not seem promising.

For number theoretic transforms to be attractive in comparison to other implementations of convolution, they should be computationally efficient. There are three requirements that will be considered. First, $N$ should be highly composite (preferably a power of 2) for a fast FFT-type algorithm to exist and should be large enough for practical sequence lengths. Second, since complex multiplications take most of the computational effort in calculating the FFT, it is important that the multiplication by powers of $\alpha$ be a simple operation. This is possible if the powers of $\alpha$ have binary representations with very few bits; preferably $\alpha$ should also be a power of two, where multiplication by a power of $\alpha$ reduces to a word shift. Third, in order to facilitate arithmetic mod $M$, $M$ should also have a binary representation with a very few bits and should be large enough to prevent overflow.
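To see why a power-of-two $\alpha$ and a modulus with few bits make multiplication cheap, consider a modulus of the form $2^b + 1$. Because $2^b \equiv -1$, the bits that a shift pushes past position $b$ can simply be subtracted from the low part. The sketch below is one possible way to write this in Python and is not taken from the paper; the name `mul_pow2_mod` and the input convention (operands in $0 \ldots 2^b$) are assumptions.

```python
def mul_pow2_mod(x, shift, b):
    """Return x * 2**shift mod (2**b + 1) using shifts, masks, add/subtract.

    Assumes 0 <= x <= 2**b. The only '%' is applied to the shift count,
    using the fact that 2 has order 2b modulo 2**b + 1.
    """
    modulus = (1 << b) + 1
    shift %= 2 * b
    if shift >= b:
        # 2**b == -1 (mod 2**b + 1): take out one factor of -1.
        x = modulus - x if x else 0
        shift -= b
    shifted = x << shift
    low = shifted & ((1 << b) - 1)   # bits below position b
    high = shifted >> b              # bits at or above position b
    result = low - high              # each high unit is worth -1
    if result < 0:
        result += modulus
    return result


if __name__ == "__main__":
    b = 4                             # modulus 17
    print([mul_pow2_mod(7, s, b) for s in range(8)])
    print([(7 * 2**s) % 17 for s in range(8)])
```

The two printed lists should agree, showing that the shift-and-subtract path gives the same residues as a general multiplication followed by a reduction.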

Although the class of all possible number theoretic transforms seems very large at first consideration, closer examination shows that very few seem to satisfy the aforementioned criteria. The parameters that must be chosen are $M$, $N$, and $\alpha$. Unfortunately the conditions given by Theorems 1 and 2 do not give a systematic way of determining the "best" choices. As a result one must use intuition, insight, and a bit of searching. Usually an $M$ is selected and the resulting possible $N$ and $\alpha$ are then examined.

¹An example of a transform in a ring of integers $Z_M$ with composite modulus having the CCP but not the DFT structure was produced by G. Kopec at M.I.T. and led to this observation.

First we see that if $M$ is even, it has a factor of 2 and, therefore, $O(M)$ and $N_{\max}$ are 1 which implies $M$ should be odd. If $M$ is a prime then $O(M) = M - 1$ which is as large as one could hope for in a field of $M$ integers. For $M = 2^k - 1$, let $k$ be a composite $PQ$, where $P$ is prime. Then $2^P - 1$ divides $2^{PQ} - 1$ and the maximum possible length of the transform will be governed by the length possible for $2^P - 1$. Therefore, only the prime $k$ need to be considered interesting. Numbers of this form are known as Mersenne numbers and Rader [14] has discussed convolution using Mersenne numbers in detail. For Mersenne number transforms, it can be shown that transforms of length at least $2P$ exist and the corresponding $\alpha$ is $-2$. Mersenne number transforms are not of as much interest because $2P$ is not highly composite and, therefore, we do not have fast FFT-type computational algorithms.
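The orders quoted for Mersenne moduli can be confirmed by brute force for small primes $P$. In the sketch below (our own helper names, small cases only) the order of 2 modulo $2^P - 1$ comes out as $P$ and the order of $-2$ as $2P$.

```python
def multiplicative_order(a, m):
    """Least positive N with a**N == 1 (mod m); assumes gcd(a, m) == 1."""
    power, order = a % m, 1
    while power != 1:
        power = (power * a) % m
        order += 1
    return order


def mersenne_orders(p):
    """Orders of 2 and -2 modulo the Mersenne number 2**p - 1."""
    modulus = (1 << p) - 1
    return multiplicative_order(2, modulus), multiplicative_order(-2, modulus)


if __name__ == "__main__":
    for p in (3, 5, 7, 13):
        print(p, (1 << p) - 1, mersenne_orders(p))
```

Since $2P$ has only the factors 2 and $P$, the transform length cannot be split into many small stages, which is the obstacle to an FFT-type algorithm noted in the text.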

For $M = 2^k + 1$ and $k$ odd, 3 divides $2^k + 1$ and the largest possible transform length is 2, thus we consider only $k$ even. Let $k$ be $s2^t$, where $s$ is an odd integer. Then $2^{2^t} + 1$ divides $2^{s2^t} + 1$ and the length of the possible transform will be governed by the length possible for $2^{2^t} + 1$. Therefore, integers of the form $M = 2^{2^t} + 1$ are of interest. These numbers are known as Fermat numbers and will be discussed in detail in this paper. Fermat numbers seem to be optimum in the sense of having transforms whose length is interesting while the word size is moderate. Numbers of the form $2^k + 1$ are also of limited interest and are discussed in Section IX. A systematic investigation of those $M$ which require more than two bit representation is difficult. Our preliminary investigation in that direction has not been very encouraging.

IV. FERMAT NUMBER TRANSFORMS

In this section, we consider one of the most promising number theoretic transforms where the modulus is chosen to be a Fermat number

$$M = F_t = 2^b + 1, \qquad b = 2^t \tag{27}$$

and $F_t$ is called the $t$th Fermat number. Originally, Fermat conjectured [16] that these numbers were all prime but unfortunately not only was the conjecture wrong, it seems that only $F_0$ through $F_4$ are prime and all the others are composite. The first few values are:

  F_0 = 3
  F_1 = 5
  F_2 = 17
  F_3 = 257
  F_4 = 65 537            (F_0 through F_4 are prime)
  F_5 = 4 294 967 297 = 641 × 6 700 417
  F_6 ≈ 1.84 × 10^19 = 274 177 × 67 280 421 310 721.

Number theoretic transforms with a Fermat number as a modulus are called Fermat number transforms (FNT). As discussed in the last section, for the FNT of length $N$ to exist, $N$ must divide $O(F_t) = N_{\max}$. Now, we consider transform lengths possible in arithmetic modulo various Fermat numbers and also give the corresponding values of $\alpha$.

Since Fermat numbers up to $F_4$ are prime, $O(F_t) = 2^b$, and we can have an FNT for any length $N = 2^m$, $m \le b$. For these Fermat primes the integer 3 is an $\alpha$ of order $N = 2^b$, allowing the largest possible transform length. There are $2^{b-1} - 1$ other integers also which are of order $2^b$ and can be obtained from (13). The integer 2 is of order $N = 2b = 2^{t+1}$. If $\alpha$ is taken as 2 or a power of 2, all the powers of $\alpha$ would be some powers of 2, and for these cases, as discussed in the last section and in [14], the FNT can be computed very efficiently and is called the Rader transform (RT).

To better see the character of these prime moduli consider an example for $F_2$ similar in manner to that in Section II. If the modulus is $M = F_2 = 17$ then

  N  =  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 2^N =  1  2  4  8 16 15 13  9  1  2  4  8 16 15 13  9  1
 3^N =  1  3  9 10 13  5 15 11 16 14  8  7  4 12  2  6  1
 4^N =  1  4 16 13  1  4 16 13  1  4 16 13  1  4 16 13  1
 6^N =  1  6  2 12  4  7  8 14 16 11 15  5 13 10  9  3  1

Here we see that 3 and 6 are primitive roots that will generate the entire field $Z_{17}$. The value 2 is of order 8 and 4 is of order 4. Also note that $6 = \sqrt{2}$ in the sense that $6^2 \equiv 2 \pmod{17}$.

For digital filtering applications, the composites $F_5$ ($b = 32$) and $F_6$ ($b = 64$) also seem to be practical. Lucas [19] has proven that every prime factor of a composite $F_t$ is of the form $K2^{t+2} + 1$. Therefore, $2^{t+2}$ divides $O(F_t)$, for $t > 4$. In particular it can be verified that for $F_5$ and $F_6$, $O(F_t) = 2^{t+2}$. Therefore, for these choices of Fermat numbers, the maximum possible transform length is $N = 2^{t+2} = 4b$. Also, we assert that $\alpha_{4b}$ given by (28) is of order $4b$ in $Z_{F_t}$, $t \ge 2$.

$$\alpha_{4b} = 2^{b/4}\,(2^{b/2} - 1) \tag{28}$$

We denote this $\alpha_{4b}$ as $\sqrt{2}$ because

$$\alpha_{4b}^2 \equiv 2 \pmod{F_t}.$$

The proof that $\alpha_{4b}$ given by (28) is of order $4b$ with respect to any factor of $F_t$ is given in [9]. Any odd power of $\sqrt{2}$ will also be of order $2^{t+2}$. By raising $\sqrt{2}$ to the $(2^{t+2-m})$th power, we obtain an integer $\alpha$ of order $2^m$, $m \le t + 2$. Table I below gives values of $N$ for the two most important values of $\alpha$ and also gives the maximum possible $N$ for the most practical values of $b$.

For FNTs with a prime or composite modulus we see $\alpha = 2$ or a power of 2 is possible for sequence lengths up to $N = 2b = 2^{t+1}$. This is a very desirable situation since $N$ is highly composite allowing an FFT type algorithm and all multiplications by powers of $\alpha$ are simple word shifts. If $\alpha = \sqrt{2}$ is used then sequences of length $N = 4b = 2^{t+2}$ are possible but one stage in the FFT algorithm will require two shifts [9]. This $\alpha = \sqrt{2}$ and the resulting $N = 4b$ give the maximum length possible for $F_5$ and $F_6$, however, for prime $F_t$, further increases in $N$ are possible up to $N = 2^b$ if more stages of the FFT algorithm are allowed to have multiplication rather than simple word shifts. From this example it is seen that $\alpha = \sqrt{2} = 6$ for $M = F_2$ gives the maximum possible $N$.
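The claims about the orders of 2, 3, and $\sqrt{2}$ modulo Fermat numbers can be spot-checked with a few modular exponentiations. The sketch below is our own check, not the proof referenced in [9]; it relies on the observation that if $\alpha^{N/2} \equiv -1 \pmod{F_t}$ with $N$ a power of two, then $\alpha$ has order exactly $N$ modulo $F_t$ and modulo each of its factors. All function names are hypothetical.

```python
def fermat_number(t):
    """F_t = 2**b + 1 with b = 2**t."""
    return (1 << (1 << t)) + 1


def sqrt_of_two(t):
    """The integer of (28): 2**(b/4) * (2**(b/2) - 1), valid for t >= 2."""
    b = 1 << t
    return (1 << (b // 4)) * ((1 << (b // 2)) - 1)


def has_power_of_two_order(alpha, order, modulus):
    """True when alpha**(order/2) == -1 (mod modulus).

    For `order` a power of two this implies alpha has order exactly `order`.
    """
    return pow(alpha, order // 2, modulus) == modulus - 1


def check_fermat_claims(t):
    """Return (sqrt2 squared is 2, 2 has order 2b, sqrt2 has order 4b)."""
    b = 1 << t
    f_t = fermat_number(t)
    root = sqrt_of_two(t)
    return (
        (root * root) % f_t == 2,
        has_power_of_two_order(2, 2 * b, f_t),
        has_power_of_two_order(root, 4 * b, f_t),
    )


if __name__ == "__main__":
    for t in range(2, 7):
        print(t, sqrt_of_two(t), check_fermat_claims(t))
    # For the Fermat primes F_1..F_4 the integer 3 has the full order 2**b.
    print([has_power_of_two_order(3, 1 << (1 << t), fermat_number(t))
           for t in range(1, 5)])
```

For $t = 2$ the formula gives $2^1(2^2 - 1) = 6$, matching the value of $\sqrt{2}$ read off the table for $F_2 = 17$.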


TABLE I
PARAMETERS FOR SEVERAL POSSIBLE IMPLEMENTATIONS OF FNT's

  t    b    F_t         N for α = 2    N for α = √2    N_max    α for N_max
  3    8    2^8 + 1          16             32            256        3
  4   16    2^16 + 1         32             64          65536        3
  5   32    2^32 + 1         64            128            128        √2
  6   64    2^64 + 1        128            256            256        √2

The α = 2 case corresponds to the Rader transform.

Because of the nature of modular arithmetic discussed in Section II, the FNT coefficients do not seem to have any physical meaning. Although the signal for which the FNT is being taken may be very small, its FNT coefficients may lie anywhere between 0 and $F_t - 1$. This is because the concept of magnitude does not exist. This also means that the concept of "closeness" of two numbers does not exist in the modular arithmetic. Therefore, approximations or roundings are not allowed in the modular arithmetic. A seemingly small approximation in the transform domain may introduce serious error in the final result. But, because of the nature of the modular arithmetic, there is no need for approximation. During various stages of the computation each accumulation of signal "overflows" many times. But still the end result of the convolution will be exact if the input signals are properly bounded. Some of the properties of the FNT's are given in [9, appendix A].

Example

T o make th e ideas of this sectio n more clear, we now present

an example. This examp le will illustrate several poin ts: trea t-

ment of negative values in the data, the structure of the trans-

form and the inverse transform matrix, negative powers of a

frequen t “overflow” during com puta tion , meaninglessness of

the transform values, and exactness of the final answer. This

example will not demonstrate the efficient implementation

of

the FNT using the binary arithmetic.

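Consider two sequences $x = (2, -2, 1, 0)$ and $h = (1, 2, 0, 0)$, whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2 = 17$. We want $N = 4$; for $F_2$ the integer 2 is of order 8, therefore $2^2 = 4$ is an $\alpha$ of order 4. The transformation matrix $T$ is given by

$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \pmod{17}.$$

Since $4^{-1} = -4 \pmod{17}$, the inverse transformation matrix $T^{-1}$ is given by

$$T^{-1} = 4^{-1} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4^{-1} & 4^{-2} & 4^{-3} \\ 1 & 4^{-2} & 4^{-4} & 4^{-6} \\ 1 & 4^{-3} & 4^{-6} & 4^{-9} \end{bmatrix} = -4 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = 13 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix} \pmod{17}.$$

The transforms of $x$ and $h$ are given by

$$X = T x = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 15 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \\ 5 \\ 9 \end{bmatrix} \pmod{17}.$$

Note that in $x$, $-2$ was represented by $-2 + 17 = 15$. Similarly, $H = (3, 9, 16, 10)$ and the term-by-term product is

$$Y = X \cdot H = (3, 90, 80, 90) = (3, 5, 12, 5) \pmod{17}.$$

Taking the inverse transform of $Y$,

$$y = (2, 2, 14, 2) \pmod{17}.$$

According to our assumption, integers are supposed to lie between $-8$ and $8$. Therefore, 14 must be represented as $14 - 17 = -3$. This gives $y = (2, 2, -3, 2)$, which is the correct answer.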

Also, note that $y$ is a symmetric sequence; therefore, $Y$ is also a symmetric sequence. Other than this, the transform values seem to have no interpretation.
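The worked example can be replayed numerically. The following is a minimal sketch, not the efficient shift-based implementation: it evaluates the length-4 transform directly in $O(N^2)$ operations with $\alpha = 4$ modulo 17. The function names (`fnt`, `inverse_fnt`, `to_signed`) are illustrative choices, and `pow(a, -1, m)` assumes Python 3.8 or later.

```python
M = 17        # modulus F_2
N = 4         # transform length
ALPHA = 4     # 2**2, an element of order 4 modulo 17

def fnt(seq, alpha=ALPHA):
    """Direct transform X(k) = sum_n x(n) * alpha**(n*k) mod M."""
    return [sum(v * pow(alpha, n * k, M) for n, v in enumerate(seq)) % M
            for k in range(N)]

def inverse_fnt(seq):
    """Inverse transform: same sum with alpha**-1, scaled by N**-1."""
    n_inv = pow(N, -1, M)          # 4**-1 = 13 = -4 (mod 17)
    alpha_inv = pow(ALPHA, -1, M)  # also 13
    return [(n_inv * v) % M for v in fnt(seq, alpha_inv)]

def to_signed(v):
    """Map a residue 0..16 to the signed range -8..8."""
    return v - M if v > M // 2 else v

x = [2, -2, 1, 0]
h = [1, 2, 0, 0]
X = fnt(x)                                   # negative data wrap around mod 17
H = fnt(h)
Y = [(a * b) % M for a, b in zip(X, H)]      # term-by-term product
y = [to_signed(v) for v in inverse_fnt(Y)]
print(X, H, Y, y)
```

Running the sketch gives the same $X$, $H$, $Y$ and $y$ as the example, including the wrapped value 14 that is read back as $-3$.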

For many applications a direct application of the FNT to implement convolution will result in a significant improvement over any alternative methods. There are many other situations where the constraints of the transform are too severe.

If data magnitude or machine constraints dictate a certain word length and hence a certain $F_t$, the allowed sequence length $N$ may be too short. If input data magnitude and filter length indicate a possible output magnitude that would exceed $F_t/2$, then overflow becomes a problem. If $b$-bit words are used with a modulus of $F_t = 2^b + 1$, the machine can represent $2^b$ integers but the transform needs $2^b + 1$. We now consider several partial solutions to these problems.

V. METHODS FOR CONVOLVING LONG SEQUENCES AND FOR AVOIDING OVERFLOW

Arithmetic mod $F_t$ can be implemented using a $b = 2^t$ bit representation of integers with some provision for representing $2^b$. We have seen that the maximum length of sequences which can be cyclically convolved using the FNT with $\alpha = 2$ is $N = 2b$, and therefore the length of sequences which can be convolved is proportional to the word length in bits. Thus, for long sequences, the word length requirement may be excessive. Rader [14] suggested using a two-dimensional convolution scheme to convolve long one-dimensional sequences, and Agarwal and Burrus [15], [23] presented such a two-dimensional convolution scheme. Using this scheme, cyclic convolution of length $N = LP$ is implemented as a two-dimensional cyclic convolution of length $2L$ by $P$. This two-dimensional cyclic convolution can be implemented using a two-dimensional FNT [15], [23] defined similar to the one-dimensional transform. Using this two-dimensional scheme, the word length required is proportional to the square root of the length of the sequences to be convolved, which would give a maximum sequence length of $8b^2$, rather than $4b$. If $P$ is taken as the maximum possible length $4b$, and $2L$ is a small integer, then either direct convolution or another high-speed algorithm [23] could be employed to compute convolution along the short dimension and the one-dimensional FNT could be used along the long dimension. Computationally this combination can be very efficient, as will be shown in the implementation in Section VIII. Table II lists the maximum lengths for two-dimensional transforms. This approach requires approximately a factor of two increase in computation and storage requirements over a direct one-dimensional implementation.

TABLE II
MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTHS USING TWO-DIMENSIONAL FNT OR RT

Word Length b    N for $\alpha = 2$    N for $\alpha = \sqrt{2}$
16               512                   2048
32               2048                  8192
64               8192                  32768

Another approach to achieving longer transforms than allowed with $\alpha = 2$ is to use $\alpha = \sqrt{2}$ in (28). It can be shown [9], [15] that $\alpha = \sqrt{2}$ is of order $N = 4b = 2^{t+2}$ and that multiplication by $\sqrt{2}$ requires two word shifts. Examination of the FFT algorithm [1] shows that if $\alpha = \sqrt{2}$ is used, only one stage will require multiplication by odd powers of $\sqrt{2}$, and from [9] it is shown this can be done with two word shifts. The other stages will multiply by even powers of $\sqrt{2}$ and, therefore, use a single shift as for the case with $\alpha = 2$. This modification is relatively simple and allows a doubling of the allowed $N$. Note in the example for $F_2$ that $\sqrt{2} = 6$, which also gives $N = N_{\max}$.
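The order claims for $\alpha = 2$ and $\alpha = \sqrt{2}$ can be checked by brute force for small Fermat numbers. This sketch assumes the closed form $\sqrt{2} = 2^{b/4}(2^{b/2} - 1)$ quoted in Section IX-A of the paper; the helper names `order` and `sqrt2` are illustrative.

```python
def order(a, m):
    """Multiplicative order of a modulo m (a must be invertible mod m)."""
    n, p = 1, a % m
    while p != 1:
        p = (p * a) % m
        n += 1
    return n

def sqrt2(b):
    """Closed form 2**(b/4) * (2**(b/2) - 1) mod 2**b + 1, b a multiple of 4."""
    m = (1 << b) + 1
    return ((1 << (b // 4)) * ((1 << (b // 2)) - 1)) % m

for t in (2, 3, 4):
    b = 1 << t                  # word length b = 2**t
    F = (1 << b) + 1            # Fermat number F_t
    r = sqrt2(b)
    # Columns: t, sqrt(2), its square mod F_t, order of 2, order of sqrt(2)
    print(t, r, (r * r) % F, order(2, F), order(r, F))
```

For each $t$ the square of the candidate root reduces to 2, the order of 2 is $2b$, and the order of $\sqrt{2}$ is $4b$; for $t = 2$ the root is 6, as in the example for $F_2$.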

Each additional square root of $\alpha$ results in a doubling of the allowed $N$ (up to $N = N_{\max}$) and adds an additional stage of calculation to the FFT algorithm. Unfortunately, beyond $\sqrt{2}$ each stage will require a general multiplication. For the case where $F_t$ is a prime, if a few stages of multiplication can be allowed, then $N$ can be increased [25]. For $F_5$ and $F_6$, $\alpha = \sqrt{2}$ gives the maximum $N$.

This use of $\alpha \neq 2$ can be combined with the two-dimensional methods to give various desired $N$.

Another possible problem arises because of the modular arithmetic. In the ring of integers mod $M$, conventional integers can be unambiguously represented only if their absolute value is less than $M/2$. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds $M/2$, we would get the same results by implementing convolution in the ring of integers modulo $M$ as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known. In this situation, we can bound the peak output magnitude by

$$|y(n)| \le |x(n)|_{\max} \sum_{k=0}^{N-1} |h(k)|. \qquad (29)$$

This may well require a longer word length than is possible or practical.
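As a rough illustration of how this overflow check might be applied before choosing a modulus, the sketch below computes the standard peak bound (maximum input magnitude times the sum of the impulse response magnitudes) and compares it with $M/2$. The function names are hypothetical and the bound is the usual worst-case one, not a tighter estimate.

```python
def peak_output_bound(x_max, h):
    """Worst-case |y(n)|: largest input magnitude times sum of |h(k)|."""
    return x_max * sum(abs(v) for v in h)

def fits_modulus(x_max, h, M):
    """True if every possible output satisfies |y(n)| < M/2."""
    return 2 * peak_output_bound(x_max, h) < M

h = [1, 2, 0, 0]
print(peak_output_bound(2, h), fits_modulus(2, h, 17))   # inputs bounded by 2
print(peak_output_bound(3, h), fits_modulus(3, h, 17))   # inputs bounded by 3
```

With the impulse response of the earlier example, inputs bounded by 2 are safe modulo 17, while inputs bounded by 3 could alias.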

One possible solution to this overflow problem involves segmenting the words into shorter blocks and convolving them separately [9]:

$$x(n) = x_2(n) + x_1(n) 2^k, \qquad |x_2(n)| < 2^k$$
$$h(n) = h_2(n) + h_1(n) 2^k, \qquad |h_2(n)| < 2^k \qquad (30)$$
$$y = x * h = (x_1 * h_1) 2^{2k} + (x_1 * h_2 + x_2 * h_1) 2^k + x_2 * h_2. \qquad (31)$$

Now, since $x_1$, $h_1$, $x_2$, and $h_2$ have roughly half the number of bits, it should be possible to convolve them using approximately half the number of bits. If necessary, a more precise analysis of the above situation could be easily performed. In (31), the last term, in comparison to the first term, is very small and can be neglected. We need to take two transforms for $x$ and two transforms for $h$; the summation shown within the parentheses can be performed in the transform domain. Finally, we need to take two inverse transforms, one for $x_1 * h_1$ and the other for $(x_1 * h_2 + x_2 * h_1)$.
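The identity (31) can be checked with ordinary integer arithmetic. In this sketch each partial convolution is computed directly; in practice each would be carried out with a short-word-length FNT. The helper names and the test data are illustrative, and unlike the approximation discussed in the text, all four terms (including $x_2 * h_2$) are kept so that the result is exact.

```python
def cyclic_convolve(a, b):
    """Direct cyclic convolution of two equal-length integer sequences."""
    n = len(a)
    return [sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)]

def split(seq, k):
    """Write each value v as low + high * 2**k with 0 <= low < 2**k."""
    high = [v >> k for v in seq]               # floor division, works for v < 0
    low = [v & ((1 << k) - 1) for v in seq]
    return high, low

k = 4
x = [100, -37, 250, 13]
h = [45, 200, -99, 7]
x1, x2 = split(x, k)
h1, h2 = split(h, k)

top = cyclic_convolve(x1, h1)
middle = [p + q for p, q in zip(cyclic_convolve(x1, h2), cyclic_convolve(x2, h1))]
bottom = cyclic_convolve(x2, h2)

# Recombine as in (31): shifts by 2k and k bits, then add.
y = [(a << (2 * k)) + (b << k) + c for a, b, c in zip(top, middle, bottom)]
print(y)
```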

There is another alternative to this problem suggested by Rader [24] and Parks [25]. This is based on the Chinese remainder theorem. The convolution is done modulo two different integers $M_1$ and $M_2$, where $M_1$ and $M_2$ are such that the cyclic convolution in $Z_{M_1}$ and $Z_{M_2}$ is easily implemented on the same machine. The final result mod $M_1 \cdot M_2$ is obtained by the Chinese remainder theorem. $M_1$ is usually a Fermat number and the cyclic convolution in $Z_{M_1}$ is computed using the FNT. Rader [24] suggested using $M_2$ a power of 2; in that case the convolution in $Z_{M_2}$ is computed by taking the sequences mod $M_2$, then convolving them in $Z_{M_1}$ using the FNT, and then reducing them back to $Z_{M_2}$. In this case $M_2$ should be small enough so that no error is introduced by implementing convolution mod $M_2$ in $Z_{M_1}$. This requires

$$N M_2^2 < M_1. \qquad (32)$$

Parks [25] suggested that $M_2$ could be a Fermat number, just smaller than $M_1$. In this case the convolution in $Z_{M_2}$ can also be done using an FNT. Furthermore, the same machine can be utilized to compute the FNT in $Z_{M_2}$. Because $M_2 \mid (M_1 - 2)$, arithmetic in $Z_{M_2}$ can be carried out in $Z_{(M_1 - 2)}$. Example: $M_1 = 2^{16} + 1$, $M_1 - 2 = 2^{16} - 1$, and $M_2 = 2^8 + 1$. These ideas can be further extended.
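A minimal sketch of the two-modulus idea, using the moduli from the example ($M_1 = 2^{16} + 1$, $M_2 = 2^8 + 1$). The modular convolutions are computed directly here as a stand-in for the FNT, and the function names and test data are illustrative assumptions. `pow(a, -1, m)` assumes Python 3.8 or later.

```python
M1 = (1 << 16) + 1   # Fermat number 65537
M2 = (1 << 8) + 1    # Fermat number 257, a factor of M1 - 2

def cyclic_convolve(a, b, m=None):
    """Direct cyclic convolution, optionally reduced modulo m."""
    n = len(a)
    out = [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]
    return out if m is None else [v % m for v in out]

def crt_signed(r1, r2):
    """Combine residues mod M1 and mod M2 into a signed value mod M1*M2."""
    M = M1 * M2
    t = ((r2 - r1) * pow(M1, -1, M2)) % M2
    r = r1 + M1 * t                  # r = r1 (mod M1) and r = r2 (mod M2)
    return r - M if r > M // 2 else r

x = [1000, -2000, 1500, 300]
h = [900, 800, -700, 600]
y1 = cyclic_convolve(x, h, M1)       # residues of the output mod M1
y2 = cyclic_convolve(x, h, M2)       # residues of the output mod M2
y = [crt_signed(a, b) for a, b in zip(y1, y2)]
print(y)
```

The outputs here exceed $M_1/2$ in magnitude, so the convolution modulo $M_1$ alone would alias, while the combined result modulo $M_1 M_2$ matches ordinary integer convolution.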

Still another approach to solving the sequence length $N$ and word length constraints would be to use block processing [3]. By breaking the sequence of length $N$ into smaller blocks and scaling and processing them separately with the FNT, one can combine the results to get the desired output. This can be viewed as a type of two-dimensional processing.

VI. OVERFLOW AND QUANTIZATION CONSIDERATIONS

As mentioned in Section V, we could perform cyclic convolution modulo integer $M$ and obtain the correct result if the absolute value of the output never exceeds $M/2$. If this condition is violated, the resulting error is rather serious. Because of the nature of the modular arithmetic we obtain folding or aliasing of the signal amplitude. This situation could be avoided if the signals are properly quantized (or normalized).

Let $x(n)$ and $h(n)$ represent the original signals. They may have fractional parts (bits to the right of the binary point). To make use of the number theoretic transforms these sequences must be integer sequences. This is easily accomplished by merely shifting the binary point all the way to the right. This introduces scale factors in the sequences. The integer sequences $\hat{x}(n)$ and $\hat{h}(n)$ are given by

$$\hat{x}(n) = x(n)\, 2^{b_1} \qquad (33)$$
$$\hat{h}(n) = h(n)\, 2^{b_2} \qquad (34)$$

where $b_1$ and $b_2$, respectively, represent the number of bits to the right of the binary point in the $x(n)$ and $h(n)$ sequences. Then

$$\hat{y}(n) = \hat{x}(n) * \hat{h}(n) = 2^{b_1 + b_2}\, x(n) * h(n) = 2^{b_1 + b_2}\, y(n). \qquad (35)$$
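The scaling in (33)-(35) can be illustrated with exact rational arithmetic. This is a small sketch under the assumption that every sample is an exact multiple of $2^{-b_1}$ (respectively $2^{-b_2}$); the sample values and helper name are illustrative.

```python
from fractions import Fraction

def cyclic_convolve(a, b):
    """Direct cyclic convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(i - m) % n] for m in range(n)) for i in range(n)]

b1, b2 = 3, 2   # fractional bits in x(n) and h(n)
x = [Fraction(5, 8), Fraction(-3, 8), Fraction(1, 4), Fraction(7, 8)]
h = [Fraction(1, 2), Fraction(3, 4), Fraction(-1, 4), Fraction(0)]

x_hat = [int(v * 2**b1) for v in x]          # (33): shift the binary point
h_hat = [int(v * 2**b2) for v in h]          # (34)
y_hat = cyclic_convolve(x_hat, h_hat)        # integer convolution
y = [Fraction(v, 2**(b1 + b2)) for v in y_hat]   # undo the scale factor of (35)
print(x_hat, h_hat, y_hat)
```

The integer convolution `y_hat` is what would be computed with the number theoretic transform; dividing by $2^{b_1 + b_2}$ recovers the original output exactly.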

Now we give some upper bounds on the output $y(n)$. These are due to Jackson [21]. The $L_p$ norm of a signal is defined by

The output of the cyclic convolution is bounded by $|\hat{y}(n)|$


TABLE III
CYCLIC CONVOLUTION TIMINGS FOR LENGTH N REAL SEQUENCES

N       FFT (ms)    FNT or RT (ms)
32      16          3.3
64      31          1.4
128                 16.6 (a)
256                 80.0 (c)
512     245         166.0 (c)
1024    530         340.0 (c)
2048                720.0 (c)

(a) Using $\alpha = \sqrt{2}$.
(b) Using two-dimensional RT.
(c) 2 by 128 convolution.

fast FNT, unlike the FFT, we do not need to store the powers of $\alpha$ (if $\alpha$ is taken as 2 or a power of 2).

VIII. COMPARISON WITH THE FFT

As noted in the previous section, computing the FNT is a very simple operation on a binary machine. Now let us compare the complexity of various basic operations involved in computing the FNT vis-à-vis the FFT. If the two sequences $x(n)$ and $h(n)$ have $b_1$ and $b_2$ bit representations, respectively, and are of length $N$, then the output $y(n)$ would need no more than a $(b_1 + b_2 + \log_2 N)$ bit representation. To obtain the correct result, $b \ge b_1 + b_2 + \log_2 N$. In Section V, we have given a better bound on the output. In Section VI, we have given other bounds. Roughly speaking, we need twice the number of bits to carry out the convolution using the FNT as compared to the fixed-point FFT implementation of the convolution. But in the DFT, every data point is treated as a complex number and therefore requires two words, one for the real part and one for the imaginary part. Thus, in effect, the hardware requirement for the two transforms is about the same. Although for real data it is possible to make use of the symmetry properties of the DFT's, they require extra computation, and for the purpose of comparison this will be ignored, even though we have taken it into account for our IBM 370/155 implementation to be discussed later. Therefore, we shall assume that in the FFT implementation, each data point is represented by a $b/2$ bit real part and a $b/2$ bit imaginary part.

One $b/2$ bit complex addition is equivalent to two $b/2$ bit real additions, which are comparable to a $b$-bit addition mod $F_t$. Thus, the complexity of addition/subtraction is the same in both the transforms. Similarly, it can be shown that a $b/2$ bit complex multiplication is comparable to a $b$-bit multiplication mod $F_t$. Computation of the RT requires multiplications by powers of 2, which, implemented as bit shifts and subtractions, become much simpler operations compared to the complex multiplications required in the FFT implementation.
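To make the "bit shifts and subtractions" concrete, the following sketch multiplies a residue by $2^k$ modulo $F_t = 2^b + 1$ using one shift, one mask, and one subtraction, relying on $2^b \equiv -1 \pmod{F_t}$. The function name and the restriction $0 \le k < b$ are assumptions made for this illustration; it is not the assembler routine of the paper.

```python
def mul_pow2_mod_fermat(x, k, b):
    """Return x * 2**k mod (2**b + 1) for 0 <= x <= 2**b and 0 <= k < b."""
    F = (1 << b) + 1
    shifted = x << k                    # at most 2b bits long
    low = shifted & ((1 << b) - 1)      # bits below position b
    high = shifted >> b                 # bits at position b and above
    # 2**b = -1 (mod F), so high * 2**b + low = low - high (mod F)
    r = low - high
    return r + F if r < 0 else r

# Example modulo F_2 = 17: 15 * 2**2 = 60 = 9 (mod 17)
print(mul_pow2_mod_fermat(15, 2, 4))
```

Larger shifts can be reduced to this case, since $2^{k} \equiv -2^{k-b}$ for $b \le k < 2b$.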

To compute a length $N$ fast RT, $N \log_2 N$ additions/subtractions and $(N/2) \log_2 (N/2)$ multiplications by some powers of 2 are required, which are implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. The computation required to multiply the two transforms is about the same for both the implementations. To convolve long sequences using the two-dimensional RT, the computational effort and required storage increase by, at the most, a factor of 2. Still, the FNT implementation of convolution is much faster as compared to the FFT implementation.
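The FFT-type structure of the fast RT can be sketched as a recursive radix-2 decimation-in-time transform. This is an illustrative sketch for $F_3 = 257$ with $\alpha = 2$ and $N = 16$, written with ordinary modular multiplications for clarity; in an actual implementation every twiddle factor is a power of 2 and would be applied with shifts. The function names and test sequences are assumptions, and `pow(a, -1, m)` needs Python 3.8 or later.

```python
F = 257      # Fermat number F_3 = 2**8 + 1
N = 16       # transform length for alpha = 2 (N = 2b with b = 8)
ALPHA = 2

def direct_ntt(seq, alpha):
    """O(N^2) reference: X(k) = sum_n x(n) * alpha**(n*k) mod F."""
    return [sum(v * pow(alpha, n * k, F) for n, v in enumerate(seq)) % F
            for k in range(len(seq))]

def fast_ntt(seq, alpha):
    """Recursive radix-2 transform; alpha must have order len(seq) mod F."""
    n = len(seq)
    if n == 1:
        return [seq[0] % F]
    alpha_sq = alpha * alpha % F
    even = fast_ntt(seq[0::2], alpha_sq)
    odd = fast_ntt(seq[1::2], alpha_sq)
    out = [0] * n
    w = 1                                  # w runs over powers of alpha
    for k in range(n // 2):
        t = w * odd[k] % F                 # a shift when alpha is a power of 2
        out[k] = (even[k] + t) % F
        out[k + n // 2] = (even[k] - t) % F    # uses alpha**(n/2) = -1
        w = w * alpha % F
    return out

def fast_cyclic_convolve(x, h):
    X, H = fast_ntt(x, ALPHA), fast_ntt(h, ALPHA)
    Y = [a * b % F for a, b in zip(X, H)]
    n_inv = pow(N, -1, F)
    y = [v * n_inv % F for v in fast_ntt(Y, pow(ALPHA, -1, F))]
    return [v - F if v > F // 2 else v for v in y]   # back to signed values

x = [3, -1, 4, 1, -5, 9, 2, -6] + [0] * 8
h = [1, 2, -1, 3] + [0] * 12
y = fast_cyclic_convolve(x, h)
print(y)
```

The test data are chosen so that the peak output magnitude stays below $F/2$, which is the condition for the modular result to agree with ordinary convolution.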

These transforms were implemented in assembler language on an IBM 370/155, which has a 32-bit word length [9]. The results were compared with an efficient FFT program for computing convolution which makes use of the symmetry of the DFT for real data (see Table III) [20].

IX. GENERALIZATIONS, VARIATIONS, AND OTHER RESULTS

A. Other Choices for M

In this paper, we have primarily discussed number theoretic transforms in the rings of integers modulo Fermat numbers. These numbers seem to be the best choice for implementation on binary computers. Nevertheless, any odd integer $M$ can be used, as was discussed in Section III. Rader [14] proposed the use of Mersenne numbers $M_p = 2^p - 1$, where $p$ is a prime. Mersenne number transforms for $\alpha = -2$ have $N = 2p$ and, therefore, do not have an FFT-type fast computational algorithm.

FNT's require the computing word length to be a power of 2. Many computers do not have a word length that is a power of 2. Transforms similar to FNT's exist for many of these situations. For example, on a 24-bit machine, we may perform convolution modulo $M = 2^{24} + 1$. For this $M$, $\alpha = 2^3$ gives $N = 16$, and $\alpha = 2^{3/2} = 2^7(2^{12} - 1)$ gives $N = 32$ as the maximum length. In general one could take $M = 2^{s 2^r} + 1$, where $s$ is an odd integer. In that case, $\alpha = 2^s$ would give $N = 2^{r+1}$, and $\alpha = (2^s)^{1/2} = 2^{[(s-1)/2 + s 2^{r-2}]}(2^{s 2^{r-1}} - 1)$ would give $N = 2^{r+2}$. In many situations, it may be possible to have transforms of greater length than $2^{r+2}$. For example, taking $M = 2^{40} + 1$, the maximum possible transform length is 256, and taking $M = 2^{80} + 1$, the maximum possible transform length is 1024, but the corresponding $\alpha$'s may not be simple.

For many computers whose word length is not a power of 2, if the above formulation is used to compute transforms analogous to the FNT, the maximum transform length is very small. But, if we are willing to sacrifice the "effective word length" to some extent, we can increase the maximum transform length significantly. Let $b$, a multiple of 4, be the word length of the machine. Let

$$M = 2^b + 1 = M_1 M_2. \qquad (47)$$

$M_1$ and $M_2$ may be nonprimes. It may so happen that $M_1$ is a small integer and $O(M_2) \gg O(M)$; therefore, because of the presence of $M_1$, the maximum transform length is being considerably reduced, while at the same time $M_1$ may not be increasing the maximum allowable output ($y_{\max}$) or the "effective word length" significantly. In this situation we can compute transforms in $Z_{M_2}$ with maximum transform length $O(M_2)$ and maximum allowable output magnitude $M_2/2$. At the expense of reduced output range, we have increased the transform length. Furthermore, arithmetic mod $M_2$ can be conveniently carried out as arithmetic mod $M$, because $M_2$ is a factor of $M$. At the end of the computation, we have to reduce the result mod $M_2$.

Table IV shows this factorization of $M$ for several values of $b$. $\log_2 M_2$ shows the "effective word length" of the machine.
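The orders quoted for the 24-bit example, and the effect of dropping a small factor $M_1$, can be checked by brute force. This sketch takes $O(M)$, for a product of distinct primes, to be the greatest common divisor of the $p_i - 1$, which is an assumption consistent with the table entries; the helper names are illustrative.

```python
from functools import reduce
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m (a must be invertible mod m)."""
    n, p = 1, a % m
    while p != 1:
        p = p * a % m
        n += 1
    return n

def max_length(prime_factors):
    """O(M) for squarefree M, taken as gcd of (p - 1) over its prime factors."""
    return reduce(gcd, (p - 1 for p in prime_factors))

# 24-bit machine: M = 2**24 + 1
M = 2**24 + 1
root8 = 2**7 * (2**12 - 1)            # candidate square root of 2**3
print(order(8, M), order(root8, M), pow(root8, 2, M))

# 12-bit machine: M = 2**12 + 1 = 17 * 241, working in Z_241 only
M1, M2 = 17, 241
sqrt2 = (2**3 * (2**6 - 1)) % M2      # 2**(b/4) * (2**(b/2) - 1) with b = 12
print(max_length([M1, M2]), max_length([M2]), order(2, M2), order(sqrt2, M2))
```

For $b = 12$ the maximum length grows from 16 to 240 when the factor 17 is dropped, while 2 and $\sqrt{2}$ have orders $2b = 24$ and $4b = 48$ in $Z_{241}$.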



TABLE IV
FACTORIZATION OF $M = 2^b + 1$ AS $M_1 \cdot M_2$ AND THE MAXIMUM TRANSFORM LENGTH CORRESPONDING TO $M_2$

Machine Word    O(M)    $M_1$               $M_2$                      Effective Word Length    $N_{\max} = O(M_2)$    N for $\alpha = 2$    N for $\alpha = \sqrt{2}$
Length b                                                               $\log_2 M_2$ (approx.)                          (2b)                  (4b)
12              16      17                  241                        8                        240                    24                    48
20              16      17                  61681                      16                       61680                  40                    80
24              32      257                 97 × 673                   16                       96                     48                    96
28              16      17                  15790321                   24                       15790320               56                    112
36              16      17 × 241            433 × 38737                24                       144                    72                    144
40              256     257                 4278255361                 32                       4278255360             80                    160
48              64      65537               193 × 22253377             32                       192                    96                    192
56              32      257                 5153 × 54410972897         48                       224                    112                   224
60              16      17 × 241 × 61681    4562284561                 32                       4562284560             120                   240
72              32      97 × 257 × 673      577 × 487824887233         48                       576                    144                   288
80              1024    65537               414721 × 44479210368001    64                       15360                  160                   320

TABLE V
SOME PARAMETERS FOR NUMBER THEORETIC TRANSFORMS IN DECIMAL ARITHMETIC

Machine     O(M)    $M_1$       $M_2$                              Effective Digits            $N_{\max} = O(M_2)$    N for $\alpha = 10$
Digits b                                                           $\log_{10} M_2$ (approx.)
4           8       1           73 × 137                           4                           8                      8
6           100     101         9901                               4                           9900                   12
8           16      17          5882353                            7                           5882352                16
10          20      101         3541 × 27961                       8                           60                     20
12          8       73 × 137    99990001                           8                           99990000               24
16          32      1           353 × 449 × 641 × 1409 × 69857     16                          32                     32

Note that for the choices of $M_2$'s shown in Table IV, 2 is always of order $2b$ and also there exists an integer $\sqrt{2} = 2^{b/4}(2^{b/2} - 1)$ which is of order $4b$ in $Z_{M_2}$. This leads to an efficient implementation of these transforms, because powers of $\alpha$ are simple and arithmetic mod $M$ is also simple. For these transforms the word length is not a power of 2, but still the transform length $4b$ is highly composite. The case $b = 60$ needs special attention: if $M_1$ is taken as 17, then for the remaining $M_2$, $O(M_2) = 240$, but $\sqrt{2}$ is not of order 240. This is because $\sqrt{2}$ is of order 48 in $Z_{241}$ and of order 80 in $Z_{61681}$. For this case, although one can find an integer $\alpha$ which is of order 240 in $Z_{M_2}$, it is not likely to be simple.

Thus far, our discussion was based on the assumption that the computer is a binary computer. Many computers have decimal representation of integers, and for these computers it will be efficient if the arithmetic is done mod $M = 10^b + 1$ and $\alpha$ and powers of $\alpha$ are powers of 10. We have compiled Table V to be used for decimal computers. Similar tables can be compiled for other radices also.

B. Complex Number Theoretic Transforms

The integer field $Z_M$ (assuming $M$ to be prime) can be extended to a complex integer field denoted by $Z_M^c$, if the following equation does not have a solution in $Z_M$:

$$x^2 + 1 = 0. \qquad (49)$$

This means $(-1)$ does not have a square root in $Z_M$ or, equivalently, a root of order 4 does not exist in $Z_M$. This implies

$$4 \nmid O(M) = M - 1. \qquad (50)$$

Equation (50) is the only condition required for $Z_M^c$ to exist. In $Z_M^c$ every integer is represented as $a + jb$; $a, b \in Z_M$. All the arithmetical operations are done as in normal complex arithmetic with $j^2 = -1$. Both real and imaginary parts are evaluated mod $M$, separately. The concept of magnitude and phase does not exist in $Z_M^c$. Complex number theoretic transforms (CNT) similar to NTT exist in $Z_M^c$, and can be used to compute the cyclic convolution of two complex integer sequences. To avoid error due to aliasing, both real and imaginary parts of the output should be separately bounded to $M/2$. The idea of the CNT has been considered by Reed [26].

Theorem: A transform having the cyclic convolution property in $Z_M^c$ exists if and only if

$$N \mid (M^2 - 1). \qquad (51)$$

We will not give a formal proof of this theorem, but we will outline a procedure to find a complex integer $\alpha$ of order $N$ in $Z_M^c$, if $N$ divides $(M^2 - 1)$.

Theorem: For every $a + jb$ in $Z_M^c$,

$$(a + jb)^{M+1} = a^2 + b^2 \pmod{M}. \qquad (52)$$

This theorem can be easily proved. This theorem implies that every complex integer is at most an $(M+1)$th root of a real integer in $Z_M^c$. Let $a + jb$ be an $N$th root of a real integer, i.e., $N$ is the least positive integer such that

$$(a + jb)^N = \text{a real integer} \qquad (53)$$

then by (52)

$$N \mid (M + 1). \qquad (54)$$



How to find an $\alpha$ of order $M^2 - 1$ in $Z_M^c$:

Consider complex integers of the form $(1 + jb)$ and search over $b \in Z_M$ such that $(1 + jb)$ is an $(M+1)$th root of a real integer (proof for the existence of such a $b$ can be given). Then,

$$(1 + jb)^{M+1} = 1 + b^2. \qquad (55)$$

Let $\alpha_{M-1}$ be a root of order $(M - 1)$ in $Z_M$. Then $1 + b^2$ can be written as some power of $\alpha_{M-1}$:

$$(1 + b^2) = \alpha_{M-1}^{k}. \qquad (56)$$

It can be shown that $k$ is odd. Then, $\alpha_{M^2-1}$ given by (57) is of order $(M^2 - 1)$ in $Z_M^c$:

$$\alpha_{M^2-1} = (1 + jb)\, \alpha_{M-1}^{(1-k)/2}. \qquad (57)$$

This can be easily proved. By raising $\alpha_{M^2-1}$ to the $((M^2 - 1)/N)$th power, we can find a complex integer $\alpha_N$ of order $N$ in $Z_M^c$ if $N \mid (M^2 - 1)$. It can be easily proved that

$$\gcd(M - 1, M + 1) = 2. \qquad (58)$$

Let

$$N_1 = \gcd(M - 1, N), \qquad N = N_1 \times N_2$$

then $\alpha_N$ will be complex, but $\alpha_N^{N_2}$ will be real.
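The search procedure can be tried on a small case. The sketch below works in $Z_M^c$ for the Mersenne prime $M = 7$, representing $a + jb$ as a pair `(a, b)`. It assumes the exponent $(1 - k)/2$ in the final scaling step, which is one consistent reading of (57); all function and variable names are illustrative.

```python
M = 7   # Mersenne prime 2**3 - 1; M % 4 == 3, so x**2 + 1 = 0 has no root mod M

def cmul(u, v):
    """Product of complex integers u = a + jb and v = c + jd, parts mod M."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % M, (a * d + b * c) % M)

def cpow(u, n):
    """u**n by repeated squaring."""
    result = (1, 0)
    while n:
        if n & 1:
            result = cmul(result, u)
        u = cmul(u, u)
        n >>= 1
    return result

def least_power(u, done):
    """Least n >= 1 such that done(u**n) holds."""
    n, p = 1, u
    while not done(p):
        p = cmul(p, u)
        n += 1
    return n

def order(u):
    return least_power(u, lambda p: p == (1, 0))

def real_root_degree(u):
    return least_power(u, lambda p: p[1] == 0)

# Real root of order M - 1, then b with (1 + jb) an (M+1)th root of a real integer.
g = next(c for c in range(2, M) if order((c, 0)) == M - 1)
b = next(c for c in range(1, M) if real_root_degree((1, c)) == M + 1)
k = next(e for e in range(M - 1) if pow(g, e, M) == (1 + b * b) % M)

scale = pow(g, ((1 - k) // 2) % (M - 1), M)      # real factor g**((1 - k)/2)
alpha_full = cmul((1, b), (scale, 0))            # candidate of order M**2 - 1
N = 16
alpha_N = cpow(alpha_full, (M * M - 1) // N)     # element of order N
print(g, b, k, alpha_full, order(alpha_full), alpha_N, order(alpha_N))
```

Here $N_1 = \gcd(M - 1, N) = 2$ and $N_2 = 8$, so raising `alpha_N` to the 8th power should give a real element.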

In the FFT-type fast algorithm to compute the CNT, the part corresponding to $N_2$ will require complex arithmetical operations but the part corresponding to $N_1$ will require only real arithmetical operations. CNT are good in theory as they offer more choice in transform lengths. But, so far no CNT have been found for which powers of $\alpha$ are simple. CNT do not exist for Fermat numbers, but they exist for Mersenne numbers.

The only extension we have investigated is when

$$M = p_1 p_2 \cdots p_l \qquad (59)$$

where the $p_i$'s are distinct primes. We have not investigated the case when $M$ contains prime powers. $Z_M$ can be extended to $Z_M^c$ as before if

$$4 \nmid (p_i - 1), \qquad i = 1, 2, \cdots, l. \qquad (60)$$

Also the Chinese remainder theorem can be used in $Z_M^c$. It is applied separately to the real and imaginary parts. For CNT to exist in $Z_M^c$, they must exist in $Z_{p_i}^c$ also. This gives the following theorem.

Theorem: CNT of length $N$ in $Z_M^c$ exists, if and only if

$$N \mid \gcd\{p_1^2 - 1,\; p_2^2 - 1,\; \cdots,\; p_l^2 - 1\}. \qquad (61)$$

We find $\alpha_i$ of order $N$ in $Z_{p_i}^c$ and then combine them by the Chinese remainder theorem to obtain an $\alpha$ of order $N$ in $Z_M^c$.

C. Application to Two-Dimensional Filtering

Rader [27] has recently discussed the application of the FNT to two-dimensional filtering. In 2-D applications, the length of the impulse response along each dimension is not too large; therefore the FNT's are ideally suited for this application, because for these applications the length constraint of the FNT's is not important. Other choices of $M$ discussed in this section can also be used for this application.

REFERENCES

[1] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
[2] T. G. Stockham, "High speed convolution and correlation," in AFIPS Conf. Proc., 1966 Joint Computer Conf., vol. 28, pp. 229-233 (also in [28]).
[3] C. S. Burrus, "Block realization of digital filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
[4] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969.
[5] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comput., vol. 25, pp. 365-374, Apr. 1971.
[6] A. Schonhage and V. Strassen, "Fast multiplication of large numbers," Comput. (in German), vol. 7, pp. 281-292, 1971.
[7] J. W. Cooley and J. W. Tukey, "An algorithm for machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1966 (also in [28]).
[8] A. V. Oppenheim and C. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," Proc. IEEE, vol. 60, pp. 957-976, Aug. 1972.
[9] R. C. Agarwal and C. S. Burrus, "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 87-97, Apr. 1974.
[10] P. J. Nicholson, "Algebraic theory of finite Fourier transforms," J. Comput. Syst. Sci., vol. 5, pp. 524-547, 1971.
[11] I. J. Good, "The relation between two fast Fourier transforms," IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
[12] D. E. Knuth, "The art of computer programming-errata et addenda," Comput. Sci. Dep., Stanford Univ., Stanford, Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.
[13] C. M. Rader, "The number theoretic DFT and exact discrete convolution," presented at IEEE Arden House Workshop on Digital Signal Processing, Harriman, N.Y., Jan. 11, 1972.
[14] C. M. Rader, "Discrete convolution via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[15] R. C. Agarwal and C. S. Burrus, "Fast digital convolution using Fermat transforms," in Southwest IEEE Conf. Rec., pp. 538-543, Apr. 1973.
[16] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
[17] G. H. Hardy and E. M. Wright, The Theory of Numbers. Oxford, England: Oxford Univ. Press, 1960.
[18] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[19] L. E. Dickson, History of the Theory of Numbers, vol. I. Washington, D.C.: Carnegie Institute, 1919, p. 376.
[20] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969 (also in [28]).
[21] L. B. Jackson, "On the interaction of round-off noise and dynamic range in digital filters," Bell Syst. Tech. J., vol. 49, pp. 159-184, Feb. 1970 (also in [28]).
[22] R. C. Agarwal, "On realization of digital filters," Ph.D. dissertation, Dep. Elec. Eng., Rice Univ., Houston, Tex., Dec. 1973.
[23] R. C. Agarwal and C. S. Burrus, "Fast one-dimensional digital convolution by multi-dimensional techniques," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
[24] C. M. Rader, Private Commun.
[25] T. W. Parks, Private Commun.
[26] I. S. Reed, Private Commun.
[27] C. M. Rader, "On the application of the number theoretic transforms of high speed convolution to two-dimensional filtering," IEEE Trans. Circuit Theory, to be published.
[28] L. R. Rabiner and C. M. Rader, Eds., Digital Signal Processing. New York: IEEE Press, 1972.
[29] C. M. Rader, "A note on exact discrete Fourier transforms," IEEE Trans. Audio Electroacoust. (Corresp.), vol. AU-21, pp. 558-559, Dec. 1973.
[30] H. Takahasi and Y. Ishibashi, "A new method for 'exact calculation' by a digital computer," Inform. Process. Jap., vol. 1, pp. 28-42, 1962.
