8/20/2019 1975 Number Theoretic Transforms to Implement Fast Digital Convolution
1/11
550 PROCEEDINGS OF THE IEEE, VOL. 63, NO. 4, APRIL 1975
Number Theoretic Transforms to Implement Fast Digital Convolution
Invited Paper
Abstract: Transforms using number theoretic concepts are developed as a method for fast and error-free calculation of finite digital convolution. The transforms are defined on finite fields and rings of integers with arithmetic carried out modulo an integer, and it is shown that under certain conditions this gives the same results as conventional digital convolution. Because of these characteristics they are ideally suited to digital computation, taking into account quantization of amplitude as well as time in their definition. When the modulus is chosen as a Fermat number, a transform results that requires only on the order of N log N additions and word shifts but no multiplications. In addition to being efficient, they have no roundoff error and do not require storage of basis functions. There is a restriction on sequence length imposed by word length and a problem with overflow, but methods for overcoming these are presented. Results of an implementation on an IBM 370/155 are presented and compared with the fast Fourier transform, showing a substantial improvement in efficiency and accuracy. Variations on the basic number theoretic transforms are also presented.
I. INTRODUCTION
FINITE DIGITAL convolution is a numerical procedure defined by
$$y(n) = \sum_{m=0}^{N-1} h(n-m)\,x(m), \qquad n = 0, 1, 2, \ldots \qquad (1)$$
and symbolically denoted
$$y(n) = h(n) * x(n)$$
where x(n), h(n), and y(n) are digital number sequences. This operation has many very powerful applications. It is used to implement nonrecursive or finite-impulse-response digital filters either directly or with sectioning or block techniques [1], [2], and recursive or infinite-impulse-response digital filters by block methods [3]. It is also used to carry out auto- and cross-correlation, as well as for computations such as polynomial multiplication and multiplication of very large integers [4]-[6].
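As a baseline for the transform methods that follow, here is a minimal Python sketch (our own illustration; the paper gives no code) that evaluates (1) directly, treating both sequences as zero outside their given samples. The function name `finite_convolution` is hypothetical. Its cost is one multiplication per pair of samples, which is the count the transform methods aim to reduce.

```python
def finite_convolution(h, x):
    """Evaluate y(n) = sum_m h(n - m) x(m) from (1) directly.

    h and x are treated as zero outside the given samples, so the result has
    len(h) + len(x) - 1 entries and costs len(h) * len(x) multiplications.
    """
    y = [0] * (len(h) + len(x) - 1)
    for n in range(len(y)):
        for m in range(len(x)):
            if 0 <= n - m < len(h):      # h(n - m) is zero outside its support
                y[n] += h[n - m] * x[m]
    return y


print(finite_convolution([1, 2], [2, -2, 1]))  # [2, 2, -3, 2]
```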
There are several methods to implement finite convolution that differ in the amount of computation required, the effects of arithmetic roundoff, and the amount of storage required. It is somewhat difficult to compare various algorithms because of the tradeoffs between these various factors, which depend on the hardware or software that is available. However, because of the complexity of performing multiplication, the number of multiplications necessary to implement convolution is often an important factor to be minimized.
Manuscript received September 5, 1974; revised October 15, 1974. This work was supported in part by the National Science Foundation under Grant GK-23697.
R. C. Agarwal was with the Department of Electrical Engineering, Rice University, Houston, Tex. He is now with the IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y. 10598.
C. S. Burrus is with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.
The use of transform methods has proven to be useful when an application allows sequences to be processed in blocks. The most versatile transform is the discrete Fourier transform (DFT), defined by
$$\mathrm{DFT}[x] \triangleq X(k) = \sum_{n=0}^{N-1} x(n)\,\exp(-j2\pi nk/N), \qquad k = 0, 1, \ldots, N-1 \qquad (2)$$
and the inverse transform
$$\mathrm{IDFT}[X] \triangleq x(n) = N^{-1}\sum_{k=0}^{N-1} X(k)\,\exp(j2\pi nk/N), \qquad n = 0, 1, \ldots, N-1. \qquad (3)$$
The property of this transform that is important here is the cyclic convolution property (CCP), which states that
$$\mathrm{DFT}[h * x] = \mathrm{DFT}[h]\cdot\mathrm{DFT}[x].$$
This implies that a convolution can be calculated by
$$y(n) = \mathrm{IDFT}\{\mathrm{DFT}[h]\cdot\mathrm{DFT}[x]\} \qquad (4)$$
using two transforms, N multiplications, and one inverse transform. The convolution implemented by (4) is called cyclic convolution since it evaluates (1) as if h(n) and x(n) were periodically extended outside of the range from 0 to (N - 1) or, equivalently, the indices are evaluated mod N. Normal finite convolution can be calculated by cyclic convolution if zeros are appended to x(n) and h(n) to prevent folding or aliasing [1], [2].
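A minimal Python sketch of (4), written for clarity rather than speed: it uses a direct O(N²) DFT from the standard `cmath` module instead of an FFT, and the function names are hypothetical. It shows that cyclic convolution through the DFT matches the periodic-index form of (1), and that zero padding to at least len(h) + len(x) - 1 samples turns cyclic convolution into ordinary finite convolution.

```python
import cmath


def dft(x, sign=-1):
    """Direct DFT (2) for sign=-1; with sign=+1 it is the unnormalized sum in (3)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT (3), including the 1/N factor."""
    N = len(X)
    return [v / N for v in dft(X, sign=+1)]


def cyclic_convolution_dft(h, x):
    """Cyclic convolution by (4): two transforms, N products, one inverse.

    The result is complex floating point and must be rounded back to integers.
    """
    Y = [a * b for a, b in zip(dft(h), dft(x))]
    return [round(v.real) for v in idft(Y)]


def cyclic_convolution_direct(h, x):
    """Reference: (1) with indices taken mod N (periodic extension)."""
    N = len(x)
    return [sum(h[(n - m) % N] * x[m] for m in range(N)) for n in range(N)]
```

With h = (1, 2) and x = (2, -2, 1) padded to length 4, both functions return (2, 2, -3, 2), the finite convolution. Without padding (length 3), the last output sample folds onto the first and the result is (4, 2, -3).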
This transform approach became useful only when Cooley and Tukey [7] introduced a very efficient algorithm known as the fast Fourier transform (FFT) for calculating the DFT and its inverse in (2) and (3). The number of multiplications necessary to calculate the FFT of a number sequence of length N is on the order of $N \log_2 N$. Implementation of convolution using the FFT results in a considerable savings in multiplication when lengths are approximately above N = 32. The disadvantages of this approach are significant amounts of roundoff error [8], storage or generation of the complex basis functions that have to be rounded, and still a considerable amount of multiplying.
If one looks for the properties that a general transform with the DFT structure
$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \qquad (5)$$
must have to have the CCP, it is found [9], [10] that α is a root of unity of order N, i.e., N is the least positive integer such that
$$\alpha^N = 1. \qquad (6)$$
AGARWAL AND BURRUS: NUMBER THEORETIC TRANSFORMS
This analysis shows that in the complex number field, the conventional DFT with $\alpha = \exp(-j2\pi/N)$ is the only transform given by (5) with the CCP. If, however, other fields and arithmetic systems are used, new transforms become possible with very interesting properties. This is pursued by considering mathematical systems that are fundamentally compatible with digital computing capability.
In any practical situation, or when working with digital machines, the data are available only with some finite precision, and therefore, without loss of generality, the data can be considered to be integers with some upper bound. To compute convolution in this digital domain, operations in the complex number field of the continuous domain can be imitated in a finite field or, more generally, in a finite ring of integers under additions and multiplications modulo some integer M. An integer α of order N replaces $\exp(-j2\pi/N)$ used in a DFT. In this ring, when two integer sequences x(n) and h(n) are convolved, the output integer sequence y(n) is congruent to the conventional convolution of x(n) and h(n) mod M. In the ring of integers mod M, conventional integers can be unambiguously represented if their absolute value is less than M/2. If the input integer sequences x(n) and h(n) are so scaled that |y(n)| never exceeds M/2, we would get the same results by implementing convolution in the ring of integers mod M as those obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, h(n) represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known.
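The scaling argument above can be checked with a short Python sketch (our own illustration; names are hypothetical, and M is assumed odd). It performs cyclic convolution with every result reduced mod M, then maps each residue back to the representative with absolute value below M/2. When |y(n)| < M/2 the result equals ordinary cyclic convolution; when M is too small the amplitude folds.

```python
def to_signed(r, M):
    """Map a residue in {0, ..., M-1} to the representative with |value| < M/2 (M odd)."""
    return r - M if r > M // 2 else r


def cyclic_convolution_mod(h, x, M):
    """Cyclic convolution computed entirely in Z_M, then read back as signed integers."""
    N = len(x)
    residues = [sum(h[(n - m) % N] * x[m] for m in range(N)) % M for n in range(N)]
    return [to_signed(r, M) for r in residues]


print(cyclic_convolution_mod([1, 2, 0, 0], [2, -2, 1, 0], 17))  # [2, 2, -3, 2]
print(cyclic_convolution_mod([1, 2, 0, 0], [2, -2, 1, 0], 5))   # [2, 2, 2, 2]
```

With M = 17 every |y(n)| is below 17/2 and the exact answer comes back. With M = 5 the value -3 exceeds 5/2 in magnitude and folds to 2, the amplitude analogue of aliasing.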
By working in a finite field or ring of integers with arithmetic carried out modulo an integer M, a large class of transforms exists that have the CCP. By special choices of the length N, the modulus M, and the value α, it is possible to have transforms that need only word shifts and additions but no multiplications, that have an FFT-type fast algorithm, that do not require storage of complex values for α, and that have no roundoff errors. These transforms are called number theoretic transforms (NTT), and they look very promising in the evaluation of finite convolutions. Their main disadvantage seems to be a relation of the sequence length N to the required word length that can require long word lengths for long sequence lengths.
These number theoretic transforms are truly digital transforms, taking into account the quantization in amplitude and the finite precision of digital signals. They bear the same relation to digital signals as the DFT does to discrete-time or sampled-data signals, and the Fourier or Laplace transforms do to continuous-time signals. In the same manner that the relation of discrete-time signals to continuous-time signals through sampling involves a possible folding or aliasing in the frequency domain, the relation of calculations with the DFT to calculations with the number theoretic transforms involves possible folding of the amplitude that must be taken into account.
The literature on transforms of these types is fairly recent. Knuth [4] has proposed the use of transforms in finite fields. Pollard [5] discussed transforms having the CCP in a finite field and also gives conditions for having transforms with the CCP in a finite ring of integers. Good [11] also mentioned the use of transforms in a finite ring of integers. Schonhage and Strassen [6] defined transforms having the CCP modulo a Fermat number and discussed their application to fast multiplication of very large integers. Knuth [12] elaborated on the work of Schonhage and Strassen. Nicholson [10] presented an algebraic theory of FFT's in any ring and established fast FFT-type algorithms to compute these transforms. Rader [13], [14] proposed number theoretic transforms in rings of integers modulo both Mersenne and Fermat numbers. He first proposed the application to digital signal processing, showed that the transforms could be calculated using only additions and bit shifting, showed the word length constraint, and suggested two-dimensional transforms as a possible relaxation of that constraint. Agarwal and Burrus [9], [15] discussed number theoretic transforms in detail, defined Fermat number transforms, and also proposed their application for fast digital convolution. They also suggested possible hardware and software implementations. Their implementation on the IBM 370/155 showed a factor of 3 to 5 speed improvement over efficient FFT implementations of cyclic convolution for lengths up to 256. An earlier article by Takahasi and Ishibashi [30] was recently brought to our attention by Dr. J. W. Cooley of IBM.
II. MODULAR ARITHMETIC
In this section, some of the basic concepts of modular arithmetic from number theory relevant to NTT will be discussed. These can be found in most basic books on number theory [16], [17].
Two integers a and b are said to be congruent mod M if
$$a = b + kM \qquad (7)$$
where k is some integer and M is the modulus. This is written as
$$a \equiv b \pmod{M}. \qquad (8)$$
All integers are congruent mod M to some integer in the finite set {0, 1, 2, ..., M - 1}, which is called the set of integers mod M and denoted by $Z_M$. $Z_M$ is also known as the ring of integers mod M. If in a ring of integers multiplicative inverses exist for all nonzero integers, this ring becomes a field, and it can be shown that $Z_M$ is a field if and only if M is a prime. We will use the symbol "$Z_M$" and the expression "the ring of integers mod M" for rings as well as fields, since a field is also a ring. The following basic arithmetic operations are permissible with modular arithmetic.
Addition: Example, $7 + 12 = 19 \equiv 2 \pmod{17}$.
Negation: Example, $-7 \equiv -7 + 17 = 10 \pmod{17}$.
Subtraction: Example, $7 - 12 = 7 + (-12) \equiv 7 + 5 = 12 \pmod{17}$.
Multiplication: Example, $7 \times 12 = 84 \equiv 16 \pmod{17}$.
Multiplicative Inverse: The multiplicative inverse of an integer b in $Z_M$ exists if and only if b and M are relatively prime. In that case $b^{-1}$ is an integer such that $b \times b^{-1} \equiv 1 \pmod{M}$. Example, $7^{-1} \equiv 5 \pmod{17}$; $7 \times 5 = 35 \equiv 1 \pmod{17}$.
Division: a/b exists if and only if b has an inverse. In that case $a/b = a \times b^{-1}$. Example, $12/7 \equiv 12 \times 5 = 60 \equiv 9 \pmod{17}$; $7 \times 9 = 63 \equiv 12 \pmod{17}$.
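The examples above can be reproduced in a few lines of Python. This sketch relies on the facts that Python's `%` always returns a value in 0..M-1 for positive M, and that the built-in `pow(b, -1, M)` (Python 3.8 and later) returns the modular inverse or raises `ValueError` when b and M are not relatively prime. The helper names `mod_inverse` and `mod_divide` are hypothetical.

```python
M = 17

# The worked examples above; % returns a residue in 0..M-1 even for negative operands.
assert (7 + 12) % M == 2          # addition
assert (-7) % M == 10             # negation
assert (7 - 12) % M == 12         # subtraction
assert (7 * 12) % M == 16         # multiplication


def mod_inverse(b, M):
    """b^-1 in Z_M; pow(b, -1, M) raises ValueError if gcd(b, M) != 1."""
    return pow(b, -1, M)


def mod_divide(a, b, M):
    """a / b in Z_M, defined as a * b^-1."""
    return (a * mod_inverse(b, M)) % M


assert mod_inverse(7, M) == 5
assert mod_divide(12, 7, M) == 9 and (7 * 9) % M == 12
```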
This may seem like a rather peculiar way to do arithmetic, but it is used quite often by everyone. In discussing the day of the week, one uses an arithmetic mod 7, or in stating the time, one is calculating mod 12 or perhaps 24. Indeed the mantissa of a number in scientific notation is evaluated mod 10.
Because of the nature of modular arithmetic, numbers do not have sizes or magnitude. We cannot say that a particular number is larger than another or that two numbers are close. Tuesday may not be close to Wednesday or come before Wednesday if they occur in different weeks.
As was mentioned in the introduction, for the existence of transforms with the DFT structure given in (5) and having the CCP, it is necessary that an integer exist that is an Nth root of unity. We will now consider this problem using modular arithmetic. First, Euler's φ function is defined as φ(M), the number of integers in $Z_M$ that are relatively prime to M. For M a prime, φ(M) = M - 1. If M is composite and its prime factored form is denoted by $M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}$, then the general expression for φ is
$$\varphi(M) = M\left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\cdots\left(1 - \frac{1}{p_l}\right).$$
An important theorem known as Euler's theorem states that for every a relatively prime to M
$$a^{\varphi(M)} \equiv 1 \pmod{M}. \qquad (9)$$
For M prime this reduces to Fermat's theorem
$$a^{M-1} \equiv 1 \pmod{M} \qquad (10)$$
which holds for all nonzero elements of $Z_M$, since they are all relatively prime to M if M is prime.
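A small Python sketch of Euler's φ function and a brute-force check of (9) and (10). The factorization by trial division is our own choice and is only practical for small M; all names are hypothetical. The φ formula is evaluated in integer arithmetic as M times the product of (1 - 1/p) over the distinct primes p dividing M.

```python
from math import gcd


def prime_factors(M):
    """Distinct prime factors of M by trial division (fine for small M)."""
    factors, p = [], 2
    while p * p <= M:
        if M % p == 0:
            factors.append(p)
            while M % p == 0:
                M //= p
        p += 1
    if M > 1:
        factors.append(M)
    return factors


def euler_phi(M):
    """phi(M) = M * prod(1 - 1/p) over distinct primes p | M, in integer arithmetic."""
    result = M
    for p in prime_factors(M):
        result = result // p * (p - 1)
    return result


def check_euler_theorem(M):
    """Verify a^phi(M) = 1 (mod M) for every a in Z_M relatively prime to M, as in (9)."""
    phi = euler_phi(M)
    return all(pow(a, phi, M) == 1 for a in range(1, M) if gcd(a, M) == 1)


print([euler_phi(M) for M in range(1, 8)])  # [1, 1, 2, 2, 4, 2, 6]
```

For a prime M the check is exactly Fermat's theorem (10), since φ(M) = M - 1.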
There are certain roots of unity that are of particular interest. If N is the least positive integer such that
$$\alpha^N \equiv 1 \pmod{M} \qquad (11)$$
then α is said to be a root of unity of order N, or simply of order N. In some of the literature α is said to belong to the exponent N, or N is the exponent to which α belongs. Another terminology says α is a primitive Nth root of unity.
If the order of α (the exponent to which α belongs) is equal to φ(M), then α is called a primitive root (do not confuse this with a primitive Nth root of unity). If M is prime and α is a primitive root, the set of integers $\{\alpha^k \pmod{M},\ k = 0, 1, 2, \ldots, M-2\}$ is the total set of nonzero elements in $Z_M$. Thus all nonzero integers in $Z_M$ can be generated by powers of a primitive root. This characterizes the entire field.
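The order in (11) and the primitive-root property can be computed by brute force, as in this Python sketch (our own illustration with hypothetical names, suitable only for small moduli). It repeatedly multiplies by a until the product returns to 1.

```python
from math import gcd


def multiplicative_order(a, M):
    """Least N >= 1 with a**N = 1 (mod M), the order in (11); needs gcd(a, M) == 1."""
    if M < 2 or gcd(a, M) != 1:
        raise ValueError("need M >= 2 and a relatively prime to M")
    value, N = a % M, 1
    while value != 1:
        value = (value * a) % M
        N += 1
    return N


def is_primitive_root_of_prime(a, p):
    """For prime p, a is a primitive root iff its order equals phi(p) = p - 1."""
    return multiplicative_order(a, p) == p - 1


def powers(a, p):
    """The set {a^k mod p : k = 0, ..., p - 2}, sorted."""
    return sorted(pow(a, k, p) for k in range(p - 1))


print(powers(3, 7))  # [1, 2, 3, 4, 5, 6]: 3 generates every nonzero element of Z_7
```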
Euler's theorem implies that if α is of order N then N must divide φ(M), denoted by N | φ(M). If M is prime, it can be shown that roots of order N exist if and only if N | (M - 1), and a root of order N is given by
$$\alpha = \alpha_p^{(M-1)/N} \qquad (12)$$
where $\alpha_p$ denotes a primitive root. More generally, if α is a root of order N, then
$$\alpha^k \text{ is of order } N/k \text{ if } k \mid N; \qquad \alpha^k \text{ is of order } N \text{ if } N \text{ and } k \text{ are relatively prime.} \qquad (13)$$
This implies that the number of roots of order N is given by φ(N) and, therefore, the number of primitive roots is φ(φ(M)). These relations allow one to calculate all of the roots of all possible orders from one primitive root. Tables will often list primes and the smallest primitive root for each.
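A Python sketch of how (12) and (13) produce all roots of a given order from a single primitive root, assuming M is prime and that (12) is read as $\alpha = \alpha_p^{(M-1)/N}$. The function name and argument order are hypothetical.

```python
from math import gcd


def roots_of_order(N, p, primitive_root):
    """All roots of order N in Z_p (p prime), built from one primitive root.

    Uses alpha = primitive_root**((p - 1) / N) from (12), then alpha**k with
    gcd(k, N) == 1 from (13); there are phi(N) of them.
    """
    if (p - 1) % N != 0:
        return []          # no root of order N unless N | (p - 1)
    alpha = pow(primitive_root, (p - 1) // N, p)
    return sorted(pow(alpha, k, p) for k in range(1, N + 1) if gcd(k, N) == 1)


for N in (1, 2, 3, 4, 6):
    print(N, roots_of_order(N, 7, 3))
# 1 [1] / 2 [6] / 3 [2, 4] / 4 [] / 6 [3, 5]
```

The output for M = 7 with primitive root 3 reproduces the table of roots of order N in the example that follows.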
These ideas will become clearer by looking at an example. Consider the field $Z_7$ with arithmetic mod 7. First we give the first few evaluations of Euler's function:
$$\varphi(1) = 1,\ \varphi(2) = 1,\ \varphi(3) = 2,\ \varphi(4) = 2,\ \varphi(5) = 4,\ \varphi(6) = 2,\ \varphi(7) = 6. \qquad (14)$$
Consider raising each element of $Z_7$ to powers from 0 to 6 (mod 7).

  α^N (mod 7)
  N =   0  1  2  3  4  5  6
  1     1  1  1  1  1  1  1
  2     1  2  4  1  2  4  1
  4     1  4  2  1  4  2  1
  3     1  3  2  6  4  5  1
  5     1  5  4  6  2  3  1
  6     1  6  1  6  1  6  1
This illustrates several very interesting features. Consider the various roots of order N.

  N    Roots of order N
  1    1
  2    6
  3    2, 4
  6    3, 5
Only those N that divide φ(M) = φ(7) = 6 have roots that belong to them. The number of roots is given by φ(N), and the number of primitive roots is φ(φ(M)) = 2; they are 3 and 5. Note that both of the primitive roots generate all the nonzero elements of the field, while the other roots generate cyclic subsets with N distinct members. Also note that Euler's theorem (Fermat's theorem in this case) is indeed satisfied, in that all elements raised to the 6th power are congruent to unity, and (13) does generate all the roots of order N from the primitive roots. Also note that every nonzero integer a has an inverse $a^{M-2}$. For a nonprime M, a has an inverse given by $a^{\varphi(M)-1}$ if a and M are relatively prime.
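The two inverse formulas follow from (10) and (9): multiplying $a^{M-2}$ or $a^{\varphi(M)-1}$ by a gives $a^{M-1}$ or $a^{\varphi(M)}$, which is 1. A short Python check (hypothetical helper names; φ(M) is passed in by the caller rather than computed):

```python
def inverse_prime(a, p):
    """a^-1 = a^(p-2) mod p for prime p and a != 0 (mod p), from Fermat's theorem (10)."""
    return pow(a, p - 2, p)


def inverse_euler(a, M, phi_M):
    """a^-1 = a^(phi(M)-1) mod M when gcd(a, M) == 1, from Euler's theorem (9)."""
    return pow(a, phi_M - 1, M)


print([inverse_prime(a, 7) for a in range(1, 7)])  # [1, 4, 5, 2, 3, 6]
print(inverse_euler(7, 12, 4))                     # 7, since 7 * 7 = 49 = 1 (mod 12)
```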
By considering a similar example with M a composite rather than a prime, one observes several differences. First, $Z_M$ is not a field, since not all elements will have inverses. There is no primitive root that will generate the entire ring, only subsets with φ(M) elements.
When considering a nonprime modulus M, $Z_M$ is a ring, and inverses exist only for integers relatively prime to M. Let M have the following unique prime power factorization:
$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}. \qquad (15)$$
When the arithmetic is done mod M, it is in effect done modulo each prime power $p_i^{r_i}$ simultaneously [4], [18]. A set of arithmetic operations can be done either modulo each $p_i^{r_i}$ separately, with the final result mod M obtained using the Chinese remainder theorem [4], [16], [18], or alternatively all the operations may be done mod M, but they must be valid operations mod each $p_i^{r_i}$. An integer α is said to be of order N in $Z_M$ if and only if it is of order N in each $Z_{p_i^{r_i}}$.
Here we present some basic results.
$$a \equiv b \pmod{M} \qquad (16)$$
is true if and only if
$$a \equiv b \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \qquad (17)$$
If we know the residues of an integer a modulo each $p_i^{r_i}$, we can uniquely reconstruct the integer a (mod M) using the Chinese remainder theorem, given in the following. Let
$$d_i = \frac{M}{p_i^{r_i}} \quad\text{and}\quad a_i \equiv a \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l;$$
then
$$a \equiv \sum_{i=1}^{l} a_i\, d_i \left(d_i^{-1} \bmod p_i^{r_i}\right) \pmod{M}. \qquad (18)$$
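A Python sketch of the Chinese remainder reconstruction in (18), written for any pairwise relatively prime moduli (the prime powers $p_i^{r_i}$ are one such set). The name `crt` is hypothetical, and `pow(d, -1, m)` needs Python 3.8 or later.

```python
def crt(residues, moduli):
    """Reconstruct a (mod M) from a_i = a mod m_i, for pairwise coprime m_i, as in (18).

    a = sum a_i * d_i * (d_i^-1 mod m_i)  (mod M),  with M = prod m_i and d_i = M / m_i.
    """
    M = 1
    for m in moduli:
        M *= m
    total = 0
    for a_i, m_i in zip(residues, moduli):
        d_i = M // m_i
        total += a_i * d_i * pow(d_i, -1, m_i)   # d_i is invertible mod m_i
    return total % M


# M = 60 = 2^2 * 3 * 5: the residues of 47 are (3, 2, 2).
print(crt([47 % 4, 47 % 3, 47 % 5], [4, 3, 5]))  # 47
```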
III. NUMBER THEORETIC TRANSFORMS
In this section, the definition and basic conditions for the existence of the NTT will be presented, and in particular the allowed relations between the modulus M, the transform length N, and the basis function α are spelled out.
If we have a length N sequence of numbers, then a transform pair of the form given by
$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \qquad (19)$$
$$x(n) = N^{-1}\sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \qquad (20)$$
is said to have a DFT structure. By requiring that application of the transform method in (4) result in cyclic convolution, the following theorem can be proven.
Theorem 1: A length N transform having the DFT structure will implement cyclic convolution if and only if there exist an inverse of N and an element α, a root of unity of order N, i.e., N is the least positive integer such that
$$\alpha^N = 1.$$
This is a very general result applying to both rings and fields that are finite or infinite, and it has been developed from a variety of points of view [5], [9], [10]. In addition to the CCP, transforms of this type also allow fast computation algorithms of the FFT type when N is highly composite [10].
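A direct O(N²) Python sketch of the transform pair (19)-(20) over $Z_M$, used in the convolution scheme (4). It checks the two conditions of Theorem 1 (α of order exactly N, and N invertible mod M) before transforming. This is a plain reference implementation, not the FFT-type fast algorithm; names and the example parameters (M = 13, α = 5 of order 4) are our own choices.

```python
def _check_theorem1(alpha, N, M):
    """Theorem 1 preconditions: alpha has order exactly N and N is invertible mod M."""
    if pow(alpha, N, M) != 1 or any(pow(alpha, d, M) == 1 for d in range(1, N)):
        raise ValueError("alpha is not a root of unity of order N mod M")
    pow(N, -1, M)  # raises ValueError if N has no inverse mod M


def ntt(x, alpha, M):
    """Forward transform (19): X(k) = sum_n x(n) alpha^(nk) mod M, computed directly."""
    N = len(x)
    _check_theorem1(alpha, N, M)
    return [sum(x[n] * pow(alpha, n * k, M) for n in range(N)) % M for k in range(N)]


def intt(X, alpha, M):
    """Inverse transform (20): x(n) = N^-1 sum_k X(k) alpha^(-nk) mod M."""
    N_inv = pow(len(X), -1, M)
    return [(N_inv * v) % M for v in ntt(X, pow(alpha, -1, M), M)]


def ntt_cyclic_convolution(h, x, alpha, M):
    """Scheme (4) with the NTT in place of the DFT; exact cyclic convolution mod M."""
    products = [(a * b) % M for a, b in zip(ntt(h, alpha, M), ntt(x, alpha, M))]
    return intt(products, alpha, M)


print(ntt_cyclic_convolution([1, 2, 3, 0], [4, 5, 0, 0], 5, 13))  # [4, 0, 9, 2]
```

The printed result is the ordinary cyclic convolution (4, 13, 22, 15) reduced mod 13. Passing α = 3, which has order 3 rather than 4 mod 13, raises `ValueError`, as Theorem 1 requires.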
Theorem 1 is somewhat difficult to use when investigating various possible moduli with modular arithmetic, so an alternate set of conditions will be developed. Let $Z_M$ represent the ring of integers {0, 1, 2, ..., M - 1} with arithmetic carried out mod M.
Let M have the following unique prime power factorization
$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l} \qquad (21)$$
where the $p_i$'s are distinct primes. As pointed out in Section II, when we carry out our arithmetic mod M, we are in effect doing it modulo each $p_i^{r_i}$ simultaneously.
Therefore, the length N number theoretic transform having the CCP in $Z_M$ must also have the CCP in $Z_{p_i^{r_i}}$ for i = 1, 2, ..., l. This requires that an integer α of order N must exist in each $Z_{p_i^{r_i}}$, i.e., N is the least positive integer such that
$$\alpha^N \equiv 1 \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \qquad (22)$$
Furthermore, since the inverse transform requires $N^{-1}$, the inverse of N should exist in $Z_M$, or, N should be relatively prime to M. Now we investigate the existence of an α of order N in each $Z_{p_i^{r_i}}$. By Euler's theorem (9) and (22), we have¹
$$N \mid \varphi(p_i^{r_i}), \qquad i = 1, 2, \ldots, l$$
$$N \mid (p_i - 1), \qquad i = 1, 2, \ldots, l$$
$$N \mid \gcd\{p_1 - 1,\ p_2 - 1,\ \ldots,\ p_l - 1\}. \qquad (23)$$
We define O(M) as the greatest common divisor (gcd) of the $(p_i - 1)$:
$$O(M) \triangleq \gcd\{p_1 - 1,\ p_2 - 1,\ \ldots,\ p_l - 1\}. \qquad (24)$$
Therefore,
$$N \mid O(M). \qquad (25)$$
Equation (25) gives the necessary condition for the existence of a transform of length N in the arithmetic mod M. Now consider its converse. If N | O(M), then N | φ($p_i^{r_i}$) for every i, and there exist integers $\alpha_i \pmod{p_i^{r_i}}$ of order N in $Z_{p_i^{r_i}}$. Using these $\alpha_i$ we can construct transforms (mod $p_i^{r_i}$) which have the DFT structure of (19) and are invertible. Combining these transforms by the Chinese remainder theorem (18), one can obtain a transform (mod M) having the CCP in $Z_M$. Alternatively, one can combine the $\alpha_i$'s by the Chinese remainder theorem to obtain an α (mod M) of order N in $Z_M$ and construct the final transform using this α. The results will be identical.
Therefore, (25) is the necessary and sufficient condition for the existence of an invertible transform of length N which has the CCP mod M. This is stated in the form of a theorem [9], [15].
Theorem 2: A length N transform having the DFT structure will implement cyclic convolution mod M if and only if
$$N \mid O(M). \qquad (26)$$
This also establishes the maximum transform length in $Z_M$ as $N_{max} = O(M)$. This is a very important theorem that states exactly what the possible transform lengths for a given modulus are.
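Theorem 2 reduces the search for transform lengths to a gcd computation. This Python sketch (hypothetical names; trial-division factoring, so only for moduli with small factors) evaluates O(M) from (24) and tests N | O(M).

```python
from functools import reduce
from math import gcd


def distinct_prime_factors(M):
    """Distinct primes p_i in M = p_1^r_1 ... p_l^r_l, by trial division."""
    primes, p = [], 2
    while p * p <= M:
        if M % p == 0:
            primes.append(p)
            while M % p == 0:
                M //= p
        p += 1
    if M > 1:
        primes.append(M)
    return primes


def max_ntt_length(M):
    """O(M) = gcd{p_1 - 1, ..., p_l - 1} from (24); by Theorem 2 this is N_max."""
    return reduce(gcd, [p - 1 for p in distinct_prime_factors(M)])


def ntt_length_allowed(N, M):
    """Theorem 2: a length-N transform with the CCP exists mod M iff N | O(M)."""
    return max_ntt_length(M) % N == 0


print(max_ntt_length(17), max_ntt_length(2**32 + 1))  # 16 128
```

For instance, an even M gives O(M) = 1, since p = 2 contributes p - 1 = 1 to the gcd, and the Mersenne number $2^{11} - 1 = 23 \times 89$ gives O(M) = 22.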
Although both theorems as stated here assume the DFT structure of (19), they hold for any general transform having the CCP [10]. It is possible in a ring with a composite modulus to have a transform with the CCP but not the DFT structure.² Transforms of this sort do not allow an FFT-type fast algorithm and, therefore, do not seem promising.
For number theoretic transforms to be attractive in comparison to other implementations of convolution, they should be computationally efficient. There are three requirements that will be considered. First, N should be highly composite (preferably a power of 2) for a fast FFT-type algorithm to exist, and it should be large enough for practical sequence lengths. Second, since complex multiplications take most of the computational effort in calculating the FFT, it is important that multiplication by powers of α be a simple operation. This is possible if the powers of α have binary representations with very few bits; preferably α should also be a power of two, so that multiplication by a power of α reduces to a word shift. Third, in order to facilitate arithmetic mod M, M should also have a binary representation with very few bits and should be large enough to prevent overflow.
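To illustrate the second and third requirements, here is a Python sketch (our own software illustration, not the hardware scheme of this paper) of why a modulus $2^b + 1$ and α a power of 2 avoid general multiplication. Since $2^b \equiv -1$, a number written in b-bit chunks $c_0, c_1, \ldots$ reduces to $c_0 - c_1 + c_2 - \cdots$, so reduction uses only masks, shifts, additions, and subtractions, and multiplying by $2^s$ is a left shift followed by that reduction. Function names are hypothetical.

```python
def mod_fermat(v, b):
    """Reduce v >= 0 modulo F = 2**b + 1 with shifts, masks, adds and subtracts only.

    Because 2**b = -1 (mod F), writing v in b-bit chunks c_0, c_1, ...
    gives v = c_0 - c_1 + c_2 - ... (mod F).
    """
    F = (1 << b) + 1
    mask = (1 << b) - 1
    while v >= F:
        acc, add = 0, True
        while v:
            chunk = v & mask
            acc = acc + chunk if add else acc - chunk
            v >>= b
            add = not add
        while acc < 0:          # bring the alternating sum back into range
            acc += F
        v = acc
    return v


def mul_by_power_of_two(v, s, b):
    """v * 2**s mod (2**b + 1): a left shift followed by the reduction above."""
    return mod_fermat(v << s, b)


print(mul_by_power_of_two(200, 5, 8))  # 232, the same as (200 * 32) % 257
```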
Although the class of all possible number theoretic transforms seems very large at first consideration, closer examination shows that very few seem to satisfy the aforementioned criteria. The parameters that must be chosen are M, N, and α. Unfortunately the conditions given by Theorems 1 and 2 do not give a systematic way of determining the "best" choices. As a result one must use intuition, insight, and a bit of searching. Usually an M is selected and the resulting possible N and α are then examined.

¹Since $\varphi(p_i^{r_i}) = p_i^{r_i} - p_i^{r_i-1} = p_i^{r_i-1}(p_i - 1)$ and N is relatively prime to M (or the $p_i$'s), N | φ($p_i^{r_i}$) implies N | ($p_i$ - 1).
²An example of a transform in $Z_M$ having the CCP but not the DFT structure was produced by G. Kopec at M.I.T. and led to this observation.
First we see that if M is even, it has a factor of 2 and, therefore, O(M) and $N_{max}$ are 1, which implies M should be odd. If M is a prime then O(M) = M - 1, which is as large as one could hope for in a field of M integers. For $M = 2^k - 1$, let k be a composite PQ, where P is prime. Then $2^P - 1$ divides $2^{PQ} - 1$, and the maximum possible length of the transform will be governed by the length possible for $2^P - 1$. Therefore, only prime k need be considered. Numbers of this form are known as Mersenne numbers, and Rader [14] has discussed convolution using Mersenne numbers in detail. For Mersenne number transforms, it can be shown that transforms of length at least 2P exist and the corresponding α is -2. Mersenne number transforms are not of as much interest because 2P is not highly composite and, therefore, we do not have fast FFT-type computational algorithms.

For $M = 2^k + 1$ and k odd, 3 divides $2^k + 1$ and the largest possible transform length is 2; thus we consider only k even. Let $k = s2^t$, where s is an odd integer. Then $2^{2^t} + 1$ divides $2^{s2^t} + 1$, and the length of the possible transform will be governed by the length possible for $2^{2^t} + 1$. Therefore, integers of the form $M = 2^{2^t} + 1$ are of interest. These numbers are known as Fermat numbers and will be discussed in detail in this paper. Fermat numbers seem to be optimum in the sense of having transforms whose length is interesting while the word size is moderate. Other numbers of the form $2^k + 1$ are also of limited interest and are discussed in Section IX. A systematic investigation of those M which require more than two 1 bits in their binary representation is difficult. Our preliminary investigation in that direction has not been very encouraging.

IV. FERMAT NUMBER TRANSFORMS
In this section, we consider one of the most promising number theoretic transforms, where the modulus is chosen to be a Fermat number
$$M = F_t = 2^b + 1, \qquad b = 2^t \qquad (27)$$
and $F_t$ is called the tth Fermat number. Originally, Fermat conjectured [16] that these numbers were all prime, but unfortunately not only was the conjecture wrong, it seems that only $F_0$ through $F_4$ are prime and all the others are composite. The first few values are:
  F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65537 (prime)
  F_5 = 4 294 967 297 = 641 × 6 700 417
  F_6 ≈ 1.84 × 10^19 = 274 177 × 67 280 421 310 721.

Number theoretic transforms with a Fermat number as a modulus are called Fermat number transforms (FNT). As discussed in the last section, for the FNT of length N to exist, N must divide $O(F_t) = N_{max}$. Now, we consider transform lengths possible in arithmetic modulo various Fermat numbers and also give the corresponding values of α.

Since Fermat numbers up to $F_4$ are prime, $O(F_t) = 2^b$, and we can have an FNT for any length $N = 2^m$, m ≤ b. For these Fermat primes the integer 3 is an α of order $N = 2^b$, allowing the largest possible transform length. There are $2^{b-1} - 1$ other integers which are also of order $2^b$ and can be obtained from (13). The integer 2 is of order $N = 2b = 2^{t+1}$. If α is taken as 2 or a power of 2, all the powers of α are powers of 2, and for these cases, as discussed in the last section and in [14], the FNT can be computed very efficiently and is called the Rader transform (RT).

To better see the character of these prime moduli, consider an example for $F_2$ similar in manner to that in Section II. If the modulus is M = $F_2$ = 17 then

  α^N (mod 17)
  N =   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  2     1  2  4  8 16 15 13  9  1  2  4  8 16 15 13  9  1
  3     1  3  9 10 13  5 15 11 16 14  8  7  4 12  2  6  1
  4     1  4 16 13  1  4 16 13  1  4 16 13  1  4 16 13  1
  6     1  6  2 12  4  7  8 14 16 11 15  5 13 10  9  3  1

Here we see that 3 and 6 are primitive roots that will generate the entire field $Z_{17}$. The value 2 is of order 8 and 4 is of order 4. Also note that 6 = √2 in the sense that $6^2 \equiv 2 \pmod{17}$. From this example it is seen that α = √2 = 6 for M = $F_2$ gives the maximum possible N.

For digital filtering applications, the composites $F_5$ (b = 32) and $F_6$ (b = 64) also seem to be practical. Lucas [19] has proven that every prime factor of a composite $F_t$ is of the form $K2^{t+2} + 1$. Therefore, $2^{t+2}$ divides $O(F_t)$ for t > 4. In particular it can be verified that for $F_5$ and $F_6$, $O(F_t) = 2^{t+2}$. Therefore, for these choices of Fermat numbers, the maximum possible transform length is $N = 2^{t+2} = 4b$. Also, we assert that $\alpha_{4b}$ given by (28) is of order 4b in $Z_{F_t}$, t ≥ 2:
$$\alpha_{4b} = 2^{b/4}\left(2^{b/2} - 1\right). \qquad (28)$$
We denote this $\alpha_{4b}$ as √2 because $\alpha_{4b}^2 \equiv 2 \pmod{F_t}$. The proof that $\alpha_{4b}$ given by (28) is of order 4b with respect to any factor of $F_t$ is given in [9]. Any odd power of √2 will also be of order $2^{t+2}$. By raising √2 to the $(2^{t+2-m})$th power, we obtain an integer α of order $2^m$, m ≤ t + 2.

Table I below gives values of N for the two most important values of α and also gives the maximum possible N for the most practical values of b. For FNT's with a prime or composite modulus, we see that α = 2 or a power of 2 is possible for sequence lengths up to $N = 2b = 2^{t+1}$. This is a very desirable situation, since N is highly composite, allowing an FFT-type algorithm, and all multiplications by powers of α are simple word shifts. If α = √2 is used, then sequences of length $N = 4b = 2^{t+2}$ are possible, but one stage in the FFT algorithm will require two shifts [9]. This α = √2 and the resulting N = 4b give the maximum length possible for $F_5$ and $F_6$; however, for prime $F_t$ further increases in N are possible, up to $N = 2^b$, if more stages of the FFT algorithm are allowed to have multiplication rather than simple word shifts.
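The order claims of this section are easy to check numerically for the moduli discussed. This Python sketch (hypothetical names) uses the fact that for n a power of two, a has order exactly n if and only if $a^n \equiv 1$ and $a^{n/2} \not\equiv 1$. It checks that 2 has order 2b modulo $F_t$, that 3 has order $2^b$ for the Fermat primes $F_1$ to $F_4$, and that $\alpha_{4b}$ from (28) squares to 2 and has order 4b for t = 2, ..., 6. A final helper checks the Mersenne claim that -2 has order 2P modulo $2^P - 1$ for an odd prime P.

```python
def fermat(t):
    """F_t = 2**b + 1 with b = 2**t, as in (27)."""
    return (1 << (1 << t)) + 1


def has_power_of_two_order(a, n, M):
    """True if a has order exactly n modulo M, for n a power of two."""
    return pow(a, n, M) == 1 and pow(a, n // 2, M) != 1


def sqrt2(t):
    """alpha_4b = 2**(b/4) * (2**(b/2) - 1) mod F_t from (28); needs t >= 2."""
    b = 1 << t
    return ((1 << (b // 4)) * ((1 << (b // 2)) - 1)) % fermat(t)


def mersenne_minus_two_has_order_2p(p):
    """Check that -2 has order 2p modulo 2**p - 1, for p an odd prime."""
    M = (1 << p) - 1
    a = M - 2                       # -2 represented as a residue
    return pow(a, 2 * p, M) == 1 and pow(a, 2, M) != 1 and pow(a, p, M) != 1


print(sqrt2(2), sqrt2(3))  # 6 60
```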
TABLE I
PARAMETERS FOR SEVERAL POSSIBLE IMPLEMENTATIONS OF FNT's

  t   b    F_t        N for α = 2*   N for α = √2   N_max    α for N_max
  3   8    2^8 + 1    16             32             256      3
  4   16   2^16 + 1   32             64             65536    3
  5   32   2^32 + 1   64             128            128      √2
  6   64   2^64 + 1   128            256            256      √2

*The α = 2 case corresponds to the Rader transform.
Because of the nature of modular arithmetic discussed in Section II, the FNT coefficients do not seem to have any physical meaning. Although the signal for which the FNT is being taken may be very small, its FNT coefficients may lie anywhere between 0 and $F_t - 1$. This is because the concept of magnitude does not exist. This also means that the concept of "closeness" of two numbers does not exist in modular arithmetic. Therefore, approximations or roundings are not allowed in modular arithmetic: a seemingly small approximation in the transform domain may introduce serious error in the final result. But, because of the nature of modular arithmetic, there is no need for approximation. During various stages of the computation each accumulation of signal "overflows" many times, but the end result of the convolution will still be exact if the input signals are properly bounded. Some of the properties of the FNT's are given in [9, appendix A].
Example
To make the ideas of this section clearer, we now present an example. This example will illustrate several points: treatment of negative values in the data, the structure of the transform and the inverse transform matrix, negative powers of α, frequent "overflow" during computation, meaninglessness of the transform values, and exactness of the final answer. This example will not demonstrate the efficient implementation of the FNT using binary arithmetic.
Consider two sequences x = (2, -2, 1, 0) and h = (1, 2, 0, 0), whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2$ = 17. We want N = 4; for $F_2$ the integer 2 is of order 8, therefore $2^2 = 4$ is an α of order 4.
The transformation matrix T is given by
$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \pmod{17}.$$
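Since $4^{-1} \equiv -4 \pmod{17}$, the inverse transformation matrix $T^{-1}$ is given by
$$T^{-1} = 4^{-1}\left[4^{-nk}\right] = -4\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = 13\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix} \pmod{17}.$$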
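The transforms of x and h are given by
$$X = T x = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ 15 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \\ 5 \\ 9 \end{bmatrix} \pmod{17}.$$
Note that in x, -2 was represented by -2 + 17 = 15. Similarly, H = (3, 9, 16, 10), and the term-by-term product is
$$Y = X \cdot H = (3, 90, 80, 90) \equiv (3, 5, 12, 5) \pmod{17}.$$
Taking the inverse transform of Y,
$$y = T^{-1}Y = (2, 2, 14, 2) \pmod{17}.$$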
According to our assumption, integers are supposed to lie between -8 and 8. Therefore, 14 must be represented as 14 - 17 = -3. This gives y = (2, 2, -3, 2), which is the correct answer. Also, note that y is a symmetric sequence; therefore, Y is also a symmetric sequence. Other than this, the transform values seem to have no interpretation.
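The whole example can be replayed in Python as a check on the matrices and intermediate values. This sketch follows the steps above literally (matrix-vector products reduced mod 17), not an efficient FFT-type FNT; the variable names are our own.

```python
M, N, alpha = 17, 4, 4          # F_2 = 17; 4 = 2**2 has order 4


def matvec(A, v):
    """Matrix-vector product with every entry reduced mod M."""
    return [sum(a * b for a, b in zip(row, v)) % M for row in A]


T = [[pow(alpha, n * k, M) for n in range(N)] for k in range(N)]
N_inv = pow(N, -1, M)                  # 13, i.e. -4 (mod 17)
alpha_inv = pow(alpha, -1, M)          # also 13
T_inv = [[N_inv * pow(alpha_inv, n * k, M) % M for k in range(N)] for n in range(N)]

x = [2, -2, 1, 0]
h = [1, 2, 0, 0]
X = matvec(T, [v % M for v in x])      # -2 is stored as 15
H = matvec(T, h)
Y = [(a * b) % M for a, b in zip(X, H)]
y_mod = matvec(T_inv, Y)
y = [v - M if v > M // 2 else v for v in y_mod]   # map back to -8..8

print(X, H, Y, y_mod, y)
# [1, 10, 5, 9] [3, 9, 16, 10] [3, 5, 12, 5] [2, 2, 14, 2] [2, 2, -3, 2]
```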
For many applications, a direct application of the FNT to implement convolution will result in a significant improvement over alternative methods. There are, however, many other situations where the constraints of the transform are too severe. If data magnitude or machine constraints dictate a certain word length, and hence a certain $F_t$, the allowed sequence length $N$ may be too short. If the input data magnitude and filter length indicate a possible output magnitude that would exceed $F_t/2$, then overflow becomes a problem. If $b$-bit words are used with a modulus of $F_t = 2^b + 1$, the machine can represent $2^b$ integers but the transform needs $2^b + 1$. Arithmetic mod $F_t$ can be implemented using a $b = 2^t$ bit representation of integers with some provision for representing $2^b$. We now consider several partial solutions to these problems.

V. METHODS FOR CONVOLVING LONG SEQUENCES AND FOR AVOIDING OVERFLOW

We have seen that the maximum length of sequences which can be cyclically convolved using the FNT with $\alpha = 2$ is $N = 2b$, and therefore the length of sequences which can be convolved is proportional to the word length in bits. Thus, for long sequences, the word length requirement may be excessive. Rader [14] suggested using a two-dimensional convolution scheme to convolve long one-dimensional sequences, and Agarwal and Burrus [15], [23] presented such a two-dimensional convolution scheme. Using this scheme, cyclic convolution of length $N = LP$ is implemented as a two-dimensional cyclic convolution of length $2L$ by $P$. This two-dimensional cyclic convolution can be implemented using a two-dimensional FNT [15], [23] defined similarly to the one-dimensional transform. Using this two-dimensional scheme, the word length required is proportional to the square root of the length of the sequences to be convolved, which would give a maximum sequence length of $8b^2$ rather than $4b$. If $P$ is taken as the maximum possible length $4b$, and $2L$ is a small integer, then either direct convolution or another high-speed algorithm [23] could be employed to compute the convolution along the short dimension, and the one-dimensional FNT could be used along the long dimension. Computationally this combination can be very efficient, as will be shown in the implementation in Section VIII. Table II lists the maximum lengths for two-dimensional transforms. This approach requires approximately a factor of two increase in computation and storage requirements over a direct one-dimensional implementation.

TABLE II
MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTHS USING TWO-DIMENSIONAL FNT OR RT

Word length b    N or Q (α = 2)    N or Q (α = √2)
16               512               2048
32               2048              8192
64               8192              32768
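The index bookkeeping behind the $N = LP$ to $2L \times P$ mapping can be checked with a small sketch. This is our own illustration, assuming the index map $n = l + Lp$ ($0 \le l < L$, $0 \le p < P$); the published scheme [15], [23] may order or pad the indices differently, and plain direct sums stand in for the two-dimensional FNT.

```python
def cyclic_conv(x, h):
    """Reference 1-D cyclic convolution by direct summation."""
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


def to_2d(x, h, L, P):
    """Map length-(L*P) x and h onto 2L-by-P arrays (assumed index map n = l + L*p)."""
    N = L * P
    # Input: the first L rows hold the data, the last L rows are zero padding.
    xa = [[x[l + L * p] if l < L else 0 for p in range(P)] for l in range(2 * L)]
    # Filter: row j stands for the signed row offset d = j (j < L) or d = j - 2L.
    ha = [[h[((j if j < L else j - 2 * L) + L * q) % N] for q in range(P)]
          for j in range(2 * L)]
    return xa, ha


def cyclic_conv_2d(a, b):
    """2-D cyclic convolution by direct summation (a stand-in for a 2-D FNT)."""
    R, C = len(a), len(a[0])
    return [[sum(a[i][j] * b[(r - i) % R][(c - j) % C]
                 for i in range(R) for j in range(C))
             for c in range(C)] for r in range(R)]


def conv_via_2d(x, h, L, P):
    xa, ha = to_2d(x, h, L, P)
    ya = cyclic_conv_2d(xa, ha)
    # Only rows 0..L-1 are used: y(l + L*p) = ya[l][p].
    return [ya[n % L][n // L] for n in range(L * P)]


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8]
h = [2, 7, -1, 8, 2, -8, 1, 8, 2, 8, -4, 5]
print(conv_via_2d(x, h, L=3, P=4) == cyclic_conv(x, h))   # True
```

The zero padding to $2L$ rows is what lets a negative row difference $l - l'$ wrap onto its own rows of the filter array instead of aliasing onto a positive one; the $P$ dimension is the one that would use a long FNT.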
Another approach to achieving longer transforms than allowed with $\alpha = 2$ is to use $\alpha = \sqrt{2}$ in (28). It can be shown [9], [15] that $\alpha = \sqrt{2}$ is of order $N = 4b = 2^{t+2}$ and that multiplication by $\sqrt{2}$ requires two word shifts. Examination of the FFT algorithm [1] shows that if $\alpha = \sqrt{2}$ is used, only one stage will require multiplication by odd powers of $\sqrt{2}$, and from [9] it is shown that this can be done with two word shifts. The other stages will multiply by even powers of $\sqrt{2}$ and, therefore, use a single shift, as for the case with $\alpha = 2$. This modification is relatively simple and allows a doubling of the allowed $N$. Note in the example for $F_2$ that $\sqrt{2} \equiv 6 \pmod{17}$, which also gives $N = N_{\max}$.

Each additional square root of $\alpha$ results in a doubling of the allowed $N$ (up to $N = N_{\max}$) and adds an additional stage of calculation to the FFT algorithm. Unfortunately, beyond $\sqrt{2}$ each stage will require a general multiplication. For the case where $F_t$ is a prime, if a few stages of multiplication can be allowed, then $N$ can be increased [25]. For $F_5$ and $F_6$, $\alpha = \sqrt{2}$ gives the maximum $N$.

This use of $\alpha = \sqrt{2}$ can be combined with the two-dimensional methods to give various desired $N$.
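The $\sqrt{2}$ claims can be checked numerically. The sketch below assumes the closed form $\sqrt{2} \equiv 2^{b/4}(2^{b/2} - 1) \pmod{F_t}$, valid for $t \ge 2$ (squaring it and using $2^b \equiv -1$ gives 2). Because that constant equals $2^{3b/4} - 2^{b/4}$, multiplying by it is a shift by $3b/4$ minus a shift by $b/4$, which is where the two word shifts come from. The helper names are ours.

```python
def order(a, m, limit):
    """Smallest k in 1..limit with a**k == 1 (mod m), or None if there is none."""
    v = 1
    for k in range(1, limit + 1):
        v = v * a % m
        if v == 1:
            return k
    return None


def sqrt2_mod_fermat(t):
    """Return (b, F_t, s) with s**2 == 2 (mod F_t); needs t >= 2 so that 4 | b."""
    b = 2 ** t
    f = 2 ** b + 1
    s = 2 ** (b // 4) * (2 ** (b // 2) - 1) % f
    return b, f, s


for t in range(2, 7):
    b, f, s = sqrt2_mod_fermat(t)
    # s * x can be formed as (x << 3b/4) - (x << b/4): two shifts and a subtraction.
    print(t, s * s % f, order(2, f, 4 * b), order(s, f, 4 * b))
```

Each line should show the square equal to 2, the order of 2 equal to $2b$, and the order of $\sqrt{2}$ equal to $4b$; for $t = 2$ the constant is 6, matching the $F_2$ remark above.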
Another possible problem arises because of the modular arithmetic. In the ring of integers mod $M$, conventional integers can be unambiguously represented only if their absolute value is less than $M/2$. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds $M/2$, we would get the same results by implementing convolution in the ring of integers modulo $M$ as those obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also, the maximum magnitude of the input signal is usually known. In this situation, we can bound the peak output magnitude by

$$|y(n)| \le \max_m |x(m)| \sum_{k=0}^{N-1} |h(k)|. \qquad (29)$$

This may well require a longer word length than is possible or practical.
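Bound (29) translates directly into a choice of modulus. The sketch below, with a hypothetical filter and input range chosen only for illustration, computes the bound and picks the smallest Fermat number $F_t$ whose symmetric range $(-F_t/2, F_t/2)$ contains it. The function names are our own.

```python
def peak_output_bound(x_max, h):
    """Right-hand side of (29): max|x(m)| * sum_k |h(k)|."""
    return x_max * sum(abs(v) for v in h)


def smallest_fermat_modulus(bound, t_max=6):
    """Smallest F_t = 2**(2**t) + 1 with bound < F_t / 2, or None if t_max is too small."""
    for t in range(t_max + 1):
        f = 2 ** (2 ** t) + 1
        if 2 * bound < f:          # integer form of bound < F_t / 2
            return t, f
    return None


h = [3, -5, 7, -5, 3]              # hypothetical 5-tap impulse response
bound = peak_output_bound(100, h)  # inputs assumed to satisfy |x(n)| <= 100
print(bound, smallest_fermat_modulus(bound))   # 2300 (4, 65537)
```

A `None` result is the situation the text warns about: the bound does not fit any practical word length, which motivates the splitting and Chinese-remainder methods that follow.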
One possible solution to this overflow problem involves segmenting the words into shorter blocks and convolving them separately [9]:

$$x(n) = x_2(n) + x_1(n)\,2^k, \qquad |x_2(n)| < 2^k$$
$$h(n) = h_2(n) + h_1(n)\,2^k, \qquad |h_2(n)| < 2^k \qquad (30)$$
$$y = x * h = (x_1 * h_1)\,2^{2k} + (x_1 * h_2 + x_2 * h_1)\,2^k + x_2 * h_2. \qquad (31)$$

Now, since $x_1$, $h_1$, $x_2$, and $h_2$ have roughly half the number of bits, it should be possible to convolve them using approximately half the number of bits. If necessary, a more precise analysis of the above situation could easily be performed. In (31), the last term is very small in comparison to the first term and can be neglected. We need to take two transforms for $x$ and two transforms for $h$; the summation shown within the parentheses can be performed in the transform domain. Finally, we need to take two inverse transforms, one for $x_1 * h_1$ and the other for $(x_1 * h_2 + x_2 * h_1)$.

There is another alternative to this problem, suggested by Rader [24] and Parks [25]. This is based on the Chinese remainder theorem. The convolution is done modulo two different integers $M_1$ and $M_2$, where $M_1$ and $M_2$ are such that the cyclic convolutions in $Z_{M_1}$ and $Z_{M_2}$ are easily implemented on the same machine. The final result mod $M_1 M_2$ is obtained by the Chinese remainder theorem. $M_1$ is usually a Fermat number, and the cyclic convolution in $Z_{M_1}$ is computed using the FNT. Rader [24] suggested using a power of 2 for $M_2$; in that case the convolution in $Z_{M_2}$ is computed by taking the sequences mod $M_2$, convolving them in $Z_{M_1}$ using the FNT, and then reducing the result back to $Z_{M_2}$. In this case $M_2$ should be small enough so that no error is introduced by implementing the convolution mod $M_2$ in $Z_{M_1}$. This requires

$$N M_2^2 < M_1. \qquad (32)$$

Parks [25] suggested that $M_2$ could be a Fermat number just smaller than $M_1$. In this case the convolution in $Z_{M_2}$ can also be done using an FNT. Furthermore, the same machine can be utilized to compute the FNT in $Z_{M_2}$: because $M_2 \mid (M_1 - 2)$, arithmetic in $Z_{M_2}$ can be carried out in $Z_{M_1 - 2}$.

Example: $M_1 = 2^{16} + 1$, $M_1 - 2 = 2^{16} - 1$, and $M_2 = 2^8 + 1$. These ideas can be further extended.
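Both overflow remedies can be exercised on a toy case. In the sketch below (ours), direct $O(N^2)$ cyclic convolutions stand in for the FNTs so that only the bookkeeping is visible; the split point `k` and the sample data are arbitrary, and the moduli follow the example $M_1 = 2^{16}+1$, $M_2 = 2^8+1$. Splitting keeps all three terms of (31) by default, while `keep_last_term=False` drops $x_2 * h_2$ as the text suggests. Python 3.8 or later is assumed for `pow(m1, -1, m2)`.

```python
def cyclic_conv(x, h, mod=None):
    """Direct length-N cyclic convolution, optionally reduced mod `mod`."""
    n = len(x)
    y = [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]
    return [v % mod for v in y] if mod else y


# (a) Word splitting, eqs. (30)-(31).
def split(v, k):
    """Return (low, high) with v == low + high * 2**k and 0 <= low < 2**k."""
    high = v >> k                  # floor division, so negative samples work too
    return v - (high << k), high


def conv_by_splitting(x, h, k, keep_last_term=True):
    x2, x1 = zip(*(split(v, k) for v in x))
    h2, h1 = zip(*(split(v, k) for v in h))
    top = cyclic_conv(x1, h1)
    mid = [a + b for a, b in zip(cyclic_conv(x1, h2), cyclic_conv(x2, h1))]
    low = cyclic_conv(x2, h2) if keep_last_term else [0] * len(x)
    return [(a << 2 * k) + (b << k) + c for a, b, c in zip(top, mid, low)]


# (b) Chinese remainder theorem with two coprime moduli.
def crt_pair(r1, m1, r2, m2):
    """The unique r in [0, m1*m2) with r == r1 (mod m1) and r == r2 (mod m2)."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t


def conv_by_crt(x, h, m1=2 ** 16 + 1, m2=2 ** 8 + 1):
    y1 = cyclic_conv([v % m1 for v in x], [v % m1 for v in h], m1)
    y2 = cyclic_conv([v % m2 for v in x], [v % m2 for v in h], m2)
    m = m1 * m2
    out = []
    for a, b in zip(y1, y2):
        r = crt_pair(a, m1, b, m2)
        out.append(r - m if 2 * r > m else r)   # back to (-m/2, m/2)
    return out


x = [120, -340, 250, 90]
h = [-150, 220, 70, -130]
exact = cyclic_conv(x, h)           # [63500, 51200, -115600, 2100]
print(conv_by_splitting(x, h, k=8) == exact, conv_by_crt(x, h) == exact)   # True True
```

With these data the output reaches $-115600$, beyond the $\pm 32768$ that $2^{16}+1$ alone can represent, yet well inside $M_1 M_2 / 2$; dropping the last term of (31) perturbs each output by less than $N \cdot 2^{2k}$.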
Still another approach to overcoming the sequence length $N$ and word length constraints would be to use block processing [3]. By breaking the sequence of length $N$ into smaller blocks, and scaling and processing them separately with the FNT, one can combine the results to get the desired output. This can be viewed as a type of two-dimensional processing.
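Sectioned convolution can be sketched with overlap-add, one common block method. We substitute it here for illustration only; the block structures of [3] may differ. The sketch assumes the modulus $F_3 = 257$ with $\alpha = 2$ (order 16), a direct 16-point transform in place of a fast one, and input and filter ranges small enough that every block output stays within $\pm 128$. Only the per-block results have to fit the modulus, because the overlapping tails are added in ordinary integer arithmetic. Python 3.8 or later is assumed for `pow(a, -1, m)`.

```python
F3 = 257      # 2**8 + 1
ALPHA = 2     # order 16 modulo 257
L = 16        # transform (block output) length


def fnt(seq, root):
    """Direct length-L transform modulo F3."""
    return [sum(v * pow(root, n * k, F3) for n, v in enumerate(seq)) % F3
            for k in range(L)]


def cyclic_conv_fnt(a, b):
    """Length-L cyclic convolution via the transform, residues in [0, F3)."""
    prod_ = [p * q % F3 for p, q in zip(fnt(a, ALPHA), fnt(b, ALPHA))]
    inv_l = pow(L, -1, F3)
    return [v * inv_l % F3 for v in fnt(prod_, pow(ALPHA, -1, F3))]


def to_signed(v):
    return v - F3 if v > F3 // 2 else v


def overlap_add(x, h):
    """Linear convolution of a long x with a short h, one L-point block at a time."""
    step = L - len(h) + 1                       # a block's linear conv fits in L points
    hp = [v % F3 for v in h] + [0] * (L - len(h))
    y = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), step):
        seg = [v % F3 for v in x[start:start + step]]
        seg += [0] * (L - len(seg))
        for i, v in enumerate(cyclic_conv_fnt(seg, hp)):
            if start + i < len(y):
                y[start + i] += to_signed(v)    # overlapping tails add as integers
    return y


def linear_conv(x, h):
    y = [0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            y[i + j] += a * b
    return y


h = [1, 2, 3, 2, 1]
x = [(7 * n) % 11 - 5 for n in range(40)]      # samples in [-5, 5]
print(overlap_add(x, h) == linear_conv(x, h))   # True
```

The sequence length is no longer tied to the transform length: 40 samples are filtered with 16-point transforms, at the cost of some repeated work on the overlapping block edges.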
VI. OVERFLOW AND QUANTIZATION CONSIDERATIONS

As mentioned in Section V, we could perform cyclic convolution modulo an integer $M$ and obtain the correct result if the absolute value of the output never exceeds $M/2$. If this condition is violated, the resulting error is rather serious. Because of the nature of the modular arithmetic, we obtain folding, or aliasing, of the signal amplitude. This situation could be avoided if the signals are properly quantized (or normalized).
Let $x(n)$ and $h(n)$ represent the original signals. They may have fractional parts (bits to the right of the binary point). To make use of the number theoretic transforms, these sequences must be integer sequences. This is easily accomplished by merely shifting the binary point all the way to the right. This introduces scale factors in the sequences. The integer sequences $\tilde{x}(n)$ and $\tilde{h}(n)$ are given by

$$\tilde{x}(n) = x(n)\,2^{b_1} \qquad (33)$$
$$\tilde{h}(n) = h(n)\,2^{b_2} \qquad (34)$$

where $b_1$ and $b_2$, respectively, represent the number of bits to the right of the binary point in the $x(n)$ and $h(n)$ sequences. Then

$$\tilde{y}(n) = \tilde{x}(n) * \tilde{h}(n) = 2^{b_1 + b_2}\, x(n) * h(n) = 2^{b_1 + b_2}\, y(n). \qquad (35)$$
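Equations (33) to (35) amount to doing the work in integers and restoring the scale at the end. A sketch using Python's `fractions.Fraction` to hold the original fixed-point samples; the data and the values $b_1 = 3$, $b_2 = 2$ are made up for the example, and a direct cyclic convolution stands in for the transform-based one.

```python
from fractions import Fraction


def to_integers(seq, frac_bits):
    """Shift the binary point frac_bits places right, as in (33)-(34)."""
    scaled = [v * 2 ** frac_bits for v in seq]
    if any(v.denominator != 1 for v in scaled):
        raise ValueError("a sample has more than frac_bits fractional bits")
    return [int(v) for v in scaled]


def cyclic_conv(x, h):
    n = len(x)
    return [sum(x[m] * h[(k - m) % n] for m in range(n)) for k in range(n)]


b1, b2 = 3, 2
x = [Fraction(3, 4), Fraction(-1, 2), Fraction(5, 8), Fraction(0)]    # 3 fractional bits
h = [Fraction(1, 2), Fraction(1, 4), Fraction(-3, 4), Fraction(1, 4)]  # 2 fractional bits

y_tilde = cyclic_conv(to_integers(x, b1), to_integers(h, b2))  # integer-only stage
y = [Fraction(v, 2 ** (b1 + b2)) for v in y_tilde]             # undo 2**(b1+b2), eq. (35)
print(y_tilde)   # [-7, 3, -12, 23]
```

The integer stage is where an FNT would run; its outputs must still satisfy $|\tilde{y}(n)| < M/2$, which is why the bounds that follow are stated for $\tilde{y}(n)$.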
Now we give some upper bounds on the output $\tilde{y}(n)$. These are due to Jackson [21]. The $L_p$ norm of a signal is defined by

$$\|x\|_p = \Big(\sum_{n=0}^{N-1} |x(n)|^p\Big)^{1/p}.$$

The output of the cyclic convolution is bounded by $|\tilde{y}(n)|$
TABLE III
CYCLIC CONVOLUTION TIMINGS FOR LENGTH-N REAL SEQUENCES

N       FFT (ms)    FNT or RT (ms)
32      16          3.3
64      31
128                 16.6^a
256                 80.0^c
512     245         166.0^c
1024    530         340.0^c
2048                720.0^c

^a Using α = √2.
^c Using two-dimensional RT.
2 by 128 convolution.
fast FNT, unlike the FFT, we do not need to store the powers of $\alpha$ (if $\alpha$ is taken as 2 or a power of 2).
VIII. COMPARISON WITH THE FFT

As noted in the previous section, computing the FNT is a very simple operation on a binary machine. Now let us compare the complexity of various basic operations involved in computing the FNT vis-à-vis the FFT. If the two sequences $x(n)$ and $h(n)$ have $b_1$- and $b_2$-bit representations, respectively, and are of length $N$, then the output $y(n)$ would need no more than a $(b_1 + b_2 + \log_2 N)$-bit representation. To obtain the correct result,

$$b \ge b_1 + b_2 + \log_2 N.$$

In Section V, we have given a better bound on the output. In Section VI, we have given other bounds. Roughly speaking, we need twice the number of bits to carry out the convolution using the FNT as compared to the fixed-point FFT implementation of the convolution. But in the DFT, every data point is treated as a complex number and therefore requires two words, one for the real part and one for the imaginary part. Thus, in effect, the hardware requirement for the two transforms is about the same. Although for real data it is possible to make use of the symmetry properties of the DFT, this requires extra computation, and for the purpose of comparison it will be ignored, even though we have taken it into account for our IBM 370/155 implementation to be discussed later. Therefore, we shall assume that in the FFT implementation each data point is represented by a $b/2$-bit real part and a $b/2$-bit imaginary part.
One $b/2$-bit complex addition is equivalent to two $b/2$-bit real additions, which are comparable to a $b$-bit addition mod $F_t$. Thus, the complexity of addition/subtraction is the same in both transforms. Similarly, it can be shown that a $b/2$-bit complex multiplication is comparable to a $b$-bit multiplication mod $F_t$. Computation of the RT requires multiplications by powers of 2, which, implemented as bit shifts and subtractions, become much simpler operations compared to the complex multiplications required in the FFT implementation.
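The shift-and-subtract claim rests on $2^b \equiv -1 \pmod{F_t}$: the bits that a left shift pushes above position $b$ are subtracted from the low $b$ bits rather than divided out. A minimal sketch, assuming residues are kept in $[0, F_t)$ (so the value $2^b$ itself needs the extra bit mentioned in Section IV); the function name is ours, and Python's unbounded integers hide the word-length bookkeeping a machine implementation would need.

```python
def mul_pow2_mod_fermat(x, k, b):
    """Return x * 2**k mod F = 2**b + 1 for 0 <= x < F, using shifts and a subtraction.

    Since 2**b == -1 (mod F), bits shifted above position b are subtracted
    from the low b bits instead of being divided out.
    """
    f = (1 << b) + 1
    k %= 2 * b                       # 2 has order 2b, so only k mod 2b matters
    if k >= b:                       # 2**k = -2**(k - b): fold the sign into x
        x, k = (f - x) % f, k - b
    v = x << k
    low, high = v & ((1 << b) - 1), v >> b
    r = low - high                   # v == low + high * 2**b == low - high (mod F)
    return r + f if r < 0 else r


# Example with F_2 = 17 (b = 4): 5 * 2**6 = 320 == 14 (mod 17)
print(mul_pow2_mod_fermat(5, 6, 4))  # 14
```

Negative exponents need no special case: $2^{-k} = 2^{2b-k}$, so division by a power of 2 is also a shift.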
To compute a length-$N$ fast RT, $N \log_2 N$ additions/subtractions and $(N/2)\log_2(N/2)$ multiplications by powers of 2 are required, which are implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. The computation required to multiply the two transforms is about the same for both implementations. To convolve long sequences using the two-dimensional RT, the computational effort and required storage increase by, at most, a factor of 2. Still, the FNT implementation of convolution is much faster than the FFT implementation.
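Putting the pieces together, the sketch below is a textbook iterative radix-2 (decimation-in-time) transform modulo $F_t = 2^b + 1$ in which every twiddle factor is a power of 2, applied through a shift-and-subtract routine. Only the pointwise product of the two transforms uses general multiplications, and the final scaling by $N^{-1} = 2^{-\log_2 N}$ is itself a shift. It is our illustration of the operation counts discussed above, not the IBM 370/155 assembler program, and it assumes $N$ is a power of 2 dividing $2b$, so the kernel is $\alpha = 2^{2b/N}$.

```python
def mul_pow2(x, k, b):
    """x * 2**k mod 2**b + 1 with shifts, a mask and one subtraction (0 <= x <= 2**b)."""
    f = (1 << b) + 1
    k %= 2 * b
    if k >= b:                       # 2**b == -1 (mod F_t)
        x, k = (f - x) % f, k - b
    v = x << k
    r = (v & ((1 << b) - 1)) - (v >> b)
    return r + f if r < 0 else r


def fnt_radix2(a, b, inverse=False):
    """Length-n transform mod 2**b + 1 with alpha = 2**(2b/n); n a power of 2 dividing 2b."""
    f = (1 << b) + 1
    a = list(a)
    n = len(a)
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    m = 2
    while m <= n:                    # butterflies of span m use twiddle 2**(2b/m)
        half, step = m // 2, 2 * b // m
        for start in range(0, n, m):
            for i in range(half):
                e = -i * step if inverse else i * step
                t = mul_pow2(a[start + i + half], e, b)
                u = a[start + i]
                a[start + i] = (u + t) % f
                a[start + i + half] = (u - t) % f
        m *= 2
    return a


def fnt_cyclic_conv(x, h, b):
    f = (1 << b) + 1
    n = len(x)
    X = fnt_radix2([v % f for v in x], b)
    H = fnt_radix2([v % f for v in h], b)
    Y = [p * q % f for p, q in zip(X, H)]      # the only general multiplications
    s = n.bit_length() - 1                     # n == 2**s, so n**-1 == 2**-s
    y = [mul_pow2(v, -s, b) for v in fnt_radix2(Y, b, inverse=True)]
    return [v - f if 2 * v > f else v for v in y]


x = [(5 * n) % 9 - 4 for n in range(32)]
h = [(3 * n + 1) % 7 - 3 for n in range(32)]
direct = [sum(x[m] * h[(k - m) % 32] for m in range(32)) for k in range(32)]
print(fnt_cyclic_conv(x, h, 16) == direct)      # True, using F_4 = 65537
```

With $b = 16$ and $N = 32 = 2b$ the kernel is exactly $\alpha = 2$; the input magnitudes are chosen so the outputs stay well inside $\pm F_4/2$.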
These transforms were implemented in assembler language on an IBM 370/155, which has a 32-bit word length [9]. The results were compared with an efficient FFT program for computing convolution which makes use of the symmetry of the DFT for real data (see Table III) [20].
IX. GENERALIZATIONS, VARIATIONS, AND OTHER RESULTS

A. Other Choices for M

In this paper, we have primarily discussed number theoretic transforms in the rings of integers modulo Fermat numbers. These numbers seem to be the best choice for implementation on binary computers. Nevertheless, any odd integer $M$ can be used, as was discussed in Section III. Rader [14] proposed the use of Mersenne numbers $M_p = 2^p - 1$, where $p$ is a prime. Mersenne number transforms for $\alpha = -2$ have $N = 2p$ and, therefore, do not have an FFT-type fast computational algorithm.
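The order claim for Mersenne moduli is easy to confirm: since $2^p \equiv 1 \pmod{M_p}$ and $p$ is prime, 2 has order $p$, and $(-2)^p \equiv -1$ then forces $-2$ to have order $2p$. The short check below (ours, with arbitrarily chosen prime exponents) also makes the drawback plain: $2p$ factors only as $2 \cdot p$, so no multistage radix-2 decomposition is available.

```python
def order(a, m):
    """Multiplicative order of a modulo m (a and m assumed coprime)."""
    v, k = a % m, 1
    while v != 1:
        v = v * a % m
        k += 1
    return k


for p in (5, 7, 13, 17, 19, 31):
    mp = 2 ** p - 1
    print(p, order(2, mp), order(-2, mp))   # p, p, 2p
```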
FNT's require the computing word length to be a power of 2. Many computers do not have a word length that is a power of 2. Transforms similar to FNT's exist for many of these situations. For example, on a 24-bit machine, we may perform convolution modulo $M = 2^{24} + 1$. For this $M$, $\alpha = 2^3$ gives $N = 16$, and $\alpha = 2^{3/2} = 2^7(2^{12} - 1)$ gives $N = 32$ as the maximum length. In general, one could take $M = 2^{s2^t} + 1$, where $s$ is an odd integer. In that case, $\alpha = 2^s$ would give $N = 2^{t+1}$, and

$$\alpha = (2^s)^{1/2} = 2^{(s-1)/2 + s2^{t-2}}\left(2^{s2^{t-1}} - 1\right)$$

would give $N = 2^{t+2}$. In many situations, it may be possible to have transforms of greater length than $2^{t+2}$. For example, taking $M = 2^{40} + 1$, the maximum possible transform length is 256, and taking $M = 2^{80} + 1$, the maximum possible transform length is 1024, but the corresponding $\alpha$'s may not be simple.

For many computers whose word length is not a power of 2, if the above formulation is used to compute transforms analogous to the FNT, the maximum transform length is very small. But if we are willing to sacrifice the "effective word length" to some extent, we can increase the maximum transform length significantly. Let $b$, a multiple of 4, be the word length of the machine. Let

$$M = 2^b + 1 = M_1 M_2. \qquad (47)$$

$M_1$ and $M_2$ may be nonprimes. It can be easily proved that

$$O(M) = \gcd\{O(M_1), O(M_2)\}. \qquad (48)$$
It may so happen that $M_1$ is a small integer and $O(M_2) \gg O(M)$; therefore, because of the presence of $M_1$, the maximum transform length is considerably reduced, while at the same time $M_1$ may not be increasing the maximum allowable output ($y_{\max}$) or the "effective word length" significantly. In this situation we can compute transforms in $Z_{M_2}$ with maximum transform length $O(M_2)$ and maximum allowable output magnitude $M_2/2$. At the expense of reduced output range, we have increased the transform length. Furthermore, arithmetic mod $M_2$ can be conveniently carried out as arithmetic mod $M$, because $M_2$ is a factor of $M$. At the end of the computation, we have to reduce the result mod $M_2$. Table IV shows this factorization of $M$ for several values of $b$; $\log_2 M_2$ shows the "effective word length" of the machine.
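The closed forms given above for $M = 2^{s2^t} + 1$ can be checked directly. The sketch below (names ours) builds $\alpha = 2^s$ and its square root from the formula, confirms that the square root squares to $2^s$, and measures both orders. It also computes the longest possible length for $M = 2^{40}+1$ as the gcd of $p - 1$ over the prime factors 257 and 4278255361 listed in Table IV.

```python
from math import gcd


def order(a, m, limit):
    """Smallest k in 1..limit with a**k == 1 (mod m), or None."""
    v = 1
    for k in range(1, limit + 1):
        v = v * a % m
        if v == 1:
            return k
    return None


def alphas(s, t):
    """For M = 2**(s * 2**t) + 1 (s odd, t >= 2): alpha = 2**s and a square root of it."""
    m = 2 ** (s * 2 ** t) + 1
    a1 = 2 ** s
    a2 = 2 ** ((s - 1) // 2 + s * 2 ** (t - 2)) * (2 ** (s * 2 ** (t - 1)) - 1) % m
    return m, a1, a2


m, a1, a2 = alphas(3, 3)                   # the 24-bit case, M = 2**24 + 1
print(a2 == 2 ** 7 * (2 ** 12 - 1), a2 * a2 % m == a1,
      order(a1, m, 64), order(a2, m, 64))  # True True 16 32

# Longest possible length for M = 2**40 + 1, from its prime factors.
p1, p2 = 257, 4278255361
assert p1 * p2 == 2 ** 40 + 1
print(gcd(p1 - 1, p2 - 1))                 # 256
```

Both $\alpha$'s are products of a power of 2 and $(2^{c} - 1)$, so multiplying by their powers still costs only shifts and subtractions; a root of order 256 modulo $2^{40}+1$ exists, but nothing here suggests it has such a simple form.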
TABLE IV
FACTORIZATION OF M = 2^b + 1 AS M1 · M2 AND THE MAXIMUM TRANSFORM LENGTH CORRESPONDING TO M2

Machine    O(M)   M1                 M2                         Effective word     Nmax = O(M2)   N for α = 2   N for α = √2
word                                                            length, log2 M2                   (= 2b)        (= 4b)
length b                                                        (approx.)
12         16     17                 241                        8                  240            24            48
20         16     17                 61681                      16                 61680          40            80
24         32     257                97 × 673                   16                 96             48            96
28         16     17                 15790321                   24                 15790320       56            112
36         16     17 × 241           433 × 38737                24                 144            72            144
40         256    257                4278255361                 32                 4278255360     80            160
48         64     65537              193 × 22253377             32                 192            96            192
56         32     257                5153 × 54410972897         48                 224            112           224
60         16     17 × 241 × 61681   4562284561                 32                 4562284560     120           240
72         32     97 × 257 × 673     577 × 487824887233         48                 576            144           288
80         1024   65537              414721 × 44479210368001    64                 15360          160           320
TABLE V
SOME PARAMETERS FOR NUMBER THEORETIC TRANSFORMS IN DECIMAL ARITHMETIC

Machine    O(M)   M1         M2                                 Effective digits,    Nmax = O(M2)   N for α = 10
digits b                                                        log10 M2 (approx.)
4          8      1          73 × 137                           4                    8              8
6          100    101        9901                               4                    9900           12
8          16     17         5882353                            7                    5882352        16
10         20     101        3541 × 27961                       8                    60             20
12         8      73 × 137   99990001                           8                    99990000       24
16         32     1          353 × 449 × 641 × 1409 × 69857     16                   32             32
Note that for the choices of $M_2$ shown in Table IV, 2 is always of order $2b$, and there also exists an integer $\sqrt{2} = 2^{b/4}(2^{b/2} - 1)$ which is of order $4b$ in $Z_{M_2}$. This leads to an efficient implementation of these transforms, because powers of $\alpha$ are simple and arithmetic mod $M$ is also simple. For these transforms the word length is not a power of 2, but the transform length $4b$ is still highly composite. The case $b = 60$ needs special attention: if $M_1$ is taken as 17, then for the remaining $M_2$, $O(M_2) = 240$, but $\sqrt{2}$ is not of order 240. This is because $\sqrt{2}$ is of order 48 in $Z_{241}$ and of order 80 in $Z_{61681}$. For this case, although one can find an integer $\alpha$ which is of order 240 in $Z_{M_2}$, it is not likely to be simple.
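The $b = 20$ row of Table IV illustrates the idea. The machine works modulo $M = 2^{20}+1 = 17 \times 61681$, where the longest available length is only $O(M) = 16$, yet with $\alpha = 2$ a length-40 transform is valid in $Z_{M_2}$, $M_2 = 61681$. In the sketch below (ours; a direct transform stands in for a fast one, and the data are arbitrary), all arithmetic is done mod $M$ and only the final output is reduced mod $M_2$, as the text prescribes. This works because reduction mod $M_2$ commutes with every step. Python 3.8 or later is assumed for `pow(a, -1, m)`.

```python
from math import gcd

B = 20
M = 2 ** B + 1            # native modulus for a 20-bit word
M1, M2 = 17, 61681        # factorization of M from Table IV (both prime)
N = 2 * B                 # alpha = 2 has order 2b = 40 modulo M2


def transform(seq, root):
    """Direct length-N transform, all arithmetic mod M."""
    return [sum(v * pow(root, i * k, M) for i, v in enumerate(seq)) % M
            for k in range(N)]


def conv_mod_m2(x, h, alpha=2):
    X = transform([v % M for v in x], alpha)
    H = transform([v % M for v in h], alpha)
    Y = [p * q % M for p, q in zip(X, H)]
    inv_n = pow(N, -1, M)
    y = [v * inv_n % M % M2 for v in transform(Y, pow(alpha, -1, M))]  # reduce mod M2 last
    return [v - M2 if 2 * v > M2 else v for v in y]


x = [(3 * n) % 41 - 20 for n in range(N)]
h = [(7 * n + 5) % 41 - 20 for n in range(N)]
direct = [sum(x[m] * h[(k - m) % N] for m in range(N)) for k in range(N)]
print(gcd(M1 - 1, M2 - 1), pow(2, N, M2), conv_mod_m2(x, h) == direct)  # 16 1 True
```

The outputs here stay below $M_2/2 \approx 30840$ in magnitude, which is the reduced output range the text trades for the longer transform.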
Thus far, our discussion was based on the assumption that the computer is a binary computer. Many computers have a decimal representation of integers, and for these computers it will be efficient if the arithmetic is done mod $M = 10^b + 1$ and $\alpha$ and powers of $\alpha$ are powers of 10. We have compiled Table V to be used for decimal computers. Similar tables can be compiled for other radices also.
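The entries of Table V follow from the listed factors. $O(M)$ is the gcd of $p - 1$ over all prime factors of $M$, $N_{\max}$ is the same gcd over the factors of $M_2$, and the length for $\alpha = 10$ is the order of 10 modulo $M_2$. A short check of that reasoning (ours), under the assumption that the factors listed in the table are prime; Python 3.8 or later is assumed for `math.prod`.

```python
from functools import reduce
from math import gcd, prod

ROWS = {                 # b: (prime factors of M1, prime factors of M2), from Table V
    4: ([], [73, 137]),
    6: ([101], [9901]),
    8: ([17], [5882353]),
    10: ([101], [3541, 27961]),
    12: ([73, 137], [99990001]),
    16: ([], [353, 449, 641, 1409, 69857]),
}


def order(a, m, limit):
    """Smallest k in 1..limit with a**k == 1 (mod m), or None."""
    v = 1
    for k in range(1, limit + 1):
        v = v * a % m
        if v == 1:
            return k
    return None


def row_parameters(b):
    """Return (O(M), Nmax = O(M2), order of 10 mod M2) for M = 10**b + 1."""
    f1, f2 = ROWS[b]
    assert prod(f1) * prod(f2) == 10 ** b + 1
    o_m = reduce(gcd, [p - 1 for p in f1 + f2])
    n_max = reduce(gcd, [p - 1 for p in f2])
    return o_m, n_max, order(10, prod(f2), 4 * b)


for b in ROWS:
    print(b, *row_parameters(b))
```

Since $10^b \equiv -1 \pmod{M_2}$, the order of 10 is always $2b$, so every row offers a length-$2b$ transform whose twiddle factors are decimal shifts.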
B. Complex Number Theoretic Transforms

The integer field $Z_M$ (assuming $M$ to be prime) can be extended to a complex integer field, denoted by $Z_M^c$, if the following equation does not have a solution in $Z_M$:

$$x^2 + 1 = 0. \qquad (49)$$

This means $(-1)$ does not have a square root in $Z_M$, or equivalently a root of order 4 does not exist in $Z_M$. This implies

$$4 \nmid O(M) = M - 1. \qquad (50)$$

Equation (50) is the only condition required for $Z_M^c$ to exist. In $Z_M^c$ every integer is represented as $a + jb$; $a, b \in Z_M$. All the arithmetical operations are done as in normal complex arithmetic with $j^2 = -1$. Both real and imaginary parts are evaluated mod $M$, separately. The concept of magnitude and phase does not exist in $Z_M^c$. Complex number theoretic transforms (CNT) similar to the NTT exist in $Z_M^c$, and can be used to compute the cyclic convolution of two complex integer sequences. To avoid error due to aliasing, both real and imaginary parts of the output should be separately bounded to $M/2$. The idea of the CNT has been considered by Reed [26].

Theorem: A transform having the cyclic convolution property in $Z_M^c$ exists if and only if

$$N \mid (M^2 - 1). \qquad (51)$$

We will not give a formal proof of this theorem, but we will outline a procedure to find a complex integer $\alpha$ of order $N$ in $Z_M^c$ if $N$ divides $(M^2 - 1)$.

Theorem:

$$(a + jb)^M = a - jb \quad \text{in } Z_M^c. \qquad (52)$$

This theorem can be easily proved. It implies that every complex integer is at most an $(M + 1)$th root of a real integer in $Z_M$. Let $a + jb$ be an $N$th root of a real integer, i.e., $N$ is the least positive integer such that

$$(a + jb)^N = \text{a real integer}; \qquad (53)$$

then by (52),

$$N \mid (M + 1). \qquad (54)$$
How to find an $\alpha$ of order $M^2 - 1$ in $Z_M^c$: Consider complex integers of the form $(1 + jb)$ and search over $b \in Z_M$ such that $(1 + jb)$ is an $(M + 1)$th root of a real integer (a proof of the existence of such a $b$ can be given). Then,

$$(1 + jb)^{M+1} = 1 + b^2. \qquad (55)$$

Let $\alpha_{M-1}$ be a root of order $(M - 1)$ in $Z_M$. Then $1 + b^2$ can be written as some power of $\alpha_{M-1}$:

$$1 + b^2 = \alpha_{M-1}^{x}. \qquad (56)$$

It can be shown that $x$ is odd. Then $\alpha_{M^2-1}$, given by (57), is of order $(M^2 - 1)$ in $Z_M^c$:

$$\alpha_{M^2-1} = (1 + jb)\,\alpha_{M-1}^{(1-x)/2}. \qquad (57)$$

This can be easily proved. By raising $\alpha_{M^2-1}$ to the $((M^2 - 1)/N)$th power, we can find a complex integer $\alpha_N$ of order $N$ in $Z_M^c$ if $N \mid (M^2 - 1)$. It can be easily proved that

$$\gcd(M - 1, M + 1) = 2. \qquad (58)$$

Let

$$N_1 = \gcd(M - 1, N), \qquad N = N_1 \times N_2;$$

then $\alpha_N$ will be complex, but $\alpha_N^{N_2}$ will be real.
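The construction (55) to (57) can be exercised for small primes $M \equiv 3 \pmod 4$. In this sketch (ours), complex integers are pairs `(re, im)` reduced mod $M$, and both the search over $b$ and the discrete logarithm in (56) are brute force. The multiplier $\alpha_{M-1}^{(1-x)/2}$ in (57) is what makes $\alpha_{M^2-1}^{M+1} = \alpha_{M-1}$, which together with a real-root order of $M + 1$ gives order $M^2 - 1$.

```python
def cmul(u, v, m):
    """(a + jb)(c + jd) mod m, with j**2 = -1."""
    (a, b), (c, d) = u, v
    return ((a * c - b * d) % m, (a * d + b * c) % m)


def real_order(u, m):
    """Least n >= 1 with u**n real (imaginary part 0)."""
    v, n = u, 1
    while v[1] != 0:
        v, n = cmul(v, u, m), n + 1
    return n


def complex_order(u, m):
    """Least n >= 1 with u**n == 1 in the complex integers mod m."""
    v, n = u, 1
    while v != (1, 0):
        v, n = cmul(v, u, m), n + 1
    return n


def primitive_root(m):
    """Smallest g of order m - 1 modulo the prime m (plays the role of alpha_{M-1})."""
    for g in range(2, m):
        v, k = g, 1
        while v != 1:
            v, k = v * g % m, k + 1
        if k == m - 1:
            return g


def alpha_of_order_m2_minus_1(m):
    """Follow (55)-(57) for a prime m with m % 4 == 3."""
    g = primitive_root(m)
    for b in range(1, m):
        beta = (1, b)
        if real_order(beta, m) != m + 1:
            continue
        c = (1 + b * b) % m                                     # (55)
        x = next(e for e in range(m - 1) if pow(g, e, m) == c)  # (56)
        if x % 2 == 0:
            raise ArithmeticError("x should be odd")             # excluded by the text
        corr = pow(g, ((1 - x) // 2) % (m - 1), m)
        return cmul(beta, (corr, 0), m)                          # (57)
    raise ValueError("no suitable b found")


for m in (3, 7, 11, 19, 23):
    a = alpha_of_order_m2_minus_1(m)
    print(m, a, complex_order(a, m) == m * m - 1)
```

For $M = 7$ the search stops at $b = 2$, with $x = 5$ and $\alpha_{48} = 4 + j$.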
In the fast FFT algorithm to compute the CNT, the part corresponding to $N_2$ will require complex arithmetical operations, but the part corresponding to $N_1$ will require only real arithmetical operations. CNT are good in theory, as they offer more choice in transform lengths. But so far no CNT have been found for which powers of $\alpha$ are simple. CNT do not exist for Fermat numbers, but they exist for Mersenne numbers.

The only extension we have investigated is when

$$M = p_1 p_2 \cdots p_l \qquad (59)$$

where the $p_i$'s are distinct primes. We have not investigated the case when $M$ contains prime powers. $Z_M$ can be extended to $Z_M^c$ as before if

$$4 \nmid (p_i - 1), \qquad i = 1, 2, \ldots, l. \qquad (60)$$

Also, the Chinese remainder theorem can be used in $Z_M^c$. It is applied separately to the real and imaginary parts. For a CNT to exist in $Z_M^c$, it must exist in each $Z_{p_i}^c$ also. This gives the following theorem.

Theorem: A CNT of length $N$ in $Z_M^c$ exists if and only if

$$N \mid \gcd\{p_1^2 - 1, p_2^2 - 1, \ldots, p_l^2 - 1\}. \qquad (61)$$

We find $\alpha_i$ of order $N$ in $Z_{p_i}^c$ and then combine them by the Chinese remainder theorem to obtain an $\alpha$ of order $N$ in $Z_M^c$.
C. Application to Two-Dimensional Filtering

Rader [27] has recently discussed the application of the FNT to two-dimensional filtering. In 2-D applications, the length of the impulse response along each dimension is not too large; therefore, the FNT's are ideally suited for this application, because here the length constraint of the FNT's is not important. Other choices of $M$ discussed in this section can also be used for this application.
REFERENCES

[1] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
[2] T. G. Stockham, "High speed convolution and correlation," in AFIPS Conf. Proc., 1966 Joint Computer Conf., vol. 28, pp. 229-233 (also in [28]).
[3] C. S. Burrus, "Block realization of digital filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
[4] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969.
[5] J. M. Pollard, "The fast Fourier transform in a finite field," Math. Comput., vol. 25, pp. 365-374, Apr. 1971.
[6] A. Schönhage and V. Strassen, "Fast multiplication of large numbers," Computing (in German), vol. 7, pp. 281-292, 1971.
[7] J. W. Cooley and J. W. Tukey, "An algorithm for machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1966 (also in [28]).
[8] A. V. Oppenheim and C. Weinstein, "Effects of finite register length in digital filtering and the fast Fourier transform," Proc. IEEE, vol. 60, pp. 957-976, Aug. 1972.
[9] R. C. Agarwal and C. S. Burrus, "Fast convolution using Fermat number transforms with applications to digital filtering," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 87-97, Apr. 1974.
[10] P. J. Nicholson, "Algebraic theory of finite Fourier transforms," J. Comput. Syst. Sci., vol. 5, pp. 524-547, 1971.
[11] I. J. Good, "The relation between two fast Fourier transforms," IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
[12] D. E. Knuth, "The art of computer programming-errata et addenda," Comput. Sci. Dep., Stanford Univ., Stanford, Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.
[13] C. M. Rader, "The number theoretic DFT and exact discrete convolution," presented at IEEE Arden House Workshop on Digital Signal Processing, Harriman, N.Y., Jan. 11, 1972.
[14] C. M. Rader, "Discrete convolution via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[15] R. C. Agarwal and C. S. Burrus, "Fast digital convolution using Fermat transforms," in Southwest IEEE Conf. Rec., pp. 538-543, Apr. 1973.
[16] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
[17] G. H. Hardy and E. M. Wright, The Theory of Numbers. Oxford, England: Oxford Univ. Press, 1960.
[18] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
[19] L. E. Dickson, History of the Theory of Numbers, vol. I. Washington, D.C.: Carnegie Institute, 1919, p. 376.
[20] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969 (also in [28]).
[21] L. B. Jackson, "On the interaction of round-off noise and dynamic range in digital filters," Bell Syst. Tech. J., vol. 49, pp. 159-184, Feb. 1970 (also in [28]).
[22] R. C. Agarwal, "On realization of digital filters," Ph.D. dissertation, Dep. Elec. Eng., Rice Univ., Houston, Tex., Dec. 1973.
[23] R. C. Agarwal and C. S. Burrus, "Fast one-dimensional digital convolution by multi-dimensional techniques," IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
[24] C. M. Rader, private communication.
[25] T. W. Parks, private communication.
[26] I. S. Reed, private communication.
[27] C. M. Rader, "On the application of the number theoretic transforms of high speed convolution to two-dimensional filtering," IEEE Trans. Circuit Theory, to be published.
[28] L. R. Rabiner and C. M. Rader, Eds., Digital Signal Processing. New York: IEEE Press, 1972.
[29] C. M. Rader, "A note on exact discrete Fourier transforms," IEEE Trans. Audio Electroacoust. (Corresp.), vol. AU-21, pp. 558-559, Dec. 1973.
[30] H. Takahasi and Y. Ishibashi, "A new method for 'exact calculation' by a digital computer," Inform. Process. Jap., vol. 1, pp. 28-42, 1962.