1975 Number Theoretic Transforms to Implement Fast Digital Convolution


550 PROCEEDINGS OF THE IEEE, VOL. 63, NO. 4, APRIL 1975

Number Theoretic Transforms to Implement Fast Digital Convolution

Invited Paper

Abstract: Transforms using number theoretic concepts are developed as a method for fast and error-free calculation of finite digital convolution. The transforms are defined on finite fields and rings of integers with arithmetic carried out modulo an integer, and it is shown that under certain conditions this gives the same results as conventional digital convolution. Because of these characteristics they are ideally suited to digital computation by taking into account quantization of amplitude as well as time in their definition. When the modulus is chosen as a Fermat number, a transform results that requires only on the order of N log N additions and word shifts but no multiplications. In addition to being efficient, they have no roundoff error and do not require storage of basis functions. There is a restriction on sequence length imposed by word length and a problem with overflow, but methods for overcoming these are presented. Results of an implementation on an IBM 370/155 are presented and compared with the fast Fourier transform, showing a substantial improvement in efficiency and accuracy. Variations on the basic number theoretic transforms are also presented.

I. INTRODUCTION

Finite digital convolution is a numerical procedure defined by

$$y(n) = \sum_{m=0}^{N-1} h(n-m)\,x(m), \qquad n = 0, 1, 2, \ldots \tag{1}$$

and symbolically denoted

$$y(n) = h(n) * x(n)$$

where $x(n)$, $h(n)$, and $y(n)$ are digital number sequences. This operation has many very powerful applications. It is used to implement nonrecursive or finite-impulse-response digital filters either directly or with sectioning or block techniques [1], [2] and recursive or infinite-impulse-response digital filters by block methods [3]. It also is used to carry out auto and cross correlation as well as for computations such as polynomial multiplication and multiplication of very large integers [4]-[6].

There are several methods to implement finite convolution that differ in the amount of computation required, the effects of arithmetic roundoff, and the amount of storage required. It is somewhat difficult to compare various algorithms because of the tradeoffs between these various factors that depend on the hardware or software that is available. However, because of the complexity of performing multiplication, the number of multiplications necessary to implement convolution is often an important factor to be minimized.

Manuscript received September 5, 1974; revised October 15, 1974. This work was supported in part by the National Science Foundation under Grant GK-23697.
R. C. Agarwal was with the Department of Electrical Engineering, Rice University, Houston, Tex. He is now with the IBM Thomas J. Watson Research Center, Yorktown Heights, N.Y. 10598.
C. S. Burrus is with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.

The use of transform methods has proven to be useful when an application allows sequences to be processed in blocks. The most versatile transform is the discrete Fourier transform (DFT) defined by

$$\mathrm{DFT}[x] \triangleq X(k) = \sum_{n=0}^{N-1} x(n) \exp(-j 2\pi n k / N), \qquad k = 0, 1, \ldots, N-1 \tag{2}$$

and the inverse transform

$$\mathrm{IDFT}[X] \triangleq x(n) = N^{-1} \sum_{k=0}^{N-1} X(k) \exp(j 2\pi n k / N), \qquad n = 0, 1, \ldots, N-1. \tag{3}$$

The property of this transform that is important here is the cyclic convolution property (CCP) which states that

$$\mathrm{DFT}[h * x] = \mathrm{DFT}[h]\,\mathrm{DFT}[x].$$

This implies that a convolution can be calculated by

$$y(n) = \mathrm{IDFT}\{\mathrm{DFT}[h]\,\mathrm{DFT}[x]\} \tag{4}$$

using two transforms, N multiplications, and one inverse transform. The convolution implemented by (4) is called cyclic convolution since it evaluates (1) as if $h(n)$ and $x(n)$ were periodically extended outside of the range from 0 to $(N - 1)$ or, equivalently, the indices are evaluated mod N. Normal finite convolution can be calculated by cyclic convolution if zeros are appended to $x(n)$ and $h(n)$ to prevent folding or aliasing [1], [2].
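The difference between cyclic and ordinary finite convolution can be illustrated with a short Python sketch. It evaluates (1) directly rather than through a transform; the function names `cyclic_convolve` and `linear_convolve_via_cyclic` are illustrative choices, not names from the paper.

```python
def cyclic_convolve(h, x):
    """Evaluate (1) directly with the index n - m taken mod N."""
    n_len = len(x)
    if len(h) != n_len:
        raise ValueError("h and x must have the same length N")
    return [
        sum(h[(n - m) % n_len] * x[m] for m in range(n_len))
        for n in range(n_len)
    ]


def linear_convolve_via_cyclic(h, x):
    """Append zeros so that the cyclic result has no wrap-around terms."""
    out_len = len(h) + len(x) - 1
    h_padded = list(h) + [0] * (out_len - len(h))
    x_padded = list(x) + [0] * (out_len - len(x))
    return cyclic_convolve(h_padded, x_padded)


h = [1, 2, 3]
x = [4, 5, 6]
print(cyclic_convolve(h, x))             # [31, 31, 28]
print(linear_convolve_via_cyclic(h, x))  # [4, 13, 28, 27, 18]
```

Without padding, the tail of the ordinary convolution folds back onto its head (4 + 27 = 31 and 13 + 18 = 31), which is the aliasing that the appended zeros prevent.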

This transform approach became useful only when Cooley and Tukey [7] introduced a very efficient algorithm known as the fast Fourier transform (FFT) for calculating the DFT and its inverse in (2) and (3). The number of multiplications necessary to calculate the FFT of a number sequence of length N is on the order of $N \log_2 N$. Implementation of convolution using the FFT results in a considerable savings in multiplication when lengths are approximately above N = 32. The disadvantage of this approach is in the form of significant amounts of roundoff error [8], storage or generation of the complex basis functions that have to be rounded, and still a considerable amount of multiplying.

If one looks for the properties that a general transform with the DFT structure

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{5}$$

must have to have the CCP, it is found [9], [10] that $\alpha$ is a root of unity of order N, i.e., N is the least positive integer such that

$$\alpha^N = 1. \tag{6}$$

This analysis shows that in the complex number field, the conventional DFT with $\alpha = \exp(-j2\pi/N)$ is the only transform given by (5) with the CCP. If, however, other fields and arithmetic systems are used, new transforms become possible with very interesting properties. This is pursued by considering mathematical systems that are fundamentally compatible with digital computing capability.

In any practical situation, or when working with digital machines, the data are available only with some finite precision, and therefore, without loss of generality, the data can be considered to be integers with some upper bound. To compute convolution in this digital domain, operations in the complex number field of the continuous domain can be imitated in a finite field or, more generally, in a finite ring of integers under additions and multiplications modulo some integer M. An integer $\alpha$ of order N replaces $\exp(-j2\pi/N)$ used in a DFT. In this ring, when two integer sequences $x(n)$ and $h(n)$ are convolved, the output integer sequence $y(n)$ is congruent to the conventional convolution of $x(n)$ and $h(n)$ mod M. In the ring of integers mod M, conventional integers can be unambiguously represented if their absolute value is less than M/2. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds M/2, we would get the same results by implementing convolution in the ring of integers mod M as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known.
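The M/2 bound can be checked numerically. The sketch below is only an illustration: the sequences, the moduli 257 and 17, and the helper names `cyclic_convolve_mod` and `to_signed` are assumptions chosen for the example. It convolves two signed integer sequences in the ring of integers mod M and maps each residue back to the representative of smallest magnitude.

```python
def cyclic_convolve_mod(h, x, modulus):
    """Cyclic convolution with every output reduced mod M."""
    n_len = len(x)
    return [
        sum(h[(n - m) % n_len] * x[m] for m in range(n_len)) % modulus
        for n in range(n_len)
    ]


def to_signed(residue, modulus):
    """Map a residue in 0..M-1 to the representative with |value| < M/2 (M odd)."""
    return residue - modulus if residue > modulus // 2 else residue


h = [1, -2, 3, 0]
x = [4, 5, -6, 0]
exact = [-14, -3, -4, 27]  # cyclic convolution in ordinary integer arithmetic

for modulus in (257, 17):
    folded = [to_signed(r, modulus) for r in cyclic_convolve_mod(h, x, modulus)]
    print(modulus, folded, folded == exact)
# 257 [-14, -3, -4, 27] True
# 17 [3, -3, -4, -7] False
```

With M = 257 every output satisfies |y(n)| < M/2 and the exact result is recovered; with M = 17 the outputs -14 and 27 exceed the bound and fold to other values.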

By working in a finite field or ring of integers with arithmetic carried out modulo an integer M, a large class of transforms exist that have the CCP. By special choices of the length N, the mod M, and the value $\alpha$ it is possible to have transforms that need only word shifts and additions but no multiplications, that have an FFT-type fast algorithm, that do not require storage of complex values for $\alpha$, and that have no roundoff errors. These transforms are called number theoretic transforms (NTT) and they look very promising in the evaluation of finite convolutions. Their main disadvantage seems to be a relation of the sequence length N to the required word length that can require long word lengths for long sequence lengths.

These number theoretic transforms are truly digital transforms, taking into account the quantization in amplitude and the finite precision of digital signals. They bear the same relation to digital signals as the DFT does to discrete-time or sampled data signals and the Fourier or Laplace transforms do to continuous-time signals. In the same manner that the relation of discrete-time signals to continuous-time signals through sampling involves a possible folding or aliasing in the frequency domain, the relation of calculations with the DFT to calculations with the number theoretic transforms involves possible folding of the amplitude that must be taken into account.

The literature on transforms of these types is fairly recent. Knuth [4] has proposed the use of transforms in finite fields. Pollard [5] discussed transforms having the CCP in a finite field and also gives conditions for having transforms with the CCP in a finite ring of integers. Good [11] also mentioned the use of transforms in a finite ring of integers. Schonhage and Strassen [6] defined transforms having the CCP modulo a Fermat number and discussed their application to fast multiplication of very large integers. Knuth [12] elaborated on the work of Schonhage and Strassen. Nicholson [10] presented an algebraic theory of FFT's in any ring and established fast FFT-type algorithms to compute these transforms. Rader [13], [14] proposed number theoretic transforms in rings of integers modulo both Mersenne and Fermat numbers. He first proposed the application to digital signal processing, showed that the transforms could be calculated using only additions and bit shifting, showed the word length constraint, and suggested two-dimensional transforms as a possible relaxation of that constraint. Agarwal and Burrus [9], [15] discussed number theoretic transforms in detail, defined Fermat number transforms, and also proposed their application for fast digital convolution. They also suggested possible hardware and software implementations. Their implementation on the IBM 370/155 showed a factor of 3 to 5 speed improvement over efficient FFT implementations of cyclic convolution for lengths up to 256. An earlier article by Takahasi and Ishibashi [30] was recently brought to our attention by Dr. J. W. Cooley of IBM.

II. MODULAR ARITHMETIC

In this section, some of the basic concepts of modular arithmetic from number theory relevant to NTT will be discussed. This can be found in most basic books on number theory [16], [17].

Two integers $a$ and $b$ are said to be congruent mod M if

$$a = b + kM \tag{7}$$

where $k$ is some integer and M is the modulus. This is written as

$$a \equiv b \pmod{M}. \tag{8}$$

All integers are congruent mod M to some integer in the finite set $\{0, 1, 2, \ldots, M - 1\}$ which is called the set of integers mod M and denoted by $Z_M$. $Z_M$ is also known as the ring of integers mod M. If in a ring of integers multiplicative inverses exist for all nonzero integers, this ring becomes a field, and it can be shown that $Z_M$ is a field if and only if M is a prime. We will use the symbol "$Z_M$" and the expression "the ring of integers mod M" for rings as well as fields since a field is also a ring. The following basic arithmetic operations are permissible with modular arithmetic.

Addition: Example, $7 + 12 = 19 \equiv 2 \pmod{17}$.

Negation: Example, $-7 \equiv -7 + 17 = 10 \pmod{17}$.

Subtraction: Example, $7 - 12 = 7 + (-12) \equiv 7 + 5 = 12 \pmod{17}$.

Multiplication: Example, $7 \times 12 = 84 \equiv 16 \pmod{17}$.

Multiplicative Inverse: The multiplicative inverse of an integer $b$ in $Z_M$ exists if and only if $b$ and M are relatively prime. In that case $b^{-1}$ is an integer such that $b \times b^{-1} \equiv 1 \pmod{M}$. Example, $7^{-1} \equiv 5 \pmod{17}$; $7 \times 5 = 35 \equiv 1 \pmod{17}$.

Division: $a/b$ exists if and only if $b$ has an inverse. In that case $a/b = a \times b^{-1}$. Example, $12/7 \equiv 12 \times 5 \equiv 9 \pmod{17}$; $7 \times 9 \equiv 12 \pmod{17}$.
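The worked examples above can be reproduced with Python's built-in integer operators. This is a small sketch, assuming Python 3.8 or later, where the three-argument `pow` accepts an exponent of -1 to compute a modular inverse; the variable names are illustrative.

```python
M = 17

total = (7 + 12) % M        # addition
negated = (-7) % M          # negation
difference = (7 - 12) % M   # subtraction
product = (7 * 12) % M      # multiplication
inverse_of_7 = pow(7, -1, M)          # multiplicative inverse (exists since gcd(7, 17) = 1)
quotient = (12 * inverse_of_7) % M    # division 12/7

print(total, negated, difference, product, inverse_of_7, quotient)
# 2 10 12 16 5 9
```

`pow(b, -1, M)` raises `ValueError` when `b` and `M` are not relatively prime, which matches the existence condition stated for the multiplicative inverse.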

This may seem like a rather peculiar way to do arithmetic but it is used quite often by everyone. In discussing the day of the week, one uses an arithmetic mod 7, or in stating the time, one is calculating mod 12 or perhaps 24. Indeed the mantissa of a number in scientific notation is evaluated mod 10.


Because of the nature of modular arithmetic, numbers do not have sizes or magnitude. We cannot say that a particular number is larger than another or that two numbers are close. Tuesday may not be close to Wednesday or come before Wednesday if they occur in different weeks.

As was mentioned in the introduction, for the existence of transforms with the DFT structure given in (5) and having the CCP, it is necessary that an integer exist that is the Nth root of unity. We will now consider this problem using modular arithmetic. First, Euler's $\varphi$ function is defined as $\varphi(M)$, the number of integers in $Z_M$ that are relatively prime to M. For M a prime, $\varphi(M) = M - 1$. If M is composite and its prime factored form is denoted by $M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}$, then the general expression for $\varphi$ is

$$\varphi(M) = M\,(1 - 1/p_1)(1 - 1/p_2) \cdots (1 - 1/p_l).$$

An important theorem known as Euler's theorem states that for every $\alpha$ relatively prime to M

$$\alpha^{\varphi(M)} \equiv 1 \pmod{M}. \tag{9}$$

For M prime this reduces to Fermat's theorem

$$\alpha^{M-1} \equiv 1 \pmod{M} \tag{10}$$

which holds for all nonzero elements of $Z_M$ since they are all relatively prime to M if M is prime.
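Euler's function and Euler's theorem (9) are easy to check numerically. The following Python sketch computes $\varphi(M)$ from the product formula using trial division and then tests (9) for a prime and a composite modulus; the helper name `euler_phi` and the choice of test moduli are assumptions made for the illustration.

```python
from math import gcd


def euler_phi(m):
    """phi(M) = M * prod(1 - 1/p) over the distinct primes p dividing M."""
    result, rest, p = m, m, 2
    while p * p <= rest:
        if rest % p == 0:
            result -= result // p      # multiply result by (1 - 1/p)
            while rest % p == 0:
                rest //= p
        p += 1
    if rest > 1:                       # one prime factor larger than sqrt remains
        result -= result // rest
    return result


print([euler_phi(n) for n in range(1, 8)])  # [1, 1, 2, 2, 4, 2, 6]

# Euler's theorem (9): alpha**phi(M) is congruent to 1 for every alpha coprime to M.
for modulus in (17, 15):
    coprime = [a for a in range(1, modulus) if gcd(a, modulus) == 1]
    print(modulus, all(pow(a, euler_phi(modulus), modulus) == 1 for a in coprime))
```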

There are certain roots of unity that are of particular interest. If N is the least positive integer such that

$$\alpha^N \equiv 1 \pmod{M} \tag{11}$$

then $\alpha$ is said to be a root of unity of order N, or simply of order N. In some of the literature $\alpha$ is said to belong to the exponent N, or N is the exponent to which $\alpha$ belongs. Another terminology says $\alpha$ is a primitive Nth root of unity.

If the order of $\alpha$ (the exponent to which $\alpha$ belongs) is equal to $\varphi(M)$, then $\alpha$ is called a primitive root (do not confuse with a primitive Nth root of unity). If M is prime and $\alpha$ is a primitive root, the set of integers $\{\alpha^k \pmod{M},\ k = 0, 1, 2, \ldots, M - 2\}$ is the total set of nonzero elements in $Z_M$. Thus all nonzero integers in $Z_M$ can be generated by powers of a primitive root. This characterizes the entire field.

Euler's theorem implies that if $\alpha$ is of order N then N must divide $\varphi(M)$, denoted by $N \mid \varphi(M)$. If M is prime it can be shown that roots of order N exist if and only if $N \mid (M - 1)$ and the roots are given by

$$\alpha = \alpha_{\varphi}^{(M-1)/N} \tag{12}$$

where $\alpha_{\varphi}$ denotes a primitive root. More generally, if $\alpha$ is a root of order N then

$$\alpha^k \text{ is of order } N/k \text{ if } k \mid N; \qquad \alpha^k \text{ is of order } N \text{ if } N \text{ and } k \text{ are relatively prime.} \tag{13}$$

This implies the number of roots of order N is given by $\varphi(N)$ and, therefore, the number of primitive roots is $\varphi(\varphi(M))$. These relations will allow one to calculate all of the roots of all possible orders from one primitive root. Tables will often list primes and the smallest primitive root for each.

These ideas will become clearer by looking at an example. Consider the field $Z_7$ with arithmetic mod 7. First we will give the first few evaluations of Euler's function.

$$\varphi(1) = 1 \quad \varphi(2) = 1 \quad \varphi(3) = 2 \quad \varphi(4) = 2 \quad \varphi(5) = 4 \quad \varphi(6) = 2 \quad \varphi(7) = 6. \tag{14}$$

Consider raising each element of $Z_7$ to powers from 1 to 6 (mod 7).

    N   :  0  1  2  3  4  5  6
    1^N :  1  1  1  1  1  1  1
    2^N :  1  2  4  1  2  4  1
    4^N :  1  4  2  1  4  2  1
    3^N :  1  3  2  6  4  5  1
    5^N :  1  5  4  6  2  3  1
    6^N :  1  6  1  6  1  6  1

This illustrates several very interesting features. Consider the various roots of order N.

    N    Roots of order N
    1    1
    2    6
    3    2, 4
    6    3, 5
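The grouping of the elements of $Z_7$ by order can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `order` is an assumption, and it expects a modulus greater than 1 and an argument relatively prime to it (otherwise the loop would not terminate).

```python
def order(a, modulus):
    """Least positive N with a**N congruent to 1 (mod modulus); needs gcd(a, modulus) == 1."""
    value, n = a % modulus, 1
    while value != 1:
        value = (value * a) % modulus
        n += 1
    return n


roots_by_order = {}
for a in range(1, 7):
    roots_by_order.setdefault(order(a, 7), []).append(a)

print(sorted(roots_by_order.items()))
# [(1, [1]), (2, [6]), (3, [2, 4]), (6, [3, 5])]
```

Only the divisors 1, 2, 3, and 6 of $\varphi(7) = 6$ appear as orders, and the count of roots of each order N equals $\varphi(N)$.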

Only those N that divide $\varphi(M) = \varphi(7) = 6$ have roots that belong to them. The number of roots is given by $\varphi(N)$, and the number of primitive roots is $\varphi(\varphi(M)) = 2$; they are 3 and 5. Note that both of the primitive roots generate all the nonzero elements of the field while the other roots generate cyclic subsets with N distinct members. Also note that Euler's theorem (Fermat's theorem in this case) is indeed satisfied in that all elements raised to the 6th power are congruent to unity, and (13) does generate all the roots of order N from the primitive roots. Also note that every nonzero integer $\alpha$ has an inverse $\alpha^{-1} \equiv \alpha^{M-2}$. For a nonprime M, $\alpha$ has an inverse given by

$$\alpha^{-1} \equiv \alpha^{\varphi(M) - 1} \pmod{M}$$

if $\alpha$ and M are relatively prime.

By considering a similar example with M a composite rather than a prime, one observes several differences. First, $Z_M$ is not a field since not all elements will have inverses. There is no primitive root that will generate the entire ring, only subsets with $\varphi(M)$ elements.

When considering a nonprime mod M, $Z_M$ is a ring and inverses exist only for integers relatively prime to M. Let M have the following unique prime power factorization.

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}. \tag{15}$$

When the arithmetic is done mod M, it is in effect done modulo each prime power $p_i^{r_i}$ simultaneously [4], [18]. A set of arithmetic operations can be done either modulo each $p_i^{r_i}$ separately and the final result mod M obtained using the Chinese remainder theorem [4], [16], [18], or alternatively all the operations may be done mod M, but they must be valid operations mod each $p_i^{r_i}$. An integer $\alpha$ is said to be of order N in $Z_M$ if and only if it is of order N in each $Z_{p_i^{r_i}}$. Here we present some basic results.

$$a \equiv b \pmod{M} \tag{16}$$

is true if and only if

$$a \equiv b \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{17}$$

If we know the residues of an integer $a$ modulo each $p_i^{r_i}$, we can uniquely reconstruct the integer $a$ (mod M) using the Chinese remainder theorem given in the following.

Let

$$M_i = M / p_i^{r_i}$$

and let $d_i$ be the inverse of $M_i$ modulo $p_i^{r_i}$, i.e.,

$$d_i M_i \equiv 1 \pmod{p_i^{r_i}},$$

then

$$a \equiv \sum_{i=1}^{l} \left(a \bmod p_i^{r_i}\right) M_i\, d_i \pmod{M}. \tag{18}$$
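The reconstruction of an integer from its residues by the Chinese remainder theorem can be sketched in Python. The version below uses the standard textbook form of the theorem; the function name `crt_reconstruct`, its argument layout, and the example modulus 255 = 3 x 5 x 17 are assumptions for the illustration. It assumes Python 3.8 or later for `math.prod` and for `pow` with exponent -1.

```python
from math import prod


def crt_reconstruct(residues, moduli):
    """Rebuild a (mod M) from its residues modulo pairwise coprime moduli."""
    big_m = prod(moduli)
    total = 0
    for a_i, m_i in zip(residues, moduli):
        cofactor = big_m // m_i              # product of all the other moduli
        inverse = pow(cofactor, -1, m_i)     # inverse of the cofactor mod m_i
        total += a_i * cofactor * inverse
    return total % big_m


moduli = [3, 5, 17]                          # M = 255
a = 200
residues = [a % m for m in moduli]           # [2, 0, 13]
print(residues, crt_reconstruct(residues, moduli))  # [2, 0, 13] 200
```

Each term of the sum is congruent to the corresponding residue modulo its own prime power and to zero modulo the others, which is why the sum reproduces the original integer mod M.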

III. NUMBER THEORETIC TRANSFORMS

In this section, the definition and basic conditions for the existence of the NTT will be presented, and in particular the allowed relations between the modulus M, the transform length N, and the basis function $\alpha$ are spelled out.

If we have a length N sequence of numbers, then a transform pair of the form given by

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\alpha^{nk} \tag{19}$$

$$x(n) = N^{-1} \sum_{k=0}^{N-1} X(k)\,\alpha^{-nk} \tag{20}$$

is said to have a DFT structure. By requiring that application of the transform method in (4) results in cyclic convolution, the following theorem can be proven.

Theorem 1: A length N transform having the DFT structure will implement cyclic convolution if and only if there exists an inverse of N and an element $\alpha$, a root of unity of order N, i.e., N is the least positive integer such that $\alpha^N = 1$.
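A direct, non-fast evaluation of the transform pair (19), (20) in modular arithmetic shows the cyclic convolution property at work. The sketch below is an illustration under stated assumptions: the modulus 17, the root $\alpha = 2$ (which has order 8 mod 17), and the function names `ntt`, `intt`, and `ntt_cyclic_convolve` are choices made for the example, and the sums are evaluated term by term rather than with an FFT-type algorithm. It assumes Python 3.8 or later for `pow` with exponent -1.

```python
def ntt(x, alpha, modulus):
    """Forward transform (19): X(k) = sum x(n) alpha**(n k) mod M."""
    n_len = len(x)
    return [
        sum(x[n] * pow(alpha, n * k, modulus) for n in range(n_len)) % modulus
        for k in range(n_len)
    ]


def intt(big_x, alpha, modulus):
    """Inverse transform (20): x(n) = N**-1 sum X(k) alpha**(-n k) mod M."""
    n_len = len(big_x)
    n_inv = pow(n_len, -1, modulus)       # the inverse of N must exist mod M
    alpha_inv = pow(alpha, -1, modulus)
    return [
        n_inv * sum(big_x[k] * pow(alpha_inv, n * k, modulus) for k in range(n_len)) % modulus
        for n in range(n_len)
    ]


def ntt_cyclic_convolve(h, x, alpha, modulus):
    """Cyclic convolution mod M via (4): transform, multiply pointwise, invert."""
    big_h, big_x = ntt(h, alpha, modulus), ntt(x, alpha, modulus)
    big_y = [(a * b) % modulus for a, b in zip(big_h, big_x)]
    return intt(big_y, alpha, modulus)


h = [1, 2, 0, 0, 0, 0, 0, 0]
x = [1, 1, 1, 0, 0, 0, 0, 0]
print(ntt_cyclic_convolve(h, x, alpha=2, modulus=17))  # [1, 3, 3, 2, 0, 0, 0, 0]
```

The output equals the ordinary convolution of the two sequences because the zero padding prevents wrap-around and every output value is below M/2 = 8.5.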

This is a very general result applying to both rings and fields that are finite or infinite, and it has been developed from a variety of points of view [5], [9], [10]. In addition to the CCP, transforms of this type also allow fast computation algorithms of the FFT type when N is highly composite [10].

Theorem 1 is somewhat difficult to use when investigating various possible moduli with modular arithmetic, so an alternate set of conditions will be developed. Let $Z_M$ represent the ring of integers $\{0, 1, 2, \ldots, M - 1\}$ with arithmetic carried out mod M. Let M have the following unique prime power factorization

$$M = p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l} \tag{21}$$

where the $p_i$'s are distinct primes. As pointed out in Section II, when we carry out arithmetic mod M, we are in effect doing it modulo each $p_i^{r_i}$ simultaneously.

Therefore, the length N number theoretic transform having the CCP in $Z_M$ must also have the CCP in $Z_{p_i^{r_i}}$ for $i = 1, 2, \ldots, l$. This requires that (mod $p_i^{r_i}$) an integer of order N must exist in $Z_{p_i^{r_i}}$, i.e., N is the least positive integer such that

$$\alpha^N \equiv 1 \pmod{p_i^{r_i}}, \qquad i = 1, 2, \ldots, l. \tag{22}$$

Furthermore, since the inverse transform requires $N^{-1}$, the inverse of N should exist in $Z_{p_i^{r_i}}$, or, N should be relatively prime to M. Now we investigate the existence of an $\alpha$ of order N in each $Z_{p_i^{r_i}}$. By Euler's theorem (9) and (22), we have

$$N \mid \varphi(p_i^{r_i}), \qquad i = 1, 2, \ldots, l \tag{23}$$

or $N \mid (p_i - 1)\,p_i^{r_i - 1}$. Since N is relatively prime to M (or the $p_i$'s),

$$N \mid (p_i - 1), \qquad i = 1, 2, \ldots, l$$

$$N \mid \gcd(p_1 - 1, p_2 - 1, \ldots, p_l - 1).$$

We define $O(M)$ as the greatest common divisor (gcd) of the $(p_i - 1)$

$$O(M) \triangleq \gcd\{p_1 - 1, p_2 - 1, \ldots, p_l - 1\}. \tag{24}$$

Therefore,

$$N \mid O(M). \tag{25}$$

Equation (25) gives the necessary condition for the existence of a transform of length N in the arithmetic mod M. Now consider the converse of it. If $N \mid O(M)$ or $N \mid \varphi(p_i^{r_i})$, then there exist integers $\alpha_i$ (mod $p_i^{r_i}$) of order N in $Z_{p_i^{r_i}}$. Using these $\alpha_i$ we can construct transforms (mod $p_i^{r_i}$) which have the DFT structure of (19) and are invertible. Combining these transforms by the Chinese remainder theorem (18) one can obtain a transform (mod M) having the CCP in $Z_M$. Alternatively, one can combine the $\alpha_i$'s by the Chinese remainder theorem to obtain an $\alpha$ (mod M) of order N in $Z_M$ and construct the final transform using this $\alpha$. The results will be identical. Therefore, (25) is the necessary and sufficient condition for the existence of an invertible transform of length N which has the CCP mod M. This is stated in the form of a theorem [9], [15].

Theorem 2: A length N transform having the DFT structure will implement cyclic convolution mod M if and only if

$$N \mid O(M). \tag{26}$$

This also establishes the maximum transform length in $Z_M$ as $N_{\max} = O(M)$.

This is a very important theorem that states exactly what the possible transform lengths for a given modulus are.
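Theorem 2 turns the search for admissible transform lengths into a factoring problem. The Python sketch below computes $O(M)$ from (24) by trial division; the helper names `prime_factors` and `max_transform_length` are assumptions, the function expects M > 1, and trial division is used only because the example moduli are small.

```python
from functools import reduce
from math import gcd


def prime_factors(m):
    """Distinct prime factors of m (m > 1) by trial division."""
    primes, rest, p = [], m, 2
    while p * p <= rest:
        if rest % p == 0:
            primes.append(p)
            while rest % p == 0:
                rest //= p
        p += 1
    if rest > 1:
        primes.append(rest)
    return primes


def max_transform_length(m):
    """O(M) of (24): gcd of (p_i - 1) over the distinct primes p_i dividing M."""
    return reduce(gcd, (p - 1 for p in prime_factors(m)))


for modulus in (17, 15, 10, 2**32 + 1):
    print(modulus, max_transform_length(modulus))
# 17 16
# 15 2
# 10 1
# 4294967297 128
```

An even modulus gives $O(M) = 1$ because the factor 2 contributes $p - 1 = 1$ to the gcd.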

Although both theorems as stated here assume the DFT structure of (19), they hold for any general transform having the CCP [10]. It is possible in a ring with a composite modulus to have a transform with the CCP but not the DFT structure.¹ Transforms of this sort do not allow an FFT-type fast algorithm and, therefore, do not seem promising.

For number theoretic transforms to be attractive in comparison to other implementations of convolution, they should be computationally efficient. There are three requirements that will be considered. First, N should be highly composite (preferably a power of 2) for a fast FFT-type algorithm to exist and should be large enough for practical sequence lengths. Second, since complex multiplications take most of the computational effort in calculating the FFT, it is important that the multiplication by powers of $\alpha$ be a simple operation. This is possible if the powers of $\alpha$ have binary representations with very few bits; preferably $\alpha$ should also be a power of two, where multiplication by a power of $\alpha$ reduces to a word shift. Third, in order to facilitate arithmetic mod M, M should also have a binary representation with very few bits and should be large enough to prevent overflow.

Although the class of all possible number theoretic transforms seems very large at first consideration, closer examination shows that very few seem to satisfy the aforementioned criteria. The parameters that must be chosen are M, N, and $\alpha$. Unfortunately the conditions given by Theorems 1 and 2 do not give a systematic way of determining the "best" choices. As a result one must use intuition, insight, and a bit of searching. Usually an M is selected and the resulting possible N and $\alpha$ are then examined.

¹ An example of a transform in a ring of integers having the CCP but not the DFT structure was produced by G. Kopec at M.I.T. and led to this observation.

First we see that if M is even, it has a factor of 2 and, therefore, $O(M)$ and $N_{\max}$ are 1, which implies M should be odd. If M is a prime then $O(M) = M - 1$, which is as large as one could hope for in a field of M integers. For $M = 2^k - 1$, let $k$ be a composite $PQ$, where $P$ is prime. Then $2^P - 1$ divides $2^{PQ} - 1$ and the maximum possible length of the transform will be governed by the length possible for $2^P - 1$. Therefore, only the prime $k$ need to be considered interesting. Numbers of this form are known as Mersenne numbers, and Rader [14] has discussed convolution using Mersenne numbers in detail. For Mersenne number transforms, it can be shown that transforms of length at least $2P$ exist and the corresponding $\alpha$ is $-2$. Mersenne number transforms are not of as much interest because $2P$ is not highly composite and, therefore, we do not have fast FFT-type computational algorithms.
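The claim that $-2$ is of order $2P$ modulo a Mersenne number $2^P - 1$ can be checked for small P. The sketch below is illustrative only; the helper `order` and the sample primes P = 5, 7, 13 are assumptions made for the example.

```python
def order(a, modulus):
    """Least positive N with a**N congruent to 1 (mod modulus); needs gcd(a, modulus) == 1."""
    value, n = a % modulus, 1
    while value != 1:
        value = (value * a) % modulus
        n += 1
    return n


for p in (5, 7, 13):
    mersenne = 2**p - 1
    print(p, mersenne, order(2, mersenne), order(-2, mersenne))
# 5 31 5 10
# 7 127 7 14
# 13 8191 13 26
```

The integer 2 has order P because $2^P \equiv 1$, and since P is odd, $(-2)^P \equiv -1$, which doubles the order to 2P.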


Number theoretic transforms with a Fermat number as a modulus are called Fermat number transforms (FNT). Here $F_t = 2^{2^t} + 1 = 2^b + 1$ with $b = 2^t$.

As discussed in the last section, for the FNT of length N to exist, N must divide $O(F_t) = N_{\max}$. Now, we consider transform lengths possible in arithmetic modulo various Fermat numbers and also give the corresponding values of $\alpha$.

Since Fermat numbers up to $F_4$ are prime, $O(F_t) = 2^b$, and we can have an FNT for any length $N = 2^m$, $m \leq b$. For these Fermat primes the integer 3 is an $\alpha$ of order $N = 2^b$, allowing the largest possible transform length. There are $2^{b-1} - 1$ other integers also which are of order $2^b$ and can be obtained from (13). The integer 2 is of order $N = 2b = 2^{t+1}$. If $\alpha$ is taken as 2 or a power of 2, all the powers of $\alpha$ would be some powers of 2, and for these cases, as discussed in the last section and in [14], the FNT can be computed very efficiently and is called the Rader transform (RT).
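When $\alpha$ is a power of 2, multiplying by a power of $\alpha$ modulo $F_t = 2^b + 1$ needs only a shift and a subtraction, because $2^b \equiv -1$. The Python sketch below illustrates the idea; it is not the implementation described in the paper, and the function name `mul_pow2_mod_fermat` and its argument conventions ($0 \leq x \leq 2^b$, $0 \leq s < b$) are assumptions.

```python
def mul_pow2_mod_fermat(x, s, b):
    """Return x * 2**s mod (2**b + 1) using shift, mask and subtract only.

    Assumes 0 <= x <= 2**b and 0 <= s < b. The shifted word is split as
    high * 2**b + low, and 2**b is congruent to -1, so the product is low - high.
    """
    modulus = (1 << b) + 1
    shifted = x << s
    low = shifted & ((1 << b) - 1)
    high = shifted >> b
    result = low - high
    return result + modulus if result < 0 else result


b = 4                                     # F_2 = 17
print([mul_pow2_mod_fermat(13, s, b) for s in range(b)])  # [13, 9, 1, 2]
print([(13 * 2**s) % 17 for s in range(b)])               # [13, 9, 1, 2]
```

Shifts of $b$ or more places can be handled by first using $2^b \equiv -1$, that is, by negating and reducing the shift count by $b$.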

To better see the character of these prime moduli consider an example for $F_2$ similar in manner to that in Section II. If the modulus is $M = F_2 = 17$ then

    N   :  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    2^N :  1  2  4  8 16 15 13  9  1  2  4  8 16 15 13  9  1
    3^N :  1  3  9 10 13  5 15 11 16 14  8  7  4 12  2  6  1
    4^N :  1  4 16 13  1  4 16 13  1  4 16 13  1  4 16 13  1
    6^N :  1  6  2 12  4  7  8 14 16 11 15  5 13 10  9  3  1

    Fo r

    M = 2 k +

    1 and

    k

    odd,

    3

    divides

    2&+

    1 and the largest

    possible transform length

    is 2 ,

    thu s w e consider only

    k

    even.

    Let k be s2', where

    s

    is an odd integer. Then 2 2 f

    + 1

    divides

    2 '+ 1 and 'the length

    of

    the possible trans form will be gov-

    erned by the leng th possible for 22' + 1. There fore, integers of

    the form M

    =

    22'+ 1 are of interest. These numbers are known

    as Fermat numbers andwill be discussed in detail inhis paper.

    Fermatnumbers seem to optimum in the sense of having

    transforms whose length is interesting while the word size

    is

    moderate. Numbers

    of

    the form

    2 '+

    1

    are

    also of

    limited

    interest and are discussed in Sectio n IX. A s ystem atic investi-

    gation of those

    M

    which require more than two bit epresenta-

    tion is difficult. Our preliminary investigation in that direction

    has

    not been very encouraging.

    Here we see that 3 and 6 are primitive roots tha t will generate

    the entire field 21 . The value 2 is

    of

    order 8 and 4 is of order

    4 . Also note that 6 =*in th e sense th at 6 2

    =

    2 ( mo d 1 7 ) .

    Fo r digital filtering applications, the compo sites F s ( b

    =

    3 2 )

    and F 6 ( b

    =

    6 4 ) also seem to bepractical.Lucas [ 191 has

    proven tha t every prime fac tor of a composite F ,

    is

    of the form

    K2'

    +

    1. Therefore, 2' divides O(F, ) , for r > 4 . In par-

    ticular it can be verified tha t for Fs and F 6 , O ( F , )

    =

    2 '.

    Therefore, for hese choices of Fermat numbers, th e maximum

    possible transform length

    is

    N

    =

    2'+'

    =

    4 b .

    Also, we assert tha t

    b given by ( 2 8 ) s of order 4 b in

    ZF,,

    2 .

    *4

    a4b

    =

    2b/4

    ( p / 2

    -

    1)

    ( 2 8 )

    We denote this

    (Y4b

    as *because

    IV.

    FERMATNUMBER RANSFORMS

    (Y

    =

    2 (mod F f ) .

    In this section, we consider one of the m ost promising num-

    ber theoretic transforms where the modulus

    is

    chosen

    to

    be a The proof that

    a 4 b

    given by ( 2 8 )

    s

    of order 4 b with respect

    to

    Fermat number

    any fac tor of F ,

    is

    given in [ 9 ] . An y odd power of *will

    also be of order 2'+'. By raising fl o (2'+2-m)thpower,

    Table I below gives values of N for the two most important

    ( 2 7 ) values of & and also gives the ma xim um possible f l for the most

    M = F , = 2 ' + 1

    we obtainn integer

    a

    of order 2 m , m Q t + 2 .

    = a b

    1

    b = 2 '

    and F , is called the t th Ferm at num ber. Originally, Ferm at

    conjectured

    [

    161 that these numbers were

    all

    prime but un-

    fortunately not only was the conjecture wrong, it seems that

    only FO through F4 are prime and all the others are omposite.

    Th e first few values are:

    Fo

    =

    F1

    =

    F2

    =

    i } prime

    F3 = 257

    F4 = 6537

    F S = 4 294 967 297 641

    X

    6 7 0 0 4 1 7

    F 6 2 1 . 8 4 x 1019 = 2 7 4 7 7 x 6 7 2 8 0 4 2 1 1 0 7 2 1 .

    practical values of b .

    Fo r

    FNT s

    with a prime or composite

    modulus

    we see

    a = 2

    or a power

    of

    2

    is

    possible for sequence lengths up

    to

    N

    =

    2 b

    =

    2'*'.

    This

    is a very desirable situation since

    N

    is highly com-

    posite allowing an FFT type algorithm and all multiplications

    by pow ers of a are simple word shifts. If a = a i s sed then

    sequences of length

    N

    = 4 b = 2' are possible but one stage

    in the FFT a lgor i tw wi l l requ ire two sh i f t s [ 9 ] . This a

    = .\/z

    and the resulting N = 4 b give the maximum length possible for

    Fs and F 6 , however, for prime F , furth er increases in N are

    possible up to N = 2 if mo re stages of th e FF T algorithm are

    allowed to have multiplication rather than simple word shifts.

    From this example it

    is

    seen that a

    =

    *= 6

    for

    M

    = F z

    gives

    the max imum possible N .
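The orders quoted for the $F_2$ example and the closed form (28) for $\sqrt{2}$ can be checked numerically. The following is a minimal Python sketch, not part of the original paper; the helper names `multiplicative_order` and `sqrt2_mod_fermat` are hypothetical, and the order is found by brute force, which is only sensible for small moduli.

```python
def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k == 1 (mod m); assumes gcd(a, m) == 1."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k


def sqrt2_mod_fermat(t):
    """Return (alpha, F_t) with alpha = 2^(b/4) * (2^(b/2) - 1) mod F_t, b = 2^t.

    Requires t >= 2 so that b/4 is an integer, as in (28).
    """
    b = 1 << t
    F = (1 << b) + 1
    alpha = ((1 << (b // 4)) * ((1 << (b // 2)) - 1)) % F
    return alpha, F


# Orders of 2, 3, 4 and 6 modulo F_2 = 17, as read off the powers table.
orders = {a: multiplicative_order(a, 17) for a in (2, 3, 4, 6)}

# sqrt(2) for F_2 (b = 4) and F_3 (b = 8).
alpha2, F2 = sqrt2_mod_fermat(2)
alpha3, F3 = sqrt2_mod_fermat(3)
print(orders, alpha2, alpha3, multiplicative_order(alpha3, F3))
```

For the composite $F_5$ the same formula can be tested with `pow(alpha, 128, F) == 1` and `pow(alpha, 64, F) == F - 1`, which avoids the brute-force loop.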


    TABLE I
    PARAMETERS FOR SEVERAL POSSIBLE IMPLEMENTATIONS OF FNT's

    t    b     F_t           N for α = 2 (a)    N for α = √2    N_max     α for N_max
    3    8     2^8 + 1       16                 32              256       3
    4    16    2^16 + 1      32                 64              65536     3
    5    32    2^32 + 1      64                 128             128       √2
    6    64    2^64 + 1      128                256             256       √2

    (a) This case corresponds to the Rader transform.

    Because of the nature of modular arithmetic discussed in Section II, the FNT coefficients do not seem to have any physical meaning. Although the signal for which the FNT is being taken may be very small, its FNT coefficients may lie anywhere between 0 and $F_t - 1$. This is because the concept of magnitude does not exist. This also means that the concept of "closeness" of two numbers does not exist in the modular arithmetic. Therefore, approximations or roundings are not allowed in the modular arithmetic. A seemingly small approximation in the transform domain may introduce serious error in the final result. But, because of the nature of the modular arithmetic, there is no need for approximation. During various stages of the computation each accumulation of signal "overflows" many times. But still the end result of the convolution will be exact if the input signals are properly bounded. Some of the properties of the FNT's are given in [9, appendix A].

    Example

    To make the ideas of this section more clear, we now present an example. This example will illustrate several points: treatment of negative values in the data, the structure of the transform and the inverse transform matrix, negative powers of $\alpha$, frequent "overflow" during computation, meaninglessness of the transform values, and exactness of the final answer. This example will not demonstrate the efficient implementation of the FNT using the binary arithmetic.

    Consider two sequences $x = (2, -2, 1, 0)$ and $h = (1, 2, 0, 0)$, whose convolution is desired. From the overflow consideration, it is sufficient if we work modulo $F_2 = 17$. We want $N = 4$; for $F_2$ the integer 2 is of order 8, therefore $2^2 = 4$ is an $\alpha$ of order 4. The transformation matrix $T$ is given by

    $$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \pmod{17}.$$

    Since $4^{-1} = -4 \pmod{17}$, the inverse transformation matrix $T^{-1}$ is given by

    $$T^{-1} = 4^{-1} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4^{-1} & 4^{-2} & 4^{-3} \\ 1 & 4^{-2} & 4^{-4} & 4^{-6} \\ 1 & 4^{-3} & 4^{-6} & 4^{-9} \end{bmatrix} = -4 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -4 & -1 & 4 \\ 1 & -1 & 1 & -1 \\ 1 & 4 & -1 & -4 \end{bmatrix} = 13 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 13 & 16 & 4 \\ 1 & 16 & 1 & 16 \\ 1 & 4 & 16 & 13 \end{bmatrix} \pmod{17}.$$

    The transforms of $x$ and $h$ are given by

    $$X = Tx = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 4 & 16 & 13 \\ 1 & 16 & 1 & 16 \\ 1 & 13 & 16 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 15 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \\ 5 \\ 9 \end{bmatrix} \pmod{17}.$$

    Note that in $x$, $-2$ was represented by $-2 + 17 = 15$. Similarly, $H = (3, 9, 16, 10)$ and

    $$Y = X \cdot H = (3, 90, 80, 90) = (3, 5, 12, 5) \pmod{17}.$$

    Taking the inverse transform of $Y$,

    $$y = (2, 2, 14, 2) \pmod{17}.$$

    According to our assumption, integers are supposed to lie between $-8$ and 8. Therefore, 14 must be represented as $14 - 17 = -3$. This gives $y = (2, 2, -3, 2)$, which is the correct answer. Also, note that $y$ is a symmetric sequence, therefore, $Y$ is also a symmetric sequence. Other than this, the transform values seem to have no interpretation.
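The worked example can be reproduced in a few lines. This is a minimal Python sketch of a direct (matrix-style, not fast) number theoretic transform, under the assumption of Python 3.8+ for the modular inverse via `pow(a, -1, M)`; the function names `ntt`, `intt` and `centered` are hypothetical and chosen only for this illustration.

```python
def ntt(seq, alpha, M):
    """Direct O(N^2) transform: X(k) = sum_n x(n) * alpha^(n k) mod M."""
    N = len(seq)
    return [sum(seq[n] * pow(alpha, n * k, M) for n in range(N)) % M
            for k in range(N)]


def intt(spec, alpha, M):
    """Inverse transform: x(n) = N^-1 * sum_k X(k) * alpha^(-n k) mod M."""
    N = len(spec)
    n_inv = pow(N, -1, M)        # N^-1 mod M (Python 3.8+)
    a_inv = pow(alpha, -1, M)    # alpha^-1 mod M
    return [(n_inv * sum(spec[k] * pow(a_inv, n * k, M) for k in range(N))) % M
            for n in range(N)]


def centered(v, M):
    """Map a residue in [0, M) to the range (-M/2, M/2]."""
    return v - M if v > M // 2 else v


M, alpha = 17, 4                 # F_2 and an alpha of order N = 4
x = [2, -2, 1, 0]
h = [1, 2, 0, 0]

X = ntt(x, alpha, M)             # Python's % already maps -2 to 15
H = ntt(h, alpha, M)
Y = [(a * b) % M for a, b in zip(X, H)]
y_mod = intt(Y, alpha, M)
y = [centered(v, M) for v in y_mod]
print(X, H, Y, y_mod, y)
```

The intermediate values carry no obvious meaning, as the text notes; only the final recentered sequence matches the ordinary cyclic convolution.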

    For many applications a direct application of the FNT to implement convolution will result in a significant improvement over any alternative methods. There are many other situations where the constraints of the transform are too severe. If data magnitude or machine constraints dictate a certain word length and hence a certain $F_t$, the allowed sequence length $N$ may be too short. If input data magnitude and filter length indicate a possible output magnitude that would exceed $F_t/2$, then overflow becomes a problem. If $b$-bit words are used with a modulus of $F_t = 2^b + 1$, the machine can represent $2^b$ integers but the transform needs $2^b + 1$. We now consider several partial solutions to these problems.

    Arithmetic mod $F_t$ can be implemented using $b = 2^t$ bit representation of integers with some provision for representing $2^b$.

    V. METHODS FOR CONVOLVING LONG SEQUENCES AND FOR AVOIDING OVERFLOW

    We have seen the maximum length of sequences which can be cyclically convolved using the FNT with $\alpha = 2$ is $N = 2b$ and therefore the length of sequences which can be convolved is proportional to the word length in bits. Thus, for long sequences, word length requirement may be excessive. Rader [14] suggested using a two-dimensional convolution scheme to convolve long one-dimensional sequences and Agarwal and Burrus [15], [23] presented such a two-dimensional convolution scheme. Using this scheme, cyclic convolution of length $N = LP$ is implemented as a two-dimensional cyclic convolution of length $2L$ by $P$. This two-dimensional cyclic convolution can be implemented using a two-dimensional FNT [15], [23] defined similar to the one-dimensional transform. Using this two-dimensional scheme, the word length required is proportional to the square root of the length of the sequences to be convolved, which would give for a maximum sequence length $8b^2$, rather than $4b$. If $P$ is taken as the maximum possible length $4b$, and $2L$ is a small integer, then either direct convolution or another high-speed algorithm could be employed [23] to compute convolution along the short dimension and the one-dimensional FNT could be used along the long dimension. Computationally this combination can be very efficient as will be shown in the implementation in Section VIII. Table II lists the maximum lengths for two-dimensional transforms. This approach requires approximately a factor of two increase in computation and storage requirements over a direct one-dimensional implementation.

    TABLE II
    MAXIMUM ONE-DIMENSIONAL CYCLIC CONVOLUTION LENGTHS USING TWO-DIMENSIONAL FNT OR RT

    Word Length b    N for α = 2    N for α = √2
    16               512            2048
    32               2048           8192
    64               8192           32768

    Another approach to achieving longer transforms than allowed with $\alpha = 2$ is to use $\alpha = \sqrt{2}$ in (28). It can be shown [9], [15] that $\alpha = \sqrt{2}$ is of order $N = 4b = 2^{t+2}$ and that multiplication times $\sqrt{2}$ requires two word shifts.

    Examination of the FFT algorithm [1] shows that if $\alpha = \sqrt{2}$ is used, only one stage will require multiplication by odd powers of $\sqrt{2}$ and from [9] it is shown this can be done with two word shifts. The other stages will multiply by even powers of $\sqrt{2}$ and, therefore, use a single shift as for the case with $\alpha = 2$. This modification is relatively simple and allows a doubling of the allowed $N$. Note in the example for $F_2$ that $\sqrt{2} = 6$, which also gives $N = N_{\max}$.

    Each additional square root of $\alpha$ results in a doubling of the allowed $N$ (up to $N = N_{\max}$) and adds an additional stage of calculation to the FFT algorithm. Unfortunately, beyond $\sqrt{2}$ each stage will require a general multiplication. For the case where $F_t$ is a prime, if a few stages of multiplication can be allowed, then $N$ can be increased [25]. For $F_5$ and $F_6$, $\alpha = \sqrt{2}$ gives the maximum $N$.

    This use of $\alpha = \sqrt{2}$ can be combined with the two-dimensional methods to give various desired $N$.
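The "two word shifts" remark follows directly from (28): since $\sqrt{2} = 2^{b/4}(2^{b/2} - 1) = 2^{3b/4} - 2^{b/4}$, a product with $\sqrt{2}$ is a difference of two shifted copies of the operand. The Python sketch below illustrates this under stated assumptions: the function name `mul_sqrt2` is hypothetical, and the final reduction uses Python's `%` operator for brevity rather than the shift-and-subtract reduction a hardware implementation would use.

```python
def mul_sqrt2(x, t):
    """Multiply x by sqrt(2) modulo F_t = 2^b + 1, b = 2^t (t >= 2).

    Uses sqrt(2) = 2^(3b/4) - 2^(b/4), i.e. two shifts and one subtraction.
    The trailing % stands in for the modular reduction step.
    """
    b = 1 << t
    F = (1 << b) + 1
    return ((x << (3 * b // 4)) - (x << (b // 4))) % F


# For F_2 = 17 this is multiplication by 6, as in the example for F_2.
print([mul_sqrt2(x, 2) for x in range(4)])
```

Applying the function twice multiplies by 2, which gives a quick consistency check for any supported `t`.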

    Another possible problem arises because of the modular arithmetic. In the ring of integers mod $M$, conventional integers can be unambiguously represented only if their absolute value is less than $M/2$. If the input integer sequences $x(n)$ and $h(n)$ are so scaled that $|y(n)|$ never exceeds $M/2$, we would get the same results by implementing convolution in the ring of integers modulo $M$ as that obtained with normal arithmetic. This is similar to the overflow constraint in fixed-point digital machines. In most digital filtering applications, $h(n)$ represents the impulse response and is known a priori; also the maximum magnitude of the input signal is usually known. In this situation, we can bound the peak output magnitude by

    $$|y(n)| \le |x(n)|_{\max} \sum_{k=0}^{N-1} |h(k)|. \qquad (29)$$

    This may well require a longer word length than is possible or practical.

    One possible solution to this overflow problem involves segmenting the words into shorter blocks and convolving them separately [9]:

    $$x(n) = x_2(n) + x_1(n)\,2^k, \qquad |x_2(n)| < 2^k$$
    $$h(n) = h_2(n) + h_1(n)\,2^k, \qquad |h_2(n)| < 2^k \qquad (30)$$

    $$y = x * h = (x_1 * h_1)\,2^{2k} + (x_1 * h_2 + x_2 * h_1)\,2^k + x_2 * h_2. \qquad (31)$$

    Now, since $x_1$, $h_1$, $x_2$, and $h_2$ have roughly half the number of bits, it should be possible to convolve them using approximately half the number of bits. If necessary, a more precise analysis of the above situation could be easily performed. In (31), the last term, in comparison to the first term, is very small and can be neglected. We need to take two transforms for $x$ and two transforms for $h$; the summation shown within the parentheses can be performed in the transform domain. Finally, we need to take two inverse transforms, one for $x_1 * h_1$ and the other for $(x_1 * h_2 + x_2 * h_1)$.

    There is another alternative to this problem suggested by Rader [24] and Parks [25]. This is based on the Chinese remainder theorem. The convolution is done modulo two different integers $M_1$ and $M_2$, where $M_1$ and $M_2$ are such that the cyclic convolution in $Z_{M_1}$ and $Z_{M_2}$ is easily implemented on the same machine. The final result mod $M_1 \cdot M_2$ is obtained by the Chinese remainder theorem. $M_1$ is usually a Fermat number and the cyclic convolution in $Z_{M_1}$ is computed using the FNT. Rader [24] suggested using $M_2$ a power of 2; in that case the convolution in $Z_{M_2}$ is computed by taking the sequences mod $M_2$ and then convolving them in $Z_{M_1}$ using the FNT and then reducing them back to $Z_{M_2}$. In this case $M_2$ should be small enough so that no error is introduced by implementing convolution mod $M_2$ in $Z_{M_1}$. This requires

    $$N M_2^2 < M_1. \qquad (32)$$

    Parks [25] suggested that $M_2$ could be a Fermat number, just smaller than $M_1$. In this case the convolution in $Z_{M_2}$ can also be done using an FNT. Furthermore, the same machine can be utilized to compute the FNT in $Z_{M_2}$. Because $M_2 \mid (M_1 - 2)$, arithmetic in $Z_{M_2}$ can be carried out in $Z_{(M_1 - 2)}$. Example: $M_1 = 2^{16} + 1$, $M_1 - 2 = 2^{16} - 1$, and $M_2 = 2^8 + 1$. These ideas can be further extended.
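The Chinese remainder theorem approach can be illustrated with the two Fermat moduli from the example, $M_1 = 2^{16} + 1$ and $M_2 = 2^8 + 1$. The Python sketch below is only an illustration under stated assumptions: the cyclic convolutions modulo each $M_i$ are computed directly rather than with an FNT, the helper names (`cyclic_conv_mod`, `crt_pair`, `centered`) and the test sequences are hypothetical, and Python 3.8+ is assumed for `pow(a, -1, m)`.

```python
def cyclic_conv_mod(x, h, M):
    """Cyclic convolution of two equal-length integer sequences, modulo M."""
    N = len(x)
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) % M for n in range(N)]


def crt_pair(r1, M1, r2, M2):
    """Combine r1 mod M1 and r2 mod M2 (coprime moduli) into r mod M1*M2."""
    t = ((r2 - r1) * pow(M1, -1, M2)) % M2
    return r1 + M1 * t


def centered(v, M):
    """Map a residue in [0, M) to the range (-M/2, M/2]."""
    return v - M if v > M // 2 else v


M1, M2 = (1 << 16) + 1, (1 << 8) + 1      # 65537 and 257, both prime
x = [3000, -2000, 100, 7]                 # illustrative data only
h = [300, 5, -2, 1]

y1 = cyclic_conv_mod(x, h, M1)            # result known only mod M1
y2 = cyclic_conv_mod(x, h, M2)            # result known only mod M2
y = [centered(crt_pair(a, M1, b, M2), M1 * M2) for a, b in zip(y1, y2)]
print(y)
```

Here the true outputs exceed $M_1/2$, so either modulus alone would alias, while the combined range $M_1 M_2 / 2$ is large enough to recover them.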

    Still another approach to solving the sequence length $N$ and word length constraints would be to use block processing [3]. By breaking the sequence of length $N$ into smaller blocks and scaling and processing them separately with the FNT one can combine the results to get the desired output. This can be viewed as a type of two-dimensional processing.

    VI. OVERFLOW AND QUANTIZATION CONSIDERATIONS

    As mentioned in Section V, we could perform cyclic convolution modulo integer $M$ and obtain the correct result if the absolute value of the output never exceeds $M/2$. If this condition is violated, the resulting error is rather serious. Because of the nature of the modular arithmetic we obtain folding or aliasing of the signal amplitude. This situation could be avoided if the signals are properly quantized (or normalized).

    Let $x(n)$ and $h(n)$ represent the original signals. They may have fractional parts (bits to the right of the binary point). To make use of the number theoretic transforms these sequences must be integer sequences. This is easily accomplished by merely shifting the binary point all the way to the right. This introduces scale factors in the sequences. The integer sequences $\hat{x}(n)$ and $\hat{h}(n)$ are given by

    $$\hat{x}(n) = x(n)\,2^{b_1} \qquad (33)$$
    $$\hat{h}(n) = h(n)\,2^{b_2} \qquad (34)$$

    where $b_1$ and $b_2$, respectively, represent the number of bits to the right of the binary point in the $x(n)$ and $h(n)$ sequences.

    $$\hat{y}(n) = \hat{x}(n) * \hat{h}(n) = 2^{b_1 + b_2}\, x(n) * h(n) = 2^{b_1 + b_2}\, y(n). \qquad (35)$$

    Now we give some upper bounds on the output $\hat{y}(n)$. These are due to Jackson [21]. The $L_p$ norm of a signal is defined by

    The output of the cyclic convolution is bounded by

    $|\hat{y}(n)|$


    TABLE III
    CYCLIC CONVOLUTION TIMINGS FOR LENGTH N REAL SEQUENCES

    N    FFT (ms)    FNT or RT (ms)

    32 16 3.3

    64 31

    1 2 8

    1.4

    16.6a

    256

    256 8O.Oc

    512 245 166.OC

    102 4 53 0 34O.Oc

    204 72O.Oc

    a Using $\alpha = \sqrt{2}$.
    b Using two-dimensional RT.
    c 2 by 128 convolution.

    fast FNT, unlike the FFT, we do not need to store the powers of $\alpha$ (if $\alpha$ is taken as 2 or a power of 2).

    VIII. COMPARISON WITH THE FFT

    As noted in the previous section, computing the FNT is a very simple operation on a binary machine. Now let us compare the complexity of various basic operations involved in computing the FNT vis-à-vis the FFT. If the two sequences $x(n)$ and $h(n)$ have $b_1$ and $b_2$ bit representations, respectively, and are of length $N$, then the output $y(n)$ would need no more than a $(b_1 + b_2 + \log_2 N)$ bit representation. To obtain the correct result, $b \ge b_1 + b_2 + \log_2 N$. In Section V, we have given a better bound on the output. In Section VI, we have given other bounds. Roughly speaking, we need twice the number of bits to carry out the convolution using the FNT as compared to the fixed-point FFT implementation of the convolution. But in the DFT, every data point is treated as a complex number and therefore requires two words, one for the real part and one for the imaginary part. Thus, in effect, the hardware requirement for the two transforms is about the same. Although for real data it is possible to make use of the symmetry properties of the DFT's, they require extra computation and for the purpose of comparison this will be ignored, even though we have taken it into account for our IBM 370/155 implementation to be discussed later. Therefore, we shall assume that in the FFT implementation, each data point is represented by a $b/2$ bit real part and a $b/2$ bit imaginary part.

    One $b/2$ bit complex addition is equivalent to two $b/2$ bit real additions, which are comparable to a $b$-bit addition mod $F_t$. Thus, the complexity of addition/subtraction is the same in both the transforms. Similarly, it can be shown that a $b/2$ bit complex multiplication is comparable to a $b$-bit multiplication mod $F_t$. Computation of the RT requires multiplications by powers of 2, which, implemented as bit shifts and subtractions, become much simpler operations compared to the complex multiplications required in the FFT implementation.
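To make the "bit shifts and subtractions" concrete: because $2^b \equiv -1 \pmod{F_t}$, a product $x \cdot 2^k$ can be formed by shifting $x$ and subtracting the bits that spill past position $b$ from the low $b$ bits. The following Python sketch illustrates the idea; it is not the paper's assembler implementation, the function name `mul_pow2_mod_fermat` is hypothetical, and the final `%` stands in for the conditional add a fixed-width machine would use.

```python
def mul_pow2_mod_fermat(x, k, b):
    """Return x * 2^k mod (2^b + 1) using a shift and a subtraction.

    x is assumed to lie in [0, 2^b]; the value 2^b represents -1.
    """
    F = (1 << b) + 1
    k %= 2 * b                  # 2 has order 2b modulo 2^b + 1
    if k >= b:                  # 2^b = -1, so fold the exponent and negate
        x = (F - x) % F
        k -= b
    y = x << k                  # the word shift
    low = y & ((1 << b) - 1)    # low b bits
    high = y >> b               # overflow bits, weighted by 2^b = -1
    return (low - high) % F     # single corrective add if negative


print(mul_pow2_mod_fermat(12345, 5, 16))
```

The result can be compared against `(x * pow(2, k, F)) % F` for any `x` and `k` to confirm that no general multiplication is needed.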

    To compute a length $N$ fast RT, $N \log_2 N$ additions/subtractions and $(N/2) \log_2 (N/2)$ multiplications by some powers of 2 are required, which are implemented as bit shifts and subtractions. To compute the convolution using the FFT, most of the time is taken in computing the complex multiplications required to compute the transforms. A comparison with the RT reveals that these complex multiplications are replaced by bit shifts and subtractions, which are much faster operations. This results in considerable computational savings in the implementation of convolution. The computation required to multiply the two transforms is about the same for both the implementations. To convolve long sequences using the two-dimensional RT the computational effort and required storage increase by, at the most, a factor of 2. Still, the FNT implementation of convolution is much faster as compared to the FFT implementation.

    These transforms were implemented in assembler language on an IBM 370/155 which has a 32-bit word length [9]. The results were compared with an efficient FFT program for computing convolution which makes use of the symmetry of the DFT for real data (see Table III) [20].

    IX. GENERALIZATIONS, VARIATIONS, AND OTHER RESULTS

    A. Other Choices for M

    In this paper, we have primarily discussed number theoretic transforms in the rings of integers modulo Fermat numbers. These numbers seem to be the best choice for implementation on binary computers. Nevertheless, any odd integer $M$ can be used as was discussed in Section III. Rader [14] proposed the use of Mersenne numbers $M_p = 2^p - 1$, where $p$ is a prime. Mersenne number transforms for $\alpha = -2$ have $N = 2p$ and, therefore, do not have an FFT-type fast computational algorithm.

    FNT's require the computing word length to be a power of 2. Many computers do not have word length a power of 2. Transforms similar to FNT's exist for many of these situations. For example, on a 24-bit machine, we may perform convolution modulo $M = 2^{24} + 1$. For this $M$, $\alpha = 2^3$ gives $N = 16$, and $\alpha = 2^{3/2} = 2^7(2^{12} - 1)$ gives $N = 32$ as the maximum length. In general one could take $M = 2^{s2^t} + 1$, where $s$ is an odd integer. In that case, $\alpha = 2^s$ would give $N = 2^{t+1}$, and $\alpha = (2^s)^{1/2} = 2^{(s-1)/2 + s2^{t-2}}\,(2^{s2^{t-1}} - 1)$ would give $N = 2^{t+2}$. In many situations, it may be possible to have transforms of greater length than $2^{t+2}$. For example, taking $M = 2^{40} + 1$, the maximum possible transform length is 256, and taking $M = 2^{80} + 1$, the maximum possible transform length is 1024, but the corresponding $\alpha$'s may not be simple.

    For many computers whose word length is not a power of 2, if the above formulation is used to compute transforms analogous to the FNT, the maximum transform length is very small. But, if we are willing to sacrifice the "effective word length" to some extent, we can increase the maximum transform length significantly. Let $b$, a multiple of 4, be the word length of the machine. Let

    $$M = 2^b + 1 = M_1 M_2. \qquad (47)$$

    $M_1$ and $M_2$ may be nonprimes. It can be easily proved that

    $$O(M) = \gcd\{O(M_1), O(M_2)\}. \qquad (48)$$

    It may so happen that $M_1$ is a small integer and $O(M_2) \gg O(M)$; therefore, because of the presence of $M_1$ the maximum transform length is being considerably reduced while at the same time $M_1$ may not be increasing the maximum allowable output ($y_{\max}$) or the "effective word length" significantly. In this situation we can compute transforms in $Z_{M_2}$ with maximum transform length $O(M_2)$ and maximum allowable output magnitude $M_2/2$. At the expense of reduced output range, we have increased the transform length. Furthermore, arithmetic mod $M_2$ can be conveniently carried out as arithmetic mod $M$, because $M_2$ is a factor of $M$. At the end of the computation, we have to reduce the result mod $M_2$.

    Table IV shows this factorization of $M$ for several values of $b$. $\log_2 M_2$ shows the "effective word length" of the machine.

    TABLE IV
    FACTORIZATION OF M = 2^b + 1 AS M_1 · M_2 AND THE MAXIMUM TRANSFORM LENGTH CORRESPONDING TO M_2

    Columns: machine word length b; O(M); M_1; M_2; effective word length in bits (log2 M_2, approx.); N_max = O(M_2); N for α = 2 (2b); N for α = √2 (4b).

    b     O(M)    M_1                  M_2                          log2 M_2    N_max = O(M_2)    2b     4b
    12    16      17                   241                          8           240               24     48
    20    16      17                   61681                        16          61680             40     80
    24    32      257                  97 × 673                     16          96                48     96
    28    16      17                   15790321                     24          15790320          56     112
    36    16      17 × 241             433 × 38737                  24          144               72     144
    40    256     257                  4278255361                   32          4278255360        80     160
    48    64      65537                193 × 22253377               32          192               96     192
    56    32      257                  5153 × 54410972897           48          224               112    224
    60    16      17 × 241 × 61681     4562284561                   32          4562284560        120    240
    72    32      97 × 257 × 673       577 × 487824887233           48          576               144    288
    80    1024    65537                414721 × 44479210368001      64          15360             160    320

    TABLE V
    SOME PARAMETERS FOR NUMBER THEORETIC TRANSFORMS IN DECIMAL ARITHMETIC

    Columns: machine digits b (M = 10^b + 1); O(M); M_1; M_2; effective digits (log10 M_2, approx.); N_max = O(M_2); N for α = 10.

    b     O(M)    M_1          M_2                                  log10 M_2    N_max = O(M_2)    N for α = 10
    4     8       1            73 × 137                             4            8                 8
    6     100     101          9901                                 4            9900              12
    8     16      17           5882353                              7            5882352           16
    10    20      101          3541 × 27961                         8            60                20
    12    8       73 × 137     99990001                             8            99990000          24
    16    32      1            353 × 449 × 641 × 1409 × 69857       16           32                32

    Note that for the choices of $M_2$'s shown in Table IV, 2 is always of order $2b$ and also there exists an integer $\sqrt{2} = 2^{b/4}(2^{b/2} - 1)$ which is of order $4b$ in $Z_{M_2}$. This leads to an efficient implementation of these transforms, because powers of $\alpha$ are simple and arithmetic mod $M$ is also simple. For these transforms the word length is not a power of 2, but still the transform length $4b$ is highly composite. The case $b = 60$ needs special attention: for this, if $M_1$ is taken as 17, then for the remaining $M_2$, $O(M_2) = 240$ but $\sqrt{2}$ is not of order 240. This is because $\sqrt{2}$ is of order 48 in $Z_{241}$ and of order 80 in $Z_{61681}$. For this case, although one can find an integer $\alpha$ which is of order 240 in $Z_{M_2}$, it is not likely to be simple.
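The claims about the factor $M_2$ can be spot-checked for one row of Table IV. The Python sketch below takes the 24-bit case, $M = 2^{24} + 1 = 257 \times (97 \times 673)$, and computes the maximum transform lengths and the orders of 2 and $\sqrt{2}$ in $Z_{M_2}$. It assumes the listed factors are prime (so that $O(\cdot)$ is the gcd of the $p_i - 1$), and the helper name `multiplicative_order` is hypothetical.

```python
from math import gcd


def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k == 1 (mod m); assumes gcd(a, m) == 1."""
    k, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        k += 1
    return k


b = 24
M = (1 << b) + 1
M1, M2 = 257, 97 * 673
assert M1 * M2 == M                      # the factorization used in Table IV

# O(.) as the gcd of (p - 1) over the prime factors.
O_M2 = gcd(97 - 1, 673 - 1)
O_M = gcd(257 - 1, O_M2)

# sqrt(2) = 2^(b/4) * (2^(b/2) - 1), reduced into Z_{M2}.
sqrt2 = ((1 << (b // 4)) * ((1 << (b // 2)) - 1)) % M2

order_of_2 = multiplicative_order(2, M2)
order_of_sqrt2 = multiplicative_order(sqrt2, M2)
print(O_M, O_M2, order_of_2, order_of_sqrt2)
```

Dropping the factor 257 raises the attainable length from $O(M)$ to $O(M_2)$ while the simple values 2 and $\sqrt{2}$ keep their orders $2b$ and $4b$.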

    Thus far, our discussion was based on the assumption that the computer is a binary computer. Many computers have decimal representation of integers and for these computers, it will be efficient if the arithmetic is done mod $M = 10^b + 1$ and $\alpha$ and powers of $\alpha$ are powers of 10. We have compiled Table V to be used for decimal computers. Similar tables can be compiled for other radices also.

    B. Complex Number Theoretic Transforms

    The integer field $Z_M$ (assuming $M$ to be prime) can be extended to a complex integer field denoted by $Z_M^c$, if the following equation does not have a solution in $Z_M$:

    $$x^2 + 1 = 0. \qquad (49)$$

    This means $(-1)$ does not have a square root in $Z_M$ or, equivalently, a root of order 4 does not exist in $Z_M$. This implies

    $$4 \nmid O(M) = M - 1. \qquad (50)$$

    Equation (50) is the only condition required for $Z_M^c$ to exist. In $Z_M^c$ every integer is represented as $a + jb$; $a, b \in Z_M$. All the arithmetical operations are done as in the normal complex arithmetic with $j^2 = -1$. Both real and imaginary parts are evaluated mod $M$, separately. The concept of magnitude and phase does not exist in $Z_M^c$. Complex number theoretic transforms (CNT) similar to NTT exist in $Z_M^c$, and can be used to compute the cyclic convolution of two complex integer sequences. To avoid error due to aliasing both real and imaginary parts of the output should be separately bounded to $M/2$. The idea of the CNT has been considered by Reed [26].

    Theorem: A transform having the cyclic convolution property in $Z_M^c$ exists if and only if

    $$N \mid (M^2 - 1). \qquad (51)$$

    We will not give a formal proof of this theorem, but we will outline a procedure to find a complex integer $\alpha$ of order $N$ in $Z_M^c$, if $N$ divides $(M^2 - 1)$.

    Theorem:

    $$(a + jb)^{M+1} = a^2 + b^2 \pmod{M}. \qquad (52)$$

    This theorem can be easily proved. This theorem implies that every complex integer is at most an $(M + 1)$th root of a real integer, in $Z_M^c$. Let $a + jb$ be an $N$th root of a real integer, i.e., $N$ is the least positive integer such that

    $$(a + jb)^N = \text{a real integer} \qquad (53)$$

    then by (52)

    $$N \mid (M + 1). \qquad (54)$$

    How to find an $\alpha$ of order $M^2 - 1$ in $Z_M^c$:

    Consider complex integers of the form $(1 + jb)$ and search over $b \in Z_M$ such that $(1 + jb)$ is an $(M + 1)$th root of a real integer (proof for the existence of such a $b$ can be given). Then,

    $$(1 + jb)^{M+1} = 1 + b^2. \qquad (55)$$

    Let $\alpha_{M-1}$ be a root of order $(M - 1)$ in $Z_M$. Then $1 + b^2$ can be written as some power of $\alpha_{M-1}$

    $$(1 + b^2) = \alpha_{M-1}^{x}. \qquad (56)$$

    It can be shown that x is odd. Then , -1 given by ( 57 ) is of

    order

    (Mz

    1)

    in

    Z&.

    aMa 1

    =

    (1

    +

    b )

    aJ?yl)’z)k

    (57)
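    The search procedure of (55)-(57) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' program: M is taken to be a prime with $4 \nmid (M - 1)$, the discrete logarithm in (56) is found by exhaustive search, and the exponent in (57) is assumed to be $(1 - x)/2$ (reduced modulo $M - 1$), a choice that makes the $(M + 1)$th power of the result equal to $\alpha_{M-1}$. All function names are hypothetical.

```python
def cmul(u, v, M):
    """Multiply complex integers u = (re, im) and v = (re, im) modulo M."""
    return ((u[0] * v[0] - u[1] * v[1]) % M, (u[0] * v[1] + u[1] * v[0]) % M)


def primitive_root(M):
    """Smallest element of order M - 1 in Z_M (M prime), by direct search."""
    for g in range(2, M):
        v, order = g, 1
        while v != 1:
            v = v * g % M
            order += 1
        if order == M - 1:
            return g
    raise ValueError("no primitive root found")


def real_root_index(u, M):
    """Least positive N such that u**N has zero imaginary part."""
    power, n = u, 1
    while power[1] != 0:
        power = cmul(power, u, M)
        n += 1
    return n


def element_of_full_order(M):
    """Follow (55)-(57) to build an element of order M*M - 1 in Z_M^c."""
    g = primitive_root(M)                                  # alpha_{M-1}
    # Search b so that 1 + jb is an (M+1)th root of a real integer.
    b = next(b for b in range(1, M) if real_root_index((1, b), M) == M + 1)
    # Solve 1 + b*b == g**x (mod M) by exhaustive search, as in (56).
    target = (1 + b * b) % M
    x = next(x for x in range(M - 1) if pow(g, x, M) == target)
    assert x % 2 == 1                                      # x is odd
    k = ((1 - x) // 2) % (M - 1)                           # assumed exponent
    s = pow(g, k, M)
    return (s, b * s % M)                                  # (1 + jb) * g**k


def order(u, M):
    """Multiplicative order of a nonzero element u, by repeated products."""
    v, n = u, 1
    while v != (1, 0):
        v = cmul(v, u, M)
        n += 1
    return n


if __name__ == "__main__":
    for M in (7, 31, 127):
        alpha = element_of_full_order(M)
        print(M, alpha, order(alpha, M) == M * M - 1)
```

    For the Mersenne primes 7, 31, and 127 the printed flag should be True, since the order is checked directly by repeated multiplication.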

    This can be easily proved. By raising $\alpha_{M^2-1}$ to the $((M^2 - 1)/N)$th power, we can find a complex integer $\alpha_N$ of order $N$ in $Z_M^c$ if $N \mid (M^2 - 1)$. It can be easily proved that

    $$\gcd(M - 1, M + 1) = 2. \tag{58}$$

    Let

    $$N_1 = \gcd(M - 1, N), \qquad N = N_1 \times N_2$$

    then $\alpha_N$ will be complex, but $\alpha_N^{N_2}$ will be real.
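    The split of N into a real part and a complex part can be illustrated numerically. The sketch below assumes $N_1 = \gcd(M - 1, N)$ and $N = N_1 N_2$, uses the Mersenne prime M = 31 as an example, and finds an element of order $M^2 - 1$ by plain exhaustive search instead of the procedure of (55)-(57). The function names are hypothetical.

```python
from math import gcd


def cmul(u, v, M):
    """Multiply complex integers u = (re, im) and v = (re, im) modulo M."""
    return ((u[0] * v[0] - u[1] * v[1]) % M, (u[0] * v[1] + u[1] * v[0]) % M)


def cpow(u, e, M):
    """Raise u to the non-negative integer power e by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, u, M)
        u = cmul(u, u, M)
        e >>= 1
    return result


def order(u, M):
    """Multiplicative order of a nonzero u (M prime with M % 4 == 3)."""
    v, n = u, 1
    while v != (1, 0):
        v = cmul(v, u, M)
        n += 1
    return n


def generator(M):
    """Exhaustive search for an element of order M*M - 1 in Z_M^c."""
    for a in range(M):
        for b in range(1, M):
            if order((a, b), M) == M * M - 1:
                return (a, b)
    raise ValueError("no generator found")


def alpha_of_order(N, M):
    """Raise a generator to the ((M*M - 1)/N)th power to get order N."""
    if (M * M - 1) % N:
        raise ValueError("N must divide M*M - 1")
    return cpow(generator(M), (M * M - 1) // N, M)


def split_length(N, M):
    """Return (N1, N2) with N1 = gcd(M - 1, N) and N = N1 * N2."""
    n1 = gcd(M - 1, N)
    return n1, N // n1


if __name__ == "__main__":
    M = 31
    for N in (64, 20, 96):
        n1, n2 = split_length(N, M)
        alpha = alpha_of_order(N, M)
        print(N, n1, n2, alpha, cpow(alpha, n2, M))
```

    In each case the last printed pair should have a zero imaginary part, because $\alpha_N^{N_2}$ has order $N_1$, which divides $M - 1$.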

    In the FFT-type algorithm used to compute the CNT, the part corresponding to $N_2$ will require complex arithmetical operations, but the part corresponding to $N_1$ will require only real arithmetical operations. CNT are good in theory as they offer more choice in transform lengths. But so far no CNT have been found for which powers of $\alpha$ are simple. CNT do not exist for Fermat numbers, but they exist for Mersenne numbers.

    The only extension we have investigated is when

    $$M = p_1 p_2 \cdots p_l \tag{59}$$

    where the $p_i$'s are distinct primes. We have not investigated the case when $M$ contains prime powers. $Z_M$ can be extended to $Z_M^c$ as before if

    $$4 \nmid (p_i - 1), \qquad i = 1, 2, \cdots, l. \tag{60}$$

    Also, the Chinese remainder theorem can be used in $Z_M^c$. It is applied separately to the real and imaginary parts. For CNT to exist in $Z_M^c$, they must exist in $Z_{p_i}^c$ also. This gives the following theorem.

    Theorem: CNT of length $N$ in $Z_M^c$ exists if and only if

    $$N \mid \gcd\{p_1^2 - 1,\ p_2^2 - 1,\ \cdots,\ p_l^2 - 1\}. \tag{61}$$

    We find $\alpha_i$ of order $N$ in $Z_{p_i}^c$ and then combine them by the Chinese remainder theorem to obtain an $\alpha$ of order $N$ in $Z_M^c$.
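    The combination step can be sketched for a two-prime modulus. The example below is an assumption-laden illustration: it takes $M = 7 \times 31 = 217$ and $N = 16$, finds each $\alpha_i$ by brute force (valid here because N is a power of two), and applies the Chinese remainder theorem separately to the real and imaginary parts. The function names are hypothetical.

```python
from math import gcd


def cmul(u, v, m):
    """Multiply complex integers u = (re, im) and v = (re, im) modulo m."""
    return ((u[0] * v[0] - u[1] * v[1]) % m, (u[0] * v[1] + u[1] * v[0]) % m)


def cpow(u, e, m):
    """Raise u to the non-negative integer power e by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, u, m)
        u = cmul(u, u, m)
        e >>= 1
    return result


def max_length(primes):
    """Largest admissible N: the gcd of p*p - 1 over the prime factors."""
    g = 0
    for p in primes:
        g = gcd(g, p * p - 1)
    return g


def root_of_order(N, p):
    """Brute-force element of order N (a power of two) modulo the prime p."""
    for a in range(p):
        for b in range(p):
            u = (a, b)
            if cpow(u, N, p) == (1, 0) and cpow(u, N // 2, p) != (1, 0):
                return u
    raise ValueError("no element of the requested order")


def crt_pair(r1, p1, r2, p2):
    """Integer in [0, p1*p2) congruent to r1 mod p1 and to r2 mod p2."""
    # pow(p1, -1, p2) is the modular inverse (Python 3.8+).
    return (r1 + p1 * ((r2 - r1) * pow(p1, -1, p2) % p2)) % (p1 * p2)


def combined_root(N, p1, p2):
    """Combine roots of order N modulo p1 and p2, part by part, via the CRT."""
    a1, a2 = root_of_order(N, p1), root_of_order(N, p2)
    return (crt_pair(a1[0], p1, a2[0], p2), crt_pair(a1[1], p1, a2[1], p2))


if __name__ == "__main__":
    p1, p2, N = 7, 31, 16
    M = p1 * p2
    print(max_length([p1, p2]), max_length([p1, p2]) % N == 0)
    alpha = combined_root(N, p1, p2)
    print(alpha, cpow(alpha, N, M), cpow(alpha, N // 2, M))
```

    Because each $\alpha_i$ has order 16 in a field, its eighth power is $-1$ there, so the eighth power of the combined $\alpha$ should equal $M - 1$ with zero imaginary part.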

    C. Application to Two-Dimensional Filtering

    Rader [27] has recently discussed the application of the FNT to two-dimensional filtering. In 2-D applications, the length of the impulse response along each dimension is not too large; therefore the FNT's are ideally suited for this application, because for these applications the length constraint of the FNT's is not important. Other choices of $M$ discussed in this section can also be used for this application.

    REFERENCES

    [1] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969.
    [2] T. G. Stockham, “High speed convolution and correlation,” in AFIPS Conf. Proc., 1966 Joint Computer Conf., vol. 28, pp. 229-233 (also in [28]).
    [3] C. S. Burrus, “Block realization of digital filters,” IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
    [4] D. E. Knuth, The Art of Computer Programming, vol. 2, Seminumerical Algorithms. Reading, Mass.: Addison-Wesley, 1969.
    [5] J. M. Pollard, “The fast Fourier transform in a finite field,” Math. Comput., vol. 25, pp. 365-374, Apr. 1971.
    [6] A. Schonhage and V. Strassen, “Fast multiplication of large numbers,” Comput. (in German), vol. 7, pp. 281-292, 1971.
    [7] J. W. Cooley and J. W. Tukey, “An algorithm for machine calculation of complex Fourier series,” Math. Comput., vol. 19, pp. 297-301, 1965 (also in [28]).
    [8] A. V. Oppenheim and C. Weinstein, “Effects of finite register length in digital filtering and the fast Fourier transform,” Proc. IEEE, vol. 60, pp. 957-976, Aug. 1972.
    [9] R. C. Agarwal and C. S. Burrus, “Fast convolution using Fermat number transforms with applications to digital filtering,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 87-97, Apr. 1974.
    [10] P. J. Nicholson, “Algebraic theory of finite Fourier transforms,” J. Comput. Syst. Sci., vol. 5, pp. 524-547, 1971.
    [11] I. J. Good, “The relation between two fast Fourier transforms,” IEEE Trans. Comput., vol. C-20, pp. 310-317, Mar. 1971.
    [12] D. E. Knuth, “The art of computer programming-errata et addenda,” Comput. Sci. Dep., Stanford Univ., Stanford, Calif., Rep. STAN-CS-71-194, pp. 21-26, Jan. 1971.
    [13] C. M. Rader, “The number theoretic DFT and exact discrete convolution,” presented at IEEE Arden House Workshop on Digital Signal Processing, Harriman, N.Y., Jan. 11, 1972.
    [14] C. M. Rader, “Discrete convolution via Mersenne transforms,” IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
    [15] R. C. Agarwal and C. S. Burrus, “Fast digital convolution using Fermat transforms,” in Southwest IEEE Conf. Rec., pp. 538-543, Apr. 1973.
    [16] O. Ore, Number Theory and Its History. New York: McGraw-Hill, 1948.
    [17] G. H. Hardy and E. M. Wright, The Theory of Numbers. Oxford, England: Oxford Univ. Press, 1960.
    [18] N. S. Szabo and R. I. Tanaka, Residue Arithmetic and Its Applications to Computer Technology. New York: McGraw-Hill, 1967.
    [19] L. E. Dickson, History of the Theory of Numbers, vol. I. Washington, D.C.: Carnegie Institute, 1919, p. 376.
    [20] R. C. Singleton, “An algorithm for computing the mixed radix fast Fourier transform,” IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969 (also in [28]).
    [21] L. B. Jackson, “On the interaction of round-off noise and dynamic range in digital filters,” Bell Syst. Tech. J., vol. 49, pp. 159-184, Feb. 1970 (also in [28]).
    [22] R. C. Agarwal, “On realization of digital filters,” Ph.D. dissertation, Dep. Elec. Eng., Rice Univ., Houston, Tex., Dec. 1973.
    [23] R. C. Agarwal and C. S. Burrus, “Fast one-dimensional digital convolution by multi-dimensional techniques,” IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-22, pp. 1-10, Feb. 1974.
    [24] C. M. Rader, Private Commun.
    [25] T. W. Parks, Private Commun.
    [26] I. S. Reed, Private Commun.
    [27] C. M. Rader, “On the application of the number theoretic transforms of high speed convolution to two-dimensional filtering,” IEEE Trans. Circuit Theory, to be published.
    [28] L. R. Rabiner and C. M. Rader, Eds., Digital Signal Processing. New York: IEEE Press, 1972.
    [29] C. M. Rader, “A note on exact discrete Fourier transforms,” IEEE Trans. Audio Electroacoust. (Corresp.), vol. AU-21, pp. 558-559, Dec. 1973.
    [30] H. Takahasi and Y. Ishibashi, “A new method for ‘exact calculation’ by a digital computer,” Inform. Process. Jap., vol. 1, pp. 28-42, 1962.