
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 57, NO. 4, APRIL 2010 823

Efficient Reverse Converter Designs for the New 4-Moduli Sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ Based on New CRTs

Amir Sabbagh Molahosseini, Keivan Navi, Chitra Dadkhah, Omid Kavehei, and Somayeh Timarchi

Abstract—In this paper, we introduce two new 4-moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ for developing efficient large dynamic range (DR) residue number systems (RNS). These moduli sets consist of simple and well-formed moduli which can result in efficient implementation of the reverse converter as well as internal RNS arithmetic circuits. The moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ has $5n$-bit DR and it can result in a fast RNS arithmetic unit, while the $6n$-bit DR moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ is a conversion-friendly moduli set which can lead to a high-speed and low-cost reverse converter design. Next, efficient reverse converters for the proposed moduli sets based on new Chinese remainder theorems (New CRTs) are presented. The converter for the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ is derived by New CRT-II with better performance compared to the reverse converter for the latest introduced $5n$-bit DR moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1, 2^{n-1}-1\}$. Also, New CRT-I is used to achieve a high-performance reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$. This converter has less conversion delay and lower hardware requirements than the reverse converter for a recently suggested $6n$-bit DR moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}-2, 2^{2n+1}-3\}$.

Index Terms—Computer arithmetic, new Chinese remainder theorems (New CRTs), residue arithmetic, reverse converter, residue number system (RNS).

I. INTRODUCTION

THE residue number system (RNS) has been an important research field in computer arithmetic for many decades, mainly because of its carry-free nature, which can lead to high-performance computing architectures with a particular power consumption and delay specification [1]. In RNS, a regular weighted number is converted into a set of small residues, and since arithmetic operations on residues can be performed in parallel without carry propagation between them, RNS can result in high-speed addition, subtraction and multiplication [2], [3]. The RNS has been widely considered as an alternative to the weighted number system for efficient hardware implementation of digital signal processing (DSP) computation algorithms [4], [5]. In particular, RNS is an interesting and useful method for the implementation of high-speed FIR filters [6], [7]. Moreover, RNS has applications in image processing systems, especially RNS image coding, which can offer high-speed VLSI implementation of secure image processing algorithms [8]. In addition, redundant RNS is extensively used in the design of error detection and correction codes [9], [10].

Manuscript received November 11, 2008; revised February 26, 2009. First published December 18, 2009; current version published April 09, 2010. This paper was recommended by Associate Editor V. De.

A. S. Molahosseini is with the Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran 1477893855, Iran (e-mail: [email protected]).

K. Navi is with the Department of Electrical and Computer Engineering, Shahid Beheshti University, GC, Tehran 1983963113, Iran (e-mail: [email protected]).

C. Dadkhah is with the Department of Electrical Engineering, K. N. Toosi University of Technology, Tehran 1969764499, Iran (e-mail: [email protected]).

O. Kavehei is with the Centre for High Performance Integrated Technologies and Systems (CHiPTec), School of Electrical and Electronic Engineering, The University of Adelaide, Adelaide, SA 5005, Australia (e-mail: [email protected]).

S. Timarchi is with the Department of Electrical and Computer Engineering, Shahid Beheshti University, GC, Tehran 1983963113, Iran (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCSI.2009.2026681

The first step for designing an RNS system is the moduli set selection. The moduli set consists of a set of pairwise relatively prime integer numbers. The dynamic range (DR) of an RNS system is defined in terms of the product of the moduli, and it denotes the interval of integers which can be uniquely represented in RNS. The proper selection of a moduli set has an important role in the design of RNS systems because the speed of the RNS arithmetic unit as well as the complexity of the residue to binary converter depend on the form and the number of the moduli [11]. Another important concern for reverse converter design is the selection of an appropriate conversion algorithm. The algorithms of reverse conversion are mainly based on the Chinese remainder theorem (CRT), mixed-radix conversion (MRC) and the new Chinese remainder theorems (New CRTs) [12]. Among these, New CRTs have simple computations which can be efficiently realized in hardware.
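The two requirements stated above (pairwise relatively prime moduli, and a DR equal to the product of the moduli) can be checked numerically. The following is a minimal Python sketch, assuming the two moduli sets named in the title of this paper; the function names are illustrative and do not come from the paper.

```python
from itertools import combinations
from math import gcd, prod


def moduli_set_1(n):
    """First proposed set: {2^n - 1, 2^n, 2^n + 1, 2^(2n+1) - 1}."""
    return [2**n - 1, 2**n, 2**n + 1, 2**(2 * n + 1) - 1]


def moduli_set_2(n):
    """Second proposed set: {2^n - 1, 2^n + 1, 2^(2n), 2^(2n) + 1}."""
    return [2**n - 1, 2**n + 1, 2**(2 * n), 2**(2 * n) + 1]


def is_pairwise_coprime(moduli):
    """True when every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def dynamic_range(moduli):
    """The DR is the product of the moduli: integers 0 .. DR-1 are representable."""
    return prod(moduli)


if __name__ == "__main__":
    for n in (2, 4, 8):
        for name, builder in (("set 1", moduli_set_1), ("set 2", moduli_set_2)):
            moduli = builder(n)
            dr = dynamic_range(moduli)
            print(n, name, moduli, is_pairwise_coprime(moduli), dr.bit_length())
```

For $n = 2$ the sets are $\{3, 4, 5, 31\}$ and $\{3, 5, 16, 17\}$, with dynamic ranges 1860 and 4080.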

Up to now, many moduli sets have been suggested for RNS, and they can be classified based on their DR. The prominent $3n$-bit DR moduli sets are those of [13], [14], [15] and [16]. The DR provided by these moduli sets is not sufficient for applications which require larger DR with more parallelism. Therefore, $4n$-bit DR 4-moduli sets such as the two sets of [17] and the sets of [18], [19] and [20] have been introduced to increase parallelism in the RNS arithmetic unit. In addition to these, Cao et al. [21] and Hariri et al. [22] used 4- and 3-moduli sets for providing more DR than $4n$ bits, and they proposed the $5n$-bit DR moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n}+1\}$ and $\{2^{n}, 2^{2n}-1, 2^{2n}+1\}$, respectively. But the use of modulo $2^{2n}+1$ results in an increase in the latency of the RNS arithmetic unit. Also, an efficient reverse converter for another $5n$-bit DR moduli set has been proposed in [23]. The problem is that the speed of the arithmetic unit of RNS systems based on this moduli set is restricted by a low-performance modulus of that set. Recently, Cao et al. proposed the $5n$-bit DR moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1, 2^{n-1}-1\}$ [24] for reducing the total delay of internal RNS arithmetic circuits. The use of balanced and well-formed moduli makes this moduli set attractive for the RNS arithmetic unit. But some of the multiplicative inverses of this moduli set have inefficient forms, which increases the cost and the delay of the reverse converter. Thus, the need for a new $5n$-bit DR moduli set which results in a fast RNS arithmetic unit as well as efficient reverse conversion is evident. Research on $6n$-bit DR moduli sets is more competitive. Recently, Zhang et al. [25] suggested the $6n$-bit DR moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}-2, 2^{2n+1}-3\}$, and presented a reverse converter for this moduli set based on New CRT-II. For eliminating the ROM tables and multiplications which are required by New CRT-II, they used the moduli $2^{2n}-2$ and $2^{2n+1}-3$ in their moduli set in order to have simple multiplicative inverses. However, the use of moduli of the forms $2^{2n}-2$ and $2^{2n+1}-3$ results in increasing the cost and the delay of the reverse converter.

1549-8328/$26.00 © 2010 IEEE

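In this paper, efficient reverse converters for the new RNS moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ are presented. The converter for the proposed $5n$-bit DR moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ is achieved by an adder-based implementation of the New CRT-II without the use of ROM or multiplier. The resulting architecture is faster and needs less hardware than the reverse converter of the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1, 2^{n-1}-1\}$ [24]. Next, due to the arithmetic properties of the $6n$-bit DR moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$, the New CRT-I is employed for deriving an efficient reverse converter with better area-time complexity than the reverse converter of the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}-2, 2^{2n+1}-3\}$ [25]. Also, the proposed moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ can speed up internal RNS arithmetic processing, compared to the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}-2, 2^{2n+1}-3\}$. Furthermore, while the DR of our moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ is larger than the DR of the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n}+1\}$ ($2^{n}$ times greater), the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ has lower hardware cost and the same speed as the reverse converter of the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n}+1\}$ [21].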

In the rest of the paper, a brief introduction of RNS with a description of the New CRTs is presented in Section II. Section III introduces the reverse conversion algorithms with their hardware implementations for the proposed moduli sets. The performance of the presented reverse converters in terms of conversion delay and hardware requirements is evaluated and compared with other converters in Section IV.

II. BACKGROUND

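The Residue Number System: An RNS is defined in terms of a relatively-prime moduli set $\{P_1, P_2, \ldots, P_n\}$ where $\gcd(P_i, P_j) = 1$ for $i \neq j$, and $\gcd(a, b)$ denotes the greatest common divisor of $a$ and $b$. A weighted number $X$ can be represented as $X = (x_1, x_2, \ldots, x_n)$, where

$$x_i = X \bmod P_i = |X|_{P_i}, \quad 0 \le x_i < P_i \qquad (1)$$

Such a representation is unique for any integer $X$ in the range $[0, M-1]$, where $M = P_1 P_2 \cdots P_n$ is the DR of the moduli set [26].

New Chinese Remainder Theorem 1: For the 4-moduli set $\{P_1, P_2, P_3, P_4\}$, the number $X$ can be converted from its residue representation $(x_1, x_2, x_3, x_4)$ by New CRT-I [12], [27] as follows:

$$X = x_1 + P_1 \left| k_1 (x_2 - x_1) + k_2 P_2 (x_3 - x_2) + k_3 P_2 P_3 (x_4 - x_3) \right|_{P_2 P_3 P_4} \qquad (2)$$

where

$$|k_1 P_1|_{P_2 P_3 P_4} = 1 \qquad (3)$$

$$|k_2 P_1 P_2|_{P_3 P_4} = 1 \qquad (4)$$

$$|k_3 P_1 P_2 P_3|_{P_4} = 1 \qquad (5)$$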
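The New CRT-I conversion for four moduli can be sketched in a few lines of Python. This is an illustrative software model only, not the adder-based hardware of this paper: it assumes the standard four-moduli form of New CRT-I from [12], computes the multiplicative inverses with Python's built-in modular inverse (`pow(a, -1, m)`, available from Python 3.8), and uses function names chosen here for illustration.

```python
def to_residues(x, moduli):
    """Forward conversion: the residue of x with respect to each modulus."""
    return tuple(x % p for p in moduli)


def new_crt_1(residues, moduli):
    """Reverse conversion by New CRT-I for a 4-moduli set (P1, P2, P3, P4)."""
    x1, x2, x3, x4 = residues
    p1, p2, p3, p4 = moduli
    # Multiplicative inverses k1, k2, k3 of the New CRT-I formulation.
    k1 = pow(p1, -1, p2 * p3 * p4)
    k2 = pow(p1 * p2, -1, p3 * p4)
    k3 = pow(p1 * p2 * p3, -1, p4)
    # Python's % always returns a non-negative result, so negative
    # differences such as (x2 - x1) need no special handling.
    inner = (k1 * (x2 - x1)
             + k2 * p2 * (x3 - x2)
             + k3 * p2 * p3 * (x4 - x3)) % (p2 * p3 * p4)
    return x1 + p1 * inner


if __name__ == "__main__":
    moduli = (16, 17, 5, 3)
    print(new_crt_1(to_residues(2684, moduli), moduli))
```

Note that the order in which the moduli are assigned to $P_1, \ldots, P_4$ matters for the intermediate inverses but not for the final result.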

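New Chinese Remainder Theorem 2: By New CRT-II [12], [25], with the 4-moduli set $\{P_1, P_2, P_3, P_4\}$, the number $X$ can be calculated from its corresponding residues $(x_1, x_2, x_3, x_4)$ using the following equations:

$$X = Z + P_1 P_2 \left| k_1 (Y - Z) \right|_{P_3 P_4} \qquad (6)$$

$$Z = x_1 + P_1 \left| k_2 (x_2 - x_1) \right|_{P_2} \qquad (7)$$

$$Y = x_3 + P_3 \left| k_3 (x_4 - x_3) \right|_{P_4} \qquad (8)$$

where

$$|k_1 P_1 P_2|_{P_3 P_4} = 1 \qquad (9)$$

$$|k_2 P_1|_{P_2} = 1 \qquad (10)$$

$$|k_3 P_3|_{P_4} = 1 \qquad (11)$$

where $k_1$, $k_2$ and $k_3$ are the multiplicative inverses.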
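New CRT-II splits the conversion into two independent two-moduli conversions ($Z$ and $Y$) that are then merged. The sketch below models this in Python under the same assumptions as a plain software reference: the standard four-moduli form of New CRT-II from [12], inverses obtained with `pow(a, -1, m)` (Python 3.8 or later), and a function name that is hypothetical.

```python
def new_crt_2(residues, moduli):
    """Reverse conversion by New CRT-II for a 4-moduli set (P1, P2, P3, P4)."""
    x1, x2, x3, x4 = residues
    p1, p2, p3, p4 = moduli
    # Multiplicative inverses k1, k2, k3 of the New CRT-II formulation.
    k1 = pow(p1 * p2, -1, p3 * p4)
    k2 = pow(p1, -1, p2)
    k3 = pow(p3, -1, p4)
    # Z is the value of X modulo P1*P2; Y is the value of X modulo P3*P4.
    z = x1 + p1 * ((k2 * (x2 - x1)) % p2)
    y = x3 + p3 * ((k3 * (x4 - x3)) % p4)
    # Merge the two partial results.
    return z + p1 * p2 * ((k1 * (y - z)) % (p3 * p4))


if __name__ == "__main__":
    print(new_crt_2((3, 17, 4, 2), (4, 31, 5, 3)))
```

Because $Z$ and $Y$ do not depend on each other, they can be evaluated in parallel, which is the property a hardware realization can exploit.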

III. REVERSE CONVERTER DESIGNS

In this section, New CRT-II and New CRT-I are applied to derive efficient reverse conversion algorithms for the new moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$, respectively. Also, adder-based hardware implementations of the conversion algorithms are presented.

A. Converter for the Moduli Set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$

Conversion Algorithm: The New CRT-II is employed for designing an efficient reverse conversion algorithm. The following theorems, propositions and properties are needed for the derivation of the conversion algorithm. First, we must prove that this moduli set consists of pairwise relatively prime moduli.

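Theorem 1: The moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ includes pairwise relatively prime numbers.

Proof: It is well-known that the prominent numbers $2^{n}-1$, $2^{n}$ and $2^{n}+1$ are pairwise relatively prime moduli. Therefore, we should prove that these numbers are relatively prime to the number $2^{2n+1}-1$. So, based on Euclid's Theorem, we have

$$\gcd(a, b) = \gcd(b, |a|_b) \qquad (12)$$

Hence, we have

$$\gcd(2^{2n+1}-1, 2^{n}) = \gcd(2^{n}, 2^{n}-1) = 1 \qquad (13)$$

$$\gcd(2^{2n+1}-1, 2^{n}+1) = \gcd(2^{n}+1, 1) = 1 \qquad (14)$$

$$\gcd(2^{2n+1}-1, 2^{n}-1) = \gcd(2^{n}-1, 1) = 1 \qquad (15)$$

Since the greatest common divisors are one, the numbers $2^{n}$, $2^{n}+1$ and $2^{n}-1$ are relatively prime to the modulo $2^{2n+1}-1$.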
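As a concrete instance of this argument, take $n = 2$ (an illustrative choice), so that the moduli are 3, 4, 5 and 31. One remainder step of Euclid's algorithm per pair already settles each case:

$$\gcd(31, 4) = \gcd(4, 3) = 1, \qquad \gcd(31, 5) = \gcd(5, 1) = 1, \qquad \gcd(31, 3) = \gcd(3, 1) = 1$$

The remainders 3, 1 and 1 are the values of $|2^{2n+1}-1|$ taken modulo $2^{n}$, $2^{n}+1$ and $2^{n}-1$, respectively, for $n = 2$.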

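Proposition 1: The multiplicative inverse of $2^{n}(2^{2n+1}-1)$ modulo $2^{2n}-1$ is $k_1 = 2^{n}$.

Proof: We know that

$$|2^{n} \times 2^{n}|_{2^{2n}-1} = 1 \qquad (16)$$

$$|2^{2n+1}-1|_{2^{2n}-1} = 1 \qquad (17)$$

Therefore, by letting the values $P_1 P_2 = 2^{n}(2^{2n+1}-1)$ and $P_3 P_4 = 2^{2n}-1$ in (9), we have

$$|k_1 \times 2^{n}(2^{2n+1}-1)|_{2^{2n}-1} = 1 \Rightarrow k_1 = 2^{n} \qquad (18)$$

Proposition 2: The multiplicative inverse of $2^{n}$ modulo $2^{2n+1}-1$ is $k_2 = 2^{n+1}$.

Proof: By considering (10), it is clear that

$$|2^{n+1} \times 2^{n}|_{2^{2n+1}-1} = 1 \Rightarrow k_2 = 2^{n+1} \qquad (19)$$

Proposition 3: The multiplicative inverse of $2^{n}+1$ modulo $2^{n}-1$ is $k_3 = 2^{n-1}$.

Proof: Since $|2^{n}+1|_{2^{n}-1} = 2$, we have

$$|2^{n-1} \times 2|_{2^{n}-1} = 1 \Rightarrow k_3 = 2^{n-1} \qquad (20)$$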
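The three multiplicative inverses can be checked numerically for a range of $n$. The sketch below assumes the moduli assignment $P_1 = 2^{n}$, $P_2 = 2^{2n+1}-1$, $P_3 = 2^{n}+1$, $P_4 = 2^{n}-1$ and the power-of-two inverses $k_1 = 2^{n}$, $k_2 = 2^{n+1}$, $k_3 = 2^{n-1}$; the function name is illustrative. The check starts at $n = 2$ because for $n = 1$ the modulus $2^{n}-1$ equals 1 and the residue condition is degenerate.

```python
def check_inverses_set_1(n):
    """Verify k1 = 2^n, k2 = 2^(n+1), k3 = 2^(n-1) against the New CRT-II conditions."""
    p1, p2, p3, p4 = 2**n, 2**(2 * n + 1) - 1, 2**n + 1, 2**n - 1
    k1, k2, k3 = 2**n, 2**(n + 1), 2**(n - 1)
    return (
        (k1 * p1 * p2) % (p3 * p4) == 1   # k1 inverts P1*P2 modulo P3*P4
        and (k2 * p1) % p2 == 1           # k2 inverts P1 modulo P2
        and (k3 * p3) % p4 == 1           # k3 inverts P3 modulo P4
    )


if __name__ == "__main__":
    print(all(check_inverses_set_1(n) for n in range(2, 33)))
```

Since every inverse is a power of two, multiplying by it modulo a number of the form $2^{p}-1$ reduces to a bit rotation, which is what makes a multiplier-free converter possible.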

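Theorem 2: In the RNS based on the 4-moduli set $\{2^{n}, 2^{2n+1}-1, 2^{n}+1, 2^{n}-1\}$, the number $X$ can be calculated from its corresponding residues $(x_1, x_2, x_3, x_4)$ by

$$X = Z + 2^{n}(2^{2n+1}-1) \left| 2^{n}(Y - Z) \right|_{2^{2n}-1} \qquad (21)$$

where

$$Z = x_1 + 2^{n} \left| 2^{n+1}(x_2 - x_1) \right|_{2^{2n+1}-1} \qquad (22)$$

$$Y = x_3 + (2^{n}+1) \left| 2^{n-1}(x_4 - x_3) \right|_{2^{n}-1} \qquad (23)$$

Proof: By letting $P_1 = 2^{n}$, $P_2 = 2^{2n+1}-1$, $P_3 = 2^{n}+1$, $P_4 = 2^{n}-1$ and the values of $k_1$, $k_2$ and $k_3$ from Propositions 1, 2 and 3 into (6)–(8), (21)–(23) are obtained.

Theorem 2 is used for designing an efficient reverse converter. But before implementing it, according to the following properties, (21)–(23) can be simplified to decrease the hardware complexity.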
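Before any bit-level simplification, the closed form of Theorem 2 can be evaluated directly in software as a reference model. The following Python sketch assumes the residue order $(x_1, x_2, x_3, x_4)$ for the moduli $(2^{n}, 2^{2n+1}-1, 2^{n}+1, 2^{n}-1)$ and $n \ge 2$; the function name is illustrative, and the code uses ordinary integer arithmetic rather than the adder-based datapath described in the paper.

```python
def reverse_convert_set_1(x1, x2, x3, x4, n):
    """Closed-form New CRT-II conversion for (2^n, 2^(2n+1)-1, 2^n+1, 2^n-1), n >= 2."""
    m2 = 2**(2 * n + 1) - 1          # P2
    m34 = 2**(2 * n) - 1             # P3 * P4
    # Z combines the residues of 2^n and 2^(2n+1)-1 with k2 = 2^(n+1).
    z = x1 + 2**n * ((2**(n + 1) * (x2 - x1)) % m2)
    # Y combines the residues of 2^n+1 and 2^n-1 with k3 = 2^(n-1).
    y = x3 + (2**n + 1) * ((2**(n - 1) * (x4 - x3)) % (2**n - 1))
    # Final merge with k1 = 2^n.
    return z + 2**n * m2 * ((2**n * (y - z)) % m34)


if __name__ == "__main__":
    print(reverse_convert_set_1(3, 17, 4, 2, n=2))
```

A model like this is useful as a golden reference when simulating a hardware description of the converter.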

Property 1: The residue of a negative residue number $(-v)$ in modulo $(2^{p}-1)$ is the one's complement of $v$, where $0 \le v < 2^{p}-1$ [22].

Property 2: The multiplication of a residue number $v$ by $2^{k}$ in modulo $(2^{p}-1)$ is carried out by a $k$-bit circular left shift, where $k$ is a natural number [22].
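Both properties are standard facts about arithmetic modulo $2^{p}-1$ and can be demonstrated with bit operations. In the Python sketch below, the symbols `v`, `p` and `k` and the function names are illustrative. The all-ones pattern is a second representation of zero in this arithmetic, so the comparisons are made modulo $2^{p}-1$.

```python
def ones_complement(v, p):
    """Invert all p bits of v (Property 1: this represents -v modulo 2^p - 1)."""
    return v ^ ((1 << p) - 1)


def rotate_left(v, k, p):
    """k-bit circular left shift of a p-bit word (Property 2: v * 2^k modulo 2^p - 1)."""
    k %= p
    mask = (1 << p) - 1
    return ((v << k) | (v >> (p - k))) & mask


if __name__ == "__main__":
    p = 4
    m = (1 << p) - 1
    for v in range(m):
        assert ones_complement(v, p) % m == (-v) % m
        for k in range(2 * p):
            assert rotate_left(v, k, p) % m == (v * 2**k) % m
    print("both properties hold for p =", p)
```

In hardware neither operation needs an adder: the first is a row of NOT gates and the second is pure wiring.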

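Consider the 4-moduli set $\{2^{n}, 2^{2n+1}-1, 2^{n}+1, 2^{n}-1\}$ and let the corresponding residues of the integer number $X$ be $(x_1, x_2, x_3, x_4)$. The residues have bit-level representation as

$$x_1 = (x_{1,n-1} \cdots x_{1,1} x_{1,0})_2 \qquad (24)$$

$$x_2 = (x_{2,2n} \cdots x_{2,1} x_{2,0})_2 \qquad (25)$$

$$x_3 = (x_{3,n} \cdots x_{3,1} x_{3,0})_2 \qquad (26)$$

$$x_4 = (x_{4,n-1} \cdots x_{4,1} x_{4,0})_2 \qquad (27)$$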

First, we simplify (22) as follows:

(28)

where

(29)

(30)

(31)

Next, (23) can be rewritten as

(32)

where

(33)

(34)


(35)

Since $x_3$ is a number that is smaller than $2^{n}+1$, we can consider two cases for $x_3$: first, when $x_3$ is smaller than $2^{n}$, and second, when $x_3$ is equal to $2^{n}$. If $x_3 < 2^{n}$, we have

(36)

Else if $x_3 = 2^{n}$, the following binary vector can be obtained as

(37)

Therefore, is calculated as

(38)

Finally, (21) can be simplified as below

(39)

where

(40)

The above equation can be evaluated as follows:

(41)

(42)

(43)

(44)

Therefore, we have

(45)

(46)

Since the most significant bits of the vector in (46) and the least significant bit of the vector in (43) are 1's, we can use the following vectors instead:

(47)

(48)

We know that,

(49)

Consequently, in (39) can be computed by

(50)

Finally, by substituting (28) and (50) in (39) we have

(51)

where

(52)

(53)


Also, since is an -bit number, in (51) can be obtained as

(54)

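Example 1: Consider the moduli set $\{2^{n}, 2^{2n+1}-1, 2^{n}+1, 2^{n}-1\}$ where $n = 2$. The weighted number $X$ can be calculated from its RNS representation (3, 17, 4, 2) as follows. For $n = 2$ the moduli set is $\{4, 31, 5, 3\}$, and the residues have binary representation as below

$$x_1 = 3 = (11)_2, \quad x_2 = 17 = (10001)_2, \quad x_3 = 4 = (100)_2, \quad x_4 = 2 = (10)_2$$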

By letting the values of residues and in (30), (31), (29),(34), (37), (38) and (33) we have

Then, the required values should be substituted in (41), (42),(45), (48) and (50). So

Finally, by letting values of and in (53), (52) and (54),can be computed as follows:

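To verify the result, we have

$$|1319|_{4} = 3, \quad |1319|_{31} = 17, \quad |1319|_{5} = 4, \quad |1319|_{3} = 2$$

Therefore, the weighted number 1319 in the RNS based on the 4-moduli set $\{4, 31, 5, 3\}$ has representation as (3, 17, 4, 2).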

Fig. 1. Converter for moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$.

Hardware Implementation: The hardware architecture of the proposed reverse converter for the 4-moduli set $\{2^{n}, 2^{2n+1}-1, 2^{n}+1, 2^{n}-1\}$ with corresponding residues $(x_1, x_2, x_3, x_4)$ is shown in Fig. 1. The implementation is based on (29), (33), (50), (52) and (54). First, the operand preparation unit 1 (OPU 1) prepares the required operands (30), (31), (34) and (38); these preparations rely on simply manipulating the routing of the bits of the residues. NOT gates are also needed for performing the inversions in (31) and (36). In addition, a 2×1 multiplexer (MUX) is used for obtaining (38).

Implementation of (29) and (33) requires one modulo $(2^{2n+1}-1)$ adder and one modulo $(2^{n}-1)$ adder, respectively. These modulo adders can be implemented with different methods [29], [30]. In this paper, we considered the carry-propagate adder (CPA) with end-around carry (EAC) [29]. The delay of a CPA with EAC is twice the delay of a regular CPA, while it has the same hardware complexity. Hence, the modulo $(2^{2n+1}-1)$ and $(2^{n}-1)$ adders rely on $(2n+1)$-bit and $n$-bit CPAs with EAC, respectively. Also, since (31) contains constant bits of 1's, the corresponding full adders (FAs) in the modulo $(2^{2n+1}-1)$ adder are reduced to pairs of XNOR/OR gates. Realization of (50) relies on a 4-operand modulo $(2^{2n}-1)$ adder [28], and it can be implemented by two $2n$-bit carry save adders (CSAs) with EAC followed by a modulo $(2^{2n}-1)$ adder. The OPU 2 requires NOT gates for preparing (48) and (45). Also, since (41) contains constant bits of 0's and (48) contains constant bits of 1's, the corresponding FAs in CSA1 and CSA2 are reduced to pairs of XOR/AND and XNOR/OR gates, respectively. The modulo $(2^{2n}-1)$ adder is implemented by a $2n$-bit CPA with EAC. Finally, implementation of (52) requires a regular binary subtracter, which can be realized by NOT gates, FAs and pairs of XNOR/OR gates: extending one operand to the full width and inverting it yields constant bits with the value of 1, so the FAs at those positions are reduced to pairs of XNOR/OR gates. It should be noted that the realization of (53) and (54) relies on simple concatenation without the use of any computational hardware. Area and delay specifications of each part of the converter are shown in Table I.

TABLE I. CHARACTERIZATION OF EACH PART OF THE PROPOSED REVERSE CONVERTER FOR THE MODULI SET $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$
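The adder structure described above (carry save adders with end-around carry, followed by a carry-propagate adder with end-around carry) can be modelled at word level. The Python sketch below is a behavioural illustration only, with hypothetical function names; it says nothing about gate counts or delay. It works modulo $2^{k}-1$ on $k$-bit words, where the all-ones word is an alternative encoding of zero.

```python
def add_mod_eac(a, b, k):
    """CPA with end-around carry: add two k-bit words modulo 2^k - 1.

    The carry out of bit k-1 has weight 2^k, which is congruent to 1,
    so it is added back in at the least significant position.
    """
    mask = (1 << k) - 1
    s = a + b
    return (s & mask) + (s >> k)


def csa_eac(a, b, c, k):
    """CSA with end-around carry: reduce three k-bit words to a sum and a carry word."""
    mask = (1 << k) - 1
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    # The carry word has weight 2, i.e. a 1-bit circular left shift modulo 2^k - 1.
    carry = ((carry << 1) | (carry >> (k - 1))) & mask
    return s, carry


def multi_operand_add_mod(operands, k):
    """Sum two or more k-bit operands modulo 2^k - 1 using CSAs, then one CPA."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa_eac(ops[0], ops[1], ops[2], k)
        ops = [s, c] + ops[3:]
    # The final reduction maps the all-ones encoding of zero to 0.
    return add_mod_eac(ops[0], ops[1], k) % ((1 << k) - 1)


if __name__ == "__main__":
    print(multi_operand_add_mod([7, 15, 9, 3], 4))
```

With four operands the loop runs twice, which corresponds to the two CSA levels followed by one modulo adder mentioned in the text; five operands need three CSA levels.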

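B. Converter for the Moduli Set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$

Conversion Algorithm: Using New CRT-I, the reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ is obtained as follows.

Theorem 3: The moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ includes pairwise relatively prime numbers.

Proof: As proved in [21], the numbers $2^{n}-1$, $2^{n}+1$ and $2^{2n}+1$ are pairwise relatively prime. Therefore, it should be proved that these numbers are relatively prime to the number $2^{2n}$. Hence, based on Euclid's Theorem, we have

$$\gcd(2^{2n}, 2^{n}-1) = \gcd(2^{n}-1, 1) = 1 \qquad (55)$$

$$\gcd(2^{2n}, 2^{n}+1) = \gcd(2^{n}+1, 1) = 1 \qquad (56)$$

$$\gcd(2^{2n}+1, 2^{2n}) = \gcd(2^{2n}, 1) = 1 \qquad (57)$$

Therefore, it can be seen that the numbers $2^{n}-1$, $2^{n}+1$, $2^{2n}$ and $2^{2n}+1$ are pairwise relatively prime.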

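Proposition 4: The multiplicative inverse of the number $2^{2n}$ modulo $2^{4n}-1$ is $k_1 = 2^{2n}$.

Proof: By substituting $P_1 = 2^{2n}$ and $P_2 P_3 P_4 = 2^{4n}-1$ into (3), we have

$$|k_1 \times 2^{2n}|_{2^{4n}-1} = 1 \Rightarrow k_1 = 2^{2n} \qquad (58)$$

Proposition 5: The multiplicative inverse of $2^{2n}(2^{2n}+1)$ modulo $2^{2n}-1$ is $k_2 = 2^{2n-1}$.

Proof: Since $|2^{2n}(2^{2n}+1)|_{2^{2n}-1} = 2$, from (4) it is obvious that

$$|2^{2n-1} \times 2|_{2^{2n}-1} = 1 \Rightarrow k_2 = 2^{2n-1} \qquad (59)$$

Proposition 6: The multiplicative inverse of $2^{2n}(2^{2n}+1)(2^{n}+1)$ modulo $2^{n}-1$ is $k_3 = 2^{n-2}$.

Proof: We know that

$$|2^{2n}(2^{2n}+1)(2^{n}+1)|_{2^{n}-1} = 4 \qquad (60)$$

Therefore

$$|2^{n-2} \times 4|_{2^{n}-1} = 1 \Rightarrow k_3 = 2^{n-2} \qquad (61)$$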

Theorem 4: For the moduli set $\{2^{2n}, 2^{2n}+1, 2^{n}+1, 2^{n}-1\}$, the weighted binary number $X$ can be calculated from its corresponding residues $(x_1, x_2, x_3, x_4)$ by (62).

$$X = x_1 + 2^{2n} \left| 2^{2n}(x_2 - x_1) + 2^{2n-1}(2^{2n}+1)(x_3 - x_2) + 2^{n-2}(2^{2n}+1)(2^{n}+1)(x_4 - x_3) \right|_{2^{4n}-1} \qquad (62)$$

Proof: By substituting $P_1 = 2^{2n}$, $P_2 = 2^{2n}+1$, $P_3 = 2^{n}+1$ and $P_4 = 2^{n}-1$ and the values of the multiplicative inverses from Propositions 4, 5 and 6 into (2), (62) is obtained.

In a similar manner as before, Theorem 4 can be simplified by employing Properties 1 and 2, for deriving a high-performance hardware design.
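The closed form of Theorem 4 can likewise be evaluated directly as a software reference before the bit-level simplification. The Python sketch below assumes the residue order $(x_1, x_2, x_3, x_4)$ for the moduli $(2^{2n}, 2^{2n}+1, 2^{n}+1, 2^{n}-1)$, the inverses $2^{2n}$, $2^{2n-1}$ and $2^{n-2}$, and $n \ge 2$; the function name is illustrative.

```python
def reverse_convert_set_2(x1, x2, x3, x4, n):
    """Closed-form New CRT-I conversion for (2^(2n), 2^(2n)+1, 2^n+1, 2^n-1), n >= 2."""
    m = 2**(4 * n) - 1                      # P2 * P3 * P4
    p2 = 2**(2 * n) + 1
    p3 = 2**n + 1
    inner = (2**(2 * n) * (x2 - x1)                   # k1 * (x2 - x1)
             + 2**(2 * n - 1) * p2 * (x3 - x2)        # k2 * P2 * (x3 - x2)
             + 2**(n - 2) * p2 * p3 * (x4 - x3))      # k3 * P2 * P3 * (x4 - x3)
    # Multiplying by P1 = 2^(2n) and adding x1 < 2^(2n) amounts to
    # concatenating the reduced value with the bits of x1.
    return x1 + 2**(2 * n) * (inner % m)


if __name__ == "__main__":
    print(reverse_convert_set_2(12, 15, 4, 2, n=2))
```

The whole conversion is a single multi-operand sum modulo $2^{4n}-1$ followed by a concatenation, which is why this moduli set is described as conversion friendly.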

The (62) can be rewritten as

(63)

where

(64)

(65)

(66)

The above equation can be separated in two parts as

(67)

where

(68)

(69)

Now, for we have

(70)

This equation can be calculated by

(71)

where

(72)

(73)

Finally, can be computed as

(74)

Since (65), (69) and (73) have some constant bits, we can use the following vectors instead:

(75)

(76)


(77)

It is clear that,

(78)

Hence, (64) can be calculated by

(79)

Therefore, six summands are reduced to five, and this simplification results in removing one $4n$-bit CSA from the hardware architecture. Finally, $X$ in (63) is obtained by

(80)

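Example 2: In the RNS with moduli set $\{2^{2n}, 2^{2n}+1, 2^{n}+1, 2^{n}-1\}$ where $n = 2$, the RNS number (12, 15, 4, 2) can be converted into its equivalent weighted number as below. For $n = 2$ the moduli set is $\{16, 17, 5, 3\}$, and the residues have binary representation as

$$x_1 = 12 = (1100)_2, \quad x_2 = 15 = (01111)_2, \quad x_3 = 4 = (100)_2, \quad x_4 = 2 = (10)_2$$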

Substituting the value of residues in (75), (68), (74), (72), (77)and (79), we have

Then, can be simply computed by letting values of andin (80) as below

Verification can be done as

$$|2684|_{16} = 12, \quad |2684|_{17} = 15, \quad |2684|_{5} = 4, \quad |2684|_{3} = 2$$

So, the weighted number 2684 in the RNS based on the moduli set $\{16, 17, 5, 3\}$ has representation as (12, 15, 4, 2).

Fig. 2. Converter for moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$.

TABLE II. CHARACTERIZATION OF EACH PART OF THE PROPOSED REVERSE CONVERTER FOR THE MODULI SET $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$

Hardware Implementation: Based on (79) and (80), the hardware architecture of the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ consists of only one 5-operand modulo $(2^{4n}-1)$ adder [28], as depicted in Fig. 2. A description of the different parts of the reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ is presented in Table II.

IV. PERFORMANCE EVALUATION

The reverse converters presented in this paper are designed especially for the new moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$. The dynamic ranges of these moduli sets are $5n$ and $6n$ bits, respectively. Hence, to verify the performance of these reverse converters, they have to be compared to other reverse converters for moduli sets with similar DR. Such moduli sets are $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n}+1\}$ [21], $\{2^{n}, 2^{2n}-1, 2^{2n}+1\}$ [22], the moduli set of [23], $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{n+1}-1, 2^{n-1}-1\}$ [24] and $\{2^{n}-1, 2^{n}+1, 2^{2n}-2, 2^{2n+1}-3\}$ [25]. The major drawbacks of these moduli sets were investigated in the introduction. Table III shows the hardware requirements and conversion delays of the reverse converters for these moduli sets as well as our designs. For a fair comparison, CPAs with EAC are considered for the implementation of the modulo adders of all converters. The hardware architecture of [24] mainly consists of five modulo adders of three different kinds, two regular subtracters, one binary adder, and some FAs. As evaluated in [20], the total delay of the converter of [24] is expressed in terms of the delay of an FA and the number of levels in a CSA tree. Also, the reverse converter of [25] has an adder-based architecture, and its critical delay path includes one modulo adder, one 5-operand modulo adder and one 5-operand regular CPA. The first of these modulo adders is implemented in [25] by using two CPAs. Moreover, the 5-operand modulo adder used in [25] is based on the method of [29], and consists of three CSAs with EAC followed by a CPA with EAC. The authors of [25] adopt an optimistic assumption from [29] for the delay of a CPA with EAC when reporting the total delay of the 5-operand modulo adder. But, as indicated in [29], the realistic delay of a CPA with EAC is larger. Finally, the 5-operand regular CPA of [25] involves three CSAs and one regular CPA. The total delay of [25] listed in Table III is obtained by considering these points.

In order to perform an accurate comparison, all the converters were described in VHDL and simulated with ISE v10.1. Next, the converter circuits were implemented using Cadence. The TSMC 65 nm technology was used, and the power supply and frequency were set to 0.9 V and 50 MHz, respectively. The area constraint optimization with 10 iterations was performed. Tables IV and V present the resulting area and delay versus $n$. Moreover, some samples of the layouts of the reverse converters are shown in Figs. 3–9. It should be noted that some criteria should be considered for selecting the values of $n$. The proposed converters, as well as the converters of [21], [22] and [25], can be implemented for every value of $n$. But the converter of [24] has been designed only for particular values of $n$. Also, the converter of [23] can be implemented only for odd values of $n$. Therefore, all the converters except that of [23] have been implemented for the same values of $n$, and the converter of [23] has been implemented for a separate set of values.

TABLE III. HARDWARE REQUIREMENTS AND CONVERSION DELAYS OF THE DIFFERENT REVERSE CONVERTERS

TABLE IV. AREA AND DELAY COMPARISONS BETWEEN THE PROPOSED REVERSE CONVERTERS AND RELATED WORKS

TABLE V. AREA AND DELAY COMPARISONS BETWEEN THE OTHER REVERSE CONVERTERS FOR 5N-BIT DR MODULI SETS

For a fair comparison between the performances of the reverse converters for different moduli sets, the moduli sets should provide the same DR, and they must also rely on the same arithmetic unit speed. Among the $5n$-bit DR moduli sets, the proposed moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and the moduli set of [24] have the same speed, while the other $5n$-bit DR moduli sets rely on arithmetic units with longer delay. Hence, the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ can be directly compared to the converter of [24], while for comparing with the converters for other $5n$-bit DR moduli sets, the whole system must be considered. Due to this, the implementation results for the proposed converters as well as the converters of [24] and [25] are noted in Table IV, and the results of the converters for other $5n$-bit DR moduli sets are listed in Table V.

As shown in Table III, the proposed reverse converters for the moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ have better performance in terms of area and delay than those of [24] and [25], respectively. The implementation results of Table IV are also consistent with the formulas given in Table III. It can be seen from Table IV that the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ is faster and requires less area than the converter of [24]. In particular, the difference in area increases for larger $n$. Also, the results of Table IV show that the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ leads to a significant reduction in delay and area compared to the converter of [25]. Based on the formulas of Table III, among the converters for the other $5n$-bit DR moduli sets, the reverse converter of the moduli set $\{2^{n}, 2^{2n}-1, 2^{2n}+1\}$ [22] has the best area and delay. The implementation results of Table V confirm this improvement. Finally, while the proposed reverse converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ supports a larger DR than the reverse converter of the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n}+1\}$, it has less area than the converter of [21]. In particular, while in the theoretical formulas of Table III our converter for the moduli set $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$ has the same speed as the converter of [21], the implementation results of Tables IV and V show that our converter has a smaller delay.

Figs. 3 and 4. Layouts of the proposed converters for the moduli sets $\{2^{n}-1, 2^{n}, 2^{n}+1, 2^{2n+1}-1\}$ and $\{2^{n}-1, 2^{n}+1, 2^{2n}, 2^{2n}+1\}$.

Fig. 5. Layout of the converter of [24].

Fig. 6. Layout of the converter of [25].

Fig. 7. Layout of the converter of [22].

Fig. 8. Layout of the converter of [21].

Fig. 9. Layout of the converter of [23].

It is essential to remark that another important issue in thedesign of RNS systems is the speed of the internal RNS arith-metic processing. The RNS systems based on -bit DR modulisets and

have

TABLE VICOMPARISON OF THE SPEED OF THE DIFFERENT MODULI SETS

an arithmetic unit with lower speed than the -bit DR modulisets and

. To verify this, we used the method whichwas introduced in [22] and [24] for comparing the speed of thedifferent RNS moduli sets. It is the magnitude of the largestmodulo that dictates the speed of arithmetic operations; how-ever, speed and cost also depend on the moduli chosen [2]. Thus,the modulo determines the overall speed of the RNS sys-tems based on the moduli setsand . Also, the overallspeed of the arithmetic unit of RNS systems based on the modulisetsand is restricted to the speed of themodulo . Finally, in the moduli sets

and, the critical moduli are

and , respectively. The unit gate delays of the par-allel-prefix modular adders of [31] and [32] were used for es-timating the addition operation delay in moduli of the forms

and , respectively. It should be noted that thereis no modular adder specially designed for modulo of the forms

. Hence, we must use the generic modular adders for per-forming modulo addition. The unit gate delays of theadders of [31] and [32] are and ,respectively. Moreover, the method of [33] considered to com-pute the delay of addition in modulo . There-fore, the unit gate delays are obtained and listed in Table VI.It should be noted that by using the addition unit gate delaysof [30] and [31], the delay of moduli andare obtained as and , respec-tively. Hence, for the moduli set ,the is the critical modulo. It is clear from Table VI thatthe moduli sets and

are faster than the modulisets and

. In par-ticular, it can be seen that while the moduli set

has five modulus, it has same speed asour proposed four-moduli set .Also, the moduli set is fasterthan moduli set . There-fore, it can be concluded that the proposed moduli sets

and canresult in better tradeoffs between the RNS arithmetic unit delayand reverse converter performance.


V. CONCLUSION

This paper introduces two new large dynamic range 4-moduli sets and as alternatives to the recently suggested moduli sets and , respectively. Also, efficient reverse converters for these moduli sets based on New CRTs are presented. The proposed converter architectures are memory-less and adder-based, and they can be efficiently pipelined. Comparison with the reverse converters for the latest introduced large dynamic range moduli sets has shown that the proposed converters have better performance in terms of conversion delay and hardware requirements. Furthermore, with the new proposed moduli sets, the internal RNS arithmetic circuits as well as the reverse converter can be implemented efficiently, resulting in higher overall performance of the RNS system.
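Reverse conversion based on New CRT-I can be illustrated in software. The following is a minimal sketch assuming the New CRT-I formulation of [12]. The function name `new_crt1` and the example moduli are hypothetical, and the code models only the arithmetic, not the adder-based hardware architecture.

```python
from math import prod


def new_crt1(residues, moduli):
    """Recover X from its residues, assuming pairwise coprime moduli.

    Sketch of New CRT-I (as formulated in [12]):
      X = x1 + P1 * | k1(x2 - x1) + k2*P2*(x3 - x2) + ... |  mod (P2*...*Pn)
    where k_i * (P1*...*P_i) = 1 (mod P_{i+1}*...*Pn).
    """
    rest = prod(moduli[1:])   # P2 * ... * Pn
    prefix = moduli[0]        # running product P1 * ... * P_i
    weight = 1                # running product P2 * ... * P_i
    total = 0
    for i in range(1, len(moduli)):
        tail = prod(moduli[i:])        # P_{i+1} * ... * Pn (1-indexed)
        k = pow(prefix, -1, tail)      # multiplicative inverse (Python 3.8+)
        total += k * weight * (residues[i] - residues[i - 1])
        prefix *= moduli[i]
        weight *= moduli[i]
    return residues[0] + moduli[0] * (total % rest)


if __name__ == "__main__":
    moduli = [3, 5, 16, 17]            # hypothetical example set
    x = 1234
    residues = [x % m for m in moduli]
    print(residues, "->", new_crt1(residues, moduli))
```

Only one modular reduction, by the product of all moduli except the first, is needed at the end.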

ACKNOWLEDGMENT

The authors would like to thank Dr. B. Yoberd for his literature contribution, and the anonymous reviewers for their valuable comments which have improved the quality of the paper.

REFERENCES

[1] T. Stouraitis and V. Paliouras, “Considering the alternatives in low-power design,” IEEE Circuits Devices, vol. 7, pp. 23–29, 2001.

[2] B. Parhami, Computer Arithmetic: Algorithms and Hardware Design. Oxford, U.K.: Oxford Univ. Press, 2000.

[3] P. V. A. Mohan, Residue Number Systems: Algorithms and Architectures. Norwell, MA: Kluwer, 2002.

[4] M. A. Soderstrand et al., Residue Number System Arithmetic: Modern Applications in Digital Signal Processing. Piscataway, NJ: IEEE Press, 1986.

[5] G. C. Cardarilli, A. Nannarelli, and M. Re, “Residue number system for low-power DSP applications,” in Proc. 41st Asilomar Conf. Signals, Syst., Comput., 2007, pp. 1412–1416.

[6] W. K. Jenkins and B. J. Leon, “The use of residue number systems in the design of finite impulse response digital filters,” IEEE Trans. Circuits Syst., vol. CAS-24, no. 4, pp. 191–201, Apr. 1977.

[7] R. Conway and J. Nelson, “Improved RNS FIR filter architectures,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 51, no. 1, pp. 26–28, Jan. 2004.

[8] W. Wang, M. N. S. Swamy, and M. O. Ahmad, “RNS application for digital image processing,” in Proc. 4th IEEE Int. Workshop System-on-Chip for Real Time Appl., 2004, pp. 77–80.

[9] V. T. Goh and M. U. Siddiqi, “Multiple error detection and correction based on redundant residue number systems,” IEEE Trans. Commun., vol. 56, no. 3, pp. 325–330, Mar. 2008.

[10] S. Timarchi and K. Navi, “Efficient class of redundant residue number system,” in Proc. IEEE Int. Symp. Intell. Signal Process., 2007, pp. 1–6.

[11] W. Wang, M. N. S. Swamy, and M. O. Ahmad, “Moduli selection in RNS for efficient VLSI implementation,” in Proc. IEEE Int. Symp. Circuits Syst., 2003, pp. 25–28.

[12] Y. Wang, “Residue-to-binary converters based on new Chinese remainder theorems,” IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 47, no. 3, pp. 197–205, Mar. 2000.

[13] Y. Wang, X. Song, M. Aboulhamid, and H. Shen, “Adder based residue to binary numbers converters for $(2^n - 1, 2^n, 2^n + 1)$,” IEEE Trans. Signal Process., vol. 50, no. 7, pp. 1772–1779, Jul. 2002.

[14] W. Wang, M. N. S. Swamy, M. O. Ahmad, and Y. Wang, “A high-speed residue-to-binary converter and a scheme of its VLSI implementation,” IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 47, no. 12, pp. 1576–1581, Dec. 2000.

[15] A. Hiasat and A. Sweidan, “Residue number system to binary converter for the moduli set $(2^{n-1}, 2^n - 1, 2^n + 1)$,” Elsevier J. Syst. Architect., vol. 49, pp. 53–58, 2003.

[16] P. V. A. Mohan, “RNS-to-binary converter for a new three-moduli set $\{2^{n+1} - 1, 2^n, 2^n - 1\}$,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 9, pp. 775–779, Sep. 2007.

[17] P. V. A. Mohan and A. B. Premkumar, “RNS-to-binary converters for two four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245–1254, Jun. 2007.

[18] M. Hosseinzadeh, A. S. Molahosseini, and K. Navi, “An improved reverse converter for the moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$,” IEICE Electron. Exp., vol. 5, no. 17, pp. 672–677, 2008.

[19] B. Cao, T. Srikanthan, and C. H. Chang, “Efficient reverse converters for the four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n-1} - 1\}$,” Proc. IEE Comput. Digit. Tech., vol. 152, pp. 687–696, 2005.

[20] P. V. A. Mohan, “New reverse converters for the moduli set $\{2^n - 3, 2^n - 1, 2^n + 1, 2^n + 3\}$,” Elsevier J. Electron. Commun. (AEU), vol. 62, no. 9, pp. 643–658, 2008.

[21] B. Cao, C. H. Chang, and T. Srikanthan, “An efficient reverse converter for the 4-moduli set $\{2^n - 1, 2^n, 2^n + 1, 2^{2n} + 1\}$ based on the new Chinese remainder theorem,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 50, no. 10, pp. 1296–1303, 2003.

[22] A. Hariri, K. Navi, and R. Rastegar, “A new high dynamic range moduli set with efficient reverse converter,” Elsevier J. Comput. Math. With Appl., vol. 55, no. 4, pp. 660–668, 2008.

[23] A. A. Hiasat, “VLSI implementation of new arithmetic residue to binary decoders,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 1, pp. 153–158, Jan. 2005.

[24] B. Cao, C. H. Chang, and T. Srikanthan, “A residue-to-binary converter for a new five-moduli set,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 5, pp. 1041–1049, May 2007.

[25] W. Zhang and P. Siy, “An efficient design of residue to binary converter for four moduli set $(2^n - 1, 2^n + 1, 2^{2n} - 2, 2^{2n+1} - 3)$ based on new CRT II,” Elsevier J. Inf. Sci., vol. 178, no. 1, pp. 264–279, 2008.

[26] F. J. Taylor, “Residue arithmetic: A tutorial with examples,” IEEE Computer, vol. 17, pp. 50–62, 1984.

[27] A. S. Molahosseini, K. Navi, O. Hashemipour, and A. Jalali, “An efficient architecture for designing reverse converters based on a general three-moduli set,” Elsevier J. Syst. Architect., vol. 54, no. 10, pp. 929–934, 2008.

[28] S. J. Piestrak, “Design of residue generators and multioperand modular adders using carry-save adders,” IEEE Trans. Comput., vol. 43, no. 1, pp. 68–77, Jan. 1994.

[29] S. J. Piestrak, “A high speed realization of a residue to binary converter,” IEEE Trans. Circuits Syst. II, Analog. Digit. Signal Process., vol. 42, no. 10, pp. 661–663, Oct. 1995.

[30] L. Kalampoukas, D. Nikolos, C. Efstathiou, H. T. Vergos, and J. Kalamatianos, “High-speed parallel-prefix modulo $2^n - 1$ adders,” IEEE Trans. Comput., vol. 49, no. 7, pp. 673–679, Jul. 2000.

[31] C. Efstathiou, H. T. Vergos, and D. Nikolos, “Fast parallel-prefix modulo $2^n + 1$ adder,” IEEE Trans. Comput., vol. 53, no. 9, pp. 1211–1216, Sep. 2004.

[32] R. A. Patel, M. Benaissa, N. Powell, and S. Boussakta, “Novel power-delay-area-efficient approach to generic modular addition,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, pp. 1279–1292, 2007.

[33] A. A. Hiasat, “High-speed and reduced area modular adder structures for RNS,” IEEE Trans. Comput., vol. 51, no. 1, pp. 84–89, Jan. 2002.

Amir Sabbagh Molahosseini was born in Kerman, Iran, in 1983. He received the B.Sc. degree from Shahid Bahonar University of Kerman, Iran, in 2005, and the M.Sc. degree (with highest honors) from Islamic Azad University (IAU), Science and Research Branch, Tehran, Iran, in 2007, both in computer engineering. He is currently working toward the Ph.D. degree in computer engineering at the Science and Research Branch of IAU, Tehran, Iran.

His research interests include VLSI design and computer arithmetic with emphasis on the residue number system.


Keivan Navi received the B.Sc. and M.Sc. degrees in computer hardware engineering from Beheshti University, Tehran, Iran, in 1987 and Sharif University of Technology, Tehran, Iran, in 1990, respectively. He also received the Ph.D. degree in computer architecture from Paris XI University, Paris, France, in 1995.

He is currently an Associate Professor in the Faculty of Electrical and Computer Engineering of Beheshti University. His research interests include VLSI design, single electron transistors (SET), carbon nanotubes, computer arithmetic, interconnection networks, and quantum computing. He has published over 30 ISI and research journal papers and over 70 IEEE, international, and national conference papers.

Chitra Dadkhah received the B.Sc. degree in software engineering from Shahid Beheshti University, Tehran, Iran, in 1990, and the M.Sc. degree in computer engineering (artificial intelligence) from the IASI Department, University of Paris 11, Orsay, France, in 1993. She also received the Ph.D. degree in computer engineering from the Department of Computer Engineering & IT, Amir Kabir University (Polytechnique), Tehran, Iran, in 2005.

She is an Assistant Professor in the Electrical Engineering Faculty of K. N. Toosi University of Technology, Tehran, Iran. Her research interests include soft computing, residue number systems, expert systems, and verification and validation of knowledge.

Omid Kavehei received the B.Sc. and M.Sc. degrees in computer architecture engineering from Arak Azad University, Iran, and Shahid Beheshti University, Tehran, Iran, in 2003 and 2005, respectively.

He is currently a Post-graduate Research Student at the University of Adelaide, Centre for High-Performance Integrated Technologies and Systems (CHiPTec). He has been pursuing teaching and research in the mainstreams of computer-aided design (CAD), numerical methods, robust statistical methods, computer arithmetic, and integrated circuits in the general area of Integrated VLSI Systems. His research interest has particularly focused on the field of robust design and optimization methods for deep sub-micron (DSM) VLSI circuits with emphasis on low-power and high-performance circuit design.

Somayeh Timarchi received the B.Sc. degree in computer hardware engineering from Shahid Beheshti University, Tehran, Iran, in 2002 and the M.Sc. degree from Sharif University of Technology, Tehran, Iran, in 2004. Currently, she is pursuing the Ph.D. degree in the Faculty of Electrical and Computer Engineering of Shahid Beheshti University.

She is currently a part-time Instructor in the Department of Electrical and Computer Engineering at Shahid Beheshti University. She was also a Visiting Researcher in the Laboratory of Electronics and Measurements in the Engineering Department of the University of Siena, Italy. Her research interests include computer arithmetic, residue and redundant number systems, VLSI design, and modeling and design of ultra low-power arithmetic circuits.