
CHAPTER-1 INTRODUCTION

The residue number system (RNS) has been employed for efficient, parallel, carry-free arithmetic (addition, subtraction, and multiplication) in DSP applications, because the computation in each residue channel can be done independently, without carry propagation between channels. A residue number system is defined by a set of N integer constants {$m_1, m_2, m_3, \ldots, m_N$}, referred to as the moduli. Let M be the least common multiple of all the $m_i$. Any non-negative integer X smaller than M can be represented in the defined residue number system as a set of N smaller integers {$x_1, x_2, x_3, \ldots, x_N$}, with $x_i = X$ modulo $m_i$ representing the residue class of X with respect to that modulus.
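As a minimal sketch of this definition, the following Python fragment computes the residues of an integer for the moduli set {$2^n - 1$, $2^n$, $2^n + 1$}. The helper names `lcm_all` and `to_rns`, the choice n = 4, and the sample value 1000 are illustrative assumptions, not part of the report.

```python
from functools import reduce
from math import gcd


def lcm_all(moduli):
    """Least common multiple of all moduli (the value M in the text)."""
    return reduce(lambda a, b: a * b // gcd(a, b), moduli, 1)


def to_rns(x, moduli):
    """Return the residues x_i = x mod m_i, one per modulus."""
    return tuple(x % m for m in moduli)


n = 4  # illustrative word length
moduli = (2**n - 1, 2**n, 2**n + 1)  # (15, 16, 17)
M = lcm_all(moduli)                   # 4080, since the moduli are pairwise coprime
residues = to_rns(1000, moduli)       # (10, 8, 14)
print(M, residues)
```

Each residue fits in at most n + 1 bits, which is why each channel can use a much narrower datapath than the full-width integer.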

RNS-based computations can achieve significant speedup over computations in the conventional binary system, and they are widely used in DSP processors, FIR filters, and communication components. Arithmetic modulo $2^n + 1$ is one of the most common RNS operations and is also used in pseudorandom number generation and cryptography. Modulo $2^n + 1$ addition is the most crucial step among the commonly used moduli sets, such as {$2^n - 1$, $2^n$, $2^n + 1$}, {$2^n - 1$, $2^n$, $2^n + 1$, $2^{2n} + 1$}, and {$2^n - 1$, $2^n$, $2^n + 1$, $2^{n+1} + 1$}.

Many methods have been reported to speed up modulo $2^n + 1$ addition. Depending on the input/output data representation, these methods fall into two categories: diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo $2^n + 1$ addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from and to the binary weighted system. The weighted representation, on the other hand, uses (n + 1)-bit operands, which avoids the translators but requires a larger area than the diminished-1 representation.

Earlier work discussed the general operations in modulo $2^n + 1$ addition, including both diminished-1 and weighted modulo addition, and proposed parallel-prefix adders for diminished-1 modulo $2^n + 1$ addition. To improve the area-time and time-power products, a circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition. These methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo $2^n + 1$ addition, so a zero-detection circuit is required to avoid incorrect computation, which increases the hardware cost.

A unified approach for weighted and diminished-1 modulo $2^n + 1$ addition has also been proposed. It makes the modulo $2^n + 1$ sum of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers, so that any diminished-1 adder can be applied to Y and U to perform the weighted modulo $2^n + 1$ addition. Another approach first uses translators to decrease the sum of two n-bit inputs A and B by 1 and then performs the weighted modulo $2^n + 1$ addition with diminished-1 adders. Note that the input range of this second architecture is smaller than that of the first ({0, ..., $2^n - 1$} versus {0, ..., $2^n$}).

In this work, we propose an improved area-efficient weighted modulo $2^n + 1$ adder design that uses diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant $2^n + 1$ from the sum of two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo $2^n + 1$ addition can then be performed by parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, together with simple correction schemes. Compared with the previous work, the area cost of the proposed adders is lower. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo $2^n + 1$ addition.


CHAPTER-2

AIM AND SCOPE OF PROJECT

In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo $2^n + 1$ addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from and to the binary weighted system. The weighted representation, on the other hand, uses (n + 1)-bit operands, which avoids the translators but requires a larger area than the diminished-1 representation.

To improve the area-time and time-power products, a circular carry selection scheme was used in earlier work to efficiently select the correct carry-in signals for the final modulo addition. These methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo $2^n + 1$ addition, so a zero-detection circuit is required to avoid incorrect computation, which increases the hardware cost. The Brent-Kung tree-based prefix structure uses less area than the Sklansky-style prefix structure.

The previously proposed unified approach for weighted and diminished-1 modulo $2^n + 1$ addition makes the modulo $2^n + 1$ sum of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be applied to Y and U to perform the weighted modulo $2^n + 1$ addition. Another earlier design first used translators to decrease the sum of two n-bit inputs A and B by 1 and then performed the weighted modulo $2^n + 1$ addition using diminished-1 adders.

In this design, we combine the two previous kinds of modulo $2^n + 1$ adder (diminished-1 and weighted) to reduce the area and improve the performance.

CHAPTER-3

EXISTING METHODOLOGY

3.1 THEORY

Residue arithmetic has been used in digital computing systems for many years. In particular, arithmetic modulo $2^n + 1$ plays an important role in a variety of applications. Modulo $2^n + 1$ arithmetic is most commonly met in the residue number system (RNS), an arithmetic system well suited to applications in which the operations are limited to addition, subtraction, and multiplication, a common case for several digital signal processing (DSP) algorithms. The RNS has been used for the design of digital signal processors, finite impulse response (FIR) filters, and communication components [16].

Three-moduli sets of the form {$2^n - 1$, $2^n$, $2^n + 1$} have received significant attention as the RNS base, mainly because of the existence of efficient residue-to-binary converters. Addition in such systems is performed using three channels: a modulo $2^n - 1$ adder (equivalently, a one's complement adder), a modulo $2^n$ adder, and a modulo $2^n + 1$ adder. The design of an efficient modulo $2^n + 1$ adder is therefore a vital task in RNS-based applications that include a modulus of this form.

Unfortunately, in an RNS that uses the three-moduli set {$2^n - 1$, $2^n$, $2^n + 1$}, the modulo $2^n + 1$ channel becomes the execution-rate bottleneck, since it has to deal with (n + 1)-bit operands while the other two channels operate on n-bit ones. The diminished-1 representation was introduced to alleviate this problem: each operand is represented decreased by one compared with its weighted representation, and results are derived in an alternative manner when one or both operands, or the result, are zero. The diminished-1 sum is then computed by a diminished-1 adder, which is an adder that increments the integer sum of its two operands whenever the carry flag of their integer addition is not set. A diminished-1 adder can be derived by connecting the inverted carry output of an integer adder back to its carry input. However, such solutions are inefficient due to the resulting oscillations. Therefore, a number of efficient architectures that do not suffer from oscillations have been proposed.

The need for handling zero operands and results separately, as well as the need for time- and hardware-consuming translators between the weighted and the diminished-1 representation at the inputs and outputs, makes the use of the diminished-1 representation efficient only when a large number of calculations take place before a new conversion is required. In all other cases, including all applications apart from RNS implementations, modulo adders with operands in weighted representation are more suitable. Efficient architectures for modulo adders with operands in weighted representation have also been proposed.

These two cases, namely modulo adders that operate on operands in the diminished-1 representation (hereafter called diminished-1 adders) and those that operate on operands in a weighted representation (hereafter called weighted adders), had long been considered distinct, and efficient architectures for them were studied independently. It has since been shown that the two alternatives can be unified. Given two (n + 1)-bit numbers A and B, the problem is to compute two n-bit numbers Y and U such that A + B is congruent to Y + U + 1 modulo $2^n + 1$. This problem has a constant-time solution, which enables every architecture that has been or will be proposed for diminished-1 addition to also be used for addition of operands in the weighted representation. The required unifying arithmetic operator is just a simplified carry-save adder (CSA) stage with an inverted end-around carry [12], [15].

Fig 3.1 CSA stage with inverted end-around carry


3.2 REVIEW OF TWO PREVIOUS WEIGHTED MODULO 2^n + 1 ADDERS

Given two (n + 1)-bit numbers A and B, where $0 \le A, B \le 2^n$, the diminished-1 values of A and B are denoted by $A^* = A - 1$ and $B^* = B - 1$, respectively. The diminished-1 sum $S^*$ can be computed by

\[ S^* = |S - 1|_{2^n+1} = |A + B - 1|_{2^n+1} = |A^* + B^*|_{2^n} + \bar{c}_{out} \tag{1} \]

where $|X|_Z$ denotes X modulo Z, and $\bar{c}_{out}$ denotes the inverted end-around carry of the modulo $2^n$ sum of the n-bit numbers $A^*$ and $B^*$.
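Equation (1) can be checked with a small behavioural model. The sketch below is an illustration only: the function name `diminished1_add` is hypothetical, it models arithmetic rather than gates, and it assumes both operands are non-zero in the weighted domain (A and B between 1 and $2^n$), since zero has no n-bit diminished-1 code.

```python
def diminished1_add(a_star, b_star, n):
    """Diminished-1 sum of two n-bit diminished-1 operands, following eq. (1).

    The carry out of the n-bit integer addition is inverted and added back
    (the inverted end-around carry).
    """
    mask = (1 << n) - 1
    total = a_star + b_star
    c_out = total >> n                 # 0 or 1 for n-bit operands
    return (total & mask) + (1 - c_out)


n = 4
modulus = (1 << n) + 1
for A in range(1, (1 << n) + 1):
    for B in range(1, (1 << n) + 1):
        expected = (A + B - 1) % modulus          # |A + B - 1| mod (2^n + 1)
        assert diminished1_add(A - 1, B - 1, n) == expected
```

When the weighted sum is 0 modulo $2^n + 1$, the model returns $2^n$, which does not fit in n bits; this matches the remark that a zero result needs separate handling in diminished-1 arithmetic.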

3.2.1 VERGOS AND EFSTATHIOU

This method first computes a sum that is congruent to A + B modulo $2^n + 1$, producing Y and U, and the final modulo sum is then performed by any diminished-1 modulo adder. Suppose A and B are two (n + 1)-bit input numbers, i.e., $A = a_n a_{n-1} \ldots a_0 = a_n \times 2^n + A_n$ and $B = b_n b_{n-1} \ldots b_0 = b_n \times 2^n + B_n$, where $0 \le A, B \le 2^n$, and $A_n$ and $B_n$ are two n-bit numbers; then

\[ |A + B|_{2^n+1} = \big|\,|A_n + B_n + D + 1|_{2^n+1} + 1\,\big|_{2^n+1} = |Y + U + 1|_{2^n+1}, \qquad D = 2^n - 4 + 2\bar{c}_{n+1} + \bar{s}_n \]

Here D is equivalent to the n-bit pattern $11\ldots1\,\bar{c}_{n+1}\,\bar{s}_n$, where $c_{n+1} = a_n \cdot b_n$ ($\cdot$ denotes the logic AND operation) and $s_n = a_n \oplus b_n$ ($\oplus$ denotes the logic EXCLUSIVE-OR operation); their complements $\bar{c}_{n+1}$ and $\bar{s}_n$ are the bits of D with binary weights $2^1$ and $2^0$, respectively. The complements are needed because $2^n \equiv -1 \pmod{2^n + 1}$, so the term $(a_n + b_n) \times 2^n$ contributes $-(2c_{n+1} + s_n)$ to the sum. The first step of this equation computes a modulo $2^n + 1$ carry-save addition of $A_n$, $B_n$, and D, giving the carry vector $Y = y_{n-2} y_{n-3} \ldots y_0 \bar{y}_{n-1}$, in which the carry out of the top position re-enters inverted at the bottom, and the sum vector $U = u_{n-1} u_{n-2} \ldots u_0$. The bits of D with binary weights $2^2$ through $2^{n-1}$ are all 1, which simplifies the adders at those positions: the carries and sums can be produced directly with an OR gate and an XNOR gate per bit position. At the bit positions of D with binary weights $2^1$ and $2^0$, the adders should be modified to accept the values $\bar{c}_{n+1}$ and $\bar{s}_n$, respectively.
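The congruence $|A + B|_{2^n+1} = |Y + U + 1|_{2^n+1}$ can be checked exhaustively for a small word length. The sketch below is a behavioural model, not the published circuit: the function name `ve_csa_stage` is hypothetical, and it assumes that the two low-order bits of D are the complements of $c_{n+1}$ and $s_n$ and that the carry out of the top bit re-enters, inverted, at bit 0 of Y. Under those assumptions the identity holds for every input pair in the allowed range.

```python
def ve_csa_stage(A, B, n):
    """Carry-save stage with inverted end-around carry (behavioural sketch).

    Returns two n-bit numbers (Y, U) such that Y + U + 1 is congruent to
    A + B modulo 2^n + 1, for 0 <= A, B <= 2^n and n >= 2.
    """
    mask = (1 << n) - 1
    a_n, b_n = A >> n, B >> n            # most significant bits
    A_n, B_n = A & mask, B & mask        # low n bits
    c = a_n & b_n
    s = a_n ^ b_n
    # D = 2^n - 4 + 2*NOT(c) + NOT(s): all ones except the two low bits
    D = (1 << n) - 4 + 2 * (1 - c) + (1 - s)
    U = A_n ^ B_n ^ D                                  # sum vector
    carries = (A_n & B_n) | (A_n & D) | (B_n & D)      # bit i = carry out of bit i
    y_top = (carries >> (n - 1)) & 1
    # Shift carries up one position; the top carry returns inverted at bit 0.
    Y = ((carries << 1) & mask) | (1 - y_top)
    return Y, U


n = 4
modulus = (1 << n) + 1
for A in range((1 << n) + 1):
    for B in range((1 << n) + 1):
        Y, U = ve_csa_stage(A, B, n)
        assert (Y + U + 1) % modulus == (A + B) % modulus
```

Because Y and U are both n bits wide, the remaining "+ 1" is exactly what a diminished-1 adder supplies, which is why the stage lets any diminished-1 adder finish a weighted addition.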


Fig 3.2 Architecture of Vergos and Efstathiou

3.2.2 VERGOS AND BAKALIS

This method decreases the sum of the two n-bit inputs A and B by 1 to produce the diminished-1 values A' and B'; the modulo $2^n$ sum of A' and B' can then be performed by any diminished-1 architecture, as follows:

\[ \big|\,|A + B|_{2^n+1}\big|_{2^n} = |A' + B'|_{2^n} + c'_{out} \]

The value $c'_{out}$ is the inverted end-around carry produced by A' + B', and the architecture is shown in Fig 3.3.

The Vergos and Efstathiou architecture makes use of a constant-time operator composed of a simplified carry-save adder stage, leading to efficient modulo $2^n + 1$ adders, and it can be applied in the design of area-efficient residue generators and multioperand modulo adders. However, the value D that it combines with the inputs A and B is not a constant. For the Vergos and Bakalis architecture, the way to implement the translator that decreases the sum of the two inputs by 1 was not described, and the range of its inputs A and B is smaller than in the Vergos and Efstathiou design ({0, ..., $2^n - 1$} versus {0, ..., $2^n$}) [1].


Fig 3.3 Architecture of Vergos and Bakalis

3.3 DIMINISHED-1 ADDER

A diminished-1 adder can be used for the modulo $2^n + 1$ addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions that are based on binary adders and/or weighted modulo $2^n + 1$ adders in both area and delay. The diminished-1 adder used here is a Sklansky-style diminished-1 adder.

The Sklansky adder is shown in Fig 3.4. The Sklansky-style parallel-prefix operation requires N/2 prefix operations at each stage of the tree. Since all operations at a given stage of the tree are completely independent, they can be run in parallel, which is what makes this technique attractive for parallelizing associative functions. The Sklansky-style structure uses more area than the Brent-Kung tree prefix structure [1].
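To make the "N/2 operations per stage" property concrete, the following sketch models a plain binary Sklansky-style prefix adder in software. It is an illustration under stated assumptions: the function name `sklansky_add` is hypothetical, the model is an ordinary integer adder (it does not include the end-around carry or the correction circuits of a modulo $2^n + 1$ adder), and the operand width is taken to be a power of two.

```python
def sklansky_add(a, b, n):
    """Add two n-bit numbers with a Sklansky-style parallel-prefix carry tree.

    Returns (sum_bits, carry_out, cells_per_level), where cells_per_level
    counts the prefix operators evaluated at each level of the tree.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate bits
    G, P = g[:], p[:]
    cells_per_level = []
    k = 0
    while (1 << k) < n:
        new_G, new_P = G[:], P[:]
        cells = 0
        for i in range(n):
            if (i >> k) & 1:
                # Top bit of the neighbouring lower block of size 2^k.
                j = ((i >> k) << k) - 1
                new_G[i] = G[i] | (P[i] & G[j])          # prefix operator
                new_P[i] = P[i] & P[j]
                cells += 1
        G, P = new_G, new_P
        cells_per_level.append(cells)
        k += 1
    carry_in = [0] + G[:-1]                              # carry into each bit
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, G[n - 1], cells_per_level


s, c_out, cells = sklansky_add(0xBEEF, 0x1234, 16)
print(hex(s), c_out, cells)
```

For 16-bit operands the model evaluates 8 prefix operators at each of its 4 levels, and all operators within one level read only values produced by earlier levels, which is the independence the text relies on.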


FIG 3.4 Sklansky-style parallel-prefix structure

FIG 3.5 BASIC CELLS IN SKLANSKY-STYLE STRUCTURE


The Sklansky prefix tree takes the fewest logic levels to compute the carries. It also uses fewer cells than the Kogge-Stone structure, at the cost of higher fan-out. However, the Sklansky-style prefix structure uses a larger area than the Brent-Kung tree parallel-prefix structure. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e., f = 3). The structure can be viewed as a compacted version of Brent-Kung's, in which the number of logic levels is reduced and the fan-out is increased.

In the Sklansky-style parallel-prefix structure with correction circuits for the proposed weighted modulo $2^8 + 1$ adder, the square (□) and diamond (♦) nodes denote the pre- and postprocessing stages of the operands, respectively. The black nodes (•) evaluate the prefix operator, and the white nodes (◦) pass the signals unchanged to the next prefix level [1].


CHAPTER-4

PROPOSED SYSTEM

4.1 INTRODUCTION

This chapter presents an area-efficient weighted modulo $2^n + 1$ adder design that uses diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant $2^n + 1$ from the sum of two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo $2^n + 1$ addition can then be performed by parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, together with simple correction schemes. The area cost of the proposed adders is lower than that of the previous designs. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo $2^n + 1$ addition.

4.2 THEORY

Instead of combining the sum of A and B with D, which is not a constant, as in the earlier design, we add the constant $-(2^n + 1)$ to the sum of A and B. In addition, we allow the two inputs A and B to be in the range {0, ..., $2^n$}, which is one value more than the range {0, ..., $2^n - 1$} of the other earlier design. The design of the proposed weighted modulo $2^n + 1$ adder is presented below.

Consider two (n + 1)-bit inputs $A = a_n a_{n-1} \ldots a_0$ and $B = b_n b_{n-1} \ldots b_0$, where $0 \le A, B \le 2^n$. The weighted modulo $2^n + 1$ sum of A and B can be written in terms of $A + B - (2^n + 1)$. From this, it can be seen that the value of the weighted modulo $2^n + 1$ addition can be obtained by first subtracting $2^n + 1$ from the sum of A and B (i.e., adding the (n + 1)-bit constant 0111...1) and then using a diminished-1 adder to get the final modulo sum, with the inverted end-around carry as the carry-in.

The method of weighted modulo $2^n + 1$ addition of A and B is as follows. Let Y' and U' denote the carry and sum vectors of the summation of A, B, and $-(2^n + 1)$, where $Y' = y'_{n-2} y'_{n-3} \ldots y'_0 y'_{n-1}$ and $U' = u'_{n-1} u'_{n-2} \ldots u'_0$; the modulo addition is then performed on Y' and U'.

For i = 0 to n − 2, the values of $y'_i$ and $u'_i$ are the carry and the sum of $a_i + b_i + 1$, that is, $y'_i = a_i \lor b_i$ and $u'_i = \overline{a_i \oplus b_i}$, respectively ($\lor$ denotes the logic OR operation and the overbar denotes complement). Since Y' and U' are only n bits wide, the values of $y'_{n-1}$ and $u'_{n-1}$ must be computed by taking $a_n$, $b_n$, $a_{n-1}$, and $b_{n-1}$ into consideration (i.e., $y'_{n-1}$ and $u'_{n-1}$ are the carry and the sum produced by $2a_n + 2b_n + a_{n-1} + b_{n-1} + 1$, respectively). Because $0 \le A, B \le 2^n$, the input combinations $a_n = a_{n-1} = 1$ or $b_n = b_{n-1} = 1$ would make A or B exceed the range {0, ..., $2^n$}. These combinations are therefore not allowed and can be treated as don't-care conditions, which helps simplify the circuits that generate $y'_{n-1}$ and $u'_{n-1}$. Consequently, the maximum value of $2a_n + 2b_n + a_{n-1} + b_{n-1} + 1$ is 5, which occurs at $a_n = b_n = 1$ (i.e., the maximum value of $y'_{n-1}$ is 2). The reason for the correction term FIX is that, under some conditions, $y'_{n-1} = 2$ (e.g., $a_n = b_n = 1$ and $a_{n-1} = b_{n-1} = 0$), which cannot be represented on a 1-bit line. Therefore, the value of $y'_{n-1}$ is set to 1, and the remaining carry value (i.e., 1) is assigned to FIX [1].
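The bit-level claims in this section can be checked numerically. The sketch below is illustrative only, and the function names `csa_with_constant_ones` and `msb_position_value` are hypothetical. It relies on one arithmetic fact: adding a constant 1 bit to $a_i + b_i$ gives a carry of $a_i \lor b_i$ and a sum bit equal to the complement of $a_i \oplus b_i$ (an XNOR). It also confirms that the value formed at the top position never exceeds 5 when both inputs stay within {0, ..., $2^n$}.

```python
def csa_with_constant_ones(x, z, width):
    """Carry-save addition of x + z + (all-ones constant) over `width` bits.

    Returns (carry_vector, sum_vector), with the carry of bit i stored at
    bit i, so the true total is 2 * carry_vector + sum_vector.
    """
    ones = (1 << width) - 1
    carry_vector = (x | z) & ones        # carry bit: a_i OR b_i
    sum_vector = ~(x ^ z) & ones         # sum bit: NOT (a_i XOR b_i)
    return carry_vector, sum_vector


def msb_position_value(A, B, n):
    """Value 2*a_n + 2*b_n + a_(n-1) + b_(n-1) + 1 formed at the top position."""
    a_n, b_n = (A >> n) & 1, (B >> n) & 1
    a_n1, b_n1 = (A >> (n - 1)) & 1, (B >> (n - 1)) & 1
    return 2 * a_n + 2 * b_n + a_n1 + b_n1 + 1


width = 4
for x in range(1 << width):
    for z in range(1 << width):
        carry_vector, sum_vector = csa_with_constant_ones(x, z, width)
        assert x + z + ((1 << width) - 1) == 2 * carry_vector + sum_vector

n = 4
worst = max(msb_position_value(A, B, n)
            for A in range((1 << n) + 1) for B in range((1 << n) + 1))
print(worst)  # 5, reached only when A = B = 2^n
```

A value of 5 means a sum bit of 1 and a carry of 2 at that position, which is the situation the FIX term in the text is introduced to absorb.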

4.3 RESIDUE NUMBER SYSTEM

A basic number system consists of a correspondence between sequences of digits and numbers. In a fixed-point number system, each sequence corresponds to exactly one number, and the radix point (the "decimal point" in the ordinary decimal number system) that separates the integral and fractional parts of a representation is in a fixed position. In contrast, in a floating-point number system, a given sequence may correspond to several numbers: the position of the radix point is not fixed, and each position of the radix point in a digit sequence indicates the particular number represented. Usually, floating-point systems are used for the representation of real numbers, and fixed-point systems are used to represent integers (in which the radix point is implicitly assumed to be at the right-hand end) or as parts of floating-point representations, but there are a few exceptions to this general rule. Almost all applications of RNS are as fixed-point number systems.

If we consider a number such as 271.834 in the ordinary decimal number system, we can observe that each digit has a weight that corresponds to its position: hundred for the 2, ten for the 7, and so on down to thousandth for the 4. This number system is therefore an example of a positional (or weighted) number system; residue number systems, on the other hand, are non-positional. The decimal number system is also a single-radix (or fixed-radix) system, as it has only one base (i.e., ten). Although mixed-radix (i.e., multiple-radix) systems are relatively rare, there are a few useful ones. Indeed, for the purposes of conversion to and from other number systems, as well as for certain operations, it is sometimes useful to associate a residue number system with a weighted, mixed-radix number system.

Residue number systems are based on the congruence relation, which is defined as follows. Two integers a and b are said to be congruent modulo m if m divides exactly the difference of a and b; it is common, especially in mathematics texts, to write $a \equiv b \pmod{m}$ to denote this. Thus, for example, $10 \equiv 7 \pmod{3}$, $10 \equiv 4 \pmod{3}$, $10 \equiv 1 \pmod{3}$, and $10 \equiv -2 \pmod{3}$. The number m is a modulus or base.

If q and r are the quotient and remainder, respectively, of the integer division of a by m (that is, $a = q \cdot m + r$), then, by definition, we have $a \equiv r \pmod{m}$. The number r is said to be the residue of a with respect to m, and we shall usually denote this by $r = |a|_m$. The set of m smallest values, {0, 1, 2, ..., m − 1}, that the residue may assume is called the set of least positive residues modulo m. Unless otherwise specified, we shall assume that these are the only residues in use.
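The two definitions above translate directly into code. This is a small sketch with hypothetical helper names (`congruent`, `residue`); it uses Python's built-in `divmod`, which for a positive modulus returns a remainder in {0, ..., m − 1}, matching the residue set assumed in the text.

```python
def congruent(a, b, m):
    """True if a and b are congruent modulo m, i.e. m divides a - b exactly."""
    return (a - b) % m == 0


def residue(a, m):
    """Residue |a|_m: the remainder r of a = q*m + r with 0 <= r < m."""
    q, r = divmod(a, m)
    assert a == q * m + r
    return r


# The examples from the text: 10 is congruent to 7, 4, 1 and -2 modulo 3.
print([congruent(10, b, 3) for b in (7, 4, 1, -2)])
print(residue(10, 3), residue(-2, 3))
```

All of the numbers 10, 7, 4, 1, and −2 share the single residue 1 modulo 3, which is why they fall in the same residue class.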

Consider a set {$m_1, m_2, \ldots, m_N$} of N positive and pairwise relatively prime moduli. Let M be the product of the moduli. Then every number X < M has a unique representation in the residue number system, which is the set of residues {$|X|_{m_i} : 1 \le i \le N$}. A partial proof of this is as follows. Suppose $X_1$ and $X_2$ are two different numbers with the same residue set. Then $|X_1|_{m_i} = |X_2|_{m_i}$, and so $|X_1 - X_2|_{m_i} = 0$ for every i. Therefore $X_1 - X_2$ is a multiple of every $m_i$ and hence of their least common multiple (lcm). But if the $m_i$ are pairwise relatively prime, then their lcm is M, and it must be that $X_1 - X_2$ is a nonzero multiple of M. So it cannot be that both $X_1 < M$ and $X_2 < M$. Therefore, the set {$|X|_{m_i} : 1 \le i \le N$} is unique and may be taken as the representation of X. The number M is called the dynamic range of the RNS, because the number of numbers that can be represented is M. For unsigned numbers, that range is [0, M − 1] [17].

Representations in a system in which the moduli are not pairwise relatively prime will not be unique: two or more numbers will have the same representation. As an example, the residues of the integers zero through fifteen relative to the moduli two, three, and five (which are pairwise relatively prime) are given in the left half of Table 4.1, and the residues of the same numbers relative to the moduli two, four, and six (which are not pairwise relatively prime) are given in the right half of the same table. Observe that no sequence of residues is repeated in the first half, whereas there are repetitions in the second.

The preceding discussion defines what may be considered standard residue number systems, and it is with these that we shall primarily be concerned. Nevertheless, there are useful examples of "non-standard" RNS, the most common of which are the redundant residue number systems. Such a system is obtained by, essentially, adding extra (redundant) moduli to a standard system. The dynamic range then consists of a "legitimate" range, defined by the non-redundant moduli, and an "illegitimate" range; for arithmetic operations, initial operands and results should be within the legitimate range. Redundant number systems of this type are especially useful in fault-tolerant computing. The redundant moduli mean that digit positions with errors may be excluded from computations while still retaining a sufficient part of the dynamic range. Furthermore, both the detection and correction of errors are possible: with k redundant moduli, it is possible to detect up to k errors and to correct up to ⌊k/2⌋ errors. A different form of redundancy can be introduced by extending the size of the digit set corresponding to a modulus, in a manner similar to RSDs. For a modulus m, the normal digit set is {0, 1, ..., m − 1}, but if instead the digit set used is {0, 1, ..., m' − 1}, where m' ≥ m, then some residues will have redundant representations [10], [18].

Table 4.1 Residues for various moduli
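The contents of such a table follow directly from the definition of a residue, so they can be regenerated with a few lines of code. The sketch below (the helper name `residue_table` is hypothetical) lists the residues of 0 through 15 for the moduli (2, 3, 5) and (2, 4, 6) and counts how many distinct residue tuples each set produces.

```python
def residue_table(numbers, moduli):
    """Residue tuple of every number in `numbers` for the given moduli."""
    return [tuple(x % m for m in moduli) for x in numbers]


coprime = residue_table(range(16), (2, 3, 5))       # pairwise relatively prime
non_coprime = residue_table(range(16), (2, 4, 6))   # not pairwise relatively prime

print(" N   (2,3,5)     (2,4,6)")
for x, (left, right) in enumerate(zip(coprime, non_coprime)):
    print(f"{x:2d}   {left}   {right}")

print(len(set(coprime)), len(set(non_coprime)))
```

With moduli (2, 4, 6) the least common multiple is only 12, so the tuples for 12 through 15 repeat those for 0 through 3, while the moduli (2, 3, 5) keep all sixteen tuples distinct.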

4.3.1 MODULI SELECTION

In general, there are at least four considerations that should be taken into account in the selection of moduli. First, the selected moduli must provide an adequate range whilst also ensuring that RNS representations are unique. The second is, as indicated above, the efficiency of binary representations; in this regard, a balance between the different moduli in a given moduli set is also important. The third is that, ideally, the implementations of arithmetic units for RNS should to some extent be compatible with those for conventional arithmetic, especially given the "legacy" that exists for the latter. The fourth is the size of individual moduli. Although certain RNS arithmetic operations do not require carries between digits, which is one of the primary advantages of RNS, this holds only between digits. Since a digit is ultimately represented in binary, there will be carries between bits, and it is therefore important to ensure that digits (and, therefore, the moduli) are not too large. Low-precision digits also make it possible to realize cost-effective table-lookup implementations of arithmetic operations. On the other hand, if the moduli are small, then a large number of them may be required to ensure a sufficient dynamic range. Ultimately, the choices made, and indeed whether RNS is useful or not, depend on the particular applications and technologies at hand.

4.3.2 Negative numbers

Some applications require that it be possible to represent negative numbers as well as positive ones. As with the conventional number systems, any one of the radix complement, diminished-radix complement, or sign-and-magnitude notations may be used in RNS for such representation. The merits and drawbacks of choosing one over the other are similar to those for the conventional notations. In contrast with the conventional notations, however, the determination of sign is much more difficult with the residue notations, as is magnitude comparison. This is the case even with sign-and-magnitude notation, since determining the sign of the result of an arithmetic operation such as addition or subtraction is not easy, even if the signs of the operands are known. The extension of sign-and-magnitude notation to RNS involves either the use of a single sign digit or prepending to each residue in a representation an extra bit or digit for the sign; we shall assume the former. For the complement notations, the range of representable numbers is usually partitioned into two approximately equal parts, such that approximately half of the numbers are positive and the rest are negative.

4.3.3 Basic arithmetic

The standard arithmetic operations of addition/subtraction and multiplication are easily implemented with residue notation, depending on the choice of the moduli, but division is much more difficult. The latter is not surprising, in light of the statement above on the difficulties of sign determination and magnitude comparison. Residue addition is carried out by individually adding corresponding digits, relative to the modulus for their position. That is, a carry-out from one digit position is not propagated into the next digit position. Subtraction may be carried out by negating (in whatever is the chosen notation) the subtrahend and adding it to the minuend. This is straightforward for numbers in diminished-radix complement or radix complement notation. For numbers represented in residue sign-and-magnitude, a slight modification of the algorithm for conventional sign-and-magnitude is necessary: the sign digit is fanned out to all positions in the residue representation, and addition then proceeds as in the case for unsigned numbers but with a conventional sign-and-magnitude algorithm. Multiplication too can be performed simply by multiplying corresponding residue digit pairs, relative to the modulus for their position; that is, multiply digits and ignore or adjust an appropriate part of the result.
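The digit-wise nature of these operations can be sketched as follows. The moduli (3, 5, 7), the helper names, and the sample operands are illustrative assumptions; only unsigned values below the dynamic range M = 105 are considered, so the signed notations discussed above are outside the scope of this sketch.

```python
MODULI = (3, 5, 7)  # pairwise relatively prime, dynamic range M = 105


def to_rns(x):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(x, y):
    """Add corresponding digits; no carry crosses from one digit to the next."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))


def rns_sub(x, y):
    """Subtract corresponding digits, each relative to its own modulus."""
    return tuple((a - b) % m for a, b, m in zip(x, y, MODULI))


def rns_mul(x, y):
    """Multiply corresponding digits and reduce each product by its modulus."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))


print(rns_add(to_rns(17), to_rns(23)), to_rns(40))
print(rns_mul(to_rns(9), to_rns(11)), to_rns(99))
```

Each digit is processed by its own small modular unit, so the three channels could run in parallel; the results are correct as long as the true result stays below M.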

4.3.4 Conversion

The most direct way to convert from a conventional representation to a residue one, a process known as forward conversion, is to divide by each of the given moduli and then collect the remainders. This, however, is likely to be a costly operation if the number is represented in an arbitrary radix and the moduli are arbitrary. If, on the other hand, the number is represented in radix 2 (or a radix that is a power of two) and the moduli are of a suitable form (e.g., $2^n - 1$), then there are procedures that can be implemented more efficiently. The conversion from residue notation to a conventional notation, a process known as reverse conversion, is more difficult (conceptually, if not necessarily in the implementation) and so far has been one of the major impediments to the adoption of RNS. One way in which it can be done is to assign weights to the digits of a residue representation and then produce a "conventional" (i.e., positional, weighted) mixed-radix representation from this. This mixed-radix representation can then be converted into whatever conventional form is desired. In practice, the use of a direct conversion procedure for the latter can be avoided by carrying out the arithmetic of the conversion in the notation for the result. Another approach involves the use of the Chinese Remainder Theorem, which is the basis for many algorithms for conversion from residue to conventional notation; this too involves, in essence, the extraction of a mixed-radix representation.
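Both directions of conversion can be illustrated in a few lines. The sketch below is not taken from the report: `mod_2n_minus_1` shows one well-known division-free reduction for moduli of the form $2^n - 1$ (it folds n-bit chunks together, using $2^n \equiv 1$), and `from_rns` applies the textbook Chinese Remainder Theorem formula. The function names and the moduli (3, 5, 7) are assumptions, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
from math import prod


def to_rns(x, moduli):
    """Forward conversion by direct division: collect the remainders."""
    return tuple(x % m for m in moduli)


def mod_2n_minus_1(x, n):
    """Reduce x modulo 2^n - 1 without division by folding n-bit chunks."""
    m = (1 << n) - 1
    while x > m:
        x = (x & m) + (x >> n)   # valid because 2^n is congruent to 1
    return 0 if x == m else x


def from_rns(residues, moduli):
    """Reverse conversion with the Chinese Remainder Theorem.

    Assumes the moduli are pairwise relatively prime.
    """
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        M_i = M // m
        x += r * M_i * pow(M_i, -1, m)   # pow(..., -1, m): inverse of M_i mod m
    return x % M


moduli = (3, 5, 7)
print(from_rns(to_rns(52, moduli), moduli))   # recovers 52
print(mod_2n_minus_1(1000, 4), 1000 % 15)
```

The reverse conversion needs multiplications by large constants and a final reduction modulo M, which gives some sense of why the text describes it as the harder direction.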

Residue number systems are also useful in error detection and correction. This is apparent, given the independence of digits in a residue-number representation: an error in one digit does not corrupt any other digits. In general, the use of redundant moduli, i.e., extra moduli that play no role in determining the dynamic range, facilitates both error detection and correction. But even without redundant moduli, fault tolerance is possible, since computation can still continue after the isolation of faulty digit positions, provided that a smaller dynamic range is acceptable. RNS can also help speed up complex-number arithmetic [2].

4.4 MODULO 2^n + 1 ADDER DESIGN

Efficient modulo $2^n + 1$ adders are important for several applications, including residue number systems, digital signal processors, and cryptographic algorithms. In a conventional modulo $2^n + 1$ adder, all operands are (n + 1) bits long. To avoid using (n + 1)-bit circuits, the diminished-1 and carry-save diminished-1 number systems can be used effectively. Two architectures for designing a modulo $2^n + 1$ adder based on an n-bit ripple-carry adder have also been derived: the first is a faster design, whereas the second uses less hardware. In that method, the special treatment required for zero operands in the diminished-1 number system is removed. The fastest modulo $2^n + 1$ adders in the normal binary system contain 3-operand adders. With an efficient design, the hardware overhead and power consumption are reduced, and in some cases the power-delay product is reduced as well.

The modular characteristic of the Residue Number System (RNS) offers the potential for high-speed, parallel arithmetic. In RNS logic, each operand is represented by its residues with respect to a set of numbers comprising the base. Addition, subtraction and multiplication are performed in parallel on the residues in distinct design units (often called channels), avoiding carry propagation among residues. These operations can therefore be carried out more efficiently in RNS than in conventional two's complement systems, which makes RNS a good candidate for many application fields. Typical applications of RNS are found in digital signal processing (DSP) for filtering, convolution, correlation and FFT computation, in fault-tolerant computer systems, in communications and in cryptography.

The choice of moduli set is important because the channels should have nearly equal delay. Special moduli sets have been used extensively to reduce the hardware complexity of converters and arithmetic units. Among them, the triple moduli set {2^n - 1, 2^n, 2^n + 1} has some benefits. Because of the operand lengths of these moduli, the operation delay of the system is determined by the modulo 2^n + 1 channel. Reducing the time required for modulo 2^n + 1 addition therefore reduces the delay of the whole system.

To speed up modulo 2^n + 1 arithmetic, the diminished-1 representation of binary numbers has been introduced. In the diminished-1 number system, each number X is represented by X* = X - 1, while zero is handled separately. Circuits based on this representation therefore need special treatment for zero operands. To overcome this problem, a number representation called "carry-save diminished-1" has been proposed. The addition algorithm in the carry-save diminished-1 system does not require special treatment for zero operands. Modulo 2^n + 1 adders can also be designed as a special case of general modulo m adders.

The ripple-carry methodology for modulo 2^n + 1 adders removes some significant problems of older structures and reduces both area and power dissipation. Although a ripple-carry adder has more delay than a carry-accelerated adder, it is useful for low-power and low-area applications. A CMOS implementation shows that the ripple-carry design methodology leads to considerably less area and power consumption than the designs reported in related papers and, in some cases, to a lower power-delay product. The conventional methods for modulo 2^n + 1 addition include general modulo adders and diminished-1 and carry-save diminished-1 modulo adders, implemented with ripple-carry or parallel-prefix addition. To remove the need for (n+1)-bit-wide circuits in the modulo 2^n + 1 channel, the diminished-1 and carry-save diminished-1 number systems have been proposed [3].
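The channel-wise arithmetic described above can be sketched in a few lines of Python. The example assumes n = 4, i.e. the moduli set {15, 16, 17}; the function names are illustrative only.

```python
n = 4
MODULI = (2**n - 1, 2**n, 2**n + 1)   # {15, 16, 17}, dynamic range 4080

def to_rns(x):
    """Represent x by its residue in each channel."""
    return tuple(x % m for m in MODULI)

def rns_add(x, y):
    """Each channel adds independently; no carry crosses channels."""
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))

def rns_mul(x, y):
    """Each channel multiplies independently."""
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))

assert rns_add(to_rns(100), to_rns(23)) == to_rns(123)
assert rns_mul(to_rns(50), to_rns(60)) == to_rns(3000)
```

The results are only meaningful while the true sum or product stays below the dynamic range (4080 here). The modulo 2^n + 1 channel is the only one that needs n + 1 bits per residue.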

Fig 4.1 General block diagram of modulo 2^n + 1 adder.

The general block diagram of the modulo 2^n + 1 adder is given in Fig 4.1. The only difference between a modulo 2^n + 1 adder and a modulo 2^n - 1 adder is the inverter that takes cout as input. In this end-around adder, cout must be inverted before going to the incrementer. The ways of building a modulo 2^n + 1 adder can be divided into three categories. The first uses a reduced parallel-prefix tree with an extra logic level at the bottom. The second applies a similar idea to a full parallel-prefix tree. The third is an end-around adder: any type of adder followed by an incrementer [12].
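A behavioural sketch of the third category (an adder followed by an incrementer driven by the inverted carry-out) is given below for diminished-1 operands. It models arithmetic only, not gate structure, and the name `dim1_add` is an assumption made for illustration.

```python
def dim1_add(a_star, b_star, n):
    """Diminished-1 modulo 2^n + 1 addition.
    a_star = A - 1 and b_star = B - 1 are n-bit values for nonzero A and B.
    Returns S* = S - 1 for S = (A + B) mod (2^n + 1), assuming S != 0."""
    mask = (1 << n) - 1
    total = a_star + b_star          # plain n-bit addition
    cout = total >> n                # carry-out of the n-bit adder
    # end-around step: increment by the INVERTED carry-out
    return ((total & mask) + (1 - cout)) & mask

# A = 5, B = 9, n = 4: (5 + 9) mod 17 = 14, so S* = 13
assert dim1_add(5 - 1, 9 - 1, 4) == 13
```

A sum that is congruent to zero cannot be represented in n bits and still needs the separate zero handling described in the text.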

4.4.1 Parallel-Prefix Ling Structures for Modulo 2^n + 1 Adders

Ling's scheme can be applied to modulo 2^n + 1 adders, including those built on a full parallel-prefix structure. A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay. To improve the area-time and time-power products, a circular carry selection scheme has been used to select the correct carry-in signals for the final modulo addition.

These methods all deal with diminished-1 modulo addition, but the hardware for decreasing the inputs and increasing the outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, so a zero-detection circuit is required to avoid incorrect computation. The weighted approach instead makes the modulo 2^n + 1 sum of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Its three main parts are the translator, the diminished-1 adder and the correction scheme, so any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition of Y and U. The scheme can also be applied to the design of residue generators (RGs) and multi-operand modulo adders (MOMAs).

However, because of the complexity of the full parallel-prefix structure, the benefit of Ling's equations tends to diminish, especially for wide adders (i.e. 64 bits or larger). Because modulo 2^n + 1 adders have an inverted carry-in, the same logic appears in Ling's reduced prefix tree [13].

Fig 4.2 Modified Modulo 2^n + 1 Adder with the Reduced Parallel-Prefix Ling Structure

4.4.2 Combination of Binary and Modulo 2^n + 1 Adder

Reviewing the binary and modulo 2^n + 1 adder architectures shows that the prefix tree can be applied to all of these adders. For the triple moduli set {2^n - 1, 2^n, 2^n + 1}, the operation delay of the system is determined by the modulo 2^n + 1 channel because of the operand lengths of these moduli. Modulo adders are an extension of binary adders. The reduced parallel-prefix structure applies to both modulo 2^n - 1 and modulo 2^n + 1 adders; the only difference between the two is one inverter.

Fig 4.3 Combined Binary and Modulo Adders.

In Figure 4.3, g_i/p_i comes from the pre-computation stage. The selections 0, 1 and 2 in the multiplexer are for modulo 2^n + 1, modulo 2^n - 1 and binary addition, respectively. Starting from the general equation for binary addition, the combined-function adder can be formulated as the following equation shows.

Inserting this structure between pre- and post-computation completes the adder architecture. The modified parallel-prefix tree does not handle the carry input; this is the only difference between this special prefix tree and one built solely for a binary adder. The carry input is handled at the last row of gray cells, which agrees with the associativity of the synthesis rule. The prefix tree can be derived from any type of normal binary prefix tree [4].

4.5 CARRY PROPAGATION ADDITION

Carry-propagate addition converts the redundant carry-save output of the carry-save adder into an irredundant binary representation by performing a carry propagation. A variety of schemes exist to speed up carry propagation, trading off area against speed. The relevant adder architectures and their characteristics are summarized in Table 4.2. Two principles have to be distinguished here: the prefix structure, which propagates carries from lower to upper bits, and the sum-bit generation, which determines how the sum bits are calculated from the carries.

Table 4.2 ADDER ARCHITECTURE CHARACTERISTICS

4.5.1 PREFIX STRUCTURE

Carry propagation in binary addition is a prefix problem [8], which can be calculated using prefix structures. Besides the straightforward serial-prefix structure (implemented by the ripple-carry adder), many different parallel-prefix structures exist, which speed up carry propagation at the cost of increased area. They differ in depth (circuit speed), size (circuit area) and maximum fanout, which can be bounded (constant) or unbounded (dependent on the operand width) and influences circuit speed and area in a more subtle way. The internal signals of a prefix implementation can be coded in different ways, resulting in different possible logic implementations. The most common are generate/propagate signal pairs computed by AND-OR gates and carry-in-0/carry-in-1 signal pairs.
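To make the prefix formulation concrete, the sketch below models the generate/propagate pair and the associative prefix operator, and uses them in a serial-prefix (ripple-carry) addition. The function names are illustrative assumptions; the code models logic values only, not gates or timing.

```python
def prefix_op(hi, lo):
    """Combine the (G, P) pair of a higher group with that of the
    adjacent lower group: G = G_hi OR (P_hi AND G_lo), P = P_hi AND P_lo."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)

def serial_prefix_add(a, b, n):
    """n-bit addition with carry-in 0, using a serial prefix chain."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(n)]              # per-bit (generate, propagate)
    s, carry, group = 0, 0, None
    for i in range(n):
        s |= (gp[i][1] ^ carry) << i      # sum bit = propagate XOR carry-in
        group = gp[i] if group is None else prefix_op(gp[i], group)
        carry = group[0]                  # carry into bit i + 1 is G_{i:0}
    return s, carry

s, cout = serial_prefix_add(11, 6, 4)
assert s + (cout << 4) == 17
```

Because `prefix_op` is associative, the same carries can be computed by regrouping the operator applications into a tree, which is what the parallel-prefix structures do.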

4.5.2 Architecture Performance Comparison

The relative performance of these adder architectures varies greatly among technology libraries, so only qualitative characteristics regarding area and speed are summarized in Table 4.2 instead of quantitative comparison results. In addition, the following observations can be made:

• The ripple-carry adder implemented using full-adder cells is always the smallest and slowest adder.

• The carry-skip adder massively speeds up the ripple-carry adder at a very moderate area penalty but is still slower than any other architecture. Because of its false paths, it cannot readily be used in synthesis-based design.

• The carry-select adder is very area efficient for medium speeds if special carry-select adder cells are available in the library. Its prefix structure has the special property of allowing at most 2 prefix nodes per bit position.

• The carry-increment adder is an optimization of the carry-select adder that uses the carry-lookahead scheme instead of the carry-select scheme for the same prefix structure. It has the same delay but a 30% smaller gate count.

• The Brent-Kung parallel-prefix adder gives a good trade-off between area and speed: 15% to 30% less area at 15% to 30% more delay compared with the faster Sklansky parallel-prefix adder.

• The Sklansky parallel-prefix adder (which uses the prefix structure first proposed by Sklansky for conditional-sum adders) has a prefix structure of minimal depth and is therefore among the fastest adder architectures. Its unbounded fanout helps reduce circuit area (fewer prefix nodes) but adds some extra delay for driving the high-fanout nodes.

• The Kogge-Stone parallel-prefix adder also has a minimal-depth prefix structure. Its bounded fanout eliminates the need to drive high-fanout nodes, making it the fastest adder in most technologies, at the cost of a much bigger area (more prefix nodes) and more wiring. Compared with the Sklansky prefix adder, it shows an area increase between 23% (8 bit) and 75% (128 bit) at a fairly constant delay reduction of around 4% (all widths) [5].

4.6 PARALLEL PREFIX TREE STRUCTURE

Parallel-prefix trees have various architectures, which can be distinguished by four major factors: 1) Radix/Valency 2) Logic Levels 3) Fan-out 4) Wire Tracks.

In the following discussion, the radix is assumed to be 2 (i.e. the number of inputs to the logic gates is always 2). The more aggressive prefix schemes have ⌈log2(n)⌉ logic levels, where n is the width of the inputs. However, these schemes require higher fan-out, many wire tracks or dense logic gates, which compromise performance, e.g. speed or power. Other schemes relieve fan-out and wire tracks at the cost of more logic levels. When the radix is fixed, the design trade-off is made among logic levels, fan-out and wire tracks.

Kogge-Stone, Brent-Kung, Sklansky and Ladner-Fischer are the major types of prefix structure. The Sklansky, Kogge-Stone and Brent-Kung networks achieve three extreme goals: minimal logic levels and wire tracks, minimal max-fanout and logic levels, and minimal wire tracks and max-fanout, respectively. Ladner-Fischer, Han-Carlson and Knowles implemented the trade-offs between each pair of these extreme cases. The structure of the prefix network determines the type of the prefix adder. Ziegler et al. considered sparsity, fanout and radix as three dimensions in the design space of regular parallel-prefix adders and presented a unified formalism to describe such structures. The Kogge-Stone tree has been reported as a better choice than the Ladner-Fischer tree.

4.6.1 Kogge-Stone Parallel Prefix Structure

The Kogge-Stone prefix tree is among the prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 4.4. Kogge-Stone is in fact a member of the Knowles prefix tree family: the 16-bit prefix tree can be viewed as Knowles [1,1,1,1], where the numbers in the brackets represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 at every logic level for Kogge-Stone prefix trees of any width. The key to building a prefix tree is to follow the specific features of that type of tree and apply the rules described in the previous section. Gray cells are inserted like black cells, except that gray cells produce the final carry outputs instead of intermediate G/P groups. The Kogge-Stone prefix tree is taken as the starting point because it is the easiest to build programmatically. The example in Figure 4.4 is a 16-bit (a power of 2) prefix tree.

Fig 4.4 Kogge-Stone Prefix Tree

For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes the inputs at bit 4 and bit 3). Group (4:3) is combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is combined with group (10:7) to generate group (10:3) at logic level 3, and so forth.
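The doubling of the input span at each logic level can be sketched behaviourally as follows. This is an illustrative model of the Kogge-Stone recurrence with a carry-in of 0; it is not the pseudo-code the report refers to, and the function names are assumptions.

```python
def kogge_stone_carries(a, b, n):
    """Return G where G[i] is the group generate G_{i:0} (carry out of bit i)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    span = 1
    while span < n:                       # one loop pass per logic level
        new_g, new_p = g[:], p[:]
        for i in range(span, n):          # every bit position gets a cell
            new_g[i] = g[i] | (p[i] & g[i - span])
            new_p[i] = p[i] & p[i - span]
        g, p = new_g, new_p
        span *= 2                         # span doubles: 1, 2, 4, ...
    return g

def kogge_stone_add(a, b, n):
    carries = kogge_stone_carries(a, b, n)
    s = 0
    for i in range(n):
        cin = carries[i - 1] if i > 0 else 0
        s |= ((((a >> i) ^ (b >> i)) & 1) ^ cin) << i
    return s, carries[n - 1]

s, cout = kogge_stone_add(0b101101, 0b011011, 6)
assert s + (cout << 6) == 0b101101 + 0b011011
```

Each level reads only the previous level's values (hence the copies), which mirrors the fact that every cell drives at most two cells in the next level.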

4.6.2 Brent-Kung Prefix Tree

The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fan-out is among the minimum, with f = 0, and so are its wire tracks, with t = 0. The cost is an extra L - 1 logic levels. A 16-bit example is shown in Figure 4.5, where the critical path is drawn as a thick gray line. The Brent-Kung tree uses less area than the Sklansky prefix tree.

Fig 4.5 16-bit Brent-Kung Prefix Tree

4.6.3 Sklansky Prefix Tree

The Sklansky prefix tree takes the fewest logic levels to compute the carries, and it uses fewer cells than the Kogge-Stone structure at the cost of higher fan-out. Figure 4.6 shows a 16-bit Sklansky prefix tree with the critical path drawn as a solid line. The Sklansky structure uses more area than the Brent-Kung structure. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of Brent-Kung's, in which the number of logic levels is reduced and the fan-out increased. The number of logic levels is log2 n, and each logic level has n/2 cells, as can be observed in Figure 4.6. The area is therefore estimated as (n/2) log2 n; for n = 16, 32 cells are required.

Fig 4.6 16-bit Sklansky Prefix Tree
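The cell count of (n/2) log2 n can be checked with a small behavioural model of the Sklansky recurrence. The sketch assumes n is a power of two and a carry-in of 0; `sklansky_add` is an illustrative name.

```python
def sklansky_add(a, b, n):
    """Return (sum, carry_out, cell_count) for an n-bit Sklansky prefix adder."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    h = p[:]                              # half-sum bits for the final XOR
    cells = 0
    span = 1
    while span < n:                       # log2(n) logic levels
        for i in range(n):
            if (i // span) % 2 == 1:      # i lies in the upper half of its block
                # j is the top bit of the lower half of the same block;
                # its output fans out to all `span` cells of the upper half
                j = (i // (2 * span)) * 2 * span + span - 1
                g[i] = g[i] | (p[i] & g[j])
                p[i] = p[i] & p[j]
                cells += 1
        span *= 2
    s = 0
    for i in range(n):
        cin = g[i - 1] if i else 0
        s |= (h[i] ^ cin) << i
    return s, g[n - 1], cells

assert sklansky_add(0xFFFF, 1, 16) == (0, 1, 32)   # 32 cells for n = 16
```

At the last level, one node feeds all eight cells of the upper half, which is the high fan-out the text describes.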

4.6.4 Ladner-Fischer Prefix Tree

The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree was proposed to relieve this problem. To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 4.7 shows a 16-bit example of a Ladner-Fischer prefix tree. The Ladner-Fischer prefix tree sits between the Brent-Kung and Sklansky prefix trees: the first two logic levels of the structure in Figure 4.7 are exactly the same as Brent-Kung's, and starting from logic level 3, a fan-out of more than 2 is allowed (i.e. f > 0). Compared with Sklansky's, the fan-out is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree [4].

Fig 4.7 11-bit Ladner-Fischer Prefix Tree Synthesis

4.6.5 Knowles Prefix Tree

Knowles proposed a family of prefix trees with flexible architectures. Knowles prefix trees use the fan-out at each logic level to name their family members. Figure 4.8 shows a 16-bit Knowles prefix tree. Different fan-outs within the same logic level are even allowed, which gives a hybrid Knowles prefix tree. Overlapping of groups, which prefix trees permit, is allowed even for more than 1 bit. Because the Knowles family covers multiple architectures, it is not difficult to extend the algorithm once the basic concepts of prefix trees are firmly established. Kogge-Stone and Knowles prefix trees have the same number of logic levels. In the Knowles prefix tree, the fan-out at logic level 4 is 3 instead of 2. To build such prefix trees, the pseudo-code made for the Kogge-Stone prefix tree can be reused except for the change at the last level. The two trees also have the same number of cells, so the area of the Knowles prefix tree is likewise estimated as n log2 n - n + 1.

Fig 4.8 16-bit Knowles Prefix Tree

4.6.6 HAN-CARLSON PREFIX TREE

The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2 (f = 0) at all logic levels. The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone, at the cost of one extra logic level. It can be viewed as a sparse version of the Kogge-Stone prefix tree, and the pseudo-code for Kogge-Stone's structure can easily be modified to build it. The major difference is that in each logic level, the Han-Carlson prefix tree places cells at every other bit, and the last logic level accounts for the missing carries. Figure 4.9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers, with the critical path drawn as a thick solid line. This type of Han-Carlson prefix tree has log2 n + 1 logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has n/2 cells. The area is estimated as (n/2) log2 n; for n = 16, the number is 32.

Fig 4.9 HAN-CARLSON PREFIX TREE

4.6.7 Harris Prefix Tree

Harris's idea is to balance logic levels, fan-out and wire tracks. Harris proposed a cube to show the taxonomy of prefix trees, illustrated for 16-bit prefix trees. All the prefix trees mentioned above lie on the cube, with the Sklansky prefix tree at the fan-out extreme, Brent-Kung at the logic-levels extreme and Kogge-Stone at the wire-track extreme. The balanced prefix structure is close to the center of the cube: for 16 bits, the number of logic levels is log2 16 + 1 = 5, the maximum fan-out is 2^f + 1 = 3 and the number of wire tracks is 2^t = 2. The diagram is shown in Figure 4.10 with the critical path drawn as a solid line.

Fig 4.10 HARRIS PREFIX TREE

These are the various parallel-prefix structures used for adder design. Each has its own advantages and disadvantages, so the prefix structure can be selected according to the purpose of the task.

4.6.8 Algorithmic Analysis for Prefix Trees

Following the algorithms described above, prefix trees can be built structurally either in HDL or by schematic entry. Each type of prefix tree differs in area, logic levels, fan-out and wire tracks, and the prefix tree used in the diminished-1 adder is chosen according to the intended use. Table 4.3 summarizes the prefix trees' parameters, including logic levels, area estimation, fan-out and wire tracks [5], [6].

Table 4.3 ALGORITHMIC ANALYSIS

Type             Logic levels    Area                       Fan-out    Wire tracks
Brent-Kung       2 log2 n - 1    2n - log2 n - 2            2          1
Kogge-Stone      log2 n          n log2 n - n + 1           2          n/2
Ladner-Fischer   log2 n + 1      (n/4) log2 n + 3n/4 - 1    n/4 + 1    1
Knowles          log2 n          n log2 n - n + 1           3          1
Sklansky         log2 n          (n/2) log2 n               n/2 + 1    1
Han-Carlson      log2 n + 1      (n/2) log2 n               2          n/4
Harris           log2 n + 1      (n/2) log2 n               3          n/8

4.7 DESIGN OF AREA-EFFICIENT WEIGHTED MODULO 2^n + 1 ADDER

This section describes an improved area-efficient weighted modulo 2^n + 1 adder design that uses diminished-1 adders with simple correction schemes. The design subtracts the constant 2^n + 1 from the sum of two (n + 1)-bit input numbers and produces carry and sum vectors. The modulo 2^n + 1 addition is then performed by a parallel-prefix diminished-1 adder that takes in the sum and carry vectors plus the inverted end-around carry, followed by a simple correction scheme. The area cost of the proposed adders is lower, and they do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition. The design consists of three blocks: 1) translator, 2) diminished-1 adder and 3) correction circuit. Fig 4.11 shows the architecture of the area-efficient weighted modulo 2^n + 1 adder using correction schemes.

Fig 4.11 Architecture of proposed modulo 2^n + 1 adder

4.7.1 TRANSLATOR

The translator subtracts the constant 2^n + 1 from the sum of two (n + 1)-bit input numbers and produces carry and sum vectors; that is, the constant -(2^n + 1) is added to the sum of A and B. In addition, the two inputs A and B are allowed to be in the range {0, 2^n}, which is one value more than the range {0, 2^n - 1} of the existing system. For n = 8, the translator converts the 9-bit inputs into 8-bit outputs, which form the input of the diminished-1 adder. The architecture of the translator is given in Fig 4.12.

Fig 4.12 Architecture of translator

4.7.2 TRANSLATOR CIRCUITS

The translator consists of FAF and FA+ cells. The input values pass through these cells, which reduce the operand width by 1 bit and produce a proper input for the diminished-1 adder. Fig 4.13 shows the structure of the basic cells in the translator [1].

Fig 4.13 Basic cells in translator
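The report does not spell out the logic of the translator cells, so the following is only a sketch under a stated assumption: that the constant is absorbed by full adders with one input tied to 1, for which the sum bit reduces to an XNOR and the carry bit to an OR. The function name `csa_with_ones` is hypothetical, and only the low-order n bits are modelled.

```python
def csa_with_ones(a, b, n):
    """Carry-save addition of two n-bit values and the n-bit all-ones constant.
    Assumed cell: a full adder with one input fixed to 1, so that
    sum bit = XNOR(a_i, b_i) and carry bit = OR(a_i, b_i)."""
    mask = (1 << n) - 1
    sum_vec = ~(a ^ b) & mask        # XNOR per bit
    carry_vec = (a | b) & mask       # OR per bit, weight 2 relative to sum_vec
    return sum_vec, carry_vec

n = 4
y, u = csa_with_ones(0b1010, 0b0110, n)
# the two vectors together represent a + b + (2^n - 1)
assert y + 2 * u == 0b1010 + 0b0110 + (2**n - 1)
```

Adding 2^n - 1 is the same as subtracting 2^n + 1 modulo 2^(n+1), which is why an all-ones constant on the n low-order bits can stand in for the subtraction described above. No carry chain is involved: each output bit depends on a single bit position.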

4.7.3 TRANSLATOR FROM MODULO 2^n + 1 TO THE PROPOSED REPRESENTATION

Let be a binary number with and

the targeted representation. The zero indication bit can

be computed by : ,

while , or equivalently .

The last relation reveals that can be computed by a modulo adder, that

accepts as inputs the all 1s operand and the n least significant bits of operand and

as carry input the signal. Assuming an inclusive-OR implementation of the adder,

we have that and . Therefore, utilizing we get that the carry at each

position is given by :

The latter relation reveals that the adder required for implementing a translator from the binary system to the adopted representation is composed of an exclusive-NOR gate per bit and a carry computation unit that is easily implemented as trees of NOR gates.

4.7.4 DIMINISHED-1 ADDER

Depending on the input/output data representations, modulo 2^n + 1 addition methods can be classified into two categories: diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from and to the binary weighted system. The weighted representation, on the other hand, uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires larger area than the diminished-1 representation.

The general operations in modulo 2^n + 1 addition, including diminished-1 and weighted modulo addition, have been discussed in the literature, and efficient parallel-prefix adders for diminished-1 modulo 2^n + 1 addition have been proposed. To improve the area-time and time-power products, a circular carry selection scheme was used to select the correct carry-in signals for the final modulo addition. These methods all deal with diminished-1 modulo addition, but the hardware for decreasing the inputs and increasing the outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, so a zero-detection circuit is required to avoid incorrect computation. In the architecture of Vergos and Bakalis, translators first decrease the sum of two n-bit inputs A and B by 1, and the weighted modulo 2^n + 1 addition is then performed using diminished-1 adders. For this architecture, the range of the two inputs A and B is smaller than that proposed by Vergos and Efstathiou (i.e. {0, 2^n - 1} versus {0, 2^n}). This work uses an improved area-efficient weighted modulo 2^n + 1 adder design based on diminished-1 adders with simple correction schemes.

A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay. The scheme can also be applied to the design of residue generators (RGs) and multi-operand modulo adders (MOMAs). The resulting arithmetic components remove at least a whole parallel adder from the critical path of the currently most efficient proposals. Experimental results indicate savings of more than 30% in execution time and approximately 19% in implementation area when the proposed architectures are used.

The diminished-1 adder can be built on different parallel-prefix trees, such as Kogge-Stone, Sklansky and Brent-Kung. The works discussed above are based on ASIC technology. Vitoroulis investigated the performance of parallel-prefix adders implemented in FPGA technology and reported the area requirements and critical path delay for a variety of classical parallel-prefix adder structures. However, the parallel-prefix trees were implemented as single adders, not as part of bigger designs. The diminished-1 adder's result forms the least significant bits of the weighted sum, and the indication of complementary input vectors at the diminished-1 adder is the most significant bit of the weighted sum.
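How the diminished-1 adder output and the complementary-input indication combine into a weighted result can be sketched as follows. This is a behavioural model under the assumption that the two n-bit vectors Y and U delivered by the translator satisfy A + B ≡ Y + U + 1 (mod 2^n + 1); the function names are illustrative.

```python
def dim1_core(y, u, n):
    """n-bit addition with inverted end-around carry (diminished-1 adder core)."""
    mask = (1 << n) - 1
    total = y + u
    cout = total >> n
    return ((total & mask) + (1 - cout)) & mask

def weighted_from_dim1(y, u, n):
    """Return (y + u + 1) mod (2^n + 1) as an (n + 1)-bit weighted value."""
    mask = (1 << n) - 1
    msb = 1 if (y ^ u) == mask else 0   # Y and U are bitwise complementary
    low = dim1_core(y, u, n)            # n least significant bits
    return (msb << n) | low

assert weighted_from_dim1(0b1010, 0b0101, 4) == 16   # complementary inputs give 2^n
assert weighted_from_dim1(3, 4, 4) == 8
```

When Y and U are complementary, their sum is 2^n - 1, so the result is exactly 2^n: the low n bits are zero and only the most significant bit is set. In every other case the result fits in n bits.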

Parallel-prefix networks are widely used in high-performance adders. Networks in the literature represent trade-offs between the number of logic levels, fanout and wiring tracks. Adders using these networks are compared using the method of logical effort. The new architecture is competitive in latency and area for some technologies. Common prefix computations include addition, incrementation, priority encoding, etc. [1].

4..7..3.1 Brent kung parallel prefix tree

The Brent-Kung adder is a parallel prefix form carry look-ahead adder The

Brent-Kung adder is a parallel prefix adder that requires 2(log2N)-1 stages. It

was originally proposed as a simple and regular design of a parallel adder that

addresses the problems of connecting gates in a way to minimize chip area.

Accordingly, it is considered one of the better tree adders for minimizing wiring

tracks, fanout, and gate count and is used as a basis for many other networks.,To

implement a parallel prefix tree, we need half-adder to calculate generated-carry and

propagated-carry at each bit position. Then, using these carry signals, we need

some other cells to compute group-generated carries and group propagated carries.

shows some gate-level basic cells which calculate group-propagated carry Pi:j and

groupgenerated carry Gi:j in the parallel prefix tree’s intermediate stages. In, the

quadrate cell calculates Pi:j and Gi:j simultaneously whereas the triangular cell just

calculates Gi:j . Therefore the circuit of the quadrate cell is more complex than that of

39

Page 40: Report

the triangular cell. With the help of these basic cells, the rough implementation of

Brent–Kung tree. We use HAi (0 ≤ i ≤ 7) to denote Half adder. Here, we do not take

the buffers into account. Here, for a regular parallel prefix adder which does addition

of two addends, we always assume that the incoming carry into this adder is c0 = 0.

For two N-bit binary addends

x = (xn−1xn−2, . . . , x0), y = (yn−1yn−2, . . . , y0), the formulations of computing

carry and sum at bit position i in parallel prefix tree are ci = Gi−1:0 _ (Pi−1:0 ^ c0), si

= Pi ⊕ ci , where 0 ≤ i ≤ n − 1. Because c0 = 0, we have ci = Gi−1:0 _ (Pi−1:0 ^ c0) =

Gi−1:0. That is why we can use two different basic cells in to build the regular Brent–

Kung tree in. The idea is that sometimes only the signal Gi−1:0 is needed, therefore

the triangular cell which is more simple can be used to reduce the complexity.

Vitoroulis compared the performance and area for regular parallel prefix trees which

are implemented on FPGA technology. But when the parallel prefix trees are

implemented as components of our EAC adder they cannot be designed in the

regular way . Both Gi:0 and Pi:0 should be kept as the outputs for reuse in the next

stage. For example, if we want to use Brent–Kung tree as the component in the EAC

adder, which means the parallel prefix tree is implemented using Brent–Kung tree,

we can only use the quadrate cell to calculate the signals in the intermediate stages.

We must change the regular design of Brent–Kung tree the rough architecture of the

modified Brent–Kung tree adopted. Therefore on FPGA technology, the properties of

the different parallel prefix trees such as area and performance will be different from

the results listed in Vitoroulis’s report. correction schemes. Diminished-1 adder can

be used for the modulo 2n +1 addition of two n-bit operands in the weighted

representation, if it is driven by operands whose sum has been decreased by 1. This

scheme outperforms solutions that are based on the use of binary adders and/or

weighted modulo 2n + 1 adders in both area and delay terms As a result, if we

implement different parallel prefix trees in our EAC adder, we should first change the

implementation of the parallel prefix tree itself; then, we also should take into

account the relationship between the parallel prefix trees and the other parts of the

EAC adder. Parallel prefix networks are widely used in high-performance adders. Networks in the literature represent tradeoffs between the number of logic levels, fanout, and wiring tracks. Adders using these networks are compared using the method of logical effort. The new architecture is competitive in latency and


area for some technologies. Common prefix computations include addition, incrementation, and priority encoding.

Fig 4.14 8-bit Brent-Kung tree

The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fanout is among the lowest (f = 0), and so is its number of wiring tracks (t = 0). The cost is the extra L − 1 logic levels. The adder consists of half adders and two kinds of basic cells, a square (quadrate) cell and a triangular cell, which produce the generate and propagate signals for the least significant i bits. The output of the translator is given to the half adders, then passes to the quadrate cells, and then to the triangular cells.

Equations: (G_0, P_0) = (g_0, p_0), with g_i = A_i · B_i and p_i = A_i ⊕ B_i. For i > 0:

(G_i, P_i) = (g_i, p_i) • (G_{i−1}, P_{i−1}) = (g_i, p_i) • (g_{i−1}, p_{i−1}) • ... • (g_0, p_0)
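As an illustration of the prefix operator •, the following behavioural sketch builds a Brent-Kung style network for a power-of-two width and returns every (G_{i:0}, P_{i:0}) pair. The names `dot` and `brent_kung` are hypothetical, and every node is modelled as a quadrate cell (both G and P are kept); it is a software model, not the report's HDL.

```python
def dot(hi, lo):
    """Prefix operator: (g, p) . (g2, p2) = (g | (p & g2), p & p2)."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)


def brent_kung(gp):
    """Return [(G_{i:0}, P_{i:0})] for a power-of-two-length list of (g_i, p_i)."""
    n = len(gp)
    node = list(gp)
    # Up-sweep: combine groups at distance 1, 2, 4, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            node[i] = dot(node[i], node[i - d])
        d *= 2
    # Down-sweep: complete the positions the up-sweep skipped.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            node[i] = dot(node[i], node[i - d])
        d //= 2
    return node
```

For n = 8 this uses three up-sweep levels and two down-sweep levels, five in total.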


Fig 4.15 Basic cells in the Brent-Kung tree based diminished-1 adder

Fig 4.15 shows gate-level circuits of the basic cells that calculate P_{i:j} and G_{i:j} in the intermediate stages of the parallel prefix tree. The quadrate cell calculates P_{i:j} and G_{i:j} simultaneously, while the triangular cell calculates only G_{i:j}, so the circuit of the quadrate cell is more complex than that of the triangular cell. Using these basic cells, the rough implementation of the Brent-Kung tree is shown in Fig. 4.10, where HA_i (0 ≤ i ≤ 7) denotes a half adder. For a regular parallel prefix adder that only adds two addends, the incoming carry is c_0 = 0. For two n-bit binary addends x = (x_{n−1} x_{n−2} ... x_0) and y = (y_{n−1} y_{n−2} ... y_0), the carry and sum at bit position i in the parallel prefix tree are c_i = G_{i−1:0} ∨ (P_{i−1:0} ∧ c_0) and s_i = P_i ⊕ c_i, where 0 ≤ i ≤ n − 1. Because c_0 = 0, we have c_i = G_{i−1:0}. That is why two different basic cells can be used to build the Brent-Kung tree. The performance and area of a regular parallel prefix tree implemented on FPGA technology are good. But when parallel prefix trees are implemented as components of the EAC adder, they cannot be designed in the regular way: both G_{i:0} and P_{i:0} must be kept as outputs for use in the next stage. For example, if we want


to implement the Brent-Kung tree as the parallel prefix tree in the EAC adder, we must use only one basic cell, the quadrate cell.
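Why both G_{i:0} and P_{i:0} must survive can be seen in a small behavioural sketch of end-around-carry addition. This models the plain (non-inverted) end-around carry as a simplifying assumption; the function name `eac_add` is hypothetical, and the group signals are accumulated serially rather than in a tree.

```python
def eac_add(x: int, y: int, n: int) -> int:
    """End-around-carry addition: (x + y + carry_out) mod 2**n."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]

    # Group signals G_{i:0} and P_{i:0}; both are kept (quadrate cells only).
    G, P = [g[0]], [p[0]]
    for i in range(1, n):
        G.append(g[i] | (p[i] & G[i - 1]))
        P.append(p[i] & P[i - 1])

    c_in = G[n - 1]  # carry-out of the first pass, fed back as carry-in
    total = 0
    for i in range(n):
        # With a carry-in that may be 1, P_{i-1:0} is needed as well as G_{i-1:0}.
        c_i = c_in if i == 0 else G[i - 1] | (P[i - 1] & c_in)
        total |= (p[i] ^ c_i) << i
    return total
```

If the triangular cell had discarded P_{i−1:0}, the term P_{i−1:0} ∧ c_in could not be formed in the second stage.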

4.7.4 CORRECTION CIRCUIT

The reason for FIX is that, under some conditions, y'_{n−1} = 2 (e.g., a_n = b_n = 1 and a_{n−1} = b_{n−1} = 0), which cannot be represented by a 1-bit line (marked "∗" in Table 4.4). Therefore, the value of y'_{n−1} is set to 1, and the remaining carry value (i.e., 1) is assigned to FIX. Notice that FIX is wired-OR with the carry-out of Y' + U' (i.e., c_out) to form the inverted end-around carry, ¬(c_out ∨ FIX), which is the carry-in for the later diminished-1 addition stage. When y'_{n−1} = 2, FIX = 1; otherwise, FIX = 0. According to Table 4.4, we have y'_{n−1} = a_n ∨ b_n ∨ a_{n−1} ∨ b_{n−1}, u'_{n−1} = ¬(a_{n−1} ⊕ b_{n−1}), and FIX = a_n b_n ∨ b_n a_{n−1} ∨ a_n b_{n−1}. Based on the aforementioned, our proposed weighted modulo 2^n + 1 addition of A and B is equivalent to

Fig 4.16 Correction circuit

The correction circuit consists of AND and OR gates. The FIX signal can therefore be computed in parallel with the translation to Y' + U', leading to efficient correction.


Table 4.4 Truth table for FIX

a_n   b_n   a_{n-1}   b_{n-1}   u'_{n-1}   y'_{n-1}   FIX
 0     0       0         0         1          0        0
 0     0       0         1         0          1        0
 0     0       1         0         0          1        0
 0     0       1         1         1          1        0
 0     1       0         0         1          1        0
 0     1       0         1         X          X        X
 0     1       1         0         0          1*       1
 0     1       1         1         X          X        X
 1     0       0         0         1          1        0
 1     0       0         1         0          1*       1
 1     0       1         0         X          X        X
 1     0       1         1         X          X        X
 1     1       0         0         1          1*       1
 1     1       0         1         X          X        X
 1     1       1         0         X          X        X
 1     1       1         1         X          X        X

X: don't care (the input combination does not occur). *: y'_{n-1} = 2, stored as 1 with the extra carry held in FIX.
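The Boolean expressions for the correction bits can be checked against the valid rows of Table 4.4 with a short sketch. The function name `correction_bits` and the dictionary `TABLE` are hypothetical; u'_{n-1} is modelled as an XNOR because that is what the table's u' column shows.

```python
def correction_bits(a_n, b_n, a_n1, b_n1):
    """Return (u', y', FIX) for the two most significant bits of A and B."""
    y = a_n | b_n | a_n1 | b_n1
    u = 1 - (a_n1 ^ b_n1)  # XNOR, matching the u' column of the table
    fix = (a_n & b_n) | (b_n & a_n1) | (a_n & b_n1)
    return u, y, fix


# Valid rows of Table 4.4: (a_n, b_n, a_{n-1}, b_{n-1}) -> (u', y', FIX)
TABLE = {
    (0, 0, 0, 0): (1, 0, 0),
    (0, 0, 0, 1): (0, 1, 0),
    (0, 0, 1, 0): (0, 1, 0),
    (0, 0, 1, 1): (1, 1, 0),
    (0, 1, 0, 0): (1, 1, 0),
    (0, 1, 1, 0): (0, 1, 1),
    (1, 0, 0, 0): (1, 1, 0),
    (1, 0, 0, 1): (0, 1, 1),
    (1, 1, 0, 0): (1, 1, 1),
}

print(all(correction_bits(*k) == v for k, v in TABLE.items()))  # True
```

The don't-care rows are left out because an (n + 1)-bit operand in the range [0, 2^n] cannot have bit n and bit n−1 set at the same time.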



CHAPTER-5

5.1 RESULTS AND ANALYSIS

5.1.1 The waveform of the translator is given below:

5.1.2 The output waveform of the modulo 2^n + 1 adder without the correction scheme is shown below:


5.1.3 The output waveform of the modulo 2^n + 1 adder with the correction scheme is given below:

For the 8-bit case (n = 8), sums up to 256 appear unchanged; beyond that, the output wraps around to 0, 1, and so on.
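The expected wrap-around can be stated as a tiny reference model, which is the kind of golden model a testbench could compare the waveform against. The function name `weighted_mod_add` is hypothetical and the model is purely arithmetic; it says nothing about the hardware structure.

```python
def weighted_mod_add(a: int, b: int, n: int = 8) -> int:
    """Reference result of weighted modulo 2**n + 1 addition."""
    m = (1 << n) + 1  # 257 for n = 8
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must lie in [0, 2**n]")
    return (a + b) % m


print(weighted_mod_add(200, 56))  # 256
print(weighted_mod_add(200, 57))  # 0
print(weighted_mod_add(200, 58))  # 1
```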

5.2 SYNTHESIS REPORT

5.2.1 SKLANSKY-STYLE PARALLEL PREFIX STRUCTURE

Target Device: XA3S250E-4VQG100

NUMBER OF SLICES: 30 OUT OF 2448 1%

NUMBER OF 4 INPUT LUTS: 52 OUT OF 4896 1%

NUMBER OF IOS: 27

NUMBER OF BONDED IOBS: 27 OUT OF 66 40%


5.2.2 BRENT-KUNG STYLE PARALLEL PREFIX STRUCTURE

Target Device: XA3S250E-4VQG100

NUMBER OF SLICES: 24 OUT OF 2448 0%

NUMBER OF 4 INPUT LUTS: 44 OUT OF 4896 0%

NUMBER OF IOS: 27

NUMBER OF BONDED IOBS: 27 OUT OF 66 40%
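The two reports can be compared with a few lines of arithmetic. The resource counts below are copied from the figures above; the helper names are hypothetical, and truncation to a whole percent is an assumption about how the tool prints utilization (it is consistent with every percentage shown).

```python
# Resource counts copied from the two synthesis reports (XA3S250E-4VQG100).
USED = {
    "Sklansky":   {"slices": 30, "luts": 52},
    "Brent-Kung": {"slices": 24, "luts": 44},
}
AVAILABLE = {"slices": 2448, "luts": 4896}


def utilization_percent(style: str, resource: str) -> int:
    """Utilization truncated to a whole percent (assumed reporting style)."""
    return int(100 * USED[style][resource] / AVAILABLE[resource])


def saving_percent(resource: str) -> float:
    """Relative saving of Brent-Kung over Sklansky for one resource."""
    s = USED["Sklansky"][resource]
    b = USED["Brent-Kung"][resource]
    return 100 * (s - b) / s


print(round(saving_percent("slices"), 1))  # 20.0
print(round(saving_percent("luts"), 1))    # 15.4
```

On these figures the Brent-Kung structure uses about 20% fewer slices and about 15% fewer 4-input LUTs, and the reported 0% is 24/2448 (just under 1%) truncated.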


CHAPTER-6

CONCLUSION

An improved area-efficient weighted modulo 2^n + 1 adder has been designed with a Brent-Kung parallel prefix tree based diminished-1 adder. This has been achieved by modifying existing diminished-1 modulo adders to incorporate simple correction schemes. The proposed adders perform weighted modulo 2^n + 1 addition and produce sums within the range [0, 2^n]. They do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition. This is achieved by subtracting the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers, producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed by parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. The correction scheme includes a FIX value: FIX is 1 when the most significant input bits produce a value that cannot be held in one bit (y'_{n−1} = 2), and 0 otherwise. The main modules are the translator, the diminished-1 adder, and the correction scheme. The Brent-Kung prefix structure uses less area than the Sklansky prefix structure (24 slices and 44 four-input LUTs versus 30 slices and 52 four-input LUTs on the XA3S250E-4VQG100). The proposed adders have been implemented using 0.13-μm CMOS technology, and synthesis results show that they require less area than previously reported weighted modulo 2^n + 1 adders under the same delay constraints.
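The role of the inverted end-around carry, and the reason a plain diminished-1 adder needs separate zero detection, can be illustrated with the well-known diminished-1 addition rule. This is a behavioural sketch under that assumption, not the report's translator or correction circuit; the function name `dim1_add` is hypothetical.

```python
def dim1_add(a_star: int, b_star: int, n: int) -> int:
    """Diminished-1 modulo 2**n + 1 addition of nonzero operands.

    a_star = A - 1 and b_star = B - 1 are n-bit values representing A and B
    in [1, 2**n]. The result is S - 1, valid whenever S = (A + B) mod
    (2**n + 1) is nonzero; a zero result needs separate handling.
    """
    mask = (1 << n) - 1
    t = a_star + b_star
    c_out = t >> n                  # carry-out of the n-bit addition
    return (t + (1 - c_out)) & mask  # add the inverted end-around carry


print(dim1_add(4, 6, 4))  # 11: A = 5, B = 7, sum 12, stored as 12 - 1
```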

REFERENCES

[1] H. T. Vergos and C. Efstathiou, "A unifying approach for weighted and diminished-1 modulo 2^n + 1 addition," IEEE Trans. Circuits Syst., Oct. 2008.

[2] M. A. Soderstrand and W. K. Jenkins, "Residue Number System Arithmetic: Modern Applications in Digital Signal Processing."

[3] Somayeh Timarchi and Keivan Navi, "Improved Modulo 2^n + 1 Adder Design," International Journal of Computer and Information Engineering, 2:7, 2008.

[4] Jun Chen, "Parallel-prefix structures for binary and modulo," December 2008.

[5] Reto Zimmermann and David Q. Tran, "Optimized synthesis of sum-of-products," Asilomar Conference on Signals, Systems, and Computers, November 2003.

[6] Feng Liu, Fariborz F.F, and Otmane Ait Mohamed, "A Comparative Study of Parallel Prefix Adders in FPGA Implementation of EAC," 12th Euromicro Conference on Digital System Design, 2009.

[7] F. Liu and Q. Tan, "Field programmable gate array prototyping of end-around carry parallel prefix tree architectures," IET Computers & Digital Techniques, received 27 March 2009.

[8] J. Sklansky, "Conditional sum addition logic," IRE Trans. Electron. Comput., June 1960.

[9] Amir Sabbagh Molahosseini, Keivan Navi, Chitra Dadkhah, Omid Kavehei, and Somayeh Timarchi, "Efficient Reverse Converter Designs for the New 4-Moduli Sets," IEEE Transactions on Circuits and Systems, April 2010.

[10] "Residue number system," World Scientific Publishing Pvt. Ltd. http://www.worldscibooks.com/engineering/p523.html

[11] H. T. Vergos and D. Nicholas, "Diminished-one modulo 2^n + 1 adder design," IEEE Trans. Comput., Dec. 2002.

[12] T. B. Juang and M. Y. Tsai, "Corrections on VLSI design of diminished-one modulo 2^n + 1 adder using circular carry selection."

[13] R. Zimmermann, "Efficient VLSI implementation of modulo 2^n ± 1 addition and multiplication," in Proc. 14th IEEE Symp. Comput. Arithmetic, Apr. 1999.

[14] A. S. Madhukumar and F. Chin, "Enhanced architecture for residue number system-based CDMA for high-rate data transmission," IEEE Trans. Wireless Commun., Sep. 2004.

[15] G. L. Bernocchi and G. C. Cardarilli, "Low-power adaptive filter based on RNS components," in Proc. IEEE ISCAS, May 2007.


[16] N. Kostaras and H. T. Vergos, "KoVer: A sophisticated residue arithmetic core generator," in Proc. 16th IEEE Int. Workshop Rapid Syst. Prototyp., 2005.

[17] T. Keller, T. H. Liew, and L. Hanzo, "Adaptive redundant residue number system coded multicarrier modulation," IEEE J. Sel. Areas Commun., Nov. 2000.

[18] G. C. Cardarilli, A. Nannarelli, and M. Re, "Reducing power dissipation in FIR filters using the residue number system," in Proc. 43rd IEEE Midw. Symp. Circuits Syst., Jan. 2000.
