deepak-anilkumar
CHAPTER-1 INTRODUCTION
The residue number system (RNS) has been employed for efficient parallel carry-free arithmetic computations (addition, subtraction, and multiplication) in DSP applications, because the computations for each residue channel can be done independently, without carry propagation. A residue number system is defined by a set of N integer constants, {m_1, m_2, m_3, ..., m_N}, referred to as the moduli. Let M be the least common multiple of all the m_i. Any arbitrary integer X smaller than M can be represented in the defined residue number system as a set of N smaller integers {x_1, x_2, x_3, ..., x_N}, with x_i = X modulo m_i representing the residue class of X to that modulus.
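As a small illustration of this definition, the Python sketch below maps an integer to its residues (forward conversion). The moduli set {7, 8, 9}, i.e. {2^n − 1, 2^n, 2^n + 1} with n = 3, and the helper name `to_rns` are our own choices for illustration only.

```python
from math import lcm

def to_rns(x: int, moduli: list[int]) -> list[int]:
    """Forward conversion: residues x_i = x mod m_i for each modulus m_i."""
    return [x % m for m in moduli]

# Assumed example: the moduli set {2^n - 1, 2^n, 2^n + 1} with n = 3.
n = 3
moduli = [2**n - 1, 2**n, 2**n + 1]   # [7, 8, 9]
M = lcm(*moduli)                      # 504, so 0 <= X < 504 is representable

print(to_rns(100, moduli))  # [2, 4, 1]
```

Because 7, 8 and 9 are pairwise relatively prime, M equals their product, and each of the 504 integers in the range gets a distinct residue triple.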
Because RNS-based computations can achieve significant speedup over binary-system-based computation, they are widely used in DSP processors, FIR filters, and communication components. Arithmetic modulo 2^n + 1 is one of the most common RNS operations and is also used in pseudorandom number generation and cryptography. Modulo 2^n + 1 addition is the most crucial step among the commonly used moduli sets, such as {2^n − 1, 2^n, 2^n + 1}, {2^n − 1, 2^n, 2^n + 1, 2^{2n} + 1} and {2^n − 1, 2^n, 2^n + 1, 2^{n+1} + 1}.

There are many previously reported methods to speed up modulo 2^n + 1 addition. Depending on the input/output data representations, these methods can be classified into two categories, namely, diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation. General operations in modulo 2^n + 1 addition, including diminished-1 and weighted modulo addition, have been discussed in the literature, and parallel-prefix adders have been proposed for diminished-1 modulo 2^n + 1 addition. To improve the area-time and time-power products, a circular carry selection scheme has been used to efficiently select the correct carry-in signals for the final modulo addition.

The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. This leads to increased hardware cost. A unified approach for weighted and diminished-1 modulo 2^n + 1 addition has been proposed; it is based on making the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be used to perform weighted modulo 2^n + 1 addition of Y and U. Another design first uses translators to decrease the sum of two n-bit inputs A and B by 1 and then performs the weighted modulo 2^n + 1 addition using diminished-1 adders. It should be noted that, for this architecture, the ranges of the two inputs A and B are smaller than in the unified approach (i.e., {0, 2^n − 1} versus {0, 2^n}). In this work, we propose an improved area-efficient weighted modulo 2^n + 1 adder design using diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed using parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. Compared with the previous weighted designs, the area cost of our proposed adders is lower. In addition, our proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition.
CHAPTER-2
AIM AND SCOPE OF PROJECT
In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation. To improve the area-time and time-power products, a circular carry selection scheme has been used to efficiently select the correct carry-in signals for the final modulo addition. The aforementioned methods all deal with diminished-1 modulo addition. However, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. This leads to increased hardware cost. The Brent-Kung tree-based prefix structure uses less area than the Sklansky-style prefix structure. The unified approach for weighted and diminished-1 modulo 2^n + 1 addition is based on making the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. Thus, any diminished-1 adder can be used to perform weighted modulo 2^n + 1 addition of Y and U. In another design, the authors first used translators to decrease the sum of two n-bit inputs A and B by 1 and then performed the weighted modulo 2^n + 1 addition using diminished-1 adders. In this design, we combine the two previous modulo 2^n + 1 adder approaches (diminished-1 and weighted) to reduce the area and improve the performance.
CHAPTER-3
EXISTING METHODOLOGY
3.1 THEORY
Residue arithmetic has been used in digital computing systems for many years. In particular, arithmetic modulo 2^n + 1 appears to play an important role in a variety of applications. Modulo 2^n + 1 arithmetic is most commonly met in the residue number system (RNS), which is an arithmetic system well suited to applications in which the operations are limited to addition, subtraction and multiplication, a common case for several digital signal processor (DSP) algorithms. The RNS has been used for the design of digital signal processors, finite-impulse response (FIR) filters and communication components [16].
Three-moduli sets of the form {2^n − 1, 2^n, 2^n + 1} have received significant attention as RNS bases, mainly because efficient residue-to-binary converters exist for them. Addition in such systems is performed using three channels, which are, in fact, a modulo 2^n − 1 (equivalently, one's-complement) adder, a modulo 2^n adder and a modulo 2^n + 1 adder. We therefore conclude that the design of an efficient modulo 2^n + 1 adder is a vital task in RNS-based applications that include a modulus of the form 2^n + 1. Unfortunately, in an RNS that uses the three-moduli set {2^n − 1, 2^n, 2^n + 1}, the modulo 2^n + 1 channel becomes the execution-rate bottleneck, since it has to deal with (n + 1)-bit operands, while the other two channels operate on n-bit ones.

The diminished-1 representation was introduced to alleviate this problem: each operand is represented decreased by one compared with its weighted representation, and the results are derived in an alternative manner when one or both operands or the result are zero. With A* and B* denoting the diminished-1 operands, the diminished-1 sum S* is then computed as $S^* = |A^* + B^* + \overline{c_{out}}|_{2^n}$ by a diminished-1 adder, i.e., an adder that increments the integer sum of A* and B* whenever the carry flag of their integer addition is not set. A diminished-1 adder can be derived by connecting the inverted carry output of an integer adder back to its carry input. However, such solutions are inefficient due to the resulting oscillations. Therefore, a number of efficient architectures that do not suffer from oscillations have been proposed. The need for handling zero operands and results separately, as well as the need for time- and hardware-consuming input (output) translators from (to) the weighted to (from) the diminished-1 representation, make the use of the diminished-1 representation efficient only when a large number of calculations take place before a new conversion is required. In all other cases, including all applications apart from RNS implementations, modulo adders with operands in the weighted representation are more suitable. Efficient architectures for modulo adders with operands in the weighted representation have also been proposed.
These two cases, namely modulo adders that operate on operands in the diminished-1 representation (hereafter called diminished-1 adders) and those that operate on operands that follow a weighted representation (hereafter called weighted adders), have so far been considered distinct cases, and efficient architectures for them have been studied independently. It has been shown that these two alternatives can be unified. Given two (n + 1)-bit numbers A and B, the problem of computing two n-bit numbers Y and U such that Y + U + 1 is congruent to A + B modulo 2^n + 1 is attacked. This problem has a constant-time solution, enabling every architecture that has been or will be proposed for diminished-1 addition to also be used for the addition of operands in the weighted representation. The required unifying arithmetic operator is just a simplified inverted end-around carry-save adder (CSA) stage [12], [15].
Fig 3.1 CSA stage with inverted end-around carry
3.2 REVIEW OF TWO PREVIOUS WEIGHTED MODULO 2^n+1 ADDERS
Given two (n + 1)-bit numbers A and B, where 0 ≤ A, B ≤ 2^n, the diminished-1 values of A and B are denoted by A* = A − 1 and B* = B − 1, respectively. The diminished-1 sum S* can be computed by

$$S^* = |S - 1|_{2^n+1} = |A + B - 1|_{2^n+1} = |A^* + B^* + \overline{c_{out}}|_{2^n} \qquad (1)$$

where |X|_Z denotes X modulo Z, and $\overline{c_{out}}$ denotes the inverted end-around carry of the diminished-1 modulo 2^n sum of the n-bit A* and B*.
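Eq. (1) can be checked exhaustively for a small word length. The Python sketch below is a behavioral model only (not a gate-level circuit), the function name `dim1_add` is hypothetical, and, as discussed above, zero operands and a zero result are excluded because they need separate handling.

```python
def dim1_add(a_star: int, b_star: int, n: int) -> int:
    """Behavioral model of Eq. (1): S* = |A* + B* + not(c_out)|_{2^n}.

    a_star, b_star are n-bit diminished-1 operands (A - 1, B - 1).
    Zero operands and a zero result are NOT handled (they need a zero flag).
    """
    mask = (1 << n) - 1
    total = a_star + b_star               # n-bit integer addition
    c_out = total >> n                    # its carry out (0 or 1)
    return (total + (1 - c_out)) & mask   # add the inverted end-around carry

# A = B = 16, n = 4: |32|_17 = 15, so S* = 14.
print(dim1_add(15, 15, 4))  # 14
```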
3.2.1 VERGOS AND EFSTATHIOU
In this method, the congruent modulo sum of A + B is first computed to produce Y and U, and the final modulo sum is then performed by any diminished-1 modulo adder. Suppose A and B are two (n + 1)-bit input numbers, i.e., A = a_n a_{n−1} ... a_0 = a_n × 2^n + A_n and B = b_n b_{n−1} ... b_0 = b_n × 2^n + B_n, where 0 ≤ A, B ≤ 2^n, and A_n and B_n are two n-bit numbers; then

$$|A + B|_{2^n+1} = \big||A_n + B_n + D + 1|_{2^n+1} + 1\big|_{2^n+1} = |Y + U + 1|_{2^n+1}, \qquad D = 2^n - 4 + 2\overline{c_{n+1}} + \overline{s_n},$$

i.e., D = 11...1 $\overline{c_{n+1}}\,\overline{s_n}$ in binary, where c_{n+1} = a_n • b_n (• denotes the logic AND operation) and s_n = a_n ⊕ b_n (⊕ denotes the logic EXCLUSIVE-OR operation); $\overline{c_{n+1}}$ and $\overline{s_n}$ are the bits of D with binary weights 2^1 and 2^0, respectively. The first step of this equation computes a modulo 2^n + 1 carry-save addition, giving the carry vector Y and the sum vector U, where $Y = y_{n-2} y_{n-3} \ldots y_0 \overline{y_{n-1}}$ and $U = u_{n-1} u_{n-2} \ldots u_0$ are produced by adding A_n, B_n and D. Since the bits of D with binary weights 2^2 through 2^{n−1} are all 1, the adders at these bit positions simplify to OR gates (carries) and XNOR gates (sums). At the bits of D with binary weights 2^1 and 2^0, the adders must be modified to accept the values $\overline{c_{n+1}}$ and $\overline{s_n}$, respectively.
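To see that this choice of D works, the Python sketch below computes Y and U as bit vectors and can be used to check |Y + U + 1|_{2^n+1} = |A + B|_{2^n+1} for every valid input pair. It is a behavioral model under our own naming (`vergos_efstathiou_yu` is hypothetical), not a gate-level description of Fig 3.2. It assumes n ≥ 2 and places the inverted carry y_{n−1} at the least significant bit of Y, as in the vector notation above.

```python
def vergos_efstathiou_yu(a: int, b: int, n: int) -> tuple[int, int]:
    """Return n-bit vectors (Y, U) with |Y + U + 1|_{2^n+1} = |A + B|_{2^n+1}
    for (n+1)-bit inputs 0 <= A, B <= 2^n (behavioral sketch, n >= 2)."""
    mask = (1 << n) - 1
    an, bn = a >> n, b >> n                 # most significant bits a_n, b_n
    a_low, b_low = a & mask, b & mask       # A_n and B_n
    c = an & bn                             # c_{n+1} = a_n AND b_n
    s = an ^ bn                             # s_n = a_n XOR b_n
    # D = 2^n - 4 + 2*not(c) + not(s): all ones except the two LSBs.
    d = (mask & ~0b11) | ((1 - c) << 1) | (1 - s)
    u = a_low ^ b_low ^ d                                   # sum vector U
    carries = (a_low & b_low) | (a_low & d) | (b_low & d)   # carry out of each bit
    y_top = (carries >> (n - 1)) & 1                        # carry leaving bit n-1
    # Shift carries left, drop y_top, and re-inject it inverted at the LSB.
    y = ((carries << 1) & mask) | (1 - y_top)
    return y, u

# A = B = 8 with n = 3: (1 + 5 + 1) mod 9 = 7 = 16 mod 9.
print(vergos_efstathiou_yu(8, 8, 3))  # (1, 5)
```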
Fig 3.2 Architecture of Vergos and Efstathiou
3.2.2 VERGOS AND BAKALIS
In this method, the sum of the two n-bit inputs A and B is decreased by 1 to produce the diminished-1 values A' and B', and the modulo 2^n sum of A' and B' can be performed by any diminished-1 architecture, as follows:

$$\big||A + B|_{2^n+1}\big|_{2^n} = |A' + B' + \overline{c'_{out}}|_{2^n}$$

The value $\overline{c'_{out}}$ is the inverted end-around carry produced by A' + B', and the architecture is shown in Fig 3.3. This architecture makes use of a constant-time operator composed of a simplified carry-save adder stage, leading to efficient modulo 2^n + 1 adders, and it can be applied in the design of area-efficient residue generators and multioperand modulo adders. However, the values subtracted from the inputs A and B are not constants, and how to implement the translator that decreases the sum of the two inputs by 1 was not described. Moreover, the ranges of the two inputs A and B are smaller than in the earlier design (i.e., {0, 2^n − 1} versus {0, 2^n}) [1].
Fig 3.3 Architecture of Vergos and Bakalis
3.3 DIMINISHED-1 ADDER
A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay. The diminished-1 adder used here is of the Sklansky style, shown in Fig 3.4. The Sklansky-style parallel-prefix operation requires N/2 prefix operations at each stage of the tree. Since all operations at a given stage of the tree are completely independent, they can run in parallel, which is what makes this technique attractive for parallelizing associative functions. The Sklansky-style structure uses more area than the Brent-Kung tree prefix structure [1].
Fig 3.4 Sklansky-style parallel-prefix structure
Fig 3.5 Basic cells in Sklansky-style structure
The Sklansky prefix tree takes the fewest logic levels to compute the carries. In addition, it uses fewer cells than the Kogge-Stone structure, at the cost of higher fan-out. However, the Sklansky-style prefix structure uses a larger area than Brent-Kung tree parallel-prefix structures. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e., f = 3). The structure can be viewed as a compacted version of the Brent-Kung structure, in which the number of logic levels is reduced and the fan-out is increased. The Sklansky-style parallel-prefix structure is also used, with correction circuits, in our proposed weighted modulo 2^8 + 1 adder. In this structure, the square (□) and diamond (♦) nodes denote the pre- and postprocessing stages of the operands, respectively. The black nodes (•) evaluate the prefix operator, and the white nodes (◦) pass the unchanged signals to the next prefix level [1].
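The Sklansky prefix computation described above can be modelled behaviorally. At level l, every bit whose index has bit l set combines its (generate, propagate) pair with that of the last bit in the lower half of its 2^{l+1}-bit block, so ⌈log2 n⌉ levels suffice. The sketch below, including the name `sklansky_add` and the carry-in handling, is our own illustration of an ordinary binary adder; it does not model the diminished-1 end-around carry, the correction circuits, or fan-out.

```python
def sklansky_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """n-bit addition whose carries come from a Sklansky parallel-prefix tree.
    Returns (sum mod 2^n, carry out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # bit propagate
    G, P = g[:], p[:]
    level = 0
    while (1 << level) < n:
        for i in range(n):
            if (i >> level) & 1:                  # upper half of a 2^(level+1) block
                j = ((i >> level) << level) - 1   # last bit of the lower half
                # Prefix operator (black node): (G, P)_i o (G, P)_j.
                # Node j is not updated at this level, so in-place update is safe.
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        level += 1
    # Carry into bit i+1: group generate (i..0), or group propagate of cin.
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(n)]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]

print(sklansky_add(5, 3, 3))  # (0, 1): 5 + 3 = 8 overflows a 3-bit word
```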
CHAPTER-4
PROPOSED SYSTEM
4.1 INTRODUCTION
The proposed system is an area-efficient weighted modulo 2^n + 1 adder design that uses diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed using parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. The area cost of the proposed adders is lower than that of the previous weighted designs. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition.
4.2 THEORY
Instead of using the value D, which is not a constant (it depends on c_{n+1} and s_n; see Section 3.2.1), we add the constant value −(2^n + 1) to the sum of A and B. In addition, we allow the two inputs A and B to be in the range {0, 2^n}, which is 1 more than the range {0, 2^n − 1} of the method in Section 3.2.2. We now present the design of the proposed weighted modulo 2^n + 1 adder.

Given two (n + 1)-bit inputs A = a_n a_{n−1} ... a_0 and B = b_n b_{n−1} ... b_0, where 0 ≤ A, B ≤ 2^n, the weighted modulo 2^n + 1 sum of A + B can be represented as follows:
From these equations, it can be seen that the weighted modulo 2^n + 1 sum can be obtained by first subtracting (2^n + 1) from the sum of A and B (the constant is represented as 0111...1) and then using a diminished-1 adder to get the final modulo sum, with the inverted end-around carry as the carry-in. We now present the method of weighted modulo 2^n + 1 addition of A and B as follows.
Denote by Y' and U' the carry and sum vectors of the summation of A, B and −(2^n + 1), where $Y' = y'_{n-2} y'_{n-3} \ldots y'_0 \overline{y'_{n-1}}$ and $U' = u'_{n-1} u'_{n-2} \ldots u'_0$; the modulo addition can then be expressed as follows. For i = 0 to n − 2, the values of y'_i and u'_i are y'_i = a_i ∨ b_i and $u'_i = \overline{a_i \oplus b_i}$, respectively (∨ denotes the logic OR operation). Since Y' and U' are only n bits wide, the values of y'_{n−1} and u'_{n−1} must be computed taking a_n, b_n, a_{n−1} and b_{n−1} into consideration (i.e., y'_{n−1} and u'_{n−1} are the carry and the sum produced by 2a_n + 2b_n + a_{n−1} + b_{n−1} + 1, respectively). It should be noted that 0 ≤ A, B ≤ 2^n, which means that a_n = a_{n−1} = 1 or b_n = b_{n−1} = 1 would cause the value of A or B to exceed the range {0, 2^n}. Thus, these input combinations (i.e., a_n = a_{n−1} = 1 or b_n = b_{n−1} = 1) are not allowed and can be treated as don't-care conditions, which helps simplify the circuits for generating y'_{n−1} and u'_{n−1}. That is, the maximum value of 2a_n + 2b_n + a_{n−1} + b_{n−1} + 1 is 5, which occurs at a_n = b_n = 1 (i.e., the maximum value of y'_{n−1} is 2). The reason for FIX is that, under some conditions, y'_{n−1} = 2 (e.g., a_n = b_n = 1 and a_{n−1} = b_{n−1} = 0), which cannot be represented by a 1-bit line; therefore, the value of y'_{n−1} is set to 1, and the remaining value of the carry (i.e., 1) is assigned to FIX [1].
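The bit-level rules above can be checked with a short behavioral model. By our own reading of those rules, A + B + (2^n − 1) = U' + Σ_{i=0}^{n−2} y'_i 2^{i+1} + y'_{n−1}·2^n, where y'_{n−1} is here the full column carry (0, 1 or 2). Reducing modulo 2^n + 1 (with 2^n ≡ −1) then gives A + B ≡ Y' + U' + 1 − FIX, with the inverted y'_{n−1} at the LSB of Y'. How FIX is actually fed back is done by the correction circuits of the design and is not modelled: the sketch simply subtracts it, which is our assumption. The function name `proposed_yu_fix` is hypothetical.

```python
def proposed_yu_fix(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Carry vector Y', sum vector U' and FIX for (n+1)-bit inputs
    0 <= A, B <= 2^n, following the bit-level rules of Section 4.2 (n >= 2)."""
    y_low = 0
    u = 0
    for i in range(n - 1):                      # bits 0 .. n-2: constant bit is 1
        ai, bi = (a >> i) & 1, (b >> i) & 1
        u |= (1 - (ai ^ bi)) << i                # u'_i = XNOR(a_i, b_i)
        y_low |= (ai | bi) << (i + 1)            # y'_i = a_i OR b_i, weight 2^(i+1)
    an, bn = (a >> n) & 1, (b >> n) & 1
    an1, bn1 = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1
    v = 2 * an + 2 * bn + an1 + bn1 + 1          # column n-1: 1..5 for valid inputs
    u |= (v & 1) << (n - 1)                      # u'_{n-1}
    y_top = v >> 1                               # 0, 1 or 2
    fix = 1 if y_top == 2 else 0                 # a value of 2 does not fit on one line
    y_top = min(y_top, 1)                        # y'_{n-1} is then set to 1
    y = y_low | (1 - y_top)                      # inverted y'_{n-1} enters at the LSB
    return y, u, fix

# A = B = 2^3: Y' = 0, U' = 7, FIX = 1, and (0 + 7 + 1 - 1) mod 9 = 16 mod 9.
print(proposed_yu_fix(8, 8, 3))  # (0, 7, 1)
```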
4.3 RESIDUE NUMBER SYSTEM
A basic number system consists of a correspondence between sequences of digits and numbers. In a fixed-point number system, each sequence corresponds to exactly one number, and the radix point (the "decimal point" in the ordinary decimal number system), which is used to separate the integral and fractional parts of a representation, is in a fixed position. In contrast, in a floating-point number system, a given sequence may correspond to several numbers: the position of the radix point is not fixed, and each position in a digit sequence indicates the particular number represented. Usually, floating-point systems are used for the representation of real numbers, and fixed-point systems are used to represent integers (in which the radix point is implicitly assumed to be at the right-hand end) or as parts of floating-point representations; but there are a few exceptions to this general rule. Almost all applications of RNS are as fixed-point number systems. If we consider a number such as 271.834 in the ordinary decimal number system, we can observe that each digit has a weight that corresponds to its position: hundred for the 2, ten for the 7, ..., thousandth for the 4. This number system is therefore an example of a positional (or weighted) number system; residue number systems, on the other hand, are non-positional. The decimal number system is also a single-radix (or fixed-radix) system, as it has only one base (i.e., ten). Although mixed-radix (i.e., multiple-radix) systems are relatively rare, there are a few useful ones. Indeed, for the purposes of conversion to and from other number systems, as well as for certain operations, it is sometimes useful to associate a residue number system with a weighted, mixed-radix number system.
Residue number systems are based on the congruence relation, which is defined as follows. Two integers a and b are said to be congruent modulo m if m divides exactly the difference of a and b; it is common, especially in mathematics texts, to write a ≡ b (mod m) to denote this. Thus, for example, 10 ≡ 7 (mod 3), 10 ≡ 4 (mod 3), 10 ≡ 1 (mod 3), and 10 ≡ −2 (mod 3). The number m is a modulus or base.

If q and r are the quotient and remainder, respectively, of the integer division of a by m, that is, a = q·m + r, then, by definition, we have a ≡ r (mod m). The number r is said to be the residue of a with respect to m, and we shall usually denote this by r = |a|_m. The set of m smallest values, {0, 1, 2, ..., m − 1}, that the residue may assume is called the set of least positive residues modulo m. Unless otherwise specified, we shall assume that these are the only residues in use.
Consider a set {m_1, m_2, ..., m_N} of N positive and pairwise relatively prime moduli, and let M be the product of the moduli. Then every number X < M has a unique representation in the residue number system, which is the set of residues {|X|_{m_i} : 1 ≤ i ≤ N}. A partial proof of this is as follows. Suppose X_1 and X_2 are two different numbers with the same residue set. Then |X_1|_{m_i} = |X_2|_{m_i}, and so |X_1 − X_2|_{m_i} = 0. Therefore X_1 − X_2 is a multiple of the least common multiple (lcm) of the m_i. But if the m_i are relatively prime, then their lcm is M, and it must be that X_1 − X_2 is a multiple of M. So it cannot be that X_1 < M and X_2 < M. Therefore, the set {|X|_{m_i} : 1 ≤ i ≤ N} is unique and may be taken as the representation of X. The number M is called the dynamic range of the RNS, because the number of numbers that can be represented is M. For unsigned numbers, that range is [0, M − 1] [17].
Representations in a system in which the moduli are not pairwise relatively prime will not be unique: two or more numbers will have the same representation. As an example, the residues of the integers zero through fifteen relative to the moduli two, three, and five (which are pairwise relatively prime) are given in the left half of Table 4.1, and the residues of the same numbers relative to the moduli two, four, and six (which are not pairwise relatively prime) are given in the right half of the same table. Observe that no sequence of residues is repeated in the first half, whereas there are repetitions in the second. The preceding discussion defines what may be considered standard residue number systems, and it is with these that we shall primarily be concerned. Nevertheless, there are useful examples of "non-standard" RNS, the most common of which are the redundant residue number systems. Such a system is obtained by, essentially, adding extra (redundant) moduli to a standard system. The dynamic range then consists of a "legitimate" range, defined by the non-redundant moduli, and an "illegitimate" range; for arithmetic operations, initial operands and results should be within the legitimate range. Redundant number systems of this type are especially useful in fault-tolerant computing. The redundant moduli mean that digit positions with errors may be excluded from computations while still retaining a sufficient part of the dynamic range. Furthermore, both the detection and correction of errors are possible: with k redundant moduli, it is possible to detect up to k errors and to correct up to ⌊k/2⌋ errors. A different form of redundancy can be introduced by extending the size of the digit set corresponding to a modulus, in a manner similar to redundant signed-digit (RSD) representations. For a modulus m, the normal digit set is {0, 1, ..., m − 1}, but if instead the digit set used is {0, 1, ..., m' − 1}, where m' ≥ m, then some residues will have redundant representations [10], [18].
Table 4.1 Residues for various moduli
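The entries of Table 4.1 follow directly from the definition and can be regenerated with the short Python sketch below; the helper names `residue_table` and `repeated_rows` are our own. Since lcm(2, 4, 6) = 12, the integers 12 to 15 repeat the residue sets of 0 to 3, whereas the coprime set {2, 3, 5} has a dynamic range of 30 and shows no repetition below 16.

```python
def residue_table(moduli, count=16):
    """Residues of 0..count-1 with respect to each modulus (one tuple per integer)."""
    return [tuple(x % m for m in moduli) for x in range(count)]

def repeated_rows(rows):
    """Pairs (first, later) of integers that share the same residue tuple."""
    seen, repeats = {}, []
    for x, r in enumerate(rows):
        if r in seen:
            repeats.append((seen[r], x))
        else:
            seen[r] = x
    return repeats

coprime = residue_table([2, 3, 5])       # left half of Table 4.1
non_coprime = residue_table([2, 4, 6])   # right half of Table 4.1

for x in range(16):
    print(f"{x:2d}  {coprime[x]}  {non_coprime[x]}")
print(repeated_rows(coprime))       # []
print(repeated_rows(non_coprime))   # [(0, 12), (1, 13), (2, 14), (3, 15)]
```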
4.3.1 MODULI SELECTION
In general, then, there are at least four considerations that should be taken into account in the selection of moduli. First, the selected moduli must provide an adequate range whilst also ensuring that RNS representations are unique. The second is, as indicated above, the efficiency of binary representations; in this regard, a balance between the different moduli in a given moduli set is also important. The third is that, ideally, the implementations of arithmetic units for RNS should to some extent be compatible with those for conventional arithmetic, especially given the "legacy" that exists for the latter. The fourth is the size of the individual moduli: although, as we shall see, certain RNS arithmetic operations do not require carries between digits, which is one of the primary advantages of RNS, this is so only between digits. Since a digit is ultimately represented in binary, there will be carries between bits, and therefore it is important to ensure that digits (and, therefore, the moduli) are not too large. Low-precision digits also make it possible to realize cost-effective table-lookup implementations of arithmetic operations. On the other hand, if the moduli are small, then a large number of them may be required to ensure a sufficient dynamic range. Of course, ultimately the choices made, and indeed whether RNS is useful or not, depend on the particular applications and technologies at hand.
4.3.2 Negative numbers
Some applications require that it be possible to represent negative numbers as well as positive ones. As with the conventional number systems, any one of the radix-complement, diminished-radix-complement, or sign-and-magnitude notations may be used in RNS for such representation. The merits and drawbacks of choosing one over the other are similar to those for the conventional notations. In contrast with the conventional notations, however, the determination of sign is much more difficult with the residue notations, as is magnitude comparison. This is the case even with sign-and-magnitude notation, since determining the sign of the result of an arithmetic operation such as addition or subtraction is not easy, even if the signs of the operands are known. The extension of sign-and-magnitude notation to RNS involves either the use of a single sign digit or prepending to each residue in a representation an extra bit or digit for the sign; we shall assume the former. For the complement notations, the range of representable numbers is usually partitioned into two approximately equal parts, such that approximately half of the numbers are positive and the rest are negative.
4.3.3 Basic arithmetic
The standard arithmetic operations of addition/subtraction and multiplication are easily implemented with residue notation, depending on the choice of the moduli, but division is much more difficult. The latter is not surprising, in light of the statement above on the difficulties of sign determination and magnitude comparison. Residue addition is carried out by individually adding corresponding digits, relative to the modulus for their position. That is, a carry-out from one digit position is not propagated into the next digit position. Subtraction may be carried out by negating (in whatever is the chosen notation) the subtrahend and adding it to the minuend. This is straightforward for numbers in diminished-radix-complement or radix-complement notation. For numbers represented in residue sign-and-magnitude, a slight modification of the algorithm for conventional sign-and-magnitude is necessary: the sign digit is fanned out to all positions in the residue representation, and addition then proceeds as in the case for unsigned numbers but with a conventional sign-and-magnitude algorithm. Multiplication too can be performed simply by multiplying corresponding residue digit pairs, relative to the modulus for their position; that is, multiply the digits and ignore or adjust an appropriate part of the result.
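The digit-wise nature of these operations can be seen in a small Python sketch for unsigned numbers. The moduli set (7, 8, 9) and the function names are our own illustrative assumptions; subtraction here is modulo M, i.e. it negates each subtrahend digit (m − b) and adds, in the spirit of the complement notations described above.

```python
MODULI = (7, 8, 9)   # assumed example moduli set {2^3 - 1, 2^3, 2^3 + 1}
M = 7 * 8 * 9        # dynamic range: 504

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_add(x, y):
    # Each digit is added modulo its own modulus; no carry crosses digit positions.
    return tuple((a + b) % m for a, b, m in zip(x, y, MODULI))

def rns_sub(x, y):
    # Negate each subtrahend digit (m - b), then add it to the minuend digit.
    return tuple((a + (m - b)) % m for a, b, m in zip(x, y, MODULI))

def rns_mul(x, y):
    # Multiply corresponding digit pairs, keeping only each product's residue.
    return tuple((a * b) % m for a, b, m in zip(x, y, MODULI))

x, y = to_rns(100), to_rns(37)
print(rns_add(x, y), to_rns(137))         # (4, 1, 2) (4, 1, 2)
print(rns_sub(x, y), to_rns(63))          # (0, 7, 0) (0, 7, 0)
print(rns_mul(x, y), to_rns(3700 % M))    # (4, 4, 1) (4, 4, 1)
```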
4.3.4 Conversion
The most direct way to convert from a conventional representation to a residue one, a process known as forward conversion, is to divide by each of the given moduli and then collect the remainders. This, however, is likely to be a costly operation if the number is represented in an arbitrary radix and the moduli are arbitrary. If, on the other hand, the number is represented in radix 2 (or a radix that is a power of two) and the moduli are of a suitable form (e.g., 2^n − 1), then there are procedures that can be implemented with more efficiency. The conversion from residue notation to a conventional notation, a process known as reverse conversion, is more difficult (conceptually, if not necessarily in the implementation) and so far has been one of the major impediments to the adoption of RNS. One way in which it can be done is to assign weights to the digits of a residue representation and then produce a "conventional" (i.e., positional, weighted) mixed-radix representation from this. This mixed-radix representation can then be converted into whatever conventional form is desired. In practice, the use of a direct conversion procedure for the latter can be avoided by carrying out the arithmetic of the conversion in the notation for the result. Another approach involves the use of the Chinese Remainder Theorem, which is the basis for many algorithms for conversion from residue to conventional notation; this too involves, in essence, the extraction of a mixed-radix representation.
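The Chinese Remainder Theorem approach mentioned above can be sketched in a few lines of Python. This is the textbook CRT formula X = |Σ r_i · M_i · |M_i^{-1}|_{m_i}|_M with M_i = M / m_i, not a hardware-oriented converter; it assumes pairwise relatively prime moduli, and the function name `crt_to_integer` is our own.

```python
from math import prod

def crt_to_integer(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem.
    Assumes pairwise relatively prime moduli; returns X in [0, M - 1]."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        inv = pow(Mi, -1, m)   # multiplicative inverse of M_i modulo m_i (Python 3.8+)
        x += r * Mi * inv
    return x % M

print(crt_to_integer([2, 4, 1], [7, 8, 9]))  # 100
```

Note that the intermediate sum grows to roughly N·M before the final reduction, which is one reason hardware converters for special moduli sets avoid this direct form.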
Residue number systems are also useful in error detection and correction. This is apparent, given the independence of digits in a residue-number representation: an error in one digit does not corrupt any other digits. In general, the use of redundant moduli, i.e., extra moduli that play no role in determining the dynamic range, facilitates both error detection and correction. But even without redundant moduli, fault tolerance is possible, since computation can still continue after the isolation of faulty digit positions, provided that a smaller dynamic range is acceptable. RNS can also help speed up complex-number arithmetic [2].
4.4 MODULO 2^n+1 ADDER DESIGN
Efficient modulo 2^n + 1 adders are important for several applications, including residue number systems, digital signal processors and cryptography algorithms. In a conventional modulo 2^n + 1 adder, all operands are (n + 1) bits long. To avoid using (n + 1)-bit circuits, the diminished-1 and carry-save diminished-1 number systems can be used effectively. Two architectures for designing modulo 2^n + 1 adders based on an n-bit ripple-carry adder have also been derived: the first is the faster design, whereas the second uses less hardware. In that method, the special treatment required for zero operands in the diminished-1 number system is removed. The fastest modulo 2^n + 1 adders in the normal binary system contain 3-operand adders. An efficient design reduces the hardware overhead and power consumption and, in some cases, also the power-delay product.
The modular characteristic of the Residue Number System (RNS) offers
the potential for high-speed and parallel arithmetic. In RNS logic, each operand is
represented by its residues with respect to a set of numbers comprising the base.
Addition, subtraction and multiplication are performed in parallel on the residues in distinct design units (often called channels), avoiding carry propagation among residues. Therefore, arithmetic operations such as addition, subtraction and multiplication can be carried out more efficiently in RNS than in conventional two's complement systems, which makes RNS a good candidate for many application fields. Typical applications of RNS can be found in Digital Signal Processing (DSP) for filtering, convolution, correlation and FFT computation, and in fault-tolerant computer systems, communication and cryptography. The choice of moduli set is important, because the channels should have nearly equal delay. Special moduli sets have been used extensively to reduce the hardware complexity in the implementation of converters and arithmetic operations. Among these, the triple moduli set {2^n − 1, 2^n, 2^n + 1} has several benefits. Because of the operand lengths of these moduli, the operation delay of this system is determined by the modulo 2^n + 1 channel. Consequently, if the time required for modulo 2^n + 1 addition is cut down, the delay of the whole system is reduced. In order to speed up modulo 2^n + 1 arithmetic operations, the diminished-1 representation of binary numbers has been introduced. In the diminished-1 number system, each number X is represented by X* = X − 1, while zero is handled separately. However, these circuits require special treatment for zero operands. To overcome this problem, a number representation called "carry-save diminished-1" has been proposed. In this paper, an addition algorithm in the carry-save diminished-1 system is proposed, in which the special treatment for zero operands is not required. Modulo 2^n + 1 adders can also be designed as a special case of general modulo m adders.
The novel architecture removes some significant problems of older structures and reduces both area and power dissipation. In this paper, we derive a new methodology for the modulo 2^n + 1 adder that leads to a ripple-carry adder architecture. Although a ripple-carry adder has more delay than a carry-accelerated adder, it is useful for low-power and low-area applications. Using an implementation in a CMOS technology, we show that the proposed ripple-carry design methodology leads to considerably less area and power consumption than those reported in the related papers, and in some cases the power-delay product is also reduced. The conventional methods for modulo 2^n + 1 addition include general modulo adders, and diminished-1 and carry-save diminished-1 modulo adders implemented with ripple-carry or parallel-prefix addition.
To remove the problem of (n + 1)-bit wide circuits for the modulo 2^n + 1 channel, the diminished-1 and carry-save diminished-1 number systems have been proposed [3].
Fig 4.1 General block diagram of the modulo 2^n + 1 adder
The general block diagram of the modulo 2^n + 1 adder is given in Fig 4.1. The only difference between the modulo 2^n + 1 adder and the modulo 2^n − 1 adder is the inverter that takes cout as input: in this end-around adder, cout must be inverted before it drives the incrementer. The ways of building a modulo 2^n + 1 adder can also be divided into three categories. The first uses the reduced parallel-prefix tree with an extra logic level at the bottom. The second uses an idea similar to the full parallel-prefix tree. The third is the end-around adder, built from any type of adder followed by an incrementer [12].
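The end-around organisation described above can be sketched behaviourally in Python. This is a minimal model rather than the thesis's HDL: the function name `end_around_add` and its `plus_one` flag are hypothetical, the first adder and the incrementer are modelled with integer arithmetic, and the modulo 2^n + 1 case assumes diminished-1 operands (x = A − 1, y = B − 1) whose weighted sum is non-zero modulo 2^n + 1.

```python
def end_around_add(x: int, y: int, n: int, plus_one: bool) -> int:
    """Behavioural end-around-carry adder on n-bit operands (hypothetical helper).

    plus_one=False: modulo 2^n - 1 adder, the carry-out is fed back as carry-in.
    plus_one=True:  diminished-1 modulo 2^n + 1 adder, the carry-out is inverted
                    before it drives the incrementer.
    """
    mask = (1 << n) - 1
    t = x + y                                  # first adder: (n + 1)-bit intermediate sum
    cout = t >> n                              # carry-out of the n-bit addition
    cin = (1 - cout) if plus_one else cout     # the single inverter is the only difference
    return (t + cin) & mask                    # incrementer driven by the end-around carry


if __name__ == "__main__":
    n = 4
    # modulo 2^4 - 1 = 15: 9 + 8 = 17, and 17 mod 15 = 2
    print(end_around_add(9, 8, n, plus_one=False))
    # modulo 2^4 + 1 = 17, diminished-1 inputs for A = 9, B = 12:
    # (21 mod 17) - 1 = 3 is the diminished-1 form of the result
    print(end_around_add(9 - 1, 12 - 1, n, plus_one=True))
```

The zero result of the modulo 2^n + 1 case (A + B = 2^n + 1) is deliberately not handled here, since the diminished-1 system treats zero separately, as noted earlier in this section.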
4.4.1 Parallel-Prefix Ling Structures for Modulo 2^n + 1 Adders
Ling's scheme can be applied to modulo 2^n + 1 adders. Efficient modulo 2^n + 1 adders are important for several applications, including residue number systems, digital signal processors and cryptography algorithms. In a conventional modulo 2^n + 1 adder, all operands are (n + 1) bits long. To avoid (n + 1)-bit circuits, the diminished-1 and carry-save diminished-1 number systems can be used effectively. The idea is also applicable to the full parallel-prefix structure for modulo 2^n + 1 adders. A diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. This scheme outperforms solutions based on binary adders and/or weighted modulo 2^n + 1 adders in both area and delay. To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition. The aforementioned methods all deal with diminished-1 modulo addition; however, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature.
The approach considered here makes the modulo 2^n + 1 addition of two (n + 1)-bit input numbers A and B congruent to Y + U + 1, where Y and U are two n-bit numbers. The three main parts of this technique are the translator, the diminished-1 adder and the correction scheme. Thus, any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition of Y and U. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. We then apply this scheme in the design of residue generators (RGs) and multi-operand modulo adders (MOMAs).
However, due to the complexity of the full parallel-prefix structure, the benefit tends to diminish when Ling's equations are used, especially for wide adders (i.e. 64 bits or larger). As there is an inverted carry-in for modulo 2^n + 1 adders, the same logic is present in Ling's reduced prefix tree [13].
Fig 4.2 Modified modulo 2^n + 1 adder with the reduced parallel-prefix Ling structure
4.4.2 Combination of Binary and Modulo 2^n + 1 Adder
Reviewing binary and modulo 2^n + 1 adder architectures, it can be seen that the prefix tree can be applied to all of these adders. Modulo adders are an extension of binary adders: the reduced parallel-prefix structure applies to both modulo 2^n − 1 and modulo 2^n + 1 adders, with the only difference being one inverter.
Fig 4.3 Combined binary and modulo adders
In Figure 4.3, the generate and propagate signals g_i and p_i come from the pre-computation stage. The selections 0, 1 and 2 in the multiplexer are for modulo 2^n + 1, modulo 2^n − 1 and binary addition, respectively. Starting from the general equation for binary addition, the combined-function adder can be formulated as the following equation shows.
Inserting this structure between the pre- and post-computation stages completes the adder architecture. The modified parallel-prefix tree does not handle the carry input; this is the only difference between this special prefix tree and one designed solely for a binary adder. The carry input is handled at the last row of gray cells, which is consistent with the associativity of the prefix operator. The prefix tree can be derived from any type of normal binary prefix tree [4].
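To illustrate how one prefix tree without carry-in can serve all three functions, here is a behavioural Python sketch. It assumes the select encoding described for Fig 4.3 (0: modulo 2^n + 1 in diminished-1 form, 1: modulo 2^n − 1, 2: binary); the serial group computation stands in for whichever prefix tree is actually used, and the function name is hypothetical. The carry-in chosen by the multiplexer is applied only in the final row, as $c_i = G_{i-1:0} \lor (P_{i-1:0} \land c_{in})$.

```python
def combined_adder(x: int, y: int, n: int, sel: int) -> int:
    """Hypothetical combined-function adder on n-bit operands.

    sel 0 = modulo 2^n + 1 (diminished-1), sel 1 = modulo 2^n - 1,
    sel 2 = plain binary (result truncated to n bits).
    """
    # pre-computation: bit-level generate and propagate
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(n)]
    # prefix tree without carry-in: G[i], P[i] describe the group i:0
    # (computed serially here; any parallel-prefix tree gives the same values)
    G, P = [g[0]], [p[0]]
    for i in range(1, n):
        G.append(g[i] | (p[i] & G[i - 1]))
        P.append(p[i] & P[i - 1])
    cout = G[n - 1]
    # three-input multiplexer selecting the carry-in for the last row
    cin = (1 - cout, cout, 0)[sel]
    # last row of gray cells: c_i = G_{i-1:0} | (P_{i-1:0} & cin), with c_0 = cin
    carries = [cin] + [G[i - 1] | (P[i - 1] & cin) for i in range(1, n)]
    # post-computation: sum bits s_i = p_i XOR c_i
    return sum((p[i] ^ carries[i]) << i for i in range(n))


if __name__ == "__main__":
    print(combined_adder(9, 8, 4, 2))   # binary: 17 truncated to 4 bits
    print(combined_adder(9, 8, 4, 1))   # modulo 15
    print(combined_adder(8, 11, 4, 0))  # modulo 17, diminished-1 operands
```

Only the value driven into `cin` changes between the three modes; the prefix tree itself is shared, which is the point made about the single inverter above.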
4.5 CARRY PROPAGATION ADDITION
Carry-propagate addition finally converts the redundant carry-save output from the carry-save adder into an irredundant binary representation by performing carry propagation. A variety of different schemes exist to speed up carry propagation, trading off area against speed. The relevant adder architectures and their characteristics are summarized in Table 4.2. Two principles have to be distinguished here: the prefix structure employed to propagate carries from lower to upper bits, and the sum-bit generation that determines how the sum bits are calculated from the carries.
Table 4.2 ADDER ARCHITECTURE CHARACTERISTICS
4.5.1 PREFIX STRUCTURE
Carry propagation in binary addition is a prefix problem [8], which can be calculated using prefix structures. Besides the straightforward serial-prefix structure (implemented by the ripple-carry adder), many different parallel-prefix structures exist, which speed up carry propagation at the cost of increased area. They basically differ in terms of depth (= circuit speed), size (= circuit area) and maximum fanout, which can be bounded (constant) or unbounded (dependent on the operand width) and influences circuit speed and area in a more subtle way. The internal signals of a prefix implementation can be coded in different ways, resulting in different possible logic implementations. The most common are generate/propagate signal pairs computed by AND-OR gates and carry-in-0/carry-in-1 signal pairs.
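Because the generate/propagate operator is associative, the group signals can be evaluated with any bracketing, which is what allows a serial (ripple) prefix and a parallel tree to produce identical carries. The Python sketch below (helper names are hypothetical) compares a serial fold, which chains n − 1 operators, with a balanced recursive split of depth about log2 n.

```python
import random
from functools import reduce


def dot(hi, lo):
    """Prefix operator on (generate, propagate) pairs: (g, p) . (g', p') = (g | p & g', p & p')."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def gp_pairs(x, y, n):
    """Bit-level (g_i, p_i) = (x_i AND y_i, x_i XOR y_i), least significant bit first."""
    return [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]


def serial_group(pairs):
    """Serial prefix (ripple-carry style): n - 1 operators in a chain."""
    return reduce(lambda acc, cur: dot(cur, acc), pairs)


def tree_group(pairs):
    """Balanced split: about log2(n) operator levels; valid only because dot is associative."""
    if len(pairs) == 1:
        return pairs[0]
    mid = len(pairs) // 2
    return dot(tree_group(pairs[mid:]), tree_group(pairs[:mid]))


if __name__ == "__main__":
    n = 16
    for _ in range(1000):
        x, y = random.getrandbits(n), random.getrandbits(n)
        pairs = gp_pairs(x, y, n)
        assert serial_group(pairs) == tree_group(pairs)
        G, P = serial_group(pairs)
        assert G == (x + y) >> n                       # group generate equals the carry-out
        assert P == int((x ^ y) == (1 << n) - 1)       # group propagate: every bit propagates
    print("serial and tree prefix agree")
```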
4.5.2 Architecture Performance Comparison
The relative performance of all these adder architectures varies greatly among different technology libraries, so only qualitative characteristics regarding area and speed are summarized in Table 4.2 instead of quantitative comparison results. In addition, the following observations can be made:
• The ripple-carry adder implemented using full-adder cells is always the smallest and slowest adder.
• The carry-skip adder massively speeds up the ripple-carry adder at a very moderate area penalty but is still slower than any other architecture. However, due to its false paths it cannot readily be used in synthesis-based design.
• The carry-select adder is very area efficient for medium speeds if special carry-select adder cells are available in the library. Its prefix structure has the special property of allowing at most 2 prefix nodes per bit position.
• The carry-increment adder is an optimization of the carry-select adder that uses the carry-lookahead scheme instead of the carry-select scheme for the same prefix structure. It has the same delay but a 30% smaller gate count.
• The Brent-Kung parallel-prefix adder gives a good trade-off between area and speed, with 15% to 30% less area at 15% to 30% more delay compared with the faster Sklansky parallel-prefix adder.
• The Sklansky parallel-prefix adder (which uses the prefix structure first proposed by Sklansky for conditional-sum adders) has a prefix structure of minimal depth and therefore is among the fastest adder architectures. Its unbounded-fanout property helps reduce circuit area (fewer prefix nodes) but adds some extra delay for driving the high-fanout nodes.
• The Kogge-Stone parallel-prefix adder also has a minimal-depth prefix structure. Its bounded-fanout property eliminates the need for driving high-fanout nodes, making it the fastest adder in most technologies, but this comes at the cost of much bigger area (more prefix nodes) and more wiring. Compared with the Sklansky prefix adder, it shows an area increase between +23% (8 bit) and +75% (128 bit) at a fairly constant delay reduction of around 4% (all widths) [5].
4.6 PARALLEL PREFIX TREE STRUCTURE
Parallel-prefix trees have various architectures. These prefix trees can be distinguished by four major factors: 1) radix/valency, 2) logic levels, 3) fan-out and 4) wire tracks. In the following discussion about prefix trees, the radix is assumed to be 2 (i.e. the number of inputs to the logic gates is always 2). The more aggressive prefix schemes have ⌈log2 n⌉ logic levels, where n is the width of the inputs. However, these schemes require higher fan-out, many wire tracks or dense logic gates, which compromises performance, e.g. speed or power. Other schemes relieve fan-out and wire tracks at the cost of more logic levels. When the radix is fixed, the design trade-off is made among logic levels, fan-out and wire tracks. Sklansky, Kogge-Stone and Brent-Kung are the major types of prefix structure. These prefix networks achieve three extreme goals: minimal logic levels and wire tracks, minimal max-fanout and logic levels, and minimal wire tracks and max-fanout, respectively. In addition, Ladner-Fischer, Han-Carlson and Knowles implemented the trade-off between each pair of the extreme cases. The structure of the prefix network determines the type of the prefix adder. Ziegler et al. considered sparsity, fanout and radix as three dimensions in the design space of regular parallel prefix adders and presented a unified formalism to describe such structures. The Kogge-Stone tree was reported to be a better choice than the Ladner-Fischer tree.
4.6.1 Kogge-Stone parallel prefix structure
The Kogge-Stone prefix tree is among the prefix trees that use the fewest logic levels. A 16-bit example is shown in Figure 4.4. In fact, Kogge-Stone is a member of the Knowles prefix tree family: the 16-bit prefix tree can be viewed as Knowles [1,1,1,1], where the numbers in the brackets represent the maximum branch fan-out at each logic level. The maximum fan-out is 2 in all logic levels for Kogge-Stone prefix trees of any width. The key to building a prefix tree is to implement it according to the specific features of that type of tree and to apply the rules described in the previous section. Gray cells are inserted in the same way as black cells, except that the gray cells output final carries instead of intermediate G/P groups. The reason for starting with the Kogge-Stone prefix tree is that it is the easiest to build programmatically. The example in Figure 4.4 is a 16-bit (a power of 2) prefix tree.
Fig 4.4 Kogge-Stone Prefix Tree
For the Kogge-Stone prefix tree, at logic level 1 the input span is 1 bit (e.g. group (4:3) takes the inputs at bit 4 and bit 3). Group (4:3) is taken as an input and combined with group (6:5) to generate group (6:3) at logic level 2. Group (6:3) is then combined with group (10:7) to generate group (10:3) at logic level 3, and so on.
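The doubling spans described above translate directly into a short loop. The following Python sketch (function names are hypothetical, and the width is assumed to be a power of 2) builds the Kogge-Stone prefix over (g_i, p_i) pairs, counts cells and levels, and uses the resulting G_{i:0} signals as carries to check the sum.

```python
def combine(hi, lo):
    """Black cell: merge a higher (G, P) group with the adjacent lower group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def kogge_stone(gp):
    """Kogge-Stone prefix over (g_i, p_i) pairs; len(gp) assumed to be a power of 2.

    Returns (groups, cells, levels); groups[i] is (G_{i:0}, P_{i:0}).
    """
    n = len(gp)
    grp = list(gp)
    cells = levels = 0
    d = 1
    while d < n:
        # every bit i >= d merges with the group ending d bits below it (span doubles)
        grp = [combine(grp[i], grp[i - d]) if i >= d else grp[i] for i in range(n)]
        cells += n - d
        levels += 1
        d *= 2
    return grp, cells, levels


def add(x, y, n):
    """Add two n-bit numbers with c_0 = 0; returns the (n + 1)-bit sum."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    grp = kogge_stone(gp)[0]
    carries = [0] + [grp[i][0] for i in range(n - 1)]    # c_i = G_{i-1:0}
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s | (grp[n - 1][0] << n)                      # carry-out is G_{n-1:0}


if __name__ == "__main__":
    _, cells, levels = kogge_stone([(0, 0)] * 16)
    print(cells, levels)   # n log2 n - n + 1 = 49 cells, log2 n = 4 levels
    print(add(40000, 30000, 16))
```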
4.6.2 Brent-Kung Prefix Tree
The Brent-Kung prefix tree is a well-known structure with a relatively sparse network. Its fan-out is among the minimum (f = 0), and so are its wire tracks (t = 0). The cost is L − 1 extra logic levels, where L = log2 n. A 16-bit example is shown in Figure 4.5, with the critical path marked by a thick gray line. The Brent-Kung tree uses less area than the Sklansky prefix tree.
Fig 4.5 16-bit Brent-Kung Prefix Tree
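The sparse up-sweep/down-sweep organisation can be sketched in Python as follows; the function names are hypothetical and the width is assumed to be a power of 2. With this simple schedule the sweep takes log2 n levels up and log2 n − 1 levels down, which matches the L − 1 extra logic levels mentioned above, and the cell count comes out as 2n − log2 n − 2.

```python
def combine(hi, lo):
    """Prefix cell: merge a higher (G, P) group with the adjacent lower one."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def brent_kung(gp):
    """Brent-Kung prefix (up-sweep then down-sweep); len(gp) assumed a power of 2.

    Returns (groups, cells, steps); steps counts the sweep levels of this schedule.
    """
    n = len(gp)
    grp = list(gp)
    cells = steps = 0
    d = 1
    while d < n:                                 # up-sweep: log2(n) levels
        for i in range(2 * d - 1, n, 2 * d):
            grp[i] = combine(grp[i], grp[i - d])
            cells += 1
        steps += 1
        d *= 2
    d = n // 4
    while d >= 1:                                # down-sweep: log2(n) - 1 levels
        for i in range(3 * d - 1, n, 2 * d):
            grp[i] = combine(grp[i], grp[i - d])
            cells += 1
        steps += 1
        d //= 2
    return grp, cells, steps


def add(x, y, n):
    """Add two n-bit numbers with c_0 = 0; returns the (n + 1)-bit sum."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    grp = brent_kung(gp)[0]
    carries = [0] + [grp[i][0] for i in range(n - 1)]    # c_i = G_{i-1:0}
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s | (grp[n - 1][0] << n)


if __name__ == "__main__":
    _, cells, steps = brent_kung([(0, 0)] * 16)
    print(cells, steps)   # 2n - log2 n - 2 = 26 cells, 2 log2 n - 1 = 7 levels
```

Within each sweep level, the cell updates read only positions that are not written at that same level, so the in-place updates are safe.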
4.6.3 Sklansky Prefix Tree
The Sklansky prefix tree takes the fewest logic levels to compute the carries. It also uses fewer cells than the Kogge-Stone structure, at the cost of higher fan-out. Figure 4.6 shows the 16-bit example of the Sklansky prefix tree, with the critical path in a solid line. The Sklansky-style prefix structure uses more area than the Brent-Kung parallel prefix structure. For a 16-bit Sklansky prefix tree, the maximum fan-out is 9 (i.e. f = 3). The structure can be viewed as a compacted version of Brent-Kung's, in which the number of logic levels is reduced and the fan-out increased. The number of logic levels is log2 n, and each logic level has n/2 cells, as can be observed in Figure 4.6. The area is therefore estimated as (n/2)log2 n; when n = 16, 32 cells are required.
Fig 4.6 16-bit Sklansky Prefix Tree
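The divide-and-conquer structure can be sketched in Python to confirm the cell count and fan-out quoted above. In this minimal sketch (hypothetical names, width assumed to be a power of 2), at each level the upper half of every block takes its carry from the top bit of the lower half; that single source node is what produces the high fan-out, counted here as the cells it drives plus its own continuing path.

```python
def combine(hi, lo):
    """Prefix cell: merge a higher (G, P) group with the adjacent lower one."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def sklansky(gp):
    """Sklansky prefix; len(gp) assumed a power of 2.

    Returns (groups, cells, levels, max_fanout).
    """
    n = len(gp)
    grp = list(gp)
    cells = levels = max_fanout = 0
    d = 1
    while d < n:
        new = list(grp)
        for b in range(0, n, 2 * d):
            src = b + d - 1                       # top bit of the lower half-block
            for i in range(b + d, b + 2 * d):     # every bit of the upper half-block
                new[i] = combine(grp[i], grp[src])
                cells += 1
            max_fanout = max(max_fanout, d + 1)   # src drives d cells plus its own path
        grp = new
        levels += 1
        d *= 2
    return grp, cells, levels, max_fanout


def add(x, y, n):
    """Add two n-bit numbers with c_0 = 0; returns the (n + 1)-bit sum."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    grp = sklansky(gp)[0]
    carries = [0] + [grp[i][0] for i in range(n - 1)]    # c_i = G_{i-1:0}
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s | (grp[n - 1][0] << n)


if __name__ == "__main__":
    _, cells, levels, fanout = sklansky([(0, 0)] * 16)
    print(cells, levels, fanout)   # (n/2) log2 n = 32 cells, 4 levels, fan-out n/2 + 1 = 9
```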
4.6.4 Ladner-Fischer Prefix Tree
The major problem of the Sklansky prefix tree is its high fan-out. The Ladner-Fischer prefix tree was proposed to relieve this problem. To reduce fan-out without adding extra cells, more logic levels have to be added. Figure 4.7 shows an example of the Ladner-Fischer prefix tree. The Ladner-Fischer prefix tree sits between the Brent-Kung and Sklansky prefix trees. It can be observed in Figure 4.7 that the first two logic levels of the structure are exactly the same as Brent-Kung's. Starting from logic level 3, a fan-out of more than 2 is allowed (i.e. f > 0). Compared with Sklansky's, the fan-out of Ladner-Fischer's is reduced by a factor of 2, since the Ladner-Fischer prefix tree allows more fan-out one logic level later than the Sklansky prefix tree [4].
Fig 4.7 11 bit Ladner-Fischer Prefix Tree Synthesis
4.6.5 Knowles Prefix Tree
Knowles proposed a family of prefix trees with flexible architectures. Knowles prefix trees use the fan-out at each logic level to name their family members. Figure 4.8 shows a 16-bit Knowles prefix tree. Different fan-outs within the same logic level are also allowed in Knowles prefix trees; this is called a hybrid Knowles prefix tree. It can be proven that overlapping of more than 1 bit is allowed, as it is in other prefix trees. The Knowles prefix tree family therefore covers multiple architectures, and it is not difficult to extend the algorithm once the basic concepts of prefix trees are firmly established. Kogge-Stone and Knowles prefix trees have the same number of logic levels. In the Knowles prefix tree shown, the fan-out at logic level 4 is 3 instead of 2. To build such prefix trees, the pseudo-code written for the Kogge-Stone prefix tree can be reused except for the change at the last level, and the two trees also have the same number of cells. Hence, the area of the Knowles prefix tree is also estimated as n log2 n − n + 1.
Fig 4.8 16-bit Knowles Prefix Tree
4.6.6 HAN-CARLSON PREFIX TREE
The idea of the Han-Carlson prefix tree is similar to Kogge-Stone's structure, since it has a maximum fan-out of 2 (f = 0). The difference is that the Han-Carlson prefix tree uses far fewer cells and wire tracks than Kogge-Stone; the cost is one extra logic level. The Han-Carlson prefix tree can be viewed as a sparse version of the Kogge-Stone prefix tree, and the fan-out at all logic levels is the same (i.e. 2). The pseudo-code for Kogge-Stone's structure can easily be modified to build a Han-Carlson prefix tree. The major difference is that in each logic level, the Han-Carlson prefix tree places cells at every other bit, and the last logic level accounts for the missing carries. Figure 4.9 shows a 16-bit Han-Carlson prefix tree, ignoring the buffers, with the critical path shown as a thick solid line. This type of Han-Carlson prefix tree has log2 n + 1 logic levels. It happens to have the same number of cells as the Sklansky prefix tree, since the cells in the extra logic level can be moved up so that each of the previous logic levels has n/2 cells. The area is estimated as (n/2)log2 n; when n = 16, the number is 32.
Fig 4.9 HAN –CARLSON PREFIX TREE
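A Python sketch of the "every other bit" idea, under the same assumptions as the other prefix sketches (hypothetical names, power-of-2 width): odd bits first pair with their even neighbours, a Kogge-Stone pass then runs on the odd bits only, and one extra level fills in the even bits from the odd bit just below them. It reproduces the log2 n + 1 levels and (n/2)log2 n cells stated above.

```python
def combine(hi, lo):
    """Prefix cell: merge a higher (G, P) group with the adjacent lower one."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def han_carlson(gp):
    """Han-Carlson prefix; len(gp) assumed a power of 2. Returns (groups, cells, levels)."""
    n = len(gp)
    grp = list(gp)
    cells = 0
    for i in range(1, n, 2):                 # level 1: odd bits take their even neighbour
        grp[i] = combine(grp[i], grp[i - 1])
        cells += 1
    levels = 1
    d = 2
    while d < n:                             # Kogge-Stone levels on the odd bits only
        new = list(grp)
        for i in range(d + 1, n, 2):
            new[i] = combine(grp[i], grp[i - d])
            cells += 1
        grp = new
        levels += 1
        d *= 2
    for i in range(2, n, 2):                 # extra level: even bits pick up G_{i-1:0}
        grp[i] = combine(grp[i], grp[i - 1])
        cells += 1
    levels += 1
    return grp, cells, levels


def add(x, y, n):
    """Add two n-bit numbers with c_0 = 0; returns the (n + 1)-bit sum."""
    gp = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(n)]
    grp = han_carlson(gp)[0]
    carries = [0] + [grp[i][0] for i in range(n - 1)]    # c_i = G_{i-1:0}
    s = sum((gp[i][1] ^ carries[i]) << i for i in range(n))
    return s | (grp[n - 1][0] << n)


if __name__ == "__main__":
    _, cells, levels = han_carlson([(0, 0)] * 16)
    print(cells, levels)   # (n/2) log2 n = 32 cells, log2 n + 1 = 5 levels
```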
4.6.7 Harris Prefix Tree
The idea of the Harris prefix tree is to balance logic levels, fan-out and wire tracks. Harris proposed a cube to show the taxonomy of prefix trees in Figure 4.9, which illustrates the idea for 16-bit prefix trees. All the prefix trees mentioned above lie on the cube, with the Sklansky prefix tree at the fan-out extreme, Brent-Kung at the logic-level extreme and Kogge-Stone at the wire-track extreme. The balanced prefix structure is close to the center of the cube: its number of logic levels is log2 16 + 1 = 5, its maximum fan-out is 2^f + 1 = 3 (f = 1) and its number of wire tracks is 2^t = 2 (t = 1). The diagram is shown in Figure 4.10, with the critical path in a solid line.
Fig 4.10 HARRIS PREFIX TREE
These are the various types of parallel prefix structures used in adder design. Each has its own advantages and disadvantages, so the prefix structure can be selected according to the purpose of the task.
4.6.8 Algorithmic Analysis for Prefix Trees
Following the algorithms described above, prefix trees can be built structurally, either in an HDL or by schematic entry. Each type of prefix tree differs in area, logic levels, fan-out and wire tracks, and the prefix tree used in the diminished-1 adder is chosen according to the application. Table 4.3 summarizes the prefix trees' parameters, including logic levels, area estimation, fan-out and wire tracks [5], [6].
Table 4.3 Algorithmic analysis
Type              Logic levels    Area                       Fan-out    Wire tracks
Brent-Kung        2log2n − 1      2n − log2n − 2             2          1
Kogge-Stone       log2n           n log2n − n + 1            2          n/2
Ladner-Fischer    log2n + 1       (n/4)log2n + 3n/4 − 1      n/4 + 1    1
Knowles           log2n           n log2n − n + 1            3          1
Sklansky          log2n           (n/2)log2n                 n/2 + 1    1
Han-Carlson       log2n + 1       (n/2)log2n                 2          n/4
Harris            log2n + 1       (n/2)log2n                 3          n/8
4.7 DESIGN OF AREA-EFFICIENT WEIGHTED MODULO 2^n + 1 ADDER
This section describes an improved area-efficient weighted modulo 2^n + 1 adder design that uses diminished-1 adders with simple correction schemes. This is achieved by subtracting the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers and producing carry and sum vectors. The modulo 2^n + 1 addition can then be performed by parallel-prefix diminished-1 adders that take in the sum and carry vectors plus the inverted end-around carry, with simple correction schemes. The area cost of the proposed adders is lower. In addition, the proposed adders do not require the zero-detection hardware that is needed in diminished-1 modulo 2^n + 1 addition. The design consists of three blocks: 1) the translator, 2) the diminished-1 adder and 3) the correction circuit. Fig 4.11 shows the architecture of the area-efficient weighted modulo 2^n + 1 adder using correction schemes.
Fig 4.11 Architecture of proposed modulo 2^n + 1 adder
4.7.1 TRANSLATOR
The translator subtracts the constant 2^n + 1 from the sum of the two (n + 1)-bit input numbers and produces carry and sum vectors; that is, the constant −(2^n + 1) is added to the sum of A and B. In addition, the two inputs A and B are allowed to be in the range [0, 2^n], which is one more than the range [0, 2^n − 1] of the existing design. For n = 8, the translator converts the 9-bit inputs into 8-bit vectors, which are the inputs of the diminished-1 adder. The architecture of the translator is given in Fig 4.12.
Fig 4.12 Architecture of translator
4.7.2 TRANSLATOR CIRCUITS
The translator consists of FAF and FA+ cells. The input values pass through these cells, which reduce the width by 1 bit and produce proper inputs for the diminished-1 adder. Fig 4.13 shows the structure of the basic cells in the translator [1].
Fig 4.13 basic cells in translator
4.7.3 TRANSLATOR FROM MODULO 2^n + 1 TO THE PROPOSED REPRESENTATION
Let A be a binary input number and A* its targeted representation. The zero indication bit can be computed directly from the bits of A, and the relation defining A* reveals that A* can be computed by a modulo 2^n adder that accepts as inputs the all-1s operand and the n least significant bits of A, together with a carry-input signal. Assuming an inclusive-OR implementation of the adder, the all-1s operand gives $g_i = a_i$ and $p_i = 1$ at every bit position. Therefore, the carry at each position is given by $c_{i+1} = g_i \lor (p_i \land c_i) = a_i \lor c_i$, i.e. $c_i = c_0 \lor a_0 \lor \dots \lor a_{i-1}$, and each sum bit is $s_i = a_i \oplus 1 \oplus c_i = \overline{a_i \oplus c_i}$. The latter relation reveals that the translator from the binary system to the adopted representation is composed of one exclusive-NOR gate per bit and a carry computation unit that is easily implemented as trees of NOR gates.
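A bit-level Python sketch of this translator, assuming that the adder adds the all-1s operand (2^n − 1) to an n-bit value a with a carry input cin; the function name and the `cin` argument are hypothetical. With g_i = a_i and p_i = 1, each carry is the OR of cin and all lower bits of a, and each sum bit is an XNOR. The loop evaluates the carries serially for clarity, whereas the hardware described above would produce them with NOR trees.

```python
def translate(a: int, cin: int, n: int) -> int:
    """Return (a + (2^n - 1) + cin) mod 2^n using one XNOR per bit and OR-chained carries."""
    out = 0
    carry = cin & 1                        # c_0 = carry input
    for i in range(n):
        ai = (a >> i) & 1
        out |= (1 - (ai ^ carry)) << i     # s_i = a_i XNOR c_i (s_i = a_i ^ 1 ^ c_i)
        carry |= ai                        # c_{i+1} = g_i | (p_i & c_i) = a_i | c_i
    return out


if __name__ == "__main__":
    n = 8
    print(translate(5, 0, n))   # adding all 1s without carry-in subtracts 1
    print(translate(0, 0, n))   # 0 - 1 wraps around to all 1s
    print(translate(5, 1, n))   # with carry-in the value is unchanged
```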
4.7.4 DIMINISHED-1 ADDER
Depending on the input/output data representations, modulo 2^n + 1 addition methods can be classified into two categories, namely diminished-1 and weighted. In the diminished-1 representation, each input and output operand is decreased by 1 compared with its weighted representation. Therefore, only n-bit operands are needed in diminished-1 modulo 2^n + 1 addition, leading to smaller and faster components. However, this incurs an overhead due to the translators from/to the binary weighted system. On the other hand, the weighted representation uses (n + 1)-bit operands for computations, avoiding the overhead of translators, but requires a larger area than the diminished-1 representation. The general operations in modulo 2^n + 1 addition, including diminished-1 and weighted modulo addition, have been discussed in the literature, and efficient parallel-prefix adders have been proposed for diminished-1 modulo 2^n + 1 addition. To improve the area-time and time-power products, the circular carry selection scheme was used to efficiently select the correct carry-in signals for the final modulo addition. The aforementioned methods all deal with diminished-1 modulo addition; however, the hardware for decreasing/increasing the inputs/outputs by 1 is omitted in the literature. In addition, the value zero is not allowed in diminished-1 modulo 2^n + 1 addition, and hence a zero-detection circuit is required to avoid incorrect computation. Any diminished-1 adder can be used to perform the weighted modulo 2^n + 1 addition of Y and U. In an earlier approach, translators are first used to decrease the sum of the two n-bit inputs A and B by 1, and the weighted modulo 2^n + 1 addition is then performed using diminished-1 adders. It should be noted that for the architecture of Vergos and Bakalis, the ranges of the two inputs A and B are smaller than in Vergos and Efstathiou (i.e., [0, 2^n − 1] versus [0, 2^n]). In this brief, we propose an improved area-efficient weighted modulo 2^n + 1 adder design using diminished-1 adders with simple correction schemes.
As noted above, a diminished-1 adder can be used for the modulo 2^n + 1 addition of two n-bit operands in the weighted representation if it is driven by operands whose sum has been decreased by 1. When this scheme is applied to residue generators (RGs) and multi-operand modulo adders (MOMAs), the resulting arithmetic components remove at least a whole parallel adder from the critical path of the most efficient earlier proposals. Experimental results indicate savings of more than 30% in execution time and of approximately 19% in implementation area when these architectures are used. Various types of diminished-1 adders can be built on Kogge-Stone, Sklansky and Brent-Kung trees.
The works discussed above are based on ASIC technology. Vitoroulis investigated the performance of parallel prefix adders implemented with FPGA technology, reporting the area requirements and critical path delay for a variety of classical parallel prefix adder structures. However, the parallel prefix trees were implemented as single adders, not as parts of bigger designs. The diminished-1 adder's result forms the least significant bits of the weighted sum, and the indication of complementary input vectors at the diminished-1 adder is the most significant bit of the weighted sum.
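The last sentence can be checked with a small behavioural model. The sketch below assumes n-bit inputs Y and U and computes the weighted value (Y + U + 1) mod (2^n + 1): the n least significant bits come from a diminished-1 style adder with an inverted end-around carry, and the most significant bit is the flag for complementary inputs (Y XOR U equal to all 1s). The function name is hypothetical.

```python
def weighted_mod_add(y: int, u: int, n: int) -> int:
    """Return (y + u + 1) mod (2^n + 1) as an (n + 1)-bit weighted value."""
    mask = (1 << n) - 1
    t = y + u
    cout = t >> n
    low = (t + (1 - cout)) & mask          # diminished-1 adder: inverted end-around carry
    msb = int((y ^ u) == mask)             # complementary inputs -> result is exactly 2^n
    return (msb << n) | low


if __name__ == "__main__":
    n = 8
    print(weighted_mod_add(255, 0, n))     # complementary inputs give 2^8 = 256
    print(weighted_mod_add(200, 100, n))   # (200 + 100 + 1) mod 257 = 44
```

When the inputs are complementary, Y + U = 2^n − 1, so the end-around adder wraps its n-bit output to 0 and the flag alone supplies the value 2^n; in every other case the flag is 0.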
Parallel prefix networks are widely used in high-performance adders. Networks in the literature represent trade-offs between the number of logic levels, fan-out and wiring tracks. Adders using these networks are compared using the method of logical effort. The new architecture is competitive in latency and area for some technologies. Common prefix computations include addition, incrementation and priority encoding [1].
4.7.4.1 Brent-Kung Parallel Prefix Tree
The Brent-Kung adder is a parallel-prefix form of carry look-ahead adder that requires 2(log2 N) − 1 stages. It was originally proposed as a simple and regular design of a parallel adder that addresses the problem of connecting gates so as to minimize chip area. Accordingly, it is considered one of the better tree adders for minimizing wiring tracks, fan-out and gate count, and it is used as a basis for many other networks. To implement a parallel prefix tree, half adders are needed to calculate the generated carry and propagated carry at each bit position. Then, using these carry signals, further cells compute the group-generated and group-propagated carries. The gate-level basic cells calculate the group-propagated carry P_{i:j} and the group-generated carry G_{i:j} in the intermediate stages of the parallel prefix tree: the quadrate cell calculates P_{i:j} and G_{i:j} simultaneously, whereas the triangular cell calculates only G_{i:j}. Therefore, the circuit of the quadrate cell is more complex than that of the triangular cell. With these basic cells, the rough implementation of the Brent-Kung tree can be built, where HA_i (0 ≤ i ≤ 7) denotes a half adder and buffers are not taken into account. For a regular parallel prefix adder that adds two addends, the incoming carry is always assumed to be c_0 = 0. For two n-bit binary addends x = (x_{n−1} x_{n−2} ... x_0) and y = (y_{n−1} y_{n−2} ... y_0), with bit propagate $p_i = x_i \oplus y_i$, the carry and sum at bit position i in the parallel prefix tree are
$c_i = G_{i-1:0} \lor (P_{i-1:0} \land c_0), \qquad s_i = p_i \oplus c_i, \qquad 0 \le i \le n-1.$
Because c_0 = 0, we have $c_i = G_{i-1:0}$. That is why two different basic cells can be used to build the regular Brent-Kung tree: sometimes only the signal G_{i−1:0} is needed, so the simpler triangular cell can be used to reduce the complexity.
Vitoroulis compared the performance and area of regular parallel prefix trees implemented on FPGA technology. However, when the parallel prefix trees are implemented as components of our end-around-carry (EAC) adder, they cannot be designed in the regular way: both G_{i:0} and P_{i:0} must be kept as outputs for reuse in the next stage. For example, if the parallel prefix tree in the EAC adder is implemented as a Brent-Kung tree, only the quadrate cell can be used to calculate the signals in the intermediate stages, so the regular design of the Brent-Kung tree must be modified. Therefore, on FPGA technology, the properties of the different parallel prefix trees, such as area and performance, will differ from the results listed in Vitoroulis's report. As a result, to implement different parallel prefix trees in our EAC adder, we should first change the implementation of the parallel prefix tree itself and then take into account the relationship between the parallel prefix tree and the other parts of the EAC adder.
Fig 4.14 8-bit Brent-Kung tree
The Brent-Kung tree-based diminished-1 adder consists of half adders and two kinds of basic cells: the square (quadrate) cell and the triangular cell. The output of the translator is fed to the half adders, and their generate and propagate signals then pass through the quadrate cells and the triangular cells. The generate and propagate signals for the group of bits i down to 0 are computed as
$(G_0, P_0) = (g_0, p_0), \qquad g_i = A_i B_i, \qquad p_i = A_i \oplus B_i,$
and, for i > 0,
$(G_i, P_i) = (g_i, p_i) \bullet (G_{i-1}, P_{i-1}) = (g_i, p_i) \bullet (g_{i-1}, p_{i-1}) \bullet \dots \bullet (g_0, p_0),$
where $(g, p) \bullet (g', p') = (g \lor p\,g', \; p\,p')$.
Fig 4.15 Basic cells in the Brent-Kung tree-based diminished-1 adder
These gate-level cells calculate P_{i:j} and G_{i:j} in the intermediate stages of the parallel prefix tree. The quadrate cell calculates P_{i:j} and G_{i:j} simultaneously, while the triangular cell calculates only G_{i:j}, so the circuit of the quadrate cell is more complex than that of the triangular cell. Using these basic cells, the rough implementation of the Brent-Kung tree is shown in Fig 4.14, where HA_i (0 ≤ i ≤ 7) denotes a half adder. For a regular parallel prefix adder that only adds two addends, the incoming carry is c_0 = 0, so $c_i = G_{i-1:0} \lor (P_{i-1:0} \land c_0) = G_{i-1:0}$ and $s_i = p_i \oplus c_i$ for 0 ≤ i ≤ n − 1; this is why two different basic cells can be used to build the regular Brent-Kung tree. However, when the parallel prefix tree is implemented as a component of the EAC adder, it cannot be designed in the regular way: both G_{i:0} and P_{i:0} must be kept as outputs for use in the next stage, so a Brent-Kung tree inside the EAC adder must use only one basic cell, the quadrate cell.
4.7.4 CORRECTION CIRCUIT
FIX is needed because, under some conditions, $y'_{n-1} = 2$ (e.g., $a_n = b_n = 1$ and $a_{n-1} = b_{n-1} = 0$), which cannot be represented on a 1-bit line (these cases are marked "*" in Table 4.4). Therefore, $y'_{n-1}$ is set to 1, and the remaining value of the carry (i.e., 1) is assigned to FIX. FIX is wired-OR with the carry-out of $Y' + U'$ (i.e., $c_{out}$), and the inverted end-around carry $\overline{c_{out} \lor FIX}$ serves as the carry-in of the later diminished-1 addition stage. When $y'_{n-1} = 2$, FIX = 1; otherwise, FIX = 0. According to Table 4.4,
$y'_{n-1} = a_n \lor b_n \lor a_{n-1} \lor b_{n-1}$, $u'_{n-1} = \overline{a_{n-1} \oplus b_{n-1}}$, and $FIX = a_n b_n \lor b_n a_{n-1} \lor a_n b_{n-1}$.
Based on the above, our proposed weighted modulo $2^n + 1$ addition of A and B is equivalent to
Fig 4.16 Correction circuit
The correction circuit consists of AND and OR gates, so the FIX signal can be computed in parallel with the translation to $Y' + U'$, leading to an efficient correction.
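The carry-out and FIX feed the diminished-1 addition stage as an inverted end-around carry. The arithmetic of that stage can be sketched at integer level in Python. This is an illustrative model, not the gate-level design: the function name is hypothetical, FIX is left out (so only the end-around-carry behavior is shown), and the case $A + B \equiv 0 \pmod{2^n + 1}$, which diminished-1 designs handle with extra zero-detection logic, is excluded.

```python
def diminished1_add(a_star, b_star, n):
    """Diminished-1 modulo 2^n + 1 addition with an inverted end-around carry.

    a_star = A - 1 and b_star = B - 1 are n-bit diminished-1 operands (A, B != 0).
    Returns the diminished-1 form of (A + B) mod (2^n + 1).
    Assumption: integer-level model; FIX and zero handling are omitted.
    """
    mask = (1 << n) - 1
    total = a_star + b_star
    c_out = total >> n                 # carry out of the n-bit addition
    inverted_eac = 1 - c_out           # carry-in = NOT c_out
    return (total + inverted_eac) & mask


# n = 8: A = 200, B = 100 gives (300 mod 257) = 43, i.e. 42 in diminished-1 form.
print(diminished1_add(199, 99, 8))  # prints 42
```

When the operands do not overflow ($c_{out} = 0$) the stage adds 1 to restore the diminished-1 offset; when they do, the carry-in is 0 and dropping the $2^n$ weight removes exactly $2^n + 1$ from the weighted sum.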
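Table 4.4 Truth table for FIX
(*: the true value of y'_n-1 is 2; it is stored as 1 and the surplus is carried by FIX. X: don't care, because an operand in [0, 2^n] whose bit n is 1 has all lower bits equal to 0, so these input combinations cannot occur.)
a_n b_n a_n-1 b_n-1 u'_n-1 y'_n-1 FIX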
0 0 0 0 1 0 0
0 0 0 1 0 1 0
0 0 1 0 0 1 0
0 0 1 1 1 1 0
0 1 0 0 1 1 0
0 1 0 1 X X X
0 1 1 0 0 1* 1
0 1 1 1 X X X
1 0 0 0 1 1 0
1 0 0 1 0 1* 1
1 0 1 0 X X X
1 0 1 1 X X X
1 1 0 0 1 1* 1
1 1 0 1 X X X
1 1 1 0 X X X
1 1 1 1 X X X
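The correction equations can be checked against Table 4.4 with a short Python sketch. It enumerates all operands in $[0, 2^n]$ for a small n (n = 4, an arbitrary choice) to confirm that the X rows never occur, then evaluates the equations on the remaining rows. The names are illustrative; $u'_{n-1}$ is computed as the complement of $a_{n-1} \oplus b_{n-1}$, which is what the table's $u'_{n-1}$ column requires.

```python
# Non-X rows of Table 4.4: (a_n, b_n, a_{n-1}, b_{n-1}) -> (u'_{n-1}, y'_{n-1}, FIX).
# Rows marked '*' (true y'_{n-1} = 2) are stored with y'_{n-1} = 1, as in the table.
TABLE_4_4 = {
    (0, 0, 0, 0): (1, 0, 0),
    (0, 0, 0, 1): (0, 1, 0),
    (0, 0, 1, 0): (0, 1, 0),
    (0, 0, 1, 1): (1, 1, 0),
    (0, 1, 0, 0): (1, 1, 0),
    (0, 1, 1, 0): (0, 1, 1),
    (1, 0, 0, 0): (1, 1, 0),
    (1, 0, 0, 1): (0, 1, 1),
    (1, 1, 0, 0): (1, 1, 1),
}


def correction_bits(an, bn, an1, bn1):
    """Evaluate the correction-circuit equations for one input pattern."""
    y = an | bn | an1 | bn1                      # y'_{n-1}
    u = 1 - (an1 ^ bn1)                          # u'_{n-1} = NOT(a_{n-1} XOR b_{n-1})
    fix = (an & bn) | (bn & an1) | (an & bn1)    # FIX
    return u, y, fix


def reachable_rows(n):
    """Collect the (a_n, b_n, a_{n-1}, b_{n-1}) patterns of operands in [0, 2^n]."""
    rows = set()
    for a in range((1 << n) + 1):
        for b in range((1 << n) + 1):
            rows.add(((a >> n) & 1, (b >> n) & 1,
                      (a >> (n - 1)) & 1, (b >> (n - 1)) & 1))
    return rows


# Every reachable pattern is a non-X row, and the equations reproduce each row.
assert reachable_rows(4) == set(TABLE_4_4)
for row, expected in TABLE_4_4.items():
    assert correction_bits(*row) == expected
```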
CHAPTER-5
5.1 RESULTS AND ANALYSIS
5.1.1 The waveform of the translator is given below:
5.1.2 The output waveform of the modulo $2^n + 1$ adder without the correction scheme is shown below:
5.1.3 The output waveform of the modulo $2^n + 1$ adder with the correction scheme is given below:
For n = 8, sums up to 256 appear as usual; beyond that, the output wraps around modulo 257, so the next sums appear as 0 and 1, respectively.
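To check simulated outputs like these, a software reference model can generate the expected values. A minimal Python sketch, assuming n = 8 (modulus 257) as in the simulation; the function name is hypothetical, and the model describes only the expected arithmetic, not the hardware structure.

```python
def mod_add_reference(a, b, n=8):
    """Expected weighted modulo 2^n + 1 sum for operands in [0, 2^n] (hypothetical helper)."""
    limit = 1 << n
    if not (0 <= a <= limit and 0 <= b <= limit):
        raise ValueError("operands must lie in [0, 2^n]")
    return (a + b) % (limit + 1)


# Sums up to 256 pass through unchanged; 257 and 258 wrap to 0 and 1.
for b in (55, 56, 57, 58):
    print(200 + b, "->", mod_add_reference(200, b))
```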
5.2 SYNTHESIS REPORT
5.2.1 SKLANSKY-STYLE PARALLEL PREFIX STRUCTURE
Target Device: XA3S250E-4VQG100
NUMBER OF SLICES: 30 OUT OF 2448 1%
NUMBER OF 4 INPUT LUTS: 52 OUT OF 4896 1%
NUMBER OF IOS: 27
NUMBER OF BONDED IOBS: 27 OUT OF 66 40%
5.2.2 BRENT-KUNG STYLE PARALLEL PREFIX STRUCTURE
Target Device: XA3S250E-4VQG100
NUMBER OF SLICES: 24 OUT OF 2448 0%
NUMBER OF 4 INPUT LUTS: 44 OUT OF 4896 0%
NUMBER OF IOS: 27
NUMBER OF BONDED IOBS: 27 OUT OF 66 40%
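Part of the area difference between the two structures can be seen by counting prefix cells. The Python sketch below models each cell as a merge of bit spans [hi:lo] and counts the cells in an n-bit Sklansky tree and an n-bit Brent-Kung tree, using the textbook wiring of each (an assumption; the synthesized netlists may differ). It counts only prefix cells, not half adders, sum logic, or end-around-carry handling, and ignores fan-out and LUT packing, so it does not predict the slice and LUT figures above.

```python
def combine(left, right):
    """Model one prefix cell by merging bit spans [hi:lo]; 'left' is more significant."""
    (hi1, lo1), (hi2, lo2) = left, right
    if lo1 != hi2 + 1:
        raise ValueError("spans are not adjacent")
    return (hi1, lo2)


def sklansky(n):
    """Sklansky tree: at level d, each bit whose index has bit d set joins the group below it."""
    x = [(i, i) for i in range(n)]
    cells = 0
    d = 1
    while d < n:
        for i in range(n):
            if i & d:
                x[i] = combine(x[i], x[(i // d) * d - 1])
                cells += 1
        d *= 2
    return x, cells


def brent_kung(n):
    """Brent-Kung tree: up-sweep of power-of-two spans, then a down-sweep."""
    x = [(i, i) for i in range(n)]
    cells = 0
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
            cells += 1
        d *= 2
    d //= 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = combine(x[i], x[i - d])
            cells += 1
        d //= 2
    return x, cells


for n in (8, 16):
    print(n, sklansky(n)[1], brent_kung(n)[1])  # prints "8 12 11", then "16 32 26"
```

For n = 8 this gives 12 cells for Sklansky and 11 for Brent-Kung, and the gap grows with n; the price Brent-Kung pays is a deeper tree ($2\log_2 n - 1$ cell levels versus $\log_2 n$).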
CHAPTER-6
CONCLUSION
An improved, area-efficient weighted modulo $2^n + 1$ adder has been designed using a diminished-1 adder based on the Brent-Kung parallel prefix tree. This has been achieved by modifying existing diminished-1 modulo adders to incorporate a simple correction scheme. The proposed adders perform weighted modulo $2^n + 1$ addition and produce sums in the range $[0, 2^n]$. In addition, they do not require the zero-detection hardware that is needed in diminished-1 modulo $2^n + 1$ addition. This is achieved by subtracting the constant $2^n + 1$ from the sum of the two (n + 1)-bit inputs, producing sum and carry vectors. The modulo $2^n + 1$ addition is then performed by a parallel-prefix diminished-1 adder that takes in these sum and carry vectors plus the inverted end-around carry, together with a simple correction scheme. The correction scheme generates a FIX signal, which is 1 when $y'_{n-1}$ would equal 2 and 0 otherwise. The main modules are the translator, the diminished-1 adder, and the correction circuit. The Brent-Kung prefix structure uses less area than the Sklansky prefix structure (24 slices and 44 four-input LUTs versus 30 slices and 52 four-input LUTs on the XA3S250E). The proposed adders have been implemented using 0.13-μm CMOS technology, and synthesis results show that they require less area than previously reported weighted modulo $2^n + 1$ adders under the same delay constraints.
REFERENCES
[1] H. T. Vergos and C. Efstathiou, "A unifying approach for weighted and diminished-1 modulo 2^n+1 addition," IEEE Trans. Circuits Syst., Oct. 2008.
[2] M. A. Soderstrand and W. K. Jenkins, Residue Number System Arithmetic: Modern Applications in Digital Signal Processing.
[3] Somayeh Timarchi and Keivan Navi, "Improved modulo 2^n+1 adder design," International Journal of Computer and Information Engineering, 2:7, 2008.
[4] Jun Chen, "Parallel-prefix structures for binary and modulo," Dec. 2008.
[5] Reto Zimmermann and David Q. Tran, "Optimized synthesis of sum-of-products," Asilomar Conference on Signals, Systems, and Computers, Nov. 2003.
[6] Feng Liu, Fariborz F.F., and Otmane Ait Mohamed, "A comparative study of parallel prefix adders in FPGA implementation of EAC," 2009 12th Euromicro Conference on Digital System Design.
[7] F. Liu and Q. Tan, "Field programmable gate array prototyping of end-around carry parallel prefix tree architectures," IET Computers & Digital Techniques, received 27 March 2009.
[8] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electron. Comput., June 1960.
[9] Amir Sabbagh Molahosseini, Keivan Navi, Chitra Dadkhah, Omid Kavehei, and Somayeh Timarchi, "Efficient reverse converter designs for the new 4-moduli sets," IEEE Trans. Circuits Syst., Apr. 2010.
[10] "Residue number system," World Scientific Publishing Pvt. Ltd., http://www.worldscibooks.com/engineering/p523.html
[11] H. T. Vergos and D. Nicholas, "Diminished-one modulo 2^n+1 adder design," IEEE Trans. Comput., Dec. 2002.
[12] T. B. Juang and M. Y. Tsai, "Corrections on VLSI design of diminished-one modulo 2^n+1 adder using circular carry selection."
[13] R. Zimmermann, "Efficient VLSI implementation of modulo 2^n ± 1 addition and multiplication," in Proc. 14th IEEE Symp. Comput. Arithmetic, Apr. 1999.
[14] A. S. Madhukumar and F. Chin, "Enhanced architecture for residue number system-based CDMA for high-rate data transmission," IEEE Trans. Wireless Commun., Sep. 2004.
[15] G. L. Bernocchi and G. C. Cardarilli, "Low-power adaptive filter based on RNS components," in Proc. IEEE ISCAS, May 2007.
[16] N. Kostaras and H. T. Vergos, "KoVer: A sophisticated residue arithmetic core generator," in Proc. 16th IEEE Int. Workshop Rapid Syst. Prototyp., 2005.
[17] T. Keller, T. H. Liew, and L. Hanzo, "Adaptive redundant residue number system coded multicarrier modulation," IEEE J. Sel. Areas Commun., Nov. 2000.
[18] G. C. Cardarilli, A. Nannarelli, and M. Re, "Reducing power dissipation in FIR filters using the residue number system," in Proc. 43rd IEEE Midwest Symp. Circuits Syst., Jan. 2000.