LETTERS
International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009
A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For High Speed Computing
P. Ramanathan 1, P. T. Vanathi 2
1 PSG College of Technology / Department of ECE, Coimbatore, India.
Email: [email protected]
2 PSG College of Technology / Department of ECE, Coimbatore, India.
Email: [email protected]
Abstract: Parallel prefix addition is a technique for improving the speed of binary addition. Owing to ever-increasing integration density and the growing needs of portable devices, low-power and high-performance designs are of prime importance. The classical parallel prefix adder structures presented in the literature over the years optimize for logic depth, area, fan-out and interconnect count of logic circuits. In this paper, a new architecture for performing 32-bit parallel prefix addition is proposed. The proposed 32-bit prefix adder is compared with several classical adders of the same bit width in terms of power, delay and number of computational nodes. The results reveal that the proposed 32-bit parallel prefix adder has the lowest power-delay product among the existing prefix adder structures considered. The Tanner EDA tool was used to simulate the adder designs in TSMC 180 nm and TSMC 130 nm technologies.
Index Terms: Parallel Prefix Adder, Dot operator, Semi-Dot operator, CMOS, Odd-dot operator, Even-dot operator, Odd-semi-dot operator, Even-semi-dot operator
I. INTRODUCTION
VLSI integer adders find applications in arithmetic and logic units (ALUs), microprocessors and memory addressing units. The speed of the adder often determines the minimum clock cycle time of a microprocessor. Parallel prefix adders are needed primarily because they are fast when compared with ripple carry adders. Parallel prefix adders (PPA) are a family of adders derived from the commonly known carry look-ahead adders, and they are best suited for wide word lengths. PPA circuits use a tree network to reduce the latency to $O(\log_2 n)$, where $n$ represents the number of bits. A three-step process is generally involved in the construction of a PPA. The first step involves the creation of generate and propagate signals for the input operand bits. The second step involves the generation of carry signals. In a PPA, the dot operator and the semi-dot operator are introduced. The dot operator is defined by equation (1) and the semi-dot operator by equation (2).
$$(p_i, g_i) \bullet (p_{i-1}, g_{i-1}) = (p_i \, p_{i-1},\; g_i + p_i \, g_{i-1}) \qquad (1)$$
$$(p_i, g_i) \circ (p_{i-1}, g_{i-1}) = (g_i + p_i \, g_{i-1}) \qquad (2)$$
In the above equations, the operator is applied to two pairs of bits $(p_i, g_i)$ and $(p_{i-1}, g_{i-1})$. These bits represent the generate and propagate signals used in addition. The output of the operator is a new pair of bits, which is again combined with another pair of bits using a dot operator or semi-dot operator. This repeated use of the dot operator and semi-dot operator creates a prefix tree network which ultimately ends in the generation of all carry signals. In the final step, the sum bits of the adder are generated from the propagate signals of the operand bits and the carry bit of the preceding stage using an XOR gate. The semi-dot operator is present as the last computation node in each column of the prefix graph, where only the generate term needs to be computed; its value is the carry generated from that bit to the succeeding bit.
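The three-step process described above can be illustrated with a small bit-level model. The sketch below is only an illustration: the function names `dot` and `prefix_add` are hypothetical, and the prefix network is a Kogge-Stone style doubling scheme, assumed here because it is short to write. It is not the network proposed in this paper.

```python
"""Bit-level model of the three-step parallel prefix addition (illustrative)."""


def dot(left, right):
    """Dot operator of equation (1).

    left  = (p, g) of the more significant group,
    right = (p, g) of the less significant group.
    """
    p_l, g_l = left
    p_r, g_r = right
    return (p_l & p_r, g_l | (p_l & g_r))


def prefix_add(a, b, n=32):
    """Add two n-bit unsigned integers with carry-in 0.

    Returns (sum modulo 2**n, carry out, number of prefix levels used).
    """
    # Step 1: pre-processing, per-bit propagate (XOR) and generate (AND).
    pg = [(((a >> i) ^ (b >> i)) & 1, ((a >> i) & (b >> i)) & 1)
          for i in range(n)]
    p_bit = [p for p, _ in pg]

    # Step 2: prefix computation. ASSUMPTION: Kogge-Stone style doubling,
    # used only for brevity; the proposed adder uses a different network.
    levels = 0
    span = 1
    while span < n:
        pg = [dot(pg[i], pg[i - span]) if i >= span else pg[i]
              for i in range(n)]
        span *= 2
        levels += 1
    carry = [g for _, g in pg]  # carry[i] is the carry out of bit i

    # Step 3: post-processing, sum_i = p_i XOR (carry into bit i).
    total = 0
    for i in range(n):
        c_in = carry[i - 1] if i > 0 else 0
        total |= (p_bit[i] ^ c_in) << i
    return total, carry[n - 1], levels
```

For a 32-bit word the doubling loop runs five times, which matches the $\log_2 n$ depth expected of a minimum-depth prefix network.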
The structure of the prefix network specifies the type of the PPA. The prefix network described by Haikun Zhu, Chung-Kuan Cheng and Ronald Graham [1] has the minimal depth for a given n-bit adder. Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product are presented by Matthew Ziegler and Mircea Stan [2]. The Sklansky adder [3] presents a minimum-depth prefix network at the cost of increased fan-out for certain computation nodes. The algorithm invented by Kogge and Stone [4] has both optimal depth and low fan-out, but produces massively complex circuit realizations and also accounts for a large number of interconnects. The Brent-Kung adder [5] has the merit of a minimal number of computation nodes, which yields reduced area, but the structure has maximum depth, which yields a slight increase in latency when compared with other structures. The Han-Carlson adder [6] combines Brent-Kung and Kogge-Stone structures to achieve a balance between logic depth and interconnect count. Knowles [7] presented a class of logarithmic adders with minimum depth by allowing the fan-out to grow. Ladner and Fischer [8] proposed a general method to construct a prefix network with slightly higher depth when compared with the Sklansky topology, but achieved some merit by reducing the maximum fan-out for computation nodes in the critical path. Related work in the PPA literature, such as the Ling adder [9], achieves improved performance by changing the equation of the dot operator. A taxonomy of classical parallel prefix adders based on fan-out, interconnect count and depth characteristics has been presented by Harris [10]. A sparse-tree binary adder proposed by Yan Sun et al. [11] combines the benefits of the prefix adder and the carry save adder. An Integer Linear Programming method to build parallel prefix adders is proposed by Jianhua Liu et al. [12]. A new technique to achieve energy savings for the Kogge-Stone adder was proposed by Frustaci et al. [14]. In this paper, a novel 32-bit parallel prefix adder is proposed. The proposed structure has the lowest power-delay product among its peers.
2009 ACADEMY PUBLISHER
[Fig. 1 prefix graph: inputs at bit positions 31 to 0, Stages 1 to 9, carry outputs C31 to C0]
II. EXISTING PARALLEL PREFIX ADDERS
The Brent-Kung adder is oriented towards a simpler tree structure with fewer computation nodes. The Kogge-Stone adder possesses a regular layout and is preferred for high-performance applications. The Han-Carlson adder reduces the hardware complexity when compared with that of a Kogge-Stone adder, but at the cost of introducing an additional stage in its carry-merge path. In the Sklansky adder, the fan-out from the inputs to the outputs along the critical path increases drastically, which introduces latency in the circuit. The Ladner-Fischer adder is an improved version of the Sklansky adder in which the maximum fan-out is reduced. Table I summarizes the number of computation nodes and the logic depth required by the various existing parallel prefix adders. Let $n$ be the word length of the adder in bits.
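To make the expressions of Table I concrete, the following sketch evaluates them for a given word length. The dictionary `TABLE_I` and the function `characteristics` are hypothetical names introduced for illustration, and the formulas are transcribed as read from Table I, assuming $n$ is a power of two.

```python
from math import log2

# Node-count and logic-depth expressions as read from Table I.
# Each entry maps an adder name to (nodes(n, k), depth(k)) with k = log2(n).
TABLE_I = {
    "Brent-Kung":     (lambda n, k: 2 * n - 2 - k,  lambda k: 2 * k - 2),
    "Kogge-Stone":    (lambda n, k: n * k - n + 1,  lambda k: k),
    "Han-Carlson":    (lambda n, k: (n // 2) * k,   lambda k: k + 1),
    "Ladner-Fischer": (lambda n, k: (n // 2) * k,   lambda k: k + 1),
    "Sklansky":       (lambda n, k: (n // 2) * k,   lambda k: k),
}


def characteristics(n):
    """Return {adder name: (computation nodes, logic depth)} for word length n."""
    k = int(log2(n))
    if 2 ** k != n:
        raise ValueError("this sketch assumes n is a power of two")
    return {name: (nodes(n, k), depth(k))
            for name, (nodes, depth) in TABLE_I.items()}


if __name__ == "__main__":
    for name, (nodes, depth) in characteristics(32).items():
        print(f"{name:15s} nodes={nodes:4d} depth={depth}")
```

Calling `characteristics(32)` evaluates every row for the 32-bit case considered in this paper.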
III. PROPOSED 32-BIT PARALLEL PREFIX ADDER
Fig. 1 shows the architecture of the proposed 32-bit parallel prefix adder. The objective is to eliminate the massive overlap between the prefix sub-terms being computed. Hence the associative property of the dot operator is employed to keep the number of computation nodes at a minimum. The first stage of the computation is called pre-processing. In the proposed 32-bit prefix adder, this stage involves the creation of generate and propagate signals for the individual operand bits in active-low format. Equations (3) and (4) represent the functionality of the first stage.
TABLE I. COMPARATIVE STUDY ON EXISTING PREFIX ADDERS
Adder Type        Number of Computation Nodes    Logic Depth
Brent-Kung        $2n - 2 - \log_2 n$            $2\log_2 n - 2$
Kogge-Stone       $n\log_2 n - n + 1$            $\log_2 n$
Han-Carlson       $\frac{n}{2}\log_2 n$          $\log_2 n + 1$
Ladner-Fischer    $\frac{n}{2}\log_2 n$          $\log_2 n + 1$
Sklansky          $\frac{n}{2}\log_2 n$          $\log_2 n$
Figure 1. Proposed 32-bit Parallel Prefix Adder
$$\overline{G_i} = \overline{a_i \cdot b_i} \qquad (3)$$
$$\overline{P_i} = \overline{a_i \oplus b_i} = a_i \odot b_i \qquad (4)$$
In the above equations, $a_i$ and $b_i$ represent the input operand bits of the adder, where $i$ varies from 0 to 31. The second stage in the prefix addition is termed prefix computation. This stage is responsible for the creation of group generate and group propagate signals. For deriving the carry signals in the second stage, this architecture introduces four different computation nodes to achieve improved performance, each drawn with its own symbol in the prefix graph. There are two cells designed for the dot operator. The first cell, named odd-dot, works with active-low inputs and generates active-high outputs. The second cell, named even-dot, works with active-high inputs and generates active-low outputs. Similarly, there are two cells designed for the semi-dot operator. The first cell, named odd-semi-dot, works with active-low inputs and generates active-high outputs. The second cell, named even-semi-dot, works with active-high inputs and generates active-low outputs. The stages with odd indexes use odd-dot and odd-semi-dot cells, whereas the stages with even indexes use even-dot and even-semi-dot cells. Equations (5) and (6) represent the functionality of the odd-dot and even-dot cells respectively.
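$$\begin{aligned}
(P, G) &= (\overline{p_i}, \overline{g_i}) \bullet (\overline{p_{i-1}}, \overline{g_{i-1}}) \\
&= \left(\overline{\overline{p_i} + \overline{p_{i-1}}},\; \overline{\overline{g_i}\,(\overline{p_i} + \overline{g_{i-1}})}\right) \\
&= (p_i \, p_{i-1},\; g_i + p_i \, g_{i-1})
\end{aligned} \qquad (5)$$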
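$$\begin{aligned}
(\overline{P}, \overline{G}) &= (p_i, g_i) \bullet (p_{i-1}, g_{i-1}) \\
&= \left(\overline{p_i \, p_{i-1}},\; \overline{g_i + p_i \, g_{i-1}}\right)
\end{aligned} \qquad (6)$$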
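Equations (7) and (8) represent the functionality of the odd-semi-dot and even-semi-dot cells respectively.
$$\begin{aligned}
(G) &= (\overline{p_i}, \overline{g_i}) \circ (\overline{g_{i-1}}) = \left(\overline{\overline{g_i}\,(\overline{p_i} + \overline{g_{i-1}})}\right) \\
&= (g_i + p_i \, g_{i-1}) = c_i
\end{aligned} \qquad (7)$$
$$(\overline{G}) = (p_i, g_i) \circ (g_{i-1}) = \left(\overline{g_i + p_i \, g_{i-1}}\right) = \overline{c_i} \qquad (8)$$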
The CMOS logic family implements only inverting functions. Cascading odd cells and even cells alternately therefore eliminates the two inverters between them, provided that a dot or semi-dot computation node in an odd stage receives both of its input edges from even stages, and vice versa. It is, however, essential to introduce two inverters in a path if a dot or semi-dot computation node in an even stage receives any of its edges from an even stage, and likewise for odd stages. From the prefix graph of the proposed structure shown in Fig. 1, we infer that only a few edges carry a pair of inverters, which convert $(\overline{G}, \overline{P})$ to $(G, P)$ or $(G, P)$ to $(\overline{G}, \overline{P})$. Each such pair of inverters is marked with its own symbol in the prefix graph. By introducing two cells for the dot operator and two cells for the semi-dot operator, we have eliminated a large number of inverters. Because of this inverter elimination, the propagation delay in those paths is reduced. We further gain a reduction in power, since these inverters, if not eliminated, would have contributed a significant amount of switching power dissipation. The output of an odd-semi-dot cell gives the value of the carry signal at the corresponding bit position. The output of an even-semi-dot cell gives the complemented value of the carry signal at the corresponding bit position. The final stage in the prefix addition is termed post-processing. It involves the generation of the sum bits from the active-low propagate signals of the individual operand bits and the carry bits generated in true or complemented form. The proposed 32-bit structure has a maximum fan-out of 6 and a logic depth of 9. The first and last stages are intrinsically fast because they involve only simple operations on signals local to each bit position. The intermediate stage embodies long-distance propagation of carries, so the performance of the adder depends on the intermediate stage. The lateral fan-out increases slightly, but we gain the advantage of limited interconnect lengths since the tree grows along the main diagonal.
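The polarity rule described above can be expressed as a parity check on stage indexes. The sketch below is only an illustration: it assumes that pre-processing is numbered stage 0 (active-low outputs) and that the prefix stages are numbered from 1. The function names and the edge list are hypothetical and do not reproduce the prefix graph of Fig. 1.

```python
def needs_inverter_pair(src_stage, dst_stage):
    """Return True when an edge needs a pair of inverters (one for G, one for P).

    ASSUMPTION: stage 0 is pre-processing with active-low outputs; odd stages
    take active-low inputs and give active-high outputs; even stages take
    active-high inputs and give active-low outputs. An edge is therefore
    polarity-compatible only when its two end stages differ in parity.
    """
    if not 0 <= src_stage < dst_stage:
        raise ValueError("edges must run from an earlier to a later stage")
    return src_stage % 2 == dst_stage % 2


def count_inverter_pairs(edges):
    """Count the edges of a prefix graph that need polarity correction."""
    return sum(1 for src, dst in edges if needs_inverter_pair(src, dst))


# Hypothetical (source stage, destination stage) edges, NOT taken from Fig. 1.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (3, 4)]

if __name__ == "__main__":
    print(count_inverter_pairs(example_edges))
```

In this made-up edge list only the edges that skip a stage, (0, 2) and (1, 3), join stages of equal parity, so only those two would carry inverter pairs.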
III. SIMULATION ENVIRONMENT
Simulation of the various parallel prefix adder designs was carried out with the Tanner EDA tool using TSMC 180 nm and TSMC 130 nm technologies. All the parallel prefix adder structures were implemented using the CMOS logic family. The aspect ratios of the MOS transistors were chosen such that $(W/L)_p = 3 \times (W/L)_n$. For the TSMC 180 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.3694 V and -0.3944 V respectively, and the supply voltage was kept at 1.8 V. For the TSMC 130 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.332 V and -0.3499 V respectively, and the supply voltage was kept at 1.3 V. The parameters considered for comparison are power consumption, worst-case delay and power-delay product. The various PPA structures were also compared in terms of the number of transistors needed for each circuit realization.
IV. RESULTS AND DISCUSSION
Table II lists the structural characteristics of each of the 32-bit parallel prefix adders. From the table it is observed that the proposed 32-bit parallel prefix adder has the fewest computation nodes among all the designs compared. The structure of the proposed 32-bit adder also reveals that the prefix tree builds along the main diagonal after the first two stages.
TABLE II. CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS
Adder Type        Computation Nodes (Dot)    Computation Nodes (Semi-Dot)    Logic Depth
Brent-Kung        26                         31                              8
Kogge-Stone       98                         31                              5
Han-Carlson       33                         31                              6
Ladner-Fischer    33                         31                              6
Sklansky          33                         31                              5
Proposed          23                         31                              9
Table III and Table IV list the performance comparison of the various 32-bit parallel prefix adders in 180 nm and 130 nm technology respectively.
TABLE III. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 180 NM TECHNOLOGY
Adder Name        Average Power (µW)    Delay (ns)    Power-Delay Product (× 10^-15 J)
Brent-Kung        228.5272              1.05          239.9536
Kogge-Stone       342.1976              0.79          270.3361
Han-Carlson       253.1247              0.89          225.2810
Sklansky          261.1407              0.70          184.8985
Ladner-Fischer    232.7179              0.91          211.7733
Proposed          211.9441              0.85          180.1525
TABLE IV. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 130 NM TECHNOLOGY
Adder Name        Average Power (µW)    Delay (ns)    Power-Delay Product (× 10^-15 J)
Brent-Kung        67.02993              0.88          58.9863
Kogge-Stone       101.4262              0.62          62.8842
Han-Carlson       76.92042              0.68          52.3059
Sklansky          78.23902              0.53          41.4667
Ladner-Fischer    69.72199              0.67          46.7137
Proposed          62.9453               0.65          40.9144
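The power-delay product columns of Tables III and IV can be recomputed from the listed power and delay values. The sketch below assumes that the average power is expressed in microwatts, which is the unit that makes power times delay (in ns) come out in units of 10^-15 J. The dictionary and function names are hypothetical.

```python
# Each entry: (average power in uW, delay in ns, listed PDP in 1e-15 J).
TABLE_III_180NM = {
    "Brent-Kung":     (228.5272, 1.05, 239.9536),
    "Kogge-Stone":    (342.1976, 0.79, 270.3361),
    "Han-Carlson":    (253.1247, 0.89, 225.2810),
    "Sklansky":       (261.1407, 0.70, 184.8985),
    "Ladner-Fischer": (232.7179, 0.91, 211.7733),
    "Proposed":       (211.9441, 0.85, 180.1525),
}
TABLE_IV_130NM = {
    "Brent-Kung":     (67.02993, 0.88, 58.9863),
    "Kogge-Stone":    (101.4262, 0.62, 62.8842),
    "Han-Carlson":    (76.92042, 0.68, 52.3059),
    "Sklansky":       (78.23902, 0.53, 41.4667),
    "Ladner-Fischer": (69.72199, 0.67, 46.7137),
    "Proposed":       (62.9453, 0.65, 40.9144),
}


def pdp_fj(power_uw, delay_ns):
    """Power-delay product: 1 uW * 1 ns = 1e-15 J (one femtojoule)."""
    return power_uw * delay_ns


def lowest_pdp(table):
    """Name of the adder with the smallest listed power-delay product."""
    return min(table, key=lambda name: table[name][2])


def power_saving_percent(table, baseline, candidate="Proposed"):
    """Relative average-power saving of candidate with respect to baseline."""
    base = table[baseline][0]
    return 100.0 * (base - table[candidate][0]) / base


def mismatches(table, tol=0.01):
    """Adders whose listed PDP differs from power * delay by more than tol."""
    return [name for name, (power, delay, listed) in table.items()
            if abs(pdp_fj(power, delay) - listed) > tol]
```

With these values, `lowest_pdp` selects the proposed adder in both technologies. `mismatches(TABLE_III_180NM)` flags only the Sklansky row, whose listed product differs from power times delay by about 2 × 10^-15 J, possibly because the printed delay is rounded.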
Fig. 2 and Fig. 3 show the power and power-delay product comparisons, respectively, of the various PPA designs simulated using TSMC 180 nm technology.
Figure 2. Power Comparison of various PPA Designs in 180 nm
Figure 3. Power Delay Product of various PPA Designs in 180 nm
Figure 4. Power Comparison of various PPA Designs in 130 nm
Figure 5. Power Delay Product of various PPA Designs in 130 nm
V. CONCLUSIONS
It is observed that the proposed 32-bit parallel prefix adder has the lowest power-delay product when compared with the existing designs. The proposed structure offers 7.2 % power savings when compared with the Brent-Kung adder. The proposed design offers 38 %