32 Bits

7/27/2019 32 Bits


LETTERS

International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009


A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For High Speed Computing

P. Ramanathan 1, P. T. Vanathi 2

1 PSG College of Technology / Department of ECE, Coimbatore, India.

Email: [email protected]

2 PSG College of Technology / Department of ECE, Coimbatore, India.

Email: [email protected]

Abstract: Parallel Prefix addition is a technique for improving the speed of binary addition. Due to the continuing growth in integration density and the growing needs of portable devices, low-power and high-performance designs are of prime importance. The classical parallel prefix adder structures presented in the literature over the years optimize for logic depth, area, fan-out and interconnect count of logic circuits. In this paper, a new architecture for performing 32-bit Parallel Prefix addition is proposed. The proposed 32-bit prefix adder is compared with several classical adders of the same bit width in terms of power, delay and number of computational nodes. The results reveal that the proposed 32-bit Parallel Prefix adder has the least power-delay product when compared with the existing Prefix adder structures. The Tanner EDA tool was used for simulating the adder designs in the TSMC 180 nm and TSMC 130 nm technologies.

Index Terms: Parallel Prefix Adder, Dot operator, Semi-Dot operator, CMOS, Odd-dot operator, Even-dot operator, Odd-semi-dot operator, Even-semi-dot operator

I. INTRODUCTION

VLSI integer adders find applications in Arithmetic and Logic Units (ALUs), microprocessors and memory addressing units. The speed of the adder often decides the minimum clock cycle time in a microprocessor. Parallel Prefix adders are needed primarily because they are fast when compared with ripple carry adders. Parallel Prefix adders (PPA) are a family of adders derived from the commonly known carry look-ahead adders. These adders are best suited for wider word lengths. PPA circuits use a tree network to reduce the latency to $O(\log_2 n)$, where $n$ represents the number of bits.

A three-step process is generally involved in the construction of a PPA. The first step involves the creation of generate and propagate signals for the input operand bits. The second step involves the generation of carry signals. In PPA, the dot operator $\bullet$ and the semi-dot operator $\circ$ are introduced. The dot operator is defined by equation (1) and the semi-dot operator is defined by equation (2).

$$(p_i, g_i) \bullet (p_{i-1}, g_{i-1}) = (p_i \cdot p_{i-1},\; g_i + p_i \cdot g_{i-1}) \qquad (1)$$

$$(p_i, g_i) \circ (p_{i-1}, g_{i-1}) = (g_i + p_i \cdot g_{i-1}) \qquad (2)$$

In the above equations, the operator is applied on two pairs of bits $(p_i, g_i)$ and $(p_{i-1}, g_{i-1})$. These bits represent the generate and propagate signals used in addition. The output of the operator is a new pair of bits, which is again combined using a dot operator or a semi-dot operator with another pair of bits. This procedural use of the dot operator and the semi-dot operator creates a prefix tree network which ultimately ends in the generation of all carry signals. In the final step, the sum bits of the adder are generated from the propagate signals of the operand bits and the carry bit of the preceding stage using an XOR gate. The semi-dot operator is present as the last computation node in each column of the prefix graph structures, where it is essential to compute only the generate term, whose value is the carry generated from that bit to the succeeding bit.
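The three-step process described above can be illustrated with a short behavioural sketch in Python. It is not the proposed architecture: the prefix tree used here is a Kogge-Stone style network, chosen only because it is compact to write, and the names `dot`, `prefix_carries` and `prefix_add` are hypothetical helpers introduced for illustration. Propagate is taken as the XOR of the operand bits and the carry-in is assumed to be zero.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (propagate, generate), both active high


def dot(hi: Pair, lo: Pair) -> Pair:
    """Dot operator of equation (1): merge a pair with the pair below it."""
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return (p_hi & p_lo, g_hi | (p_hi & g_lo))


def prefix_carries(p: List[int], g: List[int]) -> List[int]:
    """Step 2: carry out of every bit position (Kogge-Stone style tree)."""
    pairs = list(zip(p, g))
    distance = 1
    while distance < len(pairs):
        previous = pairs[:]  # all nodes of one level read the previous level
        for i in range(distance, len(pairs)):
            pairs[i] = dot(previous[i], previous[i - distance])
        distance *= 2
    return [group_g for _, group_g in pairs]


def prefix_add(a: int, b: int, n: int = 32) -> Tuple[int, int]:
    """Return (n-bit sum, carry out) of a + b with carry-in assumed 0."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # Step 1: generate and propagate signals per bit.
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    # Step 2: prefix computation of all carries.
    carries = prefix_carries(p, g)
    # Step 3: sum bit = propagate XOR carry from the preceding bit.
    total = 0
    for i in range(n):
        carry_in = carries[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, carries[n - 1]
```

Because the dot operator is associative, any other prefix network that combines the same pairs in a different grouping should yield the same carries; only depth, fan-out and node count change.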

The structure of the prefix network specifies the type of the PPA. The prefix network described by Haikun Zhu, Chung-Kuan Cheng and Ronald Graham [1] has the minimal depth for a given n-bit adder. Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product are presented by Matthew Ziegler and Mircea Stan [2]. The Sklansky adder [3] presents a minimum-depth prefix network at the cost of increased fan-out for certain computation nodes. The algorithm invented by Kogge and Stone [4] has both optimal depth and low fan-out but produces massively complex circuit realizations and also accounts for a large number of interconnects. The Brent-Kung adder [5] has the merit of a minimal number of computation nodes, which yields reduced area, but the structure has maximum depth, which yields a slight increase in latency when compared with other structures. The Han-Carlson adder [6] combines Brent-Kung and Kogge-Stone structures to achieve a balance between logic depth and interconnect count. Knowles [7] presented a class of logarithmic adders with minimum depth by allowing the fan-out to grow. Ladner and Fischer [8] proposed a general method to construct a prefix network with slightly higher depth when compared with the Sklansky topology but achieved some merit by reducing the maximum fan-out for computation nodes in the critical path. Related work in the PPA literature, such as the Ling adder [9], achieves improved performance by changing the equation of the dot operator. A taxonomy of classical Parallel Prefix adders based on fan-out, interconnect count and depth characteristics has been presented by Harris [10]. The sparse tree binary adder proposed by Yan Sun et al. [11] combines the benefits of the prefix adder and the carry save adder. An Integer Linear Programming method to build parallel prefix adders is proposed by Jianhua Liu et al. [12]. A new technique to achieve energy savings for the Kogge-Stone adder was proposed by Frustaci et al. [14]. In this paper, a novel 32-bit Parallel Prefix adder is proposed. The proposed structure has the least power-delay product amongst its peers.

© 2009 ACADEMY PUBLISHER

II. EXISTING PARALLEL PREFIX ADDERS

The Brent-Kung adder is oriented towards a simpler tree structure with fewer computation nodes. The Kogge-Stone adder possesses a regular layout and is preferred for high performance applications. The Han-Carlson adder reduces the hardware complexity when compared to that of a Kogge-Stone adder, but at the cost of introducing an additional stage in its carry merge path. In the Sklansky adder, the fan-out from the inputs to the outputs along the critical path increases drastically, which introduces latency in the circuit. The Ladner-Fischer adder is an improved version of the Sklansky adder, in which the maximum fan-out is reduced. Table I summarizes the number of computation nodes and the logic depth required by the various existing Parallel Prefix adders, where $n$ is the word length of the adder in bits.
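As a quick numerical check, the expressions of Table I can be evaluated for a given word length. The sketch below is illustrative only: the function name `table_one` is hypothetical, $n$ is assumed to be a power of two, and the Brent-Kung and Kogge-Stone expressions are read as $2n - 2 - \log_2 n$ and $n\log_2 n - n + 1$, which is an interpretation of the printed formulas. The values produced by these generic expressions need not coincide with the 32-bit counts reported later for every adder.

```python
from typing import Dict, Tuple


def table_one(n: int) -> Dict[str, Tuple[int, int]]:
    """Return {adder: (computation nodes, logic depth)} for an n-bit adder.

    Assumes n is a power of two so that log2(n) is an integer.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    k = n.bit_length() - 1  # exact integer log2(n)
    return {
        "Brent-Kung": (2 * n - 2 - k, 2 * k - 2),
        "Kogge-Stone": (n * k - n + 1, k),
        "Han-Carlson": ((n // 2) * k, k + 1),
        "Ladner-Fischer": ((n // 2) * k, k + 1),
        "Sklansky": ((n // 2) * k, k),
    }


if __name__ == "__main__":
    for name, (nodes, depth) in table_one(32).items():
        print(f"{name:15s} nodes={nodes:4d} depth={depth}")
```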

III. PROPOSED 32-BIT PARALLEL PREFIX ADDER

Fig. 1 shows the architecture of the proposed 32-bit parallel prefix adder. The objective is to eliminate the massive overlap between the prefix sub-terms being computed. Hence the associative property of the dot operator is employed to keep the number of computation nodes at a minimum. The first stage of the computation is called pre-processing. The first stage in the architecture of the proposed 32-bit prefix adder involves the creation of generate and propagate signals for individual operand bits in active low format. Equations (3) and (4) represent the functionality of the first stage.

TABLE I. COMPARATIVE STUDY ON EXISTING PREFIX ADDERS

Adder Type        Number of Computation Nodes    Logic Depth
Brent-Kung        $2n - 2 - \log_2 n$            $2\log_2 n - 2$
Kogge-Stone       $n\log_2 n - n + 1$            $\log_2 n$
Han-Carlson       $\frac{n}{2}\log_2 n$          $\log_2 n + 1$
Ladner-Fischer    $\frac{n}{2}\log_2 n$          $\log_2 n + 1$
Sklansky          $\frac{n}{2}\log_2 n$          $\log_2 n$

Figure 1. Proposed 32-bit Parallel Prefix Adder

$$\bar{G}_i = \overline{(a_i \cdot b_i)} \qquad (3)$$

$$\bar{P}_i = \overline{a_i \oplus b_i} = a_i \odot b_i \qquad (4)$$

In the above equations, $a_i, b_i$ represent the input operand bits for the adder, where $i$ varies from 0 to 31. The second stage in the prefix addition is termed prefix computation. This stage is responsible for the creation of group generate and group propagate signals. For deriving the carry signals in the second stage, this architecture introduces four different computation nodes for achieving improved performance, each drawn with its own symbol in the prefix graph. There are two cells designed for the dot operator. The first cell, named odd-dot, works with active low inputs and generates active high outputs. The second cell, named even-dot, works with active high inputs and generates active low outputs. Similarly, there are two cells designed for the semi-dot operator. The first cell, named odd-semi-dot, works with active low inputs and generates active high outputs. The second cell, named even-semi-dot, works with active high inputs and generates active low outputs. The stages with odd indexes use odd-dot and odd-semi-dot cells, whereas the stages with even indexes use even-dot and even-semi-dot cells. Equations (5) and (6) represent the functionality of the odd-dot and even-dot cells respectively.

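$$\begin{aligned}
(P, G) &= (\bar{p}_i, \bar{g}_i) \bullet_{\text{odd}} (\bar{p}_{i-1}, \bar{g}_{i-1}) \\
&= \left(\overline{\bar{p}_i + \bar{p}_{i-1}},\; \overline{\bar{g}_i \cdot (\bar{p}_i + \bar{g}_{i-1})}\right) \\
&= (p_i \cdot p_{i-1},\; g_i + p_i \cdot g_{i-1})
\end{aligned} \qquad (5)$$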



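$$\begin{aligned}
(\bar{P}, \bar{G}) &= (p_i, g_i) \bullet_{\text{even}} (p_{i-1}, g_{i-1}) \\
&= \left(\overline{p_i \cdot p_{i-1}},\; \overline{g_i + p_i \cdot g_{i-1}}\right)
\end{aligned} \qquad (6)$$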

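Equations (7) and (8) represent the functionality of the odd-semi-dot and even-semi-dot cells respectively.

$$\begin{aligned}
(G) &= (\bar{p}_i, \bar{g}_i) \circ_{\text{odd}} (\bar{g}_{i-1}) = \left(\overline{\bar{g}_i \cdot (\bar{p}_i + \bar{g}_{i-1})}\right) \\
&= (g_i + p_i \cdot g_{i-1}) = c_i
\end{aligned} \qquad (7)$$

$$(\bar{G}) = (p_i, g_i) \circ_{\text{even}} (g_{i-1}) = \left(\overline{g_i + p_i \cdot g_{i-1}}\right) = \bar{c}_i \qquad (8)$$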
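The polarity behaviour of the four cells can be checked exhaustively with a small behavioural model. The sketch below is an illustration under stated assumptions, not the transistor-level cells of the paper: each cell is modelled as a single inverting Boolean function, the function names are hypothetical, and the odd cells are written in the De Morgan form that takes active low inputs directly.

```python
from itertools import product


def inv(x: int) -> int:
    """Logical complement of a single bit."""
    return 1 - x


def odd_dot(np_i, ng_i, np_j, ng_j):
    """Active low inputs, active high outputs (P, G)."""
    return inv(np_i | np_j), inv(ng_i & (np_i | ng_j))


def even_dot(p_i, g_i, p_j, g_j):
    """Active high inputs, active low outputs (not P, not G)."""
    return inv(p_i & p_j), inv(g_i | (p_i & g_j))


def odd_semi_dot(np_i, ng_i, ng_j):
    """Active low inputs, active high carry c_i."""
    return inv(ng_i & (np_i | ng_j))


def even_semi_dot(p_i, g_i, g_j):
    """Active high inputs, complemented carry (not c_i)."""
    return inv(g_i | (p_i & g_j))


def check_all() -> bool:
    """Compare every cell with the plain dot operator on all input values."""
    for p_i, g_i, p_j, g_j in product((0, 1), repeat=4):
        ref_p = p_i & p_j              # group propagate, active high
        ref_g = g_i | (p_i & g_j)      # group generate, active high
        if odd_dot(inv(p_i), inv(g_i), inv(p_j), inv(g_j)) != (ref_p, ref_g):
            return False
        if even_dot(p_i, g_i, p_j, g_j) != (inv(ref_p), inv(ref_g)):
            return False
        if odd_semi_dot(inv(p_i), inv(g_i), inv(g_j)) != ref_g:
            return False
        if even_semi_dot(p_i, g_i, g_j) != inv(ref_g):
            return False
    return True
```

In this model the output polarity of an odd cell matches the input polarity expected by an even cell and the other way round, which is the property that lets alternating stages be cascaded directly.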

The CMOS logic family natively implements only inverting functions. Thus cascading odd cells and even cells alternately eliminates two inverters between them, provided a dot or a semi-dot computation node in an odd stage receives both of its input edges from even stages, and vice-versa. It is, however, essential to introduce two inverters in a path if a dot or a semi-dot computation node in an even stage receives any of its edges from an even stage, and vice-versa. From the prefix graph of the proposed structure shown in Figure 1, we infer that only a few edges carry a pair of inverters, which convert $(G, P)$ to $(\bar{G}, \bar{P})$ or $(\bar{G}, \bar{P})$ to $(G, P)$ respectively. Such a pair of inverters in a path is marked by its own symbol in the prefix graph. By introducing two cells for the dot operator and two cells for the semi-dot operator, we have eliminated a large number of inverters. Due to inverter elimination, the propagation delay in those paths is reduced. Further, we achieve a benefit in power reduction, since these inverters, if not eliminated, would have contributed a significant amount of power dissipation due to switching.

The output of an odd-semi-dot cell gives the value of the carry signal in the corresponding bit position. The output of an even-semi-dot cell gives the complemented value of the carry signal in the corresponding bit position. The final stage in the prefix addition is termed post-processing. The final stage involves the generation of sum bits from the active low propagate signals of the individual operand bits and the carry bits generated in true form or complement form.

The proposed 32-bit structure has a maximum fan-out of 6 and a logic depth of 9. The first stage and the last stage are intrinsically fast because they involve only simple operations on signals local to each bit position. The intermediate stage embodies long-distance propagation of carries, so the performance of the adder depends on the intermediate stage. The lateral fan-out slightly increases, but we get the advantage of limited interconnect lengths since the tree grows along the main diagonal.
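The post-processing step is described above only in words. As a worked illustration, and assuming the usual sum relation $s_i = p_i \oplus c_{i-1}$ with $c_{-1}$ taken as the carry-in, the sum bit can be written directly in terms of the active low propagate signal for both carry polarities. The exact gate used in each case is not stated in the text, so the forms below are one possible reading.

```latex
\begin{aligned}
s_i &= p_i \oplus c_{i-1} \\
    &= \overline{\bar{p}_i \oplus c_{i-1}}
       && \text{(carry available in true form: XNOR of } \bar{p}_i \text{ and } c_{i-1}\text{)} \\
    &= \bar{p}_i \oplus \bar{c}_{i-1}
       && \text{(carry available in complement form: XOR of } \bar{p}_i \text{ and } \bar{c}_{i-1}\text{)}
\end{aligned}
```

Both lines follow from the identities $\bar{x} \oplus y = \overline{x \oplus y}$ and $\bar{x} \oplus \bar{y} = x \oplus y$, so no additional inverter on the propagate signal would be needed in either case.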

III. SIMULATION ENVIRONMENT

Simulation of the various Parallel Prefix adder designs was carried out with the Tanner EDA tool using TSMC 180 nm and TSMC 130 nm technologies. All the Parallel Prefix adder structures were implemented using the CMOS logic family. The aspect ratios of the MOS transistor devices were chosen such that $\left(\frac{W}{L}\right)_p = 3\left(\frac{W}{L}\right)_n$. For TSMC 180 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.3694 V and -0.3944 V respectively and the supply voltage was kept at 1.8 V. For TSMC 130 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.332 V and -0.3499 V respectively and the supply voltage was kept at 1.3 V. The parameters considered for comparison are power consumption, worst case delay and power-delay product. The various PPA structures were also compared in terms of the number of transistors needed for each of the circuit realizations.

IV. RESULTS AND DISCUSSION

Table II lists the structural characteristics of each of the 32-bit Parallel Prefix adders. From the table it is observed that the proposed 32-bit Parallel Prefix adder has the least number of computation nodes amongst all the peer designs. The structure of the proposed 32-bit adder also reveals that the prefix tree builds along the main diagonal after the first two stages.

TABLE II. CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS

Adder Type        Computation Nodes (Dot)    Computation Nodes (Semi-Dot)    Logic Depth
Brent-Kung        26                         31                              8
Kogge-Stone       98                         31                              5
Han-Carlson       33                         31                              6
Ladner-Fischer    33                         31                              6
Sklansky          33                         31                              5
Proposed          23                         31                              9

Table III and Table IV list the performance comparison of the various 32-bit Parallel Prefix adders in 180 nm technology and 130 nm technology respectively.

TABLE III. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 180 NM TECHNOLOGY

Adder Name        Average Power (µW)    Delay (ns)    Power-Delay Product (x 10^-15 Joules)
Brent-Kung        228.5272              1.05          239.9536
Kogge-Stone       342.1976              0.79          270.3361
Han-Carlson       253.1247              0.89          225.2810
Sklansky          261.1407              0.70          184.8985
Ladner-Fischer    232.7179              0.91          211.7733
Proposed          211.9441              0.85          180.1525



TABLE IV. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 130 NM TECHNOLOGY

Adder Name        Average Power (µW)    Delay (ns)    Power-Delay Product (x 10^-15 Joules)
Brent-Kung        67.02993              0.88          58.9863
Kogge-Stone       101.4262              0.62          62.8842
Han-Carlson       76.92042              0.68          52.3059
Sklansky          78.23902              0.53          41.4667
Ladner-Fischer    69.72199              0.67          46.7137
Proposed          62.9453               0.65          40.9144
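The power-delay products in Tables III and IV can be recomputed from the listed power and delay values. The sketch below copies the table entries as given and assumes that the power column is in microwatts, which is the unit implied by the listed products (µW x ns gives 10^-15 J). The helper names are hypothetical, and small differences from the printed products are to be expected where the delay has been rounded to two decimals.

```python
from typing import Dict, Tuple

# (average power in uW, delay in ns), copied from Tables III and IV.
TABLE_III_180NM: Dict[str, Tuple[float, float]] = {
    "Brent-Kung": (228.5272, 1.05),
    "Kogge-Stone": (342.1976, 0.79),
    "Han-Carlson": (253.1247, 0.89),
    "Sklansky": (261.1407, 0.70),
    "Ladner-Fischer": (232.7179, 0.91),
    "Proposed": (211.9441, 0.85),
}
TABLE_IV_130NM: Dict[str, Tuple[float, float]] = {
    "Brent-Kung": (67.02993, 0.88),
    "Kogge-Stone": (101.4262, 0.62),
    "Han-Carlson": (76.92042, 0.68),
    "Sklansky": (78.23902, 0.53),
    "Ladner-Fischer": (69.72199, 0.67),
    "Proposed": (62.9453, 0.65),
}


def pdp_fj(power_uw: float, delay_ns: float) -> float:
    """Power-delay product in units of 1e-15 J (uW times ns)."""
    return power_uw * delay_ns


def saving_percent(reference: float, candidate: float) -> float:
    """Relative saving of candidate over reference, in percent."""
    return 100.0 * (reference - candidate) / reference


def best_pdp(table: Dict[str, Tuple[float, float]]) -> str:
    """Name of the adder with the smallest recomputed power-delay product."""
    return min(table, key=lambda name: pdp_fj(*table[name]))


if __name__ == "__main__":
    for label, table in (("180 nm", TABLE_III_180NM), ("130 nm", TABLE_IV_130NM)):
        for name, (power, delay) in table.items():
            print(f"{label} {name:15s} PDP = {pdp_fj(power, delay):9.4f} fJ")
        print(f"{label} lowest PDP: {best_pdp(table)}")
```

For example, `saving_percent` applied to the 180 nm power of the Brent-Kung adder and of the proposed adder gives the relative power saving between the two designs.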

Fig. 2 and Fig. 3 show the power and power-delay product comparisons, respectively, of the various PPA designs simulated using TSMC 180 nm technology. Fig. 4 and Fig. 5 show the corresponding comparisons for the 130 nm designs.

Figure 2. Power Comparison of various PPA Designs in 180 nm

Figure 3. Power Delay Product of various PPA Designs in 180 nm

Figure 4. Power Comparison of various PPA Designs in 130 nm

Figure 5. Power Delay Product of various PPA Designs in 130 nm

V. CONCLUSIONS

It is observed that the proposed 32-bit Parallel Prefix adder has the least power-delay product when compared with its existing peers. The proposed structure offers 7.2 % power savings when compared with the Brent-Kung adder. The proposed design offers 38 %

