32 Bits

Embed Size (px)

Citation preview

  • 7/27/2019 32 Bits

    1/5

    LETTERS

    International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009

    58

    A Novel Power Delay Optimized 32-bit Parallel

    Prefix Adder For High Speed ComputingP.Ramanathan

    1, P.T.Vanathi

    2

    1PSG College of Technology / Department of ECE, Coimbatore, India.

    Email: [email protected] PSG College of Technology / Department of ECE, Coimbatore, India.

    Email: [email protected]

    AbstractParallel Prefix addition is a technique forimproving the speed of binary addition. Due to continuing

    integrating intensity and the growing needs of portable

    devices, low-power and high-performance designs are of

    prime importance. The classical parallel prefix adder

    structures presented in the literature over the years

    optimize for logic depth, area, fan-out and interconnect

    count of logic circuits. In this paper, a new architecture for

    performing 32-bit Parallel Prefix addition is proposed. The

    proposed 32-bit prefix adder is compared with several

    classical adders of same bit width in terms of power, delay

    and number of computational nodes. The results reveal that

    the proposed 32-bit Parallel Prefix adder has the least

    power delay product when compared with its peer existing

    Prefix adder structures. Tanner EDA tool was used for

    simulating the adder designs in the TSMC 180 nm and

    TSMC 130 nm technologies.

    Index Terms Parallel Prefix Adder, Dot operator, Semi-

    Dot operator, CMOS, Odd-dot operator, Even-dot operator,

    Odd-semi-dot operator, Even-semi-dot operator

    I. INTRODUCTION

    VLSI Integer adders find applications in Arithmetic

    and Logic units (ALUs), microprocessors and memoryaddressing units. Speed of the adder often decides theminimum clock cycle time in a microprocessor. The needfor a Parallel Prefix adder is that it is primarily fast whencompared with ripple carry adders. Parallel Prefix adders(PPA) are family of adders derived from the commonly

    known carry look ahead adders. These adders are bestsuited for adders with wider word lengths. PPA circuits

    use a tree network to reduce the latencyto

    2(log )O n where n represents the number of bits. A

    three step process is generally involved in theconstruction of a PPA. The first step involves the creation

    of generate and propagate signals for the input operandbits. The second step involves the generation of carrysignals. In PPA, the dot operator and the semi-dotoperator are introduced. The dot operator isdefined by the equation (1) and the semi-dot operator is defined by the equation (2)

    1 1 1 1( , ) ( , ) ( , )i i i i i i i i ip g p g p p g p g = + (1)

    1 1 1( , ) ( , ) ( )i i i i i i ip g p g g p g = + (2)

    In the above equation, operator is applied on twopairs of bits ( , )

    i ip g and 1 1( , )i ip g . These bits

    represent generate and propagate signals used in addition.The output of the operator is a new pair of bits which is

    again combined using a dot operator

    or semi-dotoperator with another pairs of bits. This proceduraluse of dot operator and semi-dot operator createsa prefix tree network which ultimately ends in thegeneration of all carry signals. In the final step, the sumbits of the adder are generated with the propagate signalsof the operand bits and the preceding stage carry bit usinga xor gate. The semi-dot operator will be present aslast computation node in each column of the prefix graphstructures, where it is essential to compute only generateterm, whose value is the carry generated from that bit tothe succeeding bit.

    The structure of the prefix network specifies the type of

    the PPA. The Prefix network described by Haiku Zhu,Chung-Kuan Cheng and Ronald Graham [1], has theminimal depth for a given n bit adder. Optimallogarithmic adder structures with a fan-out of two forminimizing the area-delay product is presented by

    Matthew Ziegler and Mircea Stan [2]. The Sklanskyadder [3] presents a minimum depth prefix network at the

    cost of increased fan-out for certain computation nodes.The algorithm invented by Kogge-Stone [4] has bothoptimal depth and low fan-out but produces massivelycomplex circuit realizations and also account for large

    number of interconnects. Brent-Kung adder [5] has themerit of minimal number of computation nodes, which

    yields in reduced area but structure has maximum depthwhich yields slight increase in latency when comparedwith other structures. The Han-Carlson adder [6]combines Brent-Kung and Kogge-Stone structures toachieve a balance between logic depth and interconnectcount. Knowles [7] presented a class of logarithmic

    adders with minimum depth by allowing the fan-out togrow. Ladner and Fischer [8] proposed a general methodto construct a prefix network with slightly higher depthwhen compared with Sklansky topology but achievedsome merit by reducing the maximum fan-out forcomputation nodes in the critical path. Related work on

    PPA literature such as Ling adder [9], achieve improved

    performance gains by changing the equation of the dotoperator . Taxonomy of classical Prefix ParallelAdders based on fan-out, interconnect count and depth

    2009 ACADEMY PUBLISHER

  • 7/27/2019 32 Bits

    2/5

    LETTERS

    International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009

    59

    15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

    Stage 1

    Stage 2

    Stage 3

    Stage 4

    Stage 5

    C15C14C13 C12C 11 C10 C9 C8 C7 C6C5 C4 C3 C2 C0

    C1

    Inputs

    Outputs

    31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16

    C31 C30 C29 C28 C27C26C25 C24C23C22 C21C20 C19C18 C16C17

    Stage 6

    Stage 7

    Stage 8

    Stage 9

    characteristics has been presented by Harris [10]. Sparsetree binary adder proposed by Yan Sun and etal [11]combines the benefits of prefix adder and carry saveadder. Integer Linear Programming method to buildparallel prefix adders is proposed by Jainhau Liu and etal

    [12]. A new technique to achieve energy savings forKogge-Stone adder was proposed by Frustaci and etal

    [14]. A novel Parallel Prefix adder for 32-bits has beenproposed. The proposed structure has the least powerdelay product amongst all its peer ones.

    II. EXISTING PARALLEL PREFIX ADDERS

    Brent Kung adder is oriented towards simpler treestructure with a fewer computation nodes. Kogge-Stoneadder possesses a regular layout and is preferred for highperformance applications. Han-Carlson adder the reducesthe hardware complexity when compared to that of a

    Kogge-Stone adder but at the cost of introducing aadditional stage to its carry merge path. In Sklanskyadder, the fan-out from the inputs to outputs along thecritical path increases drastically which introduceslatency in the circuit. Ladner-Fischer adder is animproved version of Sklansky adder, where the maximumfan-out is reduced. Table I summarizes the dataregarding the requirement of number of computation

    nodes and logic depth for various existing Parallel Prefixadders. Let n be the word-length of the adder in termsof bits.

    III. PROPOSED 32-BITPARALLELREFIXADDER

    Fig. 1 shows the architecture of the proposed 32-bitparallel prefix adder. The objective is to eliminate themassive overlap between the prefix sub-terms beingcomputed. Hence the associate property of the dotoperator is employed to keep the number of computationnodes at a minimum. The first stage of the computation

    is called as pre-processing. The first stage in thearchitectures of the proposed 32-bit prefix adder involvethe creation of generate and propagate signals forindividual operand bits in active low format. Theequations (3) and (4) represent the functionality of thefirst stage.

    TABLE I.COMPARATIVE STUDY ON EXISTING PREFIX ADDERS

    Adder Type Number of Computation Nodes Logic Depth

    Brent-Kung 2[2* 2 log ]n n 2[(2*log ) 2]n

    Kogge-Stone 2[( * log ) 1]n n n + 2log n

    Han-Carlson2[ *(log )]

    2

    nn 2[(log ) 1]n +

    Ladner-Fischer2[ *(log )]

    2

    nn 2[(log ) 1]n +

    Sklansky2[ *(log )]

    2n n 2log n

    Figure 1. Proposed 32-bit Parallel Prefix Adder

    ( )i i i

    G a b= (3)

    i i i i iP a b a b= = (4)

    In the above equations, ,i ia b represent input operand

    bits for the adder, where i varies from 0 to 31. Thesecond stage in the prefix addition is termed as prefixcomputation. This stage is responsible for creation ofgroup generate and group propagate signals. For deriving

    the carry signals in the second stage, this architectureintroduces four different computation nodes for achieving

    improved performance. There are two cells designed forthe dot operator. First cell for the dot operator named

    odd-dot represented by a , works with active lowinputs and generates active high outputs.

    The second cell for the dot operator named even-dot

    represented by a , works with active high inputs and

    generates active low outputs. Similarly, there are twocells designed for the semi-dot operator. First cell for the

    semi-dot operator named odd-semi-dot represented by a , works with active low inputs and generates activehigh inputs. The second cell for the semi-dot operatornamed even-semi-dot represented by a , works with

    active high inputs and generates active low outputs. Thestages with odd indexes use odd-dot and odd-semi-dotcells where as the stages with even indexes use even-dotand even-semi-dot cells. The equations (5) and (6)represent the functionality of odd-dot and even-dot cellsrespectively.

    1 1

    1 1

    1 1

    ( , ) ( , ) ( , )

    ( , ( )

    ( , )

    i i i i

    i i i i i

    i i i i i

    P G p g p g

    p p g p g

    p p g p g

    =

    = + +

    = +

    (5)

    2009 ACADEMY PUBLISHER

  • 7/27/2019 32 Bits

    3/5

    LETTERS

    International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009

    60

    1 1

    1 1

    ( , ) ( , ) ( , )

    ( , )

    i i i i

    i i i i i

    P G p g p g

    p p g p g

    =

    = +

    (6)

    The equations (7) and (8) represent the functionality ofodd-semi-dot and even-semi-dot cells respectively.

    1 1

    1

    ( ) ( , ) ( ) ( ( )

    ( )

    i i i i i i

    i i i i

    G p g g g p g

    g p g c

    = = +

    = + = (7)

    1 1( ) ( , ) ( ) ( )

    i i i i i i iG p g g g p g c = = + = (8)

    CMOS logic family will implement only inverting

    functions. Thus cascading odd cells and even cellsalternatively gives the benefit of elimination of twoinverters between them, if a dot or a semi-dot

    computation node in an odd stage receives both of itsinput edges from any of the even stages and vice-versa.But it is essential to introduce two inverters in a path, if a

    dot or a semi-dot computation node in an even stagereceives any of its edges from any of the even stages andvice-versa. From the prefix graph of the proposedstructure shown in Figure. 1, we infer that there are only

    few edges with a pair of inverters, to make ( , )G P as

    ( , )G P or to make ( , )G P as ( , )G P respectively. The

    pair of inverters in a path is represented by a in theprefix graph. By introducing two cells for dot operatorand two cells for semi-dot operator, we have eliminated a

    large number of inverters. Due to inverter elimination inpaths, the propagation delay in those paths would havereduced. Further we achieve a benefit in power reduction,since these inverters if not eliminated, would havecontributed to significant amount of power dissipationdue to switching. The output of the odd-semi-dot cells

    gives the value of the carry signal in that correspondingbit position. The output of the even-semi-dot cell gives

    the complemented value of carry signal in thatcorresponding bit position. The final stage in the prefixaddition is termed as post-processing. The final stageinvolves generation of sum bits from the active low

    Propagate signals of the individual operand bits and the

    carry bits generated in true form or complement form.The proposed 32-bit structure has a maximum fan-out of6 and a logic depth of 9. The first stage and last stageare intrinsically fast because they involve only simpleoperations on signals local to each bit position. The

    intermediate stage embodies long distance propagation ofcarries, so the performance of the adder depends on the

    intermediate stage. The lateral fan-out slightly increases,but we get an advantage of limited interconnect lengthssince the tree grows along the main diagonal.

    III. SIMULATION ENVIRONMENT

    Simulation of various Parallel Prefix Adder designswere carried out with Tanner EDA tool using TSMC 180nm and TSMC 130 nm technologies. All the ParallelPrefix Adder structures were implemented using CMOS

    logic family. The aspect ratio of the MOS transistor

    devices were chosen such that 3*n p

    W W

    L L

    =

    . For

    TSMC 180 nm technology, threshold voltages of NMOS

    and PMOS transistors are around 0.3694 V and -0.3944V respectively and the supply voltage was kept at 1.8 V..For TSMC 130 nm technology, threshold voltages of

    NMOS and PMOS transistors are around 0.332 V and -0.3499 V respectively and the supply voltage was kept at1.3 V. The parameters considered for comparison arepower consumption, worst case delay and power-delayproduct. The various PPA structures were then comparedwith the number of transistors needed for each of the

    circuit realizations.

    IV. RESULTS AND DISCUSSION

    Table II lists the structural characteristics for each ofthe 32-bit Parallel Prefix adders. From the table it isobserved that the proposed 32-bit Parallel Prefix adder

    has the least number of computation nodes amongst allother peer designs. Structure of the proposed 32-bit adderalso reveals that the prefix tree builds along the maindiagonal after the first two stages.

    TABLE II.CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS

    Adder TypeNumber of Computation Nodes

    Logic DepthDot Semi-Dot

    Brent-Kung 26 31 8

    Kogge-Stone 98 31 5

    Han-Carlson 33 31 6

    Ladner-Fischer 33 31 6

    Sklansky 33 31 5

    Proposed 23 31 9

    Table III and Table IV list out the performancecomparison of the various 32-bit Parallel Prefix adders in

    180 m technology and 130 m technology.

    TABLE III.PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX

    ADDERS USING TSMC 180 NM TECHNOLOGY

    Adder Name

    Average

    Power(W)

    Delay(ns)

    Power-Delay

    Product( X 10-15

    Joules)

    Brent-Kung 228.5272 1.05 239.9536

    Kogge-Stone 342.1976 0.79 270.3361

    Han-Carlson 253.1247 0.89 225.2810

    Sklansky 261.1407 0.70 184.8985

    Ladner-Fischer 232.7179 0.91 211.7733Proposed 211.9441 0.85 180.1525

    2009 ACADEMY PUBLISHER

  • 7/27/2019 32 Bits

    4/5

    LETTERS

    International Journal of Recent Trends in Engineering, Vol 2, No. 6, November 2009

    61

    TABLE IV.PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX

    ADDERS USING TSMC 130 NM TECHNOLOGY

    Adder NameAveragePower

    (W)

    Delay(ns)

    Power-Delay

    Product( X 10-15Joules)

    Brent-Kung 67.02993 0.88 58.9863

    Kogge-Stone 101.4262 0.62 62.8842

    Han-Carlson 76.92042 0.68 52.3059

    Sklansky 78.23902 0.53 41.4667

    Ladner-Fischer 69.72199 0.67 46.7137

    Proposed 62.9453 0.65 40.9144

    Fig. 2 and Fig. 3 show the Power and Power DelayProduct comparison of various PPA Designs simulatedusing TSMC 180 nm Technology respectively.

    Figure 2. Power Comparison of various PPA Designs in 180 nm

    Figure 3. Power Delay Product of various PPA Designs in 180 nm

    Figure 4. Power Comparison of various PPA Designs in 130 nm

    Figure 5. Power Delay Product of various PPA Designs in 130nm

    V. CONCLUSIONS

    It is observed that the Proposed 32-bit Parallel Prefixadder has the least power delay product when compared

    with its peer existing ones. The Proposed structure offers7.2 % power savings is achieved when compared withBrent Kung adder. The Proposed design offers 38 %

    2009 ACADEMY PUBLISHER

  • 7/27/2019 32 Bits

    5/5