
992 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 13, NO. 8, AUGUST 2005

APPENDIX

PROOF OF (6)

Proof: If a polynomial $P(X) = \sum_{0 \le j \le 3} a_j X^j$ is in $K[X]$ [defined in (1)], then its fourth power is equal to

$$P^4(X) = \sum_{0 \le j \le 3} a_j^4 X^{4j} \tag{13}$$

because $K$ is a field of characteristic 2. Since $X^{4j} \bmod (X^4 + 1) = 1$, in $R$ we get $X^{4j} = 1$, and from (13) we obtain

$$P^4(X) = \sum_{0 \le j \le 3} a_j^4 = P^4(1). \tag{14}$$

From (2), it follows that

$$c(1) = (x + 1) + 1 + 1 + x = 1$$

and, therefore, from (14), we get

$$c^4(X) = c^4(1) = 1 = c(X) \cdot c^3(X) = c(X) \cdot d(X),$$

which completes the proof of (6).
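The identity $c^4(X) = 1$ can be checked numerically. The sketch below assumes the AES setting of FIPS 197, which is not restated in this appendix: $K = \mathrm{GF}(2^8)$ reduced by $x^8 + x^4 + x^3 + x + 1$ (0x11B), $R = K[X]/(X^4 + 1)$, and $c(X) = \{03\}X^3 + \{01\}X^2 + \{01\}X + \{02\}$, which agrees with $c(1) = (x + 1) + 1 + 1 + x$. The helper names `gf_mul` and `ring_mul` are hypothetical.

```python
# Sketch: check c(X)^4 = 1 in R = GF(2^8)[X]/(X^4 + 1).
# Assumption (from FIPS 197, not from this appendix): GF(2^8) is reduced by
# x^8 + x^4 + x^3 + x + 1 and c(X) = {03}X^3 + {01}X^2 + {01}X + {02}.
AES_POLY = 0x11B


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-add, reducing by AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in characteristic 2 is XOR
        a <<= 1
        if a & 0x100:  # degree 8 reached: reduce
            a ^= AES_POLY
        b >>= 1
    return result


def ring_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply two elements of R given as coefficient lists [a0, a1, a2, a3].

    Since X^4 = 1 in R, the monomial X^k reduces to X^(k mod 4).
    """
    r = [0, 0, 0, 0]
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[(i + j) % 4] ^= gf_mul(pi, qj)
    return r


c = [0x02, 0x01, 0x01, 0x03]  # low-degree coefficient first
c2 = ring_mul(c, c)
c3 = ring_mul(c2, c)
c4 = ring_mul(c3, c)

print("c^3 =", [f"{v:02x}" for v in c3])
print("c^4 =", [f"{v:02x}" for v in c4])
```

Under these assumptions the script prints c^4 = ['01', '00', '00', '00'] and c^3 = ['0e', '09', '0d', '0b'], that is, $d(X) = \{0b\}X^3 + \{0d\}X^2 + \{09\}X + \{0e\}$, the InvMixColumns polynomial of FIPS 197. This is consistent with $d(X) = c^3(X)$ being the inverse of $c(X)$.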


A 32-Bit Carry Lookahead Adder Using Dual-Path All-N Logic

Ge Yang, Seong-Ook Jung, Kwang-Hyun Baek, Soo Hwan Kim, Suki Kim, and Sung-Mo Kang

Abstract—We have developed dual-path all-N logic (DPANL) and applied it to a 32-bit adder design for higher performance. The speed is significantly enhanced by the reduced capacitance at each evaluation node of the dynamic circuits. Power is saved through a reduced adder cell size and a minimal race problem. Post-layout simulation results show that this adder can operate at frequencies up to 1.85 GHz in a 0.35-μm 1P4M CMOS technology and is 32.4% faster than the adder using all-N transistor (ANT) logic. It also consumes 29.2% less power than the ANT adder. A 0.35-μm CMOS chip has been fabricated and tested to verify the functionality and performance of the DPANL adder on silicon.

Index Terms—CMOS, dynamic-logic circuit, high performance, low-power design.

I. INTRODUCTION

Much work has been done recently on high-performance, low-power adder design, which is critical for microprocessors [1]–[3]. Dynamic circuits have been widely used owing to their faster switching speed and smaller area compared with conventional static CMOS circuits. Pipelined structures have also been used to further raise the operating frequency and achieve higher throughput.

The pipelined systems NORA [4], ZIPPER [5], and TSPC [6] use low-speed pMOS logic blocks. For speed improvement, all-N logic (ANL) [7] was introduced, which uses only high-speed nMOS logic in all stages. All-N transistor (ANT) logic [1] was developed by adding a feedback transistor pair to improve the performance of ANL. For further speed improvement with reduced power consumption, we propose dual-path all-N logic (DPANL).

This paper is organized as follows. Section II reviews previous work. Section III introduces DPANL. Simulation and chip-testing results are presented in Section IV, followed by the conclusion in Section V.

II. PREVIOUS WORK

NORA uses two-phase clock signals instead of four-phase clock signals and avoids the race problem caused by clock skew by constraining the logic composition [6]. True single-phase clock (TSPC) logic uses only a single-phase clock without inversion. It does not suffer from clock-skew problems and thus can operate at high clock frequencies [7]. Both NORA and TSPC pipelined systems have the drawback of using low-speed pMOS logic blocks, which limit the performance of pipelined systems.

Fig. 1 shows a circuit diagram of the CMOS dynamic circuit ANL. It removes the drawback of TSPC logic by using an nMOS logic tree in N2-block. To overcome the voltage drop problem in the nMOS logic tree, a positive-feedback pMOS P3 in N2-block is used to pull up the

Manuscript received January 21, 2004; revised December 14, 2004. This work was supported in part by Semiconductor Research Corporation under Contract 2001-HJ-891, in part by Intel Corporation, and in part by the BK21 program.

G. Yang is with the Nvidia Corporation, Santa Clara, CA 95050 USA.

S.-O. Jung is with Qualcomm Inc., San Diego, CA 92121 USA.

K.-H. Baek is with Rockwell Scientific, Thousand Oaks, CA 91360 USA.

S. H. Kim and S. Kim are with Korea University, Seoul 136-701, Korea.

S.-M. Kang is with the University of California, Santa Cruz, CA 95064 USA.

Digital Object Identifier 10.1109/TVLSI.2005.853605

1063-8210/$20.00 © 2005 IEEE

Authorized licensed use limited to: Oxford Engineering College. Downloaded on November 26, 2009 at 00:29 from IEEE Xplore. Restrictions apply.


Fig. 1. ANL logic.

Fig. 2. ANT logic.

evaluation node. pMOS P3 in N1-block and nMOS N3 in N2-block are used to solve the charge-sharing problem between point OUT and point B. When the clock slew rate is high enough, pMOS P3 in N1-block and nMOS N3 in N2-block can be omitted [7].

A schematic diagram of ANT logic is shown in Fig. 2. It improves the performance by using the feedback transistor pair pMOS P3 and nMOS N3. In the evaluation phase, if the nMOS logic tree is evaluated, pMOS P3 turns on once the voltage of evaluation node A drops below (Vdd − Vth). P3 then pulls up point B and turns on nMOS N3, which in turn pulls down evaluation node A and accelerates the evaluation. However, the speedup from the feedback transistor pair is not significant when the number of series nMOS transistors in the logic tree is small.

III. CIRCUIT DIAGRAM AND OPERATING PRINCIPLES

A. Basic Idea

The performance of N1-block depends on the rise time of the output node, which involves two sequential processes. First, evaluation node A is pulled down through the current path in the nMOS logic tree. Then pMOS P2 turns on and the output node is pulled up. The capacitance at evaluation node A therefore significantly affects the performance. For example, in Fig. 2(a), the gate capacitances of pMOS P2 and P3 and nMOS N2, together with the drain capacitances of nMOS N3, pMOS P1, and the nMOS transistors at the top of the nMOS logic tree, are all connected to evaluation node A. To further enhance the performance, we need to reduce the capacitance at the evaluation node.

We have developed DPANL to achieve this goal [8]. N1-block of DPANL is shown in Fig. 3(a). The nMOS logic trees in Path 1 and Path 2 are identical, except that Path 1 is made faster than Path 2 because Path 1 determines the rise time of the output. The transistors in Path 1 and Path 2 should be sized so that the short-circuit current through pMOS P3 and nMOS N4 and N3 does not affect the performance. The capacitance at evaluation node A consists of the gate capacitance of pMOS P3 and the drain capacitances of pMOS P1 and of the nMOS transistors at the top of the nMOS logic tree. It is much smaller than the corresponding capacitance in ANT, which helps achieve higher performance.

Fig. 3. Circuit diagram of DPANL.
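To make the capacitance argument concrete, the sketch below lists the terminals that the text says load evaluation node A in ANT [Fig. 2(a)] and in the DPANL N1-block [Fig. 3(a)], and sums them with a first-order model C = Σ (capacitance per μm × width). The per-micrometer capacitances and the uniform 2-μm widths are hypothetical placeholders, not values from Tables I or II; they only show that removing three terminals from the node roughly halves its load even before DPANL's smaller device sizes are taken into account.

```python
# First-order sketch: capacitance at evaluation node A as a sum of attached terminals.
# Terminal lists follow the text; every numeric value below is a HYPOTHETICAL placeholder.

C_GATE_PER_UM = 1.0   # hypothetical gate capacitance per um of width (arbitrary units)
C_DRAIN_PER_UM = 0.8  # hypothetical drain (junction) capacitance per um of width

# (device, terminal type) pairs connected to node A.
ANT_NODE_A = [
    ("P2", "gate"), ("P3", "gate"), ("N2", "gate"),
    ("N3", "drain"), ("P1", "drain"), ("tree_top", "drain"),
]
DPANL_NODE_A = [
    ("P3", "gate"), ("P1", "drain"), ("tree_top", "drain"),
]


def node_capacitance(loads: list[tuple[str, str]], widths_um: dict[str, float]) -> float:
    """Sum the width-scaled gate or drain capacitance of every terminal on the node."""
    total = 0.0
    for device, terminal in loads:
        per_um = C_GATE_PER_UM if terminal == "gate" else C_DRAIN_PER_UM
        total += per_um * widths_um[device]
    return total


# Hypothetical uniform 2-um widths, so that only the number and type of loads differ.
widths = {name: 2.0 for name in ("P1", "P2", "P3", "N2", "N3", "tree_top")}
c_ant = node_capacitance(ANT_NODE_A, widths)
c_dpanl = node_capacitance(DPANL_NODE_A, widths)
print(f"ANT node A: {c_ant:.1f}, DPANL node A: {c_dpanl:.1f} (arbitrary units)")
```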

Power consumption is also lower in DPANL. The total width of the two nMOS logic trees in DPANL can be made the same as, or even less than, that of ANT. In ANT, pMOS P1 and nMOS N1 must be large in order to charge and discharge the large capacitance at evaluation node A. Also, nMOS N4 and N2 need to be large in order to discharge the capacitance introduced by the feedback transistor pair at point B. In DPANL, since evaluation nodes A and B have small capacitances, pMOS P1 and P2 and nMOS N1 and N2 can be small; nMOS N4 and N3 can also be small since no feedback transistor pair is attached to point C. Therefore, the total channel width of the transistors in DPANL can be smaller than that in ANT.

The same principle applies to N2-block. The circuit diagram of N2-block is shown in Fig. 3(b).

B. Operating Principles of the DPANL

When the clock is low, N1-block of DPANL enters its precharge phase. The clocked pMOS P1 and P2 are turned on, and evaluation nodes A and B are precharged high. The clocked foot transistors nMOS N1 and N2 are turned off, so no current flows through Path 1 or Path 2. Since evaluation node A is precharged high, pMOS P3 is turned off, and nMOS N4 is turned off by the clock. The output node therefore keeps its previous state, stored on the capacitance at that node.

When the clock is high, N1-block enters its evaluation phase. If the nMOS logic tree is not evaluated, evaluation nodes A and B stay high; pMOS P3 is off, nMOS N4 and N3 are on, and the output is pulled down. If the nMOS logic tree is evaluated, evaluation nodes A and B are pulled down through Path 1 and Path 2, respectively; nMOS N3 is turned off, pMOS P3 is turned on, and the output is pulled up.
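The precharge and evaluation behavior of N1-block described above can be summarized as a small logic-level model. This is a sketch under simplifying assumptions: it ignores timing, charge sharing, and the race discussed in Section III-C, and the names `N1BlockState`, `step`, and `tree_evaluated` (meaning "the nMOS logic tree is evaluated") are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class N1BlockState:
    """Logic levels of a DPANL N1-block (True = high); a hypothetical model with no timing."""
    node_a: bool = True  # evaluation node A (Path 1)
    node_b: bool = True  # evaluation node B (Path 2)
    out: bool = False    # output node, stored dynamically on its capacitance


def step(state: N1BlockState, clk: bool, tree_evaluated: bool) -> N1BlockState:
    """Advance one clock phase following the operating principles in the text."""
    if not clk:
        # Precharge: P1/P2 on, foot transistors N1/N2 off, so A and B go high.
        # P3 is off (A high) and N4 is off (clock low), so the output holds its value.
        return N1BlockState(node_a=True, node_b=True, out=state.out)
    if tree_evaluated:
        # Evaluation, tree conducting: A and B discharge through Path 1 and Path 2;
        # N3 turns off and P3 turns on, pulling the output high.
        return N1BlockState(node_a=False, node_b=False, out=True)
    # Evaluation, tree off: A and B stay high; P3 off, N4 and N3 on, output pulled low.
    return N1BlockState(node_a=True, node_b=True, out=False)


s = N1BlockState()
s = step(s, clk=True, tree_evaluated=True)
print(s.out)  # evaluation with the tree conducting drives the output high
s = step(s, clk=False, tree_evaluated=False)
print(s.out)  # precharge keeps the previous output
```

Modeling each clock phase as a pure function of the previous state keeps the dynamic storage of the output explicit: the only place the old output survives is the precharge branch.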

The operating principles of N2-block are similar to those of N1-block. Note that when the nMOS logic tree is evaluated, the evaluation nodes cannot reach full Vdd because of the threshold-voltage drop across the nMOS transistors. The feedback transistors pMOS P5 and P6 are included to pull the evaluation nodes up to full Vdd.


TABLE I
TRANSISTOR SIZING FOR ANT [10]

C. Minimal Race Problem in DPANL

A race exists in TSPC, ANL, ANT, and DPANL: output glitches are caused by a race between the discharge of the evaluation node in the logic block and the discharge of the output node by the latch block. Take the ANT circuit in Fig. 2(a) as an example. Assume the output was high during the precharge phase. If the nMOS logic tree is evaluated in the evaluation phase, the output should end up high. However, at the beginning of the evaluation phase, node A and CLK are both high, so the output is discharged through nMOS N4 and N2. After evaluation node A is discharged, nMOS N2 is turned off and pMOS P2 is turned on, and the output is pulled up again, forming a large glitch. This output glitch consumes additional dynamic power.

To minimize the race problem, we need to speed up the discharge of evaluation node A and slow down the discharge of the output. As discussed above, the capacitance at the evaluation node of DPANL is much smaller than that of ANT, so the evaluation node of DPANL discharges much faster. To slow down the discharge of the output, the latch block can be sized so that, subject to equal output rise and fall times, the discharge-path transistors nMOS N4 and N2 are made as weak as possible.
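Why both remedies help can be seen with a first-order model, which is our simplification rather than the authors' analysis: assume the high output decays exponentially through N4 and N2 with time constant τ_out until node A has fallen at time t_A, after which P2 restores it. The glitch depth is then Vdd(1 − e^(−t_A/τ_out)). The timing values below are hypothetical and only show the trends.

```python
import math


def glitch_depth(vdd: float, t_a: float, tau_out: float) -> float:
    """Voltage dip of a high output before node A discharges (first-order RC sketch).

    vdd     -- supply voltage (V)
    t_a     -- time for evaluation node A to fall and turn N2 off (s)
    tau_out -- RC time constant of the output discharge path through N4 and N2 (s)
    """
    v_min = vdd * math.exp(-t_a / tau_out)  # output decays until N2 turns off
    return vdd - v_min


VDD = 3.3  # supply used in the 0.35-um simulations
# Hypothetical timing values, chosen only to illustrate the trends.
baseline = glitch_depth(VDD, t_a=60e-12, tau_out=100e-12)
faster_node_a = glitch_depth(VDD, t_a=30e-12, tau_out=100e-12)    # smaller node-A capacitance
weaker_pulldown = glitch_depth(VDD, t_a=60e-12, tau_out=200e-12)  # weaker N4/N2
print(f"baseline {baseline:.2f} V, faster node A {faster_node_a:.2f} V, "
      f"weaker pull-down {weaker_pulldown:.2f} V")
```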

IV. SIMULATION AND CHIP TESTING RESULTS

The Kogge–Stone graph [9] is generally used for tree-structured carry lookahead adders. It has a regular structure, and the maximum fanout of each cell in each pipeline stage is 2, which leads to high performance. However, it requires many long interconnects, which cost area and thus power. S. Knowles introduced a family of adder structures that offer tradeoffs between performance and power [3]. One of these structures is well suited to pipelined systems because the maximum fanout of each cell in each pipeline stage is 3, and it requires less wiring than the Kogge–Stone graph. We therefore adopted this adder structure for the low-power adder design.
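For reference, the sketch below is a functional model of a Kogge–Stone parallel-prefix carry computation for 32 bits, using the prefix operator whose generate term is $g_i + p_i \cdot g_{i-1}$ (the carry generation cell of Section IV-A, generalized to distance `dist`). It illustrates the graph of [9], not the Knowles structure the paper actually adopts, whose exact wiring is not given here. With 32 bits it needs log2(32) = 5 prefix levels; the function name is hypothetical.

```python
WIDTH = 32


def kogge_stone_add(a: int, b: int, width: int = WIDTH) -> tuple[int, int, int]:
    """Add two unsigned width-bit integers with a Kogge-Stone prefix tree.

    Returns (sum, carry_out, prefix_levels); carry-in is assumed to be 0.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < width:
        newG, newP = G[:], P[:]
        for i in range(dist, width):
            # Combine (G, P) at i with (G, P) at i - dist: G_i + P_i * G_(i-dist)
            newG[i] = G[i] | (P[i] & G[i - dist])
            newP[i] = P[i] & P[i - dist]
        G, P = newG, newP
        dist *= 2
        levels += 1
    # After the last level, G[i] is the carry out of bit position i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1], levels


print(kogge_stone_add(0xFFFFFFFF, 0x00000001))  # -> (0, 1, 5)
```

In this model each node at a given level feeds only itself and the node `dist` positions above it, which is where the fanout of 2 comes from; the price is the long `i - dist` connections at the upper levels.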

A. Prelayout Simulation

For simulation, a 0.35-μm 1P4M CMOS technology with a 3.3-V power supply is used. The delay, area, power consumption, and leakage current of DPANL are normalized to 1 for comparison. Proper transistor sizing has been done for DPANL, ANL, and ANT. Table I shows the transistor sizing for the ANT N1-block [Fig. 2(a)]; this sizing information is taken from the published ANT work [10], which also used a 0.35-μm CMOS process. We used similar transistor sizing for ANL. Table II shows the transistor sizing for the DPANL N1-block [Fig. 3(a)]. Based on Tables I and II, the carry generation cell ($g_i + p_i \cdot g_{i-1}$) using ANT has a total channel width of 84 μm over all transistors, while the carry generation cell using DPANL has only 52 μm. Taking total channel width as the area measure, the DPANL carry generation cell has 38% less area than the ANT cell, even though it has three more transistors.
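The 38% figure follows directly from the two channel-width totals, and the same two numbers give the roughly 1.6× ratio quoted in the post-layout discussion:

$$1 - \frac{52\ \mu\mathrm{m}}{84\ \mu\mathrm{m}} \approx 1 - 0.619 = 0.381 \approx 38\%, \qquad \frac{84\ \mu\mathrm{m}}{52\ \mu\mathrm{m}} \approx 1.62.$$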

We have built three adders using DPANL, ANL, and ANT, respectively. The total channel width of all transistors in an adder is taken as its area. The power consumption of the adders is measured at 1.25 GHz. To make the carry-propagation chain form the critical delay path, the inputs (A31A30 . . . A1A0) + (B31B30 . . . B1B0) are chosen as follows: (00 . . . 00) + (11 . . . 11) and (11 . . . 11) + (00 . . . 01) [2]. Power consumption is measured for these inputs.
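A few lines of Python confirm why these two vectors exercise the critical path: the first produces an all-ones sum in which every bit propagates but no carry is generated, and the second then launches a carry at bit 0 that ripples through all 32 positions. The helper name `carry_chain_length` is hypothetical, and this is a functional check, not a timing model.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def carry_chain_length(a: int, b: int, width: int = WIDTH) -> int:
    """Longest run of consecutive bit positions that all produce a carry-out."""
    longest = run = carry = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | ((ai ^ bi) & carry)  # ripple-carry recurrence
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest


# Critical-delay vectors (A, B) from the text, applied on consecutive cycles.
vectors = [(0x00000000, 0xFFFFFFFF), (0xFFFFFFFF, 0x00000001)]
for a, b in vectors:
    msb = ((a + b) & MASK) >> (WIDTH - 1)
    print(f"{a:08x} + {b:08x}: MSB = {msb}, carry chain = {carry_chain_length(a, b)} bits")
```

Alternating the two vectors also makes the sum's MSB toggle between 1 and 0 on successive cycles, which is the behavior used later to check the fabricated chip.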

TABLE II
TRANSISTOR SIZING FOR DPANL

TABLE III
SIMULATION RESULTS OF THE THREE ADDERS

Fig. 4. (a) Adder floorplan. (b) H-tree clock distribution.

As shown in Table III, ANT is slower than ANL even though it has the feedback transistor pair. This is because there are only two series nMOS transistors in the adder cell circuit, so the evaluation node can be discharged quickly. The feedback transistor pair is therefore not only ineffective but also increases the capacitance at the evaluation node. ANT can be faster than ANL when the number of series nMOS transistors is larger. ANT also consumes more area and power than ANL.

Table III shows that the DPANL adder can operate at frequencies up to 2.1 GHz. It is 31.3% and 27.3% faster than the ANT adder and the ANL adder, respectively. The DPANL adder consumes 32.8% less area and 29.2% less power than the ANT adder, and 17.8% less area and 15.4% less power than the ANL adder. The leakage current of the DPANL adder is also smaller than that of the ANT and ANL adders, even though the DPANL circuit has two evaluation paths. This is because the total channel width of all transistors in the DPANL carry generation cell is smaller than that of the ANT and ANL cells.

B. Postlayout Simulation

Fig. 4(a) shows the floorplan of the adder. The inputs are fed to the left side of the adder. There are five stages of pi, gi generation after the p, g generation stage. The sum stage generates the outputs on the right side of the adder. In the layout, the clock signal is fed to the top of the adder, and the buffered clock signal is fed to the center of the adder. The clock signal then propagates to all the cells in the adder through an H-tree. A four-level H-tree clock distribution is used in the adder, while Fig. 4(b) shows a two-level H-tree. Fig. 5 shows the adder layout.

Table IV shows the post-layout simulation results for the 32-bit DPANL adder and the 32-bit ANT adder. The process for both adders is the TSMC 0.35-μm 1P4M CMOS process, and Vdd is 3.3 V. The


Fig. 5. DPANL adder layout.

TABLE IV
POSTLAYOUT SIMULATION RESULTS

highest clock frequency at which the DPANL adder operates correctly is 1.85 GHz. This is lower than the 2.1-GHz clock frequency predicted by the prelayout simulation because routing capacitances are now included. The layout area of the DPANL adder is 0.7 mm², and its power consumption at a 1.85-GHz clock frequency is 1 W. The ANT adder can run at up to 1.25 GHz; it is slower than the DPANL adder because of the larger capacitance at the evaluation node of its dynamic circuits. The layout area of the ANT adder is 1.86 mm², about 2.7 times that of the DPANL adder. Although the DPANL carry generation cell has more transistors than the ANT carry generation cell, the total channel width of all transistors in the ANT cell is 1.6 times that in the DPANL cell. Also, our manual layout yielded a smaller area than the place-and-route layout of the ANT adder, which was done with EDA tools [10].
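The headline post-layout comparisons can be reproduced from the numbers quoted above. Reading "faster" as the reduction in minimum clock period, which is our interpretation, gives the 32.4% stated in the abstract; the energy per cycle is simply power divided by clock frequency.

```python
# Post-layout figures quoted in the text.
f_dpanl_hz, f_ant_hz = 1.85e9, 1.25e9
area_dpanl_mm2, area_ant_mm2 = 0.7, 1.86
power_dpanl_w = 1.0

period_dpanl_ns = 1e9 / f_dpanl_hz
period_ant_ns = 1e9 / f_ant_hz
# Reading "faster" as the reduction in minimum clock period (an interpretation).
period_reduction = 1 - period_dpanl_ns / period_ant_ns
area_ratio = area_ant_mm2 / area_dpanl_mm2
energy_per_cycle_nj = power_dpanl_w / f_dpanl_hz * 1e9

print(f"minimum period: {period_dpanl_ns:.3f} ns (DPANL) vs {period_ant_ns:.3f} ns (ANT)")
print(f"period reduction: {period_reduction:.1%}")
print(f"ANT/DPANL layout area: {area_ratio:.2f}x")
print(f"DPANL energy per clock cycle: {energy_per_cycle_nj:.2f} nJ")
```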

C. Chip Testing

The functionality of the adder chip was verified using an HP80000 data generator and an oscilloscope. It was not possible to feed a 1.85-GHz clock signal into the chip because of the large capacitance at the chip package pins. To verify the DPANL adder performance, we applied a 1.6-V Vdd, whereas the nominal Vdd for the TSMC 0.35-μm process is 3.3 V. Post-layout simulation showed that the DPANL adder should work at clock frequencies up to 200 MHz with a 1.6-V Vdd. The critical-delay inputs were chosen as follows: (00 . . . 00) + (11 . . . 11) and (11 . . . 11) + (00 . . . 01) [2]. Chip measurements confirmed correct adder operation at a 200-MHz clock frequency. Fig. 6 shows the measured most significant bit (MSB) of the adder output. With the critical-delay inputs, the MSB output is 1 for one clock cycle and 0 for the next. The MSB output frequency is 100 MHz, exactly half the clock frequency, as expected. The measured chip power consumption at a 200-MHz clock frequency and a 1.6-V Vdd was 80 mW.

Fig. 6. Measured MSB output.

D. Discussion

Although supply-voltage scaling is the most effective way to reduce power consumption, transistor threshold voltages must also be scaled down to meet performance requirements. However, lowering the transistor threshold voltage causes exponential growth of the subthreshold leakage current. In deep-submicron processes, the floating evaluation node and output node of DPANL logic may be discharged by leakage currents. A keeper similar to that used in domino circuits could be applied to preserve the noise margin of the evaluation node, and back-to-back inverters could be used to preserve the noise margin of the output node.
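The exponential sensitivity can be illustrated with the standard first-order subthreshold model I_sub ∝ 10^((V_GS − V_th)/S), where S is the subthreshold swing. The 90 mV/decade swing and the threshold values below are assumptions for illustration, not figures from this paper.

```python
def leakage_ratio(vth_old: float, vth_new: float, swing_v_per_decade: float = 0.090) -> float:
    """Growth factor of off-state subthreshold leakage when V_th is lowered.

    First-order model: I_sub ~ 10 ** ((V_GS - V_th) / S) with V_GS = 0, so the
    ratio depends only on the V_th change. The 90 mV/decade default for S is an assumption.
    """
    return 10 ** ((vth_old - vth_new) / swing_v_per_decade)


# Hypothetical example: each 90 mV of threshold reduction costs about 10x leakage.
for vth_new in (0.45, 0.36, 0.27):
    print(f"V_th = {vth_new:.2f} V -> {leakage_ratio(0.45, vth_new):.0f}x the leakage at 0.45 V")
```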

We also simulated the DPANL adder using 0.13-μm CMOS SPICE parameters with a Vdd of 1.2 V. The simulation results show that the DPANL adder can operate at up to 5.4 GHz with a power consumption of 170 mW.

V. CONCLUSION

In this paper, we have proposed and analyzed the DPANL dynamic circuit, which is suitable for high-performance, low-power pipelined systems. DPANL has smaller capacitance at each evaluation node, and its race problem is minimal. DPANL outperforms ANL and ANT in performance, area, and power consumption. The functionality and performance of a 32-bit CLA adder using the proposed circuit have been verified through chip fabrication and testing.

ACKNOWLEDGMENT

The authors would like to thank Prof. A. Shakauri, University of California, Santa Cruz, for providing much help with the chip testing.

REFERENCES

[1] C.-C. Wang, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A 1.25 GHz 32-bit tree-structured carry lookahead adder,” in Proc. 2001 IEEE Int. Symp. Circuits and Systems, vol. 4, 2001, pp. 80–83.

[2] K.-H. Cheng, W.-S. Lee, and Y.-C. Huang, “A 1.2 V 500 MHz 32-bit carry-lookahead adder,” in Proc. 8th IEEE Int. Conf. Electronics, Circuits and Systems, vol. 2, 2001, pp. 765–768.

[3] S. Knowles, “A family of adders,” in Proc. 15th IEEE Symp. Computer Arithmetic, 2001, pp. 277–281.

[4] N. F. Goncalves and H. J. De Man, “NORA: A race-free dynamic CMOS technology for pipelined logic structures,” IEEE J. Solid-State Circuits, vol. SSC–18, no. 6, pp. 261–266, Jun. 1983.

[5] C. M. Lee and E. W. Szeto, “Zipper CMOS,” IEEE Circuits Devices Mag., vol. 2, no. 3, pp. 10–16, May 1986.

[6] J. Yuan and C. Svensson, “High-speed CMOS circuit technique,” IEEE J. Solid-State Circuits, vol. 24, no. 2, pp. 62–70, Feb. 1989.

[7] R. X. Gu and M. I. Elmasry, “All-N-logic high-speed true-single-phase dynamic CMOS logic,” IEEE J. Solid-State Circuits, vol. 31, no. 2, pp. 221–229, Feb. 1996.


[8] G. Yang, S. O. Jung, S. H. Kim, and S. M. Kang, “A low-power 2.1 GHz 32-bit carry lookahead adder using dual path all-N-logic,” in Proc. 45th IEEE Int. Midwest Symp. Circuits and Systems, vol. 2, 2002, pp. 298–301.

[9] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786–793, Aug. 1973.

[10] C.-C. Wang, Y.-L. Tseng, P.-M. Lee, R.-C. Lee, and C.-J. Huang, “A 1.25 GHz 32-bit tree-structured carry lookahead adder using modified ANT logic,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 9, pp. 1208–1216, Sep. 2003.

Function-Based Compact Test Pattern Generation for Path Delay Faults

Maria K. Michael and Spyros Tragoudas

Abstract—We present a function-based, nonenumerative automatic test pattern generation (ATPG) methodology for detecting path delay faults (PDFs). The proposed technique consists of a number of topological circuit traversals, during each of which a linear number of Boolean functions is generated per circuit line. From each such function we derive a test that detects many PDFs. The two major strengths of the approach, which stem from the function-based formulations used, are very compact test sets and scalability in test efficiency. The performance of an implementation based on binary decision diagrams is evaluated and compared with existing compaction methods to demonstrate the superiority of the proposed method.

Index Terms—Automatic test pattern generation (ATPG), binary decision diagram (BDD), Boolean/algebraic test generation, delay faults, nonenumerative, test compaction, test efficiency, testing.

I. INTRODUCTION

Automatic test pattern generation (ATPG) for path delay faults (PDFs) is an important problem that has been considered in [1], [2], [4]–[6], and [8]–[13], among others. Under the PDF model, a fault is a sequence of falling or rising transitions along a physical path from a primary input to a primary output of the circuit. A pair of patterns must be applied to test each PDF. In this work, we consider combinational and fully enhanced-scan sequential circuits.

In traditional enumerative methods, such as [1] and [4], the ATPG process is applied on a fault-by-fault basis. To avoid examining all PDFs, whose number can be exponential, many enumerative methods consider only the longest paths. However, such restrictions remain enumerative, since the number of examined paths in many circuits remains prohibitively large. The work in [5] suggests examining subpaths (segments) instead of paths. In the strict sense of the definition, this approach cannot be classified as path-enumerative. However, it does not guarantee a polynomial bound on the number of examined subpaths, since that number is a linear fraction of the total number of PDFs.
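To see why enumerating every PDF quickly becomes hopeless, note that each physical path carries two PDFs (rising and falling), while the number of paths can grow exponentially with circuit depth. The sketch counts paths in a circuit DAG by dynamic programming over a topological order, without enumerating them; the "ladder" of reconvergent fanouts is a hypothetical example circuit, and the function names are illustrative.

```python
from graphlib import TopologicalSorter


def count_pdfs(fanout: dict[str, list[str]], inputs: list[str], outputs: list[str]) -> int:
    """Count path delay faults as 2 x (number of primary-input-to-primary-output paths)."""
    preds: dict[str, set[str]] = {}
    for u, vs in fanout.items():
        preds.setdefault(u, set())
        for v in vs:
            preds.setdefault(v, set()).add(u)
    paths = {node: (1 if node in inputs else 0) for node in preds}
    # Predecessors come first in a topological order, so paths[node] is final when visited.
    for node in TopologicalSorter(preds).static_order():
        for v in fanout.get(node, []):
            paths[v] += paths[node]
    return 2 * sum(paths[o] for o in outputs)  # one rising and one falling PDF per path


def ladder(k: int) -> dict[str, list[str]]:
    """Hypothetical circuit: k stages, each a fanout n_i -> {a_i, b_i} reconverging at n_(i+1)."""
    g: dict[str, list[str]] = {}
    for i in range(k):
        g[f"n{i}"] = [f"a{i}", f"b{i}"]
        g[f"a{i}"] = [f"n{i + 1}"]
        g[f"b{i}"] = [f"n{i + 1}"]
    return g


for k in (4, 16, 32):
    print(k, "stages:", count_pdfs(ladder(k), ["n0"], [f"n{k}"]), "PDFs")
```

The counting itself is linear in circuit size, but the count it returns doubles with every stage, which is exactly why fault-by-fault ATPG cannot keep up.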

Manuscript received September 11, 2003; revised June 12, 2004. This work was supported in part by a grant from Intel Corporation.

M. K. Michael is with the Department of Electrical and Computer Engineering, University of Cyprus, 1678 Nicosia, Cyprus.

S. Tragoudas is with the Electrical and Computer Engineering Department, Southern Illinois University, Carbondale, IL 62901 USA.

Digital Object Identifier 10.1109/TVLSI.2005.853607

Nonenumerative ATPG approaches [6], [11] were proposed to overcome the problem of path enumeration. Both approaches use graph-theoretic arguments and build on PODEM-like fault propagation methods along selected paths in the circuit. Unfortunately, the fault coverage of both methodologies is very low. Their test efficiency (number of detected faults per generated test) is also quite low. More importantly, neither method addresses scalability. In our context, scalability refers to the ability of an approach to maintain its test efficiency as the number of targeted PDFs increases. A major difference between the proposed method and the approaches in [6] and [11] is that we use function-based techniques to generate the tests. Function-based ATPG methods for PDFs have also been proposed in [1], among others, but all these approaches are fault enumerative.

Apart from the nonenumerative techniques in [6] and [11], other procedures that explicitly target the generation of compact test sets for PDFs were proposed in [2], [12], and [13]. The test compaction procedure of [2], as well as the more recent one in [12], uses the concept of primary and secondary target faults. Once a test is found for a primary fault, it is expanded so that it also detects one or more secondary faults. The level of compaction in both techniques depends greatly on the selection order of the primary and secondary faults. A slightly different concept, that of finding maximal sets of potentially compatible faults, is used in [13]. Even though they may not target all faults explicitly, the above methods remain enumerative, since they are based on the principle of first targeting a single fault and then attempting to find one or more faults that can be tested together with the original fault.

The proposed ATPG tool is called NEAT, for Non-Enumerative ATPG. The approach consists of simple topological circuit traversals whose number is linear in the number of primary inputs. During each traversal, a user-defined (constant) number of appropriately formulated Boolean functions is maintained per circuit line. Each such function, which we call a test function, is guaranteed to sensitize many subpaths from a primary input up to the line. When a circuit traversal is completed, tests that detect several PDFs originating from some primary input are generated. The work presented here builds upon the ATPG scenario introduced in [8], which did not guarantee hazard-free robust test generation. The current work extends [8] by introducing a complete, systematic, and scalable framework that can be used to generate all types of tests for PDFs (robust, nonrobust, and functionally sensitizable). NEAT also attempts to maintain the test efficiency as more tests are generated. A new dynamic compaction technique, whose performance is boosted by the fact that we implicitly maintain very large sets of tests in the form of test functions, assists in this goal.

A circuit is represented as a directed graph, denoted by $G$. The subcircuit of $G$ induced by primary input $I$ is denoted by $G_I$; it also contains all lines of $G$ that are not driven by $I$ but immediately drive some node in $G_I$. We call such lines the supporting points of $G_I$. The controlling (noncontrolling) value of a gate $g$ is denoted by $cv(g)$ ($ncv(g)$) $\in \{0, 1\}$. A transition is designated by $tr \in \{r, f\}$, where $r$ = rising and $f$ = falling. The positive (negative) cofactor of a Boolean function $f$ with respect to variable $x$ is denoted by $f_x$ ($f_{\bar{x}}$), where $f_x = f|_{x=1}$ ($f_{\bar{x}} = f|_{x=0}$).
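The two definitions above can be illustrated with a short sketch. It simplifies the graph model by identifying each line with the node that drives it, and it represents Boolean functions as plain Python callables rather than the BDDs used in the implementation; all names, and the three-gate example circuit, are hypothetical. The cofactor part checks Shannon's expansion $f = x \cdot f_x + \bar{x} \cdot f_{\bar{x}}$ exhaustively.

```python
from itertools import product
from typing import Callable


def induced_subcircuit(fanout: dict[str, list[str]], pi: str) -> tuple[set[str], set[str]]:
    """Return (G_I, supporting points) for primary input `pi`.

    Simplification: each line is identified with the node that drives it.
    G_I = transitive fanout of `pi` plus its supporting points, i.e. lines not
    driven by `pi` that immediately drive some node in the fanout cone.
    """
    cone, stack = {pi}, [pi]
    while stack:
        u = stack.pop()
        for v in fanout.get(u, []):
            if v not in cone:
                cone.add(v)
                stack.append(v)
    support = {u for u, vs in fanout.items() if u not in cone and any(v in cone for v in vs)}
    return cone | support, support


def cofactor(f: Callable[..., bool], index: int, value: bool) -> Callable[..., bool]:
    """f restricted to x_index = value: positive cofactor if value is True, else negative."""
    def restricted(*args: bool) -> bool:
        fixed = list(args)
        fixed[index] = value
        return f(*fixed)
    return restricted


def shannon_holds(f: Callable[..., bool], n: int, index: int) -> bool:
    """Check f = x * f_x + x' * f_x' over all 2^n input assignments."""
    f_pos, f_neg = cofactor(f, index, True), cofactor(f, index, False)
    return all(
        f(*bits) == ((bits[index] and f_pos(*bits)) or (not bits[index] and f_neg(*bits)))
        for bits in product([False, True], repeat=n)
    )


# Hypothetical example circuit: g1 = AND(a, b), g2 = OR(g1, c), g3 = NOT(c).
fanout = {"a": ["g1"], "b": ["g1"], "c": ["g2", "g3"], "g1": ["g2"]}
g_a, support_a = induced_subcircuit(fanout, "a")
print(sorted(g_a), sorted(support_a))


def f(a: bool, b: bool, c: bool) -> bool:
    return (a and b) or (not c)  # f = ab + c'


print(shannon_holds(f, 3, 0))
```

In the example, $G_a$ contains the cone {a, g1, g2} plus the supporting points b and c, while g3, which lies outside the cone of a, is excluded.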

Let gate $g$ be on a PDF. An input of $g$ is either an on-input, which assumes a certain transition to be propagated, or an off-input, which assumes a value to be justified. We use the PDF classification of [4], which categorizes PDF tests into robust, nonrobust, functional sensitizable, and functional unsensitizable. Table I shows the constraints of
