10
78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015 Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block I-Chyn Wey, Chien-Chang Peng, and Feng-Yu Liao Abstract—In this paper, we propose a reliable low-power multiplier design by adopting algorithmic noise tolerant (ANT) architecture with the fixed-width multiplier to build the reduced precision replica redundancy block (RPR). The proposed ANT architecture can meet the demand of high precision, low power consumption, and area efficiency. We design the fixed-width RPR with error compensation circuit via analyzing of probability and statistics. Using the partial product terms of input correction vector and minor input correction vector to lower the truncation errors, the hardware complexity of error compensation circuit can be simplified. In a 12 × 12 bit ANT multiplier, circuit area in our fixed-width RPR can be lowered by 44.55% and power consumption in our ANT design can be saved by 23% as compared with the state-of-art ANT design. Index Terms— Algorithmic noise tolerant (ANT), fixed-width multiplier, reduced-precision replica (RPR), voltage overscaling (VOS). I. I NTRODUCTION T HE RAPID growth of portable and wireless computing systems in recent years drives the need for ultralow power systems. To lower the power dissipation, supply voltage scal- ing is widely used as an effective low-power technique since the power consumption in CMOS circuits is proportional to the square of supply voltage [1]. However, in deep-submicrometer process technologies, noise interference problems have raised difficulty to design the reliable and efficient microelectronics systems; hence, the design techniques to enhance noise toler- ance have been widely developed [2]–[12]. An aggressive low-power technique, referred to as voltage overscaling (VOS), was proposed in [4] to lower supply voltage beyond critical supply voltage without sacrificing the throughput. However, VOS leads to severe degradation in signal-to-noise ratio (SNR). A novel algorithmic noise tolerant (ANT) technique [2] combined VOS main block with reduced-precision replica (RPR), which combats soft errors effectively while achieving significant energy saving. Some ANT deformation designs are presented in [5]–[9] and the ANT design concept is further extended to system level Manuscript received March 22, 2013; revised October 4, 2013; accepted January 8, 2014. Date of publication February 13, 2014; date of current version January 16, 2015. This work was supported by the National Science Council of Taiwan under Grant NSC-101-2628-E-182-002-MY2. I-C. Wey is with the Graduate Institute of Electrical Engineering, Electri- cal Engineering Department, Green Technology Research Center, Healthy- Aging Research Center, School of Electrical and Computer Engineering, College of Engineering, Chang Gung University, Tao-Yuan 333, Taiwan (e-mail: [email protected]). C.-C. Peng and F.-Y. Liao are with the Graduate Institute of Electri- cal Engineering, Chang Gung University, Tao-Yuan 333, Taiwan (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TVLSI.2014.2303487 Fig. 1. ANT architecture [2]. in [10]. However, the RPR designs in the ANT designs of [5]–[7] are designed in a customized manner, which are not easily adopted and repeated. The RPR designs in the ANT designs of [8] and [9] can operate in a very fast manner, but their hardware complexity is too complex. As a result, the RPR design in the ANT design of [2] is still the most popular design because of its simplicity. However, adopting with RPR in [2] should still pay extra area overhead and power consumption. In this paper, we further proposed an easy way using the fixed-width RPR to replace the full-width RPR block in [2]. Using the fixed-width RPR, the computation error can be corrected with lower power consumption and lower area overhead. We take use of probability, statistics, and partial product weight analysis to find the approximate compensation vector for a more precise RPR design. In order not to increase the critical path delay, we restrict the compensation circuit in RPR must not be located in the critical path. As a result, we can realize the ANT design with smaller circuit area, lower power consumption, and lower critical supply voltage. II. ANT ARCHITECTURE DESIGNS The ANT technique [2] includes both main digital signal processor (MDSP) and error correction (EC) block, as shown in Fig. 1. To meet ultralow power demand, VOS is used in MDSP. However, under the VOS, once the crit- ical path delay T cp of the system becomes greater than the sampling period T samp , the soft errors will occur. It leads to severe degradation in signal precision. In the ANT tech- nique [2], a replica of the MDSP but with reduced preci- sion operands and shorter computation delay is used as EC block. Under VOS, there are a number of input-dependent soft errors in its output y a [n]; however, RPR output y r [n] is still correct since the critical path delay of the replica 1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015

Reliable Low-Power Multiplier Design UsingFixed-Width Replica Redundancy Block

I-Chyn Wey, Chien-Chang Peng, and Feng-Yu Liao

Abstract— In this paper, we propose a reliable low-powermultiplier design by adopting algorithmic noise tolerant (ANT)architecture with the fixed-width multiplier to build the reducedprecision replica redundancy block (RPR). The proposed ANTarchitecture can meet the demand of high precision, low powerconsumption, and area efficiency. We design the fixed-width RPRwith error compensation circuit via analyzing of probability andstatistics. Using the partial product terms of input correctionvector and minor input correction vector to lower the truncationerrors, the hardware complexity of error compensation circuitcan be simplified. In a 12 × 12 bit ANT multiplier, circuitarea in our fixed-width RPR can be lowered by 44.55% andpower consumption in our ANT design can be saved by 23% ascompared with the state-of-art ANT design.

Index Terms— Algorithmic noise tolerant (ANT), fixed-widthmultiplier, reduced-precision replica (RPR), voltage overscaling(VOS).

I. INTRODUCTION

THE RAPID growth of portable and wireless computingsystems in recent years drives the need for ultralow power

systems. To lower the power dissipation, supply voltage scal-ing is widely used as an effective low-power technique sincethe power consumption in CMOS circuits is proportional to thesquare of supply voltage [1]. However, in deep-submicrometerprocess technologies, noise interference problems have raiseddifficulty to design the reliable and efficient microelectronicssystems; hence, the design techniques to enhance noise toler-ance have been widely developed [2]–[12].

An aggressive low-power technique, referred to as voltageoverscaling (VOS), was proposed in [4] to lower supplyvoltage beyond critical supply voltage without sacrificingthe throughput. However, VOS leads to severe degradationin signal-to-noise ratio (SNR). A novel algorithmic noisetolerant (ANT) technique [2] combined VOS main blockwith reduced-precision replica (RPR), which combats softerrors effectively while achieving significant energy saving.Some ANT deformation designs are presented in [5]–[9] andthe ANT design concept is further extended to system level

Manuscript received March 22, 2013; revised October 4, 2013; acceptedJanuary 8, 2014. Date of publication February 13, 2014; date of current versionJanuary 16, 2015. This work was supported by the National Science Councilof Taiwan under Grant NSC-101-2628-E-182-002-MY2.

I-C. Wey is with the Graduate Institute of Electrical Engineering, Electri-cal Engineering Department, Green Technology Research Center, Healthy-Aging Research Center, School of Electrical and Computer Engineering,College of Engineering, Chang Gung University, Tao-Yuan 333, Taiwan(e-mail: [email protected]).

C.-C. Peng and F.-Y. Liao are with the Graduate Institute of Electri-cal Engineering, Chang Gung University, Tao-Yuan 333, Taiwan (e-mail:[email protected]; [email protected]).

Digital Object Identifier 10.1109/TVLSI.2014.2303487

Fig. 1. ANT architecture [2].

in [10]. However, the RPR designs in the ANT designs of[5]–[7] are designed in a customized manner, which are noteasily adopted and repeated. The RPR designs in the ANTdesigns of [8] and [9] can operate in a very fast manner,but their hardware complexity is too complex. As a result,the RPR design in the ANT design of [2] is still the mostpopular design because of its simplicity. However, adoptingwith RPR in [2] should still pay extra area overhead and powerconsumption. In this paper, we further proposed an easy wayusing the fixed-width RPR to replace the full-width RPR blockin [2]. Using the fixed-width RPR, the computation error canbe corrected with lower power consumption and lower areaoverhead. We take use of probability, statistics, and partialproduct weight analysis to find the approximate compensationvector for a more precise RPR design. In order not to increasethe critical path delay, we restrict the compensation circuit inRPR must not be located in the critical path. As a result, wecan realize the ANT design with smaller circuit area, lowerpower consumption, and lower critical supply voltage.

II. ANT ARCHITECTURE DESIGNS

The ANT technique [2] includes both main digitalsignal processor (MDSP) and error correction (EC) block,as shown in Fig. 1. To meet ultralow power demand, VOSis used in MDSP. However, under the VOS, once the crit-ical path delay Tcp of the system becomes greater than thesampling period Tsamp, the soft errors will occur. It leadsto severe degradation in signal precision. In the ANT tech-nique [2], a replica of the MDSP but with reduced preci-sion operands and shorter computation delay is used as ECblock. Under VOS, there are a number of input-dependentsoft errors in its output ya[n]; however, RPR output yr [n]is still correct since the critical path delay of the replica

1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 79

Fig. 2. Proposed ANT architecture with fixed-width RPR.

is smaller than Tsamp [4]. Therefore, yr [n] is applied todetect errors in the MDSP output ya[n]. Error detection isaccomplished by comparing the difference |ya[n] − yr [n]|against a threshold Th. Once the difference between ya[n]and yr [n] is larger than Th, the output y[n] is yr [n] insteadof ya[n]. As a result, y[n] can be expressed as

y[n] ={

ya [n] , if |ya [n] − yr [n]| ≤ T h

yr [n] , if |ya [n] − yr [n]| > T h.

(1)

Th is determined by

T h = max∀input|yo [n] − yr [n]| (2)

where yo[n] is error free output signal. In this way, the powerconsumption can be greatly lowered while the SNR can stillbe maintained without severe degradation [2].

III. PROPOSED ANT MULTIPLIER DESIGN

USING FIXED-WIDTH RPR

In this paper, we further proposed the fixed-width RPR toreplace the full-width RPR block in the ANT design [2],as shown in Fig. 2, which can not only provide highercomputation precision, lower power consumption, and lowerarea overhead in RPR, but also perform with higher SNR,more area efficient, lower operating supply voltage, and lowerpower consumption in realizing the ANT architecture. Wedemonstrate our fixed-width RPR-based ANT design in anANT multiplier.

The fixed-width designs are usually applied in DSP appli-cations to avoid infinite growth of bit width. Cutting off n-bitleast significant bit (LSB) output is a popular solution to con-struct a fixed-width DSP with n-bit input and n-bit output. Thehardware complexity and power consumption of a fixed-widthDSP is usually about half of the full-length one. However,truncation of LSB part results in rounding error, which needsto be compensated precisely. Many literatures [13]–[22] havebeen presented to reduce the truncation error with constantcorrection value [13]–[15] or with variable correction value[16]–[22]. The circuit complexity to compensate with constantcorrected value can be simpler than that of variable correctionvalue; however, the variable correction approaches are usuallymore precise.

In [16]–[22], their compensation method is to compensatethe truncation error between the full-length multiplier and

the fixed-width multiplier. However, in the fixed-width RPRof an ANT multiplier, the compensation error we need tocorrect is the overall truncation error of MDSP block. Unlike[16]–[22], our compensation method is to compensate thetruncation error between the full-length MDSP multiplierand the fixed-width RPR multiplier. In nowadays, there aremany fixed-width multiplier designs applied to the full-widthmultipliers. However, there is still no fixed-width RPR designapplied to the ANT multiplier designs.

To achieve more precise error compensation, we compensatethe truncation error with variable correction value. We con-struct the error compensation circuit mainly using the partialproduct terms with the largest weight in the least significantsegment. The error compensation algorithm makes use ofprobability, statistics, and linear regression analysis to findthe approximate compensation value [16]. To save hardwarecomplexity, the compensation vector in the partial productterms with the largest weight in the least significant segmentis directly inject into the fixed-width RPR, which does notneed extra compensation logic gates [17]. To further lower thecompensation error, we also consider the impact of truncatedproducts with the second most significant bits on the errorcompensation. We propose an error compensation circuit usinga simple minor input correction vector to compensation theerror remained. In order not to increase the critical path delay,we locate the compensation circuit in the noncritical path ofthe fixed-width RPR. As compared with the full-width RPRdesign in [15], the proposed fixed-width RPR multiplier notonly performs with higher SNR but also with lower circuitryarea and lower power consumption.

A. Proposed Precise Error Compensation Vector forFixed-Width RPR Design

In the ANT design, the function of RPR is to correct theerrors occurring in the output of MDSP and maintain the SNRof whole system while lowering supply voltage. In the caseof using fixed-width RPR to realize ANT architecture, wenot only lower circuit area and power consumption, butalso accelerate the computation speed as compared with theconventional full-length RPR. However, we need to compen-sate huge truncation error due to cutting off many hardwareelements in the LSB part of MDSP.

In the MDSP of n-bit ANT Baugh–Wooley array multiplier,its two unsigned n-bit inputs of X and Y can be expressed as

X =n−1∑i=0

xi · 2i , Y =n−1∑j=0

y j · 2 j . (3)

The multiplication result P is the summation of partial prod-ucts of xi y j , which is expressed as

P =2n−1∑k=0

pk · 2k =n−1∑j=0

n−1∑i=0

xi y j · 2i+ j . (4)

The (n/2)-bit unsigned full-width Baugh–Wooley partial prod-uct array can be divided into four subsets, which are mostsignificant part (MSP), input correction vector [ICV(β)], minor

Page 3: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

80 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015

Fig. 3. 12 × 12 bit ANT multiplier is implemented with the six-bit fixed-width replica redundancy block.

ICV [MICV(α)], and LSP, as shown in Fig. 3. In the fixed-width RPR, only MSP part is kept and the other partsare removed. Therefore, the other three parts of ICV(β),MICV(α), and LSP are called as truncated part. The truncatedICV(β) and MICV(α) are the most important parts becauseof their highest weighting. Therefore, they can be applied toconstruct the truncation error compensation algorithm.

To evaluate the accuracy of a fixed-width RPR, we canexploit the difference between the (n/2)-bit fixed-width RPRoutput and the 2n-bit full-length MDSP output, which isexpressed as

ε = P − Pt (5)

where P is the output of the complete multiplier in MDSPand Pt is the output of the fixed-width multiplier in RPR.Pt can be expressed as

Pt =n−1∑

j= n2 +1

y j 2j

n−1∑i= 3n

2 − j

xi 2i

+ f(

xn−1 y n2, xn−2 y n

2 +1, xn−3 y n2 +2, . . . , x n

2y n

2 +2

)+ f

(xn−2 y n

2, xn−3 y n

2 +1, xn−4 y n2 +2, . . . , x n

2yn−2

)

=n−1∑

j= n2 +1

y j 2j

n−1∑i= 3n

2 − j

xi 2i + f (ICV) + f (MICV)

=n−1∑

j= n2 +1

y j 2j

n−1∑i= 3n

2 − j

xi 2i + f (EC) (6)

where f (EC) is the error compensation function, f (ICV)is the error compensation function contributed by the inputcorrection vector ICV(β), and f (MICV) is the error compen-sation function contributed by minor input correction vectorMICV(α).

The source of errors generated in the fixed-width RPR isdominated by the bit products of ICV since they have thelargest weight. In [8], it is reported that a low-cost EC circuitcan be designed easily if a simple relationship between f (EC)

Fig. 4. Statistical curves of average truncation error in the LSP block andthe curves of compensation function with β − 1, β, and β + 1 in the 12-bitfixed-width RPR-based ANT multiplier.

Fig. 5. Analysis of absolute average compensation error under variousβ values in the 12-bit fixed-width RPR-based ANT multiplier.

and β is found. It is noted that β is the summation of allpartial products of ICV. By statistically analyzing the truncateddifference between MDSP and fixed-width RPR with uniforminput distribution, we can find the relationship between f (EC)and β. As shown in Fig. 4, the statistical results show that theaverage truncation error in the fixed-width RPR multiplier isapproximately distributed between β and β+1. More precisely,as β = 0, the average truncation error is close to β + 1. Asβ > 0, the average truncation error is very close to β. If we canselect β as the compensation vector, the compensation vectorcan directly inject into the fixed-width RPR as compensation,which does not need extra compensation logic gates [17].

We go further to analyze the compensation precision byselecting β as the compensation vector. We can find thatthe absolute average error in β = 0 is much larger thanthat in other β cases, as shown in Fig. 5. Moreover, theabsolute average error in β = 0 is larger than 0.5∗2(3n/2),while the absolute average error in other β situations issmaller than 0.5∗2(3n/2). Therefore, we can apply multipleinput error compensation vectors to further enhance the errorcompensation precision. For the β > 0 case, we can still selectβ as the compensation vector. For the β = 0 case, we selectβ + 1 combining with MICV as the compensation vector.

Before directly injecting the compensation vector β into thefixed-width RPR, we go further to double check the weight forthe partial product terms in ICV with the same partial productsummation value β but with different locations. As shown in

Page 4: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 81

TABLE I

RELATION BETWEEN THE PARTIAL PRODUCT TERM’S LOCATION IN

ICV AND THE STATISTICAL COMPENSATION ERROR EAVG

Fig. 6. Analysis of average positive and negative compensation error undervarious β values in the 12-bit fixed-width RPR-based ANT multiplier.

Table I, the average error value for each ICV vector with thesame partial product term summation value is nearly the sameeven their partial product term’s location is different. That isto say that no matter ICV = (1,0,0,0,0,0), ICV = (0,1,0,0,0,0),ICV = (0, 0, 1, 0, 0, 0), ICV = (0, 0, 0, 1, 0, 0), ICV =(0, 0, 0, 0, 1, 0), or ICV = (0, 0, 0, 0, 0, 1), their weight ineach partial product term for truncation error compensationis nearly the same. Therefore, we apply the same weight ofunity to each input correction vector element. This conclusionis beneficial for us to inject the compensation vector β into thefixed-width RPR directly. In this way, no extra compensationlogic gates are needed for this part compensation and onlywire connections are needed.

For the β = 0 case, we go further to analyze the errorprofile in the ICV and MICV. In ICV, we can find that all thetruncation errors are positive when β = 0, as shown in Fig. 6.

It implies us that if we adopt the multiple compensationvectors for the average compensation error terms are largerthan 0.5∗2(3n/2), we can lower the compensation error effec-tively and no additional compensation error will be generated.The multiple compensation vectors are constructed by ICV(β)

Fig. 7. Analysis of absolute average compensation error under various βlvalues while β = 0 in the 12-bit fixed-width RPR-based ANT multiplier.

combined with MICV(α). The weight of MICV(α) is only halfof ICV(β). The summation of all partial products of MICV(α),which is denoted as βl , have four possible values of 0, 1,2, and 3 as n = 12 and β = 0. In Fig. 7, the statisticalresults show that the average truncation error contributed bythe MICV in the case of β = 0 is approximately proportionalto α. Moreover, the absolute average truncation error in thesituation of α = 0 is smaller than 0.5∗2(3n/2), while theabsolute average truncation error in the situation of βl > 0is larger than 0.5∗2(3n/2). For the case of the absolute averagetruncation error is smaller than 0.5∗2(3n/2), βl = 0, selectingβ as the compensation vector is suitable. However, for thecase of the absolute average truncation error is larger than0.5∗2(3n/2), selecting β as the compensation vector is notsuitable since insufficient error compensation will occur.Therefore, we adopt ICV together with MICV to amend thisinsufficient error compensation case when β = 0 and βl > 0as well. If β = 0 is contributed by βl > 0, we will inject onemore carry-in compensated vector in the weight of 2(3n/2).In this way, we can remove the cases of |ε| > 0.5∗2(3n/2)

effectively.Finally, the proposed error compensation algorithm is

expressed as (7), as shown at the bottom of the page.As shown in Fig. 8, we can demonstrate that the compen-

sation error is effectively lowered by adopting ICV togetherwith MICV while comparing with the case of fixed-width RPRonly applying the compensation vector of β and with the caseof full-width RPR.

B. Proposed Precise Error Compensation Vector forFixed-Width RPR Design

To realize the fixed-width RPR, we construct one directlyinjecting ICV(β) to basically meet the statistic distribution

f (ICV) =

⎧⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎩

0, if βl = 0

β, if βl = 1, or βl = 2 & (up − MICV �= 0 & down − MICV �= 0)

β − 1, if βl = 2 & (up − MICV = 0 or down − MICV = 0), βl = 3,

or βl = 4 & (up − MICV �= 0 & down − MICV �= 0)

β − 2, if βl = 4 & (up − MICV = 0 or down − MICV = 0)

(7)

Page 5: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

82 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015

Fig. 8. Comparison of absolute error between the proposed design, the fixed-width RPR with compensation vector β only, and the full-width RPR in the12-bit fixed-width RPR-based ANT multiplier.

Fig. 9. Proposed high-accuracy fixed-width RPR multiplier with compensa-tion constructed by the multiple truncation EC vectors combined ICV togetherwith MICV.

and one minor compensation vector MICV(α) to amendthe insufficient error compensation cases. The compensationvector ICV(β) is realized by directly injecting the partial termsof Xn−1Yn/2, Xn−2Y(n/2)+1, Xn−3Y(n/2)+2, . . . , X(n/2)+2Yn−2.These directly injecting compensation terms are labeled asC1, C2, C3, . . . , C(n/2)−1 in Fig. 9. The other compensationvector used to mend the insufficient error compensation case isconstructed by one conditional controlled OR gate. One inputof OR gate is injected by X(n/2)Yn−1, which is designed torealize the function of compensation vector β. The other inputis conditional controlled by the judgment formula used tojudge whether β = 0 and βl �= 0 as well. As shown in Fig. 8,the term Cm1 is used to judge whether β = 0 or not. The judg-ment function is realized by one NOR gate, while its inputs areXn−1Yn/2, Xn−2Y(n/2)+1, Xn−3Y(n/2)+2, . . . , X(n/2)+2Yn−2.The term Cm2 is used to judge whether βl �= 0. The judgmentfunction is realized by one OR gate, while its inputs areXn−2Yn/2, Xn−3Y(n/2)+1, Xn−4Y(n/2)+2, . . . , X(n/2)+1Yn−2. Ifboth of these two judgments are true, a compensation termCm is generated via a two-input AND gate. Then, Cm isinjected together with X(n/2)Yn−1 into a two-input OR gateto correct the insufficient error compensation. Accordingly,in the case of β = 0 and βl �= 0 as well, one additionalcarry-in signal C(n/2) is injected into the compensationvector to modify the compensation value as β + 1 insteadof β. Moreover, the carry-in signal C(n/2) is injected in thebottom of error compensation vector, which is the farthestlocation away from the critical path. Therefore, not only the

error compensation precision in the fixed-width RPR can beenhanced, the computation delay will also not be postponed.Since the critical supply voltage is dominated by thecritical delay time of the RPR circuit, preserving the criticalpath of RPR not be postponed is very important. Finally,the proposed high-precision fixed-width RPR multipliercircuit is shown in Fig. 9. In our presented fixed-width RPRdesign, the adder cells can be saved by half as compared withthe conventional full-width RPR. Moreover, the proposedhigh-precision fixed-width RPR design can even providehigher precision as compared with the full-width RPR design.

IV. PERFORMANCE COMPARISONS

To evaluate and compare the performance of the proposedfixed-width RPR based ANT design and the previous full-width RPR-based ANT design, we implemented these twoANT designs in a 12-bit by 12-bit multiplier. The mainperformance indexes are the precision of RPR blocks, thesilicon area of RPR blocks, the critical computation delay ofRPR blocks, the error probability of RPR blocks under VOS,and the lowest reliable operating supply voltage under VOS.Through quantitative analysis of experimental data, we candemonstrate that our proposed design can more effectivelyrestrain the soft noise interference resulting from postponedcomputation delay under VOS when the circuit operates witha very low-voltage supply. Moreover, hardware overhead andpower consumption can also be lowered in the proposedfixed-width RPR-based ANT design. Finally, we implementedour proposed 12-bit by 12-bit fixed-width RPR-based ANTmultiplier design in TSMC 90-nm CMOS process technology.

First, we compare the proposed fixed-width multiplier withother literature designs [2], [17], respectively. All performancecomparisons are evaluated under 12-bit ANT-based multiplierdesigns. To evaluate the truncation error compensation per-formance, we define the index of absolute mean error (εavg),mean-square error (εms), maximum truncation error (εmax),and variance of error (εν), respectively. The comparison basisis the fixed-width RPR without any compensation vectorand the definition of various error evaluation indexes areexpressed as

εavg,% = εavg

εavg (Truncated)× 100% (8)

εms,% = εms

εms(Truncated)× 100% (9)

εmax,% = εmax

εmax(Truncated)× 100% (10)

εν,% = εν

εν(Truncated)× 100%. (11)

The smaller value of εavg, εms, εmax, and εν represents themore accurate EC performance. The precision analysis resultsof various fixed-width RPR multipliers or full-width RPR mul-tiplier are shown in Table II. The fixed-width RPR multipliersare the six-bit multipliers while their LSPs are truncated andvarious error compensation vectors are applied. The full-widthRPR multiplier is the six-bit by six-bit multiplier. As shownin Table II, the fixed-width RPR designs usually perform withhigher truncation errors than that of the full-width RPR design

Page 6: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 83

TABLE II

COMPARISONS OF THE ABSOLUTE MEAN ERROR, THE MEAN-SQUARE

ERROR, AND THE VARIANCE OF ERROR UNDER VARIOUS

RPR-BASED 12-BIT ANT MULTIPLIER DESIGNS

Fig. 10. Comparisons of the output signal SNR for various RPR designswith different truncated bits.

because more computation cells are truncated. However, withappreciate error compensation vector or multiple truncationerror compensation vectors, the fixed-width RPR designs stillhave the chance to perform with lower truncation errors.As shown in Table II, the absolute mean error, the meansquare error, the maximum error, and the variance of errorin our proposed fixed-width RPR multiplier can be lowered to21.39%, 5.57%, 9.18%, and 9.00%, respectively, in the 12-bitby 12-bit ANT multiplier design. All these truncation errorevaluation indexes are the lowest ones as compared with thestate-art-designs of [2] and [17] because multiple truncationEC vectors combined ICV together with MICV are applied tolower the truncation errors based on probability and statisticsanalysis.

To confirm the consistency of the compensation accuracy,we compare the output signal quality in terms of SNR forvarious RPR designs with different truncated bits, which isshown in Fig. 10. In the 12-bit ANT multiplier, the truncatedRPRs are analyzed from word length of five to ten bits. Asshown in Fig. 10, the output SNR of our proposed six-bit fixed-width RPR-based ANT multiplier design can be maintainedto 33.15 dB. As compared with the directly truncated six-bitfixed-width RPR, the signal quality can be improved with13.89-dB enhancement. At the same time, the SNR in thefull-width RPR-based ANT design is 24.28 dB. With varioustruncation bits, the output SNR differences between our pro-posed fixed-width RPR-based ANT multiplier design and theconventional full-width RPR-based ANT design are ranged

Fig. 11. Comparisons of delay profiles for the six-bit fixed-width RPR, thesix-bit full-width RPR, and the 12-bit full-length multiplier.

from 6.83 to 9.13 dB. The analysis results show that theoutput signal quality of the conventional full-width RPR-basedANT design is influenced by the truncation error. However,the output SNR of the our proposed fixed-width RPR-basedANT design can still be maintained high and is not severelyinfluenced by the truncation error because a precise errorcompensation vector is applied.

To lowering supply voltage beyond critical supply voltagewithout sacrificing the throughput and without leading tosevere SNR degradation, the critical computation delay inthe RPR block must be as fast as possible. The shortercomputation delay in RPR can bring the benefits of loweroverscaling supply voltage adopted. The comparison of delayprofiles under various 10 000 input random patterns forthe presented six-bit fixed-width RPR design, the previoussix-bit full-width RPR design, and the standard 12-bit full-length multiplier are shown in Fig. 11. Because the compu-tation word length in the RPR block is less than that in themain block computation unit, the critical computation delayand the average computation delay in the RPR block can all beshorter. In the presented fixed-width RPR design, the criticalcomputation delay and the average computation delay can befurther shortened since the fanout and wires loading of addercells are lowered to be close to half. As shown in Fig. 11, thecritical computation delay in the proposed RPR block can beshortened by 56.3% as compared with the main computationblock. As compared with the previous full-width RPR, thecritical computation delay in our proposed RPR block canalso be shortened by 12.6%.

To determine the optimized bit-length of fixed-RPR, com-parisons of the Kvos and RPR area with different bit lengthof RPR block are shown in Fig. 12. Our goal is to selectthe bit length of fixed-RPR, which can achieve the lowestKvos. The smaller RPR can perform with higher speed, lowerpower consumption, and lower area. However, RPR with lowerbit length would also lead to SNR degradation. Therefore, atradeoff between hardware area and RPR bit length is needed.In this paper, the optimized bit length of fixed-RPR is selectedas six based on the lowest Kvos analysis results shown inFig. 12.

Page 7: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

84 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015

Fig. 12. Comparisons of the Kvos and RPR area with different bit lengthof RPR block.

Fig. 13. Comparisons of the error probability of various RPR blocks underdifferent VOS scale.

To evaluate the signal quality performance of variousdesigns under VOS, we compare the error probability ofvarious RPR blocks under VOS by injecting 10 000 inputrandom patterns as test bench. Here, we follow the definitionof the VOS factor, Kvos, defined in [2]–[10] to identify thelevel of the supply voltage can be reliably lowered. The Kvosis ranged from one to zero. The lower Kvos means the lowersupply voltage is adopted and lower power is consumed.The error probability of zero represents all the input patternscan be computed reliably and correctly under VOS, whilethe increase of error probability represents that there aresome computation errors occurring in the computation unit.As shown in Fig. 13, the error probability in the standard12-bit full-length multiplier increases sharply as Kvos goesbelow 0.9. Our proposed fixed-width RPR design and the full-width RPR design can more effectively restrain the soft noiseinterference resulting from postponed computation delay underVOS. The sharply increase of error probability in both thesix-bit full-length RPR multiplier and the six-bit fixed-lengthRPR multiplier can be suspended until Kvos goes below 0.65.

To further identify the lowest achievable reliable Kvos,we compare the output SNR of various RPR designs underdifferent Kvos. The desirable SNR is expected to be higherthan 25 dB. As shown in Fig. 14, the output SNR in the

Fig. 14. Comparisons of the output SNR of various RPR designs underdifferent Kvos.

Fig. 15. Comparisons of the total logic synthesis cell area of various RPRdesigns with different bit length.

standard multiplier is lower than 25 dB while the Kvos isscaled below 0.851. The conventional full-width RPR canmaintain the output SNR is higher than 25 dB until the Kvosis scaled below 0.658. The breakdown point of Kvos in ourproposed fixed-width RPR design is further lower than thatof the conventional full-width design since the presented RPRdesign can compute more precisely. As a result, the reliablyKvos can be scaled to 0.623 while the output SNR of ourpresented fixed-width RPR can still be maintained to be higherthan 25 dB. In the latest literature joint residue number systemRPR (JRR)-based ANT design [8] with moduli set (127, 128,129, 257), the reliably Kvos can be further lowered to 0.551while the multiplier output SNR can still be maintained to behigher than 25 dB. It is mainly because that the JRR designcan operate in a higher speed in the main multiplier block.

The RPR area is another key factor that will affect thepower saving. Hence, we compare the circuitry area occupiedby the fixed-width RPR multiplier and the full-width RPRmultiplier. The truncated RPRs are analyzed from word lengthof five to ten bits using Synopsys design complier CAD tool.The total synthesized logic cell silicon area for various RPRdesigns is plotted in Fig. 15. As illustration, the silicon areain the proposed design can be averagely saved by 45.51% ascompared with various word length full-width RPR multiplierdesigns. Under the case of six-bit fixed-width RPR multiplier

Page 8: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 85

Fig. 16. Chip layout of the proposed 12-bit fixed-width RPR-based ANTmultiplier design.

TABLE III

PERFORMANCE SPECIFICATION SUMMARY OF THE PROPOSED 12-BIT

FIXED-WIDTH RPR-BASED ANT MULTIPLIER DESIGN

design, the chip area occupied by total logic cells can besaved by 44.55% as compared with the six-bit full-width RPRdesign.

Finally, we implement the proposed 12-bit fixed-widthRPR-based ANT multiplier in TSMC 90-nm process, as shownin Fig. 16. The silicon chip area of proposed ANT multipliercircuit is 4616.5 μm2. With lower achievable reliable supplyvoltage and lower silicon area in the presented RPR block,the power consumption in the proposed 12-bit fixed-widthRPR-based ANT multiplier design can be saved by 47.3%as compared with the full-width RPR-based ANT design. Thechip specification is summarized in Table III.

In Fig. 17, we evaluate the power saving performance undervarious ANT designs. In Fig. 17(a), we compare the powerconsumption of various multiplier designs under differentKvos. Under lower Kvos, the power consumption can belowered. However, the hardware complexity increase in theANT-based design would lead to power consumption increase.In the standard multiplier, its Kvos can only be lowered to0.851 and its power consumption is 752 μW under such Kvos.In the proposed ANT design, the RPR design is simpler andwith lower truncation compensation error. Therefore, the Kvoscan be lowered to 0.623 and the power consumption can belowered to 393 μW with 52.3% power saving. At the same

Fig. 17. Power-saving comparisons under various design environments.(a) Comparisons of the power consumption of various multiplier designs underdifferent Kvos. (b) Comparisons of the SNR under different power savingpercentage.

time, the Kvos in the full-width RPR-based ANT design [2]can be lowered to 0.658 and its power consumption can belowered to 435 μW. As for the JRR design, its Kvos can belowered to 0.551. However, it needs four parallel multipliersin the RNS computing system, which occupies larger circuitryarea. Therefore, the lowest power in the JRR design can onlybe lowered to 468 μW. The area overhead of our proposed12-bit fixed-width RPR-based ANT multiplier design is 12.4%as compared with 12-bit standard multiplier, and the poweroverhead is 4.7% in normal voltage supply. However, thepower consumption of proposed multiplier design can be savedby 52.3% as compared with standard multiplier under VOSoperation.

In Fig. 17(b), we further compare the output signalSNR under different power-saving environments. In the JRRdesign [8], its Kvos can be lowered to 0.551 and its lowestpower saving can achieve 59% as compared with the standardmultiplier with normal supply voltage. In the proposed fixed-width RPR-based ANT design, the critical path is loweredwith higher compensation precision and lower hardware area.Therefore, the lowest power saving in the proposed design canachieve 68%, while the full-width RPR-based ANT design canachieve 62%. The desirable SNR here is expected to be 25 dB.For the case the system requirement in signal quality is higherthan 25 dB, the merits in these designs can still be maintained.

Page 9: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

86 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015

Fig. 18. Comparisons of the output SNR of various process corners underdifferent Kvos.

To further consider the temperature and process variation,performance comparisons of the output SNR under variousprocess corners with different Kvos are shown in Fig. 18.As illustration, we perform the postlayout simulation undertypical–typical corner with 25 °C, fast–fast corner with 0 °C,and slow–slow corner with 100 °C. Under these three designcorners, the Kvos in the proposed ANT design are 0.623,0.582, and 0.683, respectively. The low-voltage low-powermerit in the presented ANT design can still be preserved underprocess deviation and high-temperature environments.

V. CONCLUSION

In this paper, a low-error and area-efficient fixed-widthRPR-based ANT multiplier design is presented. The pro-posed 12-bit ANT multiplier circuit is implemented in TSMC90-nm process and its silicon area is 4616.5 μm2. Under 0.6 Vsupply voltage and 200-MHz operating frequency, the powerconsumption is 0.393 mW. In the presented 12-bit by 12-bitANT multiplier, the circuitry area in our fixed-width RPR canbe saved by 45%, the lowest reliable operating supply voltagein our ANT design can be lowered to 0.623 VDD, and powerconsumption in our ANT design can be saved by 23% ascompared with the state-of-art ANT design.

ACKNOWLEDGMENT

The authors would like to thank National Chip Implemen-tation Center in Taiwan for technical support and supplyingthe EDA tools.

REFERENCES

[1] (2009). The International Technology Roadmap for Semiconductors[Online]. Available: http://public.itrs.net/

[2] B. Shim, S. Sridhara, and N. R. Shanbhag, “Reliable low-power digi-tal signal processing via reduced precision redundancy,” IEEE Trans.Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 497–510,May 2004.

[3] B. Shim and N. R. Shanbhag, “Energy-efficient soft-error tolerant digitalsignal processing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,vol. 14, no. 4, pp. 336–348, Apr. 2006.

[4] R. Hedge and N. R. Shanbhag, “Energy-efficient signal processingvia algorithmic noise-tolerance,” in Proc. IEEE Int. Symp. Low PowerElectron. Des., Aug. 1999, pp. 30–35.

[5] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power dig-ital signal processing using approximate adders,” IEEE Trans. Comput.Added Des. Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.

[6] Y. Liu, T. Zhang, and K. K. Parhi, “Computation error analysis indigital signal processing systems with overscaled supply voltage,” IEEETrans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 4, pp. 517–526,Apr. 2010.

[7] J. N. Chen, J. H. Hu, and S. Y. Li, “Low power digital signal processingscheme via stochastic logic protection,” in Proc. IEEE Int. Symp. CircuitsSyst., May 2012, pp. 3077–3080.

[8] J. N. Chen and J. H. Hu, “Energy-efficient digital signal processingvia voltage-overscaling-based residue number system,” IEEE Trans.Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 7, pp. 1322–1332,Jul. 2013.

[9] P. N. Whatmough, S. Das, D. M. Bull, and I. Darwazeh, “Circuit-leveltiming error tolerance for low-power DSP filters and transforms,” IEEETrans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 6, pp. 12–18,Feb. 2012.

[10] G. Karakonstantis, D. Mohapatra, and K. Roy, “Logic and memorydesign based on unequal error protection for voltage-scalable, robustand adaptive DSP systems,” J. Signal Process. Syst., vol. 68, no. 3,pp. 415–431, 2012.

[11] Y. Pu, J. P. de Gyvez, H. Corporaal, and Y. Ha, “An ultra lowenergy/frame multi-standard JPEG co-processor in 65-nm CMOS withsub/near threshold power supply,” IEEE J. Solid State Circuits, vol. 45,no. 3, pp. 668–680, Mar. 2010.

[12] H. Fuketa, K. Hirairi, T. Yasufuku, M. Takamiya, M. Nomura,H. Shinohara, et al., “12.7-times energy efficiency increase of16-bit integer unit by power supply voltage (VDD) scaling from 1.2Vto 310mV enabled by contention-less flip-flops (CLFF) and separatedVDD between flip-flops and combinational logics,” in Proc. ISLPED,Fukuoka, Japan, Aug. 2011, pp. 163–168.

[13] Y. C. Lim, “Single-precision multiplier with reduced circuit complexityfor signal processing applications,” IEEE Trans. Comput., vol. 41, no. 10,pp. 1333–1336, Oct. 1992.

[14] M. J. Schulte and E. E. Swartzlander, “Truncated multiplication withcorrection constant,” in Proc. Workshop VLSI Signal Process., vol. 6.1993, pp. 388–396.

[15] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, “Area-efficient multi-pliers for digital signal processing applications,” IEEE Trans. CircuitsSyst. II, Exp. Briefs, vol. 43, no. 2, pp. 90–95, Feb. 1996.

[16] J. M. Jou, S. R. Kuang, and R. D. Chen, “Design of low-error fixed-widthmultipliers for DSP applications,” IEEE Trans. Circuits Syst., vol. 46,no. 6, pp. 836–842, Jun. 1999.

[17] S. J. Jou and H. H. Wang, “Fixed-width multiplier for DSP application,”in Proc. IEEE Int. Symp. Comput. Des., Sep. 2000, pp. 318–322.

[18] F. Curticapean and J. Niittylahti, “A hardware efficient direct digitalfrequency synthesizer,” in Proc. 8th IEEE Int. Conf. Electron., Circuits,Syst., vol. 1. Sep. 2001, pp. 51–54.

[19] A. G. M. Strollo, N. Petra, and D. D. Caro, “Dual-tree error compensa-tion for high performance fixed-width multipliers,” IEEE Trans. CircuitsSyst. II, Exp. Briefs, vol. 52, no. 8, pp. 501–507, Aug. 2005.

[20] S. R. Kuang and J. P. Wang, “Low-error configurable truncated mul-tipliers for multiply-accumulate applications,” Electron. Lett., vol. 42,no. 16, pp. 904–905, Aug. 2006.

[21] N. Petra, D. D. Caro, V. Garofalo, N. Napoli, and A. G. M. Strollo,“Truncated binary multipliers with variable correction and minimummean square error,” IEEE Trans. Circuits Syst., vol. 57, no. 6,pp. 1312–1325, Jun. 2010.

[22] I. C. Wey and C. C. Wang, “Low-error and area-efficient fixed-width multiplier by using minor input correction vector,” in Proc.IEEE Int. Conf. Electron. Inf. Eng., vol. 1. Kyoto, Japan, Aug. 2010,pp. 118–122.

I-Chyn Wey was born in Taipei, Taiwan, in 1979.He received the B.S. and M.S. degrees in electronicsengineering from Chang Gung University, Taoyuan,Taiwan, and the Ph.D. degree in electronics engi-neering from National Taiwan University, Taipei, in2001, 2003, and 2008, respectively.

He is currently an Assistant Professor with theDepartment of Electrical Engineering, Chang GungUniversity. His current research interests includeVLSI CMOS circuits design, noise-tolerant CMOScircuits, near-threshold-voltage CMOS circuits, and

ultralow-power CMOS circuits design.

Page 10: 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION …kresttechnology.com/krest-academic-projects/krest-mtech... · 2016-01-08 · 78 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION

WEY et al.: RELIABLE LOW-POWER MULTIPLIER DESIGN 87

Chien-Chang Peng was born in Taichung, Taiwan,in 1984. He received the B.S. degree in electronicsengineering from Chung Yuan Christian University,Zhongli, Taiwan, and the M.S. degree in electricalengineering from Chang Gang University, Taoyuan,Taiwan, in 2008 and 2010, respectively, where heis currently pursuing the Ph.D. degree from theDepartment of Electrical Engineering.

His current research interests include VLSI CMOScircuits design, ultralow-power CMOS circuitsdesign, soft-error-tolerant CMOS circuits design,

and noise detection circuits design.

Feng-Yu Liao was born in Taichung, Taiwan, in 1986. He received theM.S. degree in electrical engineering from Chang Gang University, Taoyuan,Taiwan, in 2010.

His current research interests include VLSI CMOS circuit design andultralow-power circuits design.