IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015 203

An Accuracy-Adjustment Fixed-Width Booth Multiplier Based on Multilevel Conditional Probability

Yuan-Ho Chen

Abstract— This brief proposes an accuracy-adjustment fixed-width Booth multiplier that compensates the truncation error using a multilevel conditional probability (MLCP) estimator and derives a closed form for various bit widths L and column information w. Compared with the exhaustive simulation strategy, the proposed MLCP estimator substantially reduces simulation time and allows the accuracy to be adjusted easily on the basis of mathematical derivations. Unlike previous conditional-probability methods, the proposed MLCP uses the entire nonzero code to estimate the truncation error and thereby achieves higher accuracy. Furthermore, a simple and small MLCP compensated circuit is proposed in this brief. The results show that the proposed MLCP Booth multipliers achieve low-cost, high-accuracy performance.

Index Terms— Fixed-width Booth multiplier, multilevel conditional probability (MLCP), truncation error.

I. INTRODUCTION

Fixed-width multipliers are widely used in digital signal processing (DSP) applications [1]–[4], such as the fast Fourier transform [2] and the discrete cosine transform [3], [4]. To generate an output with the same width as the input, fixed-width multipliers truncate the lower half of the product, that is, its least significant bits (LSBs). Thus, truncation errors occur in fixed-width multiplier designs. The fixed-width multiplier with the highest accuracy is the posttruncated (P-T) multiplier, which truncates the LSB half of the result after calculating all products. However, a P-T multiplier requires a large circuit area to calculate the products of the truncation part. By contrast, a direct-truncated (D-T) multiplier omits the LSB half of the products directly to conserve circuit area, but produces a large truncation error.

To achieve a balanced design between accuracy (P-T) and area cost (D-T), several researchers have presented various error-compensated circuits to alleviate the truncation errors in Baugh–Wooley (BW) multipliers [5]–[10] and Booth multipliers [11]–[21]. Because fewer products are truncated after Booth encoding, Booth multipliers have a smaller truncation error than BW multipliers [17]. Therefore, many previous works have focused on the compensated circuit in Booth multipliers [11]–[21]. Song et al. [18] present a binary threshold based on statistical analysis. Their compensated circuit consumes a large circuit area because of the complex curve fitting required for statistical analysis. Wang et al. [19] use more product information to improve accuracy, but their exhaustive simulation requires a considerable amount of time to establish the compensated circuit. To reduce this establishment time, Li et al. [20] present a probability estimator (PEB) that substantially reduces calculation time. An adaptive conditional-probability estimator (ACPE) [21] improves the accuracy by using conditional probability and further introduces the column information w, which allows the accuracy to be adjusted for different types of DSP systems. Therefore, two types of compensated circuits for various w are introduced in [18], and the generalized form of PEB is presented in [17]. In sum, the time required to establish the compensated circuit and the ability to adjust accuracy are both critical to fixed-width Booth multipliers.

Manuscript received August 23, 2013; revised November 30, 2013 and January 21, 2014; accepted January 22, 2014. Date of publication February 11, 2014; date of current version January 16, 2015. This work was supported by the Chip Implementation Center and National Science Council (NSC) under Project CIC T18-102C-N0001, Project NSC 102-2221-E-033-030, and Project NSC 101-2218-E-033-005.

The author is with the Department of Information and Computer Engineering, Chung Yuan Christian University, Zhongli 320, Taiwan (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2014.2302447

TABLE I. MAPPED TABLE OF A MODIFIED BOOTH ENCODER

This brief proposes an accuracy-adjustment fixed-width Booth multiplier that uses the multilevel conditional probability (MLCP) method to implement the compensated circuit. The MLCP method produces a closed form for various bit widths L and column information w; thus, the compensated circuit can be established quickly, and the accuracy can be adjusted by changing w. In contrast to the conditional-probability method of ACPE [21], which uses a single nonzero code to estimate truncation errors, the proposed MLCP generates estimates by employing all nonzero codes, which are highly intercorrelated. Although the MLCP method is more complex than the ACPE method in estimating truncation errors, its accuracy is higher. Furthermore, simple and small compensated circuits are derived from a single compensated closed form. The MLCP method thus provides a balance between accuracy and circuit area. The implementation results of this brief show that the proposed MLCP Booth multiplier achieves low-cost, high-accuracy performance.

The remainder of this brief is organized as follows. Section II presents the fundamental derivation for a Booth multiplier. The derivation and architecture of the proposed MLCP estimator are addressed in Section III. Section IV presents comparisons and a discussion of these approaches, and Section V provides the conclusion.

II. FIXED-WIDTH MODIFIED BOOTH MULTIPLIER

Modified Booth encoding is commonly used in multiplier designs to reduce the number of partial products [22]. The 2L-bit product P can be expressed in two's complement representation as follows:

$$
\begin{aligned}
A &= -a_{L-1} 2^{L-1} + \sum_{i=0}^{L-2} a_i \cdot 2^i \\
B &= -b_{L-1} 2^{L-1} + \sum_{i=0}^{L-2} b_i \cdot 2^i \\
P &= A \times B.
\end{aligned} \tag{1}
$$

Table I lists three concatenated inputs $b_{2i+1}$, $b_{2i}$, and $b_{2i-1}$ mapped into $y_i$ using a Booth encoder, in which the nonzero code $z_i$ is a one-bit digit whose value is determined according to whether $y_i$ equals zero, and z consists of the $z_i$. Table II shows the partial products with corresponding $y_i$ for an eight-bit Booth encoder. After encoding, the partial product array with an even width L contains Q = L/2 rows.

1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

TABLE II. PARTIAL PRODUCTS FOR AN EIGHT-BIT BOOTH ENCODER

Fig. 1. Partial product array for Booth multiplier.
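The mapping from $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$ to $y_i$ and $z_i$ can be sketched in a few lines of Python. This is a minimal illustration that assumes the conventional radix-4 modified Booth rule $y_i = -2b_{2i+1} + b_{2i} + b_{2i-1}$ with $b_{-1} = 0$; the function names `booth_digits` and `booth_value` are hypothetical and do not come from the brief.

```python
def booth_digits(b: int, L: int):
    """Return (y, z): radix-4 Booth digits and nonzero codes of an L-bit
    two's complement multiplier b, least significant digit first.

    Assumes the conventional rule y_i = -2*b_{2i+1} + b_{2i} + b_{2i-1}, b_{-1} = 0.
    """
    assert L % 2 == 0, "the brief considers an even width L"
    bits = [(b >> i) & 1 for i in range(L)]  # works for negative Python ints too
    y, prev = [], 0                          # prev holds b_{2i-1}
    for i in range(L // 2):                  # Q = L/2 digits
        y.append(-2 * bits[2 * i + 1] + bits[2 * i] + prev)
        prev = bits[2 * i + 1]
    z = [1 if digit != 0 else 0 for digit in y]  # z_i = 1 exactly when y_i != 0
    return y, z


def booth_value(y):
    """Recombine the digits: B = sum_i y_i * 4**i."""
    return sum(digit * 4**i for i, digit in enumerate(y))


y, z = booth_digits(6, 4)   # B = 0110
print(y, z)                 # digits and nonzero codes, LSB digit first
```

Recombining the digits with weights $4^i$ returns the original multiplier, which is a convenient sanity check on any encoder table.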

Fig. 1 shows the partial product array in a Booth multiplier and introduces the column information w, where w indicates the number of true product columns included in the compensated circuit. The definition of w is the same as that in [21].

III. PROPOSED MLCP ESTIMATOR

The quantized product $P_q$ for a fixed-width multiplier can be expressed as follows:

$$P \approx P_q = \mathrm{MP} + \mathrm{TP} = \mathrm{MP} + \sigma \cdot 2^{L} \tag{2}$$

where MP is the main part of the multiplier, which uses real partial products to calculate results; TP is the truncation part (Fig. 1, shaded region), which is truncated in fixed-width multiplication; and σ represents the compensated bias of the MLCP estimator, which is obtained from the $\mathrm{TP}_{mj}$ and $\mathrm{TP}_{mi}$ parts by performing the rounding operation Round()

$$\sigma = \mathrm{Round}(\mathrm{TP}_{mj} + \mathrm{TP}_{mi}). \tag{3}$$

The major term $\mathrm{TP}_{mj}$ provides true information, and the minor term $\mathrm{TP}_{mi}$ can be estimated using the proposed MLCP method. Thus, the compensated bias σ can be obtained by computing $\mathrm{TP}_{mj}$ and estimating $\mathrm{TP}_{mi}$.

A. Derived MLCP Formula

Fig. 2 shows that the TP can be partitioned into an encoding group set (G) and a column set (T). The encoding groups in G are defined as follows:

$$
\begin{aligned}
G_0 &= 2^{-L}(p_{0,0} + n_0) + \cdots + 2^{-1-w} p_{L-1-w,0} \\
G_1 &= 2^{-(L-2)}(p_{0,1} + n_1) + \cdots + 2^{-1-w} p_{L-3-w,1} \\
&\;\;\vdots \\
G_{Q-1} &= 2^{-2}(p_{0,Q-1} + n_{Q-1})
\end{aligned} \tag{4}
$$

Fig. 2. Truncation part of the proposed Booth multiplier.

Fig. 3. G set of the proposed Booth multiplier with w = 3.

and the column groups in the T set are defined as follows:

$$
\begin{aligned}
T_1 &= 2^{-1}(p_{L-1,0} + p_{L-3,1} + \cdots + p_{1,Q-1}) \\
T_2 &= 2^{-2}(p_{L-2,0} + p_{L-4,1} + \cdots + n_{Q-1}) \\
&\;\;\vdots \\
T_L &= 2^{-L}(p_{0,0} + n_0).
\end{aligned} \tag{5}
$$

With the column information w, the terms $\mathrm{TP}_{mj}$ and $\mathrm{TP}_{mi}$ can be expressed as the following equations:

$$\mathrm{TP}_{mj} = T_1 + T_2 + \cdots + T_w \tag{6}$$

$$\mathrm{TP}_{mi} = G_0 + G_1 + \cdots + G_\alpha \tag{7}$$

where $\alpha = Q - 1 - \lfloor w/2 \rfloor$, $\lfloor \cdot \rfloor$ represents the flooring operation, $\mathrm{TP}_{mj}$ is constructed by summing $T_i$ ($w \ge i \ge 1$), and $\mathrm{TP}_{mi}$ consists of $G_j$ ($\alpha \ge j \ge 0$). Note that the G set changes based on the column information w. Fig. 3 shows an example for TP, where w = 3.
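As a small illustration of how (6) and (7) select terms, the following Python sketch computes α and lists which column groups and encoding groups enter the major and minor parts for a given L and w. The helper name `partition` is hypothetical; only the index arithmetic is taken from the definitions above.

```python
def partition(L: int, w: int):
    """Return (alpha, major, minor) for bit width L and column information w.

    alpha = Q - 1 - floor(w / 2) with Q = L / 2; the major part uses T_1..T_w
    and the minor part uses G_0..G_alpha, following (6) and (7).
    """
    Q = L // 2
    alpha = Q - 1 - w // 2
    major = [f"T{i}" for i in range(1, w + 1)]
    minor = [f"G{j}" for j in range(alpha + 1)]
    return alpha, major, minor


alpha, major, minor = partition(8, 3)
print(alpha, major, minor)
```

For L = 8 and w = 3 this gives α = 2, so three column groups are summed exactly and three encoding groups are left to be estimated.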

The MLCP method proposed in this brief uses the nonzero code z to establish an MLCP estimator. The expected values of all elements in $\mathrm{TP}_{mi}$ given the corresponding nonzero code are derived first. In contrast to the method in [21], which uses only one nonzero bit, the proposed MLCP method uses the entire nonzero code to estimate $\mathrm{TP}_{mi}$. Therefore, the truncation error can be reduced further than in [21]. For example, the values L = 8, w = 1, and z = 1111 can be used to calculate the expected value of $p_{0,1}$

$$
\begin{aligned}
E[p_{0,1} \mid z = 1111]
&= P[p_{0,1} = 1]\, P[p_{0,1} \mid z_1 = 1]\, P[z_1 = 1 \mid z_0 = 1] \\
&= \sum_{m = \pm 1, -2} \left( \sum_{n = \pm 1, \pm 2} P[p_{0,1} = 1] \times P[p_{0,1} \mid y_1 = n]\, P[y_1 = n \mid y_0 = m] \right) \\
&= \left( 1 \times 0 + \tfrac{1}{2} \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{3} + 0 \times \tfrac{1}{3} \right)_{m=-2} \\
&\quad + \left( 1 \times 0 + \tfrac{1}{2} \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{3} + 0 \times \tfrac{1}{3} \right)_{m=-1} \\
&\quad + \left( 1 \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{3} + 0 \times 0 \right)_{m=1} \\
&= \frac{4}{9}.
\end{aligned} \tag{8}
$$

The three bracketed terms evaluate to 1/3, 1/3, and 2/3; because each nonzero value of $y_0$ occurs with probability 1/3 when $z_0 = 1$, their weighted sum is 4/9.

Fig. 4. Examples of an 8 × 8 MLCP Booth multiplier with w = 1 and nonzero codes 1111, 1110, and 0101. (a) MLCP for z = 1111. (b) MLCP for z = 1110. (c) MLCP for z = 0101.

Fig. 4(a) shows the expected values of all elements in $\mathrm{TP}_{mi}$. A regular rule can then be observed: the expected values of all products with corresponding $z_j = 1$ are equal to 1/2 (except for $p_{0,j}$ and $n_j$). The expected values of $p_{0,j}$ and $n_j$ depend greatly on the number of nonzero codes $z_j = 1$; that is, the order of the $z_j$ code affects the expected value, and the expected values for $p_{0,j}$ and $n_j$ can be summarized as follows:

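$$
E[n_j \mid z],\; E[p_{0,j} \mid z] =
\begin{cases}
\dfrac{1}{3^k} \times \dfrac{3^k + 1}{2} & \text{as } k = \text{odd} \\[2ex]
\dfrac{1}{3^k} \times \dfrac{3^k - 1}{2} & \text{as } k = \text{even}
\end{cases}
\qquad k = \sum_{i=0}^{j} z_i \tag{9}
$$

where k is the number of nonzero codes among $z_0, \ldots, z_j$. Fig. 4 shows three examples. The expected values of $p_{0,3}$ as z = 1111, $p_{0,2}$ as z = 1110, and $n_0$ as z = 0101 are 40/81, 4/9, and 2/3, respectively

$$E[p_{0,3} \mid z = 1111] = \left. \frac{1}{3^k} \times \frac{3^k - 1}{2} \right|_{k=4} = \frac{40}{81} \tag{10}$$

$$E[p_{0,2} \mid z = 1110] = \left. \frac{1}{3^k} \times \frac{3^k - 1}{2} \right|_{k=2} = \frac{4}{9} \tag{11}$$

$$E[n_0 \mid z = 0101] = \left. \frac{1}{3^k} \times \frac{3^k + 1}{2} \right|_{k=1} = \frac{2}{3}. \tag{12}$$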
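The closed form in (9) can be checked by exhaustive enumeration for a small width. The Python sketch below does this for L = 8 with exact fractions. It rests on two assumptions that are not spelled out in the text: the multiplier bits and $a_0$ are uniformly random, and the partial-product rows are generated in the conventional way, so that $p_{0,j}$ equals $a_0$, its complement, 0, or 1 for $y_j$ = +1, −1, +2, −2, and $n_j = 1$ exactly when $y_j$ is negative. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def booth_digits(bits):
    """Radix-4 Booth digits y_j from bits[i] = b_i (LSB first), with b_{-1} = 0."""
    digits, prev = [], 0
    for j in range(len(bits) // 2):
        digits.append(-2 * bits[2 * j + 1] + bits[2 * j] + prev)
        prev = bits[2 * j + 1]
    return digits


def lsb_and_negate(y, a0):
    """ASSUMED row generation: (p_{0,j}, n_j) for a nonzero Booth digit y."""
    p0 = {1: a0, -1: 1 - a0, 2: 0, -2: 1}[y]
    return p0, int(y < 0)


def closed_form(k):
    """Right-hand side of (9) for k nonzero codes among z_0..z_j."""
    numerator = (3**k + 1) // 2 if k % 2 else (3**k - 1) // 2
    return Fraction(numerator, 3**k)


def conditional_means(L=8):
    """Exact E[p_{0,j} | z] and E[n_j | z] for every z and every row j with z_j = 1.

    Keys are (z, j) with z = (z_0, z_1, ..., z_{Q-1}), i.e. LSB digit first.
    """
    sums = {}
    for bits in product((0, 1), repeat=L):
        ys = booth_digits(bits)
        z = tuple(int(y != 0) for y in ys)
        for a0 in (0, 1):
            for j, y in enumerate(ys):
                if y == 0:
                    continue
                p0, n = lsb_and_negate(y, a0)
                count, sum_p0, sum_n = sums.get((z, j), (0, 0, 0))
                sums[(z, j)] = (count + 1, sum_p0 + p0, sum_n + n)
    return {key: (Fraction(sp, c), Fraction(sn, c)) for key, (c, sp, sn) in sums.items()}


means = conditional_means(8)
for (z, j), (e_p0, e_n) in means.items():
    k = sum(z[: j + 1])
    assert e_p0 == e_n == closed_form(k)   # (9) holds for every pattern
```

Note that the text writes z with $z_{Q-1}$ first, so z = 1110 corresponds to the tuple `(0, 1, 1, 1)` here. Under the stated assumptions the enumeration reproduces the three values in (10)–(12).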

B. Proposed Generalized MLCP Format

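With the derivation of the MLCP method, the expected value of each part in $\mathrm{TP}_{mi}$ can be estimated as follows:

$$
\begin{aligned}
\mathrm{TP}_{mi} &= TP_0 + \cdots + TP_\alpha \\
&\cong E[(TP_0 + \cdots + TP_\alpha) \mid z] \\
&= E[TP_0 \mid z] + E[TP_1 \mid z] + \cdots + E[TP_\alpha \mid z] \\
&= E[TP_0 \mid z_0] + E[TP_1 \mid \{z_1, z_0\}] + \cdots + E[TP_\alpha \mid z] \\
&= E_0 + E_1 + \cdots + E_\alpha
\end{aligned} \tag{13}
$$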

where the conditional expected values $E_0, E_1, \ldots, E_\alpha$ depend greatly on the nonzero code z. Therefore, the conditional expected values can be estimated using (9), which yields three cases for the expected value of $\mathrm{TP}_{mi}$.

Case 1: β = odd

$$\mathrm{TP}_{mi} \cong \frac{\beta}{2} \times 2^{-w}. \tag{14}$$

Case 2: β = even

$$\mathrm{TP}_{mi} \cong \frac{\beta - 1}{2} \times 2^{-w}. \tag{15}$$

Case 3: β = 0

$$\mathrm{TP}_{mi} = 0 \tag{16}$$

where

$$\beta = \sum_{i=0}^{\alpha} z_i. \tag{17}$$

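Because only the carry propagation from $\mathrm{TP}_{mi}$ to $\mathrm{TP}_{mj}$ must be considered, the expected value of $\mathrm{TP}_{mi}$ can be simplified as

$$\mathrm{TP}_{mi} = \sum_{i=0}^{\alpha} E_i \cong S_{one} \times 2^{-w} \tag{18}$$

where

$$
\begin{cases}
S_{one} = \left\lfloor \dfrac{\beta - 1}{2} \right\rfloor & \text{as } z \neq 00 \cdots 0 \\[2ex]
S_{one} = 0 & \text{as } z = 00 \cdots 0.
\end{cases} \tag{19}
$$

The expected value of $\mathrm{TP}_{mi}$ is a function of the number of $z_i = 1$ for $0 \le i \le \alpha$, and $S_{one}$ is determined by the sum of the nonzero codes in z for the corresponding w. Thus, the compensated bias can be obtained by substituting the expected value of $\mathrm{TP}_{mi}$ in (18) and (19) into σ in (3)

$$
\begin{aligned}
\sigma &= \mathrm{Round}(\mathrm{TP}_{mj} + \mathrm{TP}_{mi}) \\
&= \mathrm{Round}(\mathrm{TP}_{mj} + S_{one} \cdot 2^{-w}).
\end{aligned} \tag{20}
$$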
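Equations (17)–(20) translate almost directly into code. The Python sketch below is an illustration only: the function names are hypothetical, z is passed as a list `[z_0, z_1, ...]`, $\mathrm{TP}_{mj}$ is passed as an already-computed number, and Round() is assumed to round to the nearest integer with halves rounded up, which the text does not specify.

```python
import math


def s_one(z, alpha):
    """S_one from (17) and (19); z is [z_0, z_1, ...] and only z_0..z_alpha count."""
    beta = sum(z[: alpha + 1])          # (17)
    if beta == 0:                       # all-zero code: S_one = 0 by definition
        return 0
    return (beta - 1) // 2              # floor((beta - 1) / 2)


def sigma(tp_mj, z, alpha, w):
    """Compensated bias (20), assuming Round() is round-half-up (an assumption)."""
    return math.floor(tp_mj + s_one(z, alpha) * 2.0**-w + 0.5)


# Five nonzero codes out of seven with w = 3: S_one = 2 adds 2 * 2**-3 = 0.25.
print(sigma(0.75, [1, 1, 1, 1, 1, 0, 0], alpha=6, w=3))
```

The explicit β = 0 branch matters because ⌊(0 − 1)/2⌋ would otherwise evaluate to −1.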

C. Architecture of the Proposed MLCP Booth Multiplier

With the proposed MLCP formula in (20), the compensated bias σ can be obtained for the corresponding L and w. Fig. 5 shows that the proposed MLCP Booth multiplier has the Booth encoder addressed in [19] and a carry-save-adder (CSA) array with 4–2 and 3–2 compressors [23]. The compensated circuit sums $\mathrm{TP}_{mj}$ and $\mathrm{TP}_{mi}$ together. The proposed MLCP compensated circuit implements (18) using a CSA architecture, and the subtract-one function is realized by adding an all-ones value in two's complement representation. Using L = 16 and w = 3 as an example, $S_{one}$ in (19) can be expressed as follows:

$$S_{one} = \left\lfloor \frac{\beta + \text{4'b1111}}{2} \right\rfloor. \tag{21}$$

The proposed MLCP circuits depend on α; thus, various word lengths L and column information w can use the same MLCP circuit. Using α = 6 as an example, the MLCP circuit (Fig. 5) can be employed for L = 16 with w = 3, L = 16 with w = 2, L = 14 with w = 1, and so on. Fig. 6 shows the use of the MLCP circuit with corresponding w.

Fig. 5. Architecture of the proposed MLCP Booth multiplier for L = 16 and w = 3.

Fig. 6. Usage of MLCP circuit with corresponding w.
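The two's complement trick behind (21) can be illustrated in Python: adding the all-ones word 4'b1111 in a 4-bit adder and dropping the carry out is the same as subtracting one. This is a behavioral sketch under the assumption of a 4-bit datapath with α = 6 (so β is at most 7); the function name is hypothetical, and the all-zero code is excluded because (19) handles it separately.

```python
def s_one_4bit(beta: int) -> int:
    """Behavioral model of (21) for 1 <= beta <= 7 on an assumed 4-bit datapath."""
    assert 1 <= beta <= 7, "beta = 0 is the separate all-zero case in (19)"
    total = (beta + 0b1111) & 0b1111   # 4-bit wraparound: equals beta - 1
    return total >> 1                  # divide by two and floor


for beta in range(1, 8):
    assert s_one_4bit(beta) == (beta - 1) // 2
```

The loop confirms that the adder-based form agrees with ⌊(β − 1)/2⌋ for every nonzero β that can occur when α = 6.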

Because the proposed MLCP method is based on conditional probability, it establishes the compensated circuit in considerably less time than the exhaustive and time-consuming heuristic simulation methods [13], [14], [18], [19]. Therefore, the proposed MLCP compensated circuit can easily be applied to a Booth multiplier with a large bit width (L > 16), and its accuracy can be adjusted by changing the column information w.

IV. COMPARISONS AND DISCUSSION

This section presents a comparison of the accuracy, area cost, and computation delay of fixed-width Booth multipliers.

A. Accuracy

In this brief, the average absolute error $|\bar{\varepsilon}|$ is used to compare accuracy. It is defined as follows:

$$|\bar{\varepsilon}| = E[|P - P_q|] / 2^{L}. \tag{22}$$
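The metric in (22) can be evaluated exhaustively for small widths. The Python sketch below averages $|P - P_q|$ over all pairs of L-bit two's complement inputs and normalizes by $2^L$. The quantizer `truncate_after_multiply` is a hypothetical stand-in that simply clears the lower L bits of the exact product; it is not one of the compensated designs compared in this brief, and all names are illustrative.

```python
def avg_abs_error(quantize, L: int) -> float:
    """Average absolute error (22) over all L-bit two's complement input pairs."""
    lo, hi = -(1 << (L - 1)), 1 << (L - 1)
    total = 0
    for a in range(lo, hi):
        for b in range(lo, hi):
            total += abs(a * b - quantize(a, b, L))
    pairs = (hi - lo) ** 2
    return total / pairs / (1 << L)


def truncate_after_multiply(a: int, b: int, L: int) -> int:
    """Hypothetical quantizer: exact product with its lower L bits cleared."""
    return ((a * b) >> L) << L


print(avg_abs_error(truncate_after_multiply, 8))
```

Any other quantizer with the same `(a, b, L)` signature can be passed in to compare designs on the same footing.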

Table III shows the $|\bar{\varepsilon}|$ for D-T, P-T, the proposed MLCP estimator, and previous works [17]–[19] and [21]. The $|\bar{\varepsilon}|$ is the most crucial metric for comparing the accuracy of fixed-width Booth multipliers. Table III shows that the proposed MLCP Booth multipliers achieve high accuracy for various bit widths L and column information w. Because the structures in [19] and [21] precalculate the sum of $p_{0,Q-1}$ and $n_{Q-1}$ in the truncation part, the $|\bar{\varepsilon}|$ values in [19] and [21] are more favorable than that of the proposed MLCP estimator when w = 1; otherwise, the proposed MLCP Booth multiplier demonstrates superior $|\bar{\varepsilon}|$ performance for various L and w values. As the column information w increases, the number of true partial products in $\mathrm{TP}_{mj}$ also increases, which means that $\mathrm{TP}_{mi}$, the part that must be estimated, decreases. Thus, as w increases, the accuracies of these methods become very close.

TABLE III. COMPARISON OF THE AVERAGE ABSOLUTE ERROR $|\bar{\varepsilon}|$ VALUES FOR VARIOUS METHODS

B. Circuit Performance

Area cost and computation delay are critical in Booth multiplier designs. Table IV lists the area and delay of the proposed MLCP estimator and previous designs with various L and w values. The area and delay information was obtained by synthesizing the RTL design using the Synopsys Design Compiler with a TSMC 40-nm CMOS standard cell library. All the multipliers are implemented using the CSA architecture in Fig. 5 with their own compensated circuits. The methods presented in [18] and [19] use exhaustive simulation to design the compensated circuit, and therefore require a long simulation time to establish it. By contrast, the MLCP estimator and the methods in [17] and [21] use mathematical derivation to establish the compensated circuit. These methods greatly reduce the simulation time and can be extended to multiplier designs with long bit widths. The design in [17] outperforms the other circuits in area and delay, but its $|\bar{\varepsilon}|$ values are higher than those of the other circuits. Although the design in [18] has the largest circuit area and delay, its $|\bar{\varepsilon}|$ values are lower than the others when w > 1. Compensated circuit designs generally involve a tradeoff between accuracy and area. The MLCP method obtains a balance between accuracy and circuit area, and it further adjusts the accuracy by varying w based on the MLCP formula. Therefore, the proposed MLCP Booth multiplier achieves low cost and flexible accuracy.

TABLE IV. COMPARISONS OF AREA COST (μm²) AND DELAY (ns) FOR VARIOUS METHODS

Fig. 7. Chip photomicrograph and characteristics.

C. Chip Implementation

To verify the circuit performance in a real chip, the proposed MLCP Booth multiplier was fabricated using the TSMC 0.18-μm CMOS process. Fig. 7 shows the chip photomicrograph and characteristics of the proposed 16 × 16 MLCP Booth multiplier, where w = 3. To avoid the I/O-limited phenomenon in the chip design, the test module, which is positioned near the proposed MLCP multiplier (Fig. 7), was designed using serial-to-parallel buffers to reduce the number of input and output ports. The test pattern was fed into the proposed core at a frequency of 100 MHz; thus, the proposed core demonstrated a delay path shorter than 10 ns.

V. CONCLUSION

This brief presents a closed-form MLCP formula that includes the column information w to adjust accuracy depending on system requirements. This formula is derived without performing time-consuming and exhaustive simulations, and can be applied to Booth multipliers with long bit widths to achieve high-accuracy performance. Therefore, the proposed MLCP compensated circuit can be used to develop a high-accuracy, low-cost, and flexible fixed-width Booth multiplier.

REFERENCES

[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York, NY, USA: Wiley, 1999.

[2] S. N. Tang, J. W. Tsai, and T. Y. Chang, “A 2.4-Gs/s FFT processor for OFDM-based WPAN applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 451–455, Jun. 2010.

[3] S. C. Hsia and S. H. Wang, “Shift-register-based data transposition for cost-effective discrete cosine transform,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 6, pp. 725–728, Jun. 2007.

[4] Y. H. Chen, T. Y. Chang, and C. Y. Li, “High throughput DA-based DCT with high accuracy error-compensated adder tree,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 709–714, Apr. 2011.

[5] L. D. Van and C. C. Yang, “Generalized low-error area-efficient fixed-width multipliers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608–1619, Aug. 2005.

[6] L. D. Van, S. S. Wang, and W. S. Feng, “Design of the lower error fixed-width multiplier and its application,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 47, no. 10, pp. 1112–1118, Oct. 2000.

[7] C. H. Chang and R. K. Satzoda, “A low error and high performance multiplexer-based truncated multiplier,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 12, pp. 1767–1771, Dec. 2010.

[8] N. Petra, D. D. Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo, “Truncated binary multipliers with variable correction and minimum mean square error,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 6, pp. 1312–1325, Jun. 2010.

[9] N. Petra, D. D. Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo, “Design of fixed-width multipliers with linear compensation function,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 947–960, May 2011.

[10] I. C. Wey and C. C. Wang, “Low-error and hardware-efficient fixed-width multiplier by using the dual-group minor input correction vector to lower input correction vector compensation error,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1923–1928, Oct. 2012.

[11] S. J. Jou, M. H. Tsai, and Y. L. Tsao, “Low-error reduced-width Booth multipliers for DSP applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 50, no. 11, pp. 1470–1474, Nov. 2003.

[12] H. A. Huang, Y. C. Liao, and H. C. Chang, “A self-compensation fixed-width Booth multiplier and its 128-point FFT applications,” in Proc. IEEE Int. Symp. Circuits Syst., May 2006, pp. 3538–3541.

[13] Y. H. Chen, T. Y. Chang, and R. Y. Jou, “A statistical error-compensated Booth multiplier and its DCT applications,” in Proc. IEEE Region 10 Conf., Nov. 2010, pp. 1146–1149.

[14] T. B. Juang and S. F. Hsiao, “Low-error carry-free fixed-width multipliers with low-cost compensation circuits,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 6, pp. 299–303, Jun. 2005.

[15] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, “Design of low-error fixed-width modified Booth multiplier,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004.

[16] S. R. Kuang, J. P. Wang, and C. Y. Guo, “Modified Booth multipliers with a regular partial product array,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 5, pp. 404–408, May 2009.

[17] Y. H. Chen, C. Y. Li, and T. Y. Chang, “Area-effective and power-efficient fixed-width Booth multipliers using generalized probabilistic estimation bias,” IEEE J. Emerging Sel. Topics Circuits Syst., vol. 1, no. 3, pp. 277–288, Sep. 2011.

[18] M. A. Song, L. D. Van, and S. Y. Kuo, “Adaptive low-error fixed-width Booth multipliers,” IEICE Trans. Fundam., vol. A, no. 6, pp. 1180–1187, Jun. 2007.

[19] J. P. Wang, S. R. Kuang, and S. C. Liang, “High-accuracy fixed-width modified Booth multipliers for lossy applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 52–60, Jan. 2011.

[20] C. Y. Li, Y. H. Chen, T. Y. Chang, and J. N. Chen, “A probabilistic estimation bias circuit for fixed-width Booth multiplier and its DCT applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 4, pp. 215–219, Apr. 2011.

[21] Y. H. Chen and T. Y. Chang, “A high-accuracy adaptive conditional-probability estimator for fixed-width Booth multipliers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 3, pp. 594–603, Mar. 2012.

[22] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. Oxford, U.K.: Oxford Univ. Press, 2000.

[23] C. H. Chang, J. Gu, and M. Zhang, “Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 10, pp. 1985–1997, Oct. 2004.

[24] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and Communications, 1st ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.