NCTu DIC 2012 term report

IEE5049-Spring 2012 Digital Integrated Circuit

Low Power Finite Impulse Response Filter in PTM 90nm

Wei-Lun Liu (0050209) / Tao-Yi Lee (0050184)

Institute of Electronics Engineering, National Chiao Tung University


Outline

Motivation

Review of Low Power Techniques

Design of the Finite Impulse Response Filter

Performance Summary

Future Work

Conclusion

2012 IEE5049 Digital Integrated Circuit

Motivation: Battery Storage Is a Limiting Factor

Basic technology has evolved little: it is constrained by the periodic table.

Batteries store energy using a chemical reaction.

Battery capacity increases by only 3 to 7% per year.

Limiting Factors:

Energy density

Battery Size

Safe handling

Material              Specific energy (MJ/kg)
U-235                 79,500,000
Gasoline              47.2
Lithium battery       1.3
Lithium-ion battery   0.72
Methanol              19.7

Source of data: Wikipedia

Evolution of Commercialized Battery Technology

[Figure: evolution of commercialized battery technology. Progress accelerated circa 1990, but growth is roughly linear and slower than the growth of power consumption in ICs.]

Source of data: Wikipedia

Evolution of Power Consumption in Intel Processors

Source of data: Dr. Shekhar Borkar, Intel (re-plotted)

[Figure: power consumption of Intel processors, showing exponential growth beginning circa 1990.]

Previous Work

Categorized by design level (Gajski-Kuhn Y-chart):

System/Application: algorithm

Micro-architecture: multiplexing in time or multiplexing in space

Logic: logic family, standard cell vs. full custom

Circuit: sizing, thresholds

Device: semiconductor process (bulk CMOS, silicon on insulator, etc.)


Previous Work – Circuit Level: Multiple 𝑽𝑫𝑫

Multiple VDDs:

T. Kuroda, "Optimization and control of VDD and VTH for low-power, high-speed CMOS design," ICCAD 2002.

M. Hamada, Y. Ootaguro, T. Kuroda, "Utilizing surplus timing for power reduction," CICC 2001.

Previous Work – Multiple 𝑽𝑻𝑯

Multiple-threshold partitioning: independent body biasing; variable doping levels provided by the process

The effect of body biasing diminishes for L < 100 nm.

$$V_{TH} = V_{TH0} + \gamma\left(\sqrt{\left|-2\phi_F + V_{SB}\right|} - \sqrt{\left|-2\phi_F\right|}\right)$$
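A minimal Python sketch of the body-effect equation, for illustration only. The parameter values (V_TH0 = 0.3 V, γ = 0.4 V^1/2, φ_F = -0.3 V, NMOS sign convention) are assumptions, not values extracted from the PTM 90nm model card. It shows that reverse body bias (V_SB > 0) raises V_TH, and that a smaller γ, used here as a stand-in for the weaker body effect of short-channel devices, gives a smaller shift.

```python
import math


def threshold_voltage(v_sb: float, v_th0: float = 0.3, gamma: float = 0.4,
                      phi_f: float = -0.3) -> float:
    """V_TH = V_TH0 + gamma*(sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)).

    Default parameters are illustrative assumptions (NMOS, phi_F < 0),
    not PTM 90nm values.
    """
    return v_th0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb))
                            - math.sqrt(abs(-2 * phi_f)))


if __name__ == "__main__":
    for v_sb in (0.0, 0.2, 0.4):
        # Reverse body bias (V_SB > 0) raises the threshold voltage
        print(f"V_SB = {v_sb:.1f} V -> V_TH = {threshold_voltage(v_sb):.4f} V")
```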


Previous Work – Power Gating + Clock Gating

H. Mahmoodi, V. Tirumalashetty, M. Cooke, K. Roy, "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating," IEEE Transactions on VLSI Systems, Jan. 2009.

M. Mueller, A. Wortmann, S. Simon, M. Kugel, T. Schoenauer, "The impact of clock gating schemes on the power dissipation of synthesizable register files," ISCAS, May 2004.

S. Wimer, I. Koren, "The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating," IEEE Transactions on VLSI Systems, vol. PP, no. 99, pp. 1-9.

J. Lu, B. Taskin, "Reconfigurable clock polarity assignment for peak current reduction of clock-gated circuits," ISCAS, May 2011.

M. O. Shaker, M. A. Bayoumi, "A clock gated flip-flop for low power applications in 90 nm CMOS," ISCAS, pp. 558-562, 15-18 May 2011.

H. Li, S. Bhunia, Y. Chen, K. Roy, T. N. Vijaykumar, "DCG: deterministic clock-gating for low-power microprocessor design," IEEE Transactions on VLSI Systems, Mar. 2004.

J. Oh, M. Pedram, "Gated clock routing for low-power microprocessor design," IEEE Transactions on Computer-Aided Design, vol. 20, no. 6, pp. 715-722, Jun. 2001.

H. Sathyamurthy, S. S. Sapatnekar, J. P. Fishburn, "Speeding up pipelined circuits through a combination of gate sizing and clock skew optimization," IEEE Transactions on Computer-Aided Design, Feb. 1998.

S. Baeg, "Delay Fault Coverage Enhancement by Partial Clocking for Low-Power Designs With Heavily Gated Clocks," IEEE Transactions on Computer-Aided Design, vol. 26, no. 12, pp. 2215-2221, Dec. 2007.

[Figure: a bank of D flip-flops clocked directly by CLK, compared with the same bank driven by a gated clock derived from CLK and the enable EN (CLKEN).]

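```verilog
// Register with enable: synthesis can replace the EN feedback mux with a
// clock-gating cell, so the flip-flops are not clocked while EN = 0
always @(posedge CLK)
  if (EN)
    Q <= D;
```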

Previous Work – Gate Level Power Optimization

V. Stojanovic et al., "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization"

[Figure: stage i (supply V_DDi, input width W_i, parasitic width p_i·W_i) driving stage i+1 (supply V_DDi+1, width W_i+1, parasitic width p_i+1·W_i+1) through a wire of width W_wire.]

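Previous Work – Gate Level Power Optimization

$$t_d = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \cdot \frac{W_{out} + W_{par}}{W_{in}} = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \left(h + \frac{p}{g}\right) = \tau_{nom} \cdot g \cdot \left(h + \frac{p}{g}\right) = \tau_{nom} \left(g h + p\right)$$

where

$$g = \frac{W_{in}}{W_{inv}}, \qquad p = \frac{W_{par}}{W_{inv}}, \qquad h = \frac{W_{out}}{W_{in}}$$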

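Previous Work – Gate Level Power Optimization

$$D = \sum_i t_{d,i} = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \left( \cdots + \frac{W_i + W_{par,i-1}}{W_{i-1}} + \frac{W_{i+1} + W_{par,i}}{W_i} + \frac{W_{i+2} + W_{par,i+1}}{W_{i+1}} + \cdots \right)$$

Assuming $W_{par,i}$ is proportional to $W_i$, the ratio $W_{par,i}/W_i$ is approximately constant; therefore

$$\frac{\partial D}{\partial W_i} = \tau_{nom} \, g_i \, \frac{1}{W_i} \left(h_{i-1} - h_i\right)$$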
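As a numerical check of this derivative, the Python sketch below models a chain of gates with $t_{d,i} = \tau_{nom}(g h_i + p)$, assuming the same logical effort g and parasitic ratio p for every stage (consistent with $W_{par,i}/W_i$ being constant). Widths are in arbitrary units and the function names are hypothetical.

```python
def path_delay(widths, tau_nom=1.0, g=1.0, p=1.0):
    """D = sum_i tau_nom*(g*h_i + p), with h_i = W_{i+1}/W_i.

    widths[0] is the first (input) gate; widths[-1] is the fixed output load.
    """
    return sum(tau_nom * (g * widths[i + 1] / widths[i] + p)
               for i in range(len(widths) - 1))


def delay_gradient(widths, i, tau_nom=1.0, g=1.0):
    """Analytic dD/dW_i = tau_nom*g*(h_{i-1} - h_i)/W_i for an interior stage i."""
    h_prev = widths[i] / widths[i - 1]
    h_i = widths[i + 1] / widths[i]
    return tau_nom * g * (h_prev - h_i) / widths[i]


def numeric_gradient(widths, i, eps=1e-6):
    """Central finite difference of path_delay with respect to widths[i]."""
    up = list(widths)
    dn = list(widths)
    up[i] += eps
    dn[i] -= eps
    return (path_delay(up) - path_delay(dn)) / (2 * eps)


if __name__ == "__main__":
    skewed = [1.0, 2.0, 16.0, 64.0]    # unequal stage ratios h_i
    balanced = [1.0, 4.0, 16.0, 64.0]  # h_i = 4 for every stage
    print(delay_gradient(skewed, 1), numeric_gradient(skewed, 1))
    print(delay_gradient(balanced, 1), path_delay(balanced), path_delay(skewed))
```

With equal stage ratios the gradient vanishes at every interior stage, which is the minimum-delay sizing for a fixed input and load.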


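Previous Work – Gate Level Power Optimization

$$E_{SW} = \alpha_{0\to1} K_e \left(W_{out} + W_{par}\right) V_{dd}^2$$

The sensitivity of energy to delay due to the sizing of stage i within a logic block requires the derivative of the switching energy with respect to $W_i$, where the parasitic width of stage i is $W_{par,i} = p_i W_i$:

$$\frac{\partial E_{SW}}{\partial W_i} = \frac{\partial}{\partial W_i} \sum_j \alpha_{0\to1} K_e V_{dd}^2 \left(W_{j+1} + W_{par,j}\right) = \alpha_{0\to1} K_e V_{dd}^2 \left(1 + p_i\right)$$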

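Previous Work – Gate Level Power Optimization

$$\frac{\partial E_{SW}/\partial W_i}{\partial D/\partial W_i} = \frac{\alpha_{0\to1} K_e V_{dd}^2 \left(1 + p_i\right)}{\tau_{nom} \, g_i \, \frac{1}{W_i} \left(h_{i-1} - h_i\right)} = -\frac{e_{ci}}{\tau_{nom} \left(f_i - f_{i-1}\right)}$$

Energy stored on logic gate i:

$$e_{ci} = \alpha_{0\to1} K_e W_i \left(V_{dd,i-1}^2 + p_{nom,i} V_{dd,i}^2\right)$$

Here $f_i = g_i h_i$ is the effort of stage i. When $f_i = f_{i-1}$, $\partial D/\partial W_i = 0$ and the sensitivity becomes unbounded, i.e., the minimum delay is achieved.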
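The sensitivity expression can be checked numerically. This Python sketch assumes a single supply $V_{dd}$, so that $e_{ci}$ reduces to $\alpha K_e W_i V_{dd}^2 (1 + p)$, uniform g and p for all stages, and hypothetical unit-scale constants. It compares the ratio of finite-difference derivatives with $-e_{ci}/(\tau_{nom}(f_i - f_{i-1}))$.

```python
# Hypothetical constants: activity, energy coefficient, supply, delay unit, g, p
ALPHA, K_E, VDD, TAU_NOM, G, P = 0.5, 1.0, 1.0, 1.0, 1.0, 1.0


def path_delay(w):
    """D = sum_i tau_nom*(g*W_{i+1}/W_i + p)."""
    return sum(TAU_NOM * (G * w[i + 1] / w[i] + P) for i in range(len(w) - 1))


def switching_energy(w):
    """E_SW = sum_i alpha*K_e*(W_{i+1} + p*W_i)*Vdd^2: load plus parasitic of stage i."""
    return sum(ALPHA * K_E * (w[i + 1] + P * w[i]) * VDD ** 2
               for i in range(len(w) - 1))


def finite_diff(f, w, i, eps=1e-6):
    """Central finite difference of f with respect to w[i]."""
    up, dn = list(w), list(w)
    up[i] += eps
    dn[i] -= eps
    return (f(up) - f(dn)) / (2 * eps)


def sensitivity_numeric(w, i):
    """(dE/dW_i) / (dD/dW_i) from finite differences."""
    return finite_diff(switching_energy, w, i) / finite_diff(path_delay, w, i)


def sensitivity_closed_form(w, i):
    """-e_ci / (tau_nom*(f_i - f_{i-1})), with f_i = g*h_i and a single supply."""
    e_ci = ALPHA * K_E * w[i] * VDD ** 2 * (1 + P)
    f_i = G * w[i + 1] / w[i]
    f_prev = G * w[i] / w[i - 1]
    return -e_ci / (TAU_NOM * (f_i - f_prev))


if __name__ == "__main__":
    for w1 in (2.0, 3.0, 3.9):  # approach the balanced sizing W_1 = 4
        w = [1.0, w1, 16.0, 64.0]
        print(w1, sensitivity_numeric(w, 1), sensitivity_closed_form(w, 1))
```

As $W_1$ approaches the balanced value, $f_1 \to f_0$ and the magnitude of the sensitivity grows without bound, matching the minimum-delay condition above.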

[Figure: energy vs. delay trade-off curve, with the minimum achievable delay D_min marked.]

Previous Work – Cost/Benefit Comparison

Technique     Power    Timing   Area     Impact on    Impact on  Impact on     Impact on
              benefit  penalty  penalty  architecture design     verification  P&R
Multi VTH     Medium   Little   Little   Low          Low        None          Low
Clock gating  Medium   Little   Little   Low          Low        None          Low
Multi VDD     Large    Some     Little   High         Medium     Low           Medium

Characterization of PTM 90nm 𝑰𝑫-𝑽𝒈𝒔

Drain-induced barrier lowering (DIBL) is observed.

Characterization of PTM 90nm 𝑰𝑫-𝑽𝒅𝒔

𝛼 → 1 is observed, indicating strong velocity saturation (alpha-power law).

Characterization of PTM 90nm 𝒈𝒎-𝑽𝒈𝒔

Minimum widths and lengths

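Architecture

Item                                   Specification
FIR impulse response length (window)   4 taps
Quantization levels                    16 (4 bit)

[Figure: 4-tap direct-form FIR filter: the input x[n] passes through a chain of three z^-1 delays, the taps are weighted by h[0], h[1], h[2], h[3], and the products are summed to form y[n].]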
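The architecture can be modeled behaviorally in Python as a chain of three z^-1 registers feeding four constant multipliers. The coefficients are hypothetical 4-bit two's-complement integers (range -8 to 7), since the report does not list the actual h[k] values; the class name is also illustrative.

```python
from collections import deque


class DirectFormFIR:
    """Behavioral model of a direct-form FIR: y[n] = sum_k h[k] * x[n-k]."""

    def __init__(self, h):
        self.h = list(h)
        # z^-1 registers holding x[n-1], x[n-2], ..., reset to zero
        self.delay = deque([0] * (len(self.h) - 1), maxlen=len(self.h) - 1)

    def step(self, x):
        taps = [x, *self.delay]  # x[n], x[n-1], x[n-2], x[n-3]
        y = sum(c * t for c, t in zip(self.h, taps))
        self.delay.appendleft(x)  # shift the delay line by one sample
        return y


H = [3, -2, 7, -8]  # hypothetical 4-bit (16-level) coefficients
assert all(-8 <= c <= 7 for c in H)

if __name__ == "__main__":
    fir = DirectFormFIR(H)
    print([fir.step(x) for x in [1, 0, 0, 0, 0]])  # impulse response
```

Feeding a unit impulse returns the coefficients in order, followed by zeros once the impulse leaves the delay line.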

Multirate Decomposition

Each signal path operates at the lowest clock rate (multiplexing in space).

$$H(z) = \sum_{i=0}^{n-1} h_i \, z^{-i}$$

is divided into the polyphase components

$$H_k(z^N) = \sum_{i=0}^{n/N-1} h_{k+N i} \, z^{-N i}, \qquad k = 0, \ldots, N-1,$$

so that $H(z) = \sum_{k=0}^{N-1} z^{-k} H_k(z^N)$.

Reference: P. P. Vaidyanathan, "Multirate Systems and Filter Banks"
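The decomposition can be checked with a behavioral Python sketch of an N-parallel (block) FIR, one common way to realize "multiplexing in space": every product uses one sub-filter H_k and one decimated input phase, so all arithmetic runs at 1/N of the input sample rate. The block structure, function names, and test values are assumptions for illustration; the report does not specify N or the branch wiring.

```python
def fir_direct(h, x):
    """Reference convolution y[n] = sum_j h[j]*x[n-j] with zero initial state."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]


def polyphase_components(h, N):
    """Taps of H_k: h[k], h[k+N], h[k+2N], ... for k = 0..N-1."""
    return [h[k::N] for k in range(N)]


def fir_parallel(h, x, N):
    """N-parallel FIR built from polyphase sub-filters.

    Output phase r at low-rate index m is y[N*m + r]; the input phases are
    x_q[m] = x[N*m + q]. Only low-rate indices m are ever used.
    """
    if len(x) % N:
        raise ValueError("input length must be a multiple of N")
    M = len(x) // N
    hk = polyphase_components(h, N)
    xq = [x[q::N] for q in range(N)]
    y = [0] * len(x)
    for r in range(N):                      # output phase
        for m in range(M):                  # low-rate time index
            acc = 0
            for k in range(N):
                q = (r - k) % N             # input phase feeding H_k for output phase r
                extra = 1 if r < k else 0   # wrap-around costs one more low-rate delay
                for i, coeff in enumerate(hk[k]):
                    idx = m - i - extra
                    if idx >= 0:
                        acc += coeff * xq[q][idx]
            y[N * m + r] = acc
    return y


if __name__ == "__main__":
    h = [3, -2, 7, -8]
    x = [5, -1, 2, 0, 7, 3, -4, 6, 1, -2, 0, 9]
    print(fir_direct(h, x) == fir_parallel(h, x, 2))
```

For N = 2 each branch sees every other input sample, so the multipliers can be clocked at half the input rate; the price is more copies of the sub-filters, which is the space-for-rate trade noted above.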

Multiplier Design

Multiplication is costly in hardware.

More partial products result in a more complex adder tree.

To reduce the hardware cost, we should reduce the number of partial products.

This can be achieved with a different numerical representation.

Numerical Representation

Two’s complement

Booth Radix

Canonical Signed Digit

Two’s Complement

Two’s complement is the most common number representation in digital arithmetic.

The number of 1s becomes large for wide word lengths.

$17_{10} = 00010001_2 = 2^4 + 2^0$

$5_{10} = 00000101_2 = 2^2 + 2^0$

$-1_{10} = 11111111_2 = -2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0$

$-1_{10} = 1111111111111111_2$

$30583_{10} = 0111011101110111_2$
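A small Python helper makes the cost concrete under one assumption: a plain shift-and-add multiplier that generates one partial product per 1 bit (special handling of the sign bit is ignored). The function names are hypothetical.

```python
def twos_complement_bits(value: int, n: int) -> str:
    """n-bit two's-complement bit string of value, MSB first."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n-bit two's complement")
    return format(value & ((1 << n) - 1), f"0{n}b")


def count_ones(value: int, n: int) -> int:
    """Number of 1 bits, i.e. partial products in a naive shift-and-add multiplier."""
    return twos_complement_bits(value, n).count("1")


if __name__ == "__main__":
    for v, n in [(17, 8), (5, 8), (-1, 8), (-1, 16), (30583, 16)]:
        print(v, twos_complement_bits(v, n), count_ones(v, n))
```

The 16-bit -1 needs 16 partial products and 30583 needs 12, which motivates the signed-digit forms that follow.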

Booth Radix

Booth radix recoding encodes a two’s-complement number by examining pairs of adjacent bits.

The signed digit at each position can be found by table lookup.

It often yields fewer nonzero digits than two’s complement.

$-1_{10} = (\texttt{000000000000000-})_2 = -2^0$

$30583_{10} = (\texttt{+00-+00-+00-+00-})_2 = 2^{15} - 2^{12} + 2^{11} - 2^8 + 2^7 - 2^4 + 2^3 - 2^0$
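A minimal Python sketch of radix-2 Booth recoding (digit i is $x_{i-1} - x_i$ with $x_{-1} = 0$) reproduces both examples. Only radix-2 is shown; radix-4 (modified) Booth groups three bits instead. Helper names are hypothetical.

```python
def booth_radix2(value: int, n: int) -> list[int]:
    """Radix-2 Booth digits (LSB first) of an n-bit two's-complement value.

    Digit i is x[i-1] - x[i] with x[-1] = 0, so every digit is in {-1, 0, +1}.
    """
    bits = [(value >> i) & 1 for i in range(n)]  # >> sign-extends negative ints
    digits, prev = [], 0
    for b in bits:
        digits.append(prev - b)
        prev = b
    return digits


def to_signed_digit_string(digits: list[int]) -> str:
    """Render digits MSB first in the slides' notation: '+', '0', '-'."""
    symbol = {1: "+", 0: "0", -1: "-"}
    return "".join(symbol[d] for d in reversed(digits))


def digit_value(digits: list[int]) -> int:
    """Integer value sum_i d_i * 2^i of LSB-first signed digits."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    for v in (-1, 30583):
        d = booth_radix2(v, 16)
        print(v, to_signed_digit_string(d), sum(1 for x in d if x))
```

For 30583 this gives 8 nonzero digits instead of the 12 ones in two's complement.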

Canonical Signed Digit (CSD)

Canonical signed digit representation writes a number with the ternary digit set {+1, 0, -1} (written +, 0, -) using the minimum number of nonzero digits; no two adjacent digits are both nonzero.

Every number has a unique CSD representation.

$-1_{10} = (\texttt{000000000000000-})_2 = -2^0$

$30583_{10} = (\texttt{+000-000-000-00-})_2 = 2^{15} - 2^{11} - 2^7 - 2^3 - 2^0$

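Algorithm

CSD conversion of an n-bit two’s-complement number $X = X_{n-1} \ldots X_0$ into digits $C_i$:

Start: i = 0, Carry = 0, and sign-extend $X_n = X_{n-1}$.

For each i, examine the bit pair $X_{i+1} X_i$:

If Carry = 0: 00 or 10 gives $C_i = 0$, Carry = 0; 01 gives $C_i = 1$, Carry = 0; 11 gives $C_i = -1$, Carry = 1.

If Carry = 1: 01 or 11 gives $C_i = 0$, Carry = 1; 00 gives $C_i = 1$, Carry = 0; 10 gives $C_i = -1$, Carry = 1.

Set i = i + 1 and repeat while i < n; then end.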
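The flowchart translates directly into a Python sketch, assuming the input is an n-bit two's-complement integer. Python's right shift sign-extends negative integers, which supplies the $X_n = X_{n-1}$ extension. Function names are hypothetical.

```python
def csd_digits(value: int, n: int) -> list[int]:
    """CSD digits C_0..C_{n-1} (LSB first) of an n-bit two's-complement value.

    Examines the pair X_{i+1} X_i together with Carry, emits C_i, and updates
    Carry, exactly as in the flowchart. X_n = X_{n-1} (sign extension).
    """
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n-bit two's complement")
    x = [(value >> i) & 1 for i in range(n + 1)]  # x[n] == x[n-1]: >> sign-extends
    digits, carry = [], 0
    for i in range(n):
        pair = (x[i + 1], x[i])
        if carry == 0:
            if pair == (1, 1):
                c, carry = -1, 1
            elif pair == (0, 1):
                c, carry = 1, 0
            else:                       # 00 or 10
                c, carry = 0, 0
        else:
            if pair == (0, 0):
                c, carry = 1, 0
            elif pair == (1, 0):
                c, carry = -1, 1
            else:                       # 01 or 11
                c, carry = 0, 1
        digits.append(c)
    return digits


def to_signed_digit_string(digits: list[int]) -> str:
    """Render digits MSB first in the slides' notation: '+', '0', '-'."""
    return "".join({1: "+", 0: "0", -1: "-"}[d] for d in reversed(digits))


def digit_value(digits: list[int]) -> int:
    """Integer value sum_i d_i * 2^i of LSB-first signed digits."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(to_signed_digit_string(csd_digits(30583, 16)))
    print(to_signed_digit_string(csd_digits(-1, 16)))
```

Because the final carry always equals the sign bit, the n output digits represent the signed value exactly, and no two adjacent digits are nonzero.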

Comparison

Form              Nonzero digits  Decoding method                     Usage
Two’s complement  Most            N/A                                 Normal number representation
Booth radix       Middle          Parallel (digits independent)       Variable multiplier
CSD               Fewest          Iterative (carry-dependent)         Constant multiplier

Since the FIR filter coefficients are known in advance, we use the CSD form to implement constant multipliers.
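A constant multiplier built from CSD digits needs only shifts and additions or subtractions. This Python sketch, with hypothetical function names, parses the slides' "+0-" notation and counts adders as (number of nonzero digits - 1), assuming two-input adders/subtractors.

```python
def parse_signed_digits(s: str) -> list[int]:
    """Parse '+0-' notation (MSB first) into LSB-first digits."""
    value = {"+": 1, "0": 0, "-": -1}
    return [value[ch] for ch in reversed(s)]


def csd_constant_multiply(a: int, digits: list[int]) -> tuple[int, int]:
    """Multiply a by a signed-digit constant using only shifts and adds/subtracts.

    Returns (product, adders), where adders = nonzero digits - 1.
    """
    product, nonzeros = 0, 0
    for i, d in enumerate(digits):
        if d == 1:
            product += a << i   # add a shifted copy of a
        elif d == -1:
            product -= a << i   # subtract a shifted copy of a
        nonzeros += d != 0
    return product, max(nonzeros - 1, 0)


if __name__ == "__main__":
    csd = parse_signed_digits("+000-000-000-00-")    # 30583 in CSD
    booth = parse_signed_digits("+00-+00-+00-+00-")  # 30583, radix-2 Booth
    print(csd_constant_multiply(123, csd), csd_constant_multiply(123, booth))
```

For the coefficient 30583 the CSD form needs 4 adders, versus 7 for the radix-2 Booth digits.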

Common Sub-expression Elimination (CSE)

CSE finds computations that repeat across different multipliers (common subexpressions) and evaluates each of them only once.

$30583_{10} = (\texttt{+000-000-000-00-})_2$

$39799_{10} = (\texttt{+0+00-00-000-00-})_2$

$X = (\texttt{+0000000-000-00-})_2 = 32631_{10}$

$30583_{10} = X + (\texttt{0000-00000000000})_2 = X - 2048_{10}$

$39799_{10} = X + (\texttt{00+00-0000000000})_2 = X + 7168_{10}$

$A \times 30583 = A \times X - A \times 2048$

$A \times 39799 = A \times X + A \times 7168$
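A Python sketch of this sharing, with hypothetical names: $A \times X$ is built once by shifts and adds, and both products are derived from it. The adder counts assume two-input adders and one adder per nonzero digit beyond the first.

```python
def shift_add(a: int, digits: str) -> int:
    """a times a signed-digit constant ('+0-' notation, MSB first) via shifts/adds."""
    result = 0
    for i, ch in enumerate(reversed(digits)):
        if ch == "+":
            result += a << i
        elif ch == "-":
            result -= a << i
    return result


def adders(digits: str) -> int:
    """Two-input adders needed for one signed-digit constant: nonzeros - 1."""
    return max(sum(ch != "0" for ch in digits) - 1, 0)


X = "+0000000-000-00-"   # shared subexpression, 32631
C1 = "+000-000-000-00-"  # 30583 in CSD
C2 = "+0+00-00-000-00-"  # 39799 in CSD


def products_with_cse(a: int) -> tuple[int, int]:
    ax = shift_add(a, X)             # computed once, shared by both products
    y1 = ax - (a << 11)              # A*30583 = A*X - A*2048
    y2 = ax + (a << 13) - (a << 10)  # A*39799 = A*X + A*7168
    return y1, y2


ADDERS_WITHOUT_CSE = adders(C1) + adders(C2)  # 4 + 5 = 9
ADDERS_WITH_CSE = adders(X) + 1 + 2           # 3 for X, then 1 and 2 more = 6

if __name__ == "__main__":
    print(products_with_cse(7), (7 * 30583, 7 * 39799))
    print(ADDERS_WITHOUT_CSE, ADDERS_WITH_CSE)
```

Under this count, sharing X reduces the adders from 9 to 6, but both outputs now wait for the adder tree that builds X first.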

Common Sub-expression Elimination (CSE)

Although CSE is a powerful technique for reducing hardware, it has a drawback.

Pro: reduces the total hardware (number of adders) needed

Con: the critical path may become longer, because shared subexpressions add adder levels

We should therefore constrain the maximum level of CSE to limit the critical path.

Results and Conclusion

Dynamic Power Consumption


Supply FIR_CSD FIR_2’s Complement

1V 0.75673mW

0.5V 0.05437mW


Propagation Delay

Results and Conclusion

The multiplier structure has a significant influence on total power, delay, and area.

Choosing an appropriate logic family can also help tune the multiplier’s performance.

Future Work

Take the adder structure into consideration

Apply a level-constrained CSE algorithm

Implement a practical FIR filter with a specified function

References

S. Roy, M. M. K. Nipun, J. J. Wikner, "Ultra-low power FIR filter using STSC-CVL logic," 2011 IEEE International Conference on IC Design & Technology (ICICDT), pp. 1-4, May 2011.

A. V. Oppenheim, "Discrete-Time Signal Processing," 2nd ed.

M. Keating et al., "Low Power Methodology Manual for System-on-Chip Design," Springer, 2008.

J. Rabaey, "Low Power Design Essentials," Springer, 2009.

FIR_CSD @ VDD=1V, pseudo-random input

FIR_CSD @ VDD=0.5V, pseudo-random input

FIR_2’s Comp. @ VDD=1V, pseudo-random input

FIR_2’s Comp. @ VDD=0.5V, pseudo-random input