NCTU DIC 2012 term report


IEE5049-Spring 2012 Digital Integrated Circuit

Low Power Finite Impulse Response Filter in PTM 90nm

Wei-Lun Liu (0050209) / Tao-Yi Lee (0050184)

Institute of Electronics Engineering

National Chiao Tung University

Outline

Motivation

Review of Low Power Techniques

Design of Finite Impulse Response Filter

Performance Summary

Future Works

Conclusion

2012 IEE5049 Digital Integrated Circuit

Motivation – Battery Storage: A Limiting Factor

Basic battery technology has evolved little: it is constrained by the periodic table

Batteries store energy using a chemical reaction

Battery capacity increases by 3~7% per year

Limiting Factors:

Energy density

Battery Size

Safe handling


Material               Energy density (MJ/kg)
U-235                  79,500,000
Gasoline               47.2
Lithium battery        1.3
Lithium-ion battery    0.72
Methanol               19.7

Source of data: Wikipedia

Evolution of Commercialized Battery Tech.


Accelerated circa 1990, but still slower than the growth of power consumption in ICs

~ linear growth

Source of data: Wikipedia

Evolution of Power Consumption in Intel Processors


Source of data: Dr. Shekhar Borkar, Intel (Re-plotted)

Exponential growth circa 1990

Previous Works

Categorized by Design Levels (Gajski-Kuhn Y-Chart)

System/Application: Algorithm

Micro-Architecture: Multiplexing in time or multiplexing in space

Logic: Logic family, standard cell/full custom

Circuit: Sizing, thresholds

Device: Semiconductor process (bulk CMOS, silicon on insulator, etc.)


Previous Works – Circuit Level: Multiple VDD

Multiple VDDs

Kuroda, T., "Optimization and control of VDD and VTH for low-power, high-speed CMOS design," ICCAD 2002

Hamada, M.; Ootaguro, Y.; Kuroda, T., "Utilizing surplus timing for power reduction," CICC 2001

Previous Works – Multiple VTH

Multiple threshold partitioning: independent body biasing, or variable doping levels provided by the process

The effect of body biasing diminishes for L < 100 nm

$$V_{TH} = V_{TH0} + \gamma\left(\sqrt{\left|-2\phi_F + V_{SB}\right|} - \sqrt{\left|-2\phi_F\right|}\right)$$
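The body-effect relation on this slide can be evaluated numerically. The sketch below is only an illustration: the parameter values (`V_TH0`, `GAMMA`, `PHI_F`) are hypothetical and are not taken from the PTM 90nm model, and the function name is an assumption.

```python
import math

def threshold_voltage(v_th0, gamma, phi_f, v_sb):
    """Body effect: V_TH0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|))."""
    return v_th0 + gamma * (math.sqrt(abs(-2.0 * phi_f + v_sb))
                            - math.sqrt(abs(-2.0 * phi_f)))

# Hypothetical NMOS parameters, chosen only for illustration (not PTM 90nm data).
V_TH0 = 0.4    # zero-bias threshold voltage, in volts
GAMMA = 0.4    # body-effect coefficient, in V^0.5
PHI_F = -0.3   # Fermi potential, in volts

for v_sb in (0.0, 0.5, 1.0):
    v_th = threshold_voltage(V_TH0, GAMMA, PHI_F, v_sb)
    print(f"V_SB = {v_sb:.1f} V -> V_TH = {v_th:.3f} V")
```

With zero source-to-body bias the two square-root terms cancel and the function returns `V_TH0`; a larger reverse bias raises the threshold, which is the knob that multi-VTH body biasing relies on.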

Previous Works – Power Gating + Clock Gating

Mahmoodi, H.; Tirumalashetty, V.; Cooke, M.; Roy, K., "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating," VLSI Systems, IEEE Transactions on, Jan. 2009

Mueller, M.; Wortmann, A.; Simon, S.; Kugel, M.; Schoenauer, T., "The impact of clock gating schemes on the power dissipation of synthesizable register files," ISCAS, May 2004

Wimer, S.; Koren, I., "The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating," VLSI Systems, IEEE Transactions on, vol. PP, no. 99, pp. 1-9

Jianchao Lu; Taskin, B., "Reconfigurable clock polarity assignment for peak current reduction of clock-gated circuits," ISCAS, May 2011

Shaker, M.O.; Bayoumi, M.A., "A clock gated flip-flop for low power applications in 90 nm CMOS," ISCAS, pp. 558-562, 15-18 May 2011

Hai Li; Bhunia, S.; Yiran Chen; Roy, K.; Vijaykumar, T.N., "DCG: deterministic clock-gating for low-power microprocessor design," VLSI Systems, IEEE Transactions on, March 2004

Jaewon Oh; Pedram, M., "Gated clock routing for low-power microprocessor design," ICCAD, IEEE Transactions on, vol. 20, no. 6, pp. 715-722, Jun 2001

Sathyamurthy, H.; Sapatnekar, S.S.; Fishburn, J.P., "Speeding up pipelined circuits through a combination of gate sizing and clock skew optimization," ICCAD, IEEE Transactions on, Feb 1998

Sanghyeon Baeg, "Delay Fault Coverage Enhancement by Partial Clocking for Low-Power Designs With Heavily Gated Clocks," ICCAD, IEEE Transactions on, vol. 26, no. 12, pp. 2215-2221, Dec. 2007

[Figure: clock gating of D flip-flop register banks; labelled signals CLK, EN, CLKEN, D, Q]

always @ (posedge CLK)
    if (EN)
        Q <= D;  // Q is updated only on clock edges where EN is high
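The enabled register above can be modelled cycle by cycle. The following Python sketch is a behavioural illustration only; the function names are hypothetical, and `gated_clock_edges` assumes ideal gating in which the clock reaches the flip-flop only in cycles where EN is 1.

```python
def simulate_enabled_register(d_values, en_values, q_init=0):
    """On each clock edge, Q takes the value of D only if EN is 1."""
    q = q_init
    q_trace = []
    for d, en in zip(d_values, en_values):
        if en:
            q = d
        q_trace.append(q)
    return q_trace

def gated_clock_edges(en_values):
    """Clock edges that reach the flip-flop under the ideal-gating assumption."""
    return sum(1 for en in en_values if en)

d = [1, 0, 0, 1, 0]
en = [1, 0, 1, 0, 1]
print(simulate_enabled_register(d, en))
print(gated_clock_edges(en), "of", len(en), "clock edges delivered")
```

The fewer cycles in which EN is asserted, the fewer clock edges the gated flip-flop sees, which is where the clock-power saving comes from.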

Previous Works – Gate Level Power Optimization


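V. Stojanovic et al., "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization"

[Figure: two cascaded gate stages i and i+1 with supplies VDD,i and VDD,i+1, input widths Wi and Wi+1, parasitic widths piWi and pi+1Wi+1, and wire load Wwire]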

Previous Works – Gate Level Power Optimization

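$$t_d = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}}\left(\frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}}\right) = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}}\left(h + \frac{p}{g}\right) = \tau_{nom} \cdot g \cdot \left(h + \frac{p}{g}\right)$$

$$g = \frac{W_{in}}{W_{inv}}, \quad p = \frac{W_{par}}{W_{inv}}, \quad h = \frac{W_{out}}{W_{in}}$$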
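As a worked instance of the delay model, the sketch below evaluates the stage delay from transistor widths using the definitions of g, p, and h given on this slide. The function name and the width values are illustrative assumptions, and the delay is expressed in units of the nominal time constant.

```python
def gate_delay(tau_nom, w_in, w_out, w_par, w_inv):
    """Stage delay tau_nom * g * (h + p / g), with g, p, h defined from widths."""
    g = w_in / w_inv    # input width relative to the reference inverter
    p = w_par / w_inv   # parasitic width relative to the reference inverter
    h = w_out / w_in    # electrical fanout of the stage
    return tau_nom * g * (h + p / g)

# Hypothetical widths in arbitrary units, with tau_nom normalised to 1.
delay = gate_delay(tau_nom=1.0, w_in=2.0, w_out=8.0, w_par=2.0, w_inv=1.0)
print(delay)
```

Expanding the product shows that the same expression can be written as tau_nom * (g * h + p), the familiar effort-plus-parasitic form.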


Previous Works – Gate Level Power Optimization

$$D = \sum_i t_{d,i} = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}}\left(\cdots + \frac{W_i + W_{par,i-1}}{W_{i-1}} + \frac{W_{i+1} + W_{par,i}}{W_i} + \frac{W_{i+2} + W_{par,i+1}}{W_{i+1}} + \cdots\right)$$

Assuming $W_{par,i}$ is proportional to $W_i$, so that $W_{par,i}/W_i$ is approximately constant,

$$\Rightarrow \frac{\partial D}{\partial W_i} = \tau_{nom}\, g_i\, \frac{1}{W_i}\left(h_{i-1} - h_i\right)$$


Previous Works – Gate Level Power Optimization

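$$E_{SW} = \alpha_{0\to1} K_e \left(W_{out} + W_{par}\right) V_{dd}^2$$

The sensitivity of energy to delay due to the sizing of stage i within a logic block follows from the derivative of the switching energy with respect to $W_i$:

$$\frac{\partial E_{SW}}{\partial W_i} = \frac{\partial}{\partial W_i}\sum_i \alpha_{0\to1} K_e V_{dd}^2\left(W_{i+1} + W_{par,i}\right) = \alpha_{0\to1} K_e V_{dd}^2\left(1 + p_i\right)$$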


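Previous Works – Gate Level Power Optimization

$$\frac{\partial E_{SW}/\partial W_i}{\partial D/\partial W_i} = \frac{\alpha_{0\to1} K_e V_{dd}^2\left(1 + p_i\right)}{\tau_{nom}\, g_i\, \frac{1}{W_i}\left(h_{i-1} - h_i\right)} = -\frac{ec_i}{\tau_{nom}\left(f_i - f_{i-1}\right)}$$

Energy stored on logic gate i:

$$ec_i = \alpha_{0\to1} K_e W_i\left(V_{dd,i-1}^2 + p_{nom,i} V_{dd,i}^2\right)$$

When $f_i = f_{i-1}$, the least delay is achieved.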
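The delay-sensitivity expression and the equal-fanout condition can be checked numerically. The sketch below assumes a chain of identical gates (a single g and p for every stage, so the fanout of each stage reduces to g times h) with a fixed first-stage width and a fixed load; all names and values are illustrative, not taken from the cited paper.

```python
def chain_delay(widths, w_load, tau_nom=1.0, g=1.0, p=1.0):
    """Total delay of a gate chain: sum over stages of tau_nom * (g * h + p)."""
    loads = list(widths[1:]) + [w_load]
    return sum(tau_nom * (g * (w_out / w_in) + p)
               for w_in, w_out in zip(widths, loads))

def delay_sensitivity(widths, w_load, i, tau_nom=1.0, g=1.0):
    """Analytic dD/dW_i = tau_nom * g * (h_{i-1} - h_i) / W_i, valid for i >= 1."""
    h_prev = widths[i] / widths[i - 1]
    w_next = widths[i + 1] if i + 1 < len(widths) else w_load
    h_i = w_next / widths[i]
    return tau_nom * g * (h_prev - h_i) / widths[i]

def numeric_sensitivity(widths, w_load, i, eps=1e-6):
    """Central finite difference of chain_delay with respect to W_i."""
    up = list(widths)
    up[i] += eps
    down = list(widths)
    down[i] -= eps
    return (chain_delay(up, w_load) - chain_delay(down, w_load)) / (2 * eps)

unbalanced = [1.0, 2.0, 16.0]   # stage fanouts 2, 8, 4
balanced = [1.0, 4.0, 16.0]     # stage fanouts 4, 4, 4
print(delay_sensitivity(unbalanced, 64.0, 1), numeric_sensitivity(unbalanced, 64.0, 1))
print(delay_sensitivity(balanced, 64.0, 1), chain_delay(balanced, 64.0))
```

For the unbalanced chain the sensitivity is negative, so widening stage 1 reduces delay; for the balanced chain it is zero, matching the condition that equal stage fanouts give the least delay.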


[Figure: energy versus delay trade-off curve, with the minimum delay Dmin marked on the delay axis]

Previous Works – Cost /Benefit Comparison

Tech.         Power Benefit  Timing Penalty  Area Penalty  Impact: Arch.  Impact: Design  Impact: Verification  Impact: P&R
Multi VTH     Medium         Little          Little        Low            Low             None                  Low
Clock Gating  Medium         Little          Little        Low            Low             None                  Low
Multi VDD     Large          Some            Little        High           Medium          Low                   Medium


Characterization of PTM 90nm: ID-Vgs

Drain Induced Barrier Lowering (DIBL) is observed

Characterization of PTM 90nm: ID-Vds

α → 1 is observed

Characterization of PTM 90nm: gm-Vgs

Minimum Widths and Lengths

Architecture

Item                                   Specification
FIR impulse response length (window)   4 taps
Quantization levels                    16 (4 bit)


[Figure: 4-tap direct-form FIR filter with input x[n], three z^-1 delay elements, coefficients h[0] to h[3], and output y[n]]
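The direct-form structure in the figure computes y[n] as the sum of h[k] * x[n-k] over the four taps. A minimal software sketch follows; the coefficient values are placeholders rather than the coefficients used in this project, and samples before n = 0 are assumed to be zero.

```python
def fir_filter(h, x):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y

h = [1, 2, 3, 4]          # placeholder 4-tap impulse response
impulse = [1, 0, 0, 0, 0]
print(fir_filter(h, impulse))
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is a quick sanity check for any FIR implementation.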

Multirate Decomposition

Each signal path operates at the lowest clock rate

Multiplexing in space

$$H(z) = \sum_{i=0}^{n-1} h_i \cdot z^{-i}$$

is divided into

$$H_k\left(z^N\right) = \sum_{i=0}^{\frac{n}{N}-1} h_{k+N\cdot i} \cdot z^{-Ni}$$

Reference: P. P. Vaidyanathan, "Multirate Systems And Filter Banks"
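The decomposition can be illustrated in software by splitting the coefficient list into N interleaved branches and checking that the branch outputs, each delayed by its branch index k, add up to the direct FIR output. This is a sketch under the assumption that the filter length is a multiple of N; the function names and test data are illustrative.

```python
def direct_fir(h, x):
    """Reference FIR: y[n] = sum over i of h[i] * x[n - i], zero initial state."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]

def polyphase_components(h, n_branches):
    """Branch k holds the coefficients h[k], h[k + N], h[k + 2N], ..."""
    return [h[k::n_branches] for k in range(n_branches)]

def polyphase_fir(h, x, n_branches):
    """Sum of branch outputs; branch k is delayed by k samples, taps are N apart."""
    y = [0] * len(x)
    for k, branch in enumerate(polyphase_components(h, n_branches)):
        for n in range(len(x)):
            for i, coeff in enumerate(branch):
                idx = n - k - n_branches * i
                if idx >= 0:
                    y[n] += coeff * x[idx]
    return y

h = [1, 2, 3, 4]
x = [3, -1, 4, 1, -5, 9, 2, -6]
print(polyphase_components(h, 2))
print(polyphase_fir(h, x, 2) == direct_fir(h, x))
```

In hardware each branch only needs to process every N-th sample, which is what allows every signal path to run at the lowest clock rate.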

Multiplier Design

Multiplication is costly in hardware.

More partial products result in a more complex adder tree

To reduce the hardware cost, we should reduce the number of partial products

Approach: use a different numerical representation


Numerical Representation

Two's complement

Booth Radix

Canonical Signed Digit


Two's Complement

Two's complement is the most common number representation in digital arithmetic.

The number of 1's becomes large when wider word lengths are used


17_10 = 00010001_2 = 2^4 + 2^0

5_10 = 00000101_2 = 2^2 + 2^0

-1_10 = 11111111_2 = -2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0

-1_10 = 1111111111111111_2

30583_10 = 0111011101110111_2
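The growth in the number of 1's can be checked with a short sketch. The helper names below are illustrative; the word widths match the 8-bit and 16-bit examples above.

```python
def twos_complement_bits(value, width):
    """Bit string of `value` in `width`-bit two's complement, MSB first."""
    return format(value & ((1 << width) - 1), f"0{width}b")

def count_ones(value, width):
    """Number of 1 bits, i.e. partial products a plain multiplier would need."""
    return twos_complement_bits(value, width).count("1")

for value, width in ((5, 8), (-1, 8), (-1, 16), (30583, 16)):
    print(value, twos_complement_bits(value, width), count_ones(value, width))
```

Each 1 bit corresponds to one partial product in a straightforward shift-and-add multiplier, which is why small negative numbers and dense constants are expensive in this representation.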

Booth Radix

Booth radix encoding analyzes pairs of adjacent digits of a two's complement number.

The coefficient of each bit can be found by table lookup

The number of non-zero digits is smaller than in two's complement


-1_10 = (000000000000000-)_2 = -2^0

30583_10 = (+00-+00-+00-+00-)_2 = 2^15 - 2^12 + 2^11 - 2^8 + 2^7 - 2^4 + 2^3 - 2^0
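The two digit strings above can be reproduced with radix-2 Booth recoding, where each digit is the previous (less significant) bit minus the current bit, with an implied 0 below the LSB. The sketch below assumes that rule; the function names are illustrative.

```python
def booth_recode(value, width):
    """Radix-2 Booth digits (MSB first): d_i = x_(i-1) - x_i, with x_(-1) = 0."""
    bits = [(value >> i) & 1 for i in range(width)]  # two's complement bits, LSB first
    digits = []
    previous = 0
    for bit in bits:
        digits.append(previous - bit)
        previous = bit
    return digits[::-1]

def digits_to_string(digits):
    """Render signed digits with '+', '0', and '-'."""
    return "".join({1: "+", 0: "0", -1: "-"}[d] for d in digits)

def digits_to_value(digits):
    """Evaluate an MSB-first signed-digit list."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value

print(digits_to_string(booth_recode(-1, 16)))     # 000000000000000-
print(digits_to_string(booth_recode(30583, 16)))  # +00-+00-+00-+00-
```

Because every digit depends only on two neighbouring bits, all digits can be produced in parallel, which suits multipliers whose operand is not known in advance.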

Canonical Signed Digit (CSD)

Canonical Signed Digit represents a number using the ternary digit set (+, 0, -) with the minimum number of non-zero digits.

Every number has a unique CSD representation.


-1_10 = (000000000000000-)_2 = -2^0

30583_10 = (+000-000-000-00-)_2 = 2^15 - 2^11 - 2^7 - 2^3 - 2^0

Algorithm


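[Flowchart: two's complement to CSD conversion]

Start: i = 0, Carry = 0, X*_n = X_(n-1) (sign extension)

For each i, examine the bit pair X_(i+1) X_i:

Carry = 0?  Yes:  00, 10 -> C_i = 0, Carry = 0;  01 -> C_i = 1, Carry = 0;  11 -> C_i = -1, Carry = 1

Carry = 0?  No:   01, 11 -> C_i = 0, Carry = 1;  00 -> C_i = 1, Carry = 0;  10 -> C_i = -1, Carry = 1

i = i + 1

i < n ?  Yes: repeat;  No: End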
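The flowchart can be sketched in software as follows. This is one reading of the flow (the current bit plus the carry decides whether a digit is emitted, and the next bit decides its sign); the function names and the MSB-first string output are assumptions made for illustration.

```python
def to_csd(value, width):
    """CSD digits (MSB first) of a `width`-bit two's complement value."""
    bits = [(value >> i) & 1 for i in range(width)]  # LSB first
    bits.append(bits[-1])                            # sign extension: X_n = X_(n-1)
    carry = 0
    digits = []
    for i in range(width):
        total = bits[i] + carry
        if total == 1:
            if bits[i + 1] == 0:
                digit, carry = 1, 0    # isolated 1: keep it as +1
            else:
                digit, carry = -1, 1   # start of a run of 1's: emit -1, carry on
        else:
            digit, carry = 0, total // 2  # total 0 keeps carry 0, total 2 keeps carry 1
        digits.append(digit)
    return digits[::-1]

def csd_string(digits):
    """Render signed digits with '+', '0', and '-'."""
    return "".join({1: "+", 0: "0", -1: "-"}[d] for d in digits)

print(csd_string(to_csd(30583, 16)))  # +000-000-000-00-
print(csd_string(to_csd(-1, 16)))     # 000000000000000-
```

The carry makes each digit depend on the digits below it, so the conversion is iterative rather than parallel; that cost is acceptable when the constant is converted once at design time.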

Comparison

Form              # of non-zero digits  Decoding method                    Usage
Two's complement  Most                  N/A                                Normal digit representation
Booth radix       Intermediate          Parallel (inter-bit independent)   Variable multiplier
CSD               Fewest                Iterative                          Constant multiplier


Since the FIR filter coefficients are known in advance, we choose the CSD form to implement constant multipliers

Common Sub-expression Elimination(CSE)

A method that computes redundant evaluations repeated across different multipliers (common sub-expressions) as a single shared operation.


30583_10 = (+000-000-000-00-)_2

39799_10 = (+0+00-00-000-00-)_2

X = (+0000000-000-00-)_2

30583_10 = X + (0000-00000000000)_2 = X - 2048_10

39799_10 = X + (00+00-0000000000)_2 = X + 7168_10

A * 30583 = A * X - A * 2048

A * 39799 = A * X + A * 7168
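The sharing can be sketched with shifts and adds, computing the common term A * X once and reusing it for both products. The function name is illustrative, and the shift amounts follow the non-zero digit positions of the CSD strings above.

```python
def multiply_pair_with_cse(a):
    """Return (a * 30583, a * 39799) using one shared sub-expression a * X."""
    # X = 2^15 - 2^7 - 2^3 - 2^0, the digits common to both constants.
    shared = (a << 15) - (a << 7) - (a << 3) - a
    product_30583 = shared - (a << 11)              # remaining digit: -2^11
    product_39799 = shared + (a << 13) - (a << 10)  # remaining digits: +2^13, -2^10
    return product_30583, product_39799

print(multiply_pair_with_cse(3) == (3 * 30583, 3 * 39799))
```

Counting one adder or subtractor per combined term, the two multipliers built separately need 4 + 5 = 9 of them, while the shared version needs 3 + 1 + 2 = 6, at the cost of one extra level of logic after the shared term.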

Common Sub-expression Elimination(CSE)

Although CSE is a powerful hardware-reduction technique, it has a drawback.

Pros: reduces the total hardware needed

Cons: lengthens the critical path

We should constrain the maximum level of CSE to limit the critical path


Results and Conclusion

Dynamic Power Consumption


Supply FIR_CSD FIR_2's Complement

1 V 0.75673 mW

0.5 V 0.05437 mW


Propagation Delay

Results and Conclusion

The multiplier structure has a significant influence on total power, delay, and area

Choosing an appropriate logic family can also help tune the performance of the multiplier


Future Works

Take adder structure into consideration

Apply a level-constrained CSE algorithm

Implement a practical FIR filter with a specified function


References

Roy, S.; Nipun, M.M.K.; Wikner, J.J., "Ultra-low power FIR filter using STSC-CVL logic," IC Design & Technology (ICICDT), 2011 IEEE International Conference on, pp. 1-4, 2-4 May 2011

Alan V. Oppenheim, "Discrete-Time Signal Processing," 2nd ed.

M. Keating et al., "Low Power Methodology Manual, for System-on-Chip Design," Springer, 2008

J. Rabaey, "Low Power Design Essentials," Springer, 2009


FIR_CSD @ VDD=1V, pseudo random input


FIR_CSD @ VDD=0.5V, pseudo random input


FIR_2's Comp. @ VDD=1V, pseudo random input


FIR_2's Comp. @ VDD=0.5V, pseudo random input
