IEE5049-Spring 2012 Digital Integrated Circuit
Low Power Finite Impulse Response Filter in PTM 90nm
Wei-Lun Liu (0050209) / Tao-Yi Lee (0050184)
Institute of Electronics Engineering
National Chiao Tung University
Outline
Motivation
Review of Low Power Techniques
Design of Finite Impulse Response Filter
Performance Summary
Future Works
Conclusion
2012 IEE5049 Digital Integrated Circuit
Motivation-Battery Storage: A Limiting Factor
Basic technology has evolved little: batteries store energy through a chemical reaction, so progress is bounded by the periodic table
Battery capacity increases by 3-7% per year
Limiting Factors:
Energy density
Battery Size
Safe handling
Material | Energy density (MJ/kg)
U-235 | 79500000
Gasoline | 47.2
Lithium battery | 1.3
Lithium-ion battery | 0.72
Methanol | 19.7
Source of data: Wikipedia
Evolution of Commercialized Battery Tech.
Accelerated circa 1990, but slower than the growth of power consumption in ICs
Approximately linear growth
Source of data: Wikipedia
Evolution of Power Consumption in Intel Processors
Source of data: Dr. Shekhar Borkar, Intel (Re-plotted)
Exponential growth circa 1990
Previous Works
Categorized by Design Levels (Gajski-Kuhn Y-Chart)
System/Application: Algorithm
Micro-Architecture: Multiplexing in time or multiplexing in space
Logic: Logic family, standard cell/full custom
Circuit: Sizing, thresholds
Device: Semiconductor process (bulk CMOS, silicon on insulator, etc.)
Previous Works – Circuit Level: Multiple 𝑽𝑫𝑫
Multiple VDDs
Kuroda, T., "Optimization and control of VDD and VTH for low-power, high-speed CMOS design," ICCAD 2002
Hamada, M.; Ootaguro, Y.; Kuroda, T., "Utilizing surplus timing for power reduction," CICC 2001
Previous Works – Multiple 𝑽𝑻𝑯
Multiple threshold partitioning: independent body biasing; variable doping levels provided by the process
Effect of body biasing diminishes for L < 100 nm
$$V_{TH} = V_{TH0} + \gamma \left( \sqrt{\left| -2\phi_F + V_{SB} \right|} - \sqrt{\left| -2\phi_F \right|} \right)$$
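To make the body-bias dependence concrete, here is a minimal Python sketch that evaluates the threshold equation above for a few source-to-body voltages. The function name and the default values of V_TH0, gamma and phi_F are illustrative assumptions, not parameters extracted from the PTM 90nm model.

```python
import math

def threshold_voltage(v_sb, v_th0=0.4, gamma=0.4, phi_f=-0.3):
    """Body-effect model:
    V_TH = V_TH0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)).

    The defaults are assumed, illustrative numbers (volts, V^0.5, volts),
    not values taken from the PTM 90nm model card.
    """
    return v_th0 + gamma * (math.sqrt(abs(-2.0 * phi_f + v_sb))
                            - math.sqrt(abs(-2.0 * phi_f)))

# Sweep the source-to-body voltage and print the resulting threshold.
for v_sb in (0.0, 0.5, 1.0):
    print(f"V_SB = {v_sb:.1f} V -> V_TH = {threshold_voltage(v_sb):.3f} V")
```

With zero body bias the model returns V_TH0; a larger reverse bias raises the threshold, which is the knob multi-threshold schemes based on body biasing rely on.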
Previous Works – Power Gating + Clock Gating
Mahmoodi, H.; Tirumalashetty, V.; Cooke, M.; Roy, K., "Ultra Low-Power Clocking Scheme Using Energy Recovery and Clock Gating," VLSI Systems, IEEE Transactions on, Jan. 2009
Mueller, M.; Wortmann, A.; Simon, S.; Kugel, M.; Schoenauer, T., "The impact of clock gating schemes on the power dissipation of synthesizable register files," ISCAS, May 2004
Wimer, S.; Koren, I., "The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating," VLSI Systems, IEEE Transactions on, vol.PP, no.99, pp.1-9
Jianchao Lu; Taskin, B., "Reconfigurable clock polarity assignment for peak current reduction of clock-gated circuits," ISCAS, May 2011
Shaker, M.O.; Bayoumi, M.A., "A clock gated flip-flop for low power applications in 90 nm CMOS," ISCAS, pp.558-562, 15-18 May 2011
Hai Li; Bhunia, S.; Yiran Chen; Roy, K.; Vijaykumar, T.N., "DCG: deterministic clock-gating for low-power microprocessor design," VLSI Systems, IEEE Transactions on, March 2004
Jaewon Oh; Pedram, M., "Gated clock routing for low-power microprocessor design," ICCAD, IEEE Transactions on, vol.20, no.6, pp.715-722, Jun 2001
Sathyamurthy, H.; Sapatnekar, S.S.; Fishburn, J.P., "Speeding up pipelined circuits through a combination of gate sizing and clock skew optimization," ICCAD, IEEE Transactions on, Feb 1998
Sanghyeon Baeg, "Delay Fault Coverage Enhancement by Partial Clocking for Low-Power Designs With Heavily Gated Clocks," ICCAD, IEEE Transactions on, vol.26, no.12, pp.2215-2221, Dec. 2007
[Figure: clock gating of D flip-flops (D, Q, Clk); labeled signals: CLK, EN, CLKEN, D, Q]
// Register with enable: Q is updated on a rising CLK edge only when EN is high
always @(posedge CLK)
  if (EN)
    Q <= D;
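A rough behavioral sketch in Python of the register-with-enable above, together with a count of the clock edges that reach the flip-flop when the clock is gated by EN. The function names are hypothetical, and the model ignores glitches, latch-based gating cells and the clock tree itself.

```python
def run_register(data, enables):
    """Behavioral model of: always @(posedge CLK) if (EN) Q <= D;
    Returns the value of Q after each clock cycle (Q starts at 0)."""
    q, trace = 0, []
    for d, en in zip(data, enables):
        if en:
            q = d
        trace.append(q)
    return trace

def count_clock_events(enables, gated):
    """Number of clock edges seen at the flip-flop clock pin.

    gated=True  : the clock is gated by EN, so only enabled cycles arrive.
    gated=False : the flip-flop is clocked on every cycle.
    """
    return sum(1 for en in enables if en) if gated else len(enables)

enables = [1, 0, 1, 0]
print(run_register([1, 0, 0, 1], enables))
print(count_clock_events(enables, gated=False),
      count_clock_events(enables, gated=True))
```

The stored values are the same in both cases; only the number of clock events at the flip-flop changes, which is where the dynamic power saving comes from.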
Previous Works – Gate Level Power Optimization
V. Stojanovic et al., "Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization"
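[Figure: two consecutive logic stages i and i+1 with supplies V_DD,i and V_DD,i+1, input widths W_i and W_i+1, parasitic widths p_i W_i and p_i+1 W_i+1, and wire load W_wire]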
Previous Works – Gate Level Power Optimization
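$$t_d = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \left( \frac{W_{out}}{W_{in}} + \frac{W_{par}}{W_{in}} \right) = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \left( h + \frac{p}{g} \right) = \tau_{nom} \cdot g \cdot \left( h + \frac{p}{g} \right)$$
$$g = \frac{W_{in}}{W_{inv}}, \quad p = \frac{W_{par}}{W_{inv}}, \quad h = \frac{W_{out}}{W_{in}}$$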
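As a quick numeric check of the delay model, the sketch below evaluates t_d = tau_nom * g * (h + p/g) from the three width ratios defined above. The function name and the normalization w_inv = tau_nom = 1 are assumptions made for illustration.

```python
def gate_delay(w_in, w_out, w_par, w_inv=1.0, tau_nom=1.0):
    """Gate delay t_d = tau_nom * g * (h + p / g), with
    g = W_in / W_inv, p = W_par / W_inv, h = W_out / W_in.
    w_inv and tau_nom default to 1 (normalized units, an assumption)."""
    g = w_in / w_inv
    p = w_par / w_inv
    h = w_out / w_in
    return tau_nom * g * (h + p / g)

# Unit inverter (g = 1, p = 1) driving a load four times its input width.
print(gate_delay(w_in=1.0, w_out=4.0, w_par=1.0))
```

For the unit inverter with a fanout of four this gives g*h + p = 4 + 1 = 5 normalized delay units.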
Previous Works – Gate Level Power Optimization
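$$D = \sum_i t_{d,i} = \frac{K_d V_{dd}}{(V_{dd} - V_{on})^{\alpha_d}} \left( \dots + \frac{W_i}{W_{i-1}} + \frac{W_{par,i-1}}{W_{i-1}} + \frac{W_{i+1}}{W_i} + \frac{W_{par,i}}{W_i} + \frac{W_{i+2}}{W_{i+1}} + \frac{W_{par,i+1}}{W_{i+1}} + \dots \right)$$
Assuming $W_{par,i}$ is proportional to $W_i$, so that $W_{par,i} / W_i$ is approximately constant,
$$\Rightarrow \frac{\partial D}{\partial W_i} = \tau_{nom} g_i \frac{1}{W_i} \left( h_{i-1} - h_i \right)$$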
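The sensitivity above can be sanity-checked numerically. The sketch below uses a normalized path delay (the prefactor tau_nom * g_i is set to 1 and W_par,i = p * W_i, both assumptions) and compares the closed form (1/W_i)(h_{i-1} - h_i) against a central finite difference. All function names are hypothetical.

```python
def path_delay(widths, w_load, p=1.0):
    """Normalized path delay: sum over stages of (W_{i+1} + W_par,i) / W_i,
    with W_par,i = p * W_i and the last stage driving w_load."""
    loads = list(widths[1:]) + [w_load]
    return sum((w_next + p * w) / w for w, w_next in zip(widths, loads))

def analytic_sensitivity(widths, w_load, i):
    """(1 / W_i) * (h_{i-1} - h_i) for an interior stage i, h_i = W_{i+1} / W_i."""
    loads = list(widths[1:]) + [w_load]
    h_prev = widths[i] / widths[i - 1]
    h_i = loads[i] / widths[i]
    return (h_prev - h_i) / widths[i]

def numeric_sensitivity(widths, w_load, i, eps=1e-6):
    """Central finite difference of the path delay with respect to W_i."""
    up, down = list(widths), list(widths)
    up[i] += eps
    down[i] -= eps
    return (path_delay(up, w_load) - path_delay(down, w_load)) / (2 * eps)

widths = [1.0, 2.0, 8.0]   # illustrative stage widths
print(analytic_sensitivity(widths, 64.0, 1), numeric_sensitivity(widths, 64.0, 1))
```

Here stage 1 has h_0 = 2 and h_1 = 4, so the sensitivity is negative: widening that stage would reduce the path delay.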
Previous Works – Gate Level Power Optimization
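$$E_{SW} = \alpha_{0 \to 1} K_e \left( W_{out} + W_{par} \right) V_{dd}^2$$
The sensitivity of switching energy to the sizing of stage i within a logic block (the numerator of the energy-to-delay sensitivity) is
$$\frac{\partial E_{SW}}{\partial W_i} = \frac{\partial}{\partial W_i} \sum_j \alpha_{0 \to 1} K_e V_{dd}^2 \left( W_{j+1} + W_{par,j} \right) = \alpha_{0 \to 1} K_e V_{dd}^2 \left( 1 + p_i \right)$$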
Previous Works – Gate Level Power Optimization
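$$\frac{\partial E_{SW} / \partial W_i}{\partial D / \partial W_i} = \frac{\alpha_{0 \to 1} K_e V_{dd}^2 \left( 1 + p_i \right)}{\tau_{nom} g_i \frac{1}{W_i} \left( h_{i-1} - h_i \right)} = -\frac{ec_i}{\tau_{nom} \left( f_i - f_{i-1} \right)}$$
Energy stored on logic gate i:
$$ec_i = \alpha_{0 \to 1} K_e W_i \left( V_{dd,i-1}^2 + p_{nom,i} V_{dd,i}^2 \right)$$
When $f_i = f_{i-1}$, the least delay is achieved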
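The minimum-delay condition can be illustrated with a brute-force search, assuming f_i denotes the effective fanout g_i * h_i (the slide does not define f explicitly). The sketch sizes a hypothetical three-inverter chain (g = 1, so f_i = h_i) with a fixed input width of 1 and a load of 64, in normalized units.

```python
def chain_delay(w1, w2, w_in=1.0, w_load=64.0, p=1.0):
    """Normalized delay of a three-inverter chain: each stage contributes h + p."""
    return (w1 / w_in + p) + (w2 / w1 + p) + (w_load / w2 + p)

def best_sizing(step=0.5):
    """Exhaustive search over a grid of candidate widths for stages 1 and 2."""
    candidates = ((chain_delay(a * step, b * step), a * step, b * step)
                  for a in range(1, 41) for b in range(1, 129))
    _, w1, w2 = min(candidates)
    return w1, w2

w1, w2 = best_sizing()
# Fanouts of the three stages at the optimum found by the search.
print(w1, w2, [w1 / 1.0, w2 / w1, 64.0 / w2])
```

The search settles where every stage has the same fanout, consistent with f_i = f_{i-1}; moving away from that point trades delay for lower switching energy.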
[Figure: energy versus delay trade-off curve, with the minimum delay D_min marked on the delay axis]
Previous Works – Cost /Benefit Comparison
Tech. | Power Benefit | Timing Penalty | Area Penalty | Impact: Arch. | Impact: Design | Impact: Verification | Impact: P&R
Multi V_TH | Medium | Little | Little | Low | Low | None | Low
Clock Gating | Medium | Little | Little | Low | Low | None | Low
Multi V_dd | Large | Some | Little | High | Medium | Low | Medium
Characterization of PTM 90nm 𝑰𝑫-𝑽𝒈𝒔
Drain-Induced Barrier Lowering (DIBL) is observed
Characterization of PTM 90nm 𝑰𝑫-𝑽𝒅𝒔
𝛼 → 1 is observed
Characterization of PTM 90nm 𝒈𝒎-𝑽𝒈𝒔
Minimum Widths and Lengths
Architecture
Item | Specification
FIR impulse response length (Window) | 4 bit
Quantization Levels | 16 (4 bit)
[Figure: direct-form FIR filter with input x[n], three z^-1 delay elements, coefficients h[0] to h[3], and output y[n]]
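The structure in the figure can be sketched behaviorally in Python as a tapped delay line. The coefficient values below are hypothetical, chosen only to fit in 4 bits; the actual filter coefficients are not given on the slide.

```python
def fir_direct(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    delay_line = [0] * len(h)          # holds x[n], x[n-1], ..., one entry per tap
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift models the z^-1 chain
        y.append(sum(c * d for c, d in zip(h, delay_line)))
    return y

h = [1, 2, 3, 4]                        # hypothetical 4-bit coefficients
print(fir_direct([1, 0, 0, 0, 0], h))   # unit impulse in, impulse response out
```

Feeding a unit impulse returns the coefficients in order, followed by zeros once the impulse has left the delay line.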
Multirate Decomposition
Each signal path operates at the lowest clock rate
Multiplexing in space
$$H(z) = \sum_{i=0}^{n-1} h_i \cdot z^{-i}$$
is divided into
$$H_k(z^N) = \sum_{i=0}^{n/N - 1} h_{k + N \cdot i} \cdot z^{-N i}$$
Reference: P. P. Vaidyanathan, "Multirate Systems And Filter Banks"
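A small Python sketch of the decomposition: the coefficients are split into N polyphase components h_k[i] = h[k + N*i], and the original filter is recovered as H(z) = sum over k of z^-k * H_k(z^N). This only checks the algebra; it does not model the reduced clock rate of each branch, and the function names and test data are assumptions.

```python
def convolve(x, h):
    """Plain linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def polyphase_components(h, N):
    """h_k[i] = h[k + N*i] for k = 0 .. N-1."""
    return [h[k::N] for k in range(N)]

def expand(hk, N):
    """Coefficients of H_k(z^N): N-1 zeros inserted between consecutive taps."""
    out = []
    for c in hk:
        out.extend([c] + [0] * (N - 1))
    return out[:len(out) - (N - 1)]

def polyphase_filter(x, h, N):
    """y = sum_k z^-k H_k(z^N) x, built branch by branch."""
    y = [0] * (len(x) + len(h) - 1)
    for k, hk in enumerate(polyphase_components(h, N)):
        for n, v in enumerate(convolve(x, expand(hk, N))):
            y[n + k] += v              # z^-k: delay branch k by k samples
    return y

x, h = [1, 2, 3, 4, 5], [1, 2, 3, 4]
print(polyphase_components(h, 2))
print(polyphase_filter(x, h, 2) == convolve(x, h))
```

In hardware each branch H_k would see only every N-th sample, so it can run at 1/N of the input rate, which is the "multiplexing in space" trade of area for a lower clock rate.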
Multiplier Design
Multiplication is very costly in hardware.
More partial products result in a more complex adder tree
To reduce the hardware cost, we should reduce the number of partial products
Approach: use a different numerical representation
Numerical Representation
Two’s complement
Booth Radix
Canonical Signed Digit
Two’s Complement
Two’s complement is a very common number representation in digital arithmetic.
The number of 1’s becomes large when a wider word length is used
17_10 = 00010001_2 = 2^4 + 2^0
5_10 = 00000101_2 = 2^2 + 2^0
-1_10 = 11111111_2 = -2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0
-1_10 = 1111111111111111_2
30583_10 = 0111011101110111_2
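The growth in the number of 1's can be checked with a few lines of Python. This is a minimal sketch; the helper name is hypothetical.

```python
def twos_complement(value, bits):
    """Return the two's complement bit string of a signed integer."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given word length")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

# Print each value, its bit pattern, and the number of 1's (partial products).
for value, bits in ((5, 8), (-1, 8), (-1, 16), (30583, 16)):
    word = twos_complement(value, bits)
    print(value, word, word.count("1"))
```

Each 1 in the multiplier word corresponds to one partial product, so -1 in 16 bits would need 16 of them and 30583 would need 12.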
Booth Radix
Booth radix is an encoding that examines pairs of adjacent digits of a two’s complement number.
The coefficient of each bit can be found by table lookup
The number of non-zero digits is usually smaller than in two’s complement
-1_10 = (000000000000000-)_2 = -2^0
30583_10 = (+00-+00-+00-+00-)_2 = 2^15 - 2^12 + 2^11 - 2^8 + 2^7 - 2^4 + 2^3 - 2^0
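The two examples above can be reproduced with a short radix-2 Booth recoder in Python. It is a sketch under the assumption that the slide uses the classic radix-2 rule, where each digit is looked up from the bit pair (x_i, x_{i-1}) with x_{-1} = 0; the names are hypothetical.

```python
# Lookup table keyed by (x_i, x_{i-1}); the digit equals x_{i-1} - x_i.
BOOTH_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): -1, (1, 1): 0}

def booth_radix2(value, bits):
    """Radix-2 Booth recoding of a two's complement word, LSB first."""
    word = value & ((1 << bits) - 1)
    digits, prev = [], 0
    for i in range(bits):
        cur = (word >> i) & 1
        digits.append(BOOTH_TABLE[(cur, prev)])
        prev = cur
    return digits

def to_string(digits):
    """Render digits most significant first using +, 0 and -."""
    return "".join({1: "+", 0: "0", -1: "-"}[d] for d in reversed(digits))

print(to_string(booth_radix2(-1, 16)))
print(to_string(booth_radix2(30583, 16)))
```

Every digit depends only on one adjacent bit pair, so all digits can be produced in parallel, which suits multipliers whose operand is not known in advance.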
Canonical Signed Digit(CSD)
Canonical Signed Digit is a representation that encodes a number with the ternary digit set (+, 0, -) using the minimum number of non-zero digits.
Every number has a unique CSD representation.
-1_10 = (000000000000000-)_2 = -2^0
30583_10 = (+000-000-000-00-)_2 = 2^15 - 2^11 - 2^7 - 2^3 - 2^0
Algorithm
[Flowchart: CSD conversion. Start with i = 0, Carry = 0, X*_n = X_(n-1). At each step, test Carry = 0?, then branch on the bit pair (X_(i+1), X_i) to one of four outcomes: C_i = -1, Carry = 1; C_i = 0, Carry = 1; C_i = 1, Carry = 0; C_i = 0, Carry = 0. Then i = i + 1; repeat while i < n, otherwise End.]
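The flowchart can be approximated by the textbook carry-based CSD conversion sketched below in Python. The branch conditions follow the standard algorithm and are an assumption where the extracted flowchart is ambiguous; the function names are hypothetical.

```python
def to_csd(value, bits):
    """Convert a two's complement integer to CSD digits, LSB first."""
    word = value & ((1 << bits) - 1)
    x = [(word >> i) & 1 for i in range(bits)]
    x.append(x[-1])                      # sign extension: X_n = X_(n-1)
    digits, carry = [], 0
    for i in range(bits):
        total = x[i] + carry
        if total == 0:
            digit, carry = 0, 0
        elif total == 2:
            digit, carry = 0, 1
        elif x[i + 1] == 0:
            digit, carry = 1, 0          # isolated 1: keep it
        else:
            digit, carry = -1, 1         # start of a run of 1s: emit -1, carry on
        digits.append(digit)
    return digits

def to_string(digits):
    """Render digits most significant first using +, 0 and -."""
    return "".join({1: "+", 0: "0", -1: "-"}[d] for d in reversed(digits))

print(to_string(to_csd(-1, 16)))
print(to_string(to_csd(30583, 16)))
```

Because each digit depends on the carry from the previous position, the conversion is iterative, which is acceptable when the constant is known at design time.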
Comparison
Form | # of non-zero digits | Decoding method | Usage
Two’s Complement | Most | N/A | Normal digit representation
Booth Radix | Intermediate | Parallel (inter-bit irrelevant) | Variable multiplier
CSD | Fewest | Iterative | Constant multiplier
Since the coefficients of the FIR filter are known in advance, we choose the CSD form to implement the constant multipliers
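A constant multiplier in CSD form reduces to shifts plus a few additions and subtractions. The Python sketch below uses the CSD string for 30583 given earlier in the deck; the function names are hypothetical, and the adder count is the simple "non-zero digits minus one" estimate.

```python
CSD_30583 = "+000-000-000-00-"     # from the deck, most significant digit first
BIN_30583 = "0111011101110111"     # two's complement form of the same constant

def csd_multiply(a, csd):
    """Multiply integer a by a CSD constant using only shifts, adds and subtracts."""
    result = 0
    for shift, digit in enumerate(reversed(csd)):
        if digit == "+":
            result += a << shift
        elif digit == "-":
            result -= a << shift
    return result

def adder_count(pattern):
    """Adders/subtractors needed to sum the partial products: non-zero digits - 1."""
    return sum(ch != "0" for ch in pattern) - 1

print(csd_multiply(7, CSD_30583) == 7 * 30583)
print(adder_count(BIN_30583), adder_count(CSD_30583))
```

For this coefficient the estimate drops from 11 adders in two's complement to 4 in CSD.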
Common Sub-expression Elimination(CSE)
A method that computes repeated evaluations (common sub-expressions) shared among different multipliers as a single operation.
30583_10 = (+000-000-000-00-)_2
39799_10 = (+0+00-00-000-00-)_2
X = (+0000000-000-00-)_2
30583_10 = X + (0000-00000000000)_2 = X - 2048_10
39799_10 = X + (00+00-0000000000)_2 = X + 7168_10
A * 30583 = A * X - A * 2048
A * 39799 = A * X + A * 7168
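The sharing can be verified with a few lines of Python. The sketch computes A * X once and reuses it for both products; the helper names and the (sign, shift) term lists are illustrative, derived from the digit strings above.

```python
def shift_add(a, terms):
    """Sum of sign * (a << shift) over (sign, shift) pairs."""
    return sum(sign * (a << shift) for sign, shift in terms)

COMMON = [(+1, 15), (-1, 7), (-1, 3), (-1, 0)]   # X = 2^15 - 2^7 - 2^3 - 2^0 = 32631
REST_30583 = [(-1, 11)]                          # -2^11 = -2048
REST_39799 = [(+1, 13), (-1, 10)]                # 2^13 - 2^10 = 7168

def multiply_both(a):
    """Return (a * 30583, a * 39799) with the common sub-expression shared."""
    shared = shift_add(a, COMMON)                # computed once, used twice
    return shared + shift_add(a, REST_30583), shared + shift_add(a, REST_39799)

print(multiply_both(3) == (3 * 30583, 3 * 39799))
```

Counted as two-input adders, the separate CSD multipliers need 4 + 5 = 9, while the shared version needs 3 for X plus 1 and 2 for the remainders, 6 in total.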
Common Sub-expression Elimination(CSE)
Although CSE is a powerful hardware-reduction technique, it has its own disadvantage.
Pros: Reduces the total hardware needed
Cons: Lengthens the critical path
We should constrain the maximum level of CSE to limit the critical path
Results and Conclusion
Dynamic Power Consumption
Supply FIR_CSD FIR_2’s Complement
1V 0.75673mW
0.5V 0.05437mW
Propagation Delay
Results and Conclusion
The multiplier structure has a very significant influence on total power, delay, and area
Choosing an appropriate logic family can also help tune the performance of the multiplier
Future Works
Take the adder structure into consideration
Apply a level-constrained CSE algorithm
Implement a practical FIR with a specified function
References
Roy, S.; Nipun, M.M.K.; Wikner, J.J., "Ultra-low power FIR filter using STSC-CVL logic," IC Design & Technology (ICICDT), 2011 IEEE International Conference on, pp.1-4, 2-4 May 2011
Alan V. Oppenheim, "Discrete-Time Signal Processing," 2nd ed.
M. Keating et al., "Low Power Methodology Manual, for System-on-Chip Design," Springer, 2008
J. Rabaey, "Low Power Design Essentials," Springer, 2009
FIR_CSD @ VDD=1V, pseudo random input
FIR_CSD @ VDD=0.5V, pseudo random input
FIR_2’s Comp. @ VDD=1V, pseudo random input
FIR_2’s Comp. @ VDD=0.5V, pseudo random input