Power Saving at Architectural Level
Xiao Xing
March 7, 2005
Purpose of Power Saving in VLSI Circuits
• For portability: so that portable devices don’t require batteries as large as a briefcase.
• For cooling: so that one does not have to resort to expensive cooling equipment that might cost more than the circuit being cooled.
Types of Power Consumption
• Dynamic power (the main type of power consumption)
• Short-circuit power
• Static power [1]
– Leakage
– Sub-threshold
Power Saving Schemes at Different Levels
• Transistor level [decreasing transistor and interconnect capacitances]
• Gate level [input ordering, tree vs. chain]
• Logic level [MCML (low voltage swing), Domino (small device count)]
• Architectural level [parallelism, pipelining, etc.]
– Can save the most power for suitable applications [2]
Pipelining to Save Power
• P_dynamic = C * f * VDD^2 * alpha [3]
• Decreasing VDD has the largest impact on dynamic power, because dynamic power scales with the square of VDD.
• Decreasing VDD should also decrease leakage power.
• Sub-threshold and short-circuit power dissipation may go up or down; they might increase because of the slightly increased device count (pipeline registers).
• Decreasing VDD also slows the circuit down, but pipelining and parallelism can compensate for this loss of speed.
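The quadratic dependence on VDD in the formula above can be checked numerically. The sketch below is only an illustration: the capacitance, frequency, and activity values are hypothetical and do not come from the slides. Only the formula itself does.

```python
def dynamic_power(c_farads, f_hz, vdd_volts, alpha):
    """Dynamic power from P = C * f * VDD^2 * alpha."""
    return c_farads * f_hz * vdd_volts ** 2 * alpha

# Hypothetical values chosen only for illustration (not from the slides).
C = 1e-12     # switched capacitance, 1 pF
F = 100e6     # clock frequency, 100 MHz
ALPHA = 0.5   # switching activity factor

p_high = dynamic_power(C, F, 2.5, ALPHA)    # VDD = 2.5 V
p_low = dynamic_power(C, F, 1.25, ALPHA)    # VDD halved

# Halving VDD leaves about one quarter of the dynamic power.
print(p_low / p_high)
```

Halving C, f, or alpha instead would only halve the power, which is why VDD is the most effective of the four terms to reduce.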
Pipeline Operation Illustrated
Idea behind Pipelining for Power Saving
• Pipelining uses parallelism to boost throughput beyond that of the non-pipelined circuit.
• The throughput boost can be traded away by decreasing the VDD of the pipelined circuit, so that the pipelined circuit has roughly the same throughput as the non-pipelined circuit.
• But the decreased VDD also decreases dynamic power consumption.
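A rough way to see when this trade pays off is to compare the dynamic power of the two circuits as a ratio. The sketch below assumes that both run at the same clock frequency and switching activity, and that the pipeline registers only add switched capacitance. The function names and the 10% overhead figure are illustrative assumptions, not measurements.

```python
import math

def relative_dynamic_power(cap_ratio, vdd_ratio, freq_ratio=1.0):
    """Pipelined / non-pipelined dynamic power, with alpha assumed unchanged."""
    return cap_ratio * freq_ratio * vdd_ratio ** 2

def breakeven_vdd_ratio(cap_ratio):
    """VDD ratio at which the register overhead is exactly cancelled (equal f)."""
    return 1.0 / math.sqrt(cap_ratio)

# Hypothetical: pipeline registers add 10% switched capacitance.
overhead = 1.10

print(round(breakeven_vdd_ratio(overhead), 3))          # 0.953
print(round(relative_dynamic_power(overhead, 0.8), 3))  # 0.704
```

Under these assumptions, VDD only has to drop by about 5% before the pipelined circuit breaks even, and any further reduction is a net saving in dynamic power.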
Pipelined Data Path for a RISC Micro-Processor
[Figure: pipelined data path. Labels recovered from the diagram:]
• Inputs: 16- or 32-bit instructions; 7-bit op code; 3-bit addresses for read register 1, read register 2, and the destination register; 16-bit immediate value; enable signals from the control unit
• Register file outputs: 16-bit values in read register 1 and read register 2
• Pipeline register banks: Instruction Fetch / Register Access, Register Access / Execute, Execute / Write-Back
• Flip-flop counts annotated on the register banks: 16+16+7+3 = 42, 16+16+3+1 = 36, 16+7+3+3+3 = 32
• ALU output: least significant and most significant 16 bits of the result; a signal from the control unit indicates whether 2 writes are needed; 16- or 32-bit data written back
Actual Circuit Utilized to Analyze Pipelining as a Viable Power Saving Scheme
• A 32-bit shift register
– Not large scale, straightforward to implement
– 32 flip-flops, pipelined into 4 stages, requiring 3 extra flip-flops, with each extra flip-flop serving as the corresponding pipeline register
– Power ratio is 10+ : 1 (possibly one of the better cases, almost trivializing the power drawn by the pipeline registers), so the power saved by decreasing VDD should substantially outweigh the extra power of the extra flip-flops
– Power ratio comparable to that of a VLSI circuit with its necessary pipeline registers (the number of flip-flops required is generally proportional to the size of the VLSI circuit)
– Parallel version, parallel + pipelined version
– Layout of the flip-flop for power/area, simulation/estimation
– Interested in the relative % of power saved (should be applicable to a bigger picture)
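The "10+ : 1" ratio follows from the flip-flop counts if flip-flop count is taken as a proxy for power, that is, if every flip-flop is assumed to draw the same power. That assumption and the helper name below are illustrative. The counts 32 and 3 are from the slide.

```python
def register_overhead(core_ffs, pipeline_ffs):
    """Return (core : pipeline ratio, pipeline overhead in percent of the core)."""
    ratio = core_ffs / pipeline_ffs
    overhead_pct = 100.0 * pipeline_ffs / core_ffs
    return ratio, overhead_pct

# 32 shift-register flip-flops, 3 extra flip-flops as pipeline registers.
ratio, overhead_pct = register_overhead(32, 3)
print(f"{ratio:.2f} : 1, overhead {overhead_pct:.3f}%")  # 10.67 : 1, overhead 9.375%
```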
Architecture Analyzed
• Plain shift register
– 32 flip-flops
– VDD at max (2.5 or 3 V for CMOSP18)
– Input rate = 1 bit input (processed) every 32 clock cycles
– Clock period decreased to find the maximum operating frequency (by looking at waveform quality and voltage swing)
– Throughput = input rate * frequency = (1 bit / 32 cycles) * (f cycles/second) = f/32 bits/second
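The throughput expression can be written as a one-line function. The 320 MHz clock in this sketch is a hypothetical value chosen to give a round number. The maximum operating frequency itself is something the simulation is meant to find.

```python
def throughput_bits_per_second(bits, cycles, f_hz):
    """Throughput = input rate (bits per cycle) * clock frequency (cycles per second)."""
    return bits / cycles * f_hz

# Plain shift register: 1 bit every 32 cycles, hypothetical 320 MHz clock.
print(throughput_bits_per_second(1, 32, 320e6))  # 10000000.0
```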
Architecture Analyzed
• Pipelined shift register
– 35 flip-flops
– Input rate = 1 bit input every 8 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: 8 flip-flops | 1 | 8 flip-flops | 1 | 8 flip-flops | 1 | 8 flip-flops]
Architecture Analyzed
• Parallel shift register
– 64 flip-flops, 1 demux, 1 mux
– Input rate = 2 bits input every 32 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: demux feeding two parallel chains of 32 flip-flops each, recombined by a mux]
Architecture Analyzed
• Pipelined + parallel
– 70 flip-flops, 1 mux, 1 demux
– Input rate = 2 bits every 8 clock cycles
– VDD and f initially the same as for the plain version, then dropped to achieve the same throughput
[Figure: two parallel chains, each 8 flip-flops | 1 | 8 flip-flops | 1 | 8 flip-flops | 1 | 8 flip-flops]
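The four architectures can be compared side by side using the flip-flop counts and input rates given on these slides. The sketch below computes, for each version, the hardware relative to the plain register and the fraction of the plain clock frequency that yields the same throughput. It ignores the mux and demux, and the function and field names are assumptions made for illustration.

```python
# (name, flip_flops, bits_per_group, cycles_per_group), figures from the slides
ARCHITECTURES = [
    ("plain", 32, 1, 32),
    ("pipelined", 35, 1, 8),
    ("parallel", 64, 2, 32),
    ("pipelined+parallel", 70, 2, 8),
]

def matched_frequency_fraction(bits, cycles, ref_bits=1, ref_cycles=32):
    """Fraction of the plain clock frequency that gives the plain throughput."""
    return (ref_bits / ref_cycles) / (bits / cycles)

def summarize(architectures):
    """Map each architecture name to its flip-flop ratio and matched clock fraction."""
    rows = {}
    for name, ffs, bits, cycles in architectures:
        rows[name] = {
            "ff_ratio": ffs / 32,  # hardware relative to the plain version
            "f_fraction": matched_frequency_fraction(bits, cycles),
        }
    return rows

for name, row in summarize(ARCHITECTURES).items():
    print(f"{name:20s} flip-flops x{row['ff_ratio']:.3f}  clock x{row['f_fraction']:.3f}")
```

With these numbers the pipelined version matches the plain throughput at a quarter of the clock frequency, the parallel version at half, and the combined version at an eighth. That slack is what allows VDD to be lowered.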
Summary
• The effectiveness of architectural approaches (pipelining, full parallelism, etc.) as viable power-saving schemes for digital ICs will be simulated on a smaller scale.
• The resulting relative percentage of power saved should be applicable on a larger scale.
• Pipelining an average VLSI circuit may need more than 10% of the hardware/power for the pipeline registers (flip-flops).
Time Table
• Feb 1 to March 1: Literature survey
• March 8 to March 12: Layout
• March 14 to March 18: Simulating the serial and pipelined versions
• March 19 to March 23: Simulating the parallel and the combined versions
• March 24 to end of March: Preparing for the final presentation
• April 1 to April 15: Writing up the final report
References
[1] Jan M. Rabaey, "Digital Integrated Circuits", 2nd ed., Prentice Hall, 2003
[2] Jerry Frenkil, "A Multi-Level Approach to Low-Power IC Design", IEEE Spectrum, vol. 35, no. 2, 1998
[3] Anantha P. Chandrakasan, "Low-Power CMOS Digital Design", IEEE Journal of Solid-State Circuits, pp. 473-484, 1992
[4] K. K. Parhi, "Low-Power Digital VLSI Approaches", chapter in Circuits and Systems in the Information Age, edited by Y. Huang and C. Wei, pp. 3-22, IEEE Press, June 1997 (ISCAS-97 Tutorial Book)
Aside
• Portability: if a portable device is very power hungry, then given the limited advancement there has been (and will be) in battery capacity, it would need a very large battery to keep going and going.
• Cooling: Intel CPUs are getting hotter than they used to be, and an average household may be able to afford a CPU but not necessarily something as drastic as a vapor-cooled computer case.
Application suitability for pipelining for power saving:
1) The power consumption of the VLSI circuit being pipelined must be >> the power consumption of the pipeline registers.
2) Large and complex data dependencies make an application less suitable.
3) A huge discrepancy between the delays of the pipeline stages (e.g. 1 + 1 + 1000 clock cycles) makes an application less suitable.
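The third point can be illustrated with an idealized model in which a pipeline delivers one result per slowest-stage delay, while the non-pipelined circuit needs the sum of all stage delays per result. The model ignores pipeline register delay, and the balanced 334-cycle stages are a hypothetical comparison case. The 1 + 1 + 1000 split is the one from the slide.

```python
def pipelining_speedup(stage_delays):
    """Ideal steady-state throughput gain: total delay / slowest stage delay."""
    return sum(stage_delays) / max(stage_delays)

balanced = [334, 334, 334]   # hypothetical evenly split stages
unbalanced = [1, 1, 1000]    # stage delays from the slide, in clock cycles

print(pipelining_speedup(balanced))    # 3.0
print(pipelining_speedup(unbalanced))  # 1.002
```

With almost no throughput gained in the unbalanced case, there is almost no speed margin to trade for a lower VDD, while the pipeline registers still cost power.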