Power Saving at Architectural Level Xiao Xing March 7, 2005


Purpose of Power Saving In VLSI Circuits

• For Portability: so that portable devices don’t require batteries as large as a briefcase.

• For Cooling: so that one does not have to resort to expensive cooling equipment that might cost more than the circuit being cooled.

Types of Power Consumption

• Dynamic Power (main type of power consumption)

• Short-Circuit Power

• Static Power [1]

– Leakage

– Sub-threshold

Power Saving Schemes at Different Levels

• Transistor Level [decreasing transistor & interconnect capacitances]

• Gate Level [input ordering, tree vs. chain]

• Logic Level [MCML (low voltage swing), Domino (small device count)]

• Architectural Level [parallelism, pipelining, etc.]

– Can save the most power for suitable applications [2]

Pipelining to Save Power

• $P_{\text{Dynamic}} = C \cdot f \cdot V_{DD}^{2} \cdot \alpha$ [3]

• Decreasing VDD has the largest impact on decreasing dynamic power, since dynamic power scales with the square of VDD

• Decreasing VDD should also decrease leakage power

• Sub-threshold and short-circuit (up or down) power dissipation might increase, due to the slightly increased device count (pipeline registers)

• Decreasing VDD will also slow down the circuit, but with pipelining and parallelism this loss of speed can be compensated
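As a rough illustration of the dynamic power equation on this slide, the Python sketch below evaluates it at a nominal and at a halved supply voltage. The capacitance, frequency, and activity values are made-up placeholders rather than measurements from this project; the only point is the quadratic dependence on VDD.

```python
def dynamic_power(c_farads, f_hz, vdd_volts, alpha):
    """Dynamic power in watts: P = C * f * VDD^2 * alpha."""
    return c_farads * f_hz * vdd_volts ** 2 * alpha


# Hypothetical operating point (illustrative values only).
C = 1e-12      # switched capacitance: 1 pF
F = 100e6      # clock frequency: 100 MHz
ALPHA = 0.5    # switching activity factor

p_nominal = dynamic_power(C, F, 2.5, ALPHA)
p_reduced = dynamic_power(C, F, 1.25, ALPHA)

# Halving VDD alone, with C, f and alpha unchanged, quarters the dynamic power.
print(f"nominal: {p_nominal:.3e} W, reduced: {p_reduced:.3e} W")
print(f"ratio  : {p_reduced / p_nominal:.2f}")
```

Halving C, f, or Alpha would only halve the power, which is why VDD is the most attractive knob.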

Pipeline Operation Illustrated

Idea behind Pipelining for Power Saving

• Pipelining utilizes parallelism to boost the throughput of the non-pipelined circuit

• The throughput boost can be nullified by decreasing VDD of the pipelined circuit (the pipelined circuit now has roughly the same throughput as the non-pipelined circuit)

• But the decreased VDD decreases dynamic power consumption
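How far VDD can drop before the throughput boost is used up depends on how gate delay grows as VDD falls, which the slides do not specify. The sketch below assumes a common first-order approximation, delay proportional to VDD / (VDD - Vt)^2, and bisects for the supply voltage at which the gates are N times slower. The delay model, the threshold voltage of 0.5 V, and the function names are all assumptions made for illustration.

```python
def relative_delay(vdd, vt):
    """Gate delay up to a constant factor, assuming delay ~ VDD / (VDD - Vt)^2."""
    if vdd <= vt:
        raise ValueError("VDD must be above the threshold voltage")
    return vdd / (vdd - vt) ** 2


def vdd_for_slowdown(vdd_ref, vt, slowdown, tol=1e-9):
    """Bisect for the VDD at which gates are `slowdown` times slower than at vdd_ref."""
    if slowdown < 1:
        raise ValueError("slowdown must be >= 1")
    target = slowdown * relative_delay(vdd_ref, vt)
    lo, hi = vt + 1e-6, vdd_ref  # modeled delay falls monotonically as VDD rises
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if relative_delay(mid, vt) > target:
            lo = mid   # still too slow: a higher VDD is needed
        else:
            hi = mid   # fast enough: try a lower VDD
    return hi


VDD_REF = 2.5   # volts, assumed nominal supply
VT = 0.5        # volts, assumed threshold voltage
BOOST = 4       # throughput boost to give back (e.g. a 4-stage pipeline)

vdd_low = vdd_for_slowdown(VDD_REF, VT, BOOST)
vdd_squared_factor = (vdd_low / VDD_REF) ** 2

print(f"VDD for a {BOOST}x slowdown: {vdd_low:.3f} V")
print(f"VDD^2 term shrinks to     : {vdd_squared_factor:.3f} of nominal")
```

Under these assumed numbers the supply falls to roughly 1.19 V, so the VDD^2 term alone drops to under a quarter of its nominal value. A real process would need simulated or measured delay data in place of the model.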

[Figure: Pipelined Data Path for a RISC Micro-Processor. Blocks shown: Register File, ALU, and three sets of pipeline registers (Instruction Fetch / Register Access, Register Access / Execute, Execute / Write-Back). Signals labeled: 16- or 32-bit instructions; 7-bit op code; 3-bit addresses of read register 1, read register 2, and the destination register; 16-bit immediate value; 16-bit values in read registers 1 and 2; enable from the control unit; signal from the control unit indicating whether 2 writes are needed; least and most significant 16 bits of the ALU output result; 16- or 32-bit data written back. Pipeline register sizes shown: 16+16+7+3 = 42 flip-flops, 16+16+3+1 = 36 flip-flops, 16+7+3+3+3 = 32 flip-flops.]

Actual Circuit Utilized to Analyze Pipelining as a Viable Power Saving Scheme

• A 32-Bit Shift Register

– Not large scale, straightforward to implement

– 32 flip-flops, pipelined to 4 stages, requiring 3 extra flip-flops, with each extra flip-flop serving as the corresponding pipeline register

– Power ratio is 10+ : 1 (possibly one of the better cases, almost trivializing the power of the pipeline registers), so the power saved by decreasing VDD should substantially outweigh the extra power of the extra flip-flops

– Power ratio comparable to that of a VLSI circuit with its necessary pipeline registers (the number of flip-flops required is generally proportional to the size of the VLSI circuit)

– Parallel version, parallel + pipelined version

– Layout of the flip-flop for power/area, simulation/estimation

– Interested in the relative % of power saved (should be applicable to a bigger picture)
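The "10+ : 1" figure follows from the flip-flop counts. The sketch below works it out and estimates how little VDD has to drop before the three extra flip-flops are paid for. It assumes every flip-flop dissipates the same power and that the clock frequency is held fixed, neither of which the slides state.

```python
import math

CORE_FLIP_FLOPS = 32      # the shift register itself
PIPELINE_FLIP_FLOPS = 3   # one extra flip-flop between adjacent pairs of the 4 stages

# Assumption: every flip-flop dissipates the same power at a given VDD and clock.
power_ratio = CORE_FLIP_FLOPS / PIPELINE_FLIP_FLOPS
overhead = PIPELINE_FLIP_FLOPS / CORE_FLIP_FLOPS

# With dynamic power proportional to (flip-flop count) * VDD^2 at a fixed clock,
# the pipelined version breaks even when (35/32) * (VDD_new / VDD_old)^2 == 1.
break_even_vdd_scale = math.sqrt(
    CORE_FLIP_FLOPS / (CORE_FLIP_FLOPS + PIPELINE_FLIP_FLOPS)
)

print(f"power ratio         : {power_ratio:.2f} : 1")
print(f"register overhead   : {overhead:.1%}")
print(f"break-even VDD scale: {break_even_vdd_scale:.3f}")
```

The ratio comes out at about 10.7 : 1, and under these assumptions any VDD reduction beyond roughly 4.4% already yields a net dynamic power saving.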

Architecture Analyzed

• Plain Shift-Register

– 32 flip-flops

– VDD at max (2.5 or 3 V for CMOSP18)

– Input rate = 1 bit input (processed) every 32 clock cycles

– Clock period decreased to find the maximum operating frequency (by looking at waveform quality and voltage swing)

– Throughput = input rate × frequency = (1 bit / 32 cycles) × (f cycles/second) = x bits/second, with x = f/32


• Pipelined Shift-Reg

– 35 flip-flops

– Input rate = 1 bit input every 8 clock cycles

– VDD, f initially the same as for the plain version, then dropped to achieve the same throughput

[Diagram: 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops, 1 pipeline flip-flop, 8 flip-flops]


• Parallel Shift-Reg

– 64 flip-flops, 1 demux, 1 mux

– Input rate = 2 bits input every 32 clock cycles

– VDD, f initially the same as for the plain version, then dropped to achieve the same throughput

[Diagram: demux feeding two parallel 32-flip-flop shift registers, recombined by a mux]


• Pipelined + Parallel

– 70 flip-flops, 1 mux, 1 demux

– Input rate = 2 bits every 8 clock cycles

– VDD, f initially the same as for the plain version, then dropped to achieve the same throughput
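The four variants can be compared side by side using only the flip-flop counts and input rates listed on these slides. The sketch below computes each variant's throughput gain over the plain shift register at equal clock frequency, and the factor by which its clock could be scaled down to match the plain version's throughput. The data layout and names are illustrative choices.

```python
from fractions import Fraction

# (name, flip-flop count, bits accepted, clock cycles) as listed on the slides.
ARCHITECTURES = [
    ("plain", 32, 1, 32),
    ("pipelined", 35, 1, 8),
    ("parallel", 64, 2, 32),
    ("pipelined + parallel", 70, 2, 8),
]


def bits_per_cycle(bits, cycles):
    """Input rate in bits per clock cycle, kept exact with Fraction."""
    return Fraction(bits, cycles)


baseline = bits_per_cycle(1, 32)
summary = {}
for name, flip_flops, bits, cycles in ARCHITECTURES:
    speedup = bits_per_cycle(bits, cycles) / baseline   # throughput gain at equal f
    clock_scale = 1 / speedup                           # f multiplier for equal throughput
    summary[name] = (flip_flops, speedup, clock_scale)
    print(f"{name:22s} FFs={flip_flops:3d} speedup={speedup} clock x {clock_scale}")
```

The speedup column is the headroom each variant has for trading speed against a lower VDD; the flip-flop column is the hardware price paid for it.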




Summary

• The effectiveness of architectural approaches (pipelining, full parallelism, etc.) as viable power-saving schemes for digital ICs will be simulated on a smaller scale.

• The resulting relative percentage of power saved should be applicable on a grander scale.

• Pipelining an average VLSI circuit may need more than 10% of the hardware/power for the pipeline registers (flip-flops).

Time Table

• Feb 1 to March 1: Literature survey

• March 8 to March 12: Layout

• March 14 to March 18: Simulating the serial & pipelined versions

• March 19 to March 23: Simulating the parallel & the combo version

• March 24 to end of March: Preparing for the final presentation

• April 1 to April 15: Writing up the final report

References

[1] Jan M. Rabaey, “Digital Integrated Circuits”, 2nd Ed., Prentice Hall, 2003

[2] Jerry Frenkil, “A Multi-Level Approach to Low-Power IC Design”, IEEE Spectrum, Vol. 35, No. 2, 1998

[3] Anantha P. Chandrakasan, “Low Power CMOS Digital Design”, IEEE Journal of Solid-State Circuits, pp. 473-484, 1992

[4] K.K. Parhi, “Low-Power Digital VLSI Approaches”, chapter in Circuits and Systems in the Information Age, edited by Y. Huang and C. Wei, pp. 3-22, IEEE Press, June 1997 (ISCAS-97 Tutorial Book)

Aside

• Portability: If a portable device is very power hungry, then, given the limited advancement there has been (and will be) in battery capacity, it would need a very large battery to keep going and going.

• Cooling: Intel CPUs are getting hotter than they used to be. An average household may be able to afford a CPU, but not necessarily something as drastic as a vapor-cooled computer case.

Application Suitability for Pipelining-for-Power-Saving:

1) The power consumption of the VLSI circuit being pipelined must be much greater than (>>) the power consumption of the pipeline registers.

2) Large and complex data dependency

3) Huge discrepancy between the delays of the pipeline stages (1 + 1 + 1000 clock cycles)
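Point 3 can be made concrete with a small calculation. In a pipeline the clock period is set by the slowest stage, so a badly unbalanced split gains almost no throughput and leaves almost no room to lower VDD. The sketch below ignores pipeline register delay, and the balanced split is a hypothetical comparison case.

```python
def pipeline_speedup(stage_delays):
    """Ideal throughput gain from pipelining, ignoring register overhead.

    Unpipelined, one result takes sum(stage_delays); pipelined, the clock
    period is set by the slowest stage, so one result emerges every
    max(stage_delays).
    """
    return sum(stage_delays) / max(stage_delays)


balanced = pipeline_speedup([334, 334, 334])   # hypothetical even split of 1002 cycles
unbalanced = pipeline_speedup([1, 1, 1000])    # the 1 + 1 + 1000 case

print(f"balanced  : {balanced:.3f}x")
print(f"unbalanced: {unbalanced:.3f}x")
```

The even split triples the throughput, while the 1 + 1 + 1000 split improves it by only about 0.2%, so the two added sets of pipeline registers would cost power without buying any meaningful VDD reduction.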
