43
CpE 690: Digital System Design Fall 2013 Lecture 11 Low Power Design Bryan Ackland Department of Electrical and Computer Engineering Stevens Institute of Technology Hoboken, NJ 07030 Adapted from Digital Integrated Circuits: A Design Perspective, Rabaey et. al., 2003 and Lecture Notes, David Mahoney Harris CMOS VLSI Design

Lecture 11 - CpE 690 Introduction to VLSI Design

Embed Size (px)

DESCRIPTION

CpE 690 Introduction to VLSI DesignFall 2013Stevens Institute of TechnologyDesign for Low Power

Citation preview

Page 1: Lecture 11 - CpE 690 Introduction to VLSI Design

CpE 690: Digital System Design Fall 2013

Lecture 11 Low Power Design

1

Bryan Ackland Department of Electrical and Computer Engineering

Stevens Institute of Technology Hoboken, NJ 07030

Adapted from Digital Integrated Circuits: A Design Perspective, Rabaey et. al., 2003 and Lecture Notes, David Mahoney Harris CMOS VLSI Design

Page 2: Lecture 11 - CpE 690 Introduction to VLSI Design

2

• CMOS developed in 1970’s as a low power technology – (almost) no DC current when gate is not switching – no static power dissipation

• CMOS replaces NMOS in 1980’s as dominant digital technology – NMOS designs dissipated about 200µW/gate – Power dissipation no longer an issue!

• CMOS process technology evolves to provide: – more transistors per chip (Moore’s Law) – faster switching speed (few MHz hundreds of MHz)

• 1992 DEC announces Alpha 64-bit microprocessor – triumph of high speed CMOS digital design – first 200MHz processor, 1.7M transistors – 30W power dissipation – Power dissipation is once again an issue!

CMOS – a Low Power Technology

Page 3: Lecture 11 - CpE 690 Introduction to VLSI Design

3

• Need to remove heat from high performance chips – max. operating temperature silicon transistors: 150 – 200 °C

• Chip on PC board can dissipate 2-3 watts

• With suitable heatsink, maybe 10 watts

• With forced-air cooling (fans), up to 150W

• With sophisticated liquid cooling, maybe 1000W

Why Power Matters: Package & System Cooling

This image cannot currently be displayed.

Page 4: Lecture 11 - CpE 690 Introduction to VLSI Design

4

• Today, we see more hand-held battery operated devices

• Unlike CMOS technology, battery technology has seen only modest improvements over last few decades

• Expected battery lifetime increase over the next 5 years: 30 to 40%

Why Power Matters: Battery Size & Weight

Rabaey: Low Power Design Essentials 2009

Page 5: Lecture 11 - CpE 690 Introduction to VLSI Design

5

• Power Supply and Ground design – If VDD=1.0V, a 100W chip draws 100 amps! – Many package pins required – Virtex-6 1924-pin package:

• 220 power and 484 GND pins – On-chip wiring distribute this current – Electro-migration issues

• On-chip noise and system reliability

– Large currents switched through package and PCB inductance

• Environmental Concerns

– Computers and consumer electronics account for 15% of residential energy consumption

Why Power Matters: Power Distribution

Page 6: Lecture 11 - CpE 690 Introduction to VLSI Design

6

• Power is drawn from a voltage source attached to the VDD and GND pins of a chip.

• Instantaneous Power: (watts)

• Energy: (joules)

• Average Power:

Back to Basics: Power & Energy

( ) ( ) ( )P t I t V t=

0

( )T

E P t dt= ∫

avg0

1 ( )TEP P t dt

T T= = ∫

Page 7: Lecture 11 - CpE 690 Introduction to VLSI Design

7

• Power Supply:

• Resistor

• Capacitor

– but they do store energy:

Back to Basics: Power in Circuit Elements

( ) ( )VDD DD DDP t I t V=

( ) ( ) ( )2

2RR R

V tP t I t R

R= =

( ) ( ) ( )

( )

0 0

212

0

C

C

V

C

dVE I t V t dt C V t dtdt

C V t dV CV

∞ ∞

= =

= =

∫ ∫

Capacitors don’t dissipate power!

Vc C

V(t) t=0

R

Page 8: Lecture 11 - CpE 690 Introduction to VLSI Design

8

• Ptotal = Pdynamic + Pstatic

• Dynamic power: Pdynamic = Pswitching + Pshortcircuit

– Switching load capacitances

– Short-circuit current

• Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD

– Subthreshold leakage

– Gate leakage

– Junction leakage

– Contention current

Power Dissipation in CMOS

Page 9: Lecture 11 - CpE 690 Introduction to VLSI Design

9

• When the gate output rises from GND to VDD:

– Energy stored in capacitor is

– But energy drawn from the supply is

– Half the energy from VDD is dissipated in the pMOS transistor as heat, other half stored in capacitor

• When the gate output falls from VDD to GND – Stored energy in capacitor is dumped to GND – Dissipated as heat in the nMOS transistor

Dynamic Power: Charging a Capacitor

212C L DDE C V=

( )0 0

2

0

DD

VDD DD L DD

V

L DD L DD

dVE I t V dt C V dtdt

C V dV C V

∞ ∞

= =

= =

∫ ∫

∫ independent of size of transistors!

Page 10: Lecture 11 - CpE 690 Introduction to VLSI Design

10

• Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz

Switching Waveforms

Page 11: Lecture 11 - CpE 690 Introduction to VLSI Design

11

Switching Waveforms

[ ]

switching0

0

sw

2sw

1 ( )

( )

T

DD DD

TDD

DD

DDDD

DD

P i t V dtT

V i t dtT

V Tf CVT

CV f

=

=

=

=

∫C

fswiDD(t)

VDD

Note: Pswitching is independent of drive strength of the nMOS and pMOS transistors

Page 12: Lecture 11 - CpE 690 Introduction to VLSI Design

Activity Factor

• Suppose the system clock frequency = f • Most gates do not switch every clock cycle • Let fsw = αf, where α = activity factor

– α = P0→1 : probability that a signal switches from 0 to 1 in any clock cycle

– If the signal is the system clock, α = 1 – If the signal switches once per cycle, α = 0.5 – If the signal is random (clocked) data, α = 0.25 – Static CMOS logic has (empirically) α ≈ 0.1

• Dynamic power of a circuit:

12

2switching DDP CV fα=

Page 13: Lecture 11 - CpE 690 Introduction to VLSI Design

Dynamic Power Example

• 1 billion transistor chip – 50M logic transistors

• Average width: 12 λ • Activity factor = 0.1

– 950M memory transistors • Average width: 4 λ • Activity factor = 0.02

– 65 nm, 1.0V process (λ = 25nm) – C = 1 fF/µm (gate) + 0.8 fF/µm (diffusion)

• Estimate dynamic power consumption @ 1 GHz.

Neglect wire capacitance and short-circuit current.

13

Page 14: Lecture 11 - CpE 690 Introduction to VLSI Design

Solution

14

Page 15: Lecture 11 - CpE 690 Introduction to VLSI Design

Reducing Switching Power

• So try to minimize: – Activity factor – Capacitance – Supply voltage – Frequency

15

2switching DDP CV fα=

Page 16: Lecture 11 - CpE 690 Introduction to VLSI Design

Activity Factor Estimation

• Let Pi = probability (node i = 1) and Pi = (1 – Pi) = probability (node i = 0)

• αi = prob. that node i makes a transition from 0 to 1, so

• αi = Pi • Pi = (1 ─ Pi) • Pi

16 Pi

αi

Page 17: Lecture 11 - CpE 690 Introduction to VLSI Design

Activity Factor Estimation

• For random data, α = 0.5 • 0.5 = 0.25

• Data is often not completely random – e.g. upper bits of 64-bit words representing bank account

balances are usually 0

• Data propagating through ANDs and ORs has lower activity factor

17

Page 18: Lecture 11 - CpE 690 Introduction to VLSI Design

Example: Switching Probability of NOR2

• For NOR2, PY = PA • PB

• PY = (1 – PY ) = (1 – PA • PB)

• αY = PY • PY

= (PA • PB)•(1 – PA • PB) • If PA = PB = 0.5, PY = 0.25, αY = 3/16 ≈ 0.19

18

A B Y

0 0 1

0 1 0

1 0 0

1 1 0

A B Y

Page 19: Lecture 11 - CpE 690 Introduction to VLSI Design

Switching Probabilities (Static Gates)

• Remember αY = PY • PY

19

Page 20: Lecture 11 - CpE 690 Introduction to VLSI Design

Example: 4-input AND gate

20

• Assume all inputs have P=0.5

• Which has the lowest power?

A B C D

Y

Y

A B C D

A B C D Y

P=3/4 α=3/16

P=3/4 α=3/16

P=15/16 α=15/256

P=1/16 α=15/256

P=1/16 α=15/256

P=3/4 α=3/16

P=1/4 α=3/16

P=7/8 α=7/64

P=1/8 α=7/64

P=1/16 α=15/256

P=15/16 α=15/256

Page 21: Lecture 11 - CpE 690 Introduction to VLSI Design

Number of Stages vs. Power

21

• Power depends on activity and capacitance at each node

• Generally fewer stages usually mean less power

• Compare this to delay – frequently add stages to improve delay (stage effort ≈ 4)

• Tradeoff between speed and power

Page 22: Lecture 11 - CpE 690 Introduction to VLSI Design

Input Ordering

22

• What if inputs have different activity levels?

• Beneficial to postpone the introduction of signals with a high activity – i.e. signals with signal probability close to 0.5

A B

C

X

F

P=0.5 α=0.25

P=0.2 α=0.16

P=0.1 α=0.09

P=0.1 α=0.09

P=0.01 α=0.01

P=0.02 α=0.02 B

C A

X

F P=0.5

α=0.25

P=0.2 α=0.16

P=0.1 α=0.09

P=0.01 α=0.01

Page 23: Lecture 11 - CpE 690 Introduction to VLSI Design

Beware of Glitches!

23

• Extra transitions caused by finite propagation delay

Suppose input changes from ABCD = “1101” to “0111” ? Glitching occurs whenever a node makes more transitions than necessary to reach its final value Glitching can raise the activity factor of a gate to greater than 1!

A B C D Y

n3 n4 n5 n6 n7

Page 24: Lecture 11 - CpE 690 Introduction to VLSI Design

Activity of Dynamic Circuits

24

Exercise: What is activity factor of dynamic NOR2 if PA=PB=0.5 ?

Page 25: Lecture 11 - CpE 690 Introduction to VLSI Design

Clock Gating

25

• Another way to reduce the activity is to turn off the clock to registers in unused blocks

– Saves clock activity (α = 1) – Eliminates all switching activity in the block – Requires determining if block will be used

Page 26: Lecture 11 - CpE 690 Introduction to VLSI Design

Capacitance

26

• Extra capacitance slows response and increases power – Always try to reduce parasitic and wiring capacitance – Good floorplanning to keep high activity communicating gates

close to each other – Drive long wires with inverters or buffers rather than complex gates

• Gate sizing and number of stages

– Designing network for minimum delay will usually result in a high-power network.

– Small increase in delay (by reducing the # of stages or increasing the logical effort per stage) can give large reduction in power

– There are no closed form solutions to determine gate sizes that minimize power under a delay constraint.

– Can be solved numerically

Delay

Energy

Page 27: Lecture 11 - CpE 690 Introduction to VLSI Design

Voltage

27

• Power dissipated in gate is Pav = α.f.CL.VDD2

• Energy per switching event* is Es = Pav/(2.α.f) = (CL.VDD2)/2

– Power & Energy can be significantly reduced by decreasing VDD

• But delay of gate is D = (CL. ∆V)/I

≈ (CL.VDD)/[(β/2).(VDD-Vt)2]

– Decreasing VDD increases delay

• Circuit can be made (almost) arbitrarily low power at the expense of performance – not very useful

* switching event is defined as a transition from 0→1 or 1→0

Page 28: Lecture 11 - CpE 690 Introduction to VLSI Design

Energy-Delay Product

28

• Introduce metric energy-delay product (EDP)

• Minimum EDP at VDD = 3.Vt (for long channel process)

𝐸𝐸𝐸 = 𝐸𝑠.𝐸 =𝑘.𝐶𝐿2.𝑉𝐷𝐷3

𝑉𝐷𝐷 − 𝑉𝑡 2

normalized units

VDD

𝑉𝑡 = 0.4𝑉

Page 29: Lecture 11 - CpE 690 Introduction to VLSI Design

Frequency

29

• Suppose we can do a task in T sec. on one processor

• Can we do it in T/2 sec. on two processors – if application has sufficient intrinsic parallelism

• How about doing it in T sec. on two processors running at half clock frequency?

• This gives no net power savings.

• But 𝑠𝑠𝑠𝑠𝑠 ∝ (𝑉𝐷𝐷 − 𝑉𝑇)2 𝑉𝐷𝐷⁄ , so if we reduce clock frequency, we can also reduce 𝑉𝐷𝐷:

Proc. at V volts, f Hz

= P watts

Proc. at V volts, f/2 Hz

= P/2 watts

Proc. at V volts, f/2 Hz

= P/2 watts ≡ +

Page 30: Lecture 11 - CpE 690 Introduction to VLSI Design

Reduced Frequency & Voltage

30

• Parallelism with reduced 𝑓 and 𝑉𝐷𝐷 leads to lower power – diminishing returns as 𝑉𝐷𝐷 approaches 𝑉𝑇

VDD (volts)

Rel. Speed

VT = 0.5

In this example, reducing speed by factor of 50% allows voltage reduction of ~35%

Proc. at V volts, f Hz

= P watts

Proc. at 0.65V volts, f/2 Hz ≈ 0.2 P watts

≡ + Proc. at 0.65V volts, f/2 Hz ≈ 0.2 P watts

Page 31: Lecture 11 - CpE 690 Introduction to VLSI Design

Dynamic Power Dissipation Example

31

• A NAND2 gate of size (input capacitance) 12 is driving a load of 94C units of capacitance. Assume the inputs A, B are independent and uniformly distributed. What is the power dissipation of this gate if the gate capacitance C of a unit sized transistor is 0.1fF, VDD is 1.0V and the operating frequency is 1GHz?

94 12

A B

Y

Page 32: Lecture 11 - CpE 690 Introduction to VLSI Design

Short-Circuit Power

32

• Finite slope of the input signal – sets up a direct current path between VDD and GND for a short

period during switching when both the NMOS and PMOS devices are conducting.

• Depends on duration and slope of the input signal, tsc

• ISC which is determined by – saturation current of the P and N transistors

• depends on sizes, process technology, temperature, etc. – ratio between input and output slopes (a function of CL)

ISC

Esc ≈ tsc.VDD.ISC

Page 33: Lecture 11 - CpE 690 Introduction to VLSI Design

Slope Engineering

33

ISC

Small Capacitive Load

ISC≈0

Large Capacitive Load

• Output fall time significantly shorter than input rise time

• Output “tracks” input as per DC transfer function

• Large ISC when VIN ≈ VSW

• Output fall time significantly longer than input rise time

• Output transition lags input • When VIN = VSW, Vdsp is still

very small, so small ISC

Page 34: Lecture 11 - CpE 690 Introduction to VLSI Design

Impact of CL on ISC

• When CL is small, ISC is large! – Short circuit dissipation is minimized by matching the rise/fall

times of the input and output signals - slope engineering.

• Typically less than 10% of dynamic power if rise/fall times are comparable for input and output

34

CL = 20 fF

CL = 100 fF

CL = 500 fF

time ( 10-10 sec)

500 psec input slope

Page 35: Lecture 11 - CpE 690 Introduction to VLSI Design

Static Power Dissipation

• Static power is consumed even when chip is quiescent – i.e. powered up but not running

• Ratio’ed circuits (e.g. pseudo-NMOS) burn power in fight between ON transistors

– known as contention current

• Leakage consumes power from current passing through normally off devices – sub-threshold current

– gate leakage current

– diode junction leakage current

35

Page 36: Lecture 11 - CpE 690 Introduction to VLSI Design

Leakage Sources

• Leakage currents are very small (per transistor basis) – prior to 130 nm, not usually an issue (except in sleep mode of

battery operated devices) – but when multiplied by hundreds of millions of nanometer devices,

can account for as much as 1/3 of active power

• All increase exponentially with temperature

36

gate leakage

sub-threshold leakage

junction leakage

Page 37: Lecture 11 - CpE 690 Introduction to VLSI Design

Sub-threshold Leakage

• Shockley model assumes Ids = 0 when Vgs ≤ Vt

• But in real transistors, 𝐼𝑠𝑠 ≈ 100𝑛𝑛 × (𝑊/𝐿) when Vgs = Vt

• For Vgs < Vt, Ids decreases exponentially with Vgs

• In nanometer processes, as we reduce VDD, we also

reduce Vt to maintain good on-current

• But off-current increases with smaller Vt

Typical values in 65 nm, Vds=1.0V: Ioff = 100 nA/µm @ Vt = 0.3 V Ioff = 10 nA/µm @ Vt = 0.4 V Ioff = 1 nA/µm @ Vt = 0.5 V

37

𝐼𝑑𝑠 = 𝐼010𝑉𝑔𝑔−𝑉𝑡

𝑆 where S is sub-threshold slope ≈ 100mV/decade

𝐼𝑜𝑜𝑜 = 𝐼010−𝑉𝑡𝑆

Page 38: Lecture 11 - CpE 690 Introduction to VLSI Design

Stack Effect

• Series OFF transistors have less leakage – for N1 to have any leakage, Vx > 0 – so N2 has negative Vgs – leakage through 2-stack reduces ~10x – leakage through 3-stack reduces further

• Leakage and delay trade off

– Aim for low leakage in sleep and low delay in active mode

• To reduce leakage: – Increase Vt: multiple Vt

• Use low Vt only in speed critical circuits – Increase Vs: stack effect

• Input vector control in sleep 38

0

0

Vx

1

N2

N1

Page 39: Lecture 11 - CpE 690 Introduction to VLSI Design

Gate & Junction Leakage

• Gate leakage extremely strong function of tox and Vgs

– Negligible for older processes

– Approaches sub-threshold leakage at 65 nm

• An order of magnitude less for pMOS than nMOS

• Control gate leakage in the process using tox > 10 Å – High-k gate dielectrics help

– Some processes provide multiple tox

• e.g. thicker oxide for 3.3 V I/O transistors

• Junction leakage usually negligible – becoming little more significant in nanometer processes

• Control gate & junction leakage in circuits by limiting VDD 39

Page 40: Lecture 11 - CpE 690 Introduction to VLSI Design

Power Gating

• Turn OFF power to blocks when they are idle to save leakage

– Use virtual VDD (VDDV)

– Gate outputs to prevent invalid logic levels to next block

• Voltage drop across sleep transistor degrades performance during normal operation

– Size the transistor wide enough to minimize impact • Switching wide sleep transistor costs dynamic power

– Only justified when circuit sleeps long enough

40

Page 41: Lecture 11 - CpE 690 Introduction to VLSI Design

Static Power Example

• Revisit power estimation for 1 billion transistor chip • Estimate static power consumption

– 65nm process, VDD=1.0V, λ = 25nm – 50M logic transistors (average width 12 λ) – 950M memory transistors (average width 4 λ) – Subthreshold leakage

• Normal Vt: 100 nA/µm • High Vt: 10 nA/µm • High Vt used in all memories and in 95% of logic gates

– Gate leakage 5 nA/µm – Junction leakage negligible

41

Page 42: Lecture 11 - CpE 690 Introduction to VLSI Design

Static Power Solution

42

Page 43: Lecture 11 - CpE 690 Introduction to VLSI Design

Voltage & Frequency Control

• Run each block at the lowest possible voltage and frequency that meets performance requirements

• Multiple Voltage Domains – Provide separate supplies to different blocks – Level converters required when crossing from low to high VDD

domains • Dynamic Voltage Scaling

– Adjust VDD and f according to workload

43