
Chapter 3

Power and Energy Basics

Jan M. Rabaey

Power and Energy Basics

Slide 3.1

The goal of this chapter is to derive clear and unambiguous definitions and models for all of the design metrics relevant in the low-power design domain. Anyone with some training and experience in digital design is probably already familiar with a majority of them. If you are one of them, you should consider this chapter as a review. However, we recommend that everyone at least browse through the material, as some new definitions, perspectives, and methodologies are offered. In addition, if one truly wants to tackle the energy problem, it is essential to have an in-depth understanding of the causes of energy dissipation in today's advanced digital circuits.

Chapter Outline

Metrics

Dynamic power

Static power

Energy–delay trade-offs

Slide 3.2

Before discussing the various sources of power dissipation in modern digital integrated circuits, it is worth spending some time evaluating the metrics typically used to evaluate the quality of a circuit or design. Unambiguous definitions are essential if one wants to provide fair comparisons. The rest of this chapter divides the sources of power roughly along the lines of dynamic and static power. At the end of the chapter, we make the point that optimization for power or energy alone rarely makes sense. Design for low power is most often a trade-off process, performed primarily in the energy-delay space. Realizing this goes a long way in setting up the foundations for an effective power-minimization design methodology.

J. Rabaey, Low Power Design Essentials, Series on Integrated Circuits and Systems, DOI 10.1007/978-0-387-71713-5_3, © Springer Science+Business Media, LLC 2009


Metrics

Delay (s)
– Performance metric

Energy (Joule)
– Efficiency metric: effort to perform a task

Power (Watt)
– Energy consumed per unit time

Power × Delay (Joule)
– Mostly a technology parameter: measures the efficiency of performing an operation in a given technology

Energy × Delay = Power × Delay^2 (Joule·s)
– Combined performance and energy metric: figure of merit of design style

Other metrics: Energy × Delay^n (Joule·s^n)
– Increased weight on performance over energy

Slide 3.3

The basic design metrics (propagation delay, energy, and power) are well-known to anyone with digital design experience. Yet, they may not be sufficient. In today's design environment, where delay and energy stand on an almost equal footing, optimizing for only one parameter rarely makes sense. For instance, the design with the minimum propagation delay in general takes an exorbitant amount of energy, and, vice versa, the design with the minimum energy is unacceptably slow. Both represent extremes in a rich optimization space, where many other optimal operational points exist. Hence some other metrics of potential interest have been defined, such as the energy–delay product, which puts an equal weight on both parameters. In fact, the normalized energy–delay products for a number of optimized general-purpose designs fall consistently within a narrow range. While being an interesting metric, the energy–delay product of an actual design only tells us how close the design is to a perfect balance between performance and energy efficiency. In real designs, achieving that balance may not necessarily be of interest. Typically, one metric is assigned greater weight: for instance, energy is minimized for a given maximum delay, or delay is minimized for a given maximum energy. For these off-balance situations, other metrics can be defined, such as energy–delay^n. Though interesting, these derived metrics are rarely used, as they lead to optimization for only one target in the overall design space.

It is worth recapping the definition of propagation delay at this point: it is measured as the time difference between the 50% transition points of the input and output waveforms. For modules with multiple inputs and outputs, we typically define the propagation delay as the worst-case delay over all possible scenarios.
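To make the relationship between these metrics concrete, the sketch below computes them for two designs. The power and delay numbers are hypothetical and chosen only for illustration; the function name `design_metrics` is not from the text.

```python
def design_metrics(power_w: float, delay_s: float, n: int = 2) -> dict:
    """Derived metrics for one operation of a design (illustrative helper)."""
    energy = power_w * delay_s  # power-delay product, in Joule
    return {
        "energy_J": energy,
        "energy_delay_Js": energy * delay_s,    # equal weight on both
        "energy_delay_n": energy * delay_s**n,  # extra weight on delay
    }

# Hypothetical designs implementing the same function
fast = design_metrics(power_w=2.0e-3, delay_s=1.0e-9)
slow = design_metrics(power_w=0.4e-3, delay_s=3.0e-9)

for name, m in (("fast", fast), ("slow", slow)):
    print(name, m)
```

With these assumed numbers the slow design wins on energy alone, while the fast design wins on energy–delay product, which illustrates why the choice of metric matters.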

Where Is Power Dissipated in CMOS?

Active (dynamic) power
– (Dis)charging capacitors
– Short-circuit power: both pull-up and pull-down on during transition

Static (leakage) power
– Transistors are imperfect switches

Static currents
– Biasing currents

Slide 3.4

Power dissipation sources can be divided into two major classes: dynamic and static. The difference between the two is that the former is proportional to the activity in the network and the switching frequency, whereas the latter is independent of both. Until recently, dynamic power vastly outweighed static power. With the emergence of leakage as a major power component though, both should now be treated on an equal footing. Biasing currents for "analog" components such as sense amplifiers or level converters strictly fall under the static power-consumption class, but originate from a design choice rather than a device deficiency.

Active (or Dynamic) Power

Sources:
– Charging and discharging capacitors
– Temporary glitches (dynamic hazards)
– Short-circuit currents

Key property of active power:

$P_{dyn} \propto f$, where $f$ is the switching frequency

Slide 3.5

As mentioned, dynamic power is proportional to the switching frequency. The charging and discharging of capacitances is and should be the main source of dynamic power dissipation, as these operations are at the core of what constitutes MOS digital circuit design. The other contributions (short-circuit currents and dynamic hazards or glitches) are parasitic effects and should be made as small as possible.

Charging Capacitors

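[Figure: capacitor C charged through a resistance R by a voltage step of amplitude V]

Applying a voltage step:

$E_{0 \to 1} = C V^2$

$E_R = \frac{1}{2} C V^2$

$E_C = \frac{1}{2} C V^2$

Value of R does not impact energy!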

Slide 3.6

The following equation is probably the most important one you will encounter in this book: to charge a capacitance $C$ by applying a voltage step $V$, an amount of energy equal to $C V^2$ is taken from the supply. Half of that energy is stored on the capacitor; the other half is dissipated as heat in the resistance of the charging network. During discharge the stored energy in turn is dissipated as heat as well. Observe that the resistance of the network does not enter the equation.
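The independence from R can be checked numerically. The following is a minimal sketch, assuming an ideal RC network whose charging current after a voltage step is $i(t) = (V/R)\,e^{-t/RC}$; the component values and the function name are hypothetical.

```python
import math

def step_charge_energies(R, C, V, steps=20_000, n_tau=20.0):
    """Integrate supply and resistor energy for a voltage step (midpoint rule)."""
    tau = R * C
    dt = n_tau * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (V / R) * math.exp(-t / tau)  # charging current at time t
        e_supply += V * i * dt            # energy drawn from the supply
        e_resistor += i * i * R * dt      # energy dissipated as heat in R
    return e_supply, e_resistor, e_supply - e_resistor  # last: stored on C

C, V = 1e-15, 1.0  # 1 fF charged to 1 V (assumed values)
for R in (100.0, 10e3):
    e_s, e_r, e_c = step_charge_energies(R, C, V)
    print(f"R={R:8.0f} ohm  E_supply={e_s:.3e}  E_R={e_r:.3e}  E_C={e_c:.3e}")
```

For both resistance values the supply energy comes out at approximately $CV^2$, split evenly between the resistor and the capacitor; only the time scale of the transient changes with R.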


Applied to Complementary CMOS Gate

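One half of the energy from the supply is consumed in the pull-up network and one half is stored on $C_L$
Charge from $C_L$ is dumped during the 1→0 transition
Independent of resistance of charging/discharging network

[Figure: complementary CMOS gate with inputs $A_1 \ldots A_N$ driving a PMOS network and an NMOS network between $V_{DD}$ and ground; the load current $i_L$ charges $C_L$ at node $V_{out}$]

$E_{0 \to 1} = C_L V_{DD}^2$

$E_R = \frac{1}{2} C_L V_{DD}^2$

$E_C = \frac{1}{2} C_L V_{DD}^2$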

Slide 3.7

This model applies directly to a digital CMOS gate, where the PMOS and NMOS transistors form the resistive charge and discharge networks. For the sake of simplicity, the total capacitance of the network is lumped into the output capacitance of the gate.

Slide 3.8

More generically, we can compute the energy it takes to charge a capacitance from a voltage $V_1$ to a voltage $V_2$. Using similar math, we derive that this requires from the supply an amount of energy equal to $C V_2 (V_2 - V_1)$. This equation will come in handy for a number of special circuits. One example is the NMOS pass-transistor chain. It is well-known that the maximum voltage at the end of such a chain is one threshold voltage below the supply [Rabaey03]. Using the equation derived above, we find that the energy dissipation in this case equals $C V_{DD} (V_{DD} - V_{TH})$, and is proportional to the swing at the output. In general, reducing the swing in a digital network results in a linear reduction in energy consumption.
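As a small worked example of the reduced-swing expression, the sketch below evaluates the supply energy for a full-swing node and for the output of an NMOS pass-transistor chain. The capacitance and voltage values are assumptions for illustration, and the helper name is hypothetical.

```python
def supply_energy(c, v_supply, v_start, v_end):
    """Energy drawn from a supply at v_supply while a node rises from v_start to v_end."""
    return c * v_supply * (v_end - v_start)

C = 10e-15             # 10 fF (assumed)
V_DD, V_TH = 1.0, 0.3  # assumed supply and threshold voltages

full_swing = supply_energy(C, V_DD, 0.0, V_DD)            # C * VDD^2
reduced_swing = supply_energy(C, V_DD, 0.0, V_DD - V_TH)  # C * VDD * (VDD - VTH)
print(full_swing, reduced_swing, reduced_swing / full_swing)
```

The ratio of the two energies equals the ratio of the swings, which is the linear dependence the text describes.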

Circuits with Reduced Swing

$E_{0 \to 1} = \int_0^T V \, C \frac{dV_C}{dt} \, dt = C V \int_0^{V - V_{TH}} dV_C = C V (V - V_{TH})$

Energy consumed is proportional to output swing


Charging Capacitors – Revisited

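Driving from a constant current source

[Figure: capacitor C charged through a resistance R by a current source I]

$E_{0 \to 1} = E_C + E_R$

$E_C = \frac{1}{2} C V^2$

$E_R = \int_0^T I^2 R \, dt = R I^2 T = \frac{RC}{T} C V^2$, with $T = \frac{CV}{I}$

Energy dissipated in resistor can be reduced by increasing charging time T (i.e., decreasing I)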

Slide 3.9

So far, we have assumed that charging a capacitor always requires an amount of energy equal to $C V^2$. This is true only when the driving waveform is a voltage step. It is actually possible to reduce the required energy by choosing other waveforms. Assume, for instance, that a current source with a fixed current $I$ is used instead. Under those circumstances, the energy consumed in the resistor is reduced to $(RC/T) \, C V^2$, where $T$ is the charging time, and the output voltage rises linearly with time. Observe that the resistance of the network plays a role under these circumstances. From this, it appears that the dissipation in the resistor can be reduced to very small values, if not zero, by charging the capacitor very slowly (i.e., by reducing $I$).

Slide 3.10

In fact, the current-driven scenario results in an actual energy reduction over the voltage-driven approach for $T > 2RC$. As a reference, the time it takes for the output of the voltage-driven circuit to move between the 0% and 90% points equals $2.3RC$. Hence, the current-driven circuit is more energy-efficient than the voltage-driven one as long as it is slower.

For this scheme to work, the same approach should be used to discharge the capacitor, and the charge flowing through the source should be recovered. If not, the energy gained is just wasted in the source.
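The break-even point at $T = 2RC$ follows directly from comparing the two resistor-energy expressions. The sketch below evaluates both for a few charging times; the component values are hypothetical.

```python
def resistor_energy_voltage_step(C, V):
    """Energy dissipated in R when charging with a voltage step."""
    return 0.5 * C * V**2

def resistor_energy_current_drive(R, C, V, T):
    """Energy dissipated in R when charging with a constant current over time T."""
    return (R * C / T) * C * V**2

R, C, V = 1e3, 1e-12, 1.0  # 1 kohm, 1 pF, 1 V (assumed)
tau = R * C
e_step = resistor_energy_voltage_step(C, V)
for multiple in (1, 2, 20):
    e_cur = resistor_energy_current_drive(R, C, V, multiple * tau)
    print(f"T = {multiple:2d} RC: E_R(current) / E_R(step) = {e_cur / e_step:.2f}")
```

Charging faster than $2RC$ with a current source dissipates more than the voltage step, charging at exactly $2RC$ breaks even, and slower charging reduces the dissipation in proportion to $1/T$.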

The idea of "energy-efficient" charging gained a lot of attention in the 1990s. However, the inferior performance and the complexity of the circuitry ensured that the ideas remained confined to the academic world. With the prospect of further voltage scaling bottoming out, these concepts may gain some traction anew (some more about this in Chapter 13).

Charging Capacitors

Using constant voltage or current driver?

Energy dissipated using constant-current charging can be made arbitrarily small at the expense of delay: adiabatic charging

$E_{constant\_current} < E_{constant\_voltage}$ if $T > 2RC$

Note: $t_p(RC) = 0.69\,RC$; $t_{0 \to 90\%}(RC) = 2.3\,RC$


Charging Capacitors

Driving using a sine wave (e.g., from resonant circuit)

[Figure: capacitor C charged through a resistance R by a sinusoidal source v(t)]

$E_C = \frac{1}{2} C V^2$

Energy dissipated in resistor can be made arbitrarily small if frequency $\omega \ll 1/RC$ (output signal in phase with input sinusoid)

Slide 3.11

Charging a capacitor using a current source is only one option. Other voltage and current waveforms can be imagined as well. For instance, assume that the input voltage waveform is a sinusoid rather than a step. A first-order analysis shows that this circuit outperforms the voltage-step approach for sinusoid frequencies $\omega$ below $1/RC$. The easiest way to come to this conclusion is to evaluate the circuit in the frequency domain. The RC network is a low-pass filter with a single pole at $\omega_p = 1/RC$. It is well-known that for frequencies much smaller than the pole, the output sinusoid has the same amplitude and the same phase as those of the input waveform. In other words, no or negligible current is flowing through the resistor, and hence little power is dissipated. The attractive feature of sinusoidal waveforms is that these are easily generated by resonant networks (such as LC oscillators). Again, with some notable exceptions such as power regulators, sinusoidal charging has found little industrial following.
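The frequency-domain argument can be illustrated with the standard first-order RC transfer function: the voltage across the resistor is the input multiplied by $j\omega RC / (1 + j\omega RC)$. The sketch below, with a hypothetical function name, evaluates the magnitude of that fraction for normalized frequencies $\omega RC$.

```python
import math

def resistor_voltage_fraction(omega_rc: float) -> float:
    """|V_R / V_in| for a series RC network driven by a sinusoid."""
    return omega_rc / math.sqrt(1.0 + omega_rc**2)

for omega_rc in (0.01, 0.1, 1.0):
    frac = resistor_voltage_fraction(omega_rc)
    print(f"omega*RC = {omega_rc:5.2f}: |V_R/V_in| = {frac:.4f}")
```

Since the power dissipated in R scales with the square of this fraction, driving well below the pole frequency leaves very little voltage (and current) in the resistor.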

Dynamic Power Consumption

Power = Energy per transition × Transition rate

$= C_L V_{DD}^2 f_{0 \to 1}$

$= C_L V_{DD}^2 \, p_{0 \to 1} \, f$

$= C_{switched} V_{DD}^2 f$

Power dissipation is data dependent: it depends on the switching probability $p_{0 \to 1}$

Switched capacitance $C_{switched} = p_{0 \to 1} C_L = \alpha C_L$ ($\alpha$ is called the switching activity factor)

Slide 3.12

This brings us back to the generic case of the CMOS inverter. To translate the derived energy per operation into power, it must be multiplied by the rate of power-consuming transitions $f_{0 \to 1}$. The unit of the resulting metric is Watt (= Joules/sec). This translation leads right away to one of the hardest problems in power analysis and optimization: it requires knowledge of the "activity" of the circuit. Consider a circuit with a clock frequency $f$. A node then makes 0-to-1 transitions at an average rate of $\alpha f$, where the activity factor $\alpha$ ($0 \le \alpha \le 1$) is the probability that the node makes a 0-to-1 transition at a given clock tick. As we discuss in the following slides, $\alpha$ is a function of the circuit topology and the activity of the input signals. The accuracy of power estimation depends largely upon how well the activity is known, which is most often not very well.


The derived expression can be expanded for a complete module by summing over all nodes. The average power is then expressed as $(\alpha C) V^2 f$. Here $\alpha C$ is called the effective capacitance of the module, and equals the average amount of capacitance that is being charged in the module every clock cycle.
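The summation over nodes can be sketched as follows. The node list, supply voltage, and clock frequency are hypothetical values chosen for illustration.

```python
def dynamic_power(nodes, v_dd, f_clk):
    """Average dynamic power of a module.

    nodes: iterable of (alpha, capacitance_in_farad) pairs, one per node.
    Returns (power_in_watt, effective_capacitance_in_farad).
    """
    c_eff = sum(alpha * c for alpha, c in nodes)  # effective capacitance
    return c_eff * v_dd**2 * f_clk, c_eff

# Two assumed nodes: activity 1/4 on 10 fF and activity 3/16 on 20 fF
nodes = [(0.25, 10e-15), (3 / 16, 20e-15)]
power, c_eff = dynamic_power(nodes, v_dd=1.0, f_clk=1e9)
print(f"C_eff = {c_eff:.3e} F, P = {power:.3e} W")
```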

Impact of Logic Function

Example: Static two-input NOR gate

A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$

Then transition probability
$p_{0 \to 1} = p_{out=0} \times p_{out=1} = 3/4 \times 1/4 = 3/16$

If inputs switch every cycle: $\alpha_{NOR} = 3/16$

NAND gate yields similar result

Slide 3.13

Let us, for instance, derive the activity of a two-input NOR gate (which defines the topology of the circuit). Assume that each input has an equal probability of being a 1 or a 0, and that the probability of a transition at a clock tick is 50–50 as well, ensuring an even distribution between states. With the aid of the truth table we derive that the probability of a 0→1 transition (or the activity) equals 3/16. More generally, the activity at the output node can be expressed as a function of the 1-probabilities of the inputs A and B: $\alpha_{NOR} = (1 - p_A)(1 - p_B)\,\bigl(1 - (1 - p_A)(1 - p_B)\bigr)$.
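The truth-table derivation generalizes to any gate with independent inputs. The sketch below enumerates the truth table, weights each row by its probability, and returns $p_0 p_1$; the function names are illustrative, not from the text.

```python
from itertools import product

def one_probability(gate, input_probs):
    """Probability that gate outputs 1, given independent input 1-probabilities."""
    p1 = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1.0 - p  # probability of this truth-table row
        if gate(*bits):
            p1 += weight
    return p1

def static_activity(gate, input_probs):
    """0->1 transition probability of a static gate: p(out=0) * p(out=1)."""
    p1 = one_probability(gate, input_probs)
    return (1.0 - p1) * p1

def nor2(a, b):
    return int(not (a or b))

def xor2(a, b):
    return a ^ b

print(static_activity(nor2, [0.5, 0.5]))  # NOR with uniform inputs
print(static_activity(xor2, [0.5, 0.5]))  # XOR with uniform inputs
```

Changing the entries of `input_probs` shows how skewed input statistics move the activity away from the uniform-input values.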

Impact of Logic Function

Example: Static two-input XOR gate

A  B  Out
0  0  0
0  1  1
1  0  1
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$

Then transition probability
$p_{0 \to 1} = p_{out=0} \times p_{out=1} = 1/2 \times 1/2 = 1/4$

If inputs switch every cycle: $p_{0 \to 1} = 1/4$

Slide 3.14

A similar analysis can be performed for an XOR gate. The observed activity is a bit higher (1/4).


Transition Probabilities for Basic Gates

As a function of the input probabilities:

Gate  $p_{0 \to 1}$
AND   $(1 - p_A p_B)\, p_A p_B$
OR    $(1 - p_A)(1 - p_B)\,(1 - (1 - p_A)(1 - p_B))$
XOR   $(1 - (p_A + p_B - 2 p_A p_B))\,(p_A + p_B - 2 p_A p_B)$

Activity for static CMOS gates: $\alpha = p_0 p_1$

Slide 3.15

These results can be generalized for all basic gates.

Activity as a Function of Topology

$\alpha_{NOR,NAND} = (2^N - 1)/2^{2N}$

$\alpha_{XOR} = 1/4$

[Figure: XOR versus NAND/NOR, output-transition probability P as a function of fan-in]

Slide 3.16

The topology of the logic network has a major impact on the activity. This is nicely illustrated by comparing the activity of NAND (NOR) and XOR gates as a function of fan-in. The output-transition probability of a NAND gate goes asymptotically to zero. The probability of the output being a 0 is indeed becoming smaller with increasing fan-in. An example of such a network is a memory-address decoder. On the other hand, the activity of an XOR network is independent of fan-in. This does not bode well for the power dissipation of modules such as large en(de)cryption and coding functions, which primarily consist of XORs.
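The fan-in trend can be tabulated directly from the expressions on the slide, assuming uniform and independent inputs. The function names below are illustrative.

```python
def nand_nor_activity(n: int) -> float:
    """Static activity of an N-input NAND or NOR gate: (2^N - 1) / 2^(2N)."""
    return (2**n - 1) / 2**(2 * n)

def xor_activity(n: int) -> float:
    """Static activity of an N-input XOR gate: independent of fan-in."""
    return 0.25

for n in (2, 3, 4, 8):
    print(f"N = {n}: NAND/NOR = {nand_nor_activity(n):.5f}, XOR = {xor_activity(n):.2f}")
```

The NAND/NOR column shrinks roughly by half with every added input, while the XOR column stays at 1/4.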

Slide 3.17

One obvious question is how the choice of logic family impacts activity and power dissipation. Some interesting global trends can be observed. Consider, for instance, the case of dynamic logic. The only power-consuming transitions in pre-charged logic occur when the output evaluates to 0, after which it has to be recharged to a high in the next pre-charge cycle. Hence, the activity factor $\alpha$ is equal to the probability of the output being equal to 0. This means that the activity is always higher in dynamic logic (compared to static), independent of the function. This does not mean per se that the power dissipation of dynamic logic is higher, as the effective capacitance is the product of activity and capacitance, the latter being smaller in dynamic logic. In general though, the higher activity outweighs the capacitance gain.
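A quick comparison makes the static-versus-dynamic gap visible. The sketch assumes uniform, independent inputs, so that an N-input NAND outputs 0 only when all inputs are 1 and an N-input NOR outputs 0 unless all inputs are 0; the helper names are hypothetical.

```python
def output_zero_probability(gate: str, n: int) -> float:
    """Probability that an n-input gate outputs 0 for uniform, independent inputs."""
    if gate == "NAND":
        return 1 / 2**n           # output is 0 only when all inputs are 1
    if gate == "NOR":
        return (2**n - 1) / 2**n  # output is 0 unless all inputs are 0
    raise ValueError(f"unsupported gate: {gate}")

def static_activity(p0: float) -> float:
    return p0 * (1 - p0)  # a 0 followed by a 1

def dynamic_activity(p0: float) -> float:
    return p0             # every evaluation to 0 forces a recharge

for gate in ("NAND", "NOR"):
    for n in (2, 4):
        p0 = output_zero_probability(gate, n)
        print(f"{gate}{n}: static = {static_activity(p0):.4f}, dynamic = {dynamic_activity(p0):.4f}")
```

Because $p_0 \ge p_0 p_1$ for any $p_1 \le 1$, the dynamic activity is never lower than the static one, whatever the logic function.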

Slide 3.18

Another interesting logic family is differential logic, which may seem attractive for very low-voltage designs due to its increased signal-to-noise ratio. Differential implementations unfortunately come with an inherent disadvantage from a power perspective: not only is the overall capacitance higher, the activity is higher as well (for both static and dynamic implementations). The only positive argument is that differential implementation reduces the number of gates needed for a given function, and thus reduces the length of the critical path.

Slide 3.19

As activity is such an important parameter in the analysis of power dissipation, it is worthwhile spending some time on how to evaluate the activity of more complex logic networks. One may wonder whether it is possible to develop a "static power analyzer" along the lines of the "static timing analyzer". The latter evaluates the propagation delay of a logic network by analyzing only the topology of the network without any simulation (hence the name "static"). The odds for successful static power analysis seem favorable at first glance. Consider, for instance, the network shown on the slide, and assume that the 1- and 0-probabilities of the primary input signals are known. Using the basic gate expressions presented earlier, the output signal probabilities can be computed for the first layer of gates starting from the primary inputs. This process is then repeated until the primary outputs are reached. This seems fairly straightforward indeed. However, there is a catch. For the basic gate equations to be valid, the inputs must be statistically independent. In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs. While this assumption is in general true for the network of the slide (assuming obviously that all the primary input signals are independent), it unfortunately rarely holds in actual circuits.

How About Dynamic Logic?

[Figure: pre-charged dynamic gate connected to $V_{DD}$, with Pre-charge and Eval phases]

Energy dissipated when effective output is zero!
$p_{0 \to 1} = p_0$
Always larger than $p_0 p_1$!
E.g., $p_{0 \to 1}(NAND) = 1/2^N$; $p_{0 \to 1}(NOR) = (2^N - 1)/2^N$
Activity in dynamic circuits hence always higher than in static.
But ... capacitance most often smaller.

Differential Logic?

[Figure: differential gate connected to $V_{DD}$ with complementary outputs Out and $\overline{Out}$]

Static: Activity is doubled
Dynamic: Transition probability is 1!
Hence power always increases.
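The layer-by-layer propagation described above can be sketched with three small helpers. The network used here ($X = A \cdot B$, $Z = X + C$) and its input probabilities are hypothetical and are not the network of the slide; the helpers are valid only under the independence assumption.

```python
def p_not(p: float) -> float:
    return 1.0 - p

def p_and(*ps: float) -> float:
    """1-probability of an AND of independent signals."""
    result = 1.0
    for p in ps:
        result *= p
    return result

def p_or(*ps: float) -> float:
    """1-probability of an OR of independent signals."""
    return 1.0 - p_and(*(1.0 - p for p in ps))

# Hypothetical network: X = A AND B, Z = X OR C
p_a, p_b, p_c = 0.5, 0.5, 0.1
p_x = p_and(p_a, p_b)          # first layer
p_z = p_or(p_x, p_c)           # second layer
alpha_z = (1.0 - p_z) * p_z    # static activity at the output
print(p_x, p_z, alpha_z)
```

This works here because X and C share no common primary input; the helpers give wrong answers as soon as two gate inputs depend on the same signal.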

Reconvergent Fan-out (Spatial Correlation)

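Inputs to gates can be interdependent (correlated)

[Figure: two circuits in which input A is inverted to give X; without reconvergence, Z is the NAND of X and an independent input B; with reconvergence, Z is the NAND of X and A itself]

No reconvergence:
$p_Z = 1 - (1 - p_A)\, p_B$

Reconvergent:
$p_Z = 1 - (1 - p_A)\, p_A$ ? NO!
$p_Z = 1$

Must use conditional probabilities:
$p_Z = 1 - p_A \cdot p(X|A) = 1$
$p_Z$: probability that Z = 1
$p(X|A)$: probability that X = 1 given that A = 1

Becomes complex and intractable real fast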

Slide 3.20

Even if the primary inputs to a logic network are independent, the signals may become correlated or "colored" while they propagate through the logic network. This is best illustrated with a simple example, which showcases the impact of a network property called reconvergent fan-out. In the rightmost circuit, the inputs to the NAND gate Z are not independent, but are both functions of the same input signal A. To compute the output probabilities of Z, the expression derived earlier for a NAND gate is no longer applicable, and conditional probabilities need to be used. Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is expressed as $p(A|B)$, and is read as "the probability of A, given B". More specifically, one can derive that $p(A|B) = p(A \cap B)/p(B)$, assuming that $p(B) \neq 0$.

While propagating these conditional probabilities through the network is theoretically possible, you may guess that the complexity of doing so for complex networks rapidly becomes unmanageable, and that indeed is the case.
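The failure of the independence assumption under reconvergent fan-out can be reproduced in a few lines. The sketch compares the naive NAND formula with an exact enumeration over the single primary input A, for the circuit $X = \overline{A}$, $Z = \overline{X \cdot A}$; the function names are illustrative.

```python
def naive_pz(p_a: float) -> float:
    """NAND formula applied as if X and A were independent (incorrect here)."""
    p_x = 1.0 - p_a
    return 1.0 - p_x * p_a

def exact_pz(p_a: float) -> float:
    """Enumerate the primary input A, so the correlation between X and A is kept."""
    total = 0.0
    for a, weight in ((0, 1.0 - p_a), (1, p_a)):
        x = 1 - a          # inverter
        z = 1 - (x & a)    # NAND of X and A
        total += weight * z
    return total

for p_a in (0.1, 0.5, 0.9):
    print(f"p_A = {p_a}: naive p_Z = {naive_pz(p_a):.2f}, exact p_Z = {exact_pz(p_a):.2f}")
```

The exact result is 1 for every value of $p_A$, since X and A can never both be 1, whereas the naive formula reports a value below 1. Exact enumeration is feasible here only because there is a single primary input; its cost doubles with each additional one.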

Evaluating Power Dissipation of Complex Logic

Simple idea: start from inputs and propagate signal probabilities to outputs

But:
– Reconvergent fan-out
– Feedback and temporal/spatial correlations

[Figure: example logic network annotated with the 1-probabilities $p_1$ of its signals, propagated from the primary inputs to the outputs]


Temporal Correlations

Activity estimation is the hardest part of power analysis
Typically done through simulation with actual input vectors (see later slides)

Feedback: X is a function of itself → correlated in time
[Figure: register R and logic block whose output X is fed back to its input]

Temporal correlation in input streams:
01010101010101...
00000001111111...
Both streams have the same 1-probability $p_1$ but different switching statistics

Slide 3.21

The story gets complicated even further by the occurrence of temporal correlations. A signal shows temporal correlation if a data value in the signal stream is dependent upon previous values in the stream. Temporal correlations are the norm in sequential networks, as any signal in the network is typically a function of its previous values owing to the existence of feedback networks. In addition, primary input signals may show temporal dependence as well. For example, in a digitized speech signal any sample value is dependent upon the previous values.

All these arguments help to illustrate that static activity analysis is a very hard problem indeed, and actually all but impossible. Hence, power analysis tools either rely on simulations of actual signal traces to derive the signal probabilities or make simplifying assumptions; for instance, it is assumed that the input signals are independent and purely random. This is discussed in more detail in Chapter 12. In the following chapters, we will most often assume that the activity of a module in its typical operation mode can be characterized by an independent parameter $\alpha$.
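The two input streams from the slide show why the 1-probability alone does not determine the activity. The sketch below counts 0→1 transitions in each stream; the 14-sample length simply matches the bit patterns printed on the slide.

```python
def one_probability(bits):
    """Fraction of samples that are 1."""
    return sum(bits) / len(bits)

def rising_transitions(bits):
    """Number of 0 -> 1 transitions between consecutive samples."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if (prev, cur) == (0, 1))

alternating = [0, 1] * 7       # 01010101010101
blocky = [0] * 7 + [1] * 7     # 00000001111111

for name, stream in (("alternating", alternating), ("blocky", blocky)):
    print(name, one_probability(stream), rising_transitions(stream))
```

Both streams have a 1-probability of 0.5, yet the alternating stream makes seven power-consuming transitions and the blocky stream only one.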

Glitching in Static CMOS

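Analysis so far did not include timing effects

[Figure: gate network with inputs A, B, C, internal node X, and output Z; waveforms for the input patterns ABC = 101 and 000 show a glitch on Z, one gate delay wide]

The result is correct, but extra power is dissipated

Also known as dynamic hazards: "A single input change causing multiple changes in the output"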

Slide 3.22

So far, we have assumed that the dynamic power dissipation solely results from the charging (and discharging) of capacitances in between clock events. Some additional sources of dynamic power dissipation (i.e., proportional to the clock frequency) should be considered. Though (dis)charging of capacitors is essential to the operation of a CMOS digital circuit, dynamic hazards and short-circuit currents are not. They should be considered as parasitic, and be kept to an absolute minimum.

A dynamic hazard occurs when a single input change causes multiple transitions at the output of a gate. These events, also known as "glitches", are obviously wasteful, as a capacitor is charged and/or discharged without having an impact on the final result. In the analysis of the transition probabilities of complex logic circuits, presented in the earlier slides, glitches did not appear, as the propagation delays of the individual gates were ignored: all events were assumed to be instantaneous. To detect the occurrence of dynamic hazards a detailed timing analysis is necessary.

Example: Chain of NAND Gates

[Figure: chain of NAND gates with outputs Out1, Out2, Out3, Out4, Out5, ...; simulated output voltages (V) versus time (ps) for nodes Out1 through Out8]

Slide 3.23

A typical example of the effect of glitching is illustrated in this slide, which shows the simulated response of a chain of NAND gates with all inputs going simultaneously from 0 to 1. Initially, all the outputs are 1, as one of the inputs was 0. For this particular transition, all the odd bits must transition to 0, while the even bits remain at the value of 1. However, owing to the finite propagation delay, the even output bits at the higher bit positions start to discharge, and the voltage drops. When the correct input ripples through the network, the output ultimately goes high. The glitch on the even bits causes extra power dissipation beyond what is required to strictly implement the logic function. Although the glitches in this example are only partial (i.e., not from rail to rail), they contribute significantly to the power dissipation. Long chains of gates often occur in important structures such as adders and multipliers, and the glitching component can easily dominate the overall power consumption.

What Causes Glitches?

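[Figure: two implementations of the same function of inputs A, B, C, D with internal nodes X, Y and output Z, a chain (left) and a balanced tree (right), each with waveforms for A,B; C,D; X; Y; and Z]

Uneven arrival times of input signals of gates due to unbalanced delay paths
Solution: balancing delay paths!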

Slide 3.24

The occurrence of glitching in a circuit is mainly due to a mismatch in the path lengths in the network. If all input signals of a gate change simultaneously, no glitching occurs. On the other hand, if input signals change at different times, a dynamic hazard may develop. Such a mismatch in signal timing is typically the result of different path lengths with respect to the primary inputs of the network. This is illustrated in this slide, where two equivalent, but topologically different, realizations of the function $F = A \cdot B \cdot C \cdot D$ are analyzed. Assume that the AND gate has a unit delay. The leftmost network suffers from glitching as a result of the disparity between the arrival times of the input signals for gates Y and Z. For example, for gate Z, input D settles at time 0, whereas input Y only settles at time 2. Redesigning the network so that all arrival times are identical can dramatically reduce the number of superfluous transitions, as shown in the rightmost network.
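The chain-versus-tree comparison can be reproduced with a small unit-delay simulation. This is a minimal sketch: the gate-level netlists follow the description above (chain: $X = A \cdot B$, $Y = X \cdot C$, $Z = Y \cdot D$; tree: $X = A \cdot B$, $Y = C \cdot D$, $Z = X \cdot Y$), while the input transition used is an assumption picked to expose a glitch.

```python
def step(gates, state):
    """Advance one unit delay: every AND gate samples its inputs from the previous state."""
    nxt = dict(state)
    for out, (a, b) in gates.items():
        nxt[out] = state[a] & state[b]
    return nxt

def count_transitions(gates, old_inputs, new_inputs):
    """Count output transitions per gate after the inputs switch from old to new."""
    state = {**old_inputs, **{g: 0 for g in gates}}
    for _ in range(len(gates)):        # settle under the old inputs
        state = step(gates, state)
    state.update(new_inputs)           # all primary inputs change at time 0
    counts = {g: 0 for g in gates}
    for _ in range(len(gates)):        # enough steps for the deepest path
        nxt = step(gates, state)
        for g in gates:
            counts[g] += int(nxt[g] != state[g])
        state = nxt
    return counts

chain = {"X": ("A", "B"), "Y": ("X", "C"), "Z": ("Y", "D")}
tree = {"X": ("A", "B"), "Y": ("C", "D"), "Z": ("X", "Y")}

old = {"A": 1, "B": 1, "C": 1, "D": 0}
new = {"A": 0, "B": 1, "C": 1, "D": 1}
print("chain:", count_transitions(chain, old, new))
print("tree: ", count_transitions(tree, old, new))
```

For this transition the final value of Z is 0 in both networks. In the chain, Z briefly rises because D arrives before Y has settled, giving two superfluous transitions; in the tree, X and Y settle at the same time and Z never moves.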

Short-Circuit Currents

(also called crowbar currents)

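[Figure: CMOS inverter with input $V_{in}$, output $V_{out}$, load $C_L$, and short-circuit current $I_{sc}$ drawn from $V_{DD}$; waveforms of $V_{in}$ (crossing $V_{TH}$ and $V_{DD} - V_{TH}$) and of $I_{sc}$ (peak value $I_{peak}$) versus time t]

PMOS and NMOS simultaneously ON during transition

$P_{sc} \sim f$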

Slide 3.25

So far, it was assumed that the NMOS and PMOS transistors of a CMOS gate are never ON simultaneously. This assumption is not entirely correct, as the finite slope of the input signal during switching causes a direct current path between $V_{DD}$ and GND for a short period of time. The extra power dissipation due to these "short-circuit" or "crowbar" currents is proportional to the switching activity, similar to the capacitive power dissipation.

Short-Circuit Currents

Equalizing rise/fall times of input and output signals limits $P_{sc}$ to 10–15% of the dynamic dissipation

[Figure: inverter with a large load ($I_{sc} \approx 0$) and with a small load ($I_{sc} = I_{MAX}$); simulated $I_{sc}$ (A, × 10^-4) versus time (s) for C = 20 fF, C = 100 fF, and C = 500 fF]

[Ref: H. Veendrick, JSSC'84]

Slide 3.26

The peak value of the short-circuit current is also a strong function of the ratio between the slopes of the input and output signals. This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0→1 transition at the input. Assume first that the load capacitance is very large, so that the output fall time is significantly larger than the input rise time (left side). Under those circumstances, the input moves through the transient region before the output starts to change. As the source–drain voltage of the PMOS device is approximately zero during that period, the device shuts off without ever delivering any current. The short-circuit current is close to zero. Consider now the reverse case (right side), where the output capacitance is very small, and the output fall time is substantially smaller than the input rise time. The drain–source voltage of the PMOS device equals $V_{DD}$ for most of the transition time, guaranteeing a maximal short-circuit current. This clearly represents the worst-case condition. The conclusions of this intuitive analysis are confirmed by the simulation results.

This analysis may lead to the (faulty) conclusion that the short-circuit dissipation is minimized by making the output rise/fall time substantially larger than the input rise/fall time. On the other hand, making the output rise/fall time too large slows down the circuit, and causes large short-circuit currents in the connecting gates. A more practical rule, which optimizes the power consumption in a global way, can be formulated: The power dissipation due to short-circuit currents is minimized by matching the rise/fall times of the input and output signals. At the overall circuit level, this means that rise/fall times of all signals should be kept constant within a range. Equalizing the input and output transition times of a gate is not the optimum solution for the individual gate, but keeps the overall short-circuit current within bounds (maximum 10–15% of the total dynamic power dissipation). Observe also that the impact of short-circuit current is reduced when we lower the supply voltage. In the extreme case, when $V_{DD} < V_{THn} + |V_{THp}|$, the short-circuit dissipation is completely eliminated, because the devices are never ON simultaneously.

Modeling Short-Circuit Power

Can be modeled as capacitor

$C_{SC} = k \left( a \frac{\tau_{in}}{\tau_{out}} + b \right)$

a, b: technology parameters
k: function of supply and threshold voltages, and transistor sizes

$E_{SC} = C_{SC} V_{DD}^2$

Easily included in timing and power models

Slide 3.27

As the short-circuit power is proportional to the clock frequency, it can be modeled as an equivalent capacitor: $P_{SC} = C_{SC} V_{DD}^2 f$, which then can be lumped into the output capacitance of the gate. Be aware however that $C_{SC}$ is a function of the input and output transition times.

Transistors Leak

Drain leakage
– Diffusion currents
– Drain-induced barrier lowering (DIBL)

Junction leakage
– Gate-induced drain leakage (GIDL)

Gate leakage
– Tunneling currents through thin oxide

Slide 3.28

Although dynamic power traditionally has dominated the power budget, static power has become an increasing concern when scaling below 100 nm. The main reasons behind this have been discussed at length in Chapter 2. Sub-threshold drain–source leakage, junction leakage, and gate leakage all play important roles, but in contemporary design it is the sub-threshold leakage that is the main cause of concern.


Sub-threshold Leakage

Off-current increases exponentially when reducing $V_{TH}$

$I_{leak} = I_0 \frac{W}{W_0} 10^{-V_{TH}/S}$

$P_{leak} = V_{DD} \cdot I_{leak}$

Slide 3.29

In Chapter 2, it was pointed out that the main reason behind the increase in drain–source leakage is the gradual reduction of the threshold voltage forced by the lowering of the supply voltages. Any reduction in threshold voltage causes the leakage current to grow exponentially. The chart illustrating this is repeated for the purpose of clarity.

Sub-Threshold Leakage

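Leakage current increases with drain voltage (mostly due to DIBL)

$I_{leak} = I_0 \frac{W}{W_0} 10^{\frac{-V_{TH} + \lambda_d V_{DS}}{S}}$ (for $V_{DS} > 3kT/q$)

Hence

$P_{leak} = \left( I_0 \frac{W}{W_0} 10^{\frac{-V_{TH}}{S}} \right) \left( V_{DD} \, 10^{\frac{\lambda_d V_{DD}}{S}} \right)$

Leakage power is a strong function of supply voltage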

Slide 3.30

An additional factor is the increasing impact of the DIBL effect. Combining the equations for sub-threshold leakage and the influence of DIBL on $V_{TH}$, an expression for the leakage power of a gate can be derived. Observe the exponential dependence of leakage power upon both $V_{TH}$ and $V_{DD}$.
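The exponential dependence on both voltages can be explored with a direct transcription of the leakage-power expression. All device parameters below ($I_0$, $S$, $\lambda_d$, and the voltages) are hypothetical placeholders, not values from the text.

```python
def leakage_power(i0, w_ratio, v_th, v_dd, s, lambda_d):
    """P_leak = V_DD * I_0 * (W/W_0) * 10^((-V_TH + lambda_d * V_DD) / S)."""
    i_leak = i0 * w_ratio * 10 ** ((-v_th + lambda_d * v_dd) / s)
    return v_dd * i_leak

# Assumed parameters, for illustration only
params = dict(i0=1e-7, w_ratio=1.0, s=0.1, lambda_d=0.1)

base = leakage_power(v_th=0.3, v_dd=1.0, **params)
low_vth = leakage_power(v_th=0.2, v_dd=1.0, **params)   # V_TH lowered by one S
low_vdd = leakage_power(v_th=0.3, v_dd=0.5, **params)   # supply halved

print(f"lower V_TH: x{low_vth / base:.1f}")
print(f"half  V_DD: x{low_vdd / base:.3f}")
```

Lowering $V_{TH}$ by one sub-threshold swing $S$ raises the leakage power tenfold, and halving $V_{DD}$ reduces it by more than a factor of two because the DIBL term shrinks along with the supply.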

Slide 3.31

The dependence of the leakage current on the applied drain–source voltage creates some interesting side effects in complex gates. Consider, for example, the case of a two-input NAND gate where the two NMOS transistors in the pull-down network are turned off. If the off-resistance of NMOS transistors were fixed, and not a function of the applied voltage, one would expect that the doubling of the resistance by putting two transistors in series would halve the leakage current (compared to a similar-sized inverter).

An actual analysis shows that the reduction in leakage is substantially larger. When the pull-down chain is off, node M settles to an intermediate voltage, set by balancing the leakage currents of transistors M1 and M2. This reduces the drain–source voltage of both transistors (especially of transistor M2), which translates into a substantial reduction in the leakage currents due to DIBL. In addition, the gate–source voltage of transistor M1 becomes negative, resulting in an exponential reduction of the leakage current. This is further augmented by the reverse body-biasing, which raises the threshold of M1, although this effect is only secondary.

Using the expressions for the leakage currents derived earlier, we can determine the voltage value of the intermediate node, $V_M$, and derive an expression for the leakage current as a function of the DIBL factor $\lambda_d$ and the sub-threshold swing $S$. The resulting equation shows that the reduction in leakage current obtained by stacking transistors is indeed larger than the linear factor one would initially expect. This is called the stacking effect.

Stacking Effect

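[Figure: leakage currents IM1 and IM2 (A, ×10⁻⁹) versus intermediate node voltage VM (V) for a stack of two 90 nm NMOS transistors; the crossing point of the two curves corresponds to a leakage reduction by a factor of 9.]

Stack     Leakage reduction (factor)
2 NMOS    9
3 NMOS    17
4 NMOS    24
2 PMOS    8
3 PMOS    12
4 PMOS    16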

Slide 3.32

The mechanics of the stacking effect are illustrated with the example of two stacked NMOS transistors (as in the NAND gate of Slide 3.31) implemented in a 90 nm technology. The currents through transistors M1 and M2 are plotted as a function of the intermediate voltage VM. The actual operation point is situated at the crossing of the two load lines. As can be observed, the drain–source voltage of M2 is reduced from 1 V to 60 mV, resulting in a ninefold reduction in leakage current. The VGS of −60 mV for transistor M1 translates into a similar reduction.

The impact of the stacking effect is further detailed in the table, which illustrates the reduction in leakage currents for various stack sizes in 90 nm technology. The leakage reductions for both NMOS and PMOS stacks are quite impressive. The impact is somewhat smaller for the PMOS chains, as the DIBL effect is smaller for those devices. The stacking effect will prove to be a powerful tool in the fight against static power dissipation.

Stacking Effect

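NAND gate:

[Figure: two-input NAND gate with supply VDD; the series NMOS transistors M1 and M2 are off and carry the leakage currents Ileak,M1 and Ileak,M2.]

Assume that body-biasing effect in short-channel transistor is small

$$I_{M1} = I_0'\, 10^{\frac{-V_M - V_{TH} + \lambda_d (V_{DD} - V_M)}{S}}$$

$$I_{M2} = I_0'\, 10^{\frac{-V_{TH} + \lambda_d V_M}{S}}$$

$$V_M \approx \frac{\lambda_d}{1 + 2\lambda_d} V_{DD}$$

$$\frac{I_{stack}}{I_{inv}} \approx 10^{-\frac{\lambda_d V_{DD}}{S}\left(\frac{1 + \lambda_d}{1 + 2\lambda_d}\right)} \quad (\text{instead of the expected factor of 2})$$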
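The stacking-effect result can be checked numerically. With body bias neglected, the top transistor M1 carries a current proportional to 10^((−VM − VTH + λd(VDD − VM))/S) and the bottom transistor M2 a current proportional to 10^((−VTH + λd·VM)/S); equating the two exponents gives VM = λd·VDD/(1 + 2λd). The sketch below evaluates this under assumed parameters: λd = 0.068 and S = 67 mV are hypothetical values chosen only so that the result lands near the 60 mV and ninefold figures quoted for the 90 nm example, and VTH = 0.35 V is likewise a placeholder. None of these are extracted device data.

```python
def stack_node_voltage(vdd, lambda_d):
    """Intermediate node voltage V_M of a two-transistor off stack."""
    return lambda_d * vdd / (1.0 + 2.0 * lambda_d)


def exponent_m1(vm, vdd, vth, lambda_d):
    """Numerator of the exponent for top transistor M1 (V_GS = -V_M)."""
    return -vm - vth + lambda_d * (vdd - vm)


def exponent_m2(vm, vth, lambda_d):
    """Numerator of the exponent for bottom transistor M2 (V_GS = 0)."""
    return -vth + lambda_d * vm


def stack_leakage_ratio(vdd, lambda_d, swing):
    """I_stack / I_inv for a two-transistor stack versus a single device."""
    exponent = -(lambda_d * vdd / swing) * (1.0 + lambda_d) / (1.0 + 2.0 * lambda_d)
    return 10.0 ** exponent


# Hypothetical parameters (see lead-in): VDD = 1 V, lambda_d = 0.068, S = 67 mV.
VDD, VTH, LAMBDA_D, SWING = 1.0, 0.35, 0.068, 0.067

vm = stack_node_voltage(VDD, LAMBDA_D)
reduction = 1.0 / stack_leakage_ratio(VDD, LAMBDA_D, SWING)

print(f"V_M = {vm * 1e3:.1f} mV")
print(f"leakage reduction factor = {reduction:.2f}")
```

At the computed VM the two exponents coincide, which is the numerical counterpart of the crossing point of the two load lines.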


Gate Tunneling

$$I_{GD} \sim e^{-T_{ox}}\, e^{V_{GD}}, \qquad I_{GS} \sim e^{-T_{ox}}\, e^{V_{GS}}$$

Independent of the sub-threshold leakage

Exponential function of supply voltage

Modeled in BSIM4; also in BSIM3v3 (but not always included in foundry models)

NMOS gate leakage usually worse than PMOS

[Figure: gate leakage current Igate (A) versus VDD (V) for a 90 nm CMOS inverter, with an inset schematic marking ISUB, IGD, IGS, and ILeak.]

Slide 3.33

While sub-threshold currents dominate the static power dissipation, other leakage sources should not be ignored. Gate leakage is becoming significant in the sub-100 nm era. Gate leakage currents flow from one logical gate into the next one, and hence have a profoundly different impact on gate operation compared to sub-threshold currents. Whereas the latter can be reduced by increasing threshold voltages, the only way to reduce the gate leakage component is to decrease the voltage stress over the gate dielectric, which means reducing voltage levels. Similar to sub-threshold leakage, gate leakage is also an exponential function of the supply voltage.

This is illustrated by the simulation results of a 90 nm CMOS inverter. The maximum leakage current is around 100 pA, which is an order of magnitude lower than the sub-threshold current. Yet, even for these small values, the impact can be large, especially if one wants to store a charge on a capacitor for a substantial amount of time (such as in DRAMs, charge pumps, and even dynamic logic). Remember also that the gate leakage is an exponential function of the dielectric thickness.
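To see why even 100 pA matters for charge storage, a rough estimate of the retention time of a storage node follows from t = C·ΔV / I. The sketch below treats the leakage as a constant current, which is a simplifying assumption (the real current depends exponentially on voltage); the 30 fF capacitance and 100 mV tolerable droop are hypothetical values, and only the 100 pA figure comes from the text above.

```python
def retention_time(capacitance, allowed_droop, leakage_current):
    """Time (s) for a constant leakage current to discharge a capacitor
    by `allowed_droop` volts: t = C * dV / I."""
    return capacitance * allowed_droop / leakage_current


C_STORE = 30e-15   # hypothetical storage capacitance (F)
DROOP = 0.1        # hypothetical tolerable voltage loss (V)
I_GATE = 100e-12   # gate leakage from the 90 nm example (A)

t_hold = retention_time(C_STORE, DROOP, I_GATE)
print(f"retention time ~ {t_hold * 1e6:.0f} us")
```

Under these assumptions the node holds its value for only tens of microseconds, which illustrates why gate leakage is a concern in DRAMs, charge pumps, and dynamic logic.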

Other Sources of Static Power Dissipation

Diode (drain–substrate) reverse-bias currents

[Figure: cross-section of a CMOS structure showing the n+ and p+ diffusion regions, the n well, and the p substrate.]

• Electron-hole pair generation in depletion region of reverse-biased diodes
• Diffusion of minority carriers through junction
• For sub-50 nm technologies with highly doped pn junctions, tunneling through narrow depletion region becomes an issue

Strong function of temperature

Much smaller than other leakage components in general

Slide 3.34

Finally, junction leakage, though substantially smaller than the previously mentioned leakage contributions, should not be ignored. With the decreasing thickness of the depletion regions owing to the high doping levels, some tunneling effects may become pronounced in sub-50 nm technology nodes. The strong dependence upon temperature must again be emphasized.


Other Sources of Static Power Dissipation

Circuits with dc bias currents: sense amplifiers, voltage converters and regulators, sensors, mixed-signal components, etc.

Should be turned off when not in use, or standby current should be minimized

Slide 3.35

A majority of the state-of-the-art digital circuits contain a number of analog components. Examples of such circuits are sense amplifiers, reference voltages, voltage regulators, level converters, and temperature and leakage sensors. One property of each of these circuits is that they need a bias current for correct operation. These currents can become a sizable part of the total static power budget. To reduce their contribution, two mechanisms can be used:

(1) Trade off performance for current – Reducing the bias current of an analog circuit, in general, impacts its performance. For instance, the gain and slew rate of an amplifier benefit from a higher bias current.

(2) Power management – Some analog components need to operate only for a fraction of the time. For example, a sense amplifier in a DRAM or SRAM memory only needs to be ON at the end of the read cycle. Under those conditions the static power can be substantially reduced by turning off the bias when not in use. Although this is the most effective technique, it does not always work, as some bias or reference networks need to be ON all the time, or their start-up time would be too long to be practical.

In short, every analog circuit should be examined very carefully, and bias current and ON time should be minimized. The “a bias should never be on when not used” principle rules.

Summary of Power Dissipation Sources

$$P \sim \alpha \cdot (C_L + C_{SC}) \cdot V_{swing} \cdot V_{DD} \cdot f + (I_{DC} + I_{leak}) \cdot V_{DD}$$

α – switching activity
CL – load capacitance
CSC – short-circuit capacitance
Vswing – voltage swing
f – frequency
IDC – static current
Ileak – leakage current

$$P = \frac{\text{energy}}{\text{operation}} \times \text{rate} + \text{static power}$$

Slide 3.36

From all the preceding discussions, a global expression for the power dissipation of a digital circuit can be derived. The two major components, the dynamic and static dissipation, are easily recognized. An interesting perspective on the relationship between the two is obtained by realizing that a given computation (such as a multiplication or the execution of an instruction on a processor) is best characterized by its energy cost. Static dissipation, on the other hand, is best captured as a power quantity. To determine the relative balance between the two, the former must be translated into power by multiplying it with its execution rate, or, in other words, the activity. Hence, precise knowledge of the activity is essential if one wants to estimate the overall power dissipation. Note: in a similar way, multiplying the static power with the time period leads to a global expression for energy.
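The global expression P ~ α·(CL + CSC)·Vswing·VDD·f + (IDC + Ileak)·VDD and its "energy per operation × rate + static power" reading can be written out as a short sketch. The function names and all numeric values below are hypothetical and serve only to show how the two views agree.

```python
def dynamic_power(alpha, c_load, c_sc, v_swing, vdd, freq):
    """Dynamic part: alpha * (C_L + C_SC) * V_swing * V_DD * f."""
    return alpha * (c_load + c_sc) * v_swing * vdd * freq


def static_power(i_dc, i_leak, vdd):
    """Static part: (I_DC + I_leak) * V_DD."""
    return (i_dc + i_leak) * vdd


def power_from_energy(energy_per_op, rate, p_static):
    """Equivalent view: (energy / operation) * rate + static power."""
    return energy_per_op * rate + p_static


# Hypothetical chip-level numbers (SI units).
ALPHA, C_L, C_SC = 0.1, 1e-9, 1e-10
V_SWING, VDD, FREQ = 1.0, 1.0, 1e9
I_DC, I_LEAK = 1e-3, 9e-3

p_dyn = dynamic_power(ALPHA, C_L, C_SC, V_SWING, VDD, FREQ)
p_stat = static_power(I_DC, I_LEAK, VDD)

# Energy spent per clock cycle, translated back into power via the rate.
energy_per_cycle = ALPHA * (C_L + C_SC) * V_SWING * VDD
p_total = power_from_energy(energy_per_cycle, FREQ, p_stat)

print(f"dynamic = {p_dyn:.3f} W, static = {p_stat:.3f} W, total = {p_total:.3f} W")
```

Multiplying the static power by the clock period instead gives the leakage contribution to the energy per cycle, which is the energy-side reading mentioned in the note above.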

The Traditional Design Philosophy

Maximum performance is primary goal

– Minimum delay at circuit level

Architecture implements the required function with target throughput, latency

Performance achieved through optimum sizing, logic mapping, architectural transformations

Supplies, thresholds set to achieve maximum performance, subject to reliability constraints

Slide 3.37

The growing importance of power minimization and containment is revolutionizing design as we know it. Methodologies that were long-accepted have to be adjusted, and established design flows modified. Although this trend was visible already a decade ago in the embedded design world, it was only recently that it started to upend a number of traditional beliefs in the high-performance design community. Ever-higher clock frequencies were the holy grail of the microprocessor designer. Though architectural optimizations played a role in the performance improvements demonstrated over the years, reducing the clock period through technology scaling was responsible for the largest fraction.

Once the architecture was selected, the major function of the design flow was to optimize the circuitry through sizing, technology mapping, and logical transformations so that the maximum performance was obtained. Supply and threshold voltages were selected in advance to guarantee top performance.

CMOS Performance Optimization

Sizing: Optimal performance with equal fan-out per stage

Extendable to general logic cone through “logical effort”

Equal effective fan-outs ($g_i C_{i+1}/C_i$) per stage

Example: memory decoder

[Figure: memory decoder path from the address input through the pre-decoder and word driver to the word line, with load capacitances CW and CL.]

[Ref: I. Sutherland, Morgan Kaufmann ’98]

Slide 3.38

This philosophy is best reflected in the popular “logical effort”-based design optimization methodology. The delay of a circuit is minimized if the “effective fan-out” of each stage is made equal (and set to a value of approximately 4). Though this technique is very powerful, it also guarantees that power consumption is maximal! In the coming chapters, we will reformulate the logical-effort methodology to bring power into the equation.
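The equal-fan-out rule can be illustrated for the simplest case, a chain of inverters (logical effort g = 1) driving a large load. The sketch below picks the number of stages so that the per-stage fan-out is as close as possible to the target of 4 and then sizes each stage geometrically. The function name is hypothetical, and parasitic delay and general logic gates are deliberately ignored.

```python
import math


def size_buffer_chain(c_in, c_load, target_fanout=4.0):
    """Return (number of stages, fan-out per stage, input capacitance of
    each stage) for an inverter chain with equal fan-out per stage."""
    total_fanout = c_load / c_in
    # Number of stages that brings the per-stage fan-out closest to target.
    n_stages = max(1, round(math.log(total_fanout) / math.log(target_fanout)))
    stage_fanout = total_fanout ** (1.0 / n_stages)
    input_caps = [c_in * stage_fanout ** i for i in range(n_stages)]
    return n_stages, stage_fanout, input_caps


# Example: drive a load 256 times larger than the first inverter's input.
n, f, caps = size_buffer_chain(c_in=1.0, c_load=256.0)
print(n, f, caps)
```

Each stage in this sketch is larger than the previous one by the same factor, so the capacitance being switched grows geometrically along the chain, which is why minimum delay and high power go together.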


Slide 3.39

That the circuit optimization philosophy of old can no longer be maintained is best illustrated by this simple example (after Shekhar Borkar from Intel). Assume a microprocessor design implemented in a given technology. Applying a single technology scaling step reduces the critical dimensions of the chip by a factor of 0.7. General scaling, which reduces the voltage by the same factor, increases the clock frequency by a factor of 1.41. If we take into account the fact that the die size typically increases (actually, used to increase is a better wording) by 14% between generations, the total capacitance of the die increases by a factor of (1/0.7) × 1.14² = 1.86. (This simplified analysis assumes that all the extra transistors are used to good effect.) The net effect is that the power dissipation of the chip increases by a factor of 1.3.

However, microprocessor designers tend to push harder than that. Over the past decades, processor frequency increased by a factor of 2 between technology generations. The extra performance improvement was obtained by circuit optimizations, such as a reduction in the logical depth. Maintaining this rate of improvement now pushes the power dissipation up by a factor of 1.8.

The situation gets even worse when the slowdown in supply voltage scaling is taken into account. Scaling the supply voltage by a factor of only 0.85, rather than 0.7, means that the power dissipation now rises by a factor of 2.7 from generation to generation. As this is clearly unacceptable, a change in design philosophy was the only option.
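The three scenarios above all follow from Power ∝ C·VDD²·f, with the capacitance per generation scaling as (1/0.7) × 1.14². A small sketch reproduces the arithmetic; the function and parameter names are illustrative.

```python
def power_scaling(voltage_scale, freq_scale, dim_scale=0.7, die_growth=1.14):
    """Per-generation power scaling factor from Power ~ C * VDD^2 * f.

    The capacitance factor is (1 / dim_scale) * die_growth**2, as in the
    example above.
    """
    cap_scale = (1.0 / dim_scale) * die_growth ** 2
    return cap_scale * voltage_scale ** 2 * freq_scale


traditional = power_scaling(voltage_scale=0.7, freq_scale=1 / 0.7)
freq_doubling = power_scaling(voltage_scale=0.7, freq_scale=2.0)
slow_voltage = power_scaling(voltage_scale=0.85, freq_scale=2.0)

print(f"traditional scaling:          x{traditional:.1f}")
print(f"frequency doubling:           x{freq_doubling:.1f}")
print(f"doubling, VDD scaled by 0.85: x{slow_voltage:.1f}")
```

The three results round to the factors 1.3, 1.8, and 2.7 quoted in the text.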

The New Design Philosophy

Maximum performance (in terms of propagation delay) is too power-hungry, and/or not even practically achievable

Many (if not most) applications either can tolerate larger latency or can live with lower-than-maximum clock speeds

Excess performance (as offered by technology) to be used for energy/power reduction

Trading off speed for power

Slide 3.40

This revised philosophy backs off from the “maximum performance at all cost” theory, and abandons the notion that clock frequency is equivalent to performance. The “design slack” that results from a less-than-maximum clock speed in a new technology can now be used to keep dynamic and static power within bounds. Performance increase is still possible, but now comes mostly from architectural optimizations – sometimes, but not always, at the expense of extra die area. Design now becomes a trade-off exercise between speed and energy (or power).

Model Not Appropriate Any Longer

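Traditional scaling model

$$\text{If } V_{DD} = 0.7 \text{ and Freq} = \frac{1}{0.7}: \quad \text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7)^2 \times \left(\frac{1}{0.7}\right) = 1.3$$

Maintaining the frequency scaling model

$$\text{If } V_{DD} = 0.7 \text{ and Freq} = 2: \quad \text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7)^2 \times (2) = 1.8$$

While slowing down voltage scaling

$$\text{If } V_{DD} = 0.85 \text{ and Freq} = 2: \quad \text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.85)^2 \times (2) = 2.7$$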

Relationship Between Power and Delay

[Figure: two surface plots for a CMOS module as functions of VDD (V) and VTH (V): power (W, ×10⁻⁴) and delay (s, ×10⁻¹⁰), with the operating points A and B marked on both.]

For a given activity level, power is reduced while delay is unchanged if both VDD and VTH are lowered, such as from A to B

[Ref: T. Sakurai and T. Kuroda, numerous references]

Slide 3.41

This trade-off is wonderfully illustrated by this set of, by now legendary, charts. Originated by T. Kuroda and T. Sakurai in the mid 1990s, the graphs plot power and (propagation) delay of a CMOS module as a function of the supply and threshold voltages – two parameters that were considered to be fixed in earlier years. The opposing nature of optimization for performance and power becomes obvious – the highest performance happens to occur exactly where power dissipation peaks (high VDD, low VTH). Another observation is that the same performance can be obtained at a number of operational points with vastly different levels of power dissipation. The existence of these “equal-delay” and “equal-power” curves proves to be an important optimization instrument when trading off in the delay–power (or energy) space.
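An equal-delay curve can be traced with a simple delay model. The sketch below assumes the alpha-power-law form, delay ∝ VDD / (VDD − VTH)^α, with α = 1.4; the model choice, the function names, and the operating points are assumptions made for illustration, not values read from the charts. Given a starting point A, it solves for the threshold voltage that keeps the delay unchanged at a lower supply voltage.

```python
def normalized_delay(vdd, vth, alpha=1.4):
    """Alpha-power-law delay with all constant factors dropped."""
    return vdd / (vdd - vth) ** alpha


def vth_for_equal_delay(vdd_new, target_delay, alpha=1.4):
    """Threshold voltage that yields `target_delay` at supply `vdd_new`.

    Solves vdd_new / (vdd_new - vth)**alpha = target_delay for vth.
    """
    overdrive = (vdd_new / target_delay) ** (1.0 / alpha)
    return vdd_new - overdrive


# Hypothetical point A, and a point B on the same equal-delay curve.
VDD_A, VTH_A = 1.2, 0.4
VDD_B = 0.9

delay_a = normalized_delay(VDD_A, VTH_A)
vth_b = vth_for_equal_delay(VDD_B, delay_a)

print(f"A: VDD = {VDD_A} V, VTH = {VTH_A} V")
print(f"B: VDD = {VDD_B} V, VTH = {vth_b:.3f} V (same delay)")
```

Whether point B also dissipates less power depends on how much the lower threshold raises the leakage relative to the savings in dynamic power, which is exactly the balance the power chart captures.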

The Energy–Delay Space

[Figure: contour plot in the VTH–VDD plane showing equal-performance curves, equal-energy curves, and the energy minimum.]

Slide 3.42

Contours of identical performance or energy are more evident in the two-dimensional plots of the delay and the (average) energy per operation as functions of supply and threshold voltages. The latter is obtained by multiplying the average power (as obtained using the expressions of Slide 3.36) by the length of the clock period. Trends similar to those in the previous slide can be observed. Particularly interesting is that a point of minimum energy can be located. Lowering the voltages beyond this point makes little sense as the leakage energy dominates, and the performance deteriorates rapidly.

Be aware that this set of curves is obtained for one particular value of the activity. For other values of α, the balance between static and dynamic power shifts, and so do the trade-off curves. Also, the curves shown here are for fixed transistor sizes.


Energy–Delay Product As a Metric

[Figure: delay, energy, and energy–delay product plotted versus VDD (0.6 V to 1.2 V).]

90 nm technology, VTH approx. 0.35 V

Energy–delay product exhibits minimum at approximately 2VTH (typical unless leakage dominates)

Slide 3.43

A further simplification of the graphs is obtained by keeping the threshold voltage constant. The opposing trends between energy and delay when reducing the supply voltage are obvious. One would expect the product of the two (the energy–delay product or EDP) to show a minimum, which it does. In fact, it turns out that for CMOS designs, the minimum value of the EDP occurs at approximately two times the device threshold. A better estimate is 3VTH/(3 − α), with α the fit parameter in the alpha delay model (not to be confused with the activity factor). For α = 1.4, this translates to 1.875VTH. Although this is an interesting piece of information, its meaning should not be over-estimated. As mentioned earlier, the EDP metric is only useful when equal weight is placed on delay and energy, which is rarely the case.
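The 3VTH/(3 − α) estimate can be checked with a few lines of code. Assuming dynamic energy ∝ VDD² and an alpha-power delay ∝ VDD/(VDD − VTH)^α, the EDP is proportional to VDD³/(VDD − VTH)^α; setting its derivative to zero gives 3(VDD − VTH) = α·VDD. The sketch below compares that closed form with a brute-force grid search. Leakage energy is ignored, and the function names are illustrative.

```python
def energy_delay_product(vdd, vth, alpha):
    """EDP with constant factors dropped: VDD^2 * VDD / (VDD - VTH)^alpha."""
    delay = vdd / (vdd - vth) ** alpha
    energy = vdd ** 2
    return energy * delay


def edp_optimum_closed_form(vth, alpha):
    """Supply voltage minimizing the EDP: 3 * VTH / (3 - alpha)."""
    return 3.0 * vth / (3.0 - alpha)


def edp_optimum_numeric(vth, alpha, steps=20000):
    """Grid search for the EDP minimum over VTH < VDD <= 5 * VTH."""
    v_max = 5.0 * vth
    best_vdd, best_edp = None, float("inf")
    for k in range(1, steps + 1):
        vdd = vth + k * (v_max - vth) / steps
        value = energy_delay_product(vdd, vth, alpha)
        if value < best_edp:
            best_vdd, best_edp = vdd, value
    return best_vdd


VTH, ALPHA = 0.35, 1.4
v_closed = edp_optimum_closed_form(VTH, ALPHA)
v_numeric = edp_optimum_numeric(VTH, ALPHA)

print(f"closed form: {v_closed:.4f} V ({v_closed / VTH:.3f} x VTH)")
print(f"grid search: {v_numeric:.4f} V")
```

Both approaches place the minimum at 1.875 times the threshold voltage for α = 1.4, consistent with the "approximately 2VTH" rule of thumb.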

Exploring the Energy–Delay Space

In energy-constrained world, design is trade-off process

– Minimize energy for a given performance requirement

– Maximize performance for given energy budget

[Figure: energy versus delay, showing the curve of Pareto-optimal designs and an unoptimized design off the curve; the axes are marked with Dmin, Dmax, Emin, and Emax.]

[Ref: D. Markovic, JSSC’04]

Slide 3.44

The above charts amply demonstrate that design for low power is a trade-off process. We have found that the best way to capture the duality between performance and energy efficiency is the energy–delay curves. Given a particular design and a set of design parameters, it is possible to derive a Pareto-optimal curve that for every delay value gives the minimum attainable energy and vice versa. This curve is the best characterization of the energy and performance efficiency of a design. It also helps to redefine the design problem from “generate the fastest possible design” into a two-dimensional challenge: given a maximum delay, minimize the energy, or, given the maximum energy, find the design with the minimum delay.

We will use energy–delay curves extensively in the coming chapters. In the next chapter, we provide effective techniques to derive the energy–delay curves for a contemporary CMOS design.
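The notion of a Pareto-optimal energy–delay curve can be made concrete with a small sketch. Given a set of candidate design points, each described by a (delay, energy) pair, the code below keeps only the points that are not beaten in both metrics by another point, and then answers the "given a maximum delay, minimize the energy" question. The candidate values are made up for illustration, and the function names are hypothetical.

```python
def pareto_front(designs):
    """Return the non-dominated (delay, energy) points, sorted by delay.

    A point is kept only if its energy is strictly lower than that of
    every point with smaller or equal delay.
    """
    front = []
    best_energy = float("inf")
    for delay, energy in sorted(designs):
        if energy < best_energy:
            front.append((delay, energy))
            best_energy = energy
    return front


def min_energy_within_delay(front, max_delay):
    """Lowest energy among points meeting the delay constraint, or None."""
    feasible = [energy for delay, energy in front if delay <= max_delay]
    return min(feasible) if feasible else None


# Hypothetical design points (delay, energy) in arbitrary units.
candidates = [(1.0, 10.0), (2.0, 6.0), (2.5, 7.0), (3.0, 3.0), (4.0, 3.5)]

front = pareto_front(candidates)
print(front)
print(min_energy_within_delay(front, max_delay=2.7))
```

Here the points (2.5, 7.0) and (4.0, 3.5) are dropped because another candidate is both faster and lower in energy; they play the role of the unoptimized design.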


Summary

Power and energy are now primary design constraints

Active power still dominating for most applications
– Supply voltage, activity and capacitance the key parameters

Leakage becomes major factor in sub-100 nm technology nodes
– Mostly impacted by supply and threshold voltages

Design has become energy–delay trade-off exercise!

Slide 3.45

In summary, we have analyzed in detail the various sources of power dissipation in today’s CMOS digital design, and we have derived analytical and empirical models for all of them. Armed with this knowledge, we are ready to start exploring the many ways of reducing power dissipation and making circuits energy-efficient. One of the main lessons at the end of this story is that there is no free lunch. Optimization for energy most often comes at the expense of extra delay (unless the initial design is sub-optimal in both, obviously). Energy–delay charts are the best way to capture this duality.

References

D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz and R.W. Brodersen, “Methods for true energy–performance optimization,” IEEE Journal of Solid-State Circuits, 39(8), pp. 1282–1293, Aug. 2004.

J. Rabaey, A. Chandrakasan and B. Nikolic, “Digital Integrated Circuits: A Design Perspective,” 2nd ed., Prentice Hall, 2003.

T. Sakurai, “Perspectives on power-aware electronics,” Digest of Technical Papers ISSCC, pp. 26–29, Feb. 2003.

I. Sutherland, B. Sproull and D. Harris, “Logical Effort,” Morgan Kaufmann, 1999.

H. Veendrick, “Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits,” IEEE Journal of Solid-State Circuits, SC-19(4), pp. 468–473, 1984.

Slide 3.46

Some references . . .

