Chapter 3
Power and Energy Basics
Jan M. Rabaey
Power and Energy Basics
Slide 3.1
The goal of this chapter is to derive clear and unambiguous definitions and models for all of the design metrics relevant in the low-power design domain. Anyone with some training and experience in digital design is probably already familiar with a majority of them. If you are one of them, you should consider this chapter as a review. However, we recommend that everyone at least browse through the material, as some new definitions, perspectives, and methodologies are offered. In addition, if one truly wants to tackle the energy problem, it is essential to have an in-depth understanding of the causes of energy dissipation in today's advanced digital circuits.
Chapter Outline
Metrics
Dynamic power
Static power
Energy–delay trade-offs
Slide 3.2
Before discussing the various sources of power dissipation in modern digital integrated circuits, it is worth spending some time examining the metrics typically used to evaluate the quality of a circuit or design. Unambiguous definitions are essential if one wants to provide fair comparisons. The rest of this chapter divides the sources of power roughly along the lines of dynamic and static power. At the end of the chapter, we make the point that optimization for power or energy alone rarely makes sense. Design for low power is most often a trade-off process, performed primarily in the energy-delay space. Realizing this goes a long way in setting up the foundations for an effective power-minimization design methodology.
J. Rabaey, Low Power Design Essentials, Series on Integrated Circuits and Systems, DOI 10.1007/978-0-387-71713-5_3, © Springer Science+Business Media, LLC 2009
Metrics
Delay (s)
– Performance metric
Energy (Joule)
– Efficiency metric: effort to perform a task
Power (Watt)
– Energy consumed per unit time
Power*Delay (Joule)
– Mostly a technology parameter – measures the efficiency of performing an operation in a given technology
Energy*Delay = Power*Delay$^2$ (Joule·s)
– Combined performance and energy metric – figure of merit of design style
Other metrics: Energy*Delay$^n$ (Joule·s$^n$)
– Increased weight on performance over energy
Slide 3.3
The basic design metrics – propagation delay, energy, and power – are well-known to anyone with digital design experience. Yet, they may not be sufficient. In today's design environment, where delay and energy carry almost equal weight, optimizing for only one parameter rarely makes sense. For instance, the design with the minimum propagation delay in general takes an exorbitant amount of energy, and, vice versa, the design with the minimum energy is unacceptably slow. Both represent extremes in a rich optimization space, where many other optimal operational points exist. Hence some other metrics of potential interest have been defined, such as the energy–delay product, which puts an equal weight on both parameters. In fact, the normalized energy–delay products for a number of optimized general-purpose designs fall consistently within a narrow range. While being an interesting metric, the energy–delay product of an actual design only tells us how close the design is to a perfect balance between performance and energy efficiency. In real designs, achieving that balance may not necessarily be of interest. Typically, one metric is assigned greater weight – for instance, energy is minimized for a given maximum delay or delay is minimized for a given maximum energy. For these off-balance situations, other metrics can be defined, such as energy–delay$^n$. Though interesting, these derived metrics are rarely used, as they lead to optimization for only one target in the overall design space.
It is worth recapping the definition of propagation delay at this point: it is measured as the time difference between the 50% transition points of the input and output waveforms. For modules with multiple inputs and outputs, we typically define the propagation delay as the worst-case delay over all possible scenarios.
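As a minimal sketch of how the derived metrics can rank the same designs differently, the Python fragment below evaluates power, power-delay product, energy-delay product, and energy-delay$^n$ for three design points. The design names and all numbers are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
def design_metrics(energy_j, delay_s, n=2):
    """Return the metrics of the 'Metrics' slide for one design point."""
    power_w = energy_j / delay_s             # average power while the operation runs
    return {
        "power": power_w,                    # Watt
        "pdp": power_w * delay_s,            # power-delay product (Joule)
        "edp": energy_j * delay_s,           # energy-delay product (Joule s)
        "edn": energy_j * delay_s ** n,      # energy-delay^n (Joule s^n)
    }

# Hypothetical design points: (energy per operation in J, delay in s).
designs = {
    "fast": (4.0e-12, 1.0e-9),
    "balanced": (1.5e-12, 2.0e-9),
    "frugal": (1.0e-12, 5.0e-9),
}

def best_design(metric, n=2):
    """Name of the design that minimizes the chosen metric."""
    return min(designs, key=lambda name: design_metrics(*designs[name], n=n)[metric])

print(best_design("edp"))         # equal weight on energy and delay
print(best_design("edn", n=2))    # extra weight on delay
```

With these made-up numbers the energy-delay product favors a different design than energy-delay$^2$, which is the point of the weighting exponent $n$.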
Where Is Power Dissipated in CMOS?
Active (dynamic) power
– (Dis)charging capacitors
– Short-circuit power: both pull-up and pull-down on during transition
Static (leakage) power
– Transistors are imperfect switches
Static currents
– Biasing currents
Slide 3.4
Power dissipation sources can be divided into two major classes: dynamic and static. The difference between the two is that the former is proportional to the activity in the network and the switching frequency, whereas the latter is independent of both. Until recently, dynamic power vastly outweighed static power. With the emergence of leakage as a major power component though, both should now be treated on an equal footing. Biasing currents for "analog" components such as sense amplifiers or level converters strictly fall under the static power-consumption class, but originate from a design choice rather than a device deficiency.
Active (or Dynamic) Power
Sources:
– Charging and discharging capacitors
– Temporary glitches (dynamic hazards)
– Short-circuit currents
Key property of active power:
$P_{dyn} \propto f$, where $f$ is the switching frequency
Slide 3.5
As mentioned, dynamic power is proportional to the switching frequency. The charging and discharging of capacitances is and should be the main source of dynamic power dissipation, as these operations are at the core of what constitutes MOS digital circuit design. The other contributions (short-circuit currents and dynamic hazards or glitches) are parasitic effects and should be made as small as possible.
Charging Capacitors
Applying a voltage step $V$ to a capacitor $C$ through a resistor $R$:
$E_{0 \to 1} = C V^2$
$E_R = \frac{1}{2} C V^2$
$E_C = \frac{1}{2} C V^2$
Value of R does not impact energy!
Slide 3.6
The following equation is probably the most important one you will encounter in this book: to charge a capacitance $C$ by applying a voltage step $V$, an amount of energy equal to $CV^2$ is taken from the supply. Half of that energy is stored on the capacitor; the other half is dissipated as heat in the resistance of the charging network. During discharge, the stored energy is in turn dissipated as heat as well. Observe that the resistance of the networks does not enter the equation.
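The claim that the resistance drops out can be checked numerically. The sketch below integrates the step response of an RC network with the midpoint rule and splits the supply energy into its resistor and capacitor parts. The function name, the component values used when calling it, and the integration settings are assumptions made for illustration only.

```python
import math

def step_charge_energies(R, C, V, steps=100_000, t_end_in_tau=30.0):
    """Integrate the RC step response numerically (midpoint rule).

    Returns (energy drawn from the supply, energy dissipated in R,
    energy stored on C at the end of the integration window).
    """
    tau = R * C
    dt = t_end_in_tau * tau / steps
    e_supply = 0.0
    e_resistor = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        i = (V / R) * math.exp(-t / tau)   # charging current of the RC network
        e_supply += V * i * dt             # the supply delivers V * i
        e_resistor += i * i * R * dt       # heat dissipated in the resistor
    v_cap = V * (1.0 - math.exp(-t_end_in_tau))   # final capacitor voltage
    e_cap = 0.5 * C * v_cap ** 2
    return e_supply, e_resistor, e_cap

# Hypothetical values: the same 1 pF load charged to 1 V through two resistances.
for resistance in (1e3, 1e6):
    print(resistance, step_charge_energies(resistance, 1e-12, 1.0))
```

Changing R stretches the charging time but, within numerical error, leaves the three energies unchanged.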
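Applied to Complementary CMOS Gate
[Figure: complementary CMOS gate with inputs A1 ... AN; a PMOS network between VDD and the output Vout, an NMOS network between Vout and ground, and the current iL charging the load capacitance CL]
$E_{0 \to 1} = C_L V_{DD}^2$
$E_R = \frac{1}{2} C_L V_{DD}^2$
$E_C = \frac{1}{2} C_L V_{DD}^2$
One half of the energy from the supply is consumed in the pull-up network and one half is stored on $C_L$
Charge from $C_L$ is dumped during the 1→0 transition
Independent of resistance of charging/discharging network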
Slide 3.7
This model applies directly to a digital CMOS gate, where the PMOS and NMOS transistors form the resistive charge and discharge networks. For the sake of simplicity, the total capacitance of the network is lumped into the output capacitance of the gate.
Slide 3.8
More generically, we can compute the energy it takes to charge a capacitance from a voltage $V_1$ to a voltage $V_2$. Using similar math, we derive that this requires from the supply an amount of energy equal to $CV_2(V_2 - V_1)$. This equation will come in handy for a number of special circuits. One example is the NMOS pass-transistor chain. It is well-known that the maximum voltage at the end of such a chain is one threshold voltage below the supply [Rabaey03]. Using the afore-derived equation, we find that the energy dissipation in this case equals $CV_{DD}(V_{DD} - V_{TH})$, and is proportional to the swing at the output. In general, reducing the swing in a digital network results in a linear reduction in energy consumption.
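The linear dependence on swing is easy to tabulate. The sketch below evaluates the supply energy for a full-swing node and for a node that only rises to one threshold below the supply. The function name and the capacitance, supply, and threshold values are hypothetical and serve only as an illustration.

```python
def supply_energy(c_farad, v_supply, v_start, v_end):
    """Energy drawn from a constant supply while the node rises from v_start to v_end."""
    return c_farad * v_supply * (v_end - v_start)

# Hypothetical values, chosen only for illustration.
C_NODE = 10e-15      # 10 fF node capacitance
VDD = 1.0            # supply voltage (V)
VTH = 0.3            # NMOS threshold voltage (V)

full_swing = supply_energy(C_NODE, VDD, 0.0, VDD)             # C * VDD^2
reduced_swing = supply_energy(C_NODE, VDD, 0.0, VDD - VTH)    # C * VDD * (VDD - VTH)
print(full_swing, reduced_swing, reduced_swing / full_swing)
```

The energy ratio equals the swing ratio $(V_{DD} - V_{TH})/V_{DD}$, which is the linear reduction the text refers to.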
Circuits with Reduced Swing
$E_{0 \to 1} = \int V_{DD}\, C \frac{dV_C}{dt}\, dt = C V_{DD} \int_0^{V_{DD} - V_{TH}} dV_C = C V_{DD} (V_{DD} - V_{TH})$
Energy consumed is proportional to output swing
Charging Capacitors – Revisited
Driving from a constant current source $I$:
$E_{0 \to 1} = E_C + E_R$
$E_C = \frac{1}{2} C V^2$
$E_R = \int_0^T I (R I)\, dt = R I^2 T = \frac{RC}{T} C V^2$, with $T = \frac{CV}{I}$
Energy dissipated in resistor can be reduced by increasing charging time T (i.e., decreasing I)
Slide 3.9
So far, we have assumed that charging a capacitor always requires an amount of energy equal to $CV^2$. This is true only when the driving waveform is a voltage step. It is actually possible to reduce the required energy by choosing other waveforms. Assume, for instance, that a current source with a fixed current $I$ is used instead. Under those circumstances, the energy consumed in the resistor is reduced to $(RC/T)CV^2$, where $T$ is the charging time, and the output voltage rises linearly with time. Observe that the resistance of the network plays a role under these circumstances. From this, it appears that the dissipation in the resistor can be reduced to very small values, if not zero, by charging the capacitor very slowly (i.e., by reducing $I$).
Slide 3.10
In fact, the current-driven scenario results in an actual energy reduction over the voltage-driven approach for $T > 2RC$. As a reference, the time it takes for the output of the voltage-driven circuit to move between the 0% and 90% points equals $2.3RC$. Hence, the current-driven circuit is more energy-efficient than the voltage-driven one as long as it is slower.
For this scheme to work, the same approach should be used to discharge the capacitor, and the charge flowing through the source should be recovered. If not, the energy gained is just wasted in the source.
The idea of "energy-efficient" charging gained a lot of attention in the 1990s. However, the inferior performance and the complexity of the circuitry ensured that the ideas remained confined to the academic world. With the prospect of further voltage scaling bottoming out, these concepts may gain some traction anew (some more about this in Chapter 13).
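The break-even point $T = 2RC$ follows directly from comparing the two resistor losses. The sketch below evaluates both expressions; the function names and the component values in the usage lines are assumptions for illustration.

```python
def resistor_energy_voltage_step(C, V):
    """Energy dissipated in R when C is charged to V by a voltage step."""
    return 0.5 * C * V ** 2

def resistor_energy_constant_current(R, C, V, T):
    """Energy dissipated in R when C is charged to V in time T by a constant current."""
    current = C * V / T              # constant current that delivers charge C*V in time T
    return current ** 2 * R * T      # equals (R*C/T) * C * V^2

# Hypothetical values: 1 kOhm driver, 100 fF load, 1 V swing.
R, C, V = 1e3, 100e-15, 1.0
for multiple in (1, 2, 20):
    T = multiple * R * C
    ratio = resistor_energy_constant_current(R, C, V, T) / resistor_energy_voltage_step(C, V)
    print(multiple, ratio)
```

The ratio of the two losses is $2RC/T$, so the current-driven scheme only wins once the charging time exceeds $2RC$.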
Charging Capacitors
Using constant voltage or current driver?
$E_{constant\_current} < E_{constant\_voltage}$ if $T > 2RC$
Energy dissipated using constant-current charging can be made arbitrarily small at the expense of delay: adiabatic charging
Note: $t_p(RC) = 0.69\,RC$; $t_{0 \to 90\%}(RC) = 2.3\,RC$
Charging Capacitors
Driving using a sine wave (e.g., from resonant circuit)
[Figure: sinusoidal voltage source v(t) charging capacitor C through resistor R]
$E_C = \frac{1}{2} C V^2$
Energy dissipated in resistor can be made arbitrarily small if frequency $\omega \ll 1/RC$ (output signal in phase with input sinusoid)
Slide 3.11
Charging a capacitor using a current source is only one option. Other voltage and current waveforms can be imagined as well. For instance, assume that the input voltage waveform is a sinusoid rather than a step. A first-order analysis shows that this circuit outperforms the voltage-step approach for sinusoid frequencies $\omega$ below $1/RC$. The easiest way to come to this conclusion is to evaluate the circuit in the frequency domain. The RC network is a low-pass filter with a single pole at $\omega_p = 1/RC$. It is well-known that for frequencies much smaller than the pole, the output sinusoid has the same amplitude and the same phase as those of the input waveform. In other words, no or negligible current is flowing through the resistor, and hence little power is dissipated. The attractive feature of sinusoidal waveforms is that these are easily generated by resonant networks (such as LC oscillators). Again, with some notable exceptions such as power regulators, sinusoidal charging has found little industrial following.
Dynamic Power Consumption
Power = Energy per transition × Transition rate
$= C_L V_{DD}^2 f_{0 \to 1}$
$= C_L V_{DD}^2 f\, p_{0 \to 1}$
$= C_{switched} V_{DD}^2 f$
Power dissipation is data dependent – depends on the switching probability $p_{0 \to 1}$
Switched capacitance $C_{switched} = p_{0 \to 1} C_L = \alpha C_L$ ($\alpha$ is called the switching activity factor)
Slide 3.12
This brings us back to the generic case of the CMOS inverter. To translate the derived energy per operation into power, it must be multiplied by the rate of power-consuming transitions $f_{0 \to 1}$. The unit of the resulting metric is Watt (= Joules/sec). This translation leads right away to one of the hardest problems in power analysis and optimization: it requires knowledge of the "activity" of the circuit. Consider a circuit with a clock frequency $f$. The probability that a node makes a 0-to-1 transition at a given clock tick is the activity factor $\alpha$ of that node, with $0 \le \alpha \le 1$, so the rate of power-consuming transitions equals $f_{0 \to 1} = \alpha f$. As we discuss in the following slides, $\alpha$ is a function of the circuit topology and the activity of the input signals. The accuracy of power estimation depends largely upon how well the activity is known, which is most often not very well.
The derived expression can be expanded for a complete module by summing over all nodes. The average power is then expressed as $(\alpha C)V^2 f$. Here $\alpha C$ is called the effective capacitance of the module, and equals the average amount of capacitance that is being charged in the module every clock cycle.
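The summation over nodes can be written out in a few lines. The sketch below computes the effective capacitance and the resulting average dynamic power of a module; the node list, supply voltage, and clock frequency are hypothetical values picked for illustration.

```python
def effective_capacitance(nodes):
    """Sum of alpha * C over all nodes; nodes is an iterable of (alpha, capacitance)."""
    return sum(alpha * cap for alpha, cap in nodes)

def dynamic_power(nodes, vdd, f_clk):
    """Average dynamic power (alpha*C) * VDD^2 * f of a module."""
    return effective_capacitance(nodes) * vdd ** 2 * f_clk

# Hypothetical module: (activity factor, node capacitance in F).
module_nodes = [
    (0.25, 20e-15),
    (0.1875, 10e-15),
    (0.5, 4e-15),
]
print(effective_capacitance(module_nodes))       # average capacitance charged per cycle
print(dynamic_power(module_nodes, vdd=1.0, f_clk=1e9))
```

Because power scales linearly with each activity factor, an error in the assumed activity translates one-to-one into an error in the power estimate.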
Impact of Logic Function
Example: Static two-input NOR gate

A  B  Out
0  0  1
0  1  0
1  0  0
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$
Then transition probability
$p_{0 \to 1} = p_{Out=0} \times p_{Out=1} = 3/4 \times 1/4 = 3/16$
If inputs switch every cycle: $\alpha_{NOR} = 3/16$
NAND gate yields similar result
Slide 3.13
Let us, for instance, derive the activity of a two-input NOR gate (which defines the topology of the circuit). Assume that each input has an equal probability of being a 1 or a 0, and that the probability of a transition at a clock tick is 50–50 as well, ensuring an even distribution between states. With the aid of the truth table we derive that the probability of a 0→1 transition (or the activity) equals 3/16. More generally, the activity at the output node can be expressed as a function of the 1-probabilities of the inputs A and B. Since the output is 1 only when both inputs are 0, $\alpha_{NOR} = (1 - p_A)(1 - p_B)(1 - (1 - p_A)(1 - p_B))$.
Impact of Logic Function
Example: Static two-input XOR gate

A  B  Out
0  0  0
0  1  1
1  0  1
1  1  0

Assume signal probabilities
$p_{A=1} = 1/2$
$p_{B=1} = 1/2$
Then transition probability
$p_{0 \to 1} = p_{Out=0} \times p_{Out=1} = 1/2 \times 1/2 = 1/4$
If inputs switch every cycle: $p_{0 \to 1} = 1/4$
Slide 3.14
A similar analysis can be performed for an XOR gate. The observed activity is a bit higher (1/4).
Transition Probabilities for Basic Gates
As a function of the input probabilities

Gate   $p_{0 \to 1}$
AND    $(1 - p_A p_B)\, p_A p_B$
OR     $(1 - p_A)(1 - p_B)(1 - (1 - p_A)(1 - p_B))$
XOR    $(1 - (p_A + p_B - 2 p_A p_B))(p_A + p_B - 2 p_A p_B)$

Activity for static CMOS gates: $\alpha = p_0 p_1$
Slide 3.15
These results can be generalized for all basic gates.
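One way to check the table entries is to enumerate the truth table of each gate, weight every input combination by its probability, and form $\alpha = p_0 p_1$. The sketch below does this under the assumption of statistically independent inputs; the helper names are invented for this example.

```python
from itertools import product

def one_probability(gate, input_probs):
    """P(output = 1) for independent inputs with the given 1-probabilities."""
    p1 = 0.0
    for bits in product((0, 1), repeat=len(input_probs)):
        weight = 1.0
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else (1.0 - p)   # probability of this input combination
        if gate(*bits):
            p1 += weight
    return p1

def static_activity(gate, input_probs):
    """Activity of a static CMOS gate: alpha = p0 * p1."""
    p1 = one_probability(gate, input_probs)
    return (1.0 - p1) * p1

GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def closed_form(name, pa, pb):
    """Expressions from the table of transition probabilities for basic gates."""
    if name == "AND":
        p1 = pa * pb
    elif name == "OR":
        p1 = 1 - (1 - pa) * (1 - pb)
    else:  # XOR
        p1 = pa + pb - 2 * pa * pb
    return (1 - p1) * p1

for name, gate in GATES.items():
    print(name, static_activity(gate, [0.5, 0.5]), closed_form(name, 0.5, 0.5))
```

Because $\alpha = p_0 p_1$ is symmetric in $p_0$ and $p_1$, an inverting gate (NAND, NOR) has the same static activity as its non-inverting counterpart.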
Activity as a Function of Topology
XOR versus NAND/NOR
$\alpha_{NOR,NAND} = (2^N - 1)/2^{2N}$
$\alpha_{XOR} = 1/4$
[Figure: transition probability P versus fan-in for XOR and NAND/NOR gates]
Slide 3.16
The topology of the logic network has a major impact on the activity. This is nicely illustrated by comparing the activity of NAND (NOR) and XOR gates as a function of fan-in. The output-transition probability of a NAND gate goes asymptotically to zero. The probability of the output being a 0 is indeed becoming smaller with increasing fan-in. An example of such a network is a memory-address decoder. On the other hand, the activity of an XOR network is independent of fan-in. This does not bode well for the power dissipation of modules such as large en(de)cryption and coding functions, which primarily consist of XORs.
Slide 3.17
One obvious question is how the choice of logic family impacts activity and power dissipation. Some interesting global trends can be observed. Consider, for instance, the case of dynamic logic. The only power-consuming transitions in pre-charged logic occur when the output evaluates to 0, after which it has to be recharged to a high in the next pre-charge cycle. Hence, the activity factor $\alpha$ is equal to the probability of the output being equal to 0. This means that the activity is always higher in dynamic logic (compared to static), independent of the function. This does not mean per se that the power dissipation of dynamic logic is higher, as the effective capacitance is the product of activity and capacitance, the latter being smaller in dynamic logic. In general though, the higher activity outweighs the capacitance gain.
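The comparison between static and dynamic activity can be tabulated for N-input NAND and NOR gates. The sketch below assumes independent inputs that are each 1 with probability 1/2 and uses exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def p_out_zero(gate, n):
    """P(output = 0) for an n-input NAND or NOR with equiprobable, independent inputs."""
    if gate == "NAND":                       # output is 0 only when all inputs are 1
        return Fraction(1, 2 ** n)
    if gate == "NOR":                        # output is 1 only when all inputs are 0
        return 1 - Fraction(1, 2 ** n)
    raise ValueError("gate must be 'NAND' or 'NOR'")

def static_activity(gate, n):
    """Static CMOS: alpha = p0 * p1."""
    p0 = p_out_zero(gate, n)
    return p0 * (1 - p0)

def dynamic_activity(gate, n):
    """Pre-charged logic: alpha = p0, every evaluation to 0 costs a recharge."""
    return p_out_zero(gate, n)

for n in (2, 4, 8):
    print(n, static_activity("NAND", n), dynamic_activity("NAND", n), dynamic_activity("NOR", n))
```

Since $p_1 \le 1$, the dynamic activity $p_0$ can never be smaller than the static activity $p_0 p_1$, which is the "always higher" statement in the text.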
Slide 3.18
Another interesting logic family is differential logic, which may seem attractive for very low-voltage designs due to its increased signal-to-noise ratio. Differential implementations come unfortunately with an inherent disadvantage from a power perspective: not only is the overall capacitance higher, the activity is higher as well (for both static and dynamic implementations). The only positive argument is that differential implementation reduces the number of gates needed for a given function, and thus reduces the length of the critical path.
How About Dynamic Logic?
[Figure: pre-charged (dynamic) gate on supply VDD with pre-charge and evaluate phases]
Energy dissipated when effective output is zero!
or $p_{0 \to 1} = p_0$
Always larger than $p_0 p_1$!
E.g., $p_{0 \to 1}(NAND) = 1/2^N$; $p_{0 \to 1}(NOR) = (2^N - 1)/2^N$
Activity in dynamic circuits hence always higher than in static.
But ... capacitance most often smaller.

Differential Logic?
[Figure: differential gate on supply VDD with complementary outputs Out and $\overline{Out}$]
Static: Activity is doubled
Dynamic: Transition probability is 1!
Hence power always increases.

Slide 3.19
As activity is such an important parameter in the analysis of power dissipation, it is worthwhile spending some time on how to evaluate the activity of more complex logic networks. One may wonder whether it is possible to develop a "static power analyzer" along the lines of the "static timing analyzer". The latter evaluates the propagation delay of a logic network by analyzing only the topology of the network without any simulation (hence the name "static"). The odds for successful static power analysis seem favorable at first glance. Consider, for instance, the network shown on the slide, and assume that the 1- and 0-probabilities of the primary input signals are known. Using the basic gate expressions presented earlier, the output signal probabilities can be computed for the first layer of gates starting from the primary inputs. This process is then repeated until the primary outputs are reached. This process seems fairly straightforward indeed. However, there is a catch. For the basic gate equations to be valid, the inputs must be statistically independent. In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs. While this assumption is in general true for the network of the slide (assuming obviously that all the primary input signals are independent), it unfortunately rarely holds in actual circuits.
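The layer-by-layer propagation described here takes only a few lines once independence is assumed. The sketch below pushes 1-probabilities through a small two-level network; the topology (two AND gates feeding a NAND gate) and the input probabilities are assumptions made for this example and are not claimed to match the network on the slide.

```python
from math import prod

def p_and(*p_inputs):
    """1-probability at the output of an AND gate with independent inputs."""
    return prod(p_inputs)

def p_nand(*p_inputs):
    """1-probability at the output of a NAND gate with independent inputs."""
    return 1.0 - prod(p_inputs)

def p_or(*p_inputs):
    """1-probability at the output of an OR gate with independent inputs."""
    return 1.0 - prod(1.0 - p for p in p_inputs)

# Hypothetical network: first layer of gates, then the output gate.
p_x = p_and(0.1, 0.5, 0.9)       # first-layer gate driven by three primary inputs
p_y = p_and(0.5, 0.5)            # first-layer gate driven by two primary inputs
p_z = p_nand(p_x, p_y)           # valid only because X and Y share no inputs
activity_z = (1.0 - p_z) * p_z   # static CMOS activity at the output
print(p_x, p_y, p_z, activity_z)
```

The last step is legitimate only because the two first-layer gates depend on disjoint sets of primary inputs; as soon as they share an input, the product rule no longer applies.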
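Reconvergent Fan-out (Spatial Correlation)
Inputs to gates can be interdependent (correlated)
[Figure: two circuits in which X is the inverse of A; on the left, X and an independent input B drive the NAND gate Z (no reconvergence); on the right, X and A itself drive the NAND gate Z (reconvergence)]
No reconvergence: $p_Z = 1 - (1 - p_A)\, p_B$
Reconvergent: $p_Z = 1 - (1 - p_A)\, p_A$? NO! $p_Z = 1$
Must use conditional probabilities:
$p_Z = 1 - p_A \cdot p(X|A) = 1$
$p_Z$: probability that Z = 1
$p(X|A)$: probability that X = 1 given that A = 1
Becomes complex and intractable real fast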
Slide 3.20
Even if the primary inputs to a logic network are independent, the signals may become correlated or "colored" while they propagate through the logic network. This is best illustrated with a simple example, which showcases the impact of a network property called reconvergent fan-out. In the rightmost circuit, the inputs to the NAND gate Z are not independent, but are both functions of the same input signal A. To compute the output probabilities of Z, the expression derived earlier for a NAND gate is no longer applicable, and conditional probabilities need to be used. Conditional probability is the probability of some event A, given the occurrence of some other event B. Conditional probability is expressed as $p(A|B)$, and is read as "the probability of A, given B". More specifically, one can derive that $p(A|B) = p(A \cap B)/p(B)$, assuming that $p(B) \neq 0$.
While propagating these conditional probabilities through the network is theoretically possible, you may guess that the complexity of doing so for complex networks rapidly becomes unmanageable, and that indeed is the case.
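The failure of the independence assumption can be reproduced for the reconvergent example, taking X as the inverse of A and Z as the NAND of X and A. The sketch below compares the naive product rule against an exact enumeration over the single primary input; the function names are made up for illustration.

```python
def naive_p_z(p_a):
    """Product rule that wrongly treats X and A as independent NAND inputs."""
    p_x = 1.0 - p_a                  # X = NOT A
    return 1.0 - p_x * p_a

def exact_p_z(p_a):
    """Exact P(Z = 1) obtained by enumerating the primary input A."""
    total = 0.0
    for a in (0, 1):
        weight = p_a if a else 1.0 - p_a
        x = 1 - a                    # X is fully determined by A
        z = 1 - (x & a)              # Z = NAND(X, A)
        total += weight * z
    return total

for p_a in (0.1, 0.5, 0.9):
    print(p_a, naive_p_z(p_a), exact_p_z(p_a))
```

Enumeration is exact but scales as $2^n$ in the number of primary inputs, which hints at why the general problem becomes unmanageable.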
Evaluating Power Dissipation of Complex Logic
Simple idea: start from inputs and propagate signal probabilities to outputs
[Figure: example logic network annotated with 1-probabilities $p_1$; values shown: 0.1, 0.5, 0.9, 0.045, 0.25, 0.99, 0.989]
But:
– Reconvergent fan-out
– Feedback and temporal/spatial correlations
Temporal Correlations
Feedback: X is a function of itself → correlated in time
[Figure: register R and logic block connected in a feedback loop through signal X]
Temporal correlation in input streams:
01010101010101...
00000001111111...
Both streams have the same $P_{=1}$ (probability of being 1) but different switching statistics
Activity estimation the hardest part of power analysis
Typically done through simulation with actual input vectors (see later slides)
Slide 3.21
The story gets complicated even further by the occurrence of temporal correlations. A signal shows temporal correlation if a data value in the signal stream is dependent upon previous values in the stream. Temporal correlations are the norm in sequential networks, as any signal in the network is typically a function of its previous values owing to the existence of feedback. In addition, primary input signals as well may show temporal dependence. For example, in a digitized speech signal any sample value is dependent upon the previous values.
All these arguments help to illustrate that static activity analysis is a very hard problem indeed, and actually all but impossible. Hence, power analysis tools either rely on simulations of actual signal traces to derive the signal probabilities or make simplifying assumptions – for instance, it is assumed that the input signals are independent and purely random. This is discussed in more detail in Chapter 12. In the following chapters, we will most often assume that the activity of a module in its typical operation mode can be characterized by an independent parameter $\alpha$.
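The two bit streams of the temporal-correlation slide make the point concrete: they have the same signal probability yet very different numbers of power-consuming transitions. The sketch below counts both quantities; the function name and the stream length of 14 bits are choices made for this illustration.

```python
def stream_stats(bits):
    """Return (1-probability, fraction of clock ticks with a 0->1 transition)."""
    p_one = sum(bits) / len(bits)
    rising = sum(1 for prev, cur in zip(bits, bits[1:]) if (prev, cur) == (0, 1))
    return p_one, rising / (len(bits) - 1)

alternating = [0, 1] * 7            # 01010101010101
blocked = [0] * 7 + [1] * 7         # 00000001111111

print(stream_stats(alternating))
print(stream_stats(blocked))
```

A power estimate based only on the signal probability would treat the two streams identically, although the first one charges the node at nearly every other tick and the second only once.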
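Glitching in Static CMOS
Analysis so far did not include timing effects
[Figure: gate network with inputs A, B, C, internal node X, and output Z; waveforms for the input vector ABC changing from 101 to 000, with the gate delay and the resulting glitch marked]
The result is correct, but extra power is dissipated
Also known as dynamic hazards: "A single input change causing multiple changes in the output"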
Slide 3.22
So far, we have assumed that the dynamic power dissipation solely results from the charging (and discharging) of capacitances in between clock events. Some additional sources of dynamic power dissipation (i.e., proportional to the clock frequency) should be considered. Though (dis)charging of capacitors is essential to the operation of a CMOS digital circuit, dynamic hazards and short-circuit currents are not. They should be considered as parasitic, and be kept to an absolute minimum.
A dynamic hazard occurs when a single input change causes multiple transitions at the output of a gate. These events, also known as "glitches", are obviously wasteful, as a capacitor is charged and/or discharged without having an impact on the final result. In the analysis of the transition probabilities of complex logic circuits, presented in the earlier slides, glitches did not appear, as the propagation delays of the individual gates were ignored – all events were assumed to be instantaneous. To detect the occurrence of dynamic hazards, a detailed timing analysis is necessary.
Example: Chain of NAND Gates
[Figure: chain of NAND gates with outputs Out1, Out2, Out3, Out4, Out5, ...; simulated output waveforms Out1 through Out8, voltage (V, 0 to 3.0) versus time (ps, 0 to 600)]
Slide 3.23
A typical example of the effect of glitching is illustrated in this slide, which shows the simulated response of a chain of NAND gates with all inputs going simultaneously from 0 to 1. Initially, all the outputs are 1, as one of the inputs was 0. For this particular transition, all the odd bits must transition to 0, while the even bits remain at the value of 1. However, owing to the finite propagation delay, the even output bits at the higher bit positions start to discharge, and the voltage drops. When the correct input ripples through the network, the output ultimately goes high. The glitch on the even bits causes extra power dissipation beyond what is required to strictly implement the logic function. Although the glitches in this example are only partial (i.e., not from rail to rail), they contribute significantly to the power dissipation. Long chains of gates often occur in important structures such as adders and multipliers, and the glitching component can easily dominate the overall power consumption.
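What Causes Glitches?
[Figure: two realizations of F = A.B.C.D with intermediate nodes X and Y and output Z, each with a timing diagram for the input groups A,B and C,D and the nodes X, Y, Z]
Uneven arrival times of input signals of gates due to unbalanced delay paths
Solution: balancing delay paths!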
Slide 3.24
The occurrence of glitching in a circuit is mainly due to a mismatch in the path lengths in the network. If all input signals of a gate change simultaneously, no glitching occurs. On the other hand, if input signals change at different times, a dynamic hazard may develop. Such a mismatch in signal timing is typically the result of different path lengths with respect to the primary inputs of the network. This is illustrated in this slide, where two equivalent, but topologically different, realizations of the function F = A.B.C.D are analyzed. Assume that the AND gate has a unit delay. The leftmost network suffers from glitching as a result of the disparity between the arrival times of the input signals for gates Y and Z. For example, for gate Z, input D settles at time 0, whereas input Y only settles at time 2. Redesigning the network so that all arrival times are identical can dramatically reduce the number of superfluous transitions, as shown in the rightmost network.
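A unit-delay simulation is enough to expose the difference between the two realizations of F = A.B.C.D. The sketch below counts output transitions per gate after one input change; it assumes that the unbalanced network is the chain ((A.B).C).D and the balanced one the tree (A.B).(C.D), and the input vectors are picked by hand to trigger the hazard. All names are hypothetical.

```python
def count_transitions(netlist, old_inputs, new_inputs, steps=6):
    """Unit-delay simulation of a network of two-input AND gates.

    netlist maps each gate output to its two input names. Returns the number
    of output transitions per gate after the inputs change from old to new.
    """
    values = dict(old_inputs)
    values.update({gate: 0 for gate in netlist})
    # Let the network settle on the old input vector.
    for _ in range(len(netlist)):
        values.update({g: values[a] & values[b] for g, (a, b) in netlist.items()})
    values.update(new_inputs)                # all primary inputs switch at time 0
    transitions = {gate: 0 for gate in netlist}
    for _ in range(steps):
        # Every gate sees the values of the previous time step (unit delay).
        nxt = {g: values[a] & values[b] for g, (a, b) in netlist.items()}
        for gate, new_value in nxt.items():
            transitions[gate] += int(new_value != values[gate])
        values.update(nxt)
    return transitions

chain = {"X": ("A", "B"), "Y": ("X", "C"), "Z": ("Y", "D")}   # ((A.B).C).D
tree = {"X": ("A", "B"), "Y": ("C", "D"), "Z": ("X", "Y")}    # (A.B).(C.D)
old = {"A": 1, "B": 1, "C": 1, "D": 0}
new = {"A": 0, "B": 1, "C": 1, "D": 1}

print(count_transitions(chain, old, new))
print(count_transitions(tree, old, new))
```

F is 0 before and after the input change in both networks, so any transition counted at Z is a glitch: the chain output toggles twice because D arrives two gate delays before Y, whereas the balanced tree output stays quiet.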
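Short-Circuit Currents
(also called crowbar currents)
[Figure: CMOS inverter with input Vin, output Vout, load CL, and short-circuit current Isc flowing from VDD to ground; waveforms of Vin (levels VTH and VDD–VTH marked) and Isc (peak value Ipeak) versus time t]
PMOS and NMOS simultaneously ON during transition
$P_{sc} \sim f$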
Slide 3.25
So far, it was assumed that the NMOS and PMOS transistors of a CMOS gate are never ON simultaneously. This assumption is not entirely correct, as the finite slope of the input signal during switching causes a direct current path between VDD and GND for a short period of time. The extra power dissipation due to these "short-circuit" or "crowbar" currents is proportional to the switching activity, similar to the capacitive power dissipation.
Short-Circuit Currents
[Figure: inverter driving a large load, where Isc ≈ 0, and inverter driving a small load, where Isc = IMAX]
[Figure: simulated short-circuit current Isc (A, scale × 10^-4) versus time (s) for load capacitances C = 20 fF, C = 100 fF, and C = 500 fF]
Equalizing rise/fall times of input and output signals limits Psc to 10–15% of the dynamic dissipation
[Ref: H. Veendrick, JSSC’84]
Slide 3.26
The peak value of the short-circuit current is also a strong function of the ratio between the slopes of the input and output signals. This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0→1 transition at the input. Assume first that the load capacitance is very large, so that the output fall time is significantly larger than the input rise time (left side). Under those circumstances, the input moves through the transient region before the output starts to change. As the source–drain voltage of the PMOS device is approximately zero during that period, the device shuts off without ever delivering any current. The short-circuit current is close to zero. Consider now the reverse case (right side), where the output capacitance is very small, and the output fall time is substantially smaller than the input rise time. The drain–source voltage of the PMOS device equals VDD for most of the transition time, guaranteeing a maximal short-circuit current. This clearly represents the worst-case condition. The conclusions of this intuitive analysis are confirmed by the simulation results.
This analysis may lead to the (faulty) conclusion that the short-circuit dissipation is minimized by making the output rise/fall time substantially larger than the input rise/fall time. However, making the output rise/fall time too large slows down the circuit, and causes large short-circuit currents in the connecting gates. A more practical rule that optimizes the power consumption in a global way can be formulated: The power dissipation due to short-circuit currents is minimized by matching the rise/fall times of the input and output signals. At the overall circuit level, this means that rise/fall times of all signals should be kept constant within a range. Equalizing the input and output transition times of a gate is not the optimum solution for the individual gate, but keeps the overall short-circuit current within bounds (maximum 10–15% of the total dynamic power dissipation). Observe also that the impact of short-circuit current is reduced when we lower the supply voltage. In the extreme case, when $V_{DD} < V_{THn} + |V_{THp}|$, the short-circuit dissipation is completely eliminated, because the devices are never ON simultaneously.
Modeling Short-Circuit Power
Can be modeled as capacitor
$C_{SC} = k \left( a \frac{\tau_{in}}{\tau_{out}} + b \right)$
a, b: technology parameters
k: function of supply and threshold voltages, and transistor sizes
$E_{SC} = C_{SC} V_{DD}^2$
Easily included in timing and power models
Slide 3.27
As the short-circuit power is proportional to the clock frequency, it can be modeled as an equivalent capacitor: $P_{sc} = C_{SC} V_{DD}^2 f$, which then can be lumped into the output capacitance of the gate. Be aware, however, that $C_{SC}$ is a function of the input and output transition times.
Transistors Leak
Drain leakage
– Diffusion currents
– Drain-induced barrier lowering (DIBL)
Junction leakage
– Gate-induced drain leakage (GIDL)
Gate leakage
– Tunneling currents through thin oxide
Slide 3.28
Although dynamic power traditionally has dominated the power budget, static power has become an increasing concern when scaling below 100 nm. The main reasons behind this have been discussed at length in Chapter 2. Sub-threshold drain–source leakage, junction leakage, and gate leakage all play important roles, but in contemporary design it is the sub-threshold leakage that is the main cause of concern.
Sub-threshold Leakage
Off-current increases exponentially when reducing $V_{TH}$
$I_{leak} = I_0 \frac{W}{W_0} 10^{-V_{TH}/S}$
$P_{leak} = V_{DD} \cdot I_{leak}$
Slide 3.29
In Chapter 2, it was pointed out that the main reason behind the increase in drain–source leakage is the gradual reduction of the threshold voltage forced by the lowering of the supply voltages. Any reduction in threshold voltage causes the leakage current to grow exponentially. The chart illustrating this is repeated for the purpose of clarity.
Sub-Threshold Leakage
Leakage current increases with drain voltage (mostly due to DIBL)
$I_{leak} = I_0 \frac{W}{W_0} 10^{(-V_{TH} + \lambda_d V_{DS})/S}$ (for $V_{DS} > 3kT/q$)
Hence
$P_{leak} = \left( I_0 \frac{W}{W_0} 10^{-V_{TH}/S} \right) \left( V_{DD}\, 10^{\lambda_d V_{DD}/S} \right)$
Leakage power is a strong function of supply voltage
Slide 3.30
An additional factor is the increasing impact of the DIBL effect. Combining the equations for sub-threshold leakage and the influence of DIBL on $V_{TH}$, an expression for the leakage power of a gate can be derived. Observe the exponential dependence of leakage power upon both $V_{TH}$ and $V_{DD}$.
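The exponential sensitivities are easiest to appreciate numerically. The sketch below implements the sub-threshold leakage model with DIBL, $I_{leak} = I_0 (W/W_0)\, 10^{(-V_{TH} + \lambda_d V_{DS})/S}$, and the corresponding leakage power. All parameter values (reference current, swing, DIBL factor, voltages) are hypothetical and not taken from any particular technology.

```python
def leakage_current(i0, w_over_w0, vth, vds, swing, dibl):
    """Sub-threshold leakage current including the DIBL term (valid for VDS > 3kT/q)."""
    return i0 * w_over_w0 * 10 ** ((-vth + dibl * vds) / swing)

def leakage_power(i0, w_over_w0, vth, vdd, swing, dibl):
    """Leakage power of an off transistor with the full supply across it."""
    return vdd * leakage_current(i0, w_over_w0, vth, vdd, swing, dibl)

# Hypothetical parameters, for illustration only.
I0 = 1e-7        # reference current (A)
SWING = 0.1      # sub-threshold swing S (V/decade)
DIBL = 0.1       # DIBL factor lambda_d (V/V)

for vth in (0.4, 0.3, 0.2):
    print(vth, leakage_power(I0, 1.0, vth, vdd=1.0, swing=SWING, dibl=DIBL))
for vdd in (1.0, 0.8, 0.6):
    print(vdd, leakage_power(I0, 1.0, 0.3, vdd=vdd, swing=SWING, dibl=DIBL))
```

In this model, lowering $V_{TH}$ by one sub-threshold swing $S$ raises the leakage by a factor of ten, while lowering $V_{DD}$ reduces leakage power both linearly and through the DIBL exponent.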
Slide 3.31
The dependence of the leakage current on the applied drain–source voltage creates some interesting side effects in complex gates. Consider, for example, the case of a two-input NAND gate where the two NMOS transistors in the pull-down network are turned off. If the off-resistance of NMOS transistors were fixed, and not a function of the applied voltage, one would expect that the doubling of the resistance by putting two transistors in series would halve the leakage current (compared to a similar-sized inverter).
An actual analysis shows that the reduction in leakage is substantially larger. When the pull-down chain is off, node M settles to an intermediate voltage, set by balancing the leakage currents of transistors M1 and M2. This reduces the drain–source voltage of both transistors (especially of transistor M2), which translates into a substantial reduction in the leakage currents due to DIBL. In addition, the gate–source voltage of transistor M1 becomes negative, resulting in an exponential reduction of the leakage current. This is further augmented by the reverse body-biasing, which raises the threshold of M1 – this effect is only secondary though.
Using the expressions for the leakage currents derived earlier, we can determine the voltage value of the intermediate node, $V_M$, and derive an expression for the leakage current as a function of the DIBL factor $\lambda_d$ and the sub-threshold swing $S$. The resulting equation shows that the reduction in leakage current obtained by stacking transistors is indeed larger than the linear factor one would initially expect. This is called the stacking effect.
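The balance condition can be worked through numerically. The sketch below models the top transistor M1 (with $V_{GS} = -V_M$ and $V_{DS} = V_{DD} - V_M$) and the bottom transistor M2 (with $V_{GS} = 0$ and $V_{DS} = V_M$) using the sub-threshold expression with DIBL, neglects the body effect, and evaluates the intermediate voltage and the leakage reduction relative to a single off transistor. The parameter values are hypothetical, and the closed forms follow from equating the two exponents under these assumptions.

```python
def i_top(vm, vdd, vth, swing, dibl, i0=1.0):
    """Leakage of M1 (top of the stack): VGS = -vm, VDS = vdd - vm."""
    return i0 * 10 ** ((-vm - vth + dibl * (vdd - vm)) / swing)

def i_bottom(vm, vth, swing, dibl, i0=1.0):
    """Leakage of M2 (bottom of the stack): VGS = 0, VDS = vm."""
    return i0 * 10 ** ((-vth + dibl * vm) / swing)

def intermediate_voltage(vdd, dibl):
    """VM at which both leakage currents are equal (body effect neglected)."""
    return dibl * vdd / (1.0 + 2.0 * dibl)

def stack_reduction(vdd, swing, dibl):
    """Ratio I_inv / I_stack for a two-transistor stack."""
    return 10 ** ((dibl * vdd / swing) * (1.0 + dibl) / (1.0 + 2.0 * dibl))

# Hypothetical parameters, for illustration only.
VDD, VTH, SWING, DIBL = 1.0, 0.3, 0.1, 0.1
vm = intermediate_voltage(VDD, DIBL)
print(vm)                                      # intermediate node voltage (V)
print(i_top(vm, VDD, VTH, SWING, DIBL), i_bottom(vm, VTH, SWING, DIBL))
print(stack_reduction(VDD, SWING, DIBL))       # compare with the naive factor of 2
```

Even with these rough numbers the predicted reduction is several times larger than the factor of 2 that a fixed off-resistance would suggest.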
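Stacking Effect

Leakage reduction by stack size (90 nm technology):

Stack      Leakage reduction
2 NMOS     9
3 NMOS     17
4 NMOS     24
2 PMOS     8
3 PMOS     12
4 PMOS     16

[Figure: leakage currents IM1 and IM2 (A, ×10^-9) versus intermediate voltage VM (V) for a 90 nm NMOS stack; the crossing point is annotated "factor 9".]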
Slide 3.32
The mechanics of the stacking effect are illustrated with the example of two stacked NMOS transistors (as in the NAND gate of Slide 3.31) implemented in a 90 nm technology. The currents through transistors M1 and M2 are plotted as a function of the intermediate voltage VM. The actual operating point is situated at the crossing of the two load lines. As can be observed, the drain–source voltage of M2 is reduced from 1 V to 60 mV, resulting in a ninefold reduction in leakage current. The negative VGS of −60 mV for transistor M1 translates into a similar reduction.
The impact of the stacking effect is further detailed in the table, which illustrates the reduction in leakage currents for various stack sizes in 90 nm technology. The leakage reductions for both NMOS and PMOS stacks are quite impressive. The impact is somewhat smaller for the PMOS chains, as the DIBL effect is smaller for those devices. The stacking effect will prove to be a powerful tool in the fight against static power dissipation.
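The crossing of the two load lines can be located numerically. The sketch below assumes the sub-threshold model with DIBL and no body effect: the top transistor M1 sees VGS = −VM and VDS = VDD − VM, and the bottom transistor M2 sees VGS = 0 and VDS = VM. The function names are hypothetical, and the values λd = 0.068 and S = 67 mV/decade are assumptions chosen only so that the result lands near the 60 mV and ninefold figures quoted above; they are not extracted from the 90 nm data.

```python
def i_m1(vm, vdd, vth, lambda_d, s, i0=1.0):
    """Top transistor: VGS = -VM, VDS = VDD - VM (body effect neglected)."""
    return i0 * 10 ** ((-vm - vth + lambda_d * (vdd - vm)) / s)


def i_m2(vm, vth, lambda_d, s, i0=1.0):
    """Bottom transistor: VGS = 0, VDS = VM."""
    return i0 * 10 ** ((-vth + lambda_d * vm) / s)


def solve_vm(vdd, vth, lambda_d, s, tol=1e-9):
    """Bisection on VM: I_M1 falls with VM while I_M2 rises, so they cross once."""
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if i_m1(mid, vdd, vth, lambda_d, s) > i_m2(mid, vth, lambda_d, s):
            lo = mid  # M1 supplies more than M2 sinks, so node M must rise
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    vdd, vth, lambda_d, s = 1.0, 0.35, 0.068, 0.067  # assumed values
    vm = solve_vm(vdd, vth, lambda_d, s)
    i_stack = i_m2(vm, vth, lambda_d, s)
    i_inv = i_m2(vdd, vth, lambda_d, s)  # single off transistor with VDS = VDD
    print(f"VM = {vm * 1e3:.1f} mV, leakage reduction = {i_inv / i_stack:.1f}x")
```

Under these assumptions the intermediate node settles close to 60 mV and the stack leaks roughly nine times less than a single transistor with the full supply across it.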
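Stacking Effect

NAND gate (both NMOS pull-down transistors off, supply VDD):

Assume that the body-biasing effect in short-channel transistors is small.

$$I_{leak,M1} = I'_0 \, 10^{\frac{-V_M - V_{TH} + \lambda_d (V_{DD} - V_M)}{S}}$$

$$I_{leak,M2} = I'_0 \, 10^{\frac{-V_{TH} + \lambda_d V_M}{S}}$$

$$V_M \approx \frac{\lambda_d}{1 + 2\lambda_d} V_{DD}$$

$$\frac{I_{stack}}{I_{inv}} \approx 10^{-\frac{\lambda_d V_{DD}}{S} \left( \frac{1 + \lambda_d}{1 + 2\lambda_d} \right)}$$ (instead of the expected factor of 2)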
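The intermediate step between the two current expressions and the final ratio can be written out as follows. This is a sketch under the stated assumption of negligible body effect; it additionally assumes that the reference inverter leaks through one off transistor with the full VDD across it, so that I_inv = I'_0 · 10^((−VTH + λd·VDD)/S).

```latex
\begin{align*}
I_{leak,M1} &= I'_0 \, 10^{\frac{-V_M - V_{TH} + \lambda_d (V_{DD} - V_M)}{S}},
\qquad
I_{leak,M2} = I'_0 \, 10^{\frac{-V_{TH} + \lambda_d V_M}{S}} \\
% In steady state both currents are equal, so the exponents match:
-V_M - V_{TH} + \lambda_d (V_{DD} - V_M) &= -V_{TH} + \lambda_d V_M \\
(1 + 2\lambda_d)\, V_M &= \lambda_d V_{DD}
\quad\Longrightarrow\quad
V_M = \frac{\lambda_d}{1 + 2\lambda_d}\, V_{DD} \\
\frac{I_{stack}}{I_{inv}}
&= 10^{\frac{\lambda_d (V_M - V_{DD})}{S}}
= 10^{-\frac{\lambda_d V_{DD}}{S} \cdot \frac{1 + \lambda_d}{1 + 2\lambda_d}}
\end{align*}
```

Because the exponent scales with VDD/S, the reduction grows exponentially with the supply voltage, whereas a fixed off-resistance would give only a factor of 2.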
Gate Tunneling

$$I_{GD} \sim e^{-T_{ox}} e^{V_{GD}}, \qquad I_{GS} \sim e^{-T_{ox}} e^{V_{GS}}$$

Independent of the sub-threshold leakage
Exponential function of supply voltage
Modeled in BSIM4
Also in BSIM3v3 (but not always included in foundry models)
NMOS gate leakage usually worse than PMOS

[Figure: gate leakage current Igate (A) versus VDD (V) for a 90 nm CMOS inverter; the circuit sketch shows the input at 0 V, the supply at VDD, and the currents IGD, IGS, ISUB and ILeak.]
Slide 3.33
While sub-threshold currents dominate the static power dissipation, other leakage sources should not be ignored. Gate leakage is becoming significant in the sub-100 nm era. Gate leakage currents flow from one logical gate into the next one, and hence have a profoundly different impact on gate operation compared to sub-threshold currents. Whereas the latter can be reduced by increasing threshold voltages, the only way to reduce the gate leakage component is to decrease the voltage stress over the gate dielectric, which means reducing voltage levels. Similar to sub-threshold leakage, gate leakage is also an exponential function of the supply voltage.
This is illustrated by the simulation results of a 90 nm CMOS inverter. The maximum leakage current is around 100 pA, which is an order of magnitude lower than the sub-threshold current. Yet, even for these small values, the impact can be large, especially if one wants to store a charge on a capacitor for a substantial amount of time (such as in DRAMs, charge pumps, and even dynamic logic). Remember also that the gate leakage is an exponential function of the dielectric thickness.
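A back-of-the-envelope calculation shows why even 100 pA matters for charge storage. The sketch below treats the leakage as a constant current discharging a capacitor, t = C · ΔV / I; the 10 fF storage capacitance and the 100 mV tolerable droop are assumed values for illustration, and the function name is hypothetical.

```python
def droop_time(capacitance_f, allowed_droop_v, leakage_a):
    """Time (s) for a constant leakage current to change the voltage on a
    capacitor by allowed_droop_v: t = C * dV / I."""
    return capacitance_f * allowed_droop_v / leakage_a


if __name__ == "__main__":
    # Assumed: 10 fF storage node, 100 mV tolerable droop, 100 pA gate leakage.
    t = droop_time(10e-15, 0.1, 100e-12)
    print(f"hold time: {t * 1e6:.1f} us")
```

With these assumed numbers the node holds its value for only about 10 µs, which is why gate leakage is a concern for dynamic storage even when its contribution to total power is small.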
Other Sources of Static Power Dissipation
Diode (drain–substrate) reverse-bias currents
[Figure: CMOS cross-section showing the n well, the p substrate, and the n+ and p+ diffusion regions.]
• Electron-hole pair generation in depletion region of reverse-biased diodes
• Diffusion of minority carriers through junction
• For sub-50 nm technologies with highly doped pn junctions, tunneling through narrow depletion region becomes an issue
Strong function of temperature
Much smaller than other leakage components in general
Slide 3.34
Finally, junction leakage, though substantially smaller than the previously mentioned leakage contributions, should not be ignored. With the decreasing thickness of the depletion regions owing to the high doping levels, some tunneling effects may become pronounced in sub-50 nm technology nodes. The strong dependence upon temperature must again be emphasized.
Other Sources of Static Power Dissipation
Circuits with dc bias currents: sense amplifiers, voltage converters and regulators, sensors, mixed-signal components, etc.
Should be turned off when not in use, or standby current should be minimized
Slide 3.35
A majority of the state-of-the-art digital circuits contain a number of analog components. Examples of such circuits are sense amplifiers, reference voltages, voltage regulators, level converters, and temperature and leakage sensors. One property of each of these circuits is that they need a bias current for correct operation. These currents can become a sizable part of the total static power budget. To reduce their contribution, two mechanisms can be used:
(1) Trade off performance for current: reducing the bias current of an analog circuit, in general, impacts its performance. For instance, the gain and slew rate of an amplifier benefit from a higher bias current.
(2) Power management: some analog components need to operate only for a fraction of the time. For example, a sense amplifier in a DRAM or SRAM memory only needs to be ON at the end of the read cycle. Under those conditions the static power can be substantially reduced by turning off the bias when not in use. While being most effective, this technique does not always work, as some bias or reference networks need to be ON all the time, or their start-up time would be too long to be practical.
In short, every analog circuit should be examined very carefully, and bias current and ON time should be minimized. The “a bias should never be on when not used” principle rules.
Summary of Power Dissipation Sources
$$P \sim \alpha \cdot (C_L + C_{SC}) \cdot V_{swing} \cdot V_{DD} \cdot f + (I_{DC} + I_{leak}) \cdot V_{DD}$$

α – switching activity
CL – load capacitance
CSC – short-circuit capacitance
Vswing – voltage swing
f – frequency
IDC – static current
Ileak – leakage current

$$P = \frac{\text{energy}}{\text{operation}} \times \text{rate} + \text{static power}$$
Slide 3.36
From all the preceding discussions, a global expression for the power dissipation of a digital circuit can be derived. The two major components, the dynamic and static dissipation, are easily recognized. An interesting perspective on the relationship between the two is obtained by realizing that a given computation (such as a multiplication or the execution of an instruction on a processor) is best characterized by its energy cost. Static dissipation, on the other hand, is best captured as a power quantity. To determine the
relative balance between the two, the former must be translated into power by multiplying it with its execution rate, or, in other words, the activity. Hence, precise knowledge of the activity is essential if one wants to estimate the overall power dissipation. Note: in a similar way, multiplying the static power with the time period leads to a global expression for energy.
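The global expression translates directly into a small calculator. The sketch below assumes the form P ~ α·(CL + CSC)·Vswing·VDD·f + (IDC + Ileak)·VDD and, for the energy view, divides the total power by the operation rate; the function names and the example numbers are illustrative assumptions.

```python
def power_components(alpha, c_load, c_sc, v_swing, vdd, freq, i_dc, i_leak):
    """Return (dynamic, static) power in watts.

    dynamic = alpha * (CL + CSC) * Vswing * VDD * f
    static  = (IDC + Ileak) * VDD
    """
    dynamic = alpha * (c_load + c_sc) * v_swing * vdd * freq
    static = (i_dc + i_leak) * vdd
    return dynamic, static


def energy_per_operation(dynamic, static, rate):
    """Energy (J) per operation: dynamic energy per operation plus the
    static power integrated over one period (1 / rate)."""
    return (dynamic + static) / rate


if __name__ == "__main__":
    # Assumed example: 1 pF switched capacitance, 10% activity, 1 V, 1 GHz,
    # no dc bias current, 10 uA total leakage.
    dyn, stat = power_components(0.1, 1e-12, 0.0, 1.0, 1.0, 1e9, 0.0, 1e-5)
    print(f"dynamic = {dyn:.2e} W, static = {stat:.2e} W")
    print(f"energy per cycle = {energy_per_operation(dyn, stat, 1e9):.2e} J")
```

Lowering the rate in this sketch leaves the dynamic energy per operation unchanged but increases the static share, which is the balance the text describes.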
The Traditional Design Philosophy
Maximum performance is primary goal
– Minimum delay at circuit level
Architecture implements the required function with target throughput, latency
Performance achieved through optimum sizing, logic mapping, architectural transformations
Supplies, thresholds set to achieve maximum performance, subject to reliability constraints
Slide 3.37
The growing importance of power minimization and containment is revolutionizing design as we know it. Methodologies that were long-accepted have to be adjusted, and established design flows modified. Although this trend was visible already a decade ago in the embedded design world, it was only recently that it started to upend a number of traditional beliefs in the high-performance design community. Ever-higher clock frequencies were the holy grail of the microprocessor designer. Though architectural optimizations played a role in the performance improvements demonstrated over the years, reducing the clock period through technology scaling was responsible for the largest fraction.
Once the architecture was selected, the major function of the design flow was to optimize the circuitry through sizing, technology mapping, and logical transformations so that the maximum performance was obtained. Supply and threshold voltages were selected in advance to guarantee top performance.
CMOS Performance Optimization
Sizing: Optimal performance with equal fan-out per stage
Extendable to general logic cone through “logical effort”
Equal effective fan-outs (g_i C_{i+1}/C_i) per stage
Example: memory decoder
[Figure: memory decoder path from the address input through the pre-decoder and word driver to the word line, with loads CW and CL; signal labels 3 and 15.]
[Ref: I. Sutherland, Morgan Kaufmann ’98]
Slide 3.38
This philosophy is best reflected in the popular “logical effort”-based design optimization methodology. The delay of a circuit is minimized if the “effective fan-out” of each stage is made equal (and set to a value of approximately 4). Though this technique is very powerful, it also guarantees that power consumption is maximal! In the coming chapters, we will reformulate the logical-effort methodology to bring power into the equation.
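For the simplest case, a chain of inverters driving a large load, the equal-fan-out rule can be sketched as follows. The sketch assumes a logical effort of 1 per stage and ignores parasitic delay, picks the stage count that brings the per-stage fan-out closest to the target of 4, and then spreads the total effort evenly. The function name and the capacitance values are hypothetical.

```python
import math


def size_inverter_chain(c_in, c_load, target_fanout=4.0):
    """Size an inverter chain for equal effective fan-out per stage.

    Returns (number of stages, per-stage fan-out, input capacitance of
    each stage). Assumes c_load > c_in, logical effort g = 1 for every
    stage, and no parasitic delay.
    """
    path_effort = c_load / c_in
    n_stages = max(1, round(math.log(path_effort) / math.log(target_fanout)))
    fanout = path_effort ** (1.0 / n_stages)
    stage_caps = [c_in * fanout ** i for i in range(n_stages)]
    return n_stages, fanout, stage_caps


if __name__ == "__main__":
    # Assumed example: unit-sized input gate driving a load 256 times larger.
    stages, fanout, caps = size_inverter_chain(1.0, 256.0)
    print(stages, round(fanout, 3), [round(c, 2) for c in caps])
```

Every stage in the resulting chain is sized purely for speed, which illustrates why this approach leaves no room for saving power.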
Slide 3.39
That the circuit optimization philosophy of old can no longer be maintained is best illustrated by this simple example (after Shekhar Borkar from Intel). Assume a microprocessor design implemented in a given technology. Applying a single technology scaling step reduces the critical dimensions of the chip by a factor of 0.7. General scaling, which reduces the voltage by the same factor, increases the clock frequency by a factor of 1.41. If we take into account the fact that the die size typically increases (actually, used to increase is a better wording) by 14% between generations, the total capacitance of the die increases by a factor of (1/0.7) × 1.14² = 1.86. (This simplified analysis assumes that all the extra transistors are used to good effect.) The net effect is that the power dissipation of the chip increases by a factor of 1.3.
However, microprocessor designers tend to push harder than that. Over the past decades, processor frequency increased by a factor of 2 between technology generations. The extra performance improvement was obtained by circuit optimizations, such as a reduction in the logical depth. Maintaining this rate of improvement now pushes the power dissipation up by a factor of 1.8.
The situation gets even worse when the slowdown in supply voltage scaling is taken into account. Reducing the supply voltage by a factor of only 0.85 (instead of 0.7) means that the power dissipation now rises by a factor of 2.7 from generation to generation. As this is clearly unacceptable, a change in design philosophy was the only option.
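The three scaling scenarios can be reproduced with a few lines of arithmetic. The sketch below uses the factors quoted in the text (0.7 dimension scaling, 14% die growth) and the relation Power ~ C · VDD² · f; the function name is hypothetical, and the frequency factor of the first case is taken as 1/0.7.

```python
def power_scaling(vdd_factor, freq_factor, dim_factor=0.7, die_growth=1.14):
    """Per-generation power scaling factor, using Power ~ C * VDD^2 * f.

    The capacitance factor follows the text: (1 / 0.7) * 1.14^2.
    """
    cap_factor = (1.0 / dim_factor) * die_growth ** 2
    return cap_factor * vdd_factor ** 2 * freq_factor


if __name__ == "__main__":
    scenarios = {
        "traditional scaling (VDD x0.7, f x1/0.7)": (0.7, 1 / 0.7),
        "frequency doubling (VDD x0.7, f x2)": (0.7, 2.0),
        "slower voltage scaling (VDD x0.85, f x2)": (0.85, 2.0),
    }
    for name, (v, f) in scenarios.items():
        print(f"{name}: power x{power_scaling(v, f):.2f}")
```

The results round to 1.3, 1.8 and 2.7, matching the factors in the text.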
The New Design Philosophy
Maximum performance (in terms of propagation delay) is too power-hungry, and/or not even practically achievable
Many (if not most) applications either can tolerate larger latency or can live with lower-than-maximum clock speeds
Excess performance (as offered by technology) to be used for energy/power reduction
Trading off speed for power
Slide 3.40
This revised philosophy backs off from the “maximum performance at all cost” theory, and abandons the notion that clock frequency is equivalent to performance. The “design slack” that results from a less-than-maximum clock speed in a new technology can now be used to keep dynamic and static power within bounds. Performance increase is still possible, but now comes mostly from architectural optimizations, sometimes (but not always) at the expense of extra die area. Design now becomes a trade-off exercise between speed and energy (or power).
Model Not Appropriate Any Longer
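Traditional scaling model

If $V_{DD} = 0.7$ and $\text{Freq} = \frac{1}{0.7}$:
$$\text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7)^2 \times \left(\frac{1}{0.7}\right) = 1.3$$

Maintaining the frequency scaling model

If $V_{DD} = 0.7$ and $\text{Freq} = 2$:
$$\text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.7)^2 \times (2) = 1.8$$

While slowing down voltage scaling

If $V_{DD} = 0.85$ and $\text{Freq} = 2$:
$$\text{Power} = C V_{DD}^2 f = \left(\frac{1}{0.7} \times 1.14^2\right) \times (0.85)^2 \times (2) = 2.7$$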
Relationship Between Power and Delay

[Figure: two surface plots over VDD (V) and VTH (V), one of power (W, ×10^-4) and one of delay (s, ×10^-10), each with operating points A and B marked.]

For a given activity level, power is reduced while delay is unchanged if both VDD and VTH are lowered, such as from A to B

[Ref: T. Sakurai and T. Kuroda, numerous references]
Slide 3.41
This trade-off is wonderfully illustrated by this set of, by now legendary, charts. Originated by T. Kuroda and T. Sakurai in the mid 1990s, the graphs plot power and (propagation) delay of a CMOS module as a function of the supply and threshold voltages, two parameters that were considered to be fixed in earlier years. The opposing nature of optimization for performance and power becomes obvious: the highest performance happens to occur exactly where power dissipation peaks (high VDD, low VTH). Another observation is that the same performance can be obtained at a number of operational points with vastly different levels of power dissipation. The existence of these “equal-delay” and “equal-power” curves proves to be an important optimization instrument when trading off in the delay–power (or energy) space.
The Energy–Delay Space
[Figure: contour plot in the (VTH, VDD) plane showing equal-performance curves, equal-energy curves, and the energy minimum.]
Slide 3.42
Contours of identical performance or energy are more evident in the two-dimensional plots of the delay and the (average) energy per operation as functions of supply and threshold voltages. The latter is obtained by multiplying the average power (as obtained using the expressions of Slide 3.36) by the length of the clock period. Similar trends as shown in the previous slide can be observed. Particularly interesting is that a point of minimum energy can be located. Lowering the voltages beyond this point makes little sense as the leakage energy dominates, and the performance deteriorates rapidly.
Be aware that this set of curves is obtained for one particular value of the activity. For other values of α, the balance between static and dynamic power shifts, and so do the trade-off curves. Also, the curves shown here are for fixed transistor sizes.
Energy–Delay Product As a Metric
[Figure: delay, energy, and energy–delay product versus VDD (0.6 to 1.2); 90 nm technology, VTH approx. 0.35 V.]
Energy–delay product exhibits minimum at approximately 2VTH (typical unless leakage dominates)
Slide 3.43
A further simplification of the graphs is obtained by keeping the threshold voltage constant. The opposing trends between energy and delay when reducing the supply voltage are obvious. One would expect the product of the two (the energy–delay product or EDP) to show a minimum, which it does. In fact, it turns out that for CMOS designs, the minimum value of the EDP occurs at a supply voltage of approximately two times the device threshold. A better estimate is 3VTH/(3 – α), with α the fit parameter in the alpha delay model (not to be confused with the activity factor). For α = 1.4, this translates to 1.875VTH. Although this is an interesting piece of information, its meaning should not be over-estimated. As mentioned earlier, the EDP metric is only useful when equal weight is placed on delay and energy, which is rarely the case.
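The 3VTH/(3 − α) estimate can be checked numerically. The sketch below assumes that energy scales as VDD² and that delay follows the alpha-power form VDD/(VDD − VTH)^α, and it ignores leakage (consistent with the "unless leakage dominates" caveat). Setting the derivative of VDD³/(VDD − VTH)^α to zero gives 3(VDD − VTH) = α·VDD, which is the closed form. Function names are hypothetical.

```python
def edp(vdd, vth, alpha):
    """Relative energy-delay product: energy ~ VDD^2, delay ~ VDD/(VDD-VTH)^alpha.

    alpha is the fit parameter of the alpha delay model, not the activity.
    """
    return vdd ** 2 * vdd / (vdd - vth) ** alpha


def edp_optimum_closed_form(vth, alpha):
    """Supply voltage minimizing the EDP: 3*VTH / (3 - alpha)."""
    return 3.0 * vth / (3.0 - alpha)


def edp_optimum_numeric(vth, alpha, vmax=2.0, steps=20000):
    """Grid search over VDD in (VTH, vmax] for the minimum of edp()."""
    best_v, best_val = None, float("inf")
    for k in range(1, steps + 1):
        v = vth + (vmax - vth) * k / steps
        val = edp(v, vth, alpha)
        if val < best_val:
            best_v, best_val = v, val
    return best_v


if __name__ == "__main__":
    vth, alpha = 0.35, 1.4  # values quoted for the 90 nm example
    print(edp_optimum_closed_form(vth, alpha), edp_optimum_numeric(vth, alpha))
```

For VTH = 0.35 V and α = 1.4, both approaches place the minimum near 0.66 V, that is, 1.875VTH.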
Exploring the Energy–Delay Space

In energy-constrained world, design is trade-off process
♦ Minimize energy for a given performance requirement
♦ Maximize performance for given energy budget

[Figure: energy versus delay, showing the curve of Pareto-optimal designs and an unoptimized design off the curve; axis marks Dmin, Dmax, Emin, Emax.]

[Ref: D. Markovic, JSSC’04]
Slide 3.44
The above charts amply demonstrate that design for low power is a trade-off process. We have found that the best way to capture the duality between performance and energy efficiency is the energy–delay curves. Given a particular design and a set of design parameters, it is possible to derive a pareto-optimal curve that for every delay value gives the minimum attainable energy and vice versa. This curve is the best characterization of the energy and performance efficiency of a design. It also helps to redefine the design problem from “generate the fastest possible design” into a two-dimensional challenge: given a maximum delay, minimize the energy, or, given the maximum energy, find the design with the minimum delay.
We will use energy–delay curves extensively in the coming chapters. In the next chapter, we provide effective techniques to derive the energy–delay curves for a contemporary CMOS design.
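Given a set of candidate design points, the pareto-optimal curve is simply the subset that no other point beats in both delay and energy. The sketch below is a generic filter, not the optimization method of the next chapter; the function name and the sample (delay, energy) points are made up for illustration.

```python
def pareto_front(designs):
    """Return the non-dominated (delay, energy) points, sorted by delay.

    A point is kept only if its energy is strictly lower than that of
    every point with a smaller or equal delay.
    """
    front = []
    best_energy = float("inf")
    for delay, energy in sorted(designs):
        if energy < best_energy:
            front.append((delay, energy))
            best_energy = energy
    return front


if __name__ == "__main__":
    # Hypothetical design points: (delay in ns, energy in pJ).
    candidates = [(1.0, 10.0), (2.0, 6.0), (2.5, 7.0), (3.0, 3.0), (4.0, 3.5)]
    print(pareto_front(candidates))
```

Here the points (2.5, 7.0) and (4.0, 3.5) are dropped, since each is both slower and more energy-hungry than another candidate; they play the role of the "unoptimized design" lying off the curve.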
Summary
Power and energy are now primary design constraints
Active power still dominating for most applications
– Supply voltage, activity and capacitance the key parameters
Leakage becomes major factor in sub-100 nm technology nodes
– Mostly impacted by supply and threshold voltages
Design has become energy–delay trade-off exercise!
Slide 3.45
In summary, we have analyzed in detail the various sources of power dissipation in today’s CMOS digital design, and we have derived analytical and empirical models for all of them. Armed with this knowledge, we are ready to start exploring the many ways of reducing power dissipation and making circuits energy-efficient. One of the main lessons at the end of this story is that there is no free lunch. Optimization for energy most often comes at the expense of extra delay (unless the initial design is sub-optimal in both, obviously). Energy–delay charts are the best way to capture this duality.
References
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz and R.W. Brodersen, “Methods for true energy–performance optimization,” IEEE Journal of Solid-State Circuits, 39(8), pp. 1282–1293, Aug. 2004.
J. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.
T. Sakurai, “Perspectives on power-aware electronics,” Digest of Technical Papers ISSCC, pp. 26–29, Feb. 2003.
I. Sutherland, B. Sproull and D. Harris, Logical Effort, Morgan Kaufmann, 1999.
H. Veendrick, “Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits,” IEEE Journal of Solid-State Circuits, SC-19(4), pp. 468–473, 1984.
Slide 3.46
Some references . . .