8/10/2019 Leakage Pres
1/44
Leakage Modeling and Reduction
Amit Agarwal, Lei He et al.
Presenters: Qun Gu, Ho-Yan Wong
Courtesy of Lei He
Outline
Introduction
Circuit level leakage reduction
System level leakage reduction
Coupled leakage and thermal simulation and management
Power Trends
Circuit Power
Dynamic Power: determined by circuit performance requirements, etc. Its share of total power is getting smaller.
Short-Circuit Power: both the PU and PD circuits partially conduct. Small percentage. (
Leakage Power Sources
Subthreshold leakage
Reverse Biased Junction BTBT Leakage
Gate Leakage
[Figure: MOSFET cross-section (gate, source, drain, n+ regions, bulk) marking subthreshold leakage, gate leakage, and reverse biased junction BTBT leakage]
Leakage Dependences
Circuit Techniques to Reduce Leakage
Design Time Techniques:
Dual threshold CMOS
Run Time Techniques, Standby Leakage Reduction:
Natural Transistor Stacks
Sleep Transistor (MTCMOS)
Forward/Reverse Body Biasing (VTCMOS)
Run Time Techniques, Active Leakage Reduction:
Dynamic Vth Scaling (DVTS)
Dual Threshold CMOS
How?
Low Vth for critical paths
High Vth for non-critical paths
Concerns:
The assignment is not straightforward. Sometimes a trade-off exists between high Vth and low Vth assignments.
Varying Vth is not always successful at low supply voltages.
Increasing the number of critical paths will sometimes hurt circuit performance.
Approaches to adjusting Vth in fabrication:
Adjustment of tox (the higher the tox, the higher the Vth)
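The low-Vth/high-Vth split can be illustrated with a small greedy assignment. This is only a sketch under stated assumptions, not the method from the cited work: the gate names, the delay numbers, and the path list are hypothetical, and the greedy order is an arbitrary choice. A gate is moved to high Vth only if the worst path delay still meets the target.

```python
# Hypothetical gate data: name -> (low-Vth delay, high-Vth delay), arbitrary units.
GATE_DELAY = {
    "g1": (1.0, 1.4),
    "g2": (1.0, 1.4),
    "g3": (1.0, 1.4),
    "g4": (1.0, 1.4),
}

# Hypothetical timing paths, each a list of gates in series.
PATHS = [["g1", "g2", "g3"], ["g4"], ["g1", "g4"]]


def worst_delay(high_vth):
    """Longest path delay when the gates in `high_vth` use the high threshold."""
    def gate_delay(gate):
        low, high = GATE_DELAY[gate]
        return high if gate in high_vth else low

    return max(sum(gate_delay(g) for g in path) for path in PATHS)


def assign_dual_vth(target_delay):
    """Greedily move gates to high Vth while the target delay is still met."""
    high_vth = set()
    for gate in sorted(GATE_DELAY):
        trial = high_vth | {gate}
        if worst_delay(trial) <= target_delay:
            high_vth = trial  # this gate has enough slack to be slowed down
    return high_vth


if __name__ == "__main__":
    # Use the all-low-Vth critical path delay as the timing target.
    target = worst_delay(set())
    print(sorted(assign_dual_vth(target)))
```

With these numbers only the gate that sits off the critical path ends up with high Vth, which mirrors the concern on the slide: every gate moved to high Vth eats slack and can create new critical paths.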
Natural Transistor Stacks
How?
Reduce the leakage by stacking the devices.
Concerns:
Trade-off between speed and power
The leakage reduction is determined by the data pattern
Trade-off with other leakage power (gate leakage)
Sleep Transistor (MTCMOS)
How?
Insert an extra series-connected transistor (a sleep transistor with high Vth) in the PU/PD path of a gate and turn it off in the standby mode of operation.
Disadvantages:
Increased area and delay
Data retention problem
Hard to turn on completely at very low supply voltages
Improvements for MTCMOS -- VRC
Virtual power/ground Rails Clamp (VRC)
Solves the data retention problem with diodes
Virtual rail level changes are clamped
Allows data to be retained in SRAM arrays
Alternative: Super cutoff CMOS (SCCMOS), with low Vth
In standby mode, the PMOS gate is at Vcc+0.4 V and the NMOS gate is at Vss-0.4 V to fully cut off leakage.
Forward/Reverse Body Biasing (VTCMOS)
RBB (Reverse Body Bias): zero body bias in active mode, a deep reverse bias in standby mode.
Disadvantages:
Increases PN junction reverse leakage
Technology scaling worsens short channel effects and weakens the Vth modulation capability
FBB (Forward Body Bias): high Vth in standby mode, forward body biasing to achieve better current drive in active mode.
Disadvantages:
Larger junction capacitance
High body effect for stacked devices
Technology improvements for high Vth:
Different doping profile
Higher work function materials
Dynamic Vth Scaling (DVTS)
The lowest Vth is delivered (NBB, no body bias) when the highest performance is required.
When the performance demand is low, the clock frequency is lowered and Vth is raised via RBB to reduce the run time leakage power dissipation.
How?
When the critical path replica frequency is less than the reference CLK, adjust the bias to decrease Vth.
Otherwise, adjust the bias to increase Vth.
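The DVTS control rule can be sketched as one step of a feedback loop. This is a minimal illustration, assuming the bias is represented as a reverse body bias magnitude in volts (0 V meaning NBB and the lowest Vth); the step size and the maximum bias are hypothetical values, not figures from the slides.

```python
def dvts_step(replica_freq_hz, ref_clk_hz, reverse_bias_v,
              step_v=0.05, max_reverse_bias_v=1.0):
    """Return the updated reverse body bias magnitude after one control step.

    reverse_bias_v = 0.0 corresponds to NBB (lowest Vth); a larger reverse
    bias raises Vth. step_v and max_reverse_bias_v are illustrative values.
    """
    if replica_freq_hz < ref_clk_hz:
        # Critical path replica is too slow: relax the bias to decrease Vth.
        return max(0.0, reverse_bias_v - step_v)
    # Replica meets the reference clock: raise Vth to save leakage.
    return min(max_reverse_bias_v, reverse_bias_v + step_v)


if __name__ == "__main__":
    bias = 0.5
    for replica_hz in (0.9e9, 0.95e9, 1.1e9):
        bias = dvts_step(replica_hz, 1.0e9, bias)
        print(f"replica={replica_hz:.2e} Hz -> reverse bias {bias:.2f} V")
```

Clamping at 0 V keeps the sketch within the NBB-to-RBB range described above; a real controller would also need to wait for the bias to settle before sampling the replica again.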
Results:
Process Variation and Leakage
IDSAT and IOFF variation measured (150nm process).
Variation sources:
Channel length
Transistor width
Oxide thickness
Flat-band voltage
Random dopant effect
Effects of a larger spread of leakage:
Robustness of logic circuits
Circuit design margin
Circuit techniques for compensating process variation:
Adaptive body biasing for process compensation
Process variation compensation in dynamic circuits
Adaptive Body Biasing for Process Compensation
Due to the worsening parameter fluctuation:
Some dies may not meet the target frequency.
Others exceed the leakage power constraints.
How?
Slow dies that fail to meet the desired frequency can be forward body biased to improve performance, at the cost of more leakage power.
Dies with excess leakage can be reverse body biased to meet the leakage power specification.
Effects:
Adaptive body bias reduces the spread of the die frequency distribution by 7X compared to a conventional zero body bias.
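The per-die decision can be written as a small classification function. This is a sketch under assumptions: the labels "FBB", "RBB", and "ZBB" (zero body bias) and the function name are illustrative, and the slides do not say what to do with a die that is both slow and leaky, so the sketch simply gives priority to meeting frequency.

```python
def choose_body_bias(freq_hz, leakage_w, target_freq_hz, leakage_limit_w):
    """Pick a body bias direction for one die (illustrative policy only)."""
    if freq_hz < target_freq_hz:
        # Slow die: forward bias lowers Vth and speeds it up, at a leakage cost.
        # Assumption: frequency takes priority even if the die is also leaky.
        return "FBB"
    if leakage_w > leakage_limit_w:
        # Fast but leaky die: reverse bias raises Vth and cuts leakage.
        return "RBB"
    return "ZBB"  # die already meets both specifications


if __name__ == "__main__":
    dies = [(0.95e9, 0.8), (1.10e9, 1.6), (1.02e9, 0.9)]  # hypothetical (Hz, W)
    for freq, leak in dies:
        print(freq, leak, choose_body_bias(freq, leak, 1.0e9, 1.2))
```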
Process Variation Compensation in Dynamic Circuits (I)
Dynamic circuits need keepers to compensate for leakage current and keep data.
Considerations for keeper size:
An unnecessarily large keeper hurts circuit performance.
Dies with excess leakage cannot meet the robustness requirements without a large enough keeper.
Programmable keeper size scheme:
A desired effective keeper width can be chosen among {0, W, 2W, 7W} according to the control bit.
Process Variation Compensation in Dynamic Circuits (II)
Simulation results:
5X reduction in the number of dies failing robustness and 10% improvement in average performance.
The variation spread of the robustness and delay distributions is reduced by 55% and 35%, respectively.
System Level Leakage Reduction
Motivation
Leakage characteristics and reduction
Coupled leakage and thermal simulation and management
Power and thermal simulation
Dynamic power and thermal management
Vdd scaling with cooling selection
Motivation
Leakage current has increased due to scaling of Vt, L, and tox
Leakage power becomes more important due to high-leakage devices and low activity rates
Leakage power depends strongly on temperature
Power States at System Level
3 power states defined at system level:
1. Active mode: circuit in operation; P = Pd + Ps
2. Standby mode: circuit is idle but ready to execute; P = Ps
3. Inactive mode: circuit is deactivated by leakage reduction techniques; P < Ps
System Level Leakage Power Modeling
Early model:
Ps = Vdd * NFET * kdesign * Ileakage
Later model, with application of 2 leakage power reduction techniques (later):
Ps = Vdd * Ngate * Iavg
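Both models are simple products, so they are easy to evaluate side by side. The sketch below only restates the two formulas; the function names and every numeric value in the usage section are hypothetical and chosen just to show the units (volts, amperes, watts).

```python
def static_power_early(vdd_v, n_fet, k_design, i_leakage_a):
    """Early model: Ps = Vdd * NFET * kdesign * Ileakage (watts)."""
    return vdd_v * n_fet * k_design * i_leakage_a


def static_power_later(vdd_v, n_gate, i_avg_a):
    """Later model: Ps = Vdd * Ngate * Iavg (watts)."""
    return vdd_v * n_gate * i_avg_a


if __name__ == "__main__":
    # Hypothetical numbers: 1.2 V supply, one million gates, 1 nA average
    # leakage current per gate.
    print(static_power_later(1.2, 1_000_000, 1e-9))
    # Hypothetical numbers for the early model: four million FETs, an
    # empirical design factor of 0.5, 1 nA leakage per FET.
    print(static_power_early(1.2, 4_000_000, 0.5, 1e-9))
```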
Leakage Power Characteristics
Minimum Idle Time (M.I.T.)
$\text{M.I.T.} = \dfrac{E_{s\text{-}i} + E_{i\text{-}s} - P_i \,(t_{s\text{-}i} + t_{i\text{-}s})}{P_s - P_i}$
Idle Period
Leakage power reduction is useful only
when Idle Period > M.I.T.
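The M.I.T. is the break-even idle length: staying in standby for a time T costs Ps * T, while going inactive costs the two transition energies plus Pi for the rest of the period. Setting the two equal gives M.I.T. = (Es-i + Ei-s - Pi * (ts-i + ti-s)) / (Ps - Pi). The sketch below evaluates that expression; the function names and the example numbers are hypothetical.

```python
def minimum_idle_time(e_si, e_is, t_si, t_is, p_s, p_i):
    """Break-even idle time in seconds.

    e_si, e_is: transition energies standby->inactive and inactive->standby (J)
    t_si, t_is: the corresponding transition times (s)
    p_s, p_i:   standby and inactive power (W), with p_i < p_s
    """
    if p_i >= p_s:
        raise ValueError("inactive power must be below standby power")
    return (e_si + e_is - p_i * (t_si + t_is)) / (p_s - p_i)


def should_deactivate(idle_period, mit):
    """Leakage reduction pays off only when the idle period exceeds M.I.T."""
    return idle_period > mit


if __name__ == "__main__":
    # Hypothetical values: 1 uJ per transition, 1 us per transition,
    # 1 W standby power, 0.1 W inactive power.
    mit = minimum_idle_time(1e-6, 1e-6, 1e-6, 1e-6, 1.0, 0.1)
    print(f"M.I.T. = {mit:.2e} s")
    print(should_deactivate(5e-6, mit), should_deactivate(1e-6, mit))
```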
Runtime Leakage Reduction for Caches
Caches dissipate a large amount of leakage power due to large SRAM array structures
Different techniques have been developed to reduce L1 cache Ps, e.g. DRI, SWAY
The basic principle is to dynamically turn off part of the cache array structure
Ps Reduction for L2 Caches
L2 cache has a much larger miss penalty, so the approaches for L1 cannot be directly applied
Use VRC to reduce Ps, and use time-out based control mechanisms to shut down the L2 cache data portion
The time-out threshold can be fixed (FTO), dynamic, or set by feedback control (FCTO)
Ps Reduction for L2 Caches (cont'd)
FTO: the time-out threshold is set to M.I.T.
FCTO: adjust the time-out threshold with a proportional-integral (PI) feedback controller
Update the time-out threshold according to:
N: L2 cache miss rate in the previous time window
Told: time-out threshold in the previous time window
New time-out threshold T = Told + (N - Setpoint) * Gain
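The FCTO update rule, read as T = Told + (N - Setpoint) * Gain, can be sketched as a one-line controller step. The function name, the lower clamp, and the example numbers are assumptions for illustration; the slides do not give the setpoint or gain values.

```python
def update_timeout_threshold(t_old, miss_rate, setpoint, gain, t_min=0.0):
    """Return the new time-out threshold for the next time window.

    A miss rate above the setpoint lengthens the time-out (shut down less
    eagerly); a miss rate below it shortens the time-out. The clamp at
    t_min is an added safeguard, not part of the formula on the slide.
    """
    return max(t_min, t_old + (miss_rate - setpoint) * gain)


if __name__ == "__main__":
    threshold = 1000.0  # cycles, hypothetical starting point
    for miss_rate in (0.05, 0.03, 0.01):  # hypothetical per-window miss rates
        threshold = update_timeout_threshold(threshold, miss_rate,
                                             setpoint=0.02, gain=10000.0)
        print(f"miss rate {miss_rate:.2f} -> threshold {threshold:.0f} cycles")
```

Because each update adds the scaled error to the previous threshold, the threshold itself accumulates past errors, which is how this incremental form provides integral action.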
Circuits for FCTO
[Figure: FCTO circuit. The request address (tag, index, block offset) is checked for a tag match against the tag portion, and a mux selects the data word from the data portion. The hit/miss signal feeds the timeout controller, where a counter is compared with the threshold register to generate the wakeup/shutdown signals, and the threshold controller, which computes the threshold output from Nmiss, the setpoint, and the gain.]
Comparison of L2 Leakage Reduction
            Power reduction (%)            Performance penalty (%)
Benchmark   FTO    FCTO   SWAY   DRI       FTO    FCTO   SWAY   DRI
go          52.21  63.80  57.55  56.79     1.06   1.10   9.95   7.39
li          12.92  27.87  26.64  26.56     0.93   1.07   7.28   7.71
equake      35.75  48.61  46.40  45.71     0.84   1.01   9.73   10.58
art         0.07   2.20   2.17   2.18      0.37   0.92   3.18   3.14
The time-out schemes (FTO and FCTO) achieve a much smaller performance penalty
Targeting 1% performance loss, FCTO obtains more power reduction than FTO.
System Level Leakage Reduction
Motivation
Leakage characteristics and reduction
Coupled leakage and thermal simulation and management
Power and thermal simulation
Dynamic power and thermal management
Vdd scaling with cooling selection
Temperature Aware Computing
[Figure: temperature-aware simulation flow. Initial conditions (T, delay), uArch, floorplan/packaging, and workload (e.g. Spec 2k) feed a coupled power and thermal simulator (e.g. PTscalar, PowerImpact), which combines a performance simulator (e.g. SimpleScalar, IMPACT), dynamic power estimation (e.g. Wattch), and leakage estimation, and produces adjusted conditions (T, delay). Temperature-aware architecture techniques: DVS, DTM, reconfigurability, power model, GALS, etc.]
Leakage Model with Temperature Scaling
Exponential scaling based on BSIM3v3
Logic circuits in ITRS 100nm technology:
[Equation: P(T, Vdd) = Ngates * Vdd * Iavg(T, Vdd), where Iavg(T, Vdd) contains an exp(.../T) temperature term; the fitted coefficients are not legible in this copy]
Memory units in ITRS 100nm technology:
[Equations: P(T, Vdd) as functions of words, wordsize, T, and Vdd, each with an exp(.../T) temperature term; the fitted coefficients are not legible in this copy]
Thermal Modeling
For the lumped RC thermal circuit
Thermal resistance Rth: the ability to remove heat to the ambient in steady-state condition
Thermal capacitance Cth: captures the delay between a change in power and the corresponding change in temperature
Thermal time constant = Rth * Cth
A distributed model is needed for an accurate solution
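The lumped RC model corresponds to the differential equation Cth * dT/dt = P - (T - Tambient) / Rth, whose steady state is Tambient + P * Rth. The sketch below integrates it with forward Euler; the function name and all numeric values are hypothetical, and it covers only the single-node lumped case, not the distributed model.

```python
def simulate_lumped_rc(power_w, r_th, c_th, t_ambient, t_initial, dt, steps):
    """Forward-Euler integration of Cth * dT/dt = P - (T - Tambient) / Rth.

    Returns the list of temperatures, starting with t_initial.
    """
    temps = [t_initial]
    temp = t_initial
    for _ in range(steps):
        heat_removed = (temp - t_ambient) / r_th  # W flowing to the ambient
        temp += dt * (power_w - heat_removed) / c_th
        temps.append(temp)
    return temps


if __name__ == "__main__":
    r_th, c_th = 0.8, 1.0    # hypothetical values in C/W and J/C
    tau = r_th * c_th        # thermal time constant
    dt = 0.005 * tau         # a small fraction of the time constant
    temps = simulate_lumped_rc(power_w=50.0, r_th=r_th, c_th=c_th,
                               t_ambient=45.0, t_initial=45.0,
                               dt=dt, steps=5000)
    print(f"after {5000 * dt:.1f} s: {temps[-1]:.2f} C "
          f"(steady state {45.0 + 50.0 * r_th:.2f} C)")
```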
Coupled Power and Thermal Simulation
A simulation time step ts < 0.5% of the time constant (~10^6 cycles) gives negligible temperature and power calculation errors
Clock gating reduces dynamic power and also leakage energy
Leakage energy changes with operating temperature
Leakage Power at Different Temperatures
uP similar to DEC Alpha 21264, with clock gating
Leakage differs by up to 2X between 80 °C and 110 °C
It differs for different applications too.
Coupled thermal and power simulation is a must
[Figure: normalized total power, split into dynamic power and leakage power, at temperatures of 35, 85, and 110 °C and "Dep", for benchmarks art and gcc; 100nm, 3.33GHz, 1.2V]
Thermal Runaway
Thermal runaway is caused by the positive feedback loop between on-resistance, temperature, and power
It is also a result of the interaction between leakage power and temperature
Component temperature rises -> leakage power rises exponentially -> temperature rises
If cooling is not adequate, both keep increasing
Thermal Runaway (cont'd)
Assuming no throttling and constant power consumption, the condition for thermal runaway is equivalent to d2T/dt2 > 0
The lowest temperature that meets the TR criterion is the runaway temperature
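The d2T/dt2 > 0 criterion can be worked through for a simple case. The sketch assumes a lumped thermal model, Cth * dT/dt = Pd + Pleak(T) - (T - Tambient) / Rth with constant dynamic power Pd, and a hypothetical exponential leakage model Pleak(T) = P0 * exp(k * (T - T0)); neither the leakage form nor the numbers come from the slides. Differentiating once more gives Cth * d2T/dt2 = (dPleak/dT - 1/Rth) * dT/dt, so while the temperature is rising, the criterion holds exactly when the leakage slope exceeds 1/Rth.

```python
import math


def leakage_slope(temp, p0, k, t0):
    """dPleak/dT for the assumed model Pleak(T) = p0 * exp(k * (T - t0))."""
    return p0 * k * math.exp(k * (temp - t0))


def runaway_temperature(r_th, p0, k, t0):
    """Lowest temperature where the leakage slope reaches 1 / r_th.

    Above this temperature, a rising temperature accelerates (d2T/dt2 > 0)
    under the assumptions stated in the text above.
    """
    return t0 + math.log(1.0 / (r_th * p0 * k)) / k


if __name__ == "__main__":
    # Hypothetical numbers: 10 W of leakage at 80 C, doubling roughly
    # every 30 C (k = ln 2 / 30), and Rth = 0.8 C/W.
    k = math.log(2.0) / 30.0
    t_run = runaway_temperature(r_th=0.8, p0=10.0, k=k, t0=80.0)
    print(f"runaway temperature under these assumptions: {t_run:.1f} C")
```

A lower thermal resistance pushes the runaway temperature up, which is the quantitative version of the statement that runaway happens when cooling is not adequate.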
Dynamic Power and Thermal Management (DPTM)
Goal: maximize throughput subject to a maximum on-chip temperature constraint
For each time window = X cycles, stop or throttle instruction fetch in cycles
0
Dynamic Power and Thermal Management (DPTM)
Fetch toggling: toggles the I-cache, I-TLB, branch prediction, and decode units
Dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS): adjust the clock frequency and Vdd; stall
Activity migration: move activities to another copy of the component at a lower temperature
Need for Temperature Dependent Leakage Model
Dynamic thermal management using fetch toggling with a PI feedback controller
Implemented 2 models: simple (fixed Ps) and accurate (Ps is temperature dependent)
Validation of PI-based DPTM
Compared with two practices:
No dynamic management: lower Vdd to avoid thermal violations
Cooling down: if the thermal threshold is reached, stop the whole processor until the maximum temperature is X °C lower than the threshold
X = 5 in our experiments
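The cooling-down baseline is a hysteresis rule: stop at the threshold, resume once the maximum temperature has dropped X degrees below it. A minimal sketch follows, with a hypothetical function name and a hypothetical temperature trace; only X = 5 is taken from the slide.

```python
def cooling_down_policy(max_temp_c, stopped, threshold_c, x_c=5.0):
    """Return True if the processor should be stopped for the next interval.

    stopped is the current state; x_c is the hysteresis margin X.
    """
    if stopped:
        # Stay stopped until the temperature is X degrees below the threshold.
        return max_temp_c > threshold_c - x_c
    # Running: stop as soon as the thermal threshold is reached.
    return max_temp_c >= threshold_c


if __name__ == "__main__":
    stopped = False
    for temp in (78.0, 80.0, 79.0, 76.0, 75.0, 74.0):  # hypothetical trace
        stopped = cooling_down_policy(temp, stopped, threshold_c=80.0)
        print(f"{temp:.0f} C -> {'stopped' if stopped else 'running'}")
```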
System Performance
DPTM by feedback control may improve throughput by up to 11% compared to the no-DPTM case
DPTM allows designing for the common workload rather than the worst case => thermal speculation
[Figure: throughput (BIPS) vs. Vdd (V) from 1 to 1.3 V for feedback control (Max T = 80 °C), simple cooling down (Max T = 80 °C), and no management (Max T = 110 °C), with the maximum throughput marked]
Active Cooling
Direct water-spray cooling
Thermal resistance of 0.067, compared to 0.8 for a conventional heatsink
Microchannel with liquid coolant,
Impacts of Water Cooling
Increases the maximum throughput by 30%
Improves power efficiency by 9% and slows down the decay of power efficiency
[Figure: throughput (BIPS) and power efficiency (BIPS/W) vs. Vdd (V) from 1 to 1.8 V, for water cooling (Max T = 60 °C) and air cooling (Max T = 80 °C)]
References
Amit Agarwal et al., Leakage Mechanisms and Leakage Control for Nano-Scale CMOS Circuits, Purdue University.
Lei He et al., System Level Leakage Reduction Considering the Interdependence of Temperature and Leakage, UCLA.