METHODS FOR TRUE ENERGY-PERFORMANCE OPTIMIZATION
Naga Harika Chinta
OVERVIEW
• Introduction
• Optimization Methods
A. Gate size
B. Supply voltage
C. Threshold voltage
• Circuit-level optimization
A. Technology calibration
B. Sensitivity to gate sizing, supply and threshold voltage
C. Optimization example: 64-bit adder
• Micro-architectural optimization
A. Choosing optimal circuit topology
B. Parallelism versus pipelining
C. Choosing optimal VDD and VTH
D. Optimal level of parallelism
E. Energy-area tradeoff
• Conclusion
• References
INTRODUCTION
• Maximum achievable performance depends on the efficiency of computation per unit energy.
• Optimization is achieved by
- minimizing energy consumption.
- maximizing the amount of computation.
• Optimization is performed at 3 layers of abstraction:
- System architecture (outer layer).
- Micro-architecture (intermediate layer).
- Fixed circuit topology (inner layer).
• The inner layer deals with:
- Circuit-specific supply voltage (VDD).
- Threshold voltage (VTH).
- Gate sizing.
OPTIMIZATION METHODS
• The tradeoff achieved by tuning a design variable x is given by the energy-delay sensitivity to x, the ratio of the energy change to the delay change at the design point x0: $S_x = \left.\frac{\partial E/\partial x}{\partial D/\partial x}\right|_{x=x_0}$
• Tuning knobs in circuit optimization:
- Gate size.
- Supply voltage.
- Threshold voltage.
• Gate sizing: performed by tools such as TILOS (TImed LOgic Synthesizer), which adjusts transistor sizes and reorders series devices to meet performance targets.
• Supply voltage: supply scaling is employed in low-power DSP.
• Threshold voltage: tuning became more prominent as leakage power consumption grew.
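The sensitivity to a design variable x can be estimated numerically when only energy and delay models are available. The sketch below assumes the sensitivity is the ratio of the energy derivative to the delay derivative with respect to x; the function name `sensitivity` and the two toy models are illustrative assumptions, not part of the slides.

```python
from typing import Callable


def sensitivity(energy: Callable[[float], float],
                delay: Callable[[float], float],
                x0: float,
                rel_step: float = 1e-6) -> float:
    """Estimate (dE/dx) / (dD/dx) at x = x0 with central differences."""
    h = abs(x0) * rel_step if x0 != 0 else rel_step
    d_energy = energy(x0 + h) - energy(x0 - h)
    d_delay = delay(x0 + h) - delay(x0 - h)
    if d_delay == 0.0:
        raise ZeroDivisionError("delay does not change with x near x0")
    # The common factor 1/(2h) cancels in the ratio.
    return d_energy / d_delay


# Toy, made-up models (not from the slides): energy ~ x^2, delay ~ 1/x.
def toy_energy(x: float) -> float:
    return x ** 2


def toy_delay(x: float) -> float:
    return 1.0 / x


if __name__ == "__main__":
    print(sensitivity(toy_energy, toy_delay, 1.0))
```

For these toy models the analytic value is 2x / (-1/x^2) = -2x^3. The negative sign means that increasing x raises energy while lowering delay.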
CIRCUIT-LEVEL OPTIMIZATION:
TECHNOLOGY CALIBRATION
A. Delay Model:
• The alpha-power law model is used as the baseline gate-delay model.
B. Energy Model:
• Two components of energy are considered:
1. Switching component
2. Leakage component
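The delay and energy models named on this slide can be sketched in simplified textbook form. The expressions below are an assumption about the general shape of such models (alpha-power-law delay, activity-weighted CV^2 switching energy, exponential subthreshold leakage). The constants `k_d`, `alpha`, `i0` and `n_vt` are placeholders, not the calibrated values behind the slides, and effects such as DIBL are ignored.

```python
import math


def gate_delay(vdd, vth, c_load, k_d=1.0, alpha=1.3):
    """Alpha-power-law style delay: k_d * C * Vdd / (Vdd - Vth)**alpha."""
    if vdd <= vth:
        raise ValueError("this model assumes Vdd > Vth")
    return k_d * c_load * vdd / (vdd - vth) ** alpha


def switching_energy(vdd, c_load, activity=0.1):
    """Dynamic energy per cycle: activity * C * Vdd**2."""
    return activity * c_load * vdd ** 2


def leakage_energy(vdd, vth, width, cycle_time, i0=1.0, n_vt=0.04):
    """Subthreshold leakage current times Vdd, integrated over one cycle.

    n_vt stands for n * kT/q in volts (placeholder value).
    """
    i_leak = i0 * width * math.exp(-vth / n_vt)
    return i_leak * vdd * cycle_time


if __name__ == "__main__":
    # Lowering Vdd slows the gate but cuts switching energy quadratically.
    for vdd in (1.2, 1.0, 0.8):
        print(vdd, gate_delay(vdd, 0.3, 1.0), switching_energy(vdd, 1.0))
```

In this form the three tuning knobs appear directly: gate size through `c_load` and `width`, supply voltage through `vdd`, and threshold voltage through `vth`.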
REFERENCE DESIGN PROPERTIES
SENSITIVITY TO GATE SIZING
• Sensitivity of circuit energy to delay is evaluated for a change in the size of a gate in the i-th stage.
• The largest potential for energy reduction is found in a design sized for minimum delay.
• Decreasing gate size decreases leakage current but increases the cycle time D, which increases the leakage energy accumulated per cycle.
• To achieve equal sensitivity in all stages, the difference in effective fan-outs must increase in proportion to the energy of the gate.
SENSITIVITY TO SUPPLY VOLTAGE
• Sensitivity of circuit energy to delay is evaluated for the delay increase caused by a global supply reduction.
• A design sized for minimum delay at maximal supply voltage offers the greatest potential for energy reduction.
• A decrease in supply voltage:
- decreases both switching and leakage energy
- increases delay, which results in a negative sensitivity
SENSITIVITY TO THRESHOLD VOLTAGE
• Sensitivity of circuit energy to delay is evaluated for a change in threshold voltage.
• This sensitivity decays exponentially as Vth increases.
• Reduced Vdd is achieved:
- by lowering the threshold voltage while maintaining circuit speed, for designs with low leakage.
• Total energy is minimized:
- when the leakage and switching components are comparable to each other.
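The interplay between Vdd and Vth at fixed speed can be explored with a small sweep. The sketch below is a toy single-gate experiment under assumed models (alpha-power-law delay, CV^2 switching energy, exponential leakage): for each Vth it finds the Vdd that holds delay constant, then records switching and leakage energy. All constants are made up, so the leakage-to-switching ratio it finds depends entirely on them and need not match the figures quoted on the slides.

```python
import math

ALPHA = 1.3      # alpha-power-law exponent (assumed)
N_VT = 0.04      # n * kT/q in volts (assumed)
I0 = 50.0        # leakage scale factor, arbitrary units (assumed)
ACTIVITY = 0.1   # switching activity (assumed)


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA


def vdd_for_delay(target, vth, vdd_max=5.0):
    """Bisection: delay falls monotonically with Vdd when ALPHA >= 1."""
    lo, hi = vth + 1e-9, vdd_max
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) > target:
            lo = mid
        else:
            hi = mid
    return hi


def energy_at(vth, target):
    """Return (vdd, e_sw, e_lk) for the Vdd that meets the target delay."""
    vdd = vdd_for_delay(target, vth)
    e_sw = ACTIVITY * vdd ** 2
    e_lk = I0 * math.exp(-vth / N_VT) * vdd * target
    return vdd, e_sw, e_lk


def sweep(target, vth_values):
    """Return (total, vth, vdd, e_sw, e_lk) for the lowest-energy point."""
    best = None
    for vth in vth_values:
        vdd, e_sw, e_lk = energy_at(vth, target)
        point = (e_sw + e_lk, vth, vdd, e_sw, e_lk)
        if best is None or point[0] < best[0]:
            best = point
    return best


TARGET = delay(1.2, 0.35)                        # delay held constant
VTH_GRID = [0.05 + 0.01 * k for k in range(56)]  # 0.05 V .. 0.60 V
BEST = sweep(TARGET, VTH_GRID)

if __name__ == "__main__":
    total, vth, vdd, e_sw, e_lk = BEST
    print(f"Vth={vth:.2f} Vdd={vdd:.3f} E_lk/E_sw={e_lk / e_sw:.2f}")
```

With these constants the minimum falls strictly inside the swept range: lowering Vth allows a lower Vdd and less switching energy until the exponentially growing leakage outweighs the saving.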
64-BIT ADDER
16-bit Kogge-Stone tree adder:
• All paths are assumed to be critical paths.
• Carry signals are calculated in advance, based on the input signals.
• Carry propagation is recursive.
Boolean Equations:
Pi = Ai ⊕ Bi          Carry propagate
Gi = Ai · Bi          Carry generate
Si = Pi ⊕ Ci          Sum
Ci+1 = Gi + Pi · Ci   Carry out
• Signals P and G depend only on the input bits.
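The Boolean equations above can be exercised with a bit-level software model. The sketch below is a minimal behavioural model of a Kogge-Stone parallel-prefix adder, not a description of the circuit used in the slides; the function name `kogge_stone_add` and its parameters are assumptions for illustration.

```python
def kogge_stone_add(a, b, width=16, carry_in=0):
    """Return (sum, carry_out) of two width-bit integers."""
    # Per-bit propagate and generate depend only on the input bits.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]

    # Parallel-prefix tree: after the step with distance d, position i
    # holds group generate/propagate for the 2*d bits ending at bit i
    # (or for bits 0..i when fewer bits are available).
    grp_g, grp_p = g[:], p[:]
    dist = 1
    while dist < width:
        new_g, new_p = grp_g[:], grp_p[:]
        for i in range(dist, width):
            new_g[i] = grp_g[i] | (grp_p[i] & grp_g[i - dist])
            new_p[i] = grp_p[i] & grp_p[i - dist]
        grp_g, grp_p = new_g, new_p
        dist *= 2

    # carries[i] is the carry into bit i; carries[width] is the carry out.
    carries = [carry_in] + [
        grp_g[i] | (grp_p[i] & carry_in) for i in range(width)
    ]

    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i  # Si = Pi xor Ci
    return total, carries[width]


if __name__ == "__main__":
    print(kogge_stone_add(1234, 4321))
```

The `while` loop runs log2(width) times, which mirrors the logarithmic depth of the carry tree: 4 prefix levels for 16 bits.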
16-BIT KOGGE-STONE TREE ADDER
ENERGY MAP FOR 64-BIT ADDER
COMPARISON BETWEEN DIFFERENT
LOGIC CIRCUITS
• Gate sizing is the most effective variable at small incremental delays.
• The data on this slide compares the potential energy reduction due to W, Vdd and Vth.
COMPARISON BETWEEN DIFFERENT
LOGIC CIRCUITS (CONT’D)
• Supply voltage becomes more effective at larger incremental delays, providing larger energy savings.
• Figure: energy-delay sensitivity to each of the tuning variables in the adder example.
OPTIMAL ENERGY TRADE-OFF IN 64-BIT
ADDER
• Figure: energy-delay tradeoff curve obtained by jointly optimizing gate size, supply voltage and threshold voltage.
• Circuit optimization is effective only in a region of about 30% around the reference delay.
MICRO ARCHITECTURAL
OPTIMIZATION
• Figure: the abstraction layers in the optimization.
OPTIMAL CIRCUIT TOPOLOGY
• 64-bit ALU:
• The ALU consists of 2 registers driving a 64-bit Kogge-Stone tree adder. Register options:
1. Simple cycle latches (CL).
2. Static master-slave latch pairs.
• The output load (C_L) is due to registers, wire and bus capacitances.
• The input capacitance of the adder is fixed to reduce the search space in global optimization.
• The major difference between the register and the adder is the switching activity.
OPTIMAL CIRCUIT TOPOLOGY (CONT’D)
• Figure: energy-efficient curves for the register, the adder and the ALU.
• At the circuit level, the total energy of the ALU is minimized subject to a cycle-time constraint.
MICRO-ARCHITECTURAL
OPTIMIZATION
PARALLELISM VERSUS PIPELINING
• Threshold voltage is introduced as an additional tuning variable in the optimization.
• Micro-architectural design options: nominal, parallel and pipelined.
• Nominal: an add-compare unit that uses a 64-bit adder for both the adder (A) and the comparator (B).
• Pipelined: an extra register is inserted between blocks A and B.
• Parallel: area is doubled by operating two units in parallel.
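The three options differ in how much time each block gets at a fixed throughput, and that timing slack is what can later be traded for lower energy. The sketch below is an idealized model that assumes zero register and multiplexing overhead; the function `delay_slack` and its parameters are hypothetical names introduced for illustration.

```python
def delay_slack(t_a, t_b, period, parallel=1, pipelined=False):
    """Ratio of time available to time required at a fixed throughput.

    t_a, t_b  : delays of blocks A and B
    period    : time between successive inputs to the whole design
    parallel  : number of identical units working on alternate inputs
    pipelined : True if a register separates A and B
    """
    if min(t_a, t_b, period) <= 0 or parallel < 1:
        raise ValueError("delays and period must be positive, parallel >= 1")
    # Each of the parallel units sees a new input every `parallel` periods.
    available = period * parallel
    # With a pipeline register only the slower block limits the cycle.
    required = max(t_a, t_b) if pipelined else t_a + t_b
    return available / required


if __name__ == "__main__":
    print("nominal  ", delay_slack(1.0, 1.0, 2.0))
    print("pipelined", delay_slack(1.0, 1.0, 2.0, pipelined=True))
    print("parallel ", delay_slack(1.0, 1.0, 2.0, parallel=2))
```

For two equal blocks, pipelining and two-way parallelism both double the time available per block in this model, so any difference in energy between them comes from the overheads and leakage that the model leaves out.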
PARALLELISM VERSUS
PIPELINING(CONT’D)
• Figure: energy per operation for the three designs, compared to the nominal case.
• Conclusions from the figure:
1. Parallelism is more energy-efficient than pipelining when leakage energy is smaller than switching energy.
2. As devices become leakier, parallel implementations achieve smaller energy savings.
3. Total energy is minimized at the point where the leakage energy is about half of the active energy.
OPTIMAL VDD AND VTH
• The dependence of the critical-path delay and of the E_Lk/E_Sw ratio on Vdd and Vth yields expressions for the optimal Vdd and Vth,
where init = initial design point
opt = optimal design point
• The equations are based on:
- changing the threshold voltage first.
- then changing Vdd to achieve the desired performance.
OPTIMAL VDD AND VTH(CONT’D)
OPTIMAL LEVEL OF PARALLELISM
• Figure: energy-performance space for designs with parallelism P=2 to P=5.
• The external load used was C_L=32.
• Conclusions:
1. A parallel architecture provides an increase in performance at a very small marginal cost in energy.
2. Adding more parallel stages increases the throughput.
ENERGY-AREA TRADEOFF
• An optimal tradeoff between energy and area reduces the cost of the design.
• The design cost function C(x) combines E(x) and A(x), the total energy and area, through the weight factor β.
• Time-multiplexing is used to reduce the area at the expense of some increase in energy.
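The effect of the weight factor β can be illustrated with a small selection routine. The cost expression did not survive on this slide, so the sketch below assumes a simple weighted sum, C = E + β·A. The `Design` type, the candidate names and all numbers are made up for illustration.

```python
from typing import NamedTuple, Sequence


class Design(NamedTuple):
    name: str
    energy: float  # normalized energy per operation (made-up values)
    area: float    # normalized area (made-up values)


def cost(design: Design, beta: float) -> float:
    """Assumed cost form: energy plus beta-weighted area."""
    return design.energy + beta * design.area


def cheapest(designs: Sequence[Design], beta: float) -> Design:
    """Pick the design with the lowest assumed cost for a given beta."""
    return min(designs, key=lambda d: cost(d, beta))


CANDIDATES = [
    Design("time-multiplexed", energy=1.4, area=0.5),
    Design("nominal", energy=1.0, area=1.0),
    Design("parallel-2", energy=0.7, area=2.0),
]

if __name__ == "__main__":
    for beta in (0.0, 0.5, 10.0):
        print(beta, cheapest(CANDIDATES, beta).name)
```

Under this assumed form, a small β favours the energy-efficient parallel design, while a large β favours the time-multiplexed design with the smallest area.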
ENERGY-AREA TRADEOFF(CONT’D)
• Optimization in this case is achieved at the point where the overall chip cost is minimized.
CONCLUSIONS
REFERENCES
• Markovic, D.; Stojanovic, V.; Nikolic, B.; Horowitz, M.A.; Brodersen, R.W., "Methods for true energy-performance optimization," IEEE Journal of Solid-State Circuits, vol. 39, no. 8, August 2004.
• Dunlop, A.E.; Fishburn, J.P.; Hill, D.D.; Shugard, D.D., "Experiments using automatic physical design techniques for optimizing circuit performance," Proc. IEEE, vol. 1, 1989.
• Weste, N.H.E.; Harris, D.M., "CMOS VLSI Design: A Circuits and Systems Perspective," fourth ed., 2011.