OPTIMAL FSMD PARTITIONING FOR LOW POWER
Nainesh Agarwal and Nikitas Dimopoulos
Electrical and Computer Engineering
University of Victoria
Summary
Power and energy
Power gating
Partitioning as a means to achieve optimal power gating
What next
Computation Power and Energy
What is the minimum energy a computation can expend?
Are we there yet?
Computation Power and Energy cont’d
Feynman gives a relation between free energy and computation rate for reversible computation
– E = kT log r
– where r is the computation rate.
This means that at the limit, we may expend zero energy (when r = 1), but then the computation will take infinitely long.
For irreversible computation, E = kTb log 2
– where b is the number of bits involved in the computation (entropy)
Computation Power and Energy cont’d
In both cases, these quantities are exceptionally small.
– k = 1.3806504×10^-23 J/K
At T = 300 K, kT = 4.14×10^-21 J
A 50 W, 3 GHz processor consumes about 1.67×10^-8 J in one cycle
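The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative and not from the slides; the logarithm in the irreversible bound is taken to be the natural logarithm, which is an assumption about the slide's notation.

```python
import math

K_BOLTZMANN = 1.3806504e-23  # J/K, the value quoted on the slide


def thermal_energy(temperature_k):
    """Return kT in joules for a temperature in kelvin."""
    return K_BOLTZMANN * temperature_k


def irreversible_bound(bits, temperature_k):
    """Return E = kT * b * log 2 (natural log assumed) in joules."""
    return thermal_energy(temperature_k) * bits * math.log(2)


def energy_per_cycle(power_w, clock_hz):
    """Return the energy in joules spent in one clock cycle."""
    return power_w / clock_hz


kt = thermal_energy(300.0)
cycle = energy_per_cycle(50.0, 3e9)
print(f"kT at 300 K:            {kt:.3e} J")
print(f"one-bit bound at 300 K: {irreversible_bound(1, 300.0):.3e} J")
print(f"50 W at 3 GHz:          {cycle:.3e} J per cycle")
print(f"orders of magnitude above kT: {math.log10(cycle / kt):.1f}")
```

The last line makes the gap between a real processor cycle and the thermal energy scale explicit.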
Computation Power and Energy cont’d
DSPstone benchmarks synthesized in 180 nm and 90 nm technologies
DSPstone dynamic energy
[Figure: Dynamic Energy. Energy (J) versus Simulation Period (ns), 0 to 600 ns, for 180nm, Gen Purp 90nm, and High Perf 90nm.]
DSPstone total energy
[Figure: Total Energy. Energy (J) versus Simulation Period (ns), 0 to 600 ns, for 180nm, Gen Purp 90nm, and High Perf 90nm. Annotated value: 3.86×10^-11 J.]
Computation Power and Energy cont’d
Computational energy is far above the theoretical minimum (by more than 10 orders of magnitude)
Technological drive reduces total energy (an order of magnitude per generation)
Leakage power has become an issue
Power gating may provide efficiencies to further scale the technology
Partitioning
Controller and datapath are considered together
Problem is formulated as
– Integer Linear Programming
– Non-linear programming solved using simulated annealing
Notation
s_i represents a state of an FSMD
v_k represents a variable associated with one or more states
A variable v_k is considered to be shared between two states s_i and s_j if the variable is read and/or written at both states
T_ij is the total number of bits of all variables shared by states s_i and s_j
E_ij is 1 if there is a transition between states s_i and s_j; otherwise it is 0.
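As an illustration of this notation, the sketch below builds T_ij and E_ij for a small made-up FSMD. The variable names, bit widths, state usage, and transitions are all hypothetical, and treating a transition in either direction as E_ij = 1 is an assumption.

```python
from itertools import combinations

# Hypothetical FSMD: bit width of each variable v_k.
var_bits = {"a": 8, "b": 8, "acc": 16}
# Hypothetical usage: variables read and/or written in each state s_i.
state_vars = [{"a"}, {"a", "b"}, {"b", "acc"}, {"acc"}]
# Hypothetical transitions between states, as (i, j) index pairs.
transitions = [(0, 1), (1, 2), (2, 3), (3, 2)]


def shared_bits_matrix(state_vars, var_bits):
    """T[i][j] = total bits of all variables used in both s_i and s_j."""
    n = len(state_vars)
    T = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        bits = sum(var_bits[v] for v in state_vars[i] & state_vars[j])
        T[i][j] = T[j][i] = bits
    return T


def transition_matrix(n, transitions):
    """E[i][j] = 1 if there is a transition between s_i and s_j, else 0."""
    E = [[0] * n for _ in range(n)]
    for i, j in transitions:
        if i != j:  # self-loops never cross a partition boundary
            E[i][j] = E[j][i] = 1
    return E


T = shared_bits_matrix(state_vars, var_bits)
E = transition_matrix(len(state_vars), transitions)
for row_t, row_e in zip(T, E):
    print(row_t, row_e)
```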
ILP formulation
Minimizes the number of bits that are shared between the partitions and the number of times that control could transfer between the partitions
– s_ij is 1 if both states s_i and s_j are in the same partition. Otherwise, it is 0.
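The objective itself did not survive in this transcript. As a rough sketch, assume it sums T_ij + E_ij over every pair of states placed in different partitions, that is, over pairs with s_ij = 0. The code below evaluates that assumed cost and finds the best two-way split of a tiny hypothetical FSMD by exhaustive enumeration, which stands in for an ILP solver here and is only practical for a handful of states. The only constraint applied is that both partitions are non-empty; any further constraints of the complete formulation are not reproduced.

```python
from itertools import product


def cut_cost(T, E, side):
    """Assumed objective: sum of T_ij + E_ij over pairs split across partitions."""
    n = len(side)
    cost = 0
    for i in range(n):
        for j in range(i + 1, n):
            s_ij = 1 if side[i] == side[j] else 0
            cost += (T[i][j] + E[i][j]) * (1 - s_ij)
    return cost


def best_bipartition(T, E):
    """Exhaustively search all two-way splits with both sides non-empty."""
    n = len(T)
    best = None
    for rest in product((0, 1), repeat=n - 1):
        side = (0,) + rest  # pin state 0 to side 0 to skip mirror images
        if 1 not in side:
            continue  # everything on one side is not a partitioning
        cost = cut_cost(T, E, side)
        if best is None or cost < best[0]:
            best = (cost, side)
    return best


# Hypothetical 4-state example (symmetric matrices).
T = [[0, 8, 0, 0], [8, 0, 8, 0], [0, 8, 0, 16], [0, 0, 16, 0]]
E = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
cost, side = best_bipartition(T, E)
print(cost, side)
```

In this example the cheapest split keeps the two states that share the 16-bit variable together.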
ILP formulation - complete
[Figure: complete ILP formulation]
Simulated Annealing formulation
x_i is -1 if state s_i is in the left partition, and it is 1 if s_i is in the right partition
These quantities count the number of variable bits and transition edges shared between the two partitions
Simulated Annealing formulation
simplification steps
Observe that \sum_{i,j=1}^{S} T_{ij} is constant (the total number of variable-bits)
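The equations on these slides are not recoverable from the transcript. One form that is consistent with the ±1 encoding of x_i and with the constant-sum observation is sketched below. The names B and C, the factor 1/4, and the assumption that T and E are symmetric are not from the original.

```latex
% (1 - x_i x_j)/2 equals 1 when s_i and s_j are in different partitions, 0 otherwise.
\begin{align}
B(x) &= \frac{1}{4}\sum_{i,j=1}^{S} T_{ij}\,(1 - x_i x_j)
      = \frac{1}{4}\sum_{i,j=1}^{S} T_{ij} \;-\; \frac{1}{4}\sum_{i,j=1}^{S} T_{ij}\, x_i x_j \\
C(x) &= \frac{1}{4}\sum_{i,j=1}^{S} E_{ij}\,(1 - x_i x_j)
      = \frac{1}{4}\sum_{i,j=1}^{S} E_{ij} \;-\; \frac{1}{4}\sum_{i,j=1}^{S} E_{ij}\, x_i x_j
\end{align}
```

Because the first term on each right-hand side does not depend on x, minimizing B(x) + C(x) under this assumed form amounts to maximizing \sum_{i,j} (T_{ij} + E_{ij})\, x_i x_j.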
Simulated Annealing formulation
Minimizes both the shared bits and the transition edges.
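A minimal simulated annealing sketch for a two-way split is given below, in Python rather than the MATLAB the authors used. The cost function, the geometric cooling schedule, the single-state flip move, and the rule that rejects moves which empty a partition are all assumptions for illustration. W is taken to be T_ij + E_ij for a hypothetical four-state FSMD.

```python
import math
import random


def cut_cost(W, x):
    """Sum of W[i][j] over pairs in different partitions (x[i] * x[j] == -1)."""
    n = len(x)
    return sum(W[i][j] * (1 - x[i] * x[j]) // 2
               for i in range(n) for j in range(i + 1, n))


def anneal(W, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Return (best_x, best_cost) found by flipping one state at a time."""
    rng = random.Random(seed)
    n = len(W)
    x = [1 if i % 2 else -1 for i in range(n)]  # arbitrary alternating start
    cost = cut_cost(W, x)
    best_x, best_cost = x[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        x[i] = -x[i]
        if len(set(x)) < 2:  # reject a move that empties one partition
            x[i] = -x[i]
            continue
        new_cost = cut_cost(W, x)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_x, best_cost = x[:], cost
        else:
            x[i] = -x[i]  # undo the rejected flip
    return best_x, best_cost


# Hypothetical weights W[i][j] = T_ij + E_ij for four states.
W = [[0, 9, 0, 0], [9, 0, 9, 0], [0, 9, 0, 17], [0, 0, 17, 0]]
best_x, best_cost = anneal(W)
print(best_x, best_cost)
```

Tracking the best solution seen so far means the returned cost is never worse than the starting split, whatever the random sequence.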
Evaluation
Implemented four integer algorithms
– 8-bit counter
– 5/3 wavelet transform using lifting
– multiplierless approximation to the eight-point Discrete Cosine Transform (DCT)
– Integer transform from the H.264 standard
Used CoDeL to implement the designs.
Trace data were obtained from simulations using Synopsys
The ILP model was solved using the CPLEX solver included in the AIMMS modeling environment
The simulated annealing used MATLAB
Evaluation cont’d
Power savings were estimated (no partitioned design implementation yet)
– The static power savings depend on the size of the sequential logic and the portion of time spent in each partition.
– The dynamic power savings depend on the number of bits that are not clocked while a partition is not powered, offset by the overhead due to data communication when the active partition changes.
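The estimation formulas themselves are not available in this transcript. As a purely illustrative sketch of the static case, assume leakage is proportional to the number of sequential bits in a partition and that the idle partition is fully gated off with no gating overhead. Both assumptions, and the function name, are hypothetical rather than the authors' model.

```python
def static_savings(bits_left, bits_right, time_left):
    """Fraction of static power saved when the idle partition is gated off.

    bits_left, bits_right: sequential bits in each partition (leakage proxy).
    time_left: fraction of execution time spent in the left partition.
    """
    total = bits_left + bits_right
    time_right = 1.0 - time_left
    # The left partition saves its leakage while the right one runs, and vice versa.
    return (bits_left * time_right + bits_right * time_left) / total


print(static_savings(32, 32, 0.5))
print(static_savings(48, 16, 0.9))
```

Under this model, a design that spends nearly all of its time in one large partition saves little, which is the kind of situation where partitioning offers no benefit.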
Evaluation (Static Power Savings)
[Figure: static power savings]
Evaluation (Dynamic Power Savings)
[Figure: dynamic power savings]
Results (ILP)
[Figure: ILP results]
Results (Simulated Annealing)
[Figure: simulated annealing results]
Discussion
Results show that partitioning the controller and datapath could potentially save up to 50% of static power
Some circuits could not be partitioned (the DWT includes one tight loop where it spends more than 90% of the time)
Simulated annealing and ILP (for the partitioned circuits) give identical results.
Simulated annealing is much faster.
Future
Extend methodology to more than 2 partitions
Implement the partitioned FSMD machines and confirm the realized power savings
Lower energy!