System-Level Optimization of Switched-Capacitor VRM and Core for Sub/Near-Vt Computing
Sai Zhang, Student Member, IEEE, Naresh R. Shanbhag, Fellow, IEEE, and Philip T. Krein, Fellow, IEEE
Abstract: This paper proposes to jointly optimize a switched-capacitor voltage regulator module (SC-VRM) combined with a compute core to minimize system energy per instruction. Past work seeking to optimize system energy efficiency has focused on separately maximizing SC-VRM efficiency or operating the compute core at its minimum energy operating point (MEOP). We first propose and verify a core-aware SC-VRM energy model which explicitly accounts for throughput constraints. Second, we perform joint optimization considering throughput unconstrained applications (TUA) and throughput constrained applications (TCA). We show that for TUA, the system minimum energy operating point (S-MEOP) voltage is different from both the core MEOP (C-MEOP) and VRM maximum efficiency point (V-MEP) voltages, and operating at S-MEOP achieves 12.3% and 21.8% energy savings compared with C-MEOP and V-MEP, respectively. For TCA, S-MEOP is achieved at the same voltage as C-MEOP but different from the voltage for V-MEP, and 38.9% energy savings can be obtained by operating at S-MEOP.
I. INTRODUCTION
Emerging autonomous sensor network and other embedded mobile applications require high energy efficiency and low-to-medium throughput in order to operate with limited battery energy supply. Such systems comprise an energy delivery sub-system in the form of a DC-DC voltage regulator module (VRM), and an information processing sub-system, the computational core. The energy drawn from the battery $E_{bat}$ (see Fig. 1), i.e., the system-level energy consumption, is given by:

$$E_{bat} = E_{vrm} + E_{core} \tag{1}$$

where $E_{vrm}$ is the energy lost in the VRM, and $E_{core}$ is the energy consumed by the core. Minimizing $E_{bat}$ requires that both $E_{vrm}$ and $E_{core}$ be reduced. However, VRM efficiency $\eta$ and $E_{core}$ are coupled through the supply voltage $V_{dd}$ and the load current $I_L$. This paper addresses the problem of system-level energy minimization by jointly optimizing $E_{vrm}$ and $E_{core}$, rather than minimizing these independently as has been done thus far.
The conventional approach to VRM design has been to maximize its efficiency $\eta = E_{core}/E_{bat}$ by minimizing the VRM loss $E_{vrm}$. Of the three commonly employed topologies (linear regulator, switching converter, and switched capacitor VRM (SC-VRM)), the SC-VRM is most suitable for on-chip integration and for sub/near-threshold applications due to its compactness and ability to achieve a high voltage conversion ratio $V_{bat}:V_{dd}$ [1], [2].

Fig. 1. The VRM efficiency maximizing supply voltage $V^*_{dd,\eta} \neq V^*_{dd}$ (the energy optimizing supply voltage of the core), thereby indicating the need for joint/system-level energy optimization of the VRM and the core.

The authors are with the Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA, email: szhang12, shanbhag, [email protected]. This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].
In the area of low power embedded core design, the existence of the minimum energy operating point (MEOP) has been shown in the sub-threshold region. Sub-threshold operation [3] provides a 20X energy saving at the expense of a 100X throughput loss. Near threshold voltage (NTV) computing [3] has also been proposed to trade off energy efficiency, throughput, and robustness. However, the existence of MEOP is tied to application level throughput constraints. These constraints have not been accounted for explicitly in the optimization of core and VRM thus far. We consider two main application scenarios depending on throughput: a throughput unconstrained application (TUA) and a throughput constrained application (TCA). TUAs are those where data can be sampled, stored and processed [4], whereas TCAs, such as real time DSP applications, require real-time data processing [5].
Sub-threshold or NTV operation requires a VRM to be designed with a high voltage conversion ratio (5:1 to 8:1). Achieving high VRM efficiency for low $V_{dd}$ is challenging. Much work in the design of VRM and core at this power level has focused on separately maximizing the VRM efficiency [1] and minimizing core energy [5], which does not lead to system-level energy optimality. Therefore, joint optimization of an SC-VRM and a core is critical. The notion of joint optimization has been proposed in [6], [7] to optimize a switching (buck) converter and core. The limitations of previous work are twofold: (1) the optimization focuses on a simple buck converter model with no circuit-level verification, and (2) the distinction between TUAs and TCAs is not made, even though the two cases lead to completely different results.

Therefore, in this paper, we propose to jointly optimize an SC-VRM and a compute core considering throughput requirements. We propose an accurate core-aware energy model of the SC-VRM verified in circuit-level simulation, based on which a joint optimization problem is formulated and solved for both TUAs and TCAs. We show that the TUA system MEOP (S-MEOP) voltage $V^*_{dd,S}$ is different from both the core MEOP (C-MEOP) voltage $V^*_{dd,C}$ and the VRM maximum efficiency point (V-MEP) voltage $V^*_{dd,V}$ due to the trade-off between VRM energy loss and core energy consumption. As a result, for TUAs, operating at S-MEOP leads to 12.3% and 21.8% energy savings compared with C-MEOP and V-MEP, respectively. On the other hand, for TCAs $V^*_{dd,S}$ is the same as $V^*_{dd,C}$, but different from $V^*_{dd,V}$, and 38.9% energy savings can be obtained by operating at S-MEOP. The rest of the paper is organized as follows: Section II presents the core-aware SC-VRM model. Section III formulates and solves the joint optimization problem for TUA and TCA. We conclude in Section IV.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TCSII.2014.2331096
Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].
II. CORE-AWARE SC-VRM MODEL
A. Core energy model
There are two types of energy consumption in a core operating in the near/sub-threshold region: dynamic energy and leakage energy. A unified model which accounts for both components has been proposed in [7], as follows:
$$E_{core} = C_{core}V_{dd}^2 + V_{dd}\,I_{leak}(V_{dd})\,\frac{1}{f_{clk}} \tag{2a}$$

with

$$I_{leak}(V_{dd}) = \mu C_{ox}\frac{W}{L}(m-1)V_T^2\, e^{-\frac{V_t}{mV_T}}\, e^{-\frac{\eta_d V_{dd}}{mV_T}}\left(1 - e^{-\frac{V_{dd}}{V_T}}\right) \tag{2b}$$

where $C_{core}$ is the effective load capacitance, $V_{dd}$ is the supply voltage, $f_{clk}$ is the core clock frequency, $V_t$ is the threshold voltage, $V_T$ is the thermal voltage, $\mu$ is the carrier mobility, $C_{ox}$ is the gate capacitance per unit $W/L$, $m$ is a constant related to the sub-threshold slope factor, and $\eta_d$ is the drain induced barrier lowering (DIBL) coefficient. This model captures the trade-off between the dynamic and leakage energy. As $V_{dd}$ is lowered, the dynamic energy $E_{dyn}$ (the first term of (2a)) decreases while the leakage energy $E_{lkg}$ (the second term) increases. In the super-threshold region, $E_{core}$ decreases with $V_{dd}$ as $E_{dyn}$ dominates. However, in the sub-threshold region, $E_{lkg}$ starts to dominate, so $E_{core}$ increases as $V_{dd}$ decreases. This trade-off leads to the well-studied MEOP [8]. We denote the MEOP via the 3-tuple $(E^*, V^*, f^*)$, where $E^*$ is the energy at MEOP, $V^*$ is the energy optimum voltage, and $f^*$ is the energy optimum frequency. The core is modeled as a load resistor $R_{core}$ in parallel with a leakage current source $I_{leak}(V_{dd})$ (see Fig. 2). We employ the model parameters in [7], which were extracted from a 130 nm filter core and verified via circuit simulations.
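The core model in (2a) and (2b) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the lumped argument `k0` (standing for the product of mobility, $C_{ox}$ and $W/L$), and every numeric value in the usage lines are placeholders chosen for this example, not the parameters extracted in [7].

```python
import math

def leakage_current(vdd, k0, m, v_thermal, v_t, eta_d):
    """Leakage current I_leak(Vdd) following (2b).

    k0 lumps mu * Cox * (W/L) into one argument (an assumption of this
    sketch). The sign convention for eta_d follows (2b) as printed.
    """
    prefactor = k0 * (m - 1.0) * v_thermal ** 2
    threshold_term = math.exp(-v_t / (m * v_thermal))
    dibl_term = math.exp(-eta_d * vdd / (m * v_thermal))
    drain_term = 1.0 - math.exp(-vdd / v_thermal)
    return prefactor * threshold_term * dibl_term * drain_term

def core_energy(vdd, f_clk, c_core, i_leak):
    """Energy per instruction following (2a): dynamic plus leakage."""
    e_dyn = c_core * vdd ** 2          # switching energy per clock cycle
    e_lkg = vdd * i_leak / f_clk       # leakage power times clock period
    return e_dyn + e_lkg

# Placeholder values, for illustration only.
i_leak = leakage_current(vdd=0.4, k0=1e-3, m=1.5, v_thermal=0.026,
                         v_t=0.4, eta_d=-0.1)
energy = core_energy(vdd=0.4, f_clk=50e6, c_core=200e-12, i_leak=i_leak)
```

Sweeping `vdd` with a matching model for `f_clk` as a function of supply voltage would trace out the MEOP curve; that delay model is not given in this section, so it is left out here.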
B. SC-VRM energy model with throughput constraints
An SC-VRM utilizes a capacitor array to store and transfer charge from the battery or energy source ($V_{bat}$) to the output ($V_{dd}$) as shown in Fig. 3(a). Voltage conversion and energy delivery are achieved via cyclical switching of the capacitor array topology. For the 2:1 SC-VRM in Fig. 3(a), charge from the battery is stored in the flying capacitor $C_{sc}$ during phase 1 ($\Phi_1$), and the stored charge is delivered to the core in phase 2 ($\Phi_2$). The output capacitor $C_{out}$, typically in the pF to nF range, acts as an energy buffer and manages ripple on the output voltage $V_{dd}$.

Fig. 2. The SC-VRM energy model.
Conventional SC-VRM energy loss mechanisms are captured via the transformer model [9] shown in Fig. 2. Here, the relationship $R = V^2/P$ ($V$ is the voltage and $P$ is the power dissipation) is employed to obtain an equivalent loss resistor for each mechanism as follows:

$$R_{ctl} = \frac{1}{\beta C_{tot} f_{sw}} \tag{3a}$$
$$R_{drive} = \frac{1}{C_{drive} f_{sw}} \tag{3b}$$
$$R_{bp} = \frac{1}{\gamma C_{tot} f_{sw}} \tag{3c}$$
$$R_{ctrl} = \frac{1}{C_{ctrl} f_{ctrl}} \tag{3d}$$

where $R_{ctl}$, $R_{drive}$, $R_{bp}$, and $R_{ctrl}$ represent the charge transfer loss, driver loss, bottom plate capacitor loss, and control loss, respectively. In (3a)-(3d), $C_{tot}$ is the total charge transfer capacitance (see Fig. 3(a)), $f_{sw}$ is the SC-VRM switching frequency, $\beta$ is a constant related to SC-VRM topology, $C_{drive}$ and $C_{ctrl}$ are the equivalent output capacitances of the driver and the control circuit, respectively, $\gamma$ is the ratio of bottom plate capacitor to $C_{tot}$, and $f_{ctrl}$ is the control circuit frequency.
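As a quick numerical check of (3a)-(3d), the sketch below evaluates the equivalent loss resistors. The function names are hypothetical; the example values ($\beta = 4$, $C_{tot} = 500$ pF, $\gamma = 1\%$, $f_{sw} = 100$ MHz) are the ones quoted for the verification design later in this section.

```python
def r_ctl(beta, c_tot, f_sw):
    """Charge transfer loss resistor, (3a)."""
    return 1.0 / (beta * c_tot * f_sw)

def r_drive(c_drive, f_sw):
    """Driver loss resistor, (3b)."""
    return 1.0 / (c_drive * f_sw)

def r_bp(gamma, c_tot, f_sw):
    """Bottom plate capacitor loss resistor, (3c)."""
    return 1.0 / (gamma * c_tot * f_sw)

def r_ctrl(c_ctrl, f_ctrl):
    """Control loss resistor, (3d)."""
    return 1.0 / (c_ctrl * f_ctrl)

print(r_ctl(4, 500e-12, 100e6))    # about 5 ohms
print(r_bp(0.01, 500e-12, 100e6))  # about 2000 ohms
```

Each resistor scales inversely with frequency, so halving $f_{sw}$ doubles $R_{ctl}$, $R_{drive}$ and $R_{bp}$ while leaving $R_{ctrl}$ unchanged.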
We propose a core-aware SC-VRM energy model, which differs from the conventional SC-VRM energy model in two respects: 1) core throughput $f_{clk}$ is introduced as an additional parameter, and 2) the energy loss is normalized with respect to the core clock period $T_{clk} = 1/f_{clk}$. Thus, 1) identifies a range of feasible output voltages $V_{dd}$ for the SC-VRM to generate, and 2) enables the use of energy per instruction (EPI) as an optimization metric. Here $R_{core} = 1/(C_{core} f_{clk})$ is the equivalent resistor (see Fig. 2) representing the core loss.
Applying KCL at the output node of the SC-VRM model in Fig. 2 indicates that $V_{dd}$ satisfies the following constraint:

$$I_{core} = \beta C_{tot}\left(\frac{V_{bat}}{N} - V_{dd}\right) f_{sw} = C_{core}V_{dd}f_{clk} + I_{leak}(V_{dd}) \tag{4}$$

We can see that $V_{dd}$ and its dependence on the core throughput $f_{clk}$ link the SC-VRM and core design spaces.
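Because (4) is linear in $f_{sw}$, it can be solved in closed form for the switching frequency that sustains a given output voltage. The following is a minimal sketch under stated assumptions: the function name is hypothetical, leakage is set to zero for simplicity, and the example combines $C_{core} = 200$ pF (the nominal core load named in the Fig. 4 caption) with the $f_{clk} = 224$ MHz at $V_{dd} = 0.5$ V quoted for the verification design; pairing those two values is an assumption made for this example only.

```python
def switching_frequency(vdd, f_clk, i_leak, c_core, c_tot, beta, v_bat, n):
    """Solve the KCL constraint (4) for f_sw at a given output voltage."""
    headroom = v_bat / n - vdd  # drop across the charge transfer resistor
    if headroom <= 0:
        raise ValueError("Vdd must lie below the ideal output Vbat/N")
    i_core = c_core * vdd * f_clk + i_leak  # right-hand side of (4)
    return i_core / (beta * c_tot * headroom)

f_sw = switching_frequency(vdd=0.5, f_clk=224e6, i_leak=0.0,
                           c_core=200e-12, c_tot=500e-12, beta=4,
                           v_bat=1.2, n=2)
print(f"{f_sw / 1e6:.0f} MHz")  # about 112 MHz
```

As $V_{dd}$ approaches $V_{bat}/N$ the headroom vanishes and the required $f_{sw}$ grows without bound, which is why the function rejects operating points at or above the ideal output voltage.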
We further assume that the power switches are sized such that the RC time constant is much ($M > 1$ times) smaller than the SC-VRM switching period $1/f_{sw}$, i.e.,

$$M = \frac{1}{R_{on} C_{tot} f_{sw}} \tag{5}$$
where $M$ is typically chosen to be between three and five. The effective output capacitance of the drive circuit can then be expressed as:

$$C_{drive} = C_{gate} R_{on} M f_{sw} C_{tot} = \frac{M f_{sw}}{2\pi f_t}\, C_{tot} \tag{6}$$

where $C_{gate}$ is the gate capacitance of the switch, and $f_t = 1/(2\pi C_{gate} R_{on})$ is the unity gain frequency of the process technology.
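Thus, the energy loss components per instruction for both the SC-VRM and the core can be obtained as follows:

$$E_{ctl} = \beta C_{tot}\left(\frac{V_{bat}}{N} - V_{dd}\right)^2 \frac{f_{sw}}{f_{clk}} \tag{7a}$$
$$E_{drive} = C_{tot} V_{bat}^2\, \frac{M f_{sw}^2}{2\pi f_t f_{clk}} \tag{7b}$$
$$E_{bp} = \gamma C_{tot} V_{bat}^2\, \frac{f_{sw}}{f_{clk}} \tag{7c}$$
$$E_{ctrl} = C_{ctrl} V_{bat}^2\, \frac{f_{ctrl}}{f_{clk}} \tag{7d}$$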
In (7a)-(7d), we model control loss as a constant power loss, since the control clock $f_{ctrl}$ is always on for fast load transition response and regulation. $E_{drive}$ and $E_{bp}$ are load dependent due to the pulse frequency modulated (PFM) scheme, which scales $f_{sw}$ with load to reduce shunt loss. We consider only the dynamic component of $E_{drive}$ and $E_{bp}$, since the VRM is operating in the super-threshold region [10], [11].

To validate the core-aware SC-VRM model, a 2:1 ladder SC-VRM was designed down to the transistor level in a commercial 1.2 V, 130 nm CMOS process. A PFM driver circuit consisting of a current starved oscillator, a non-overlapping clock generator with embedded buffer chain, and a double bound hysteresis regulation loop is designed (see Fig. 3(a)) to regulate the SC-VRM output voltage $V_{dd}$ over a range of 300 mV to 580 mV from a battery voltage $V_{bat}$ of 1.2 V. The PFM control method reduces $E_{drive}$ and $E_{bp}$ under light load conditions, and has the same functionality as a VCO based control. Circuit simulations with HSPICE were performed over a wide load range to verify the core-aware energy model. In the design, $C_{tot}$ is chosen to be 500 pF, $\beta$ is topology dependent and equals four for the 2:1 SC-VRM, and $C_{drive}$ is sized as per (6). From circuit simulation results, we estimate $\gamma$ of the on-chip MIM cap to be 1%, and $C_{ctrl}$ to be 2% of $C_{core}$. The switching frequency $f_{sw}$ is frequency modulated and ranges from 5 MHz to 100 MHz, $f_{ctrl}$ is 100 MHz, which is equal to the maximum achievable $f_{sw}$, and $f_{clk}$ is extracted from the ring oscillator load at $V_{dd} = 0.5$ V to be 224 MHz.
Figure 3(c) compares the results of circuit simulation and the model (7) for a subthreshold load current $I_{core}$ range of 50 µA to 1 mA [11]. From Fig. 3(c), the model and simulation results match to within 5% from moderate (400 µA) to maximum (1 mA) load conditions. Under light load, the model overestimates the energy by no more than 15%. This core-aware SC-VRM model will be employed in the subsequent sections for joint optimization.
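The four loss terms in (7a)-(7d) translate directly into code. The sketch below is illustrative only: the function name and dictionary keys are hypothetical, and the values used for $M$ (`m_sw = 4`), $f_t$ (`f_t = 50e9`) and $C_{ctrl}$ (2% of an assumed 200 pF core load) are placeholders rather than numbers reported for the verification design.

```python
import math

def vrm_losses(vdd, f_sw, f_clk, *, c_tot, beta, gamma, c_ctrl,
               f_ctrl, v_bat, n, m_sw, f_t):
    """SC-VRM energy loss per instruction, following (7a)-(7d)."""
    cycles = f_sw / f_clk  # converter cycles per core clock period
    e_ctl = beta * c_tot * (v_bat / n - vdd) ** 2 * cycles          # (7a)
    e_drive = (c_tot * v_bat ** 2 * m_sw * f_sw ** 2
               / (2.0 * math.pi * f_t * f_clk))                     # (7b)
    e_bp = gamma * c_tot * v_bat ** 2 * cycles                      # (7c)
    e_ctrl = c_ctrl * v_bat ** 2 * f_ctrl / f_clk                   # (7d)
    return {"e_ctl": e_ctl, "e_drive": e_drive,
            "e_bp": e_bp, "e_ctrl": e_ctrl}

# Example call; m_sw, f_t and c_ctrl are placeholder assumptions.
losses = vrm_losses(vdd=0.5, f_sw=100e6, f_clk=224e6,
                    c_tot=500e-12, beta=4, gamma=0.01,
                    c_ctrl=0.02 * 200e-12, f_ctrl=100e6,
                    v_bat=1.2, n=2, m_sw=4, f_t=50e9)
e_vrm = sum(losses.values())
```

Note that `e_drive` grows with the square of `f_sw` while `e_ctl` and `e_bp` grow linearly, and `e_ctrl` does not depend on `f_sw` at all, which matches the constant-power treatment of the control loss described above.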
Fig. 3. Model verification: a) circuit schematic of a 2:1 SC-VRM with frequency modulation control, b) output voltage waveform and PFM control (frequency modulation via pulse skipping) of the SC-VRM in a 130 nm process, and c) comparison between HSPICE simulation results and analytical modeling results (energy loss in J vs. load current in mA for $E_{ctl}$, $E_{bp}$, $E_{drive}$, and $E_{ctrl}$).
III. SYSTEM-LEVEL OPTIMIZATION
In this paper, we assume the following fixed parameters: the conversion ratio $N$, battery voltage $V_{bat}$, and the VRM topology, in order to demonstrate the benefits of the proposed joint optimization approach for a simpler but common scenario. The proposed framework does explicitly incorporate the fixed parameters and thus can be used to optimize over these in the future.
A. System-level energy optimization for TUA
In TUA, the throughput is determined by the supply voltage $V_{dd}$ and exponentially decreases when the core enters the sub-threshold region. For fixed battery voltage $V_{bat}$ and core architecture, the optimum solution for the problem is a pair defining the optimum switching frequency $f^*_{sw}$ (thus $V^*_{dd}$) and charge transfer capacitance $C^*_{tot}$. The system-level optimization problem can be formulated as:
$$\begin{aligned}
\min_{f_{sw},\,C_{tot}} \quad & E_{bat}(f_{sw}, C_{tot}) = E_{core} + E_{ctl} + E_{drive} + E_{bp} + E_{ctrl} \\
\text{s.t.} \quad & C_{tot} \le C_{max} \\
& \beta C_{tot}\left(\frac{V_{bat}}{N} - V_{dd}\right) f_{sw} = C_{core}V_{dd}f_{clk} + I_{leak}(V_{dd})
\end{aligned} \tag{8}$$

The optimization problem in (8) can be solved in two steps, i.e., first find the optimum $C_{tot}$ and then optimize over $f_{sw}$ to obtain the global optimum design point. As shown in Fig. 4(a), the system EPI decreases with $C_{tot}$ because larger capacitance transfers more charge per converter cycle, leading to decreased $f_{sw}$. This decreases $E_{shunt}$, defined as $E_{shunt} = E_{drive} + E_{bp} + E_{ctrl}$, while maintaining the same series energy $E_{ser}$, defined as $E_{ser} = E_{ctl} + E_{core}$. Therefore, the optimum $C_{tot}$ equals the maximum available charge transfer capacitance $C_{max}$. With $C_{tot} = C_{max}$, Fig. 4(b) shows that for fixed conversion ratio $N$ and load, $V_{dd}$ is uniquely determined by $f_{sw}$, so $V^*_{dd}$ can be determined by $f^*_{sw}$. Furthermore, Fig. 5(a) shows that as $f_{sw}$ reduces and the core enters the sub-threshold region, $E_{drive}$ and $E_{bp}$ first reduce with $f_{sw}$ and then increase due to the exponential increase in the critical path delay $1/f_{clk}$. $E_{ctl}$ and $E_{ctrl}$ also increase due to a slowed core operating frequency according to (7). These trends result in a higher S-MEOP voltage $V^*_{dd,S}$ than the C-MEOP voltage $V^*_{dd,C}$.

Fig. 5(b) plots $E_{bat}$, $E_{core}$ and $\eta$ as a function of $f_{sw}$. It shows that $V^*_{dd,S}$ is also different from the V-MEP voltage $V^*_{dd,V}$, because an efficiency-maximizing design tends to operate the core at higher $V_{dd}$ to balance $E_{ctl}$ and $E_{shunt}$. $V^*_{dd,S}$ (0.38 V with $f_{sw}$ of 3.81 MHz) lies between $V^*_{dd,C}$ (0.31 V with $f_{sw}$ of 0.62 MHz) and $V^*_{dd,V}$ (0.5 V with $f_{sw}$ of 113 MHz), which suggests that a trade-off between core-optimum and $\eta$-optimum design is necessary for highest system energy efficiency. Operating at S-MEOP results in 12.3% and 21.8% energy savings over operation at C-MEOP and V-MEP, respectively. By considering the throughput constraint, the system optimization results in a unique operating point that cannot be achieved by conventional joint optimization [6].
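The two-step procedure for (8) can be sketched as a simple grid search: fix $C_{tot} = C_{max}$, then sweep $V_{dd}$ (equivalently $f_{sw}$, through the equality constraint) and keep the point with the lowest $E_{bat}$. Everything specific below is an assumption made for illustration: the function names, the exponential `toy_f_clk` delay model, the constant `toy_i_leak`, and the values of `m_sw`, `f_t` and `c_max`. The sketch is therefore not expected to reproduce the 0.38 V S-MEOP reported above.

```python
import math

def system_energy(vdd, f_clk, i_leak, p):
    """Return (E_bat, f_sw) per instruction at one operating point."""
    headroom = p["v_bat"] / p["n"] - vdd
    if headroom <= 0:
        raise ValueError("Vdd must be below Vbat/N")
    i_core = p["c_core"] * vdd * f_clk + i_leak
    f_sw = i_core / (p["beta"] * p["c_max"] * headroom)  # constraint in (8)
    e_core = p["c_core"] * vdd ** 2 + vdd * i_leak / f_clk  # (2a)
    e_ctl = p["beta"] * p["c_max"] * headroom ** 2 * f_sw / f_clk
    e_drive = (p["c_max"] * p["v_bat"] ** 2 * p["m_sw"] * f_sw ** 2
               / (2.0 * math.pi * p["f_t"] * f_clk))
    e_bp = p["gamma"] * p["c_max"] * p["v_bat"] ** 2 * f_sw / f_clk
    e_ctrl = p["c_ctrl"] * p["v_bat"] ** 2 * p["f_ctrl"] / f_clk
    return e_core + e_ctl + e_drive + e_bp + e_ctrl, f_sw

def tua_grid_search(vdd_grid, f_clk_of_vdd, i_leak_of_vdd, p):
    """Step 2 of the procedure: minimize E_bat over Vdd with Ctot = Cmax."""
    best = None
    for vdd in vdd_grid:
        e_bat, f_sw = system_energy(vdd, f_clk_of_vdd(vdd),
                                    i_leak_of_vdd(vdd), p)
        if best is None or e_bat < best[0]:
            best = (e_bat, vdd, f_sw)
    return best

# Placeholder parameters; m_sw, f_t and c_max are assumptions.
PARAMS = {"c_core": 200e-12, "c_max": 500e-12, "beta": 4, "gamma": 0.01,
          "c_ctrl": 0.02 * 200e-12, "f_ctrl": 100e6, "v_bat": 1.2,
          "n": 2, "m_sw": 4, "f_t": 50e9}

def toy_f_clk(vdd):
    """Hypothetical exponential clock model, NOT from the paper."""
    return 224e6 * math.exp((vdd - 0.5) / 0.04)

def toy_i_leak(vdd):
    """Hypothetical constant leakage, NOT the model in (2b)."""
    return 20e-6

GRID = [0.25 + 0.01 * k for k in range(31)]  # 0.25 V to 0.55 V
e_bat_opt, vdd_opt, f_sw_opt = tua_grid_search(GRID, toy_f_clk,
                                               toy_i_leak, PARAMS)
```

A grid search is used here only because it is easy to verify; with the actual $f_{clk}(V_{dd})$ and $I_{leak}(V_{dd})$ models substituted for the toy functions, any one-dimensional minimizer would serve the same purpose.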
B. System-level energy optimization for TCA
In TCAs, the throughput is fixed by application-level requirements. Thus, the following additional constraint needs to be added to (8):

$$V_{dd} \ge V_{dd,min}(f_{clk}) \tag{9}$$
Since the optimal value of $C_{tot}$ is throughput independent, $C^*_{tot} = C_{max}$. However, unlike TUA, Fig. 6(a) shows that $E_{bat}$ is strictly increasing with respect to $f_{sw}$ when $f_{sw}$ is swept between $f_{sw,min} = 1.3$ MHz and 100 MHz, where $f_{sw,min}$ is the lowest $f_{sw}$ at which the SC-VRM can meet (9). As $f_{sw}$ increases, $E_{bat}$ increases from 47.3 pJ to 201.6 pJ, implying that the optimum switching frequency $f^*_{sw}$ is the frequency at which $V_{dd} = V_{dd,min}$ in (9). This is because an increased $f_{sw}$ leads to both higher $V_{dd}$ (thus higher core current) and higher shunt losses ($E_{drive}$ and $E_{bp}$), which translate to increased $E_{bat}$. Furthermore, Fig. 6(b) shows that for TCA, the C-MEOP and S-MEOP are obtained at the same $f_{sw,min}$, thus $V^*_{dd,S} = V^*_{dd,C}$, but $V^*_{dd,S}$ is still lower than $V^*_{dd,V}$. This implies that operating at $V^*_{dd,V}$ is never energy optimum at the system level. Although $E_{ctl}$ can be reduced by increasing the output voltage while still satisfying (9), the increased $f_{sw}$ translates to higher $E_{shunt}$, which negates the energy savings due to reduced $E_{ctl}$. Fig. 6(b) also shows that by operating at S-MEOP, maximum energy savings of 39.8% can be obtained.

Fig. 4. Optimization for $C_{tot}$: a) system energy $E_{bat}$ (J) vs. charge transfer capacitance $C_{tot}$ (pF), and b) $V_{dd}$ (V) vs. switching frequency $f_{sw}$ (Hz) for $C_{tot} = C_{max}$ and a nominal core load of 200 pF.
The key difference between TUA optimization (Fig. 5) and TCA optimization (Fig. 6) is that in TUA, the core frequency $f_{clk}$ scales with $V_{dd}$, whereas it is a fixed value in TCA. In TUA optimization, $E_{shunt}$, $E_{core}$, $E_{ctl}$ and $E_{ctrl}$ increase dramatically due to the exponential reduction of $f_{clk}$ in the sub-threshold region. This leads to the S-MEOP occurring at a higher $V_{dd}$ compared with the C-MEOP. On the other hand, $f_{clk}$ is a constant in TCA, so the energy loss is always normalized by the same clock period. This leads to S-MEOP occurring at the same $V_{dd}$ as C-MEOP.
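One way to see why the two cases behave differently is to add the core energy (2a) and the charge transfer loss (7a) and substitute the KCL constraint (4). The short derivation below follows directly from those three equations; the monotonicity remark after it additionally assumes that $I_{leak}(V_{dd})$ does not decrease with $V_{dd}$ over the range of interest.

```latex
\begin{align*}
E_{core} &= C_{core}V_{dd}^2 + \frac{V_{dd}\,I_{leak}(V_{dd})}{f_{clk}}
          = \frac{V_{dd}\,I_{core}}{f_{clk}}, \\
E_{ctl}  &= \beta C_{tot}\left(\frac{V_{bat}}{N} - V_{dd}\right)^2 \frac{f_{sw}}{f_{clk}}
          = \left(\frac{V_{bat}}{N} - V_{dd}\right)\frac{I_{core}}{f_{clk}}, \\
E_{ser}  &= E_{core} + E_{ctl}
          = \frac{V_{bat}}{N}\,\frac{I_{core}}{f_{clk}},
\qquad I_{core} = C_{core}V_{dd}f_{clk} + I_{leak}(V_{dd}).
\end{align*}
```

With $f_{clk}$ fixed (TCA), $E_{ser}$ grows with $V_{dd}$ through $I_{core}$, and by (4) a higher $V_{dd}$ also requires a higher $f_{sw}$, so the shunt terms in (7b) and (7c) grow as well; the minimum therefore sits at the lowest feasible $V_{dd}$. For a given $V_{dd}$, $E_{ser}$ does not depend on $C_{tot}$ or $f_{sw}$, in line with the observation in Section III-A that a larger $C_{tot}$ lowers $E_{shunt}$ while leaving $E_{ser}$ unchanged.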
Fig. 5. System optimization for TUA: a) core energy and SC-VRM energy loss (pJ) vs. switching frequency $f_{sw}$ (Hz), and b) comparison of S-MEOP, C-MEOP and V-MEP.

IV. CONCLUSION AND FUTURE WORK
In this paper, we proposed a system-level approach to optimize operation of an SC-VRM and a compute core to minimize the total system EPI. We show that throughput constraints have considerable impact on the optimization, and that TUA and TCA result in different S-MEOPs. For TUA, system-level optimization reveals a unique S-MEOP different from both C-MEOP and V-MEP; for TCA, the S-MEOP coincides with C-MEOP.
Future work can focus on extending system-level energy optimization to reconfigurable SC-VRM circuits with variable voltage conversion ratio, and on developing an energy efficient S-MEOP tracking architecture. Furthermore, statistical error compensation techniques such as algorithmic noise-tolerance (ANT) [12] have been shown to provide 60% energy savings with no loss of performance when applied to a compute core. We will investigate the joint optimization problem considering error resiliency techniques in both the core and VRM in order to further reduce system EPI.
Fig. 6. System optimization for TCA: a) core energy and SC-VRM energy loss (pJ) vs. switching frequency $f_{sw}$ (Hz), and b) comparison of S-MEOP, C-MEOP and V-MEP.

REFERENCES
[1] J. Kimball and P. Krein, “Analysis and design of switched capacitor converters,” in Proc. IEEE Applied Power Electronics Conf. (APEC), 2005, pp. 1473–1477.
[2] W. Chang, “Variable-conversion-ratio switched-capacitor-voltage-multiplier/divider dc-dc converter,” IEEE Trans. Circuits Syst. I, vol. 58, no. 8, pp. 1944–1957, Aug. 2011.
[3] R. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, “Near-threshold computing: Reclaiming Moore’s law through energy efficient integrated circuits,” Proceedings of the IEEE, vol. 98, no. 2, pp. 253–266, Feb. 2010.
[4] Y. Lee, “Ultra-low power circuit techniques for a new class of sub-mm3 sensor nodes,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Sept. 2010.
[5] A. Wang and A. Chandrakasan, “Energy-aware architectures for a real-valued FFT implementation,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2003, pp. 360–365.
[6] P. Rong and M. Pedram, “Hierarchical power management with application to scheduling,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2005, pp. 269–274.
[7] R. Abdallah, P. Shenoy, N. Shanbhag, and P. Krein, “System energy minimization via joint optimization of the DC-DC converter and the core,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2011, pp. 97–102.
[8] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and practical limits of dynamic voltage scaling,” in Proc. Design Automation Conf., July 2004, pp. 868–873.
[9] S. Ben-Yaakov and M. Evzelman, “Generic and unified model of switched capacitor converters,” in Proc. IEEE Energy Conversion Congress and Exposition, 2009, pp. 3501–3508.
[10] Y. Ramadass and A. Chandrakasan, “Voltage scalable switched capacitor dc-dc converter for ultra-low-power on-chip applications,” in Proc. IEEE Power Electronics Specialists Conf. (PESC), June 2007, pp. 2353–2359.
[11] J. Kwong, Y. Ramadass, N. Verma, and A. Chandrakasan, “A 65 nm sub-Vt microcontroller with integrated SRAM and switched capacitor dc-dc converter,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 115–126, Jan. 2009.
[12] B. Shim, S. Sridhara, and N. Shanbhag, “Reliable low-power digital signal processing via reduced precision redundancy,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5, pp. 497–510, 2004.