

System-Level Optimization of Switched-Capacitor VRM and Core for Sub/Near-Vt Computing

Sai Zhang, Student Member, IEEE, Naresh R. Shanbhag, Fellow, IEEE, and Philip T. Krein, Fellow, IEEE

Abstract: This paper proposes to jointly optimize a switched capacitor voltage regulator module (SC-VRM) combined with a compute core to minimize system energy per instruction. Past work seeking to optimize system energy efficiency has focused on separately maximizing SC-VRM efficiency or operating the compute core at its minimum energy operating point (MEOP). We first propose and verify a core-aware SC-VRM energy model which explicitly accounts for throughput constraints. Second, we perform joint optimization considering throughput unconstrained applications (TUA) and throughput constrained applications (TCA). We show that for TUA, the system minimum energy operating point (S-MEOP) voltage is different from both the core MEOP (C-MEOP) and VRM maximum efficiency point (V-MEP) voltages, and operating at S-MEOP achieves 12.3% and 21.8% energy savings compared with C-MEOP and V-MEP, respectively. For TCA, S-MEOP is achieved at the same voltage as C-MEOP but different from the voltage for V-MEP, and 38.9% energy savings can be obtained by operating at S-MEOP.

I. INTRODUCTION

Emerging autonomous sensor networks and other embedded mobile applications require high energy efficiency and low-to-medium throughput in order to operate with a limited battery energy supply. Such systems comprise an energy delivery sub-system in the form of a DC-DC voltage regulator module (VRM), and an information processing sub-system, the computational core. The energy drawn from the battery $E_{bat}$ (see Fig. 1), i.e., the system-level energy consumption, is given by:

$$E_{bat} = E_{vrm} + E_{core} \qquad (1)$$

where $E_{vrm}$ is the energy lost in the VRM, and $E_{core}$ is the energy consumed by the core. Minimizing $E_{bat}$ requires that both $E_{vrm}$ and $E_{core}$ be reduced. However, VRM efficiency $\eta$ and $E_{core}$ are coupled through the supply voltage $V_{dd}$ and the load current $I_L$. This paper addresses the problem of system-level energy minimization by jointly optimizing $E_{vrm}$ and $E_{core}$, rather than minimizing these independently as has been done thus far.

The conventional approach to VRM design has been to maximize its efficiency $\eta = E_{core}/E_{bat}$ by minimizing the VRM loss $E_{vrm}$. Of the three commonly employed topologies: linear regulator, switching converter, and switched capacitor VRM (SC-VRM), the SC-VRM is most suitable for on-chip integration and for sub/near-threshold applications due to its compactness and ability to achieve a high voltage conversion ratio $V_{bat} : V_{dd}$ [1], [2].

Fig. 1. The VRM efficiency-maximizing supply voltage $V^*_{dd,\eta} \neq V^*_{dd}$ (the energy-optimizing supply voltage of the core), thereby indicating the need for joint/system-level energy optimization of the VRM and the core.

The authors are with the Department of Electrical and Computer Engineering, Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA, email: szhang12, shanbhag, [email protected]. This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. Copyright (c) 2014 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected]

In the area of low power embedded core design, the existence of the minimum energy operating point (MEOP) has been shown in the sub-threshold region. Sub-threshold operation [3] provides a 20X energy saving at the expense of a 100X throughput loss. Near threshold voltage (NTV) computing [3] has also been proposed to trade off energy efficiency, throughput, and robustness. However, the existence of MEOP is tied to application level throughput constraints. These constraints have not been accounted for explicitly in the optimization of core and VRM thus far. We consider two main application scenarios depending on throughput: a throughput unconstrained application (TUA) and a throughput constrained application (TCA). TUAs are those where data can be sampled, stored and processed [4], whereas TCAs, such as real-time DSP applications, require real-time data processing [5].

Sub-threshold or NTV operation requires a VRM to be designed with a high voltage conversion ratio (5:1 to 8:1). Achieving high VRM efficiency for low $V_{dd}$ is challenging. Much work in the design of VRM and core at this power level has focused on separately maximizing the VRM efficiency [1] and minimizing core energy [5], which does not lead to system level energy optimality. Therefore, joint optimization of an SC-VRM and a core is critical. The notion of joint optimization has been proposed in [6], [7] to optimize a switching (buck) converter and core. The limitations of previous work are twofold: (1) the optimization focuses on a simple buck converter model with no circuit level verification, and (2) the distinction between TUAs and TCAs is not made, which leads to completely different results.

Therefore, in this paper, we propose to jointly optimize an SC-VRM and a compute core considering throughput requirements. We propose an accurate core-aware energy model of the SC-VRM verified in circuit level simulation, based on which a joint optimization problem is formulated and solved for both TUAs and TCAs. We show that the TUA system MEOP (S-MEOP) voltage $V^*_{dd,S}$ is different from both the core MEOP (C-MEOP) voltage $V^*_{dd,C}$ and the VRM maximum efficiency point (V-MEP) voltage $V^*_{dd,V}$ due to the trade-off between VRM energy loss and core energy consumption. As a result, for TUAs, operating at S-MEOP leads to 12.3% and 21.8% energy savings compared with C-MEOP and V-MEP, respectively. On the other hand, for TCAs $V^*_{dd,S}$ is the same as $V^*_{dd,C}$, but different from $V^*_{dd,V}$, and 38.9% energy savings can be obtained by operating at S-MEOP. The rest of the paper is organized as follows: Section II presents the core-aware SC-VRM model. Section III formulates and solves the joint optimization problem for TUA and TCA. We conclude in Section IV.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TCSII.2014.2331096

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

II. CORE-AWARE SC-VRM MODEL

A. Core energy model

There are two types of energy consumption in a core operating in the near/sub-threshold region: dynamic energy and leakage energy. A unified model which accounts for both components has been proposed in [7], as follows:

$$E_{core} = C_{core} V_{dd}^2 + V_{dd}\, I_{leak}(V_{dd})\, \frac{1}{f_{clk}} \qquad (2a)$$

with

$$I_{leak}(V_{dd}) = \mu C_{ox} \frac{W}{L} (m-1) V_T^2\, e^{-\frac{V_t}{m V_T}}\, e^{-\frac{\eta_d V_{dd}}{m V_T}} \left(1 - e^{-\frac{V_{dd}}{V_T}}\right) \qquad (2b)$$

where $C_{core}$ is the effective load capacitance, $V_{dd}$ is the supply voltage, $V_t$ is the threshold voltage, $V_T$ is the thermal voltage, $\mu$ is the carrier mobility, $C_{ox}$ is the gate capacitance per unit $W/L$, $m$ is a constant related to the sub-threshold slope factor, and $\eta_d$ is the drain induced barrier lowering (DIBL) coefficient. This model captures the trade-off between the dynamic and leakage energy. As $V_{dd}$ is lowered, the dynamic energy $E_{dyn}$ decreases while the leakage energy $E_{lkg}$ increases. In the super-threshold region, $E_{core}$ decreases with $V_{dd}$ as $E_{dyn}$ dominates. However, in the sub-threshold region, $E_{lkg}$ starts to dominate, so $E_{core}$ increases as $V_{dd}$ decreases. This trade-off leads to the well-studied MEOP [8]. We denote the MEOP via the 3-tuple $(E^*, V^*, f^*)$, where $E^*$ is the energy at MEOP, $V^*$ is the energy optimum voltage, and $f^*$ is the energy optimum frequency. The core is modeled as a load resistor $R_{core}$ in parallel with a leakage current source $I_{leak}(V_{dd})$ (see Fig. 2). We employ model parameters in [7], which were extracted from a 130 nm filter core and verified via circuit simulations.
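The MEOP trade-off described above can be illustrated with a small numerical sketch. It follows the form of (2a) but is not the fitted model of [7]: the constant leakage current `I_LEAK`, the exponential clock model `f_clk`, and every numeric constant are assumptions chosen only to make the minimum visible.

```python
import math

# All numeric values are illustrative assumptions, not the parameters of [7].
C_CORE = 200e-12            # effective core capacitance (F)
I_LEAK = 10e-6              # leakage current, held constant for simplicity (A)
F_REF, V_REF = 224e6, 0.5   # assumed clock frequency F_REF at supply V_REF
SLOPE = 0.039               # assumed exponential slope of f_clk vs. Vdd (V)


def f_clk(vdd):
    """Hypothetical delay model: clock frequency falls exponentially with Vdd."""
    return F_REF * math.exp((vdd - V_REF) / SLOPE)


def e_core(vdd):
    """Energy per instruction with the structure of (2a)."""
    e_dyn = C_CORE * vdd ** 2            # dynamic energy, shrinks as Vdd drops
    e_lkg = vdd * I_LEAK / f_clk(vdd)    # leakage energy, grows as the clock slows
    return e_dyn + e_lkg


def find_meop(v_lo=0.15, v_hi=0.60, steps=451):
    """Grid search returning the 3-tuple (E*, V*, f*)."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    v_star = min(grid, key=e_core)
    return e_core(v_star), v_star, f_clk(v_star)


e_star, v_star, f_star = find_meop()
print(f"V* = {v_star:.3f} V, E* = {e_star * 1e12:.2f} pJ, f* = {f_star / 1e6:.3f} MHz")
```

With these assumed values the minimum falls strictly inside the search interval, which is the qualitative behavior the text attributes to the MEOP; the specific voltage it prints has no significance beyond the toy parameters.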

B. SC-VRM energy model with throughput constraints

An SC-VRM utilizes a capacitor array to store and transfer charge from the battery or energy source ($V_{bat}$) to the output ($V_{dd}$) as shown in Fig. 3(a). Voltage conversion and energy delivery are achieved via cyclical switching of the capacitor array topology. For the 2:1 SC-VRM in Fig. 3(a), charge from the battery is stored in the flying capacitor $C_{sc}$ during phase 1 ($\Phi_1$), and the stored charge is delivered to the core in phase 2 ($\Phi_2$). The output capacitor $C_{out}$, typically in the pF to nF range, acts as an energy buffer and manages ripple on the output voltage $V_{dd}$.

Fig. 2. The SC-VRM energy model. (Schematic labels: $V_{bat}$, $N$:1 transformer, $V_{bat}/N$, $R_{ctl}$, $R_{drive}$, $R_{bp}$, $R_{ctrl}$, $V_{dd}$, $I_{core}$, $R_{core}$, $I_{leak}(V_{dd})$.)

Conventional SC-VRM energy loss mechanisms are captured via the transformer model [9] shown in Fig. 2. Here, the relationship $R = V^2/P$ ($V$ is the voltage and $P$ is the power dissipation) is employed to obtain an equivalent loss resistor as follows:

$$R_{ctl} = \frac{1}{\beta C_{tot} f_{sw}} \qquad (3a)$$

$$R_{drive} = \frac{1}{C_{drive} f_{sw}} \qquad (3b)$$

$$R_{bp} = \frac{1}{\gamma C_{tot} f_{sw}} \qquad (3c)$$

$$R_{ctrl} = \frac{1}{C_{ctrl} f_{ctrl}} \qquad (3d)$$

where $R_{ctl}$, $R_{drive}$, $R_{bp}$, and $R_{ctrl}$ represent the charge transfer loss, driver loss, bottom plate capacitor loss, and control loss, respectively. In (3a)-(3d), $C_{tot}$ is the total charge transfer capacitance (see Fig. 3(a)), $f_{sw}$ is the SC-VRM switching frequency, $\beta$ is a constant related to SC-VRM topology, $C_{drive}$ and $C_{ctrl}$ are the equivalent output capacitances of the driver and the control circuit, respectively, $\gamma$ is the ratio of bottom plate capacitor to $C_{tot}$, and $f_{ctrl}$ is the control circuit frequency.

We propose a core-aware SC-VRM energy model, which differs from the conventional SC-VRM energy model in two respects: 1) core throughput $f_{clk}$ is introduced as an additional parameter, and 2) the energy loss is normalized with respect to the core clock period $T_{clk} = 1/f_{clk}$. Thus, 1) identifies a range of feasible output voltages $V_{dd}$ for the SC-VRM to generate, and 2) enables the use of energy per instruction (EPI) as an optimization metric. Here $R_{core} = 1/(C_{core} f_{clk})$ is the equivalent resistor (see Fig. 2) representing the core loss.

Applying KCL at the output node of the SC-VRM model in Fig. 2 indicates that $V_{dd}$ satisfies the following constraint:

$$I_{core} = \beta C_{tot} \left( \frac{V_{bat}}{N} - V_{dd} \right) f_{sw} = C_{core} V_{dd} f_{clk} + I_{leak}(V_{dd}) \qquad (4)$$

We can see that $V_{dd}$ and its dependence on the core throughput $f_{clk}$ links the SC-VRM and core design spaces.
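To make the coupling in (4) concrete, the sketch below solves the constraint for $V_{dd}$ at a given $f_{sw}$ and $f_{clk}$ by bisection. The function name `solve_vdd`, the default parameter values, and the requirement that the supplied `i_leak` callable be non-decreasing in $V_{dd}$ (so that the residual is monotonic) are assumptions made for illustration.

```python
def solve_vdd(f_sw, f_clk, i_leak, *, v_bat=1.2, n=2, c_tot=500e-12,
              beta=4.0, c_core=200e-12, tol=1e-9):
    """Solve constraint (4) for Vdd by bisection on [0, v_bat / n].

    i_leak is a caller-supplied function Vdd -> leakage current (A); it is
    assumed non-decreasing so that the residual below is monotonic in Vdd.
    """
    v_ideal = v_bat / n   # no-load output voltage of the N:1 converter

    def residual(vdd):
        supplied = beta * c_tot * (v_ideal - vdd) * f_sw   # left side of (4)
        drawn = c_core * vdd * f_clk + i_leak(vdd)         # right side of (4)
        return supplied - drawn

    lo, hi = 0.0, v_ideal
    if residual(lo) < 0:
        raise ValueError("leakage at Vdd = 0 exceeds what the converter supplies")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid      # converter still supplies more than the core draws
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example: equal switching and clock frequencies, leakage neglected.
vdd = solve_vdd(f_sw=10e6, f_clk=10e6, i_leak=lambda v: 0.0)
print(f"Vdd = {vdd:.4f} V")
```

When leakage is neglected, (4) is linear in $V_{dd}$ and the result can be checked against the closed form $V_{dd} = \beta C_{tot} f_{sw} (V_{bat}/N) / (\beta C_{tot} f_{sw} + C_{core} f_{clk})$.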

We further assume that the power switches are sized such that the RC time constant is much ($M > 1$ times) smaller than the SC-VRM switching period $1/f_{sw}$, i.e.,

$$M = \frac{1}{R_{on} C_{tot} f_{sw}} \qquad (5)$$

where $M$ is typically chosen to be between three and five. The effective output capacitance of the drive circuit can then be expressed as:

$$C_{drive} = C_{gate} R_{on} M f_{sw} C_{tot} = \frac{M f_{sw}}{2\pi f_t} C_{tot} \qquad (6)$$

where $C_{gate}$ is the gate capacitance of the switch, and $f_t = 1/(2\pi C_{gate} R_{on})$ is the unity gain frequency of the process technology.
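The two forms of (6) can be connected step by step using only (5) and the definition of $f_t$. The reading that the factor $R_{on} M f_{sw} C_{tot}$ equals one, so that $C_{drive}$ reduces to $C_{gate}$, is an interpretation of (6) rather than something the text states explicitly.

```latex
\begin{align*}
R_{on} &= \frac{1}{M C_{tot} f_{sw}}
  && \text{from (5)} \\
C_{gate} &= \frac{1}{2\pi f_t R_{on}}
  && \text{from } f_t = \frac{1}{2\pi C_{gate} R_{on}} \\
C_{drive} &= C_{gate}\,\underbrace{R_{on} M f_{sw} C_{tot}}_{=\,1 \text{ by (5)}}
  = \frac{M f_{sw} C_{tot}}{2\pi f_t}
  && \text{substituting } R_{on}
\end{align*}
```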

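Thus, the energy loss components per instruction for both the SC-VRM and the core can be obtained as follows:

$$E_{ctl} = \beta C_{tot} \left( \frac{V_{bat}}{N} - V_{dd} \right)^2 \frac{f_{sw}}{f_{clk}} \qquad (7a)$$

$$E_{drive} = C_{tot} V_{bat}^2 \frac{M f_{sw}^2}{2\pi f_t f_{clk}} \qquad (7b)$$

$$E_{bp} = \gamma C_{tot} V_{bat}^2 \frac{f_{sw}}{f_{clk}} \qquad (7c)$$

$$E_{ctrl} = C_{ctrl} V_{bat}^2 \frac{f_{ctrl}}{f_{clk}} \qquad (7d)$$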
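The step from the loss resistors in (3) to the per-instruction energies in (7) can be spelled out as follows. The voltage assigned to each resistor is inferred from the form of (7a)-(7d); it is an assumption about the model in Fig. 2, not a statement taken from the text.

```latex
\begin{align*}
E_x &= \frac{P_x}{f_{clk}} = \frac{V_x^2}{R_x\, f_{clk}}
  && \text{power normalized to } T_{clk} = 1/f_{clk} \\
E_{ctl} &= \left(\frac{V_{bat}}{N} - V_{dd}\right)^2 \frac{\beta C_{tot} f_{sw}}{f_{clk}}
  && V_x = \tfrac{V_{bat}}{N} - V_{dd},\quad R_x = R_{ctl} \\
E_{drive} &= V_{bat}^2\, \frac{C_{drive} f_{sw}}{f_{clk}}
  = C_{tot} V_{bat}^2\, \frac{M f_{sw}^2}{2\pi f_t f_{clk}}
  && V_x = V_{bat},\quad R_x = R_{drive},\ \text{using (6)} \\
E_{bp} &= V_{bat}^2\, \frac{\gamma C_{tot} f_{sw}}{f_{clk}}
  && V_x = V_{bat},\quad R_x = R_{bp} \\
E_{ctrl} &= V_{bat}^2\, \frac{C_{ctrl} f_{ctrl}}{f_{clk}}
  && V_x = V_{bat},\quad R_x = R_{ctrl}
\end{align*}
```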

In (7a)-(7d), we model control loss as a constant power loss, since the control clock $f_{ctrl}$ is always on for fast load transition response and regulation. $E_{drive}$ and $E_{bp}$ are load dependent due to the pulse frequency modulated (PFM) scheme, which scales $f_{sw}$ with load to reduce shunt loss. We consider only the dynamic component of $E_{drive}$ and $E_{bp}$, since the VRM is operating in the super-threshold region [10], [11].

To validate the core-aware SC-VRM model, a 2:1 ladder SC-VRM was designed down to the transistor level in a commercial 1.2 V, 130 nm CMOS process. A PFM driver circuit consisting of a current starved oscillator, a non-overlapping clock generator with embedded buffer chain, and a double bound hysteresis regulation loop was designed (see Fig. 3(a)) to regulate the SC-VRM output voltage $V_{dd}$ over a range of 300 mV to 580 mV from a battery voltage $V_{bat}$ of 1.2 V. The PFM control method reduces $E_{drive}$ and $E_{bp}$ under light load conditions. The PFM control employed has the same functionality as a VCO based control. Circuit simulations with HSPICE were performed over a wide load range to verify the core-aware energy model. In the design, $C_{tot}$ is chosen to be 500 pF, $\beta$ is topology dependent and equals four for the 2:1 SC-VRM, and $C_{drive}$ is sized as per (6). From circuit simulation results, we estimate $\gamma$ of the on-chip MIM capacitor to be 1%, and $C_{ctrl}$ to be 2% of $C_{core}$. The switching frequency $f_{sw}$ is frequency modulated and ranges from 5 MHz to 100 MHz, $f_{ctrl}$ is 100 MHz, which is equal to the maximum achievable $f_{sw}$, and $f_{clk}$ is extracted from the ring oscillator load at $V_{dd} = 0.5$ V to be 224 MHz.

Figure 3(c) compares the results of circuit simulation and the model (7) for a subthreshold load current $I_{core}$ range of 50 µA to 1 mA [11]. From Fig. 3(c), the model and simulation results match to within 5% under moderate (400 µA) to maximum (1 mA) load conditions. Under light load, the model overestimates the energy by up to 15%. This core-aware SC-VRM model will be employed in the subsequent sections for joint optimization.
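As a rough illustration of how (7a)-(7d) might be evaluated, the sketch below computes the four loss components and the resulting efficiency. The defaults for `c_tot`, `beta`, `gamma`, `v_bat`, and `f_ctrl` follow the design values quoted above; `m=4` (within the stated range of three to five), `f_t=50e9`, and `c_ctrl=4e-12` (2% of an assumed 200 pF core) are placeholder assumptions, and the function names are hypothetical.

```python
import math


def vrm_losses(vdd, f_sw, f_clk, *, v_bat=1.2, n=2, c_tot=500e-12, beta=4.0,
               gamma=0.01, m=4.0, f_t=50e9, c_ctrl=4e-12, f_ctrl=100e6):
    """Per-instruction SC-VRM loss components in joules, following (7a)-(7d).

    m, f_t and c_ctrl are assumed placeholder values, not measured ones.
    """
    dv = v_bat / n - vdd   # drop between the ideal and actual output voltage
    return {
        "ctl": beta * c_tot * dv ** 2 * f_sw / f_clk,                               # (7a)
        "drive": c_tot * v_bat ** 2 * m * f_sw ** 2 / (2 * math.pi * f_t * f_clk),  # (7b)
        "bp": gamma * c_tot * v_bat ** 2 * f_sw / f_clk,                            # (7c)
        "ctrl": c_ctrl * v_bat ** 2 * f_ctrl / f_clk,                               # (7d)
    }


def efficiency(e_core, losses):
    """eta = E_core / E_bat, with E_bat = E_core + sum of the VRM losses."""
    e_bat = e_core + sum(losses.values())
    return e_core / e_bat


losses = vrm_losses(vdd=0.5, f_sw=100e6, f_clk=224e6)
for name, value in losses.items():
    print(f"E_{name} = {value * 1e12:.3f} pJ")
```

Keeping each component separate, rather than returning only the total, mirrors the per-component comparison between model and simulation described for Fig. 3(c).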

Fig. 3. Model verification: a) circuit schematic of a 2:1 SC-VRM with frequency modulation control, b) output voltage waveform and PFM control (frequency modulation via pulse skipping) of the SC-VRM in a 130 nm process, and c) comparison between HSPICE simulation results and analytical modeling results (energy loss in J vs. load current in mA for $E_{ctl}$, $E_{bp}$, $E_{drive}$ and $E_{ctrl}$).

III. SYSTEM-LEVEL OPTIMIZATION

In this paper, we assume the following fixed parameters: the conversion ratio $N$, battery voltage $V_{bat}$, and the VRM topology, in order to demonstrate the benefits of the proposed joint optimization approach for a simpler but common scenario. The proposed framework does explicitly incorporate the fixed parameters and thus can be used to optimize over these in the future.

A. System-level energy optimization for TUA

In TUA, the throughput is determined by the supply voltage $V_{dd}$ and decreases exponentially when the core enters the sub-threshold region. For fixed battery voltage $V_{bat}$ and core architecture, the optimum solution for the problem is a pair defining the optimum switching frequency $f^*_{sw}$ (thus $V^*_{dd}$) and charge transfer capacitance $C^*_{tot}$. The system level optimization problem can be formulated as:

$$\begin{aligned}
\min \quad & E_{bat}(f_{sw}, C_{tot}) = E_{core} + E_{ctl} + E_{drive} + E_{bp} + E_{ctrl} \\
\text{s.t.} \quad & C_{tot} \le C_{max} \\
& \beta C_{tot} \left( \frac{V_{bat}}{N} - V_{dd} \right) f_{sw} = C_{core} V_{dd} f_{clk} + I_{leak}(V_{dd})
\end{aligned} \qquad (8)$$

The optimization problem in (8) can be solved in two steps, i.e., first find the optimum $C_{tot}$ and then optimize over $f_{sw}$ to obtain the global optimum design point. As shown in Fig. 4(a), the system EPI decreases with $C_{tot}$ because a larger capacitance transfers more charge per converter cycle, leading to decreased $f_{sw}$. This decreases $E_{shunt}$, defined as $E_{shunt} = E_{drive} + E_{bp} + E_{ctrl}$, while maintaining the same series energy $E_{ser}$, defined as $E_{ser} = E_{ctl} + E_{core}$. Therefore, the optimum $C_{tot}$ equals the maximum available charge transfer capacitance $C_{max}$. With $C_{tot} = C_{max}$, Fig. 4(b) shows that for fixed conversion ratio $N$ and load, $V_{dd}$ is uniquely determined by $f_{sw}$, so $V^*_{dd}$ can be determined by $f^*_{sw}$. Furthermore, Fig. 5(a) shows that as $f_{sw}$ reduces and the core enters the sub-threshold region, $E_{drive}$ and $E_{bp}$ first reduce with $f_{sw}$, then increase due to the exponential increase in the critical path delay $1/f_{clk}$. $E_{ctl}$ and $E_{ctrl}$ also increase due to a slowed core operating frequency according to (7). These trends result in a higher S-MEOP voltage $V^*_{dd,S}$ than the C-MEOP voltage $V^*_{dd,C}$. Fig. 5(b) plots $E_{bat}$, $E_{core}$ and $\eta$ as a function of $f_{sw}$. It shows that $V^*_{dd,S}$ is also different from the V-MEP voltage $V^*_{dd,V}$, because an efficiency-maximizing design tends to operate the core at a higher $V_{dd}$ to balance $E_{ctl}$ and $E_{shunt}$. $V^*_{dd,S}$ (0.38 V with $f_{sw}$ of 3.81 MHz) lies between $V^*_{dd,C}$ (0.31 V with $f_{sw}$ of 0.62 MHz) and $V^*_{dd,V}$ (0.5 V with $f_{sw}$ of 113 MHz), which suggests that a tradeoff between core-optimum and $\eta$-optimum design is necessary for highest system energy efficiency. Operating at S-MEOP results in 12.3% and 21.8% energy savings over operation at C-MEOP and V-MEP, respectively. By considering the throughput constraint, the system optimization results in a unique operating point that cannot be achieved by conventional joint optimization [6].
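The TUA search can be sketched numerically by sweeping $V_{dd}$, deriving $f_{clk}$ and $f_{sw}$ from it via (4), and evaluating (2a) and (7) with $C_{tot}$ fixed at an assumed $C_{max}$ of 500 pF. This is a toy model under loudly stated assumptions: the exponential `f_clk` model, the constant leakage current `I_LEAK`, and the values of `M` and `F_T` are hypothetical, and the 5 MHz to 100 MHz range of $f_{sw}$ is not enforced. Its printed numbers are therefore not expected to reproduce the voltages or savings reported above; only the ordering of the three operating points is illustrated.

```python
import math

# Illustrative assumptions only; none of these are the paper's fitted values.
V_BAT, N = 1.2, 2
C_TOT, BETA, GAMMA = 500e-12, 4.0, 0.01     # C_tot fixed at an assumed C_max
C_CORE, I_LEAK = 200e-12, 100e-6            # constant leakage (simplification)
M, F_T = 4.0, 50e9                          # sizing factor, assumed f_t
C_CTRL, F_CTRL = 0.02 * C_CORE, 100e6


def f_clk(vdd):
    """Hypothetical delay model: 224 MHz at 0.5 V, exponential in Vdd."""
    return 224e6 * math.exp((vdd - 0.5) / 0.052)


def operating_point(vdd):
    """Return (E_bat, E_core, eta, f_sw) for a TUA core clocked at f_clk(vdd)."""
    fc = f_clk(vdd)
    i_core = C_CORE * vdd * fc + I_LEAK
    dv = V_BAT / N - vdd
    f_sw = i_core / (BETA * C_TOT * dv)                 # rearranged constraint (4)
    e_core = C_CORE * vdd ** 2 + vdd * I_LEAK / fc      # (2a)
    e_vrm = (BETA * C_TOT * dv ** 2 * f_sw / fc                                # (7a)
             + C_TOT * V_BAT ** 2 * M * f_sw ** 2 / (2 * math.pi * F_T * fc)   # (7b)
             + GAMMA * C_TOT * V_BAT ** 2 * f_sw / fc                          # (7c)
             + C_CTRL * V_BAT ** 2 * F_CTRL / fc)                              # (7d)
    e_bat = e_core + e_vrm
    return e_bat, e_core, e_core / e_bat, f_sw


grid = [0.20 + 0.001 * i for i in range(381)]           # 0.20 V to 0.58 V
v_s = min(grid, key=lambda v: operating_point(v)[0])    # S-MEOP: minimum E_bat
v_c = min(grid, key=lambda v: operating_point(v)[1])    # C-MEOP: minimum E_core
v_v = max(grid, key=lambda v: operating_point(v)[2])    # V-MEP: maximum efficiency

for name, v in (("C-MEOP", v_c), ("S-MEOP", v_s), ("V-MEP", v_v)):
    e_bat, _, eta, f_sw = operating_point(v)
    print(f"{name}: Vdd={v:.3f} V  f_sw={f_sw / 1e6:.2f} MHz  "
          f"E_bat={e_bat * 1e12:.1f} pJ  eta={eta:.2f}")
```

Parameterizing the sweep by $V_{dd}$ rather than $f_{sw}$ avoids a root-finding step, because for a TUA core (4) gives $f_{sw}$ directly once $V_{dd}$ and hence $f_{clk}$ are known.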

B. System-level energy optimization for TCA

In TCAs, the throughput is fixed by application level requirements. Thus, the following additional constraint needs to be added to (8):

$$V_{dd} \ge V_{dd,min}(f_{clk}) \qquad (9)$$

Since the optimal value of $C_{tot}$ is throughput independent, $C^*_{tot} = C_{max}$. However, unlike TUA, Fig. 6(a) shows that $E_{bat}$ is strictly increasing with respect to $f_{sw}$ when $f_{sw}$ is swept between $f_{sw,min} = 1.3$ MHz and 100 MHz, where $f_{sw,min}$ is the lowest $f_{sw}$ at which the SC-VRM can meet (9). As $f_{sw}$ increases, $E_{bat}$ increases from 47.3 pJ to 201.6 pJ, implying that the optimum switching frequency $f^*_{sw}$ is the frequency at which $V_{dd} = V_{dd,min}$ in (9). This is because an increased $f_{sw}$ leads to both higher $V_{dd}$ (thus higher core current) and higher shunt losses ($E_{drive}$ and $E_{bp}$), which translate to increased $E_{bat}$. Furthermore, Fig. 6(b) shows that for TCA, the C-MEOP and S-MEOP are obtained at the same $f_{sw,min}$, thus $V^*_{dd,S} = V^*_{dd,C}$, but $V^*_{dd,S}$ is still lower than $V^*_{dd,V}$. This implies that operating at $V^*_{dd,V}$ is never energy optimum at the system level. Although $E_{ctl}$ can be reduced by increasing the output voltage while still satisfying (9), the increased $f_{sw}$ translates to higher $E_{shunt}$, which negates the energy savings due to reduced $E_{ctl}$. Fig. 6(b) also shows that by operating at S-MEOP, maximum energy savings of 39.8% can be obtained.

Fig. 4. Optimization for $C_{tot}$: a) system energy $E_{bat}$ (J) vs. charge transfer capacitance $C_{tot}$ (pF), and b) $V_{dd}$ (V) vs. switching frequency $f_{sw}$ (Hz) for $C_{tot} = C_{max}$ and a nominal core load of 200 pF.

The key difference between TUA optimization (Fig. 5) and TCA optimization (Fig. 6) is that in TUA, the core frequency $f_{clk}$ scales with $V_{dd}$, whereas it is fixed in TCA. In TUA optimization, $E_{shunt}$, $E_{core}$, $E_{ctl}$ and $E_{ctrl}$ increase dramatically due to the exponential reduction of $f_{clk}$ in the sub-threshold region. This leads to the S-MEOP occurring at a higher $V_{dd}$ compared with the C-MEOP. On the other hand, $f_{clk}$ is a constant in TCA, so the energy loss is always normalized to the same clock period. This leads to S-MEOP occurring at the same $V_{dd}$ as C-MEOP.
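The TCA case can be sketched in the same spirit: with $f_{clk}$ fixed, $f_{sw}$ is swept upward from the lowest value that satisfies (9) and $E_{bat}$ is evaluated at each point. The fixed clock `F_CLK`, the minimum voltage `VDD_MIN`, the constant leakage current, and the values of `M` and `F_T` are all assumed for illustration, so the energies differ from the 47.3 pJ to 201.6 pJ range quoted above; the sketch only illustrates why the minimum sits at $f_{sw,min}$.

```python
import math

# Illustrative assumptions only; not the paper's design values except where noted.
V_BAT, N = 1.2, 2
C_TOT, BETA, GAMMA = 500e-12, 4.0, 0.01
C_CORE, I_LEAK = 200e-12, 100e-6     # constant leakage (simplification)
M, F_T = 4.0, 50e9
C_CTRL, F_CTRL = 0.02 * C_CORE, 100e6
F_CLK = 10e6       # throughput fixed by the application (assumed value)
VDD_MIN = 0.35     # assumed lowest Vdd that sustains F_CLK, Vdd,min(f_clk) in (9)


def vdd_from_fsw(f_sw):
    """Closed-form solution of (4) when the leakage current is constant."""
    k = BETA * C_TOT * f_sw
    return (k * V_BAT / N - I_LEAK) / (k + C_CORE * F_CLK)


def e_bat(f_sw):
    """System energy per instruction at a fixed core clock F_CLK."""
    vdd = vdd_from_fsw(f_sw)
    dv = V_BAT / N - vdd
    e_core = C_CORE * vdd ** 2 + vdd * I_LEAK / F_CLK                          # (2a)
    e_ctl = BETA * C_TOT * dv ** 2 * f_sw / F_CLK                              # (7a)
    e_drive = C_TOT * V_BAT ** 2 * M * f_sw ** 2 / (2 * math.pi * F_T * F_CLK) # (7b)
    e_bp = GAMMA * C_TOT * V_BAT ** 2 * f_sw / F_CLK                           # (7c)
    e_ctrl = C_CTRL * V_BAT ** 2 * F_CTRL / F_CLK                              # (7d)
    return e_core + e_ctl + e_drive + e_bp + e_ctrl


# Lowest switching frequency that still satisfies constraint (9).
i_min = C_CORE * VDD_MIN * F_CLK + I_LEAK
f_sw_min = i_min / (BETA * C_TOT * (V_BAT / N - VDD_MIN))

# Logarithmic sweep from f_sw_min up to 100 MHz.
sweep = [f_sw_min * (100e6 / f_sw_min) ** (i / 50) for i in range(51)]
energies = [e_bat(f) for f in sweep]
f_sw_opt = sweep[energies.index(min(energies))]
print(f"f_sw,min = {f_sw_min / 1e6:.2f} MHz, optimum f_sw = {f_sw_opt / 1e6:.2f} MHz")
```

In this simplified model the series energy $E_{core} + E_{ctl}$ collapses to $(V_{bat}/N)\, I_{core}/f_{clk}$, which grows with $V_{dd}$, and every shunt term grows with $f_{sw}$, so $E_{bat}$ rises monotonically over the sweep.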

Fig. 5. System optimization for TUA: a) core energy and SC-VRM energy loss (pJ) vs. switching frequency $f_{sw}$ (Hz), and b) comparison of S-MEOP, C-MEOP and V-MEP.

IV. CONCLUSION AND FUTURE WORK

In this paper, we proposed a system level approach to optimize operation of an SC-VRM and a compute core to minimize the total system EPI. We show that throughput constraints have considerable impact on the optimization, and that TUA and TCA result in different S-MEOPs. For TUA, system level optimization reveals a unique S-MEOP different from both C-MEOP and V-MEP; and for TCA, the S-MEOP overlaps with C-MEOP.

Future work can focus on extending system level energy optimization to reconfigurable SC-VRM circuits with variable voltage conversion ratio, and developing an energy efficient S-MEOP tracking architecture. Furthermore, statistical error compensation techniques such as algorithmic noise-tolerance (ANT) [12] have been shown to provide 60% energy saving with no loss of performance when applied to a compute core. We will investigate the joint optimization problem considering error resiliency techniques in both the core and VRM in order to further reduce system EPI.

Fig. 6. System optimization for TCA: a) core energy and SC-VRM energy loss (pJ) vs. switching frequency $f_{sw}$ (Hz), and b) comparison of S-MEOP, C-MEOP and V-MEP.

REFERENCES

[1] J. Kimball and P. Krein, “Analysis and design of switched capacitor converters,” in Proc. IEEE Applied Power Electronics Conf. (APEC), 2005, pp. 1473–1477.

[2] W. Chang, “Variable-conversion-ratio switched-capacitor-voltage-multiplier/divider dc-dc converter,” IEEE Trans. Circuits Syst. I, vol. 58, no. 8, pp. 1944–1957, Aug. 2011.

[3] R. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, “Near-threshold computing: Reclaiming Moore’s law through energy efficient integrated circuits,” Proceedings of the IEEE, vol. 98, no. 2, pp. 253–266, Feb. 2010.

[4] Y. Lee, “Ultra-low power circuit techniques for a new class of sub-mm3 sensor nodes,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Sept. 2010.

[5] A. Wang and A. Chandrakasan, “Energy-aware architectures for a real-valued FFT implementation,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2003, pp. 360–365.

[6] P. Rong and M. Pedram, “Hierarchical power management with application to scheduling,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2005, pp. 269–274.

[7] R. Abdallah, P. Shenoy, N. Shanbhag, and P. Krein, “System energy minimization via joint optimization of the DC-DC converter and the core,” in Proc. Int’l. Symp. Low Power Electronics and Design (ISLPED), Aug. 2011, pp. 97–102.

[8] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and practical limits of dynamic voltage scaling,” in Proc. Design Automation Conf., July 2004, pp. 868–873.

[9] S. Ben-Yaakov and M. Evzelman, “Generic and unified model of switched capacitor converters,” in Proc. IEEE Energy Conversion Congress and Exposition, 2009, pp. 3501–3508.

[10] Y. Ramadass and A. Chandrakasan, “Voltage scalable switched capacitor dc-dc converter for ultra-low-power on-chip applications,” in Proc. IEEE Power Electronics Specialists Conf. (PESC), June 2007, pp. 2353–2359.

[11] J. Kwong, Y. Ramadass, N. Verma, and A. Chandrakasan, “A 65 nm sub-Vt microcontroller with integrated SRAM and switched capacitor dc-dc converter,” IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 115–126, Jan. 2009.

[12] B. Shim, S. Sridhara, and N. Shanbhag, “Reliable low-power digital signal processing via reduced precision redundancy,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, no. 5, pp. 497–510, 2004.
