Core 1 with Adjustable Regulator ... - Cornell University

Enabling Realistic Fine-Grain Voltage Scaling with Reconfigurable Power Distribution Networks

Waclaw Godycki, Christopher Torng, Ivan Bukreyev, Alyssa Apsel, Christopher Batten

School of Electrical and Computer Engineering, Cornell University

1 Abstract

Recent work has shown that monolithic integration of voltage regulators will be feasible in the near future, enabling reduced system cost and the potential for fine-grain voltage scaling (FGVS). In this project, we use architecture-level modeling to explore a new dynamic voltage/frequency scaling controller called the fine-grain synchronization controller (FG-SYNC+). FG-SYNC+ enables improved performance and energy efficiency at similar average power for multithreaded applications with activity imbalance. We then use circuit-level modeling to explore various approaches to organizing on-chip voltage regulation, including a new approach called reconfigurable power distribution networks (RPDNs). RPDNs allow one regulator to “borrow” energy storage from regulators associated with underutilized cores, resulting in improved area/power efficiency and faster response times. We evaluate FG-SYNC+ and RPDN using a vertically integrated research methodology, and our results demonstrate a 10–50% performance and 10–70% energy-efficiency improvement on the majority of the applications studied compared to no FGVS, yet RPDN uses 40% less area compared to a more traditional per-core regulation scheme.

2 Motivation

Monolithic integration using a standard CMOS process provides a tremendous cost incentive for integrating closed-loop voltage regulators on the die. Recent technology trends suggest that it is now becoming feasible to integrate switching regulators on-chip (e.g., Intel Haswell), enabling reduced system cost as well as the potential for fine-grain voltage scaling (FGVS) to exploit fine-grain activity imbalance in multi-threaded applications for performance and energy efficiency benefits.

Integrated Voltage Regulation in Intel Haswell Processors

HASWELL’S FIVR EXTENDS BATTERY LIFE
First Processor With Integrated Voltage Regulator Won’t Be the Last

By David Kanter (July 29, 2013)

July 2013 © Microprocessor Report

The PC ecosystem has lagged phones and tablets in power management. For many years, voltage regulators (VRs) for PC processors have been a staid and boring affair, delivering relatively incremental improvements. Traditional VRs are bulky and consume high-value motherboard real estate close to the processor, but they operate at very low frequencies (e.g., 1MHz) compared with modern 1–3GHz processors.

Intel’s introduction of fully integrated voltage regulators (FIVRs) in its newest Core processor, code-named Haswell, is fundamentally changing the power-delivery landscape. This innovation is the technical underpinning for new platform-level power management that adopts and in some aspects exceeds the capabilities of mobile systems, radically reducing average power consumption for the system and improving responsiveness. The result is one of the greatest increases in mobile-PC battery life ever, as evidenced in the initial Haswell-based notebooks that are now shipping.

Specifically, the FIVR’s low latency enables more-agile power management that can quickly turn platform components on or off, enabling the system to spend most of its time in a deep-sleep state. The OS and applications, however, remain alive and connected to the network, and the system swiftly responds to external stimuli. For example, in Windows 8 Connected Standby mode, Intel claims the Haswell platform reduces power consumption by 20x compared with Sandy Bridge, while still meeting the 300ms wake-up requirement.

Looking beyond the PC, the implications are significant across a wide range of markets. Properly designed IP blocks such as CPUs, GPUs, FPGA fabrics, wireless modems, and caches can all take advantage of a FIVR. Applications from smartphones to servers all benefit from improvements in power efficiency. Although Intel is the first to bring FIVRs to market, others will eventually follow.

Tiny Capacitors and Inductors

The FIVR is tightly coupled to Haswell, using both on-die and package-level integration. It complements an external voltage-regulator module (VRM) to deliver power to the processor. The FIVR’s main input voltage is supplied by the VRM at 1.8V (nominal) but is dynamically adjusted via serial voltage identification (SVID) that ranges between 1.6V and 2.3V. The output voltages that the FIVR supplies to the different die regions vary from about 0.65V to 2.0V.

Conceptually, the VRM is optimized for high conversion efficiency and acts as a slow and coarse-grained

Figure 1. Block diagram of Haswell’s power distribution. The FIVR converts a single voltage, VCCIN (1.8V nominal), into several independent voltage rails, each of which varies dynamically according to the needs of the circuit.

[Figure 1 labels: Haswell die with CPUs + Cache, Ring Interconnect, Graphics Engine, and System Agent (+PCIe, +DMI, DRAM Cntl); VCCIN input to the FIVR; voltage rails VCPU0, VCPU1, …, VRING, VGPU, VSA, VIOA, VLLC, VDDQ.]

Figure from D. Kanter, MPR’13

Fine-Grain Activity Imbalance in Multi-Threaded Applications

[Figure: per-core activity for cores 0–7, from application start to application end, for bfs, rsort, and strsearch. Annotated intervals: 90 μs, 7 μs, and 2.3 μs.]

3 Target System

Our target system is an embedded processor composed of eight in-order, single-issue, five-stage RISC cores; private, coherent 16 KB instruction and data L1 caches; and a shared 512 KB unified L2 cache. We implemented the core and L1 memory system for this design in RTL and used a commercial standard-cell-based CAD toolflow targeting a TSMC 65 nm process to generate layout. Each core can run at 333 MHz at 1 V, and the full eight-core system is approximately 6 mm².

[Figure: Target System Layout]

4 Types of Integrated Voltage Regulation

The three primary types of step-down voltage regulators are linear regulators (e.g., LDOs), inductor-based switching regulators (e.g., buck), and capacitor-based switching regulators (e.g., switched-capacitor).

5 FGVS Architecture Design: FG-SYNC+

We explore a new FGVS controller called the fine-grain synchronization controller (FG-SYNC+) that exploits fine-grain scaling in level (i.e., many voltage levels), space (i.e., per-core regulation), and time (i.e., fast transition times between levels) to improve performance and energy efficiency while maintaining similar average power. FG-SYNC+ uses a thread library instrumented with hint instructions to inform the hardware about which cores are doing useful work vs. useless work (e.g., waiting for a task, waiting at a barrier).
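The idea of driving per-core voltage levels from work/wait hints can be illustrated with a minimal sketch. This is not the FG-SYNC+ policy itself: the function name `choose_modes`, the `super_sprint_limit` threshold, and the rule for picking a level from the number of busy cores are all assumptions made for illustration. Only the four voltage levels (0.7 V, 1.0 V, 1.15 V, 1.33 V) come from the poster.

```python
from enum import Enum


class Mode(Enum):
    """Voltage levels named on the poster (values in volts)."""
    REST = 0.70
    NOMINAL = 1.00
    SPRINT = 1.15
    SUPER_SPRINT = 1.33


def choose_modes(doing_useful_work, super_sprint_limit=2):
    """Pick one mode per core from per-core hint bits.

    doing_useful_work: list of booleans, True if the hint says the core
    is doing useful work, False if it is waiting.
    super_sprint_limit: hypothetical threshold; at most this many busy
    cores are allowed to super-sprint.
    """
    n_cores = len(doing_useful_work)
    n_busy = sum(doing_useful_work)

    if n_busy == n_cores:
        # No imbalance to exploit: every busy core stays at nominal.
        busy_mode = Mode.NOMINAL
    elif n_busy <= super_sprint_limit:
        # Few busy cores: assume enough power headroom to super-sprint.
        busy_mode = Mode.SUPER_SPRINT
    else:
        busy_mode = Mode.SPRINT

    # Waiting cores always rest; busy cores share the chosen level.
    return [busy_mode if busy else Mode.REST for busy in doing_useful_work]
```

Under these assumed rules, a lone busy core among eight is super-sprinted while the other seven rest, which is the kind of imbalance the hints are meant to expose.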

[Figure: application activity plots over time for cores 0–7. Panels: (a) splash2-lu with Two Domains; (b) splash2-lu with Eight Domains; (c) rsort with 1000 ns Response Time; (d) rsort with 100 ns Response Time. Legend: Super-Sprint, Sprint, Nominal, Rest, Busy, Waiting. Annotation: “Exec. Time Reduced by 20%”.]

In these application activity plots illustrating FG-SYNC+, rows show controller decisions per-core and black strips above cores show when that core is active. We compare SPLASH-2 LU factorization with two vs. four voltage domains (a,b). We also illustrate the impact of slow voltage-settling response times over a small excerpt from radix sort (c,d).

Fine-Grain Voltage Scaling with FG-SYNC+

We use three sensitivity studies to understand the implication of varying: (1) the number of voltage levels, (2) the number of voltage domains, and (3) voltage-settling response times.

[Figure: four plots of normalized energy efficiency (0.8 to 2.0) versus speedup, each with an isopower line. Panels: “2-Level Controllers” (0.7/1.0V, 1.0/1.15V, 1.0/1.33V); “3-Level vs. 4-Level Controllers” (0.7/1.0/1.15V, 0.7/1.0/1.33V, 0.7/1.0/1.15/1.33V; arrow: increasing sprinting levels); “Different Numbers of Domains” (2, 4, 8 domains; arrow: increasing number of domains); “Different Response Times” (1000 ns, 100 ns, 0 ns; arrow: decreasing response time).]

To exploit fine-grain activity imbalance, (1) at least three levels are required and four levels help further; (2) more domains result in improved performance and energy efficiency; (3) response times of 100 ns or faster are required.

6 FGVS Circuit Design: RPDN

Shown in (a), we use a single fixed-voltage regulator (SFVR) as a baseline to compare against more sophisticated regulation schemes. We choose a configuration that can provide 80% efficiency at 1 V with an area of 0.26 mm² (4% of the core/L1 area). Shown in (b), multiple adjustable voltage regulators (MAVR) enable fine-grain voltage scaling in space and level. The power efficiency vs. area plot in (c) shows how we choose a per-core regulator area of 0.08 mm² to allow efficient voltage regulation for super-sprint. Note that designing for super-sprint significantly over-provisions for rest, nominal, and sprint modes; also, only one or two cores will ever be using the super-sprint mode at any given time.

[Figure: (a) SFVR Abstract Schematic: Vin, flyback capacitor, and SFVR control unit with Vout = Vin/2, supplying Core 0, Core 1, through Core 7. (b) MAVR Abstract Schematic: Core 0 and Core 1, each with its own adjustable regulator and AVR control unit. (c) MAVR Efficiency vs. Area: power efficiency (55% to 80%) versus area (0.02 to 0.08 mm²) for nominal mode and super-sprint mode; the chosen design point is 0.08 mm², giving 80% efficiency at super-sprint.]

Reconfigurable Power Distribution Networks

We propose a new approach called reconfigurable power distribution networks (RPDNs). As shown below, RPDNs include many small “unit cells” that each contain the flyback capacitance and regulator switches required for an SC regulator. These cells can be flexibly reconfigured through a switch fabric and combined with per-core control circuitry to effectively create multiple differently-sized SC regulators “on-demand” for cores. The inset shows how 16 unit cells can be allocated to four cores operating in four different modes.

[Figure: RPDN schematic with Core 0 and Core 1, a Vout control loop, RPDN fabric control, the RPDN switch fabric, and unit cells. Inset: “Allocating 16 Cells”, with labels A, B, C, D and X, r, S, N.]

The efficiency vs. power plot shown for a single core illustrates how lower-voltage modes require less area for high efficiency. A resting core only requires a single cell; RPDN would allow a super-sprinting core to “borrow” three cells from a resting core while maintaining high efficiency.
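The borrowing arithmetic can be sketched as a small allocation check. The per-mode cell demands below are assumptions: only "rest needs one cell" is stated on the poster, the value of 7 for super-sprint follows from assuming an equal home share of four cells per core plus three borrowed cells, and the values for nominal and sprint are hypothetical placeholders chosen so that four cores in four different modes use exactly 16 cells. The function name `allocate_cells` is likewise illustrative.

```python
# Cells each mode is assumed to need for high efficiency.
# Only "rest": 1 is stated on the poster; the rest are hypothetical.
CELLS_NEEDED = {"rest": 1, "nominal": 3, "sprint": 5, "super-sprint": 7}


def allocate_cells(core_modes, cells_per_core=4, demand=None):
    """Return (cells assigned per core, cells borrowed per core).

    A positive borrowed value means the core uses cells beyond its home
    share; a negative value means it lends cells to other cores.
    Raises ValueError if the fabric does not hold enough cells.
    """
    demand = CELLS_NEEDED if demand is None else demand
    wanted = [demand[mode] for mode in core_modes]
    total_cells = cells_per_core * len(core_modes)

    if sum(wanted) > total_cells:
        raise ValueError(
            f"need {sum(wanted)} cells but the fabric has {total_cells}"
        )

    borrowed = [w - cells_per_core for w in wanted]
    return wanted, borrowed
```

For example, `allocate_cells(["rest", "nominal", "sprint", "super-sprint"])` assigns 1, 3, 5, and 7 cells under these assumed demands: the resting core lends three cells and the super-sprinting core borrows three.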

[Figure: power efficiency (60% to 85%) versus output power (0 to 70 mW) for a single core, with curves for 1 through 7 cells and the modes Rest (0.7V), Nominal (1.0V), Sprint (1.15V), and Super-Sprint (1.33V) marked.]

[Figure: RPDN Transient Response: voltage (0.7 to 1.4 V) versus time (100 to 400 ns), with annotated transitions of 120 ns and 150 ns. MAVR Transient Response: voltage (0.75 to 1.35 V) versus time (0.25 to 4.0 µs), with annotated transitions of 1390 ns, 960 ns, and 2900 ns.]

The RPDN architecture provides area savings of 40% over MAVR when supporting per-core supply regulation across the same number of cores. In addition to reducing area overhead, RPDN significantly reduces the voltage-settling response time. The waveforms shown here illustrate the difference in the transient responses for RPDN and MAVR when transitioning between modes. For RPDN, the nominal to super-sprint transition takes 150 ns, while the same transition takes 2.9 µs with MAVR.
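To put these figures in perspective, the short calculation below works through the numbers quoted on the poster. It rests on two assumptions that the poster does not state directly: that settling times can be expressed in cycles of the 333 MHz nominal clock, and that the 40% saving applies to a total MAVR regulator area of eight cores times 0.08 mm² per core. The function names are illustrative.

```python
def response_speedup(mavr_ns=2900.0, rpdn_ns=150.0):
    """Ratio of MAVR to RPDN nominal-to-super-sprint settling time."""
    return mavr_ns / rpdn_ns


def settle_cycles(settle_ns, freq_mhz=333.0):
    """Settling time expressed in clock cycles at the given frequency."""
    return settle_ns * freq_mhz / 1000.0


def rpdn_total_area_mm2(n_cores=8, mavr_area_per_core_mm2=0.08, savings=0.40):
    """Total RPDN area, assuming the saving applies to total MAVR area."""
    return n_cores * mavr_area_per_core_mm2 * (1.0 - savings)


if __name__ == "__main__":
    print(f"RPDN settles {response_speedup():.1f}x faster than MAVR")
    print(f"RPDN transition: {settle_cycles(150.0):.0f} cycles at 333 MHz")
    print(f"MAVR transition: {settle_cycles(2900.0):.0f} cycles at 333 MHz")
    print(f"Estimated RPDN area: {rpdn_total_area_mm2():.3f} mm^2")
```

Under these assumptions the RPDN transition is roughly 19 times faster, and the 150 ns settling time is close to the 100 ns range that the sensitivity study identifies as necessary, whereas the 2.9 µs MAVR transition is well outside it.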

7 Evaluation

Evaluation Methodology

We use a vertically integrated evaluation methodology that uses a mix of circuit-, gate-, register-transfer-, and architectural-level modeling.
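One piece of this flow is an instruction-based energy model, which can be sketched as a lookup table of per-instruction energies applied to instruction counts from the cycle-level simulator. The sketch below is only an illustration of that idea: the instruction classes, the picojoule values, the function name `estimate_energy_pj`, and the quadratic voltage scaling (a common first-order approximation for dynamic energy) are all assumptions, not values or methods taken from the poster.

```python
# Hypothetical per-instruction energies in picojoules at nominal voltage.
# A real dictionary would be characterized from gate-level power analysis.
ENERGY_PJ_AT_NOMINAL = {
    "alu": 10.0,
    "load": 18.0,
    "store": 16.0,
    "branch": 9.0,
}


def estimate_energy_pj(instr_counts, voltage, nominal_voltage=1.0,
                       table=None):
    """Estimate dynamic energy for a mix of executed instructions.

    instr_counts: dict mapping instruction class to executed count.
    voltage: supply voltage the core ran at, in volts.
    Energy is scaled by (V / Vnom)^2, a first-order assumption.
    """
    table = ENERGY_PJ_AT_NOMINAL if table is None else table
    scale = (voltage / nominal_voltage) ** 2
    nominal_energy = sum(table[cls] * n for cls, n in instr_counts.items())
    return scale * nominal_energy
```

With such a table, the same instruction counts can be re-evaluated at each voltage level the controller selects, which is what makes a cycle-level simulator fast enough to sweep many controller configurations.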

[Figure: evaluation flow. Blocks: C++ Multithreaded App Using Hints; Cross Compiler; RISC Binary; RISC ISA Sim; Cycle-Level Simulator; Per-Core Stats; Verilog RTL of Multicore RISC Processor; Verilog Simulator; Synthesis, P&R; Gate-Level Model; Layout; Switching Activity; Power Model; Instruction-Based Energy Dictionary; SPICE Circuit Models of PDNs; SPICE Simulator; DVFS Mode Transition Times; Power Efficiencies for each PDN; Cycle-Level Multicore System with DVFS Controller; Performance and Energy Results.]

Evaluation Results

We evaluate the performance, energy efficiency, and power of applications on our target system with each PDN. We use a 4-level, 8-domain FG-SYNC+ controller and account for realistic voltage-settling response times and regulator power efficiencies in each DVFS mode for varying load currents.

[Figure: normalized energy efficiency (1.0 to 1.7) versus speedup (0.5 to 1.5) for MAVR and RPDN, with an isopower line.]

[Figure: System-level Power Breakdowns for SFVR, MAVR, RPDN. Normalized power (0.0 to 1.0), with one S/M/R bar group per application: bfs, bilateral, dither, kmeans, mriq, pbbs-dr, pbbs-knn, pbbs-mm, rsort, splash2-fft, splash2-lu, strsearch, viterbi. Legend text: “leak cores regulator efficiency overhead transition”.]

Our promising results suggest that there is an important opportunity for architecture and circuit co-design of integrated voltage regulation in future systems.

8 Acknowledgments

This work was supported in part by NSF CAREER Award #1149464, a Spork Fellowship, and donations from Intel Corporation and Synopsys, Inc. The authors acknowledge and thank Ji Kim for his help in developing the instruction-based energy model, and Derek Lockhart, Shreesha Srinath, and Pol Rosello for their help in writing multicore application kernels.

Publication: Appears in the Proceedings of the 47th Int’l Symp. on Microarchitecture (MICRO-47), Dec. 2014.
URL: http://www.csl.cornell.edu/~cbatten/pdfs/godycki-rpdn-micro2014.pdf
Contact Author: Christopher Torng, 364 Upson Hall, Ithaca, NY 14853, [email protected]
