31
A Scalable Priority-Aware Approach to Managing Data Center Server Power Yang Li, Charles R. Lefurgy, Karthick Rajamani, Malcolm S. Allen-Ware, Guillermo J. Silva, Daniel D. Heimsoth, Saugata Ghose, Onur Mutlu

A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

Embed Size (px)

Citation preview

Page 1: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

A Scalable Priority-Aware Approach toManaging Data Center Server Power

Yang Li, Charles R. Lefurgy, Karthick Rajamani, Malcolm S. Allen-Ware, Guillermo J. Silva, Daniel D. Heimsoth, Saugata Ghose, Onur Mutlu

Page 2: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

2

§ Critical impact on availability:

§ Significant capital cost:

Importance of Data Center Power Infrastructure

By Andy Patrizio, Network World, 2018

Numerous service down due to power outage

Tens of millions of US dollars

Page 3: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

3

§ Considering peak demand, not typical demand for capacity sizing

§ Employing redundant power infrastructures

§ Power capacity is sized for 2X peak demand!

Underutilized Data Center Power Infrastructure

0 20 40 60 80 100CPU Utilization (%)

0

1

2

3

4

5

Frac

tion

of T

ime

(%)

Reproduced from Barroso et al., The Data Center as a Computer, 2013.

Page 4: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

4

§ Why not add more servers to boost data center performance?

§ During normal operation, let these servers utilize the spare capacity§ During the rare worst cases, let a power management system kick in

– Throttle the servers with less important workloads (within seconds)– Protect the circuit breakers from tripping

§ Problem:Build a data center power management system using real-time control

Boosting Data Center Performance

+X%

Page 5: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

5

§ Problem

§ Background§ Current Practice & Design Challenges

§ Key Solutions

§ Implementations

§ Evaluations

§ Open Challenges

§ Conclusions

Outline

Page 6: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

6

We need to protect:– DC Contractual budget– Circuit breakers of UPS,

RPP, CDU

Layout for Data Center Power Infrastructure

Rack

ATS Generator

UPS

RPP

CDU

ServerPower Supply

Power SupplyServer

Power Supply

Power Supply

UPS

RPP

CDU . . .

. . .

. . .

. . .

Utility ATS

UPS

RPP

CDU

UPS

RPP

CDU. . .

. . .

. . .

Utility

A-Side Power Feed B-Side Power Feed

CircuitBreaker

Page 7: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

7

§ Problem

§ Background

§ Current Practice & Design Challenges§ Key Solutions

§ Implementations

§ Evaluations

§ Open Challenges

§ Conclusions

Outline

Page 8: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

8

§ Design Challenge #1:

Consider the redundant connections

§ Design Challenge #2:

Enforce priority-aware power cappingglobally

Hierarchical Control Framework

ShiftingController

ShiftingController

ShiftingController

CappingController

ShiftingControllerATS

UPS

RPP

CDU

ServerPower Supply

Power Supply

UPS

RPP

CDU . . .

. . .

. . .

. . .

CircuitBreaker

Contractual Budget

Page 9: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

9

§ Servers do not split power equally§ Need to enforce different power caps for different supplies

Design Challenges #1: Consider Redundant Connections

ServerPower Supply

Power Supply

A-side Power Feed

B-side Power Feed

400 W demand

230 WA-side demand

170 WB-side demand 0 5 10 15 20

Rack ID

-400

-200

0

200

400

Pow

er M

ism

atch

bet

wee

n C

DU

s (W

)

Page 10: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

10

§ Servers do not split power equally§ Need to enforce different power caps for different supplies§ Prior works can only enforce a single combined power cap§ It is desirable to utilize the stranded power

Design Challenges #1: Consider Redundant Connections

ServerPower Supply

Power Supply

A-side Power Feed

B-side Power Feed

400 W demand

230 WA-side demand

170 WB-side demand

400 W combined cap

170 W A-side cap

230 W B-side cap

300 W cap

170 W A-side power

130 W B-side power

100 W stranded power

Cap Violation

Page 11: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

11

§ Servers do not split power equally§ Need to enforce different power caps for different supplies§ Prior works can only enforce a single combined power cap§ It is desirable to utilize the stranded power

Design Challenges #1: Consider Redundant Connections

ServerPower Supply

Power Supply

A-side Power Feed

B-side Power Feed

400 W demand

230 WA-side demand

170 WB-side demand

400 W combined cap

170 W A-side cap

230 W B-side cap

300 W cap

170 W A-side power

130 W B-side power

100 W stranded power

Cap Violation

Design a capping controller to enforce caps for each power supply;Design a shifting controller to utilize stranded power

Page 12: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

12

§ Global Priority-Aware: Always cap low-priority servers before high-priority servers across the entire data center

§ Prior works can only enforce priority locally within rack

Design Challenges #2: Enforce Priority Globally

Top CB (Limit: 1400W)

Left CB (Limit: 750W) Right CB (Limit: 750W)

SD (Low Priority)

1240W Budget

SC (Low Priority)SB (Low Priority)SA (High Priority)

Demand 430 W 430 W 430 W 430 WBudget with

Local Priority350 W 270 W 310 W 310 W

Budget withGlobal Priority

430 W 270 W 270 W 270 W

Minimum server power cap!"#$"#%&'()*+ = 2700

Page 13: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

13

§ Global Priority-Aware: Always cap low-priority servers before high-priority servers across the entire data center

§ Prior works can only enforce priority locally within rack

Design Challenges #2: Enforce Priority Globally

Top CB (Limit: 1400W)

Left CB (Limit: 750W) Right CB (Limit: 750W)

SD (Low Priority)

1240W Budget

SC (Low Priority)SB (Low Priority)SA (High Priority)

Demand 430 W 430 W 430 W 430 WBudget with

Local Priority350 W 270 W 310 W 310 W

Budget withGlobal Priority

430 W 270 W 270 W 270 W

Design a shifting controller to perform global priority-aware power capping

Minimum server power cap!"#$"#%&'()*+ = 2700

Page 14: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

14

§ Problem

§ Background

§ Current Practice & Design Challenges

§ Key Solutions§ Implementations

§ Evaluations

§ Open Challenges

§ Conclusions

Outline

Page 15: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

15

Overview of CapMaestro

ShiftingController

ShiftingController

ShiftingController

CappingController

ShiftingControllerATS

UPS

RPP

CDU

ServerPower Supply

Power Supply

UPS

RPP

CDU . . .

. . .

. . .

. . .

CircuitBreaker

Contractual Budget

ShiftingController

ShiftingController

ShiftingControllerUPS

RPP

CDU. . .

. . .

. . .

CircuitBreaker

For A-side Feed For B-side Feed

Enforce cap for individual supplies

Enforce global priorityaware power capping

Page 16: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

16

§ Goal: Ensure all individual power supply caps are respected, while allowing as much power consumption as possible

§ Key Idea: – Monitor the minimum error between power caps and power consumption– Employ a Proportional-Integral (PI) controller

Capping Controller for Servers with Multiple Power Supplies

+

×"×#minimum

Budget1(t)

PS_power_in1(t) (AC W)

bound

DC capper PS1

PSM

Server

+BudgetM(t)

-

-

DC cap min

DC cap max

DC load

Throttle(t)error(t)

cap (DC W)

PS_power_inM(t) (AC W)

$�

1

2 3 4

Page 17: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

17

§ Key Idea: Power Metric Summary– Summarize metrics for servers by priority under each shifting controller – Use metrics to budget from high priority to low priority

§ Power metrics (at each shifting controller): For each priority !,– "#$%_'() ! : The minimum total power budget

– "+,'$)+ ! : The total power demand– "-,./,01 ! : The requested power budget

"#2)01-$()1: The maximum power budget for all the servers (regardless of priority)

Global Priority-Aware Power Capping

Fulfillable power with respect to other prioritiesà

à May not be fulfillable

Page 18: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

18

Global Priority-Aware Power Capping

Top CB (Limit: 1400W)

Left CB (Limit: 750W)

Right CB(Limit: 750W)

SD (Low Priority)

1240W Budget

SC (Low Priority)SB (Low Priority)SA (High Priority)

Demand 430 W 430 W 430 W 430 W

High Low!"#$_&'( 0 540

!)*&#() 0 860!+*,-*./ 0 750

!"0(./+#'(/ 980

High Low!"#$_&'( 270 270

!)*&#() 430 430

!+*,-*./ 430 320

!"0(./+#'(/ 980

High Low!"#$_&'( 270 810

!)*&#() 430 1290!+*,-*./ 430 970

!"0(./+#'(/ 1400

High Low430 270

High Low0 540

430 270 270 270

Minimum server power cap123423!"#$567 = 270<

Maximum server power cap123423!"#$5=> = 490<

Page 19: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

19

§ Rigorous theoretical proof:

– Servers with high priorities are always throttled after servers with lower

priorities, as long as the circuit breaker limits allow

– See our IBM technical report

§ Good scalability:

– Linear algorithm complexity at each controller

– Fixed ratio of overhead to # servers

Global Priority-Aware Power Capping

Page 20: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

20

§ Problem

§ Background

§ Current Practice & Design Challenges

§ Key Solutions

§ Implementations§ Evaluations

§ Open Challenges

§ Conclusions

Outline

Page 21: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

21

§ Implemented as a first-of-a-kind cloud service

§ Controllers are packaged into containers

Implementations

Page 22: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

22

Implementations

Room-LevelWorker

Rack-LevelWorker

ShiftingController

ShiftingController

ShiftingController

CappingController

ShiftingControllerATS

UPS

RPP

CDU

ServerPower Supply

Power Supply

UPS

RPP

CDU . . .

. . .

. . .

. . .

CircuitBreaker

Contractual Budget

ShiftingController

ShiftingController

ShiftingControllerUPS

RPP

CDU. . .

. . .

. . .

CircuitBreaker

For A-side Feed For B-side Feed

Page 23: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

23

§ Implemented as a first-of-a-kind cloud service

§ Controllers are packaged into Worker containers

§ Power controller measures and controls power at 8 second intervals

§ Employ Intel Node Manager as the underlining server throttling engine– Cap power within 1 second– Measure power supplies every 1 second

§ Our power control framework supports dynamic server priorities

Implementations

®

Page 24: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

24

§ Problem

§ Background

§ Current Practice & Design Challenges

§ Key Solutions

§ Implementations

§ Evaluations§ Open Challenges

§ Conclusions

Outline

Page 25: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

25

§ Performed several real-system experiments demonstrating that:– Our capping controllers successfully enforce power caps for individual supplies– Our shifting controllers ensure servers with high priority are throttled after

servers with lower priorities– Our shifting controllers reallocate stranded power to the underpowered servers

§ Performed a large-scale data center simulation demonstrating that:– Compared to the case of no power management system, our system enables

the data center to deploy 50% more servers

Evaluations

Page 26: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

26

§ Simulate a data center of 162 racks; 30% of servers are assigned high priority § Cap Ratio for high-priority servers under a power emergency:

§ Using 1% cap ratio as threshold, our system supports 5832 servers, while Local Priority only supports 4860 servers

Key Results

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0 1K 2K 3K 4K 5K 6K 7K 8K

Cap

Rat

io

Number of Servers

No Priority

Local Priority

Global Priority

Full performance à

Throttled performance à

Page 27: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

27

§ Problem

§ Background

§ Current Practice & Design Challenges

§ Key Solutions

§ Implementations

§ Evaluations

§ Open Challenges§ Conclusions

Outline

Page 28: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

28

§ Broadening the target of power management:– Comprehensive power capping scheme for more system components, such

as GPU, FPGA, storage, and networking

§ Coordination of job scheduling with power management:– Controlling server power by controlling the number of jobs scheduled– Fine-grained job-level power control for jobs collocated on the same server

§ Crossing provider-user boundaries for energy savings:– Cloud providers need to make the benefits of energy savings visible to users– Need to overcome the issue of per-user power metering

Open Challenges

Page 29: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

29

§ Problem

§ Background

§ Current Practice & Design Challenges

§ Key Solutions

§ Implementations

§ Evaluations

§ Open Challenges

§ Conclusions

Outline

Page 30: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

30

§ Data center power capacity is heavily underutilized

§ Utilize this spare capacity by adding additional servers

§ Employ a power management system to deal with power emergencies or faults– Deal with power feeds– Enforce priority-aware power capping globally across the entire data center– Utilize the stranded power

§ Our power management system boosts data center server capacity by 50%

§ Highlight other open challenges in data center power management

Conclusions

Page 31: A Scalable Priority-Aware Approach to Managing Data Center ... · §Implemented as a first-of-a-kind cloud service §Controllers are packaged into Worker containers §Power controller

31

Thank you!