Upload
nguyenbao
View
217
Download
0
Embed Size (px)
Citation preview
A Scalable Priority-Aware Approach toManaging Data Center Server Power
Yang Li, Charles R. Lefurgy, Karthick Rajamani, Malcolm S. Allen-Ware, Guillermo J. Silva, Daniel D. Heimsoth, Saugata Ghose, Onur Mutlu
2
§ Critical impact on availability:
§ Significant capital cost:
Importance of Data Center Power Infrastructure
By Andy Patrizio, Network World, 2018
Numerous service down due to power outage
Tens of millions of US dollars
3
§ Considering peak demand, not typical demand for capacity sizing
§ Employing redundant power infrastructures
§ Power capacity is sized for 2X peak demand!
Underutilized Data Center Power Infrastructure
0 20 40 60 80 100CPU Utilization (%)
0
1
2
3
4
5
Frac
tion
of T
ime
(%)
Reproduced from Barroso et al., The Data Center as a Computer, 2013.
4
§ Why not add more servers to boost data center performance?
§ During normal operation, let these servers utilize the spare capacity§ During the rare worst cases, let a power management system kick in
– Throttle the servers with less important workloads (within seconds)– Protect the circuit breakers from tripping
§ Problem:Build a data center power management system using real-time control
Boosting Data Center Performance
+X%
5
§ Problem
§ Background§ Current Practice & Design Challenges
§ Key Solutions
§ Implementations
§ Evaluations
§ Open Challenges
§ Conclusions
Outline
6
We need to protect:– DC Contractual budget– Circuit breakers of UPS,
RPP, CDU
Layout for Data Center Power Infrastructure
Rack
ATS Generator
UPS
RPP
CDU
ServerPower Supply
Power SupplyServer
Power Supply
Power Supply
UPS
RPP
CDU . . .
. . .
. . .
. . .
Utility ATS
UPS
RPP
CDU
UPS
RPP
CDU. . .
. . .
. . .
Utility
A-Side Power Feed B-Side Power Feed
CircuitBreaker
7
§ Problem
§ Background
§ Current Practice & Design Challenges§ Key Solutions
§ Implementations
§ Evaluations
§ Open Challenges
§ Conclusions
Outline
8
§ Design Challenge #1:
Consider the redundant connections
§ Design Challenge #2:
Enforce priority-aware power cappingglobally
Hierarchical Control Framework
ShiftingController
ShiftingController
ShiftingController
CappingController
ShiftingControllerATS
UPS
RPP
CDU
ServerPower Supply
Power Supply
UPS
RPP
CDU . . .
. . .
. . .
. . .
CircuitBreaker
Contractual Budget
9
§ Servers do not split power equally§ Need to enforce different power caps for different supplies
Design Challenges #1: Consider Redundant Connections
ServerPower Supply
Power Supply
A-side Power Feed
B-side Power Feed
400 W demand
230 WA-side demand
170 WB-side demand 0 5 10 15 20
Rack ID
-400
-200
0
200
400
Pow
er M
ism
atch
bet
wee
n C
DU
s (W
)
10
§ Servers do not split power equally§ Need to enforce different power caps for different supplies§ Prior works can only enforce a single combined power cap§ It is desirable to utilize the stranded power
Design Challenges #1: Consider Redundant Connections
ServerPower Supply
Power Supply
A-side Power Feed
B-side Power Feed
400 W demand
230 WA-side demand
170 WB-side demand
400 W combined cap
170 W A-side cap
230 W B-side cap
300 W cap
170 W A-side power
130 W B-side power
100 W stranded power
Cap Violation
11
§ Servers do not split power equally§ Need to enforce different power caps for different supplies§ Prior works can only enforce a single combined power cap§ It is desirable to utilize the stranded power
Design Challenges #1: Consider Redundant Connections
ServerPower Supply
Power Supply
A-side Power Feed
B-side Power Feed
400 W demand
230 WA-side demand
170 WB-side demand
400 W combined cap
170 W A-side cap
230 W B-side cap
300 W cap
170 W A-side power
130 W B-side power
100 W stranded power
Cap Violation
Design a capping controller to enforce caps for each power supply;Design a shifting controller to utilize stranded power
12
§ Global Priority-Aware: Always cap low-priority servers before high-priority servers across the entire data center
§ Prior works can only enforce priority locally within rack
Design Challenges #2: Enforce Priority Globally
Top CB (Limit: 1400W)
Left CB (Limit: 750W) Right CB (Limit: 750W)
SD (Low Priority)
1240W Budget
SC (Low Priority)SB (Low Priority)SA (High Priority)
Demand 430 W 430 W 430 W 430 WBudget with
Local Priority350 W 270 W 310 W 310 W
Budget withGlobal Priority
430 W 270 W 270 W 270 W
Minimum server power cap!"#$"#%&'()*+ = 2700
13
§ Global Priority-Aware: Always cap low-priority servers before high-priority servers across the entire data center
§ Prior works can only enforce priority locally within rack
Design Challenges #2: Enforce Priority Globally
Top CB (Limit: 1400W)
Left CB (Limit: 750W) Right CB (Limit: 750W)
SD (Low Priority)
1240W Budget
SC (Low Priority)SB (Low Priority)SA (High Priority)
Demand 430 W 430 W 430 W 430 WBudget with
Local Priority350 W 270 W 310 W 310 W
Budget withGlobal Priority
430 W 270 W 270 W 270 W
Design a shifting controller to perform global priority-aware power capping
Minimum server power cap!"#$"#%&'()*+ = 2700
14
§ Problem
§ Background
§ Current Practice & Design Challenges
§ Key Solutions§ Implementations
§ Evaluations
§ Open Challenges
§ Conclusions
Outline
15
Overview of CapMaestro
ShiftingController
ShiftingController
ShiftingController
CappingController
ShiftingControllerATS
UPS
RPP
CDU
ServerPower Supply
Power Supply
UPS
RPP
CDU . . .
. . .
. . .
. . .
CircuitBreaker
Contractual Budget
ShiftingController
ShiftingController
ShiftingControllerUPS
RPP
CDU. . .
. . .
. . .
CircuitBreaker
For A-side Feed For B-side Feed
Enforce cap for individual supplies
Enforce global priorityaware power capping
16
§ Goal: Ensure all individual power supply caps are respected, while allowing as much power consumption as possible
§ Key Idea: – Monitor the minimum error between power caps and power consumption– Employ a Proportional-Integral (PI) controller
Capping Controller for Servers with Multiple Power Supplies
+
×"×#minimum
Budget1(t)
PS_power_in1(t) (AC W)
bound
DC capper PS1
PSM
Server
+BudgetM(t)
-
-
DC cap min
DC cap max
DC load
Throttle(t)error(t)
cap (DC W)
PS_power_inM(t) (AC W)
$�
�
1
2 3 4
17
§ Key Idea: Power Metric Summary– Summarize metrics for servers by priority under each shifting controller – Use metrics to budget from high priority to low priority
§ Power metrics (at each shifting controller): For each priority !,– "#$%_'() ! : The minimum total power budget
– "+,'$)+ ! : The total power demand– "-,./,01 ! : The requested power budget
"#2)01-$()1: The maximum power budget for all the servers (regardless of priority)
Global Priority-Aware Power Capping
Fulfillable power with respect to other prioritiesà
à May not be fulfillable
18
Global Priority-Aware Power Capping
Top CB (Limit: 1400W)
Left CB (Limit: 750W)
Right CB(Limit: 750W)
SD (Low Priority)
1240W Budget
SC (Low Priority)SB (Low Priority)SA (High Priority)
Demand 430 W 430 W 430 W 430 W
High Low!"#$_&'( 0 540
!)*&#() 0 860!+*,-*./ 0 750
!"0(./+#'(/ 980
High Low!"#$_&'( 270 270
!)*&#() 430 430
!+*,-*./ 430 320
!"0(./+#'(/ 980
High Low!"#$_&'( 270 810
!)*&#() 430 1290!+*,-*./ 430 970
!"0(./+#'(/ 1400
High Low430 270
High Low0 540
430 270 270 270
Minimum server power cap123423!"#$567 = 270<
Maximum server power cap123423!"#$5=> = 490<
19
§ Rigorous theoretical proof:
– Servers with high priorities are always throttled after servers with lower
priorities, as long as the circuit breaker limits allow
– See our IBM technical report
§ Good scalability:
– Linear algorithm complexity at each controller
– Fixed ratio of overhead to # servers
Global Priority-Aware Power Capping
20
§ Problem
§ Background
§ Current Practice & Design Challenges
§ Key Solutions
§ Implementations§ Evaluations
§ Open Challenges
§ Conclusions
Outline
21
§ Implemented as a first-of-a-kind cloud service
§ Controllers are packaged into containers
Implementations
22
Implementations
Room-LevelWorker
Rack-LevelWorker
ShiftingController
ShiftingController
ShiftingController
CappingController
ShiftingControllerATS
UPS
RPP
CDU
ServerPower Supply
Power Supply
UPS
RPP
CDU . . .
. . .
. . .
. . .
CircuitBreaker
Contractual Budget
ShiftingController
ShiftingController
ShiftingControllerUPS
RPP
CDU. . .
. . .
. . .
CircuitBreaker
For A-side Feed For B-side Feed
23
§ Implemented as a first-of-a-kind cloud service
§ Controllers are packaged into Worker containers
§ Power controller measures and controls power at 8 second intervals
§ Employ Intel Node Manager as the underlining server throttling engine– Cap power within 1 second– Measure power supplies every 1 second
§ Our power control framework supports dynamic server priorities
Implementations
®
24
§ Problem
§ Background
§ Current Practice & Design Challenges
§ Key Solutions
§ Implementations
§ Evaluations§ Open Challenges
§ Conclusions
Outline
25
§ Performed several real-system experiments demonstrating that:– Our capping controllers successfully enforce power caps for individual supplies– Our shifting controllers ensure servers with high priority are throttled after
servers with lower priorities– Our shifting controllers reallocate stranded power to the underpowered servers
§ Performed a large-scale data center simulation demonstrating that:– Compared to the case of no power management system, our system enables
the data center to deploy 50% more servers
Evaluations
26
§ Simulate a data center of 162 racks; 30% of servers are assigned high priority § Cap Ratio for high-priority servers under a power emergency:
§ Using 1% cap ratio as threshold, our system supports 5832 servers, while Local Priority only supports 4860 servers
Key Results
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0 1K 2K 3K 4K 5K 6K 7K 8K
Cap
Rat
io
Number of Servers
No Priority
Local Priority
Global Priority
Full performance à
Throttled performance à
27
§ Problem
§ Background
§ Current Practice & Design Challenges
§ Key Solutions
§ Implementations
§ Evaluations
§ Open Challenges§ Conclusions
Outline
28
§ Broadening the target of power management:– Comprehensive power capping scheme for more system components, such
as GPU, FPGA, storage, and networking
§ Coordination of job scheduling with power management:– Controlling server power by controlling the number of jobs scheduled– Fine-grained job-level power control for jobs collocated on the same server
§ Crossing provider-user boundaries for energy savings:– Cloud providers need to make the benefits of energy savings visible to users– Need to overcome the issue of per-user power metering
Open Challenges
29
§ Problem
§ Background
§ Current Practice & Design Challenges
§ Key Solutions
§ Implementations
§ Evaluations
§ Open Challenges
§ Conclusions
Outline
30
§ Data center power capacity is heavily underutilized
§ Utilize this spare capacity by adding additional servers
§ Employ a power management system to deal with power emergencies or faults– Deal with power feeds– Enforce priority-aware power capping globally across the entire data center– Utilize the stranded power
§ Our power management system boosts data center server capacity by 50%
§ Highlight other open challenges in data center power management
Conclusions
31
Thank you!