Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound. HPPAC 2012. Barry Rountree, Dong H. Ahn, Bronis R. de Supinski, David K. Lowenthal, Martin Schulz. Monday, May 21st. Computing under a power bound forces us to rethink performance.
LLNL-PRES-552151
This work has been authored by Lawrence Livermore National Security, LLC under contract DE-AC52-07NA27344 with the U.S. Department of Energy. Accordingly, the United States Government retains and the publisher, by accepting this work for dissemination, acknowledges that the United States Government retains a non-exclusive, paid up, irrevocable, world-wide license to publish or reproduce the disseminated form of this work or allow others to do so, for United States Government purposes.
Beyond DVFS: A First Look at Performance Under a Hardware-Enforced Power Bound
HPPAC 2012
Barry Rountree, Dong H. Ahn, Bronis R. de Supinski, David K. Lowenthal, Martin Schulz
Monday, May 21st
Lawrence Livermore National Laboratory LLNL-PRES-552151
Computing under a power bound forces us to rethink performance

Traditional
• All components can operate at highest power level simultaneously
• Power provisioned for “worst case”
• Users are happily oblivious (about power)
• Few if any applications limited by power

Exascale (if not sooner)
• Not all components can operate at highest power level simultaneously
• Power provisioning is best effort
• Users must tune power for performance
• Nearly every application limited by power
Computing under a power bound forces us to rethink performance

Traditional
• Utilization measured in node-hours
• Weak-scaling jobs perform best using as many nodes as possible
• Running all components as fast as possible reliably leads to top performance

Exascale (if not sooner)
• Utilization measured in kilowatt-hours
• Weak-scaling jobs may perform optimally with fewer, faster nodes
• Running all components as fast as possible cannot be done. Running most components at identical speeds is suboptimal
An Unexpected Power Bound: Merlot cluster at LLNL

[Figure: power (watts) per processor on rzmerl running Linpack with Intel Turbo Boost; frequency scale from non-turbo (2.6 GHz) to max turbo (3.3 GHz). Bound lines are labeled rzmerl (early April), rzmerl (mid April), and exascale (?).]

Average Processor Power Bound: the sum of processor power draw divided by processor count must be at or below this level.
• Each processor uses some amount of power
• Total processor power divided by processor count should be less than the bound

• Short-term solution: disable Turbo Boost globally (lost performance)
• Mid-term solution: buy more power (this does not scale)
• Long-term solution: schedule power to optimize performance
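The average processor power bound on this slide can be sketched as a simple check. This is a minimal illustration, not code from the talk: the function names and the sample readings are hypothetical.

```python
def within_average_bound(processor_watts, bound_watts):
    """Return True if total power divided by processor count is at or below the bound."""
    if not processor_watts:
        raise ValueError("need at least one processor reading")
    return sum(processor_watts) / len(processor_watts) <= bound_watts


def headroom_watts(processor_watts, bound_watts):
    """Watts that could still be handed out across all processors (negative if over the bound)."""
    return bound_watts * len(processor_watts) - sum(processor_watts)


if __name__ == "__main__":
    # Hypothetical per-processor readings in watts, not measured data.
    draws = [80.0, 95.0, 70.0, 75.0]
    print(within_average_bound(draws, 80.0))
    print(headroom_watts(draws, 80.0))
```

Because the bound applies to the average, one processor may draw more than the bound as long as others draw less, which is what makes scheduling power across processors possible.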
Scheduling Power with Processor Hardware: Intel’s RAPL

Running Average Power Limit (RAPL)
• Measures cumulative joules (power × time)
• Three separate power meters
• Clamping on package and DRAM power

Turbo suppression
Effective frequency
libmsr currently under development
Domains and Features of Running Average Power Limit Technology
Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B

• Introduced on Sandy Bridge processors
• Onboard energy meters measure accumulated joules.
• Divide by time to get average power.
• Can place user-specified limit on average power over a user-specified time window.
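Turning accumulated joules into average power can be sketched as follows. This is an illustration under stated assumptions: the caller supplies two raw counter readings and the joules-per-count unit, and the counter is assumed to be 32 bits wide and to wrap at most once between readings. The function name and parameters are hypothetical and are not the libmsr interface.

```python
def average_power_watts(raw_start, raw_end, elapsed_s, energy_unit_j, counter_bits=32):
    """Average power between two raw energy-counter readings.

    energy_unit_j is the number of joules represented by one counter tick
    (assumed to be supplied by the caller). counter_bits is an assumption.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    # A modular difference gives the right answer across a single wraparound.
    delta_ticks = (raw_end - raw_start) % (1 << counter_bits)
    return delta_ticks * energy_unit_j / elapsed_s


if __name__ == "__main__":
    unit = 2.0 ** -16  # hypothetical joules per tick
    print(average_power_watts(0, 50 * 65536, 1.0, unit))
```

The sampling interval has to be short enough that the counter cannot wrap more than once, otherwise energy is silently undercounted.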
Bounding Package Power with RAPL
Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B

• Setting LOCK fixes power limits until reboot
• Limits are ignored until enable bits are set
• Power limit is enforced using average watts over a user-specified window

• Resolution: ~1 ms
• Max window: ~46 ms
• Watts granularity: 0.125 W
• Minimum power bound: 51 W

Two windows allow tweaking peak and average power
• Higher bound, smaller window for peak power
• Lower bound, wider window for average power
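The granularity, minimum bound, and two-window guidance from this slide can be sketched as a small validation helper. The constants come from the slide; the function names are hypothetical, and the sketch deliberately stops short of encoding register fields or writing to hardware.

```python
import math

WATTS_STEP = 0.125     # watts granularity quoted on the slide
MIN_BOUND_W = 51.0     # minimum power bound quoted on the slide
MAX_WINDOW_MS = 46.0   # approximate maximum window quoted on the slide


def to_limit_units(watts):
    """Convert a bound in watts to whole 0.125 W steps, rounding down."""
    if watts < MIN_BOUND_W:
        raise ValueError("bound is below the minimum power bound")
    return math.floor(watts / WATTS_STEP)


def check_two_windows(peak_w, peak_ms, avg_w, avg_ms):
    """True if the peak limit has the higher bound and smaller window,
    and the average limit has the lower bound and wider window."""
    return (peak_w >= avg_w
            and 0 < peak_ms <= avg_ms
            and avg_ms <= MAX_WINDOW_MS)


if __name__ == "__main__":
    print(to_limit_units(65.0))
    print(check_two_windows(80.0, 1.0, 65.0, 40.0))
```

Rounding down is a design choice here: it keeps the programmed limit at or below the requested bound rather than slightly above it.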
Bounding DRAM Power with RAPL
Similar interface for DRAM power control
Only one power limit supported
Source: Intel 64 and IA-32 Software Developer’s Manual, Volume 3B
Processors are Heterogeneous Under a Power Bound

rzzin, mg.C.8, 64 processors, 34 power bounds

No Power Bound
• Processors take similar time
• Significant variation in power
• Power variation expected and acceptable

51W Power Bound
• Processors require same amount of power
• Individual processor efficiency has not changed
• Efficiency variation manifests as performance variation

Processors are heterogeneous under a power bound
• Where should the hot processors go?
• Is it worth paying a premium for efficient processors?
Wide Variation in Application Package Power Draw

[Figure: average watts (package) per processor; rzmerl, NPB C.8, 234 processors. Processors ordered by cg.C.8 average PKG power.]

• Wide variation in power consumption across applications
• Provisioning power for most power-hungry application leaves remaining applications node-bound, not power-bound
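The distinction between node-bound and power-bound can be sketched with a small classifier. It is a simplification: it assumes every node running a given application draws the same power, and the function name and numbers are hypothetical.

```python
def classify(power_budget_w, watts_per_node, nodes_available):
    """Return (label, usable_nodes) for one application.

    An application is power-bound if the power budget runs out before the
    nodes do, and node-bound otherwise.
    """
    if watts_per_node <= 0:
        raise ValueError("watts_per_node must be positive")
    affordable = int(power_budget_w // watts_per_node)
    if affordable < nodes_available:
        return "power-bound", affordable
    return "node-bound", nodes_available


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 W budget shared by up to 8 nodes.
    print(classify(1000.0, 150.0, 8))  # power-hungry application
    print(classify(1000.0, 100.0, 8))  # less power-hungry application
```

With a budget sized for the most power-hungry application, every less hungry application lands in the node-bound case and leaves provisioned power unused.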
Wide Variation in Application DRAM Power Draw

[Figure: average watts (DRAM) per processor; rzmerl, NPB C.8, 234 processors. Processors ordered by cg.C.8 average PKG power.]

• Memory power substantially lower than package power
Exascale Is Not Only Bigger: Exascale Is Fundamentally Different

Overprovision hardware
• Processors are cheap and plentiful
• Power is not

Measure performance at max power consumption
• May require turning off nodes
• Running out of nodes before running out of power means application is not power-bound

Expect heterogeneous processor performance
• Put most-efficient nodes on the critical path if possible
• Put least-efficient nodes where they will do the least harm
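The placement advice for heterogeneous processors can be sketched as a greedy mapping. It is illustrative only: it assumes each node has a measured run time under the power bound (lower means more efficient) and each rank has a criticality score, and both inputs and the function name are hypothetical.

```python
def place_ranks(node_times, rank_criticality):
    """Map each rank to a node so the most critical ranks get the most efficient nodes.

    node_times: dict of node name -> run time under the power bound (lower is better)
    rank_criticality: dict of rank -> score (higher is closer to the critical path)
    """
    if len(node_times) < len(rank_criticality):
        raise ValueError("not enough nodes for the ranks")
    nodes_fastest_first = sorted(node_times, key=node_times.get)
    ranks_most_critical_first = sorted(
        rank_criticality, key=rank_criticality.get, reverse=True
    )
    # Leftover (least efficient) nodes stay unassigned.
    return dict(zip(ranks_most_critical_first, nodes_fastest_first))


if __name__ == "__main__":
    times = {"n0": 12.0, "n1": 10.0, "n2": 11.0}  # hypothetical seconds
    criticality = {0: 1, 1: 3, 2: 2}              # hypothetical scores
    print(place_ranks(times, criticality))
```

When more nodes are available than ranks, the least efficient nodes are simply left out, which matches the idea of overprovisioning hardware and turning some nodes off.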