Computer Structure 2012 – Power Management 1
Computer Structure
Power Management
Lihu Rappoport and Adi Yoaz
Thanks to Efi Rotem for many of the foils
Processor Power Components
· The power consumed by a processor consists of
  – Dynamic power: power for toggling transistors and lines from 0→1 or 1→0
    $P_{dyn} = \alpha C V^2 f$, where α – activity, C – capacitance, V – voltage, f – frequency
  – Leakage power: leakage of transistors under voltage
    A function of: Z – total size of all transistors, V – voltage, t – temperature
· Peak power must not exceed the thermal constraints
  – Power generates heat
    Heat must be dissipated to keep transistors within the allowed temperature
  – Peak power determines peak frequency (and thus peak performance)
  – Also affects form factor, cooling solution cost, and acoustic noise
· Average power
  – Determines battery life (for mobile devices), electricity bill, air-conditioning bill
  – Average power = Total Energy / Total time
    Including low-activity and idle time (~90% idle time for client)
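The two formulas on this foil can be tried out with a few lines of Python. This is only a sketch: the activity factor, capacitance, voltage, frequency, and the 1 W idle power are hypothetical values chosen to give round numbers, not figures from the lecture.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic power P = alpha * C * V^2 * f, in watts."""
    return alpha * capacitance * voltage ** 2 * frequency


def average_power(intervals):
    """Average power = total energy / total time.

    intervals: list of (power_watts, duration_seconds) pairs.
    """
    total_energy = sum(power * duration for power, duration in intervals)
    total_time = sum(duration for _, duration in intervals)
    return total_energy / total_time


# Hypothetical operating point: alpha = 0.2, C = 50 nF, V = 1.0 V, f = 2 GHz.
p_active = dynamic_power(alpha=0.2, capacitance=50e-9, voltage=1.0, frequency=2e9)

# Hypothetical client profile: 10% of the time active, 90% idle at 1 W.
p_avg = average_power([(p_active, 1.0), (1.0, 9.0)])

print(f"active power:  {p_active:.1f} W")
print(f"average power: {p_avg:.1f} W")
```

With these assumed numbers the long idle share dominates: the average lands far below the active power, which is why idle power matters so much for battery life.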
Performance per Watt
· In small form-factor devices thermal budget limits performance
  – Old target: get max performance
  – New target: get max performance at a given power envelope
    ⇒ Performance per Watt
· Increasing f also requires increasing V (~linearly)
  – Dynamic Power = $\alpha C V^2 f = K f^3$ ⇒ X% performance costs ~3X% power
  – A power efficient feature: better than 1:3 performance : power
    Otherwise it is better to just increase frequency (and voltage)
· Vmin is the minimal operation voltage
  – Once at Vmin, reducing frequency no longer reduces voltage
  – At this point a feature is power efficient only if it is 1:1 performance : power
· Active energy efficiency tradeoff
  – $\mathrm{Energy}_{active} = \mathrm{Power}_{active} \times \mathrm{Time}_{active} \propto \mathrm{Power}_{active} / \mathrm{Perf}_{active}$
  – Energy efficient feature: 1:1 performance : power
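The 1:3 and 1:1 rules of thumb above can be checked numerically. The sketch below assumes the idealized model from the foil (voltage scales linearly with frequency above Vmin and is fixed at Vmin); the function names are illustrative, and treating "exactly at the limit" as acceptable is an assumption.

```python
def relative_power(freq_ratio, at_vmin=False):
    """Relative dynamic power after scaling frequency by freq_ratio.

    Above Vmin, V scales ~linearly with f, so P ~ f^3.
    At Vmin the voltage is fixed, so P ~ f.
    """
    return freq_ratio if at_vmin else freq_ratio ** 3


def is_power_efficient(perf_gain_pct, power_cost_pct, at_vmin=False):
    """Compare a feature against simply changing frequency.

    Above Vmin the bar is 1:3 performance : power; at Vmin it is 1:1.
    """
    limit = 1.0 if at_vmin else 3.0
    return power_cost_pct <= limit * perf_gain_pct


# 1% more frequency costs about 3% more power above Vmin.
print(f"{(relative_power(1.01) - 1) * 100:.2f}% extra power")

# A feature giving 1% performance for 2% power is worth it above Vmin,
# but not once the part already runs at Vmin.
print(is_power_efficient(1.0, 2.0))
print(is_power_efficient(1.0, 2.0, at_vmin=True))
```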
Platform Power
· Processor average power is <10% of the platform

Platform power breakdown (pie chart):
  Display (panel + inverter)   33%
  CPU                          10%
  Power Supply                 10%
  MCH                           9%
  Misc.                         8%
  GFX                           8%
  HDD                           8%
  CLK                           5%
  ICH                           3%
  DVD                           2%
  LAN                           2%
  Fan                           2%
Managing Power
· Typical CPU usage varies over time
  – Bursts of high utilization & long idle periods (~90% of time in client)
· Optimize power and energy consumption
  – High power when high performance is needed
  – Low power at low activity or idle
· Enhanced Intel SpeedStep® Technology
  – Multi voltage/frequency operating points
  – OS changes frequency to meet performance needs and minimize power
  – Referred to as processor Performance states = P-States
· OS notifies CPU when no tasks are ready for execution
  – CPU enters sleep state, called C-state
  – Using MWAIT instruction, with C-state level as an argument
  – Tradeoff between power and latency
    Deeper sleep ⇒ more power savings ⇒ longer to wake
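The power/latency tradeoff can be illustrated with a small selection routine: pick the deepest sleep state whose wake-up latency is still tolerable. This is a sketch only; the `CState` type, the `pick_c_state` function, and every power and latency value are hypothetical and do not describe any real processor or OS policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    power_w: float          # hypothetical residual power in this state
    wake_latency_us: float  # hypothetical time needed to return to C0


# Illustrative values only: deeper states save more power but wake more slowly.
C_STATES = [
    CState("C1", power_w=4.0, wake_latency_us=1.0),
    CState("C3", power_w=1.5, wake_latency_us=50.0),
    CState("C6", power_w=0.1, wake_latency_us=200.0),
]


def pick_c_state(latency_tolerance_us, states=C_STATES):
    """Return the lowest-power state whose wake latency fits the tolerance.

    Returns None when no sleep state is fast enough (stay in C0).
    """
    allowed = [s for s in states if s.wake_latency_us <= latency_tolerance_us]
    if not allowed:
        return None
    return min(allowed, key=lambda s: s.power_w)


for tolerance in (0.5, 100.0, 1000.0):
    chosen = pick_c_state(tolerance)
    print(tolerance, chosen.name if chosen else "C0 (no sleep)")
```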
P-states
· Operation frequencies are called P-states = Performance states
  – P0 is the highest frequency
  – P1, P2, P3, … are lower frequencies
  – Pn is the min Vcc point = energy efficient point
· DVFS = Dynamic Voltage and Frequency Scaling
  – $Power = C V^2 f$; $f = K V$ ⇒ $Power \sim f^3$
  – Program execution time $\sim 1/f$
  – $E = P \times t$ ⇒ $E \sim f^2$
    Pn is the most energy efficient point
  – Going up/down the cubic curve of power
    High cost to achieve frequency
    Large power savings for some small frequency reduction
[Figure: power vs. frequency, a cubic curve with operating points Pn, P2, P1, P0 in order of increasing frequency]
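The DVFS relations (power ~ f³, time ~ 1/f, energy ~ f²) can be tabulated directly. The sketch below assumes the idealized V ~ f model holds over the whole range, which is only true above Vmin; `dvfs_point` is an illustrative name.

```python
def dvfs_point(freq_ratio):
    """Relative (power, time, energy) at f = freq_ratio * f0.

    Assumes voltage scales linearly with frequency (valid above Vmin only).
    """
    power = freq_ratio ** 3        # P ~ f^3
    time = 1.0 / freq_ratio        # execution time ~ 1/f
    energy = power * time          # E = P * t ~ f^2
    return power, time, energy


for ratio in (1.0, 0.8, 0.5):
    power, time, energy = dvfs_point(ratio)
    print(f"f x{ratio:.1f}: power x{power:.3f}, time x{time:.2f}, energy x{energy:.2f}")
```

Under this model, halving the frequency doubles the run time but cuts power to one eighth, so the same work takes one quarter of the energy.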
C-States: C0
· C0: CPU active state
[Figure: active core power in C0, made up of local clocks and logic, clock distribution, and leakage]
C-States: C1
· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
[Figure: core power in C1, relative to active core power; clock distribution and leakage remain]
C-States: C3
· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
· C3 state:
  • Stop remaining core clocks
  • Flush internal core caches
[Figure: core power in C3, relative to active core power; only leakage remains]
C-States: C6
· C0: CPU active state
· C1: Halt state:
  • Stop core pipeline
  • Stop most core clocks
  • No instructions are executed
  • Caches respond to external snoops
· C3 state:
  • Stop remaining core clocks
  • Flush internal core caches
· C6 state:
  • Processor saves architectural state
  • Turn off power gate, eliminating leakage
[Figure: core power in C6, relative to active core power; leakage is eliminated and core power goes to ~0]
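The C-state foils remove one power component at a time, which can be written down as a small table. In this sketch the mapping of states to remaining components follows the figures, while the wattages are hypothetical and C6 is idealized to exactly zero.

```python
# Components of active core power, with hypothetical wattages.
COMPONENTS_W = {
    "local clocks and logic": 10.0,
    "clock distribution": 4.0,
    "leakage": 3.0,
}

# Components still drawing power in each state, as shown on the foils.
REMAINING = {
    "C0": ("local clocks and logic", "clock distribution", "leakage"),
    "C1": ("clock distribution", "leakage"),  # pipeline and most clocks stopped
    "C3": ("leakage",),                       # remaining clocks stopped
    "C6": (),                                 # power gated: leakage eliminated
}


def core_power(state):
    """Sum the components that are still powered in the given C-state."""
    return sum(COMPONENTS_W[name] for name in REMAINING[state])


for state in ("C0", "C1", "C3", "C6"):
    print(state, core_power(state), "W")
```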
Putting it all together
· CPU running at max power and frequency
· Periodically enters C1
[Figure: power [W] (0 to 20) vs. time; regions labeled C0 at P0 and C1]
Putting it all together
· Going into idle period
  – Gradually enters deeper C states
  – Controlled by OS
[Figure: power [W] (0 to 20) vs. time; regions labeled C0 at P0, C1, C2, C3, C4]
Putting it all together
· Tracking CPU utilization history
  – OS identifies low activity
  – Switches CPU to lower P state
[Figure: power [W] (0 to 20) vs. time; regions labeled C0 at P0, C1, C2, C3, C4, then C0 at P1]
Putting it all together
· CPU enters Idle state again
[Figure: power [W] (0 to 20) vs. time; regions labeled C0 at P0, C1, C2, C3, C4, then C0 at P1, followed by C2, C3, C4 again]
Putting it all together
· Further lowering the P state
· DVD play runs at lowest P state
[Figure: power [W] (0 to 20) vs. time; regions labeled C0 at P0, C1, C2, C3, C4, then C0 at P1, C2, C3, C4, then C0 at P2]
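The "track utilization history, then switch P-state" behaviour in this sequence can be sketched as a simple threshold policy. This is an illustrative toy, not the algorithm of any particular OS; the `next_p_state` function, the three-entry P-state list, and the 30%/80% thresholds are assumptions.

```python
P_STATES = ["P0", "P1", "P2"]  # index 0 is the highest frequency


def next_p_state(current, utilization_history, low=0.3, high=0.8):
    """Choose the next P-state index from recent utilization samples.

    utilization_history: recent utilization samples in [0, 1].
    low / high: hypothetical thresholds for stepping down / up.
    """
    average = sum(utilization_history) / len(utilization_history)
    if average < low and current < len(P_STATES) - 1:
        return current + 1  # low activity: move to a lower frequency
    if average > high and current > 0:
        return current - 1  # high demand: move to a higher frequency
    return current


state = 0
for history in ([0.9, 0.95], [0.2, 0.1, 0.15], [0.1, 0.05], [0.5, 0.6]):
    state = next_p_state(state, history)
    print(history, "->", P_STATES[state])
```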
Voltage and Frequency Domains
· Two Independent Variable Power Planes
  – CPU cores, ring and LLC
    Embedded power gates: each core can be turned off individually
    Cache power gating: turn off portions or all of the cache at deeper sleep states
  – Graphics processor
    Can be varied or turned off when not active
· Shared frequency for all IA32 cores and ring
· Independent frequency for PG
· Fixed Programmable power plane for System Agent
  – Optimize SA power consumption
  – System On Chip functionality and PCU logic
  – Periphery: DDR, PCIe, Display
[Figure: die power planes; four gated VCC Core domains (embedded power gates), ungated VCC Core, VCC SA, VCC Graphics, and VCC Periphery]
Turbo Mode
· P1 is guaranteed frequency
  – CPU and GFX simultaneous heavy load at worst case conditions
  – Actual power has high dynamic range
· P0 is max possible frequency: the Turbo frequency
  – P1-P0 has significant frequency range (GHz)
    Single thread or lightly loaded applications
    GFX <> CPU balancing
  – OS treats P0 as any other P-state
    Requesting it when it needs more performance
  – P1 to P0 range is fully H/W controlled
    Frequency transitions handled completely in HW
    PCU keeps silicon within existing operating limits
  – Systems designed to same specs, with or without Turbo Mode
· Pn is the energy efficient state
  – Lower than Pn is controlled by Thermal-State
[Figure: frequency axis with levels P0 (1C), P1, Pn and LFM; labels: "Turbo" (H/W control), OS-visible states, OS control, T-state & throttle]
Turbo Mode: workload lightly threaded
· Power gating: zero power for inactive cores
[Figure: frequency (F) of Cores 0 to 3, "No Turbo" vs. "Turbo Mode" panels]
Turbo Mode: workload lightly threaded
· Power gating: zero power for inactive cores
· Turbo Mode: use thermal budget of inactive cores to increase frequency of active cores
[Figure: frequency (F) per core; "No Turbo" panel with Cores 0 to 3, "Turbo Mode" panel with Cores 0 and 1]
Turbo Mode: active cores running workloads < TDP
· Turbo Mode: increase frequency within thermal headroom
[Figure: frequency (F) of Cores 0 to 3, "No Turbo" vs. "Turbo Mode" panels]
Turbo Mode: workload lightly threaded and active cores < TDP
· Power gating: zero power for inactive cores
· Turbo Mode: increase frequency within thermal headroom
[Figure: frequency (F) of Cores 0 to 3, "No Turbo" vs. "Turbo Mode" panels]
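The idea behind these Turbo foils (power-gated cores free up budget that the active cores can spend on frequency) can be sketched as a search over frequency bins. Everything concrete here is an assumption for illustration: the per-core power model P = k·f³, the 32 W budget, and the list of bins. Real hardware uses the PCU's own power and thermal estimates.

```python
def core_power_w(freq_ghz, k=1.0):
    """Hypothetical per-core power model: P = k * f^3 (f in GHz)."""
    return k * freq_ghz ** 3


def turbo_frequency(active_cores, tdp_w, bins_ghz, guaranteed_ghz):
    """Highest frequency bin whose total power for the active cores fits in TDP.

    Inactive cores are assumed power gated (zero power).
    Never returns less than the guaranteed (P1) frequency.
    """
    best = guaranteed_ghz
    for freq in sorted(bins_ghz):
        if freq >= guaranteed_ghz and active_cores * core_power_w(freq) <= tdp_w:
            best = freq
    return best


BINS_GHZ = [2.0, 2.5, 3.0, 3.5]   # hypothetical frequency bins
TDP_W = 32.0                      # hypothetical budget: 4 cores at 2.0 GHz

for cores in (4, 2, 1):
    print(cores, "active cores ->", turbo_frequency(cores, TDP_W, BINS_GHZ, 2.0), "GHz")
```

With these assumed numbers, four active cores stay at the guaranteed 2.0 GHz, while two or one active cores can move up to 2.5 GHz or 3.0 GHz within the same budget.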
Thermal Capacitance
· Classic model: steady-state thermal resistance
  – Design guide for steady state
  [Figure: temperature vs. time, classic model response]
· New model: steady-state thermal resistance AND dynamic thermal capacitance
  – Temperature rises as energy is delivered to the thermal solution
  – Thermal solution response is calculated in real time
  [Figure: temperature vs. time, more realistic response to power changes]
Foil taken from IDF 2011
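The difference between the two models can be made concrete with a first-order thermal RC model, a common textbook approximation. This is a sketch under assumed values: the thermal resistance, thermal capacitance, ambient temperature, and 20 W step are hypothetical, and the foil does not state which model the actual hardware uses.

```python
import math


def temperature_steady(power_w, r_c_per_w, t_ambient_c):
    """Classic model: steady-state temperature T = T_amb + P * R."""
    return t_ambient_c + power_w * r_c_per_w


def temperature_rc(power_w, t_s, r_c_per_w, c_j_per_c, t_ambient_c):
    """Resistance + capacitance model, response to a power step at t = 0.

    T(t) = T_amb + P * R * (1 - exp(-t / (R * C)))
    """
    tau = r_c_per_w * c_j_per_c  # thermal time constant in seconds
    return t_ambient_c + power_w * r_c_per_w * (1.0 - math.exp(-t_s / tau))


# Hypothetical values: R = 2 C/W, C = 5 J/C, ambient 30 C, 20 W step.
R, C, T_AMB, P = 2.0, 5.0, 30.0, 20.0

print("steady state:", temperature_steady(P, R, T_AMB), "C")
for t in (0, 5, 10, 30, 60):
    print(f"t = {t:2d} s: {temperature_rc(P, t, R, C, T_AMB):.1f} C")
```

The capacitance term is what leaves room for short bursts: right after a power step the temperature is still well below its steady-state value.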
Intel® Turbo Boost Technology 2.0
· After idle periods, the system accumulates "energy budget" and can accommodate high power/performance for a few seconds
  – P > TDP: responsiveness
· In steady-state conditions the power stabilizes on TDP
  – Sustain power
[Figure: power vs. time for Turbo Boost 2.0. Sleep or low power: build up thermal budget during idle periods. C0/P0 (Turbo) above "TDP": use accumulated energy budget to enhance user experience. Power then settles at "TDP".]
Foil taken from IDF 2011
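The "energy budget" behaviour can be imitated with a simple accumulator: running below TDP banks energy, running above TDP spends it, and once the bank is empty power falls back to TDP. This is a deliberately simplified sketch and not Intel's actual control algorithm; the `simulate_budget` function, the 10 W TDP, and the 20 J cap are hypothetical.

```python
def simulate_budget(requested_w, tdp_w, max_budget_j, dt_s=1.0):
    """Return the power delivered in each time step.

    requested_w: power the workload would like to draw in each step.
    The budget grows while power < TDP (up to max_budget_j) and is
    spent while power > TDP; with no budget left, power is capped at TDP.
    """
    budget_j = 0.0
    delivered = []
    for request in requested_w:
        if request > tdp_w:
            # Power above TDP is limited by what the accumulated budget can cover.
            extra = min(request - tdp_w, budget_j / dt_s)
            power = tdp_w + extra
        else:
            power = request
        budget_j = min(max_budget_j, budget_j + (tdp_w - power) * dt_s)
        delivered.append(power)
    return delivered


# Three idle seconds at 2 W, then a sustained 20 W request, with TDP = 10 W.
trace = simulate_budget([2, 2, 2, 20, 20, 20, 20], tdp_w=10, max_budget_j=20)
print(trace)
```

In this toy trace the idle period banks enough energy for two seconds above TDP, after which the delivered power settles at TDP.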
Core and Graphics Power Budgeting
• Cores and Graphics integrated on the same die with separate voltage/frequency controls; tight HW control
• Full package power specifications available for sharing
• Power budget can shift between Cores and Graphics
[Figure: Sandy Bridge Next Gen Turbo. Core power [W] vs. graphics power [W], showing the core power specification, the graphics power specification, total package power, realistic concurrent max power, and the sum of max power (for short periods); applications range from heavy graphics workload to heavy CPU workload.]
Foil taken from IDF 2011
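Budget shifting between cores and graphics can be sketched as splitting one package budget between two requests. The proportional scaling used here when the requests exceed the budget is an assumption made for illustration; the foil does not say how the hardware arbitrates, and the wattages are hypothetical.

```python
def share_package_budget(core_request_w, gfx_request_w, package_w):
    """Split the package power budget between cores and graphics.

    If both requests fit, each gets what it asked for, so a light
    graphics load leaves more of the package budget to the cores.
    Otherwise both are scaled down proportionally (an assumed policy).
    """
    total = core_request_w + gfx_request_w
    if total <= package_w:
        return core_request_w, gfx_request_w
    scale = package_w / total
    return core_request_w * scale, gfx_request_w * scale


PACKAGE_W = 35.0  # hypothetical package power specification

print(share_package_budget(30.0, 5.0, PACKAGE_W))   # heavy CPU workload
print(share_package_budget(10.0, 25.0, PACKAGE_W))  # heavy graphics workload
print(share_package_budget(30.0, 20.0, PACKAGE_W))  # both heavy: scaled to fit
```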