ECE 571 – Advanced Microprocessor-Based Design Lecture 6 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 19 September 2019


ECE 571 – Advanced Microprocessor-Based Design

Lecture 6

Vince Weaver

http://web.eece.maine.edu/~vweaver

[email protected]

19 September 2019


Announcements

• HW#1 grades out

• HW#3 will be posted, RAPL

• Note, the equake benchmark takes a while to run (a few minutes). Don’t give up on it.

• Some notes on Intel uops, uops.info


Paper Discussion

Producing Wrong Data Without Doing Anything Obviously Wrong! by Mytkowicz, Diwan, Hauswirth and Sweeney, ASPLOS’09.


Power and Energy


Definitions and Units

People often say Power when they mean Energy

• Energy – Joules, kWh (3.6MJ), Therm (105.5MJ), 1 Ton TNT (4.2GJ), eV (1.6 × 10^−19 J), BTU (1055 J), horsepower-hour (2.68 MJ), calorie (4.184 J)

• Power – Energy/Time – Watts (1 J/s), Horsepower

(746W), Ton of Refrigeration (12,000 Btu/h)

• Volt-Amps (for AC) – same units as Watts, but not the same thing (apparent power rather than real power)

• Charge – mAh (batteries) – need V to convert to Energy
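As a rough illustration of the last bullet, a battery's charge rating only becomes an energy figure once a voltage is assumed. The sketch below uses a hypothetical 3000 mAh cell at a nominal 3.7 V; both numbers are assumptions chosen for illustration and do not come from the lecture.

```python
def mah_to_joules(capacity_mah: float, nominal_volts: float) -> float:
    """Convert a battery charge rating (mAh) to energy (J) at an assumed voltage."""
    amp_hours = capacity_mah / 1000.0        # mAh -> Ah
    watt_hours = amp_hours * nominal_volts   # Ah * V = Wh
    return watt_hours * 3600.0               # 1 Wh = 3600 J

# Hypothetical cell: 3000 mAh at a nominal 3.7 V (assumed values)
energy_j = mah_to_joules(3000, 3.7)
print(f"{energy_j:.0f} J = {energy_j / 3.6e6:.4f} kWh")
```

The real voltage of a cell sags as it discharges, so a single nominal voltage gives only an estimate.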


Power and Energy in a Computer System

Power Consumption Breakdown on a Modern Laptop, A. Mahesri and V. Vardhan, PACS’04.

• Old, but hard to find thorough breakdowns like this

• Thinkpad laptop, 1.3GHz Pentium M, 256MB RAM, 14” display

• Oscilloscope, voltage probe and clamp-on current probe

• Measured voltage and current. $P = I^2 R$, $V = IR$, $P = IV$; subtractive measurement for components without their own wires

• Total system power 14-30W

• Old: no LED backlight, no SSD, etc.


Modern results are from CUGR/REU student research.

Component       Laptop (2004)   Modern Server?
Hard Drive      0.5-2W          5W
LCD             1W
Backlight       1-4W
CPU             2-15W           60+W
GPU             1-5W            50+W
Memory          0.5-1.5W        1-5W
Power Supply    0.65W
Wireless        0.1-3W
CD-ROM          3-5W

USB (max 2.5W): USB keyboard 0.04W, USB mouse 0.03W, USB flash 0.5W, USB wifi 0.5W


CPU Power and Energy


CMOS Transistors

[Figure: cross-sections of an N-MOSFET (source, gate, drain on the p-substrate) and a P-MOSFET (source, gate, drain in an n-well)]


CMOS Dynamic Power

• $P = C \Delta V V_{dd} \alpha f$

Charging and discharging capacitors is a big factor ($C \Delta V V_{dd}$), from $V_{dd}$ to ground

$\alpha$ is the activity factor, transitions per clock cycle

$f$ is frequency

• $\alpha$ often approximated as $\frac{1}{2}$, $\Delta V V_{dd}$ as $V_{dd}^2$, leading to $P \approx \frac{1}{2} C V_{dd}^2 f$

• Some pass-through loss (V momentarily shorted)
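To make the approximation concrete, here is a minimal sketch that evaluates the dynamic power formula with the activity factor defaulting to 1/2 and the voltage swing taken as the full supply voltage. The capacitance, voltage and frequency values are hypothetical, picked only to show how the terms scale.

```python
def dynamic_power(c_farads, vdd_volts, freq_hz, alpha=0.5):
    """Dynamic power P = alpha * C * Vdd^2 * f (delta-V assumed equal to Vdd)."""
    return alpha * c_farads * vdd_volts ** 2 * freq_hz

# Hypothetical chip: 1 nF of switched capacitance, 1.0 V, 1 GHz
base = dynamic_power(1e-9, 1.0, 1e9)
half_f = dynamic_power(1e-9, 1.0, 0.5e9)   # halve frequency only
low_v = dynamic_power(1e-9, 0.5, 0.5e9)    # halve frequency and voltage
print(base, half_f, low_v)
```

Halving frequency halves power, while also halving the voltage cuts power by a further factor of four because of the squared term.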


CMOS Dynamic Power Reduction

How can you reduce Dynamic Power?

• Reduce C – scaling

• Reduce Vdd – eventually hit transistor limit

• Reduce α (design level)

• Reduce f – makes processor slower


CMOS Static Power

• Leakage current – a bigger issue as transistors scale smaller. Forecast at one point to be 20-50% of all chip power before mitigations were taken.

• Various kinds of leakage (substrate, gate, etc.)

• Linear with voltage: $P_{static} = I_{leakage} V_{dd}$


Leakage Mitigation

• SOI – Silicon on Insulator (AMD, IBM but not Intel)

• High-k dielectric – instead of SiO2 use some other material for the gate oxide (hafnium-based)

• Transistor sizing – make only the critical transistors fast; non-critical ones can be made slower and less leakage prone

• Body-biasing

• Sleep transistors


Notes on Process Technology

• 65nm – 2006

p4 to core2, IBM Cell

1.0V, high-k dielectric, gate thickness a few atoms

193/248nm light (UV)

• 45nm – 2008

core2 to nehalem

large lenses, double patterning, high-k

• 32nm – 2010

westmere to sandybridge

immersion lithography

• 22nm – 2012 ivybridge, haswell

oxide only 0.5nm (two silicon atoms)

fin-fets

• 14nm and smaller – ??

Extreme UV (13.5nm light, hard-vacuum required)?

Electron beam?


Notes on Process Technology

• TI-OMAP cell phone processor (more or less discontinued

by TI, big layoffs in 2012)

Beagle Board and Gumstix OMAP35?? – 65nm

• OMAP4460 (Pandaboard) 45nm

• Cortex A15 28nm

• Rasp-pi BCM2835 – 45nm?


Total Energy

• $E_{tot} = [P_{dynamic} + P_{static}] t$

• $E_{tot} = [(C_{tot} V_{dd}^2 \alpha f) + (N_{tot} I_{leakage} V_{dd})] t$
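The total energy expression can be evaluated directly. The sketch below is only an illustration: every input value (total capacitance, transistor count, per-transistor leakage current, run time) is a made-up placeholder, not a measurement.

```python
def total_energy(c_tot, vdd, alpha, freq_hz, n_tot, i_leak, seconds):
    """E_tot = [(C_tot * Vdd^2 * alpha * f) + (N_tot * I_leak * Vdd)] * t."""
    p_dynamic = c_tot * vdd ** 2 * alpha * freq_hz
    p_static = n_tot * i_leak * vdd
    return (p_dynamic + p_static) * seconds

# Hypothetical inputs: 1 nF, 1.0 V, alpha = 0.5, 1 GHz,
# one million transistors leaking 100 nA each, running for 2 s
e_tot = total_energy(1e-9, 1.0, 0.5, 1e9, 1e6, 1e-7, 2.0)
print(f"{e_tot:.2f} J")
```

Note that the static term keeps accumulating for as long as the chip is powered, so a slower run pays more leakage energy even if dynamic power drops.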


Delay

• $T_d = \frac{C_L V_{dd}}{\mu C_{ox} (\frac{W}{L}) (V_{dd} - V_t)^2}$

• Simplifies to $f_{MAX} \sim \frac{(V_{dd} - V_t)^2}{V_{dd}}$

• If you lower f, you can lower Vdd
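A small sketch of the simplified relation shows how much maximum frequency is given up when the supply voltage drops. The threshold voltage of 0.3 V and the two supply voltages are assumed values for illustration; only the ratio between results is meaningful, since the proportionality constant is omitted.

```python
def relative_fmax(vdd, vt):
    """Relative maximum frequency, f_max ~ (Vdd - Vt)^2 / Vdd (constant omitted)."""
    return (vdd - vt) ** 2 / vdd

VT = 0.3  # assumed threshold voltage, in volts
f_high = relative_fmax(1.0, VT)
f_low = relative_fmax(0.8, VT)
print(f"f_max at 0.8 V is {f_low / f_high:.0%} of f_max at 1.0 V")
```

Read the other way around, a design that only needs the lower frequency can run at the lower voltage, which is the point of the last bullet.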


Thermal Issues

• Temperature and Heat Dissipation are closely related to

Power

• If thermal issues, need heatsinks, fans, cooling


Metrics to Optimize

• Power

• Energy

• MIPS/W, FLOPS/W (don’t handle quadratic V well)

• Energy × Delay

• Energy × Delay^2


Power Optimization

• Does not take into account time. Lowering power does

no good if it increases runtime.


Energy Optimization

• Lowering energy can affect time too, as parts can run slower at lower voltages

Which is better?

[Figure: two power vs. time (s) plots, each totalling 50J: 5W for 10s versus 1W for 50s]


Energy Delay – Watt*t*t

• Horowitz, Indermaur, Gonzalez (Low Power Electronics, 1994)

• Need to account for delay, so that lowering energy does not make delay (time) worse

• Voltage Scaling – in general scaling low makes transistors

slower

• Transistor Sizing – reduces Capacitance, also makes

transistors slower


• Technology Scaling – reduces V and power.

• Transition Reduction – better logic design, have fewer

transitions

Get rid of clocks? Asynchronous? Clock-gating?


ED Optimization

Which is better?

[Figure: two power vs. time (s) plots]

• 200W for 1s: E=200J, ED=200Js, EDD=200Jss

• 50W for 2s: E=100J, ED=200Js, EDD=400Jss
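The numbers in this comparison follow directly from the definitions. A minimal sketch, assuming constant power over the run (the helper name `metrics` is made up for this example):

```python
def metrics(power_w, time_s):
    """Return (E, ED, ED^2) for a run at constant power."""
    energy = power_w * time_s
    return energy, energy * time_s, energy * time_s ** 2

fast = metrics(200, 1)   # 200W for 1s
slow = metrics(50, 2)    # 50W for 2s
print("fast:", fast)
print("slow:", slow)
```

The slow run wins on energy, the two tie on energy-delay, and the fast run wins on energy-delay-squared, so the answer to "which is better" depends on the metric chosen.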


Energy Delay Squared – E*t*t

• Martin, Nystrom, Penzes – Power Aware Computing,

2002

• Independent of Voltage in CMOS

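• Et can be misleading

Suppose Ea=2Eb and ta=tb/2, so a and b have the same Et

Reduce a's voltage by half: Ea'=Ea/4 and ta'=2ta, giving Ea'=Eb/2 and ta'=tb, so a now matches b's delay at half the energy

• Can have an arbitrarily large number of delay terms in the energy-delay product; squared seems to be good enough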
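The voltage-independence claim can be checked numerically under the same simple model the example uses, namely energy proportional to V^2 and delay proportional to 1/V. That model is an idealisation, and the starting energy and delay values below are arbitrary.

```python
def scale_voltage(energy, delay, k):
    """Scale supply voltage by factor k, assuming E ~ V^2 and t ~ 1/V."""
    return energy * k ** 2, delay / k

def ed(energy, delay):
    return energy * delay

def ed2(energy, delay):
    return energy * delay ** 2

a = (2.0, 1.0)   # design a: twice the energy of b, half the delay
b = (1.0, 2.0)   # design b
a_half_v = scale_voltage(*a, 0.5)   # run design a at half voltage

print("ED :", ed(*a), ed(*b), ed(*a_half_v))
print("ED2:", ed2(*a), ed2(*b), ed2(*a_half_v))
```

Designs a and b tie on ED, yet a at half voltage matches b's delay with half the energy. ED^2 does not change when a's voltage is scaled, and it ranks a ahead of b both before and after.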


Energy Delay / Energy Delay Squared

Lower is better.

Energy   Delay   ED     ED^2
5J       2s      10Js   20Js^2
5J       3s      15Js   45Js^2

Same ED, Different ED^2

Energy   Delay   ED     ED^2
5J       2s      10Js   20Js^2
2J       5s      10Js   50Js^2


Energy Example

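[Figure: three power vs. time plots]

• Baseline: power ∝ V^2 f for time t, energy E

• Double delay, but keep voltage constant: power ∝ V^2 (f/2) for time 2t, energy E

• Reduce voltage; we can because f is less: power ∝ (V/2)^2 (f/2) for time 2t, energy E/4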


Energy-Delay Product Redux

[Figure: three plots: Energy (pJ) vs. Delay (ns); Energy-Delay (pJ-ns) vs. Configuration; Energy-Delay-Squared (pJ-ns^2) vs. Configuration]

Roughly based on data from “Energy-Delay Tradeoffs in CMOS Multipliers” by Brown et al.


Raw Data

Delay   Energy   ED    ED^2
3       130      390   1170
3.5     100      350   1225
3.8     85       323   1227
4       75       300   1200
4.5     70       315   1418
5       65       325   1625
5.5     58       319   1755
6       55       330   1980
6.5     50       390   2535
8       50       400   3200
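To see how the choice of metric moves the optimum, the sketch below recomputes the products from the Delay and Energy columns and picks the best configuration under each metric. The units (ns and pJ) are assumed from the axes of the plots on the previous slide.

```python
# (delay in ns, energy in pJ) pairs taken from the Delay and Energy columns
configs = [(3, 130), (3.5, 100), (3.8, 85), (4, 75), (4.5, 70),
           (5, 65), (5.5, 58), (6, 55), (6.5, 50), (8, 50)]

def best(metric):
    """Return the (delay, energy) pair that minimises the given metric."""
    return min(configs, key=metric)

min_energy = best(lambda c: c[1])
min_ed = best(lambda c: c[1] * c[0])
min_ed2 = best(lambda c: c[1] * c[0] ** 2)

print("lowest energy:", min_energy)
print("lowest ED:    ", min_ed)
print("lowest ED^2:  ", min_ed2)
```

Minimising energy alone favours a slow configuration, ED picks a middle one, and ED^2 pushes the choice toward the fastest configuration.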


Other Metrics

• Energy-Delay^n – choose appropriate factor

• Energy-Delay-Area^2 – takes into account cost (die area) [McPAT]

• Power-Delay – units of Energy – used to measure

switching

• Energy Delay Diagram – [SWEEP]

• Energy-Delay-FIT (reliability?)


Measuring Power and Energy


Why?

• New, massive, HPC machines use impressive amounts of

power

• When you have 100k+ cores, saving a few Joules per

core quickly adds up

• To improve power/energy draw, you need some way of

measuring it


Energy/Power Measurement is Already Possible

Three common ways of doing this:

• Hand-instrumenting a system by tapping all power inputs

to CPU, memory, disk, etc., and using a data logger

• Using a pass-through power meter that you plug your

server into. Often these will log over USB

• Estimating power/energy with a software model based

on system behavior


Measuring Power and Energy

• Sense resistor or Hall Effect sensor gives you the current

• Sense resistor is a small resistor. Measure the voltage drop. By Ohm’s Law V=IR, so the current is I=V/R

• Voltage drops are often small (why?) so you may need to amplify with an instrumentation amplifier

• Then you need to measure with A/D converter

• P = IV and you know the voltage

• How to get Energy from Power?
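The measurement chain above can be sketched end to end: convert sense-resistor voltage drops to current, multiply by the rail voltage to get power, then integrate power over time to get energy. The 0.01 ohm resistor, 12 V rail, and sample values are all hypothetical, and trapezoidal integration is just one reasonable choice.

```python
def current_from_sense(v_drop, r_sense):
    """Ohm's law: I = V / R across the sense resistor."""
    return v_drop / r_sense

def energy_from_samples(times_s, powers_w):
    """Integrate power over time (trapezoidal rule) to get energy in Joules."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (powers_w[i] + powers_w[i - 1]) * dt
    return total

# Hypothetical samples: voltage drop across a 0.01 ohm resistor on a 12 V rail
R_SENSE = 0.01
RAIL_V = 12.0
times = [0.0, 1.0, 2.0, 3.0]
drops = [0.02, 0.03, 0.03, 0.02]

powers = [current_from_sense(v, R_SENSE) * RAIL_V for v in drops]
energy = energy_from_samples(times, powers)
print(f"{energy:.1f} J over {times[-1] - times[0]:.0f} s")
```

A faster sample rate tightens the estimate, since anything that happens between two samples is invisible to the integration.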


Hall Effect Current Sensors

• Output voltage varies based on magnetic field.

• Current in wire causes magnetic field

• Voltage output is linearly proportional to current

• Ideally little to no resistance (unlike sense resistor)

• Can measure higher current. 5, 20, 30A

• Need that? 100W CPU at 3.3V is roughly 30A


Other Issues

• Matching up internal and external measurements?

• Serial port? ntp? signal?

• Hard for small time intervals.


Existing Related Work

Plasma/dposv results with Virginia Tech’s PowerPack

[Figure: Power (Watts) vs. Time (seconds); series: CPU, Memory, Motherboard, Fan]


Powerpack

• Measure at Wall socket: WattsUp, ACPI-enabled power

adapter, Data Acquisition System

• Measure all power pins to components (intercept ATX

power connector?)

• CPU Power – CPU powered by four 12VDC pins.

• Disk power – measure 12 and 5VDC pins on disk power connector

• Memory Power – DIMMs powered by four 5VDC pins

• Motherboard Power – 3.3V pins. Claim NIC contribution

is minimal, checked by varying workload

• System fans


PowerMon 2

• PowerMon 2 is a custom board from RENCI

• Plugs in-line with ATX power supply.

• Reports results over USB

• 8 channels, 1kHz sample rate

• We had hardware at UT, but managed to brick it


Shortcomings of current methods

• Each measurement platform has a different interface

• Typically data can only be recorded off-line, to a separate

logging machine, and analysis is done after the fact

• Correlating energy/power with other performance

metrics can be difficult

• How often can you measure? (A lot happens on a CPU at 2GHz)


Watt’s Up Pro Meter


Watt’s Up Pro Features

• Can measure 18 different values with 1 second resolution

(Watts, Volts, Amps, Watt-hours, etc.)

• Values read over USB

• Joules can be derived from power and time

• Can only measure system-wide


Watt’s Up Pro Graph

[Figure: Average Power (Watts) vs. Time (seconds); PLASMA Cholesky Factorization N=10,000 threads=2; measured on Core2 laptop]


Estimating Power

• Popular thing to do. One example: Real Time Power

Estimation and Thread Scheduling via Performance

Counters by Singh, Bhadauria and McKee.

• Have some sort of hardware measurement setup.

• Then measure lots of easy-to-measure things.

Performance counters. Temperature. etc.

• Create a model (machine learning?) that can estimate power from those measurements

• Apparently using as few as 4 counters can give pretty

good results
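The modelling step can be illustrated with the simplest possible case: a least-squares line that maps one counter-derived rate to measured power. This is a deliberately reduced sketch, not the method of the cited paper, which uses several counters. The training points are synthetic numbers invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic training data (invented): instructions per cycle vs. measured Watts
ipc_samples = [0.5, 1.0, 1.5, 2.0]
watts_samples = [15.0, 20.0, 25.0, 30.0]

slope, intercept = fit_line(ipc_samples, watts_samples)

def estimate_power(ipc):
    """Estimate power from a counter reading using the fitted model."""
    return slope * ipc + intercept

print(f"estimated power at IPC 1.2: {estimate_power(1.2):.1f} W")
```

In practice the hardware measurement setup is needed only while training; afterwards the counters alone drive the estimate.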


RAPL

• Running Average Power Limit

• Part of an infrastructure to allow setting custom per-package hardware enforced power limits

• Also for TurboBoost

• User Accessible Energy/Power readings are a bonus

feature of the interface


How RAPL Works

• RAPL is not an analog power meter

(usually, Haswell-EP exception)

• RAPL uses a software power model, running on a helper

controller on the main chip package

• Energy is estimated using various hardware performance

counters, temperature, leakage models and I/O models

• The model is used for CPU throttling and turbo-boost, but the values are also exposed to users via a model-specific register (MSR)


Available RAPL Readings

• PACKAGE_ENERGY: total energy used by entire package

• PP0_ENERGY: energy used by “power plane 0” which includes all cores and caches

• PP1_ENERGY: on original Sandybridge this includes the on-chip Intel GPU

• DRAM_ENERGY: on Sandybridge EP this measures DRAM energy usage. It is unclear whether this is just the interface or if it includes all power used by all the DIMMs too

• SoC energy (skylake and newer?)


RAPL Measurement Accuracy

• Intel Documentation indicates Energy readings are

updated roughly every millisecond (1kHz)

• Rotem et al. show results match actual hardware (IEEE Micro, Mar/Apr 2012)


RAPL Accuracy, Continued

• The hardware also reports minimum measurement quanta. This can vary among processor releases. On our Sandybridge EP machine all Energy measurements are in multiples of 15.3µJ ($2^{-16}$ J)

• Power and Energy can vary between identical packages

on a system, even when running identical workloads. It

is unclear whether this is due to process variation during

manufacturing or else a calibration issue.
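The measurement quantum mentioned above is encoded in the RAPL power-unit MSR: the energy status unit field occupies bits 12:8, and one counter tick is 1/2^ESU Joules. The sketch below decodes that field and converts a pair of raw 32-bit counter readings to Joules. The raw register value used here is an example input, not a reading from the machine in the lecture; a field value of 16 gives roughly 15.3 microjoules per tick.

```python
def rapl_energy_unit_joules(power_unit_msr: int) -> float:
    """Decode the energy status unit (bits 12:8): one tick = 1 / 2^ESU Joules."""
    esu = (power_unit_msr >> 8) & 0x1F
    return 1.0 / (1 << esu)

def energy_delta_joules(before, after, unit_j, counter_bits=32):
    """Joules between two raw counter readings, allowing for one wraparound."""
    ticks = (after - before) % (1 << counter_bits)
    return ticks * unit_j

# Example raw value of the power-unit register (assumed for illustration)
EXAMPLE_POWER_UNIT = 0x000A1003

unit = rapl_energy_unit_joules(EXAMPLE_POWER_UNIT)
print(f"energy unit: {unit * 1e6:.2f} uJ")
print(f"delta: {energy_delta_joules(0xFFFFFFF0, 0x10, unit):.8f} J")
```

Because the counter is only 32 bits wide, it wraps after a finite amount of energy, so readings need to be taken often enough that at most one wrap occurs between them.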


RAPL Validation

• The Dresden Paper

• My MEMSYS paper (include some plots?)


RAPL Power Plot

[Figure: Average Power (Watts) vs. Time (seconds); PLASMA Cholesky Factorization N=30,000 threads=16; series: DRAM, PP0, and Total for Package 0 and Package 1; measured on SandyBridge EP]


RAPL Energy Plot

[Figure: Total Energy (Joules) vs. Time (seconds); Cholesky Factorization N=30,000 threads=16; series: PLASMA and mkl for Package 0 and Package 1; measured on SandyBridge EP]


NVML

• Recent NVIDIA GPUs support reading power via the

NVIDIA Management Library (NVML)

• On Fermi C2075 GPUs it has milliwatt resolution within

±5W and is updated at roughly 60Hz

• The power reported is that for the entire board, including

GPU and memory


NVML Power Graph

[Figure: Average Power (Watts) vs. Time (seconds); MAGMA LU 10,000, Nvidia Fermi C2075]


AMD Application Power Management

• Recent AMD Family 15h processors also can report

“Current Power In Watts” via the Processor Power in

the TDP MSR

• Support for this can be provided similar to RAPL

• Have had bad luck getting accurate readings. Have found

various chip errata on fam15h and fam16h hardware


Other ways to measure Power

• IPMI – many server machines have built in (low

frequency) measurement of power supply values.

• Thermal? IR camera? Can see how much individual

parts of chip use.

Overheat? Use IR transparent liquid to cool it?


Using RAPL

• On Linux, at least 3 ways to get these values

• Read the MSR directly, either with the rdmsr instruction or via /dev/msr. Needs root, as you can do bad things with MSRs. “safemsr”

• perf_event

• hwmon/powercap (/sys/class/powercap/)
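Of the three routes, the powercap sysfs files are the easiest to read from a script. The sketch below assumes a zone at `/sys/class/powercap/intel-rapl:0/` exposing `energy_uj` and `max_energy_range_uj`; the zone name varies between systems, and reading may require root on recent kernels. The helper names are made up for this example.

```python
import time
from pathlib import Path

# Assumed zone path; check /sys/class/powercap/ on the target machine
ENERGY_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")
MAX_RANGE_FILE = ENERGY_FILE.with_name("max_energy_range_uj")

def read_uj(path: Path) -> int:
    """Read a microjoule counter value from a powercap sysfs file."""
    return int(path.read_text().strip())

def delta_joules(before_uj: int, after_uj: int, max_range_uj: int) -> float:
    """Energy between two readings in Joules, allowing for one wraparound."""
    diff = after_uj - before_uj
    if diff < 0:
        diff += max_range_uj   # counter wrapped past its maximum range
    return diff / 1e6

def measure(seconds: float) -> float:
    """Sample the package energy counter around a sleep (needs RAPL hardware)."""
    max_range = read_uj(MAX_RANGE_FILE)
    before = read_uj(ENERGY_FILE)
    time.sleep(seconds)
    return delta_joules(before, read_uj(ENERGY_FILE), max_range)
```

Calling `measure(1.0)` on a machine with RAPL support would return the package energy used over roughly one second, which is also the average power in Watts.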


Listing Events

$ perf list

...

power/energy-cores/ [Kernel PMU event]

power/energy-gpu/ [Kernel PMU event]

power/energy-pkg/ [Kernel PMU event]

power/energy-ram/ [Kernel PMU event]

...


Measuring

$ perf stat -a -e power/energy-cores/,power/energy-ram/,instructions,cycles /opt/ece571/401.bzip2/bzip2 -k -f ./input.source

Performance counter stats for 'system wide':

63.79 Joules power/energy-cores/ (100.00%)

2.34 Joules power/energy-ram/

21038123875 instructions # 1.06 insns per cycle (100.00%)

19782762541 cycles

3.407427702 seconds time elapsed
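The perf output reports energy, not power, so average power and similar figures have to be derived. A short sketch using the numbers from the run above; since the measurement is system-wide, the per-instruction figure includes everything running on the machine, not just bzip2.

```python
# Values copied from the perf stat output above
energy_cores_j = 63.79
energy_ram_j = 2.34
instructions = 21038123875
cycles = 19782762541
elapsed_s = 3.407427702

avg_core_power_w = energy_cores_j / elapsed_s     # Joules / seconds = Watts
avg_ram_power_w = energy_ram_j / elapsed_s
ipc = instructions / cycles
nj_per_instruction = energy_cores_j / instructions * 1e9

print(f"cores: {avg_core_power_w:.2f} W, ram: {avg_ram_power_w:.2f} W")
print(f"IPC: {ipc:.2f}, core energy per instruction: {nj_per_instruction:.2f} nJ")
```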


Measuring

• The key is -a which enables system-wide mode (needs

root too if not configured as such)

• Why do you need system-wide?

• What does that do to the other metrics?


Power and Energy Concerns

Table 1: OpenBLAS HPL N=10000 (Matrix Multiply)

Machine          Processor    Cores   Freq     Idle Power   Load Power   Time   Total Energy
Raspberry Pi 2   Cortex-A7    4       900MHz   1.8W         3.4W         454s   1543J
Dragonboard      Cortex-A53   4       1.2GHz   2.4W         4.7W         241s   1133J
Raspberry Pi 3   Cortex-A53   4       1.2GHz   1.8W         4.3W         178s   765J
Jetson-TX1       Cortex-A57   4       1.9GHz   2.1W         13.4W        47s    629J
Macbook Air      Broadwell    2       1.6GHz   10.0W        29.1W        14s    407J

1. Which machine has the lowest under-load power draw?

Pi 2

2. Which machine consumes the least amount of energy?

Broadwell Macbook Air

3. Which machine computes the result fastest?

Broadwell Macbook Air

4. Consider a use case with an embedded board taking a picture once every 60 seconds and then performing a matrix-multiply similar to the one in the benchmark (perhaps for image-recognition purposes). Could all of the boards listed meet this deadline?

No, only the Jetson and Macbook Air can meet the deadline

5. Assume a workload where a device takes a picture once a minute then does a large matrix multiply (as seen in Table 1). The device is idle when not multiplying, but under full load when it is.

(a) Over a minute, what is the total energy usage of the Jetson TX-1?

Each Minute = (13s Idle * 2.1W) + (47s Load * 13.4W) = 657J

(b) Over a minute, what is the total energy usage of the Macbook Air?

Each Minute = (46s Idle * 10W) + (14s Load * 29.1W) = 867J
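The per-minute calculation generalises to every machine in Table 1. The sketch below applies the same idle-plus-load split over a 60 second period and reports which boards miss the deadline; the function name and the use of `None` for a missed deadline are choices made for this example.

```python
# (idle power W, load power W, run time s) from Table 1
machines = {
    "Raspberry Pi 2": (1.8, 3.4, 454),
    "Dragonboard": (2.4, 4.7, 241),
    "Raspberry Pi 3": (1.8, 4.3, 178),
    "Jetson-TX1": (2.1, 13.4, 47),
    "Macbook Air": (10.0, 29.1, 14),
}

def energy_per_period(idle_w, load_w, run_s, period_s=60):
    """Joules used per period, or None if the job cannot finish in time."""
    if run_s > period_s:
        return None
    return (period_s - run_s) * idle_w + run_s * load_w

results = {name: energy_per_period(*spec) for name, spec in machines.items()}

for name, joules in results.items():
    if joules is None:
        print(f"{name}: misses the 60s deadline")
    else:
        print(f"{name}: {joules:.1f} J per minute")
```

Although the Macbook Air uses the least energy for the multiply itself, its high idle power makes it the more expensive choice once the idle time is counted.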


Pandaboard Power Stats

• Wattsuppro: 2.7W idle, seen up to 5W when busy

• http://ssvb.github.com/2012/04/10/cpuburn-arm-cortex-a9.html

• With Neon and CPU burn:

Workload                         Current   Power
Idle system                      550 mA    2.75W
cpuburn-neon                     1130 mA   5.65W
cpuburn-1.4a (burnCortexA9.s)    1180 mA   5.90W
ssvb-cpuburn-a9.S                1640 mA   8.2W


Easy ways to reduce Power Usage


DVFS

• Voltage planes – on CMP might share voltage planes so

have to scale multiple processors at a time

• DC to DC converter, programmable.

• Phase-Locked Loops. Orders of ms to change. Multiplier

of some crystal frequency.

• Senger et al ISCAS 2006 lists some alternatives. Two

phase locked loops? High frequency loop and have

programmable divider?

• Often takes time, on order of milliseconds, to switch frequency. Switching voltage can be done with less hassle.


When can we scale CPU down?

• System idle

• System memory or I/O bound

• Poor multi-threaded code (spinning in spin locks)

• Thermal emergency

• User preference (want fans to run less)
