Energy Models in Data Parallel CPU/GPU Computations


Master Degree in Computer Science and Networking

Università di Pisa and Scuola Superiore Sant’Anna

Academic Year 2014/2015

Supervisor: Prof. Marco Danelutto
Candidate: Alessandro Lenzi


Outline

1 Introduction
  - Energy consumption issue
  - Heterogeneous, parallel architectures
  - Problem
  - Energy consumption in CPU/GPU computations

2 GPU Energy Model

3 CPU Energy Model

4 Aggregated model and usage


Section 1

Introduction


The problem of energy consumption

- Technical problem
  - Cities' worth of power: power provisioning is an issue
  - Higher temperature, more failures
  - Mobile devices with a limited energy budget
- Economic issue
  - Cost of energy > cost of hardware
- Environmental concern
  - 2% of greenhouse emissions caused by the US IT sector → 3% by 2020

Energy consumption is nowadays a growing concern in the IT industry.


Scenario

Parallel, heterogeneous architectures are everywhere!
- Even low-end computers have multiple CPU cores and a GPU
- Different architectures have different energy footprints depending on the computation
- Saving energy requires efficient resource usage
- Structured parallel application development methodology
- Divide work according to energy consumption, not just time!

We selected a known, widely used data-parallel pattern.

Map parallel pattern
- Map: higher-order function taking a function f and a collection A
- f is applied in parallel on different partitions
- suitable for both CPU and GPU architectures
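For illustration only (a Python sketch with a process pool; this is not the thesis implementation), a CPU-side map could split the collection into partitions and apply f to each partition in parallel:

    # Minimal sketch of a CPU map pattern: split the collection into partitions
    # and apply f to each partition in parallel. Names are illustrative.
    from concurrent.futures import ProcessPoolExecutor
    from functools import partial

    def _apply_to_partition(f, part):
        # each worker applies f element-wise to its own partition
        return [f(x) for x in part]

    def parallel_map(f, data, n_workers=4):
        chunk = max(1, (len(data) + n_workers - 1) // n_workers)
        parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        with ProcessPoolExecutor(max_workers=n_workers) as pool:
            results = pool.map(partial(_apply_to_partition, f), parts)
        return [y for part in results for y in part]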


Problem

Find a model predicting the energy consumption of different map parallel applications executed on both CPU and GPU.

Why model energy?
- Divide tasks properly between CPU and GPU
- Select the parallelism degree on both devices
- Autonomic manager: allows responding to non-functional concerns
- Model for the programmer
- Aim: save energy!


Dividing map tasks between CPU/GPU cores

- The collection is partitioned between CPU and GPU; the GPU part is transferred through the PCIe bus.
- Within the device, data is partitioned among the Streaming Multiprocessors (SMs).
- In a single SM, f is applied to the collection using threads, scheduled in batches of 32 on CUDA cores.
- On the host side, the collection is partitioned between the cores, which apply f in parallel.
- The result is transferred back from GPU to host memory, obtaining the final result.
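As a purely illustrative sketch (the fraction g sent to the GPU is chosen by the aggregated model later in the talk; the helper name is hypothetical), the host-side split could be:

    def split_for_devices(data, g):
        """Split the collection: a fraction g goes to the GPU, the rest stays on the CPU."""
        cut = int(len(data) * g)
        return data[cut:], data[:cut]   # (cpu_part, gpu_part)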

Energy consumption of a map computation

E = ∫₀^t_end P(t) dt

- P(t) depends on the computation and the resources used (parallelism degree)
- Divide the energy by phase, where P(t) is constant
- Phases: computing (host and device), communication and waiting

For a single phase: E = P × T_phase
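As a small worked example (the numbers are made up for illustration, not measurements), the total energy is just the sum of the per-phase P × T products:

    # Minimal sketch: total energy as the sum of per-phase contributions,
    # each with (approximately) constant power.
    def total_energy(phases):
        """phases: list of (power_watts, duration_seconds) pairs, e.g.
        host compute, device compute, PCIe transfer, waiting."""
        return sum(p * t for p, t in phases)

    # 60 W for 2 s of host compute, 110 W for 1.5 s on the GPU,
    # 75 W for 0.2 s of transfer -> 60*2 + 110*1.5 + 75*0.2 = 300 J
    E = total_energy([(60.0, 2.0), (110.0, 1.5), (75.0, 0.2)])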


Section 2

GPU Energy Model


Predicting power with regression

Explanatory variables: b, the number of blocks (SMs) used; w, the number of warps (32 threads) per block.
Power varies according to √w for fixed b:

P(b,w) = β0(b) + β1(b)√w

- Precise, but costly to calculate
- Estimate β0(b) and β1(b) as functions of b
- Trade samples (energy consumption) for precision
- 6 samples → error below 5.21%

Cost of modelling
Feasible only for repeated computations, otherwise too high.
Time: at least 6 seconds. Energy: about 1300 J.
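A minimal sketch of the regression step (ordinary least squares on hypothetical samples; not the thesis code): fit β0(b) and β1(b) for a fixed b, then predict the power at other warp counts:

    # Fit P(b, w) = beta0(b) + beta1(b) * sqrt(w) for one fixed block count b
    # from measured (warps, power) samples, using ordinary least squares.
    import math

    def fit_power_model(samples):
        """samples: (warps, measured_power_watts) pairs for a fixed b,
        taken at two or more distinct warp counts."""
        xs = [math.sqrt(w) for w, _ in samples]
        ys = [p for _, p in samples]
        n = len(samples)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        beta1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
                 / sum((x - mean_x) ** 2 for x in xs))
        beta0 = mean_y - beta1 * mean_x
        return beta0, beta1

    def predict_power_regression(beta0, beta1, w):
        return beta0 + beta1 * math.sqrt(w)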


Heuristic model for power estimation

Idea
- Similar behaviours for map computations on the GPU
- Use one computation as a metre to measure the others
- Study the metre M, use it to predict the others

[Figures: average power for different computations (Vector Add, Matrix Add, Matrix Multiplication) using four and eight blocks, varying the number of warps, on an Nvidia K20c]

We define αC, the increment in power of C divided by the increment of the metre:

αC(b,w) = (PC(b,w) − (Pbase + Pblock·b)) / (PM(b,w) − (Pbase + Pblock·b))


αC analysis and power estimator

- The values of αC are concentrated near their average ᾱC
- Over more than 2000 samples, 87% of the values of αC are within ±1 of the average

[Figure: empirical distribution of α for vector addition]

Power predictor using αC:

PC(b,w) = Pbase + αC(Pwarp + P′warp·b)√w
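A minimal sketch of the heuristic (the device constants and function names are hypothetical, not taken from the thesis): estimate αC from a single sample measured against the metre, then reuse it to predict power at other (b, w) configurations:

    import math

    # Hypothetical device constants, obtained offline by profiling the metre M.
    P_BASE = 52.0         # baseline power (W)
    P_BLOCK = 1.1         # per-block power term (W)
    P_WARP = 0.9          # per-warp coefficients of the predictor (W)
    P_WARP_PRIME = 0.25

    def alpha_from_sample(p_C, p_M, b):
        """alpha_C(b,w) from one sample: p_C and p_M are the measured powers of
        computation C and of the metre M at the same (b, w) configuration."""
        static = P_BASE + P_BLOCK * b
        return (p_C - static) / (p_M - static)

    def predict_power(alpha_C, b, w):
        """P_C(b,w) = P_base + alpha_C * (P_warp + P'_warp * b) * sqrt(w)."""
        return P_BASE + alpha_C * (P_WARP + P_WARP_PRIME * b) * math.sqrt(w)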


Validation of the heuristic

The heuristic is convenient if a single sample yields an accurate estimation: we can model power with a low energy footprint!

Validation
- Sample power for 500 random (b, w) pairs, calculate αC(b,w)
- Try to predict the power for random (b′, w′) pairs using the obtained αC(b,w)

[Figure: error distribution using the heuristic over 500 samples, Nvidia K40m]


Less than 10% error with high probability (over 85% of the cases), less than 15% error with probability over 95%!

Estimating GPU energy consumption

Energy = PC(b,w) × T(b,w)

T(b,w) = ⌈N / (b × w × 32)⌉ × TGPUf ≃ T(1,1) / (b × w)

[Figure: error distribution of the energy estimation using the heuristic over 500 samples, Nvidia K40m]

- Error below 15% using a single sample, with over 95% probability!
- The same holds for other computations and devices, as long as threads map to CUDA cores.
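A minimal sketch of the resulting energy estimate (hypothetical names; the power value would come from a predictor such as the one sketched above):

    import math

    def gpu_time(n_elements, b, w, t_gpu_f):
        """T(b,w) = ceil(N / (b*w*32)) * T_GPUf, with 32 threads per warp.
        t_gpu_f: time (s) for one thread to apply f to one element."""
        threads = b * w * 32
        return math.ceil(n_elements / threads) * t_gpu_f

    def gpu_energy(power_watts, n_elements, b, w, t_gpu_f):
        """Energy = P_C(b,w) * T(b,w); power_watts comes from the power predictor."""
        return power_watts * gpu_time(n_elements, b, w, t_gpu_f)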


Section 3

CPU Energy Model


Predicting power of a map computation

Again: Energy = (Average) Power × Time
- The explanatory variable for power is n, the number of cores used
- Power grows linearly in the number of workers used in the computation

[Figure: average power varying the number of processing units allocated to different map computations (Vector Add, Map Atan, Matrix Add)]


Energy consumption using regression

Power predictor (with regression):

Pc(n) = β0 + β1 × n

No need for a heuristic!
- More precise measurements, no data transfer
- 2 executions with different parallelism degrees are enough to find the regression parameters
- Negligible footprint (e.g. ∼0.3 s, ∼30 J)

[Figure: error distribution for estimating CPU power consumption with regression (empirical distribution vs. N(μ, σ))]

Energy predictor:

Ec(n) = Pc(n) × ⌈N / n⌉ × Tf ≃ Pc(n) × T(1) / n

Relative energy error below 5% with probability over 85%.
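A minimal sketch (hypothetical names, not the thesis code): two runs at different parallelism degrees fix the linear power model, which is then combined with the time model to estimate the CPU energy:

    import math

    def fit_cpu_power(n1, p1, n2, p2):
        """Two (cores, measured_power_watts) samples at different parallelism
        degrees determine P_c(n) = beta0 + beta1 * n."""
        beta1 = (p2 - p1) / (n2 - n1)
        beta0 = p1 - beta1 * n1
        return beta0, beta1

    def cpu_energy(n_cores, n_elements, t_f, beta0, beta1):
        """E_c(n) = P_c(n) * ceil(N/n) * T_f  ~=  P_c(n) * T(1) / n.
        t_f: time (s) to apply f to one element on one core."""
        power = beta0 + beta1 * n_cores
        return power * math.ceil(n_elements / n_cores) * t_f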


Section 4

Aggregated model and usage


Aggregated Model

Consider n CPU cores, b blocks (SMs) with w warps each, and g the fraction of the task transferred to the device. Then

E(n,b,w,g) = PCPU(n) × TCPU(n, 1−g) + PGPU(b,w) × TGPU(b,w,g) + 2 × Psend × Tsend(g) + Estatic

Estatic is minimized by calculating g such that:

TCPU(n, 1−g) ≃ TGPU(b,w,g) + 2 × Tsend(g)

Average energy cost of a task for (n, b, w):

E(n,b,w) = (1−g) × PCPU(n) × TCPUf + g × PGPU(b,w) × TGPUf

To save energy
Select the (n,b,w) with minimum E(n,b,w) for execution!
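A minimal sketch of the aggregated model (assuming, for illustration only, that all times scale linearly with the amount of data; the closed form for g below follows from that assumption and is not taken from the thesis):

    def balance_fraction(t_cpu_per_elem, t_gpu_per_elem, t_send_per_elem):
        """g such that T_CPU(n, 1-g) ~= T_GPU(b,w,g) + 2*T_send(g),
        which minimises the waiting (static) energy."""
        return t_cpu_per_elem / (t_cpu_per_elem + t_gpu_per_elem + 2 * t_send_per_elem)

    def aggregated_energy(n_elems, g, p_cpu, t_cpu_per_elem,
                          p_gpu, t_gpu_per_elem, p_send, t_send_per_elem,
                          e_static=0.0):
        """E(n,b,w,g) = P_CPU*T_CPU(1-g) + P_GPU*T_GPU(g) + 2*P_send*T_send(g) + E_static."""
        e_cpu = p_cpu * (1 - g) * n_elems * t_cpu_per_elem
        e_gpu = p_gpu * g * n_elems * t_gpu_per_elem
        e_xfer = 2 * p_send * g * n_elems * t_send_per_elem
        return e_cpu + e_gpu + e_xfer + e_static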


Saving energy - Matrix Multiplication

- Matrix multiplication: partition in rows
- Compute 3 tasks (with n = 1, 2) on the CPU to calculate the regression parameters
- Compute 1 task on the GPU to calculate αMM
- Remove uninteresting configurations
- For each triple (n, b, w) calculate the average energy per task (row)
- Select the triple with minimum per-task energy consumption as the execution configuration (see the sketch below)

[Figure: power for vector addition with different (b, w) parameters]
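A minimal sketch of the final selection step (ranges and names are hypothetical):

    def best_configuration(candidates, energy_per_task):
        """candidates: iterable of (n, b, w) triples;
        energy_per_task: function (n, b, w) -> predicted average energy per task (J)."""
        return min(candidates, key=lambda cfg: energy_per_task(*cfg))

    # Hypothetical usage: n in 1..15 workers, b in 1..14 blocks, w in 1..32 warps.
    # configs = [(n, b, w) for n in range(1, 16)
    #                      for b in range(1, 15)
    #                      for w in range(1, 33)]
    # n, b, w = best_configuration(configs, my_energy_model)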

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23

Saving energy - Matrix Multiplication

I Matrix multiplication: partition in rowsI Calculate 3 tasks (with n=1,2) on CPU to calculate regression

parametersI Calculate 1 task on the GPU to calculate αMM

I Remove uninteresting configurations

I For each triple (n, b, w) calculate averageenergy for task (row)

I Select triple with minimum task energyconsumption as execution configuration

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23

Saving energy - Matrix Multiplication

I Matrix multiplication: partition in rowsI Calculate 3 tasks (with n=1,2) on CPU to calculate regression

parametersI Calculate 1 task on the GPU to calculate αMM

I Remove uninteresting configurations

I For each triple (n, b, w) calculate averageenergy for task (row)

I Select triple with minimum task energyconsumption as execution configuration

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23

Saving energy - Matrix Multiplication

I Matrix multiplication: partition in rowsI Calculate 3 tasks (with n=1,2) on CPU to calculate regression

parametersI Calculate 1 task on the GPU to calculate αMM

I Remove uninteresting configurationsPower for vector addition, with different (b, w) parameters

Power

1 2 3 4 5 6 7 8 9 10 11 12 13Blocks 0 2

4 6

8 10

12 14

16 18

20 22

24 26

28 30

32

Warps 55 60 65 70 75 80 85 90 95

100 105 110 115 120 125 130 135 140

Power (W)

I For each triple (n, b, w) calculate averageenergy for task (row)

I Select triple with minimum task energyconsumption as execution configuration

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23

Saving energy - Matrix Multiplication

I Matrix multiplication: partition in rowsI Calculate 3 tasks (with n=1,2) on CPU to calculate regression

parametersI Calculate 1 task on the GPU to calculate αMM

I Remove uninteresting configurations

I For each triple (n, b, w) calculate averageenergy for task (row)

I Select triple with minimum task energyconsumption as execution configuration

Power for vector addition, with different (b, w) parameters

Power

1 2 3 4 5 6 7 8 9 10 11 12 13Blocks 0 2

4 6

8 10

12 14

16 18

20 22

24 26

28 30

32

Warps 55 60 65 70 75 80 85 90 95

100 105 110 115 120 125 130 135 140

Power (W)

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23

Saving energy - Matrix Multiplication

I Matrix multiplication: partition in rows
I Calculate 3 tasks (with n = 1, 2) on the CPU to obtain the regression parameters
I Calculate 1 task on the GPU to obtain α_MM
I Remove uninteresting configurations
I For each triple (n, b, w) calculate the average energy per task (row)
I Select the triple with minimum per-task energy consumption as the execution configuration (see the sketch below)

[Figure: power (W) for vector addition with different (b, w) parameters — blocks and warps on the horizontal axes, measured power on the vertical axis]
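A minimal sketch of the selection step follows. It assumes the per-configuration power and per-task time figures have already been obtained from the calibration runs (the dictionaries below hold made-up placeholder values) and that g is fixed; in the thesis g would instead come from the balance condition of the previous slide.

```python
# Sketch of the configuration-selection step: estimate the average energy
# per task for each remaining (n, b, w) triple and keep the cheapest one.
# All power/time figures are made-up placeholders, not measured data.
from itertools import product

P_CPU   = {1: 35.0, 2: 48.0}                                  # W, per CPU core count
T_CPU_F = {1: 2.80, 2: 1.45}                                  # s per task (row) on the CPU
P_GPU   = {(14, 32): 95.0, (14, 16): 80.0, (7, 32): 70.0}     # W, per (blocks, warps)
T_GPU_F = {(14, 32): 0.020, (14, 16): 0.035, (7, 32): 0.040}  # s per task on the GPU

def per_task_energy(n, bw, g):
    """E(n,b,w) = (1-g) * P_CPU(n) * T_CPU^f + g * P_GPU(b,w) * T_GPU^f."""
    return (1.0 - g) * P_CPU[n] * T_CPU_F[n] + g * P_GPU[bw] * T_GPU_F[bw]

def select_configuration(g=0.99):
    """Pick the (n, (b, w)) pair with the lowest predicted per-task energy.
    Uninteresting (dominated) configurations would be filtered out before this."""
    return min(product(P_CPU, P_GPU), key=lambda c: per_task_energy(c[0], c[1], g))

if __name__ == "__main__":
    n, (b, w) = select_configuration()
    print(f"selected configuration: n={n}, b={b}, w={w}")
```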

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 20 / 23


Saving energy - Matrix Multiplication (results)

Execution parameters:

I n = 1
I b = 14, w = 32
I g = 0.9931

For 9000×9000 matrix multiplication:

Device   Time      Energy
CPU      25.79 s   1035.15 J
GPU      27.01 s   3064.32 J

Total: 4147.92 J
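As a quick consistency check against the aggregated model: the CPU and GPU contributions alone account for 1035.15 J + 3064.32 J = 4099.47 J, so the remaining ≈ 48.45 J of the 4147.92 J total presumably corresponds to the transfer and static terms (2 × P_send × T_send(g) + E_static).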

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 21 / 23


Conclusions

I Structured parallel applications have a similar energy footprint on CPU and GPU
I We used this similarity to develop accurate and efficient models for energy prediction
I Using the models we can save 14% energy with respect to the time-optimized configuration and 26% with respect to a GPU-only execution
I We also devised a methodology to develop energy consumption models

Future Works:

I Enhance the (time) model: consider the case in which there are more threads than physical units
I Develop other models: study other data-parallel patterns (reduce, scan)
I Practical usage: develop an autonomic manager for minimizing energy consumption

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 22 / 23

Thank you! Questions?

Alessandro Lenzi Energy models in data parallel CPU/GPU computations December 4, 2015 23 / 23
