Energy Efficient Scheduling for High-Performance Clusters

Ziliang Zong, Texas State University; Adam Manzanares, Los Alamos National Lab; Xiao Qin, Auburn University

Energy efficient resource management for high-performance clusters


DESCRIPTION

In the past decade, high-performance cluster computing platforms have been widely used to solve challenging and rigorous engineering tasks in industrial and scientific applications. Because of extremely high energy costs, reducing energy consumption has become a major concern in designing economical and environmentally friendly cluster computing infrastructures for many high-performance applications. The primary focus of this talk is to show how to improve the energy efficiency of clusters and storage systems without significantly degrading performance. We first describe a general architecture for building energy-efficient cluster computing platforms. We then outline several energy-efficient scheduling algorithms designed for high-performance clusters and large-scale storage systems. Experimental results using both synthetic and real-world applications show that energy dissipation in clusters can be reduced with only a marginal degradation of system performance.


Page 1: Energy Efficient Scheduling for High-Performance Clusters

Ziliang Zong, Texas State University; Adam Manzanares, Los Alamos National Lab; Xiao Qin, Auburn University

Page 2: Where is Auburn University?

Ph.D. ’04, U. of Nebraska-Lincoln; 2004-2007, New Mexico Tech; 2007-now, Auburn University

Page 3: Storage Systems Research Group at Auburn (2008)

04/08/2023

Page 4: Storage Systems Research Group at Auburn (2011)

Page 5: Investigators

Ziliang Zong, Ph.D., Assistant Professor, Texas State University

Adam Manzanares, Ph.D. Candidate, Los Alamos National Lab

Xiao Qin, Ph.D., Associate Professor, Auburn University

Page 6: Introduction - Applications

Page 7: Introduction – Data Centers

Page 8: Motivation – Electricity Usage

Source: EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Page 9: Motivation – Energy Projections

Source: EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Page 10: Motivation – Design Issues

Energy Efficiency

Performance

Reliability & Security

Page 11: Architecture – Multiple Layers

Page 12: Energy-Efficient Devices

Page 13: Multiple Design Goals

High-Performance Computing Platforms: Performance, Energy Efficiency, Reliability, Security

Page 14: Energy-Aware Scheduling for Clusters

Page 15: Parallel Applications

[Figure: task graph (DAG) of a parallel application with ten tasks, T1 (entry task) through T10 (exit task)]

Page 16: Motivational Example

An example of duplication: a four-task DAG in which T1 (8 s) precedes T2 (10 s) and T3 (15 s), and both precede T4 (6 s); the messages T1→T2, T1→T3, T2→T4, and T3→T4 take 6 s, 5 s, 2 s, and 4 s.

Linear Schedule (one processor): T1 [0, 8], T3 [8, 23], T2 [23, 33], T4 [33, 39]. Time: 39s

No Duplication Schedule (NDS): P1: T1 [0, 8], T3 [8, 23], T4 [26, 32]; P2: T2 [14, 24]. Time: 32s

Task Duplication Schedule (TDS): P1: T1 [0, 8], T2 [8, 18]; P2: T1 [0, 8], T3 [8, 23], T4 [23, 29]. Time: 29s

Page 17: Motivational Example (cont.)

An example of duplication, annotated as (execution time in s, energy in J) for each task and (communication time in s, energy in J) for each message: T1 (8, 48), T2 (10, 60), T3 (15, 90), T4 (6, 36); T1→T2 (6, 6), T1→T3 (5, 5), T2→T4 (2, 2), T3→T4 (4, 4).

CPU power (CPU_Energy) = 6 W; network power (Network_Energy) = 1 W

Linear Schedule: Time: 39s, Energy: 234J

No Duplication Schedule (MCP): Time: 32s, Energy: 242J

Task Duplication Schedule (TDS): Time: 29s, Energy: 284J
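The energy figures on this slide can be reproduced with a short calculation. A minimal sketch, assuming (as the example does) that only busy CPU time at CPU_Energy = 6 W and message time at Network_Energy = 1 W are charged, with idle power ignored; `schedule_energy` and the per-schedule task and message lists are hypothetical encodings of the three Gantt charts.

```python
CPU_POWER = 6.0  # W, CPU_Energy in the example
NET_POWER = 1.0  # W, Network_Energy in the example

EXEC = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}  # execution times in seconds

def schedule_energy(executed, messages):
    """Energy in joules: every executed task copy pays CPU power for its
    execution time; every inter-processor message pays network power."""
    cpu_time = sum(EXEC[t] for t in executed)
    return cpu_time * CPU_POWER + sum(messages) * NET_POWER

# Linear: each task runs once and no message crosses processors.
linear = schedule_energy(["T1", "T2", "T3", "T4"], [])
# NDS/MCP: T1->T2 (6 s) and T2->T4 (2 s) cross processors.
nds = schedule_energy(["T1", "T2", "T3", "T4"], [6, 2])
# TDS: T1 runs twice; only T2->T4 (2 s) crosses processors.
tds = schedule_energy(["T1", "T1", "T2", "T3", "T4"], [2])

print(linear, nds, tds)  # 234.0 242.0 284.0
```

The comparison makes the tradeoff visible: duplication shortens the schedule by 3 s but adds a full extra execution of T1, which outweighs the saved message energy.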

Page 18: Motivational Example (cont.)

The energy cost of duplicating T1:

CPU side: 48J (the copy of T1 runs for 8 s at 6 W); Network side: -6J (the 6 s message from T1 to T2 is no longer sent at 1 W); Total: 42J

The performance benefit of duplicating T1: 6s (T2 can start at 8 s instead of 14 s)

Energy-performance tradeoff: 42/6 = 7

EAD schedule (T1 not duplicated): Time: 32s, Energy: 242J

PEBD schedule (T1 duplicated): Time: 29s, Energy: 284J

If Threshold = 10, duplicate T1?

EAD: No (the energy increase of 42J exceeds 10)

PEBD: Yes (the ratio of 7 does not exceed 10)
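The arithmetic on this slide can be sketched as follows, assuming (as in the example) that the time decrease equals the communication delay that duplication removes from the successor's start time; `duplication_tradeoff` is a hypothetical helper, not code from the authors.

```python
def duplication_tradeoff(exec_time, cpu_power, saved_comm, net_power):
    """Return (energy_increase, time_decrease) for duplicating a predecessor.

    exec_time:  execution time of the duplicated task (s)
    cpu_power:  busy CPU power (W)
    saved_comm: message time eliminated by duplication (s); assumed here to
                equal how much earlier the successor can start
    net_power:  network power while a message is in flight (W)
    """
    cpu_side = exec_time * cpu_power        # energy of the extra task copy
    network_side = -saved_comm * net_power  # energy of the message not sent
    return cpu_side + network_side, saved_comm

# Duplicating T1: 8 s at 6 W, removing the 6 s T1 -> T2 message at 1 W.
energy_increase, time_decrease = duplication_tradeoff(8, 6, 6, 1)
ratio = energy_increase / time_decrease
print(energy_increase, time_decrease, ratio)  # 42 6 7.0
```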

Page 19: Basic Steps of Energy-Aware Scheduling

Task Description:

Task Set {T1, T2, …, T9, T10}

T1 is the entry task; T10 is the exit task.
T2, T3, and T4 cannot start until T1 finishes.
T5 and T6 cannot start until T2 finishes.
T7 cannot start until both T3 and T4 finish.
T8 cannot start until both T5 and T6 finish.
T9 cannot start until both T6 and T7 finish.
T10 cannot start until both T8 and T9 finish.

[Figure: the DAG built from this description, T1 (entry task) through T10 (exit task)]

Step 1: DAG Generation

Algorithm Implementation:
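Step 1 amounts to turning the precedence constraints into a graph. The sketch below encodes the task description as hypothetical predecessor lists, derives the successor lists, identifies the entry and exit tasks, and checks acyclicity with Kahn's algorithm; it is an illustration, not the authors' implementation.

```python
# Predecessor lists transcribed from the task description above.
PREDECESSORS = {
    1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
    7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9],
}

def successors(preds):
    """Invert predecessor lists into successor lists (the DAG's edges)."""
    succ = {t: [] for t in preds}
    for task, ps in preds.items():
        for p in ps:
            succ[p].append(task)
    return succ

def topological_order(preds):
    """Kahn's algorithm; raises ValueError if the task graph has a cycle."""
    indegree = {t: len(ps) for t, ps in preds.items()}
    succ = successors(preds)
    ready = sorted(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for s in succ[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
        ready.sort()
    if len(order) != len(preds):
        raise ValueError("task graph contains a cycle")
    return order

SUCCESSORS = successors(PREDECESSORS)
ENTRY = [t for t, ps in PREDECESSORS.items() if not ps]  # no predecessors
EXIT = [t for t, ss in SUCCESSORS.items() if not ss]     # no successors

print(ENTRY, EXIT)                      # [1] [10]
print(topological_order(PREDECESSORS))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```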

Page 20: Basic Steps of Energy-Aware Scheduling

Task Level EST ECT LAST LACT FP

1 40 0 3 0 3 --

2 28 3 6 4 7 1

3 37 3 7 3 7 1

4 35 3 5 3 5 1

5 16 6 7 16 17 2

6 25 6 16 7 17 2

7 33 7 27 7 27 3

8 15 16 23 18 25 6

9 13 27 32 27 32 7

10 8 32 40 32 40 9

Step 2: Parameters Calculation

Algorithm Implementation:

[Figure: the ten-task DAG, T1 (entry task) through T10 (exit task)]

Level: largest total execution time along any path from the current task to the exit task

EST: Earliest Start Time

ECT: Earliest Completion Time

LAST: Latest Allowable Start Time

LACT: Latest Allowable Completion Time

FP: Favorite Predecessor
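The Level, EST, ECT, and FP columns can be recomputed from the DAG. A minimal sketch, assuming execution times equal to ECT minus EST in the table and ignoring communication delays (which reproduces these four columns for this graph); LAST and LACT depend on message costs that are not legible in the extracted figure, so they are left out.

```python
PREDECESSORS = {1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
                7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9]}
# Execution times implied by ECT - EST in the table above.
EXEC = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}
ORDER = list(range(1, 11))  # a topological order for this DAG

succ = {t: [] for t in PREDECESSORS}
for t, ps in PREDECESSORS.items():
    for p in ps:
        succ[p].append(t)

# Level: largest total execution time on any path from t to the exit task.
level = {}
for t in reversed(ORDER):
    level[t] = EXEC[t] + max((level[s] for s in succ[t]), default=0)

# FP: predecessor that completes last; EST/ECT ignore communication delays.
est, ect, fp = {}, {}, {}
for t in ORDER:
    preds = PREDECESSORS[t]
    fp[t] = max(preds, key=lambda p: ect[p]) if preds else None
    est[t] = ect[fp[t]] if preds else 0
    ect[t] = est[t] + EXEC[t]

for t in ORDER:
    print(t, level[t], est[t], ect[t], fp[t])
```

Each printed row matches the Task, Level, EST, ECT, and FP columns of the table (with `None` in place of "--" for T1).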

Page 21: Basic Steps of Energy-Aware Scheduling

Task Level EST ECT LAST LACT FP

1 40 0 3 0 3 --

2 28 3 6 4 7 1

3 37 3 7 3 7 1

4 35 3 5 3 5 1

5 16 6 7 16 17 2

6 25 6 16 7 17 2

7 33 7 27 7 27 3

8 15 16 23 18 25 6

9 13 27 32 27 32 7

10 8 32 40 32 40 9

Step 3: Scheduling

Algorithm Implementation:

[Figure: the ten-task DAG, T1 (entry task) through T10 (exit task)]

Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
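Step 3 orders tasks by ascending level. As a sketch using the Level column of the table, a plain sort by level reproduces the Original Task List:

```python
# Level column from the table above (task -> level).
LEVEL = {1: 40, 2: 28, 3: 37, 4: 35, 5: 16, 6: 25, 7: 33, 8: 15, 9: 13, 10: 8}

# Ascending level: tasks closest to the exit task come first.
queue = sorted(LEVEL, key=LEVEL.get)
print(queue)  # [10, 9, 8, 5, 6, 2, 7, 4, 3, 1]
```

No two tasks share a level here, so tie-breaking does not matter for this example.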

Page 22: Basic Steps of Energy-Aware Scheduling

Step 4: Duplication Decision

Algorithm Implementation:

[Figure: the ten-task DAG, T1 (entry task) through T10 (exit task)]

Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}

Decision 1: Duplicate T1?

Decision 2: Duplicate T2? Duplicate T1?

Decision 3: Duplicate T1?
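The three duplication decisions follow from walking favorite-predecessor chains. A minimal sketch, assuming a critical path is built by starting from the lowest-level unscheduled task in the queue and following FP links back to the entry task, with every already-scheduled task met on the way recorded as a duplication candidate; `critical_paths` is hypothetical and only lists candidates without deciding them.

```python
# Favorite predecessors (FP column) and the ascending-level task list.
FP = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 6, 9: 7, 10: 9}
QUEUE = [10, 9, 8, 5, 6, 2, 7, 4, 3, 1]

def critical_paths(queue, fp):
    """Start from the lowest-level unscheduled task and follow favorite
    predecessors back to the entry task. Tasks met on the way that are
    already scheduled become duplication candidates (Step 4 decisions)."""
    scheduled = set()
    paths = []
    for start in queue:
        if start in scheduled:
            continue
        path, candidates = [], []
        task = start
        while task is not None:  # stop once the entry task has been visited
            if task in scheduled:
                candidates.append(task)
            else:
                scheduled.add(task)
            path.append(task)
            task = fp[task]
        paths.append((path, candidates))
    return paths

for path, candidates in critical_paths(QUEUE, FP):
    print(path, "duplicate?", candidates)
# [10, 9, 7, 3, 1] duplicate? []
# [8, 6, 2, 1] duplicate? [1]
# [5, 2, 1] duplicate? [2, 1]
# [4, 1] duplicate? [1]
```

The last three lines correspond to Decisions 1, 2, and 3 on the slide.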

Page 23: The EAD and PEBD Algorithms

1. Generate the DAG of the given task set.
2. Find all the critical paths in the DAG.
3. Generate a scheduling queue based on the level (ascending).
4. Select the task with the lowest level that has not been scheduled yet as the starting task.
5. For each task in the same critical path as the starting task, check whether it is already scheduled.
   No: allocate it to the same processor as the other tasks in the same critical path.
   Yes: check whether duplicating this task saves time. If it does:
   PEBD: calculate the energy increase and the time decrease, and compute Ratio = energy increase / time decrease. If Ratio <= Threshold, duplicate this task and select the next task in the same critical path.
   EAD: calculate the energy increase (more_energy). If more_energy <= Threshold, duplicate this task and select the next task in the same critical path.
   Otherwise, the task is not duplicated.
6. The walk along a critical path ends when it meets the entry task.
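The two branches of the flowchart differ only in what they compare against Threshold. A hedged sketch with hypothetical function names, assuming both variants first check that duplication saves time; the example inputs are the 42J energy increase and 6s time decrease from the T1 duplication example.

```python
def pebd_should_duplicate(energy_increase, time_decrease, threshold):
    """PEBD: duplicate only if duplication saves time and the extra
    energy per second saved (Ratio) does not exceed the threshold."""
    if time_decrease <= 0:  # "Save time if duplicate this task?" -> No
        return False
    return energy_increase / time_decrease <= threshold

def ead_should_duplicate(energy_increase, time_decrease, threshold):
    """EAD: duplicate only if duplication saves time and the extra
    energy (more_energy) does not exceed the threshold."""
    if time_decrease <= 0:
        return False
    return energy_increase <= threshold

# Duplicating T1 in the motivational example: +42 J, 6 s earlier start.
for threshold in (5, 10, 50):
    print(threshold,
          ead_should_duplicate(42, 6, threshold),
          pebd_should_duplicate(42, 6, threshold))
# 5 False False
# 10 False True
# 50 True True
```

Because PEBD normalizes by the time saved, it accepts duplications that buy a lot of speed per joule, while EAD caps the absolute energy spent regardless of the speedup.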

Page 24: Energy Dissipation in Processors

Source: http://www.xbitlabs.com

Page 25: Parallel Scientific Applications

[Figures: task graphs of the Fast Fourier Transform (tasks T1-T15) and Gaussian Elimination (tasks T1-T18)]

Page 26: Large-Scale Parallel Applications

Robot Control, Sparse Matrix Solver

Source: http://www.kasahara.elec.waseda.ac.jp/schedule/

Page 27: Impact of CPU Power Dissipation

Impact of CPU Types:

[Charts: Energy consumption for different processors (Gaussian, CCR=0.4) and (FFT, CCR=0.4); y-axis: Energy (Joule); series: EAD, PEBD, TDS, MCP; processors: Athlon 4600+ 85W, Athlon 4600+ 65W, Athlon 3800+ 35W, Intel Core2 Duo E6300; annotated values: 19.4%, 3.7%]

CPU Type                 Power (busy)   Power (idle)   Gap
Athlon 4600+ 85W         104 W          15 W           89 W
Athlon 4600+ 65W         75 W           14 W           61 W
Athlon 3800+ 35W         47 W           11 W           36 W
Intel Core2 Duo E6300    44 W           26 W           18 W

Observation: CPUs with a large gap between busy power (CPU_busy) and idle power (CPU_idle) obtain greater energy savings.
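One way to read this observation: a duplicated task typically runs on a processor that would otherwise idle, so its extra CPU energy scales with the busy-idle gap rather than with busy power alone. The sketch below assumes exactly that and uses a hypothetical 8 s task; it illustrates the reasoning and is not the simulator used in the experiments.

```python
# (busy W, idle W) per processor, from the table above.
POWER = {
    "Athlon 4600+ 85W": (104, 15),
    "Athlon 4600+ 65W": (75, 14),
    "Athlon 3800+ 35W": (47, 11),
    "Intel Core2 Duo E6300": (44, 26),
}

def duplication_cpu_cost(busy, idle, exec_time):
    """Extra CPU energy (J) of running a duplicate for exec_time seconds on
    a processor that would otherwise sit idle for that interval."""
    return (busy - idle) * exec_time

for cpu, (busy, idle) in POWER.items():
    print(f"{cpu}: gap {busy - idle} W, 8 s duplicate costs "
          f"{duplication_cpu_cost(busy, idle, 8)} J")
```

Under this assumption an avoided duplication saves 712 J on the 85 W Athlon but only 144 J on the E6300, which is consistent with the larger savings reported for high-gap CPUs.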

Page 28: Impact of Interconnect Power Dissipation

Impact of Interconnection Types:

[Charts: Energy consumption (Robot Control, Myrinet) and Energy consumption (Robot Control, Infiniband); x-axis: 0.1, 0.5, 1, 5, 10; y-axis: Energy (Joule); series: TDS, EAD, PEBD, MCP; annotated values: 16.7% and 5%; 13.3% and 3.1%]

Interconnection   Power
Myrinet           33.6 W
Infiniband        65 W

Observation: The energy savings of EAD and PEBD diminish when the interconnect has a high power consumption rate.
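A plausible reading of this observation is that duplication also removes messages, so a power-hungry interconnect makes duplication cheaper and narrows the gap between TDS and EAD/PEBD. A sketch under that assumption, using a hypothetical 8 s task on the Athlon 3800+ (36 W busy-idle gap) whose duplication removes a 6 s message; `duplication_energy_increase` is a hypothetical helper.

```python
def duplication_energy_increase(exec_time, cpu_gap, saved_comm, net_power):
    """Extra energy (J) of duplicating a task: CPU energy for the duplicate
    (busy-idle gap times execution time) minus the network energy of the
    message that is no longer sent."""
    return exec_time * cpu_gap - saved_comm * net_power

# Hypothetical task: 8 s on an Athlon 3800+ whose duplication removes a 6 s message.
for name, net_power in [("Myrinet", 33.6), ("Infiniband", 65.0)]:
    print(name, round(duplication_energy_increase(8, 36, 6, net_power), 1))
# Myrinet 86.4
# Infiniband -102.0
```

With the faster-draining Infiniband link, duplicating this task would even save energy, so refusing duplications (as EAD and PEBD often do) buys less.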

Page 29: Parallelism Degrees

Impact of Application Parallelism:

[Charts: Energy consumption of Robot Control (Myrinet) and Energy consumption of Sparse Matrix Solver (Myrinet); x-axis: 0.1, 0.5, 1, 5, 10; y-axis: Energy (Joule); series: TDS, EADUS, TEBUS, NDS; annotated values: 17%, 15.8%, 6.9%, 5.4%]

Application             Parallelism
Robot Control           4.363796
Sparse Matrix Solver    15.868853

Observation: Robot Control has more task dependencies, so there are more opportunities for EAD and PEBD to conserve energy by judiciously duplicating tasks.
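The slide does not define the parallelism metric. A common definition, assumed here, is total work divided by critical-path length; the sketch applies it to the ten-task example DAG from the scheduling walkthrough, ignoring communication.

```python
PREDECESSORS = {1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
                7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9]}
EXEC = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}

def parallelism(preds, exec_time):
    """Average parallelism = total work / critical-path length, ignoring
    communication. Assumes the dict lists tasks in topological order."""
    finish = {}
    for t in preds:
        finish[t] = exec_time[t] + max((finish[p] for p in preds[t]), default=0)
    return sum(exec_time.values()) / max(finish.values())

print(parallelism(PREDECESSORS, EXEC))  # 1.575 (63 units of work / 40-unit critical path)
```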

Page 30: Communication-Computation Ratio

Impact of CCR:

[Chart: Energy consumption under different CCRs]

Processor type: Athlon 3800+ 35W; Interconnection: Myrinet; Simulated Application: Robot Control; CCR: (0.1, 0.5, 1, 5, 10)

Observation:

The overall energy consumption of EAD and PEBD is lower than that of MCP and TDS.

EAD and PEBD are very sensitive to CCR.

MCP provides the greatest energy savings if CCR is less than 1.

MCP consumes much more energy when CCR is large.

CCR: Communication-Computation Ratio
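CCR compares communication cost with computation cost. A minimal sketch, assuming CCR is defined as average message time divided by average task execution time and that a target CCR is reached by uniformly scaling message times (both assumptions; the slides do not say how CCR was varied), applied to the four-task motivational example:

```python
# Four-task motivational example.
EXEC = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}              # task times (s)
COMM = {("T1", "T2"): 6, ("T1", "T3"): 5,
        ("T2", "T4"): 2, ("T3", "T4"): 4}                    # message times (s)

def ccr(exec_times, comm_times):
    """Average communication cost divided by average computation cost."""
    avg_comm = sum(comm_times.values()) / len(comm_times)
    avg_comp = sum(exec_times.values()) / len(exec_times)
    return avg_comm / avg_comp

def scale_to_ccr(exec_times, comm_times, target):
    """Rescale all message times uniformly so the graph reaches `target` CCR."""
    factor = target / ccr(exec_times, comm_times)
    return {edge: c * factor for edge, c in comm_times.items()}

print(round(ccr(EXEC, COMM), 3))                            # 0.436
print(round(ccr(EXEC, scale_to_ccr(EXEC, COMM, 5.0)), 3))   # 5.0
```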

Page 31: Performance

Impact on Schedule Length:

[Charts: Schedule length of Gaussian Elimination and Schedule length of Sparse Matrix Solver; x-axis: 0.1, 0.5, 1, 5, 10; y-axis: Time Unit (S); series: TDS, EAD, PEBD, MCP]

Application             EAD Performance Degradation (vs. TDS)   PEBD Performance Degradation (vs. TDS)
Gaussian Elimination    5.7%                                     2.2%
Sparse Matrix Solver    2.92%                                    2.02%

Observation: It is worth trading a marginal degradation in schedule length for significant energy savings in cluster systems.
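The degradation figures are relative increases in schedule length. A sketch, assuming degradation is measured as (L - L_TDS) / L_TDS, applied to the four-task motivational example, where the no-duplication schedule takes 32 s and TDS takes 29 s:

```python
def degradation(schedule_length, baseline_length):
    """Relative increase in schedule length over the baseline (TDS)."""
    return (schedule_length - baseline_length) / baseline_length

# Motivational example: no duplication (32 s) vs. TDS (29 s).
print(f"{degradation(32, 29):.1%}")  # 10.3%
```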

Page 32: Heterogeneous Clusters - Motivational Example

Task Description:

Task Set {T1, T2, …, T9, T10}

T1 is the entry task; T10 is the exit task.
T2, T3, and T4 cannot start until T1 finishes.
T5 and T6 cannot start until T2 finishes.
T7 cannot start until both T3 and T4 finish.
T8 cannot start until both T5 and T6 finish.
T9 cannot start until both T6 and T7 finish.
T10 cannot start until both T8 and T9 finish.

[Figure: (a) An example task description; (b) A heterogeneous processor graph; (c) A DAG based on the description in (a); (d) A mapping matrix for tasks T1-T10 on processors P1, P2, and P3]

Page 33: Motivational Example (cont.)

[Figure: (a) The original task description; (b) The partitioned task graph; (c) The cluster graph with clusters C1-C4]

(d) Final allocation list:
Cluster 1 is allocated to node C
Cluster 2 is allocated to node B
Cluster 3 is allocated to node D
Cluster 4 is allocated to node A

Energy calculation for tentative schedule:

Cluster   A       B       C       D
C1        3050J   3700J   2008J   3000J
C2        1000J   900J    1560J   1200J
C3        180J    194J    136J    75J
C4        207J    226J    251J    243J
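The final allocation list coincides with picking, for each cluster, its lowest-energy node. A hedged sketch of one such rule, a greedy assignment that also keeps one cluster per node (an assumption; the slide does not state the rule), with `allocate` as a hypothetical helper:

```python
# Energy (J) of each cluster on each node, from the tentative-schedule table.
ENERGY = {
    "C1": {"A": 3050, "B": 3700, "C": 2008, "D": 3000},
    "C2": {"A": 1000, "B": 900, "C": 1560, "D": 1200},
    "C3": {"A": 180, "B": 194, "C": 136, "D": 75},
    "C4": {"A": 207, "B": 226, "C": 251, "D": 243},
}

def allocate(energy):
    """Greedy: visit clusters in order, give each the cheapest free node."""
    free = set(next(iter(energy.values())))  # node names from the first row
    allocation = {}
    for cluster, costs in energy.items():
        node = min(free, key=costs.__getitem__)
        allocation[cluster] = node
        free.remove(node)
    return allocation

alloc = allocate(ENERGY)
print(alloc)  # {'C1': 'C', 'C2': 'B', 'C3': 'D', 'C4': 'A'}
print(sum(ENERGY[c][n] for c, n in alloc.items()))  # 3190
```

In this table every cluster's cheapest node is distinct, so the greedy order never forces a compromise; with conflicting minima a matching-based assignment could do better.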

Page 34: Experimental Settings

Simulation Environments

Parameters: Value (Fixed) - (Varied)

Different trees to be examined: Gaussian elimination, Fast Fourier Transform

Execution time of Gaussian Elimination: {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30} - (random)

Execution time of Fast Fourier Transform: {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3} - (random)

Computing node type: AMD Athlon 64 X2 4600+ with 85W TDP (Type 1); AMD Athlon 64 X2 4600+ with 65W TDP (Type 2); AMD Athlon 64 X2 3800+ with 35W TDP (Type 3); Intel Core 2 Duo E6300 processor (Type 4)

CCR set: between 0.1 and 10

Computing node heterogeneity:
Environment 1: # of Type 1: 4; # of Type 2: 4; # of Type 3: 4; # of Type 4: 4
Environment 2: # of Type 1: 6; # of Type 2: 2; # of Type 3: 2; # of Type 4: 6
Environment 3: # of Type 1: 5; # of Type 2: 3; # of Type 3: 3; # of Type 4: 5
Environment 4: # of Type 1: 7; # of Type 2: 1; # of Type 3: 1; # of Type 4: 7

Network energy consumption rate: 20W, 33.6W, 60W

Page 35: Communication-Computation Ratio

CCR sensitivity for Gaussian Elimination

[Charts: Energy Consumption under Different CCR: (a) CCR sensitivity under environment 1; (b) under environment 2; (c) under environment 3; (d) under environment 4; x-axis: CCR (0.1, 0.3, 0.5, 0.7, 0.9, 2, 4, 6, 8, 10); y-axis: Energy (Joule); series: TDS, EETDS, NDS, EENDS, HEADUS]

Page 36: Heterogeneity

Computational nodes heterogeneity experiments

[Charts: Energy Consumption under Different Environments (E1-E4): (a) Net_Energy=60 and CCR=0.1; (b) Net_Energy=60 and CCR=0.5; (c) Net_Energy=60 and CCR=8; (d) Net_Energy=60 and CCR=10; y-axis: Energy (Joule); series: TDS, EETDS, NDS, EENDS, HEADUS]

CPU Type   E1   E2   E3   E4
Type 1     4    6    5    7
Type 2     4    2    3    1
Type 3     4    2    3    1
Type 4     4    6    5    7

Observation: CPUs with a large gap between busy power (CPU_busy) and idle power (CPU_idle) obtain greater energy savings.

Page 37: Conclusions

Architecture for high-performance computing platforms

Energy-Efficient Scheduling for Clusters

Energy-Efficient Scheduling for Heterogeneous Systems

How to measure energy consumption? Kill-A-Watt

Page 38: Source Code Availability

www.mcs.sdsmt.edu/~zzong/software/scheduling.html

Page 39: Download the presentation slides

http://www.slideshare.net/xqin74

Google: slideshare Xiao Qin

Page 40: http://www.eng.auburn.edu/~xqin

Page 41: My webpage

http://www.eng.auburn.edu/~xqin

Page 42: Download Slides at slideshare

http://www.slideshare.net/xqin74

Page 43: Questions

http://www.eng.auburn.edu/~xqin