Ziliang Zong, Adam Manzanares, and Xiao Qin
Department of Computer Science and Software Engineering, Auburn University
Energy Efficient Scheduling for High-Performance Clusters


Where is Auburn University?

Ph.D.’04, U. of Nebraska-Lincoln
04-07, New Mexico Tech
07-09, Auburn University

Storage Systems Research Group at New Mexico Tech (2004-2007)

Storage Systems Research Group at Auburn (2008)

Storage Systems Research Group at Auburn (2009)

Investigators

Ziliang Zong, Ph.D., Assistant Professor, South Dakota School of Mines and Technology

Adam Manzanares, Ph.D. Candidate, Auburn University

Xiao Qin, Ph.D., Assistant Professor, Auburn University

Introduction - Applications

Introduction – Data Centers

Motivation – Electricity Usage

EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Motivation – Energy Projections

EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Motivation – Design Issues

Outline

Architecture – Multiple Layers

Energy Efficient Devices

Multiple Design Goals

Outline

Energy-Aware Scheduling for Clusters

Parallel Applications

[Figure: a parallel application modeled as a DAG of ten tasks, from the Entry Task (1) to the Exit Task (10), with numeric weights on the nodes and edges.]

Motivational Example

[Figure: "An Example of duplication". A four-task DAG in which T1 precedes T2 and T3, and both precede T4. Execution times: T1 = 8, T2 = 10, T3 = 15, T4 = 6. Communication times: 6 and 5 on the edges leaving T1, 2 and 4 on the edges entering T4.]

Linear Schedule, Time: 39s
  P1: T1 [0, 8], T3 [8, 23], T2 [23, 33], T4 [33, 39]

No Duplication Schedule (NDS), Time: 32s
  P1: T1 [0, 8], T3 [8, 23], T4 [26, 32]
  P2: T2 [14, 24]

Task Duplication Schedule (TDS), Time: 29s
  P1: T1 [0, 8], T3 [8, 23], T4 [23, 29]
  P2: T1 [0, 8], T2 [8, 18]

Motivational Example (cont.)

[Figure: "An Example of duplication". The four-task DAG with each node and edge labeled (time in s, energy in J): T1 (8,48), T2 (10,60), T3 (15,90), T4 (6,36); edges (6,6), (5,5), (2,2), (4,4).]

CPU_Energy=6W
Network_Energy=1W

Linear Schedule: Time: 39s, Energy: 234J
No Duplication Schedule (MCP): Time: 32s, Energy: 242J
Task Duplication Schedule (TDS): Time: 29s, Energy: 284J
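The energy figures on this slide can be reproduced with simple arithmetic. The sketch below assumes that only busy CPU time and actual network transfers consume energy. It also assumes the edge labels map to T1-T2 = 6, T1-T3 = 5, T2-T4 = 2 and T3-T4 = 4. Both assumptions are inferred from the slide's numbers rather than stated in the deck, and the helper name `schedule_energy` is hypothetical.

```python
CPU_POWER_W = 6      # "CPU_Energy=6W" on the slide
NETWORK_POWER_W = 1  # "Network_Energy=1W" on the slide

# Execution times (s) from the node labels.
exec_time = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}
# Communication times (s); the edge-to-label mapping is an assumption.
comm_time = {("T1", "T2"): 6, ("T1", "T3"): 5, ("T2", "T4"): 2, ("T3", "T4"): 4}


def schedule_energy(executed_tasks, used_links):
    """Energy (J) of a schedule, counting busy CPU time and network transfers only."""
    cpu = sum(exec_time[t] for t in executed_tasks) * CPU_POWER_W
    net = sum(comm_time[link] for link in used_links) * NETWORK_POWER_W
    return cpu + net


# Linear: everything on one processor, so no message crosses the network.
linear = schedule_energy(["T1", "T3", "T2", "T4"], [])
# No duplication: T2 runs on a second processor, so T1->T2 and T2->T4 are sent.
nds = schedule_energy(["T1", "T3", "T2", "T4"], [("T1", "T2"), ("T2", "T4")])
# Task duplication: T1 runs twice, so only T2->T4 is sent.
tds = schedule_energy(["T1", "T1", "T3", "T2", "T4"], [("T2", "T4")])

print(linear, nds, tds)  # 234 242 284
```

Under these assumptions, duplication shortens the schedule from 32s to 29s but raises the energy from 242J to 284J.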

Motivational Example (cont.)

[Figure: the four-task DAG with (time, energy) labels, and the schedules produced without and with a duplicate of T1.]

The energy cost of duplicating T1:
  CPU side: 48J
  Network side: -6J
  Total: 42J

The performance benefit of duplicating T1: 6s

Energy-performance tradeoff: 42/6 = 7

EAD schedule: Time: 32s, Energy: 242J
PEBD schedule: Time: 29s, Energy: 284J

If Threshold = 10, duplicate T1?
  EAD: No
  PEBD: Yes
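The two decisions can be checked with a short sketch. It assumes, following the flowchart later in the deck, that EAD compares the raw energy increase with the threshold while PEBD compares the ratio of energy increase to time decrease. The function names are hypothetical.

```python
def ead_should_duplicate(energy_increase, threshold):
    """EAD: duplicate only if the extra energy itself stays within the threshold."""
    return energy_increase <= threshold


def pebd_should_duplicate(energy_increase, time_decrease, threshold):
    """PEBD: duplicate only if the energy paid per second saved stays within the threshold."""
    if time_decrease <= 0:
        return False  # no performance benefit, so nothing to trade energy for
    return energy_increase / time_decrease <= threshold


cpu_cost = 48        # J, running a second copy of T1
network_saving = 6   # J, the T1 -> T2 message is no longer sent
energy_increase = cpu_cost - network_saving  # 42 J
time_decrease = 6    # s, the benefit stated on the slide
threshold = 10

print(ead_should_duplicate(energy_increase, threshold))                  # False
print(pebd_should_duplicate(energy_increase, time_decrease, threshold))  # True
```

With the same threshold, EAD rejects the duplicate because 42 exceeds 10, while PEBD accepts it because 42/6 = 7 does not.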

Basic Steps of Energy-Aware Scheduling

Algorithm Implementation:

Step 1: DAG Generation

Task Description:

Task Set {T1, T2, …, T9, T10}

T1 is the entry task;
T10 is the exit task;
T2, T3 and T4 cannot start until T1 has finished;
T5 and T6 cannot start until T2 has finished;
T7 cannot start until both T3 and T4 have finished;
T8 cannot start until both T5 and T6 have finished;
T9 cannot start until both T6 and T7 have finished;
T10 cannot start until both T8 and T9 have finished.

[Figure: the resulting DAG of tasks 1 to 10, from the Entry Task to the Exit Task, with numeric weights on the nodes and edges.]
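As a minimal sketch of this step, the task description can be written down as a predecessor map and validated with Python's standard `graphlib` module (Python 3.9 or later). The variable names are illustrative and are not taken from the deck.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
predecessors = {
    1: set(),
    2: {1}, 3: {1}, 4: {1},
    5: {2}, 6: {2},
    7: {3, 4},
    8: {5, 6},
    9: {6, 7},
    10: {8, 9},
}

# Invert the map to get the successors of each task.
successors = {task: set() for task in predecessors}
for task, preds in predecessors.items():
    for pred in preds:
        successors[pred].add(task)

entry_tasks = [t for t, preds in predecessors.items() if not preds]
exit_tasks = [t for t, succs in successors.items() if not succs]

# static_order() raises graphlib.CycleError if the description is not a DAG.
order = list(TopologicalSorter(predecessors).static_order())

print(entry_tasks, exit_tasks)  # [1] [10]
```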

Basic Steps of Energy-Aware Scheduling

Algorithm Implementation:

Step 2: Parameters Calculation

Task  Level  EST  ECT  LAST  LACT  FP
1     40     0    3    0     3     --
2     28     3    6    4     7     1
3     37     3    7    3     7     1
4     35     3    5    3     5     1
5     16     6    7    16    17    2
6     25     6    16   7     17    2
7     33     7    27   7     27    3
8     15     16   23   18    25    6
9     13     27   32   27    32    7
10    8      32   40   32    40    9

Level: Total execution time from the current task to the exit task
EST: Earliest Start Time
ECT: Earliest Completion Time
LAST: Latest Allowable Start Time
LACT: Latest Allowable Completion Time
FP: Favorite Predecessor

[Figure: the ten-task DAG from Step 1.]
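The slides do not give the formulas behind these columns. The sketch below uses one common formulation from task-duplication scheduling, and it reproduces every value in the table. Execution times are taken as ECT minus EST. Four communication times are inferred from the LAST and LACT columns, and the remaining edges get an assumed placeholder of 1 that does not affect any value under this formulation. All of these choices are assumptions.

```python
exec_time = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}
preds = {1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
         7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9]}

# Communication times inferred from the table (assumption).
INFERRED_COMM = {(4, 7): 2, (5, 8): 1, (6, 9): 10, (8, 10): 7}
PLACEHOLDER_COMM = 1  # assumed value for every other edge


def comm(src, dst):
    return INFERRED_COMM.get((src, dst), PLACEHOLDER_COMM)


succs = {t: [] for t in preds}
for t, ps in preds.items():
    for p in ps:
        succs[p].append(t)

order = sorted(preds)  # task ids are already in topological order here

est, ect, fp = {}, {}, {}
for t in order:  # forward pass
    if not preds[t]:
        est[t], fp[t] = 0, None
    else:
        # Favorite predecessor: the one whose data would arrive last.
        fp[t] = max(preds[t], key=lambda p: ect[p] + comm(p, t))
        # It shares the processor, so its message costs nothing.
        arrivals = [ect[p] + comm(p, t) for p in preds[t] if p != fp[t]]
        est[t] = max([ect[fp[t]]] + arrivals)
    ect[t] = est[t] + exec_time[t]

level, last, lact = {}, {}, {}
for t in reversed(order):  # backward pass
    level[t] = exec_time[t] + max((level[s] for s in succs[t]), default=0)
    if not succs[t]:
        lact[t] = ect[t]
    else:
        lact[t] = min(last[s] if fp[s] == t else last[s] - comm(t, s)
                      for s in succs[t])
    last[t] = lact[t] - exec_time[t]

for t in order:
    print(t, level[t], est[t], ect[t], last[t], lact[t], fp[t])
```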

Basic Steps of Energy-Aware Scheduling

Algorithm Implementation:

Step 3: Scheduling

[Figure: the ten-task DAG and the Step 2 parameter table, repeated from the previous slides.]

Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
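The task list is consistent with sorting the tasks by ascending Level, using the values from the Step 2 table. A minimal sketch:

```python
# Level column of the Step 2 table.
level = {1: 40, 2: 28, 3: 37, 4: 35, 5: 16, 6: 25, 7: 33, 8: 15, 9: 13, 10: 8}

# Ascending level puts the exit task first and the entry task last.
task_list = sorted(level, key=level.get)

print(task_list)  # [10, 9, 8, 5, 6, 2, 7, 4, 3, 1]
```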

Basic Steps of Energy-Aware Scheduling

Algorithm Implementation:

Step 4: Duplication Decision

[Figure: the ten-task DAG, repeated from the previous slides.]

Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}

Decision 1: Duplicate T1?

Decision 2: Duplicate T2? Duplicate T1?

Decision 3: Duplicate T1?
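These three decisions can be reproduced by walking the task list and following favorite predecessors back toward the entry task. In the sketch below, which is an interpretation rather than the authors' code, each already-scheduled task met along the way becomes a duplication candidate. The sketch lists every candidate on a path. A real scheduler would presumably stop once a duplicate is rejected.

```python
# FP column of the Step 2 table, and the task list from Step 3.
fp = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 6, 9: 7, 10: 9}
task_list = [10, 9, 8, 5, 6, 2, 7, 4, 3, 1]

scheduled = set()
paths = []      # tasks newly allocated to each processor
decisions = []  # duplication candidates raised while building each path

for start in task_list:
    if start in scheduled:
        continue
    path, candidates = [], []
    task = start
    while task is not None:  # stop after the entry task
        if task in scheduled:
            candidates.append(task)  # already placed elsewhere: duplicate it?
        else:
            path.append(task)
            scheduled.add(task)
        task = fp[task]
    paths.append(path)
    if candidates:
        decisions.append(candidates)

print(paths)      # [[10, 9, 7, 3, 1], [8, 6, 2], [5], [4]]
print(decisions)  # [[1], [2, 1], [1]]
```

The three entries of `decisions` line up with Decision 1 (T1), Decision 2 (T2, then T1) and Decision 3 (T1) on the slide.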

The EAD and PEBD Algorithms

Generate the DAG of the given task set.

Find all the critical paths in the DAG.

Generate the scheduling queue based on the level (ascending).

Select the task with the lowest level that has not been scheduled yet as the starting task.

For each task on the same critical path as the starting task, check whether it is already scheduled.
  No: allocate it to the same processor as the tasks on the same critical path.
  Yes: does duplicating this task save time?
    No: do not duplicate it.
    Yes (PEBD): calculate the energy increase and the time decrease.
      Ratio = energy increase / time decrease
      Ratio <= Threshold?
        Yes: duplicate this task and select the next task on the same critical path.
        No: do not duplicate it.
    Yes (EAD): calculate the energy increase.
      more_energy <= Threshold?
        Yes: duplicate this task and select the next task on the same critical path.
        No: do not duplicate it.

Continue along the critical path until the entry task is met.

Energy Dissipation in Processors

http://www.xbitlabs.com

Parallel Scientific Applications

[Figure: two task graphs. Fast Fourier Transform: 15 tasks (T1 to T15). Gaussian Elimination: 18 tasks (T1 to T18).]

Large-Scale Parallel Applications

[Figure: task graphs of Robot Control and Sparse Matrix Solver.]

http://www.kasahara.elec.waseda.ac.jp/schedule/

Impact of CPU Power Dissipation

[Figures: Energy consumption for different processors (Gaussian, CCR=0.4); Energy consumption for different processors (FFT, CCR=0.4). Chart annotations: 19.4%, 3.7%.]

CPU Type   Power (busy)   Power (idle)   Gap
           104W           15W            89W
           75W            14W            61W
           47W            11W            36W
           44W            26W            18W

Observation: CPUs with a large gap between CPU_busy and CPU_idle can obtain greater energy savings.
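A rough way to see why the gap matters: if a processor sits idle instead of running a duplicated task, the energy avoided is the gap multiplied by the task's run time. The sketch below applies that assumed model to the four rows of the table. The 8s run time is borrowed from T1 in the motivational example purely for illustration.

```python
# (busy power, idle power) in watts, one pair per table row.
cpus = [(104, 15), (75, 14), (47, 11), (44, 26)]

gaps = [busy - idle for busy, idle in cpus]


def energy_avoided(busy_w, idle_w, seconds):
    """Energy (J) avoided when a CPU idles for `seconds` instead of running a duplicate."""
    return (busy_w - idle_w) * seconds


avoided = [energy_avoided(busy, idle, 8) for busy, idle in cpus]

print(gaps)     # [89, 61, 36, 18]
print(avoided)  # [712, 488, 288, 144]
```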

Impact of Interconnect Power Dissipation

[Figures: Energy consumption (Robot Control, Myrinet); Energy consumption (Robot Control, Infiniband). Chart annotations: 16.7%, 5%, 13.3%, 3.1%.]

Interconnection   Power
Myrinet           33.6W
Infiniband        65W

Observation: The energy savings of EAD and PEBD degrade when the interconnect has a high power consumption rate.

Parallelism Degrees

[Figures: Energy consumption of Robot Control (Myrinet); Energy consumption of Sparse Matrix Solver (Myrinet). Chart annotations: 17%, 15.8%, 6.9%, 5.4%.]

Application            Parallelism
Robot Control          4.363796
Sparse Matrix Solver   15.868853

Observation: Robot Control has more task dependencies, so EAD and PEBD have more opportunities to save energy by judiciously duplicating tasks.

Communication-Computation Ratio

[Figure: Energy consumption under different CCRs.]

Processor type: Athlon 3800+ 35W
Interconnection: Myrinet
Simulated Application: Robot Control
CCR: (0.1, 0.5, 1, 5, 10)

Observation:

The overall energy consumption of EAD and PEBD is lower than that of MCP and TDS.

EAD and PEBD are very sensitive to CCR.

MCP provides the greatest energy savings if CCR is less than 1.

MCP consumes much more energy when CCR is large.

CCR: Communication-Computation Ratio
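The deck does not say how CCR is computed. One common definition, assumed here, is the average communication time per edge divided by the average computation time per task. The sketch applies it to the four-task motivational example. The edge-to-value mapping is also an assumption, although it does not change the result.

```python
# Four-task motivational example: execution and communication times in seconds.
exec_time = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}
comm_time = {("T1", "T2"): 6, ("T1", "T3"): 5, ("T2", "T4"): 2, ("T3", "T4"): 4}


def ccr(exec_times, comm_times):
    """Average communication time per edge over average computation time per task."""
    avg_comm = sum(comm_times.values()) / len(comm_times)
    avg_comp = sum(exec_times.values()) / len(exec_times)
    return avg_comm / avg_comp


example_ccr = ccr(exec_time, comm_time)
print(round(example_ccr, 3))  # 0.436
```

A value below 1 indicates a computation-bound task graph, and a value above 1 a communication-bound one.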

Performance

[Figures: Schedule length of Gaussian Elimination; Schedule length of Sparse Matrix Solver.]

Application            EAD Performance Degradation (vs. TDS)   PEBD Performance Degradation (vs. TDS)
Gaussian Elimination   5.7%                                    2.2%
Sparse Matrix Solver   2.92%                                   2.02%

Observation: It is worth trading a marginal degradation in schedule length for significant energy savings in cluster systems.

Heterogeneous Clusters - Motivational Example

(a) An example task description

Task Set {T1, T2, …, T9, T10}
T1 is the entry task;
T10 is the exit task;
T2, T3 and T4 cannot start until T1 has finished;
T5 and T6 cannot start until T2 has finished;
T7 cannot start until both T3 and T4 have finished;
T8 cannot start until both T5 and T6 have finished;
T9 cannot start until both T6 and T7 have finished;
T10 cannot start until both T8 and T9 have finished.

(b) A heterogeneous processor graph

[Figure: three processors, each labeled with EN_idle and EN_active, connected by links labeled with tr, EL_idle and EL_active. Values shown on the nodes: 8 and 25; 65 and 100; 4 and 12. Values shown on the links: 6, 10 and 30; 4, 7 and 20; 8, 15 and 40.]

(c) A DAG based on description in (a)

[Figure: the ten-task DAG from the entry task to the exit task, with numeric weights on the nodes and edges.]

(d) A mapping matrix

[Table: tasks T1 to T10 against processors P1, P2 and P3. Entries as they appear on the slide: 2.4, 1.3, 11.2; 9.9, 10.2; 1.7, 2.3, 8.1; 3.2, 4.1, 9.6; 7.2, 6.5, 7.8; 5.0, 1.4, 5.9; 3.0, 7.6, 7.5; 2.4, 4.9, 8.8; 4.5, 5.2, 9.3; 1.8, 11.4, 9.0; 2.0, 3.9, 6.7.]

Motivational Example (cont.)

[Figure: (a) The original task description; (b) The partitioned task graph; (c) The cluster graph, with clusters C1 to C4.]

(d) Final allocation list

Cluster 1 is allocated to node C

Cluster 2 is allocated to node B

Cluster 3 is allocated to node D

Cluster 4 is allocated to node A

Energy calculation for tentative schedule (C1, C2, C3, C4)

Experimental Settings

Simulation Environments

Parameters: Value (Fixed) - (Varied)

Different trees to be examined: Gaussian elimination, Fast Fourier Transform

Execution time of Gaussian Elimination: {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30} - (random)

Execution time of Fast Fourier Transform: {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3} - (random)

Computing node type:
  Type 1: AMD Athlon 64 X2 4600+ with 85W TDP
  Type 2: AMD Athlon 64 X2 4600+ with 65W TDP
  Type 3: AMD Athlon 64 X2 3800+ with 35W TDP
  Type 4: Intel Core 2 Duo E6300 processor

CCR set: Between 0.1 and 10

Computing node heterogeneity:
  Environment 1: # of Type 1: 4, # of Type 2: 4, # of Type 3: 4, # of Type 4: 4
  Environment 2: # of Type 1: 6, # of Type 2: 2, # of Type 3: 2, # of Type 4: 6
  Environment 3: # of Type 1: 5, # of Type 2: 3, # of Type 3: 3, # of Type 4: 5
  Environment 4: # of Type 1: 7, # of Type 2: 1, # of Type 3: 1, # of Type 4: 7

Network energy consumption rate: 20W, 33.6W, 60W

Communication-Computation Ratio

[Figure: CCR sensitivity for Gaussian Elimination. Four panels titled "Energy Consumption under Different CCR": (a) CCR sensitivity under environment 1; (b) CCR sensitivity under environment 2; (c) CCR sensitivity under environment 3; (d) CCR sensitivity under environment 4. x-axis: CCR (0.1 to 10); y-axis: Energy (Joule), 0 to 70000; series: TDS, EETDS, NDS, EENDS, HEADUS.]

Heterogeneity

[Figure: Computational nodes heterogeneity experiments. Four panels titled "Energy Consumption under Different Environments": (a) Energy consumption when Net_Energy=60 and CCR=0.1; (b) Energy consumption when Net_Energy=60 and CCR=0.5; (c) Energy consumption when Net_Energy=60 and CCR=8; (d) Energy consumption when Net_Energy=60 and CCR=10. x-axis: TDS, EETDS, NDS, EENDS, HEADUS; y-axis: Energy (Joule); series: E1, E2, E3, E4.]

CPU Type   E1   E2   E3   E4
Type 1     4    6    5    7
Type 2     4    2    3    1
Type 3     4    2    3    1
Type 4     4    6    5    7

Observation: CPUs with a large gap between CPU_busy and CPU_idle can obtain greater energy savings.

Conclusions

Architecture for high-performance computing platforms

Energy-Efficient Scheduling for Clusters

Energy-Efficient Scheduling for Heterogeneous Systems

How to measure energy consumption? Kill-A-Watt

http://www.auburn.edu/~xzq0001

Questions

http://www.eng.auburn.edu/~xqin