Energy Efficient Scheduling for High-Performance Clusters
Ziliang Zong, Adam Manzanares, and Xiao Qin
Department of Computer Science and Software Engineering, Auburn University

Where is Auburn University?
Ph.D. ’04, U. of Nebraska-Lincoln
04-07, New Mexico Tech
07-09, Auburn University
323/4/21
Storage Systems Research Group at New Mexico Tech (2004-2007)
Storage Systems Research Group at Auburn (2008)
Storage Systems Research Group at Auburn (2009)
Investigators
Ziliang Zong, Ph.D., Assistant Professor, South Dakota School of Mines and Technology
Adam Manzanares, Ph.D. Candidate, Auburn University
Xiao Qin, Ph.D., Assistant Professor, Auburn University
Introduction - Applications
Introduction – Data Centers
Motivation – Electricity Usage
EPA Report to Congress on Server and Data Center Energy Efficiency, 2007
Motivation – Energy Projections
EPA Report to Congress on Server and Data Center Energy Efficiency, 2007
Motivation – Design Issues
Outline
Architecture – Multiple Layers
Energy Efficient Devices
Multiple Design Goals
Outline
Energy-Aware Scheduling for Clusters
Parallel Applications
[Figure: a DAG of ten tasks, T1 (entry task) through T10 (exit task), annotated with task execution times and communication costs]
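Motivational Example
[Figure: a four-task DAG in which T1 (8s) precedes T2 (10s) and T3 (15s), which both precede T4 (6s); communication times are 6s (T1→T2), 5s (T1→T3), 2s (T2→T4), and 4s (T3→T4)]
An Example of duplication
Linear Schedule, Time: 39s
  P1: T1 0-8, T3 8-23, T2 23-33, T4 33-39
No Duplication Schedule (NDS), Time: 32s
  P1: T1 0-8, T3 8-23, T4 26-32
  P2: T2 14-24
Task Duplication Schedule (TDS), Time: 29s
  P1: T1 0-8, T3 8-23, T4 23-29
  P2: T1 0-8, T2 8-18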
Motivational Example (cont.)
An Example of duplication
Linear Schedule: Time: 39s, Energy: 234J
No Duplication Schedule (MCP): Time: 32s, Energy: 242J
Task Duplication Schedule (TDS): Time: 29s, Energy: 284J
CPU_Energy = 6W, Network_Energy = 1W
[Figure: the four-task DAG annotated with (time, energy) pairs: T1 (8, 48), T2 (10, 60), T3 (15, 90), T4 (6, 36); edges T1→T2 (6, 6), T1→T3 (5, 5), T2→T4 (2, 2), T3→T4 (4, 4)]
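The three energy totals can be checked with a small model. This Python sketch assumes, as the slide's numbers imply, that only busy CPU time (6W) and active network time (1W) draw power and that idle processors cost nothing; the cross-processor transfers (6s for T1→T2 and 2s for T2→T4 under NDS, only T2→T4 under TDS) are inferred from the schedules rather than stated on the slide.

```python
# Assumed energy model: only busy CPU time and active network time consume
# energy; idle processors are free. This reproduces the slide's three totals.
CPU_POWER_W = 6.0      # CPU_Energy = 6W
NETWORK_POWER_W = 1.0  # Network_Energy = 1W

# Task execution times (s) from the four-task DAG.
EXEC_S = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}


def schedule_energy(tasks_run, cross_edges_s):
    """Energy (J): every task copy that runs plus every transfer between processors."""
    cpu_s = sum(EXEC_S[t] for t in tasks_run)
    net_s = sum(cross_edges_s)
    return CPU_POWER_W * cpu_s + NETWORK_POWER_W * net_s


linear = schedule_energy(["T1", "T2", "T3", "T4"], [])
nds = schedule_energy(["T1", "T2", "T3", "T4"], [6, 2])     # T1->T2 and T2->T4 cross processors
tds = schedule_energy(["T1", "T1", "T2", "T3", "T4"], [2])  # T1 duplicated; only T2->T4 crosses

print(linear, nds, tds)  # 234.0 242.0 284.0
```

The difference between TDS and NDS, 42J, is exactly the cost of duplicating T1 quoted on the next slide (48J of extra CPU energy minus 6J of network energy saved).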
Motivational Example (cont.)
The energy cost of duplicating T1:
CPU side: 48J Network side: -6J Total: 42J
The performance benefit of duplicating T1: 6s
Energy-performance tradeoff: 42/6 = 7
EAD: Time: 32s, Energy: 242J
PEBD: Time: 29s, Energy: 284J
If Threshold = 10, duplicate T1?
EAD: No
PEBD: Yes
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation:
Step 1: DAG Generation
Task Description:
Task Set {T1, T2, …, T9, T10}
T1 is the entry task; T10 is the exit task;
T2, T3, and T4 cannot start until T1 has finished;
T5 and T6 cannot start until T2 has finished;
T7 cannot start until both T3 and T4 have finished;
T8 cannot start until both T5 and T6 have finished;
T9 cannot start until both T6 and T7 have finished;
T10 cannot start until both T8 and T9 have finished.
[Figure: the resulting DAG, with T1 as the entry task and T10 as the exit task]
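Step 1 amounts to turning the textual precedence constraints into a graph. A minimal Python sketch, assuming the standard-library graphlib module (Python 3.9+) for the topological ordering; the names PREDECESSORS and build_dag are illustrative, not from the slides.

```python
from graphlib import TopologicalSorter

# Predecessor lists transcribed from the task description.
PREDECESSORS = {
    1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
    7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9],
}


def build_dag(preds):
    """Return (topological order, successor lists); raises CycleError on a cycle."""
    order = list(TopologicalSorter(preds).static_order())
    succs = {task: [] for task in preds}
    for task, parents in preds.items():
        for parent in parents:
            succs[parent].append(task)
    return order, succs


order, succs = build_dag(PREDECESSORS)
entry_tasks = [t for t, parents in PREDECESSORS.items() if not parents]
exit_tasks = [t for t, children in succs.items() if not children]
print(entry_tasks, exit_tasks)  # [1] [10]
```

A failed topological sort is a cheap way to catch an inconsistent task description before any scheduling parameters are computed.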
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation:
Step 2: Parameters Calculation
Task   Level   EST   ECT   LAST   LACT   FP
1      40      0     3     0      3      --
2      28      3     6     4      7      1
3      37      3     7     3      7      1
4      35      3     5     3      5      1
5      16      6     7     16     17     2
6      25      6     16    7      17     2
7      33      7     27    7      27     3
8      15      16    23    18     25     6
9      13      27    32    27     32     7
10     8       32    40    32     40     9
Level: total execution time from the current task to the exit task (along the longest path)
EST: Earliest Start Time
ECT: Earliest Completion Time
LAST: Latest Allowable Start Time
LACT: Latest Allowable Completion Time
FP: Favorite Predecessor
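The Level, EST, ECT, and FP columns can be reproduced from the DAG and the execution times (ECT minus EST in the table). This Python sketch is simplified: it ignores communication costs, so it does not derive LAST and LACT (which depend on them), and it takes each task's favorite predecessor to be the predecessor that finishes last. For this DAG those simplifications still match the table; compute_parameters is a hypothetical name.

```python
from graphlib import TopologicalSorter

PREDECESSORS = {
    1: [], 2: [1], 3: [1], 4: [1], 5: [2], 6: [2],
    7: [3, 4], 8: [5, 6], 9: [6, 7], 10: [8, 9],
}
# Execution times (s), taken as ECT - EST from the table.
EXEC = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}


def compute_parameters(preds, exec_time):
    order = list(TopologicalSorter(preds).static_order())
    succs = {t: [] for t in preds}
    for t, parents in preds.items():
        for p in parents:
            succs[p].append(t)

    # Level: execution time of the longest path from the task to the exit task.
    level = {}
    for t in reversed(order):
        level[t] = exec_time[t] + max((level[s] for s in succs[t]), default=0)

    # EST/ECT/FP with communication ignored (assumption): a task starts when its
    # last-finishing predecessor, treated as the favorite predecessor, completes.
    est, ect, fp = {}, {}, {}
    for t in order:
        if preds[t]:
            fp[t] = max(preds[t], key=lambda p: ect[p])
            est[t] = ect[fp[t]]
        else:
            fp[t], est[t] = None, 0
        ect[t] = est[t] + exec_time[t]
    return level, est, ect, fp


level, est, ect, fp = compute_parameters(PREDECESSORS, EXEC)
for t in sorted(level):
    print(t, level[t], est[t], ect[t], fp[t])
```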
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation:
Step 3: Scheduling
Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
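The original task list is simply the tasks sorted by ascending level. A minimal Python sketch using the Level column from the Step 2 table; breaking ties by task id is an assumption (this example has no ties), and next_starting_task is a hypothetical helper for picking the lowest-level unscheduled task.

```python
# Level column copied from the Step 2 table.
LEVEL = {1: 40, 2: 28, 3: 37, 4: 35, 5: 16, 6: 25, 7: 33, 8: 15, 9: 13, 10: 8}


def scheduling_queue(level):
    """Tasks in ascending order of level; ties broken by task id (assumed)."""
    return sorted(level, key=lambda t: (level[t], t))


def next_starting_task(queue, scheduled):
    """The unscheduled task with the lowest level, or None when all are scheduled."""
    return next((t for t in queue if t not in scheduled), None)


queue = scheduling_queue(LEVEL)
print(queue)  # [10, 9, 8, 5, 6, 2, 7, 4, 3, 1]

# Hypothetical state: tasks 10, 9, 7, 3, and 1 have already been placed.
print(next_starting_task(queue, {10, 9, 7, 3, 1}))  # 8
```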
Basic Steps of Energy-Aware Scheduling
Algorithm Implementation:
Step 4: Duplication Decision
Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
Decision 1: Duplicate T1?
Decision 2: Duplicate T2? Duplicate T1?
Decision 3: Duplicate T1?
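One way to see where duplication decisions arise is to follow a starting task's chain of favorite predecessors back to the entry task: a task on that chain that already sits on another processor is a duplication candidate. This Python sketch assumes the critical path of a starting task is its FP chain, which is an interpretation rather than something the slides state; fp_chain is a hypothetical name.

```python
# Favorite predecessor (FP) column from the Step 2 table; None marks the entry task.
FP = {1: None, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3, 8: 6, 9: 7, 10: 9}


def fp_chain(start, fp):
    """Follow favorite predecessors from start back to the entry task."""
    chain = [start]
    while fp[chain[-1]] is not None:
        chain.append(fp[chain[-1]])
    return chain


for start in (10, 8, 5):
    print(start, fp_chain(start, FP))
```

Under this reading, once the chain from T10 is on one processor, the chain from T8 reaches T2 and then T1, which is already placed, so T1 is the task whose duplication has to be decided.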
The EAD and PEBD Algorithms
Generate the DAG of the given task set.
Find all the critical paths in the DAG.
Generate the scheduling queue based on level (ascending).
Select the task (not yet scheduled) with the lowest level as the starting task.
For each task in the same critical path as the starting task, until the entry task is met, check whether it is already scheduled:
  No: allocate it to the same processor as the other tasks in the same critical path.
  Yes: would duplicating this task save time? If yes:
    PEBD: calculate the energy increase and the time decrease, and let Ratio = energy increase / time decrease. If Ratio <= Threshold, duplicate this task and select the next task in the same critical path.
    EAD: calculate the energy increase (more_energy). If more_energy <= Threshold, duplicate this task and select the next task in the same critical path.
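The two decision rules can be sketched in a few lines of Python. The class and function names are hypothetical, and the sketch assumes that a "No" at any check ends duplication along the current critical path, which the flowchart does not state unambiguously. Applied to the motivational example's numbers for T1 with Threshold = 10, it reproduces the slide's outcome: EAD declines and PEBD duplicates.

```python
from dataclasses import dataclass


@dataclass
class DuplicationEstimate:
    """Hypothetical per-task estimate used by both rules."""
    task: str
    energy_increase_j: float  # CPU energy of the copy minus network energy saved
    time_decrease_s: float    # how much earlier the dependent task can start


def ead_accepts(est: DuplicationEstimate, threshold: float) -> bool:
    """EAD: duplicate if the extra energy alone is within the threshold."""
    return est.energy_increase_j <= threshold


def pebd_accepts(est: DuplicationEstimate, threshold: float) -> bool:
    """PEBD: duplicate if energy spent per second saved is within the threshold.

    Assumes time_decrease_s > 0; duplicate_along_path checks that first.
    """
    return est.energy_increase_j / est.time_decrease_s <= threshold


def duplicate_along_path(candidates, accepts, threshold):
    """Walk candidate tasks toward the entry task, stopping at the first 'No' (assumed)."""
    duplicated = []
    for est in candidates:
        if est.time_decrease_s <= 0:  # "Save time if duplicate this task?" -> No
            break
        if not accepts(est, threshold):
            break
        duplicated.append(est.task)
    return duplicated


# T1 in the motivational example: 48 J - 6 J = 42 J of extra energy for 6 s saved.
t1 = DuplicationEstimate("T1", energy_increase_j=48 - 6, time_decrease_s=6)
print(duplicate_along_path([t1], ead_accepts, threshold=10))   # []
print(duplicate_along_path([t1], pebd_accepts, threshold=10))  # ['T1']
```

Passing the rule in as a function keeps the path walk identical for both algorithms, which mirrors the shared upper part of the flowchart.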
Energy Dissipation in Processors
http://www.xbitlabs.com
Parallel Scientific Applications
[Figures: task graphs of the Fast Fourier Transform (15 tasks, in rows T1; T2-T3; T4-T7; T8-T11; T12-T15) and Gaussian Elimination (18 tasks, in rows T1; T2-T6; T7; T8-T11; T12; T13-T15; T16; T17-T18)]
Large-Scale Parallel Applications
[Figures: task graphs of Robot Control and Sparse Matrix Solver]
http://www.kasahara.elec.waseda.ac.jp/schedule/
Impact of CPU Power Dissipation
[Figures: energy consumption for different processors (Gaussian, CCR=0.4) and (FFT, CCR=0.4), annotated with 19.4% and 3.7%]
Power (busy)   Power (idle)   Gap
104W           15W            89W
75W            14W            61W
47W            11W            36W
44W            26W            18W
Observation: CPUs with a large gap between busy and idle power (CPU_busy and CPU_idle) obtain greater energy savings.
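Why the busy-idle gap matters follows from a simple model, assumed here rather than taken from the slides: over a fixed window T, a CPU's energy is P_busy * t_busy + P_idle * (T - t_busy) = P_idle * T + (P_busy - P_idle) * t_busy, so every second of busy time a scheduler avoids (for example, by skipping an unprofitable duplicate) saves exactly the gap. The window and busy times below are hypothetical.

```python
# (busy W, idle W) pairs from the table above.
CPUS = [(104, 15), (75, 14), (47, 11), (44, 26)]


def window_energy(p_busy, p_idle, window_s, busy_s):
    """Energy (J) of one CPU over a fixed window in which it is busy for busy_s seconds."""
    return p_busy * busy_s + p_idle * (window_s - busy_s)


def savings(p_busy, p_idle, window_s, busy_before_s, busy_after_s):
    """Energy saved when busy time drops from busy_before_s to busy_after_s."""
    return (window_energy(p_busy, p_idle, window_s, busy_before_s)
            - window_energy(p_busy, p_idle, window_s, busy_after_s))


# Hypothetical: a schedule change removes 10 s of busy time in a 100 s window.
for p_busy, p_idle in CPUS:
    print(p_busy - p_idle, savings(p_busy, p_idle, 100, 60, 50))
```

Each saving equals the gap times the 10 s removed (890J, 610J, 360J, 180J), which is consistent with the observation that high-gap CPUs benefit most.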
Impact of Interconnect Power Dissipation
[Figures: energy consumption (Robot Control, Myrinet) and energy consumption (Robot Control, Infiniband), annotated with 16.7%, 5%, 13.3%, and 3.1%]
Interconnection   Power
Myrinet           33.6W
Infiniband        65W
Observation: The energy savings of EAD and PEBD are reduced if the interconnect has a high power consumption rate.
Parallelism Degrees
[Figures: energy consumption of Robot Control (Myrinet) and of Sparse Matrix Solver (Myrinet), annotated with 17%, 15.8%, 6.9%, and 5.4%]
Application            Parallelism
Robot Control          4.363796
Sparse Matrix Solver   15.868853
Observation: Robot Control has more task dependencies, so EAD and PEBD have more opportunities to save energy by judiciously duplicating tasks.
Communication-Computation Ratio
[Figure: energy consumption under different CCRs]
Processor type: Athlon 3800+ 35W
Interconnection: Myrinet
Simulated Application: Robot Control
CCR: (0.1, 0.5, 1, 5, 10)
Observation:
The overall energy consumption of EAD and PEBD is less than that of MCP and TDS.
EAD and PEBD are very sensitive to CCR.
MCP provides the greatest energy savings if CCR is less than 1.
MCP consumes much more energy when CCR is large.
CCR: Communication-Computation Ratio
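CCR is commonly defined as the average communication cost of an edge divided by the average computation cost of a task; the slides do not spell out their definition, so this Python sketch assumes that one. It also shows one common way to sweep CCR in simulation, by rescaling edge costs, which is an assumption about how the experiments were set up. The example numbers are the four-task motivational DAG.

```python
def ccr(comp_times, comm_times):
    """Communication-computation ratio: mean edge cost over mean task cost."""
    mean_comm = sum(comm_times) / len(comm_times)
    mean_comp = sum(comp_times) / len(comp_times)
    return mean_comm / mean_comp


def scale_to_ccr(comp_times, comm_times, target):
    """Rescale edge costs so the task graph reaches a target CCR (assumed method)."""
    factor = target / ccr(comp_times, comm_times)
    return [c * factor for c in comm_times]


# Four-task motivational example: tasks 8, 10, 15, 6 s; edges 6, 5, 2, 4 s.
tasks, edges = [8, 10, 15, 6], [6, 5, 2, 4]
print(round(ccr(tasks, edges), 3))  # 0.436
for target in (0.1, 0.5, 1, 5, 10):
    print(target, round(ccr(tasks, scale_to_ccr(tasks, edges, target)), 6))
```

Scaling only the edges keeps the computation side, and therefore the busy CPU energy of a fixed schedule, unchanged while the communication share grows.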
Performance
[Figures: schedule length of Gaussian Elimination; schedule length of Sparse Matrix Solver]
Application            EAD Performance Degradation (vs. TDS)   PEBD Performance Degradation (vs. TDS)
Gaussian Elimination   5.7%                                    2.2%
Sparse Matrix Solver   2.92%                                   2.02%
Observation: It is worth trading a marginal degradation in schedule length for significant energy savings in cluster systems.
Heterogeneous Clusters - Motivational Example
Task Description:
Task Set {T1, T2, …, T9, T10}
T1 is the entry task; T10 is the exit task;
T2, T3, and T4 cannot start until T1 has finished;
T5 and T6 cannot start until T2 has finished;
T7 cannot start until both T3 and T4 have finished;
T8 cannot start until both T5 and T6 have finished;
T9 cannot start until both T6 and T7 have finished;
T10 cannot start until both T8 and T9 have finished.
[Figure: (a) an example task description; (b) a heterogeneous processor graph, annotated with idle and active values of EN for nodes and EL for links; (c) a DAG based on the description in (a); (d) a mapping matrix for tasks T1-T10 on processors P1, P2, and P3]
Motivational Example (cont.)
[Figure: (a) the original task description; (b) the partitioned task graph; (c) the cluster graph with clusters C1-C4]
(d) Final allocation list:
Cluster 1 is allocated to node C
Cluster 2 is allocated to node B
Cluster 3 is allocated to node D
Cluster 4 is allocated to node A
Energy calculation for tentative schedule: C1, C2, C3, C4
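The slides do not give the allocation procedure, but the "energy calculation for tentative schedule" step can be illustrated. This Python sketch is hypothetical throughout: the busy times and the (idle, active) power of nodes A-D are made-up numbers, communication is ignored, and exhaustive search over one-to-one mappings stands in for whatever heuristic the authors use.

```python
from itertools import permutations

# Hypothetical busy time (s) of each cluster on each node, and (idle W, active W)
# per node. None of these numbers come from the slides.
BUSY_S = {
    "C1": {"A": 20, "B": 16, "C": 12, "D": 18},
    "C2": {"A": 10, "B": 8, "C": 14, "D": 12},
    "C3": {"A": 15, "B": 12, "C": 18, "D": 10},
    "C4": {"A": 6, "B": 9, "C": 8, "D": 7},
}
POWER_W = {"A": (10, 40), "B": (20, 80), "C": (5, 30), "D": (15, 60)}


def tentative_energy(allocation):
    """Energy (J) of a one-cluster-per-node allocation, ignoring communication.

    Each node is active while running its cluster and idle for the rest of
    the makespan (the finish time of the slowest node).
    """
    busy = {node: BUSY_S[cluster][node] for cluster, node in allocation.items()}
    makespan = max(busy.values())
    return sum(POWER_W[n][1] * b + POWER_W[n][0] * (makespan - b)
               for n, b in busy.items())


def best_allocation(clusters, nodes):
    """Try every one-to-one mapping; only practical for a handful of nodes."""
    candidates = (dict(zip(clusters, perm)) for perm in permutations(nodes))
    return min(candidates, key=tentative_energy)


best = best_allocation(list(BUSY_S), list(POWER_W))
print(best, tentative_energy(best))
```

Exhaustive search grows factorially with the number of nodes, which is why a real scheduler would replace best_allocation with a heuristic while keeping an energy estimate like tentative_energy.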
Experimental Settings
Simulation Environments
Parameter                                  Value (Fixed) - (Varied)
Task graphs examined                       Gaussian elimination, Fast Fourier Transform
Execution time of Gaussian Elimination     {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30} - (random)
Execution time of Fast Fourier Transform   {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3} - (random)
Computing node type                        AMD Athlon 64 X2 4600+ with 85W TDP (Type 1)
                                           AMD Athlon 64 X2 4600+ with 65W TDP (Type 2)
                                           AMD Athlon 64 X2 3800+ with 35W TDP (Type 3)
                                           Intel Core 2 Duo E6300 processor (Type 4)
CCR set                                    Between 0.1 and 10
Computing node heterogeneity               Environment 1: # of Type 1: 4, # of Type 2: 4, # of Type 3: 4, # of Type 4: 4
                                           Environment 2: # of Type 1: 6, # of Type 2: 2, # of Type 3: 2, # of Type 4: 6
                                           Environment 3: # of Type 1: 5, # of Type 2: 3, # of Type 3: 3, # of Type 4: 5
                                           Environment 4: # of Type 1: 7, # of Type 2: 1, # of Type 3: 1, # of Type 4: 7
Network energy consumption rate            20W, 33.6W, 60W
Communication-Computation Ratio
[Figure: energy consumption (Joule) under different CCRs (0.1 to 10) for TDS, EETDS, NDS, EENDS, and HEADUS; (a) to (d) CCR sensitivity under environments 1 to 4]
CCR sensitivity for Gaussian Elimination
Heterogeneity
[Figure: energy consumption (Joule) of TDS, EETDS, NDS, EENDS, and HEADUS under environments E1 to E4, with Net_Energy=60 and (a) CCR=0.1, (b) CCR=0.5, (c) CCR=8, (d) CCR=10]
Computational nodes heterogeneity experiments
CPU Type   E1   E2   E3   E4
Type 1     4    6    5    7
Type 2     4    2    3    1
Type 3     4    2    3    1
Type 4     4    6    5    7
Observation: CPUs with large gap between CPU_busy and CPU_idle can obtain greater energy savings
Conclusions
Architecture for high-performance computing platforms
Energy-Efficient Scheduling for Clusters
Energy-Efficient Scheduling for Heterogeneous Systems
How to measure energy consumption? Kill-A-Watt
Questions
http://www.auburn.edu/~xzq0001
http://www.eng.auburn.edu/~xqin