Upload
others
View
31
Download
0
Embed Size (px)
Citation preview
Hawk: Hybrid Datacenter Scheduling
Pamela Delgado, Florin Dinu,
Anne-Marie Kermarrec, Willy Zwaenepoel
1USENIX ATC 2015
Job 1
cluster
task
…
…
…
…
2
scheduler
task
Job N
task task
…
… …
Introduction: datacenter scheduling
cluster
…
…
…
…
3
centralized
scheduler
Job 1
task task
Job N
task task
…
… …
Introduction: centralized scheduling
Introduction: centralized scheduling
cluster
…
…
…
…
4
centralized
scheduler… Job 1Job 2Job N
… …
Introduction: centralized scheduling
cluster
…
…
…
…
5
centralized
scheduler… Job 1Job 2Job N
… …
Good: placement
Not so good: scheduling latency
Introduction: distributed scheduling
6
cluster
…
…
…
…
…
distributedscheduler 1
distributedscheduler 2
distributedscheduler N
Job 1
Job 2
Job N
… …
Introduction: distributed scheduling
7
cluster
…
…
…
…
…
distributedscheduler 1
distributedscheduler 2
distributedscheduler N
Good: scheduling latency
Not so good: placement
Job 1
Job 2
Job N
… …
Outline
8
1) Introduction
2) HAWK hybrid scheduling
• Rationale
• Design
3) Evaluation
• Simulation
• Real cluster implementation
4) Conclusion
Hybrid scheduling
9
cluster
…
…
…
…
Job 1
Job N
Job 2
distributedscheduler N
distributedscheduler 1
centralizedscheduler
Job M
…
… …
…
Hawk: hybrid scheduling
10
Long jobs centralized
Short jobs distributed
Hawk: hybrid scheduling
Long job 1
Short job 1
Long job M
Short job N
distributedscheduler 1
distributedscheduler N
11
Long/short:estimatedexecution time vs cut-off
centralizedscheduler
… … …
Short job 2 distributedscheduler 2
…
Rationale for Hawk
Long job 1
Short job 1
Long job M
Short job N
12
Typical production workloads
many
few
little resources
most resources
…Short job 2
…
Rationale for Hawk (continued)
13Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al 2012
0
20
40
60
80
100
0
20
40
60
80
100
Percentage of long jobs Percentage of task-seconds for long jobs
Rationale for Hawk (continued)
14Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al 2012
0
20
40
60
80
100
0
20
40
60
80
100
Percentage of long jobs Percentage of task-seconds for long jobs
Long jobs: minority but
take up most of the resources
centralized
Hawk: hybrid scheduling
distributed 1
distributed N
15
Few jobs reasonableschedulinglatency
Few resources can tradenot-so-good
placement
Long job 1
Short job 1
Short job N
Bulk ofresources good placement
Latency sensitive Fast scheduling
… …
…
centralized
Hawk: hybrid scheduling
distributed 1
distributed N
16
Few jobs reasonableschedulinglatency
Few resources can tradenot-so-good
placement
Long job 1
Short job 1
Short job N
Bulk ofresources good placement
Latency sensitive Fast scheduling
… …
…
BEST OF BOTH WORLDS
Good: scheduling latency for short jobs
Good: placement for long jobs
Hawk: distributed scheduling
17
• Sparrow
• Work-stealing
Hawk: distributed scheduling
18
• Sparrow
• Work-stealing
Sparrow
distributedscheduler
task
random
reservation
(power of two)
19
…
Hawk: distributed scheduling
20
• Sparrow
• Work-stealing
Sparrow and high load
distributedscheduler
task
Random
placement:
Low likelihood on
finding a free node21
…
Sparrow and high load
distributedscheduler
task
Random
placement:
Low likelihood on
finding a free node22
…
High load + job heterogeneity
head-of-line blocking
Hawk work-stealing
23
Free node!!
…
Hawk work-stealing
24
…
1. Free node:
contact random
node for probes!
2. Random node:
send short tasks
reservation in queue
Hawk work-stealing
25
…
1. Free node:
contact random
node for probes!
2. Random node:
send short tasks
reservation in queue
High load high probability
of contacting node with backlog
Hawk cluster partitioning
distributedscheduler
26
Reserved nodes:
small cluster
partition
centralizedscheduler
…
No coordination,
challenge: no free
nodes for mice!
Hawk cluster partitioning
distributedscheduler
27
Reserved nodes:
small cluster
partition
centralizedscheduler
…
No coordination,
challenge: no free
nodes for mice!
Short jobs schedule anywhere.
Long jobs only in non-reserved nodes.
Hawk design summary
Hybrid scheduler:
long centralized, short distributed
Work-stealing
Cluster partitioning
28
29
Evaluation: 1. Simulation
• Sparrow simulator
• Google trace
• Vary number of nodes to vary cluster utilization
• Measure: Job running time
• Report 50th and 90th percentiles for short and long jobs
• Normalized to Sparrow
Simulated results: short jobs
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Haw
k/Sp
arro
w
Number of nodes in the cluster (thousands)
50th 90th
lower better
30
Simulated results: short jobs
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Haw
k/Sp
arro
w
Number of nodes in the cluster (thousands)
50th 90th
lower better
31
Better across the board
Simulated results: long jobs
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Haw
k/Sp
arro
w
Number of nodes in the cluster (thousands)
50th 90th
32
lower better
Simulated results: long jobs
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Haw
k/Sp
arro
w
Number of nodes in the cluster (thousands)
50th 90th
33
Better except under high load
lower better
Simulated results: long jobs
0
0.2
0.4
0.6
0.8
1
1.2
10 20 30 40 50
Haw
k/Sp
arro
w
Number of nodes in the cluster (thousands)
50th 90th
34
Very high utilization: partitioning
lower better
Decomposing Hawk
1. Hawk minus centralized
2. Hawk minus stealing
3. Hawk minus partitioning
(normalized to Hawk)
35
0
0.5
1
1.5
2
Hawk no centralized
50th short jobs 90th short jobs
50th long jobs 90th long jobs
Decomposing Hawk: no centralized
36
1. Hawk minus centralized
2. Hawk minus stealing
3. Hawk minus partitioning
(normalized to Hawk)
Decomposing Hawk: no stealing
37
0
0.5
1
1.5
2
Hawk no stealing
50th short jobs 90th short jobs
50th long jobs 90th long jobs
19.6
1. Hawk minus centralized
2. Hawk minus stealing
3. Hawk minus partitioning
(normalized to Hawk)
Decomposing Hawk: no partitioning
38
0
0.5
1
1.5
2
Hawk no partition
50th short jobs 90th short jobs
50th long jobs 90th long jobs
11.9
1. Hawk minus centralized
2. Hawk minus stealing
3. Hawk minus partitioning
(normalized to Hawk)
0
0.5
1
1.5
2
Hawk no centralized Hawk no stealing Hawk no partition
50th short jobs 90th short jobs 50th long jobs 90th long jobs
Decomposing Hawk summary
39
11.919.6
0
0.5
1
1.5
2
Hawk no centralized Hawk no stealing Hawk no partition
50th short jobs 90th short jobs 50th long jobs 90th long jobs
Decomposing Hawk summary
Absence of any component
reduces Hawk’s performance!
40
11.919.6
Sensitivity analysis
1. Incorrect estimates of runtime
2. Cut off long/short
3. Details of stealing
41
Sensitivity analysis
1. Incorrect estimates of runtime
2. Cut off long/short
3. Details of stealing
Bottom line: benefits of Hawk remain despite variation
See paper for details
42
Evaluation: 2. Implementation
43
Hawk daemon
Hawkscheduler
Hawk daemon
Experiment
• 100-node cluster
• Subset of Google trace
• Vary inter-arrival time to vary cluster utilization
• Measure: Job running time
• Report 50th and 90th percentile for short and long jobs
• Normalized to Sparrow
44
Short jobs
45
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1 1.2 1.4 1.6 1.8 2 2.25
Haw
k/Sp
arro
w
Inter-arrival time
real 90th simulated 90th
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1 1.2 1.4 1.6 1.8 2 2.25
Haw
k/Sp
arro
w
Inter-arrival time
real 50th simulated 50th
lower better
Long jobs
46
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1 1.2 1.4 1.6 1.8 2 2.25
Haw
k/Sp
arro
w
Inter-arrival time
real 90th simulated 90th
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1 1.2 1.4 1.6 1.8 2 2.25
Haw
k/Sp
arro
w
Inter-arrival time
real 50th simulated 50th
lower better
Implementation
47
1. Hawk works well in real cluster
2. Good correspondence
implementation/simulation
Related work
48
Centralized: Hadoop Fair Scheduler, Quincy
Eurosys’10, SOSP‘09
Two level: Yarn, Mesos
SoCC’12, NSDI’11
Distributed schedulers: Omega, Sparrow
Eurosys’12,SOSP’13
Hybrid schedulers: Mercury
Conclusion
• Hawk: hybrid scheduler
long: centralized, short: distributed
work-stealing
cluster partitioning
• Hawk provides good results for short and long jobs
• Even under high cluster utilization
49