Sparrow: Distributed Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline
The Spark scheduling bottleneck
Sparrow’s fully distributed, fault-tolerant technique
Sparrow’s near-optimal performance
Spark Today
[Diagram: Users 1, 2, and 3; one Spark Context (query compilation, storage, scheduling); six workers]
Job Latencies Rapidly Decreasing
[Figure: job latency over time; axis marks at 10 min., 10 sec., 100 ms, 1 ms]
2004: MapReduce batch job
2009: Hive query
2010: Dremel query
2012: Impala query
2010: In-memory Spark query
2013: Spark streaming
Job latencies rapidly decreasing
+ Spark deployments growing in size
Scheduling bottleneck!
Spark scheduler throughput:
1500 tasks / second
Task duration    Cluster size (# 16-core machines)
100 ms           10
1 second         100
10 seconds       1000
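The cluster sizes above follow from simple arithmetic: every core needs a new task each time its current task finishes, so the scheduler's throughput caps the number of cores it can keep busy. The sketch below is a rough estimate, assuming one task per core at a time and uniform task durations; the function name is illustrative and not part of Spark.

```python
def max_cluster_size(throughput_tasks_per_sec, task_duration_sec, cores_per_machine=16):
    """Estimate how many machines a scheduler can keep fully busy.

    Assumes one task per core at a time and uniform task durations.
    """
    # Each core asks for one new task every task_duration_sec seconds.
    launches_per_machine_per_sec = cores_per_machine / task_duration_sec
    return throughput_tasks_per_sec / launches_per_machine_per_sec


for duration in (0.1, 1.0, 10.0):
    machines = max_cluster_size(1500, duration)
    print(f"{duration:>5} s tasks: about {machines:.0f} machines")
```

The estimates come out slightly below 10, 100, and 1000 machines, which is consistent with the rounded figures on the slide.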
Optimizing the Spark Scheduler
0.8: Monitoring code moved off critical path
0.8.1: Result deserialization moved off critical path
Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster?
[Diagram: one Cluster Scheduler and six workers; labels: task launch, task completion, scheduler delay]
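One way to check for this in practice is to look at how long a worker slot sits idle between one task completing and the next one launching. The sketch below assumes that this gap is what the slide calls scheduler delay, and that per-slot launch and completion timestamps are available; the function name and the sample timestamps are made up for illustration.

```python
def scheduler_delays(events):
    """Return idle gaps between each task completion and the next launch.

    `events` is a list of (launch_time, completion_time) pairs for one
    worker slot, in seconds. Treating this gap as "scheduler delay" is an
    assumption made for illustration.
    """
    ordered = sorted(events)
    gaps = []
    for (_, prev_done), (next_launch, _) in zip(ordered, ordered[1:]):
        gaps.append(max(0.0, next_launch - prev_done))
    return gaps


slot_events = [(0.00, 0.10), (0.13, 0.23), (0.29, 0.39)]  # made-up timestamps
delays = scheduler_delays(slot_events)
busy = sum(done - start for start, done in slot_events)
print("delays:", [round(d, 3) for d in delays])
print("fraction of time lost to delay:", round(sum(delays) / (sum(delays) + busy), 3))
```

If the gaps are a large fraction of task duration while work is still pending, the scheduler is a likely bottleneck.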
Future Spark
[Diagram: Users 1, 2, and 3, each with its own scheduler and query compilation; six workers]
Benefits:
High throughput
Fault tolerance
Storage: Tachyon
Scheduling with Sparrow
[Diagram: four schedulers, six workers, and an incoming stage]
Batch Sampling
[Diagram: four schedulers and six workers]
Place m tasks on the least loaded of 2m workers
4 probes (d = 2)
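Batch sampling as described above can be sketched in a few lines. This is a minimal single-process illustration, not Sparrow's implementation: it assumes the scheduler can read each probed worker's queue length directly, and the function and variable names are hypothetical.

```python
import random


def batch_sample(queue_lengths, m, d=2, rng=random):
    """Pick workers for m tasks by probing d*m workers at random.

    queue_lengths[w] is worker w's current queue length.
    Returns the m probed workers with the shortest queues.
    """
    num_probes = min(d * m, len(queue_lengths))
    probed = rng.sample(range(len(queue_lengths)), num_probes)
    # Keep the m least-loaded of the probed workers.
    return sorted(probed, key=lambda w: queue_lengths[w])[:m]


rng = random.Random(0)
queues = [rng.randint(0, 5) for _ in range(20)]
chosen = batch_sample(queues, m=2, d=2, rng=rng)  # 4 probes, as on the slide
for w in chosen:
    queues[w] += 1  # each chosen worker enqueues one task
print("chosen workers:", chosen)
```

Sharing the probes across all tasks in a stage means one heavily loaded probe does not force any particular task onto a bad worker.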
Queue length poor predictor of wait time
[Diagram: two workers whose queues hold tasks of 80 ms, 155 ms, and 530 ms; one stage to place]
Poor performance on heterogeneous workloads
Late Binding
[Diagram: four schedulers, six workers, and one stage]
Place m tasks on the least loaded of dm workers
4 probes (d = 2)
Worker requests task
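A minimal sketch of late binding follows. It assumes the scheduler leaves a reservation on each probed worker and hands a task to whichever workers call back first. The simulation uses each worker's backlog only to decide callback order; a real scheduler would not know it. All names are hypothetical, and the way the example groups the 80 ms, 155 ms, and 530 ms tasks onto two workers is an assumption.

```python
import random


def late_binding(time_until_free, m, d=2, rng=random):
    """Assign m tasks to whichever probed workers become free first.

    time_until_free[w] is how long worker w needs to drain the work queued
    ahead of the reservation. It is used here only to simulate the order in
    which workers request a task.
    """
    num_probes = min(d * m, len(time_until_free))
    reserved = rng.sample(range(len(time_until_free)), num_probes)
    # A worker requests a task when its reservation reaches the head of its queue.
    callbacks = sorted(reserved, key=lambda w: time_until_free[w])
    assigned = callbacks[:m]    # the first m requests each receive a task
    cancelled = callbacks[m:]   # later requests are told nothing is left
    return assigned, cancelled


# Illustrative backlog: worker 0 holds two short tasks, worker 1 one long task.
backlog_ms = [80 + 155, 530]
assigned, cancelled = late_binding(backlog_ms, m=1, d=2)
print("assigned:", assigned, "cancelled:", cancelled)
```

Here worker 0 has the longer queue (two tasks versus one) but frees up sooner, so a queue-length probe would pick worker 1 while late binding picks worker 0.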
What about constraints?
Per-Task Constraints
[Diagram: four schedulers, six workers, and one stage]
Probe separately for each task
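Probing separately for each task might look like the sketch below: each task draws its d probes only from the workers allowed to run it. This is an illustration under assumptions, not Sparrow's code. The names are hypothetical, queue lengths are read directly, and the scheduler counts its own earlier placements as load, which is a design choice of this sketch.

```python
import random


def place_with_constraints(task_constraints, queue_lengths, d=2, rng=random):
    """For each task, probe d workers drawn only from that task's allowed set.

    task_constraints maps task id -> list of workers that may run it.
    Returns a dict mapping task id -> chosen worker.
    """
    placement = {}
    load = list(queue_lengths)  # local copy so earlier placements count as load
    for task, allowed in task_constraints.items():
        probed = rng.sample(allowed, min(d, len(allowed)))
        best = min(probed, key=lambda w: load[w])
        placement[task] = best
        load[best] += 1
    return placement


constraints = {"t0": [0, 1, 2], "t1": [3, 4], "t2": [5]}
queues = [2, 0, 1, 4, 0, 3]
placement = place_with_constraints(constraints, queues, d=2, rng=random.Random(1))
print(placement)
```

Because probes cannot be shared across tasks with different constraints, this loses some of the benefit of batch sampling.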
Technique Recap
Batch sampling
+ Late binding
+ Constraints
[Diagram: four schedulers and six workers]
How well does Sparrow perform?
How does Sparrow compare to Spark’s native scheduler?
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: Common benchmark for analytics workloads
Sparrow
Spark
Shark: SQL execution engine
TPC-H Queries
100 16-core EC2 nodes, 10 schedulers, 80% load
[Figure: plot with markers at the 5th, 25th, 50th, 75th, and 95th percentiles]
Within 12% of ideal
Median queuing delay of 9 ms
Policy Enforcement
Priorities: serve queues based on strict priorities
[Diagram: worker with a high-priority queue and a low-priority queue]
Fair shares: serve queues using weighted fair queuing
[Diagram: worker with queues for User A (75%) and User B (25%)]
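Both policies can be enforced locally at each worker by choosing which queue to serve next. The sketch below is a simplified illustration with hypothetical names: the fair-sharing class counts tasks rather than task durations, so it only approximates weighted fair queuing.

```python
from collections import deque


def pick_strict_priority(queues):
    """Serve the first non-empty queue; queues are ordered high to low priority."""
    for q in queues:
        if q:
            return q.popleft()
    return None


class WeightedFairQueues:
    """Serve the backlogged user with the least service relative to its weight.

    Simplified stand-in for weighted fair queuing: it counts tasks served,
    not the time each task ran.
    """

    def __init__(self, weights):
        self.weights = dict(weights)
        self.queues = {user: deque() for user in weights}
        self.served = {user: 0 for user in weights}

    def submit(self, user, task):
        self.queues[user].append(task)

    def next_task(self):
        backlogged = [u for u in self.queues if self.queues[u]]
        if not backlogged:
            return None
        user = min(backlogged, key=lambda u: self.served[u] / self.weights[u])
        self.served[user] += 1
        return self.queues[user].popleft()


wfq = WeightedFairQueues({"A": 0.75, "B": 0.25})
for i in range(8):
    wfq.submit("A", f"A{i}")
    wfq.submit("B", f"B{i}")
order = [wfq.next_task() for _ in range(8)]
print(order)
```

With both users backlogged, User A receives about three tasks for every one of User B's, matching the 75%/25% shares in the diagram.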
Weighted Fair Sharing
Fault Tolerance
[Diagram: Scheduler 1, Scheduler 2, Spark Client 1, Spark Client 2; a ✗ marks the failure]
Timeout: 100 ms
Failover: 5 ms
Re-launch queries: 15 ms
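The client-side failover suggested by these numbers can be sketched as follows. This is an assumption-laden illustration, not Sparrow's client: schedulers are modeled as plain callables, a timeout is simulated by raising an exception, and every name here is hypothetical.

```python
class SchedulerTimeout(Exception):
    """Raised by a (hypothetical) scheduler stub when no reply arrives in time."""


class FailoverClient:
    def __init__(self, schedulers, timeout_ms=100):
        self.schedulers = list(schedulers)  # callables: (query, timeout_ms) -> result
        self.timeout_ms = timeout_ms
        self.active = 0

    def submit(self, query):
        for _ in range(len(self.schedulers)):
            scheduler = self.schedulers[self.active]
            try:
                return scheduler(query, self.timeout_ms)
            except SchedulerTimeout:
                # Fail over to the next scheduler and re-launch the same query.
                self.active = (self.active + 1) % len(self.schedulers)
        raise SchedulerTimeout("no scheduler responded")


def dead_scheduler(query, timeout_ms):
    raise SchedulerTimeout(f"no reply within {timeout_ms} ms")


def live_scheduler(query, timeout_ms):
    return f"scheduled {query}"


client = FailoverClient([dead_scheduler, live_scheduler])
print(client.submit("q1"))
print(client.submit("q2"))  # goes straight to the live scheduler
```

Because the client remembers which scheduler answered, only the first query after a failure pays the timeout.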
Making Sparrow feature-complete
Interfacing with UI
Delay scheduling
Speculation
(1) Diagnosing a Spark scheduling bottleneck
(2) Distributed, fault-tolerant scheduling with Sparrow
www.github.com/radlab/sparrow
[Diagram: four schedulers and six workers]