24
Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Embed Size (px)

Citation preview

Page 1: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

VarysEfficient Coflow Scheduling

Mosharaf Chowdhury,

Yuan Zhong, Ion Stoica UC Berkeley

Page 2: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

PerformanceFacebook analytics jobs spend 33% of their runtime in communication1

As in-memory systems proliferate,the network is likely to become the primary

bottleneck

1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011

Communication is Crucial

Page 3: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

OptimizingCommunicatio

nPerformance:NetworkingApproach

“Let systems figure it out”

FlowA sequence of packets between two endpoints

Independent unit of allocation, sharing, load balancing, and/orprioritization

Page 4: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Spark 1.0.1 6

# Comm. Params*

10

20

Hadoop 1.0.4

YARN2.3.0

OptimizingCommunicatio

nPerformance:

SystemsApproach

“Let users figure it out”*Lower bound. Does not include many parameters that can indirectly impact communication; e.g., number of reducers etc. Also excludes control-plane communication/RPC parameters.

Page 5: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

OptimizingCommunicatio

nPerformance:NetworkingApproach

“Let systems figure it out”

OptimizingCommunicatio

nPerformance:

SystemsApproach

“Let users figure it out”

A collection of parallel flows

Distributed endpoints

Each flow is independent

Completion time depends on the last

flow to complete

Page 6: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

A collection of parallel flows

Distributed endpoints

Each flow is independent

Completion time depends on the last

flow to complete

Coflow1

1. Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012

Page 7: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

1

2

N

1

2

N

.

.

.

.

.

.Completion time

depends on the last flow to complete

A collection of parallel flows

Distributed endpoints

Each flow is independent

Coflow1How to schedule coflows …

… for faster#1 completion of coflows?

… to meet#2 more deadlines?

DC Fabric

Page 8: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Varys Enables coflows in data-intensive clusters

1. Simpler Frameworks

Zero user-side configuration using a simple coflow API

2. Better performance

Faster and more predictable transfers through coflow scheduling

Page 9: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Benefits of

time2 4 6 time2 4 6 time2 4 6

Coflow1 comp. time = 6Coflow2 comp. time = 6

Coflow1 comp. time = 6Coflow2 comp. time = 6

Fair Sharing Flow-level Prioritization1,2 The Optimal

Coflow1 comp. time = 3Coflow2 comp. time = 6

L1

L2

L1

L2

L1

L2

1. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.2. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.

Link 1

Link 2

3 Units

Coflow 1

6 Units

Coflow 2

3-ε Units

Inter-Coflow Scheduling

Page 10: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

time2 4 6

Coflow1 comp. time = 6Coflow2 comp. time = 6

Fair Sharing

L1

L2

time2 4 6

Coflow1 comp. time = 6Coflow2 comp. time = 6

Flow-level Prioritization1

L1

L2

time2 4 6

The Optimal

Coflow1 comp. time = 3Coflow2 comp. time = 6

L1

L2

Benefits of

Concurrent Open Shop Scheduling1

• Tasks on independent machines

• Examples include job scheduling and caching blocks

• Use a ordering heuristic

Link 1

Link 2

3 Units

Coflow 1

6 Units

Coflow 2

3-ε Units

1. A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006

Inter-Coflow Scheduling

1. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.2. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.

Page 11: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Inter-Coflow Scheduling

3

2

1

3

2

1

Ingress Ports(Machine Uplinks)

Egress Ports(Machine Downlinks)

DC Fabric

Concurrent Open Shop Scheduling1

• Tasks on independent machines

• Examples include job scheduling and caching blocks

• Use a ordering heuristic

Link 1

Link 2

3 Units

Coflow 1

6 Units

Coflow 2

3-ε Units

1. A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006

3

6

3-ε

Page 12: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Inter-Coflow Scheduling

3

2

1

3

2

1

Ingress Ports(Machine Uplinks)

Egress Ports(Machine Downlinks)

DC Fabric

Concurrent Open Shop Scheduling• Flows on dependent links• Consider ordering and

matching constraints

^

with coupled resources

Link 1

Link 2

3 Units

Coflow 1

6 Units

Coflow 2

3-ε Units

3

6

3-ε

is NP-Hard

Characterized COSS-CRProved that list scheduling might not result in optimal solution

Page 13: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

VarysEmploys a two-step algorithm to minimize coflow completion times

1. Ordering heuristic Keeps an ordered list of coflows to be scheduled, preempting if needed

2. Allocation algorithm

Allocates minimum required resources to each coflow to finish in minimum time

Page 14: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Ordering Heuristic

1

2

3

1

2

3

4

2

3

4

4

9

P3

P2

Time

P1

C2 ends

C1 ends

5 9

P3

P2

Time

P1

C1 ends

C2 ends

4

C1 C2

Length 3 4

Width 2 3

Size 5 12

Bottleneck

5 4

Shortest-First

Narrowest-First

Smallest-First

Smallest-Effective-Bottleneck-First

: SEBF

Page 15: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Ordering Heuristic

1

2

3

1

2

3

4

2

3

4

4

9

P3

P2

Time

P1

C1 ends

C2 ends

4

C1 C2

Length 3 4

Width 2 3

Size 5 12

Bottleneck

5 4

Shortest-First

Narrowest-First

Smallest-First

Smallest-Effective-Bottleneck-First

9

P3

P2

Time

P1

C2 ends

C1 ends

5

: SEBFAllocation Algorithm

Page 16: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Allocation Algorithm

A coflow cannot finish before its very last flow

Finishing flows faster than the bottleneck cannot decrease a coflow’s completion time

Ensure minimum

allocation to each flow for

it to finish at the

desired duration;

for example, at bottleneck’s completion,

orat the deadline.

MADD

Page 17: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

VarysEnables frameworks to take advantage of coflow scheduling

1. Exposes the coflow API2. Enforces through a centralized scheduler

Page 18: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

1. Does it improve performance?2. Can it beat non-preemptive

solutions? YES

EvaluationA 3000-node trace-driven simulation matched against a 100-node EC2 deployment

Page 19: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Faster Jobs

95th

Avg. 1.85X 1.25X

1.74X 1.15X

Comm. Improv. Job Improv.

2.50X3.16X

2.94X3.84X

Comm. Heavy1

1. 26% jobs spend at least 50% of their duration in communication stages.

Page 20: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

Better than Non-Preemptive Solutions

95th

Avg. 5.65X

7.70X

w.r.t. FIFO1

WhatAbout

PerpetualStarvation

NO?

1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011

Page 21: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

FourChallenges

#3Decentralized

Varys

Master failureLow-latency analytics

#1Coflow

Dependencies

Multi-stage jobsMulti-wave stages

#2Unknown Flow

Information

Pipelining between stages

Task failures and restarts

in the Context of Multipoint-to-Multipoint Coflows

Page 22: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

#4Theory Behind

“Concurrent Open Shop Scheduling with Coupled

Resources”

Page 23: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley

• Consolidates network optimization of data-intensive frameworks• Improves job performance by addressing the COSS-CR

problem• Increases predictability through informed admission

control

VarysGreedily schedules coflows without worrying about flow-level metrics

http://varys.net/Mosharaf Chowdhury - @mosharaf

Page 24: Varys Efficient Coflow Scheduling Mosharaf Chowdhury, Yuan Zhong, Ion Stoica UC Berkeley