Aaron Carey Production Engineer - ILM London [email protected]

Dynamic Scheduling - Federated clusters in mesos

Federated Clusters in Mesos

Why?

Who wins?

Why?

Sites in 3 time zones

Need to share render resources

Went through a project to prepare for cloud burst rendering

Renders mostly come at night (mostly)

What happens when our farm is full?

Can we burst to our other locations?

Approaches

Huawei Design

Led by the master and gossip protocol

Includes policy model

Master decides if a framework gets an offer

Master is in control

Based on two master plugins, a Consul deployment and a gossip protocol

https://www.youtube.com/watch?v=kqyVQzwwD5E

http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs

Our hack design

Needs to be simple

Decisions made in the framework

Framework connects to all masters

Masters don’t care about each other

We don’t need a policy engine

Keep code out of the Master

Diversion...

A note on scheduling...

Historically, schedulers in VFX are tyrannical micro managers

Full knowledge of the whole cluster and all tasks allows better informed decisions

In Mesos you only know what the Master tells you

No knowledge of other frameworks

At the mercy of the Master

Offers only deal in the present

We could hoard all offers we get, but we want to play nice

We don’t know if a better offer is just around the corner

Making dynamic scheduling decisions...

Can we intelligently schedule tasks without knowing the whole cluster state?

Schedule penalty

Every datacentre has a penalty for scheduling a task

Golf rules: the lowest score wins

Penalty = Interactivity Penalty + Data Penalty + Utilisation Penalty

Interactive Penalty

Framework regularly checks current latency to connected datacentres

Lo = maximum latency for interactive applications (around 35ms)

Lm = latency for datacentre m

I = 0 for non-interactive, 1 for interactive

Data Penalty

Data Penalty = (Total Input Data Required - Input Data Already at Location) / Bandwidth
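
The data penalty can be read as the time needed to move the missing input data to a site. A minimal sketch of that calculation follows. The function name, the parameter names and the choice of bytes and bytes per second as units are assumptions for illustration; the slides do not state units.

```python
def data_penalty(total_input_bytes: float,
                 bytes_already_at_site: float,
                 bandwidth_bytes_per_s: float) -> float:
    """Estimated seconds needed to transfer the missing input data to a site.

    Units (bytes, bytes per second) are an assumption, not from the slides.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    # Data that is already at the location does not need to be moved.
    missing = max(total_input_bytes - bytes_already_at_site, 0.0)
    return missing / bandwidth_bytes_per_s
```

A task whose inputs are already at a datacentre gets a data penalty of zero there, so it naturally prefers that site unless the other penalties outweigh it.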

Utilisation Penalty

Framework checks current utilisation of datacentres

Utarget = target utilisation of datacentre (e.g. 95%)

Um = utilisation of datacentre m

Time Penalty

Optional

Penalty decreases based on length of time in the queue
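
The slides do not say how the penalty decreases with queue time. One possible sketch is a linear reduction clamped at zero. The linear form, the `decay_per_second` parameter and its default value are all assumptions for illustration, not the presenter's formula.

```python
def time_adjusted_penalty(base_penalty: float,
                          seconds_in_queue: float,
                          decay_per_second: float = 0.01) -> float:
    """Reduce a task's penalty the longer it has waited in the queue.

    Assumption: linear decay, never dropping below zero.
    """
    reduction = decay_per_second * max(seconds_in_queue, 0.0)
    return max(base_penalty - reduction, 0.0)
```

With a reduction like this, a task that initially scores above the cost threshold would eventually fall below it and be dispatched rather than waiting forever.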

Putting it together

Set a cost threshold above which jobs don’t run

Tasks will get dispatched to the datacentre with the lowest cost

Thresholding can ensure jobs wait for optimum resources without consuming all offers
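
The dispatch rule above can be sketched as follows, assuming the total penalty for each datacentre has already been computed. The function name and the datacentre labels are hypothetical, and treating a cost equal to the threshold as acceptable is an assumption.

```python
from typing import Dict, Optional


def choose_datacentre(penalties: Dict[str, float],
                      threshold: float) -> Optional[str]:
    """Return the lowest-cost datacentre, or None if the task should wait.

    `penalties` maps a datacentre label to its total penalty for this task.
    """
    if not penalties:
        return None
    # Golf rules: the lowest total penalty wins.
    best = min(penalties, key=penalties.get)
    # Above the threshold the task stays queued and the offers are left alone.
    return best if penalties[best] <= threshold else None


# Hypothetical penalties for one task across three sites.
costs = {"dc-a": 12.0, "dc-b": 4.5, "dc-c": 30.0}
print(choose_datacentre(costs, threshold=10.0))
print(choose_datacentre(costs, threshold=4.0))
```

Returning `None` rather than the least-bad site is what lets a task wait for better resources without hoarding offers in the meantime.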

Where were we?

Framework

System

What’s Next?

Peer to Peer vs Hierarchical

Get involved!

Proposal for federated clusters:

https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing

Federated Marathon:

https://github.com/schibsted/triathlon

Current Discussion (favouring hierarchical design):

[email protected]

We’re [email protected]