Meson: Building a Machine Learning Orchestration Framework on Mesos

Preview:

Citation preview

Building a Machine Learning Orchestration Framework on Mesos

0

Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure

1

Help members find content to watch and enjoy to maximize member satisfaction and retention

Everything is a Recommendation2

Recommendations are driven by Machine Learning

Ranking

Row

s

Machine Learning Pipeline3

User Selection

Feature Generation

Model Validation

PublishModel

Model Training

Machine Learning Pipeline Challenges

4

• Innovation• Heterogeneous Environments

• Spark• Native Support

• Separate Orchestration and Execution

• Multi Tenancy

• ML Constructs• Parameter Sweep – 30k Dockers

Meson Workflow System in 30 seconds

5

• General Purpose Workflow Orchestration and Scheduling framework• Delegates execution to resource managers like Mesos

• Optimized for Machine Learning Pipelines and Visualization

• Checkout the Blog• bit.ly/mesonws or techblog.netflix.com

Meson Architecture6

Mesos Usage7

• Executors• Custom Executor• Executor Caching• Executor Cleanup

• Framework Messages

• Resource Attributes• Multi Tenancy• Cluster Management

Custom Executors8

• Reuse Executor Process• e.g. Spark• Executor Id = <unique id>

• Two Way Communication

Executor Caching9

Executor Caching10

• Executor Id = hash(<something unique for the class of executors>)• E.g. Executor Id = hash(classpath)

• Match with Executor Id in Offer

offers

accept

Executor Cleanup11

• Expiration

• Explicitly keep track of Executors

Framework Messages12

Multi Tenancy13

• Resource Attributes • spark.mesos.constraints

Cluster Management14

• Red-Black software updates

• Scale up/Scale down

Mesos Cluster15

• 100s of Concurrent Jobs

• 700 Nodes

• 5000 Cores

• 25 TB Memory

• Apps: Meson Workflow System, Spark and Dockers

• Few smaller clusters

What's Next16

• Fenzo Scheduler - https://github.com/Netflix/Fenzo• Bin Packing, Auto Scaling, Host Attributes/Constraints, Groups, etc

• Cook Scheduler - https://github.com/twosigma/Cook• Multi tenant Spark Scheduler

• Open Source Meson Workflow System

17

Antony Arokiasamy

Kedar Sadekar

@aasamy

/aasamy

aarokiasamy@netflix.com

@kedar_sadekar

/kedar-sadekar

ksadekar@netflix.com

Recommended