Cluster Scheduler. Reference: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, NSDI’2011. Multi-agent Cluster Scheduling for Scalability and Flexibility, Berkeley tech report EECS-2012-273 (doctoral dissertation). Omega: flexible scalable schedulers for large compute Ytao 2013.5.15

Cluster Scheduler

Page 1: Cluster Scheduler

Cluster Scheduler

Reference: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, NSDI’2011

Multi-agent Cluster Scheduling for Scalability and Flexibility. Berkeley tech report EECS-2012-273 (doctoral dissertation)

Omega: flexible scalable schedulers for large compute clusters EuroSys’2013

Ytao 2013.5.15

Page 2: Cluster Scheduler

Cluster Scheduler (Intro)

• Cloud computing frameworks vary.

• New frameworks will likely continue to emerge, and no single framework will be optimal for all applications.

• Running multiple frameworks on one cluster improves utilization and data sharing.

Page 3: Cluster Scheduler

Problem Statement:

• In the face of increasing demand for cluster resources by diverse cluster computing applications and the growing number of machines in typical clusters, it is a challenge to design cluster schedulers that provide flexible, scalable, and efficient resource allocations.

Page 4: Cluster Scheduler

Cluster Scheduler: Monolithic State Scheduling (MSS)

• Traditional and popular examples: LSF, Condor, Hadoop

• Concept: a single scheduling agent process that makes all scheduling decisions sequentially

• Usage: The agent takes input about framework requirements, resource availability, and organizational policies, and computes a global schedule for all tasks
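
The single-agent idea can be illustrated with a minimal sketch. It assumes a CPU-only resource model and a first-fit placement rule; the names `Machine`, `Task`, and `monolithic_schedule` are hypothetical and do not come from LSF, Condor, or Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpus: float


@dataclass
class Task:
    job: str
    cpus: float


def monolithic_schedule(tasks, machines):
    """Place every task sequentially with first-fit.

    Returns {task index: machine name}; tasks that fit nowhere are left out.
    """
    placement = {}
    for i, task in enumerate(tasks):
        for m in machines:
            if m.free_cpus >= task.cpus:
                m.free_cpus -= task.cpus   # the one agent owns all cluster state
                placement[i] = m.name
                break
    return placement


machines = [Machine("m1", 4.0), Machine("m2", 2.0)]
tasks = [Task("web", 3.0), Task("batch", 2.0), Task("batch", 2.0)]
placement = monolithic_schedule(tasks, machines)
print(placement)
```

Because one loop handles every framework's tasks, any framework-specific placement rule would have to be added inside this single function.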

Page 5: Cluster Scheduler

Cluster Scheduler: Monolithic State Scheduling (MSS 2)

• Advantage: a global view, so the schedule can be globally optimal.

• Challenges:
– Complexity: the scheduler must capture all framework requirements
– Scalability: new frameworks keep emerging
– Frameworks lose their own scheduling optimizations

Page 6: Cluster Scheduler

Cluster Scheduler (update)

• Scalability (response time, number of machines)

• Flexibility (heterogeneous mix of jobs)

• Usability and maintainability (easily adapt to new types of jobs and frameworks)

• Fault isolation (minimize dependencies between unrelated jobs)

• Utilization (achieve high cluster resource utilization, e.g., CPU utilization, memory utilization)

Page 7: Cluster Scheduler

Partitioned State Scheduling (PSS)

• PSS: cluster state is divided between multiple scheduling agents as non-overlapping scheduling domains.

• Statically Partitioned State Scheduling (SPS): cluster resources are statically assigned to particular frameworks.

• Dynamically Partitioned State Scheduling (DPS): Mesos, NSDI 2011

Page 8: Cluster Scheduler

Replicated State Scheduling (RSS)

• In RSS, scheduling domains may overlap, and optimistic concurrency control is used to resolve conflicting transactions.

• Omega, EuroSys 2013

Page 9: Cluster Scheduler
Page 10: Cluster Scheduler

Cluster Environment

• Use of commodity servers

• Tens to hundreds of thousands of servers

• Heterogeneous resources

• Use of commodity networks

Page 11: Cluster Scheduler

Mixed workloads

• Service jobs vs. terminating jobs

• Service jobs consist of a set of service tasks that are conceptually intended to run forever and that are interacted with through request-response interfaces, e.g., a set of web servers or relational database servers.

• Terminating jobs, on the other hand, are given a set of inputs, perform some work as a function of those inputs, and are intended to terminate eventually (traditional HPC cluster management only considers this kind).

Page 12: Cluster Scheduler

Mesos

• Goals:
– Support and demonstrate multi-agent scheduling
– Support a fair-sharing meta-scheduling policy
– Increase overall cluster utilization
– Scale to tens of thousands of machines and hundreds of jobs

• Mesos aims to provide a scalable and resilient core that lets various frameworks share clusters efficiently.

• Approach: define a minimal interface that enables efficient resource sharing across frameworks, and push control of task scheduling and execution to the frameworks.

Page 13: Cluster Scheduler
Page 14: Cluster Scheduler
Page 15: Cluster Scheduler

Mesos (resource allocation)

• Resource offer strategy: spare available resources are offered to frameworks according to
– Fairness
– Priority

• A framework rejects an offer that does not satisfy its resource requirements and waits for a better offer.
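
The offer/reject cycle can be sketched as follows. This is an illustration only, not the Mesos API: `offer_round`, `locality_framework`, and `greedy_framework` are hypothetical names, resources are CPU-only, and the fixed framework order stands in for whatever fairness or priority policy the allocator applies.

```python
def offer_round(free, frameworks):
    """Offer each machine's free CPUs to frameworks in order.

    `frameworks` is a list of (name, accepts) pairs, where accepts(machine, cpus)
    returns the CPUs the framework takes from the offer, or 0.0 to reject it.
    """
    launched = []
    for machine, cpus in list(free.items()):
        for name, accepts in frameworks:
            used = accepts(machine, cpus)
            if used > 0:
                free[machine] = cpus - used
                launched.append((name, machine, used))
                break   # simplification: at most one acceptance per machine per round
    return launched


def locality_framework(machine, cpus):
    # Assumed preference: its input data lives on m2, so it rejects other offers.
    return 2.0 if machine == "m2" and cpus >= 2.0 else 0.0


def greedy_framework(machine, cpus):
    # Takes one CPU from any offer that has at least one.
    return 1.0 if cpus >= 1.0 else 0.0


free = {"m1": 4.0, "m2": 4.0}
launched = offer_round(
    free, [("locality", locality_framework), ("greedy", greedy_framework)]
)
print(launched, free)
```

The allocator never needs to know why `locality_framework` declined the offer for `m1`; the placement logic stays inside the framework.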

Page 16: Cluster Scheduler
Page 17: Cluster Scheduler

Mesos (summary)

• Takes advantage of short tasks to increase cluster utilization.

• Two-level scheduling: resource offers plus per-framework schedulers make it scalable.

• Good at batch jobs. How about service jobs?

Page 18: Cluster Scheduler

RSS Omega

• One of the major drawbacks of Partitioned State Scheduling is that scheduling domains must be selected before the scheduling agent performs its task-resource assignments, thereby potentially restricting the “goodness of fit” that might be achieved by the scheduling agent in its task-resource assignments.

Page 19: Cluster Scheduler
Page 20: Cluster Scheduler

Role of job manager

• 1. If the job queue is not empty, remove the next job from the job queue.

• 2. Sync: begin a transaction by synchronizing private cluster state with common cluster state.

• 3. Schedule: engage the scheduling agent to attempt to create task-resource assignments for all tasks in the job, modifying private cluster state in the process.

• 4. Submit: attempt to commit the job transaction (i.e., all task-resource assignments for the job) from private cluster state back to common cluster state. The job transaction can succeed or fail.

• 5. Record which task-resource assignments were successfully committed to common cluster state.

• 6. If any tasks in the job remain unscheduled, either because no suitable resources were found for the task during the “schedule” stage or because the task-resource assignment experienced a conflict, insert the job back into the job queue to be handled again in a future transaction.
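
The six steps above can be sketched as a loop. This is a simplified illustration, not Omega's implementation: `CommonState` and `run_job_manager` are hypothetical names, resources are CPU-only, placement is first-fit, the commit is incremental, and `max_rounds` is added only so the example terminates.

```python
import copy
from collections import deque


class CommonState:
    """Shared cluster state: free CPUs per machine (assumed CPU-only model)."""

    def __init__(self, free):
        self.free = dict(free)

    def snapshot(self):
        return copy.deepcopy(self.free)

    def commit(self, claims):
        """Apply (task, machine, cpus) claims that still fit; return those accepted."""
        accepted = []
        for task, machine, cpus in claims:
            if self.free[machine] >= cpus:      # no conflict for this claim
                self.free[machine] -= cpus
                accepted.append((task, machine, cpus))
        return accepted


def run_job_manager(queue, common, max_rounds=10):
    scheduled = {}
    for _ in range(max_rounds):
        if not queue:                                    # step 1
            break
        job, tasks = queue.popleft()
        private = common.snapshot()                      # step 2: sync
        claims = []
        for task, cpus in tasks.items():                 # step 3: schedule
            machine = next((m for m, f in private.items() if f >= cpus), None)
            if machine is not None:
                private[machine] -= cpus
                claims.append((task, machine, cpus))
        accepted = common.commit(claims)                 # step 4: submit
        for task, machine, _ in accepted:                # step 5: record
            scheduled[(job, task)] = machine
            del tasks[task]
        if tasks:                                        # step 6: requeue leftovers
            queue.append((job, tasks))
    return scheduled


common = CommonState({"m1": 2.0, "m2": 1.0})
queue = deque([("j1", {"t1": 2.0, "t2": 1.0, "t3": 1.0})])
scheduled = run_job_manager(queue, common)
print(scheduled, list(queue))
```

With a single job manager no conflicts occur; `t3` stays queued only because no machine has room for it. Conflicts arise when several job managers sync, schedule, and submit against the same `CommonState` concurrently.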

Page 21: Cluster Scheduler

Role of meta scheduling agent

• Attempt to execute transactions submitted by job managers according to transaction mode settings

• Detect conflicts according to conflict detection semantics and policies

• Enforce meta-scheduling policies

Page 22: Cluster Scheduler

For each submitted job

• 1. reject task-resource assignments that would violate policies

• 2. reject task-resource assignments that conflict with previously accepted transactions

• 3. reject all task-resource assignments in a job transaction if using all-or-nothing transaction semantics and at least one task-resource assignment was rejected

Page 23: Cluster Scheduler

Conflict mode

• Machine-granularity: reject a task-resource assignment only if both the machine and its sequence number conflict.

• Resource-granularity: reject a task-resource assignment if it would put the cluster into an invalid state (e.g., overcommitted CPU or memory).
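
The difference between the two modes can be shown with a small sketch. It rests on an assumed reading of the slide: each machine carries a sequence number that is bumped on every committed change, and machine-granularity detection compares it with the value seen at sync time. `MachineState` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    free_cpus: float
    seq: int = 0   # assumed: incremented on every committed change to this machine


def machine_granularity_conflict(machine, seq_at_sync):
    """Conflict if the machine changed at all since the scheduler synced."""
    return machine.seq != seq_at_sync


def resource_granularity_conflict(machine, cpus):
    """Conflict only if the claim would overcommit the machine."""
    return cpus > machine.free_cpus


m = MachineState(free_cpus=4.0)
seq_seen = m.seq        # scheduler B syncs its private state here

m.free_cpus -= 1.0      # scheduler A commits a 1-CPU task first
m.seq += 1

# Scheduler B now submits a 2-CPU claim on the same machine.
coarse = machine_granularity_conflict(m, seq_seen)
fine = resource_granularity_conflict(m, 2.0)
print(coarse, fine)
```

The coarse check rejects the claim even though 3 CPUs are still free, while the fine-grained check accepts it, so machine granularity is cheaper to evaluate but produces more spurious conflicts.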

Page 24: Cluster Scheduler

Transaction Granularity Semantics

• All-or-nothing: if a single task-resource assignment conflicts, then none of the task-resource assignments in the transaction are applied to common cluster state.

• Incremental: each task-resource assignment can fail independently of all others.
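
The two semantics can be compared on the same transaction. This is a minimal sketch under assumptions: CPU-only resources, resource-granularity conflict checks, and a hypothetical `commit` function that is not taken from Omega.

```python
def commit(free, claims, all_or_nothing):
    """Apply (task, machine, cpus) claims to `free`; return accepted task names."""
    trial = dict(free)                 # work on a copy so a rejection leaves `free` intact
    accepted = []
    for task, machine, cpus in claims:
        if trial[machine] >= cpus:
            trial[machine] -= cpus
            accepted.append(task)
        elif all_or_nothing:
            return []                  # one conflict rejects the whole transaction
    free.update(trial)
    return accepted


claims = [("t1", "m1", 2.0), ("t2", "m1", 2.0), ("t3", "m2", 1.0)]

free_a = {"m1": 3.0, "m2": 1.0}
result_all = commit(free_a, claims, all_or_nothing=True)

free_b = {"m1": 3.0, "m2": 1.0}
result_inc = commit(free_b, claims, all_or_nothing=False)

print(result_all, free_a)
print(result_inc, free_b)
```

All-or-nothing suits jobs whose tasks must start together, while incremental commits keep partial progress and leave only the failed tasks for a later transaction.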

Page 25: Cluster Scheduler
Page 26: Cluster Scheduler
Page 27: Cluster Scheduler

Omega (summary)

• Trade-off between whole-cluster resource utilization and conflicts.

• With a view of the whole cluster, dynamically choosing the number of tasks based on a global optimum becomes feasible.

• The meta-scheduler needs more modification.

Page 28: Cluster Scheduler

• Thanks