A Travel Through Mesos Episode I


1. What is Mesos?

An introduction to Mesos and its architecture

We Want to Buy Oranges...

[Figure: oranges available in quantities of 8 kg, 1 kg and 2 kg; we want 4 kg]

We Need to Try Until There Are Enough

One Big Shop Instead of Three!!

What is Mesos?

Resource Manager

Mesos abstracts computing resources from the nodes in the datacenter.

“Program against your datacenter like it’s a single pool of resources”

Different workloads

Mesos is a platform for sharing a cluster between applications. It can scale up to tens of thousands of nodes.

Uses containerization

Workloads are launched in containers (Mesos containers built on Linux cgroups and namespaces, or Docker), providing a level of isolation.

A Distributed Systems Kernel

Just as an operating system manages resource utilization, allowing multiple applications to share limited resources concurrently, Mesos applies this principle to a whole cluster of machines, providing resource management and scheduling across the cluster.

Mesos Architecture

[Figure: ZooKeeper, masters and agents]

Master Nodes

● Source of truth for the cluster status (kept in memory, hence high memory usage).

● Send resource offers to the applications.

● Host the primary UI.

● High availability with active-passive replication, using ZooKeeper for leader election and Paxos for state sharing.

Agent Nodes

● Launch containers running application tasks.

● Advertise their available resources to the master.

● Host a UI for the launched containers.

● Manage status updates from the running tasks and are in charge of communication with the master.

● Known as slaves until 0.28.

MESOS IS NOT AN OS

The kernel comparison can be confusing: each node has an OS installed, and Mesos runs as a service daemon on it.

What is a Resource?

Everything an application task uses for doing its work.

Types

● SCALAR (1024.0)

● RANGE ([1-10])

● SET ({elem1, elem2})

Predefined resources

● cpus

● mem

● disk

● ports

Resources are Defined by Agent

● Each Mesos agent is configured with the resources it has.

● The agent continuously sends updates to the master with its available resources.

[Figure: one agent with cpus 8.0, mem 4096.0, disk 1024.0, ports [9000-65535]; another with cpus 16.0, mem 8192.0, disk 512.0, ports [9000-10000]]
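Agents typically receive these values through the agent's `--resources` flag, written as `name:value` pairs separated by `;`, where ranges use `[begin-end]` and sets use `{a,b}`. The parser below is a minimal sketch that assumes only this simple text form (no roles, no JSON input); `parse_resources` is a hypothetical helper, not part of Mesos.

```python
def parse_resources(spec: str) -> dict:
    """Parse an agent resources string such as "cpus:8;mem:4096;ports:[9000-10000]".

    Sketch only: handles SCALAR, RANGE and SET values, not roles or JSON.
    """
    resources = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            # RANGE: one or more "begin-end" pairs separated by commas
            ranges = []
            for part in value[1:-1].split(","):
                begin, end = part.split("-")
                ranges.append((int(begin), int(end)))
            resources[name] = ranges
        elif value.startswith("{") and value.endswith("}"):
            # SET: comma-separated elements
            resources[name] = {e.strip() for e in value[1:-1].split(",") if e.strip()}
        else:
            # SCALAR: a floating point amount (MB for mem and disk)
            resources[name] = float(value)
    return resources


agent = parse_resources("cpus:16;mem:8192;disk:512;ports:[9000-10000]")
print(agent)  # {'cpus': 16.0, 'mem': 8192.0, 'disk': 512.0, 'ports': [(9000, 10000)]}
```

The same function also accepts custom resources, such as `oranges:1500`, since it does not restrict resource names.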

CPUs Resource

Represents how many CPU cores are available.

● Can be specified in fractions (0.5 CPUs).

● By default, Mesos configures each agent with the number of cores in the processor.

● Mesos enforces it by using CPU shares (a relative weight of CPU time, applied when CPUs are contended).

● It’s a guaranteed minimum (if there’s more CPU time available, it could be used).

Example

cpus=24
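To make the "guaranteed minimum" point concrete: our understanding is that the `cgroups/cpu` isolator writes a `cpu.shares` value proportional to the requested CPUs (1024 shares per CPU) and, only when the agent runs with `--cgroups_enable_cfs`, also a hard CFS quota per 100 ms period. The constants below reflect that reading and should be treated as assumptions; `cpu_shares` and `cfs_quota_us` are hypothetical helpers.

```python
# Assumed constants, based on our reading of the cgroups/cpu isolator.
CPU_SHARES_PER_CPU = 1024
MIN_CPU_SHARES = 2           # smallest value accepted for cpu.shares
CFS_PERIOD_US = 100_000      # 100 ms scheduling period
MIN_CFS_QUOTA_US = 1_000     # 1 ms minimum quota


def cpu_shares(cpus: float) -> int:
    """Relative weight: only matters under contention, so idle CPU can still be used."""
    return max(int(CPU_SHARES_PER_CPU * cpus), MIN_CPU_SHARES)


def cfs_quota_us(cpus: float) -> int:
    """Hard cap per period, applied only if CFS enforcement is enabled on the agent."""
    return max(int(CFS_PERIOD_US * cpus), MIN_CFS_QUOTA_US)


print(cpu_shares(0.5), cfs_quota_us(0.5))  # 512 50000
```

With shares alone, two tasks of 0.5 and 1.0 CPUs competing for a busy CPU get time in roughly a 1:2 ratio, while on an idle machine either one can use more than it requested.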

Memory Resource

Represents how many MB of memory are available.

● By default, Mesos configures each agent with all the detected memory except 1 GB or 50% of it, whichever is smaller. (Leave memory for the OS!!)

● It’s a strictly preallocated resource (you get what you reserve).

● That makes it a critical resource (you have to request the right amount of memory for your tasks, otherwise they could get killed if they try to use too much).

Example

mem=1024.0

Disk Resource

Represents how many MB of disk space are available.

● By default, Mesos configures each agent with all the detected disk space except 5 GB or 50% of it, whichever is smaller.

● It affects the container’s sandbox.

● By default, Mesos doesn’t enforce it (it’s not really allocated; a task can use as much space as it wants). Setting --enforce_container_disk_quota (used by the disk/du isolator) changes that behaviour.

Example

disk=2048.0
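Reading both defaults as "keep back the smaller of a fixed reserve and half of the total" (a 1 GB reserve for memory and 5 GB for disk, our assumption of the agent's built-in values), the amount an agent advertises when the resource is not configured explicitly can be sketched as follows; `default_offered_mb` is a hypothetical helper.

```python
def default_offered_mb(total_mb: float, os_reserve_mb: float) -> float:
    """Amount an agent advertises when the resource is not set explicitly.

    Keeps back the smaller of `os_reserve_mb` and half of the total.
    """
    return float(total_mb - min(os_reserve_mb, total_mb / 2))


MEM_RESERVE_MB = 1024       # 1 GB kept for the OS (assumed default)
DISK_RESERVE_MB = 5 * 1024  # 5 GB kept free on disk (assumed default)

print(default_offered_mb(16 * 1024, MEM_RESERVE_MB))    # 15360.0
print(default_offered_mb(1024, MEM_RESERVE_MB))         # 512.0
print(default_offered_mb(100 * 1024, DISK_RESERVE_MB))  # 97280.0
```

On large machines the fixed reserve wins; on small ones (under 2 GB of memory or 10 GB of disk) the agent falls back to offering half.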

Ports Resource

Represents the ports available for listening on the agent.

● It’s a RANGE.

● By default, Mesos configures each agent to expose the port range 31000-32000.

● Port usage is not enforced by Mesos.

● However, it’s important to reserve the ports a task listens on to avoid conflicts (only one process can listen on a port at a time).

Example

ports=[9000-9300]
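Because Mesos does not enforce port usage, it is up to the scheduler to take the ports a task will bind from the offered ranges and to tell the task which ones it got. A minimal sketch, with `pick_ports` as a hypothetical helper and ranges given as inclusive `(begin, end)` tuples:

```python
def pick_ports(count, offered_ranges, in_use=frozenset()):
    """Pick `count` ports from the offered ranges, skipping ports already taken.

    offered_ranges: list of inclusive (begin, end) tuples, as in ports=[9000-9300].
    """
    if count <= 0:
        return []
    picked = []
    for begin, end in offered_ranges:
        for port in range(begin, end + 1):
            if port in in_use:
                continue
            picked.append(port)
            if len(picked) == count:
                return picked
    raise ValueError(f"offer only has {len(picked)} free ports, {count} requested")


print(pick_ports(2, [(9000, 9300)], in_use={9000}))  # [9001, 9002]
```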

Custom Resources

● Mesos allows defining any custom resource.

● Remember that a resource is something that can be exclusively reserved.

● There’s no need to enforce the resource allocation (see disk or ports).

Examples

● network_bandwidth=1000.0

● bugs={bug1, bug2}

● oranges=1500.0

These resources will be offered to applications, which need to be able to manage them if they want to use them.

What is an Attribute?

Arbitrary key-value data that serves as metadata about the machine running the agent.

Types

● SCALAR (1024.0)

● RANGE ([1-10])

● TEXT (eu-1)

● They are not allocated, only passed along with the resources to the applications in offers.

● They are a helper for scheduling decisions.

Example

● rack_id=eu-1

● os=ubuntu
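Since attributes travel inside offers but are never allocated, a scheduler typically uses them as placement constraints. The sketch below models offers as plain dictionaries (a simplification; real offers are protobuf or JSON messages) and keeps only agents on a given rack; `matches` and the offer layout are hypothetical.

```python
def matches(attributes: dict, constraints: dict) -> bool:
    """True when every constraint key exists with the expected value."""
    return all(attributes.get(key) == value for key, value in constraints.items())


offers = [
    {"hostname": "agent-1", "attributes": {"rack_id": "eu-1", "os": "ubuntu"}},
    {"hostname": "agent-2", "attributes": {"rack_id": "eu-2", "os": "ubuntu"}},
]
eligible = [o["hostname"] for o in offers if matches(o["attributes"], {"rack_id": "eu-1"})]
print(eligible)  # ['agent-1']
```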

Framework Architecture

[Figure: scheduler, leader master, agents, executors and tasks, with the steps Register, Offer, Accept and Launch, and Reject]

What is a Framework?

An application that runs on Mesos.

● Based on the master-worker design.

● It’s tailored to the application’s business model.

Two components:

● Scheduler

● Executors

Scheduler

● It’s the brain of the framework.

● Registers with Mesos and receives resource offers.

● Launches tasks for the application when it has been offered enough resources, or according to other scheduling logic.

● We could see it as an intermediary between the application logic and the Mesos layer.

● It’s developed for each application. Mesos provides an API for doing it (HTTP and native).

Executor

● Launched by the scheduler when it has work to do (worker).

● It receives tasks from the scheduler and sends back status updates (it’s connected to Mesos too).

● Acts as a process container that runs tasks.

● Mesos also provides an executor API but, since executors are more general purpose than schedulers, Mesos ships a CommandExecutor that should be enough for most workloads.

Task

● The unit of work in Mesos: the workload that a scheduler wants to run in the cluster.

● Runs inside an executor.

● An executor can run more than one task (not common).

● A task defines the resources it needs, which will be allocated.

● Mesos allocates to the container enough resources for all the launched tasks plus the executor (and resizes it dynamically if more tasks are added).
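The container's allocation is therefore a sum: executor overhead plus every task it runs. The sketch below uses made-up amounts for illustration; `container_resources` is a hypothetical helper.

```python
from collections import Counter


def container_resources(executor: dict, tasks: list) -> dict:
    """Total resources for one executor container: executor overhead plus all its tasks."""
    total = Counter(executor)
    for task in tasks:
        total.update(task)  # Counter.update adds amounts instead of replacing them
    return dict(total)


executor = {"cpus": 0.25, "mem": 32}
tasks = [{"cpus": 1.0, "mem": 128}, {"cpus": 0.5, "mem": 64}]
print(container_resources(executor, tasks))  # {'cpus': 1.75, 'mem': 224}
```

Launching a third task on the same executor would simply add its amounts to this total, which is the dynamic resize the slide mentions.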

What is an Offer?

● Used by Mesos to allocate resources to a framework.

● The leading master sends offers to the frameworks’ schedulers.

What’s Inside an Offer?

● Resources offered.

● Affected agent (slaveId).

● Attributes of the agent.

[Figure: example offer with cpus 8.0, mem 4096.0, disk 1024.0, ports [9000-65535], hostname agent-1, rack_id EU-I-1, slaveId asd1323...]

How Are Offers Sent to Frameworks?

● Masters run the resource allocator module.

● This module decides which framework to send an offer to, using an algorithm called DRF (Dominant Resource Fairness).

● The allocation module is pluggable.

● The algorithm tries to maximize the minimum dominant share across frameworks (a framework’s dominant share is its largest share of any single resource type in the cluster).

● DRF orders frameworks, and then the offer is sent to them in order, one at a time.
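Below is a sketch of DRF in its textbook "progressive filling" form, run on the classic example from the DRF paper. It is not the Mesos allocator: the real allocator hands out offers rather than individual tasks and applies weights per role, while this sketch applies them per framework for simplicity. `drf_allocate` is a hypothetical helper.

```python
def drf_allocate(capacity, demands, weights=None):
    """Progressive filling with (weighted) Dominant Resource Fairness.

    capacity: total cluster resources, e.g. {"cpus": 9, "mem": 18}
    demands:  per-framework resources needed by each of its tasks
    weights:  optional per-framework weights (default 1.0)
    Returns how many tasks each framework gets.
    """
    weights = weights or {}
    for name, demand in demands.items():
        if not any(demand.get(r, 0) > 0 for r in capacity):
            raise ValueError(f"{name} must demand some resource")
    used = {r: 0.0 for r in capacity}
    consumed = {f: {r: 0.0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}

    def dominant_share(f):
        # Largest fraction of any single resource held by f, divided by its weight.
        return max(consumed[f][r] / capacity[r] for r in capacity) / weights.get(f, 1.0)

    def fits(f):
        return all(used[r] + demands[f].get(r, 0) <= capacity[r] for r in capacity)

    while True:
        candidates = [f for f in demands if fits(f)]
        if not candidates:
            return tasks
        # Serve the framework with the lowest dominant share (ties broken by name).
        f = min(candidates, key=lambda c: (dominant_share(c), c))
        for r in capacity:
            used[r] += demands[f].get(r, 0)
            consumed[f][r] += demands[f].get(r, 0)
        tasks[f] += 1


# Example from the DRF paper: 9 CPUs and 18 GB of memory.
capacity = {"cpus": 9, "mem": 18}
demands = {"A": {"cpus": 1, "mem": 4}, "B": {"cpus": 3, "mem": 1}}
print(drf_allocate(capacity, demands))              # {'A': 3, 'B': 2}
print(drf_allocate(capacity, demands, {"A": 3.0}))  # {'A': 4, 'B': 1}
```

Without weights, A (memory-heavy) and B (CPU-heavy) end with equal dominant shares of 2/3. Giving A a weight of 3 divides its dominant share before comparison, so A keeps being served and ends with four tasks; this is how role weights bias DRF.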

What to Do with an Offer?

ACCEPT

● Launch a task with resources of the offer (only the ones needed, not all).

● Perform a reservation.

● Create a persistent volume.

REJECT

● Don’t do anything with an offer.

● Why reject it explicitly? While Mesos has an offer outstanding at a scheduler, the allocator counts those resources as allocated to the framework (so the framework is penalized in DRF).

More About Offers

● Different offers from the same agent can be combined to get more resources (when accepting them).

● Several tasks can be launched with the same offer (as long as there are enough resources).

● Mesos tries to send offers as big as possible.
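To illustrate the accept/reject rules, here is a minimal model of how a scheduler might answer a single offer: pack as many pending tasks as fit, then accept with those launches, or decline if nothing fits. `Offer`, `plan_launches` and the task dictionaries are hypothetical; a real scheduler sends ACCEPT (with LAUNCH operations) or DECLINE calls through the scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Simplified offer: real offers also carry attributes, hostname, etc."""
    offer_id: str
    agent_id: str
    resources: dict  # e.g. {"cpus": 8.0, "mem": 4096.0}


def plan_launches(offer: Offer, pending: list):
    """Greedily pack pending tasks into one offer.

    Returns (names of tasks to launch, resources left over). Leftovers are not
    kept by the framework: accepting an offer returns unused resources to Mesos.
    If nothing fits, the scheduler should decline the offer instead of holding it.
    """
    remaining = dict(offer.resources)
    launched = []
    for task in pending:
        needed = task["resources"]
        if all(remaining.get(r, 0.0) >= amount for r, amount in needed.items()):
            for r, amount in needed.items():
                remaining[r] -= amount
            launched.append(task["name"])
    return launched, remaining


offer = Offer("offer-1", "agent-1", {"cpus": 8.0, "mem": 4096.0})
pending = [
    {"name": "web", "resources": {"cpus": 4.0, "mem": 2048.0}},
    {"name": "db", "resources": {"cpus": 4.0, "mem": 4096.0}},
    {"name": "worker", "resources": {"cpus": 2.0, "mem": 1024.0}},
]
print(plan_launches(offer, pending))  # (['web', 'worker'], {'cpus': 2.0, 'mem': 1024.0})
```

First-fit packing is only one choice; a scheduler could also sort tasks by size or hold out for a bigger offer, at the DRF cost described above.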

Two-Level Scheduling

The master manages cluster resources and decides which framework to send an offer to.

Schedulers accept or reject offers according to the specific application needs.

A Travel Through Mesos Episode II

What’s a Role?

● Like a group of frameworks.

● Used to ensure that certain resources are only offered to certain frameworks (a framework is only offered resources allocated to its role, with one exception).

● Each framework registers with Mesos with a role (by default, *).

* IS A ROLE, NOT ANY

The default role (*) doesn’t mean that any role is accepted; it is a concrete role (bad name…).

More on Roles

Any role is allowed

Frameworks can register with any role name, unless the --roles flag is set on the Mesos masters with a specific list.

Resources allocated to * are available to all roles

By default, resources are allocated to the default role (*). All frameworks, whatever their role, receive offers of resources allocated to ‘*’.

Roles can use weights

Weights can be assigned to roles, telling DRF that a certain role should get a larger share of resources than others.

7. Reservation

http://mesos.apache.org/documentation/latest/reservation/

What’s a Reservation?

The way to allocate resources in an agent to specific roles.

Static Reservation

While configuring the resources exposed by an agent, those resources can be statically reserved to specific roles.

cpus 4.0, mem 2048.0, disk(*) 512.0, ports [9000-65535]

cpus(pro) 4.0, mem(pro) 2048.0, disk(pro) 512.0
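Using the amounts from the static reservation example (ports left out), this sketch computes what the agent can offer to a framework in role `pro` compared with one in another role, `dev` (a hypothetical role name). The `(name, role)` dictionary is our own simplified representation; real offers list reserved and unreserved resources as separate entries tagged with their role, whereas this sketch sums them.

```python
def offerable(agent_resources: dict, framework_role: str) -> dict:
    """Resources an agent can offer to a framework registered with `framework_role`.

    agent_resources maps (resource name, role) -> amount. Resources of the
    default role "*" are offered to everybody; reserved ones only to their role.
    """
    offer = {}
    for (name, role), amount in agent_resources.items():
        if role in ("*", framework_role):
            offer[name] = offer.get(name, 0.0) + amount
    return offer


agent = {
    ("cpus", "*"): 4.0, ("mem", "*"): 2048.0, ("disk", "*"): 512.0,
    ("cpus", "pro"): 4.0, ("mem", "pro"): 2048.0, ("disk", "pro"): 512.0,
}
print(offerable(agent, "pro"))  # {'cpus': 8.0, 'mem': 4096.0, 'disk': 1024.0}
print(offerable(agent, "dev"))  # {'cpus': 4.0, 'mem': 2048.0, 'disk': 512.0}
```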

Static Reservation

Not recommended

Static reservations are only maintained for backwards compatibility.

Restart needed

To change the amount of reserved resources, the agent configuration has to be modified and the agent restarted.

By default, resources are allocated to the default role.

Dynamic Reservation

Resources can be reserved and unreserved

At runtime, resources can be reserved for a role, and later unreserved when no task is using them.

Using an HTTP endpoint

Operators manage dynamic reservations using the HTTP endpoints /master/reserve and /master/unreserve.

Using an acceptOffers operation

Schedulers can reserve/unreserve resources when accepting an offer by using two special operations (RESERVE and UNRESERVE).
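As a sketch of the operator path, the snippet builds the form body that, as far as we know, `/master/reserve` expects in Mesos 1.x: the agent id (`slaveId`) and a JSON list of resources tagged with the role and the reserving principal. Newer releases also support a `reservations` list format, so check the documentation for your version. `reserve_request_body`, the `ops` principal and the agent id are hypothetical; `/master/unreserve` takes the same fields.

```python
import json
from urllib.parse import parse_qs, urlencode


def reserve_request_body(agent_id: str, role: str, principal: str,
                         cpus: float, mem: float) -> str:
    """Form-encoded body for POST /master/reserve (assumed 1.x-era format)."""
    def scalar(name, value):
        return {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": value},
            "role": role,
            "reservation": {"principal": principal},
        }

    resources = [scalar("cpus", cpus), scalar("mem", mem)]
    return urlencode({"slaveId": agent_id, "resources": json.dumps(resources)})


body = reserve_request_body("agent-1", "pro", "ops", cpus=4.0, mem=2048.0)
# Decode it again to check what would be sent.
fields = parse_qs(body)
print(fields["slaveId"], json.loads(fields["resources"][0])[0]["role"])  # ['agent-1'] pro
```

The body would then be POSTed to `http://<master>:5050/master/reserve`, with operator credentials if authentication is enabled.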

Sandbox (Disk Resource)

Working directory

A sandbox is a temporary directory given to each executor and set as its working directory. It’s accessible from outside the container.

Stores logs and other data

It contains the stdout and stderr of the executor. Besides that, it contains the fetched files (URIs) and files created by the task.

Garbage collected

This directory is removed from the agent once a configurable period of time has passed.
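As far as we know, the configurable period is the agent's `--gc_delay` flag (default one week), shortened as the disk fills up according to `--gc_disk_headroom`. The sketch reproduces the formula documented for that flag, `gc_delay * max(0.0, 1.0 - gc_disk_headroom - disk_usage)`; treat the defaults as assumptions to check against your version. `max_sandbox_age` is a hypothetical helper.

```python
from datetime import timedelta


def max_sandbox_age(gc_delay: timedelta, disk_usage: float,
                    gc_disk_headroom: float = 0.1) -> timedelta:
    """Age after which a finished executor's sandbox may be garbage collected.

    disk_usage is the fraction (0.0-1.0) of the agent's work disk in use.
    The fuller the disk, the sooner old sandboxes are removed.
    """
    factor = max(0.0, 1.0 - gc_disk_headroom - disk_usage)
    return gc_delay * factor


week = timedelta(weeks=1)
print(max_sandbox_age(week, disk_usage=0.5, gc_disk_headroom=0.0))  # 3 days, 12:00:00
print(max_sandbox_age(week, disk_usage=0.95))                       # 0:00:00
```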

Persistent Volumes

● Created from disk resources, they live outside the executor’s sandbox and persist on the agent.

● When a task using them finishes, they are offered back without losing data.

● Used for stateful services.

More on Persistent Volumes

● Created over previously reserved disk resources.

● No more than one task can use the volume at the same time.

● To unreserve the disk resources associated with a persistent volume, the volume must be destroyed first.

● Created/destroyed using HTTP endpoints or via acceptOffers in the scheduler.

● Associated with a role (the volume can be offered back to any framework in the role).
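These rules amount to a small state machine. The class below is a toy model for reasoning about them, not how Mesos implements volumes; `PersistentVolume` and `can_unreserve` are hypothetical names.

```python
class PersistentVolume:
    """Toy model of the persistent volume rules (hypothetical, not a Mesos API)."""

    def __init__(self, volume_id: str, role: str, size_mb: float):
        self.volume_id = volume_id
        self.role = role            # offered back to frameworks in this role
        self.size_mb = size_mb      # carved out of reserved disk
        self.used_by = None         # at most one task at a time
        self.destroyed = False

    def attach(self, task_id: str) -> None:
        if self.destroyed:
            raise RuntimeError(f"volume {self.volume_id} was destroyed")
        if self.used_by is not None:
            raise RuntimeError(f"volume {self.volume_id} is in use by {self.used_by}")
        self.used_by = task_id

    def release(self) -> None:
        # The task finished: data stays on the agent and the volume is offered again.
        self.used_by = None

    def destroy(self) -> None:
        if self.used_by is not None:
            raise RuntimeError("cannot destroy a volume that is in use")
        self.destroyed = True


def can_unreserve(volumes_on_reserved_disk: list) -> bool:
    """The reserved disk can only be unreserved once every volume on it is destroyed."""
    return all(v.destroyed for v in volumes_on_reserved_disk)


data = PersistentVolume("data", "pro", 1024.0)
data.attach("db-1")
data.release()       # db-1 finished; the data survives
data.attach("db-2")  # a new task picks up the same volume
```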

Types of Disk Resources

ROOT

Maps to the main operating system storage drive. It’s the default option.

MOUNT

Auxiliary disks provided by operators, each mapped to a mount point in the host OS. When reserved, the whole disk is reserved (no matter the requested size).

PATH

Auxiliary disk resource created by the operator, which maps a directory in the host OS to a disk resource. Usually used to carve up a mounted disk into smaller chunks.

Container?

● Task isolation

● Contain task resources

● Control task resource usage

● Run in different environments

Docker Containerizer

● Works with Docker images (task/executor).

● Uses docker-engine (docker run…).

● Needs Docker installed on each agent (external dependency…).

The Mesos roadmap is to unify the containerizers and drop support for this one.

Mesos Containerizer

● Runs commands of the host OS.

● Runs Docker/AppC images (Universal Containerizer).

● Uses Linux cgroups and namespaces directly (not LXC).

● Based on pluggable isolators, which are used for isolating resources from other containers.

● Examples: cgroups/cpu, cgroups/mem, docker/volume, disk/du, docker/runtime, network/cni, etc.

Tip: enter the namespaces of a running container with

sudo nsenter --mount --uts --ipc --net --pid --target <PID_CONTAINER>

Docker on Mesos Containerizer

● A Docker image represents a filesystem.

● Mesos pulls the image and extracts the filesystem.

● Using pivot_root, the container is launched on top of that filesystem.

● Isolation is done by the Mesos containerizer (no docker-engine dependency).

http://events.linuxfoundation.org/sites/events/files/slides/Mesos%20and%20Containers.pdf

Docker on Mesos Containerizer

BE CAREFUL WITH PERMISSIONS

The user namespace matches the agent’s (the only way to use a user created in the Dockerfile is to have a user on the agent with the same name, uid and gid).

BRIDGE NETWORK IS NOT SUPPORTED

When you bind to a port, by default you do it on the agent host’s network stack (unless you use another isolator, like network/cni, for virtual networks and an IP per container).

10. More Aspects

External volumes, oversubscription, checkpointing

External Volumes

● Uses dvdcli and a Docker volume plugin, for instance REX-Ray or GlusterFS (dependency).

● Mounts an external volume from a storage provider (Cinder, Amazon EBS, etc.) into the task container.

● Instead of binding a task’s data to an agent (persistent volumes), it manages storage outside the agents.

Oversubscription

● Frameworks can use resources that are allocated but temporarily unused.

● These resources can be revoked by Mesos at any moment.

● A QoS module ensures that the framework these resources are allocated to doesn’t suffer a performance impact.

Checkpointing

For agent recovery, a framework can enable checkpointing so that its state is regularly written to disk.

If the Mesos agent is stopped (due to a failure or an upgrade), tasks of checkpointing frameworks keep running (otherwise, all running tasks are killed).
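Checkpointing is requested per framework through the `checkpoint` field of `FrameworkInfo` when the scheduler registers. Assuming the v1 HTTP scheduler API, where registration is a JSON `SUBSCRIBE` call POSTed to `/api/v1/scheduler`, a minimal body could be built like this. `subscribe_call` and the framework name are hypothetical, and the single `role` field used here was later superseded by `roles` for multi-role frameworks.

```python
import json
from typing import Optional


def subscribe_call(name: str, user: str, role: str = "*",
                   checkpoint: bool = True,
                   failover_timeout_s: Optional[float] = None) -> str:
    """JSON body for a SUBSCRIBE call to the v1 scheduler API (/api/v1/scheduler)."""
    framework_info = {"user": user, "name": name, "role": role, "checkpoint": checkpoint}
    if failover_timeout_s is not None:
        # Seconds the master waits for a disconnected scheduler before tearing the framework down.
        framework_info["failover_timeout"] = failover_timeout_s
    return json.dumps({"type": "SUBSCRIBE", "subscribe": {"framework_info": framework_info}})


print(subscribe_call("example-framework", "root"))
```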


Hands On: Let’s make a framework

https://github.com/roberveral/mesos-gocd