From pets to cattle - powered by CoreOS, docker, Mesos & nginx

MUC_DATE_THEME_INITIALS

Thomas Schneider

Agenda

Ι Intro
Ι Pets vs. cattle
Ι Where we started ...
Ι Learnings & dead-ends
Ι ... where we ended up
Ι Technology stack
Ι Showtime, baby!

Intro

Ι Background
  § ~10 years experience in IT
  § Mainly software development
  § Some system engineering, automation & operations
  § @zooplus:
    § Lead engineer, build & runtime platform
    § Trying to encourage DevOps culture
Ι Get in touch
  § thomas.schneider@zooplus.com
  § github.com/schneidexe

Pets vs. cattle

Where we started …

Build

Ι Monolithic code-base
Ι CI partly automated
Ι Manually configured TeamCity
Ι 10+ all (= one) purpose agents
Ι Up to 10+ builds in parallel
Ι Maven binary repository (Nexus)
Ι Source repository (Subversion)

Deploy

Ι Done via shell
Ι .sh scripts, scp, rsync
Ι Mainly .jar/.war files
Ι 2 out of 3 apps deployed differently
Ι Locally != dev != prod
Ι Devs create & test, but have no idea how it runs
Ι Ops deploy & run, but have no idea what

Run

Ι 300+ apps/jobs
Ι Lots of bare metal
Ι Static environment
Ι Servers used by multiple apps, upgrades blocked
Ι Outdated machines
Ι Monitoring: ELK, collectd (hard to get all)
Ι High error noise level
Ι Ops == Firefighters
Ι Devs blocked by Infra
Ι Biz scaled, errors $$$

Application stack is limited and tailored to the environment. Build and release process is slow, error-prone and inflexible. Infrastructure is opaque; extending it is slow & painful.

Where we started …

Learnings & dead-ends

Ι Handle diversity
  § Horizontal scaling is not our primary issue
  § Handling tons of different apps is!
Ι Immutability & automation
  § Everything is (made from) code, build immutable artifacts
  § Automation is not only for speed but for knowledge
  § Re-creation over re-configuration
Ι Bring the pain forward
  § Do not do it all by yourself, get infra and devs (and biz) on board
  § Think in reverse: from prod to dev
Ι Keep it simple and fast
  § Get buy-in from devs, they have to understand it
  § Teams had/have to adapt several times (after all those years of stable infra!)
  § Simple is fun, speed is fun. If it's fun, people will use it

Learnings & dead-ends

Ι Docker helps, but does not solve all of your problems
  § It's not just about 'docker build' & 'docker run'
  § Think about scaling, monitoring and management of your apps on prod
Ι Stay focused
  § What do YOU need? (not Google, Netflix & co)
  § There are new products around every week: do 2-3 POCs, then stick with your decision
  § Keep it modular, have a plan in mind for how to migrate/replace parts
  § Don't be scared of throwing away a few things
Ι Persistence
  § Try not to mix stateful and stateless things, externalize data
  § A database might be more of a pet than cattle

Learnings & dead-ends

Ι Puppet
  § Not the best tool for deploying your app
  § Re-configuration can be tricky (systems become almost identical)
  § Not immutable (unless you really nail every dependency)
Ι Fleet (0.9.x)
  § Nice features like side-kicks, low overhead
  § No resource management, too low level
  § Stability issues
Ι Mixing frameworks on Mesos agent nodes
  § Isolation can get tough if patterns are too different, e.g. jobs and services
  § Not enough resources for big jobs on service nodes
  § Utilization spikes from batch jobs or builds can impact overall host performance
Ι Graphite and containers
  § Cannot handle highly dynamic metrics (30k different containers in 1 week)

... where we ended up

Build

Ι Distributed code-base
Ι Full CI/CD life-cycle can be automated
Ι Pre-configured Jenkins master
Ι Disposable, customized Jenkins slaves
Ι Scalable builds
Ι Multi-format binary repository (Artifactory)
Ι Source repository (git)

Deploy

Ι GUIs and REST APIs
Ι Deploy
  § Services
  § Jobs
  § Containers
  § Machines
Ι Unified deployment & management
Ι Cloud-agnostic
Ι You build it, you run it!*

Run

Ι Flexible resource management
Ι Health-checks & self-healing
Ι Environment config
Ι Service discovery & routing
Ι Horizontal scaling
Ι House-keeping
Ι Out-of-the-box monitoring (metrics, logs)

Build, deploy and run any application with high flexibility & low effort (Jenkinsfile, Dockerfile, Deployment .json). Same release process for all applications. High transparency on infrastructure*.

... where we ended up

Technology stack: Service Discovery & Routing, PaaS, CaaS, IaaS

Ι SD/SR: Nixy, Mesos-DNS
Ι PaaS: Marathon, Chronos, Jenkins
        Mesos, Zookeeper
Ι CaaS
Ι IaaS: deploy-API + cloud-init

Technology stack: Monitoring

[Diagram: the SD/SR, PaaS, CaaS and IaaS layers annotated with their monitoring sources: journal + filebeat, dockerbeat, hostbeat, vCenter]

Technology stack: Deploy API

Ι Fast cloud-like provisioning of VMs (resources: cpu, mem, disk, net)
Ι Lightweight bootstrapping with cloud-init
Ι Focus on cattle machines
Ι Re-create over re-configure

$ curl -X POST \
    --form "image=coreos-1081.3.0" \
    --form "application=docker" \
    --form "env=dev112" \
    --form "cpu=8" \
    --form "mem=16" \
    --form "disk=100" \
    --form "cloudinit_file=@docker.yml" \
    "http://deploy.zooplus.de/api/v1/machines"

coreos:
  units:
    - name: docker.service
      drop-ins:
        - name: docker-opts.conf
          content: |
            [Service]
            Environment='DOCKER_OPTS=--host=tcp://0.0.0.0:2375'
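The curl call above can also be assembled programmatically, which helps when machines are re-created from scripts rather than by hand. The following is a minimal sketch, not the deploy API's own client: the class and function names are hypothetical, the form field names are copied from the slide, and nothing is sent over the network.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MachineRequest:
    """Form fields taken from the slide's curl example (names as shown there)."""
    image: str
    application: str
    env: str
    cpu: int
    mem: int
    disk: int

    def form_fields(self) -> dict:
        # Every value is sent as a plain string form field.
        return {key: str(value) for key, value in asdict(self).items()}


def curl_command(request: MachineRequest, cloudinit_path: str, url: str) -> list:
    """Build the argv list for the curl call; hypothetical helper, sends nothing."""
    args = ["curl", "-X", "POST"]
    for key, value in request.form_fields().items():
        args += ["--form", f"{key}={value}"]
    # The '@' prefix tells curl to upload the file's content.
    args += ["--form", f"cloudinit_file=@{cloudinit_path}", url]
    return args


if __name__ == "__main__":
    req = MachineRequest("coreos-1081.3.0", "docker", "dev112", 8, 16, 100)
    print(" ".join(curl_command(req, "docker.yml",
                                "http://deploy.zooplus.de/api/v1/machines")))
```

Keeping the request as data makes it easy to stamp out many identical cattle machines by varying only `env` or the cloud-init file.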

Technology stack: Docker

Ι Automated reproducible builds with Dockerfile
Ι Immutable images
Ι Bundles app and dependencies
Ι Common artifact format
Ι Standardized way of deployment, monitoring, etc.
Ι Isolation of applications
Ι Resource allocation

$ cat Dockerfile
FROM repo.zooplus.de/centos:7
RUN yum install -y java-1.8.0_47 && \
    yum clean all
ADD shop.jar /shop.jar
CMD java -jar shop.jar

$ docker build -t shop .
Building image shop
Step 1 : FROM repo.zooplus.de/centos:7
 ---> 9b92a6d1f7de
...

$ docker run shop
Starting shop...

Technology stack: Mesos

Ι Resource manager
Ι Task distribution
Ι “Whole DC as single machine”
Ι All tasks run in docker containers
Ι Web UI for status/utilization and debugging (logs, task state)
Ι Usually no direct interaction

Technology stack: Jenkins

Ι Automation engine
Ι Jenkins 2.x
Ι Jenkinsfile and multi-branch support
Ι Post-commit hooks
Ι Immutable slaves
  § Running on mesos/docker
  § Customized
  § Highly scalable
Ι Jenkins master docker image
  § Spawn test instance in <1min (builds should run on prod)
  § Bootstrap with DSL

folder('catalog') { }
multibranchWorkflowJob('catalog/app') {
    branchSources {
        git {
            remote('ssh://git@stash.zooplus.de:22/cat/app.git')
            credentialsId('ef406810-be3c-4f2a-ad65-6239706d1766')
        }
    }
}

Technology stack: Marathon

Ι Task scheduler for mesos
Ι Distributed init system
Ι Long-running apps/services
Ι REST API: submit apps via .json
Ι GUI: manage apps & manual config
Ι Health checks & self-healing
Ι Multi-app deployments
Ι Rolling updates
Ι Horizontal scaling
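To make "submit apps via .json" concrete, here is a minimal sketch of a Marathon app definition built in Python. The field names follow the Marathon 1.x app format as far as I know it (newer versions moved the port mappings), so check them against the version you run. The app id, image tag, container port and health path are hypothetical values, not taken from the talk.

```python
import json


def marathon_app(app_id, image, cpus, mem_mb, instances, health_path="/health"):
    """Return an app definition dict; all defaults here are illustrative assumptions."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,          # horizontal scaling: raise this number
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the agent
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
        },
        "healthChecks": [{               # failing tasks get killed and replaced
            "protocol": "HTTP",
            "path": health_path,
            "portIndex": 0,
            "gracePeriodSeconds": 60,
            "intervalSeconds": 10,
            "maxConsecutiveFailures": 3,
        }],
        "upgradeStrategy": {             # controls how rolling updates proceed
            "minimumHealthCapacity": 0.5,
            "maximumOverCapacity": 0.2,
        },
    }


if __name__ == "__main__":
    app = marathon_app("/shop/web", "repo.zooplus.de/shop:1", 0.5, 512, 2)
    print(json.dumps(app, indent=2))
```

The printed JSON is what would be POSTed to Marathon's `/v2/apps` endpoint. The same file can live next to the Dockerfile and Jenkinsfile in the app's repository.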

Technology stack: Chronos

Ι Task scheduler for mesos
Ι Distributed cron system
Ι Batch jobs
Ι REST API: submit jobs via .json
Ι GUI: job details & status, manual execution
Ι Scheduling
  § Time-based
  § Dependency-based
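The two scheduling modes map to two kinds of job definitions. The sketch below assumes the Chronos job format as I understand it: a time-based job carries an ISO 8601 repeating interval in `schedule`, a dependency-based job lists its `parents` instead. Job names, commands and resource values are hypothetical.

```python
from datetime import datetime, timezone


def iso8601_schedule(start, period, repetitions=None):
    """Build 'R[n]/<start>/<period>'; an empty n means repeat forever."""
    reps = "" if repetitions is None else str(repetitions)
    start_utc = start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"R{reps}/{start_utc}/{period}"


def scheduled_job(name, command, schedule):
    """Time-based job: runs according to the repeating interval."""
    return {"name": name, "command": command, "schedule": schedule,
            "cpus": 0.1, "mem": 128}


def dependent_job(name, command, parents):
    """Dependency-based job: runs after all parent jobs have succeeded."""
    return {"name": name, "command": command, "parents": list(parents),
            "cpus": 0.1, "mem": 128}


if __name__ == "__main__":
    nightly = iso8601_schedule(
        datetime(2016, 1, 1, 2, 0, tzinfo=timezone.utc), "PT24H")
    export = scheduled_job("export-orders", "run-export.sh", nightly)
    report = dependent_job("build-report", "run-report.sh", [export["name"]])
    print(export)
    print(report)
```

A job has either a `schedule` or `parents`, never both, which is why the sketch uses two separate builders.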

Technology stack: Nixy/Nginx/Mesos-DNS

Ι Nixy
  § Service catalog from marathon
  § REST-like API
  § Event-based
  § Configures nginx based on templates
Ι Nginx
  § State-of-the-art web server
  § Used as service router
  § SSL termination
  § Proxy for HTTP, TCP and UDP
  § Access control & public exposure
Ι Mesos-DNS
  § Service catalog from mesos
  § Convention-over-configuration naming pattern
  § Used for “internal” services

"Apps": {
  "/finance/jenkins": {
    "Tasks": [
      ["ops85-150.web.zooplus.de:20357"],
      ["ops85-150.web.zooplus.de:20358"]
    ],
    "Frontends": [
      {
        "Type": "http",
        "Data": ["finance-jenkins"]
      }
    ]
  }
}

$ host jenkins-finance.marathon.prod.zooplus.net
jenkins-finance.marathon.prod.zooplus.net has address 192.168.85.150
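The two lookups above can be sketched in a few lines. This is an illustration only: Nixy itself renders nginx config from its own templates, so `render_upstreams` is a hypothetical stand-in working on the catalog structure shown on the slide. The naming rule in `mesos_dns_name` (reverse the app path, join with hyphens) is inferred from the slide's example, with the framework and domain defaults taken from it.

```python
def mesos_dns_name(app_id, framework="marathon", domain="prod.zooplus.net"):
    """Derive the DNS name from a Marathon app id such as '/finance/jenkins'."""
    parts = [part for part in app_id.split("/") if part]
    return f"{'-'.join(reversed(parts))}.{framework}.{domain}"


def nginx_upstream(name, tasks):
    """Render one nginx upstream block listing every running task."""
    servers = "\n".join(f"    server {task};" for task in tasks)
    return f"upstream {name} {{\n{servers}\n}}"


def render_upstreams(apps):
    """Turn the service catalog into upstream blocks, one per frontend name."""
    blocks = []
    for app in apps.values():
        # Tasks arrive as a list of single-element lists on the slide; flatten them.
        tasks = [task for group in app["Tasks"] for task in group]
        for frontend in app["Frontends"]:
            for name in frontend["Data"]:
                blocks.append(nginx_upstream(name, tasks))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    catalog = {
        "/finance/jenkins": {
            "Tasks": [["ops85-150.web.zooplus.de:20357"],
                      ["ops85-150.web.zooplus.de:20358"]],
            "Frontends": [{"Type": "http", "Data": ["finance-jenkins"]}],
        }
    }
    print(render_upstreams(catalog))
    print(mesos_dns_name("/finance/jenkins"))
```

Because the catalog is event-based, a re-render like this would happen whenever Marathon reports that tasks started, stopped or moved.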

Technology stack: journal & beats

Ι Hostbeat
  § Ships host metrics in beats format
  § Like collectd
Ι Dockerbeat
  § Ships container metrics in beats format
  § Metadata: env, labels
Ι Journal/Filebeat
  § Ships every single log line from journald to ELK
  § Docker uses journal log-driver to ship stdout/stderr
  § Apps should log in JSON-lines
Ι ELK/Graphite
  § Elasticsearch: event-data
  § Graphite: TSD/metrics
Ι Nagios
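"Apps should log in JSON-lines" means one JSON object per line on stdout, so the journal log-driver and filebeat can ship each line as a structured event. Below is a minimal sketch using only Python's standard logging module. The field names (`ts`, `level`, `logger`, `message`) are an assumption for illustration, not a schema from the talk.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # json.dumps escapes the newlines of the traceback, keeping one line
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level=logging.INFO):
    """Send all logs to stdout as JSON lines, replacing existing handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("shop").info("order %s placed", 42)
```

Logging to stdout rather than to files keeps the container stateless, and multi-line stack traces stay attached to their event instead of being split into separate log lines.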

Technology stack: Cluster structure

Technology stack: Questions?

Showtime, baby!

Thank You!

… and yes, we’re hiring! ;)
