Cassandra and Docker 2 years in production instaclustr.com @Instaclustr



Who am I and what do I do?

• Ben Bromhead

• Co-founder and CTO of Instaclustr -> www.instaclustr.com

• Instaclustr provides Cassandra-as-a-Service in the cloud.

• Currently support AWS, Azure, Heroku and Softlayer with more to come.

• 700+ nodes

Objectives

• A quick intro on docker (for the Cassandra folk).

• Our docker story

• Working with Cassandra and docker.

• Running C* in a constrained env w/ docker

• Listen to my astonishment at all the progress docker has made since I last gave this talk

Why docker matters

• Finally Developers have a solution to build once and deploy anywhere

• Finally Ops/Admin has a solution to configure anywhere

• Finally DevOps is easy

• Dev == Test == Staging == Production

• Move with speed

Docker, how it works.

• Runs anywhere (Linux kernel 2.6.32+)

• Uses lightweight, VM-like containers:

• Own process space (namespace)

• Process isolation and resource control (cgroups)

• Own network adapter

• Own filesystem (chroot)

• Linux Analog to Solaris Zones, *BSD jails
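To poke at the namespace idea yourself, here is a minimal Python sketch (not from the talk) that lists the namespaces the current process lives in. It assumes a Linux host where /proc/self/ns is readable, and the function name current_namespaces is made up for illustration. On anything else it simply returns an empty dict.

```python
import os


def current_namespaces(ns_dir="/proc/self/ns"):
    """Map namespace name (pid, net, mnt, ...) to its kernel identifier.

    Assumes Linux exposes namespaces as symlinks under ns_dir; returns an
    empty dict when the directory is missing or unreadable.
    """
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace instance.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(name, ident)
```

Run it on the host and inside a container and compare: two processes that report the same identifier for an entry share that namespace.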

Docker, how it works.

• Difference between a container and a VM

[Figure: Virtual Machine vs. Container]

Docker, how it works.

• What about the packaging component?

• Uses a union filesystem to create a git-like workflow around your deployed code:

[Figure: a Docker Engine host pushes a container image (Bins/Libs, App A, App Δ) to the Docker container image registry. A second Docker Engine host running A wants to upgrade to A''. It requests the update and gets only the diffs. The host is now running A''.]
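The "gets only diffs" idea can be modelled in a few lines. This is a toy sketch, not how Docker stores layers: it treats each layer as a dict of path to content and uses Python's collections.ChainMap so that upper layers shadow lower ones. The file paths, layer contents and the stack_layers helper are all invented for illustration.

```python
from collections import ChainMap

# Hypothetical layers: a shared base plus the original app layer.
base_layer = {"/bin/sh": "shell", "/lib/libc.so": "libc"}
app_a_layer = {"/app/app.jar": "App A"}

# The upgrade ships only what changed between A and A''.
app_delta_layer = {"/app/app.jar": "App A''"}


def stack_layers(*bottom_to_top):
    """Build a read-only view where later (upper) layers win on lookup."""
    # ChainMap searches its first mapping first, so put the top layer first.
    return ChainMap(*reversed(bottom_to_top))


before = stack_layers(base_layer, app_a_layer)
after = stack_layers(base_layer, app_a_layer, app_delta_layer)

if __name__ == "__main__":
    print(before["/app/app.jar"], "->", after["/app/app.jar"])
    print("paths transferred for the upgrade:", sorted(app_delta_layer))
```

The host already holds the base and App A layers, so in this model the upgrade only needs the single entry in app_delta_layer.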

Why we started using Docker

• We are super duper big fans of the “Immutable server” concept

• Once it’s deployed you don’t touch it

• No config management, no chef, no puppet etc

• Seed at boot and be done with it

Why we started using Docker

• Before Docker, we built AMIs in Amazon

• A new AMI for every deploy, version etc

• This meant we cycled our entire fleet of instances constantly

• Which is fine for some, but we work with persistent data

• Sooo much time streaming from replicas/copying backups from S3

Why we started using Docker

• Docker images solved this for us

• Treat the host as a sterile environment

• Everything in a few docker containers which we can simply update

• Cycle the docker container instead of the AMI

• Yes… docker was primarily a package management tool for us

Docker at Instaclustr

• So how do we get on board the hype train (or rather, an established devops practice) without killing performance or stability?

• Ran in dev to get comfortable with it, then non-critical systems.

• Talked to others who use it in production

• You will spend a lot of time at https://github.com/docker/docker/issues and https://docs.docker.com/

Docker: is it production ready?


Yes

Docker & Cassandra - Networking

• 1st trial, throughput dropped in half!

• Writes sucked, streaming sucked, what was going on?

• Quick check with iperf showed a 50% hit in throughput

Docker & Cassandra - Networking

• Docker uses Linux Ethernet Bridges for basic software defined routing. This will hose your network throughput (2014).

• Use the host network stack instead (--net=host): 0% impact on Cassandra throughput (iperf still showed minor overhead)

• Also solves NAT issues in an AWS-like networking environment.
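As a rough sketch of what switching to the host network stack looks like on the command line, the Python helper below assembles a docker run invocation with the --net=host flag from the slide. The image name example/cassandra:dev and the helper docker_run_argv are placeholders, not anything Instaclustr ships. The helper only builds the argument list and does not start a container.

```python
import shlex


def docker_run_argv(image, host_network=True):
    """Build (but do not execute) a detached `docker run` command line."""
    argv = ["docker", "run", "-d"]
    if host_network:
        # Share the host's network stack instead of the default bridge.
        argv.append("--net=host")
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    print(shlex.join(docker_run_argv("example/cassandra:dev")))
```

Building the list first and printing it makes it easy to eyeball the command before handing it to subprocess.run or pasting it into a unit file.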

Docker & Cassandra + Filesystem

• The filesystems (AUFS, BTRFS etc.) that bring great benefits to Docker's workflow around building and snapshotting containers are not very good for databases.

• You also need to keep your C* data, commitlogs & caches in a Docker volume mount for persistence.

• UnionFS (AUFS) is terrible for writing lots of big files.

• BTRFS is a pain to use from an ops point of view. Terrible.

• Hooray, volume mounts use the underlying filesystem. Put the Cassandra data dir on a volume mount with a decent fs (e.g. xfs).
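A minimal sketch of the volume-mount advice, again in Python and again only building flags. It assumes the container keeps Cassandra's directories under /var/lib/cassandra (the usual packaged default, but check your image) and that the host has an xfs filesystem mounted somewhere like /mnt/xfs/cassandra. Both paths and the volume_flags helper are illustrative assumptions.

```python
import posixpath

# The three directories the slide says must live on a volume mount.
CASSANDRA_DIRS = ("data", "commitlog", "saved_caches")


def volume_flags(host_root, container_root="/var/lib/cassandra"):
    """Return `-v host:container` flag pairs for each Cassandra directory."""
    flags = []
    for name in CASSANDRA_DIRS:
        host_path = posixpath.join(host_root, name)
        container_path = posixpath.join(container_root, name)
        flags += ["-v", "{}:{}".format(host_path, container_path)]
    return flags


if __name__ == "__main__":
    # Hypothetical mount point of an xfs-formatted disk on the host.
    print(" ".join(volume_flags("/mnt/xfs/cassandra")))
```

These flags would sit between `docker run` and the image name, so writes to those directories bypass the union filesystem and land on the host's filesystem.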

Docker + Process Capabilities


• Docker by default drops all process capabilities except the minimum needed to start.

• https://github.com/docker/docker/blob/master/oci/defaults_linux.go#L64-L79

Docker + Process Capabilities

• Cassandra needs to pin files to memory using mlockall, otherwise things get sloooow.

• Calling mlockall requires a process capability.

• A process needs CAP_IPC_LOCK (or a high enough RLIMIT_MEMLOCK) in order to perform this operation. By default Docker doesn't assign this to a running container…

• Can use --privileged and be done with it. Kind of lazy though

• Use --cap-add instead
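Here is a hedged sketch of the --cap-add route. It builds the two docker run flags that line up with the requirements above: --cap-add=IPC_LOCK for the capability and --ulimit memlock for RLIMIT_MEMLOCK. Setting both, and using -1 for "unlimited", is this sketch's choice rather than something the slides spell out. The helper name mlockall_flags is made up.

```python
def mlockall_flags(memlock_bytes=-1):
    """Return docker run flags that let a process in the container mlockall.

    memlock_bytes is used for both the soft and hard RLIMIT_MEMLOCK;
    -1 asks Docker for an unlimited limit (an assumption of this sketch).
    """
    limit = "memlock={0}:{0}".format(memlock_bytes)
    return ["--cap-add=IPC_LOCK", "--ulimit", limit]


if __name__ == "__main__":
    print(" ".join(mlockall_flags()))
```

Compared with --privileged, this grants only the one capability Cassandra needs for memory locking.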

Docker + SIGTERM propagation

• When stopping the process, Docker will send a SIGTERM.

• Some interpreted languages treat PID 1 differently. E.g. Python/Bash do not have default signal handlers when running as PID 1.

• Bad if you use a bash script to launch Cassandra

• Java to the rescue!

• Make sure you run the cassandra bash script with -f (foreground)

• exec causes the JVM to replace the bash process… making the world a happier place
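Why exec helps: the exec family replaces the running program but keeps the same process ID, so whatever was PID 1 (the launcher) becomes the JVM and receives Docker's SIGTERM directly. The Python sketch below demonstrates that property on a POSIX system, using throwaway Python child processes rather than bash and a JVM. The function name pids_around_exec is invented for illustration.

```python
import subprocess
import sys


def pids_around_exec():
    """Return (pid before exec, pid after exec) for a throwaway child."""
    # Program that runs *after* the exec: just report its PID.
    after = "import os; print(os.getpid())"
    # Program that runs *before*: report its PID, then exec the one above.
    before = (
        "import os, sys; print(os.getpid(), flush=True); "
        "os.execv(sys.executable, [sys.executable, '-c', {!r}])".format(after)
    )
    out = subprocess.run(
        [sys.executable, "-c", before],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return int(out[0]), int(out[1])


if __name__ == "__main__":
    first, second = pids_around_exec()
    print("same process" if first == second else "different process")
```

A launcher written this way leaves no intermediate shell between Docker and Cassandra to swallow the signal.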

Docker + SIGTERM propagation

• Tools like OpsCenter Server will have trouble with this.

• Can be fixed using a wacky combination of trap and wait stanzas in your OpsCenter Server script (see http://veithen.github.io/2014/11/16/sigterm-propagation.html)

• But now you have a bash script that duplicates init/systemd/supervisord

• The debate rages on…
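For a feel of what the trap-and-wait approach amounts to, here is the same idea sketched in Python instead of bash: a tiny wrapper that starts a child, relays SIGTERM/SIGINT to it, and waits for it to exit. It assumes a POSIX system and must run in the main thread, since Python only allows signal handlers there. The function name run_forwarding_signals is hypothetical, and this is not the OpsCenter script from the linked post.

```python
import signal
import subprocess
import sys


def run_forwarding_signals(argv, signals=(signal.SIGTERM, signal.SIGINT)):
    """Run argv as a child, relay the given signals to it, return its status."""
    state = {"child": None, "pending": []}

    def relay(signum, _frame):
        child = state["child"]
        if child is None:
            # Signal arrived before the child was started; deliver it later.
            state["pending"].append(signum)
        else:
            child.send_signal(signum)

    # Install handlers first so nothing slips through while the child starts.
    previous = {sig: signal.signal(sig, relay) for sig in signals}
    try:
        state["child"] = subprocess.Popen(argv)
        for signum in state["pending"]:
            state["child"].send_signal(signum)
        # wait() resumes after a handler runs, so we block until the child exits.
        return state["child"].wait()
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)


if __name__ == "__main__":
    sys.exit(run_forwarding_signals(sys.argv[1:]))
```

Which is exactly the point of the bullet above: a few dozen lines in and you are already reimplementing a small piece of init/systemd/supervisord.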

Docker + CoreOS

• Docker + fav OS + CM?, CoreOS + etcd, Swarm + Machine, Deis etc

• We chose CoreOS (Appeared to be sane, etcd is cool, systemd if you are into that kind of thing)

• Docker (the company) now does their own thing… did you know they now call Docker… Docker Engine… who’d have thunk.

Docker + CoreOS

• Disable automatic updates + restarts (seriously do this)

• Fix logging, otherwise you will log to 3 locations (/var/log/cassandra, journalctl and Docker's JSON-based log).

• JVM will exit with error 143 (128 + 15 for SIGTERM). Need to ignore that in your systemd service definition.
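The 143 is just arithmetic, and systemd has a directive for treating such a status as a clean stop. The sketch below computes the status and renders a [Service] fragment using SuccessExitStatus. Whether this is the exact setting used at Instaclustr is not stated in the slides, so treat the fragment as one plausible way to "ignore" the code. The helper names are made up.

```python
import signal


def exit_status_for_signal(signum):
    """Status reported for a process that exits because of signum (128 + n)."""
    return 128 + int(signum)


def service_fragment(signum=signal.SIGTERM):
    """Render a systemd [Service] fragment that accepts that status as success."""
    return "[Service]\nSuccessExitStatus={}\n".format(exit_status_for_signal(signum))


if __name__ == "__main__":
    print(service_fragment(), end="")
```

Without something like this, systemd marks the unit as failed every time the container is stopped normally.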

Docker + Dev Env

• Docker relies on Linux kernel capabilities… so no native Docker in OS X

• We use OSX for dev, so we run vagrant and the CoreOS vagrant file

• Install Docker userland tools in OS X and forward ports to the vagrant box running CoreOS

• Our env is a little strange: we run a single Cassandra instance on a single CoreOS VM.

• Docker for Mac now uses a lighter-weight virtualisation layer native to OS X.

• Look at https://github.com/tobert/cassandra-docker for full dockerisation!

Docker + C* + Dev Env

• How do I run lots of C* instances on a VM or my dev laptop without it falling over?

• Backwards performance tuning!

• Make it run as slowly, but as stable as possible!

Docker + C* + Dev Env

• Set memory to be super low (you can go higher than this). Edit your cassandra-env.sh:

MAX_HEAP_SIZE="128M"
HEAP_NEWSIZE="24M"

Docker + C* + Dev Env

• Tune compaction to have free rein and to smash the disk

concurrent_compactors: 1
in_memory_compaction_limit_in_mb: 2
compaction_throughput_mb_per_sec: 0

Docker + C* + Dev Env

• Let's use the HSHA thrift server as it reduces the memory used per thread.

rpc_server_type: hsha

Docker + C* + Dev Env

• The HSHA server also lets us limit the number of threads serving in-flight requests, but still have a large number of clients connected.

concurrent_reads: 4
concurrent_writes: 4
rpc_min_threads: 2
rpc_max_threads: 2

• You can play with these to get the right numbers based on how your clients connect, but keep them low.

Docker + C* + Dev Env

• This is Dev! Caches have no power here!

key_cache_size_in_mb: 0
reduce_cache_sizes_at: 0
reduce_cache_capacity_to: 0
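To keep the dev-only overrides from the last few slides in one place, a small Python sketch like the following could render them as cassandra.yaml-style lines. The setting names and values are copied from the slides and date from the Cassandra version used in the talk, so check them against your own version before relying on them. DEV_OVERRIDES and render_overrides are names invented here.

```python
# Low-footprint dev settings, exactly as listed on the slides.
DEV_OVERRIDES = {
    "concurrent_compactors": 1,
    "in_memory_compaction_limit_in_mb": 2,
    "compaction_throughput_mb_per_sec": 0,
    "rpc_server_type": "hsha",
    "concurrent_reads": 4,
    "concurrent_writes": 4,
    "rpc_min_threads": 2,
    "rpc_max_threads": 2,
    "key_cache_size_in_mb": 0,
    "reduce_cache_sizes_at": 0,
    "reduce_cache_capacity_to": 0,
}


def render_overrides(overrides):
    """Render a flat mapping as `key: value` lines, one setting per line."""
    return "".join("{}: {}\n".format(key, value) for key, value in overrides.items())


if __name__ == "__main__":
    print(render_overrides(DEV_OVERRIDES), end="")
```

The output is meant to be reviewed and merged into cassandra.yaml by hand or by your image build, not appended blindly, since duplicate keys in the file would be ambiguous.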

Docker + C* + Dev Env

• How well does this work?!?!

• It will survive running the insane workload in the new C* 2.1 stress tool.

• We run this on AWS t2.small instances

• Sign up at https://www.instaclustr.com and give our new Developer nodes a spin!

Go forth and conquer!

Questions?