Rancher + Kubernetes: Stories from the trenches

Stories from the trenches

Who am I?

Chief Cloud Officer - Bulletproof

@gergnz

Who else is to blame?

• Qamal Kosim-Satyaputra, BCG Digital Ventures

• Stuart Grice, Grice Barrett Consulting

• Maciej Drożdżowski, A Grumpy Polish Bloke

What did we build?

• 2 Rancher Clusters
  • Managing 6 Kubernetes Clusters

• Deployment Tooling
  • kube-services

• Cluster Management Tooling
  • combat-wombat, monitoring-meerkats, rolling-raccoon

• Operations Tooling

• PCI-DSS, ISO27001

How did we do it?

• Terraform
• Ansible
• AWS
• Trend Micro Deep Security
• Splunk
• Vault
• Packer
• Skeddly
• Amazon Linux
• GitHub
• GitLab
• Travis CI
• Bintray
• Jenkins
• Kafka
• Ruby, Python
• Docker
• Java

Let’s Dig In

AWS Account Separation

Rancher + Kubernetes

Rancher

Demo

What could possibly go wrong?

Amazon Linux/Docker

• Docker version change after upgrade

• Slow startup time

• cgroup location

• /var/run/docker.sock becomes a dir

- name: install docker

yum: name=docker-1.12.6

- --max-concurrent-downloads 128

docker pull <allthethings> (in packer)

- sudo mount -t tmpfs tmpfs /sys/fs/cgroup

sudo sed -i 's|cgroup|sys/fs/cgroup|' /etc/cgconfig.conf

- ¯\_(ツ)_/¯ - never worked this one out
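The root cause of the last symptom was never found, but it can at least be detected before anything tries to talk to the daemon. A minimal sketch, assuming a boot-time or health-check script is allowed to inspect the socket path; `classify_path` is a hypothetical helper, not part of Docker, Rancher, or the tooling from this talk:

```python
import os
import stat


def classify_path(path):
    """Return 'socket', 'directory', 'missing' or 'other' for the given path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"      # what a running Docker daemon should expose
    if stat.S_ISDIR(mode):
        return "directory"   # the failure mode described on the slide
    return "other"
```

Calling `classify_path` on the Docker socket path and alerting on anything other than `"socket"` would flag an affected host early; it does not explain or fix the underlying problem.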

Docker Images

Java Musl-libc DNS

JUST DON’T!!!!!!!!!!!!!!!!

Java Musl-libc DNS Cont.

JAVA_OPTS="-Dsun.net.spi.nameservice.provider.1=default -Dsun.net.spi.nameservice.provider.2=dns,sun -Dsun.net.spi.nameservice.nameservers=<vpc endpoint>"

Magical Rancher CNI and KubeDNS goes here

Rancher

• RDS resources exhausted

• Host clean up

- Make it bigger, much bigger

- Build combat-wombat

combat-wombat

Kubernetes

• T2 instance CPU credit exhaustion

• etcd split-brain

• anti-affinity

• Launch configuration UpdatePolicy

- etcd gets really busy

don’t use T2s, use C-family instances

- etcdctl cluster-health

disaster

- JSON embedded inside YAML for beta/alpha features

sleepy-sloth

- rolling-raccoon
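For the etcd split-brain case, the `etcdctl cluster-health` check can be scripted so that monitoring notices before it becomes a disaster. A rough sketch, assuming the v2-style text output in which each member is reported on a line starting with `member <id> is <state>` and a fully healthy cluster ends with the line `cluster is healthy`; the function names are hypothetical and this is not the tooling built for this project:

```python
import re
import subprocess

# Assumed per-member line format: "member <id> is <state>: <detail>"
MEMBER_LINE = re.compile(r"^member (\S+) is (\w+)")


def parse_cluster_health(output):
    """Return (cluster_ok, {member_id: state}) from cluster-health text."""
    members = {}
    cluster_ok = False
    for raw in output.splitlines():
        line = raw.strip()
        match = MEMBER_LINE.match(line)
        if match:
            members[match.group(1)] = match.group(2)
        elif line == "cluster is healthy":
            # Any other summary line is treated as not healthy.
            cluster_ok = True
    return cluster_ok, members


def check_local_cluster():
    """Run etcdctl on this host and parse whatever it prints to stdout."""
    result = subprocess.run(
        ["etcdctl", "cluster-health"],
        capture_output=True,
        text=True,
    )
    return parse_cluster_health(result.stdout)
```

Treating every summary other than `cluster is healthy` as a failure is deliberately conservative: an unexpected output format raises an alert instead of passing silently.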

sleepy-sloth

rolling-raccoon

What did we learn?

SoDD: Stack Overflow Driven Development

If at first you don’t succeed: Double Tap

Requirements/Specifications change: use code to build generalised building blocks

The upgrade loop never stops: Fix bug, incur tech debt, raise issue/PR, wait for next release, upgrade, rinse, repeat.
