Custom High Availability of Kubernetes

Mike Splain | @mikesplain

Kubernetes Boston — Custom High Availability of Kubernetes



Our requirements for running K8s

• Fast recovery without human intervention
• Nodes are ephemeral
• Autoscaling
• Testable on developer machines
• K8s as an artifact


Technology choices for running K8s


Hasn’t someone else already built this?


K8s official scripts

• Simple bash scripts
• “Just Works!”
• Sets up Autoscaling group for minions
• Uses Salt

Problems?

• No etcd HA
• No Master HA
• Salt Master is coupled with K8s Master
• Ubuntu / Fedora


CoreOS’s official scripts

• Cool Go app to start it!
• Or CloudFormation?
• But what now?

Problems?

• No etcd HA
• No Master HA
• Lots of magic


“It can’t be that hard right?”


etcd


etcd

• Easy.
• “Just works”

• Cluster discovery:
  • Discovery Service
  • DNS
  • ?

#cloud-config

coreos:
  etcd2:
    advertise-client-urls: "http://$public_ipv4:2379"
    initial-advertise-peer-urls: "http://$private_ipv4:2380"
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
    listen-peer-urls: "http://$private_ipv4:2380,http://$private_ipv4:7001"
    discovery: "https://discovery.etcd.io/<token here>"
  units:
    - name: etcd2.service
      command: start
  update:
    reboot-strategy: none


etcd

• etcd-aws-cluster: https://github.com/MonsantoCo/etcd-aws-cluster

• Uses Autoscaling groups for discovery
• Requires IAM Instance Roles

#cloud-config

coreos:
  etcd2:
    advertise-client-urls: "http://$public_ipv4:2379"
    initial-advertise-peer-urls: "http://$private_ipv4:2380"
    listen-client-urls: "http://0.0.0.0:2379,http://0.0.0.0:4001"
    listen-peer-urls: "http://$private_ipv4:2380,http://$private_ipv4:7001"
  units:
    - name: etcd2.service
      command: stop
    - name: etcd-peers.service
      command: start
      content: |
        [Unit]
        Description=Write a file with the etcd peers that we should bootstrap to
        Requires=docker.service
        After=docker.service

        [Service]
        Restart=on-failure
        RestartSec=10
        TimeoutStartSec=300
        ExecStartPre=/usr/bin/docker pull registry.barklyprotects.com/kubernetes/etcd-aws-cluster:latest
        ExecStartPre=/usr/bin/docker run --rm=true -v /etc/sysconfig/:/etc/sysconfig/ registry.barklyprotects.com/kubernetes/etcd-aws-cluster:latest
        ExecStart=/usr/bin/systemctl start etcd2

write_files:
  - path: /etc/systemd/system/etcd2.service.d/30-etcd_peers.conf
    permissions: '0644'
    content: |
      [Service]
      # Load the other hosts in the etcd leader autoscaling group from file
      EnvironmentFile=/etc/sysconfig/etcd-peers
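The step etcd-aws-cluster performs before etcd2 starts, turning the autoscaling group's members into the `/etc/sysconfig/etcd-peers` environment file, can be sketched as a small shell function. This is not the tool's code: it assumes the instance IDs and private IPs have already been looked up from the autoscaling group, and `write_etcd_peers` is a hypothetical name. The `ETCD_*` variables are etcd's environment-variable forms of its `--name`, `--initial-cluster` and `--initial-cluster-state` flags, and the peer port 2380 matches `initial-advertise-peer-urls` above.

```shell
# Hypothetical sketch: build the etcd bootstrap environment file from the
# members of the etcd autoscaling group. Querying the ASG (AWS API) is
# assumed to have happened already; this only formats the result.
#
# Usage: write_etcd_peers MY_ID STATE ID=IP [ID=IP ...]
#   MY_ID  instance ID of this node, used as its etcd member name
#   STATE  "new" for a fresh cluster, "existing" when joining one
write_etcd_peers() {
  local my_id=$1 state=$2
  shift 2
  local cluster="" pair id ip
  for pair in "$@"; do
    id=${pair%%=*}   # text before the first '='
    ip=${pair#*=}    # text after the first '='
    # append "id=http://ip:2380", comma-separated after the first entry
    cluster="${cluster:+$cluster,}${id}=http://${ip}:2380"
  done
  printf 'ETCD_NAME=%s\n' "$my_id"
  printf 'ETCD_INITIAL_CLUSTER_STATE=%s\n' "$state"
  printf 'ETCD_INITIAL_CLUSTER=%s\n' "$cluster"
}
```

For example, `write_etcd_peers i-0a1 new i-0a1=10.0.1.5 i-0b2=10.0.1.6 > /etc/sysconfig/etcd-peers` (IDs and IPs made up). A node replacing a failed member would pass `existing` instead, and etcd also requires such a member to be registered with `etcdctl member add` before it starts.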


Terraform to launch etcd

• References static cloud-init files

resource "aws_launch_configuration" "terraform_etcd" {
  name_prefix          = "${var.environment}_etcd_conf-"
  image_id             = "${var.coreos_ami}"
  instance_type        = "t2.small"
  key_name             = "${var.key_name}"
  security_groups      = ["${aws_security_group.terraform_etcd2_sec_group.id}"]
  user_data            = "${file("../cloud-config/output/etcd.yml")}"
  enable_monitoring    = true
  ebs_optimized        = false
  iam_instance_profile = "${aws_iam_instance_profile.terraform_etcd_role_profile.id}"

  root_block_device {
    volume_size = 20
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "terraform_etcd" {
  name                      = "${var.environment}_etcd"
  launch_configuration      = "${aws_launch_configuration.terraform_etcd.name}"
  availability_zones        = ["us-east-1c"]
  max_size                  = "${var.capacities_etcd_max}"
  min_size                  = "${var.capacities_etcd_min}"
  health_check_grace_period = 300
  desired_capacity          = "${var.capacities_etcd_desired}"
  vpc_zone_identifier       = ["${aws_subnet.etcd.id}"]
  force_delete              = true

  tag {
    key                 = "Name"
    value               = "${var.environment}_etcd"
    propagate_at_launch = true
  }

  lifecycle {
    create_before_destroy = true
  }
}


Masters


Master Node

• Kubelet service
• API Server
• Replication Controller Manager
• Scheduler
• Podmaster
• Proxy

• Flannel


Treat the master pods as an additional artifact

• A Docker container that takes environment variables

• Outputs templated pods to disk for the kubelet to load

• We use j2cli to template these files with little overhead

apiVersion: v1
kind: Pod
metadata:
  name: kube-podmaster
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: scheduler-elector
    image: gcr.io/google_containers/podmaster:1.1
    imagePullPolicy: Always
    command:
    - /podmaster
    - --etcd-servers={{ ETCD_ENDPOINTS }}
    - --key=scheduler
    - --whoami={{ ADVERTISE_IP }}
    - --source-file=/src/manifests/kube-scheduler.yaml
    - --dest-file=/dst/manifests/kube-scheduler.yaml
    volumeMounts:
    - mountPath: /src/manifests
      name: manifest-src
      readOnly: true
    - mountPath: /dst/manifests
      name: manifest-dst
  # Assumed host paths: the directories kub_pods.service mounts as
  # /output_src and /output_dst
  volumes:
  - name: manifest-src
    hostPath:
      path: /srv/kubernetes/manifests
  - name: manifest-dst
    hostPath:
      path: /etc/kubernetes/manifests
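j2cli does the real templating here. To show what that step amounts to, here is a much smaller stand-in that only handles the plain `{{ NAME }}` placeholders used in this manifest, filling each one from the environment variable of the same name (the container takes its settings as env variables). It is a sketch, not the team's image: `render_template` is a hypothetical name, and it has none of Jinja2's loops, filters or conditionals.

```shell
# Hypothetical stand-in for the j2cli step: copy stdin to stdout,
# replacing each "{{ NAME }}" with the value of environment variable NAME
# (empty if unset). Substituted values are not re-scanned.
#
# Usage: render_template < template.yaml > manifest.yaml
render_template() {
  awk '{
    line = $0
    out = ""
    # consume the line left to right, one placeholder at a time
    while (match(line, /[{][{][ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*[}][}]/)) {
      name = substr(line, RSTART, RLENGTH)
      gsub(/[{} \t]/, "", name)   # strip braces and spaces, leaving NAME
      out = out substr(line, 1, RSTART - 1) ENVIRON[name]
      line = substr(line, RSTART + RLENGTH)
    }
    print out line
  }'
}
```

For example, with `ETCD_ENDPOINTS` and `ADVERTISE_IP` exported, `render_template < kube-podmaster.yaml` prints the manifest with both placeholders filled in. Reading from the environment keeps the container's interface the same as with j2cli: everything is passed as `-e` flags.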


Early Docker / Flannel

• Set up the services the box needs before the real Docker daemon can run (via early-docker)

• Get the etcd server IPs and write them to a file
• Start flannel with those IPs

#cloud-config

coreos:
  units:
    - name: etcd.service
      command: stop
    - name: etcd2.service
      command: stop
    - name: early-docker.service
      command: start
    - name: kub_get_etcd.service
      command: start
      content: |
        [Unit]
        Description=Write K8s etcd urls to disk.
        Requires=early-docker.service
        After=early-docker.service
        Before=early-docker.target

        [Service]
        Type=oneshot
        Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
        ExecStart=/usr/bin/sh -c "/usr/bin/docker pull registry.barklyprotects.com/kubernetes/kub-get-etcd"
        ExecStart=/usr/bin/sh -c "/usr/bin/docker run --net=host -v /etc/barkly/:/etc/barkly/ registry.barklyprotects.com/kubernetes/kub-get-etcd {{ KUB_ETCD_ASG }} > /etc/etcd_servers.env"
    - name: flanneld.service
      command: start
      drop-ins:
        - name: 10-environment_vars.conf
          content: |
            [Unit]
            After=kub_get_etcd.service

            [Service]
            ExecStartPre=/usr/bin/sh -c "/usr/bin/echo -n FLANNELD_ETCD_ENDPOINTS= > /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/sh -c "/usr/bin/cat /etc/etcd_servers.env >> /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/sh -c "/usr/bin/echo FLANNELD_IFACE=$private_ipv4 >> /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/ln -sf /etc/flannel_etcd_servers.env /run/flannel/options.env
            Restart=always
            RestartSec=10
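kub-get-etcd's source isn't shown, but judging by the flanneld drop-in it appears to print the etcd client URLs in the comma-separated form that `FLANNELD_ETCD_ENDPOINTS` expects. Below is a minimal sketch of that output step and of the drop-in's file assembly, assuming the member IPs are already known and clients use port 2379; both function names are hypothetical.

```shell
# Hypothetical sketch of kub-get-etcd's output format: one comma-separated
# list of etcd client URLs, built from the members' private IPs.
# Usage: format_etcd_endpoints IP [IP ...]
format_etcd_endpoints() {
  local ip out=""
  for ip in "$@"; do
    out="${out:+$out,}http://${ip}:2379"
  done
  printf '%s\n' "$out"
}

# Same result as the drop-in's ExecStartPre lines: combine the endpoint
# list and this host's interface into flannel's options file.
# Usage: write_flannel_options ENDPOINTS_FILE PRIVATE_IP OUTPUT_FILE
write_flannel_options() {
  local endpoints_file=$1 iface=$2 out=$3
  {
    printf 'FLANNELD_ETCD_ENDPOINTS=%s\n' "$(cat "$endpoints_file")"
    printf 'FLANNELD_IFACE=%s\n' "$iface"
  } > "$out"
}
```

Writing the whole file in one function, rather than with an `echo -n` followed by appends, means a failed step cannot leave a half-written options file behind for flanneld to read on its next restart.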


Other config

• Grab certs from S3
• Terraform only grants permissions to those specific files
• Write the templated master pod files to disk

- name: kub_certs.service
  command: start
  content: |
    [Unit]
    Description=Writes kubernetes cluster certs to disk.
    Requires=early-docker.service
    After=early-docker.service
    Before=early-docker.target
    Before=kubelet.service

    [Service]
    Type=oneshot
    Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
    ExecStart=/usr/bin/mkdir -p /etc/kubernetes/ssl
    ExecStart=/usr/bin/docker run --net=host -v /etc/kubernetes/ssl:/ssl registry.barklyprotects.com/ops/awscli s3 cp s3://our_k8s_cluster_bucket/ca.pem /ssl
    ExecStart=/usr/bin/docker run --net=host -v /etc/kubernetes/ssl:/ssl registry.barklyprotects.com/ops/awscli s3 cp s3://our_k8s_cluster_bucket/apiserver.pem /ssl
    ExecStart=/usr/bin/docker run --net=host -v /etc/kubernetes/ssl:/ssl registry.barklyprotects.com/ops/awscli s3 cp s3://our_k8s_cluster_bucket/apiserver-key.pem /ssl
- name: kub_pods.service
  command: start
  content: |
    [Unit]
    Description=Writes kubernetes pod files to disk.
    Requires=early-docker.service
    After=early-docker.service
    Before=early-docker.target
    Before=kubelet.service

    [Service]
    Type=oneshot
    Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
    ExecStart=/usr/bin/sh -c "/usr/bin/mkdir -p /etc/kubernetes/ssl"
    ExecStart=/usr/bin/docker run --net=host -v /etc/barkly/:/etc/barkly/ -e K8S_VERSION='1.1.7' -e CLOUD_PROVIDER='--cloud-provider=aws' -e SERVICE_IP_RANGE="10.3.0.0/16" -e ADVERTISE_IP="$private_ipv4" -e ETCD_AUTOSCALE_GROUP_NAME="our_etcd_autoscaling_group_name" -v /srv/kubernetes/manifests:/output_src -v /etc/kubernetes/manifests:/output_dst registry.barklyprotects.com/kubernetes/kub-master-pods
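The three `s3 cp` lines in kub_certs.service differ only in the file name, so they could collapse into one loop. A sketch under assumptions: `fetch_certs` is a hypothetical helper, and the copy command is passed in as arguments so the loop can be tried locally with plain `cp`; on a machine with the AWS CLI it could be called as `fetch_certs s3://our_k8s_cluster_bucket /etc/kubernetes/ssl aws s3 cp`.

```shell
# Hypothetical helper: copy the three master certs from SOURCE_PREFIX into
# DEST_DIR using the given copy command (e.g. "aws s3 cp" or "cp").
# Usage: fetch_certs SOURCE_PREFIX DEST_DIR COPY_CMD [COPY_ARGS...]
fetch_certs() {
  local src=$1 dest=$2 f
  shift 2
  mkdir -p "$dest"
  for f in ca.pem apiserver.pem apiserver-key.pem; do
    # stop at the first failure so the calling oneshot unit fails too,
    # instead of letting the kubelet start with missing certs
    "$@" "$src/$f" "$dest/" || return 1
  done
}
```

Returning non-zero on the first failed copy matters because kub_certs.service is ordered `Before=kubelet.service`: a oneshot unit that fails is visible in `systemctl status`, whereas silently missing certs only surface later as API server errors.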


Start Docker & Kubelet

• Kubelet waits for Docker and flannel to be ready

• Kubelet loads the manifests for the master services output by the previous container

- name: docker.service
  command: start
  drop-ins:
    - name: 40-flannel.conf
      content: |
        [Unit]
        Requires=flanneld.service
        After=flanneld.service
- name: kubelet.service
  command: start
  content: |
    [Unit]
    Requires=docker.service
    After=docker.service
    After=fluentd-elasticsearch.service

    [Service]
    ExecStartPre=/usr/bin/mkdir -p /var/log/containers
    ExecStart=/etc/bin/kubelet \
      --hostname-override="$private_ipv4" \
      --api_servers=http://127.0.0.1:8080 \
      --register-node=false \
      --allow-privileged=true \
      --config=/etc/kubernetes/manifests \
      --cluster-dns=10.3.0.10 \
      --cluster-domain=cluster.local \
      --cloud-provider=aws \
      --v=4
    Restart=always
    RestartSec=10

    [Install]
    WantedBy=multi-user.target
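The `Requires=`/`After=` lines on the master chain the units together: kub_get_etcd.service, then flanneld, then Docker, then the kubelet. Purely as an illustration (systemd does its own ordering, and nothing here replaces it), the same chain can be fed to coreutils `tsort`, which prints a topological order of "A before B" pairs. The pairs come from the master units on these slides; `master_start_order` is a hypothetical name.

```shell
# Illustration only: list the "A must start before B" pairs taken from the
# master units' Requires=/After= lines and let tsort(1) print them in a
# valid topological (start) order.
master_start_order() {
  tsort <<'EOF'
kub_get_etcd.service flanneld.service
flanneld.service docker.service
docker.service kubelet.service
EOF
}
```

Because the pairs form a single chain, the order is unique: the four units print one per line, kub_get_etcd.service first and kubelet.service last. kub_certs.service and kub_pods.service are also ordered `Before=kubelet.service`; adding those pairs would still place them ahead of the kubelet, though tsort could then interleave them with the flannel chain in any valid order.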


Get Kubelet

• /usr/bin & /usr/local/bin are read-only in CoreOS, so the kubelet binary is downloaded to /etc/bin instead

- name: kubelet.service
  command: start
  drop-ins:
    - name: 10-download-binary.conf
      content: |
        [Service]
        ExecStartPre=/bin/bash -c "/etc/bin/download-k8s-binary kubelet"
write_files:
  # Since systemd needs these files before it will start
  - path: /etc/bin/download-k8s-binary
    permissions: '0755'
    content: |
      #!/usr/bin/env bash
      export K8S_VERSION="v1.1.8"
      mkdir -p /etc/bin
      FILE=$1
      # check the same location the binary is downloaded to
      if [ ! -f "/etc/bin/$FILE" ]; then
        curl -sSL -o "/etc/bin/$FILE" "https://s3.amazonaws.com/barkly-kubernetes-builds/${K8S_VERSION}/bin/$FILE"
        chmod +x "/etc/bin/$FILE"
      else
        # we check the version of the binary
        INSTALLED_VERSION=$("/etc/bin/$FILE" --version)
        MATCH=$(echo "${INSTALLED_VERSION}" | grep -c "${K8S_VERSION}")
        if [ "$MATCH" -eq 0 ]; then
          # the version is different
          curl -sSL -o "/etc/bin/$FILE" "https://s3.amazonaws.com/barkly-kubernetes-builds/${K8S_VERSION}/bin/$FILE"
          chmod +x "/etc/bin/$FILE"
        fi
      fi
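The download-or-upgrade decision in download-k8s-binary can be separated from its side effects, which makes it testable on a developer machine (one of the requirements from the start of the talk). A minimal sketch, assuming the same substring match on `--version` output that the script uses; `needs_download` and `ensure_binary` are hypothetical names, and the S3 base URL is passed in rather than hard-coded.

```shell
# Pure decision: should the binary be (re)downloaded?
# Usage: needs_download INSTALLED_VERSION_OUTPUT WANTED_VERSION
# INSTALLED_VERSION_OUTPUT is "<binary> --version" output, or "" if the
# binary is not installed. Returns 0 if a download is needed.
# Like the original script, this is a substring match (v1.1.8 also
# matches v1.1.80).
needs_download() {
  local installed=$1 wanted=$2
  [ -n "$installed" ] || return 0          # not installed yet
  case "$installed" in
    *"$wanted"*) return 1 ;;               # already the wanted version
    *) return 0 ;;                         # some other version
  esac
}

# Side effects kept in a thin wrapper around the decision.
# Usage: ensure_binary FILE VERSION BASE_URL
ensure_binary() {
  local file=$1 version=$2 base_url=$3 installed=""
  mkdir -p /etc/bin
  if [ -x "/etc/bin/$file" ]; then
    installed=$("/etc/bin/$file" --version 2>/dev/null) || installed=""
  fi
  if needs_download "$installed" "$version"; then
    curl -sSL -o "/etc/bin/$file" "$base_url/$version/bin/$file"
    chmod +x "/etc/bin/$file"
  fi
}
```

On a node this would be invoked as `ensure_binary kubelet v1.1.8 https://s3.amazonaws.com/barkly-kubernetes-builds`, matching the URL layout in the script above.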


Terraform to build

• As before, the launch configuration references our cloud-init file

resource "aws_launch_configuration" "terraform_master" {
  name_prefix          = "${var.environment}_master_conf-"
  image_id             = "${var.coreos_ami}"
  instance_type        = "t2.medium"
  key_name             = "${var.key_name}"
  security_groups      = ["${aws_security_group.terraform_master_sec_group.id}"]
  user_data            = "${file("../cloud-config/output/master.yml")}"
  enable_monitoring    = true
  ebs_optimized        = false
  iam_instance_profile = "${aws_iam_instance_profile.terraform_master_role_profile.id}"

  root_block_device {
    volume_size = 20
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "terraform_master" {
  name                      = "${var.environment}_master"
  launch_configuration      = "${aws_launch_configuration.terraform_master.name}"
  availability_zones        = ["us-east-1c"]
  max_size                  = "${var.capacities_master_max}"
  min_size                  = "${var.capacities_master_min}"
  health_check_grace_period = 300
  desired_capacity          = "${var.capacities_master_desired}"
  vpc_zone_identifier       = ["${aws_subnet.master.id}"]
  force_delete              = true
  load_balancers            = ["${aws_elb.terraform_master.name}"]

  tag {
    key                 = "Name"
    value               = "${var.environment}_master"
    propagate_at_launch = true
  }

  lifecycle {
    create_before_destroy = true
  }
}


Minions


Minion Node (now just Nodes)

• Kubelet Service manages all other services
• Proxy
• Pods


Early Docker / Flannel

• Same as master!

#cloud-config

coreos:
  units:
    - name: etcd.service
      command: stop
    - name: etcd2.service
      command: stop
    - name: early-docker.service
      command: start
    - name: kub_get_etcd.service
      command: start
      content: |
        [Unit]
        Description=Write K8s etcd urls to disk.
        Requires=early-docker.service
        After=early-docker.service
        Before=early-docker.target

        [Service]
        Type=oneshot
        Environment="DOCKER_HOST=unix:///var/run/early-docker.sock"
        ExecStart=/usr/bin/sh -c "/usr/bin/docker pull registry.barklyprotects.com/kubernetes/kub-get-etcd"
        ExecStart=/usr/bin/sh -c "/usr/bin/docker run --net=host -v /etc/barkly/:/etc/barkly/ registry.barklyprotects.com/kubernetes/kub-get-etcd {{ KUB_ETCD_ASG }} > /etc/etcd_servers.env"
    - name: flanneld.service
      command: start
      drop-ins:
        - name: 10-environment_vars.conf
          content: |
            [Unit]
            After=kub_get_etcd.service

            [Service]
            ExecStartPre=/usr/bin/sh -c "/usr/bin/echo -n FLANNELD_ETCD_ENDPOINTS= > /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/sh -c "/usr/bin/cat /etc/etcd_servers.env >> /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/sh -c "/usr/bin/echo FLANNELD_IFACE=$private_ipv4 >> /etc/flannel_etcd_servers.env"
            ExecStartPre=/usr/bin/ln -sf /etc/flannel_etcd_servers.env /run/flannel/options.env
            Restart=always
            RestartSec=10


Kubelet

• Similar to master

- name: docker.service
  command: start
  drop-ins:
    - name: 40-flannel.conf
      content: |
        [Unit]
        Requires=flanneld.service
        After=flanneld.service

- name: kubelet.service
  command: start
  content: |
    [Unit]
    Requires=docker.service
    After=docker.service
    After=fluentd-elasticsearch.service

    [Service]
    ExecStartPre=/usr/bin/mkdir -p /var/log/containers
    ExecStart=/etc/bin/kubelet \
      --api_servers=https://ourk8smaster.barkly.com \
      --hostname-override="$private_ipv4" \
      --register-node=true \
      --allow-privileged=true \
      --config=/etc/kubernetes/manifests \
      --cluster-dns=10.3.0.10 \
      --cluster-domain=cluster.local \
      --kubeconfig=/etc/kubernetes/worker-kubeconfig-kubelet.yaml \
      --tls-cert-file=/etc/kubernetes/ssl/worker.pem \
      --tls-private-key-file=/etc/kubernetes/ssl/worker-key.pem \
      --cloud-provider=aws \
      --v=4
    Restart=always
    RestartSec=10

    [Install]
    WantedBy=multi-user.target
  drop-ins:
    - name: 10-download-binary.conf
      content: |
        [Service]
        ExecStartPre=/bin/bash -c "/etc/bin/download-k8s-binary kubelet"


Manifests can be hard coded

• Minion manifests are far less dynamic than the master's, so they can be hard-coded in the cloud-init files

write_files:
  - path: "/etc/kubernetes/manifests/kube-proxy.yaml"
    content: |
      apiVersion: v1
      kind: Pod
      metadata:
        name: kube-proxy
        namespace: kube-system
      spec:
        hostNetwork: true
        containers:
        - name: kube-proxy
          image: registry.barklyprotects.com/kubernetes/hyperkube:v1.1.8
          command:
          - /hyperkube
          - proxy
          - --master=https://ourk8smaster.barkly.com
          - --kubeconfig=/etc/kubernetes/worker-kubeconfig-proxy.yaml
          - --proxy-mode=iptables
          - --v=4
          securityContext:
            privileged: true
          volumeMounts:
          - mountPath: /etc/ssl/certs
            name: "ssl-certs"
          - mountPath: /etc/kubernetes/worker-kubeconfig-proxy.yaml
            name: "kubeconfig"
            readOnly: true
          - mountPath: /etc/kubernetes/ssl
            name: "etc-kube-ssl"
            readOnly: true
        volumes:
        - name: "ssl-certs"
          hostPath:
            path: "/usr/share/ca-certificates"
        - name: "kubeconfig"
          hostPath:
            path: "/etc/kubernetes/worker-kubeconfig-proxy.yaml"
        - name: "etc-kube-ssl"
          hostPath:
            path: "/etc/kubernetes/ssl"


  - path: "/etc/kubernetes/worker-kubeconfig-proxy.yaml"
    content: |
      apiVersion: v1
      kind: Config
      clusters:
      - name: local
        cluster:
          certificate-authority: /etc/kubernetes/ssl/ca.pem
      users:
      - name: kubelet
        user:
          client-certificate: /etc/kubernetes/ssl/worker.pem
          client-key: /etc/kubernetes/ssl/worker-key.pem
      contexts:
      - context:
          cluster: local
          user: kubelet
        name: kubelet-context
      current-context: kubelet-context
  - path: "/etc/kubernetes/worker-kubeconfig-kubelet.yaml"
    content: |
      apiVersion: v1
      kind: Config
      clusters:
      - name: local
        cluster:
          certificate-authority: /etc/kubernetes/ssl/ca.pem
      users:
      - name: kubelet
        user:
          client-certificate: /etc/kubernetes/ssl/worker.pem
          client-key: /etc/kubernetes/ssl/worker-key.pem
      contexts:
      - context:
          cluster: local
          user: kubelet
        name: kubelet-context
      current-context: kubelet-context
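The two kubeconfig files above have identical contents, since kube-proxy and the kubelet share the worker certificate. If the cloud-config files are generated (the Terraform snippets read them from a `cloud-config/output` directory), one helper could write both. A sketch: the certificate paths are the talk's, while `write_worker_kubeconfig` and its optional SSL_DIR argument are hypothetical.

```shell
# Hypothetical helper: write the worker kubeconfig used by both kube-proxy
# and the kubelet. Only the certificate directory is parameterised.
# Usage: write_worker_kubeconfig OUTPUT_FILE [SSL_DIR]
write_worker_kubeconfig() {
  local out=$1 ssl=${2:-/etc/kubernetes/ssl}
  cat > "$out" <<EOF
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: ${ssl}/ca.pem
users:
- name: kubelet
  user:
    client-certificate: ${ssl}/worker.pem
    client-key: ${ssl}/worker-key.pem
contexts:
- context:
    cluster: local
    user: kubelet
  name: kubelet-context
current-context: kubelet-context
EOF
}
```

Calling it once per path (`worker-kubeconfig-proxy.yaml`, `worker-kubeconfig-kubelet.yaml`) keeps the two files from drifting apart if the certificate layout ever changes.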


Demo!


Questions?