Kubernetes to Scale [email protected] @micheleorsi GDG Cloud - London, 11 January 2017


Page 1: Kubernetes to scale

Kubernetes to Scale

[email protected] @micheleorsi

GDG Cloud - London, 11 January 2017

Page 2: Kubernetes to scale

Started with a monolith ...

https://www.flickr.com/photos/southtopia/5702790189

Page 3: Kubernetes to scale

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

... broken into microservices

Page 4: Kubernetes to scale

Micro-problems at scale

● alignment

● real pipelines

● infrastructure

● resilience

● monitoring

● constraints

Page 5: Kubernetes to scale

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business

● throwing away our whole datacenter

Page 6: Kubernetes to scale

How we did that: technology

● company framework

● docker

● kubernetes

Page 7: Kubernetes to scale

How we did that: team/people

https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/

Page 8: Kubernetes to scale

Kubernetes: our architecture

[Diagram: two clusters, production and nonproduction. Labels shown for production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION. Labels shown for nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, APP1-STRESSTEST.]

Page 9: Kubernetes to scale

Kubernetes: our architecture

[Diagram: in the production cluster, APP1-PRODUCTION holds a deployment and its replica-set, which runs POD1, POD2 and POD3.]

Page 10: Kubernetes to scale

Kubernetes: our architecture

[Diagram: in the production cluster, APP1-PRODUCTION holds a deployment and its replica-set, which runs POD1, POD2 and POD3, plus a secret and a configmap.]

Page 11: Kubernetes to scale

Kubernetes: our architecture

[Diagram: in the production cluster, APP1-PRODUCTION holds a deployment and its replica-set, which runs POD1, POD2 and POD3, plus a secret, a configmap and an ingress with path app1-production.prd.lmn.intra.]
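The objects named on these architecture slides can be sketched as manifests. What follows is a minimal illustration, not the team's actual configuration: the application name, namespace, image, ConfigMap and Secret names, and the Service that the ingress points at are all assumptions, and the API versions are the current ones rather than those available in early 2017. Only the address `app1-production.prd.lmn.intra` and the three pods come from the slides.

```python
import json

APP = "app1"                            # hypothetical application name
NAMESPACE = "app1-production"           # assumption: one namespace per app and environment
HOST = "app1-production.prd.lmn.intra"  # address shown on the slide, used here as the host


def deployment(replicas: int = 3) -> dict:
    """Deployment whose replica set keeps `replicas` identical pods running."""
    labels = {"app": APP}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP, "namespace": NAMESPACE},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": APP,
                        "image": "registry.example/app1:2",  # placeholder image
                        # Configuration injected from a configmap and a secret
                        # (both names are hypothetical).
                        "envFrom": [
                            {"configMapRef": {"name": APP + "-config"}},
                            {"secretRef": {"name": APP + "-secret"}},
                        ],
                    }],
                },
            },
        },
    }


def ingress() -> dict:
    """Ingress routing the slide's address to a Service (assumed, not on the slide)."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": APP, "namespace": NAMESPACE},
        "spec": {
            "rules": [{
                "host": HOST,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": APP, "port": {"number": 80}}},
                }]},
            }],
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps([deployment(), ingress()], indent=2))
```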

Page 12: Kubernetes to scale

Kubernetes: our architecture

[Diagram: an F5 and the cluster, with three nginx-ingress-ctrl instances on port 80 and six pods at 10.0.0.1 to 10.0.0.6.]

Page 13: Kubernetes to scale

Kubernetes: our architecture

[Diagram labels: production, APP1-PRODUCTION, POD, application, collectd, fluentd.]

Page 14: Kubernetes to scale

Self-healing: our choice for resilience

/liveness:

● when tomcat container is up

● when “active/max” threads < threshold

/readiness:

● all the startup jobs have run

● no termination request has been received

.. ongoing never-ending research ..
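The two probe rules above can be sketched as plain decision logic. This is only an illustration under stated assumptions: the class name, field names, default thread limit and threshold value are hypothetical, and the sketch assumes both liveness conditions must hold together. The real checks ran inside a Tomcat application and are not shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Snapshot of the signals the two probes look at (all names hypothetical)."""
    container_up: bool = False
    active_threads: int = 0
    max_threads: int = 200              # hypothetical pool size
    busy_ratio_threshold: float = 0.9   # hypothetical threshold
    startup_jobs_done: bool = False
    termination_requested: bool = False

    def liveness(self) -> int:
        """HTTP status for /liveness: container up and thread pool not saturated."""
        busy_ratio = self.active_threads / self.max_threads
        ok = self.container_up and busy_ratio < self.busy_ratio_threshold
        return 200 if ok else 503

    def readiness(self) -> int:
        """HTTP status for /readiness: startup finished and no shutdown in progress."""
        ok = self.startup_jobs_done and not self.termination_requested
        return 200 if ok else 503
```

With rules of this shape, a pod whose thread pool saturates fails liveness and gets restarted, while a pod that has received a termination request fails readiness and stops receiving traffic before it shuts down.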

Page 15: Kubernetes to scale

Kubernetes: what’s left outside?

● datastores

● distributed caches (early 2017)

● distributed locking

● pub-sub/queues

● logs and metrics storage

Page 16: Kubernetes to scale

When can you test with production traffic?

● zero downtime during rollout

● monitoring in place

● alerting

● centralized logging

● legacy infrastructure to the rescue in case of problem

Page 17: Kubernetes to scale

... failure ... at all different levels ..

https://www.flickr.com/photos/ghost_of_kuji/2763674926

Page 18: Kubernetes to scale

Main problems

● configuration

● infrastructure

● tools

● manual mistakes

● (external) scalability

Page 19: Kubernetes to scale

There’s light .. at the end

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

Page 20: Kubernetes to scale

Pipeline: a huge step forward

microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1", 2)

lmn_deployCanaryStrategy(microservice, "qa")

lmn_deployStableStrategy(microservice, "preview")

lmn_deployCanaryStrategy(microservice, "production")

pipeline
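The order of the stages in the snippet above can be modelled in a few lines. This sketch assumes the stages run one after another and that a failed stage stops the promotion; the slide does not state either point. `run_pipeline` and the `deploy` callback are hypothetical stand-ins, not the company's `lmn_*` functions, whose behaviour is not described in the deck.

```python
from typing import Callable, List, Tuple

# (strategy, environment) pairs in the order shown on the slide.
STAGES: List[Tuple[str, str]] = [
    ("canary", "qa"),
    ("stable", "preview"),
    ("canary", "production"),
]

# Hypothetical callback: deploy(artifact, version, strategy, environment) -> success
DeployFn = Callable[[str, int, str, str], bool]


def run_pipeline(artifact: str, version: int, deploy: DeployFn) -> List[str]:
    """Run each stage in order; stop at the first failure (an assumption).

    Returns the environments that were deployed successfully.
    """
    deployed: List[str] = []
    for strategy, environment in STAGES:
        if not deploy(artifact, version, strategy, environment):
            break
        deployed.append(environment)
    return deployed


if __name__ == "__main__":
    def always_ok(artifact: str, version: int, strategy: str, environment: str) -> bool:
        return True

    print(run_pipeline("com.lastminute.application1", 2, always_ok))
```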

Page 21: Kubernetes to scale

Monitoring: grafana/graphite/nagios

[Diagram labels: cluster, APP1-PRODUCTION, POD, application, collectd, graphite, Grafana, nagios.]

icons from http://www.flaticon.com

Page 22: Kubernetes to scale

... benefits

● lead and migration time

● resilience

● root cause analysis

● speed of deployment

● instant scaling

Page 23: Kubernetes to scale

Give me the numbers!

● 36 bare-metal nodes (only for production cluster)

● 5100 req/sec in the new cluster

● 2M metrics/minute flows

● 35 micro-services migrated in 5 months

○ 3 new micro-services migrated per week

○ 10 minutes to create a new environment

● 11 min to roll-out a new version with 55 instances

○ whole pipeline runs in 16 min
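A few averages follow directly from these figures. The arithmetic below is a back-of-the-envelope sketch: it assumes the rollout time and the request load are spread evenly across instances and nodes, which the slide does not claim.

```python
ROLLOUT_MINUTES = 11
INSTANCES = 55
METRICS_PER_MINUTE = 2_000_000
REQUESTS_PER_SECOND = 5100
NODES = 36

# Average rollout time per instance, if instances were replaced one at a time.
seconds_per_instance = ROLLOUT_MINUTES * 60 / INSTANCES

# Metric flow expressed per second instead of per minute.
metrics_per_second = METRICS_PER_MINUTE / 60

# Average request rate per node, if traffic were spread evenly.
requests_per_second_per_node = REQUESTS_PER_SECOND / NODES

if __name__ == "__main__":
    print(f"{seconds_per_instance:.0f} s per instance")
    print(f"{metrics_per_second:.0f} metrics/s")
    print(f"{requests_per_second_per_node:.1f} req/s per node")
```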

Page 24: Kubernetes to scale

Yes, we’re hiring!

THANKS

www.lastminutegroup.com