
1. The journey to container adoption in Enterprise
   Personal observations by Igor Moochnick
   Running Docker, Mesos and more in production
2. Where do I come from?
3. Monolithic architecture
   Local dependencies
   Everything in one place
4. Static Infrastructure
   Predictable operations
   Known Change
   Scheduled downtime
5. A lot of Change control and coordination
   MR, MC
   Waiting for approvals
6. Paradigm shifts for Speed
   Requirements, Correctness, Stability, Waterfall, Monolith/3-tier
   Market demand, Customer's delight, Speed, Agile/Lean, SOA/Services
7. What's in it for us?
   Will it help?
   Is it hype?
   Static vs. Cloud
   Virtualization vs. Containers
   Private vs. public
   Docker?
8. Gradual adoption of virtualization over 5 years
   Explosive adoption of containers over 2 years
   Chart: Virtualization, OpenStack, Docker. Interest over time (by Google Analytics)
9. Starting slow
   Getting used to it
   Find limitations
   Isolation of the builds
   Slow?
   Container hosts
   Network vs. Storage
10. Paradigm shift to MicroServices
   Loosely coupled service oriented architecture with bounded contexts
   From Adrian Cockcroft (ex Netflix Chief Architect)
11. What is an application?
   A single container
      Putting multiple processes into a single container simplifies the deployment
      Breaks Docker best-practices model
      monit, supervisord, runsvdir, runit
   A composition of related containers
      Pod (Kubernetes)
      Task (Amazon AWS ECS, Elastic Container Service)
      Separation of operational concerns
      Not all frameworks understand the container composition
   A graph of dependent containers
12. Immutable Artifacts
   Configuration management doesn't guarantee immutability
   Cumulative change/Drift vs. refresh
   Version everything!
   Turn your release process into an artifact!
   Pipeline Builder
13. Release Process / Pipeline
   1. A developer commits new code to a Repo
   2. A build is triggered and creates an app artifact and pushes it into the artifact repository with metadata:
      1. Artifact has a hard version
      2. Declares its contracts and contract versions
      3. List of dependencies and their versions (Bill-of-materials) attached
   3. Builds a Docker image and pushes it to the Docker registry
      1. Inherits from official base image approved by InfoSec and Systems teams
      2. Has exactly the same tag as the version of the app artifact: creates a 1:1 correlation with the source
   4. Deployment ...
14. Release Process Challenges
   Pick Container Registry: Your own, DockerHub, Artifactory
   Registry management is important: Disk space, Heavy images
   Tracking of what's in use
   Decommissioning and pruning of the artifacts
   Availability
   Auditing
   Permissions
15. Deployment
   Prepare Docker host (configuration management)
      Fry and not Bake
   Pull Docker image
      Beware of growing size
      Pre-warm the host with the base image or a previous version
   Start application
      Single container: easy
      Composition of containers is a challenge (Fig? Your own? ...)
   What configuration (env vars, partitions, etc...) is needed?
      External HIERARCHICAL config/settings management is the key (Consul, Zookeeper, Hiera)
   Passing secrets into the containers: think carefully!
      Secret management is important (Consul, EtcD, ...)
16. Versions
   Composition
   Ownership management
   Zombie containers
   Disappearing containers
   Container Sprawl
17. Testing Considerations
   Not much different from Virtualized payload
   Spin up sandbox environment
   Test against API, Mocks, Fakes, Pact
   Go live? Use Blue/Green deployment
   Pressure testing? Simpler and cheaper to do it in production
      Isolate traffic
      Gradually add load to the point of failure
      Monitor and measure
18. Environment Management
   Dev/QA/Prod/etc... environments parity
   Local dev machine vs. Cloud deployment
   BigRig:
19. Lots of Microservices
20. Change Management
21. Dependency Management
   Accordance tracks dependencies & ownership
22. Service Discovery
   No built-in SDN yet, just simple linking
   Where are my dependencies?
      Eureka, EtcD, Consul
   Need to manage state of the App
      Starting
      Running
         When do you know that the app is healthy and running?
         Healthchecks
         RunScope: tests contracts and validates the payload
      Stopping
      Dead
   Or check the state from the LB: requires extra code
23. Am I alive?
   When is the service ready to receive traffic?
   How do you know if your service is alive? Or still alive?
   When can the service actually start accessing the linked dependencies/volumes?
   Introduce delayed initialization or retries
   Make your orchestration smarter to recognize the composition time
   Stagger the start and introduce jitter into the system
24. Monitoring / Alerting
   Adds another layer to monitor
   Monitor both host and the containers
   Rate of change is drastically different
   Location, Names, Versions: everything in motion
   Multiple running versions at the same time
   Multiple locations, regions, zones, DC, HA, etc...
   Tools start to recognize Docker: DataDog, Librato, NewRelic
   Composite SLA metrics
25. Reasoning about failure
   Tools assume containment hierarchy
   Most can't reason about the relationship
   Your apps spanning across multiple containers and hosts
   Ex: Machine component (disk?) failure will affect all instances, VMs, Containers and Apps
   Diagram: Region, Zone/DC, Environment, Machine, VM/Instance, Container, Process, Linked Container, Volume, Storage
26. Failure Detection, Cleanup
   When to clean up the containers?
   What does a container failure mean?
   How to deal with the partial failure of the app dependencies or linked containers?
   Volume containers filling up the host storage: beware!
   How to decommission / tear down: What? In what order?
   How to communicate with the Monitoring/Alerting
   Notify Change Management system
27. Container storage
   Stateful containers are hard for the moment
   Volumes disappear if the Docker host dies, especially on the clouds: AWS, OpenStack, etc...
   Use host mounts, but don't forget where your stuff is and when to clean it
   Interesting: volume relocation by Flocker
28. Log Management
   Eagerly move logs out: containers are short lived
   Beware of sheer volume of logs: be smart about what and when you ship
   Can't truncate or rotate container STDOUT and STDERR
   Write to volumes
   Log rotation: volume rotation?
   Log analysis
   Log monitoring & alerting
   Tools examples: Scribe, LogStash, FluentD, Splunk (if you can afford it)
29. Mesos
   Cluster management, provides efficient, fine-grained resource sharing and isolation across distributed applications, or frameworks
   Distributed resource broker
   Running in production at Twitter since 2012
   In July 2013 became top-level Apache project
30. Mesos Ecosystem
   Marathon
   Chronos
   Singularity (HubSpot)
      Monitoring: queues growing, failure rates, health checking
   [Apache] Aurora (Twitter)
      Working rolling upgrades
      Service health-checks
      Notifications/service ownership/quotas
   Note (can't wait): Mantis (Netflix)
      Distributed scheduler (Fenzo) + predictive auto-scaling (Scryer)
      Resource optimization
      Auto-scaling micro-service graph
31. Docker Cluster Management
32. Missing Mesos features
   AWS Multi-region?
   Sticky locations?
   Persistent volumes?
   No Pods support (multi-container apps)
   No REST API to schedule jobs
   No built-in clean-up
   Tricky to write frameworks (but getting easier)
   A lot of work to integrate with the monitoring/alerting/logging systems
33. What's next?
   Kubernetes
   What will be the solution for SDN?
   Container dependencies discovery
   Lambda architecture
   What's an on-prem alternative?
   How do we test apps?
   What is an app?
   Should we just stop using apps concepts and move to stream processing?
34. Work in progress
   Failures tracking
   Correlation does not imply causation (from Wikipedia)
   Derivatives and predictive monitoring
   Machine learning
35. Data, Request & Control Flow
   Salp (inspired by Dapper)
36. Credits ...
   Who Moved My Cheese? Movie by Dr. Spencer Johnson
   Apache Mesos at Twitter (Texas LinuxFest 2014)
   Containers at Hong Kong commercial port
   Yes, Prime Minister
37. Thank you! Questions?
@igor_moochnick [email protected]
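
Note on slide 23: the advice to "introduce delayed initialization or retries" and to "stagger the start and introduce jitter" could be sketched roughly as below. This is a minimal illustration in Python using only the standard library, not something shown in the talk. The names `backoff_delays`, `wait_until_ready` and `tcp_open`, and the default timings, are all assumptions made for the example.

```python
import random
import socket
import time
from typing import Callable, Iterator, Optional


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized ("full jitter") delay per retry."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Exponentially growing ceiling, capped so long outages
        # do not turn into very long sleeps.
        ceiling = min(cap, base * (2 ** attempt))
        # A random point below the ceiling keeps containers that were
        # started together from retrying at the same instant.
        yield rng.uniform(0.0, ceiling)


def wait_until_ready(check: Callable[[], bool], attempts: int = 6,
                     base: float = 0.5, cap: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Optional[random.Random] = None) -> bool:
    """Return True once check() succeeds, False if it never does.

    check() is called at most attempts + 1 times: once before each
    sleep, and one final time after the last sleep.
    """
    for delay in backoff_delays(attempts, base, cap, rng):
        if check():
            return True
        sleep(delay)
    return check()


def tcp_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical readiness probe: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A container entrypoint might call something like `wait_until_ready(lambda: tcp_open("db", 5432))` before starting the app, where `db` and `5432` stand in for whatever linked dependency the service needs. An open port only shows that the dependency is listening, so a real health check would likely probe something application-specific.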
