
AWS re:Invent 2016: Operations Automation and Infrastructure Management with Amazon ECS (CON311)


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sathiya Shunmugasundaram @ Capital One

Gnani Dathathreya @ Capital One

December 2016

Operations Automation and Infrastructure Management with Amazon ECS

CON311

What to Expect from the Session

• Microservices, Docker, and the Amazon ECS journey

• Container stack evolution

• Container stack operations automation

• Infrastructure creation automation

• AMI update automation

• Blue/green deployment automation

• Canary deployment automation

• Lessons and looking forward

Microservices, Docker, and the Amazon ECS Journey

We use Docker and ECS-based container technologies to advance microservices adoption and increase the efficiency of cloud resources:

• Microservices architecture: We embraced a microservices architecture for our cloud applications, and this is driving adoption of Docker container technology.

• Federated operating model: Ours is a federated organization with a "You Build, You Own" operating model that gives delivery teams autonomy and speed.

• Self-service automation tools: We developed self-service container management automation tools based on ECS to accelerate application delivery by federated teams.

Amazon ECS is the most adopted container management solution at Capital One

• ECS and Docker implementations at Capital One include Credit Card Servicing, Auto Loan Servicing, and Enterprise Open Source office applications

• We run microservices, event-driven applications, batch applications, real-time APIs, and web applications using ECS and Docker solutions

• ECS is adopted in multiple lines of business for both internal and customer applications

• ECS simplified the containerization journey at Capital One

• We leverage ECS's integration with CloudWatch, IAM, and other native AWS services for seamless integration with operations

• With ECS and our automation tooling, Docker apps can be deployed with a production-hardened container stack in minutes

Container stacks are integrated with Enterprise DevOps tools, providing an end-to-end automation solution for containerized microservices.

Diagram: end-to-end pipeline components: developers, SCM, build, code binary repo, Docker image repo, cluster scheduler, cluster manager, service discovery, software LB, ELB, API Gateway, clients.

Container management solution components

Capability: Solution

• SCM: GitHub Enterprise

• Build: Jenkins

• Repos: Nexus, Docker registry

• Compute cluster: EC2 instances

• Cluster manager, container scheduler: ECS

• Dynamic service discovery: Consul, target group

• Load balancer (software): Nginx, Application Load Balancer

• Load balancer: Elastic Load Balancing

• API Gateway: API Gateway

Container Stack Evolution

Our container stack evolved along with ECS and ELB advancements.

• Stack 1 (simple stack, mutual SSL): ECS + Classic Load Balancer

• Stack 2 (high-density packing, mutual SSL): ECS + Classic Load Balancer + Consul + Registrator + Nginx

• Stack 3 (simple stack, high-density packing): ECS + Application Load Balancer

ECS with a Classic Load Balancer is a simple solution for running containers; however, fixed host port mapping limits each ECS instance to one task per service.

Diagram (stack 1): an ECS cluster with ECS instances in an Auto Scaling group across Availability Zones A and B. Service 1 and Service 2 each have their own ELB and run tasks SV1Task1, SV1Task2, SV2Task1, and SV2Task2; additional tasks (marked X) cannot be placed.

• The ELB's fixed listener port constrains each service to running only one task per ECS instance.
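The constraint above can be illustrated with a small placement simulation. This is a minimal sketch, not ECS scheduler code: the `Instance` class, the `place_tasks` function, and the port numbers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One ECS container instance and the host ports already bound on it."""
    name: str
    used_ports: set = field(default_factory=set)


def place_tasks(instances, host_port, desired_count):
    """Place up to desired_count tasks that all need the same fixed host port.

    Returns the names of the instances that received a task. An instance is
    skipped once the port is taken, which is the fixed-port constraint.
    """
    placed = []
    for instance in instances:
        if len(placed) == desired_count:
            break
        if host_port in instance.used_ports:
            continue  # port conflict: a second task cannot start here
        instance.used_ports.add(host_port)
        placed.append(instance.name)
    return placed


cluster = [Instance("ecs-instance-a"), Instance("ecs-instance-b")]
# Service 1 asks for four tasks on fixed host port 8080; only two instances exist.
print(place_tasks(cluster, host_port=8080, desired_count=4))
# Service 2 uses a different fixed port, so it still fits once per instance.
print(place_tasks(cluster, host_port=9090, desired_count=2))
```

With two instances, a service can never run more than two tasks, no matter how much spare CPU and memory the instances have.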

The ECS, Classic Load Balancer, Consul, Nginx, and Registrator solution provides dynamic service discovery and load balancing; however, it involves managing several components.

Diagram (stack 2): an ECS cluster with ECS instances in an Auto Scaling group across Availability Zones A and B, behind one ELB shared by all services. Each ECS instance runs tasks of SV1 and SV2 together with Nginx, a Consul agent, Registrator, and Consul templates. A separate Consul cluster (three Consul instances in an Auto Scaling group across Availability Zones A, B, and C, behind a Consul ELB) backs service discovery.

• The ELB's fixed listener port is mapped to Nginx running on each ECS instance, so one ELB can serve multiple services.

• The Nginx configuration routes service requests to the appropriate service containers/tasks.

• Registrator, Consul, and Consul templates dynamically discover containers/tasks and configure Nginx.

• A Consul cluster provides dynamic service discovery.
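To make the "discover containers, then configure Nginx" step concrete, here is a minimal sketch of the rendering idea in plain Python. It does not use Consul or its template syntax; the `render_nginx_config` function, the service names, and the addresses are hypothetical, and the generated layout is only one plausible shape for such a config.

```python
def render_nginx_config(services):
    """Render upstream and location blocks from discovered endpoints.

    services maps a service name to a list of (host, port) tuples, as a
    discovery layer might report them. The layout is illustrative only.
    """
    lines = []
    for name in sorted(services):
        lines.append(f"upstream {name} {{")
        for host, port in sorted(services[name]):
            lines.append(f"    server {host}:{port};")
        lines.append("}")
    lines.append("server {")
    lines.append("    listen 80;")
    for name in sorted(services):
        lines.append(f"    location /{name}/ {{")
        lines.append(f"        proxy_pass http://{name}/;")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


# Two tasks of sv1 share one instance on different dynamic host ports.
discovered = {
    "sv1": [("10.0.1.10", 32768), ("10.0.1.10", 32769), ("10.0.2.20", 32768)],
    "sv2": [("10.0.2.20", 32770)],
}
print(render_nginx_config(discovered))
```

Because Nginx owns the single fixed port on each instance, the service tasks behind it are free to use dynamic ports, which is what allows high-density packing in this stack.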

The ECS and Application Load Balancer solution is simpler and more efficient, with dynamic service discovery and load balancing capabilities.

Diagram (stack 3): an ECS cluster with ECS instances in an Auto Scaling group across Availability Zones A and B, behind one Application Load Balancer shared by all services. Each ECS instance runs several tasks of SV1 and SV2.

• For each service, a target group is created with a routing rule. Service containers with dynamic ports are added to the target group.

• The Application Load Balancer routes requests to service containers based on routing rules and target groups.
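The per-service wiring can be sketched as the request payloads involved. The functions below only build dictionaries; nothing is sent to AWS. The key names follow my understanding of the boto3 `ecs` and `elbv2` request shapes (`register_task_definition`, `create_target_group`, `create_rule`, `create_service`), while the function names, path pattern, health check path, memory size, and ARN placeholders are assumptions for illustration.

```python
def task_definition_kwargs(service, image, container_port):
    """Arguments in the shape of ECS register_task_definition."""
    return {
        "family": service,
        "containerDefinitions": [{
            "name": service,
            "image": image,
            "memory": 256,  # hypothetical size
            # hostPort 0 requests a dynamic host port, so several tasks of
            # one service can share an ECS instance.
            "portMappings": [{"containerPort": container_port, "hostPort": 0}],
        }],
    }


def target_group_kwargs(service, vpc_id):
    """Arguments in the shape of elbv2 create_target_group."""
    return {
        "Name": f"{service}-tg",
        "Protocol": "HTTP",
        "Port": 80,
        "VpcId": vpc_id,
        "HealthCheckPath": f"/{service}/health",  # hypothetical path
    }


def rule_kwargs(service, listener_arn, target_group_arn, priority):
    """Arguments in the shape of elbv2 create_rule (path-based routing)."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [{"Field": "path-pattern", "Values": [f"/{service}/*"]}],
        "Actions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }


def service_kwargs(cluster, service, target_group_arn, container_port, count):
    """Arguments in the shape of ECS create_service."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": service,
        "desiredCount": count,
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": service,
            "containerPort": container_port,
        }],
    }


print(rule_kwargs("sv1", "listener-arn-placeholder", "tg-arn-placeholder", 10))
```

ECS registers each task's dynamic host port with the target group, so no separate discovery component is needed in this stack.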

Container Stack Operations Automation

Infrastructure automation lets developers focus on application development and spend less time on infrastructure coding:

• Lambda functions and Jenkins jobs for container stack creation and termination

• Blue/green and canary deployment automation tools

• AMI update automation

• Container health checks, alerts, and actions

• Integration with enterprise logging solution

• Monitoring solution with CloudWatch

• JVM stats monitoring with CloudWatch

• Automatic scaling of ECS containers

• Automatic scaling of ECS Instances

• Test apparatus self-service tool for performance testing

Infrastructure Creation Automation

Automation tooling provides a consistent and repeatable way for users to create container stacks without writing a single line of infrastructure code

Diagram: container stack creation flow

• Users provide parameters for container stack creation, such as subnets and security groups.

• The parameters are put in S3; the put event triggers Lambda.

• Lambda executes Terraform with the parameters for infrastructure creation.

• The container stack is created in the VPC (virtual private cloud).
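A minimal sketch of the trigger step, assuming the parameters object is uploaded as a Terraform variables file. The event layout is the standard S3 notification shape; the `terraform_command` and `handler` names, the working directory, and the choice of Terraform flags are assumptions, and the download and execution steps are deliberately left out.

```python
from urllib.parse import unquote_plus


def terraform_command(event, workdir="/tmp/stack"):
    """Turn an S3 put event into the Terraform command line to run."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
    var_file = f"{workdir}/{key.rsplit('/', 1)[-1]}"
    return {
        "download": (bucket, key, var_file),
        "command": [
            "terraform", "apply", "-input=false", "-auto-approve",
            f"-var-file={var_file}",
        ],
    }


def handler(event, context):
    """Lambda entry point: plan the work for one parameters upload."""
    plan = terraform_command(event)
    # A real function would download the object to plan["download"][2] and
    # then run plan["command"], for example with subprocess; both steps
    # are omitted in this sketch.
    return plan


sample_event = {
    "Records": [{
        "s3": {
            "bucket": {"name": "my-params-bucket"},
            "object": {"key": "stacks/my-app.tfvars"},
        }
    }]
}
print(handler(sample_event, None)["command"])
```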

Users provide information like subnets, security groups, metrics, alarms, and alerts as parameters to the container stack creation tool.

instance_type="m3.medium"

server_subnets="subnet-ab12,subnet-ab12"

ecs_sg="sg-sg1234"

asg_min="3"

asg_max="9"

asg_desired="6"

sns_topic="my-alerts"

scalein_adjustment="-1"

scaleout_adjustment="1"

scalein_cooldown="300"

scaleout_cooldown="300"

scaleout_alarm_cpu_interval_secs="900"

scaleout_cpu_percent="80"

scalein_cpu_percent="40"

custom_script_location="my_s3_bucket"

custom_script_name="custom-script"

X509_cert_location="my_s3_bucket"

X509_cert_files="cert1.cer,cert2.cer"

ecs_cluster_name="my-app-cluster"

iam_role="my_app_Iam_role"

docker_registry="my-docker-registry"

proxy_server="my-co.proxycom"
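A minimal sketch of how a tool might read and sanity-check parameters like these. The function names are hypothetical, and the interpretation of the thresholds (scale out at or above `scaleout_cpu_percent`, scale in at or below `scalein_cpu_percent`) is an assumption; cooldowns and alarm intervals are ignored here.

```python
QUOTES = "\"'\u201c\u201d"  # straight and curly quotes


def parse_parameters(text):
    """Parse key="value" lines into a dict, tolerating curly quotes."""
    params = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip().strip(QUOTES)
    return params


def validate(params):
    """Return a list of problems found in the Auto Scaling settings."""
    problems = []
    low, want, high = (int(params[k]) for k in ("asg_min", "asg_desired", "asg_max"))
    if not low <= want <= high:
        problems.append("asg_desired must lie between asg_min and asg_max")
    if int(params["scalein_cpu_percent"]) >= int(params["scaleout_cpu_percent"]):
        problems.append("scale-in threshold must be below scale-out threshold")
    return problems


def next_capacity(params, current, cpu_percent):
    """Apply the scale-out or scale-in adjustment, clamped to the ASG bounds."""
    if cpu_percent >= int(params["scaleout_cpu_percent"]):
        current += int(params["scaleout_adjustment"])
    elif cpu_percent <= int(params["scalein_cpu_percent"]):
        current += int(params["scalein_adjustment"])
    return max(int(params["asg_min"]), min(int(params["asg_max"]), current))


text = """
asg_min="3"
asg_max="9"
asg_desired="6"
scalein_adjustment="-1"
scaleout_adjustment="1"
scaleout_cpu_percent="80"
scalein_cpu_percent="40"
"""
params = parse_parameters(text)
print(validate(params))
print(next_capacity(params, current=6, cpu_percent=85))
```

Validating before Terraform runs keeps a bad parameter file from producing a half-built stack.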

Three Lambda functions make up the core of stack creation; these microfunctions decouple the compute cluster, Application Load Balancer, and ECS service so that each can have its own lifecycle.

Diagram: three Lambda functions build the stack in layers; each layer is an ECS cluster with ECS instances in an Auto Scaling group across Availability Zones A and B.

• One Lambda function creates the ECS cluster and EC2 instances. Compute cluster: rehydrated independently.

• One Lambda function creates the load balancer with a default target group. Application Load Balancer: end users get the same endpoints.

• One Lambda function creates the target group, ECS service, and rule. ECS service: target group and service deployments.

AMI Update Automation

Users perform regular AMI updates without outages using the Lambda functions in our automation tooling

Diagram: old AMI container stack. An ECS cluster with old-AMI ECS instances in an Auto Scaling group across Availability Zones A and B, running tasks of service1 and service2.

AMI update, step 1: A Lambda function creates a new ECS cluster and EC2 instances with the new AMI.

Diagram: the old-AMI ECS cluster keeps running the service1 and service2 tasks while Lambda creates a second ECS cluster with new-AMI EC2 instances in an Auto Scaling group across Availability Zones A and B.

AMI update, step 2: A Lambda function replicates ECS services from the old-AMI cluster to the new-AMI ECS cluster.

Diagram: the service1 and service2 tasks now run on both the old-AMI and the new-AMI ECS clusters.

AMI update, step 3: A Lambda function drains and deletes ECS services from the old ECS cluster and terminates the old-AMI EC2 instances.

Diagram: the old-AMI ECS cluster and its services are removed (marked X); the new-AMI ECS cluster continues to run the service1 and service2 tasks.

AMI update: This completes the AMI update for the stack without causing any outages.

Diagram: new AMI container stack. An ECS cluster with new-AMI ECS instances in an Auto Scaling group across Availability Zones A and B, running tasks of service1 and service2.
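The three AMI update steps can be sketched as an in-memory simulation. This is not the speakers' Lambda code: the `Cluster` class and the function names are hypothetical, and the model tracks only task counts, not real instances or connection draining.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    ami: str
    services: dict = field(default_factory=dict)  # service name -> task count


def create_cluster(old, new_ami):
    """Step 1: create an empty cluster on the new AMI."""
    return Cluster(name=f"{old.name}-{new_ami}", ami=new_ami)


def replicate_services(old, new):
    """Step 2: copy every service to the new cluster at the same task count."""
    new.services = dict(old.services)


def retire(old):
    """Step 3: drain and delete the old services; the old instances follow."""
    old.services.clear()


def serving_capacity(*clusters):
    """Total running tasks per service across the given clusters."""
    totals = {}
    for cluster in clusters:
        for service, count in cluster.services.items():
            totals[service] = totals.get(service, 0) + count
    return totals


old = Cluster("my-app-cluster", "ami-old", {"service1": 3, "service2": 3})
new = create_cluster(old, "ami-new")
print(serving_capacity(old, new))   # old cluster still serves everything
replicate_services(old, new)
print(serving_capacity(old, new))   # both clusters serve during the overlap
retire(old)
print(serving_capacity(old, new))   # new cluster serves alone
```

Capacity per service never drops below its starting level at any step, which is the property that makes the update outage-free.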

Blue/Green Deployment Automation

Blue/green deployment reduces downtime and risk by running two environments, called blue and green, and toggling between them

Image Courtesy: http://martinfowler.com/bliki/BlueGreenDeployment.html

Users perform blue/green deployments and rollbacks using the Lambda functions in our automation tooling

Diagram: an ECS cluster with ECS instances in an Auto Scaling group across Availability Zones A and B, running the blue service tasks (service1Task1, service1Task2).

Blue/Green: A Lambda function creates a beta ELB and the green service; users can test the green service through the beta ELB.

Diagram: the ECS cluster runs the blue service1 tasks behind the original load balancer and the green service1 tasks behind the beta load balancer.

Blue/Green: A Lambda function adds the green service to the original ELB; traffic flows to the green service.

Diagram: the ECS cluster runs the blue and green service1 tasks; the green service is now registered with the original load balancer.

Blue/Green: A Lambda function deletes the blue services and the beta ELB; traffic flows to the green service.

Diagram: the ECS cluster runs only the green service1 tasks behind the original load balancer.
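The blue/green sequence can be sketched as a list of routing states. This is a simplified model with hypothetical names; in particular, it assumes that blue and green are both registered with the original load balancer between the second and third Lambda steps, which the slides do not state explicitly.

```python
def blue_green_states(service):
    """Yield (step, routing) pairs; routing maps a load balancer to services."""
    blue, green = f"{service}-blue", f"{service}-green"
    yield "start", {"original": [blue]}
    yield "create beta LB and green service", {"original": [blue], "beta": [green]}
    yield "add green to original LB", {"original": [blue, green], "beta": [green]}
    yield "delete blue and beta LB", {"original": [green]}


def can_roll_back(routing, service):
    """Rollback stays possible while blue is still behind the original LB."""
    return f"{service}-blue" in routing.get("original", [])


for step, routing in blue_green_states("service1"):
    print(f"{step}: {routing} (rollback possible: {can_roll_back(routing, 'service1')})")
```

The original load balancer is present in every state, so end users keep the same endpoint throughout the deployment.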

Canary Deployment Automation

Canary deployment is a way of releasing a new version of an application by mixing new and old versions and gradually increasing the percentage of the new version

Diagram: a load balancer in front of five instances at three stages of the rollout:

• V1 V1 V1 V1 V2

• V1 V1 V2 V2 V2

• V2 V2 V2 V2 V2

Canary deployment automation allows applications to roll out new versions in a very controlled manner
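The staged V1/V2 mix shown above can be expressed as a small schedule calculation. This is a minimal sketch; the `canary_schedule` function and its argument names are hypothetical, and the stage sizes are taken from the three stages in the diagram.

```python
def canary_schedule(total_tasks, new_counts):
    """Return (old, new) task counts for each stage of a canary rollout.

    new_counts lists how many tasks run the new version at each stage; it
    must be strictly increasing and end with total_tasks.
    """
    counts = list(new_counts)
    if not counts or sorted(set(counts)) != counts or counts[-1] != total_tasks:
        raise ValueError("new_counts must increase and finish at total_tasks")
    return [(total_tasks - n, n) for n in counts]


total = 5
for old, new in canary_schedule(total, [1, 3, 5]):
    mix = " ".join(["V1"] * old + ["V2"] * new)
    print(f"{mix}  ({100 * new // total}% new)")
```

Keeping the schedule as data makes it easy to pause at any stage or to pick smaller steps for riskier releases.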

Canary release automation uses Lambda, ECS, and other AWS services. It is flexible, serverless, and lean.

Diagram: canary release pipeline components: deployment request bucket (deployment JSON), S3-trigger Lambda function, deployment SQS queue, SQS-poll Lambda function, ECS service, ECS instances, deployment state bucket, and SNS topic, connected by steps numbered 1 to 12.
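One way the two Lambda functions could divide the work is sketched below. The slides do not show the deployment JSON or the exact step order, so every field name (`service`, `image`, `steps`, `step_index`) and both function names are hypothetical; the sketch only builds message bodies and notification text and does not call S3, SQS, ECS, or SNS.

```python
import json


def request_to_message(deployment_json):
    """S3-trigger step: validate the request and build the queue message body."""
    request = json.loads(deployment_json)
    missing = {"service", "image", "steps"} - set(request)
    if missing:
        raise ValueError(f"deployment request is missing {sorted(missing)}")
    return json.dumps({"request": request, "step_index": 0})


def poll_step(message_body, healthy):
    """SQS-poll step: advance, finish, or roll back one deployment.

    Returns (action, next_message_body_or_None, notification_text). The
    notification text is what a real function might publish to SNS.
    """
    state = json.loads(message_body)
    steps = state["request"]["steps"]
    service = state["request"]["service"]
    if not healthy:
        return "rollback", None, f"{service}: canary unhealthy, rolling back"
    percent = steps[state["step_index"]]
    note = f"{service}: now at {percent}% new version"
    if state["step_index"] + 1 == len(steps):
        return "complete", None, note
    state["step_index"] += 1
    return "continue", json.dumps(state), note


body = request_to_message(
    '{"service": "service1", "image": "my-docker-registry/service1:2", "steps": [20, 60, 100]}'
)
while body is not None:
    action, body, note = poll_step(body, healthy=True)
    print(action, "-", note)
```

Carrying the deployment state in the message (or in a state bucket) keeps each Lambda invocation short and stateless.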

Lessons and Looking Forward

Lessons learned and looking forward

• Amazon ECS has significantly reduced our container stack operations effort

• With ECS and our automation tooling, Docker apps can be deployed with a production-hardened container stack in minutes

We would like to see the following ECS features, which would accelerate our enterprise adoption

• Container-level security groups

• Container placement constraints

• Balancing placements with scale-in, scale-out actions

Thank you!
