Re:invent 2016 Container Scheduling, Execution and AWS Integration


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Andrew Spyker (@aspyker)

12/1/2016

Container Scheduling, Execution and AWS Integration

What to Expect from the Session

• Why containers?
  • Including current use cases and scale

• How did we get there?
  • Overview of our container cloud platform

• Collaboration with ECS

About Netflix

• 86.7M members
• 1,000+ developers
• 190+ countries
• > ⅓ of North American internet download traffic
• 500+ microservices
• Over 100,000 VMs
• 3 regions across the world

Why containers?

Given our VM architecture comprised of …

amazingly resilient, microservice driven, cloud native, CI/CD DevOps enabled, elastically scalable

do we really need containers?

Our Container System Provides Innovation Velocity

• Iterative local development, deploy when ready

• Manage app and dependencies easily and completely

• Simpler way to express resources, let system manage

Innovation Velocity - Use Cases

• Media Encoding: encoding research development time
  • Using VMs: 1 month; using containers: 1 week

• Niagara
  • Builds all Netflix codebases in hours
  • Saves development hundreds of hours of debugging

• Edge Rearchitecture with Node.js
  • Focus returns to app development
  • Simplifies and speeds testing and deployment

Why not use an existing container management solution?

• Most solutions are focused on the datacenter
• Most solutions are:
  • Working to abstract datacenter and cross-cloud
  • Delivering more than a cluster manager
  • Not yet at our level of scale
  • Not appropriate for Netflix
• Wanted to leverage our existing cloud platform

Batch

What do batch users want?

• Simple shared resources, run till done, job files

• NOT
  • EC2 instance sizes, autoscaling, AMI OSes

• WHY
  • Offloads resource management ops; simpler

Historic use of containers

• General Workflow (Meson), Stream Processing (Mantis)

• Proven using cgroups and Mesos

• With simple isolation

• Using specific packaging formats

(Diagram: Linux cgroups)

Enter Titus

(Diagram: Titus layers: Job Management (Batch), Resource Management & Optimization, Container Execution & Integration)

Sample batch use cases

• Algorithm Model Training (GPU usage)
  • Personalization and recommendation
  • Deep learning with neural nets / mini-batch

• Titus
  • Added g2 support using nvidia-docker-plugin (launch sketch below)
  • Mounts nvidia drivers and devices into the Docker container
  • Distribution of training jobs and infrastructure made self-service

• Recently moved to p2.8xl instances
  • 2x performance improvement with the same CUDA-based code
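For illustration, the kind of launch nvidia-docker-plugin arranged looks roughly like the sketch below; the driver volume name, device nodes, and image are placeholders that vary by host and driver version:

```python
import subprocess

# Mount the driver user-space libraries (served by the nvidia-docker volume
# plugin) and pass the NVIDIA device nodes through to the container.
cmd = [
    "docker", "run", "--rm",
    "-v", "nvidia_driver_367.57:/usr/local/nvidia:ro",  # illustrative version
    "--device=/dev/nvidiactl",
    "--device=/dev/nvidia-uvm",
    "--device=/dev/nvidia0",   # one entry per GPU on the host
    "training-image:latest",   # hypothetical CUDA-based image
    "python", "train.py",
]
subprocess.run(cmd, check=True)
```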

Sample batch use cases

• Media Encoding Experimentation

• Digital Watermarking

Sample batch use cases

• Ad hoc Reporting

• Open Connect CDN Reporting

Lessons learned from batch

• Docker helped generalize use cases
• Cluster autoscaling adds efficiency
• Advanced scheduling required
• Initially ignored failures (with retries)
• Time-sensitive batch came later

Titus Batch Usage (Week of 11/7)

• Started ~300,000 containers during the week
• Peak of 1,000 containers per minute
• Peak of 3,000 instances (a mix of r3.8xls and m4.4xls)

Services

Adding Services to Titus

(Diagram: Titus layers: Job Management (Batch, Service), Resource Management & Optimization, Container Execution & Integration)

Services are just long-running batch, right?

Services are more complex:

Services resize constantly and run forever
• Autoscaling
• Hard to upgrade underlying hosts

Services have more state
• Ready for traffic vs. just started/stopped
• Even harder to upgrade

Services have existing, well-defined dev, deploy, runtime & ops tools

Real Networking is Hard

Multi-Tenant Networking is Hard

• IP per container
• Security group support
• IAM role support
• Network bandwidth isolation

Solutions

• VPC networking driver
  • Supports ENIs: full IP functionality
  • With scheduling: security groups
  • Supports traffic control (isolation)

• EC2 metadata proxy
  • Adds container "node" identity
  • Delivers IAM roles

VPC Networking Integration with Docker

Titus Executor

Titus Networking Driver

- Create and attach ENI with:
  - security group
  - IP address

create net namespace

VPC Networking Integration with Docker

Titus Executor

Titus Networking Driver

- Launch "pod root" container:
  - with the IP address
  - using the "pause" container
  - using --net=none

Pod Root Container (Docker)

create net namespace

VPC Networking Integration with Docker

Titus Executor

Titus Networking Driver

- Create virtual ethernet (veth) pair
- Configure routing rules
- Configure metadata proxy iptables NAT
- Configure traffic control for bandwidth

pod_root_id

Pod Root Container

VPC Networking Integration with Docker

Titus Executor

Pod Root Container (pod_root_id)

Docker

App Container

create container with --net=container:pod_root_id
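Stitched together, the sequence above can be approximated with the docker CLI alone; the sketch below is illustrative (the real Titus executor drives these steps itself, and the names and images are stand-ins):

```python
import subprocess

def docker(*args: str) -> None:
    subprocess.run(["docker", *args], check=True)

# 1. Pod-root ("pause") container: started with --net=none, so Docker creates
#    an empty network namespace. Any image that just sleeps works here.
docker("run", "-d", "--name", "pod-root", "--net=none",
       "alpine", "sleep", "infinity")

# 2. Out of band, the networking driver now creates the veth pair, moves one
#    end into pod-root's namespace, assigns the ENI-backed IP, and installs
#    the routing, metadata-proxy NAT, and traffic control rules (not shown;
#    requires root on the host).

# 3. The app container joins pod-root's network namespace, inheriting its IP
#    and routes without doing any network setup of its own.
docker("run", "-d", "--name", "app",
       "--net=container:pod-root",
       "my-app-image:latest")  # hypothetical application image
```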

Metadata Proxy

container

Amazon Metadata Service (169.254.169.254)

Titus Metadata Proxy

What is my IP, instance-id, hostname?
- Return Titus-assigned values

What is my AMI, instance type, etc.?
- Unknown (not meaningful for a container)

Give me my role credentials
- Assume role to the container's role, return credentials

Give me anything else
- Proxy through to the real metadata service

(Diagram: container traffic on veth<id> to 169.254.169.254:80 is redirected by iptables NAT to the proxy at host_ip:9999)
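A minimal sketch of such a proxy, assuming the iptables NAT shown above and using a hypothetical role ARN and identity values (a real proxy would also reshape the STS output into the metadata credential document format):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

import boto3  # used to vend per-container credentials via STS

# Hypothetical values; in Titus these come from the task's configuration.
CONTAINER_ROLE_ARN = "arn:aws:iam::123456789012:role/app-role"
TITUS_IDENTITY = {"instance-id": "titus-task-0001", "local-ipv4": "100.66.1.5"}

class MetadataProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        leaf = self.path.rstrip("/").rsplit("/", 1)[-1]
        if leaf in TITUS_IDENTITY:
            # Container "node" identity: return Titus-assigned values.
            self._reply(TITUS_IDENTITY[leaf])
        elif "security-credentials" in self.path:
            # Per-container IAM role: assume it and return the credentials.
            creds = boto3.client("sts").assume_role(
                RoleArn=CONTAINER_ROLE_ARN, RoleSessionName="titus-task")
            self._reply(json.dumps(creds["Credentials"], default=str))
        elif leaf in ("ami-id", "instance-type"):
            # Host details are meaningless inside a container: unknown.
            self.send_error(404, "Unknown")
        else:
            # Everything else passes through to the real metadata service.
            self._reply(urlopen("http://169.254.169.254" + self.path)
                        .read().decode())

    def _reply(self, body: str):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body.encode())

# Listens on host_ip:9999; iptables NAT delivers 169.254.169.254:80 here.
HTTPServer(("0.0.0.0", 9999), MetadataProxy).serve_forever()
```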

Putting it all together

(Diagram: one virtual machine host with three ENIs: ENI1 (sg=A, non-routable IP) backing the metadata proxy, ENI2 (sg=X) with IP1 and IP2, and ENI3 (sg=Y,Z) with IP3. Containers 1 through 4 each pair an app container with a pod root over veth<id>, attached to the ENI matching their security groups (sg=X, sg=X, sg=Y,Z). Linux policy-based routing plus traffic control steer traffic, and iptables NAT redirects 169.254.169.254 to the proxy.)

Additional AWS Integrations

• Live and rotated-to-S3 log file access
• Multi-tenant resource isolation (disk)
• Environmental context
• Automatic instance type selection
• Elastic scaling of the underlying resource pool

Netflix Infrastructure Integration

• Spinnaker CI/CD
• Atlas telemetry
• Discovery/IPC
• Edda (and dependent systems)
• Healthcheck and system metrics pollers
• Chaos testing

Why? Single consistent cloud platform

(Diagram: AWS (VPC, EC2, Autoscaler) hosts virtual machines directly and containers via Titus Job Control. Service applications on VMs, service applications in containers, and batch applications in containers all run the same cloud platform libraries (metrics, IPC, health) and integrate with Edda, Eureka, and Atlas.)

Titus Spinnaker Integration

• Deploy based on new Docker registry tags
• Deployment strategies same as ASGs
• IAM roles and security groups per container
• Basic resource requirements
• Easily see healthcheck & service discovery status

Fenzo – The heart of Titus scheduling

Extensible Library for Scheduling Frameworks

• Plugin-based scheduling objectives
  • Bin packing, etc.
• Heterogeneous resources & tasks
• Cluster autoscaling
  • Multiple instance types
• Plugin-based constraints evaluator
  • Resource affinity, task locality, etc.
• Single offer mode added in support of ECS

Fenzo scheduling strategy

For each task:
  On each host:
    Validate hard constraints
    Evaluate fitness and soft constraints
  Until fitness is "good enough" and a minimum number of hosts has been evaluated

(Hard constraints, soft constraints, and fitness are all plugins; see the sketch below.)
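Restated as code, the loop looks roughly like the following Python sketch. Fenzo itself is a Java library, so everything here (names, thresholds, the dict-based hosts) is illustrative rather than its real API:

```python
from typing import Callable, List, Optional

Host = dict   # e.g. {"cpus_free": 8, "cpus_total": 16, "zone": "us-east-1a"}
Task = dict   # e.g. {"cpus": 2}

GOOD_ENOUGH = 0.9   # stop once fitness exceeds this ...
MIN_HOSTS = 5       # ... and at least this many hosts have been evaluated

def place(task: Task, hosts: List[Host],
          hard_constraints: List[Callable[[Task, Host], bool]],
          fitness: Callable[[Task, Host], float]) -> Optional[Host]:
    """Pick a host for one task using plugin-supplied constraints and fitness."""
    best_host, best_score = None, -1.0
    for evaluated, host in enumerate(hosts, start=1):
        # Hard-constraint plugins: the host must satisfy every one of them.
        if not all(check(task, host) for check in hard_constraints):
            continue
        # The fitness plugin folds in soft constraints; higher is better (0..1).
        score = fitness(task, host)
        if score > best_score:
            best_host, best_score = host, score
        # Terminate early once "good enough" and enough hosts were examined.
        if best_score >= GOOD_ENOUGH and evaluated >= MIN_HOSTS:
            break
    return best_host

# Example plugins: a capacity hard constraint and a bin-packing fitness that
# prefers fuller hosts, so lightly loaded hosts drain and can be terminated.
fits = lambda task, host: host["cpus_free"] >= task["cpus"]
bin_pack = lambda task, host: 1.0 - host["cpus_free"] / host["cpus_total"]
```

A bin-packing fitness like the one above is what makes the host-level downscaling shown later work: new tasks concentrate on fuller hosts, leaving emptied hosts free to terminate.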

Scheduling – Capacity Guarantees

Titus maintains:

• Critical tier: guaranteed capacity & start latencies
• Flex tier: more dynamic capacity & variable start latency

(Diagram: Titus Master Scheduler (Fenzo) managing Desired and Max capacity per tier)

Scheduling – Bin Packing, Elastic Scaling

User adds work tasks

• Titus bin packs so that entire hosts can be downscaled efficiently

(Diagram: Titus Master Scheduler (Fenzo) packing tasks across Availability Zones A and B within Min/Desired/Max pool bounds; emptied hosts are marked "can terminate")

Scheduling – Constraints including AZ Balancing

User specifies constraints:

• AZ balancing
• Resource and task affinity
• Hard and soft

(Diagram: Titus Master Scheduler (Fenzo) placing tasks within Min/Desired bounds)

Scheduling – Rolling new Titus code

Operator updates the Titus agent codebase

• New scheduling on new cluster
• Batch jobs drain
• Service tasks are migrated via Spinnaker pipelines
• Old cluster autoscales down

(Diagram: Titus Master Scheduler (Fenzo) shifting tasks from ASG version 001 to ASG version 002, each with Min/Desired bounds; drained hosts in the old ASG are terminated)

Current Service Usage

• Approach
  • Started with internal applications
  • Moved on to line-of-fire Node.js (shadow first, prod 1Q17)
  • Moved on to stream processing (prod 4Q)

• Current: ~2,000 long-running containers

(Timeline: 1Q Batch, 2Q Service pre-prod, 3Q Service shadow, 4Q Service prod)

Collaboration with ECS

Why ECS?

• Decrease operational overhead of underlying cluster state management

• Allow open source collaboration on the ECS agent
• Work with Amazon and others on EC2 enablement
  • GPUs, VPC, security groups, IAM roles, etc.
  • Over time this enablement should result in less maintenance

Titus Today

(Diagram: the Titus Scheduler drives a Mesos master; each container host runs a mesos-agent, the Titus executor, and containers. EC2 integration sits alongside.)

Outbound:
- Launch/terminate container
- Reconciliation

Inbound:
- Container host events (and offers)
- Container events

First Titus ECS Implementation

(Diagram: the Titus Scheduler drives ECS; each container host runs the ECS agent, the Titus executor, and containers. EC2 integration sits alongside.)

Outbound:
- Launch/terminate container

Polling for:
- Container host events
- Container events

Collaboration with ECS team starts

• Collaboration on an ECS "event stream" that could provide:
  • "Real-time" task & container instance state changes
  • An event-based architecture, more scalable than polling

• Great engineering collaboration
  • Face-to-face focus
  • Monthly interlocks
  • Engineer-to-engineer focused

Current Titus ECS Implementation

(Diagram: the Titus Scheduler drives ECS; each container host runs the ECS agent, the Titus executor, and containers. EC2 integration sits alongside, and events flow back through CloudWatch Events to SQS.)

Outbound:
- Launch/terminate container
- Reconciliation

Inbound (via CloudWatch Events → SQS):
- Container host events
- Container events
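A sketch of consuming that inbound path with boto3, assuming a CloudWatch Events rule for ECS task state changes already targets a hypothetical queue named titus-ecs-events:

```python
import json

import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="titus-ecs-events")["QueueUrl"]

while True:
    # Long-poll for batches of events instead of polling the ECS API.
    resp = sqs.receive_message(QueueUrl=queue_url,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        event = json.loads(msg["Body"])
        if event.get("detail-type") == "ECS Task State Change":
            detail = event["detail"]
            # React to the transition in the scheduler's state machine.
            print(detail["taskArn"], detail["lastStatus"])
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```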

Analysis - Periodic Reconciliation

For tasks in listTasks:
  describeTasks (in batches of 100)

Number of API calls per reconcile: 1 + (number of tasks / 100)

Example: 1,280 containers across 40 nodes
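Against the real ECS API (via boto3), the pass above maps onto listTasks pagination plus describeTasks batches; the cluster name and reconcile hook below are placeholders:

```python
import boto3

ecs = boto3.client("ecs")
cluster = "titus-cluster"  # placeholder

def reconcile(task_arn: str, status: str) -> None:
    # Hypothetical hook: compare ECS's view against the scheduler's own state.
    print(task_arn, status)

# One listTasks call per page, then describeTasks in batches of 100 (the API
# maximum): roughly 1 + num_tasks/100 calls per reconcile pass.
task_arns = []
for page in ecs.get_paginator("list_tasks").paginate(cluster=cluster):
    task_arns.extend(page["taskArns"])

for i in range(0, len(task_arns), 100):
    described = ecs.describe_tasks(cluster=cluster, tasks=task_arns[i:i + 100])
    for task in described["tasks"]:
        reconcile(task["taskArn"], task["lastStatus"])
```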

Analysis - Scheduling

• Number of API calls: 2x the number of tasks
  • registerTaskDefinition and startTask

• Largest Titus historical job
  • 1,000 tasks per minute
  • Possible with increased rate limits
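The two calls per task look like this with boto3; the family, image, and container instance ARN are illustrative placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Call 1: register a task definition for the container to run.
task_def = ecs.register_task_definition(
    family="titus-task",
    containerDefinitions=[{
        "name": "app",
        "image": "training-image:latest",  # hypothetical image
        "cpu": 256,
        "memory": 512,
    }],
)

# Call 2: startTask (unlike runTask) places the task onto the specific
# instance the Titus/Fenzo scheduler already chose.
ecs.start_task(
    cluster="titus-cluster",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    containerInstances=[
        "arn:aws:ecs:us-east-1:123456789012:container-instance/abc123",
    ],
)
```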

Continued areas of scheduling collaboration

• Combining/batching registerTaskDefinition and startTask

• More resource types in the control plane
  • Disk, network bandwidth, ENIs

• To fit with the existing scheduler approach:
  • Extensible message fields in task state transitions
  • Named tasks (beyond ARNs) for terminate
  • A Starting vs. Started state

Possible phases of ECS support in Titus

• Work in progress
  • ECS completing the scheduling collaboration items
  • Complete the transition to ECS as the overall cluster manager
  • Allows us to contribute the Netflix cloud platform and EC2 integration points to the open source ECS agent

• Future
  • Provide Fenzo as the ECS task placement service
  • Extend Titus job management features to ECS

Titus Future Focus

Future Strategy of Titus

• Service autoscaling and global traffic integration
• Service/batch SLA management
  • Capacity guarantees, fair shares, and pre-emption
  • Trough / internal spot market management
• Exposing pods to users
• More use cases and scale

Questions?

Andrew Spyker (@aspyker)

Thank you!

Remember to complete your evaluations!
