90
ECS & Docker: Secure Async Execution @ Brennan Saeta

ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

ECS & Docker:Secure Async Execution @

Brennan Saeta

Page 2: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 3: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

The Beginnings — 2012

10courses

1 million learners

worldwide

4partners

Page 4: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Education at Scale

1,800courses

18 million learners

worldwide

140partners

Page 5: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Outline

• Evolution of Coursera’s nearline execution systems

• Next-generation execution framework: Iguazú

• Iguazú application deep dive: GrID — evaluating programming assignments

Page 6: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Key Takeaways

• What is nearline execution, and why it is useful

• Best practices for running containers in production in the cloud

• Hardening techniques for securely operating container infrastructure at scale

Page 7: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

A history of nearline execution

Page 8: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 9: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Coursera Architecture (2012)

PHP Monolith

Page 10: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Early days - Requirements

• Video re-encoding for distribution

• Grade computation for 100,000+ learners

• Pedagogical data exports for courses

Page 11: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Coursera Architecture (2012)

PHP Monolith

Page 12: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Cascade Architecture

PHP Monolith

PHP Monolith

Cascade

Page 13: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Cascade Architecture

PHP Monolith

PHP Monolith

Cascade

Queue

Page 14: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Upgrading to ScalaRe-architecting delayed execution for our 2nd generation learning platform.

Page 15: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Upgrading to the JVM

• Leverage mature Scala & JVM ecosystems for code sharing

• JVM much more reliable (no memory leaks)

• New job model: scheduled recurring jobs.• Named: Saturn

Page 16: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Saturn Architecture

Service A

Service B

Service C

C*

Online ServingScala/micro-service architecture

C*

Page 17: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Saturn Architecture

Service A

Service B

Service C

C*

Online ServingScala/micro-service architecture

Saturn

C*

Page 18: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Saturn Architecture

Service A

Service B

Service C

C*

Saturn

C*

ZK Ensemble

Page 19: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Saturn Architecture

SaturnLeader ZK

Ensemble

Service A

Service B

Service C

C*C*

Page 20: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Problems with Saturn

• Single master meant naïve implementation ran all jobs in same JVM• Huge CPU contention @ top of the hour

• OOM Exceptions & GC issues

Page 21: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Enter: Docker

Containers allow for resource isolation!

CC-by-2.0 https://www.flickr.com/photos/photohome_uk/1494590209

Page 22: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Supported Features

Platform

Saturn DockerAmazon

ECSIguazú

Run code ✅ ✅ ✅ ✅

Resource Isolation ❌ ✅ ✅ ✅

Clusters /HA ☑️ ❌ ✅ ✅

Greatdeveloper workflow

✅ ❌ ❌ ✅

ScheduledJobs ✅ ❌ ❌ ✅

Page 23: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Supported Features

Platform

Saturn DockerAmazon

ECSIguazú

Run code ✅ ✅ ✅ ✅

Resource Isolation ❌ ✅ ✅ ✅

Clusters /HA ✅ ❌ ✅ ✅

Greatdeveloper workflow

✅ ❌ ❌ ✅

ScheduledJobs ✅ ❌ ❌ ✅

Page 24: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Supported Features

Platform

Saturn DockerAmazon

ECSIguazú

Run code ✅ ✅ ✅ ✅

Resource Isolation ❌ ✅ ✅ ✅

Clusters /HA ✅ ❌ ✅ ✅

Greatdeveloper workflow

✅ ❌ ❌ ✅

ScheduledJobs ✅ ❌ ❌ ✅

Page 25: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Supported Features

Platform

Saturn DockerAmazon

ECSIguazú

Run code ✅ ✅ ✅ ✅

Resource Isolation ❌ ✅ ✅ ✅

Clusters /HA ✅ ❌ ✅ ✅

Greatdeveloper workflow

✅ ❌ ❌ ✅

ScheduledJobs ✅ ❌ ❌ ✅

Page 26: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Supported Features

Platform

Saturn DockerAmazon

ECS???

Run code ✅ ✅ ✅ ✅

Resource Isolation ❌ ✅ ✅ ✅

Clusters /HA ✅ ❌ ✅ ✅

Greatdeveloper workflow

✅ ❌ ❌ ✅

ScheduledJobs ✅ ❌ ❌ ✅

Page 27: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Solution: Iguazú

Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

Page 28: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Solution: Iguazú

• Framework & service for asynchronous execution• Optimized Scala developer

experience for Coursera

• Unified scheduler supports:• Immediate execution (nearline)

• Scheduled recurring execution (cron-like)

• Deferred execution (run once @ time X)

Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

Page 29: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Architecture

Iguazú Frontend

Iguazú Scheduler

Iguazú Backend

CassandraServices Services

Iguazú Admin

IguazúWorkers

SQS

ECS API

Devs

Users

Page 30: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Architecture

Iguazú Frontend

Iguazú Scheduler

Iguazú Backend

CassandraServices Services

Iguazú Admin

IguazúWorkers

SQSQueue

ECS API

Devs

Users

Page 31: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Architecture

Iguazú Frontend

Iguazú Scheduler

Iguazú Backend

CassandraServices Services

Iguazú Admin

IguazúWorkers

ECS API

Devs

Users

SQSQueue

Page 32: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Architecture

Iguazú Frontend

Iguazú Scheduler

Iguazú Backend

CassandraServices Services

Iguazú Admin

IguazúWorkers

ECS API

Devs

Users

ZK Ensemble

SQSQueue

Page 33: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Architecture

Iguazú Frontend

Iguazú Scheduler

Iguazú Backend

CassandraServices Services

Iguazú Admin

IguazúWorkers

ECS API

Devs

Users

ZK Ensemble

SQSQueue

Page 34: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Autoscale, autoscale, autoscale!

Page 35: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Autoscaling⇄ Iguazú⇆ ECS

IguazuECS APIAutoscaling

EC2Worker

EC2Worker

ShutdownLifecycle

Notification Poll WorkerJob Status

All finishedProceed

Term-inate EC2

Worker

Page 36: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Failure in Nearline Systems

• Most jobs are non-idempotent

• Iguazú: At most once execution• Time-bounded delay

• Future: At least once execution• With caveats

Page 37: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú adoption by the numbers

~100 jobs in production

>1000 runs per day

>100 different job schedules

Page 38: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Iguazú Applications

Nearline Jobs

• Pedagogical Instructor Data Exports

• System Integrations• Course Migrations

Scheduled Recurring Jobs

• Course Reminders• System Integrations

• Payment reconciliation• Course translations

• Housekeeping• Build artifact archival• A/B Experiments

Page 39: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

While containers may help you on your journey, they are not themselves a destination.

CC-by-2.0 https://www.flickr.com/photos/usoceangov/5369581593

Page 40: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)

extends AbstractJob {

override val reservedCpu = 1024 // 1 CPU core

override val reservedMemory = 1024 // 1 GB RAM

def run(parameters: JsValue) = {

val experiments = abClient.findForgotten()

logger.info(s"Found ${experiments.size} forgotten experiments.")

experiments.foreach { experiment =>

sendReminder(experiment.owners, experiment.description)

}

}

}

Page 41: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)

extends AbstractJob {

override val reservedCpu = 1024 // 1 CPU core

override val reservedMemory = 1024 // 1 GB RAM

def run(parameters: JsValue) = {

val experiments = abClient.findForgotten()

logger.info(s"Found ${experiments.size} forgotten experiments.")

experiments.foreach { experiment =>

sendReminder(experiment.owners, experiment.description)

}

}

}

Page 42: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)

extends AbstractJob {

override val reservedCpu = 1024 // 1 CPU core

override val reservedMemory = 1024 // 1 GB RAM

def run(parameters: JsValue) = {

val experiments = abClient.findForgotten()

logger.info(s"Found ${experiments.size} forgotten experiments.")

experiments.foreach { experiment =>

sendReminder(experiment.owners, experiment.description)

}

}

}

Page 43: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)

extends AbstractJob {

override val reservedCpu = 1024 // 1 CPU core

override val reservedMemory = 1024 // 1 GB RAM

def run(parameters: JsValue) = {

val experiments = abClient.findForgotten()

logger.info(s"Found ${experiments.size} forgotten experiments.")

experiments.foreach { experiment =>

sendReminder(experiment.owners, experiment.description)

}

}

}

Page 44: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Writing an Iguazu Job

class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)

extends AbstractJob {

override val reservedCpu = 1024 // 1 CPU core

override val reservedMemory = 1024 // 1 GB RAM

def run(parameters: JsValue) = {

val experiments = abClient.findForgotten()

logger.info(s"Found ${experiments.size} forgotten experiments.")

experiments.foreach { experiment =>

sendReminder(experiment.owners, experiment.description)

}

}

}

Page 45: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Testing an Iguazu job

Page 46: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

The Hollywood Principle applies to distributed systems.

CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

Page 47: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Deploying a new Iguazu Job

• Developer• merge into master… done

• Jenkins Build Steps• Compile & package job JAR

• Prepare Docker image

• Pushes image into registry

• Register updated job with Amazon ECS API

Page 48: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Invoking an Iguazú Job

// invoking a job with one function call

// from another service via REST framework RPC

val invocationId = iguazuJobInvocationClient

.create(IguazuJobInvocationRequest(

jobName = "exportQuizGrades",

parameters = quizParams))

Page 49: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

A clean environment

increases reliability.CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327

Page 50: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Evaluating Programming AssignmentsAn application of Iguazú

Page 51: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 52: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 53: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Design Goals

Elastic Infrastructure

No Maintenance

Near Real-time Secure Infrastructure

Page 54: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Design Goals

Elastic Infrastructure

No Maintenance

Near Real-time Secure Infrastructure

Page 55: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Design Goals

Elastic Infrastructure

No Maintenance

Near Real-time Secure Infrastructure

Page 56: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Solution: GrID

Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0

• Service + framework for gradingprogramming assignments

• Builds on Iguazú

• Named for Tron’s “digital frontier”• Backronym: Grading Inside Docker

Page 57: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

High-level GrID Architecture

Learners

GrID

Iguazú

S3 Bucket

ECS APIs

Grading MachinesVPC Firewalls

Coursera Production Account Coursera GrID Grading Account

Page 58: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

High-level GrID Architecture

Learners

GrID

Iguazú

S3 Bucket

ECS APIs

Grading MachinesVPC Firewalls

Coursera Production Account Coursera GrID Grading Account

Page 59: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

High-level GrID Architecture

Learners

GrID

Iguazú

S3 Bucket

ECS API

Grading MachinesVPC Firewalls

Production Acct GrID Grading Account

Page 60: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

High-level GrID Architecture

Learners

GrID

Iguazú

S3 Bucket

ECS API

Grading Machines

VPC Firewalls

Production Acct GrID Grading Account

Page 61: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Design Goals

Elastic Infrastructure

No Maintenance

Near Real-time Secure Infrastructure

Page 62: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Programming Assignments

Page 63: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

The Security Challenge

Compiling and running untrusted, arbitrary code on our cluster in near real time.

Would you like to compile and run C code from random

people on the Internet on your servers?

Page 64: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

FROM redis

FROM ubuntu:latest

FROM jane’s-image

Page 65: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Security Assumptions

• Run arbitrary binaries

• Instructor grading scripts may have vulnerabilities• ∴ Grading code is untrusted

• Unknown vulnerabilities in Docker and Linux name-spacing and/or container implementation

Page 66: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Security Goals

Prevent submitted code from:• impacting the evaluation of other submissions.

• disrupting the grading environment (e.g., DoS)

• affecting the rest of the Coursera learning platform

Page 67: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Grading assignment submissions

CC-by-2.0 https://www.flickr.com/photos/dherholz/4367511580/

Page 68: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 69: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU CPU CPU CPU

RAM

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk

Page 70: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU CPU CPU CPU

RAM

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk

Page 71: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU cgroups CPU cgroups

RAM — cgroups

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk

Page 72: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU cgroups CPU cgroups

RAM — cgroups

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk

Page 73: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU cgroups CPU cgroups

RAM — cgroups

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk — blkio limits & btrfs quotas

Page 74: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU cgroups CPU cgroups

RAM — cgroups

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel

Disk — blkio limits & btrfs quotas

Page 75: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Attacks: Kernel Resource Exhaustion

• Open file limits per container (nofile)

• nproc Process limits

• Limit kernel memory per cgroup

• Limit execution time

Page 76: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful
Page 77: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CPU cgroups CPU cgroups

RAM — cgroups

Alice’s Container

Alice’s Submission

Grader

Bob’s Container

Bob’s Submission

Grader

Mallory’s Container

Mallory’s Submission

Grader

Kernel — cgroups, ulimits

Disk — blkio limits & btrfs quotas Network

Page 78: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Attacks: Network attacks

Attacks:

• Bitcoin mining

• DoS attacks on other systems

• Access Amazon S3 and other AWS APIs

Defense:

• Deny network access

Page 79: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Docker Network Modes

NetworkDisabled too restrictive• Some graders require local loopback• Feature also deprecated

--net=none + deny net_admin + audit network

• Isolation via Docker creating an independent network stack for each container

github.com/coursera/amazon-ecs-agent

Page 80: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CC-by-2.0 https://www.flickr.com/photos/valentinap/253659858

Page 81: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CC-by-2.0 https://www.flickr.com/photos/jessicafm/2834658255/

Page 82: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

CC-by-2.0 https://www.flickr.com/photos/donnieray/11501178306/in/photostream/

Page 83: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Defense in Depth

• Mandatory Access Control (App Armor)• Allows auditing or denying access to a

variety of subsystems

• Drop capabilities from bounding set• No need for NET_BIND_SERVICE,

CAP_FOWNER, MKNOD

• Deny root within container

Page 84: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Deny Root Escalations

• We modify instructor grader images

before allowing them to be run

• Clears setuid

• Inserts C wrapper to drop privileges from

root and redirect stdin/stdout/stderr

• Run cleaning job on another Iguazú

cluster

• Run Docker in Docker!

• Docker 1.10 adds User Namespaces

Page 85: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

If all else fails…

• Utilizes VPC security measures to

further restrict network access

• No public internet access

• Security group to restrict

inbound/outbound access

• Network flow logs for auditing

• Separate AWS account

• Run in an Auto Scaling group

• Regularly terminate all grading EC2

instances

Page 86: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Other Security Measures

• Utilize AWS CloudTrail for audit logs

• Third-party security monitoring

(Threat Stack)• No one should log in, so any TTY is an alert

• Penetration testing by third-party red

team (Synack)

Page 87: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Lessons Learned - GrID

• Building a platform for code execution is hard!

• Carefully monitor disk usage

• Run the latest kernels• Latest security patches

• btrfs wedging on older kernels• Default Ubuntu 14.04 kernel not new

enough!

Page 88: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Reliable deploytooling pays for itself.

Page 89: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Thank you!Brennan Saeta

github/saeta@bsaeta

[email protected]

Frank Chengithub/frankchn

@[email protected]

GrID lead Iguazú Lead

Page 90: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful

Questions?Brennan Saeta

github/saeta@bsaeta

[email protected]

Frank Chengithub/frankchn

@[email protected]

GrID lead Iguazú Lead