Slaying Monoliths with Node and Docker

November 2016

Slaying Monoliths with Node & Docker

Yunong Xiao, Principal Engineer

@yunongx

http://yunong.io

#netflixeverywhere

Subscriber Growth

[Chart: subscriber growth by year, 2011 to 2016; y-axis ticks from 20M to 85M]

API Evolution

So You Want to Watch Netflix

Watch Anywhere

In The Beginning…

Java Web Server

❖ Java based web server

❖ Renders UI

❖ Accesses data

❖ Individual clients for each service

❖ Different behavior for each client

[Diagram: Java web server; Routes A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N]

Spot the Monolith

[Diagram: Java web server; Routes A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N; overlaid label: MONOLITH]

New Devices

API Evolution

Java Web Server

[Diagram: Java web server; Routes A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N; overlaid label: MONOLITH]

REST API

[Diagram: REST API; Backend Services A, B, C … N]

REST API

❖ Inflexible: waiting for weeks between API changes.

❖ Inefficient: multiple round trips

❖ Complex API: hard to maintain

API Evolution

Design for Innovation

❖ Rapid innovation

❖ More AB tests and devices

❖ Customized API

❖ Performance matters

API.NEXT

[Diagram: API Server; Scripts A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N; overlaid label: MONOLITH]

Scale

❖ 42.5 billion hours watched in 2015

❖ “Massive” RPS: Billions/day

❖ 1000s of scripts active in prod, 10000s in test

❖ 100s of changes/day

❖ 100s of AB tests with many variants/test

All Scripts Live in One Process

❖ Vertical Scale: Running out of headroom

❖ Memory

❖ I/O

❖ Instance cost: Largest instances $ can buy

Happy/Sad Together?

❖ Resource contention

❖ 1 bad script takes out everyone

❖ Conflicting dependencies

Developer Ergonomics

[Diagram: API Server; Scripts A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N; labels: UI Engineering, Systems Engineering]

API Evolution

Requirements

❖ Scalability

❖ Availability

❖ Developer productivity

Runtime Scalability & Availability

❖ Process isolation

❖ Separation of data access scripts and API servers

❖ Reduce infrastructure costs

❖ Horizontally scalable architecture

❖ Faster startup times

❖ Immutable deployment artifacts

Developer Productivity

❖ JS to rule them all

❖ Run and debug scripts locally, set breakpoints, step through code

❖ Fast, incremental builds

❖ Mirror production as closely as possible

API Evolution

Natural Separation

[Diagram: API Server; Scripts A, B, C, D … N; Client Libraries A, B, C … N; Backend Services A, B, C … N; labels: UI Engineering, Systems Engineering]

Next Generation Data Access API

[Diagram columns: Clients, Node API, Edge API, Backend Services. Clients: TV, iOS, Android, Windows, Browsers. Also shown: Remote Service Layer. Backend services: Search, MAP, GPS, Playback]

Node API Platform

❖ Set of JS data access scripts

❖ Running Node.js + restify

❖ Inside a Docker container
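As a rough illustration of what a "data access script" might do, the sketch below combines two backend calls into a single device-specific payload, so the client avoids the multiple round trips listed earlier as a REST API problem. Everything here is hypothetical: the names `fetchProfile`, `fetchRecommendations`, and `buildHomeScreen`, the data shapes, and the sample values are invented for illustration. The backends are stubbed as synchronous functions to keep the example self-contained; a real script would make asynchronous calls and be served through Node.js and restify.

```typescript
// Hypothetical backend response shapes; the talk does not describe the real ones.
interface Profile { userId: string; name: string; }
interface Title { id: number; name: string; }

// Stand-in for a client library call to a profile backend (assumed).
function fetchProfile(userId: string): Profile {
  return { userId: userId, name: "Sample User" };
}

// Stand-in for a client library call to a recommendations backend (assumed).
function fetchRecommendations(userId: string): Title[] {
  if (userId.length === 0) {
    return [];
  }
  return [
    { id: 1, name: "Title One" },
    { id: 2, name: "Title Two" },
    { id: 3, name: "Title Three" }
  ];
}

// The payload one particular device UI wants, in one response.
interface HomeScreen { greeting: string; rows: string[]; }

// A data access script: gathers data from several backends and shapes it
// for one client, instead of the client calling each backend itself.
function buildHomeScreen(userId: string, maxTitles: number): HomeScreen {
  const profile = fetchProfile(userId);
  const titles = fetchRecommendations(userId).slice(0, maxTitles);
  return {
    greeting: "Welcome back, " + profile.name,
    rows: titles.map(function (title) { return title.name; })
  };
}
```

A TV client and a phone client could each have their own script with a different `maxTitles` or payload shape, which is one way to read the "Customized API" goal.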

[Diagram: routes /browse, /search, /account, /signup; Unified Remote Service Layer]

[Diagram: routes /bootstrap, /search, /account, /login; Unified Remote Service Layer]

Evolutionary Traits

❖ Runtime platform

❖ Application management

❖ Container infrastructure

❖ Developer tools

“Production”

“A full-stack developer is one who can add technical debt to any layer of the application”

- Twitter

Aim: Paved Path for Data Access Apps

❖ Metrics

❖ Alerts

❖ Autoscaling

❖ Load balancing

❖ Discovery

❖ Analytics

Node Runtime: Platform as a Service

❖ Production ready Node platform

❖ Just bring JS business logic

❖ Everything else is “free”

❖ No servers/infrastructure to manage

[Diagram: Node runtime platform modules. Category labels: Properties, Discovery, RPC, Insight, Web server, Runtime. Modules: nf-iso-properties, nf-eureka-client, reactive-datasource, nf-atlas-client, bunyan-suro (data-pipeline), bunyan (logging), nf-salp, reactive-socket-lb, HTTP Client]

Aim: Simple App Management

❖ Versioning

❖ Deployment

❖ Operational Insights

Versioning: Current Problems

❖ APIs change all the time

❖ 100,000s of different versions

❖ 1000s live in prod

Versioning: Inconsistency

api.netflix.com/tvui/1469577600021 (build timestamp)

api.netflix.com/web/6dbd361 (Git SHA)

api.netflix.com/ios/1.3.2 (app version)

api.netflix.com/android/1234 (integer)
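To make the inconsistency concrete, here is a small sketch that tries to guess which of the four schemes a version path segment uses. The function name `classifyVersion` and the matching rules (13 digits means a millisecond timestamp, 7 to 40 hex characters means a Git SHA) are assumptions made for this example, not anything described in the talk.

```typescript
type VersionScheme = "app-version" | "build-timestamp" | "integer" | "git-sha" | "unknown";

// Guess the versioning scheme of a path segment such as "1.3.2" or "6dbd361".
function classifyVersion(segment: string): VersionScheme {
  // Three dot-separated numbers, e.g. 1.3.2.
  if (/^\d+\.\d+\.\d+$/.test(segment)) {
    return "app-version";
  }
  // Exactly 13 digits: assumed to be a Unix timestamp in milliseconds.
  if (/^\d{13}$/.test(segment)) {
    return "build-timestamp";
  }
  // Any other run of digits is treated as a plain integer.
  if (/^\d+$/.test(segment)) {
    return "integer";
  }
  // 7 to 40 lowercase hex characters: assumed to be a full or abbreviated Git SHA.
  if (/^[0-9a-f]{7,40}$/.test(segment)) {
    return "git-sha";
  }
  return "unknown";
}
```

The guesses are inherently fragile. An abbreviated SHA that happens to contain only digits, such as `1234567`, is indistinguishable from an integer, and none of the non-SemVer schemes says anything about compatibility between two versions.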

Aim: Consistent Versions & Reproducible Builds

Solution: Use SemVer

Versioning: Node API Index

Routing

[Diagram: the four endpoints (tvui, web, ios, android) with their build timestamp, Git SHA, app version, and integer versions]

Problem: API Upgrades

[Diagram: request path api.netflix.com/ios/1.3.2; deployed versions 1.3.2 and 1.3.3]

Path immutably baked into client

Solution: SemVer Routing

[Diagram: request path api.netflix.com/ios/^1.0.0; deployed versions 1.3.2, 1.3.3, 1.4.0, 1.6.5]
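One way to picture SemVer routing is as a lookup from a caret range to a deployed version. The sketch below is a simplified stand-in, not the Node API platform's actual router: the names `parseVersion`, `compareVersions`, and `resolveCaret` are hypothetical, and picking the highest matching version is an assumption borrowed from how SemVer ranges are conventionally resolved. It also skips pre-release tags and ranges with major version 0, where caret ranges are narrower.

```typescript
type Version = [number, number, number];

// Parse "1.3.2" into [1, 3, 2]; returns null for anything else.
function parseVersion(text: string): Version | null {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    return null;
  }
  return [parseInt(match[1], 10), parseInt(match[2], 10), parseInt(match[3], 10)];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return 0;
}

// Resolve a caret range such as "^1.0.0" to the highest deployed version
// that shares its major version and is not lower than the base version.
function resolveCaret(range: string, deployed: string[]): string | null {
  if (range.charAt(0) !== "^") {
    return null;
  }
  const base = parseVersion(range.slice(1));
  if (base === null || base[0] === 0) {
    return null; // major version 0 is not handled in this sketch
  }
  let best: Version | null = null;
  for (let i = 0; i < deployed.length; i++) {
    const candidate = parseVersion(deployed[i]);
    if (candidate === null || candidate[0] !== base[0]) {
      continue;
    }
    if (compareVersions(candidate, base) < 0) {
      continue;
    }
    if (best === null || compareVersions(candidate, best) > 0) {
      best = candidate;
    }
  }
  return best === null ? null : best.join(".");
}
```

With the versions from the slide, `resolveCaret("^1.0.0", ["1.3.2", "1.3.3", "1.4.0", "1.6.5"])` returns `"1.6.5"`. A client that bakes `^1.0.0` into its path would pick up compatible upgrades without a new client release, while a later `2.0.0` deployment would not match.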

[Diagram: nq.netflix.com and api.netflix.com/ios/1.3.2, with version labels ^1.0.0 and 1.3.2]

Operational Insights

❖ List and view deployed apps and routes

❖ Deployment history

❖ Metrics: RPS, latency, errors, …

❖ Analytics

Generated Dashboards

Titus: Container Management & Scheduling

Fenzo

Aim: Developer Productivity

❖ Run and debug scripts locally

❖ Fast, incremental builds

❖ Local “prod” environment

Local Development: Builds are Slow

[Diagram: build pipeline with labels: Build deps, Commit to SCM, DocumentJS NQ Scripts, Build Docker Image. Total time: tens of minutes]

Rapid Local Development: Debug in Seconds

[Diagram: Developer Laptop (Mac OSX); VirtualBox (Linux) running Docker host; Docker server; container running MyApp image. MyApp image layers: MyApp scripts & config, NodeQuark image, Prana image, NodeJS image, Ubuntu image]

Recap: Containers

❖ Process isolation

❖ Layered dependency management

❖ Portability across environments: prod->test

❖ Fast deployment

❖ Single deployment artifact: Docker image

Recap: Node.js

❖ JS everywhere: client & server

❖ Performant

❖ Lightweight & efficient: run locally

❖ Non blocking

❖ Superb ecosystem (npm)

❖ Built for the web

Recap: Node Platform

❖ Developer productivity

  ❖ Fast incremental builds

  ❖ Run, debug, and test locally

  ❖ Local prod like environment

❖ Scalability & availability

  ❖ Monolith -> micro-services

  ❖ Process isolation: better availability

  ❖ Horizontally scalable architecture

  ❖ Immutable deployment artifacts

[Diagram: Unified Remote Service Layer; Backend Services A, B, C … N]

Thanks!

❖ Interested? Netflix is hiring!

❖ @yunongx

❖ yunong@netflix.com

❖ yunong.io