
Page 1: Cloud Native NetflixOSS Services on Docker

Cloud Native NetflixOSS Services on Docker

Andrew Spyker (@aspyker)

Sudhir Tonse (@stonse)

Page 2: Cloud Native NetflixOSS Services on Docker

Agenda

• Introduction – NetflixOSS, Cloud Native with Operational Excellence, and IBM Cloud Services Fabric
• Docker Local Port
• Docker Cloud Port

Page 3: Cloud Native NetflixOSS Services on Docker

About Andrew

• IBM - Cloud Performance Architecture and Strategy

• How did I get into cloud?
  – Performance led to cloud scale, which led to cloud platforms
  – Created the Mobile/Cloud Acme Air sample application
  – Cloud platforms led to NetflixOSS, and to winning the Netflix Cloud Prize for best sample application
  – Also ported to IBM Cloud – SoftLayer
  – Two years focused on IBM Cloud Services Fabric and Operations

• RTP dad who enjoys technology as well as running, wine, and poker

@aspyker

ispyker.blogspot.com

Page 4: Cloud Native NetflixOSS Services on Docker

About Sudhir

• Manages the Cloud Platform Infrastructure team at Netflix

• His team builds the cloud platform components, many of which have been open sourced under the NetflixOSS umbrella.

• Sudhir is a weekend golfer and tries to make the most of the wonderful California weather and public courses.

Page 5: Cloud Native NetflixOSS Services on Docker

NetflixOSS on Github

• NetflixOSS is what it takes to run a cloud service and business with operational excellence

• netflix.github.io
  – 40+ OSS projects
  – Expanding every day

• Focusing more on interactive mid-tier server technology today

Page 6: Cloud Native NetflixOSS Services on Docker

NetflixOSS Categorized

[Diagram: NetflixOSS components grouped by category, running on AWS or other IaaS]

Page 7: Cloud Native NetflixOSS Services on Docker

Function | NetflixOSS Library
REST Framework/Bootstrapping/DI | Karyon/Governator
Resiliency/Fallback | Hystrix
RPC (Routing/LB) | Ribbon/Eureka
Distributed Co-ordination (Zookeeper) | Curator
Distributed Caching | EVCache
NoSQL (Cassandra) Persistence | Astyanax
Monitoring | Turbine
Metrics | Servo
Logging | Blitz4J
Functional Reactive Programming | RxJava
Properties/Configuration | Archaius

Netflix OSS – Application Container/Services

[Diagram: an app instance built on Karyon handles service requests through Hystrix and IPC (smart LB via Ribbon), data access/caching against Cassandra, and config/insights; Eureka Server(s), the Hystrix Dashboard, and a Metrics Dashboard sit alongside]

Page 8: Cloud Native NetflixOSS Services on Docker

Elastic, Web and Hyper Scale

Doing This

Not Doing That

Source: Programmableweb.com 2012

Page 9: Cloud Native NetflixOSS Services on Docker

Elastic, Web and Hyper Scale

[Diagram: load balancers front an API tier (browser and mobile clients) that calls Authentication and Booking services, backed by temporal caching and durable storage]

Strategy | Benefit
Make deployments automated | Without automation, this scale is impossible
Expose a well designed API to users | Offloads presentation complexity to clients
Remove state from mid tier services | Allows easy elastic scale out
Push temporal state to client and caching tier | Leverages clients, avoids data tier overload
Use partitioned data storage | Data design and storage scale with HA

Page 10: Cloud Native NetflixOSS Services on Docker

HA and Automatic Recovery

Feeling This

Not Feeling That

Page 11: Cloud Native NetflixOSS Services on Docker

Highly Available Service Runtime Recipe

[Diagram: a Web App front end (REST services) calls the “Auth Service” micro service through a Ribbon REST client with Eureka awareness; the auth-service call executes inside Hystrix with a fallback implementation; the app service (auth-service) is built on Karyon and registers with clustered Eureka Server(s)]

Implementation Detail | Benefits
Decompose into micro services | Key user path always available; failure does not propagate across service boundaries
Karyon with automatic Eureka registration | New instances are quickly found; failing individual instances disappear
Ribbon client with Eureka awareness | Load balances and retries across instances with “smarts”; handles temporal instance failure
Hystrix as dependency circuit breaker | Allows for fast failure; provides graceful cross-service degradation/recovery

Page 12: Cloud Native NetflixOSS Services on Docker

IaaS High Availability

[Diagram: global load balancers route to local load balancers in the Dallas region; Web App, Auth Service, and Booking Service clusters span the DAL01, DAL05, and DAL06 datacenters, with Eureka and cluster auto recovery and scaling services watching every cluster]

Rule | Why?
Always > 2 of everything | 1 is a SPOF; 2 doesn’t web scale and makes DR recovery slow
Including IaaS and cloud services | You’re only as strong as your weakest dependency
Use auto scaler/recovery monitoring | Clusters guarantee availability and service latency
Use application level health checks | Instance on the network != healthy

Page 13: Cloud Native NetflixOSS Services on Docker

Chaos Testing – the only proof is testing!

[Diagram: the same Dallas region topology (DAL01, DAL05, DAL06) with Chaos Gorilla knocking out an entire datacenter while the global load balancers route around the failure]

Videos: bit.ly/noss-sl-blog, http://bit.ly/sl-gorilla

Page 14: Cloud Native NetflixOSS Services on Docker

Continuous Delivery

Reading This

Not This

Page 15: Cloud Native NetflixOSS Services on Docker

Continuous Delivery

[Diagram: a continuous build server bakes SoftLayer image templates (or AMIs); deployments roll from Cluster v1 through a v2 canary to Cluster v2]

Step | Technology
Developers test locally | Unit test frameworks
Continuous build | Continuous build server based on Gradle builds
Build “bakes” a full instance image | Imaginator (Aminator inspired) creates SoftLayer images
Developers work across dev and test | Archaius allows for environment based context
Developers do canary tests, red/black deployments in prod | Asgard console provides a common devops approach to app clusters, security patterns, and visibility

Page 16: Cloud Native NetflixOSS Services on Docker

Operational Visibility

If you can’t see it, you can’t improve it

Page 17: Cloud Native NetflixOSS Services on Docker

Operational Visibility

[Diagram: Web App and Auth Service clusters feed Servo metrics, Hystrix/Turbine streams, and Uptime checks into metric/event repositories and Logstash/ElasticSearch/Kibana, which in turn drive incidents]

Visibility Point | Technology
Basic IaaS instance monitoring | Not enough (not scalable, not app specific)
User-like external monitoring | SaaS offerings or OSS like Uptime
Service to service interconnects | Hystrix streams, Turbine aggregation, Hystrix dashboard
Application centric metrics | Servo gauges, counters, and timers sent to a metrics store
Remote logging | Logstash/Kibana
Threshold monitoring and alerts | Services like PagerDuty for incident management
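To make the service interconnect row concrete, a minimal sketch assuming the standard hystrix-metrics-event-stream servlet and a Turbine aggregator are wired in (host names, ports, and the cluster name are assumptions):

# tail one instance's Hystrix metrics stream (SSE) to watch circuit state
curl -s http://auth-service-host:8080/hystrix.stream | head -n 20

# or read Turbine's aggregated stream for the whole cluster; the Hystrix
# dashboard consumes exactly this kind of URL
curl -s "http://turbine-host:8080/turbine.stream?cluster=auth-service" | head -n 20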

Page 18: Cloud Native NetflixOSS Services on Docker

Current IBM Cloud Services Fabric

Currently VM based

[Diagram of the fabric: (1) Eureka; (2) global and local load balancers; (3) a region (us-south-1) whose fabric services and apps are clustered across three datacenters (DAL01, DAL05, DAL06); (4) cluster auto recovery and scaling services; (5) Asgard service; (6) Imaginator service turning your built code plus tested base images with agents into deployable images; (7) Uptime service; (8) Logstash/Kibana. Your front end and mid tier services run on the fabric alongside any service you depend on; devops drives code and image build.]

Page 19: Cloud Native NetflixOSS Services on Docker

Agenda

• Introduction
• Docker Local Port
  – Lessons Learned
  – Open Source
• Docker Cloud Port

Page 20: Cloud Native NetflixOSS Services on Docker

Demo Start

Start demo loading here

Page 21: Cloud Native NetflixOSS Services on Docker

Docker “Local” Setup

[Diagram: a docker-local region with availability zones docker-local-1a/1b/1c; users reach a Zuul load balancer in front of Acme Air Web App and Auth Service containers backed by a Cassandra node; Eureka provides service discovery, Skydock/SkyDNS provide DNS, Microscaler provides cluster auto recovery and scaling, and devops admins drive it all from the Asgard console. Blue and green boxes are container instances.]

Page 22: Cloud Native NetflixOSS Services on Docker

Why Docker for our work?

• Because we could, actually …
  – To show the Netflix cloud platform is portable to non-VM clouds
  – To help with NetflixOSS understanding inside of IBM
• Local testing – a “cloud in a box” that is more production like
  – Developers able to do larger scale testing
  – Continuous build/test tool systems able to run at “scale”
• Public cloud support
  – Understand how a container IaaS layer could be implemented
• So far a proof of concept; you can help continue it
  – More on that later (hint: open source!)

Page 23: Cloud Native NetflixOSS Services on Docker

Two Service Location Technologies?

[Diagram: the same Web App → Auth Service call path (Ribbon REST client with Eureka, Karyon, clustered Eureka Server(s)), now running on a Docker host where Skydock watches the Docker daemon event API and registers containers in SkyDNS alongside the Eureka registration]

Page 24: Cloud Native NetflixOSS Services on Docker

Service Location Lessons Learned

• Both did their job well
  – SkyDNS/Skydock for basic container DNS
    • Must be careful of DNS caching clients
  – Eureka for application level routing
• Interesting to see the contrasts
  – Intrusiveness (Eureka requires on-instance/in-app changes)
  – Data available (DNS isn’t application aware)
  – Application awareness (running container != healthy code)
• Points to the value of “above IaaS” service location registration
  – Transparent IaaS implementations struggle to be as application aware
• More information on my blog: http://bit.ly/aws-sd-intr
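To make the contrast concrete, a minimal sketch of the two lookups (addresses, ports, and names below are assumptions):

# DNS level: Skydock registers containers in SkyDNS automatically; a plain
# DNS query returns an address but says nothing about application health
dig @172.17.42.1 auth-service.docker.local.io +short

# application level: Eureka's REST API returns instance metadata such as
# status and health check URL, which DNS cannot express
curl -s -H "Accept: application/json" http://eureka-host:8080/eureka/v2/apps/AUTH-SERVICE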

Page 25: Cloud Native NetflixOSS Services on Docker

Instance Auto Recovery / Scaling

• Auto scaling performs three important functions
  – Devops cluster rolling versions
  – Auto recovery of instances after failure
  – Auto scaling with load
• Various NetflixOSS auto scalers
  – For NetflixOSS proper – Amazon Auto Scaler
  – For the SoftLayer port – RightScale Server Arrays
  – For the Docker local port – we implemented “Microscaler”

Page 26: Cloud Native NetflixOSS Services on Docker

Microscaler Agent Architecture

• OSS at http://github.com/EmergingTechnologyInstitute/microscaler

• The Microscaler service and agent are containers
• Microscaler has a remote CLI client and a REST interface
• Note:
  – No IBM support; an OSS proof of concept of the auto scaler needed for local usage
  – Works well for small scale Docker local testing

[Diagram: a Docker host running Web App (i001, i002) and Auth Service (i001, i002) containers; a Microscaler Agent on the host drives the Docker Remote API, and the Microscaler service controls agents via its REST interface or CLI]

Page 27: Cloud Native NetflixOSS Services on Docker

Microscaler CLI/REST usage

• Login CLI:
  ms login --target <API URL> --user user01 --key key
• Login REST:
  curl -X POST -H "Content-Type: application/json" -d '{"user":"user01","key":"key01"}' http://localhost:56785/asgcc/login
  {"status":"OK","token":"a28e7079-db0b-4235-8b9b-01c229e02e9a"}
• Launch config CLI:
  ms add-lconf --lconf-name lconf1 --lconf-image-id cirros --lconf-instances-type m1.small --lconf-key key1
• Launch config REST:
  curl -X POST -H "Content-Type: application/json" -H "authorization: a28…e9a" -d '{"name":"mylconf","image_id":"img1","instances_type":"m1.small","key":"keypair"}' http://localhost:56785/asgcc/lconfs
  {"status":"OK"}
• ASG CLI:
  ms add-ms --ms-name asg1 --ms-availability-zones docker01,docker02 --asg-launch-configuration lconf1 --asg-min-instances 1 --asg-max-instances 3 --asg-scale-out-cooldown 300 --asg-scale-in-cooldown 60 --asg-no-load-balancer --asg-domain docker.local.io
  ms start-ms --ms-name asg1
• ASG REST:
  curl -X POST -H "Content-Type: application/json" -H "authorization: a28…e9a" -d '{"name":"asg1","availability_zones":["az1"],"launch_configuration":"lconf1","min_instances":1,"max_instances":3}' http://localhost:56785/asgcc/asgs
  {"status":"OK"}
  curl -X PUT -H "Content-Type: application/json" -H "authorization: a28…e9a" http://localhost:56785/asgcc/asgs/myasg/start
  {"status":"OK"}

Page 28: Cloud Native NetflixOSS Services on Docker

Working with the Docker remote API

• Microscaler and Asgard need to work against the “IaaS” API– Docker remote API to the rescue– Start and stop containers, query images and containers

• Exposed http://172.17.42.1:4243 to both– Could (should) have used socket– Be careful of security once you do this

• Found that this needs to easily configurable– Boot2docker and docker.io default to different addresses

• Found that current API isn’t totally documented– Advanced options not documented or shown in examples– Open Source to the rescue (looked at service code)– Need to work on submitting pull requests for documentation
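For illustration, the calls Microscaler and Asgard make reduce to plain REST against the daemon; a minimal sketch against the address above (the image name is an assumption):

# query the running containers
curl -s http://172.17.42.1:4243/containers/json

# create a container from a locally built image
curl -s -X POST -H "Content-Type: application/json" -d '{"Image":"acmeair/auth-service"}' http://172.17.42.1:4243/containers/create

# start it, substituting the Id returned by the create call
curl -s -X POST http://172.17.42.1:4243/containers/<id>/start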

Page 29: Cloud Native NetflixOSS Services on Docker

Region and Availability Zones

• Coded Microscaler to assign availability zones
  – Via user_data in an environment variable
  – Need metadata about deployment in Docker eventually?
• Tested Chaos Gorilla
  – Stop all containers in a single availability zone
• Tested Split Brain Monkey
  – Jepsen inspired; used iptables to isolate the Docker network (sketch after this list)
• Eureka awareness of availability zones not there yet
  – Should be an easy change based on the similar SoftLayer port
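A minimal sketch of both tests against the Docker local setup, assuming the default bridge subnet and the remote API address used earlier:

# Chaos Gorilla: stop every container behind one zone's Docker remote API
# (assumes a zone maps to a Docker host)
export DOCKER_HOST=tcp://172.17.42.1:4243
docker stop $(docker ps -q)

# Split Brain Monkey: partition the Docker bridge network with iptables ...
iptables -I FORWARD -s 172.17.0.0/16 -j DROP
iptables -I FORWARD -d 172.17.0.0/16 -j DROP

# ... and delete the rules to heal the partition
iptables -D FORWARD -s 172.17.0.0/16 -j DROP
iptables -D FORWARD -d 172.17.0.0/16 -j DROP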

Page 30: Cloud Native NetflixOSS Services on Docker

Image management

• Docker and baked images are kindred spirits

• Using locally built images - Easy for a simple demo

• Haven’t yet pushed the images to dockerhub

• Considering an Imaginator (Aminator) extension
  – To allow Docker images to be built the way our VM images are
  – Considering http://www.packer.io/
  – Or maybe the other way around?

• Dockerfiles for VM images?
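For the locally built images used today, the flow is just a docker build per image; a minimal sketch (paths and tags are assumptions):

# build and tag the images locally; publishing would add a docker push
docker build -t acmeair/webapp ./webapp
docker build -t acmeair/authservice ./authservice
docker images | grep acmeair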

Page 31: Cloud Native NetflixOSS Services on Docker

Using Docker as an IaaS?

• We do all the “bad” things
  – Our containers run multiple processes
  – Our containers use unmanaged TCP ports
  – Our containers run and allow ssh access
• Good
  – Get all the benefits of Docker containers and images
  – Only small changes to the CSF/NetflixOSS cloud platform
• Bad
  – Might not take full advantage of Docker
    • Portability, container process optimizations, composability
• Considering more Docker centric approaches over time

Page 32: Cloud Native NetflixOSS Services on Docker

Where can I play with this?

# on boot2docker or docker.io under VirtualBox Ubuntu
git clone http://github.com/EmergingTechnologyInstitute/acmeair-netflixoss-dockerlocal
cd acmeair-netflixoss-dockerlocal/bin

# please read http://bit.ly/aa-noss-dl-license
./acceptlicenses.sh

# get coffee (or favorite caffeinated drink); ~30 min depending on download speed
./buildsimages.sh

# this is FAST! – but wait about eight minutes for cross topology registration
./startminimum.sh

# route your network from guest to docker network (http://bit.ly/docker-tcpdirect)
./showipaddrs.sh

# look at the environment (Zuul front end, Asgard console, Eureka console, etc.)
# browse to http://172.17.0.X

All Open Source Today!

Page 33: Cloud Native NetflixOSS Services on Docker

(Diagram repeated from Page 21: Docker “Local” Setup.)

Show demo here

Page 34: Cloud Native NetflixOSS Services on Docker

Agenda

• Introduction
• Docker Local Port
• Docker Cloud Port
  – Lessons Learned

Page 35: Cloud Native NetflixOSS Services on Docker

Docker Cloud on IBM SoftLayer

[Diagram: Docker hosts in the DAL05 and DAL06 datacenters connected over the SoftLayer private network. The hosts run application containers (Web App i001–i004, Auth Service i001–i004) alongside platform containers: Registry, Zuul, Eureka, Cassandra, Skydock, SkyDNS, Asgard, and the API Proxy; a Microscaler Agent per host drives that host’s Docker Remote API on behalf of Microscaler.]

Page 36: Cloud Native NetflixOSS Services on Docker

Networking

• Docker starts the docker0 bridge to interconnect instances on a single host
• We assigned the bridge’s subnet to be a portable subnet within a VLAN in our SoftLayer account
  – We routed all traffic to the actual private interface
• This allows the network to work seamlessly
  – Between datacenters
  – Across hardware firewall appliances
  – To external load balancers
  – To all other instances (VMs, bare metal) in SoftLayer
• This allowed for easy networking between multiple Docker hosts
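A minimal sketch of the approach, assuming a portable private subnet of 10.54.202.64/26 ordered in SoftLayer (the addresses and bridge name are assumptions):

# create a bridge whose address range is the portable subnet routed on the
# SoftLayer private network
brctl addbr br0
ip addr add 10.54.202.65/26 dev br0
ip link set br0 up

# point the Docker daemon at that bridge instead of docker0
# (e.g. in /etc/default/docker)
DOCKER_OPTS="-b=br0"

Because SoftLayer already routes the portable subnet to the host’s private interface, containers become reachable across datacenters and from VMs and bare metal without an overlay network.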

Page 37: Cloud Native NetflixOSS Services on Docker

Docker API and Multi-host

• Once you have multiple Docker hosts
  – You have multiple Docker remote APIs
• Wrote an “API Proxy” to deal with this
• Not the best solution in the world, but it worked
• Considering how this works with the existing IaaS API
  – A single SoftLayer API handles bare metal and virtual machines
  – How to keep the API Docker compatible
• Maybe other, more Docker centric approaches are coming?
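The underlying problem, sketched with two hypothetical host addresses: each daemon answers only for its own containers, so the API Proxy has to fan requests out and merge the answers:

# two hosts means two independent remote APIs to reconcile
curl -s http://10.54.202.10:4243/containers/json
curl -s http://10.54.202.20:4243/containers/json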

Page 38: Cloud Native NetflixOSS Services on Docker

Image Management

• Currently using standard Docker private registry

• Considering how this could be integrated with the SoftLayer image management system
  – Use its optimized cross datacenter distribution network
  – Expose Docker layered versions through the console

• Again, important to not lose Docker value in image transparency and portability
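A minimal sketch of the standard private registry flow (host names and tags are assumptions):

# run the stock registry container on one host
docker run -d -p 5000:5000 registry

# tag and push an image so the other Docker hosts can pull it
docker tag acmeair/authservice registry-host:5000/acmeair/authservice
docker push registry-host:5000/acmeair/authservice
docker pull registry-host:5000/acmeair/authservice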

Page 39: Cloud Native NetflixOSS Services on Docker

(Diagram repeated from Page 35: Docker Cloud on IBM SoftLayer.)

Demos 1-on-1 today, or tomorrow at Jerry’s session

Page 40: Cloud Native NetflixOSS Services on Docker

Questions?