Cloud Solution Day 2016: Microservices on Mesos & Netflix OSS

By [email protected]

Before we start (or ground rules)

● Forgive my shaking voice

● Feel free to tell me I’m stupid

● Feel free to interrupt me with questions

Agenda

● About Hoiio Stack

● Phoenix Deployment

○ Mesos/Marathon

● Netflix OSS @ Hoiio

○ Service Discovery

■ Consul

■ Eureka

○ API Gateway

● (Monitoring)

About Hoiio

Services

● VoIP/SIP

● SMS/Email

● Connected Apps

● HR suite

Platform

● Distributed, multi-service microservices

● RabbitMQ as message broker; peer-to-peer calls via HTTP

● Kafka as event store

● Java/Python/... modules

● Entirely on AWS

[Diagram: zuul, auth, billing, sip; HTTP]

Phoenix Deployment: Mesos/Marathon

Mesos

● Abstracts a cluster of machines into a single “black-box” machine

● Master nodes, Slave/Agent nodes

● Tasks are submitted to the master

● The master schedules the job on one of the slaves
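The "single black-box machine" idea can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyCluster`, its methods, and the first-fit placement rule are hypothetical and are not Mesos code. Real Mesos works through resource offers made to frameworks such as Marathon rather than placing tasks itself like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy model of a cluster that looks like one pool of CPU and memory. Not Mesos code. */
final class ToyCluster {

    /** One agent (slave) node and the resources it still has free. */
    private static final class Agent {
        final String name;
        double freeCpus;
        int freeMemMb;

        Agent(String name, double cpus, int memMb) {
            this.name = name;
            this.freeCpus = cpus;
            this.freeMemMb = memMb;
        }
    }

    private final List<Agent> agents = new ArrayList<>();

    void addAgent(String name, double cpus, int memMb) {
        agents.add(new Agent(name, cpus, memMb));
    }

    /**
     * Places a task on the first agent with enough free CPU and memory
     * (first-fit, a hypothetical policy) and reserves those resources.
     * Returns the chosen agent name, or empty if no agent can host the task.
     */
    Optional<String> submit(double cpus, int memMb) {
        for (Agent agent : agents) {
            if (agent.freeCpus >= cpus && agent.freeMemMb >= memMb) {
                agent.freeCpus -= cpus;
                agent.freeMemMb -= memMb;
                return Optional.of(agent.name);
            }
        }
        return Optional.empty();
    }
}
```

The caller only says how much CPU and memory a task needs; which machine ends up running it is decided behind the `submit` call.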

Marathon

● Framework running on top of Mesos

● Manages task config, number of instances, ...

● Health checks

● REST interface

● Mesos as OS, Marathon as Task Manager

[Diagram: OS analogy. Mesos Slave as CPU/Memory, Mesos Master as Kernel Scheduler, Marathon as Task Manager]

Phoenix?

● Docker as container

○ Supported by Mesos

○ Use AWS ECR as the private repo, or a private repo running on Marathon

● Marathon performs health checks and replaces unhealthy instances

● Replacement takes seconds!
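The "replace unhealthy instances" behaviour can be pictured as a reconciliation loop. The sketch below is a toy model only: `PhoenixSupervisor`, the instance naming, and the health predicate are hypothetical, and it does not reproduce how Marathon is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy supervisor that keeps a fixed number of instances alive. Not Marathon code. */
final class PhoenixSupervisor {
    private final int desiredInstances;
    private final List<String> running = new ArrayList<>();
    private int launched = 0;

    PhoenixSupervisor(int desiredInstances) {
        this.desiredInstances = desiredInstances;
    }

    /**
     * One reconciliation pass: drop every instance that fails the health check,
     * then launch fresh instances until the desired count is restored.
     * Returns how many instances were launched in this pass.
     */
    int reconcile(Predicate<String> isHealthy) {
        running.removeIf(isHealthy.negate());
        int started = 0;
        while (running.size() < desiredInstances) {
            running.add("instance-" + (++launched));
            started++;
        }
        return started;
    }

    /** Snapshot of the instance IDs currently considered running. */
    List<String> running() {
        return new ArrayList<>(running);
    }
}
```

An unhealthy instance is never repaired in place; it is discarded and a new one takes its slot, which is the "phoenix" idea on this slide.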

{ "id": "ms-uat-xxx", "mem": 384, "cpus": 0.5, ...
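For readers who want to see a fuller definition, here is a hedged sketch that assembles a minimal Marathon app definition as a JSON string. The field names (`id`, `cpus`, `mem`, `instances`, `container`, `healthChecks`) are Marathon's; the class `MarathonAppSketch`, the `/health` path, the interval and failure values, and the image name used in the test are assumptions for illustration and are not taken from the slide, whose own definition is truncated.

```java
/** Builds an illustrative Marathon app definition. Values are assumptions, not Hoiio's config. */
final class MarathonAppSketch {

    /**
     * Returns a minimal app definition as JSON. The inputs are assumed to
     * contain no characters that need JSON escaping.
     */
    static String appDefinition(String id, double cpus, int memMb, int instances, String image) {
        return "{"
            + "\"id\": \"" + id + "\", "
            + "\"cpus\": " + cpus + ", "
            + "\"mem\": " + memMb + ", "
            + "\"instances\": " + instances + ", "
            + "\"container\": {\"type\": \"DOCKER\", \"docker\": {\"image\": \"" + image + "\"}}, "
            // Hypothetical health check: HTTP GET on /health every 10 seconds.
            + "\"healthChecks\": [{\"protocol\": \"HTTP\", \"path\": \"/health\", "
            + "\"intervalSeconds\": 10, \"maxConsecutiveFailures\": 3}]"
            + "}";
    }
}
```

A definition like this is what gets submitted to Marathon's REST interface; Marathon then keeps the requested number of instances running.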

Service Discovery: Netflix Eureka

● Eureka Server & Client

● Server routes are replicated

● Each client holds a copy of the route table

● Route tables are updated in the background

https://github.com/Netflix/eureka/wiki/Eureka-at-a-glance

● Eureka

○ Eureka server tracks which service is running where (which IP and port?)

○ All records are replicated to all eureka-clients

● Ribbon

○ Picks a server from the record replica on the local eureka-client

○ Makes the request to the picked server

○ Retries if configured

[Diagram: every Eureka client holds the same replicated records (10.0.12.16:1234; 10.0.140.21:4321, 10.0.140.26:6789); Ribbon (R) picks 10.0.140.21:4321 from the local copy and sends the request there]
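Client-side load balancing from a local copy can be sketched as follows. This is a toy illustration, not Ribbon or Eureka code: `LocalRegistry` and its methods are hypothetical, and round-robin is just one simple selection rule chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy local replica of the service registry with round-robin selection. */
final class LocalRegistry {
    private final Map<String, List<String>> routes = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> cursors = new ConcurrentHashMap<>();

    /** Called by a background refresh to replace the local copy for one service. */
    void update(String service, List<String> servers) {
        routes.put(service, new ArrayList<>(servers)); // defensive copy
    }

    /** Picks the next server from the local copy; the registry server is not contacted. */
    String pick(String service) {
        List<String> servers = routes.get(service);
        if (servers == null || servers.isEmpty()) {
            throw new IllegalStateException("No server for " + service);
        }
        int next = cursors.computeIfAbsent(service, k -> new AtomicInteger()).getAndIncrement();
        return servers.get(Math.floorMod(next, servers.size()));
    }
}
```

Because `pick` reads only local state, lookups keep working while the registry server is unreachable; the copy simply stops receiving updates.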

● Single point of failure? Not really

○ Route tables are replicated

○ Each client has a copy

○ Routes are queried from the local copy

● When Eureka is down

○ New servers are not picked up

○ A client might call a dead server -> retry on the local server list with Ribbon

[Diagram: SIP, HTTP, Auth1 and Auth2 each hold their own copy of Routes from the Eureka Server]

// Call another module by its VIP address.
String moduleVipAddress = "call.hoiio.info";
Observable<HttpResponse> response = HoiioRibbonRequest.getInstance().makeRequest(
    moduleVipAddress,
    UUID.randomUUID().toString(), // correlation ID
    httpRequest);

● Timeout and Retry

○ Defined in HoiioRibbonRequest

○ Default:

■ Timeout: 10s

■ Retry:

● Same server: 0

● Next server: 3

○ Can be re-configured

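The "same server" and "next server" retry counts can be sketched as two nested loops. This is a simplified illustration and not the code inside HoiioRibbonRequest or Ribbon: `RetrySketch` is hypothetical, timeouts are left out, and the sketch walks the server list in order, whereas a real load balancer chooses each next server itself.

```java
import java.util.List;
import java.util.function.Function;

/** Toy retry loop with separate same-server and next-server retry counts. */
final class RetrySketch {

    /**
     * Tries the request on up to (nextServerRetries + 1) servers, and on each
     * server up to (sameServerRetries + 1) times. Returns the first successful
     * response, or rethrows the last failure when every attempt fails.
     */
    static <T> T call(List<String> servers,
                      int sameServerRetries,
                      int nextServerRetries,
                      Function<String, T> request) {
        RuntimeException last = new IllegalStateException("No server available");
        int serversToTry = Math.min(servers.size(), nextServerRetries + 1);
        for (int s = 0; s < serversToTry; s++) {
            for (int attempt = 0; attempt <= sameServerRetries; attempt++) {
                try {
                    return request.apply(servers.get(s));
                } catch (RuntimeException e) {
                    last = e; // remember the failure and keep trying
                }
            }
        }
        throw last;
    }
}
```

With the defaults on this slide (same server: 0, next server: 3), a failing call moves straight on to another server and gives up after four servers in total.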

// Override the default retry and timeout for one module.
String moduleVipAddress = "call.hoiio.info";
int sameServerRetry = 1; // retries on the same server
int nextServerRetry = 1; // retries on other servers
RetryPolicy retryPolicy = new RetryPolicy(
    new DefaultLoadBalancerRetryHandler(sameServerRetry, nextServerRetry, true));

int timeout = 60; // seconds
HttpClient httpClient = new HttpClient(500, 50, timeout * 1000);
Configuration config = new Configuration(moduleVipAddress, httpClient, retryPolicy);

Observable<HttpResponse> httpResponse = HoiioRibbonRequest.getInstance().makeRequest(
    config,
    correlationId,
    httpRequest);

Service Discovery: Consul

Implementation

● Clustering with an agent on each instance

● Service info is shared across the cluster

● Agent has a REST interface to register/deregister/check/query/…

● Zuul-fronted as the primary reverse proxy

[Diagram: instances with service.json, fronted by Zuul]
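As a rough illustration of the agent's REST interface, the sketch below builds the URIs for three Consul HTTP API endpoints: service registration, deregistration, and a health query restricted to passing instances. The endpoint paths are Consul's; the helper class `ConsulAgentUris`, the use of plain HTTP, and the assumption that names and tags need no URL encoding are mine.

```java
import java.net.URI;

/** Builds URIs for a few Consul HTTP API endpoints on a local agent. Helper names are hypothetical. */
final class ConsulAgentUris {
    private final String base;

    /** agentAddress is host:port of the local agent, for example 127.0.0.1:8500. */
    ConsulAgentUris(String agentAddress) {
        this.base = "http://" + agentAddress;
    }

    /** PUT a service definition here to register a service with the agent. */
    URI register() {
        return URI.create(base + "/v1/agent/service/register");
    }

    /** PUT here to remove a previously registered service. */
    URI deregister(String serviceId) {
        return URI.create(base + "/v1/agent/service/deregister/" + serviceId);
    }

    /** GET here to list instances of a service whose checks are passing, filtered by tag. */
    URI healthyInstances(String serviceName, String tag) {
        return URI.create(base + "/v1/health/service/" + serviceName + "?passing&tag=" + tag);
    }
}
```

A load balancer in front of the modules could poll the last URI and route only to the instances it returns.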

// Load-balance across the instances of appName for the given environment and tag.
HoiioConsulLoadBalancer lb = new HoiioConsulLoadBalancer(appName, ConsulService.Info.environment(), tag);
HttpResponse httpResponse;
try {
    httpResponse = lb.execute(new HttpCmd(httpRequest));
} catch (NoServerException ignored) {
    // No instance is available: answer with 503 instead of failing.
    ZuulLogger.logger.error("No server for " + appName);
    httpResponse = responseFactory.get().newHttpResponse(
        new BasicStatusLine(HttpVersion.HTTP_1_1, 503, "Service not available"),
        null);
}

API Gateway with Netflix Zuul and Archaius

Problems

● Single gateway for API

● API mapping for easy understanding

● Optimize the number of requests made

● Reject malformed requests

[Diagram: sms, auth, billing, sip; HTTP]

● Why Zuul?

○ Apps do not have a Eureka client

○ Cron jobs

○ Exposing API

● What Zuul does

○ Represents the API caller (apps, cron jobs, partners, ...) when talking to modules (acts as a proxy)

■ Relay request

■ Retry

○ Authenticates requests

[Diagram: Zuul (Z) holds a replica of the Eureka records (10.0.12.16:1234; 10.0.140.21:4321, 10.0.140.26:6789), receives /a/b/c, maps it to /a/c and forwards /a/c to a microservice instance]

Netflix Zuul

● Pre, Route, Post filters

○ Groovy filters

○ Each has a priority

● Integrates with Archaius for dynamic configuration

● Integrates with Eureka/Consul for service discovery

[Diagram: filter flow. pre: Reject malformed, Authenticate. route: Route using Eureka (Ribbon/Eureka). post: Add header. Archaius supplies the route mapping.]

Route mapping example: /sms/send -> {"module": "sms", "uri": "sendOneSms"}, forwarded as /sendOneSms
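The pre, route, post flow with priorities can be sketched with a toy filter chain. This is not Zuul's API: `FilterChainSketch`, the string-map request context, and the filter names are hypothetical. It only shows filters running in phase order and, within a phase, in ascending priority, with the route step applying the /sms/send mapping from the slide.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy pre/route/post filter chain. Not Zuul code. */
final class FilterChainSketch {
    enum Phase { PRE, ROUTE, POST }

    private static final class Filter {
        final Phase phase;
        final int order;
        final Consumer<Map<String, String>> action;

        Filter(Phase phase, int order, Consumer<Map<String, String>> action) {
            this.phase = phase;
            this.order = order;
            this.action = action;
        }
    }

    private final List<Filter> filters = new ArrayList<>();

    void add(Phase phase, int order, Consumer<Map<String, String>> action) {
        filters.add(new Filter(phase, order, action));
    }

    /** Runs all filters by phase (PRE, ROUTE, POST), then by ascending order. */
    Map<String, String> handle(String path) {
        Map<String, String> ctx = new HashMap<>();
        ctx.put("path", path);
        List<Filter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparing((Filter f) -> f.phase)
                .thenComparingInt(f -> f.order));
        for (Filter f : sorted) {
            f.action.accept(ctx);
        }
        return ctx;
    }

    /** Appends a filter name to the "trace" entry so the execution order is visible. */
    private static void trace(Map<String, String> ctx, String name) {
        ctx.merge("trace", name, (a, b) -> a + ">" + b);
    }

    /** Example wiring, registered out of order on purpose; handle() sorts the filters. */
    static FilterChainSketch example() {
        FilterChainSketch chain = new FilterChainSketch();
        chain.add(Phase.POST, 0, ctx -> trace(ctx, "add-header"));
        chain.add(Phase.ROUTE, 0, ctx -> {
            trace(ctx, "route");
            if ("/sms/send".equals(ctx.get("path"))) { // mapping taken from the slide
                ctx.put("module", "sms");
                ctx.put("uri", "/sendOneSms");
            }
        });
        chain.add(Phase.PRE, 1, ctx -> trace(ctx, "authenticate"));
        chain.add(Phase.PRE, 0, ctx -> trace(ctx, "reject-malformed"));
        return chain;
    }
}
```

Calling `FilterChainSketch.example().handle("/sms/send")` returns a context whose `module` and `uri` entries name the target module and the rewritten path.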

● Timeout and Retry

○ Zuul talks to modules on behalf of API callers -> Zuul must be told the timeout and retry for each API

○ Default values

■ Timeout: 10s

■ Retry:

● Same server: 0

● Next server: 3

{
  "vipAddress": "auth.hoiio.info",
  "module": "auth",
  "apis": [
    {
      "from": "/v1/otp",
      "to": "/private/v1/otp",
      "type": "private",
      "timeout": 60,
      "retry": {
        "same": 1,
        "next": 2
      }
    }
  ]
}
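One way to picture how per-API settings fall back to the defaults is a small value class. This is a sketch under assumptions: `ApiRouteConfig` is a hypothetical name, a missing setting is modelled as `null`, and the attempt count assumes Ribbon-style semantics where each of (next + 1) servers is tried (same + 1) times.

```java
/** Toy per-API route settings with the slide's defaults applied when a value is missing. */
final class ApiRouteConfig {
    static final int DEFAULT_TIMEOUT_SECONDS = 10;
    static final int DEFAULT_RETRY_SAME = 0;
    static final int DEFAULT_RETRY_NEXT = 3;

    final String from;
    final String to;
    final int timeoutSeconds;
    final int retrySame;
    final int retryNext;

    /** Pass null for timeout or retry values that the route config does not set. */
    ApiRouteConfig(String from, String to, Integer timeoutSeconds, Integer retrySame, Integer retryNext) {
        this.from = from;
        this.to = to;
        this.timeoutSeconds = timeoutSeconds != null ? timeoutSeconds : DEFAULT_TIMEOUT_SECONDS;
        this.retrySame = retrySame != null ? retrySame : DEFAULT_RETRY_SAME;
        this.retryNext = retryNext != null ? retryNext : DEFAULT_RETRY_NEXT;
    }

    /** Upper bound on attempts for one call under the assumed retry semantics. */
    int maxAttempts() {
        return (retrySame + 1) * (retryNext + 1);
    }
}
```

For the /v1/otp entry above (timeout 60, same 1, next 2) this gives at most 6 attempts; an API with no overrides gets a 10-second timeout and at most 4 attempts.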

Monitoring

Service status

● Remember Consul?

● Consul watch

○ Triggers an action when a service status changes

service.json

{
  "service": {
    "name": "MS-Apps-1-46",
    "tags": ["prod"],
    "address": "10.0.14.10",
    "port": 8080,
    "checks": [
      {
        "script": "/opt/consul/bin/MS-Apps-1-46-healthcheck.sh",
        "interval": "60s"
      }
    ]
  }
}

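What a watch handler has to do, reacting only when a status changes, can be sketched by comparing consecutive snapshots. This is a toy illustration rather than Consul's watch mechanism: `StatusChangeDetector` is hypothetical, and the snapshots are assumed to arrive as maps from service name to check status (Consul's statuses are passing, warning, and critical).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy detector that reports services whose status changed since the last snapshot. */
final class StatusChangeDetector {
    private Map<String, String> previous = new TreeMap<>();

    /**
     * Compares the new snapshot with the previous one and returns one line per
     * service whose status differs. Services that disappear are not reported.
     */
    List<String> onSnapshot(Map<String, String> current) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> entry : new TreeMap<>(current).entrySet()) {
            String before = previous.get(entry.getKey());
            if (!entry.getValue().equals(before)) {
                changes.add(entry.getKey() + ": "
                        + (before == null ? "unknown" : before)
                        + " -> " + entry.getValue());
            }
        }
        previous = new TreeMap<>(current);
        return changes;
    }
}
```

Each returned line could then be turned into an alert, for example a message to a chat channel.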
● Metric sources:

○ CollectD/cAdvisor

○ Cloudwatch

● Metric storage:

○ InfluxDB

● Visualization:

○ Grafana

[Diagram labels: Instance stats, Kapacitor, Cloudwatch, CAS, slack/sms]

Thank you!

We are hiring!

● Fresh web engineer

● Senior web engineer

● Internship