The Selfish Stack Future Stack – August 2016 Cristopher Stauffer

The Selfish Stack [FutureStack16 NYC]

Page 1: The Selfish Stack [FutureStack16 NYC]

The Selfish Stack

Future Stack – August 2016

Cristopher Stauffer

Page 2: The Selfish Stack [FutureStack16 NYC]

Cristopher Stauffer

Director of Test Engineering

[email protected]

Page 3: The Selfish Stack [FutureStack16 NYC]

A Common Tale

Proof of Concept

Beta Release

Critical Production Application

Critical Business Liability

Founders

Early Startup

Well Funded

Expanding

Page 4: The Selfish Stack [FutureStack16 NYC]

▶ Unsustainable Development Costs

▶ Unnecessary Development Synchronization

▶ Inability to Scale

▶ Is All Things To All People

Production Liability

Page 5: The Selfish Stack [FutureStack16 NYC]

Engineering Ecosystem

Page 6: The Selfish Stack [FutureStack16 NYC]

Entitled Capabilities

▶ Sustainable

▶ Independent

▶ Scalable

▶ Clear Intention

Page 7: The Selfish Stack [FutureStack16 NYC]

Our System

▶ Microservices Architecture (2 Years Ago)

▶ HTTP Microservices

▶ Docker Containers

▶ Unit, Function Testing

▶ On Demand Deployment

▶ Infrastructure Monitoring

Page 8: The Selfish Stack [FutureStack16 NYC]

Our System

▶ Microservices Architecture (18 Months Ago)

▶ Promoting Only Microservice development

▶ Promoting breaking apart monoliths

▶ Deploying 100 releases a month

Page 9: The Selfish Stack [FutureStack16 NYC]

What were we seeing?

[Chart: Service Count by month, Nov-14 through May-16 (y-axis 0 to 250)]

Page 10: The Selfish Stack [FutureStack16 NYC]

What were we seeing?

[Chart: Environment Deployments by month, Nov-14 through May-16 (y-axis 0 to 1400)]

Page 11: The Selfish Stack [FutureStack16 NYC]

Our System

▶ Microservices Architecture (18 Months Ago)

▶ Out of Memory

▶ Out of Disk Space

▶ Diagnosing Performance Degradations

▶ Misconfigurations

▶ Missed certification steps

▶ Bad merges

Page 12: The Selfish Stack [FutureStack16 NYC]

Our System

▶ Customer/User Dissatisfaction

▶ Loss in Engineering Confidence

Page 13: The Selfish Stack [FutureStack16 NYC]

Pain Points

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Why Was It So Painful?

Page 14: The Selfish Stack [FutureStack16 NYC]

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Pain Points

Why Was It So Painful?

Won’t Scale

Page 15: The Selfish Stack [FutureStack16 NYC]

Think About a Selfless Stack

“My new service is going to use up all the memory on the host, but it needs it”

“If you say so!”

Page 16: The Selfish Stack [FutureStack16 NYC]

Think About a Selfless Stack

“Not all the tests passed but getting this out is really important”

“You know best!”

Page 17: The Selfish Stack [FutureStack16 NYC]

Think About a Selfless Stack

“The best way for me to monitor my new app is this new metrics tool nobody has used”

“I’m sure there’s a good reason…”

Page 18: The Selfish Stack [FutureStack16 NYC]

Think About a Selfless Stack

[Diagram: repeated log output “....xyzabc…..” with repeated responses of “A-OK!”]

Page 19: The Selfish Stack [FutureStack16 NYC]

Think About a Selfless Stack

[Diagram: repeated log output “....xyzabc…..” with repeated responses of “I’M NOT OK!”]

Page 20: The Selfish Stack [FutureStack16 NYC]

▶ Correct Configuration and Routing

▶ Error Detection and Resolution

▶ Utilization and Optimization of Resources

▶ Protecting System Integrity

What did we need?

Page 21: The Selfish Stack [FutureStack16 NYC]

▶ Correct Configuration and Routing

▶ Error Detection and Resolution

▶ Utilization and Optimization of Resources

▶ Protecting System Integrity

What did we need?

Selfish Stack

Page 22: The Selfish Stack [FutureStack16 NYC]

I am Selfish

I CARE ABOUT YOU, JUST NOT AS MUCH AS I CARE ABOUT MYSELF

Page 23: The Selfish Stack [FutureStack16 NYC]

Error Detection and Alerting

Page 24: The Selfish Stack [FutureStack16 NYC]

▶ New Relic Monitoring For Microservices and Legacy Apps

▶ Simple – just add an agent

▶ Detailed per application dashboards out of the box

▶ Single score to focus attention

Error Detection and Alerting

Page 25: The Selfish Stack [FutureStack16 NYC]

Base Docker Image

▶ Docker Engine

▶ Ubuntu Image

▶ JVM Image (e.g. Java 8)

▶ New Relic Agent

▶ Microservice Image

Error Detection and Alerting

Page 26: The Selfish Stack [FutureStack16 NYC]

100 Apps in 100 Days

▶ Made use of our base containers

▶ Rolled out monitoring to every application in the fleet

▶ Suddenly we had visibility everywhere.

▶ Alerting was based on a team ownership model

Error Detection and Alerting

Page 27: The Selfish Stack [FutureStack16 NYC]

Is this a Selfish System?

▶ Pool of Docker containers glued together

▶ Engineers are alerted

▶ Engineers make changes

▶ Engineers make the call

Page 28: The Selfish Stack [FutureStack16 NYC]

Is this a Selfish System?

▶ Pool of Docker containers glued together

▶ Engineers are alerted

▶ Engineers make changes

▶ Engineers make the call

Not Selfish

Page 29: The Selfish Stack [FutureStack16 NYC]

Configuration and Application Orchestration

Page 30: The Selfish Stack [FutureStack16 NYC]

▶ Configuration

▶ Provisioning

▶ Routing

▶ Resource Balancing

What Engineers Should Be Focusing On

▶ Delivering customer value

▶ Satisfying internal needs

▶ Improving system resiliency

▶ Increasing Engineering productivity

Page 31: The Selfish Stack [FutureStack16 NYC]

Pain Points

Back to Our Pain Points

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Page 32: The Selfish Stack [FutureStack16 NYC]

▶ Utilizes Mesos/Marathon Technology

▶ Highly Available

▶ Container Orchestration Platform

▶ Capable of intelligently limiting resources and balancing load

▶ Discovery and Routing Aware

▶ Capable of detecting ‘unhealthy’ applications

Platform as a Service

Page 33: The Selfish Stack [FutureStack16 NYC]

Basic Workflow

▶ Deploy applications to PaaS

▶ PaaS decides what host and port to run applications on

▶ PaaS determines if resources are available

▶ Health checks are built in to ensure application uptime

Platform as a Service
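
The placement step in this workflow can be sketched roughly as follows. This is a minimal illustration, not how Mesos/Marathon actually schedules: the `Host` type, the `place` function, the host names, the port numbers, and the "most free memory wins" policy are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical view of one worker node's spare capacity.
    name: str
    free_cpus: float
    free_mem_mb: int
    free_ports: list = field(default_factory=list)


def place(hosts, cpus, mem_mb):
    """Pick a host and port for one instance, or return None if nothing fits."""
    candidates = [
        h for h in hosts
        if h.free_cpus >= cpus and h.free_mem_mb >= mem_mb and h.free_ports
    ]
    if not candidates:
        return None  # refuse the request rather than over-allocate a node
    # Assumed policy: prefer the host with the most free memory.
    host = max(candidates, key=lambda h: h.free_mem_mb)
    port = host.free_ports.pop(0)
    host.free_cpus -= cpus
    host.free_mem_mb -= mem_mb
    return host.name, port


hosts = [
    Host("node-a", free_cpus=2.0, free_mem_mb=1024, free_ports=[31000, 31001]),
    Host("node-b", free_cpus=4.0, free_mem_mb=4096, free_ports=[31000]),
]
first = place(hosts, cpus=1.0, mem_mb=2048)   # only node-b has enough memory
second = place(hosts, cpus=1.0, mem_mb=4096)  # no node can satisfy this request
print(first, second)
```

The point of the sketch is the refusal path: the application asks for resources, and the platform, not the engineer, decides whether and where it runs.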

Page 34: The Selfish Stack [FutureStack16 NYC]

Health Checks

▶ Complements New Relic as startup validation

▶ Addresses risks of nodes hard crashing and not recovering

▶ Addresses risk of non-reporting New Relic hosts due to OOM

▶ Attempts to always maintain recommended instance count

Platform as a Service
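
One way to picture "always maintain recommended instance count" is a reconciliation step that compares health-check results with the desired count. The sketch below is an assumption about the general shape of such a loop, not the platform's real implementation; `reconcile` and the instance ids are hypothetical.

```python
def reconcile(instances, desired_count):
    """Return (instances_to_kill, number_to_launch) for one application.

    `instances` maps a hypothetical instance id to the result of its most
    recent health check (True means healthy).
    """
    unhealthy = sorted(i for i, ok in instances.items() if not ok)
    healthy = len(instances) - len(unhealthy)
    # Launch enough new instances to replace the unhealthy ones and
    # reach the desired count.
    to_launch = max(0, desired_count - healthy)
    return unhealthy, to_launch


instances = {"web-1": True, "web-2": False, "web-3": True}
kill, launch = reconcile(instances, desired_count=4)
print(kill, launch)
```

Because the decision depends only on the health check and the desired count, it still works when a node has crashed hard or an out-of-memory host has stopped reporting to monitoring.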

Page 35: The Selfish Stack [FutureStack16 NYC]

Resource Utilization

▶ Re-balancing of applications across fleet of nodes

▶ Safeguards for CPU and memory starvation

▶ Ability to scale on demand (still human driven)

▶ Protects against over-allocation

Platform as a Service

Page 36: The Selfish Stack [FutureStack16 NYC]

Pain Points

Back to Our Pain Points

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Page 37: The Selfish Stack [FutureStack16 NYC]

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Pain Points

Back to Our Pain Points

Little Selfish

Page 38: The Selfish Stack [FutureStack16 NYC]

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Pain Points

Back to Our Pain Points

Won’t Scale

Page 39: The Selfish Stack [FutureStack16 NYC]

▶ Correct Configuration and Routing

▶ Error Detection and Resolution

▶ Utilization and Optimization of Resources

▶ Protecting System Integrity

What did we need?

Page 42: The Selfish Stack [FutureStack16 NYC]

Safe Continuous Delivery

Page 43: The Selfish Stack [FutureStack16 NYC]

Regressions give comfort

▶ Monolithic releases are understandable

▶ We tested everything

▶ Everything works

Continuous Integration

Page 44: The Selfish Stack [FutureStack16 NYC]

Release code as it is written

Continuous Delivery Pipeline

Develop

Commit to Branch

Continuous Integration

Merge

Continuous Delivery

Page 45: The Selfish Stack [FutureStack16 NYC]

Regressions Are Resource Intensive

▶ Empower continuous delivery

▶ Focused – Highly Selective to Integration Testing

Continuous Integration

Page 46: The Selfish Stack [FutureStack16 NYC]

Enter the Canary

▶ Landscape is in flux

▶ If we test a subset of things how can we be sure everything works?

▶ Canary ensures:

▶ Dependencies met

▶ Satisfying existing contracts

▶ Handle production load

Continuous Delivery Pipeline

Page 47: The Selfish Stack [FutureStack16 NYC]

Canary Pipeline

▶ Special canary routing in our service discovery layer

▶ Test anywhere in the service mesh

▶ Discoverable tests using a /tests endpoint

▶ Monitor canary health in New Relic
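
A test runner built on a discoverable /tests endpoint might look something like the sketch below. The slides only name the endpoint, so everything else is assumed: the JSON shapes, the per-test URL layout, the `run_canary_tests` function, and the `canary.local` address are hypothetical. The HTTP call is injected as `fetch` so the example runs without a network.

```python
import json


def run_canary_tests(base_url, fetch):
    """Discover tests at <base_url>/tests and run each one.

    `fetch(url)` returns a response body as text. The response formats
    used here are assumptions made for illustration.
    """
    names = json.loads(fetch(base_url + "/tests"))
    results = {}
    for name in names:
        body = json.loads(fetch(base_url + "/tests/" + name))
        results[name] = bool(body.get("passed"))
    return results


# Canned responses standing in for a canary instance.
fake_responses = {
    "http://canary.local/tests": '["db_connection", "downstream_contract"]',
    "http://canary.local/tests/db_connection": '{"passed": true}',
    "http://canary.local/tests/downstream_contract": '{"passed": false}',
}
results = run_canary_tests("http://canary.local", fake_responses.__getitem__)
all_passed = all(results.values())
print(results, all_passed)
```

Because each service publishes its own tests, the pipeline needs no per-service knowledge to exercise a canary anywhere in the mesh.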

Page 48: The Selfish Stack [FutureStack16 NYC]

Canary Isolated

▶ Receives no production traffic

▶ Reports to New Relic using unique name

▶ Discoverable and routable by Canary Tests

▶ Monitored for a configurable amount of time

▶ Triggers rollback if Canary Tests fail or New Relic reports Yellow/Red
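
The rollback rule for the isolated canary can be written as a small decision function. This is a sketch under assumptions: the function name, the color strings, and the idea of sampling health at intervals over the monitoring window are illustrative, not taken from the actual pipeline.

```python
def isolated_canary_verdict(test_results, health_samples):
    """Return "promote" or "rollback" for an isolated canary.

    test_results: mapping of canary test name to True/False.
    health_samples: health colors ("green", "yellow", "red") sampled
    over the configurable monitoring window.
    """
    if not all(test_results.values()):
        return "rollback"
    if any(color in ("yellow", "red") for color in health_samples):
        return "rollback"
    return "promote"


clean = isolated_canary_verdict({"contract": True}, ["green", "green"])
failed_test = isolated_canary_verdict({"contract": False}, ["green"])
degraded = isolated_canary_verdict({"contract": True}, ["green", "yellow"])
print(clean, failed_test, degraded)
```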

Page 49: The Selfish Stack [FutureStack16 NYC]

Canary Partial

▶ Receives % of production traffic

▶ Reports to New Relic using unique name

▶ Monitored for a configurable amount of time

▶ Ensures similar response times and return codes

▶ Ensures similar CPU / memory utilization

▶ Triggers rollback if New Relic reports Yellow/Red
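
"Similar" response times, return codes, and resource utilization need a concrete comparison against the instances already serving traffic. The sketch below shows one possible check; the `Stats` fields, the 25% tolerance, and the error-rate threshold are assumptions chosen for the example, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    avg_response_ms: float
    error_rate: float  # fraction of error responses, 0.0 to 1.0
    cpu_pct: float
    mem_mb: float


def is_similar(canary, baseline, tolerance=0.25, max_error_delta=0.01):
    """True if the canary is no worse than the baseline beyond the tolerances."""
    def within(c, b):
        return c <= b * (1 + tolerance)

    return (
        within(canary.avg_response_ms, baseline.avg_response_ms)
        and within(canary.cpu_pct, baseline.cpu_pct)
        and within(canary.mem_mb, baseline.mem_mb)
        and canary.error_rate - baseline.error_rate <= max_error_delta
    )


baseline = Stats(avg_response_ms=120, error_rate=0.002, cpu_pct=35, mem_mb=900)
steady = Stats(avg_response_ms=130, error_rate=0.003, cpu_pct=38, mem_mb=950)
slow = Stats(avg_response_ms=600, error_rate=0.002, cpu_pct=36, mem_mb=910)
print(is_similar(steady, baseline), is_similar(slow, baseline))
```

The comparison is one-sided on purpose: a canary that is faster or leaner than the baseline is not a reason to roll back.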

Page 50: The Selfish Stack [FutureStack16 NYC]

Continuous Delivery Pipeline

▶ Receives general production traffic

▶ Reports to New Relic under unique name

▶ Monitored for a configurable amount of time

▶ Ensures similar response times and return codes

▶ Ensures similar CPU / Memory utilization

▶ Triggers rollback if New Relic reports Yellow/Red

Page 51: The Selfish Stack [FutureStack16 NYC]

The Actors

▶ No Human involvement

▶ Bamboo Build Agent

▶ Launch Pad (Custom Microservice) orchestrates Canary Process

▶ Cerebro (Custom Microservice) retrieves sensory information:

▶ Service Discovery

▶ Health Check

▶ New Relic
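
The division of labor between these actors can be sketched as an orchestrator that walks a release through the canary stages and asks a sensing component for a verdict at each one. This is only an illustration: the real Launch Pad and Cerebro services are not described in detail, so `run_pipeline`, the stage names, and the callback signatures are hypothetical.

```python
def run_pipeline(stages, deploy, observe, rollback):
    """Walk a release through canary stages; roll back on the first bad signal.

    deploy/observe/rollback are callables taking a stage name. `observe`
    stands in for the sensing component and returns True when healthy.
    """
    for stage in stages:
        deploy(stage)
        if not observe(stage):
            rollback(stage)
            return "rolled back at " + stage
    return "released"


log = []
health_by_stage = {"isolated": True, "partial": False, "full": True}
outcome = run_pipeline(
    ["isolated", "partial", "full"],
    deploy=lambda s: log.append("deploy " + s),
    observe=lambda s: health_by_stage[s],
    rollback=lambda s: log.append("rollback " + s),
)
print(outcome, log)
```

No step waits on a person: the orchestrator only acts on what the sensing component reports.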

Page 52: The Selfish Stack [FutureStack16 NYC]

Is This Selfish?

Page 53: The Selfish Stack [FutureStack16 NYC]

Think About the System

“My new service is going to use up all the memory on the host, but it needs it”

“Yeah…I don’t have that to spare.”

Page 54: The Selfish Stack [FutureStack16 NYC]

Think About the System

“Not all the tests passed but getting this out is really important”

“No test passing; No way I’m deploying”

Page 55: The Selfish Stack [FutureStack16 NYC]

Think About the System

“The best way for me to monitor my new app is this new metrics tool nobody has used”

“So I only speak to my friend New Relic and he said your app just slowed down by 5x…”

Page 56: The Selfish Stack [FutureStack16 NYC]

Think About the System

“I feel really good about this”

“Me too…”

“Bad News Kid…I ran out of worker nodes…”

Page 57: The Selfish Stack [FutureStack16 NYC]

The Result

▶ Environmental Consistency

▶ Process That Is Appealing

▶ Early Detection and Response

▶ Instant Intervention and Rollback

Page 58: The Selfish Stack [FutureStack16 NYC]

Pain Points

Back to Our Pain Points

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Page 60: The Selfish Stack [FutureStack16 NYC]

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Pain Points

Back to Our Pain Points

Selfish

Page 61: The Selfish Stack [FutureStack16 NYC]

▶ Manually Set Static Configuration

▶ Manual Monitoring

▶ Process Level Health Checks

▶ Manual CD Pipeline (even with automated tests)

▶ All or Nothing Deployments

Pain Points

Back to Our Pain Points

Does Scale

Page 62: The Selfish Stack [FutureStack16 NYC]

What do we see now?

[Chart: Yodle Service Count by month, Nov-14 through May-16 (y-axis 0 to 250)]

Page 63: The Selfish Stack [FutureStack16 NYC]

What do we see now?

[Chart: Monthly Deployments by month, Nov-14 through May-16 (y-axis 0 to 1400)]

Page 64: The Selfish Stack [FutureStack16 NYC]

Entitled Capabilities

▶ Sustainable

▶ Independent

▶ Scalable

▶ Clear Intention

Page 65: The Selfish Stack [FutureStack16 NYC]

▶ Autonomic Systems informed our direction

▶ Automatic Decisions are made based on basic health stats and New Relic Data

▶ Imagine additional sensors:

▶ Database Load

▶ Service Mesh Health

▶ Custom Metrics

▶ Browser Product Data

▶ Give your CI/CD Processes insights

Extending This Idea
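
Adding sensors is easier if each one reports in the same vocabulary and the pipeline acts on the worst reading. The sketch below assumes a simple green/yellow/red scale and a dictionary of sensor callables; the sensor names and the `overall_health` function are hypothetical.

```python
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def overall_health(sensors):
    """Poll every sensor and return (worst_color, readings).

    `sensors` maps a sensor name to a zero-argument callable that
    returns "green", "yellow", or "red".
    """
    readings = {name: read() for name, read in sensors.items()}
    worst = max(readings.values(), key=SEVERITY.__getitem__, default="green")
    return worst, readings


sensors = {
    "new_relic": lambda: "green",
    "database_load": lambda: "yellow",
    "service_mesh": lambda: "green",
}
worst, readings = overall_health(sensors)
print(worst, readings)
```

With this shape, a new signal such as a custom metric becomes one more entry in the dictionary, and the delivery pipeline's decision logic does not change.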

Page 66: The Selfish Stack [FutureStack16 NYC]

I am Selfish

I CARE ABOUT YOU BY CARING ABOUT MYSELF