Rails Operations - Lessons Learned

Page 1: Rails Operations -  Lessons Learned

Rails Operations

Lessons learned from deploying and managing hundreds of Rails applications

Thanks for coming out this morning. I know it’s hungover o’clock, so it means a lot. You are dedicated, upstanding individuals.

Page 2: Rails Operations -  Lessons Learned

Oh hi, I’m Josh

Page 3: Rails Operations -  Lessons Learned

• @techpickles

• http://github.com/technicalpickles

• http://technicalpickles.com

I am from the internet

Page 4: Rails Operations -  Lessons Learned
Page 5: Rails Operations -  Lessons Learned

Awesomeness Engineer of Supreme Versatility

II

My official title is Awesomeness Engineer of Supreme Versatility II. (I was recently promoted.)

Page 6: Rails Operations -  Lessons Learned

Managed hosting and operations

We’re mostly known for our hosting. What isn’t as well known is our managed services. For this, we engage more closely with our customers.

When bringing on new managed customers, we work with them to spec out servers and review their application’s needs. We get them up and running on these servers with our configuration management tool, Moonshine. And once deployed, we provide 24x7 monitoring. If your server goes down, we let you know, and get it back online as soon as possible, regardless of when it happens.

And that’s not all. Once live, we provide operational support. That can be anything from application performance analysis and recommending architecture improvements to installing and managing new software on servers, or just being there to give feedback on how the application is operating.

You can basically think of us as a Rails Operations company.

Page 7: Rails Operations -  Lessons Learned

I’m talking about Rails Operations

Conveniently enough, I’m talking about Rails Operations today.

Page 8: Rails Operations -  Lessons Learned

WTF is Rails Operations?

I found this hard to distill down to a simple statement.

I think it’s safe to say that the majority of us are developers. We write code, build applications, launch products.

In a lot of organizations, operations is something different. People associate operations with system administration. And to an extent, this can be fairly accurate. Different people, different teams, different. As developers, we write some code, toss it over the wall, and let _them_ handle it.

I think this is a bit flawed. The code you write has an operational impact. The systems you run it on have an operational impact on your code. It’s a complex relationship, and when developer and operations teams are separate, it’s hard to bridge the gap between them, since it’s neither’s responsibility.

Page 9: Rails Operations -  Lessons Learned

Development and maintenance of a production Rails application

The simplest definition I’ve found is this.

Page 10: Rails Operations -  Lessons Learned

Very important assumption

You develop code that will eventually go into production and, due in part to some business model, generate revenue

That is to say, you are part of some organization

Page 11: Rails Operations -  Lessons Learned
Page 12: Rails Operations -  Lessons Learned

Before we dig in too deep...

Page 13: Rails Operations -  Lessons Learned

Let’s talk about the business. We need to start with where development and operations fit within the rest of The Business.

Page 14: Rails Operations -  Lessons Learned

Question

Does development generate revenue?

Page 15: Rails Operations -  Lessons Learned

• Takes place on laptops, desktop machines, staging servers

• No real users

• Unknown if it truly works

• Tests are green, but...

Page 16: Rails Operations -  Lessons Learned

NO

Page 17: Rails Operations -  Lessons Learned

but it CREATES potential revenue

Page 18: Rails Operations -  Lessons Learned

• Step 1: Development

• Step 2: .......

• Step 3: PROFIT

Page 19: Rails Operations -  Lessons Learned

Question

Does operations generate revenue?

Page 20: Rails Operations -  Lessons Learned

• Lives on servers located in data centers and clouds

• Real users

• Either code works, or it doesn’t

• Either the application is available or not

Page 21: Rails Operations -  Lessons Learned

NO

Just because your application works in production, doesn’t mean people are using it or buying your product.

Page 22: Rails Operations -  Lessons Learned

but it PRESERVES potential revenue

If you have good operations, that means users will be able to see your application working and actually be able to use it.

Page 23: Rails Operations -  Lessons Learned

• Step 1: Development

• Step 2: Operations

• Step 3: ......

• Step 4: PROFIT!

Page 24: Rails Operations -  Lessons Learned

Question

Uh, what generates revenue?

Page 25: Rails Operations -  Lessons Learned

Million Dollar Question

Page 26: Rails Operations -  Lessons Learned

• Working features (or at least that work enough)

• Infrastructure to keep the application up and running (or at least up enough)

• A business model

• Sheer determination

• Good luck

Page 27: Rails Operations -  Lessons Learned

Lessons learned

Alright. I’ve given you a definition of Rails Operations, and had a brief detour to talk about the business and where development and operations fit into it.

Now for some lessons. Basically, I’ll be going over some patterns, some antipatterns, and other practices and topics.

Page 28: Rails Operations -  Lessons Learned

Common threads

Putting this all together, I kept coming back to some common threads. That is, some ideas that apply to many aspects. I’m going to start you off with a few together, and then just jump into the lessons. We’ll probably pick up a few more along the way.

Page 29: Rails Operations -  Lessons Learned

Give a damn

If you don’t care about what you’re doing, everything else I’m talking about today probably doesn’t matter. I don’t think you need to worry about this though, since you are here.

Page 30: Rails Operations -  Lessons Learned

Earlier we talked about how operations preserves revenue. To that end, our goal is to mitigate risk as much as makes sense.

Page 31: Rails Operations -  Lessons Learned

Tradeoffs and compromise. Each possible solution has them. The trick is understanding that there are tradeoffs. What tradeoffs you make depends on what your priorities are. For example:

* Dollar signs
* Time
* Sanity
* Technical debt
* Higher risk

Page 32: Rails Operations -  Lessons Learned

Configuration Management

Pattern

Page 33: Rails Operations -  Lessons Learned

It’s about managing configuration.

duh.

Page 34: Rails Operations -  Lessons Learned

You write code that manages your servers’ configuration

Take a moment to think about how you might describe a server to someone. There’s plenty of nouns:

* packages
* users
* files
* cronjobs
* services

And some verbs:

* running commands

Page 35: Rails Operations -  Lessons Learned

• apache package is installed

• apache service is running

• deploy user exists

• cron jobs

• etc
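To make the idea of describing a server as data concrete, here is a minimal sketch. It is a toy model, not how Moonshine, Puppet, or Chef actually work: the `Resource` class, the `plan` function, and the example declarations are all hypothetical names made up for illustration. The point it shows is that you declare what should be true, compare it against what is true, and only act on the difference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One declared fact about a server (hypothetical model)."""
    kind: str   # "package", "service", "user", ...
    name: str   # "apache", "deploy", ...
    state: str  # "installed", "running", "present", ...


def plan(desired, actual):
    """Return the declared resources the server does not satisfy yet."""
    return [resource for resource in desired if resource not in actual]


# Hypothetical declaration mirroring the slide above.
DESIRED = [
    Resource("package", "apache", "installed"),
    Resource("service", "apache", "running"),
    Resource("user", "deploy", "present"),
]

# Hypothetical state of a server that only has the package so far.
fresh_server = {Resource("package", "apache", "installed")}

# What still needs doing on the fresh server.
todo = plan(DESIRED, fresh_server)

# Once the server matches the declaration, there is nothing left to do,
# which is what makes running it on every deploy safe.
converged = plan(DESIRED, set(DESIRED))
```

Real tools add the part this sketch leaves out: actually installing the package, starting the service, and creating the user.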

Page 36: Rails Operations -  Lessons Learned

• Moonshine

• Puppet

• Chef

Page 37: Rails Operations -  Lessons Learned

Automation

Bootstrapping. Anyone that has set up a new server from scratch can tell you... it’s time consuming, labor intensive, and error prone.

Bootstrapping is just part of it, though, and it only ever happens once. What’s more interesting is that you can use this to manage your infrastructure as it evolves. Need to start using Redis? Just add it to your configuration management, and you’ll have it next deploy.

Page 38: Rails Operations -  Lessons Learned

The best way to illustrate why you should be using configuration management is to explore the consequences of not using it.

Imagine it’s time to add a new application server. Your application is under heavy load, and needs this server to be up and serving requests. How long will it take you to get it up? And how will you know it’s set up correctly? If you’re doing this all manually, you can’t really know the answers to these questions.

Here’s another example. Adding a new dependency to your application. It can be a gem, a native package, a new daemon, whatever. How do you ensure this gets on the server when you need it? Deploy and pray? Log into the server and install it yourself? This sucks, and it’s kind of risky, especially if you’re talking about production.

Page 39: Rails Operations -  Lessons Learned

As always, there are tradeoffs to be made.

Setting up and learning how to do configuration management takes time. Time that could be spent working on user-facing tasks.

Skipping it means taking on the risk of having to cold deploy, or having deploys fail because of missing dependencies.

Usually, the balance is to take the risk and have it burn you enough times that it’s more painful to keep going without configuration management than it is to stop and get your configuration management on.

If you already know it, it’s a no-brainer. Just DO IT.

Page 40: Rails Operations -  Lessons Learned

Staging Servers

Pattern

Page 41: Rails Operations -  Lessons Learned

Preproduction servers

Staging servers are all about being a testbed between

Page 42: Rails Operations -  Lessons Learned

Helps ensure correctness of deploy

Page 43: Rails Operations -  Lessons Learned

configuration management + staging servers = VERY YES

If you use configuration management, and have staging servers, then this is a huge win.

We talked about adding new dependencies earlier. If you are doing configuration management, then staging is the first place you can see if you’re doing it right.

Page 44: Rails Operations -  Lessons Learned

There’s basically no downside to using staging servers. The only tradeoff though is that servers do cost dollar signs and staging servers are no different. This leads us to a new thread...

Page 45: Rails Operations -  Lessons Learned

Maths... look around you. In most cases, you can do some dollar sign math to justify costs of a thing. Let’s try this.

A staging server may cost $60/mo

But how can you calculate the cost of not having a staging server? Let’s assume that if you don’t have a staging server, you’re bound to do a bad deploy that it could have prevented. Some code that doesn’t work outright, or is otherwise flawed. Let’s say it causes an hour of downtime while you determine the problem and try to fix it. Do you know how much it costs your business in lost revenue to be down an hour?

This is actually a pretty mature question, and I’d be surprised if many people can answer it off hand. In any event, I think we can do some fuzzy math to say yeah, it probably is more than $60. If that’s the case, then one failed deploy a month is enough to validate a staging server.
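The fuzzy math above can be written down as a small sketch. The $60/mo figure and the one hour of downtime come from the example; everything else is an assumption for illustration, including the function names and the simplification that a staging server would have caught every bad deploy.

```python
STAGING_COST_PER_MONTH = 60.0  # dollars, from the example above


def avoided_loss(revenue_lost_per_hour, bad_deploys_per_month=1,
                 hours_down_per_bad_deploy=1.0):
    """Revenue per month that staging would have saved (assumed model)."""
    return (revenue_lost_per_hour
            * hours_down_per_bad_deploy
            * bad_deploys_per_month)


def staging_pays_off(revenue_lost_per_hour, **kwargs):
    """True when the avoided loss is larger than the server bill."""
    return avoided_loss(revenue_lost_per_hour, **kwargs) > STAGING_COST_PER_MONTH


def break_even_revenue_per_hour(bad_deploys_per_month=1,
                                hours_down_per_bad_deploy=1.0):
    """Hourly revenue at which staging exactly pays for itself."""
    return STAGING_COST_PER_MONTH / (
        bad_deploys_per_month * hours_down_per_bad_deploy
    )
```

With one bad deploy a month and an hour of downtime each, the break-even point is $60 of lost revenue per hour. If an hour of downtime costs you more than that, the staging server is the cheaper option.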

Page 46: Rails Operations -  Lessons Learned

Repeat after me

• development

• staging

• production

Page 47: Rails Operations -  Lessons Learned

capistrano-gitflow

Whenever possible, I like to enforce standards by means of automation.

For the flow of code from development -> staging -> production, we have capistrano-gitflow. Originally done up by apinstein, I did some refactorings and cleaned it up enough to be usable as a gem.

Effectively, this enforces development -> staging -> production. Whenever you deploy to staging, it tags the current branch including information about the date, the user deploying, and a small blurb about the changes. Assuming this is cool, you can promote a tag to production and go on from there. If you haven’t deployed to staging yet, you’ll be prompted and it will default to using the last production tag.
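To illustrate the tagging idea, here is a rough sketch of building a tag out of the date, the deploying user, and a blurb, then promoting it. This is not capistrano-gitflow’s code, and the exact tag layout and the `staging_tag` and `production_tag` helpers are assumptions made up for this example.

```python
import re
from datetime import datetime


def staging_tag(when, user, blurb):
    """Build a tag name from date, user, and blurb (hypothetical layout)."""
    # Reduce the blurb to lowercase words joined by dashes so it is tag-safe.
    slug = re.sub(r"[^a-z0-9]+", "-", blurb.lower()).strip("-")
    return f"staging-{when:%Y-%m-%d-%H-%M-%S}-{user}-{slug}"


def production_tag(staging):
    """Promote a staging tag by swapping its prefix (assumed convention)."""
    prefix = "staging-"
    if not staging.startswith(prefix):
        # Refusing anything else is what enforces staging before production.
        raise ValueError("only staging tags can be promoted")
    return "production-" + staging[len(prefix):]


tag = staging_tag(datetime(2011, 8, 27, 9, 30, 0), "josh", "Fix login!")
promoted = production_tag(tag)
```

The useful property is that a production tag can only be derived from something that already went to staging, so the standard is enforced by the tooling rather than by remembering to follow it.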

Page 48: Rails Operations -  Lessons Learned

Deploy early, deploy often

Pattern

Page 49: Rails Operations -  Lessons Learned

A play on release early, release often.

Although technically, I guess it’s the same

It’s basically the same thing we hear in the open source community.

The sooner you release code, the sooner you can validate it and the sooner you can get feedback. Does it work? Does it not break the entire site? Are users happy?

Page 50: Rails Operations -  Lessons Learned

By deploying early and often, we’re also limiting risk. The fewer changes that go out in a single deploy, the fewer things there are that can possibly break. By waiting to deploy, you’re accumulating a larger set of changes to deploy, and therefore there’s more surface area to debug if it breaks.

Page 51: Rails Operations -  Lessons Learned

In a way, you can consider undeployed code a liability.

Imagine spending a day or two doing some code cleanups to get ready for a sprint. Should you deploy when you are done and happy with the refactorings, or should you go ahead and do your sprint?

If it were me, I’d deploy the refactorings first. That way, the code is out there, and you’ll know if it performs equally to its nonrefactored version. It’s really easy to introduce performance-killing changes in even a few-line diff.

If you instead wait and deploy with new features, if anything goes awry, you have significantly more code to spelunk to track down a potential problem.

Page 52: Rails Operations -  Lessons Learned

Feeling Driven Development

Antipattern

Page 53: Rails Operations -  Lessons Learned

Oh feelings.

Page 54: Rails Operations -  Lessons Learned

The front page feels slow

Page 55: Rails Operations -  Lessons Learned

The primary key seems like it’s increasing rapidly

Page 56: Rails Operations -  Lessons Learned

IO seems high

Page 57: Rails Operations -  Lessons Learned

What does it even mean?

This drives me nuts. By saying something ‘feels’ slow, there’s an implied assumption. The assumption is that it should be fast. Saying it like that is...weird, because it gives no indication of what is slow or not.

The trick is in determining what the assumption is, and then finding a way to measure and identify the problem.

How can we do this?

Page 58: Rails Operations -  Lessons Learned

Science Driven Development

Counterpattern

Page 59: Rails Operations -  Lessons Learned
Page 60: Rails Operations -  Lessons Learned

Metrics everywhere!

With the right tools, you can continuously collect data so you have it in your pocket when you need it.

Page 61: Rails Operations -  Lessons Learned

• New Relic - http://newrelic.com

• Scout - http://scoutapp.com

These are the two we use and highly recommend.

New Relic is really great for giving a high level view of your application. We’re talking at the request response level, including all sorts of fun maths with most time consuming requests, highest standard deviation, etc. It also breaks down requests by where time is spent. Like if it’s all in the view, the controller, the database, partials, etc.

Scout is useful for other reasons. While New Relic is good for high level understanding of your application, Scout is a bit more low level. You can use it to collect metrics about your servers, and how well they are running. Memory, CPU, disk space, IO, MySQL connection stats, and so on.

I really believe these are a great combination, because New Relic can point you in the direction of a problem area, and Scout can help you better understand what’s contributing to it at a system level.
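To give a feel for the “fun maths” mentioned above, here is a minimal sketch of summarizing response times per endpoint by total time consumed, mean, and standard deviation. It is not New Relic’s or Scout’s API. The `record` and `report` functions and the timings are made up for illustration, and it uses the population standard deviation as a simplifying assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

# endpoint -> list of observed response times in seconds
samples = defaultdict(list)


def record(endpoint, seconds):
    """Store one observed response time for an endpoint."""
    samples[endpoint].append(seconds)


def report():
    """Summarize each endpoint, most total time consumed first."""
    rows = [
        {
            "endpoint": endpoint,
            "total": sum(times),
            "mean": mean(times),
            "stdev": pstdev(times),
        }
        for endpoint, times in samples.items()
    ]
    return sorted(rows, key=lambda row: row["total"], reverse=True)


# Made-up timings, purely for illustration.
record("/", 10.0)
record("/", 8.0)
record("/about", 0.2)
summary = report()
```

Sorting by total time rather than mean is a deliberate choice: a moderately slow page that gets hit constantly can cost more than a very slow page nobody visits.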

Page 62: Rails Operations -  Lessons Learned

The front page feels slow

The front page is taking 10 seconds to load, but we really need it to be loading in under 1 second

Page 63: Rails Operations -  Lessons Learned

The primary key seems like it’s increasing rapidly

The primary key is at 90% of its maximum, up from 80% yesterday, and looks like it’ll run out overnight.
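Turning the feeling into a number takes very little code. The sketch below assumes a signed 32-bit integer key column and straight-line growth, both of which are assumptions for illustration. The function names are hypothetical too.

```python
SIGNED_32_BIT_MAX = 2**31 - 1  # 2147483647, assuming a signed 32-bit key


def key_usage_pct(current_max_id, limit=SIGNED_32_BIT_MAX):
    """Percentage of the key space already used."""
    return 100.0 * current_max_id / limit


def days_until_exhausted(pct_yesterday, pct_today):
    """Linear extrapolation of how many days remain, or None if not growing."""
    growth_per_day = pct_today - pct_yesterday
    if growth_per_day <= 0:
        return None
    return (100.0 - pct_today) / growth_per_day


# The numbers from the slide: 80% yesterday, 90% today.
remaining = days_until_exhausted(80.0, 90.0)
```

With the slide’s numbers this gives about one day of headroom, which is the kind of statement you can actually act on.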

Page 64: Rails Operations -  Lessons Learned

IO seems high

IO fluctuates up to 90% sometimes, but doesn’t appear to have a negative effect

Page 65: Rails Operations -  Lessons Learned
Page 66: Rails Operations -  Lessons Learned

Monitoring

Topic

Page 67: Rails Operations -  Lessons Learned

How do you know when everything is awful?

Page 68: Rails Operations -  Lessons Learned

How would you prefer to know?

• Angry tweets

• Angry email from your boss

• You personally checking everything all the time

• An automated system to let you know

Page 69: Rails Operations -  Lessons Learned

• Nagios

• Scout

Page 70: Rails Operations -  Lessons Learned

What to monitor

It’s not a problem til it’s a problem

Page 71: Rails Operations -  Lessons Learned

Define priority

Does it wake someone up?

Page 72: Rails Operations -  Lessons Learned

Must be actionable

Page 73: Rails Operations -  Lessons Learned

Single point of contact

If everything is awful, there needs to be a single point of contact. They take point, acknowledge the problem, and begin looking into it. If need be, they bring in others.
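The monitoring rules above (define priority, must be actionable) can be sketched as data plus one routing function. This is not Nagios or Scout configuration. The `Check` class, the `route` function, the thresholds, and the example checks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    threshold: float        # alert when the measured value reaches this
    wakes_someone_up: bool  # the priority question from the slide
    action: str             # what the responder is expected to do


def route(check, value):
    """Return None, 'page', or 'email' for one measurement."""
    if not check.action:
        # A check nobody can act on is noise, so refuse to use it.
        raise ValueError(f"{check.name}: alert must be actionable")
    if value < check.threshold:
        return None
    return "page" if check.wakes_someone_up else "email"


# Hypothetical checks, purely for illustration.
site_down = Check("failed health checks", 1, True,
                  "restart app servers, check load balancer")
disk_usage = Check("disk usage %", 80, False, "rotate logs or add disk")
```

Here `route(site_down, 1)` pages someone, while `route(disk_usage, 85)` only sends an email that can wait until morning.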

Page 74: Rails Operations -  Lessons Learned
Page 75: Rails Operations -  Lessons Learned

Vertical scaling

Pattern

Page 76: Rails Operations -  Lessons Learned

Your app is slow

Now what?

Page 77: Rails Operations -  Lessons Learned
Page 78: Rails Operations -  Lessons Learned

Resources are (relatively) cheap

Page 79: Rails Operations -  Lessons Learned

Developers are (relatively) expensive

Page 80: Rails Operations -  Lessons Learned

Imagine having memory issues.

Page 81: Rails Operations -  Lessons Learned

As always there’s a balance.

Remember, it’s a tradeoff to optimize for developer time by vertically scaling. It buys you time to either deal

Page 82: Rails Operations -  Lessons Learned

Hipster Stack

Antipattern

Page 83: Rails Operations -  Lessons Learned
Page 84: Rails Operations -  Lessons Learned

“I read a blog post about how mongo is totally web scale”

Page 85: Rails Operations -  Lessons Learned

Cargo cult operations

Page 86: Rails Operations -  Lessons Learned
Page 87: Rails Operations -  Lessons Learned
Page 88: Rails Operations -  Lessons Learned

Remember what’s important for the business? Do you want to become the expert at <insert technology here>? Is it really the most valuable thing you can be doing?

Page 89: Rails Operations -  Lessons Learned

If you’re still going to go hipster...

• experiment in branches

• understand operational impact

• Staging!

Page 90: Rails Operations -  Lessons Learned

Test in production

Wait, what?

Page 91: Rails Operations -  Lessons Learned

Further Reading

• Web Operations - John Allspaw and Jesse Robbins

• Continuous Delivery - Jez Humble and David Farley

• “Web Operations for Developers 101”

http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/ref=sr_1_1?s=books&ie=UTF8&qid=1314447411&sr=1-1

http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_4?s=books&ie=UTF8&qid=1314447411&sr=1-4

http://www.paperplanes.de/2011/7/25/web_operations_101_for_developers.html

Page 92: Rails Operations -  Lessons Learned

Fin.

Page 93: Rails Operations -  Lessons Learned

Want to talk ops? Find me here

josh@railsmachine

@techpickles

Page 94: Rails Operations -  Lessons Learned

Do you like these things?

• Rails

• Operations

• Ping Pong

• Beer

We are hiring