How Shopify Scales Rails

John Duff

Overview

• The Shopify stack

• Knowing what to scale

• How we cache

• Scaling beyond caching

• Splitting things up

What is Shopify?

The Stack

• Ruby 1.9.3-p385

• Rails 3.2

• Percona MySQL 5.5

• Unicorn 4.5

• Memcached 1.4.14

• Redis 2.6

The Stack

• 53 App Servers

• 1590 Unicorn Workers

• 5 Job Servers

• 370 Job Workers

[Diagram: app server layers, top to bottom: Nginx, Unicorn, Rails 3.2, Ruby 1.9.3-p385]

[Architecture diagram: Firewall, Load Balancer, App Servers, Redis, Job Servers, Database, Memcached, Search]

• 55,873 Lines of application code

• 15,968 Lines of CoffeeScript application code

• 81,892 Lines of test code

• 211 Controllers

• 468 Models

Current Scale

• 9.9 M orders: an order every 3.2 seconds

• 2,008 sales per minute on Cyber Monday

• 50,000 RPM at a 45 ms response time

• 13.3 billion requests

Looking Back, to Look Ahead

• First line of code written in 2004

• Shopify released June, 2006

• Same codebase

• Over 9 years of Rails upgrades, improvements and changes

Looking Back, to Look Ahead

• 6,702 Lines of application code (now 55,873)

• 4,386 Lines of test code (now 81,892)

• 38 Controllers (now 211)

• 77 Models (now 468)

Looking Back, to Look Ahead

• Ruby 1.8.2

• Rails 0.13.1

• MySQL 4.1

• Lighttpd

• Memcached

Know The System

One Request, One Process

RPM = W * 60 / R    (W = workers, R = average response time in seconds)

RPM = 1590 * 60 / 0.072 = 1,325,000

↑ Workers

↓ Response Time
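As a quick check of the arithmetic above, the capacity formula can be written as a small Ruby function. This is only a sketch: the name `max_rpm` is made up for illustration, the response time is passed in milliseconds to keep the arithmetic exact, and the result is a theoretical ceiling that assumes every worker is busy all the time.

```ruby
# Theoretical requests per minute for a pool of single-request workers.
# workers:          number of Unicorn workers (each handles one request at a time)
# response_time_ms: average response time in milliseconds
def max_rpm(workers, response_time_ms)
  workers * 60_000.0 / response_time_ms
end

puts max_rpm(1590, 72)      # 1590 workers at 72 ms
puts max_rpm(1590 * 2, 72)  # doubling the workers doubles the ceiling
puts max_rpm(1590, 36)      # halving the response time has the same effect
```

The last two calls show the two levers named on the slide: more workers, or a lower response time.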

Know The System

• Avoid network calls during requests

• Speed up unavoidable network calls

• The Storefront and Checkout

• The Chive

Chive Flash Sale

Measure ALL THE THINGS

• New Relic

• Splunk

• StatsD

• Cacti

• Conan
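To make the measuring idea concrete, here is a minimal sketch of timing a block of code and reporting it in the StatsD wire format (`name:value|ms`, sent over UDP). The helper names `statsd_timing_packet` and `measure`, the metric name, and the host and port are assumptions for illustration, not Shopify's instrumentation.

```ruby
require 'socket'

# Builds a StatsD timing packet: "<metric>:<milliseconds>|ms".
def statsd_timing_packet(metric, millis)
  "#{metric}:#{millis.round}|ms"
end

# Times the block and hands the packet to `sender` (any callable).
# Returns the block's own result so it can wrap existing code.
def measure(metric, sender)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  sender.call(statsd_timing_packet(metric, elapsed_ms))
  result
end

# Fire-and-forget UDP sender. Host and port are assumptions
# (8125 is the conventional StatsD port). Defined here but not called.
udp_sender = ->(packet) { UDPSocket.new.send(packet, 0, '127.0.0.1', 8125) }

# For the demo, print the packet instead of sending it.
measure('storefront.render', ->(packet) { puts packet }) { sleep 0.01 }
```

Passing the sender in as a callable keeps the timing logic testable without a running StatsD daemon.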

Caching

cacheable

• https://github.com/Shopify/cacheable

• serve gzip’d content

• ETag and 304 Not Modified

• generational caching

• no explicit expiry

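cacheable

```ruby
class PostsController < ApplicationController
  def show
    # Serve the whole response from cache when the key data is unchanged.
    response_cache do
      @post = @shop.posts.find(params[:id])
      respond_with(@post)
    end
  end

  # Everything that identifies this response. Bumping the shop's version
  # changes the key, so stale entries are simply never read again.
  def cache_key_data
    {
      :action => action_name,
      :format => request.format,
      :params => params.slice(:id),
      :shop_version => @shop.version
    }
  end
end
```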
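The combination of generational keys, ETags, and "no explicit expiry" can be sketched in plain Ruby. This is an illustration under stated assumptions, not the cacheable gem's implementation: `STORE` is an in-memory stand-in for memcached, and `cache_key` and `respond` are hypothetical helper names.

```ruby
require 'digest/md5'

# In-memory stand-in for memcached (assumption for the sketch).
STORE = {}

# Generational key: the shop version is part of the data being hashed,
# so a version bump yields a brand new key.
def cache_key(data)
  Digest::MD5.hexdigest(data.sort.inspect)
end

# Returns [status, body, etag]. A matching If-None-Match value
# short-circuits to 304 Not Modified without rendering anything.
def respond(data, if_none_match = nil)
  key = cache_key(data)
  etag = %("#{key}")
  return [304, '', etag] if if_none_match == etag

  body = STORE[key] ||= yield # render only on a cache miss
  [200, body, etag]
end

renders = 0
render = -> { renders += 1; '<h1>Post 1</h1>' }
data = { :action => 'show', :params => { :id => 1 }, :shop_version => 7 }

_status, _body, etag = respond(data) { render.call }   # miss: renders
respond(data) { render.call }                          # hit: served from STORE
respond(data, etag) { render.call }                    # client copy is current: 304
respond(data.merge(:shop_version => 8)) { render.call } # version bump: new key, renders
puts renders
```

Nothing is ever deleted from the store here. Old generations are left for the cache's own eviction to reclaim, which is what makes explicit expiry unnecessary.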

[Chart: Caching Dynamic 404s (requests)]

Identity Cache

• https://github.com/Shopify/identity_cache

• cache full model objects in memcached

• can include associated objects in cache

• must opt in to the cache

• explicit, but automatic expiry

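Identity Cache

```ruby
class Product < ActiveRecord::Base
  include IdentityCache
  has_many :images

  # Opt in: cache lookups by (shop_id, id) and embed the images
  # in the same cache entry as the product.
  cache_index [:shop_id, :id]
  cache_has_many :images, :embed => true
end

# Reads go through the cache; a miss falls back to the database.
@product = Product.fetch_by_shop_id_and_id(shop_id, id)
@images = @product.fetch_images
```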
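The "explicit, but automatic expiry" idea can be illustrated with a read-through cache whose entries are deleted whenever the record is written. This is a simplified sketch and not how IdentityCache is implemented: `TinyCache` stands in for memcached, and `ProductStore` is a hypothetical stand-in for the model and its database table.

```ruby
# In-memory stand-in for memcached (assumption for the sketch).
class TinyCache
  def initialize
    @data = {}
  end

  # Read-through: return the cached value, or compute and store it.
  def fetch(key)
    return @data[key] if @data.key?(key)

    @data[key] = yield
  end

  def delete(key)
    @data.delete(key)
  end
end

# Hypothetical record store that counts "database" reads.
class ProductStore
  attr_reader :db_reads

  def initialize(cache)
    @cache = cache
    @rows = {}
    @db_reads = 0
  end

  # Analogue of fetch_by_shop_id_and_id: cache first, database on a miss.
  def fetch(shop_id, id)
    @cache.fetch(key(shop_id, id)) do
      @db_reads += 1
      @rows[[shop_id, id]]
    end
  end

  # Writes go to the database and expire the cached copy automatically.
  def save(shop_id, id, attrs)
    @rows[[shop_id, id]] = attrs
    @cache.delete(key(shop_id, id))
  end

  private

  def key(shop_id, id)
    "product:#{shop_id}:#{id}"
  end
end

store = ProductStore.new(TinyCache.new)
store.save(1, 42, { :title => 'Shirt' })
store.fetch(1, 42)
store.fetch(1, 42) # served from cache, no second database read
puts store.db_reads
```

Callers opt in by calling `fetch` instead of reading the rows directly, and they never expire anything by hand.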

Get Out of My Process

Delayed Job

• Jobs stored in the db

• Workers run in their own process

• Workers poll for jobs periodically

• https://github.com/collectiveidea/delayed_job

Resque

• Redis backed

• O(1) operation to pop jobs

• Faster than Delayed Job (300 jobs/sec vs 120 jobs/sec)

• Extensible

• https://github.com/defunkt/resque

Resque

• Sending Email

• Processing Payments

• Geolocation

• Import / Export

• Indexing for Search

• 86 Other things...

[Chart: Background Payment Processing (ms)]

Resque

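```ruby
class AddressGeolocationJob
  # max_retries is not part of core Resque; it is presumably provided
  # by a retry extension in the application.
  max_retries 3

  # Resque serializes job arguments as JSON, so hash keys arrive
  # as strings ('model', 'id') rather than symbols.
  def self.perform(params)
    object = params['model'].constantize.find(params['id'])
    object.latitude, object.longitude = Geocoder.geocode(object)
    object.save!
  end
end

Resque.enqueue(AddressGeolocationJob, :id => 1, :model => 'Address')
```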
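The enqueue and perform cycle can be sketched without Redis to show what happens to the arguments along the way. Resque stores each job as a JSON payload with `class` and `args` fields. Everything else here is an assumption for illustration: `QUEUE` is an in-memory array standing in for the Redis list, and `EchoJob`, `enqueue`, and `work_one` are hypothetical names.

```ruby
require 'json'

# In-memory stand-in for a Redis list (assumption for the sketch).
QUEUE = []

# Hypothetical job that just reports what it received.
class EchoJob
  def self.perform(params)
    "#{params['model']}##{params['id']}"
  end
end

# Serialize the job as a JSON string, the way a Redis-backed queue must.
def enqueue(job_class, *args)
  QUEUE.push(JSON.generate('class' => job_class.name, 'args' => args))
end

# Take one payload from the head of the queue, decode it, and run the job.
def work_one
  payload = QUEUE.shift
  return nil if payload.nil?

  job = JSON.parse(payload)
  Object.const_get(job['class']).perform(*job['args'])
end

enqueue(EchoJob, :id => 1, :model => 'Address')
puts work_one # the symbol keys became strings in the JSON round trip
```

Because the payload is only a class name and plain data, it is safer to pass record ids than live objects. The worker process reloads what it needs.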

Redis

• Inventory reservation system

• Sessions

• Theme uploads

• Throttling

• Carts
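As one example from the list above, throttling is commonly built on an atomic counter with an expiry, which in Redis would be INCR plus EXPIRE. The sketch below mimics that pattern with an in-memory stand-in so that it runs without a server. The class `WindowCounter`, the method `allowed?`, and the fixed-window approach are assumptions for illustration. The slides do not say how Shopify's throttle works.

```ruby
# In-memory stand-in for a Redis counter with a per-key expiry (assumption).
class WindowCounter
  def initialize
    @counts = {}
  end

  # Increments the counter for `key`, starting a fresh window of `ttl`
  # seconds when none is active. `now` is injectable for testing.
  def incr(key, ttl, now = Time.now.to_f)
    count, expires_at = @counts[key]
    if expires_at.nil? || now >= expires_at
      count = 0
      expires_at = now + ttl
    end
    @counts[key] = [count + 1, expires_at]
    count + 1
  end
end

# Allows at most `limit` calls per `window` seconds for each client.
def allowed?(counter, client_id, limit, window, now = Time.now.to_f)
  counter.incr("throttle:#{client_id}", window, now) <= limit
end

counter = WindowCounter.new
3.times { puts allowed?(counter, 'shop-1', 2, 60) }
```

Keeping the counter outside the Rails process matters because every Unicorn worker has to see the same count.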

All Roads Lead To MySQL

MySQL Hardware

• 4 x 8 Core Processor

• SSD

• 256 GB Ram

• Full working set in memory

MySQL Query Optimization

• pt-query-digest

• Avoid queries that generate temp tables

• Adding the right indexes

• Forcing / Ignoring Indexes

MySQL Tuning

• disable innodb_stats_on_metadata

• increase table_open_cache

• replace glibc memory allocator with tcmalloc

• innodb_autoinc_lock_mode=2 (interleaved)

after_commit: a db transaction's best friend

• After transaction has been committed

• Webhooks

• Cache expiry

• Update associated objects

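after_commit

```ruby
class OrderObserver < ActiveRecord::Observer
  observe :order

  # Runs inside the transaction: only record that work is needed.
  # flag_for_after_commit is an application helper, as on the slide.
  def after_save(order)
    # Dirty-tracking keys are strings, so use the attribute helper
    # rather than changes.keys.include?(:financial_status).
    if order.financial_status_changed?
      order.flag_for_after_commit(:update_customer)
    end
  end

  # Runs once the transaction is committed, so the job sees the saved row.
  def after_commit(order)
    if order.flagged_for_after_commit?(:update_customer)
      Resque.enqueue(UpdateCustomerJob, :id => order.id)
    end
  end
end
```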
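Why the job is enqueued in `after_commit` and not in `after_save` can be shown with a toy transaction. If side effects fire before the commit, a worker may look for a row that is not yet visible, or act on a change that is later rolled back. The sketch below is a plain-Ruby illustration, not ActiveRecord's implementation, and `MiniTransaction` is a hypothetical class.

```ruby
# Hypothetical mini transaction: hooks registered inside the block
# run only if the block finishes without raising.
class MiniTransaction
  def initialize
    @hooks = []
  end

  # Queue a side effect to run after a successful commit.
  def after_commit(&hook)
    @hooks << hook
  end

  # Run every queued side effect, in registration order.
  def commit!
    @hooks.each(&:call)
  end

  # Returns true on commit, false on rollback.
  def self.run
    tx = new
    begin
      yield tx
    rescue StandardError
      return false # rolled back: queued hooks are dropped
    end
    tx.commit!
    true
  end
end

enqueued = []

MiniTransaction.run do |tx|
  tx.after_commit { enqueued << [:UpdateCustomerJob, 1] }
end

MiniTransaction.run do |tx|
  tx.after_commit { enqueued << [:UpdateCustomerJob, 2] }
  raise 'validation failed'
end

p enqueued # only the job from the committed transaction is present
```

The same reasoning applies to the other uses on the slide: a webhook or a cache expiry that fires for a rolled-back change leaves the outside world out of step with the database.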

Services

• Split out standalone services as needed

• Independently scaled

• Segmented metrics

• Overall system is more complex

• Limit to what is necessary

Imagery

Adapt and Evolve as Needed

Using data and knowledge of the system to drive decisions

Summary

• Know your application and infrastructure.

• Keep slow IO or CPU tasks out of the main process.

• Measure your optimizations. You can make it worse.

Thanks.

@johnduff | john.duff@shopify.com