Scalable Web Apps

Scalable web appsexecution time

vsdevelopment time

Piotr [email protected]

Types of scaling

Horizontal scalingscale out

Vertical scalingscale up

Server #3Server #2Server #1

Think about your app as a workernot single instance

Load balancer

App #1 App #2 App #3 App #4 App #5

OS

App

Server #3

Server #2

Server #1

Load balancer

App #1 App #2

App #3 App #4

App #5Load balancer

Server #n

Think about your app as a workernot single instance

Sessions

We need:

• Common• Fast• Persistent

Storage for sessions.

Sessions


Load balancer


Session storage

App

OS

Sessions - Redis

• Key-value in memory database (hash-tabled)• Scalable up to 1k nodes• Partitioning with Query routing• Non blocking M-S replication on nodes• Clustered (currently not production ready)http://athlan.pl/symfony2-redis-session-handler/

Redis - Partitioning with Query routing

Node #1 Node #2 Node #3

Query random

node

Miss Hit, abort

Also supported:• Client-side partitioning (app calls appropriate

node)• Proxy assisted partitioning (proxy selects

appropriate node)

Centralized Logging

• Logs should be centrailzed to avoid taking notice to each node separately

• Approaches:– File replication (rsync + cron)– syslog (easy to integrate with log4j)• syslogd over UDP p:514• rsyslog over TCP, stores data in db

Common storage, no local changes!

• Keep storage avaliable to all nodes– Symfony2 Gaufrette Bundle• FTP• Amazon S3• OpenCloud• AzureBlobStorage• Rackspace

Architecture


Load balancer


Session storage Files storage abstraction

Centralized logging

App

OS

OS

Continuous Integration

• To keep all nodes up-to-date, you need CI• Automatize disabling nodes, building,

deploying– Jenkins CI

Contineous Integration

1. Disable service on node2. Deploy/build app

1. Copy files2. Update db schema (liquibase, ORM schema

update)3. Execute scripts

3. Re-run service

Balance the payload - HAProxy

• Very, very fast proxy!• Software TCP/HTTP load balancer• Different node selecting algorithms:– roudrobin (limit 4128)– static-rr– leastconn (lowest number of connections)

Yeah guys, this is logo :)

But no schema is needed just imagine how it works.


• You can check node’s status by pinging• Dead node is excluded from balancing strategy

vi /etc/haproxy/haproxy.cfgoption httpchk HEAD /check.txt HTTP/1.0server webA 192.168.0.102:80 checkserver webB 192.168.0.103:80 check


• Monitor node’s status by read stats from socket via socat.

echo "show stat" | socat /tmp/haproxy.sock stdio


• Monitor node’s status by native stats webapp console

Nodes Monitoring - Zabbix

• Zabbix, centralized server monitoring

Zabbix + HAProxy

• UserParameter=haproxy.qcur[*], echo "show stat" | socat /tmp/haproxy.sock stdio | grep -i '$1' | sed 's/,/\ /g' | awk '{print $$3}'

Reverse Proxy and Varnish cache

http://tomayko.com/writings/things-caches-do

• Global virtual user = global cache


Reverse Proxy – Expiration model



Reverse Proxy – Expiration model



Reverse Proxy – Validation model



Reverse Proxy – Validation model




Varnish:80

Apache:81

App


Varnish:8080

Apache:8081

App

Varnish:8082

Apache:8083

App

HAProxy:80


Apache:8081

App

Apache:8082

App

HAProxy:81

Varnish:80

Varnish and ESI

<!DOCTYPE html><html><body> <esi:include src="http://..." /></body></html>

Scaling databases - Master slave

Master

Slave Slave Slave

Write

Read

• All data redundancy

MongoDB scaling

• Common models to spread data over nodes:– range keys– hash keys

• Many nodes on cheap machines• No all data redundancy in each node

MongoDB – range-based keys

• Awesome for range queries (grab data from min nodes – Query isolation)

• Not good enough to distribute data over nodes in case of monotinic incemental

http://docs.mongodb.org

http://docs.mongodb.org/

MongoDB – hash-based keys

• Take notice: not good for range queries whilemerge-sorting, no Query isolation in this case

• Write scaling – Write to many nodes simultaneously (take notice to readers-writer lock, where write is exclusive)



Mongodb sharding and clustering



CQRS

• Command Query Responsibility Segregation– separate application service layers for writing and

readng from DB (possibility to use different data sources like RAM or DB)

CQRS

• Examples– post-insert population cache• all SELECTs are from cache (even invalid)• consider LFU instead of LRU to invaidate cache

– pre-insert into memory• dump results periodicaly

In both approaches there is convenient to use Queues or data bus !

Queues, RabbitMQ

• RabbitMQ is based on AMQP (Advanced Message Queuing Protocol)– point-to-point– publish-and-subscribe– queueing, routing

• AMQP is not JMS (Java Message Service is an API, not protocol)

• Happy Rabit is empty Rabbit– do not try to store any data (messages) in queue

system in persistent mode to keep HA

• Simple queue

• Work queues(one consumer)

• Publish/Subscribe(many consumers)

Queues, RabbitMQ

Box vs spread architecture.

• Box architecture– no scaling– easy to maintenance

Server

RabbitMQ DBRedis

Webapp Varnish

Server #2

Box vs spread architecture.

• Spread architecture– High availability– more integrations, more administrative

Server #1

RabbitMQ

DB shardRedis

HAProxy

Varnish

Webapp

Server #3

DB shard Varnish

Webapp

Scalable web appsexecution time

vsdevelopment time

Piotr [email protected]

Technology

Scalable Web Apps