Example of a solution for retaining globally distributed high availability with MySQL
Retaining globally distributed high availability
Art van Scheppingen, Head of Database Engineering
2
1. Who is Spil Games? 2. Theory 3. Spil Storage Platform 4. Questions?
Overview
Who are we? Who is Spil Games?
4
• Company founded in 2001 • 350+ employees worldwide • 180M+ unique visitors per month • 45 portals in 19 languages • Casual games • Social games • Real-time multiplayer games • Mobile games
• 35+ MySQL clusters • 60k queries per second (3.5 billion qpd)
Facts
5
Geographic Reach 180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
• Over 45 localized portals in 19 languages • Multi-platform: web, mobile, tablet • Focus on casual and social games • 180M MAU (30M YoY growth) • Over 50M registered users
6
Girls, Teens and Family
spielen.com juegos.com gamesgames.com games.co.uk
Brands
Foundations The exciting theory
8
• What exactly does it mean?
Retaining globally distributed HA
9
Wikipedia: High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period. Oracle: • Availability of resources in a computer system
What is high availability?
10
• Master with (many) slave(s)
How do we reach HA with MySQL?
Master
Slave Slave Slave
11
• Master with (many) slave(s) • Multi Master
How do we reach HA with MySQL?
Master
Slave
Master
Slave
12
• Master with (many) slave(s) • Multi Master • Clustering
How do we reach HA with MySQL?
Mysqld Mysqld
ndbd
ndbd ndbd
ndbd ndbd
mgmt
13
• Master with (many) slave(s) • Multi Master • Clustering • Geographical redundancy
How do we reach HA with MySQL?
Master local DC
Slave local DC
Slave Asia Slave US
14
• Scale up • Vertical • Faster CPU/memory/disks • Expensive • Costs multiply at the same rate as the number of nodes
• Scale out • Horizontal • More (small) machines • Inexpensive • Partitioning/federating (sharding)
What if we keep growing?
15
• Functional • Shard your database functionally
• Reads • Add more slaves (keep them coming!)
• Writes • More disks • Horizontal partitioning • Federated partitions
Scale out
16
• Breaking up tables in small parts on the same host • Partitioned on a column • Infinite growth (as long as you add disk space) • Less used data to slower (cheaper) disks • No stored procedures, functions, etc. • Uneven usage of partitions (hash partitioning may help) • Once written, data remains on the partition
Horizontal partitioning
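The routing idea behind range partitioning can be sketched in a few lines of Python (the year boundaries and partition names below are hypothetical, chosen to mirror MySQL's PARTITION BY RANGE ... VALUES LESS THAN semantics):

```python
import bisect

# Hypothetical RANGE partitions on a registration-year column, all on the
# same host. BOUNDARIES are exclusive upper bounds (VALUES LESS THAN);
# the last partition catches everything else (MAXVALUE).
BOUNDARIES = [2005, 2008, 2011]
PARTITIONS = ["p_old", "p2005", "p2008", "p_current"]

def partition_for(year: int) -> str:
    """Pick the partition whose range contains `year`."""
    return PARTITIONS[bisect.bisect_right(BOUNDARIES, year)]

# Old, rarely read rows land on p_old, which could live on cheaper disks.
assert partition_for(2003) == "p_old"
assert partition_for(2012) == "p_current"
```

Queries that filter on the partition column only have to touch one partition; queries that do not must scan them all, which is why the choice of partition column matters.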
17
• Breaking up your table in parts on multiple hosts • Partitioned on a column • Infinite growth (as long as you add hosts) • Less used data on slower hosts • Not supported in (standard) MySQL • Partitioning on application level (or proxy) • Alternatively: NDB
• Uneven usage of partitions • Once written, data (mostly) remains on the partition • Parallel queries to retrieve data from all shards
Federated partitions (sharding)
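A minimal Python sketch of the application-level sharding described above (the shard hosts and hash scheme are hypothetical): the partition column is hashed to pick a host deterministically, and a scatter-gather helper queries every shard and merges the results.

```python
import hashlib

# Hypothetical shard hosts; in practice this routing lives in the
# application or in a proxy, since standard MySQL has no built-in support.
SHARDS = ["shard0.db", "shard1.db", "shard2.db", "shard3.db"]

def shard_for(partition_key: str) -> str:
    """Hash the partition column to pick a shard deterministically."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return SHARDS[int.from_bytes(digest, "big") % len(SHARDS)]

def scatter_gather(query_one_shard):
    """Run a per-shard query against every shard and merge the results."""
    return [row for shard in SHARDS for row in query_one_shard(shard)]

# The same key always routes to the same shard.
assert shard_for("user:42") == shard_for("user:42")
```

Adding a shard changes the modulo, which is exactly why real systems need a migration story for existing data (consistent hashing or an explicit lookup table, as this deck does later with GIDs).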
18
• Parallel execution of sequential jobs • Limited by the weakest link • Only as fast as the slowest node • Speedup is capped by the sequential fraction of the work • Fix: non-sequential (asynchronous) execution
Amdahl's law
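The "weakest link" point follows directly from Amdahl's law: the serial part of a job caps the speedup no matter how many nodes you add. A quick Python illustration:

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Maximum speedup on `nodes` workers when only `parallel_fraction`
    of the work can be parallelized (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)

# With a 5% serial part, even 10,000 nodes give less than 20x speedup.
assert amdahl_speedup(0.95, 10_000) < 20
# More nodes always help a little, but never break the serial ceiling.
assert amdahl_speedup(0.95, 32) > amdahl_speedup(0.95, 16)
```

This is why the slide's fix is to attack the serial part itself, by making the execution asynchronous, rather than to keep adding nodes.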
19
Typical LAMP stack
Client
Webserver
PHP
MySQL
Memcache
Webserver
PHP
Loadbalancer
20
Atypical LAMP stack
Client
Webserver
PHP
MySQL
Memcache
Webserver
PHP
Loadbalancer
MQ
Jobs
Spil Storage Platform Abstracting the storage layer
22
• Not dependent on one storage platform • No more platform-specific query language
• Differentiate writes • Optimistic (asynchronous) • Pessimistic (synchronous)
• Shard data better • Partition on user and function • Cluster information by users, not by function
• Global expansion • Partition on geographic location
• Solve uneven usage of data storage • Move data from shard to shard
• Anything may/could/will fail eventually • Not designed only for the “happy” flow
What was our wishlist?
23
Old architecture overview
24
New architecture overview
25
New architecture overview
Server API
Application Model
Storage platform
Client-side API
Presentation layer
Physical storage
26
• Everything written in Erlang • Piqi as protocol • binary • JSON • XML
• SSP utilizes local caching (memcache) • Flexible (persistent) storage layer • MySQL (various flavors) • Membase/Couchbase • Could be any other storage product
• MQs (DWH updates)
Our building blocks
27
• Predictable • Reliable • Decent performance • Easy to comprehend • Excellent ecosystem • Libraries • Monitoring tools • Knowledge
Why choose MySQL?
28
• Functional language • High availability: designed for telecom solutions • Excels at concurrency, distribution, fault tolerance • Do more with less! • Other companies using Erlang:
Why Erlang?
29
• What is the bucket model? • Each record has one unique owner attribute (GID) • GID (Global IDentifier) identifying different types • Bucket(s) per functionality • Bucket is structured data • Attributes contain data of records • Attributes do not have to correspond to schema
How do we shard?
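An illustrative Python sketch of the bucket model (the helper names are hypothetical, not the SSP API): every record is owned by a GID, and a record's attributes are free-form rather than tied to a fixed schema.

```python
# One bucket per functionality; records are keyed by their owning GID.
demobucket: dict[int, dict] = {}

def put(gid: int, **attributes) -> None:
    """Attach attributes to the record owned by `gid`."""
    demobucket.setdefault(gid, {}).update(attributes)

def get(gid: int) -> dict:
    return demobucket.get(gid, {})

put(288511851128422401, given_name="g", gender="m")
put(288511851128422401, email="mail1")  # attributes need not match a schema
assert get(288511851128422401)["email"] == "mail1"
```

Because the bucket is the unit of structure, the same record can later be stored in MySQL, Couchbase, or any other backend without the application noticing, which is the point of the abstraction.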
30
$ curl -X POST -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    --data-binary "{\"gid\": 288511851128422401}" \
    http://127.0.0.1:8777/demobucket/get

{
  "records": [
    {
      "gid": 288511851128422401,
      "given_name": "g",
      "registered_on": 1,
      "email": "mail1",
      "gender": "m",
      "birthdate": { "year": 1963, "month": 6, "day": 21 }
    }
  ],
  "meta_info": { "total_ct": 1 }
}
Example bucket
31
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  given_name varchar(64) not null,
  registered_on tinyint(3) unsigned default 0,
  email varchar(255) not null,
  gender enum('m', 'f', 'u') not null default 'm',
  birthdate date not null,
  PRIMARY KEY (gid)
);
Example bucket MySQL 1
32
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  user_name varchar(64) not null,
  user_register timestamp on update CURRENT_TIMESTAMP,
  user_emailaddress varchar(255) not null,
  user_gender char(1) not null default 'm',
  user_dob varchar(10) not null,
  PRIMARY KEY (gid)
);
Example bucket MySQL 2
33
CREATE COLUMNFAMILY demobucket (
  gid int PRIMARY KEY,
  given_name varchar,
  registered_on timestamp,
  email varchar,
  gender varchar,
  birth_date varchar
);
Example bucket Cassandra
34
demobucket:get(
  #demobucket_get_input{
    gid = 12345,
    filters = [
      #filter{ attr = <<"gender">>,        op = <<"=">>,     parms = {string, <<"f">>} },
      #filter{ attr = <<"registered_on">>, op = <<"sort">>,  parms = asc },
      #filter{ attr = <<"gid">>,           op = <<"limit">>, parms = {int, 10} }
    ]})
Example Erlang filters
35
Pipeline flow of a bucket
36
• Nearest datacenter (DC) to the end user • Satellite DC • Processing and caching • Does not own/store data
• Storage DC • Processing, caching and persistent storage • Store all of a user's data in the same DC
• Partition on user globally • Global IDentifier per user
Global distribution
37
• Contains GIDs and their master DC • A GID's master DC is predefined • Migrated GIDs get updated
The lookup server
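A toy Python version of the lookup table described above (the DC names and default are hypothetical):

```python
DEFAULT_DC = "eu"                      # each GID's master DC is predefined
gid_master_dc: dict[int, str] = {}     # only migrated GIDs need an entry

def master_dc(gid: int) -> str:
    """Return the datacenter that owns this GID's data."""
    return gid_master_dc.get(gid, DEFAULT_DC)

def migrate(gid: int, new_dc: str) -> None:
    """Record a migration so future lookups route to the new master DC."""
    gid_master_dc[gid] = new_dc

migrate(1234, "us")
assert master_dc(1234) == "us"         # migrated GIDs get updated
assert master_dc(9999) == DEFAULT_DC   # everyone else uses the default
```

Storing only the exceptions keeps the lookup table small: the common case (a user who never moved) costs one missed dictionary lookup.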
38
• Globally sharded on GID • (local) GID Lookup
How does this work?
GID lookup
Shard 1 Shard 2
Persistent storage
39
Master/Satellite DC example
40
• Spread data evenly over shards • Migration of buckets between shards
• GID migration between DCs • Creating a new storage DC requires data migration • Users will automatically be migrated after visiting another DC many times
Why do we need data migration?
41
• Versioning on bucket definitions • GIDs are assigned to a bucket version • Data in old bucket versions remains (read only) • New data only gets written to the new bucket version • Updates migrate data to the new bucket version • Migrations can be triggered
Seamless schema upgrades
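The read-old/write-new mechanics can be sketched in Python (a simplified model, not the actual Erlang implementation): reads fall back to the old version, and any write moves the record into the new one.

```python
# v1 is read only; all new writes go to v2.
v1 = {1234: {"name": "Roy"}, 1235: {"name": "Moss"}}
v2: dict[int, dict] = {}

def read(gid: int):
    """Prefer the new bucket version, fall back to the old one."""
    return v2.get(gid) or v1.get(gid)

def write(gid: int, changes: dict) -> None:
    """Writes land in v2; updating a v1 record migrates it to v2."""
    old = v1.pop(gid, {})
    v2[gid] = {**old, **v2.get(gid, {}), **changes}

write(1241, {"name": "Patricia", "gender": "f"})  # new record, straight to v2
write(1235, {"gender": "m"})                      # update migrates Moss
assert read(1235) == {"name": "Moss", "gender": "m"}
assert 1235 not in v1
```

Records that are never touched simply stay in v1, which is what makes the upgrade seamless: no big-bang ALTER, only lazy migration on write (plus optional triggered migrations).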
42
Seamless schema upgrades
Demobucket v1 (initial):

GID   name
1234  Roy
1235  Moss
1236  Jen
1237  Douglas
1238  Denholm
1239  Richmond

Demobucket v2 starts empty, with columns GID, name, gender. A new record (GID 1241, Patricia, f) is written straight to v2; subsequent updates to 1235 (Moss) and 1236 (Jen) migrate those records from v1 to v2.

Demobucket v1 (after migrations, read only):

GID   name
1234  Roy
1237  Douglas
1238  Denholm
1239  Richmond

Demobucket v2:

GID   name      gender
1241  Patricia  f
1235  Moss      m
1236  Jen       f
43
• Every cluster (two masters) will contain two shards • Data written interleaved • HA for both shards • No warmup needed
• Both masters active and “warmed up” • Slaves added (in the other DC) for HA and backup
Multi Master writes
SSP
Shard 1
Shard 2
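The interleaved-write idea can be sketched in Python (master names are hypothetical): each master in the pair is primary for one shard, so both take writes and stay warm, and each can serve the other's shard on failover.

```python
# Two active masters per cluster; each is primary for one of two shards.
MASTERS = ["master-a", "master-b"]

def primary_for(gid: int) -> str:
    """Interleave writes: even GIDs to one master, odd GIDs to the other."""
    return MASTERS[gid % 2]

def failover_target(gid: int) -> str:
    """The other master holds a warm replica and takes over on failure."""
    return MASTERS[(gid + 1) % 2]

assert primary_for(1234) != primary_for(1235)     # writes are interleaved
assert failover_target(1234) == primary_for(1235)
```

Because both masters are continuously serving half the traffic, a failover does not hit a cold buffer pool, which is the "no warmup needed" point on the slide.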
44
• SPAPI is in place • SSP is (mostly) running in shadow mode • GID buckets running in production • Activity feed system first to production • Satellite DC in early 2013!
Where do we stand now?
45
Questions?
47
• Presentation can be found at: http://spil.com/perconalondon2012 • If you wish to contact me: [email protected] • Don’t forget to rate my talk!
Thank you!