Example of a solution for retaining globally distributed high availability with MySQL
Retaining globally distributed high availability
Art van Scheppingen, Head of Database Engineering
2
1. Who is Spil Games? 2. Theory 3. Spil Storage Platform 4. Questions?
Overview
Who are we? Who is Spil Games?
4
• Company founded in 2001 • 350+ employees worldwide • 180M+ unique visitors per month • 45 portals in 19 languages • Casual games • Social games • Real-time multiplayer games • Mobile games
• 35+ MySQL clusters • 60k queries per second (3.5 billion qpd)
Facts
5
Geographic Reach 180 Million Monthly Active Users(*)
Source: (*) Google Analytics, August 2012
• Over 45 localized portals in 19 languages • Multi-platform: web, mobile, tablet • Focus on casual and social games • 180M MAU (30M YoY growth) • Over 50M registered users
6
Girls, Teens and Family
spielen.com juegos.com gamesgames.com games.co.uk
Brands
Foundations The exciting theory
8
• What exactly does it mean?
Retaining globally distributed HA
9
Wikipedia: High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period. Oracle: • Availability of resources in a computer system
What is high availability?
10
• Master with (many) slave(s)
How do we reach HA with MySQL?
Master
Slave Slave Slave
11
• Master with (many) slave(s) • Multi Master
How do we reach HA with MySQL?
Master
Slave
Master
Slave
12
• Master with (many) slave(s) • Multi Master • Clustering
How do we reach HA with MySQL?
Mysqld Mysqld
ndbd
ndbd ndbd
ndbd ndbd
mgmt
13
• Master with (many) slave(s) • Multi Master • Clustering • Geographical redundancy
How do we reach HA with MySQL?
Master local DC
Slave local DC
Slave Asia Slave US
14
• Scale up • Vertical • Faster CPU/memory/disks • Expensive • Costs multiply at the same rate as the number of nodes
• Scale out • Horizontal • More (small) machines • Inexpensive • Partitioning/federating (sharding)
What if we keep growing?
15
• Functional • Shard your database functionally
• Reads • Add more slaves (keep them coming!)
• Writes • More disks • Horizontal partitioning • Federated partitions
Scale out
16
• Breaking up tables in small parts on the same host • Partitioned on a column • Infinite growth (as long as you add disk space) • Less used data to slower (cheaper) disks • No stored procedures, functions, etc. • Uneven usage of partitions (hash partitioning may help) • Once written, data remains on the partition
Horizontal partitioning
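The routing idea behind range partitioning can be sketched in a few lines of Python (the year boundaries and partition names below are hypothetical, chosen to mirror MySQL's PARTITION BY RANGE ... VALUES LESS THAN semantics):

```python
import bisect

# Hypothetical RANGE partitions on a registration-year column, all on the
# same host. BOUNDARIES are exclusive upper bounds (VALUES LESS THAN);
# the last partition catches everything else (MAXVALUE).
BOUNDARIES = [2005, 2008, 2011]
PARTITIONS = ["p_old", "p2005", "p2008", "p_current"]

def partition_for(year: int) -> str:
    """Pick the partition whose range contains `year`."""
    return PARTITIONS[bisect.bisect_right(BOUNDARIES, year)]

# Old, rarely read rows land on p_old, which could live on cheaper disks.
assert partition_for(2003) == "p_old"
assert partition_for(2012) == "p_current"
```

Queries that filter on the partition column only have to touch one partition; queries that do not must scan them all, which is why the choice of partition column matters.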
17
• Breaking up your table in parts on multiple hosts • Partitioned on a column • Infinite growth (as long as you add hosts) • Less used data on slower hosts • Not supported in (standard) MySQL • Partitioning on application level (or proxy) • Alternatively: NDB
• Uneven usage of partitions • Once written, data (mostly) remains on the partition • Parallel queries to retrieve data from all shards
Federated partitions (sharding)
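A minimal Python sketch of the application-level sharding described above (the shard hosts and hash scheme are hypothetical): the partition column is hashed to pick a host deterministically, and a scatter-gather helper queries every shard and merges the results.

```python
import hashlib

# Hypothetical shard hosts; in practice this routing lives in the
# application or in a proxy, since standard MySQL has no built-in support.
SHARDS = ["shard0.db", "shard1.db", "shard2.db", "shard3.db"]

def shard_for(partition_key: str) -> str:
    """Hash the partition column to pick a shard deterministically."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return SHARDS[int.from_bytes(digest, "big") % len(SHARDS)]

def scatter_gather(query_one_shard):
    """Run a per-shard query against every shard and merge the results."""
    return [row for shard in SHARDS for row in query_one_shard(shard)]

# The same key always routes to the same shard.
assert shard_for("user:42") == shard_for("user:42")
```

Adding a shard changes the modulo, which is exactly why real systems need a migration story for existing data (consistent hashing or an explicit lookup table, as this deck does later with GIDs).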
18
• Parallel execution of sequential jobs • Limited by the weakest link • Only as fast as the slowest node • Speedup is capped by the sequential fraction of the work • Fix: non-sequential (asynchronous) execution
Amdahl's law
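The "weakest link" point follows directly from Amdahl's law: the serial part of a job caps the speedup no matter how many nodes you add. A quick Python illustration:

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Maximum speedup on `nodes` workers when only `parallel_fraction`
    of the work can be parallelized (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)

# With a 5% serial part, even 10,000 nodes give less than 20x speedup.
assert amdahl_speedup(0.95, 10_000) < 20
# More nodes always help a little, but never break the serial ceiling.
assert amdahl_speedup(0.95, 32) > amdahl_speedup(0.95, 16)
```

This is why the slide's fix is to attack the serial part itself, by making the execution asynchronous, rather than to keep adding nodes.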
19
Typical LAMP stack
Client
Webserver
PHP
MySQL
Memcache
Webserver
PHP
Loadbalancer
20
Atypical LAMP stack
Client
Webserver
PHP
MySQL
Memcache
Webserver
PHP
Loadbalancer
MQ
Jobs
Spil Storage Platform Abstracting the storage layer
22
• Not dependent on one storage platform • No more platform-specific query language
• Differentiate writes • Optimistic (asynchronous) • Pessimistic (synchronous)
• Shard data better • Partition on user and function • Cluster information by users, not by function
• Global expansion • Partition on geographic location
• Solve uneven usage of data storage • Move data from shard to shard
• Anything may/could/will fail eventually • Not designed only for the “happy” flow
What was our wishlist?
23
Old architecture overview
24
New architecture overview
25
New architecture overview
Server API
Application Model
Storage platform
Client-side API
Presentation layer
Physical storage
26
• Everything written in Erlang • Piqi as protocol • binary • JSON • XML
• SSP utilizes local caching (memcache) • Flexible (persistent) storage layer • MySQL (various flavors) • Membase/Couchbase • Could be any other storage product
• MQs (DWH updates)
Our building blocks
27
• Predictable • Reliable • Decent performance • Easy to comprehend • Excellent ecosystem • Libraries • Monitoring tools • Knowledge
Why choose MySQL?
28
• Functional language • High availability: designed for telecom solutions • Excels at concurrency, distribution, fault tolerance • Do more with less! • Other companies using Erlang:
Why Erlang?
29
• What is the bucket model? • Each record has one unique owner attribute (GID) • GID (Global IDentifier) identifying different types • Bucket(s) per functionality • Bucket is structured data • Attributes contain data of records • Attributes do not have to correspond to schema
How do we shard?
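An illustrative Python sketch of the bucket model (the helper names are hypothetical, not the SSP API): every record is owned by a GID, and a record's attributes are free-form rather than tied to a fixed schema.

```python
# One bucket per functionality; records are keyed by their owning GID.
demobucket: dict[int, dict] = {}

def put(gid: int, **attributes) -> None:
    """Attach attributes to the record owned by `gid`."""
    demobucket.setdefault(gid, {}).update(attributes)

def get(gid: int) -> dict:
    return demobucket.get(gid, {})

put(288511851128422401, given_name="g", gender="m")
put(288511851128422401, email="mail1")  # attributes need not match a schema
assert get(288511851128422401)["email"] == "mail1"
```

Because the bucket is the unit of structure, the same record can later be stored in MySQL, Couchbase, or any other backend without the application noticing, which is the point of the abstraction.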
30
$ curl -X POST -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    --data-binary "{\"gid\": 288511851128422401}" \
    http://127.0.0.1:8777/demobucket/get

{
  "records": [
    {
      "gid": 288511851128422401,
      "given_name": "g",
      "registered_on": 1,
      "email": "mail1",
      "gender": "m",
      "birthdate": { "year": 1963, "month": 6, "day": 21 }
    }
  ],
  "meta_info": { "total_ct": 1 }
}
Example bucket
31
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  given_name varchar(64) not null,
  registered_on tinyint(3) unsigned default 0,
  email varchar(255) not null,
  gender enum('m', 'f', 'u') not null default 'm',
  birthdate date not null,
  PRIMARY KEY (gid)
);
Example bucket MySQL 1
32
CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  user_name varchar(64) not null,
  user_register timestamp on update CURRENT_TIMESTAMP,
  user_emailaddress varchar(255) not null,
  user_gender char(1) not null default 'm',
  user_dob varchar(10) not null,
  PRIMARY KEY (gid)
);
Example bucket MySQL 2
33
CREATE COLUMNFAMILY demobucket (
  gid int PRIMARY KEY,
  given_name varchar,
  registered_on timestamp,
  email varchar,
  gender varchar,
  birth_date varchar
);
Example bucket Cassandra
34
demobucket:get(
  #demobucket_get_input{
    gid = 12345,
    filters = [
      #filter{ attr = <<"gender">>,        op = <<"=">>,     parms = {string, <<"f">>} },
      #filter{ attr = <<"registered_on">>, op = <<"sort">>,  parms = asc },
      #filter{ attr = <<"gid">>,           op = <<"limit">>, parms = {int, 10} }
    ]})
Example Erlang filters
35
Pipeline flow of a bucket
36
• Nearest datacenter (DC) to the end user • Satellite DC • Processing and caching • Does not own/store data
• Storage DC • Processing, caching and persistent storage • Store all of a user's data in the same DC
• Partition on user globally • Global IDentifier per user
Global distribution
37
• Contains GIDs and their master DC • A GID's master DC is predefined • Migrated GIDs get updated
The lookup server
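A toy Python version of the lookup table described above (the DC names and default are hypothetical):

```python
DEFAULT_DC = "eu"                      # each GID's master DC is predefined
gid_master_dc: dict[int, str] = {}     # only migrated GIDs need an entry

def master_dc(gid: int) -> str:
    """Return the datacenter that owns this GID's data."""
    return gid_master_dc.get(gid, DEFAULT_DC)

def migrate(gid: int, new_dc: str) -> None:
    """Record a migration so future lookups route to the new master DC."""
    gid_master_dc[gid] = new_dc

migrate(1234, "us")
assert master_dc(1234) == "us"         # migrated GIDs get updated
assert master_dc(9999) == DEFAULT_DC   # everyone else uses the default
```

Storing only the exceptions keeps the lookup table small: the common case (a user who never moved) costs one missed dictionary lookup.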
38
• Globally sharded on GID • (local) GID Lookup
How does this work?
GID lookup
Shard 1 Shard 2
Persistent storage
39
Master/Satellite DC example
40
• Spread data evenly over shards • Migration of buckets between shards
• GID migration between DCs • Creating a new storage DC requires data migration • Users will automatically be migrated after visiting another DC many times
Why do we need data migration?
41
• Versioning on bucket definitions • GIDs are assigned to a bucket version • Data in old bucket versions remains (read only) • New data only gets written to the new bucket version • Updates migrate data to the new bucket version • Migrations can be triggered
Seamless schema upgrades
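The read-old/write-new mechanics can be sketched in Python (a simplified model, not the actual Erlang implementation): reads fall back to the old version, and any write moves the record into the new one.

```python
# v1 is read only; all new writes go to v2.
v1 = {1234: {"name": "Roy"}, 1235: {"name": "Moss"}}
v2: dict[int, dict] = {}

def read(gid: int):
    """Prefer the new bucket version, fall back to the old one."""
    return v2.get(gid) or v1.get(gid)

def write(gid: int, changes: dict) -> None:
    """Writes land in v2; updating a v1 record migrates it to v2."""
    old = v1.pop(gid, {})
    v2[gid] = {**old, **v2.get(gid, {}), **changes}

write(1241, {"name": "Patricia", "gender": "f"})  # new record, straight to v2
write(1235, {"gender": "m"})                      # update migrates Moss
assert read(1235) == {"name": "Moss", "gender": "m"}
assert 1235 not in v1
```

Records that are never touched simply stay in v1, which is what makes the upgrade seamless: no big-bang ALTER, only lazy migration on write (plus optional triggered migrations).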
42
Seamless schema upgrades
Demobucket v1 (initial):

GID   name
1234  Roy
1235  Moss
1236  Jen
1237  Douglas
1238  Denholm
1239  Richmond

Demobucket v2 starts empty, with columns GID, name, gender. A new record (GID 1241, Patricia, f) is written straight to v2; subsequent updates to 1235 (Moss) and 1236 (Jen) migrate those records from v1 to v2.

Demobucket v1 (after migrations, read only):

GID   name
1234  Roy
1237  Douglas
1238  Denholm
1239  Richmond

Demobucket v2:

GID   name      gender
1241  Patricia  f
1235  Moss      m
1236  Jen       f
43
• Every cluster (two masters) will contain two shards • Data written interleaved • HA for both shards • No warmup needed
• Both masters active and “warmed up” • Slaves added (in the other DC) for HA and backup
Multi Master writes
SSP
Shard 1
Shard 2
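The interleaved-write idea can be sketched in Python (master names are hypothetical): each master in the pair is primary for one shard, so both take writes and stay warm, and each can serve the other's shard on failover.

```python
# Two active masters per cluster; each is primary for one of two shards.
MASTERS = ["master-a", "master-b"]

def primary_for(gid: int) -> str:
    """Interleave writes: even GIDs to one master, odd GIDs to the other."""
    return MASTERS[gid % 2]

def failover_target(gid: int) -> str:
    """The other master holds a warm replica and takes over on failure."""
    return MASTERS[(gid + 1) % 2]

assert primary_for(1234) != primary_for(1235)     # writes are interleaved
assert failover_target(1234) == primary_for(1235)
```

Because both masters are continuously serving half the traffic, a failover does not hit a cold buffer pool, which is the "no warmup needed" point on the slide.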
44
• SPAPI is in place • SSP is (mostly) running in shadow mode • GID buckets running in production • Activity feed system first to production • Satellite DC in early 2013!
Where do we stand now?
45
Questions?
47
• Presentation can be found at: http://spil.com/perconalondon2012 • If you wish to contact me: [email protected] • Don’t forget to rate my talk!
Thank you!