Short talk given at the Berlin Hadoop Get Together on the 27th of January 2011
Scaling social games: “the order of magnitude challenge”
Paolo Negri @hungryblank
Order of magnitude
DAU: daily active users
[Chart: DAU from July to December, axis from 0 to 1000000]
Social Games
Flash client (game), HTTP API
http://www.flickr.com/photos/stars6/4381851322
Social Games: Flash client
• Game actions need to be persisted and validated
• 1 API call every few secs
Social Games: HTTP API
• 5000 HTTP reqs/sec
• more than 90% writes
• 60K queries/sec
July 2010
HAproxy
Ruby on Rails
MySQL
• ~ 170 000 daily users
• Plain Ruby on Rails app
• Persistency 100% SQL
• 1 haproxy server
• multiple RoR servers
• 4 mysql servers (sharded dataset)
July 2010
HAproxy
Ruby on Rails
MySQL
Slow down
High queries/request ratio
Queries/request
• Which code is triggering extra queries?
• Why is the ratio lower in our test environment than live?
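As a rough worked example, the figures quoted earlier (5000 HTTP reqs/sec, 60K queries/sec) give about 12 queries per request. The Python sketch below computes that ratio and shows one possible way to attribute queries to the code that issued them. `QueryLog`, the source labels, and the sample data are illustrative assumptions, not the tooling or measurements from the talk.

```python
from collections import Counter

# Figures quoted on the slides.
HTTP_REQS_PER_SEC = 5_000
QUERIES_PER_SEC = 60_000


def queries_per_request(queries_per_sec, requests_per_sec):
    """Average number of SQL queries issued per HTTP request."""
    return queries_per_sec / requests_per_sec


class QueryLog:
    """Hypothetical helper: counts queries per originating component."""

    def __init__(self):
        self.by_source = Counter()

    def record(self, source):
        # In a real app the source label would be derived from the call stack.
        self.by_source[source] += 1

    def top_sources(self):
        # Most frequent query sources first.
        return self.by_source.most_common()


ratio = queries_per_request(QUERIES_PER_SEC, HTTP_REQS_PER_SEC)

# Made-up sample data for one request, only to show the attribution idea.
log = QueryLog()
for source in ["application", "sharding_plugin", "sharding_plugin", "wire_protocol_plugin"]:
    log.record(source)

print(f"{ratio:.0f} queries per request")
print(log.top_sources())
```

Running the same counter in both the test and the live environment would be one way to compare where the two ratios diverge.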
Queries/request
Running code of live system: Application, Ruby on Rails, Plugins
Source of extra queries: Plugins
• sharding plugin “breaks” std Rails query cache
• Flash wire protocol plugin generates extra queries
Plugins
• Deceiving “feature for free”
• Might provide the right feature
• But might not meet scaling need
Plugins
• Instant legacy code, even for new projects!
• Once added it’s your code
• Even if it’s maintained, will it follow your needs?
Plugins
• Assess code quality when you add it
• Can you afford to maintain/change it?
Plugins
• We fixed it!
• Queries cut by up to 40% on some requests
Early August
• The MySQL hiccup
• Every 70 mins, query time spikes 7x
[Chart: query time in ms (axis 0 to 30) from 6:00 to 8:10]
Hiccup causes
• Code (app + plugins + Rails)?
• Some periodic job?
• The devil (AWS)?
Who is periodically blocking MySQL?
Hiccup quick fix
• We shard out the top queried table (40% of all queries)
MySQL servers: shard 1, shard 2, shard 3, shard 4
Top table: shard 1, shard 2, shard 3, shard 4
Other tables: shard 1, shard 2, shard 3, shard 4
Hiccup quick fix
• MySQL likes it
• “top table” shards will go a long way in the scaling process
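A minimal sketch of the routing this split implies: the top queried table gets its own group of four shard servers, and every other table stays on the original group. The slides do not say how rows are assigned to shards, so the user-id modulo scheme, the table name `top_table`, and the server naming below are all assumptions for illustration.

```python
SHARD_COUNT = 4
TOP_TABLE = "top_table"  # hypothetical name for the most queried table


def shard_for(user_id, shard_count=SHARD_COUNT):
    """Pick a shard number (1-based) from the user id. Assumed scheme."""
    return user_id % shard_count + 1


def server_for(table, user_id):
    """Route a query to the server group dedicated to its table."""
    group = "top-table" if table == TOP_TABLE else "other-tables"
    return f"{group}-shard-{shard_for(user_id)}"


print(server_for("top_table", 123))
print(server_for("inventory", 123))
```

With this layout the same user still maps to the same shard number in both groups, but the heaviest table no longer competes for IO with the rest of the dataset.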
Hiccup causes
• Code (app + plugins + Rails)?
• Some periodic job?
• The devil (AWS)?
Who is periodically blocking MySQL?
None of the above
Hiccup real cause
• MySQL internal behavior emerging at high volume
• MySQL flushes its buffer
• Under heavy write IO it’s blocking
Hiccup solution
• Percona MySQL patches (XtraDB) avoid blocking behavior
• Query time profile gets smooth
• IO capacity limit manifests as gradual performance decay
Write through cache
• Memcache in front of MySQL
• Evaluated before sharding
• Was discarded
• Because of our read/write ratio
Write through cache
90% of the time we read data in order to modify it
Write through cache
It means that 90% of the time we:
1. read cache
2. write cache
3. write SQL
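The three steps above can be sketched with in-memory stand-ins. In this hypothetical `WriteThroughStore`, plain dicts play the roles of memcache and MySQL, and a counter records which operations a read-modify-write cycle costs. It is a simplified illustration of the write-through pattern, not the library that was evaluated.

```python
class WriteThroughStore:
    """Toy write-through cache: dicts stand in for memcache and MySQL."""

    def __init__(self):
        self.cache = {}
        self.sql = {}
        self.ops = {"cache_read": 0, "cache_write": 0, "sql_read": 0, "sql_write": 0}

    def read(self, key):
        self.ops["cache_read"] += 1
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fall back to SQL and populate the cache.
        self.ops["sql_read"] += 1
        value = self.sql.get(key)
        if value is not None:
            self.ops["cache_write"] += 1
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write-through: every write hits the cache and SQL.
        self.ops["cache_write"] += 1
        self.cache[key] = value
        self.ops["sql_write"] += 1
        self.sql[key] = value

    def modify(self, key, fn):
        """Read a value in order to modify it, then persist the result."""
        new_value = fn(self.read(key))
        self.write(key, new_value)
        return new_value


store = WriteThroughStore()
store.write("user:123:coins", 10)        # warm the cache
store.ops = dict.fromkeys(store.ops, 0)  # reset the counters
store.modify("user:123:coins", lambda coins: coins + 5)
print(store.ops)
```

Even with a warm cache, each modification still pays for one SQL write, so the cache only removes the read from the SQL load.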
Write through cache
Read heavy: bound to
• memcache perfs
Write heavy: bound to
• MySQL write (unless async)
• Write through lib optimized for writes?
MySQL
• Sharding SQL is a painful way to scale
• Data migrations at high load imply downtime
• ACID benefits are all lost because of sharding or in the name of performance
Redis
• A persistent cache
• Fast: 60000 qps on AWS hardware
• Interesting data structures, not only KV
• Already some small scale experience in house
Redis adoption
• Which data to start from?
• How do we migrate without downtime?
• Which lib to map Ruby objects to Redis structures?
Redis adoption
• Which data to start from?
• Best data fit for Redis hashes
• 3rd most queried table
• a collection of integer fields that need only increment / decrement
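To illustrate why this kind of data fits Redis hashes: one hash per user can hold all the integer fields, and each field can be incremented or decremented in place (Redis provides the `HINCRBY` command for this). The sketch below uses an in-memory stand-in so that it runs without a Redis server. `FakeRedisHashes`, the key layout, and the field names are assumptions for illustration only.

```python
from collections import defaultdict


class FakeRedisHashes:
    """In-memory stand-in that mimics only HINCRBY and HGETALL."""

    def __init__(self):
        self._hashes = defaultdict(dict)

    def hincrby(self, key, field, amount=1):
        # Like Redis HINCRBY: a missing field starts at 0,
        # and a negative amount decrements.
        new_value = self._hashes[key].get(field, 0) + amount
        self._hashes[key][field] = new_value
        return new_value

    def hgetall(self, key):
        return dict(self._hashes[key])


r = FakeRedisHashes()
key = "user:123:counters"      # hypothetical key layout
r.hincrby(key, "coins", 50)    # earn coins
r.hincrby(key, "energy", 10)   # refill energy
r.hincrby(key, "energy", -3)   # spend energy

print(r.hgetall(key))
```

Because each change is a single increment, the application does not need to read the row first in order to modify it.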
Redis adoption
• How do we migrate without downtime?
• Migrate one user at a time
• Use a Redis set to keep track of migrated/non-migrated users
• No downtime, transparent to users
Redis adoption
• How do we migrate without downtime?
[Diagram: RoR server, MySQL, Redis; User 123]
1. read original data
2. write migrated data
Redis adoption
• How do we migrate without downtime?
• Migration might never complete
• Use SQL + Redis set information to generate the final batch migration
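The per-user migration and the final batch can be sketched as follows. Dicts and a Python set stand in for MySQL, Redis hashes, and the Redis set of migrated users; the class name `LazyMigrator` and its methods are hypothetical, and a real implementation would also have to handle concurrent requests for the same user.

```python
class LazyMigrator:
    """Toy model: migrate a user's data the first time it is accessed."""

    def __init__(self, mysql_rows):
        self.mysql = dict(mysql_rows)  # stand-in for the MySQL table, keyed by user id
        self.redis_hashes = {}         # stand-in for Redis hashes
        self.migrated = set()          # stand-in for the Redis set of migrated user ids

    def load(self, user_id):
        if user_id not in self.migrated:
            # Read the original data from SQL, write the migrated data to Redis.
            self.redis_hashes[user_id] = dict(self.mysql[user_id])
            self.migrated.add(user_id)
        return self.redis_hashes[user_id]

    def pending(self):
        # Users known to SQL but absent from the migrated set.
        return sorted(set(self.mysql) - self.migrated)

    def migrate_remaining(self):
        # Final batch for users who never triggered the lazy path.
        for user_id in self.pending():
            self.load(user_id)


m = LazyMigrator({123: {"coins": 10}, 456: {"coins": 3}})
m.load(123)                # user 123 plays, so their data moves to Redis
print(m.pending())         # users still only in MySQL
m.migrate_remaining()
print(m.pending())
```

Users who never come back never trigger `load`, which is why the difference between the SQL user list and the migrated set is needed to finish the job.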
Redis 1st result
10% of the query load from 4 MySQL servers is moved to 1 Redis server
Redis server load is 0.05
Redis
• Becomes the tool to use
• Migration plan for all write intensive data
• Migrate one “class” at a time
Redis honeymoon end
• Memory usage grows more than data
• Snapshot to disk causes spikes in query time
• Starting new slaves eats memory on the master node
Redis honeymoon end
• Redis machine sized with overabundant RAM
• Rigorous slave/master starting plan
Russian Roulette Feeling
Redis
• Redis team acknowledges persistency/replication problems
• Redis 2.4 diskstore plan starts
1.000.000
And counting...
1.000.000
HAproxy, Ruby on Rails: painless scaling, just add servers as load grows
Persistency: painful and troublesome
Infrastructure
• AWS
• Chef - through Scalarium
• Ganglia
Thanks...
wooga is looking for
Business Intelligence Engineer
http://wooga.com/jobs