memcached
scaling your website
with memcached
by: steve yen
about me
• Steve Yen
• NorthScale
• Escalate Software
• Kiva Software
what you’ll learn
• what, where, why, when
• how
• especially, best practices
“mem cache dee”
• latest version 1.4.1
• http://code.google.com/p/memcached
open source
distributed cache
livejournal
helps your websites run fast
popular
simple
KISS
easy
small bite-sized steps
• not a huge, forklift replacement rearchitecture / reengineering project
fast
“i only block for memcached”
scalable
many client libraries
• might be TOO many
• the hit list...
• Java ==> spymemcached
• C ==> libmemcached
• Python, Ruby, etc ==>
• libmemcached wrappers
frameworks
• rails
• django
• spring / hibernate
• cakephp, symfony, etc
applications
• drupal
• wordpress
• mediawiki
• etc
it works
it promises to solve performance problems
it delivers!
problem?
your website is too slow
RDBMS melting down
urgent! emergency
one server
web app + RDBMS
1 + 1 servers
web app
RDBMS
N + 1 servers
web app, web app, web app, web app
RDBMS
RDBMS
EXPLAIN PLAN?
buy a bigger box
buy better disks
master write DB + multiple read DB?
vertical partitioning?
sharding?
uh oh, big reengineering
• risky!
• touch every line of code, every query!!
and, it’s 2AM
you need a band-aid
a simple band-aid now
use a cache
keep things in memory!
don’t hit disk
distributed cache
• to avoid wasting memory
don’t write one of these yourself
memcached
simple API
• hash-table-ish
your code before
v = db.query( SOME SLOW QUERY )
your code after
v = memcachedClient.get(key)
if (!v) {
    v = db.query( SOME SLOW QUERY )
    memcachedClient.set(key, v)
}
cache read-heavy stuff
invalidate when writing
• db.execute("UPDATE foo SET ... WHERE ...")
• memcachedClient.delete(...)
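A minimal sketch of invalidate-on-write, under stated assumptions: the `db` dict stands in for the RDBMS, and `FakeMemcached`, `read_foo` and `update_foo` are hypothetical names used only for illustration.

```python
# Hypothetical in-process stand-ins: a dict for the RDBMS, a class for the cache.
class FakeMemcached:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


db = {"foo:1": 10}


def read_foo(cache, foo_id):
    key = "foo:%d" % foo_id
    v = cache.get(key)
    if v is None:
        v = db[key]
        cache.set(key, v)
    return v


def update_foo(cache, foo_id, value):
    key = "foo:%d" % foo_id
    db[key] = value      # 1. write to the source of truth
    cache.delete(key)    # 2. drop the stale copy; the next read repopulates it
```

Deleting rather than overwriting the cached value keeps the write path simple: the read path stays the only place that knows how to build the cached form.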
and, repeat
• each day...
• look for the next slowest operations
• add code to cache a few more things
your life gets better
thank you memcached!
no magic
you are in control
now for the decisions
memcached adoption
• first, start using memcached
• poorly
• but you can breathe again
memcached adoption
• next, start using memcached correctly
memcached adoption
• later
• queueing
• persistence
• replication
• ...
an early question
where to run servers?
answer 1
• right on your web servers
• a great place to start, if you have extra memory
servers
web app + memcached, web app + memcached, web app + memcached, web app + memcached
RDBMS
add up your memory usage!
• having memcached server swap == bad!
answer 2
• run memcached right on your database server?
• WRONG!
answer 3
• run memcached on separate dedicated memcached servers
• congratulations!
• you either have enough money
• or enough traffic that it matters
running a server
• daemonize
• don’t be root!
• no security
server lists
• mc-server1:11211
• mc-server2:11211
• mc-server3:11211
consistent hashing
source: http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
client-side intelligence
• no “server master” bottleneck
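A rough sketch of how a client can pick a server with consistent hashing. Assumptions: the `HashRing` class, the 100 points per server, and the use of MD5 for ring positions are illustrative choices, and this ring is not wire-compatible with the scheme any particular client library uses.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring; names and parameters are illustrative."""

    def __init__(self, servers, points_per_server=100):
        ring = []
        for server in servers:
            # Several points per server spread the load more evenly.
            for i in range(points_per_server):
                ring.append((self._hash("%s-%d" % (server, i)), server))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


three = HashRing(["mc-server1:11211", "mc-server2:11211", "mc-server3:11211"])
four = HashRing(["mc-server1:11211", "mc-server2:11211", "mc-server3:11211",
                 "mc-server4:11211"])
keys = ["u:%d" % i for i in range(1000)]
moved = [k for k in keys if three.server_for(k) != four.server_for(k)]
```

In this sketch, adding a fourth server only moves keys onto the new server; the rest stay where they were. A plain `hash(key) % N` scheme would instead remap most keys.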
libmemcached
• fast C memcached client
• supports consistent hashing
• many wrappers to your favorite languages
updating server lists
• push out new configs and restart?
• moxi
• memcached + integrated proxy
keys
• no whitespace
• 250 char limit
• use short prefixes
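The key rules above can be enforced in one small helper. A sketch only: `make_key` and the `:` separator are hypothetical conventions, not part of memcached.

```python
import re

MAX_KEY_LENGTH = 250  # limit stated above


def make_key(prefix, *parts):
    """Build 'prefix:part:part' and reject keys memcached would refuse."""
    key = ":".join([prefix] + [str(p) for p in parts])
    if len(key.encode("utf-8")) > MAX_KEY_LENGTH:
        raise ValueError("key longer than %d bytes" % MAX_KEY_LENGTH)
    if re.search(r"[\s\x00-\x1f\x7f]", key):
        raise ValueError("key contains whitespace or control characters")
    return key


print(make_key("u", 42, "profile"))  # prints u:42:profile
```

The helper rejects bad keys instead of hashing them, so keys stay readable when you look at them later.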
keys & MD5
• don’t
• stats become useless
values
• any binary object
• 1MB limit
• change #define & recompile if you want more
• and you’re probably doing something wrong if you want more
values
• query resultset
• serialized object
• page fragment
• pages
• etc
nginx + memcached
>1 language?
• JSON
• protocol buffers
• XML
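If more than one language reads the same cache, a neutral format avoids language-specific serializers. A minimal JSON sketch; `encode_value` and `decode_value` are hypothetical helper names.

```python
import json


def encode_value(obj):
    """Serialize to compact UTF-8 JSON bytes that any language can parse."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_value(raw):
    return json.loads(raw.decode("utf-8"))


raw = encode_value({"id": 7, "tags": ["a", "b"]})
print(decode_value(raw)["tags"])  # prints ['a', 'b']
```

Sorting keys makes the stored bytes deterministic, which helps when comparing cached values across clients.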
memcached is lossy
• memcached WILL lose data
that’s a good thing
remember, it’s a CACHE
why is memcached lossy?
memcached node dies
when node restarts...
• you just get a bunch of cache misses
(and a short RDBMS spike)
eviction
more disappearing data!
LRU
• can config memcached to not evict
• but, you’re probably doing something wrong if you do this
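To make LRU eviction concrete, here is a toy cache bounded by item count. This is an illustration of the idea only: `TinyLRUCache` is a hypothetical class, and a real memcached server evicts based on memory, not on a simple item count.

```python
from collections import OrderedDict


class TinyLRUCache:
    """Toy LRU cache; illustrative only."""

    def __init__(self, max_items):
        self.max_items = max_items
        self.evictions = 0
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)   # drop the least recently used
            self.evictions += 1


c = TinyLRUCache(max_items=2)
c.set("a", 1)
c.set("b", 2)
c.get("a")        # "a" is now fresher than "b"
c.set("c", 3)     # cache is full, so "b" is evicted
```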
remember, it forgets
• it’s just a CACHE
expiration
• aka, timeouts
• memcached.set(key, value, timeout)
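A sketch of what a timeout does, using a fake clock so it runs instantly. Assumptions: `FakeMemcached` and the injectable `clock` are hypothetical, and the sketch only models a relative timeout in seconds, with 0 meaning "no expiration".

```python
class FakeMemcached:
    """Hypothetical stand-in that models per-item timeouts."""

    def __init__(self, clock):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, timeout=0):
        expires_at = self._clock() + timeout if timeout else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]   # expired items behave like misses
            return None
        return value


now = [1000.0]                      # fake clock, advanced by hand
cache = FakeMemcached(clock=lambda: now[0])
cache.set("sess:abc", {"user": 7}, timeout=30 * 60)
before = cache.get("sess:abc")
now[0] += 30 * 60 + 1
after = cache.get("sess:abc")
```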
use expirations or not?
1st school of thought
• expirations hide bugs
• you should be doing proper invalidations
• (aka, deletes)
• coherency!
school 2
• it’s 3AM and I can’t think anymore
• business guy:
• “sessions should auto-logout after 30 minutes due to bank security policy”
put sessions in memcached?
• just a config change
• eg, Ruby on Rails
good
• can load-balance requests to any web host
• don’t touch the RDBMS on every web request
bad
• could lose a user’s session
solution
• save sessions to memcached
• the first time, also save to RDBMS
• ideally, asynchronously
• on cache miss, restore from RDBMS
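The session scheme above, sketched with in-process stand-ins. Assumptions: `FakeMemcached`, the `sessions_table` dict, the `sess:` prefix and both function names are hypothetical, and the RDBMS write is synchronous here purely to keep the sketch short.

```python
class FakeMemcached:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


sessions_table = {}  # hypothetical stand-in for the RDBMS sessions table


def save_session(cache, sid, data):
    cache.set("sess:" + sid, data)
    if sid not in sessions_table:
        # First save only; a real app would ideally do this asynchronously.
        sessions_table[sid] = dict(data)


def load_session(cache, sid):
    key = "sess:" + sid
    data = cache.get(key)
    if data is None:                      # lost from memcached
        data = sessions_table.get(sid)    # restore from the RDBMS
        if data is not None:
            cache.set(key, data)
    return data
```

Because only the first save reaches the RDBMS, a restored session can be older than the last version that lived in memcached.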
in the background...
• have a job querying the RDBMS
• cron job?
• the job queries for “old” looking session records in the sessions table
• refresh old session records from memcached
add vs replace vs set
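The three verbs differ only in what they do when the key already exists. A sketch with a hypothetical `FakeMemcached` that mimics those semantics as I understand them: `add` stores only if the key is absent, `replace` only if it is present, `set` always.

```python
class FakeMemcached:
    """Hypothetical stand-in mimicking add / replace / set semantics."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value       # always stores
        return True

    def add(self, key, value):
        if key in self._data:         # only stores when the key is absent
            return False
        self._data[key] = value
        return True

    def replace(self, key, value):
        if key not in self._data:     # only stores when the key exists
            return False
        self._data[key] = value
        return True


cache = FakeMemcached()
results = [
    cache.replace("k", "a"),  # False: nothing to replace yet
    cache.add("k", "b"),      # True: key was absent
    cache.add("k", "c"),      # False: key already exists
    cache.replace("k", "d"),  # True: key exists
    cache.set("k", "e"),      # True: set never refuses
]
```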
append vs prepend
CAS
• compare-and-swap
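A sketch of the usual CAS retry loop: read the value with a token, then write only if nobody changed it in between. Assumptions: `FakeMemcached`, its `gets`/`cas` method shapes and `append_item` are hypothetical and only model the idea; real client libraries expose this differently.

```python
class FakeMemcached:
    """Hypothetical stand-in: each stored item carries a version token."""

    def __init__(self):
        self._data = {}      # key -> (value, version)
        self._version = 0

    def set(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)

    def gets(self, key):
        """Return (value, token), or (None, None) on a miss."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if the item is unchanged since gets() issued the token."""
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False
        self.set(key, value)
        return True


def append_item(cache, key, item, max_retries=5):
    for _ in range(max_retries):
        value, token = cache.gets(key)
        if value is None:
            return False                  # miss: caller rebuilds from the DB
        if cache.cas(key, value + [item], token):
            return True                   # nobody else wrote in between
    return False                          # too much contention; give up
```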
incr and decr
• no negative numbers
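A small sketch of counter behaviour, assuming the semantics as I understand them: counters never go below zero, and a counter must already exist before it can be incremented. `FakeMemcached` is a hypothetical stand-in that returns `None` where a real server would report that the key was not found.

```python
class FakeMemcached:
    """Hypothetical stand-in for counter semantics."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        if key not in self._data:
            return None                    # counters are not auto-created
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]

    def decr(self, key, delta=1):
        if key not in self._data:
            return None
        # Clamp at zero: there are no negative numbers.
        self._data[key] = max(0, int(self._data[key]) - delta)
        return self._data[key]


cache = FakeMemcached()
cache.set("hits", 1)
after_incr = cache.incr("hits", 2)
after_decr = cache.decr("hits", 10)
```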
queueing
• “hey, with those primitives, I could build a queue!”
don’t
• memcached is lossy
• protocol is incorrect for a queue
• instead
• gearman
• beanstalkd
• etc
cache stampedes
• gearman job-unique-id
• encode a timestamp in your values
• one app node randomly decides to refresh slightly early
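One way to sketch the "timestamp in the value, refresh slightly early" idea. Assumptions: `wrap`, `should_refresh`, the `early_window` parameter and the linear probability ramp are all illustrative choices, not something the slides prescribe.

```python
import random


def wrap(value, ttl, now):
    """Store the logical expiry time alongside the value."""
    return {"v": value, "expires": now + ttl}


def should_refresh(entry, now, early_window, rng=random.random):
    """Decide whether this app node should rebuild the entry.

    Outside the early window nobody refreshes. Inside it, the chance
    rises linearly toward 1 as the expiry time approaches.
    """
    remaining = entry["expires"] - now
    if remaining <= 0:
        return True
    if remaining >= early_window:
        return False
    return rng() < 1.0 - remaining / early_window


entry = wrap("rendered page", ttl=300, now=1000)
```

Because each node rolls its own dice, one node usually refreshes the entry before it truly expires, so the others keep getting hits instead of all querying the RDBMS at once.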
coherency
denormalization
• or copies of data
example: changing a product price
memcached UDFs
• another great tool in your toolbox
• on a database trigger, delete stuff from memcached
memcached UDFs
• works even if you do UPDATES with fancy WHERE clauses
multigets
• they are your friend
• memcached is fast, but...
• imagine 1ms for a get request
• 200 serial gets ==> 200ms
a resultset loop
foreach product in resultset
    c = memcached.get(product.category_id)
    do something with c
2 loops
for product in resultset
    multiget_request.append(product.category_id)
multiget_response = memcachedClient.multiget(multiget_request)
for c in multiget_response
    do something with c
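The two loops can be compared by counting round trips. A sketch with assumptions: `FakeMemcached`, its `get_multi` method, the `cat:` prefix and both helper functions are hypothetical; the fake simply counts one round trip per call.

```python
class FakeMemcached:
    """Hypothetical stand-in that counts network round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1   # one request carries all the keys
        return {k: self._data[k] for k in keys if k in self._data}


def categories_serial(cache, products):
    return [cache.get("cat:%d" % p["category_id"]) for p in products]


def categories_multiget(cache, products):
    # Loop 1: collect the distinct keys. Loop 2: use the single response.
    keys = sorted({"cat:%d" % p["category_id"] for p in products})
    found = cache.get_multi(keys)
    return [found.get("cat:%d" % p["category_id"]) for p in products]


data = {"cat:%d" % i: "category-%d" % i for i in range(10)}
products = [{"category_id": i % 10} for i in range(200)]
```

At roughly 1ms per round trip, the serial version costs about 200ms for 200 products, while the multiget version pays for a single request.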
memcached slabber
• allocates memory into slabs
• it might “learn” the wrong slab sizes
• watch eviction stats
losing a node
• means your RDBMS gets hit
replication
• simple replication in libmemcached
• >= 2x memory cost
• only simple verbs
• set, get, delete
• doesn’t handle flapping nodes
persistence
things that speak memcached
• tokyo tyrant
• memcachedb
• moxi
another day
• monitoring & statistics
• near caching
• moxi
thanks!!!
• love any feedback
• your memcached war stories
• your memcached wishlist
thanks!
photo credits
• http://flickr.com/photos/davebluedevil/15877348/
• http://www.flickr.com/photos/theamarand/2874288064/
• http://www.flickr.com/photos/splityarn/3469596708/
• http://www.flickr.com/photos/heisnofool/3241930754/
• http://www.flickr.com/photos/onourminds/2885704630/
• http://www.flickr.com/photos/lunaspin/990825818/