Slides from the EuroPython 2010 conference in Birmingham on the subject of caching in Python.
Caching techniques in Python
Michael Domanski, EuroPython 2010
Thursday, 22 July 2010
who I am
• Python developer, professionally for a few years now
• also experienced in C and Objective-C
• currently working for 10clouds.com
Interesting intro
• a bit of theory
• common patterns
• common problems
• common solutions
How I think about cache
• imagine a giant dict storing all your data
• you have to manage all data manually
• or provide some automated behaviour
similar to...
• manual memory management in C
• cache is memory
• and you have to control it manually
profits
• improved performance
• ...?
problems
• managing any type of memory is hard
• automation often has to be custom-built each time
common patterns
memoization
• very old pattern (circa 1968)
• we owe the name to Donald Michie
how it works
• we associate input with output, and store it somewhere
• based on the assumption that for a given input, output is always the same
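A minimal sketch of that idea, assuming hashable positional arguments. The names `memoize`, `slow_square` and `calls` are made up for illustration and are not taken from the talk's code samples.

```python
import functools

def memoize(func):
    """Remember results per argument tuple (hypothetical helper, not from the talk)."""
    results = {}  # maps input arguments -> computed output

    @functools.wraps(func)
    def wrapper(*args):
        # Only valid if func always returns the same output for the same input.
        if args not in results:
            results[args] = func(*args)
        return results[args]
    return wrapper

calls = []  # records every real computation, to show when the cache is hit

@memoize
def slow_square(n):
    calls.append(n)
    return n * n

slow_square(4)   # computed
slow_square(4)   # served from the dict, no second computation
slow_square(5)   # new input, computed
```

Here the key is derived from the arguments, so each distinct input gets its own entry.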
code example
CACHE_DICT = {}

def cached(key):
    """Cache the decorated function's result under a fixed key."""
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            # compute only on a cache miss; the arguments are not part of the key
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
what if output can change?
• our pattern is still useful
• we simply need to add something
cache invalidation
There are only two hard problems in Computer Science: cache invalidation and naming things
Phil Karlton
• basically, we update data in cache
• we need to know when and what to change
• the more granular you want to be, the harder it gets
code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # print(...) with a single argument works on both Python 2 and 3
        print("someone tried to invalidate a key that is not present: %s" % key)
common problems
invalidating too much/not enough
• flushing all data any time something changes
• not flushing cache at all
• tragic effects
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()
    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
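The stale read can be reproduced on its own with a dict standing in for the database. This is only a sketch: `DB`, `db_get`, `db_set` and the `invalidates` parameter are assumptions, since the talk does not show how the database helpers are implemented.

```python
# Stand-ins so the example runs on its own (assumed, not from the talk).
DB = {1: 'old value', 2: 'other value'}
CACHE_DICT = {}

def db_get(id):
    return DB[id]

def cached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper

def db_set(id, value, invalidates=()):
    """Write to the database and drop the cache keys that depend on this row."""
    DB[id] = value
    for key in invalidates:
        CACHE_DICT.pop(key, None)  # ignore keys that were never cached

@cached('key1')
def simple_function1():
    return db_get(id=1)

simple_function1()                        # warms the cache with 'old value'
db_set(1, 'something totally new')        # no invalidation: cache is now stale
stale = simple_function1()

db_set(1, 'newer still', invalidates=['key1'])  # write and invalidate together
fresh = simple_function1()
```

Tying the invalidation to the write is one possible design; it keeps the "what to invalidate" knowledge next to the code that changes the data.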
invalidating too soon/too late
• your cache has to be synchronised with your db
• sometimes very hard to spot
• leads to tragic mistakes
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()
superposition of dependencies
• a somewhat less obvious problem
• eventually you will start caching the results of computations
• you have to know very precisely what your data depends on
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE

@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3),
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
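One possible way to stop forgetting keys is to write the dependencies down and invalidate along them. The sketch below assumes a hand-maintained registry; `DEPENDENTS` and `invalidate_with_dependents` are hypothetical names, not part of the talk's code.

```python
CACHE_DICT = {}

# Hypothetical registry: cache key -> keys computed from it.
DEPENDENTS = {
    'key1': ['key'],
    'key2': ['key'],
}

def invalidate_with_dependents(key, _seen=None):
    """Drop `key` and, recursively, every key registered as depending on it."""
    seen = set() if _seen is None else _seen
    if key in seen:          # guard against cycles in the registry
        return
    seen.add(key)
    CACHE_DICT.pop(key, None)
    for dependent in DEPENDENTS.get(key, []):
        invalidate_with_dependents(dependent, seen)

# Simulate a warm cache where 'key' was built from 'key1' and 'key2'.
CACHE_DICT.update({
    'key1': 'a',
    'key2': 'b',
    'key': {'1': 'a', '2': 'b', '3': 'c'},
})

invalidate_with_dependents('key1')
remaining = sorted(CACHE_DICT)
```

Invalidating 'key1' now also drops 'key', while the unrelated 'key2' entry stays cached.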
summing up
• know your data....
• be aware of what you cache and when
• take care when using cached data in computation
common solutions
process level cache
why?
• very fast access
• simple to implement
• very effective as long as you're using a single process
clever tricks with dicts
code example
CACHE_DICT = {}

def cached(key):
    """Cache the decorated function's result under a fixed key."""
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            # compute only on a cache miss; the arguments are not part of the key
            if key not in CACHE_DICT:
                CACHE_DICT[key] = func(*args, **kwargs)
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
invalidation
code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # print(...) with a single argument works on both Python 2 and 3
        print("someone tried to invalidate a key that is not present: %s" % key)
application level cache
memcache
• battle tested
• scales
• fast
• supports a few cool features
• behaves a lot like dict
• supports time-based expiration
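Time-based expiration is handled by the memcached server itself. To show the behaviour without a running server, here is a tiny in-process stand-in; `TTLCache` and its injectable clock are assumptions made for this sketch and are not part of any memcache library.

```python
import time

class TTLCache(object):
    """Tiny in-process stand-in for time-based expiration (hypothetical, not memcached)."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the example does not need to sleep
        self._data = {}          # key -> (value, expires_at or None)

    def set(self, key, value, ttl=0):
        # ttl=0 means "never expires", mirroring the convention memcached uses
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict on read
            return None
        return value

now = [1000.0]                   # fake clock we can advance by hand
ttl_cache = TTLCache(clock=lambda: now[0])
ttl_cache.set('report', 'cached report', ttl=60)
before = ttl_cache.get('report')
now[0] += 61                     # pretend 61 seconds passed
after = ttl_cache.get('report')
```

With expiry in place, stale data disappears on its own after the chosen time, at the cost of serving possibly outdated values until then.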
libraries?
• python-memcache
• python-libmemcache
• python-cmemcache
• pylibmc
why no benchmarks
• not the point of this talk :)
• benchmarks are generic, caching is specific
• pick your flavour, think for yourself
code example
import memcache

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # get() returns None on a miss; testing for None keeps cached
            # falsy values (0, '', []) from being recomputed every time
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper
invalidation
code example
def mem_invalidate(key):
    # overwrite with None so that the next get() is treated as a miss
    cache.set(str(key), None)
batch key management
• what if I don't want to expire each key manually?
• that's a lot to remember
• and we have to be careful :(
groups?
• group keys into sets
• which are tied to one key per set
• expire one key, instead of twenty
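A dict-backed sketch of the group idea, so it can run without a cache server. The helper names (`set_in_group`, `get_in_group`, `expire_group`) and the counter used to produce new group values are assumptions for illustration only.

```python
import itertools

store = {}                         # plain dict standing in for the cache server
_versions = itertools.count(1)     # source of fresh group values (an assumption)

def set_in_group(key, group, value):
    """Store value tagged with the group's current version."""
    if group not in store:
        store[group] = next(_versions)
    store[key] = {'group_value': store[group], 'value': value}

def get_in_group(key, group):
    """Return the value only if its tag still matches the group's version."""
    entry = store.get(key)
    if entry is None or entry['group_value'] != store.get(group):
        return None
    return entry['value']

def expire_group(group):
    """Invalidate every member at once by changing the single group key."""
    store[group] = next(_versions)

set_in_group('user:1', 'users', 'alice')
set_in_group('user:2', 'users', 'bob')
hit = get_in_group('user:1', 'users')
expire_group('users')
misses = [get_in_group('user:1', 'users'), get_in_group('user:2', 'users')]
```

The member entries are never deleted explicitly; they simply stop matching the group key and are treated as misses.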
how to get there?
• store some extra data
• you can store dicts in cache
• and cache behaves like dict
• so it’s a case of comparing keys and values
# we start with specified key and group
key = 'some_key'
group = 'some_group'

# now retrieve some data from memcached
# (get_multi takes a list of keys)
data = memcached_client.get_multi([key, group])
# now data is a dict that should look like
# {'some_key': {'group_key': '1234',
#               'value': 'some_value'},
#  'some_group': '1234'}
#
# (fragment: this check lives inside the function doing the lookup)
if data and (key in data) and (group in data):
    if data[key]['group_key'] == data[group]:
        return data[key]['value']
def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key)] + [tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() so a missing group key counts as a mismatch, not a KeyError
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same transformed key that is used when writing
                value = cache.get(tools.make_key(key))
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:
                    # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({
                        tools.make_key(key): data_dict,
                        tools.make_key(group_key): group_value,
                    })
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
questions?
code samples @ http://github.com/mdomans/europython2010
follow me
twitter: mdomans
blog: blog.mdomans.com