David Arcos - @DZPM
Efficient Django – #EuroPython 2016
Efficient Django
Abstract
Tips and best practices for avoiding scalability issues and performance bottlenecks in Django
● 1) Basic concepts: the theory
● 2) Measuring: how to find bottlenecks
● 3) Tips and tricks
● 4) Conclusion (yes, it scales!)
Hi!
● I'm David Arcos
● Python/Django developer since 2008
● Co-organizer at Python Barcelona
● CTO at Lead Ratings
● “We improve your sales conversions, using
predictive algorithms to rate the leads”
● Prediction API, “Machine Learning as a Service”
● http://lead-ratings.com
1) Basic concepts
The Pareto Principle
"For many events, roughly 80% of the effects come from 20% of the causes"
Prioritize and focus
Focus on the few tasks that will have the most impact
Basic scalability
“Potential to be enlarged to handle a growing amount of work”
● Stateless app servers
– Load balance them, scale horizontally
● Keep the state on the database(s)
– This is the difficult part! Each system is different
Database performance
● Do fewer queries:
– Fewer reads
– Fewer writes
● Do faster queries:
– Index the fields you filter and sort on
– Denormalize
Templates
● Cache them
● Jinja2 is a bit faster than the default engine
– but cache them anyway
● Use fragment caching to cache individual blocks
Cache
● Generic approach: cache at each level of the stack
● The cache framework documentation is excellent
● Beware of cache invalidation!
Bottlenecks
● Where is your bottleneck?
● CPU bound or I/O bound?
– CPU? Run heavy calculations in async workers
– Memory? Compress objects before caching
– Database? Read from db replicas
● How to find it?
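For the memory-bound case, a minimal sketch of compressing objects before caching them, using only the standard library (`pickle` and `zlib`). The helper names `compress_for_cache()` and `decompress_from_cache()` and the sample rows are hypothetical; the resulting bytes could be stored with any cache backend. Compression trades some CPU time for memory, so it only pays off when memory really is the bottleneck.

```python
import pickle
import zlib

def compress_for_cache(obj, level=6):
    """Hypothetical helper: pickle an object, then zlib-compress the bytes."""
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL), level)

def decompress_from_cache(blob):
    """Reverse of compress_for_cache(); only use it on data your own code wrote,
    because unpickling untrusted input is unsafe."""
    return pickle.loads(zlib.decompress(blob))

# Repetitive data, like many rows with the same keys, compresses well.
rows = [{"lead_id": i, "status": "open", "source": "web"} for i in range(1000)]
raw_size = len(pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL))
blob = compress_for_cache(rows)
print(f"pickled: {raw_size} bytes, compressed: {len(blob)} bytes")
```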
2) Measuring
Can't improve what you don't measure
● Measure your system to find bottlenecks
● Optimize those bottlenecks
● Verify the improvements
● Rinse and repeat!
Monitoring
● System: load, CPU, memory...
● Database: q/s, response time, size
● Cache: q/s, hit rate
● Queue: length
● Custom: metrics for your app
Profiling
● The cProfile module profiles Python programs
by collecting data such as:
– Number of calls, running time, time per call...
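A minimal sketch of profiling one function with `cProfile` and printing the most expensive entries with `pstats`. `slow_function()` is a hypothetical, deliberately inefficient example; in a Django project you would wrap the code path you suspect instead. For a whole script, `python -m cProfile -s cumulative script.py` collects the same data from the command line.

```python
import cProfile
import io
import pstats

def slow_function(n):
    """Hypothetical hot spot: list concatenation copies the list on every iteration."""
    result = []
    for i in range(n):
        result = result + [i * i]
    return sum(result)

profiler = cProfile.Profile()
profiler.enable()
total = slow_function(2000)
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The report lists, per function, the number of calls, total and cumulative time, and time per call.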
timeit
● The timeit module is a simple way to measure
the execution time of small bits of Python code
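A minimal sketch comparing two ways of summing a list with `timeit.repeat()`. Taking the minimum of several repeats reduces noise from other processes; the statements themselves are only illustrative. From the shell, `python -m timeit "sum(range(10_000))"` gives a similar quick measurement.

```python
import timeit

setup = "data = list(range(10_000))"
loop_stmt = "total = 0\nfor x in data:\n    total += x"
builtin_stmt = "total = sum(data)"

# Each repeat runs the statement `number` times; keep the best of 5 repeats.
loop_time = min(timeit.repeat(loop_stmt, setup=setup, number=100, repeat=5))
builtin_time = min(timeit.repeat(builtin_stmt, setup=setup, number=100, repeat=5))
print(f"for loop: {loop_time:.4f} s, sum(): {builtin_time:.4f} s (100 runs each)")
```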
ipdb
● Like pdb, but with IPython's features:
– tab completion, syntax highlighting, better
tracebacks, better introspection…
● Use ipdb.set_trace() to add a breakpoint and
jump in with the debugger
django-debug-toolbar
● Displays debug information about the current
request/response
● Organized in panels, very modular
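A minimal development-only configuration sketch, following the django-debug-toolbar installation instructions as we understand them (check the docs for the version you install). It assumes `DEBUG = True` and a browser on `127.0.0.1`; the toolbar is only rendered for client addresses listed in `INTERNAL_IPS`.

```python
# settings.py (development only; assumes DEBUG = True)
INSTALLED_APPS = [
    # ... your apps ...
    "django.contrib.staticfiles",  # the toolbar serves its own static files
    "debug_toolbar",
]
MIDDLEWARE = [
    # As early as possible, but after any middleware that encodes the
    # response (such as GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    # ... the rest of your middleware ...
]
INTERNAL_IPS = ["127.0.0.1"]  # the toolbar is only shown to these client IPs
STATIC_URL = "static/"

# urls.py
from django.urls import include, path

urlpatterns = [
    # ... your URL patterns ...
    path("__debug__/", include("debug_toolbar.urls")),
]
```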
django-debug-toolbar-line-profiler
● A toolbar panel for line-by-line profiling of views
Django Debug Panel
● Chrome extension
● For AJAX requests and non-HTML responses
3) Tips and tricks
Add db indexes
● Single (db_index) or multi-column (index_together)
● Be sure to profile and measure!
– Sometimes it’s not obvious (e.g., in the admin)
– The difference can be huge, e.g. from 15 s to 3 ms (3.5M rows)
● But: indexes use more space and slow down writes
Do bulk operations
● They greatly reduce the number of SQL queries:
– Model.objects.bulk_create()
– qs.update(), optionally with F() expressions
– qs.delete()
Get related objects
● Fetch FK (and one-to-one) related objects in the same query, via a SQL JOIN:
– qs.select_related()
● Fetch M2M (and reverse FK) related objects with one extra query per relation:
– qs.prefetch_related()
Slow admin?
● Use list_select_related
● Override get_queryset() to add prefetch_related()
● Does the ordering use an index? Same for search_fields
● readonly_fields avoids the FK/M2M queries needed to build form widgets
● Use the raw_id_fields widget (or better:
django-salmonella)
● Extend admin/filter.html to show filters as <select>
Cachalot
● django-cachalot caches your Django ORM queries
and automatically invalidates them
Queues and workers
● Do slow stuff later
● Some operations can be queued and executed
asynchronously in workers
● Use Celery
Cached sessions
● Use SESSION_ENGINE to enable cached sessions:
– Non-persistent (cache): don’t hit the DB
– Persistent (cached_db): don’t hit the DB… so often
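A settings sketch for the two cached session engines. The Memcached location is a hypothetical example; `PyMemcacheCache` requires Django 3.2+ and the `pymemcache` package, and another shared backend such as Redis would work too. The per-process local-memory cache is a poor fit for sessions, since each worker process would see its own set of sessions.

```python
# settings.py
CACHES = {
    "default": {
        # Hypothetical Memcached server.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}

# Non-persistent: sessions live only in the cache and are lost on eviction or restart.
SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# Persistent (write-through): reads come from the cache, writes also go to the DB.
# SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```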
Persistent connections
● Use CONN_MAX_AGE to set the lifetime of a
database connection (persistence)
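A settings sketch with hypothetical database details. `CONN_MAX_AGE` is in seconds: `0` (the default) closes the connection at the end of each request, and `None` keeps it open indefinitely. `CONN_HEALTH_CHECKS` requires Django 4.1+.

```python
# settings.py (hypothetical database details)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "leads",
        "HOST": "db.internal",
        # Reuse each connection for up to 10 minutes instead of opening
        # a new one for every request.
        "CONN_MAX_AGE": 600,
        # Django 4.1+: check that a reused connection is still usable first.
        "CONN_HEALTH_CHECKS": True,
    }
}
```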
UUIDs
● Use UUIDs for primary keys (instead of
auto-incrementing IDs)
– Practically unique, so collisions are extremely unlikely
– Indexed efficiently where the database has a native uuid type (e.g. PostgreSQL)
● Easier DB sharding
Slow tests?
● Reuse the test DB between runs: --keepdb
● Run in parallel: --parallel
● Disable unused middleware, INSTALLED_APPS,
slow password hashers, logging, etc.
● Use mocking whenever possible
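A minimal sketch of the last two tips. The two settings at the top are illustrative values for a test-only settings module: a fast (and insecure) password hasher, and `LOGGING_CONFIG = None` to skip Django's logging configuration. The rest uses the standard library's `unittest.mock`; `ScoringClient`, `classify()` and the test case are hypothetical. Patching `fetch_score` replaces the slow remote call, so the test runs in milliseconds. The command-line flags combine as `python manage.py test --keepdb --parallel`.

```python
# Hypothetical test-only settings values.
PASSWORD_HASHERS = ["django.contrib.auth.hashers.MD5PasswordHasher"]  # fast, insecure: tests only
LOGGING_CONFIG = None  # skip Django's logging configuration

import time
import unittest
from unittest import mock

class ScoringClient:
    """Hypothetical client for a slow remote scoring service."""
    def fetch_score(self, lead_id):
        time.sleep(2)  # simulates network latency
        return 0.5

def classify(client, lead_id):
    return "hot" if client.fetch_score(lead_id) > 0.8 else "cold"

class ClassifyTests(unittest.TestCase):
    def test_hot_lead(self):
        # Replace the slow call with a mock that returns immediately.
        with mock.patch.object(ScoringClient, "fetch_score", return_value=0.9) as fake:
            self.assertEqual(classify(ScoringClient(), 1), "hot")
        fake.assert_called_once_with(1)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyTests)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```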
4) Conclusions
● Measure first
● Optimize only the bottleneck
● Go for the low-hanging fruit
● Measure again
Good resources
● The official Django documentation
● Book: “High Performance Django”
● Blog: “Instagram Engineering”
● “Latency Numbers Every Programmer Should Know”
Thanks for attending!
- Get the slides at http://slideshare.net/DZPM
- We are looking for engineers and data scientists!