Efficient Django

David Arcos (@DZPM), #EuroPython 2016

Abstract

Tips and best practices for avoiding scalability issues and performance bottlenecks in Django

● 1) Basic concepts: the theory

● 2) Measuring: how to find bottlenecks

● 3) Tips and tricks

● 4) Conclusion (yes, it scales!)

Hi!

● I'm David Arcos

● Python/Django developer since 2008

● Co-organizer at Python Barcelona

● CTO at Lead Ratings

● “We improve your sales conversions, using predictive algorithms to rate the leads”

● Prediction API, “Machine Learning as a Service”

● http://lead-ratings.com

1) Basic concepts

The Pareto Principle

“For many events, roughly 80% of the effects come from 20% of the causes”

Prioritize and focus

Focus on the few tasks that will have the most impact

Basic scalability

“Potential to be enlarged to handle a growing amount of work”

● Stateless app servers

– Load balance them, scale horizontally

● Keep the state on the database(s)

– This is the difficult part! Each system is different

Database performance

● Make fewer queries:

– Fewer reads

– Fewer writes

● Make faster queries:

– Filter on indexed fields

– De-normalize (store precomputed values to avoid JOINs and aggregations)
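
As a sketch of the de-normalization tip, the example below keeps a precomputed comment_count column on a hypothetical post table and updates it in the same transaction as each new comment. Listing posts then needs neither a JOIN nor a COUNT(*). It uses the stdlib sqlite3 module with raw SQL rather than the Django ORM, and the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        comment_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy of COUNT(comment)
    );
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post(id),
        body TEXT NOT NULL
    );
""")


def add_comment(conn, post_id, body):
    """Insert a comment and bump the cached counter in one transaction."""
    with conn:  # both writes are committed together, or rolled back together
        conn.execute("INSERT INTO comment (post_id, body) VALUES (?, ?)", (post_id, body))
        conn.execute(
            "UPDATE post SET comment_count = comment_count + 1 WHERE id = ?", (post_id,)
        )


conn.execute("INSERT INTO post (id, title) VALUES (1, 'Hello'), (2, 'World')")
conn.commit()
for text in ("first", "second", "third"):
    add_comment(conn, 1, text)
add_comment(conn, 2, "only one")

# The listing is now a single-table read: no JOIN, no COUNT(*).
listing = conn.execute("SELECT title, comment_count FROM post ORDER BY id").fetchall()
print(listing)  # [('Hello', 3), ('World', 1)]
```

The cost is the one the slide implies. Every write now does extra work, and the counter can drift if some code path inserts comments without going through add_comment.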

Templates

● Cache them

● Jinja2 is a bit faster than the default engine

– but cache templates anyway

● You can do fragment caching (for blocks)
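
Below is a settings sketch for the first and last points. It wraps Django's filesystem and app-directories loaders in the cached loader, so each template is compiled once per process instead of on every render. Newer Django releases enable the cached loader by default when OPTIONS["loaders"] is not set, so check the documentation for your version before copying this. The fragment-caching tag appears only as a comment because it belongs in template code, not settings. The fragment name sidebar and the 500-second timeout are arbitrary.

```python
# settings.py (fragment): compile templates once per process with the cached loader.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],  # add project-level template directories here
        # APP_DIRS must be False (or omitted) when OPTIONS["loaders"] is set explicitly.
        "APP_DIRS": False,
        "OPTIONS": {
            "loaders": [
                (
                    "django.template.loaders.cached.Loader",
                    [
                        "django.template.loaders.filesystem.Loader",
                        "django.template.loaders.app_directories.Loader",
                    ],
                ),
            ],
        },
    },
]

# Fragment caching happens in the template itself, for example:
#   {% load cache %}
#   {% cache 500 sidebar %} ...expensive block... {% endcache %}
```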

Cache

● Generic approach: cache at each stack level

● The cache documentation is excellent

● Beware of the cache invalidation!
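
The invalidation warning is easiest to see in a cache-aside sketch: read through the cache, and delete the key whenever the underlying data changes. TinyCache is a hypothetical in-process stand-in that lets the example run on its own. Django's cache framework (django.core.cache.cache) exposes get/set/delete calls of a similar shape, usually backed by Memcached or Redis. load_lead_from_db and the key format are also made up for illustration.

```python
import time


class TinyCache:
    """Hypothetical in-process cache with per-key timeouts (stand-in for Django's cache)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        value, expires_at = self._data.get(key, (default, None))
        if expires_at is not None and expires_at < time.monotonic():
            del self._data[key]  # expired: behave like a miss
            return default
        return value

    def set(self, key, value, timeout=300):
        self._data[key] = (value, time.monotonic() + timeout)

    def delete(self, key):
        self._data.pop(key, None)


cache = TinyCache()
DB = {42: {"name": "ACME", "score": 0.7}}  # pretend database table
db_reads = 0


def load_lead_from_db(lead_id):
    global db_reads
    db_reads += 1
    return dict(DB[lead_id])


def get_lead(lead_id):
    key = f"lead:{lead_id}"
    lead = cache.get(key)
    if lead is None:  # miss: query the database and populate the cache
        lead = load_lead_from_db(lead_id)
        cache.set(key, lead, timeout=60)
    return lead


def update_lead_score(lead_id, score):
    DB[lead_id]["score"] = score
    cache.delete(f"lead:{lead_id}")  # invalidate, or readers keep seeing the old score


get_lead(42)
get_lead(42)                 # served from the cache
update_lead_score(42, 0.9)
fresh = get_lead(42)         # miss after invalidation, so the new score is loaded
print(db_reads, fresh["score"])  # 2 0.9
```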

Bottlenecks

● Where is your bottleneck?

● CPU bound or I/O bound?

– CPU? Run heavy calculations in async workers

– Memory? Compress objects before caching

– Database? Read from db replicas

● How to find it?
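
For the "read from db replicas" point, Django can route queries through a database router listed in the DATABASE_ROUTERS setting. The class below is a minimal sketch that assumes hypothetical database aliases: default for the primary, and replica1 and replica2 for the replicas. It sends reads to a random replica and writes to the primary. It ignores replication lag, so a request that writes and immediately reads back may see stale data.

```python
import random


class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary.

    Assumes DATABASES defines the (hypothetical) aliases "default",
    "replica1" and "replica2", with the replicas following "default".
    """

    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so cross-alias relations are fine.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate only the primary; replicas get schema changes via replication.
        return db == "default"


# settings.py (hypothetical module path):
# DATABASE_ROUTERS = ["myproject.routers.PrimaryReplicaRouter"]
```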

2) Measuring

Can't improve what you don't measure

● Measure your system to find bottlenecks

● Optimize those bottlenecks

● Verify the improvements

● Rinse and repeat!

Monitoring

● System: load, CPU, memory...

● Database: q/s, response time, size

● Cache: q/s, hit rate

● Queue: length

● Custom: metrics for your app
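
A custom metric can start as simply as timing each request. The middleware below is a sketch in the new-style (Django 1.10+) middleware shape: a callable that wraps get_response, so it needs no Django import. record_metric is a hypothetical hook that you would replace with your metrics client, such as a StatsD or Prometheus library.

```python
import time

RECORDED = []  # stand-in for a real metrics backend


def record_metric(name, value_ms):
    """Hypothetical hook: forward the measurement to your metrics system."""
    RECORDED.append((name, value_ms))


class ResponseTimeMiddleware:
    """Measure how long the rest of the stack takes to produce each response."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        record_metric("http.response_time_ms", elapsed_ms)
        return response


# Minimal usage without Django: wrap a fake view and call it.
def fake_view(request):
    time.sleep(0.01)
    return "OK"


middleware = ResponseTimeMiddleware(fake_view)
print(middleware(request=None), RECORDED[0][0])  # OK http.response_time_ms
```

Placed near the top of the MIDDLEWARE setting, it would time nearly the whole request.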

Profiling

● The cProfile module provides profiling of Python programs by collecting data:

– Number of calls, running time, time per call...
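
Here is a minimal cProfile run that uses pstats to sort by cumulative time. slow_function is a made-up workload standing in for a view or task.

```python
import cProfile
import io
import pstats


def slow_function(n):
    """Made-up workload: sum many small ranges."""
    return sum(sum(range(i % 100)) for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_function(50_000)
profiler.disable()

# Report the 5 most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

For a whole script, the command-line form python -m cProfile -s cumulative script.py gives a similar report.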

timeit

● The timeit module is a simple way to measure the execution time of small bits of Python code
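
As an example, the snippet below compares three ways of building a list of strings; the statements themselves are arbitrary. timeit.repeat runs each statement number times per round. Taking the minimum over the rounds is the usual way to reduce noise from other activity on the machine.

```python
import timeit

setup = "items = list(range(100))"
candidates = {
    "loop + append": "out = []\nfor i in items:\n    out.append(str(i))",
    "list comprehension": "out = [str(i) for i in items]",
    "map": "out = list(map(str, items))",
}

results = {}
for label, stmt in candidates.items():
    # 5 rounds of 1000 executions each; keep the best (least disturbed) round.
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=1000))
    results[label] = best
    # best is seconds per 1000 runs, so best * 1000 is microseconds per run.
    print(f"{label:20s} {best * 1000:.1f} µs per run")
```

From the shell, python -m timeit "[str(i) for i in range(100)]" runs a quick one-off measurement.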

ipdb

● Like pdb, but with IPython's features

– tab completion, syntax highlighting, better tracebacks, better introspection…

● Use ipdb.set_trace() to add a breakpoint and jump into the debugger

django-debug-toolbar

● Displays debug information about the current request/response

● Built from panels, so it is very modular
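
A typical django-debug-toolbar setup looks roughly like the settings fragment below. The app and middleware lists are abbreviated placeholders. The toolbar renders only for requests from INTERNAL_IPS, and its URLs must also be included in your URLconf, as described in the package's installation docs. On Django versions before 1.10 the middleware setting is called MIDDLEWARE_CLASSES.

```python
# settings.py (development only): enable django-debug-toolbar.
DEBUG = True

INSTALLED_APPS = [
    "django.contrib.staticfiles",  # the toolbar serves its own static files
    # ... your other apps ...
    "debug_toolbar",
]

MIDDLEWARE = [
    # Place the toolbar early, but after any middleware that encodes the
    # response content (such as GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar is only shown for requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]
```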

django-debug-toolbar-line-profiler

● A toolbar panel for line-by-line profiling

Django Debug Panel

● Chrome extension

● For AJAX requests and non-HTML responses

3) Tips and tricks

Add db indexes

● Single-column (db_index) or multi-column (index_together)

● Be sure to profile and measure!

– Sometimes it’s not obvious (e.g., in the admin)

– The difference can be huge: e.g. from 15 s to 3 ms (3.5M rows)

● But: indexes use more space and slow down writes
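
One way to see an index take effect is to check the query plan. The sketch below uses the stdlib sqlite3 module and a hypothetical lead table rather than the ORM: the same lookup goes from a full table scan to an index search once the column is indexed. In Django the index would come from db_index=True on the field, or for several columns from index_together, which newer Django versions replace with Meta.indexes. On PostgreSQL or MySQL you would read the plan with EXPLAIN, or with QuerySet.explain() on Django 2.1+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lead (id INTEGER PRIMARY KEY, email TEXT, score REAL)")
conn.executemany(
    "INSERT INTO lead (email, score) VALUES (?, ?)",
    ((f"user{i}@example.com", i / 10_000) for i in range(10_000)),
)

QUERY = "SELECT score FROM lead WHERE email = ?"


def plan(sql, params):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " / ".join(row[-1] for row in rows)


before = plan(QUERY, ("user42@example.com",))
conn.execute("CREATE INDEX lead_email_idx ON lead (email)")
after = plan(QUERY, ("user42@example.com",))

print(before)  # a full scan, e.g. "SCAN lead" (wording varies by SQLite version)
print(after)   # an index search, e.g. "SEARCH lead USING INDEX lead_email_idx (email=?)"
```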

Do bulk operations

● They greatly reduce the number of SQL queries:

– Model.objects.bulk_create()

– qs.update(), optionally with F() expressions

– qs.delete()
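
To make the difference concrete, the sketch below counts the statements sent to an in-memory SQLite database (stdlib sqlite3, not the ORM) in two cases. The first writes row by row. The second uses one multi-row INSERT, roughly what bulk_create() issues, and one UPDATE that does the arithmetic in SQL, which is what qs.update(score=F("score") + 1) compiles to. CountingConnection and the table are hypothetical helpers for illustration.

```python
import sqlite3


class CountingConnection:
    """Hypothetical helper: counts statements, a bit like connection.queries under DEBUG=True."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params)


db = CountingConnection()
db.execute("CREATE TABLE lead (id INTEGER PRIMARY KEY, email TEXT, score INTEGER)")
rows = [(f"user{i}@example.com", 0) for i in range(100)]

# Row by row: one INSERT per object, then one UPDATE per object.
db.count = 0
ids = [db.execute("INSERT INTO lead (email, score) VALUES (?, ?)", row).lastrowid for row in rows]
for lead_id in ids:
    db.execute("UPDATE lead SET score = score + 1 WHERE id = ?", (lead_id,))
one_by_one = db.count

# Bulk: one multi-row INSERT (like bulk_create) and one UPDATE whose arithmetic
# runs inside the database (like qs.update(score=F("score") + 1)).
db.execute("DELETE FROM lead")
db.count = 0
placeholders = ", ".join(["(?, ?)"] * len(rows))
db.execute(f"INSERT INTO lead (email, score) VALUES {placeholders}",
           [value for row in rows for value in row])
db.execute("UPDATE lead SET score = score + 1")
bulk = db.count

print(one_by_one, bulk)  # 200 2
```

The trade-off: bulk_create() and qs.update() bypass each model's save() method and the pre_save/post_save signals.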

Get related objects

● Fetch FK (and one-to-one) fields in the same query, via a JOIN:

– qs.select_related()

● Fetch M2M (and reverse FK) fields with one extra query per relation:

– qs.prefetch_related()
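
The query counts behind these two methods can be reproduced by hand. This sqlite3 sketch uses raw SQL and hypothetical company/lead tables, and counts queries the same way Django's connection.queries would show them. It compares the N+1 pattern with a single JOIN, which is what select_related() does, and with one extra IN query, which is the two-query approach prefetch_related() takes. For M2M relations, that second query goes through the join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lead (id INTEGER PRIMARY KEY, email TEXT,
                       company_id INTEGER REFERENCES company(id));
""")
conn.executemany("INSERT INTO company (id, name) VALUES (?, ?)",
                 [(i, f"Company {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO lead (email, company_id) VALUES (?, ?)",
                 [(f"lead{i}@example.com", i % 10 + 1) for i in range(50)])

queries = 0


def run(sql, params=()):
    global queries
    queries += 1
    return conn.execute(sql, params).fetchall()


# N+1: one query for the leads, then one per lead to fetch its company.
queries = 0
for _, email, company_id in run("SELECT id, email, company_id FROM lead"):
    run("SELECT name FROM company WHERE id = ?", (company_id,))
n_plus_one = queries

# select_related()-style: a single JOIN.
queries = 0
joined = run("SELECT lead.email, company.name FROM lead "
             "JOIN company ON company.id = lead.company_id")
with_join = queries

# prefetch_related()-style: the leads, then all their companies in one IN query,
# matched up in Python.
queries = 0
leads = run("SELECT id, email, company_id FROM lead")
ids = sorted({company_id for _, _, company_id in leads})
marks = ", ".join("?" * len(ids))
companies = dict(run(f"SELECT id, name FROM company WHERE id IN ({marks})", ids))
pairs = [(email, companies[company_id]) for _, email, company_id in leads]
with_prefetch = queries

print(n_plus_one, with_join, with_prefetch)  # 51 1 2
```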

Slow admin?

● Use list_select_related

● Override get_queryset() to add prefetch_related()

● Does ordering use an index? Same for search_fields

● readonly_fields avoids the FK/M2M queries needed to render the form widgets

● Use the raw_id_fields widget (or, better, django-salmonella)

● Extend admin/filter.html to show filters as <select>

Cachalot

● Caches your Django ORM queries and automatically invalidates them

Queues and workers

● Do slow stuff later

● Some operations can be queued and executed asynchronously in workers

● Use Celery
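
The pattern behind "do slow stuff later" is that the request handler only enqueues a job and returns, while a separate worker does the work. This stdlib sketch with queue.Queue and a thread only shows the shape of that pattern. It is not a substitute for Celery: it has no persistence, no retries, and no separate worker processes. With Celery you would decorate the function as a task and call it with .delay(...). The job function and its arguments are hypothetical.

```python
import queue
import threading
import time

jobs = queue.Queue()
results = []


def rate_lead(lead_id):
    """Hypothetical slow job, e.g. scoring a lead with a predictive model."""
    time.sleep(0.05)
    return lead_id, 0.5


def worker():
    while True:
        lead_id = jobs.get()
        if lead_id is None:  # sentinel: stop the worker
            jobs.task_done()
            break
        results.append(rate_lead(lead_id))
        jobs.task_done()


def handle_request(lead_id):
    """What the web view does: enqueue the job and answer immediately."""
    jobs.put(lead_id)
    return {"status": "queued", "lead_id": lead_id}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

start = time.perf_counter()
responses = [handle_request(i) for i in range(3)]
enqueue_time = time.perf_counter() - start  # near-instant; the work happens later

jobs.put(None)
jobs.join()  # wait for the worker to drain the queue (only needed in this demo)
print(responses[0], sorted(results))
```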

Cached sessions

● Use SESSION_ENGINE to enable cached sessions:

– Non-persistent: don’t hit the DB

– Persistent: don’t hit the DB… so often
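
The two options correspond to two SESSION_ENGINE values, and both need a working cache configured in CACHES. The Memcached location below is a placeholder. The PyMemcacheCache backend requires Django 3.2+ and the pymemcache package; older Django versions used MemcachedCache instead.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder address
    }
}

# Non-persistent: sessions live only in the cache (lost on eviction or restart).
# SESSION_ENGINE = "django.contrib.sessions.backends.cache"

# Persistent: a write-through cache; reads hit the database only on a cache miss.
SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
```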

Persistent connections

● Use CONN_MAX_AGE to keep database connections open across requests (persistent connections), up to the given lifetime in seconds
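
Here is a settings sketch. CONN_MAX_AGE is set per database alias, in seconds. The default of 0 closes the connection at the end of each request, and None keeps connections open indefinitely. A moderate value lets each worker reuse its connection across requests; the 60 used here is an arbitrary choice. The database name, user, and host are placeholders.

```python
# settings.py (fragment): reuse database connections across requests.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",        # placeholder
        "USER": "app",        # placeholder
        "HOST": "localhost",  # placeholder
        "PORT": "5432",
        # Keep each connection for up to 60 seconds
        # (0 = close after every request, None = never close).
        "CONN_MAX_AGE": 60,
    }
}
```

Each worker thread or process holds its own connection, so the database's connection limit must cover the total number of workers.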

UUIDs

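● Use UUIDs for primary keys (instead of auto-incremented integer IDs)

– Practically unique: collisions are vanishingly unlikely, even across servers

– Index well with a native UUID column type (e.g. PostgreSQL's uuid), though they are larger than integer keys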

● Easier db sharding
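
The stdlib sketch below illustrates these properties with Python's uuid module. In a Django model the key would typically be declared as models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False). Here, two hypothetical app servers generate IDs independently. The shard for a row is derived from its ID with a simple modulo rule, which is an illustrative assumption rather than a recommended sharding scheme.

```python
import uuid

NUM_SHARDS = 4  # hypothetical shard count


def new_id():
    # Random (version 4) UUID: generated locally, no central sequence needed.
    return uuid.uuid4()


def shard_for(pk):
    # Illustrative routing rule: spread keys over shards by their integer value.
    return pk.int % NUM_SHARDS


# Two app servers create keys without coordinating with each other or the database.
ids_a = {new_id() for _ in range(1000)}
ids_b = {new_id() for _ in range(1000)}

print(len(ids_a | ids_b))            # 2000: no collisions in practice
print(len(next(iter(ids_a)).bytes))  # 16 bytes, versus 4 or 8 for integer keys
print(sorted({shard_for(pk) for pk in ids_a}))
```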

Slow tests?

● Reuse the test database between runs (no re-running migrations from scratch): --keepdb

● Run in parallel: --parallel

● Disable unused middleware and INSTALLED_APPS, use a fast password hasher, silence logging, etc.

● Use mocking whenever possible
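
Below is a sketch of a dedicated test settings module. The module name and the idea of starting from your main settings with a star import are assumptions; the import is shown as a comment so that this fragment runs on its own. MD5PasswordHasher is insecure and meant only for tests, where the default, deliberately slow hasher can noticeably slow down tests that create users. The module would be used with something like python manage.py test --settings=myproject.settings_test --keepdb --parallel.

```python
# settings_test.py (hypothetical module): settings used only by the test runner.
import logging

# from .settings import *  # start from the main settings (assumed module layout)

# A fast, insecure hasher: acceptable in tests, never in production.
PASSWORD_HASHERS = ["django.contrib.auth.hashers.MD5PasswordHasher"]

# Keep only the middleware the tests exercise (illustrative list).
MIDDLEWARE = [
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
]

# Silence log output: disables every message at CRITICAL level and below.
logging.disable(logging.CRITICAL)
```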

4) Conclusions

● Measure first

● Optimize only the bottleneck

● Go for the low-hanging fruit

● Measure again

Good resources

● The official Django documentation

● Book: “High Performance Django”

● Blog: “Instagram Engineering”

● “Latency Numbers Every Programmer Should Know”

Thanks for attending!

- Get the slides at http://slideshare.net/DZPM

- We are looking for engineers and data scientists!