Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Fast numerics in Python - NumPyand PyPy
Maciej Fijałkowski
SEA, NCAR
22 February 2012
fijal PyPy
What is this talk about?
What is PyPy and why?Numeric landscape in PythonWhat we achieved in PyPyWhere we’re going?
fijal PyPy
What is PyPy?
An efficient implementation of PythonlanguageA framework for writing efficient dynamiclanguage implementationsAn open source project with a lot of volunteereffort, released under the MIT licenseAgile development, 13000 unit tests, continuousintegration, sprints, distributed teamI’ll talk today about the first part (mostly)
fijal PyPy
PyPy status right now
An efficient just in time compiler for thePython languageRelatively “good’ on numerics (compared toother dynamic languages)Example - real time video processing2-300x faster on Python code
fijal PyPy
Why should you care?
If I write this stuff in C/fortran/assembler it’ll befaster anywaymaybe, but ...
fijal PyPy
Why should you care? (2)
Experimentation is importantImplementing something faster, in human time,leaves more time for optimizations andimprovementsFor novel algorithms, clearer implementationmakes them easier to evaluate (Python often iscleaner than C)
Sometimes makes it possible in the first place
fijal PyPy
Why should you care? (2)
Experimentation is importantImplementing something faster, in human time,leaves more time for optimizations andimprovementsFor novel algorithms, clearer implementationmakes them easier to evaluate (Python often iscleaner than C)
Sometimes makes it possible in the first place
fijal PyPy
Why would you care even more?
Growing communityEverything is for free with reasonable licensingThere are many smart people out thereaddressing hard problems
fijal PyPy
Example of why would you care
You spend a year writing optimized algorithmsfor a GPUNext year a new generation of GPUs come alongYour algorithms are no longer optimized
Alternative - express your algorithmsLeave low-level details to people who havenothing better to do
... like me (I don’t know enough Physics to dothe other part)
fijal PyPy
Example of why would you care
You spend a year writing optimized algorithmsfor a GPUNext year a new generation of GPUs come alongYour algorithms are no longer optimized
Alternative - express your algorithmsLeave low-level details to people who havenothing better to do
... like me (I don’t know enough Physics to dothe other part)
fijal PyPy
Example of why would you care
You spend a year writing optimized algorithmsfor a GPUNext year a new generation of GPUs come alongYour algorithms are no longer optimized
Alternative - express your algorithmsLeave low-level details to people who havenothing better to do
... like me (I don’t know enough Physics to dothe other part)
fijal PyPy
Numerics in Python
numpy - for array operationsscipy, scikits - various algorithms, alsoexposing C/fortran librariesmatplotlib - pretty picturesipython
fijal PyPy
There is an entire ecosystem!
Which I don’t even know very wellPyCUDA
pandas
mayavi
fijal PyPy
What’s important?
There is an entire ecosystem built by peopleIt’s available for free, no shady licensingIt’s being expandedIt’s growingIt’ll keep up with hardware advancments
fijal PyPy
Problems with numerics in python
Stuff is reasonably fast, but...
Only if you don’t actually write much PythonArray operations are fine as long as they’revectorizedNot everything is expressable that wayNumpy allocates intermediates for eachoperation, suboptimal
fijal PyPy
Problems with numerics in python
Stuff is reasonably fast, but...
Only if you don’t actually write much PythonArray operations are fine as long as they’revectorizedNot everything is expressable that wayNumpy allocates intermediates for eachoperation, suboptimal
fijal PyPy
Our approach
Build a tree of operationsCompile assembler specialized for aliasing andoperationsExecute the specialized assembler
fijal PyPy
Examples
a, b, c are single dimensional arraysa+a would generate different code than a+ba+b*c is as fast as a loop
fijal PyPy
Performance comparison
NumPyPyPy GCCa+b 0.6s 0.4s 0.3s
(0.25s)a+b+c 1.9s 0.5s 0.7s
(0.32s)5+ 3.2s 0.8s 1.7s
(0.51s)
Pathscale is actually slower
fijal PyPy
Performance comparion SSE
Branch only so far!
PyPySSE
PyPy GCC
a+b 0.3s 0.4s 0.3s(0.25s)
a+b+c 0.35s 0.5s 0.7s(0.32s)
5+ 0.36s 0.8s 1.7s(0.51s)
fijal PyPy
Status
This works reasonably wellFar from implementing the entire numpy,although it’s in progressAssembler generation backend needs worksVectorization in progress
fijal PyPy
Status benchmarks - slightly morecomplex
laplace solutionsolutions:
NumPy: 4.3slooped: too long to run ~2100s
PyPy: 1.6slooped: 2.5s
C: 0.9s
fijal PyPy
Progress plan
Express operations in high-level languagesLet us deal with low level details
However, retain knobs and buttons for advancedusersDon’t get penalized too much for not using them
fijal PyPy
Progress plan
Express operations in high-level languagesLet us deal with low level details
However, retain knobs and buttons for advancedusersDon’t get penalized too much for not using them
fijal PyPy
Few words about the future
Predictions are hard
Especially when it comes to futureTake this with a grain of salt
fijal PyPy
Few words about the future
Predictions are hard
Especially when it comes to futureTake this with a grain of salt
fijal PyPy
This is just the beginning...
PyPy is an easy platform to experiment withWe did not spend a whole lot of time dealing withthe low-level optimizationsAutomatic vectorization over multiple threadsSSE, GPU, dynamic offloadingOptimizations based on machine cache sizeWe’re running a fundraiser, make your employerdonate money
fijal PyPy
Q&A
http://pypy.org/
http://buildbot.pypy.org/numpy-status/latest.html
http://morepypy.blogspot.com/
Any questions?
fijal PyPy