PyPy: is it ready for production? The sequel

Slide deck for my talk at PyCon Singapore 2013.

the sequel

Mark Rees

CTO

Century Software (M) Sdn Bhd

is it ready for production?

pypy & me

not affiliated with pypy team

have followed its development since 2004

use cpython and jython at work

used ironpython for small projects

gave a similar talk at PyConAU 2012

the question:

would pypy improve performance of

some of our workloads?

i am a manager who still wants to be a programmer, so i did the analysis

pypy

history

- first sprint 2003, EU project from 2004 – 2007

- open source project from 2007

https://bitbucket.org/pypy

- pypy 1.4, first release suitable for “production”, 12/2010

what is pypy?

- RPython translation toolchain, a framework for generating dynamic programming language implementations

- an implementation of Python in Python using the framework

pypy

current release

pypy 2.0 released may 2013

latest iteration 2.0.2

want to know more about pypy

- http://pypy.org/

- david beazley pycon 2012 keynote

http://goo.gl/5PXFQ

- how the pypy jit works http://goo.gl/dKgFp

- why pypy by example http://goo.gl/vpQyJ

production ready – a definition

it runs

it satisfies the project requirements

its design was well thought out

it's stable

it's maintainable

it's scalable

it's documented

it works with the python modules we use

it is as fast or faster than cpython

http://programmers.stackexchange.com/questions/61726/define-production-ready

pypy – does it run?

of course, it runs

See http://pypy.readthedocs.org/en/latest/cpython_differences.html for differences between PyPy and CPython
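Because the same script may end up running under either interpreter, it can help to record which one is in use. A minimal sketch using the standard library `platform` module; the function name `describe_interpreter` is made up for illustration and is not from the talk:

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation, version) for the running interpreter."""
    # python_implementation() returns 'CPython', 'PyPy', 'Jython' or 'IronPython'.
    return platform.python_implementation(), platform.python_version()


if __name__ == "__main__":
    impl, version = describe_interpreter()
    print("%s %s" % (impl, version))
    # On PyPy builds, sys.version also carries the PyPy release string.
    print(sys.version)
```

Logging this next to benchmark results makes it harder to mix up which numbers came from which interpreter.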

pypy – other production criteria

does it satisfy the project requirements

- yes

was its design well thought out

- I would assume so

is it stable

- yes

is it maintainable

- 7 out of 10

is it scalable

- stackless & greenlets built in

is it documented

- cpython docs for functionality, rpython toolchain 8 out of 10

pypy – does it work with the modules we use

standard library modules supported:

__builtin__, __pypy__, _ast, _bisect, _codecs, _collections, _ffi, _hashlib,

_io, _locale, _lsprof, _md5, _minimal_curses, _multiprocessing, _random,

_rawffi, _sha, _socket, _sre, _ssl, _warnings, _weakref, _winreg, array,

binascii, bz2, cStringIO, clr, cmath, cpyext, crypt, errno, exceptions,

fcntl, gc, imp, itertools, marshal, math, mmap, operator, oracle, parser,

posix, pyexpat, select, signal, struct, symbol, sys, termios, thread, time,

token, unicodedata, zipimport, zlib

these modules are supported but written in python:

cPickle, _csv, ctypes, datetime, dbm, _functools, grp, pwd, readline, resource, sqlite3, syslog, tputil

many python libs are known to work, like:

ctypes, django, pyglet, sqlalchemy, PIL.

See https://bitbucket.org/pypy/compatibility/wiki/Home for a more exhaustive list.
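A quick first check for your own project is to try importing every module it depends on under pypy. The sketch below is one way to do that, assuming nothing beyond the standard library; `check_imports` and the module list are hypothetical names chosen for illustration:

```python
import importlib


def check_imports(module_names):
    """Try to import each module; return {name: None or error message}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # Usually ImportError, but an extension may raise something else at import time.
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    # Hypothetical list: replace with the modules your project really uses.
    wanted = ["csv", "sqlite3", "ctypes", "no_such_module_example"]
    for name, error in sorted(check_imports(wanted).items()):
        print("%-25s %s" % (name, "ok" if error is None else error))
```

A clean import is only a first signal; running the project's own test suite under pypy is what shows whether the modules really work.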

pypy – does it work with the modules we use

pypy c-api support is beta, worked most of the time but failed with reportlab:

Fatal error in cpyext, CPython compatibility layer, calling PySequence_GetItem
Either report a bug or consider not using this particular extension
<OpErrFmt object at 0x7f94582f3100>
RPython traceback:
  File "pypy_module_cpyext_api_1.c", line 30287, in PySequence_GetItem
  File "pypy_module_cpyext_pyobject.c", line 1056, in BaseCpyTypedescr_realize
  File "pypy_objspace_std_objspace.c", line 3404, in allocate_instance__W_ObjectObject
  File "pypy_objspace_std_typeobject.c", line 33781, in W_TypeObject_check_user_subclass
Segmentation fault

But this was the only compatibility issue we had running all of our python code under pypy, and we could fall back to pure python reportlab extensions anyway.
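The usual shape of such a fallback is an optional import: use the C accelerator when it is available, otherwise define a pure python function with the same interface. This is a generic sketch, not reportlab's actual code; the module name `_myproject_speedups` and the function `escape_xml` are invented for the example:

```python
try:
    # Hypothetical C accelerator; the module name is made up for illustration.
    from _myproject_speedups import escape_xml
    USING_C_EXTENSION = True
except ImportError:
    USING_C_EXTENSION = False

    def escape_xml(text):
        """Pure-Python fallback with the same interface as the C version."""
        return (text.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;"))


if __name__ == "__main__":
    print(USING_C_EXTENSION, escape_xml("a < b & c"))
```

This only helps when the extension is missing or fails to import. A segmentation fault inside an extension cannot be caught with try/except, so under pypy the extension would have to be left uninstalled for the fallback to be used.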

pypy – does it work with the modules you use

IPython notebook requires tornado & zeromq

pypy – does it run as fast as cpython

http://speed.pypy.org/

but!

pypy django benchmark

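import time

from django.conf import settings
from django.template import Context, Template

# Setup is omitted on the slide; Django needs configured settings before a Template is built.
settings.configure()

# Python 2 code, as benchmarked (note xrange).
DJANGO_TMPL = Template("""<table>
{% for row in table %}
<tr>{% for col in row %}<td>{{ col|escape }}</td>{% endfor %}</tr>
{% endfor %}
</table>
""")


def test_django(count):
    # 150 x 150 table rendered on every iteration.
    table = [xrange(150) for _ in xrange(150)]
    context = Context({"table": table})

    # Warm up Django.
    DJANGO_TMPL.render(context)
    DJANGO_TMPL.render(context)

    times = []
    for _ in xrange(count):
        t0 = time.time()
        data = DJANGO_TMPL.render(context)
        t1 = time.time()
        times.append(t1 - t0)
    return times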
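The same warm-up-then-time pattern can be pulled out into a small reusable helper. This is a sketch of the pattern only, not part of the pypy benchmark suite; `time_calls`, `summarize` and the stand-in workload are names invented here:

```python
import time


def time_calls(func, count, warmup=2):
    """Call func() `warmup` times untimed, then `count` times timed."""
    for _ in range(warmup):
        func()  # untimed runs give a JIT a chance to compile the hot paths
    times = []
    for _ in range(count):
        t0 = time.time()
        func()
        t1 = time.time()
        times.append(t1 - t0)
    return times


def summarize(times):
    """Return (minimum, average) of a list of timings in seconds."""
    return min(times), sum(times) / len(times)


if __name__ == "__main__":
    # Stand-in workload, not the Django template from the benchmark.
    def workload():
        return sum(i * i for i in range(100000))

    fastest, average = summarize(time_calls(workload, count=10))
    print("min %.6fs  avg %.6fs" % (fastest, average))
```

Keeping every individual timing, as the benchmark does, makes it possible to see how long the warm-up phase really lasts instead of hiding it in one average.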

my csv to xml benchmark

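import csv

# SAXWriter is defined elsewhere; it is not shown on this slide.


def bench(data, output):
    f = open(data, 'rb')
    fn = ['age', …]  # remaining field names are elided on the slide
    reader = csv.DictReader(f, fn)
    writer = SAXWriter(output)
    writer.start_doc()
    writer.start_tag('data')
    try:
        for row in reader:
            # One <row> element per CSV record, one child element per field.
            writer.start_tag('row')
            for key in row.keys():
                writer.tag(key.replace(' ', '_'), body=row[key])
            writer.end_tag('row')
    finally:
        f.close()
    writer.end_tag('data')
    writer.end_doc()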
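The slide does not show `SAXWriter`. To make the shape of the benchmark concrete, here is a hypothetical stand-in with the same method names, built on the standard library `xml.sax.saxutils.XMLGenerator`. It is an assumption about what such a class could look like, not the class used in the talk:

```python
import io
from xml.sax.saxutils import XMLGenerator


class SAXWriter(object):
    """Hypothetical stand-in for the SAXWriter used in the benchmark."""

    def __init__(self, output, encoding="utf-8"):
        self._gen = XMLGenerator(output, encoding)

    def start_doc(self):
        self._gen.startDocument()

    def end_doc(self):
        self._gen.endDocument()

    def start_tag(self, name):
        self._gen.startElement(name, {})  # no attributes

    def end_tag(self, name):
        self._gen.endElement(name)

    def tag(self, name, body=""):
        # Element with text content; characters() escapes &, < and >.
        self.start_tag(name)
        self._gen.characters(body)
        self.end_tag(name)


if __name__ == "__main__":
    out = io.StringIO()
    writer = SAXWriter(out)
    writer.start_doc()
    writer.start_tag("data")
    writer.tag("age", body="42")
    writer.end_tag("data")
    writer.end_doc()
    print(out.getvalue())
```

A writer like this does many small string operations per row, which is the kind of pure python work where a tracing JIT tends to do well.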

my pypy benchmarks

https://bitbucket.org/hexdump42/pypy-benchmarks

benchmark       cpython 2.7.3   pypy-jit 1.9             pypy-jit 2.0.2
bm_csv2xml      88.26/94.04     28.89 (3.0549x faster)   23.86 (3.7728x faster)
bm_csv          1.54/1.65       5.89 (3.8122x slower)    1.72 (0.9825x slower)
bm_openpyxl     1.31/1.21       3.26 (2.4871x slower)    3.15 (2.6051x slower)
bm_xhtml2pdf    1.91/1.95       3.27 (1.7155x slower)    4.22 (2.1637x slower)

average execution time (in seconds)

my pypy benchmarks

https://bitbucket.org/hexdump42/pypy-benchmarks

benchmark       cpython 2.7.3   pypy-jit 1.9              pypy-jit 2.0.2
bm_interp       5412/5248       12556 (2.32x larger)      21880 (4.1692x larger)
bm_csv2xml      7048/7064       55180 (7.8292x larger)    55232 (7.8188x larger)
bm_csv          5812/5180       52200 (8.9814x larger)    52176 (10.0726x larger)
bm_openpyxl     12656/12656     77252 (6.1040x larger)    80428 (6.3549x larger)
bm_xhtml2pdf    48880/34884     236792 (4.8444x larger)   101376 (2.906x larger)

max memory use
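The slides do not say how max memory use was measured. One way to get a comparable figure on Unix is the standard library `resource` module, which reports the peak resident set size of the current process. A minimal sketch under that assumption; `peak_memory` is a name chosen for this example:

```python
import resource
import sys


def peak_memory():
    """Peak resident set size of this process so far, as reported by the OS."""
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


if __name__ == "__main__":
    before = peak_memory()
    data = [str(i) for i in range(1000000)]  # stand-in workload
    after = peak_memory()
    unit = "bytes" if sys.platform == "darwin" else "kB"
    print("peak rss: %d -> %d %s" % (before, after, unit))
```

Because the figure is a peak for the whole process, it includes the interpreter itself, which is part of why a JIT-enabled interpreter starts from a higher baseline.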

what is the pypy jit doing?

https://bitbucket.org/pypy/jitviewer/

modified csv pypy benchmarks

https://bitbucket.org/hexdump42/pypy-benchmarks

benchmark        cpython 2.7.3   pypy-jit 1.9             pypy-jit 2.0.2
bm_csv2xml_mod   88.25/90.02     23.65 (3.7315x faster)   21.76 (4.0556x faster)
bm_csv_mod       1.62/1.69       1.89 (0.8571x slower)    1.68 (0.9643x slower)

average execution time (in seconds)
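The ratios in this table are consistent with dividing the first cpython figure by the pypy figure, so a value above 1 means pypy was faster and a value below 1 means it was slower. That reading is inferred from the numbers rather than stated on the slide. A small sketch of the arithmetic, with `ratio_vs_cpython` as a made-up name:

```python
def ratio_vs_cpython(cpython_seconds, pypy_seconds):
    """CPython time divided by PyPy time; above 1 means PyPy was faster."""
    return cpython_seconds / pypy_seconds


if __name__ == "__main__":
    # (benchmark, cpython 2.7.3, pypy-jit 1.9, pypy-jit 2.0.2)
    # Only the first of the two cpython figures is used (assumption).
    rows = [
        ("bm_csv2xml_mod", 88.25, 23.65, 21.76),
        ("bm_csv_mod", 1.62, 1.89, 1.68),
    ]
    for name, cpython, pypy19, pypy202 in rows:
        print("%-15s %.4f %.4f" % (name,
                                   ratio_vs_cpython(cpython, pypy19),
                                   ratio_vs_cpython(cpython, pypy202)))
```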

is pypy ready for production

1. it runs

2. it satisfies the project requirements

3. its design was well thought out

4. it's stable

5. it's maintainable

6. it's scalable

7. it's documented

8. it works with the python modules we use

9. it can be as fast or faster than cpython

some other reasons to consider pypy

cffi – C foreign function interface for python

- http://cffi.readthedocs.org/

pypy version of numpy

py3k version of pypy work-in-progress

check out the STM/AME project

- https://speakerdeck.com/pyconslides/pypy-python-without-the-gil-by-armin-rigo-and-maciej-fijalkowski

You can help

http://www.pypy.org/howtohelp.html

now for something different

cffi better than ctypes?

contact details

Mark Rees

mark at censof dot com

+Mark Rees

@hexdump42

hex-dump.blogspot.com

http://www.slideshare.net/hexdump42/pypy-isitreadyforproductionthesequel

http://goo.gl/8IPuX