33
pypy-logo PyPy - How to not write Virtual Machines for Dynamic Languages Armin Rigo Institut für Informatik Heinrich-Heine-Universität Düsseldorf ESUG 2007 Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

PyPy - How to not write Virtual Machines forDynamic Languages

Armin Rigo

Institut für InformatikHeinrich-Heine-Universität Düsseldorf

ESUG 2007

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 2: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Scope

This talk is about:

implementing dynamic languages(with a focus on complicated ones)in a context of limited resources(academic, open source, or domain-specific)

Complicated = requiring a large VM

Smalltalk (etc...): typically small core VMPython (etc...): the VM contains quite a lot

Limited resourcesOnly near-complete implementations are really usefulMinimize implementer’s duplication of efforts

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 3: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Scope

This talk is about:

implementing dynamic languages(with a focus on complicated ones)in a context of limited resources(academic, open source, or domain-specific)

Complicated = requiring a large VM

Smalltalk (etc...): typically small core VMPython (etc...): the VM contains quite a lot

Limited resourcesOnly near-complete implementations are really usefulMinimize implementer’s duplication of efforts

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 4: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Scope

This talk is about:

implementing dynamic languages(with a focus on complicated ones)in a context of limited resources(academic, open source, or domain-specific)

Complicated = requiring a large VM

Smalltalk (etc...): typically small core VMPython (etc...): the VM contains quite a lot

Limited resourcesOnly near-complete implementations are really usefulMinimize implementer’s duplication of efforts

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 5: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Our point

Our point:

Do not write virtual machines “by hand”Instead, write interpreters in high-level languagesMeta-programming is your friend

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 6: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Common Approaches to VM construction

Using C directly (or C disguised as another language)

CPythonRubySpidermonkey (Mozilla’s JavaScript VM)but also: Squeak, Scheme48

Building on top of a general-purpose OO VMJython, IronPythonJRuby, IronRuby

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 7: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementing VMs in C

When writing a VM in C it is hard to reconcile:flexibility, maintainabilitysimplicity of the VMperformance (needs dynamic compilation techniques)

Python CaseCPython is a very simple bytecode VM, performance notgreatPsyco is a just-in-time-specializer, very complex, hard tomaintain, but good performanceStackless is a fork of CPython adding microthreads. It wasnever incorporated into CPython for complexity reasons

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 8: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementing VMs in C

When writing a VM in C it is hard to reconcile:flexibility, maintainabilitysimplicity of the VMperformance (needs dynamic compilation techniques)

Python CaseCPython is a very simple bytecode VM, performance notgreatPsyco is a just-in-time-specializer, very complex, hard tomaintain, but good performanceStackless is a fork of CPython adding microthreads. It wasnever incorporated into CPython for complexity reasons

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 9: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Compilers are a bad encoding of Semantics

to reach good performance levels, dynamic compilation isoften neededa dynamic compiler needs to encode language semanticsthis encoding is often obscure and hard to change

Python CasePsyco is a dynamic compiler for Pythonsynchronizing with CPython’s rapid development is a lot ofeffortmany of CPython’s new features not supported well

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 10: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Compilers are a bad encoding of Semantics

to reach good performance levels, dynamic compilation isoften neededa dynamic compiler needs to encode language semanticsthis encoding is often obscure and hard to change

Python CasePsyco is a dynamic compiler for Pythonsynchronizing with CPython’s rapid development is a lot ofeffortmany of CPython’s new features not supported well

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 11: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Fixing of Early Design Decisions

when starting a VM in C, many design decisions need tobe made upfrontexamples: memory management technique, threadingmodelthe decision is manifested throughout the VM sourcevery hard to change later

Python CaseCPython uses reference counting, increfs and decrefseverywhereCPython uses OS threads with one global lock, hard tochange to lightweight threads or finer locking

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 12: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Fixing of Early Design Decisions

when starting a VM in C, many design decisions need tobe made upfrontexamples: memory management technique, threadingmodelthe decision is manifested throughout the VM sourcevery hard to change later

Python CaseCPython uses reference counting, increfs and decrefseverywhereCPython uses OS threads with one global lock, hard tochange to lightweight threads or finer locking

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 13: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementation Proliferation

restrictions of the original implementation lead tore-implementations, forksall implementations need to be synchronized withlanguage evolutionlots of duplicate effort

Python Caseseveral serious implementations: CPython, Stackless,Psyco, Jython, IronPython, PyPythe implementations have various grades of compliance

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 14: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementation Proliferation

restrictions of the original implementation lead tore-implementations, forksall implementations need to be synchronized withlanguage evolutionlots of duplicate effort

Python Caseseveral serious implementations: CPython, Stackless,Psyco, Jython, IronPython, PyPythe implementations have various grades of compliance

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 15: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementing Languages on Top of General-PurposeOO VMs

users wish to have easy interoperation with thegeneral-purpose OO VMs used by the industry (JVM, CLR)therefore re-implementations of the language on the OOVMs are startedeven more implementation proliferationimplementing on top of an OO VM has its own set ofproblems

Python CaseJython is a Python-to-Java-bytecode compilerIronPython is a Python-to-CLR-bytecode compilerboth are slightly incompatible with the newest CPythonversion (especially Jython)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 16: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Implementing Languages on Top of General-PurposeOO VMs

users wish to have easy interoperation with thegeneral-purpose OO VMs used by the industry (JVM, CLR)therefore re-implementations of the language on the OOVMs are startedeven more implementation proliferationimplementing on top of an OO VM has its own set ofproblems

Python CaseJython is a Python-to-Java-bytecode compilerIronPython is a Python-to-CLR-bytecode compilerboth are slightly incompatible with the newest CPythonversion (especially Jython)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 17: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Benefits of implementing on top of OO VMs

higher level of implementationthe VM supplies a GC and mostly a JITbetter interoperability than what the C level providessome proponents believe that eventually one single VMshould be enough

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 18: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

The problems of OO VMs

some of the benefits of OO VMs don’t work out in practicemost immediate problem: it can be hard to map conceptsof the dynamic lang to the host OO VMperformance is often not improved, and can be very bad,because of the semantic mismatch between the dynamiclanguage and the host VMpoor interoperability with everything outside the OO VMin practice, one OO VM is not enough

Python CaseJython about 5 times slower than CPythonIronPython is about as fast as CPython (but someintrospection features missing)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 19: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

The problems of OO VMs

some of the benefits of OO VMs don’t work out in practicemost immediate problem: it can be hard to map conceptsof the dynamic lang to the host OO VMperformance is often not improved, and can be very bad,because of the semantic mismatch between the dynamiclanguage and the host VMpoor interoperability with everything outside the OO VMin practice, one OO VM is not enough

Python CaseJython about 5 times slower than CPythonIronPython is about as fast as CPython (but someintrospection features missing)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 20: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

PyPy’s Approach to VM Construction

Goal: achieve flexibility, simplicity and performance together

Approach: auto-generate VMs from high-level descriptionsof the language... using meta-programming techniques and aspectshigh-level description: an interpreter written in a high-levellanguage... which we translate (i.e. compile) to VMs running on topof various targets, like C/Posix, CLR, JVM

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 21: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

PyPy

PyPy = Python interpreter written in RPython + translationtoolchain for RPython

What is RPythonRPython is a subset of Pythonsubset chosen in such a way that type-inference can beperformedstill a high-level language (unlike SLang or Prescheme)...really a subset, can’t give a small example of code thatdoesn’t just look like Python :-)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 22: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

PyPy

PyPy = Python interpreter written in RPython + translationtoolchain for RPython

What is RPythonRPython is a subset of Pythonsubset chosen in such a way that type-inference can beperformedstill a high-level language (unlike SLang or Prescheme)...really a subset, can’t give a small example of code thatdoesn’t just look like Python :-)

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 23: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Auto-generating VMs

high-level source: early design decisions not necessarywe need a custom translation toolchain to compile theinterpreter to a full VMmany aspects of the final VM are orthogonal to theinterpreter source: they are inserted during translationtranslation aspect ∼= monads, with more ad-hoc control

ExamplesGarbage Collection strategyThreading models (e.g. coroutines with CPS...)non-trivial translation aspect: auto-generating a dynamiccompiler from the interpreter

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 24: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Auto-generating VMs

high-level source: early design decisions not necessarywe need a custom translation toolchain to compile theinterpreter to a full VMmany aspects of the final VM are orthogonal to theinterpreter source: they are inserted during translationtranslation aspect ∼= monads, with more ad-hoc control

ExamplesGarbage Collection strategyThreading models (e.g. coroutines with CPS...)non-trivial translation aspect: auto-generating a dynamiccompiler from the interpreter

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 25: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Simplicity:

dynamic languages can be implemented in a high levellanguageseparation of concerns from low-level detailsa potential single-source-fits-all interpreter – lessduplication of effortsruns everywhere with the same semantics – no outdatedimplementations, no ties to any standard platform

PyPyarguably the most readable Python implementation so far

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 26: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Simplicity:

dynamic languages can be implemented in a high levellanguageseparation of concerns from low-level detailsa potential single-source-fits-all interpreter – lessduplication of effortsruns everywhere with the same semantics – no outdatedimplementations, no ties to any standard platform

PyPyarguably the most readable Python implementation so far

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 27: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Flexibility at all levels:

when writing the interpreter (high-level languages rule!)when adapting the translation toolchain as necessaryto break abstraction barriers when necessary

Exampleboxed integer objects, represented as tagged pointersmanual system-level RPython code

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 28: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Flexibility at all levels:

when writing the interpreter (high-level languages rule!)when adapting the translation toolchain as necessaryto break abstraction barriers when necessary

Exampleboxed integer objects, represented as tagged pointersmanual system-level RPython code

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 29: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Performance:

“reasonable” performancecan generate a dynamic compiler from the interpreter(work in progress, 60x faster on very simple Python code)

JIT compiler generator

almost orthogonal from the interpreter source - applicableto many languages, follows language evolution “for free”based on Partial Evaluationbenefits from a high-level interpreter and a tweakabletranslation toolchaingenerating a dynamic compiler is easier than generating astatic one!

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 30: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Good Points of the Approach

Performance:

“reasonable” performancecan generate a dynamic compiler from the interpreter(work in progress, 60x faster on very simple Python code)

JIT compiler generator

almost orthogonal from the interpreter source - applicableto many languages, follows language evolution “for free”based on Partial Evaluationbenefits from a high-level interpreter and a tweakabletranslation toolchaingenerating a dynamic compiler is easier than generating astatic one!

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 31: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Open Issues / Drawbacks / Further Work

writing the translation toolchain in the first place takes lotsof effort (but it can be reused)writing a good GC is still necessary. But: maybe we canreuse existing good GCs (e.g. from the Jikes RVM)?conceptually simple approach but many abstraction layersdynamic compiler generation seems to work, but needsmore efforts. Also: can we layer it on top of the JIT of ageneral purpose OO VM?

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 32: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

Conclusion / Meta-Points

high-level languages are suitable to implement dynamiclanguagesdoing so has many benefitsVMs shouldn’t be written by handPyPy’s concrete approach is not so importantdiversity is goodlet’s write more meta-programming toolchains!

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages

Page 33: PyPy - How to not write Virtual Machines for Dynamic Languages · Meta-programming is your friend Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages. pypy-logo

pypy-logo

For more information

PyPy

http://codespeak.net/pypy/

“Sprints”Main way we develop PyPyThey are programming camps, a few days to one weeklongWe may have one in Bern soon (PyPy+Squeak) and/or inGermany (JIT and other topics)

See alsoGoogle for the full paper corresponding to these slides that wassubmitted at Dyla’2007

Armin Rigo PyPy - How to not write Virtual Machines for Dynamic Languages