Building Interpreters with PyPy

About me

• Computer science bachelor student at TU Berlin

• Programming/Python since ~2008

• Primarily involved with Pocoo projects (Sphinx, Werkzeug, Flask, Babel, …)

PyPy Python Interpreter

• Fast Python implementation

• Just-in-Time compilation

• Proper garbage collection (no reference counting)

• Written in Python

PyPy Translation Toolchain

• Capable of compiling (R)Python

• Garbage collection

• Tracing just-in-time compiler generator

• Software transactional memory?

PyPy-based interpreters

• Topaz (Ruby)

• HippyVM (PHP)

• Pyrolog (Prolog)

• pycket (Racket)

• Various other interpreters (Scheme, JavaScript, Io, Game Boy)

RPython

• Python subset

• Statically typed

• Garbage collected

• Standard library almost entirely unavailable

• Some missing builtins (print, open(), …)

• rpython.rlib

• Exceptions are (sometimes) ignored

• Not really a language, rather a "state"
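
To make "statically typed" concrete: RPython infers one type per variable, so code that is fine in Python can still be rejected by the translation toolchain. A minimal sketch, with made-up function names; both functions run under plain Python, and only the first would be expected to fail translation:

```python
def mixed_types(flag):
    # Valid Python, but `result` is an int on one path and a str on the
    # other, so type inference cannot assign it a single type.
    if flag:
        result = 1
    else:
        result = "one"
    return result


def single_type(flag):
    # `result` is a str on every path, so its type is consistent.
    if flag:
        result = "1"
    else:
        result = "one"
    return result
```
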

Hello RPython

# hello_rpython.py
import os

def entry_point(argv):
    os.write(2, "Hello, RPython!\n")
    return 0

def target(driver, argv):
    return entry_point, None

$ rpython hello_rpython.py
…
$ ./hello_rpython-c
Hello, RPython!

Goal

• BASIC interpreter capable of running Hamurabi

• Bytecode based

• Garbage Collection

• Just-In-Time Compilation

Live play session

Architecture

Source → Parser → AST → Compiler → Bytecode → Virtual Machine

10 PRINT TAB(32);"HAMURABI"
20 PRINT TAB(15);"CREATIVE COMPUTING MORRISTOWN, NEW JERSEY"
30 PRINT:PRINT:PRINT
80 PRINT "TRY YOUR HAND AT GOVERNING ANCIENT SUMERIA"
90 PRINT "FOR A TEN-YEAR TERM OF OFFICE.":PRINT
95 D1=0: P1=0
100 Z=0: P=95:S=2800: H=3000: E=H-S
110 Y=3: A=H/Y: I=5: Q=1
210 D=0
215 PRINT:PRINT:PRINT "HAMURABI: I BEG TO REPORT TO YOU,": Z=Z+1
217 PRINT "IN YEAR";Z;",";D;"PEOPLE STARVED,";I;"CAME TO THE CITY,"
218 P=P+I
227 IF Q>0 THEN 230
228 P=INT(P/2)
229 PRINT "A HORRIBLE PLAGUE STRUCK! HALF THE PEOPLE DIED."
230 PRINT "POPULATION IS NOW";P
232 PRINT "THE CITY NOW OWNS ";A;"ACRES."
235 PRINT "YOU HARVESTED";Y;"BUSHELS PER ACRE."
250 PRINT "THE RATS ATE";E;"BUSHELS."
260 PRINT "YOU NOW HAVE ";S;"BUSHELS IN STORE.": PRINT
270 REM *** MORE CODE THAT DID NOT FIT INTO THE SLIDE FOLLOWS

Parser

Source → Parser → Abstract Syntax Tree (AST)

Source → Lexer → Tokens → Parser → AST

RPLY

• Based on PLY, which is based on Lex and Yacc

• Lexer generator

• LALR parser generator

Lexer

from rply import LexerGenerator

lg = LexerGenerator()

lg.add("NUMBER", "[0-9]+")
# …
lg.ignore(" +")  # whitespace

lexer = lg.build().lex

lg.add('NUMBER', r'[0-9]*\.[0-9]+')
lg.add('PRINT', r'PRINT')
lg.add('IF', r'IF')
lg.add('THEN', r'THEN')
lg.add('GOSUB', r'GOSUB')
lg.add('GOTO', r'GOTO')
lg.add('INPUT', r'INPUT')
lg.add('REM', r'REM')
lg.add('RETURN', r'RETURN')
lg.add('END', r'END')
lg.add('FOR', r'FOR')
lg.add('TO', r'TO')
lg.add('NEXT', r'NEXT')
lg.add('NAME', r'[A-Z][A-Z0-9$]*')
lg.add('(', r'\(')
lg.add(')', r'\)')
lg.add(';', r';')
lg.add('STRING', r'"[^"]*"')

lg.add(':', r'\r?\n')
lg.add(':', r':')
lg.add('=', r'=')
lg.add('<>', r'<>')
lg.add('-', r'-')
lg.add('/', r'/')
lg.add('+', r'\+')
lg.add('>=', r'>=')
lg.add('>', r'>')
lg.add('***', r'\*\*\*.*')
lg.add('*', r'\*')
lg.add('<=', r'<=')
lg.add('<', r'<')

>>> from basic.lexer import lex
>>> source = open("hello.bas").read()
>>> for token in lex(source):
...     print token
Token("NUMBER", "10")
Token("PRINT", "PRINT")
Token("STRING", '"HELLO BASIC!"')
Token(":", "\n")
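
The rule-list approach can be imitated with nothing but the standard `re` module. The following is a sketch, not RPLY's implementation: it assumes rules are tried in order and the first match wins, it only carries a hypothetical subset of the rules, and it yields plain `(name, text)` tuples instead of `Token` objects.

```python
import re

# (token name, regular expression) pairs, tried in order; first match wins.
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("PRINT", r"PRINT"),
    ("GOTO", r"GOTO"),
    ("NAME", r"[A-Z][A-Z0-9$]*"),
    ("STRING", r'"[^"]*"'),
    (":", r"\r?\n|:"),
    ("=", r"="),
    ("+", r"\+"),
]
IGNORE = re.compile(r" +")
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def lex(source):
    """Yield (token_name, text) pairs for `source`."""
    pos = 0
    while pos < len(source):
        skipped = IGNORE.match(source, pos)
        if skipped:
            pos = skipped.end()
            continue
        for name, regex in COMPILED:
            match = regex.match(source, pos)
            if match:
                yield name, match.group()
                pos = match.end()
                break
        else:
            raise ValueError("unexpected character at position %d" % pos)


tokens = list(lex('10 PRINT "HELLO BASIC!"\n'))
```

Under the first-match assumption, rule order matters: a keyword such as `PRINT` has to be listed before the more general `NAME` rule, otherwise it would be lexed as a variable name.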

Grammar

• A set of formal rules that defines the syntax

• terminals = tokens

• nonterminals = rules defining a sequence of one or more (non)terminals

program :
program : line
program : line program

line : NUMBER statements

statements : statement
statements : statement statements

statement : PRINT :
statement : PRINT expressions :
expressions : expression
expressions : expression ;
expressions : expression ; expressions

statement : NAME = expression :

statement : IF expression THEN number :

statement : INPUT name :

statement : GOTO NUMBER :
statement : GOSUB NUMBER :
statement : RETURN :

statement : REM *** :

statement : FOR NAME = NUMBER TO NUMBER :
statement : NEXT NAME :

statement : END :

expression : NUMBER
expression : NAME
expression : STRING
expression : operation
expression : ( expression )
expression : NAME ( expression )

operation : expression + expression
operation : expression - expression
operation : expression * expression
operation : expression / expression
operation : expression <= expression
operation : expression < expression
operation : expression = expression
operation : expression <> expression
operation : expression > expression
operation : expression >= expression

AST

from rply.token import BaseBox

class Program(BaseBox):
    def __init__(self, lines):
        self.lines = lines

class Line(BaseBox):
    def __init__(self, lineno, statements):
        self.lineno = lineno
        self.statements = statements

class Statements(BaseBox):
    def __init__(self, statements):
        self.statements = statements

class Print(BaseBox):
    def __init__(self, expressions, newline=True):
        self.expressions = expressions
        self.newline = newline

Parser

from rply import ParserGenerator

pg = ParserGenerator(["NUMBER", "PRINT", …])

@pg.production("program : ")
@pg.production("program : line")
@pg.production("program : line program")
def program(p):
    if len(p) == 2:
        return Program([p[0]] + p[1].get_lines())
    return Program(p)

@pg.production("line : number statements")
def line(p):
    return Line(p[0], p[1].get_statements())

@pg.production("op : expression + expression")
@pg.production("op : expression * expression")
def op(p):
    if p[1].gettokentype() == "+":
        return Add(p[0], p[2])
    elif p[1].gettokentype() == "*":
        return Mul(p[0], p[2])

pg = ParserGenerator([…], precedence=[
    ("left", ["+", "-"]),
    ("left", ["*", "/"])
])

parse = pg.build().parse
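
What the precedence list buys can be shown without a parser generator. The sketch below swaps in a hand-written precedence-climbing parser (not the LALR technique RPLY uses) and builds nested tuples instead of AST classes; `PRECEDENCE`, `tokenize`, `parse_expression` and `parse_atom` are illustrative names only.

```python
import re

# Binding power per operator; a higher number binds tighter. "+" and "-"
# come first in the precedence list, so they get the lowest level.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def tokenize(text):
    return re.findall(r"[0-9]+|[A-Z]+|[-+*/()]", text)


def parse_expression(tokens, min_precedence=1):
    """Consume tokens from the front of the list; return a nested tuple."""
    left = parse_atom(tokens)
    while (tokens and tokens[0] in PRECEDENCE
           and PRECEDENCE[tokens[0]] >= min_precedence):
        operator = tokens.pop(0)
        # Left associativity: the right operand may only contain operators
        # that bind strictly tighter than the current one.
        right = parse_expression(tokens, PRECEDENCE[operator] + 1)
        left = (operator, left, right)
    return left


def parse_atom(tokens):
    token = tokens.pop(0)
    if token == "(":
        inner = parse_expression(tokens)
        tokens.pop(0)  # drop the closing ")"
        return inner
    return token


tree = parse_expression(tokenize("A + 2 * 3 - B"))
# tree == ("-", ("+", "A", ("*", "2", "3")), "B")
```

The multiplication ends up deepest in the tree because it binds tighter, and the two additive operators group from the left.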

Compiler/Virtual Machine

AST → Compiler → Bytecode → Virtual Machine

class VM(object):
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.frames = []
        self.iterators = {}
        self.stack = []
        self.variables = {}

class VM(object):
    …
    def execute(self):
        while self.pc < len(self.program.instructions):
            self.execute_bytecode(self.program.instructions[self.pc])

class VM(object):
    …
    def execute_bytecode(self, code):
        raise NotImplementedError(code)

class VM(object):
    ...
    def execute_bytecode(self, code):
        if isinstance(code, TYPE):
            self.execute_TYPE(code)
        ...
        else:
            raise NotImplementedError(code)
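
Put together, the dispatch loop above can be exercised with just two instruction types. This is a minimal sketch: `MiniVM` is a hypothetical name, it pushes plain floats rather than wrapped objects, and it advances the program counter in one place instead of in each handler.

```python
class Instruction(object):
    pass


class Number(Instruction):
    def __init__(self, value):
        self.value = value


class Add(Instruction):
    pass


class MiniVM(object):
    def __init__(self, instructions):
        self.instructions = instructions
        self.pc = 0
        self.stack = []

    def execute(self):
        while self.pc < len(self.instructions):
            self.execute_bytecode(self.instructions[self.pc])

    def execute_bytecode(self, code):
        # One isinstance check per instruction type.
        if isinstance(code, Number):
            self.stack.append(code.value)
        elif isinstance(code, Add):
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(left + right)
        else:
            raise NotImplementedError(code)
        self.pc += 1


vm = MiniVM([Number(1.0), Number(2.0), Add()])
vm.execute()
# vm.stack is now [3.0]
```
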

Bytecode

class Program(object):
    def __init__(self):
        self.instructions = []

class Instruction(object):
    pass

class Number(Instruction):
    def __init__(self, value):
        self.value = value

class String(Instruction):
    def __init__(self, value):
        self.value = value

class Print(Instruction):
    def __init__(self, expressions, newline):
        self.expressions = expressions
        self.newline = newline

class Call(Instruction):
    def __init__(self, function_name):
        self.function_name = function_name

class Let(Instruction):
    def __init__(self, name):
        self.name = name

class Lookup(Instruction):
    def __init__(self, name):
        self.name = name

class Add(Instruction):
    pass

class Sub(Instruction):
    pass

class Mul(Instruction):
    pass

class Equal(Instruction):
    pass

...

class GotoIfTrue(Instruction):
    def __init__(self, target):
        self.target = target

class Goto(Instruction):
    def __init__(self, target, with_frame=False):
        self.target = target
        self.with_frame = with_frame

class Return(Instruction):
    pass

class Input(Instruction):
    def __init__(self, name):
        self.name = name

class For(Instruction):
    def __init__(self, variable):
        self.variable = variable

class Next(Instruction):
    def __init__(self, variable):
        self.variable = variable

class Program(object):
    def __init__(self):
        self.instructions = []
        self.lineno2instruction = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is None:
            for i, instruction in enumerate(self.instructions):
                instruction.finalize(self, i)

# finalize() on an instruction that carries a jump target
def finalize(self, program, index):
    self.target = program.lineno2instruction[self.target]
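
The reason for a separate finalize step is that a jump can point at a line that has not been compiled yet. The following sketch illustrates the two-pass idea with a simplified `finalize` signature (it takes only the mapping, unlike the method above); `Nop` and the three-line program are hypothetical.

```python
class Goto(object):
    def __init__(self, target):
        self.target = target  # a BASIC line number until finalized

    def finalize(self, lineno2instruction):
        self.target = lineno2instruction[self.target]


class Nop(object):
    def finalize(self, lineno2instruction):
        pass  # nothing to resolve


# Hypothetical program: line 10 jumps forward to line 30.
instructions = []
lineno2instruction = {}

lineno2instruction[10] = len(instructions)
instructions.append(Goto(30))  # line 30 has not been compiled yet
lineno2instruction[20] = len(instructions)
instructions.append(Nop())
instructions.append(Nop())     # line 20 holds two statements
lineno2instruction[30] = len(instructions)
instructions.append(Nop())

# Second pass: every line is known now, so targets can be resolved.
for instruction in instructions:
    instruction.finalize(lineno2instruction)
# instructions[0].target is now the instruction index 3
```
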

class Program(BaseBox):
    …
    def compile(self):
        with bytecode.Program() as program:
            for line in self.lines:
                line.compile(program)
        return program

class Line(BaseBox):
    ...
    def compile(self, program):
        program.lineno2instruction[self.lineno] = len(program.instructions)
        for statement in self.statements:
            statement.compile(program)

class Print(Statement):
    ...
    def compile(self, program):
        for expression in self.expressions:
            expression.compile(program)
        program.instructions.append(
            bytecode.Print(
                len(self.expressions),
                self.newline
            )
        )

class Let(Statement):
    ...
    def compile(self, program):
        self.value.compile(program)
        program.instructions.append(
            bytecode.Let(self.name)
        )

class Input(Statement):
    ...
    def compile(self, program):
        program.instructions.append(
            bytecode.Input(self.variable)
        )

class Goto(Statement):
    ...
    def compile(self, program):
        program.instructions.append(
            bytecode.Goto(self.target)
        )

class Gosub(Statement):
    ...
    def compile(self, program):
        program.instructions.append(
            bytecode.Goto(
                self.target,
                with_frame=True
            )
        )

class Return(Statement):
    ...
    def compile(self, program):
        program.instructions.append(
            bytecode.Return()
        )

class For(Statement):
    ...
    def compile(self, program):
        self.start.compile(program)
        program.instructions.append(
            bytecode.Let(self.variable)
        )
        self.end.compile(program)
        program.instructions.append(
            bytecode.For(self.variable)
        )
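
All of these compile methods follow one pattern: emit the code for the operands first, then the instruction that consumes them from the stack. A minimal sketch of that post-order walk for `Z = Z + 1`, using tuples in place of the AST and bytecode classes; the helper names and opcode strings are made up, and only `+` and `*` are handled.

```python
def compile_expression(node, instructions):
    """Append instructions for `node` in post-order (operands first)."""
    if isinstance(node, tuple):
        operator, left, right = node
        compile_expression(left, instructions)
        compile_expression(right, instructions)
        instructions.append(("ADD",) if operator == "+" else ("MUL",))
    elif isinstance(node, float):
        instructions.append(("NUMBER", node))
    else:
        instructions.append(("LOOKUP", node))


def compile_let(name, expression, instructions):
    # Value first, then the store, like the Let statement above.
    compile_expression(expression, instructions)
    instructions.append(("LET", name))


code = []
compile_let("Z", ("+", "Z", 1.0), code)
# code == [("LOOKUP", "Z"), ("NUMBER", 1.0), ("ADD",), ("LET", "Z")]
```
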

class WrappedObject(object):
    pass

class WrappedString(WrappedObject):
    def __init__(self, value):
        self.value = value

class WrappedFloat(WrappedObject):
    def __init__(self, value):
        self.value = value

class VM(object):
    …
    def execute_number(self, code):
        self.stack.append(WrappedFloat(code.value))
        self.pc += 1

    def execute_string(self, code):
        self.stack.append(WrappedString(code.value))
        self.pc += 1

class VM(object):
    …
    def execute_call(self, code):
        argument = self.stack.pop()
        if code.function_name == "TAB":
            # unwrap the float before using it as a repeat count
            self.stack.append(WrappedString(" " * int(argument.value)))
        elif code.function_name == "RND":
            self.stack.append(WrappedFloat(random.random()))
        ...
        self.pc += 1

class VM(object):
    …
    def execute_let(self, code):
        value = self.stack.pop()
        self.variables[code.name] = value
        self.pc += 1

    def execute_lookup(self, code):
        value = self.variables[code.name]
        self.stack.append(value)
        self.pc += 1

class VM(object):
    …
    def execute_add(self, code):
        right = self.stack.pop()
        left = self.stack.pop()
        # operands are wrapped, so add the underlying values
        self.stack.append(WrappedFloat(left.value + right.value))
        self.pc += 1

class VM(object):
    …
    def execute_goto_if_true(self, code):
        condition = self.stack.pop()
        if condition:
            self.pc = code.target
        else:
            self.pc += 1

class VM(object):
    …
    def execute_goto(self, code):
        if code.with_frame:
            self.frames.append(self.pc + 1)
        self.pc = code.target

class VM(object):
    …
    def execute_return(self, code):
        self.pc = self.frames.pop()
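
GOSUB and RETURN only need a stack of return addresses. The sketch below traces the control flow on a hypothetical five-instruction program; the `(opcode, argument)` tuples and the `run` helper are illustrative stand-ins for the instruction classes and the VM.

```python
def run(instructions):
    """Run a list of (opcode, argument) pairs; return the visited indices."""
    pc = 0
    frames = []   # return addresses pushed by GOSUB
    visited = []
    while pc < len(instructions):
        opcode, argument = instructions[pc]
        visited.append(pc)
        if opcode == "GOSUB":
            frames.append(pc + 1)  # resume after the call
            pc = argument
        elif opcode == "GOTO":
            pc = argument
        elif opcode == "RETURN":
            pc = frames.pop()
        elif opcode == "END":
            break
        else:  # "NOP"
            pc += 1
    return visited


program = [
    ("GOSUB", 3),      # 0: call the subroutine at index 3
    ("NOP", None),     # 1: execution resumes here
    ("END", None),     # 2
    ("NOP", None),     # 3: subroutine body
    ("RETURN", None),  # 4
]
order = run(program)
# order == [0, 3, 4, 1, 2]
```
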

class VM(object):
    …
    def execute_input(self, code):
        value = WrappedFloat(float(raw_input() or "0.0"))
        self.variables[code.name] = value
        self.pc += 1

class VM(object):
    …
    def execute_for(self, code):
        self.pc += 1
        self.iterators[code.variable] = (
            self.pc,
            self.stack.pop()
        )

class VM(object):
    …
    def execute_next(self, code):
        loop_begin, end = self.iterators[code.variable]
        current_value = self.variables[code.variable].value
        next_value = current_value + 1.0
        # `end` was popped off the stack, so it is still wrapped
        if next_value <= end.value:
            self.variables[code.variable] = \
                WrappedFloat(next_value)
            self.pc = loop_begin
        else:
            del self.iterators[code.variable]
            self.pc += 1
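
The interplay of FOR and NEXT can be simulated in isolation. This is a sketch with plain floats and a hypothetical `run_for_loop` helper; the jump back to `loop_begin` is implicit in the `while` loop.

```python
def run_for_loop(start, end):
    """Simulate `FOR I = start TO end ... NEXT I`; return the values of I."""
    variables = {}
    iterators = {}
    seen = []

    # FOR: the start value is stored first, then the position of the loop
    # body and the end value are remembered.
    variables["I"] = float(start)
    body_pc = 1  # hypothetical index of the first body instruction
    iterators["I"] = (body_pc, float(end))

    while True:
        seen.append(variables["I"])  # the loop body
        # NEXT: increment, then either jump back or leave the loop.
        loop_begin, limit = iterators["I"]
        next_value = variables["I"] + 1.0
        if next_value <= limit:
            variables["I"] = next_value
        else:
            del iterators["I"]
            break
    return seen


values = run_for_loop(2, 4)
# values == [2.0, 3.0, 4.0]
```

Because the end condition is only checked in NEXT, the body runs at least once in this scheme, even when the start value is already past the end.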

Entry Point

def entry_point(argv):
    try:
        filename = argv[1]
    except IndexError:
        print("You must supply a filename")
        return 1
    content = read_file(filename)
    tokens = lex(content)
    ast = parse(tokens)
    program = ast.compile()
    vm = VM(program)
    vm.execute()
    return 0

JIT (in PyPy)

1. Identify "hot" loops

2. Create trace inserting guards based on observed values

3. Optimize trace

4. Compile trace

5. Execute machine code instead of interpreter
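
Step 1 can be pictured as counting how often execution jumps back to the same position. The sketch below is a toy illustration of that idea only; it is not how `rpython.rlib.jit` implements it, and the threshold and the `find_hot_loops` helper are hypothetical.

```python
HOT_THRESHOLD = 3  # hypothetical value chosen for the example


def find_hot_loops(pc_trace):
    """Return the set of loop headers that became hot.

    `pc_trace` is the sequence of program counters an interpreter visited.
    A jump to a lower (or equal) pc is treated as closing a loop at that
    target.
    """
    counters = {}
    hot = set()
    previous = None
    for pc in pc_trace:
        if previous is not None and pc <= previous:
            counters[pc] = counters.get(pc, 0) + 1
            if counters[pc] >= HOT_THRESHOLD:
                hot.add(pc)
        previous = pc
    return hot


# A loop over pcs 2..4 that runs four times, then falls through to 5.
trace = [0, 1] + [2, 3, 4] * 4 + [5]
hot_loops = find_hot_loops(trace)
# hot_loops == {2}
```
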

from rpython.rlib.jit import JitDriver

jitdriver = JitDriver(
    greens=["pc", "vm", "program", "frames", "iterators"],
    reds=["stack", "variables"]
)

class VM(object):
    …
    def execute(self):
        while self.pc < len(self.program.instructions):
            jitdriver.jit_merge_point(
                vm=self, pc=self.pc, …
            )

Benchmark

10 N = 1
20 IF N <= 10000 THEN 40
30 END
40 GOSUB 100
50 IF R = 0 THEN 70
60 PRINT "PRIME"; N
70 N = N + 1: GOTO 20
100 REM *** ISPRIME N -> R
110 IF N <= 2 THEN 170
120 FOR I = 2 TO (N - 1)
130 A = N: B = I: GOSUB 200
140 IF R <> 0 THEN 160
150 R = 0: RETURN
160 NEXT I
170 R = 1: RETURN
200 REM *** MOD A -> B -> R
210 R = A - (B * INT(A / B))
220 RETURN

cbmbasic 58.22s

basic-c 5.06s

basic-c-jit 2.34s

Python implementation (CPython) 2.83s

Python implementation (PyPy) 0.11s

C implementation 0.03s
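
For readers who do not speak BASIC, the benchmark program is a trial-division prime search. The following is a hypothetical Python transliteration written for illustration; it is not the code behind the "Python implementation" timings above.

```python
def is_prime(n):
    # Lines 100-170: N <= 2 is reported as prime, exactly as in the BASIC
    # code (so 1 counts as well).
    if n <= 2:
        return True
    for i in range(2, n):
        # Lines 200-220: R = A - (B * INT(A / B)) is A modulo B.
        if n - (i * (n // i)) == 0:
            return False
    return True


def primes_up_to(limit):
    # Lines 10-70: test every N from 1 to the limit.
    return [n for n in range(1, limit + 1) if is_prime(n)]


small = primes_up_to(20)
# small == [1, 2, 3, 5, 7, 11, 13, 17, 19]
```
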

Questions?

These slides are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License