Thnad's Revenge


At a previous JRubyConf, we talked about Thnad, a fictional programming language. Thnad served as a vehicle to explore the joy of building a compiler using JRuby, BiteScript, Parslet, and other tools. Now, Thnad is back with a second runtime: Rubinius. Come see the Rubinius environment through JRuby eyes. Together, we'll see how to grapple with multiple instruction sets and juggle contexts without going cross-eyed.

Text of Thnad's Revenge

  • 1. Welcome to Thnad's Revenge, a programming language implementation tale in three acts.

2. Not to be confused with...

3. Revenge, the awesome Atari video game from the 80s.

4. Cucumber Recipes: Ian Dees with Aslak Hellesøy and Matt Wynne. pragprog/titles/JRUBY, discount code: JRubyIanDees
Before we get to the talk, let me make a couple of quick announcements. First, we're updating the JRuby book this summer with a JRuby 1.7-ready PDF. To celebrate that, we're offering a discount code on the book during the conference. Second, I'm working on a new book with the Cucumber folks, which has some JRuby/JVM stuff in it; if you'd like to be a tech reviewer, please find me after this talk.

5. I. Meet Thnad; II. Enter the Frenemy; III. Thnad's Revenge (with apologies to Ira Glass)
Act I, Meet Thnad, in which we encounter Thnad, a programming language built with JRuby and designed not for programmer happiness, but for implementer happiness. Act II, Enter the Frenemy, in which we meet a new Ruby runtime. Act III, Thnad's Revenge, in which we port Thnad to run on the Rubinius runtime and encounter some surprises along the way.

6. I. Meet Thnad
Thnad is a programming language I created last summer as an excuse to learn some fun JRuby tools and see what it's like to write a compiler.

7. The name comes from a letter invented by Dr. Seuss in his book, On Beyond Zebra. Since most of the real letters are already taken by programming languages, a fictional one seems appropriate.

8. A Fictional Programming Language Optimized for Implementer Happiness
Just as Ruby is optimized for programmer happiness, Thnad is optimized for implementer happiness. It was designed to be implemented with a minimum of time and effort, and a maximum amount of fun.

9.
    function factorial(n) {
      if (eq(n, 1)) {
        1
      } else {
        times(n, factorial(minus(n, 1)))
      }
    }

    print(factorial(4))
Here's a sample Thnad program demonstrating all the major features. Thnad has integers, functions, conditionals, and... not much else.
These minimal features were easy to add, thanks to the great tools available in the JRuby ecosystem (and other ecosystems, as we'll see).

10. Thnad Features: 1. Names and Numbers, 2. Function Calls, 3. Conditionals, 4. Function Definitions
In the next few minutes, we're going to trace through each of these four language features, from parsing the source all the way to generating the final binary. We won't show every single grammar rule, but we will hit the high points.

11. As Tom mentioned in his talk, there are a number of phases a piece of source code goes through during compilation.

12. Stages of Parsing: tokenize, parse, transform, emit
These break down into four main stages in a typical language: finding the tokens or parts of speech of the text, parsing the tokens into an in-memory tree, transforming the tree, and generating the bytecode. We're going to look at each of Thnad's major features in the context of these stages.

13. 1. Names and Numbers
First, let's look at the easiest language feature: numbers and function parameters.

14. [Diagram: the source text 42 parses into a tree (root, then :number, then "42"), shown as {:number => 42}]
Our parser needs to transform this input text into some kind of Ruby data structure.

15. I used a library called Parslet for that. Parslet handles the first two stages of compilation (tokenizing and parsing) using a Parsing Expression Grammar, or PEG. PEGs are like regular expressions attached to blocks of code. They sound like a hack, but there's solid compiler theory behind them.

16. [Diagram repeated from slide 14]
    rule(:number) { match('[0-9]').repeat(1).as(:number) >> space? }
The rule at the bottom of the page is Parslet's notation for matching one or more digits followed by an optional space.

17. [Diagram: the :number entry holding "42" becomes a node whose :value is 42]
    rule(:number => simple(:value)) { }
Now for the third stage, transformation. We could generate the bytecode straight from the original tree, using a bunch of hard-to-test case statements. But it would be nicer to have a specific Ruby class for each Thnad language feature.
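As a rough illustration of the tokenize, parse, and transform stages for numbers, here is a plain-Ruby sketch that does not use Parslet at all. A regular expression stands in for the PEG rule, and the names parse_number, transform, and NumberNode are hypothetical, chosen only for this example.

```ruby
# Hypothetical stand-in for the purpose-built node class (not Thnad's own code).
NumberNode = Struct.new(:value)

# Tokenize + parse: accept one or more digits followed by optional spaces,
# and return a Hash shaped like the one on the slide.
# Assumption: a plain Regexp replaces the Parslet rule here.
def parse_number(source)
  match = /\A([0-9]+) *\z/.match(source)
  raise ArgumentError, "not a number: #{source.inspect}" unless match
  { number: match[1] }
end

# Transform: turn the generic Hash into a dedicated node object.
def transform(tree)
  raise ArgumentError, "unknown node: #{tree.inspect}" unless tree.key?(:number)
  NumberNode.new(Integer(tree[:number], 10))
end

tree = parse_number("42 ")
node = transform(tree)
puts node.value
```

Keeping the parse step and the transform step separate means each one can be tested on its own, which is the same motivation given above for avoiding one big case statement.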
The rule at the bottom of this slide tells Parslet to transform a Hash with a key called :number into an instance of a Number class we provide.

18. BiteScript: github/headius/bitescript
The final stage, outputting bytecode, is handled by the BiteScript library, which is basically a domain-specific language for emitting JVM opcodes.

19.
    main do
      ldc 42
      ldc 1
      invokestatic :Example, :baz, [int, int, int]
      returnvoid
    end
Here's an example, just to get an idea of the flavor. To call a method, you just push the arguments onto the stack and then call a specific opcode, in this case invokestatic. The VM you're writing for is aware of classes, interfaces, and so on; you don't have to implement method lookup like you would with plain machine code.

20. JVM Bytecode for Dummies, Charles Nutter, Øredev 2010. slideshare/CharlesNutter/redev-2010-jvm-bytecode-for-dummies
When I first saw BiteScript, I thought it was something you'd only need if you were doing deep JVM hacking. But when I read the slides from Charlie's presentation at Øredev, it clicked. This library takes me way back to my college days, when we'd write assembler programs for a really simple instruction set like MIPS. BiteScript evokes that same kind of feeling. I'd always thought the JVM would have a huge, crufty instruction set, but it's actually quite manageable to keep the most important parts of it in your head.

21.
    class Number < Struct.new :value
      def eval(context, builder)
        builder.ldc value
      end
    end
We can generate the bytecode any way we want. One simple way is to give each of our classes an eval() method that takes a BiteScript generator and calls various methods on it to generate JVM instructions.

22.
    class Name < Struct.new :name
      def eval(context, builder)
        param_names = context[:params] || []
        position = param_names.index(name)
        raise "Unknown parameter #{name}" unless position
        builder.iload position
      end
    end
Dealing with passed-in parameters is nearly as easy as dealing with raw integers; we just look up the parameter's position by name, and then push the nth parameter onto the stack.

23. 2. Function Calls
The next major feature is function calls. Once we have those, we will be able to run a trivial Thnad program.

24. [Diagram: the source text baz(42, foo) parses into a tree (root, then :funcall with :name "baz" and :args holding one :arg with :number "42" and one :arg with :name "foo"), shown as {:funcall => {:name => baz, :args => [{:arg => {:number => 42}}, {:arg => {:name => foo}}]}}]
We're going to move a little faster here, to leave time for Rubinius. Here, we want to transform this source code into this Ruby data structure representing a function call.

25. [Diagram: a Thnad::Funcall node with :name "foo" and :args holding a Thnad::Number whose :value is 42]
Now, we want to transform generic Ruby data structures into purpose-built ones that we can attach bytecode-emitting behavior to.

26.
    class Funcall < Struct.new :name, :args
      def eval(context, builder)
        args.each { |a| a.eval(context, builder) }
        types = [] * (args.length + 1)
        builder.invokestatic builder.class_builder, name, types
      end
    end
The bytecode for a function call is really simple in BiteScript. All functions in Thnad are static methods on a single class.

27. 3. Conditionals
The first two features we've defined are enough to write simple programs like print(42). The next two features will let us add conditionals and custom functions.

28. [Diagram: the source text if (0) { 42 } else { 667 } parses into a tree (root, then :cond with :number "0", :if_true with a :body holding :number "42", and :if_false with a :body holding :number "667"), shown as {:cond => {:number => 0}, :if_true => {:body => {:number => 42}}, :if_false => {:body => {:number => 667}}}]
A conditional consists of the if keyword and a condition in parentheses, followed by a body of code inside braces, then the else keyword, followed by another body of code in braces.
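To see how the eval() methods for Number, Name, and Funcall cooperate, here is a minimal sketch that runs without the JVM. RecordingBuilder is a hypothetical stand-in for a BiteScript method builder: it only records the opcodes it is asked to emit. The :Example owner and the :int type markers are placeholders as well, not values taken from Thnad's own code.

```ruby
# Hypothetical stand-in for a BiteScript builder (assumption: it records
# each requested opcode instead of emitting real JVM bytecode).
class RecordingBuilder
  attr_reader :instructions

  def initialize
    @instructions = []
  end

  def ldc(value)
    @instructions << [:ldc, value]
  end

  def iload(position)
    @instructions << [:iload, position]
  end

  def invokestatic(owner, name, types)
    @instructions << [:invokestatic, owner, name, types]
  end
end

class Number < Struct.new(:value)
  # Push an integer constant onto the stack.
  def eval(context, builder)
    builder.ldc value
  end
end

class Name < Struct.new(:name)
  # Push the parameter found at this name's position in the parameter list.
  def eval(context, builder)
    param_names = context[:params] || []
    position = param_names.index(name)
    raise "Unknown parameter #{name}" unless position
    builder.iload position
  end
end

class Funcall < Struct.new(:name, :args)
  # Emit every argument first, then the static call itself.
  def eval(context, builder)
    args.each { |a| a.eval(context, builder) }
    # One placeholder type for the return value plus one per argument.
    types = [:int] * (args.length + 1)
    builder.invokestatic :Example, name, types
  end
end

call = Funcall.new("baz", [Number.new(42), Name.new("foo")])
builder = RecordingBuilder.new
call.eval({ params: ["foo"] }, builder)
builder.instructions.each { |instruction| p instruction }
```

Because the nodes only ever talk to the builder through a handful of methods, swapping in a recorder like this is also a cheap way to unit-test the emitter.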
29. [Diagram: a conditional node whose :cond, :if_true, and :if_false entries point to Thnad::Number nodes with :value 0, 42, and 667]
Here's the transformed tree representing a set of custom Ruby classes.

30.
    class Conditional < Struct.new :cond, :if_true, :if_false
      def eval(context, builder)
        cond.eval context, builder
        builder.ifeq :else
        if_true.eval context, builder
        builder.goto :endif
        builder.label :else
        if_false.eval context, builder
        builder.label :endif
      end
    end
The bytecode emitter for conditionals has a new twist. The Conditional struct points to three other Thnad nodes. It needs to eval() them at the right time to emit their bytecode in between all the zero checks and gotos.

31. 4. Function Definitions
On to the final piece of Thnad: defining new functions.

32. [Diagram: the source text function foo(x) { 5 } parses into a tree (root, then :func with :name "foo", :params with a :param holding :name "x", and :body with :number "5"), shown as {:func => {:name => foo}, :params => {:param => {:name => x}}, :body => {:number => 5}}]
A function definition looks a lot like a function call, but with a body attached to it.

33. [Diagram: a Thnad::Function node with :name "foo", :params holding a Thnad::Name whose :name is "x", and :body holding a Thnad::Number whose :value is 5]
Here's the transformation we want to perform for this language feature.

34.
    class Function < Struct.new :name, :params, :body
      def eval(context, builder)
        param_names = [params][:params] = param_names
        types = [] * (param_names.count + 1)
        builder.public_static_method(, [], *types) do |method|
          self.body.eval(context, method)
          method.ireturn
        end
      end
    end
Since all Thnad parameters and return types are integers, emitting a function definition is really easy. We count the parameters so that we can give the JVM a correct signature. Then, we just pass a block to the public_static_method helper, a feature of BiteScript that will inspire the Rubinius work later on.

35. Compiler
We've seen how to generate individual chunks of bytecode; how do they all get stitched together into a .class file?

36. builder = do public_class classname, object do |klass| # ... klass.public_static_method main, [], void,