Charles Nutter, Star TechConf 2011: JVM Languages

Building Languages for the JVM

Charles Oliver Nutter

Me

• Charles Oliver Nutter

• JRuby and JVM guy

• “headius” on most services

• headius@headius.com

Who Was I?

• Java EE architect

• Successfully!

• Never wrote a parser

• Never wrote a compiler

• But I wanted to learn how...

You

• Java?

• Ruby?

• Python?

• C#?

• Other?

Why Create Languages?

• Nothing is perfect

• New problems need new solutions

• Language design can be fun

• Fame and fortune?

• Not really.

Why Impl a Language?

• To learn it?

• Sort of...

• To learn the target platform

• Definitely!

• Fame and fortune?

• Well...getting there...

Challenges

• Community

• Platform

• Specifications

• Resources

Community

• Investment in status quo

• Afraid to stand out

• Known quantities

• Everything else sucks

• Gotta get paid!

Platform

• Matching language semantics

• JVM designed around Java

• JVM hides underlying platform

• Challenging to use

• Not bad...C/C++ would be way worse

• Community may hate it ;-)

Specifications

• Incomplete

• Ruby had none for years

• ...and no complete test suites

• Difficult to implement

• Low level features

• Single-implementation quirks

• Hard or impossible to optimize

Resources

• You gotta eat

• Not much money in language work

• Some parts are hard

• OSS is a necessity

Why JVM?

• Because I am lazy

• Because VMs are *hard*

• Because I can’t be awesome at everything

Ok, Why Really?

• Cross-platform

• Libraries

• Languages

• Memory management

• Tools

• OSS

Cross-platform

• OpenJDK: Linux, Windows, Solaris, OS X, xBSD

• J9: Linux, zLinux, AS/400, ...

• HP: OpenVMS, HP/UX, ...

• Dalvik (Android): Linux on ARM, x86

Libraries

• For any need, a dozen libraries

• And a couple of them are good!

• Cross-platform

• Leading edge

Selection of languages

• Java

• Scala

• Clojure

• JRuby

• Mirah

• Jython, Groovy, Fantom, Kotlin, Ceylon, ...

Memory management

• Best GCs in the world

• Fastest object allocation

• Safe escape hatches like NIO

Tools

• Debugging

• Profiling

• Monitoring

• JVM internals

Open source?

• FOSS reference impl (OpenJDK)

• Mostly OSS libraries

• Heavy OSS culture

• Strong OSS influence in OpenJDK core

Case Study: JRuby

Ruby on the JVM

• All of Ruby’s power and beauty

• Solid VM underneath

• “Just another JVM language”

JVM Language

• Full interop with Java

• Tricky to do...

• Very rewarding for 99% case

• VM concerns solved

• No need to write a GC

• No need to write a JIT

• ...oh, but wait...

More than a JVM language

• Use native code where JDK fails us

• Paper over ugly bits like CLASSPATH

• Matching Ruby semantics exactly*

• Push JVM forward too!

Playing with JRuby

• Simple IRB demo

• JRuby on Rails - see Jano’s talk tomorrow

• JRuby performance

• PotC (???)

How did we do it?

JRuby Architecture

• Parser

• Abstract Syntax Tree (AST)

• Intermediate Representation (IR)

• Core classes

• Compiler

Parser

• Port of MRI’s Bison grammar

• “Jay” parser generator for Java

• Hand-written lexer

• Nearly as fast as the C version

• ...once it gets going

system ~/projects/jruby $ jruby -y -e "1 + 1"
push state 0 value null
reduce state 0 uncover 0 rule (1) $$1 :
goto from state 0 to 2
push state 2 value null
lex state 2 reading tIDENTIFIER value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
shift from state 2 to 33
push state 33 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
lex state 33 reading tSTRING_BEG value Token { Value=', Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 33 uncover 2 rule (487) operation : tIDENTIFIER
goto from state 2 to 62
push state 62 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 62 uncover 62 rule (252) $$6 :

You will never need this.

public class RubyYaccLexer {
    public static final Encoding UTF8_ENCODING = UTF8Encoding.INSTANCE;
    public static final Encoding USASCII_ENCODING = USASCIIEncoding.INSTANCE;
    public static final Encoding ASCII8BIT_ENCODING = ASCIIEncoding.INSTANCE;

    private static ByteList END_MARKER = new ByteList(new byte[] {'_', 'E', 'N', 'D', '_', '_'});
    private static ByteList BEGIN_DOC_MARKER = new ByteList(new byte[] {'b', 'e', 'g', 'i', 'n'});
    private static ByteList END_DOC_MARKER = new ByteList(new byte[] {'e', 'n', 'd'});
    private static final HashMap<String, Keyword> map;

    static {
        map = new HashMap<String, Keyword>();

        map.put("end", Keyword.END);
        map.put("else", Keyword.ELSE);
        map.put("case", Keyword.CASE);
        map.put("ensure", Keyword.ENSURE);
        map.put("module", Keyword.MODULE);
        map.put("elsif", Keyword.ELSIF);
        map.put("def", Keyword.DEF);
        map.put("rescue", Keyword.RESCUE);
        map.put("not", Keyword.NOT);
        map.put("then", Keyword.THEN);
        map.put("yield", Keyword.YIELD);
        map.put("for", Keyword.FOR);
        map.put("self", Keyword.SELF);
        map.put("false", Keyword.FALSE);

    public enum Keyword {
        END ("end", Tokens.kEND, Tokens.kEND, LexState.EXPR_END),
        ELSE ("else", Tokens.kELSE, Tokens.kELSE, LexState.EXPR_BEG),
        CASE ("case", Tokens.kCASE, Tokens.kCASE, LexState.EXPR_BEG),
        ENSURE ("ensure", Tokens.kENSURE, Tokens.kENSURE, LexState.EXPR_BEG),
        MODULE ("module", Tokens.kMODULE, Tokens.kMODULE, LexState.EXPR_BEG),
        ELSIF ("elsif", Tokens.kELSIF, Tokens.kELSIF, LexState.EXPR_BEG),
        DEF ("def", Tokens.kDEF, Tokens.kDEF, LexState.EXPR_FNAME),
        RESCUE ("rescue", Tokens.kRESCUE, Tokens.kRESCUE_MOD, LexState.EXPR_MID),
        NOT ("not", Tokens.kNOT, Tokens.kNOT, LexState.EXPR_BEG),
        THEN ("then", Tokens.kTHEN, Tokens.kTHEN, LexState.EXPR_BEG),
        YIELD ("yield", Tokens.kYIELD, Tokens.kYIELD, LexState.EXPR_ARG),
        FOR ("for", Tokens.kFOR, Tokens.kFOR, LexState.EXPR_BEG),
        SELF ("self", Tokens.kSELF, Tokens.kSELF, LexState.EXPR_END),
        FALSE ("false", Tokens.kFALSE, Tokens.kFALSE, LexState.EXPR_END),
        RETRY ("retry", Tokens.kRETRY, Tokens.kRETRY, LexState.EXPR_END),
        RETURN ("return", Tokens.kRETURN, Tokens.kRETURN, LexState.EXPR_MID),
        TRUE ("true", Tokens.kTRUE, Tokens.kTRUE, LexState.EXPR_END),
        IF ("if", Tokens.kIF, Tokens.kIF_MOD, LexState.EXPR_BEG),
        DEFINED_P ("defined?", Tokens.kDEFINED, Tokens.kDEFINED, LexState.EXPR_ARG),

    private int yylex() throws IOException {
        int c;
        boolean spaceSeen = false;
        boolean commandState;

        if (lex_strterm != null) {
            int tok = lex_strterm.parseString(this, src);
            if (tok == Tokens.tSTRING_END || tok == Tokens.tREGEXP_END) {
                lex_strterm = null;
                setState(LexState.EXPR_END);
            }

            return tok;
        }

        commandState = commandStart;
        commandStart = false;

        loop: for(;;) {
            c = src.read();
            switch(c) {

            case '<':
                return lessThan(spaceSeen);
            case '>':
                return greaterThan();
            case '"':
                return doubleQuote();
            case '`':
                return backtick(commandState);
            case '\'':
                return singleQuote();
            case '?':
                return questionMark();
            case '&':
                return ampersand(spaceSeen);
            case '|':
                return pipe();
            case '+':
                return plus(spaceSeen);

    private int lessThan(boolean spaceSeen) throws IOException {
        int c = src.read();
        if (c == '<' && lex_state != LexState.EXPR_DOT && lex_state != LexState.EXPR_CLASS &&
                !isEND() && (!isARG() || spaceSeen)) {
            int tok = hereDocumentIdentifier();

            if (tok != 0) return tok;
        }

        determineExpressionState();

        switch (c) {
        case '=':
            if ((c = src.read()) == '>') {
                yaccValue = new Token("<=>", getPosition());
                return Tokens.tCMP;
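The lexer excerpts above dispatch on the current character and then read ahead to decide between tokens such as `<`, `<=`, `<=>` and `<<`. The following is a minimal sketch of that one-character lookahead idea only. `TinyLexer` and all of its methods are hypothetical names made up for illustration; it tracks no lexer state (no `EXPR_BEG`, no heredocs) and is not how `RubyYaccLexer` is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer: integers, '+', and the '<' family of operators.
final class TinyLexer {
    private final String src;
    private int pos = 0;

    TinyLexer(String src) { this.src = src; }

    // Returns the next character, or -1 at end of input.
    private int read() { return pos < src.length() ? src.charAt(pos++) : -1; }

    // Pushes the last character back so the next read() sees it again.
    private void unread(int c) { if (c != -1) pos--; }

    // Returns the next token's text, or null at end of input.
    String next() {
        int c = read();
        while (c == ' ') c = read();
        switch (c) {
            case -1:  return null;
            case '+': return "+";
            case '<': return lessThan();
            default:
                if (Character.isDigit(c)) return number(c);
                throw new IllegalArgumentException("unexpected character: " + (char) c);
        }
    }

    // After '<', look ahead to pick among "<<", "<=>", "<=" and "<".
    private String lessThan() {
        int c = read();
        if (c == '<') return "<<";
        if (c == '=') {
            int d = read();
            if (d == '>') return "<=>";
            unread(d);
            return "<=";
        }
        unread(c);
        return "<";
    }

    private String number(int first) {
        StringBuilder sb = new StringBuilder().append((char) first);
        int c;
        while ((c = read()) != -1 && Character.isDigit(c)) sb.append((char) c);
        unread(c);
        return sb.toString();
    }

    static List<String> tokenize(String src) {
        TinyLexer lexer = new TinyLexer(src);
        List<String> out = new ArrayList<>();
        for (String t = lexer.next(); t != null; t = lexer.next()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1 + 1"));
        System.out.println(tokenize("1 <=> 2 <= 3 << 4 < 5"));
    }
}
```

The real lexer additionally consults its current state before deciding what `<<` means, which is what makes a hand-written lexer attractive for Ruby.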

%%
program : {
                  lexer.setState(LexState.EXPR_BEG);
                  support.initTopLocalVariables();
              } top_compstmt {
  // ENEBO: Removed !compile_for_eval which probably is to reduce warnings
                  if ($2 != null) {
                      /* last expression should not be void */
                      if ($2 instanceof BlockNode) {
                          support.checkUselessStatement($<BlockNode>2.getLast());
                      } else {
                          support.checkUselessStatement($2);
                      }
                  }
                  support.getResult().setAST(support.addRootNode($2, support.getPosition($2)));
              }

stmt : kALIAS fitem {
                    lexer.setState(LexState.EXPR_FNAME);
                } fitem {
                    $$ = support.newAlias($1.getPosition(), $2, $4);
                }
                | kALIAS tGVAR tGVAR {
                    $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), (String) $3.getValue());
                }
                | kALIAS tGVAR tBACK_REF {
                    $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), "$" + $<BackRefNode>3.getType());
                }
                | kALIAS tGVAR tNTH_REF {
                    support.yyerror("can't make alias for the number variables");
                }
                | kUNDEF undef_list {
                    $$ = $2;
                }
                | stmt kIF_MOD expr_value {
                    $$ = new IfNode(support.getPosition($1), support.getConditionNode($3), $1, null);
                }

  public Object yyparse (RubyYaccLexer yyLex) throws java.io.IOException {
    if (yyMax <= 0) yyMax = 256;                          // initial size
    int yyState = 0, yyStates[] = new int[yyMax];         // state stack
    Object yyVal = null, yyVals[] = new Object[yyMax];    // value stack
    int yyToken = -1;                                     // current input
    int yyErrorFlag = 0;                                  // #tokens to shift

    yyLoop: for (int yyTop = 0;; ++ yyTop) {
      if (yyTop >= yyStates.length) {                     // dynamically increase
        int[] i = new int[yyStates.length+yyMax];
        System.arraycopy(yyStates, 0, i, 0, yyStates.length);
        yyStates = i;
        Object[] o = new Object[yyVals.length+yyMax];
        System.arraycopy(yyVals, 0, o, 0, yyVals.length);
        yyVals = o;
      }
      yyStates[yyTop] = yyState;
      yyVals[yyTop] = yyVal;
      if (yydebug != null) yydebug.push(yyState, yyVal);

        if (state == null) {
            yyVal = yyDefault(yyV > yyTop ? null : yyVals[yyV]);
        } else {
            yyVal = state.execute(support, lexer, yyVal, yyVals, yyTop);
        }

states[23] = new ParserState() {
  public Object execute(ParserSupport support, RubyYaccLexer lexer, Object yyVal, Object[] yyVals, int yyTop) {
                    yyVal = new IfNode(support.getPosition(((Node)yyVals[-2+yyTop])), support.getConditionNode(((Node)yyVals[0+yyTop])), ((Node)yyVals[-2+yyTop]), null);
    return yyVal;
  }
};
states[24] = new ParserState() {
  public Object execute(ParserSupport support, RubyYaccLexer lexer, Object yyVal, Object[] yyVals, int yyTop) {
                    yyVal = new IfNode(support.getPosition(((Node)yyVals[-2+yyTop])), support.getConditionNode(((Node)yyVals[0+yyTop])), null, ((Node)yyVals[-2+yyTop]));
    return yyVal;
  }
};

Never look at this.

AST

• Interpreted directly

• Specialized in places

• Large and rich

$ ast -e "a = true; if a; 2; else; 3; end"
AST:
RootNode 0
  BlockNode 0
    NewlineNode 0
      LocalAsgnNode:a 0
        TrueNode:true 0
    NewlineNode 0
      IfNode 0
        LocalVarNode:a 0
        NewlineNode 0
          FixnumNode 0
        NewlineNode 0
          FixnumNode 0

public class IfNode extends Node {
    private final Node condition;
    private final Node thenBody;
    private final Node elseBody;

    public IfNode(ISourcePosition position, Node condition, Node thenBody, Node elseBody) {
        super(position);

        assert condition != null : "condition is not null";
//      assert thenBody != null : "thenBody is not null";
//      assert elseBody != null : "elseBody is not null";

        this.condition = condition;
        this.thenBody = thenBody;
        this.elseBody = elseBody;
    }

    @Override
    public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
        ISourcePosition position = getPosition();

        context.setFile(position.getFile());
        context.setLine(position.getStartLine());

        IRubyObject result = condition.interpret(runtime, context, self, aBlock);

        if (result.isTrue()) {
            return thenBody == null ? runtime.getNil() : thenBody.interpret(runtime, context, self, aBlock);
        } else {
            return elseBody == null ? runtime.getNil() : elseBody.interpret(runtime, context, self, aBlock);
        }
    }
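"Interpreted directly" means each AST node evaluates itself by recursively evaluating its children, as `IfNode.interpret` does above. Here is a self-contained sketch of that pattern for the same program, `a = true; if a; 2; else; 3; end`. Everything in it is a simplifying assumption made for illustration: the class names (`AstSketch`, `Literal`, `LocalAsgn`, `LocalVar`, `If`, `Block`) are hypothetical, locals live in a plain `Map`, and Java `null` stands in for Ruby `nil`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tree-walking interpreter; not JRuby's actual node classes.
final class AstSketch {
    // Every node knows how to evaluate itself against the local variables.
    interface Node { Object interpret(Map<String, Object> locals); }

    static final class Literal implements Node {
        private final Object value;
        Literal(Object value) { this.value = value; }
        public Object interpret(Map<String, Object> locals) { return value; }
    }

    static final class LocalAsgn implements Node {
        private final String name;
        private final Node value;
        LocalAsgn(String name, Node value) { this.name = name; this.value = value; }
        public Object interpret(Map<String, Object> locals) {
            Object result = value.interpret(locals);
            locals.put(name, result);
            return result;
        }
    }

    static final class LocalVar implements Node {
        private final String name;
        LocalVar(String name) { this.name = name; }
        public Object interpret(Map<String, Object> locals) { return locals.get(name); }
    }

    static final class If implements Node {
        private final Node condition, thenBody, elseBody;
        If(Node condition, Node thenBody, Node elseBody) {
            this.condition = condition; this.thenBody = thenBody; this.elseBody = elseBody;
        }
        public Object interpret(Map<String, Object> locals) {
            // A missing branch evaluates to nil (null here), as in the slide's IfNode.
            if (isTrue(condition.interpret(locals))) {
                return thenBody == null ? null : thenBody.interpret(locals);
            }
            return elseBody == null ? null : elseBody.interpret(locals);
        }
    }

    static final class Block implements Node {
        private final Node[] children;
        Block(Node... children) { this.children = children; }
        public Object interpret(Map<String, Object> locals) {
            Object last = null;
            for (Node child : children) last = child.interpret(locals);
            return last; // a block's value is its last statement's value
        }
    }

    // Ruby truthiness: only nil and false are falsy.
    static boolean isTrue(Object value) { return value != null && !Boolean.FALSE.equals(value); }

    // Tree for: a = true; if a; 2; else; 3; end
    static Node example() {
        return new Block(
            new LocalAsgn("a", new Literal(true)),
            new If(new LocalVar("a"), new Literal(2), new Literal(3)));
    }

    static Object run(Node root) { return root.interpret(new HashMap<>()); }

    public static void main(String[] args) {
        System.out.println(run(example()));
    }
}
```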

IR (future work)

• Control flow graph

• Ruby-specific instruction set

• Optimizing compiler

• Ruby-level optimizations

jruby -e "1 + 1"

2011-11-04T05:23:09.375-03:00: IR_Printer: instrs:
0 %self = recv_self
1 %block(0:0) = recv_closure
2 file_name(-e)
3 line_num(0)
4 %v_0 = call(+, 1:fixnum, [1:fixnum])
5 return(%v_0)
2011-11-04T05:23:09.375-03:00: IR_Printer: live variables:
%v_0: 4-5

a = 1; while a < 10; puts a; a += 1; end

2011-11-04T05:25:23.517-03:00: IR_Printer: instrs:
0 %self = recv_self
1 %block(0:0) = recv_closure
2 file_name(-e)
3 line_num(0)
4 a(0:1) = 1:fixnum
5 _LOOP_BEGIN_0:
6 %v_1 = call(<, a(0:1), [10:fixnum])
7 beq(%v_1, true, _ITER_BEGIN_0)
8 %v_0 = nil
9 jump _LOOP_END_0
10 _ITER_BEGIN_0:
11 %v_2 = call(puts, %self, [a(0:1)])
12 %v_3 = call(+, a(0:1), [1:fixnum])
13 a(0:1) = copy(%v_3)
14 %v_0 = copy(%v_3)
15 thread_poll
16 _ITER_END_0:
17 jump _LOOP_BEGIN_0
18 _LOOP_END_0:
19 return(%v_0)
2011-11-04T05:25:23.517-03:00: IR_Printer: live variables:
%v_0: 8-19
%v_1: 6-7
%v_3: 12-14

2011-11-04T05:25:23.518-03:00: IRScope: ################## After CFG Linearize##################
2011-11-04T05:25:23.518-03:00: IR_Printer: ----------------------------------------
2011-11-04T05:25:23.518-03:00: IR_Printer: Method [root]:[script]:-e
2011-11-04T05:25:23.518-03:00: IR_Printer: Graph:
BB [4:LBL_3]:>[7], <[3]
BB [1:LBL_1]:>[8,2]
BB [2:LBL_2]:>[3], <[1]
BB [7:_LOOP_END_0]:>[8], <[4]
BB [8:LBL_4]:<[1,7]
BB [3:_LOOP_BEGIN_0]:>[5,4], <[6,2]
BB [6:_ITER_END_0]:>[3], <[5]
BB [5:_ITER_BEGIN_0]:>[6], <[3]

2011-11-04T05:25:23.518-03:00: IR_Printer: Instructions:
BB [4:LBL_3]
  %v_0 = nil
BB [1:LBL_1]
BB [2:LBL_2]
  %self = recv_self
  %block(0:0) = recv_closure
  file_name(-e)
  line_num(0)
  a(0:1) = 1:fixnum
BB [7:_LOOP_END_0]
  return(%v_0)
BB [8:LBL_4]
  return(nil)
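To make the linear IR for the `while` loop easier to follow, here is a toy register-machine sketch that runs a simplified version of the same instruction sequence (labels, `beq`, `jump`, `copy`). It is an illustration under loud assumptions, not JRuby's IR: `IrSketch`, `Op` and `Instr` are hypothetical names, calls are reduced to three built-in operations on `long` values, `puts` records its argument instead of printing, and `recv_self`, `recv_closure`, `thread_poll` and the `%v_0 = copy(%v_3)` step are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy IR and interpreter, loosely modelled on the printed instructions.
final class IrSketch {
    enum Op { CONST, NIL, LT, ADD, COPY, PUTS, BEQ_TRUE, JUMP, LABEL, RETURN }

    static final class Instr {
        final Op op; final String dst; final String src; final long imm; final String label;
        Instr(Op op, String dst, String src, long imm, String label) {
            this.op = op; this.dst = dst; this.src = src; this.imm = imm; this.label = label;
        }
    }

    // IR for: a = 1; while a < 10; puts a; a += 1; end
    static List<Instr> whileLoop() {
        List<Instr> p = new ArrayList<>();
        p.add(new Instr(Op.CONST, "a", null, 1, null));                  // a = 1
        p.add(new Instr(Op.LABEL, null, null, 0, "_LOOP_BEGIN_0"));
        p.add(new Instr(Op.LT, "%v_1", "a", 10, null));                  // %v_1 = a < 10
        p.add(new Instr(Op.BEQ_TRUE, null, "%v_1", 0, "_ITER_BEGIN_0")); // loop body if true
        p.add(new Instr(Op.NIL, "%v_0", null, 0, null));                 // loop result is nil
        p.add(new Instr(Op.JUMP, null, null, 0, "_LOOP_END_0"));
        p.add(new Instr(Op.LABEL, null, null, 0, "_ITER_BEGIN_0"));
        p.add(new Instr(Op.PUTS, null, "a", 0, null));                   // puts a
        p.add(new Instr(Op.ADD, "%v_3", "a", 1, null));                  // %v_3 = a + 1
        p.add(new Instr(Op.COPY, "a", "%v_3", 0, null));                 // a = copy(%v_3)
        p.add(new Instr(Op.JUMP, null, null, 0, "_LOOP_BEGIN_0"));
        p.add(new Instr(Op.LABEL, null, null, 0, "_LOOP_END_0"));
        p.add(new Instr(Op.RETURN, null, "%v_0", 0, null));
        return p;
    }

    private static long num(Map<String, Object> vars, String name) { return (Long) vars.get(name); }

    // Executes the program and returns the values passed to PUTS, in order.
    static List<Long> run(List<Instr> program) {
        // First pass: resolve label names to instruction indices.
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < program.size(); i++) {
            if (program.get(i).op == Op.LABEL) labels.put(program.get(i).label, i);
        }
        Map<String, Object> vars = new HashMap<>();
        List<Long> printed = new ArrayList<>();
        int pc = 0;
        while (pc < program.size()) {
            Instr in = program.get(pc++);
            switch (in.op) {
                case CONST:  vars.put(in.dst, in.imm); break;
                case NIL:    vars.put(in.dst, null); break;
                case LT:     vars.put(in.dst, num(vars, in.src) < in.imm); break;
                case ADD:    vars.put(in.dst, num(vars, in.src) + in.imm); break;
                case COPY:   vars.put(in.dst, vars.get(in.src)); break;
                case PUTS:   printed.add(num(vars, in.src)); break;
                case BEQ_TRUE:
                    if (Boolean.TRUE.equals(vars.get(in.src))) pc = labels.get(in.label);
                    break;
                case JUMP:   pc = labels.get(in.label); break;
                case LABEL:  break; // labels are markers only
                case RETURN: return printed;
            }
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(run(whileLoop()));
    }
}
```

Because control flow is explicit jumps between labels, the instruction list can be cut into basic blocks at each label and branch, which is the shape the CFG dump above shows.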

Core classes

• Mostly Java-based

• Leverage JDK where possible

• Work around JDK where necessary

• Use Ruby when possible

@JRubyClass(name="Fixnum", parent="Integer", include="Precision")
public class RubyFixnum extends RubyInteger {

    public static RubyClass createFixnumClass(Ruby runtime) {
        RubyClass fixnum = runtime.defineClass("Fixnum", runtime.getInteger(),
                ObjectAllocator.NOT_ALLOCATABLE_ALLOCATOR);

    @JRubyMethod(name = "+")
    public IRubyObject op_plus(ThreadContext context, IRubyObject other) {
        if (other instanceof RubyFixnum) {
            return addFixnum(context, (RubyFixnum)other);
        }
        return addOther(context, other);
    }

    private IRubyObject addFixnum(ThreadContext context, RubyFixnum other) {
        long otherValue = other.value;
        long result = value + otherValue;
        if (additionOverflowed(value, otherValue, result)) {
            return addAsBignum(context, other);
        }
        return newFixnum(context.getRuntime(), result);
    }
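`addFixnum` adds two `long` values and falls back to a bignum when the addition overflows. The slide does not show `additionOverflowed`, so the version below is an assumed implementation using the standard sign-bit check, with `java.math.BigInteger` standing in for Ruby's Bignum. `FixnumAddSketch` and `add` are hypothetical names.

```java
import java.math.BigInteger;

// Hypothetical stand-in for the Fixnum-to-Bignum promotion shown on the slide.
final class FixnumAddSketch {
    // Assumed implementation: signed addition overflowed if and only if both
    // operands have the same sign and the wrapped result has the opposite sign.
    static boolean additionOverflowed(long a, long b, long result) {
        return ((a ^ result) & (b ^ result)) < 0;
    }

    // Returns a Long when the sum fits in 64 bits, otherwise a BigInteger.
    static Number add(long a, long b) {
        long result = a + b; // wraps silently on overflow
        if (additionOverflowed(a, b, result)) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(add(1, 1));
        System.out.println(add(Long.MAX_VALUE, 1));
    }
}
```

Doing the cheap primitive addition first and checking afterwards keeps the common case free of any `BigInteger` allocation.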

Compiler

• AST-walking

• ASM bytecode library

• Minimal optimizations

• Invokedynamic really helps

• IR offers new opportunities

jruby --bytecode -e "1 + 1"

ALOAD 0
INVOKEVIRTUAL ruby/__dash_e__.getCallSite0
ALOAD 1
ALOAD 2
ALOAD 1
GETFIELD org/jruby/runtime/ThreadContext.runtime
INVOKESTATIC org/jruby/RubyFixnum.one
LDC 1
INVOKEVIRTUAL org/jruby/runtime/CallSite.call
ARETURN
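The bytecode above routes `1 + 1` through a JRuby `CallSite` object rather than a direct method call. Java 7's `java.lang.invoke` API, which underlies invokedynamic, offers a related building block: a call site whose target can be swapped at run time. The sketch below shows only that standard-library mechanism; `CallSiteSketch`, `addLongs` and `concat` are hypothetical names, and nothing here reflects how JRuby actually wires its call sites.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo of a retargetable call site using java.lang.invoke.
final class CallSiteSketch {
    // Two candidate implementations of a dynamic "+", chosen by the caller.
    static Object addLongs(Object a, Object b) { return (Long) a + (Long) b; }
    static Object concat(Object a, Object b) { return String.valueOf(a) + b; }

    static MethodHandle find(String name) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(CallSiteSketch.class, name,
                MethodType.methodType(Object.class, Object.class, Object.class));
    }

    // Calls through one call site twice, retargeting it in between.
    static Object[] demo() {
        try {
            MutableCallSite site = new MutableCallSite(find("addLongs"));
            // The dynamic invoker always dispatches to the site's current target.
            MethodHandle invoker = site.dynamicInvoker();
            Object first = invoker.invoke((Object) 1L, (Object) 1L);

            site.setTarget(find("concat"));
            Object second = invoker.invoke((Object) "1", (Object) "1");
            return new Object[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Object[] results = demo();
        System.out.println(results[0] + " " + results[1]);
    }
}
```

Because the JVM knows about such call sites, it can inline through the current target, which is one plausible reading of "Invokedynamic really helps".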

public class ASTCompiler {
    private boolean isAtRoot = true;

    public void compileBody(Node node, BodyCompiler context, boolean expr) {
        Node oldBodyNode = currentBodyNode;
        currentBodyNode = node;
        compile(node, context, expr);
        currentBodyNode = oldBodyNode;
    }

    public void compile(Node node, BodyCompiler context, boolean expr) {
        if (node == null) {
            if (expr) context.loadNil();
            return;
        }
        switch (node.getNodeType()) {
            case ALIASNODE:
                compileAlias((AliasNode) node, context, expr);
                break;
            case ANDNODE:
                compileAnd(node, context, expr);
                break;

    public void compileIf(Node node, BodyCompiler context, final boolean expr) {
        final IfNode ifNode = (IfNode) node;

        // optimizations if we know ahead of time it will always be true or false
        Node actualCondition = ifNode.getCondition();
        while (actualCondition instanceof NewlineNode) {
            actualCondition = ((NewlineNode)actualCondition).getNextNode();
        }

        if (actualCondition.getNodeType().alwaysTrue()) {
            // compile condition as non-expr and just compile "then" body
            compile(actualCondition, context, false);
            compile(ifNode.getThenBody(), context, expr);
        } else if (actualCondition.getNodeType().alwaysFalse()) {
            // always false or nil
            compile(ifNode.getElseBody(), context, expr);
        } else {

            BranchCallback trueCallback = new BranchCallback() {
                public void branch(BodyCompiler context) {
                    if (ifNode.getThenBody() != null) {
                        compile(ifNode.getThenBody(), context, expr);
                    } else {
                        if (expr) context.loadNil();
                    }
                }
            };

            BranchCallback falseCallback = new BranchCallback() {
                public void branch(BodyCompiler context) {
                    if (ifNode.getElseBody() != null) {
                        compile(ifNode.getElseBody(), context, expr);
                    } else {
                        if (expr) context.loadNil();
                    }
                }
            };

            // normal
            compile(actualCondition, context, true);
            context.performBooleanBranch(trueCallback, falseCallback);
        }

public abstract class BaseBodyCompiler implements BodyCompiler {
    protected SkinnyMethodAdapter method;
    protected VariableCompiler variableCompiler;
    protected InvocationCompiler invocationCompiler;
    protected int argParamCount;
    protected Label[] currentLoopLabels;
    protected Label scopeStart = new Label();
    protected Label scopeEnd = new Label();
    protected Label redoJump;
    protected boolean inNestedMethod = false;
    private int lastLine = -1;
    private int lastPositionLine = -1;
    protected StaticScope scope;
    protected ASTInspector inspector;
    protected String methodName;
    protected String rubyName;
    protected StandardASMCompiler script;

    public void performBooleanBranch(BranchCallback trueBranch, BranchCallback falseBranch) {
        Label afterJmp = new Label();
        Label falseJmp = new Label();

        // call isTrue on the result
        isTrue();

        method.ifeq(falseJmp); // EQ == 0 (i.e. false)
        trueBranch.branch(this);
        method.go_to(afterJmp);

        // FIXME: optimize for cases where we have no false branch
        method.label(falseJmp);
        falseBranch.branch(this);

        method.label(afterJmp);
    }
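`performBooleanBranch` lays out an if/else as a conditional jump, the true branch, an unconditional jump, and then the false branch, with callbacks supplying each branch body. The sketch below reproduces that layout while emitting plain strings instead of real bytecode, so it runs without the ASM library. `BranchEmitSketch`, its `emit` method, and the mnemonics such as `LOAD condition` and `PUSH 2` are hypothetical placeholders, not JVM instructions or JRuby APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emitter that records textual pseudo-instructions.
final class BranchEmitSketch {
    interface BranchCallback { void branch(BranchEmitSketch context); }

    private final List<String> code = new ArrayList<>();
    private int nextLabel = 0;

    String newLabel() { return "L" + nextLabel++; }
    void emit(String instruction) { code.add(instruction); }
    List<String> code() { return code; }

    // Same shape as the slide's method: branch bodies are emitted by callbacks.
    void performBooleanBranch(BranchCallback trueBranch, BranchCallback falseBranch) {
        String afterJmp = newLabel();
        String falseJmp = newLabel();

        emit("IFEQ " + falseJmp);   // jump to the false branch when the condition is 0
        trueBranch.branch(this);
        emit("GOTO " + afterJmp);   // skip over the false branch

        emit(falseJmp + ":");
        falseBranch.branch(this);

        emit(afterJmp + ":");
    }

    // Layout for: if condition; 2; else; 3; end
    static List<String> compileIf() {
        BranchEmitSketch c = new BranchEmitSketch();
        c.emit("LOAD condition");
        c.performBooleanBranch(ctx -> ctx.emit("PUSH 2"), ctx -> ctx.emit("PUSH 3"));
        return c.code();
    }

    public static void main(String[] args) {
        compileIf().forEach(System.out::println);
    }
}
```

Passing the branches as callbacks lets the emitter decide where each body goes relative to the labels, so the AST walker never has to manage jump targets itself.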

More Demos!

• JRuby + JVM flags

• JRuby concurrency

• Invokedynamic (Java 7)

• Redcar Editor

• Ruboto (Android)

• VisualVM

Thank you!

• Charles Oliver Nutter

• “headius” on most services

• headius@headius.com
