startech-conference
Building Languages for the JVM
Charles Oliver Nutter
Me
• Charles Oliver Nutter
• JRuby and JVM guy
• “headius” on most services
Who Was I?
• Java EE architect
• Successfully!
• Never wrote a parser
• Never wrote a compiler
• But I wanted to learn how...
You
• Java?
• Ruby?
• Python?
• C#?
• Other?
Why Create Languages?
• Nothing is perfect
• New problems need new solutions
• Language design can be fun
• Fame and fortune?
• Not really.
Why Implement a Language?
• To learn it?
• Sort of...
• To learn the target platform
• Definitely!
• Fame and fortune?
• Well...getting there...
Challenges
• Community
• Platform
• Specifications
• Resources
Community
• Investment in status quo
• Afraid to stand out
• Known quantities
• Everything else sucks
• Gotta get paid!
Platform
• Matching language semantics
• JVM designed around Java
• JVM hides underlying platform
• Challenging to use
• Not bad...C/C++ would be way worse
• Community may hate it ;-)
Specifications
• Incomplete
• Ruby had none for years
• ...and no complete test suites
• Difficult to implement
• Low level features
• Single-implementation quirks
• Hard or impossible to optimize
Resources
• You gotta eat
• Not much money in language work
• Some parts are hard
• OSS is a necessity
Why JVM?
• Because I am lazy
• Because VMs are *hard*
• Because I can’t be awesome at everything
Ok, Why Really?
• Cross-platform
• Libraries
• Languages
• Memory management
• Tools
• OSS
Cross-platform
• OpenJDK: Linux, Windows, Solaris, OS X, xBSD
• J9: Linux, zLinux, AS/400, ...
• HP: OpenVMS, HP/UX, ...
• Dalvik (Android): Linux on ARM, x86
Libraries
• For any need, a dozen libraries
• And a couple of them are good!
• Cross-platform
• Leading edge
Selection of languages
• Java
• Scala
• Clojure
• JRuby
• Mirah
• Jython, Groovy, Fantom, Kotlin, Ceylon, ...
Memory management
• Best GCs in the world
• Fastest object allocation
• Safe escape hatches like NIO
Tools
• Debugging
• Profiling
• Monitoring
• JVM internals
Open source?
• FOSS reference impl (OpenJDK)
• Mostly OSS libraries
• Heavy OSS culture
• Strong OSS influence in OpenJDK core
Case Study: JRuby
Ruby on the JVM
• All of Ruby’s power and beauty
• Solid VM underneath
• “Just another JVM language”
JVM Language
• Full interop with Java
• Tricky to do...
• Very rewarding for 99% case
• VM concerns solved
• No need to write a GC
• No need to write a JIT
• ...oh, but wait...
More than a JVM language
• Use native code where JDK fails us
• Paper over ugly bits like CLASSPATH
• Matching Ruby semantics exactly*
• Push JVM forward too!
Playing with JRuby
• Simple IRB demo
• JRuby on Rails - see Jano’s talk tomorrow
• JRuby performance
• PotC (???)
How did we do it?
JRuby Architecture
• Parser
• Abstract Syntax Tree (AST)
• Intermediate Representation (IR)
• Core classes
• Compiler
Parser
• Port of MRI’s Bison grammar
• “Jay” parser generator for Java
• Hand-written lexer
• Nearly as fast as the C version
• ...once it gets going
system ~/projects/jruby $ jruby -y -e "1 + 1"
push state 0 value null
reduce state 0 uncover 0 rule (1) $$1 :
goto from state 0 to 2
push state 2 value null
lex state 2 reading tIDENTIFIER value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
shift from state 2 to 33
push state 33 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
lex state 33 reading tSTRING_BEG value Token { Value=', Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 33 uncover 2 rule (487) operation : tIDENTIFIER
goto from state 2 to 62
push state 62 value Token { Value=load, Position=file:/Users/headius/projects/jruby/lib/jruby.jar!/jruby/kernel.rb:6}
reduce state 62 uncover 62 rule (252) $$6 :
You will never need this.
public class RubyYaccLexer {
    public static final Encoding UTF8_ENCODING = UTF8Encoding.INSTANCE;
    public static final Encoding USASCII_ENCODING = USASCIIEncoding.INSTANCE;
    public static final Encoding ASCII8BIT_ENCODING = ASCIIEncoding.INSTANCE;

    private static ByteList END_MARKER = new ByteList(new byte[] {'_', 'E', 'N', 'D', '_', '_'});
    private static ByteList BEGIN_DOC_MARKER = new ByteList(new byte[] {'b', 'e', 'g', 'i', 'n'});
    private static ByteList END_DOC_MARKER = new ByteList(new byte[] {'e', 'n', 'd'});
    private static final HashMap<String, Keyword> map;

    static {
        map = new HashMap<String, Keyword>();

        map.put("end", Keyword.END);
        map.put("else", Keyword.ELSE);
        map.put("case", Keyword.CASE);
        map.put("ensure", Keyword.ENSURE);
        map.put("module", Keyword.MODULE);
        map.put("elsif", Keyword.ELSIF);
        map.put("def", Keyword.DEF);
        map.put("rescue", Keyword.RESCUE);
        map.put("not", Keyword.NOT);
        map.put("then", Keyword.THEN);
        map.put("yield", Keyword.YIELD);
        map.put("for", Keyword.FOR);
        map.put("self", Keyword.SELF);
        map.put("false", Keyword.FALSE);
public enum Keyword {
    END ("end", Tokens.kEND, Tokens.kEND, LexState.EXPR_END),
    ELSE ("else", Tokens.kELSE, Tokens.kELSE, LexState.EXPR_BEG),
    CASE ("case", Tokens.kCASE, Tokens.kCASE, LexState.EXPR_BEG),
    ENSURE ("ensure", Tokens.kENSURE, Tokens.kENSURE, LexState.EXPR_BEG),
    MODULE ("module", Tokens.kMODULE, Tokens.kMODULE, LexState.EXPR_BEG),
    ELSIF ("elsif", Tokens.kELSIF, Tokens.kELSIF, LexState.EXPR_BEG),
    DEF ("def", Tokens.kDEF, Tokens.kDEF, LexState.EXPR_FNAME),
    RESCUE ("rescue", Tokens.kRESCUE, Tokens.kRESCUE_MOD, LexState.EXPR_MID),
    NOT ("not", Tokens.kNOT, Tokens.kNOT, LexState.EXPR_BEG),
    THEN ("then", Tokens.kTHEN, Tokens.kTHEN, LexState.EXPR_BEG),
    YIELD ("yield", Tokens.kYIELD, Tokens.kYIELD, LexState.EXPR_ARG),
    FOR ("for", Tokens.kFOR, Tokens.kFOR, LexState.EXPR_BEG),
    SELF ("self", Tokens.kSELF, Tokens.kSELF, LexState.EXPR_END),
    FALSE ("false", Tokens.kFALSE, Tokens.kFALSE, LexState.EXPR_END),
    RETRY ("retry", Tokens.kRETRY, Tokens.kRETRY, LexState.EXPR_END),
    RETURN ("return", Tokens.kRETURN, Tokens.kRETURN, LexState.EXPR_MID),
    TRUE ("true", Tokens.kTRUE, Tokens.kTRUE, LexState.EXPR_END),
    IF ("if", Tokens.kIF, Tokens.kIF_MOD, LexState.EXPR_BEG),
    DEFINED_P ("defined?", Tokens.kDEFINED, Tokens.kDEFINED, LexState.EXPR_ARG),
private int yylex() throws IOException {
    int c;
    boolean spaceSeen = false;
    boolean commandState;

    if (lex_strterm != null) {
        int tok = lex_strterm.parseString(this, src);
        if (tok == Tokens.tSTRING_END || tok == Tokens.tREGEXP_END) {
            lex_strterm = null;
            setState(LexState.EXPR_END);
        }

        return tok;
    }

    commandState = commandStart;
    commandStart = false;

    loop: for(;;) {
        c = src.read();
        switch(c) {

        case '<':
            return lessThan(spaceSeen);
        case '>':
            return greaterThan();
        case '"':
            return doubleQuote();
        case '`':
            return backtick(commandState);
        case '\'':
            return singleQuote();
        case '?':
            return questionMark();
        case '&':
            return ampersand(spaceSeen);
        case '|':
            return pipe();
        case '+':
            return plus(spaceSeen);
private int lessThan(boolean spaceSeen) throws IOException {
    int c = src.read();
    if (c == '<' && lex_state != LexState.EXPR_DOT && lex_state != LexState.EXPR_CLASS &&
            !isEND() && (!isARG() || spaceSeen)) {
        int tok = hereDocumentIdentifier();
        if (tok != 0) return tok;
    }
    determineExpressionState();
    switch (c) {
    case '=':
        if ((c = src.read()) == '>') {
            yaccValue = new Token("<=>", getPosition());
            return Tokens.tCMP;
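The lexer slides above combine two ideas: a keyword table consulted after an identifier has been scanned, and character-by-character lookahead to tell `<`, `<=` and `<=>` apart. A minimal sketch of both, assuming a tiny made-up token set; `MiniLexer`, `tLEQ`, `tLT`, `tINTEGER` and `tCHAR` are hypothetical names for illustration and not taken from JRuby:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature lexer; token names are illustrative, not JRuby's.
class MiniLexer {
    // Keyword table, looked up only after a whole identifier has been read.
    private static final Map<String, String> KEYWORDS = new HashMap<>();
    static {
        KEYWORDS.put("if", "kIF");
        KEYWORDS.put("else", "kELSE");
        KEYWORDS.put("end", "kEND");
    }

    private final String src;
    private int pos = 0;

    MiniLexer(String src) { this.src = src; }

    // Returns the next character without consuming it, or -1 at end of input.
    private int peek() { return pos < src.length() ? src.charAt(pos) : -1; }

    // Tokenizes the whole input into a list of token descriptions.
    static List<String> tokenize(String source) {
        MiniLexer lexer = new MiniLexer(source);
        List<String> tokens = new ArrayList<>();
        String tok;
        while ((tok = lexer.next()) != null) tokens.add(tok);
        return tokens;
    }

    // Returns the next token, or null at end of input.
    private String next() {
        while (peek() == ' ') pos++;
        int c = peek();
        if (c == -1) return null;
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String word = src.substring(start, pos);
            String keyword = KEYWORDS.get(word);
            return keyword != null ? keyword : "tIDENTIFIER(" + word + ")";
        }
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return "tINTEGER(" + src.substring(start, pos) + ")";
        }
        pos++; // consume the operator character
        if (c == '<') {
            if (peek() == '=') {
                pos++;
                if (peek() == '>') { pos++; return "tCMP"; } // "<=>"
                return "tLEQ";                               // "<="
            }
            return "tLT";                                    // "<"
        }
        return "tCHAR(" + (char) c + ")";
    }
}
```

Calling `MiniLexer.tokenize("if a <=> 10")` walks the same path as the `lessThan` excerpt: after seeing `<`, the lexer keeps reading until it knows which operator it has. The real lexer additionally tracks lexer state (`LexState`) to handle heredocs and ambiguous syntax, which this sketch leaves out.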
%%
program : {
              lexer.setState(LexState.EXPR_BEG);
              support.initTopLocalVariables();
          } top_compstmt {
              // ENEBO: Removed !compile_for_eval which probably is to reduce warnings
              if ($2 != null) {
                  /* last expression should not be void */
                  if ($2 instanceof BlockNode) {
                      support.checkUselessStatement($<BlockNode>2.getLast());
                  } else {
                      support.checkUselessStatement($2);
                  }
              }
              support.getResult().setAST(support.addRootNode($2, support.getPosition($2)));
          }
stmt : kALIAS fitem {
           lexer.setState(LexState.EXPR_FNAME);
       } fitem {
           $$ = support.newAlias($1.getPosition(), $2, $4);
       }
     | kALIAS tGVAR tGVAR {
           $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), (String) $3.getValue());
       }
     | kALIAS tGVAR tBACK_REF {
           $$ = new VAliasNode($1.getPosition(), (String) $2.getValue(), "$" + $<BackRefNode>3.getType());
       }
     | kALIAS tGVAR tNTH_REF {
           support.yyerror("can't make alias for the number variables");
       }
     | kUNDEF undef_list {
           $$ = $2;
       }
     | stmt kIF_MOD expr_value {
           $$ = new IfNode(support.getPosition($1), support.getConditionNode($3), $1, null);
       }
public Object yyparse (RubyYaccLexer yyLex) throws java.io.IOException {
    if (yyMax <= 0) yyMax = 256;                        // initial size
    int yyState = 0, yyStates[] = new int[yyMax];       // state stack
    Object yyVal = null, yyVals[] = new Object[yyMax];  // value stack
    int yyToken = -1;                                   // current input
    int yyErrorFlag = 0;                                // #tokens to shift

    yyLoop: for (int yyTop = 0;; ++ yyTop) {
        if (yyTop >= yyStates.length) {                 // dynamically increase
            int[] i = new int[yyStates.length+yyMax];
            System.arraycopy(yyStates, 0, i, 0, yyStates.length);
            yyStates = i;
            Object[] o = new Object[yyVals.length+yyMax];
            System.arraycopy(yyVals, 0, o, 0, yyVals.length);
            yyVals = o;
        }
        yyStates[yyTop] = yyState;
        yyVals[yyTop] = yyVal;
        if (yydebug != null) yydebug.push(yyState, yyVal);

                if (state == null) {
                    yyVal = yyDefault(yyV > yyTop ? null : yyVals[yyV]);
                } else {
                    yyVal = state.execute(support, lexer, yyVal, yyVals, yyTop);
                }
states[23] = new ParserState() {
    public Object execute(ParserSupport support, RubyYaccLexer lexer, Object yyVal, Object[] yyVals, int yyTop) {
        yyVal = new IfNode(support.getPosition(((Node)yyVals[-2+yyTop])), support.getConditionNode(((Node)yyVals[0+yyTop])), ((Node)yyVals[-2+yyTop]), null);
        return yyVal;
    }
};
states[24] = new ParserState() {
    public Object execute(ParserSupport support, RubyYaccLexer lexer, Object yyVal, Object[] yyVals, int yyTop) {
        yyVal = new IfNode(support.getPosition(((Node)yyVals[-2+yyTop])), support.getConditionNode(((Node)yyVals[0+yyTop])), null, ((Node)yyVals[-2+yyTop]));
        return yyVal;
    }
};
Never look at this.
AST
• Interpreted directly
• Specialized in places
• Large and rich
$ ast -e "a = true; if a; 2; else; 3; end"
AST:
RootNode 0
  BlockNode 0
    NewlineNode 0
      LocalAsgnNode:a 0
        TrueNode:true 0
    NewlineNode 0
      IfNode 0
        LocalVarNode:a 0
        NewlineNode 0
          FixnumNode 0
        NewlineNode 0
          FixnumNode 0
public class IfNode extends Node {
    private final Node condition;
    private final Node thenBody;
    private final Node elseBody;

    public IfNode(ISourcePosition position, Node condition, Node thenBody, Node elseBody) {
        super(position);

        assert condition != null : "condition is not null";
//      assert thenBody != null : "thenBody is not null";
//      assert elseBody != null : "elseBody is not null";

        this.condition = condition;
        this.thenBody = thenBody;
        this.elseBody = elseBody;
    }

    @Override
    public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
        ISourcePosition position = getPosition();

        context.setFile(position.getFile());
        context.setLine(position.getStartLine());

        IRubyObject result = condition.interpret(runtime, context, self, aBlock);
        if (result.isTrue()) {
            return thenBody == null ? runtime.getNil() : thenBody.interpret(runtime, context, self, aBlock);
        } else {
            return elseBody == null ? runtime.getNil() : elseBody.interpret(runtime, context, self, aBlock);
        }
    }
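"Interpreted directly" means every node evaluates itself by recursively calling `interpret` on its children, as `IfNode` does above. The pattern can be tried in isolation with a stripped-down sketch; `MiniNode`, `MiniLiteral` and `MiniIf` are hypothetical classes, plain Java objects stand in for `IRubyObject`, and `null` stands in for nil:

```java
// Hypothetical miniature AST; JRuby's real nodes also take a runtime, context, self and block.
abstract class MiniNode {
    abstract Object interpret();
}

// A literal simply evaluates to its stored value (null plays the role of nil).
class MiniLiteral extends MiniNode {
    private final Object value;

    MiniLiteral(Object value) { this.value = value; }

    @Override
    Object interpret() { return value; }
}

// An if node evaluates its condition, then exactly one of its two branches.
class MiniIf extends MiniNode {
    private final MiniNode condition;
    private final MiniNode thenBody; // may be null, like in IfNode
    private final MiniNode elseBody; // may be null, like in IfNode

    MiniIf(MiniNode condition, MiniNode thenBody, MiniNode elseBody) {
        this.condition = condition;
        this.thenBody = thenBody;
        this.elseBody = elseBody;
    }

    @Override
    Object interpret() {
        Object result = condition.interpret();
        // Ruby truthiness: only nil and false are falsy, so 0 counts as true.
        boolean truthy = result != null && !Boolean.FALSE.equals(result);
        MiniNode branch = truthy ? thenBody : elseBody;
        return branch == null ? null : branch.interpret();
    }
}
```

For example, `new MiniIf(new MiniLiteral(true), new MiniLiteral(2), new MiniLiteral(3)).interpret()` evaluates the "then" branch, mirroring the `if a; 2; else; 3; end` example from the AST dump.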
IR (future work)
• Control flow graph
• Ruby-specific instruction set
• Optimizing compiler
• Ruby-level optimizations
jruby -e "1 + 1"
2011-11-04T05:23:09.375-03:00: IR_Printer: instrs:
0 %self = recv_self
1 %block(0:0) = recv_closure
2 file_name(-e)
3 line_num(0)
4 %v_0 = call(+, 1:fixnum, [1:fixnum])
5 return(%v_0)
2011-11-04T05:23:09.375-03:00: IR_Printer: live variables:
%v_0: 4-5
a = 1; while a < 10; puts a; a += 1; end
2011-11-04T05:25:23.517-03:00: IR_Printer: instrs:
0 %self = recv_self
1 %block(0:0) = recv_closure
2 file_name(-e)
3 line_num(0)
4 a(0:1) = 1:fixnum
5 _LOOP_BEGIN_0:
6 %v_1 = call(<, a(0:1), [10:fixnum])
7 beq(%v_1, true, _ITER_BEGIN_0)
8 %v_0 = nil
9 jump _LOOP_END_0
10 _ITER_BEGIN_0:
11 %v_2 = call(puts, %self, [a(0:1)])
12 %v_3 = call(+, a(0:1), [1:fixnum])
13 a(0:1) = copy(%v_3)
14 %v_0 = copy(%v_3)
15 thread_poll
16 _ITER_END_0:
17 jump _LOOP_BEGIN_0
18 _LOOP_END_0:
19 return(%v_0)
2011-11-04T05:25:23.517-03:00: IR_Printer: live variables:
%v_0: 8-19
%v_1: 6-7
%v_3: 12-14
2011-11-04T05:25:23.518-03:00: IRScope: ################## After CFG Linearize##################
2011-11-04T05:25:23.518-03:00: IR_Printer: ----------------------------------------
2011-11-04T05:25:23.518-03:00: IR_Printer: Method [root]:[script]:-e
2011-11-04T05:25:23.518-03:00: IR_Printer: Graph:
BB [4:LBL_3]:>[7], <[3]
BB [1:LBL_1]:>[8,2]
BB [2:LBL_2]:>[3], <[1]
BB [7:_LOOP_END_0]:>[8], <[4]
BB [8:LBL_4]:<[1,7]
BB [3:_LOOP_BEGIN_0]:>[5,4], <[6,2]
BB [6:_ITER_END_0]:>[3], <[5]
BB [5:_ITER_BEGIN_0]:>[6], <[3]
2011-11-04T05:25:23.518-03:00: IR_Printer: Instructions:
BB [4:LBL_3]
  %v_0 = nil
BB [1:LBL_1]
BB [2:LBL_2]
  %self = recv_self
  %block(0:0) = recv_closure
  file_name(-e)
  line_num(0)
  a(0:1) = 1:fixnum
BB [7:_LOOP_END_0]
  return(%v_0)
BB [8:LBL_4]
  return(nil)
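The IR dumps above show a structured `while` loop flattened into labels, a conditional branch and jumps. As a rough illustration of that lowering step only (not of JRuby's IR builder, whose API is not shown in the slides), here is a sketch that emits the same label layout as plain strings; `LoopLoweringSketch` and `lowerWhile` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: flattens "while cond; body; end" into a linear, label-based form.
class LoopLoweringSketch {
    // condInstr computes the condition into condVar; bodyInstrs is the already lowered loop body.
    static List<String> lowerWhile(int loopId, String condInstr, String condVar, List<String> bodyInstrs) {
        String loopBegin = "_LOOP_BEGIN_" + loopId;
        String iterBegin = "_ITER_BEGIN_" + loopId;
        String loopEnd = "_LOOP_END_" + loopId;

        List<String> out = new ArrayList<>();
        out.add(loopBegin + ":");                                   // re-entry point for each iteration
        out.add(condInstr);                                         // evaluate the loop condition
        out.add("beq(" + condVar + ", true, " + iterBegin + ")");   // condition true: run the body
        out.add("jump " + loopEnd);                                 // condition false: leave the loop
        out.add(iterBegin + ":");
        out.addAll(bodyInstrs);
        out.add("jump " + loopBegin);                               // back edge
        out.add(loopEnd + ":");
        return out;
    }

    public static void main(String[] args) {
        List<String> body = Arrays.asList(
            "%v_2 = call(puts, %self, [a])",
            "%v_3 = call(+, a, [1:fixnum])");
        for (String instr : lowerWhile(0, "%v_1 = call(<, a, [10:fixnum])", "%v_1", body)) {
            System.out.println(instr);
        }
    }
}
```

Each label starts a new basic block and each branch or jump ends one, which is how the linear listing maps onto the `BB [...]` graph printed after "CFG Linearize".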
Core classes
• Mostly Java-based
• Leverage JDK where possible
• Work around JDK where necessary
• Use Ruby when possible
@JRubyClass(name="Fixnum", parent="Integer", include="Precision")
public class RubyFixnum extends RubyInteger {
    public static RubyClass createFixnumClass(Ruby runtime) {
        RubyClass fixnum = runtime.defineClass("Fixnum", runtime.getInteger(),
                ObjectAllocator.NOT_ALLOCATABLE_ALLOCATOR);

    @JRubyMethod(name = "+")
    public IRubyObject op_plus(ThreadContext context, IRubyObject other) {
        if (other instanceof RubyFixnum) {
            return addFixnum(context, (RubyFixnum)other);
        }
        return addOther(context, other);
    }

    private IRubyObject addFixnum(ThreadContext context, RubyFixnum other) {
        long otherValue = other.value;
        long result = value + otherValue;
        if (additionOverflowed(value, otherValue, result)) {
            return addAsBignum(context, other);
        }
        return newFixnum(context.getRuntime(), result);
    }
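The slide calls `additionOverflowed` without showing its body. One common way to write such a check is a sign-bit test: the addition overflowed exactly when both operands have the same sign and the result has the opposite sign. The sketch below assumes that approach and uses `BigInteger` as a stand-in for Bignum; `FixnumAddSketch` is a hypothetical class, and the body of `additionOverflowed` here is a guess at the technique, not JRuby's source:

```java
import java.math.BigInteger;

// Hypothetical sketch of "add as long, promote on overflow"; not JRuby's actual code.
class FixnumAddSketch {
    // True when a + b wrapped around: result differs in sign from both operands.
    static boolean additionOverflowed(long a, long b, long result) {
        return ((a ^ result) & (b ^ result)) < 0;
    }

    // Returns a Long when the sum fits in 64 bits, otherwise a BigInteger.
    static Number add(long a, long b) {
        long result = a + b; // wraps silently on overflow
        if (additionOverflowed(a, b, result)) {
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        return Long.valueOf(result);
    }
}
```

`FixnumAddSketch.add(1, 2)` stays a `Long`, while `FixnumAddSketch.add(Long.MAX_VALUE, 1)` is promoted to a `BigInteger`, which is the fast-path and slow-path split that `addFixnum` expresses.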
Compiler
• AST-walking
• ASM bytecode library
• Minimal optimizations
• Invokedynamic really helps
• IR offers new opportunities
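The slides do not show how JRuby wires up its invokedynamic call sites. As a loose illustration of the underlying `java.lang.invoke` API (Java 7), the sketch below builds a `MutableCallSite`, calls through it, then retargets it, which is roughly what a dynamic language needs when a method is redefined. `CallSiteSketch`, `plus`, `minus` and `demo` are hypothetical names, and no actual `invokedynamic` bytecode is emitted here:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch of a retargetable call site using java.lang.invoke.
class CallSiteSketch {
    static long plus(long a, long b) { return a + b; }
    static long minus(long a, long b) { return a - b; }

    // Looks up one of the static methods above by name.
    static MethodHandle find(String name) throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(
            CallSiteSketch.class, name,
            MethodType.methodType(long.class, long.class, long.class));
    }

    // Calls through the site twice, swapping the target in between.
    static long[] demo() {
        try {
            MutableCallSite site = new MutableCallSite(find("plus"));
            MethodHandle invoker = site.dynamicInvoker();
            long first = (long) invoker.invokeExact(1L, 1L);  // dispatches to plus
            site.setTarget(find("minus"));                    // "redefine" the method
            long second = (long) invoker.invokeExact(1L, 1L); // now dispatches to minus
            return new long[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The caller holds on to the same invoker throughout; only the call site's target changes, so the JVM can optimize the call path while the target is stable.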
jruby --bytecode -e "1 + 1"
ALOAD 0
INVOKEVIRTUAL ruby/__dash_e__.getCallSite0
ALOAD 1
ALOAD 2
ALOAD 1
GETFIELD org/jruby/runtime/ThreadContext.runtime
INVOKESTATIC org/jruby/RubyFixnum.one
LDC 1
INVOKEVIRTUAL org/jruby/runtime/CallSite.call
ARETURN
public class ASTCompiler {
    private boolean isAtRoot = true;

    public void compileBody(Node node, BodyCompiler context, boolean expr) {
        Node oldBodyNode = currentBodyNode;
        currentBodyNode = node;
        compile(node, context, expr);
        currentBodyNode = oldBodyNode;
    }

    public void compile(Node node, BodyCompiler context, boolean expr) {
        if (node == null) {
            if (expr) context.loadNil();
            return;
        }
        switch (node.getNodeType()) {
            case ALIASNODE:
                compileAlias((AliasNode) node, context, expr);
                break;
            case ANDNODE:
                compileAnd(node, context, expr);
                break;
public void compileIf(Node node, BodyCompiler context, final boolean expr) {
    final IfNode ifNode = (IfNode) node;

    // optimizations if we know ahead of time it will always be true or false
    Node actualCondition = ifNode.getCondition();
    while (actualCondition instanceof NewlineNode) {
        actualCondition = ((NewlineNode)actualCondition).getNextNode();
    }

    if (actualCondition.getNodeType().alwaysTrue()) {
        // compile condition as non-expr and just compile "then" body
        compile(actualCondition, context, false);
        compile(ifNode.getThenBody(), context, expr);
    } else if (actualCondition.getNodeType().alwaysFalse()) {
        // always false or nil
        compile(ifNode.getElseBody(), context, expr);
    } else {

        BranchCallback trueCallback = new BranchCallback() {
            public void branch(BodyCompiler context) {
                if (ifNode.getThenBody() != null) {
                    compile(ifNode.getThenBody(), context, expr);
                } else {
                    if (expr) context.loadNil();
                }
            }
        };

        BranchCallback falseCallback = new BranchCallback() {
            public void branch(BodyCompiler context) {
                if (ifNode.getElseBody() != null) {
                    compile(ifNode.getElseBody(), context, expr);
                } else {
                    if (expr) context.loadNil();
                }
            }
        };

        // normal
        compile(actualCondition, context, true);
        context.performBooleanBranch(trueCallback, falseCallback);
    }
public abstract class BaseBodyCompiler implements BodyCompiler {
    protected SkinnyMethodAdapter method;
    protected VariableCompiler variableCompiler;
    protected InvocationCompiler invocationCompiler;
    protected int argParamCount;
    protected Label[] currentLoopLabels;
    protected Label scopeStart = new Label();
    protected Label scopeEnd = new Label();
    protected Label redoJump;
    protected boolean inNestedMethod = false;
    private int lastLine = -1;
    private int lastPositionLine = -1;
    protected StaticScope scope;
    protected ASTInspector inspector;
    protected String methodName;
    protected String rubyName;
    protected StandardASMCompiler script;

    public void performBooleanBranch(BranchCallback trueBranch, BranchCallback falseBranch) {
        Label afterJmp = new Label();
        Label falseJmp = new Label();

        // call isTrue on the result
        isTrue();

        method.ifeq(falseJmp); // EQ == 0 (i.e. false)
        trueBranch.branch(this);
        method.go_to(afterJmp);

        // FIXME: optimize for cases where we have no false branch
        method.label(falseJmp);
        falseBranch.branch(this);

        method.label(afterJmp);
    }
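The shape of `performBooleanBranch` can be tried without the ASM library by emitting textual instructions instead of bytecode. In this sketch `BranchEmitterSketch`, its `Branch` interface and the `L0`/`L1` label names are hypothetical stand-ins for `BaseBodyCompiler`, `BranchCallback` and ASM's `Label`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical text-emitting stand-in for a bytecode body compiler.
class BranchEmitterSketch {
    // Callback that emits the code for one arm of the branch.
    interface Branch {
        void emit(BranchEmitterSketch out);
    }

    final List<String> code = new ArrayList<>();
    private int nextLabel = 0;

    private String newLabel() { return "L" + nextLabel++; }

    void emit(String instr) { code.add(instr); }

    // Assumes the condition (0 = false, non-zero = true) is already on the operand stack.
    void performBooleanBranch(Branch trueBranch, Branch falseBranch) {
        String falseJmp = newLabel();
        String afterJmp = newLabel();

        emit("IFEQ " + falseJmp);  // jump to the false arm when the condition is 0
        trueBranch.emit(this);
        emit("GOTO " + afterJmp);  // skip over the false arm

        emit(falseJmp + ":");
        falseBranch.emit(this);

        emit(afterJmp + ":");
    }
}
```

Passing the two arms as callbacks lets the emitter decide where the labels and jumps go, while the caller (the `compileIf` code above, in the real compiler) decides what each arm contains.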
More Demos!
• JRuby + JVM flags
• JRuby concurrency
• Invokedynamic (Java 7)
• Redcar Editor
• Ruboto (Android)
• VisualVM
Gracias!
• Charles Oliver Nutter
• “headius” on most services