128
, July 1, 2011 – Intel, Santa Clara High Performance JavaScript: JITs in Firefox

Intel JIT Talk

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

High Performance JavaScript: JITs in Firefox

Page 2: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Power of the Web

Page 3: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

How did we get here?

•Making existing web tech faster and faster

•Adding new technologies, like video, audio, integration and 2D+3D drawing

Page 4: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

The Browser

David Anderson
Today I'm going to talk about we make programs on the web really fast. What does it mean to program on the Web? A website is actually fairly complicated. It's a number of technologies all integrated and working together. Let's take a high-level look at what happens when you enter a URL in your browser.
Page 5: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Getting a Webpage

www.intel.com192.198.164.158

Page 6: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Getting a Webpage

www.intel.com192.198.164.158

GET / HTTP/1.0(HTTP Request)

Page 7: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Getting a Webpage

<html><head><title>Laptop, Notebooks, ...</title></head><script language="javascript"> function doStuff() {...}</script><body>...

(HTML, CSS, JavaScript)

Page 8: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Core Web Components

•HTML provides content

•CSS provides styling

•JavaScript provides a programming interface

Page 9: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

New Technologies

•2D Canvas

•Video, Audio tags

•WebGL (OpenGL + JavaScript)

Page 10: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

DOM

(Source: http://www.w3schools.com/htmldom/default.asp)

Page 11: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

CSS•Cascading Style Sheets

•Separates presentation from content

•<b class=”warning”>This is red</b>

.warning { color: red; text-decoration: underline;}

Page 12: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

What is JavaScript?

•C-like syntax

•De facto language of the web

•Interacts with the DOM and the browser

Page 13: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

C-Like Syntax

•Braces, semicolons, familiar keywords

if (x) { for (i = 0; i < 100; i++) print("Hello!");}

Page 14: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Displaying a Webpage

•Layout engine applies CSS to DOM

•Computes geometry of elements

•JavaScript engine executes code

•This can change DOM, triggers “reflow”

•Finished layout is sent to graphics layer

Page 15: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

The Result

Page 16: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

JavaScript

•Invented by Brendan Eich in 1995 for Netscape 2

•Initial version written in 10 days

•Anything from small browser interactions, complex apps (Gmail) to intense graphics (WebGL)

Page 17: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

The Result

Page 18: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Think LISP or Self, not Java

•Untyped - no type declarations

•Multi-Paradigm – objects, closures, first-class functions

•Highly dynamic - objects are dictionaries

Page 19: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

No Type Declarations

•Properties, variables, return values can be anything:

function f(a) { var x = 72.3; if (a) x = a + "string"; return x;}

Page 20: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

No Type Declarations

•Properties, variables, return values can be anything:

function f(a) { var x = “hi”; if (a) x = a + 33.2; return x;}

if a is: number: add; string: concat;

Page 21: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Functional

•Functions may be returned, passed as arguments:

function f(a) { return function () { return a; }}var m = f(5);print(m());

Page 22: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Objects

•Objects are dictionaries mapping strings to values

•Properties may be deleted or added at any time!

var point = { x : 5, y : 10 };delete point.x;point.z = 12;

Page 23: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Prototypes

•Every object can have a prototype object

•If a property is not found on an object, its prototype is searched instead

•… And the prototype’s prototype, etc..

Page 24: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Numbers

•JavaScript specifies that numbers are IEEE-754 64-bit floating point

•Engines use 32-bit integers to optimize

•Must preserve semantics: integers overflow to doubles

Page 25: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Numbersvar x = 0x7FFFFFFF;x++;

int x = 0x7FFFFFFF;x++;

JavaScript C++

x 2147483648 -2147483648

Page 26: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

JavaScript Implementations

•Values (Objects, strings, numbers)

•Garbage collector

•Runtime library

•Execution engine (VM)

Page 27: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Values

•The runtime must be able to query a value’s type to perform the right computation.

•When storing values to variables or object fields, the type must be stored as well.

Page 28: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

ValuesBoxed Unboxed

Purpose Storage ComputationExamples (INT32, 9000) (int)9000

(STRING, “hi”) (String *) “hi”(DOUBLE, 3.14) (double)3.14

Definition (Type tag, C++ value) C++ value

• Boxed values required for variables, object fields• Unboxed values required for computation

Page 29: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Boxed Values

•Need a representation in C++

•Easy idea: 96-bit struct

•32-bits for type

•64-bits for double, pointer, or integer

•Too big! We have to pack better.

Page 30: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Boxed Values

•We could use pointers, and tag the low bits

•Doubles would have to be allocated on the heap

•Indirection, GC pressure is bad

•There is a middle ground…

Page 31: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

•Values are 64-bit

•Doubles are normal IEEE-754

•How to pack non-doubles?

•51 bits of NaN space!

IA32, ARMNunboxing

Page 32: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

NunboxingIA32, ARM

63 0

Type Payload

•Full value: 0x400c000000000000•Type is double because it’s not a NaN•Encodes: (Double, 3.5)

0x400c0000 0x00000000

Page 33: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

NunboxingIA32, ARM

63 0

Type Payload

•Full value: 0xFFFF000100000040•Value is in NaN space•Type is 0xFFFF0001 (Int32)•Value is (Int32, 0x00000040)

0xFFFF0001 0x00000040

Page 34: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Punboxingx86-64

63 0Type Payload

•Full value: 0xFFFF800000000040•Value is in NaN space•Bits >> 47 == 0x1FFFF == INT32•Bits 47-63 masked off to retrieve payload•Value(Int32, 0x00000040)

1111 1111 1111 1000 0

47

000 .. 0000 0000 0000 0000 0100 0000

Page 35: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Nunboxing Punboxing

Fits in register NO YES

Trivial to decode YES NO

Portability 32-bit only* x64 only

* Some 64-bit OSes can restrict mmap() to 32-bits

Page 36: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Garbage Collection

•Need to reclaim memory without pausing user workflow, animations, etc

•Need very fast object allocation

•Consider lots of small objects like points, vectors

•Reading and writing to the heap must be fast

Page 37: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Mark and Sweep

•All live objects are found via a recursive traversal of the root value set

•All dead objects are added to a free list

•Very slow to traverse entire heap

•Building free lists can bring “dead” memory into cache

Page 38: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Improvements

•Sweeping, freelist building have been moved off-thread

•Marking can be incremental, interleaved with program execution

Page 39: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Generational GC•Observation: most objects are short-lived

•All new objects are bump-allocated into a newborn space

•Once newborn space is full, live objects are moved to the tenured space

•Newborn space is then reset to empty

Page 40: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Generational GC

8MB 8MBInactive Active

Newborn Space

Tenured Space

Page 41: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

After Minor GC

8MB8MBActive Inactive

Newborn Space

Tenured Space

Page 42: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Barriers

•Incremental marking, generational GC need read or write barriers

•Read barriers are much slower

•Azul Systems has hardware read barrier

•Write barrier seems to be well-predicted?

Page 43: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Unknowns

•How much do cache effects matter?

•Memory GC has to touch

•Locality of objects

•What do we need to consider to make GC fast?

Page 44: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Running JavaScript

•Interpreter – Runs code, not fast

•Basic JIT – Simple, untyped compiler

•Advanced JIT – Typed compiler, lightspeed

Page 45: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Interpreter

•Good for code that runs once

•Giant switch loop

•Handles all edge cases of JS semantics

Page 46: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Interpreter

while (true) { switch (*pc) { case OP_ADD: ... case OP_SUB: ... case OP_RETURN: ... } pc++;}

Page 47: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result; if (lhs.isInt32() && rhs.isInt32()) { int left = rhs.toInt32(); int right = rhs.toInt32(); if (AddOverflows(left, right, left + right)) result.setInt32(left + right); else result.setNumber(double(left) + double(right)); } else if (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right); result.setString(r); } else { double left = ValueToNumber(lhs); double right = ValueToNumber(rhs); result.setDouble(left + right); } PUSH(result); break;

Page 48: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Interpreter

•Very slow!

•Lots of opcode and type dispatch

•Lots of interpreter stack traffic

•Just-in-time compilation solves both

Page 49: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Just In Time

•Compilation must be very fast

•Can’t introduce noticeable pauses

•People care about memory use

•Can’t JIT everything

•Can’t create bloated code

•May have to discard code at any time

Page 50: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT•JägerMonkey in Firefox 4

•Every opcode has a hand-coded template of assembly (registers left blank)

•Method at a time:

•Single pass through bytecode stream!

•Compiler uses assembly templates corresponding to each opcode

Page 51: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Bytecode

function Add(x, y) { return x + y;}

GETARG 0 ; fetch xGETARG 1 ; fetch yADDRETURN

Page 52: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Interpreter case OP_ADD: { Value lhs = POP(); Value rhs = POP(); Value result; if (lhs.isInt32() && rhs.isInt32()) { int left = rhs.toInt32(); int right = rhs.toInt32(); if (AddOverflows(left, right, left + right)) result.setInt32(left + right); else result.setNumber(double(left) + double(right)); } else if (lhs.isString() || rhs.isString()) { String *left = ValueToString(lhs); String *right = ValueToString(rhs); String *r = Concatenate(left, right); result.setString(r); } else { double left = ValueToNumber(lhs); double right = ValueToNumber(rhs); result.setDouble(left + right); } PUSH(result); break;

Page 53: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Assembling ADD

•Inlining that huge chunk for every ADD would be very slow

•Observation:

•Some input types much more common than others

Page 54: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

var j = 0;for (i = 0; i < 10000; i++) { j += i;}

var j = 12.3;for (i = 0; i < 10000; i++) { j += new Object() + i.toString();}

Integer Math – Common

Weird Stuff – Rare!

Page 55: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Assembling ADD

•Only generate code for the easiest and most common cases

•Large design space

•Can consider integers common, or

•Integers and doubles, or

•Anything!

Page 56: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;

Page 57: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;R0 = arg0.dataR1 = arg1.data

Greedy Register Allocator

Page 58: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT - ADDif (arg0.type != INT32) goto slow_add;if (arg1.type != INT32) goto slow_add;R0 = arg0.data;R1 = arg1.data;R2 = R0 + R1;if (OVERFLOWED) goto slow_add;

Page 59: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Slow Paths

slow_add: Value result = runtime::Add(arg0, arg1);

Page 60: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; goto rejoin;

Inline Out-of-line

if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add;rejoin:

Page 61: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Rejoining

slow_add: Value result = runtime::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;

Inline Out-of-line if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add; R3 = TYPE_INT32;rejoin:

Page 62: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Final Code

if (arg0.type != INT32) goto slow_add; if (arg1.type != INT32) goto slow_add; R0 = arg0.data; R1 = arg1.data; R2 = R0 + R1; if (OVERFLOWED) goto slow_add; R3 = TYPE_INT32;rejoin: return Value(R3, R2);

Inline

slow_add: Value result = Interpreter::Add(arg0, arg1); R2 = result.data; R3 = result.type; goto rejoin;

Out-of-line

Page 63: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Observations

•Even though we have fast paths, no type information can flow in between opcodes

•Cannot assume result of ADD is integer

•Means more branches

•Types increase register pressure

Page 64: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT – Object Access

•x.y is multiple dictionary searches!

•How can we create a fast-path?

•We could try to cache the lookup, but

•We want it to work for >1 objects

function f(x) { return x.y;}

Page 65: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Object Access

•Observation

•Usually, objects flowing through an access site look the same

•If we knew an object’s layout, we could bypass the dictionary lookup

Page 66: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Object Layout

var obj = { x: 50, y: 100,

z: 20 };

Page 67: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Object LayoutJSObject

Shape *shape;

Value *slots;

ShapeProperty Name Slot Number

x 0

y 1

z 2

0 (Int32, 50)

1 (Int32, 100)

2 (Int32, 20)

Page 68: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Object Layoutvar obj = { x: 50,

y: 100, z: 20

};…

var obj2 = { x: 78, y: 93, z: 600

};

Page 69: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Object LayoutJSObject

Shape *shape;

Value *slots;

ShapeProperty Name Slot Number

x 0

y 1

z 2 JSObject

Shape *shape;

Value *slots;

0 (Int32, 50)

1 (Int32, 100)

2 (Int32, 20)

0 (Int32, 78)

1 (Int32, 93)

2 (Int32, 600)

Page 70: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Familiar Problems

•We don’t know an object’s shape during JIT compilation

•... Or even that a property access receives an object!

•Solution: leave code “blank,” lazily generate it later

Page 71: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Inline Cachesfunction f(x) { return x.y;}

Page 72: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Inline Caches

•Start as normal, guarding on type and loading data

if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;

Page 73: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;

goto property_ic;

Inline Caches

•No information yet: leave blank (nops).

Page 74: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Property ICsfunction f(x) { return x.y;}

f({x: 5, y: 10, z: 30});

Page 75: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Property ICsJSObject

Shape *shape;

Value *slots;

Shape S1

Property Name Slot Number

x 0

y 1

z 2

0 (Int32, 5)

1 (Int32, 10)

2 (Int32, 30)

Page 76: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Property ICsShape S1

Property Name Slot Number

x 0

y 1

z 2

Page 77: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Property ICsShape S1

Property Name Slot Number

x 0

y 1

z 2

Page 78: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Inline Caches

•Shape = S1, slot is 1

if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;

goto property_ic;

Page 79: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Inline Caches

•Shape = S1, slot is 1

if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data;

goto property_ic;

Page 80: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Inline Caches

•Shape = SHAPE1, slot is 1

if (arg0.type != OBJECT) goto slow_property; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;

Page 81: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Polymorphism•What happens if two different shapes pass

through a property access?

•Add another fast-path...

function f(x) { return x.y;}f({ x: 5, y: 10, z: 30});f({ y: 40});

Page 82: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Polymorphism

if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:

Main Method

Page 83: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Polymorphism

if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE0) goto property_ic; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:

Main Method Generated Stub

stub: if (obj->shape != SHAPE2) goto property_ic; R0 = obj->slots[0].type; R1 = obj->slots[0].data; goto rejoin;

Page 84: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Polymorphism

if (arg0.type != OBJECT) goto slow_path; JSObject *obj = arg0.data; if (obj->shape != SHAPE1) goto stub; R0 = obj->slots[1].type; R1 = obj->slots[1].data;rejoin:

Main Method Generated Stub

stub: if (obj->shape != SHAPE2) goto property_ic; R0 = obj->slots[0].type; R1 = obj->slots[0].data; goto rejoin;

Page 85: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Chain of Stubsif (obj->shape == S1) { result = obj->slots[0];} else { if (obj->shape == S2) { result = obj->slots[1]; } else { if (obj->shape == S3) { result = obj->slots[2]; } else { if (obj->shape == S4) { result = obj->slots[3]; } else { goto property_ic; } } }}

Page 86: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Memory•Generated code is patched from inside the

method

•Self-modifying, but single threaded

•Code memory is always rwx

•Concerned about protection-flipping expense

Page 87: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Basic JIT Summary

•Generates simple code to handle common cases

•Inline caches adapt code based on runtime observations

•Lots of guards, poor type information

Page 88: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Optimizing Harder

•Single pass too limited: we want to perform whole-method optimizations

•We could generate an IR, but without type information...

•Slow paths prevent most optimizations.

Page 89: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Motion?

function Add(x, n) { var sum = 0; for (var i = 0; i < n; i++) sum += x + n; return sum;}

The addition looks loop invariant.Can we hoist it?

Page 90: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Motion?

function Add(x, n) { var sum = 0; var temp0 = x + n; for (var i = 0; i < n; i++) sum += temp0; return sum;}

Let’s try it.

Page 91: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Wrong Resultvar global = 0;var obj = { valueOf: function () { return ++global; }}Add(obj, 10);

• Original code returns 155• Hoisted version returns 110!

Page 92: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Idempotency

•Untyped operations are usually not provably idempotent

•Slow paths are re-entrant, can have observable side effects

•This prevents code motion, redundancy elimination

Page 93: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Ideal Scenario

•Actual knowledge about types!

•Remove slow paths

•Perform whole-method optimizations

Page 94: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Advanced JITs•Make optimistic guesses about types

•Guesses must be informed

•Don’t want to waste time compiling code that can’t or won’t run

•Using type inference or runtime profiling

•Generate an IR, perform textbook compiler optimizations

Page 95: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Optimism Pays Off

•People naturally write code as if it were typed

•Variables and object fields that change types are rare

Page 96: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

IonMonkey

•Work in progress

•Constructs high and low-level IRs

•Applies type information using runtime feedback

•Inlining, GVN, LICM, LSRA

Page 97: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

function Add(x, y) { return x + y + x;}

GETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

Page 98: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

v0 = arg0v1 = arg1

Page 99: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

v0 = arg0v1 = arg1

Page 100: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Type Oracle

•Need a mechanism to inform compiler about likely types

•Type Oracle: given program counter, returns types for inputs and outputs

•May use any data source – runtime profiling, type inference, etc

Page 101: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Type Oracle

•For this example, assume the type oracle returns “integer” for the output and both inputs to all ADD operations

Page 102: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

v0 = arg0v1 = arg1v2 = iadd(v0, v1)

Page 103: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)

Page 104: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Build SSAGETARG 0 ; fetch xGETARG 1 ; fetch yADDGETARG 0 ; fetch xADDRETURN

v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)v4 = return(v3)

Page 105: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

SSA (Intermediate)

v0 = arg0v1 = arg1v2 = iadd(v0, v1)v3 = iadd(v2, v0)v4 = return(v3)

Untyped

Untyped

TypedTyped

Page 106: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Intermediate SSA

•SSA does not type check

•Type analysis:

•Makes sure SSA inputs have correct type

•Inserts conversions, value decoding

Page 107: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Unoptimized SSA

v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)

Page 108: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Optimization

v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)

Page 109: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Optimization

v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = unbox(v0, INT32)v6 = iadd(v4, v5)v7 = return(v6)

Page 110: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Optimized SSA

v0 = arg0v1 = arg1v2 = unbox(v0, INT32)v3 = unbox(v1, INT32)v4 = iadd(v2, v3)v5 = iadd(v4, v2)v6 = return(v5)

Page 111: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Register Allocation

•Linear Scan Register Allocation

•Based on Christian Wimmer’s work• Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. Master's thesis,

Institute for System Software, Johannes Kepler University Linz, 2004

• Linear Scan Register Allocation on SSA Form. In Proceedings of the International Symposium on Code Generation and Optimization, pages 170–179. ACM Press, 2010.

•Same algorithm used in Hotspot JVM

Page 112: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Allocate Registers

0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

Page 113: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

SSA

Page 114: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

SSA if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data;

Native Code

Page 115: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r1 unbox(0, INT32)3: r0 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

SSA if (arg0.type != INT32) goto BAILOUT; r1 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r0 = arg1.data;

Native Code

Page 116: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;

Native Code

Page 117: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(5)

SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT;

Native Code

Page 118: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Code Generation

0: stack arg01: stack arg12: r0 unbox(0, INT32)3: r1 unbox(1, INT32)4: r0 iadd(2, 3)5: r0 iadd(4, 2)6: r0 return(4)

SSA if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;

Native Code

Page 119: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Generated Code if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;

Page 120: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Generated Code•Bailout exits JIT•May recompile function

if (arg0.type != INT32) goto BAILOUT; r0 = arg0.data; if (arg1.type != INT32) goto BAILOUT; r1 = arg1.data; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; r0 = r0 + r1; if (OVERFLOWED) goto BAILOUT; return r0;

Page 121: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Guards Still Needed

•Object shapes for reading properties

•Types of…

•Arguments

•Values read from the heap

•Values returned from C++

Page 122: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Guards Still Needed

•Math

•Integer Overflow

•Divide by Zero

•Multiply by -0

•NaN is falsey in JS, truthy in ISAs

Page 123: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Advanced JITs

•Full type speculation - no slow paths

•Can move and eliminate code

•If speculation fails, method is recompiled or deoptimized

•Can still use techniques like inline caching!

Page 124: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

To Infinity, and Beyond•Advanced JIT example has four guards:

•Two type checks

•Two overflow checks

•Better than Basic JIT, but we can do better

• Interval analysis can remove overflow checks

•Type Inference!

Page 125: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Type Inference

•Whole program analysis to determine types of variables

•Hybrid: both static and dynamic

•Replaces checks in JIT with checks in virtual machine

•If VM breaks an assumption held in any JIT code, that JIT code is discarded

Page 126: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Conclusions•The web needs JavaScript to be fast

•Untyped, highly dynamic nature makes this challenging

•Value boxing has different tradeoffs on different ISAs

•GC needs to be fast – what should we focus on?

Page 127: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Conclusions

•JITs getting more powerful

•Type specialization is key

•Still need checks and special cases on many operations

Page 128: Intel JIT Talk

Friday, July 1, 2011 – Intel, Santa Clara

Questions?