Under the Hood of the Testarossa JIT Compiler Mark Stoodley Senior Software Developer IBM Runtime Technologies September 19, 2016

Page 2: Under the Hood of the Testarossa JIT Compiler

Important disclaimers

• THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
• WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
• ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE OR INFRASTRUCTURE DIFFERENCES.
• ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.
• IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.
• IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
• NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
  – CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Page 3: Under the Hood of the Testarossa JIT Compiler

Who am I?

• Worked on 2 completely different production Java JIT compilers since 2002, after compiler & architecture graduate work at the University of Toronto
• Current architect of the Testarossa JIT
• Eclipse OMR open source project lead

Page 4: Under the Hood of the Testarossa JIT Compiler

Testarossa: backend compiler technology

• Created in 1998 as an IBM closed source project
  – Java ME to SE to many languages/compilation scenarios
  – Built by IBM compiler team in Toronto (Markham) Canada
• Best known as IBM Java JIT since IBM SDK for Java 5.0 (2005)
  – Early show as debug sidecar in IBM Java 1.4.2 (2004)
  – Designed in conjunction with J9 JVM technology
• Also used for other IBM compiler backends and binary translators

Page 5: Under the Hood of the Testarossa JIT Compiler

Testarossa technology highlights: 1998-…

• Languages:
  – Production: Java ME and SE, COBOL, PL/I, Z binary emulator, binary (re)optimizer
  – Prototypes: Ruby, Python, SOM++, and more…
• Some technology highlights implemented by the Java JIT:
  – Cooperative suspend (1999)
  – Diagnostic abilities: e.g. limit files, per method options (1999)
  – Full optimization while supporting type accurate GC (1999)
  – AOT (rom-able) compilation for Java (1999)
  – Aggressive runtime native code patching (2000)
  – Invocation and time-based compilation triggers (2000)
  – Adaptive compilation (cold, warm, hot, very hot, scorching) (1999)
  – JIT profiling infrastructure and optimizations (2001)
  – Speculative class hierarchy based inlining and optimization (2001)
  – Fairly complete set of classical compiler optimizations and dataflow analyses (2001)
  – Java-specific optimizations like “check” removal (2001)
  – Java debug support (2001)
  – Escape analysis and stack allocation (2001)
  – Automatic lock coarsening (2002)
  – Multiple code caches (2005)
  – Asynchronous compilation (2006)
  – Interpreter profiling (2006)
  – Real-time Specification for Java (AOT and JIT) (2005)
  – Dynamic AOT compilation for Java (2006)
  – Hot Code Replacement support (2007)
  – Compressed references (2007)
  – Multiple compilation threads (2010)
  – On stack replacement (2013)
  – Transactional Memory (2013)
  – Packed objects (2013)
  – Multitenancy (2013)
  – Auto SIMD (2014)
  – Auto GPU (2014)
  – Heuristic tuning and retuning (1999 to ongoing)
• Platforms that are or have been supported:
  – ME: ARM32, X86 (IA32), MIPS, POWER, SH4
  – 32-bit SE: ARM, POWER, X86, Z
  – 64-bit SE: POWER, X86, Z
  – Hard real-time (RTSJ compliant): IA32
  – COBOL, PL/I, COBOL Automatic Binary Optimizer: Z
  – Z binary emulator: X86, P
• Performance metrics that have been or are actively tracked:
  – Latency (elapsed time)
  – Throughput (operations / sec)
  – Start-up time
  – Ramp-up time
  – CPU consumption
  – Resource consumption at idle
  – Compilation time
  – Memory footprint
  – JIT library size
  – Incremental pauses
• Hardware exploitation highlights:
  – Efficient CPU instruction sequences
  – Managing different kinds of hardware registers
  – Exploiting hardware data type support
  – Cryptographic, compression acceleration
  – Character conversion loop recognition and acceleration
  – Atomic locking and other synchronization optimization
  – Simultaneous Multi Threading
  – Transactional Memory
  – SIMD (Single instruction multiple data)
  – GPU (Graphics processing unit)

Page 6: Under the Hood of the Testarossa JIT Compiler

On the track: performance keeps going up!

[Figure: two bar charts comparing IBM SDK for Java releases: Java 6 (SR16 FP4), Java 6.1 (SR8 FP4), Java 7 (SR9), Java 7.1 (SR3) and Java 8 (SR1). Panels: “Daytrader online stock trading application”, throughput (ops/sec); “Apache Spark 1.4 Databricks”, 1/geometric mean. Bar labels: 1.53X, 2.00X, 2.29X, 2.76X and 1.35X, 1.60X, 1.76X, 1.96X.]

Page 7: Under the Hood of the Testarossa JIT Compiler

You all benefit from it!

• J9 and Testarossa have played a critical role in advancing Java performance
  – Competitive, often industry-leading, performance for 11 years now
  – You have benefited from competitive pressure on your JDK even if you don’t actually use the IBM SDK for Java
• J9 and Testarossa are now being open sourced

Page 8: Under the Hood of the Testarossa JIT Compiler

IBM SDK for Java built from open source

[Diagram: runtime stacks. OpenJDK with HotSpot; OpenJDK with OpenJ9 on Eclipse OMR; IBM SDK for Java, built from OpenJDK, OpenJ9 and OMR plus IBMisms. Communities beyond Java also sit on OMR: COBOL, PL/I, Emulator, Ruby?, Python?, JS?, Swift?]

• Proven adaptable technology in the open for rapid innovation and collaboration across multiple language communities
• Java community open innovation and collaboration, deep platform exploitation for X86 & IBM hardware platforms (OpenPOWER, LinuxONE)
• Long term support, quick response for problems, and other forms of IBM customer specific engagement

Page 9: Under the Hood of the Testarossa JIT Compiler

How did we create Eclipse OMR?

Page 10: Under the Hood of the Testarossa JIT Compiler

Start from IBM J9 Java Runtime

[Diagram: J9 Java Execution Environment. Pipeline: source code (Java source) → bytecode/AST compiler (J9 Java bytecode compiler) → interpreter (J9 Java bytecode interpreter). Core components: J9 Java platform abstraction layer, J9 Java garbage collector, J9 Java diagnostic and monitoring services, J9 Java just-in-time compiler.]

Page 11: Under the Hood of the Testarossa JIT Compiler

Refactor “Java”-ness into a Glue layer that adds language specifics to each core component

[Diagram: J9 Java Execution Environment. Pipeline: source code (Java source) → bytecode/AST compiler (J9 Java bytecode compiler) → interpreter (J9 Java bytecode interpreter). Core components: OMR platform abstraction layer, OMR garbage collector, OMR diagnostic and monitoring services, OMR just-in-time (JIT) compiler. Glue layers: J9 Java GC glue, J9 Java diagnostic and monitoring glue, J9 Java JIT compiler glue.]

Page 12: Under the Hood of the Testarossa JIT Compiler

Form Eclipse OMR around core components

[Diagram: the four core components: OMR platform abstraction layer, OMR garbage collector, OMR diagnostic and monitoring services, OMR just-in-time (JIT) compiler.]

Page 13: Under the Hood of the Testarossa JIT Compiler

Eclipse OMR
Created March 2016

http://www.eclipse.org/omr
https://github.com/eclipse/omr
https://developer.ibm.com/open/omr/

Dual license: Eclipse Public License V1.0, Apache 2.0

Users and contributors very welcome
https://github.com/eclipse/omr/blob/master/CONTRIBUTING.md

Page 14: Under the Hood of the Testarossa JIT Compiler

OMR components

port: platform abstraction (porting) library
thread: cross platform pthread-like threading library
vm: APIs to manage per-interpreter and per-thread contexts
gc: garbage collection framework for managed heaps
compiler: extensible compiler framework
jitbuilder: WIP project to simplify bring up for a new JIT compiler
omrtrace: library for publishing trace events for monitoring/diagnostics
omrsigcompat: signal handling compatibility library
example: demonstration code to show how a language runtime might consume OMR components, also used for testing
fvtest: language independent test framework built on the example glue so that components can be tested outside of a language runtime, uses Google Test 1.7 framework
+ a few others

~800KLOC at this point, more components coming!

Page 15: Under the Hood of the Testarossa JIT Compiler

OMR components

port: platform abstraction (porting) library
thread: cross platform pthread-like threading library
vm: APIs to manage per-interpreter and per-thread contexts
gc: garbage collection framework for managed heaps
compiler: extensible compiler framework
jitbuilder: WIP project to simplify bring up for a new JIT compiler
omrtrace: library for publishing trace events for monitoring/diagnostics
omrsigcompat: signal handling compatibility library
example: demonstration code to show how a language runtime might consume OMR components, also used for testing
fvtest: language independent test framework built on the example glue so that components can be tested outside of a language runtime, uses Google Test 1.7 framework
+ a few others

~800KLOC at this point, more components coming!

IBM contributed 500KLOC of Testarossa, September 17, 2016

Page 16: Under the Hood of the Testarossa JIT Compiler

Rest of the talk is on Testarossa JIT

• TR JIT design principles
• How compilation works
• AOT compilation
• Wrap-up

Page 17: Under the Hood of the Testarossa JIT Compiler

JIT design principle #1

Be transparent

Users shouldn’t be aware of the JIT (except that the application runs a lot faster!)

Page 18: Under the Hood of the Testarossa JIT Compiler

JIT design principle #2

Let the interpreter handle the hard stuff

Optimize to target the top 75%-ish of cases with a “simple” solution

Page 19: Under the Hood of the Testarossa JIT Compiler

JIT design principle #3

Pay attention to the costs

Overheads can very easily trump benefits
Profile data occupies space
Consider what will happen at scale (10K+ classes)

Page 20: Under the Hood of the Testarossa JIT Compiler

JIT design principle #4

Use the right optimization tool for the job

Prove when you can prove easily
Guard when you can’t prove or can’t prove easily
Speculate appropriately for the bias

Page 21: Under the Hood of the Testarossa JIT Compiler

Also keep in mind

Compilers can do amazing things

Remember the “unreadable” list of highlight technologies from slide 5!
Many items on that list did not exist or had never been done in a production runtime system before Java

Page 22: Under the Hood of the Testarossa JIT Compiler

Also keep in mind

Compilers are not all-powerful

Can’t change algorithms
Engineering constraints can take away a lot of options

Page 23: Under the Hood of the Testarossa JIT Compiler

Also keep in mind

“JIT as optimizer for interpreter” is a reasonable starting point

But it’s not how either production Java runtime compiler evolved
IMO the interpreter should focus on getting it right without being really slow
The JIT compiler should make it fast but stay as simple as possible

Page 24: Under the Hood of the Testarossa JIT Compiler

So how does it work?

Page 25: Under the Hood of the Testarossa JIT Compiler

J9 JVM: methods start off interpreted

• Methods almost always start out running in the interpreter
  – Interpreter simulates the Java Virtual Machine
  – Uses a “program counter” (pc) to point at the current bytecode
  – Conceptually just a loop loading and simulating the bytecode at *pc

    do {
        switch (*pc) {
            …
            case BCdup:
                t = pop(); push(t); push(t); pc++; break;
        }
    } while (!finishedProgram());
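The loop above can be tried out with a toy stack machine. This is only a sketch under stated assumptions: the opcodes `PUSH1`, `DUP`, `ADD` and `HALT` and the class name `InterpreterSketch` are made up for illustration and are not J9's (or the JVM's) bytecode set.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy bytecode set; these opcode values are NOT the JVM's.
class InterpreterSketch {
    static final int PUSH1 = 0, DUP = 1, ADD = 2, HALT = 3;

    // Runs the program and returns the value left on top of the operand stack.
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                       // "program counter": index of the current bytecode
        while (true) {
            switch (code[pc]) {
                case PUSH1:
                    stack.push(1); pc++; break;
                case DUP: {               // same shape as the BCdup case on the slide
                    int t = stack.pop();
                    stack.push(t); stack.push(t); pc++; break;
                }
                case ADD: {
                    int b = stack.pop(), a = stack.pop();
                    stack.push(a + b); pc++; break;
                }
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("unknown bytecode " + code[pc]);
            }
        }
    }

    public static void main(String[] args) {
        int[] program = { PUSH1, DUP, ADD, DUP, ADD, HALT };
        System.out.println(run(program)); // prints 4
    }
}
```

A production interpreter replaces the switch with computed gotos and adds exception handling, profiling and invocation counting, which this sketch leaves out.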

Page 26: Under the Hood of the Testarossa JIT Compiler

OK, it’s more complicated than that

• Remember: the interpreter has to handle all the hard stuff!
• It is a switch loop
  – But uses computed gotos
  – Deals with exceptions
  – Deals with all the various things that can go wrong
  – Does some profiling
  – Counts method invocations to trigger JIT compilations
  – …
• More info in Dan Heidinga’s talk tomorrow on the J9 interpreter: Tuesday @ 12:30 in Continental Ballroom 1/2/3

Page 27: Under the Hood of the Testarossa JIT Compiler

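Interpreter helps JIT compiler do a good job

[Diagram: inside the J9 JVM, a thread runs the bytecode interpreter with its VM state, native state and Java stack. pc points into the method bytecodes (… 15: ificmpne 29 … 23: instanceof … 29: invokev <C.foo> …); sp points into the Java stack.]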

Page 28: Under the Hood of the Testarossa JIT Compiler

Interpreter collects profiles

[Diagram: the J9 JVM thread (bytecode interpreter, VM state with pc and sp, native state, Java stack, method bytecodes … 15: ificmpne 29 … 23: instanceof … 29: invokev <C.foo> …) now also has a thread profile buffer.]

Thread Profile Buffer
- Branch directions
- Actual classes
- Invocation targets

Per thread buffer: no mutex!
Buffer is an event trace: method, bytecode locator, data (e.g. receiver class)
Very easy to store and bump cursor into the buffer

Page 29: Under the Hood of the Testarossa JIT Compiler

Threads collect into buffer until full

[Diagram: J9 JVM with Thread 1 → Profile Buffer A, Thread 2 → Profile Buffer B, Thread 3 → Profile Buffer C, Thread 4 → Profile Buffer D.]

Page 30: Under the Hood of the Testarossa JIT Compiler

When buffer fills, put onto a queue

[Diagram: buffer A is now on the profile buffer queue. Thread 1 → Profile Buffer E, Thread 2 → Profile Buffer B, Thread 3 → Profile Buffer C, Thread 4 → Profile Buffer D.]

Only one queue, so needs a mutex
But only held when buffers fill and only to enqueue/dequeue
Impact tunable with buffer size
Trade-off: lag for profile data, footprint

Page 31: Under the Hood of the Testarossa JIT Compiler

Enqueue, allocate new buffer, keep going

[Diagram: buffers A and C are on the profile buffer queue. Thread 1 → Profile Buffer E, Thread 2 → Profile Buffer B, Thread 3 → Profile Buffer F, Thread 4 → Profile Buffer D.]

Queue decouples profile collection from profile aggregation
Pool of empty buffers reduces allocation stress

Page 32: Under the Hood of the Testarossa JIT Compiler

Another thread processes buffers

[Diagram: a buffer processing thread takes buffers from the profile buffer queue and feeds the aggregated profile data structure. Buffers A, C and E have been handed off. Thread 1 → Profile Buffer G, Thread 2 → Profile Buffer B, Thread 3 → Profile Buffer F, Thread 4 → Profile Buffer D.]

Iterate through trace, adding entries one by one to profile

Page 33: Under the Hood of the Testarossa JIT Compiler

JIT threads read & write aggregated profile

[Diagram: JIT threads 1 to N access the aggregated profile data structure, which the buffer processing thread fills from the profile buffer queue. Buffers E and C are still in flight. Thread 1 → Profile Buffer G, Thread 2 → Profile Buffer B, Thread 3 → Profile Buffer F, Thread 4 → Profile Buffer D.]

Aggregated profile also requires a mutex!
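The buffer hand-off described on the last few slides can be sketched as follows. All names here (`ProfileBufferSketch`, `record`, `drain`, `BUFFER_SIZE`) and the string event encoding are hypothetical; the sketch only shows the shape of the scheme: lock-free per-thread appends, one synchronized queue, and a separate aggregation step.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProfileBufferSketch {
    static final int BUFFER_SIZE = 4;  // hypothetical; trades profile lag against footprint

    // One trace buffer per application thread: appended to without locking.
    static final class Buffer {
        final String[] events = new String[BUFFER_SIZE];
        int cursor = 0;
    }

    // The single shared queue is the only synchronized hand-off point.
    static final BlockingQueue<Buffer> fullBuffers = new LinkedBlockingQueue<>();
    static final ThreadLocal<Buffer> current = ThreadLocal.withInitial(Buffer::new);

    // Called by the "interpreter": store the event and bump the cursor.
    static void record(String method, int bytecodeIndex, String data) {
        Buffer b = current.get();
        b.events[b.cursor++] = method + "@" + bytecodeIndex + ":" + data;
        if (b.cursor == BUFFER_SIZE) {   // full: enqueue it and continue with a fresh one
            fullBuffers.add(b);
            current.set(new Buffer());
        }
    }

    // Called by the buffer-processing thread: fold queued traces into counts.
    static Map<String, Integer> drain() {
        Map<String, Integer> aggregated = new HashMap<>();
        Buffer b;
        while ((b = fullBuffers.poll()) != null) {
            for (int i = 0; i < b.cursor; i++) {
                aggregated.merge(b.events[i], 1, Integer::sum);
            }
        }
        return aggregated;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            record("C.foo", 29, i % 4 == 0 ? "receiver=B" : "receiver=C");
        }
        System.out.println(drain());
    }
}
```

The sketch allocates a new buffer each time instead of drawing from a pool, and it returns a private map from `drain()`; a shared aggregated profile read by JIT threads would need its own lock, as the slide points out.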

Page 34: Under the Hood of the Testarossa JIT Compiler

How do those JIT threads get work?

1. Invocation count while interpreted used for initial compilation
   • When a method’s count reaches zero, trigger method compile
2. Sampling thread
   • Periodically (10ms or so) ask active threads to sample themselves
   • If a method catches enough samples over time: trigger method recompile
   • Samples in interpreted methods dramatically reduce invocation count
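A minimal sketch of the count-down trigger, assuming made-up numbers: `INITIAL_COUNT` and `SAMPLE_PENALTY` are hypothetical, since the slides do not give J9's real thresholds, and the class and method names are illustrative only.

```java
class CompileTriggerSketch {
    // Hypothetical numbers; the slides do not state J9's real thresholds.
    static final int INITIAL_COUNT = 1000;
    static final int SAMPLE_PENALTY = 250;

    static final class MethodInfo {
        final String name;
        int count = INITIAL_COUNT;   // counts DOWN toward zero while interpreted
        boolean queued = false;
        MethodInfo(String name) { this.name = name; }
    }

    // Interpreter path: called on every invocation of an interpreted method.
    static boolean onInvoke(MethodInfo m) {
        if (m.count > 0) m.count--;
        return maybeTrigger(m);
    }

    // Sampling path: a sample landing in an interpreted method cuts its count sharply.
    static boolean onSample(MethodInfo m) {
        m.count = Math.max(0, m.count - SAMPLE_PENALTY);
        return maybeTrigger(m);
    }

    // Returns true exactly once, when the method should be put on the compilation queue.
    private static boolean maybeTrigger(MethodInfo m) {
        if (m.count == 0 && !m.queued) {
            m.queued = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        MethodInfo m = new MethodInfo("C.foo");
        onSample(m);
        onSample(m);                          // two samples: 1000 -> 500
        int invocations = 1;
        while (!onInvoke(m)) invocations++;   // invoke until the trigger fires
        System.out.println(invocations);      // prints 500
    }
}
```

The `queued` flag stands in for the "may already be queued" check: once a method has been handed to the compiler, further invocations or samples do not trigger it again.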

Page 35: Under the Hood of the Testarossa JIT Compiler

Triggering a compilation

• “trigger” just means to enqueue a method on the compilation queue
  – Based on current conditions, select an optimization plan
  – May already be queued, may be queued with different plan
• Testarossa compilations are (mostly) asynchronous
  – Application thread continues running after enqueuing the method
• Testarossa can employ multiple compilation threads
  – Dynamically resized pool based on compilation load, # cores, configuration (e.g. how important is memory vs. ramp-up speed?)

Page 36: Under the Hood of the Testarossa JIT Compiler

What does a compilation thread do?

• Compiler thread dramatically oversimplified algorithm:

    while (!done) {
        method = getNextMethodFromQueue();
        if (sharedClassesCache->hasAOTCompiledMethod(method))
            … = loadAotCompiledMethod(method);
        else
            … = compile(method);   // may store AOT code to cache
        commitCompiledMethod( … );
    }

• You have questions, I know…

Page 37: Under the Hood of the Testarossa JIT Compiler

The real work: the compiler thread

• Compiler thread dramatically oversimplified algorithm:

    while (!done) {
        method = getNextMethodFromQueue();
        if (sharedClassesCache->hasAOTCompiledMethod(method))
            … = loadAotCompiledMethod(method);
        else
            … = compile(method);   // may store AOT code to cache
        commitCompiledMethod( … );
    }

• You have questions, I know…
  – Let’s start by explaining the compiler itself
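The loop can be made concrete with plain maps standing in for the JVM's structures. Everything here is an assumption for illustration: `aotCache` plays the shared classes cache, `committed` plays the table of installed method bodies, and strings play compiled code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

class CompileThreadSketch {
    // Stand-in for the shared classes cache: method name -> stored AOT code.
    static final Map<String, String> aotCache = new HashMap<>();
    // Stand-in for the JVM's table of committed method bodies.
    static final Map<String, String> committed = new HashMap<>();

    static String compile(String method) {
        String code = "jit:" + method;
        aotCache.putIfAbsent(method, "aot:" + method); // "may store AOT code to cache"
        return code;
    }

    // The loop from the slide, run until the queue is empty instead of until "done".
    static void drainQueue(Queue<String> compilationQueue) {
        String method;
        while ((method = compilationQueue.poll()) != null) {
            String body = aotCache.containsKey(method)
                    ? aotCache.get(method)      // load previously stored AOT code
                    : compile(method);          // otherwise compile now
            committed.put(method, body);
        }
    }

    public static void main(String[] args) {
        aotCache.put("A.cached", "aot:A.cached");
        Queue<String> q = new ArrayDeque<>();
        q.add("A.cached");
        q.add("B.fresh");
        drainQueue(q);
        System.out.println(committed.get("A.cached") + " " + committed.get("B.fresh"));
        // prints: aot:A.cached jit:B.fresh
    }
}
```

A real compilation thread would block waiting for work rather than return when the queue is empty, and loading AOT code involves the validation and relocation steps covered later in the talk.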

Page 38: Under the Hood of the Testarossa JIT Compiler

Testarossa Compilation Process

[Diagram: IL generation feeds the optimizer (analyses and optimizations), which feeds the code generators (x86, POWER, Z, ARM). Outputs: code, metadata, runtime, RT helpers. Optimization levels shown: cold, warm, hot, very hot, profiling, scorching, FSD, AOT. Profiler (profile manager) inputs: hardware counters, sampling thread, interpreter profile info, JIT profile info. Runtime environment/configuration: options, object model, memory, threading, tracing.]

Page 39: Under the Hood of the Testarossa JIT Compiler

First step: IL Generation

Convert the method’s bytecodes to Testarossa’s Intermediate Language (IL)

Have slides but not enough time :(
Come talk to me if you’re interested!

Page 40: Under the Hood of the Testarossa JIT Compiler

Second Step: Make the IL Better

• IL generator focuses on correctness
• Strive to avoid complexity for performance
  – *striving* not always successful
• Rely on the optimizer to make it fast

Page 41: Under the Hood of the Testarossa JIT Compiler

Testarossa Optimizations

• About 70 basic optimizations
• Three high level categories:
  1. Traditional compiler optimizations requiring little adaptation for Java
     e.g. reaching definitions, block ordering, expression simplification, …
  2. Traditional compiler optimizations with Java adaptation
     e.g. inlining, partial redundancy elimination, loop versioning, auto parallelization (SIMD, GPU), …
  3. Optimizations developed for Java
     e.g. escape analysis, monitor coarsening, async check insertion, …

Page 42: Under the Hood of the Testarossa JIT Compiler

Optimization Strategies

• Strategy is just a sequence of individual optimizations
  – Contain groups which can be repeated or looped
  – Opts can be conditional on earlier opts finding/creating opportunities
• 6 strategies with increasing compilation cost & expected payback
  1. NoOpt: not used by default
  2. Cold: initial compile during startup
  3. Warm: initial compile after startup or upgrade
  4. Hot: methods consuming > ~1% of CPU
  5. Very Hot with Profiling: collect profile before a scorching compile
  6. Scorching: methods consuming > ~12.5% of CPU

Page 43: Under the Hood of the Testarossa JIT Compiler

Third step: code generation

• Testarossa has 4 main code generators:
  – X86 (32- and 64-bit)
  – POWER (32- and 64-bit, BE and LE)
  – Z (IBM mainframe) (31-bit and 64-bit)
  – ARM 32-bit
• Responsible for converting Testarossa IL into native instructions
  – Generate fast instruction sequences for current processor
  – Efficient assignment of registers
  – Layout of native stack frame
  – Other very detailed things based on intricate workings of processors

Page 44: Under the Hood of the Testarossa JIT Compiler

AOT compilation for Java

Such a simple idea:

Store JIT compiled code, then
“just” load into another JVM

Page 45: Under the Hood of the Testarossa JIT Compiler

But it’s not so simple

Compiled code is for a method, and
methods come from classes…

Page 46: Under the Hood of the Testarossa JIT Compiler

But what is a “class”?

[Diagram: class hierarchy with classes A, B, C and interfaces I1, I2, I3.]

A implements I1, I2 { … }
B extends A { … }
C extends B implements I3 { … }

Page 47: Under the Hood of the Testarossa JIT Compiler

Inside a JVM

[Diagram: resolved classes A, B, C with interfaces I1, I2, I3; class C contains class B, which contains class A.]

Compiler and applications work on objects of resolved classes
e.g. C objects: embed a B, which embeds an A
And C implements I3 and I1, I2

Page 48: Under the Hood of the Testarossa JIT Compiler

Outside a JVM: sea of class files

C extends a class called “B” and implements an interface called “I3”
B extends a class called “A”
A implements interfaces called “I1” and “I2”

src/directory1/: A.class, I1.class, I2.class
src/directory2/: A.class, I1.class, I2.class
src/directory3/: B.class, C.class
src/directory4/: C.class, I3.class

Page 49: Under the Hood of the Testarossa JIT Compiler

“Class” identity a very complicated notion

• Class files can change
• Classpath can change
• Class files can be added or removed

Page 50: Under the Hood of the Testarossa JIT Compiler

And it even gets worse (!)

• Class files can change
• Classpath can change
• Class files can be added or removed
• Class loader object used to load the class can change
  – Ever heard of an application class loader object outside of a JVM?
  – Class loader objects (like other objects) don’t exist outside the JVM
  – Serialization doesn’t help: what to deserialize to replace what object?
• Two class loaders can even load the exact same class files to create two unique classes in a single JVM
• All perfectly valid scenarios under the JVM specification
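The "two class loaders, same class file, two classes" point is easy to reproduce. The sketch below hand-assembles the bytes of a minimal class file (a memberless `public class Widget`, a name chosen only for this example) and defines it through two loader instances; `BytesLoader` is a hypothetical helper, while `ClassLoader.defineClass` is the standard API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassIdentitySketch {
    // Builds the bytes of a minimal class file: "public class Widget" with no members.
    static byte[] widgetClassFile() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);          // magic
            out.writeShort(0);                 // minor version
            out.writeShort(52);                // major version (Java 8)
            out.writeShort(5);                 // constant pool count (entries 1..4)
            out.writeByte(1); out.writeUTF("Widget");            // #1 Utf8
            out.writeByte(7); out.writeShort(1);                 // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #3 Utf8
            out.writeByte(7); out.writeShort(3);                 // #4 Class -> #3
            out.writeShort(0x0021);            // access flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                 // this_class
            out.writeShort(4);                 // super_class
            out.writeShort(0);                 // interfaces count
            out.writeShort(0);                 // fields count
            out.writeShort(0);                 // methods count
            out.writeShort(0);                 // attributes count
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A loader that defines a class directly from the bytes it is handed.
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] classFile) {
            return defineClass(name, classFile, 0, classFile.length);
        }
    }

    public static void main(String[] args) {
        byte[] classFile = widgetClassFile();
        Class<?> first = new BytesLoader().define("Widget", classFile);
        Class<?> second = new BytesLoader().define("Widget", classFile);
        System.out.println(first.getName().equals(second.getName())); // true: same name
        System.out.println(first == second);                          // false: distinct classes
    }
}
```

Both `Class` objects come from identical bytes and share a name, yet the JVM treats them as unrelated types, which is why stored compiled code cannot identify a class by name or by file contents alone.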

Page 51: Under the Hood of the Testarossa JIT Compiler

Seems grim, what can we do?

Page 52: Under the Hood of the Testarossa JIT Compiler

First cut: treat everything as unresolved

• We did it this way for a long time (embedded space and for WebSphere Real Time)
  – AOT code stored alongside binary loadable version of class files called JXEs (kind of like a jar file)
• Class references aren’t the only problem though
  – Compiled code also directly references addresses in the JVM
  – e.g. Pointers to constant pools, pointers to “ROM” parts of classes (see Dan Heidinga’s talk!)
  – e.g. Pointers to helper functions in JIT runtime
• Code generator also builds relocation records alongside the code
  – e.g. at code offset 0x208 is the address of the compiled method’s class’s constant pool
  – e.g. at code offset 0x4C3 is the 4 byte relative address of JIT helper jitNewObject()
• At class load time, process relocations to bind code into current JVM process
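A relocation record is essentially "offset plus kind". The sketch below is a simplified illustration, not Testarossa's record format: the `Kind` values, the `Relocation` class and the use of 4-byte absolute values for both kinds are assumptions (the slide's helper example is a relative address).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

class RelocationSketch {
    enum Kind { CONSTANT_POOL_ADDRESS, HELPER_ADDRESS }  // hypothetical record kinds

    // "At code offset X there is a 4-byte slot that needs an address of kind K."
    static final class Relocation {
        final int codeOffset;
        final Kind kind;
        Relocation(int codeOffset, Kind kind) {
            this.codeOffset = codeOffset;
            this.kind = kind;
        }
    }

    // Load-time step: patch each recorded slot with the value valid in THIS process.
    static void relocate(byte[] code, List<Relocation> relocations,
                         int constantPoolAddress, int helperAddress) {
        ByteBuffer buf = ByteBuffer.wrap(code);   // big-endian by default
        for (Relocation r : relocations) {
            int value = (r.kind == Kind.CONSTANT_POOL_ADDRESS) ? constantPoolAddress
                                                               : helperAddress;
            buf.putInt(r.codeOffset, value);
        }
    }

    public static void main(String[] args) {
        byte[] code = new byte[12];   // stored code with zeroed address slots
        List<Relocation> relocs = Arrays.asList(
                new Relocation(2, Kind.CONSTANT_POOL_ADDRESS),
                new Relocation(8, Kind.HELPER_ADDRESS));
        relocate(code, relocs, 0x10203040, 0x0A0B0C0D);
        System.out.println(Arrays.toString(code));
        // prints [0, 0, 16, 32, 48, 64, 0, 0, 10, 11, 12, 13]
    }
}
```

The stored code itself never changes between runs; only the slots named by the records are rewritten, so the same bytes can be bound into any process that can supply the addresses.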

Page 53: Under the Hood of the Testarossa JIT Compiler

Next goal: use AOT to accelerate startup

• Our shared classes cache (SCC) debuted in Java 5.0
  – Shared memory region mapped into every JVM process
  – Accelerates start-up by speeding up class loading
  – By itself, accelerated app server start-up by 20-30%
• Also created an opportunity to use AOT code “dynamically”
  – SCC handles part of problem: “is this the same class I had before”
  – So: AOT compile in first JVM run, store into SCC, load in other JVMs
• For Java 6, we revamped our AOT compilation story
  – Made some improvements in code quality
  – Provide another roughly 20% start-up improvement

Page 54: Under the Hood of the Testarossa JIT Compiler

Simplified class loading, no shared cache

class B { … }; class C extends B { … };

[Diagram: JVM Process A loads B.class and C.class and holds a B ROMClass, a C ROMClass, a B RAMClass and a C RAMClass.]

Page 55: Under the Hood of the Testarossa JIT Compiler

Simplified class loading, no shared cache

class B { … }; class C extends B { … };

[Diagram: JVM Process A and JVM Process B each load B.class and C.class, and each holds its own B ROMClass, C ROMClass, B RAMClass and C RAMClass.]

Page 56: Under the Hood of the Testarossa JIT Compiler

Simplified class loading with shared cache

class B { … }; class C extends B { … };

[Diagram: JVM Process A loads B.class and C.class. The B ROMClass and C ROMClass live in the shared cache; the B RAMClass and C RAMClass stay in the process.]

Page 57: Under the Hood of the Testarossa JIT Compiler

Simplified class loading with shared cache

class B { … }; class C extends B { … };

[Diagram: JVM Process A and JVM Process B each hold their own B RAMClass and C RAMClass. The shared cache holding the B ROMClass and C ROMClass is memory mapped into both processes.]

Page 58: Under the Hood of the Testarossa JIT Compiler

How did we make AOT better with the shared class cache?

Page 59: Under the Hood of the Testarossa JIT Compiler

Dynamic AOT to accelerate start-up

• Start-up scenario: usually running the same code over and over
  – Anything you learn in first run *probably* applies in second run too
• Some optimizations are clearly ok for AOT:
  – e.g. Block ordering uses block frequencies to rearrange code nicely
  – Different profile in second run? Ok, it runs a bit more slowly
  – But usually, the profile is incredibly similar
• Can also rely on some tricks:
  – Any information local to this method or this class (fields, methods)
  – Shared cache gave us a way to identify and check other methods

Page 60: Under the Hood of the Testarossa JIT Compiler

Inlining for AOT methods

• Some direct calls can just be inlined
  – Direct call to, say, this class’s constructor
• Inline more direct calls using virtual guard infrastructure
  – AOT compile optimistically generates guard as a NOP
  – AOT load evaluates the guard at AOT load time (via relocation record)
  – Turn NOP into a jump to an unresolved call if relocation record fails
• Shared classes cache helps to inline virtual calls from “this”
  – Can reason about the vtable of the class of the compiled method

Page 61: Under the Hood of the Testarossa JIT Compiler

Using the vtable for virtual “this” calls

class B { public void foo() {…} }
class C extends B { void bar() { this.foo(); } }

[Diagram: JVM Process 1. Class C’s resolved vtable has an entry for the resolved “B.foo()”; its J9Method points to the ROMMethod for foo() from B.class.]

Page 62: Under the Hood of the Testarossa JIT Compiler

No SCC: are B.foo and B’.foo same? No idea!

class B { public void foo() {…} }
class C extends B { void bar() { this.foo(); } }

[Diagram: JVM Process 1 as before: class C, resolved C vtable, resolved “B.foo()”, J9Method, ROMMethod for foo() from B.class. JVM Process 2: class C’, resolved C’ vtable, resolved “B’.foo()”, J9Method, and a separate ROMMethod for foo() from B’.class.]

Page 63: Under the Hood of the Testarossa JIT Compiler

SCC: B.foo, B’.foo same? Can answer!

class B { public void foo() {…} }
class C extends B { void bar() { this.foo(); } }

[Diagram: JVM Process 1 (class C, resolved C vtable, resolved “B.foo()”) and JVM Process 2 (class C’, resolved C’ vtable, resolved “B’.foo()”) each have their own J9Method, but both point to the same ROMMethod for foo() from B.class in the SCC, at the same offset.]

Page 64: Under the Hood of the Testarossa JIT Compiler

Only needs to be same “enough”

• ROMMethod includes the bytecodes
  – If class’s vtable has a J9Method with the right ROMMethod, then the right bytecodes will be inlined
  – Still need to be careful about other code aspects e.g. field offsets
  – But you know you got the same method implementation
• Just like the JIT:
  – Need to check to make sure there isn’t another possible target
  – Need to register runtime assumptions against future class loads
• Still wrap the inlined code in a guard resolved at AOT load time
  – If not the right or only target: back off to a virtual invocation

Page 65: Under the Hood of the Testarossa JIT Compiler

But we needed something stronger

• Profile guard: C.method profiled as most common target

    if (o.clazz == <common receiver class C address>)
        { /* inlined C.method() */ }
    else
        o.method();

• C needs to be a resolved class
• Typically used for interface invokes
  – Not as straight-forward as vtable
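Written by hand at the source level, a profile guard looks like the sketch below. The `Shape`, `Square` and `Rect` types are invented for the example; a JIT performs this transformation on its IL and compares class pointers, where this sketch compares `Class` objects.

```java
class ProfileGuardSketch {
    interface Shape { int area(); }

    static final class Square implements Shape {
        final int side;
        Square(int side) { this.side = side; }
        public int area() { return side * side; }
    }

    static final class Rect implements Shape {
        final int w, h;
        Rect(int w, int h) { this.w = w; this.h = h; }
        public int area() { return w * h; }
    }

    // What a compiler might emit for "s.area()" when profiling says Square is the usual receiver.
    static int guardedArea(Shape s) {
        if (s.getClass() == Square.class) {   // guard: a single class comparison
            Square q = (Square) s;
            return q.side * q.side;           // inlined body of Square.area()
        } else {
            return s.area();                  // fall back to the normal interface call
        }
    }

    public static void main(String[] args) {
        System.out.println(guardedArea(new Square(3)) + " " + guardedArea(new Rect(2, 5)));
        // prints: 9 10
    }
}
```

The guard is only meaningful if the compared class is the resolved `Square` in the current JVM, which is the requirement that the next slide's validation scheme addresses for stored code.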

Page 66: Under the Hood of the Testarossa JIT Compiler

We implemented “class chains”

• List of super classes and implemented interfaces for a class
  – Every one must have a ROMClass in the shared cache
  – AOT compiles record “validation relocation” for every referenced resolved class (offset of a class chain in the SCC)
  – AOT loads walk class chains in parallel with resolved classes in current JVM
  – Anything not right: bail and requeue method as JIT compile
• Still one challenge though:
  – How to look up the resolved class pointer for “some class”?
  – Need a class loader to do that!
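The validation walk can be imitated with reflection. This is a loose analogy rather than the real mechanism: class names stand in for the ROMClass offsets a real class chain would hold, and `chainOf` and `validate` are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

class ClassChainSketch {
    // Records the class, its superclasses and the interfaces each one declares.
    // Names stand in for the ROMClass offsets a real class chain would hold.
    static List<String> chainOf(Class<?> clazz) {
        List<String> chain = new ArrayList<>();
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            chain.add(c.getName());
            for (Class<?> i : c.getInterfaces()) {
                chain.add(i.getName());
            }
        }
        return chain;
    }

    // "AOT load": re-walk the resolved class in this JVM and compare with the stored chain.
    static boolean validate(List<String> storedChain, Class<?> resolvedNow) {
        return storedChain.equals(chainOf(resolvedNow));
    }

    interface I1 {}
    interface I2 {}
    interface I3 {}
    static class A implements I1, I2 {}
    static class B extends A {}
    static class C extends B implements I3 {}

    public static void main(String[] args) {
        List<String> stored = chainOf(C.class);          // recorded at "AOT compile" time
        System.out.println(stored.size());               // 7: C, I3, B, A, I1, I2, Object
        System.out.println(validate(stored, C.class));   // true
        System.out.println(validate(stored, B.class));   // false: the chains differ
    }
}
```

Any mismatch anywhere along the chain fails validation, matching the slide's "anything not right: bail" rule; the sketch sidesteps the class loader question by being handed the `Class` object directly.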

Page 67: Under the Hood of the Testarossa JIT Compiler

Exercise for the audience

How can you find a class loader object in this JVM that corresponds to the “same” class loader object from another JVM?

Page 68: Under the Hood of the Testarossa JIT Compiler

Exercise for the audience

How can you find a class loader object in this JVM that corresponds to the “same” class loader object from another JVM?

I don’t have time today to tell you how we did it :(

Come talk to me if you’re really interested!

Page 69: Under the Hood of the Testarossa JIT Compiler

Where do we go with AOT?

• Modularity work in JDK9 opening up interesting opportunities
• Possibility to AOT compile entire modules
• Sounds awesome but not a straight-forward win:
  – Typically don’t know much about execution profile at load time
  – AOT code is generally much larger than bytecodes (10X footprint)
  – Generality/flexibility of JDK libraries could hurt us if not careful
    • Locales, etc. not used in all runs but maybe in some run
• Some interesting new possible optimization opportunities
  – But remember the JIT design principles!

Page 70: Under the Hood of the Testarossa JIT Compiler

Wrap Up

• IBM Runtimes are going open source
  – 800KLOC already contributed to Eclipse OMR project for all runtimes
  – Working on the remainder in and around Java 9 development
  – You’re welcome to join us at Eclipse OMR and, later, Open J9!
  – Any feedback welcome!
• Testarossa is a high performance, modular compiler technology
  – 500KLOC now open sourced at Eclipse OMR
  – Provides steady and significant performance uplift (through effort!)
  – Around 70 optimizations with code generators for 4 hardware platforms
  – Deep dove into Testarossa’s AOT compilation technology

Page 71: Under the Hood of the Testarossa JIT Compiler

Thank You!

• Mark Stoodley @mstoodle
• Eclipse OMR: www.eclipse.org/omr, www.github.com/eclipse/omr
• Other J9 developer talks at JavaOne
  – Dan Heidinga on Tuesday at 2:30 in Continental Ballroom 1/2/3
  – Charlie Gracie on Wednesday at 10am in Golden Gate 2/3
• Visit me and other J9 devs at the IBM Booth
  – I’ll be there tomorrow morning at 9:30am
• I will also be at the Eclipse booth Tuesday at about 4pm - 5:30pm

Page 72: Under the Hood of the Testarossa JIT Compiler

Legal Notice

IBM and the IBM logo are trademarks or registered trademarks of IBM Corporation, in the United States, other countries or both.

Java and all Java-based marks, among others, are trademarks or registered trademarks of Oracle in the United States, other countries or both.

Other company, product and service names may be trademarks or service marks of others.

THE INFORMATION DISCUSSED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION, IT IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, SUCH INFORMATION. ANY INFORMATION CONCERNING IBM'S PRODUCT PLANS OR STRATEGY IS SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.