26

Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster
Page 2: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Compiling Scala Faster with GraalVM

Christian Wimmer, Vojin JovanovicVM Research Group, Oracle Labs

Page 3: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Safe Harbor Statement

The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

3

Page 4: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 4

GraalVM: Run Programs Faster Anywhere

More information at:

https://www.graalvm.org

Page 5: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Graal: One Compiler for Managed Languages

• Written in Java – Eases development and maintenance

• Modular architecture– Configurable compiler phases

– Compiler-VM separation: snippets, provider interfaces

• Designed for speculative optimizations and deoptimization–Metadata for deoptimization is propagated through all optimization phases

• Designed for exact garbage collection– Read/write barriers, pointer maps for garbage collector

• Aggressive high-level optimizations

5

Page 6: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 6

Graal: Partial Escape Analysis (1)

public static Car getCached(int hp, String name) {

Car car = new Car(hp, name, null);

Car cacheEntry = null;

for (int i = 0; i < cache.length; i++) {

if (car.hp == cache[i].hp &&

car.name == cache[i].name) {

cacheEntry = cache[i];

break;

}

}

if (cacheEntry != null) {

return cacheEntry;

} else {

addToCache(car);

return car;

}

}

Page 7: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 7

Graal: Partial Escape Analysis (2)

public static Car getCached(int hp, String name) {

Car cacheEntry = null;

for (int i = 0; i < cache.length; i++) {

if (hp == cache[i].hp &&

name == cache[i].name) {

cacheEntry = cache[i];

break;

}

}

if (cacheEntry != null) {

return cacheEntry;

} else {

Car car = new Car(hp, name, null);

addToCache(car);

return car;

}

}

new Car(...) escapes at:

— addToCache(car);

— return car;

Might be a very unlikely path

No allocation in frequent path

Page 8: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 8

Graal: Simulation Based Path Duplication (1)

def f(a: Int, b: Int, x: Array[Int]) = {var p: Int = _if (a > b) {

p = a } else {

p = 2 }x.length / p

}

Slow path

Fast path

Expensive operation

p = 2 in the false branch

Strength reduction:if we know p == 2 and x.length > 0x.length / 2 x.length >> 1

Page 9: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 9

Graal: Simulation Based Path Duplication (2)

def f(a: Int, b: Int, x: Array[Int]) = if (a > b) {

x.length / a; } else {

x.length >> 1; }

• Cost-benefit analysis for duplication based on a cost model– Benefit: (Latency(Div) - Latency(Shift)) * Probability = 31 * 0.9 = 27.9

– Cost: 5 instructions• Add 1 Additional Return (+ 4 Instructions) + 1 Additional Shift + 1 Additional Read

• Subtract 1 Jump from branch

One order of magnitude faster on Intel CPUs

Page 10: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 10

Benchmarks: The Scala 2.12.6 Compiler

• Benchmarks from the Scala compiler benchmark suite

1 1 11.11 1.16 1.13

1.47 1.42 1.42

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

scalap vector re2s

Speedup

HotSpot

GraalVM CE

GraalVM EE

Page 11: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 11

Benchmarks: Building Scala Projects with SBT

• sbt compile step with a warmed up VM

1 1 11.07 1.12 1.12

1.261.35 1.35

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

Shapeless Scalac 2.13 Akka

Speedup

HotSpot

GraalVM CE

GraalVM EE

Page 12: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 12

Using GraalVM for Scala

• GraalVM comes in two versions– GraalVM CE can be used freely by anyone

– GraalVM EE free for evaluation and non-production usage

• GraalVM EE can be downloaded from

https://www.graalvm.org/downloads/

• To enable GraalVM for Scala compilation:

sbt --java-home <path-to-graal-vm>

Page 13: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 13

Managed Runtimes: Slow Startup and High Footprint

• Slow startup and high footprint comes from– Class loading

– Bytecode interpretation or baseline compilation

– Just-in-time compilation

Program Time Instructions Memory

“Hello, World!” in C 0.005s 154,127 450 KByte

“Hello, World!” in Scala 0.109s 162,673,275 25,000 KByte

“Hello, World!” in JS on the JVM

1.268s 3,272,118,178 120,000 KByte

Page 14: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: Execution Model

14

Ahead-of-Time

Compilation

Points-To Analysis

Substrate VM

Polyglot JVM Program

Language Runtimes

Reachable methods, fields, and classes

Machine Code

Initial Heap

All classes from the user application,

all language runtimes,and Substrate VM

Application or shared library running without dependency on JDK and without Java class loading

DWARF Info

ELF / MachO Binary

Page 15: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 15

Native Image: Startup

• Bytecode compiled with Graal in AOT mode– produced assembly contains no profiling code, only application logic

• Startup performance comparable to C

Program Time Instructions Memory

“Hello, World!” in C 0.005s 154,127 450 KByte

“Hello, World!” in Scala(Native Image)

0.006s 232,122 780 KByte

“Hello, World!” in JS (Native Image)

0.028s 520,000 3,900 KByte

Page 16: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: Profile-Guided Optimizations (PGO)

• Graal compiler is built ground-up with profiles in mind– Collecting profiles is essential for performance of native images

• PGO requires running relevant workloads before building an image

16

native-image --pgo-instrument

Instrumented Binary

native-image --pgo Optimized Binary

Profiles (.iprof)

Relevant Workloads

Page 17: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: Performance of Cold Scala Compilation

17

• Benchmarks from the Scala compiler benchmark suite

1 1 1

6.72

13.67

8.98

0

2

4

6

8

10

12

14

16

scalap vector re2s

Speedup

HotSpot

Native PGO

Page 18: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: Performance of Hot Scala Compilation

18

1 1 1

1.161.24 1.24

1.19 1.22 1.25

0

0.2

0.4

0.6

0.8

1

1.2

1.4

scalap vector re2s

Speedup

HotSpot

HotSpot Comp. Point.

Native PGO

• Benchmarks from the Scala compiler benchmark suite

Page 19: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: Limitations

• Compiled programs must be known ahead of time– Dynamically loaded classes must be known during image build

– invokedynamic will not work in general

– No bytecode generation at runtime

• Currently not implemented– Parts of the JDK, e.g., Swing and AWT

19

Page 20: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

What about Scala Macros?

• Scala macros use dynamic class loading– Dynamically loaded by scalac

– Not always known at image build time

• Current solution– Build a custom native executable for a project with all macros included

20

Page 21: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Native Image: C Interoperability for Java

#include <time.h>@CContext(PosixDirectives.class)

#define CLOCK_MONOTONIC 1

struct timespec {__time_t tv_sec;__syscall_slong_t tv_nsec;

};

int* pint;

int** ppint;

@CConstant static native int CLOCK_MONOTONIC();

@CPointerTo(nameOfCType="int") interface CIntPointer extends PointerBase {int read();void write(int value);

}

@CPointerTo(CIntPointer.class) interface CIntPointerPointer ...

-lrt@CLibrary("rt")

@CStruct interface timespec extends PointerBase {@CField long tv_sec();@CField long tv_nsec();

}

int clock_gettime(clockid_t __clock_id, struct timespec *__tp)@CFunction static native int clock_gettime(int clock_id, timespec tp);

21

static long nanoTime() {timespec tp = StackValue.get(SizeOf.get(timespec.class));clock_gettime(CLOCK_MONOTONIC(), tp);return tp.tv_sec() * 1_000_000_000L + tp.tv_nsec();

}

Implementation of System.nanoTime() using with C interop:

Page 22: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Scala Native via Native Image

• Scala Native provides an idiomatic interface

• Scala Native can be implemented via GraalVM– Translate Scala Native intrinsics into GraalVM intrinsics

– Can be implemented as a simple compiler plugin

• All JVM libraries would become available with GraalVM– No need to re-write parts of the JVM ecosystem

22

Page 23: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Performance of Scala Native vs. Native Image

• Scala Native Benchmarks running with the immix GC

23

1 1 1 1 1 10.96

1.21

0.971.09

0.810.96

2.02

1.34

1.13

1.37

1.191.31

0

0.5

1

1.5

2

2.5

DeltaBlue Tracer GCBench Json Bounce Richards

Speedup

Scala Native 0.3.7

Native Image

Native Image PGO

Page 24: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

GraalVM: Run Programs Faster Anywhere

• Scala compilation with GraalVM 1.3x-1.5x faster

• Native images of Scala programs– Fast startup and low footprint

– Faster than HotSpot JIT compiled code

• Try it today– https://www.graalvm.org

• Demos available on GitHub– https://github.com/graalvm/graalvm-demos/scala-days-2018

• Blog article explaining the demo and the results– https://medium.com/graalvm

24

Page 25: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | 25

Page 26: Compiling Scala Faster with GraalVM - Lightbend...•Scala compilation with GraalVM 1.3x-1.5x faster •Native images of Scala programs –Fast startup and low footprint –Faster