1
What are we actually measuring?
Tagir Valeev
Institute of Informatics Systems, SB RAS
2
Disclaimer
I may be lying!
4
How much slower is the Stream API?
    static long sumTwice(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++)
            sum += i * 2;
        return sum;
    }

    static long sumTwiceStream(int max) {
        return IntStream.rangeClosed(1, max)
                        .mapToLong(x -> x * 2).sum();
    }
5
StackOverflow asks
http://stackoverflow.com/q/31761271/4856258 (deleted by the author)
In Java 8 Predicate<Integer> is faster than IntPredicate
In Java 8 it is suggested that we should use IntPredicate rather than Predicate<Integer> and same for other premitive types as former one reduces the overhead related to autoboxing but when i run the following code. I get results shockingly opposite as IntPredicate is 30-50 times slower than Predicate on my system.
6
StackOverflow asks
    Long start = System.currentTimeMillis();
    IntPredicate evenNumPredicate = (int i) -> i % 2 == 0;
    evenNumPredicate.test(1000);
    System.out.println(System.currentTimeMillis()-start);

    start = System.currentTimeMillis();
    Predicate<Integer> evenNumPredicate1 = (Integer i) -> i % 2 == 0;
    evenNumPredicate1.test(1000);
    System.out.println(System.currentTimeMillis()-start);
7
Community reaction
This benchmark is shockingly bad. It could be improved in several significant ways and still be a bad benchmark. – Marko Topolnik
8
The naive approach
    public static void main(String[] args) {
        long startSimple = System.nanoTime();
        long resultSimple = sumTwice(10_000_000);
        long endSimple = System.nanoTime();
        System.out.printf("Simple: %d; time=%8.3fms%n", resultSimple,
                (endSimple-startSimple)/1_000_000.0);

        long startStream = System.nanoTime();
        long resultStream = sumTwiceStream(10_000_000);
        long endStream = System.nanoTime();
        System.out.printf("Stream: %d; time=%8.3fms%n", resultStream,
                (endStream-startStream)/1_000_000.0);
    }
9
Results of the naive approach
Simple: 100000010000000; time=   8.286ms
Stream: 100000010000000; time=  57.774ms
-Xint
Simple: 100000010000000; time= 136.347ms
Stream: 100000010000000; time=1647.296ms
-XX:-UseOnStackReplacement
Simple: 100000010000000; time= 150.321ms
Stream: 100000010000000; time= 374.367ms
-XX:-UseOnStackReplacement -XX:-UseLoopCounter
Simple: 100000010000000; time= 136.584ms
Stream: 100000010000000; time= 364.105ms
11
The interpreter and the JIT compiler
13
On-Stack Replacement (OSR)
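OSR lets the JVM switch a method that is already running from interpreted to compiled code in the middle of a hot loop, instead of waiting for the next invocation. The sketch below is one way to observe it; the class name OsrDemo is hypothetical, and whether OSR events appear (and at which tier) depends on the JVM version and flags.

```java
// Hypothetical demo class (not from the talk): the method is called once,
// so only its loop can become hot.
class OsrDemo {
    // Same loop as sumTwice in the slides.
    static long sumTwice(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++) {
            sum += i * 2;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A single invocation: the invocation counter stays at 1, so any
        // compilation of sumTwice has to be triggered by the loop back-edge
        // counter, which is the on-stack replacement case.
        System.out.println(sumTwice(10_000_000));
    }
}
```

Running it with `java -XX:+PrintCompilation OsrDemo` and looking for lines marked with `%` should show the OSR compilations; adding `-XX:-UseOnStackReplacement` is expected to make those lines disappear.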
16
Stream operations
    return IntStream
        .rangeClosed(1, max)   // stream creation
        .mapToLong(x -> x*2)   // intermediate operation
        .sum();                // terminal operation
17
Stream call hierarchy
LongPipeline::sum
  LongPipeline::reduce
    AbstractPipeline::evaluate
      ReduceOp::evaluateSequential
        AbstractPipeline::wrapAndCopyInto
          AbstractPipeline::copyInto
            Spliterator.OfInt::forEachRemaining
              RangeIntSpliterator::forEachRemaining (loop)
                IntPipeline$5$1::accept (mapToLong)
                  generated λ-class
                    λ-function x -> x*2
                  ReducingSink::accept (reduce)
                    generated λ-class
                      Long::sum
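The lower part of this hierarchy can be modelled in a few lines of plain Java. This is a simplified sketch, not the JDK implementation: the class SinkChainSketch and its members are hypothetical names, and the real pipeline adds several more layers (the upper frames in the tree).

```java
import java.util.function.IntConsumer;
import java.util.function.IntToLongFunction;
import java.util.function.LongBinaryOperator;

// Hypothetical, simplified model of the sink chain; not JDK code.
class SinkChainSketch {
    // Terminal stage: folds incoming values with a binary operator
    // (plays the role of ReducingSink::accept).
    static final class ReducingSink {
        private final LongBinaryOperator op;
        private long state;

        ReducingSink(long identity, LongBinaryOperator op) {
            this.state = identity;
            this.op = op;
        }

        void accept(long value) {
            state = op.applyAsLong(state, value);
        }

        long get() {
            return state;
        }
    }

    static long sumTwiceSinks(int max) {
        ReducingSink reduce = new ReducingSink(0L, Long::sum);
        IntToLongFunction mapper = x -> x * 2;
        // Intermediate stage: applies the mapper and pushes the result
        // downstream (plays the role of IntPipeline$5$1::accept).
        IntConsumer mapSink = x -> reduce.accept(mapper.applyAsLong(x));
        // Source stage: the loop (plays the role of
        // RangeIntSpliterator::forEachRemaining).
        for (int i = 1; i <= max; i++) {
            mapSink.accept(i);
        }
        return reduce.get();
    }

    public static void main(String[] args) {
        System.out.println(sumTwiceSinks(10_000_000));
    }
}
```

Each element crosses several interface calls per iteration, so the stream version can only match the plain loop when the JIT inlines the whole chain.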
18
Stream call hierarchy
[The same call tree as on the previous slide, overlaid with the caption: MAGIC]
21
JMH
• Instructions and examples are here: http://openjdk.java.net/projects/code-tools/jmh/
• $ mvn archetype:generate \
      -DinteractiveMode=false \
      -DarchetypeGroupId=org.openjdk.jmh \
      -DarchetypeArtifactId=jmh-java-benchmark-archetype \
      -DgroupId=org.sample -DartifactId=test -Dversion=1.0
• $ mvn clean install
• pom.xml: <javac.target>1.6</javac.target>
22
JMH benchmark
    public class MyBenchmark {
        @Benchmark
        public long stream() {
            return sumTwiceStream(10_000_000);
        }

        @Benchmark
        public long simple() {
            return sumTwice(10_000_000);
        }
        ...
    }
23
JMH benchmark
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Fork(5)
    @State(Scope.Benchmark)
    public class MyBenchmark {
        ...
    }
[jmhtest/target]$ java -jar benchmark.jar >out.txt
24
JMH benchmark: results
# JMH 1.11.1 (released 7 days ago)
# VM version: JDK 1.8.0_60, VM 25.60-b23
...
Benchmark            Mode  Cnt  Score    Error  Units
MyBenchmark.simple   avgt   50  4.535 ±  0.009  ms/op
MyBenchmark.stream   avgt   50  4.123 ±  0.009  ms/op
____________________________________________________________
Simple: 100000010000000; time=   8.286ms
Stream: 100000010000000; time=  57.774ms
26
JMH: adding a parameter
    @Param({"100000", "1000000", "10000000"})
    private int n;

    @Benchmark
    public long stream() {
        return sumTwiceStream(n);
    }

    @Benchmark
    public long simple() {
        return sumTwice(n);
    }
27
JMH with a parameter: results
Benchmark                 (n)  Score    Error  Units
MyBenchmark.simple     100000  0.048 ±  0.001  ms/op
MyBenchmark.simple    1000000  0.442 ±  0.001  ms/op
MyBenchmark.simple   10000000  4.608 ±  0.012  ms/op
MyBenchmark.stream     100000  0.567 ±  0.002  ms/op
MyBenchmark.stream    1000000  5.776 ±  0.025  ms/op
MyBenchmark.stream   10000000  4.079 ±  0.012  ms/op
28
n = 1_000_000
# Warmup Iteration 1: 0.446 ms/op
# Warmup Iteration 2: 0.414 ms/op
# Warmup Iteration 3: 1.144 ms/op
# Warmup Iteration 4: 5.729 ms/op
# Warmup Iteration 5: 5.792 ms/op
Iteration 1: 5.821 ms/op
Iteration 2: 5.751 ms/op
Iteration 3: 5.733 ms/op
...
Iteration 9: 5.787 ms/op
Iteration 10: 5.829 ms/op
29
n = 10_000_000
# Warmup Iteration 1: 4.342 ms/op
# Warmup Iteration 2: 4.132 ms/op
# Warmup Iteration 3: 4.131 ms/op
# Warmup Iteration 4: 4.113 ms/op
# Warmup Iteration 5: 4.083 ms/op
Iteration 1: 4.134 ms/op
Iteration 2: 4.121 ms/op
Iteration 3: 4.127 ms/op
...
Iteration 9: 4.099 ms/op
Iteration 10: 4.124 ms/op
30
n = 10_000_000
[jmhtest/target]$ java -jar benchmark.jar -i 20 >out.txt
# Warmup Iteration 1: 4.360 ms/op
# Warmup Iteration 2: 4.135 ms/op
...
# Warmup Iteration 5: 4.084 ms/op
Iteration 1: 4.131 ms/op
Iteration 2: 4.111 ms/op
...
Iteration 16: 4.114 ms/op
Iteration 17: 4.114 ms/op
Iteration 18: 19.181 ms/op
Iteration 19: 57.917 ms/op
Iteration 20: 57.899 ms/op
31
The naive approach in a loop
    public static void main(String[] args) {
        for (int i = 0; i < 6000; i++) {
            long start = System.nanoTime();
            long result = sumTwiceStream(10_000_000);
            long end = System.nanoTime();
            System.out.printf("#%d: %d; time=%8.3fms%n", i, result,
                    (end-start)/1_000_000.0);
        }
    }
32
The naive approach in a loop: result
#0: 100000010000000; time=  63.484ms
#1: 100000010000000; time=  10.328ms
#2: 100000010000000; time=   3.931ms
#3: 100000010000000; time=   4.051ms
…
#5630: 100000010000000; time=   3.938ms
#5631: 100000010000000; time=   4.154ms
#5632: 100000010000000; time=   4.150ms
#5633: 100000010000000; time=   3.955ms
#5634: 100000010000000; time=   4.011ms
#5635: 100000010000000; time=  58.184ms
#5636: 100000010000000; time=  58.058ms
#5637: 100000010000000; time=  57.024ms
33
-XX:+PrintCompilation
     77    1       3       java.lang.String::equals (81 bytes)
     78    2       3       java.lang.String::hashCode (55 bytes)
...
     80   13     n 0       java.lang.System::arraycopy (native)   (static)
...
     81   16   s   3       java.lang.StringBuffer::append (13 bytes)
...
    106   68   !   3       java.lang.ref.ReferenceQueue::poll (28 bytes)
...
    141  221 %     4       ...$RangeIntSpliterator::forEachRemaining @ 34 (65 bytes)
    144  219 %     3       ...$RangeIntSpliterator::forEachRemaining @ -2 (65 bytes)   made not entrant
34
-XX:+PrintCompilation: reading the log
148  221 %  4  ...::forEachRemaining @ 34 (65 bytes)
tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]
• tstamp – time in milliseconds since the start of execution
• compile_id – number of the compilation task in the queue
• attrs – attributes
• comp_level – tiered compilation level
• name – class and method name
• osr_pos – bytecode position at which OSR is performed
• size – size of the method's bytecode in bytes (or "native")
• status – additional information about the event
35
-XX:+PrintCompilation: attributes
148  221 %  4  ...::forEachRemaining @ 34 (65 bytes)
tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]
• n – wrapper for a native method (not actually a compilation)
• % – on-stack replacement
• s – the method is declared synchronized
• ! – the method has an exception handler
• b – the compilation blocks execution
36
-XX:+PrintCompilation: comp_level
148  221 %  4  ...::forEachRemaining @ 34 (65 bytes)
tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]
• 0 – none (interpreter / native wrapper)
• 1 – simple (C1 compiler)
• 2 – limited_profile (C1 compiler with counting of invocations and loop iterations)
• 3 – full_profile (level 2 plus type profiling)
• 4 – C2 compiler
[Diagram: Interpreter, C1, C2]
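The field layout described on these slides can be turned into a small parser. The sketch below assumes the tiered-compilation format shown above (with a comp_level column); the class PrintCompilationLine and its regular expression are illustrative and not part of any JDK tool, and real logs contain other line shapes that it simply rejects.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses one -XX:+PrintCompilation line in the
// "tstamp compile_id attrs comp_level name [@ osr_pos] (size) [status]" layout.
class PrintCompilationLine {
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+)\\s+([%sbn! ]*?)\\s*(\\d)\\s+(\\S+)"
            + "(?:\\s+@\\s+(-?\\d+))?\\s+\\((\\d+ bytes|native)\\)\\s*(.*)$");

    final long tstamp;       // milliseconds since JVM start
    final int compileId;     // compilation task number
    final String attrs;      // any of: % s ! b n (empty if none)
    final int compLevel;     // 0..4
    final String name;       // class::method
    final Integer osrPos;    // null when the line has no "@ osr_pos" part
    final String size;       // "NN bytes" or "native"
    final String status;     // e.g. "made not entrant" (empty if absent)

    private PrintCompilationLine(Matcher m) {
        tstamp = Long.parseLong(m.group(1));
        compileId = Integer.parseInt(m.group(2));
        attrs = m.group(3).replace(" ", "");
        compLevel = Integer.parseInt(m.group(4));
        name = m.group(5);
        osrPos = m.group(6) == null ? null : Integer.valueOf(m.group(6));
        size = m.group(7);
        status = m.group(8).trim();
    }

    // Returns null for lines that do not follow the assumed layout.
    static PrintCompilationLine parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new PrintCompilationLine(m) : null;
    }

    boolean isOsr() {
        return attrs.contains("%");
    }

    public static void main(String[] args) {
        PrintCompilationLine p = parse(
                "141 221 % 4 ...$RangeIntSpliterator::forEachRemaining @ 34 (65 bytes)");
        System.out.println(p.name + " osr=" + p.isOsr()
                + " level=" + p.compLevel + " osr_pos=" + p.osrPos);
    }
}
```

Filtering a log for `isOsr()` or for a non-empty `status` is a quick way to find the events discussed on the following slides.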
37
-XX:+PrintCompilation: status
141  221 %  4  ...::forEachRemaining @ 34 (65 bytes)
144  219 %  3  ...::forEachRemaining @ -2 (65 bytes)   made not entrant
made not entrant: "No entry"
38
-XX:+PrintCompilation: status
• made zombie – the method is no longer used and is ready to be removed
39
-XX:+TraceNMethodInstalls
nmethod: a JIT-compiled method (OSR or regular)
-XX:+UnlockDiagnosticVMOptions
   59    1       3       java.lang.String::hashCode (55 bytes)
Installing method (3) java.lang.String.hashCode()I
   59    2       3       java.lang.String::equals (81 bytes)
Installing method (3) java.lang.String.equals(Ljava/lang/Object;)Z
   60    3       3       java.lang.String::indexOf (70 bytes)
   60    6     n 0       java.lang.System::arraycopy (native)   (static)
Installing method (3) java.lang.String.indexOf(II)I
   61    4       3       java.lang.Object::<init> (1 bytes)
Installing method (3) java.lang.Object.<init>()V
40
-XX:+PrintCompilation -XX:+TraceNMethodInstalls
#5630: 100000010000000; time=   3.896ms
  22939  567       4       Test::sumTwiceStream (21 bytes)
#5631: 100000010000000; time=   4.327ms
#5632: 100000010000000; time=   4.029ms
#5633: 100000010000000; time=   4.170ms
  22954  474       3       Test::sumTwiceStream (21 bytes)   made not entrant
Installing method (4) Test.sumTwiceStream(I)J
#5634: 100000010000000; time=   4.072ms
  22956  568       4       j.u.s.LongPipeline$$...::applyAsLong (6 bytes)
  22956  204       3       j.u.s.LongPipeline$$...::applyAsLong (6 bytes)   made not entrant
Installing method (4) j.u.s.LongPipeline$$Lambda$2/142257191.applyAsLong(JJ)J
#5635: 100000010000000; time=  58.784ms
41
Why was the method Test.sumTwiceStream recompiled?
-XX:Tier4InvocationThreshold=5000
-XX:Tier4BackEdgeThreshold=40000
42
Inlining
    static long sumTwice(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++)
            sum += mult(i);
        return sum;
    }

    static int mult(int x) {
        return x * 2;
    }

After inlining mult:
    static long sumTwice(int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++)
            sum += i * 2;
        return sum;
    }
-XX:+PrintInlining
43
-XX:+PrintInlining
j.u.s.Streams$RangeIntSpliterator::forEachRemaining @ 34 (65 bytes)
  @ 44   j.u.s.IntPipeline$5$1::accept (23 bytes)   inline (hot)
   \-> TypeProfile (55084/55084 counts) = j/u/s/IntPipeline$5$1
    @ 12   …$$Lambda$1/321001045::applyAsLong (5 bytes)   inline (hot)
     \-> TypeProfile (24272/24272 counts) = Test$$Lambda$1
      @ 1   Test::lambda$sumTwiceStream$0 (5 bytes)   inline (hot)
    @ 17   j.u.s.ReduceOps$8ReducingSink::accept (19 bytes)   inline (hot)
     \-> TypeProfile (24272/24272 counts) = j/u/s/ReduceOps$8ReducingSink
      @ 10   …$$Lambda$2/303563356::applyAsLong (6 bytes)   inline (hot)
       \-> TypeProfile (7376/7376 counts) = j/u/s/LongPipeline$$Lambda$2
        @ 2   java.lang.Long::sum (4 bytes)   inline (hot)
45
JITWatch – compile chain
https://github.com/AdoptOpenJDK/jitwatch
x -> x*2
Long::sum
46
Compiled code (before 5600)
[Diagram: the Stream call hierarchy shown earlier, now rooted at sumTwiceStream, with each compiled unit labeled by its compiler: seven C1 labels and one C2 label]
47
-XX:+PrintInlining
@ 12   j.u.s.Streams$RangeIntSpliterator::forEachRemaining (65 bytes)   inline (hot)
  @ 44   j.u.s.IntPipeline$5$1::accept (23 bytes)   inline (hot)
    @ 12   …$$Lambda$1/321001045::applyAsLong (5 bytes)   inline (hot)
     \-> TypeProfile (8593/8593 counts) = Test$$Lambda$1
      @ 1   Test::lambda$sumTwiceStream$0 (5 bytes)   inlining too deep
    @ 17   j.u.s.ReduceOps$8ReducingSink::accept (19 bytes)   inline (hot)
     \-> TypeProfile (8593/8593 counts) = java/util/stream/ReduceOps$8ReducingSink
      @ 10   …$$Lambda$2/303563356::applyAsLong (6 bytes)   inlining too deep
       \-> TypeProfile (11206/11206 counts) = java/util/stream/LongPipeline$$Lambda$2
48
JITWatch
Class: Test
Method: lambda$sumTwiceStream$0
JIT Compiled: Yes
Inlined: No, inlining too deep
Count: 7194
iicount: 5599
Bytes: 5
Prof factor: 1
49
Compiled code (after 5600)
[Diagram: the Stream call hierarchy shown earlier, rooted at sumTwiceStream, with compiled units labeled by compiler: three C2 labels]
50
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
...
uintx MaxGCMinorPauseMillis    = 4294967295    {product}
uintx MaxGCPauseMillis         = 4294967295    {product}
uintx MaxHeapFreeRatio         = 100           {manageable}
uintx MaxHeapSize             := 1069547520    {product}
 intx MaxInlineLevel           = 9             {product}
 intx MaxInlineSize            = 35            {product}
 intx MaxJNILocalCapacity      = 65536         {product}
 intx MaxJavaStackTraceDepth   = 1024          {product}
 intx MaxJumpTableSize         = 65000         {C2 product}
...
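The same values can also be read from inside a running HotSpot JVM. The sketch below uses com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific; the class name FlagReader is hypothetical, and the printed defaults differ between JDK versions, so no particular output is assumed here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not from the talk): prints a few JIT-related flags
// of the current JVM. Works only on HotSpot-based JVMs.
class FlagReader {
    // Returns the current value of a VM flag as a string;
    // getVMOption throws IllegalArgumentException for unknown flag names.
    static String flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        String[] names = {
            "MaxInlineLevel", "MaxInlineSize",
            "Tier4InvocationThreshold", "Tier4BackEdgeThreshold"
        };
        for (String name : names) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```

This is convenient for logging the effective settings next to benchmark results, for example to confirm that -XX:MaxInlineLevel was actually applied to the forked JVM.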
51
JMH; -XX:MaxInlineLevel=13
Benchmark                 (n)  Score    Error  Units
MyBenchmark.simple     100000  0.047 ±  0.001  ms/op
MyBenchmark.simple    1000000  0.420 ±  0.002  ms/op
MyBenchmark.simple   10000000  4.445 ±  0.029  ms/op
MyBenchmark.stream     100000  0.038 ±  0.001  ms/op
MyBenchmark.stream    1000000  0.379 ±  0.001  ms/op
MyBenchmark.stream   10000000  4.102 ±  0.014  ms/op
52
n = 1_000_000; MaxInlineLevel = 13
                       MaxInlineLevel=13   Before
# Warmup Iteration 1:  0.436 ms/op         0.446 ms/op
# Warmup Iteration 2:  0.408 ms/op         0.414 ms/op
# Warmup Iteration 3:  0.388 ms/op         1.144 ms/op
# Warmup Iteration 4:  0.376 ms/op         5.729 ms/op
# Warmup Iteration 5:  0.374 ms/op         5.792 ms/op
Iteration 1:           0.377 ms/op         5.821 ms/op
Iteration 2:           0.375 ms/op         5.751 ms/op
Iteration 3:           0.376 ms/op         5.733 ms/op
...                    ...                 ...
Iteration 9:           0.375 ms/op         5.787 ms/op
Iteration 10:          0.376 ms/op         5.829 ms/op
53
MaxInlineLevel=13
[Diagram: the Stream call hierarchy shown earlier, rooted at sumTwiceStream, with a single C2 label]
54
TypeProfile
The TypeProfile lines from the -XX:+PrintInlining output for j.u.s.Streams$RangeIntSpliterator::forEachRemaining @ 34 shown earlier:
\-> TypeProfile (55084/55084 counts) = j/u/s/IntPipeline$5$1
\-> TypeProfile (24272/24272 counts) = Test$$Lambda$1
\-> TypeProfile (24272/24272 counts) = j/u/s/ReduceOps$8ReducingSink
\-> TypeProfile (7376/7376 counts) = j/u/s/LongPipeline$$Lambda$2
55
Profile pollution
    @Param({"0", "1", "2", "3"})
    private int pollute;

    @Setup
    public void setup() {
        switch (pollute) {
        case 3:
            for (int i = 0; i < 1000; i++)
                IntStream.range(0, 100).mapToLong(x -> x*3).sum();
        case 2:
            for (int i = 0; i < 1000; i++)
                IntStream.range(0, 100).mapToLong(x -> x*4).sum();
        case 1:
            for (int i = 0; i < 1000; i++)
                IntStream.range(0, 100).mapToLong(x -> x*5).sum();
        }
    }
56
-XX:MaxInlineLevel=13 + pollution
Benchmark        (n)  (pollute)   Score    Error  Units
MB.stream     100000          0   0.038 ±  0.001  ms/op
MB.stream     100000          1   0.047 ±  0.001  ms/op
MB.stream     100000          2   0.510 ±  0.003  ms/op
MB.stream     100000          3   0.509 ±  0.003  ms/op
MB.stream    1000000          0   0.371 ±  0.002  ms/op
MB.stream    1000000          1   0.435 ±  0.021  ms/op
MB.stream    1000000          2   5.385 ±  0.067  ms/op
MB.stream    1000000          3   5.433 ±  0.042  ms/op
MB.stream   10000000          0   4.029 ±  0.019  ms/op
MB.stream   10000000          1   4.550 ±  0.129  ms/op
MB.stream   10000000          2  53.658 ±  0.812  ms/op
MB.stream   10000000          3  54.563 ±  0.207  ms/op
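Because the switch in the setup method falls through, pollute=1 pushes one extra lambda class through the shared mapToLong code before the benchmark runs, pollute=2 pushes two, and pollute=3 pushes three. The same situation can be reproduced without streams. The sketch below is an illustration under assumptions: the class PollutionSketch and the method sumMapped are hypothetical stand-ins for the shared JDK code, and the sketch only sets up the polluted call site without measuring anything.

```java
import java.util.function.IntToLongFunction;

// Hypothetical illustration (not from the talk) of a shared call site
// that sees several lambda classes.
class PollutionSketch {
    // Shared code: every caller goes through the same f.applyAsLong call
    // site, so the JIT collects a single type profile for all of them.
    static long sumMapped(IntToLongFunction f, int max) {
        long sum = 0;
        for (int i = 1; i <= max; i++) {
            sum += f.applyAsLong(i);
        }
        return sum;
    }

    // Mirrors the fall-through switch from the slide:
    // level k feeds k extra lambda classes into the shared call site.
    static void pollute(int level) {
        switch (level) {
            case 3:
                for (int i = 0; i < 1000; i++) sumMapped(x -> x * 3, 100);
            case 2:
                for (int i = 0; i < 1000; i++) sumMapped(x -> x * 4, 100);
            case 1:
                for (int i = 0; i < 1000; i++) sumMapped(x -> x * 5, 100);
            default:
                break; // level 0: only the measured lambda reaches the call site
        }
    }

    public static void main(String[] args) {
        int level = args.length > 0 ? Integer.parseInt(args[0]) : 0;
        pollute(level);
        System.out.println(sumMapped(x -> x * 2, 10_000_000));
    }
}
```

To see the effect on timing, the measured call would have to go into a JMH benchmark as on the slides; running this class with -XX:+PrintInlining should at least show how the TypeProfile at the call site changes with the pollution level.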
57
MaxInlineLevel=13 + Type pollution
[Diagram: the Stream call hierarchy shown earlier, rooted at sumTwiceStream, with two C2 labels]
58
Bugs in OpenJDK
• https://bugs.openjdk.java.net/browse/JDK-8015416
  – tier one should collect context-dependent split profiles
• https://bugs.openjdk.java.net/browse/JDK-8015417
  – profile pollution after call through invokestatic to shared code
59
In reality
    public static long sumTwiceOpt(int max) {
        return max * (max + 1L);
    }

Benchmark              (n)  Score    Error  Units
MyBenchmark.opt     100000  0.003 ±  0.001  us/op
MyBenchmark.opt    1000000  0.003 ±  0.001  us/op
MyBenchmark.opt   10000000  0.003 ±  0.001  us/op
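The closed form used by sumTwiceOpt follows from the arithmetic series; writing n for max:

```latex
\sum_{i=1}^{n} 2i \;=\; 2 \cdot \frac{n(n+1)}{2} \;=\; n(n+1),
\qquad
n = 10^{7}:\quad 10^{7}\,(10^{7}+1) = 100000010000000 .
```

The value for n = 10^7 matches the result printed by both the loop and the stream versions earlier in the slides. The 1L in the code promotes the multiplication to long, which is needed because the product does not fit in an int for these values of n.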
60
But it was not all in vain!

Virtual machine options
-Xint
-XX:UseOnStackReplacement
-XX:UseLoopCounter
-XX:MaxInlineLevel
-XX:Tier4InvocationThreshold
-XX:Tier4BackEdgeThreshold
-XX:UnlockDiagnosticVMOptions
-XX:PrintCompilation
-XX:TraceNMethodInstalls
-XX:PrintInlining
-XX:PrintFlagsFinal

Tools
JMH
JITWatch
61
Further information
• Aleksey Shipilev, "The Black Magic of (Java) Method Dispatch"
  – http://shipilev.net/blog/2015/black-magic-method-dispatch/
• Vladimir Ivanov, "Dynamic JIT compilation in the JVM"
  – http://www.youtube.com/watch?v=oYu3HuIYDhI
• Aleksey Shipilev, "Java Benchmarking: how to read two timestamps!"
  – http://shipilev.net/blog/2014/nanotrusting-nanotime/
  – https://www.youtube.com/watch?v=Vb3jyHl3FNk
• Paul Sandoz answers "Erratic performance of Arrays.stream().map().sum()"
  – http://stackoverflow.com/a/25851390/4856258
62
Thank you all!
The pony is from here: http://fire-seeker.deviantart.com/