Java Performance & Profiling
M. Isuru Tharanga Chrishantha Perera
Technical Lead at WSO2
Co-organizer of Java Colombo Meetup

Page 1: Java Performance and Profiling

Java Performance & Profiling

M. Isuru Tharanga Chrishantha Perera
Technical Lead at WSO2
Co-organizer of Java Colombo Meetup

Page 2: Java Performance and Profiling

Measuring Performance

We need a way to measure the performance:
o To understand how the system behaves
o To see performance improvements after doing any optimizations

There are two key performance metrics:
o Latency
o Throughput

Page 3: Java Performance and Profiling

What is Throughput?

Throughput measures the number of messages that a server processes during a specific time interval (e.g. per second).

Throughput is calculated using the equation:

Throughput = number of requests / time to complete the requests
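As a rough illustration of the equation above, the following Java sketch times a batch of requests and divides the request count by the elapsed time. The class name `ThroughputExample` and the `handleRequest` method are hypothetical stand-ins for a real server call; they are not part of the original slides.

```java
import java.util.concurrent.TimeUnit;

class ThroughputExample {

    /** Requests per second for a request count and an elapsed time in nanoseconds. */
    static double requestsPerSecond(long requests, long elapsedNanos) {
        double elapsedSeconds = elapsedNanos / (double) TimeUnit.SECONDS.toNanos(1);
        return requests / elapsedSeconds;
    }

    /** Hypothetical stand-in for the work a server does per request. */
    static int handleRequest(int i) {
        return Integer.toString(i).hashCode();
    }

    public static void main(String[] args) {
        int requests = 100_000;
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            checksum += handleRequest(i);
        }
        long elapsedNanos = System.nanoTime() - start;
        // The checksum is printed only so the loop cannot be optimized away.
        System.out.printf("%d requests in %d ns: %.0f requests/s (checksum %d)%n",
                requests, elapsedNanos, requestsPerSecond(requests, elapsedNanos), checksum);
    }
}
```

A single-threaded loop like this only measures one caller. The benchmarking tools listed later drive many concurrent connections, which is what a real throughput test needs.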

Page 4: Java Performance and Profiling

What is Latency?

Latency measures the end-to-end processing time for an operation.

Page 5: Java Performance and Profiling

Benchmarking Tools

o Apache JMeter
o Apache Benchmark
o wrk - an HTTP benchmarking tool

Page 6: Java Performance and Profiling

Tuning Java Applications

We need to have a very high throughput and very low latency values.

There is a tradeoff between throughput and latency. With more concurrent users, the throughput increases, but the average latency will also increase.

Usually, you need to achieve maximum throughput while keeping latency within some acceptable limit. For example, you might choose the maximum throughput in a range where latency is less than 10 ms.

Page 7: Java Performance and Profiling

Throughput and Latency Graphs

Source: https://www.infoq.com/articles/Tuning-Java-Servers

Page 8: Java Performance and Profiling

Latency Distribution

When measuring latency, it’s important to look at the latency distribution: min, max, avg, median, 75th percentile, 98th percentile, 99th percentile etc.
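To make the distribution concrete, here is a minimal Java sketch that computes the average and a few percentiles with the nearest-rank method. The class name `LatencyStats` and the sample data (98 requests at 1 ms and 2 requests at 100 ms) are made up for illustration. Benchmarking tools may use a different percentile definition, so treat this as one reasonable choice rather than the standard one.

```java
import java.util.Arrays;

class LatencyStats {

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] latencies, double p) {
        if (latencies.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    static double average(long[] latencies) {
        return Arrays.stream(latencies).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Made-up sample in milliseconds: mostly fast requests plus two slow outliers.
        long[] ms = new long[100];
        Arrays.fill(ms, 1);
        ms[98] = 100;
        ms[99] = 100;

        System.out.println("avg    = " + average(ms));
        System.out.println("median = " + percentile(ms, 50));
        System.out.println("p75    = " + percentile(ms, 75));
        System.out.println("p99    = " + percentile(ms, 99));
    }
}
```

In this sample the average stays under 3 ms while the 99th percentile is 100 ms, which is why the average alone hides the slow requests.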

Page 9: Java Performance and Profiling

Long-tail latencies

Long-tail latencies occur when high percentiles have values much greater than the average latency.

Source: https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency

Page 10: Java Performance and Profiling

Latency Numbers Every Programmer Should Know

Operation                           Nanoseconds      Microseconds  Milliseconds  Notes
L1 cache reference                  0.5 ns
Branch mispredict                   5 ns
L2 cache reference                  7 ns                                         14x L1 cache
Mutex lock/unlock                   25 ns
Main memory reference               100 ns                                       20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy        3,000 ns         3 us
Send 1K bytes over 1 Gbps network   10,000 ns        10 us
Read 4K randomly from SSD*          150,000 ns       150 us                      ~1GB/sec SSD
Read 1 MB sequentially from memory  250,000 ns       250 us
Round trip within same datacenter   500,000 ns       500 us
Read 1 MB sequentially from SSD*    1,000,000 ns     1,000 us      1 ms          ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000 ns    10,000 us     10 ms         20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000 ns    20,000 us     20 ms         80x memory, 20X SSD
Send packet CA->Netherlands->CA     150,000,000 ns   150,000 us    150 ms

Page 11: Java Performance and Profiling

Java Garbage Collection

Java automatically allocates memory for our applications and automatically deallocates memory when certain objects are no longer used.

"Automatic Garbage Collection" is an important feature in Java.

Page 12: Java Performance and Profiling

Marking and Sweeping Away Garbage

GC works by first marking all used objects in the heap and then deleting unused objects.

GC also compacts the memory after deleting unreferenced objects to make new memory allocations much easier and faster.

Page 13: Java Performance and Profiling

GC roots

o The JVM references GC roots, which refer to the application objects in a tree structure. There are several kinds of GC roots in Java:
  o Local variables
  o Active Java threads
  o Static variables
  o JNI references

o As long as the application can reach these GC roots, the whole tree beneath them is reachable, and the GC can determine which objects are live.

Page 14: Java Performance and Profiling

Java Heap Structure

Java Heap is divided into generations based on the object lifetime.

Following is the general structure of the Java Heap. (This is mostly dependent on the type of collector).

Page 15: Java Performance and Profiling

Young Generation

o Young Generation usually has Eden and Survivor spaces.
o All new objects are allocated in Eden Space.
o When this fills up, a minor GC happens.
o Surviving objects are first moved to survivor spaces.
o When objects survive several minor GCs (tenuring threshold), they are eventually moved to the old generation.

Page 16: Java Performance and Profiling

Old Generation

o This stores long surviving objects.
o When this fills up, a major GC (full GC) happens.
o A major GC takes a longer time as it has to check all live objects.

Page 17: Java Performance and Profiling

Permanent Generation

o This has the metadata required by JVM.
o Classes and Methods are stored here.
o This space is included in a full GC.

Page 18: Java Performance and Profiling

Java 8 and PermGen

In Java 8, the permanent generation has been removed from the heap.

The class metadata has moved to native memory, to an area called “Metaspace”.

There is no limit for Metaspace by default.

Page 19: Java Performance and Profiling

"Stop the World"

o For some events, the JVM pauses all application threads. These are called Stop-The-World (STW) pauses.
o GC events also cause STW pauses.
o We can see the application stopped time in the GC logs.

Page 20: Java Performance and Profiling

GC Logging

o There are JVM flags to log details for each GC.
  o -XX:+PrintGC - Print messages at garbage collection
  o -XX:+PrintGCDetails - Print more details at garbage collection
  o -XX:+PrintGCTimeStamps - Print timestamps at garbage collection
  o -XX:+PrintGCApplicationStoppedTime - Print the application GC stopped time
  o -XX:+PrintGCApplicationConcurrentTime - Print the application GC concurrent time

o GCViewer is a great tool to view GC logs
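Alongside the GC logs, cumulative GC counts and times can also be read from inside a running JVM. The sketch below uses the standard `java.lang.management` API. The class name `GcStats` is made up, and the collector names printed depend on which collector the JVM selected. It complements the logging flags above and does not replace them, since it reports totals rather than individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {

    /** Total number of collections so far, summed over all collectors in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 when the count is undefined.
            total += Math.max(0, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, accumulated time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections = " + totalCollections());
    }
}
```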

Page 21: Java Performance and Profiling

Java Memory Usage

Init - initial amount of memory that the JVM requests from the OS for memory management during startup.
Used - amount of memory currently used.
Committed - amount of memory that is guaranteed to be available for use by the JVM.
Max - maximum amount of memory that can be used for memory management.
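These four values map directly onto the standard `java.lang.management.MemoryUsage` class. The following sketch reads them for the heap of the current JVM; the class name `HeapUsage` is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapUsage {

    /** Snapshot of the heap's init, used, committed and max values, in bytes. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage usage = heap();
        // getInit() and getMax() return -1 when the value is undefined.
        System.out.println("init      = " + usage.getInit());
        System.out.println("used      = " + usage.getUsed());
        System.out.println("committed = " + usage.getCommitted());
        System.out.println("max       = " + usage.getMax());
    }
}
```

Used never exceeds committed, and committed never exceeds max when a max is defined, which matches the definitions above.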

Page 22: Java Performance and Profiling

JDK Tools and Utilities

o Basic Tools (java, javac, jar)
o Security Tools (jarsigner, keytool)
o Java Web Service Tools (wsimport, wsgen)
o Java Troubleshooting, Profiling, Monitoring and Management Tools (jcmd, jconsole, jmc, jvisualvm)

Page 23: Java Performance and Profiling

Java Troubleshooting, Profiling, Monitoring and Management Tools

o jcmd - JVM Diagnostic Commands tool
o jconsole - A JMX-compliant graphical tool for monitoring a Java application
o jvisualvm - Provides detailed information about the Java application. It provides CPU & Memory profiling, heap dump analysis, memory leak detection etc.
o jmc - Tools to monitor and manage Java applications with very low performance overhead

Page 24: Java Performance and Profiling

Java Experimental Tools

o Monitoring Tools
  o jps - JVM Process Status Tool
  o jstat - JVM Statistics Monitoring Tool

o Troubleshooting Tools
  o jmap - Memory Map for Java
  o jhat - Heap Dump Browser
  o jstack - Stack Trace for Java

jstat -gcutil <pid>

sudo jmap -heap <pid>

sudo jmap -F -dump:format=b,file=/tmp/dump.hprof <pid>

jhat /tmp/dump.hprof

Page 25: Java Performance and Profiling

Java Ergonomics and JVM Flags

Java Virtual Machine can tune itself depending on the environment and this smart tuning is referred to as Ergonomics.

When tuning Java, it's important to know which default values Java Ergonomics selected for the garbage collector, heap sizes, and runtime compiler.

Page 26: Java Performance and Profiling

Printing Command Line Flags

We can use "-XX:+PrintCommandLineFlags" to print the command line flags used by the JVM.

This is a useful flag to see the values selected by Java Ergonomics.

eg: $ java -XX:+PrintCommandLineFlags -version

-XX:InitialHeapSize=128884992 -XX:MaxHeapSize=2062159872 -XX:+PrintCommandLineFlags -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC

java version "1.8.0_102"

Java(TM) SE Runtime Environment (build 1.8.0_102-b14)

Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

Page 27: Java Performance and Profiling

Printing Initial & Final JVM Flags

Use following command to see the default values:
java -XX:+PrintFlagsInitial -version

Use following command to see the final values:
java -XX:+PrintFlagsFinal -version

The values modified manually or by Java Ergonomics are shown with “:=”:
java -XX:+PrintFlagsFinal -version | grep ':='

http://isuru-perera.blogspot.com/2015/08/java-ergonomics-and-jvm-flags.html
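The final flag values can also be inspected from inside a running application. The sketch below assumes a HotSpot-based JVM, because it relies on the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`. The class name `VmFlags` and the choice of flags to print are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class VmFlags {

    /** Looks up one VM flag by name; throws IllegalArgumentException if the flag does not exist. */
    static VMOption flag(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"InitialHeapSize", "MaxHeapSize", "UseParallelGC"}) {
            VMOption option = flag(name);
            // The origin tells whether the value is a default, was set on the
            // command line, or was chosen by ergonomics.
            System.out.printf("%s = %s (origin: %s)%n",
                    option.getName(), option.getValue(), option.getOrigin());
        }
    }
}
```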

Page 28: Java Performance and Profiling

What is Profiling?

Here is what Wikipedia says:

In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

https://en.wikipedia.org/wiki/Profiling_(computer_programming)

Page 29: Java Performance and Profiling

What is Profiling?

Here is what Wikipedia says:

Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.

https://en.wikipedia.org/wiki/Profiling_(computer_programming)

Page 30: Java Performance and Profiling

Why do we need Profiling?

o Improve throughput (Maximizing the transactions processed per second)

o Improve latency (Minimizing the time taken for each operation)

o Find performance bottlenecks

Page 31: Java Performance and Profiling

Java Profiling Tools

Survey by RebelLabs in 2016:

http://pages.zeroturnaround.com/RebelLabs-Developer-Productivity-Report-2016.html

Page 32: Java Performance and Profiling

Java Profiling Tools

o Java VisualVM - Available in JDK
o Java Mission Control - Available in JDK
o JProfiler - A commercially licensed Java profiling tool developed by ej-technologies
o Honest Profiler - Open Source Sampling CPU profiler

Page 33: Java Performance and Profiling

How Do Profilers Work?

o Generic profilers rely on the JVMTI spec
o JVMTI offers only safepoint sampling stack trace collection options

Page 34: Java Performance and Profiling

Safepoints

A safepoint is a moment in time when a thread’s data, its internal state and representation in the JVM are, well, safe for observation by other threads in the JVM.

Safepoint polls occur at:
● Between every 2 bytecodes (interpreter mode)
● Backedge of non-’counted’ loops
● Method exit
● JNI call exit

Page 35: Java Performance and Profiling

Measuring Methods for CPU Profiling

Sampling: Monitor running code externally and check which code is executed

Instrumentation: Include measurement code into the real code

Page 36: Java Performance and Profiling

Profiling Applications with Java VisualVM

CPU Profiling: Profile the performance of the application.

Memory Profiling: Analyze the memory usage of the application.

Page 37: Java Performance and Profiling

Java Mission Control

o A set of powerful tools running on the Oracle JDK to monitor and manage Java applications

o Free for development use (Oracle Binary Code License)

o Available in JDK since Java 7 update 40
o Supports Plugins
o Two main tools
  o JMX Console
  o Java Flight Recorder

Page 38: Java Performance and Profiling

Sampling vs. Instrumentation

Sampling:
o Overhead depends on the sampling interval
o Can see execution hotspots
o Can miss methods that return faster than the sampling interval

Instrumentation:
o Precise measurement for execution times
o More data to process
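To show what "measurement code inside the real code" means, here is a hand-rolled Java sketch of instrumentation: a wrapper that counts calls and accumulates elapsed time around a piece of work. Real profilers typically inject equivalent code by rewriting bytecode instead of using a wrapper. The class name `Instrumented` and the `fib` workload are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

class Instrumented<T> implements Supplier<T> {

    private final Supplier<T> target;
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    Instrumented(Supplier<T> target) {
        this.target = target;
    }

    @Override
    public T get() {
        long start = System.nanoTime();
        try {
            return target.get();
        } finally {
            // Every single call is measured, so nothing is missed, but every
            // call also pays for two clock reads and two atomic updates.
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
        }
    }

    long calls() {
        return calls.get();
    }

    long totalNanos() {
        return totalNanos.get();
    }

    /** Hypothetical CPU-bound workload. */
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        Instrumented<Integer> work = new Instrumented<>(() -> fib(25));
        for (int i = 0; i < 5; i++) {
            work.get();
        }
        System.out.printf("%d calls, %.3f ms total%n", work.calls(), work.totalNanos() / 1e6);
    }
}
```

The per-call cost in the `finally` block is the overhead that instrumentation adds, and it grows with the number of calls. A sampler instead pays a cost per sampling tick, regardless of how many methods ran in between.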

Page 39: Java Performance and Profiling

Sampling vs. Instrumentation

o Java VisualVM uses both sampling and instrumentation

o Java Flight Recorder uses sampling for hot methods

o JProfiler supports both sampling and instrumentation

Page 40: Java Performance and Profiling

Problems with Profiling

o Runtime Overhead
o Interpretation of the results can be difficult
o Identifying the "crucial" parts of the software
o Identifying potential performance improvements

Page 41: Java Performance and Profiling

Java Flight Recorder (JFR)

o A profiling and event collection framework built into the Oracle JDK

o Gathers low level information about the JVM and application behaviour with very little performance impact (less than 2%)

o Always-on profiling in production environments
o Engine was released with Java 7 update 4
o Commercial feature in Oracle JDK

Page 42: Java Performance and Profiling

JFR Events

o JFR collects data about events.
o JFR collects information about three types of events:
  o Instant events - Events occurring instantly
  o Sample (Requestable) events - Events with a user configurable period to provide a sample of system activity
  o Duration events - Events taking some time to occur. The event has a start and end time. You can set a threshold.

Page 43: Java Performance and Profiling

Java Flight Recorder Architecture

JFR consists of the following components:
o JFR runtime - The recording engine inside the JVM that produces the recordings.
o Flight Recorder plugin for Java Mission Control (JMC)

Page 44: Java Performance and Profiling

Enabling Java Flight Recorder

Since JFR is a commercial feature, we must unlock commercial features before trying to run JFR.

So, you need to have the following arguments:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

Page 45: Java Performance and Profiling

Dynamically enabling JFR

If you are using Java 8 update 40 (8u40) or later, you can now dynamically enable JFR.

This is useful as we don’t need to restart the server.

Page 46: Java Performance and Profiling

Improving the accuracy of JFR Method Profiler

o An important feature of JFR Method Profiler is that it does not require threads to be at safe points in order for stacks to be sampled.
o Generally, the stacks will only be walked at safe points.
o HotSpot JVM doesn’t provide metadata for non-safe point parts of the code. Use following to improve the accuracy.
  o -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints

Page 47: Java Performance and Profiling

JFR Event Settings

o There are two event settings by default in Oracle JDK.

o Files are in $JAVA_HOME/jre/lib/jfr
  o Continuous - default.jfc
  o Profiling - profile.jfc

Page 48: Java Performance and Profiling

JFR Recording Types

o Time Fixed Recordings
  o Fixed duration
  o The recording will be opened automatically in JMC at the end (if the recording was started by JMC)
o Continuous Recordings
  o No end time
  o Must be explicitly dumped

Page 49: Java Performance and Profiling

Running Java Flight Recorder

There are a few ways we can run JFR:
o Using the JFR plugin in JMC
o Using the command line
o Using the Diagnostic Command

Page 50: Java Performance and Profiling

Running Java Flight Recorder

You can run multiple recordings concurrently and have different settings for each recording.

However, the JFR runtime will use the same buffers, and the resulting recording contains the union of all events for all recordings active at that particular time.

This means that we might get more than we asked for (but not less).

Page 51: Java Performance and Profiling

Running JFR from JMC

o Right click on JVM and select “Start Flight Recording”

o Select the type of recording: Time fixed / Continuous

o Select the “Event Settings” template
o Modify the event options for the selected flight recording template (Optional)
o Modify the event details (Optional)

Page 52: Java Performance and Profiling

Running JFR from Command Line

o To produce a Flight Recording from the command line, you can use the “-XX:StartFlightRecording” option. E.g.:
  o -XX:StartFlightRecording=delay=20s,duration=60s,name=Test,filename=recording.jfr,settings=profile

o Settings are in $JAVA_HOME/jre/lib/jfr
o Use following to change log level
  o -XX:FlightRecorderOptions=loglevel=info

Page 53: Java Performance and Profiling

Continuous recording from Command Line

o You can also start a continuous recording from the command line using -XX:FlightRecorderOptions.
  o -XX:FlightRecorderOptions=defaultrecording=true,disk=true,repository=/tmp,maxage=6h,settings=default

Page 54: Java Performance and Profiling

The Default Recording

o Use default recording option to start a continuous recording
  o -XX:FlightRecorderOptions=defaultrecording=true

o Default recording can be dumped on exit
o Only the default recording can be used with the dumponexit and dumponexitpath parameters
  o -XX:FlightRecorderOptions=defaultrecording=true,dumponexit=true,dumponexitpath=/tmp/dumponexit.jfr

Page 55: Java Performance and Profiling

Running JFR using Diagnostic Commands

o The command “jcmd” can be used
o Start Recording Example:
  o jcmd <pid> JFR.start delay=20s duration=60s name=MyRecording filename=/tmp/recording.jfr settings=profile

o Check recording
  o jcmd <pid> JFR.check

o Dump Recording
  o jcmd <pid> JFR.dump filename=/tmp/dump.jfr name=MyRecording

Page 56: Java Performance and Profiling

Analyzing Flight Recordings

o JFR runtime engine dumps recorded data to files with *.jfr extension

o These binary files can be viewed from JMC
o There are tab groups showing certain aspects of the JVM and the Java application runtime such as Memory, Threads, I/O etc.

Page 57: Java Performance and Profiling

JFR Tab Groups

o General – Details of the JVM, the system, and the recording.

o Memory - Information about memory & garbage collection.

o Code - Information about methods, exceptions, compilations, and class loading.

Page 58: Java Performance and Profiling

JFR Tab Groups

o Threads - Information about threads and locks.
o I/O - Information about file and socket I/O.
o System - Information about environment.
o Events - Information about the event types in the recording.

Page 59: Java Performance and Profiling

Java Just-In-Time (JIT) compiler

Java code is usually compiled into platform independent bytecode (class files)

The JVM is able to load the class files and execute the Java bytecode via the Java interpreter.

Even though this bytecode is usually interpreted, it might also be compiled into native machine code using the JVM's Just-In-Time (JIT) compiler.

Page 60: Java Performance and Profiling

Java Just-In-Time (JIT) compiler

Unlike a conventional ahead-of-time compiler, the JIT compiler compiles the code (bytecode) only when required. With the JIT compiler, the JVM monitors the methods executed by the interpreter and identifies the “hot methods” for compilation. After identifying the hot methods, the JVM compiles their bytecode into more efficient native code.
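A small program with an obviously hot method makes this behaviour easy to observe. The sketch below is a made-up example (the class and method names are not from the slides). Running it on a HotSpot JVM with the `-XX:+PrintCompilation` flag should list `square` and `sumOfSquares` among the compiled methods once they have been called often enough. The exact output and timing depend on the JVM version and its compilation thresholds.

```java
class HotMethod {

    /** Tiny method that is called very often, so it is a likely candidate for compilation and inlining. */
    static long square(int x) {
        return (long) x * x;
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        // Repeated calls give the JVM enough invocations to mark the methods as hot.
        for (int round = 0; round < 20_000; round++) {
            total += sumOfSquares(1_000);
        }
        System.out.println(total);
    }
}
```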

Page 61: Java Performance and Profiling

JIT Optimization Techniques

o Dead Code Elimination
o Null Check Elimination
o Branch Prediction
o Loop Unrolling
o Inlining Methods

Page 62: Java Performance and Profiling

JITWatch

The JITWatch tool can analyze the compilation logs generated with the “-XX:+LogCompilation” flag.

The logs generated by LogCompilation are XML-based and have a lot of information related to JIT compilation. Hence these files are very large.

https://github.com/AdoptOpenJDK/jitwatch

Page 63: Java Performance and Profiling

Flame Graphs

o Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.

o Flame Graphs can be generated using https://github.com/brendangregg/FlameGraph
  o This creates an interactive SVG

http://www.brendangregg.com/flamegraphs.html

Page 64: Java Performance and Profiling

Types of Flame Graphs

o CPU
o Memory
o Off-CPU
o Hot/Cold
o Differential

Page 65: Java Performance and Profiling

Flame Graph: Definition

o The x-axis shows the stack profile population, sorted alphabetically
o The y-axis shows stack depth
o The top edge shows what is on-CPU, and beneath it is its ancestry
o Each rectangle represents a stack frame.
o Box width is proportional to the total time a function was profiled directly or its children were profiled
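The definition above follows from how stack samples are aggregated before drawing. The Java sketch below collapses sampled stacks into the "folded" text format (frames joined by `;`, then a space and a count) that `flamegraph.pl` reads. The class name `FoldedStacks` and the sample frames are invented for illustration; real input would come from a profiler.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FoldedStacks {

    /**
     * Counts identical stacks. Each sample lists frames from the root to the
     * frame on-CPU. A TreeMap keeps the folded stacks sorted alphabetically,
     * mirroring the x-axis ordering of a flame graph.
     */
    static Map<String, Integer> collapse(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> stack : samples) {
            counts.merge(String.join(";", stack), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up samples for illustration.
        List<List<String>> samples = Arrays.asList(
                Arrays.asList("main", "handle", "parse"),
                Arrays.asList("main", "handle", "parse"),
                Arrays.asList("main", "handle", "write"),
                Arrays.asList("main", "idle"));

        // One line per unique stack: "frame1;frame2;... count"
        collapse(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

The count per folded stack determines the box width, and the number of frames in a stack determines its height.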

Page 66: Java Performance and Profiling

Flame Graphs with Java Flight Recordings

o We can generate CPU Flame Graphs from a Java Flight Recording

o Program is available at GitHub: https://github.com/chrishantha/jfr-flame-graph

o The program uses the (unsupported) JMC Parser

Page 67: Java Performance and Profiling

Generating a Flame Graph using JFR dump

o JFR has Method Profiling Samples
  o You can view those in “Hot Methods” and “Call Tree” tabs
o A Flame Graph can be generated using these Method Profiling Samples

Page 68: Java Performance and Profiling

Profiling a Sample Program

o Get Sample “highcpu” program from https://github.com/chrishantha/sample-java-programs
  o Checkout v0.0.1 tag and build

o Get a Profiling Recording
  o java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=20s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile -jar target/highcpu-0.0.1.jar

Page 69: Java Performance and Profiling

Using jfr-flame-graph

./create_flamegraph.sh -f /tmp/sample-java-programs/highcpu/highcpu_profiling.jfr -i > flamegraph.svg

Page 70: Java Performance and Profiling

Java Mixed-Mode Flame Graphs

o With Java Profilers, we can get information about Java process only.

o However with Java Mixed-Mode Flame Graphs, we can see how much CPU time is spent in Java methods, system libraries and the kernel.

o Mixed-mode means that the Flame Graph shows profile information from both system code paths and Java code paths.

Page 71: Java Performance and Profiling

Installing “perf_events” on Ubuntu

o On terminal, type perf
o sudo apt-get install linux-tools-generic

Page 72: Java Performance and Profiling

The Problem with Java and Perf

o perf needs the Java symbol table
o JVM doesn’t preserve frame pointers by default
o Run sample program
  o java -jar target/highcpu-0.0.1.jar --exit-timeout 600
o Run perf record
  o sudo perf record -F 99 -g -p `pgrep -f highcpu`
o Display trace output
  o sudo perf script

Page 73: Java Performance and Profiling

Preserving Frame Pointers in JVM

o Run java program with the JVM flag "-XX:+PreserveFramePointer"
  o java -XX:+PreserveFramePointer -jar target/highcpu-0.0.1.jar --exit-timeout 600
o This flag is available only in JDK 8 update 60 and above.

Page 74: Java Performance and Profiling

How to generate Java symbol table

o Use a java agent to generate method mappings to use with the linux `perf` tool
o Clone & Build https://github.com/jrudolph/perf-map-agent
o Create symbol map
  o ./create-java-perf-map.sh `pgrep -f highcpu`

Page 75: Java Performance and Profiling

Generate Java Mixed Mode Flame Graph

o Run perf
  o sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60
o Create symbol map
o Generate Flame Graph
  o sudo perf script > out.stacks
  o $FLAMEGRAPH_DIR/stackcollapse-perf.pl out.stacks | $FLAMEGRAPH_DIR/flamegraph.pl --color=java --hash --width 1680 > java-mixed-mode.svg

Page 76: Java Performance and Profiling

Java Mixed-Mode Flame Graphs

o Helps to understand Java CPU Usage

o With Flame Graphs, we can see both java and system profiles

o Can profile GC as well

Page 77: Java Performance and Profiling

Does profiling matter?

Yes!

Most of the performance issues are in the application code.

Early performance testing is key. Fix problems while developing.

Page 78: Java Performance and Profiling

Thank you!