Hotspot Garbage Collection Tuning Guide http://www.jclarity.com 1 Thursday, 2 May 13

Hotspot Garbage Collection - Tuning Guide


Part 2/2 Of the Hotspot Garbage Collection series. This is the Tuning Guide portion!


Who are we?

• Martijn Verburg (@karianna)
  – CTO of jClarity
  – aka "The Diabolical Developer"
  – co-leader of the LJC

• Dr. John Oliver (@johno_oliver)
  – Research Mentat at jClarity

• Strange title? Yes, we're a start-up
  – Can read raw GC log files

• "Empirical Science Wins"


What we're going to cover

• Part I - Shining a light into the Darkness
  – Retrospective from Talk I
  – Collector Flags Ahoy
  – Tooling and Basic Data

• Part II - Setting the stage
  – When to tune GC
  – Pause times vs Throughput vs Heap Size
  – Application Lifecycle

• Part III - Real World Scenarios
  – Possible Memory Leak(s), Long Pause Times
  – Premature Promotion, System GCs, Low Throughput
  – Healthy Application, Maxed Allocation Rate


What we're not covering

• G1 Collector
  – It's supported in production now
  – But we doubt any of you are using it yet

• Non Hotspot JVMs
  – Again, most of you are using OpenJDK/Oracle.
  – Azul's Zing VM is a specialist VM you can look at


Part I - Shining a light into the dark

• Retrospective

• Collector Flags ahoy

• Reading CMS Log records

• Tooling and basic data


Java Heap Layout

Copyright - Oracle Corporation


Weak Generational Hypothesis

Copyright - Oracle Corporation


Copy Collectors

• aka "stop-and-copy"
  – Some literature will discuss "Cheney's algorithm"

• Used in many managed runtimes
  – Including Hotspot

• GC thread(s) trace from root(s) to find live objects

• Typically involves copying live objects
  – From one space to another space in memory
  – The result typically looks like a move as opposed to a copy


Mark and Sweep Collectors

• Used by many modern collectors
  – Including Hotspot, usually for old generation collection

• Typically 2 mandatory steps and 1 optional step
  1. Find live objects (mark)
  2. 'Delete' dead objects (sweep)
  3. Tidy up - optional (compact)
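
The two mandatory steps can be sketched on a toy object graph. This is an illustration only, not how Hotspot implements its collectors; the `ToyMarkSweep` and `Node` names are hypothetical and the optional compaction step is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep over an explicit object graph (illustration only).
final class ToyMarkSweep {
    // Hypothetical heap cell: a name, outgoing references and a mark bit.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        boolean marked;
        Node(String name) { this.name = name; }
    }

    // Step 1 (mark): trace from the roots and flag everything reachable.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;      // already visited, also guards against cycles
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Step 2 (sweep): drop unmarked cells, then clear marks for the next cycle.
    // Returns the number of cells reclaimed.
    static int sweep(List<Node> heap) {
        int before = heap.size();
        heap.removeIf(n -> !n.marked);
        heap.forEach(n -> n.marked = false);
        return before - heap.size();
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        a.refs.add(b);                   // a -> b is reachable from the root, c is garbage
        List<Node> heap = new ArrayList<>(List.of(a, b, c));
        mark(List.of(a));
        System.out.println("reclaimed " + sweep(heap) + " cell(s)");
    }
}
```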


More Flags than your Deity

Copyright Frank Pavageau


'Mandatory' Flags

• -Xloggc:<pathtofile>
  – Path to the log output, make sure you've got disk space!

• -XX:+PrintGCDetails
  – Minimum information for tools to help
  – Replace -verbose:gc with this

• -XX:+PrintTenuringDistribution
  – Premature promotion information
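
One way to confirm that a running JVM was actually started with these flags is to inspect its input arguments. The sketch below uses the standard `RuntimeMXBean.getInputArguments()` call; the `GcFlagCheck` class and `hasFlag` helper are hypothetical names. It assumes a JVM of the era these slides describe; on JDK 9 and later the same information is configured through unified logging (`-Xlog`) instead.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Reports whether the 'mandatory' GC logging flags were passed to this JVM.
final class GcFlagCheck {
    // True if any JVM argument starts with the given prefix.
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] wanted = {"-Xloggc:", "-XX:+PrintGCDetails", "-XX:+PrintTenuringDistribution"};
        for (String flag : wanted) {
            System.out.println(flag + " set: " + hasFlag(jvmArgs, flag));
        }
    }
}
```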


Basic Heap Sizing Flags

• -Xms<size>
  – Set the initial (minimum) size reserved for the heap

• -Xmx<size>
  – Set the maximum size reserved for the heap

• -XX:MaxPermSize=<size>
  – Set the maximum size of your perm gen
  – Good for Spring apps and App servers

• We'll cover other flags in a tuning context


Beware of Magic Happening

• When you touch GC Flags a Puppy dies

• Your Tenuring Threshold jumps to 15

• -XX:MaxTenuringThreshold=n
  – To reset this to what you really want


Tooling

• HPJMeter (Google it)
  – Solid, but no longer supported / enhanced

• GCViewer (http://www.tagtraum.com/gcviewer.html)
  – Has rudimentary G1 support

• GarbageCat (http://code.google.com/a/eclipselabs.org/p/garbagecat/)
  – Best name

• IBM GCMV (http://www.ibm.com/developerworks/java/jdk/tools/gcmv/)
  – J9 support

• jClarity Censum (http://www.jclarity.com/products/censum)
  – The prettiest and most useful, but we're biased!


Don't listen to the vendors ;-)

• Single log with consistent format?
  – You can probably grep for stuff
  – This doesn't scale

• Existing free tools are adequate*
  – *For older JVMs especially
  – Most are no longer actively maintained

• Latest tooling does more for you
  – Supports latest JVMs & Collectors
  – Has more meaningful visualisations
  – Starts to do some of the human analysis for you
  – Correlates and performs historical analysis
  – Parses certain data out that the others don't
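
To give a feel for the "grep for stuff" approach and why it doesn't scale, here is a minimal sketch that parses only the simplest record shape, `[GC <before>K-><after>K(<total>K), <pause> secs]`. The sample line and the `SimpleGcLineParser` name are assumptions for illustration; real -XX:+PrintGCDetails records vary by collector and JVM version and need many more patterns, which is the gap the tools above fill.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses one simple GC record shape; anything richer returns null.
final class SimpleGcLineParser {
    private static final Pattern LINE = Pattern.compile(
        "\\[(?:Full GC|GC)\\D*(\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    // Returns {beforeKb, afterKb, totalKb, pauseSecs}, or null if the line does not match.
    static double[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) return null;
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2)),
            Double.parseDouble(m.group(3)),
            Double.parseDouble(m.group(4))
        };
    }

    public static void main(String[] args) {
        // Sample line made up for illustration, not taken from the talk's logs.
        double[] r = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println("reclaimed KB: " + (long) (r[0] - r[1]) + ", pause secs: " + r[3]);
    }
}
```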


Summary Data


Heap Usage After GC


Recovered Heap


Allocation Rates


Pause Times


Perm Space


Tenuring Threshold


Part II - Setting the stage

• When to Tune

• Latency / Throughput / Footprint
  – aka Performance goals

• Application Lifecycle

• Know your Hardware


When to tune GC

• As part of a performance diagnostic process
  – After looking at machine metrics
  – Before reaching for an execution profiler

• It's cheap to switch on GC flags
  – It's cheap to eliminate or pin an issue on GC
  – It's not cheap to set up execution profilers

• Result is either "GC is OK" or "GC is not OK"
  – Tune the GC and/or
  – Bring out the memory profiler


Latency vs Throughput vs Footprint

• aka performance goals:
  – e.g. "Max Pause Times / 95th% Pause Times" vs
  – "Object Allocation Rate" vs
  – "Heap Size"
  – Throughput ~= % of time doing application work

• Tuning tradeoff
  – Latency x Throughput x Footprint = Z
  – You can typically tune for 2 of these 3
  – To increase Z you need to
    • increase allocated hardware OR
    • rewrite your app

• Decide what characteristics you want!
  – Before tuning
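
A latency goal such as "95th% Pause Times" is easy to check once pause durations have been pulled out of a log. The sketch below uses the nearest-rank percentile definition, which is one of several common definitions and an assumption here; the `PausePercentile` name and the sample values are hypothetical.

```java
import java.util.Arrays;

// Nearest-rank percentile over a set of GC pause durations.
final class PausePercentile {
    static double percentile(double[] pausesMs, double pct) {
        if (pausesMs.length == 0 || pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("need at least one pause and 0 < pct <= 100");
        }
        double[] sorted = pausesMs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up pause times in milliseconds.
        double[] pauses = {12, 9, 15, 11, 130, 10, 14, 13, 8, 16};
        System.out.println("max  = " + percentile(pauses, 100) + " ms");
        System.out.println("95th = " + percentile(pauses, 95) + " ms");
    }
}
```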


Latency vs Throughput vs Footprint

• Better Throughput
  – Usually means worse Latency and Footprint

• Better Latency
  – Usually means worse Throughput

• Better Footprint
  – Usually means worse Throughput


Application Lifecycle

• Very little point in tuning based off limited information
  – Have you gathered enough data?
  – Has your application gone through its typical lifecycle?
  – This is why we don't run 'Live Demos'

• Very little point in tuning off incorrect information
  – Application start-up, shutdown and batch jobs are all outliers

• You can infer amazing things from GC logs
  – When Richard went to lunch
  – When John stopped playing Minecraft
  – When Ben kicked off the weekly customer report
  – .....


Know your Hardware

• Number of CPU cores matters
  – Allocate X threads to do GC work with a concurrent collector
  – How many is 'safe'?
  – How does that affect throughput?

• Memory Bandwidth matters
  – How quickly can your hardware allocate?
  – See your manufacturer
  – Object Allocation Rates != Memory Bandwidth != Real Metric

• Use Hawkshaw to explore your hardware
  – Produces GC behaviour according to statistical models
  – http://www.github.com/jclarity/hawkshaw


Part III - Tuning Scenarios

• Tuning can make it worse!

• Grain of Salt

• Scenarios
  – Possible Memory Leak(s)
  – Long Pause Times
  – Premature Promotion
  – System GCs
  – Low Throughput
  – Healthy Application
  – Maxed Allocation Rate


Tuning can make it worse*

• Performance Tuning is an iterative process
  – Sometimes solving one problem uncovers a second, worse problem
  – e.g. Fix the app, then the database gets hammered
    • Overall performance goes down

• Only fix one aspect of GC at a time
  – Measure the next cycle with fresh eyes
  – Have you met your goals or made them worse?

• GC tuning still needs human interaction
  – Azul's Zing can/will claim otherwise.


Grain of Salt

"Nothing that we say should be held as performance tuning tips for *your* application"

"There is *always* more than one way to tune in order to meet your goal"

"Don't just use our numbers!"


A Likely Memory Leak

• Memory leaks can't truly be ascertained from a GC log
  – It could just be an undersized heap!
  – Needs human domain knowledge of the app (periodicity)

• First rule of thumb is to increase your heap
  – Rule out having an undersized heap

• Second rule of thumb is to fire up the memory profiler
  – Visual VM will do in most cases


A Likely Memory Leak

• Only 1000 seconds of log; the number of Full GCs is highly indicative. Note the trend along the bottom.


A Possible Memory Leak - I

• Note: trend along the bottom, slow leak possible. Look for cycles in the log e.g. A full day in an application's life.


A Possible Memory Leak - II

• Note: Trend along the bottom, slow leak possible. Again, look for cycles in the log.
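
The "trend along the bottom" can be put into numbers by fitting a straight line through the heap occupancy left after each Full GC. The sketch below is an ordinary least-squares slope; the `LeakTrend` name and the sample figures are hypothetical. A persistently positive slope over several full application cycles suggests a leak, but as noted earlier it cannot prove one.

```java
// Least-squares slope of post-GC heap occupancy over time.
final class LeakTrend {
    // t: timestamps (seconds), y: heap occupancy after GC (MB). Returns MB per second.
    static double slope(double[] t, double[] y) {
        if (t.length != y.length || t.length < 2) {
            throw new IllegalArgumentException("need two or more paired samples");
        }
        double meanT = 0, meanY = 0;
        for (int i = 0; i < t.length; i++) {
            meanT += t[i];
            meanY += y[i];
        }
        meanT /= t.length;
        meanY /= y.length;

        double num = 0, den = 0;
        for (int i = 0; i < t.length; i++) {
            num += (t[i] - meanT) * (y[i] - meanY);
            den += (t[i] - meanT) * (t[i] - meanT);
        }
        if (den == 0) throw new IllegalArgumentException("timestamps must not all be equal");
        return num / den;
    }

    public static void main(String[] args) {
        // Made-up samples: occupancy after each Full GC creeps upwards.
        double[] seconds = {0, 600, 1200, 1800};
        double[] liveMb = {200, 215, 228, 246};
        System.out.println("growth: " + slope(seconds, liveMb) * 3600 + " MB/hour");
    }
}
```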


Using a Memory Profiler

• Visual VM
  – Memory profiler - invasive and slow on large apps
  – Look at object ages (aka Generations)

• Look for a high number of generations
  – They're a candidate
  – Make sure you switch on 'record allocation stack traces'

• Use the allocation stack trace to find the root cause
  – Track back from core JRE classes to your code
  – Yes, it's always your code that's the problem!

• Can also try jmap -histo


Visual VM - Memory Profiler

• Note: Objects in many generations! Indicative they're leaking


Visual VM - Stack Trace

• NThreadedManagedCache$ProduceKey.run() root cause


Long Pause Times

• The #1 complaint relating to GC
  – Lots of ways to mitigate
  – From small tuning tweaks --> off Heap solutions

• User reports paused/locked application!
  – e.g. Web pages taking ages to load
  – e.g. Progress bars stalling

• Tech Support want to uninstall Java!


Long Pause Time Example

• User has set heap to: -Xms5G -Xmx5G

• NOTE: Resident Set Size ~1GB


Long Pause Time Example

• ~125ms young gen pauses & ~500ms Full GC pauses
  – OK for a web app, but this is a new prototype low latency trading app or Media Streaming app or Advertising service, oh dear!


Long Pause Time partial fix

• Reduce heap size -Xmx1500M, more frequent, shorter pauses


Long Pause Time partial fix

• ~20ms young gen pauses & ~250ms Full GC pauses, Better!


Long Pause Time 'fixed'

• Move to a CMS collector, hopefully shorter pauses

• No Full GC's! Therefore minimal Tenured pauses


Long Pause Time 'fixed!'

• ~10ms young gen pauses, ~2ms tenured pauses, Better!

• BUT: Throughput decreased from 69% down to 49% :-(


Other Long Pause Time Solutions

• Increase number of threads performing GC
  – -XX:ParallelGCThreads=N
  – Rule of thumb is to use 3/4 of the available physical cores
  – Can reduce application throughput - can be bad
  – Can increase context switching - bad

• Try an alternative collector
  – ParNew/CMS vs PSScavenge/ParOld vs iCMS vs G1 etc
  – Match the collector to your application and hardware

• Special note on G1
  – You can set pause time goals
  – BUT: We haven't reliably succeeded in getting <100ms pause times
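
The 3/4 rule of thumb for -XX:ParallelGCThreads can be turned into a starting value as sketched below. `GcThreadRuleOfThumb` is a hypothetical name, and note the assumption: `Runtime.availableProcessors()` reports logical processors visible to the JVM, which can differ from physical cores (hyper-threading, containers), so treat the result as a first guess to measure against, not a recommendation.

```java
// Derives a starting -XX:ParallelGCThreads value from the 3/4 rule of thumb.
final class GcThreadRuleOfThumb {
    // Three quarters of the given core count, rounded down, never less than one.
    static int suggestedGcThreads(int cores) {
        return Math.max(1, (cores * 3) / 4);
    }

    public static void main(String[] args) {
        // Logical processors as seen by the JVM; may not equal physical cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("-XX:ParallelGCThreads=" + suggestedGcThreads(cores));
    }
}
```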


Extreme Long Pause Time Solutions

• Azul's Zing JVM
  – Has proven low pause time goal settings
  – JCK/TCK compliant
  – Typically needs a very large heap (15GB+)

• Take memory off heap
  – Good for caches in particular

• GC in offline mode
  – Cluster the app and take nodes offline in order to run GC on them
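
As a minimal illustration of taking memory off heap, the sketch below stores a value in a direct `ByteBuffer`, whose backing memory lives outside the Java heap and so is not traced or copied by the collector. The `OffHeapSlot` class is hypothetical and holds a single value; a real off-heap cache also needs eviction, concurrency control and careful sizing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Single-value store backed by memory outside the Java heap.
final class OffHeapSlot {
    private final ByteBuffer buffer;

    OffHeapSlot(int capacityBytes) {
        // allocateDirect asks for native memory instead of a heap byte[].
        buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Overwrites the slot with a length-prefixed UTF-8 string.
    void put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.clear();
        buffer.putInt(bytes.length).put(bytes);
    }

    // Reads the stored string back; throws if nothing has been stored yet.
    String get() {
        ByteBuffer view = buffer.duplicate();   // independent position and limit
        view.flip();
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapSlot slot = new OffHeapSlot(1024);
        slot.put("cached value");
        System.out.println(slot.get() + " (off heap: " + slot.isOffHeap() + ")");
    }
}
```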


Premature Promotion

• User reports more pauses and/or longer pauses

• Tech support reports there are more full GC's

• Objects are promoted to Tenured too early
  – Recall the Weak Generational Hypothesis!
  – This causes more Old Gen collections
    • Which can lead to more Full GCs


Premature Promotion Example

Customer had set:

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+PrintGCDetails
-Xloggc:gc.log
-Xmx1024m
-XX:+PrintTenuringDistribution
-XX:NewRatio=2
-XX:MaxTenuringThreshold=4

NewRatio=2 means young gen gets ~1/3 of the total heap
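
The arithmetic behind that statement: -XX:NewRatio=n sets the old:young size ratio to n:1, so the young generation gets 1/(n+1) of the heap. A small sketch follows; `NewRatioMath` is a hypothetical name, and the figures are approximate because the JVM aligns generation sizes.

```java
// Young generation share implied by -XX:NewRatio.
final class NewRatioMath {
    // old:young = newRatio:1, so young is 1/(newRatio + 1) of the heap.
    static double youngFraction(int newRatio) {
        return 1.0 / (newRatio + 1);
    }

    // Approximate young generation size in MB for a given heap size.
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    public static void main(String[] args) {
        // -Xmx1024m as in the customer's settings.
        System.out.println("NewRatio=2: ~" + youngMb(1024, 2) + " MB young gen");
        System.out.println("NewRatio=1: ~" + youngMb(1024, 1) + " MB young gen");
    }
}
```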


Premature Promotion Example

• Note: ~26% of objects promoted at age 1


Premature Promotion 'Fixed'

• We dropped NewRatio to 1, Premature Promotion fell to ~4%
  – Weak Generational Hypothesis is a better fit
  – This gives the Young Gen ~1/2 the heap


System GC's

• User reports frequent pauses
  – System GCs are Full GCs!

• Tech support reports there are more Full GCs
  – With this funny System wording in the log

• System GCs often interfere with the GC subsystem
  – JVM no longer resizes heap based on runtime info

• Caused by System.gc() in code or an RMI call
  – Very occasionally used to solve a problem
  – System.gc() is almost always honoured
  – You can disable it with -XX:+DisableExplicitGC
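
Whether an explicit System.gc() call is honoured on a given JVM can be observed through the standard `GarbageCollectorMXBean` counters, as sketched below. `ExplicitGcProbe` is a hypothetical name. Run it once normally and once with -XX:+DisableExplicitGC to compare; the counters can also move because of unrelated collections, so this is a rough probe only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Counts collections before and after an explicit System.gc() call.
final class ExplicitGcProbe {
    // Sum of collection counts across all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc();                                // a request; ignored with -XX:+DisableExplicitGC
        long after = totalCollections();
        System.out.println("collections triggered: " + (after - before));
    }
}
```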


System GC example

• NOTE: 34,000 System GCs, one every 1/2 second
  – Throughput 51% - Unhappy Minecraft players!


System GC calls 'Fixed'

• -XX:+DisableExplicitGC

• Throughput went to 99.8% - Happier Minecraft players


Low Throughput

• User reports slow application
  – e.g. Batch job fails to complete on time

• Tech support reports there are lots of GCs

• Lots of small GCs can also be bad!
  – Your application threads aren't able to allocate objects
  – i.e. Low Throughput

• Throughput increases when system is quiet
  – Be careful to analyse the right period of activity


Low Throughput example 1/4

• 61 seconds in total pause time, log is only 170 seconds long

• Throughput is 64% --> Rule of thumb, should be 95%+
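
The 64% figure follows from the definition used earlier (throughput is the share of time spent doing application work): 1 - 61/170. A minimal sketch of that calculation, with `GcThroughput` as a hypothetical name:

```java
// Throughput as the percentage of elapsed time not spent in GC pauses.
final class GcThroughput {
    static double throughputPercent(double totalPauseSecs, double elapsedSecs) {
        if (elapsedSecs <= 0) {
            throw new IllegalArgumentException("elapsedSecs must be positive");
        }
        return 100.0 * (1.0 - totalPauseSecs / elapsedSecs);
    }

    public static void main(String[] args) {
        // 61 seconds of pauses in a 170 second log, as on this slide.
        System.out.println(Math.round(throughputPercent(61, 170)) + "%");
    }
}
```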


Low Throughput example 2/4

• Lots of small pauses from various collectors, which ones?


Low Throughput example 3/4

• ~25% of time spent in young GC & ~5-10% in Full GCs (concurrent mode failures, CMFs)


Low Throughput example 4/4

• Object allocation hitting max heap size
  – Able to recover memory, so no leak, needs a bigger heap!


Low Throughput 'Fixed' 1/4

• Increased footprint to -Xmx1024M


Low Throughput 'Fixed' 2/4

• Far fewer pauses from Full GCs (CMFs) - just looks nicer!
  – Still lots of young gen pauses


Low Throughput 'Fixed' 3/4

• ~15% of time spent in young GC & ~0% in Full GCs


Low Throughput 'Fixed' 4/4

• Note: 33 seconds out of 170, ~81% Throughput, Better!


Low Throughput 'Really Fixed' 1/2

• Switched to PSYoungGen collector (from ParNew)
  – Worth trying as young gen collections are dominant


Low Throughput 'Really Fixed' 2/2

• Note: 9 seconds out of 170, ~95% Throughput, Best!


Healthy Application

• What is healthy? It depends!

• Throughput
  – Typically a 95%+ throughput is good

• Pause times
  – < 1sec is good for generic web apps

• Footprint
  – Smaller == Fewer live objects to track == Better?


Healthy Application

• Saw tooth pattern

• Bottom of troughs trend line is flat


Healthy Minecraft Client!

• Note: JVM resizing itself, you let IT do the work!


Maxed Allocation Rate

• User reports slow application behaviour

• Tech support has no idea why!
  – Normally you'd do a full performance diagnostic
  – But we can look at GC cheaply

• GC logs can help with non GC problems!
  – Memory Bandwidth limits are being hit
  – Not a GC problem!

• More common in virtualised environments
  – What else on the hardware is using bandwidth?


Not Maxed Allocation Rate Example


Max Allocation Rate Example

• 8GB/sec - could be getting close to real memory bandwidth
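
An allocation rate like this is usually estimated from consecutive GC records: the heap occupancy just before a collection minus the occupancy just after the previous one, divided by the time between them. The sketch below assumes that method; `AllocationRate` is a hypothetical name and the sample figures are made up.

```java
// Estimates allocation rate from two consecutive GC records.
final class AllocationRate {
    // afterPrevGcKb: occupancy after the previous GC; beforeNextGcKb: occupancy
    // just before the next GC; secondsBetween: time between the two events.
    static double mbPerSec(long afterPrevGcKb, long beforeNextGcKb, double secondsBetween) {
        if (secondsBetween <= 0) {
            throw new IllegalArgumentException("secondsBetween must be positive");
        }
        return (beforeNextGcKb - afterPrevGcKb) / 1024.0 / secondsBetween;
    }

    public static void main(String[] args) {
        // Made-up figures: 83000K left after one GC, 325407K used 0.5s later.
        System.out.println(mbPerSec(83_000, 325_407, 0.5) + " MB/sec");
    }
}
```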


Max Allocation Rate Example

Hard limit at ~8GB/sec (8e+06 on graph)


Max Allocation Rate 'Fixes'

• Lots you can do!

• Stop allocating so much!
  – Get out your Memory profiler
  – Alter the application's object allocation behaviour

• Get better hardware!
  – CPU
  – Faster Bus
  – Faster RAM

• Don't virtualise/share
  – Have your application be the only thing on that hardware


Summary

• You need to understand some basic GC theory
  – Work with the Weak Generational Hypothesis
  – See http://www.insightfullogic.com for blog posts

• Turn on GC logging!
  – It has low overhead*
  – Reading raw log files is hard
  – Use tooling!

• Tradeoff: Pause Times vs Throughput vs Heap Size
  – Use tools to help you tweak
  – "Empirical Science Wins!"


Join our performance community

http://www.jclarity.com

Martijn Verburg (@karianna)
Dr. John Oliver (@johno_oliver)
