Java Garbage Collectors – Moving to Java7 Garbage-First (G1) Collector
Gurpreet S. Sachdeva
Aricent Group

Agenda

• Memory Management
• Performance Tuning
  • Garbage Collector
  • JIT Compiler
  • Heap size

GC Goals

• Minimal Footprint
• High Throughput
• Responsiveness / Low Latency

Generational Hypothesis

• Most objects die young
• Only a few live very long
• The longer they live, the more likely they are to live longer
• Old objects rarely reference young objects

Generational Garbage Collector

GC Choices

CMS operations in Young Generation (i)

• Young Generation
  • 1 Eden and 2 Survivor Spaces
• Old Generation
  • Compacted only at Full GC

CMS operations in Young Generation (ii)

• Young Generation Collection
  • Stop the World Pause
  • Live objects from young generation moved to
    • Other survivor space
    • Old Generation

CMS operations in Young Generation (iii)

• After Young Generation GC
  • Eden and 1 Survivor Space are empty
  • Objects promoted to old generation

CMS operations in Old Generation (i)

• Mark Phases
  • Initial Mark (STW)
  • Concurrent Mark
  • Remark (STW)

CMS operations in Old Generation (ii)

• Concurrent Sweeping Phase
  • Collects objects identified as unreachable during marking phases
  • In-place de-allocation of unreachable objects

CMS operations in Old Generation (iii)

• Resetting
  • All unmarked objects de-allocated
  • Prepare for next concurrent collection by clearing data structures

CMS Challenges

• Stop the World Pause (Remark phase)
• Very Large Heaps
• Fragmentation
• Hard to tune

Introducing G1

• Concurrent
  • Refinement, Marking, Cleanup
• Parallel
  • STW Pauses
  • Full GC is single threaded
• Compacting

G1 Goals

• Low Latency
• Better Predictability
• Easy to use & tune
• Move away from current situation of 3 different GC frameworks

G1 Heap Overview

• Single large contiguous space divided into fixed size regions (~ 2000)
• No physical separation between young and old generation
• Objects moved between regions during collections
• Humongous Regions for large objects
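The region layout above can be illustrated with a small sizing sketch. The slide only states "~ 2000" regions; the target of 2048 regions, the 1 MB to 32 MB power-of-two bounds, and the "larger than half a region" humongous rule are assumptions based on commonly documented HotSpot behaviour, and the class and method names are hypothetical.

```java
// A rough sketch of G1-style region sizing, NOT the HotSpot implementation.
// Assumptions: ~2048 target regions, power-of-two region size in [1 MB, 32 MB],
// objects larger than half a region are treated as humongous.
final class G1RegionSketch {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;
    static final long MAX_REGION = 32 * MB;
    static final long TARGET_REGIONS = 2048;

    /** Region size for a heap: heap / 2048, rounded down to a power of two, then clamped. */
    static long regionSize(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, MAX_REGION);
    }

    /** Number of whole regions the heap is divided into. */
    static long regionCount(long heapBytes) {
        return heapBytes / regionSize(heapBytes);
    }

    /** Assumed rule: an object larger than half a region needs humongous regions. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long heap = 4096 * MB;
        System.out.println("region size (MB): " + regionSize(heap) / MB);
        System.out.println("region count: " + regionCount(heap));
    }
}
```

Under these assumptions a 4 GB heap is split into 2048 regions of 2 MB each, so a single 4 MB array would be allocated as a humongous object.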

G1 - Young Generation GC

• Live objects evacuated (copied/moved) to
  • One or more survivor regions
  • Old regions
• STW Pause
  • Done in parallel with multiple threads
• Eden size and survivor size calculated for next young GC cycle

G1 - Old Generation GC

• Initial Marking Phase
  • Piggybacked on Young Generation GC
  • STW Pause

G1 - Old Generation GC

• Concurrent Marking Phase
  • Calculates liveness information per region
  • Empty regions can be reclaimed easily (denoted as X)

G1 - Old Generation GC

• Remark Phase
  • Completes marking of live objects in heap
  • Empty regions removed and reclaimed
  • STW Pause
  • Region liveness known for all other old generation regions

G1 - Old Generation GC

• Copying/Cleanup Phase
  • Select regions with low liveness
  • Collect (some) during next Young GC

G1 - Old Generation GC

• After Copying/Cleanup Phase
  • Selected regions collected and compacted
  • Some garbage objects may be left in old generation regions

Summary - G1 Old Generation GC

• Concurrent Marking Phase
  • Calculates liveness information per region, concurrently while the application is running
  • Identifies best regions for subsequent evacuation phases
  • No corresponding sweeping phase
• Remark Phase
  • Different marking algorithm than CMS
  • Uses Snapshot-at-the-beginning (SATB) which is much faster than what was being used in CMS
  • Completely empty regions are reclaimed
• Copying/Cleanup Phase
  • Young generation and Old generation reclaimed at the same time
  • Old generation regions selected based on their liveness
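The liveness-based selection summarised above can be sketched as a simple filter and sort. This is a toy model only: the `Region` type, the `chooseOldRegions` method, and the cap on the number of regions are hypothetical, and the real collector also weighs the pause time target when picking regions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Toy model of picking old regions by liveness; not the G1 implementation.
final class CSetSketch {
    /** Hypothetical region record: an id and the percentage of live data found by marking. */
    static final class Region {
        final int id;
        final int livePercent;

        Region(int id, int livePercent) {
            this.id = id;
            this.livePercent = livePercent;
        }
    }

    /**
     * Picks up to maxRegions old regions whose liveness is below the threshold,
     * least-live first. Regions with 0% liveness are skipped because, per the
     * slides, completely empty regions are already reclaimed at remark.
     */
    static List<Region> chooseOldRegions(List<Region> oldRegions,
                                         int liveThresholdPercent,
                                         int maxRegions) {
        List<Region> candidates = new ArrayList<>();
        for (Region r : oldRegions) {
            if (r.livePercent > 0 && r.livePercent < liveThresholdPercent) {
                candidates.add(r);
            }
        }
        Collections.sort(candidates, new Comparator<Region>() {
            @Override
            public int compare(Region a, Region b) {
                return Integer.compare(a.livePercent, b.livePercent);
            }
        });
        int count = Math.min(maxRegions, candidates.size());
        return new ArrayList<>(candidates.subList(0, count));
    }
}
```

Collecting the least-live regions first reclaims the most space for the least copying work, which is the idea behind the "Garbage-First" name.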

G1 and CMS Comparison

Features                                    G1 GC    CMS GC
Concurrent and Generational                 Yes      Yes
Releases Max Heap memory after usage        Yes      No
Low Latency                                 Yes      Yes
Throughput                                  Higher   Lower
Compaction                                  Yes      No
Predictability                              More     Less
Physical separation between Young and Old   No       Yes

Footprint Overhead

• For the same application size, as compared to CMS, the heap size is likely to be larger in G1 due to additional accounting data structures
• Remembered Sets (RSets / RSet)
  • Track object references into a given region
  • Footprint overhead less than 5%
  • Caution
    • More inter-region references => Bigger Remembered Set
    • Large Remembered Set => Slow GC
• Collection Sets (CSets / CSet)
  • Set of regions that will be collected in a GC
  • Footprint overhead less than 1%
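To make the remembered-set idea concrete, here is a deliberately simplified sketch that records, per region, which other regions hold references into it. It is an illustration under stated assumptions: real G1 remembered sets track cards rather than whole regions, and the class and method names here are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of remembered sets: region id -> ids of regions that point into it.
// Real G1 RSets work at card granularity; this only shows the bookkeeping idea.
final class RememberedSetSketch {
    private final Map<Integer, Set<Integer>> referrers = new HashMap<>();

    /** Records a reference from an object in fromRegion to an object in toRegion. */
    void recordReference(int fromRegion, int toRegion) {
        if (fromRegion == toRegion) {
            return; // references inside one region need no remembered-set entry
        }
        Set<Integer> set = referrers.get(toRegion);
        if (set == null) {
            set = new HashSet<>();
            referrers.put(toRegion, set);
        }
        set.add(fromRegion);
    }

    /** Regions that must be scanned for pointers when the given region is collected. */
    Set<Integer> referrersOf(int region) {
        Set<Integer> set = referrers.get(region);
        return set == null
                ? Collections.<Integer>emptySet()
                : Collections.unmodifiableSet(set);
    }
}
```

The sketch also shows why the slide's caution holds: every additional inter-region reference can add an entry, and a larger set means more scanning work during a pause.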

Command Line Options

• -XX:+UseG1GC
  • Tells the JVM to use G1 Garbage Collector
• -XX:MaxGCPauseMillis=200
  • Sets target for the maximum GC pause time
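One way to check which collector a JVM actually started with is to list its garbage collector MXBeans. The sketch below uses the standard `java.lang.management` API; the class name is hypothetical, and the exact bean names printed depend on the JVM version and the flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the collectors the running JVM reports through JMX.
final class ActiveCollectors {
    /** Returns the names of all garbage collector MXBeans in this JVM. */
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // When started with -XX:+UseG1GC, the names are expected to mention "G1".
        System.out.println(names());
    }
}
```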

G1 GC Tuning Options (i)

• Main goal is latency
  • If latency not a problem, then use Parallel GC
• Related goal is simplified tuning
• Most important tuning option
  • -XX:MaxGCPauseMillis=200 (default value = 200ms)
  • Influences maximum amount of work per collection
  • Best effort only

G1 GC Tuning Options (ii)

• -XX:InitiatingHeapOccupancyPercent=n
  • Heap occupancy threshold that triggers the start of a concurrent GC cycle
  • Percent of entire heap, not just old generation
• -XX:G1OldCSetRegionLiveThresholdPercent=n
  • Threshold for region to be included in a Collection Set

G1 GC Tuning Options (iii)

• -XX:G1MixedGCCountTarget=n
  • Target number of Mixed GCs per Concurrent Cycle
• Precaution
  • Fixing young generation size (-Xmn) can cause PauseTimeTarget to be ignored
  • G1 no longer respects the pause time target
  • Even if heap expands, the young generation size is fixed

G1 Logging (i)

• Three different log levels
• Log level as fine – Use -verbose:gc (equivalent to -XX:+PrintGC)
  • Sample Output

[GC pause (G1 Humongous Allocation) (young) (initial-mark) 24M->21M(64M), 0.2349730 secs]
[GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs]

• Log level as finer – Use -XX:+PrintGCDetails
  • Average, Min, and Max time displayed for each phase
  • Root Scan, RSet Updating (with processed buffers information), RSet Scan, Object Copy, Termination (with number of attempts)
  • Also shows “other” time such as time spent choosing CSet, reference processing, reference enqueuing and freeing CSet
  • Shows the Eden, Survivors and Total Heap occupancies.
  • Sample Output

[Ext Root Scanning (ms): Avg: 1.7 Min: 0.0 Max: 3.7 Diff: 3.7]
[Eden: 818M(818M)->0B(714M) Survivors: 0B->104M Heap: 836M(4096M)->409M(4096M)]

G1 Logging (ii)

• Log level as finest – Use -XX:+UnlockExperimentalVMOptions -XX:G1LogLevel=finest
  • Like finer but includes individual worker thread information.
  • Sample Output

[Ext Root Scanning (ms): 2.1 2.4 2.0 0.0 Avg: 1.6 Min: 0.0 Max: 2.4 Diff: 2.3]
[Update RS (ms): 0.4 0.2 0.4 0.0 Avg: 0.2 Min: 0.0 Max: 0.4 Diff: 0.4]
[Processed Buffers : 5 1 10 0 Sum: 16, Avg: 4, Min: 0, Max: 10, Diff: 10]

• Determine Time – How time is displayed in GC logs
  • -XX:+PrintGCTimeStamps - Shows the elapsed time since the JVM started

1.729: [GC pause (young) 46M->35M(1332M), 0.0310029 secs]

  • -XX:+PrintGCDateStamps - Adds a time of day prefix to each entry

2012-05-02T11:16:32.057+0200: [GC pause (young) 46M->35M(1332M), 0.0317225 secs]
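Pause lines in the format shown in the sample outputs can be pulled apart with a regular expression. The following is a minimal sketch, assuming the occupancy figures are always reported in megabytes ("M") as in the samples; the class and field names are hypothetical, and other units or log formats would need a broader pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts "before->after(capacity), seconds" from a G1 pause line.
// Only the "M" unit seen in the sample output is handled.
final class PauseLineParser {
    private static final Pattern PAUSE =
            Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\), (\\d+\\.\\d+) secs");

    /** Parsed values of one pause line. */
    static final class Pause {
        final int beforeMb;
        final int afterMb;
        final int capacityMb;
        final double seconds;

        Pause(int beforeMb, int afterMb, int capacityMb, double seconds) {
            this.beforeMb = beforeMb;
            this.afterMb = afterMb;
            this.capacityMb = capacityMb;
            this.seconds = seconds;
        }

        /** Heap space freed by this pause, in MB. */
        int reclaimedMb() {
            return beforeMb - afterMb;
        }
    }

    /** Returns the parsed pause, or null if the line does not match the pattern. */
    static Pause parse(String line) {
        Matcher m = PAUSE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new Pause(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Double.parseDouble(m.group(4)));
    }

    public static void main(String[] args) {
        Pause p = parse("[GC pause (G1 Evacuation Pause) (mixed) 66M->21M(236M), 0.1625268 secs]");
        System.out.println(p.reclaimedMb() + " MB reclaimed in " + p.seconds + " s");
    }
}
```

Because the pattern is applied with `find()`, it also matches lines that carry a timestamp or date-stamp prefix.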

G1 Logging Keywords (i)

• Parallel Time - Overall elapsed time of the main parallel part of the pause
• Worker Start – Timestamp at which the workers start
  • Note: The logs are ordered on thread id and are consistent on each entry

414.557: [GC pause (young), 0.03039600 secs]
[Parallel Time: 22.9 ms]
[GC Worker Start (ms): 7096.0 7096.0 7096.1 7096.1 7096.1 7096.1 7096.1 7096.1 7096.2 7096.2 7096.2 7096.2 Avg: 7096.1, Min: 7096.0, Max: 7096.2, Diff: 0.2]

• External Root Scanning - The time taken to scan the external roots (e.g., things like the system dictionary that point into the heap)

[Ext Root Scanning (ms): 3.1 3.4 3.4 3.0 4.2 2.0 3.6 3.2 3.4 7.7 3.7 4.4 Avg: 3.8, Min: 2.0, Max: 7.7, Diff: 5.7]

• Update Remembered Set - Buffers that are completed but have not yet been processed by the concurrent refinement thread before the start of the pause have to be updated.
  • Time depends on density of the cards. The more cards, the longer it will take.

[Update RS (ms): 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.1, Diff: 0.1]
[Processed Buffers : 26 0 0 0 0 0 0 0 0 0 0 0 Sum: 26, Avg: 2, Min: 0, Max: 26, Diff: 26]

G1 Logging Keywords (ii)

• Scanning Remembered Sets - Look for pointers that point into the Collection Set

[Scan RS (ms): 0.4 0.2 0.1 0.3 0.0 0.0 0.1 0.2 0.0 0.1 0.0 0.0 Avg: 0.1, Min: 0.0, Max: 0.4, Diff: 0.3]

• Object Copy - The time that each individual thread spent copying and evacuating objects

[Object Copy (ms): 16.7 16.7 16.7 16.9 16.0 18.1 16.5 16.8 16.7 12.3 16.4 15.7 Avg: 16.3, Min: 12.3, Max: 18.1, Diff: 5.8]

• Termination Time - When a worker thread is finished with its particular set of objects to copy and scan, it enters the termination protocol. It looks for work to steal and once it's done with that work it again enters the termination protocol. Termination attempt counts all the attempts to steal work.

[Termination (ms): 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0]
[Termination Attempts : 1 1 1 1 1 1 1 1 1 1 1 1 Sum: 12, Avg: 1, Min: 1, Max: 1, Diff: 0]

• GC Worker End

[GC Worker End (ms): 7116.4 7116.3 7116.4 7116.3 7116.4 7116.3 7116.4 7116.4 7116.4 7116.4 7116.3 7116.3 Avg: 7116.4, Min: 7116.3, Max: 7116.4, Diff: 0.1]

• GC worker end time – Timestamp when the individual GC worker stops.
• GC worker time – Time taken by individual GC worker thread.
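The Avg, Min, Max and Diff fields that follow the per-worker values in these log entries are plain summary statistics, with Diff being Max minus Min. A small sketch, using a hypothetical `WorkerStats` class, reproduces them from the Object Copy values shown above:

```java
// Recomputes the Avg / Min / Max / Diff summary from per-worker times.
final class WorkerStats {
    final double avg;
    final double min;
    final double max;
    final double diff; // spread between the slowest and fastest worker

    private WorkerStats(double avg, double min, double max) {
        this.avg = avg;
        this.min = min;
        this.max = max;
        this.diff = max - min;
    }

    /** Builds the summary for one log entry; at least one worker time is required. */
    static WorkerStats of(double... perWorkerMs) {
        if (perWorkerMs.length == 0) {
            throw new IllegalArgumentException("no worker times");
        }
        double sum = 0.0;
        double min = perWorkerMs[0];
        double max = perWorkerMs[0];
        for (double t : perWorkerMs) {
            sum += t;
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return new WorkerStats(sum / perWorkerMs.length, min, max);
    }

    public static void main(String[] args) {
        // Per-worker Object Copy times (ms) from the sample log entry.
        WorkerStats s = WorkerStats.of(16.7, 16.7, 16.7, 16.9, 16.0, 18.1,
                                       16.5, 16.8, 16.7, 12.3, 16.4, 15.7);
        System.out.printf("Avg: %.1f, Min: %.1f, Max: %.1f, Diff: %.1f%n",
                s.avg, s.min, s.max, s.diff);
    }
}
```

A large Diff relative to Avg suggests the work was unevenly distributed across GC worker threads.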

G1 Logging Keywords (iii)

• GC Worker Other - The time (for each GC thread) that can't be attributed to the worker phases listed previously. Should be quite low.

[GC Worker Other (ms): 2.6 2.6 2.7 2.7 2.7 2.7 2.7 2.8 2.8 2.8 2.8 2.8 Avg: 2.7, Min: 2.6, Max: 2.8, Diff: 0.2]

• Clear CT - Time taken to clear the card table of RSet scanning meta-data

[Clear CT: 0.6 ms]

• Other - Time taken for various other sequential phases of the GC pause.

[Other: 6.8 ms]

• CSet - Time taken finalizing the set of regions to collect. Usually very small; slightly longer when having to select old regions

[Choose CSet: 0.1 ms]

• Ref Proc - Time spent processing soft, weak, etc. references deferred from the prior phases of the GC.

[Ref Proc: 4.4 ms]

• Ref Enq - Time spent placing soft, weak, etc. references on to the pending list.

[Ref Enq: 0.1 ms]

• Free CSet - Time spent freeing the set of regions that have just been collected, including their remembered sets

[Free CSet: 2.0 ms]

G1 Evacuation Failure

• Promotion Failure when JVM runs out of heap regions during the GC
• Indicated by “to-space overflow” in PrintGCDetails log
• Very expensive operation

Sample Application Test

• Sample Application
  • Create and add 190 Float Arrays into an Array List
  • Each Float Array reserves 4 MB of memory, i.e. 1 x 1024 x 1024 floats x 4 bytes = 4 MB
  • 4 MB x 190 = 760 MB
  • After each iteration the arrays are released and the application sleeps for some time
  • Same steps are repeated a certain number of times
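The test program itself is not shown in the slides, so the following is only a reconstruction of the steps described above, not the author's `GCTest` source. The class name, the iteration count, and the sleep duration are assumptions; the array count is taken from the first argument, defaulting to 190.

```java
import java.util.ArrayList;
import java.util.List;

// Reconstruction of the described allocation test; not the original GCTest code.
final class GCTestSketch {
    /** 1 x 1024 x 1024 floats; at 4 bytes per float this is 4 MB per array. */
    static final int FLOATS_PER_ARRAY = 1024 * 1024;

    /**
     * Allocates arrayCount float arrays, holds them in a list, then releases them.
     * Returns the number of payload bytes that were allocated.
     */
    static long runIteration(int arrayCount, int floatsPerArray) {
        List<float[]> arrays = new ArrayList<>(arrayCount);
        long bytes = 0;
        for (int i = 0; i < arrayCount; i++) {
            float[] a = new float[floatsPerArray];
            arrays.add(a);
            bytes += 4L * a.length;
        }
        arrays.clear(); // drop the references so the arrays become garbage
        return bytes;
    }

    public static void main(String[] args) throws InterruptedException {
        int arrayCount = args.length > 0 ? Integer.parseInt(args[0]) : 190;
        int iterations = 10;     // assumption: the slides only say "certain number of times"
        long sleepMillis = 1000; // assumption: the slides only say "sleeps for some time"
        for (int i = 0; i < iterations; i++) {
            long bytes = runIteration(arrayCount, FLOATS_PER_ARRAY);
            System.out.println("iteration " + i + ": allocated " + bytes / (1024 * 1024) + " MB");
            Thread.sleep(sleepMillis);
        }
    }
}
```

With the default of 190 arrays, each iteration needs roughly 760 MB of heap, so the JVM has to be started with a sufficiently large maximum heap.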

Observations for CMS

• Command Line Arguments

java -server -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:CMS.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\Users\gusachde\workspace\Memory\bin GCTest 190

• Observations with VisualVM

Observations for G1

• Command Line Arguments

java -server -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:G1GC.log -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -classpath C:\Users\gusachde\workspace\Memory\bin GCTest 190

• Observations with VisualVM

Results Comparison

• G1 GC is able to reclaim max heap size
  • CMS is not able to do so
• Lower CPU utilization for G1 collection
• G1 Heap goes to max size in three distinct jumps
  • CMS seems to gain max heap size in initial jump

Parameters                 G1 GC         CMS GC
Time taken for execution   7 min 5 sec   7 min 56 sec
Max CPU Usage              27.3%         70.2%
Max GC Activity            2%            24%
Max Heap Size              974 MB        974 MB
Max Used Heap Size         763 MB        779 MB
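The heap figures in this comparison were observed with VisualVM. As an alternative, the same kind of numbers can be read from inside the JVM through the standard `MemoryMXBean`; the sketch below is only an illustration of that API, with a hypothetical class name, and is not how the results above were collected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Reads current heap usage through the standard memory MXBean.
final class HeapSnapshot {
    /** Current heap usage as reported by the JVM. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Converts bytes to whole megabytes. */
    static long toMb(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        MemoryUsage u = heap();
        // "committed" is the heap size currently reserved from the OS;
        // it shrinks only if the collector hands memory back.
        System.out.println("used MB:      " + toMb(u.getUsed()));
        System.out.println("committed MB: " + toMb(u.getCommitted()));
        System.out.println("max MB:       " + toMb(u.getMax())); // -1 bytes means undefined
    }
}
```

Sampling the committed size before and after the arrays are released is one way to see whether a collector returns heap memory, which is the behaviour the first bullet of the comparison describes.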

Is G1 For You

• Evaluate all other options before moving to G1
• Don’t need Low Latency
  • Use Parallel GC
• Don’t need big heap
  • Use small heap and Parallel GC
• Need big heap
  • Try CMS
  • If CMS not performing well => Tune it
  • If tuned CMS not performing well => Tune it further
  • If problem still persists => Check whether you require such a big heap and low pauses
  • Start using G1
  • Test before deploying in production

References

• JavaOne 2012 G1 Talk, Charlie Hunt, Monica Beckwith
• http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
• Poonam Bajaj’s blog
  • https://blogs.oracle.com/poonam/
• hotspot-gc-use mailing list

Thank You