j-on-the-beach
Copyright 2016 Kirk Pepperdine
Moving to G1GC
About me
- Write and speak about performance tuning
- Offer performance tuning services and training
- Created jPDM, a performance tuning methodology
- Co-founder of jClarity
  - building the first generation of performance diagnostic engines
- Java Champion since 2006
G1GC will be the default collector in Java 9
What impact might this have on your application’s performance?
Questions To Be Answered
- What does a regional heap look like?
- How does the current G1GC algorithm work?
- How does performance compare to other collectors?
- What tools can we use to help us
  - engage in evidence-based tuning
  - develop strategies so GC doesn’t interfere with our application’s throughput
  - tune our application to work better with the collector
Generational Garbage Collection
- Mark-Sweep Copy (evacuation) for Young
  - eden and survivor spaces
  - both serial and parallel implementations
- Mark-Sweep (in-place) for Old space
  - Serial and Parallel with compaction
  - (mostly) Concurrent Mark-Sweep
  - incremental mode
Why another collector
- Scalability
  - pause time tends to be a function of heap size
- CMS is difficult to tune
  - dozens of parameters, some of which are very difficult to understand how to use
  - -XX:TLABWasteTargetPercent=????
- Completely unpredictable
  - well, maybe, but that is a different talk
G1GC
- Designed to scale
  - break the dependency of pause time on heap size
- Easier to tune (maybe)
  - fewer configuration options
- Predictable
  - offer pause time goals and have the collector tune itself
A G1GC heap is
- 1 large contiguous reserved space
  - specified with -mx
- split into ~2048 regions
  - size is 1, 2, 4, 8, 16, or 32m
e.g. -mx10G
Region size = 10240M/2048 = 5m, reduced to 4m
Number of regions = 10G/4m = 2560
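The arithmetic above can be sketched in a few lines of Java. This is an illustration only, not HotSpot's code: the class and method names are made up, the initial and maximum heap sizes are assumed to be equal, and the 1 MB to 32 MB bounds are assumed to match the JDK 8 and 9 builds current at the time of the talk.

```java
// Hypothetical helper, not part of the JDK: mirrors the slide's arithmetic.
final class G1RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long TARGET_REGIONS = 2048;   // "~2048 regions" from the slide
    static final long MIN_REGION = 1 * MB;     // assumed lower bound
    static final long MAX_REGION = 32 * MB;    // assumed upper bound (JDK 8/9 era)

    // Divide the heap by the target region count, round down to a power of two,
    // then clamp to the allowed range.
    static long regionSizeBytes(long heapBytes) {
        long raw = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw);
        return Math.min(powerOfTwo, MAX_REGION);
    }

    static long regionCount(long heapBytes) {
        return heapBytes / regionSizeBytes(heapBytes);
    }

    public static void main(String[] args) {
        long heap = 10L * 1024 * MB; // -mx10G
        System.out.println("region size (MB): " + regionSizeBytes(heap) / MB);
        System.out.println("regions: " + regionCount(heap));
    }
}
```

For the 10G example this gives a 4 MB region and 2560 regions, matching the numbers on the slide.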
Regions
- Free regions are kept on a free region list
- When in use, a region is tagged as
  - Eden, Survivor, Old, or Humongous
Allocation
- Free regions are kept on a free region list
- mutator threads acquire a region from the free region list
  - tag region as Eden
  - allocate objects into the region
  - when the region is full, get a new region from the free list
[Figure: heap regions, four tagged Eden]
Humongous Allocation
- allocation is larger than 1/2 a region’s size
  - the size of a region defines what is humongous
- allocate into a humongous region
  - created from a set of contiguous regions
[Figure: heap regions, four tagged Eden and one Humongous]
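As a rough sketch of the rule above, the check below treats an allocation as humongous when it is strictly larger than half a region and computes how many contiguous regions it would span. The class and method names are hypothetical, and the object size is assumed to already include the object header.

```java
// Hypothetical helper, not part of the JDK.
final class HumongousSketch {
    // "larger than 1/2 a region's size": exactly half a region is not humongous.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    // Number of contiguous regions needed to hold the object (ceiling division).
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long region = 4 * mb;
        System.out.println(isHumongous(3 * mb, region));   // larger than half of 4 MB
        System.out.println(regionsNeeded(9 * mb, region)); // spans several regions
    }
}
```

The sketch makes the dependency visible: changing the region size changes which allocations count as humongous.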
Garbage Collection Triggers
- Allotted number of Eden regions have been consumed
- Unable to satisfy a Humongous allocation
  - region fragmentation
  - may lead to a full collection
- Heap is full
  - full collection
- Metaspace threshold is reached
  - full discussion beyond the scope of this talk
Garbage Collection
- Young Gen is Mark-Sweep
- Mostly Concurrent-Mark of Tenured
  - initial-mark included with Young-Gen collection
  - concurrent-root-region-scan
  - concurrent-mark
  - remark
  - cleanup
  - concurrent-cleanup
- Mixed is mark Young, sweep Young and some Tenured
Reclaiming Memory (detailed)
- Mark-Sweep Copy (Evacuating) Garbage Collection
  - Capture all mutator threads at a safepoint
  - Complete RSet refinement
  - Scan for GC roots
  - Trace all references from GC roots
    - mark all data reached during tracing
  - Copy all marked data into a “to space”
  - Reset supporting structures
  - Release all mutator threads
RSet
- Track all external pointers to a region
  - GC roots for the region
- Expensive to update
  - mutations recorded to a refinement queue
  - update delegated to refinement threads
RSet Refinement
- Refinement queue is divided into 4 zones
  - White: no refinement threads are working
  - Green: number of cards that can be processed without exceeding 10% of pause time
  - Yellow: all refinement threads are working to keep up
  - Red: application threads are involved in refinement
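One way to picture the four colours is as thresholds on the amount of pending refinement work. The sketch below is a simplification under that assumption: the class, the enum, and the three threshold parameters are all hypothetical, and the real collector sizes these thresholds itself.

```java
// Hypothetical model of the refinement queue colours, not HotSpot code.
final class RefinementZones {
    enum Zone { WHITE, GREEN, YELLOW, RED }

    // pending: amount of queued refinement work.
    // greenStart, yellowStart, redStart: assumed ascending thresholds.
    static Zone classify(int pending, int greenStart, int yellowStart, int redStart) {
        if (pending < greenStart) return Zone.WHITE;   // no refinement threads working
        if (pending < yellowStart) return Zone.GREEN;  // work fits the pause-time share
        if (pending < redStart) return Zone.YELLOW;    // all refinement threads working
        return Zone.RED;                               // application threads help out
    }

    public static void main(String[] args) {
        int[] samples = {3, 10, 20, 40};
        for (int pending : samples) {
            System.out.println(pending + " -> " + classify(pending, 8, 16, 32));
        }
    }
}
```

The point of the model is the last branch: once the queue reaches the red threshold, refinement work lands on application threads and shows up as lost throughput.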
CSets
- Set of all regions to be swept
- Goal is to keep pauses under MaxGCPauseMillis
  - controls the size of the CSet
- CSet contains
  - all Young regions
  - selected Old regions during mixed collections
    - number / mixed GC ratio
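The selection described above can be sketched as a greedy loop: take every Young region, then add Old regions with the least live data until the predicted pause would exceed the goal. Everything here is an assumption made for illustration, including the `Region` class, the per-region predicted copy cost, and the ordering rule; G1's actual prediction model is more involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of collection-set selection, not HotSpot code.
final class CSetSketch {
    static final class Region {
        final String name;
        final boolean young;
        final double liveFraction;     // share of the region still live (from marking)
        final double predictedCopyMs;  // assumed cost of evacuating this region

        Region(String name, boolean young, double liveFraction, double predictedCopyMs) {
            this.name = name;
            this.young = young;
            this.liveFraction = liveFraction;
            this.predictedCopyMs = predictedCopyMs;
        }
    }

    static List<Region> chooseCSet(List<Region> regions, double maxPauseMs) {
        List<Region> cset = new ArrayList<>();
        List<Region> oldCandidates = new ArrayList<>();
        double predictedMs = 0;
        for (Region r : regions) {
            if (r.young) {             // all Young regions are always included
                cset.add(r);
                predictedMs += r.predictedCopyMs;
            } else {
                oldCandidates.add(r);
            }
        }
        // Least-live Old regions first: most garbage reclaimed per byte copied.
        oldCandidates.sort(Comparator.comparingDouble((Region r) -> r.liveFraction));
        for (Region r : oldCandidates) {
            if (predictedMs + r.predictedCopyMs > maxPauseMs) break; // pause goal reached
            cset.add(r);
            predictedMs += r.predictedCopyMs;
        }
        return cset;
    }
}
```

In this model a tighter pause goal shrinks the number of Old regions per mixed collection, which is why the goal effectively controls the size of the CSet.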
Heap after a Mark/Sweep
- all surviving objects are copied into (to) Survivor regions
- Eden and (from) Survivor regions are returned to the free region list
[Figure: heap regions, one tagged Humongous and one Survivor]
Promotion to Old
- Data is promoted to old
  - from survivor when it reaches the tenuring threshold
  - to prevent survivor from being overrun
    - pre-emptive or reactive
[Figure: heap regions tagged Humongous, Survivor, and Old]
Parallel Phases
- external root scanning
- updating remembered sets
- scan remembered sets
- code root scanning
- object copy
- string dedup
Serial Phases
- code root fixup
- code root migration
- clear CT
- choose CSet
- Reference processing
- redirty cards
- free CSet
Starting a (mostly) Concurrent Cycle
- Scheduled when heap occupancy reaches 45%
- initial-mark runs inside a Young collection
- mark calculates liveness
  - used for CSet inclusion decisions
[Figure: heap regions tagged Eden, Humongous, Survivor, and Old; most regions are Old]
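The 45% trigger can be expressed as a simple comparison. The sketch below assumes the check is occupancy against total heap capacity, as the slide states; the class and method names are hypothetical.

```java
// Hypothetical check, not HotSpot code.
final class IhopSketch {
    // True when used/capacity has reached the given percentage.
    // Integer arithmetic avoids rounding surprises at the boundary.
    static boolean shouldStartConcurrentCycle(long usedBytes, long capacityBytes, int ihopPercent) {
        if (capacityBytes <= 0 || ihopPercent < 0 || ihopPercent > 100) {
            throw new IllegalArgumentException("capacity must be positive and percent in 0..100");
        }
        return usedBytes * 100L >= capacityBytes * ihopPercent;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long heap = 4096 * mb; // -mx4G
        System.out.println(shouldStartConcurrentCycle(1843 * mb, heap, 45));
        System.out.println(shouldStartConcurrentCycle(1844 * mb, heap, 45));
    }
}
```

With a 4G heap the threshold sits just above 1843 MB, so the first call reports that no cycle is due and the second reports that one is.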
Flags (you want to use)
-XX:+UseG1GC
-mx4G
-XX:MaxGCPauseMillis=200
-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintTenuringDistribution
-XX:+PrintReferenceGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
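After adding these flags it can be worth confirming from inside the JVM that they took effect. The sketch below uses the standard `java.lang.management` API to list the active collectors and the flags the JVM was started with. The class name is made up, and the check relies on the assumption that G1's collector beans have names beginning with "G1", such as "G1 Young Generation".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking the running JVM's GC configuration.
final class GcFlagCheck {
    // Names of the collectors the running JVM exposes through JMX.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Command-line flags passed to the JVM, e.g. -XX:+UseG1GC.
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Assumption: G1 collector bean names start with "G1".
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1")) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("flags: " + jvmFlags());
        System.out.println("G1 active: " + looksLikeG1(collectorNames()));
    }
}
```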
Flags (you might want to use)
-XX:G1HeapRegionSize=1
-XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=5
-XX:+UnlockDiagnosticVMOptions
-XX:+G1PrintRegionLivenessInfo
-XX:SurvivorRatio=6
-XX:MaxTenuringThreshold=15
Flags (you should think twice about using)
-XX:G1MixedGCCountTarget=8
-XX:+UnlockExperimentalVMOptions
-XX:G1MixedGCLiveThresholdPercent=85/65
Flags (you should never use)
-XX:+UnlockExperimentalVMOptions
-XX:G1OldCSetRegionThresholdPercent=10
-XX:G1MaxNewSizePercent=60
-XX:G1HeapWastePercent=10
-XX:G1RSetUpdatingPauseTimePercent=10
Things that give the G1 grief
- RSet refinement
  - too much overhead puts work on mutator threads
    - affects application throughput
  - high rates of mutation place pressure on RSet refinement
    - will affect Young parallel phase and remark times
- Object copy
  - not much to say here (unfortunately)
Things that give the G1 grief
- Humongous allocations
  - definition controlled by region size
  - bigger region yields bigger RSet refinement costs
- Floating garbage
  - “dead” objects in other regions keep dead objects alive
  - negative impact on object copy costs
  - more aggressive ripeness settings
    - more costly collections
Tuning Cassandra (benchmark)
- Out of the box tuned for using CMS
  - exceptionally complex set of configurations
- Reconfigured
  - to run G1
  - given a fixed unit of work which should ideally be cleared in 15 minutes
Goal: Configure G1 to maximize MMU (minimum mutator utilization)
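MMU (minimum mutator utilization) is the smallest fraction of any time window of a given length that the application, rather than the collector, gets to run. The sketch below computes it from a list of pause intervals, which in practice would be extracted from a GC log. The class name, the input format, and the sample numbers are assumptions made for illustration; the benchmark's own pause data is not reproduced here.

```java
// Hypothetical MMU calculator, not taken from any GC tool.
final class MmuSketch {
    // Total pause time overlapping the window [from, to].
    static double pauseInWindow(double[][] pauses, double from, double to) {
        double total = 0;
        for (double[] p : pauses) {
            double overlap = Math.min(p[1], to) - Math.max(p[0], from);
            if (overlap > 0) total += overlap;
        }
        return total;
    }

    // pauses: {startMs, endMs} pairs, non-overlapping, inside [0, runMs].
    // The worst window either starts at a pause start or ends at a pause end,
    // so only those positions (plus both ends of the run) need checking.
    static double mmu(double[][] pauses, double runMs, double windowMs) {
        if (windowMs <= 0 || windowMs > runMs) {
            throw new IllegalArgumentException("window must be in (0, runMs]");
        }
        double lastStart = runMs - windowMs;
        double[] candidates = new double[2 * pauses.length + 2];
        int n = 0;
        candidates[n++] = 0;
        candidates[n++] = lastStart;
        for (double[] p : pauses) {
            candidates[n++] = p[0];            // window starts with this pause
            candidates[n++] = p[1] - windowMs; // window ends with this pause
        }
        double worstPause = 0;
        for (double c : candidates) {
            double from = Math.max(0, Math.min(c, lastStart)); // keep window inside the run
            worstPause = Math.max(worstPause, pauseInWindow(pauses, from, from + windowMs));
        }
        return (windowMs - worstPause) / windowMs;
    }

    public static void main(String[] args) {
        double[][] pauses = { {100, 150}, {180, 260}, {900, 950} }; // made-up sample data
        System.out.println("MMU over 200 ms windows: " + mmu(pauses, 1000, 200));
    }
}
```

A higher MMU means pauses are both short and spread out, which is a stricter target than simply minimizing total pause time.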
Cassandra throughput running @ 100% CPU
[Chart: throughput, y-axis 0 to 70000, x-axis 1 to 12; series: CMS and G1GC]
Run times
[Chart: run times, y-axis 00:12:35 to 00:20:55, x-axis 1 to 11]
Weak Generational Hypothesis
[Chart: Rate plotted against Time]
Performance Seminar
www.kodewerk.com
Java Performance Tuning, June 2-5, Chania, Greece