Transcript
Page 1: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM

C. Scott Andreas
Pizza, Beer, and Tech Talks
November 17, 2011

Page 2: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

What’s ESP / CEP?

• Event Stream Processing: Selecting events on dimensions among a stream of moving data, maintaining them for a brief period, emitting aggregations.

• Complex Event Processing: Identifying correlations between events, predicting trends, and programmatically reacting to emergent trends.

Page 3: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

ESP and Network Analytics

• Packet flows are event streams with many dimensions.

• Blast them into the engine, select over the stream, emit aggregations based on queries.

• IPFIX data flows in, JSON comes out.

Page 4: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Back of the Envelope

• 500 Mbps of data comes into the JVM; xxx Mbps of data goes out of the JVM

• This memory must be allocated, retained for processing, freed, and collected.

• Actual allocation rates are far higher than data in / out (memory is also used for deserializing, aggregations, etc.).
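
A rough sketch of the arithmetic behind these bullets, assuming "500 Mbps" means 500 megabits per second in decimal units. The class name and the 4x amplification factor are hypothetical; the talk gives no figure for how much extra allocation deserialization and aggregation add.

```java
/** Back-of-the-envelope conversion from network ingest rate to heap allocation rate. */
final class AllocationEnvelope {

    /** Converts megabits per second (10^6 bits) to bytes per second. */
    static long bytesPerSecond(long megabitsPerSecond) {
        return megabitsPerSecond * 1_000_000L / 8L;
    }

    /**
     * Scales raw ingest by an assumed amplification factor that stands in for
     * deserialization copies, intermediate objects, and aggregation state.
     */
    static long allocatedBytesPerSecond(long megabitsPerSecond, double amplification) {
        return Math.round(bytesPerSecond(megabitsPerSecond) * amplification);
    }

    public static void main(String[] args) {
        System.out.println("ingest bytes/s: " + bytesPerSecond(500));
        // 4.0 is an illustrative guess, not a number from the talk.
        System.out.println("allocated bytes/s at assumed 4x: " + allocatedBytesPerSecond(500, 4.0));
    }
}
```

At 500 Mbps the raw ingest alone is 62.5 MB every second, so even a modest amplification factor pushes the allocation rate into the hundreds of megabytes per second.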

Page 8: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Moment of Pause

• Don’t touch the knobs unless you need to

• Server defaults are a decent place to start for local development

• Defaults shipped with Cassandra are decent for bimodal GC profiles

• Basic rule of thumb: if you’re aggressively tuning garbage collection, you can trade hours of frustration for ~10% gain

Page 9: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Generational Garbage Collection

• Modern JVMs divide heap space up into multiple “generations.”

• Most applications have a lot of objects which live for a very short time, and a lot which live (nearly) forever.

• Generational collection enables the JVM to collect unused memory more efficiently by avoiding unnecessarily scanning heap / object graphs for references or free regions.
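
One way to see the generations on a running JVM is to list its heap memory pools through the standard `java.lang.management` API. This is a minimal sketch; the class name is hypothetical, and the pool names printed depend on which collector the JVM was started with.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the heap memory pools (the "generations") of the current JVM. */
final class HeapPools {

    /** Returns the names of all pools that belong to the garbage-collected heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip code cache, metaspace, and other non-heap pools
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the maximum is undefined.
            System.out.printf("%-30s used=%d committed=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
    }
}
```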

Page 10: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

first attempt: “deploy the g1”

Page 13: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

G1 Collector

• Hundreds of tiny 1ms collections / second rather than ParNew’s ~100 - 200ms larger collections.

• Capable of meeting ambitious pause targets.

• Powered by a gang of threads working in parallel

• ...cooperating to chew through CPU like it’s free.

Page 14: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

second attempt: you’re gonna laugh

Page 15: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

the unsafe

Page 16: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Unsafe

• An OpenJDK/HotSpot class exposing direct access to the underlying VM, OS, and memory.

• This includes the ability to allocate, manage, and free memory.

• Perhaps we can outsmart the JVM and do a better job than it!
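
For readers who have not seen it, a minimal sketch of manual memory management through `sun.misc.Unsafe` might look like the following. This is not the code from the talk. The class name is hypothetical, obtaining the instance through the private `theUnsafe` field is the commonly used reflection workaround, and recent JDKs deprecate these memory-access methods for removal, so treat it as an illustration of the idea only.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Allocates, writes, reads, and frees eight bytes outside the garbage-collected heap. */
final class OffHeapLong {

    /** Fetches the singleton Unsafe instance via reflection (unsupported, internal API). */
    static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Stores a long in manually allocated memory and reads it back. */
    static long roundTrip(long value) {
        Unsafe unsafe = unsafe();
        long address = unsafe.allocateMemory(Long.BYTES); // invisible to the GC
        try {
            unsafe.putLong(address, value);
            return unsafe.getLong(address);
        } finally {
            unsafe.freeMemory(address); // forgetting this leaks native memory
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42L));
    }
}
```

Every allocation here needs an explicit free, which is exactly the bookkeeping the garbage collector normally does on the application's behalf.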

Page 20: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Learned While Astray

• Finalization occurs in a single thread.

• Jumping from native finalization back into Java is expensive.

• Attempting to outsmart the garbage collector by creating hundreds of thousands of tiny ByteBuffers is...a thing.

• Java’s collectors are very good at collecting garbage. Your home-grown in-app GC go-kart is probably not.

Page 21: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

returning to earth :: attempt 3[a]

Page 22: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Lessons from Science

• Your rate of “freeing” must be equal to or exceed your rate of object allocation on the heap.

• High rates of allocation speed up heap fragmentation, which compounds the problem.

• Creating less garbage reduces your rate of allocation (and freeing).

• This means less work for the garbage collector.
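
As a hypothetical illustration of creating less garbage (not taken from the talk), consider aggregating bytes per port over a time window. A `Map<Integer, Long>` would allocate boxed objects on every update; the sketch below instead allocates one primitive array up front and reuses it for every window. The class name and the choice of port as the dimension are assumptions for the example.

```java
import java.util.Arrays;

/** Per-port byte counters backed by a single reusable primitive array. */
final class FlowCounters {

    // Allocated once; 65,536 slots cover every possible TCP/UDP port number.
    private final long[] bytesByPort = new long[65_536];

    /** Adds to the running total for a port without allocating any objects. */
    void record(int port, long bytes) {
        bytesByPort[port] += bytes;
    }

    /** Returns the bytes recorded for a port in the current window. */
    long total(int port) {
        return bytesByPort[port];
    }

    /** Starts a new window by zeroing the array in place instead of replacing it. */
    void reset() {
        Arrays.fill(bytesByPort, 0L);
    }

    public static void main(String[] args) {
        FlowCounters counters = new FlowCounters();
        counters.record(443, 1_500);
        counters.record(443, 1_500);
        System.out.println(counters.total(443));
        counters.reset();
        System.out.println(counters.total(443));
    }
}
```

The trade-off is a fixed 512 KB of long-lived memory in exchange for zero per-event allocation on this path.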

Page 23: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

best way to help out the gc ::

Page 24: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

PRODUCE LESS GARBAGE

Page 25: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Breaking out YourKit

Page 26: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

attempt 3[b]: responsible tuning of the old hat

Page 27: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Optimizing for Infant Mortality

• Java 6 AMD64 (server) defaults to allocating 1/3 of heap to the new gen, 2/3 to the old gen.

• ESP/CEP workloads place tremendous pressure on the newgen. The vast majority of objects survive less than five seconds.

• Experiment: Allocate 80% of heap to the new gen, set a higher tenuring threshold, and lean hard on the ParNew collector.

default newgen ratios in java 6
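
A sketch of how the experiment above might translate into startup flags for a Java 6 era HotSpot JVM. The talk does not list its exact settings, so the heap size and tenuring threshold used here are assumptions, and the helper class is hypothetical. The flags themselves (`-Xmn`, `-XX:MaxTenuringThreshold`, `-XX:+UseParNewGC`, `-XX:+UseConcMarkSweepGC`) are HotSpot options of that era; later JDKs deprecated and then removed CMS.

```java
import java.util.Locale;

/** Derives new-generation sizing flags from a heap size and a target fraction. */
final class NewGenSizing {

    /** New-generation size in megabytes for a given heap size and fraction. */
    static long newGenMegabytes(long heapMegabytes, double newGenFraction) {
        return Math.round(heapMegabytes * newGenFraction);
    }

    /** Builds a flag string with a fixed-size heap and an explicitly sized new generation. */
    static String flags(long heapMegabytes, double newGenFraction, int tenuringThreshold) {
        return String.format(Locale.ROOT,
                "-Xms%dm -Xmx%dm -Xmn%dm -XX:MaxTenuringThreshold=%d"
                        + " -XX:+UseParNewGC -XX:+UseConcMarkSweepGC",
                heapMegabytes, heapMegabytes,
                newGenMegabytes(heapMegabytes, newGenFraction),
                tenuringThreshold);
    }

    public static void main(String[] args) {
        // Assumed values: a 10 GB heap, 80% new gen, tenuring threshold of 15.
        System.out.println(flags(10_240, 0.8, 15));
    }
}
```

With these assumed inputs the new generation comes to 8192 MB, compared with roughly 3413 MB under the one-third default described on the slide.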

Page 28: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

CMS Collector

• Guardian of the tenured generation, favorite workhorse for years.

• Primarily concurrent, easier on the CPU than G1.

• ...But it contains a significant pause phase and is less suited to meeting low pause targets.

Page 29: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

ParNew Collector

• Designed for the small, but works great in the large. Excellent throughput, parallel collection.

• Can collect ~5GB in ~200ms on a quad-core Xeon w/HT.

• 200ms pause every several seconds favorable compared to less frequent multi-second pauses and promotion failures.
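
To check figures like these on your own workload, the standard `GarbageCollectorMXBean` API reports collection counts and accumulated collection time per collector. The sketch below is a rough measurement aid, not a substitute for GC logs: the class name is hypothetical, the reported time is approximate, and for concurrent collectors it is not purely stop-the-world time. The 5-second interval in `main` is an assumed reading of "every several seconds".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Summarizes collection counts and average collection time per collector. */
final class GcPauseReport {

    /** Average collection time in milliseconds, or 0 when nothing has been collected. */
    static double averageMillis(long count, long totalMillis) {
        return count <= 0 ? 0.0 : (double) totalMillis / count;
    }

    /** Fraction of wall-clock time spent paused for a given pause length and interval. */
    static double pauseFraction(double pauseMillis, double intervalMillis) {
        return pauseMillis / intervalMillis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if undefined for this collector
            long millis = gc.getCollectionTime(); // approximate accumulated time
            System.out.printf("%-25s collections=%d totalMs=%d avgMs=%.1f%n",
                    gc.getName(), count, millis, averageMillis(count, millis));
        }
        // A 200 ms pause every 5 s (assumed interval) costs about 4% of wall-clock time.
        System.out.printf("pause fraction: %.3f%n", pauseFraction(200, 5_000));
    }
}
```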

Page 30: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Explosions in the Barrel

Page 33: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

“real-time” and the jvm

Page 34: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Real Time and the JVM

• Real Time: Ability to meet specific targets with low variance is critical to the bare minimum functionality of the product (e.g., air bags).

• “Soft” Real Time: Ability to meet targets is important but not critical. Value of system’s functionality is diminished but not eliminated by delay.

Page 35: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Real Time and “The Pause”

• To what extent can a system which can endure pauses of unpredictable duration be considered “real-time”?

• Is it sufficient to mitigate the frequency and duration of pauses for a system to still deliver value as “soft real-time”?

• Is the alternative worth the cost?

Page 36: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

what does your app sound like?

Page 37: Scott Andreas - Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM, Boundary Tech Talks 11/17/11

Garbage, Garbage Everywhere: GC Strategies for Event Processing Systems on the JVM

C. Scott Andreas
Pizza, Beer, and Tech Talks
November 17, 2011

