jvm goes to big data


Invited Netflix talk: JVM issues in the age of scale! We take an under-the-hood look at Java locking, the memory model, overheads, serialization, UUIDs, GC tuning, CMS, and ParallelGC.


JVM goes BigData

srisatish.ambati AT gmail.com
DataStax/OpenJDK
2/28/2011
@srisatish

Motivation

• A compendium of recent JVM scale issues encountered while working with big data.

• This talk will not have details on big data.

• Thanks Sid!

Trail Ahead

• synchronized
• Non-blocking HashMap
    - A state transition view
• Collections
• Serialization
• UUID
• Garbage Collection
    - The free parameters!
    - Generations, Promotion, Fragmentation
    - Offheap
• Questions & asynchronous IO

Tools of the trade

• What the JVM is doing: dtrace, hprof, introscope, jconsole, visualvm, yourkit, gchisto, zvision

• Invasive JVM observation tools: bci, jvmti, jvmdi/pi agents, logging

• What the OS is doing: dtrace, oprofile, vtune, perf

• What the network/disk is doing: ganglia, iostat, lsof, nagios, netstat, tcpdump

synchronized

Under the hood:
– Fast path for the uncontended thin lock
– Bias threads to the lock, or bulk-revoke the bias
– Store-free biasing
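A minimal sketch (not from the talk) of the two regimes those lock optimizations target: one thread repeatedly re-taking a lock nobody else wants, and several threads contending for it. The class `SyncCounter` is hypothetical. Which fast path HotSpot actually takes depends on the JVM version; biased locking in particular was disabled by default and deprecated in JDK 15 (JEP 374).

```java
/**
 * Hypothetical sketch: one synchronized counter used in the two regimes
 * that thin/biased locking and monitor inflation handle differently.
 */
final class SyncCounter {
    private long count; // guarded by this object's monitor

    synchronized void increment() { count++; }

    synchronized long get() { return count; }

    /** Single-threaded: every acquire is uncontended, the case the thin/biased fast path is built for. */
    static long runUncontended(int n) {
        SyncCounter counter = new SyncCounter();
        for (int i = 0; i < n; i++) counter.increment();
        return counter.get();
    }

    /** Starts `threads` workers that each call increment() `perThread` times, then waits for all of them. */
    static long runContended(int threads, int perThread) throws InterruptedException {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) counter.increment(); // contended acquires
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join(); // join() also makes the workers' writes visible here
        return counter.get();
    }
}
```

Timing `runUncontended` against `runContended` with the same total number of increments is one rough way to see the cost difference between the two paths on a given JVM.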

JMM: happens-before, causality

Partial order

volatile

Piggybacking

FutureTask

BlockingQueue

JSR-133
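A small sketch of two happens-before edges from JSR-133, assuming nothing beyond the standard library: a volatile write/read pair that publishes a plain field, and "piggybacking" on a BlockingQueue hand-off so that an object with no volatile fields is still seen fully initialized. `HappensBeforeSketch` and `Box` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of two JSR-133 happens-before edges. */
final class HappensBeforeSketch {
    private int payload;             // plain field, published through the volatile flag
    private volatile boolean ready;  // volatile write -> volatile read is a happens-before edge

    void publish(int value) {
        payload = value; // (1) ordinary write
        ready = true;    // (2) volatile write: (1) is visible to any thread that later reads ready == true
    }

    /** Spins until published; the volatile read of ready orders the plain read of payload after it. */
    int awaitPayload() {
        while (!ready) Thread.yield();
        return payload;
    }

    /** A plain object with no volatile or final fields. */
    static final class Box { int a; int b; }

    /** Piggybacking: a put into a BlockingQueue happens-before the take that removes the element. */
    static int[] handOff() throws InterruptedException {
        BlockingQueue<Box> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            Box box = new Box();
            box.a = 1;
            box.b = 2;
            queue.add(box); // writes to box are ordered before the consumer's take()
        });
        producer.start();
        Box received = queue.take();
        producer.join();
        return new int[] { received.a, received.b };
    }
}
```

FutureTask gives the same kind of guarantee: whatever the task did before completing is visible to a thread that returns from `get()`.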

java.util.concurrent also holds locks!

Tomcat under concurrent load!

Non-blocking collections: Amdahl's > Moore's!

State, actions: key/value pairs! get, put, delete, _resize

ByteArray to hold data; concurrent writes use CAS

No locks, no volatile; much faster than locking under heavy load

Directly reach the main data array in 1 step

Resize as needed: copy the array to a larger array on demand. Post updates
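The core idea above (keys and values in one array, reached in one step, with slots claimed by CAS rather than a lock) can be sketched as follows. This is a hypothetical, heavily simplified illustration, not Cliff Click's high-scale-lib NonBlockingHashMap: it has no resize, no delete, and uses `AtomicReferenceArray`, whose per-slot accesses have volatile semantics, instead of the lower-level machinery the real library uses.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/**
 * Hypothetical sketch: a fixed-capacity, insert-only String->String map in which threads
 * claim key slots with compareAndSet instead of taking a lock. No resize, no delete.
 */
final class CasMapSketch {
    // Keys and values share one array: kvs[2*i] is a key, kvs[2*i + 1] is its value.
    private final AtomicReferenceArray<String> kvs;
    private final int mask; // capacity - 1, capacity being a power of two

    CasMapSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        this.kvs = new AtomicReferenceArray<>(2 * capacity);
        this.mask = capacity - 1;
    }

    private int slot(String key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & mask; // spread high bits, then index straight into the array
    }

    /** Returns the previous value, or null (also null if another thread claimed the key but has not stored a value yet). */
    String put(String key, String value) {
        int i = slot(key);
        for (int probes = 0; probes <= mask; probes++) {
            String k = kvs.get(2 * i);
            if (k == null) {
                // Try to claim the empty key slot; if another thread won the race, use its key.
                k = kvs.compareAndSet(2 * i, null, key) ? key : kvs.get(2 * i);
            }
            if (k.equals(key)) {
                return kvs.getAndSet(2 * i + 1, value); // atomic swap of the value slot
            }
            i = (i + 1) & mask; // linear probing
        }
        throw new IllegalStateException("table full: this sketch does not resize");
    }

    String get(String key) {
        int i = slot(key);
        for (int probes = 0; probes <= mask; probes++) {
            String k = kvs.get(2 * i);
            if (k == null) return null; // keys are never removed, so an empty slot ends the probe chain
            if (k.equals(key)) return kvs.get(2 * i + 1);
            i = (i + 1) & mask;
        }
        return null;
    }
}
```

A real implementation also has to copy into a larger array while other threads keep reading and writing, which is where most of the state-machine complexity lives.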

Death & Taxes: Java Overheads!

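• Cost of an 8-char String? (32-bit JVM layout: 8b headers, 4b pointers)

String object: 8b header + 12b fields + 4b pointer = 24b

char[]: 8b header + 4b length + 16b data + 4b padding = 32b

A: 56 bytes, or a 7x blowup over the 8 bytes of raw characters

• Cost of a 100-entry TreeMap<Double,Double>?

48b TreeMap + 100 × (40b TreeMap$Entry + 16b Double + 16b Double)

A: 7248 bytes, or a ~5x blowup over the 1600 bytes of raw doubles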
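The two answers are plain arithmetic once a layout is fixed. This sketch assumes the layout on the slide (8-byte object headers, 4-byte references, 8-byte alignment) and takes the 48-byte TreeMap and 40-byte TreeMap$Entry figures from the slide as given rather than deriving them. `OverheadMath` is a hypothetical helper; real sizes differ on 64-bit JVMs and across JDK versions.

```java
/**
 * Hypothetical back-of-envelope helper reproducing the slide's arithmetic under an assumed
 * 32-bit layout: 8-byte headers, 4-byte references, 8-byte object alignment.
 */
final class OverheadMath {
    static final int HEADER = 8, REF = 4, ALIGN = 8;

    /** Rounds a raw size up to the next multiple of the alignment. */
    static long align(long bytes) { return (bytes + ALIGN - 1) / ALIGN * ALIGN; }

    /** String with three int fields (12 bytes) and a reference to its char[], plus the array itself. */
    static long stringBytes(int chars) {
        long string = align(HEADER + 12 + REF);             // 24 bytes
        long charArray = align(HEADER + 4 + 2L * chars);     // header + length + UTF-16 data, padded
        return string + charArray;
    }

    /** TreeMap<Double,Double>: 48-byte map object, then per entry a 40-byte Entry and two boxed Doubles. */
    static long treeMapBytes(int entries) {
        long boxedDouble = align(HEADER + 8);                // 16 bytes
        long entry = 40;                                     // figure taken from the slide
        return 48 + entries * (entry + 2 * boxedDouble);
    }
}
```

With these assumptions `stringBytes(8)` gives 24 + 32 = 56 and `treeMapBytes(100)` gives 48 + 100 × 72 = 7248, matching the slide; dividing by the raw payload (8 bytes and 1600 bytes) gives the 7x and ~4.5x ratios.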

yourkit: memory profile

Which collection: Mozart or Bach?

Concurrency: Non-blocking HashMap, Google Collections

Overheads: watch out for per-element costs! Primitives can be hard to manage!

Sparse collections: average collection size in the enterprise is ~3
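One way to dodge per-element costs, at the price of the "hard to manage" part, is to keep primitives in parallel arrays. This is a hypothetical sketch (`SortedDoubleArrayMap` is not a library class) of an immutable, sorted double-to-double lookup table backed by two `double[]` arrays and `Arrays.binarySearch`, as an alternative to the TreeMap<Double,Double> above; it gives up cheap inserts and requires pre-sorted keys.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: immutable sorted map from double to double stored in two primitive
 * arrays, so there are no per-entry TreeMap$Entry objects and no boxed Doubles.
 */
final class SortedDoubleArrayMap {
    private final double[] keys;   // strictly increasing
    private final double[] values; // values[i] belongs to keys[i]

    SortedDoubleArrayMap(double[] sortedKeys, double[] values) {
        if (sortedKeys.length != values.length) throw new IllegalArgumentException("length mismatch");
        for (int i = 1; i < sortedKeys.length; i++) {
            // !(a < b) also rejects NaN keys
            if (!(sortedKeys[i - 1] < sortedKeys[i])) throw new IllegalArgumentException("keys must be strictly increasing");
        }
        this.keys = sortedKeys.clone();
        this.values = values.clone();
    }

    /** O(log n) lookup; returns defaultValue when the key is absent. */
    double get(double key, double defaultValue) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : defaultValue;
    }

    int size() { return keys.length; }
}
```

Under the same 32-bit assumptions as the slide, 100 entries cost two arrays of roughly 816 bytes each, about 1.6 KB instead of 7248 bytes.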

 

java.io.Serializable is S.L.O.W

• True to platform: use “transient”, ObjectStreamField[]

• Avro, Google Protocol Buffers

• Externalizable + byte[]

• Roll your own
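A minimal sketch of the "Externalizable + byte[]" option, assuming a hypothetical `UserEvent` record: the class writes exactly the fields it needs instead of relying on reflection-driven default serialization, and a derived field is rebuilt on read rather than shipped (the Externalizable counterpart of marking it `transient`).

```java
import java.io.*;

/** Hypothetical record type serialized by hand via Externalizable. */
final class UserEvent implements Externalizable {
    private long userId;
    private String action;
    private int actionLength; // derived; not written by writeExternal, rebuilt in readExternal

    public UserEvent() {} // Externalizable requires a public no-arg constructor

    UserEvent(long userId, String action) {
        this.userId = userId;
        this.action = action;
        this.actionLength = action.length();
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(userId); // only these two fields go on the wire
        out.writeUTF(action);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        userId = in.readLong(); // must read in the same order as written
        action = in.readUTF();
        actionLength = action.length();
    }

    long userId() { return userId; }
    String action() { return action; }
    int actionLength() { return actionLength; }

    static byte[] toBytes(UserEvent e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(e);
        }
        return bytes.toByteArray();
    }

    static UserEvent fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (UserEvent) in.readObject();
        }
    }
}
```

The object-stream framing (class descriptor, header) is still there; "roll your own" goes further by writing fields straight into a `DataOutputStream` or `ByteBuffer`.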

serializable

ser+deser smaller is better

https://github.com/eishay/jvm-serializers.git

avro

• Schema
– No per-datum overheads
– Optional code gen

• Types are resolved at runtime

• Untagged data

• No manually-assigned field IDs

Cons:

• Schema mismatches

• Runtime-only checks
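The trade-off between untagged data (Avro) and tagged key/value data (Protocol Buffers, Thrift's compact protocol) can be shown with plain `DataOutputStream`. This is a hypothetical illustration, not either library's real wire format: the untagged form is smaller but only readable with the exact writer schema, which is where the schema-mismatch and runtime-only-check cons come from; the tagged form spends a byte per field to say which field follows.

```java
import java.io.*;

/**
 * Hypothetical illustration (not Avro's or Protobuf's real encoding) of one record
 * {long id, int count} written untagged and tagged.
 */
final class TaggingSketch {
    /** Untagged: field identity comes only from position in an agreed schema. */
    static byte[] writeUntagged(long id, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(id);    // 8 bytes, "field 1" by position
        out.writeInt(count);  // 4 bytes, "field 2" by position
        out.flush();
        return bytes.toByteArray();
    }

    /** Tagged: a one-byte field id precedes each value, so field order no longer matters. */
    static byte[] writeTagged(long id, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(1); out.writeLong(id);
        out.writeByte(2); out.writeInt(count);
        out.flush();
        return bytes.toByteArray();
    }

    /** Trusts the schema completely: a writer with a different layout is misread silently. */
    static long[] readUntagged(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        return new long[] { in.readLong(), in.readInt() };
    }

    /** Dispatches on tags; skipping unknown tags would also need a length or wire type (omitted). */
    static long[] readTagged(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long id = 0, count = 0;
        while (in.available() > 0) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: id = in.readLong(); break;
                case 2: count = in.readInt(); break;
                default: throw new IOException("unknown tag " + tag);
            }
        }
        return new long[] { id, count };
    }
}
```

Here the untagged record is 12 bytes and the tagged one 14; real formats shrink both further with variable-length integers.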

google-proto-buffer

• Define message format in .proto file

• All data in key/value pairs

• Generate sources

• A Builder for each message class: getters on messages, setters on builders

thrift

• Type, Transport, Protocol, Version, Processors

• Separation of structure from protocol & transport

• TCompactProtocol, etc.: tag/data, compression

• TSocket, TFileTransport, etc.

• Co-located clients & servers

UUID

java.util.UUID is slow

Dominated by sha_transform costs

Leach-Salz (128-bit)

It turns out that the default PRNG (via SecureRandom) uses /dev/urandom for seed initialization:

-Djava.security.egd=file:/dev/urandom

• PRNG without file is at least 20%-40% better.

Use TimeUUIDs where possible – much faster

Alternatives: JUG – java.uuid.generator, com.eaio.uuid

~10x faster

http://github.com/cowtowncoder/java-uuid-generator

http://jug.safehaus.org/

http://johannburkard.de/blog/programming/java/Java-UUID-generators-compared.htm
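To make "use TimeUUIDs" concrete, here is a hypothetical sketch of a version-1 (time-based) UUID generator built only on `java.util.UUID`'s two-long constructor. Unlike `UUID.randomUUID()`, which draws from SecureRandom, it needs no secure random bytes per call. It assumes a random node id and clock sequence instead of a MAC address; JUG and com.eaio.uuid are the production-grade options listed above.

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical version-1 UUID sketch: 60-bit timestamp, random clock sequence and node. */
final class TimeUuidSketch {
    // Number of 100-ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;
    private static final AtomicLong lastTimestamp = new AtomicLong();
    private static final long CLOCK_SEQ_AND_NODE = buildClockSeqAndNode();

    private static long buildClockSeqAndNode() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long clockSeq = rnd.nextInt(1 << 14);         // 14-bit clock sequence
        long node = rnd.nextLong() & 0xFFFFFFFFFFFFL; // 48-bit node id
        node |= 0x010000000000L;                      // multicast bit marks a random (non-MAC) node
        return (0x8000L | clockSeq) << 48 | node;     // top two bits 10 = IETF variant
    }

    static UUID next() {
        long ts = nextTimestamp();
        long msb = (ts & 0xFFFFFFFFL) << 32          // time_low
                 | ((ts >>> 32) & 0xFFFFL) << 16     // time_mid
                 | 0x1000L                           // version 1
                 | ((ts >>> 48) & 0x0FFFL);          // time_high
        return new UUID(msb, CLOCK_SEQ_AND_NODE);
    }

    /** Strictly increasing 100-ns timestamp, even for many calls within the same millisecond. */
    private static long nextTimestamp() {
        long now = System.currentTimeMillis() * 10_000L + UUID_EPOCH_OFFSET;
        return lastTimestamp.updateAndGet(prev -> Math.max(prev + 1, now));
    }
}
```

Under a burst of calls the counter can run slightly ahead of the wall clock; a production generator bumps the clock sequence instead when that matters.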

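// java.util.UUID (JDK 6): mostSigBits and leastSigBits are the UUID's two private long fields.
/**
 * Returns a {@code String} object representing this {@code UUID}.
 *
 * <p> The UUID string representation is as described by this BNF:
 * <blockquote><pre>
 * {@code
 * UUID                   = <time_low> "-" <time_mid> "-"
 *                          <time_high_and_version> "-"
 *                          <variant_and_sequence> "-"
 *                          <node>
 * time_low               = 4*<hexOctet>
 * time_mid               = 2*<hexOctet>
 * time_high_and_version  = 2*<hexOctet>
 * variant_and_sequence   = 2*<hexOctet>
 * node                   = 6*<hexOctet>
 * hexOctet               = <hexDigit><hexDigit>
 * hexDigit               =
 *       "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
 *       | "a" | "b" | "c" | "d" | "e" | "f"
 *       | "A" | "B" | "C" | "D" | "E" | "F"
 * }</pre></blockquote>
 *
 * @return A string representation of this {@code UUID}
 */
public String toString() {
    // Five temporary Strings from digits(), then concatenation into the result.
    return (digits(mostSigBits >> 32, 8) + "-" +
            digits(mostSigBits >> 16, 4) + "-" +
            digits(mostSigBits, 4) + "-" +
            digits(leastSigBits >> 48, 4) + "-" +
            digits(leastSigBits, 12));
}

/** Returns val represented by the specified number of hex digits. */
private static String digits(long val, int digits) {
    long hi = 1L << (digits * 4);
    // Setting the bit just above the wanted digits forces leading zeros; substring(1) drops it again.
    return Long.toHexString(hi | (val & (hi - 1))).substring(1);
}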
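The JDK 6 `toString()` above builds a temporary String per field and then concatenates them. A hypothetical "optimized toString()" in the spirit of the summary below fills a single `char[36]` directly; `FastUuidFormat` is an illustrative name, not the code of any of the libraries mentioned.

```java
import java.util.UUID;

/** Hypothetical allocation-light UUID formatter: one char[36], one String. */
final class FastUuidFormat {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String format(UUID uuid) {
        long msb = uuid.getMostSignificantBits();
        long lsb = uuid.getLeastSignificantBits();
        char[] out = new char[36];
        writeHex(out, 0, msb >>> 32, 8);   // time_low
        out[8] = '-';
        writeHex(out, 9, msb >>> 16, 4);   // time_mid
        out[13] = '-';
        writeHex(out, 14, msb, 4);         // time_high_and_version
        out[18] = '-';
        writeHex(out, 19, lsb >>> 48, 4);  // variant_and_sequence
        out[23] = '-';
        writeHex(out, 24, lsb, 12);        // node
        return new String(out);
    }

    /** Writes the low (digits * 4) bits of value as lowercase hex, most significant digit first. */
    private static void writeHex(char[] out, int offset, long value, int digits) {
        for (int i = digits - 1; i >= 0; i--) {
            out[offset + i] = HEX[(int) (value & 0xF)];
            value >>>= 4;
        }
    }
}
```

It produces the same lowercase text as `UUID.toString()`, so it can be checked against the JDK before being swapped in.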

Leach-Salz UUID

PerfTop: 1485 irqs/sec kernel:18.6% exact: 0.0% [1000Hz cycles], (all, 8 CPUs)

samples   pcnt   function                                                            DSO
1882.00   26.3%  intel_idle                                                          [kernel.kallsyms]
1678.00   23.5%  os::javaTimeMillis()                                                libjvm.so
 382.00    5.3%  SpinPause                                                           libjvm.so
 335.00    4.7%  Timer::ImplTimerCallbackProc()                                      libvcllx.so
 291.00    4.1%  gettimeofday                                                        /lib/libc-2.12.1.so
 268.00    3.7%  hpet_next_event                                                     [kernel.kallsyms]
 254.00    3.6%  ParallelTaskTerminator::offer_termination(TerminatorTerminator*)    libjvm.so

PerfTop: 1656 irqs/sec kernel:59.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs)

samples   pcnt   function                                        DSO
6980.00   38.5%  sha_transform                                   [kernel.kallsyms]
2119.00   11.7%  intel_idle                                      [kernel.kallsyms]
1382.00    7.6%  mix_pool_bytes_extract                          [kernel.kallsyms]
 437.00    2.4%  i8042_interrupt                                 [kernel.kallsyms]
 416.00    2.3%  hpet_next_event                                 [kernel.kallsyms]
 390.00    2.2%  extract_buf                                     [kernel.kallsyms]
 376.00    2.1%  ThreadInVMfromNative::~ThreadInVMfromNative()   libjvm.so
 321.00    1.8%  T.3542                                          libjvm.so
 298.00    1.6%  __ticket_spin_lock                              [kernel.kallsyms]
 296.00    1.6%  Timer::ImplTimerCallbackProc()                  libvcllx.so
 255.00    1.4%  Unsafe_GetInt                                   libjvm.so

summary

Time-based UUIDs vs. java.util.UUID

use ~4 times less kernel time on creation!

No SHA library calls!

optimized toString()

Much faster than standard java.util.UUID

- Better instructions per clock (IPC) as well.

If on EC2:

Watch out for non-cacheable file access to /dev/urandom!

String theory of Java!

byte[] vs. char[]

If version > JDK 1.6u21, try -XX:+UseCompressedStrings

Append performance (GC) differs: Strings vs. StringBuffers
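A small sketch of why append performance (and garbage) differs. It uses `StringBuilder`, the unsynchronized sibling of `StringBuffer` available since Java 5; `AppendSketch` is a hypothetical name. Repeated `+=` copies the whole accumulated string every time, so total copying grows quadratically with the number of parts, while one growable buffer keeps copying roughly linear.

```java
/** Hypothetical sketch: two ways to build one String from many parts. */
final class AppendSketch {
    /** Each += creates a new String holding a full copy of everything so far: lots of short-lived garbage. */
    static String concatInLoop(String[] parts) {
        String result = "";
        for (String p : parts) result += p;
        return result;
    }

    /** One growable buffer; it doubles when full, so copying is amortized linear in the total length. */
    static String appendWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
}
```

Both return identical strings; the difference shows up in allocation rate, which feeds straight into the GC free parameters discussed later.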

com.google.common.base.Joiner

• Join text cheaply

• skipNulls() or useForNull()
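Guava is not in the standard library, so this sketch mimics the two Joiner null policies (`Joiner.on(sep).skipNulls()` and `Joiner.on(sep).useForNull(text)`) with `java.util.StringJoiner` from Java 8. It is a stdlib stand-in, not Guava's API; `JoinSketch` is a hypothetical name.

```java
import java.util.Objects;
import java.util.StringJoiner;

/** Hypothetical stdlib stand-in for Guava Joiner's skipNulls() and useForNull(). */
final class JoinSketch {
    /** Like Joiner.on(separator).skipNulls().join(parts). */
    static String joinSkipNulls(String separator, Object... parts) {
        StringJoiner joiner = new StringJoiner(separator);
        for (Object p : parts) {
            if (p != null) joiner.add(p.toString()); // nulls are dropped entirely
        }
        return joiner.toString();
    }

    /** Like Joiner.on(separator).useForNull(nullText).join(parts). */
    static String joinUseForNull(String separator, String nullText, Object... parts) {
        StringJoiner joiner = new StringJoiner(separator);
        for (Object p : parts) joiner.add(Objects.toString(p, nullText)); // nulls become nullText
        return joiner.toString();
    }
}
```

For example, joining `"a", null, "b"` with `", "` yields `a, b` under the first policy and `a, ?, b` with `"?"` as the null text under the second.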

“Null References: A billion dollar mistake”

- C.A.R. Hoare

“I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.” - QCon London, 2009

Best Practices: Garbage Collection

-verbose:gc

GC Logs are cheap even in production

-Xloggc:gc.log

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution

A bit expensive/obscure ones: -XX:PrintFLSStatistics=2 -XX:CMSStatistics=1

-XX:CMSInitiationStatistics -XX:+PrintFLSCensus
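As an in-process complement to the log flags above, the standard `java.lang.management` API exposes cumulative collection counts and times per collector. This is a hypothetical sketch (`GcStats` is an illustrative name); collector names depend on the JVM and the chosen collector (for example "ParNew" and "ConcurrentMarkSweep" under CMS, "PS Scavenge" and "PS MarkSweep" under Parallel GC), and it does not replace the per-event detail in the logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: snapshot of cumulative GC counts and times from the management API. */
final class GcStats {
    /** Collector name -> {collection count, accumulated collection time in ms}; -1 means "undefined". */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[] { gc.getCollectionCount(), gc.getCollectionTime() });
        }
        return stats;
    }
}
```

Taking two snapshots a minute apart and subtracting gives GC frequency and time per interval without parsing any log.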

Three free parameters

Allocation Rate: your workload!

Size: defines runway!

Live Set, memory

Pause times:

Stoppages!

Four free parameters

Allocation Rate: your application load!

Size: defines runway!

Live Set, system memory

Pause times:

Stoppages!

(fourth: Overheads of GC – Space & CPU.)
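The "runway" idea can be made concrete with one division, under the assumption (ours, not the slide's) that runway means free space divided by the rate that consumes it: eden divided by the allocation rate gives the time between young collections, and old-gen headroom above the live set divided by the promotion rate gives the time before the old generation must be collected. `GcRunway` and all numbers are hypothetical.

```java
/** Hypothetical back-of-envelope helper: runway = free space / consumption rate. */
final class GcRunway {
    /** Seconds until freeBytes is used up at bytesPerSec (allocation or promotion rate). */
    static double secondsOfRunway(long freeBytes, long bytesPerSec) {
        if (bytesPerSec <= 0) throw new IllegalArgumentException("rate must be positive");
        return (double) freeBytes / bytesPerSec;
    }

    /** Old-gen space left over once the steady-state live set is resident. */
    static long oldGenHeadroom(long oldGenBytes, long liveSetBytes) {
        return Math.max(0L, oldGenBytes - liveSetBytes);
    }
}
```

For example, a 512 MB eden at 256 MB/s of allocation gives about 2 s between young GCs, and a 4 GB old gen holding a 3 GB live set, promoted into at 10 MB/s, gives about 102 s of runway before a concurrent cycle must finish.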

Part I: Sizing

To be -Xmx == -Xms or not?

Young generation: use -Xmn for predictable performance

Diagram: new Object() -> the JVM allocates in eden -> survivors are copied between the survivor spaces (sized by the survivor ratio) -> after TenuringThreshold, promotion -> old gen
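To check what the sizing flags actually produced, the heap pools (eden, survivor, and old gen under a generational collector) can be listed through the standard `MemoryPoolMXBean` API. A hypothetical sketch; pool names vary by collector (for example "Par Eden Space" and "CMS Old Gen" under CMS, "PS Eden Space" and "PS Old Gen" under Parallel GC).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one line per heap memory pool with used/committed/max bytes. */
final class HeapPools {
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue; // skip code cache, perm/metaspace, etc.
            MemoryUsage u = pool.getUsage();
            if (u == null) continue;                          // getUsage() returns null for an invalid pool
            lines.add(pool.getName() + ": used=" + u.getUsed()
                    + " committed=" + u.getCommitted() + " max=" + u.getMax());
        }
        return lines;
    }
}
```

Comparing the eden and survivor lines against -Xmn and the survivor ratio is a quick sanity check that the flags took effect.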

Part II: Pick a collector!

Serial GC: Serial New + Serial Old

Parallel GC (default): Parallel Scavenge + Serial Old

-XX:+UseParallelOldGC: Parallel Scavenge + Parallel Old

-XX:+UseConcMarkSweepGC: ParNew + CMS Old, with Serial Old as the fallback

G1 (experimental)

Reading GC logs – a topic/tool

Full GC is STW

Initial Mark, Rescan/WeakRef/Remark are STW

Look for promotion failures

Look for concurrent mode failures

...
995.330: [CMS-concurrent-mark: 0.952/1.102 secs] [Times: user=3.69 sys=0.54, real=1.10 secs]
995.330: [CMS-concurrent-preclean-start]
995.618: [CMS-concurrent-preclean: 0.279/0.287 secs] [Times: user=0.90 sys=0.20, real=0.29 secs]
995.618: [CMS-concurrent-abortable-preclean-start]
995.695: [GC 995.695: [ParNew (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age 1: 29826872 bytes, 29826872 total
: 720596K->703760K(737280K), 0.4710410 secs]996.166: [CMS996.317: [CMS-concurrent-abortable-preclean: 0.218/0.699 secs] [Times: user=1.39 sys=0.10, real=0.70 secs]
(concurrent mode failure): 4100132K->784070K(5341184K), 4.7478300 secs] 4780154K->784070K(6078464K), [CMS Perm : 17033K->17014K(28400K)], 5.2191410 secs] [Times: user=5.70 sys=0.01, real=5.22 secs]
...
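Looking for the two failure modes can be automated with a few string checks. This hypothetical helper (`GcLogScan`) searches log text for "promotion failed" and "concurrent mode failure" and collects every `real=` wall-clock time. Concurrent CMS phases also report `real=` without stopping the application, so the maximum is only an upper bound on the worst pause; log formats vary by JVM version and flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner for CMS-era HotSpot GC log text. */
final class GcLogScan {
    private static final Pattern REAL_SECS = Pattern.compile("real=(\\d+\\.\\d+) secs");

    static boolean hasPromotionFailure(String log) { return log.contains("promotion failed"); }

    static boolean hasConcurrentModeFailure(String log) { return log.contains("concurrent mode failure"); }

    /** Every "real=" wall-clock time in the log, in seconds, in order of appearance. */
    static List<Double> realTimes(String log) {
        List<Double> times = new ArrayList<>();
        Matcher m = REAL_SECS.matcher(log);
        while (m.find()) times.add(Double.parseDouble(m.group(1)));
        return times;
    }

    /** Largest "real=" time: an upper bound on the longest stop-the-world event in the text. */
    static double maxRealTime(String log) {
        double max = 0.0;
        for (double t : realTimes(log)) max = Math.max(max, t);
        return max;
    }
}
```

Run over the excerpt above, it flags both failures and reports 5.22 s as the largest time, which is the full collection that followed the concurrent mode failure.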

Tuning CMS

Don’t promote too often! Frequent promotion causes fragmentation

TenuringThreshold (but avoid never-tenure)

Size the generations: min GC times are a function of the live set

Old Gen should host steady state comfortably

Avoid the CMS initiation heuristic: -XX:+UseCMSInitiatingOccupancyOnly

Use Concurrent for System.gc() -XX:+ExplicitGCInvokesConcurrent

GC Threads

Parallelize on multicores: -XX:ParallelGCThreads=4

(default: derived from the number of CPUs n on the system: n if n <= 8, else 8 + (n-8)*5/8)

-XX:ParallelCMSThreads=4

(default: derived from # of parallelgcthreads)
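The two defaults can be written out as code. This sketch reflects HotSpot's heuristics of that era as we understand them (all CPUs up to 8, then 5/8 of each additional CPU; CMS threads roughly a quarter of the parallel threads, rounded up). Treat both formulas as assumptions and confirm on a given JVM with -XX:+PrintFlagsFinal. `GcThreadDefaults` is a hypothetical name.

```java
/** Hypothetical sketch of HotSpot's (JDK 6-era) default GC thread-count heuristics. */
final class GcThreadDefaults {
    /** ParallelGCThreads: one thread per CPU up to 8, then 5 threads for every 8 additional CPUs. */
    static int parallelGcThreads(int cpus) {
        return cpus <= 8 ? cpus : 8 + (cpus - 8) * 5 / 8;
    }

    /** ParallelCMSThreads: (ParallelGCThreads + 3) / 4, i.e. about a quarter, rounded up. */
    static int parallelCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    /** Defaults this JVM's visible CPU count would produce under the formulas above. */
    static int[] forThisMachine() {
        int p = parallelGcThreads(Runtime.getRuntime().availableProcessors());
        return new int[] { p, parallelCmsThreads(p) };
    }
}
```

On a 16-CPU box this gives 13 parallel GC threads and 4 CMS threads, which is why explicitly setting -XX:ParallelGCThreads matters when several JVMs share one machine.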

Strategy A:

Tune for minimal GCs and let application data stay in eden

Fragmentation

Performance degrades over time

Inducing “Full GC” makes problem go away

Free memory that cannot be used

Round off errors

Reduce occurrence:

Use a compacting collector

Promote less often

Use uniform sized objects

Not enough large contiguous space for promotion

Small objects still can fit in the holes!

Compaction – stop the world.

Unsolved on Oracle/Sun Hotspot

Azul Systems Pauseless JVM.

JRockit Mission Control

Example

Application suddenly transitions to back-to-back full GCs.

Cannot use free mem – too many holes!
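A toy model of the "too many holes" situation, deliberately much simpler than CMS's real free lists: the old generation is an array of equal-size cells, and promotion into a non-compacting space needs a contiguous run of free cells. Half the heap can be free while no two-cell object fits; compaction fixes it at the cost of a stop-the-world pause. `FragmentationToy` is hypothetical.

```java
/**
 * Toy model (not HotSpot's CMS free-list allocator): the old gen as fixed-size cells,
 * true = occupied. Promotion without compaction needs a contiguous free run.
 */
final class FragmentationToy {
    /** Free cells anywhere in the heap. */
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) if (!u) free++;
        return free;
    }

    /** Length of the longest run of consecutive free cells. */
    static int largestFreeRun(boolean[] used) {
        int best = 0, run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** An object of `cells` can only be promoted into a single contiguous free run. */
    static boolean canPromote(boolean[] used, int cells) {
        return largestFreeRun(used) >= cells;
    }

    /** Compaction slides all live cells to the front, leaving one contiguous free run. */
    static boolean[] compact(boolean[] used) {
        int live = used.length - totalFree(used);
        boolean[] out = new boolean[used.length];
        for (int i = 0; i < live; i++) out[i] = true;
        return out;
    }
}
```

With a checkerboard of 100 cells, 50 are free but the largest run is 1, so small objects still fit in the holes while anything larger fails until the heap is compacted.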

Tools

• GCHisto

• jconsole

• VisualVM/VisualGC

• Logs

• Thread dumps

• yourkit memory profile, snapshots

GCSpy

Gone 0xff the heap !!

ByteBuffer.allocateDirect(16 * 1024 * 1024)

Can also be memory-mapped from a file region

Store long-lived objects outside the JVM heap

Managed by native I/O operations.

JNA: dynamically load and call native libraries without the compile-time declarations JNI requires

Works for limited use cases in the lab.

Ex: Terracotta, HBase, Cassandra
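A minimal sketch of the two off-heap options above, using only `java.nio`: a direct buffer from `ByteBuffer.allocateDirect` and a memory-mapped file region from `FileChannel.map`. The fixed 16-byte record layout (8-byte id, 8-byte value) is a hypothetical example; only the small `ByteBuffer` objects live on the Java heap, so the GC does not copy or scan the record bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: fixed-width records stored outside the Java heap. */
final class OffHeapSketch {
    static final int RECORD_BYTES = 16; // assumed layout: 8-byte id + 8-byte value

    /** Direct buffer: native memory, released only when the ByteBuffer object itself is collected. */
    static ByteBuffer allocateRecords(int count) {
        return ByteBuffer.allocateDirect(count * RECORD_BYTES);
    }

    static void putRecord(ByteBuffer buf, int index, long id, long value) {
        buf.putLong(index * RECORD_BYTES, id);        // absolute puts leave the buffer position alone
        buf.putLong(index * RECORD_BYTES + 8, value);
    }

    static long getValue(ByteBuffer buf, int index) {
        return buf.getLong(index * RECORD_BYTES + 8);
    }

    /** Memory-mapped file region; the mapping stays valid after the channel is closed. */
    static MappedByteBuffer mapFile(Path file, int count) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // READ_WRITE mapping beyond the current file size grows the file to fit.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, (long) count * RECORD_BYTES);
        }
    }
}
```

Neither buffer type offers an explicit free, which is exactly the de-allocation issue raised next.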

Gone 0xff the heap ?

Issues to consider: no clear API to deallocate from this region

– See jbellis's patch to JNA-179 for FreeableBuffer
– Object cleanup relegated to finalization
– Single finalizer thread, Bug ID: 4469299
– Behind WeakReference processing in jdk16u21

Workaround: -XX:MaxDirectMemorySize=<size>; manually trigger System.gc() to avoid a “leak”

Virtually there!

Ballooning driver for Memory: Disable it!

Time (TSC) issue! It's relative!

Scheduling when # of threads > # of vCPUs

Tickless (nohz) kernel

GC Thread starvation = STW pauses

Large EC2 instances are not all equal.

DirectPath I/O & VT-d, RVI: watch out for sockets!

Tools: Performance counters still not virtualized!

summary

• The JVM is still the most popular deployment platform for the new languages!

• JVM heartburn around scale!
– Serialization
– UUID
– Object overhead
– Garbage Collection
– Hypervisor

References

• Chris Wimmer, http://wikis.sun.com/display/HotSpotInternals/Synchronization

• Russell & Detlefs, http://www.oracle.com/technetwork/java/biasedlocking-oopsla2006-wp-149958.pdf

• Google Protocol Buffers, http://code.google.com/p/protobuf

• Thrift, http://incubator.apache.org/thrift/static/thrift-20070401.pdf

• Leach-Salz Variant of UUID, http://www.upnp.org/resources/draft-leach-uuids-guids-00.txt

• Hans Boehm, http://www.hpl.hp.com/personal/Hans_Boehm/gc/complexity.html

• Brian Goetz, JSR-133, http://www.ibm.com/developerworks/java/library/j-jtp03304/

• GCSpy, http://www.cs.kent.ac.uk/projects/gc/gcspy/

• Understanding GC logs, http://blogs.sun.com/poonam/entry/understanding_cms_gc_logs

• Cliff Click's high-scale-lib, http://sourceforge.net/projects/high-scale-lib/
