
Page 1: Advanced off heap ipc

Advanced Off Heap IPC in Java

using OpenHFT

(How does it change your design)

Peter Lawrey, CEO, Higher Frequency Trading.

Page 2: Advanced off heap ipc

Who are we

Higher Frequency Trading is a small consulting and software development house specialising in low latency, high throughput software.

- 8 developers in Europe and the USA
- Sponsors HFT-related open source projects
- Core Java engineering

Page 3: Advanced off heap ipc

What is our OSS

Key OpenHFT projects:

- OpenHFT Chronicle: low latency logging, event store and IPC (record / log everything)
- OpenHFT Collections: cross-process embedded persisted data stores (only need the latest)

Millions of operations per second. Micro-second latency.

Page 4: Advanced off heap ipc

Why use Java?

A rule of thumb is that 90% of the time is spent in 10% of the code.

Writing in Java means that only about 10% of your code may need heavy optimisation.

Writing in C or C++ means that 100% of your code will be harder to write, or you have to bridge between languages with JNI, JNA or JNR-FFI.

Low level Java works well with natural Java.

Page 5: Advanced off heap ipc

Problem: The Java heap size

As the heap gets larger, the worst case GC pauses increase into the seconds.

Page 6: Advanced off heap ipc

Solution: Use memory off the heap

This often means using a database, but embedded data is much faster. OpenHFT supports embedded data, off heap, shared across multiple processes.
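For context, plain Java can already place data outside the garbage-collected heap with a direct ByteBuffer. A minimal sketch (standard NIO, not the OpenHFT API):

import java.nio.ByteBuffer;

public class OffHeapSketch {
    public static void main(String[] args) {
        // Allocate 64 MB outside the Java heap; the GC neither scans nor moves it.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 << 20);
        offHeap.putLong(0, 42L);          // write at an absolute offset
        long value = offHeap.getLong(0);  // read it back without creating objects
        System.out.println(value);
    }
}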

Page 7: Advanced off heap ipc

How is off heap memory used?

- Memory mapped files
- Durable on application restart
- One copy in memory
- Can be used without serialization / deserialization
- Thread safe operations across processes
- Around 8x faster than System V IPC
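A minimal sketch of the underlying mechanism using plain NIO; the file name here is an assumption, and OpenHFT layers its thread safe cross-process operations on top of mappings like this:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileSketch {
    public static void main(String[] args) throws Exception {
        // Map a file into memory. Every process mapping the same file shares
        // the same physical pages (one copy in memory), and the data survives
        // a restart because the OS writes the pages back to the file.
        try (RandomAccessFile raf = new RandomAccessFile("/dev/shm/shared.dat", "rw")) {
            MappedByteBuffer mapped = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, 64 << 10);
            mapped.putLong(0, System.nanoTime()); // visible to other processes mapping this file
        }
    }
}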

Page 8: Advanced off heap ipc

Use case: SharedHashMap

Large machine: 240 cores, 3 TB of memory. 80 JVMs sharing 50 GB of data, with one copy in memory. Between 40 and 350 nanoseconds latency.

Page 9: Advanced off heap ipc

Creating the Map

import java.io.File;
import net.openhft.collections.SharedHashMap;
import net.openhft.collections.SharedHashMapBuilder;

SharedHashMap<String, BondVOInterface> shm =
        new SharedHashMapBuilder()
                .generatedValueType(true) // generate an off heap flyweight for the value interface
                .entrySize(320)           // bytes reserved per entry
                .create(
                        new File("/dev/shm/BondPortfolioSHM"),
                        String.class,
                        BondVOInterface.class);
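With generatedValueType(true), the builder generates an off heap flyweight implementing the value interface. A sketch of what BondVOInterface might look like (the field set beyond the coupon is an assumption):

// A value interface: matching getter/setter pairs over fixed-size fields.
// The generated flyweight maps each field to a fixed offset within the
// 320-byte entry, so gets and sets touch shared memory directly and
// create no objects.
public interface BondVOInterface {
    long getIssueDate();       // assumed field
    void setIssueDate(long issueDate);

    long getMaturityDate();    // assumed field
    void setMaturityDate(long maturityDate);

    double getCoupon();        // used on the next slide
    void setCoupon(double coupon);
}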

Page 10: Advanced off heap ipc

Using the Map

import net.openhft.lang.model.DataValueClasses;

// old style map.get(), creates objects
BondVOInterface bond = shm.get("369604101");

// re-using an off heap reference instead (DataValueClasses is from OpenHFT Java-Lang)
BondVOInterface bondRef = DataValueClasses.newDirectReference(BondVOInterface.class);

// get or create the entry for the key, pointing bondRef at it
shm.acquireUsing("369604101", bondRef);

bondRef.setCoupon(4.25);

double coupon = bondRef.getCoupon();
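After acquireUsing(), bondRef points directly at the entry's off heap bytes, so setCoupon() and getCoupon() read and write the shared memory in place; other processes see the update without any serialization.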

Page 11: Advanced off heap ipc

Problem: You have more data than memory

Using a heap larger than main memory will kill performance, if not lock up your machine.

OpenHFT supports dramatic over-committing with modest impact on performance.

On Linux, sparse files are supported, and data is swapped asynchronously by the OS.

Page 12: Advanced off heap ipc

Over-committing your size.

import java.io.File;
import java.util.Arrays;
import net.openhft.collections.SharedHashMap;
import net.openhft.collections.SharedHashMapBuilder;

// Reserve far more capacity than main memory; on Linux the backing
// file is sparse, so only pages actually written consume memory or disk.
File file = File.createTempFile("over-sized", "deleteme");
SharedHashMap<String, String> map = new SharedHashMapBuilder()
        .entrySize(1024 * 1024)  // up to 1 MB per entry
        .entries(1024 * 1024)    // room for ~1 million entries
        .create(file, String.class, String.class);

for (int i = 0; i < 1000; i++) {
    char[] chars = new char[i];
    Arrays.fill(chars, '+');
    map.put("key-" + i, new String(chars));
}

Page 13: Advanced off heap ipc

By over-committing, we avoid resizing

System memory: 7.7 GB
Extents of map: 2199.0 GB
Disk used: 13 MB
Address range: 7d380b7bd000 - 7f380c000000

This was run on a laptop.

BTW: Only one memory mapping is used.

Page 14: Advanced off heap ipc

How does this appear in “top”

Note: the third program “java” has a virtual memory size of 2051 GB.

Page 15: Advanced off heap ipc

SharedHashMap design

You can cache data shared between processes in a thread safe manner.

This allows you to split your JVMs how you want, or to add monitoring or control in an external process.

It can support more data than main memory, and avoids the need to resize or collect garbage.
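For example, a monitoring process can attach to the map created earlier simply by building against the same file (a sketch, assuming the same builder settings as on the creating side):

import java.io.File;
import net.openhft.collections.SharedHashMap;
import net.openhft.collections.SharedHashMapBuilder;

public class MonitorProcess {
    public static void main(String[] args) throws Exception {
        // Building against an existing file attaches to the shared store
        // rather than copying it; updates from other JVMs are visible here.
        SharedHashMap<String, BondVOInterface> shm = new SharedHashMapBuilder()
                .generatedValueType(true)
                .entrySize(320)
                .create(new File("/dev/shm/BondPortfolioSHM"),
                        String.class, BondVOInterface.class);

        BondVOInterface bond = shm.get("369604101");
        if (bond != null)
            System.out.println("coupon: " + bond.getCoupon());
    }
}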

Page 16: Advanced off heap ipc

SHM and throughput

SharedHashMap tested on a machine with 128 GB, 16 cores, 32 threads.

String keys, 64-bit long values:
- 10 million key-values updated at 37 M/s
- 500 million key-values updated at 23 M/s
- On tmpfs, 2.5 billion key-values at 26 M/s
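Figures like these can be approximated with a simple timing loop (a rough sketch, not the original benchmark; the entry count and key format are assumptions):

import java.io.File;
import net.openhft.collections.SharedHashMap;
import net.openhft.collections.SharedHashMapBuilder;

public class ThroughputSketch {
    public static void main(String[] args) throws Exception {
        SharedHashMap<String, Long> map = new SharedHashMapBuilder()
                .entries(10_000_000)
                .create(new File("/dev/shm/throughput-test"),
                        String.class, Long.class);

        int count = 10_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < count; i++)
            map.put("key-" + i, (long) i);  // single-threaded updates
        long time = System.nanoTime() - start;
        System.out.printf("%.1f M updates/sec%n", count * 1e3 / time);
    }
}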

Page 17: Advanced off heap ipc

SHM and latency

For a Map of small key-values (both 64-bit longs)

With an update rate of 1 M/s, one thread.

Percentile            100K entries   1 M entries   10 M entries
50% (typical)         0.1 μsec       0.2 μsec      0.2 μsec
90% (worst 1 in 10)   0.4 μsec       0.5 μsec      0.5 μsec
99% (worst 1 in 100)  4.4 μsec       5.5 μsec      7 μsec
99.9%                 9 μsec         10 μsec       10 μsec
99.99%                10 μsec        12 μsec       13 μsec
worst                 24 μsec        29 μsec       26 μsec

Page 18: Advanced off heap ipc

Problem: your sustained update rate is too high for your consumers.

Your consumers might be on limited bandwidth, or they might be humans.

Do you want to control the rate at which data is sent, but still ensure the latest data is available ASAP?

Page 19: Advanced off heap ipc

SHM replication

Supports TCP replication, and/or UDP with TCP. UDP replication only sends the data once and doesn't suffer NACK storms; it uses TCP as a backup.

You control the rate at which data is sent. It always sends the latest values.

Page 20: Advanced off heap ipc

Problem: you want to record everything, but this is too slow.

Chronicle is designed to support millions of messages per second, without locking or garbage.

Build deterministic systems where all the inputs and outputs are recorded and reproducible.

Downstream systems don't need to interrogate upstream systems, as they have a complete view of the state of the system.
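A sketch of recording an event, assuming the IndexedChronicle API from the OpenHFT Chronicle of the time:

import java.io.IOException;
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ExcerptAppender;
import net.openhft.chronicle.IndexedChronicle;

public class RecordEverything {
    public static void main(String[] args) throws IOException {
        // An IndexedChronicle journals excerpts (messages) to memory-mapped files.
        Chronicle chronicle = new IndexedChronicle("/tmp/events");
        ExcerptAppender appender = chronicle.createAppender();

        appender.startExcerpt(16);             // reserve space for the message
        appender.writeLong(System.nanoTime()); // e.g. an input timestamp
        appender.writeDouble(4.25);            // and some payload
        appender.finish();                     // publish; tailers can now read it

        chronicle.close();
    }
}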

Page 21: Advanced off heap ipc

Problem: TCP and System V IPC take many micro-seconds.

Chronicle typically takes micro-seconds, including serialization and deserialization.

Most messaging solutions don't consider serialization cost in Java.

Short binary messages can take as little as 200 ns.

Page 22: Advanced off heap ipc

Uses for Chronicle

- Synchronous text logging
- Synchronous binary data logging

Page 23: Advanced off heap ipc

Uses for Chronicle

Messaging between processes via shared memory

Messaging across systems
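On the consuming side, a tailer in another process reads the same journal in order (a sketch under the same API assumptions as above):

import java.io.IOException;
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ExcerptTailer;
import net.openhft.chronicle.IndexedChronicle;

public class ConsumeEverything {
    public static void main(String[] args) throws IOException {
        // A second process maps the same files; no sockets are needed on one box.
        Chronicle chronicle = new IndexedChronicle("/tmp/events");
        ExcerptTailer tailer = chronicle.createTailer();

        while (!Thread.currentThread().isInterrupted()) {
            if (tailer.nextIndex()) {              // advance to the next excerpt, if any
                long timestamp = tailer.readLong();
                double payload = tailer.readDouble();
                tailer.finish();                   // done with this excerpt
                System.out.println(timestamp + " " + payload);
            }
            // else: busy-spin, yield or pause depending on latency requirements
        }
    }
}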

Page 24: Advanced off heap ipc

Uses for Chronicle

Supports recording micro-second timestamps across systems.

Replay of production data in test.

Page 25: Advanced off heap ipc

Chronicle and replication

Replication is point to point (TCP):

Server A records an event
– replicates to Server B
Server B reads the local copy
– B processes the event
Server B stores the result
– replicates to Server A
Server A replies.

Round trip: 25 micro-seconds, 99% of the time.

GC-free. Lock less. Off heap. Unbounded.

Page 26: Advanced off heap ipc

How does it recover?

Once finish() returns, the OS will do the rest.

If an excerpt is incomplete, it will be pruned.

Page 27: Advanced off heap ipc

Cache friendly

Data is laid out contiguously and naturally packed. You can compress some types. Each entry starts at the byte immediately after the previous one.

Page 28: Advanced off heap ipc

Problem: A slow consumer, slows the producer.

No matter how slow the consumer is, the producer never has to wait. It never needs to clear messages before publishing (as a ring buffer does).

You can start a consumer at the end of the day, e.g. for reporting. The consumer can be more than the main memory size behind the producer, as a Chronicle is not limited by main memory.
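A late-starting reporter is just a tailer that begins at the start of the journal (a sketch under the same API assumptions as the earlier Chronicle examples):

import java.io.IOException;
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ExcerptTailer;
import net.openhft.chronicle.IndexedChronicle;

public class EndOfDayReport {
    public static void main(String[] args) throws IOException {
        Chronicle chronicle = new IndexedChronicle("/tmp/events");
        // The producer never waited for this consumer; it can start hours
        // later and read the whole day's journal from the beginning.
        ExcerptTailer reporter = chronicle.createTailer().toStart();
        while (reporter.nextIndex()) {
            long timestamp = reporter.readLong();  // same layout the producer wrote
            double payload = reporter.readDouble();
            reporter.finish();
            // ... aggregate into the end-of-day report
        }
        chronicle.close();
    }
}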

Page 29: Advanced off heap ipc

How does it collect garbage?

There is an assumption that your application has a daily or weekly maintenance cycle.

This is implemented by closing the files and creating new ones, i.e. the whole lot is moved, compressed or deleted.

Anything which must be retained can be copied to the new Chronicle.

Page 30: Advanced off heap ipc

Is there a higher level API?

You can hide the low level details with an interface.

Page 31: Advanced off heap ipc

Is there a higher level API?

There is a demo program with a simple interface.

This models a “hub” process which takes in events, processes them and publishes results.
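A sketch of the pattern (the interface and its methods here are hypothetical, not the demo's actual API): business events are methods on a plain Java interface; one implementation writes each call to a Chronicle, and the reading side replays the calls into the real handler.

import java.io.IOException;
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ExcerptAppender;

// A hypothetical event interface that hides the low level details.
public interface MarketEvents {
    void onPrice(long timestampNs, String symbol, double bid, double ask);
}

// Writer side: each method call becomes one excerpt in the Chronicle.
class MarketEventsWriter implements MarketEvents {
    private final ExcerptAppender appender;

    MarketEventsWriter(Chronicle chronicle) throws IOException {
        this.appender = chronicle.createAppender();
    }

    @Override
    public void onPrice(long timestampNs, String symbol, double bid, double ask) {
        appender.startExcerpt(64);       // enough for this small message
        appender.writeLong(timestampNs);
        appender.writeUTF(symbol);       // assumed encoding choice
        appender.writeDouble(bid);
        appender.writeDouble(ask);
        appender.finish();
    }
}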

Page 32: Advanced off heap ipc

Q & A

https://github.com/OpenHFT/OpenHFT

@PeterLawrey

[email protected]