Replacing Traditional Technologies with MongoDB: A Single Platform for All Financial Data at AHL

Replacing Traditional Technologies with MongoDB

A Single Platform for all Financial Market Data

June 2014

James Blackburn & Gary Collier

Opinions expressed are those of the author and may not be shared by all personnel of Man Group plc (‘Man’).  These opinions are subject to change without notice, and are for information purposes only and do not constitute an offer or invitation to make an investment in any financial instrument or in any product to which any member of Man’s group of companies provides investment advisory or any other services.  Any forward-looking statements speak only as of the date on which they are made and are subject to risks and uncertainties that may cause actual results to differ materially from those contained in the statements.  Unless stated otherwise this information is communicated by Man Investments Limited and AHL Partners LLP which are both authorised and regulated in the UK by the Financial Conduct Authority. 

© Man 2014 2

Legal Stuff

Introductions

Gary Collier James Blackburn

Agenda

The Story of MongoDB at AHL
1. What is a Systematic Fund Manager?
2. Low Frequency Futures and FX Data
3. Single Stock Equity Trading
4. Building a Tick Store
5. Now and the Future?

Prologue

AHL – A Systematic Fund Manager

Systematic Fund Management

Removing the first impedance mismatch…

Quants and Techies Speak the Same Language

Disparate Data Sources

Data API

But…

All Data is Behind an API

Performance
User Experience
Cluster Compute
Onboarding New Data
Impedance Mismatch
Mix of Technologies
Many Moving Parts
Reliability

Is there one Technology which could address all of these?

Chapter 1

Starting Small: Low Frequency Data

The Data

8,000 rows x 200 markets: 100 MB

5,000,000 rows x 250 markets: 500 GB

Parallel Filesystem

Previous Solution

Diagram labels: HDF5 (x5), Prop (x5), RDBMS (x3)

The Challenge

Fast?

Reliable?

Versionable?

Easy to extend?

MongoDB Solution

Diagram labels:
node 1, node 2, node 3 ... node 12
node 73, node 74, node 75 ... node 84
node 85, node 86, node 87 ... node 96
MongoDB Cluster: shard 1, shard 2, shard 3, shard 4 (x3), SSD
Linux, 24 cores, 96 GB RAM
BloombergAdapter, JPMAdapter, MarkitAdapter, GSAdapter
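The slide names four vendor adapters in front of the cluster but does not show how they work. A minimal sketch of the idea, assuming each adapter simply maps a vendor record onto a document: the base class, the two vendor classes, the field names and the list standing in for a MongoDB collection are all hypothetical and not taken from the talk.

```python
from abc import ABC, abstractmethod


class Adapter(ABC):
    """Hypothetical base class: turns one vendor record into a document."""

    source = "unknown"

    @abstractmethod
    def to_document(self, record):
        """Return a plain dict ready to be stored."""


class VendorAAdapter(Adapter):
    source = "vendor_a"

    def to_document(self, record):
        return {
            "source": self.source,
            "symbol": record["ticker"],
            "close": record["px_last"],
        }


class VendorBAdapter(Adapter):
    source = "vendor_b"

    def to_document(self, record):
        # This vendor carries an extra field. A document store accepts it
        # as-is, so no table or schema has to change to onboard the feed.
        return {
            "source": self.source,
            "symbol": record["id"],
            "close": record["close"],
            "spread_bp": record["spread"],
        }


# A plain list stands in for a MongoDB collection in this sketch.
collection = []

# Illustrative records with made-up values.
feeds = [
    (VendorAAdapter(), {"ticker": "ABC", "px_last": 10.0}),
    (VendorBAdapter(), {"id": "XYZ", "close": 20.0, "spread": 3}),
]
for adapter, record in feeds:
    collection.append(adapter.to_document(record))
```

Adding a new source in this sketch means writing one more small class; nothing already stored needs to be migrated.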

Performance: 200 Future Markets

Previous Solution MongoDB

100x faster to retrieve data

Consistent retrieval times

Performance: EURUSD 1-Minute Data

Previous Solution MongoDB

2-5x faster to retrieve data

Consistent retrieval times

Low Frequency Data - Conclusions

MongoDB faster than previous RDBMS/File Solution at…
• ALL data sizes and ALL client load levels
• …consistently

Game changing new features:
• No impedance mismatch: onboard new data in minutes
• Version Store: can ask “What did the data look like?”

Cost Savings:
• Proprietary parallel filesystem replaced by commodity SSDs
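The Version Store question, “What did the data look like?”, can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class name, its methods and the sample values are hypothetical, and the talk does not describe how AHL's version store is actually implemented on MongoDB.

```python
from bisect import bisect_right
from datetime import datetime


class VersionStore:
    """Toy in-memory version store: every write appends a new version."""

    def __init__(self):
        # symbol -> list of (written_at, data), kept in write order
        self._versions = {}

    def write(self, symbol, data, written_at):
        history = self._versions.setdefault(symbol, [])
        if history and written_at <= history[-1][0]:
            raise ValueError("versions must be written in increasing time order")
        # Store a copy so later changes by the caller cannot alter history.
        history.append((written_at, dict(data)))
        return len(history)  # 1-based version number

    def read(self, symbol, as_of=None):
        """Return the latest version, or the one current at `as_of`."""
        history = self._versions[symbol]
        if as_of is None:
            return history[-1][1]
        times = [t for t, _ in history]
        idx = bisect_right(times, as_of)
        if idx == 0:
            raise KeyError(f"no version of {symbol!r} at or before {as_of}")
        return history[idx - 1][1]


store = VersionStore()
# Illustrative values only, not real prices.
store.write("EURUSD", {"2014-06-02": 1.36}, datetime(2014, 6, 3))
store.write("EURUSD", {"2014-06-02": 1.37}, datetime(2014, 6, 10))  # a correction

latest = store.read("EURUSD")
as_seen_earlier = store.read("EURUSD", as_of=datetime(2014, 6, 5))
```

Because old versions are never overwritten, a model fitted on 5 June can be re-run later against exactly the data it saw at the time.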

Chapter 2

Getting Bigger: Single Stock Equities

Single Stock Data - Scale

Thousands of Stocks

Many years of Time-series Data

Tens of different Data Items for each Stock

Complex trading models with many Quants sharing the Data

Diagram labels: Raw Data Item (x5), Derived Data Item (x5), Trading Signal

Multi-user, versioned, interactive graph-based computation

Single Stock Data

Source Data (Managed RDBMS)

Diagram labels: Raw Data Item (x5), Derived Data Item (x5), Trading Signal

MongoDB Cluster: shard 1, shard 2, shard 3, shard 4 (x3)

~1TB Data
~10,000 Stocks
~20 Years
250 Data Items
Each Item is 600 MB
Single model ~150GB
Many Quants and models

Hours → Minutes

Single Stock Trading - Conclusions

MongoDB faster than previous RDBMS/File Solution at…
• Fast interactive research
• Read/write a 600MB Data item in < 1 second
• Rebuild complex model: hours → minutes
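The sizes quoted for the equities data are consistent with one another: 250 data items at 600 MB each come to about 150 GB per model. A 600 MB item is also far larger than MongoDB's 16 MB limit on a single BSON document, so it has to be split across several documents somehow. The sketch below checks the arithmetic and shows one possible way of chunking. The 15 MB chunk size, the document field names and the use of `pickle` are assumptions for illustration; the talk does not say how items are serialized or segmented.

```python
import math
import pickle

MB = 1024 * 1024
MAX_BSON_DOCUMENT = 16 * MB  # MongoDB's per-document size limit
CHUNK_SIZE = 15 * MB         # assumed: leaves headroom for metadata fields
assert CHUNK_SIZE < MAX_BSON_DOCUMENT

# Figures from the slides.
items_per_model = 250
item_size_mb = 600
model_size_gb = items_per_model * item_size_mb / 1000  # compare "~150GB"

# Number of chunk documents one 600 MB item would need at this chunk size.
chunks_per_item = math.ceil(item_size_mb * MB / CHUNK_SIZE)


def to_chunk_documents(name, payload, chunk_size=CHUNK_SIZE):
    """Split a serialized data item into ordered chunk documents."""
    return [
        {"item": name, "segment": i, "data": payload[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(payload), chunk_size))
    ]


def from_chunk_documents(docs):
    """Reassemble the payload from its chunk documents, in segment order."""
    ordered = sorted(docs, key=lambda d: d["segment"])
    return b"".join(d["data"] for d in ordered)


# Small demonstration with a tiny chunk size so it runs instantly.
payload = pickle.dumps(list(range(1000)))
docs = to_chunk_documents("example_item", payload, chunk_size=256)
restored = pickle.loads(from_chunk_documents(docs))
```

Independent chunks can also be fetched in parallel, which is one plausible route to the sub-second reads quoted above, though the slides do not confirm the mechanism.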

Chapter 3

MongoDB as a Tick Store

Almost, but not quite

Big Data?

30TB Historic Data

Ticks/1000 per second

Sparse Data

Third-Party Tick Stores

Typically…
• Expensive
• Proprietary query languages
• Database-centric architectures, so…
• Not ideal for cluster compute
• Unless you pay for lots of cores…
• Expensive!

So…
• A real $$$ saving opportunity!

Architecture

Diagram labels:
Reuters
RMDS Message Bus
Bloomberg
Banks
Kafka Queue (x3)
MongoDB Cluster: 16 shard cluster, Master + 1 replica
Linux, 12 cores, 256 GB RAM, 96TB Disk
Infiniband network
LZ4 compressed data
Parallel Access
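The architecture slide mentions LZ4 compressed data without showing a storage layout. One way to picture it is to pack a batch of ticks into a compact binary column layout, compress it, and store the result as one document. The sketch below is a guess at the general shape, not AHL's format: the field names and the two-column tick layout are hypothetical, and `zlib` from the standard library is used as a stand-in for LZ4 so that the example needs no third-party package.

```python
import struct
import zlib


def pack_ticks(symbol, ticks):
    """Pack a non-empty batch of (timestamp_ms, price) ticks into one document."""
    timestamps = [t for t, _ in ticks]
    prices = [p for _, p in ticks]
    n = len(ticks)
    # Columnar layout: all timestamps (int64), then all prices (float64).
    raw = struct.pack(f"<{n}q", *timestamps) + struct.pack(f"<{n}d", *prices)
    return {
        "symbol": symbol,
        "start": timestamps[0],   # lets a query find batches by time range
        "end": timestamps[-1],
        "count": n,
        "data": zlib.compress(raw),  # stand-in for LZ4 compression
    }


def unpack_ticks(doc):
    """Invert pack_ticks: decompress and rebuild the list of ticks."""
    n = doc["count"]
    raw = zlib.decompress(doc["data"])
    timestamps = struct.unpack_from(f"<{n}q", raw, 0)
    prices = struct.unpack_from(f"<{n}d", raw, 8 * n)
    return list(zip(timestamps, prices))


# Illustrative ticks with made-up values: one every 250 ms.
ticks = [(1_401_700_000_000 + i * 250, 100.0 + (i % 4) * 0.25) for i in range(1000)]
doc = pack_ticks("EXAMPLE", ticks)
roundtrip = unpack_ticks(doc)
raw_size = 16 * len(ticks)          # 8 bytes per timestamp + 8 per price
compressed_size = len(doc["data"])
```

Storing many ticks per document keeps the document count low, and the `start` and `end` fields mean a time-range read only has to decompress the batches it overlaps.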

Tick Store Performance

Infiniband saturated

25x greater tick throughput

With just 2 machines!

Tick Store: System Load

Chart labels: Other Tick, Mongo (x2), N Tasks = 32

Tick Store - Conclusions

Happy Quants!
• 25x improvement in tick throughput
• So fit models 25x as fast

Happy Accountants!
• >40x cost saving of MongoDB Support compared to previous Tick Store licensing.

Epilogue

Where are we now and where next?

Performance
Low Frequency Data: 100x faster
Equities Models: Hours → Seconds
Tick Data: 25x faster

Key Facts

Cost Savings
Parallel File System → Commodity SSDs
Proprietary Tick Store → MongoDB
Orders of magnitude $$$ savings…

Efficiencies
4 storage technologies → 1
Fully utilise expensive HPC resources
Support load on team down > 50%

Game Changers
Onboard Data: Days → Minutes
Data Versioning
The technology is no longer the bottleneck

“Peopleware”
Attract and retain great Quants
Attract and retain great Techies
And attend a great conference

Where Next?

1. Extend the data ecosystem further
2. Broader application across the company as a whole
3. Open Source?

Questions

Gary Collier: gcollier@ahl.com

James Blackburn: jblackburn@ahl.com