Massively Scalable Computational Finance with SciDB

Bryan Lewis, Chief Data Scientist
Frank Smietana, Solutions Architect

DESCRIPTION

Hedge funds, investment managers and prop shops need to keep pace with rapidly growing data volumes from many sources. SciDB, an advanced computational database programmable from R and Python, scales out to petabyte volumes and facilitates rapid integration of diverse data sources. Open source and running on commodity hardware, SciDB is extensible and scales cost-effectively.

Attend this webinar to learn how quants and system developers harness SciDB’s massively scalable complex analytics to solve hard problems faster. SciDB’s native array storage is optimized for time-series data, delivering fast windowed aggregates and complex analytics without time-consuming data extraction.

Webinar presenters will demonstrate real-world use cases, including the ability to quickly:

1. Generate aggregated order books across multiple exchanges
2. Create adjusted continuous futures contracts
3. Analyze complex financial networks to detect anomalous behavior

Page 1: Massively Scalable Computational Finance with SciDB

Massively Scalable Computational Finance with SciDB

Bryan Lewis, Chief Data Scientist

Frank Smietana, Solutions Architect

Page 2: Massively Scalable Computational Finance with SciDB

© Paradigm4 Inc.

GoToWebinar

• Ask questions using the Q&A window

• This webinar is being recorded

• Replays will be available from paradigm4.com

Page 3: Massively Scalable Computational Finance with SciDB

Common issues

• Expensive data ETL

• Lack of horizontal scalability

• Hard to program

• Hard to extend

• Difficulty with data JOINS

Page 4: Massively Scalable Computational Finance with SciDB

What is SciDB?

Massively scalable distributed array database

Page 5: Massively Scalable Computational Finance with SciDB

What is SciDB?

Open source

Page 6: Massively Scalable Computational Finance with SciDB

What is SciDB?

Mike Stonebraker, CTO

Page 7: Massively Scalable Computational Finance with SciDB

Big Science and SciDB

Lawrence Berkeley

NASA Goddard
Projects using satellite image data

Institute for Geoinformatics
Global land change analysis on remote sensing data (LANDSAT, MODIS, SENTINEL)

Page 8: Massively Scalable Computational Finance with SciDB

Commercial applications

• Pharma, Biotech, Healthcare

• Quantitative Finance

• Image & Sensor Analytics

• E-commerce

Page 9: Massively Scalable Computational Finance with SciDB

Arrays for finance

[Figure: array with Symbol and Time axes]

Page 10: Massively Scalable Computational Finance with SciDB

Fast multidimensional SELECTs

Page 11: Massively Scalable Computational Finance with SciDB

Table model

i  j  data
1  1   0.5
1  2   0.3
1  3   0.1
1  4  -0.5
2  1   0.9
2  2   0.0
2  3  -0.8
2  4  -0.8
3  1   1.1
3  2   1.0
3  3   1.2
3  4   1.5
4  1   0.9
4  2   1.0
4  3   1.2
4  4   1.5

Page 12: Massively Scalable Computational Finance with SciDB

Array model

 0.5   0.3   0.1  -0.5
 0.9   0.0  -0.8  -0.8
 1.1   1.0   1.2   1.5
 0.9   1.0   1.2   1.5

[Axes: i indexes rows and j indexes columns, starting at cell (1,1)]
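
The contrast between the table model and the array model can be sketched in plain Python. This is an illustration only, not SciDB code: `table_to_array` and `select_box` are hypothetical helper names, and nested lists stand in for a real array store. The values are those of the 4 x 4 example.

```python
# Cell values from the 4 x 4 example, written as (i, j, data) table rows.
table = [
    (1, 1, 0.5), (1, 2, 0.3), (1, 3, 0.1), (1, 4, -0.5),
    (2, 1, 0.9), (2, 2, 0.0), (2, 3, -0.8), (2, 4, -0.8),
    (3, 1, 1.1), (3, 2, 1.0), (3, 3, 1.2), (3, 4, 1.5),
    (4, 1, 0.9), (4, 2, 1.0), (4, 3, 1.2), (4, 4, 1.5),
]


def table_to_array(rows, origin=1):
    """Place each (i, j, value) row at its coordinates in a dense 2-D list."""
    n_i = max(i for i, _, _ in rows) - origin + 1
    n_j = max(j for _, j, _ in rows) - origin + 1
    array = [[None] * n_j for _ in range(n_i)]
    for i, j, value in rows:
        array[i - origin][j - origin] = value
    return array


def select_box(array, i_lo, i_hi, j_lo, j_hi, origin=1):
    """Return the cells inside inclusive coordinate ranges on both axes."""
    return [row[j_lo - origin:j_hi - origin + 1]
            for row in array[i_lo - origin:i_hi - origin + 1]]


array = table_to_array(table)
box = select_box(array, 2, 3, 3, 4)  # rows i = 2..3, columns j = 3..4
```

In the array form the coordinates are implicit in each cell's position, so a range selection on both dimensions becomes a slice instead of a scan over every table row.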

Page 13: Massively Scalable Computational Finance with SciDB

Our approach

• Less data movement

• Spatial data clustering

• Leverage popular languages

• Extensibility

Page 14: Massively Scalable Computational Finance with SciDB

Use Popular Languages

C++, Julia, Java/JVM, JavaScript, Array SQL

JDBC, Protocol buffers, C/C++ API, HTTP

Page 15: Massively Scalable Computational Finance with SciDB

Shared-nothing architecture

[Diagram: SciDB instances numbered 0, 1, 2]

Page 16: Massively Scalable Computational Finance with SciDB

Common issues

• Expensive data ETL

• Lack of horizontal scalability

• Hard to program

• Hard to extend

• Difficulty with data JOINS

Page 17: Massively Scalable Computational Finance with SciDB

SciDB

• Minimize ETL

• Massively scalable

• Program from many languages

• Open-source extensibility

• Fast parallel JOIN

Page 18: Massively Scalable Computational Finance with SciDB

Poll

Page 19: Massively Scalable Computational Finance with SciDB

Examples

• Order books

• Network analysis

Page 20: Massively Scalable Computational Finance with SciDB

Order book challenges

• Lots of exchanges

• Regulatory compliance

• Margins are shrinking

• Want more alpha

Page 21: Massively Scalable Computational Finance with SciDB

Create order book

• Load raw data into array

• Dimension along symbol and time coordinate axes

• Create order book entries with custom aggregation function ORDERBOOK

https://github.com/Paradigm4/orderbook-example
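
As a rough illustration of what an order book aggregation does, the sketch below replays events into per-symbol books in plain Python. It is not the ORDERBOOK aggregate from the repository above: the event layout (time, symbol, side, price, size), the rule that a size of 0 removes a price level, and the name `build_order_book` are all assumptions made for this example.

```python
from collections import defaultdict


def build_order_book(events, depth=10):
    """Replay (time, symbol, side, price, size) events into per-symbol books.

    Each event sets the resting size at one price level; size 0 removes it.
    Returns {symbol: {"bid": [(price, size), ...], "ask": [...]}} holding the
    best `depth` levels per side (highest bids first, lowest asks first).
    """
    levels = defaultdict(lambda: {"bid": {}, "ask": {}})
    for _time, symbol, side, price, size in sorted(events):  # time order
        if size == 0:
            levels[symbol][side].pop(price, None)
        else:
            levels[symbol][side][price] = size
    book = {}
    for symbol, sides in levels.items():
        book[symbol] = {
            "bid": sorted(sides["bid"].items(), reverse=True)[:depth],
            "ask": sorted(sides["ask"].items())[:depth],
        }
    return book


# Hypothetical events for one symbol.
events = [
    (1, "ABC", "bid", 99.0, 200),
    (2, "ABC", "ask", 101.0, 150),
    (3, "ABC", "bid", 99.5, 100),
    (4, "ABC", "ask", 100.5, 300),
    (5, "ABC", "bid", 99.0, 0),  # the 99.0 bid level is removed
]
book = build_order_book(events, depth=2)
```

Grouping by symbol and replaying in time order mirrors the symbol and time dimensions named in the bullets, which is what lets the work be split by symbol.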

Page 22: Massively Scalable Computational Finance with SciDB

Consolidate order books

• Load as arrays

• Merge into single array

• Impute missing values (inexact temporal join)

• Aggregate by time and symbol
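
The merge, impute and aggregate steps can be sketched as follows. This is a minimal plain-Python stand-in, not the SciDB query used in the webinar: the quote layout (time, symbol, bid, ask), the carry-forward imputation rule, and the name `consolidate` are assumptions for illustration.

```python
def consolidate(quotes_by_exchange):
    """Merge per-exchange (time, symbol, bid, ask) quotes into one book.

    Missing values are imputed by carrying each exchange's last quote
    forward; quotes are then aggregated by (time, symbol) into the highest
    bid and lowest ask across exchanges.
    """
    # Merge: tag every quote with its exchange and sort by time.
    merged = sorted(
        (time, symbol, exchange, bid, ask)
        for exchange, quotes in quotes_by_exchange.items()
        for time, symbol, bid, ask in quotes
    )
    last = {}          # (symbol, exchange) -> latest (bid, ask) seen so far
    consolidated = {}  # (time, symbol) -> (best_bid, best_ask)
    for time, symbol, exchange, bid, ask in merged:
        last[(symbol, exchange)] = (bid, ask)
        current = [quote for (s, _), quote in last.items() if s == symbol]
        consolidated[(time, symbol)] = (
            max(b for b, _ in current),
            min(a for _, a in current),
        )
    return consolidated


# Hypothetical quotes from two exchanges that never tick at the same time.
quotes = {
    "X": [(1, "ABC", 99.0, 101.0), (3, "ABC", 99.4, 100.9)],
    "Y": [(2, "ABC", 99.2, 101.2)],
}
best = consolidate(quotes)
```

Because the exchanges rarely report at identical timestamps, an exact join on time would drop most rows; carrying the last quote forward is what makes the join "inexact".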

Page 23: Massively Scalable Computational Finance with SciDB

Example Order Books

Page 24: Massively Scalable Computational Finance with SciDB

Merge and impute

Page 25: Massively Scalable Computational Finance with SciDB

Consolidated Order Book

Page 26: Massively Scalable Computational Finance with SciDB

Benchmark Results

• 9 exchanges; 358,000,000 events; 8,000 symbols

• Order book depth: 10

Page 27: Massively Scalable Computational Finance with SciDB

Financial network analysis

Page 28: Massively Scalable Computational Finance with SciDB

A graph

Page 29: Massively Scalable Computational Finance with SciDB

Sparse matrix representation

Page 30: Massively Scalable Computational Finance with SciDB

Bitcoin transactions

A directed graph, represented as a nonsymmetric sparse matrix

[Diagram: an edge from "From address" to "To address", labeled with Date, Amount, Transaction ID]
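
A small sketch of this representation, in plain Python rather than SciDB: a dict of dicts stores only the nonzero cells, with the sending address as the row and the receiving address as the column. The addresses, dates, amounts and transaction IDs below are made up, and `to_sparse_matrix` and `is_symmetric` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical transactions:
# (from_address, to_address, date, amount, transaction_id)
transactions = [
    ("a1", "a2", "2013-01-05", 1.50, "tx001"),
    ("a1", "a3", "2013-01-06", 0.25, "tx002"),
    ("a3", "a1", "2013-01-07", 0.10, "tx003"),
    ("a1", "a2", "2013-01-09", 2.00, "tx004"),
]


def to_sparse_matrix(transactions):
    """Store only nonzero cells: matrix[from][to] = total amount sent."""
    matrix = defaultdict(dict)
    for src, dst, _date, amount, _txid in transactions:
        matrix[src][dst] = matrix[src].get(dst, 0.0) + amount
    return matrix


def is_symmetric(matrix):
    """True only if every cell (i, j) has an equal cell (j, i)."""
    return all(matrix.get(dst, {}).get(src) == value
               for src, row in matrix.items()
               for dst, value in row.items())


matrix = to_sparse_matrix(transactions)
nonzeros = sum(len(row) for row in matrix.values())
```

Payments from a1 to a2 have no matching payments back, so the matrix is nonsymmetric, and only three of the nine possible cells are stored.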

Page 31: Massively Scalable Computational Finance with SciDB

Bitcoin network schema

(using the Reid/Harrigan user ID method)

Page 32: Massively Scalable Computational Finance with SciDB

Identify important nodes

• Kleinberg HITS method

• Subgraph centrality

• Fiedler clustering

• Other methods...
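
Of the methods listed, Kleinberg's HITS is the simplest to sketch. The version below is a textbook power iteration over a directed edge list in plain Python, not the SciDB implementation from the webinar; the function name `hits`, the fixed iteration count, and the toy graph are assumptions for illustration.

```python
import math


def hits(edges, iterations=50):
    """Kleinberg's HITS by power iteration on a list of (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of nodes linking in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of the authority scores of nodes linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        # Normalize to unit Euclidean length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth


# Toy directed graph: "a" and "b" both point at "c"; "a" also points at "d".
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("c", "a")]
hub, auth = hits(edges)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

Each pass is two sparse matrix-vector products, one with the adjacency matrix and one with its transpose, which is why the method fits a sparse array representation of the graph.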

Page 33: Massively Scalable Computational Finance with SciDB

Bitcoin subgraph centrality

• Identify top 5 most central hub and authority nodes

• 16.3M nodes

• 6.3M x 6.3M sparse matrix

• 8-instance SciDB cluster on a single workstation (8 cores)

• 20 seconds

Page 34: Massively Scalable Computational Finance with SciDB

Correlation network

1. Compute bar data closing prices from TAQ trades

2. na.locf imputation

3. Correlation matrix across all instruments

4. Regularize

5. Precision matrix

6. Threshold

7. Plot clusters

All inside SciDB up to plot
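
Steps 2 through 6 can be sketched on a toy data set in plain Python. This is only an outline of the pipeline's shape, not the in-database code: the slides do not say which regularization or threshold was used, so the ridge term and cutoff here are placeholders, and all function names and prices are hypothetical.

```python
import math


def locf(series):
    """Step 2: carry the last observed value forward over None gaps."""
    filled, last = [], None
    for value in series:
        last = value if value is not None else last
        filled.append(last)
    return filled


def correlation_matrix(columns):
    """Step 3: Pearson correlation between every pair of series.

    Assumes equal-length series with no None values and nonzero variance.
    """
    def centered(xs):
        mean = sum(xs) / len(xs)
        return [x - mean for x in xs]
    cs = [centered(c) for c in columns]
    norms = [math.sqrt(sum(x * x for x in c)) for c in cs]
    return [[sum(x * y for x, y in zip(a, b)) / (na * nb)
             for b, nb in zip(cs, norms)] for a, na in zip(cs, norms)]


def regularize(matrix, ridge=0.1):
    """Step 4: add a small ridge to the diagonal (a placeholder choice)."""
    return [[v + (ridge if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def invert(matrix):
    """Step 5: Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def threshold_edges(precision, cutoff):
    """Step 6: keep pairs whose absolute precision entry exceeds the cutoff."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > cutoff]


# Hypothetical closing prices for three instruments, with gaps.
closes = [
    [10.0, None, 10.2, 10.1, 10.4],
    [20.0, 20.1, None, 20.2, 20.9],
    [5.0, 4.9, 5.1, None, 4.8],
]
filled = [locf(series) for series in closes]
corr = correlation_matrix(filled)
regularized = regularize(corr)
precision = invert(regularized)
edges = threshold_edges(precision, cutoff=0.5)
```

The surviving pairs in `edges` are the links of the network that step 7 would plot. A series that starts with a gap keeps a leading None after `locf` and would need separate handling before the correlation step.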

Page 35: Massively Scalable Computational Finance with SciDB

Take away

• Bringing the analysis to the data

• In-database complex math

• Parallel time series analysis

• Programmable from C++, R, Python ...

• MPP on commodity clusters, clouds

• Extensible, open-source

www.paradigm4.com

Page 36: Massively Scalable Computational Finance with SciDB

Questions?

Tell us about your application • [email protected]

Try our Quick Start • scidb.org/forum

• Download a VM or EC2 AMI

www.paradigm4.com