Massively Scalable Computational Finance with SciDB

Bryan Lewis

Chief Data Scientist

Frank Smietana

Solutions Architect

GoToWebinar

• Ask questions using the Q&A window

• This webinar is being recorded

• Replays will be available from paradigm4.com

Common issues

• Expensive data ETL

• Lack of horizontal scalability

• Hard to program

• Hard to extend

• Difficulty with data JOINS

What is SciDB?

Massively scalable distributed array database

What is SciDB?

Open source

Mike Stonebraker CTO

What is SciDB?

Lawrence Berkeley

NASA Goddard

Projects using satellite image data

Institute for Geoinformatics

Global land change analysis on remote sensing data (LANDSAT, MODIS, SENTINEL)

Big Science and SciDB

Commercial applications

Pharma, Biotech, Healthcare

Quantitative Finance

Image & Sensor Analytics

E-commerce

Arrays for finance

Symbol

Fast multidimensional SELECTs

Table model

i  j  data
1  1   0.5
1  2   0.3
1  3   0.1
1  4  -0.5
2  1   0.9
2  2   0.0
2  3  -0.8
2  4  -0.8
3  1   1.1
3  2   1.0
3  3   1.2
3  4   1.5
4  1   0.9
4  2   1.0
4  3   1.2
4  4   1.5

Array model

0.5 0.3 0.1 -0.5

0.9 0.0 -0.8 -0.8

1.1 1.0 1.2 1.5

0.9 1.0 1.2 1.5
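The difference between the two models can be sketched in a few lines of plain Python. This is an illustration only, not SciDB code: the helper names `to_array`, `table_select`, and `array_select` are hypothetical, and the table scan assumes no index is available. It places the slide's (i, j, data) rows into a 4 x 4 array and answers the same range query both ways.

```python
# (i, j, value) rows as they would sit in a relational table (values from the slide).
table_rows = [
    (1, 1, 0.5), (1, 2, 0.3), (1, 3, 0.1), (1, 4, -0.5),
    (2, 1, 0.9), (2, 2, 0.0), (2, 3, -0.8), (2, 4, -0.8),
    (3, 1, 1.1), (3, 2, 1.0), (3, 3, 1.2), (3, 4, 1.5),
    (4, 1, 0.9), (4, 2, 1.0), (4, 3, 1.2), (4, 4, 1.5),
]

def to_array(rows):
    """Place each value at position (i, j); the coordinates become implicit."""
    n_rows = max(i for i, _, _ in rows)
    n_cols = max(j for _, j, _ in rows)
    array = [[None] * n_cols for _ in range(n_rows)]
    for i, j, value in rows:
        array[i - 1][j - 1] = value  # the slide uses 1-based coordinates
    return array

def table_select(rows, i_lo, i_hi, j_lo, j_hi):
    """Table model (no index assumed): every row is examined for a range query."""
    return [v for i, j, v in rows if i_lo <= i <= i_hi and j_lo <= j <= j_hi]

def array_select(array, i_lo, i_hi, j_lo, j_hi):
    """Array model: the same range query is a direct slice on both dimensions."""
    return [v for row in array[i_lo - 1:i_hi] for v in row[j_lo - 1:j_hi]]

array = to_array(table_rows)
```

Both `table_select(table_rows, 2, 3, 2, 3)` and `array_select(array, 2, 3, 2, 3)` return the same four values; the array version reaches them by position instead of filtering on stored coordinates.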

Our approach

• Less data movement

• Spatial data clustering

• Leverage popular languages

• Extensibility

Use Popular Languages

• Java/JVM

• JavaScript

• Array SQL

• Protocol buffers

• C/C++ API

Shared-nothing architecture

Common issues

• Expensive data ETL

• Lack of horizontal scalability

• Hard to program

• Hard to extend

• Difficulty with data JOINS

• Minimize ETL

• Massively scalable

• Program from many languages

• Open-source extensibility

• Fast parallel JOIN

Examples

• Order books

• Network analysis

Order book challenges

• Lots of exchanges

• Regulatory compliance

• Margins are shrinking

• Want more alpha

Create order book

• Load raw data into array

• Dimension along symbol and time coordinate axes

• Create order book entries with custom aggregation function ORDERBOOK

https://github.com/Paradigm4/orderbook-example
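The steps above can be sketched outside the database in plain Python. This is a rough illustration under stated assumptions, not the ORDERBOOK aggregate from the repository: the event layout `(symbol, time_bucket, side, price, size)` and the function names `build_book` and `books_by_cell` are hypothetical, and each (symbol, time) cell is treated independently rather than carrying resting orders forward.

```python
from collections import defaultdict

def build_book(orders, depth=10):
    """Aggregate orders into the best `depth` price levels per side.

    `orders` is an iterable of (side, price, size) tuples, with side being
    "bid" or "ask". This record layout is a hypothetical simplification.
    """
    levels = {"bid": defaultdict(int), "ask": defaultdict(int)}
    for side, price, size in orders:
        levels[side][price] += size  # total size resting at each price
    bids = sorted(levels["bid"].items(), reverse=True)[:depth]  # highest first
    asks = sorted(levels["ask"].items())[:depth]                # lowest first
    return {"bids": bids, "asks": asks}

def books_by_cell(events, depth=10):
    """Group events by (symbol, time bucket), then build one book per cell."""
    cells = defaultdict(list)
    for symbol, bucket, side, price, size in events:
        cells[(symbol, bucket)].append((side, price, size))
    return {cell: build_book(orders, depth) for cell, orders in cells.items()}
```

The grouping key plays the role of the symbol and time dimensions, and `build_book` stands in for the custom aggregate applied to each cell.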

Consolidate order books

• Load as arrays

• Merge into single array

• Impute missing values (inexact temporal join)

• Aggregate by time and symbol
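A minimal sketch of the merge, impute, and aggregate steps for a single symbol, in plain Python. It assumes each exchange contributes only its best bid and ask per timestamp, that imputation means carrying the last observed quote forward, and that the consolidated quote is the highest bid and lowest ask across exchanges. The names `locf` and `consolidate` are hypothetical and this is not the SciDB implementation.

```python
def locf(series, times):
    """Last observation carried forward onto a common time grid.

    `series` maps time -> value; `times` must be sorted ascending.
    Times before the first observation stay None.
    """
    filled, last = {}, None
    for t in times:
        if t in series:
            last = series[t]
        filled[t] = last
    return filled

def consolidate(quotes_by_exchange):
    """Merge per-exchange best quotes and aggregate them per time step.

    `quotes_by_exchange` maps exchange -> {time: (best_bid, best_ask)}.
    Returns {time: (highest bid, lowest ask)} over the union of timestamps.
    """
    times = sorted({t for q in quotes_by_exchange.values() for t in q})
    filled = {ex: locf(q, times) for ex, q in quotes_by_exchange.items()}
    book = {}
    for t in times:
        known = [filled[ex][t] for ex in filled if filled[ex][t] is not None]
        book[t] = (max(b for b, _ in known), min(a for _, a in known))
    return book
```

Aligning every exchange to the union of timestamps is what makes the join inexact: an exchange with no quote at time t contributes its most recent earlier quote instead.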

Example Order Books

Merge and impute

Consolidated Order Book

Benchmark Results

• 9 exchanges; 358,000,000 events; 8,000 symbols

• Order book depth: 10

Financial network analysis

A graph

Sparse matrix representation

Bitcoin transactions

A directed graph

Represented as a nonsymmetric sparse matrix

Dimensions: address x address

Cell values: Date, Amount, Transaction ID

Bitcoin network schema

(using the Reid/Harrigan user ID method)
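A small sketch of this representation in plain Python, assuming a dict-of-dicts as the sparse store. The tuple layout `(sender, receiver, date, amount, tx_id)` and the names `build_sparse` and `is_symmetric` are hypothetical; only cells with at least one transaction are stored, and direction is preserved, which is why the matrix is generally nonsymmetric.

```python
from collections import defaultdict

def build_sparse(transactions):
    """Store only the nonzero cells of the sender x receiver matrix.

    `transactions` holds (sender, receiver, date, amount, tx_id) tuples;
    each cell keeps the list of transactions for that ordered pair.
    """
    matrix = defaultdict(lambda: defaultdict(list))
    for sender, receiver, date, amount, tx_id in transactions:
        matrix[sender][receiver].append((date, amount, tx_id))
    return matrix

def is_symmetric(matrix):
    """True only if every stored cell (i, j) has a stored cell (j, i)."""
    return all(i in matrix.get(j, {}) for i in matrix for j in matrix[i])
```

A payment from one address to another fills cell (sender, receiver) only, so `is_symmetric` returns False unless every payment has a counterpart in the opposite direction.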

Identify important nodes

• Kleinberg HITS method

• Subgraph centrality

• Fiedler clustering

• Other methods...
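As an illustration of the first method, here is a minimal power-iteration sketch of Kleinberg's HITS on a directed edge list, in plain Python. The function name `hits` and the fixed iteration count are assumptions made for the example; a production version would run as sparse matrix products inside the database and test for convergence.

```python
import math

def hits(edges, iterations=50):
    """Kleinberg's HITS by power iteration on a directed edge list.

    `edges` is a list of (source, destination) pairs.
    Returns (hub, authority) score dicts, each scaled to unit Euclidean length.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of the nodes pointed to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Rescale both vectors so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth
```

Sorting either dict by score and taking the first few entries gives the top hub or authority nodes.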

Bitcoin subgraph centrality

• Identify top 5 most central hub and authority nodes

• 16.3M nodes

• 6.3M x 6.3M sparse matrix

• 8-instance SciDB cluster on a single workstation (8 cores)

• 20 seconds

Correlation network

1 Compute bar data closing prices from TAQ trades

2 na.locf imputation

3 Correlation matrix across all instruments

4 Regularize

5 Precision matrix

6 Threshold

7 Plot clusters

All inside SciDB up to plot
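Steps 2 through 6 can be sketched on a toy scale in plain Python. Several choices here are assumptions, since the slides do not specify them: regularization is done by shrinking the correlation matrix toward the identity, the threshold is applied to raw precision entries, and the names `na_locf`, `correlation`, `invert`, and `correlation_network` are hypothetical. Each series must start with an observed value and must not be constant.

```python
import math

def na_locf(series):
    """Fill None gaps with the last observed value (in the spirit of R's na.locf)."""
    out, last = [], None
    for x in series:
        last = x if x is not None else last
        out.append(last)
    return out

def correlation(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def correlation_network(prices, shrink=0.1, threshold=0.5):
    """Impute, correlate, regularize, invert, and threshold.

    `prices` maps instrument -> list of closing prices, with None for gaps.
    Returns the set of pairs (a, b) whose precision entry exceeds
    `threshold` in absolute value.
    """
    names = sorted(prices)
    filled = [na_locf(prices[name]) for name in names]
    n = len(names)
    corr = [[correlation(filled[i], filled[j]) for j in range(n)] for i in range(n)]
    # Assumed regularizer: shrink toward the identity to keep the matrix invertible.
    reg = [[(1 - shrink) * corr[i][j] + shrink * (i == j) for j in range(n)]
           for i in range(n)]
    precision = invert(reg)
    return {(names[i], names[j])
            for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > threshold}
```

The returned pairs are the edges of the network; clustering and plotting would follow from them.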

Take away

• Bringing the analysis to the data

• In-database complex math

• Parallel time series analysis

• Programmable from C++, R, Python ...

• MPP on commodity clusters, clouds

• Extensible, open-source

www.paradigm4.com

Questions?

Tell us about your application • info@paradigm4.com

Try our Quick Start • scidb.org/forum

• Download a VM or EC2 AMI

www.paradigm4.com
