MongoDB as a Tick Store

MongoDB Tick Data Presentation

Page 1: MongoDB Tick Data Presentation

MongoDB as a Tick Store

Page 2: MongoDB Tick Data Presentation

MongoDB World

New York City, June 23-25

#MongoDBWorld

See what’s next in MongoDB, including:

• MongoDB 2.6

• Sharding

• Replication

• Aggregation

http://world.mongodb.com

Save $200 with discount code THANKYOU

Page 3: MongoDB Tick Data Presentation

Agenda

• What is MongoDB

- The Company

- The Product

• MongoDB for Tick Data

• Case Study

Page 4: MongoDB Tick Data Presentation

MongoDB Overview

• 350+ employees

• 1,000+ customers

• Over $231 million in funding

• 13 offices around the world

Page 5: MongoDB Tick Data Presentation

Global Community

• 7,000,000+ MongoDB Downloads

• 150,000+ Online Education Registrants

• 35,000+ MongoDB Management Service (MMS) Users

• 30,000+ MongoDB User Group Members

• 20,000+ MongoDB Days Attendees

Page 7: MongoDB Tick Data Presentation

MongoDB

A NoSQL document-based database, designed to build today's applications.

• Fast to build

• Quick to adapt

• Easy to scale

• Lessons learned from 40 years of RDBMS

Page 8: MongoDB Tick Data Presentation

Relational Model

EmpID  Name            Dept  Title  Manager  Payband
9950   Dunham, Justin  500   1500   6531     C

DeptID  Department
500     Marketing

TitleID  Title
1500     Product Manager

EmpBenPlanID  EmpFK  PlanFK
1             9950   100
2             9950   200

PlanID  BenFK  Plan
100     1      PPO Plus
200     2      Standard

BenID  Benefit
1      Health
2      Dental

Page 9: MongoDB Tick Data Presentation

Document Model

EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health PPO Plus, Dental Standard
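As a rough sketch, the employee record on this slide could be written as one nested document. The field names below are assumptions chosen for illustration; the slide only shows column headings and values.

```javascript
// Hypothetical field names; the values are taken from the slide.
const employee = {
  empId: 9950,
  name: "Dunham, Justin",
  dept: "Marketing",
  title: "Product Manager",
  manager: 6531,
  payband: "C",
  // Benefits are embedded, so no join tables are needed to read them.
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
};

// Reading the plans is a plain property access rather than a multi-table join.
const plans = employee.benefits.map(b => b.type + ": " + b.plan);
console.log(plans.join(", "));
```

The lookup tables for department, title, and benefit disappear because their values are stored directly in the record.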

Page 10: MongoDB Tick Data Presentation

MongoDB - Agility

Dynamic Schemas

V 1.0, V 1.1, V 2.0

EmpID  Name            Dept       Title            Manager  Payband  Benefits
9950   Dunham, Justin  Marketing  Product Manager  6531     C        Health PPO Plus, Dental Standard

EmpID  Name       Title  Payband  Bonus
9952   Joe White  CEO    E        20,000

EmpID  Name            Dept       Title     Manager  Payband  Shares
9531   Nearey, Graham  Marketing  Director  9952     D        5000

Page 11: MongoDB Tick Data Presentation

MongoDB - Usability

Shell: Command-line shell for interacting directly with database

> db.collection.insert({product: "MongoDB", type: "Document Database"})
>
> db.collection.findOne()
{
  "_id" : ObjectId("5106c1c2fc629bfe52792e86"),
  "product" : "MongoDB",
  "type" : "Document Database"
}

Drivers: Drivers for most popular programming languages and frameworks

• Java

• Python

• Perl

• Ruby

• Haskell

• JavaScript

Page 12: MongoDB Tick Data Presentation

MongoDB - Utility

• Complex Indexed Queries

Age > 65 AND Male living near Lyon

• Aggregation

Age    Profit Margin
1-17   0
18-35  20
36-50  80
51-65  50
66+    5

Page 13: MongoDB Tick Data Presentation

MongoDB - Scalability

• High Availability

• Auto Sharding

• Enterprise Monitoring

• Grid file storage

Page 14: MongoDB Tick Data Presentation

Options for building an Operational Database

• Column Family

• Key/Value Store

• Relational

• Document Store

Page 15: MongoDB Tick Data Presentation

MongoDB & Hadoop

Operational

• Online, Real-time

• High concurrency & HA

• Live analytics

Post Processing

• Multi-source analytics

• Interactive & Batch

• Data lake

MongoDB Connector for Hadoop

Page 17: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

• High Throughput & Linear Scalability

Page 18: MongoDB Tick Data Presentation

Flexible Data Model

Easy Onboarding – e.g. Equities

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}

> db.ticks.find({symbol: "DIS", bidPrice: {$gt: 55.36}})

Page 19: MongoDB Tick Data Presentation

Flexible Data Model

Easy Onboarding – e.g. Depth of Book

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}

> db.ticks.find({bidPrices: {$gt: 55.36}})
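A comparison against an array field matches when at least one element satisfies it. The plain-JavaScript sketch below mimics that rule outside the database; `matchesAnyGt` is a hypothetical helper written for this illustration, not a MongoDB API.

```javascript
// Hypothetical helper that mimics how {field: {$gt: x}} treats an array field:
// the document matches if any element is greater than x.
function matchesAnyGt(doc, field, threshold) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.some(v => v > threshold);
}

const book = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60]
};

console.log(matchesAnyGt(book, "bidPrices", 55.36)); // true: 55.37 > 55.36
console.log(matchesAnyGt(book, "bidPrices", 55.37)); // false: no bid above 55.37
```

If only the best bid should be tested, the query would have to address the first element directly, for example through the dotted path "bidPrices.0".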

Page 20: MongoDB Tick Data Presentation

Flexible Data Model

Easy Onboarding – e.g. News

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings…",
  body: "Walt Disney Company reported…",
  tags: ["earnings", "media", "walt disney"]
}

Page 21: MongoDB Tick Data Presentation

Flexible Data Model

Easy Onboarding – e.g. Social Networking

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing…",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}

Page 22: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework & Map-Reduce

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

• High Throughput & Linear Scalability

Page 23: MongoDB Tick Data Presentation

Architecture for Querying Data

• Higher Latency Trading Applications

• Backtesting Applications

• Research & Analysis Applications

Page 24: MongoDB Tick Data Presentation

Flexible Querying and Indexing

Index any field [or arrays]

// Compound indexes
> db.ticks.ensureIndex({symbol: 1, timestamp: 1})

// Index on arrays
> db.ticks.ensureIndex({bidPrices: -1})

// Index on any depth
> db.ticks.ensureIndex({"bids.price": 1})

// Full text search
> db.ticks.ensureIndex({tweet: "text"})

Page 25: MongoDB Tick Data Presentation

Flexible Querying and Indexing

Rich Query Language

// Ticks for last month for media companies
// (both bounds sit in one timestamp sub-document; a repeated key would keep only the last one)
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")}
  })

// Ticks when Disney’s bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: new ISODate("2013-02-01")}
  })
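A note on date ranges: a query document is an ordinary JavaScript object, and an object literal keeps only the last value for a repeated key. A range on timestamp therefore needs both bounds inside one sub-document. The sketch below shows the difference in plain JavaScript, with `Date` standing in for the shell's `ISODate` helper; the variable names are illustrative only.

```javascript
// Plain JavaScript; Date stands in for the shell's ISODate helper.
const start = new Date("2013-01-01T00:00:00Z");
const end = new Date("2013-02-01T00:00:00Z");

// Repeating the key: the second `timestamp` silently replaces the first,
// so the lower bound is lost.
const lossy = {
  timestamp: { $gte: start },
  timestamp: { $lt: end }
};

// Both bounds in one sub-document: the whole range survives.
const ranged = {
  timestamp: { $gte: start, $lt: end }
};

console.log(Object.keys(lossy.timestamp));  // [ '$lt' ]
console.log(Object.keys(ranged.timestamp)); // [ '$gte', '$lt' ]
```

Using a half-open range ($gte the first instant of the month, $lt the first instant of the next) also avoids dropping ticks that arrive during the last day.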

Page 26: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework & Map-Reduce

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

• High Throughput & Linear Scalability

Page 27: MongoDB Tick Data Presentation

Aggregation Framework

Parallel execution across cluster

// Aggregate minute bars for Disney for February
db.ticks.aggregate([
  // Keep only DIS ticks inside February
  { $match: { symbol: "DIS",
              timestamp: { $gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01") } } },
  // Break the timestamp into the parts used as the grouping key
  { $project: { year: {$year: "$timestamp"}, month: {$month: "$timestamp"},
                day: {$dayOfMonth: "$timestamp"}, hour: {$hour: "$timestamp"},
                minute: {$minute: "$timestamp"}, second: {$second: "$timestamp"},
                timestamp: 1, price: 1 } },
  // Sort by time so $first and $last give open and close
  { $sort: { timestamp: 1 } },
  { $group: { _id: { year: "$year", month: "$month", day: "$day",
                     hour: "$hour", minute: "$minute" },
              open: {$first: "$price"}, high: {$max: "$price"},
              low: {$min: "$price"}, close: {$last: "$price"} } }
])
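To make the shape of the result concrete, here is a small in-memory sketch of the same grouping in plain JavaScript. It assumes ticks carry `timestamp` and `price` fields as in the pipeline; `minuteBars` is a hypothetical helper, and the sample prices are made up for illustration.

```javascript
// Group ticks into one-minute OHLC bars, mirroring the $sort + $group stages.
function minuteBars(ticks) {
  // Sort by time so that "first" and "last" mean open and close.
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map();
  for (const { timestamp, price } of sorted) {
    // Truncate to the minute (UTC) to form the group key.
    const key = timestamp.toISOString().slice(0, 16);
    const bar = bars.get(key);
    if (!bar) {
      bars.set(key, { minute: key, open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price;
    }
  }
  return [...bars.values()];
}

// Illustrative ticks, deliberately out of order.
const bars = minuteBars([
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.40 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.55 },
  { timestamp: new Date("2013-02-15T10:00:20Z"), price: 55.30 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.50 }
]);
console.log(bars);
```

The sort step matters: without it, open and close would depend on arrival order rather than on time.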

Page 28: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework & Map-Reduce

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

Page 29: MongoDB Tick Data Presentation

Pre-aggregation pattern

Real-time and continuous state

All Ticks Collection

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  …
}

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 …
}

Pre-aggregated State

{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  Daily_high: 66.1,
  Daily_low: 57.1,
  Daily_volume: 100222
}
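One way the pre-aggregated state might be maintained is with an upsert per tick that uses the $max, $min, and $inc update operators ($max and $min require MongoDB 2.6 or later). The sketch below builds such an update and applies it in memory so that it runs without a server. The `Daily_*` names come from the slide; the `price` and `quantity` tick fields, the helper names, and the sample values are assumptions.

```javascript
// Build the upsert that folds one tick into the per-symbol daily state.
// `price` and `quantity` are assumed tick fields, not shown in the slides.
function dailyStateUpdate(tick) {
  return {
    filter: { symbol: tick.symbol },
    update: {
      $max: { Daily_high: tick.price },
      $min: { Daily_low: tick.price },
      $inc: { Daily_volume: tick.quantity }
    },
    options: { upsert: true }
  };
}

// In-memory stand-in for the server applying that update to one document.
function applyUpdate(state, update) {
  const next = { ...state };
  for (const [k, v] of Object.entries(update.$max)) {
    next[k] = k in next ? Math.max(next[k], v) : v;
  }
  for (const [k, v] of Object.entries(update.$min)) {
    next[k] = k in next ? Math.min(next[k], v) : v;
  }
  for (const [k, v] of Object.entries(update.$inc)) {
    next[k] = (next[k] || 0) + v;
  }
  return next;
}

// Illustrative ticks.
const ticks = [
  { symbol: "DIS", price: 57.1, quantity: 100 },
  { symbol: "DIS", price: 66.1, quantity: 200 },
  { symbol: "DIS", price: 60.0, quantity: 50 }
];
let state = { symbol: "DIS" };
for (const t of ticks) {
  state = applyUpdate(state, dailyStateUpdate(t).update);
}
console.log(state);
```

Against a real deployment, the filter, update, and options would be passed to the collection's update call instead of `applyUpdate`. A production version would probably also key the state by trading day so the values reset daily; the slide shows only the symbol.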

Page 30: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework & Map-Reduce

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

Page 31: MongoDB Tick Data Presentation

Process Data in Hadoop

• MongoDB’s Hadoop Connector

• Supports Map/Reduce, Streaming, Pig

• MongoDB as input/output storage for Hadoop jobs

- No need to go through HDFS

• Leverage power of Hadoop ecosystem against operational data in MongoDB

Page 32: MongoDB Tick Data Presentation

Tick Data – Why MongoDB?

• Flexible Data Model

- Easy Onboarding

• Flexible Querying and Indexing

- Primary, Secondary & Index Intersection

• Aggregation Framework & Map-Reduce

- Native to MongoDB

• Pre-aggregation pattern

- Continuous and up-to-date snapshot of "object"

• Language Drivers & Hadoop Connector

- Java, Python, Scala, R, Matlab

• High Throughput & High Scalability

Page 33: MongoDB Tick Data Presentation

Why MongoDB Is Fast and Scalable

• Better data locality (Relational vs. MongoDB)

• In-Memory Caching

• Auto-Sharding

- Read/write scaling

Page 35: MongoDB Tick Data Presentation

Easy On-boarding of all Financial Data

Problem

• Financial data comes in many different shapes and sizes, and it needs to be on-boarded for research and analysis from multiple platforms such as Bloomberg and Reuters

- Shapes: Time Series, News, Event, Sentiment

- Sizes: 1MB 1x a day price data; 1GB x 1000s data matrices; 40GB 1-minute data; 30TB tick data; even bigger: options data

• On-boarding can take weeks in a relational model with complex schema designs and ETL

- An FX Option can be an 80+ table schema

• Relational technology is a scale-up architecture and did not meet the performance requirements of AHL

Why MongoDB

• Dynamic schema: can on-board data of any shape or size almost instantly, without having to go through a typical "ETL" lifecycle

• Performance: quant researchers want data rendered in <1s for up to 20 years of historical data for back-testing trading strategies

• Replication: a team of 40 quant researchers relies on this system being up

• Sharding: can scale seamlessly and accommodate data of any shape and size

Page 36: MongoDB Tick Data Presentation

Easy On-boarding

Results

Low latency:

- 1xDay data: 4ms for 10,000 rows (vs. 2,210ms from SQL)

- OneMinute / Tick data: 1s for 3.5M rows Python (vs. 15s – 40s+ from OtherTick)

- 1s for 15M rows Java

Parallel Access:

- Cluster with 256+ concurrent data access

- Consistent throughput – little load on the Mongo server

Efficient:

- 10-15x reduction in network load

- Negligible decompression cost (lz4: 1.8Gb/s)

Page 39: MongoDB Tick Data Presentation

James (AHL) Presentation Links

• Slides: http://www.slideshare.net/JamesBlackburn1/mongodb-and-python-as-a-market-data-platform

• YouTube: James Blackburn - Python and MongoDB as a Platform for Financial Market Data

Page 40: MongoDB Tick Data Presentation

Q&A