Learn how you can enjoy the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data. This presentation illustrates how MongoDB can easily and quickly store variable data formats, like top and depth of book, multiple asset classes, and even news and social networking feeds. It explores aggregating and analyzing tick data in real time for automated trading or in batch for research and analysis, and shows how auto-sharding enables MongoDB to scale on commodity hardware to satisfy unlimited storage and performance requirements.
How Capital Markets Firms Use MongoDB as a Tick Database
Matt Kalan, Sr. Solution Architect, MongoDB
Agenda
• MongoDB One-Slide Overview
• Financial Services (FS) Use Cases
• Writing/Capturing Market Data
• Reading/Analyzing Market Data
• Performance, Scalability, & High Availability
• Q&A
MongoDB Technical Benefits
• Horizontally Scalable: Sharding
• Agile & Flexible
• High Performance: Indexes, RAM
• Highly Available: Replica Sets
Application
{ name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: [ { home: 1234567890 }, { mobile: 1234568138 } ] }
db.cust.insert({…})
db.cust.find({ name: "John Smith" })
Most Common FS Use Cases
1. Tick Data Capture & Analysis
2. Reference Data Management
3. Risk Analysis & Reporting
4. Trade Repository
5. Portfolio Reporting
Writing and Capturing Tick Data
Tick Data Capture & Analysis Requirements
• Capture real-time market data (multi-asset, top of book, depth of book, even news)
• Load historical data
• Aggregate data into bars, daily, monthly intervals
• Enable queries & analysis on raw ticks or aggregates
• Drive backtesting or automated signals
Tick Data Capture & Analysis: Why MongoDB?
• High throughput => can capture real-time feeds for all products/asset classes needed
• High scalability => all data and depth for all historical time periods can be captured
• Flexible & Range-based indexing => fast querying on time ranges and any fields
• Aggregation Framework => can shape raw data into aggregates (e.g. ticks to bars)
• Map-reduce capability (Native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities
• Easy to use => native language drivers and JSON expressions that you can apply for most operational database needs as well
• Low TCO => Low software license cost and commodity hardware
High Level Trading Architecture
[Diagram labels: Exchanges/Markets/Brokers; News & social networking sources; Feed Handler; Capturing Application; Low Latency Applications; Higher Latency Trading Applications; Backtesting and Analysis Applications; Market Data; Cached Static & Aggregated Data; Orders; Trades/metrics]
Data Types
• Top of book
• Depth of book
• Multi-asset
• Derivatives (e.g. strips)
• News (text, video)
• Social Networking
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}
> db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.36} })
Top of Book [e.g. equities]
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}
> db.ticks.find({ bidPrices: {$gt: 55.36} })
Depth of Book
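A comparison on an array field, as in the depth-of-book query on bidPrices, matches a document when at least one array element satisfies the condition. The following is a minimal in-memory sketch of that "any element" behavior in plain JavaScript; the helper name matchesAnyGt is hypothetical and is not a MongoDB API.

```javascript
// Hypothetical helper (not part of MongoDB): mimics how a $gt condition on an
// array field matches when ANY element of the array satisfies it.
function matchesAnyGt(doc, field, threshold) {
  const value = doc[field];
  // Treat a scalar field like a one-element array so both shapes work.
  const values = Array.isArray(value) ? value : [value];
  return values.some((v) => v > threshold);
}

// Sample document shaped like the depth-of-book tick on the slide.
const depthTick = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
};

console.log(matchesAnyGt(depthTick, "bidPrices", 55.36)); // true, because 55.37 > 55.36
console.log(matchesAnyGt(depthTick, "bidPrices", 55.40)); // false, no bid is above 55.40
```

The same document would therefore be returned even though only the best bid exceeds the threshold.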
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bids: [
    {price: 55.37, amount: 500},
    {price: 55.37, amount: 1000},
    {price: 55.37, amount: 2000}
  ],
  offers: [
    {price: 55.58, amount: 1000},
    {price: 55.58, amount: 2000},
    {price: 55.59, amount: 3000}
  ]
}
> db.ticks.find({ "bids.price": {$gt: 55.36} })
Or However Your App Uses It
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  spreadPrice: 0.58,
  leg1: {symbol: "CLM13", price: 97.34},
  leg2: {symbol: "CLK13", price: 96.92}
}
// All conditions go in one query document; the legs are matched on their nested symbol field
> db.ticks.find({ "leg1.symbol": "CLM13", "leg2.symbol": "CLK13", spreadPrice: {$gt: 0.50} })
Synthetic Spreads
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  title: "Disney Earnings…",
  body: "Walt Disney Company reported…",
  tags: ["earnings", "media", "walt disney"]
}
News
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  timestamp: ISODate("2013-02-15 10:00"),
  twitterHandle: "jdoe",
  tweet: "Heard @DisneyPictures is releasing…",
  usernamesIncluded: ["DisneyPictures"],
  hashTags: ["movierumors", "disney"]
}
Social Networking
{
  _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  openTS: ISODate("2013-02-15 10:00"),
  closeTS: ISODate("2013-02-15 10:05"),
  open: 55.36,
  high: 55.80,
  low: 55.20,
  close: 55.70
}
Aggregates (bars, daily, etc)
Querying/Analyzing Tick Data
Architecture for Querying Data
[Diagram labels: Higher Latency Trading Applications; Backtesting Applications; Research & Analysis Applications]
• Ticks
• Bars
• Other analysis
// On shells older than MongoDB 3.0, use ensureIndex in place of createIndex
// Compound indexes
> db.ticks.createIndex({symbol: 1, timestamp: 1})
// Index on arrays
> db.ticks.createIndex({bidPrices: -1})
// Index on any depth
> db.ticks.createIndex({"bids.price": 1})
// Full text search
> db.ticks.createIndex({tweet: "text"})
Index Any Fields: Arrays, Nested, etc.
// Ticks for last month for media companies
// (both bounds sit in one timestamp condition; a repeated key would overwrite the first)
> db.ticks.find({
    symbol: {$in: ["DIS", "VIA", "CBS"]},
    timestamp: {$gte: ISODate("2013-01-01"), $lt: ISODate("2013-02-01")}
  })
// Ticks when Disney's bid breached 55.50 this month
> db.ticks.find({
    symbol: "DIS",
    bidPrice: {$gt: 55.50},
    timestamp: {$gte: ISODate("2013-02-01")}
  })
Query for ticks by time; price threshold
Analyzing/Aggregating Options
• Custom application code: run your queries, compute your results
• Aggregation framework: declarative, pipeline-based approach
• Native Map/Reduce in MongoDB: JavaScript functions distributed across cluster
• Hadoop Connector: offline batch processing/computation
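As an illustration of the first option, custom application code, here is a minimal sketch that builds one-minute open/high/low/close bars from ticks the application has already fetched. It assumes each tick carries a Date timestamp and a numeric price; the function name toMinuteBars and the sample data are hypothetical.

```javascript
// Sketch of the "custom application code" option: bucket ticks into
// one-minute OHLC bars in application memory. Assumes each tick has a
// Date `timestamp` and a numeric `price` (hypothetical field names).
function toMinuteBars(ticks) {
  // Sort a copy by time so the first and last tick per minute are open and close.
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map();
  for (const { timestamp, price } of sorted) {
    // Truncate the timestamp to the start of its minute (epoch milliseconds).
    const minuteStart = Math.floor(timestamp.getTime() / 60000) * 60000;
    const bar = bars.get(minuteStart);
    if (!bar) {
      bars.set(minuteStart, {
        openTS: new Date(minuteStart),
        open: price,
        high: price,
        low: price,
        close: price,
      });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price; // ticks are sorted, so the latest one seen is the close
    }
  }
  return [...bars.values()];
}

const sampleTicks = [
  { timestamp: new Date("2013-02-15T10:00:05Z"), price: 55.36 },
  { timestamp: new Date("2013-02-15T10:00:40Z"), price: 55.80 },
  { timestamp: new Date("2013-02-15T10:00:59Z"), price: 55.70 },
  { timestamp: new Date("2013-02-15T10:01:10Z"), price: 55.20 },
];

const minuteBars = toMinuteBars(sampleTicks);
console.log(minuteBars);
```

This approach moves every raw tick to the application, so it tends to suit small result sets; the other options push the computation closer to the data.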
// Aggregate minute bars for Disney for February
db.ticks.aggregate(
  { $match: { symbol: "DIS",
              timestamp: {$gte: ISODate("2013-02-01"), $lt: ISODate("2013-03-01")} } },
  { $project: { year: {$year: "$timestamp"},
                month: {$month: "$timestamp"},
                day: {$dayOfMonth: "$timestamp"},
                hour: {$hour: "$timestamp"},
                minute: {$minute: "$timestamp"},
                second: {$second: "$timestamp"},
                timestamp: 1,
                price: 1 } },
  // Sort by time so $first and $last below give the open and close of each minute
  { $sort: { timestamp: 1 } },
  { $group: { _id: {year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute"},
              open: {$first: "$price"},
              high: {$max: "$price"},
              low: {$min: "$price"},
              close: {$last: "$price"} } }
)
Aggregate into min bars
…
// then count the number of down bars
  { $project: { downBar: {$lt: ["$close", "$open"]}, timestamp: 1, open: 1, high: 1, low: 1, close: 1 } },
  { $group: { _id: "$downBar", sum: {$sum: 1} } }
)
Add Analysis on the Bars
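To make the down-bar logic concrete, the sketch below applies the same two steps to an in-memory array of bars: flag each bar whose close is below its open, then count bars per flag. It is plain JavaScript rather than an aggregation pipeline, and the function name and sample bars are hypothetical.

```javascript
// In-memory analogue of the $project + $group stages: flag each bar as a
// down bar (close < open), then count how many bars fall under each flag.
function countDownBars(bars) {
  const counts = { down: 0, notDown: 0 };
  for (const bar of bars) {
    if (bar.close < bar.open) {
      counts.down += 1;
    } else {
      counts.notDown += 1;
    }
  }
  return counts;
}

// Hypothetical sample bars for illustration only.
const sampleBars = [
  { open: 55.36, high: 55.80, low: 55.20, close: 55.70 }, // up bar
  { open: 55.70, high: 55.75, low: 55.40, close: 55.45 }, // down bar
  { open: 55.45, high: 55.50, low: 55.30, close: 55.30 }, // down bar
];

const barCounts = countDownBars(sampleBars);
console.log(barCounts);
```

A bar whose close equals its open is counted as not down here, which matches a strict less-than comparison.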
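// Map: emit each tick's bid price keyed by its symbol
var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};
// Reduce: sum all prices emitted for one symbol
var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};
> db.ticks.mapReduce(
    mapFunction, reduceFunction, {out: "tickSums"})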
MapReduce Example: Sum
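The flow of that example can be imitated in a single process to show what map and reduce each contribute. The sketch below is a toy stand-in, not MongoDB's implementation: runMapReduce is a hypothetical helper, emit is passed in as an argument instead of being a global, and the sum uses the standard Array.prototype.reduce because Array.sum is a shell helper. MongoDB itself may call reduce more than once per key and skips it for keys with a single value, which this sketch does not model.

```javascript
// Toy, single-process imitation of map/reduce: map emits (key, value) pairs,
// values are grouped by key, and reduce folds each group into one result.
function runMapReduce(docs, mapFn, reduceFn) {
  const grouped = new Map();
  const emit = (key, value) => {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  };
  // As in the shell example, the document is available as `this` inside map.
  for (const doc of docs) mapFn.call(doc, emit);
  const results = {};
  for (const [key, values] of grouped) results[key] = reduceFn(key, values);
  return results;
}

const sumMap = function (emit) {
  emit(this.symbol, this.bidPrice);
};
const sumReduce = (symbol, priceList) =>
  priceList.reduce((total, price) => total + price, 0);

// Hypothetical sample ticks for illustration only.
const tickSums = runMapReduce(
  [
    { symbol: "DIS", bidPrice: 55.5 },
    { symbol: "DIS", bidPrice: 55.25 },
    { symbol: "CBS", bidPrice: 44.0 },
  ],
  sumMap,
  sumReduce
);
console.log(tickSums);
```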
Process Data in Hadoop
• MongoDB’s Hadoop Connector
• Supports Map/Reduce, Streaming, Pig
• MongoDB as input/output storage for Hadoop jobs: no need to go through HDFS
• Leverage power of Hadoop ecosystem against operational data in MongoDB
Performance, Scalability, and High Availability
Why MongoDB Is Fast and Scalable
• Better data locality (diagram: Relational vs. MongoDB)
• In-Memory Caching
• Auto-Sharding: read/write scaling
Auto-sharding for Horizontal Scale
Read/Write Scalability
• mongod: key range, symbol A…Z
Auto-sharding for Horizontal Scale
Read/Write Scalability
• mongod: key range, symbol A…J
• mongod: key range, symbol K…Z
Sharding
Read/Write Scalability
• mongod: key range, symbol A…F
• mongod: key range, symbol G…J
• mongod: key range, symbol K…O
• mongod: key range, symbol P…Z
[Diagram labels: Application; three MongoS routers; four shards, each a replica set with one Primary and two Secondaries]
Shard key ranges:
• Symbol: A…F, Time
• Symbol: G…J, Time
• Symbol: K…O, Time
• Symbol: P…Z, Time
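The key ranges on these slides can be read as a routing table that maps each symbol to the shard owning its range. The sketch below is a deliberately simplified illustration: the shard names are made up, it looks only at the first letter of the symbol, and it ignores the time component of the key. In a real cluster the routers consult chunk metadata over full shard key values, so treat this as a mental model, not MongoDB's routing logic.

```javascript
// Hypothetical routing table mirroring the four symbol ranges on the slide.
// Shard names are invented for illustration.
const chunkRanges = [
  { shard: "shard0", min: "A", max: "F" },
  { shard: "shard1", min: "G", max: "J" },
  { shard: "shard2", min: "K", max: "O" },
  { shard: "shard3", min: "P", max: "Z" },
];

// Pick the shard whose range contains the symbol's first letter;
// returns null when no range covers it.
function routeSymbol(symbol) {
  const first = (symbol[0] || "").toUpperCase();
  const range = chunkRanges.find((r) => first >= r.min && first <= r.max);
  return range ? range.shard : null;
}

console.log(routeSymbol("DIS")); // shard0, since D falls in A…F
console.log(routeSymbol("VIA")); // shard3, since V falls in P…Z
```

Because each symbol lands on exactly one shard, a query that filters on symbol can be sent to a single shard, while a query without it has to be sent to all of them.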
Summary
• MongoDB delivers high performance for tick data
• Scales horizontally through auto-sharding
• Fast, flexible querying, analysis, & aggregation
• Dynamic schema can handle any data types
• MongoDB has all these features with low TCO
• We can support you with anything discussed
Questions?
Thank You