DESCRIPTION
Colin talks about how he architected and built a high-performance time-series database from the ground up at Dataloop.io, handling hundreds of thousands of metrics per second. One of the objectives was to provide real-time graphing and alerting. If you're 'rolling your own' metrics, are interested in Node.JS and highly scalable architectures, and like listening to plenty of war stories, you should enjoy this talk.
Video: http://youtu.be/vx6Ms5TNtqo
DevOps Exchange Meetup Group: http://bit.ly/doxlonmeetup
www.dataloop.io | @dataloopio | info@dataloop.io
Colin Hemmings | Architect
Time-series Datastore on Riak
Architecture
• Collection
• Storage
• Analytics
The Storage Problem
Just stick it in a database, right?
Past Solutions
TempoDB - the phantom menace
Past Solutions
MongoDB - return of the Jedi
Riak - Our New Hope
• Scales
• Ops Friendly
• Actually works
• No random JVM crashes here
Objectives
• Handle the load
• Semi-arbitrary queries
• Data retention windows
• Low latency
Data structure
• Resolution/rollup-based queries
• Minimum 24 hours at 1-second resolution
• Second, minute and hour resolutions
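The slides don't say how rollups are computed. As a minimal sketch, assuming the rollup function is a simple mean and points are `(unix_seconds, value)` pairs, the hypothetical `rollup` helper below downsamples 1-second data to 1-minute and then 1-hour resolution:

```python
from collections import defaultdict


def rollup(points, step):
    """Downsample (timestamp, value) pairs into buckets `step` seconds wide.

    Each output bucket is keyed by its start time (the timestamp floored to a
    multiple of `step`) and holds the mean of the values that fell into it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        bucket = ts - ts % step  # floor to the start of the bucket
        sums[bucket] += value
        counts[bucket] += 1
    return sorted((b, sums[b] / counts[b]) for b in sums)


# Two minutes of hypothetical 1-second CPU samples
seconds = [(t, float(t % 60)) for t in range(120)]
minutes = rollup(seconds, 60)   # 1-minute resolution
hours = rollup(minutes, 3600)   # 1-hour resolution, built from the minutes
```

Rolling hours up from minutes rather than raw seconds keeps the work small, but a mean of means is only exact when every minute holds the same number of samples; keeping a sum and a count per bucket avoids that.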
Data structure
• ~86,400 data points per resolution
• 1 second -> 24 hour retention
• 1 minute -> 60 day retention
• 1 hour -> 10 year retention
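As a quick check on the per-resolution figure, this sketch counts the points each retention window holds, taking the windows from the slide and assuming 365-day years (no leap days):

```python
DAY = 24 * 60 * 60  # seconds
YEAR = 365 * DAY    # ignores leap days

# resolution -> (step in seconds, retention in seconds), from the slide
WINDOWS = {
    "second": (1, DAY),
    "minute": (60, 60 * DAY),
    "hour": (3600, 10 * YEAR),
}

counts = {name: retention // step for name, (step, retention) in WINDOWS.items()}
total = sum(counts.values())
print(counts, total)
```

The second and minute windows both come to exactly 86,400 points; the 10-year hourly window is 87,600, so the per-resolution figure is approximate, and the three together come to about 260k points per metric.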
Data structure
• Per metric -> ~250k data points
• 1,000 metrics per host -> 250M data points
• 300 hosts per user -> 75B data points
• 1,000 customers -> 75T data points!!!!!
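Multiplying the slide's inputs out step by step shows how quickly the totals grow. The constants are the slide's own figures, with the per-metric count taken as its rounded 250k:

```python
# Inputs as stated on the slide
PER_METRIC = 250_000       # rounded points per metric across all resolutions
METRICS_PER_HOST = 1_000
HOSTS_PER_USER = 300
CUSTOMERS = 1_000

per_host = PER_METRIC * METRICS_PER_HOST
per_user = per_host * HOSTS_PER_USER
total = per_user * CUSTOMERS

for label, n in [("per host", per_host), ("per user", per_user), ("total", total)]:
    print(f"{label}: {n:,}")
```

This prints 250,000,000 per host, 75,000,000,000 per user and 75,000,000,000,000 in total.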
Simple Riak Storage
• One timestamp-keyed object per metric value
• Secondary indexes (2i) and MapReduce are too slow
• Especially across millions of keys
• The write volume would soon cripple our Riak cluster
Intelligent Riak Storage
• Units of storage: time-based data blocks
• Keys are computed, not looked up
• Mutable data windows
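One possible shape for computed block keys, sketched in Python. Everything here is an assumption for illustration: the block lengths in `BLOCK_SECONDS`, the hypothetical `metric_resolution_blockstart` key format (modelled loosely on the `cpu_second_t1b` style keys in the Query slide), and storing a block as a JSON object of offset -> value. On this reading, a mutable data window is the block covering the current time, which is read, updated and written back as points arrive:

```python
import json

# Hypothetical: how many seconds of data one Riak object holds per resolution
BLOCK_SECONDS = {"second": 3600, "minute": 86400, "hour": 30 * 86400}


def block_start(ts, resolution):
    """Floor a unix timestamp to the start of the block that contains it."""
    size = BLOCK_SECONDS[resolution]
    return ts - ts % size


def block_key(metric, resolution, ts):
    """Compute the key of the block holding `ts`; no index lookup needed."""
    return f"{metric}_{resolution}_{block_start(ts, resolution)}"


def add_point(block_json, ts, value, resolution):
    """Read-modify-write: merge one point into a block's JSON body."""
    block = json.loads(block_json) if block_json else {}
    # JSON object keys are strings, so store the offset within the block as one
    block[str(ts - block_start(ts, resolution))] = value
    return json.dumps(block, sort_keys=True)


key = block_key("cpu", "second", 7265)        # 'cpu_second_7200'
body = add_point(None, 7265, 0.5, "second")  # new block
body = add_point(body, 7266, 0.7, "second")  # existing block updated
```

With read-modify-write, two writers updating the same block at once would need Riak sibling resolution or a single writer per key; the slides don't say which approach was used.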
Query
Get CPU metrics for host A for the period t1 to t4 at 1-second resolution
• Pull the correct blocks from Riak, based on block boundaries
• GET /buckets/host_a/keys/cpu_second_t1b
• GET /buckets/host_a/keys/cpu_second_t2b
• GET /buckets/host_a/keys/cpu_second_t3b
• GET /buckets/host_a/keys/cpu_second_t4b
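Because keys are computed, finding the blocks for a range needs no index lookup: floor the start time to a block boundary and step forward one block at a time. A minimal sketch, assuming a hypothetical 1-hour block length and a key that ends in the block's start timestamp in place of `t1b` to `t4b`; the `/buckets/<bucket>/keys/<key>` path shape is the one used in the GET requests above:

```python
BLOCK_SECONDS = 3600  # hypothetical block length for 1-second data


def block_paths(bucket, metric, resolution, t_start, t_end, block=BLOCK_SECONDS):
    """Riak HTTP paths for every block that overlaps [t_start, t_end]."""
    first = t_start - t_start % block  # start of the block containing t_start
    return [
        f"/buckets/{bucket}/keys/{metric}_{resolution}_{start}"
        for start in range(first, t_end + 1, block)
    ]


paths = block_paths("host_a", "cpu", "second", 7265, 18000)
for p in paths:
    print(p)
```

Here the range spans four blocks, starting at 7200, 10800, 14400 and 18000, which mirrors the four GETs on the slide.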
Query
• Filter out points outside the query range
• Aggregate all the data points
• Apply further operations for more complex queries
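The filter-and-aggregate step, sketched under the assumption that each fetched block has already been decoded into a list of `(unix_seconds, value)` pairs. The hypothetical `query` helper uses min, max and mean as stand-ins for whatever aggregates a real query would ask for:

```python
def query(blocks, t_start, t_end):
    """Merge decoded blocks, drop points outside [t_start, t_end], summarise."""
    points = sorted(
        (ts, value)
        for block in blocks
        for ts, value in block
        if t_start <= ts <= t_end  # blocks overhang the range at both ends
    )
    if not points:
        return {"points": [], "min": None, "max": None, "avg": None}
    values = [value for _, value in points]
    return {
        "points": points,
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


blocks = [
    [(7198, 9.0), (7200, 1.0), (7201, 2.0)],  # first block starts before t_start
    [(10800, 3.0), (10805, 99.0)],            # last block runs past t_end
]
result = query(blocks, 7200, 10800)
```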
Expiring
• Cleanup worker
• Removes keys that fall outside the retention window
• Buckets are keyed by host, making it easy to clear all of a host's or an account's data
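A cleanup worker can decide from the key alone whether a block has aged out. This sketch assumes a hypothetical `metric_resolution_blockstart` key format and hypothetical block lengths, with retention windows from the Data structure slide. It only selects keys; the deletion itself would be a Riak HTTP `DELETE /buckets/<bucket>/keys/<key>` per key:

```python
import time

# Hypothetical block lengths (seconds of data per Riak object)
BLOCK_SECONDS = {"second": 3600, "minute": 86400, "hour": 30 * 86400}
# Retention windows from the slides: 24 hours, 60 days, 10 years (365-day years)
RETENTION_SECONDS = {"second": 86400, "minute": 60 * 86400, "hour": 10 * 365 * 86400}


def expired_keys(keys, now=None):
    """Return keys whose entire block ends before the retention cutoff."""
    now = time.time() if now is None else now
    expired = []
    for key in keys:
        # rsplit keeps metric names that contain underscores intact
        _metric, resolution, start = key.rsplit("_", 2)
        block_end = int(start) + BLOCK_SECONDS[resolution]
        if block_end <= now - RETENTION_SECONDS[resolution]:
            expired.append(key)
    return expired


now = 2 * 86400  # two days after the epoch, for a deterministic example
keys = ["cpu_second_0", "cpu_second_86400", "cpu_minute_0", "disk_io_second_3600"]
print(expired_keys(keys, now))
```

Checking the block's end rather than its start means a block is only dropped once every point in it is past retention.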
Our cluster
• Riak 2.0
• 5 nodes on LevelDB
• Each with 2 × 500 GB striped SSDs
• Average 1ms GET and PUT latencies
Comments
• Awesome, especially for ops
• A bit more work in application tier
• Always compute keys; avoid 2i and MapReduce
• Looking forward to using the new Riak 2.0 data types