Colin talks about how he architected and built a high-performance time series database from the ground up at Dataloop.io, handling hundreds of thousands of metrics per second. One of the objectives was to provide real-time graphing and alerting. If you're 'rolling your own' metrics, are interested in Node.js and highly scalable architectures, and like listening to plenty of war stories, you should enjoy this talk. Video: http://youtu.be/vx6Ms5TNtqo DevOps Exchange Meetup Group: http://bit.ly/doxlonmeetup
www.dataloop.io | @dataloopio | [email protected]
Colin Hemmings | Architect
Time-series Datastore on Riak
Just stick it in a database, right?
The Storage Problem
Riak - Our New Hope
• Scales
• Ops Friendly
• Actually works
• No random JVM crashes here
Objectives
• Handle the load
• Semi-arbitrary queries
• Data retention windows
• Low latency
Data structure
• Resolution/rollup based queries
• Minimum 24 hours at 1 second resolution
• Second, minute and hour resolution
Data structure
• About 86,400 data points per resolution
• 1 second -> 24 hour retention
• 1 minute -> 60 day retention
• 1 hour -> 10 year retention
Data structure
• per metric -> 250k data points
• 1000 metrics per host -> 250M data points
• 300 hosts per user -> 75B data points
• 1000 customers -> 75T data points!!!!!
Simple Riak Storage
• One timestamp-keyed object per metric value
• 2i (secondary indexes) and MapReduce are too slow
• Especially across millions of keys
• Writes would soon cripple our Riak cluster
Intelligent Riak Storage
• Units of storage: time based data blocks
• Compute keys
• Mutable data windows
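A minimal sketch of how computed keys and mutable blocks could fit together. The talk does not give the real block sizes or key layout, so `BLOCK_SPAN`, the `<metric>_<resolution>_<block start>` key format, and the in-memory `store` dict standing in for Riak are all assumptions.

```python
# Hypothetical block spans in seconds; the real sizes are not stated in the talk.
BLOCK_SPAN = {"second": 3600, "minute": 86400, "hour": 30 * 86400}


def block_start(resolution, timestamp):
    """Round a timestamp down to the start of the block that owns it."""
    span = BLOCK_SPAN[resolution]
    return timestamp - (timestamp % span)


def block_key(metric, resolution, timestamp):
    """Compute the key directly from the inputs, so no index lookup is needed."""
    return f"{metric}_{resolution}_{block_start(resolution, timestamp)}"


def append_point(store, host, metric, resolution, timestamp, value):
    """Read-modify-write of the block that owns this timestamp.

    `store` is a plain dict standing in for Riak: {(bucket, key): {ts: value}}.
    """
    key = (host, block_key(metric, resolution, timestamp))
    block = store.setdefault(key, {})
    block[timestamp] = value
    return key


store = {}
k1 = append_point(store, "host_a", "cpu", "second", 7205, 0.42)
k2 = append_point(store, "host_a", "cpu", "second", 7300, 0.40)
k3 = append_point(store, "host_a", "cpu", "second", 10800, 0.55)
```

Here the first two points land in the same block and the third opens a new one, so the number of stored objects grows with the number of blocks rather than the number of data points.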
Query
Get CPU metrics for host A for period t1-t4 at 1-second resolution
• Pull the correct blocks from Riak, based on block boundaries
• GET /buckets/host_a/keys/cpu_second_t1b
• GET /buckets/host_a/keys/cpu_second_t2b
• GET /buckets/host_a/keys/cpu_second_t3b
• GET /buckets/host_a/keys/cpu_second_t4b
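The block lookups above might be generated along these lines. This is a sketch under assumptions: the one-hour block span and the numeric block-start suffix are hypothetical, and only the `/buckets/<bucket>/keys/<key>` path shape is taken from the slide.

```python
def block_keys_for_range(metric, resolution, start, end, span):
    """Keys of every block that overlaps the inclusive range [start, end]."""
    first = start - (start % span)  # start of the block containing `start`
    return [f"{metric}_{resolution}_{t}" for t in range(first, end + 1, span)]


def get_paths(host, keys):
    """One GET path per block, using the host as the bucket name."""
    return [f"/buckets/{host}/keys/{key}" for key in keys]


# Hypothetical query: CPU for host_a between t=3700 and t=14500, 1-hour blocks.
keys = block_keys_for_range("cpu", "second", 3700, 14500, span=3600)
paths = get_paths("host_a", keys)
for path in paths:
    print("GET", path)
```

Because every key is computed from the query parameters, the reads are plain key lookups that can be issued in parallel.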
Query
• Filter points outside of our query range
• Aggregate all the data points
• Perform further operations for more complex queries
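A rough sketch of the filter and aggregate steps, done in the application tier. It assumes each fetched block has already been decoded into a `{timestamp: value}` dict and uses a mean as the aggregate; the talk specifies neither the block encoding nor the aggregation functions.

```python
def filter_points(blocks, start, end):
    """Merge fetched blocks and keep only points inside [start, end]."""
    points = []
    for block in blocks:
        points.extend((ts, v) for ts, v in block.items() if start <= ts <= end)
    return sorted(points)


def mean(points):
    """Average of the point values, or None when the range is empty."""
    values = [v for _, v in points]
    return sum(values) / len(values) if values else None


# Two hypothetical blocks; the query range only partly overlaps each of them.
blocks = [{3590: 1.0, 3599: 2.0}, {3600: 3.0, 3601: 5.0}]
points = filter_points(blocks, start=3599, end=3600)
print(points, mean(points))
```

Blocks are fetched whole, so the first and last block of a query usually carry points that have to be trimmed before aggregating.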
Expiring
• Cleanup worker
• Removes keys that fall outside the retention window
• Keyed by host, so it is easier to clear all data for a host or an account
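One way a cleanup worker could decide which blocks to delete, sketched under assumptions: keys end in `_<resolution>_<block start>`, block spans are the hypothetical values in `BLOCK_SPAN`, and the retention windows are the ones given earlier in the talk. How the worker obtains its candidate keys is not covered in the slides and is left out here.

```python
DAY = 86400
RETENTION = {"second": 1 * DAY, "minute": 60 * DAY, "hour": 10 * 365 * DAY}
# Hypothetical block spans in seconds; the real sizes are not stated in the talk.
BLOCK_SPAN = {"second": 3600, "minute": DAY, "hour": 30 * DAY}


def is_expired(key, now):
    """True when the whole block lies outside its retention window."""
    # rsplit keeps metric names that contain underscores intact.
    _metric, resolution, start = key.rsplit("_", 2)
    block_end = int(start) + BLOCK_SPAN[resolution]
    return block_end <= now - RETENTION[resolution]


def expired_keys(keys, now):
    """Subset of `keys` that the cleanup worker may delete."""
    return [k for k in keys if is_expired(k, now)]


candidates = ["cpu_second_108000", "cpu_second_111600", "load_avg_minute_0"]
to_delete = expired_keys(candidates, now=200000)
print(to_delete)
```

A block is only removed once its newest point has aged out, so data inside the retention window is never deleted early.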
Our cluster
• Riak 2.0
• 5 nodes on LevelDB
• Each with 2 x 500GB striped SSDs
• Average 1ms GET and PUT latencies
Comments
• Awesome, especially for ops
• A bit more work in application tier
• Always compute keys; avoid 2i and MapReduce
• Looking forward to using the new data types