Time Series Data with Apache Cassandra (ApacheCon EU 2014)

1. Time Series Data With Apache Cassandra. ApacheCon Europe, November 18, 2014. Eric Evans, eevans@opennms.org, @jericevans

2. Open
3. Open
4. Open
5. Open
6. Network Management System
7. OpenNMS: What It Is: Network Management System; Discovery and Provisioning; Service monitoring; Data collection; Event management, notifications; Java, open source, GPLv3; Since 1999
8. Time series: RRDTool: Round Robin Database; First released 1999; Time series storage; File-based, constant-size, self-maintaining; Automatic, incremental aggregation
9. ... and oh yeah, graphing
10. Consider: 5+ IOPs per update (read-modify-write)!; 100,000s of metrics, 1,000s IOPS; 1,000,000s of metrics, 10,000s IOPS; 15,000 RPM SAS drive, ~175-200 IOPS
11. Hmmm: We collect and write a great deal; we read (graph) relatively little. So why are we aggregating everything?
12. Also: Not everything is a graph; Inflexible; Incremental backups impractical; Availability subject to filesystem access
13. TIL: Metrics typically appear in groups that are accessed together. Optimizing storage for grouped access is a great idea!
14. What OpenNMS needs: High throughput; High availability; Late aggregation; Grouped storage/retrieval
15. Cassandra: Distributed database; Highly available; High throughput; Tunable consistency
16. SSTables [diagram labels: Writes, Memtable, Commitlog, SSTable, Memory, Disk]
17. Write Properties: Optimized for write throughput; Sorted on disk; Perfect for time series!
18. Partitioning [diagram labels: Z, A; A, B, C; Key: Apple...]
19. Placement [diagram labels: A, B, C; Key: Apple...]
20. Replication [diagram labels: A, B, C; Key: Apple...]
21. CAP Theorem: Consistency; Availability; Partition tolerance
22. Consistency [diagram labels: A, B, ?; W=2]
23. Consistency [diagram labels: R=2; R+W > N; ?, B, C]
24. Distribution Properties: Symmetrical; Linearly scalable; Redundant; Highly available
25. Data Model
26. Data Model: resource
27. Data Model: resource | T1 | T2 | T3
28. Data Model: resource; T1: M1=V1, M2=V2, M3=V3; T2: M1=V1, M2=V2, M3=V3; T3: M1=V1, M2=V2, M3=V3
29. Data Model: CREATE TABLE samples (T timestamp, M text, V double, resource text, PRIMARY KEY (resource, T, M));
30. Data model: resource | T1 M1 V1 | T1 M2 V2 | T1 M3 V3
31. Data model: resource | T1 M1 V1 | T1 M2 V2 | T1 M3 V3. SELECT * FROM samples WHERE resource = resource AND T = T1;
32. Data model: resource | T1 M1 V1 | T1 M2 V2 | T1 M3 V3. Rows: resource T1 M1 V1
33. Data model: resource | T1 M1 V1 | T1 M2 V2 | T1 M3 V3. Rows: resource T1 M1 V1; resource T1 M2 V2
34. Data model: resource | T1 M1 V1 | T1 M2 V2 | T1 M3 V3. Rows: resource T1 M1 V1; resource T1 M2 V2; resource T1 M3 V3
35. Data model: resource | T1 M1 V1 | T2 M1 V1 | T3 M1 V1. SELECT * FROM samples WHERE resource = resource AND T >= T1 AND T
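The data-model slides above can be illustrated with a small in-memory sketch. It is not Cassandra and does not use any Cassandra driver; it only mimics the layout implied by PRIMARY KEY (resource, T, M): one partition per resource, with rows kept sorted by (T, M), so that all metrics for one timestamp, or one time range, come back as a single contiguous slice. The class name Samples, the method names, the integer timestamps, and the inclusive upper bound on the range scan are all assumptions made for illustration (the range query on the slide is cut off before its upper bound).

```python
import bisect


class Samples:
    """Hypothetical in-memory stand-in for the `samples` table."""

    def __init__(self):
        # partition key (resource) -> rows (T, M, V) sorted by (T, M)
        self._partitions = {}

    def insert(self, resource, t, m, v):
        rows = self._partitions.setdefault(resource, [])
        # (t, m) sorts just before any (t, m, v), so this finds the slot.
        i = bisect.bisect_left(rows, (t, m))
        if i < len(rows) and rows[i][:2] == (t, m):
            rows[i] = (t, m, v)          # same key: overwrite (upsert)
        else:
            rows.insert(i, (t, m, v))    # keep the partition sorted

    def scan(self, resource, start, end):
        """Rows with start <= T <= end (inclusive end is an assumption)."""
        rows = self._partitions.get(resource, [])
        i = bisect.bisect_left(rows, (start,))  # first row with T >= start
        out = []
        while i < len(rows) and rows[i][0] <= end:
            out.append(rows[i])
            i += 1
        return out

    def at(self, resource, t):
        """All metrics of one resource at one timestamp (T = t)."""
        return self.scan(resource, t, t)


if __name__ == "__main__":
    store = Samples()
    # Writes may arrive in any order; the partition stays sorted.
    store.insert("resource", 2, "M1", 10.0)
    store.insert("resource", 1, "M3", 3.0)
    store.insert("resource", 1, "M1", 1.0)
    store.insert("resource", 1, "M2", 2.0)
    store.insert("resource", 3, "M1", 20.0)

    print(store.at("resource", 1))       # grouped read: every metric at T=1
    print(store.scan("resource", 1, 3))  # time-range read over the partition
```

Because the rows of a resource are stored in sorted order, both reads walk a contiguous run of rows rather than seeking per metric, which is the grouped storage and retrieval property the earlier slides ask for.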