Time Series Data with Apache Cassandra

Time Series Data With Apache Cassandra

Berlin BuzzwordsMay 27, 2014

Eric Evanseevans@opennms.org

@jericevans

Network

Management

System

OpenNMS: What It Is

● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management, notifications

● Java, open source, GPLv3● Since 1999

Time series: RRDTool

● Round Robin Database● First released 1999● Time series storage● File-based, constant-size, self-maintaining● Automatic, incremental aggregation

… and oh yeah, graphing

Consider

● 5+ IOPs per update (read-modify-write)!● 100,000s of metrics, 1,000s IOPS● 1,000,000s of metrics, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS

We collect and write a great deal; We read (graph) relatively little.

So why are we aggregating everything?

● Not everything is a graph● Inflexible● Incremental backups impractical● Availability subject to filesystem access

Metrics typically appear in groups that are accessed together.

Optimizing storage for grouped access is a great idea!

What OpenNMS needs:

● High throughput● High availability● Late aggregation● Grouped storage/retrieval

Cassandra

● Apache top-level project● Distributed database● Highly available● High throughput● Tunable consistency

SSTables

Writes

Commitlog

Memtable

SSTable

DiskMemory

Write Properties

● Optimized for write throughput● Sorted on disk● Perfect for time series!

Partitioning

Key: Apple

Placement

Key: Apple

Replication

Key: Apple

CAP Theorem

Consistency

Availability

Partition tolerance

Consistency

R+W > N

Distribution Properties

● Symmetrical● Linearly scalable● Redundant● Highly available

D ata odelM

Data Modelresource

T1 T2 T3

Data Modelresource

Data Model

CREATE TABLE samples ( T timestamp,

M text,

V double,

resource text,

PRIMARY KEY(resource, T, M));

Data model

V1T1 M1 V2T1 M2 T1 V3M3resource

Data model

SELECT * FROM samplesWHERE resource = ‘resource’AND T = ‘T1’;

Data model

T1 M1 V1resource

Data model

T1 M1 V1

T1 M2 V2

resource

Data model

T1 M1 V1

T1 M2 V2

T1 M3 V3

resource

Data model

SELECT * FROM samplesWHERE resource = ‘resource’AND T >= ‘T1’ AND T <= ‘T3’;

● Standalone time series data-store● Raw sample storage and retrieval● Flexible aggregations (computed at read)

○ Rate (counter types)○ Functions pluggable○ Arbitrary calculations

● Cassandra-speed

● Java API● REST interface● Apache licensed● Github (http://github.com/OpenNMS/newts)

Time Series Data with Apache Cassandra

Technology

State of Cassandra, 2012 - NoSQL | Apache Cassandra · State of Cassandra, 2012 Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax @spyced ©2012 DataStax Some Cassandra

Introduction to Cassandra • Why Spark - Apache Cassandra | Apache Kafka | Apache Spark · 2017. 12. 20. · • Introduction to Cassandra • Why Spark + Cassandra • Problem background

Cassandra Day Atlanta 2015: Troubleshooting with Apache Cassandra

with Kaa, Apache Cassandra, and Apache Zeppelin … · Real-time IoT data analytics and visualization with Kaa, Apache Cassandra, and Apache Zeppelin. Agenda Why Kaa? Why Cassandra?

76613613 Apache Cassandra

Storing time series data with Apache Cassandra

Cassandra Day Denver 2014: Introduction to Apache Cassandra

Cassandra Summit 2014: Apache Cassandra at Telefonica CBS

Analyzing Time Series Data with Apache Spark and Cassandra

Pycon 2012 Apache Cassandra

Cassandra Community Webinar - Introduction To Apache Cassandra 1.2

DevCenter Apache cassandra

Cassandra + Hadoop: Analisi Batch con Apache Cassandra

Conhecendo Apache Cassandra Meetup - Conhecendo … · Eiti Kimura Coordenador de TI na Movile - Apache Cassandra MVP 2015 - Apache Cassandra MVP 2014 - Contribuidor Apache Cassandra

Time Series Data with Apache Cassandra

Amazon Managed Apache Cassandra Service - Developer GuideCassandra Query Language (CQL) is the primary language for communicating with Apache Cassandra. Amazon Managed Apache Cassandra

Apache Cassandra in Action - O'Reilly Mediaassets.en.oreilly.com/1/event/55/Apache Cassandra in Action... · Apache Cassandra in Action. Why Cassandra? ... Cassandra in production

Using Apache Spark, Apache Kafka and Apache Cassandra...USING APACHE SPARK, APACHE KAFKA AND APACHE CASSANDRA TO POWER INTELLIGENT APPLICATIONS | 02 Apache Cassandra is well known

Apache Cassandra - Einführung

Time Series Data with Apache Cassandra (ApacheCon EU 2014)