Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Time Series Data With A Apache Cassandra

ApacheCon EuropeNovember 18, 2014

Eric Evanseevans@opennms.org

@jericevans

Network

Management

System

OpenNMS: What It Is

● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management, notifications

● Java, open source, GPLv3● Since 1999

Time series: RRDTool

● Round Robin Database● First released 1999● Time series storage● File-based, constant-size, self-maintaining● Automatic, incremental aggregation

… and oh yeah, graphing

Consider

● 5+ IOPs per update (read-modify-write)!● 100,000s of metrics, 1,000s IOPS● 1,000,000s of metrics, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS

We collect and write a great deal; We read (graph) relatively little.

So why are we aggregating everything?

● Not everything is a graph● Inflexible● Incremental backups impractical● Availability subject to filesystem access

Metrics typically appear in groups that are accessed together.

Optimizing storage for grouped access is a great idea!

What OpenNMS needs:

● High throughput● High availability● Late aggregation● Grouped storage/retrieval

Cassandra

● Distributed database● Highly available● High throughput● Tunable consistency

SSTables

Writes

Commitlog

Memtable

SSTable

Memory

Write Properties

● Optimized for write throughput● Sorted on disk● Perfect for time series!

Partitioning AZ

Key: Apple

Placement

Key: Apple

Replication

Key: Apple

CAP Theorem

Consistency

Availability

Partition tolerance

Consistency

R+W > N?

Distribution Properties

● Symmetrical● Linearly scalable● Redundant● Highly available

D ata odelM

Data Model

resource

Data Model

resource

T1 T2 T3

Data Model

resource

Data Model

CREATE TABLE samples ( T timestamp,

M text,

V double,

resource text,

PRIMARY KEY(resource, T, M));

Data model

V1T1 M1 V2T1 M2 T1 V3M3resource

Data model

SELECT * FROM samples

WHERE resource = ‘resource’

AND T = ‘T1’;

Data model

T1 M1 V1resource

Data model

T1 M1 V1

T1 M2 V2

resource

Data model

T1 M1 V1

T1 M2 V2

T1 M3 V3

resource

Data model

SELECT * FROM samples

WHERE resource = ‘resource’

AND T >= ‘T1’ AND T <= ‘T3’;

● Standalone time series data-store○ REST server○ Java API

● Raw sample storage and retrieval● Flexible aggregations (computed at read)

○ Rate (counter types)○ Pluggable aggregation functions○ Arbitrary calculations

● Search-enabled● Fast; Runs at Cassandra-speed● Apache licensed● Github (http://github.com/OpenNMS/newts)● http://newts.io

Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Technology

Apache Helix presentation at ApacheCon 2013

Diaposotivas apache-cassandra

Apache iBatis (ApacheCon US 2007)

Introduction to Apache jclouds at ApacheCon 2014

Cassandra Community Webinar: Apache Cassandra Internals

CMIS and Apache Chemistry (ApacheCon 2010)

Policing Apache Brand Use - SCurcuru - ApacheCon 2014

Apache Portals Panel (ApacheCon US 2007)

ApacheCon NA 2013 - The New Face of Apache Cassandra

Cassandra + Hadoop: Analisi Batch con Apache Cassandra

Apache Cassandra in Action - O'Reilly Mediaassets.en.oreilly.com/1/event/55/Apache Cassandra in Action... · Apache Cassandra in Action. Why Cassandra? ... Cassandra in production

76613613 Apache Cassandra

Incubating Apache Linda (ApacheCon Europe 2012)

Apache cassandra cosnola

ApacheCon NA11 - Apache Celix, Universal OSGi?

[ApacheCon 2016] Advanced Apache Cordova

Inﬁnite’SessionClustering’with’ Apache’Shiro’&’Cassandra’ · #ApacheCon+ Inﬁnite’SessionClustering’with’ Apache’Shiro’&’Cassandra’ LesHazlewood@ lhazlewood’

Apache Cassandra overview

RESTful web applications with Apache Sling - ApacheCon

DevCenter Apache cassandra