Transcript
Page 1: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Time Series Data With A Apache Cassandra

ApacheCon EuropeNovember 18, 2014

Eric [email protected]

@jericevans

Page 2: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Open

Page 3: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Open

Page 4: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Open

Page 5: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Open

Page 6: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Network

Management

System

Page 7: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

OpenNMS: What It Is

● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management, notifications

● Java, open source, GPLv3● Since 1999

Page 8: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Time series: RRDTool

● Round Robin Database● First released 1999● Time series storage● File-based, constant-size, self-maintaining● Automatic, incremental aggregation

Page 9: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

… and oh yeah, graphing

Page 10: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Consider

● 5+ IOPs per update (read-modify-write)!● 100,000s of metrics, 1,000s IOPS● 1,000,000s of metrics, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS

Page 11: Time Series Data with Apache Cassandra (ApacheCon EU 2014)
Page 12: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Hmmm

We collect and write a great deal; We read (graph) relatively little.

So why are we aggregating everything?

Page 13: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Also

● Not everything is a graph● Inflexible● Incremental backups impractical● Availability subject to filesystem access

Page 14: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

TIL

Metrics typically appear in groups that are accessed together.

Optimizing storage for grouped access is a great idea!

Page 15: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

What OpenNMS needs:

● High throughput● High availability● Late aggregation● Grouped storage/retrieval

Page 16: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Cassandra

● Distributed database● Highly available● High throughput● Tunable consistency

Page 17: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

SSTables

Writes

Commitlog

Memtable

SSTable

Disk

Memory

Page 18: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Write Properties

● Optimized for write throughput● Sorted on disk● Perfect for time series!

Page 19: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Partitioning AZ

A

B

C

Key: Apple

...

Page 20: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Placement

A

B

C

Key: Apple

...

Page 21: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Replication

A

B

C

Key: Apple

...

Page 22: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

CAP Theorem

Consistency

Availability

Partition tolerance

Page 23: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Consistency

A

B

?

W=2

Page 24: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Consistency

R=2

R+W > N?

B

C

Page 25: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Distribution Properties

● Symmetrical● Linearly scalable● Redundant● Highly available

Page 26: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

D ata odelM

Page 27: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data Model

resource

Page 28: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data Model

resource

T1 T2 T3

Page 29: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data Model

resource

T1

M1 M2

V1 V2

M3

V3

T2

M1 M2

V1 V2

M3

V3

T3

M1 M2

V1 V2

M3

V3

Page 30: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data Model

CREATE TABLE samples ( T timestamp,

M text,

V double,

resource text,

PRIMARY KEY(resource, T, M));

Page 31: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

V1T1 M1 V2T1 M2 T1 V3M3resource

Page 32: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

SELECT * FROM samples

WHERE resource = ‘resource’

AND T = ‘T1’;

V1T1 M1 V2T1 M2 T1 V3M3resource

Page 33: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

T1 M1 V1resource

V1T1 M1 V2T1 M2 T1 V3M3resource

Page 34: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

T1 M1 V1

T1 M2 V2

resource

resource

V1T1 M1 V2T1 M2 T1 V3M3resource

Page 35: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

T1 M1 V1

T1 M2 V2

T1 M3 V3

resource

resource

resource

V1T1 M1 V2T1 M2 T1 V3M3resource

Page 36: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Data model

SELECT * FROM samples

WHERE resource = ‘resource’

AND T >= ‘T1’ AND T <= ‘T3’;

V1T1 M1 V1T2 M1 T3 V1M1resource

Page 37: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Newts

● Standalone time series data-store○ REST server○ Java API

● Raw sample storage and retrieval● Flexible aggregations (computed at read)

○ Rate (counter types)○ Pluggable aggregation functions○ Arbitrary calculations

Page 38: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Newts

● Search-enabled● Fast; Runs at Cassandra-speed● Apache licensed● Github (http://github.com/OpenNMS/newts)● http://newts.io

Page 39: Time Series Data with Apache Cassandra (ApacheCon EU 2014)

Fin