Chronix Time Series Database - The New Time Series Kid on the Block

The new time series kid on the block

Florian Lautenschlager@flolaut

68.000.000.000* time correlated data objects.

3

* collect every 10 seconds 72 metrics x 15 processes x 20 hosts over 1 years

How to store such amount of data on your laptop computer and retrieve any point within a few milliseconds?

Well we tried that approach

4

Store data objects in a classical RDBMS

ButSlow import of data objectsHuge amount of hard drive spaceSlow retrieval of time seriesLimited scalability due to RDBMSMissing query functions for time series data

!68.000.000.000!Measurement Series

NameStartEnd

Time Series

StartEnd

Data Object

TimestampValue

Metric

Attributes

HostProcess

* *

*

*

Name

5

Hence it felt like

Image Credit: http://www.sail-world.com/

But what to do? Chunks + Compression + Document storage!

6

The key ideas to enable the efficient storage of billion data objects:Split time series into chunks of the same size with data objectsCompress these chunks to reduce the data volumeStore the compressed chunk and the attributes in one record

Reason for success:32 GB disk usage to store 68 billion data objectsFast retrieval of data objects within a few millisecondsFast navigation on attributes (finding the chunk)Everything runs on a laptop computer and many more!

Time Series RecordStartEndChunk[]SizeMetadata,

1 Million

!68.000!

Thats all. No secrets, nothing special and nothing more to say.

Time Series Database - Whats that? Definitions and typical features.

Why did we choose Apache Solr and are there alternatives?

Chronix Architecture that is based on Solr and Lucene.

Whats needed to speed up Chronix to a firehorse.

What comes next?

Time Series Database: Whats that?

8

Definition 1: A data object d is a tuple of {timestamp, value}, where the value could be any kind of object.

Definition 2: A time series T is an arbitrary list of chronological ordered data objects of one value type.

Definition 3: A chunk C is a chronological ordered part of a time series.

Definition 4: A time series database TSDB is a specialized database for storing and retrieving time series in an efficient and optimized way.

d{t,v}

1T

{d1,d2}

TCTT1

C1,1C1,2

TSDBT3C2,2

T1 C2,1

A few typical features of a time series database

9

Data managementRound Robin StoragesDown-sample old time series CompressionDelta-Encoding

Describing Attributes Arbitrary amount of attributes For time series (Country, Host, Customer, ) For data object (Scale, Unit, Type)

Performance and OperationalRare updates, Inserts are additive Fast inserts and retrievalsDistributed and efficient per nodeNo need of ACID, but consistency

Time series language and API Statistics: Aggregation (min, max, median), Transformations: Time windows, time shifting,

resampling, ..

High level analyses: Outlier, TrendsCheck out: A good post about the requirements of a time series: http://www.xaprb.com/blog/2014/06/08/time-series-database-requirements/

http://www.xaprb.com/blog/2014/06/08/time-series-database-requirements

10

Some time series databases out there.RRDTool - http://oss.oetiker.ch/rrdtool/Mainly used in traditional monitoring systems

Graphite https://github.com/graphite-projectUses the concepts of RRDTool and puts some sugar on it

InfluxDB - https://influxdata.com/time-series-platform/influxdb/A distributed time series database with a very handy query language

OpenTSDB - http://opentsdb.net/Is a scalable time series database and runs on Hadoop and Hbase

Prometheus- https://prometheus.io/ A monitoring system and time series database

KairosDB - https://kairosdb.github.io/Like OpenTSDB but is based on Apache Cassandra

many more! And of course Chronix! - http://chronix.io/

http://oss.oetiker.ch/rrdtool/https://github.com/graphite-projecthttps://influxdata.com/time-series-platform/influxdb/http://opentsdb.net/https://prometheus.io/https://kairosdb.github.io/http://chronix.io/

Ey, there are so many time series databases out there? Why did you create a new solution?

11

Our Requirements

A fast write and query performance Run the database on a laptop computerMinimal data volume for stored data objects Storing arbitrary attributes A query API for searching on all attributes Large community and an active development

That delivers Apache Solr

Based on Lucene which is really fast Runs embedded, standalone, distributed Lucene has a built-in compression Schema or schemaless Solr Query Language Lucidworks and an Apache project

Our tool has been around for a good few years, and in the beginning there was no time series database that complies our requirements. And there isnt one today!Elastic Search is

an alternative. It is also based on

Lucene.

12

Lets dig deeper into Chronix internals.

Image Credit: http://www.taringa.net/posts/ciencia-educacion/12656540/La-Filosofia-del-Dr-House-2.html

Chronix architecture enables both efficient storage of time series and millisecond range queries.

13

(1) Semantic Compression

(2) Attributes and Chunks

(3) Basic Compression

(4) Multi-Dimensional

Storage

Record

data:

attributes

Record

data:compressed

attributes

Record Storage

68 Billion Points 1 Mio. Chunks * 68.000 Points ~ 96% Compression

Optional

The key data type of Chronix is called a record. It stores a compressed time series chunk and its attributes.

14

record{

data:compressed{}

//technical fields id: 3dce1de0...93fb2e806d19 version: 1501692859622883300 start: 1427457011238 end: 1427471159292

//optional attributes host: prodI5 process: scheduler group: jmxmetric: heapMemory.Usage.Usedmax: 896.571

}

Data:compressed{}

Time Series: timestamp, numeric value Traces: calls, exceptions, Logs: access, method runtimes Complex data: models, test coverage,

anything else

Optional attributes

Arbitrary attributes for the time series Attributes are indexed Make the chunk searchable Can contain pre-calculated values

Chronix provides specialized aggregations, transformations, and analyses for time series that are commonly used.

15

Aggregations (ag)

Min / Max / Average / Sum / Count Percentile Standard Deviation First / Last Range

Analyses (analysis)

Trend AnalysisUsing a linear regression model

Outlier AnalysisUsing the IQR

Frequency AnalysisCheck occurrence within a time range

Fast Dynamic Time WarpingTime series similarity search

Symbolic Aggregate Approximation Similarity and pattern search

Transformations (tr)

Bottom/Top n-valuesMoving average Divide / Scale Vectorisation

Only scalar values? One size fits all? No! What about logs, traces, and others? No problem Just do it yourself!

16

Chronix Kassiopeia (Format)Time Series framework that is used by Chronix.Time Series Types:Numeric: Doubles (the time series known to be the default)Thread Dumps: Stack traces (e.g. java stack traces)Strace: Strace dumps (system call, duration, arguments

public interface TimeSeriesConverter {

/*** Shall create an object of type T from the given binary time series.*/

T from(BinaryTimeSeries binaryTimeSeriesChunk, long queryStart, long queryEnd);

/*** Shall do the conversation of the custom time series T into the binary time series that is

stored.*/

BinaryTimeSeries to(T timeSeriesChunk);}

Plain

Thats the easiest way to play with Chronix. A single instance of Chronix on a single node with a Apache Solr instance.

17

Java 8 (JRE)

Chronix - 0.2

Solr - 6.0.0

Lucene

Solr plugins

8983

Your Computer

Chronix-Query-Handler

Chronix-Response-Writer

Chronix-Retention

Chronix-Client

Json + Binary

Binary + Binary

Json + Json

Java 8 (JRE)

HTTP

Code-Slide: How to set up Chronix, ask for time series data, and call some server-side aggregations.

18

Create a connection to Solr and set up Chronix

Define and range query and stream its results

Call some aggregations

solr = new HttpSolrClient("http://localhost:8913/solr/chronix/")chronix = new ChronixClient(new KassiopeiaSimpleConverter(),

new ChronixSolrStorage(200, groupBy, reduce))

query = new SolrQuery("metric:*Load*")chronix.stream(solr,query)

query.addFilterQuery("ag=max,min,count,sdiff")stream = chronix.stream(solr,query) Signed Difference:

First=20, Last=-100 -80

Group chunks on a combination of attributes and reduce them to

a time series.

Get all time series whose metric contains Load

Thats the four week data that is shipped with the

release!

Tune Chronix to a firehorse. Even with defaults its blazing fast!

We have tuned Chronix in terms of chunk size, and compression technique to get the ideal default values for you.

21

Tuning DatasetThree real-world projects15 GB of time series data (typical monitoring data)About 500 million points in 15k time series92 typical queries with different time range and occurrence

We have measured:Compression rate for serval compression techniques (T) and chunk sizes (C).Query time for all 92 queries in the mix (range + aggregations)

What we want to know: Ideal values for T and C

We have evaluated several compression techniques and chunk sizes of the time series data to get the best parameter values.

22

T= GZIP + C = 128 kBytes

Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef AdersbergerChronix: Efficient Storage and Query of Operational Time SeriesInternational Conference on Software Maintenance and Evolution 2016 (submitted)

For more details about the tuning check our paper.

Compared to other time series databases Chronix results for our use case are outstanding. The approach works!

23

We have evaluated Chronix with: InfluxDB, Graphite, OpenTSDB, and KairosDBAll databases are configured as single node

Storage demand for 15 GB of raw csv time series dataChronix (237 MB) takes 4 84 times less space

Query times on imported data49 91 % faster than the evaluated time

series databases

Memory footprint: after start, max during import, max during query mixGraphite is best (926 MB), Chronix (1.5 GB) is

second. Others 16 to 39 GB

The hard facts. For more details I suggest you to read our research paper about Chronix.

24

Florian Lautenschlager, Michael Philippsen, Andreas Kumlehn, Josef Adersberger

Chronix: Efficient Storage and Query ofOperational Time Series

International Conference on Software Maintenance and Evolution 2016 (submitted)

Now its your turn.

Now its your turn.

The whole Chronix Stack. Not yet completely implemented.

Outlook: A powerful way to work with time series. A Chronix Cloud, a Spark Cluster, and an analysis workbench like Zeppelin.

27

Chronix Cloud

Chronix Node Chronix Node Chronix Node Chronix Node

Spark Cluster

Spark Node Spark Node Spark Node Spark Node

Zeppelin

Chronix Spark Context

Java Scala

Various Applications as Workbench

Spark SQL Context

Chronix and Spark.

Time Series Processing with Apache Spark Josef

Adersberger, Wed, 3:00 pm

(mail) florian.lautenschlager@qaware.de(twitter) @flolaut(twitter) @ChronixDB(web) www.chronix.io

#lovetimeseries

Bart Simpson

Other interesting related talks:

Real-world Analytics with SolrCloud and Spark Johannes

Weigend, Wed, 3:00 pm

Time Series Processing with Apache Spark Josef

Adersberger, Wed, 3:00 pm

Code-Slide: Use Spark to process time series data that comes out right now from Chronix.

29

Create a ChronixSparkContext

Define and range query and stream its results

Play with the data

conf = new SparkConf().setMaster(SPARK_MASTER).setAppName(CHRONIX)jsc = new JavaSparkContext(conf)csc = new ChronixSparkContext(jsc)sqlc = new SQLContext(jsc)

query = new SolrQuery("metric:*Load*")rdd = csc.queryChronixChunks(query,ZK_HOST,CHRONIX_COLLECTION,

new ChronixSolrCloudStorage());

DataSet ds = rdd.toObservationsDataset(sqlContext)rdd.mean()rdd.max()rdd.iterator()

Dataset to use Spark SQL features

Set up Spark, a JavaSparkContext, a ChronixSparkContext, and a SQLContext

Get all time series whose metric contains Load

The new time series kid on the blockFoliennummer 268.000.000.000* time correlated data objects.Well we tried that approachHence it felt like But what to do? Chunks + Compression + Document storage!Thats all. No secrets, nothing special and nothing more to say.Time Series Database: Whats that?A few typical features of a time series databaseSome time series databases out there.Ey, there are so many time series databases out there? Why did you create a new solution?Lets dig deeper into Chronix internals.Chronix architecture enables both efficient storage of time series and millisecond range queries.The key data type of Chronix is called a record. It stores a compressed time series chunk and its attributes.Chronix provides specialized aggregations, transformations, and analyses for time series that are commonly used.Only scalar values? One size fits all? No! What about logs, traces, and others? No problem Just do it yourself!Thats the easiest way to play with Chronix. A single instance of Chronix on a single node with a Apache Solr instance.Code-Slide: How to set up Chronix, ask for time series data, and call some server-side aggregations.Foliennummer 19Tune Chronix to a firehorse. Even with defaults its blazing fast!We have tuned Chronix in terms of chunk size, and compression technique to get the ideal default values for you.We have evaluated several compression techniques and chunk sizes of the time series data to get the best parameter values.Compared to other time series databases Chronix results for our use case are outstanding. The approach works!The hard facts. For more details I suggest you to read our research paper about Chronix.Now its your turn.The whole Chronix Stack. Not yet completely implemented.Outlook: A powerful way to work with time series. A Chronix Cloud, a Spark Cluster, and an analysis workbench like Zeppelin.#lovetimeseriesCode-Slide: Use Spark to process time series data that comes out right now from Chronix.

Chronix Time Series Database - The New Time Series Kid on the Block

Data & Analytics

Chronix Poster for the Poster Session FAST 2017

Univariate time-series analysis - Peter Foldvari time series analysis.pdf · Univariate time-series analysis 1. Basic concepts A time series contains observations of the random variable

Introduction to Time Series Analysis1 - s ugauss.stat.su.se/gu/e/slides/Time Series/Introduction to Time... · Introduction to Time Series Analysis Time series methods take into account

Forecasting daily time series using periodic unobserved components time series … · 2015-03-02 · Forecasting daily time series using periodic unobserved components time series

Chronix as Long-Term Storage for Prometheus

Nonlinear Time Series Modelingrdavis/lectures/Cop_P1.pdfNonlinear Time Series Modeling ... Time Series Analysis by ... What are the limitations of linear time series models? ≡ What

Chronix as long term storage for Prometheus - Schedschd.ws/...Nativ-Conf-Chronix-as-long-term-storage-for-Prometheus.pdf · Chronix as long term storage for Prometheus Florian Lautenschlager,

Time Series and Forecasting Lecture 4 NonLinear Time Series

Univariate time-series analysis - Peter Foldvaripeterfoldvari.com/univariate time series analysis.pdf · Univariate time-series analysis 1. Basic concepts A time series contains observations

CORE Time series SAMPLE - Cambridge University Press · CHAPTER7 CORE Time series ... How do we use a time series for prediction? 7.1 Time series data Time series data is a special

Time Series in Continuous Time - University of Pittsburghmilos/courses/cs3750/...11/2/2018 2 Outline •Continuous-time time series •Event time series Discrete-time time series •Time

93 SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES

Time series analysis. Example Objectives of time series analysis

Ch1. Introduction - people.missouristate.edupeople.missouristate.edu/songfengzheng/Teaching/MTH548/Time Series... · Time Series Analysis Ch1. Introduction. 1.3. Time Series Plot

Recurrent Network InputsOutputs. Motivation Associative Memory Concept Time Series Processing – Forecasting of Time series – Classification Time series

Time Series

Time Series Analysis and Mining with R - Meetupfiles.meetup.com/1677477/Time-Series-Mining-slides.pdf · R and Time Series Data Time Series Decomposi-tion Time Series Forecasting

Time series

In re Chronix Biomedical, Inc

Time Series Analysis in a nutshell - University of Bathak257/talks/Behme.pdf · Time Series Analysis in a nutshell Reduction of time series to stationary time series The time series