
Time Series With OrientDB - Fosdem 2015

Page 1: Time Series With OrientDB - Fosdem 2015

Time flows, on Graph

Managing event sequences and time series with a Document-Graph Database

FOSDEM 2015

Enrico Risa

Orient Technologies LTD

Twitter: @wolf4ood

Emanuele Tagliaferri

Orient Technologies LTD

Twitter: @tglman

Page 2: Time Series With OrientDB - Fosdem 2015

Time What…?

Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Page 3: Time Series With OrientDB - Fosdem 2015

Time What…?

Event sequences:

• A set of events with a timestamp

• A set of relationships “happened before/after”

• Cause and effect relationships

Page 4: Time Series With OrientDB - Fosdem 2015

Graph approaches

• Nodes/Edges

• Index-free adjacency

• Fast traversal

• Dynamic structure

Page 5: Time Series With OrientDB - Fosdem 2015

Graph approaches

Linked sequence

[Diagram: events e1 → e2 → e3 → e4 → e5, each linked to the following one by a "next" edge; the timestamp is stored on the vertex]
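The linked sequence can be illustrated with a minimal in-memory sketch. It is not OrientDB code: the `Event` class, the `next_event` field and the helper names are assumptions made up for this example, standing in for vertices and "next" edges.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Event:
    name: str
    timestamp: int                          # timestamp stored on the vertex
    next_event: Optional["Event"] = None    # stands in for the "next" edge


def link(events: List[Event]) -> Optional[Event]:
    """Order events by timestamp and chain them; return the first one."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    for current, following in zip(ordered, ordered[1:]):
        current.next_event = following
    return ordered[0] if ordered else None


def walk(start: Optional[Event]) -> Iterator[Event]:
    """Follow "next" edges from start until the chain ends."""
    node = start
    while node is not None:
        yield node
        node = node.next_event


head = link([Event("e3", 30), Event("e1", 10), Event("e2", 20)])
names = [event.name for event in walk(head)]
```

Reading events in time order is then a plain walk along the chain, with no index lookup.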

Page 6: Time Series With OrientDB - Fosdem 2015

Graph approaches

linked sequence (tag based)

[Diagram: events e1 to e5 chained by per-tag edges "nextTag1" and "nextTag2"; each vertex carries its own tag list, such as [Tag1, Tag2], [Tag1] or [Tag2]]
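A tag-based sequence can be sketched the same way, with one chain per tag. The tag assigned to each event below is invented for the example (the slide does not say which event carries which tag), and `link_by_tag` and `follow` are hypothetical helpers, not OrientDB functions.

```python
from collections import defaultdict


def link_by_tag(events):
    """events: (name, timestamp, tags) tuples. Returns {tag: {name: next_name}}."""
    last_seen = {}                 # tag -> most recent event carrying it
    edges = defaultdict(dict)      # tag -> per-tag "next" edges
    for name, _timestamp, tags in sorted(events, key=lambda e: e[1]):
        for tag in tags:
            if tag in last_seen:
                edges[tag][last_seen[tag]] = name
            last_seen[tag] = name
    return edges


def follow(edges, tag, start):
    """Walk the chain of one tag, starting from the event named start."""
    path = [start]
    while path[-1] in edges[tag]:
        path.append(edges[tag][path[-1]])
    return path


events = [
    ("e1", 1, ["Tag1", "Tag2"]),
    ("e2", 2, ["Tag1"]),
    ("e3", 3, ["Tag1", "Tag2"]),
    ("e4", 4, ["Tag1"]),
    ("e5", 5, ["Tag2"]),
]
edges = link_by_tag(events)
tag1_path = follow(edges, "Tag1", "e1")
tag2_path = follow(edges, "Tag2", "e1")
```

Following "nextTag2" skips every event that does not carry Tag2, which is the point of keeping a separate edge per tag.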

Page 7: Time Series With OrientDB - Fosdem 2015

Graph approaches

Hierarchy

[Diagram: a tree of time-unit vertices (Days → Hours → Minutes → Seconds), with events e1, e2, e3, … e60 attached to it]
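As a rough illustration of the hierarchy, the time tree can be modelled with nested dictionaries keyed by day, hour, minute and second. This is a sketch under assumptions: the dictionary layout and the `insert` and `events_in_minute` helpers are invented here and are not how OrientDB stores vertices.

```python
from datetime import datetime


def insert(tree, ts, event):
    """Descend day -> hour -> minute -> second, creating nodes on the way."""
    node = tree
    for key in (ts.day, ts.hour, ts.minute, ts.second):
        node = node.setdefault(key, {})
    node.setdefault("events", []).append(event)


def events_in_minute(tree, day, hour, minute):
    """Collect the events below one minute node, ordered by second."""
    seconds = tree.get(day, {}).get(hour, {}).get(minute, {})
    return [e for s in sorted(seconds) for e in seconds[s]["events"]]


tree = {}
insert(tree, datetime(2015, 1, 30, 13, 5, 7), "e2")
insert(tree, datetime(2015, 1, 30, 13, 5, 1), "e1")
insert(tree, datetime(2015, 1, 30, 13, 6, 0), "e3")
found = events_in_minute(tree, 30, 13, 5)
```

Finding the events of one minute only touches the nodes on the path to that minute, not the whole data set.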

Page 8: Time Series With OrientDB - Fosdem 2015

Graph approaches

Mixed

[Diagram: the same Days → Hours → Minutes → Seconds hierarchy with events e1, e2, e3, … e60, mixed with the linked-sequence approach]

Page 9: Time Series With OrientDB - Fosdem 2015

Current approaches

Advantages

• Flexible

• Events can be connected together in different ways

• You can navigate events following a path by time or tag

Page 10: Time Series With OrientDB - Fosdem 2015

Current approaches

Disadvantages

• Slow queries for a high number of events

Page 11: Time Series With OrientDB - Fosdem 2015

Optimization

● Data Pre-Aggregation

Page 12: Time Series With OrientDB - Fosdem 2015

Optimization

Pre-aggregate

[Diagram: time tree of Days → Hours → Minutes → … vertices in the graph]

Page 13: Time Series With OrientDB - Fosdem 2015

Optimization

Pre-aggregate

[Diagram: the same time tree (Days → Hours → Minutes → …), with one sum() aggregate annotated on it]

Page 14: Time Series With OrientDB - Fosdem 2015

Optimization

Pre-aggregate

[Diagram: the same time tree (Days → Hours → Minutes → …), with two sum() aggregates annotated on it]

Page 15: Time Series With OrientDB - Fosdem 2015

Optimization

Aggregation logic

• Second 0 -> insert

• Second 1 -> insert

• …

• Second 57 -> insert

• Second 58 -> insert

• Second 59 -> insert + aggregate update

  – Write aggregate value on minute vertex

  – Minute == 59? Calculate aggregate on hour vertex
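The roll-up rule above can be sketched in a few lines. This is a simplified model, not the plugin's code: the `TimeTree` class and its three dictionaries are assumptions for illustration, and sum is used as the aggregate.

```python
class TimeTree:
    """In-memory stand-in for the second, minute and hour vertices."""

    def __init__(self):
        self.second_values = {}   # (hour, minute, second) -> value
        self.minute_sums = {}     # (hour, minute) -> aggregate, set when complete
        self.hour_sums = {}       # hour -> aggregate, set when complete

    def insert(self, hour, minute, second, value):
        key = (hour, minute, second)
        self.second_values[key] = self.second_values.get(key, 0) + value
        if second == 59:
            # Last second of the minute: write the aggregate on the minute.
            self.minute_sums[(hour, minute)] = sum(
                self.second_values.get((hour, minute, s), 0) for s in range(60)
            )
            if minute == 59:
                # Last minute of the hour: roll the minute aggregates up.
                self.hour_sums[hour] = sum(
                    self.minute_sums.get((hour, m), 0) for m in range(60)
                )


tree = TimeTree()
for minute in range(60):
    for second in range(60):
        tree.insert(0, minute, second, 1)
tree.insert(1, 0, 0, 5)   # first second of an hour that is still incomplete
```

The sketch assumes an event arrives at second 59 of every minute. If none does, no aggregate is written, so a real implementation would need another trigger, for example reacting when the time unit changes.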

Page 16: Time Series With OrientDB - Fosdem 2015

OrientDB

How to aggregate

Hooks: server-side triggers (Java or JavaScript), executed when DB operations happen (e.g. insert or update)

Java interface:

public RESULT onBeforeInsert(…);

public void onAfterInsert(…);

public RESULT onBeforeUpdate(…);

public void onAfterUpdate(…);
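The before/after hook pattern can be illustrated with a toy model. It is deliberately not the OrientDB hook API: `TinyDatabase`, `SumHook` and the rule that a before-hook returning False vetoes the insert are all assumptions of this sketch.

```python
class SumHook:
    """Hypothetical hook keeping a running total of one numeric field."""

    def __init__(self, field):
        self.field = field
        self.total = 0

    def on_before_insert(self, record):
        # Sketch-only rule: returning False rejects the record.
        return self.field in record

    def on_after_insert(self, record):
        # Runs only once the record has been stored.
        self.total += record[self.field]


class TinyDatabase:
    """Toy store that calls registered hooks around each insert."""

    def __init__(self):
        self.records = []
        self.hooks = []

    def insert(self, record):
        if not all(hook.on_before_insert(record) for hook in self.hooks):
            return False
        self.records.append(record)
        for hook in self.hooks:
            hook.on_after_insert(record)
        return True


db = TinyDatabase()
hook = SumHook("bets")
db.hooks.append(hook)
accepted = db.insert({"bets": 1, "cpu": 50})
rejected = db.insert({"cpu": 70})
```

Because the aggregate is updated inside the after-insert callback, the client only issues plain inserts and never computes aggregates itself.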

Page 17: Time Series With OrientDB - Fosdem 2015

Optimization

[Diagram: time tree (Days → Hours → Minutes) whose vertices are marked "complete" or "incomplete"; some vertices carry a pre-computed aggregate (sum = 1000, sum = 15000, sum = 300) and others have sum = null]

Page 18: Time Series With OrientDB - Fosdem 2015

Optimization

Query logic:

• Traverse from root node to specified level (filtering based on vertex data)

• Is there aggregate value?

– Yes: return it

– No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
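The query logic amounts to a short recursive function. The vertex layout used here (dictionaries with "name", "sum", "value" and "children" keys) is an assumption for the sketch; the `visited` list is only there to show which vertices get touched.

```python
def aggregate(vertex, visited):
    """Return the vertex's stored aggregate, or descend when it is missing."""
    visited.append(vertex["name"])
    if vertex.get("sum") is not None:
        return vertex["sum"]              # pre-aggregated: stop here
    if not vertex.get("children"):
        return vertex.get("value", 0)     # leaf: raw value
    return sum(aggregate(child, visited) for child in vertex["children"])


complete_minute = {
    "name": "m0",
    "sum": 300,                           # written when the minute completed
    "children": [
        {"name": "m0s0", "value": 100},
        {"name": "m0s1", "value": 200},
    ],
}
incomplete_minute = {
    "name": "m1",
    "sum": None,                          # still receiving events
    "children": [
        {"name": "m1s0", "value": 5},
        {"name": "m1s1", "value": 7},
    ],
}
hour = {"name": "h0", "sum": None, "children": [complete_minute, incomplete_minute]}

visited = []
total = aggregate(hour, visited)
```

The children of the complete minute are never read, which is where the speed-up over summing every raw event comes from.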

Page 19: Time Series With OrientDB - Fosdem 2015

OrientDB

How to calculate aggregate values with a query

Input params:

- Root node (suppose it is #11:11)

select sum(aggregateVal) from (
  traverse out() from #11:11
  while in().aggregateVal is null
)

With the same logic you can query based on time windows

Page 20: Time Series With OrientDB - Fosdem 2015

Time Series Proof of Concept

Page 21: Time Series With OrientDB - Fosdem 2015

POC Implementation

Core:

● As OrientDB Plugin

● Relies on Hooks

● Aggregation Engine

● Handles all time units

Data Visualization:

● Simple UI (Realtime/History)

● Query in Studio

Page 22: Time Series With OrientDB - Fosdem 2015

Core

● Plugin that registers the hook and some input/output sources (WebSocket, message queue, socket, etc.)

● Hook on Event class (entry point)

- Events can be saved or not
- Aggregations are made when the lower time unit changes
- Pre-allocation of TimeUnit pointers

● Time units tracked:

- Year
- Month
- Day
- Minute
- Second

Page 23: Time Series With OrientDB - Fosdem 2015

Core

Advantages

● Simple (Few lines of code)

● No Indexes

● Easy to use

– Plain OrientDB SQL to insert an event:

insert into event set bets = 1, cpu = 50

● Fast (Especially in plocal mode)

Page 24: Time Series With OrientDB - Fosdem 2015

Core

Disadvantages

● Too Simple (For now)

● Aggregator hardcoded (maybe a JavaScript aggregator?)

Page 25: Time Series With OrientDB - Fosdem 2015

Data Visualization

Two Charts:

● Realtime data through WebSocket

The engine pushes the received events every second

● Range query for history data

Using the powerful array range notation, we can query for a specific time range

Page 26: Time Series With OrientDB - Fosdem 2015

Let's Run It

Page 27: Time Series With OrientDB - Fosdem 2015

Data Query Time unit

● Array Notation

select expand(m[1].d[30].h[13].m[5-10])
from year where time = 2015

● Traverse with Next

traverse next from (
  select expand(m[1].d[26].h[19].m[37])
  from year where time = 2015
) while $depth <= 3

Page 28: Time Series With OrientDB - Fosdem 2015

Data Query Aggregation

● Array Notation

select sum(bets) from (
  select expand(m[1].d[30].h[13].m[5-10])
  from year where time = 2015
)

● Traverse with Next

select sum(bets) from (
  traverse next from (
    select expand(m[1].d[26].h[19].m[37])
    from year where time = 2015
  ) while $depth <= 3
)
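Both query styles can be mimicked in memory to show what they select. The sketch below is not OrientDB: it assumes that the array range is inclusive at both ends and that the traversal starts counting depth at 0 on the starting vertex; the vertex dictionaries and helper names are invented for the example.

```python
def minute_vertices(count):
    """Build minute vertices chained by a "next" link; bets equals the minute."""
    vertices = [{"minute": i, "bets": i, "next": None} for i in range(count)]
    for current, following in zip(vertices, vertices[1:]):
        current["next"] = following
    return vertices


def by_range(vertices, first, last):
    """Array-notation style: minutes first..last, assumed inclusive."""
    return vertices[first:last + 1]


def by_next(start, max_depth):
    """Traverse style: follow "next" while depth <= max_depth (start is depth 0)."""
    result, node, depth = [], start, 0
    while node is not None and depth <= max_depth:
        result.append(node)
        node, depth = node["next"], depth + 1
    return result


minutes = minute_vertices(60)
range_sum = sum(v["bets"] for v in by_range(minutes, 5, 10))    # minutes 5..10
next_sum = sum(v["bets"] for v in by_next(minutes[37], 3))      # minutes 37..40
```

The range form needs the whole window to sit under one hour vertex, while the "next" form can cross minute and hour boundaries because it only follows horizontal edges.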

Page 29: Time Series With OrientDB - Fosdem 2015

Multi-Model Optimization!

We got OrientDB

Page 30: Time Series With OrientDB - Fosdem 2015

• Document database (schema-free, complex properties)

• Graph database (index-free adjacency, fast traversal)

• SQL (extended)

• Operational (schema - ACID)

• OO concepts (Classes, inheritance, polymorphism)

• REST/JSON interface

• Native JavaScript (extend query language, expose services, event hooks)

• Distributed architecture (multi-master replica/sharding)

Page 31: Time Series With OrientDB - Fosdem 2015

● Studio 2.0

● Lucene & ETL in bundle

● WAL management (Fuzzy Checkpoint)

● Schema Driven Serialization

● Autosharding strategy on Distributed

Page 32: Time Series With OrientDB - Fosdem 2015

OrientDB

First step: put them together

[Diagram: time tree (Days → Hours → Minutes); each minute vertex holds its per-second values as a document]

{0: 1000, 1: 1500, … 59: 96}

Page 33: Time Series With OrientDB - Fosdem 2015

OrientDB

First step: put them together

[Diagram: the same time tree (Days → Hours → Minutes), labelled "Graph"; the content of the minute vertex is labelled "Document"]

{0: 1000, 1: 1500, … 59: 96} <- IT’S A VERTEX TOO!!!

Page 34: Time Series With OrientDB - Fosdem 2015

OrientDB

put them together

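[Diagram: time tree reduced to Days → Hours, labelled "Graph"; each hour vertex holds a nested document of minutes and seconds, labelled "Document"]

{
  0: {0: 1000, 1: 1500, … 59: 210},
  1: { … },
  …
  59: { … }
}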
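Once minutes and seconds live inside the hour vertex as a nested document, a time-window aggregate becomes a scan over that document. The sketch below assumes a plain dictionary of the shape {minute: {second: value}}; `sum_window` is a hypothetical helper, and the data is synthetic (every second holds 1).

```python
def sum_window(hour_doc, start, end):
    """Sum values whose (minute, second) lies in [start, end], both inclusive."""
    return sum(
        value
        for minute, seconds in hour_doc.items()
        for second, value in seconds.items()
        if start <= (minute, second) <= end
    )


# Synthetic hour document: 60 minutes x 60 seconds, each second holding 1.
hour_doc = {minute: {second: 1 for second in range(60)} for minute in range(60)}

first_minute = sum_window(hour_doc, (0, 0), (0, 59))
across_minutes = sum_window(hour_doc, (1, 30), (2, 29))
whole_hour = sum_window(hour_doc, (0, 0), (59, 59))
```

The trade-off is that one vertex read returns the whole hour, but the individual seconds are no longer vertices that edges can point to.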

Page 35: Time Series With OrientDB - Fosdem 2015

Where should I stop?

It depends on my domain and requirements.

Page 36: Time Series With OrientDB - Fosdem 2015

OrientDB

Third step: Complex domains

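[Diagram: time tree of Hours → Minutes, labelled "Graph"; each minute vertex holds a document of per-second entries, labelled "Document"]

{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {val: 96, eventTags: [tag1, tag2], …}
}

<- Enrich the domain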
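With enriched entries, the same embedded document can answer domain questions such as "how much was recorded for a given tag". A minimal sketch, assuming the entry shape shown on the slide with only the three entries that are spelled out there; `total_for_tag` is a hypothetical helper.

```python
def total_for_tag(minute_doc, tag):
    """Sum "val" over the per-second entries that carry the given tag."""
    return sum(
        entry["val"]
        for entry in minute_doc.values()
        if tag in entry.get("eventTags", [])
    )


minute_doc = {
    0: {"val": 1000},
    1: {"val": 1500},
    59: {"val": 96, "eventTags": ["tag1", "tag2"]},
}

tag1_total = total_for_tag(minute_doc, "tag1")
untagged_query = total_for_tag(minute_doc, "tag3")
```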

Page 37: Time Series With OrientDB - Fosdem 2015

One model is not enough

One of the most common issues my customers have is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB

of course ;-)

Page 38: Time Series With OrientDB - Fosdem 2015

Thank you!

Enrico Risa

Orient Technologies LTD

Twitter: @wolf4ood

Emanuele Tagliaferri

Orient Technologies LTD

Twitter: @tglman