How to Make Norikra Perfect

Stream Processing Casual Talks #1 (#streamctjp), Jul 22, 2016
Satoshi Tagomori (@tagomoris)

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

1. How Norikra is perfect

2. How to make Norikra more perfect

http://norikra.github.io/

Norikra: Schema-less Stream Processing using SQL

• Server software, written in JRuby, runs on JVM

• Open source software (GPLv2)

• http://norikra.github.io/

• https://github.com/norikra/norikra

SELECT user.age, COUNT(*) AS cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego" AND attend.$0 AND attend.$1
GROUP BY user.age

{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}

{"user.age":35,"cnt":5}, {"user.age":36,"cnt":8}, ...
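
To make the query above concrete, here is a minimal Python sketch of what it computes over a batch of events. It is not how Norikra or Esper evaluates queries: the helper names `flatten` and `count_by_age` are hypothetical, the 5-minute `time_batch` window is ignored (the list passed in stands for one window), and the truncated `attend` array from the slide is shortened to three items.

```python
from collections import Counter


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into dotted paths; list items become $index."""
    flat = {}
    if isinstance(value, dict):
        for key, item in value.items():
            flat.update(flatten(item, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flat.update(flatten(item, f"{prefix}.${index}" if prefix else f"${index}"))
    else:
        flat[prefix] = value
    return flat


# Fields the query refers to (hypothetical constant, derived from the SQL above).
REQUIRED = ("user.age", "current", "attend.$0", "attend.$1")


def count_by_age(events):
    """Apply the WHERE and GROUP BY of the example query to one window of events."""
    counts = Counter()
    for event in events:
        fields = flatten(event)
        if not all(name in fields for name in REQUIRED):
            continue  # the event lacks a field the query needs, so the query never sees it
        if fields["current"] == "San Diego" and fields["attend.$0"] and fields["attend.$1"]:
            counts[fields["user.age"]] += 1
    return [{"user.age": age, "cnt": cnt} for age, cnt in sorted(counts.items())]


sample_events = [
    {"name": "tagomoris",
     "user": {"age": 35, "corp": "LINE", "address": "Tokyo"},
     "current": "San Diego", "speaker": True,
     "attend": [True, True, False]},
]
print(count_by_age(sample_events))  # [{'user.age': 35, 'cnt': 1}]
```

The point of the sketch is the field naming: nested keys are reached as `user.age` and array items as `attend.$0`, which is how the query addresses them.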

How Norikra is Perfect
• Ultra fast bootstrap
• Schema on read
• Handling complex (nested) events
• Dynamic query registration/unregistration
• Simple Web UI
• Data connector: Fluentd
• Extensible: UDF/Listener plugins
• Performance: good enough for small/mid-size sites

Schema on Read
• Query first, data next
• A query must know what it requires
  • field names, types of fields, ...
• The platform can ingest any data into the processor
• A query can fetch the events that match its required schema

[Diagram: events from the billing service and events from the API endpoint form one schema-less (mixed) data stream; query A and query B each receive their own subset of fields from it.]
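
The schema-on-read idea above can be sketched as follows. This is only an illustration under stated assumptions: the function `route`, the query names, and the field names (`user_id`, `amount`, `path`, `status`) are all hypothetical, and Norikra's actual type definition handling is more involved than a flat field-to-type map.

```python
def route(event, queries):
    """Return {query_name: field_subset} for each query whose required
    fields are all present in the event with the expected types."""
    delivered = {}
    for name, schema in queries.items():
        if all(isinstance(event.get(field), ftype) for field, ftype in schema.items()):
            # The query only sees the fields it declared, nothing else.
            delivered[name] = {field: event[field] for field in schema}
    return delivered


# Each query declares what it requires before any data arrives ("query first").
queries = {
    "query_a": {"user_id": int, "amount": int},
    "query_b": {"path": str, "status": int},
}

billing_event = {"user_id": 1, "amount": 500, "currency": "JPY"}
api_event = {"path": "/v1/items", "status": 200, "user_id": 7}

print(route(billing_event, queries))  # {'query_a': {'user_id': 1, 'amount': 500}}
print(route(api_event, queries))      # {'query_b': {'path': '/v1/items', 'status': 200}}
```

Both event types travel through the same stream; neither needs a schema registered up front, because the match is decided per query when the event is read.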

Architecture

Norikra Server (on JVM)
• Norikra Engine
  • Esper Instance (Query Engine)
  • Type Definition Manager
  • Output Event Pool
• RPC Server: mizuno (Jetty + Rack)
  • Rack RPC Handler

Norikra Client
• msgpack-rpc-over-http

For details :)
• Norikra: Stream Processing with SQL
  http://www.slideshare.net/tagomoris/norikra-stream-processing-with-sql
• Norikra: SQL Stream Processing in Ruby
  http://www.slideshare.net/tagomoris/norikra-sql-stream-processing-in-ruby
• Norikra in Action
  http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
• Landscape of Norikra Features
  http://www.slideshare.net/tagomoris/norikra-meetup-features
• Norikra Recent Updates
  http://www.slideshare.net/tagomoris/norikra-recent-updates

Recent Updates

• v1.4.0: Jul 19, 2016
  • Add support for the JVM options "-D" and "-agentlib"
  • Update msgpack version
• Previous release v1.3.1: May 7, 2015
  • Explained in the "Norikra Recent Updates" slides

IS IT REALLY PERFECT!?

Good & Bad
• Good for startup: fast bootstrap, SQL, Web UI, Fluentd plugins, handling complex events, ...
• Good for middle: dynamic query registration, dynamic UDF loading, performance good enough for middle scale (10k events/sec), schema on read, ...
• Bad for big players: no distribution, no high availability, uncontrollable JVM/Esper behavior (CPU & memory)

Tentative name:

Perfect Norikra

Perfect Norikra
• All features of Norikra
  • Including "Ultra fast bootstrap"
• Compatible RPC API w/ original Norikra
• Distributed execution on any scheduler
  • YARN? Mesos? or ...?
• Automatic failover & retry for failures (HA)
• Automated optimization for load balancing
• Dynamic scaling out
  • from 1 to 100 nodes - without any restarts/retries

Rough Sketch

[Diagram: a master node and processor nodes, built from these components: RPC Server, RPC Handler, Type Definition Manager, Query Compiler, DAG Optimizer / Deoptimizer, DAG Executor, Event Router, Event Buffer. Arrows carry queries and events.]

Rough Sketch
• Brand new query executor
  • SQL parser
  • Query compiler into DAG
  • SQL operators as sub-DAGs (inspired by TimeStream)
  • DAG executor
• Brand new dataflow manager / nodes
  • Sync/async data replication
  • Barriers for event stream (inspired by Flink)
  • Versioned routing/distribution

Dynamic Scaling Out

• Processing nodes are stateful
  • state: limited by available memory size
  • growing stream size -> memory overflow :-(
• Scaling strategy must be dynamic
  • restarting queries (as static scaling does) increases latency

Query: COUNT(DISTINCT uid) per 1 day

[Figure: memory usage per node over 7/1 - 7/4, with 3 nodes for each day.]


Burst Traffic - failure

[Figure: memory usage per node over 7/1 - 7/4, with 3 nodes for each day; memory overflow - CRASH!]


Crash & Recovery Strategy (1)

[Figure: timeline 7/1 - 7/4; node counts 3, 3, 6, 6, with a crash followed by recovery.]

• After a crash, restart the query w/ an increased # of nodes
• After the restart, the query re-reads all data of that window
• After recovery, all nodes go back to realtime calculation


Crash & Recovery Strategy (2)

[Figure: timeline 7/1 - 7/4; node counts 3, 3, 6, 6, with a crash followed by recovery.]

• Pros: Very easy to implement
• Cons: Requires all data to be stored (distributed filesystem?)
• Cons: Hard to know the # of nodes needed for increasing traffic
• Cons: Recovery state requires more nodes than normal state

Dynamic Scaling Out strategy (1)

Query: COUNT(DISTINCT uid) per 1 day

[Figure: timeline 7/1 - 7/4; the node count changes over time (3, 5, 6 nodes); each set of nodes produces an intermediate result, and the intermediate results are merged for the final result.]

• Before a crash happens, increase the # of processing nodes
• Queries always produce intermediate results w/ # of distribution
• Query results should be produced by merging intermediate results
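
The merge step for COUNT(DISTINCT uid) can be sketched as below. Nothing here comes from an actual implementation (the closing slide says no bytes are implemented): the function names are hypothetical, exact Python sets stand in for whatever intermediate state a real operator would keep, and routing by CRC32 is just one simple assumption.

```python
import zlib


def node_for(uid, node_count):
    """Pick a processing node for a uid (simple hash routing, an assumption)."""
    return zlib.crc32(uid.encode("utf-8")) % node_count


def process_segment(uids, node_count):
    """Process one part of the window on node_count nodes.
    Each node keeps its own intermediate result: the set of uids it saw."""
    partials = [set() for _ in range(node_count)]
    for uid in uids:
        partials[node_for(uid, node_count)].add(uid)
    return partials


def count_distinct(segments):
    """Merge the intermediate results of every segment into the final result."""
    merged = set()
    for partials in segments:
        for partial in partials:
            merged |= partial
    return len(merged)


# The window starts on 3 nodes and is scaled out to 5 nodes before it ends.
morning = ["u1", "u2", "u3", "u1"]
afternoon = ["u2", "u4", "u5"]
segments = [process_segment(morning, 3), process_segment(afternoon, 5)]
print(count_distinct(segments))  # 5
```

Because set union does not care how many nodes produced the partial sets, the node count can change in the middle of a window without restarting the query or re-reading data. Exact sets still grow with the number of distinct uids, so a real operator would need a bounded, mergeable representation.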

Dynamic Scaling Out strategy (2)

Query: COUNT(DISTINCT uid) per 1 day

[Figure: timeline 7/1 - 7/4; the node count changes over time (3, 5, 6 nodes); each set of nodes produces an intermediate result, and the intermediate results are merged for the final result.]

• Pros: Less latency, less computing power
• Cons: All operators must support such calculation - SQL!

For Dynamic Scaling Out

• De-optimization of operators

• Virtual nodes for routing

• ... and many others
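
The slides give no detail on virtual nodes, so the following is only one common way the idea is used for routing, offered as an assumption rather than the planned design. Keys are hashed to a fixed number of virtual nodes, and a separate table maps virtual nodes to physical nodes; scaling out replaces the table, not the hash. `VIRTUAL_NODES`, `assign`, and `route` are hypothetical names.

```python
import zlib

VIRTUAL_NODES = 64  # assumed fixed count, larger than any expected cluster size


def virtual_node(key):
    """Map a routing key to a virtual node; this mapping never changes."""
    return zlib.crc32(key.encode("utf-8")) % VIRTUAL_NODES


def assign(physical_nodes):
    """Build a routing table: virtual node id -> physical node name (round-robin)."""
    return {v: physical_nodes[v % len(physical_nodes)] for v in range(VIRTUAL_NODES)}


def route(key, table):
    """Find the physical node currently responsible for a key."""
    return table[virtual_node(key)]


table_v1 = assign(["n0", "n1", "n2"])
table_v2 = assign(["n0", "n1", "n2", "n3", "n4", "n5"])  # after scaling out

# Virtual nodes whose owner changed between the two table versions.
moved = [v for v in range(VIRTUAL_NODES) if table_v1[v] != table_v2[v]]
print(all(table_v2[v] in {"n3", "n4", "n5"} for v in moved))  # True
```

With this layout, state can be moved one virtual node at a time, and each routing table can carry a version number so that events are routed consistently while a change is in progress.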

Hard things

• Resource monitoring & limitation

• Multi-tenancy

• UDF and sandbox

• Queries without aggregations

Why not on Spark or Flink?

• Because of schema-less event processing - it requires dataflow controlled by query manager

• Because of dynamic scaling - it requires brand new dataflow layer

No Bytes Implemented :P Stay Tuned!

We are hiring! by Treasure Data
