Using Onyx in anger @sbelak [email protected]

Page 1: Using Onyx in anger

Using Onyx in anger

@sbelak [email protected]

Page 2: Using Onyx in anger

Onyx: a masterless, cloud-scale, fault-tolerant, high-performance distributed computation system

… written entirely in Clojure

Page 3: Using Onyx in anger

Onyx at

• In production for almost a year

• ETL

• online machine learning

• offline (batch) machine learning

• ad-hoc analysis

Page 4: Using Onyx in anger

Self-service infrastructure for data scientists

Page 5: Using Onyx in anger

1. Onyx at a glance

2. How Onyx rewired my brain

3. Putting "data is code" to work

Page 6: Using Onyx in anger

1. Onyx at a glance

2. How Onyx rewired my brain

3. Putting "data is code" to work

Describing computation with data

Page 7: Using Onyx in anger

Onyx at a glance

Page 8: Using Onyx in anger

Job = workflow + flow conditions + catalogue

;; workflow
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

;; flow conditions
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

;; catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}

 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}

 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
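The flow condition above names its predicate by keyword, so the function itself lives in ordinary code. A minimal sketch of what `:my.ns/adult?` could look like follows. Onyx calls flow predicates with the event map, the old segment, the new segment, and all new segments; the `:age` key, the threshold of 18, the single-edge workflow, and the empty catalogue are assumptions made for illustration, not part of the talk.

```clojure
(ns my.ns)

(defn adult?
  "Flow-condition predicate. Only the new segment is inspected here.
  The :age key and the threshold of 18 are assumptions."
  [_event _old-segment segment _all-new-segments]
  (>= (:age segment 0) 18))

(def flow-conditions
  [{:flow/from :input-stream
    :flow/to [:process-adults]
    :flow/predicate :my.ns/adult?
    :flow/doc "Emits segment if an adult."}])

;; A job is one map that bundles the pieces. It holds no code, only names
;; (such as :my.ns/adult?) that are resolved to vars when the job runs.
(def job
  {:workflow [[:input-stream :process-adults]] ; assumed edge for this sketch
   :flow-conditions flow-conditions
   :catalog []                                 ; catalogue entries omitted here
   :task-scheduler :onyx.task-scheduler/balanced})
```

Because the job is plain data, it can be printed, diffed, stored, or generated by another program before it is ever submitted.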

Page 9: Using Onyx in anger

Catalogue

[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5                  ; parameter
  :onyx/params [:my/n]}

 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}   ; self-documenting

 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]

Vanilla Clojure function

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

Plugins (I/O): seq, async, Kafka, Datomic, SQL, S3, SQS, …
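To make the link between the `:add-5` catalogue entry and `adder` concrete, here is a small sketch of how the values named in `:onyx/params` end up as leading arguments of the function. `task-fn` is a hypothetical helper written for this illustration; it mimics the idea and is not how Onyx is implemented internally.

```clojure
(ns my)

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def add-5
  {:onyx/name :add-5
   :onyx/fn :my/adder
   :onyx/type :function
   :my/n 5
   :onyx/params [:my/n]})

(defn task-fn
  "Hypothetical helper, not part of Onyx: resolves :onyx/fn to a var and
  partially applies the values that :onyx/params points at."
  [entry]
  (let [fn-kw  (:onyx/fn entry)
        f      (resolve (symbol (namespace fn-kw) (name fn-kw)))
        params (map entry (:onyx/params entry))]
    (apply partial f params)))

;; The resulting function takes a segment and returns a segment.
((task-fn add-5) {:x 1}) ; => {:x 6}
```

Changing `:my/n` in the catalogue entry changes the behaviour of the task without touching `adder`.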

Page 10: Using Onyx in anger

Computation entirely described with data

data is code!

Page 11: Using Onyx in anger

Everything can be run locally!

Page 12: Using Onyx in anger

Testing without mocking
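Since tasks are vanilla functions from segment to segment, the simplest form of testing without mocks is to call them on plain maps. The sketch below uses `clojure.test` and the `adder` function from the earlier slide; the namespace name and test names are made up for this example. It exercises only the function, not a full Onyx job (a full local run would use the core.async plugins from the catalogue).

```clojure
(ns my.adder-test
  (:require [clojure.test :refer [deftest is run-tests]]))

(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(deftest adder-adds-n-to-x
  (is (= {:x 6} (adder 5 {:x 1})))
  ;; other keys in the segment pass through untouched
  (is (= {:x 6 :id 42} (adder 5 {:x 1 :id 42}))))

(deftest adder-over-in-memory-segments
  ;; an ordinary vector stands in for the input; no broker, no mocks
  (is (= [{:x 6} {:x 7}]
         (mapv (partial adder 5) [{:x 1} {:x 2}]))))

;; At the REPL: (run-tests 'my.adder-test)
```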

Page 13: Using Onyx in anger

How Onyx rewired my brain

Page 14: Using Onyx in anger

It’s not about scaling, but clean architecture

Page 15: Using Onyx in anger

My go-to architecture

[Diagram labels: DB, Events, Kafka, Onyx (×3)]

Persist all events to S3

• time travel

• query with AWS Athena

Page 16: Using Onyx in anger

Decomplect everything

Page 17: Using Onyx in anger

Computation graphs
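An Onyx workflow is already a computation graph written as an edge list, so it can be inspected with a few lines of ordinary Clojure. The sketch below reuses the workflow from the job slide; `successors` and `downstream` are helper names invented for this example.

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn successors
  "Adjacency map: task -> set of tasks it feeds directly."
  [workflow]
  (reduce (fn [graph [from to]]
            (update graph from (fnil conj #{}) to))
          {}
          workflow))

(defn downstream
  "Set of all tasks reachable from `task`, found by depth-first traversal."
  [workflow task]
  (let [graph (successors workflow)]
    (loop [seen #{}
           frontier (vec (graph task))]
      (if-let [t (peek frontier)]
        (if (seen t)
          (recur seen (pop frontier))
          (recur (conj seen t) (into (pop frontier) (graph t))))
        seen))))

(downstream workflow :processing-1) ; => #{:output-1}
```

Questions such as "what is affected if this task changes?" become queries over data rather than a reading exercise through code.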

Page 18: Using Onyx in anger

Putting “data is code” to work

Page 19: Using Onyx in anger

Interlude: queryable data descriptions with spec

• s/registry, s/form

• Build a graph (Datomic)

Interact with your type system!

code is data!
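As a rough illustration of the bullets above, the sketch below walks `s/registry`, reads each spec back as data with `s/form`, and collects the references between specs as graph edges. The example specs and the helper names `spec-refs` and `spec-graph` are assumptions for this illustration, and the edges are kept as plain vectors rather than loaded into Datomic.

```clojure
(ns example.specs
  (:require [clojure.spec.alpha :as s]))

;; Example specs, made up for this sketch.
(s/def ::user-id pos-int?)
(s/def ::amount number?)
(s/def ::purchase (s/keys :req [::user-id ::amount]))

(defn spec-refs
  "Names of registered specs mentioned inside the form of spec `k`."
  [k]
  (let [registered (set (keys (s/registry)))]
    (->> (s/form k)
         (tree-seq coll? seq)        ; flatten the form into its nodes
         (filter qualified-keyword?)
         (filter registered)
         (remove #{k})
         set)))

(defn spec-graph
  "Edges [spec referenced-spec] for every spec registered under `ns-name`."
  [ns-name]
  (for [k (keys (s/registry))
        :when (and (keyword? k) (= ns-name (namespace k)))
        ref (spec-refs k)]
    [k ref]))

(spec-refs ::purchase) ; => #{::user-id ::amount}
```

Once the edges exist as data, they can be transacted into a database or queried directly at the REPL.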

Page 20: Using Onyx in anger

Case study: autogenerating materialised views

[Diagram labels: Events, External data, Kafka, Onyx (×3), Materialised views]

Automatic view generation

• Event & attribute ontology

  • Manual (via spec)

  • Inferred

• Statistical analysis (seasonality detection, outlier removal, …)

Page 21: Using Onyx in anger

Automatic view generation

1. Walk spec registry

2. Apply rules

   1. Define new view (spec)

   2. Trigger Onyx job that creates the view
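The steps above can be sketched in a few functions. Everything here is a simplified assumption rather than the system described in the talk: the rule (`numeric-rule`), the shape of a view description, the task names, and the use of plain maps instead of specs for new views are all invented for illustration. The last step only builds the job description; actually submitting it to Onyx is left out.

```clojure
(ns example.views
  (:require [clojure.spec.alpha :as s]))

;; Example event attributes, made up for this sketch.
(s/def ::amount number?)
(s/def ::country string?)

(defn numeric-rule
  "Hypothetical rule: a spec whose form is `number?` gets a running-sum view."
  [k]
  (when (= 'clojure.core/number? (s/form k))
    {:view/name (keyword (str (name k) "-sum"))
     :view/source k
     :view/aggregate :sum}))

(defn candidate-views
  "Steps 1 and 2: walk the registry and apply every rule to every spec
  registered under `ns-name`, keeping the views that rules propose."
  [rules ns-name]
  (for [k (keys (s/registry))
        :when (and (keyword? k) (= ns-name (namespace k)))
        rule rules
        :let [view (rule k)]
        :when view]
    view))

(defn view->job
  "Job description for one view, as plain data. Task names are placeholders
  and the catalogue is left empty."
  [view]
  (let [task (:view/name view)]
    {:workflow [[:events task] [task :materialised-view]]
     :catalog []
     :task-scheduler :onyx.task-scheduler/balanced}))

(map view->job (candidate-views [numeric-rule] "example.views"))
```

Both the input (the spec registry) and the output (job maps) are data, which is what makes this kind of automation a short program rather than a code generator.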

Page 22: Using Onyx in anger

Code is data or data is code?

Page 23: Using Onyx in anger

Takeaways

Page 24: Using Onyx in anger

Onyx is production ready

Page 25: Using Onyx in anger

Everything should be live and interactive

Page 26: Using Onyx in anger

Computation graphs are a great way to structure data processing code

Page 27: Using Onyx in anger

Queryable data and computation descriptions supercharge interactive development and are a great building block for automation