simon-belak
Onyx
a masterless, cloud scale, fault tolerant, high performance distributed computation system
… written entirely in Clojure
Onyx at
• In production for almost a year
• ETL
• online machine learning
• offline (batch) machine learning
• ad-hoc analysis
Self-service infrastructure for data scientists
1. Onyx at a glance
2. How Onyx rewired my brain
3. Putting “data is code” to work
Describing computation with data
Onyx at a glance
Job = workflow + flow conditions + catalogue

Workflow
[[:input :processing-1]
 [:input :processing-2]
 [:processing-1 :output-1]
 [:processing-2 :output-2]]

Flow conditions
[{:flow/from :input-stream
  :flow/to [:process-adults]
  :flow/predicate :my.ns/adult?
  :flow/doc "Emits segment if an adult."}]

Catalogue
[{:onyx/name :add-5
  :onyx/fn :my/adder
  :onyx/type :function
  :my/n 5
  :onyx/params [:my/n]}
 {:onyx/name :in
  :onyx/plugin :onyx.plugin.core-async/input
  :onyx/type :input
  :onyx/medium :core.async
  :onyx/batch-size batch-size
  :onyx/max-peers 1
  :onyx/doc "Reads segments from a core.async channel"}
 {:onyx/name :out
  :onyx/plugin :onyx.plugin.core-async/output
  :onyx/type :output
  :onyx/medium :core.async
  :onyx/doc "Writes segments to a core.async channel"}]
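The flow condition above names its predicate by keyword (:my.ns/adult?), so the predicate itself is an ordinary function. A minimal sketch of what it might look like, assuming segments carry an :age key and that 18 is the threshold (both are illustrative assumptions, not from the slides); the four-argument shape follows the Onyx flow-condition convention of event, incoming segment, emitted segment, and all emitted segments:

```clojure
(ns my.ns)

;; Assumption: segments have an :age key; 18 is an illustrative threshold.
(defn adult?
  "Flow-condition predicate: true when the emitted segment describes an adult."
  [event old-segment new-segment all-new-segments]
  (>= (:age new-segment 0) 18))
```

Because it is a plain function, it can be called directly at the REPL, e.g. (adult? {} {} {:age 30} []).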
Vanilla Clojure function
(defn adder [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))
Plugins (I/O)
seq, async, Kafka, Datomic, SQL, S3, SQS, …
• parameter: :my/n is handed to the function via :onyx/params
• self-documenting: every entry can carry an :onyx/doc
Computation entirely described with data
data is code!
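Since the workflow is just a vector of edges, it can be inspected with ordinary Clojure. The sketch below reuses the workflow from the slides; successors and reachable are hypothetical helpers written for illustration, not part of Onyx:

```clojure
(def workflow
  [[:input :processing-1]
   [:input :processing-2]
   [:processing-1 :output-1]
   [:processing-2 :output-2]])

(defn successors
  "Tasks that `task` feeds directly."
  [workflow task]
  (set (for [[from to] workflow :when (= from task)] to)))

(defn reachable
  "All tasks downstream of `task`, found by walking the edge list."
  [workflow task]
  (loop [frontier (successors workflow task)
         seen     #{}]
    (if-let [t (first frontier)]
      (recur (into (disj frontier t)
                   (remove seen (successors workflow t)))
             (conj seen t))
      seen)))
```

For example, (reachable workflow :processing-1) yields #{:output-1}, which is the kind of question that is awkward to ask of computation expressed only as function calls.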
Everything can be run locally!
Testing without mocking
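As an illustration of testing without mocks: the task function is a plain function and the catalogue entry is a plain map, so both can be exercised without a cluster. In this sketch apply-task is a hypothetical stand-in that mimics how the values named in :onyx/params are passed ahead of the segment; it is not Onyx's own machinery:

```clojure
(defn adder
  [n {:keys [x] :as segment}]
  (assoc segment :x (+ n x)))

(def add-5-entry
  {:onyx/name   :add-5
   :onyx/fn     :my/adder
   :onyx/type   :function
   :my/n        5
   :onyx/params [:my/n]})

;; Hypothetical helper: look up each key listed in :onyx/params on the
;; catalogue entry and pass the values to `f` before the segment.
(defn apply-task
  [f entry segment]
  (apply f (concat (map entry (:onyx/params entry)) [segment])))

(def result (apply-task adder add-5-entry {:x 1 :id 42}))
```

Here result is {:x 6, :id 42}; no channel, broker, or peer had to be faked to check the logic.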
How Onyx rewired my brain
It’s not about scaling, but clean architecture
My go-to architecture
[Diagram: DB, Events, Kafka, and three Onyx jobs]
Persist all events to S3
• time travel
• query with AWS Athena
Decomplect everything
Computation graphs
Putting “data is code” to work
Interlude: queryable data descriptions with spec
• s/registry, s/form
• Build a graph (Datomic)
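A minimal sketch of the two bullets above, assuming Clojure 1.9+ with clojure.spec.alpha. The :event/... specs are hypothetical, and a plain set of edges stands in for the Datomic graph mentioned on the slide:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs, for illustration only.
(s/def :event/user-id pos-int?)
(s/def :event/amount number?)
(s/def :event/purchase (s/keys :req [:event/user-id :event/amount]))

(defn spec-edges
  "Edges [spec referenced-spec] for every registered spec in namespace
  `spec-ns` whose form mentions another registered spec."
  [spec-ns]
  (let [registry (s/registry)]
    (set (for [k   (keys registry)
               :when (and (keyword? k) (= spec-ns (namespace k)))
               ;; s/form returns the spec as data; flatten it to find keywords.
               ref (filter keyword? (flatten (s/form k)))
               :when (and (contains? registry ref) (not= ref k))]
           [k ref]))))

(def edges (spec-edges "event"))
```

s/form turns a spec back into data (a symbol for a simple predicate, a list for s/keys), and s/registry exposes every registered spec, which is what makes the descriptions queryable.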
Interact with your type system!
code is data!
Case study: autogenerating materialised views
[Diagram: Events, External data, Kafka, three Onyx jobs, Materialised views]
Automatic view generation
• Event & attribute ontology
  • Manual (via spec)
  • Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
Automatic view generation
1. Walk spec registry
2. Apply rules
   1. Define new view (spec)
   2. Trigger Onyx job that creates the view
⤾
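The loop above can be sketched roughly as follows, assuming Clojure 1.9+. Everything named here is hypothetical: the :event/... specs, the single rule, the :view/... keys, the :events and :view-store tasks, and :my.views/aggregate. The sketch only builds a job description as data; it does not register the view's spec or submit anything to Onyx:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical event specs, for illustration only.
(s/def :event/user-id pos-int?)
(s/def :event/amount number?)
(s/def :event/purchase (s/keys :req [:event/user-id :event/amount]))

;; Hypothetical rule: any attribute specced as number? gets a sum view.
(defn numeric-attribute-rule
  [spec-name]
  (when (= 'clojure.core/number? (s/form spec-name))
    {:view/name        (keyword "view" (str (name spec-name) "-sum"))
     :view/source      spec-name
     :view/aggregation :sum}))

(defn generate-views
  "Step 1 and 2: walk the registry for `spec-ns` and apply every rule."
  [rules spec-ns]
  (for [k    (keys (s/registry))
        :when (and (keyword? k) (= spec-ns (namespace k)))
        rule rules
        :let  [view (rule k)]
        :when view]
    view))

(defn view->job
  "Describes (but does not submit) a job that would build the view."
  [view]
  (let [task (:view/name view)]
    {:workflow [[:events task] [task :view-store]]
     :catalog  [{:onyx/name       task
                 :onyx/type       :function
                 :onyx/fn         :my.views/aggregate
                 :my.views/source (:view/source view)
                 :onyx/params     [:my.views/source]}]}))

(def views (vec (generate-views [numeric-attribute-rule] "event")))
(def jobs (mapv view->job views))
```

With the specs above, only :event/amount matches the rule, so a single view and a single job description come out. Feeding newly defined views back into the registry would close the loop the slide draws.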
Code is data or data is code?
Takeaways
Onyx is production ready
Everything should be live and interactive
Computation graphs are a great way to structure data processing code
Queryable data and computation descriptions supercharge interactive development and are a great building block for automation
viebel.github.io/klipse/examples/onyx.html
onyxplatform.org
onyxplatform.org/jekyll/update/2017/02/08/Pyroclast-Preview-Simulation.html