Apache Drill (ver. 0.1, check ver. 0.2)

Apache Drill Design proposal from

OpenDremel team

Camuel Gilyadov & Constantine Peresypkin,

Email: Camuel@BigDataCraft.com

OpenDremel Story: 2010

• Camuel Gilyadov started a Dremel implementation,

named OpenDremel, in summer 2010.

• David Gruzman joined the effort a few months later

followed by Constantine Peresypkin.

• There wasn’t a comprehensive design or architecture.

The goal was to get the hierarchical-columnar transformation

working smoothly and in strict accordance with the

Dremel paper. We published several working

implementations under the Apache License.

• Hong San was hired as the first full-timer to speed up

development. The Metaxa milestone was set.

• The early OpenDremel design was found too naive, mainly due to

Java's underperformance in inner number-crunching loops.

• After fierce brainstorming, the project was restarted from scratch

under the new name Dazo. With Dazo, a query plan is an arbitrary

piece of executable native code, with a Java frontend.

• From then on we drew inspiration from BigQuery rather than

from the Dremel paper.

• We decided to use Google NaCl as the sandboxing technology to

isolate queries and to meter resource consumption. The new

sandbox was named ZeroVM.

• For storage, we decided to use OpenStack Swift.

• With four people full-time and several others part-time, we still

don’t have a fully integrated version, but we are satisfied

with what we have achieved and are convinced that the

decisions behind Dazo were correct.

• We believe ZeroVM could be a disruptive technology in

itself, revolutionizing the BigData@Cloud space.

• We are excited by the Apache Drill initiative and hope to be

useful to it.

Design Tenet #1

• Apache Drill must support multi-tenant semantics

internally and must not be run inside guest VMs at all.

• It should be inspired by BigQuery and not only by

Dremel/PowerDrill/Tenzing papers.

• It is not practical to set up a dedicated cloud (billed

hourly) just to be able to run a query for a few seconds.

• The codebase must be clearly divided into a trusted part

and an untrusted part. The trusted part must be kept to an

absolute minimum and must be peer-reviewed, secured,

audited and metered.

Design Tenet #2

• Apache Drill must be extremely flexible and

customizable.

• The schema-on-read concept must be supported.

It must be possible to embed imperative,

high-performance parser code into the query.

• SQL is no longer enough. It must be easy to add new

query languages as plug-ins or as user-defined functions

(UDFs).

• Additionally, various data formats must be supported,

such as column stores, row stores, PAX, RCFile, etc.
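A minimal sketch of the schema-on-read idea above, assuming a toy comma-separated format. The names `parse_csv_line` and `run_query` are hypothetical and are not part of OpenDremel, Dazo, or Apache Drill; the point is only that the parser travels with the query and the schema is applied when the raw bytes are scanned.

```python
from typing import Callable, Iterable

Record = dict
Parser = Callable[[bytes], Record]


def parse_csv_line(raw: bytes) -> Record:
    """Hypothetical user-supplied parser: the schema exists only in this code."""
    user, country, amount = raw.decode("utf-8").split(",")
    return {"user": user, "country": country, "amount": float(amount)}


def run_query(raw_rows: Iterable[bytes], parser: Parser,
              predicate: Callable[[Record], bool], column: str) -> float:
    """Scan raw bytes, parse each row with the query's own parser,
    keep rows matching the predicate, and sum one column."""
    return sum(rec[column] for rec in map(parser, raw_rows) if predicate(rec))


# The stored data is schemaless; the query supplies the parser.
raw_rows = [b"alice,US,10.5", b"bob,DE,4.0", b"carol,US,2.5"]
total = run_query(raw_rows, parse_csv_line,
                  lambda r: r["country"] == "US", "amount")
print(total)
```

Swapping in a different parser function would let the same scan loop read a different format without touching the stored bytes.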

Design Tenet #2 (cont.)

• We suggest relaxing the query plan format to

arbitrary distributed executable code and the data

format to an arbitrary opaque BLOB.

• This way, new query languages and new data formats

could be easily supported without changing the backend.

• As an added benefit, the backend becomes a generic, lightweight,

homogeneous compute-storage cloud.

• Such an approach exhibits good separation of control.

The cloud operator controls and bills for generic

infrastructure, and the query engine is left completely under

the control of the tenant/user.
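The split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Dazo or Zwift implementation: `GenericBackend` is a hypothetical in-memory stand-in for the compute-storage cloud, and a Python callable stands in for the tenant's arbitrary executable. The backend never interprets the BLOB or the query; it only stores bytes and runs the job next to them.

```python
import json
from typing import Callable, Dict

# Stand-in for an arbitrary tenant-supplied executable: bytes in, bytes out.
Job = Callable[[bytes], bytes]


class GenericBackend:
    """Hypothetical operator-side backend: stores opaque blobs, runs opaque jobs."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, blob: bytes) -> None:
        self._blobs[key] = blob  # no parsing, no schema

    def run(self, key: str, job: Job) -> bytes:
        return job(self._blobs[key])  # the job alone knows the data format


# Tenant side: both the data format (JSON here) and the query logic are
# chosen by the tenant and are invisible to the backend.
def count_us_rows(blob: bytes) -> bytes:
    rows = json.loads(blob)
    return str(sum(1 for r in rows if r["country"] == "US")).encode()


backend = GenericBackend()
backend.put("events", json.dumps(
    [{"country": "US"}, {"country": "DE"}, {"country": "US"}]).encode())
result = backend.run("events", count_us_rows)
print(result)
```

Supporting a new query language or data format in this sketch means shipping a different job, with no change to `GenericBackend`.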

Design Tenet #3

• Apache Drill requests/queries must be hyper-elastic,

meaning they can exploit the compute capacity of

thousands of servers for a short duration of just a few

seconds. No resources may be kept spinning per user

between queries or when idle.

• Traditional VMs are too heavyweight for that.

Container approaches such as OpenVZ, LXC, etc. are

not secure enough in a multi-tenancy context.

• We suggest making sandboxing pluggable and

supporting ZeroVM (developed for OpenDremel) and

LXC (fine for private clouds) to begin with.
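One way pluggable sandboxing could look is a small interface plus a registry, sketched below. All names here (`Sandbox`, `register`, `get_sandbox`, `InProcessSandbox`) are hypothetical, and the only backend shown is an in-process placeholder that provides no isolation at all. Real ZeroVM or LXC backends would be registered the same way but are not implemented in this sketch.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type

Job = Callable[[bytes], bytes]  # stand-in for an untrusted executable


class Sandbox(ABC):
    """Hypothetical plug-in interface for running untrusted query code."""
    name: str

    @abstractmethod
    def run(self, job: Job, data: bytes) -> bytes:
        ...


SANDBOXES: Dict[str, Type[Sandbox]] = {}


def register(cls: Type[Sandbox]) -> Type[Sandbox]:
    """Class decorator that makes a sandbox selectable by name."""
    SANDBOXES[cls.name] = cls
    return cls


@register
class InProcessSandbox(Sandbox):
    """Placeholder only: runs the job directly, with NO isolation or metering."""
    name = "inprocess"

    def run(self, job: Job, data: bytes) -> bytes:
        return job(data)


def get_sandbox(name: str) -> Sandbox:
    """Look up a sandbox by the name given in (assumed) deployment config."""
    try:
        return SANDBOXES[name]()
    except KeyError:
        raise ValueError(f"unknown sandbox: {name!r}") from None


output = get_sandbox("inprocess").run(bytes.upper, b"select")
print(output)
```

With this shape, the choice between sandboxes becomes a deployment setting rather than a code change in the query path.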

Design Tenet #4

• Apache Drill must be efficient.

• Value-per-byte is extremely low with BigData.

• Overhead in the inner loop must be kept to a minimum.

• Java was found inefficient for general number

crunching (such as data compression). The main

problem with Java is that GC overhead is unavoidable

for the whole data corpus being scanned. We went so

far as to keep all data in byte arrays and auto-generate

transformation code, yet it still underperformed and

code complexity went through the roof.

Suggested Architecture

• Browser / Client

• Single-tenant frontend running inside a traditional guest VM.

It hosts the query compiler, which produces a custom executable job.

• Multi-tenant backend: scale-out object store and in-situ compute.

It runs the custom executable job.

OpenDremel/Dazo

• Browser / Client: two separate unfinished jQuery apps and a

command-line app, with no particular codenames.

• Frontend (query compiler): we call it Metaxa (historic reasons).

BQL parser and an unfinished compiler based on Apache Velocity.

• Backend (runs the custom executable job): we call it Zwift

(Swift + ZeroVM). Alpha quality.

What is Swift?

“Swift is a highly available, distributed,

eventually consistent object/blob store.

Organizations can use Swift to store

lots of data efficiently, safely, and

cheaply.”

Haven’t got it?

Swift is THE open-source

implementation of

Amazon S3

What is ZeroVM?

Highly secure, low-overhead, low-latency container-style

virtualization based on the Google Native Client project. The

critical security code is transferred verbatim from the Chrome

Browser project and therefore is as secure as Chrome

Browser. More info: http://ZeroVM.org and

http://news.ycombinator.com/item?id=3746222

ZeroVM highlights

1. Disposable VM per request

2. HyperElasticity per request

3. Embeddable into everything

4. High-performance (x86/ARM)

5. Erlang-inspired clustering

6. Written in pure C, no dependencies

Haven’t got it?

ZeroVM to Virtualization

is what

SQLite is to Databases

Where is the code?

• OpenDremel (1st generation design):

– http://code.google.com/p/dremel/source/browse?repo=dremel

– http://code.google.com/p/dremel/source/browse?repo=metaxa

• Dazo (2nd generation design):

– https://github.com/Dazo-org

Thanks

Camuel Gilyadov,

Email: Camuel@BigDataCraft.com
