Introduction to Zeppelin - gets measured, gets managed - Peter Drucker A price of light is less than the cost of darkness - Arthur C. Nielsen , Founder of ACNielsen

  • View
    213

  • Download
    1

Embed Size (px)

Transcript

  • Introduction to Zeppelin

    Lee moon soo, NFLabs moon@apache.org

    mailto:moon@apache.org

  • What is Zeppelin?

  • Lets see demo

  • How Zeppelin started

  • What gets measured, gets managed - Peter Drucker

    A price of light is less than the cost of darkness - Arthur C. Nielsen , Founder of ACNielsen

    War is ninety percent information - Napoleon Bonaparate

    In God we trust, all others must bring data - W. Edwards Deming

    Data = Understanding

  • For data analysis, we need tool

  • But couldnt find one i like

    ImpalaDrill

    Hive tajoPig

    Cloudera-ML

    MLLib

    MRQL

  • Decided to make one

    Really good one

  • Good analytics environment

    Analytical language Many Libraries Interactive Visualization Sharing

  • The First attempt 2012~2013

  • Its got graphic REPL, deployment, search, import tool

    But failed, because

  • It wasnt widely used It wasnt opensource

  • Second attempt 2013~2014Opensourced graphic REPL from commercial product

    The first version of Zeppelin

  • Second attempt 2013~2014

    Not widely used

    It was slow, difficult to use,

  • Third attempt 2014~

    After few weeks of study, decided to rewrite Zeppelin

    Graphic REPl -> Notebook with Apache Spark integration

  • Third attempt 2014~

    Next week, beautifulized

  • Why do you like Zeppelin?

  • Web Based

    web framework

    d3.js

    Visualization

    Language

    Package management

    bower

    Build

  • Notebook

    Code

    Result

    Code

    Result

    Notebook

  • Notebook

    Data Visualize

  • Pivot

    Pivot

  • Dynamic Form Creation

    Dynamic Form

  • REST

    Web socket

    Synchronized

    Sharing

  • Zeppelin simplifies data analysis

  • Why do your project likes Zeppelin?

  • .

    Interpreters Spark PySpark SparkSQL Hive Mysql (JDBC) Markdown Shell

    Easy to extend

  • Zeppelin Interpreter Architecture

    Classloader

    InterpreterGroup

    Interpreter Interpreter

    Server

    Client

    HTTP Rest / Websocket

    Classloader

    InterpreterGroup

    Spark SparkSQL Dep

  • Classloader

    InterpreterGroup

    Interpreter Interpreter

    Server

    Client

    HTTP Rest / Websocket

    InterpreterGroup

    Interpreter Interpreter

    Seprate JVM process

    Thrift

    Zeppelin Interpreter Architecture

  • public abstract void open();

    public abstract void close();

    public abstract InterpreterResult interpret(String st, InterpreterContext context);

    public abstract void cancel(InterpreterContext context);

    public abstract int getProgress(InterpreterContext context);

    public abstract List completion(String buf, int cursor);

    public abstract FormType getFormType();

    public Scheduler getScheduler();

    Implementing new Interpreter

    Must have

    Good to have

    More controls

  • Roadmap Integration with more distributed processing framework Flink, Ignite, Tajo, etc..

    Output message streaming Ability to create rich GUI

  • ImpalaDrill

    Hive tajoPig

    Cloudera-ML

    MLLib

    MRQL

  • ThanksLee moon soo

    moon@apache.org

    mailto:moon@apache.org