Apache SystemML
Mike Dusenberry
Engineer, Machine Learning & SystemML
Spark Technology Center
@dusenberrymw
Datapalooza, Denver - 05.19.16
Agenda
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions
Links
● Main Website: systemml.apache.org
● Code: github.com/apache/incubator-systemml
● Documentation: apache.github.io/incubator-systemml
● JIRA: issues.apache.org/jira/browse/SYSTEMML
Machine Learning
● Data
  ○ Multiple “examples”
  ○ Multiple “features” per “example”
  ○ “Label(s)” for each “example” (supervised)
● Model
  ○ Construct/select a model that fits the problem.
  ○ Examples:
    ■ Linear/Logistic Regression
    ■ SVM
    ■ Neural Networks
● Loss
  ○ An “evaluation” of how well the model fits the data.
● Optimizer
  ○ Minimize “loss” by adjusting model to better fit the data.
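The four ingredients above (data, model, loss, optimizer) can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: the toy dataset, the function names (`predict`, `loss`, `train`), the learning rate, and the step count are all assumptions chosen for the example.

```python
import math

# Data: four "examples", two "features" each, one binary "label" per example.
# (Toy dataset: the logical AND of the two features.)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 0.0, 0.0, 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Model: logistic regression, sigmoid(w . x + b).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def loss(w, b):
    # Loss: mean log-loss over all examples (eps avoids log(0)).
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        p = predict(w, b, x)
        total -= t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps)
    return total / len(X)

def train(steps=2000, lr=0.5):
    # Optimizer: batch gradient descent on the mean log-loss.
    w, b = [0.0, 0.0], 0.0
    n = len(X)
    for _ in range(steps):
        gw, gb = [0.0, 0.0], 0.0
        for x, t in zip(X, y):
            err = predict(w, b, x) - t  # d(log-loss)/d(w . x + b)
            for j in range(len(w)):
                gw[j] += err * x[j] / n
            gb += err / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

w, b = train()
print("loss before:", loss([0.0, 0.0], 0.0), "loss after:", loss(w, b))
```

Swapping the model or the loss only changes `predict` and `loss`; the optimizer loop stays the same, which is the separation the slide describes.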
Declarative Machine Learning
Exploratory Data Analysis Today
[Figure: data scientists working with data on a laptop, using R, Python, and other tools.]
Current Best Practice for Big Data Analysis
[Figure labels: data scientists (R, Python, others) alongside a Hadoop engineer, a Spark engineer, and an MPI engineer.]
Vision: Declarative Machine Learning
[Figure labels: data scientist; R, Python, others; query optimization; laptop; scale-up; cluster.]
Common patterns:
• Changes in feature set
• Changes in data size
• Algorithm customization
• Quick iteration
Landscape of Existing Work
Classification by level of abstraction (different target user)
● Distributed Systems w/ DSLs: Spark, Flink, REEF, GraphLab, (R, Matlab, SAS)
● Large-Scale ML Libraries (fixed plan): MLlib, Mahout MR, MADlib, ORE, Rev R, HP Dist R, Custom alg.
● Declarative ML (fixed algorithm): SystemML, (Mahout Samsara, Tupleware, Cumulon, Dmac, SimSQL)
● Declarative ML++ (fixed task): Mlbase*, Specific sys.
Requirements to Support Declarative ML
• Goal: Write ML algorithms independent of input data and cluster characteristics.
• R1: Full flexibility
  ▪ Specify new / customize existing ML algorithms.
  ▪ ➔ ML DSL
• R2: Data independence
  ▪ Hide physical data representation (sparse/dense, row/column-major, blocking configs, partitioning, caching, compression).
  ▪ ➔ Abstract data types and coarse-grained logical operations.
• R3: Efficiency and scalability
  ▪ Very small to very large use-cases.
  ▪ ➔ Automatic optimization and hybrid runtime plans.
• R4: Specified algorithm semantics
  ▪ Understand, debug, and control algorithm behavior.
  ▪ ➔ Optimization for performance only, not accuracy.
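To make R2 (data independence) concrete, here is a minimal sketch of an abstract matrix type that hides whether its data is stored dense or sparse. It is not SystemML code: the `Matrix` class, its `matvec` method, and the 0.4 sparsity threshold are hypothetical and exist only to show the idea of one logical operation over several physical representations.

```python
class Matrix:
    """Logical matrix whose physical format (dense or sparse) is hidden."""

    SPARSITY_THRESHOLD = 0.4  # hypothetical cutoff, chosen for illustration

    def __init__(self, rows):
        self.shape = (len(rows), len(rows[0]))
        nnz = sum(1 for r in rows for v in r if v != 0)
        if nnz / (self.shape[0] * self.shape[1]) < self.SPARSITY_THRESHOLD:
            # Sparse: keep only the non-zero cells, keyed by (row, col).
            self._format = "sparse"
            self._data = {(i, j): v
                          for i, r in enumerate(rows)
                          for j, v in enumerate(r) if v != 0}
        else:
            # Dense: keep every cell in row-major lists.
            self._format = "dense"
            self._data = [list(r) for r in rows]

    def matvec(self, v):
        """Coarse-grained logical operation: matrix-vector product."""
        out = [0.0] * self.shape[0]
        if self._format == "sparse":
            for (i, j), a in self._data.items():
                out[i] += a * v[j]
        else:
            for i, row in enumerate(self._data):
                out[i] = sum(a * b for a, b in zip(row, v))
        return out

# The caller writes the same code regardless of the chosen representation.
A = Matrix([[1, 0, 0], [0, 0, 2], [0, 0, 0]])  # stored sparse
B = Matrix([[1, 2], [3, 4]])                   # stored dense
print(A.matvec([1, 1, 1]), B.matvec([1, 1]))
```

The algorithm author only sees `matvec`; the choice of representation is left to the system, which is what makes automatic optimization (R3) possible.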
Apache SystemML
Sidenote: Fun Stuff - Neural Art
- A Neural Algorithm of Artistic Style, L.A. Gatys, A.S. Ecker, M. Bethge
- https://github.com/jcjohnson/neural-style
Apache SystemML
● High-level language
  ○ DML -> R-like
  ○ PyDML -> Python-like
  ○ Focus is on matrices and linear algebra.
● Engine
  ○ Compiler/Optimizer
  ○ Lots of optimizations, such as rewrites.
● Runtime
  ○ Laptop
  ○ Spark
  ○ (also Hadoop)
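As a rough illustration of what a compiler "rewrite" means, the sketch below simplifies a tiny linear algebra expression tree before execution. The tuple encoding, the `rewrite` function, and the two rules shown are assumptions made for this example; they are not taken from SystemML's actual rule set.

```python
def rewrite(expr):
    """Recursively simplify an expression tree built from nested tuples.

    A node is (op, arg1, ...); a leaf is a variable name or a number.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf: nothing to simplify
    op, *args = expr
    args = [rewrite(a) for a in args]  # simplify children first
    # Illustrative rule 1: t(t(X)) -> X (double transpose cancels).
    if op == "t" and isinstance(args[0], tuple) and args[0][0] == "t":
        return args[0][1]
    # Illustrative rule 2: X * 1 -> X and 1 * X -> X.
    if op == "*":
        if args[1] == 1:
            return args[0]
        if args[0] == 1:
            return args[1]
    return (op, *args)

# t(t(X)) * 1 simplifies to X; t(X) %*% y has no applicable rule.
print(rewrite(("*", ("t", ("t", "X")), 1)))
print(rewrite(("%*%", ("t", "X"), "y")))
```

Each rule preserves the result of the expression while removing work, which fits the requirement that optimization targets performance only, not accuracy.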
SystemML - Example: Logistic Regression (DML)
SystemML - Example: Sigmoid Function (DML)
SystemML - Compilation Chain
More Fun...
https://github.com/google/deepdream
SystemML - Compilation Chain
Spark
CP + b sb _mVar1
SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE
CP * y _mVar2 _mVar3
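The instruction listing above mixes CP (control program, single node) instructions with a SPARK instruction, i.e. a hybrid runtime plan. One way such a choice could be made is sketched below: estimate the memory an operation needs and run it locally only if it fits a budget. The dense 8-bytes-per-cell estimate, the `choose_backend` function, the 1 GiB budget, and the matrix shapes are all assumptions for illustration, not SystemML's actual cost model.

```python
def dense_size_bytes(rows, cols):
    # Assumed worst-case estimate: dense matrix of 8-byte doubles.
    return rows * cols * 8

def choose_backend(input_shapes, output_shape, budget_bytes):
    """Pick "CP" if inputs plus output fit in the local budget, else "SPARK"."""
    total = sum(dense_size_bytes(r, c) for r, c in input_shapes)
    total += dense_size_bytes(*output_shape)
    return "CP" if total <= budget_bytes else "SPARK"

BUDGET = 2 ** 30  # hypothetical 1 GiB local memory budget

# Hypothetical shapes: a tall matrix X (10M x 1000) and small vectors.
plan = [
    ("scalar add", choose_backend([(1, 1), (1, 1)], (1, 1), BUDGET)),
    ("matrix-vector multiply",
     choose_backend([(10_000_000, 1000), (1000, 1)], (10_000_000, 1), BUDGET)),
    ("element-wise multiply",
     choose_backend([(10_000_000, 1), (10_000_000, 1)], (10_000_000, 1), BUDGET)),
]
for name, backend in plan:
    print(backend, name)
```

With these assumed sizes only the operation touching the large matrix is sent to Spark, while the small operations stay on the single node.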
SystemML Architecture (APIs and runtime)
● APIs
  ○ Command Line
  ○ JMLC
  ○ Spark MLContext
  ○ Spark ML
● Compiler
  ○ Parser/Language
  ○ High-Level Operators (HOPs)
  ○ Low-Level Operators (LOPs)
  ○ Cost-based optimizations
● Runtime
  ○ Control Program
    ■ Runtime Prog
    ■ Buffer Pool
    ■ ParFor Optimizer/Runtime
    ■ Recompiler
  ○ Instructions: CP Inst, Spark Inst, MR Inst
  ○ MatrixBlock Library (single/multi-threaded)
  ○ Mem/FS IO, DFS IO
  ○ Generic MR
Demo
Current Work
● Usability / Applications:
  ○ Deep Learning (SYSTEMML-540)
  ○ Embedded Scala/Python/R DSL with sufficient optimization scope (SYSTEMML-451)
● Optimizer:
  ○ Cost-model enhancement (SYSTEMML-416)
  ○ Global program optimization (SYSTEMML-421)
  ○ Source code generation for automatic operator fusion (SYSTEMML-448)
● Runtime:
  ○ Add GPU backend (SYSTEMML-445) => CUDA / OpenCL
  ○ Frame support / Sparse block representation
  ○ Integrate Apache Flink as additional backend for SystemML (SYSTEMML-636 / PR-119)
  ○ NUMA-aware single node backend (SYSTEMML-406)
Deep Learning - Plans
● Deep Learning library for SystemML written in DML (SYSTEMML-618).
  ○ SystemML-NN [https://github.com/dusenberrymw/systemml-nn]
● Built-in DML functions for computationally-intensive layers.
  ○ Convolution (2D), Max Pooling
● GPU acceleration for these built-in functions (SYSTEMML-445).
● Integration with existing deep learning libraries (Keras, TensorFlow, Torch, etc.)?
Deep Learning - SystemML-NN Library
● Deep learning library written in DML (and PyDML soon…).
● Multiple layers:
  ○ Core:
    ■ Affine, 2D Convolution, Max Pooling
  ○ Nonlinearity/Transfer:
    ■ Sigmoid, Tanh, Softmax, ReLU
  ○ Regularization:
    ■ Dropout, L1, L2
  ○ Loss:
    ■ Log-loss, Cross-entropy, L1, L2
● Multiple optimizers:
  ○ SGD, SGD w/ momentum, SGD w/ Nesterov momentum, Adagrad, RMSprop, Adam
https://github.com/dusenberrymw/systemml-nn
Deep Learning - SystemML-NN Library (cont.)
https://github.com/dusenberrymw/systemml-nn
● Each layer type has a simple `forward(...)` and `backward(...)` API.
  ○ `forward(...)` computes the output of the function based on the inputs.
  ○ `backward(...)` computes the partial derivatives (gradients) of some function deeper in the network (usually the loss function at the end) w.r.t. the inputs to this function.
● Each optimizer has a simple `update(...)` API.
  ○ `update(...)` adjusts the given parameters based on their partial derivatives.
● Includes test code in DML.
  ○ Gradient checks, unit tests
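The `forward(...)` / `backward(...)` / `update(...)` pattern and a gradient check can be sketched in plain Python for a single affine layer followed by an L2 loss. The library itself is written in DML, so everything here (function names, argument order, the single-example shapes) is an assumption for illustration rather than the library's actual signatures.

```python
def affine_forward(x, W, b):
    # x: D inputs, W: D x M weights, b: M biases -> M outputs.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def affine_backward(dout, x, W, b):
    # dout: gradient of the loss w.r.t. the layer's outputs.
    dx = [sum(W[i][j] * dout[j] for j in range(len(dout)))
          for i in range(len(x))]
    dW = [[x[i] * dout[j] for j in range(len(dout))] for i in range(len(x))]
    db = list(dout)
    return dx, dW, db

def l2_loss_forward(pred, y):
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, y))

def l2_loss_backward(pred, y):
    return [p - t for p, t in zip(pred, y)]

def sgd_update(param, grad, lr):
    # Plain SGD on a flat list of parameters.
    return [p - lr * g for p, g in zip(param, grad)]

x, y = [1.0, 2.0], [1.0]
W, b = [[0.5], [-0.3]], [0.1]

# Forward pass, then backward pass in reverse order.
out = affine_forward(x, W, b)
dout = l2_loss_backward(out, y)
dx, dW, db = affine_backward(dout, x, W, b)

# Gradient check: compare dW[0][0] with a central finite difference.
h = 1e-5
W_plus, W_minus = [[W[0][0] + h], W[1]], [[W[0][0] - h], W[1]]
numeric = (l2_loss_forward(affine_forward(x, W_plus, b), y)
           - l2_loss_forward(affine_forward(x, W_minus, b), y)) / (2 * h)
print("analytic:", dW[0][0], "numeric:", numeric)

# One optimizer step on the bias should not increase the loss here.
b_new = sgd_update(b, db, lr=0.1)
```

Because every layer exposes the same two functions, a network is just a sequence of `forward` calls followed by the matching `backward` calls in reverse, and the gradient check works unchanged for any layer.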
Deep Learning - SystemML-NN Library (cont.)
[Figure labels: SystemML-NN; SystemML Engine.]
Agenda Revisited
1. Background
   a. Machine Learning
   b. Declarative ML
2. SystemML
   a. Overview
   b. Language
   c. Compiler/Optimizer
   d. Runtime
3. Demo
4. Current Work
   a. Deep Learning: SystemML-NN
5. Questions
Questions?
Thanks!