Comparison of deep learning frameworks from a viewpoint of double backpropagation
Kenta Oono (Preferred Networks, Inc.) <oono@preferred.jp>
Chainer Meetup #6 @ Preferred Networks, Sep. 30th, 2017
Agenda
• Technological stack of DL frameworks
• Design choices in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TF
Technology stack of a DL framework

name | functions | example
Graphical visualization | — | DIGITS, TensorBoard
Machine learning workflow management | Dataset prep, Save/Load, Training loop | Keras, TF slim
Computational graph (CG) management | Build/Optimize CGs, Forward/Back prop | Theano, TensorFlow, Torch.nn
Multi-dimensional array processing | High-level array manipulation | NumPy, CuPy, Eigen, Torch (core)
Numerical computation | Matrix operation, Convolution | BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL-DNN
Computational device | — | CPU, GPU, TPU, FPGA
Technology stack of Chainer

[Stack diagram, top to bottom: Chainer → NumPy (CPU) / CuPy (GPU) → BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU) → CPU / GPU]
Technology stack of TensorFlow

[Stack diagram: TensorBoard (visualization); TF slim, Keras (workflow management); TensorFlow (CG management); Eigen::Tensor (array processing); BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU); CPU / GPU]
Technology stack of Theano

[Stack diagram: Keras, Lasagne, Blocks, etc. (workflow management); Theano (CG management); NumPy / libgpuarray (array processing); BLAS (CPU) / CUDA Toolkit (GPU, via CUDA, OpenCL); CPU / GPU]
Technology stack of Keras

[Stack diagram: Keras sits on top of either the TensorFlow stack or the Theano stack shown above]
Important design choices through a user's typical workflow

Coding:
• Write NNs (in which language?)
Execution:
• Compute backprop (how?)
• Update parameters (how to represent? how to update?)
• Run user codes (when?)
Improvement:
• Optimize CG (how?)
• Scale up training (how?)
http://bit.ly/aaai-dlif
Neural network as a computational graph
• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of CG is a bipartite DAG (directed acyclic graph) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Figure: data nodes x1, x2, x3, y, z; operator nodes mul, sub — mul computes y from x1 and x2, sub computes z from y and x3]
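Reading the example off the graph: a minimal plain-NumPy sketch of the forward pass and of backprop by backtracking the two operator nodes (the variable names gz, gx1, etc. are illustrative, not any framework's API):

```python
import numpy as np

# Forward pass through the toy graph: y = x1 * x2, z = y - x3
x1, x2, x3 = np.float64(2.0), np.float64(3.0), np.float64(4.0)
y = x1 * x2        # mul operator node
z = y - x3         # sub operator node

# Backward pass: seed with dz/dz = 1 and backtrack the graph
gz = np.float64(1.0)
gy = gz            # sub passes the gradient through to y ...
gx3 = -gz          # ... and negates it for x3
gx1 = gy * x2      # mul multiplies the gradient by the other input
gx2 = gy * x1

print(z, gx1, gx2, gx3)   # 2.0 3.0 2.0 -1.0
```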
Multi-layer perceptron (MLP)

[Figure: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → prob → CrossEntropy(prob, t) → loss]
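The same MLP, written out as a forward pass in plain NumPy (the layer sizes, random data, and helper names here are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def affine(x, W, b):            # fully connected layer
    return x @ W + b

def relu(a):
    return np.maximum(a, 0.0)

def softmax_cross_entropy(h, t):
    # numerically stable softmax, then cross entropy against labels t
    e = np.exp(h - h.max(axis=1, keepdims=True))
    prob = e / e.sum(axis=1, keepdims=True)
    return -np.log(prob[np.arange(len(t)), t]).mean()

x = rng.standard_normal((8, 4))      # batch of 8 inputs, 4 features
t = rng.integers(0, 3, size=8)       # class labels in {0, 1, 2}
W1, b1 = rng.standard_normal((4, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 3)), np.zeros(3)

a1 = relu(affine(x, W1, b1))         # Affine -> ReLU
a2 = relu(affine(a1, W2, b2))        # Affine -> ReLU
loss = softmax_cross_entropy(a2, t)  # Softmax + CrossEntropy
print(loss)
```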
How to compute backprop

Backprop through graphs: the framework builds only the graph of the forward prop, and computes backprop by backtracking that graph.
E.g. Torch.nn, Caffe

Backprop as extended graphs: the framework builds graphs for backprop as well as those for forward prop.
E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch

[Figure: the forward graph a, b → mul → y; y, c → sub → z, shown twice — once alone, once extended with gradient nodes: gz = ∇z z = 1 (id), gy = ∇y z, gc (via neg), and ga = ∇a z, gb (via mul)]
How to compute backprop

Backprop through graphs
+ Easy and simple to implement: backprop computation need not be defined as graphs.
− Low flexibility: features available for graphs may not apply to backprop computations.

Backprop as extended graphs
− Implementation gets complicated.
+ High flexibility: any feature available for graphs can also be applied to backprop computations (e.g. backprop of backprop).
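To see why extended graphs enable backprop of backprop, here is a toy autodiff where the backward pass itself *builds new graph nodes*, so the gradient is again differentiable. This is an illustrative sketch only — the class and function names are invented, not any real framework's API:

```python
# Toy "extended graph" autodiff: grad() does not return plain numbers but
# new Var nodes, so a gradient can itself be differentiated again.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        # parents: (input Var, local_grad) pairs; local_grad maps the
        # upstream gradient Var to this input's gradient contribution
        self.parents = list(parents)

    def __add__(self, other):
        return Var(self.value + other.value,
                   [(self, lambda g: g), (other, lambda g: g)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, lambda g: g * other),    # d(out)/d(self) = other
                    (other, lambda g: g * self)])   # d(out)/d(other) = self

def grad(output, wrt):
    """Backprop from `output`; returns d(output)/d(wrt) as a new graph."""
    order, seen = [], set()            # reverse-topological schedule
    def topo(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                topo(p)
            order.append(n)
    topo(output)

    grads = {id(output): Var(1.0)}     # seed: d(output)/d(output) = 1
    for node in reversed(order):
        g = grads.get(id(node))
        if g is None:
            continue
        for parent, local_grad in node.parents:
            contrib = local_grad(g)    # this *extends* the graph with new Vars
            acc = grads.get(id(parent))
            grads[id(parent)] = contrib if acc is None else acc + contrib
    return grads.get(id(wrt), Var(0.0))

x = Var(3.0)
z = x * x                  # z = x^2
gx = grad(z, x)            # dz/dx = 2x -> 6.0, and gx is itself a graph
ggx = grad(gx, x)          # backprop of backprop: d2z/dx2 -> 2.0
print(gx.value, ggx.value)
```

A "backprop through graphs" framework would instead return bare numbers from `grad`, making the second `grad` call impossible — which is exactly the flexibility trade-off above.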
Double backprop

[Figure: x, y → F → z → … → L]

class F(FunctionNode):
    def forward(self, x, y):
        # x, y are NumPy/CuPy arrays
        return x * x + y

    def backward(self, x, y, gz):
        # x, y, gz are chainer.Variable -> creates a CG
        return 2 * gz * x, gz

Note: the interface is simplified from the actual implementation.
Double backprop

[Figure: forward graph x, y → F → z → … → L. Backprop starts from ∂L/∂L = 1.0; Grad F takes gz = ∂L/∂z and produces gx = ∂L/∂x, gy = ∂L/∂y. Inside Grad F: gx = 2 * gz * x (Mul nodes, *2), gy = gz]
Double backprop

[Figure: backprop from z itself with seed 1.0: Grad F computes gx = ∂z/∂x and gy = ∂z/∂y, and gx is itself an output of the extended graph]
Double backprop

[Figure: x, y → Mul → z. Backprop with seed 1.0 gives gx = ∂z/∂x; backprop again from gx with a fresh seed 1.0 (Double Grad F) gives ggx = ∂²z/∂x²]
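A quick numerical sanity check of the picture above, using central finite differences on the primer's F(x, y) = x*x + y (the point x = 3 is an illustrative choice):

```python
# Central finite differences approximating dz/dx and d2z/dx2 for
# z = F(x, y) = x * x + y at x = 3: expect 2x = 6 and 2.
f = lambda x, y: x * x + y
x, y, h = 3.0, 1.0, 1e-4

g  = (f(x + h, y) - f(x - h, y)) / (2 * h)               # ~ dz/dx
gg = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2    # ~ d2z/dx2
print(g, gg)
```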
Double backprop

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x:
1. Forward: x → f → z.
2. First backprop: Grad f produces gx = ∇f(x) as part of the graph.
3. Both z and gx feed into … → L.
4. Backprop from L (seed 1.0) runs through Grad f again (Double Grad f, with gz from Grad f), yielding ggx = ∂L/∂x.
Example(Chainer)
http://bit.ly/2wpEzO5
Example(PyTorch)
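The PyTorch example shown in the talk is only linked from the slides; a minimal sketch of the same double-backprop idea, reusing the primer's z = x*x + y, might look like this (my illustration, not the talk's actual code):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
z = x * x + y

# create_graph=True makes the first backprop build an extended graph ...
gx, = torch.autograd.grad(z, x, create_graph=True)   # dz/dx = 2x

# ... so we can backprop through it a second time
ggx, = torch.autograd.grad(gx, x)                    # d2z/dx2 = 2
print(gx.item(), ggx.item())   # 6.0 2.0
```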
Example(TensorFlow)
Conclusion
• Several DL frameworks are similar in their structure.
• Differences in design choices determine the capabilities of a framework.
• We introduced double backprop and toy examples in several frameworks.