Comparison of deep learning frameworks from a viewpoint of double backpropagation
Kenta Oono (Preferred Networks, Inc.) <oono@preferred.jp>
Chainer Meetup #6 @ Preferred Networks, Sep. 30th, 2017
Agenda
• Technological stack of DL frameworks
• Design choices in DL frameworks
• Double backprop primer
• Coding examples of double backprop in Chainer, PyTorch, and TF
Technology stack of a DL framework

name | functions | example
Graphical visualization | — | DIGITS, TensorBoard
Machine learning workflow management | Dataset prep, Save/Load, Training loop | Keras, TF slim
Computational graph (CG) management | Build/Optimize CGs, Forward/Back prop | Theano, TensorFlow, Torch.nn
Multi-dimensional array processing | High-level array manipulation | NumPy, CuPy, Eigen, Torch (core)
Numerical computation | Matrix operation, Convolution | BLAS (OpenBLAS, MKL), cuBLAS, cuDNN, MKL-DNN
Computational device | — | CPU, GPU, TPU, FPGA
Technology stack of Chainer

[Stack diagram, top to bottom: Chainer → NumPy (CPU) / CuPy (GPU) → BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU) → CPU / GPU]
Technology stack of TensorFlow

[Stack diagram: TensorBoard (visualization); TF slim, Keras (workflow management); TensorFlow (CG management); Eigen::Tensor (array processing); BLAS (CPU) / cuBLAS, cuRAND, cuDNN (GPU); CPU / GPU]
Technology stack of Theano

[Stack diagram: Keras, Lasagne, Blocks, etc. (workflow management); Theano (CG management); NumPy / libgpuarray (array processing); BLAS (CPU) / CUDA Toolkit (GPU, via CUDA, OpenCL); CPU / GPU]
Technology stack of Keras

[Stack diagram: Keras sits on top of either the TensorFlow stack or the Theano stack shown above]
Important design choices through a user's typical workflow

Coding:
• Write NNs (in which language?)
Execution:
• Compute backprop (how?)
• Update parameters (how to represent? how to update?)
• Run user codes (when?)
Improvement:
• Optimize CG (how?)
• Scale up training (how?)
http://bit.ly/aaai-dlif
Neural network as a computational graph
• In most frameworks, an NN is conceptualized as a computational graph (CG).
• The simplest form of CG is a bipartite DAG (directed acyclic graph) consisting of data nodes and operator nodes.

y = x1 * x2
z = y - x3

[Figure: data nodes x1, x2, x3, y, z; operator nodes mul, sub — mul computes y from x1 and x2, sub computes z from y and x3]
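Reading the example off the graph: a minimal plain-NumPy sketch of the forward pass and of backprop by backtracking the two operator nodes (the variable names gz, gx1, etc. are illustrative, not any framework's API):

```python
import numpy as np

# Forward pass through the toy graph: y = x1 * x2, z = y - x3
x1, x2, x3 = np.float64(2.0), np.float64(3.0), np.float64(4.0)
y = x1 * x2        # mul operator node
z = y - x3         # sub operator node

# Backward pass: seed with dz/dz = 1 and backtrack the graph
gz = np.float64(1.0)
gy = gz            # sub passes the gradient through to y ...
gx3 = -gz          # ... and negates it for x3
gx1 = gy * x2      # mul multiplies the gradient by the other input
gx2 = gy * x1

print(z, gx1, gx2, gx3)   # 2.0 3.0 2.0 -1.0
```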
Multi-layer perceptron (MLP)

[Figure: x → Affine(W1, b1) → h1 → ReLU → a1 → Affine(W2, b2) → h2 → ReLU → a2 → Softmax → prob → CrossEntropy(prob, t) → loss]
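The same MLP, written out as a forward pass in plain NumPy (the layer sizes, random data, and helper names here are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def affine(x, W, b):            # fully connected layer
    return x @ W + b

def relu(a):
    return np.maximum(a, 0.0)

def softmax_cross_entropy(h, t):
    # numerically stable softmax, then cross entropy against labels t
    e = np.exp(h - h.max(axis=1, keepdims=True))
    prob = e / e.sum(axis=1, keepdims=True)
    return -np.log(prob[np.arange(len(t)), t]).mean()

x = rng.standard_normal((8, 4))      # batch of 8 inputs, 4 features
t = rng.integers(0, 3, size=8)       # class labels in {0, 1, 2}
W1, b1 = rng.standard_normal((4, 5)), np.zeros(5)
W2, b2 = rng.standard_normal((5, 3)), np.zeros(3)

a1 = relu(affine(x, W1, b1))         # Affine -> ReLU
a2 = relu(affine(a1, W2, b2))        # Affine -> ReLU
loss = softmax_cross_entropy(a2, t)  # Softmax + CrossEntropy
print(loss)
```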
How to compute backprop

Backprop through graphs: the framework builds only the graph of the forward prop, and computes backprop by backtracking that graph.
E.g. Torch.nn, Caffe

Backprop as extended graphs: the framework builds graphs for backprop as well as those for forward prop.
E.g. Theano, MXNet, TensorFlow, Chainer, PyTorch

[Figure: the forward graph a, b → mul → y; y, c → sub → z, shown twice — once alone, once extended with gradient nodes: gz = ∇z z = 1 (id), gy = ∇y z, gc (via neg), and ga = ∇a z, gb (via mul)]
How to compute backprop

Backprop through graphs
+ Easy and simple to implement: backprop computation need not be defined as graphs.
− Low flexibility: features available for graphs may not apply to backprop computations.

Backprop as extended graphs
− Implementation gets complicated.
+ High flexibility: any feature available for graphs can also be applied to backprop computations (e.g. backprop of backprop).
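To see why extended graphs enable backprop of backprop, here is a toy autodiff where the backward pass itself *builds new graph nodes*, so the gradient is again differentiable. This is an illustrative sketch only — the class and function names are invented, not any real framework's API:

```python
# Toy "extended graph" autodiff: grad() does not return plain numbers but
# new Var nodes, so a gradient can itself be differentiated again.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        # parents: (input Var, local_grad) pairs; local_grad maps the
        # upstream gradient Var to this input's gradient contribution
        self.parents = list(parents)

    def __add__(self, other):
        return Var(self.value + other.value,
                   [(self, lambda g: g), (other, lambda g: g)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, lambda g: g * other),    # d(out)/d(self) = other
                    (other, lambda g: g * self)])   # d(out)/d(other) = self

def grad(output, wrt):
    """Backprop from `output`; returns d(output)/d(wrt) as a new graph."""
    order, seen = [], set()            # reverse-topological schedule
    def topo(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                topo(p)
            order.append(n)
    topo(output)

    grads = {id(output): Var(1.0)}     # seed: d(output)/d(output) = 1
    for node in reversed(order):
        g = grads.get(id(node))
        if g is None:
            continue
        for parent, local_grad in node.parents:
            contrib = local_grad(g)    # this *extends* the graph with new Vars
            acc = grads.get(id(parent))
            grads[id(parent)] = contrib if acc is None else acc + contrib
    return grads.get(id(wrt), Var(0.0))

x = Var(3.0)
z = x * x                  # z = x^2
gx = grad(z, x)            # dz/dx = 2x -> 6.0, and gx is itself a graph
ggx = grad(gx, x)          # backprop of backprop: d2z/dx2 -> 2.0
print(gx.value, ggx.value)
```

A "backprop through graphs" framework would instead return bare numbers from `grad`, making the second `grad` call impossible — which is exactly the flexibility trade-off above.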
Double backprop

[Figure: x, y → F → z → … → L]

class F(FunctionNode):
    def forward(self, x, y):
        # x, y are NumPy/CuPy arrays
        return x * x + y

    def backward(self, x, y, gz):
        # x, y, gz are chainer.Variable -> creates a CG
        return 2 * gz * x, gz

Note: the interface is simplified from the actual implementation.
Double backprop

[Figure: forward graph x, y → F → z → … → L. Backprop starts from ∂L/∂L = 1.0; Grad F takes gz = ∂L/∂z and produces gx = ∂L/∂x, gy = ∂L/∂y. Inside Grad F: gx = 2 * gz * x (Mul nodes, *2), gy = gz]
Double backprop

[Figure: backprop from z itself with seed 1.0: Grad F computes gx = ∂z/∂x and gy = ∂z/∂y, and gx is itself an output of the extended graph]
Double backprop

[Figure: x, y → Mul → z. Backprop with seed 1.0 gives gx = ∂z/∂x; backprop again from gx with a fresh seed 1.0 (Double Grad F) gives ggx = ∂²z/∂x²]
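A quick numerical sanity check of the picture above, using central finite differences on the primer's F(x, y) = x*x + y (the point x = 3 is an illustrative choice):

```python
# Central finite differences approximating dz/dx and d2z/dx2 for
# z = F(x, y) = x * x + y at x = 3: expect 2x = 6 and 2.
f = lambda x, y: x * x + y
x, y, h = 3.0, 1.0, 1e-4

g  = (f(x + h, y) - f(x - h, y)) / (2 * h)               # ~ dz/dx
gg = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2    # ~ d2z/dx2
print(g, gg)
```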
Double backprop

Computes the derivative of L = G(f(x), ∇f(x)) with respect to x:
1. Forward: x → f → z.
2. First backprop: Grad f produces gx = ∇f(x) as part of the graph.
3. Both z and gx feed into … → L.
4. Backprop from L (seed 1.0) runs through Grad f again (Double Grad f, with gz from Grad f), yielding ggx = ∂L/∂x.
Example(Chainer)
http://bit.ly/2wpEzO5
Example(PyTorch)
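The PyTorch example shown in the talk is only linked from the slides; a minimal sketch of the same double-backprop idea, reusing the primer's z = x*x + y, might look like this (my illustration, not the talk's actual code):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(1.0, requires_grad=True)
z = x * x + y

# create_graph=True makes the first backprop build an extended graph ...
gx, = torch.autograd.grad(z, x, create_graph=True)   # dz/dx = 2x

# ... so we can backprop through it a second time
ggx, = torch.autograd.grad(gx, x)                    # d2z/dx2 = 2
print(gx.item(), ggx.item())   # 6.0 2.0
```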
Example(TensorFlow)
Conclusion
• Several DL frameworks are similar in their structure.
• Differences in design choices determine the capabilities of a framework.
• We introduced double backprop and toy examples in several frameworks.