Deep Learning Frameworks
Dell - Internal Use - Confidential2 of Y
Dell Customer Communication - Confidential
Deep Learning Frameworks
• TensorFlow: Google
• Keras: Google
• PyTorch: Facebook
• Caffe: Berkeley
• Theano (discontinued)
• MXNet: Apache (Amazon)
• CNTK: Microsoft
• Caffe2: Facebook
Deep Learning Framework Power Scores 2018
Power Score Weights
Trends in Deep Learning Frameworks
TensorFlow 2.0 Changes
Programming Model: TensorFlow
• Variables are 0-ary stateful nodes that output their current value.
• Placeholders are 0-ary nodes whose value is fed in at execution time.
• Mathematical operations:
  – MatMul: multiply two matrices.
  – Add: add elementwise (with broadcasting).
  – ReLU: apply the elementwise rectified linear function.
Programming Model: TensorFlow
import numpy as np
import tensorflow as tf

b = tf.Variable(tf.zeros((100,)))
W = tf.Variable(tf.random_uniform((784, 100), -1, 1))
x = tf.placeholder(tf.float32, (None, 784))
h_i = tf.nn.relu(tf.matmul(x, W) + b)

# ----- Initialize and run a session -----
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(h_i, {x: np.random.random((64, 784))})
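The same forward pass can be written in plain NumPy, independent of any framework, which makes the MatMul / Add / ReLU composition explicit (a sketch with the same shapes as the graph above):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(-1, 1, size=(784, 100))  # weight matrix, like the TF Variable
b = np.zeros(100)                        # bias vector
x = rng.random((64, 784))                # a batch of 64 inputs, like the placeholder

h = np.maximum(x @ W + b, 0)             # MatMul + Add + ReLU
print(h.shape)  # (64, 100)
```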
Keras: Making Neural Nets and TensorFlow Easy
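As an illustration of how Keras simplifies the low-level graph code, the same 784 → 100 ReLU layer takes a couple of lines (a sketch using the `tf.keras` API bundled with TensorFlow; shapes chosen to match the earlier example):

```python
import numpy as np
import tensorflow as tf

# One dense ReLU layer, equivalent to relu(matmul(x, W) + b) above.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(100, activation="relu", input_shape=(784,)),
])

h = model.predict(np.random.random((64, 784)))
print(h.shape)  # (64, 100)
```

Keras builds the variables, initialization, and execution plumbing itself, so no explicit session management is needed.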
Eager Execution
Standard Graph Mode vs. Eager Execution
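Conceptually, graph mode first builds a deferred computation and only evaluates it when the session runs, while eager execution evaluates each operation immediately. A minimal pure-Python analogy of the two styles (not actual TensorFlow API):

```python
# "Graph mode": describe the computation first, run it later.
graph = lambda x: x * 2 + 1    # nothing is computed yet
result_graph = graph(3)        # the deferred "run" step produces 7

# "Eager mode": every expression is evaluated the moment it is written.
x = 3
result_eager = x * 2 + 1       # 7, available immediately for inspection

print(result_graph, result_eager)
```

Eager execution makes debugging and interactive work easier; graph mode lets the framework optimize and distribute the whole computation before running it.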
PyTorch
Training Is the Main Challenge of AI
Fastest accelerator cards today: ~100 teraFLOPS (courtesy NVIDIA)
Training runs take days, weeks, or months.
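A rough back-of-envelope shows why training times reach days even on the fastest cards (the total-FLOP figure below is an assumed illustrative number, not from the slides):

```python
flops_needed = 1e20      # assumed total training cost in FLOPs (illustrative)
card_flops = 100e12      # ~100 teraFLOPS per accelerator card, at full utilization

seconds = flops_needed / card_flops
print(seconds / 86400)   # ~11.6 days on a single card
```

Real utilization is well below 100%, which pushes single-card training further toward weeks or months and motivates distributed training.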
Scaling with Distributed Training
Source: Google
Distributed Training: Data Parallelism
Source: Google
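In data parallelism every worker holds a full replica of the model, computes gradients on its own shard of the global batch, and the gradients are averaged before all replicas apply the same update. A toy NumPy sketch of one such step (hypothetical shapes, linear model with mean-squared-error loss):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4)                        # shared model weights

# Each of 4 "workers" gets its own shard of the global batch.
shards = [rng.standard_normal((8, 4)) for _ in range(4)]
targets = [rng.standard_normal(8) for _ in range(4)]

def local_gradient(w, x, y):
    # Gradient of mean squared error for a linear model y_hat = x @ w.
    return 2 * x.T @ (x @ w - y) / len(y)

grads = [local_gradient(w, x, y) for x, y in zip(shards, targets)]
avg_grad = np.mean(grads, axis=0)                 # the communication ("all-reduce") step
w -= 0.01 * avg_grad                              # every replica applies the same update
```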
Distributed Training: Model Parallelism
Source: Google
Model and data parallelism can be used together, but this is rarely done.
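In model parallelism the model itself is split across devices: for example, each device holds a slice of a layer's weight matrix and computes its part of the output, which is then gathered. A toy NumPy sketch of splitting one layer's output units across two "devices" (hypothetical shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((64, 784))            # one batch, seen by both devices
W = rng.uniform(-1, 1, (784, 100))   # full layer weights

# Split the 100 output units across two "devices".
W_dev0, W_dev1 = W[:, :50], W[:, 50:]
h0 = np.maximum(x @ W_dev0, 0)       # computed on device 0
h1 = np.maximum(x @ W_dev1, 0)       # computed on device 1

h = np.concatenate([h0, h1], axis=1) # gather the full activation
assert np.allclose(h, np.maximum(x @ W, 0))
```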
Data Parallelism in Practice
[Diagram: a master node coordinating the workers]
Uber Horovod
Benchmark: 32 servers with 4 Pascal GPUs each, connected by a RoCE-capable 25 Gbit/s network.
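Horovod averages gradients with a ring-allreduce instead of routing everything through a central master. The net effect is that every worker ends up holding the average of all workers' gradient tensors; a naive sketch of that end result (not Horovod's actual ring algorithm or API):

```python
import numpy as np

def allreduce_average(worker_grads):
    # Every worker receives the average of all workers' gradients —
    # the same result a ring-allreduce produces, computed naively here.
    avg = np.mean(worker_grads, axis=0)
    return [avg.copy() for _ in worker_grads]

grads = [np.full(3, float(rank)) for rank in range(4)]  # one tensor per rank 0..3
reduced = allreduce_average(grads)
print(reduced[0])  # [1.5 1.5 1.5]
```

The ring formulation keeps per-node bandwidth roughly constant as the number of workers grows, which is why it scales well on the 25 Gbit/s fabric above.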