Towards Better DL Frameworks
Yangqing Jia, Research Lead on AI Platforms, Facebook
Source: XKCD, [Girshick et al. CVPR 2014]
• Researchers: "I will need to reproduce the ResNet paper."
• Companies: "I need to apply DL to drive cars."
The Needs: Two sides of the same coin
• A grad student driven project
• Started by doing one job really well: image classification
• Adopted by industry participants
• Popular deep learning framework run by a non-profit.
Yet very minimal (10k LOC)
Democratizing Deep Learning w/ Caffe: Getting AlexNet running in 10 mins
http://caffe.berkeleyvision.org/
What makes a better DL library?
???
"MAPS"!!!
"MAPS" - Scalability
Scalability: Run fast, run far
“How do I train on multiple GPUs and machines?”
- Probably the most frequent question we got from Caffe users
[Diagram, built up over several slides: the step sequence L1 L2 L3 L3b L2b L1b together with U3 U2 U1 and R3 R2 R1, first as a single row, then as two parallel rows]
The Return of MPI: "I'm your father", said Allreduce.
Allreduce
Tree based - O(M log N)
Ring based - O(M)
etc.
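To make the ring-based variant concrete, here is a minimal single-process sketch that simulates N workers as plain Python lists. It is an illustration under stated assumptions, not the implementation used by MPI, NCCL, or any framework named in these slides. The function name `ring_allreduce` and the chunking scheme are hypothetical. Reading M as the buffer size and N as the worker count is also an assumption about the slide's notation.

```python
def ring_allreduce(buffers):
    """Sum equal-length lists held by N simulated workers, ring style.

    Returns one result list per worker; every list holds the element-wise sum.
    """
    n = len(buffers)
    size = len(buffers[0])
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [c * size // n for c in range(n + 1)]
    data = [list(b) for b in buffers]  # work on copies, leave inputs intact

    # Phase 1 (reduce-scatter): in step s, worker r passes chunk (r - s) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step look simultaneous.
        outgoing = []
        for r in range(n):
            c = (r - step) % n
            outgoing.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r in range(n):
            dst = (r + 1) % n
            c, chunk = outgoing[r]
            for i, v in enumerate(chunk):
                data[dst][bounds[c] + i] += v

    # Worker r now holds the fully reduced chunk (r + 1) % n.
    # Phase 2 (allgather): circulate the finished chunks around the ring.
    for step in range(n - 1):
        outgoing = []
        for r in range(n):
            c = (r + 1 - step) % n
            outgoing.append((c, data[r][bounds[c]:bounds[c + 1]]))
        for r in range(n):
            dst = (r + 1) % n
            c, chunk = outgoing[r]
            data[dst][bounds[c]:bounds[c + 1]] = chunk
    return data


workers = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(ring_allreduce(workers)[0])
```

In this sketch each worker sends 2(N - 1) chunks of roughly M/N elements, so the traffic per worker stays close to 2M no matter how many workers join. That is one way to read the O(M) figure. A tree that forwards the whole buffer at each of its log N levels gives the O(M log N) figure.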
Scalability: Sitting on top of giants
... and many more
"MAPS" - Portability
Portable System: Cloud, Mobile, IoT, Cars, Drones, Coffee makers
AI Math and Algorithms
Deployment Platforms
Model
auto predictor = caffe2::Predictor(model_file);
public class Predictor implements Caffe2ModelInterface { ... }
Portable System Challenges: Still, a lot of thought needed
• Limited computation
• Battery life is a thing
• Our models may be luxurious
• Ecosystem less developed
"MAPS" - Augmented Comp Patterns
Augmented Comp Patterns: Forget about float dense math, the world is bigger
• Quantized Computation
• Sparse Math Libraries
• Model Compression
• Rethinking Existing Operations
Quantized Computation: Forget about float, the world is bigger
float: 8 exponent bits, 23 mantissa bits
fp16: 5 exponent bits, 10 mantissa bits
fixed16: 16 bits
fixed8: 8 bits
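The 8/23 and 5/10 splits above are the exponent and mantissa widths of IEEE 754 single and half precision (each format also has one sign bit). As a small illustration, the sketch below uses Python's standard `struct` module to pull those fields out of a number. The helper names `fp32_fields` and `fp16_fields` are hypothetical and not part of any library in this talk.

```python
import struct


def fp32_fields(x):
    """Return (sign, exponent, mantissa) of x stored as IEEE 754 single precision."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF  # 1 + 8 + 23 bits


def fp16_fields(x):
    """Return (sign, exponent, mantissa) of x stored as IEEE 754 half precision."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF  # 1 + 5 + 10 bits


print(fp32_fields(-2.5))
print(fp16_fields(-2.5))
```

With only 10 mantissa bits, fp16 keeps about three decimal digits. This is one reason the later slides list model training under reduced precision as an open challenge.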
Why?
float add      0.9
fp16 add       0.4
fixed16 add    0.05
fixed8 add     0.03
float mul      4.0
fp16 mul       1.0
fixed8 mul     0.2
Source: Nvidia https://devblogs.nvidia.com/parallelforall/mixed-precision-programming-cuda-8/
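As a rough illustration of what moving from float to fixed8 involves, the sketch below maps a list of floats onto 8-bit integers with a single scale and offset, then maps them back. This is a minimal linear quantization scheme chosen for clarity. It is an assumption for illustration, not the scheme used by gemmlowp or by any framework in this talk, and the function names are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats onto integers in [0, 2**num_bits - 1] using one scale and offset."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 when all values are equal
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo


def dequantize(q, scale, lo):
    """Approximate the original floats from their quantized codes."""
    return [lo + scale * x for x in q]


weights = [-1.0, -0.2, 0.0, 0.37, 1.0]
codes, scale, lo = quantize(weights)
restored = dequantize(codes, scale, lo)
print(codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The round trip loses at most half a quantization step per value. Whether a model tolerates that error depends on the model, and this sketch does not address that question.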
Rethinking Existing Operations: ResNeXt is coming to town
[Figure: grouped convolution. AlexNet Group Conv uses two groups (gconv, gconv); ResNeXt uses many groups (g g g ...).]
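One way to see why grouped convolution deserves attention is to count its weights. The sketch below compares a dense convolution with grouped versions of the same layer. The helper name `conv_params` is hypothetical, and the channel counts and group sizes are illustrative values, not figures from the slides.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution split into independent groups (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group connects c_in / groups inputs to c_out / groups outputs.
    return groups * (c_in // groups) * (c_out // groups) * k * k


dense = conv_params(128, 128, 3)
for g in (1, 2, 32):
    p = conv_params(128, 128, 3, groups=g)
    print(f"groups={g:2d}  weights={p:6d}  ratio={dense // p}x fewer")
```

Weights and multiply-adds both shrink by the number of groups. A kernel tuned for one large dense matrix multiply may therefore be a poor fit for many small ones, which is the rethinking the slide title points at.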
Augmented Math Challenges: Forget about float, the world is bigger
• Solutions
  • Eigen fp16
  • CuDNN
  • NNPack
  • gemmlowp
• Challenges
  • Seamless conversion?
  • Model training?
  • Performance tuning?
  • ...
"MAPS" - Modularity
A Repeated Pattern
Many key components in deep learning are
reusable across frameworks.
In 2013 it used to be...
Caffe Torch Theano ...
Unix Philosophy?
Applications: Caffe, Torch, TF, MXNet, etc...
Core Math: Eigen, CuDNN, NNPack, THNN, MKL
Comms: NCCL, MPI, ZeroMQ, Redis, ...
Low Level: CUDA, OpenGL, OpenCL, Vulkan, ...
Compilers
Databases: LevelDB, RocksDB, Hadoop, Amazon S3, your old disk
or, "UnFramework"
MAPS for a good framework
Modular Designs: Interface to Existing Toolkits
Augmented Mathematics: Optimized Math Libraries
Portable System: Efficient Mobile Runtimes
Scalability: Tuned Collective Primitives
+ Flexible Framework Design
No Silver Bullet?
There is no silver bullet
Industry: Stability, Scale & speed, Data Integration, Relatively Fixed
Research: Flexible, Fast Iteration, Debuggable, Relatively bare-bone
Caffe, Torch, Theano, TensorFlow, D4J, etc.
“In open source, we feel strongly that to really do something well,
you have to get a lot of people involved.”
- Linus Torvalds
Thank you!