Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning M. Wang, T. Xiao, J....

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning

M. Wang, T. Xiao, J. Li, J. Zhang, C. Hong, & Z. Zhang (2014)

Presentation by Cameron Hamilton

Overview

• Problem: disparity between deep learning tools oriented towards productivity/generality (e.g. MATLAB) and task-specific tools designed for speed and scale (e.g. CUDA-Convnet).

• Solution: A matrix-based API, known as Minerva, with a MATLAB-like procedural coding style. Program is translated into an internal dataflow graph at runtime, which is generic enough to be implemented on different types of hardware.

Minerva System Overview

• Every training iteration has two phases– Generate dataflow graph from user code– Evaluate dataflow graph

Example of User Code

System Overview: Performance via Parallelism

• Performance of deep learning algorithms dependent on whether operations can be performed in parallel. Minerva utilizes two forms of parallelism:– Model parallelism: model replicas used to train the same

model• Replicas exchange updates via “logically centralized parameter

server” (p. 4).

– Data parallelism: model replicas assigned to different portions of the data sets

• Always evaluates on GPU if available

Programming Model

• Minerva API 3 stages for deep learning– Define model architecture • Model model;• Layer layer1 = model.AddLayer(dim);• model.AddConnection(layer1,layer2,FULL);

– Declaring primary matrices (i.e. weights & biases)• Matrix W = Matrix(layer2,layer1,RANDOM);• Matrix b(layer2,1,RANDOM);• Vector<Matrix> inputs = LoadBatches(layer1,…);

Programming Model

– Specifying training procedure

Convolutional neural networks (CNNs) are specified with a different syntax. The architecture is specified with a single line: AddConvConnect(layer1,layer2,…). Minerva then handles the arrangement of these layers (p.4).

Programming Model

• Expressing Parallelism– Model Parallelism

• SetPartition(layer1,2);SetPartition(layer2,2);

– Data Parallelism• ParameterSet pset;• pset.Add(“W”,W); pset.Add(“V”,V);• pset.Add(“b”,b); pset.Add(“c”,c);• RegisterToParameterServer(pset);• …//Learning Procedure Here• if(epoch % 3 == 0) PushToParameterServer(pset);• if(epoch % 6 == 0) PullFromParameterServer(pset);

EvalAll();

Putting it All Together

System Design: More on Parallelism

• Within a neural network, the operations that will occur at each computing vertex (i.e. forward propagation, backward propagation, weight update) are predefined. This allows for network training to be partitioned for theoretically any number of threads.– Updates shared between local parameter servers

• Load-balance by dividing task up amongst partitions• Coordination and Overhead by determining ownership of

computing vertex based on location of its input and output vertices. Partitions stick to their vertices.

• Locality by receiving input to vertex in layer n from n-1 and outputting layer n+1

Model Parallelism

Convolutional Networks

• Partitions handle patches of the input data, the patches are merged, then convolved with a kernel.

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning M. Wang, T. Xiao, J....

Documents

144116-10 Y. Zhang and E. J. Maginn J. Chem. Phys.144116-11 Y. Zhang and E. J. Maginn J. Chem. Phys. 136, 144116 (2012) becomes independent of the number of voids, then the pic-ture

Minerva Hernández

Analysis of the Transcriptome of Erigeron breviscapus ... · Citation: Jiang N-H, Zhang G-H, Zhang J-J, Shu L-P, Zhang W, et al. (2014) Analysis of the Transcriptome of Erigeron breviscapus

Opeoluwa (Luwa) Matthews, Meng Zhang, and Daniel J. Sorin

Apresentação Minerva

Levchenko, A. a., & Zhang, J. (2012)

36. Minerva

Minerva MNR 502 Mass Sets - Minerva Metrology & Calibration

W. J. Zhang, L. X. You , H. Li, J. Huang, C. L. Lv, L

Atenea minerva

Statistical Mechanics of Ion Channels: No Life Without Entropy A. Kamenev J. Zhang J. Zhang B. I. Shklovskii A. I. Larkin Department of Physics, U of Minnesota

Atena minerva

Xu and Zhang., Afr J Tradit Complement Altern Med., (2017

Revista Minerva

Perhexiline, a KLF14 activator, reduces atherosclerosis by ... · Minerva T. Garcia-Barrio5, Ji Zhang 1, Lixia Zeng6, Lei Li4,6, Subramaniam Pennathur6, Cristen J Willer1, Daniel

Socrates - Minerva

minerva gestión

J. J. Estrella Sanchez - Zhang San Feng. One Man One Destiny (Thesis)

Package ‘RAMpath’Zhiyong Zhang, Jack McArdle, Aki Hamagami, and Kevin Grimm Maintainer: Zhiyong Zhang References Boker, S. M., McArdle, J. J. & Neale,

Minerva Intelligence Inc. · 2020. 5. 14. · Minerva Intelligence (Canada) Ltd. (formerly Minerva Intelligence Inc.) (“Minerva anada”) was incorporated on May 17, 2017 pursuant