Introduction to Deep Learning
Quan Geng
Columbia University
October 25, 2019
Outline
● Background of myself
● Motivation: Success of Deep Learning
● Basics of Deep Learning
  ○ Neural networks: Neuron, activation function
  ○ Optimizers: (Stochastic) Gradient Descent
  ○ Backpropagation
  ○ Convolutional Neural Network
● Applications of Deep Learning
  ○ Personal Photo Search, Search Ranking, Smart Reply, ...
● Summary
Reference
● Geoffrey Hinton, Yoshua Bengio, and Yann LeCun, Tutorial on Deep Learning
● Jeff Dean, Trends and Developments in Deep Learning Research
● Jeff Dean, Large-Scale Deep Learning With TensorFlow
● Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning (textbook)
● Google, Machine Learning Crash Course
Background of myself
● 2013: PhD, ECE Dept., University of Illinois at Urbana-Champaign
● 2014 - 2015: Quantitative Analyst, Tower Research, New York
● 2015 - now: Senior Software Engineer, Google Research, New York
● Homepage: https://dreaven.github.io/
Machine Learning Jobs
https://www.quanwei.tech/?job=machine+learning
Motivation: Success of Deep Learning
2018 ACM Turing Award
https://awards.acm.org/about/2018-turing
ACM named Yoshua Bengio, Geoffrey Hinton, and Yann LeCun recipients of the 2018 ACM A.M. Turing Award for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
In recent years, deep learning methods have been responsible for astonishing breakthroughs in computer vision, speech recognition, natural language processing, and robotics.
First major success of Neural Networks: AlexNet
ImageNet
● 15M labeled high-resolution images
● 22,000 categories

ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
● A subset of ImageNet
● 1,000 images in each of 1,000 categories
● 1.2M training images
● 50K validation images
● 150K testing images
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012
Neural networks dominate ImageNet competitions
Trend of Deep Learning papers
Evolution of the number of papers published on Deep Learning topics, relative to those on Deep Learning in Bioinformatics (source link).
Deep Learning for High Frequency Trading
http://www.hudson-trading.com/careers/job/?gh_jid=940856
Basics of Deep Learning
https://cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
Neuron in Human Brain
https://training.seer.cancer.gov/anatomy/nervous/tissue.html
Artificial Neuron in Deep Learning
Activation functions introduce non-linearity into the model. Commonly used activation functions:
● Sigmoid function
● Rectified linear unit (ReLU)
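Both activation functions can be written in a few lines of plain Python (the input values below are just examples):

```python
import math

def sigmoid(x):
    # Squashes any real input into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Passes positive inputs through unchanged; clamps negatives to zero.
    return max(0.0, x)

print(sigmoid(0.0))  # 0.5
print(relu(-3.0))    # 0.0
print(relu(2.5))     # 2.5
```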
Feed-Forward Neural Networks (FFNN)
Input layer → Hidden layers → Output layer
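As a sketch, a forward pass through a feed-forward network is just repeated weighted sums followed by activations; the weights and biases below are arbitrary illustrative values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of the inputs plus a bias, then activation.
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny 2-3-1 network: 2 inputs, one hidden layer of 3 neurons, 1 output.
hidden = layer([0.5, -1.0],
               weights=[[0.1, 0.4], [-0.2, 0.3], [0.5, -0.5]],
               biases=[0.0, 0.1, -0.1])
output = layer(hidden, weights=[[0.3, -0.7, 0.2]], biases=[0.05])
print(output)  # a single value in (0, 1)
```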
Neural network training: (Stochastic) Gradient Descent
1. Randomly initialize the weights in the neural network.
2. Given a batch of (or all) input data, compute the predicted output.
3. Compute the loss between the actual output and the predicted output.
4. Compute the gradient for each weight in the neural network.
5. Update the weights based on the gradient.
Repeat from step 2 until convergence.
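The steps above can be sketched in plain Python by fitting a one-weight linear model y = w·x with gradient descent; the data, learning rate, and iteration count are illustrative choices:

```python
# Fit y = w * x to data with squared loss, following steps 1-5 above.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relation: y = 2x

w = 0.0                      # step 1: initialization (deterministic here)
lr = 0.05                    # learning rate
for epoch in range(200):     # repeat until (approximate) convergence
    grad = 0.0
    for x, y in data:
        pred = w * x         # step 2: predicted output
        err = pred - y       # step 3: loss is err ** 2
        grad += 2 * err * x  # step 4: d(loss)/dw, via the chain rule
    w -= lr * grad / len(data)  # step 5: update using the average gradient
print(round(w, 3))  # 2.0
```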
Neural network training: Backpropagation
4. Compute the gradient for each weight in the neural network.
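For a single sigmoid neuron with squared loss, backpropagation amounts to applying the chain rule from the loss back to each weight; the input, target, and initial weights below are arbitrary example values:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One sigmoid neuron, squared loss: L = (sigmoid(w*x + b) - y) ** 2
x, y = 1.5, 1.0
w, b = 0.2, 0.0

# Forward pass: store the intermediate values the backward pass needs.
z = w * x + b
a = sigmoid(z)
loss = (a - y) ** 2

# Backward pass: apply the chain rule, stage by stage.
dL_da = 2 * (a - y)   # d(loss)/d(activation)
da_dz = a * (1 - a)   # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
dL_dz = dL_da * da_dz
dL_dw = dL_dz * x     # dz/dw = x
dL_db = dL_dz * 1.0   # dz/db = 1
```

A quick sanity check is to compare `dL_dw` against a finite-difference estimate of the same derivative; the two should agree closely.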
Neural network training: Optimizers for Gradient descent
https://towardsdatascience.com/types-of-optimization-algorithms-used-in-neural-networks-and-ways-to-optimize-gradient-95ae5d39529f
Properties for Neural network training
Properties of Gradient Descent algorithms
● Does not guarantee convergence to the global minimum
● Different random initializations converge to different local minima
● But performs very well in practice

Techniques to avoid overfitting
● Weight regularization
● Dropout
● Early stopping
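As a minimal illustration of weight regularization, L2 regularization adds a penalty proportional to the squared weights to the training loss; the weights, data loss, and strength `lam` below are made-up example values:

```python
# L2 weight regularization: penalize large weights to discourage overfitting.
def l2_penalty(weights, lam=0.01):
    return lam * sum(w * w for w in weights)

weights = [0.5, -1.2, 3.0]
data_loss = 0.8  # example data-fit loss
total_loss = data_loss + l2_penalty(weights)
# The gradient of the penalty w.r.t. each weight is 2 * lam * w,
# so each update also shrinks the weights toward zero.
```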
Convolutional Neural Network
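A minimal sketch of the core operation in a CNN: a 2D "valid" convolution (implemented, as in most deep learning libraries, as cross-correlation) that slides a small filter over an image and takes dot products. The image and filter values are toy examples:

```python
def conv2d(image, kernel):
    # "Valid" convolution: the filter stays entirely inside the image.
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
edge_kernel = [[1, -1]]  # simple horizontal-difference filter
print(conv2d(image, edge_kernel))  # [[-1, -1], [-1, -1], [-1, -1]]
```

In a real CNN the filter values are learned by gradient descent rather than hand-picked, and many filters are applied in parallel to produce multiple feature maps.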
Applications of Deep Learning
Deep Learning Frameworks
https://medium.com/@NirantK/the-silent-rise-of-pytorch-ecosystem-693e74b33f1e
TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. Released by Google Brain in 2015.
Summary
● Motivation: Success of Deep Learning
● Basics of Deep Learning (DL)
  ○ Neural networks (NN): Neuron, activation function
  ○ Optimizers: (Stochastic) Gradient Descent
  ○ Backpropagation
  ○ Convolutional Neural Network
● Applications of Deep Learning
  ○ TensorFlow
  ○ Personal Photo Search, Search Ranking, Smart Reply, and more
● Advanced topics (not covered in this lecture)
  ○ Recurrent neural networks
  ○ Sequence models
  ○ word2vec (text embeddings)
  ○ Advanced optimizers
  ○ Autoencoders
  ○ Generative adversarial networks
Thank you!