DEEP MACHINE LEARNING “A Shallow Introduction”
IAT 813, Instructor Steve DiPaola
Guest Lecturer: Graeme McCaig
March 12, 2015

Page 1:

DEEP MACHINE LEARNING “A Shallow Introduction”

IAT 813, Instructor Steve DiPaola Guest Lecturer: Graeme McCaig

March 12, 2015

Presenter notes:
Welcome to the Slide Presentation and Question/Answer session for my PhD Comprehensive Exam. Thanks to you all for making the time to attend.
Page 2:

• Deep Learning (DL) is a complex topic

• Authors often employ heavy statistics and machine-learning terminology

• This lecture: an overview of the field that de-mystifies key terms and concepts
• I hope to save you time and struggle in getting started if you pursue DL in your work
• Topics not covered much: Recurrent Nets, Autoencoders


Hazards of the Deep

Page 3:

OVERVIEW

1. Deep learning – believe the hype?
   • DL in the news
   • “Depth” definition and benefits
2. What has changed? Is this just NNets?
   • DL recent history timeline
3. Types of Deep Learning network and training
   • Restricted Boltzmann Machines & Deep Belief Networks
   • Convolutional Networks
4. Practical advice
   • Useful libraries; GPU
   • Further reading


TERMS & CONCEPTS:
• Minibatch
• Probabilistic/Stochastic
• Undirected, Energy-based
• Pre-training, Fine-tuning
• Convolution
• Dropout, ReLU

Page 4:

DEEP LEARNING IN THE NEWS


Page 5:

DEEP LEARNING IN THE NEWS

• Visual Object ___
  • Recognition
  • Detection
  • Captioning


Page 6:

• Object recognition task

• Recent state-of-the-art results

• He et al. (2015) Microsoft Research (arXiv preprint)

http://arxiv-web3.library.cornell.edu/pdf/1502.01852v1.pdf


Page 7:


http://googleresearch.blogspot.ca/2013/06/improving-photo-search-step-across.html

Page 8:


http://cs.stanford.edu/people/karpathy/deepimagesent/ Andrej Karpathy, Li Fei-Fei (2014) Stanford

Page 10:


Andrej Karpathy, Li Fei-Fei (2014) Stanford http://cs.stanford.edu/people/karpathy/deepimagesent/

GoogLeNet Detection Model (2014) http://googleresearch.blogspot.ca/2014/09/building-deeper-understanding-of-images.html

Page 11:

DEEP LEARNING IN THE NEWS

• Applications
  • Self-driving cars
  • Biomedical imaging
  • Predicting DNA disease mapping
  • Drug discovery / virtual screening
  • Smartphone Apps


Page 12:


NVIDIA Drive PX http://www.nvidia.ca/object/drive-px.html

Page 13:


Cireşan et al. (2013). Mitosis detection in breast cancer histology images with deep neural networks. In Medical Image Computing and Computer-Assisted Intervention.

Page 14:


Scyfer (U Amsterdam spinoff) http://scyfer.nl/case-3d-mri-brain-scan-analysis/

Page 15:


“Beautiful Me” App http://btfl.me/

Page 16:

DEEP LEARNING IN THE NEWS

• Audio applications
  • Music recommendation
  • Speech recognition


Page 17:


Recommending music on Spotify with Deep Learning – Sander Dieleman Blog Post (2014) http://benanne.github.io/2014/08/05/spotify-cnns.html

Page 18:


Baidu Research (2015) http://usa.baidu.com/deep-speech-accurate-speech-recognition-with-gpu-accelerated-deep-learning/

Page 19:

DEEP LEARNING IN THE NEWS

• Game-playing AI
  • Deep Reinforcement Learning


Page 20:


Two-dimensional t-SNE embedding of the representations in the last hidden layer assigned by DQN to game states experienced while playing Space Invaders.

Learning to play Atari 2600 games with Deep Reinforcement Learning – Mnih et al. (2015), “Human-level control through deep reinforcement learning,” Nature, doi:10.1038/nature14236

A visualization of the learned value function on the game Breakout.

Page 21:

DEEP LEARNING IN THE NEWS

• GPU Enabling Technology
  • Mass-market hardware
  • CUDA libraries


Page 24:

“DEPTH” DEFINITION AND BENEFITS


Page 25:

WHAT IS “DEPTH”?

• It's deep if it has more than one stage of non-linear feature transformation (LeCun & Ranzato 2013)


Figures from Bengio (2009)

Deep Feedforward Neural Net

Page 26:


Slide from LeCun & Ranzato (2013)

Page 27:


Slide from LeCun & Ranzato (2013)

Page 28:

BENEFITS OF DEPTH

• Replaces feature engineering “by hand”
• More compact (fewer nodes than an equivalent shallow net)
• Theoretical arguments suggest improved training and generalization (Bengio et al., various papers)
• Appears to be how the brain works
• …Because it works (now giving state-of-the-art results on many tasks)


Page 29:


Visualization of nearest-neighbors in top network layer code [Krizhevsky et al 2012]

Semantic class separation, visualized with t-SNE [Donahue et al 2014]

Page 30:

WHAT HAS CHANGED?

• Is Deep Learning anything different from previous Neural Nets research?

• In fact, both “yes” and “no”
• And trends have flip-flopped in the short period from 2006 to the present


Page 31:

New Concepts

• Build a better representation via Unsupervised Learning
  • Can then transfer to Supervised tasks
  • Leverage massive amounts of unlabelled data
• Probabilistic, generative network types
  • Restricted Boltzmann Machine
• New training algorithms
  • Greedy, layer-wise pre-training
  • Stochastic sampling-based estimation

OR

More of the Same; Minor Tweaks

• More computing power
  • GPU
  • Cloud, cluster
• Big data
  • Crowd-sourced labels
• “Good old” feed-forward multi-layer NNs
  • Supervised learning
  • Backpropagation (SGD)
• New (and old*) tricks
  • Convolution*, dropout, rectified linear units…

Page 32:

• Hinton’s perspective on Deep Learning circa 2006–2010

• Geoff Hinton, “The Next Generation of Neural Networks”, GoogleTechTalks, 2007: https://www.youtube.com/watch?v=AyzOUbkUf3M

• See, e.g., 4:20 in the video


Page 33:

DEEP LEARNING HISTORY
Excerpts from Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

• 1969: a book (Minsky & Papert, 1969) on the limitations of simple linear perceptrons with a single layer discouraged some researchers from further studying NNs.
• 1979: the Neocognitron (Fukushima, 1979, 1980, 2013a) was perhaps the first artificial NN that deserved the attribute deep, and the first to incorporate […] neurophysiological insights.
• 1986: a paper significantly contributed to the popularization of BP for NNs (Rumelhart, Hinton, & Williams, 1986), experimentally demonstrating the emergence of useful internal representations.
• 1989: backpropagation was applied (LeCun et al., 1989; LeCun, Boser, et al., 1990; LeCun, Bottou, Bengio, & Haffner, 1998) to Neocognitron-like, weight-sharing, convolutional neural layers with adaptive connections.
• 1991: by the late 1980s, experiments had indicated that traditional deep feedforward or recurrent networks are hard to train by backpropagation (BP). Hochreiter’s (1991) thesis formally identified a major reason: typical deep NNs suffer from the now-famous problem of vanishing or exploding gradients.


Page 34:

DEEP LEARNING HISTORY
Excerpts from Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

• ~1995–2005: In the decade around 2000, many practical and commercial pattern recognition applications were dominated by non-neural machine learning methods such as Support Vector Machines (SVMs) (Schölkopf et al., 1998; Vapnik, 1995).
• 2006: While learning networks with numerous non-linear layers date back at least to 1965, and explicit DL research results have been published at least since 1991, the expression Deep Learning was actually coined around 2006, when unsupervised pre-training of deep FNNs helped to accelerate subsequent SL through BP (Hinton, Osindero, & Teh, 2006; Hinton & Salakhutdinov, 2006). A DBN fine-tuned by BP achieved a 1.2% error rate (Hinton & Salakhutdinov, 2006) on the MNIST handwritten digits. This result helped to arouse interest in DBNs. DBNs also achieved good results on phoneme recognition, with an error rate of 26.7% on the TIMIT core test set (Mohamed & Hinton, 2010).


Page 35:

DEEP LEARNING HISTORY
Excerpts from Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

• 2012: an ensemble of (supervised) GPU-based Max-Pooling Convolutional Neural Nets achieved the best results on the ImageNet classification benchmark (Krizhevsky, Sutskever, & Hinton, 2012), which is popular in the computer vision community.
• Also in 2012: the biggest NN so far (10^9 free parameters) was trained in unsupervised mode on unlabeled data (Le et al., 2012), then applied to ImageNet. The codes across its top layer were used to train a simple supervised classifier, which achieved the best results so far on 20,000 classes. Instead of relying on efficient GPU programming, this was done by brute force on 1,000 standard machines with 16,000 cores.


Page 36:

DEEP LEARNING HISTORY
Excerpts from Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

• ~2015 (present day): Most competition-winning or benchmark record-setting Deep Learners actually use one of two supervised techniques: (a) recurrent Long Short-Term Memory (LSTM) (1997) trained by Connectionist Temporal Classification (CTC) (2006), or (b) feedforward GPU-based Max-Pooling Convolutional Neural Nets (2011) based on CNNs (1979) plus MP (1992), trained through Backpropagation (1989–2007).


Page 37:

• Y. LeCun in IEEE Spectrum interview: “A lot of us involved in the resurgence of Deep Learning in the mid-2000s, including Geoff Hinton, Yoshua Bengio, and myself—the so-called “Deep Learning conspiracy”—as well as Andrew Ng, started with the idea of using unsupervised learning more than supervised learning. Unsupervised learning could help “pre-train” very deep networks. We had quite a bit of success with this, but in the end, what ended up actually working in practice was good old supervised learning, but combined with convolutional nets, which we had over 20 years ago. But from a research point of view, what we’ve been interested in is how to do unsupervised learning properly. We now have unsupervised techniques that actually work. The problem is that you can beat them by just collecting more data, and then using supervised learning. This is why in industry, the applications of Deep Learning are currently all supervised. But it won’t be that way in the future.”


http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/facebook-ai-director-yann-lecun-on-deep-learning

Page 38:


[Figure: models arranged along a Shallow → Deep spectrum]

Adapted from LeCun & Ranzato (2013)

Page 39:

Page 40:

Page 41:

Page 42:

RESTRICTED BOLTZMANN MACHINE (RBM)

• Unsupervised, Probabilistic, Energy-based model
• Shallow building block for Deep Belief Network (DBN) and Deep Boltzmann Machine (DBM)

[Figure: Restricted Boltzmann Machine — hidden layer (“h”) above the visible data layer (“v”)]

Page 43:

OBJECTIVE FUNCTION FOR UNSUPERVISED LEARNING

• For supervised learning, minimize Training Error (the difference between the model’s P(y|x) and the true (y, x) data)

• The equivalent for an unsupervised, generative model? Maximize the likelihood of the train/test sets under the model, i.e. the model’s P(x) where x is the training data

[Figure: Training Distribution vs. Learned Model, from http://imonad.com/rbm/restricted-boltzmann-machine/]
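For concreteness, this maximum-likelihood objective leads to a well-known gradient for RBM weights (a standard result, stated here in common RBM notation rather than taken from these slides):

```latex
% Gradient of the log-likelihood of a visible vector v with respect to
% a weight w_{ij} in an energy-based model such as the RBM: raise the
% probability of configurations seen in the data, lower the probability
% of configurations the free-running model prefers.
\frac{\partial \log P(v)}{\partial w_{ij}}
  = \langle v_i h_j \rangle_{\text{data}}
  - \langle v_i h_j \rangle_{\text{model}}
```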

Page 44:

RESTRICTED BOLTZMANN MACHINE (RBM)

• Energy of the network (the standard RBM form, with visible biases a, hidden biases b, and weights W): $E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j$
• Likelihood of a datapoint P(v) is hard to find directly: the probability is only known relative to all possible states of the net! (The sum over all states is called the Partition Function, Z.)
• Block Gibbs Sampling: a “back and forth” technique
  • Use it to find a hidden-layer “representation” for a known visible vector (inference)
  • Use it to generate a sample from the model’s probability distribution

Binary, probabilistic neurons (on/off). Propagation of activation: $P(h_j = 1 \mid v) = \sigma\big(b_j + \sum_i v_i w_{ij}\big)$, with $\sigma(x) = 1/(1+e^{-x})$ (see the NumPy sketch below).
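A minimal NumPy sketch of these conditionals and the “back and forth” sampler (my own illustrative code, not the course’s Matlab; all names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b_h):
    # P(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v[i] * W[i, j])
    p_h = sigmoid(v @ W + b_h)
    return (rng.random(p_h.shape) < p_h).astype(float)

def sample_v_given_h(h, W, b_v):
    # By symmetry: P(v_i = 1 | h) = sigmoid(b_v[i] + sum_j W[i, j] * h[j])
    p_v = sigmoid(h @ W.T + b_v)
    return (rng.random(p_v.shape) < p_v).astype(float)

def block_gibbs(v, W, b_v, b_h, steps=100):
    # "Back and forth": alternately resample the whole hidden layer given
    # the visible layer, then the whole visible layer given the hidden one.
    for _ in range(steps):
        h = sample_h_given_v(v, W, b_h)
        v = sample_v_given_h(h, W, b_v)
    return v, h
```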

Page 45:

Contrastive Divergence learning

• Uses 1 or a few passes of Gibbs Sampling
• Updates are done on Minibatches (e.g. 100 to 1k input vectors at once)
  • Good for convergence
  • Efficient on GPU (matrix multiply)

(A minimal CD-1 sketch follows the figure links below.)

Stochastic Gradient Descent (pink) vs. Batch Gradient Descent (red) [http://www.holehouse.org/mlclass/17_Large_Scale_Machine_Learning.html]

http://journal.frontiersin.org/article/10.3389/fnins.2013.00272/full
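Below is a sketch of one CD-1 minibatch update in NumPy (an illustrative reconstruction of the standard algorithm, using the common mean-field reconstruction shortcut, not the course’s Matlab code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One Contrastive Divergence (CD-1) update on a minibatch v0 (batch x n_visible)."""
    # Positive phase: hidden probabilities driven by the data, then a binary sample.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Single Gibbs "back and forth": reconstruct v (mean-field), re-infer h.
    v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Gradient estimate: <v h>_data - <v h>_reconstruction, averaged over the batch.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Usage on a random stand-in minibatch of 100 binary vectors:
W = 0.01 * rng.standard_normal((784, 256))
b_v, b_h = np.zeros(784), np.zeros(256)
v0 = (rng.random((100, 784)) < 0.5).astype(float)
cd1_update(v0, W, b_v, b_h)
```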

Page 46:


Contrastive Divergence for DBNs (Bengio et al. 2009)

Page 47:

STACKING RBMS TO FORM A DEEP BELIEF NET (DBN)

• Now comes the “deep” part…
• Greedy, Layer-wise, Unsupervised learning
  • Hold the lower-layer weights constant and “stack” a new RBM on top of the net, then train it using Contrastive Divergence again (see the sketch below)

http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepBeliefNetworks

Lower layers now operate like a straight feedforward (or feedback) net!
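A minimal sketch of this greedy layer-wise loop (illustrative NumPy only; the layer sizes and stand-in data are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, batch=100, lr=0.1):
    """Train one RBM with CD-1 and return its parameters (W, b_v, b_h)."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            v0 = data[i:i + batch]
            p_h0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
            v1 = sigmoid(h0 @ W.T + b_v)          # mean-field reconstruction
            p_h1 = sigmoid(v1 @ W + b_h)
            W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

# Greedy, layer-wise stacking: train a layer, freeze it, then feed its hidden
# activations upward as the "data" for the next RBM.
data = (rng.random((1000, 784)) < 0.5).astype(float)  # random stand-in inputs
layers, x = [], data
for n_hidden in [500, 500, 2000]:                     # made-up layer sizes
    W, b_v, b_h = train_rbm(x, n_hidden)
    layers.append((W, b_h))
    x = sigmoid(x @ W + b_h)                          # deterministic up-pass
```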

Page 48:


Greedy, layer-wise stacking for DBNs (Bengio et al. 2009)

Page 49:

• To use a DBN for classification, supply class label data along with bottom-up node data when training the top layer

• Another method is to add a logistic regression layer (or another classifier) on top and train it from the top-layer representation of the data (a minimal sketch follows the link below)


http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DeepVsShallowComparisonICML2007
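A sketch of the second method, with scikit-learn’s LogisticRegression standing in for the top classifier and random stand-ins for the top-layer codes:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins: top-layer DBN representations for 1000 examples, plus class labels.
top_codes = rng.random((1000, 500))
labels = rng.integers(0, 10, size=1000)

# Train only the classifier; the DBN layers below stay frozen.
clf = LogisticRegression(max_iter=1000).fit(top_codes, labels)
print(clf.score(top_codes, labels))  # training accuracy
```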

Page 50:

RBM/DBN – MISC. IMPORTANT CONCEPTS

• “Fine-tuning” techniques adjust the whole network’s parameters simultaneously
  • For supervised learning, can use Backpropagation!
  • Unsupervised fine-tuning algorithms also exist…
• “Mean field” algorithms propagate real-number probability values as activations instead of stochastically sampling binary values

Page 51:

DEEP BOLTZMANN MACHINES (DBM)

• Unlike the DBN, the DBM retains true bidirectional connections at all layers, even once stacked
• Potentially better for generative use
• More complicated to train
• Slower to run

From Salakhutdinov & Hinton (2009 AISTATS)

Page 52:

NON-BINARY NODES FOR RBM

• Useful for e.g. image data
• Gaussian–Bernoulli nodes: handle real values at the input layer, binary values at the hidden layer
• Spike-and-Slab nodes (Courville et al. 2011, AISTATS)

(from Hinton 2012)
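A sketch of how the Gaussian–Bernoulli conditionals differ from the binary–binary case (assuming, as is common, that inputs are standardized so the Gaussian variance can be fixed at 1; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b_h):
    # Hidden units stay binary, exactly as in the binary-binary RBM.
    p_h = sigmoid(v @ W + b_h)
    return (rng.random(p_h.shape) < p_h).astype(float)

def sample_v_given_h(h, W, b_v):
    # Visible units are now real-valued: a Gaussian centred on the
    # top-down input, with unit variance (inputs assumed standardized).
    mean = h @ W.T + b_v
    return mean + rng.standard_normal(mean.shape)
```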

Page 53:

GENERATING SAMPLES FROM THE MODEL


Digit images generated from a DBN with digit labels clamped (per row) [Hinton et al 2006]

Page 54:

TESTING GENERATIVE (IMAGE) DL WITH DATA VISUALIZATION METHODS (MOSTLY QUALITATIVE)

• Display generated samples in the paper; very common
• Look for novel re-combination of factors
• Generate image completions

[Figures: Shape Boltzmann Machine (Eslami et al. 2012); HD-DBM (Salakhutdinov et al. 2013); TssRBM, TGaussRBM (Luo et al. 2012)]

Page 55:

(Courville et al 2013)

Spike-and-Slab RBM Samples

Nearest Pixel-wise Training Samples


Page 56:

IMPLEMENTING RBM, DBN IN CODE

• Matlab code example (Salakhutdinov DBM)
• In essence, a lot of “back and forth” propagation for sampling
• Handled well on a GPU: matrix multiplication (a whole mini-batch at a time)

Let’s look at Matlab code from http://www.utstat.toronto.edu/~rsalakhu/DBM.html ...

Page 57:

RBM Training (1/2)

Page 58:

RBM Training (2/2)

Page 59:


Page 60:

CONVOLUTIONAL NETWORKS (CNN)

• Most commonly used for image processing (also audio, etc.)
• Receptive fields & tied weights create “Feature Maps”, similar to a convolution kernel in image processing

Tiled CNNs (Le et al. NIPS 2010)

http://www.deeplearning.net/tutorial/lenet.html

Page 61:


Technique for Visualizing CNN Features (Zeiler & Fergus, ECCV 2014)

Page 62:

CNN STRUCTURE

• local receptive fields - nodes which only connect to a limited subset of lower-layer nodes, based on topology

• shared weights - separate connections in the neural net which are constrained to have the same weight

• sub-sampling - combining spatially adjacent low-level features into one high-level feature

A convolution layer implements receptive fields and shared weights. It is divided into multiple “feature maps”, which can be visualized as 2d grids (“planes”). Each node in a given grid learns the same set of weights, but is connected to a different spatial patch (receptive field) in its input layer. This is conceptually equivalent to a filter kernel, or feature detector, being swept over each location of (i.e. convolved with) the input. A pooling layer performs subsampling by having planes of half the length and width of the lower layer, in which each node integrates inputs from a 2x2 receptive field.

LeCun et al. (1998)
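A minimal NumPy illustration of these two layer types (an explicit-loop sketch for clarity, not an optimized implementation; the function names are mine):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """One feature map: sweep a single shared-weight kernel over the image."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Same weights at every spatial location = tied/shared weights.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool_2x2(fmap):
    """Subsampling: each output node summarizes a 2x2 patch (max pooling
    shown here; classic LeNet-style layers averaged instead)."""
    H, W = fmap.shape
    fmap = fmap[:H - H % 2, :W - W % 2]           # trim odd edges
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))
fmap = conv2d_valid(image, np.ones((5, 5)) / 25)  # 24x24 feature map
print(pool_2x2(fmap).shape)                       # (12, 12)
```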

Page 63:


Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton (2012) ImageNet Classification with Deep Convolutional Neural Networks, NIPS.

Page 64:

CONVOLUTIONAL NETS FOR IMAGE RECOGNITION

• “Supervised training using stochastic gradient descent and the backpropagation algorithm (just repeated application of the chain rule)” from Krizhevsky et al. (2012) LSVRC slides

• New tricks: Dropout, ReLUs, Data augmentation


Page 65:


Slide from LeCun & Ranzato (2013)

Page 66:

ImageNet classification examples from [Krizhevsky et al 2012]. “Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a "synonym set" or "synset". There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+). In ImageNet, we aim to provide on average 1000 images to illustrate each synset. Images of each concept are quality-controlled and human-annotated.” [http://www.image-net.org/about-overview]


Page 67:

RECTIFIED LINEAR UNITS (RELU)


from Krizhevsky et al. (2012) LSVRC slides
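The slide body here is an image from the Krizhevsky et al. slides; the unit itself is simply f(x) = max(0, x), which they report trains deep nets several times faster than tanh units. A one-line sketch:

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero for negative inputs, identity for positive."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # -> 0, 0, 0, 1.5
```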

Page 68:

DROPOUT

from Krizhevsky et al. (2012) LSVRC slides

• Reduces over-fitting due to co-adaptation of units
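A sketch of dropout applied to a layer’s activations (the “inverted” scaling shown here is one common formulation; Krizhevsky et al. instead halve the activations at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, train=True):
    """Randomly silence units during training so they cannot co-adapt.
    Rescaling by 1/(1 - p_drop) keeps the expected activation unchanged."""
    if not train:
        return activations
    keep = rng.random(activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)

h = rng.random((4, 8))   # a stand-in hidden-layer minibatch
print(dropout(h))        # about half the entries zeroed, the rest scaled x2
```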

Page 69:

DL LIBRARIES

• THEANO
  • Python
  • Includes RBM, DBN types; can experiment with custom algorithms
  • Helpful tutorials
  • Emphasis on automatic symbolic differentiation can be confusing
• PYLEARN2: built on Theano, offers some newer models (DBM, S3C); also complicated
• CUDA-CONVNET
  • Base library for high GPU performance
  • Feedforward (convolutional) only
  • C++
  • I have not tried it
  • Apparently can be wrapped inside Theano
• Matlab code, e.g. http://www.cs.toronto.edu/~rsalakhu/DBM.html
  • See also https://github.com/rasmusbergpalm/DeepLearnToolbox (I have not tried it)
• CAFFE
  • Feedforward (convolutional) only
  • C++
  • Optional Python wrapper: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb
• NVIDIA CuDNN
  • Fast primitive operations (e.g. propagating activation through a sigmoid)
  • Incorporated into Caffe
• TORCH7
  • Apparently like Theano but with Lua

Page 70:

FURTHER READING

• My favorite comprehensive review papers are:

• Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.
• Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

• Suggested readings at http://deeplearning.net/reading-list/
• Google Group on Deep Learning for current events
• Hinton, LeCun, and Schmidhuber have done recent Reddit AMAs
• Yoshua Bengio’s web site at U. Montreal is good reading
• The FastML blog can be interesting: www.fastml.com
• The ICML and NIPS (also AISTATS) conferences carry many of the top papers


Page 71:

THE END – CLOSING THOUGHTS

• It’s fun to observe from the midst of a purported “revolution”
• Relevant to SIAT work?
  • As consumers
  • As researchers

• Thanks
• QUESTIONS WELCOME
  • You can also e-mail me with questions (or grab a coffee)
