Midterm Review (CS231n, May 4th, 2018)
Fei-Fei Li & Justin Johnson & Serena Yeung
cs231n.stanford.edu/slides/2018/cs231n_2018_midterm_review.pdf


Page 1

Midterm Review

Page 2

50 minutes is short!

This is just to help you get going with your studies.


Page 3

Overview of today’s session

Summary of Course Material:

● How we “power” neural networks:
  ○ Loss function
  ○ Optimization
● How we build complex network models:
  ○ Nonlinear Activations
  ○ Convolutional Layers
● How we “rein in” complexity:
  ○ Regularization

Practice Midterm Problems
Q&A, time permitting


Page 5

Lecture 3: Loss Functions and Optimization

Page 6

An optimization problem

At the end of the day, we want to train a model that performs a desired task well, and a proxy for achieving this is minimizing a loss function.

Page 7

SVM/Softmax Loss

- We have some dataset of (x, y)
- We have a score function, e.g. s = f(x; W) = Wx
- We have a loss function:

  Softmax: L_i = -log( e^{s_{y_i}} / Σ_j e^{s_j} )

  SVM: L_i = Σ_{j ≠ y_i} max(0, s_j - s_{y_i} + 1)

  Full loss: L = (1/N) Σ_i L_i + R(W)

Page 8

Know how to derive the SVM and Softmax gradients!
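To make that concrete, here is a minimal NumPy sketch (illustrative function names, not the assignment's code) of both per-example losses and their gradients with respect to the scores:

```python
import numpy as np

def softmax_loss_grad(s, y):
    """Cross-entropy loss and gradient for one example's score vector s."""
    s = s - np.max(s)                 # shift for numerical stability of exp()
    p = np.exp(s) / np.sum(np.exp(s))
    loss = -np.log(p[y])
    grad = p.copy()
    grad[y] -= 1.0                    # dL/ds_j = p_j - 1[j == y]
    return loss, grad

def svm_loss_grad(s, y, delta=1.0):
    """Multiclass SVM (hinge) loss and gradient for one example."""
    margins = np.maximum(0.0, s - s[y] + delta)
    margins[y] = 0.0
    loss = margins.sum()
    grad = (margins > 0).astype(float)
    grad[y] = -grad.sum()             # dL/ds_y = -(number of positive margins)
    return loss, grad

s = np.array([3.2, 5.1, -1.7])        # example scores, correct class y = 0
loss_sm, g_sm = softmax_loss_grad(s, y=0)
loss_svm, g_svm = svm_loss_grad(s, y=0)
```

The Softmax gradient with respect to the scores is simply p minus the one-hot label, which is worth memorizing for the exam.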

Page 9

Stochastic Gradient Descent (SGD)

Full sum expensive when N is large!

Approximate the sum using a minibatch of examples; 32 / 64 / 128 are common sizes.
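A toy sketch of that loop, on a made-up linear-regression problem (all names, data, and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N examples in D dimensions with a known true weight vector.
N, D = 1000, 5
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.01 * rng.normal(size=N)

w = np.zeros(D)
lr, batch_size = 0.1, 64

for step in range(500):
    idx = rng.integers(0, N, size=batch_size)        # sample a minibatch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size   # gradient of mean squared error
    w -= lr * grad                                   # SGD update
```

Each step uses only 64 of the 1000 examples, so the gradient is a noisy but cheap estimate of the full sum.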

Page 10

Learning Rate Loss Curves

Pages 11-12

Optimization: Problems with SGD

What if the loss function has a local minimum or saddle point?
Zero gradient: gradient descent gets stuck.

Page 13

Optimization: Problems with SGD

What if loss changes quickly in one direction and slowly in another? What does gradient descent do?
Very slow progress along the shallow dimension, jitter along the steep direction.

Loss function has high condition number: ratio of largest to smallest singular value of the Hessian matrix is large

Page 14

Optimization: Problems with SGD

Our gradients come from minibatches so they can be noisy!

Page 15

Update Rules

SGD
SGD + Momentum
Nesterov Momentum
AdaGrad
RMSProp
Adam
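Minimal sketches of three of these update rules, minimizing f(x) = x^2 (hyperparameters are illustrative; see the lecture slides for Nesterov, AdaGrad, and RMSProp):

```python
import numpy as np

def grad(x):
    return 2.0 * x            # gradient of f(x) = x^2

def sgd(x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def sgd_momentum(x, lr=0.1, rho=0.9, steps=200):
    v = 0.0
    for _ in range(steps):
        v = rho * v - lr * grad(x)   # accumulate a decaying velocity
        x += v
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=100):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x
```

Momentum smooths the jitter from noisy minibatch gradients; Adam additionally rescales each step by a running estimate of the gradient's magnitude.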


Page 17

Lecture 6: Training Neural Networks, Part I

Page 18

Activation Functions

Sigmoid

tanh

ReLU

Leaky ReLU

Maxout

ELU

Page 19

Activation Functions

Sigmoid: σ(x) = 1 / (1 + e^{-x})

- Squashes numbers to range [0, 1]
- Historically popular since it has a nice interpretation as a saturating “firing rate” of a neuron

3 problems:

1. Saturated neurons “kill” the gradients
2. Sigmoid outputs are not zero-centered
3. exp() is a bit compute expensive
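A quick numeric sketch of problem 1, assuming the standard sigmoid sigma(x) = 1 / (1 + exp(-x)) (illustrative code, not slide code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Local gradient: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

peak = sigmoid_grad(0.0)        # largest possible local gradient: 0.25
saturated = sigmoid_grad(10.0)  # practically zero: a saturated neuron "kills" the gradient
```

Even at its best the sigmoid multiplies the upstream gradient by at most 0.25, and for saturated inputs by nearly zero, which is one cause of vanishing gradients in deep stacks.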

Page 20

Consider what happens when the input to a neuron is always positive...

What can we say about the gradients on w? Always all positive or all negative :(
(This is also why you want zero-mean data!)

(Figure: allowed gradient update directions are confined to two quadrants, so reaching a hypothetical optimal w vector requires an inefficient zig-zag path.)

Page 21

Activation Functions

ReLU (Rectified Linear Unit)

- Computes f(x) = max(0, x)

- Does not saturate (in the + region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Actually more biologically plausible than sigmoid

- Not zero-centered output
- An annoyance: what is the gradient when x < 0?

Pages 22-25

(Figure: a data cloud with an active ReLU on one side and a dead ReLU on the other; a dead ReLU will never activate => never update.)

h = Wx + b
o = relu(h)

If h < 0 over the whole data cloud:

do/dh = 0
=> dL/dh = 0
=> dL/dW = dL/dh * dh/dW = 0
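The dead-ReLU chain can be checked numerically. This sketch uses made-up data and a bias pushed far negative so the unit is dead over the entire cloud (illustrative names throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # the "data cloud"
W = rng.normal(size=3)
b = -100.0                             # bias so negative the unit never activates

h = X @ W + b                          # pre-activation: negative for every input
o = np.maximum(h, 0.0)                 # relu(h): all zeros
do_dh = (h > 0).astype(float)          # local ReLU gradient: all zeros

dL_do = np.ones_like(o)                # pretend upstream gradient from some loss
dL_dh = dL_do * do_dh                  # zero everywhere: the unit is dead
dL_dW = X.T @ dL_dh                    # dh/dW = x, so this is all zeros too
```

Since dL_dW is identically zero, no gradient step can ever revive the unit, which is why dead ReLUs stay dead.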

Pages 26-28

Vanishing/Exploding Gradient

Vanishing Gradient:
- Gradient becomes too small
- Some causes:
  - Choice of activation function
  - Multiplying many small numbers together

Exploding Gradient:
- Gradient becomes too large

Page 29

Vanilla RNN Gradient Flow

(Figure: unrolled RNN h0 → h1 → h2 → h3 → h4 with inputs x1...x4.)

Computing the gradient of h0 involves many factors of W (and repeated tanh):
- Largest singular value > 1: exploding gradients
- Largest singular value < 1: vanishing gradients

Gradient clipping: scale the gradient if its norm is too big.

Bengio et al., “Learning long-term dependencies with gradient descent is difficult”, IEEE Transactions on Neural Networks, 1994
Pascanu et al., “On the difficulty of training recurrent neural networks”, ICML 2013
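A minimal sketch of clipping by global norm, one common variant of the idea (illustrative names, not the slides' code):

```python
import numpy as np

def clip_grad_norm(grad, max_norm):
    """If ||grad|| exceeds max_norm, rescale it to have norm exactly max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])             # norm 50: an "exploding" gradient
g_clipped = clip_grad_norm(g, max_norm=5.0)
```

Note that clipping preserves the gradient's direction and only shrinks its magnitude; it helps with exploding gradients but does nothing for vanishing ones.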


Page 31

Convolution Layer

(Figure: a 32x32x3 image and a 5x5x3 filter; convolve (slide) the filter over all spatial locations to produce a 28x28x1 activation map.)

Page 32

Convolution Layer

For example, if we had six 5x5 filters, we’d get 6 separate 28x28 activation maps.

We stack these up to get a “new image” of size 28x28x6!
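The arithmetic behind the 28x28x6 result uses the standard output-size formula, out = (W - K + 2P) / S + 1; here it is as a small helper (the function name is made up):

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv layer: (W - K + 2P) // S + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

spatial = conv_output_size(32, 5)      # 32x32 input, 5x5 filter, stride 1 -> 28
# Six 5x5x3 filters over a 32x32x3 image give a 28x28x6 output volume:
out_shape = (spatial, spatial, 6)
```

The depth of the output volume always equals the number of filters, while the spatial size comes from the formula above.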

Page 33

Convolution Layer

In contrast to a fully connected layer, each term in the output depends on a spatially local ‘subregion’ of the input.


Page 35

Question: what is the connection between an FC layer and a convolutional layer?

Page 36

Question: what is the connection between an FC layer and a convolutional layer?
Answer: an FC layer looks like a convolution layer with filter size HxW, i.e. a filter spanning the full spatial extent of its input.
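A numeric sketch of that equivalence, with made-up shapes: one full-extent HxWxC filter per output unit reproduces the FC layer's output exactly (illustrative code):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W_, C, M = 4, 4, 3, 5                  # input size and number of output units
x = rng.normal(size=(H, W_, C))
filters = rng.normal(size=(M, H, W_, C))  # one HxWxC filter per output unit

# FC view: flatten everything and do a matrix-vector product.
fc_out = filters.reshape(M, -1) @ x.reshape(-1)

# Conv view: each filter spans the whole input, so there is exactly one valid
# spatial position, and each "activation map" is a single number.
conv_out = np.array([(f * x).sum() for f in filters])
```

The two views compute the same dot products, just indexed differently, which is why fully convolutional networks can replace their FC heads with HxW convolutions.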


Page 38

Drawbacks of increased complexity: Overfitting (Bias vs. Variance)

Source: Wikipedia

Page 39

Combat overfitting
● Increase data quantity/quality
● Impose extra constraints
● Introduce randomness/uncertainty

Page 40

Combat overfitting
● Increase data quantity/quality
  ○ Data augmentation
● Impose extra constraints
  ○ On model parameters: L2 regularization
  ○ On layer outputs: Batchnorm
● Introduce randomness/uncertainty
  ○ Dropout
  ○ Batchnorm
  ○ Stochastic depth, drop connect
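Minimal sketches of two items from the list, L2 regularization and inverted dropout (illustrative code, not the assignment's API):

```python
import numpy as np

def l2_loss_and_grad(W, lam=1e-3):
    """L2 term added to the loss: 0.5 * lam * ||W||^2, with gradient lam * W."""
    return 0.5 * lam * np.sum(W * W), lam * W

def dropout_forward(h, p_keep, rng):
    """Inverted dropout: zero activations with prob 1 - p_keep at train time,
    and divide by p_keep so expected activations match test time."""
    mask = (rng.random(h.shape) < p_keep) / p_keep
    return h * mask

W = np.ones((3, 3))
reg_loss, reg_grad = l2_loss_and_grad(W, lam=0.1)

rng = np.random.default_rng(0)
h = np.ones(10000)
h_drop = dropout_forward(h, p_keep=0.5, rng=rng)   # mean stays close to 1
```

The inverted-dropout rescaling is what lets the test-time forward pass skip dropout entirely with no extra scaling.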



Pages 44-46

Receptive field size

‘Input data seen/received’ by a single output-layer ‘pixel’

(Figure: Input → Conv2d)


Pages 49-51

Summary

Note: generally, when we refer to ‘receptive field’, we mean with respect to the input data / layer 0 / original image, not with respect to the direct input to the layer.

(Receptive field sizes need to be computed recursively!)
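One common way to write that recursion (a sketch under the convention r_out = r_in + (k - 1) * jump and jump_out = jump_in * s, where jump is the stride of the layer's input relative to the original image; names are made up):

```python
def receptive_field(layers):
    """layers: list of (kernel, stride) pairs, earliest layer first.
    Returns the receptive field w.r.t. the original input (layer 0)."""
    r, jump = 1, 1                 # layer 0: each 'pixel' sees only itself
    for k, s in layers:
        r = r + (k - 1) * jump     # the kernel adds (k-1) input-resolution steps
        jump = jump * s            # strides compound the step size
    return r

rf_two_3x3 = receptive_field([(3, 1), (3, 1)])            # 5
rf_three_3x3 = receptive_field([(3, 1), (3, 1), (3, 1)])  # 7, same as one 7x7
```

This also reproduces the slides' worked example: a k=3, s=1 layer followed by a k=5, s=1 layer gives a receptive field of 7.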

Page 52

Going back to activation dimensions...

Pages 53-54

Going back to activation dimensions...

Cumulative receptive field of layer output = layer input

Pages 55-57

Activation dimensions

Page 58

Summary

Pages 59-63

Receptive field size (worked example)

Two stacked layers: Conv2d (k=3, s=1) followed by Conv2d (k=5, s=1).
- After the first layer: k=3, s=1, incoming receptive field m=1 => n=3
- After the second layer: k=5, s=1, incoming receptive field m=3 => n=7

Page 64

Case Study: VGGNet
[Simonyan and Zisserman, 2014]

Q: Why use smaller filters? (3x3 conv)

(Figure: AlexNet, VGG16, and VGG19 layer diagrams. AlexNet: 11x11 conv 96, pool, 5x5 conv 256, pool, 3x3 conv 384, 3x3 conv 384, 3x3 conv 256, pool, FC 4096, FC 4096, FC 1000, Softmax. VGG16/VGG19: stacks of 3x3 conv layers at 64, 128, 256, and 512 channels interleaved with pooling, then FC 4096, FC 4096, FC 1000, Softmax.)

Stack of three 3x3 conv (stride 1) layers has the same effective receptive field as one 7x7 conv layer.

But deeper, more non-linearities.

And fewer parameters: 3 * (3^2 C^2) = 27C^2 vs. 7^2 C^2 = 49C^2 for C channels per layer.
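The parameter arithmetic as code (weights only, biases ignored; the helper name is illustrative):

```python
def conv_weight_params(kernel, c_in, c_out):
    """Weight count of a kernel x kernel conv with c_in inputs, c_out outputs."""
    return kernel * kernel * c_in * c_out

C = 256
three_3x3 = 3 * conv_weight_params(3, C, C)  # 3 * (3^2 C^2) = 27 C^2
one_7x7 = conv_weight_params(7, C, C)        # 7^2 C^2 = 49 C^2
```

So for the same effective receptive field, the stack of three 3x3 layers uses roughly 55% of the parameters of a single 7x7 layer, while adding two extra non-linearities.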


Pages 68-70

Chain Rule!


Page 76

(Figure: a loss-vs-time training curve that stays flat before dropping.)

Bad initialization is a prime suspect.


Page 84

Symmetry Breaking

(Diagram: a two-layer network, h = max(W1 x + b1, 0), s = W2 h + b2, feeding a loss L. With W1 = 0, the hidden activation is max(b1, 0) for every input, so all hidden units behave identically.)
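A numeric sketch of the symmetry problem in a hypothetical two-layer ReLU net with identically initialized hidden units (none of this is the slides' code): every hidden unit computes the same activation and receives the same gradient, so an SGD step leaves them identical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))            # a few made-up inputs
W1 = np.full((4, 3), 0.1)              # all hidden units initialized identically
b1 = np.zeros(3)
W2 = np.full((3, 2), 0.1)
y = rng.normal(size=(8, 2))

h = np.maximum(X @ W1 + b1, 0.0)       # columns of h are identical
s = h @ W2
dL_ds = s - y                          # gradient of a squared-error-style loss
dL_dh = (dL_ds @ W2.T) * (h > 0)       # identical gradient for every hidden unit
dL_dW1 = X.T @ dL_dh                   # identical columns

W1_new = W1 - 0.1 * dL_dW1             # columns stay equal: symmetry never breaks
```

This is why weights are initialized with small random values: randomness breaks the symmetry so different hidden units can learn different features.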
