
Deep Learning: An Introduction for Applied Mathematicians

Manuel Baumann

June 13, 2018

CSC Reading Group Seminar


Disclaimer

Claim: I am not an expert in machine learning – but we are!


Reference

C. F. Higham and D. J. Higham (2018). Deep Learning: An Introduction for Applied Mathematicians. arXiv:1801.05894v1.


New names for old friends

Machine Learning          | Numerical Analysis
neuron                    | smoothed step function
neural network            | directed graph
training phase            | parameter fitting → p
stochastic gradient       | steepest descent variant
learning rate             | step size in line search
back propagation          | adjoint equation
hidden layers             | parameter (over-)fitting
'deep' learning           | large-scale
artificial intelligence   | f_p(x)
???                       | model-order reduction


Overview

1. The General Set-up

2. Stochastic Gradient

3. Back Propagation

4. MATLAB Example

5. Current Research Directions


The General Set-up

The sigmoid function,

σ(x) = 1 / (1 + e^{−x}),

models the behavior of a neuron in the brain.
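As a small aside (not on the original slide), the smoothed-step shape is easy to see in MATLAB; the plotting range is an arbitrary choice:

```matlab
% Plot the sigmoid; it acts componentwise and looks like a smoothed step.
sigma = @(x) 1 ./ (1 + exp(-x));
x = linspace(-8, 8, 200);
plot(x, sigma(x)), grid on      % rises smoothly from 0 to 1
```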


A neural network

a^[1] = x ∈ R^{n_1},

a^[l] = σ(W^[l] a^[l−1] + b^[l]) ∈ R^{n_l},   l = 2, 3, ..., L.
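A minimal sketch of this recursion in MATLAB; the layer sizes in n and the random weights are illustrative assumptions, not values from the talk:

```matlab
% Forward pass: a{l} = sigma(W{l}*a{l-1} + b{l}), l = 2,...,L.
sigma = @(z) 1 ./ (1 + exp(-z));
n = [2 2 3 2];                      % n_1,...,n_L (illustrative sizes)
L = numel(n);
rng(0);                             % reproducible random weights
for l = 2:L
    W{l} = 0.5*randn(n(l), n(l-1));
    b{l} = 0.5*randn(n(l), 1);
end
a = [0.1; 0.9];                     % a^[1] = x, an input in R^{n_1}
for l = 2:L
    a = sigma(W{l}*a + b{l});       % a^[l] in R^{n_l}
end
disp(a)                             % network output a^[L] in R^{n_L}
```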


The entering example

min_{p ∈ R^{23}}  (1/10) Σ_{i=1}^{10} (1/2) ‖y(x_i) − F(x_i)‖₂²,   p ∼ {W^[j], b^[j]}, j ∈ {2, 3, 4},

F(x) = σ(W^[4] σ(W^[3] σ(W^[2] x + b^[2]) + b^[3]) + b^[4]) ∈ R².


Stochastic Gradient

Objective function:

J(p) = (1/N) Σ_{i=1}^{N} (1/2) ‖y(x_i) − a^[L](x_i, p)‖₂² =: (1/N) Σ_{i=1}^{N} C_i(x_i, p)

Steepest descent:

p ← p − η ∇J(p),   where η ∈ R₊ is called the 'learning rate'.

Stochastic gradient:

∇J(p) = (1/N) Σ_{i=1}^{N} ∇_p C_i(x_i, p) ≈ (1/|I|) Σ_{i∈I} ∇_p C_i(x_i, p),   I ⊂ {1, ..., N} a random batch.
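To make the sampling idea concrete, here is a self-contained toy least-squares example (data, step size, and epoch count are illustrative); each step uses the gradient of a single C_i instead of the full sum:

```matlab
% Stochastic gradient on C_i(p) = 0.5*(x_i'*p - y_i)^2, i = 1,...,N.
rng(2);
N = 100;
X = randn(2, N);                       % data points x_i as columns
y = X' * [1; -2] + 0.1*randn(N, 1);    % noisy labels, true p = [1; -2]
p = zeros(2, 1);
eta = 0.05;                            % learning rate
for epoch = 1:50
    for i = randperm(N)                % one random sample per step
        r = X(:,i)' * p - y(i);        % residual of sample i
        p = p - eta * r * X(:,i);      % p <- p - eta * grad C_i(p)
    end
end
disp(p)                                % approaches [1; -2]
```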


Back Propagation (1/3)

Now, p ∼ {[W^[l]]_{j,k}, [b^[l]]_j}. Let z^[l] := W^[l] a^[l−1] + b^[l] and δ^[l]_j := ∂C/∂z^[l]_j.

Lemma: Back Propagation

The partial derivatives are given by (∘ denotes the componentwise product):

δ^[L] = σ′(z^[L]) ∘ (a^[L] − y),   (1)

δ^[l] = σ′(z^[l]) ∘ (W^[l+1])^T δ^[l+1],   2 ≤ l ≤ L−1,   (2)

∂C/∂b^[l]_j = δ^[l]_j,   2 ≤ l ≤ L,   (3)

∂C/∂w^[l]_{jk} = δ^[l]_j a^[l−1]_k,   2 ≤ l ≤ L.   (4)
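A direct, self-contained transcription of (1)-(4) for a single data point; the layer sizes, random weights, and data point are illustrative assumptions, and σ′(z) = σ(z)(1 − σ(z)) is used for the sigmoid:

```matlab
% Forward pass, then the backward recursion (1)-(4) for one point (x, y).
sigma = @(z) 1 ./ (1 + exp(-z));
sp    = @(z) sigma(z) .* (1 - sigma(z));            % sigma'(z)
n = [2 2 3 2];  L = numel(n);  rng(3);
x = [0.5; 0.3];  y = [1; 0];                        % illustrative data
for l = 2:L
    W{l} = randn(n(l), n(l-1));  b{l} = randn(n(l), 1);
end
a{1} = x;
for l = 2:L                                         % forward pass
    z{l} = W{l}*a{l-1} + b{l};
    a{l} = sigma(z{l});
end
delta{L} = sp(z{L}) .* (a{L} - y);                  % (1)
for l = L-1:-1:2
    delta{l} = sp(z{l}) .* (W{l+1}' * delta{l+1});  % (2)
end
for l = 2:L
    db{l} = delta{l};                               % (3): dC/db^[l]
    dW{l} = delta{l} * a{l-1}';                     % (4): dC/dW^[l]
end
```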


Back Propagation (2/3)

Proof.

We prove (1) component-wise:

δ^[L]_j = ∂C/∂z^[L]_j = (∂C/∂a^[L]_j) (∂a^[L]_j/∂z^[L]_j) = (a^[L]_j − y_j) σ′(z^[L]_j) = (a^[L]_j − y_j) (σ(z^[L]_j) − σ(z^[L]_j)²)

Next, we prove (2) component-wise:

δ^[l]_j = ∂C/∂z^[l]_j = Σ_{k=1}^{n_{l+1}} (∂C/∂z^[l+1]_k) (∂z^[l+1]_k/∂z^[l]_j) = Σ_{k=1}^{n_{l+1}} δ^[l+1]_k (∂z^[l+1]_k/∂z^[l]_j) = Σ_{k=1}^{n_{l+1}} δ^[l+1]_k w^[l+1]_{kj} σ′(z^[l]_j),

where z^[l+1]_k = Σ_{s=1}^{n_l} w^[l+1]_{ks} σ(z^[l]_s) + b^[l+1]_k.

The proofs of (3) and (4) are similar.


Back Propagation (3/3)

Interpretation:

Evaluation of a^[L] requires a so-called forward pass: a^[1], z^[2] → a^[2], z^[3] → ... → a^[L]

Compute δ^[L] via (1).

Backward pass via (2): δ^[L] → δ^[L−1] → ... → δ^[2]

Gradients via (3)-(4).


MATLAB Example
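The slide showed a run of the paper's demo code; the screenshot is not recoverable here. Below is a compressed sketch in the spirit of the paper's netbp.m, under assumed values for the training data, step size, and iteration count:

```matlab
% Train the 2-2-3-2 network with stochastic gradient and back propagation.
sigma = @(z) 1 ./ (1 + exp(-z));
x = [0.1 0.3 0.1 0.6 0.4 0.6 0.5 0.9 0.4 0.7;    % ten points in R^2
     0.1 0.4 0.5 0.9 0.2 0.3 0.6 0.2 0.4 0.6];   % (illustrative data)
y = [ones(1,5) zeros(1,5); zeros(1,5) ones(1,5)];
n = [2 2 3 2];  L = numel(n);  rng(5000);  eta = 0.05;
for l = 2:L
    W{l} = 0.5*randn(n(l), n(l-1));  b{l} = 0.5*randn(n(l), 1);
end
for iter = 1:1e5
    i = randi(10);                               % sample one point
    a{1} = x(:,i);
    for l = 2:L                                  % forward pass
        z{l} = W{l}*a{l-1} + b{l};  a{l} = sigma(z{l});
    end
    delta{L} = a{L}.*(1 - a{L}).*(a{L} - y(:,i));           % (1)
    for l = L-1:-1:2                             % backward pass
        delta{l} = a{l}.*(1 - a{l}).*(W{l+1}'*delta{l+1});  % (2)
    end
    for l = 2:L                                  % gradient step via (3)-(4)
        W{l} = W{l} - eta*delta{l}*a{l-1}';
        b{l} = b{l} - eta*delta{l};
    end
end
```

Here σ′(z^[l]) is evaluated as a^[l](1 − a^[l]), which for the sigmoid is exact and avoids storing z separately in the update.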


Some more details

Convolutional Neural Network:

The presented approach is infeasible for large data (W^[l] is dense).

Layers can be pre- and post-processing steps used in image analysis: filtering, max pooling, average pooling, ...

Avoiding Overfitting:

A trained network may work well on the given data, but not on new data.

Splitting: training – validation data

Dropout: independently remove neurons (see the sketch below)
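A minimal sketch of (inverted) dropout on one hidden activation; the keep-probability and the activation values are arbitrary choices:

```matlab
% Inverted dropout: keep each neuron independently, rescale the rest.
keep = 0.8;                          % keep-probability (illustrative)
a = rand(5, 1);                      % some hidden-layer activation
mask = rand(size(a)) < keep;         % independent Bernoulli mask
a = (a .* mask) / keep;              % expectation of a is unchanged
```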


Current Research Directions

Research directions:

proofs – e.g. when data is assumed to be i.i.d.

in practice: design of layers

perturbation theory: update trained network

autoencoders: ‖x − G(F(x))‖₂² → min

Two software examples:

http://www.vlfeat.org/matconvnet/

http://scikit-learn.org/


Summary

New names for old friends.

Machine Learning          | Numerical Analysis
neuron                    | smoothed step function
neural network            | directed graph
training phase            | parameter fitting → p
stochastic gradient       | steepest descent variant
learning rate             | step size in line search
back propagation          | adjoint equation
hidden layers             | parameter (over-)fitting
'deep' learning           | large-scale
artificial intelligence   | f_p(x)
PCA                       | POD


Further readings

Andrew Ng. Machine Learning (Coursera online course).

M. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015.

I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521 (2015), pp. 436–444.

S. Mallat. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society of London A, 374 (2016).
