

Lecture 6 recap

Prof. Leal-Taixé and Prof. Niessner 1


Neural Network

Depth

Width

Prof. Leal-Taixé and Prof. Niessner 2


Gradient Descent for Neural Networks

Network (figure): inputs $x_0, x_1, x_2$; hidden units $h_0, h_1, h_2, h_3$; outputs $y_0, y_1$; targets $t_0, t_1$.

$h_j = A\big(b_{0,j} + \sum_k x_k\, w_{0,j,k}\big), \qquad y_i = A\big(b_{1,i} + \sum_j h_j\, w_{1,i,j}\big)$

$L_i = (y_i - t_i)^2$

$\nabla_{w,b} f_{x,t}(w) = \Big[\tfrac{\partial f}{\partial w_{0,0,0}}, \dots, \tfrac{\partial f}{\partial w_{l,m,n}}, \dots, \tfrac{\partial f}{\partial b_{l,m}}\Big]$

Just a simple activation: $A(x) = \max(0, x)$

Prof. Leal-Taixé and Prof. Niessner 3
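To make the recap concrete, here is a minimal NumPy sketch of this forward pass and the squared-error loss (not from the slides; the layer sizes, random weights, and variable names are illustrative assumptions):

import numpy as np

def A(z):
    # The simple activation from the slide: A(x) = max(0, x) (ReLU)
    return np.maximum(0.0, z)

# Illustrative sizes matching the figure: 3 inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])                    # inputs x_0, x_1, x_2
t = np.array([1.0, 0.0])                          # targets t_0, t_1
W0, b0 = rng.normal(size=(4, 3)), np.zeros(4)     # first-layer weights w_{0,j,k} and biases b_{0,j}
W1, b1 = rng.normal(size=(2, 4)), np.zeros(2)     # second-layer weights w_{1,i,j} and biases b_{1,i}

h = A(b0 + W0 @ x)                                # h_j = A(b_{0,j} + sum_k x_k w_{0,j,k})
y = A(b1 + W1 @ h)                                # y_i = A(b_{1,i} + sum_j h_j w_{1,i,j})
L = np.sum((y - t) ** 2)                          # L = sum_i (y_i - t_i)^2
print(h, y, L)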


Stochastic Gradient Descent (SGD)

$\theta^{k+1} = \theta^{k} - \alpha\, \nabla_\theta L(\theta^{k}, x_{\{1..m\}}, y_{\{1..m\}})$

$\nabla_\theta L = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L_i$

+ all variations of SGD: momentum, RMSProp, Adam, …

$k$ now refers to the $k$-th iteration; $m$ is the number of training samples in the current batch; $\nabla_\theta L$ is the gradient for the $k$-th batch.

Prof. Leal-Taixé and Prof. Niessner 4
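A minimal sketch of this update rule on a toy problem (the quadratic per-sample loss, learning rate, and batch size are illustrative assumptions, not course code):

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)   # toy dataset; L_i(theta) = (theta - y_i)^2
theta, lr, m = 0.0, 0.1, 32                        # initial parameter, learning rate alpha, batch size m

for k in range(200):                               # k-th iteration
    batch = rng.choice(data, size=m, replace=False)   # m training samples in the current batch
    grad = np.mean(2.0 * (theta - batch))             # grad L = (1/m) * sum_i grad L_i
    theta = theta - lr * grad                         # theta^{k+1} = theta^k - alpha * grad
print(theta)                                       # approaches the data mean (~3.0)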


Importance of Learning Rate

Prof. Leal-Taixé and Prof. Niessner 5


Over- and Underfitting

Underfitted Appropriate Overfitted

Figure extracted from Deep Learning by Adam Gibson and Josh Patterson, O'Reilly Media Inc., 2017

Prof. Leal-Taixé and Prof. Niessner 6


Over- and Underfitting

Source: http://srdas.github.io/DLBook/ImprovingModelGeneralization.html

Prof. Leal-Taixé and Prof. Niessner 7


Basic recipe for machine learning

• Split your data: train (60%), validation (20%), test (20%) (see the split sketch after this slide)
• Find your hyperparameters using the validation set

Prof. Leal-Taixé and Prof. Niessner 8
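A minimal sketch of such a 60% / 20% / 20% split over sample indices (NumPy only; the dataset size and seed are illustrative assumptions):

import numpy as np

def split_indices(n, train=0.6, val=0.2, seed=0):
    # Shuffle all indices once, then cut them into train / validation / test parts
    idx = np.random.default_rng(seed).permutation(n)
    n_train, n_val = int(train * n), int(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))   # 600 200 200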


Basic recipe for machine learning

Prof. Leal-Taixé and Prof. Niessner 9


Basically…

Deep learning memes
Prof. Leal-Taixé and Prof. Niessner 10


Fun things…

Deep learning memes
Prof. Leal-Taixé and Prof. Niessner 11


Fun things…

Deep learning memes
Prof. Leal-Taixé and Prof. Niessner 12


Fun things…

Deep learning memes
Prof. Leal-Taixé and Prof. Niessner 13


Going Deep into Neural Networks

Prof. Leal-Taixé and Prof. Niessner 14


Simple Starting Points for Debugging

• Start simple!
  – First, overfit to a single training sample (a sketch of this check follows after this slide)
  – Second, overfit to several training samples
• Always try a simple architecture first
  – It will verify that you are learning something
• Estimate timings (how long does each epoch take?)

Prof. Leal-Taixé and Prof. Niessner 15
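The sketch mentioned above: overfitting a single training sample (a tiny linear model with a squared-error loss stands in for the real network; all values here are illustrative assumptions). If the loss cannot be driven towards zero on one sample, the model, loss, or update step is broken.

import numpy as np

x = np.array([0.5, -1.0, 2.0, 0.0, 1.5])   # one training sample (illustrative values)
t = 2.0                                    # its target
w, b, lr = np.zeros(5), 0.0, 0.01

for step in range(500):
    y = w @ x + b                          # tiny linear "network"
    g = 2.0 * (y - t)                      # d/dy of the squared error (y - t)^2
    w -= lr * g * x                        # gradient step on the weights
    b -= lr * g                            # gradient step on the bias
print((w @ x + b - t) ** 2)                # should be ~0; otherwise the training loop is broken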


Neural Network

• Problems of going deeper…
  – Vanishing gradients (repeated multiplication in the chain rule)

• The impact of small decisions (architecture, activation functions...)

• Is my network training correctly?

Prof. Leal-Taixé and Prof. Niessner 16


Neural Networks

Prof. Leal-Taixé and Prof. Niessner 17

1) Output functions

2) Functions in Neurons

3) Input of data


Output Functions

Prof. Leal-Taixé and Prof. Niessner 18


Neural Networks

What is the shape of this function?

[Figure: network prediction fed into a loss (Softmax, Hinge)]

Prof. Leal-Taixé and Prof. Niessner 19


Sigmoid for Binary Predictions

Can be interpreted as a probability: the output lies between 0 and 1

Prof. Leal-Taixé and Prof. Niessner 20


Softmax formulation

• What if we have multiple classes?

Prof. Leal-Taixé and Prof. Niessner 21


Softmax formulation

• What if we have multiple classes?

Softmax

Prof. Leal-Taixé and Prof. Niessner 22


Softmax formulation

• What if we have multiple classes?

Softmax

Prof. Leal-Taixé and Prof. Niessner 23


Softmax formulation

• Softmax
• Softmax loss (Maximum Likelihood Estimate)

[Figure: scores → exp → normalize → probabilities; a softmax sketch follows after this slide]

Prof. Leal-Taixé and Prof. Niessner 24
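A minimal NumPy sketch of this exp-then-normalize step (the max-subtraction is a standard numerical-stability detail, not something shown on the slide):

import numpy as np

def softmax(s):
    e = np.exp(s - np.max(s))     # subtract the max score so exp() cannot overflow; result unchanged
    return e / np.sum(e)          # normalize so the outputs sum to 1

scores = np.array([3.2, 5.1, -1.7])
print(softmax(scores))            # approx. [0.13, 0.87, 0.00], as in the worked example below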


Loss Functions

Prof. Leal-Taixé and Prof. Niessner 25


Naïve Losses

L2 loss: $L_2 = \sum_{i=1}^{n} \big(y_i - f(x_i)\big)^2$
  - Sum of squared differences (SSD)
  - Prone to outliers
  - Compute-efficient (optimization)
  - Optimum is the mean

L1 loss: $L_1 = \sum_{i=1}^{n} |y_i - f(x_i)|$
  - Sum of absolute differences
  - Robust to outliers
  - Costly to compute
  - Optimum is the median

Example (two 4×4 patches $x_i$ and $y_i$ that differ only in the first row; a small verification sketch follows after this slide):

  x_i:  12 24 42 23      y_i:  15 20 40 25
        34 32  5  2            34 32  5  2
        12 31 12 31            12 31 12 31
        31 64  5 13            31 64  5 13

$L_1(x, y) = 3 + 4 + 2 + 2 + 0 + \dots + 0 = 11$
$L_2(x, y) = 9 + 16 + 4 + 4 + 0 + \dots + 0 = 33$

Prof. Leal-Taixé and Prof. Niessner 26
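The verification sketch mentioned above, with the two patches typed in from the slide:

import numpy as np

x = np.array([[12, 24, 42, 23],
              [34, 32,  5,  2],
              [12, 31, 12, 31],
              [31, 64,  5, 13]])
y = np.array([[15, 20, 40, 25],
              [34, 32,  5,  2],
              [12, 31, 12, 31],
              [31, 64,  5, 13]])

l1 = np.sum(np.abs(y - x))     # sum of absolute differences: 3 + 4 + 2 + 2 = 11
l2 = np.sum((y - x) ** 2)      # sum of squared differences:  9 + 16 + 4 + 4 = 33
print(l1, l2)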


Cross-Entropy (Softmax)

Softmax loss: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$

Suppose: 3 training examples (cat, chair, "car") and 3 classes. Scores (one column per example; the correct-class score of each example lies on the diagonal):

           cat   chair  "car"
  cat      3.2    1.3    2.2
  chair    5.1    4.9    2.5
  "car"   -1.7    2.0   -3.1

Prof. Leal-Taixé and Prof. Niessner 27


Cross-Entropy (Softmax), same setup. Step 1: take the scores of the first example (cat): 3.2, 5.1, -1.7.

Prof. Leal-Taixé and Prof. Niessner 28


Cross-Entropy (Softmax), same setup. Step 2: exponentiate the scores:

  3.2, 5.1, -1.7  →  exp  →  24.5, 164.0, 0.18

Prof. Leal-Taixé and Prof. Niessner 29


Cross-Entropy (Softmax), same setup. Step 3: normalize so the values sum to one:

  3.2, 5.1, -1.7  →  exp  →  24.5, 164.0, 0.18  →  normalize  →  0.13, 0.87, 0.00

Prof. Leal-Taixé and Prof. Niessner 30


Cross-Entropy (Softmax), same setup. Step 4: take the negative log:

  3.2, 5.1, -1.7  →  exp  →  24.5, 164.0, 0.18  →  normalize  →  0.13, 0.87, 0.00  →  -log(x)  →  2.04, 0.14, 6.94

The loss of this example is the -log of its correct-class probability: $L_1 = -\log(0.13) = 2.04$

Prof. Leal-Taixé and Prof. Niessner 31


Cross-Entropy (Softmax), same setup. Doing this for all three examples gives the per-example losses 2.04, 0.079, and 6.156, and the full loss

$L = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.04 + 0.079 + 6.156}{3} = 2.76$

Prof. Leal-Taixé and Prof. Niessner 32
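A minimal NumPy sketch reproducing these numbers (scores typed in from the slide; rows are classes, columns are the three examples, and the correct class of each example sits on the diagonal):

import numpy as np

scores = np.array([[ 3.2, 1.3,  2.2],    # cat score for each example
                   [ 5.1, 4.9,  2.5],    # chair score for each example
                   [-1.7, 2.0, -3.1]])   # "car" score for each example
correct = np.array([0, 1, 2])            # index of the correct class per example

probs = np.exp(scores) / np.sum(np.exp(scores), axis=0)   # softmax of each column
losses = -np.log(probs[correct, np.arange(3)])            # -log of the correct-class probability
print(losses)            # approx. [2.04, 0.079, 6.156]
print(losses.mean())     # approx. 2.76, the full loss from the slide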


Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Prof. Leal-Taixé and Prof. Niessner 33


Hinge Loss (SVM Loss)

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$

Prof. Leal-Taixé and Prof. Niessner 34


Hinge Loss (SVM Loss), same setup as above. Suppose: 3 training examples and 3 classes.

Prof. Leal-Taixé and Prof. Niessner 35


Hinge Loss (SVM Loss), same setup, using the same scores as before:

           cat   chair  "car"
  cat      3.2    1.3    2.2
  chair    5.1    4.9    2.5
  "car"   -1.7    2.0   -3.1

Prof. Leal-Taixé and Prof. Niessner 36


Hinge Loss (SVM Loss), same setup. Loss of the first example (cat, correct score 3.2):

$L_1 = \max(0,\, 5.1 - 3.2 + 1) + \max(0,\, -1.7 - 3.2 + 1) = \max(0,\, 2.9) + \max(0,\, -3.9) = 2.9 + 0 = 2.9$

Prof. Leal-Taixé and Prof. Niessner 37


Hinge Loss (SVM Loss), same setup. Loss of the second example (chair, correct score 4.9):

$L_2 = \max(0,\, 1.3 - 4.9 + 1) + \max(0,\, 2.0 - 4.9 + 1) = \max(0,\, -2.6) + \max(0,\, -1.9) = 0 + 0 = 0$

Prof. Leal-Taixé and Prof. Niessner 38


Hinge Loss (SVM Loss), same setup. Loss of the third example ("car", correct score -3.1):

$L_3 = \max(0,\, 2.2 - (-3.1) + 1) + \max(0,\, 2.5 - (-3.1) + 1) = \max(0,\, 6.3) + \max(0,\, 6.6) = 6.3 + 6.6 = 12.9$

Prof. Leal-Taixé and Prof. Niessner 39


Hinge Loss (SVM Loss), same setup. The per-example losses are 2.9, 0, and 12.9.

Full loss (over all pairs):

$L = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{L_1 + L_2 + L_3}{3} = \frac{2.9 + 0 + 12.9}{3} \approx 5.27$

Prof. Leal-Taixé and Prof. Niessner 40
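The same kind of check for the multiclass hinge loss (identical score matrix; a minimal NumPy sketch):

import numpy as np

scores = np.array([[ 3.2, 1.3,  2.2],
                   [ 5.1, 4.9,  2.5],
                   [-1.7, 2.0, -3.1]])           # rows: classes, columns: examples
correct = np.array([0, 1, 2])                    # correct class per example

margins = np.maximum(0.0, scores - scores[correct, np.arange(3)] + 1.0)
margins[correct, np.arange(3)] = 0.0             # keep only the j != y_i terms
losses = margins.sum(axis=0)
print(losses)            # [ 2.9  0.  12.9]
print(losses.mean())     # approx. 5.27, the full hinge loss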


Hinge Loss vs Softmax

Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Hinge loss: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Prof. Leal-Taixé and Prof. Niessner 41


Hinge Loss vs Softmax

Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$        Hinge loss: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Given the following scores, with $y_i = 0$ (the first entry is the correct class):

  s = [5, -3, 2]
  s = [5, 10, 10]
  s = [5, -20, -20]

Prof. Leal-Taixé and Prof. Niessner 42


Hinge Loss vs Softmax, same scores, $y_i = 0$.

Hinge loss ($L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$):
  s = [5, -3, 2]:      max(0, -3 - 5 + 1) + max(0, 2 - 5 + 1)    = 0
  s = [5, 10, 10]:     max(0, 10 - 5 + 1) + max(0, 10 - 5 + 1)   = 12
  s = [5, -20, -20]:   max(0, -20 - 5 + 1) + max(0, -20 - 5 + 1) = 0

Prof. Leal-Taixé and Prof. Niessner 43


Hinge Loss vs Softmax, same scores, $y_i = 0$.

  Scores               Hinge loss   Softmax loss ($-\ln\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$, e.g. via Google)
  s = [5, -3, 2]        0            -ln(e^5 / (e^5 + e^-3 + e^2))    = 0.05
  s = [5, 10, 10]       12           -ln(e^5 / (e^5 + e^10 + e^10))   = 5.70
  s = [5, -20, -20]     0            -ln(e^5 / (e^5 + e^-20 + e^-20)) ≈ 2e-11

Prof. Leal-Taixé and Prof. Niessner 44


Hinge Loss vs Softmax, same scores, $y_i = 0$.

  Scores               Hinge loss   Softmax loss
  s = [5, -3, 2]        0            0.05
  s = [5, 10, 10]       12           5.70
  s = [5, -20, -20]     0            ~2e-11

Softmax *always* wants to improve! Hinge loss saturates: once every margin is satisfied, the loss is exactly zero.

Prof. Leal-Taixé and Prof. Niessner 45
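A minimal NumPy sketch of this comparison (the three score vectors from the slide, correct class index 0):

import numpy as np

def hinge(s, y):
    margins = np.maximum(0.0, s - s[y] + 1.0)
    margins[y] = 0.0                                 # sum only over j != y
    return margins.sum()

def softmax_loss(s, y):
    s = s - s.max()                                  # numerical stability; the loss is unchanged
    return -np.log(np.exp(s[y]) / np.exp(s).sum())

for s in ([5, -3, 2], [5, 10, 10], [5, -20, -20]):
    s = np.asarray(s, dtype=float)
    print(hinge(s, 0), softmax_loss(s, 0))           # (0, 0.05), (12, 5.70), (0, ~2e-11)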


Loss in Compute Graph

[Figure: compute graph with input data $x_i$, weights $W$, score function $f(x_i, W)$, labeled data $y_i$, data losses, regularization loss, and total loss $L$]

Given a function with weights $W$ and training pairs $[x_i; y_i]$ (input and labels).
Score function $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$

SVM: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + R_2(W)$, e.g. $L_2$-reg: $R_2(W) = \sum_{i} w_i^2$

Prof. Leal-Taixé and Prof. Niessner 46


Compute Graphs

SVM: $L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + 1)$

Softmax: $L_i = -\log\Big(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\Big)$

Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} L_i + R_2(W)$, e.g. $L_2$-reg: $R_2(W) = \sum_{i} w_i^2$

Score function: $s = f(x_i, W)$, e.g., $f(x_i, W) = W \cdot [x_0, x_1, \dots, x_N]^T$

Prof. Leal-Taixé and Prof. Niessner 47


Compute Graphs, with the same score function and losses as above.

We want to find the optimal $W$; i.e., the weights are the unknowns of the optimization problem.

Compute the gradient w.r.t. $W$: the gradient $\nabla_W L$ is computed via backpropagation.

Prof. Leal-Taixé and Prof. Niessner 48


Weight Regularization & SVM Loss

Multiclass SVM loss: $L_i = \sum_{j \neq y_i} \max\big(0,\, f(x_i; W)_j - f(x_i; W)_{y_i} + 1\big)$

$L_2$-reg: $R_2(W) = \sum_i w_i^2$        $L_1$-reg: $R_1(W) = \sum_i |w_i|$

Full loss: $L = \frac{1}{N}\sum_{i=1}^{N} \sum_{j \neq y_i} \max\big(0,\, f(x_i; W)_j - f(x_i; W)_{y_i} + 1\big) + \lambda R(W)$

Prof. Leal-Taixé and Prof. Niessner 49


Weight Regularization & SVM Loss, same formulas as above.

Example:
  $x = [1, 1, 1, 1]$
  $w_1 = [1, 0, 0, 0]$
  $w_2 = [0.25, 0.25, 0.25, 0.25]$

Both weight vectors produce the same score: $w_1^T x = w_2^T x = 1$.

  $R_2(w_1) = 1$
  $R_2(w_2) = 0.25^2 + 0.25^2 + 0.25^2 + 0.25^2 = 0.25$

So $L_2$ regularization prefers the spread-out weights $w_2$.

Prof. Leal-Taixé and Prof. Niessner 50
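A minimal NumPy check of this example:

import numpy as np

x  = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])
w2 = np.array([0.25, 0.25, 0.25, 0.25])

print(w1 @ x, w2 @ x)                      # 1.0 1.0  -> both give the same score
print(np.sum(w1 ** 2), np.sum(w2 ** 2))    # 1.0 0.25 -> L2 regularization prefers w2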


Activation Functions

Prof. Leal-Taixé and Prof. Niessner 51


Neurons

Prof. Leal-Taixé and Prof. Niessner 52


Neurons

Prof. Leal-Taixé and Prof. Niessner 53


Neural Networks

What is the shape of this function?

[Figure: network prediction fed into a loss (Softmax, Hinge)]

Prof. Leal-Taixé and Prof. Niessner 54


Activation Functions or Hidden Units

Prof. Leal-Taixé and Prof. Niessner 55


Sigmoid

Can be interpreted as a probability

Prof. Leal-Taixé and Prof. Niessner 56


Sigmoid: Forward

Prof. Leal-Taixé and Prof. Niessner 57


Sigmoid: Forward

Saturated neurons kill the gradient flow

Prof. Leal-Taixé and Prof. Niessner 58


Sigmoid: Forward

Active region for gradient descent

Prof. Leal-Taixé and Prof. Niessner 59


Sigmoid

Output is always positive

Prof. Leal-Taixé and Prof. Niessner 60


Problem of Positive Output

We want to compute the gradient wrt the weights

Prof. Leal-Taixé and Prof. Niessner 61


Problem of Positive Output

We want to compute the gradient wrt the weights

Prof. Leal-Taixé and Prof. Niessner 62


Problem of Positive Output

It is going to be either positive or negative for all weights

Prof. Leal-Taixé and Prof. Niessner 63


Problem of Positive Output

More on zero-mean data later

Prof. Leal-Taixé and Prof. Niessner 64


tanh

Zero-centered

Still saturates

LeCun 1991
Prof. Leal-Taixé and Prof. Niessner 65


Rectified Linear Units (ReLU)

Large and consistent gradients

Does not saturate

Fast convergence

Krizhevsky 2012
Prof. Leal-Taixé and Prof. Niessner 66


Rectified Linear Units (ReLU)

Large and consistent gradients

Does not saturate

Fast convergence

What happens if a ReLU outputs zero?

Dead ReLU

Prof. Leal-Taixé and Prof. Niessner 67


Rectified Linear Units (ReLU)

• Initializing ReLU neurons with slightly positive biases (0.1) makes it likely that they stay active for most inputs

Prof. Leal-Taixé and Prof. Niessner 68


Leaky ReLU

Does not die

Maas 2013
Prof. Leal-Taixé and Prof. Niessner 69


Parametric ReLU

Does not die

One more parameter to backprop into

He 2015
Prof. Leal-Taixé and Prof. Niessner 70


Maxout units

$\text{Maxout}(x) = \max(w_1^T x + b_1,\; w_2^T x + b_2)$

Goodfellow 2013
Prof. Leal-Taixé and Prof. Niessner 71


Maxout units

Piecewise linear approximation of a convex function with N pieces

Goodfellow 2013
Prof. Leal-Taixé and Prof. Niessner 72


Maxout units

Generalization of ReLUs

Linear regimes

Does not die

Does not saturate

Increases the number of parameters
Prof. Leal-Taixé and Prof. Niessner 73


Quick Guide

• Sigmoid is not really used
• ReLU is the standard choice
• Second choice: the variants of ReLU, or Maxout
• Recurrent nets will require tanh or similar

A small sketch of these activations follows after this slide.

Prof. Leal-Taixé and Prof. Niessner 74
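The sketch mentioned above: minimal NumPy versions of the activations from this chapter (the leaky slope of 0.01 and the Maxout weight shapes are illustrative defaults, not prescribed by the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))           # saturates for large |x|; output in (0, 1)

def tanh(x):
    return np.tanh(x)                          # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)                  # no saturation for x > 0; can "die" at exactly 0

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)       # small negative slope, so it does not die

def prelu(x, a):
    return np.where(x > 0, x, a * x)           # parametric ReLU: the slope a is learned

def maxout(x, W1, b1, W2, b2):
    return np.maximum(W1 @ x + b1, W2 @ x + b2)   # max over two linear pieces (generalizes ReLU)

z = np.linspace(-3.0, 3.0, 7)
print(relu(z))
print(leaky_relu(z))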


Neural Networks

Data pre-processing?

[Figure: network prediction fed into a loss (Softmax, Hinge)]

Prof. Leal-Taixé and Prof. Niessner 75


Data pre-processing

For images, subtract the mean image (AlexNet) or the per-channel mean (VGG-Net).
Prof. Leal-Taixé and Prof. Niessner 76
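A minimal NumPy sketch of both variants (a random batch stands in for the training images; the shapes are illustrative assumptions):

import numpy as np

# Fake training set: 128 RGB images of size 32x32, layout (N, H, W, C)
images = np.random.default_rng(0).uniform(0, 255, size=(128, 32, 32, 3))

mean_image = images.mean(axis=0)                  # AlexNet-style: one 32x32x3 mean image
channel_mean = images.mean(axis=(0, 1, 2))        # VGG-style: one mean value per channel

centered_a = images - mean_image                  # subtract the mean image
centered_b = images - channel_mean                # subtract the per-channel mean (broadcasts over H and W)
print(mean_image.shape, channel_mean.shape)       # (32, 32, 3) (3,)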


Outlook

Prof. Leal-Taixé and Prof. Niessner 77


Outlook

Regularization in the optimization:
  - Dropout, weight decay, etc.
Regularization in the architecture:
  - Convolutions! (CNNs)
Initialization of the optimization:
  - How to set the weights at the beginning
Handling limited training data:
  - Data augmentation

Prof. Leal-Taixé and Prof. Niessner 78


Next lecture

• Next lecture: Dec 13th

– More about training neural networks: regularization, batch normalization (BN), dropout, etc.

– Followed by CNNs

Prof. Leal-Taixé and Prof. Niessner 79