Deep Learning
Andreas Geiger
Autonomous Vision Group, MPI Tübingen
Computer Vision and Geometry Lab, ETH Zürich
January 13, 2017
Deep Learning
“Deep learning is just a buzzword for neural nets, and neural nets are just a stack of matrix-vector multiplications, interleaved with some non-linearities. No magic there.” (Ronan Collobert, 2011)
“I sometimes get questions like: how does deep learning compare with graphical models? There is no answer to this question because deep learning and graphical models are orthogonal concepts that can be advantageously combined.” (Yann LeCun, 2013)
Representation Matters
[Figure: the same data plotted in Cartesian coordinates (x, y) and in polar coordinates (r, θ); the choice of representation determines how simple the task becomes]
Classification
[Figure: input image → model fθ → output "Beach"]

- fθ : x ∈ R^(W×H) ↦ y ∈ {1, …, L}
- fθ : x ∈ R^(W×H) ↦ y = [0, …, 1, …, 0] ∈ {0, 1}^L
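To make the two output encodings concrete, here is a minimal NumPy sketch (not from the slides; the class count and class index are made-up values):

```python
import numpy as np

# hypothetical example: L = 5 classes, true class index 2 (both made up)
L, label = 5, 2
one_hot = np.zeros(L)         # y = [0, ..., 1, ..., 0] with the 1 at position `label`
one_hot[label] = 1.0
assert int(np.argmax(one_hot)) == label   # both encodings carry the same information
```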
Linear Regression
- Mapping:
  x = Image
  y = Ax + a
- Classification:
  L* = argmax_l y_l

[Figure: fully connected linear layer mapping inputs x1…xN (plus a constant input 1 for the bias) to outputs y1…yL through weights A11…ANL and biases a1…aL]
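As an illustration, a minimal NumPy sketch of this model; the dimensions and random weights are placeholder assumptions, not values from the slides:

```python
import numpy as np

# made-up sizes: N input pixels, L classes; weights drawn at random
rng = np.random.default_rng(0)
N, L = 4, 3
A = rng.normal(size=(L, N))    # weight matrix A
a = rng.normal(size=L)         # bias vector a
x = rng.normal(size=N)         # flattened input "image"

y = A @ x + a                  # mapping: y = Ax + a
label = int(np.argmax(y))      # classification: L* = argmax_l y_l
```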
Linear Regression
- Two Layers:
  x = Image
  h = Ax + a
  y = Bh + b
- Is this model better? No: substituting h shows that the composition of linear maps is itself linear,
  y = B(Ax + a) + b
    = BAx + Ba + b
    = Cx + c   with C = BA, c = Ba + b,
  so two linear layers have no more expressive power than one.

[Figure: two stacked fully connected linear layers x1…xN → h1…hM → y1…yL, which collapse into a single layer with weights C = BA and biases c = Ba + b]
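A quick numerical check of the derivation above (a sketch; all sizes and weights are made up):

```python
import numpy as np

# verify that two stacked linear layers collapse into one:
# y = B(Ax + a) + b  ==  Cx + c  with C = BA, c = Ba + b
rng = np.random.default_rng(0)
N, M, L = 5, 4, 3                       # made-up layer sizes
A, a = rng.normal(size=(M, N)), rng.normal(size=M)
B, b = rng.normal(size=(L, M)), rng.normal(size=L)
x = rng.normal(size=N)

y_two_layer = B @ (A @ x + a) + b       # two-layer forward pass
C, c = B @ A, B @ a + b                 # collapsed parameters
y_one_layer = C @ x + c

assert np.allclose(y_two_layer, y_one_layer)
```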
Logistic Regression
- Mapping:
  x = Image
  y = σ(Ax + a)
- With the logistic sigmoid, applied elementwise:
  σ(x) = 1 / (1 + exp(−x))
- Classification:
  L* = argmax_l y_l

[Figure: fully connected layer x1…xN → y1…yL with sigmoid outputs]
[Plot: the logistic sigmoid σ(x), increasing from 0 to 1 over x ∈ [−10, 10]]
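A sketch of logistic regression under the same made-up dimensions as before:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid sigma(x) = 1 / (1 + exp(-x)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)             # made-up weights, as before
N, L = 4, 3
A, a = rng.normal(size=(L, N)), rng.normal(size=L)
x = rng.normal(size=N)

y = sigmoid(A @ x + a)                     # outputs now lie in (0, 1)
label = int(np.argmax(y))                  # L* = argmax_l y_l, unchanged
```

Since σ is monotonic, the arg-max decision coincides with that of the plain linear model; the non-linearity only pays off once layers are stacked, as the next slides show.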
Neural Networks
- Two Layers:
  x = Image
  h = σ(Ax + a)
  y = σ(Bh + b)
- Now:
  y = σ(B σ(Ax + a) + b)
  Because σ is non-linear, this composition no longer collapses into a single linear map.

[Figure: two stacked fully connected layers with sigmoid units, x1…xN → h1…hM → y1…yL]
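The same numerical check as in the linear case, now with the sigmoid in between (again a sketch with made-up values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# same shapes as the linear check above; with the sigmoid in between,
# no single linear layer C, c reproduces the mapping
rng = np.random.default_rng(0)
N, M, L = 5, 4, 3
A, a = rng.normal(size=(M, N)), rng.normal(size=M)
B, b = rng.normal(size=(L, M)), rng.normal(size=L)
x = rng.normal(size=N)

h = sigmoid(A @ x + a)                   # hidden layer
y = sigmoid(B @ h + b)                   # output layer
collapsed = (B @ A) @ x + (B @ a + b)    # the linear "collapse" from before
print(np.allclose(y, collapsed))         # almost surely False: the non-linearity matters
```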
Training Neural Networks
[Figure: a two-layer network is fed an observed input xobs; its prediction yest is compared to the observed target yobs via an error E, which is propagated backwards through the network to update the weights]

D. Rumelhart, G. Hinton and R. Williams: Learning representations by back-propagating errors. Nature, 1986.
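To make backpropagation concrete, here is a minimal training sketch for the two-layer network above. The slide leaves the error E unspecified, so the squared error E = ½‖yest − yobs‖², the learning rate, and all data are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# minimal backpropagation for h = sigmoid(Ax + a), y = sigmoid(Bh + b);
# assumed squared error E = 0.5 * ||y_est - y_obs||^2, made-up sizes and data
rng = np.random.default_rng(0)
N, M, L, lr = 5, 4, 3, 1.0
A, a = rng.normal(size=(M, N)), np.zeros(M)
B, b = rng.normal(size=(L, M)), np.zeros(L)
x_obs, y_obs = rng.normal(size=N), np.array([0.0, 1.0, 0.0])

for step in range(1000):
    # forward pass
    h = sigmoid(A @ x_obs + a)
    y_est = sigmoid(B @ h + b)
    # backward pass, using sigma'(z) = sigma(z) * (1 - sigma(z))
    d2 = (y_est - y_obs) * y_est * (1 - y_est)   # dE/dz at the output layer
    d1 = (B.T @ d2) * h * (1 - h)                # error propagated to the hidden layer
    # gradient descent update
    B -= lr * np.outer(d2, h); b -= lr * d2
    A -= lr * np.outer(d1, x_obs); a -= lr * d1

print(0.5 * np.sum((y_est - y_obs) ** 2))        # error should be close to zero
```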
Convolutional Neural Networks
- Convolution filters with shared parameters
- Subsampling / pooling / unpooling

Try yourself: www.cvlibs.net/learn

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, Vol. 86, no. 11, pp. 2278–2324.
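A minimal NumPy sketch of both building blocks (not the LeNet implementation; the image, filter, and pooling size are made up):

```python
import numpy as np

# convolution: one small filter whose weights are shared across all image locations
def conv2d(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # same weights everywhere
    return out

# subsampling: max pooling over non-overlapping s x s windows
def max_pool(fmap, s=2):
    H, W = fmap.shape
    return fmap[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

feature_map = conv2d(np.random.rand(8, 8), np.ones((3, 3)) / 9.0)  # 8x8 -> 6x6
pooled = max_pool(feature_map)                                     # 6x6 -> 3x3
```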
Depth Matters
[Bar chart, reconstructed as a table: ImageNet classification top-5 error (%)]

  ILSVRC'10              shallow       28.2
  ILSVRC'11              shallow       25.8
  ILSVRC'12  AlexNet     8 layers      16.4
  ILSVRC'13              8 layers      11.7
  ILSVRC'14  VGG         19 layers      7.3
  ILSVRC'14  GoogleNet   22 layers      6.7
  ILSVRC'15  ResNet      152 layers     3.57

K. He, X. Zhang, S. Ren and J. Sun: Deep Residual Learning for Image Recognition. CVPR, 2016. Best Paper Award.
Feature Visualization
[Figures: visualizations of learned features across network layers]

M. Zeiler and R. Fergus: Visualizing and Understanding Convolutional Networks. ECCV, 2014.
Image Captioning
O. Vinyals, A. Toshev, S. Bengio and D. Erhan: Show and Tell: A Neural Image Caption Generator. CVPR, 2015.
Graphical Models vs. Deep Learning
Graphical Models:
- Probabilistic
- Dependencies between random variables
- Low capacity
- Domain knowledge: easy to integrate

Deep Neural Networks:
- Deterministic
- Input/output mapping
- High capacity
- Domain knowledge: hard to integrate

Combinations:
D. Kingma and M. Welling: Auto-encoding variational Bayes. ICLR, 2014.
L. Chen, A. Schwing and R. Urtasun: Learning Deep Structured Models. ICML, 2015.
J. Domke: Learning graphical model parameters with approximate marginal inference. PAMI, 2013, Vol. 35, no. 10, pp. 2454–2467.