Deep Learning. Andreas Geiger, Autonomous Vision Group, MPI Tübingen; Computer Vision and Geometry Lab, ETH Zürich. January 13, 2017

Deep Learning

Andreas Geiger

Autonomous Vision Group, MPI Tübingen

Computer Vision and Geometry Lab, ETH Zürich

January 13, 2017

Deep Learning

“Deep learning is just a buzzword for neural nets, and neural nets are just a stack of matrix-vector multiplications, interleaved with some non-linearities. No magic there.” Ronan Collobert, 2011

“I sometimes get questions like: how does deep learning compare with graphical models? There is no answer to this question because deep learning and graphical models are orthogonal concepts that can be advantageously combined.” Yann LeCun, 2013

Representation Matters

[Figure: two panels, one in Cartesian coordinates (axes x, y) and one in polar coordinates (axes r, θ).]
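As a small illustration of why the choice of representation can matter, the sketch below uses made-up data (not the data behind the figure). Points on two concentric rings cannot be separated by a threshold on x or y alone, but a single threshold on r separates them after conversion to polar coordinates. The ring radii, the sample count, the threshold and the helper name `to_polar` are assumptions chosen for illustration.

```python
import numpy as np

def to_polar(xy):
    """Convert an (n, 2) array of Cartesian points to polar (r, theta)."""
    r = np.hypot(xy[:, 0], xy[:, 1])
    theta = np.arctan2(xy[:, 1], xy[:, 0])
    return np.stack([r, theta], axis=1)

rng = np.random.default_rng(0)
n = 200  # points per class (arbitrary)

# Class 0 lies inside the unit disc, class 1 on a ring with radius in [2, 3).
angles = rng.uniform(-np.pi, np.pi, size=2 * n)
radii = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
xy = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

# In polar coordinates one threshold on r is enough to separate the classes.
polar = to_polar(xy)
pred = (polar[:, 0] > 1.5).astype(int)
accuracy = float(np.mean(pred == labels))
```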

Classification

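[Figure: an input image is passed to a model, whose output is the label "Beach".]

• $f_\theta : x \in \mathbb{R}^{W \times H} \mapsto y \in \{1, \dots, L\}$

• $f_\theta : x \in \mathbb{R}^{W \times H} \mapsto y = [0, \dots, 1, \dots, 0] \in \{0, 1\}^L$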
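The two output conventions above, a class index in {1, ..., L} and a one-hot vector in {0, 1}^L, carry the same information. A minimal sketch of the conversion follows; the helper names `to_one_hot` and `from_one_hot` and the choice L = 4 are assumptions for illustration.

```python
import numpy as np

def to_one_hot(label, num_classes):
    """Encode a class index in {1, ..., L} as a one-hot vector in {0, 1}^L."""
    y = np.zeros(num_classes, dtype=int)
    y[label - 1] = 1  # labels are 1-based, array indices are 0-based
    return y

def from_one_hot(y):
    """Recover the 1-based class index from a one-hot vector."""
    return int(np.argmax(y)) + 1

L = 4                    # number of classes (arbitrary)
y = to_one_hot(3, L)     # one-hot vector with a 1 in the third position
label = from_one_hot(y)  # back to the class index
```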

Linear Regression

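• Mapping: x = Image, $y = Ax + a$

• Classification: $L^* = \operatorname{argmax}_l \, y_l$

[Figure: single-layer network. Inputs $x_1, \dots, x_N$ and a constant bias input 1 are connected to outputs $y_1, \dots, y_L$ through weights $A_{11}, \dots, A_{NL}$ and biases $a_1, \dots, a_L$.]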
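The mapping and the classification rule can be sketched in a few lines of NumPy. The image size, the number of classes and the random parameters are placeholders, and the function names are hypothetical. Here `A` is stored with shape (L, N) so that `A @ x` is defined for a flattened image `x` of length N.

```python
import numpy as np

def linear_scores(A, a, x):
    """Compute y = A x + a for a flattened image x."""
    return A @ x + a

def classify(y):
    """Return L* = argmax_l y_l as a 1-based class index."""
    return int(np.argmax(y)) + 1

rng = np.random.default_rng(0)
W, H, L = 4, 3, 5                 # image width, height, number of classes (arbitrary)
image = rng.random((H, W))        # stand-in for a real image
x = image.reshape(-1)             # flatten to a vector of length N = W * H
A = rng.normal(size=(L, x.size))  # untrained weights, for illustration only
a = rng.normal(size=L)            # untrained biases

y = linear_scores(A, a, x)
label = classify(y)
```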

Linear Regression

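• Two layers:

    x = Image
    $h = Ax + a$
    $y = Bh + b$

• Is this model better?

    $y = Bh + b = B(Ax + a) + b = \underbrace{BA}_{=C}\, x + \underbrace{Ba + b}_{=c}$

[Figure: two-layer network. Inputs $x_1, \dots, x_N$ and a bias input 1 are connected to hidden units $h_1, \dots, h_M$ through weights $A_{11}, \dots, A_{NM}$ and biases $a_1, \dots, a_M$. The hidden units and a second bias input 1 are connected to outputs $y_1, \dots, y_L$ through weights $B_{11}, \dots, B_{ML}$ and biases $b_1, \dots, b_L$.]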

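• The two layers collapse into a single linear layer: $y = Cx + c$

[Figure: equivalent single-layer network. Inputs $x_1, \dots, x_N$ and a bias input 1 are connected to outputs $y_1, \dots, y_L$ through weights $C_{11}, \dots, C_{NL}$ and biases $c_1, \dots, c_L$.]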
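The collapse of two linear layers into one can be checked numerically. The sketch below uses random parameters with arbitrary sizes N, M and L. It computes the output once through the hidden layer and once through the merged parameters C = BA and c = Ba + b.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 6, 4, 3  # input, hidden and output sizes (arbitrary)

A, a = rng.normal(size=(M, N)), rng.normal(size=M)
B, b = rng.normal(size=(L, M)), rng.normal(size=L)
x = rng.random(N)

# Two linear layers applied one after the other.
h = A @ x + a
y_two_layers = B @ h + b

# The same map written as one linear layer.
C = B @ A
c = B @ a + b
y_one_layer = C @ x + c

same = np.allclose(y_two_layers, y_one_layer)
```

Stacking linear layers therefore adds parameters but no expressive power: the result is still a single affine map of x.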

Logistic Regression

• Mapping: x = Image, $y = \sigma(Ax + a)$

• With (elementwise): $\sigma(x) = \frac{1}{1 + \exp(-x)}$

• Classification: $L^* = \operatorname{argmax}_l \, y_l$

[Figure: plot of $\sigma(x)$ for $x$ from −10 to 10; the vertical axis runs from 0 to 1.]

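[Figure: single-layer network. Inputs $x_1, \dots, x_N$ and a constant bias input 1 are connected to outputs $y_1, \dots, y_L$ through weights $A_{11}, \dots, A_{NL}$ and biases $a_1, \dots, a_L$.]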
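A minimal sketch of the logistic mapping, again with random placeholder parameters and arbitrary sizes. It applies the elementwise sigmoid to the linear scores and then takes the argmax. Because the sigmoid is strictly increasing, the predicted class is the same as for the linear scores; what changes is that every output now lies between 0 and 1.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
N, L = 6, 3  # input size and number of classes (arbitrary)
A, a = rng.normal(size=(L, N)), rng.normal(size=L)
x = rng.random(N)

scores = A @ x + a                # linear part
y = sigmoid(scores)               # y = sigma(Ax + a), each entry in (0, 1)
label = int(np.argmax(y)) + 1     # L* = argmax_l y_l, reported 1-based
```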

Neural Networks

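• Two layers:

    x = Image
    $h = \sigma(Ax + a)$
    $y = \sigma(Bh + b)$

• Now: $y = \sigma(B\,\sigma(Ax + a) + b)$

[Figure: two-layer network. Inputs $x_1, \dots, x_N$ and a bias input 1 are connected to hidden units $h_1, \dots, h_M$ through weights $A_{11}, \dots, A_{NM}$ and biases $a_1, \dots, a_M$. The hidden units and a second bias input 1 are connected to outputs $y_1, \dots, y_L$ through weights $B_{11}, \dots, B_{ML}$ and biases $b_1, \dots, b_L$.]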
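The forward pass of this two-layer network can be sketched as follows. The function name `forward`, the layer sizes and the random parameters are assumptions for illustration. With the sigmoid between the layers, the composition can no longer be merged into a single matrix and bias.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(A, a, B, b, x):
    """Compute y = sigma(B sigma(Ax + a) + b) and return (h, y)."""
    h = sigmoid(A @ x + a)  # hidden layer
    y = sigmoid(B @ h + b)  # output layer
    return h, y

rng = np.random.default_rng(0)
N, M, L = 6, 4, 3  # input, hidden and output sizes (arbitrary)
A, a = rng.normal(size=(M, N)), rng.normal(size=M)
B, b = rng.normal(size=(L, M)), rng.normal(size=L)
x = rng.random(N)

h, y = forward(A, a, B, b, x)
```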

Training Neural Networks

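[Figure: network with inputs $x_1, \dots, x_N$, hidden units $h_1, \dots, h_M$, outputs $y_1, \dots, y_L$ and bias inputs 1. Labels: observed input $x_{obs}$, observed output $y_{obs}$, estimated output $y_{est}$, error $E$.]

D. Rumelhart, G. Hinton and R. Williams: Learning representations by back-propagating errors. Nature, 1986.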
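The quantities named in the figure (x_obs, y_obs, y_est, E) can be tied together in a small gradient-descent sketch. The slide does not give the definition of E, so the code assumes a squared error E = ½‖y_est − y_obs‖², together with the two sigmoid layers from the previous slides. The layer sizes, the learning rate `step`, the number of iterations and all function names are illustrative choices, not taken from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    """Return hidden activations h and the estimate y_est."""
    A, a, B, b = params
    h = sigmoid(A @ x + a)
    return h, sigmoid(B @ h + b)

def error(params, x_obs, y_obs):
    """Assumed error (not from the slide): E = 0.5 * ||y_est - y_obs||^2."""
    _, y_est = forward(params, x_obs)
    return 0.5 * np.sum((y_est - y_obs) ** 2)

def gradients(params, x_obs, y_obs):
    """Back-propagate the error to get dE/dA, dE/da, dE/dB, dE/db."""
    A, a, B, b = params
    h, y_est = forward(params, x_obs)
    delta_y = (y_est - y_obs) * y_est * (1.0 - y_est)  # dE/d(Bh + b)
    delta_h = (B.T @ delta_y) * h * (1.0 - h)          # dE/d(Ax + a)
    return [np.outer(delta_h, x_obs), delta_h, np.outer(delta_y, h), delta_y]

rng = np.random.default_rng(0)
N, M, L = 4, 3, 2  # input, hidden and output sizes (arbitrary)
params = [rng.normal(size=(M, N)), rng.normal(size=M),
          rng.normal(size=(L, M)), rng.normal(size=L)]
x_obs = rng.random(N)
y_obs = np.array([1.0, 0.0])

step = 0.5   # learning rate (arbitrary)
errors = []
for _ in range(200):
    errors.append(error(params, x_obs, y_obs))
    grads = gradients(params, x_obs, y_obs)
    params = [p - step * g for p, g in zip(params, grads)]
```

Each iteration runs a forward pass to obtain y_est, measures the error against y_obs, propagates the error derivatives backwards through both layers, and takes a small step against the gradient.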

Convolutional Neural Networks

• Convolution filters with shared parameters

• Subsampling / pooling / unpooling
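The first two ingredients can be sketched in plain NumPy: a single filter whose weights are reused at every image position, followed by non-overlapping max pooling as one form of subsampling. This is an illustrative sketch, not the implementation behind the demo linked below. The function names, the 6×6 test image and the 2×2 kernel are assumptions, and unpooling is not covered. Like most deep learning libraries, the filter is applied as a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply one filter at every valid position (no padding).

    The same kernel weights are reused everywhere: shared parameters.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)  # toy image with entries 6*i + j
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                  # difference along the diagonal

features = conv2d_valid(image, kernel)  # shape (5, 5)
pooled = max_pool(features)             # shape (2, 2)
```

The filter has only four parameters regardless of the image size, whereas a fully connected layer mapping the 36 input pixels to the 25 outputs would need 36 × 25 weights.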

Try yourself: www.cvlibs.net/learn

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner: Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, Vol. 86, no. 11, pp. 2278–2324.

Depth Matters

ImageNet classification top-5 error (%):

    Entry                  Depth         Top-5 error (%)
    ILSVRC'10              shallow       28.2
    ILSVRC'11              shallow       25.8
    ILSVRC'12 AlexNet      8 layers      16.4
    ILSVRC'13              8 layers      11.7
    ILSVRC'14 VGG          19 layers     7.3
    ILSVRC'14 GoogleNet    22 layers     6.7
    ILSVRC'15 ResNet       152 layers    3.57

K. He, X. Zhang, S. Ren, and J. Sun: Deep Residual Learning for Image Recognition. CVPR, 2016. Best Paper Award.

Feature Visualization

M. Zeiler and R. Fergus: Visualizing and Understanding Conv. Networks. ECCV, 2014.

Image Captioning

O. Vinyals, A. Toshev, S. Bengio and D. Erhan: Show and Tell: A Neural Image Caption Generator. CVPR, 2015.

Graphical Models vs. Deep Learning

Graphical Models

• Probabilistic

• Dependencies between random variables

• Low capacity

• Domain knowledge: easy to incorporate

Deep Neural Networks

• Deterministic

• Input/output mapping

• High capacity

• Domain knowledge: hard to incorporate

Combinations:

D. Kingma and M. Welling: Auto-encoding variational Bayes. ICLR, 2014.

L. Chen, A. Schwing and R. Urtasun: Learning Deep Structured Models. ICML, 2015.

J. Domke: Learning graphical model parameters with approximate marginal inference. PAMI, 2013, Vol. 35, no. 10, pp. 2454–2467.
