
07 cnn architectures - George Mason University
kosecka/cs747/07_cnn_architectures.pdf


Page 1

Adopted from: © 2019 Philipp Krähenbühl and Chao-Yuan Wu

CNN architectures and visualization

AlexNet

• Won the ImageNet competition in 2012

• Start of deep learning revolution in computer vision

[Figure: ImageNet challenge top-5 error by year. 2010 and 2011: non-deep methods; 2012: AlexNet (8 layers); 2013: ZFNet (8 layers); 2014: VGG/GoogLeNet (19-22 layers); 2015: ResNet (152 layers). Error axis runs 0-30%.]

ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012

Page 2

AlexNet

Input: 227x227. The slide shows the original two-tower (two-GPU) layout, so channel counts marked "per branch" are half the totals:

• Conv 11x11, stride=4, C=96, no pad
• LRN
• Max Pool 3x3, pad=0, stride=2
• Conv 5x5, pad, C=128 per branch (256 total)
• LRN
• Max Pool 3x3, pad=0, stride=2
• Conv 3x3, pad, C=384
• Conv 3x3, pad, C=192 per branch (384 total)
• Conv 3x3, pad, C=128 per branch (256 total)
• Max Pool 3x3, pad=0, stride=2
• Linear (4096)
• Linear (4096)
• Linear (1000)
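The same network as a minimal single-tower PyTorch sketch (a reconstruction, not the original two-GPU code: the branch pairs are folded into full-width layers, the ReLUs the slide omits are added after each conv/linear layer, and dropout is left out):

```python
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227 -> 55
            nn.ReLU(),
            nn.LocalResponseNorm(5),
            nn.MaxPool2d(3, stride=2),                     # 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.LocalResponseNorm(5),
            nn.MaxPool2d(3, stride=2),                     # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(3, stride=2),                     # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # 6*6*256 = 9216
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sanity check against the activation-map sizes below.
print(AlexNet()(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])
```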

Activation maps in AlexNet

55 x 55 x 96
27 x 27 x 256
13 x 13 x 384
13 x 13 x 384
13 x 13 x 256
6 x 6 x 256
9216 (flattened) → 4096 → 4096 → 1000

Page 3

Activation maps in AlexNet

Layer     Width   Channels
input     227     3
conv 1    55      96
conv 2    27      256
conv 3    13      384
conv 4    13      384
conv 5    13      256
maxpool   6       256
fc 6      1       4096
fc 7      1       4096
fc 8      1       1000

(Plotted on log scales in the original slide.)

Parameters and computation

Layer    FLOPs (M)   Params (M)
conv 1   106         0.0
conv 2   224         0.3
conv 3   150         0.9
conv 4   112         0.7
conv 5   75          0.4
fc 6     38          37.8
fc 7     17          16.8
fc 8     4           4.1
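A quick way to reproduce the parameter column over the sketch above (note the conv 2/4/5 counts come out roughly double the slide's, since the slide counts the split two-tower layers while the sketch is single-tower; the FC numbers match):

```python
import torch.nn as nn

model = AlexNet()  # class from the sketch above
for name, m in model.named_modules():
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        n = sum(p.numel() for p in m.parameters())
        print(f"{name:16s} {n / 1e6:6.1f} M")  # FC layers dominate (~59 of ~62M)
```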

Page 4

What’s happening inside the black box?

[Diagram: a network mapping an input image to the prediction “cat”]

• Visualize first-layer weights directly
• Not too helpful for subsequent layers

Features from a CIFAR10 network, via Stanford CS231n
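A sketch of the first-layer visualization, assuming the AlexNet sketch above (with trained weights the 11x11 filters show oriented edges and color blobs; at random initialization they are just noise):

```python
import matplotlib.pyplot as plt

w = model.features[0].weight.detach()     # (96, 3, 11, 11) first-layer filters
w = (w - w.min()) / (w.max() - w.min())   # rescale to [0, 1] for display
fig, axes = plt.subplots(8, 12, figsize=(12, 8))
for ax, f in zip(axes.flat, w):
    ax.imshow(f.permute(1, 2, 0))         # CHW -> HWC, shown as a tiny RGB image
    ax.axis("off")
plt.show()
```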

Page 5

What’s happening inside the black box?

• Visualize maximally activating patches: pick a unit, run many images through the network, and show the patches that produce the highest output values for that unit
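A sketch of the bookkeeping, assuming `model` from the AlexNet sketch above and a DataLoader `loader` of (image, label) batches; channel 42 is arbitrary, and cropping the actual patch from the unit's receptive field is omitted for brevity:

```python
import heapq
import torch

unit, top, acts = 42, [], {}
h = model.features[12].register_forward_hook(         # features[12] is conv 5
    lambda mod, inp, out: acts.__setitem__("a", out))
with torch.no_grad():
    for batch_idx, (imgs, _) in enumerate(loader):
        model(imgs)
        scores = acts["a"][:, unit].amax(dim=(1, 2))  # peak response per image
        for j, s in enumerate(scores.tolist()):
            heapq.heappush(top, (s, batch_idx, j))
            if len(top) > 9:
                heapq.heappop(top)                    # keep the 9 strongest
h.remove()  # `top` now indexes the 9 most strongly activating images
```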

Page 6

What’s happening inside the black box?

What about FC layers?

• Visualize nearest-neighbor images according to their activation vectors (Source: Stanford CS231n)
• Fancy dimensionality reduction, e.g., t-SNE (Source: Andrej Karpathy)
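A minimal t-SNE sketch with scikit-learn; the random matrix stands in for (N, 4096) fc7 activations collected by running images through the network:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

feats = np.random.randn(500, 4096).astype(np.float32)  # stand-in fc7 features
xy = TSNE(n_components=2, perplexity=30).fit_transform(feats)
plt.scatter(xy[:, 0], xy[:, 1], s=4)  # nearby points = similar activations
plt.show()
```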

Page 7

What’s happening inside the black box?

• Visualize activations for a particular image
• Visualize the pixel values responsible for the activation:
  1. Unpool
  2. Rectify
  3. Deconvolve
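The deconvnet itself is not reproduced here; a closely related sketch is guided backpropagation (Springenberg et al., listed on the next page), which implements the "rectify" step with backward hooks on the ReLUs. It assumes `model` from the sketches above (non-inplace ReLUs, as used there) and a normalized (1, 3, 227, 227) input `img`:

```python
import torch
import torch.nn as nn

# Pass gradient through each ReLU only where it is positive; combined with the
# ReLU's own forward mask, this is guided backprop.
def guided_relu_hook(module, grad_input, grad_output):
    return (grad_input[0].clamp(min=0.0),)

hooks = [m.register_full_backward_hook(guided_relu_hook)
         for m in model.modules() if isinstance(m, nn.ReLU)]

img = img.clone().requires_grad_(True)
model(img)[0].max().backward()        # backprop the top class score
saliency = img.grad[0].abs().amax(0)  # per-pixel responsibility map
for h in hooks:
    h.remove()
```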

Page 8

Deep visualization toolbox

J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, ICML DL workshop, 2015

YouTube video

Additional visualization techniques

A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016

K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014

J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015

Page 9

ZFNet

• Won the ImageNet competition in 2013

• Nice analysis and visualization of AlexNet

• Introduced up-convolutions

[Figure: ImageNet challenge top-5 error by year, as on the AlexNet slide]

Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014

AlexNet to ZFNet

AlexNet (input 227x227):
• Conv 11x11 (stride=4, C=96) → LRN → Max Pool 3x3 (pad=0, stride=2)
• Conv 5x5 (C=128 per branch) → LRN → Max Pool 3x3 (pad=0, stride=2)
• Conv 3x3 (C=384) → Conv 3x3 (C=192 per branch) → Conv 3x3 (C=128 per branch) → Max Pool 3x3 (pad=0, stride=2)
• Linear (4096) → Linear (4096) → Linear (1000)

ZFNet (input 227x227):
• Conv 7x7 (stride=2, C=96) → LRN → Max Pool 3x3 (stride=2)
• Conv 5x5 (stride=2, C=256) → LRN → Max Pool 3x3 (pad=0, stride=2)
• Conv 3x3 (C=512) → Conv 3x3 (C=1024) → Conv 3x3 (C=512) → Max Pool 3x3 (pad=0, stride=2)
• Linear (4096) → Linear (4096) → Linear (1000)

Page 10

Activation maps in ZFNet

          AlexNet            ZFNet
Layer     Width  Channels    Width  Channels
input     227    3           225    3
conv 1    55     96          110    96
conv 2    27     256         26     256
conv 3    13     384         13     512
conv 4    13     384         13     1024
conv 5    13     256         13     512
maxpool   6      256         6      512
fc 6      1      4096        1      4096
fc 7      1      4096        1      4096
fc 8      1      1000        1      1000

Parameters and computation

         FLOPs (M)           Params (M)
Layer    AlexNet   ZFNet     AlexNet   ZFNet
conv 1   106       172       0.0       0.0
conv 2   224       416       0.3       0.6
conv 3   150       199       0.9       1.2
conv 4   112       798       0.7       4.7
conv 5   75        798       0.4       4.7
fc 6     38        76        37.8      75.5
fc 7     17        17        16.8      16.8
fc 8     4         4         4.1       4.1

Page 11

VGG

• A deeper AlexNet/ZFNet

[Figure: ImageNet challenge top-5 error by year, as on the AlexNet slide]

Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015

ZFNet to VGG

ZFNet (as above):
• Conv 7x7 (stride=2, C=96) → LRN → Max Pool 3x3 (stride=2)
• Conv 5x5 (stride=2, C=256) → LRN → Max Pool 3x3 (pad=0, stride=2)
• Conv 3x3 (C=512) → Conv 3x3 (C=1024) → Conv 3x3 (C=512) → Max Pool 3x3 (pad=0, stride=2)
• Linear (4096) → Linear (4096) → Linear (1000)

VGG-16:
• Conv 3x3 (C=64) x2 → Max Pool 2x2 (stride=2)
• Conv 3x3 (C=128) x2 → Max Pool 2x2 (stride=2)
• Conv 3x3 (C=256) x3 → Max Pool 2x2 (stride=2)
• Conv 3x3 (C=512) x3 → Max Pool 2x2 (stride=2)
• Conv 3x3 (C=512) x3 → Max Pool 2x2 (stride=2)
• Linear (4096) → Linear (4096) → Linear (1000)
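Since VGG is just a regular pattern of 3x3 convs and 2x2 pools, it is natural to build it from a config list; a sketch ("M" marks a 2x2, stride-2 max pool):

```python
import torch
import torch.nn as nn

cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]

def make_features(cfg, c_in=3):
    layers = []
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(2, stride=2))
        else:
            layers += [nn.Conv2d(c_in, v, 3, padding=1), nn.ReLU()]
            c_in = v
    return nn.Sequential(*layers)

feats = make_features(cfg)
print(feats(torch.randn(1, 3, 224, 224)).shape)  # (1, 512, 7, 7), then the FCs
```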

Page 12

Insights in VGG

• Why use smaller filters?
• Factorization: Conv 3x3 → Conv 3x3 ≈ Conv 5x5. Two stacked 3x3 convolutions cover the same 5x5 receptive field with fewer parameters, and allow an extra non-linearity in between (see the sketch below).
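A parameter count makes the factorization argument concrete: 2·(3·3)·C² = 18C² weights versus (5·5)·C² = 25C² (C=64 here is illustrative):

```python
import torch.nn as nn

C = 64
five  = nn.Conv2d(C, C, 5, padding=2, bias=False)
three = nn.Sequential(nn.Conv2d(C, C, 3, padding=1, bias=False),
                      nn.Conv2d(C, C, 3, padding=1, bias=False))
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(five), count(three))  # 102400 vs 73728
```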

Training VGG

• Vanishing gradients: a plain stack this deep is hard to train directly (the VGG authors first trained a shallower variant and used it to initialize the deeper models)

[VGG-16 stack shown again alongside, as on the previous slide]

Page 13

Parameters and computation

Layer      FLOPs (G)  Params (M)
conv 1.1   0.1        0.0
conv 1.2   1.9        0.0
conv 2.1   0.9        0.1
conv 2.2   1.9        0.1
conv 3.1   0.9        0.3
conv 3.2   1.9        0.6
conv 3.3   1.9        0.6
conv 4.1   0.9        1.2
conv 4.2   1.9        2.4
conv 4.3   1.9        2.4
conv 5.1   0.5        2.4
conv 5.2   0.5        2.4
conv 5.3   0.5        2.4
fc 6       0.1        102.8
fc 7       0.0        16.8
fc 8       0.0        4.1

1x1 convolutions and factorization

• Convolutions are linear operators

• Can we make them non-linear?

Network-in-Network, Lin et al., ICLR 2014

Figure source: "Network-in-Network", ICLR 2014 paper

Page 14

Factorized convolution

• Implements a “non-linear” convolution
• Less computation, fewer parameters

Conv 3x3 → ReLU → Conv 1x1
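A sketch of the block; the channel counts (64 in, 128 out) are illustrative:

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # spatial filtering
    nn.ReLU(),                                    # the added non-linearity
    nn.Conv2d(64, 128, kernel_size=1),            # per-pixel channel mixing
)
```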

Inception architecture

• GoogLeNet
• Evolution of Network-in-Network

Going deeper with convolutions, Szegedy et al., CVPR 2015

Inception module: four parallel branches over the same input, concatenated along channels (Concat):
• Conv 1x1
• Conv 1x1 → Conv 3x3
• Conv 1x1 → Conv 5x5
• Max Pool 3x3 → Conv 1x1
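A sketch of one module; the example widths are the paper's first ("3a") module, and the 1x1 convs reduce channels before the expensive 3x3/5x5 filters:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, c_in, c1, c3r, c3, c5r, c5, cp):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(c_in, c1, 1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(c_in, c3r, 1), nn.ReLU(),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(c_in, c5r, 1), nn.ReLU(),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU())
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1), nn.ReLU())

    def forward(self, x):
        # all branches keep the spatial size, so outputs concat along channels
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)

m = Inception(192, 64, 96, 128, 16, 32, 32)  # GoogLeNet's inception (3a)
print(m(torch.randn(1, 192, 28, 28)).shape)  # (1, 64+128+32+32=256, 28, 28)
```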

Page 15

Inception architecture

[Figure: the full GoogLeNet network, a stack of Inception modules]

Residual Networks

• Uses residual connections to build deeper networks

• Activation maps are additive

[Figure: ImageNet challenge top-5 error by year, as on the AlexNet slide]

Deep Residual Learning for Image Recognition, He et al., CVPR 2016

Page 16

Residual blocks

• Add shortcut connections for gradients
• Shortcut is the identity, or a strided 1x1 convolution when the shape changes

x_l → Conv → BN → ReLU → Conv → BN → (+ x_l via the shortcut) → ReLU → x_{l+1}
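A sketch of this basic block:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
        )
        # identity shortcut, or a strided 1x1 conv when the shape changes
        self.shortcut = (nn.Identity() if stride == 1 and c_in == c_out else
                         nn.Conv2d(c_in, c_out, 1, stride=stride, bias=False))

    def forward(self, x):                                   # x = x_l
        return torch.relu(self.body(x) + self.shortcut(x))  # = x_{l+1}

print(ResBlock(64, 128, stride=2)(torch.randn(1, 64, 56, 56)).shape)
# torch.Size([1, 128, 28, 28])
```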

ResNet

• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001

ResNet-152:
• Conv 7x7 (64)
• Max Pool 3x3
• [Conv 1x1 (64) → Conv 3x3 (64) → Conv 1x1 (256)] x3
• [Conv 1x1 (128) → Conv 3x3 (128) → Conv 1x1 (512)] x8
• [Conv 1x1 (256) → Conv 3x3 (256) → Conv 1x1 (1024)] x36
• [Conv 1x1 (512) → Conv 3x3 (512) → Conv 1x1 (2048)] x3
• Avg Pool
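The deeper variants repeat a 1x1-3x3-1x1 bottleneck instead of two 3x3 convs; a sketch of one stage's block (e.g. the 64-64-256 stage above), which plugs into the same shortcut-and-sum pattern as the ResBlock sketch:

```python
import torch.nn as nn

def bottleneck(c_in, c_mid, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False),   # reduce channels
        nn.BatchNorm2d(c_mid), nn.ReLU(),
        nn.Conv2d(c_mid, c_mid, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(c_mid), nn.ReLU(),
        nn.Conv2d(c_mid, c_out, 1, bias=False),  # expand channels
        nn.BatchNorm2d(c_out),
    )
```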