Adapted from: © 2019 Philipp Krähenbühl and Chao-Yuan Wu
CNN architectures and visualization
AlexNet
• Won the ImageNet competition in 2012
• Start of deep learning revolution in computer vision
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012
AlexNet
[Figure: the AlexNet layer stack, input 227x227x3. Layers drawn as two columns run as two groups, one per GPU:
Conv 11x11 (stride=4, no pad, C=96), LRN, Max Pool 3x3 (stride=2),
Conv 5x5 (two groups, C=128 each), LRN, Max Pool 3x3 (pad=0, stride=2),
Conv 3x3 (C=384),
Conv 3x3 (two groups, C=192 each),
Conv 3x3 (two groups, C=128 each), Max Pool 3x3 (pad=0, stride=2),
Linear (4096), Linear (4096), Linear (1000)]
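To make the stack concrete, here is a minimal PyTorch sketch of the same layer sequence. This is an illustration, not the authors' code: the two-GPU grouping is folded into single convolutions, so the channel counts below are the group totals.

```python
import torch
import torch.nn as nn

# Minimal single-branch AlexNet sketch; LRN kept for fidelity even though
# it is rarely used today. Paddings are chosen to reproduce the activation
# sizes on the next slide (55 -> 27 -> 13 -> 6).
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.LocalResponseNorm(5),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.LocalResponseNorm(5),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

x = torch.randn(1, 3, 227, 227)
print(alexnet(x).shape)  # torch.Size([1, 1000])
```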
Activation maps in AlexNet
[Figure: the same network annotated with activation shapes: 55x55x96, 27x27x256, 13x13x384, 13x13x384, 13x13x256, 6x6x256, then 9216 (flattened), 4096, 4096, 1000]
Activation maps in AlexNet
Layer:     input  conv 1  conv 2  conv 3  conv 4  conv 5  maxpool  fc 6  fc 7  fc 8
Width:     227    55      27      13      13      13      6        1     1     1
Channels:  3      96      256     384     384     256     256      4096  4096  1000
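The widths and channel counts above can be reproduced with forward hooks; a short sketch, assuming the `alexnet` module from the earlier code block:

```python
import torch
import torch.nn as nn

# Print each layer's output shape with forward hooks (assumes the
# `alexnet` nn.Sequential defined in the sketch above).
def print_shapes(model, x):
    hooks = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.MaxPool2d, nn.Linear)):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, n=name: print(n, tuple(out.shape))))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()

print_shapes(alexnet, torch.randn(1, 3, 227, 227))
# e.g. "0 (1, 96, 55, 55)" for conv 1 ... "20 (1, 1000)" for fc 8
```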
Parameters and computation
Layer:           conv 1  conv 2  conv 3  conv 4  conv 5  fc 6  fc 7  fc 8
FLOPs (M):       106     224     150     112     75      38    17    4
Parameters (M):  0.0     0.3     0.9     0.7     0.4     37.8  16.8  4.1
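These numbers are easy to sanity-check by hand. A back-of-the-envelope helper (my own convention: one multiply-add per weight per output position, ignoring biases and the two-GPU grouping):

```python
# Parameters and multiply-adds of a single convolution layer.
def conv_cost(c_in, c_out, k, h_out, w_out):
    params = c_out * c_in * k * k
    flops = params * h_out * w_out  # one multiply-add per weight per output pixel
    return params, flops

p, f = conv_cost(3, 96, 11, 55, 55)
print(p / 1e6, f / 1e6)  # ~0.03 M params, ~105 M FLOPs: matches conv 1 above
```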
What’s happening inside the black box?
Visualize first-layer weights directly.
[Figure: first-layer filters from a CIFAR10 network, via Stanford CS231n]
Not too helpful for subsequent layers.
What’s happening inside the black box?
Visualize maximally activating patches: pick a unit, run many images through the network, and visualize the patches that produce the highest output values.
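A hedged sketch of the procedure, again assuming the `alexnet` module from before; `dataset` is a placeholder for any collection of (image, label) pairs, and the unit index is arbitrary:

```python
import heapq
import torch

# Hook one conv 3 channel and keep the 9 images that excite it most;
# cropping the receptive field around the argmax gives the actual patch.
acts = {}
unit = 17                           # arbitrary channel index to inspect
alexnet[8].register_forward_hook(   # index 8 = conv 3 in the sketch above
    lambda m, i, o: acts.update(value=o[0, unit].max().item()))

top = []                            # min-heap of (activation, image index)
for idx, (img, _) in enumerate(dataset):
    with torch.no_grad():
        alexnet(img.unsqueeze(0))
    heapq.heappush(top, (acts['value'], idx))
    if len(top) > 9:
        heapq.heappop(top)
```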
What’s happening inside the black box?
What about FC layers?
Visualize nearest-neighbor images according to their activation vectors. (Source: Stanford CS231n)
Or apply fancy dimensionality reduction, e.g., t-SNE. (Source: Andrej Karpathy)
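One common recipe for the t-SNE view; a sketch, assuming `features` is an N x 4096 NumPy array of fc 7 activations collected from a dataset:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Embed the 4096-d fc 7 activation vectors into 2-D and scatter-plot them;
# semantically similar images end up near each other.
coords = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.show()
```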
What’s happening inside the black box?
Visualize activations for a particular image.
Visualize the pixel values responsible for an activation:
1. Unpool
2. Rectify
3. Deconvolve
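The deconvnet itself takes some machinery; a closely related gradient-based variant (the saliency map of Simonyan et al., cited below) is only a few lines. A sketch, assuming the `alexnet` module from before with trained weights:

```python
import torch

# Backpropagate a class score to the input pixels; the gradient magnitude
# per pixel is a crude map of which pixels drive the activation.
img = torch.randn(1, 3, 227, 227, requires_grad=True)  # placeholder input
score = alexnet(img)[0, 281]   # class 281 = "tabby cat" in ImageNet
score.backward()
saliency = img.grad.abs().max(dim=1)[0]  # (1, 227, 227) importance map
```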
Deep visualization toolbox
J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, ICML DL workshop, 2015
Demo: YouTube video
Additional visualization techniques
A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016
K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014
J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015
ZFNet
• Won the ImageNet competition in 2013
• Nice analysis and visualization of AlexNet
• Introduced up-convolutions (the deconvolutions used in its visualizations)
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014
AlexNet to ZFNet
[Figure: AlexNet (left) vs. ZFNet (right), both on 227x227 inputs.
AlexNet: Conv 11x11 (stride=4, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (two groups, C=128 each), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=384), Conv 3x3 (two groups, C=192 each), Conv 3x3 (two groups, C=128 each), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
ZFNet: Conv 7x7 (stride=2, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (stride=2, C=256), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=512), Conv 3x3 (C=1024), Conv 3x3 (C=512), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
Main changes: a smaller, less aggressive stem (7x7 stride 2 instead of 11x11 stride 4), no grouped convolutions, and wider conv 3-5 (512/1024/512 vs. 384/384/256).]
Activation maps in ZFNet
Layer:               input  conv 1  conv 2  conv 3  conv 4  conv 5  maxpool  fc 6  fc 7  fc 8
Width (AlexNet):     227    55      27      13      13      13      6        1     1     1
Width (ZFNet):       225    110     26      13      13      13      6        1     1     1
Channels (AlexNet):  3      96      256     384     384     256     256      4096  4096  1000
Channels (ZFNet):    3      96      256     512     1024    512     512      4096  4096  1000
Parameters and computation
Layer:                    conv 1  conv 2  conv 3  conv 4  conv 5  fc 6  fc 7  fc 8
FLOPs (M), AlexNet:       106     224     150     112     75      38    17    4
FLOPs (M), ZFNet:         172     416     199     798     798     76    17    4
Parameters (M), AlexNet:  0.0     0.3     0.9     0.7     0.4     37.8  16.8  4.1
Parameters (M), ZFNet:    0.0     0.6     1.2     4.7     4.7     75.5  16.8  4.1
VGG
• A deeper AlexNet/ZFNet
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015
ZFNet to VGG
[Figure: ZFNet (left) vs. VGG-16 (right).
ZFNet: Conv 7x7 (stride=2, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (stride=2, C=256), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=512), Conv 3x3 (C=1024), Conv 3x3 (C=512), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
VGG-16: five stages of 3x3 convolutions, each stage ending in a Max Pool 2x2 (stride=2):
2x Conv 3x3 (C=64), pool; 2x Conv 3x3 (C=128), pool; 3x Conv 3x3 (C=256), pool; 3x Conv 3x3 (C=512), pool; 3x Conv 3x3 (C=512), pool; then Linear (4096), Linear (4096), Linear (1000).]
Insights in VGG
• Why use smaller filters?
• Factorization: two stacked Conv 3x3 layers cover the same receptive field as one Conv 5x5, with fewer parameters and an extra non-linearity (see the sketch below)
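A quick parameter count makes the trade explicit (assuming C input and output channels, biases ignored):

```python
# Two stacked 3x3 convolutions see the same 5x5 receptive field as one
# 5x5 convolution, but with 18*C^2 weights instead of 25*C^2 (and an
# extra ReLU in between). Example with C = 256 channels:
C = 256
print(2 * 3 * 3 * C * C)  # two 3x3 convs: 1,179,648 weights
print(5 * 5 * C * C)      # one 5x5 conv:  1,638,400 weights
```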
Training VGG
• Vanishing gradients make such deep stacks hard to train
[Figure: the VGG-16 architecture, repeated from the previous slide]
Parameters and computation
Layer      FLOPs (G)  Parameters (M)
conv 1.1   0.1        0.0
conv 1.2   1.9        0.0
conv 2.1   0.9        0.1
conv 2.2   1.9        0.1
conv 3.1   0.9        0.3
conv 3.2   1.9        0.6
conv 3.3   1.9        0.6
conv 4.1   0.9        1.2
conv 4.2   1.9        2.4
conv 4.3   1.9        2.4
conv 5.1   0.5        2.4
conv 5.2   0.5        2.4
conv 5.3   0.5        2.4
fc 6       0.1        102.8
fc 7       0.0        16.8
fc 8       0.0        4.1
1x1 convolutions and factorization
• Convolutions are linear operators
• Can we make them non-linear?
Network-in-Network, Lin et al., ICLR 2014
Factorized convolution
• Implements a “non-linear” convolution
• Less computation, fewer parameters
[Diagram: Conv 3x3 → ReLU → Conv 1x1; a sketch follows below]
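A minimal PyTorch sketch of such a block; the channel sizes are illustrative assumptions, not values from the paper:

```python
import torch.nn as nn

# Factorized "non-linear" convolution: a 3x3 spatial convolution into a
# narrow bottleneck, then a 1x1 convolution that mixes channels per pixel.
factorized = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=3, padding=1),  # spatial filtering, fewer channels
    nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),             # non-linear per-pixel channel mixing
)
# 256*64*9 + 64*256 = ~164k weights, vs. 256*256*9 = ~590k for a plain 3x3
```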
Inception architecture
• GoogLeNet
• Evolution of Network-in-Network
Going deeper with convolutions, Szegedy et al., CVPR 2015
[Figure: one inception module: four parallel branches (Conv 1x1; Conv 1x1 then Conv 3x3; Conv 1x1 then Conv 5x5; Max Pool 3x3 then Conv 1x1), concatenated along the channel dimension; a sketch follows below]
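A sketch of one module in PyTorch. The branch widths below are the ones GoogLeNet uses in its first inception module (3a), but treat the exact numbers as illustrative:

```python
import torch
import torch.nn as nn

# One inception module: four parallel branches, concatenated on channels.
# The 1x1 convolutions cut the channel count before the expensive 3x3/5x5.
class Inception(nn.Module):
    def __init__(self, c_in=192):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, 1)
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, 32, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)

print(Inception()(torch.randn(1, 192, 28, 28)).shape)  # (1, 256, 28, 28)
```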
Inception architecture
[Figure: the full GoogLeNet network, a deep stack of inception modules]
Residual Networks
• Uses residual connections to build deeper networks
• Activation maps are additive
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Deep Residual Learning for Image Recognition, He et al., CVPR 2016
Residual blocks
• Add shortcut connections for gradients:
• Identity
• Strided 1x1 convolution (when the shape changes)
[Diagram: x_l → Conv → BN → ReLU → Conv → BN → add x_l via the shortcut → ReLU → x_{l+1}; i.e., x_{l+1} = ReLU(F(x_l) + x_l). A sketch follows below.]
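A minimal sketch of this basic block in PyTorch, for the identity-shortcut case only (the strided/projection shortcut is omitted):

```python
import torch
import torch.nn as nn

# Basic residual block: x_{l+1} = ReLU(x_l + F(x_l)),
# with F = Conv-BN-ReLU-Conv-BN.
class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.f(x))  # identity shortcut
```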
ResNet
• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001
[Figure: ResNet-152: Conv 7x7 (64), Max Pool 3x3, then four stages of bottleneck blocks:
[Conv 1x1 (64) → Conv 3x3 (64) → Conv 1x1 (256)] x3
[Conv 1x1 (128) → Conv 3x3 (128) → Conv 1x1 (512)] x8
[Conv 1x1 (256) → Conv 3x3 (256) → Conv 1x1 (1024)] x36
[Conv 1x1 (512) → Conv 3x3 (512) → Conv 1x1 (2048)] x3
followed by Avg Pool]
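The same stage structure is what torchvision ships; a quick check:

```python
from torchvision.models import resnet152

# ResNet-152's four stages use [3, 8, 36, 3] bottleneck blocks,
# matching the x3 / x8 / x36 / x3 repeats above.
model = resnet152()
print([len(s) for s in (model.layer1, model.layer2, model.layer3, model.layer4)])
# [3, 8, 36, 3]
```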