Adapted from: © 2019 Philipp Krähenbühl and Chao-Yuan Wu
CNN architectures and visualization
AlexNet
• Won the ImageNet competition in 2012
• Start of deep learning revolution in computer vision
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al., NIPS 2012
AlexNet
[Figure: the AlexNet layer stack, input 227x227x3. Layers drawn as two columns run as two groups, one per GPU:
Conv 11x11 (stride=4, no pad, C=96), LRN, Max Pool 3x3 (stride=2),
Conv 5x5 (two groups, C=128 each), LRN, Max Pool 3x3 (pad=0, stride=2),
Conv 3x3 (C=384),
Conv 3x3 (two groups, C=192 each),
Conv 3x3 (two groups, C=128 each), Max Pool 3x3 (pad=0, stride=2),
Linear (4096), Linear (4096), Linear (1000)]
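To make the stack concrete, here is a minimal PyTorch sketch of the same layer sequence. This is an illustration, not the authors' code: the two-GPU grouping is folded into single convolutions, so the channel counts below are the group totals.

```python
import torch
import torch.nn as nn

# Minimal single-branch AlexNet sketch; LRN kept for fidelity even though
# it is rarely used today. Paddings are chosen to reproduce the activation
# sizes on the next slide (55 -> 27 -> 13 -> 6).
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.LocalResponseNorm(5),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.LocalResponseNorm(5),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)

x = torch.randn(1, 3, 227, 227)
print(alexnet(x).shape)  # torch.Size([1, 1000])
```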
Activation maps in AlexNet
[Figure: the same network annotated with activation shapes: 55x55x96, 27x27x256, 13x13x384, 13x13x384, 13x13x256, 6x6x256, then 9216 (flattened), 4096, 4096, 1000]
Activation maps in AlexNet
Layer:     input  conv 1  conv 2  conv 3  conv 4  conv 5  maxpool  fc 6  fc 7  fc 8
Width:     227    55      27      13      13      13      6        1     1     1
Channels:  3      96      256     384     384     256     256      4096  4096  1000
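The widths and channel counts above can be reproduced with forward hooks; a short sketch, assuming the `alexnet` module from the earlier code block:

```python
import torch
import torch.nn as nn

# Print each layer's output shape with forward hooks (assumes the
# `alexnet` nn.Sequential defined in the sketch above).
def print_shapes(model, x):
    hooks = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.MaxPool2d, nn.Linear)):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, n=name: print(n, tuple(out.shape))))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()

print_shapes(alexnet, torch.randn(1, 3, 227, 227))
# e.g. "0 (1, 96, 55, 55)" for conv 1 ... "20 (1, 1000)" for fc 8
```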
Parameters and computation
Layer:           conv 1  conv 2  conv 3  conv 4  conv 5  fc 6  fc 7  fc 8
FLOPs (M):       106     224     150     112     75      38    17    4
Parameters (M):  0.0     0.3     0.9     0.7     0.4     37.8  16.8  4.1
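These numbers are easy to sanity-check by hand. A back-of-the-envelope helper (my own convention: one multiply-add per weight per output position, ignoring biases and the two-GPU grouping):

```python
# Parameters and multiply-adds of a single convolution layer.
def conv_cost(c_in, c_out, k, h_out, w_out):
    params = c_out * c_in * k * k
    flops = params * h_out * w_out  # one multiply-add per weight per output pixel
    return params, flops

p, f = conv_cost(3, 96, 11, 55, 55)
print(p / 1e6, f / 1e6)  # ~0.03 M params, ~105 M FLOPs: matches conv 1 above
```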
What’s happening inside the black box?
Visualize first-layer weights directly.
[Figure: first-layer filters from a CIFAR10 network, via Stanford CS231n]
Not too helpful for subsequent layers.
What’s happening inside the black box?
Visualize maximally activating patches: pick a unit, run many images through the network, and visualize the patches that produce the highest output values.
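A hedged sketch of the procedure, again assuming the `alexnet` module from before; `dataset` is a placeholder for any collection of (image, label) pairs, and the unit index is arbitrary:

```python
import heapq
import torch

# Hook one conv 3 channel and keep the 9 images that excite it most;
# cropping the receptive field around the argmax gives the actual patch.
acts = {}
unit = 17                           # arbitrary channel index to inspect
alexnet[8].register_forward_hook(   # index 8 = conv 3 in the sketch above
    lambda m, i, o: acts.update(value=o[0, unit].max().item()))

top = []                            # min-heap of (activation, image index)
for idx, (img, _) in enumerate(dataset):
    with torch.no_grad():
        alexnet(img.unsqueeze(0))
    heapq.heappush(top, (acts['value'], idx))
    if len(top) > 9:
        heapq.heappop(top)
```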
What’s happening inside the black box?
What about FC layers?
Visualize nearest-neighbor images according to their activation vectors. (Source: Stanford CS231n)
Or apply fancy dimensionality reduction, e.g., t-SNE. (Source: Andrej Karpathy)
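One common recipe for the t-SNE view; a sketch, assuming `features` is an N x 4096 NumPy array of fc 7 activations collected from a dataset:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Embed the 4096-d fc 7 activation vectors into 2-D and scatter-plot them;
# semantically similar images end up near each other.
coords = TSNE(n_components=2, perplexity=30).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.show()
```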
What’s happening inside the black box?
Visualize activations for a particular image.
Visualize the pixel values responsible for an activation:
1. Unpool
2. Rectify
3. Deconvolve
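The deconvnet itself takes some machinery; a closely related gradient-based variant (the saliency map of Simonyan et al., cited below) is only a few lines. A sketch, assuming the `alexnet` module from before with trained weights:

```python
import torch

# Backpropagate a class score to the input pixels; the gradient magnitude
# per pixel is a crude map of which pixels drive the activation.
img = torch.randn(1, 3, 227, 227, requires_grad=True)  # placeholder input
score = alexnet(img)[0, 281]   # class 281 = "tabby cat" in ImageNet
score.backward()
saliency = img.grad.abs().max(dim=1)[0]  # (1, 227, 227) importance map
```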
Deep visualization toolbox
J. Yosinski, J. Clune, A. Nguyen, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, ICML DL workshop, 2015
Demo: YouTube video
Additional visualization techniques
A. Nguyen, J. Yosinski, J. Clune, Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, ICML workshop, 2016
K. Simonyan, A. Vedaldi, and A. Zisserman, Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, ICLR 2014
J. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller, Striving for simplicity: The all convolutional net, ICLR workshop, 2015
ZFNet
• Won the ImageNet competition in 2013
• Nice analysis and visualization of AlexNet
• Introduced up-convolutions (the deconvolutions used in its visualizations)
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, ECCV 2014
AlexNet to ZFNet
[Figure: AlexNet (left) vs. ZFNet (right), both on 227x227 inputs.
AlexNet: Conv 11x11 (stride=4, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (two groups, C=128 each), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=384), Conv 3x3 (two groups, C=192 each), Conv 3x3 (two groups, C=128 each), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
ZFNet: Conv 7x7 (stride=2, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (stride=2, C=256), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=512), Conv 3x3 (C=1024), Conv 3x3 (C=512), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
Main changes: a smaller, less aggressive stem (7x7 stride 2 instead of 11x11 stride 4), no grouped convolutions, and wider conv 3-5 (512/1024/512 vs. 384/384/256).]
Activation maps in ZFNet
Layer:               input  conv 1  conv 2  conv 3  conv 4  conv 5  maxpool  fc 6  fc 7  fc 8
Width (AlexNet):     227    55      27      13      13      13      6        1     1     1
Width (ZFNet):       225    110     26      13      13      13      6        1     1     1
Channels (AlexNet):  3      96      256     384     384     256     256      4096  4096  1000
Channels (ZFNet):    3      96      256     512     1024    512     512      4096  4096  1000
Parameters and computation
Layer:                    conv 1  conv 2  conv 3  conv 4  conv 5  fc 6  fc 7  fc 8
FLOPs (M), AlexNet:       106     224     150     112     75      38    17    4
FLOPs (M), ZFNet:         172     416     199     798     798     76    17    4
Parameters (M), AlexNet:  0.0     0.3     0.9     0.7     0.4     37.8  16.8  4.1
Parameters (M), ZFNet:    0.0     0.6     1.2     4.7     4.7     75.5  16.8  4.1
VGG
• A deeper AlexNet/ZFNet
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Very Deep Convolutional Networks for Large-Scale Image Recognition, Simonyan and Zisserman, ICLR 2015
ZFNet to VGG
[Figure: ZFNet (left) vs. VGG-16 (right).
ZFNet: Conv 7x7 (stride=2, C=96), LRN, Max Pool 3x3 (stride=2), Conv 5x5 (stride=2, C=256), LRN, Max Pool 3x3 (pad=0, stride=2), Conv 3x3 (C=512), Conv 3x3 (C=1024), Conv 3x3 (C=512), Max Pool 3x3 (pad=0, stride=2), Linear (4096), Linear (4096), Linear (1000).
VGG-16: five stages of 3x3 convolutions, each stage ending in a Max Pool 2x2 (stride=2):
2x Conv 3x3 (C=64), pool; 2x Conv 3x3 (C=128), pool; 3x Conv 3x3 (C=256), pool; 3x Conv 3x3 (C=512), pool; 3x Conv 3x3 (C=512), pool; then Linear (4096), Linear (4096), Linear (1000).]
Insights in VGG
• Why use smaller filters?
• Factorization: two stacked Conv 3x3 layers cover the same receptive field as one Conv 5x5, with fewer parameters and an extra non-linearity (see the sketch below)
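A quick parameter count makes the trade explicit (assuming C input and output channels, biases ignored):

```python
# Two stacked 3x3 convolutions see the same 5x5 receptive field as one
# 5x5 convolution, but with 18*C^2 weights instead of 25*C^2 (and an
# extra ReLU in between). Example with C = 256 channels:
C = 256
print(2 * 3 * 3 * C * C)  # two 3x3 convs: 1,179,648 weights
print(5 * 5 * C * C)      # one 5x5 conv:  1,638,400 weights
```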
Training VGG
• Vanishing gradients make such deep stacks hard to train
[Figure: the VGG-16 architecture, repeated from the previous slide]
Parameters and computation
Layer      FLOPs (G)  Parameters (M)
conv 1.1   0.1        0.0
conv 1.2   1.9        0.0
conv 2.1   0.9        0.1
conv 2.2   1.9        0.1
conv 3.1   0.9        0.3
conv 3.2   1.9        0.6
conv 3.3   1.9        0.6
conv 4.1   0.9        1.2
conv 4.2   1.9        2.4
conv 4.3   1.9        2.4
conv 5.1   0.5        2.4
conv 5.2   0.5        2.4
conv 5.3   0.5        2.4
fc 6       0.1        102.8
fc 7       0.0        16.8
fc 8       0.0        4.1
1x1 convolutions and factorization
• Convolutions are linear operators
• Can we make them non-linear?
Network-in-Network, Lin et al., ICLR 2014
Factorized convolution
• Implements a “non-linear” convolution
• Less computation, fewer parameters
[Diagram: Conv 3x3 → ReLU → Conv 1x1; a sketch follows below]
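A minimal PyTorch sketch of such a block; the channel sizes are illustrative assumptions, not values from the paper:

```python
import torch.nn as nn

# Factorized "non-linear" convolution: a 3x3 spatial convolution into a
# narrow bottleneck, then a 1x1 convolution that mixes channels per pixel.
factorized = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=3, padding=1),  # spatial filtering, fewer channels
    nn.ReLU(),
    nn.Conv2d(64, 256, kernel_size=1),             # non-linear per-pixel channel mixing
)
# 256*64*9 + 64*256 = ~164k weights, vs. 256*256*9 = ~590k for a plain 3x3
```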
Inception architecture
• GoogLeNet
• Evolution of Network-in-Network
Going deeper with convolutions, Szegedy et al., CVPR 2015
[Figure: one inception module: four parallel branches (Conv 1x1; Conv 1x1 then Conv 3x3; Conv 1x1 then Conv 5x5; Max Pool 3x3 then Conv 1x1), concatenated along the channel dimension; a sketch follows below]
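A sketch of one module in PyTorch. The branch widths below are the ones GoogLeNet uses in its first inception module (3a), but treat the exact numbers as illustrative:

```python
import torch
import torch.nn as nn

# One inception module: four parallel branches, concatenated on channels.
# The 1x1 convolutions cut the channel count before the expensive 3x3/5x5.
class Inception(nn.Module):
    def __init__(self, c_in=192):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, 1)
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 96, 1), nn.ReLU(),
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, 32, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], 1)

print(Inception()(torch.randn(1, 192, 28, 28)).shape)  # (1, 256, 28, 28)
```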
Inception architecture
[Figure: the full GoogLeNet network, a deep stack of inception modules]
Residual Networks
• Uses residual connections to build deeper networks
• Activation maps are additive
[Figure: ImageNet challenge top-5 error by year: 2010 and 2011 (non-deep methods), 2012 AlexNet (8 layers), 2013 ZFNet (8 layers), 2014 VGG/GoogLeNet (19-22 layers), 2015 ResNet (152 layers)]
Deep Residual Learning for Image Recognition, He et al., CVPR 2016
Residual blocks
• Add shortcut connections for gradients:
• Identity
• Strided 1x1 convolution (when the shape changes)
[Diagram: x_l → Conv → BN → ReLU → Conv → BN → add x_l via the shortcut → ReLU → x_{l+1}; i.e., x_{l+1} = ReLU(F(x_l) + x_l). A sketch follows below.]
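A minimal sketch of this basic block in PyTorch, for the identity-shortcut case only (the strided/projection shortcut is omitted):

```python
import torch
import torch.nn as nn

# Basic residual block: x_{l+1} = ReLU(x_l + F(x_l)),
# with F = Conv-BN-ReLU-Conv-BN.
class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.f(x))  # identity shortcut
```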
ResNet
• Multiple variants: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNet-1001
[Figure: ResNet-152: Conv 7x7 (64), Max Pool 3x3, then four stages of bottleneck blocks:
[Conv 1x1 (64) → Conv 3x3 (64) → Conv 1x1 (256)] x3
[Conv 1x1 (128) → Conv 3x3 (128) → Conv 1x1 (512)] x8
[Conv 1x1 (256) → Conv 3x3 (256) → Conv 1x1 (1024)] x36
[Conv 1x1 (512) → Conv 3x3 (512) → Conv 1x1 (2048)] x3
followed by Avg Pool]
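The same stage structure is what torchvision ships; a quick check:

```python
from torchvision.models import resnet152

# ResNet-152's four stages use [3, 8, 36, 3] bottleneck blocks,
# matching the x3 / x8 / x36 / x3 repeats above.
model = resnet152()
print([len(s) for s in (model.layer1, model.layer2, model.layer3, model.layer4)])
# [3, 8, 36, 3]
```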