SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
Microsoft Research Asia
Visual Computing Group
“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”, K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014
Highlights
• ILSVRC 2014 (all provided-data tracks)
  • DET - 2nd
  • CLS - 3rd
  • LOC - 5th
• ECCV 2014 paper
  • Published 2 months ago (arXiv:1406.4729v1, June 18)
  • Details disclosed (arXiv:1406.4729v2)
Overview
• SPP-net: a new network structure
• Classification: improves all CNNs
• Detection: 20-60x faster than R-CNN, as accurate
Spatial Pyramid Matching
• SPM: very successful in traditional computer vision
  • [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
  • [Lazebnik et al., CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
[Figure: traditional pipeline: dense SIFT → encoded (VQ, SC, FV) → SPM → SVM → prediction]
[Figure: CNN counterparts: “conv layers” → simply pooling? → “fc layers”]
SPP-net: SPM in CNN
[Figure: traditional CNN: fixed-size input → conv → fc 4096 → fc 4096 → 1000]
[Figure: SPP-net: any-size input → conv → spatial pyramid pooling → fc 4096 → fc 4096 → 1000]
• Fix bin numbers
• DO NOT fix bin size
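The two rules above (fix the number of bins, let the bin size follow the input) can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function names `bin_edges` and `spp`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the floor/ceil rounding of bin boundaries are all assumptions made for the example.

```python
def bin_edges(size, n_bins):
    """Split `size` cells into `n_bins` bins; return (start, end) pairs.

    start is floor(i * size / n_bins) and end is ceil((i + 1) * size / n_bins),
    so every bin is non-empty and the bin size adapts to `size`.
    """
    return [((i * size) // n_bins, ((i + 1) * size + n_bins - 1) // n_bins)
            for i in range(n_bins)]


def spp(feature_maps, levels=(1, 2, 4)):
    """Max-pool each channel into n x n bins for every level, then concatenate.

    feature_maps: list of channels, each an H x W list of lists.
    The output length is channels * sum(n * n for n in levels),
    independent of H and W.
    """
    out = []
    for n in levels:
        for fmap in feature_maps:
            h, w = len(fmap), len(fmap[0])
            for r0, r1 in bin_edges(h, n):
                for c0, c1 in bin_edges(w, n):
                    out.append(max(fmap[r][c]
                                   for r in range(r0, r1)
                                   for c in range(c0, c1)))
    return out


# Two inputs of different spatial size give vectors of the same length.
small = [[[float(r * 5 + c) for c in range(5)] for r in range(5)] for _ in range(2)]
large = [[[float(r * 13 + c) for c in range(13)] for r in range(9)] for _ in range(2)]
len_small, len_large = len(spp(small)), len(spp(large))
```

With 2 channels and levels 1, 2 and 4, both calls return 2 × (1 + 4 + 16) = 42 values, which is what lets the fc layers that follow keep a fixed input width.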
SPP-net
• variable input size/scale
  • multi-size training
  • multi-scale testing
  • full-image view
• multi-level pooling
  • robust to deformation
• operates on feature maps
  • pooling in regions
[Figure: input image → conv layers → conv feature maps → spatial pyramid pooling layer (multi-level bins, concatenated) → fc layers]
ILSVRC top-5 val error (%), 10-view, 4 architectures

                         ZF-5     Convnet*-5   Overfeat-5   Overfeat-7
no-SPP baselines         14.76    13.92        13.52        11.97
multi-level pooling      14.14    13.54        12.80        11.12
+ multi-size training    13.64    13.33        12.33        10.95

All CNNs improved!
ILSVRC 2014 CLS Results
7-conv SPP-net, 10-view: 10.95%
7-conv SPP-net, multi-scale/view (96-view + 2-full): 9.08%
multiple SPP-nets: 8.06%

• “shallow”
  • 7-conv, 1 Titan GPU, 3 weeks
• but potential
  • SPP can improve deeper nets: >1% gain post-competition

team            top-5 test error (%)
GoogLeNet       6.66
Oxford VGG      7.32
ours            8.06
Howard          8.11
DeeperVision    9.50
NUS-BST         9.79
TTIC_ECP        10.22
…
Detection: SPP on Regions
[Figure: input image → conv layers → conv feature maps → SPP on each region of the feature maps → fc layers]
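Pooling a candidate region directly on the shared feature maps might look like the sketch below. It is illustrative only: the function name `region_spp`, the stride of 16 between image pixels and feature-map cells, the simple floor/ceil projection of box corners, and the levels `(1, 2)` are assumptions for this example, not the mapping used by the authors.

```python
def region_spp(feature_maps, box, stride=16, levels=(1, 2)):
    """Pool one image-space box on precomputed conv feature maps.

    feature_maps: list of channels, each an H x W list of lists.
    box: (x0, y0, x1, y1) in image pixels, with x1 and y1 exclusive.
    Returns channels * sum(n * n for n in levels) values for any box size.
    """
    x0, y0, x1, y1 = box
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Project the box onto feature-map cells (floor for start, ceil for end),
    # clamped so the window always covers at least one cell.
    c0 = min(x0 // stride, w - 1)
    r0 = min(y0 // stride, h - 1)
    c1 = max(c0 + 1, min(-(-x1 // stride), w))
    r1 = max(r0 + 1, min(-(-y1 // stride), h))
    rh, rw = r1 - r0, c1 - c0
    out = []
    for n in levels:
        for fmap in feature_maps:
            for i in range(n):
                ra = r0 + (i * rh) // n
                rb = r0 + ((i + 1) * rh + n - 1) // n
                for j in range(n):
                    ca = c0 + (j * rw) // n
                    cb = c0 + ((j + 1) * rw + n - 1) // n
                    out.append(max(fmap[r][c]
                                   for r in range(ra, rb)
                                   for c in range(ca, cb)))
    return out


# One 4 x 4 channel whose cell (r, c) holds r * 4 + c.
fmaps = [[[r * 4 + c for c in range(4)] for r in range(4)]]
feat_a = region_spp(fmaps, (0, 0, 32, 32))    # covers cells [0:2, 0:2]
feat_b = region_spp(fmaps, (16, 16, 64, 64))  # covers cells [1:4, 1:4]
```

Both regions yield a 5-value vector even though they cover different numbers of cells, and the conv layers are evaluated once for the whole image rather than once per region.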
R-CNN vs. SPP
• image regions vs. feature map regions
[Figure: R-CNN: 2000 nets on image regions (each image region → net → feature)]
[Figure: SPP-net: 1 net on full image (image → net → feature maps, with a feature taken per region)]
• With regional features, we can do everything R-CNN does
  • fine-tune, SVM, bbox regression…
• similar accuracy, much faster

VOC 2007
                  SPP-net 1-scale   SPP-net 5-scale   R-CNN
mAP               58.0              59.2              58.5
GPU time / img    0.14s             0.38s             9s
speed-up          64x               24x               -
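The speed-up row follows from the per-image GPU times. A quick check, using only the VOC 2007 numbers on this slide (the variable names are made up for the example):

```python
# Per-image GPU times in seconds, as listed on the slide.
rcnn_seconds = 9.0
spp_seconds = {"SPP-net 1-scale": 0.14, "SPP-net 5-scale": 0.38}

# Speed-up relative to R-CNN, rounded to the nearest integer.
speedups = {name: round(rcnn_seconds / t) for name, t in spp_seconds.items()}
print(speedups)
```

The ratios are about 64.3 and 23.7, which round to the 64x and 24x reported.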
ILSVRC 2014 DET Results

cost of a single model
                  SPP-net    R-CNN
GPU time / img    0.6s       32s
40k test imgs     8 hours    15 days

“provided data” track
team                    mAP
NUS                     37.2
ours, multi SPP-nets    35.1
UvA                     32.0
ours, 1 SPP-net         31.8
Southeast-CASIA         30.4
1-HKUST                 28.8
CASIA_CRIPAC_2          28.6
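The single-model cost figures can be roughly cross-checked by multiplying the per-image GPU time by the 40k test images. The helper `total_hours` below is a hypothetical name for this example, and the calculation assumes per-image time is the only cost:

```python
def total_hours(n_images, seconds_per_image):
    """Total processing time in hours for n_images at a fixed per-image cost."""
    return n_images * seconds_per_image / 3600.0


n_test = 40_000
spp_hours = total_hours(n_test, 0.6)         # SPP-net
rcnn_days = total_hours(n_test, 32) / 24.0   # R-CNN
print(f"SPP-net: {spp_hours:.1f} hours, R-CNN: {rcnn_days:.1f} days")
```

This gives about 6.7 hours for SPP-net and about 14.8 days for R-CNN. The R-CNN figure matches the slide's 15 days; the SPP-net figure is somewhat below the slide's 8 hours, and the slide does not itemize what accounts for the difference.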
• Conclusion
  • SPM in CNNs
  • CLS: improve all CNNs in the literature
  • DET: practical, fast, and accurate
• Future work
  • SPP on advanced networks
• Resources
  • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/
• Acknowledgement
  • We thank NVIDIA for the GPU donation.