13
SPP-net Spatial Pyramid Pooling in Deep Convolutional Networks Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun Microsoft Research Asia Visual Computing Group

SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

Embed Size (px)

Citation preview

Page 1: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

SPP-netSpatial Pyramid Poolingin Deep Convolutional Networks

Kaiming He

Xiangyu Zhang

Shaoqing Ren

Jian Sun

Microsoft Research Asia

Visual Computing Group

Page 2: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Highlights

• ILSVRC 2014 (all provided-data tracks)

• DET - 2nd

• CLS - 3rd

• LOC - 5th

• ECCV 2014 paper

• Published 2 months ago (arXiv:1406.4729v1, June 18)

• Details disclosed (arXiv:1406.4729v2)

Page 3: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Overview

• SPP-net- a new network structure

• Classification- improves all CNNs

• Detection- 20-60x faster than R-CNN, as accurate

Page 4: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Spatial Pyramid Matching

• SPM: very successful in traditional computer vision[Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”

[Lazebnik et al, CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”

dense SIFT encoded(VQ, SC, FV)

SPM SVM

prediction

“fc layers”simply pooling?“conv layers”CNN

counterparts

Page 5: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

SPP-net: SPM in CNN

4096

1000

4096

traditionalCNN

fixed size conv fc

SPP-net

any size

4096

1000

4096

spatial pyramid pooling

• Fix bin numbers• DO NOT fix bin size

Page 6: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

SPP-net

• variable input size/scale• multi-size training

• multi-scale testing

• full-image view

• multi-level pooling• robust to deformation

• operates on feature maps• pooling in regions

conv layers

conv feature maps

concatenate

input image

…...

…...

spatial pyramid pooling layer

fc layers

Page 7: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

14.76

13.92

13.52

11.97

14.14

13.54

12.80

11.12

13.64

13.33

12.33

10.95

10.00

10.50

11.00

11.50

12.00

12.50

13.00

13.50

14.00

14.50

15.00

ZF-5 Convnet*-5 Overfeat-5 Overfeat-7

ILSVRC top-5 val (10-view)

no-SPP baselines

+ multi-size training

multi-level pooling

All CNNs improved!

4 architectures

Page 8: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

ILSVRC 2014 CLS Results

multiple SPP-nets 8.06%

7-conv SPP-net, 10-view 10.95%

7-conv SPP-net, multi-scale/view 9.08%

• “shallow”• 7-conv, 1 Titan GPU, 3 weeks

• but potential• SPP can improve deeper nets: >1% gain post-competition

team top-5 test

GoogLeNet 6.66

Oxford VGG 7.32

ours 8.06

Howard 8.11

DeeperVision 9.50

NUS-BST 9.79

TTIC_ECP 10.22

7-conv SPP-net, 96-view+2-full 9.08%

Page 9: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

Detection: SPP on Regions

SPP

conv feature maps

conv layers

input image

region

…...

fc layers

Page 10: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

RCNN vs. SPP

• image regions vs. feature map regions

SPP-net1 net on full image

image

net

feature

featurefeature

net

feature

image

net

feature

net

feature

net

feature

R-CNN2000 nets on image regions

Page 11: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

• With regional features, we can do everything of RCNN• fine-tune, SVM, bbox regression…

• similar accuracy, much faster

SPP-net1-scale

SPP-net5-scale

RCNN

mAP 58.0 59.2 58.5

GPU time / img 0.14s 0.38s 9s

speed-up 64x 24x -

VOC 2007

Page 12: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

SPP-net RCNN

GPU time / img 0.6s 32s

40k test imgs 8 hours 15 days

cost of a single model

ILSVRC 2014 DET Results

“provided data” track

mAP

NUS 37.2

ours, multi SPP-nets 35.1

UvA 32.0

ours, 1 SPP-net 31.8

Southeast-CASIA 30.4

1-HKUST 28.8

CASIA_CRIPAC_2 28.6

Page 13: SPP-net - ImageNetimage-net.org/challenges/LSVRC/2014/slides/sppnet_ilsvrc...“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition” K. He, X. Zhang, S

“Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition”K. He, X. Zhang, S. Ren, J. Sun. ECCV 2014

• Conclusion• SPM in CNNs• CLS: improve all CNNs in the literature• DET: practical, fast, and accurate

• Future work• SPP on advanced networks

• Resources• code, config, tech report…http://research.microsoft.com/en-us/um/people/kahe/

• Acknowledgement• We thank NVIDIA for the GPU donation.