Faster R-CNN: Towards real-time object detection with region proposal networks (UPC Reading Group)

Faster R-CNN:Towards Real-Time Object Detection with

Region Proposal Networks

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun[paper@NIPS15][arXiv][python][matlab][slides by R. Girshick]

Slides by Amaia Salvador [GDoc]Computer Vision Reading Group (01/03/2016)

1. Introduction

Object Detection

Object Detection: Previously...

DPM. P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 9, Sep. 2010

Hand-crafted features + Sliding Window

Object Detection: Previously...

R-CNN. Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014, June). Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on(pp. 580-587). IEEE.SPPnet. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 37(9), 1904-1916.Fast R-CNN. Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1440-1448).

R-CNN SPPnet Fast R-CNN

CNN features + Object Proposals

Object Detection: Limitations

Selective Search CPMC

Object Proposal computation is the bottleneck in current state of the art object detection systems

Selective Search. Van de Sande, K. E., Uijlings, J. R., Gevers, T., & Smeulders, A. W. (2011, November). Segmentation as selective search for object recognition. InComputer Vision (ICCV), 2011 IEEE International Conference on (pp. 1879-1886). IEEE.CPMC. Carreira, J., & Sminchisescu, C. (2010, June). Constrained parametric min-cuts for automatic object segmentation. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (pp. 3241-3248). IEEE.MCG. Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 328-335). 6

Faster R-CNN: Motivation

Selective Search CPMC

Replace the usage of external Object Proposals with a Region Proposal Network (RPN).

2. Methodology

Faster R-CNN: Overview

Conv Layer 5

rsRPN RPN Proposals

RPN Proposals

Class probabilities

RoI pooling layerFC layersClass scores

Conv Layer 5

rsRPN RPN Proposals

RPN Proposals

Class probabilities

Region Proposal Network (RPN)

Objectness scores

Bounding Box Regression

In practice, k = 9 (3 different scales and 3 aspect ratios)

RPN: Loss Function

Predicted probability of being an object for anchor i

i = anchor index in minibatch

Coordinates of the predicted bounding box for anchor i

Ground truth objectness label

True box coordinates

= Number of anchors in minibatch (~ 256)Nreg = Number of anchor locations ( ~ 2400)

Log lossSmooth L1 loss

In practice = 10, so that both terms are roughly equally balanced

RPN: Positive/Negative Samples

An anchor is labeled as positive if:

(a) the anchor is the one with highest IoU overlap with a ground-truth box

(b) the anchor has an IoU overlap with a ground-truth box higher than 0.7

Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth boxes.

50%/50% ratio of positive/negative anchors in a minibatch.

Conv Layer 5

rsRPN RPN Proposals

RPN Proposals

Class probabilities

Object Detection Network

15Fast R-CNN

Object Detection Network: Loss

*From Fast R-CNN

Predicted class scores

True class scores

True box coordinates

Predicted box coordinates

Log lossSmooth L1 loss

Fast R-CNN: Positive/Negative Samples

Positive samples are defined as those whose IoU overlap with a ground-truth bounding box is > 0.5.

Negative examples are sampled from those that have a maximum IoU overlap with ground truth in the interval [0.1, 0.5).

25%/75% ratio for positive/negative samples in a minibatch.

*From Fast R-CNN

Faster R-CNN: Training

Conv Layer 5

rsRPN RPN Proposals

RPN Proposals

Class probabilities

4-step training to share features for RPN and Fast R-CNN

Faster R-CNN: 4-step training

Conv Layer 5

rsRPN RPN Proposals

Step 1: Train RPN initialized with an ImageNet pre-trained model.

ImageNet weights(fine tuned)

Conv Layer 5

RPN Proposals (learned in 1)

Class probabilities

Step 2: Train Fast R-CNN with learned RPN proposals.

ImageNet weights(fine tuned)

Conv Layer 5

rsRPN RPN Proposals

Step 3: The model trained in 2 is used to initialize RPN and train again.

Weights from Step 2(fixed)

Conv Layer 5

RPN Proposals (learned in 3)

Class probabilities

Step 4: Fine tune FC layers of Fast R-CNN using same shared convolutional layers as in 3.

Weights from Step 2&3(fixed)

3. Experiments

Experiments: CNN Architectures

VGG-16: Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

ZF: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer vision–ECCV 2014 (pp. 818-833). Springer International Publishing.

Experiments: Datasets

Experiments I: VOC 2007 & ZF

Comparison between Fast R-CNN trained with external object proposals (SS: Selective Search, EB: EdgeBoxes) with Faster R-CNN

Experiments I: VOC 2007 & ZF

Experiments II

Detection Accuracy

Timing (ms)

Experiments III

Experiments IV

One-Stage Detection: 1) Directly Refine and Classify Sliding Window locations

Two-Stage Proposal + Detection: 1) Learn Object Proposals 2) Refine and classify Object Proposals

Experiments V: MS COCO (arXiv)

Qualitative Results

4. Summary

Summary

● Region Proposal Network sharing convolutional features with Object Detection Network makes region generation step nearly cost-free.

● Quality of proposals is improved with RPN wrt SS and EB.

● Object Detection system at 5-17 fps.

Summary

● Faster R-CNN is the basis of the winners of COCO and ILSVRC 2015 object detection competitions [1].

● RPN is also used in the winning entries of ILSVRC 2015 localization [1] and COCO 2015 segmentation competitions [2].

[1] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385, 2015.[2] J. Dai, K. He, and J. Sun, “Instance-aware semantic segmentation via multi-task network cascades,” arXiv:1512.04412, 2015.

Thank you !

Questions?

Faster R-CNN: Towards real-time object detection with region proposal networks (UPC Reading Group)

Technology

Rethinking the Faster R-CNN Architecture for Temporal Action … · 2019-02-27 · Rethinking the Faster R-CNN Architecture for Temporal Action Localization Yu-Wei Chao1∗, Sudheendra

ShapeShifter: Robust Physical Adversarial Attack on Faster ... · ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector 3 In this work, we propose ShapeShifter,

Faster R-CNN Features for Instance Search · 2016-05-30 · Faster R-CNN Features for Instance Search Amaia Salvador, Xavier Giro-i-Nieto, Ferran Marqu´ es´ Universitat Politecnica

k1 Faster R-CNN Features for Instance Search · 2016 Acknowledgements Paris Buildings TRECVID Instance Search 2013 (subset of 23k frames) Oxford Buildings Faster R-CNN Image- and

Faster R-CNN: Towards Real-Time Object Detection … Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren, Kaiming He, Ross Girshick, and Jian

Image-Based Food Calorie Estimation Using Knowledge on ...img.cs.uec.ac.jp/pub/conf17/171223ege_3_ppt.pdf · Faster R-CNNの導入 S. Ren et al. Faster R-CNN: Towards realtime object

Face Detection with the Faster R-CNN - arXiv · Face Detection with the Faster R-CNN Huaizu Jiang University of Massachusetts Amherst Amherst MA 01003 hzjiang@cs.umass.edu Erik Learned-Miller

Meta R-CNN : Towards General Solver for Instance-level Low ... · inference in Faster /Mask R-CNNs. Each RoI feature refers to a single object or background, so Faster /Mask R-CNN

Mask R-CNN - cseweb.ucsd.edu · Faster R-CNN: Fast R-CNN + RPN Region Proposal Network (RPN) after last convolutional layer RPN produces region proposals directly Can be extended

Automatic Detection of Welding Defects Using Faster R-CNN

Rethinking the Faster R-CNN Architecture for …Rethinking the Faster R-CNN Architecture for Temporal Action Localization Yu-Wei Chao1, Sudheendra Vijayanarasimhan 2, Bryan Seybold

Rethinking the Faster R-CNN Architecture for Temporal ... · Rethinking the Faster R-CNN Architecture for Temporal Action Localization Yu-Wei Chao1, Sudheendra Vijayanarasimhan 2,

Assessment of Faster R-CNN in Man-Machine Collaborative Searchopenaccess.thecvf.com/content_CVPR_2019/...Faster_R-CNN_in_Ma… · Assessment of Faster R-CNN in Man-Machine collaborative

Mask r-cnn · 2018-02-09 · Overview of Mask R-CNN •Goal: to create a framework for Instance segmentation •Builds on top of Faster R-CNN by adding a parallel branch •For each

Faster R-CNN with Region Proposal Reﬁnementcs231n.stanford.edu/reports/2017/pdfs/112.pdf · Faster R-CNN with Region Proposal Reﬁnement Peng Yuan Electrical Engineering Department

CHEST X-RAY IMAGE CLASSIFICATION USING FASTER R-CNN · that applied R-CNN and Faster R-CNN in medical imaging area. Our work complements the existing in demonstrating the application

ディープラーニングによる背景船舶のタンク・ホールド内画像 … · Region of Interest (RoI) pooling 背景ヒト CNN ウマ CNN Fast R-CNN CNN Faster

Siam R-CNN: Visual Tracking by Re-Detectionopenaccess.thecvf.com/content_CVPR_2020/papers/... · re-detector, Siam R-CNN, an adaptation of Faster R-CNN [54] with a Siamese architecture,

Lab2 Object Detection & Mask · 2020-01-03 · Faster R-cnn. Mask R-Cnn •Extending Faster R-CNN for Pixel Level Segmentation •Use RoI Align. FPN(feature pyramid network) You Only

Mask R-CNN - Facebook Research · PDF fileMask R-CNN is conceptually simple: Faster R-CNN has two outputs for each candidate object, a class label and a ... (RPN), proposes candidate