Large-Scale Object Recognition with Weak Supervision
Weiqiang Ren, Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan
{wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn
Slide 2
Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data
Ordered by classification error
1. Only classification labels are used
2. Full image as object location
Slide 3
Outline
Motivation
Method
Results
Slide 4
Motivation
Slide 5
Why Weakly Supervised Localization (WSL)?
Knowing where to look makes objects easier to recognize. However, in the classification-only task, no annotations of object location are available.
Slide 6
Current WSL Results on VOC07
Slide 7
13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014 (Sep 11, Poster Session 4A, #34)
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
Slide 8
Weakly Supervised Object Localization with Latent Category Learning, ECCV 2014:
  Method    VOC 2007 mAP
  Ours      31.6
  DPM 5.0   33.7

Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI:
  Method    VOC 2007 mAP
  Ours      26.2
  DPM 5.0   33.7

Our Work: for efficiency in large-scale tasks, we use the second method.
Slide 9
Method
Slide 10
Framework
Input Images → Conv Layers → FC Layers → Cls Prediction and Det Prediction → Rescoring
Four steps: (1) CNN features, (2) MILinear detection, (3) detection rescoring, (4) classification rescoring
Slide 11
1st: CNN Architecture
Chatfield et al., Return of the Devil in the Details: Delving Deep into Convolutional Nets
Slide 12
2nd: MILinear SVM
Slide 13
MILinear: Region Proposal
A good region proposal algorithm needs high recall, high overlap, a small number of proposals, and low computation cost.
We use MCG pretrained on VOC 2012 (additional data).
Training: 128 windows/image; Testing: 256 windows/image (compared to ~2000 for Selective Search).
Slide 14
MILinear: Feature Representations
Low-level features: SIFT, LBP, HOG, shape context, Gabor
Mid-level features: Bag of Visual Words (BoVW)
Deep hierarchical features: convolutional networks, deep auto-encoders, deep belief nets
MILinear: Objective Function and Optimization
Multiple-instance linear SVM, optimized with a trust-region Newton method (a quasi-Newton-type method) working in the primal, for faster convergence.
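The slides name the formulation but do not show the equation; a standard multiple-instance linear SVM primal objective with a squared hinge loss (differentiable, so it suits trust-region Newton optimization in the primal) would be a sketch of this form, where $B_i$ is the bag of region proposals for image $i$ and $y_i \in \{-1, +1\}$ is its image-level label (notation assumed, not from the slides):

```latex
\min_{\mathbf{w}}\; \frac{1}{2}\|\mathbf{w}\|_2^2
  + C \sum_{i=1}^{n} \max\Bigl(0,\; 1 - y_i \max_{\mathbf{x} \in B_i} \mathbf{w}^{\top}\mathbf{x}\Bigr)^2
```

Each bag is scored by its highest-scoring instance, so a positive image needs only one region that fires, matching the weak (image-level) supervision.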
Slide 17
MILinear: Optimization Efficiency
Slide 18
3rd: Detection Rescoring
Rescoring with a softmax over the 1000 classes: take the max over the 128 boxes per image to get a 1000-dim score vector, then train a softmax on it.
The softmax considers all categories simultaneously at each minibatch of the optimization, suppressing the responses of other, similar-looking object categories.
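The rescoring step above can be sketched as follows (a minimal sketch, assuming a trained weight matrix `W` and bias `b`; the function and variable names are illustrative, not from the talk):

```python
import numpy as np

def rescore_detections(box_scores, W, b):
    """Rescore per-class detection scores with a softmax layer.

    box_scores: (num_boxes, 1000) MILinear scores for one image
    W, b: softmax weights (1000, 1000) and bias (1000,), assumed
          already trained on the max-pooled score vectors
    """
    # Max-pool over boxes: one score per class (1000-dim vector).
    pooled = box_scores.max(axis=0)
    # Softmax over all 1000 classes jointly, so that similar-looking
    # categories compete and get suppressed.
    logits = W @ pooled + b
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs
```

Because the softmax normalizes across all classes at once, a strong response for one category directly lowers the probability assigned to visually similar ones.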
Slide 19
4th: Classification Rescoring
Linear combination of the 1000-dim classification and detection score vectors.
One interesting observation: we tried several other score-combination strategies, but none of them worked as well!
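The linear combination can be sketched as below (the mixing weight `alpha` is an assumption for illustration; the talk does not state its value):

```python
import numpy as np

def combine_scores(cls_scores, det_scores, alpha=0.5):
    """Linearly combine classification and WSL detection scores.

    cls_scores, det_scores: 1000-dim per-image score vectors.
    alpha: assumed mixing weight (not specified in the talk).
    """
    return alpha * cls_scores + (1.0 - alpha) * det_scores

def top5_predictions(scores):
    # Indices of the five highest-scoring classes (ILSVRC top-5 metric).
    return np.argsort(scores)[::-1][:5]
```

A single scalar weight is the simplest way to let the two branches vote, which matches the observation that more elaborate combination strategies did not help.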
Slide 20
Results
Slide 21
1st: Classification without WSL
  Method                   Top-5 Error
  Baseline with one CNN    13.7
  Average of four CNNs     12.5
2nd: MILinear on ILSVRC 2013 Detection
mAP: 9.63% vs. 8.99% (DPM 5.0)
Slide 25
2nd: MILinear for Classification
  Method      Top-5 Error
  MILinear    17.1
Slide 26
3rd: WSL Rescoring (Softmax)
  Method                   Top-5 Error
  Baseline with one CNN    13.7
  Average of four CNNs     12.5
  MILinear                 17.1
  MILinear + Rescore       13.5
The softmax-based rescoring successfully suppresses the predictions of other, similar-looking object categories!
Slide 27
4th: Cls and WSL Combination
  Method                          Top-5 Error
  Baseline with one CNN model     13.7
  Average of four CNN models      12.5
  MILinear                        17.1
  MILinear + Rescore              13.5
  Cls (12.5) + MILinear (13.5)    11.5
WSL and classification are complementary to each other!
Slide 28
Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge.
Slide 29
Conclusion
WSL always helps classification.
WSL has large potential: WSL data is cheap.