Large-Scale Object Recognition with Weak Supervision
Weiqiang Ren, Chong Wang, Yanhua Cheng, Kaiqi Huang, Tieniu Tan
{wqren,cwang,yhcheng,kqhuang,tnt}@nlpr.ia.ac.cn
Slide 2
Task 2: Classification + Localization
Task 2b: Classification + localization with additional training data
Ordered by classification error
1. Only classification labels are used
2. Full image as object location
Slide 3
Outline
Motivation
Method
Results
Slide 4
Motivation
Slide 5
Why Weakly Supervised Localization (WSL)?
Knowing where to look makes objects easier to recognize. However, in the classification-only task, no annotations of object location are available.
Slide 6
Current WSL Results on VOC07
Slide 7
13.9: Weakly supervised object detector learning with model drift detection, ICCV 2011
15.0: Object-centric spatial pooling for image classification, ECCV 2012
22.4: Multi-fold MIL training for weakly supervised object localization, CVPR 2014
22.7: On learning to localize objects with minimal supervision, ICML 2014
26.4: Weakly supervised object detection with posterior regularization, BMVC 2014
31.6: Weakly supervised object localization with latent category learning, ECCV 2014 (Sep 11, Poster Session 4A, #34)
26.2: Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI
Slide 8
Weakly Supervised Object Localization with Latent Category Learning, ECCV 2014:
  Method    VOC 2007 mAP
  Ours      31.6
  DPM 5.0   33.7

Discovering Visual Objects in Large-scale Image Datasets with Weak Supervision, submitted to TPAMI:
  Method    VOC 2007 mAP
  Ours      26.2
  DPM 5.0   33.7

Our Work: for efficiency in large-scale tasks, we use the second method.
Slide 9
Method
Slide 10
Framework
Input Images → Conv Layers → FC Layers → Cls Prediction and Det Prediction → Rescoring
Four steps: (1) CNN features, (2) MILinear detection, (3) detection rescoring, (4) classification rescoring
Slide 11
1st: CNN Architecture
Chatfield et al., Return of the Devil in the Details: Delving Deep into Convolutional Nets
Slide 12
2nd: MILinear SVM
Slide 13
MILinear: Region Proposal
A good region proposal algorithm needs high recall, high overlap, a small number of proposals, and low computation cost.
We use MCG pretrained on VOC 2012 (additional data).
Training: 128 windows/image; Testing: 256 windows/image (compared to ~2000 for Selective Search).
Slide 14
MILinear: Feature Representations
Low-level features: SIFT, LBP, HOG, shape context, Gabor
Mid-level features: Bag of Visual Words (BoVW)
Deep hierarchical features: convolutional networks, deep auto-encoders, deep belief nets
MILinear: Objective Function and Optimization
Multiple-instance linear SVM, optimized with a trust-region Newton method (a quasi-Newton-type method) working in the primal, for faster convergence.
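The slides name the formulation but do not show the equation; a standard multiple-instance linear SVM primal objective with a squared hinge loss (differentiable, so it suits trust-region Newton optimization in the primal) would be a sketch of this form, where $B_i$ is the bag of region proposals for image $i$ and $y_i \in \{-1, +1\}$ is its image-level label (notation assumed, not from the slides):

```latex
\min_{\mathbf{w}}\; \frac{1}{2}\|\mathbf{w}\|_2^2
  + C \sum_{i=1}^{n} \max\Bigl(0,\; 1 - y_i \max_{\mathbf{x} \in B_i} \mathbf{w}^{\top}\mathbf{x}\Bigr)^2
```

Each bag is scored by its highest-scoring instance, so a positive image needs only one region that fires, matching the weak (image-level) supervision.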
Slide 17
MILinear: Optimization Efficiency
Slide 18
3rd: Detection Rescoring
Rescoring with a softmax over the 1000 classes: take the max over the 128 boxes per image to get a 1000-dim score vector, then train a softmax on it.
The softmax considers all categories simultaneously at each minibatch of the optimization, suppressing the responses of other, similar-looking object categories.
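The rescoring step above can be sketched as follows (a minimal sketch, assuming a trained weight matrix `W` and bias `b`; the function and variable names are illustrative, not from the talk):

```python
import numpy as np

def rescore_detections(box_scores, W, b):
    """Rescore per-class detection scores with a softmax layer.

    box_scores: (num_boxes, 1000) MILinear scores for one image
    W, b: softmax weights (1000, 1000) and bias (1000,), assumed
          already trained on the max-pooled score vectors
    """
    # Max-pool over boxes: one score per class (1000-dim vector).
    pooled = box_scores.max(axis=0)
    # Softmax over all 1000 classes jointly, so that similar-looking
    # categories compete and get suppressed.
    logits = W @ pooled + b
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs
```

Because the softmax normalizes across all classes at once, a strong response for one category directly lowers the probability assigned to visually similar ones.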
Slide 19
4th: Classification Rescoring
Linear combination of the 1000-dim classification and detection score vectors.
One interesting observation: we tried several other score-combination strategies, but none of them worked as well!
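The linear combination can be sketched as below (the mixing weight `alpha` is an assumption for illustration; the talk does not state its value):

```python
import numpy as np

def combine_scores(cls_scores, det_scores, alpha=0.5):
    """Linearly combine classification and WSL detection scores.

    cls_scores, det_scores: 1000-dim per-image score vectors.
    alpha: assumed mixing weight (not specified in the talk).
    """
    return alpha * cls_scores + (1.0 - alpha) * det_scores

def top5_predictions(scores):
    # Indices of the five highest-scoring classes (ILSVRC top-5 metric).
    return np.argsort(scores)[::-1][:5]
```

A single scalar weight is the simplest way to let the two branches vote, which matches the observation that more elaborate combination strategies did not help.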
Slide 20
Results
Slide 21
1st: Classification without WSL
  Method                   Top-5 Error
  Baseline with one CNN    13.7
  Average of four CNNs     12.5
2nd: MILinear on ILSVRC 2013 Detection
mAP: 9.63% vs. 8.99% (DPM 5.0)
Slide 25
2nd: MILinear for Classification
  Method      Top-5 Error
  MILinear    17.1
Slide 26
3rd: WSL Rescoring (Softmax)
  Method                   Top-5 Error
  Baseline with one CNN    13.7
  Average of four CNNs     12.5
  MILinear                 17.1
  MILinear + Rescore       13.5
The softmax-based rescoring successfully suppresses the predictions of other, similar-looking object categories!
Slide 27
4th: Cls and WSL Combination
  Method                          Top-5 Error
  Baseline with one CNN model     13.7
  Average of four CNN models      12.5
  MILinear                        17.1
  MILinear + Rescore              13.5
  Cls (12.5) + MILinear (13.5)    11.5
WSL and classification are complementary to each other!
Slide 28
Russakovsky et al., ImageNet Large Scale Visual Recognition Challenge.
Slide 29
Conclusion
WSL always helps classification.
WSL has large potential: WSL data is cheap.