Learning Object Detectors with Weak Supervision

Kun He

Committee members:

Prof. Stan Sclaroff

Prof. Margrit Betke

Prof. Pedro Felzenszwalb

Problem: object detection

09/24/2014

Source: The PASCAL Visual Object Classes Challenge 2007

Supervised learning pipeline



• Image credit: Sudheendra Vijayanarasimhan

What about annotations?

• Example: Microsoft COCO (Lin et al ECCV’14)




Image credit: Tsung-Yi Lin



Example taken from Microsoft COCO dataset http://mscoco.org/explore/?id=79387


Relaxing annotation requirements

• Annotation process: laborious & error-prone

• Learn directly from the images! (weak supervision)



Literature review outline

Generative models

• Weber et al ECCV’00, Fergus et al CVPR’03, Crandall & Huttenlocher ECCV’06

Discriminative: Multiple Instance Learning (MIL)

• MI-SVM: Vijayanarasimhan & Grauman CVPR’08, Siva & Xiang ICCV’11, Cinbis et al CVPR’14, Song et al ICML’14 …

• MI-CRF: Deselaers et al IJCV’12



Generative models






Generative part-based models

• Detect sparse features → fit part-based model → determine (non-)existence of object

• Rob Fergus, Pietro Perona and Andrew Zisserman, CVPR’03



Generative models (Fergus et al CVPR’03)



• Likelihood ratio test

• Likelihood: product of Gaussians

• Features: location X, scale S, appearance A

• h : hypothesis (part-based object configuration)

Foreground model

Background model
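
For reference, the likelihood ratio test of Fergus et al can be written roughly as below. This is a sketch of the formulation in the CVPR’03 paper; the symbols θ (foreground parameters), θ_bg (background parameters) and H (the set of hypotheses) are notation assumed here, not taken from the slide. The foreground model supplies the numerator likelihood and the background model the denominator. An object is declared present when R exceeds a threshold.

```latex
R = \frac{p(\text{Object} \mid X, S, A)}{p(\text{No object} \mid X, S, A)}
  \approx \frac{p(X, S, A \mid \theta)\, p(\text{Object})}
               {p(X, S, A \mid \theta_{bg})\, p(\text{No object})},
\qquad
p(X, S, A \mid \theta) = \sum_{h \in H} p(X, S, A, h \mid \theta)
```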


• Learning: maximum likelihood via EM

• E-step: expectation with respect to h

• M-step: update Gaussian parameters


• Complexity: O(#features^#parts); typical: 30^6
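
To give a feel for why the E-step is expensive, the sketch below counts hypotheses under a simplifying assumption: each of the P parts is assigned to exactly one of the N detected features, ignoring occlusion and ordering constraints. The function names are hypothetical and chosen only for illustration.

```python
from itertools import product


def count_hypotheses(num_features: int, num_parts: int) -> int:
    """Number of part-to-feature assignments: one feature index per part."""
    return num_features ** num_parts


def enumerate_hypotheses(num_features: int, num_parts: int):
    """Lazily yield every hypothesis h as a tuple of feature indices."""
    return product(range(num_features), repeat=num_parts)


# Typical setting from the slide: about 30 features and 6 parts.
print(count_hypotheses(30, 6))

# Explicit enumeration is only practical for tiny problems.
small = list(enumerate_hypotheses(4, 3))
print(len(small), small[:3])
```

Every hypothesis has to be visited when taking the expectation over h, which is why the number of parts is kept small.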


• Learned 6-part model for “face”




• Face: single-Gaussian appearance model fails




• Spotted cat: single-Gaussian shape model fails



Generative models: critiques

• Good: probabilistic formulation; models multiple factors

• Bad: EM is slow; limited modeling power

• Discriminative models

• Only model the decision boundary

• Usually perform better, e.g. DPM (Felzenszwalb et al PAMI’10)




Multiple Instance Learning (MIL)



Image credit: Samarjit Das

Multiple Instance Learning (MIL)

• Images as bags

• Candidate generation

• Segmentation [Galleguillos et al ECCV’08]

• Objectness [Alexe et al PAMI’12]

• Selective Search [Uijlings et al IJCV’13]

• EdgeBoxes [Zitnick & Dollar ECCV’14]

• ……



MIL for learning detectors

• Chicken-and-egg problem / latent variable model

Optimize(positive_instances, model_parameters), with positive_instances latent

• EM-like algorithms (MI-SVM, MI-CRF)

• Impute latent variables

• Update model parameters

• Iterate




SVM review



MI-SVM



MI-SVM

• “Witness”: identified positive instance within a positive bag



MI-SVM algorithm (Andrews et al NIPS’02)



1. Initialize

2. Update witnesses for positive bags

3. Update model: solve fully-supervised SVM

4. Repeat

• Convergence: to local optimum
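
The alternation above can be sketched in a few lines of dependency-free Python. This is an illustrative toy, not the authors' implementation. It assumes a linear SVM trained by plain batch subgradient descent, initialization of each positive bag by its mean instance, and a witness update that picks the highest-scoring instance in each positive bag. The data, `train_linear_svm` and `mi_svm` are hypothetical names introduced here.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=500):
    """Batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) < 1:  # margin violated: hinge term is active
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mi_svm(pos_bags, neg_bags, max_iters=10):
    negatives = [x for bag in neg_bags for x in bag]
    # Step 1 (assumed initialization): stand in for each positive bag with its mean.
    positives = [[sum(col) / len(bag) for col in zip(*bag)] for bag in pos_bags]
    witnesses = None
    for _ in range(max_iters):
        # Step 3: solve a fully supervised SVM on current positives vs. all negatives.
        X = positives + negatives
        y = [1] * len(positives) + [-1] * len(negatives)
        w = train_linear_svm(X, y)
        # Step 2: the witness of each positive bag is its highest-scoring instance.
        new = [max(range(len(bag)), key=lambda k: dot(w, bag[k])) for bag in pos_bags]
        if new == witnesses:  # Step 4: stop once the witnesses no longer change.
            break
        witnesses = new
        positives = [list(bag[k]) for bag, k in zip(pos_bags, witnesses)]
    return w, witnesses


# Toy bags: 2-D features plus a constant 1.0 acting as the bias term.
pos_bags = [
    [(0.1, 0.2, 1.0), (3.0, 2.8, 1.0), (0.3, -0.1, 1.0)],
    [(2.9, 3.1, 1.0), (-0.2, 0.1, 1.0), (0.0, 0.3, 1.0)],
    [(0.2, 0.0, 1.0), (-0.1, -0.2, 1.0), (3.2, 3.0, 1.0)],
]
neg_bags = [
    [(0.0, 0.1, 1.0), (0.2, -0.2, 1.0), (-0.3, 0.2, 1.0)],
    [(0.1, -0.1, 1.0), (-0.2, 0.0, 1.0), (0.3, 0.3, 1.0)],
]
w, witnesses = mi_svm(pos_bags, neg_bags)
print(witnesses)
```

Each witness is chosen from its own bag alone, so nothing in this loop ties the selections in different images together.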

Progression of MI-SVM

• Source: R. Gokberk Cinbis, Jakob Verbeek and Cordelia Schmid, CVPR’14



MI-SVM: critiques

• Good: simple optimization problem, solvers available

• Bad: sensitive to initialization; witness update has no strong coupling between images




Conditional Random Fields (CRF) for MIL

• Enforce similarity between witnesses



MI-CRF (Deselaers et al IJCV’12)

• Pairwise CRF: unary “objectness” terms and pairwise similarity terms

• “Objectness”

• Ω: generic “objectness”

• Π: class-specific shape score

• Υ: class-specific appearance score

• Similarity

• Λ: shape similarity

• Γ: appearance similarity
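
The structure of such an energy can be illustrated with a toy example. The sketch below is not the model of Deselaers et al: it collapses the unary terms into a single made-up objectness score per candidate, replaces the pairwise terms with an absolute difference of one-dimensional appearance values, and minimizes by exhaustive search, which is only feasible for tiny problems. All names and numbers are hypothetical.

```python
from itertools import product


def energy(labels, unary, pairwise):
    """E(L) = sum_i unary[i][l_i] + sum_{i<j} pairwise(i, l_i, j, l_j)."""
    total = sum(unary[i][l] for i, l in enumerate(labels))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            total += pairwise(i, labels[i], j, labels[j])
    return total


def localize(unary, pairwise):
    """Pick one candidate per image by exhaustive energy minimization."""
    choices = [range(len(u)) for u in unary]
    return min(product(*choices), key=lambda L: energy(L, unary, pairwise))


# Three images, two candidate windows each (toy values).
objectness = [[0.5, 0.6], [0.6, 0.5], [0.5, 0.55]]
appearance = [[0.9, 0.1], [0.2, 0.85], [0.8, 0.5]]

# Low energy should mean high objectness, so negate the scores.
unary = [[-s for s in scores] for scores in objectness]


def pairwise(i, li, j, lj):
    """Penalize selecting windows that look different across images."""
    return abs(appearance[i][li] - appearance[j][lj])


independent = tuple(max(range(len(s)), key=s.__getitem__) for s in objectness)
joint = localize(unary, pairwise)
print(independent, joint)
```

In this example the per-image choice and the joint choice disagree: the pairwise terms pull the selection toward windows that look alike across images, even where their individual scores are slightly lower.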



MI-CRF: algorithm



Localize objects by optimizing global energy

MI-CRF: results

• Example detections

• Models learned by DPM (Felzenszwalb et al PAMI’10) vs. MI-CRF



MI-CRF: critiques

• Good: strong coupling between images

• Bad: high complexity (fully-connected CRF); limited #candidates per image (<=100)




Beyond MIL

Active learning

• OPTIMOL: Li et al CVPR’07

• NEIL: Chen et al ICCV’13

Current research

• Improving MI-SVM

Active learning

• Closing the loop



OPTIMOL (automatic Object Picture collecTion via Incremental MOdel Learning)

• Li-Jia Li, Gang Wang and Li Fei-Fei, CVPR’07



NEIL (Never-Ending Image Learner)



• Xinlei Chen, Abhinav Shrivastava and Abhinav Gupta, ICCV’13

Current research: improving MI-SVM

• MI-SVM (→ local optimum)

1. Update witnesses independently

2. Update model parameters: solve SVM

• Idea: relax step 1 to

Still have convergence

Freedom to enforce desired properties



Current research: improving MI-SVM

• Enforcing similarity between witnesses

• Step t:

• Comparison: PASCAL VOC 2007, detection mAP

• Classes: cat, cow, dog

Method   mAP     Per-class AP
MI-SVM   23.83   34.8 43.7 22.2 10.4 7.8 36.2 22.0 20.6 11.1 21.4 28.7 38.0 19.6 23.7 19.8 35.4 9.8
Ours     24.12   38.9 42.4 22.5 10.4 10.6 38.3 17.2 28.0 14.5 18.9 23.4 35.6 18.8 23.2 20.3 35.8 11.3
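
As a sanity check on the numbers above, the short script below averages the 17 per-class values in each row and counts the classes on which each method scores higher. It assumes, which the slide does not state explicitly, that the reported mAP is the plain mean of the listed per-class APs. The variable names are introduced here for illustration.

```python
mi_svm_ap = [34.8, 43.7, 22.2, 10.4, 7.8, 36.2, 22.0, 20.6, 11.1,
             21.4, 28.7, 38.0, 19.6, 23.7, 19.8, 35.4, 9.8]
ours_ap = [38.9, 42.4, 22.5, 10.4, 10.6, 38.3, 17.2, 28.0, 14.5,
           18.9, 23.4, 35.6, 18.8, 23.2, 20.3, 35.8, 11.3]


def mean(values):
    return sum(values) / len(values)


# Per-class comparison: wins for "Ours", ties, and wins for MI-SVM.
wins = sum(o > m for o, m in zip(ours_ap, mi_svm_ap))
ties = sum(o == m for o, m in zip(ours_ap, mi_svm_ap))
losses = len(ours_ap) - wins - ties

print(f"MI-SVM mean AP: {mean(mi_svm_ap):.2f}")
print(f"Ours mean AP:   {mean(ours_ap):.2f}")
print(f"wins/ties/losses for Ours: {wins}/{ties}/{losses}")
```

The means agree with the reported 23.83 and 24.12 to within rounding, so the overall gain is small and comes from a mix of per-class improvements and regressions.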

Cats



And dogs



Summary

• Weakly supervised object detector learning

• Existing methods: generative, MI-SVM, MI-CRF

• Future directions

• Active learning (e.g. OPTIMOL, NEIL)

• Current research: improving MI-SVM

• Open questions: part-based, multi-modal data, etc.

