Learning Object Detectors From Weakly Supervised Image Data

COMPUTER VISION: LEARNING TO DETECT OBJECTS

Kate Saenko, University of Massachusetts, Lowell

What is computer vision?

Computer Vision

Terminator 2

We’re not quite there yet, but…

Terminator 2, Enemy of the State (from UCSD “Fact or Fiction” DVD)

Machine Learning: What is it?

Program a computer to learn from experience

Learn from “big data”

Machine Learning in practice

Machine learning is not perfect

Personal photo albums

Lots of image data available!

Data for computer vision

http://www.picsearch.com/

What are applications of computer vision?

Surveillance and security

Computer Vision: Surveillance and Security

Smart cars

Mobileye vision systems are currently in high-end BMW, GM, and Volvo models
By 2010: 70% of car manufacturers

Slide content courtesy of Amnon Shashua

http://www.mobileye.com/

Scientific Images

Medical Imaging

Image-guided surgery: Grimson et al., MIT

3D imaging: MRI, CT

slide by S. Seitz

http://groups.csail.mit.edu/vision/medical-vision/surgery/surgical_navigation.html

Vision for Robotics

NASA’s Mars Spirit Rover

slide by S. Seitz

http://www.robocup.org/

http://en.wikipedia.org/wiki/Spirit_rover

http://upload.wikimedia.org/wikipedia/commons/d/d8/NASA_Mars_Rover.jpg

Object Detection: Face Detection

Viola and Jones, Robust object detection using a boosted cascade of simple features, CVPR 2001

What is object detection?

Goal of object detection

Detect: PERSON

Why is object detection difficult?

Can you detect all objects in this image?

Easy to collect data on the web!

Difficult to label image annotations

Easy to label from search engine

Much more difficult and costly to label

dog apple

dog apple
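The difference between the two kinds of label can be made concrete with a small data-structure sketch. The class names, the file name, and the box coordinates below are hypothetical and only illustrate the idea: a search-engine label names the categories in an image, while a detection label also has to say where each object is.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WeakLabel:
    """Image-level label: says which categories appear, not where."""
    image_id: str
    categories: List[str]


@dataclass
class BoxLabel:
    """One annotated object: category plus (x_min, y_min, x_max, y_max)."""
    category: str
    box: Tuple[int, int, int, int]


@dataclass
class StrongLabel:
    """Detection label: every object needs its own hand-drawn box."""
    image_id: str
    boxes: List[BoxLabel] = field(default_factory=list)

    def to_weak(self) -> WeakLabel:
        # Dropping the boxes recovers the weak label.
        # The reverse direction is the hard part that has to be learned.
        names = sorted({b.category for b in self.boxes})
        return WeakLabel(self.image_id, names)


# Hypothetical example image containing a dog and an apple.
weak = WeakLabel("img_001.jpg", ["apple", "dog"])
strong = StrongLabel(
    "img_001.jpg",
    [BoxLabel("dog", (34, 50, 210, 300)), BoxLabel("apple", (250, 180, 310, 240))],
)
```

Going from `strong` to `weak` is a one-line conversion; going from `weak` to `strong` is what the rest of the talk is about.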

Goal of this research:

Learn from weakly labeled data!

How well can we do without bounding box labels?

Computer detecting pedestrians

Computer detecting 7,000 object categories

How well can we do without bounding box labels?

Joint work with Karim Ali

Confidence-Rated Multiple Instance Boosting for Detection

Motivation

Object Detection:
High accuracy requires large labeled data sets
Scalability

Reducing annotation requirements:
Semi-supervised Learning
Active Learning
Multiple-Instance Learning

Overview

CR-MILBOOST

Multiple instance learning with noise

MI Learning cannot handle noisy bags

Outline

Reminder: What is MIL?

CR-MILBoost (CVPR’14)

Conclusion & Future Work

Discussion

Reminder: What is MIL?

Supervised Learning:
Each instance has an associated label

MIL: Weaker Supervision
Examples come in bags
Each bag has a label

Negative Bag: all instances in bag are negative
Positive Bag: at least one instance in bag is positive
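The bag-labeling rule above can be written out in a few lines. This is a minimal sketch; the function name and the 0/1 label encoding are assumptions made for illustration.

```python
from typing import List


def bag_label(instance_labels: List[int]) -> int:
    """Return the MIL bag label implied by (hidden) instance labels.

    Labels are encoded as 1 = positive, 0 = negative (an assumed encoding).
    A bag is positive if at least one instance is positive,
    and negative only if every instance is negative.
    """
    return int(any(label == 1 for label in instance_labels))


# In detection, one natural reading is: bag = image, instances = candidate windows.
windows_in_dog_image = [0, 0, 1, 0]   # one window actually covers the dog
windows_in_background = [0, 0, 0, 0]  # no window covers a dog

positive_bag = bag_label(windows_in_dog_image)
negative_bag = bag_label(windows_in_background)
```

During training only the bag labels are observed; which window inside a positive bag is the true positive stays latent.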

Supervised vs MIL (binary)

Supervised Learning MI Learning

Related Methods

How to estimate latent labels for positives

Gartner, ICML’02; Xu, ICML’04; Andrews, NIPS’03

Bunescu, ICML’07 SVM Constraints

Viola, NIPS’07

Supervised MIL

CR-MILBOOST

MILBoost
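The formulas on the MILBoost slides did not survive extraction. As a reminder of the baseline, the sketch below uses the noisy-OR bag model commonly associated with MILBoost: each instance score is squashed by a sigmoid, and a bag is positive unless every instance is negative. The function names are hypothetical, and this is an illustration of that standard model rather than a reproduction of the slides.

```python
import numpy as np


def instance_probs(scores: np.ndarray) -> np.ndarray:
    """Sigmoid of the boosted classifier score for each instance."""
    return 1.0 / (1.0 + np.exp(-scores))


def bag_prob_noisy_or(scores: np.ndarray) -> float:
    """Noisy-OR: the bag is positive unless all instances are negative."""
    p = instance_probs(scores)
    return float(1.0 - np.prod(1.0 - p))


def milboost_instance_weights(scores: np.ndarray, bag_label: int) -> np.ndarray:
    """Gradient of the bag log-likelihood with respect to each instance score.

    These gradients act as the per-instance weights for the next weak learner.
    """
    p = instance_probs(scores)
    if bag_label == 1:
        bag_p = bag_prob_noisy_or(scores)
        # d/dy_i log(P_bag) = (1 - P_bag) / P_bag * p_i
        return (1.0 - bag_p) / bag_p * p
    # d/dy_i log(1 - P_bag) = -p_i
    return -p
```

In a positive bag, the instances that already score highest receive the largest weight, which is one reason a single wrong bag label can mislead training.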

CR-MILBOOST

Two-step procedure:
Estimate probabilities on latent labels
Integrate the estimates in a new loss

Mitigates label estimation error by incorporating priors
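The exact CR-MILBOOST loss is not reproduced in these notes, so the following is only a rough sketch of the two-step idea under stated assumptions. Step 1 is assumed to be a sigmoid of the current scores, optionally blended with a prior (the blending weight `alpha` is hypothetical). Step 2 is assumed to be an expected exponential loss in which each instance counts as positive with confidence `q` and as negative with confidence `1 - q`. Neither form should be read as the published method.

```python
import numpy as np


def estimate_latent_probs(scores, bag_is_positive, prior=None, alpha=0.5):
    """Step 1 (assumed form): confidence that each instance is a true positive."""
    scores = np.asarray(scores, dtype=float)
    if not bag_is_positive:
        # Every instance in a negative bag is negative by definition.
        return np.zeros_like(scores)
    q = 1.0 / (1.0 + np.exp(-scores))
    if prior is not None:
        # Hypothetical way to incorporate a prior: a convex blend.
        q = (1.0 - alpha) * q + alpha * np.asarray(prior, dtype=float)
    return q


def confidence_rated_loss(scores, q):
    """Step 2 (assumed form): exponential loss weighted by the confidences.

    An instance with q near 1 is pushed toward a high score, one with q
    near 0 toward a low score, and an uncertain one (q = 0.5) toward 0.
    """
    scores = np.asarray(scores, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.mean(q * np.exp(-scores) + (1.0 - q) * np.exp(scores)))
```

The point of the sketch is the structure: soft label estimates replace a hard choice of one positive instance per bag, so an early wrong guess costs less.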

CR-MILBOOST

Step 1

CR-MILBOOST

Step 2

Experiments: Features

Weak Learners:
An edge orientation
A sub-window
A threshold

Simple, efficient
Q = 4 (number of stumps)
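A weak learner defined by an edge orientation, a sub-window, and a threshold might look like the sketch below. The feature computation (gradient magnitude binned into unsigned orientation channels), the number of bins, and the function names are assumptions for illustration; the slides do not specify them.

```python
import numpy as np


def orientation_channels(image: np.ndarray, n_bins: int = 8) -> np.ndarray:
    """Split gradient magnitude into n_bins unsigned orientation channels."""
    gy, gx = np.gradient(image.astype(float))      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    channels = np.zeros((n_bins,) + image.shape)
    for b in range(n_bins):
        mask = bins == b
        channels[b][mask] = magnitude[mask]
    return channels


def stump(channels: np.ndarray, orientation: int, window, threshold: float) -> int:
    """Decision stump: +1 if edge energy in the sub-window exceeds the threshold.

    window = (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = window
    response = channels[orientation, top:bottom, left:right].sum()
    return 1 if response > threshold else -1


# Toy image with a single vertical edge between columns 3 and 4.
toy = np.zeros((8, 8))
toy[:, 4:] = 1.0
toy_channels = orientation_channels(toy)
```

Each stump costs one sub-window sum and one comparison, which fits the "simple, efficient" note above; with integral images the sum would take constant time.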

Experiments: Pedestrian Detection

Training Data
200 images automatically downloaded from the web
200 “objectness” bounding boxes

Testing Data
INRIA Person
300 images containing 600 pedestrians

Experiments: Horse Detection

Training Data
200 images automatically downloaded from the web
200 “objectness” bounding boxes

Testing Data
200 images containing 200 side-view horses

Conclusion

New MIL method: CR-MILBOOST
Two-step procedure

Dramatic increase in performance: 200% on two datasets

Quality of selected examples still suffers from additional ambiguity compared to fully supervised examples

Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley

Adapting Deep CNNs from Classification to Detection

Recall: classification is easier than detection

Classification label: Easy to label

Detection label: much more difficult and costly!

dog apple

dog apple

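[Figure: for dog and apple, classification images (I_CLASSIFY) yield classifiers (W_CLASSIFY) and detection images (I_DET) yield detectors (W_DET). For cat, only I_CLASSIFY and W_CLASSIFY are available; the detection side (W_DET, I_DET) is marked “?”.]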

Main idea behind the approach

[Figure: Train Classification CNN. Classification data from categories A and B (e.g. cat, dog) feeds a deep convolutional neural network (layers 1-5, fc6, fc7) with output layers fcA and fcB. Example scores: cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10.]

[Figure: Train adapted detection CNN. The classification CNN (layers 1-5, fc6, fc7, fcA, fcB), trained on classification data from categories A and B, is shown next to a detection network (det layers 1-5, detfc6, detfc7, detfcB) trained on labeled warped regions from the detection data for categories B (e.g. dog, background). Example classification scores: cat: 0.90, dog: 0.85, airplane: 0.05, person: 0.10. Example detection scores: dog: 0.87, person: 0.15, background: 0.25.]

[Figure: Final combined and fully adapted CNN. det layers 1-5, detfc6, and detfc7 feed the output layers detfcA (cat: 0.90, airplane: 0.02), detfcB (dog: 0.45, person: 0.15), and background (0.25). The diagram also carries an “adapt” label.]

(a) Classification CNN

(b) Hidden Layer Adaptation

(c) Output Layer Adaptation
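The slides do not spell out how the output layer for categories A (classification data only) is adapted. One plausible reading, sketched below purely as an assumption, is to reuse what was learned on categories B: measure how each B category's output weights changed between the classifier (fcB) and the detector (detfcB), then apply the average change of the k most similar B categories to each A category. The function name, the cosine-similarity choice, and `k` are all hypothetical.

```python
import numpy as np


def adapt_output_layer(w_cls_a, w_cls_b, w_det_b, k=2):
    """Estimate detector weights for categories A (rows = categories).

    w_cls_a: classifier weights for A, shape (num_a, dim)
    w_cls_b: classifier weights for B, shape (num_b, dim)
    w_det_b: detector weights for B,   shape (num_b, dim)
    """
    w_cls_a = np.asarray(w_cls_a, dtype=float)
    w_cls_b = np.asarray(w_cls_b, dtype=float)
    w_det_b = np.asarray(w_det_b, dtype=float)

    # How much each B category changed when going from classifier to detector.
    delta_b = w_det_b - w_cls_b

    def unit_rows(w):
        return w / np.linalg.norm(w, axis=1, keepdims=True)

    # Cosine similarity between every A classifier and every B classifier.
    similarity = unit_rows(w_cls_a) @ unit_rows(w_cls_b).T
    nearest = np.argsort(-similarity, axis=1)[:, :k]   # shape (num_a, k)

    # Apply the mean change of the k nearest B categories to each A category.
    return w_cls_a + delta_b[nearest].mean(axis=1)
```

Under this reading, a category such as cat would borrow its classifier-to-detector shift from visually similar categories that do have bounding-box data, such as dog.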

Results on ILSVRC 2013 Detection



Preliminary results on 7K categories

Conclusion

Presented two new methods for object detector training with minimal bounding box annotation:
MIL-based method for learning from results of image search
Adaptation from classification to detection task

Questions?
