DESCRIPTION
One of the fundamental challenges in automatically detecting and localizing objects in images is the need to collect a large number of example images with annotated object locations (bounding boxes). The introduction of detection challenge datasets has propelled progress by providing the research community with enough fully annotated images to train competitive detectors for 20-200 classes. However, as we look towards the goal of scaling our systems to human-level category detection, it becomes impractical to collect a large quantity of bounding box labels for tens or even hundreds of thousands of categories. In this talk I will discuss recent work on enabling the training of detectors with weakly annotated images, i.e. images that are known to contain the object but with unknown object location (bounding box).
The first approach I will present proposes a new multiple instance learning method for object detection that is capable of handling noisy, automatically obtained annotations. Our approach consists of first obtaining confidence estimates over the label space and second incorporating these estimates within a new Boosting procedure. We demonstrate the efficiency of our procedure on two detection tasks, namely horse detection and pedestrian detection, where the training data is primarily annotated by a coarse area-of-interest detector, and show substantial improvements over existing MIL methods.
I will also present a second, complementary approach: a domain adaptation algorithm that learns the difference between the classification task and the detection task, and transfers this knowledge to classifiers for categories without bounding box annotated data, adapting them into detectors. Our method has the potential to enable detection for the tens of thousands of categories that lack bounding box annotations, yet have plenty of classification data in ImageNet. The approach is evaluated on the ImageNet LSVRC-2013 detection challenge.
COMPUTER VISION: LEARNING TO DETECT OBJECTS
Kate Saenko, University of Massachusetts, Lowell
What is computer vision?
Computer Vision
Terminator 2
We’re not quite there yet, but…
Terminator 2, Enemy of the State (from UCSD “Fact or Fiction” DVD)
Machine Learning: What is it?
Program a computer to learn from experience
Learn from “big data”
Machine Learning in practice
Machine learning is not perfect
Personal photo albums
Lots of image data available!
What are applications of computer vision?
Surveillance and security
Computer Vision: Surveillance and Security
Smart cars
Mobileye: Vision systems currently in high-end BMW, GM, Volvo models
By 2010: 70% of car manufacturers
Slide content courtesy of Amnon Shashua
Scientific Images
Medical Imaging
Image guided surgery: Grimson et al., MIT
3D imaging: MRI, CT
slide by S. Seitz
Vision for Robotics
http://www.robocup.org/
NASA’s Mars Spirit Rover: http://en.wikipedia.org/wiki/Spirit_rover
slide by S. Seitz
Object Detection: Face Detection
Viola and Jones, Rapid object detection using a boosted cascade of simple features, CVPR 2001
What is object detection?
Goal of object detection
Detect: PERSON
Why is object detection difficult?
Can you detect all objects in this image?
Easy to collect data on the web!
Difficult to label image annotations
Easy to label from search engine
Much more difficult and costly to label
dog apple
dog apple
Goal of this research:
Learn from weakly labeled data!
How well can we do without bounding box labels?
Computer detecting pedestrians
Computer detecting 7,000 object categories
How well can we do without bounding box labels?
Joint work with Karim Ali
Confidence-Rated Multiple Instance Boosting for Detection
Motivation
Object Detection: high accuracy requires large labeled data sets; scalability
Reducing annotation requirements: Semi-supervised Learning; Active Learning; Multiple-Instance Learning
Overview
CR-MILBOOST
Multiple instance learning with noise
MI Learning cannot handle noisy bags
Outline
Reminder: What is MIL?
CR-MILBoost (CVPR’14)
Conclusion & Future Work
Discussion
Reminder: What is MIL?
Supervised Learning: each instance has an associated label
MIL (weaker supervision): examples come in bags; each bag has a label
Negative Bag: all instances in bag are negative
Positive Bag: at least one instance in bag is positive
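The bag-labeling rule on this slide can be sketched in a few lines. This is only an illustration: the function name `bag_label` and the example bags are hypothetical and do not come from the talk.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Return 1 if at least one instance in the bag is positive, else 0."""
    return int(any(label == 1 for label in instance_labels))


# A web image known to contain a horse is a positive bag of candidate windows:
# at least one window covers the horse, but we are not told which one.
positive_bag = [0, 0, 1, 0]
# An image with no horse is a negative bag: every window is negative.
negative_bag = [0, 0, 0]

print(bag_label(positive_bag), bag_label(negative_bag))
```

In MIL training the instance labels inside a positive bag are hidden, so the learner sees only the value this function would return, not its input.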
Supervised vs MIL (binary)
Supervised Learning MI Learning
Related Methods
How to estimate latent labels for positives
Gartner, ICML’02 Xu, ICML’04 Andrews, NIPS’03
Bunescu, ICML’07 SVM Constraints
Viola, NIPS’07
Supervised MIL
CR-MILBOOST
MILBoost
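The slide text does not reproduce the MILBoost formulas, so the following is a minimal sketch of the noisy-OR formulation commonly associated with MILBoost (Viola et al.), not a transcription of the slide. The function names and the use of a sigmoid on raw scores are assumptions made for illustration.

```python
import numpy as np


def instance_probs(scores: np.ndarray) -> np.ndarray:
    """Sigmoid of the boosted classifier score for each instance in a bag."""
    return 1.0 / (1.0 + np.exp(-scores))


def bag_prob(scores: np.ndarray) -> float:
    """Noisy-OR: the bag is positive unless every instance is negative."""
    return float(1.0 - np.prod(1.0 - instance_probs(scores)))


def milboost_weights(scores: np.ndarray, bag_target: int, eps: float = 1e-12) -> np.ndarray:
    """Gradient of the bag log-likelihood with respect to each instance score.

    For a positive bag this is (1 - p_bag) / p_bag * p_instance;
    for a negative bag it reduces to -p_instance.
    """
    p_ij = instance_probs(scores)
    p_i = bag_prob(scores)
    return (bag_target - p_i) / max(p_i, eps) * p_ij


scores = np.array([0.0, 0.0])
print(bag_prob(scores))
print(milboost_weights(scores, bag_target=1))
```

These per-instance weights are what the next weak learner is fit against; note that a single mislabeled positive bag pulls all of its instances upward, which is the noise sensitivity the talk sets out to address.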
CR-MILBOOST
Two Step Procedure: estimate probabilities on latent label; integrate estimate in new loss
Mitigates label estimation error by incorporating priors
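The slides do not give the CR-MILBoost loss itself. As a rough illustration of what "integrating an estimate in the loss" can look like, the sketch below uses a soft-label log loss in which each instance's estimated probability of being positive acts as its target. This is an assumed stand-in, not the authors' loss; `confidence_rated_log_loss` and its arguments are hypothetical names.

```python
import numpy as np


def confidence_rated_log_loss(scores, confidences, eps: float = 1e-12) -> float:
    """Soft-label log loss (illustrative stand-in, not the CR-MILBoost loss).

    scores: classifier scores for each instance.
    confidences: step-1 estimates of P(latent label = +1) for each instance.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    q = np.asarray(confidences, dtype=float)
    return float(-np.sum(q * np.log(p) + (1.0 - q) * np.log(1.0 - p)))


# An instance with an uncertain estimate (q = 0.5) cannot dominate the loss,
# whichever way the classifier scores it.
print(confidence_rated_log_loss([2.0, -2.0], [0.9, 0.5]))
```

The design point this illustrates is that an uncertain estimate softens the penalty for disagreeing with it, so errors made in the estimation step are not treated as hard labels.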
CR-MILBOOST: Step 1
CR-MILBOOST: Step 2
Experiments: Features
Weak Learners: an edge orientation; a sub-window; a threshold
Simple, Efficient
Q=4, number of stumps
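A weak learner built from an edge orientation, a sub-window, and a threshold might look like the sketch below. The exact feature used in the experiments is not given in the slides, so the gradient-based orientation binning, the choice of `n_bins=4`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def edge_orientation_feature(image, window, orientation_bin, n_bins=4):
    """Sum of gradient magnitude in one orientation bin inside a sub-window.

    window is (top, left, height, width); orientations are unsigned, in [0, pi).
    """
    top, left, height, width = window
    patch = np.asarray(image, dtype=float)[top:top + height, left:left + width]
    gy, gx = np.gradient(patch)  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    return float(magnitude[bins == orientation_bin].sum())


def stump(image, window, orientation_bin, threshold):
    """Decision stump: +1 if the feature exceeds the threshold, else -1."""
    value = edge_orientation_feature(image, window, orientation_bin)
    return 1 if value > threshold else -1


# Toy image with a single vertical edge (intensity changes along the columns).
image = np.zeros((8, 8))
image[:, 4:] = 1.0
print(stump(image, (0, 0, 8, 8), orientation_bin=0, threshold=1.0))
```

Boosting then combines many such stumps, each choosing its own orientation, sub-window, and threshold.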
Experiments: Pedestrian Detection
Training Data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes
Testing Data: INRIA Person; 300 images containing 600 pedestrians
Experiments: Horse Detection
Training Data: 200 images automatically downloaded from the web; 200 “objectness” bounding boxes
Testing Data: 200 images containing 200 side-view horses
Conclusion
New MIL method: CR-MILBOOST, a two-step procedure
Dramatic increase in performance: 200% on two datasets
Quality of selected examples still suffers from additional ambiguity when compared to fully supervised examples
Joint work with Judy Hoffman, Eric Tzeng, Sergio Guadarrama and Trevor Darrell at UC Berkeley
Adapting Deep CNNs from Classification to Detection
Recall: classification is easier than detection
Classification label: Easy to label
Detection label: much more difficult and costly!
dog apple
dog apple
[Figure: classification images I_CLASSIFY (dog, apple, cat) give classifiers W_CLASSIFY for dog, apple, and cat; detection images I_DET (dog, apple) give detectors W_DET for dog and apple. For cat, the detector W_DET is marked “?”.]
Main idea behind the approach
[Figure: adapting a classification CNN into a detection CNN, in three panels]
(a) Classification CNN: a deep convolutional neural network with layers 1-5, fc6, fc7, and output layers fcA and fcB. Train classification CNN on classification data from categories A and B.
(b) Hidden Layer Adaptation: det layers 1-5, det fc6, det fc7, and output layer det fcB with a background class. Train adapted detection CNN on detection data from categories B (labeled warped regions).
(c) Output Layer Adaptation: fcA is adapted into det fcA, giving the final combined and fully adapted CNN (det layers 1-5, det fc6, det fc7, det fcA, det fcB, background).
Example class scores shown in the figure include cat: 0.90, dog: 0.85, background: 0.25, person: 0.10, airplane: 0.05.
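The output layer adaptation idea, as the abstract describes it, is to learn how detection weights differ from classification weights on categories that have bounding boxes (B) and apply that difference to categories that do not (A). The sketch below is one hedged way to write this down. The function name, the array shapes, and in particular the optional nearest-neighbor selection via `k` are assumptions for illustration, not details confirmed by the slides.

```python
import numpy as np


def adapt_output_weights(w_cls_a, w_cls_b, w_det_b, k=None):
    """Turn classifiers for categories A into detectors (illustrative sketch).

    w_cls_a: (n_a, d) classification output weights for categories A.
    w_cls_b: (n_b, d) classification output weights for categories B.
    w_det_b: (n_b, d) detection output weights for categories B.
    k: if given, average the change over the k categories in B whose
       classifiers are closest (Euclidean) to each A classifier; this
       neighbor selection is an assumption, not taken from the slides.
    """
    delta_b = w_det_b - w_cls_b  # what changed when B went from classify to detect
    if k is None:
        return w_cls_a + delta_b.mean(axis=0)
    dists = np.linalg.norm(w_cls_a[:, None, :] - w_cls_b[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # (n_a, k) indices into B
    return w_cls_a + delta_b[nearest].mean(axis=1)


w_cls_b = np.array([[1.0, 0.0], [0.0, 1.0]])
w_det_b = np.array([[2.0, 0.0], [0.0, 3.0]])
w_cls_a = np.array([[0.9, 0.1]])
print(adapt_output_weights(w_cls_a, w_cls_b, w_det_b, k=1))
```

With `k=None` every category in A receives the same average shift; a small `k` lets visually similar categories contribute more, at the cost of relying on a similarity measure between classifiers.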
Results on ILSVRC 2013 Detection
Preliminary results on 7K categories
Conclusion
Presented two new methods for object detector training with minimal bounding box annotation:
MIL-based method for learning from results of image search
Adaptation from classification to detection task
Questions?