Discovering Objects and their Location in Images
Josef Sivic¹, Bryan C. Russell², Alexei A. Efros³, Andrew Zisserman¹ and William T. Freeman²
Goal: Discover visual object categories and their segmentation given a collection of unlabelled images
Introduction
Represent an image as a histogram of “visual words”
The topic discovery models
Probabilistic Latent Semantic Analysis (pLSA) [Hofmann’99]
Experiment I: Caltech Dataset
pLSA graphical model
Five samples from a ‘motorbike’ visual word
Improving localization using doublets
¹Oxford University, ²MIT, ³Carnegie Mellon University
Experiment II: MIT dataset
Overview
Find topic vectors P(w|z) common to all documents and mixture coefficients P(z|d) specific to each document. Fit the model by maximizing the likelihood of the data using EM.
pLSA Model fitting:
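A minimal sketch of the EM fit in Python with NumPy, assuming the images are already summarized as a documents-by-words count matrix. The function names `plsa` and `log_likelihood`, the random initialization and the fixed iteration count are illustrative choices, not details from the poster; the dense (documents × topics × words) posterior array is used for readability and would be memory-hungry at full scale.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA by EM.

    counts: (n_docs, n_words) array of visual-word counts n(w, d).
    Returns p_w_z with shape (n_topics, n_words), i.e. P(w|z),
    and p_z_d with shape (n_docs, n_topics), i.e. P(z|d).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization; each row is normalized to a distribution.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d), shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
        # Expected counts n(w, d) P(z|d,w).
        weighted = counts[:, None, :] * post
        # M-step: re-estimate both multinomials from the expected counts.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= np.maximum(p_z_d.sum(axis=1, keepdims=True), 1e-12)
    return p_w_z, p_z_d


def log_likelihood(counts, p_w_z, p_z_d):
    """Data log-likelihood sum_d sum_w n(w, d) log P(w|d)."""
    p_w_d = p_z_d @ p_w_z  # P(w|d) = sum_z P(w|z) P(z|d)
    return float(np.sum(counts * np.log(np.maximum(p_w_d, 1e-12))))
```

Calling `plsa(counts, n_topics=7)` on the word histograms yields P(w|z) and P(z|d). Because EM never decreases the likelihood, `log_likelihood` should rise (or stay flat) as `n_iter` grows from the same initialization.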
Assign each image to the topic with the highest P(z|d)
Learn K = 5, 6 or 7 topics
Background is better modelled by multiple topics
Pre-learning background topics on a separate background dataset improves results
Performance on novel images is comparable with the semi-supervised method of [Fergus et al.’03]
Confusion tables for K = 5, 6 and 7 learned topics
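The classification rule of assigning each image to its highest-P(z|d) topic amounts to an argmax over each row of P(z|d). The sketch below, assuming integer ground-truth labels are available for evaluation only, tallies a confusion table of categories against discovered topics. Because topics are unlabelled, matching each topic to a category is a separate step that this sketch leaves out; `classify_by_topic` and `confusion_table` are hypothetical helper names.

```python
import numpy as np


def classify_by_topic(p_z_d):
    """Assign each image (row) to the topic with the highest P(z|d)."""
    return np.argmax(p_z_d, axis=1)


def confusion_table(true_labels, topic_ids, n_classes, n_topics):
    """Count images per (ground-truth category, assigned topic) pair.

    Rows are categories, columns are discovered topics.
    """
    table = np.zeros((n_classes, n_topics), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat.
    np.add.at(table, (np.asarray(true_labels), np.asarray(topic_ids)), 1)
    return table
```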
Form a new vocabulary from pairs of locally co-occurring regions
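The poster describes doublets only as pairs of locally co-occurring regions, so the following is a hypothetical sketch: it assumes "local" means the two region centres lie within a fixed `radius` (the paper may define the neighbourhood differently), and it encodes each unordered word pair (a, b) with a ≤ b as the single id a·V + b, giving a new vocabulary of at most V² doublets for a V-word singlet vocabulary.

```python
import numpy as np


def form_doublets(positions, words, vocab_size, radius):
    """Return doublet ids for all pairs of regions closer than `radius`.

    positions: (n, 2) region centres; words: (n,) visual-word ids.
    The unordered word pair (a, b), a <= b, is encoded as a * vocab_size + b.
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words)
    # Pairwise Euclidean distances between region centres.
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    doublets = []
    n = len(words)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] < radius:
                a, b = sorted((int(words[i]), int(words[j])))
                doublets.append(a * vocab_size + b)
    return doublets
```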
Doublet example I
Doublet example II
Doublet segmentation
Singlet segmentation
Four of the 10 learned topics, each shown by its 5 most probable images
2,873 images, learn 10 topics
Singlet segmentation
All detected visual words
“Buildings” “Trees / Grass”
“Bookshelves” “Computers”
Example images with multiple objects
Image representation
Approach: 1) Represent an image as a collection of visual words
2) Apply topic discovery models from statistical text analysis
Results
Histogram of visual words
• Detect affine covariant regions
• Represent each region by a SIFT descriptor
• Build a visual vocabulary by k-means clustering (k ≈ 1,000 clusters)
• Assign each region to the nearest cluster centre
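The four steps above can be sketched as follows, assuming descriptors have already been extracted (random 128-dimensional arrays stand in for SIFT descriptors in the test; no region detector is implemented). Plain Lloyd's k-means with random initial centres is an assumption, since the poster states only that k-means is used; `kmeans`, `assign_to_nearest` and `word_histogram` are illustrative names.

```python
import numpy as np


def assign_to_nearest(descriptors, centres):
    """Index of the nearest cluster centre (visual word) for each descriptor."""
    # Squared Euclidean distance from every descriptor to every centre.
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)


def kmeans(descriptors, k, n_iter=20, seed=0):
    """Lloyd's k-means; the returned centres form the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_to_nearest(descriptors, centres)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[c] = members.mean(axis=0)
    return centres


def word_histogram(descriptors, centres):
    """Histogram of visual-word counts for one image."""
    words = assign_to_nearest(descriptors, centres)
    return np.bincount(words, minlength=len(centres))
```

For example, `word_histogram(image_descriptors, centres)` returns one count vector per image; stacking these vectors row by row gives the count matrix that the topic models take as input.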
[Figure: example histogram of visual-word counts (2, 0, 1, 0, …)]
Five samples from an ‘airplane’ visual word
[Mikolajczyk and Schmid’02, Schaffalitzky and Zisserman’02, Matas et al.’02, Lowe’99, Sivic and Zisserman’03]
Examples of visual words
Doublet formation
Segmentation
For a given word w_i in document d_j, examine the posterior probability over topics, P(z|w_i,d_j).
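By Bayes' rule this posterior is P(z|w_i,d_j) = P(w_i|z) P(z|d_j) / Σ_z' P(w_i|z') P(z'|d_j), which needs only the two fitted pLSA distributions. A minimal sketch, assuming `p_w_z` is a topics × words array and `p_z_d` a documents × topics array (a hypothetical layout), computes this posterior and labels each detected region with its most probable topic, as in the colour-coded segmentations.

```python
import numpy as np


def word_topic_posterior(p_w_z, p_z_d, word, doc):
    """P(z | w, d) for one visual word in one image."""
    unnorm = p_w_z[:, word] * p_z_d[doc]
    return unnorm / unnorm.sum()


def label_regions(p_w_z, p_z_d, region_words, doc):
    """Most probable topic for every detected region (given by its word id) in image `doc`."""
    # (n_regions, n_topics) unnormalized posteriors, then normalize per region.
    unnorm = p_w_z[:, region_words].T * p_z_d[doc][None, :]
    post = unnorm / unnorm.sum(axis=1, keepdims=True)
    return post.argmax(axis=1)
```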
Faces, Motorbikes, Airplanes, Cars
Background I, Background II, Background III
Visual words colour-coded according to the topic with the highest posterior probability
Example motorbike segmentation
Example airplane segmentation
Image Classification
Four object categories (faces, motorbikes, airplanes and cars rear; 3,190 images in total) and 900 background images
LDA graphical model
Latent Dirichlet Allocation (LDA) [Blei et al.’03]
Treat the multinomial weights over topics as random variables drawn from a Dirichlet prior. Fit the model using Gibbs sampling [Griffiths and Steyvers’04].
Results shown only for pLSA. LDA had very similar performance.
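One common way to implement the Gibbs sampler of [Griffiths and Steyvers’04] is collapsed Gibbs sampling, which integrates out the Dirichlet-distributed weights and resamples each word's topic from its full conditional. The sketch below assumes symmetric hyperparameters `alpha` and `beta` with illustrative default values that the poster does not give; it is written for clarity rather than speed, and `lda_gibbs` is a hypothetical name.

```python
import numpy as np


def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA.

    docs: list of 1-D int arrays, each holding the visual-word ids of one image.
    Returns theta (n_docs, n_topics), an estimate of P(z|d),
    and phi (n_topics, vocab_size), an estimate of P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_dz = np.zeros((len(docs), n_topics))   # topic counts per document
    n_zw = np.zeros((n_topics, vocab_size))  # word counts per topic
    n_z = np.zeros(n_topics)                 # total tokens per topic
    # Random initial topic for every word token.
    assignments = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, (doc, zs) in enumerate(zip(docs, assignments)):
        for w, z in zip(doc, zs):
            n_dz[d, z] += 1
            n_zw[z, w] += 1
            n_z[z] += 1
    for _ in range(n_iter):
        for d, (doc, zs) in enumerate(zip(docs, assignments)):
            for i, w in enumerate(doc):
                z = zs[i]
                # Remove the current token from all counts.
                n_dz[d, z] -= 1
                n_zw[z, w] -= 1
                n_z[z] -= 1
                # Full conditional p(z_i = k | other assignments, words), up to a constant.
                p = (n_dz[d] + alpha) * (n_zw[:, w] + beta) / (n_z + vocab_size * beta)
                z = rng.choice(n_topics, p=p / p.sum())
                zs[i] = z
                n_dz[d, z] += 1
                n_zw[z, w] += 1
                n_z[z] += 1
    theta = (n_dz + alpha) / (n_dz + alpha).sum(axis=1, keepdims=True)
    phi = (n_zw + beta) / (n_zw + beta).sum(axis=1, keepdims=True)
    return theta, phi
```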
Experiment III: Application to image retrieval
Learn topic vectors on the Caltech database
Represent a new query image in terms of the learned topic vectors
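The poster does not spell out how a new query is expressed in terms of the learned topics. One standard approach for pLSA is Hofmann's "folding-in", which runs the same EM updates on the query alone while holding P(w|z) fixed, so that only P(z|d_query) is estimated. A hypothetical sketch, assuming `p_w_z` is the topics × words array learned on the training images and the query has at least one counted word:

```python
import numpy as np


def fold_in(query_counts, p_w_z, n_iter=50):
    """Estimate P(z|query) by EM with P(w|z) held fixed."""
    query_counts = np.asarray(query_counts, dtype=float)
    n_topics = p_w_z.shape[0]
    p_z_q = np.full(n_topics, 1.0 / n_topics)  # uniform start
    for _ in range(n_iter):
        # E-step: P(z|query, w) for every word, shape (n_topics, n_words).
        joint = p_z_q[:, None] * p_w_z
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-12)
        # M-step for the query's mixture weights only.
        p_z_q = (post * query_counts[None, :]).sum(axis=1)
        p_z_q /= p_z_q.sum()
    return p_z_q
```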
Retrieved images using visual word histograms
Retrieved images using pLSA ‘object’ coefficients P(z|d)
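Retrieval then reduces to ranking database images by how close their representation is to the query's, using either the raw word histogram or the P(z|d) "object" coefficients. The poster does not name the similarity measure, so the cosine similarity below is an assumption, and `rank_by_similarity` is a hypothetical helper.

```python
import numpy as np


def rank_by_similarity(query_vec, database_vecs):
    """Database indices sorted from most to least similar (cosine similarity)."""
    q = np.asarray(query_vec, dtype=float)
    db = np.asarray(database_vecs, dtype=float)
    q = q / np.linalg.norm(q)
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    # Negate so argsort returns descending similarity.
    return np.argsort(-(db @ q), kind="stable")
```

Passing word histograms or topic vectors to the same function keeps the two retrieval settings directly comparable.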
Example face segmentation
Represent each keyframe using topic vectors learned on the Caltech database
Pretty Woman (6,641 keyframes)
Retrieve images within the Caltech database
Query image
pLSA
Retrieve images in the movie Pretty Woman
Raw word histograms
Precision–recall plot
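A precision–recall curve can be computed from the relevance of each retrieved item in rank order. This sketch assumes binary relevance judgements and a known total number of relevant items (`n_relevant`), neither of which is detailed on the poster; `precision_recall` is a hypothetical name.

```python
import numpy as np


def precision_recall(ranked_relevance, n_relevant):
    """Precision and recall after each position of a ranked list.

    ranked_relevance: 1/0 (or bool) per retrieved item, in rank order.
    n_relevant: total number of relevant items in the database.
    """
    hits = np.cumsum(np.asarray(ranked_relevance, dtype=float))
    ranks = np.arange(1, len(hits) + 1)
    return hits / ranks, hits / n_relevant
```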
Find visual words
Form histograms
Discover topics
Visual Polysemy: a single visual word occurring on different (but locally similar) parts of different object categories.
Visual Synonyms: two different visual words representing a similar part of an object (e.g. the wheel of a motorbike).
w: visual words; d: documents (images); z: topics (‘objects’)
P(z|d) and P(w|z) are multinomial distributions
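With this notation, and writing n(w, d) for the number of times visual word w occurs in image d (a symbol introduced here for illustration), the standard pLSA model [Hofmann’99], its log-likelihood and the EM updates used to maximize it are:

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)

\mathcal{L} = \sum_{d} \sum_{w} n(w, d) \log P(w \mid d)

\text{E-step:}\quad P(z \mid d, w) = \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}

\text{M-step:}\quad P(w \mid z) \propto \sum_{d} n(w, d)\, P(z \mid d, w), \qquad P(z \mid d) \propto \sum_{w} n(w, d)\, P(z \mid d, w)
```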