1
Tamara Berg
Object Recognition – BoF models
790-133: Recognizing People, Objects, & Actions
2
Topic Presentations
• Hopefully you have met your topic presentation group members?
• Group 1 – see me to run through slides this week, or Monday at the latest (I'm traveling Thurs/Friday). Send me links to 2-3 papers for the class to read.
• Sign up for the class Google group (790-133). To find the group, go to groups.google.com and search for 790-133 (sorted by date). Use it to post and answer questions related to the class.
3
Object → Bag of ‘features’
Bag-of-features models
source: Svetlana Lazebnik
4
Exchangeability
• De Finetti's theorem of exchangeability (the "bag of words" theorem): the joint probability distribution underlying the data is invariant to permutation of the observations.
5
Origin 2: Bag-of-words models
US Presidential Speeches Tag Cloud: http://chir.ag/phernalia/preztags/
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
source: Svetlana Lazebnik
6
Bag of words for text
· Represent each document as a “bag of words”
7
Example
• Doc1 = “the quick brown fox jumped”
• Doc2 = “brown quick jumped fox the”
Would a bag of words model represent these two documents differently?
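To make the answer concrete, here is a minimal plain-Python sketch (the function name and variables are my own, for illustration) showing that an order-free bag-of-words representation cannot distinguish the two documents:

```python
from collections import Counter

def bag_of_words(doc):
    """Order-free representation: maps each word to its count."""
    return Counter(doc.split())

doc1 = "the quick brown fox jumped"
doc2 = "brown quick jumped fox the"

# Both documents contain the same words with the same counts,
# so a bag-of-words model represents them identically.
print(bag_of_words(doc1) == bag_of_words(doc2))  # True
```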
8
Bag of words for images
· Represent each image as a “bag of features”
9
Bag of features: outline
1. Extract features
source: Svetlana Lazebnik
10
Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
source: Svetlana Lazebnik
11
Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Represent images by frequencies of “visual words”
source: Svetlana Lazebnik
12
2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
13
2. Learning the visual vocabulary
Clustering
…
Slide credit: Josef Sivic
Visual vocabulary
14
K-means clustering (reminder)
• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

D(X, M) = \sum_{k} \sum_{i \in \text{cluster } k} \| x_i - m_k \|^2

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  • Assign each data point to the nearest center
  • Recompute each cluster center as the mean of all points assigned to it
source: Svetlana Lazebnik
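The algorithm above can be sketched in pure Python (a minimal Lloyd's-algorithm implementation for illustration; the function name and toy data are my own):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: minimize the sum of squared distances
    between points and their nearest cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # randomly initialize K centers
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its assigned points.
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = kmeans(pts, k=2)  # two centers, one near each blob
```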
15
Example visual vocabulary
Fei-Fei et al. 2005
Image Representation
• For a query image:
  – Extract features
  – Associate each feature with the nearest cluster center (visual word)
  – Accumulate visual word frequencies over the image
[Figure: feature points (x’s) in descriptor space, quantized against the visual vocabulary]
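The steps above can be sketched as follows (a toy illustration; `quantize` and `bag_of_features` are hypothetical helper names, not from the original slides):

```python
import math

def quantize(feature, vocabulary):
    """Index of the nearest visual word (cluster center) to a feature."""
    return min(range(len(vocabulary)),
               key=lambda j: math.dist(feature, vocabulary[j]))

def bag_of_features(features, vocabulary):
    """Histogram of visual-word frequencies over one image."""
    hist = [0] * len(vocabulary)
    for f in features:
        hist[quantize(f, vocabulary)] += 1
    return hist

vocab = [(0.0, 0.0), (1.0, 1.0)]             # toy 2-word vocabulary
feats = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)]  # features from one image
print(bag_of_features(feats, vocab))  # [1, 2]
```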
17
3. Image representation
[Figure: histogram of codeword frequencies (frequency vs. codewords)]
source: Svetlana Lazebnik
18
4. Image classification
[Figure: histogram of codeword frequencies (frequency vs. codewords)]
source: Svetlana Lazebnik
Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
CAR
Image Categorization
Choose from many categories. What is this? (helicopter)
Image Categorization
Choose from many categories. What is this?
• SVM/NB – Csurka et al. (Caltech 4/7)
• Nearest Neighbor – Berg et al. (Caltech 101)
• Kernel + SVM – Grauman et al. (Caltech 101)
• Multiple Kernel Learning + SVMs – Varma et al. (Caltech 101)
• …
21
Visual Categorization with Bags of Keypoints
Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray
22
Data
• Images in 7 classes: faces, buildings, trees, cars, phones, bikes, books
• Caltech 4 dataset: faces, airplanes, cars (rear and side), motorbikes, background
23
Method
Steps:
– Detect and describe image patches.
– Assign patch descriptors to a set of predetermined clusters (a visual vocabulary).
– Construct a bag of keypoints, which counts the number of patches assigned to each cluster.
– Apply a classifier (SVM or Naïve Bayes), treating the bag of keypoints as the feature vector.
– Determine which category or categories to assign to the image.
24
Bag-of-Keypoints Approach
Interesting Point Detection → Key Patch Extraction → Feature Descriptors → Bag of Keypoints → Multi-class Classifier
Slide credit: Yun-hsueh Liu
25
SIFT Descriptors
Slide credit: Yun-hsueh Liu
26
Bag of Keypoints (1)
• Construction of a vocabulary
  – K-means clustering finds “centroids” (on all the descriptors found in all the training images).
  – Define a “vocabulary” as the set of centroids, where every centroid represents a “word”.
Slide credit: Yun-hsueh Liu
27
Bag of Keypoints (2)
• Histogram
  – Counts the number of occurrences of different visual words in each image
Slide credit: Yun-hsueh Liu
28
Multi-class Classifier
• In this paper, classification is based on conventional machine learning approaches:
  – Support Vector Machine (SVM)
  – Naïve Bayes
Slide credit: Yun-hsueh Liu
29
SVM
Reminder: Linear SVM
[Figure: linearly separable points x+ and x− in the (x1, x2) plane; the margin is bounded by w^T x + b = 1 and w^T x + b = −1, with the decision boundary w^T x + b = 0 between them; the points on the margin are the support vectors]

Decision function: g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b

\text{minimize } \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t. } y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1

Slide credit: Jinwei Gu
31
Nonlinear SVMs: The Kernel Trick
With a feature mapping φ, our discriminant function becomes:

g(\mathbf{x}) = \mathbf{w}^T \varphi(\mathbf{x}) + b = \sum_{i \in SV} \alpha_i y_i \varphi(\mathbf{x}_i)^T \varphi(\mathbf{x}) + b

No need to know this mapping explicitly, because we only use the dot product of feature vectors, in both training and testing.

A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:

K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)^T \varphi(\mathbf{x}_j)
Slide credit: Jinwei Gu
32
Nonlinear SVMs: The Kernel Trick
Examples of commonly-used kernel functions:

Linear kernel: K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j

Polynomial kernel: K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i^T \mathbf{x}_j)^p

Gaussian (Radial Basis Function, RBF) kernel: K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2\sigma^2}\right)

Sigmoid: K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\beta_0 \mathbf{x}_i^T \mathbf{x}_j + \beta_1)
Slide credit: Jinwei Gu
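These kernels can be written out directly as a plain-Python sketch (function names and parameter defaults are illustrative, not from the slides):

```python
import math

def k_linear(x, y):
    """Linear kernel: x^T y."""
    return sum(a * b for a, b in zip(x, y))

def k_poly(x, y, p=2):
    """Polynomial kernel: (1 + x^T y)^p."""
    return (1 + k_linear(x, y)) ** p

def k_rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

def k_sigmoid(x, y, beta0=1.0, beta1=0.0):
    """Sigmoid kernel: tanh(beta0 * x^T y + beta1)."""
    return math.tanh(beta0 * k_linear(x, y) + beta1)

x, y = (1.0, 0.0), (0.0, 1.0)
print(k_linear(x, y), k_poly(x, y), k_rbf(x, x))  # 0.0 1.0 1.0
```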
34
SVM for image classification
• Train k binary 1-vs-all SVMs (one per class)
• For a test instance, evaluate with each classifier
• Assign the instance to the class with the largest SVM output
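Assuming the k binary classifiers have already been trained, the 1-vs-all decision rule above can be sketched as follows (toy hand-set weights for illustration, not a real trained model):

```python
def svm_score(w, b, x):
    """Decision value w.x + b of one binary 1-vs-all linear classifier."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(classifiers, x):
    """Assign x to the class whose 1-vs-all SVM outputs the largest score."""
    return max(classifiers, key=lambda c: svm_score(*classifiers[c], x))

# Toy pre-trained (w, b) pairs; in practice these come from SVM training.
classifiers = {
    "face": ((1.0, 0.0), -0.5),
    "car":  ((0.0, 1.0), -0.5),
}
print(predict(classifiers, (0.9, 0.1)))  # face
```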
35
Naïve Bayes
36
Naïve Bayes Model
C – Class F - Features
We only specify (parameters):
  P(C) – prior over class labels
  P(F_i | C) – how each feature depends on the class
37
Slide from Dan Klein
Example:
38
Slide from Dan Klein
39
Slide from Dan Klein
40
Percentage of documents in training set labeled as spam/ham
Slide from Dan Klein
41
In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words).
Slide from Dan Klein
42
In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words).
Slide from Dan Klein
43
Classification
The class that maximizes: c^* = \arg\max_c P(c) \prod_i P(f_i \mid c)
48
Classification
• In practice:
  – Multiplying lots of small probabilities can result in floating-point underflow.
  – Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.
  – Since log is a monotonic function, the class with the highest score does not change.
  – So, what we usually compute in practice is:
    c^* = \arg\max_c \left[ \log P(c) + \sum_i \log P(f_i \mid c) \right]
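The log-space scoring described above can be sketched as follows (toy model probabilities invented for illustration):

```python
import math

def nb_log_score(prior, cond, features):
    """log P(C) + sum_i log P(f_i | C): summing logs avoids underflow."""
    return math.log(prior) + sum(math.log(cond[f]) for f in features)

def classify(model, features):
    """model: class -> (prior, {feature: P(feature | class)})."""
    return max(model, key=lambda c: nb_log_score(*model[c], features))

# Toy spam/ham model with made-up word probabilities.
model = {
    "spam": (0.5, {"free": 0.8, "meeting": 0.2}),
    "ham":  (0.5, {"free": 0.1, "meeting": 0.9}),
}
print(classify(model, ["free", "free"]))  # spam
print(classify(model, ["meeting"]))       # ham
```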
49
Naïve Bayes on images
50
Naïve Bayes
C – Class F - Features
We only specify (parameters):
  P(C) – prior over class labels
  P(F_i | C) – how each feature depends on the class
51
Naive Bayes Parameters
Problem: Categorize images as one of k object classes using a Naïve Bayes classifier:
– Classes: object categories (face, car, bicycle, etc.)
– Features: images represented as a histogram of visual words; the F_i are visual words.
– P(C): treated as uniform.
– P(F_i | C): learned from training data (images labeled with category); the probability of a visual word given an image category.
52
Multi-class classifier –Naïve Bayes (1)
• Let V = {v_t}, t = 1,…,N, be a visual vocabulary, in which each v_t represents a visual word (cluster center) from the feature space.
• A set of labeled images I = {I_i}.
• Denote our classes C_j, where j = 1,…,M.
• N(t,i) = number of times v_t occurs in image I_i.
• Compute P(C_j | I_i):
Slide credit: Yun-hsueh Liu
53
Multi-class Classifier –Naïve Bayes (2)
• Goal - Find maximum probability class Cj:
• In order to avoid zero probabilities, use Laplace smoothing:
  P(v_t \mid C_j) = \frac{1 + \sum_{I_i \in C_j} N(t,i)}{N + \sum_{s=1}^{N} \sum_{I_i \in C_j} N(s,i)}
Slide credit: Yun-hsueh Liu
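Add-one (Laplace) smoothing can be illustrated with a small sketch (function name, counts, and vocabulary are my own, for illustration):

```python
def smoothed_probs(counts, vocabulary):
    """Add-one (Laplace) smoothing: (count + 1) / (total + |V|),
    so words unseen in a class never get probability 0."""
    total = sum(counts.get(w, 0) for w in vocabulary)
    denom = total + len(vocabulary)
    return {w: (counts.get(w, 0) + 1) / denom for w in vocabulary}

counts = {"wheel": 3, "window": 1}   # visual-word counts in one class
vocab = ["wheel", "window", "wing"]  # "wing" was never seen in this class
probs = smoothed_probs(counts, vocab)
print(probs["wing"])  # 1/7, instead of 0 without smoothing
```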
Results
55
Results
56
Results
57
Results
Results on Dataset 2
58
Results
59
Results
60
Results
Thoughts?
• Pros?
• Cons?
62
Related BoF models
pLSA, LDA, …
63
pLSA
[Diagram: document → topic → word]
64
pLSA
67
pLSA on images
68
Discovering objects and their location in images
Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Documents – images
Words – visual words (vector-quantized SIFT descriptors)
Topics – object categories
Images are modeled as a mixture of topics (objects).
69
Goals
They investigate three areas:
– (i) topic discovery, where categories are discovered by pLSA clustering on all available images.
– (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images, and then used to determine the object categories present in another set.
– (iii) object detection, where the goal is to determine the location and approximate segmentation of object(s) in each image.
70
(i) Topic Discovery
Most likely words for 4 learnt topics (face, motorbike, airplane, car)
71
(ii) Image Classification
Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images.
72
(ii) Image Classification
Confusion table for unseen test images against pLSA trained on images containing four object categories, and background images. Performance is not quite as good.
73
(iii) Topic Segmentation
74
(iii) Topic Segmentation
75
(iii) Topic Segmentation