Computer vision: models, learning and inference
Chapter 20 Models for visual words
Please send errata to [email protected]
Visual words
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
• Most models treat data as continuous
• Likelihood based on normal distribution
• Visual words = discrete representation of image
• Likelihood based on categorical distribution
• Useful for difficult tasks such as scene recognition and object recognition
Motivation: scene recognition
Structure
• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications
Computing dictionary of visual words
1. For every one of the I training images, select a set of J_i spatial locations.
   • Interest points
   • Regular grid
2. Compute a descriptor at each spatial location in each image.
3. Cluster all of these descriptor vectors into K groups using a method such as the K-means algorithm.
4. The means of the K clusters are used as the K prototype vectors in the dictionary.
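Steps 3 and 4 can be sketched as a plain K-means loop. This is a minimal illustration, not the slides' implementation: the function names, the random initialisation from the data and the fixed iteration count are all assumptions, and descriptors are taken to be lists of floats that have already been computed.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_dictionary(descriptors, K, iterations=20, seed=0):
    """Cluster descriptors into K groups and return the K cluster means.

    Hypothetical helper: initial means are K descriptors drawn at random.
    """
    rng = random.Random(seed)
    means = [list(d) for d in rng.sample(descriptors, K)]
    for _ in range(iterations):
        # Assignment step: attach each descriptor to its nearest mean.
        clusters = [[] for _ in range(K)]
        for d in descriptors:
            nearest = min(range(K), key=lambda k: squared_distance(d, means[k]))
            clusters[nearest].append(d)
        # Update step: move each mean to the centroid of its members.
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous mean
                dim = len(members[0])
                means[k] = [sum(m[j] for m in members) / len(members)
                            for j in range(dim)]
    return means

# Toy usage: six 2-D "descriptors" forming two obvious groups.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
dictionary = kmeans_dictionary(toy, K=2)
```

The returned means play the role of the K prototype vectors in the dictionary.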
Encoding images as visual words
1. Select a set of J spatial locations in the image using the same method as for the dictionary.
2. Compute the descriptor at each of the J spatial locations.
3. Compare each descriptor to the set of K prototype descriptors in the dictionary.
4. Assign a discrete index to this location that corresponds to the index of the closest word in the dictionary.
End result: each location is represented by a discrete feature index together with its x and y position.
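The encoding steps can be sketched as a nearest-prototype lookup. The function name and the data layout (parallel lists of (x, y) locations and descriptor vectors) are assumptions made for illustration, and squared Euclidean distance is assumed as the measure of closeness.

```python
def encode_image(locations, descriptors, dictionary):
    """Return one (word_index, x, y) triple per spatial location.

    Hypothetical helper: dictionary is a list of K prototype vectors.
    """
    encoded = []
    for (x, y), d in zip(locations, descriptors):
        # Index of the closest prototype under squared Euclidean distance.
        word = min(
            range(len(dictionary)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, dictionary[k])),
        )
        encoded.append((word, x, y))
    return encoded

# Toy usage with a two-word dictionary.
toy_dictionary = [[0.0, 0.0], [10.0, 10.0]]
toy_words = encode_image([(5, 7), (20, 3)], [[9.0, 9.0], [1.0, 0.0]], toy_dictionary)
```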
Bag of words model
Key idea:
• Abandon all spatial information
• Just represent image by relative frequency (histogram) of words from dictionary
where
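As a concrete sketch of the histogram representation, the snippet below counts how often each of the K dictionary words occurs in an image and normalises the counts to relative frequencies. The function name is hypothetical, and word indices are assumed to be integers in the range 0 to K-1.

```python
def word_histogram(word_indices, K):
    """Return (counts, relative frequencies) of the K words in one image."""
    counts = [0] * K
    for w in word_indices:
        counts[w] += 1
    total = len(word_indices)
    # An image with no words gets an all-zero frequency vector.
    freqs = [c / total for c in counts] if total else [0.0] * K
    return counts, freqs

# Toy usage: five observed words drawn from a four-word dictionary.
toy_counts, toy_freqs = word_histogram([0, 2, 2, 1, 2], K=4)
```

All positional information is discarded here: only the word indices are passed in.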
Bag of words
Structure
Learning (MAP solution):
Inference:
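The learning and inference steps for a bag of words classifier can be sketched as follows. This is an illustration under stated assumptions rather than the slides' exact equations: each class n is assumed to have categorical word probabilities with a symmetric Dirichlet prior of parameter alpha (alpha > 1 so that no estimate is zero), and the function names are hypothetical.

```python
import math

def learn_map(histograms, labels, num_classes, alpha=2.0):
    """MAP estimate of per-class word probabilities (assumed Dirichlet prior)."""
    K = len(histograms[0])
    lam = []
    for n in range(num_classes):
        # Total count of each word over all training images of class n.
        totals = [0] * K
        for h, w in zip(histograms, labels):
            if w == n:
                for k in range(K):
                    totals[k] += h[k]
        denom = sum(totals) + K * (alpha - 1)
        lam.append([(t + alpha - 1) / denom for t in totals])
    return lam

def class_posterior(histogram, lam, prior):
    """Posterior over classes for one word histogram, computed with Bayes' rule."""
    log_scores = [
        math.log(prior[n]) + sum(t * math.log(p) for t, p in zip(histogram, lam[n]))
        for n in range(len(lam))
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Toy usage: two classes, three-word dictionary.
toy_lam = learn_map([[5, 1, 0], [4, 2, 0], [0, 1, 5]], [0, 0, 1], num_classes=2)
toy_post = class_posterior([3, 1, 0], toy_lam, prior=[0.5, 0.5])
```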
Bag of words for object recognition
Problems with bag of words
Latent Dirichlet allocation
• Describes relative frequency of visual words in a single image (no world term)
• Words not generated independently (connected by hidden variable)
• Analogy to text documents
  – Each image contains mixture of several topics (parts)
  – Each topic induces a distribution over words
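The text-document analogy can be made concrete with a small generative sketch: for each word in an image, first draw a part (topic) from the image's own part probabilities, then draw a word from that part's word distribution. The function names and argument layout are assumptions for illustration.

```python
import random

def sample_categorical(probs, rng):
    """Draw an index from a categorical distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating point round-off

def generate_image_words(part_probs, word_probs_per_part, num_words, rng):
    """Generate (part, word) pairs for one image."""
    pairs = []
    for _ in range(num_words):
        part = sample_categorical(part_probs, rng)                 # hidden part label
        word = sample_categorical(word_probs_per_part[part], rng)  # observed word
        pairs.append((part, word))
    return pairs

# Toy usage: an image that mixes two parts over a three-word dictionary.
toy_rng = random.Random(0)
toy_pairs = generate_image_words([0.7, 0.3],
                                 [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
                                 num_words=5, rng=toy_rng)
```

Because every word in the image shares the same part probabilities, the words are connected through that hidden variable rather than generated independently.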
Latent Dirichlet allocation
Latent Dirichlet allocation
Generative equations
Marginal distribution over features
Conjugate priors over parameters
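For reference, the three items above can be written out in the standard form of latent Dirichlet allocation. The notation is an assumption chosen for this sketch: p_{ij} is the part label and f_{ij} the word index of the j-th word in image i, \pi_i holds the part probabilities of image i, \lambda_m holds the word probabilities of part m, M is the number of parts, and \alpha and \beta are the Dirichlet hyperparameters.

```latex
% Generative equations
Pr(p_{ij}) = \mathrm{Cat}_{p_{ij}}[\pi_i], \qquad
Pr(f_{ij} \mid p_{ij}) = \mathrm{Cat}_{f_{ij}}[\lambda_{p_{ij}}]

% Marginal distribution over features
Pr(f_{ij}) = \sum_{m=1}^{M} Pr(f_{ij} \mid p_{ij} = m)\, Pr(p_{ij} = m)

% Conjugate priors over parameters
Pr(\pi_i) = \mathrm{Dir}_{\pi_i}[\alpha], \qquad
Pr(\lambda_m) = \mathrm{Dir}_{\lambda_m}[\beta]
```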
Latent Dirichlet allocation
Learning LDA model
• Part labels p are hidden variables
• If we knew them, it would be easy to estimate the parameters
• How about the EM algorithm? Unfortunately, the parts within an image are not independent
Latent Dirichlet allocation
Learning
Strategy:
1. Write an expression for posterior distribution over part labels
2. Draw samples from posterior using MCMC
3. Use samples to estimate parameters
1. Posterior over part labels
Can compute two terms in numerator in closed form
Denominator intractable
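The posterior referred to here follows from Bayes' rule. In the sketch below, the notation is assumed: p denotes all part labels and f all observed words across the training images.

```latex
Pr(p \mid f) = \frac{Pr(f \mid p)\, Pr(p)}{\sum_{p'} Pr(f \mid p')\, Pr(p')}
```

The sum in the denominator runs over every joint assignment of part labels to every word, so with M parts the number of terms grows as M raised to the total number of words, which is why it cannot be evaluated directly.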
2. Draw samples from posterior
Gibbs sampling: fix all part labels except one and sample from conditional distribution
This can be computed in closed form
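One sweep of such a sampler can be sketched as below. The conditional used is the standard collapsed Gibbs update for latent Dirichlet allocation with symmetric Dirichlet hyperparameters alpha and beta; this is an assumption and may differ in detail from the expression on the original slide. The function name and data layout are hypothetical: images[i][j] is a word index and parts[i][j] is its current part label.

```python
import random

def gibbs_sweep(images, parts, M, K, alpha, beta, rng):
    """Resample every part label once, in place, given all the other labels."""
    # Build count tables from the current labels.
    word_part = [[0] * K for _ in range(M)]    # word_part[m][k]: word k in part m
    part_total = [0] * M                       # total words assigned to part m
    image_part = [[0] * M for _ in images]     # image_part[i][m]: part m in image i
    for i, words in enumerate(images):
        for j, f in enumerate(words):
            m = parts[i][j]
            word_part[m][f] += 1
            part_total[m] += 1
            image_part[i][m] += 1

    for i, words in enumerate(images):
        for j, f in enumerate(words):
            # Remove the current label so the counts exclude this word.
            m_old = parts[i][j]
            word_part[m_old][f] -= 1
            part_total[m_old] -= 1
            image_part[i][m_old] -= 1
            # Unnormalised conditional probability of each candidate part.
            weights = [
                (word_part[m][f] + beta) / (part_total[m] + K * beta)
                * (image_part[i][m] + alpha)
                for m in range(M)
            ]
            m_new = rng.choices(range(M), weights=weights)[0]
            # Record the new label and restore the counts.
            parts[i][j] = m_new
            word_part[m_new][f] += 1
            part_total[m_new] += 1
            image_part[i][m_new] += 1
    return parts

# Toy usage: two images, two parts, three-word dictionary.
toy_rng = random.Random(0)
toy_images = [[0, 0, 1, 0], [2, 2, 1, 2]]
toy_parts = [[toy_rng.randrange(2) for _ in words] for words in toy_images]
for _ in range(10):
    gibbs_sweep(toy_images, toy_parts, M=2, K=3, alpha=1.0, beta=1.0, rng=toy_rng)
```

Only ratios of the weights matter, so the factor that is constant across candidate parts is omitted.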
3. Use samples to estimate parameters
Samples substitute for the true part labels in the update equations
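A minimal sketch of this step, assuming a single sample of the part labels and smoothed-count estimates with symmetric hyperparameters alpha and beta (the slide's own update equations may differ in form). The function name and argument layout are hypothetical.

```python
def estimate_parameters(images, parts, M, K, alpha, beta):
    """Estimate per-image part probabilities and per-part word probabilities.

    images[i][j] is a word index; parts[i][j] is its sampled part label.
    """
    word_part = [[0] * K for _ in range(M)]
    pi = []
    for words, labels in zip(images, parts):
        counts = [0] * M
        for f, m in zip(words, labels):
            word_part[m][f] += 1
            counts[m] += 1
        total = len(words) + M * alpha
        pi.append([(c + alpha) / total for c in counts])
    lam = []
    for m in range(M):
        total = sum(word_part[m]) + K * beta
        lam.append([(c + beta) / total for c in word_part[m]])
    return pi, lam

# Toy usage: labels treated as if they were the true part assignments.
toy_pi, toy_lam = estimate_parameters([[0, 0, 1], [2, 2, 1]],
                                      [[0, 0, 0], [1, 1, 1]],
                                      M=2, K=3, alpha=1.0, beta=1.0)
```

In practice the estimates could be averaged over several samples; that choice is left open here.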
Single author-topic model
Single author-topic model
Learning
1. Posterior over part labels
Likelihood same as before, prior becomes
Learning
2. Draw samples from posterior
3. Use samples to estimate parameters
Inference
Compute posterior over categories
Likelihood that words in this image are due to category n
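The posterior over categories follows from Bayes' rule. In this sketch the notation is assumed: w is the category label, N the number of categories, and f the words observed in the image.

```latex
Pr(w = n \mid f) = \frac{Pr(f \mid w = n)\, Pr(w = n)}{\sum_{n'=1}^{N} Pr(f \mid w = n')\, Pr(w = n')}
```

Here Pr(f | w = n) is the likelihood that the words in the image are due to category n, and Pr(w = n) is the prior over categories.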
Problems with bag of words
Constellation model
Constellation model
Learning
1. Posterior over part labels
Prior same as before, likelihood becomes
Learning
2. Draw samples from posterior
3. Use samples to estimate parameters
Part and word probabilities as before
Inference
Compute posterior over categories
Likelihood that words in this image are due to category n
Learning
Problems with bag of words
Scene model
Scene model
Video Google
Action recognition
Spatio-temporal bag of words model: 91.8% classification accuracy
Action recognition