Joint Estimation of Image Clusters and Image Transformations
Brendan J. Frey, Computer Science, University of Waterloo, Canada; Beckman Institute and ECE, University of Illinois at Urbana
Nebojsa Jojic, Beckman Institute, University of Illinois at Urbana
We’d like to cluster images, but the unknown subjects have
• unknown positions
• unknown rotations
• unknown scales
• unknown levels of shearing
• . . .
One approach
Images → Normalization → Normalized images → Pattern Analysis
(normalization requires labor)
Another approach
Images → Apply transformations to each image → Huge dataset → Pattern Analysis
• Assumes transformations are equally likely
• Noise gets copied
• Analysis is more complex
Yet another approach
Images → Extract transformation-invariant features → Transformation-invariant data → Pattern Analysis
• Difficult to work with
• May hide useful features
Our approach
Images → Joint Normalization and Pattern Analysis
What transforming an image does in the vector space of pixel intensities
• A continuous transformation moves an image along a continuous curve
• Our clustering algorithm should assign images near this nonlinear manifold to the same cluster
Tractable approaches to modeling the transformation manifold
• Linear approximation
- good locally, bad globally
• Finite-set approximation
- good globally, bad locally
Related work
Generative models
• Local invariance: PCA, Turk, Moghaddam, Pentland (96); factor analysis, Hinton, Revow, Dayan, Ghahramani (96); Frey, Colmenarez, Huang (98)
• Layered motion: Black, Jepson, Wang, Adelson, Weiss (93-98)
Learning discrete representations of generative manifolds
• Generative topographic maps, Bishop, Svensen, Williams (98)
Discriminative models
• Local invariance: tangent distance, tangent prop, Simard, Le Cun, Denker, Victorri (92-93)
• Global invariance: convolutional neural networks, Le Cun, Bottou, Bengio, Haffner (98)
Generative density modeling
• The goal is to find a probability model that
– reflects the structure we want to extract
– can randomly generate plausible images
– represents the data using parameters
• ML estimation is used to find the parameters
• We can use class-conditional likelihoods, p(image|class), for recognition, detection, ...
Mixture of Gaussians
• The probability that an image comes from cluster c = 1, 2, … is P(c) = π_c
• The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μ_c, Φ_c)
• Parameters π_c, μ_c and Φ_c represent the data
• For input z, the cluster responsibilities are
P(c|z) = p(z|c)P(c) / Σ_c′ p(z|c′)P(c′)
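The responsibility formula above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up toy parameters (the priors, means and variances below are not from the talk's experiments), assuming diagonal covariances and computing in log space for numerical stability:

```python
# Cluster responsibilities P(c|z) for a mixture of diagonal-covariance
# Gaussians. Toy parameters below are illustrative only.
import numpy as np

def log_gaussian_diag(z, mu, phi):
    """log N(z; mu, diag(phi)) for a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * phi) + (z - mu) ** 2 / phi)

def responsibilities(z, pi, mu, phi):
    """P(c|z) = p(z|c)P(c) / sum_c' p(z|c')P(c'), computed in log space."""
    log_post = np.array([np.log(pi[c]) + log_gaussian_diag(z, mu[c], phi[c])
                         for c in range(len(pi))])
    log_post -= log_post.max()          # subtract max before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

pi = np.array([0.6, 0.4])                # cluster priors pi_c
mu = np.array([[0.0, 0.0], [3.0, 3.0]])  # cluster means mu_c (2-pixel "images")
phi = np.array([[1.0, 1.0], [1.0, 1.0]]) # diagonal variances Phi_c
r = responsibilities(np.array([2.9, 3.1]), pi, mu, phi)
print(r)  # input near mu_2, so the second cluster gets almost all the weight
```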
Example: Hand-crafted model
P(c) = π_c, p(z|c) = N(z; μ_c, Φ_c)
π_1 = 0.6, π_2 = 0.4
Example: Simulation
π_1 = 0.6, π_2 = 0.4
• Sample a cluster c from P(c) = π_c (e.g. c = 1, then c = 2 on a later draw)
• Sample pixel intensities z from p(z|c) = N(z; μ_c, Φ_c)
[sampled images shown on the slides]
Example: Inference
π_1 = 0.6, π_2 = 0.4
For images z from the data set, compute the responsibilities P(c|z):
• first image: P(c=1|z) = 0.99, P(c=2|z) = 0.01
• second image: P(c=1|z) = 0.02, P(c=2|z) = 0.98
Example: Learning - E step
π_1 = 0.5, π_2 = 0.5
For each image z in the data set, compute the responsibilities P(c|z):
• image 1: P(c=1|z) = 0.52, P(c=2|z) = 0.48
• image 2: P(c=1|z) = 0.51, P(c=2|z) = 0.49
• image 3: P(c=1|z) = 0.48, P(c=2|z) = 0.52
• image 4: P(c=1|z) = 0.43, P(c=2|z) = 0.57
Example: Learning - M step
π_1 = 0.5, π_2 = 0.5
• Set μ_1 to the average of z weighted by P(c=1|z); set μ_2 to the average of z weighted by P(c=2|z)
• Set Φ_1 to the average of diag((z − μ_1)(z − μ_1)^T) weighted by P(c=1|z); set Φ_2 to the average of diag((z − μ_2)(z − μ_2)^T) weighted by P(c=2|z)
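The E and M steps above fit together into a complete EM loop. The sketch below is a minimal NumPy version with made-up 2-pixel synthetic "images" and an illustrative initial guess for the means, not the talk's data:

```python
# Minimal EM for a mixture of two diagonal-covariance Gaussians, following
# the E and M steps on the slides. Data and initial values are made up.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two clusters of 2-pixel images
Z = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

pi = np.array([0.5, 0.5])                # priors pi_c, initialised uniform
mu = np.array([[1.0, 1.0], [3.0, 3.0]])  # initial means (illustrative guess)
phi = np.ones((2, 2))                    # diagonal variances Phi_c

for _ in range(20):
    # E step: responsibilities P(c|z) for every image z, in log space
    log_r = (np.log(pi)
             - 0.5 * np.sum(np.log(2 * np.pi * phi), axis=1)
             - 0.5 * np.sum((Z[:, None, :] - mu) ** 2 / phi, axis=2))
    log_r -= log_r.max(axis=1, keepdims=True)
    R = np.exp(log_r)
    R /= R.sum(axis=1, keepdims=True)    # shape (N, 2)

    # M step: responsibility-weighted averages, as on the slides
    Nc = R.sum(axis=0)
    pi = Nc / len(Z)                                       # new pi_c
    mu = (R.T @ Z) / Nc[:, None]                           # new mu_c
    diff2 = (Z[:, None, :] - mu) ** 2
    phi = np.einsum('nc,ncd->cd', R, diff2) / Nc[:, None]  # new Phi_c

print(pi)
print(mu)  # means converge near the true cluster centres (0, 0) and (4, 4)
```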
Example: After iterating EM...
π_1 = 0.6, π_2 = 0.4
[learned cluster means shown on the slide]
Adding “transformation” as a discrete latent variable
• Say there are N pixels
• We assume we are given a set of sparse N x N transformation-generating matrices G1, …, Gl, …, GL
• These generate transformed points Gl z from a latent point z
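One way to picture a transformation-generating matrix is as a permutation-like matrix that shifts a vectorised image. The sketch below builds a small dense version for clarity (the talk assumes sparse matrices); the crop-at-the-border convention for pixels shifted out of frame is an assumption, not from the talk:

```python
# A shift matrix G_l for vectorised h x w images: (G @ z.ravel()) is z
# shifted by dy rows and dx columns, with out-of-frame pixels dropped.
# Dense here for clarity; in practice these N x N matrices are sparse.
import numpy as np

def shift_matrix(h, w, dy, dx):
    """Build the N x N matrix that shifts an h x w image by (dy, dx)."""
    N = h * w
    G = np.zeros((N, N))
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:   # drop out-of-frame pixels
                G[sy * w + sx, y * w + x] = 1.0
    return G

h, w = 4, 4
z = np.zeros((h, w))
z[1, 1] = 1.0                                 # one bright latent pixel
G = shift_matrix(h, w, 1, 1)                  # G_l = shift down-right by 1
x = (G @ z.ravel()).reshape(h, w)
print(x)                                      # bright pixel moves to (2, 2)
```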
Transformed Mixture of Gaussians
• The probability that the image comes from cluster c = 1, 2, … is P(c) = π_c
• The probability of latent image z for cluster c is p(z|c) = N(z; μ_c, Φ_c)
• The probability of transformation l = 1, 2, … is P(l) = ρ_l
• The probability of the observed image x is p(x|z,l) = N(x; G_l z, Ψ)
• Parameters ρ_l, π_c, μ_c and Φ_c represent the data
• The cluster/transformation responsibilities, P(c,l|x), are quite easy to compute
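Integrating out the latent image z in the linear-Gaussian model gives p(x|c,l) = N(x; G_l μ_c, G_l Φ_c G_l^T + Ψ), so P(c,l|x) follows from Bayes' rule. The sketch below shows this on a tiny 3-pixel example with made-up parameters (real images would use sparse G_l):

```python
# Joint cluster/transformation responsibilities P(c,l|x) for a TMG,
# using the marginal p(x|c,l) = N(x; G_l mu_c, G_l Phi_c G_l^T + Psi).
# Parameters below are made-up toy values.
import numpy as np

def log_gaussian(x, m, S):
    """log N(x; m, S) for a full-covariance Gaussian."""
    d = x - m
    sign, logdet = np.linalg.slogdet(2 * np.pi * S)
    return -0.5 * (logdet + d @ np.linalg.solve(S, d))

def tmg_responsibilities(x, pi, rho, mu, Phi, Psi, Gs):
    """P(c,l|x) over all cluster/transformation pairs, via Bayes' rule."""
    C, L = len(pi), len(rho)
    log_post = np.empty((C, L))
    for c in range(C):
        for l in range(L):
            m = Gs[l] @ mu[c]
            S = Gs[l] @ np.diag(Phi[c]) @ Gs[l].T + np.diag(Psi)
            log_post[c, l] = (np.log(pi[c]) + np.log(rho[l])
                              + log_gaussian(x, m, S))
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

Gs = [np.eye(3), np.roll(np.eye(3), 1, axis=0)]  # identity, cyclic shift by 1
pi = np.array([0.5, 0.5])                        # cluster priors pi_c
rho = np.array([0.5, 0.5])                       # transformation priors rho_l
mu = np.array([[5., 0., 0.], [0., 0., 5.]])      # latent cluster means mu_c
Phi = np.ones((2, 3))                            # latent variances Phi_c
Psi = 0.1 * np.ones(3)                           # observation noise Psi
x = np.array([0., 5., 0.])                       # cluster 1 shifted by one pixel
R = tmg_responsibilities(x, pi, rho, mu, Phi, Psi, Gs)
print(R)  # (c=1, l=2) dominates: cluster 1 under the shift explains x
```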
Example: Hand-crafted model
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
l = 1, 2, 3
π_1 = 0.6, π_2 = 0.4
ρ_1 = ρ_2 = ρ_3 = 0.33
Example: Simulation
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
• Sample a cluster c and a latent image z from p(z|c) = N(z; μ_c, Φ_c)
• Sample a transformation l and an observed image x from p(x|z,l) = N(x; G_l z, Ψ)
• e.g. c = 1, l = 1 gives x shifted left and up; c = 2, l = 3 gives x shifted right and up
[sampled images shown on the slides]
ML estimation of a Transformed Mixture of Gaussians using EM
• E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data
• M step: set
– π_c = avg of P(c|x)
– ρ_l = avg of P(l|x)
– μ_c = avg mean of p(z|c,x)
– Φ_c = avg variance of p(z|c,x)
– Ψ = avg variance of x − G_l z under the posterior
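The E-step posterior over the latent image is standard Gaussian conditioning for the linear model x = G_l z + noise. The sketch below computes the mean and covariance of p(z|c,l,x) on a tiny made-up example (symbols follow the slides; the numbers are illustrative only):

```python
# E-step posterior p(z | c, l, x) for the TMG by Gaussian conditioning on
# the linear model x = G_l z + noise. Toy 3-pixel example, made-up values.
import numpy as np

def latent_posterior(x, mu_c, Phi_c, Psi, G):
    """Mean and covariance of p(z | c, l, x)."""
    Phi = np.diag(Phi_c)
    S = G @ Phi @ G.T + np.diag(Psi)   # covariance of x given c, l
    K = Phi @ G.T @ np.linalg.inv(S)   # gain matrix
    m = mu_c + K @ (x - G @ mu_c)      # posterior mean of z
    V = Phi - K @ G @ Phi              # posterior covariance of z
    return m, V

G = np.roll(np.eye(3), 1, axis=0)      # cyclic shift by one pixel
mu_c = np.array([5., 0., 0.])          # latent cluster mean
Phi_c = np.ones(3)                     # latent diagonal variances
Psi = 0.1 * np.ones(3)                 # observation noise variances
x = np.array([0., 5., 0.])             # observed: the shifted cluster mean
m, V = latent_posterior(x, mu_c, Phi_c, Psi, G)
print(m)  # [5. 0. 0.]: the posterior recovers the un-shifted latent image
```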
A Tough Toy Problem
• 4 different shapes
• 25 possible locations
• cluttered background
• fixed distraction
• 100 “clusters”
• 200 training cases
Mixture of Gaussians
Mean and first 5 principal components, 20 iterations of EM
Transformed Mixture of Gaussians
5 horizontal shifts + 5 vertical shifts, 20 iterations of EM
Face Clustering
Examples of 400 outdoor images of 2 people (44 x 28 pixels)
Mixture of Gaussians
15 iterations of EM (MATLAB takes 1 minute)
Cluster means: c = 1, c = 2, c = 3, c = 4
Transformed mixture of Gaussians
• 11 horizontal shifts; 11 vertical shifts
• 4 clusters
• Each cluster has 1 mean and 1 variance for each latent pixel
• 1 variance for each observed pixel
• Training: 15 iterations of EM (MATLAB script takes 10 sec/image)
Initialization
Cluster means: c = 1, c = 2, c = 3, c = 4
Transformed mixture of Gaussians
[image sequence: cluster means after 1, 2, …, 15, 20 and 30 iterations of EM]
Transformed mixture of Gaussians vs mixture of Gaussians, 30 iterations of EM
Cluster means: c = 1, c = 2, c = 3, c = 4
Modeling Written Digits
A TMG that Captures Writing Angle
• P(l|x) identifies the writing angle in image x
[figure: CLUSTERS × TRANSFORMATIONS]
Wrap-up
• MATLAB scripts available at www.cs.uwaterloo.ca/~frey
• Other domains: audio, bioinformatics, …
• Other latent image models, p(z)
– factor analysis (prob PCA) (ICCV99)
– mixtures of factor analyzers (NIPS99)
– time series (CVPR00)
• Automatic video clustering
• Fast variational inference and learning