Joint Estimation of Image Clusters and Image Transformations
Brendan J. Frey, Computer Science, University of Waterloo, Canada; Beckman Institute and ECE, University of Illinois at Urbana
Nebojsa Jojic, Beckman Institute, University of Illinois at Urbana
We’d like to cluster images, but the unknown subjects have
• unknown positions
• unknown rotations
• unknown scales
• unknown levels of shearing
• . . .
One approach
Images → Normalization → Normalized images → Pattern Analysis
(normalization requires labor)
Another approach
Images → Apply transformations to each image → Huge dataset → Pattern Analysis
• Assumes transformations are equally likely
• Noise gets copied
• Analysis is more complex
Yet another approach
Images → Extract transformation-invariant features → Transformation-invariant data → Pattern Analysis
• Difficult to work with
• May hide useful features
Our approach
Images → Joint Normalization and Pattern Analysis
What transforming an image does in the vector space of pixel intensities
• A continuous transformation moves an image along a continuous curve
• Our clustering algorithm should assign images near this nonlinear manifold to the same cluster
Tractable approaches to modeling the transformation manifold
• Linear approximation
- good locally, bad globally
• Finite-set approximation
- good globally, bad locally
Related work
Generative models
• Local invariance: PCA, Turk, Moghaddam, Pentland (96); factor analysis, Hinton, Revow, Dayan, Ghahramani (96); Frey, Colmenarez, Huang (98)
• Layered motion: Black, Jepson, Wang, Adelson, Weiss (93-98)
Learning discrete representations of generative manifolds
• Generative topographic maps, Bishop, Svensen, Williams (98)
Discriminative models
• Local invariance: tangent distance, tangent prop, Simard, Le Cun, Denker, Victorri (92-93)
• Global invariance: convolutional neural networks, Le Cun, Bottou, Bengio, Haffner (98)
Generative density modeling
• The goal is to find a probability model that
– reflects the structure we want to extract
– can randomly generate plausible images
– represents the data using parameters
• ML estimation is used to find the parameters
• We can use class-conditional likelihoods, p(image|class), for recognition, detection, ...
Mixture of Gaussians
• The probability that an image comes from cluster c = 1, 2, … is P(c) = π_c
• The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μ_c, Φ_c)
• Parameters π_c, μ_c and Φ_c represent the data
• For input z, the cluster responsibilities are
P(c|z) = p(z|c)P(c) / Σ_c′ p(z|c′)P(c′)
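The responsibility formula above can be sketched in a few lines of NumPy. This is a minimal illustration with made-up toy parameters (the priors, means and variances below are not from the talk's experiments), assuming diagonal covariances and computing in log space for numerical stability:

```python
# Cluster responsibilities P(c|z) for a mixture of diagonal-covariance
# Gaussians. Toy parameters below are illustrative only.
import numpy as np

def log_gaussian_diag(z, mu, phi):
    """log N(z; mu, diag(phi)) for a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * phi) + (z - mu) ** 2 / phi)

def responsibilities(z, pi, mu, phi):
    """P(c|z) = p(z|c)P(c) / sum_c' p(z|c')P(c'), computed in log space."""
    log_post = np.array([np.log(pi[c]) + log_gaussian_diag(z, mu[c], phi[c])
                         for c in range(len(pi))])
    log_post -= log_post.max()          # subtract max before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

pi = np.array([0.6, 0.4])                # cluster priors pi_c
mu = np.array([[0.0, 0.0], [3.0, 3.0]])  # cluster means mu_c (2-pixel "images")
phi = np.array([[1.0, 1.0], [1.0, 1.0]]) # diagonal variances Phi_c
r = responsibilities(np.array([2.9, 3.1]), pi, mu, phi)
print(r)  # input near mu_2, so the second cluster gets almost all the weight
```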
Example: Hand-crafted model
P(c) = π_c, p(z|c) = N(z; μ_c, Φ_c)
π_1 = 0.6, π_2 = 0.4
Example: Simulation
π_1 = 0.6, π_2 = 0.4
• Sample a cluster c from P(c) = π_c (e.g. c = 1, then c = 2 on a later draw)
• Sample pixel intensities z from p(z|c) = N(z; μ_c, Φ_c)
[sampled images shown on the slides]
Example: Inference
π_1 = 0.6, π_2 = 0.4
For images z from the data set, compute the responsibilities P(c|z):
• first image: P(c=1|z) = 0.99, P(c=2|z) = 0.01
• second image: P(c=1|z) = 0.02, P(c=2|z) = 0.98
Example: Learning - E step
π_1 = 0.5, π_2 = 0.5
For each image z in the data set, compute the responsibilities P(c|z):
• image 1: P(c=1|z) = 0.52, P(c=2|z) = 0.48
• image 2: P(c=1|z) = 0.51, P(c=2|z) = 0.49
• image 3: P(c=1|z) = 0.48, P(c=2|z) = 0.52
• image 4: P(c=1|z) = 0.43, P(c=2|z) = 0.57
Example: Learning - M step
π_1 = 0.5, π_2 = 0.5
• Set μ_1 to the average of z weighted by P(c=1|z); set μ_2 to the average of z weighted by P(c=2|z)
• Set Φ_1 to the average of diag((z − μ_1)(z − μ_1)^T) weighted by P(c=1|z); set Φ_2 to the average of diag((z − μ_2)(z − μ_2)^T) weighted by P(c=2|z)
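The E and M steps above fit together into a complete EM loop. The sketch below is a minimal NumPy version with made-up 2-pixel synthetic "images" and an illustrative initial guess for the means, not the talk's data:

```python
# Minimal EM for a mixture of two diagonal-covariance Gaussians, following
# the E and M steps on the slides. Data and initial values are made up.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two clusters of 2-pixel images
Z = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

pi = np.array([0.5, 0.5])                # priors pi_c, initialised uniform
mu = np.array([[1.0, 1.0], [3.0, 3.0]])  # initial means (illustrative guess)
phi = np.ones((2, 2))                    # diagonal variances Phi_c

for _ in range(20):
    # E step: responsibilities P(c|z) for every image z, in log space
    log_r = (np.log(pi)
             - 0.5 * np.sum(np.log(2 * np.pi * phi), axis=1)
             - 0.5 * np.sum((Z[:, None, :] - mu) ** 2 / phi, axis=2))
    log_r -= log_r.max(axis=1, keepdims=True)
    R = np.exp(log_r)
    R /= R.sum(axis=1, keepdims=True)    # shape (N, 2)

    # M step: responsibility-weighted averages, as on the slides
    Nc = R.sum(axis=0)
    pi = Nc / len(Z)                                       # new pi_c
    mu = (R.T @ Z) / Nc[:, None]                           # new mu_c
    diff2 = (Z[:, None, :] - mu) ** 2
    phi = np.einsum('nc,ncd->cd', R, diff2) / Nc[:, None]  # new Phi_c

print(pi)
print(mu)  # means converge near the true cluster centres (0, 0) and (4, 4)
```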
Example: After iterating EM...
π_1 = 0.6, π_2 = 0.4
[learned cluster means shown on the slide]
Adding “transformation” as a discrete latent variable
• Say there are N pixels
• We assume we are given a set of sparse N x N transformation-generating matrices G1, …, Gl, …, GL
• These generate transformed points Gl z from a latent point z
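One way to picture a transformation-generating matrix is as a permutation-like matrix that shifts a vectorised image. The sketch below builds a small dense version for clarity (the talk assumes sparse matrices); the crop-at-the-border convention for pixels shifted out of frame is an assumption, not from the talk:

```python
# A shift matrix G_l for vectorised h x w images: (G @ z.ravel()) is z
# shifted by dy rows and dx columns, with out-of-frame pixels dropped.
# Dense here for clarity; in practice these N x N matrices are sparse.
import numpy as np

def shift_matrix(h, w, dy, dx):
    """Build the N x N matrix that shifts an h x w image by (dy, dx)."""
    N = h * w
    G = np.zeros((N, N))
    for y in range(h):
        for x in range(w):
            sy, sx = y + dy, x + dx
            if 0 <= sy < h and 0 <= sx < w:   # drop out-of-frame pixels
                G[sy * w + sx, y * w + x] = 1.0
    return G

h, w = 4, 4
z = np.zeros((h, w))
z[1, 1] = 1.0                                 # one bright latent pixel
G = shift_matrix(h, w, 1, 1)                  # G_l = shift down-right by 1
x = (G @ z.ravel()).reshape(h, w)
print(x)                                      # bright pixel moves to (2, 2)
```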
Transformed Mixture of Gaussians
• The probability that the image comes from cluster c = 1, 2, … is P(c) = π_c
• The probability of latent image z for cluster c is p(z|c) = N(z; μ_c, Φ_c)
• The probability of transformation l = 1, 2, … is P(l) = ρ_l
• The probability of the observed image x is p(x|z,l) = N(x; G_l z, Ψ)
• Parameters ρ_l, π_c, μ_c and Φ_c represent the data
• The cluster/transformation responsibilities, P(c,l|x), are quite easy to compute
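Integrating out the latent image z in the linear-Gaussian model gives p(x|c,l) = N(x; G_l μ_c, G_l Φ_c G_l^T + Ψ), so P(c,l|x) follows from Bayes' rule. The sketch below shows this on a tiny 3-pixel example with made-up parameters (real images would use sparse G_l):

```python
# Joint cluster/transformation responsibilities P(c,l|x) for a TMG,
# using the marginal p(x|c,l) = N(x; G_l mu_c, G_l Phi_c G_l^T + Psi).
# Parameters below are made-up toy values.
import numpy as np

def log_gaussian(x, m, S):
    """log N(x; m, S) for a full-covariance Gaussian."""
    d = x - m
    sign, logdet = np.linalg.slogdet(2 * np.pi * S)
    return -0.5 * (logdet + d @ np.linalg.solve(S, d))

def tmg_responsibilities(x, pi, rho, mu, Phi, Psi, Gs):
    """P(c,l|x) over all cluster/transformation pairs, via Bayes' rule."""
    C, L = len(pi), len(rho)
    log_post = np.empty((C, L))
    for c in range(C):
        for l in range(L):
            m = Gs[l] @ mu[c]
            S = Gs[l] @ np.diag(Phi[c]) @ Gs[l].T + np.diag(Psi)
            log_post[c, l] = (np.log(pi[c]) + np.log(rho[l])
                              + log_gaussian(x, m, S))
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

Gs = [np.eye(3), np.roll(np.eye(3), 1, axis=0)]  # identity, cyclic shift by 1
pi = np.array([0.5, 0.5])                        # cluster priors pi_c
rho = np.array([0.5, 0.5])                       # transformation priors rho_l
mu = np.array([[5., 0., 0.], [0., 0., 5.]])      # latent cluster means mu_c
Phi = np.ones((2, 3))                            # latent variances Phi_c
Psi = 0.1 * np.ones(3)                           # observation noise Psi
x = np.array([0., 5., 0.])                       # cluster 1 shifted by one pixel
R = tmg_responsibilities(x, pi, rho, mu, Phi, Psi, Gs)
print(R)  # (c=1, l=2) dominates: cluster 1 under the shift explains x
```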
Example: Hand-crafted model
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
l = 1, 2, 3
π_1 = 0.6, π_2 = 0.4
ρ_1 = ρ_2 = ρ_3 = 0.33
Example: Simulation
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
• Sample a cluster c and a latent image z from p(z|c) = N(z; μ_c, Φ_c)
• Sample a transformation l and an observed image x from p(x|z,l) = N(x; G_l z, Ψ)
• e.g. c = 1, l = 1 gives x shifted left and up; c = 2, l = 3 gives x shifted right and up
[sampled images shown on the slides]
ML estimation of a Transformed Mixture of Gaussians using EM
• E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data
• M step: set
– π_c = avg of P(c|x)
– ρ_l = avg of P(l|x)
– μ_c = avg mean of p(z|c,x)
– Φ_c = avg variance of p(z|c,x)
– Ψ = avg variance of x − G_l z under the posterior
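The E-step posterior over the latent image is standard Gaussian conditioning for the linear model x = G_l z + noise. The sketch below computes the mean and covariance of p(z|c,l,x) on a tiny made-up example (symbols follow the slides; the numbers are illustrative only):

```python
# E-step posterior p(z | c, l, x) for the TMG by Gaussian conditioning on
# the linear model x = G_l z + noise. Toy 3-pixel example, made-up values.
import numpy as np

def latent_posterior(x, mu_c, Phi_c, Psi, G):
    """Mean and covariance of p(z | c, l, x)."""
    Phi = np.diag(Phi_c)
    S = G @ Phi @ G.T + np.diag(Psi)   # covariance of x given c, l
    K = Phi @ G.T @ np.linalg.inv(S)   # gain matrix
    m = mu_c + K @ (x - G @ mu_c)      # posterior mean of z
    V = Phi - K @ G @ Phi              # posterior covariance of z
    return m, V

G = np.roll(np.eye(3), 1, axis=0)      # cyclic shift by one pixel
mu_c = np.array([5., 0., 0.])          # latent cluster mean
Phi_c = np.ones(3)                     # latent diagonal variances
Psi = 0.1 * np.ones(3)                 # observation noise variances
x = np.array([0., 5., 0.])             # observed: the shifted cluster mean
m, V = latent_posterior(x, mu_c, Phi_c, Psi, G)
print(m)  # [5. 0. 0.]: the posterior recovers the un-shifted latent image
```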
A Tough Toy Problem
• 4 different shapes
• 25 possible locations
• cluttered background
• fixed distraction
• 100 “clusters”
• 200 training cases
Mixture of Gaussians
Mean and first 5 principal components, 20 iterations of EM
Transformed Mixture of Gaussians
5 horizontal shifts + 5 vertical shifts, 20 iterations of EM
Face Clustering
Examples of 400 outdoor images of 2 people (44 x 28 pixels)
Mixture of Gaussians
15 iterations of EM (MATLAB takes 1 minute)
Cluster means: c = 1, c = 2, c = 3, c = 4
Transformed mixture of Gaussians
• 11 horizontal shifts; 11 vertical shifts
• 4 clusters
• Each cluster has 1 mean and 1 variance for each latent pixel
• 1 variance for each observed pixel
• Training: 15 iterations of EM (MATLAB script takes 10 sec/image)
Initialization
Cluster means: c = 1, c = 2, c = 3, c = 4
Transformed mixture of Gaussians
[image sequence: cluster means after 1, 2, …, 15, 20 and 30 iterations of EM]
Transformed mixture of Gaussians vs mixture of Gaussians, 30 iterations of EM
Cluster means: c = 1, c = 2, c = 3, c = 4
Modeling Written Digits
A TMG that Captures Writing Angle
• P(l|x) identifies the writing angle in image x
[figure: CLUSTERS × TRANSFORMATIONS]
Wrap-up
• MATLAB scripts available at www.cs.uwaterloo.ca/~frey
• Other domains: audio, bioinformatics, …
• Other latent image models, p(z)
– factor analysis (prob PCA) (ICCV99)
– mixtures of factor analyzers (NIPS99)
– time series (CVPR00)
• Automatic video clustering
• Fast variational inference and learning