Transform-based Models
Principal component analysis (PCA) or Karhunen-Loeve transform (KLT)
Application into Face Recognition and MATLAB demo
DFT, DCT and Wavelet transforms
Statistical modeling of transform coefficients (sparse representations)
Application into Texture Synthesis and MATLAB demos
EE565 Advanced Image Processing Copyright Xin Li 2008
PCA/KLT
What are principal components?
Direction of maximum variance in the input space (physical interpretation)
Principal eigenvector of the covariance matrix (mathematical definition)
Theoretic derivations (this is not a theory course like EE513)
There exist several different approaches in the literature of statistics, economics and communication theory
Standard Derivation (Covariance method)
Basic idea: diagonalization. Given the covariance matrix C = E[xx^T], find a unitary P (P^T P = I, the unitary condition; refer to EE465) such that C = PΛP^T with Λ diagonal. The KLT coefficients y = P^T x are then decorrelated: E[yy^T] = Λ.
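The diagonalization above can be checked numerically. A minimal numpy sketch (in Python here, although the course demos are in MATLAB; the data and the correlation coefficient 0.9 are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Generate correlated 2-D data: x2 is mostly a copy of x1 plus noise.
x1 = rng.normal(size=1000)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=1000)
X = np.stack([x1, x2])                 # 2 x 1000 data matrix

C = np.cov(X)                          # 2x2 sample covariance matrix
eigvals, P = np.linalg.eigh(C)         # C = P diag(eigvals) P^T, P unitary

Y = P.T @ X                            # KLT coefficients y = P^T x
C_y = np.cov(Y)                        # should be (nearly) diagonal

print(abs(C_y[0, 1]))  # off-diagonal entry: numerically zero
```

The sample covariance transforms exactly as cov(P^T X) = P^T cov(X) P, so the coefficients are decorrelated up to floating-point error.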
Geometric Interpretation (direction of maximum variation/information)
Why Does PCA/KLT Make Sense?
In Pattern Recognition (e.g., R. Duda’s textbook “Pattern Classification”) or in Signal Processing (e.g., S. Mallat’s textbook “A Wavelet Tour of Signal Processing”):
Analytical results are available for stationary Gaussian processes except for the unknown parameters (low-order statistics)
Classical ML/Bayes parameter estimation works most effectively under the independence assumption (recall the curse of dimensionality)
The transform facilitates the satisfaction of this assumption
In Economics, Google “Hotelling transform”
Example: Transform Facilitates Modeling
Before the transform, x1 and x2 are highly correlated: p(x1, x2) ≠ p(x1)p(x2)
After the transform, y1 and y2 are less correlated: p(y1, y2) ≈ p(y1)p(y2)
Comparison Between LR and LT
Linear regression (AR model)
• Hyperplane fitting (a local strategy)
• Dimensionality reduction: data space mapped to parameter space
• Distortion not preserved (refer to EE467 closed-loop opt. in speech coding)
Linear transform (PCA/KLT)
• Rotation of coordinates (a global strategy)
• Dimensionality reduction: only preserve the largest eigenvalues in the data space
• Preserves distortion (unitary property of P)
Transform-based Models
Principal component analysis (PCA) or Karhunen-Loeve transform (KLT)
Application into Face Recognition and MATLAB demo
DFT, DCT and Wavelet transforms
Statistical modeling of transform coefficients (sparse representations)
Application into Texture Synthesis and MATLAB demos
Appearance-based Recognition (adapted from CMU Class 15385-s06)
• Directly represent appearance (image brightness), not geometry.
• Why? Avoids modeling geometry and the complex interactions between geometry, lighting and reflectance.
• Why not? Too many possible appearances!
m “visual degrees of freedom” (e.g., pose, lighting, etc.)
R discrete samples for each DOF
“nature is economical of structures but of principles” – Abdus Salam
The Space of Faces
An image with N pixels is a point in N-dimensional space
A collection of M images is a cloud of M points in R^N
We can define vectors in this space as we did in the 2D case
[Figure: one face image written as the sum of two others. Apologies to former President Bush]
Key Idea: Linear Subspace
• Images in the possible set {x̂} are highly correlated.
• So, compress them to a low-dimensional linear subspace that captures key appearance characteristics of the visual DOFs.
• The linearity assumption is a double-edged sword: it facilitates analytical derivation and computational solution, but nature seldom works in a linear fashion.
• EIGENFACES [Turk and Pentland, 1991]: USE PCA!
Example of Eigenfaces
Eigenfaces look somewhat like ghost faces.
Training set of face images. 15 principal components (eigenfaces, i.e., eigenvectors corresponding to the 15 largest eigenvalues).
Linear Subspaces explained by 2D Toy Example (Easier for Visualization)
Classification can be expensive:
Must either search (e.g., nearest neighbors) or store large probability density functions.
• Suppose the data points are arranged as above
– Idea: fit a line; the classifier measures distance to the line
convert x into v1, v2 coordinates
What does the v2 coordinate measure?
- distance to the line: use it for classification (near 0 for orange pts)
What does the v1 coordinate measure?
- position along the line: use it to specify which orange point it is
Dimensionality Reduction
• Dimensionality reduction
– We can represent the orange points with only their v1 coordinates, since the v2 coordinates are all essentially 0
– This makes it much cheaper to store and compare points
– A much bigger deal for higher-dimensional problems
Linear Subspaces (PCA in 2D)
Consider the variation along a unit direction v among all of the orange points:
var(v) = Σ_i ((x_i − mean(x)) · v)² = v^T A v, where A is the covariance matrix of the points
What unit vector v minimizes var?
What unit vector v maximizes var?
Solution: v1 is the eigenvector of A with the largest eigenvalue; v2 is the eigenvector of A with the smallest eigenvalue
PCA in Higher Dimensions
Suppose each data point is N-dimensional. The same procedure applies:
The eigenvectors of A define a new coordinate system
• The eigenvector with the largest eigenvalue captures the most variation among the training vectors x
• The eigenvector with the smallest eigenvalue has the least variation
We can compress the data by using only the top few eigenvectors
• This corresponds to choosing a “linear subspace”: represent points on a line, plane, or “hyper-plane”
• These eigenvectors are known as the principal components
Problem: Size of Covariance Matrix A
Suppose each data point is N-dimensional (N pixels)
The size of the covariance matrix A is N x N, and the number of eigenfaces is N
Example: for N = 256 x 256 pixels, A will be 65536 x 65536, with 65536 eigenvectors!
Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!
Efficient Computation of Eigenvectors* (you can skip this if you don’t like matrices)
If B is M x N with M << N (M = number of images, N = number of pixels), then A = B^T B is N x N, much larger than the M x M matrix BB^T. So use BB^T instead; an eigenvector of BB^T is easily converted to one of B^T B:
(BB^T) y = λ y
=> B^T (BB^T) y = λ (B^T y)
=> (B^T B)(B^T y) = λ (B^T y)
=> B^T y is an eigenvector of B^T B
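The algebraic trick can be verified directly. A small numpy sketch with made-up sizes (M = 5 images, N = 100 pixels):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 5, 100                 # M images, each with N pixels (M << N)
B = rng.normal(size=(M, N))   # each row: one (mean-subtracted) image

# Solve the small M x M problem instead of the N x N covariance B^T B
small = B @ B.T               # M x M matrix BB^T
lam, y = np.linalg.eigh(small)
v = B.T @ y[:, -1]            # convert: B^T y is an eigenvector of B^T B
v = v / np.linalg.norm(v)

# Check the eigenvector equation (B^T B) v = lam v directly
big = B.T @ B
err = np.linalg.norm(big @ v - lam[-1] * v)
print(err)  # numerically zero
```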
Eigenfaces – summary in words
Eigenfaces are the eigenvectors of the covariance matrix of the probability distribution of the vector space of human faces
Eigenfaces are the ‘epitomized face ingredients’ derived from the statistical analysis of many pictures of human faces
A human face may be considered to be a combination of these epitomized faces
Generating Eigenfaces – in words
1. A large set of images of human faces is taken.
2. The images are normalized to line up the eyes, mouths and other features.
3. The eigenvectors of the covariance matrix of the face image vectors are then extracted.
4. These eigenvectors are called eigenfaces.
5. Keep in mind that if B^T B is too large, you can use BB^T instead (an algebraic trick).
Eigenfaces for Face Recognition
When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face.
Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
Hence eigenfaces provide a means of applying “data compression” to faces for identification purposes (note NOT for transmission purpose).
Detection with Eigenfaces
The set of faces is a “subspace” of the set of images
Suppose it is K dimensional
We can find the best subspace using PCA
This is like fitting a “hyper-plane” to the set of faces
spanned by vectors v1, v2, ..., vK
Any face: x̂ ≈ x̄ + a1 v1 + a2 v2 + … + aK vK
Eigenfaces
PCA extracts the eigenvectors of A, giving a set of vectors v1, v2, v3, ...
Each of these vectors is a direction in face space. What do they look like?
Projecting onto the Eigenfaces (it is easier to understand projection using the 2D toy example, though conceptually high-D works the same way)
The eigenfaces v1, ..., vK span the space of faces
A face is converted to eigenface coordinates by projecting onto each eigenface: ak = vk^T (x − x̄)
Key Property of Eigenspace Representation
Given
• two images x̂1, x̂2 that are used to construct the eigenspace
• g1, the eigenspace reconstruction of x̂1
• g2, the eigenspace reconstruction of x̂2
Then
||g2 − g1|| ≈ ||x̂2 − x̂1||
That is, distance in eigenspace approximately equals the distance between the two images, and hence reflects their correlation.
Advantage of Dimensionality Reduction
x ∈ R^(H x W) → a ∈ R^K
Training set: x1, x2, …, xM → a1, a2, …, aM
New image: x → a
For detection: threshold d = mean(||a − ak||)
For recognition: select the index k minimizing ||a − ak||
Detection: Is this a face or not?
Recognition: Whose face is this?
Algorithm:
1. Process the image database (set of images with labels)
• Run PCA to compute eigenfaces
• Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate its K coefficients
3. Detect whether x is a face
4. If it is a face, who is it?
• Find the closest labeled face in the database
• nearest neighbor in K-dimensional space
(M’ is the number of eigenfaces used)
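A toy version of this pipeline, with a made-up random “database” standing in for face images (a numpy sketch, not the course MATLAB demo):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "database": M labeled images of N pixels each (rows of X)
M, N, K = 8, 50, 3
X = rng.normal(size=(M, N))
mean = X.mean(axis=0)
B = X - mean

# Eigenfaces: top-K right singular vectors of the centered data
_, _, Vt = np.linalg.svd(B, full_matrices=False)
V = Vt[:K].T                      # N x K eigenface basis

A = B @ V                         # K coefficients per database image

# New image to recognize: a slightly noisy copy of database image 4
x_new = X[4] + 0.01 * rng.normal(size=N)
a = (x_new - mean) @ V

idx = int(np.argmin(np.linalg.norm(A - a, axis=1)))  # nearest neighbor
print(idx)  # index of the matching database image
```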
An Excellent Toy Example to Help Your Understanding
Choosing the Dimension K
[Plot: eigenvalues λ_i in decreasing order of index i = 1, …, NM]
How many eigenfaces to use?
Look at the decay of the eigenvalues:
the eigenvalue tells you the amount of variance “in the direction” of that eigenface
ignore eigenfaces with low variance
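One common recipe (an assumption, not stated on the slide) is to keep the smallest K whose eigenvalues capture a fixed fraction of the total variance. A sketch with a made-up eigenvalue spectrum:

```python
import numpy as np

# Hypothetical eigenvalue spectrum with fast decay
eigvals = np.array([50.0, 20.0, 10.0, 3.0, 1.0, 0.5, 0.3, 0.2])

# Keep the smallest K whose eigenvalues capture, say, 95% of the variance
ratio = np.cumsum(eigvals) / eigvals.sum()
K = int(np.searchsorted(ratio, 0.95)) + 1
print(K)  # -> 4
```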
New Ideas We Can Play With
A localized version of eigenfaces-based recognition:
For each subject k = 1, …, K, obtain its associated N eigenfaces v_k^1, …, v_k^N
For a new image x, project it onto all K sets of “localized” eigenface spaces (so we obtain K reconstructed copies x̂1, …, x̂K)
Select the one minimizing ||x − x̂k||
Connection with sparse representation (or l1 regularization): refer to Prof. Yi Ma’s homepage and his new PAMI paper
Transform-based Models
Principal component analysis (PCA) or Karhunen-Loeve transform (KLT)
Application into Face Recognition and MATLAB demo
DFT, DCT and Wavelet transforms
Statistical modeling of transform coefficients (sparse representations)
Application into Texture Synthesis and MATLAB demos
Alternative Tools (more suitable for telecommunication applications)
Discrete Fourier Transform (DFT)
Can be shown to be the KLT for circular stationary processes (the eigenvectors take the form of the discrete Fourier basis)
Discrete Cosine Transform (DCT)
Good approximation of the KLT for AR(1) with high correlation (e.g., a = 0.9) – you are asked to show this in HW#1
Wavelet Transform
Effective tool for characterizing transient signals
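The DCT-approximates-KLT claim for AR(1) can be checked numerically (the analytical argument is the HW#1 exercise). A sketch with N = 8 and a = 0.9:

```python
import numpy as np

N, a = 8, 0.9

# Covariance of an AR(1) process: C[i, j] = a^|i-j|
C = a ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT basis: eigenvectors of C, sorted by decreasing eigenvalue
lam, V = np.linalg.eigh(C)
V = V[:, ::-1]

# Orthonormal DCT-II basis (rows indexed by frequency k)
n, k = np.meshgrid(np.arange(N), np.arange(N))
D = np.cos(np.pi * (2 * n + 1) * k / (2 * N)) * np.sqrt(2.0 / N)
D[0] *= np.sqrt(0.5)

# Each KLT eigenvector should align (up to sign) with some DCT row
align = np.max(np.abs(D @ V), axis=0)
print(align.min())  # close to 1: DCT approximates the KLT for AR(1)
```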
One-Minute Tour of Wavelet
[Plots: a test signal x(n) and its one-level wavelet decomposition into the low-band approximation s(n) and the high-band detail d(n)]
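A one-level decomposition like the one plotted can be sketched with the Haar wavelet (the simplest choice; the slides do not specify which wavelet was used for the figure):

```python
import numpy as np

def haar_1d(x):
    """One level of the (orthonormal) Haar wavelet transform."""
    x = np.asarray(x, dtype=float)
    s = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-band approximation s(n)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-band detail d(n)
    return s, d

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
s, d = haar_1d(x)
print(s)   # smooth trend of x(n)
print(d)   # transient detail, mostly small

# Perfect reconstruction check
x_rec = np.empty_like(x)
x_rec[0::2] = (s + d) / np.sqrt(2)
x_rec[1::2] = (s - d) / np.sqrt(2)
```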
Wavelet Transform on Images
After the row transform, each row is decomposed into a low band s(m,n) (approximation) and a high band d(m,n) (detail). Applying the column transform then yields four subbands: LL, LH, HL and HH.
Note that the order of row/column transforms does not matter.
From one-level to multi-level
Relationship to Linear Regression (Sparsity Perspective)
In the AR model, effectiveness is measured by the energy (sparsity) of the prediction errors
In transform-based models, effectiveness is measured by the energy (sparsity) of the transform coefficients
Improved sparsity (or lower energy) implies a better match between the assumed model and the observation data
Empirical Observation
[Histograms: coefficients of the extracted HL band after WT; the distribution shows a single peak at zero]
Univariate Probability Model
Laplacian: p(x) = 1/(2σ) exp(−|x|/σ)
Gaussian: p(x) = 1/(√(2π)σ) exp(−x²/(2σ²))
Gaussian Distribution
Laplacian Distribution
Statistical Testing
How do we know which parametric model better fits the empirical distribution of wavelet coefficients?
In addition to visual inspection (which is often subjective and less accurate), we can use various statistical testing tools to objectively evaluate the closeness of an empirical cumulative distribution function (ECDF) to the hypothesized one
One of the most widely used techniques is the Kolmogorov-Smirnov test (MATLAB function: >help kstest).
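The K-S statistic itself is easy to compute. A pure-Python sketch of the distance that kstest is based on (the test samples here are synthetic):

```python
import math
import random

def ks_statistic(samples, cdf):
    """Maximum distance between the ECDF of the samples and a hypothesized CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # ECDF jumps from (i-1)/n to i/n at x; check both sides
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(0)
gaussian = [random.gauss(0.0, 1.0) for _ in range(2000)]
uniform = [random.uniform(-1.0, 1.0) for _ in range(2000)]

print(ks_statistic(gaussian, normal_cdf))  # small: hypothesis plausible
print(ks_statistic(uniform, normal_cdf))   # large: hypothesis rejected
```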
Kolmogorov-Smirnov Test*
The K-S test is based on the maximum distance between empirical CDF (ECDF) and hypothesized CDF (e.g., the normal distribution N(0,1)).
Example
Usage: [H,P,KS,CV] = KSTEST(X,CDF)
If CDF is omitted, the standard normal N(0,1) is assumed (0 < P < 1; the larger P, the more likely)
x: computer-generated samples → accept the hypothesis
d: high-band wavelet coefficients of the lena image (note the normalization by signal variance) → reject the hypothesis
Generalized Gaussian/Laplacian Distribution
p(x) = P/(2σΓ(1/P)) exp(−(|x|/σ)^P), where Γ is the gamma function
P = 1: Laplacian
P = 2: Gaussian
P: shape parameter; σ: scale (variance) parameter
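A quick numerical check that this family reduces to the two special cases, assuming the common parameterization p(x) = P/(2σΓ(1/P)) exp(−(|x|/σ)^P) with shape P and scale σ (note σ is a scale here, not the standard deviation):

```python
import math

def ggd_pdf(x, P, sigma):
    """Generalized Gaussian density with shape P and scale sigma."""
    c = P / (2.0 * sigma * math.gamma(1.0 / P))
    return c * math.exp(-(abs(x) / sigma) ** P)

# P = 1 reduces to the Laplacian 1/(2*sigma) * exp(-|x|/sigma)
print(ggd_pdf(0.5, 1.0, 1.0))  # = exp(-0.5)/2

# P = 2 reduces to a Gaussian: exp(-x^2)/sqrt(pi) for sigma = 1
print(ggd_pdf(0.5, 2.0, 1.0))  # = exp(-0.25)/sqrt(pi)
```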
Model Parameter Estimation*
• Maximum likelihood estimation
• Method of moments
• Linear regression method
Ref.
[1] Sharifi, K. and Leon-Garcia, A., “Estimation of shape parameter for generalized Gaussian distributions in subband decompositions of video,” IEEE T-CSVT, no. 1, February 1995, pp. 52-56.
[2] www.cimat.mx/reportes/enlinea/I-01-18_eng.pdf
Transform-based Models
Principal component analysis (PCA) or Karhunen-Loeve transform (KLT)
Application into Face Recognition and MATLAB demo
DFT, DCT and Wavelet transforms
Statistical modeling of transform coefficients (sparse representations)
Application into Texture Synthesis and MATLAB demos
Wavelet-based Texture Synthesis
Pyramid-based (using steerable pyramids)
• Facilitates the statistical modeling
Histogram matching
• Enforces the first-order statistical constraint
Texture matching
• Alternates histogram matching in the spatial and wavelet domains
Boundary handling: use periodic extension
Color consistency: use a color transformation
Basic idea: two visually similar textures will also have similar statistics
Histogram Matching
Generalization of histogram equalization (the target is the histogram of a given image instead of the uniform distribution)
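A minimal numpy sketch of histogram matching via cumulative histograms (8-bit images; the random “dark” and “bright” test images are made up for illustration):

```python
import numpy as np

def match_histogram(source, target):
    """Map source values so their histogram matches the target's (8-bit)."""
    s_hist = np.bincount(source.ravel(), minlength=256)
    t_hist = np.bincount(target.ravel(), minlength=256)
    s_cdf = np.cumsum(s_hist) / source.size
    t_cdf = np.cumsum(t_hist) / target.size
    # For each source level, find the target level with the closest CDF value
    lut = np.searchsorted(t_cdf, s_cdf).clip(0, 255).astype(np.uint8)
    return lut[source]

rng = np.random.default_rng(3)
src = rng.integers(0, 100, size=(64, 64)).astype(np.uint8)    # dark image
tgt = rng.integers(100, 256, size=(64, 64)).astype(np.uint8)  # bright image

out = match_histogram(src, tgt)
print(out.min() >= 100)  # output now occupies the target's intensity range
```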
Histogram Equalization
Map each input value x through the cumulative probability function of its histogram h(t) (note: ∫₀^L h(t) dt = 1):
y = L ∫₀^x h(t) dt
This amounts to uniform quantization of the cumulative probability; the output histogram is approximately flat at 1/L.
MATLAB Implementation
function y=hist_eq(x)
% Calculate the histogram of the input image
[M,N]=size(x);
h=zeros(1,256);
for i=1:256
    h(i)=sum(sum(x==i-1));
end
% Perform histogram equalization
y=x;
s=sum(h);
for i=1:256
    I=find(x==i-1);
    y(I)=sum(h(1:i))/s*255;
end
Histogram Equalization Example
Histogram Specification
S: equalizing transform for histogram1; T: equalizing transform for histogram2
An image with histogram1 is mapped to histogram2 by composing one transform with the inverse of the other: first equalize with S, then apply T⁻¹.
Texture Matching
Objective: the histograms of both the subbands and the synthesized image match the given template
Basic hypothesis: if two texture images look visually similar, then they have similar histograms in both the spatial and wavelet domains
Image Examples
I.I.D. Assumption Challenged
If wavelet coefficients of each subband are indeed i.i.d., then random permutation of pixels should produce another image of the same class (natural images)
The fundamental question here: does WT completely decorrelate image signals?
Image Example
High-band coefficientspermutation
You can run the MATLAB demo to check this experiment
Another Experiment
[Scatter plot: joint pdf of two uncorrelated random variables X and Y]
Joint PDF of Wavelet Coefficients
Neighborhood I(Q): {Left, Up, cousin and aunt}
[Scatter plot: joint pdf of two correlated random variables X and Y, where X and Y are neighboring wavelet coefficients]
Texture Synthesis via Parametric Models in the Wavelet Space
Instead of matching histograms (nonparametric models), we can build parametric models for the wavelet coefficients and force the synthesized image to inherit the parameters of the given image
Model parameters: 710 parameters were used in Portilla and Simoncelli’s experiment (4 orientations, 4 scales, 7x7 neighborhood)
Basic idea: two visually similar textures will also have similar statistics
Statistical Constraints
Four types of constraints:
• Marginal statistics
• Raw coefficient correlation
• Coefficient magnitude statistics
• Cross-scale phase statistics
Alternating projections onto the four constraint sets: projection-onto-convex-set (POCS)
Convex Set
A set Ω is said to be convex if for any two points x, y ∈ Ω we have ax + (1−a)y ∈ Ω for all 0 ≤ a ≤ 1
Convex set examples
Non-convex set examples
Projection Operator
[Diagram: a point f outside a convex set C and its projection g onto C]
Projection onto convex set C: g = P_C f, the x ∈ C minimizing ||x − f||
In simple words, the projection of f onto a convex set C is the element in C that is closest to f in terms of Euclidean distance
Alternating Projection
[Diagram: alternating projections X0 → X1 → X2 → … → X∞ between two convex sets C1 and C2]
Projection-Onto-Convex-Set (POCS) Theorem: if C1, …, Ck are convex sets, then alternating projections P1, …, Pk will converge to the intersection of C1, …, Ck if it is not empty.
Alternating projection does not always converge in the case of non-convex sets. Can you think of a counter-example?
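A toy 2-D POCS demo (the two sets, a half-plane and a disk with non-empty intersection, are chosen purely for illustration):

```python
import numpy as np

def proj_halfplane(x):
    """Projection onto the convex set C1 = {x : x[0] >= 1}."""
    return np.array([max(x[0], 1.0), x[1]])

def proj_disk(x, r=2.0):
    """Projection onto the convex set C2 = {x : ||x|| <= r}."""
    n = np.linalg.norm(x)
    return x if n <= r else x * (r / n)

x = np.array([-3.0, 3.0])      # starting point outside both sets
for _ in range(50):            # alternating projections P1, P2
    x = proj_disk(proj_halfplane(x))

# The iterates converge into the (non-empty) intersection of C1 and C2
print(x[0] >= 1.0 - 1e-6, np.linalg.norm(x) <= 2.0 + 1e-6)
```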
Convex Constraint Sets
● Non-negative set: {f | f ≥ 0}
● Bounded-value set: {f | 0 ≤ f ≤ 255}, or more generally {f | A ≤ f ≤ B}
● Bounded-variance set: {f | ||f − g||² ≤ T}, where g is a given signal
Non-convex Constraint Set
Histogram matching used in Heeger&Bergen’1995
Bounded skewness and kurtosis: skewness = E[(x−μ)³]/σ³, kurtosis = E[(x−μ)⁴]/σ⁴
The derivations of the projection operators onto these constraint sets are tedious; the reader is referred to the paper and MATLAB code by Portilla & Simoncelli.
Image Examples
original
synthesized
Image Examples (Cont’d)
original
synthesized
When Does It Fail?
original
synthesized
Summary
Textures represent an important class of structures in natural images – unlike edges, which characterize object boundaries, textures are often associated with the homogeneous properties of object surfaces
Wavelet-domain parametric models provide a parsimonious representation of high-order statistical dependency within textural images