Face Recognition. Ying Wu (yingwu@ece.northwestern.edu), Electrical and Computer Engineering, Northwestern University, Evanston, IL. http://www.ece.northwestern.edu/~yingwu


Page 1

Face Recognition

Ying Wu
yingwu@ece.northwestern.edu

Electrical and Computer Engineering
Northwestern University, Evanston, IL

http://www.ece.northwestern.edu/~yingwu

Page 2

Recognizing Faces?

Lighting

View

Page 3

Outline

– Bayesian Classification
– Principal Component Analysis (PCA)
– Fisher Linear Discriminant Analysis (LDA)
– Independent Component Analysis (ICA)

Page 4

Bayesian Classification

– Classifier & Discriminant Function
– Discriminant Function for Gaussian
– Bayesian Learning and Estimation

Page 5

Classifier & Discriminant Function

Discriminant functions: $g_i(x)$, $i = 1, \ldots, C$

Classifier: decide $x \in \omega_i$ if $g_i(x) > g_j(x)$ for all $j \neq i$

Example discriminant functions:

$g_i(x) = p(\omega_i \mid x)$

$g_i(x) = p(x \mid \omega_i)\, p(\omega_i)$

$g_i(x) = \ln p(x \mid \omega_i) + \ln p(\omega_i)$

Decision boundary: the surface where $g_i(x) = g_j(x)$

The choice of discriminant function is not unique.

Page 6

Multivariate Gaussian

$p(x) \sim N(\mu, \Sigma)$

The principal axes (the directions) are given by the eigenvectors of $\Sigma$; the lengths of the axes (the uncertainty) are given by the eigenvalues of $\Sigma$.

Page 7

Mahalanobis Distance

Mahalanobis distance is a normalized distance

$\|x_1 - \mu\|^2 = \|x_2 - \mu\|^2$, but $\|x_1 - \mu\|_M \neq \|x_2 - \mu\|_M$

$\|x - \mu\|_M^2 = (x - \mu)^T \Sigma^{-1} (x - \mu)$
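As a quick numerical illustration of this normalization (the covariance and the two points below are my own example, not from the slides):

```python
import numpy as np

def mahalanobis(x, mu, sigma):
    """Mahalanobis distance ||x - mu||_M = sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(sigma) @ d))

mu = np.array([0.0, 0.0])
sigma = np.array([[4.0, 0.0],   # large variance along the first axis
                  [0.0, 1.0]])  # small variance along the second

x1 = np.array([2.0, 0.0])       # same Euclidean distance from mu...
x2 = np.array([0.0, 2.0])

d1 = mahalanobis(x1, mu, sigma)  # 2 / sqrt(4) = 1.0
d2 = mahalanobis(x2, mu, sigma)  # 2 / sqrt(1) = 2.0
```

Equal Euclidean distances, but x2 is "further" once the uncertainty along each axis is accounted for.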

Page 8

Whitening

– Find a linear transformation (rotation and scaling) such that the covariance becomes an identity matrix (i.e., the uncertainty for each component is the same)

$y = A^T x$

$p(x) \sim N(\mu, \Sigma) \;\Rightarrow\; p(y) \sim N(A^T \mu,\, A^T \Sigma A)$

Solution: $A = U \Lambda^{-1/2}$, where $\Sigma = U \Lambda U^T$
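A minimal numpy sketch of this whitening transform (the covariance below is illustrative):

```python
import numpy as np

# A correlated covariance to be whitened
sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# Whitening transform A = U Lambda^{-1/2}, with Sigma = U Lambda U^T
lam, U = np.linalg.eigh(sigma)
A = U @ np.diag(lam ** -0.5)

# After the transform the covariance A^T Sigma A is the identity,
# i.e., the uncertainty in each component is the same.
white = A.T @ sigma @ A
```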

Page 9

Disc. Func. for Gaussian

Minimum-error-rate classifier

$g_i(x) = \ln p(x \mid \omega_i) + \ln p(\omega_i) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1}(x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln p(\omega_i)$

Page 10

Case I: $\Sigma_i = \sigma^2 I$

$g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln p(\omega_i) = -\frac{1}{2\sigma^2}\left(x^T x - 2\mu_i^T x + \mu_i^T \mu_i\right) + \ln p(\omega_i)$

Dropping the class-independent term $x^T x$ gives a linear discriminant function:

$g_i(x) = W_i^T x + W_{i0}, \qquad W_i = \frac{1}{\sigma^2}\mu_i, \qquad W_{i0} = -\frac{1}{2\sigma^2}\mu_i^T \mu_i + \ln p(\omega_i)$

Boundary: $g_i(x) = g_j(x)$, i.e., $W^T(x - x_0) = 0$, where

$W = \mu_i - \mu_j, \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2}\ln\frac{p(\omega_i)}{p(\omega_j)}\,(\mu_i - \mu_j)$
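For this isotropic case with equal priors, the boundary reduces to the perpendicular bisector of the two means, which is easy to check numerically (the means and test points below are illustrative):

```python
import numpy as np

# Case Sigma_i = sigma^2 I, equal priors: boundary W^T (x - x0) = 0 with
# W = mu_i - mu_j and x0 = (mu_i + mu_j) / 2 (the log-prior term vanishes).
mu_i = np.array([0.0, 0.0])
mu_j = np.array([4.0, 0.0])

W = mu_i - mu_j
x0 = 0.5 * (mu_i + mu_j)

def g(x):
    """Positive on omega_i's side of the boundary, negative on omega_j's."""
    return float(W @ (x - x0))

side_i = g(np.array([1.0, 1.0]))    # closer to mu_i
side_j = g(np.array([3.0, -1.0]))   # closer to mu_j
```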

Page 11

Example

Assume $p(\omega_i) = p(\omega_j)$

Let’s derive the decision boundary:

$(\mu_i - \mu_j)^T \left(x - \frac{\mu_i + \mu_j}{2}\right) = 0$

Page 12

Case II: $\Sigma_i = \Sigma$

$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma^{-1}(x - \mu_i) + \ln p(\omega_i)$

$g_i(x) = W_i^T x + W_{i0}, \qquad W_i = \Sigma^{-1}\mu_i, \qquad W_{i0} = -\frac{1}{2}\mu_i^T \Sigma^{-1}\mu_i + \ln p(\omega_i)$

The decision boundary is still linear: $W^T(x - x_0) = 0$, where

$W = \Sigma^{-1}(\mu_i - \mu_j), \qquad x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln[p(\omega_i)/p(\omega_j)]}{(\mu_i - \mu_j)^T \Sigma^{-1}(\mu_i - \mu_j)}\,(\mu_i - \mu_j)$

Page 13

Case III: $\Sigma_i$ arbitrary

$g_i(x) = x^T A_i x + W_i^T x + W_{i0}$, where

$A_i = -\frac{1}{2}\Sigma_i^{-1}, \qquad W_i = \Sigma_i^{-1}\mu_i, \qquad W_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1}\mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln p(\omega_i)$

The decision boundary is no longer linear, but hyperquadrics!
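A sketch of the resulting quadratic discriminant, with illustrative parameters (not the slides' data):

```python
import numpy as np

# g_i(x) = x^T A_i x + W_i^T x + W_i0 with A_i = -Sigma_i^{-1}/2,
# W_i = Sigma_i^{-1} mu_i,
# W_i0 = -mu_i^T Sigma_i^{-1} mu_i / 2 - ln|Sigma_i| / 2 + ln p(omega_i)
def quadratic_discriminant(mu, sigma, prior):
    inv = np.linalg.inv(sigma)
    A = -0.5 * inv
    w = inv @ mu
    w0 = (-0.5 * mu @ inv @ mu
          - 0.5 * np.log(np.linalg.det(sigma))
          + np.log(prior))
    return lambda x: float(x @ A @ x + w @ x + w0)

g1 = quadratic_discriminant(np.array([0.0, 0.0]), np.diag([1.0, 1.0]), 0.5)
g2 = quadratic_discriminant(np.array([3.0, 0.0]), np.diag([2.0, 0.5]), 0.5)

# Classify by the larger discriminant value
x = np.array([0.5, 0.2])
label = 1 if g1(x) > g2(x) else 2
```

With unequal covariances the surface g1(x) = g2(x) is a hyperquadric rather than a hyperplane.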

Page 14

Bayesian Learning

Learning means “training”, i.e., estimating some unknowns from “training data”.

WHY?
– It is very difficult to specify these unknowns
– Hopefully, these unknowns can be recovered from the examples collected.

Page 15

Maximum Likelihood Estimation

Collected examples D={x1,x2,…,xn}

Estimate unknown parameters in the sense that the data likelihood is maximized

Likelihood: $p(D \mid \theta) = \prod_{k=1}^{n} p(x_k \mid \theta)$

Log-likelihood: $L(\theta) = \ln p(D \mid \theta) = \sum_{k=1}^{n} \ln p(x_k \mid \theta)$

ML estimation: $\theta^* = \arg\max_\theta p(D \mid \theta) = \arg\max_\theta L(\theta)$

Page 16

Case I: $\mu$ unknown

$\ln p(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k - \mu)^T \Sigma^{-1}(x_k - \mu)$

$\nabla_\mu \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$

Setting $\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$ gives

$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$

Page 17

Case II: $\mu$ and $\sigma^2$ unknown (univariate; let $\theta_1 = \mu$, $\theta_2 = \sigma^2$)

$\ln p(x_k \mid \theta) = -\frac{1}{2}\ln 2\pi\theta_2 - \frac{1}{2\theta_2}(x_k - \theta_1)^2$

$\frac{\partial}{\partial \theta_1}\ln p(x_k \mid \theta) = \frac{1}{\theta_2}(x_k - \theta_1), \qquad \frac{\partial}{\partial \theta_2}\ln p(x_k \mid \theta) = -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2}$

Setting $\sum_{k=1}^{n}\frac{1}{\hat{\theta}_2}(x_k - \hat{\theta}_1) = 0$ and $\sum_{k=1}^{n}\left[-\frac{1}{2\hat{\theta}_2} + \frac{(x_k - \hat{\theta}_1)^2}{2\hat{\theta}_2^2}\right] = 0$:

$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$

Generalizing to the multivariate case:

$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^T$
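The ML estimates (sample mean and the 1/n covariance) can be verified on synthetic data; the "true" parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw samples from a known Gaussian, then recover the parameters by ML:
# mu_hat = sample mean, Sigma_hat = (1/n) sum (x_k - mu_hat)(x_k - mu_hat)^T
mu_true = np.array([1.0, -2.0])
sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, sigma_true, size=20000)

mu_hat = X.mean(axis=0)
centered = X - mu_hat
sigma_hat = centered.T @ centered / len(X)   # note 1/n, not 1/(n-1)
```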

Page 18

Bayesian Estimation

Collected examples D={x1,x2,…,xn}, drawn independently from a fixed but unknown distribution p(x)

Bayesian learning is to use D to determine p(x|D), i.e., to learn a p.d.f.

p(x) is unknown, but has a parametric form with parameters $\theta$, where $\theta \sim p(\theta)$

Difference from ML: in Bayesian learning, $\theta$ is not a value, but a random variable, and we need to recover the distribution of $\theta$, rather than a single value.

Page 19

Bayesian Estimation

$p(x \mid D) = \int p(x, \theta \mid D)\, d\theta = \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta$

This is obvious from the total probability rule, i.e., p(x|D) is a weighted average over all $\theta$.

If $p(\theta \mid D)$ peaks very sharply about some value $\theta^*$, then $p(x \mid D) \approx p(x \mid \theta^*)$.

Page 20

The Univariate Case

Assume $\mu$ is the only unknown: $p(x \mid \mu) \sim N(\mu, \sigma^2)$. $\mu$ is a r.v. with prior $p(\mu) \sim N(\mu_0, \sigma_0^2)$, i.e., $\mu_0$ is the best guess of $\mu$, and $\sigma_0$ is the uncertainty of it.

$p(\mu \mid D) = \frac{p(D \mid \mu)\, p(\mu)}{p(D)} \propto \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu)$

where $p(x_k \mid \mu) \sim N(\mu, \sigma^2)$ and $p(\mu) \sim N(\mu_0, \sigma_0^2)$

$p(\mu \mid D) \propto \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$

$p(\mu \mid D)$ is also a Gaussian for any number of training examples.

Page 21

The Univariate Case

If we let $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, we have

$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\,\hat{\mu}_n + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\,\sigma^2}{n\sigma_0^2 + \sigma^2}, \qquad \hat{\mu}_n = \frac{1}{n}\sum_{k=1}^{n} x_k$

$\mu_n$ is the best guess for $\mu$ after observing n examples; $\sigma_n$ measures the uncertainty of this guess.

$p(\mu \mid D) = p(\mu \mid x_1, \ldots, x_n)$ becomes more and more sharply peaked as more examples are observed (e.g., n = 1, 5, 10, 30), i.e., the uncertainty decreases.
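A minimal sketch of this univariate Bayesian update, assuming known $\sigma^2$ and an illustrative prior:

```python
import numpy as np

# Posterior for the mean of N(mu, sigma^2) with prior mu ~ N(mu0, sigma0^2):
#   mu_n      = (n*sigma0^2 * sample_mean + sigma^2 * mu0) / (n*sigma0^2 + sigma^2)
#   sigma_n^2 = sigma0^2 * sigma^2 / (n*sigma0^2 + sigma^2)
def posterior(samples, sigma2, mu0, sigma0_2):
    n = len(samples)
    sample_mean = float(np.mean(samples))
    denom = n * sigma0_2 + sigma2
    mu_n = (n * sigma0_2 * sample_mean + sigma2 * mu0) / denom
    sigma_n2 = sigma0_2 * sigma2 / denom
    return mu_n, sigma_n2

rng = np.random.default_rng(2)
data = rng.normal(3.0, 1.0, size=100)   # true mean 3.0, known variance 1.0

mu_1, var_1 = posterior(data[:1], 1.0, 0.0, 1.0)
mu_100, var_100 = posterior(data, 1.0, 0.0, 1.0)
# The posterior sharpens as n grows: var_100 < var_1,
# and mu_n moves from the prior guess toward the true mean.
```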

Page 22

The Multivariate Case

$p(x \mid \mu) \sim N(\mu, \Sigma), \qquad p(\mu) \sim N(\mu_0, \Sigma_0)$

Letting $p(\mu \mid D) \sim N(\mu_n, \Sigma_n)$, we have

$\mu_n = \Sigma_0\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\hat{\mu}_n + \tfrac{1}{n}\Sigma\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\mu_0$

$\Sigma_n = \Sigma_0\left(\Sigma_0 + \tfrac{1}{n}\Sigma\right)^{-1}\tfrac{1}{n}\Sigma, \qquad \text{where } \hat{\mu}_n = \frac{1}{n}\sum_{k=1}^{n} x_k$

and $p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu \sim N(\mu_n, \Sigma + \Sigma_n)$

Page 23

PCA and Eigenface

– Principal Component Analysis (PCA)
– Eigenface for Face Recognition

Page 24

PCA: motivation

Pattern vectors are generally confined within some low-dimensional subspaces

Recall the basic idea of the Fourier transform:
– A signal is (de)composed of a linear combination of a set of basis signals with different frequencies.

Page 25

PCA: idea

Represent the samples by a mean and one principal direction: $x \approx m + a\,e$

$J(a_1, \ldots, a_n, e, m) = \sum_{k=1}^{n}\left\|(m + a_k e) - x_k\right\|^2 = \sum_k a_k^2 \|e\|^2 - 2\sum_k a_k\, e^T(x_k - m) + \sum_k \|x_k - m\|^2$

$\frac{\partial J}{\partial a_k} = 2a_k - 2e^T(x_k - m) = 0 \;\Rightarrow\; a_k = e^T(x_k - m)$

Page 26

PCA

Substituting $a_k = e^T(x_k - m)$:

$J(e) = -\sum_k\left[e^T(x_k - m)\right]^2 + \sum_k\|x_k - m\|^2 = -e^T S e + \sum_k\|x_k - m\|^2$

where $S = \sum_k (x_k - m)(x_k - m)^T$

$\arg\min_e J(e) = \arg\max_e e^T S e \;\text{ s.t. }\; \|e\| = 1$

$e^* = \arg\max_e\left[e^T S e - \lambda(e^T e - 1)\right] \;\Rightarrow\; S e = \lambda e, \;\text{ i.e., }\; e^T S e = \lambda$

To maximize $e^T S e$, we need to select $\lambda_{\max}$.

Page 27

Algorithm

Learning the principal components from {x1, x2, …, xn}

(1) $m = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad A = [x_1 - m, \ldots, x_n - m]$

(2) $S = \sum_{k=1}^{n}(x_k - m)(x_k - m)^T = A A^T$

(3) eigenvalue decomposition $S = U \Lambda U^T$

(4) sort the eigenvalues and the corresponding eigenvectors $u_i$

(5) $P = [u_1, u_2, \ldots, u_m]$
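The five steps can be sketched directly in numpy (the toy 2-D data is my own illustration):

```python
import numpy as np

# PCA as in the five steps above: center, form the scatter matrix S = A A^T,
# eigendecompose, sort, and keep the top-m eigenvectors as columns of P.
def pca(X, m):
    """X: n samples x d features; returns (mean, P) with P of shape d x m."""
    mean = X.mean(axis=0)
    A = (X - mean).T                  # d x n matrix of centered columns
    S = A @ A.T                       # scatter matrix, d x d
    lam, U = np.linalg.eigh(S)        # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]     # re-sort descending
    P = U[:, order[:m]]
    return mean, P

rng = np.random.default_rng(3)
# Data that is essentially 1-D: spread along (1, 1), tiny noise elsewhere
t = rng.normal(size=500)
X = np.outer(t, [1.0, 1.0]) + 0.01 * rng.normal(size=(500, 2))

mean, P = pca(X, 1)
direction = P[:, 0]   # should align with (1, 1)/sqrt(2), up to sign
```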

Page 28

PCA for Face Recognition

Training data D={x1, …, xM}
– Dimension: stacking the pixels together makes a vector of dimension N
– Preprocessing: cropping, normalization

These faces should lie in a “face” subspace. Questions:

– What is the dimension of this subspace?

– How to identify this subspace?

– How to use it for recognition?

Page 29

Eigenface

The EigenFace approach: M. Turk and A. Pentland, 1992

Page 30

An Issue

In general, N >> M. However, S, the covariance matrix, is N×N! Difficulties:

– S is ill-conditioned: Rank(S) << N
– The eigenvalue decomposition of S is expensive when N is large

Solution?

Page 31

Solution I: Let’s do the eigenvalue decomposition on $A^T A$, which is an M×M matrix:

$A^T A v = \lambda v \;\Rightarrow\; A A^T (A v) = \lambda (A v)$

To see it clearly: $(A A^T)(A v) = \lambda (A v)$, i.e., if $v$ is an eigenvector of $A^T A$, then $A v$ is an eigenvector of $A A^T$ corresponding to the same eigenvalue!

Note: of course, you need to normalize Av to make it a unit vector
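A numerical check of this trick on random data (the sizes below are illustrative, with many "pixels" and few "images"):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 1000, 5                      # N pixels >> M images
A = rng.normal(size=(N, M))         # columns stand in for centered face vectors

# Eigendecompose the small M x M matrix A^T A instead of the N x N matrix A A^T
lam, V = np.linalg.eigh(A.T @ A)

# Map each eigenvector v of A^T A to an eigenvector A v of A A^T,
# normalizing A v to unit length
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Check: each column u of U satisfies (A A^T) u = lambda u
S = A @ A.T
ok = all(np.allclose(S @ U[:, i], lam[i] * U[:, i]) for i in range(M))
```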

Page 32

Solution II:

You can simply use SVD (singular value decomposition)

A = [x1-m, …, xM-m]

$A = U \Sigma V^T$

– A: N×M
– U: N×M, $U^T U = I$
– $\Sigma$: M×M diagonal
– V: M×M, $V^T V = V V^T = I$
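The same subspace falls out of the thin SVD, again sketched on random data of illustrative size:

```python
import numpy as np

rng = np.random.default_rng(5)
N, M = 1000, 5
A = rng.normal(size=(N, M))         # stands in for [x_1 - m, ..., x_M - m]

# Thin SVD: A = U diag(s) V^T; the columns of U are the eigenvectors of
# A A^T, with eigenvalues s**2, so no N x N matrix is ever formed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

S = A @ A.T
ok = all(np.allclose(S @ U[:, i], (s[i] ** 2) * U[:, i]) for i in range(M))
```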

Page 33

Fisher Linear Discrimination

– LDA
– PCA+LDA for Face Recognition

Page 34

When does PCA fail?

PCA

Page 35

Linear Discriminant Analysis

Finding an optimal linear mapping W

– Catches the major differences between classes and discounts irrelevant factors
– In the mapped space, the data are clustered

Page 36

Within/between class scatters

$m_i = \frac{1}{n_i}\sum_{x \in D_i} x, \quad i = 1, 2$

The linear transform $y = W^T x$ gives $\tilde{m}_i = \frac{1}{n_i}\sum_{y \in Y_i} y = W^T m_i$

Within-class scatter:

$S_i = \sum_{x \in D_i}(x - m_i)(x - m_i)^T, \qquad S_W = S_1 + S_2$

$\tilde{S}_i = \sum_{y \in Y_i}(y - \tilde{m}_i)(y - \tilde{m}_i)^T = W^T S_i W, \qquad \tilde{S}_W = \tilde{S}_1 + \tilde{S}_2 = W^T S_W W$

Between-class scatter:

$S_B = (m_1 - m_2)(m_1 - m_2)^T, \qquad \tilde{S}_B = W^T S_B W$

Page 37

Fisher LDA

$J(W) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

$|\tilde{m}_1 - \tilde{m}_2|^2 = W^T(m_1 - m_2)(m_1 - m_2)^T W = W^T S_B W, \qquad \tilde{s}_1^2 + \tilde{s}_2^2 = W^T S_W W$

$J(W) = \frac{W^T S_B W}{W^T S_W W}, \qquad W^* = \arg\max_W J(W)$

$\max J(W) \;\Rightarrow\; S_B w = \lambda S_W w$ : this is a generalized eigenvalue problem

Page 38

Solution I

If $S_W$ is not singular, you can simply do the eigenvalue decomposition on $S_W^{-1} S_B$:

$S_W^{-1} S_B w = \lambda w$

Page 39

Solution II

Noticing:
– $S_B w$ is in the direction of $m_1 - m_2$ (WHY?)
– We are only concerned about the direction of the projection, rather than the scale

We have

$w = S_W^{-1}(m_1 - m_2)$
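A two-class sketch of $w = S_W^{-1}(m_1 - m_2)$ on synthetic Gaussian classes (the means and spreads are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X1 = rng.normal([0.0, 0.0], [1.0, 0.2], size=(200, 2))
X2 = rng.normal([3.0, 0.5], [1.0, 0.2], size=(200, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - m1).T @ (X1 - m1)      # within-class scatter of class 1
S2 = (X2 - m2).T @ (X2 - m2)      # within-class scatter of class 2
Sw = S1 + S2

w = np.linalg.solve(Sw, m1 - m2)  # optimal projection direction

# Classify by thresholding the 1-D projection at the midpoint of the
# projected means; class 1 projects to the larger side since
# w^T (m1 - m2) = (m1 - m2)^T Sw^{-1} (m1 - m2) > 0.
threshold = 0.5 * (w @ m1 + w @ m2)
accuracy = 0.5 * (np.mean(X1 @ w > threshold) + np.mean(X2 @ w < threshold))
```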

Page 40

Multiple Discriminant Analysis

$m_i = \frac{1}{n_i}\sum_{x \in D_i} x, \qquad S_i = \sum_{x \in D_i}(x - m_i)(x - m_i)^T, \qquad S_W = \sum_{i=1}^{C} S_i$

$m = \frac{1}{n}\sum_{i=1}^{C} n_i m_i, \qquad S_B = \sum_{i=1}^{C} n_i (m_i - m)(m_i - m)^T$

With the mapping $y = W^T x$:

$\tilde{m}_i = \frac{1}{n_i}\sum_{y \in Y_i} y, \qquad \tilde{m} = \frac{1}{n}\sum_{i=1}^{C} n_i \tilde{m}_i$

$\tilde{S}_W = W^T S_W W, \qquad \tilde{S}_B = \sum_{i=1}^{C} n_i(\tilde{m}_i - \tilde{m})(\tilde{m}_i - \tilde{m})^T = W^T S_B W$

$W^* = \arg\max_W \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|} \;\Rightarrow\; S_B w_i = \lambda_i S_W w_i$

Page 41

Comparing PCA and LDA

PCA

LDA

Page 42

MDA for Face Recognition

Lighting

• PCA does not work well! (why?)

• solution: PCA+MDA

Page 43

Independent Component Analysis

– The cross-talk problem
– ICA

Page 44

Cocktail party

[Figure: two source signals $s_1(t)$ and $s_2(t)$ are mixed by an unknown matrix A into three observed signals $x_1(t)$, $x_2(t)$, $x_3(t)$]

Can you recover s1(t) and s2(t) from x1(t), x2(t) and x3(t)?

Page 45

Formulation

$x_j = a_{j1} s_1 + \cdots + a_{jn} s_n, \quad j = 1, \ldots, n \qquad \text{or} \qquad X = A S$

Both A and S are unknowns!

Can you recover both A and S from X?

Page 46

The Idea

$Y = W^T X = W^T A S = Z^T S$

y is a linear combination of {s_i}. Recall the central limit theorem: a sum of even two independent r.v.s is more Gaussian than the original r.v.s. So $Z^T S$ should be more Gaussian than any of {s_i}. In other words, $Z^T S$ becomes least Gaussian when it is in fact equal to one of the {s_i}. Amazing!
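The central-limit argument can be checked numerically; this sketch uses excess kurtosis as a simple non-Gaussianity measure (an illustration of the principle, not a full ICA algorithm):

```python
import numpy as np

# Excess kurtosis is 0 for a Gaussian; a mixture of two independent
# uniform sources should have kurtosis closer to 0 than either source.
def excess_kurtosis(y):
    y = (y - y.mean()) / y.std()
    return float(np.mean(y ** 4) - 3.0)

rng = np.random.default_rng(7)
s1 = rng.uniform(-1, 1, size=200000)   # uniform: excess kurtosis = -1.2
s2 = rng.uniform(-1, 1, size=200000)
mix = (s1 + s2) / np.sqrt(2)           # normalized sum of the two sources

k_source = excess_kurtosis(s1)
k_mix = excess_kurtosis(mix)
# |k_mix| < |k_source|: the mixture is more Gaussian than a single source,
# which is why ICA searches for the LEAST Gaussian projections.
```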

Page 47

Face Recognition: Challenges

View