Linear Discriminant Functions Wen-Hung Liao, 11/25/2008


Page 1

Linear Discriminant Functions

Wen-Hung Liao, 11/25/2008

Page 2

Introduction: LDF

Assume we know the proper form of the discriminant functions, instead of the underlying probability densities.
Use samples to estimate the parameters of the classifier (statistical or non-statistical).
We will be concerned with discriminant functions that are either linear in the components of x, or linear in some given set of functions of x.

Page 3

Why LDF?

Simplicity vs. accuracy
Attractive candidates for initial, trial classifiers
Related to neural networks

Page 4

Approach

Find the LDF by minimizing a criterion function.
Use a gradient descent procedure for the minimization; the issues are its convergence properties and computational complexity.

Example of a criterion function: the sample risk, or training error. (Not appropriate, why? Because a small training error does not guarantee a small test error.)

Page 5

LDF and Decision Surfaces

A linear discriminant function:

g(x) = w^t x + w_0

where w is the weight vector and w_0 is the bias (or threshold).

Page 6

Two-Category Case

Decision rule: decide ω1 if g(x) > 0, decide ω2 if g(x) < 0.

In other words, x is assigned to ω1 if the inner product w^t x exceeds the threshold -w_0.
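The two-category rule above can be sketched in a few lines; this is a minimal illustration (the function names and the numeric values of w and w_0 are my own, not from the slides):

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant function g(x) = w^t x + w0."""
    return np.dot(w, x) + w0

def decide(x, w, w0):
    """Decide class 1 (omega_1) if g(x) > 0, class 2 (omega_2) otherwise."""
    return 1 if g(x, w, w0) > 0 else 2

w = np.array([1.0, 2.0])   # example weight vector (assumed values)
w0 = -4.0                  # example bias/threshold

print(decide(np.array([3.0, 3.0]), w, w0))  # g = 3 + 6 - 4 = 5 > 0 -> class 1
print(decide(np.array([0.0, 1.0]), w, w0))  # g = 0 + 2 - 4 = -2 < 0 -> class 2
```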

Page 7

Decision Boundary

A hyperplane H is defined by g(x) = 0. If x1 and x2 are both on the decision surface, then

w^t x_1 + w_0 = w^t x_2 + w_0

which implies

w^t (x_1 - x_2) = 0

so w is normal to any vector lying on the hyperplane.

Page 8

Distance Measure

For any x, write

x = x_p + r w / ||w||

where x_p is the normal projection of x onto H, and r is the algebraic distance. Since g(x_p) = 0,

g(x) = w^t (x_p + r w / ||w||) + w_0 = r ||w||

and therefore

r = g(x) / ||w||
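A quick numeric check of the distance formula r = g(x)/||w||; the function names and values here are illustrative, not from the slides:

```python
import numpy as np

def signed_distance(x, w, w0):
    """Algebraic distance r = g(x) / ||w|| from x to the hyperplane g(x) = 0."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

def project_onto_hyperplane(x, w, w0):
    """Normal projection x_p = x - r * w / ||w||."""
    r = signed_distance(x, w, w0)
    return x - r * w / np.linalg.norm(w)

w = np.array([3.0, 4.0])          # ||w|| = 5 (example values)
w0 = -5.0
x = np.array([2.0, 1.0])

r = signed_distance(x, w, w0)     # g(x) = 6 + 4 - 5 = 5, so r = 1
xp = project_onto_hyperplane(x, w, w0)
print(r)                          # 1.0
print(np.dot(w, xp) + w0)         # ~0: x_p lies on H, as expected
```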

Page 9

Multi-category Case

General case: reduce to c - 1 two-class problems (ωi vs. not-ωi), or use c(c-1)/2 linear discriminants, one for each pair of classes. Both reductions can leave ambiguously classified regions.

Page 10

Use c linear discriminants

g_i(x) = w_i^t x + w_{i0},   i = 1, ..., c

and assign x to ωi if g_i(x) > g_j(x) for all j ≠ i.
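A sketch of this c-discriminant rule, assigning x to the class with the largest g_i(x); the weight matrix and biases below are made-up values for illustration:

```python
import numpy as np

def classify(x, W, w0):
    """W: (c, d) matrix whose rows are the w_i; w0: (c,) biases.
    Returns the index i maximizing g_i(x) = w_i^t x + w_i0."""
    g = W @ x + w0          # all c discriminant values at once
    return int(np.argmax(g))

W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
w0 = np.array([0.0, 0.0, 0.5])

print(classify(np.array([2.0, 0.5]), W, w0))  # g = [2.0, 0.5, -2.0] -> class 0
```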

Page 11

Distance Measure

The boundary Hij between adjacent decision regions is defined by g_i(x) = g_j(x). The vector w_i - w_j is normal to Hij, and the distance from x to Hij is given by

r = (g_i(x) - g_j(x)) / ||w_i - w_j||

Page 12

Quadratic DF

Add terms involving products of pairs of components of x to obtain the quadratic discriminant function:

g(x) = w_0 + Σ_{i=1}^{d} w_i x_i + Σ_{i=1}^{d} Σ_{j=1}^{d} w_{ij} x_i x_j

The separating surface defined by g(x) = 0 is a hyperquadric surface.
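The double sum over w_ij can be written compactly as x^t W x. A minimal sketch (the example values are mine): with W = I, w = 0, w_0 = -1, the separating surface g(x) = 0 is the unit circle, a hypersphere in 2-d.

```python
import numpy as np

def g_quad(x, w0, w, W):
    """Quadratic discriminant g(x) = w0 + w^t x + x^t W x."""
    return w0 + w @ x + x @ W @ x

W = np.eye(2)        # illustrative choice: g(x) = ||x||^2 - 1
w = np.zeros(2)
w0 = -1.0

print(g_quad(np.array([1.0, 0.0]), w0, w, W))  # 0.0: on the separating surface
print(g_quad(np.array([2.0, 0.0]), w0, w, W))  # 3.0: outside the circle
```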

Page 13

Hyperquadric Surfaces

If W = [w_ij] is not singular, then the linear terms in g(x) can be eliminated by translating the axes. Define the scaled matrix

W̄ = W / (w^t W^{-1} w - 4 w_0)

The type of surface is determined by W̄:
Hypersphere: W̄ is proportional to the identity matrix.
Hyperellipsoid: W̄ is positive definite.
Hyperhyperboloid: the eigenvalues of W̄ have mixed signs.

Page 14

Generalized LDF

Polynomial discriminant functions are an example of the generalized LDF:

g(x) = Σ_{i=1}^{d̂} a_i y_i(x) = a^t y

where the d̂ functions y_i(x) can be arbitrary functions of x.
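A 1-d polynomial example of a generalized LDF, mapping x to y(x) = (1, x, x²); the weight vector a below is a made-up value giving g(x) = x² - 1:

```python
import numpy as np

def y_map(x):
    """Map a scalar x to the feature vector y(x) = (1, x, x^2)."""
    return np.array([1.0, x, x * x])

a = np.array([-1.0, 0.0, 1.0])    # illustrative weights: g(x) = x^2 - 1

def g(x):
    """Generalized LDF g(x) = a^t y(x): linear in a, quadratic in x."""
    return a @ y_map(x)

print(g(0.0))   # -1.0: between the two roots, g < 0
print(g(2.0))   # 3.0: outside, g > 0
```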

Page 15

Augment Vectors

Augmented feature vector:

y = (1, x_1, ..., x_d)^t = (1, x^t)^t

Augmented weight vector:

a = (w_0, w_1, ..., w_d)^t = (w_0, w^t)^t

so that, with x_0 = 1,

g(x) = w_0 + Σ_{i=1}^{d} w_i x_i = Σ_{i=0}^{d} w_i x_i = a^t y

This maps the d-dimensional x-space to a (d+1)-dimensional y-space.
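The augmentation is a one-liner in practice; this sketch (with made-up values for w and w_0) checks that a^t y reproduces w^t x + w_0:

```python
import numpy as np

def augment(x):
    """Map d-dimensional x to (d+1)-dimensional y = (1, x_1, ..., x_d)."""
    return np.concatenate(([1.0], x))

w = np.array([1.0, 2.0])          # example weights (assumed values)
w0 = -4.0
a = np.concatenate(([w0], w))     # augmented weight vector a = (w0, w^t)^t

x = np.array([3.0, 3.0])
y = augment(x)

print(a @ y)          # -4 + 3 + 6 = 5
print(w @ x + w0)     # same value: the two notations agree
```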

Page 16

2-Category Separable Case

Look for a weight vector that classifies all of the samples correctly. If such a weight vector exists, the samples are said to be linearly separable. (Replacing every ω2 sample y by -y, this amounts to finding a vector a with a^t y > 0 for every sample.)

Page 17

Gradient Descent Procedure

Define a criterion function J(a) that is minimized when a is a solution vector.
Step 1: randomly pick a(1) and compute the gradient vector ∇J(a(1)).
Step 2: obtain a(2) by moving some distance from a(1) in the direction of steepest descent. In general,

a(k+1) = a(k) - η(k) ∇J(a(k))

where η(k) is the learning rate.
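The generic update a(k+1) = a(k) - η(k) ∇J(a(k)) can be sketched as below; the quadratic criterion J(a) = ||a - t||² is a stand-in example of mine, not from the slides:

```python
import numpy as np

def gradient_descent(grad, a, eta=0.1, steps=100):
    """Repeat a <- a - eta * grad(a) for a fixed number of steps."""
    for _ in range(steps):
        a = a - eta * grad(a)
    return a

# Example criterion: J(a) = ||a - t||^2, whose gradient is 2 (a - t).
t = np.array([1.0, -2.0])
grad = lambda a: 2.0 * (a - t)

a_final = gradient_descent(grad, np.array([5.0, 5.0]))
print(a_final)   # close to the minimizer t = [1, -2]
```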

Page 18

Setting the Learning Rate

Second-order expansion of J(a) around a(k):

J(a) ≈ J(a(k)) + ∇J^t (a - a(k)) + (1/2) (a - a(k))^t H (a - a(k))

where H is the Hessian matrix with elements h_ij = ∂²J/∂a_i∂a_j. Substituting a(k+1) = a(k) - η(k) ∇J gives

J(a(k+1)) ≈ J(a(k)) - η(k) ||∇J||² + (1/2) η²(k) ∇J^t H ∇J

which is minimized when

η(k) = ||∇J||² / (∇J^t H ∇J)
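The optimal step length can be checked numerically on a quadratic criterion, where the second-order expansion is exact; the Hessian and starting point below are illustrative values of mine:

```python
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, 8.0]])            # example Hessian (positive definite)

def J(a):    return 0.5 * a @ H @ a   # quadratic criterion J(a) = (1/2) a^t H a
def grad(a): return H @ a

a = np.array([4.0, 1.0])
g = grad(a)
eta = (g @ g) / (g @ H @ g)           # eta(k) = ||grad J||^2 / (grad J^t H grad J)

a_next = a - eta * g
print(J(a), '->', J(a_next))          # the criterion strictly decreases
```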

Page 19

Newton Descent

For a nonsingular Hessian H, use the update

a(k+1) = a(k) - H^{-1} ∇J

This converges faster, but each step is more expensive to compute.
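On a quadratic criterion the Newton step lands on the minimum in a single iteration, which illustrates the faster convergence; the Hessian below is an example value of mine:

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # example nonsingular Hessian

def grad(a):
    """Gradient of J(a) = (1/2) a^t H a, which is minimized at a = 0."""
    return H @ a

a = np.array([4.0, -2.0])
# Solve H s = grad J rather than forming H^{-1} explicitly.
a_next = a - np.linalg.solve(H, grad(a))
print(a_next)                          # [0, 0]: the minimizer, in one step
```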

Page 20

Perceptron Criterion Function

J_p(a) = Σ_{y ∈ Y(a)} (-a^t y)

where Y(a) is the set of samples misclassified by a (i.e. with a^t y ≤ 0). Since

∇J_p = Σ_{y ∈ Y(a)} (-y)

the update rule is

a(k+1) = a(k) + η(k) Σ_{y ∈ Y_k} y
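The batch perceptron update can be sketched as follows, on augmented samples with the ω2 rows negated; the toy data set is made up for illustration and is linearly separable:

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, max_iter=100):
    """Y: (n, d) array of augmented samples, class-2 rows already negated.
    Seeks a with a^t y > 0 for every row y via a <- a + eta * sum of
    misclassified samples."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        miscl = Y[Y @ a <= 0]          # Y(a): currently misclassified samples
        if len(miscl) == 0:
            return a                   # every sample satisfies a^t y > 0
        a = a + eta * miscl.sum(axis=0)
    return a

# Toy data: class-1 samples as-is, class-2 samples negated after augmentation.
Y = np.array([[ 1.0, 2.0,  2.0],       # class 1 sample (augmented)
              [ 1.0, 1.0,  3.0],       # class 1 sample
              [-1.0, 1.0,  1.0],       # class 2 sample, negated
              [-1.0, 2.0, -1.0]])      # class 2 sample, negated

a = batch_perceptron(Y)
print(np.all(Y @ a > 0))               # True: a separates the training samples
```

Because the toy samples are linearly separable, the perceptron convergence theorem guarantees the loop terminates with a solution vector.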

Page 21

Convergence Proof

Refer to pages 229-232 of the textbook.