Linear Models for Classification Berkay Topçu


Page 1: Linear Models for Classification

Linear Models for Classification

Berkay Topçu

Page 2: Linear Models for Classification

Goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ classes $C_k$, where $k = 1, \dots, K$
Linear separation of classes

Page 3: Generalized Linear Models

We wish to predict discrete class labels, or more generally class posterior probabilities $p(C_k \mid \mathbf{x})$, which lie in the range $(0, 1)$.
Classification model as a linear function of the parameters: $y(\mathbf{x}) = f(\mathbf{w}^T \mathbf{x} + w_0)$
Classification can be performed directly in the original input space $\mathbf{x}$, or after a fixed nonlinear transformation of the input variables using a vector of basis functions $\boldsymbol{\phi}(\mathbf{x})$.

Page 4: Discriminant Functions

Linear discriminant: $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + w_0$
If $y(\mathbf{x}) \ge 0$, assign $\mathbf{x}$ to class $C_1$; otherwise assign it to class $C_2$
The decision boundary is given by $y(\mathbf{x}) = 0$
$\mathbf{w}$ determines the orientation of the decision surface, and $w_0$ determines its location
Compact notation: $y(\mathbf{x}) = \tilde{\mathbf{w}}^T \tilde{\mathbf{x}}$, where $\tilde{\mathbf{w}} = (w_0, \mathbf{w})$ and $\tilde{\mathbf{x}} = (1, \mathbf{x})$
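As a minimal sketch of the decision rule above, the snippet below classifies a point with a hand-picked weight vector and bias (illustrative values, not learned from data) and checks that the compact augmented notation gives the same result:

```python
import numpy as np

# Hand-picked parameters for a 2-D example (not learned from data)
w = np.array([1.0, -2.0])   # determines the orientation of the decision surface
w0 = 0.5                    # determines the location of the surface

def y(x):
    return w @ x + w0       # y(x) = w^T x + w0

def classify(x):
    return "C1" if y(x) >= 0 else "C2"   # assign to C1 if y(x) >= 0

# Compact notation: y(x) = w_tilde^T x_tilde, w_tilde = (w0, w), x_tilde = (1, x)
x = np.array([2.0, 0.25])
w_tilde = np.concatenate(([w0], w))
x_tilde = np.concatenate(([1.0], x))
assert np.isclose(y(x), w_tilde @ x_tilde)   # both forms agree
```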

Page 5: Multiple Classes

K-class discriminant built by combining a number of two-class discriminant functions (K > 2)
One-versus-the-rest: separating points in one particular class $C_k$ from points not in that class
One-versus-one: K(K−1)/2 binary discriminant functions, one for every pair of classes

Page 6: Multiple Classes

A single K-class discriminant comprising K linear functions: $y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$
Assign $\mathbf{x}$ to class $C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \neq k$
How to learn the parameters of linear discriminant functions?
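The single K-class discriminant can be sketched as an argmax over K linear scores; the weights below are hand-picked for illustration, not learned:

```python
import numpy as np

# K = 3 linear discriminants with illustrative (not learned) parameters
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])      # row k holds w_k
w0 = np.array([0.0, 0.0, 0.5])   # biases w_k0

def predict(x):
    scores = W @ x + w0            # y_k(x) = w_k^T x + w_k0
    return int(np.argmax(scores))  # assign to the class with the largest y_k(x)
```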

Page 7: Least Squares for Classification

Each class $C_k$ is described by its own linear model $y_k(\mathbf{x}) = \mathbf{w}_k^T \mathbf{x} + w_{k0}$, or in compact notation $\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T \tilde{\mathbf{x}}$
Training data set $\{\mathbf{x}_n, \mathbf{t}_n\}$ for $n = 1, \dots, N$, where $\mathbf{t}_n = (0, 0, \dots, 1, \dots, 0)^T$ uses 1-of-K coding
$\mathbf{T}$ is the matrix whose $n$th row is the vector $\mathbf{t}_n^T$, and $\tilde{\mathbf{X}}$ is the matrix whose $n$th row is $\tilde{\mathbf{x}}_n^T$

Page 8: Least Squares for Classification

Minimizing the sum-of-squares error function
$E_D(\tilde{\mathbf{W}}) = \frac{1}{2} \mathrm{Tr}\{ (\tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T})^T (\tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T}) \}$
Solution: $\tilde{\mathbf{W}} = (\tilde{\mathbf{X}}^T \tilde{\mathbf{X}})^{-1} \tilde{\mathbf{X}}^T \mathbf{T} = \tilde{\mathbf{X}}^{\dagger} \mathbf{T}$
Discriminant function: $\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T \tilde{\mathbf{x}} = \mathbf{T}^T (\tilde{\mathbf{X}}^{\dagger})^T \tilde{\mathbf{x}}$
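The closed-form solution above is a one-liner with the pseudo-inverse; the two Gaussian blobs here are synthetic toy data assumed for illustration:

```python
import numpy as np

# Toy data (assumed): two well-separated blobs in 2-D, K = 2 classes
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(20, 2))
X2 = rng.normal([3.0, 3.0], 0.5, size=(20, 2))
X = np.vstack([X1, X2])
T = np.zeros((40, 2)); T[:20, 0] = 1; T[20:, 1] = 1   # 1-of-K target coding

X_tilde = np.hstack([np.ones((40, 1)), X])            # prepend the bias feature
W_tilde = np.linalg.pinv(X_tilde) @ T                 # W~ = X~^dagger T

pred = np.argmax(X_tilde @ W_tilde, axis=1)           # y(x) = W~^T x~, take the max
labels = np.r_[np.zeros(20), np.ones(20)]
accuracy = np.mean(pred == labels)
```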

Page 9: Fisher's Linear Discriminant

Dimensionality reduction: take the D-dimensional input vector $\mathbf{x}$ and project it to one dimension using $y = \mathbf{w}^T \mathbf{x}$
We seek the projection that maximizes class separation
Two-class problem: $N_1$ points of $C_1$ and $N_2$ points of $C_2$, with class means
$\mathbf{m}_1 = \frac{1}{N_1} \sum_{n \in C_1} \mathbf{x}_n, \qquad \mathbf{m}_2 = \frac{1}{N_2} \sum_{n \in C_2} \mathbf{x}_n$
Separation of the projected class means: $m_2 - m_1 = \mathbf{w}^T (\mathbf{m}_2 - \mathbf{m}_1)$
Fisher's idea: large separation between the projected class means, small variance within each class, minimizing class overlap

Page 10: Fisher's Linear Discriminant

The Fisher criterion:
$J(\mathbf{w}) = \frac{\mathbf{w}^T \mathbf{S}_B \mathbf{w}}{\mathbf{w}^T \mathbf{S}_W \mathbf{w}}$
where the between-class and within-class scatter matrices are
$\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$
$\mathbf{S}_W = \sum_{n \in C_1} (\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2} (\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$
Maximizing $J(\mathbf{w})$ gives $\mathbf{w} \propto \mathbf{S}_W^{-1} (\mathbf{m}_2 - \mathbf{m}_1)$
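The two-class Fisher direction is a single linear solve; the anisotropic toy blobs below are assumptions made for illustration:

```python
import numpy as np

# Synthetic two-class data (assumed), elongated along the first axis
rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], [1.0, 0.3], size=(50, 2))   # class C1
X2 = rng.normal([2.0, 1.0], [1.0, 0.3], size=(50, 2))   # class C2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)               # class means
S_W = ((X1 - m1).T @ (X1 - m1)
       + (X2 - m2).T @ (X2 - m2))                       # within-class scatter
w = np.linalg.solve(S_W, m2 - m1)                       # w ∝ S_W^-1 (m2 - m1)
w /= np.linalg.norm(w)

sep = w @ (m2 - m1)   # separation of the projected class means along w
```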

Page 11: Fisher's Linear Discriminant

For the two-class problem, the Fisher criterion is a special case of least squares (reference: Penalized Discriminant Analysis – Hastie, Buja and Tibshirani)
For multiple classes: $\mathbf{y} = \mathbf{W}^T \mathbf{x}$
$\mathbf{S}_W = \sum_{k=1}^{K} \mathbf{S}_k, \qquad \mathbf{S}_k = \sum_{n \in C_k} (\mathbf{x}_n - \mathbf{m}_k)(\mathbf{x}_n - \mathbf{m}_k)^T$
$\mathbf{S}_B = \sum_{k=1}^{K} N_k (\mathbf{m}_k - \mathbf{m})(\mathbf{m}_k - \mathbf{m})^T$
$J(\mathbf{W}) = \mathrm{Tr}\{ (\mathbf{W} \mathbf{S}_W \mathbf{W}^T)^{-1} (\mathbf{W} \mathbf{S}_B \mathbf{W}^T) \}$
The weight values are determined by the eigenvectors of $\mathbf{S}_W^{-1} \mathbf{S}_B$ that correspond to the largest eigenvalues

Page 12: The Perceptron Algorithm

The input vector $\mathbf{x}$ is transformed using a nonlinear transformation $\boldsymbol{\phi}(\mathbf{x})$
$y(\mathbf{x}) = f(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}))$, with the step activation
$f(a) = \begin{cases} +1, & a \ge 0 \\ -1, & a < 0 \end{cases}$
and targets $t \in \{-1, +1\}$
Perceptron criterion: for all training samples we want $\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) t_n > 0$
We need to minimize $E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T \boldsymbol{\phi}_n t_n$, where $\mathcal{M}$ is the set of misclassified patterns

Page 13: The Perceptron Algorithm – Stochastic Gradient Descent

Cycle through the training patterns in turn
If a pattern is correctly classified, the weight vector remains unchanged; otherwise:
$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E_P(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta \boldsymbol{\phi}_n t_n$
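The update rule above amounts to a short loop; here $\boldsymbol{\phi}(\mathbf{x})$ is assumed to be just the input with a bias feature appended, and the toy blobs are assumed linearly separable:

```python
import numpy as np

# Synthetic, linearly separable toy data (an assumption for this sketch)
rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], 0.3, size=(20, 2))   # class with t = -1
X2 = rng.normal([2.0, 2.0], 0.3, size=(20, 2))   # class with t = +1
Phi = np.hstack([np.ones((40, 1)), np.vstack([X1, X2])])
t = np.r_[-np.ones(20), np.ones(20)]

w, eta = np.zeros(3), 1.0
for _ in range(100):                    # cycle through the patterns in turn
    mistakes = 0
    for phi_n, t_n in zip(Phi, t):
        if t_n * (w @ phi_n) <= 0:      # pattern misclassified
            w = w + eta * phi_n * t_n   # w <- w + eta * phi_n * t_n
            mistakes += 1
    if mistakes == 0:                   # converged: every pattern correct
        break
```

For separable data the perceptron convergence theorem guarantees this loop terminates after a finite number of updates.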

Page 14: Probabilistic Generative Models

Depend on simple assumptions about the distribution of the data
$p(C_1 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_1)\, p(C_1)}{p(\mathbf{x} \mid C_1)\, p(C_1) + p(\mathbf{x} \mid C_2)\, p(C_2)} = \sigma(a)$
Logistic sigmoid function: $\sigma(a) = \frac{1}{1 + \exp(-a)}$
Maps the whole real axis to a finite interval, $(0, 1)$
where $a = \ln \frac{p(\mathbf{x} \mid C_1)\, p(C_1)}{p(\mathbf{x} \mid C_2)\, p(C_2)}$
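A quick numerical check of the identity above: the posterior from Bayes' rule equals $\sigma(a)$ with $a$ the log-odds. The likelihood and prior values are made-up numbers for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical class-conditional densities at some point x, plus priors
px_c1, px_c2 = 0.30, 0.05
p_c1, p_c2 = 0.4, 0.6

# Posterior via Bayes' rule, and via the sigmoid of the log-odds a
posterior = px_c1 * p_c1 / (px_c1 * p_c1 + px_c2 * p_c2)
a = np.log((px_c1 * p_c1) / (px_c2 * p_c2))
assert np.isclose(posterior, sigmoid(a))   # the two routes agree
```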

Page 15: Continuous Inputs – Gaussian

Assuming the class-conditional densities are Gaussian with a shared covariance matrix:
$p(\mathbf{x} \mid C_k) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right\}$
Case of two classes:
$p(C_1 \mid \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x} + w_0)$
$\mathbf{w} = \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$
$w_0 = -\frac{1}{2} \boldsymbol{\mu}_1^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_1 + \frac{1}{2} \boldsymbol{\mu}_2^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_2 + \ln \frac{p(C_1)}{p(C_2)}$

Page 16: Maximum Likelihood Solution

Likelihood function (with $t_n = 1$ denoting class $C_1$, $t_n = 0$ denoting class $C_2$, and prior $p(C_1) = \pi$):
$p(\mathbf{t} \mid \pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) = \prod_{n=1}^{N} \left[ \pi\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}) \right]^{t_n} \left[ (1 - \pi)\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) \right]^{1 - t_n}$
Maximizing the log-likelihood gives
$\pi = \frac{N_1}{N}, \qquad \boldsymbol{\mu}_1 = \frac{1}{N_1} \sum_{n=1}^{N} t_n \mathbf{x}_n, \qquad \boldsymbol{\mu}_2 = \frac{1}{N_2} \sum_{n=1}^{N} (1 - t_n) \mathbf{x}_n$
$\boldsymbol{\Sigma} = \mathbf{S} = \frac{N_1}{N} \mathbf{S}_1 + \frac{N_2}{N} \mathbf{S}_2$
$\mathbf{S}_1 = \frac{1}{N_1} \sum_{n \in C_1} (\mathbf{x}_n - \boldsymbol{\mu}_1)(\mathbf{x}_n - \boldsymbol{\mu}_1)^T, \qquad \mathbf{S}_2 = \frac{1}{N_2} \sum_{n \in C_2} (\mathbf{x}_n - \boldsymbol{\mu}_2)(\mathbf{x}_n - \boldsymbol{\mu}_2)^T$
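The ML estimates are all closed-form sample statistics. This sketch computes them on synthetic Gaussian data (an assumption) and plugs them into the posterior parameters of the shared-covariance Gaussian model:

```python
import numpy as np

# Synthetic labelled data (assumed): t_n = 1 for class C1, t_n = 0 for class C2
rng = np.random.default_rng(3)
X1 = rng.normal([0.0, 0.0], 0.6, size=(30, 2))   # class C1
X2 = rng.normal([2.0, 1.0], 0.6, size=(60, 2))   # class C2
N1, N2 = len(X1), len(X2)
N = N1 + N2

pi = N1 / N                                   # prior p(C1) = N1 / N
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)   # class means
S1 = (X1 - mu1).T @ (X1 - mu1) / N1           # per-class covariances
S2 = (X2 - mu2).T @ (X2 - mu2) / N2
Sigma = (N1 / N) * S1 + (N2 / N) * S2         # shared covariance S

# Parameters of the posterior p(C1|x) = sigma(w^T x + w0)
Sinv = np.linalg.inv(Sigma)
w = Sinv @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Sinv @ mu1 + 0.5 * mu2 @ Sinv @ mu2 + np.log(pi / (1 - pi))
```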

Page 17: Probabilistic Discriminative Models

In the probabilistic generative model, the number of parameters grows quadratically with M (the number of dimensions)
However, the logistic regression model $p(C_1 \mid \boldsymbol{\phi}) = y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^T \boldsymbol{\phi})$ has only M adjustable parameters
Maximum likelihood solution for logistic regression:
$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n} (1 - y_n)^{1 - t_n}$
Error function: negative log-likelihood (cross-entropy)
$E(\mathbf{w}) = -\ln p(\mathbf{t} \mid \mathbf{w}) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}$

Page 18: Iterative Reweighted Least Squares

Newton-Raphson iterative optimization, applied first to linear regression:
$\mathbf{w}^{(\text{new})} = \mathbf{w}^{(\text{old})} - \mathbf{H}^{-1} \nabla E(\mathbf{w})$
$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (\mathbf{w}^T \boldsymbol{\phi}_n - t_n) \boldsymbol{\phi}_n = \boldsymbol{\Phi}^T \boldsymbol{\Phi} \mathbf{w} - \boldsymbol{\Phi}^T \mathbf{t}$
$\mathbf{H} = \nabla\nabla E(\mathbf{w}) = \sum_{n=1}^{N} \boldsymbol{\phi}_n \boldsymbol{\phi}_n^T = \boldsymbol{\Phi}^T \boldsymbol{\Phi}$
$\mathbf{w}^{(\text{new})} = \mathbf{w}^{(\text{old})} - (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \{ \boldsymbol{\Phi}^T \boldsymbol{\Phi} \mathbf{w}^{(\text{old})} - \boldsymbol{\Phi}^T \mathbf{t} \} = (\boldsymbol{\Phi}^T \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{t}$
Same as the standard least-squares solution

Page 19: Iterative Reweighted Least Squares

Newton-Raphson update for the negative log-likelihood of logistic regression:
$\nabla E(\mathbf{w}) = \sum_{n=1}^{N} (y_n - t_n) \boldsymbol{\phi}_n = \boldsymbol{\Phi}^T (\mathbf{y} - \mathbf{t})$
$\mathbf{H} = \sum_{n=1}^{N} y_n (1 - y_n) \boldsymbol{\phi}_n \boldsymbol{\phi}_n^T = \boldsymbol{\Phi}^T \mathbf{R} \boldsymbol{\Phi}, \qquad R_{nn} = y_n (1 - y_n)$
$\mathbf{w}^{(\text{new})} = \mathbf{w}^{(\text{old})} - (\boldsymbol{\Phi}^T \mathbf{R} \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T (\mathbf{y} - \mathbf{t})$
$= (\boldsymbol{\Phi}^T \mathbf{R} \boldsymbol{\Phi})^{-1} \{ \boldsymbol{\Phi}^T \mathbf{R} \boldsymbol{\Phi} \mathbf{w}^{(\text{old})} - \boldsymbol{\Phi}^T (\mathbf{y} - \mathbf{t}) \}$
$= (\boldsymbol{\Phi}^T \mathbf{R} \boldsymbol{\Phi})^{-1} \boldsymbol{\Phi}^T \mathbf{R} \mathbf{z}, \qquad \mathbf{z} = \boldsymbol{\Phi} \mathbf{w}^{(\text{old})} - \mathbf{R}^{-1} (\mathbf{y} - \mathbf{t})$
A weighted least-squares problem
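The update can be sketched as a short loop. The overlapping toy blobs, the identity-plus-bias choice of $\boldsymbol{\phi}$, and the small floor on $R_{nn}$ (to avoid dividing by an underflowed weight) are all assumptions of this sketch:

```python
import numpy as np

# Synthetic, deliberately overlapping toy data (an assumption)
rng = np.random.default_rng(4)
X1 = rng.normal([0.0, 0.0], 1.0, size=(25, 2))
X2 = rng.normal([1.5, 1.5], 1.0, size=(25, 2))
Phi = np.hstack([np.ones((50, 1)), np.vstack([X1, X2])])   # bias + raw inputs
t = np.r_[np.zeros(25), np.ones(25)]

w = np.zeros(3)
for _ in range(8):                                    # Newton-Raphson steps
    y = 1.0 / (1.0 + np.exp(-Phi @ w))
    r = y * (1.0 - y)                                 # diagonal of R
    z = Phi @ w - (y - t) / np.maximum(r, 1e-10)      # working response z
    A = (Phi * r[:, None]).T @ Phi                    # Phi^T R Phi
    w = np.linalg.solve(A, (Phi * r[:, None]).T @ z)  # weighted least squares

pred = (Phi @ w > 0).astype(float)                    # decision at p(C1|phi) = 0.5
accuracy = np.mean(pred == t)
```

Because the Hessian is recomputed through R at every step, each iteration is exactly a weighted least-squares solve, hence the name IRLS.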

Page 20: Maximum Margin Classifiers

Support Vector Machines for the two-class problem: $y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}) + b$, with training data $\mathbf{x}_1, \dots, \mathbf{x}_N$ and targets $t_n \in \{-1, 1\}$
Assuming a linearly separable data set, there exists at least one choice of the parameters satisfying $t_n y(\mathbf{x}_n) > 0$ for all points
We seek the choice that gives the smallest generalization error
Margin: the smallest distance between the decision boundary and any of the samples

Page 21: Support Vector Machines

Optimization of the parameters, maximizing the margin
Distance of a point to the decision surface: $\frac{t_n y(\mathbf{x}_n)}{\lVert \mathbf{w} \rVert} = \frac{t_n (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) + b)}{\lVert \mathbf{w} \rVert}$
Maximizing the margin is equivalent to minimizing $\lVert \mathbf{w} \rVert^2$: $\arg\min_{\mathbf{w}, b} \frac{1}{2} \lVert \mathbf{w} \rVert^2$
subject to the constraints $t_n (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) + b) \ge 1$
Introduction of Lagrange multipliers $a_n \ge 0$:
$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2} \lVert \mathbf{w} \rVert^2 - \sum_{n=1}^{N} a_n \{ t_n (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) + b) - 1 \}$

Page 22: Support Vector Machines – Lagrange Multipliers

Minimizing with respect to $\mathbf{w}$ and $b$ and maximizing with respect to $\mathbf{a}$:
$\mathbf{w} = \sum_{n=1}^{N} a_n t_n \boldsymbol{\phi}(\mathbf{x}_n), \qquad \sum_{n=1}^{N} a_n t_n = 0$
The dual form:
$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$
subject to $a_n \ge 0$ and $\sum_{n=1}^{N} a_n t_n = 0$
Quadratic programming problem: maximize $\tilde{L}(\mathbf{a})$ under these constraints
New points are classified using $y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$
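The dual problem can be solved numerically on a tiny example. This is only a sketch: the four points and the linear kernel are assumptions, and scipy's general-purpose SLSQP optimizer stands in for a dedicated QP solver (the slides do not prescribe a solver):

```python
import numpy as np
from scipy.optimize import minimize

# Tiny separable toy set (assumed): two points per class, linear kernel
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])
K = X @ X.T                                    # k(x_n, x_m) = x_n^T x_m

def neg_dual(a):                               # minimize -L~(a)
    v = a * t
    return -(a.sum() - 0.5 * v @ K @ v)

res = minimize(neg_dual, np.zeros(4), method="SLSQP",
               bounds=[(0.0, None)] * 4,                       # a_n >= 0
               constraints={"type": "eq", "fun": lambda a: a @ t})  # sum a_n t_n = 0
a = res.x

w = (a * t) @ X                                # w = sum_n a_n t_n phi(x_n)
sv = a > 1e-4                                  # support vectors have a_n > 0
b = np.mean(t[sv] - X[sv] @ w)                 # from t_n y(x_n) = 1 on the margin
```

Only the multipliers of the support vectors come out nonzero, so the final classifier depends on those points alone.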

Page 23: Support Vector Machines

Overlapping class distributions (linearly non-separable data)
Slack variables $\xi_n \ge 0$: distance from the margin boundary, with $\xi_n = 0$ for points on or inside the correct margin boundary and $\xi_n = |t_n - y(\mathbf{x}_n)|$ otherwise
The constraints become $t_n y(\mathbf{x}_n) \ge 1 - \xi_n$
To maximize the margin while penalizing points that lie on the wrong side of the margin boundary:
$\min \; C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \lVert \mathbf{w} \rVert^2$

Page 24: SVM – Overlapping Class Distributions

Lagrangian:
$L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \{ t_n y(\mathbf{x}_n) - 1 + \xi_n \} - \sum_{n=1}^{N} \mu_n \xi_n$
The dual is identical to the separable case:
$\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$
but now subject to $0 \le a_n \le C$ and $\sum_{n=1}^{N} a_n t_n = 0$
Again a quadratic programming problem

Page 25: Support Vector Machines

Relation to logistic regression: with $z = t_n y(\mathbf{x}_n)$, the hinge loss used in the SVM and the error function of logistic regression both approximate the ideal misclassification error (MCE)
[Figure: loss functions plotted against z — Black: MCE, Blue: hinge loss, Red: logistic regression, Green: squared error]
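The losses in the figure can be written down directly as functions of $z = t_n y(\mathbf{x}_n)$. Following the usual presentation, the logistic error is rescaled by $1/\ln 2$ so that it passes through the point $(0, 1)$; treating $z = 0$ as an error in the 0/1 loss is just a convention of this sketch:

```python
import numpy as np

def misclassification(z):
    return (z <= 0).astype(float)            # ideal 0/1 error (MCE)

def hinge(z):
    return np.maximum(0.0, 1.0 - z)          # SVM hinge loss

def logistic(z):
    return np.log1p(np.exp(-z)) / np.log(2)  # logistic error, rescaled by 1/ln 2

def squared(z):
    return (1.0 - z) ** 2                    # squared error

z = np.linspace(-2.0, 3.0, 11)
# Both surrogates upper-bound the misclassification error everywhere
assert (hinge(z) >= misclassification(z)).all()
assert (logistic(z) >= misclassification(z)).all()
```

This upper-bounding property is what makes both losses usable convex surrogates for the non-convex 0/1 error.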