
CS 2750 Machine Learning
Lecture 8

Milos Hauskrecht
milos@cs.pitt.edu
5329 Sennott Square

Classification: Logistic regression.
Generative classification model.

Binary classification

• Two classes: Y = \{0,1\}
• Our goal is to learn to classify correctly two types of examples
  – Class 0 – labeled as 0
  – Class 1 – labeled as 1
• We would like to learn f : X \to \{0,1\}
• Zero-one error (loss) function:

  Error_1(f(\mathbf{x}_i,\mathbf{w}), y_i) =
    \begin{cases} 1 & \text{if } f(\mathbf{x}_i,\mathbf{w}) \neq y_i \\
                  0 & \text{if } f(\mathbf{x}_i,\mathbf{w}) = y_i \end{cases}

• Error we would like to minimize: E_{(\mathbf{x},y)}\left[ Error_1(f(\mathbf{x},\mathbf{w}), y) \right] (see the sketch below)
• First step: we need to devise a model of the function f
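For concreteness, the empirical version of this error over a labeled sample is just the fraction of misclassified examples; a minimal Python sketch (the helper name zero_one_error is ours, not from the lecture):

    import numpy as np

    def zero_one_error(y_true, y_pred):
        # Mean zero-one loss: fraction of examples where the
        # prediction f(x_i, w) disagrees with the label y_i.
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        return float(np.mean(y_true != y_pred))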


Discriminant functions

• One way to represent a classifier is by using discriminant functions
• Works for binary and multi-way classification
• Idea:
  – For every class i = 0, 1, \ldots, k define a function g_i(\mathbf{x}) mapping X \to \mathbb{R}
  – When the decision on input \mathbf{x} should be made, choose the class with the highest value of g_i(\mathbf{x}) (see the sketch below)
• So what happens with the input space? Assume a binary case.
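A minimal sketch of this decision rule in Python (the argument discriminants is a hypothetical list holding one function g_i per class):

    import numpy as np

    def classify(x, discriminants):
        # Evaluate every discriminant g_i at x and choose the
        # class with the highest value.
        return int(np.argmax([g(x) for g in discriminants]))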

Discriminant functions

[Figure, shown in three steps: examples from two classes in a 2-D input space; the space is partitioned into the region where g_1(\mathbf{x}) \ge g_0(\mathbf{x}) (decide class 1) and the region where g_1(\mathbf{x}) \le g_0(\mathbf{x}) (decide class 0).]

Discriminant functions

• Define the decision boundary: the set of points where g_1(\mathbf{x}) = g_0(\mathbf{x})

[Figure: the regions g_1(\mathbf{x}) \ge g_0(\mathbf{x}) and g_1(\mathbf{x}) \le g_0(\mathbf{x}), separated by the decision boundary g_1(\mathbf{x}) = g_0(\mathbf{x}).]

Quadratic decision boundary

[Figure: a quadratic decision boundary g_1(\mathbf{x}) = g_0(\mathbf{x}) separating the regions g_1(\mathbf{x}) \ge g_0(\mathbf{x}) and g_1(\mathbf{x}) \le g_0(\mathbf{x}).]

Logistic regression model

• Defines a linear decision boundary
• Discriminant functions:

  g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x}), \qquad g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T\mathbf{x})

• where g(z) = 1/(1 + e^{-z}) is a logistic function
• The model output: f(\mathbf{x}, \mathbf{w}) = g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x})

[Figure: network view of the model – the input vector \mathbf{x} enters as (1, x_1, x_2, \ldots, x_d) with weights w_0, w_1, w_2, \ldots, w_d; their weighted sum z = \mathbf{w}^T\mathbf{x} is passed through the logistic function to give f(\mathbf{x}, \mathbf{w}).]

Logistic function

• Is also referred to as a sigmoid function
• Replaces the threshold function with smooth switching
• Takes a real number and outputs a number in the interval [0,1]:

  g(z) = \frac{1}{1 + e^{-z}}

[Figure: plot of g(z) for z \in [-20, 20]; the output rises smoothly from 0 to 1, with g(0) = 0.5.]
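The logistic function translates directly into NumPy; the clipping below is a numerical-stability choice of ours, not part of the lecture:

    import numpy as np

    def sigmoid(z):
        # g(z) = 1 / (1 + exp(-z)); clip z so exp() cannot
        # overflow for very large negative inputs.
        z = np.clip(z, -500.0, 500.0)
        return 1.0 / (1.0 + np.exp(-z))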


Logistic regression model

• Discriminant functions:

  g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x}), \qquad g_0(\mathbf{x}) = 1 - g(\mathbf{w}^T\mathbf{x})

• Values of discriminant functions vary in [0,1]
  – Probabilistic interpretation:

  f(\mathbf{x}, \mathbf{w}) = p(y = 1 \mid \mathbf{x}, \mathbf{w}) = g_1(\mathbf{x}) = g(\mathbf{w}^T\mathbf{x})

[Figure: the same network view as before, with the output now read as p(y = 1 \mid \mathbf{x}, \mathbf{w}).]

Logistic regression

• We learn a probabilistic function f : X \to [0,1]

  f(\mathbf{x}, \mathbf{w}) = g(\mathbf{w}^T\mathbf{x}) = p(y = 1 \mid \mathbf{x}, \mathbf{w})

  – where f describes the probability of class 1 given \mathbf{x}

• Note that: p(y = 0 \mid \mathbf{x}, \mathbf{w}) = 1 - p(y = 1 \mid \mathbf{x}, \mathbf{w})

• Transformation to binary class values (see the sketch below):

  If p(y = 1 \mid \mathbf{x}) \ge 1/2 then choose 1, else choose 0
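A minimal sketch of the probability model and the 1/2-threshold rule in Python, assuming x already carries a leading 1 so that w_0 acts as the bias:

    import numpy as np

    def predict_proba(x, w):
        # f(x, w) = g(w^T x) = p(y = 1 | x, w)
        return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

    def predict_class(x, w):
        # If p(y = 1 | x) >= 1/2 choose 1, else choose 0.
        return 1 if predict_proba(x, w) >= 0.5 else 0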


Linear decision boundary

• Logistic regression model defines a linear decision boundary
• Why?
• Answer: Compare the two discriminant functions.
• Decision boundary: g_1(\mathbf{x}) = g_0(\mathbf{x})
• For the boundary it must hold:

  \log \frac{g_1(\mathbf{x})}{g_0(\mathbf{x})} = \log \frac{g(\mathbf{w}^T\mathbf{x})}{1 - g(\mathbf{w}^T\mathbf{x})} = 0

• Expanding the logistic function:

  \log \frac{g_1(\mathbf{x})}{g_0(\mathbf{x})}
  = \log \frac{ \dfrac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})} }{ \dfrac{\exp(-\mathbf{w}^T\mathbf{x})}{1 + \exp(-\mathbf{w}^T\mathbf{x})} }
  = \log \exp(\mathbf{w}^T\mathbf{x}) = \mathbf{w}^T\mathbf{x} = 0

• The boundary is therefore the set \mathbf{w}^T\mathbf{x} = 0, a plane (line) in the input space; see the numerical check below.
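A quick numerical check of this identity (the weights and input below are made up for illustration): the log-odds computed from the model output should equal \mathbf{w}^T\mathbf{x} exactly, so the boundary is the linear set \mathbf{w}^T\mathbf{x} = 0:

    import numpy as np

    def log_odds(x, w):
        # log( g_1(x) / g_0(x) ) computed from the model output.
        p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        return np.log(p / (1.0 - p))

    w = np.array([0.5, -1.2, 2.0])
    x = np.array([1.0, 0.3, -0.7])
    print(np.isclose(log_odds(x, w), np.dot(w, x)))  # True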

Logistic regression model. Decision boundary

• LR defines a linear decision boundary
• Example: 2 classes (blue and red points)

[Figure: blue and red example points in a 2-D input space separated by a linear decision boundary.]


Logistic regression: parameter learning

Likelihood of outputs
• Let

  D_i = \langle \mathbf{x}_i, y_i \rangle, \qquad \mu_i = p(y_i = 1 \mid \mathbf{x}_i, \mathbf{w}) = g(z_i) = g(\mathbf{w}^T\mathbf{x}_i)

• Then

  L(D, \mathbf{w}) = \prod_{i=1}^{n} P(y = y_i \mid \mathbf{x}_i, \mathbf{w}) = \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}

• Find weights w that maximize the likelihood of outputs
  – Apply the log-likelihood trick. The optimal weights are the same for both the likelihood and the log-likelihood:

  l(D, \mathbf{w}) = \log \prod_{i=1}^{n} \mu_i^{y_i} (1 - \mu_i)^{1 - y_i} = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i)
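The log-likelihood is easy to evaluate in vectorized form; a sketch in Python/NumPy, where X holds one example per row (with a leading column of ones for the bias) and the small eps guarding log(0) is our addition:

    import numpy as np

    def log_likelihood(X, y, w):
        # l(D, w) = sum_i [ y_i log mu_i + (1 - y_i) log(1 - mu_i) ],
        # with mu_i = g(w^T x_i).
        mu = 1.0 / (1.0 + np.exp(-X.dot(w)))
        eps = 1e-12  # avoid log(0) when mu saturates
        return float(np.sum(y * np.log(mu + eps)
                            + (1 - y) * np.log(1 - mu + eps)))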


Logistic regression: parameter learning

• Log-likelihood:

  l(D, \mathbf{w}) = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i)

• Derivatives of the log-likelihood (nonlinear in weights!):

  - \nabla_{\mathbf{w}} l(D, \mathbf{w})
  = - \sum_{i=1}^{n} \mathbf{x}_i \left( y_i - g(\mathbf{w}^T\mathbf{x}_i) \right)
  = - \sum_{i=1}^{n} \mathbf{x}_i \left( y_i - f(\mathbf{x}_i, \mathbf{w}) \right)

  \frac{\partial}{\partial w_j} \left[ -l(D, \mathbf{w}) \right] = - \sum_{i=1}^{n} x_{i,j} \left( y_i - g(z_i) \right)

• Gradient descent:

  \mathbf{w}^{(k)} \leftarrow \mathbf{w}^{(k-1)} - \alpha(k) \, \nabla_{\mathbf{w}} \left[ -l(D, \mathbf{w}) \right] \Big|_{\mathbf{w}^{(k-1)}}

  that is,

  \mathbf{w}^{(k)} \leftarrow \mathbf{w}^{(k-1)} + \alpha(k) \sum_{i=1}^{n} \left[ y_i - f(\mathbf{x}_i, \mathbf{w}^{(k-1)}) \right] \mathbf{x}_i
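The batch update above translates into a few lines of NumPy; a minimal sketch (the fixed learning rate and iteration count are illustrative choices of ours; the slides allow an annealed \alpha(k)):

    import numpy as np

    def fit_logistic_gd(X, y, steps=1000, alpha=0.1):
        # Gradient descent on -l(D, w):
        #   w <- w + alpha * sum_i [y_i - f(x_i, w)] x_i
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            mu = 1.0 / (1.0 + np.exp(-X.dot(w)))  # f(x_i, w) for all i
            w = w + alpha * X.T.dot(y - mu)
        return w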


Logistic regression. Online gradient descent

• On-line component of the log-likelihood:

  J_{online}(D_i, \mathbf{w}) = - \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right]

• On-line learning update for weights \mathbf{w}, using a single data point D_k = \langle \mathbf{x}_k, y_k \rangle at step k:

  \mathbf{w}^{(k)} \leftarrow \mathbf{w}^{(k-1)} - \alpha(k) \, \nabla_{\mathbf{w}} J_{online}(D_k, \mathbf{w}) \Big|_{\mathbf{w}^{(k-1)}}

• kth update for the logistic regression:

  \mathbf{w}^{(k)} \leftarrow \mathbf{w}^{(k-1)} + \alpha(k) \left[ y_k - f(\mathbf{x}_k, \mathbf{w}^{(k-1)}) \right] \mathbf{x}_k

Online logistic regression algorithm

Online-logistic-regression(D, number of iterations)
  initialize weights \mathbf{w} = (w_0, w_1, w_2, \ldots, w_d)
  for i = 1 : number of iterations
    select a data point D_i = \langle \mathbf{x}_i, y_i \rangle from D
    set \alpha = 1/i
    update weights (in parallel):
      \mathbf{w} \leftarrow \mathbf{w} + \alpha \left[ y_i - f(\mathbf{x}_i, \mathbf{w}) \right] \mathbf{x}_i
  end for
  return weights \mathbf{w}
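A runnable Python version of this pseudocode, under the assumption that data points are selected uniformly at random (the slides only say "select a data point from D") and that X carries a leading column of ones for the bias:

    import numpy as np

    def online_logistic_regression(X, y, num_iterations, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        w = np.zeros(X.shape[1])               # initialize weights
        for i in range(1, num_iterations + 1):
            k = rng.integers(len(y))           # select a data point
            alpha = 1.0 / i                    # set alpha = 1/i
            f = 1.0 / (1.0 + np.exp(-w.dot(X[k])))
            w = w + alpha * (y[k] - f) * X[k]  # update weights
        return w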

Online algorithm. Example.

[Three figure slides showing an example run of the online algorithm.]

Derivation of the gradient

• Log-likelihood:

  l(D, \mathbf{w}) = \sum_{i=1}^{n} y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i)

• Derivatives of the log-likelihood, via the chain rule through z_i = \mathbf{w}^T\mathbf{x}_i:

  \frac{\partial}{\partial w_j} l(D, \mathbf{w}) = \sum_{i=1}^{n} \frac{\partial}{\partial z_i} \left[ y_i \log \mu_i + (1 - y_i) \log(1 - \mu_i) \right] \frac{\partial z_i}{\partial w_j}

• The inner derivative:

  \frac{\partial}{\partial z_i} \left[ y_i \log g(z_i) + (1 - y_i) \log(1 - g(z_i)) \right]
  = y_i \frac{1}{g(z_i)} \frac{\partial g(z_i)}{\partial z_i} - (1 - y_i) \frac{1}{1 - g(z_i)} \frac{\partial g(z_i)}{\partial z_i}

• Derivative of a logistic function:

  \frac{\partial g(z_i)}{\partial z_i} = g(z_i)(1 - g(z_i))

• Substituting:

  = y_i (1 - g(z_i)) - (1 - y_i) g(z_i) = y_i - g(z_i)

• Finally, \dfrac{\partial z_i}{\partial w_j} = x_{i,j}, so

  \nabla_{\mathbf{w}} l(D, \mathbf{w}) = \sum_{i=1}^{n} \mathbf{x}_i \left( y_i - g(\mathbf{w}^T\mathbf{x}_i) \right) = \sum_{i=1}^{n} \mathbf{x}_i \left( y_i - f(\mathbf{x}_i, \mathbf{w}) \right)
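A standard way to sanity-check this derivation is to compare the analytic derivative with a central finite difference of the log-likelihood; a sketch (assumes mu_i never saturates to exactly 0 or 1 so the logs stay finite):

    import numpy as np

    def grad_check(X, y, w, j, h=1e-6):
        def ll(w):
            mu = 1.0 / (1.0 + np.exp(-X.dot(w)))
            return np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
        # Analytic: d l / d w_j = sum_i x_ij (y_i - g(z_i))
        analytic = np.sum(X[:, j] * (y - 1.0 / (1.0 + np.exp(-X.dot(w)))))
        e = np.zeros_like(w)
        e[j] = h
        numeric = (ll(w + e) - ll(w - e)) / (2 * h)
        return analytic, numeric  # the two should agree closely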


Generative approach to classification

Idea:
1. Represent and learn the distribution p(\mathbf{x}, y)
2. Use it to define probabilistic discriminant functions, e.g.

   g_0(\mathbf{x}) = p(y = 0 \mid \mathbf{x}), \qquad g_1(\mathbf{x}) = p(y = 1 \mid \mathbf{x})

Typical model: p(\mathbf{x}, y) = p(\mathbf{x} \mid y) \, p(y)   [graphical model: y \to \mathbf{x}]

• p(\mathbf{x} \mid y) = class-conditional distributions (densities)
  – binary classification: two class-conditional distributions, p(\mathbf{x} \mid y = 0) and p(\mathbf{x} \mid y = 1)
• p(y) = priors on classes – probability of class y
  – binary classification: Bernoulli distribution, p(y = 0) + p(y = 1) = 1

Generative approach to classification

Example:
• Class-conditional distributions
  – multivariate normal distributions, \mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma):

  p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]

  \mathbf{x} \sim N(\boldsymbol{\mu}_0, \Sigma_0) \text{ for } y = 0, \qquad \mathbf{x} \sim N(\boldsymbol{\mu}_1, \Sigma_1) \text{ for } y = 1

• Priors on classes (class 0, 1)
  – Bernoulli distribution, y \in \{0, 1\}:

  p(y, \theta) = \theta^y (1 - \theta)^{1 - y}
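This generative story is two sampling steps; a sketch with SciPy, using made-up parameter values (not from the lecture):

    import numpy as np
    from scipy.stats import bernoulli, multivariate_normal

    # Illustrative parameters.
    theta = 0.5                                # prior p(y = 1)
    mu0, S0 = np.zeros(2), np.eye(2)           # class 0
    mu1, S1 = np.array([2.0, 1.0]), np.eye(2)  # class 1

    y = bernoulli.rvs(theta)                   # sample the class
    x = multivariate_normal.rvs(mean=mu1 if y else mu0,
                                cov=S1 if y else S0)  # sample x | y
    # Evaluate a class-conditional density p(x | mu, Sigma):
    p_x_given_0 = multivariate_normal.pdf(x, mean=mu0, cov=S0)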


Learning of parameters of the model

Density estimation in statistics:
• We see examples – we do not know the parameters of the Gaussians (class-conditional densities):

  p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]

• ML estimate of parameters of a multivariate normal N(\boldsymbol{\mu}, \Sigma) for a set of n examples of \mathbf{x}.
  Optimize the log-likelihood:

  l(D, \boldsymbol{\mu}, \Sigma) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \Sigma)

  \hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad
  \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T

• How about class priors?
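These estimates are one-liners in NumPy (and, answering the closing question, the ML estimate of the Bernoulli class prior is simply the fraction of examples in each class); a sketch:

    import numpy as np

    def ml_gaussian(X):
        # mu_hat    = (1/n) sum_i x_i
        # Sigma_hat = (1/n) sum_i (x_i - mu_hat)(x_i - mu_hat)^T
        # (note the 1/n, not 1/(n-1): this is the ML estimate)
        n = X.shape[0]
        mu_hat = X.mean(axis=0)
        Xc = X - mu_hat
        return mu_hat, (Xc.T @ Xc) / n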

Generative model

[Figure: examples from the two classes in a 2-D input space; the region where g_1(\mathbf{x}) \ge g_0(\mathbf{x}) is marked.]


2 Gaussian class-conditional densities

[Figure: contour plots of the two Gaussian class-conditional densities.]

Making class decision

Basically we need to design discriminant functions. Two possible choices:

• Likelihood of data – choose the class (Gaussian) that explains the input data \mathbf{x} better:

  If p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1) > p(\mathbf{x} \mid \boldsymbol{\mu}_0, \Sigma_0) then y = 1, else y = 0

• Posterior of a class – choose the class with the better posterior probability (the posteriors play the role of g_1(\mathbf{x}) and g_0(\mathbf{x})):

  If p(y = 1 \mid \mathbf{x}) > p(y = 0 \mid \mathbf{x}) then y = 1, else y = 0

  where

  p(y = 1 \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1) \, p(y = 1)}{p(\mathbf{x} \mid \boldsymbol{\mu}_0, \Sigma_0) \, p(y = 0) + p(\mathbf{x} \mid \boldsymbol{\mu}_1, \Sigma_1) \, p(y = 1)}
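A sketch of the posterior-based decision in Python/SciPy (the parameter names are ours; prior1 stands for p(y = 1)):

    import numpy as np
    from scipy.stats import multivariate_normal

    def posterior_class1(x, mu0, S0, mu1, S1, prior1=0.5):
        # Bayes rule with Gaussian class-conditionals.
        p0 = multivariate_normal.pdf(x, mean=mu0, cov=S0) * (1 - prior1)
        p1 = multivariate_normal.pdf(x, mean=mu1, cov=S1) * prior1
        return p1 / (p0 + p1)

    def decide(x, mu0, S0, mu1, S1, prior1=0.5):
        # y = 1 if p(y = 1 | x) > p(y = 0 | x), else y = 0.
        return 1 if posterior_class1(x, mu0, S0, mu1, S1, prior1) > 0.5 else 0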


2 Gaussians: Quadratic decision boundary

[Figure: contours of the two class-conditional densities.]

2 Gaussians: Quadratic decision boundary

[Figure: the resulting quadratic decision boundary.]


2 Gaussians: Linear decision boundary

• When the covariances are the same:

  \mathbf{x} \sim N(\boldsymbol{\mu}_0, \Sigma) \text{ for } y = 0, \qquad \mathbf{x} \sim N(\boldsymbol{\mu}_1, \Sigma) \text{ for } y = 1

2 Gaussians: Linear decision boundary

[Figure: contours of the two class-conditional densities with equal covariances.]

2 Gaussians: Linear decision boundary

[Figure: the resulting linear decision boundary.]
