
WINSEM2014-15 CP3599 07-Jan-2015 RM01 LogisticRegression


  • Review of Lecture 8

    Bias and variance

    Expected value of E_out w.r.t. D:

    E_D[E_out(g^(D)(x))] = bias + var

    [Figures: the target f and the hypothesis set H, with bias measuring how far H is from f and var measuring the spread of g^(D)(x) around the average hypothesis ḡ(x), shown against f(x).]

    Learning curves

    How E_in and E_out vary with N

    [Figures: two learning curves of Expected Error vs. Number of Data Points, N (error roughly 0.16 to 0.22 for N from 20 to 80). Left: E_out decomposed into bias + variance. Right: E_out decomposed into in-sample error + generalization error, plotted with E_in.]

    Two readings of the same curve (N relative to the VC dimension). B-V: E_out = bias + variance. VC: E_out = E_in + generalization error.

  • Learning From Data

    Yaser S. Abu-Mostafa

    California Institute of Technology

    Lecture 9: The Linear Model II

    Sponsored by Caltech's Provost Office, E&AS Division, and IST. Tuesday, May 1, 2012

  • Where we are

    Linear classification ✓

    Linear regression ✓

    Logistic regression ?

    Nonlinear transforms ✓

    Learning From Data - Lecture 9 2/24

  • Nonlinear transforms

    x = (x_0, x_1, ..., x_d)  →  z = (z_0, z_1, ..., z_d̃)

    Each z_i = φ_i(x);  z = Φ(x)

    Example: z = (1, x_1, x_2, x_1 x_2, x_1^2, x_2^2)

    Final hypothesis g(x) in X space:

    sign(w̃^T Φ(x))  or  w̃^T Φ(x)

    Learning From Data - Lecture 9 3/24
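
    As a quick illustration (my own sketch, not from the lecture; the names phi and g_x are mine), the example transform and the final hypothesis in X space transcribe directly:

        import numpy as np

        def phi(x):
            # Example transform: x = (x1, x2) maps to the 6-dimensional z above
            x1, x2 = x
            return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

        def g_x(x, w_tilde):
            # Final hypothesis in X space: sign(w~^T Phi(x))
            return np.sign(w_tilde @ phi(x))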

  • The price we pay

    x = (x_0, x_1, ..., x_d)  →  z = (z_0, z_1, ..., z_d̃)

    w  →  w̃

    d_vc = d + 1  →  d_vc ≤ d̃ + 1

    Learning From Data - Lecture 9 4/24

  • Two non-separable cases

    [Figures: two data sets plotted in X space. Left: nearly linearly separable data with a few offending points. Right: data that no line separates; the positive region is roughly circular.]

    Learning From Data - Lecture 9 5/24

  • First case

    [Figure: the first data set, nearly linearly separable.]

    Use a linear model in X; accept E_in > 0

    or

    Insist on E_in = 0; go to a high-dimensional Z

    Learning From Data - Lecture 9 6/24

  • Second case

    [Figure: the second data set; the positive region is a circle centered at the origin.]

    z = (1, x_1, x_2, x_1 x_2, x_1^2, x_2^2)

    Why not: z = (1, x_1^2, x_2^2)

    or better yet: z = (1, x_1^2 + x_2^2)

    or even: z = (x_1^2 + x_2^2 - 0.6)

    Learning From Data - Lecture 9 7/24
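
    To see why even the single feature works, here is a small sketch of my own (data and threshold chosen to match the slide's -0.6): the sign of x_1^2 + x_2^2 - 0.6 alone classifies points inside vs. outside the circle.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(1000, 2))        # points in [-1, 1]^2
        y = np.sign(X[:, 0]**2 + X[:, 1]**2 - 0.6)    # target: circular boundary

        z = X[:, 0]**2 + X[:, 1]**2 - 0.6             # the one transformed feature
        print(np.mean(np.sign(z) == y))               # 1.0: separable in Z space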

  • Lesson learned

    Looking at the data before choosing the model can be hazardous to your E_out

    Data snooping

    Learning From Data - Lecture 9 8/24

  • Logistic regression - Outline

    The model

    Error measure

    Learning algorithm

    Learning From Data - Lecture 9 9/24

  • A third linear model

    s = Σ_{i=0}^{d} w_i x_i

    linear classification: h(x) = sign(s)    linear regression: h(x) = s    logistic regression: h(x) = θ(s)

    [Figure: three diagrams, one per model; inputs x_0, x_1, x_2, ..., x_d combine into the signal s, which feeds h(x) through a hard threshold, the identity, or the logistic function respectively.]

    Learning From Data - Lecture 9 10/24

  • The logistic function

    The formula:

    θ(s) = e^s / (1 + e^s)

    [Figure: plot of θ(s), rising from 0 toward 1 as s goes from -4 to 4, crossing 0.5 at s = 0.]

    soft threshold: uncertainty

    sigmoid: flattened out 's'

    Learning From Data - Lecture 9 11/24
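
    The formula transcribes directly (a minimal sketch; for large |s| a numerically stable version would split the computation by the sign of s):

        import numpy as np

        def theta(s):
            # Logistic function: theta(s) = e^s / (1 + e^s)
            return np.exp(s) / (1.0 + np.exp(s))

        print(theta(0.0))    # 0.5: the soft threshold is centered at s = 0
        print(theta(4.0))    # ~0.982, already close to the hard threshold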

  • Probability interpretation

    h(x) = θ(s) is interpreted as a probability

    Example. Prediction of heart attacks

    Input x: cholesterol level, age, weight, etc.

    θ(s): probability of a heart attack

    The signal s = w^T x acts as a "risk score"

    h(x) = θ(s)

    Learning From Data - Lecture 9 12/24

  • Genuine probability

    Data (x, y) with binary y, generated by a noisy target:

    P(y | x) = f(x) for y = +1;  1 - f(x) for y = -1.

    The target f: R^d → [0, 1] is the probability

    Learn g(x) = θ(w^T x) ≈ f(x)

    Learning From Data - Lecture 9 13/24

  • Error measure

    For each (x, y), y is generated by the probability f(x)

    Plausible error measure based on likelihood:

    If h = f, how likely is it to get y from x?

    P(y | x) = h(x) for y = +1;  1 - h(x) for y = -1.

    Learning From Data - Lecture 9 14/24

  • Formula for likelihood

    P(y | x) = h(x) for y = +1;  1 - h(x) for y = -1.

    Substitute h(x) = θ(w^T x), noting θ(-s) = 1 - θ(s):

    [Figure: the plot of θ(s) again, illustrating the symmetry θ(-s) = 1 - θ(s).]

    P(y | x) = θ(y w^T x)

    Likelihood of D = (x_1, y_1), ..., (x_N, y_N) is

    Π_{n=1}^{N} P(y_n | x_n) = Π_{n=1}^{N} θ(y_n w^T x_n)

    Learning From Data - Lecture 9 15/24
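
    Numerically one works with the log of that product; a small sketch of my own (names mine), using log θ(s) = -ln(1 + e^{-s}):

        import numpy as np

        def log_likelihood(w, X, y):
            # log of Prod_n theta(y_n w^T x_n); X is N x (d+1), y has entries +-1
            s = y * (X @ w)                        # signed signals y_n w^T x_n
            return -np.sum(np.log1p(np.exp(-s)))   # sum of log theta(s_n)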

  • Maximizing the likelihood

    Minimize

    -(1/N) ln( Π_{n=1}^{N} θ(y_n w^T x_n) )

    = (1/N) Σ_{n=1}^{N} ln( 1 / θ(y_n w^T x_n) )        [ θ(s) = 1 / (1 + e^{-s}) ]

    E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + e^{-y_n w^T x_n})

    where e(h(x_n), y_n) = ln(1 + e^{-y_n w^T x_n}) is the "cross-entropy" error

    Learning From Data - Lecture 9 16/24
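
    In code, E_in is just the mean of that pointwise error, i.e. -1/N times the log-likelihood above (again a sketch, names mine):

        import numpy as np

        def cross_entropy_error(w, X, y):
            # E_in(w) = (1/N) sum_n ln(1 + e^{-y_n w^T x_n})
            s = y * (X @ w)
            return np.mean(np.log1p(np.exp(-s)))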

  • Logistic regression - Outline

    The model

    Error measure

    Learning algorithm

    Learning From Data - Lecture 9 17/24

  • How to minimize E_in

    For logistic regression,

    E_in(w) = (1/N) Σ_{n=1}^{N} ln(1 + e^{-y_n w^T x_n})        → iterative solution

    Compare to linear regression:

    E_in(w) = (1/N) Σ_{n=1}^{N} (w^T x_n - y_n)^2        → closed-form solution

    Learning From Data - Lecture 9 18/24
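
    For contrast, the linear-regression minimizer really is one line via the pseudo-inverse (a sketch; no such closed form exists for the logistic E_in, hence the iterative method next):

        import numpy as np

        def linear_regression(X, y):
            # w = X^dagger y minimizes the squared in-sample error
            return np.linalg.pinv(X) @ y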

  • Iterative method: gradient descent

    [Figure: the in-sample error E_in(w) plotted over the weights w, a curved surface descending to a minimum.]

    General method for nonlinear optimization

    Start at w(0); take a step along the steepest slope

    Fixed step size: w(1) = w(0) + η v̂

    What is the direction v̂?

    Learning From Data - Lecture 9 19/24

  • Formula for the direction v̂

    ΔE_in = E_in(w(0) + η v̂) - E_in(w(0))

          = η ∇E_in(w(0))^T v̂ + O(η^2)

          ≥ -η ‖∇E_in(w(0))‖

    Since v̂ is a unit vector,

    v̂ = -∇E_in(w(0)) / ‖∇E_in(w(0))‖

    Learning From Data - Lecture 9 20/24
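
    A quick numerical check of that claim, entirely my own construction: on a random data set, no random unit direction decreases E_in faster than the negative normalized gradient.

        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(100, 3))
        y = np.sign(rng.normal(size=100))

        def E_in(w):
            return np.mean(np.log1p(np.exp(-y * (X @ w))))

        def grad(w):
            # gradient of the cross-entropy error (derived on a later slide)
            return -np.mean((y / (1 + np.exp(y * (X @ w))))[:, None] * X, axis=0)

        w0, eta = np.zeros(3), 0.01
        g = grad(w0)
        v_star = -g / np.linalg.norm(g)

        best_random = np.inf
        for _ in range(1000):
            v = rng.normal(size=3)
            best_random = min(best_random, E_in(w0 + eta * v / np.linalg.norm(v)))

        print(E_in(w0 + eta * v_star) <= best_random)   # expect True for small eta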

  • Fixed-size step?

    How η affects the algorithm:

    [Figures: three gradient-descent runs plotted as in-sample error E_in vs. the weights w, for η too small (tiny steps, slow progress), η too large (overshooting back and forth), and a variable η (large steps where the slope is steep, small steps near the minimum: just right).]

    η should increase with the slope

    Learning From Data - Lecture 9 21/24

  • Easy implementation

    Instead of

    Δw = η v̂ = -η ∇E_in(w(0)) / ‖∇E_in(w(0))‖

    Have

    Δw = -η ∇E_in(w(0))

    Fixed learning rate η

    Learning From Data - Lecture 9 22/24

  • Logistic regression algorithm

    1: Initialize the weights at t = 0 to w(0)

    2: for t = 0, 1, 2, ... do

    3:     Compute the gradient: ∇E_in = -(1/N) Σ_{n=1}^{N} y_n x_n / (1 + e^{y_n w^T(t) x_n})

    4:     Update the weights: w(t+1) = w(t) - η ∇E_in

    5:     Iterate to the next step until it is time to stop

    6: Return the final weights w

    Learning From Data - Lecture 9 23/24
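
    The pseudocode translates almost line for line into Python; a minimal sketch (the lecture leaves the stopping rule open, so the gradient-norm test here is my choice):

        import numpy as np

        def logistic_regression(X, y, eta=0.1, max_iters=10_000, tol=1e-6):
            # X: N x (d+1) inputs (first column all 1s); y: labels in {-1, +1}
            w = np.zeros(X.shape[1])                   # step 1: w(0)
            for t in range(max_iters):                 # step 2
                s = y * (X @ w)
                grad = -np.mean((y / (1 + np.exp(s)))[:, None] * X, axis=0)  # step 3
                w = w - eta * grad                     # step 4
                if np.linalg.norm(grad) < tol:         # step 5: time to stop
                    break
            return w                                   # step 6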

  • Summary of Linear Models

    The credit analysis example, one task per model:

    Perceptron → Approve or Deny: Classification Error; PLA, Pocket, ...

    Linear Regression → Amount of Credit: Squared Error; Pseudo-inverse

    Logistic Regression → Probability of Default: Cross-entropy Error; Gradient descent

    Learning From Data - Lecture 9 24/24