Review of Lecture 8

• Bias and variance

  Expected value of E_out w.r.t. D
    = bias + var

  [Figure: the target f and the hypothesis set H; bias measures how far the best
  of H is from f, and var measures the spread of g^(D)(x) around the average
  hypothesis ḡ(x).]
• Learning curves

  How E_in and E_out vary with N

  [Figure: two plots of expected error versus number of data points N, showing
  the same learning curves for E_in and E_out. B-V view: E_out splits into bias
  plus variance. VC view: E_out splits into in-sample error plus generalization
  error.]

  N ∝ VC dimension
Learning From Data
Yaser S. Abu-Mostafa
California Institute of Technology

Lecture 9: The Linear Model II

Sponsored by Caltech's Provost Office, E&AS Division, and IST • Tuesday, May 1, 2012
Where we are

• Linear classification ✓
• Linear regression ✓
• Logistic regression ?
• Nonlinear transforms ✓

Learning From Data - Lecture 9
Nonlinear transforms

  x = (x_0, x_1, …, x_d)  →  z = (z_0, z_1, …, z_d̃)

Each z_i = φ_i(x);  z = Φ(x)

Example: z = (1, x_1, x_2, x_1 x_2, x_1², x_2²)

Final hypothesis g(x) in X space:

  sign(w̃ᵀ Φ(x))   or   w̃ᵀ Φ(x)
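As a concrete sketch of the transform above (in Python, with an illustrative weight vector of my own choosing, not taken from the lecture):

```python
import numpy as np

def phi(x):
    """Second-order transform from the slide:
    (x1, x2) -> z = (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

def g(w_tilde, x):
    """Final hypothesis in X space: sign(w~^T Phi(x))."""
    return np.sign(w_tilde @ phi(x))

# A hypothetical w~ that implements the circular boundary x1^2 + x2^2 = 0.6:
w_tilde = np.array([-0.6, 0.0, 0.0, 0.0, 1.0, 1.0])
print(g(w_tilde, (0.1, 0.2)))   # inside the circle  -> -1.0
print(g(w_tilde, (1.0, 1.0)))   # outside the circle -> +1.0
```

Note that the classifier is nonlinear in x but linear in z, which is why all the linear-model machinery still applies.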
The price we pay

  x = (x_0, x_1, …, x_d)  → Φ →  z = (z_0, z_1, …, z_d̃)

  w → w̃

  d_vc = d + 1  →  d_vc ≤ d̃ + 1
Two non-separable cases

[Figure: two data sets of ±1-labeled points in the plane. Left: nearly linearly
separable, with a few points on the wrong side. Right: not linearly separable at
all; the true boundary is roughly a circle.]
First case

[Figure: the nearly separable data set from the previous slide.]

• Use a linear model in X; accept E_in > 0

    or

• Insist on E_in = 0; go to high-dimensional Z
Second case

[Figure: the circularly separable data set from the previous slide.]

  z = (1, x_1, x_2, x_1 x_2, x_1², x_2²)

Why not: z = (1, x_1², x_2²)

or better yet: z = (1, x_1² + x_2²)

or even: z = (x_1² + x_2² − 0.6)
Lesson learned

Looking at the data before choosing the model can be hazardous to your E_out

Data snooping
Logistic regression - Outline

• The model

• Error measure

• Learning algorithm
A third linear model

  s = Σ_{i=0}^{d} w_i x_i

  linear classification     linear regression     logistic regression
  h(x) = sign(s)            h(x) = s              h(x) = θ(s)

[Figure: three network diagrams, each combining the inputs x_0, x_1, …, x_d into
the signal s, then passing s through a hard threshold, the identity, and the
sigmoid, respectively, to produce h(x).]
The logistic function θ

The formula:

  θ(s) = e^s / (1 + e^s)

[Figure: plot of θ(s) versus s, rising smoothly from 0 to 1 through θ(0) = 0.5.]

soft threshold: uncertainty

sigmoid: flattened-out 's'
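A minimal numerical sketch of θ (my own illustration; the branching form is just a standard trick to avoid overflow for large |s|, equivalent to the formula above):

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s), evaluated in the
    numerically safer of its two equivalent forms on each branch."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0,
                    1.0 / (1.0 + np.exp(-np.abs(s))),
                    np.exp(-np.abs(s)) / (1.0 + np.exp(-np.abs(s))))

print(theta([-4.0, 0.0, 4.0]))  # soft threshold: ~0.018, 0.5, ~0.982
```

The output illustrates the "soft threshold": instead of jumping from −1 to +1 like sign(s), θ passes gradually through 0.5 at s = 0.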
Probability interpretation

h(x) = θ(s) is interpreted as a probability

Example. Prediction of heart attacks

  Input x: cholesterol level, age, weight, etc.

  θ(s): probability of a heart attack

  The signal s = wᵀx: "risk score"
Genuine probability

Data (x, y) with binary y, generated by a noisy target:

  P(y | x) = { f(x)       for y = +1
             { 1 − f(x)   for y = −1

The target f: ℝ^d → [0, 1] is the probability

Learn g(x) = θ(wᵀx) ≈ f(x)
Error measure

For each (x, y), y is generated by the probability f(x)

Plausible error measure based on likelihood:

  If h = f, how likely is it to get y from x?

  P(y | x) = { h(x)       for y = +1
             { 1 − h(x)   for y = −1
Formula for likelihood

  P(y | x) = { h(x)       for y = +1
             { 1 − h(x)   for y = −1

Substitute h(x) = θ(wᵀx), noting that θ(−s) = 1 − θ(s):

  P(y | x) = θ(y wᵀx)

[Figure: the sigmoid θ(s), symmetric about the point (0, 0.5).]

Likelihood of D = (x_1, y_1), …, (x_N, y_N) is

  ∏_{n=1}^{N} P(y_n | x_n) = ∏_{n=1}^{N} θ(y_n wᵀx_n)
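A quick numerical check (on a hypothetical toy data set of my own) that the compact form θ(y wᵀx) reproduces the two-case definition of P(y | x), via the identity θ(−s) = 1 − θ(s):

```python
import numpy as np

def theta(s):
    return 1.0 / (1.0 + np.exp(-s))

# Hypothetical toy data; each row of X includes the constant coordinate x0 = 1
X = np.array([[1.0,  0.5, -0.2],
              [1.0, -1.0,  0.3],
              [1.0,  0.1,  0.8]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.1, -0.4, 0.7])

s = X @ w

# Two-case form: h(x) for y = +1, and 1 - h(x) for y = -1
case_form = np.where(y == 1.0, theta(s), 1.0 - theta(s))

# Compact form from the slide: P(y | x) = theta(y * w^T x)
compact = theta(y * s)

print(np.allclose(case_form, compact))  # the two forms agree
print(np.prod(compact))                 # likelihood of the whole data set
```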
Maximizing the likelihood

Minimize

  − (1/N) ln( ∏_{n=1}^{N} θ(y_n wᵀx_n) )
    = (1/N) Σ_{n=1}^{N} ln( 1 / θ(y_n wᵀx_n) )        [ θ(s) = 1 / (1 + e^{−s}) ]

  E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−y_n wᵀx_n} )

where each term ln(1 + e^{−y_n wᵀx_n}) = e(h(x_n), y_n) is the "cross-entropy" error.
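The equality above can be verified numerically; here is a sketch on a hypothetical toy data set, comparing the cross-entropy form of E_in with the negative log-likelihood it was derived from:

```python
import numpy as np

def theta(s):
    return 1.0 / (1.0 + np.exp(-s))

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) * sum_n ln(1 + e^{-y_n w^T x_n})."""
    return np.mean(np.log(1.0 + np.exp(-y * (X @ w))))

# Hypothetical toy data; constant coordinate x0 = 1 included
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 0.3]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.2, 0.9])

# Same quantity computed directly as -(1/N) ln prod_n theta(y_n w^T x_n)
via_likelihood = -np.mean(np.log(theta(y * (X @ w))))

print(np.isclose(cross_entropy_error(w, X, y), via_likelihood))  # True
```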
Logistic regression - Outline

• The model

• Error measure

• Learning algorithm
How to minimize E_in

For logistic regression,

  E_in(w) = (1/N) Σ_{n=1}^{N} ln( 1 + e^{−y_n wᵀx_n} )    ← iterative solution

Compare to linear regression:

  E_in(w) = (1/N) Σ_{n=1}^{N} (wᵀx_n − y_n)²    ← closed-form solution
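For contrast, the linear-regression case really is one line: the squared error is minimized by the pseudo-inverse (as covered in the earlier lectures). A sketch on hypothetical data:

```python
import numpy as np

# Hypothetical data: N = 5 points, constant coordinate x0 = 1 included
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

# Closed-form solution: w = pseudo-inverse of X applied to y
w_lin = np.linalg.pinv(X) @ y

# w_lin minimizes E_in(w) = (1/N) sum_n (w^T x_n - y_n)^2
E_in = np.mean((X @ w_lin - y) ** 2)
```

No such closed form exists for the cross-entropy error, which is why logistic regression needs the iterative method introduced next.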
Iterative method: gradient descent

[Figure: the in-sample error E_in as a surface over the weights w, with descent
proceeding downhill from a starting point.]

General method for nonlinear optimization

Start at w(0); take a step along the steepest slope

Fixed step size: w(1) = w(0) + η v̂

What is the direction v̂?
Formula for the direction v̂

  ΔE_in = E_in(w(0) + η v̂) − E_in(w(0))
        = η ∇E_in(w(0))ᵀ v̂ + O(η²)
        ≥ −η ‖∇E_in(w(0))‖

Since v̂ is a unit vector,

  v̂ = − ∇E_in(w(0)) / ‖∇E_in(w(0))‖
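The first-order argument above says no unit direction can have a more negative slope than the normalized negative gradient. A small numerical sketch (random hypothetical data, the logistic-regression E_in defined later in this lecture) checking that against random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def E_in(w, X, y):
    """Cross-entropy in-sample error."""
    return np.mean(np.log(1.0 + np.exp(-y * (X @ w))))

def grad_E_in(w, X, y):
    """Gradient of E_in: -(1/N) sum_n y_n x_n / (1 + e^{y_n w^T x_n})."""
    return -np.mean((y[:, None] * X) / (1.0 + np.exp(y * (X @ w)))[:, None], axis=0)

# Hypothetical toy data
X = rng.normal(size=(20, 3))
y = np.sign(rng.normal(size=20))
w0 = np.zeros(3)

g = grad_E_in(w0, X, y)
v_hat = -g / np.linalg.norm(g)

# The slope g.v is minimized (= -||g||) at v = v_hat; no random unit vector beats it
for _ in range(100):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert g @ v_hat <= g @ v

print(g @ v_hat, -np.linalg.norm(g))  # the two slopes coincide
```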
Fixed-size step?

How η affects the algorithm:

[Figure: three plots of E_in versus the weights along the descent path.
Left: η too small, progress is slow. Middle: η too large, the iterates bounce
around. Right: a variable η, large when far from the minimum and small when
close, is just right.]

η should increase with the slope
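The three regimes can be reproduced on a stand-in one-dimensional bowl E(w) = w² (my own example, not from the slides), where the gradient is 2w:

```python
def descend(eta, steps=50, w0=3.0):
    """Fixed-step gradient descent on E(w) = w^2; returns the final error."""
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * w   # gradient of w^2 is 2w
    return w * w

print(descend(0.001))  # too small: barely moves from the starting error of 9
print(descend(1.05))   # too large: overshoots and diverges
print(descend(0.3))    # in between: converges rapidly
```

On this bowl the update is w ← (1 − 2η)w, so the behavior is transparent: |1 − 2η| < 1 converges, and η > 1 diverges.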
Easy implementation

Instead of

  Δw = η v̂ = − η ∇E_in(w(0)) / ‖∇E_in(w(0))‖

have

  Δw = − η ∇E_in(w(0))

Fixed learning rate η
Logistic regression algorithm

1: Initialize the weights at t = 0 to w(0)
2: for t = 0, 1, 2, … do
3:     Compute the gradient
           ∇E_in = − (1/N) Σ_{n=1}^{N} y_n x_n / (1 + e^{y_n wᵀ(t) x_n})
4:     Update the weights: w(t + 1) = w(t) − η ∇E_in
5:     Iterate to the next step until it is time to stop
6: Return the final weights w
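The algorithm above translates almost line by line into code. A sketch (the small-gradient stopping rule and the synthetic data are my own choices; the lecture leaves "when to stop" open):

```python
import numpy as np

def logistic_regression(X, y, eta=0.1, max_iters=10000, tol=1e-6):
    """Batch gradient descent on the cross-entropy error."""
    N, d = X.shape
    w = np.zeros(d)                               # step 1: w(0)
    for _ in range(max_iters):                    # step 2
        # step 3: gradient = -(1/N) sum_n y_n x_n / (1 + e^{y_n w^T x_n})
        grad = -np.mean((y[:, None] * X) /
                        (1.0 + np.exp(y * (X @ w)))[:, None], axis=0)
        if np.linalg.norm(grad) < tol:            # step 5: a stopping rule
            break
        w = w - eta * grad                        # step 4: fixed learning rate
    return w                                      # step 6

# Hypothetical nearly separable data with the constant coordinate x0 = 1
rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = np.sign(X[:, 1] + X[:, 2] + 0.1 * rng.normal(size=100))

w = logistic_regression(X, y)

# theta(w^T x) estimates P(y = +1 | x); threshold at 0.5 to classify
p = 1.0 / (1.0 + np.exp(-(X @ w)))
accuracy = np.mean(np.sign(p - 0.5) == y)
```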
Summary of Linear Models

Credit analysis example:

  Approve or deny         →  Perceptron            Classification error   PLA, Pocket, …
  Amount of credit        →  Linear regression     Squared error          Pseudo-inverse
  Probability of default  →  Logistic regression   Cross-entropy error    Gradient descent