Neural Networks NN 2
Architecture
• We consider the architecture: feed-forward NN with one layer.
• It is sufficient to study single layer perceptrons with just one neuron.
Single layer perceptrons
• Generalization to single layer perceptrons with more neurons is easy because:
• The output units are independent of each other
• Each weight only affects one of the outputs
Perceptron: Neuron Model
• Uses a non-linear (McCulloch-Pitts) model of neuron: the inputs x1, x2, ..., xn are weighted by w1, w2, ..., wn and summed together with the bias b to give
  v = w1x1 + w2x2 + ... + wnxn + b
  and the output is y = φ(v).
• φ is the sign function:
  φ(v) = +1 if v >= 0
         -1 if v < 0
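A minimal sketch of this neuron model in Python (the helper name perceptron_output is illustrative, not from the slides):

    # Perceptron neuron: y = phi(v), with v = w1*x1 + ... + wn*xn + b
    def perceptron_output(w, x, b):
        v = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum plus bias
        return 1 if v >= 0 else -1                     # sign activation

    # Example: with w = (1, 1) and b = -1 this neuron computes Boolean AND
    # on inputs coded as +1 / -1:
    print(perceptron_output((1, 1), (1, 1), -1))    # +1
    print(perceptron_output((1, 1), (-1, 1), -1))   # -1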
Perceptron: Applications
• The perceptron is used for classification: classify correctly a set of examples into one of two classes C1, C2:
  If the output of the perceptron is +1, then the input is assigned to class C1
  If the output is -1, then the input is assigned to class C2
Perceptron: Classification
• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:
  w1x1 + w2x2 + ... + wmxm + b = 0
• [Figure: in two dimensions the decision boundary is the line w1x1 + w2x2 + b = 0; the decision region for C1 is the half-plane where w1x1 + w2x2 + b >= 0, with C2 on the other side.]
Perceptron: Limitations
• The perceptron can only model linearly separable functions.
• The perceptron can be used to model the following Boolean functions: AND, OR, COMPLEMENT.
• But it cannot model the XOR. Why? (The sketch below illustrates this.)
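A quick experiment (a sketch; the function train is mine, using the learning rule introduced on the next slides with η = 1): training on AND reaches an error-free epoch, while training on XOR keeps making errors forever, because no line separates the XOR classes.

    # Returns True if an error-free epoch is reached, i.e. the Boolean
    # function is linearly separable.
    def train(samples, max_epochs=100):
        w1 = w2 = b = 0.0
        for _ in range(max_epochs):
            errors = 0
            for (x1, x2), d in samples:
                y = 1 if w1*x1 + w2*x2 + b >= 0 else -1
                if y != d:                        # misclassified: adapt weights
                    w1 += d*x1; w2 += d*x2; b += d
                    errors += 1
            if errors == 0:
                return True
        return False

    pts = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    AND = [(p, 1 if p == (1, 1) else -1) for p in pts]
    XOR = [(p, 1 if p[0] != p[1] else -1) for p in pts]
    print(train(AND))   # True: AND is linearly separable
    print(train(XOR))   # False: XOR is not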
Perceptron: Learning Algorithm
• Variables and parameters at iteration n of the learning algorithm:
  x(n) = input vector = [+1, x1(n), x2(n), ..., xm(n)]T
  w(n) = weight vector = [b(n), w1(n), w2(n), ..., wm(n)]T
  b(n) = bias
  y(n) = actual response
  d(n) = desired response
  η = learning rate parameter
The fixed-increment learning algorithm
• Initialization: w(1) = 0 (or random small values)
• Activation: activate perceptron by applying input example (input = x(n), desired output = d(n))
• Compute actual output of perceptron:
  y(n) = sgn[wT(n) x(n)]
• Adapt weight vector: if d(n) ≠ y(n) then
  w(n+1) = w(n) + η d(n) x(n)
  where d(n) = +1 if x(n) ∈ C1
               -1 if x(n) ∈ C2
• Continuation: increment time step n by 1 and go to the Activation step
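A sketch of this algorithm in Python, assuming the augmented convention from the previous slide (x(n) = [+1, x1(n), ..., xm(n)], so the bias is w[0]); the function name is illustrative:

    import numpy as np

    def fixed_increment(X, d, eta=1.0, epochs=10):
        # X: rows are augmented input vectors; d: desired responses (+1 / -1)
        w = np.zeros(X.shape[1])                  # w(1) = 0
        for _ in range(epochs):
            for x, dn in zip(X, d):
                y = 1.0 if w @ x >= 0 else -1.0   # y(n) = sgn(w^T(n) x(n))
                if y != dn:                       # adapt only when d(n) != y(n)
                    w = w + eta * dn * x
        return w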
Example
Consider the 2D training set C1 ∪ C2, where:
  C1 = {(1, 1), (1, -1), (0, -1)}, elements of class +1
  C2 = {(-1, -1), (-1, 1), (0, 1)}, elements of class -1
Use the perceptron learning algorithm to classify these examples.
• w(1) = [1, 0, 0]T, η = 1
Trick
Consider the augmented training set C'1 ∪ C'2, with first entry fixed to 1 (to deal with the bias as an extra weight):
  (1, 1, 1), (1, 1, -1), (1, 0, -1)
  (1, -1, -1), (1, -1, 1), (1, 0, 1)
Replace x with -x for all x ∈ C'2 and use the following simpler update rule:
  w(n+1) = w(n) + η x(n)   if w(n)·x(n) ≤ 0
           w(n)            otherwise
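A small sketch of the trick on the example data (variable names are mine):

    # Augment each point with a leading +1, then negate the class-2 patterns.
    C1 = [(1, 1), (1, -1), (0, -1)]
    C2 = [(-1, -1), (-1, 1), (0, 1)]
    aug = lambda p: (1,) + p                      # bias handled as extra weight
    patterns = [aug(p) for p in C1] + \
               [tuple(-v for v in aug(p)) for p in C2]
    print(patterns)   # matches the adjusted training set shown below

After this transformation, a correct weight vector w satisfies w·x > 0 for every pattern, which is why the simpler update rule suffices.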
Example
• Training set after application of the trick:
  (1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)
• Application of perceptron learning algorithm:

  Adjusted pattern   Weight applied   w(n)·x(n)   Update?   New weight
  (1, 1, 1)          (1, 0, 0)           1         No       (1, 0, 0)
  (1, 1, -1)         (1, 0, 0)           1         No       (1, 0, 0)
  (1, 0, -1)         (1, 0, 0)           1         No       (1, 0, 0)
  (-1, 1, 1)         (1, 0, 0)          -1         Yes      (0, 1, 1)
  (-1, 1, -1)        (0, 1, 1)           0         Yes      (-1, 2, 0)
  (-1, 0, -1)        (-1, 2, 0)          1         No       (-1, 2, 0)
End epoch 1
Example
  Adjusted pattern   Weight applied   w(n)·x(n)   Update?   New weight
  (1, 1, 1)          (-1, 2, 0)          1         No       (-1, 2, 0)
  (1, 1, -1)         (-1, 2, 0)          1         No       (-1, 2, 0)
  (1, 0, -1)         (-1, 2, 0)         -1         Yes      (0, 2, -1)
  (-1, 1, 1)         (0, 2, -1)          1         No       (0, 2, -1)
  (-1, 1, -1)        (0, 2, -1)          3         No       (0, 2, -1)
  (-1, 0, -1)        (0, 2, -1)          1         No       (0, 2, -1)
End epoch 2
At epoch 3 no updates are performed (check!), so the execution of the algorithm stops.
Final weight vector: (0, 2, -1), i.e. b = 0, w1 = 2, w2 = -1.
The decision hyperplane is 2x1 - x2 = 0.
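A quick check (a sketch) that this final weight vector classifies all six original examples correctly:

    import numpy as np
    w = np.array([0, 2, -1])                  # final [b, w1, w2]
    C1 = [(1, 1), (1, -1), (0, -1)]           # desired +1
    C2 = [(-1, -1), (-1, 1), (0, 1)]          # desired -1
    for p in C1 + C2:
        x = np.array([1, *p])                 # augmented input [+1, x1, x2]
        print(p, '->', 1 if w @ x >= 0 else -1)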
Example
[Figure: the six training points in the (x1, x2) plane, with the three C1 points marked + and the three C2 points marked -, separated by the decision boundary 2x1 - x2 = 0.]
Convergence of the learning algorithm
Suppose the datasets C1, C2 are linearly separable. The perceptron convergence algorithm converges after n0 iterations, with n0 ≤ nmax, on training set C1 ∪ C2.
Proof:
• Suppose x ∈ C1 ⇒ output = 1 and x ∈ C2 ⇒ output = -1.
• For simplicity assume w(1) = 0, η = 1.
• Suppose the perceptron incorrectly classifies the vectors x(1), ..., x(k), ... ∈ C1. Then wT(k) x(k) ≤ 0.
• Error correction rule: w(k+1) = w(k) + x(k), hence
  w(2) = w(1) + x(1)
  w(3) = w(2) + x(2)
  ...
  w(k+1) = x(1) + ... + x(k)
Convergence theorem (proof)
• Let w0 be such that w0T x(n) > 0 for all x(n) ∈ C1.
  w0 exists because C1 and C2 are linearly separable.
• Let α = min { w0T x(n) : x(n) ∈ C1 }.
• Then w0T w(k+1) = w0T x(1) + ... + w0T x(k) ≥ k α.
• Cauchy-Schwarz inequality:
  ||w0||^2 ||w(k+1)||^2 ≥ [w0T w(k+1)]^2
  ⇒ ||w(k+1)||^2 ≥ k^2 α^2 / ||w0||^2   (A)
Convergence theorem (proof)
• Now we consider another route:
  w(k+1) = w(k) + x(k)
  ||w(k+1)||^2 = ||w(k)||^2 + ||x(k)||^2 + 2 wT(k) x(k)
  (Euclidean norm; the last term is ≤ 0 because x(k) is misclassified)
  ⇒ ||w(k+1)||^2 ≤ ||w(k)||^2 + ||x(k)||^2
• Summing from k = 1 (with w(1) = 0):
  ||w(2)||^2 ≤ ||w(1)||^2 + ||x(1)||^2
  ||w(3)||^2 ≤ ||w(2)||^2 + ||x(2)||^2
  ...
  ||w(k+1)||^2 ≤ Σ_{i=1}^{k} ||x(i)||^2
Convergence theorem (proof)
• Let β = max { ||x(n)||^2 : x(n) ∈ C1 }.
• Then ||w(k+1)||^2 ≤ k β   (B)
• For sufficiently large values of k, (B) comes into conflict with (A). So k cannot be greater than kmax, the value for which (A) and (B) are both satisfied with the equality sign:
  kmax^2 α^2 / ||w0||^2 = kmax β  ⇒  kmax = β ||w0||^2 / α^2
• The perceptron convergence algorithm terminates in at most nmax = β ||w0||^2 / α^2 iterations.
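As a sanity check (a sketch, not from the slides), we can evaluate this bound on the earlier 2D example, taking w0 = (0, 2, -1), the solution found there, as the separating vector:

    import numpy as np
    w0 = np.array([0, 2, -1])
    # Augmented C1 patterns plus the sign-flipped C2 patterns:
    X = np.array([(1, 1, 1), (1, 1, -1), (1, 0, -1),
                  (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)])
    alpha = (X @ w0).min()                    # alpha = 1
    beta = (X**2).sum(axis=1).max()           # beta = 3
    print(beta * (w0 @ w0) / alpha**2)        # nmax = 3 * 5 / 1 = 15.0

The algorithm above needed only 3 weight updates, well within this bound of 15.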
Adaline: Adaptive Linear Element
• Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
  The idea: try to minimize the square error, which is a function of the weights:
  E(w(n)) = (1/2) e^2(n),  where  e(n) = d(n) - Σ_{j=0}^{m} x_j(n) w_j(n)
• We can find the minimum of the error function E by means of the steepest descent method.
Steepest Descent Method
  w(n+1) = w(n) - η (gradient of E(n))
• Start with an arbitrary point.
• Find the direction in which E is decreasing most rapidly:
  gradient of E(w) = [∂E/∂w1, ..., ∂E/∂wm]
• Make a small step in that direction.
Least-Mean-Square algorithm (Widrow-Hoff algorithm)
• Approximation of gradient(E):
  ∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = -e(n) xT(n)
• Update rule for the weights becomes:
  w(n+1) = w(n) + η e(n) x(n)
Summary of LMS algorithm
Training sample: input signal vector x(n)
                 desired response d(n)
User-selected parameter η > 0
Initialization: set ŵ(1) = 0
Computation: for n = 1, 2, ... compute
  e(n) = d(n) - ŵT(n) x(n)
  ŵ(n+1) = ŵ(n) + η x(n) e(n)
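A minimal sketch of this procedure in Python (the function name lms is illustrative), again using augmented inputs so the bias is w[0]:

    import numpy as np

    def lms(X, d, eta=0.1, epochs=100):
        w = np.zeros(X.shape[1])              # w_hat(1) = 0
        for _ in range(epochs):
            for x, dn in zip(X, d):
                e = dn - w @ x                # e(n) = d(n) - w_hat^T(n) x(n)
                w = w + eta * x * e           # w_hat(n+1) = w_hat(n) + eta x(n) e(n)
        return w

Unlike the perceptron rule, this update is applied on every presentation, not only on misclassifications, since the error e(n) is real-valued.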
Comparison of LMS and Perceptron
• Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.
• Model of a neuron:
  Perceptron: non-linear; hard-limiter activation function; McCulloch-Pitts model.
  LMS: linear.
• Learning process:
  Perceptron: a finite number of iterations.
  LMS: continuous.
Example
An 8×8 image of the digit 3, with pixels coded as +1 and -1:
  -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 +1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 -1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 +1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1
Example
• How to train a perceptron to recognize this 3?
• Assign +1 to the weights of inputs that are equal to +1, and -1 to the weights of inputs that are equal to -1 (i.e. set each weight equal to the corresponding pixel value, so that the response wT x is maximal for this 3).
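A sketch of this template-matching idea: copying the pattern into the weights makes w·x equal to 64 for the 3 itself and strictly smaller for every other ±1 image, since each differing pixel subtracts 2 from the sum.

    import numpy as np
    three = np.array([[-1,-1,-1,-1,-1,-1,-1,-1],
                      [-1,-1,+1,+1,+1,+1,-1,-1],
                      [-1,-1,-1,-1,-1,+1,-1,-1],
                      [-1,-1,-1,+1,+1,+1,-1,-1],
                      [-1,-1,-1,-1,-1,+1,-1,-1],
                      [-1,-1,-1,-1,-1,+1,-1,-1],
                      [-1,-1,+1,+1,+1,+1,-1,-1],
                      [-1,-1,-1,-1,-1,-1,-1,-1]]).ravel()
    w = three.copy()                          # weights equal to the pixel values
    print(w @ three)                          # 64: the maximum possible response
    other = three.copy(); other[0] *= -1      # flip one pixel
    print(w @ other)                          # 62: lower for any other pattern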