Neural Networks NN 2

Architecture

• We consider the following architecture: a feed-forward NN with one layer

• It is sufficient to study single layer perceptrons with just one neuron:


Single layer perceptrons

• Generalization to single layer perceptrons with more neurons is easy because:
• The output units are independent of each other
• Each weight only affects one of the outputs


Perceptron: Neuron Model

• Uses a non-linear (McCulloch-Pitts) model of neuron:

[Figure: inputs x1, x2, …, xn, multiplied by weights w1, w2, …, wn and summed together with the bias b, give the induced local field v; the output is y = φ(v).]

• φ is the sign function:

  φ(v) = +1 if v >= 0
         -1 if v <  0

  i.e., φ(v) = sgn(v).


Perceptron: Applications

• The perceptron is used for classification: classify correctly a set of examples into one of two classes C1, C2:

  If the output of the perceptron is +1, the input is assigned to class C1
  If the output is -1, the input is assigned to class C2
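As a concrete illustration, here is a minimal sketch of this neuron and decision rule in Python; the particular weight values are the ones the worked example later in these notes arrives at:

```python
import numpy as np

def perceptron(x, w, b):
    """McCulloch-Pitts neuron: sign of the induced local field."""
    v = w @ x + b                # v = w1*x1 + ... + wn*xn + b
    return 1 if v >= 0 else -1   # sgn(v): +1 -> class C1, -1 -> class C2

w = np.array([2.0, -1.0])        # weights found in the worked example below
b = 0.0
print(perceptron(np.array([1.0, 1.0]), w, b))    # +1 -> C1
print(perceptron(np.array([-1.0, 1.0]), w, b))   # -1 -> C2
```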


Perceptron: Classification

• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

  w1x1 + w2x2 + … + wmxm + b = 0

[Figure: in the (x1, x2) plane, the line w1x1 + w2x2 + b = 0 is the decision boundary between C1 and C2; the decision region for C1 is w1x1 + w2x2 + b >= 0.]


Perceptron: Limitations

• The perceptron can only model linearly separable functions.
• The perceptron can be used to model the following Boolean functions: AND, OR, COMPLEMENT.
• But it cannot model the XOR. Why?
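Leaving the XOR question to ponder, the three modelable functions admit simple weights. A quick check in Python, with inputs encoded as -1/+1; the weight and bias values are our own hand-picked choices:

```python
# Boolean functions as single sign-activation neurons, inputs in {-1, +1}.
def sgn(v):
    return 1 if v >= 0 else -1

def AND(x1, x2):
    return sgn(x1 + x2 - 1)   # +1 only when both inputs are +1

def OR(x1, x2):
    return sgn(x1 + x2 + 1)   # -1 only when both inputs are -1

def COMPLEMENT(x1):
    return sgn(-x1)           # flips the sign of the input

assert AND(1, 1) == 1 and AND(1, -1) == -1 and AND(-1, -1) == -1
assert OR(-1, -1) == -1 and OR(1, -1) == 1 and OR(1, 1) == 1
assert COMPLEMENT(1) == -1 and COMPLEMENT(-1) == 1
```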


Perceptron: Learning Algorithm

• Variables and parameters at iteration n of the learning algorithm:

  x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T
  w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T
  b(n) = bias
  y(n) = actual response
  d(n) = desired response
  η = learning rate parameter


The fixed-increment learning algorithm

• Initialization: w(1) = 0 (or random small values)
• Activation: activate the perceptron by applying the input example (input = x(n), desired output = d(n))
• Compute actual output of the perceptron:

  y(n) = sgn[w^T(n) x(n)]

• Adapt the weight vector: if d(n) ≠ y(n), then

  w(n+1) = w(n) + η d(n) x(n)

  where d(n) = +1 if x(n) ∈ C1, -1 if x(n) ∈ C2

• Continuation: increment time step n by 1 and go to the Activation step
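A minimal sketch of these steps in Python; the max_epochs cap is our own safeguard, since termination is only guaranteed when the two classes are linearly separable:

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Fixed-increment perceptron learning.

    X: input vectors with +1 prepended (bias handled as an extra weight)
    d: desired responses, +1 for C1 and -1 for C2
    """
    w = np.zeros(X.shape[1])                 # Initialization: w(1) = 0
    for _ in range(max_epochs):
        updates = 0
        for x, target in zip(X, d):          # Activation: apply one example
            y = 1 if w @ x >= 0 else -1      # y(n) = sgn[w^T(n) x(n)]
            if y != target:                  # adapt only when d(n) != y(n)
                w = w + eta * target * x     # w(n+1) = w(n) + eta d(n) x(n)
                updates += 1
        if updates == 0:                     # a full pass with no errors: done
            break
    return w
```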


Example

Consider the 2D training set C1 ∪ C2, where:
  C1 = {(1, 1), (1, -1), (0, -1)}: elements of class +1
  C2 = {(-1, -1), (-1, 1), (0, 1)}: elements of class -1
Use the perceptron learning algorithm to classify these examples.

• w(1) = [1, 0, 0]^T, η = 1


Trick

Consider the augmented training set C'1 ∪ C'2, with the first entry fixed to 1 (to deal with the bias as an extra weight):

  (1, 1, 1), (1, 1, -1), (1, 0, -1)
  (1, -1, -1), (1, -1, 1), (1, 0, 1)

Replace x with -x for all x ∈ C'2 and use the following simpler update rule:

  w(n+1) = w(n) + η x(n)  if  w(n)·x(n) <= 0
  w(n+1) = w(n)           otherwise


Example

• Training set after application of trick:
  (1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)
• Application of perceptron learning algorithm:

  Adjusted pattern   Weight applied w(n)   w(n)·x(n)   Update?   New weight
  (1, 1, 1)          (1, 0, 0)              1          No        (1, 0, 0)
  (1, 1, -1)         (1, 0, 0)              1          No        (1, 0, 0)
  (1, 0, -1)         (1, 0, 0)              1          No        (1, 0, 0)
  (-1, 1, 1)         (1, 0, 0)             -1          Yes       (0, 1, 1)
  (-1, 1, -1)        (0, 1, 1)              0          Yes       (-1, 2, 0)
  (-1, 0, -1)        (-1, 2, 0)             1          No        (-1, 2, 0)

End epoch 1


Example

  Adjusted pattern   Weight applied w(n)   w(n)·x(n)   Update?   New weight
  (1, 1, 1)          (-1, 2, 0)             1          No        (-1, 2, 0)
  (1, 1, -1)         (-1, 2, 0)             1          No        (-1, 2, 0)
  (1, 0, -1)         (-1, 2, 0)            -1          Yes       (0, 2, -1)
  (-1, 1, 1)         (0, 2, -1)             1          No        (0, 2, -1)
  (-1, 1, -1)        (0, 2, -1)             3          No        (0, 2, -1)
  (-1, 0, -1)        (0, 2, -1)             1          No        (0, 2, -1)

End epoch 2

In epoch 3 no updates are performed (check!), so execution of the algorithm stops.
Final weight vector: (0, 2, -1), i.e. b = 0, w1 = 2, w2 = -1. The decision hyperplane is 2x1 - x2 = 0.
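This trace can be reproduced in a few lines of Python, following the trick and the simpler update rule above exactly:

```python
import numpy as np

# Augmented training set, with the C'2 patterns already negated (the trick):
X = np.array([( 1, 1,  1), ( 1, 1, -1), ( 1, 0, -1),
              (-1, 1,  1), (-1, 1, -1), (-1, 0, -1)], dtype=float)

w = np.array([1.0, 0.0, 0.0])      # w(1) = [1, 0, 0]^T, eta = 1
epoch = 0
while True:
    epoch += 1
    updates = 0
    for x in X:
        if w @ x <= 0:             # misclassified under the simplified rule
            w = w + x              # w(n+1) = w(n) + x(n)
            updates += 1
    if updates == 0:               # an epoch with no updates: stop
        break
print(epoch, w)                    # 3 [ 0.  2. -1.]
```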


Example

[Figure: the six training points in the (x1, x2) plane: C1 points (+) at (1, 1), (1, -1), (0, -1); C2 points (-) at (-1, -1), (-1, 1), (0, 1). The line 2x1 - x2 = 0 separates the two classes.]

Decision boundary: 2x1 - x2 = 0


Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. The perceptron convergence algorithm converges after n0 iterations, with n0 <= nmax, on the training set C1 ∪ C2.

Proof:
• Suppose x ∈ C1 ⇒ output = 1, and x ∈ C2 ⇒ output = -1.
• For simplicity assume w(1) = 0, η = 1.
• Suppose the perceptron incorrectly classifies x(1), …, x(k), … ∈ C1. Then w^T(k) x(k) <= 0.

Error correction rule: w(k+1) = w(k) + x(k), so

  w(2) = w(1) + x(1)
  w(3) = w(2) + x(2)
  …
  w(k+1) = x(1) + … + x(k)


Convergence theorem (proof)

• Let w0 be such that w0^T x(n) > 0 for all x(n) ∈ C1.
  w0 exists because C1 and C2 are linearly separable.
• Let α = min { w0^T x(n) : x(n) ∈ C1 }.
• Then w0^T w(k+1) = w0^T x(1) + … + w0^T x(k) >= k α.
• By the Cauchy-Schwarz inequality,

  ||w0||² ||w(k+1)||² >= [w0^T w(k+1)]² >= k² α²

  so  ||w(k+1)||² >= k² α² / ||w0||²    (A)


Convergence theorem (proof)

• Now we consider another route:

  w(k+1) = w(k) + x(k)
  ||w(k+1)||² = ||w(k)||² + ||x(k)||² + 2 w^T(k) x(k)    (Euclidean norm)

  where w^T(k) x(k) <= 0 because x(k) is misclassified, hence

  ||w(k+1)||² <= ||w(k)||² + ||x(k)||²

• Since ||w(1)||² = 0:

  ||w(2)||² <= ||w(1)||² + ||x(1)||²
  ||w(3)||² <= ||w(2)||² + ||x(2)||²
  …
  ||w(k+1)||² <= Σ_{i=1}^{k} ||x(i)||²


Convergence theorem (proof)

• Let β = max { ||x(n)||² : x(n) ∈ C1 }.
• Then ||w(k+1)||² <= k β    (B)
• For sufficiently large values of k, (B) comes into conflict with (A). So k cannot be greater than kmax, the value for which (A) and (B) are both satisfied with the equality sign:

  kmax² α² / ||w0||² = kmax β  ⇒  kmax = β ||w0||² / α²

• The perceptron convergence algorithm terminates in at most

  nmax = β ||w0||² / α²

  iterations.
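As a sanity check, the bound can be evaluated on the worked example above. Any separating vector may serve as w0; taking the solution (0, 2, -1) found earlier is our own choice for this sketch:

```python
import numpy as np

# Adjusted (augmented and negated) patterns from the worked example:
X = np.array([( 1, 1,  1), ( 1, 1, -1), ( 1, 0, -1),
              (-1, 1,  1), (-1, 1, -1), (-1, 0, -1)], dtype=float)

w0 = np.array([0.0, 2.0, -1.0])       # a separating vector (our choice)

alpha = (X @ w0).min()                # alpha = min w0^T x(n) = 1
beta = (X**2).sum(axis=1).max()       # beta  = max ||x(n)||^2 = 3
n_max = beta * (w0 @ w0) / alpha**2   # beta ||w0||^2 / alpha^2 = 15
print(alpha, beta, n_max)             # the trace above needed only 3 updates
```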


Adaline: Adaptive Linear Element

• Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm. The idea: try to minimize the square error, which is a function of the weights:

  E(w(n)) = 1/2 e²(n)

  e(n) = d(n) - Σ_{j=0}^{m} x_j(n) w_j(n)

• We can find the minimum of the error function E by means of the steepest descent method.


Steepest Descent Method

  w(n+1) = w(n) - η (gradient of E(n))

• start with an arbitrary point
• find the direction in which E decreases most rapidly: the direction opposite to the gradient
• make a small step in that direction

  (gradient of E(w)) = [∂E/∂w1, …, ∂E/∂wm]
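A minimal illustration of the method on a toy quadratic error surface; the target vector, step size, and iteration count are arbitrary choices of this sketch:

```python
import numpy as np

t = np.array([2.0, -1.0])        # toy quadratic E(w) = 1/2 ||w - t||^2
grad_E = lambda w: w - t         # its gradient

w = np.zeros(2)                  # arbitrary starting point
eta = 0.1                        # small step size
for _ in range(200):
    w = w - eta * grad_E(w)      # w(n+1) = w(n) - eta * gradient(E)
print(w)                         # converges towards t = [2, -1]
```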


Least-Mean-Square algorithm (Widrow-Hoff algorithm)

• Approximation of the gradient of E:

  ∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = -e(n) x^T(n)

• The update rule for the weights becomes:

  w(n+1) = w(n) + η e(n) x(n)


Summary of LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)
User-selected parameter: η > 0
Initialization: set ŵ(1) = 0
Computation: for n = 1, 2, … compute
  e(n) = d(n) - ŵ^T(n) x(n)
  ŵ(n+1) = ŵ(n) + η x(n) e(n)
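A direct transcription of this summary into Python; the learning rate and epoch count are arbitrary choices, and the training set reuses the augmented 2D example from earlier:

```python
import numpy as np

def lms(X, d, eta=0.1, epochs=50):
    """LMS (Widrow-Hoff): w(n+1) = w(n) + eta * x(n) * e(n)."""
    w = np.zeros(X.shape[1])       # Initialization: w_hat(1) = 0
    for _ in range(epochs):
        for x, target in zip(X, d):
            e = target - w @ x     # e(n) = d(n) - w^T(n) x(n)  (linear neuron)
            w = w + eta * e * x    # gradient step on E = 1/2 e^2
    return w

# The augmented 2D training set from the perceptron example:
X = np.array([(1, 1, 1), (1, 1, -1), (1, 0, -1),
              (1, -1, -1), (1, -1, 1), (1, 0, 1)], dtype=float)
d = np.array([1, 1, 1, -1, -1, -1], dtype=float)
print(lms(X, d))                   # close to the least-squares solution, which
                                   # also separates the two classes here
```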


Comparison of LMS and Perceptron

• Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.
• Model of a neuron:
  Perceptron: non-linear; hard-limiter activation function; McCulloch-Pitts model.
  LMS: linear.
• Learning process:
  Perceptron: terminates after a finite number of iterations.
  LMS: continuous.


Example

  -1 -1 -1 -1 -1 -1 -1 -1
  -1 -1 +1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 -1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 -1 -1 -1 +1 -1 -1
  -1 -1 +1 +1 +1 +1 -1 -1
  -1 -1 -1 -1 -1 -1 -1 -1


Example

• How to train a perceptron to recognize this 3?

• Assign -1 to weights of input values that are equal to +1, and +1 to weights of input values that are equal to -1
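A small numeric check of the stated assignment in Python. With the weights set to the negated bitmap, the "3" itself produces the extreme induced local field (-64), and any pattern differing in k pixels scores -64 + 2k, so the "3" is the unique minimizer; the noisy pattern below is our own illustration:

```python
import numpy as np

# The 8x8 bitmap of the "3" from the previous slide, flattened to 64 inputs:
three = np.array([
    [-1, -1, -1, -1, -1, -1, -1, -1],
    [-1, -1, +1, +1, +1, +1, -1, -1],
    [-1, -1, -1, -1, -1, +1, -1, -1],
    [-1, -1, -1, +1, +1, +1, -1, -1],
    [-1, -1, -1, -1, -1, +1, -1, -1],
    [-1, -1, -1, -1, -1, +1, -1, -1],
    [-1, -1, +1, +1, +1, +1, -1, -1],
    [-1, -1, -1, -1, -1, -1, -1, -1],
]).ravel()

w = -three          # the stated rule: -1 where the pixel is +1, +1 where it is -1

print(w @ three)    # -64: the template itself gives the extreme response
noisy = three.copy()
noisy[:3] *= -1     # flip three pixels
print(w @ noisy)    # -58: any other pattern scores strictly higher
```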