The Simple Perceptron


Page 1

The Simple Perceptron

Page 2

Artificial Neural Network

● Information processing architecture loosely modelled on the brain
● Consists of a large number of interconnected processing units (neurons)
● The units work in parallel to accomplish a global task
● Generally used to model relationships between inputs and outputs, or to find patterns in data

Page 3

Artificial Neural Network

● Three types of layers: input, hidden, and output

Page 4

Single Processing Unit

Page 5

Activation Functions

● A function which takes the total input and produces an output for the node, given some threshold
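As a concrete illustration of the unit and activation just described, here is a minimal sketch, assuming a ±1 step activation (the output convention the learning rule later in these slides uses); the function names and the NumPy usage are illustrative choices, not from the slides.

```python
import numpy as np

def step(u, threshold=0.0):
    """Threshold activation: +1 if the total input exceeds the
    threshold, -1 otherwise (the +/-1 convention used by the
    perceptron learning rule later in this deck)."""
    return 1 if u > threshold else -1

def unit_output(w, x):
    """A single processing unit: weighted sum of the inputs,
    passed through the activation function."""
    return step(np.dot(w, x))
```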

Page 6

Network Structure

● Two main network structures (contrasted in the sketch after this list):

1. Feed-Forward Network
2. Recurrent Network
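To make the contrast concrete, a minimal sketch; the tanh activation, the function names, and the matrix shapes are assumptions of mine, not from the slides. In a feed-forward network, activations flow strictly from input to output; a recurrent network feeds its own state back into itself at every step.

```python
import numpy as np

def feedforward(weights, x):
    """Feed-forward: activations flow strictly input -> output."""
    for W in weights:              # one weight matrix per layer
        x = np.tanh(W @ x)         # tanh is a stand-in activation here
    return x

def recurrent_step(W_in, W_rec, x_t, h):
    """Recurrent: the hidden state h from the previous step is fed back."""
    return np.tanh(W_in @ x_t + W_rec @ h)
```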


Page 8

Learning Paradigms

● Supervised Learning: given training data consisting of input/output pairs, find a function which correctly matches them
● Unsupervised Learning: given a data set, the network finds patterns and categorizes the data into groups
● Reinforcement Learning: no data given; the agent interacts with the environment, calculating the cost of its actions

Page 9

Simple Perceptron

● The perceptron is a single-layer feed-forward neural network

Page 10

Simple Perceptron

● Simplest output function: output +1 if w⋅x > 0, and −1 otherwise
● Used to classify patterns said to be linearly separable

Page 11

Linearly Separable

Page 12

Linearly Separable

● The bias is proportional to the offset of the plane from the origin
● The weights determine the slope of the line
● The weight vector is perpendicular to the plane
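These facts follow from a one-line rearrangement of the boundary equation; a small worked derivation in two dimensions, using the w0 = θ, x0 = 1 convention introduced on the later pages (the slide's figure is not reproduced here):

\[
w_0 + w_1 x_1 + w_2 x_2 = 0
\;\Longleftrightarrow\;
x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2}
\]

so the weights w1, w2 set the slope, the bias w0 sets the offset from the origin, and for any two points u, v on the boundary, subtracting their two boundary equations gives (w1, w2)⋅(u − v) = 0, i.e. the weight vector is perpendicular to the boundary.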

Page 13

Perceptron Learning Algorithm

● We want to train the perceptron to classify inputs correctly
● This is accomplished by adjusting the connecting weights and the bias
● It can only properly handle linearly separable sets

Page 14

Perceptron Learning Algorithm

● We have a “training set”: a set of input vectors used to train the perceptron
● During training both the wi and θ (the bias) are modified; for convenience, let w0 = θ and x0 = 1
● Let η, the learning rate, be a small positive number (small steps lessen the possibility of destroying correct classifications)
● Initialise the wi to some values

Page 15

Perceptron Learning Algorithm

1. Select a random sample from the training set as input
2. If the classification is correct, do nothing
3. If the classification is incorrect, modify the weight vector w using

   wi = wi + η d(n) xi(n)

Repeat this procedure until the entire training set is classified correctly.

Desired output: d(n) = +1 if x(n) ∈ set A, −1 if x(n) ∈ set B
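A runnable sketch of this procedure, under stated assumptions: the update rule and the w0 = θ, x0 = 1 convention come from the slides; the function name, the epoch cap, and the tie-breaking at w⋅x = 0 are illustrative choices.

```python
import numpy as np

def train_perceptron(X, d, eta=0.2, max_epochs=100, seed=0):
    """Perceptron learning: w_i = w_i + eta * d(n) * x_i(n) on mistakes.

    X : (N, m) input vectors; d : (N,) desired outputs, +1 for set A
    and -1 for set B.  A bias column x0 = 1 is prepended so that w[0]
    plays the role of the bias theta.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([np.ones((len(X), 1)), X])   # x0 = 1 handles the bias
    w = np.zeros(Xb.shape[1])                   # initialise the w_i
    for _ in range(max_epochs):                 # cap in case data isn't separable
        mistakes = 0
        for n in rng.permutation(len(Xb)):      # random samples from the set
            y = 1 if w @ Xb[n] > 0 else -1      # w.x > 0 -> class A, else B
            if y != d[n]:                       # incorrect classification:
                w += eta * d[n] * Xb[n]         #   modify the weight vector
                mistakes += 1
        if mistakes == 0:                       # entire set classified correctly
            break
    return w
```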

Page 16

Learning Example

Initial values:

η = 0.2
w = (0, 1, 0.5)ᵀ

The initial decision boundary is given by

0 = w0 + w1x1 + w2x2 = 0 + x1 + 0.5x2 ⇒ x2 = −2x1

Page 17

Learning Example

η = 0.2
w = (0, 1, 0.5)ᵀ

Sample: x1 = 1, x2 = 1, giving wᵀx > 0

Correct classification, no action

Page 18

Learning Example

η = 0.2
w = (0, 1, 0.5)ᵀ

Sample: x1 = 2, x2 = −2, giving wᵀx > 0 although the sample belongs to the negative class: incorrect classification, so

w0 = w0 − 0.2·1
w1 = w1 − 0.2·2
w2 = w2 − 0.2·(−2)

Page 19

Learning Example

η = 0.2
w = (−0.2, 0.6, 0.9)ᵀ

(the result of applying the update from the previous page)

w0 = w0 − 0.2·1
w1 = w1 − 0.2·2
w2 = w2 − 0.2·(−2)

Page 20

Learning Example

η = 0.2
w = (−0.2, 0.6, 0.9)ᵀ

Sample: x1 = −1, x2 = −1.5, giving wᵀx < 0

Correct classification, no action

Page 21

Learning Example

η = 0.2
w = (−0.2, 0.6, 0.9)ᵀ

Sample: x1 = −2, x2 = −1, giving wᵀx < 0

Correct classification, no action

Page 22

Learning Example

η = 0.2
w = (−0.2, 0.6, 0.9)ᵀ

Sample: x1 = −2, x2 = 1, giving wᵀx < 0 although the sample belongs to the positive class: incorrect classification, so

w0 = w0 + 0.2·1
w1 = w1 + 0.2·(−2)
w2 = w2 + 0.2·1

Page 23

Learning Example

η = 0.2
w = (0, 0.2, 1.1)ᵀ

(the result of applying the update from the previous page)

w0 = w0 + 0.2·1
w1 = w1 + 0.2·(−2)
w2 = w2 + 0.2·1

Page 24

Learning Example

η = 0.2
w = (0, 0.2, 1.1)ᵀ

Sample: x1 = 1.5, x2 = −0.5, giving wᵀx < 0 although the sample belongs to the positive class: incorrect classification, so

w0 = w0 + 0.2·1
w1 = w1 + 0.2·1.5
w2 = w2 + 0.2·(−0.5)

Page 25

Learning Example

η = 0.2
w = (0.2, 0.5, 1)ᵀ

(the result of applying the update from the previous page)

w0 = w0 + 0.2·1
w1 = w1 + 0.2·1.5
w2 = w2 + 0.2·(−0.5)
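As a sanity check on the walkthrough, the snippet below verifies that this final weight vector classifies all six samples correctly. The ±1 labels are an assumption, inferred from the sign of each update on the slides.

```python
import numpy as np

# The six samples from the walkthrough; the +/-1 labels are inferred
# from the sign of the updates on the slides (an assumption).
X = np.array([[1, 1], [2, -2], [-1, -1.5], [-2, -1], [-2, 1], [1.5, -0.5]])
d = np.array([1, -1, -1, -1, 1, 1])

w = np.array([0.2, 0.5, 1.0])                # final weights from the example
Xb = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1 for the bias
print(np.sign(Xb @ w) == d)                  # -> [True ... True]: all correct
```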

Page 26

Perceptron Convergence Theorem

● The theorem states that for any data set which is linearly separable, the perceptron learning rule is guaranteed to find a solution in a finite number of iterations.
● Idea behind the proof: find upper and lower bounds on the length of the weight vector, which together show that the number of iterations is finite.

Page 27

Perceptron Convergence Theorem

Let us assume that the input vectors come from two linearly separable classes C1 and C2.

Let T1 and T2 be the subsets of training vectors which belong to the classes C1 and C2 respectively. Then T1 ∪ T2 is the complete training set.

Page 28

Perceptron Convergence Theorem

As we have seen, the learning algorithm's purpose is to find a weight vector w such that

w⋅x > 0  ∀ x ∈ C1
w⋅x ≤ 0  ∀ x ∈ C2

(x is an input vector)

If the kth member of the training set, x(k), is correctly classified by the weight vector w(k) computed at the kth iteration of the algorithm, then we do not adjust the weight vector. However, if it is incorrectly classified, we use the modifier

w(k+1) = w(k) + η d(k) x(k)

Page 29

Perceptron Convergence Theorem

So we get

w(k+1) = w(k) − η x(k)  if w(k)⋅x(k) > 0, x(k) ∈ C2
w(k+1) = w(k) + η x(k)  if w(k)⋅x(k) ≤ 0, x(k) ∈ C1

We can set η = 1, since any other η > 0 just scales the vectors.

We can also set the initial condition w(0) = 0, as any non-zero value will still converge; it will just decrease or increase the number of iterations.

Page 30

Perceptron Convergence Theorem

Suppose that w(k)⋅x(k) < 0 for k = 1, 2, ..., where x(k) ∈ T1, so with an incorrect classification we get

w(k+1) = w(k) + x(k),  x(k) ∈ C1

By expanding iteratively, we get

w(k+1) = x(k) + w(k)
       = x(k) + x(k−1) + w(k−1)
       ⋮
       = x(k) + ... + x(1) + w(0)

Page 31

Perceptron Convergence Theorem

As we assume linear separability, there exists a solution w* such that w*⋅x(k) > 0 for x(1), ..., x(k) ∈ T1. Multiplying both sides by the solution w*, we get

w*⋅w(k+1) = w*⋅x(1) + ... + w*⋅x(k)

Each term on the right-hand side is > 0, hence each is ≥ α, where

α = min w*⋅x(k)  over x(k) ∈ T1

Thus we get

w*⋅w(k+1) ≥ kα

Page 32

Perceptron Convergence Theorem

Now we make use of the Cauchy–Schwarz inequality, which states that for any two vectors A, B

∥A∥² ∥B∥² ≥ (A⋅B)²

Applying this, we get

∥w*∥² ∥w(k+1)∥² ≥ (w*⋅w(k+1))²

From the previous slide we know w*⋅w(k+1) ≥ kα. Thus, it follows that

∥w(k+1)∥² ≥ k²α² / ∥w*∥²

Page 33

Perceptron Convergence Theorem

We continue the proof by going down another route. Take

w(j+1) = w(j) + x(j)  for j = 1, ..., k, with x(j) ∈ T1

Squaring the Euclidean norm on both sides gives

∥w(j+1)∥² = ∥w(j) + x(j)∥² = ∥w(j)∥² + ∥x(j)∥² + 2 w(j)⋅x(j)

Since x(j) was incorrectly classified, w(j)⋅x(j) < 0. Thus we get

∥w(j+1)∥² − ∥w(j)∥² ≤ ∥x(j)∥²

Page 34

Perceptron Convergence Theorem

Writing this inequality for all j and summing both sides:

∥w(k+1)∥² − ∥w(k)∥² ≤ ∥x(k)∥²
∥w(k)∥² − ∥w(k−1)∥² ≤ ∥x(k−1)∥²
⋮
∥w(1)∥² − ∥w(0)∥² ≤ ∥x(1)∥²

The left-hand side telescopes, and w(0) = 0, so we get

∥w(k+1)∥² ≤ ∑ ∥x(j)∥² (sum over j = 1, ..., k) ≤ kβ,  where β = max ∥x(j)∥²

Page 35

Perceptron Convergence Theorem

But now we have a conflict between the two bounds: for sufficiently large values of k,

∥w(k+1)∥² ≤ kβ  and  ∥w(k+1)∥² ≥ k²α² / ∥w*∥²

cannot both hold. So k cannot be larger than some value k_max for which the two are satisfied with equality:

k_max β = k_max² α² / ∥w*∥²  ⇒  k_max = β ∥w*∥² / α²
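A quick numeric sanity check of this bound, under assumptions: it reuses the six labelled samples from the learning example and takes that example's final weights as one candidate solution vector w* (any solution vector works; the value of the bound depends on which one is chosen).

```python
import numpy as np

# Samples and (assumed) labels from the learning example
X = np.array([[1, 1], [2, -2], [-1, -1.5], [-2, -1], [-2, 1], [1.5, -0.5]])
d = np.array([1, -1, -1, -1, 1, 1])

Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 for the bias
Xp = d[:, None] * Xb                        # flip C2 samples so all lie in T1

w_star = np.array([0.2, 0.5, 1.0])          # candidate w*: the example's final weights
alpha = np.min(Xp @ w_star)                 # alpha = min w*.x(k)  (positive here)
beta = np.max((Xp**2).sum(axis=1))          # beta  = max ||x(k)||^2

k_max = beta * (w_star @ w_star) / alpha**2
print(alpha, beta, k_max)                   # -> 0.2  9.0  290.25 mistake-updates at most
```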

Page 36

Perceptron Convergence Theorem

Thus it is proved that for η(k) = 1 for all k and w(0) = 0, given that a solution vector w* exists, the perceptron learning rule will terminate after at most k_max = β ∥w*∥² / α² iterations.

Page 37

The End