Description: Feature representation, Perceptron, Margin and Separability, Main Theorem.
Machine Learning for Language Technology Lecture 9: Perceptron
Marina Santini, Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials
Inputs and Outputs
Feature Representation
Features and Classes
Examples (i)
Examples (ii)
Block Feature Vectors
Representation
Linear classifiers (atomic classes)
• Assumption: the data must be linearly separable
Perceptron
Perceptron (i)
Perceptron Learning Algorithm
Separability and Margin (i)
Separability and Margin (ii)
• Given a training instance, let Ȳ_t be the set of incorrect labels, i.e. the set of all labels minus the correct label for that instance.
• Then we say that a training set is separable with a margin γ (gamma) if there exists a weight vector w with a fixed norm (i.e. ||w|| = 1) such that the score we get for the correct label when we use this vector w, minus the score of every incorrect label, is at least γ.
Separability and Margin (iii)
• IMPORTANT: for every training instance, the score we get for the correct label when we use the weight vector w, minus the score of every incorrect label, is at least a certain margin γ (gamma). That is, the margin γ is the smallest difference between the score of the correct class and the best score among the incorrect classes, as formalized below.
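In symbols (a reconstruction; the preview does not show the slide's exact notation, so f(x_t, y) for the joint feature vector and Ȳ_t for the set of incorrect labels are assumed):

```latex
% Separability with margin \gamma > 0: there exists a weight vector w with
% norm 1 such that, for every training instance (x_t, y_t) and every
% incorrect label \bar{y} \in \bar{Y}_t, the correct label wins by at least \gamma:
\exists\, \mathbf{w},\ \|\mathbf{w}\| = 1:\quad
\mathbf{w} \cdot \mathbf{f}(x_t, y_t) \;-\; \mathbf{w} \cdot \mathbf{f}(x_t, \bar{y}) \;\geq\; \gamma
```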
The higher the weights, the greater the norm, and we want this norm to be 1 (normalization).
There are different ways of measuring the length/magnitude of a vector, and they are known as norms. The Euclidean norm (or L2 norm) says: take all the values of the weight vector, square them, sum them up, then take the square root (see the small sketch below).
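A minimal illustration of the L2 norm in Python (NumPy is assumed; the vector values are made up for the example):

```python
import numpy as np

w = np.array([3.0, 4.0])                 # a made-up weight vector
l2 = np.sqrt(np.sum(w ** 2))             # square, sum, take the square root -> 5.0
print(np.isclose(l2, np.linalg.norm(w))) # np.linalg.norm computes the same L2 norm -> True
```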
Perceptron
Perceptron Learning Algorithm
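The algorithm body itself is not reproduced in this preview; what follows is a hedged sketch of the standard multiclass perceptron with a joint feature function, the variant this lecture builds on. The names `perceptron` and `feat` (standing for f(x, y)) are illustrative, not the slide's notation:

```python
import numpy as np

def perceptron(train, feat, labels, n_feats, epochs=10):
    """Multiclass perceptron sketch: predict the highest-scoring label;
    on a mistake, add f(x, y) and subtract f(x, y_hat)."""
    w = np.zeros(n_feats)
    for _ in range(epochs):
        mistakes = 0
        for x, y in train:
            # predict: argmax over all labels of w . f(x, y')
            y_hat = max(labels, key=lambda yy: w @ feat(x, yy))
            if y_hat != y:
                w += feat(x, y) - feat(x, y_hat)  # the perceptron update
                mistakes += 1
        if mistakes == 0:  # no updates in a full pass: training set separated
            break
    return w
```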
Main Theorem
Perceptron Theorem
• For any training set that is separable with some margin, we can prove that the number of mistakes made during training (if we keep iterating over the training set) is bounded by a quantity that depends on the size of the margin (see the proofs in the Appendix, slides of Lecture 3).
• R depends on the norm of the largest difference you can have between feature vectors. The larger R, the more spread out the data, and the more errors we can potentially make. Conversely, the larger the margin γ, the fewer mistakes we will make.
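The preview does not spell the bound out; the standard form of the perceptron mistake bound (often attributed to Novikoff), which matches the description above, is:

```latex
% With margin \gamma and
% R = \max_{t,\,\bar{y}} \| \mathbf{f}(x_t, y_t) - \mathbf{f}(x_t, \bar{y}) \|,
% the number of mistakes (weight updates) during training satisfies:
\text{mistakes} \;\leq\; \frac{R^2}{\gamma^2}
```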
Summary
Basically…
... if it is possible to find such a weight vector for some positive margin γ, then the training set is separable.
So... if the training set is separable, the Perceptron will eventually find a weight vector that separates the data. The time this takes depends on the properties of the data, but after a finite number of iterations the number of mistakes on the training set will drop to 0.
However... although we find a weight vector that perfectly separates the training data, it might be the case that the classifier does not generalize well (do you remember the difference between empirical error and generalization error?).
So, with the Perceptron, we have a fixed norm (= 1) and a variable margin (> 0).
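To make the convergence claim concrete, here is a toy run of the perceptron sketch given earlier; the data and the block feature function are invented for illustration:

```python
import numpy as np  # assumes perceptron() from the sketch above is in scope

# An invented block feature function: place the 2-dim input x in block y
# of a 4-dim joint feature vector f(x, y).
def feat(x, y):
    out = np.zeros(4)
    out[2 * y: 2 * y + 2] = x
    return out

# A tiny, linearly separable training set (made up for illustration).
train = [(np.array([1.0, 1.0]), 0), (np.array([-1.0, 2.0]), 1)]
w = perceptron(train, feat, labels=[0, 1], n_feats=4)

# After convergence, every training instance is classified correctly
# (empirical error 0) -- which by itself says nothing about generalization error.
assert all(max([0, 1], key=lambda yy: w @ feat(x, yy)) == y for x, y in train)
```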
Appendix: Proofs and Derivations
Recommended