Machine Learning: Connectionist
McCulloch-Pitts Neuron
Perceptrons
Multilayer Networks
Support Vector Machines
Feedback Networks
Hopfield Networks
Artificial Neuron
Input signals, x_i
Weights, w_i
Activation level, Σ w_i x_i
Threshold function, f
McCulloch-Pitts Neuron
Output is either +1 or -1.
Computes weighted sum of inputs.
If weighted sum >= 0, outputs +1, else -1.
Can be combined into networks (multilayer).
Not trained.
Computationally complete.
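The unit above can be sketched in a few lines of Python; the `mp_neuron` and `mp_and` names, and the particular weights, are illustrative assumptions showing how fixed weights plus a bias input compute AND over +1/-1 signals:

```python
# Minimal sketch of a McCulloch-Pitts neuron (names and weights are illustrative).
def mp_neuron(inputs, weights):
    """Return +1 if the weighted sum of the inputs is >= 0, else -1."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else -1

# AND of two +1/-1 inputs, using a third input fixed at +1 as a bias:
# the weighted sum x1 + x2 - 1 is >= 0 only when both inputs are +1.
def mp_and(x1, x2):
    return mp_neuron([x1, x2, 1], [1, 1, -1])
```

Networks of such units with suitably chosen weights can realize any Boolean function, which is what "computationally complete" refers to above.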
Perceptrons (Rosenblatt)
Similar to McCulloch-Pitts neuron
Single layer
Hard-limited threshold function: +1 if weighted sum >= t, -1 otherwise
Can use sign function if bias is included
Allows for supervised training (Perceptron Training Algorithm)
Perceptron Training Algorithm
Adjusts weights by using the difference between the
actual output and the expected output in a training
example.
Rule: Δw_i = c(d_i - O_i) x_i
c is the learning rate
d_i is the expected output
O_i is the computed output, sign(Σ w_i x_i).
Example: Matlab nnd4pr function
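The rule above can also be sketched as a training loop in Python; the function name, learning rate, and epoch count here are illustrative assumptions, with the bias folded in as a constant +1 input:

```python
# Sketch of the perceptron training rule Δw_i = c(d_i - O_i) x_i,
# with a bias handled as an extra constant +1 input (assumed setup).
def sign(s):
    return 1 if s >= 0 else -1

def train_perceptron(examples, c=0.1, epochs=50):
    """examples: list of (inputs, expected) pairs, expected in {+1, -1}."""
    n = len(examples[0][0]) + 1          # +1 weight for the bias input
    w = [0.0] * n
    for _ in range(epochs):
        for x, d in examples:
            xb = list(x) + [1.0]         # append the bias input
            o = sign(sum(wi * xi for wi, xi in zip(w, xb)))
            for i in range(n):           # Δw_i = c(d - o) x_i
                w[i] += c * (d - o) * xb[i]
    return w

# AND is linearly separable, so the rule converges; XOR is not, so it would not.
examples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w = train_perceptron(examples)
```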
Perceptron (Cont'd)
Simple training algorithm
Not computationally complete
Counter-example: XOR function
Requires problem to be linearly separable
Threshold function not continuous (continuity is needed for more
sophisticated training algorithms)
Generalized Delta Rule
Conducive to finer granularity in the error measurement
Form of gradient descent learning - consider the
error surface, the map of the error vs. the weights.
The rule takes a step closer to a local minimum by
following the gradient
Uses the learning parameter, c
Generalized Delta Rule (cont'd)
The threshold function must be continuous. We use
a sigmoid function, f(x) = 1/(1 + e^(-λx)), instead of
a hard-limit function. The sigmoid function is
continuous, but approximates the hard-limit function.
The rule is: Δw_k = c (d_i - O_i) f'(Σ w_i x_i) x_k
           = c (d_i - O_i) O_i (1 - O_i) x_k
Hill-climbing algorithm
c determines how much the weight changes in a single step
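One step of the rule for a single sigmoid unit can be sketched as below, assuming λ = 1 so that f'(net) = O(1 - O); the function names and default learning rate are illustrative:

```python
import math

# Sketch of one generalized-delta-rule step for a sigmoid unit.
def sigmoid(x, lam=1.0):
    """f(x) = 1 / (1 + e^(-λx)); lam = 1.0 assumed in the update below."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def delta_rule_step(w, x, d, c=0.5):
    """One update Δw_k = c (d - O) O (1 - O) x_k for a sigmoid neuron."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + c * (d - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]
```

Each step moves the weight vector a small distance down the gradient of the error surface, so the output moves toward the expected value.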
Multilayer Network
Since a single-layer perceptron network is not
computationally complete, we allow for a
multilayer network where the output of each layer is
the input for the next layer (except for the final
layer, the output layer). The first layer whose input
comes from the external source is the input layer.
All other layers are called hidden layers.
Training a ML Network
How can we train a multilayer network? Given a
training example, the output layer can be trained like
a single-layer network by comparing the expected
output to the actual output and adjusting the weights
on the lines going into the output layer
accordingly. But how can the hidden layers (and the
input layer) be trained?
Training an ML Network (cont'd)
The solution is to assign a certain amount of blame,
delta, to each neuron in a hidden layer (or the input
layer) based on its contribution to the total error.
The blame is used to adjust the weights. The blame
for a node in the hidden layer (or the input layer) is
calculated by using the blame values for the next
layer.
Backpropagation
To train a multilayer network we use the
backpropagation algorithm. First we run the
network on a training example. Then we compare
the expected output to the actual output to calculate
the error. The blame (delta) is attributed to the non-
output-layer nodes by working backward, from the
output layer to the input layer. Finally the blame is
used to adjust the weights on the connections.
Backpropagation (cont'd)
Δw_i = c * (d_i - O_i) * O_i * (1 - O_i) * x_k, for output nodes
Δw_i = c * O_i * (1 - O_i) * Σ_j(delta_j * w_ij) * x_k, for
hidden and input nodes
where delta_j = (d_j - O_j) * O_j * (1 - O_j) for output nodes, or
delta_j = O_j * (1 - O_j) * Σ_k(delta_k * w_jk) for hidden nodes
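One pass of the algorithm on a tiny 2-2-1 sigmoid network can be sketched as follows; the helper names, initial weights, and learning rate in the usage below are illustrative assumptions:

```python
import math

# Sketch of one backpropagation pass on a 2-2-1 network of sigmoid units.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(w_hidden, w_out, x, d, c=0.5):
    # Forward pass: hidden activations, then the single output.
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))

    # Blame for the output node: delta = (d - O) O (1 - O).
    delta_o = (d - o) * o * (1 - o)

    # Blame for each hidden node, from the next layer's blame:
    # delta_j = O_j (1 - O_j) * delta_o * w_j.
    delta_h = [hj * (1 - hj) * delta_o * wj for hj, wj in zip(h, w_out)]

    # Weight updates: Δw = c * delta * (input carried by that weight).
    new_w_out = [wj + c * delta_o * hj for wj, hj in zip(w_out, h)]
    new_w_hidden = [[w + c * dj * xi for w, xi in zip(ws, x)]
                    for ws, dj in zip(w_hidden, delta_h)]
    return new_w_hidden, new_w_out, o
```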
Example - NETtalk
NETtalk is a neural net for pronouncing English text.
The input consists of a sliding window of seven
characters. Each character may be one of 29 values
(26 letters, two punctuation chars, and a space), for
a total of 203 input lines.
There are 26 output lines (21 phonemes and 5 to
encode stress and syllable boundaries).
There is a single hidden layer of 80 units.
NETtalk (cont'd)
Uses backpropagation to train
Requires many passes through the training set
Results comparable to ID3 (60% correct)
The hidden layer serves to abstract information
from the input layer
Competitive Learning
Can be supervised or unsupervised, the latter
usually for clustering
In Winner-Take-All learning for classification, one
output node is considered the "winner." The weight
vector of the winner is adjusted to bring it closer to
the input vector that caused the win.
Kohonen Rule: Δw^t = c (X^(t-1) - W^(t-1))
Don't need to compute f(x); the weighted sum is sufficient
Kohonen Network
Can be used to learn prototypes
Inductive bias in terms of the number of prototypes
originally specified
Start with random prototypes
Essentially measures the distance between each
prototype and the data point to select the winner
Reinforces the winning node by moving it closer to
the input data
Self-organizing network
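The winner-take-all step described above can be sketched as follows; the function names, the learning rate, and the sample prototypes are illustrative assumptions:

```python
# Sketch of winner-take-all (Kohonen) learning: the prototype closest to
# the input wins and is moved toward that input by Δw = c (X - W).
def nearest(prototypes, x):
    """Index of the prototype with the smallest squared distance to x."""
    dists = [sum((pi - xi) ** 2 for pi, xi in zip(p, x)) for p in prototypes]
    return dists.index(min(dists))

def kohonen_step(prototypes, x, c=0.5):
    win = nearest(prototypes, x)
    # Reinforce only the winner: move it a fraction c toward the input.
    prototypes[win] = [w + c * (xi - w) for w, xi in zip(prototypes[win], x)]
    return win
```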
Support Vector Machines
Form of supervised competitive learning
Classifies data to be in one of two categories by
finding a hyperplane (determined by the support
vectors) between the positive and negative instances
Classifies elements by computing the distance from
a data point to the hyperplane, found as an optimization
problem
Requires training and linearly separable data;
otherwise, it doesn't converge.
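Once the hyperplane w·x + b = 0 has been found, classification reduces to the sign of the signed distance from a point to it; the sketch below assumes w and b come from a prior training phase, and the names are illustrative:

```python
import math

# Sketch of how a trained linear classifier of this kind labels points:
# the signed distance from x to the hyperplane w.x + b = 0 is
# (w.x + b) / ||w||, and the class is its sign.
def signed_distance(w, b, x):
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(w, b, x):
    return 1 if signed_distance(w, b, x) >= 0 else -1
```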