2
Last time
• Biologically realistic architecture
• Dynamic interactive behaviour
• Natural learning protocols
• Biologically-inspired associative memories
• Also… steps away from biologically realistic model
• Unsupervised learning
• Applications
Today
Recurrent neural nets: Attractor neural nets:
3
Recurrent Nets: Pros & Cons
• Biologically-realistic architecture/performance
• Complex self-sustained activity
• Distributed representations
• Dynamic interactions with environment
• Powerful computation• Noise tolerance• Graceful degradation
• Hard to formalise in information processing terms
• Hard to visualise activity• Hard to train with no
guarantee of convergence
• No guaranteed solution
Pros Cons
attractor neural nets are a special case of recurrent nets.
Pros
4
Associative Memory
The imprinting and recollection of memories is an important component of what we do & how we process information.
If we were to model these processes, here are a few conditions we might want to include in our model:
• Store and reliably recall multiple independent memories.• Given only partial input, recall complete information, or• Given noisy input, recall noise-free prototype information.• Learn new memories in a biologically realistic manner.• Recall memories fast enough (before next input is received) • Once recalled, maintain attention or memory long enough (for information processing & transmission elsewhere in brain).
5
Attractor Neural Nets
In some cases, trajectories in state-space can be guaranteed to lead to one of several stable states (fixed points, cycles or generic attractors).
In RNNs, the state of the system is dictated both by internal dynamics & the system response to inputs from the environment.
cross-section of energy landscape
state space
6
Attractor Neural Nets (cont.)
Dynamical systems such as RNNs could serve as models of associative memory if it was possible to encode each memory in a specific stable state or attractor.
In 1982, John Hopfield realised that by imposing a couple of restrictions on the architecture of the nets, he could guarantee the existence of attractors, such that every initial condition would necessarily evolve to a stable solution, where it would stay.
This is tantamount to the requirement that the above picture be described in terms of an energy landscape.
7
Attractor Neural Nets: Architecture
• No self-connectedness
• All connections are symmetric wij = wji wij
wji i
wii=0
j
“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backwards from biological realism’. The cleverness arises from the existence of an energy function.”*
* Hertz, Krogh & Palmer Introduction to the theory of neural computation (1990).
The existence of an energy function provides us with:• A formalism of the process of memory storage and recall• A tool to visualise the activity (both learning and recall)• A straightforward way to train the net• Once trained, a guaranteed solution (recall of the correct memory).
8
How Does It Work?Nodes are modelled by conventional binary MP neurons.
Each neuron serves both as an input and output unit. (There are no hidden units.)
States are given by the pattern of activity of the neurons (e.g. 101 for a network with three neurons). The number of neuron sets the maximum length for a bit-string of memory.
Different patterns can be simultaneously stored in the network. The number of independent patterns that can be remembered is less than or equal to the number of nodes.
Memory recall corresponds to a trajectory taking the system from some initial state (input) to the local energy minimum (closest association) .
Each step along the (recall) trajectory results in the same or a lower energy. Since energy is bounded from below, a solution is guaranteed for every input.
9
1 2 3 4 5 (neuron i)
5
4
3
2 1 0 1 -1 2 -3
1 0 3 -1 0
-1 3 0 1 -2
2 -1 1 0 1
-3 0 -2 1 0
Weight matrix
(ne
uro
n j)
0 1 0 0 0
A working example1 2 3 4 5
Input (t=0)
t=1
t=2
t=3
t=4
1
0 1 1 1 0
1 1 1 1 0
1 1 0 1
1 1 1 1 01 1 1 1 0
t
1 1 1 1 0
...-3 4 0 3 -5
2 2 4 0 -12 3 3 2 -4
1 0 3 -1 0∑ xi wij
i
Exercise: repeat this example with an initial input of [ 0 1 0 1 0 ].
threshold = 0
10
More general examples
Stability:
The stable pattern reached in the working example represents a fixed point in the dynamics.
While stable solutions are guaranteed, not all stable solutions are fixed point solutions.
State update rule:
This example used a “synchronous updating” method.
Asynchronous (sequential or random) updating methods can also be implemented.
11
Trajectories in energy landscape
1) neuron coding [0 1] [-1 1]
Where does energy come in? The formalism we need to answer this question comes from physics and requires slight modifications to our notation:
3) Asynchronous updateSpin Glass
2) Threshold now becomes a “sign” function:
sign(input) =x=1 if input>0old x if input=0x= -1 if input<0{
12
From energies to MP neurons
We have just re-discovered that the MP update rule exactly corresponds to magnetic field alignment in Spin Glasses!
xi sign(inputi)
This is called the mean field approximation: the “magnetic field” at each node (each spin) corresponds to a weighted average over all the fields generated by all other spins. When a specific node senses this field, it wants to align with the mean field, thus reducing its energy. This is the update rule:
Define the energy of node i as
inputi = ∑ xj wijj
Ei = - xi inputi
where inputi to node i is the weighted sum over
all neurons
13
• All connections are symmetric wij = wji
Attractor Neural NetsThe restrictions imposed on the recurrent net are now:
• No self-connectedness
wij
wji i
wii=0
j
“Gerard Toulouse has called Hopfield’s use of symmetric connections a ‘clever step backward from biological realism’. The cleverness arises from the existence of an energy function.”*
The existence of an energy function provides us with:• A formalism of the process of memory storage and recall• A tool to visualise the activity (both learning and recall)• A straightforward way to train the net• Once trained, a guaranteed solution (recall of the correct memory).
14
Training the Net
We need to find the set of weights that encode a single pattern p of length N bits as a minimum energy solution.
The minimum energy is obtained when the output of node i exactly matches the inputs to that node.
)p wsign( p j1j
iji
N
OK)pp p1
sign( p jj1j
ii
N
N
jiij pp1
wN
Plug in a guess for w:
so that
15
Training the Net (cont.)
Now generalising for M memories or patterns:
)(j
1
)(iij pp
1 w m
M
m
m
N
This weight assignment is remarkably reminiscent of Hebbian learning:
If two nodes are spiking at the same time, then the
weight connecting them is strengthened. Here anti-
correlated nodes result in negative (inhibitory) weights.
For gradual learning . Only patterns
introduced repeatedly will result in the formation of new
memories; noise will be ignored.
Generalised Hebb Rule
jiij pp1
wN
16
Does it work?
http://www.cs.tcd.ie/Padraig.Cunningham/applets/Hopfield/Hopfield.htmfrom http://suhep.phy.syr.edu/courses/modules/MM/SIM/Hopfield/ where no longer available.
This applet demonstrates: The distributed representationThe ability to perform powerful computationHigh storage capacity (7 100-bit patterns in 100 neurons)High fidelity and noise toleranceGraceful degradation (for more memories)
Eliminated features: Dynamic inputs in trainingIntrinsic background activity
17
Storage CapacityHow many memories can be stored in the network? To store M memories, each of length N bits, in a network of N neurons, we first ask how many stable patterns can be reached? In 1987, McEliece et al derived an upper limit for the number of memories that can be stored accurately: M = N/(2 logN).
e.g. for N = 100 neurons, M = 11 distinct memories, each 100 bits long can be faithfully stored and recalled. To write out these 11 distinct memories, would take 1100 bits!
In general, the coding efficiency of the network can be summarised as 2 log N neurons per pattern (each N bits long).This enormous capacity is paid for by a potentially lengthy recall process.
McEliece et al., (1987) IEEE Trans. Inf. Theor. IT-33:461-482.
18
Hopfield nets have obvious applications for any problem that can be posed in terms of optimisation in the sense of maximising or minimising some function, that can be likened to an energy function.
The distance matching problem:
The travelling salesman problem:
Applications
shortestlength
match pairsgiven points
given points find path shortest
19
What about the brain (pros)?Hopfield nets maintain some very attractive features from recurrent net architectures. However, the imposition of symmetric weights was a conscious move away from biological realism and toward engineering-like reliability.
In contrast, Hopfield nets seem more biologically realistic in disallowing self-connected neurons.
Hebbian-like learning is also a great appeal of Hopfield nets, capturing several important principles:
(1) unsupervised learning
(2) natural synaptic plasticity
(3) No necessary distinction between training & testing.
(4) robustness to details of training procedure
20
While we now have dynamics in training and in recall, we might still ask is this dynamics realistic in the brain?
1) In the memory recall stage: we consider inputs one at a time, waiting for the association to be made before proceeding to the next pattern. Is this how the brain works?
2) The aspiration of every Hopfield net is to arrive at a stable solution. Is this a realistic representation of association, or cognition in general in the brain? In other words, do we represent solutions to problems by relaxing to stable states of activity in the brain, or does the brain represent solutions according to very different, dynamic paradigms that handle continuous inputs and actively resist falling into the abyss of equilibrium?
What about the brain (cons)?
21
How memories are implanted in our brains remains an exciting research question.
While Hopfield nets no longer participate in this discourse, their formative role in shaping our intuition about associative memories remains admirable.
What about the brain? (cont.)
22
Next time…
Reading• John Hopfield (1982) “Neural Networks and Physical Systems with Emergent Collective Computational Properties”, Proc. Nat. Acad. Sci. 79: 2554-2588.
• A highly accessible introduction to the subject, incl. both non-technical and technical approaches can be found at:
www.shef.ac.uk/psychology/gurney/notes/contents.html
• Some food for thought: a popular article on CNN: “Researchers: It’s easy to plant false memories”, CNN.com, Feb 16, 2003.
• Final lecture about neural networks (for the time being)