Neural Networks Lecture 21: Hopfield Network Convergence
December 7, 2010

The Hopfield Network

The nodes of a Hopfield network can be updated synchronously or asynchronously.

Synchronous updating means that at time step (t+1) every neuron is updated based on the network state at time step t.

In asynchronous updating, a random node $k_1$ is picked and updated, then a random node $k_2$ is picked and updated (already using the new value of $k_1$), and so on.

The synchronous mode can be problematic because it may never lead to a stable network state.
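To make the two modes concrete, here is a minimal NumPy sketch; the function names and the sgn(0) = +1 convention are my own choices, not from the lecture:

```python
import numpy as np

def sgn(x):
    # Bipolar sign function; maps 0 to +1 (an assumed convention).
    return np.where(x >= 0, 1, -1)

def synchronous_step(W, x):
    # All neurons are updated at once from the state at time t.
    return sgn(W @ x)

def asynchronous_step(W, x, rng):
    # One randomly picked node is updated using the current state,
    # so subsequent picks already see its new value.
    k = rng.integers(len(x))
    x = x.copy()
    x[k] = sgn(W[k] @ x)
    return x
```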


Asynchronous Hopfield Network

Current network state O, attractors (stored patterns) X and Y:

[Figure: state space with the current network state O and the attractor patterns X and Y]


After the first update, this could happen:

[Figure: one possible position of O relative to X and Y after the first asynchronous update]


… or this:

[Figure: a different possible position of O relative to X and Y after the first asynchronous update]


Synchronous Hopfield Network

What happens for synchronous updating?

[Figure: the current network state O and the attractor patterns X and Y]


Something like the situation shown below. And then?

[Figure: the network state O after one synchronous update]


The network may oscillate between these two states forever.

[Figure: O alternating between two positions in state space]
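The oscillation is easy to reproduce. Below is a hypothetical two-node network with mutual inhibition, reusing the sketch functions from above; the weights are chosen purely for illustration, they are not from the lecture:

```python
# Two mutually inhibiting nodes (illustrative weights).
W = np.array([[0, -1],
              [-1, 0]])

x = np.array([1, 1])
for t in range(4):
    x = synchronous_step(W, x)
    print(t, x)          # alternates between [-1 -1] and [1 1] forever

rng = np.random.default_rng(0)
x = np.array([1, 1])
for t in range(10):
    x = asynchronous_step(W, x, rng)
print(x)                 # settles into a stable state, e.g. [-1  1]
```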


The Hopfield Network

The previous illustration shows that the synchronous updating rule may never lead to a stable network state.

However, is the asynchronous updating rule guaranteed to reach such a state within a finite number of iterations?

To find out about this, we have to characterize the effect of the network dynamics more precisely.

In order to do so, we need to introduce an energy function.


The Energy Function

Updating rule (as used in the textbook):

$$x_{p,k}(t+1) = \operatorname{sgn}\left( \sum_{j=1}^{n} w_{k,j}\, x_{p,j}(t) + I_{p,k}(t) \right)$$

Often,

$$I_{p,k}(t) = \begin{cases} \text{initial input}, & \text{if } t = 0 \\ 0, & \text{otherwise} \end{cases}$$
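As a sketch, this rule might be implemented as follows, with I passed as a vector that the caller sets to the initial input at t = 0 and to zero afterwards; the names are my assumptions:

```python
def asynchronous_update(W, x, I, rng):
    # x_k(t+1) = sgn( sum_j w_kj * x_j(t) + I_k(t) ) for one random node k.
    k = rng.integers(len(x))
    x = x.copy()
    x[k] = sgn(W[k] @ x + I[k])
    return x
```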


The Energy Function

Given the way we determine the weight matrix W (but also for iterative learning methods), we expect the weight from node j to node l to be proportional to

$$w_{l,j} \propto \sum_{p=1}^{P} i_{p,l}\, i_{p,j}$$

for P stored input patterns.

In other words, if two units are often active (+1) or inactive (-1) together in the given input patterns, we expect them to be connected by large, positive weights.

If one of them is active whenever the other one is not, we expect large, negative weights between them.
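A sketch of this Hebbian-style weight computation; the 1/P scaling and the zeroed diagonal (no self-connections) are common conventions I am assuming, not statements from the slide:

```python
def hebbian_weights(patterns):
    # patterns: array of shape (P, n) with bipolar (+1/-1) entries.
    # w_lj is proportional to sum_p i_pl * i_pj, so units that are often
    # active together get positive weights, anti-correlated units negative.
    P, _ = patterns.shape
    W = patterns.T @ patterns / P
    np.fill_diagonal(W, 0)  # assumed convention: no self-connections
    return W
```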


The Energy Function

Since the above formula applies to all weights in the network, we expect the following expression to be positive and large for each stored pattern (attractor pattern):

$$\sum_{l} \sum_{j} w_{l,j}\, i_{p,l}\, i_{p,j}$$

We would still expect a large, positive value for those input patterns that are very similar to any of the attractor patterns.

The lower the similarity, the lower the value of this expression that we expect to find.


The Energy Function

This motivates the following approach to an energy function, which we want to decrease with greater similarity of the network's current activation pattern to any of the attractor patterns (similar to the error function in the BPN):

$$-\sum_{l} \sum_{j} w_{l,j}\, x_l\, x_j$$

If the value of this expression is minimized (possibly by some form of gradient descent along activation patterns), the resulting activation pattern will be close to one of the attractors.


The Energy Function

However, we do not want the activation pattern to arbitrarily reach one of the attractor patterns.

Instead, we would like the final activation pattern to be the attractor that is most similar to the initial input to the network.

We can achieve this by adding a term that penalizes deviation of the current activation pattern from the initial input.

The resulting energy function has the following form:

$$E = -a \sum_{l} \sum_{j} w_{l,j}\, x_l\, x_j \;-\; b \sum_{l} I_l\, x_l$$
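In code, the energy of a state is a one-liner; the defaults a = 0.5 and b = 1 anticipate the choice made in the derivation below:

```python
def energy(W, x, I, a=0.5, b=1.0):
    # E = -a * sum_{l,j} w_lj x_l x_j  -  b * sum_l I_l x_l
    return -a * (x @ W @ x) - b * (I @ x)
```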


The Energy Function

How does this network energy change with every application of the asynchronous updating rule?

$$\Delta E(t) = E(t+1) - E(t)$$

$$\Delta E(t) = -a \sum_{l} \sum_{j} w_{l,j} \left[ x_l(t+1)\, x_j(t+1) - x_l(t)\, x_j(t) \right] - b \sum_{l} I_l \left[ x_l(t+1) - x_l(t) \right]$$

When updating node k, $x_j(t+1) = x_j(t)$ for every node $j \neq k$ (and the $l = j = k$ term vanishes because $x_k^2 = 1$ for bipolar states):

$$\Delta E(t) = -a \sum_{j \neq k} (w_{k,j} + w_{j,k})\, x_j(t) \left[ x_k(t+1) - x_k(t) \right] - b\, I_k \left[ x_k(t+1) - x_k(t) \right]$$

$$\Delta E(t) = -\left( a \sum_{j \neq k} (w_{k,j} + w_{j,k})\, x_j(t) + b\, I_k \right) \left( x_k(t+1) - x_k(t) \right)$$


The Energy Function

Since $w_{k,j} = w_{j,k}$, if we set a = 0.5 and b = 1 we get:

$$\Delta E(t) = -\left( \sum_{j \neq k} w_{k,j}\, x_j(t) + I_k \right) \left( x_k(t+1) - x_k(t) \right)$$

$$\Delta E(t) = -\operatorname{net}_k(t)\, \Delta x_k(t)$$

This means that in order to reduce energy, the k-th node should change its state if and only if

$$\operatorname{net}_k(t)\, \Delta x_k(t) > 0$$

In other words, the state of a node should change whenever it differs from the sign of the net input.
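A quick numeric sanity check of this identity, reusing the sketch functions above; the random patterns and the chosen node are illustrative only:

```python
rng = np.random.default_rng(1)
n = 8
patterns = sgn(rng.standard_normal((3, n)))
W = hebbian_weights(patterns)     # symmetric, zero diagonal
I = patterns[0].astype(float)     # external input: first stored pattern

x = sgn(rng.standard_normal(n))   # random initial state
k = rng.integers(n)
net_k = W[k] @ x + I[k]
x_new = x.copy()
x_new[k] = sgn(net_k)

dE = energy(W, x_new, I) - energy(W, x, I)
print(np.isclose(dE, -net_k * (x_new[k] - x[k])))  # True
```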


The Energy Function

And this is exactly what our asynchronous updating rule does!

Consequently, every update that changes a node's state reduces the network's energy.

By definition, every possible network state (activation pattern) is associated with a specific energy.

Since there is a finite number of states that the network can assume (2^n for an n-node network), and every state-changing update leads to a state of lower energy, there can only be a finite number of updates.
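This argument suggests a simple driver loop; a minimal sketch, assuming we stop once a full sweep over all nodes changes nothing:

```python
def run_until_stable(W, x, I, rng, max_sweeps=100):
    # Repeated asynchronous updates; by the energy argument above,
    # a stable state is reached after finitely many state changes.
    x = x.copy()
    for _ in range(max_sweeps):
        changed = False
        for k in rng.permutation(len(x)):
            new = sgn(W[k] @ x + I[k])
            if new != x[k]:
                x[k] = new
                changed = True
        if not changed:
            break    # a full sweep changed nothing: stable state
    return x
```

Started from a noisy version of a stored pattern, this typically returns that pattern, or a nearby local minimum of E.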


The Energy Function

Therefore, we have shown that the network reaches a stable state after a finite number of iterations.

This state is likely to be one of the network's stored patterns.

It is possible, however, that we get stuck in a local energy minimum and never reach the absolute minimum (just like in BPNs).

In that case, the final pattern will usually be very similar to one of the attractors, but not identical.