
STATISTICAL MECHANICS OF NEURAL NETWORKS

Studies of disordered systems have generated new insights into the cooperative behavior and emergent computational properties of large, highly connected networks of simple, neuron-like processors.

Haim Sompolinsky

Haim Sompolinsky is a professor of physics at the Racah Institute of Physics of the Hebrew University of Jerusalem.


A neural network is a large, highly interconnected assembly of simple elements. The elements, called neurons, are usually two-state devices that switch from one state to the other when their input exceeds a specific threshold value. In this respect the elements resemble biological neurons, which fire—that is, send a voltage pulse down their axons—when the sum of the inputs from their synapses exceeds a "firing" threshold. Neural networks therefore serve as models for studies of cooperative behavior and computational properties of the sort exhibited by the nervous system.

Neural network models are admittedly gross oversimplifications of biology. But these simple models are accessible to systematic investigations and may therefore shed light on the principles underlying "computation" in biological systems and on how those principles differ from the ones that we have so successfully mastered in building digital computers. In addition, psychologists use neural networks as conceptual models for understanding cognitive processes in the human mind. For theoretical physicists, understanding the dynamical properties of large, strongly coupled nonequilibrium systems such as neural networks is a challenging problem in its own right.

Attempts to model the working of the brain with networks of simple, formal neurons date back to 1943, when Warren McCulloch and Walter Pitts proposed networks of two-state threshold elements that are capable of performing arbitrary logic operations.1 In 1949, Donald Hebb, the psychologist, proposed that neural systems can learn and form associations through selective modifications of the synaptic connections.2 Several adaptive networks, so called because they could learn to perform simple recognition tasks, were studied in the 1960s. These included Frank Rosenblatt's feedforward network, called the perceptron,3 and Bernard Widrow's adaptive linear machine, the Adaline.4 A variety of network models for associative memory and pattern recognition have been investigated over the past two decades by several research groups, including those of Shun-ichi Amari,5 Stephen Grossberg6 and Teuvo Kohonen.7 Physicists' interest in neural networks stems largely from the analogy between such networks and simple magnetic systems. The analogy was first pointed out in 1974 by William Little.8 Recently, activity in this direction was stimulated by the work of


Network architectures. a: A feedforward system with three layers. b: A neural circuit. A circuit contains feedback loops, such as the directed graph 2→3→5→6→2, that close on themselves. This closure gives rise to recurrent activity in the network. Figure 1

John Hopfield, who pointed out the equivalence between the long-time behavior of networks with symmetric connections and equilibrium properties of magnetic systems such as spin glasses.9 In particular, Hopfield showed how one could exploit this equivalence to "design" neural circuits for associative memory and other computational tasks.

Spin glasses are magnetic systems with randomly distributed ferromagnetic and antiferromagnetic interactions. The low-temperature phase of these systems—the spin glass phase—is in many ways a prototype for condensation in disordered systems with conflicting constraints. Theoretical studies have revealed that in spin glasses with long-range interactions between the spins, the energy surface (the energy as a function of the system's state, or spin configuration) has a rich topology, with many local minima very close in energy to the actual ground state.10 (See the article by Daniel S. Fisher, Geoffrey M. Grinstein and Anil Khurana on page 56.)

Neural systems share several features with long-range spin glasses. (I will use the term "neural systems" for assemblies of real neurons.) The spatial configuration of the two systems bears no resemblance to the crystalline order of pure magnets or solids. The couplings between spins in spin glasses can be both positive and negative, which is also true of the couplings between neurons. And just as each spin in a long-range spin glass is connected to many others, so is each neuron in most neural systems. For example, each neuron in the cortex is typically connected to about 10⁴ neurons.11

Of course, the analogy between long-range spin glasses and neural systems is far from perfect. First, the connections in neural systems, unlike those in spin glasses, are not distributed at random, but possess correlations that are formed both genetically and in the course of learning and adaptation. These correlations alter the dynamical behavior of the system and endow it with useful computational properties. Another major difference is the asymmetry of the connections: The pairwise interactions between neurons are, in general, not reciprocally symmetric; hence their dynamic properties may be very different from those of equilibrium magnetic systems, in which the pairwise interactions are symmetric.

In this article I will describe how the concepts and tools of theoretical physics are being applied to the study of neural networks. As I have already indicated, the methods of equilibrium statistical mechanics have been particularly useful in the study of symmetric neural network models of associative memory. I will describe some of these models and discuss the interplay between randomness and correlations that determines a model's performance. The dynamics of asymmetric networks is much richer than that of symmetric ones and must be studied within the general framework of nonlinear dynamics. I will discuss some dynamical aspects of asymmetric networks and the computational potential of such networks. Learning—the process by which the network connections evolve under the influence of external inputs to meet new computational requirements—is a central problem of neural network theory. I will briefly discuss learning as a statistical mechanical problem. I will comment on the applications of neural networks to solving hard optimization problems. I will conclude with a few remarks about the relevance of neural network theory to the neurosciences.

Basic dynamics and architecture

I consider in this article neural network models in which the neurons are represented by simple, point-like elements that interact via pairwise couplings called synapses. The state of a neuron represents its level of activity. Neurons fire an "action potential" when their "membrane potential" exceeds a threshold, with a firing rate that depends on the magnitude of the membrane potential. If the membrane potential is below threshold, the neurons are in a quiescent state. The membrane potential of a neuron is assumed to be a linear sum of potentials induced by the activity of its neighbors. Thus the potential in excess of the threshold, which determines the activity, can be denoted by a local field

h_i(t) = \sum_{j \neq i} J_{ij} \frac{S_j(t) + 1}{2} - \theta_i    (1)

The synaptic efficacy J_ij measures the contribution of the activity of the jth, presynaptic neuron to the potential acting on the ith, postsynaptic neuron. The contribution


Errors per neuron increase discontinuously as T → 0 in the Hopfield model, signaling a complete loss of memory, when the parameter α = p/N exceeds the critical value 0.14. Here p is the number of random memories stored in a Hopfield network of N neurons. (Adapted from ref. 16.) Figure 2

is positive for excitatory synapses and negative for inhibitory ones. The activity of a neuron is represented by the variable S_i, which takes the value −1 in the quiescent state and +1 in the state of maximal firing rate. The value of the threshold potential of neuron i is denoted by θ_i.

In this model there is no clock that synchronizes updating of the states of the neurons. In addition, the dynamics should not be deterministic, because neural networks are expected to function also in the presence of stochastic noise. These features are incorporated by defining the network dynamics in analogy with the single-spin-flip Monte Carlo dynamics of Ising systems at a finite temperature.12 The probability that the updating neuron, which is chosen at random, is in the state, say, +1 at time t + dt is

P(h_i(t)) = \left[1 + \exp(-4\beta h_i)\right]^{-1}    (2)

where h_i is the local field at time t, and T = 1/β is the temperature of the network. In the Monte Carlo process, the probability of a neuron's being in, say, state +1 at time t + dt is compared with a random number and the neuron's state is switched to +1 if the probability is greater than that number. The temperature is a measure of the level of stochastic noise in the dynamics. In the limit of zero temperature, the dynamics consists of single-spin flips that align the neurons with their local fields, that is, S_i(t + dt) = sign(h_i(t)). The details of the dynamics and, in particular, the specific form of P(h) are largely a matter of convenience. Other models for the dynamics, including ones involving deterministic, continuous-time dynamics of analog networks, have also been studied.
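As a concrete illustration, here is a minimal Python sketch of one asynchronous update step built on equations 1 and 2 (the function names are mine, and any probability law that reduces to S_i = sign(h_i) at zero temperature would serve equally well):

```python
import numpy as np

def local_field(J, S, theta, i):
    """Local field of equation 1: h_i = sum_{j != i} J_ij (S_j + 1)/2 - theta_i.
    Assumes the diagonal of J is zero (no self-coupling)."""
    return J[i] @ ((S + 1) / 2) - theta[i]

def async_update(J, S, theta, beta, rng):
    """One asynchronous Monte Carlo step: pick a neuron at random and set it
    to +1 with the probability of equation 2, otherwise to -1."""
    i = rng.integers(len(S))
    h = local_field(J, S, theta, i)
    p_plus = 1.0 / (1.0 + np.exp(-4.0 * beta * h))
    S[i] = 1 if rng.random() < p_plus else -1
    return S
```

In the limit β → ∞ the probability becomes a step function and the update reduces to S_i = sign(h_i), the zero-temperature dynamics described in the text.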

The computational process in a neural network emerges from its dynamical evolution, that is, from flows in the system configuration space. The end products of this evolution, called attractors, are states or sets of states to which the system eventually converges. Attractors may consist of stable states, periodic orbits or the so-called strange attractors characteristic of chaotic behavior. Understanding the dynamics of the system involves knowing the nature of its attractors, as well as their basins of attraction and the time the system takes to converge to the attractors. In stochastic systems such as ours, one has to take into account the smearing of the trajectories by the stochastic noise.

The behavior of the network depends on the form of the connectivity matrix J_ij. Before specifying this matrix in detail, however, I will discuss the basic features of network architecture. Studies of neural networks have focused mainly on two architectures. One is the layered network (see figure 1a), in which the information flows forward, so that the computation is a mapping from the state of the input layer onto that of the output layer. The perceptron, consisting of only an input and an output layer, is a simple example of such a feedforward network. Although interest in the perceptron declined in the 1960s, interest in feedforward networks that contain hidden layers has revived in the last few years as new algorithms have been developed for the exploitation of these more powerful systems. The usefulness of multilayer networks for performing a variety of tasks, including associative memory and pattern recognition, is now being actively studied.13 As dynamical systems, feedforward networks are rather primitive: The input layer is held in a fixed configuration and all the neurons in each subsequent layer compute their states in parallel according to the states of the preceding layer at the previous time step. I will focus on a different class of network models, namely, networks that contain feedback loops. These networks I term neural circuits (see figure 1b). Many structures in the cortex show extensive feedback pathways, suggesting that feedback plays an important role in the dynamics as well as in the computational performance. Feedback loops are essential also for the function of nervous systems that control stereotypical behavioral patterns in animals.14

Besides their biological relevance, however, neural circuits are interesting because the long iterative dynamical processes generated via the feedback loops endow them with properties not obtained in layered networks of comparable sizes.

Symmetric circuits and Ising magnets

The dynamics may be rather complex for a general circuit of the type described above, but it is considerably simpler in symmetric circuits, in which the synaptic coefficients J_ij and J_ji are equal for each pair of neurons. In that case the dynamics is purely relaxational: There exists a function of the state of the system, the energy function, such that at T = 0 the value of this function always decreases as the system evolves in time. For a circuit of two-state neurons the energy function has the same form as the Hamiltonian of an Ising spin system:

E = -\frac{1}{2}\sum_{i \neq j} J_{ij} S_i S_j - \sum_i h_i^0 S_i    (3)

The first term represents the exchange energy mediated by pairwise interactions, which are equal in strength to the respective synaptic coefficients. The last term is the energy due to the interaction with external magnetic fields, which, in our case, are given by

h_i^0 = \sum_j J_{ij} - 2\theta_i

The existence of an energy function implies that at T = 0 the system flows always terminate at the local minima of E. These local minima are spin configurations in which every spin is aligned with its local field. At nonzero T, the notion of minima in configuration space is more subtle. Strictly speaking, thermal fluctuations will eventually carry any finite system out of the energy "valleys," leading to ergodic wandering of the trajectories. If the energy barriers surrounding a valley grow with the size of the system, however, the probability of escaping the valley may vanish in the thermodynamic limit, N → ∞, at low temperatures. In that case energy valleys become disjoint, or disconnected, on finite time scales, and one says that ergodicity is broken. Each of these finite-temperature valleys represents a distinct thermodynamic state, or phase.
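The Lyapunov property quoted above is easy to check numerically. The following sketch (my own construction, with an arbitrary symmetric random J) verifies that aligning one spin at a time with its local field never raises the energy of equation 3:

```python
import numpy as np

def energy(J, h0, S):
    """Energy of equation 3 for a symmetric J with zero diagonal."""
    return -0.5 * S @ J @ S - h0 @ S

rng = np.random.default_rng(0)
N = 50
J = rng.normal(size=(N, N)) / np.sqrt(N)
J = (J + J.T) / 2                       # enforce symmetry, J_ij = J_ji
np.fill_diagonal(J, 0.0)
h0 = 0.1 * rng.normal(size=N)
S = rng.choice([-1, 1], size=N)

for _ in range(20 * N):                 # zero-temperature single-spin flips
    i = rng.integers(N)
    E_before = energy(J, h0, S)
    S[i] = 1 if (J[i] @ S + h0[i]) >= 0 else -1   # align spin i with its field
    assert energy(J, h0, S) <= E_before + 1e-12    # E never increases
```

For an asymmetric J no such energy function is available in general, a point taken up later in the article.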

The analogy with magnetic systems provides a number of lessons that are useful in understanding the structure of the energy terrain and its implications for the dynamics of neural circuits. Ising ferromagnets with a constant positive value of J_ij are easily equilibrated even at relatively low temperatures. Their energy landscape has only two minima: one for each direction of the total magnetization. By contrast, a disordered magnet can rarely equilibrate to its low-temperature equilibrium state on a reasonable time scale. This is particularly true of spin glasses; their energy landscapes possess an enormous number of local minima, of higher energy than the ground state, that are surrounded by high-energy barriers. A spin glass usually gets stuck in one of these local minima and does not reach its equilibrium state when it is cooled to low temperatures.

I have already mentioned the presence of disorder and internal competition in many real neural assemblies. Indeed, most of the interesting neural circuits that have been studied are disordered and frustrated. (Frustration means that a system configuration cannot be found in which competing interactions are all satisfied.) Nevertheless there are applications of neural circuits in which the computational process is not as slow and painful as equilibrating a spin glass. Among important examples of such applications are models of associative memory.

Associative memory

Associative memory is the ability to retrieve stored information using as clues partial sets or corrupted sets of that information. In a neural circuit model for associative memory the information is stored in a special set of states of the network. Thus, in a network of N neurons a set of p memories is represented as p N-component vectors S^μ, μ = 1, ..., p. Each component S_i^μ takes the value +1 or −1 and represents a single bit of information. The models are based on two fundamental hypotheses. First, the information is stored in the values of the J_ij's. Second, recalling a memory is represented by the settling of the neurons into a persistent state that corresponds to that memory, implying that the states S^μ must be attractors of the network.

Associative memory is implemented in the present models in two dynamic modes. Information is stored in the learning mode. In this mode the p memories are presented to the system and the synaptic coefficients evolve according to the learning rules. These rules ensure that at the completion of the learning mode, the memory states will be attractors of the network dynamics. In symmetric networks the J_ij are designed so that the S^μ will be local minima of E. The stored memories are recalled

Phase diagram of the Hopfield model. The solid blue line marks the transition from the high-temperature ergodic phase to a spin glass phase, in which the states have negligible overlap with the memories. Memory phases, that is, valleys in configuration space that are close to the embedded memory states, appear below the solid red line. A first-order transition occurs along the dashed line; below this line the memory phases become the globally stable phases of the model. (Adapted from ref. 16.) Figure 3

associatively in the second, retrieval mode. In the language of magnetism, the J_ij's are quenched and the dynamic variables are the neurons.

In the retrieval mode, the system is presented with partial information about the desired memory. This puts the system in an initial state that is close in configuration space to that memory. Having "enough" information about the desired memory means that the initial state is in the basin of attraction of the valley corresponding to the memory, a condition that guarantees the network will evolve to the stable state that corresponds to that memory. (An illustration of the recall process is given in the figure on page 23 of this issue.)

Some of the simplest and most important learning paradigms are based on the mechanism suggested by Hebb.2 Hebb's hypothesis was that when neurons are active in a specific pattern, their activity induces changes in their synaptic coefficients in a manner that reinforces the stability of that pattern of activity. One variant of these Hebb rules is that the simultaneous firing of a pair of neurons i and j increases the value of J_ij, whereas if only one of them is active, the coupling between them weakens. Applying these rules to learning sessions in which the neural activity patterns are the states S^μ results in the following form for the synaptic strengths:

J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p} S_i^\mu S_j^\mu    (4)

This simple quadratic dependence of J_ij on S^μ is only one of many versions of the Hebb rules. Other versions have also been studied. They all share several attractive features. First, the changes in the J_ij's are local: They depend only on the activities in the memorized patterns of the presynaptic and postsynaptic neurons i and j. Second, a new pattern is learned in a single session, without the need for refreshing the old memories. Third, learning is unsupervised: It is performed without invoking the existence of a hypothetical teacher. Finally, regarding the plausibility of these learning rules occurring in biological systems, it is encouraging to note that Hebb-like plastic changes in synaptic strengths have been observed in recent years in some cortical preparations.15
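In code, the Hebb prescription of equation 4 is a one-line outer-product sum. The sketch below (the function names and the zeroed diagonal are my own conventions) also makes the "single session" property explicit: a new memory is added by a purely local increment, without revisiting the old ones.

```python
import numpy as np

def hebb_matrix(patterns):
    """Hebb rule of equation 4: J_ij = (1/N) sum_mu S_i^mu S_j^mu.
    `patterns` has shape (p, N) with entries +1 or -1."""
    p, N = patterns.shape
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)          # no self-coupling
    return J

def learn_one_more(J, new_pattern):
    """Single-session, local update: learning a new memory needs no access to the old ones."""
    J = J + np.outer(new_pattern, new_pattern) / len(new_pattern)
    np.fill_diagonal(J, 0.0)
    return J
```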

I turn now to the performance of a network for associative memory in the retrieval mode, assuming that the learning cycle has been completed and the J_ij's have reached a given static limit. The performance is characterized by several parameters. One is the capacity, that is, the maximum number of memories that can be simultaneously stabilized in the network. The associative nature of the recall is characterized by the sizes of the basins of attraction of the valleys corresponding to the memory states. These basins are limited in size by the presence of other, spurious attractors, which in the case of symmetric networks are local minima of E other than the memory states. The convergence times within the basins determine the speed of recall. Another important issue is the robustness of the network to the presence of noise or to failures of neurons and synapses. A theoretical analysis of most of these issues is virtually impossible unless one considers the thermodynamic limit, N → ∞, where answers can be derived from a statistical mechanical investigation. I discuss below two simple associative memory networks that are based on the Hebb rules.

The Hopfield model

The Hopfield model consists of a network of two-state neurons evolving according to the asynchronous dynamics described above. It has an energy function of the form given in equation 3, with symmetric connections given by the Hebb rule (equation 4) and h_i^0 = 0. The memories are assumed to be completely uncorrelated. They are therefore represented by quenched random vectors S^μ, each of whose components can take the values ±1 with equal probability.9

The statistical mechanical theory of the Hopfield model, derived by Daniel Amit, Hanoch Gutfreund and myself at the Hebrew University of Jerusalem, has provided revealing insight into the workings of that system and set the stage for quantitative studies of other models of associative memory.16 The theory characterizes the different phases of the system by the overlaps of the states within each phase with the memories, given by

M^\mu = \frac{1}{N}\sum_{i=1}^{N} S_i^\mu S_i    (5)

where S_i is the state of neuron i in that phase. All the M^μ's are of order 1/N^{1/2} for a state that is uncorrelated with the memories, whereas M^μ for, say, μ = 2 is of order unity if a state is strongly correlated with memory 2.

To understand why this model functions as associative memory, let us consider for a moment the case p = 1, when there is only a single memory. Obviously, the states S_i = S_i^1 and S_i = −S_i^1 are the ground states of E, since in these states every bond energy −J_ij S_i S_j has its minimum possible value, −1/N. Thus, even though the J_ij's are evenly distributed around zero, they are also spatially correlated so that all the bonds can be satisfied, exactly as happens in a pure ferromagnet. In a large system, adding a few more uncorrelated patterns will not change the global stability of the memories, since the energy contribution of the random interference between the patterns is small. This expectation is corroborated by our theory. As long as p remains finite as N → ∞, the network is unsaturated. There are 2p degenerate ground states, corresponding to the p memories and their spin-reversed configurations. Even for small values of p, however, the memories are not the only local minima. Other minima exist, associated with states that strongly "mix" several memories. These mixture states have a macroscopic value (that is, of order unity) for more than one component M^μ. As p increases, the number of the mixture states increases very rapidly with N. This decreases the memories' basins of attraction and eventually leads to their destabilization.

A surprising outcome of the theory is that despite the fact that the memory states become unstable if p > N/(2 ln N), the system provides useful associative memory even when p increases in proportion to N. The random noise generated by the overlaps among the patterns destabilizes them, but new stable states appear, close to the memories in the configuration space, as long as the ratio α = p/N is below a critical value α_c = 0.14. Memories can still be recalled for α < α_c, but a small fraction of the bits in the recall will be incorrect. The average fraction of incorrect bits ε, which is related to the overlap with the nearest memory by ε = (1 − M)/2, is plotted in figure 2. Note that ε rises discontinuously to 0.5 at α_c, which signals the "catastrophic" loss of memory that occurs when the number of stored memories exceeds the critical capacity. The strong nonlinearity and the abundance of feedback in the Hopfield model are the reasons for this behavior.
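A small simulation ties equations 4 and 5 together. The sketch below (the sizes, the 15% corruption level and the number of sweeps are arbitrary choices of mine) stores p random patterns with the Hebb rule, corrupts one of them, runs the zero-temperature asynchronous dynamics and reports the overlap of equation 5; for α = p/N well below 0.14 the overlap should typically come out close to 1, in line with figure 2.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 400, 20                          # alpha = p/N = 0.05, well below 0.14
patterns = rng.choice([-1, 1], size=(p, N))
J = patterns.T @ patterns / N           # Hebb rule, equation 4
np.fill_diagonal(J, 0.0)

S = patterns[0].copy()                  # start from memory 1 ...
flip = rng.choice(N, size=int(0.15 * N), replace=False)
S[flip] *= -1                           # ... with 15% of its bits corrupted

for _ in range(20 * N):                 # zero-temperature asynchronous updates
    i = rng.integers(N)
    S[i] = 1 if J[i] @ S >= 0 else -1

M = patterns[0] @ S / N                 # overlap of equation 5
print(f"overlap M = {M:.3f}, error fraction = {(1 - M) / 2:.4f}")
```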

Near saturation, when the ratio α is finite, the nature of the spurious states is different from that in the unsaturated case. Most of the states now are seemingly random configurations that are only slightly biased in the direction of the memories. The overlaps of these spurious states with the memories are all of order 1/N^{1/2}. Their statistical properties are similar to those of states in infinite-range spin glasses.

A very important feature of the Hopfield model is the appearance of distinct valleys near each of the memories, even at nonzero temperatures. This implies that the energy barriers surrounding these minima diverge with N, and it also indicates that the memories have substantial basins of attraction. The existence of memory phases characterized by large overlaps with the memory states



Associative memory circuit with a biologically plausible architecture. The circuit consists of two neural populations: excitatory neurons, which excite other neurons into the active state, and inhibitory ones, which inhibit the activity of other neurons. Information is encoded only in the (modifiable) connections between the excitatory neurons; the remaining excitatory and inhibitory synapses are fixed. The synaptic matrix for the circuit is asymmetric, and for appropriate values of the synaptic strengths and thresholds, the circuit's dynamics might converge to oscillatory orbits rather than to stable states. Figure 4

implies that small amounts of noise do not disrupt the performance of the system entirely but do increase the inaccuracy in the retrieval. The full phase diagram of the model at finite α and T is shown in figure 3. The diagram shows that accurate memory phases exist even when they are not the global minima of E. This feature distinguishes the model from those encountered in equilibrium statistical mechanics: For a system to be able to recall associatively, its memory states must be robust local minima having substantial basins of attraction, but they do not necessarily have to be the true equilibrium states of the system.

The Willshaw model

From a biological point of view, the Hopfield model has several important drawbacks. A basic feature of the model is the symmetry of the connections, whereas the synaptic connections in biological systems are usually asymmetric. (I will elaborate on this issue later.) Another characteristic built into the Hopfield network is the up-down, or S → −S, symmetry, which occurs naturally in magnetic systems but not in biology. This symmetry appears in the model in several aspects. For one, the external magnetic fields h_i^0 of equation 3 are set to zero, and this may require fine tuning of the values of the neuronal thresholds. More important, the memories have to be completely random for the model to work, implying that about half of the neurons are active in each of the memory states. By contrast, the observed levels of activity in the cortex17 are usually far below 50%. From the point of view of memory storage as well, there are advantages to sparse coding, in which only a small fraction of the bits are +1.

Another feature of the Hopfield model is that each neuron sends out about an equal number of inhibitory and excitatory connections, both having the same role in the storage of information. This should be contrasted with the fact that cortical neurons are in general either excitatory or inhibitory. (An excitatory neuron when active excites other neurons that receive synaptic input from it; an inhibitory neuron inhibits the activity of its neighbors.) Furthermore, the available experimental evidence for Hebb-type synaptic modifications in biological systems so far has demonstrated activity-dependent changes only of excitatory synapses.15

An example of a model that has interesting biological features is based on a proposal made by David Willshaw some 20 years ago.18 Willshaw's proposal can be implemented in a symmetric circuit of two-state neurons whose dynamics are governed by the energy

E = -\frac{1}{2}\sum_{i \neq j} J_{ij} S_i S_j    (6)

where

J_{ij} = \frac{1}{N}\left[\Theta\!\left(\sum_{\mu=1}^{p} \frac{(1 + S_i^\mu)(1 + S_j^\mu)}{4}\right) - 1\right]    (7)

where Θ(x) is 0 for x = 0 and 1 for x > 0. Thus the synapses in this model have two possible values. A J_ij is 0 if neurons i and j are simultaneously active in at least one of the memory states, and it is −1/N otherwise. The memories are random except that the average fraction of active neurons in each of the states S^μ is given by a parameter f that is assumed to be small. This model is suitable for storing patterns in which the fraction of active neurons is small, particularly when f → 0 in the thermodynamic limit. These sparsely coded memories are perfectly recalled as long as p < ln(Nf/ln N)/f². The capacity of the Willshaw model with sparse coding is far better than that of the Hopfield model. There are circuits for sparse coding that have a much greater capacity. For example, the capacity in some is given by p < N/[f(−ln f)].19
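A sketch of the clipped synaptic matrix of equation 7 as reconstructed above (the function name and the illustrative values of N, p and f are mine):

```python
import numpy as np

def willshaw_matrix(patterns):
    """Equation 7: J_ij = 0 if neurons i and j are both active (+1) in at least
    one stored pattern, and -1/N otherwise; only two synaptic values occur."""
    p, N = patterns.shape
    active = (patterns + 1) // 2                 # map -1/+1 activity to 0/1
    ever_coactive = (active.T @ active) > 0      # simultaneously active in some memory?
    J = np.where(ever_coactive, 0.0, -1.0 / N)
    np.fill_diagonal(J, 0.0)
    return J

rng = np.random.default_rng(2)
N, p, f = 1000, 30, 0.05                         # sparse coding: about 5% active bits
patterns = np.where(rng.random((p, N)) < f, 1, -1)
J = willshaw_matrix(patterns)
```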

The learning algorithm implicit in equation 7 is interesting in that it involves only enhancements of the excitatory components of the synaptic interactions. The inhibitory component is uniform, −1/N in equation 7, and its role is to suppress the activity of all neurons except those with the largest excitatory fields. This ensures that only the neurons that are "on" in the memory state are activated. Furthermore, the same effect can be achieved by a model in which the inhibitory synaptic components represent not direct synaptic interactions between the N neurons of the circuit but indirect interactions mediated by other inhibitory neurons. This architecture, which is illustrated in figure 4, is compatible with the known architecture of cortical structures. Finally, information is stored in this model in synapses that assume only two values. This is desirable because the analog depth of synaptic strengths in the cortex is believed to be rather small.

The two models discussed above serve as prototypes for a variety of neural networks that use some form of the Hebb rules in the design of the synaptic matrix. Most of these models share the thermodynamic features described above. In particular, at finite temperatures they already have distinct memory phases; near saturation, the memory states are corrupted by small fractions of erroneous bits, and spin glass states coexist with the memory states. Toward the end of this article I will discuss other learning strategies for associative memory.

Asymmetric synapses

The applicability of equilibrium statistical mechanics to the dynamics of neural networks depends on the condition that the synaptic connections are symmetric, that is, J_ij = J_ji. As I have already mentioned, real synaptic connections are seldom symmetric. Actually, quite often only one of the two bonds J_ij and J_ji is nonzero. I should also mention that the fully connected circuits, which have abundant feedback, and the purely feedforward, layered networks are two extreme idealizations of biological systems, most of which have both a definite direction of information flow and substantial feedback. Models of asymmetric circuits offer the possibility of studying such mixed architectures.

Asymmetric circuits have a rich repertoire of possible dynamic behaviors in addition to convergence onto a stable state. The "mismatch" in the response of a sequence of bonds when they are traversed in opposite directions gives rise, in general, to time-dependent attractors. This time dependence might propagate coherently along feedback loops, creating periodic or quasiperiodic orbits, or it might lead to chaotic trajectories characterized by continuous bands of spectral frequencies.

In asynchronous circuits, asymmetry plays a role in the dynamics in several respects. At T = 0, either the trajectories converge to stable fixed points, as they do in the symmetric case, or they wander chaotically in configuration space. Whether the trajectories end in stable fixed points or are chaotic is particularly important in models of associative memory, where stable states represent the results of the computational process. Suppose one dilutes a symmetric network of Hebbian synapses, such as that in equation 4, by cutting the directed bonds at random, leaving only a fraction c of the bonds. Asymmetry is generated at random in the cutting process because a bond J_ij may be set to 0 while the reverse bond J_ji is not. The result is an example of a network with spatially unstructured asymmetry. Often one can model unstructured asymmetry by adding spatially random asymmetric synaptic components to an otherwise symmetric circuit. In the above example, the resulting synaptic matrix may be regarded as the sum of a symmetric part, cJ_ij, which is the same as that in equation 4, and a random asymmetric part with a variance (pc(1 − c))^{1/2}/N.

Recent studies have shown that the trajectories of large two-state networks with random asymmetry are always chaotic, even at T = 0. This is because of the noise generated dynamically by the asymmetric synaptic inputs. Although finite randomly asymmetric systems may have some stable states, the time it takes to converge to those states grows exponentially with the size of the system, so that the states are inaccessible in finite time as N → ∞.20 This nonconvergence of the flows in large systems occurs as soon as the asymmetric contribution to the local fields, even though small in magnitude, is a finite fraction of the symmetric contribution as N → ∞. In the above example of cutting the directed bonds at random, the dilution affects the dynamics in the thermodynamic limit only if c is not greater, in order of magnitude, than p/N.
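The random dilution described above is easy to write down. The sketch below (a construction of mine following the text) cuts each directed bond of a Hebbian matrix independently, which almost surely leaves J_ij ≠ J_ji for some pairs and thus produces unstructured asymmetry:

```python
import numpy as np

def dilute(J, c, rng):
    """Keep each directed bond J_ij independently with probability c and set the
    rest to zero. Because J_ij and J_ji are cut independently, the diluted
    matrix is in general asymmetric even if J itself was symmetric."""
    keep = rng.random(J.shape) < c
    np.fill_diagonal(keep, False)
    return np.where(keep, J, 0.0)

# Example: dilute a symmetric Hebbian matrix built as in equation 4.
rng = np.random.default_rng(3)
patterns = rng.choice([-1, 1], size=(10, 200))
J = patterns.T @ patterns / 200
Jd = dilute(J, c=0.5, rng=rng)
print("still symmetric?", np.allclose(Jd, Jd.T))   # almost surely False
```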

The above discussion implies that the notion of encoding information in stable states to obtain associative memory is not valid in the presence of random asymmetry. When the strength of the random asymmetric component is reduced below a critical value, however, the dynamic flows break into separate chaotic trajectories that are confined to small regions around each of the memory states. The amount of information that can be retrieved from such a system depends on the amplitude of the chaotic fluctuations, as well as on the amount of time averaging that the external, "recalling," agent performs.

In contrast to the strictly random asymmetric synapses I discussed above, asymmetric circuits with appropriate correlations can have robust stable states at T = 0. For instance, the learning algorithms of equation 10 (see below) in general produce asymmetric synaptic matrices. Although the memory states are stable states of the dynamics of these circuits, the asymmetry does affect the circuits' performance in an interesting way. In regions of configuration space far from the memories the asymmetry generates an "effective" temperature that leads to nonstationary flows. When the system is in a state close to one of the memories, by contrast, the correlations induced by the learning procedure ensure that no such noise will appear. That the behavior near the attractors representing the memories is dynamically distinct from the behavior when no memories are being recalled yields several computational advantages. For instance, a failure to recall a memory is readily distinguished from a successful attempt by the persistence of fluctuations in the network activity.

Coherent temporal patterns

When studying asymmetric circuits at T > 0 it is useful to consider the phases of the system instead of individual trajectories. Phases of asymmetric circuits at finite temperatures are defined by the averages of dynamic quantities over the stochastic noise as t → ∞, where t is the time elapsed since the initial condition. This definition extends the notion of thermodynamic phases of symmetric circuits. The phases of symmetric circuits are always stationary, and the averages of dynamic quantities have a well-defined static limit. By contrast, asymmetric systems may exhibit phases that are time dependent even at nonzero temperatures.


Periodic behavior of an asymmetric stochastic neural circuit of N inhibitory and N excitatory neurons having the architecture shown in figure 4. In the simulation, all connections between excitatory neurons had equal magnitude, 1/N. The (excitatory) connections the excitatory neurons make with the inhibitory neurons had strengths 0.75/N. The inhibitory neurons had synapses only with the excitatory neurons, of strength 0.75/N. The "external fields" were set to 0 and the dynamics was stochastic, with the probability law given in equation 2. The neural circuit has stationary phases when T > 0.5, and nonstationary, periodic phases at lower temperatures. Results are shown for N = 200 at T = 0.3; the horizontal axis is time in Monte Carlo steps per neuron. The green curve shows the average activity of the excitatory population (that is, the activity summed over all excitatory neurons); the blue curve shows the corresponding result for the inhibitory population. The slight departure from perfect oscillations is a consequence of the finite size of the system. The instantaneous activity of individual neurons is not periodic but fluctuates with time, as shown here (orange) for one of the excitatory neurons in the circuit. Figure 5

The persistence of time dependence even after averaging over the stochastic noise is a cooperative effect. Often it can be described in terms of a few nonlinearly coupled collective modes, such as the overlaps of equation 5. Such time-dependent phases are either periodic or chaotic. The attractor in the chaotic case has a low dimensionality, like the attractors of dynamical systems with a few degrees of freedom. The time-dependent phases represent an organized, coherent temporal behavior of the system that can be harnessed to process temporal information. A phase characterized by periodic motion is an example of the coherent temporal behavior that asymmetric circuits may exhibit.

Figure 4 shows an example of an asymmetric circuit that exhibits coherent temporal behavior. For appropriate sets of parameters, such a circuit exhibits a bifurcation at a critical value of T, such that its behavior is stationary above T_c and periodic below it, as shown in figure 5. Although the activities of single cells are fairly random in such a circuit, the global activity—that is, the activity of a macroscopic part of the circuit—consists of coherent oscillations (see figure 5). The mechanism of oscillations in the system is quite simple: The activity of the excitatory neurons excites the inhibitory population, which then triggers a negative feedback that turns off the excitatory neurons causing its activity. Such a mechanism for the generation of periodic activity has been invoked to account for the existence of oscillations in many real nervous systems, including the cortex.
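For readers who want to experiment, here is a bare-bones stochastic simulation of the two-population architecture of figure 4, using the parameter magnitudes quoted in the caption of figure 5. The negative sign of the inhibitory couplings, the zero thresholds and the convention factors in equations 1 and 2 are my assumptions, so the temperature at which the oscillatory phase appears need not match the T = 0.5 quoted in the caption exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, sweeps = 200, 0.3, 400
beta = 1.0 / T

# Block-structured couplings; inhibitory synapses taken as negative (assumption).
J_EE = np.full((N, N), 1.0 / N);  np.fill_diagonal(J_EE, 0.0)
J_IE = np.full((N, N), 0.75 / N)          # excitatory -> inhibitory
J_EI = np.full((N, N), -0.75 / N)         # inhibitory -> excitatory
J_II = np.zeros((N, N))                   # inhibitory neurons do not couple to each other
J = np.block([[J_EE, J_EI], [J_IE, J_II]])

S = rng.choice([-1, 1], size=2 * N)
excitatory_activity = []
for sweep in range(sweeps):
    for _ in range(2 * N):                # one Monte Carlo sweep = 2N single-neuron updates
        i = rng.integers(2 * N)
        h = J[i] @ ((S + 1) / 2)          # local field with all thresholds set to zero
        S[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-4.0 * beta * h)) else -1
    excitatory_activity.append(S[:N].mean())   # population-averaged activity, as in figure 5
```

Plotting excitatory_activity against the sweep number gives, under the stated assumptions, the analog of the green curve in figure 5.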

In general, the dynamical behavior should be more complex than the simple oscillations described above for it to be useful for interesting "computations." As in the case of static patterns, appropriate learning rules must be used to make sure that the complex dynamical patterns represent the desired computational properties. An interesting and important example of such learning rules are those used for temporal association, in which the system has to reconstruct associatively a temporally ordered sequence of memories. Asymmetric circuits can represent such a computation if their flows can be organized as a temporally ordered sequence of rapid transitions between quasistable states that represent the individual memories. One can generate and organize such dynamical patterns by introducing time delays into the synaptic responses.

In a simple model of temporal association the synaptic matrix is assumed to consist of two parts. One is the symmetric Hebb matrix of equation 4, built from synapses with a short response time. The quick response ensures that the patterns S^μ are stable for short periods of time. The other component encodes the temporal order of the memories according to the equation

J'_{ij} = \frac{1}{N}\sum_{\mu=1}^{p-1} S_i^{\mu+1} S_j^\mu    (8)

where the index μ denotes the temporal order. The synaptic elements of this second component have a delayed response so that they do not disrupt the recall of the memories completely but induce transitions from the quasistable state S^μ to S^{μ+1}. The composite local fields at time t are

h_i(t) = \sum_j \left[ J_{ij} S_j(t) + \lambda J'_{ij} S_j(t - \tau) \right]    (9)

where τ is the delay time and λ denotes the relative strength of the delayed synaptic input. If λ is smaller than a critical value λ_c, of order 1, then all the memories remain stable. However, when λ > λ_c, the system will stay in each memory only for a time of order τ and then be driven by the delayed inputs to the next memory. The flow will terminate at the last memory in the sequence. If the memories are organized cyclically, that is, if S^p = S^1, then, starting from a state close to one of the memories, the system will exhibit a periodic motion, passing through all the memories in each period. The same principle can be used to embed several sequential or periodic flows in a single network. It should be noted that the sharp delay used in equation 9 is not unique. A similar effect can be achieved by integrating the presynaptic activity over a finite integration time T.
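The sketch below strings equations 4, 8 and 9 together (the pattern count, λ, τ and the synchronous update used for brevity are my own choices; the article's circuits update asynchronously). Started at the first memory, the state should dwell for roughly τ sweeps near each pattern and then hop to the next, which can be watched through the overlaps of equation 5:

```python
import numpy as np

rng = np.random.default_rng(5)
N, p, lam, tau = 300, 5, 2.0, 8                 # lam > lambda_c (of order 1), delay in sweeps
patterns = rng.choice([-1, 1], size=(p, N))

J_sym = patterns.T @ patterns / N               # equation 4 (fast synapses)
np.fill_diagonal(J_sym, 0.0)
J_seq = patterns[1:].T @ patterns[:-1] / N      # equation 8 (delayed synapses, mu -> mu+1)

S = patterns[0].copy()
history = [S.copy()]
for sweep in range(3 * p * tau):
    S_delayed = history[max(0, len(history) - 1 - tau)]     # state tau sweeps ago
    h = J_sym @ S + lam * (J_seq @ S_delayed)               # equation 9, zero temperature
    S = np.where(h >= 0, 1, -1)                             # synchronous update for brevity
    history.append(S.copy())
    if sweep % tau == 0:
        print(sweep, np.round(patterns @ S / N, 2))         # overlaps with all memories
```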

Circuits similar to those described above have been proposed as models for neural control of rhythmic motor outputs.14 Synapses with different response times can also be used to form networks that recognize and classify temporally ordered inputs, such as speech signals. Whether biological systems use synapses with different time courses to process temporal information is questionable. Perhaps a more realistic possibility is that effective delays in the propagation of neural activity are achieved not by direct synaptic delays but by the interposition of additional neurons in the circuit.

Learning, or exploring the space of synapses

So far I have focused on the dynamics of the neurons and assumed that the synaptic connections and their strengths are fixed in time. I now discuss some aspects of the learning process, which determines the synaptic matrix. Learning is relatively simple in associative memory: The task is to organize the space of the states of the circuit in compact basins around the "exemplar" states that are known a priori. But in most perception and recognition tasks the relationship between the input (or initial) and the output (or final) states is more complex. Simple learning rules, such as the Hebb rules, are not known for these tasks. In some cases iterative error-correcting learning algorithms have been devised. Many of these algorithms can be formulated in terms of an energy function defined on the configuration space of the synaptic matrices. Synaptic strengths converge to the values needed for the desired computational capabilities when the energy function is minimized.

An illuminating example of this approach is its implementation for associative memory by the late Elizabeth Gardner and her coworkers in a series of important studies of neural network theory.19 Instead of using the Hebb rules, let us consider a learning mode in which the J_ij's are regarded as dynamical variables obeying a relaxational dynamics with an appropriate energy function. This energy is like a cost function that embodies the set of constraints the synaptic matrices must satisfy. Configurations of connections that satisfy all the constraints have zero energy. Otherwise, the energy is positive and its value is a measure of the violation of the constraints. An example of such a function is

E = \sum_{i=1}^{N} \sum_{\mu=1}^{p} V(h_i^\mu S_i^\mu)    (10)

where the h_i^μ's, defined as in equation 1, are the local fields of the memory state S^μ and are thus linear functions of the synaptic strengths.

Two interesting forms of V(x) are shown in figure 6. In one case the synaptic matrix has zero energy if the generated local fields obey the constraint h_i^μ S_i^μ > κ for all i and μ, where κ is a positive constant. For the particular case of κ = 0 the condition reduces to the requirement that all the memories be stable states of the neural dynamics. The other case represents the more stringent requirement h_i^μ S_i^μ = κ, which means that not only are the memories stable but they also generate local fields with a given magnitude κ. In both cases κ is defined using the normalization that the diagonal elements of the square of the synaptic matrix be unity.

One can use energy functions such as equation 10 in conjunction with an appropriate relaxational dynamics to construct interesting learning algorithms provided that the energy surface in the space of connections is not too rough. In the case of equation 10 there are no local minima of E besides the ground states for reasonable choices of V, such as the ones described above. Hence simple gradient-descent dynamics, in which each step decreases the energy function, is sufficient to guarantee convergence to the desired synaptic matrix, that is, one with zero E, if such a matrix exists. Indeed, such a gradient-descent dynamics with the V(x) as in the first of the two cases discussed above is similar to the perceptron learning algorithm;3 the dynamics with the second choice of V(x) is related to the Adaline learning algorithm.4
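As an illustration of the first case, here is a sketch of learning on a cost of the form of equation 10, with the hinge choice V(x) = (κ − x)Θ(κ − x) standing in for the V drawn in figure 6; that choice, the learning rate, the sizes and the per-row normalization used to define κ are my assumptions. Each update of a violated (i, μ) pair is a perceptron-like step, as the text notes.

```python
import numpy as np

rng = np.random.default_rng(6)
N, p, kappa, eta = 100, 50, 0.3, 0.05     # alpha = 0.5, well below the kappa-dependent capacity
patterns = rng.choice([-1, 1], size=(p, N))
J = np.zeros((N, N))

for epoch in range(1000):
    H = patterns @ J.T                              # local fields h_i^mu (thresholds omitted)
    stability = H * patterns                        # h_i^mu * S_i^mu for every (mu, i)
    row_norm = np.linalg.norm(J, axis=1) + 1e-12    # kappa is measured with normalized rows
    violated = (stability / row_norm) < kappa
    if not violated.any():
        break                                       # every constraint h_i^mu S_i^mu > kappa holds
    for mu, i in zip(*np.nonzero(violated)):
        J[i] += eta * patterns[mu, i] * patterns[mu]   # perceptron-like correction for row i
    np.fill_diagonal(J, 0.0)

print("constraints satisfied after", epoch, "epochs")
```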

However, energy functions that are currently used for learning in more complex computations are expected to have complicated surfaces with many local minima, at least in large networks. Hence the usefulness of applying them together with gradient-descent dynamics in large-scale problems is an important open problem.4,13


Capacity of a network of N neurons when random memories are stored by minimizing the energy function of equation 10. The lines mark the boundaries of the regions in the (κ, α) plane where synaptic matrices having zero energy exist for the two choices (shown in the inset) of the energy function in equation 10. In one case (red) the energy function constrains the local fields of the memory states to be bigger than some constant κ; in the other case (blue), the local fields are constrained to be equal to κ. (α is the ratio between the number p of stored memories and N. The capacity in the limit N → ∞ is shown.) The red line terminates at α = 2, implying that the maximum number of randomly chosen memories that can be embedded as stable states in a large network is p = 2N. Figure 6

Formulating the problem of learning in terms of energy functions provides a useful framework for its theoretical investigation. One can then use the powerful methods of equilibrium statistical mechanics to determine the number and the statistical properties of connection matrices satisfying the set of imposed constraints. For instance, one can calculate the maximum number of memories that can be embedded using function 10 for different values of κ. The results for random memories are shown in figure 6. Valuable information concerning the entropy and other properties of the solutions has also been derived.19

In spite of the great interest in learning strategies of the type described above, their usefulness as models for learning in biology is questionable. To implement function 10 using relaxational dynamics, for example, either all the patterns to be memorized must be presented simultaneously or, if they are presented successively, several, and often many, sessions of recycling through all of them are needed before they are learned. Furthermore, the separation in time between the learning phase and the computational, or recall, phase is artificial from a biological perspective. Obviously, understanding the principles of learning in biological systems remains one of the major challenges of the field.

Optimization using neural circuits

Several tasks in pattern recognition can be formulated as optimization problems, in which one searches for a state that is the global minimum of a cost function. In some interesting cases, the cost functions can be expressed as energy functions of the form of equation 3, with appropriate choices of the couplings J_ij and the fields h_i^0. In this formulation, the optimization task is equivalent to the problem of finding the ground state of a highly frustrated Ising system. The mapping of optimization problems onto statistical mechanical problems has stirred up considerable research activity in both computer science and statistical mechanics. Stochastic algorithms, known as simulated annealing, have been devised that mimic the annealing of physical systems by slow cooling.21 In addition, analytical methods from spin-glass theory have generated new results concerning the optimal values of cost functions and how these values depend on the size of the problem.10

Hopfield and David Tank have proposed the use of deterministic analog neural circuits for solving optimization problems.9 In analog circuits the state of a neuron is characterized by a continuous variable S_i, which can be thought of as analogous to the instantaneous firing rate of a real neuron. The dynamics of the circuits is given by dh_i/dt = −∂E/∂S_i, where h_i is the local input to the ith neuron. The energy E contains, in addition to the terms in equation 3, local terms that ensure that the outputs S_i are appropriate sigmoid functions of their inputs h_i. As in models of associative memory, computation is achieved—that is, the optimal solution is obtained—by a convergence of the dynamics to an energy minimum. However, in retrieving a memory one has partial information about the desired state, and this implies that the initial state is in the proximity of that state. In optimization problems one does not have a clue about the optimum configuration; one has to find the deepest valley starting from unbiased configurations. It is thus not surprising that using two-state circuits and the conventional zero-temperature single-spin-flip dynamics to solve these problems is as futile as attempting to equilibrate a spin glass after quenching it rapidly to a low temperature. On the other hand, simulations of the analog circuit equations on several optimization problems, including small sizes of the famous "traveling salesman" problem, yielded "good" solutions, typically in timescales on the order of a few time constants of the circuit. These solutions usually are not the optimal solutions but are much better than those obtained by simple discrete algorithms.
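A minimal sketch of such an analog relaxation, written against the energy of equation 3 rather than a full traveling-salesman cost (the leak term, the tanh input-output function and all parameter values are my choices for the unspecified "local terms" mentioned above):

```python
import numpy as np

def analog_relax(J, h0, beta=5.0, dt=0.05, steps=4000, rng=None):
    """Deterministic analog dynamics dh_i/dt = -dE/dS_i for the energy of
    equation 3 plus local terms, realized here as a leaky integration of the
    recurrent input with sigmoid outputs S_i = tanh(beta * h_i)."""
    rng = rng or np.random.default_rng()
    h = 0.01 * rng.normal(size=len(h0))      # small random initial inputs
    for _ in range(steps):
        S = np.tanh(beta * h)
        h += dt * (-h + J @ S + h0)          # leak + recurrent drive + bias
    return np.where(h >= 0, 1, -1)           # read out a two-state configuration

# Try it on a small frustrated, disordered instance (a stand-in for a hard cost).
rng = np.random.default_rng(7)
N = 60
J = rng.normal(size=(N, N)) / np.sqrt(N)
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)
S = analog_relax(J, np.zeros(N), rng=rng)
print("energy reached:", round(-0.5 * S @ J @ S, 2))
```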

What is the reason for the improved performance of the analog circuits? Obviously, there is nothing in the circuit's dynamics, which is the same as gradient descent, that prevents convergence to a local minimum. Apparently, the introduction of continuous degrees of freedom smooths the energy surface, thereby eliminating many of the shallow local minima. However, the use of "continuous" neurons is by itself unlikely to modify significantly the high energy barriers, which it takes changes in the states of many neurons to overcome. In light of this, one may question the advantage of using analog circuits to solve large-scale, hard optimization problems. From the point of view of biological computation, however, a relevant question is whether the less-than-optimal solutions that these networks find are nonetheless acceptable. Other crucial unresolved questions are how the performance of these networks scales with the size of the problem, and to what extent the performance depends on fine-tuning the circuit's parameters.

Neural network theory and biology

Interest in neural networks stems from practical as well as theoretical sources. Neural networks suggest novel architectures for computing devices and new methods for learning. However, the most important goal of neural network research is the advancement of the understanding of the nervous system. Whether neural networks of the types that are studied at present can compute anything better than conventional digital computers has yet to be shown. But they are certainly indispensable as theoretical frameworks for understanding the operation of real, large neural systems. The impact of neural network research on neuroscience has been marginal so far. This reflects, in part, the enormous gap between the present-day idealized models and the biological reality. It is also not clear to which level of organization in the nervous system these models apply. Should one consider the whole cortex, with its 10¹¹ or so neurons, as a giant neural network? Or is a single neuron perhaps a large network of many processing subunits?

Some physiological and anatomical considerations suggest that cortical subunits of sizes on the order of 1 mm³ and containing about 10⁵ neurons might be considered as relatively homogeneous, highly interconnected functional networks. Such a subunit, however, cannot be regarded as an isolated dynamical system. It functions as part of a larger system and is strongly influenced by inputs both from sensory stimuli and from other parts of the cortex. Dynamical aspects pose additional problems. For instance, persistent changes in firing activities during performance of short-term-memory tasks have been measured. This is consistent with the idea of computation by convergence to an attractor. However, the large fluctuations in the observed activities and their relatively low level are difficult to reconcile with simple-minded "convergence to a stable state." More generally, we lack criteria for distinguishing between functionally important biological constraints and those that can be neglected. This is particularly true for the dynamics. After all, the characteristic time of perception is in some cases about one-tenth of a second. This is only one hundred times the "microscopic" neural time constant, which is about 1 or 2 msec.

To make constructive bridges with experimental neurobiology, neural network theorists will have to focus more attention on architectural and dynamical features of specific biological systems. This undoubtedly will also give rise to new ideas about the dynamics of neural systems and the ways in which it may be cultivated to perform computations. In the near future neural network theories will hopefully make more predictions about biological systems that will be concrete, nontrivial and susceptible to experimental verification. Then the theorists will indeed be making a contribution to the unraveling of one of nature's biggest mysteries: the brain.

While preparing the article I enjoyed the kind hospitality of AT&T Bell Labs. I am indebted to M. Abeles, P. Hohenberg, D. Kleinfeld and N. Rubin for their valuable comments on the manuscript. My research on neural networks has been partially supported by the Fund for Basic Research, administered by the Israel Academy of Sciences and Humanities, and by the USA-Israel Binational Science Foundation.

References

1. W. S. McCulloch, W. A. Pitts, Bull. Math. Biophys. 5, 115 (1943).
2. D. O. Hebb, The Organization of Behavior, Wiley, New York (1949).
3. F. Rosenblatt, Principles of Neurodynamics, Spartan, Washington, D.C. (1961). M. Minsky, S. Papert, Perceptrons, MIT P., Cambridge, Mass. (1988).
4. B. Widrow, in Self-Organizing Systems, M. C. Yovits, G. T. Jacobi, G. D. Goldstein, eds., Spartan, Washington, D.C. (1962).
5. S. Amari, K. Maginu, Neural Networks 1, 63 (1988), and references therein.
6. S. Grossberg, Neural Networks 1, 17 (1988), and references therein.
7. T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag, Berlin (1984). T. Kohonen, Neural Networks 1, 3 (1988).
8. W. A. Little, Math. Biosci. 19, 101 (1974). W. A. Little, G. L. Shaw, Math. Biosci. 39, 281 (1978).
9. J. J. Hopfield, Proc. Natl. Acad. Sci. USA 79, 2554 (1982). J. J. Hopfield, D. W. Tank, Science 233, 625 (1986), and references therein.
10. M. Mezard, G. Parisi, M. A. Virasoro, Spin Glass Theory and Beyond, World Scientific, Singapore (1987). K. Binder, A. P. Young, Rev. Mod. Phys. 58, 801 (1986).
11. V. Braitenberg, in Brain Theory, G. Palm, A. Aertsen, eds., Springer-Verlag, Berlin (1986), p. 81.
12. K. Binder, ed., Applications of the Monte Carlo Method in Statistical Physics, Springer-Verlag, Berlin (1984).
13. D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing, MIT P., Cambridge, Mass. (1986).
14. D. Kleinfeld, H. Sompolinsky, Biophys. J., in press, and references therein.
15. S. R. Kelso, A. H. Ganong, T. H. Brown, Proc. Natl. Acad. Sci. USA 83, 5326 (1986). G. V. diPrisco, Prog. Neurobiol. 22, 89 (1984).
16. D. J. Amit, H. Gutfreund, H. Sompolinsky, Phys. Rev. A 32, 1007 (1985). D. J. Amit, H. Gutfreund, H. Sompolinsky, Ann. Phys. (N.Y.) 173, 30 (1987), and references therein.
17. M. Abeles, Local Cortical Circuits, Springer-Verlag, Berlin (1982).
18. D. J. Willshaw, O. P. Buneman, H. C. Longuet-Higgins, Nature 222, 960 (1969). A. Moopen, J. Lambe, P. Thakoor, IEEE Trans. Syst. Man Cybern. 17, 325 (1987).
19. E. Gardner, J. Phys. A 21, 257 (1988). A. D. Bruce, A. Canning, B. Forrest, E. Gardner, D. J. Wallace, in Neural Networks for Computing, J. S. Denker, ed., AIP, New York (1986), p. 65.
20. A. Crisanti, H. Sompolinsky, Phys. Rev. A 37, 4865 (1988).
21. S. Kirkpatrick, C. D. Gelatt Jr, M. P. Vecchi, Science 220, 671 (1983).
