
University of Pennsylvania
The Moore School of Electrical Engineering
200 South 33rd Street
Philadelphia, PA 19104-6390

OPTICAL COMPUTING BASED ON NEURONAL MODELS

Final Report

Prepared by

Nabil H. Farhat

For
Naval Research Laboratory
4555 Overlook Avenue, S.W.
Washington, D.C. 20375

Attn: Code 5709

Under Grant No. N00014-85-K-2036

May 1988

EO/MO Report No. 14
Period: 9/15/85 - 2/15/88


TABLE OF CONTENTS

                                                              Page

Distribution List ............................................. ii

1. Introduction ................................................ 1

2. Research Accomplishments .................................... 8

3. Conclusions ................................................ 11

4. References ................................................. 13

5. Publications and Other Activities .......................... 14

6. Appendices ................................................. 17

   I.   Optical Implementation of Associative Memory Based on
        Models of Neural Networks

   II.  Architectures for Opto-Electronic Analogs of
        Self-Organizing Neural Networks

   III. Opto-Electronic Analogs of Self-Programming Neural Nets:
        Architectures and Methodologies for Implementing
        Stochastic Learning by Simulated Annealing

   IV.  Phased-Array Antenna Pattern Synthesis by Simulated
        Annealing

   V.   Architectures and Methodologies for Self-Organization and
        Stochastic Learning in Opto-Electronic Analogs of Neural Nets

   VI.  Bimodal Stochastic Optical Learning Machine


REPORT DISTRIBUTION LIST

                                                            Copies

Naval Research Laboratory
4555 Overlook Ave., S.W.
Washington, D.C. 20375
Attn: Code 5709

ONRR
National Academy of Sciences
2100 Pennsylvania Ave., N.W.
Washington, D.C. 20037-3202

Director, Naval Research Laboratory                              6
4555 Overlook Ave., S.W.
Washington, D.C. 20375
Attn: Code 2627

Defense Technical Information Center                            12
Bldg. 5, Cameron Station
Alexandria, VA 22314

Program Management Office
1400 Wilson Blvd.
Arlington, VA 22209-2308
Attn: Dr. John Neff


OPTICAL COMPUTING BASED ON NEURONAL MODELS

1. INTRODUCTION

The ultimate goal of the research work carried out under this grant is

understanding the computational algorithms used by the nervous system and

development of systems that emulate, match, or surpass in their performance

the computational power of the biological brain. Tasks such as seeing, hearing,

touch, walking, and cognition are far too complex for existing sequential

digital computers. Therefore new architectures, hardware, and algorithms

modeled after neural circuits must be considered in order to deal with real-

world problems.

Neural net models and their analogs represent a new approach to

collective signal processing that is robust, fault tolerant and can be

extremely fast. These properties stem directly from the massive

interconnectivity of neurons (the logic elements) in the brain and their

ability to perform many-to-one mappings with varying degrees of nonlinearity

and to store information as weights of the links between them, i.e., their

synaptic interconnections, in a distributed non-localized manner. As a

result signal processing tasks such as nearest neighbor searches in

associative memory can be performed in time durations equal to a few time

constants of the decision making elements, the neurons, of the net. We note

that the switching time-constant of a biological neuron is of the order of a

few milliseconds. Artificial neurons (electronic or opto-electronic

decision making elements) can be made to switch a thousand to a million times

faster. Artificial neural nets can therefore be expected to function for

example as content addressable associative memory or to perform complex

computational tasks such as combinatorial optimization and minimization


which are encountered in self-organization and learning (self programming),

computational vision, imaging, inverse scattering, super-resolution and

automated recognition from partial (sketchy) information, extremely fast in
a time scale that exceeds by far the capability of even the most powerful
present day serial computer. For these reasons electronic and opto-

electronic analogs and implementations of neural nets are today attracting
considerable attention. The optics in the opto-electronic implementations

provide the needed parallelism and massive interconnectivity while the

decision making elements are realized electronically, heralding an ultimate
marriage of VLSI and optics. It should be kept in mind, however, that
research and advances in optical bistability devices (OBDs) and nonlinear
optics and optical materials also promise to furnish all-optical decision

making elements and eventually neural nets in which both the

interconnections and decision making are performed optically with the

electronics being used only for control and assessment of the state of the

net. The combination of optics and electronics and the potential for

exploiting advances in opto-electronic components and materials (for
example, nonvolatile spatial light modulators for realizing programmable
synaptic or interconnectivity masks (plasticity) and OBDs for decision

making) promise also that embodiments of neural nets can be compact and have

low power consumption. Such embodiments, being primarily analog, are
leading to a rekindling of interest in analog computation, whose development

has been curtailed by the explosive progress in digital computing.
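This kind of content-addressable recall, together with the outer-product (Hebbian) storage recipe discussed in the next paragraph, can be made concrete with a small numerical sketch. The Python fragment below is illustrative only: it models neither the optics nor the electronics, and the entity count, vector length, and corruption level are arbitrary choices.

    import numpy as np

    def store(patterns):
        """Hebbian (outer-product) connectivity matrix for bipolar (+1/-1) entities."""
        W = patterns.T @ patterns            # sum over stored entities of v v^T
        np.fill_diagonal(W, 0)               # no self-connections
        return W

    def recall(W, probe, max_iter=10):
        """Iterated weighted sum plus hard thresholding; the net settles to a stored entity."""
        s = probe.copy()
        for _ in range(max_iter):
            s_new = np.where(W @ s >= 0, 1, -1)
            if np.array_equal(s_new, s):     # a stable state (fixed point) has been reached
                break
            s = s_new
        return s

    rng = np.random.default_rng(0)
    V = rng.choice([-1, 1], size=(2, 20))    # two 20-element bipolar entities
    W = store(V)
    probe = V[0].copy()
    probe[:5] *= -1                          # corrupt a quarter of the probe (a "partial" input)
    print(np.array_equal(recall(W, probe), V[0]))

The few passes of the loop play the role of the few neuron time constants mentioned above; in the opto-electronic analogs the weighted sums are formed in parallel by the optics rather than by a matrix product in software.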

In associative memory applications, the strength of interconnection

between the "neurons" of the net is determined by the entities one wishes to

store in the net. Usually these entities need to be in the form of

uncorrelated binary representations of the original data. Specific storage


"recipes" based on a Hebbian model of learning (outer-product storage

algorithm), or variations thereof, are then employed to first calculate the
connectivity matrix and then set the weights of the links between neurons
accordingly. In this sense the memory is explicitly programmed, i.e., taught

what it should know and should be cognizant of. This mode of programming a

net is sometimes called hard learning. What is most intriguing, however, is

that neural nets can also be made to be self-organizing and learning i.e.,

to become self-programming (soft learning) through a process of automated

connectivity weight modification driven by the entities presented to them

for learning. This alleviates one of the major constraints of neural nets:

programming complexity, and makes them a much more attractive and powerful

tool for neuromorphic signal and knowledge processing. The combination of

neural nets, Boltzmann machines, and simulated annealing concepts with high

speed opto-electronic implementations promises, as demonstrated by research
carried out under this grant ([1], [2], and this report), to produce high-

speed artificial neural net processors with stochastic rules for decision

making and state update that can form their own internal representations

(connectivity weights) of outside world data they are presented with,

regardless of whether the data is correlated or not, in a manner very analogous

to the way the brain is believed to form its own symbolic representation of

reality. This is an exciting prospect and has far reaching implications for

smart sensing and recognition, and artificial intelligence as a whole. The

use of noise in stochastic learning can shed light on the way nature has

managed to turn noise present in biological neural nets to work to its

advantage, and makes stochastic learning, as opposed to deterministic
learning schemes, more biologically plausible. Such learning is
probabilistic in nature, aimed at capturing the probability density function


of the environmental representation the net is exposed to. Probabilistic

learning is therefore naturally compatible with real environmental

representations that are fuzzy in nature. Exploratory work at the

University of Pennsylvania is showing that optics can play an important role

in the implementation and speeding up of adaptive learning algorithms, such

as the simulated annealing and the error back-propagation algorithms, in

such self-organizing nets and can lead to their use in automated robust

recognition of entities the nets have had a chance to learn earlier either

with or without the aid of a teacher (supervised or unsupervised learning)

by repeated exposure to them when the net is in its learning mode. One can

envision modules of such self-teaching neural nets trained to recognize and

create symbols of certain features found in natural scenes, patterns or

other input signals. Such modules could be used collectively for higher

level processing where their output symbols are fused to form better or more

reliable interpretation or assessment of the environmental input. The

implication of this for autonomous systems are obvious but the achievement

of such scenarios requires further concerted research.

Learning in neural nets is not rote but involves generalization, i.e.

the net can recognize an input as a member of a class of entities it became

familiar with earlier, even though that specific input was not among those
shown to it. This property can be extremely useful for

accelerating teaching sessions in that one need not think of and present to

the learning net all possible associations it is supposed to recognize in

order to make it useful. This property, however, relegates a degree of

decision making to the net perhaps beyond what we are ordinarily accustomed

to in signal processing systems. Thorough understanding of learning

processes and how a network generalizes is therefore desired to alleviate


apprehensions and uncertainties stemming from the inclusion of "thinking
networks" in man-made systems that share the decision making process with
humans. Such understanding can be realized only through insights gained

normally by theoretical analysis and with software and hardware simulation

tools. Being highly nonlinear, neural nets (as for example in higher

cortical areas), and their models are often difficult to analyze. Numerical

simulation of neural nets, even relatively small multilayered self-
organizing nets, is proving to be computationally too intensive and
therefore unacceptably time consuming, which is hindering progress in the

field. It is for this reason that analog systems in which neural net

behavior can be modeled and studied dynamically at speeds that can be

several orders of magnitude faster than in numerical simulation are an

important component of our ongoing studies and the future research

directions stemming from it. This work is pointing towards neural nets as

nonlinear dynamical systems that are characterized by their phase space

behavior and concepts of attractors, chaos and fractal dimensions. This

will in our opinion provide an infusion of powerful concepts of

nonlinearity, collective behavior, and iterative processing into optical

processing and artificial neurodynamical systems.

Another intriguing promise of neural nets is their ability to store and

retrieve information in a sequential or cyclic manner where a chain of

entities can be stored and recalled in a hetero-associative sequential or

cyclic fashion. This can provide a crude but simple way for forming,

shaping, and controlling the limit cycle (trajectory) of a neural net in its

phase-space. This property, together with that of generalization mentioned
earlier, is important for work in pattern recognition in general and is

being intensively studied at our Electro-Optics and Microwave Optics


Laboratory in the context of distortionless radar target recognition, as
described in earlier work (see references listed in this report and in [1]
and [2]). The results of this work are expected to be general and would be
beneficial to active and passive machine vision.
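One standard construction for the sequential (cyclic) hetero-associative storage and recall mentioned above, sketched here numerically and not claimed to be the exact scheme used in the work cited, builds the weights from cross outer-products between consecutive entities, so that each thresholded update drives the state from one stored entity toward the next around the cycle; the sizes are illustrative.

    import numpy as np

    def cycle_weights(patterns):
        """W = sum over m of v^(m+1) (v^m)^T for a cyclic list of bipolar entities."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for m in range(len(patterns)):
            W += np.outer(patterns[(m + 1) % len(patterns)], patterns[m])
        return W

    def step(W, s):
        """One synchronous thresholded update of all neurons."""
        return np.where(W @ s >= 0, 1, -1)

    rng = np.random.default_rng(1)
    V = rng.choice([-1, 1], size=(3, 32))    # a chain of three bipolar entities
    W = cycle_weights(V)
    s = V[0]
    for k in range(6):                        # the state should visit V[1], V[2], V[0], ... in turn
        s = step(W, s)
        print(k + 1, [int(np.array_equal(s, v)) for v in V])

Iterating such a net traces out a limit cycle in its phase-space, the behavior referred to in the list of nonlinear-system features below.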

Being highly nonlinear, neural nets possess complex, rich phase-space
behavior that exhibits, or is in principle capable of exhibiting, the
following general features of nonlinear systems:

o Fixed points or limit points in phase-space that act as attractors

with prescribed basins of attraction that are formed explicitly

(hard learning) or implicitly (soft learning) by Hebbian based

rules.

o Fixed limit cycles or closed trajectories in phase-space that act

also as attractors.

o Fixed open trajectories that act as attractors.

o Modification and control of fixed limit points, limit cycles and

open trajectories by external and/or contextual input or by

adaptive thresholding.

o Bifurcation and chaotic behavior.

Neuromorphic signal and knowledge processing systems (whether optical,

electronic, or opto-electronic) must be able to draw upon and make use of

these features to achieve powerful signal processing functions. Such

functions include:

Machines that utilize active illumination to discern and perceive the environment or utilize natural scene illumination or emission for the same purpose.

"'°A" " -- -- '-"ii "-.=Im..= =ll hil ==-6'

Page 11: IUniversity - apps.dtic.mil · understanding the computational algorithms used by the nervous system and development of systems that emulate, match, or surpass in their performance

o Nearest neighbor searches

o Combinatorial optimization by minimization of cost-functions

o Solution of ill-posed problems of the kind encountered in

vision, remote sensing, and inverse scattering

" Feature extraction: self-organization, learning and self-

programing

o Generalization

o Sequential and cyclic retention and recall

o Higher order or more complex computations in phase-space (e.g., spoken language processing)

All the above are issues that provide motivation to our neural net and

opto-electronic implementation research, both current and future.

The ultimate realization of neuromorphic systems for wide use in signal

processing applications is not a trivial task. It requires vigorous

research and development in three primary areas: neuroscience - to increase

our understanding of the anatomical, physiological, and biochemical
properties and function of neural tissue (neural nets) in order to identify

those attributes that might help in their modeling and that can be usefully

applied in artificial systems; the study of opto-electronic architectures

and implementations, and vigorous device development based on advances in

linear and nonlinear optical materials for efficient implementation of

programmable synaptic weights (artificial plasticity) and sensitive optical

decision making elements capable of performing at lower thresholds than
present day devices. Thus synergism between a triad of research
activities: neuroscience; mathematical modeling and analysis coupled with
architectures, implementations, and programming; and materials research is


called for. Our future research in neurodynamics will continue to be

influenced by developments in these fields.

2. RESEARCH ACCOMPLISHMENTS

The first parts of our research program under this grant were concerned

with the modeling, simulation, and implementation of fully interconnected

neural networks and were reported upon in detail in two previous annual

reports [1],[2]. Work with fully interconnected nets and specifically a

comparison of inner product versus outer product schemes for associative

storage and recall (see Appendix III) revealed to us that one of the most

distinctive properties of neural nets worth considering in our
research efforts is self-organization and learning. Self-organization
(adaptivity) and learning seem to be what sets neural net processing apart

from other approaches to signal processing. Hence our efforts have since

been more concerned with learning and self-programmability in neural nets.

Learning requires layered nets in which one can clearly distinguish input,
output, and hidden (buffer) groups of neurons with prescribed communication

patterns among them. Such nets are hence non-fully interconnected. To this

end we have devised a scheme for partitioning existing opto-electronic

vector-matrix multiplier architectures into any number of desired layers

(see Appendices II and III). Learning requires plasticity, i.e., modifiable

weights of connections between neurons. In our work such plasticity is

achieved primarily through the use of programmable nonvolatile spatial light

modulators (SLMs) such as the magneto-optic SLM (MOSLM). We have devised a

(Nonvolatile ferroelectric liquid crystal SLMs can also be used; these, however, are not yet available commercially.)


new scheme for driving a commercially available 48x48 element MOSLM at a
frame refresh time of 1 msec, demonstrating thereby that synaptic
modification in a 24 binary neuron net constructed around this MOSLM can
take place, if desired, in a time period as short as 1 msec.
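The bookkeeping behind this layering can be sketched in a few lines. The fragment below is a numerical illustration only, assuming the three-group partition described in Appendix II, in which direct input-output links and links among the hidden units are blocked out; the group sizes used here are arbitrary.

    import numpy as np

    def partition_mask(n_in, n_out, n_hidden):
        """Boolean mask of allowed interconnections for a net split into V1, V2 and H."""
        n = n_in + n_out + n_hidden
        allowed = np.ones((n, n), dtype=bool)
        v1 = slice(0, n_in)                        # V1: input units
        v2 = slice(n_in, n_in + n_out)             # V2: output units
        h = slice(n_in + n_out, n)                 # H: hidden (buffer) units
        allowed[v1, v2] = allowed[v2, v1] = False  # no direct input <-> output links
        allowed[h, h] = False                      # hidden units do not interconnect
        return allowed

    mask = partition_mask(n_in=8, n_out=8, n_hidden=8)
    weights = np.random.uniform(-1.0, 1.0, mask.shape) * mask   # programmable links only
    print(int(mask.sum()), "of", mask.size, "links remain programmable")

In the opto-electronic implementation the blocked entries correspond to opaque regions of the interconnectivity mask written on the spatial light modulator.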

From the outset we have concentrated our effort on stochastic rather

than deterministic learning for several reasons. Stochastic or

probabilistic learning is more compatible with the uncertainty of most

environments in which learning neural nets are expected to operate. In

addition operation and study of stochastic neural nets could shed some light

on the way nature has harnessed noise present in biological neural nets to

work to its advantage. Learning in stochastic neural nets involves finding

the global minimum of an energy function associated with the network by

introducing uncertainty in the state update rule of neurons in the net.

Conventionally this is done by a simulated annealing algorithm in a context

of a Boltzmann machine that was devised to be carried out on a serial

computer and is frequently used in the solution of combinatorial

optimization problems. Software implementation of the process however is

time consuming. For example finding the optimal wiring layout for a typical

IC chip might take 24 hours on a mainframe computer. We have developed

therefore a method for accelerating the annealing time in an opto-electronic

stochastic neural net that can be several orders of magnitude faster than

serial digital methods. As a result, stochastic learning in such nets can

be speeded-up by the same factor. The method involves the use of noisy

thresholding of the neurons in the net, which introduces controlled shaking

of the energy landscape of the net and prevents the net from getting trapped

in a state of local energy minimum, improving thereby its chances of finding
the ground state (global minimum) or one close to it. By introducing the


noise in the net in bursts of decaying magnitude the chances of converging

onto a low-lying energy state and staying in it are enhanced considerably.

(Such controlled annealing profiles or annealing schedules are also useful

in stochastic learning with binary weights, to be described later.)

results of numerical and experimental simulations (see Appendix V) show that

the noisy thresholding scheme is quite effective. We used the noisy
thresholding scheme in a prototype network of 16 neurons with random bipolar

binary weights implemented in opto-electronic hardware. The results show

that the net can find the ground state in 35τ, where τ is the time constant
of the neurons in the net. This means that for a net with neurons of τ =
1 μsec response time the net can be annealed in 35 μsec, and this is

independent of the number of neurons in the net as noise in our scheme is

injected optically onto all neurons simultaneously by projecting a portion

of the snow pattern appearing on an open channel T.V. receiver onto the

photodetector array segment of the opto-electronic neural net.
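The effect of the noisy thresholding can be reproduced with a short simulation. The sketch below is purely numerical; the hardware injects optical noise from the TV snow pattern onto all photodetectors at once, whereas here pseudo-random noise is added to one randomly selected neuron's activation per update. The network size matches the 16-neuron experiment, but the noise schedule and constants are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 16
    W = rng.choice([-1, 1], size=(N, N))       # random bipolar binary weights
    W = np.triu(W, 1)
    W = W + W.T                                # symmetric, zero diagonal
    s = rng.choice([0, 1], size=N)             # unipolar binary neuron states

    def energy(W, s):
        """Hopfield-style energy of the current state."""
        return -0.5 * s @ W @ s

    for t in range(600):                       # asynchronous updates under decaying noise
        sigma = 4.0 * np.exp(-t / 150.0)       # noise amplitude decays: the annealing schedule
        i = rng.integers(N)
        activation = W[i] @ s + sigma * rng.normal()
        s[i] = 1 if activation > 0 else 0      # noisy thresholding decision
    print("final energy:", energy(W, s))

Repeating the decaying burst several times, as described above, further improves the odds of ending in, or near, the ground state.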

The pixel transmittance of the MOSLM mentioned earlier is binary. Most

known learning algorithms require small incremental changes in the

connection strengths between neurons of the net. This means multivalued

weights are necessary precluding the use of MOSLMs despite their highly

desirable nonvolatile nature (storage capability). Small incremental

changes in the weights are a requisite for convergence of the learning

algorithm. To overcome this limitation we have devised a method for

stochastic learning with binary weights. The method combines multiple

time-constant annealing bursts and dead-zone limiting as detailed in

Appendix VI.
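Appendix VI is not reproduced legibly in this copy, so the following sketch is only one plausible reading of how dead-zone limiting reconciles binary weights with the small increments that learning calls for: Boltzmann-style increments ΔW_ij = η(p_ij − p_ij′), the update rule quoted in Appendix II, are accumulated per link, and the stored binary weight is flipped only when the accumulated change leaves a dead zone around zero. The class name, threshold, and learning rate below are assumptions made for illustration.

    class BinaryWeight:
        """Hypothetical dead-zone limiter for a single bipolar binary (MOSLM-pixel) weight."""

        def __init__(self, dead_zone=0.4):
            self.value = 1            # stored binary weight: +1 or -1
            self.accum = 0.0          # accumulated analog increments
            self.dead_zone = dead_zone

        def update(self, p_clamped, p_free, eta=0.2):
            # Boltzmann-machine increment: eta * (p_ij with outputs clamped - p_ij running free)
            self.accum += eta * (p_clamped - p_free)
            # Flip the stored weight only when the accumulated evidence leaves the
            # dead zone and disagrees with the current sign; small noisy increments
            # therefore never touch the binary mask.
            if self.accum > self.dead_zone and self.value < 0:
                self.value, self.accum = 1, 0.0
            elif self.accum < -self.dead_zone and self.value > 0:
                self.value, self.accum = -1, 0.0
            return self.value

    w = BinaryWeight()
    for p_c, p_f in [(0.2, 0.6), (0.1, 0.7), (0.3, 0.9), (0.2, 0.8)]:
        print(w.update(p_c, p_f))     # stays +1 until the accumulated change crosses the dead zone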

Fast annealing by noisy thresholding and stochastic learning with

binary weights are significant developments that enabled successful


operation recently in our work of the first bimodal stochastic optical
learning machine, as detailed in Appendix VI. The machine consists of 24
unipolar binary neurons, a 24x24 bipolar binary connectivity mask implemented in

a 48x48 computer controlled MOSLM, and LED and photodetector arrays with

associated thresholding amplifiers and LED drivers for the neurons themselves.

Preliminary results of the learning capabilities show that the net can learn a

set of 3 associations in a time interval ranging between 10 and 60

minutes with relatively slow (60 msec) neurons. Preliminary results are shown

in Fig. 1. Slow neurons were chosen deliberately to permit visual observation

of the evolving state vector of the net as represented by the LED array of the

net during the various stages of learning.

3. CONCLUSIONS

Research effort under this grant has led to the demonstration of the

first stochastic opto-electronic learning machine employing fast annealing by

noisy thresholding and stochastic learning with binary weights. The prototype

machine of 24 neurons now operational in our laboratory provides a valuable

vehicle for studying the dynamics of stochastic neural nets. As such, the net

can be viewed as an opto-electronic analog computer that can perform iterative

mappings, do stochastic searches of the energy landscape, self-organize and

learn, and act as associative memory after learning is completed. We will

continue our studies of this and larger versions of the machine (a subject for

renewal proposal under preparation) in order to gain a better understanding of

the behavior of such machines as artificial neurodynamical systems and explore

a host of intriguing applications involving solution of combinatorial

optimization problems of the kind encountered in vision, remote sensing and

inverse scattering.


[Fig. 1]


4. REFERENCES

1. N. Farhat, "Optical Computing Based on the Hopfield Model for Neural Nets," University of Pennsylvania Report No. EO/MO 10, prepared for NRL, May 1986.

2. N. Farhat, "Optical Computing Based on Neuronal Models," University of Pennsylvania Report No. EO/MO 12, October 1987.


5. PUBLICATIONS AND OTHER ACTIVITIES

Following is a list of journal and conference proceeding publications describing research carried out under this grant:

1. D. Psaltis and N. Farhat, "A new approach to optical information processing based on the Hopfield Model," in Digest of the Thirteenth Congress of the International Commission on Optics (ICO-13), Sapporo, Japan (1984), p. 24. (pre-grant publication).

2. D. Psaltis and N. Farhat, "Optical information processing based on an associative-memory model of neural nets with thresholding and feedback," Opt. Lett., 10, 98 (1985).

3. N.H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical implementation of the Hopfield Model," Appl. Opt., 24, pp. 1469-1475 (1985).

4. N.H. Farhat and D. Psaltis, "Architectures for optical implementation of 2-D content addressable memories," in Technical Digest, Optical Society of America Annual Meeting (Optical Society of America, Washington, DC, 1985), paper WT3.

5. K.S. Lee and N.H. Farhat, "Content addressable memory with smooth transition and adaptive thresholding," in Technical Digest, Optical Society of America Annual Meeting (Optical Society of America, Washington, DC, 1985), paper WJ35.

6. N. Farhat, S. Miyahara and K.S. Lee, "Optical implementation of 2-D neural nets and their application in recognition of radar targets," in Neural Networks for Computing, J.S. Denker, Ed. (American Institute of Physics, New York), pp. 146-152.

7. N. Farhat and D. Psaltis, "Optical implementation of associative memory based on models of neural networks," in Optical Signal Processing, J.L. Horner, Ed. (Academic, New York, 1987), pp. 129-162.

8. N.H. Farhat, "Architectures for opto-electronic analogs of self-organizing neural networks," Opt. Lett., 12, pp. 448-450 (1987).

9. N.H. Farhat and B. Bai, "Phased array antenna pattern synthesis by simulated annealing," Proc. IEEE (Letters), 75, pp. 842-844, June 1987.

10. N.H. Farhat, "Opto-electronic analogs of self-programming neural nets: architectures and methodologies for implementing stochastic learning by simulated annealing," Appl. Optics, 26, pp. 5093-5103, Dec. 1987.

11. N.H. Farhat and Z.Y. Shae, "Architectures and methodologies for self-organization and stochastic learning in opto-electronic analogs of neural nets," Proc. of the IEEE First Annual International Conference on Neural Networks, June 1987 (IEEE Cat. No. 87TH0191-7).


12. N.H. Farhat and Z.Y. Shae, "Bimodal stochastic optical learning machine," (submitted) (see preprint in Appendix VI).

13. N.H. Farhat and Z.Y. Shae, "Methods for accelerating the frame-rate of magneto-optic spatial light modulators," (submitted).

Degrees Awarded:

1. S. Miyahara, "Automated radar target recognition based on models of neural nets," Univ. of Pennsylvania, Ph.D. Dissertation, 1987.

2. K.S. Lee, "A new approach to optical information processing based on neural networks models with application to object recognition," Univ. of Pennsylvania, Ph.D. Dissertation, 1987.

Patent Disclosure:

Super-resolution - Patent disclosure filed April 9, 1987 on behalf of the University of Pennsylvania by University Patent Inc., 1465 Post Road East, P.O. Box 901, Westport, CT 06881.

Invited Talks and Presentations:

In addition to those listed in earlier annual reports (see refs. [1] and [2]), the following invited talks were presented by N. Farhat in the period Nov. 1, 1987 to May 30, 1988.

1. "Collective nonlinear optical processing based on models of neural networks," Battelle Memorial Institute, Columbus Division, Nov. 27, 1987.

2. "Super-resolved recognition of radar signal targets based on models of neural networks," IEEE Philadelphia Section, February 16, 1988.

3. "Self-organization and learning in neural net analogs," Philadelphia IEEE Circuits and Systems Chapter, March 2, 1988.

4. "Stochastic neural nets," Long Island IEEE Computer Society, March 24, 1988.

5. "Stochastic optical learning machines," Columbia University, May 4, 1988.



Workshop Participation:

N. Farhat participated in the following workshops:

NATO Advanced Study Institute, "Electromagnetic Modeling and Measurements for Analysis and Synthesis Problems," Il Ciocco, Tuscany, Italy, Aug. 10-21, 1987.

JPL Workshop on Neural Network Devices and Applications, Jet Propulsion Laboratory, Pasadena, Feb. 18-19, 1987.

DARPA/Lincoln Lab Review Panels: Optical Neural Nets, Caltech, Pasadena, Dec. 14, 1987; Applications, Bedford, MA, Jan. 22, 1988.

NSF/ONR "Workshop on Hardware Implementation of Neuron Nets and Synapses," San Diego, CA, Jan. 13-15, 1988.

"ARO Workshop on Submillimeter Wave Imaging," Breckenridge, Colo., February 2-4, 1988.

"Neural Networks for Computing Conference," Snowbird, Utah, April 1988.


6. APPENDICES

I.   Optical Implementation of Associative Memory Based on Models of Neural Networks

II.  Architectures for Optoelectronic Analogs of Self-Organizing Neural Networks

III. Optoelectronic Analogs of Self-Programming Neural Nets: Architecture and Methodologies for Implementing Fast Stochastic Learning by Simulated Annealing

IV.  Phased-Array Antenna Pattern Synthesis by Simulated Annealing

V.   Architectures and Methodologies for Self-Organization and Stochastic Learning in Opto-Electronic Analogs of Neural Nets

VI.  Bimodal Stochastic Optical Learning Machine


Appendix I


[Fig. 2. Numerical example of supplementing a partial input; N = 20, m = 4. (a) Stored vectors. (b) Interconnection or synaptic matrix. (c) Results of initializing with a partial version of (b), showing convergence to a stable state on the third iteration.]

[Fig. 3. Architectures for optoelectronic implementation of content-addressable memory based on models of neural nets: (a) matrix-vector multiplier incorporating nonlinear electronic feedback; (b) optoelectronic scheme for realizing binary bipolar mask transmittance in incoherent light; (c) optical feedback scheme incorporating a hybrid optical light amplifier array and programmable connectivity matrix.]


[Fig. 14. Partitioned T_ij matrix of entities in Fig. 13.]


Appendix II

Reprinted from Optics Letters, Vol. 12, page 448, June 1987.
Copyright © 1987 by the Optical Society of America and reprinted by permission of the copyright owner.

Architectures for optoelectronic analogs of self-organizing neural networks

Nabil H. Farhat

Department of Electrical Engineering, Electro-Optics and Microwave-Optics Laboratory, The University of Pennsylvania, Philadelphia, Pennsylvania 19104-6390

Received December 12, 1986; accepted March 20, 1987

Architectures for partitioning optoelectronic analogs of neural nets into input-output and internal groups to form a multilayered net capable of self-organization, self-programming, and learning are described. The architectures and implementation ideas given describe a class of optoelectronic neural net modules that, when interfaced to a conventional computer controller, can impart to it artificial intelligence attributes.

In earlier work on optical analogs of neural nets,1-6 the Here we describe a method for partitioning an opto-nets described were programmed to do a specific corn- ele :tronic analog of a neural net to implement a multi-putational task, namely, a nearest-neighbor search layered net analog that can learn stochastically byconsisting of finding the stored entity that is closest to means of a simulated annealing learning algorithm inthe address in the Hamming sense. The net acted as a the context of a Boltzmann machine formalism [seecontent-addressable associative memory. The pro- Fig. 1(a)]. The arrangement shown in Fig. 1(a) de-gramming was done by first computing the intercon- rives from the neural network analogs that we de-nectivity matrix using an outer-product recipe given scribed earlier. 2 The network, consisting of, say, Nthe entities that one wished the net to store and then neurons, is partitioned into three groups. Twosetting the weights of synaptic interconnections be- groups, V, and V2 , represent input and output units,tween neurons accordingly. respectively. The third group, H, comprises hidden

In this Letter we are concerned with architectures or internal units. The partition is such that N, + N, +for optoelectronic implementation of neural nets that N 3 = N, where N1 , N9 , ard N 3 refer to the number ofare able to program or organize themselves under su- neurons in the V1, V2 , and H groups, respectively.pervised conditions, i.e., of nets that are capable of (1) The interconnectivity matrix, designated here W,, iscomputing the interconnectivity matrix for the associ- partitioned into nine submatrices, A-F, and three zeroations that they are to learn and (2) changing the submatrices, shown as blackened or opaque regions ofweights of the links between their neurons according- the Wj mask. The LED array represents the state ofly. Such self-organizing networks therefore have the the neurons, assumed to be unipolar binary (LED on,ability to form and store their own internal represen- neuron firing; LED off, neuron not firing). The W,tations of the associations that they are presented mask represents the strengths of interconnectionwith. among neurons in a manner similar to earlier arrange-

Multilayered self-programming nets were ,'escribed ments.- Light from each LED is smeared verticallyas early as 1969,7 and in more recent descriptions' - 10 over the corresponding column of the W, mask withthe net is partitioned into three groups. Two are the aid of an anamorphic lens system [not shown ininput and output groups of neurons that interface with Fig. 1(a)], and light emerging from each row of thethe net environment. The third is a group of hidden mask is focused with the aid of another anamorphicor internal units that separates the input and output lens system (also not shown) onto the correspondingunits and participates in the process of forming inter- elements of the photodetector (PD) array. Thenal representations of the associations that the net is scheme utilized in Ref. 2 for realizing bipolar values ofpresented with. W,, in incoherent light is adopted here; it consists of

Two supervised learning procedures in such parti- separating each row of the Wi, mask into two subrows,tioned nets have recently attracted attention. One is assigning positive-valued Wij to one subrow and nega-stochastic, involving a simulated annealing pro- tive-valued Wij to the other, and focusing light emerg-cess,' t1.2 and the other is deterministic, involving an ing from the two subrows separately onto two adjacenterror backpropagation process. 9 There is general photosites connected in opposition in the photodetec-agreement, however, that because of their iterative tor array. Submatrix A, with N, x Nt elements, pro-nature, sequential computation of the links using vides the interconnection weights between units orthese algorithms is time consuming. A faster means neurons within group Vi. Submatrix B, with N2 X Nfor carrying out the required computations is needed. elements, provides the interconnection weights be-

Optics and optoelectronic architectures and tech- tween units within V2 . Submatrices C (with N, X N.1niques can play an important role in the study and elements) and D (with N 3 X N , elements) provide theimplementation of self-programming networks and in interconnection weights between units of V, and H,speeding up the execution of learning algorithms, and submatrices E (with N. X N., elements) and F

0146-9592/87/060448-03$2.00/0 © 1987, Optical Society of America



V repeated application of the simulated annealing pro-cedure with the Boltzmann or another stochastic

D H state-update rule and collecting statistics on the statesE 2of the neurons at the end of each run when the net

PO reaches thermodynamic equilibrium.FRRAYS Starting from an arbitrary W, and for each clamp-

"'J ing of the V, and V, units to one of the associations,i LD I C the states of units in H are switched and annealing isTHRESOLD INTERCON NEC-

LED Tv 141 Kapplied, until thermodynamic equilibrium is reached.The state vector of the entire net, which represents astate of the global energy minimum, is then stored by

COMPUTER the computer. This procedure is repeated for eachCONTROLLfR association several times and the final state vectors

(0 ) recorded every time. Note that, because of the proba-bilistic nature of the state-update rule discussed later

- .- and in Eqs. (1) and (2) below, the states of global85 energy minimum in the runs for each association may

- MOSnot necessarily be exactly the same. Therefore the"OSILM need to collect statistics from which the probabilities

~MASK p, of finding the ith and jth neurons in the same stateIk ' ,can then be obtained. Next, with the output units V,

unclamped to let them run free, the above procedure isPO- repeated for the same number of annealings as before

F ED1 and the probabilities p,j' are obtained. The weights

/ r W, ae then incremented by A W, = 70(P - P1 '), whereINTERCONNEC- 17 is a constant that controls the speed and efficacy ofrIvIrY MASK learning. Starting from the new W,,, the above proce-

Fig. 1. Architecture for optoelectronic analog of layered dure is repeated until a steady state W, is reached, atself-programming net. (a) Partitioning concept showing which stage the learning procedure is complete.adjustable global threshold scheme. ib) Arrangement for Learning by simulated annealing requires calculat-rapid determination of the net's global energy E for use in ing the energy, E, of the net8 .10:learning by simulated annealing.

E = -(1/2) Σ_i u_i s_i,   (1)

where s_i is the state of the ith neuron and

u_i = Σ_j W_ij s_j + θ_i + I_i,   (2)

with θ_i and I_i denoting its threshold and external input,

cannot communicate with one another directly be-cause the locations of their interconnectivity weightsin the W,. matrix or mask are blocked out (blackened respectively. A simplified version of a rapid schemelower-left and top-right portions of W.). Similarly, for obtaining E optoelectronically is shown in Fig.units within H do not communicate with one another l(b). The scheme requires the use of an electronicallybecause locations of their interconnectivity weights in addressed nonvolatile binary (on-off) spatial lightthe W, mask are also blocked out (center blackened modulator (SLM) consisting of a single column of Nsquare of W). The LED element 0 is of graded re- pixels. A suitable candidate is a parallel-addressedsponse. Its output represents the state of an auxiliary magneto-optic SLM (MOSLM) consisting of a singleneuron in the net that is always on to provide a global column of N pixels that are driven electronically bythreshold level to all units by contributing only to the the same signal driving the LED array in order tolight focused onto negative photosites of the PD ar- represent the state vector s of the net. A fraction ofrays from pixels in the G column of the interconnectiv- the focused light emerging from each row of the W,ity mask. This is achieved by suitable modulation of mask is deflected by the beam splitter BS onto thepixels in the G column. This method for introducing individual pi-:els of the column MOSLM such thatthe threshold level is attractive, as it allows for provid- light from adjacent pairs of subrows falls upon oneing to all neurons in the net a fixed global threshold, an pixel of the MOSLM. The MOSLM pixels are over-adaptive global threshold, or even a noisy global laid by a checkered binary mask as shown. Thethreshold. opaque and transparent pixels in the checkered mask

By using a computer-controlled nonvolatile spatial are staggered in such a fashion that light emerginglight modulator to implement the W, mask in Fig. 1(a) from the left subcolumn will be derived from the posi-and including a computer-controller as shown, the tive subrows W, + of W, and light emerging from thescheme can be made self-programming with the ability right subcolumn will be derived from the negativeto modify the weights of synaptic links between its subrows W,- of W,,. By separately focusing the lightneurons. This is done by fixing or clamping the states from the left and right subcolumns as shown onto twoof the V, (input) and V, (output) groups to each of the photodetectors and subtracting and halving their out-associations that we want the net to learn and by puts, one obtains


F! that the role of optics in the architecture described notE -, W,:+ - I! s s: only facilitates partitioning the net into groups or lay-2 -ers but elso provides the massive interconnectivitv

mentioned earlier. For example, for a neural net with

a total of N = 512 neurons, the optics permit making1 I ' Wis Js, (3) 2N = 2.62 X 105 programmable weighted intercon-

2 - \ r nections among the neurons in addition to the 4N =2 , 2048 interconnections that would be needed in the

arrangement shown Fig. I(b) to compute the energy E.which is the required energy. In Eq. (3) the contribu- Assuming that material and device requirements oftions of Oi and Ii in Eq. (2) are absorbed in Wi,. The the architectures described can be met and parti-simulated annealing algorithm involves determining tioned, self-organizing neural net modules will be rou-the change AEk in E that is due to switching the state tinely constructed; then the addition of such a moduleof the kth neuron in H selected at random, computing to a computer-controller through a high-speed inter-the probability p(AEk) = 1/[I + exp(-AEk/T)], and face can be viewed as providing the computer-control-comparing the result with a random number nf(0, 1) ler with artificial intelligence capabilities by impartingproduced by a fast random-number generator. If to it neural net attributes. These capabilities includep(AE) > n.. the change in the state of the kth neuron self-organization, self-programmability and learning,is retained. Otherwise it is discarded and the neuron and associative memory capability for conductingis returned to its original state before a new neuron in nearest-neighbor searches. Such attributes would en-H is randomly selected and switched and the anneal- able a small computer to perform powerful computa-ing procedure repeated. This algorithm ensures that tional tasks of the kind needed in pattern recognitionin thermal equilibrium the relative probability of two and in the solution of combinatorial optimizationglobal states follows a Boltzmann distribution8; hence problems and ill-posed problems encountered, for ex-it is sometimes referred to as the Boltzmann machine, ample, in inverse scattering and vision, which are con-In this fashion the search for a state of global energy fined at present to the domain of supercomputers.minimum is done by employing a gradient-descent The research reported was supported by grantsalgorithm that allows for probabilistic hill climbing, from the Defense Advanced Research Projects Agen-The annealing process usually also includes a cooling cy-Naval Research Laboratory, the U.S. Army Re-schedule in which the temperature T is allowed to search Laoroy the U .S. Ary Re-decrease between iterations to increase gradually the search Office, and The University of Pennsylvania'sfineness of search for the global energy minimum. Laboratory for Research on the Structure of Matter.Optoelectronic methods for generating p(AEk) thatemploy speckle statistics'' and for generating the ran- Referencesdom number n, by photon counting 4 can also be in-corporated to speed up the procedure and reduce the 1. D. Psaltis and N. Farhat, Opt. Lett. 10, 98 11985).number of digital computations involved. 2. N. H. Farhat. D. Psaltis. A. Prata, and E. Paek. Appi

Opt. 24, 1469 (1985).The architecture described here for partitioning a 3. A. D. Fischer. C. Giles, and J. Lee. J. Opt. Soc. Am. A 1.neural net can be used in hardware implementation 1337 (1984).and study of self-programming and learning algo- 4. D. Z. Andersen. Opt. Lett. 11, 56 (1986).rithms such as the simulated annealing algorithm out- 5. B. Soffer. G. Dunning. Y. Owechko. and E. Marom. Opt.lined here. The parallelism and massive interconnec- Lett. 11, 118 (1986).tivity provided through the use of optics should mark- 6. A. Yariv and S. K. Kwong. Opt. Lett. 11, 186 (1986).edly speed up learning even for the simulated 7. S. Grossberg, J. Statist. Phys. 1,319 (1969).annealing algorithm, which is known to be quite time 8. D. H. Ackley. G. Hinton, and T. Sejnowski. Cognit. Sci.consuming when carried out on a sequential machine. 1, 147 (1985).The partitioning concept described is also extendable 9. D. E. Rumelhart and all authors. in Parallel Distributedthe patiioningconceptfmdescribedthisealsoeestendbteProcessing, D. F. Rumelhart and J. L. McClelland, eds.to multilayered nets of more than three layers and to (Bradford/MIT. Cambridge, Mass.. 1986). Vol. 1.the two-dimensional arrangement of synaptic inputs 10. T. J. Sejnowski and C. R. Rosenberg, Electrical Engi-to neurons, as opposed to the one-dimensional or lin- neering and Computer Science Tech. Rep. JHU/EECS-eal arrangement described here. Other learning algo- 96/01 Johns Hopkins University, Baltimore. Md.. 1986)rithms calling for a multilayered architecture, such as 11. N. Metropolis, A. Rosenbluth. M. Rosenbluth. and A.the error backpropagation algorithm,9' 5 can also now Teller. -. Chem. Phys. 21, 1087 (1953).be envisaged optoelectronically by employing the par- 12. S. Kirkpatric, C. Gelatt. Jr., and M. Vecchi. Science 220,titioning scheme described here or variations of it. 671 f 1983).

Learning algorithms in layered nets lead to analog 1:3. A. Ticknor. H. Barrett. and R. Easton. Jr.. in Diwest oior multivalued W,. Therefore high-speed computer- Topical Meeting on Optical Computing (Optical Soci-coroultled . witherdere hih-spcomear- etv of America. Washington, D.C.. 1985). postdeadlinecontrolled SLM's with graded pixel response are paper PD-4.called for. Methods of reducing the needed dynamic 14. J. Marron. A. Martino. and G. Morris, Appl. Opt. 25. 26range of W, or for allowing the use of ternary W,1 are, (1986).however, under study to permit the use of commercial- 15. K. Wagner and D. Psaltis. Proc. Soc. Photo-Opt. In-ly available fast nonvolatile binary SLM devices such strum. Eng. 756 (to be published).as the Litton/Semetex MOSLM.'1 It is worth noting 16. W. Ross, Opt. Eng. 22, 485 (1983).
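As a numerical check of the bipolar-weight readout leading to Eq. (3) above, the following short sketch (an illustration added to this report, not part of the reprint) splits a bipolar weight matrix W into its positive and negative parts, sums each part against the state vector as the two photodetectors would, and verifies that subtracting and halving the two totals reproduces E = -(1/2) ΣΣ W_is v_i v_s. The matrix size and random values are arbitrary assumptions.

```python
import numpy as np

# Sketch only: bipolar weights split into positive (W+) and negative (W-) masks,
# as in the two-subrow encoding described in the reprint above.
rng = np.random.default_rng(0)
N = 16
W = rng.normal(size=(N, N))
W = (W + W.T) / 2                      # symmetric weights
np.fill_diagonal(W, 0.0)
v = rng.integers(0, 2, N)              # binary (0,1) neuron states

W_pos = np.clip(W, 0, None)            # transmitted by the positive subrows
W_neg = np.clip(-W, 0, None)           # transmitted by the negative subrows

detector_pos = v @ W_pos @ v           # light collected from the left subcolumn
detector_neg = v @ W_neg @ v           # light collected from the right subcolumn

E_optical = -0.5 * (detector_pos - detector_neg)   # subtract and halve, as in Eq. (3)
E_direct  = -0.5 * v @ W @ v                       # E = -(1/2) sum_is W_is v_i v_s

assert np.isclose(E_optical, E_direct)
```

Because W = W+ - W-, the difference of the two detector readings recovers the bilinear form exactly, which is the point of the optical encoding.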


Appendix III

Reprinted from Applied Optics, Vol. 26, page 5093, December 1, 1987.
Copyright © 1987 by the Optical Society of America and reprinted by permission of the copyright owner.

Optoelectronic analogs of self-programming neural nets:architecture and methodologies for implementing faststochastic learning by simulated annealing

Nabil H. Farhat

Self-organization and learning is a distinctive feature of neural nets and processors that sets them apart from conventional approaches to signal processing. It leads to self-programmability, which alleviates the problem of programming complexity in artificial neural nets. In this paper architectures for partitioning an optoelectronic analog of a neural net into distinct layers with a prescribed interconnectivity pattern to enable stochastic learning by simulated annealing in the context of a Boltzmann machine are presented. Stochastic learning is of interest because of its relevance to the role of noise in biological neural nets. Practical considerations and methodologies for appreciably accelerating stochastic learning in such a multilayered net are described. These include the use of parallel optical computing of the global energy of the net, the use of fast nonvolatile programmable spatial light modulators to realize fast plasticity, optical generation of random number arrays, and an adaptive noisy thresholding scheme that also makes stochastic learning more biologically plausible. The findings reported predict optoelectronic chips that can be used in the realization of optical learning machines.

The author is with the University of Pennsylvania, Electrical Engineering Department, Philadelphia, Pennsylvania 19104-6390.
Received 15 May 1987.
© 1987 Optical Society of America.

I. Introduction

Interest in neural net models (see, for example, Refs. 1-9) and their optical analogs (see, for example, Refs. 10-25) stems from well-recognized information processing capabilities of the brain and the fit between what optics can do and what even simplified models of neural nets can offer toward the development of new approaches to collective signal processing.

Neural net models and their analogs present a new approach to collective signal processing that is robust, fault tolerant, and can be extremely fast. Collective or distributed processing describes the transfer among groups of simple processing units (e.g., neurons), that communicate among each other, of information that one unit alone cannot pass to another. These properties stem directly from the massive interconnectivity of neurons (the decision-making elements) in the brain and their ability to store information as weights of links between them, i.e., their synaptic interconnections, in a distributed nonlocalized manner. As a result, signal processing tasks such as nearest-neighbor searches in associative memory can be performed in time durations equal to a few time constants of the decision-making elements, the neurons, of the net. The switching time constant of a biological neuron is of the order of a few milliseconds. Artificial neurons (electronic or optoelectronic decision-making elements) can be made to be a thousand to a million times faster. Artificial neural nets can therefore be expected to function, for example, as content-addressable associative memory or to perform complex computational tasks such as combinatorial optimization, which are encountered in computational vision, imaging, inverse scattering, superresolution, and automated recognition from partial (sketchy) information, extremely fast, in a time scale that is way out of reach for even the most powerful serial computer. In fact once a neural net is programmed to do a given task it will do it almost instantaneously. More about this point later. As a result optoelectronic analogs and implementations of neural nets are attracting considerable attention. Because of the noninteracting nature of photons, the optics in these implementations provide the needed parallelism and massive interconnectivity and therefore a potential for realizing relatively large neural nets, while the decision-making elements are realized electronically, heralding a possible ultimate marriage between VLSI and optics.

Architectures suitable for use in the implementation of optoelectronic neural nets of 1-D and 2-D arrangements of neurons were studied and described earlier [10-15]. Two-dimensional architectures for optoelectronic analogs have been successfully utilized in the


recognition of objects from partial information by either complementing the missing information or by automatically generating correct labels of the data (object feature spaces) the memory is presented with [23]. These architectures are based primarily on the use of incoherent light to help maintain robustness by avoiding the speckle noise and the strict positioning requirements encountered when use of coherent light is contemplated.

In associative memory applications, the strengths of interconnections between the neurons of the net are determined by the entities one wishes to store. Ideal storage and recall occur when the stored vectors are randomly chosen, i.e., uncorrelated. Specific storage recipes based on a Hebbian model of learning (outer-product storage algorithm), or variations thereof, are usually used to explicitly calculate the weights of interconnections, which are then set accordingly. This represents explicit programming of the net, i.e., the net is explicitly taught what it should know. What is most intriguing, however, is that neural net analogs can also be made to be self-organizing and learning, i.e., become self-programming. The combination of neural net modeling, Boltzmann machines, and simulated annealing concepts with high-speed optoelectronic implementations promises to produce high-speed artificial neural net processors with stochastic rather than deterministic rules for decision making and state update. Such nets can form their own internal representations (connectivity weights) of their environment (the outside-world data they are presented with) in a manner analogous to the way the brain forms its own representations of reality. This is quite intriguing and has far-reaching implications for smart sensing and recognition, thinking machines, and artificial intelligence as a whole. Our exploratory work is showing that optics can also play a role in the implementation and speeding up of learning procedures such as simulated annealing in the context of the Boltzmann machine formalism [26-29] and error backpropagation [30] in such self-teaching nets and in their subsequent use in automated robust recognition of entities the nets have had a chance to learn earlier by repeated exposure to them when the net is in a learning mode. Induced self-organization and learning seem to be what sets apart optical and optoelectronic architectures and processing based on models of neural nets from other conventional approaches to optical processing and have the advantage of avoiding explicit programming of the net, which can be time-consuming and has come to be referred to as the programming complexity of neural nets [48]. The partitioning scheme presented in Sec. III permits defining input, output, and intermediate layers of neurons and any prescribed communication pattern between them. This enables the implementation of deterministic learning algorithms such as error backpropagation. However, the discussion in this paper focuses on stochastic learning by simulated annealing, since such learning algorithms may prove to be more biologically plausible in that they might account for the noise present in biological neural nets, as will be elaborated on in Sec. IV.

In this paper we are therefore concerned with architectures for optoelectronic implementation of neural nets that are able to program or organize themselves under supervised conditions, i.e., of nets that are capable of (a) computing the interconnectivity matrix for the associations they are to learn, and (b) changing the weights of the links between their neurons accordingly. Such self-organizing networks have therefore the ability to form and store their own internal representations of the entities or associations they are presented with.

In Sec. II we attempt to elucidate those features that set neural processing apart from conventional approaches to signal processing. The ideas expressed have been arrived at as a result of maintaining a critical attitude and constantly keeping in mind, when engaged in the study of neural net models and their applications, the question of what is unique about the way they perform signal processing tasks. If they seem to perform a signal processing function well, could the same function be carried out equally well with a conventional processing scheme? To gain insight into this question we were led to a comparison between outer-product and inner-product schemes for implementing associative memory. The insight gained from this exercise points clearly to certain distinctions between neural and conventional approaches to signal processing which will lead us to considerations of self-programmability and learning. These are presented in Sec. III together with a description of architectures for optoelectronic analogs of such self-organizing nets. The emphasis is on stochastic supervised learning, rather than deterministic learning, and on the use of noise to ensure that the combinatorial search procedure for a global energy or cost function minimum during the learning phase does not get trapped in a local minimum of the cost function. In Sec. IV a discussion of practical considerations related to the implementation of the architectures described and for accelerating the learning process is presented. An estimate of the speedup factor compared to serial implementation is included. Conclusions and implications of the work are then given. These attest to a continuing role for optics in the implementation of artificial neural net modules or neural chips with self-programming and learning capabilities, i.e., to optical learning machines.

II. What Is Distinctive About Neural Processing

Right from the outset, when attention was first drawn to the fit between optics and neural models [10,11], our investigations of optoelectronic analogs of neural nets and their applications have perpetually kept in view the question of what it is that neural nets can do that is not doable by conventional means, i.e., by well-established approaches to signal processing. Such a critical attitude is found useful and almost mandatory to avoid being swept into ill-conceived research endeavors. It is not easy of course to see all the ramifications of a problem while one is immersed in its study and solution, but a critical attitude always helps to isolate real attributes from biased ones.

Being collective, adaptive, iterative, and highly non-


linear, neural net models and their analogs exhibit complex and rich behavior in their phase space or state space that is described in terms of attractors, limit points, and limit cycles with associated basins of attraction, bifurcation, and chaotic behavior. The rich behavior offers intellectually attractive and challenging areas of research. Moreover, many believe that in studying neural nets and their models we are attempting to benefit from nature's experience in its having arrived over a prolonged period of time, through a process of trial and error and retainment of those permutations that enhance the survivability of the organism, at a powerful, robust, and highly fault-tolerant processor, the brain, that can serve as the model for a new generation of computing machines. Clues and insights gained from its study can be immensely beneficial for use in artificially intelligent man-made machines that, like the brain, are highly suited for processing of spatiotemporal multisensory data and for motor control in a highly adaptive and interactive environment.

All the above are general attributes and observations that by themselves are sufficient justification for the interest displayed in neural nets as a new approach to signal processing and computation. To gain, however, further specific insight into what sets neural nets apart from other approaches to signal processing, we consider a specific example. This involves comparison between two mathematically equivalent representations of a neural net, one involving outer products, and the other inner products [31]. We begin by considering the optoelectronic neural net analog described earlier [12] and represented here in Fig. 1. The iterative procedure determining the evolution of the state vector v of the net is illustrated in Fig. 1(a), and the vector-matrix multiplication scheme with thresholding and feedback used to interconnect all neurons with each other through weights set by the T_ij mask is shown in Fig. 1(b). For a net of size N with interconnectivity matrix T_ij, where T_ii = 0, i,j = 1,2 ... N, the iterative equation for the state vector is

   v_i^(q+1) = sgn[ Σ_j T_ij v_j^(q) ],   i = 1,2 ... N,   (1)

where the superscripts (q) and (q+1) designate two consecutive iterations and sgn[·] represents the sign of the bracketed quantity. The iteration, triggered by an externally applied initializing or strobing vector v^(q), q = 0, i.e., v^(0), continues until a steady-state vector that is one of the nominal state vectors or attractors of the net that is closest to v^(0) in the Hamming sense is converged upon. At this point the net has completed a nearest-neighbor search operation. For simplicity the usual terms for the threshold θ_i and external input I_i of the ith neuron have been omitted from Eq. (1). These can, without loss of generality of the conclusions arrived at below, be assumed to be zero or absorbed in the summation in Eq. (1) through the use of two additional always-on neurons that communicate to every other neuron in the net its threshold and external input levels, through appropriate weights added to T_ij. Note in Fig. 1 that the iterated input vector is always the transpose of the thresholded output vector.

Fig. 1. Outer-product (distributed) storage and recall scheme.

By substituting the expression for the storage matrix

   T_ij = Σ_m v_i^(m) v_j^(m),   (2)

formed by summing the outer products of the stored vectors v^(m), i = 1,2 ... N and m = 1,2 ... M, into Eq. (1) and interchanging the order of summations, we obtain

   v_i^(q+1) = sgn[ Σ_m c_m^(q) v_i^(m) ],   (3)

where

   c_m^(q) = Σ_j v_j^(m) v_j^(q)   (4)

are coefficients determined by the inner product of the input vector v^(q) at any iteration with each of the stored vectors. Equations (3) and (4) can be implemented employing the optoelectronic direct storage and inner-product recall scheme shown in Fig. 2, in which LEA and PDA represent light emitting array and photodetector array, respectively. Noting that the two segments to the left and to the right of the diffuser in Fig. 2 are identical, one can arrive at the simplified equivalent reflexive inner-product scheme shown in Fig. 3. Now we have arrived at two equivalent implementations of the neural model. These are shown together in Fig. 4. One employs outer-product distributed storage and vector-matrix multiplication with thresholded feedback in the recall, as shown in Fig. 4(a), and the second employs direct storage and inner-product recall with thresholded feedback, as shown in Fig. 4(b). The reflexive or inner-product scheme has several advantages over the outer-product scheme. One is storage capacity. While an N x N storage matrix in the outer-product scheme can store M < N/(4 ln N) vectors of length N, beyond which the probability of correct recall deteriorates rapidly because of proliferation of spurious states [32], the storage mask v_j^(m), j = 1,2 ... N, m


"-," = 1,2 ... M, in the inner-product scheme can store

- - directly a stack of up to M - N vectors in the same size-. .. matrix. It can be argued that the robustness and the

IA fault tolerance of the distributed storage scheme havebeen sacrificed in the inner-product scheme, but this is

- --- ----.... a mute argument. Robustness can be easily restored'I ~ by introducing a certain degree of redundancy in the

L,..... inner-product scheme of Fig. 4(b). This can be done,L. .......- for example, by storing a vector in more than one

Fig. 2. Direct storage and inner-product recall scheme. location in the stack. The internal optical feedback inthe inner-product implementation is certainly anotherattractive advantage. In fact the beauty of internalfeedback has inspired the concept of reflexive associa-tive memory or nonlinear resonator CAM (contentaddressable memory) shown in Fig. 5. This scheme,which becomes possible because the T matrix is sym-

PDA - . metrical, utilizes the same optics for internal feedback

AO og ."Ik - _ and for transposing the reflected state vectors. Thescheme is perfectly suited for use with nonlinear reflec-

.tor arrays or arrays of optically bistable elements.

Z, mA " , The advantages of a similar bidirectional associative

"Am 3TCA OPmemory have also been noted recently elsewhere. 25

In view of the obvious advantages of the reflexive scheme [Fig. 4(b)], one is led to question the reason nature appears to prefer distributed (Hebbian) storage [as in Fig. 4(a)] over localized storage [as in Fig. 4(b)] besides fault tolerance and redundancy. As a result of the preceding exercise the answer now comes readily to mind: in the inner-product scheme the connectivity matrix T_ij is not present. Self-organization and learning in biological systems are associated with modifications of the synaptic weights matrix. Hence learning in the neural sense is not possible in the inner-product scheme. In this sense the inner-product scheme of Fig. 4(b) is not neural but involves conventional correlations between the input vectors and the stored vectors. One can argue that the instant the identity of the weights matrix T_ij was obliterated the inner-product network stopped being neural, as learning through weights modification is no longer possible. We are therefore led to conclude that distributed storage and self-organization and learning are the most distinctive features of neural signal processing as opposed to conventional approaches to signal processing such as the inner-product scheme, which involves simple correlations and where it is not clear how self-organization and learning can be performed since there is no T_ij matrix to be modified.

Fig. 3. Reflexive inner-product scheme.

Fig. 4. Two equivalent neural net analogs: (a) outer-product distributed storage and recall with external feedback; (b) reflexive inner-product direct storage and recall with internal feedback.
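The mathematical equivalence of the two recall schemes, Eqs. (1)-(4), is easy to verify numerically. The sketch below is an added illustration (not part of the reprint); it assumes random bipolar (+1/-1) stored vectors and does not zero the diagonal of T, so that the identity holds exactly.

```python
import numpy as np

# Sketch: outer-product recall (Eq. (1) with T from Eq. (2)) versus
# inner-product recall (Eqs. (3)-(4)) give the same updated state.
rng = np.random.default_rng(0)
N, M = 64, 8
V = rng.choice([-1, 1], size=(M, N))         # stored vectors v^(m)

T = sum(np.outer(v, v) for v in V)           # Eq. (2): outer-product (Hebbian) storage

v_q = rng.choice([-1, 1], size=N)            # current state vector v^(q)

v_outer = np.sign(T @ v_q)                   # Eq. (1): distributed storage update

c = V @ v_q                                  # Eq. (4): c_m = <v^(m), v^(q)>
v_inner = np.sign(c @ V)                     # Eq. (3): sgn(sum_m c_m v^(m))

assert np.array_equal(v_outer, v_inner)      # the two recall schemes agree
```

The check makes the text's point concrete: the recall dynamics are identical, and only the presence or absence of an explicit, modifiable T_ij distinguishes the two implementations.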

Fig. 5. Concept of nonlinear resonator content addressable memory.

Neural net processing has additional attractive features that are not as distinctive as self-organization and learning. These include heteroassociative storage and recall, where the same net performs the functions of storage, processing, and labeling of the output (final state) simultaneously. While such a task may also be realized with conventional signal processing nets, each of the above three functions must however be realized separately in a different subnet. A striking example of this feature reported recently [23] is in the area of radar target recognition from partial information employing sinogram representations of targets of interest. The sinogram representations were used in computing and


setting the synaptic weight matrix in an explicit learning mode. Recognition in radar from partial information is tantamount to solution of the superresolution problem. The ease and elegance with which the neural net approach solves this classical problem is, to say the least, impressive.

Other distinctive features of neural nets associated with the rich phase-space behavior are bifurcation and chaotic behavior. These were mentioned earlier but are restated here because of their importance in sequential processing of data (e.g., cyclic heteroassociative memory) and in the modeling and study of mental disorder and the effect of drugs on the nervous system [33].

III. Partitioning Architectures and Stochastic Learning by Simulated Annealing

In preceding work on optical analogs of neural nets [10-25], the nets described were programmed to do a specific computational task, namely, a nearest-neighbor search that consisted of finding the stored entity that is closest to the address in the Hamming sense. As such the net acted as a content addressable associative memory. The programming was done by first computing the interconnectivity matrix, using a Hebbian (outer-product) recipe, given the entities one wished the net to store, followed by setting the weights of synaptic interconnections between neurons accordingly.

In this section we are concerned with architectures for optoelectronic implementation of neural nets that are able to program or organize themselves under supervised conditions. Such nets are capable of (a) computing the interconnectivity matrix for the associations they are to learn, and (b) changing the weights of the links between their neurons accordingly. Such self-organizing networks therefore have the ability to form and store their own internal representations of the associations they are presented with. The discussion in this section is an expansion of one given earlier [34].

Multilayered self-programming nets have recently been attracting increasing attention [28-30]. For example, in Ref. 28 the net is partitioned into three groups; two are input and output groups of neurons that interface with the net environment, and the third is a group of hidden or internal units that acts as a buffer between the input and output units and participates in the process of forming internal representations of the associations the net is presented with. This can be done, for example, by clamping or fixing the states of neurons in the input and output groups to the desired pairs of associations and letting the net run through its learning algorithm to arrive ultimately at a specific set of synaptic weights or links between the neurons. No neuron or unit in the input group is linked directly to a neuron in the output group and vice versa. Any such communication must be carried out via the hidden units. Neurons within the input group can communicate among each other and with hidden units, and the same is true for neurons in the output group. Neurons in the hidden group cannot communicate among each other. They can only communicate with neurons in the input and output groups as stated earlier.

Two supervised learning procedures in multilayered nets have recently attracted attention. One is stochastic, involving a simulated annealing process [26,27], and the other is deterministic, involving an error backpropagation process [30]. There is general agreement, however, that because of their iterative nature, sequential computation of the weights using these algorithms is very time-consuming. A faster means for carrying out the required computations is needed. Nevertheless, the work mentioned represents a milestone in that it opens the way for powerful collective computations in multilayered neural nets, and the partitioning concept dispels earlier reservations [36] about the capabilities of early single-layered models of neural nets such as the Perceptron [37]. The partitioning feature and the ability to define input and output neurons may also be the key for realizing meaningful interconnection between neural modules for the purpose of performing higher-order hierarchical processing.

Fig. 6. Optoelectronic analog of self-organizing neural net partitioned into three layers capable of stochastic self-programming and learning.

Optics and optoelectronic architectures and techniques can play an important role in the study and implementation of self-programming networks and in speeding up the execution of learning algorithms. Here we describe a method for partitioning an optoelectronic analog of a neural net to implement a multilayered net that can learn stochastically by means of a simulated annealing learning algorithm in the context of a Boltzmann machine formalism (see Fig. 6). The arrangement shown in Fig. 6 derives from the neural network analogs we described earlier [12]. The network, consisting of, say, N neurons, is partitioned into three groups. Two groups, V1 and V2, represent visible units that can be viewed as input and output groups, respectively. The third group H are hidden or internal units. The partition is such that N1 + N2 + N3 = N, where N1, N2, and N3 refer to the number of neurons in the V1, V2, and H groups, respectively. The interconnectivity matrix T_ij is partitioned into six submatrices, A, B, C, D, E, F, and three zero-valued submatrices shown as blackened or opaque regions of the T_ij mask. The LED array represents the state of the neurons, assumed to be unipolar binary (LED on = neuron firing,


LED off = neuron not firing). The T_ij mask represents the strengths of interconnection between neurons in a manner similar to earlier arrangements [12]. Light from each LED is smeared vertically over the corresponding column of the T_ij mask with the aid of an anamorphic lens system (not shown in Fig. 6), and light emerging from each row of the mask is focused with the aid of another anamorphic lens system (also not shown) onto the corresponding elements of the photodetector (PD) array. The same scheme utilized in Ref. 12 for realizing bipolar values of T_ij in incoherent light is assumed here, namely, separating each row of the T_ij mask into two subrows, assigning positive-valued T_ij to one subrow and negative-valued T_ij to the other, and focusing the light emerging from the two subrows separately onto two adjacent photosites of the photodetector array connected in opposition. Submatrix A, with N1 x N1 elements, provides the interconnection weights between units or neurons within group V1. Submatrix B, with N2 x N2 elements, provides the interconnection weights between units within V2. Submatrices C (with N1 x N3 elements) and D (with N3 x N1 elements) provide the interconnection weights between units of V1 and H, and submatrices E (with N2 x N3 elements) and F (with N3 x N2 elements) provide the interconnection weights of units of V2 and H. Units in V1 and V2 cannot communicate among each other directly because the locations of their interconnectivity weights in the T_ij matrix or mask are blocked out (blackened lower left and top right portions of T_ij). Similarly, units within H do not communicate among each other because the locations of their interconnectivity weights in the T_ij mask are also blocked out (blackened center square of T_ij). The LED element G is of graded response. Its output represents the state of an auxiliary neuron in the net that is always on to provide a global threshold level to all units by contributing only to the light focused onto the negative photosites of the photodetector (PD) array from pixels in the G column of the interconnectivity mask. This is achieved by suitable modulation of the transmittance of pixels in the G column. This method for introducing the threshold level is attractive, as it allows for providing to all neurons in the net a fixed global threshold, an adaptive global threshold, or even a noisy global threshold if desired.

By using a computer-controlled nonvolatile spatial light modulator to implement the T_ij mask in Fig. 6 and including a computer-controller as shown, the scheme can be made self-programming with the ability to modify the weights of synaptic links between its neurons. This is done by fixing or clamping the states of the V1 (input) and V2 (output) groups to each of the associations we want the net to learn and by repeated application of the simulated annealing procedure, with a Boltzmann or other stochastic state update rule, and collection of statistics on the states of the neurons at the end of each run when the net reaches thermodynamic equilibrium.

Stochastic learning by simulated annealing in the partitioned net proceeds as follows:

(1) Starting from an arbitrary T_ij, clamp V1 and V2 to the desired association, keeping H free running.
(2) Randomly select a neuron in H, say the kth neuron, and flip its state [recall we are dealing with binary (0,1) neurons].
(3) Determine the change ΔE_k in the global energy E of the net caused by changing the state of the kth neuron.
(4) If ΔE_k < 0, adopt the change.
(5) If ΔE_k > 0, do not discard the change outright but calculate first the Boltzmann probability factor,

   p_k = exp(-ΔE_k / T),   (5)

and compare the outcome to a random number N_r ∈ [0,1]. If p_k > N_r, adopt the change of state of the kth neuron even if it leads to an energy increase (i.e., ΔE_k > 0). If p_k < N_r, discard the change, i.e., return the kth neuron to its original state.
(6) Once more select a neuron in H randomly and repeat steps (1)-(5).
(7) Repeat steps (1)-(6), reducing at every round the temperature T gradually [e.g., T = T0/log(1 + m), where m is the round number, a cooling schedule that is frequently used to ensure convergence], until a situation is reached where changing the states of neurons in H does not alter the energy E, i.e., ΔE_k ≈ 0. This indicates that a state of thermodynamic equilibrium, or a state of global energy minimum, has been reached. The temperature T determines the fineness of the search for a global minimum. A high T produces a coarse search and a low T a finer grained search.
(8) Record the state vector at thermodynamic equilibrium, i.e., the states of all neurons in the net, i.e., those in H and those in V1 and V2 that are clamped.
(9) Repeat steps (1)-(8) for all other associations on V1 and V2 we want the net to learn and collect statistics on the states of all neurons by storing the states at thermodynamic equilibrium in computer memory as in step (8). This completes the first phase of exposing the net to its environment.
(10) Generate the probabilities P_ij of finding the ith neuron and the jth neuron in the same state. This completes phase I of the learning cycle.
(11) Unclamp the neurons in V2, letting them run free as with the neurons in H.
(12) Repeat steps (1)-(10) for all input vectors V1 and collect statistics on the states of all neurons in the net.
(13) Generate the probabilities P'_ij of finding neurons i and j in the same state.
(14) Increment the current connectivity matrix T_ij by ΔT_ij = c(P_ij - P'_ij), where c is a constant representing and controlling the speed of learning. This completes phase II of the learning cycle.
(15) Repeat steps (1)-(14) again and again until the increments ΔT_ij tend to zero, i.e., become smaller than some prescribed small number. At this point the net is said to have captured the underlying structure or formed its own representations of its environment defined by the associations presented to it. We are now dealing with a learned net.
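Steps (1)-(15) map naturally onto a short serial simulation. The sketch below is an added illustration, not part of the reprint and not a model of the optics: it uses the Boltzmann factor of Eq. (5), the logarithmic cooling schedule of step (7), same-state statistics P_ij and P'_ij as in steps (10) and (13), and the update ΔT_ij = c(P_ij - P'_ij) of step (14). Group sizes, annealing rounds, and the learning constant c are arbitrary assumptions, and the blocked submatrices of Fig. 6 are enforced with a mask.

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(T_mat, v):
    # Global energy, Eqs. (6) and (8): E = -(1/2) sum_ij T_ij v_i v_j
    return -0.5 * v @ T_mat @ v

def anneal(T_mat, v, free_idx, T0=10.0, rounds=25, flips_per_round=40):
    # Steps (2)-(7): random single-neuron flips with Boltzmann acceptance, Eq. (5),
    # under the cooling schedule T = T0/log(1 + m).
    for m in range(1, rounds + 1):
        temp = T0 / np.log(1.0 + m)
        for _ in range(flips_per_round):
            k = rng.choice(free_idx)
            trial = v.copy()
            trial[k] = 1 - trial[k]                      # binary (0,1) neurons
            dE = energy(T_mat, trial) - energy(T_mat, v)
            if dE < 0 or rng.random() < np.exp(-dE / temp):
                v = trial                                # adopt the change
    return v

def same_state(v):
    # 1 where neurons i and j are in the same state (both on or both off)
    return np.outer(v, v) + np.outer(1 - v, 1 - v)

def learn(assocs, N1, N2, N3, sweeps=10, c=0.05):
    # assocs: list of (input pattern of length N1, output pattern of length N2)
    N = N1 + N2 + N3
    hidden = np.arange(N1 + N2, N)                       # H free in phase I
    free_phase2 = np.arange(N1, N)                       # V2 and H free in phase II
    # Zero-valued submatrices of Fig. 6: no direct V1-V2 links, no links within H.
    mask = np.ones((N, N))
    mask[:N1, N1:N1 + N2] = mask[N1:N1 + N2, :N1] = 0.0
    mask[N1 + N2:, N1 + N2:] = 0.0
    np.fill_diagonal(mask, 0.0)
    T_mat = np.zeros((N, N))
    for _ in range(sweeps):                              # step (15): repeat until dT ~ 0
        P = np.zeros((N, N))                             # phase I statistics, step (10)
        P_free = np.zeros((N, N))                        # phase II statistics, step (13)
        for x, y in assocs:
            v = rng.integers(0, 2, N)
            v[:N1], v[N1:N1 + N2] = x, y                 # step (1): clamp V1 and V2
            P += same_state(anneal(T_mat, v, hidden))
            v = rng.integers(0, 2, N)
            v[:N1] = x                                   # step (11): clamp V1 only
            P_free += same_state(anneal(T_mat, v, free_phase2))
        T_mat += c * (P - P_free) / len(assocs)          # step (14): dT = c(P - P')
        T_mat *= mask                                    # keep the blocked links at zero
    return T_mat
```

A call such as learn([( [1,0,1,0], [1,1] ), ( [0,1,0,1], [0,0] )], N1=4, N2=2, N3=6) returns a learned connectivity matrix for two associations; the parameters are placeholders chosen only to make the sketch self-contained.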


One can make the following observations regarding the above procedure:

The search for a state of global energy minimum is basically a gradient-descent procedure that allows for probabilistic hill climbing to avoid entrapment in a state of local energy minimum. The relative probability of two global states α and β is given by the Boltzmann distribution P_α/P_β = exp[-(E_α - E_β)/T], hence the name Boltzmann machine [28]. Therefore the lowest energy state is the most probable at any temperature and is sought by the procedure.

Unlike explicit programming of a neural net, where lack of correlation among the stored vectors is a prerequisite for ideal storage and recall, self-programming by simulated annealing has no such requirement. In fact learning by simulated annealing in a Boltzmann machine looks for underlying similarities or correlations in the training set to generate weights that can make the net generalize. Generalization is a property whereby the net recognizes an entity presented to it even though it was not among those specifically used in the learning session. Learning is thus not rote.

The final T_ij reached represents a net that has learned its environment by itself under supervision, i.e., it has formed its own internal representations of its surroundings. Those environmental states or input/output associations that occur more frequently will influence the final T_ij more than others and hence form more vivid impressions in the synaptic memory matrix T_ij.

The learning procedure is stochastic but is still basically Hebbian in nature, where the change in the synaptic interconnection between two units (neurons) depends on finding the two units in the same state (sameness reinforcement rule).

Evidently, being stochastic in nature (involving probabilistic state transition rules and simulated annealing), the learning procedure is lengthy (taking hours in a digital simulation for nets of a few tens to a few hundred neurons). Hence, speeding up the process by using analog optoelectronic implementation is highly desirable.

Stochastic learning consists of two phases: phase I involves generating the probabilities P_ij when the input and output of the net are specified. Phase II involves generating the probabilities P'_ij when only the input is specified while the rest of the net is free running, followed by computing the weight increments and modifying the T_ij matrix accordingly.

IV. Accelerated Learning

Stochastic learning by the simulated annealing procedure we described was originally conceived for serial computation. When dealing with parallel optical computing systems it does not make sense to exactly follow a serial algorithm. Modifications that can take advantage of the available parallelism of optics to speed up stochastic learning are therefore of interest. In this section we discuss several such modifications that offer potential for speeding up stochastic learning in optoelectronic implementations by several orders of magnitude compared to serial digital implementation.

Learning by simulated annealing requires calculating the energy E of the net [28,34],

   E = -(1/2) Σ_i u_i v_i,   (6)

where v_i is the state of the ith neuron and

   u_i = Σ_(j≠i) T_ij v_j - θ_i + I_i   (7)

is the activation potential of the ith neuron, with θ_i and I_i being the threshold level and external input to the ith neuron, respectively, and the summation term representing the input to the ith neuron from all other neurons in the net. By absorbing θ_i and I_i in the summation term as described earlier, Eq. (7) can be simplified to

   u_i = Σ_j T_ij v_j.   (8)

Fig. 7. Two schemes for parallel computing of the global energy in an optoelectronic analog of a multilayered self-organizing net: (a) electronic scheme; (b) optoelectronic scheme.

A simple analog circuit for calculating the contribution E_i of the ith neuron to the global energy E of the net is shown in Fig. 7(a). Here the product of the activation potential of the ith neuron and the state v_i of the ith neuron is formed to obtain E_i, which is then added to all terms formed similarly in parallel for all other neurons in the net. Although VLSI implementation of such an analog circuit for parallel calculation of the global energy is feasible, this becomes less attractive as the number of neurons increases because of the interconnection problem associated with the large fan-in at the summation element.

A simplified version of a rapid scheme for obtaining E optoelectronically is shown in Fig. 7(b). The scheme requires the use of an electronically addressed nonvolatile binary (on-off) spatial light modulator consisting of a single column of N pixels. A suitable candidate is a parallel addressed magnetooptic spatial light modu-


lator (MOSLM) consisting of a single column of N pixels that are driven electronically by the same signal driving the LED array to represent the state vector v of the net. A fraction of the focused light emerging from each row of the T_ij mask is deflected by the beam splitter (BS) onto the individual pixels of the column MOSLM such that light from adjacent pairs of subrows of T_ij falls on one pixel of the MOSLM. The MOSLM pixels are overlaid by a checkered binary mask as shown. The opaque and transparent pixels in the checkered mask are staggered in such a fashion that light emerging from the left subcolumn will originate from the positive subrows T_ij+ of T_ij only and light emerging from the right subcolumn will originate from the negative subrows T_ij- of T_ij. By separately focusing the light from the left and right subcolumns as shown onto two photodetectors and subtracting and halving their outputs, one obtains

   E = -(1/2) [ Σ_i Σ_j T_ij+ v_i v_j  -  Σ_i Σ_j T_ij- v_i v_j ] = -(1/2) Σ_i Σ_j T_ij v_i v_j,   (9)

which is the required global energy.

Fig. 8. Optoelectronic neural chip. MOSLM 1 and 2: pair of identical magnetooptic spatial light modulators in perfect registration; MOSLM 1: input SLM; MOSLM 2: synaptic weights SLM; L1, L2: horizontally integrating, vertically imaging anamorphic lens systems; PDA: photodetector array; the computer-controller coordinates and participates in simulated annealing runs and modification of synaptic weights.

The learning procedure detailed in Sec. III requires fast random number generation for use in the random drawing and switching of states of neurons from H (during phase I of learning) and from H and V2 (during phase II of learning). Another random number is also needed to execute the stochastic state update rule when ΔE_k > 0. Although fast digital pseudorandom number generation at rates of up to 10^9 s^-1 is feasible [39] and can be used to help speed up digital simulation of the learning algorithm, this by itself is not sufficient to make a large impact, especially when the total number of neurons in the net is large. Optoelectronic random number generation is also possible, although at a slower rate of 10^5 s^-1. Despite the slower rate of generation, optoelectronic methods have advantages that will be elaborated on below. An optoelectronic method for generating the Boltzmann probability factor p(ΔE) [see Eq. (5)] employing speckle statistics is described in Ref. 40, and optical generation of random number arrays by photon counting image acquisition systems or clipped laser speckle has also been recently described [41-44]. These photon counting image acquisition systems have the advantage of being able to generate normalized random numbers with any probability density function. A more important advantage of optical generation of random number arrays, however, is the ability to exploit the parallelism of optics to modify the simulated annealing and Boltzmann machine formalism detailed above to achieve significant improvement in speed. As stated earlier, with parallel optical random number generation, a spatially and temporally uncorrelated linear array of percolating light spots of suitable size can be generated and imaged on the photodetector array (PDA) of Fig. 6 such that both the positive and negative photosites of the PDA [see also Fig. 7(a)] are subjected to random irradiance. This introduces a random (noise) component in θ_i and I_i of Eq. (7) which can be viewed as a bipolar noisy threshold. The noisy threshold produces in turn a noisy component in the energy in accordance with Eq. (6). The magnitude of the noise components can be controlled by varying the standard deviation of the random light intensity array irradiating the PDA. The noisy threshold therefore produces a random controlled perturbation or shaking of the energy landscape of the net. This helps shake the net loose whenever it gets trapped in a local energy minimum. The procedure can be viewed as generating a controlled deformation or tremor in the energy landscape of the net to prevent entrapment in a local energy minimum and thereby ensure convergence to a state of global energy minimum. Both the random drawing of neurons (more than one at a time is now possible) and the stochastic state update of the net are now done in parallel at the same time. This leads to significant acceleration of the simulated annealing process. The parallel optoelectronic scheme for computing the global energy described earlier [see Fig. 7(b)] can be used to modulate the standard deviation of the optical random noise array used to produce a noisy threshold with a


function of the instantaneous global energy E and/or its time rate of change dE/dt. In this fashion an adaptive noisy threshold scheme can be realized to control the tremors in the energy landscape if necessary. The above discussion gives an appreciation of the advantages and flexibility of using optical random array generators in making the net rapidly find states of global energy minimum. No attempt is made here to estimate in detail the speed enhancement over digital execution of the simulated annealing process, as this will be dependent on the characteristics of the light emitting array, the photodetector array, the spatial light modulator, and the speed of the computer-controller interface used. Nevertheless, the enhancement over digital serial computation can be significant, approaching 5-6 orders of magnitude, especially for relatively large multilayer nets consisting of from a few tens to a few hundred neurons. A recent study of learning in neuromorphic VLSI systems in the context of a modified Boltzmann machine gives speedup estimates of 10^6 over serial digital simulations [45].

V. Optoelectronic Neural Chip

The discussion in the preceding sections shows that optical techniques can simplify and speed up stochastic learning in artificial neural nets and make them more practical. The attractiveness and practicality of optoelectronic analogs of self-programming and learning neural nets are enhanced further by the concept of optoelectronic neural chips presented in Fig. 8. The embodiments shown rely heavily on the use of computer- or microprocessor-interfaced spatial light modulators and photodetector arrays. The figure shows how the free-space anamorphic lens system in the top left embodiment can be replaced by a single photodetector array with horizontal strip elements that spatially integrate the light emerging from the rows of MOSLM 2 (lower right embodiment). MOSLM 2 represents the T_ij mask of Fig. 6. Each column of MOSLM 1 is uniformly activated by the computer controller. This replaces the function of the anamorphic lens system that was needed in Fig. 6 to smear the light from the LED array vertically onto the elements of the T_ij mask. The optoelectronic neural chip represents a neural module operating in an ambient light environment as compared with a biological neural module operating in a chemical environment. The chip thus derives some of its operating energy from the ambient light environment.

VI. Discussion

The architecture described here for partitioning a neural net can be used in hardware implementation and study of self-programming and learning algorithms such as, for example, the simulated annealing algorithm outlined here. The parallelism and massive interconnectivity provided through the use of optics should markedly speed up learning even for the simulated annealing algorithm, which is known to be quite time-consuming when carried out on a sequential machine. The partitioning concept described is also extendable to multilayered nets of more than three layers and to 2-D arrangements of synaptic inputs to neurons, as opposed to the 1-D or lineal arrangement described here. Other learning algorithms calling for a multilayered architecture, such as the error backpropagation algorithm [30] and its coherent optics implementation, can also now be envisioned optoelectronically employing the partitioning scheme described here.

Learning algorithms in layered nets lead to analog or multivalued T_ij. Therefore high-speed computer-controlled SLMs with graded pixel response are called for. Methods of reducing the needed dynamic range of T_ij or of allowing the use of ternary T_ij are, however, under study to enable the use of commercially available fast nonvolatile binary SLM devices such as the Litton/Semetex magnetooptic SLM (MOSLM) [46]. A frame switching time better than 1/1000 s has been demonstrated recently in our work on a 48 x 48 pixel device by employing an external magnetic field bias. It is worth noting that the role of optics in the architecture described not only facilitates partitioning the net into groups or layers but also provides the massive interconnectivity mentioned earlier. For example, for a neural net with a total of N = 512 neurons, the optics enable making N^2 = 2.62 x 10^5 programmable weighted interconnections among the neurons in addition to the 4N = 2048 interconnections that would be needed in the scheme of Fig. 7(b) to compute the energy E.

Assuming that material and device requirements of the architectures described can be met, partitioned self-organizing neural net modules will be routinely constructed, and the addition of such a module to a computer controller through a high-speed interface can be viewed as providing the computer controller with artificial intelligence capabilities by imparting to it neural net attributes. These capabilities include self-organization, self-programmability and learning, and associative memory capability for conducting nearest-neighbor searches. Such attributes would enable a small computer to perform powerful computational tasks of the kind needed in pattern recognition and in the solution of combinatorial optimization problems and ill-posed problems encountered, for example, in inverse scattering and vision, which are confined at present to the domain of supercomputers.

A central issue in serial digital computation of complex problems is computational complexity [47]. Programming a serial computer to perform a complex computational task is relatively easy. The computation time, however, for certain problems, especially those dealing with combinatorial searches and combinatorial optimization, can be extensive. In neural nets the opposite is true. They take time to program [for example, computation of the interconnectivity matrix of synaptic weights by outer product or correlation (Hebbian rule) and setting the weights accordingly]. Once programmed, however, they perform the computations required almost instantaneously. This fact is one of the first attributes noted when working with neural nets and has recently been elaborated on [48].

Self-organization and learning entails the net deter-


mining by itself the weights of synaptic interconnections among its neurons that represent the associations it is supposed to learn. In other words, the net programs itself, thereby alleviating the programming complexity issue. One can envision nets that learn by example when the associations the net is supposed to learn are presented to it by an external teacher in a supervised learning mode. This leads naturally to the more intriguing question of unsupervised learning in such nets and analog implementations of such learning.

This work was supported by grants from DARPA/NRL, the Army Research Office, and the University of Pennsylvania Laboratory for Research on the Structure of Matter. Some of the work reported was carried out when the author was a summer Distinguished Visiting Scientist at the NASA Jet Propulsion Laboratory in Pasadena. The support and hospitality of JPL are gratefully acknowledged.

References

1. B. Widrow and M. E. Hoff, "Adaptive Switching Circuits," in WESCON Convention Record, Part 4 (1960), pp. 96-104; in Self-Organizing Systems, M. C. Yovitz et al., Eds. (Spartan, Washington, DC, 1962).
2. K. Nakano, "Associatron - a Model of Associative Memory," IEEE Trans. Syst. Man Cybern. SMC-2, 380 (1972).
3. T. Kohonen, "Correlation Matrix Memories," IEEE Trans. Comput. C-21, 353 (1972).
4. T. Kohonen, Associative Memory (Springer-Verlag, Heidelberg, 1978); Self-Organization and Associative Memory (Springer-Verlag, New York, 1984).
5. D. J. Willshaw, "A Simple Network Capable of Inductive Generalization," Proc. R. Soc. London 182, 233 (1972).
6. J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, "Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model," Psychol. Rev. 84, 413 (1977).
7. J. J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. U.S.A. 79, 2554 (1982); "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. U.S.A. 81, 3088 (1984).
8. S. Grossberg, Studies of Mind and Brain (Reidel, Boston, 1982).
9. A. C. Sanderson and Y. Y. Zeevi, Eds., Special Issue on Neural and Sensory Information Processing, IEEE Trans. Syst. Man Cybern. SMC-13 (1983).
10. D. Psaltis and N. Farhat, "A New Approach to Optical Information Processing Based on the Hopfield Model," in Digest of the Thirteenth Congress of the International Commission on Optics (ICO-13), Sapporo, Japan (1984), p. 24.
11. D. Psaltis and N. Farhat, "Optical Information Processing Based on an Associative-Memory Model of Neural Nets with Thresholding and Feedback," Opt. Lett. 10, 98 (1985).
12. N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical Implementation of the Hopfield Model," Appl. Opt. 24, 1469 (1985).
13. N. H. Farhat and D. Psaltis, "Architectures for Optical Implementation of 2-D Content Addressable Memories," in Technical Digest, Optical Society of America Annual Meeting (Optical Society of America, Washington, DC, 1985), paper WT3.
14. K. S. Lee and N. H. Farhat, "Content Addressable Memory with Smooth Transition and Adaptive Thresholding," in Technical Digest, Optical Society of America Annual Meeting (Optical Society of America, Washington, DC, 1985), paper WJ35.
15. D. Psaltis, E. Paek, and J. Hong, "Acoustooptic Implementation of Neural Network Models," in Technical Digest, Optical Society of America Annual Meeting (Optical Society of America, Washington, DC, 1985), paper WT6.
16. J. D. Condon, "Optical Window Addressable Memory," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1985), postdeadline paper.
17. A. D. Fisher, C. L. Giles, and J. N. Lee, "Associative Processor Architectures for Optical Computing," J. Opt. Soc. Am. A 1, 1337 (1984); "An Adaptive, Associative Optical Computing Element," in Technical Digest of Topical Meeting on Optical Computing (Optical Society of America, Washington, DC, 1985), paper WB4.
18. A. D. Fisher and C. L. Giles, "Optical Adaptive Associative Computer Architectures," in Proceedings, IEEE COMPCON Spring Meeting, IEEE Catalog No. CH2135-2/85/0000/0342 (1985), p. 342.
19. A. D. Fisher, "Implementation of Adaptive Associative Optical Computing Elements," Proc. Soc. Photo-Opt. Instrum. Eng. 625, 196 (1986).
20. B. H. Soffer, G. J. Dunning, Y. Owechko, and E. Marom, "Associative Holographic Memory with Feedback Using Phase-Conjugate Mirrors," Opt. Lett. 11, 118 (1986).
21. D. Z. Anderson, "Coherent Optical Eigenstate Memory," Opt. Lett. 11, 56 (1986).
22. A. Yariv and S.-K. Kwong, "Associative Memories Based on Message-Bearing Optical Modes in Phase-Conjugate Resonators," Opt. Lett. 11, 186 (1986).
23. N. Farhat, S. Miyahara, and K. S. Lee, "Optical Implementation of 2-D Neural Nets and Their Application in Recognition of Radar Targets," in Neural Networks for Computing, J. S. Denker, Ed. (American Institute of Physics, New York, 1986), p. 146.
24. N. Farhat and D. Psaltis, "Optical Implementation of Associative Memory Based on Models of Neural Networks," in Optical Signal Processing, J. L. Horner, Ed. (Academic, New York, 1987), p. 129.
25. B. Kosko and C. Guest, "Optical Bidirectional Associative Memory," Proc. Soc. Photo-Opt. Instrum. Eng. 758 (1987), in press.
26. N. Metropolis, A. M. Rosenbluth, M. N. Rosenbluth, and A. H. Teller, "Equations of State Calculations by Fast Computing Machines," J. Chem. Phys. 21, 1087 (1953).
27. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science 220, 671 (1983).
28. D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, "A Learning Algorithm for Boltzmann Machines," Cognitive Sci. 9, 147 (1985).
29. T. J. Sejnowski and C. R. Rosenberg, "NETtalk: A Parallel Network That Learns to Read Aloud," Johns Hopkins U. Electrical Engineering and Computer Science Technical Report JHU/EECS-86/01 (1986).
30. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing, D. E. Rumelhart and J. L. McClelland, Eds. (MIT Press, Cambridge, MA, 1986), Vol. 1, p. 318.
31. S. Miyahara, "Automated Radar Target Recognition Based on Models of Neural Networks," U. Pennsylvania dissertation (1987).
32. R. J. McEliece, E. C. Posner, E. R. Rodemich, and S. S. Venkatesh, "The Capacity of the Hopfield Associative Memory," IEEE Trans. Inf. Theory IT-33, 461 (March 1987).
33. G. B. Ermentrout and J. D. Cowan, "Large Scale Spatially Organized Activity in Neural Nets," SIAM (Soc. Ind. Appl. Math.) J. Appl. Math. 38, 1 (1980).
34. N. H. Farhat, "Architectures for Opto-Electronic Analogs of Self-Organizing Neural Networks," Opt. Lett. 12, 448 (1987).

NO

Page 53: IUniversity - apps.dtic.mil · understanding the computational algorithms used by the nervous system and development of systems that emulate, match, or surpass in their performance

accepted for publication. t 1986).35. G. A. Carpenter and S. Grossberg. "A Massively Parallel Archi. 43. F. Devos. P. Garda. and P. Chavel. "Optical Generation of

tecture for Self-Organizing Neural Pattern Recognition Ma- Random-Number Arrays for On-Chip Massively Parallel Montechines," Comput. Vision Graphics Image Process. 37, 54 1987). Carlo Cellular Processors," Opt. Lett. 12. 152 t 19871.

36. M. L. Minsky and S. Papert. Perceptronrs (MIT Press, Cam- 44. Y. Tsuchiya. E. Inuzuka. T. Kurono, and M. Hosada. Photonbridge, MA, 1969). Counting Imaging and Its Application." Advances in E'ectron-

37. F. Roeenblatt, Principles of Neuro-Dynamics: Perceptiors and ics and Electron Physics, B. L. Morgan. Ed. 64A. 21 (Academicthe Theory of Brain Mechanisms kSpartan Books, Washington, Press. London, 1988).DC, 1962). 45. J. Aispector and R. B. Allen. "A Neuromorphic VLSI Learning

38. M. A. Cohen and S. Groseberg, "Absolute Stability of Global System," in Advanced Research in VLSI, Paul Losleben. Ed.Pattern Formation and Parallel Memory Storage by Competi- (MIT Press. Cambridge. MA. 198'), to be published.tive Neural Networks." IEEE Trans. Syst. Man Cybern. SMC- 46. W. E. Ross. D. Psaltis. and R. H. Anderson. "Two-Dimensional13. 815 (1983). Magneto-Optic Spatial Light Modulator for Signal Processing.'"

39. S. Kirkpatrick. IBM Watson Research Center. private commu- Opt. Eng. 22, 485 (1983).nication (1987). 47. R. M. Karp, "Combinatorics, Complexity and Randomness."

40. A. J. Ticknor, H. H. Barrett, and R. L Easton. Jr., "Optical Commun. ACM 2, 98 (Feb. 1986).Bolt-mnn Machines." in Technical Digest of Topical Meeting 48. M. Takeda and J. W. Goodman. "Neural Networks for Compu-on Optical Computing (Optical Society of America. Washing- tation: Number Representations and Programming Complexi-ton. DC, 1985), poetdeadline paper PD3. ty," AppL Opt. 25. 3033 (1986).

41. G. M. Morris, "Optical Computing by Monte Carlo Methods," 49. D. G. Bounds. "Numerical Simulations of Boltzmann Ma-Opt. Eng. 24,86 (1985). chines." in Neural Networks For Computing, J. S. Denker, Ed..

42. J. Marron. A. J. Martino and G. M. Morris. "Generation of VoL 151. (American Institute of Physics, New York. 1986).Random Arrays Using Clipped Laser Speckle." Appl. Opt. 25,26 p. 59.

1 December 1987 / Vol. 26, No. 23 / APPLIED OPTICS 5103

Page 54: IUniversity - apps.dtic.mil · understanding the computational algorithms used by the nervous system and development of systems that emulate, match, or surpass in their performance

Appendix IV

Phased-Array Antenna Pattern Synthesis by Simulated Annealing

N. H. Farhat
B. Bai

Reprinted from: PROCEEDINGS OF THE IEEE, Vol. PROC-75, No. 6, JUNE 1987.


different steering angles, but an optimal weight distribution can be computed for each steering angle. From the simulation result, it is seen that the optimum far-field pattern has similar features to the pattern given by the Dolph-Chebyshev distribution function. In the Dolph-Chebyshev pattern, all the sidelobes have the same level for a specified beamwidth. A numerical example in [1] shows an 8-element array (element separation d = 0.5λ) with 25.8 dB sidelobe level and 40.8° beamwidth. The optimum pattern given by our simulation shows nearly equal-level sidelobes which are minimized for the given beamwidth.

V. DISCUSSION

Simulated annealing is a modification of the iterative improvement algorithm [4]. It is physically more meaningful and can be computed more systematically than iterative improvement [4]. Physically, the simulated annealing process is analogous to the cooling of a melt in crystal growth: careful annealing produces a defect-free crystal, rapid annealing produces a defective crystal or glass [3]. The probabilistic treatment with the probability function P(ΔE) = exp(-ΔE/kT) provides a way to accept unfavorable changes and is easy to compute. From our simulation, it has been found that the simulated annealing algorithm seems always to give better performance than the iterative improvement algorithm.

Since simulated annealing is a modified iterative improvement process, it takes a relatively long time to do an optimization problem, just as iterative improvement does in a computer calculation. The phased-array synthesis in our simulation runs for 1 h or so for an array of 41 elements on a MICRO PDP-11 computer. Finding an efficient scheme to reduce the excessive amounts of computer time for most optimization problems has always been of concern [5]-[7]. Otherwise, if enough computation power is available, iterative improvement can be run from random starts many times to approach the optimum state. Fast optodigital computing schemes similar to those described in [8] may also be considered for phased-array synthesis by simulated annealing. It is understood that the far field is the Fourier transform of the array distribution function. An optical lens can be used for computing the Fourier transform as the distribution function is inputted to the front focal plane of the lens via, for example, an appropriate computer-driven spatial light modulator (SLM). The Fourier transform in the back focal plane can be recorded and fed to the computer/controller to make the simulated annealing decision. The outcome is fed back to the SLM to change the distribution function in the front focal plane. The hybrid optodigital scheme will do the Fourier transform instantly. In this fashion, the computation associated with the Fourier transform can be virtually eliminated, assuming a high-speed SLM and computer interface are utilized. An optoelectronic Boltzmann machine for accelerating the selection rule has also been proposed earlier in [8]. This process can be repeated for each step in simulated annealing. Also, a Cauchy probability selection rule, instead of the Boltzmann selection rule, can be used to speed up the whole annealing process further [7].
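For concreteness, a minimal sketch (not the authors' MICRO PDP-11 program) of the simulated annealing loop just described, assuming a uniform linear array whose far-field pattern is computed as a Fourier transform (here an FFT) of the element weight distribution, a peak-sidelobe-level objective, and the Metropolis acceptance rule P(ΔE) = exp(-ΔE/kT); the function names, cooling schedule, and step sizes are illustrative only:

import numpy as np

rng = np.random.default_rng(0)

def sidelobe_level(weights, n_fft=1024, mainlobe_bins=8):
    """Peak sidelobe level (dB relative to the beam peak) of a uniform linear array."""
    pattern = np.abs(np.fft.fftshift(np.fft.fft(weights, n_fft)))
    pattern /= pattern.max()
    peak = np.argmax(pattern)
    mask = np.ones(n_fft, bool)
    mask[max(0, peak - mainlobe_bins):peak + mainlobe_bins + 1] = False   # exclude the main lobe
    return 20.0 * np.log10(pattern[mask].max())

n_elem = 41
weights = np.ones(n_elem)                   # start from a uniform aperture distribution
E = sidelobe_level(weights)
for T in np.geomspace(1.0, 1e-3, 60):       # illustrative cooling schedule
    for _ in range(200):
        trial = weights.copy()
        k = rng.integers(n_elem)
        trial[k] = np.clip(trial[k] + 0.05 * rng.standard_normal(), 0.0, 1.0)
        dE = sidelobe_level(trial) - E
        if dE < 0 or rng.random() < np.exp(-dE / T):   # Metropolis acceptance of unfavorable moves
            weights, E = trial, E + dE
print("peak sidelobe level: %.1f dB" % E)

In a hybrid optodigital implementation the call to sidelobe_level would be replaced by an optically computed Fourier transform read off the back focal plane, as discussed above.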

REFERENCES

[1] C. L. Dolph, "A current distribution for broadside arrays which optimizes the relationship between beam width and side-lobe level," Proc. IRE, vol. 34, pp. 335-348, June 1946.
[2] N. Metropolis et al., "Equation of state calculations by fast computing machines," J. Chem. Phys., vol. 21, pp. 1087-1092, June 1953.
[3] S. Kirkpatrick et al., "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, May 1983.
[4] W. F. Smith et al., "Reconstruction of objects from coded images by simulated annealing," Opt. Lett., vol. 8, pp. 199-201, Apr. 1983.
[5] B. Dunham, "Design by natural selection," Synthese, vol. 15, pp. 254-258, 1963.
[6] S. Lin, "Heuristic programming as an aid to network design," Networks, vol. 5, pp. 33-43, 1975.
[7] H. Szu, "Fast simulated annealing with Cauchy probability density," presented at the Neural Networks for Computing Conference, Snowbird, UT, Apr. 1986.
[8] H. Barrett et al., "Optical Boltzmann machines," postdeadline paper, OSA Topical Meeting on Optical Computing, Incline Village, NV, 1985.



Appendix V

Architectures and Methodologies for Self-Organization and Stochastic Learning in Opto-Electronic Analogs of Neural Nets

N. H. Farhat and Z. Y. Shae
University of Pennsylvania

200 South 33rd Street
Philadelphia, PA 19104-6390

Published in the Proceedings of the IEEE First Annual International Conference on Neural Networks, June 1987.

IEEE Catalog #87TH0191-7

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. copyright law for private use of patrons those articles in this volume that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 29 Congress Street, Salem, MA 01970. Instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. For other copying, reprint or republication permission, write to Director, Publishing Services, IEEE, 345 E. 47 St., New York, NY 10017. All rights reserved. Copyright 1987 by The Institute of Electrical and Electronics Engineers, Inc.


ARCHITECTURES AND METHODOLOGIES FOR SELF-ORGANIZATION AND STOCHASTIC LEARNING IN OPTO-ELECTRONIC ANALOGS OF NEURAL NETS

N. H. Farhat and Z. Y. Shae
University of Pennsylvania

200 South 33rd Street
Philadelphia, PA 19104-6390

Abstract

Self-organization and learning is a distinctive feature of neural nets and processors that sets them apart from conventional approaches to signal processing. It leads to self-programmability which alleviates the problem of programming complexity in artificial neural nets. We have devised architectures for partitioning an opto-electronic analog of a neural net into distinct layers with prescribed interconnectivity pattern to enable stochastic learning by simulated annealing in the context of a Boltzmann machine. Stochastic learning is of interest because of its relevance to the role of noise in biological neural nets. It can shed light on the way nature has turned noise present in biological nets to work to its advantage. Practical considerations and methodologies for appreciably accelerating stochastic learning in such a multi-layered net are also described. These include the use of parallel optical computation of the energy of the net, the use of fast nonvolatile programmable spatial light modulators to realize fast "plasticity", optical generation of random number arrays, and a noisy thresholding scheme that makes stochastic learning more biologically plausible and does not require determining the energy of the net for the annealing schedule.

1. INTRODUCTION

Interest in neural network models (see for example, [1]-[9]) and their optical analogs (see for example [10]-[21]) stems from well recognized information processing capabilities of the brain and the fit between what optics can do and what even simplified models of neural nets can offer toward the development of new approaches to collective signal processing that are robust, fault tolerant and can be extremely fast.

As a result, opto-electronic analogs and implementations of neural nets are today attracting considerable attention. The optics in these implementations provide the needed parallelism and massive interconnectivity and therefore a potential for realizing relatively large neural nets, while the decision making elements are realized electronically, heralding a possible ultimate marriage of VLSI and optics.

Architectures suitable for use in the implementation of opto-electronic analogs of one-dimensional and two-dimensional arrangements of neural nets have been studied and described recently [11]-[14]. Two-dimensional


architectures for opto-electronic analogs of neural nets have been successfully used in the recognition of objects from partial information by either complementing the missing information or by automatically generating correct labels of the data (object feature spaces) the memory is presented with [22].

In associative memory applications, the strength of interconnection between the "neurons" of the net is determined by the entities one wishes to store in the net. Specific storage "recipes" based on a Hebbian model of learning (outer-product storage algorithm) or variations thereof are usually employed. In that sense the memory is taught what it should know and be cognizant of. What can be of great utility however, is that neural nets can also be made to be self-organizing and learning, i.e., to become self-programming. The combination of neural nets, Boltzmann machines, and simulated annealing concepts with high speed opto-electronic implementations promises to produce high-speed artificial neural net processors with stochastic rather than deterministic rules for decision making and state update that can form their own internal representations (connectivity weights) of their environment, the outside world data they are presented with, in a manner very analogous to the way the brain forms its own symbolic representations of reality. This is quite intriguing and has far reaching implications for smart sensing and recognition, thinking machines, and artificial intelligence as a whole. Our exploratory work is showing that optics can also play a role in the implementation and speeding up of learning algorithms (such as simulated annealing in the context of a Boltzmann machine formalism [23]-[26] and error back propagation [27]) in such self-teaching nets and for their subsequent use in automated robust recognition of entities the nets have had a chance to learn earlier by repeated exposure to them when in a learning mode. Self-organization and learning seems to be what sets apart optical and opto-electronic architectures and processing based on models of neural nets from other conventional approaches to optical processing.
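For concreteness, a minimal sketch of the outer-product (Hebbian) storage recipe mentioned above, assuming bipolar binary stored vectors, a zeroed diagonal, and a single synchronous thresholding pass for recall; the function name and the tiny recall example are illustrative only:

import numpy as np

def outer_product_weights(patterns):
    """Hebbian (outer-product) interconnection matrix for bipolar (+1/-1) stored vectors."""
    patterns = np.asarray(patterns)
    W = patterns.T @ patterns          # sum of outer products of the stored vectors
    np.fill_diagonal(W, 0)             # no self-connections
    return W

# usage: store two 8-bit bipolar vectors, then recall from a corrupted probe
stored = np.array([[ 1, -1,  1,  1, -1, -1,  1, -1],
                   [-1, -1,  1, -1,  1,  1,  1, -1]])
W = outer_product_weights(stored)
probe = stored[0].copy(); probe[:2] *= -1   # flip two bits of the first stored entity
recalled = np.sign(W @ probe)               # one thresholded pass toward the stored entity
print(recalled)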

In this paper we are therefore first concerned with architectures for opto-electronic implementation of neural nets that are able to program or organize themselves under supervised conditions, i.e., of nets that are capable of (a) computing the interconnectivity matrix for the associations they are to learn, and (b) of changing the weights of the links between their neurons accordingly. Such self-organizing networks have therefore the ability to form and store their own internal representations of the entities or associations they are presented with. We are also concerned with stochastic learning in such nets and with methodologies for accelerating the learning process. These include a novel noisy threshold scheme that can speed up the simulated annealing process in opto-electronic analogs of neural nets. Results of computer simulations demonstrating capabilities of annealing with the noisy thresholding are presented.

2. PARTITIONING ARCHITECTURES AND STOCHASTIC LEARNING

Multi-layered self-programming nets have been described recently [25], [26], where the net is partitioned into three groups. Two are groups of visible or external input/output units or neurons that interface with the


net environment or surroundings. The third is a group of hidden or internal units that separates the input and output units and participates in the process of forming internal representations of the associations the net is presented with, as for example by "clamping" or fixing the states of the input and output neurons to the desired associations and letting the net run through its learning algorithm to arrive ultimately at a specific set of synaptic weights or links between the neurons that capture, after many iterations of the process, the underlying structure of all the associations presented to the net. The hidden units or neurons prevent the input and output units from communicating with each other directly. In other words no neuron or unit in the input group is linked directly to a neuron in the output group and vice-versa. Any such communication must be carried out via the hidden units. Neurons within the hidden group do not communicate with each other; they can only communicate with neurons in the input and output groups as stated earlier.

As an example of the continuing role for optics, we describe next a concept for partitioning an opto-electronic analog of a neural net into input, output, and internal units with the selective communication pattern described above in order to realize a multi-layered net analog capable of stochastic learning, by means of a simulated annealing learning algorithm in the context of a Boltzmann machine formalism (see Fig. 1(a)). The arrangement shown derives from the neural network analogs we described earlier [11]. The network, consisting of say N neurons, is partitioned into three groups. Two groups, V1 and V2, represent visible or exterior units that can be used as input and output units respectively. The third group H are hidden or internal units. The partition is such that N1+N2+N3 = N, where subscripts 1, 2, 3 on N refer to the number of neurons in the V1, V2 and H groups respectively. The interconnectivity matrix, designated here as Wij, is partitioned into nine submatrices, A, B, C, D, E, and F plus three zero submatrices shown as blackened or opaque regions of the Wij mask. The LED array represents the state of the neurons, assumed to be unipolar binary (LED on = neuron firing, LED off = neuron not-firing). The Wij mask represents the strengths of interconnection between neurons in a manner similar to earlier arrangements [11]. Light from the LEDs is smeared vertically over the Wij mask with the aid of an anamorphic lens system (not shown in Fig. 1(a)) and light emerging from rows of the mask is focused with the aid of another anamorphic lens system (also not shown) onto elements of the photodetector (PD) array. Also we assume the same scheme utilized in [11] for realizing bipolar values of Wij in incoherent light is adopted here, namely by separating each row of the Wij mask into two subrows and assigning positive values W+ij to one subrow and negative values W-ij to the other, then focusing light emerging from the two subrows separately onto pairs of adjacent photosites connected in opposition in each of the V1, V2 and H segments of the photodetector array. Submatrix A, with N1xN1 elements, provides the interconnection weights of units or neurons within group V1.

Submatrix B, with N2xN2 elements, provides the interconnection weights of units within V2. Submatrices C (of N1xN3 elements) and D (of N3xN1 elements) provide the interconnection weights between units of V1 and H and similarly submatrices E (of N2xN3 elements) and F (of N3xN2 elements) provide the


interconnection weights of units V2 and H. Units in V1 and V2 can not communicate with each other directly because locations of their interconnectivity weights in the Wij matrix or mask are blocked out (blackened lower left and top right portions of Wij). Similarly units within H do not communicate with each other because locations of their interconnectivity weights in the Wij mask are also blocked out (center blackened square of Wij). The LED element θ can be of graded response. It can be viewed as representing the state of an auxiliary neuron in the net that is always on to provide a threshold level to all units by contributing to the light focused onto only negative photosites of the PD arrays by suitable modulation of pixels in the G column of the interconnectivity mask. This method for introducing the threshold level is attractive as it allows for introducing a fixed threshold (fixed G-LED output) to all neurons or an adaptive threshold if desired. The threshold is global when the transmittances of pixels in G are fixed and the θ LED level is controlled. The threshold is local if the θ LED output is fixed and the pixel transmittances are allowed to vary.
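A minimal numerical sketch of the partitioning just described, assuming N1, N2 and N3 neurons ordered [V1, V2, H] and placing submatrices A-F around the three blocked (zero) regions; the row/column convention (rows = receiving neuron, columns = sending neuron), the random illustrative weights, and the function name are assumptions, not the authors' layout:

import numpy as np

def partitioned_mask(A, B, C, D, E, F):
    """Assemble the layered interconnectivity matrix with neurons ordered [V1, V2, H].

    Blocked (zero) regions: V1<->V2 and H<->H, as in Fig. 1(a).
    """
    N1, N3 = C.shape                     # A is N1xN1, C is N1xN3
    N2 = B.shape[0]
    W = np.zeros((N1 + N2 + N3, N1 + N2 + N3))
    W[:N1, :N1] = A                      # within V1
    W[N1:N1 + N2, N1:N1 + N2] = B        # within V2
    W[:N1, N1 + N2:] = C                 # H to V1
    W[N1 + N2:, :N1] = D                 # V1 to H
    W[N1:N1 + N2, N1 + N2:] = E          # H to V2
    W[N1 + N2:, N1:N1 + N2] = F          # V2 to H
    return W

# usage with random bipolar weights; the threshold of each unit would be supplied
# by an extra G column driven by the always-on auxiliary neuron
N1, N2, N3 = 4, 4, 8
rng = np.random.default_rng(1)
blk = lambda m, n: rng.choice([-1, 1], size=(m, n))
W = partitioned_mask(blk(N1, N1), blk(N2, N2), blk(N1, N3),
                     blk(N3, N1), blk(N2, N3), blk(N3, N2))
theta = np.full(N1 + N2 + N3, 0.5)       # global threshold the G column would impose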

Fig. 1. Architecture for opto-electronic analog of layered self-programming net. (a) Partitioning concept and, (b) arrangement for rapid determination of the net's global energy E for use in learning by simulated annealing.

We have described elsewhere in some detail [28] how, by using a computer controlled nonvolatile spatial light modulator to implement the Wij mask in Fig. 1(a), including a computer/controller as shown, and by repeated application of the simulated annealing procedure with a Boltzmann or other stochastic state update rule and collection of co-occurrence statistics on the states of the neurons at the end of each run when the net reaches thermodynamic equilibrium, the scheme can be made self-programming with the ability to modify the weights of synaptic links between its neurons to form internal representations of the input/output associations or patterns presented to it.


3. ACCELERATED LEARNING

The stochastic learning by simulated annealing procedure was originally conceived for serial computation. When dealing with parallel optical computing systems of the kind we described, it does not make sense to strictly adhere to a serial algorithm. Modifications that can take advantage of the available parallelism of optics to speed up stochastic learning should be considered. In this section we offer several such modifications that can markedly speed up stochastic learning in opto-electronic implementations as compared to serial digital implementation.

Learning by simulated annealing requires calculating the energy E of the net [7],[29],

E = -(1/2) Σ_i u_i s_i                                            (1)

where s_i is the state of the i-th neuron and

u_i = Σ_(j≠i) W_ij s_j - θ_i + I_i                                (2)

is the activation potential of the i-th neuron, with θ_i and I_i being respectively the threshold level and external input of the i-th neuron. Equation (2) can be written in the form,

u_i = Σ_j W_ij s_j                                                (3)

by absorbing θ_i and I_i in the weight matrix Wij. This can be done by adding a G column to the Wij matrix to furnish θ_i as described earlier. A similar procedure can be used to furnish I_i by adding another column with transmittances proportional to I_i whose light transmittance is focused onto the positive photosites of the photodetector array in Fig. 1(a). The above method of introducing θ_i and I_i suggests also that random (noise) components of both θ_i and I_i can be introduced by focusing a random array of light spots, whose intensities are allowed to vary randomly and independently with time, directly onto the positive and negative photosites of the PD

array of Fig. 1(a). In this fashion deterministic and random composition of θ_i and I_i can be realized. Taken together, the random components of θ_i and I_i can be viewed as a random bipolar noisy threshold. We will return to this point later in our discussion of annealing with noisy threshold.
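A minimal numerical sketch of Eqs. (1)-(3), assuming unipolar binary states and absorbing θ_i and I_i into the weight matrix as two always-on auxiliary columns, in the spirit of the G column described above; the function names and the tiny random net are illustrative only:

import numpy as np

def augment_weights(W, theta, I):
    """Absorb thresholds and external inputs into W (Eq. (3)) via two always-on columns."""
    return np.hstack([W, -theta[:, None], I[:, None]])

def activation(W_aug, s):
    """u_i = sum_j W_ij s_j with the two auxiliary units clamped on (Eq. (3))."""
    return W_aug @ np.concatenate([s, [1.0, 1.0]])

def energy(W_aug, s):
    """E = -(1/2) sum_i u_i s_i (Eq. (1))."""
    return -0.5 * np.dot(activation(W_aug, s), s)

# tiny usage example
rng = np.random.default_rng(2)
N = 8
W = rng.choice([-1, 1], size=(N, N)); np.fill_diagonal(W, 0)
W_aug = augment_weights(W, theta=np.full(N, 0.5), I=np.zeros(N))
s = rng.integers(0, 2, N).astype(float)
print(energy(W_aug, s))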

A simplified version of a rapid scheme for obtaining E opto-electronically is shown in Fig. 1(b). The scheme requires the use of an electronically addressed nonvolatile binary (on-off) spatial light modulator consisting of a single column of N pixels. A suitable candidate is a parallel addressed magneto-optic spatial light modulator (MOSLM) [30], in particular one consisting of a single column of N pixels that are driven electronically by the same signal driving the LED array in order to represent the state vector s of the net. A fraction of the focused light emerging from each row of the Wij mask is deflected by the beam splitter BS onto the individual pixels of the column MOSLM such that light from adjacent


pairs of subrows of Wij fall on one pixel of the MOSLM. The MOSLM pixels are overlayed by a checkered binary mask as shown. The opaque and transparent pixels in the checkered mask are staggered in such a fashion that light emerging from the left subcolumn will originate from the positive subrows W+ij of Wij only and light emerging from the right subcolumn will originate from the negative subrows W-ij of Wij. By separately focusing the light from the left and right subcolumns as shown onto two photodetectors and subtracting and halving their outputs one obtains,

E = -(1/2) Σ_i [ Σ_j (W+_ij - W-_ij) s_j ] s_i = -(1/2) Σ_i u_i s_i        (4)

which is the required global energy.

Stochastic learning by simulated annealing in the opto-electronic neural

net analogs of Fig. 1 requires, as detailed elsewhere [28], fast random number generation for use in random drawing and switching of the state of neurons from H and from H and V2. Another random number is also needed to execute the stochastic state update rule. Although fast digital pseudo-random number generation of up to 10^9 [sec^-1] is feasible [31] and can be used to help speed up digital simulation of the learning algorithm, this by itself is not sufficient to make a large impact especially when the total number of neurons in the net is large. Opto-electronic random number generation is also possible although at a slower rate of about 10^5 [sec^-1]. Despite the slower rate of generation, opto-electronic methods have advantages that will be elaborated upon below. An opto-electronic method for generating the Boltzmann probability factor needed in the simulated annealing algorithm [28] employing speckle statistics is described in [32], and optical generation of random number arrays by photon counting image acquisition systems or clipped laser speckle have also been recently described [33]-[36]. These photon counting image acquisition systems have the advantage of being able to generate normalized random numbers with any probability density function. A more important advantage of optical generation of random number arrays however is the ability to exploit the parallelism of optics to modify the simulated annealing and the Boltzmann machine formalism detailed above in order to achieve significant improvement in speed. As stated earlier, with parallel optical random number generation, a spatially and temporally uncorrelated linear array of percolating light spots of suitable size can be generated and imaged onto the photodetector array (PDA) of Fig. 1 directly such that both the positive and negative photosites of the PDA are subjected to random irradiance. This introduces a random (noise) component in θ_i and I_i of eq. (2) which can be viewed, as stated earlier, as a bipolar noisy threshold. The noisy threshold produces in turn a noisy component in the energy in accordance with eq. (2). The magnitude of the noise components can be controlled by varying the standard deviation of the random light intensity array irradiating the PDA. The noisy threshold produces therefore random controlled perturbation or "shaking" of the energy landscape of the net. This helps shake the net loose whenever it gets trapped in a local energy minimum. The procedure can be viewed as generating controlled deformations or tremors in the energy landscape of


the net to prevent entrapment in a local energy minimum and ensure thereby convergence to a state of global energy minimum. Both the random drawing of neurons (more than one at a time is now possible) and the stochastic state update of the net are now done in parallel at the same time. This leads to significant acceleration of the simulated annealing process. In the following, results of numerical simulations aimed at gaining insight into the performance of the noisy threshold scheme are presented.
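A minimal sketch of the parallel noisy-threshold update just described, assuming bipolar noise uniform in (-T, T) injected into every activation potential simultaneously and a schedule of (iterations, T) pairs; the refinement in which the switching probability is made inversely proportional to the activation potential is omitted here for brevity, and the function name and test net are illustrative:

import numpy as np

def noisy_threshold_anneal(W, s, schedule, rng):
    """Parallel stochastic update: every neuron sees an independent noisy threshold each pass.

    schedule is a list of (iterations, T) pairs; T is reduced toward zero so the
    "shaking" of the energy landscape dies out and the net settles near a global minimum.
    """
    s = s.copy()
    for iters, T in schedule:
        for _ in range(iters):
            noise = rng.uniform(-T, T, size=s.size)   # bipolar noise in (-T, T)
            u = W @ s + noise                         # noisy activation potentials
            s = (u > 0).astype(float)                 # all neurons updated in parallel
    return s

# usage on a small random symmetric test net with zero diagonal
rng = np.random.default_rng(3)
N = 16
W = np.triu(rng.choice([-1, 1], size=(N, N)), 1); W = W + W.T
s0 = rng.integers(0, 2, N).astype(float)
s_final = noisy_threshold_anneal(W, s0, schedule=[(10, 2.5), (10, 1.5), (10, 0.5), (10, 0.1)], rng=rng)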

4. SIMULATION RESULTS

For the purposes of this simulation we form a fully interconnected (single layer) neural net with a random bipolar binary weights matrix with diagonal elements set to zero. The number of neurons N is 16. The weights are symmetrical. Figure 2 shows the density of states (energy histogram) of the net, where we calculate the energies of all the possible (2^16) configurations of the net. The Y (vertical) axis shows the number of configurations (out of 2^16) with the same energy and their corresponding energy represented by the X (horizontal) axis. The low energy configurations correspond to states near the very left of the curve, of which a good annealing scheme should find one. To avoid lengthy simulation time, we do not in the following exhaust the simulation for all possible configurations (2^16). Instead, we randomly select 50 configurations as the test sample space. The energy histogram of these 50 configurations is shown in Fig. 3. In Fig. 4 is shown the energy histogram of the states to which the net converges when initiated with the 50 configurations of the test space. The histogram was obtained by initiating the net with any one of the 50 states, followed by finding the final state to which the net converges by iteratively applying the customary neural net state update rule [7], namely,

s_i = 1 if u_i > 0,  s_i = 0 if u_i ≤ 0                           (5)

which amounts to performing a steepest gradient descent search into local minima of the net, then calculating the energy of the final state. The plot means that there are Y number of initial states (out of the 50 configurations of the sample space) which converge (in the sense of the above conventional steepest descent) to local minima with the same energy X. We see that a fair number of initial states end up trapped in local energy minima at high energy states because the steepest gradient descent search method involved is deterministic and does not have provisions for escaping from a local minimum. This curve can serve as a reference to test an annealing scheme's ability to escape from a local minimum and find the global minimum. In Figs. 5 and 6, we display the energy histograms of the convergent states when different annealing schemes were used. In these figures, each of the 50 configurations of the test space is used as input vector 100 times, and the statistics of the convergent states are collected. The results employing the simulated annealing algorithm [23],[24] with random drawing of neurons one neuron at a time, and


stochastic update employing noise uniformly distributed in the range (-T,T), are shown in Fig. 5. The annealing or cooling schedule used was: 96 @ 10, 160 @ 5, 96 @ 3, 96 @ 1, 96 @ .5, and 96 @ 0.01, where I @ T specifies the number of iterations I at temperature T. Figure 6 shows the results obtained with the noisy threshold scheme where the deterministic component of the threshold is taken to be zero and independent bipolar noise components uniformly distributed in the (-T,T) range are added to the thresholds every iteration. The probability of the i-th neuron switching its state was taken to be inversely proportional to its activation potential u_i. The noise amplitude T was reduced gradually every specified number of iterations to allow the net to find the state of global energy minimum or one close to it. The following annealing schedule was utilized: 10 @ 2.5, 10 @ 1.5, 10 @ .5, and 10 @ .1. It is seen that annealing with noisy threshold finds states of global energy minima equally well as the conventional simulated annealing scheme. The number of iterations involved is however considerably less: 40 as compared to 540 in the conventional simulated annealing scheme. It is worth noting also that the noisy threshold scheme does not require knowing the energy of the net to apply the rule described in the preceding section. This further accelerates the search for the global minimum and can markedly shorten learning time [28].
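A minimal sketch of the test procedure of this section, assuming a 16-neuron random bipolar symmetric weight matrix with zero diagonal: it enumerates the energies of all 2^16 configurations (the density of states of Fig. 2) and runs the deterministic update rule of Eq. (5) as the steepest-descent baseline against which the annealing schemes are compared; the function names are illustrative:

import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N = 16
W = np.triu(rng.choice([-1, 1], size=(N, N)), 1); W = W + W.T   # symmetric, zero diagonal

def energy(s):
    return -0.5 * s @ W @ s

# density of states: energies of all 2^16 configurations (Fig. 2)
energies = [energy(np.array(cfg, dtype=float)) for cfg in product((0.0, 1.0), repeat=N)]
E_global = min(energies)

def steepest_descent(s):
    """Deterministic update rule of Eq. (5); converges to a (possibly local) minimum."""
    s = s.copy()
    while True:
        changed = False
        for i in range(N):
            new = 1.0 if (W[i] @ s) > 0 else 0.0
            if new != s[i]:
                s[i] = new; changed = True
        if not changed:
            return s

s_local = steepest_descent(rng.integers(0, 2, N).astype(float))
print(E_global, energy(s_local))   # a gap indicates entrapment in a local minimum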

Fig. 2. Density of states (energy histogram) of the net over all possible configurations (horizontal axis: energy; vertical axis: number of configurations).

Fig. 3. Density of states of the sample space (of 50 randomly selected configurations).

Fig. 4. Density of convergent states in sample space (conventional neural net update rule).

Fig. 5. Density of convergent states in sample space (conventional simulated annealing with noise uniformly distributed in (-T,T)).

Fig. 6. Density of convergent states in sample space (noisy threshold scheme using noise uniformly distributed in (-T,T)).

5. DISCUSSION

We have described an architecture for partitioning an opto-electronic analog of a neural net to form a multilayered net that permits self-organization and learning when computer controlled nonvolatile spatial light modulators are utilized to realize the required plasticity. The focus here is on stochastic learning as opposed to deterministic learning because it can account for the role of noise in biological neural nets. We also described opto-electronic architectures that can be used for fast determination of the energy of the net and therefore can accelerate the simulated annealing process involved in stochastic learning, where "optical random arrays" can also be used to accelerate the process further. However, when parallel optical computing is employed, it is not necessary to adhere to a serial simulated annealing algorithm. We have shown that departure from the conventional simulated annealing algorithm through the use of a noisy thresholding scheme promises to markedly accelerate stochastic learning in opto-electronic implementations of multilayered neural nets, make the procedure more biologically plausible, and make stochastic learning practical.

6. ACKNOWLEDGEMENT

The work reported was supported by grants from DARPA/NRL, the Army Research Office, and the University of Pennsylvania Laboratory for Research on the Structure of Matter. Some of the work reported was performed when one of the authors (NF) served as distinguished visiting scientist at NASA-JPL.


REFERENCES

1. Widrow, B. and M.E. Hoff, WESCON Convention Record, Part IV, pp. 96-104, 1960. Also in Self-Organizing Systems, M.C. Yovitz, et al. (Eds.), Spartan, 1962.
2. Nakano, K., IEEE Trans. on Systems, Man & Cybernetics, SMC-2, 380, 1972.
3. Kohonen, T., IEEE Trans. on Computers, C-21, 353, 1972.
4. Kohonen, T., Associative Memory, Springer Verlag, Heidelberg, 1978. Also Self-Organization and Associative Memory, Springer Verlag, 1984.
5. Willshaw, D.J., Proc. Roy. Soc. London, 182, 233, 1972.
6. Anderson, J.A., et al., Psychological Review, 84, 413, 1977.
7. Hopfield, J.J., Proc. Natl. Acad. Sci. USA, 79, 2554, 1982. Also Proc. Natl. Acad. Sci. USA, 81, 3088, 1984.
8. Grossberg, S., Studies of Mind and Brain, D. Reidel Pub. Co., Boston, 1982.
9. Sanderson, A.C. and Y.Y. Zeevi (Eds.), Special Issue on Neural and Sensory Information Processing, IEEE Trans. on Syst., Man and Cybernetics, SMC-13, 1983.
10. Psaltis, D. and N. Farhat, Opt. Lett., 10, 98, 1985.
11. Farhat, N.H., et al., App. Optics, 24, 1469, 1985.
12. Farhat, N.H. and D. Psaltis, Digest OSA Annual Meeting, Wash., D.C., p. 58, 1985.
13. Lee, K.S. and N.H. Farhat, Digest OSA Annual Meeting, Wash., D.C., p. 48, 1985.
14. Psaltis, D., et al., Digest OSA Annual Meeting, Wash., D.C., p. 58, 1985.
15. Condon, D.J., et al., OSA Topical Meeting on Optical Computing, Incline Village, Nov. 1985 (postdeadline paper).
16. Fisher, A.D., C.L. Giles, and J.N. Lee, J. Opt. Soc. Am. A, 1, 1337, 1984.
17. Fisher, A.D. and C.L. Giles, Proc. of the IEEE COMPCON Spring Meeting, IEEE Computer Society, IEEE Cat. No. CH2135-2/85/0000/0342, pp. 342, 1985.
18. Fisher, A.D., et al., SPIE, 625, Bellingham, WA, 1986.
19. Soffer, B., et al., Opt. Lett., 11, 118, 1986.
20. Anderson, D.Z., Opt. Lett., 11, 56, 1986.
21. Yariv, A. and S.K. Kwong, Opt. Lett., 11, 186, 1986.
22. Farhat, N., et al., in Neural Networks for Computing, AIP Conf. Proc. 151, J.S. Denker, Editor, 146, 1986.
23. Metropolis, N., et al., J. Chem. Phys., 21, 1087, 1953.
24. Kirkpatrick, S., et al., Science, 220, 671, 1983.
25. Hinton, G.E., et al., Carnegie-Mellon University Report No. CMU-CS-84-119, May 1984.
26. Sejnowski, T.J. and C.R. Rosenberg, Johns Hopkins University, Electrical Engineering and Computer Science Technical Report No. JHU/EECS-86/01, 1986.
27. Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Institute for Cognitive Science Report 8506, Univ. of California, San Diego, Sept. 1985.
28. Farhat, N., Optics Letters (accepted for publication).
29. Cohen, M.A. and S. Grossberg, IEEE Trans. on Syst., Man and Cybern., SMC-13, 815, 1983.
30. Ross, W., D. Psaltis, and R. Anderson, Optical Engineering, 22, 485, 1983.
31. Kirkpatrick, S., Private Communication, 1987.
32. Ticknor, A.J., H.H. Barrett, and R.L. Easton, Jr., Post-Deadline Paper PD-3, OSA Topical Meeting on Optical Computing, Incline Village, Nev., 1985.
33. Morris, G.M., Optical Engineering, 24, 86, 1985.
34. Marron, J., A. Martino, and G.M. Morris, App. Optics, 25, 26, 1986.
35. Devos, F., P. Garda, and P. Chavel, Optics Letters, 12, 152, 1987.
36. Tsuchiya, Y., E. Inuzuka, T. Kurono, and M. Hosada, in Advances in Electronics and Electron Physics, B.L. Morgan (Ed.), 21, Academic Press, 1985.


Appendix VI

Bimodal Stochastic Optical Learning Machine

N. H. Farhat and Z. Y. Shae

University of Pennsylvania
Electrical Engineering Department

Philadelphia, PA 19104

Abstract

Self-organization and learning is an important attribute of neural nets that sets them apart from other approaches to signal processing. The study of stochastic learning by simulated annealing in the context of a Boltzmann machine is attractive because it could shed light on the role of noise in biological neural nets and because it can lead to artificial neural nets that can be switched between two distinct operating modes depending on the noise level in the network. At finite noise level (or temperature) the net can be operated in a "soft" mode where learning can take place by automated synaptic modifications. Once learning is completed the net is "hardened" (or frozen) and acts as associative memory by reducing the noise level or temperature to zero. We present the results of numerical and experimental study aimed at opto-electronic realization of such networks. The results include: (a) fast annealing by noisy thresholding which demonstrates that the global energy minimum of a small analog test network can be reached in a matter of a few tens of neuron time constants, (b) stochastic learning with binary weights which paves the way for the use of fast binary and nonvolatile spatial light modulators to realize synaptic modifications.

1 System Architecture

Optics and opto-electronic architectures and techniques can play an important role in the study and implementation of self-programming networks and in speeding up the execution of learning algorithms. Learning requires partitioning a net into layers with a prescribed communication pattern among them. A method for partitioning an opto-electronic analog of a neural net into input, output, and internal groups (layers) of neurons with a selective communication pattern among neurons within each layer and between layers, that is capable of stochastic learning by means of a simulated annealing algorithm in the context of a Boltzmann machine formalism, is described in Fig. 1(a). The network, consisting of N neurons, is partitioned into three groups. Two groups, V1 and V2, represent visible or environmental units that can be used as input and output units respectively. The third group H are hidden units. The partition is such that N1 + N2 + N3 = N where subscripts 1, 2, and 3 on N refer to the number of neurons in the V1, V2 and H groups respectively. The interconnectivity matrix, designated


here as wij, is partitioned into nine submatrices, A, B, C, D, E, and F plus three zero submatrices shown as blackened or opaque regions of the wij mask. The LED array represents the state of the neurons, assumed to be unipolar binary (LED on = neuron firing, LED off = neuron not-firing). The wij mask represents the strengths of the interconnection between neurons. Light from the LEDs is smeared vertically over the wij mask with the aid of an anamorphic lens system (not shown in Fig. 1(a)) and light emerging from rows of the mask is focused with the aid of another anamorphic lens system (also not shown) onto elements of the photodetector (PD) array. Bipolar values of wij can be realized in incoherent light by separating each row of the wij mask into two subrows and assigning positive values w+ij to one subrow and negative values w-ij to the other, then focusing light emerging from the two subrows separately onto pairs of adjacent photosites connected in opposition in each of the V1, V2 and H segments of the PD array as described elsewhere [2]. Submatrix A, with N1xN1 elements, provides the interconnection weights of units or neurons within group V1. Submatrix B, with N2xN2 elements, provides the interconnection weights of units within V2. Submatrices C (of N1xN3 elements) and D (of N3xN1 elements) provide the interconnection weights between units of V1 and H and similarly submatrices E (of N2xN3 elements) and F (of N3xN2 elements) provide the interconnection weights of units V2 and H. Units in V1 and V2 can not communicate with each other directly because locations of their interconnectivity weights in the wij matrix or mask are blocked out (blackened lower left and top right portions of wij). Similarly units within H do not communicate with each other because locations of their interconnectivity weights in the wij mask are also blocked out (center blackened square of wij). The LED element θ is of graded response. It can be viewed as representing the state of an auxiliary neuron in the net that is always on to provide a threshold level to all units by contributing to the light focused onto only negative photosites of the PD array by suitable modulation of pixels in the G column of the interconnectivity mask. This method for introducing the threshold level is attractive as it allows for introducing a fixed threshold to all neurons or an adaptive threshold if desired. It can also be employed to alter the energy landscape of the net adaptively in accordance with the behavior of other parameters of the net. Figure 1(b) shows the arrangement for rapid determination of the net's energy E for use in learning by simulated annealing. A computer works as the system controller to calculate Pij and P'ij and also to control the MOSLM which implements the interconnectivity matrix W. This architecture allows stochastic learning by simulated annealing in the context of a Boltzmann machine. The learning algorithm for the Boltzmann machine can be summarized as follows:

1. Choose one mapping or associated pair that the net is required to learn, and present it to the net. The associated pair consists of two unipolar binary vectors, one an input vector and the other an output vector.

2. Clamp the input vector to the V1 neurons, and the corresponding output vector to the V2 neurons.

3. Employ the simulated annealing method in energy space to find low energy configurations at the given V1 and V2. The final temperature in the cooling schedule is called To and will be used later as an annealing parameter in Cross-Entropy or G-space. During this step, random drawing and change of only the states of the hidden neurons (H) takes place.

4. Repeat steps 2-3 a number of times for all associations the net is required to learn, and collect co-occurrence statistics, i.e., determine the probabilities Pij of the ith and jth neurons being in the same state, i.e., both being on or off.

5. Unclamp the V2 neurons and repeat steps 3-4 for all input vectors, and collect co-occurrence statistics again, i.e., determine the probabilities P'ij of the ith and jth neurons being in the same state. During this step, random drawing and change of both the states of the H and the V2 neurons takes place.

6. All weights in the net are modified by increasing the synaptic weight (Wij) between the ith and jth neurons by a small amount δ if Pij - P'ij > 0; otherwise, the weight is decreased by the same amount. Note this requires multivalued Wij, or incremental variation of Wij, which requires the use of graded response spatial light modulators for realizing synaptic modifications in opto-electronic implementations.

7. We call steps 1-6 a learning cycle. The learning cycle consists of two phases. Phase one involves clamping the input and output units to the associated pairs. Phase two involves clamping the input units to the input vector alone and letting the output units free run with the hidden units. The learning cycle is repeated again and again and is halted after Pij - P'ij is close to zero for every i and j.

The learning procedure described above can be supported in the opto-electronic hardware environment described previously. A minimal software sketch of this learning cycle is given below.
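The following sketch covers steps 1-7 of the procedure, assuming a generic anneal(W, s, free_idx, T0) routine (for example the noisy-threshold annealer sketched in Appendix V) that relaxes the unclamped neurons to near equilibrium and returns a sampled state; statistics are collected here from a single equilibrium sample per association for brevity, whereas the procedure above averages over many, and all names are illustrative:

import numpy as np

def boltzmann_learning_cycle(W, pairs, idx_v1, idx_v2, idx_h, anneal, delta=0.05, T0=0.1):
    """One learning cycle: clamped and free-running phases, co-occurrence statistics
    Pij and P'ij, and graded weight update by +/- delta (step 6)."""
    N = W.shape[0]
    P_clamped = np.zeros((N, N))
    P_free = np.zeros((N, N))
    for v1, v2 in pairs:                        # phase one: input and output clamped
        s = np.zeros(N); s[idx_v1] = v1; s[idx_v2] = v2
        s = anneal(W, s, free_idx=idx_h, T0=T0)
        P_clamped += np.outer(s, s) + np.outer(1 - s, 1 - s)   # both on or both off
    for v1, _ in pairs:                         # phase two: only the input clamped
        s = np.zeros(N); s[idx_v1] = v1
        s = anneal(W, s, free_idx=np.concatenate([idx_v2, idx_h]), T0=T0)
        P_free += np.outer(s, s) + np.outer(1 - s, 1 - s)
    P_clamped /= len(pairs); P_free /= len(pairs)
    W = W + delta * np.sign(P_clamped - P_free)  # graded +/- delta weight modification
    np.fill_diagonal(W, 0)                       # (re-impose the blocked mask regions if needed)
    return W, P_clamped, P_free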

2 Fast Annealing With Noisy Threshold

With the aid of an optical random number generator, a spatially and temporally uncorrelated linear array of percolating light spots of suitable size and intensity range can be generated and imaged onto the PD array of Fig. 1 directly such that both the positive and negative photosites of the PD array are subjected to random irradiance. This introduces a random (noise) component in the threshold. The noisy threshold produces in turn a noisy component in the energy function of the net. The magnitude of the noise components can be controlled by varying the light intensity array irradiating the PD array. The noisy threshold produces therefore random controlled perturbation or "shaking" of the energy landscape of the net. This helps shake the net loose whenever it gets trapped in a local energy minimum. The procedure can be viewed as generating controlled, gradually decreasing deformations or tremors in the energy landscape of the net that prevent entrapment in a local energy minimum and help the net settle into the global minimum energy state or one close to it. Both the random drawing of neurons (more than one at a time is now possible) and the stochastic state update of the net are now done in parallel at the same time. This leads to significant acceleration


of the simulated annealing process. Electronic control of the random light intensity enables realizing any annealing profile. We have presented the results of a numerical study elsewhere [1]. In the following, results of an experimental study aimed at gaining insight into the performance of the noisy threshold scheme are presented.

3 Experimental Results

An annealing experiment based on the noisy threshold algorithm in an opto-electronic neural net is reported. A television screen tuned to an empty channel where no TV station operates is used as the spatio-temporal optical noise source. We use a lens to project the optical noise pattern (snow pattern) onto the photodetector array of an opto-electronic neural net consisting of 16 unipolar binary neurons of the type described elsewhere [2]. The connectivity matrix of the network was the same random ternary matrix utilized in earlier work [1]. The brightness of the TV screen is controlled by the D/A output of a MASSCOMP computer, and the convergent state is monitored by the A/D input of the same computer. A photograph of the experimental arrangement is shown in Figure 2. We investigated four types of cooling profiles: linear, concave, convex, and staircase, illustrated in Figure 3. For each cooling profile, we investigate 5 annealing time intervals: 100, 200, 500, 1000, and 2000 ms. For each cooling profile and annealing time interval, we do the annealing 100 times to collect sufficient statistics, and find the probability that the system converges to its global minimal energy state. The experimental results obtained show that the setup can find the global energy minimum of an artificial neural net of 16 neurons in 2000 ms which corresponds to 32 time constants of the neurons in the test network. A net of neurons with a response time of 1 μs would anneal therefore in a few tens of microseconds and this is expected to be independent of the number of neurons in the net as long as parallel injection of noise in the network is implemented. The cooling profile had no observable effect on this result. The probabilities of convergence to a global minimum as function of the annealing duration for different annealing profiles are shown in Table 1.
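A minimal sketch of the four cooling profiles compared in the experiment, assuming the noise amplitude is ramped from an initial value T0 down toward zero over the chosen annealing interval; the exact concave, convex, and staircase shapes are assumptions, since only their qualitative character is stated here:

import numpy as np

def cooling_profile(kind, T0, n_steps, n_stairs=4):
    """Noise-amplitude schedule T(k), k = 0..n_steps-1, decreasing from T0 toward 0."""
    x = np.linspace(0.0, 1.0, n_steps)
    if kind == "linear":
        return T0 * (1.0 - x)
    if kind == "concave":
        return T0 * (1.0 - x) ** 2             # drops quickly, then levels off
    if kind == "convex":
        return T0 * (1.0 - x ** 2)             # stays high, then drops quickly
    if kind == "staircase":
        return T0 * (1.0 - np.floor(x * n_stairs) / n_stairs)
    raise ValueError(kind)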

4 Stochastic Learning With Binary Weights

The Boltzmann machine learning algorithm described earlier employs graded weights. However, from a practical viewpoint, learning in artificial neural nets can be simplified considerably if binary weights can be used. This would pave the way to using fast nonvolatile binary spatial light modulators (SLMs) such as the magneto-optic SLM and the ferroelectric liquid crystal SLM. However, a Boltzmann machine basically is an adaptive system. If the step size of adaptive changes is too large and the sensitivity of system response to the error signal is high, the machine will generally become unstable. Since a traditional Boltzmann machine has a high sensitivity in response to the error signal, i.e., it responds to the error signal (Pij - P'ij) to modify synaptic weights even when the error signal is very small, small weight variations are required to prevent the system from becoming unstable. However, in a binary weight net (Wij = 1, -1) the step size of adaptive change is large and fixed (-2 or 2). In order to prevent the system from becoming unstable, we increase the inertia of the weights, i.e., weights do not change when a small value of Pij - P'ij occurs. As a result, the learning procedure of the Boltzmann machine in a binary weight net would be identical to the procedure of the graded weights net stated in the system architecture section, except step 6 which is modified as follows: if Pij - P'ij > M, set Wij = 1; if Pij - P'ij < -M, set Wij = -1; otherwise, no change, where M ∈ [0, 1] is a fixed constant.
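A minimal sketch of this modified step 6 for binary weights, assuming the co-occurrence statistics Pij and P'ij are available as arrays and M is the fixed inertia constant (the function name is illustrative):

import numpy as np

def binary_weight_update(W, P_clamped, P_free, M=0.1):
    """Binary-weight Boltzmann update: flip W_ij to +1 or -1 only when the error
    signal Pij - P'ij exceeds the inertia threshold M; otherwise leave it unchanged."""
    diff = P_clamped - P_free
    W = W.copy()
    W[diff > M] = 1
    W[diff < -M] = -1
    return W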

The goal of the Boltzmann machine is to minimize the Cross-Entropy G by means of modifying the weights of the net in a certain order. The G space is an information theoretic measure of the distance between the probability distributions when an environmental input is present in the net and when it is free running with no environmental input applied, and is given by

G = Σ_a P+(V_a) ln [ P+(V_a) / P-(V_a) ]

where P+(V_a) is the probability of the visible units being in the a-th state when the visible units are subjected to the environmental input. Namely, P+(V_a) represents the desired or specified probability for the a-th state. P-(V_a) is the corresponding probability when the net is free-running. Namely, P-(V_a) represents the actual probability generated from the net for the a-th state. P-(V_a) depends on the weights Wij, and so G can be altered by changing Wij. Since, in general, there are local minima in G space, gradient descent search will find a local minimum instead of the global minimum. In order to reach the global minimum in G space, introduction of noise in G space is required. However, if the noise level is too large, the network can not generate the specified or desired environmental distribution. A systematic way for adding noise in G space, i.e., an annealing scheme in G space, has not yet been studied in detail. Here we propose the use of the final temperature To of the simulated annealing schedule used in the energy space E as the annealing parameter in G space, since P-(V_a) is a function of To. In the first few learning cycles, we use high values of To. This will provide a high level of noise in G space. The value of To is decreased gradually along with the number of learning cycles. Accordingly, a simulated annealing process in G space is accomplished by decreasing the final temperature To in a similar way to the simulated annealing process in energy space, which is accomplished by decreasing the annealing temperature T. Note that an annealing schedule with a high value of To is equivalent to a short time interval annealing schedule in E space, i.e., both cases can generate a high level of noise in G space, and vice versa. Accordingly, the annealing time interval in E space can also be used as an annealing parameter in G space. As a result, a simulated annealing process in G space can also be accomplished by gradually increasing the annealing time interval in E space along with the number of learning cycles. Results of computer simulations of stochastic learning by simulated annealing in a Boltzmann machine employing both graded and binary weights are presented in the next section.
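A minimal sketch of the cross-entropy measure G defined above, assuming the clamped (environmental) and free-running probability distributions over visible-unit states are given as arrays; the function name and small epsilon guard are illustrative:

import numpy as np

def cross_entropy_G(P_plus, P_minus, eps=1e-12):
    """G = sum_a P+(V_a) ln[ P+(V_a) / P-(V_a) ] over visible-unit states V_a."""
    P_plus = np.asarray(P_plus, dtype=float)
    P_minus = np.asarray(P_minus, dtype=float)
    return float(np.sum(P_plus * np.log((P_plus + eps) / (P_minus + eps))))

# usage: G approaches 0 as the free-running distribution approaches the environmental one
print(cross_entropy_G([0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]))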

5 Simulation Results

In these simulations we use the noisy threshold (N-T) annealing scheme [1] and use the annealing time interval in E space as an annealing parameter in G space. All the simulations learn to solve a 4-2-4 encoder problem [3] in the context of the Boltzmann machine formalism, i.e., this consists of having a three layered net, of the kind described in the architecture section, learn to form its own internal representations of the associations presented to it. For all simulations, the net reaches equilibrium 100 times (25 times for each input vector) for collecting the statistics of Pij during the input and output clamping phase. The situation is the same for collecting the statistics of P'ij. All annealing schedules are stated in the corresponding Figures in the notation of I @ T explained earlier [1]. The noise we used is binary noise whose amplitude is either T or -T and is decreased gradually in time and terminated at To. Figure 4 shows the results of the linear weight learning scheme, and Figure 5 shows the results of the binary weight learning scheme. All Figures show the results for 12 runs. The parameter M we used is 0.1. Only two annealing schedules in E space are used for the annealing in G space. During the first half of the total number of learning cycles the short time interval annealing schedule is employed, and during the later half of the learning cycles the long time interval annealing schedule is employed. These results show the viability of the annealing scheme in G space, and also show the viability of the binary weight

stochastic learning scheme.
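The following is a minimal, illustrative Python sketch of how the noisy-threshold annealing, the clamped and free-running statistics collection, and the two weight-update schemes described above could be organized. It is a reading of the description in this section, not the code used to produce Figures 4 and 5; the function names, the use of one full asynchronous sweep per neuron time constant, and the particular binary-weight update (a sign step on a bounded weight scale) are assumptions introduced here for illustration.

    import numpy as np

    def nt_anneal(W, theta, s, clamped, schedule, rng):
        # Noisy-threshold (N-T) annealing: binary noise of amplitude +/-T is added
        # to each threshold; `schedule` is a list of (n_sweeps, T) pairs, e.g.
        # [(2, 3.0), (1, 1.5), (1, 1.0), (2, 0.1)] for "2 @ 3, 1 @ 1.5, 1 @ 1, 2 @ 0.1",
        # with each asynchronous sweep standing in for one neuron time constant.
        n = len(s)
        for n_sweeps, T in schedule:
            for _ in range(n_sweeps):
                for i in rng.permutation(n):
                    if clamped[i]:
                        continue                          # clamped units are held fixed
                    noise = T * rng.choice([-1.0, 1.0])   # binary threshold noise
                    s[i] = 1.0 if W[i] @ s > theta[i] + noise else 0.0
        return s

    def cooccurrence(W, theta, schedule, patterns, clamp_idx, clamp, n_equil, rng):
        # Estimate p_ij = Pr{units i and j both on at the end of annealing},
        # averaged over n_equil equilibrations per environmental pattern.
        n = W.shape[0]
        p = np.zeros_like(W)
        for v in patterns:
            for _ in range(n_equil):
                s = rng.integers(0, 2, n).astype(float)   # random initial state
                held = np.zeros(n, dtype=bool)
                if clamp:                                 # clamped (+) phase
                    s[clamp_idx] = v
                    held[clamp_idx] = True
                s = nt_anneal(W, theta, s, held, schedule, rng)
                p += np.outer(s, s)
        return p / (len(patterns) * n_equil)

    def learning_cycle(W, theta, patterns, clamp_idx, schedule, eps, binary, rng):
        # One learning cycle: 25 equilibrations per pattern in each phase
        # (100 equilibrium reaches in all for the 4-2-4 encoder's four patterns).
        p_plus = cooccurrence(W, theta, schedule, patterns, clamp_idx, True, 25, rng)
        p_minus = cooccurrence(W, theta, schedule, patterns, clamp_idx, False, 25, rng)
        if binary:
            # Illustrative binary-weight step: move each weight toward the sign of
            # (p+ - p-) on a bounded scale (an assumption, not the report's exact rule).
            W += np.sign(p_plus - p_minus)
            np.clip(W, -1.0, 1.0, out=W)
        else:
            W += eps * (p_plus - p_minus)                 # graded (linear) weight scheme
        np.fill_diagonal(W, 0.0)                          # no self-connections
        return W

In this sketch the two annealing schedules mentioned above would simply be two different `schedule` lists, the shorter one used for the first half of the learning cycles and the longer one thereafter.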

6 Conclusions

We have described an architecture for partitioning an opto-electronic analog of a neural net to form a multilayered net that permits self-organization and learning when computer-controlled nonvolatile spatial light modulators are utilized to realize the required plasticity. The focus here is on stochastic rather than deterministic learning because it may provide insight into the role of noise in biological neural nets. We also described opto-electronic architectures that can be used for fast determination of the energy of the net, if such information is needed, and for adaptive deterministic deformation of the net's energy landscape to control its behavior. We show that departure from the conventional simulated annealing algorithm through the use of noisy thresholding in opto-electronic schemes promises to markedly accelerate the annealing process and make stochastic learning practical. Employing the noisy-thresholding scheme, a small opto-electronic neural net (of 16 neurons) was found to reach a global energy minimum, or one close to it, in about 32 neuron time constants. We also show that a binary-weight learning algorithm can be used in the context of a modified Boltzmann machine. This paves the way to the use of nonvolatile binary spatial light modulators to realize the required plasticity in such stochastic learning nets. Such nets, having learned their environmental inputs, can be "frozen" for use as associative memories of the entities learned by merely removing the injected noise from the net. Noise injection for annealing returns the nets to a "soft" mode for learning new environmental inputs.
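As a small illustration of the "frozen" associative-memory mode just described, and reusing the hypothetical nt_anneal helper from the sketch in Section 5, removing the injected noise amounts to running the same update with T = 0 so that the net relaxes deterministically from a cue; this is an illustrative reading of the text above, not a procedure taken from the report.

    def recall(W, theta, cue, cue_idx, n_sweeps, rng):
        # "Frozen" mode: no injected threshold noise (T = 0), so the net settles
        # deterministically toward a learned stable state consistent with the cue.
        n = W.shape[0]
        s = rng.integers(0, 2, n).astype(float)
        s[cue_idx] = cue                        # apply the partial input as a cue
        held = np.zeros(n, dtype=bool)
        held[cue_idx] = True                    # hold the cue fixed during relaxation
        return nt_anneal(W, theta, s, held, [(n_sweeps, 0.0)], rng)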

7 Acknowledgement

This research was supported by grants from DARPA and the U.S. Army Harry Diamond Laboratory.



8 References

1. Nabil H. Farhat and Z. Y. Shae, "Architectures and Methodologies for Self-Organization and Stochastic Learning in Opto-Electronic Analogs of Neural Nets", Proc. IEEE First Annual International Conference on Neural Networks (ICNN), San Diego, California, June 1987.

2. Nabil H. Farhat, Demetri Psaltis, Aluizio Prata, and Eung Paek, "Optical Implementation of the Hopfield Model", Applied Optics, Vol. 24, pp. 1469-1475, May 15, 1985.

3. G. E. Hinton, T. J. Sejnowski, and D. H. Ackley, "Boltzmann Machines: Constraint Satisfaction Networks That Learn", Technical Report CMU-CS-84-119, Carnegie-Mellon University, 1984.

Fig. 1. Architecture for opto-electronic analog of layered self-programming net: (a) partitioning concept and (b) arrangement for rapid determination of the net's global energy E for use in learning by simulated annealing.

Fig. 4. Linear-weight learning curve with the N-T algorithm. Annealing schedule in E space: the short time-interval schedule (2 @ 3, 1 @ 1.5, 1 @ 1, and 2 @ 0.1) during the 0-25th learning cycles and a longer time-interval schedule during the 26-50th learning cycles. This is the annealing scheme in G space.

Fig. 5. Binary-weight learning curve with the N-T algorithm. Annealing schedule in E space: the short time-interval schedule (2 @ 3, 1 @ 1.5, 1 @ 1, and 2 @ 0.1) during the 0-50th learning cycles and a longer time-interval schedule (ending with 4 @ 0.5, 4 @ 0.1, and 4 @ 0) during the 51-100th learning cycles. This is the annealing scheme in G space.


Fig. 2. Pictorial view of opto-electronic neural net of 16 unipolar binary neurons with random ternary weights used to verify fast annealing by noisy thresholding.

    Annealing duration    linear    concave    convex    step
    100 ms                0.48      0.46       0.50      0.45
    200 ms                0.62      0.56       0.68      0.64
    300 ms                0.78      0.73       0.77      0.79
    1000 ms               0.83      0.86       0.84      0.88
    2000 ms               0.97      0.96       0.96      0.98

Table I. The probabilities of convergence to a global minimum as a function of the annealing duration for different annealing profiles.


Fig. 3. Cooling profiles.