
Nuclear Instruments and Methods in Physics Research A279 (1989) 537-545

North-Holland, Amsterdam

TRACK FINDING WITH NEURAL NETWORKS

Carsten PETERSON
Department of Theoretical Physics, University of Lund, Sölvegatan 14A, S-22362 Lund, Sweden

Received 6 December 1988 and in revised form 6 March 1989

A neural network algorithm for finding tracks in high energy physics experiments is presented. The performance of the algorithm is explored on modest size samples with encouraging results. It is inherently parallel and thus suitable for execution on a conventional SIMD architecture. More important, it naturally lends itself to direct implementation in custom made hardware, which would permit real time operation and hence facilitate fast triggers. Both VLSI and optical technology implementations are briefly discussed.

1. Introduction

Experimental particle physics contains many pattern recognition problems ranging from track finding to feature extraction. Track finding is a combinatorial optimization problem: given a set of signals in space, reconstruct the particle trajectories subject to smoothness constraints. When exploring all possibilities the computing requirements grow like N!. The problem is NP-complete*. Various heuristic algorithms are therefore used (see ref. [2] and references therein).

With the next generation of accelerators, e.g. LEP and SSC, one has to deal with very high multiplicity events appearing at an extremely rapid rate. For example, at SSC one expects ≈ 4 events per 100-250 ns [3] with charged multiplicities in the 100-500 range. The majority of these events are of "minimum bias" type and are likely to be ignored after a low level trigger (absence of jets, fast leptons, etc.). However, a substantial fraction of the events (O(1/10)) contain high p_T jets and could be potentially interesting enough to record and analyze. Even after this data reduction one is faced with an enormous data processing task. With O(100) signals per track one has to store O(10⁵) real numbers per μs. It is clear that if the tracks could be reconstructed at this level a lot is gained. First, the amount of data is compressed by a factor ≈ 10. Second, and maybe more important, this information (particle identification, etc.) could be used for higher level triggers. As an example, it would be possible to trigger on events with large neutral energy. To make this happen

* An optimization problem is NP-complete if there is no known algorithm whose worst-case complexity (≈ time consumption) is bounded by a polynomial in the size of the input. For a review of these problems see ref. [1].

0168-9002/89/$03.50 © Elsevier Science Publishers B.V. (North-Holland Physics Publishing Division)

one needs a reliable track finding algorithm that naturally lends itself to parallel processing and custom hardware.

There has been an upsurge of interest in neural network computational models in the last two years. These models have convincingly demonstrated a remarkable performance in their ability to learn pattern classifications (see e.g. ref. [4]). However, there exists another promising but less publicized application area for this technology: NP-complete problems [5,6]. Given the great VLSI potential of neural network models we find it worthwhile to explore the possibility of using neural network algorithms for the track finding problem. As mentioned above, the ultimate goal would be a custom hardware implementation. One should mention, though, that even a lower ambition level could be very rewarding; the intrinsic parallelism in neural network models can be fully exploited on commercially available SIMD* architectures like the CRAY and the Connection Machine. Hence off-line analysis could be sped up substantially.

This paper is organized as follows: in section 2 we briefly review the basic ingredients and concepts of neural networks for the benefit of the newcomer to this field. Using neural networks for solving constraint satisfaction problems is discussed in section 3 and illustrated by a transparent case study, the graph bisection problem [6]. In section 4 we map the track finding problem onto a neural network, and section 5 contains numerical explorations of this approach. Possible hardware implementations, VLSI and optical, are briefly discussed in section 6. Finally a brief outlook and summary can be found in section 7.

* SIMD = Single Instruction Multiple Data.


2. Neural network basics

The basic ingredients in a neural network are N binary neurons S_i = ±1 and the synaptic strengths T_ij (i, j = 1, ..., N) that connect the neurons. These connection strengths can take both positive and negative real values and are in most models assumed to be symmetric, T_ij = T_ji. The dynamics of almost all neural network models is given by a local updating rule

$$S_i = \operatorname{sign}\Bigl(\sum_j T_{ij} S_j\Bigr). \qquad (1)$$

With eq. (1) each neuron evaluates the sign of the "upstream activity" (see fig. 1) and updates accordingly. Positive T_ij elements promote signals whereas negative ones inhibit them. At a given time the state of the system is given by the vector S = (S_1, S_2, ..., S_N). The dynamics of the system is determined by the T-matrix. It is easy to show that the updating rule of eq. (1) corresponds to gradient descent of the energy function [7]

$$E(S) = -\tfrac{1}{2}\sum_{i,j} T_{ij} S_i S_j. \qquad (2)$$

In other words, for a given initial condition, eq. (1) takes us to the closest local minimum of the energy function given by eq. (2). There is a close analogy between this system and certain statistical mechanics systems; eq. (2) is the Ising spin glass Hamiltonian and the local updating rule of eq. (1) corresponds to Glauber dynamics [8] at zero temperature.
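The discrete dynamics of eqs. (1) and (2) is compact enough to sketch directly. The following illustration (ours, not from the paper; the random T-matrix is just a stand-in instance) applies asynchronous sign-updates and checks that the energy never increases:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric connection matrix T_ij with zero diagonal.
N = 20
T = rng.normal(size=(N, N))
T = (T + T.T) / 2
np.fill_diagonal(T, 0.0)

def energy(S):
    """E(S) = -1/2 sum_ij T_ij S_i S_j, eq. (2)."""
    return -0.5 * S @ T @ S

# Random initial state of binary neurons S_i = +-1.
S = rng.choice([-1.0, 1.0], size=N)
energies = [energy(S)]

# Asynchronous updates, eq. (1); for symmetric T each flip can only lower E.
for sweep in range(100):
    for i in rng.permutation(N):
        S[i] = 1.0 if T[i] @ S >= 0.0 else -1.0
    energies.append(energy(S))
    if energies[-1] == energies[-2]:   # no change: a local minimum of eq. (2)
        break
```

The final state is the local minimum closest (in the gradient descent sense) to the random starting point, exactly as described above.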

In some applications it is desirable to reach the global minimum rather than the closest local one. One then has to employ a stochastic "hill-climbing" procedure. One way to do this is by exposing the system defined by eq. (2) to a noisy environment, in which the state vector S obeys the Boltzmann distribution

$$P(S) = \frac{1}{Z}\,e^{-E(S)/kT}, \qquad (3)$$

where the partition function Z is given by

$$Z = \sum_{S} e^{-E(S)/kT}. \qquad (4)$$

Fig. 1. A generic neural network update.


In eqs. (3) and (4) the temperature T represents the noise and k is the Boltzmann constant (in what follows we put k = 1). Ensemble members of this distribution can be generated through standard Monte Carlo procedures (Metropolis, heat bath, etc.) at a substantial CPU expense. However, it turns out that for spin systems with many degrees of freedom and large connectivity (T_ij connects not only locally) such time consuming simulations can be replaced by a set of deterministic equations by using the so-called mean field theory (MFT) approximation [5,6,9]. The key ingredients of this trick are the following. Rewrite the discrete sum of eq. (4) as a multidimensional integral over the continuous variables U = (U_1, ..., U_N) and V = (V_1, ..., V_N):

$$Z = C \prod_{i=1}^{N}\int \mathrm{d}V_i \int \mathrm{d}U_i \; e^{-E'(V,U,T)}, \qquad (5)$$

where C is a constant, ∏∫∫ denotes the multiple integrals, and

$$E'(V, U, T) = E(V)/T + \sum_{i=1}^{N}\bigl[\,U_i V_i - \log(\cosh U_i)\,\bigr]. \qquad (6)$$

The MFT approach consists of approximating Z with the value of the integrand at the saddle point, given by ∂E'/∂U_i = 0 and ∂E'/∂V_i = 0, which yields

$$V_i - \tanh U_i = 0 \qquad (7)$$

and

$$\frac{1}{T}\,\frac{\partial E(V)}{\partial V_i} + U_i = 0, \qquad (8)$$

where the new variables V_i denote the thermal average of S_i:

$$V_i = \langle S_i \rangle_T.$$

Combining eqs. (2), (7) and (8), we get

$$V_i = \tanh\Bigl(\sum_j T_{ij} V_j / T\Bigr). \qquad (9)$$

Thus the behaviour of the neural network at some temperature T can be emulated by the sigmoid updating rule of eq. (9) (see fig. 2). The step function (1) is the T → 0 limit of eq. (9). Eqs. (9) are solved by iterative techniques. Improved convergence is often obtained by approaching the steady state with

$$\frac{\mathrm{d}U_i}{\mathrm{d}t} = -U_i + \frac{1}{T}\sum_j T_{ij} V_j, \qquad V_i = \tanh U_i, \qquad (10)$$

or, equivalently,

$$\frac{\mathrm{d}U_i}{\mathrm{d}t} = -U_i + \sum_j T_{ij} V_j, \qquad (11)$$

where

$$V_i = \tanh(U_i/T). \qquad (12)$$

In eqs. (11) and (12) we have rescaled the U_i of eqs. (7) and (8) with U_i → U_i/T in order to facilitate comparisons with VLSI implementations in section 6.

What are these neural network systems good for? There are two distinct application areas: feature recognition and optimization problems.
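As a toy illustration of eq. (9) (ours, with a random symmetric T-matrix as a stand-in problem), a damped fixed-point iteration shows the characteristic temperature dependence described above: at high T the mean field variables stay near zero, while at low T the neurons saturate towards ±1:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random symmetric "problem" matrix T_ij with zero diagonal, standing in
# for the programmed synaptic strengths.
N = 20
T_mat = rng.normal(size=(N, N))
T_mat = (T_mat + T_mat.T) / 2
np.fill_diagonal(T_mat, 0.0)

def mft_solve(temp, n_sweeps=500):
    """Iterate V_i = tanh(sum_j T_ij V_j / temp), eq. (9), with damping
    to avoid the cycles that plain synchronous iteration can produce."""
    V = rng.uniform(-1e-4, 1e-4, size=N)   # small random start
    for _ in range(n_sweeps):
        V = 0.5 * V + 0.5 * np.tanh(T_mat @ V / temp)
    return V

V_hot = mft_solve(temp=50.0)    # high temperature: V stays near 0
V_cold = mft_solve(temp=0.5)    # low temperature: neurons saturate
```

The function name and the damping factor are our choices; the paper itself only specifies the fixed-point equation.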


Fig. 2. Sigmoid gain functions (eq. (9)) for different temperatures (T = 0.2, 1.0, 5.0). The step function updating rule of eq. (1) corresponds to T → 0.

- Feature recognition. Here one teaches the system a set of patterns by means of an appropriate choice of T_ij. In the production phase one then initiates the network with an incomplete pattern and the network completes it through eq. (1).

- Optimization. By appropriate "programming" of T_ij the network relaxes to a stable state that represents the solution of the problem.

Needless to say, eqs. (1) and (9) are inherently parallel, which facilitates simulations of neural networks on parallel processors. In section 6 we discuss how eqs. (1) and (9) can be directly implemented in VLSI and optical technologies.

3. Graph bisection revisited

Let us here review the results from ref. [6] on using a neural network approach to obtain approximate solutions to the graph bisection problem. This problem, which constitutes an illustrative application, is defined as follows (see fig. 3):

Consider a set of N nodes, which are either connected with each other or not. Partition them into two halves such that there are N/2 nodes in each half. This problem can easily be mapped onto the Hopfield discrete neural network of eq. (2) by the following representation: for each node, assign a neuron S_i = ±1, and for each pair of nodes i ≠ j assign T_ij = 1 if they are connected and T_ij = 0 if they are not connected. In terms of fig. 3, we let S_i = ±1 represent whether node i is in the left or in the right partition.

Fig. 3. A graph bisection problem.


With this notation, the product T_ij S_i S_j is zero whenever nodes i and j are not connected at all, positive whenever connected nodes i and j are in the same partition, and negative when they are in separate partitions. With this representation, minimization of eq. (2) will favour connections within a partition while minimizing the connections between partitions. However, the net result will be that all nodes are forced into one partition. Hence we must add a "constraint term" to the right-hand side of eq. (2) that penalizes situations where the nodes are not equally partitioned. We note that ΣS_i = 0 when the partitions are balanced. Hence, a term proportional to (ΣS_i)² will increase the energy whenever the partition is unbalanced. Our neural network energy function for graph bisection then takes the form:

$$E = -\tfrac{1}{2}\sum_{i,j} T_{ij} S_i S_j + \frac{\alpha}{2}\Bigl(\sum_i S_i\Bigr)^{2}, \qquad (13)$$

where the imbalance parameter α sets the relative strength between the cut-size term and the balancing term.

Again, gradient descent on the energy surface defined by eq. (2) can be performed by making local updates with a step-function updating rule (see eq. (1)):

$$S_i = \operatorname{sign}\Bigl(\sum_j (T_{ij} - \alpha) S_j\Bigr). \qquad (14)$$

This procedure takes us to the closest local minimum rather than the global minimum that we desire. As discussed in section 2 this situation can be remedied by using MFT equations, which in this case have the form

$$V_i = \tanh\Bigl(\sum_j (T_{ij} - \alpha) V_j / T\Bigr). \qquad (15)$$

Before discussing the performance of eq. (15) we should make an important comment. The generic form of the energy function in eq. (13) is

$$E = \text{``cost''} + \text{``constraint''}. \qquad (16)$$

This is very different from a more standard heuristic treatment of the optimization problem. For example, in the case of graph bisection one typically starts in a configuration where the nodes are equally partitioned and then proceeds by swapping pairs subject to some acceptance criteria. In other words, the constraint of equal partition is respected throughout the updating process. This is in sharp contrast to the neural network technique (eqs. (13), (16)), where the constraints are implemented in a "soft" manner by the Lagrange multiplier α.

In ref. [6] extensive studies of the performance of eq. (15) were made on graph bisection problems with sizes ranging from N = 20 to 2000 with very encouraging results. In fig. 4 we show typical results from ref. [6], where the neural network MFT algorithm is benchmarked against the more conventional simulated annealing and local optimization algorithms.

Fig. 4. Comparison of the performance of the neural network of eq. (15) (a) for an N = 2000 geometric graph bisection problem with the more conventional approaches, local optimization and simulated annealing (b). The relative serial CPU consumption is 1:10:91. (From ref. [6].)

Each histogram in fig. 4 represents 100 different experiments performed on an N = 2000 geometric graph. The results from ref. [6] can be summarized as follows:
- Insensitivity to the choice of α and T. Two parameters

enter into eq. (15). How sensitive is the result to the choice of these? Do these parameters have to be fine-tuned? It turns out that very good quality solutions are obtained over a substantial area of the (α, T)-plane.

- Quality of the solutions. The algorithm outperforms local optimization and falls on the tail of the very time consuming simulated annealing (see fig. 4).

- Imbalance of the final solution. Since the constraint of bisection is soft, most MFT solutions have a small imbalance, i.e. ΣS_i ≠ 0 *. This is easily remedied by a greedy heuristic, in which one searches in the larger set for a neuron that can swap sign with the least increase of the cut size.

- Convergence times. As is clear from the caption of fig. 4, the MFT algorithm is very competitive even when executed serially. The algorithm scales linearly with problem size.

4. Mapping the track finding problem onto a neural network

Consider a set of N space points or signals. For simplicity we limit the discussion to 2 dimensions. The

* In many engineering applications this is of no significance .


Fig. 5. N signals and directed line segments (neurons).

track finding problem consists of drawing a number of continuous, nonbifurcating and smooth tracks through these points. In order to cast this problem onto a neural network algorithm we define N(N−1) directed line segments S_ij between the N signals (see fig. 5). As will be discussed in the next section, the number of segments that need to be considered is considerably less than N(N−1) in most realistic examples.

Let these segments S_ij be represented by binary neurons such that S_ij = 1 if the segment i → j is part of a track and S_ij = 0 if it is not. In order to define an appropriate energy function for the network we need two measurements: the lengths of segments and the angles between adjacent line segments (see fig. 6). Note that the definition of the angles θ_ijl takes the directions of S_ij and S_jl into account. Since our neurons in this case are 2-dimensional (they carry two indices), the synaptic strengths have 4 dimensions. Again the energy function takes the generic form of eq. (16):

$$E = -\tfrac{1}{2}\sum_{ijkl}\bigl(T^{(\mathrm{cost})}_{ijkl} + T^{(\mathrm{constraint})}_{ijkl}\bigr)\,S_{ij} S_{kl}. \qquad (17)$$

We want the "cost" part of T_ijkl to be such that short adjacent segments with small relative angles are favoured. There are many ways to accomplish this. We have made the choice

$$T^{(\mathrm{cost})}_{ijkl} = \delta_{jk}\,\frac{\cos^{m}\theta_{ijl}}{r_{ij} + r_{jl}}, \qquad (18)$$

where m is an odd exponent. With this choice the energy of eq. (17) will blow up for long segments and large angles (in particular for θ > π/2, when the contribution to E becomes positive). An example of T^(cost) is shown in fig. 7a. We next turn to the "constraint" term.

Fig. 6. Definition of segment lengths r_ij and angles θ_ijl between segments.
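To make eq. (18) concrete, here is a small helper (ours, not from the paper) that evaluates the cost weight for two adjacent segments i → j and j → l from 2-dimensional points. Straight continuations get large positive (excitatory) weights, a 90° kink is suppressed to zero, and a back-turn (θ > π/2) gets a negative (inhibitory) weight, as described above:

```python
import numpy as np

def cost_weight(p_i, p_j, p_l, m=7):
    """T^(cost) of eq. (18) for adjacent segments i -> j and j -> l:
    cos^m(theta_ijl) / (r_ij + r_jl), where theta_ijl is the turning
    angle between the two directed segments (m odd)."""
    a = np.asarray(p_j, dtype=float) - np.asarray(p_i, dtype=float)  # i -> j
    b = np.asarray(p_l, dtype=float) - np.asarray(p_j, dtype=float)  # j -> l
    r_ij = np.linalg.norm(a)
    r_jl = np.linalg.norm(b)
    cos_theta = np.dot(a, b) / (r_ij * r_jl)
    return cos_theta**m / (r_ij + r_jl)

# Straight continuation: maximal positive weight for the given lengths.
w_straight = cost_weight((0, 0), (1, 0), (2, 0))
# A 90-degree kink: the weight collapses to zero.
w_kink = cost_weight((0, 0), (1, 0), (1, 1))
# A backward turn (theta > 90 deg): negative weight, raising the energy.
w_back = cost_weight((0, 0), (1, 0), (0.5, 0.1))
```

The function name and the example points are hypothetical; the formula itself is eq. (18) with the paper's later choice m = 7 as default.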

Fig. 7. Example of excitatory (a) and inhibitory (b) connections to a neuron (S_23) corresponding to eqs. (18) and (20) respectively.

For this problem it consists of two parts:

$$T^{(\mathrm{constraint})}_{ijkl} = T^{(1)}_{ijkl} + T^{(2)}_{ijkl}. \qquad (19)$$

The first term in eq. (19) takes care of the requirement that there should be no bifurcating tracks; two segments with one identical index should not be "on" at the same time. In our language this requires that such configurations give positive contributions to the energy function of eq. (17). This is done with the following choice for T^(1)_ijkl:

$$T^{(1)}_{ijkl} = -\tfrac{1}{2}\alpha\bigl[\delta_{ik}(1-\delta_{jl}) + \delta_{jl}(1-\delta_{ik})\bigr], \qquad (20)$$

where α is a Lagrange multiplier. In other words, connections between neurons in the same row or column are negative (see fig. 7b). The other term, T^(2)_ijkl, ensures that the number of neurons that are "on" is roughly equal to the number of signals N. It has the generic form

$$T^{(2)}_{ijkl} = -\beta\,[\text{``global inhibition''}], \qquad (21)$$

where β is another Lagrange multiplier. Its algebraic structure is more transparent in the expression for the total energy,

$$E = -\tfrac{1}{2}\sum_{ijl}\frac{\cos^{m}\theta_{ijl}}{r_{ij}+r_{jl}}\,S_{ij} S_{jl} + \frac{\alpha}{2}\sum_{i,j}\sum_{l\neq j} S_{ij} S_{il} + \frac{\alpha}{2}\sum_{i,j}\sum_{k\neq i} S_{ij} S_{kj} + \frac{\beta}{2}\Bigl(\sum_{kl} S_{kl} - N\Bigr)^{2}. \qquad (22)$$

As in the graph bisection case we want to avoid local minima. Hence we employ MFT equations analogous to eqs. (9) or (11) and (12),

$$V_{ij} = \tanh\Bigl(-\frac{1}{T}\,\frac{\partial E}{\partial V_{ij}}\Bigr), \qquad (23)$$

which with eq. (22) yields

$$V_{ij} = \tanh\Biggl(\frac{1}{T}\Biggl[\sum_{l}\frac{\cos^{m}\theta_{ijl}}{r_{ij}+r_{jl}}\,V_{jl} + \sum_{k}\frac{\cos^{m}\theta_{kij}}{r_{ki}+r_{ij}}\,V_{ki} - \alpha\Bigl(\sum_{l\neq j} V_{il} + \sum_{k\neq i} V_{kj}\Bigr) - \beta\Bigl(\sum_{kl} V_{kl} - N\Bigr)\Biggr]\Biggr), \qquad (24)$$

where again mean field variables V_ij = ⟨S_ij⟩_T have been introduced. Eq. (24) contains three parameters, α,


β and the temperature T. In the next section we explore eq. (24) numerically on modest size test problems.

5. Numerical explorations

We will now solve eq. (24) iteratively for test problems. In section 6 we briefly outline how this can be done with dedicated VLSI or optical hardware.

With our representation, N signals require N(N−1) neurons and equations. This number can be reduced in most applications. It is very unlikely that two signals which are very far apart are directly connected. Hence, one may define a radius R_cut around each signal beyond which no segments connect. There are many possible ways to define R_cut. We have chosen the following: a histogram of segment lengths (see fig. 8) typically peaks around small lengths. Assuming the peak to be approximately symmetric, the cut is given by its high-end slope. In our samples this procedure substantially reduces the number of active neurons. Our results are not very sensitive to R_cut.
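The prescription above leaves some freedom in exactly how the high-end slope is located. A minimal sketch (our interpretation, not the paper's code; the function name and symmetric-mirror rule are assumptions) that builds the segment-length histogram and cuts symmetrically around its peak might look as follows:

```python
import numpy as np

def choose_r_cut(points, n_bins=30):
    """Pick R_cut from the histogram of all pairwise segment lengths:
    locate the peak and, assuming the peak is roughly symmetric, cut at
    the mirror image of its low edge (one possible reading of the
    prescription in the text)."""
    pts = np.asarray(points, dtype=float)
    # All pairwise distances (upper triangle, i < j).
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    lengths = d[np.triu_indices(len(pts), k=1)]
    counts, edges = np.histogram(lengths, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peak = int(np.argmax(counts))
    # Symmetric peak: the high-end slope ends as far above the peak
    # as the shortest lengths lie below it.
    return 2.0 * centers[peak] - lengths.min()

# Toy example: 3 parallel "tracks" of 10 slightly smeared points each
# (hypothetical data, only to exercise the function).
rng = np.random.default_rng(3)
tracks = [[(x, y0 + 0.05 * rng.standard_normal()) for x in range(10)]
          for y0 in (0.0, 5.0, 10.0)]
points = [p for tr in tracks for p in tr]
r_cut = choose_r_cut(points)
```

Segments longer than `r_cut` would then simply be excluded from the neuron set, reducing N(N−1) to the N̄ "active" neurons used below.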

As testbeds we have chosen different images with 5 tracks based on approximately 10 signals per track (see fig. 9). In addition a few runs were made on smaller and larger problems (3 and 10 tracks) in order to get a feeling for the scaling properties. For the parameters in eq. (24) we used T = 1 and α = β = 1. The exponent in the "cost" term in eq. (18) was chosen to be m = 7. The system was initialized at

$$V_{ij} = V^{0} + V^{0}_{ij}, \qquad (25)$$

where V^0 is given by the global constraint requirement (see the last term in eq. (22))

$$\sum_{ij} V^{0} = N, \qquad (26)$$

and the V^0_ij are random numbers in the interval


Fig. 8. Typical distribution of segment lengths for a 5-track problem with 50 signal points.


Fig. 9. Segments with V_ij > 0.1 at different evolution stages of the MFT equations.

[−10⁻⁴, 10⁻⁴]. As a convergence criterion we used

$$\frac{1}{\bar N}\sum_{ij}\bigl|V_{ij}(t+1) - V_{ij}(t)\bigr| < 10^{-3}, \qquad (27)$$

where N̄ is the number of "active" neurons subject to the cut R < R_cut.

For the iterative updating of eq. (24) there exist two extreme alternatives: asynchronous and synchronous updating. Asynchronous updating means that a neuron is randomly picked and updated, and the new value is used in the r.h.s. of eq. (24) for the next updating. In our serial execution we have used a slightly weaker version, sequential updating, where the r.h.s. is evaluated with the most recent values. This is in contrast to the synchronous method, where all neurons are swept over once and then updated. It is well known that nonlinear systems like eq. (24) in general perform better (shorter convergence times and no cycle behaviour) when updated asynchronously. This is relevant when using SIMD architectures like the CRAY and the Connection Machine, which only allow for synchronous updating. Degradation factors of order 2 have been observed already for relatively small systems [10] when using synchronous updating.

After convergence we set S_ij = sign(V_ij). In some cases the final solutions are plagued with violations of the constraints of eq. (20). These are easily removed by a greedy heuristic procedure [6] as follows: after convergence of the MFT equations, find all signals with more than one line segment entering or leaving. Remove the "extra" segments (S_ij = 1) subject to the condition of minimal increase of the "cost" part of the energy function. (In most cases it turns out that the "cost" actually decreases with this procedure.)
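A minimal sketch of this cleanup step (ours, not the paper's code; as a simplification of the minimal-cost-increase criterion we keep, at each conflict, the segment with the largest mean field value):

```python
from collections import defaultdict

def prune_bifurcations(segments, V):
    """Greedy cleanup after MFT convergence: if several segments leave
    (or enter) the same signal, switch all but one of them off. The
    paper removes the extra segments with minimal increase of the
    "cost" energy; as a simplification we keep the segment with the
    largest mean field value V[(i, j)]."""
    on = set(segments)
    for index in (0, 1):            # 0: group by start signal, 1: by end signal
        by_signal = defaultdict(list)
        for seg in on:
            by_signal[seg[index]].append(seg)
        for segs in by_signal.values():
            if len(segs) > 1:       # bifurcation: more than one segment here
                keep = max(segs, key=lambda s: V[s])
                on -= {s for s in segs if s != keep}
    return on

# Hypothetical converged solution: signal 1 has two outgoing segments.
V = {(0, 1): 0.95, (1, 2): 0.90, (1, 3): 0.55, (3, 4): 0.80}
clean = prune_bifurcations(set(V), V)   # drops the weaker segment (1, 3)
```

Segments and mean field values here are illustrative; in a real run they would come from the converged solution of eq. (24).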

In fig. 9 we show segments at different stages of the evolution (N_sweep = 5, 10, 15 and 40 respectively) for a 5-track problem. The development of the individual neurons is shown in fig. 10 and the corresponding energy behaviour is shown in fig. 11.

We have also performed exploratory studies on the

scaling properties of the algorithm for this problem. In fig. 12 we show results from limited statistics runs on three different problem sizes. The problem size behaviour is consistent with a linear convergence time dependence on N̄ as in ref. [6]. The vertical error bars in fig. 12 reflect the fact that different problems of a given size have different N̄.

All in all, our simulation results look very encourag-ing. As in the graph bisection case we observe good


Fig. 10. The neuron variables V_ij as a function of N_sweep for the 5-track problem of fig. 9.

Fig. 11. Energy as a function of N_sweep for the 5-track problem of fig. 9.

solution quality and parameter stability, and again a possible imbalance of the solutions is easily removed with a greedy heuristic.

In contrast to the graph bisection results, our serial computations are not competitive with conventional methods for the track finding problem. The reason is

Fig. 12. Convergence time as a function of problem size for three different N_tracks × N_signals/track (3 × 8, 5 × 10 and 10 × 15).

that N̄ scales approximately as N(N−1), which should be compared with N for more conventional approaches [2]*. The real power of the neural network approach lies in the possibility of implementing it in parallel hardware.

6. Hardware implementations - VLSI and optical

A neural network can either be fully implemented in hardware or virtually implemented in software. A full implementation requires a dedicated processor for each neuron and a dedicated information channel for each interconnect. Virtual implementations, on the other hand, simulate the functions of the N neurons using a smaller number M of processors, by time-multiplexing several neurons on each processor (N > M). The states of the neurons, connection weights and other parameters are stored in local memories. Different mesh and hypercube architectures (e.g. the Connection Machine) are suitable for such a virtual implementation. Depending on the magnitude of M, a substantial speedup in solving our MFT equations can be achieved.

However, for real-time processing a full hardware implementation is desirable. Two different scenarios exist for this: VLSI and optics. Let us review these possibilities very briefly. Nothing in this section is original.

6.1. VLSI

This is the most mature discipline of the two. In thistechnology it is natural to represent the neurons as

* The same increase in the number of degrees of freedom occurs when generalizing graph bisection to graph partition [11].

Fig. 13. A fully connected VLSI neural network.


amplifiers with a feedback circuit consisting of wires, resistors and capacitors [12,13]. The amplifiers have sigmoid input-output relations (cf. eq. (12) and figs. 2 and 13). Each amplifier also has an input capacitor C_i that defines the time constant and facilitates the integrative analog summation of the input signals from the other neurons. The connection strengths are represented by resistors with values R_ij = 1/|T_ij|. The resistors are of course always positive. On the other hand, most neural network modelling (at least the kind we are dealing with here) requires that the T_ij elements can take both positive and negative values. This can be solved by having a pair of "rails" of connections between the neurons, T⁺_ij and T⁻_ij. Each amplifier is given two outputs, a normal one (> 0) and an inverted one (< 0). T⁺_ij and T⁻_ij are both positive but are connected to different outputs depending on the sign of the outgoing voltage from the neuron. In fig. 13 we show such a fully connected VLSI neural network, where we have also included externally provided currents I_i that set the overall and individual threshold levels for the amplifiers.

The time evolution of the circuit in fig. 13 is given by

$$C_i\,\frac{\mathrm{d}U_i}{\mathrm{d}t} = -U_i + \sum_j T^{+}_{ij} V_j + \sum_j T^{-}_{ij}(-V_j) + I_i, \qquad (28)$$

with

$$V_i = \tanh(U_i/T). \qquad (29)$$

These RC-equations are identical to eqs. (11) and (12) with C_i = 1 and I_i = 0. In other words, the RC-equations of a fully connected network of neurons have the MFT solutions as fixed points! This isomorphism constitutes a strong merger of theory and practical implementation. The circuit in fig. 13 could be fabricated with a chip consisting of CCD technology based resistors surrounded by off-chip amplifiers [14,15]. With 0.5 μm design rules a 10000 neuron network (100 signals in the track-finding case) would then occupy an area of 25 mm². The relaxation time for such a circuit is given by τ = RC. Realistic values for C and R are 0.1 fF and 20 MΩ respectively, which gives a very rapid convergence time, τ = O(ns).
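This fixed-point correspondence is easy to check numerically. A crude Euler integration (our sketch, with a random symmetric T-matrix standing in for a programmed problem) of eqs. (11) and (12) relaxes to a state that satisfies the MFT equations (9):

```python
import numpy as np

rng = np.random.default_rng(4)

# A random symmetric connection matrix with zero diagonal.
N = 10
T_mat = rng.normal(size=(N, N))
T_mat = (T_mat + T_mat.T) / 2
np.fill_diagonal(T_mat, 0.0)
temp = 1.0

# Euler integration of eq. (11): dU_i/dt = -U_i + sum_j T_ij V_j,
# with V_i = tanh(U_i / T), eq. (12); C_i = 1 and I_i = 0 as in the text.
U = rng.uniform(-0.01, 0.01, size=N)
dt = 0.05
for step in range(5000):
    V = np.tanh(U / temp)
    U = U + dt * (-U + T_mat @ V)

# At the fixed point U_i = sum_j T_ij V_j, so eq. (9) is satisfied.
V = np.tanh(U / temp)
residual = float(np.max(np.abs(V - np.tanh(T_mat @ V / temp))))
```

The step size and integration length are arbitrary choices; in the analog circuit the same relaxation happens in continuous time on the ns scale estimated above.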

6.2. Optical computing

This novel technology is rapidly gaining momentum. It is well suited for parallel processing, and many demanding applications involve image processing, where optics is natural. Our MFT equations involve matrix multiplications [U_i = Σ_j T_ij V_j], which are natural to implement optically. To see this, consider a set of light emitting diodes (LEDs) and a set of photodetectors (PDs) with a spatial light modulator (SLM) in between (see fig. 14). With appropriate lenses these devices can perform a matrix multiplication when the SLM is used

to store the matrix elements T_ij. Again the connection strengths need to be multiplexed (T⁺_ij and T⁻_ij) in order to account for negative weights. In fig. 14 it is also indicated how to solve eq. (9) by closing a circuit with an amplifier (tanh). As a spatial light modulator one can either use a liquid crystal, which is loaded electronically, or a photorefractive crystal, which is set optically. With the latter technology one can store approximately 10⁹ matrix elements per cm³. This number is certainly smaller than the corresponding numbers for VLSI, but one should keep in mind that with this technology there are no scaling limitations. At present such a system is slower, O(ms), than the VLSI alternative, but the field is young and the very large scale interconnect capacity is hard to beat!

Fig. 14. A generic optical neural network.

7. Summary and outlook

We have mapped the track finding problem onto a neural network. Our encouraging results can be summarized as follows.
- With mean field theory techniques the neural network algorithm produces solutions of high quality for the relatively modest size problems we have explored; N_tracks × N_signals/track = 3 × 8, 5 × 10 and 10 × 15 respectively.

- The calculations were performed on a serial computer. In the neuronic representation of the problem, N signals require O(N(N−1)) neurons. The algorithm is therefore not competitive with conventional approaches when executed serially.

- The real power lies in the inherent parallelism. This can be exploited in two stages: through virtual implementation on general purpose parallel architectures, and with custom made hardware.



- The virtual implementation will presumably turn out to be very competitive with conventional approaches and will be useful for off-line analysis.

- The MFT equations naturally lend themselves to direct implementations in VLSI and optics. In the case of VLSI a strong isomorphism exists. With these technologies it will be possible to perform on-line analysis on the μs scale with O(10⁴-10⁵) degrees of freedom. This could make real time triggers at, e.g., the SSC feasible.

- One should keep in mind that our way of mapping the problem onto a neural network is by no means unique. Recently more efficient ways of mapping the TSP problem onto a neural network have been developed [17], where the S_ij are replaced by vectors S_i subject to normalization constraints. Substantial performance improvements are obtained with this method. It will be very interesting to see if these improvements carry over to the track finding problem.

After the completion of this work we received a paper by Denby [18] with a similar approach to the track finding problem.

References

[1] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (W.H. Freeman, San Francisco, 1979).

[2] H. Grote, Rep. Prog. Phys. 50 (1987) 473.


[3] See, e.g., M.G.D. Gilchriese, in: Proc. 1984 Summer Study of the Design and Utilization of the Superconducting Super Collider, Snowmass, Colorado, ed. J.G. Morfin (1984).

[4] See, e.g., D. Rumelhart and J.L. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1 (MIT Press, 1986).

[5] J.J. Hopfield and D.W. Tank, Biol. Cybernetics 52 (1985) 141.

[6] C. Peterson and J.R. Anderson, Complex Systems 2 (1988) 59.

[7] J.J. Hopfield, Proc. Nat. Acad. Sci. USA 79 (1982) 2554.

[8] R.J. Glauber, J. Math. Phys. 4 (1963) 294.

[9] C. Peterson and J.R. Anderson, Complex Systems 1 (1987) 995.

[10] C. Peterson and E. Hartman, MCC-ACA-ST/HI-065-88, to appear in Neural Networks.

[11] C. Peterson and J.R. Anderson, MCC-ACA-ST-064-88, to appear in Physica D.

[12] J.J. Hopfield, Proc. Nat. Acad. Sci. USA 81 (1984) 3088.

[13] C.A. Mead and M.A. Mahowald, Neural Networks 1 (1988) 91.

[14] L.D. Jackel, R.E. Howard, H.P. Graf, B. Straughn and J.S. Denker, J. Vac. Sci. Technol. B4 (1986) 61.

[15] J.P. Sage, K. Thompson and R.S. Withers, Neural Networks for Computing, Snowbird, UT 1986, ed. J.S. Denker.

[16] See, e.g., C. Peterson and S. Redfield, Proc. 3rd Int. Conf. on Optical Computing, Toulon, France (1988).

[17] C. Peterson and B. Söderberg, LU TP 89-1, to appear in Int. J. Neural Systems.

[18] B. Denby, Comp. Phys. Comm. 49 (1988) 429.
