
1472 J. Opt. Soc. Am. A/Vol. 20, No. 8 /August 2003 G. Loffler and H. S. Orbach

Modeling the integration of motion signals across space

Gunter Loffler and Harry S. Orbach

Department of Vision Sciences, Glasgow Caledonian University, Cowcaddens Road, Glasgow G4 0BA, UK

Received October 7, 2002; revised manuscript received March 5, 2003; accepted March 18, 2003

Experiments by Loffler and Orbach on the integration of motion signals across space [J. Opt. Soc. Am. A 20, 1461 (2003)] revealed that both three-dimensional analysis and object interpretation play a much smaller role than previously assumed. These results motivated the quantitative description of a low-level, bottom-up model presented here. Motion is computed in parallel at different spatial sites, and excitatory interactions operate between sites. The strength of these interactions is determined mainly by distance. Simulations correctly predict behavior for a variety of manipulations on multi-aperture stimuli: aligned and skewed lines, different presentation times, different inter-aperture gaps, and different spatial frequencies. However, strictly distance-dependent mechanisms are too simplistic to account for all experimental data. Mismatches for grossly misoriented lines suggest collinear facilitation as a promising extension. Once incorporated, collinear facilitation not only correctly predicts results for misoriented patterns but also accounts for the lack of motion integration between heterogeneous stimuli such as lines and dots. © 2003 Optical Society of America

OCIS codes: 330.4060, 330.4150, 330.7310.

1. INTRODUCTION

Neurons in the primary visual cortex are responsive only to stimulation in a small part of the visual field: their receptive field.1 Any system that uses such local strategies at its initial stage faces a serious computational problem: Local signals must be combined at some later stage to reveal the true characteristics of the object from which the signals are sampled. In the case of motion computation this problem can be demonstrated with as simple an object as the diamond portrayed in Fig. 1.

For most parts of the object, locally sampled information is incorrect (the aperture problem2). Only certain features (e.g., corners, line tips) allow the true motion signals to be retrieved, and they are therefore thought to play a major role in object motion computation. Indeed, it has emerged that, perceptually, such features very often determine (i.e., capture) the directional perception for contours in their vicinity.2–7

There is, however, a problem inherent in this scheme that is due to features emerging accidentally in the case of partial occlusion (e.g., T junctions). To solve this problem, it has been proposed that the visual system may distinguish between "real" (intrinsic) and "accidental" (extrinsic) features and utilize only intrinsic information to solve the aperture problem.8 However, recent evidence has challenged the simplicity of this view.7,9,10–12 In a companion paper7 (this issue) we have established behaviorally those circumstances in which feature signals (line terminators) capture contour motion (line segments) and when they fail to do so. Such simple stimuli (lines displayed in a multi-aperture-mask arrangement) are well suited to test motion models concerned with the lateral interaction of motion signals. The present paper presents such a model and analyzes in detail the computations necessary for it to successfully predict psychophysical data.


In outline, the model initially computes multiple directions of motion locally and in parallel. This results in many incorrect signals at locations where the object lacks features because of the aperture problem and in a few correct signals at feature locations.13 Signals from different locations then interact with one another and give rise to a direction of motion at individual sites that can be compared with experimental data.

A number of previous studies have attempted to model the lateral interactions that are necessary to account for motion capture. One set of models employs smoothing operations to minimize the difference of neighboring motion signals.14,15 Another approach suggests the use of different units that respond selectively to terminator or contour motion.16 These models have not been based on, nor described in terms of, the physiological architecture of the visual system. The underlying computational considerations are difficult, if not impossible, to identify with neuronal behavior in the visual system.

Other models exist that rely on more complicated mechanisms for motion integration.17–19 However, none have been applied to the case of lines in multi-aperture displays, and their physiological plausibility remains to be verified. More recently, Bayesian or ideal-observer models20,21 have been proposed to model human perception of object motion. Although sometimes related to a neuronal implementation, they approach the problem from a computational perspective. Predictions of human performance are elegantly made from such models on the basis of a small number of simple assumptions about the probability distributions for the set of real-world objects that would produce a given retinal image. As such, they provide one level of description of certain features of human vision and are especially valuable for designing computer vision systems. However, their descriptive value depends on whether they can provide a compact description, in terms of a very small number of heuristics, for the complex nature of human motion perception. If more and more heuristics need to be applied, an elegant teleology descends into mere phenomenology. The experimental evidence that we present requires modifications of these present models.

In contrast to the computational approach, the model presented here has been inspired by, and is based on, our understanding of the physiological architecture of the visual system. Its simplicity and its success in predicting experimental results make it a strong candidate for explaining how motion integration may be accomplished by the visual system.

2. GENERAL MODEL

The global model is based on a local version described in detail elsewhere.13,22 Figure 2 portrays the specific case of a line stimulus shown behind three circular apertures (top). Three identical copies of a local motion model (arranged vertically) are applied to different parts of the stimulus, and their computations are done in parallel. Each of these local models contains two parallel pathways (Fourier and non-Fourier) that extract the motion of luminance boundaries and texture boundaries, respectively. The firing rates of V1 simple-cell filters (B) encode the stimulus at the initial stage of the model, followed by a nonlinearity to account for the typical simple-cell contrast response function (C). Subsequently, the signals are processed in parallel. The Fourier pathway extracts motion by using directionally tuned Reichardt23,24 detectors (G_F) and responds to luminance boundaries. Contrast normalization (H_F) is followed by summation over spatial positions (identified with MT component cells, I_F). In the non-Fourier pathway, squaring the V1 simple-cell responses (D) and second-stage filtering (E) extract texture boundaries. All other steps are qualitatively the same as their Fourier counterparts.

Finally, the signals of Fourier and non-Fourier pathways are combined at the level of hypothesized MT pattern units (K). MT component cell responses (I_F and I_NF) provide the inputs [I_{\phi,x,y}, Eq. (1)] to each MT pattern unit tuned to direction φ at spatial location (x, y):

I_{\phi,x,y} = \sum_{\theta=\phi-120^\circ}^{\phi+120^\circ} \left[ \mathrm{MT^{F}_{component}}(\theta) + \alpha\, \mathrm{MT^{NF}_{component}}(\theta) \right] \cos(\phi - \theta), \qquad (1)

Fig. 1. Rigid moving diamond. The large arrow represents the veridical velocity. The circles show areas where local motion estimation could take place, and the small arrows give the local-motion estimation at the location of the corners (terminator) or the featureless edges.

where MT_component are the Fourier (F) and non-Fourier (NF) component cell outputs, respectively. The weighting factor between the signals of the two pathways, α, was derived from physiological observations.13 The input is weighted by the cosine of the difference between MT component and MT pattern cells' preferred directions of motion.22
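The pooling in Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' code: the function and variable names are ours, and the component responses are assumed to arrive as length-24 sequences indexed by preferred direction in 15° steps.

```python
import math

def mt_pattern_input(mt_comp_f, mt_comp_nf, alpha, phi_idx, n_dirs=24, step=15.0):
    """Cosine-weighted feedforward input to one MT pattern unit [Eq. (1)].

    mt_comp_f, mt_comp_nf -- sequences of length n_dirs holding Fourier and
    non-Fourier MT component responses, indexed by preferred direction
    theta = index * step (deg). Only component directions within +/-120 deg
    of the pattern unit's preferred direction phi contribute.
    """
    phi = phi_idx * step
    total = 0.0
    for k in range(n_dirs):
        # signed angular difference, mapped into (-180, 180]
        diff = (k * step - phi + 180.0) % 360.0 - 180.0
        if abs(diff) <= 120.0:
            total += (mt_comp_f[k] + alpha * mt_comp_nf[k]) * math.cos(math.radians(diff))
    return total
```

Note that within the ±120° window the cosine weight turns negative beyond 90°, so component units tuned far from the pattern unit's preference weakly oppose it rather than drive it.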

MT pattern units engage in a winner-take-all computation by means of recurrent subtractive inhibition between units tuned to different directions.22 In the case of an isolated local network, such interaction can be described by the following set of coupled, nonlinear, first-order differential equations:

\frac{dP_{\phi,x,y}}{dt} = \frac{1}{\tau} \left[ -P_{\phi,x,y} + \mathrm{NR}\!\left( I_{\phi,x,y} - \sum_{\theta=\phi-120^\circ}^{\phi+120^\circ} i_{\theta} P_{\theta,x,y} \right) \right], \qquad (2)

where P_{\phi,x,y} denotes the response of an MT pattern cell tuned to direction φ and with a receptive field centered at spatial location (x, y) as a function of time. The first term in the brackets, −P_{\phi,x,y}, defines the temporally asymptotic behavior that is determined by the arguments of the Naka–Rushton (NR in the equation) function25 (see Appendix A). To achieve winner-take-all behavior, each pattern unit inhibits other pattern cells, and this recurrent inhibition is represented by the last term, the weighted (i_θ) sum over differently tuned pattern cells located at the same spatial site (x, y).26
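The dynamics of Eq. (2) can be integrated numerically with a forward-Euler step. The sketch below is our assumption about the discretization (step size, rectification before the Naka–Rushton stage), not the authors' implementation; the Naka–Rushton parameters follow Table 2 (NR_P) and τ = 20 ms follows the text.

```python
import numpy as np

def naka_rushton(x, m=100.0, sigma=100.0, n=2.0):
    # Naka-Rushton nonlinearity; defaults follow Table 2 (NR_P).
    x = np.maximum(x, 0.0)  # rectification before the nonlinearity (our assumption)
    return m * x**n / (sigma**n + x**n)

def step_wta(P, I, i_w, tau=0.02, dt=0.001):
    """One Euler step of the intra-site winner-take-all dynamics [Eq. (2)].

    P   -- (24,) pattern-cell responses at one site
    I   -- (24,) feedforward inputs from Eq. (1)
    i_w -- (24,) inhibitory weight magnitudes as a function of angular
           difference (circular; i_w[d] applies to a unit d steps away)
    """
    n = len(P)
    # recurrent inhibition: weighted sum over differently tuned units at this site
    inhib = np.array([sum(i_w[(j - k) % n] * P[j] for j in range(n)) for k in range(n)])
    dP = (-P + naka_rushton(I - inhib)) / tau
    return P + dt * dP
```

Iterating `step_wta` drives the most strongly driven unit (and, with the weight set of Table 1, its two nearest neighbours) to dominate while the rest are suppressed.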

If only a single unit survived the competition, the spacing of directional preferences among MT pattern cells would restrict the accuracy of model predictions. However, regardless of the precise spacing, a high accuracy can be achieved by a slight modification of a strict single-winner-take-all network: Each unit does not inhibit its nearest neighbors to both sides22 (see Table 1 below). This, in turn, causes not only the winner but also its two nearest neighbors to survive the competition, and a subsequent parabolic interpolation between the three units permits a high accuracy. Such an approach has the advantage of a sparse neuronal representation without loss of accuracy.
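The winner-plus-two-neighbours readout can be sketched with the standard three-point parabolic peak interpolation. The exact interpolation formula is our assumption; the paper specifies only that the three surviving units are interpolated.

```python
def interp_direction(responses, step=15.0):
    """Parabolic interpolation over the peak unit and its two neighbours.

    responses -- 24 pattern-unit responses (circular, one per 15 deg).
    Returns an estimated direction in degrees, finer than the 15-deg spacing.
    """
    n = len(responses)
    c = max(range(n), key=lambda k: responses[k])          # index of the winner
    yl, yc, yr = responses[(c - 1) % n], responses[c], responses[(c + 1) % n]
    denom = yl - 2.0 * yc + yr
    # vertex of the parabola through the three points, in units of the spacing
    offset = 0.0 if denom == 0 else 0.5 * (yl - yr) / denom
    return ((c + offset) * step) % 360.0
```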

These operations (cosine weighting and inhibitory feedback) have been shown to yield locally a winner-take-all network22 with a steady-state direction of motion mathematically equal to a vector summation. Consequently, operations carried out by the hypothesized MT pattern units can be thought of as the vector summation over signals from motion detectors tuned to different directions of motion from both Fourier and non-Fourier pathways.
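Because of this equivalence, the steady-state readout can be sketched directly as a vector sum (our illustration; names are ours):

```python
import math

def vector_sum_direction(responses, step=15.0):
    """Direction of the vector sum over directionally tuned responses.

    Each unit contributes a vector of length responses[k] pointing in its
    preferred direction k * step (deg); the resultant angle is the predicted
    direction of motion.
    """
    x = sum(r * math.cos(math.radians(k * step)) for k, r in enumerate(responses))
    y = sum(r * math.sin(math.radians(k * step)) for k, r in enumerate(responses))
    return math.degrees(math.atan2(y, x)) % 360.0
```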

Such local models have been shown to predict successfully a variety of psychophysical data, including the perceived motion of gratings and plaids under coherent27–29 and transparent conditions.30,31 More recently, we have shown that such models compute the correct direction of motion of local image features such as line terminators.13

Fig. 2. Outline of the global model. Three identical copies of a local-motion model are applied in parallel to different parts of the stimulus (in this case, a line behind a three-aperture mask). Each of the local-motion models contains two parallel pathways (Fourier and non-Fourier), which extract the motion of luminance boundaries and texture boundaries, respectively. The convolution of the stimulus (A) with differently oriented V1 simple-cell filters (B) defines the filters' sensitivity function. (For clarity, the figure shows only one orientation.) A power-law function (C) models the nonlinear simple-cell contrast response. Subsequently, the signal is processed in parallel. The Fourier pathway extracts motion by using directionally tuned Reichardt detectors (G_F). The output is normalized (H_F) by a feedforward divisive term that is calculated in terms of the sum over differently oriented V1 simple cells' responses. MT component cells (firing rates shown in I_F) sum the outputs of motion units located at different spatial positions (omitted for clarity) but tuned to the same direction of motion. In the non-Fourier pathway, V1 simple cells' responses are squared (D) and second-stage filtered (assumed to be carried out by cells in area V2) (E). These filters are tuned to a lower spatial frequency and are oriented orthogonal to the initial filters to extract texture boundaries. Qualitatively, the same steps follow as described for the Fourier pathway: power-law nonlinearity (F), texture-boundary motion (G_NF), feedforward normalization (H_NF), and MT component cell (I_NF) pooling. Finally, the signals of Fourier and non-Fourier pathways are combined at the level of MT pattern units (K). The output for each local model in the absence of any lateral interconnections (i.e., a multi-local model) would give a different direction of motion for each site. In that case, the central site would signal an incorrect direction of motion. The global model differs from the multilocal version by the lateral interactions at the level of model MT pattern cells indicated at the bottom of the picture.

However, multiple copies of a local model cannot account for our perception of the direction of motion of extended objects. In the specific example of a line behind an aperture mask (Fig. 4 below), the output of the central site would give a direction of motion perpendicular to the line orientation (the aperture problem). Consequently, a successful global model requires a modification in the form of lateral interactions between sites. These are realized here at the level of MT pattern units (Fig. 3). The inhibitory intra-site feedback (winner-take-all) has to be extended and is now coupled with lateral excitation between sites (see Appendix A). Replacing Eq. (2) gives the resulting differential equations for the global model:

\frac{dP_{\phi,x,y}}{dt} = \frac{1}{\tau} \left[ -P_{\phi,x,y} + \mathrm{NR_P}\!\left( I_{\phi,x,y} - \sum_{\theta=\phi-120^\circ}^{\phi+120^\circ} i_{\theta} P_{\theta,x,y} + E_{\phi,x,y} \right) \right]. \qquad (3)

The only difference between this and Eq. (2) is the excitatory connections denoted by E. Hence the multilocal model is a special case of the global version, where lateral excitations are set to zero. The excitatory lateral input for each pattern unit from pattern units at other sites is defined as

E_{\phi,x,y} = \sum_{x',y'} \left\{ d(x' - x,\, y' - y) \left[ \sum_{\theta=\phi-15^\circ}^{\phi+15^\circ} e_{\theta} P_{\theta,x',y'} \right] \mathrm{NR_E}\!\left( \left| P_{\phi,x,y} - P_{\phi,x',y'} \right| \right) \right\}. \qquad (4)

These lateral interactions are calculated as the sum over sites at different spatial locations (x′, y′). A set of directionally tuned pattern units at each site contributes in a weighted manner toward the strength of the inter-site interaction signal [second term in Eq. (4)]. The weighting factors, which are defined in terms of the differences in preferred directional angles, are e_θ. They are chosen such that a unit at location A is most strongly excited by a unit at location B with the same directional preference.

Fig. 3. Lateral interactions at the level of model MT pattern units. Each pattern unit is sensitive to a different direction of motion (arrows); 24 such units span the whole range of motions in 15° increments. The excitatory signal (+) between units tuned to similar directions of motion at different locations depends on the distance between locations (weighting defined by a Gaussian function of distance) and is modulated (*) by hypothesized motion-discontinuity detectors, which calculate the difference in motion energy between pattern units at different locations (dashed lines); the detectors are necessary to avoid mutually excitatory feedback when two locations signal the same direction of motion (see Appendix A and text for details). A Naka–Rushton (NR) nonlinearity is included for physiological plausibility and, in addition, avoids signal explosion in such an excitatory network.

The excitatory signal is summed only over units with similar directional preferences (to within ±15°).

To reflect the limited distance over which lateral interactions occur, the excitatory signal is modulated by a Gaussian function of the absolute distance between sites [Eq. (5) and first term in Eq. (4)]. The two parameters, a and σ, reflect the strength and the spatial extent of interactions, respectively:

d(x, y) = a\, \exp[-(x^2 + y^2)/\sigma^2]. \qquad (5)

The last term in Eq. (4) ensures a domino-effect propagation (see Appendix A). This term effectively supports lateral interactions if they signal different directions but shuts down mutual excitations if two sites are already signaling the same direction of motion. This computation is attributed to the activity of motion-discontinuity detectors. As with all model neurons, these cells exhibit an input–output nonlinearity that is described by a Naka–Rushton function. By choosing an appropriately high semisaturation constant (see parameters), this function guarantees a stabilization of the network's final steady state.
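Equations (4) and (5), with the discontinuity gating just described, can be sketched as follows. The data layout and names are ours; a = 2 and σ = 2 follow Section 3, and the NR_E parameters follow Table 2.

```python
import numpy as np

def gauss_weight(dx, dy, a=2.0, sigma=2.0):
    # Distance-dependent weighting of lateral interactions [Eq. (5)].
    return a * np.exp(-(dx**2 + dy**2) / sigma**2)

def nr_e(x, m=40.0, sigma=30.0, n=2.0):
    # Motion-discontinuity detector nonlinearity (Table 2, NR_E).
    return m * x**n / (sigma**n + x**n)

def lateral_excitation(P, sites, phi, site, e_w=(0.5, 1.0, 0.5)):
    """Excitatory input E to the unit tuned to direction index phi at site
    [Eq. (4)].

    P     -- dict mapping site -> (24,) array of pattern responses
    sites -- list of (x, y) site coordinates
    e_w   -- weights for neighbours at -15, 0, +15 deg (Table 1)
    """
    x, y = site
    E = 0.0
    for (xs, ys) in sites:
        if (xs, ys) == site:
            continue
        # weighted drive from similarly tuned units at the other site
        drive = sum(w * P[(xs, ys)][(phi + d) % 24]
                    for w, d in zip(e_w, (-1, 0, 1)))
        # discontinuity gating: excitation flows only while the two sites
        # disagree, and shuts off once they signal the same direction
        gate = nr_e(abs(P[site][phi] - P[(xs, ys)][phi]))
        E += gauss_weight(xs - x, ys - y) * drive * gate
    return E
```

The gating term is what produces the domino effect: a captured site keeps exciting its neighbours only until they agree with it, at which point the mutual excitation collapses and the network settles.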

As in the local model, the direction of motion for each site and at each instant in time is finally calculated as the parabolic interpolation over the maximally excited pattern unit and its nearest neighbor to each side.

3. SIMULATION METHODS

A. Model Parameters

Model simulations were conducted by using the MATLAB (The MathWorks) environment. Simulations were based on lattice-spaced matrices.32 Most of the methods are identical to those used for a local-motion model and are described elsewhere.13 All model parameters were fixed before the simulations and were kept constant. Parameters were identical to those used in the local simulations13 and were based, whenever possible, on data from neurophysiological studies33–42 or psychophysics.22,43–45

The parameters of the input filters (linear V1 cells) were chosen to show a peak sensitivity for a grating of 1.7 cycles per degree (cpd).46 Twenty-four motion-detector units, correlating the inputs from filter pairs spaced by 15 deg, covered the whole range of motion directions at any single location.

The inhibitory intra-site weighting set (i) that extracts the winner plus two nearest neighbors was set to inhibit every other directional unit except itself and the two neighbors on each side (see Table 1). The inhibitory weighting constants are not crucial as long as they are sufficient to guarantee suppression of pattern units by the maximally activated cell. The lateral excitations (e) were restricted to pattern units in the other network sites with preferred directions within ±15 deg. The strongest excitation occurred between units signaling the same direction of motion.

The parameters of the Naka–Rushton functions, regulating the nonlinearities of the postsynaptic potentials for the MT pattern units and the discontinuity detectors, respectively, are shown in Table 2. For mathematical convenience, the exponent N was assigned a value of 2, which is slightly lower than the average found for cortical neurons in area MT (approximately 3).34

Table 1. Inhibitory Intra-Site Weighting, i, and Excitatory Inter-Site Weighting, e, among Pattern Units as a Function of Difference in Directional Tuning (Angle)

Relative angle θ (deg):    0     15    30    45    60    75    90    105   120   135   150   165
Inhibitory weights, i_θ:   0     0     0     −1    −1    −1    −1    −1    −1    0     0     0
Excitatory weights, e_θ:   +1    +0.5  0     0     0     0     0     0     0     0     0     0

The two parameters of the Gaussian function [Eq. (5)], which weight the lateral interactions depending on the distance between sites, were set to a = 2, σ = 2. The time constant τ, which determines the characteristics of the temporal dynamics, was set to 20 ms. These values were chosen to provide a good fit to the data from the initial experimental condition (4A and 4B) and were kept constant for all subsequent simulations.

B. Stimuli

The stimulus parameters for all simulations were identical to those employed in the psychophysical experiments.7 This includes their spatial extent, profile, orientation, contrast, spatial frequency, speed, and direction of motion. The stimuli were rendered as discrete local-contrast functions. Contrast values were defined in relation to a neutral background (midgray in the experiments). The line width was 0.25°, which equals the width of the center excitatory lobe of model input units (V1 simple cells).

Fig. 4. Simulation methods. Each circular area depicts the receptive fields of a set of model MT pattern units (each tuned to a different direction of motion). The left and center panels show the three sites that are active for the line stimuli with small and large inter-aperture gaps, respectively. The icon on the right shows the situation in condition 4C, where the diameter of the central aperture was manipulated. It can be seen that there is some overlap of receptive fields in this condition and the number of active units increases with the size of the central aperture (dashed circle).

Table 2. Parameters for the Two Naka–Rushton Functions Describing the Nonlinearities of the Postsynaptic Potentials of MT Pattern Units (NR_P) and Motion-Discontinuity Detectors (NR_E)

\mathrm{NR}_k(x) = m_k x^{N_k} / (\sigma_k^{N_k} + x^{N_k})

Function    m_k    σ_k    N_k
NR_P        100    100    2
NR_E        40     30     2

The size of a single MT pattern cell's receptive field was 1.6° for the 1.7-cpd channel, owing to the spatial pooling over component-cell responses. Hence a single site was activated by motion within a circular field with a diameter of 1.6°. To simplify model simulations, individual MT pattern sites were nonoverlapping and were always centered with respect to the apertures for all but condition 4C (see Appendix A for justification).

Figure 4 (left and center) shows the condition in which the location of the sites varied with the inter-aperture gap size. Each circle can be understood either as the (invisible) apertures in the experiments or as the borders of the model cells' receptive fields. The condition portrayed on the right shows the simulation of one of the experiments (condition 4C), where the diameter of the central aperture was manipulated. For the simulations, there is some degree of receptive-field overlap. The smallest gap size (largest central aperture) resulted in seven active units, two activated by the terminators and five by the central line segment. For the largest gap (smallest central aperture), only a single unit was activated by the central line segment.47

4. RESULTS

The results of the model simulations are presented in the same order as in the accompanying psychophysical paper.7 Subsections 4.A and 4.B outline how the model achieves capture and how it was used to determine some of the model parameters. With these parameters fixed, the model is then shown to successfully predict other experimental conditions (Subsections 4.C and 4.E). Limitations of a global model employing simple lateral interactions are discussed as they become evident during comparison of predictions with experimental data for misoriented segments (Subsection 4.D). This observation is interesting, as it provides strong evidence for a specific model extension in the form of collinear facilitation. Once incorporated, the model correctly predicts not only data for misoriented segments (Subsection 4.D) but also the much less intuitive results for inhomogeneous stimuli (Subsection 4.F).

A. Effect of the Gap between Apertures

To understand the effect of the lateral interactions, it is helpful to start by applying the model to a specific condition: a 45°-oriented line (contrast = 1.0, speed = 5°/s, direction = 90°) behind a three-aperture mask. Figure 5 shows the results of the local model applied independently to each of the three sites. The top and the bottom rows show the sites centered at the upper and lower terminator, respectively, and the middle row pictures the central, line-segment site. As shown previously,13 the terminator sites signal the veridical direction of motion. The site "seeing" only a featureless line gives a prediction that is perpendicular to the line's orientation (45°), in agreement with perception.

If there were no lateral interaction, the multi-local steady state would be given by the directions in the right-hand column. In particular, the center aperture would always give a nonveridical signal regardless of inter-aperture distance. For motion capture to occur, the two terminator sites must alter the signal of the central site. One superficially attractive approach to doing this would be to employ multiplicative lateral excitation. However, the problem with such an approach is illustrated by this example, where the signal strength of the component units tuned to the veridical direction of motion (90°) is virtually zero for this 45°-oriented line segment (Fig. 5, middle row). This is an important observation, because it challenges any approach based on multiplicative interactions and instead suggests additive lateral excitations.
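The argument can be made concrete with toy numbers (hypothetical values, chosen only to mirror the text, not taken from the simulations):

```python
# Toy illustration of the additive-vs-multiplicative argument: the central
# site's response at the veridical direction (90 deg) is nearly zero because
# of the aperture problem, so multiplicative gain from the terminator sites
# cannot rescue it, whereas additive excitation can.
center_at_90 = 1e-6          # near-zero component signal at the central site
terminator_drive = 5.0       # lateral signal from a terminator site

multiplicative = center_at_90 * (1.0 + terminator_drive)  # gain on ~0 stays ~0
additive = center_at_90 + terminator_drive                # capture is possible
```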

The main part of Fig. 6 plots the predictions of the global model for a line orientation of 45°. The solid squares connected by the solid line show model predictions for the perceived direction of motion of the central aperture containing a featureless line segment. These data are relative to the physical direction of the line and are plotted as a function of the inter-aperture gap. In addition, individual data from the corresponding psychophysical experiments7 are shown, where error bars are standard errors of the mean. Simulations are for a presentation time of 200 ms.

For small gaps, the center-site signal is captured as a result of the lateral interactions with the terminator sites, giving a veridical direction of motion. As the distance increases, the capturing effect of the terminators decreases, and the model prediction eventually approaches a direction orthogonal to the segment's orientation. The model predictions show substantial agreement with psychophysical data for this and other line tilts (30°, bottom left; 60°, bottom right in Fig. 6).

The model can also explain individual variability. Consider the case of psychophysical data for a 30° tilt. While two of the three observers (GG and GL) still report motion capture of the central segment for an intermediate gap of 1.2°, the third subject's (JW) perception is close to the perpendicular. This variability can be explained in

Fig. 5. Results of the local model applied independently to each of three sites (upper and lower terminator and central featureless linesegment). The left panel shows the stimulus and its physical direction of motion. The line orientation is 45° relative to the horizontal,its width is 0.25°, and its motion is upward (arrow) at 5°/s, matching one of the psychophysical conditions. Contrast was 1.0, and thediameter of each of the circular apertures was 1.6°. The central panels plot the responses of model MT component cells as a function ofpreferred motion direction for both pathways (Fourier and non-Fourier). The bold numbers next to the polar plots indicate signalstrength. The arrow in the right panel depicts the final model output, the direction of motion, at the level of MT pattern units. Forterminators, the shape of the polar plots for the Fourier pathway exhibit a broad bandwidth with a bias (.15°) perpendicular to the lineorientation. In the case of the non-Fourier units, the response curve is sharply tuned with a peak always biased toward the line ori-entation. Neither of the pathways alone signals the veridical direction of motion. The responses in the case of isolated Fourier (F) andnon-Fourier (NF) pathways are shown inside the polar plots. However, the combined network response (S, right column) is very closeto the true physical direction of motion.13 For central line segment, as would be expected on the basis of the aperture problem, thecomputed direction of motion is perpendicular to the line orientation (note that there is no contribution from the non-Fourier pathway forline segments). For subsequent simulations it is of note that the signal strength of the line segment’s Fourier pathway slightly exceedsthat of the terminator sites.

1478 J. Opt. Soc. Am. A/Vol. 20, No. 8 /August 2003 G. Loffler and H. S. Orbach

Fig. 6. Results for the effect of the size of the gap between apertures. The plots show perceived and predicted direction of motion of thecentral, featureless segment as a function of the gap between adjacent apertures. As indicated by the icons on the right, the direction ofmotion is shown relative to the physical direction of the line. The model predictions, given by the squares connected by the solid lines,are in close agreement with psychophysical data7 for all inter-aperture gaps and line tilts (45°, center; 30°, lower left; and 60°, lowerright). Error bars for the data are standard errors of the mean. Regardless of line tilt, the model correctly predicts motion capture ofthe central segment by the adjacent terminators for small gaps, a direction orthogonal to the line orientation for large gaps, and anintermediate direction in between.

terms of differing extents of spatial excitation [given bydifferent values of the space constant parameter s fromEq. (5)] for different observers.

Figure 7 shows data from the three individuals next to model predictions for a strong (2.5°) and a weak (1.5°) space constant of the Gaussian function determining the spatial extent over which interactions occur. It becomes evident that slightly different space constants can account for most of the variability among subjects.

B. Effect of the Duration of the Presentation
Figure 8 shows the data and model prediction for half (105 ms) the presentation time used in the previous condition (200 ms). Comparing the model outputs for the previous condition (Fig. 6) with the shorter presentation time here reveals slightly different curves: The transition between captured and perpendicular direction is sharper for the shorter duration. This indicates that for small gaps, capture is completed at or before 100 ms but intermediate gaps require more computation time to reach their steady state.

The temporal evolution of the model for two different gap sizes is shown in Fig. 9. The graph shows the calculated direction of motion for the central site as a function of simulation time. All time courses start at the same point, the perpendicular direction of motion (45°). In addition to the different levels for the steady state, which depends on the gap size, the time courses reach their asymptotic value at different times. The time course for a small inter-aperture gap (solid curve) reaches its steady state earlier (after approximately 60 ms) than that for an intermediate gap (∼150 ms) (dashed curve). However, for a large gap, where lateral interactions are ineffective, the predicted direction of motion is independent of time and stays orthogonal to the line orientation (dotted line).
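The qualitative shape of these time courses can be sketched with a first-order relaxation in which distance-weighted lateral excitation competes with the local aperture evidence. This is only an illustrative caricature of the network dynamics; all parameter values (`w_local`, `w_lateral`, the Gaussian width) are assumptions, not the published ones.

```python
import math

# Illustrative sketch (not the published equations): the direction error of
# the central site, starting at the perpendicular (45 deg from veridical),
# relaxes toward a steady state set by the balance between local aperture
# evidence and Gaussian-weighted lateral excitation from the terminators.

def simulate(gap_deg, sigma=2.0, w_local=0.01, w_lateral=0.05,
             dt=1.0, t_max=300.0):
    """Return the direction error (deg from veridical) after t_max ms."""
    coupling = math.exp(-gap_deg ** 2 / sigma ** 2)  # Eq. (5)-style weight
    error = 45.0                                     # start perpendicular
    for _ in range(int(t_max / dt)):
        # local evidence pulls toward 45 deg; lateral excitation toward 0 deg
        error += dt * (w_local * (45.0 - error) - w_lateral * coupling * error)
    return error

print(round(simulate(0.4), 1))   # small gap: nearly captured
print(round(simulate(2.0), 1))   # intermediate gap: partial capture
print(round(simulate(4.0), 1))   # large gap: stays near the perpendicular
```

Because the effective relaxation rate grows with the coupling strength, the small-gap trajectory also reaches its asymptote sooner than the intermediate-gap one, matching the ordering seen in Fig. 9.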

C. Effect of Inter-Aperture Gap When the Distance between Stimuli Is Constant
In the next condition, the physical distance between the terminator site and the central part of the line segment is kept constant (Fig. 10, icons). Varying the size of the center aperture allows two different possibilities to be compared:

1. Motion capture depends on the physical distance between terminators and line segment.

2. The gap between apertures determines whether capture occurs.

Experiments3,6,50 have provided compelling evidence for the latter hypothesis.

For the model simulations, the different gaps correspond to central apertures of variable size containing overlapping local model units (see Subsection 3.B, Fig. 4). Specifically, the four gap sizes (0.4°, 1.2°, 2°, 3.0°) correspond to a central aperture diameter of 6.8°, 5.2°, 3.6°, and 1.6° and a total of 7, 6, 5, and 3 active units, respectively.

Fig. 7. Model predictions with strong and weak space-constant parameter [σ, Eq. (5)]. Solid circles give model predictions for a high space constant (2.5, strong lateral interactions), and solid squares show results for a low space constant (1.5, weak interactions). As can be seen, a small change of a single model parameter can account for the variability across individual subjects. Note that the space constant was not used as a free parameter for subsequent simulations; instead the same constant (σ = 2) was employed throughout the simulations presented here.

Fig. 8. Results for the effect of presentation time. Data are for a 105-ms presentation time compared with the 200-ms time in the previous experiment. The only difference between this condition and the longer presentation (Fig. 6) is the somewhat reduced capturing effect for an intermediate gap (2°), which is convincingly matched by the model.
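The gap/diameter pairs above can be checked arithmetically. Assuming, per the Fig. 5 stimulus, that the terminator apertures are 1.6° in diameter, the center-to-center distance from the central aperture to a terminator aperture is the same in every condition:

```python
# Quick geometry check (assumes 1.6-deg terminator apertures, as in Fig. 5):
# center-to-center distance = central_diameter / 2 + gap + terminator_radius.
gaps = [0.4, 1.2, 2.0, 3.0]
central_diameters = [6.8, 5.2, 3.6, 1.6]
terminator_radius = 1.6 / 2

distances = [round(d / 2 + gap + terminator_radius, 2)
             for gap, d in zip(gaps, central_diameters)]
print(distances)   # [4.6, 4.6, 4.6, 4.6]: constant physical distance
```

This is consistent with the 4.6° terminator-to-segment-center distance quoted below for the smallest gap.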

The global model correctly predicts that the gap and not the physical distance is the crucial parameter and shows good agreement with psychophysical data. For the smallest gap size (0.4°), the simulation predicts that the terminators (in this simulation as much as 4.6° away from the segment center) always capture the central line segment. Superficially, this seems an unexpected result because the model bases lateral interactions extensively on physical distance [Eqs. (4) and (5)], and it is obvious from Fig. 6 that lateral interactions are already ineffective for distances of 3°.

How does the model produce this behavior? The key feature is the operation of the hypothesized motion-discontinuity detectors that modulate lateral interactions [Eq. (4)]. Without the modulation of lateral interaction by such discontinuity detectors, the signals of the sites centered at parts of the line that lack terminators would mutually excite each other, resulting in increasing signal strength. This, in turn, would eliminate motion capture (see Appendix A). This means that for the model simulations, the veridical motion signal from the terminators can cross small but not large inter-aperture gaps. Once the signal has crossed a gap, it propagates along the line segment.

Fig. 9. Time course for a small, intermediate, and large gap in the case of a 45° oriented line. Different conditions give rise to different asymptotic steady states. Moreover, the time taken to reach this asymptotic level differs: It is shorter for a small inter-aperture gap (solid line) than for an intermediate gap (dashed line). For large gaps (dotted line), lateral interactions show no effect.

Fig. 10. Effect of changing the diameter of the central aperture and therefore altering the gap between apertures while keeping the physical distance between terminators and central line segment fixed. The model predicts experimental observations: The crucial variable for motion capture is the gap and not the physical distance.

D. Effect of (Mis)Alignment
It is evident psychophysically that motion integration is robust against modest misorientations between terminators and line segments. However, when there is a relative difference in orientation (skew) of approximately 45°, motion integration breaks down.7 Can the low-level model described thus far, which is not explicitly based on figural aspects, predict this behavior? Previous simulations51 have shown that such a model predicts motion capture regardless of misorientation and hence fails to account for data on large skews. We concluded that purely distance-dependent strategies are too simplistic and proposed that for the model to be successful a straightforward elaboration was required.51 In this elaboration, lateral interactions depend not only on the inter-aperture distance but also on the orientation of the elements within different sites. The strategy of grouping elements that are collinear goes back to Gestalt psychology and has received plenty of support from a variety of studies (see Ref. 52 for a review).

The model presented up to this point can easily be modified to include lateral interactions that also depend on collinearity between neighboring segments without affecting any results presented so far. Collinear facilitation is realized by altering Eq. (5), which so far described the strength of the excitatory signal between sites by a simple Gaussian function of inter-aperture distance. The modified equation takes the form

d(x, x′) = α exp[−(x − x′)²/σ²] × [cos(2πβ(ori_max,x − ori_max,x′))]_{>0} .   (6)

The difference between the old and the new equation is the cosine term. The strength of the lateral interactions, initially determined only by distance, is now modulated by the (dis)similarity of the segments' orientations (ori) at different sites (x, x′).53 A measure of orientation similarity is gained by comparing the directional preference of maximally excited simple V1 cells (ori_max) across sites. Collinear facilitation follows a cosine function of these orientation differences, half-wave rectified (denoted [·]_{>0}) so that negative lobes of the cosine contribute nothing. Although the cosine function was chosen for mathematical simplicity, its predictions provide a convincing fit to experimental data. The one new parameter, β, which influences the range of skews over which motion capture is effective, was set to 4. Therefore, if orientations differ by a certain amount (more than 22.5°), lateral excitations are shut down.
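The collinearity factor of Eq. (6), as reconstructed here, can be sketched as follows. The sketch assumes orientations are expressed as fractions of a full turn so that, with β = 4, the rectified cosine falls to zero exactly at a 22.5° skew:

```python
import math

# Sketch of the Eq. (6) collinearity factor (our reconstruction): a half-wave-
# rectified cosine of the orientation difference, with beta = 4 and the
# orientation difference expressed as a fraction of a full turn.

BETA = 4.0

def collinearity(skew_deg):
    """Multiplier on the lateral-interaction strength for a given skew."""
    c = math.cos(2.0 * math.pi * BETA * skew_deg / 360.0)
    return max(c, 0.0)   # rectification: negative lobes shut excitation down

for skew in (0.0, 10.0, 20.0, 22.5, 45.0):
    print(skew, round(collinearity(skew), 3))
```

For collinear elements the factor is 1 (no modification), it shrinks smoothly through moderate skews (10°, 20°), and it is zero from 22.5° through 45°. Note that this simple periodic form would rise again for skews approaching 90°, where the paper deliberately leaves the sign of the interaction open (see Fig. 15).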

It is important to note that for all collinear line stimuli, both the old [Eq. (5)] and the new [Eq. (6)] equations give identical predictions. In these cases, the maximally excited V1 cells at the locations of the line terminators and the central segments share the same directional preference (ori_max,x = ori_max,x′), causing the cosine term to take a value of 1 and hence providing no modification to the lateral integration strength. Consequently, the model simulations considered up to this point are a special case of the more general one, which includes collinear facilitation (in the form of inhibition for noncollinear elements).

Fig. 11. Effect of misorientation (0°, top left; 10°, top right; 20°, bottom left; 45°, bottom right) between central line segment and line terminators (icons). The model incorporating collinear facilitation correctly predicts motion capture for small gaps and up to moderate skews (10° and 20°) but a lack of capture for grossly misoriented contours (45°). Without collinear facilitation, the model predicts motion capture for small gaps regardless of skew, at variance with experimental data.


Fig. 12. Results for stimuli defined by a cross-sectional profile of a D6 pattern. The spatial frequency of the central patch is fixed (1.7 cpd), but that of the truncations varies (1.7 cpd, top left; 2.8 cpd, top right; 5.0 cpd, bottom left). The single-spatial-frequency model is in qualitative agreement with experimental data but slightly underestimates the effect of capture for dissimilar spatial frequencies. This suggests that lateral interactions between motion signals arising in different parts of the visual field are restricted to a small range of spatial frequencies but are not completely isolated to a single channel.

The results of model simulations for four different skews (0°, 10°, 20°, 45°) are shown in Fig. 11. Capture as predicted by the model disappears with increasing skew and is absent even for the smallest gaps in the case of sufficiently large misorientation (45°), which is entirely consistent with observer performance. As will be shown below, the proposed collinear facilitation scheme not only permits correct model predictions for the skewing experiment (where it is a straightforward result) but is also successful in predicting a far less intuitive experimental result (dots and line segments).

E. Effect of Line Profile (Truncated D6 Patterns)
In this condition the cross-sectional profile of a D6 (sixth spatial derivative of a Gaussian) was substituted for the square pulse of the lines. The spatial frequency of the D6 in the central aperture was set to 1.7 cpd, matching the peak sensitivity of the initial filters of the model. The spatial frequency of the truncated parts (outer apertures) was set to 1.7 cpd (matching the frequency of the central part), 2.8 cpd, or 5 cpd.

In model simulations of these experiments there was a contrast inversion between center (−1) and truncations (+1). In agreement with psychophysics, such a change in polarity does not affect model outputs because the model motion detectors23,24 are insensitive to this manipulation. Model predictions for a truncated D6 pattern of matching spatial frequency (Fig. 12, top left) are almost indistinguishable from those obtained for the line profile, in close agreement with psychophysical data. This is a consequence of the local model exhibiting no difference between computing isolated line terminators or truncated D6 patterns.51

It might appear counterintuitive to model patterns that have mixed spatial frequencies with a model incorporating only a single spatial-frequency channel. The reason for applying a single-channel model to combinations of D6 patterns with different frequencies stems from experimental observations.7 There, human observers did not report strong influences on the central 1.7-cpd pattern from truncations of higher spatial frequencies (5.0 cpd). These results suggest that motion integration across apertures might be restricted to a single or a small range of frequency channels and not operate across widely different channels. As a consequence, a single-spatial-frequency model (being the relevant part of a multi-channel model that uses relatively independent channels) might be sufficient to account for experimental observations. The simulation tested this assumption.

Fig. 13. Effect of dots on line segments. Model predictions are in agreement with psychophysics. There is little effective interaction between dotlike features and line segments even if they share the same spatial-frequency components. For the model simulation, this behavior is due to the lack of collinear facilitation, because dots do not exhibit an orientational preference. The small capturing effect for the smallest gap is a result of adding random noise to the V1 simple-cell outputs feeding into the collinear facilitation mechanism. Occasionally, the noise supports the physical direction of motion, and the averaged simulation shows a small bias toward the real direction of motion.

Results for the two combinations of nonmatching spatial frequencies are shown in Fig. 12. A modest increase in the spatial frequency from 1.7 to 2.8 cpd produces a clear change in model predictions (Fig. 12, top left versus top right). More drastically, if the truncations are defined by a 5.0-cpd D6 profile (Fig. 12, bottom), no effective interactions appear to take place. These model predictions capture the main trend of the data. However, the model slightly underestimates the capture of the central D6 patch by the truncations. The fact that a single-spatial-frequency-channel model quantitatively predicts human performance on stimuli with mixed spatial frequencies suggests that lateral interactions are restricted mainly to a small range of spatial-frequency channels. The small discrepancy between model outputs and experimental data might suggest that the interactions are not completely isolated to a single channel.

F. Effect of Dots on Line Segments
The condition of heterogeneous stimuli is of particular interest as it poses a challenge for any low-level model that does not base lateral interactions on an explicit interpretation of a scene. For a low-level model, it might be expected that if motion integration mechanisms were based purely on fixed interactions among a set of neurons, any stimulus exciting those cells in a similar way should be equally effective. Accordingly, one might conjecture that dots (circular difference of Gaussians) with a space constant chosen to excite the same spatial-frequency channel as the line stimuli would capture a featureless line segment. It is therefore not surprising that the global model in its initial version [Eq. (5), without collinear facilitation] predicted motion capture of line segments by dots. However, this prediction was at variance with experiments where almost no capture was observed between these heterogeneous stimuli.7

Interestingly, once collinear facilitation is included [Eq. (6)], model predictions are in agreement with psychophysics (Fig. 13), and simulations show a lack of capture of a line segment by nearby dot features. The reason for this behavior lies in the responses of the input filters (V1 simple cells). Simple cells do not exhibit any orientation preference for stimuli such as dots; cells tuned to different orientations respond in exactly the same way. Therefore the collinear facilitation is mainly inactive.54

5. DISCUSSION
The results obtained with the global model show that an extension of a local, single-site network featuring excitatory lateral interactions among spatially separated sites can account for a wide variety of psychophysical data. Unlike other approaches,55 the model proposed here does not rely on any higher-level information about the stimulus. Neither binocular nor monocular depth cues, such as a differentiation between extrinsic and intrinsic terminators (including T junctions), were utilized. The only model input is the local-contrast description of the stimulus. Furthermore, this global model is able to describe the propagation of motion signals along spatially adjacent sites and is consistent with the fact that the gap between apertures and not physical distance determines capture. Consequently, the current study provides a detailed description of how a low-level, bottom-up model based on the physiological architecture of the visual system can solve the aperture problem.

The model, based on a single spatial-frequency channel, is also sufficient to explain the lack of influence of high-spatial-frequency terminators on low-frequency contours. It seems, however, that for the model to predict data quantitatively (particularly for medium-frequency terminators on low-frequency contours), a model based on a single, isolated channel may be too restricted. In turn, this can be taken as evidence for weak interactions among adjacent spatial-frequency channels. It is interesting to note that similarly restricted interactions among different spatial-frequency channels have been reported recently in pattern vision.56

A. Physiological Plausibility
Since the local model was originally presented,22 neurophysiology has provided extensive support for the proposed computations,57–61 which are discussed in detail elsewhere.13 For the global model, the suggested mechanisms of local winner-take-all and lateral excitation have the powerful motivation of physiological plausibility and simplicity. There is physiological evidence for a winner-take-all network operating among MT cells.59 Furthermore, lateral additive excitations between sites seem the simplest way to extend an existing and successful local model.

1. Discontinuity Detectors
How physiologically plausible is it to modulate lateral interactions depending on the difference in activity between units tuned to the same direction of motion at different sites [Appendix A and Eq. (4)]? Evidence in favor of such computations has been reported in some MT neurons: cells with center-surround antagonism with respect to motion.62–64 These cells respond preferentially to locations containing motion discontinuities, and it has been postulated19 that they are involved in motion-segmentation tasks. The model here incorporates motion-discontinuity detectors as physiologically credible candidates for modulating the lateral excitatory interactions between sites (Fig. 14). It is conceivable that such neurons are involved in a number of tasks, including figure–ground segmentation and modulating lateral integration.
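The gating idea can be illustrated with a minimal sketch. This is an assumed functional form, not the published Eq. (4): the discontinuity detector compares the activity of same-direction units at the two sites, and its output multiplicatively gates the lateral excitation.

```python
# Assumed sketch of discontinuity-gated lateral excitation (not Eq. (4)
# itself). Identical signals, as along the interior of a featureless line,
# produce no gating signal and hence no runaway mutual excitation; a
# terminator next to a featureless segment opens the gate.

def gated_excitation(r_sender, r_receiver):
    """Excitatory input passed between same-direction units at two sites."""
    discontinuity = abs(r_sender - r_receiver)   # opponent, center-surround-like
    gate = min(discontinuity, 1.0)               # saturating modulation signal
    return r_sender * gate

print(gated_excitation(0.8, 0.8))   # 0.0: identical signals, gate closed
print(gated_excitation(0.8, 0.1))   # gate open: terminator signal crosses
```

Under this form, two featureless sites carrying the same signal exchange nothing, which is exactly the property invoked above to prevent self-amplification along the line interior.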

Interestingly, similar mechanisms have been obtained in an approach in which the interaction strengths of a motion segmentation model were determined by training procedures.19 The resulting behavior of their segmentation units is similar to the motion-discontinuity-based modulation described here.

2. Collinear Facilitation
There is ample evidence for the existence of lateral connections that support grouping according to collinear facilitation in the visual system (see Ref. 52 for a review). Facilitation between cells of similar orientation preference has been reported from anatomical and neurophysiological studies.65–68 Complementary support comes from natural-image statistics69 and psychophysical studies concerned with static70,71 and dynamic72,73 displays.

B. High- versus Low-Level Interpretations
Results with the initial global model [Eq. (4)] showed that simple, distance-dependent mechanisms make correct predictions across a large range of conditions and different kinds of stimuli (lines and truncated D6 patterns). However, such simple computations fail to explain perception when contours are grossly misoriented or stimuli are heterogeneous (dots and line segments). The key question concerns the kind of additional mechanisms required for matching human performance. Because the two conditions for which the original global model makes inaccurate predictions are situations in which a straightforward interpretation of a single, solid object is not available, it might be argued that higher-level mechanisms are involved. These higher-level computations include object interpretations, figure–ground segmentation, and three-dimensional representations of the scene. It is conceivable that any of these would result in quite different object-level descriptions:

1. For all collinear conditions: three parts of the same, solid object that is physically behind an aperture mask.

2. For the skewing conditions and dot–line combination: three independently moving stimuli that are not part of a common object.

A model could be constructed that uses these two different higher-level descriptions combined with different integration strategies to make predictions similar to those made by the model presented here. Obviously, the experimental results cannot be used to provide evidence to decide between low- or high-level mechanisms. Despite the lack of a decisive answer to that question, the question of whether a low-level scheme can satisfactorily extend the global model can be examined.

Fig. 14. Lateral excitation and multiplicative modulation. Hypothesized motion-discontinuity detectors are shown below the line stimulus. The dashed lines indicate the sites from which they receive inputs. The output of the differential operators is used to modulate (*) the excitatory (+) lateral interactions. In the model, discontinuity detectors are driven by directionally selective units at different sites. Hence they correspond to neurons that respond to directionally specific motion discontinuities.

On the basis of the results from the skewing condition, the most obvious scheme is collinear facilitation (Fig. 15). Each circle symbolizes the receptive field of a single spatial site. Assume that the stimuli for each site are parts of a gray line. A global model with only distance-dependent interactions would determine the strength of lateral interactions simply by the distance between sites. In this example, interactions between the central site and any of the terminator sites would be identical. For collinear facilitation, lateral interactions are employed that are not determined solely by distance but instead depend on an aspect of stimulus configuration. The excitatory interactions exist only for sites that lie along, or close to, the stimulus axis (solid arrows, +) but are absent (0) for sites that are sufficiently off this axis.

Explicit simulations with such a network indeed predict the phenomenology for skewed lines. In the case of skewed lines (e.g., the center segment plus the two terminators along the oblique direction), the lateral interactions are inactive and motion capture is impaired, in agreement with psychophysical data. While this success may be of little surprise given computations that depend on skew, the same scheme makes the less intuitive (but correct) prediction that dot features will not capture line segments.

Our experiments do not allow speculations about interactions for widely misoriented patterns (e.g., 90° skew). Two possibilities suggest themselves: either null or negative (inhibitory) interactions. Inhibitory interconnections are interesting because they have the potential to explain a phenomenon diametrically opposed to capture but also observed in motion perception: motion repulsion. Findings of motion repulsion have been made with random-dot patterns74–77 and gratings27,29 and have been attributed to inhibitory interaction among spatially distributed sites. On the basis of experiments with random dots, it has been proposed that capture and repulsion might be explained simply on the basis of mechanisms operating over different distances.76 Comparing more recent results, it appears that motion capture7 and motion repulsion78 are, at least in some experimental conditions, observed over a similar physical range. This challenges the explanation based solely on distance.77 Instead, an explanation of motion capture versus motion repulsion may have to depend on the details of the stimulus configuration.

Fig. 15. Collinear facilitation where lateral interactions depend on the stimulus configuration. The thick lines with solid arrowheads indicate strong interactions between sites that share the same orientational preference. There are no interactions ("0," open arrowheads) when orientations are sufficiently different. It is not yet clear what kinds of interactions there are in cases where orientations are widely different (e.g., perpendicular). It is conceivable that the interactions, rather than being absent, should be negative, i.e., inhibitory. Further work is required to determine these interactions (indicated by "?").

The collinear facilitation scheme proposed here has thepotential to explain these different observations. In ad-dition to the motion capture modeled here by excitatoryinteractions between aligned sites, repulsion would resultfrom negative, inhibitory interactions when contours aregrossly misoriented. It is encouraging to note that themaximum repulsive bias in the study by Kim andWilson78 was reported for gratings misaligned by 45° andthat no repulsive biases were observed in the case of grat-ings with the same orientation. Perhaps most signifi-cantly, such an algorithm may also provide a promisingfoundation for a low-level mechanism for the complemen-tary visual tasks of object integration and segmentationby causing motion integration to occur in certain circum-stances (e.g., aligned contours) but not others (e.g., if con-tours are likely to belong to different objects). A first stepin this direction would be a straightforward modificationof our model. Currently, the model employs lateral inter-actions that depend most strongly on the physical dis-tance between apertures [Eq. (5)]. Although formallypresented as a two-dimensional function of x and y, thecurrent version really is a one-dimensional function of thedistance @r 5 (x2 1 y2)1/2#. That is, the parameters forEq. (5) were derived from the experimental data7 ob-tained with apertures centered along a single axis. Atruly two-dimensional description of the integrationstrength would also depend on the lateral offset betweenapertures. This modification would extend the modelcurrently formulated for one-dimensional arrays of aper-tures to two-dimensional aperture arrays. There is asubstantial body of experimental results for such two-dimensional arrays from which parameters could bederived.3,6,78,79 These data suggests that a purelydistance-dependent Gaussian function as defined in Eq.(5) is indeed too simplistic to deal with different two-dimensional-array configurations. 
Certain features of our current model (e.g., the restricted spatial-frequency bandwidth of the initial filters) should guarantee successful prediction of some of the results on aperture arrays (e.g., the lack of integration across different spatial frequencies78). However, the truly two-dimensional extension needs to be formulated, with explicitly two-dimensionally dependent excitation and inhibition terms, for comparison with other results. One of the most striking of these comes from a study by Lorenceau and Zago.80

Very different percepts are observed for gratings that form "virtual T-junctions" between apertures than for gratings that form "virtual L-junctions." Both arrangements exhibit a 90° difference in orientation between neighboring elements and differ only in the spatial arrangement of apertures. The T-junction stimulus has an obvious one-dimensional analog, for which our model would predict weak integration, consistent with their data. On the other hand, an L-junction stimulus cannot be produced with a one-dimensional array of apertures. We speculate that the stronger integration observed by Lorenceau and Zago in this case can be modeled by a truly two-dimensional facilitation scheme, similar to that suggested in their paper.
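One way the proposed two-dimensional interaction function could look is sketched below. This is speculative: the anisotropy factor is our assumption, not a fitted function. The strength depends both on the distance between apertures and on how far the second aperture lies off the axis defined by the element's orientation.

```python
import math

# Speculative sketch of a truly two-dimensional interaction function:
# Gaussian falloff with distance, modulated by an assumed anisotropy factor
# that favors offsets along the element's orientation axis.

def interaction_2d(dx, dy, ori_deg, alpha=1.0, sigma=2.0):
    """Lateral interaction strength for an aperture offset (dx, dy) in deg."""
    r_sq = dx * dx + dy * dy
    axis_deg = math.degrees(math.atan2(dy, dx))
    # angular deviation of the offset direction from the element axis (0..90)
    off_axis = abs((axis_deg - ori_deg + 90.0) % 180.0 - 90.0)
    anisotropy = math.cos(math.radians(off_axis)) ** 2   # assumed falloff
    return alpha * math.exp(-r_sq / sigma ** 2) * anisotropy

along = interaction_2d(2.0, 0.0, ori_deg=0.0)    # offset along the contour
across = interaction_2d(0.0, 2.0, ori_deg=0.0)   # same distance, off-axis
print(along > across)   # True: facilitation strongest along the element axis
```

Such a function reduces to the purely distance-dependent Eq. (5) along a single axis, while allowing the lateral-offset dependence that the two-dimensional-array data seem to require.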

Taken together, our results illustrate that it is possible to explain motion integration with a low-level, bottom-up approach whether or not easy object interpretations are available. We take the model's success as a challenge to the necessity for a high-level interpretation of our experimental results. In particular, certain experimental results from our companion paper7 (features that are misaligned, are of nonmatching spatial frequency, are of opposite contrast polarity, or are dots) appear to pose challenges to, for example, present Bayesian models.20,21

On the other hand, we believe that our model should predict the phenomena cited in these papers. One class of results concerns plaids.21 The local model31 on which our global extension is based has already explained such results, including the perceptual dependence on component contrast. Another class of results concerns moving rhomboids,21 in particular, how contrast and shape determine the perceived direction of motion. Regardless of contrast, a fat rhombus always appears to move veridically, whereas the perceived direction of motion for a narrow rhombus is veridical for high but not low contrasts. In the context of our model, these results would depend on the relative contrast-response functions for the first-order (encoding rhomboid edges) and second-order (rhomboid corners) pathways. For simplicity, we have taken these functions to be identical in our model. If they differ in such a way that the first-order-pathway signal significantly exceeds the signal of the second-order pathway at low contrast, the rhomboid results would follow naturally: the narrower the rhombus, the more dramatic the perceived deviation from its true motion. Such a difference between contrast-response functions would also explain results on low-contrast lines81 and gratings behind apertures.80

However, this is not to imply that all integration effects are due to low-level computations. It is entirely possible that low-level and high-level computations take place in parallel and influence the way we perceive moving objects. Moreover, it seems likely that the contributions of each depend strongly on the complexity of the objects with which they are presented. Nevertheless, it is striking that for simple stimuli such as lines and dots in a multi-aperture array, low-level computations are sufficient to explain a large variety of experimental data. It will be interesting to see if future studies of either psychophysical or neurophysiological nature support the model presented here.

APPENDIX A

This appendix describes, in detail, how a neuronal network can achieve capture of one signal (e.g., a featureless line segment) by adjacent signals (e.g., corner features or line tips) and how such signal capture can propagate.

1. How Can Motion Capture Be Realized?

To simplify the complexity of the nonlinear dynamics of the network as far as possible, consider just two spatially separated sites, A and B (Fig. 16, left).

Each of these sites is assumed to contain all the computations for local-motion processing (see Section 2 and Ref. 13). The diagram depicts the highest level of this local computation, model MT pattern units, where it is proposed that lateral interactions occur. Instead of showing 24 such pattern units, each tuned to a different direction of motion, for simplicity we show only two units (A1 and A2, B1 and B2) at each site. Consequently, this simplified version can be thought of as signaling one of two directions at each site, for example, upward (A1 and B1) or downward motion (A2 and B2).

To guarantee a winner-take-all behavior at any single spatial site, the local model employed recurrent inhibitory interactions (dashed lines with solid arrowheads). The minus sign indicates mutual subtractive inhibition. These interactions obviously cannot produce capture across sites. The network operations required for motion capture are shown by solid lines and open arrowheads. These connections operate exclusively across sites and between units tuned to the same directions of motion (A1 and B1, A2 and B2). In contrast to the intra-site inhibitory interactions necessary for the winner-take-all operation, the lateral interactions are excitatory and are symbolized by the plus sign. Hence the proposed network exhibits a combination of intra-site inhibition (winner-take-all) and inter-site excitation (capture).

The four coupled, first-order, nonlinear differential equations governing the temporal dynamics of this simplified network are

$$\frac{dA_1}{dt} = \frac{1}{\tau}\left[-A_1 + NR\left(S_{A_1} - iA_2 + eB_1\right)\right],$$
$$\frac{dA_2}{dt} = \frac{1}{\tau}\left[-A_2 + NR\left(S_{A_2} - iA_1 + eB_2\right)\right],$$
$$\frac{dB_1}{dt} = \frac{1}{\tau}\left[-B_1 + NR\left(S_{B_1} - iB_2 + eA_1\right)\right],$$
$$\frac{dB_2}{dt} = \frac{1}{\tau}\left[-B_2 + NR\left(S_{B_2} - iB_1 + eA_2\right)\right], \tag{A1}$$

where the Naka–Rushton function is

$$NR(x) = R_{\max}\,\frac{x^N}{m^N + x^N}. \tag{A2}$$

A1,2 and B1,2 represent the spontaneous, noise-normalized firing rates of the pattern units. These equations state that the changing firing rate of each pattern unit is reduced proportional to that firing rate (standard exponential decay) and increased by the input to the neuron (expressed as the Naka–Rushton function, NR, in the equations). SA1, SA2, SB1, SB2 are the inputs to the network (and can be understood as the outputs of model component cells). The parameters i and e (inhibitory intra-site and excitatory inter-site synaptic strength, respectively) represent the strengths of the mutual subtractive inhibition (winner-take-all) and additive lateral excitation.

The mathematical description of these equations is based on earlier neuronal network models.82–84 The reader is referred to Wilson's85 detailed discussion of the nonlinear differential equations used to describe the temporal dynamics of neuronal networks.

The behavior of the system can best be understood by considering an example. Suppose the component unit inputs to the simplified network are as follows:

$$S_{A_1} = 10,\quad S_{A_2} = 5,\quad S_{B_1} = 0,\quad S_{B_2} = 8.$$

Fig. 16. Left: four-neuron, two-site network representing the interactions between model MT pattern units. For simplicity, each site (A and B) contains only two neurons (A1 and A2, B1 and B2) selective for one of two directions of motion (e.g., up or down). Neurons at the same site mutually inhibit each other in a winner-take-all fashion (dashed lines with minus signs). In addition to this intra-site winner-take-all computation, there are inter-site excitatory connections (solid lines with plus signs) between sites signaling the same direction of motion (A1 and B1, A2 and B2). These inter-site interactions are responsible for the capturing effect. Right: example of the temporal dynamics of a four-neuron, two-site network. Capture is evident by the fact that at one site (B) the neuron (B1) wins the competition over its partner (B2) despite receiving a much weaker input. This is due to the lateral excitation between units signaling the same direction of motion (A1 and B1) and a strong input to A1 (see text for details).


Without lateral interactions (e = 0), the unit with the strongest input at each site wins the local winner-take-all competition (A1 at site A and B2 at site B). When lateral interactions are included, capture is reflected by the following observation: the strongest pattern unit in the network (here A1) influences the other site, so that its corresponding unit (B1) wins the winner-take-all competition at site B even when its initial input is weaker than that of its partner (B2). Hence the proposed combination of inhibitory (intra-site) and excitatory (inter-site) interactions achieves signal capture.

It can be shown analytically that such a network will asymptotically approach one of two steady states: (1) activity in corresponding units A1 and B1 with complete inhibition of the others (A2 and B2), or (2) vice versa.86 This behavior is guaranteed as long as the excitatory parameter e is sufficiently strong. Which of the two states the network will yield depends on the relative strengths of the inputs SA1, SA2, SB1, SB2. (Figure 16, right, shows the temporal evolution of the network for this example.)
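The temporal evolution described by Eqs. (A1) and (A2) can be sketched numerically. The Python fragment below is a minimal illustration, not the paper's implementation: the parameter values Rmax, m, N, τ, i, and e are assumptions chosen for demonstration. It integrates the four-neuron, two-site network with the forward Euler method for the inputs of the example above; with e = 0 each site resolves its own winner-take-all competition, while whether B1 is captured for e > 0 depends on the chosen synaptic strengths.

```python
def naka_rushton(x, r_max=100.0, m=20.0, n=2):
    """Half-rectified Naka-Rushton nonlinearity, Eq. (A2).
    Parameter values are illustrative assumptions, not the paper's."""
    x = max(x, 0.0)
    return r_max * x**n / (m**n + x**n)

def simulate(s, i=1.0, e=1.0, tau=1.0, dt=0.05, steps=2000):
    """Forward-Euler integration of the two-site network, Eq. (A1).
    s = (SA1, SA2, SB1, SB2); returns the final rates (A1, A2, B1, B2)."""
    sa1, sa2, sb1, sb2 = s
    a1 = a2 = b1 = b2 = 0.0
    for _ in range(steps):
        # All four updates use the rates from the previous time step.
        a1_new = a1 + dt / tau * (-a1 + naka_rushton(sa1 - i * a2 + e * b1))
        a2_new = a2 + dt / tau * (-a2 + naka_rushton(sa2 - i * a1 + e * b2))
        b1_new = b1 + dt / tau * (-b1 + naka_rushton(sb1 - i * b2 + e * a1))
        b2_new = b2 + dt / tau * (-b2 + naka_rushton(sb2 - i * b1 + e * a2))
        a1, a2, b1, b2 = a1_new, a2_new, b1_new, b2_new
    return a1, a2, b1, b2

# Inputs of the worked example: SA1 = 10, SA2 = 5, SB1 = 0, SB2 = 8.
inputs = (10.0, 5.0, 0.0, 8.0)
print("e = 0 (no lateral coupling):", simulate(inputs, e=0.0))
print("e = 1 (lateral coupling)   :", simulate(inputs, e=1.0))
```

Note that with e = 0 the unit B1 receives no positive drive at all (SB1 = 0 and only inhibition from B2), so it stays silent while B2 wins at site B, exactly as described in the text.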

2. How Can a Capturing Signal Propagate?

After establishing a network that can cause capture of an initially weak or absent signal, the question arises as to whether such a captured signal can then propagate even farther and successively capture signals at more distant sites. The issue of signal propagation is of importance because this is presumably the mechanism responsible for how human observers usually perceive a single veridical direction of motion for rigid objects. If, as in the case of a long line, only a small minority of neurons signal the true direction, a vast number of neuronal signals have to be altered (Fig. 17). Propagation ought to be fairly undamped to guarantee capture, independent of the actual size and shape of the object. The explicit modeling presented in this paper took advantage of such undamped propagation: simulations were performed for only a minimum number of sites (usually three). As long as propagation is undamped, adding further sites (which might be partially overlapping) would not have an effect on the final model output. We show that this assumption is justified for the proposed model.87

Fig. 17. Propagation of an unambiguous terminator signal. Initially and in parallel (top row), many local computations (circles) occur at different sites along the line. Each circle represents the area over which a local-motion model receives its input. Only at the terminator location does the local-model output match the physical direction of motion of the line (in this case upward and to the right). At all the other positions the computed directions (shown by the bold arrows) are perpendicular to the line's orientation. The arrows under each circle show the lateral excitatory interactions between sites. A successful model should exhibit the behavior portrayed in the figure: the terminator signal first captures the site next to it (second row), and the captured signal in turn affects its neighboring site (third row) and so on until all sites signal the same, veridical motion (bottom row). At each step, the most active lateral connection is highlighted.

Initial numerical simulations with the equations proposed above [Eq. (A1)] did not, however, exhibit the desired propagation behavior. Several simulations where the system was expanded from a two-site to a multi-site network showed that the stronger signal propagates somewhat along sites but reaches a point where further propagation damps out. The input parameters of one particular six-site (A–F) example were

$$S_{A_1} = 10,\quad S_{B_1} = 0,\quad S_{C_1} = 0,\quad S_{D_1} = 0,\quad S_{E_1} = 0,\quad S_{F_1} = 0,$$
$$S_{A_2} = 8,\quad S_{B_2} = 8,\quad S_{C_2} = 8,\quad S_{D_2} = 8,\quad S_{E_2} = 8,\quad S_{F_2} = 8.$$

Capture and propagation of the strongest signal (A1) ought to cause a steady state in which only units A1, B1, ..., F1 are active, with the remaining units silent. This is not, however, the final state of the model described by Eq. (A1). At the same time that neuron A1 captures B1, the excitatory interactions also cause mutual increases in the firing rates of adjacent units E2 and F2. The network reaches a point where the propagating signal (originating from A1) and the conflicting signals (originating from the interactions among units F2, E2, D2, ...) are of equal strength, and further propagation ceases. Whether and where this occurs depends on the number of sites, their inputs, and the excitatory and inhibitory synaptic strengths. However, for any given set of parameters there will be a point at which the addition of a single further site abolishes propagation.

There are several ways in which this damping of signal propagation can be avoided. One option, employed here, uses multiplicative modulation of the lateral excitatory interactions. The idea behind this is to avoid the synergistic increase in the activity of units signaling the same direction of motion and to enhance interactions at locations where neighboring sites signal different directions of motion.

The combination of two mechanisms, lateral excitation and multiplicative modulation, guarantees the desired domino effect portrayed in Fig. 17. The mathematical formulation for the interactions in the specific case of a two-site network is

$$\frac{dA_1}{dt} = \frac{1}{\tau}\left\{-A_1 + NR\left[S_{A_1} - iA_2 + eB_1\left(\left|A_1 - B_1\right|\right)\right]\right\}. \tag{A3}$$

The only difference between Eqs. (A1) and (A3) is the final term, the difference between corresponding units' signals across sites. This term is strong whenever the two sites signal different directions of motion, but it is weak or absent when they are the same. By modulating lateral excitations in a multiplicative fashion, this computation allows mutual interactions only when sites signal different directions of motion, but it shuts them down when they are already matching.

Besides enabling signal propagation, this kind of modulation serves a further purpose: it provides stabilization and avoids signal oscillations. Once capture has occurred, it stabilizes the network against noise and biases from conflicting motion signals of nearby objects. When capture and propagation are completed, the modulation term becomes weak, because all units signal the same direction of motion. Any signal differing from this direction will interact with its neighboring site. But even a small shift of the signal at one site causes a simultaneous increase in the modulation terms toward neighboring sites that signal the same object's motion.
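To sketch how the modulated coupling of Eq. (A3) behaves in a chain of sites, the Python fragment below applies the |difference|-modulated excitation between nearest neighbours. Both the multi-site generalization (nearest-neighbour coupling only) and all parameter values are our illustrative assumptions; the paper's actual simulations may differ. The key property is visible in the code: the lateral drive vanishes once two neighbouring units fire at the same rate, which prevents the synergistic build-up described above.

```python
def naka_rushton(x, r_max=100.0, m=20.0, n=2):
    """Half-rectified Naka-Rushton nonlinearity, Eq. (A2); illustrative parameters."""
    x = max(x, 0.0)
    return r_max * x**n / (m**n + x**n)

def simulate_chain(s1, s2, i=1.0, e=1.0, tau=1.0, dt=0.05, steps=4000):
    """Forward-Euler integration of a chain of two-unit sites with the
    modulated lateral excitation of Eq. (A3), nearest neighbours only.
    s1, s2: lists of inputs to the direction-1 and direction-2 units."""
    n_sites = len(s1)
    u1 = [0.0] * n_sites
    u2 = [0.0] * n_sites
    for _ in range(steps):
        d1, d2 = [], []
        for k in range(n_sites):
            nbrs = [j for j in (k - 1, k + 1) if 0 <= j < n_sites]
            # Lateral drive: neighbour's rate times the rate difference,
            # so the coupling shuts down once both sites signal alike.
            lat1 = sum(u1[j] * abs(u1[k] - u1[j]) for j in nbrs)
            lat2 = sum(u2[j] * abs(u2[k] - u2[j]) for j in nbrs)
            d1.append((-u1[k] + naka_rushton(s1[k] - i * u2[k] + e * lat1)) / tau)
            d2.append((-u2[k] + naka_rushton(s2[k] - i * u1[k] + e * lat2)) / tau)
        u1 = [u + dt * d for u, d in zip(u1, d1)]
        u2 = [u + dt * d for u, d in zip(u2, d2)]
    return u1, u2

# Six-site example from the text: only site A's direction-1 unit
# receives the strong (terminator-like) input.
s1 = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
s2 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
u1, u2 = simulate_chain(s1, s2)
print("direction-1 rates:", [round(r, 2) for r in u1])
print("direction-2 rates:", [round(r, 2) for r in u2])
```

Whether the terminator signal propagates down the whole chain for a given run depends on the synaptic strengths i and e, which here are placeholders; without lateral coupling (e = 0) each site simply settles on its locally strongest input.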

ACKNOWLEDGMENTS

We thank Hugh Wilson and Gael Gordon for helpful comments during various stages of this research and two reviewers for their comments on an earlier version of this manuscript. Grants from the Visual Research Trust, from the Fördergemeinschaft Deutscher Augenoptiker (FDA), and from the Engineering and Physical Sciences Research Council (UK) to G. Loffler supported this research. Initial results of this research were first reported at the Annual Meeting of the Association for Research in Vision and Ophthalmology, Fort Lauderdale, Florida, April 29–May 4, 2001.88

Corresponding author Gunter Loffler's e-mail address is [email protected].

REFERENCES AND NOTES

1. D. H. Hubel and T. N. Wiesel, “Receptive fields and functional architecture of the monkey striate cortex,” J. Physiol. 195, 215–243 (1968).

2. H. Wallach, “Über visuell wahrgenommene Bewegungsrichtung,” Psychol. Forsch. 20, 325–380 (1935).

3. M. B. Ben-Av and M. Shiffrar, “When ambiguous becomes unambiguous,” Invest. Ophthalmol. Visual Sci. 34, 1028 (1993).

4. F. L. Kooi, “Local direction of edge motion causes and abolishes the barberpole illusion,” Vision Res. 33, 2347–2351 (1993).

5. K. Nakayama and G. H. Silverman, “The aperture problem I. Perception of nonrigidity and motion direction in translating sinusoidal lines,” Vision Res. 28, 739–746 (1988).

6. H. S. Orbach and H. R. Wilson, “Fourier and non-Fourier terminators in motion perception,” Invest. Ophthalmol. Visual Sci. 35, 1827 (1994).

7. G. Loffler and H. S. Orbach, “Factors affecting motion integration,” J. Opt. Soc. Am. A 20, 1461–1471 (2003).

8. S. Shimojo, G. H. Silverman, and K. Nakayama, “Occlusion and the solution to the aperture problem for motion,” Vision Res. 29, 619–626 (1989).

9. E. Castet, V. Charton, and A. Dufour, “The extrinsic/intrinsic classification of two-dimensional motion signals with barber-pole stimuli,” Vision Res. 39, 915–932 (1999).

10. E. Castet and S. Wuerger, “Perception of moving lines: interactions between local perpendicular signals and 2D motion signals,” Vision Res. 37, 705–720 (1997).

11. L. Liden and E. Mingolla, “Monocular occlusion cues alter the influence of terminator motion in the barber pole phenomenon,” Vision Res. 38, 3883–3898 (1998).

12. N. Rubin and S. Hochstein, “Isolating the effect of one-dimensional motion signals on the perceived direction of moving 2-dimensional objects,” Vision Res. 33, 1385–1396 (1993).

13. G. Loffler and H. S. Orbach, “Computing feature motion without feature detectors: a model for terminator motion without end-stopped cells,” Vision Res. 39, 859–871 (1999).

14. N. M. Grzywacz and A. L. Yuille, “Theories for the visual perception of local velocity and coherent motion,” in Computational Models of Visual Processing, M. S. Landy and J. A. Movshon, eds. (MIT Press, Cambridge, Mass., 1991), pp. 231–252.

15. E. C. Hildreth, The Measurement of Visual Motion (MIT Press, Cambridge, Mass., 1984).

16. J. Lorenceau, M. Shiffrar, N. Wells, and E. Castet, “Different motion sensitive units are involved in recovering the direction of moving lines,” Vision Res. 33, 1207–1217 (1993).

17. J. Chey, S. Grossberg, and E. Mingolla, “Neural dynamics of motion grouping: from aperture ambiguity to object speed and direction,” J. Opt. Soc. Am. A 14, 2570–2594 (1997).

18. S. Grossberg and E. Mingolla, “Neural dynamics of motion perception: direction fields, apertures, and resonant grouping,” Percept. Psychophys. 53, 248–278 (1993).

19. S. J. Nowlan and T. J. Sejnowski, “A selection model for motion processing in area MT of primates,” J. Neurosci. 15, 1195–1214 (1995).

20. Z. Y. Yang, A. Shimpi, and D. Purves, “A wholly empirical explanation of perceived motion,” Proc. Natl. Acad. Sci. USA 98, 5252–5257 (2001).

21. Y. Weiss, E. P. Simoncelli, and E. H. Adelson, “Motion illusions as optimal percepts,” Nat. Neurosci. 5, 598–604 (2002).

22. H. R. Wilson, V. P. Ferrera, and C. Yo, “A psychophysically motivated model for two-dimensional motion perception,” Visual Neurosci. 9, 79–97 (1992).

23. W. Reichardt, “Autocorrelation, a principle for the evaluation of sensory information by the central nervous system,” in Sensory Communication, W. A. Rosenblith, ed. (Wiley, New York, 1961), pp. 303–317.

24. J. P. H. van Santen and G. Sperling, “Elaborated Reichardt detectors,” J. Opt. Soc. Am. A 2, 300–321 (1985).

25. K. I. Naka and W. A. Rushton, “S-potentials from colour units in the retina of the fish,” J. Physiol. 185, 584–599 (1966).

26. Note that neither the inputs, Ix,y [Eq. (1)], nor the recurrent inhibition [Eq. (2)] extends over the entire range of directions (±180°) but instead are restricted to relative angles of ±120°. The only reason for this limitation is to allow the local network to signal more than one direction of motion in circumstances of transparency.27 Following this argument, such a network signals transparency by a bimodality in MT pattern unit responses. However, in agreement with psychophysical data,7 all global simulations presented here resulted in a single direction of motion, and the simulations never predicted transparency.

27. H. R. Wilson and J. Kim, “Perceived motion in the vector sum direction,” Vision Res. 34, 1835–1842 (1994).

28. C. Yo and H. R. Wilson, “Perceived direction of moving two-dimensional patterns depends on duration, contrast and eccentricity,” Vision Res. 32, 135–147 (1992).

29. J. Kim and H. R. Wilson, “Dependence of plaid motion coherence on component grating directions,” Vision Res. 33, 2479–2489 (1993).

30. J. Kim and H. R. Wilson, “Direction repulsion between components in motion transparency,” Vision Res. 36, 1177–1187 (1996).

31. H. R. Wilson and J. Kim, “A model for motion coherence and transparency,” Visual Neurosci. 11, 1205–1220 (1994).

32. To simulate the temporal dynamics of the coupled differential equations, the fast Euler method has been employed. To verify this approach, initial sample simulations were undertaken in which the Euler method was compared with the considerably slower but more stable fourth-order Runge–Kutta method. These sample simulations proved that for a sufficiently small step size of τ/4, the two methods give indistinguishable results, with the Euler method being faster by a factor of ~3.

33. D. G. Albrecht and D. B. Hamilton, “Striate cortex of monkey and cat: contrast response functions,” J. Neurophysiol. 48, 217–237 (1982).

34. G. Sclar, J. R. Maunsell, and P. Lennie, “Coding of image contrast in central visual pathways of the macaque monkey,” Vision Res. 30, 1–10 (1990).

35. L. S. Stone, A. B. Watson, and J. B. Mulligan, “Effect of contrast on the perceived direction of a moving plaid,” Vision Res. 30, 1049–1067 (1990).

36. T. D. Albright, “Direction and orientation selectivity of neurons in visual area MT of the macaque,” J. Neurophysiol. 52, 1106–1130 (1984).

37. J. H. R. Maunsell and D. C. Van Essen, “Functional properties of neurons in middle temporal visual area of the macaque monkey. I. Selectivity for stimulus direction, speed, and orientation,” J. Neurophysiol. 49, 1127–1147 (1983).

38. J. A. Movshon and W. T. Newsome, “Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys,” J. Neurosci. 16, 7733–7741 (1996).

39. H. R. Rodman and T. D. Albright, “Single unit analysis of pattern-motion selective properties in the middle temporal area (MT),” Exp. Brain Res. 75, 53–64 (1989).

40. S. Raiguel, M. M. Van Hulle, D.-K. Xiao, V. L. Marcar, and G. A. Orban, “Shape and spatial distribution of receptive fields and antagonistic motion surrounds in the middle temporal area (V5) of the macaque,” Eur. J. Neurosci. 7, 2064–2082 (1995).

41. G. G. Blasdel and D. Fitzpatrick, “Physiological organisation of layer 4 in macaque striate cortex,” J. Neurosci. 4, 880–895 (1984).

42. D. C. Van Essen, “The visual field representation in striate cortex of the macaque monkey: asymmetries, anisotropies, and individual variability,” Vision Res. 24, 429–448 (1984).

43. H. R. Wilson, “A model for direction selectivity in threshold motion perception,” Biol. Cybern. 51, 213–222 (1985).

44. H. R. Wilson and D. J. Gelb, “Modified line-element theory for spatial-frequency and width discrimination,” J. Opt. Soc. Am. A 1, 124–131 (1984).

45. H. R. Wilson and W. A. Richards, “Curvature and separation discrimination at texture boundaries,” J. Opt. Soc. Am. A 9, 1653–1662 (1992).

46. H. R. Wilson, “Psychophysical models of spatial vision and hyperacuity,” in Spatial Vision, D. Regan, ed. (MacMillan, New York, 1991), pp. 64–86.

47. There are two conceptually different ways to treat neural sites corresponding to locations of a scene without stimulation (e.g., aperture gaps). Grzywacz and Yuille14 proposed a model in which lateral interactions result in motion signals at every point of the visual field regardless of whether the field was initially stimulated by a moving object. It is unclear whether this correctly reflects neurophysiology. The approach taken here is different. Spatial sites that are not initially activated by motion in their receptive field are not activated by lateral interactions; rather, they stay silent. This is consistent with the approach of relating the activity of MT pattern neurons directly to behavior48,49 and noting that, behaviorally, parts of the visual field that have not been stimulated do not appear to have motion associated with them. Mathematically, this approach is achieved by modulating lateral interactions by a site's activity: if a site does not receive any bottom-up input through its local computations, lateral excitation stays silent.

48. C. D. Salzman, C. M. Murasugi, K. H. Britten, and W. T. Newsome, “Microstimulation in visual area MT—effects on direction discrimination performance,” J. Neurosci. 12, 2331–2355 (1992).

49. K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, “The analysis of visual motion—a comparison of neuronal and psychophysical performance,” J. Neurosci. 12, 4745–4765 (1992).

50. M. B. Ben-Av and M. Shiffrar, “Disambiguating velocity estimates across image space,” Vision Res. 35, 2889–2895 (1995).

51. G. Loffler, “The integration of motion signals across space,” Ph.D. thesis (Glasgow Caledonian University, Cowcaddens Road, Glasgow G4 0BA, UK, 1999).

52. R. Hess and D. Field, “Integration of contours: new insights,” Trends Cogn. Sci. 3, 480–486 (1999).

53. Note the difference between this term, which depends on line-segment orientation, and the motion-discontinuity term in Eq. (4), which depends on direction of motion.

54. To simulate this condition, a small amount of (random) noise was added to the initial V1 simple cell responses. Without this noise, it would be impossible to extract a maximally excited cell, as all cells would have the same firing rate. The randomly added noise (unique to this condition) is the reason for the small capturing behavior of the network for the smallest gap, which represents the average over 20 model simulations.

55. For example, L. Liden and C. Pack, “The role of terminators and occlusion cues in motion integration and segmentation: a neural network model,” Vision Res. 39, 3301–3320 (1999).

56. M. A. Georgeson, G. S. A. Barbieri-Hesse, and T. C. A. Freeman, “The primal sketch revisited: locating and representing edges in human vision via Gaussian-derivative filtering,” Perception 31, 1 (2002).

57. A. P. Georgopoulos, M. Taira, and A. Lukashin, “Cognitive neurophysiology of the motor cortex,” Science 260, 47–52 (1993).

58. R. A. Andersen, L. H. Snyder, C.-S. Li, and B. Stricanne, “Coordinate transformations in the representation of spatial information,” Curr. Opin. Neurobiol. 3, 171–176 (1993).

59. C. D. Salzman and W. T. Newsome, “Neural mechanisms for forming a perceptual decision,” Science 264, 231–237 (1994).

60. C. C. Pack and R. T. Born, “Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain,” Nature 409, 1040–1042 (2001).

61. D. Bradley, “MT signals: better with time,” Nat. Neurosci. 4, 346–348 (2001).

62. J. Allman, F. Miezin, and E. McGuinness, “Stimulus specific responses from beyond the classical receptive field: neurophysiological mechanisms for local–global comparisons in visual neurons,” Annu. Rev. Neurosci. 8, 407–430 (1985).

63. R. T. Born and R. B. H. Tootell, “Segregation of global and local motion processing in primate middle temporal visual area,” Nature 357, 497–499 (1992).

64. K. Tanaka, K. Hikosaka, H.-A. Saito, M. Yukie, Y. Fukada, and E. Iwai, “Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey,” J. Neurosci. 6, 134–144 (1986).

65. K. S. Rockland and J. S. Lund, “Widespread periodic intrinsic connections in the tree shrew visual cortex,” Science 215, 1532–1534 (1982).

66. K. E. Schmidt, R. Goebel, S. Lowel, and W. Singer, “The perceptual grouping criterion of collinearity is reflected by anisotropies of connections in the primary visual cortex,” Eur. J. Neurosci. 9, 1083–1089 (1997).

67. W. H. Bosking, Y. Zhang, B. Schofield, and D. Fitzpatrick, “Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex,” J. Neurosci. 17, 2112–2127 (1997).

68. R. Malach, Y. Amir, M. Harel, and A. Grinvald, “Relationship between intrinsic connections and functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primate striate cortex,” Proc. Natl. Acad. Sci. USA 90, 10469–10473 (1993).

69. W. S. Geisler, J. S. Perry, B. J. Super, and D. P. Gallogly, “Edge co-occurrence in natural images predicts contour grouping performance,” Vision Res. 41, 711–724 (2001).

70. U. Polat and D. Sagi, “Lateral interactions between spatial channels—suppression and facilitation revealed by lateral masking experiments,” Vision Res. 33, 993–999 (1993).


71. D. J. Field, A. Hayes, and R. F. Hess, “Contour integration by the human visual system—evidence for a local association field,” Vision Res. 33, 173–193 (1993).

72. R. F. Hess, W. H. A. Beaudot, and K. T. Mullen, “Dynamics of contour integration,” Vision Res. 41, 1023–1037 (2001).

73. P. J. Bex, A. J. Simmers, and S. C. Dakin, “Snakes and ladders: the role of temporal modulation in visual contour integration,” Vision Res. 41, 3775–3782 (2001).

74. W. Marshak and R. Sekuler, “Mutual repulsion between moving visual targets,” Science 205, 1399–1401 (1979).

75. G. Mather and B. Moulden, “A simultaneous shift in apparent direction: further evidence for a ‘distributional-shift’ model of direction coding,” Q. J. Exp. Psychol. 32, 325–333 (1980).

76. M. Nawrot and R. Sekuler, “Assimilation and contrast in motion perception—explorations in cooperativity,” Vision Res. 30, 1439–1451 (1990).

77. R. J. Snowden, “Motions in orthogonal directions are mutually suppressive,” J. Opt. Soc. Am. A 6, 1096–1101 (1989).

78. J. Kim and H. R. Wilson, “Motion integration over space: interaction of the center and surround motion,” Vision Res. 37, 991–1005 (1997).

79. E. Mingolla, J. T. Todd, and J. F. Norman, “The perception of globally coherent motion,” Vision Res. 32, 1015–1031 (1992).

80. J. Lorenceau and L. Zago, “Cooperative and competitive spatial interactions in motion integration,” Visual Neurosci. 16, 755–770 (1999).

81. J. Lorenceau and M. Shiffrar, “The influence of terminators on motion integration across space,” Vision Res. 32, 263–273 (1992).

82. H. R. Wilson and J. D. Cowan, “A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue,” Kybernetik 13, 55–80 (1973).

83. S. Grossberg, “Contour enhancement, short-term memory and constancies in reverberating neural networks,” Stud. Appl. Math. 52, 217–257 (1973).

84. H. R. Wilson, “Hysteresis in binocular grating perception: contrast effects,” Vision Res. 17, 843–851 (1977).

85. H. R. Wilson, Spikes, Decisions, and Actions (Oxford U. Press, Oxford, UK, 1999).

86. The nature and stability of the steady state was estimated by considering the linear terms of a Taylor expansion of the nonlinear dynamics at the equilibrium points (for details of this approach see Ref. 85). The corresponding exponentials have negative real parts, and the equilibrium is consequently stable.

87. It is important to emphasize that this kind of undamped propagation does not necessarily result in a single direction of motion in the more complicated case of a dynamic multi-object environment. Signal propagation is strong only for adjacent or overlapping sites. As our experiments show, any gap between sites weakens propagation and allows different directions of motion for nearby objects in a scene.

88. H. S. Orbach and G. Loffler, “Motion integration across apertures: theory and experiment,” Invest. Ophthalmol. Visual Sci. 42, 4685 (2001).