Learning the Gestalt Rule of Collinearity from Object Motion · Learning the Gestalt Rule of Collinearity from Object Motion 1867 technically minded reader may want to skip this introduction

LETTER Communicated by Aapo Hyvarinen

Learning the Gestalt Rule of Collinearity from Object Motion

Carsten ProdohlCarstenProdoehlneuroinformatikruhr-uni-bochumdeRolf P WurtzRolfWuertzneuroinformatikruhr-uni-bochumdeChristoph von der MalsburgChristophvonderMalsburgneuroinformatikruhr-uni-bochumdeInstitut fur Neuroinformatik Ruhr-Universitat Bochum D-44780 Bochum Germany

The Gestalt principle of collinearity (and curvilinearity) is widely re-garded as being mediated by the long-range connection structure in pri-mary visual cortex We review the neurophysiological and psychophysicalliterature to argue that these connections are developed from visual ex-perience after birth relying on coherent object motion We then present aneural network model that learns these connections in an unsupervisedHebbian fashion with input from real camera sequences The model usesspatiotemporal retinal ltering which is very sensitive to changes in thevisual input We show that it is crucial for successful learning to use thecorrelation of the transient responses instead of the sustained ones As aconsequence learning works best with video sequences of moving ob-jects The model addresses a special case of the fundamental question ofwhat represents the necessary a priori knowledge the brain is equippedwith at birth so that the self-organized process of structuring by experi-ence can be successful

1 Introduction

It has long been realized that perception is not a passive intake of unstruc-tured sensory signals but a highly selective and structured active processemploying complicated and widely unknown rules to guide behavior Ifthese rules are not hardwired at birth and we will argue in detail thatsome of them are not this creates a vicious circle of knowledge depend-ing on perception to arise and perception in turn depending on acquiredknowledge to be possible An attractive way to break this circle is the pos-tulate that only some basic perceptual mechanisms are present at birthwhich can be used to learn useful processing rules from the properties ofthe environment These can lead to rened perception and consequentlythe acquisition of rened knowledge about the environment The evolu-tionary advantage gained from such a hierarchy would be a lower bur-den on the coding capacity of the genome and more exibility to cope

Neural Computation 15 1865ndash1896 (2003) cdeg 2003 Massachusetts Institute of Technology

1866 C Prodohl R Wurtz and C von der Malsburg

with such environmental changes that happen too fast for evolutionarychanges

The most prominent rules for visual perception are the Gestalt principlesformulated by Wertheimer (1923) and Koffka (1935) Their basic postulateis that a percept is more than the sum of the constituent parts in that theseparts are linked to form a coherent whole The Gestalt principles are rulesthat govern what should be linked together to form a percept and whatshould not The most important ones are proximity similarity closure goodcontinuation common fate good form and global precedence

Recent research in computer vision has begun to recognize Gestalt princi-ples as important for object selection and segmentation and they have beenapplied in numerous models of visual perception where they are usuallytaken for granted To our knowledge there are two models for their devel-opment during maturation of the visual system (Grossberg amp Williamson2001 Choe 2001) but we know of none that relies on short sequences of asingle natural scene as the basis for learning

We will present a detailed model of how the principles of collinearityand curvilinearity can be learned from real visual stimuli assuming thatthe principle of common fate actively organizes perception already at birthWe will employ three assumptions which are to varying degree coveredby experimental data

sup2 Gestalt principles are implemented by the connectivity of the visualcortex

sup2 There is a hierarchy of Gestalt principles in the sense that some prin-ciples are already active at birth and others are developed later andinuenced by visual experience This hierarchy is reected in the tem-poral order in which the various principles become observable duringdevelopment

sup2 The higher principles are learned from experience while the lowerones assist in structuring the data to be learned

Concretely we will review the literature to present evidence that the prin-ciples of collinearity and curvilinearity correspond to structured horizontalconnections between simple cells in V1 and that the principle of commonfate is more fundamental than collinearity and curvilinearity Then we de-scribe a neuronal model that shows that collinearity and curvilinearity canbe learned from observing moving objects by structuring horizontal con-nections with a Hebbian learning rule

The letter is organized as follows In section 11 we review recent psy-chophysical work on the course of development of various Gestalt princi-ples in early infancy In section 12 we focus on the role of common fateIn section 13 we look in depth at recent neurophysiological data aboutwhich biological pathways and computations are probably present at birthor shortly afterward and which are structured by experience The more

Learning the Gestalt Rule of Collinearity from Object Motion 1867

technically minded reader may want to skip this introduction and move tosection 2 where we describe a simplied model of visual processing up toV1 and the proposed learning dynamics of the horizontal connections Thetechnical details can be found in the appendices In section 3 the outcomeof computational experiments using this model is presented Finally in thediscussion we extract the basic components that we consider crucial for thesuccess of the model

11 The Developmental Succession of Gestalt Principles In this sec-tion we collect psychophysical evidence that Gestalt principles are notpresent at birth and that they develop one after the other Although foradults the Gestalt principles of good form or good continuation are domi-nant for perceiving an object as a unity children younger than one year canhardly make use of them (Spelke Breinlinger Jacobson amp Phillips 1993)The same holds for texture similarity and color similarity Twelve-week-oldinfants do not use these principles at all with regard to object unity andslowly start to employ them during the development in the rst postnatalyear Detecting form at a very early stage seems to depend on continuousoptical transformations caused by object or observer motion At 24 weeksinfants are unable to grasp 3D object form from multiple stationary binocu-lar views However if 16-week-old observers are presented with a continu-ous geometrical transformation around a stationary 3D object they are ableto build up a 3D form representation (Kellman amp Short 1987)

The disparity information from the two eyes cannot be used immediatelyafter birth It is only at the age of 4 to 8 weeks that the convergence accuracyof the eyes for distances between 25 cm and 200 cm reaches the accuracyof the adult (Hainline Riddell Grose-Fifer amp Abramow 1992) Accommo-dation of the eyes becomes accurate at 12 weeks postnatally (Braddick ampAtkinson 1979) Stereo vision itself develops even later at the age of 16weeks (Yonas Arterberry amp Granrud 1987) In the period between 12 and20 weeks postnatally strong binocular interaction arises which at 24 weeksculminates in a superior binocular acuity and the beginning of stereopsis(Birch amp Salomao 1998 Birch 1985)

The contrast sensitivity for spatial frequency gratings increases between 4and 9 weeks forall spatial frequencies in line with the convergence accuracyof the eyes (Norcia Tyler amp Hamer 1990) At 9 weeks the acuity for lowspatial frequency gratings reaches adult levels but for higher frequenciesit is still three octaves worse than in the adult (Courage amp Adams 1996)From this time on the contrast sensitivity for high spatial frequency gratingsincreases systematically in line with the emergence of stereo vision Thecolor system develops even later luminance contrast above 20 is reliablydetectable at the age of 5 weeks but there is still no response to isoluminantchromatic stimuli of any size or contrast In the following weeks chromaticgratings are detectable only at low spatial frequencies with an acuity 20times lower than that for luminance stimuli The sensitivity to chromatic


gratings increases more rapidly in the following weeks than the one forluminance stimuli (Morrone Burr amp Fiorentini 1990) Not only physicalfactors such as changing photoreceptor density may be responsible for thechanges in contrast sensitivity but the neural noise is nine times higher inneonates than in adults and decreases to adult levels during the rst eightmonths of development (Skoczenski amp Norcia 1998)

12 The Relevance of Rough Motion Information In this section wediscuss the biological relevance of motion information and collect evidencethat it is available at an early stage of development We have argued thatneither form nor color nor disparity information can be processed at earlypostnatal stages but luminance information at low spatial frequencies canThe latter is thought to be mediated mainly by the M-path which is also es-sential for motion processing The Gestalt principle of common fate realizedby common motion of object parts is the dominating principle for perceiv-ing the unity of an object in 12-week-old infants (Kellman Spelke amp Short1986) This is independent of the direction of motion in 3D space Commonmotion dominates gural quality substance weight texture and shape in16-week-old infants (Streri amp Spelke 1989) At this age the infant is ableto make a distinction between object and observer motion and uses onlyobject motion for the generation of an object percept (Kellman Gleitmannamp Spelke 1987)

The question arises as to what kind of computation is carried out regard-ing the visual information originating from a moving object detected bythe retina Twenty-four-week-old infants can predict linear object motion ingrasping tasks but have difculties doing the same for visual tasks involv-ing tracking of objects that are out of reach (Hofsten Vishton Spelke Fengamp Rosander 1998) The apparent inability to predict linear object motiontogether with the observation that at low spatial frequencies luminanceinformation with sufcient contrast can be processed directly after birthmakes it very unlikely that the visual system is able to estimate accuratemotion vectors at early stages of development Nevertheless some motionprocessing is important in early development Piaget (1936) reported thateven newborns react to high-contrast stimuli that are moving in front oftheir faces This is consistent with the ndings on luminance gratings andwe conclude that at birth a system must be present that can detect changesin the visual world signaled by achromatic luminance stimuli of low spatialfrequency

We interpret these ndings such that learning Gestalt principles relieson changes in low-frequency luminance information at times when no ego-motion occurs Infants can actively create such a situation by gazing in aconstant direction where a moving object of sufcient size and luminancecontrast is present for a considerable amount of time This specic behavioris indeed observed regularly as has been reported by for example Piaget(1936) and Barten Birns amp Ronch (1971) Our model refers to this particular


situation which from ldquowithinrdquo the visual system is distinguished by strongtransient responses in retina and cortex together with the knowledge thatthe observer is not moving In order to complete the model two thingsremain to be specied

1 What is the supposed neuronal substrate for the functional organiza-tion according to Gestalt principles

2 How can this substrate be modied on the basis of the transient re-sponses caused by object movement

13 Development of the Visual Pathway In order to motivate our an-swer to these questions we now review some facts about the early devel-opment of connectivity in the visual pathway Although for the retina theprocess of maturing is not complete after birth Tootle (1993) has shown thatganglion cells of cats show burst-like spontaneous activity and that thosecells fatigue very quickly after repeated stimulation with the same stimu-lus in the rst postnatal week We interpret this functionally as a kind oftransient response property that detects changes in the visual input Thesame author further showed that ON- and OFF-ganglion cells are alreadypresent at birth and that the proportion of light-driven ganglion cells ap-proaches 100 in the second postnatal week Furthermore the retinogenic-ulate (Snider Dehay Berland Kennedy amp Chalupa 1999) and the thala-mocortical (Isaac Crair Nicollamp Malenka 1997) pathways develop mainlyprenatally and are therefore present at birth In visually inexperienced kit-tens 90 of all cells in area 17 are of simple type and 70 of all visuallyactive neurons in this area show a rudimentary orientation bias (Albus ampWolf 1984) although only 11 are specically tuned to one orientationMost of these cells respond preferentially to contrast changes caused by de-creasing light intensity as 76 of all responding neurons are activated byOFF-zones exclusively This means that area 17 shows a sensitivity bias infavor of dark stimuli immediately after birth The cells responding to visualstimuli are located in layers 4 and 6 of the striate cortex and there is almostno activity in layers 2=3 and 5 before 3 weeks postnatally In the fourth post-natal week ON- and OFF-zones are equal in number and almost all cells inlayers 4 and 6 show orientation tuning Psychophysical experiments showthat in the human infant contrast differences overrule orientation-basedtexture differences in segmentation tasks (Atkinson amp Braddick 1992) upto the twelfth week

We now turn to the question of what the neural substrate for learning theGestalt principles of collinearity or curvilinearity is There is extensive evi-dence in the psychophysical (Field Hayes amp Hess 1993 Hess amp Field 1999Kovacs 2000) neurophysiological (Malach Amir Harel amp Grinvald 1993Bosking Zhang Schoeld amp Fitzpatrick 1997 Schmidt Goebel Lowel ampSinger 1997 Fitzpatrick 1997) and modeling (Grossberg amp Mingolla 1985Li 1998 Ross Grossberg amp Mingolla 2000 Yen amp Finkel 1998 Gross-


berg amp Williamson 2001) literature that these principles are at least partlyimplemented by horizontal connections in V1 The models described byGrossberg and Mingolla (1985) Grossberg and Williamson (2001) and Rosset al (2000) agree with perceptual data even to the point of reproducingillusionary contours

Most but not all vertical interlayer local circuits in V1 of macaque mon-keys develop prenatally in precise order without visual experience (Call-away 1998) This means that axon terminals at least nd the right layer andalready form a crude retinotopic projection However intralayer horizontalconnections are present but only rudimentarily developed at birth as mostaxon terminals have not yet hit their target cells Studies of postmortem hu-man brains show that the rst horizontal connections develop 1 to 3 weeksbefore birth (37 weeks after gestation) in layers 4b and 5 Their numberincreases rapidly after birth and culminates in a uniform plexus at around7 weeks after birth The patchiness of these projections as it is found inthe adult emerges after at least 8 weeks postnatally (Burkhalter Bernardoamp Charles 1993 Katz amp Callaway 1992) The long-range connections canextend up to a maximum of four hypercolumns in each direction (Katz ampCallaway 1992) Burkhalter et al (1993) further showed that consistent withthe results of neuronal activity in kittens layers 2=3 and 6 develop horizon-tal connections later than layers 4b and 5 In layer 2=3 they are not presentuntil the sixteenth postnatal week and reach maturity in the sixtieth weekIt is interesting to note that the connections in layer 2=3 are patchy from thestart This suggests that they can already benet from the patchiness of theconnections in layer 4b probably mediated by a direct vertical connectionfrom layer 4b to layer 2=3 that develops after birth (Katz amp Callaway 1992)Furthermore as layer 4b belongs to the M-path and provides direct inputto area MT (which is strongly involved in motion processing) we concludethat the processing of visual information related to motion precedes andprobably supports the processing of form color precise stereoscopic depthand their integration This assumption is consistent with the psychophys-ical results mentioned earlier The development of horizontal connectionshas been shown to depend on the visual input presented (Lowel amp Singer1992)

14 Relation to Natural Image Statistics An article about learning fromnatural stimuli is incomplete without discussing what is known in the liter-ature about the statistics of such stimuli The idea that the visual system iswired in a way that it provides an efcient and nonredundant representa-tion of the incoming signals goes back to Attneave (1954) and Barlow (1961)Based on this principle there have been successful predictions of propertiesof retinal lateral geniculate nucleus (LGN) and simplecells in V1 Exampleswithout attempt on completeness include Olshausen and Field (1996) Belland Sejnowski (1997) and van Hateren and Ruderman (1998) A completereview of this line of work is beyond the scope of this letter but is done beau-


tifully in Simoncelli and Olshausen (2001) Additional assumptions have tobe employedmdashtypically either the sparseness (Olshausen amp Field 1996) ofa cortical representation or the statistical independence of the activities ofthe cells involved The latter leads to properties of visual cells resultingfrom independent component analysis (van Hateren amp Ruderman 1998Bell amp Sejnowski 1997) Also translation invariance is usually assumedbecause otherwise the required statistical basis would become intractablylarge

During review of this article we learned that independent componentanalysis has recently been applied successfully to networks of V1 cells thatsupport contour enhancement (Hoyer amp Hyvarinen 2002) They learn afeedforward layer of contour coding cells that take input from complexcells in V1 The underlying assumption is sparseness of coding

With our model we take a slightly different approach We do not em-ploy any assumption about sparseness or independence of cortical signalsRather we model the spatiotemporal properties of the visual pathway upto V1 and apply Hebbian learning to the horizontal connections betweensimple cells The model is more sophisticated in biological detail than othersin this area For example positivity of neuronal responses is always main-tained As a consequence the stimuli are preprocessed in a nonlinear waybefore providing data for learning The importance of such nonlinearitieshas been pointed out by Zetzsche amp Krieger (2001) A feature that our modelshares with the others is the explicit assumption of translation invariancewhich leads to weight sharing during learning This assumption is ratherunbiological but hard to avoid for keeping computation times acceptable

2 Methods and Models

Starting from the data just reviewed we assume that the specic connectiv-ity pattern of long-range horizontal connections provides the neural basisfor the Gestalt principles of collinearity and curvilinearity This notion issupported by the apparent co-occurence of the use of these principles andthe maturation of the respective connections during development This an-swers the rst question raised at the end of section 12 about the neuralsubstrate of Gestalt principles In order to answer the second one abouthow these connections develop depending on object motion we proposea quantitative model of how the transient retinal stimuli are propagatedtoward the cells interconnected by the axons in question (see Figure 1) andhow their connection strengths are modied We model the relevant partsof the visual pathway running from the retina via the retinogeniculate andthalamocortical connections to the simple cells of layer 4b in primary visualcortex and apply a Hebbian learning rule to shape the connectivity

All functions we will use to describe our model are functions of two-dimensional space but we omit this dependency for convenience of no-tation The full details of the retina model are given in appendix A1 we


Figure 1 Illustration of the complete model

summarize only the important features here The retinal photoreceptorsshow a strong transient response to changing stimuli Their output is passedto bipolar cells and nally ganglion cells perform a spatial ltering usingthe well-known center-surround antagonism that enhances local contrastdifferences As the temporal information detected by the photoreceptors ispreserved the ganglion cells show a transient output as well

Our retina model is based on the model for Y-ON-ganglion cells devel-oped by Gaudiano (1994) which is extended in two respects First bothON- and OFF-ganglion cells are modeled and second the time course ofthe transient response pattern of each cell is represented in a simplied wayby just two values (see Figure 2) a maximal transient response vtr when


Figure 2 The computation of the discrete approximations vtr and vst to thecontinuous transient activity of an ON ganglion cell

new input comes in and the steady-state activity rate vst while this inputis constantly present Here it is assumed that the timescale of the ganglioncells is faster than the change in the input which is recorded by a camera at3 frames per second This requirement results in an upper bound for objectspeed relative to the timescales used for the neuronal processing

Neither the LGN nor the cortical layer 4creg is explicitly modeled as it isassumed that in early ontogenesis no processing relevant for the develop-ment of Gestalt principles is performed there Therefore the activity of ON-and OFF-ganglion cells provides direct input to the simple cells in layer 4bIt will become clear later that for the purpose of our model it is not neces-sary to model all aspects of inhibitory neurons in striate cortex in detail Adetailed model of the cortical dynamics mediated by short- middle- andlong-range corticocortical connections is also not required Let us explainwhy There are a lot of inhibitory interneurons in the cortex that receive af-ferent input themselves and affect the excitatory pyramidal cells later by atleast short-range lateral connections We assume that the main effect of thoseinhibitory connections with regard to our model is to avoid an excitatoryexplosion when input arrives at the cortex as only afferences are comingin This assumption is realistic because the high neural noise in neonates(see section 13) requires strong global inhibitionmdashstronger than the over-all excitationmdashin the cortex to keep the whole system stable Furthermoreactivity mediated along horizontal connections alone should not be able totrigger a response in a target cell If inhibitory neurons have no other func-


tion than the ones mentioned we can avoid modeling them explicitly byletting only those excitatory cortical cells participate in the structuring oflong-range cortical connections that receive strong primary afferent inputthemselves Consequently we do not need to model the cortical dynamicsfurther at this early stage of development as purely intracortical inuencesat a postsynaptic neuron without strong primary afferent input should nothave signicant impact on the activity In the adult however there are sub-stantial inuences from intracortical long-range connections and also theneural noise is reduced by a factor of nine

The modeled simple cells are arranged in hypercolumns and during thesimulations a long-range connection structure between these cells emergesOne cell of each hypercolumn can be connected to any cell located in a 9 pound 9square surrounding its own hypercolumn Each simple cell in the model infact represents a local pool of cells that all have similar properties Thereforeit makes sense to model connections of a model cell to itself

One of the crucial features of our model for the development of a spe-cic long-range connection structure is that the cortical simple cells inlayer 4b show a transient response pattern In the results section we willsee that this transient cortical response pattern enables the developmentof an iso-orientation long-range connection structure as it is found in an-imals (Schmidt et al 1997) In the following we describe how we modelthe transient cortical activity starting with the transients of the ganglioncells An interesting question which is beyond the range of this article ishow these responses are produced biologically For simplicity (and com-putational tractability) we omit all biological details that could be part ofthe theoretically possible mechanisms (eg single-cell properties sustainedlocal inhibition) that enable transient responses These must be investigatedin the animal and by models on a ner scope than ours

We have found that the success of the model depends much more on thefact that the response is transient than on its precise time course Thereforefor each time step n between the acquisition of successive video frames theoutput of an ON- or OFF-ganglion cell is discretized to two values vtrn

and vstn Consequently it is also natural to discretize the primary afferentresponse a that a simple cell would show if no other (intracortical) inuenceswere present to a transient and a stationary value atr and ast respectivelya is computed by a 2D spatial convolution (denoted by curren) with the kernelsgON and gOFF representing the synaptic couplings made by ON-afferencesand OFF-afferences to the cortical cells

atrn D vONtr n curren gON C vOFF

tr n curren gOFF

astn D vONst n curren gON C vOFF

st n curren gOFF (21)

The full details of the cortex model are given in equations B1 through B3Here it is sufces to mention that the functions gON and gOFF are responsible


for the orientationAacute and the polarity(C iexcl) of the simplecell receptiveeldsThe resulting receptive elds for the AacuteC cells are shown schematically in thetop row and left-most column of Figures 3 and 4 Regardless of the detailedmechanism that causes the transient nature of the cortical cell response ifa transient response is triggered at frame n then there must be a differencein the primary afferent input of this frame astn and the primary afferentinput the cell received during the previous frame astn iexcl 1 Then it followsfrom our retina model that there has to be a difference in the values of atrn

and astn as well as after the transient over- or undershoot a new steadystate astn will be reached which cannot be equal to the old one astn iexcl 1because otherwise there would have been no transient response at all

The real output o of the ganglion cellmdashthe one that can be measuredexperimentally by counting action potentials and linearly transforming thebase rate to zeromdashcan then be approximated as

on D maxatrn iexcl astn 0 (22)

The response in equation 22 is quantitatively a bit too strong as the relax-ation of the afferent input astn may not be complete in the time betweentwo consecutive frames However the important feature for our model isthat transient responses are successfully detected by this output functionTo point out how crucial the transient nature of cortical responses in layer 4bis for the development of long-range connections we have done additionalsimulations with simple cells having sustained (ocurren) instead of transient re-sponses

ocurrenn D maxastn iexcl 2 0 (23)

where 2 is a constant near the baseline primary afferent activityA simple Hebbian learning mechanism is used for the adaption of the

long-range synaptic strengths

1wij D sup2oioj (24)

For computational efciency this learning rule is not applied to singlesynaptic weights but to ensembles of equivalent connections Two con-nections are equivalent if their pre- and postsynaptic cells have the sameorientation and polarity and they span the same cortical distance This ef-fectively leads to a system of connections that is translation invariant bydesign Mathematical details and a biological motivation can be found insection B2

If the transient cortical responses o are inserted into equation 24 to eval-uate the correlation between pre- and postsynaptic cell respectively theconnection structure shown in Figure 3 emerges We will argue in section 3that this is suitable to form the anatomical basis for the Gestalt principle ofcollinearity


Figure 3 Learned synaptic strengthsThe left-most column shows the classicalreceptive eld shapes Aacute

preC of presynaptic simple cells and the top row those of

the postsynaptic cells (AacutepostC ) The other squares show the spatial distribution of

synaptic strengths of one presynaptic cell to a 9 pound 9 patch of postsynaptic cells ofconstant orientation selectivityBoth types of squares use the same scale in retinalcoordinates The weights are coded in gray scale with white corresponding tothe highest weight The central position of each 9 pound 9 array corresponds to aconnection between pre- and postsynaptic cells in the same hypercolumn Theweight distribution shown has been learned on the basis of the transient corticalresponses dened in equation 22 with a Hebbian learning rule (see equation 24)The inuence of cortical connections far exceeds the size of the classical receptiveelds Furthermore the connections that are established prominently connectsimple cells with nearly the same orientation in hypercolumns that in retinalcoordinates refer to points lying in the direction of their preferred orientationThus it supports collinearity and curvilinearity


Figure 4 This gure shows long-range cortical synaptic strengths that developif sustained cortical responses (see equation 23)are used in the Hebbian learningrule (see equation B4) For an explanation of the display arrangement refer toFigure 3 The inuence of the strong gray tone diagonal in the background of theused images can easily be detected in most of the synaptic connection elds Thisexample shows how the arbitrary background dominates the learning processof long-range connections when sustained cortical responses are used

If the simulations are done with the same visual processing but withthe Hebbian learning on the basis of the sustained cortical responses (ocurren)the resulting connection structure is qualitatively different (see Figure 4)In this case no general principles are learned but the nal connectionstructure reects properties of the particular visual data used forlearning


Figure 5 One frame of a typical camera sequence used in the simulations Thereis a single moving object in front of the observer and a structured backgroundThe strong diagonal gray tone border in the background is used to illustrate thedifference between sustained and transient responses in the learning rule Sucha biased background is a problem for static responses which tend to learn justthat bias If transient responses are used the distribution of orientations causedby the moving person is much broader

3 Results

The simulations have been performed with sequences of camera images likethe example shown in Figure 5 A typical sequence consisted of 100 to 200frames and showed a person moving in front of a camera and performingsome arm movements In the whole sequence (one frame is shown in Fig-ure 5) the person was the only moving object and there is a strong diagonalgray tone border in the background


The learned long-range connection structures for the transient (see equa-tion22) and the sustained (see equation 23) cortical responses are illustratedin Figures 3 and 4 respectively The shown cells all have the same polarityand vary only in orientation The results are very similar for the cells of theother polarity Aacuteiexcl while cross connections between different polarities havenot been examined in detail

We now interpret the results shown in Figure 3 that have been obtainedwhen the transient cortical responses (see equation 22) have been used Indetail the results are

sup2 The size of the long-range connection structure far exceeds the size ofthe classical receptive eld

sup2 The strongest connections are established from the reference hypercol-umn (central array positions) to itself They connect a pool of identicalpresynaptic cell types with themselves (iso-orientation)

sup2 There are in addition strong connections in the 3 pound 3 neighborhoodof the reference hypercolumn to iso-oriented cells

sup2 The connections from the (central) reference hypercolumn to other hy-percolumns are made to cells with a similar orientation of pre- andpostsynaptic receptive eld respectively Looking at these connec-tions one can see that the direction of the strongest connections corre-sponds to the orientation in the receptive elds (collinearity) There-fore the receptive eld is somewhat extended by these long-rangeconnections

sup2 The connectivity pattern diminishes in strength with the difference inorientation of pre- and postsynaptic receptive elds (curvilinearity)

This shows that the use of transient responses for Hebbian learning canlead in an efcient and robust way to a connection structure suited to formthe anatomical basis for the Gestalt principle of collinearity and even curvi-linearity (Field et al 1993 Hess amp Field 1999 Guy amp Medioni 1996) Theimportance of transient responses is elucidated by comparison with the re-sults from the same learning rule applied to the sustained responses shownin Figure 4 caused by the same stimuli The resulting horizontal connectionstructure is qualitatively different from the one in Figure 3 The most sig-nicant long-range connections are established to the postsynaptic cells incolumn 6 of Figure 4 These show a strong response to the tilted edge inthe background of the sequence Furthermore almost all long-range con-nections are in the direction of the that edge

To a lesser extent the same holds for the connection strengths for theneighboring four columns (4ndash8) of column 6 and just the presynaptic cellswith an orientation difference of 90 degrees (shown in row 2 and the lastrow of Figure 4) to the gray tone border cells show almost no developedconnection structure One can say that the learned connection structure is


dominated by the strong gray tone border that is an accidental propertyof the background As the gray tone edge is constantly present in the im-age cells with an orientation similar to that of the gray tone border aremdashaccording to their tuning curvemdashactive as well Therefore although theirpresynaptic cell activation may be relatively weak their connection strengthto the gray tone border cells increases with each learning step One couldargue that this is a problem of the threshold 2 used in equation 23 andthat an increased threshold should lter out the weak responses to bordersso that just the presynaptic border cells connect to the postsynaptic bordercells leaving the rest mainly unchanged This procedure could work forany particular image as probably one can nd a threshold that lters outthe important borders and suppresses the weak responses for that particu-lar image However given the variations in natural images this thresholdmust be changed from image to image because a weak response to a borderin one image may be of the same magnitude as the response to the strongestborder in another image One could think that a kind of adaptive thresholdthat decides about the presence of a border taking its strength in relationto the maximum border strength found in the image could be a solutionRelated to this is the idea of normalizing the maximal responses in the im-age These approaches have the additional disadvantage that even a ldquonoiserdquoimage would participate in the learning process to the same degree as animage with very strong borders

The results show that the inclusion of transient responses in the modelovercomes these conceptual problems by using information from two dif-ferent cues the gray tone and the occurring change in subsequent imagesTherefore only moving edges in the image take part in the learning pro-cess while static ones have no inuence Because it was not possible to learncollinearity with sustained cortical responses out of the same amount of bi-ased image data that was sufcient for transient ones one can say that usingtransient responses is an efcient and robust way to do so One could ar-gue now that with different kinds of backgrounds presented or with a largevariety of directions present in one background the disturbing inuenceswill eventually average out and in the end the same connection structureas with transient response could emerge With a well-balanced input or atleast a very large input of different visual scenes this may be possible Itwould however be a considerable risk for the infants if early developmentdepended critically on this condition It may also be argued that the ad-vantage of transient over sustained responses is only a consequence of theorientation bias in the background

In order to clarify the two previous points we have applied our model tostandard image collections which we concatenated into sequences In thiscase transient responses are pointless because there is no real movementLearning from the sustained responses also yields a biased connection struc-ture which supports collinearity in the horizontal and vertical orientationsbut hardly in the oblique ones See Figure 6 for the results on the database


Figure 6 Long-range cortical synaptic strengths learned from a sequence ofstatic natural images Transient responses make no sense in this case so thesustained cortical responses (see equation 23) are used in the Hebbian learningrule (see equation B4) For an explanation of the display arrangement refer toFigure 3

available from British Telecom More stimulus sets together with the learn-ing results are available from our Web site and described in appendix C

4 Discussion

We have started from the assumption that there is a hierarchy of Gestaltprinciples and that the principle of common fate is more fundamental thanthe one of good continuation Technically speaking this can be reformulatedas saying that motion is a primary cue for the task of segmenting objects


from the background Closed boundaries (as supported by collinearity andcurvilinearity) provide a secondary cue that can be learned using the pri-mary one In this situation an important strategy is the distinction betweenobserver motion and object motion This information is probably detectableby a newborn but it is very unlikely that it can already be used accuratelyon a cellular level although there is evidence that this is possible even for4-month-old neonates (Kellman et al 1987) and in the adult (Leopold ampLogothetis 1998) How can object motion be detected by newborns A veryeasy way for the organism to achieve this distinction is to gaze in one direc-tion for a considerable amount of time (Piaget 1936) While the directionof gaze is xed changes of illumination on the retina cannot be caused byeye movements or observer motion Consequently the structuring processin the cortical model is strongly dependent on the feature constellations re-sulting from moving objects Using the latter we have shown (see Figure 5)that it is possible to learn for example the Gestalt principle of collinearityor more precisely the anatomical structure of long-range connections in pri-mary visual cortex which probably implements this Gestalt principle andwas found experimentally by Schmidt et al (1997) They showed that cellswith an orientation preference in area 17 of the cat are linked preferentiallyto iso-oriented cells a result reproduced by our model Furthermore thecoupling strength diminishes with the difference in preferred orientation ofpre- and postsynaptic cell

From a technical point of view our system has learned by experiencesomething similar to an association eld (Field et al 1993 Hess amp Field1999) projection eld (Grossberg amp Williamson 2001) or extension eld(Guy amp Medioni 1996) These are well known to aid the concept of collinear-ity and curvilinearity in technical computer vision algorithms and biolog-ical models Actually the learned arrangement of synaptic strengths mayprovide means of improving the extension eld (Guy amp Medioni 1996) al-gorithm as strong connections between cells with similar orientation areestablished in all directly neighboring hypercolumns even if the hyper-column lies in a spatial direction different from the pre- and postsynapticorientation Another lesson learned is that one presynaptic cell type shouldconnect not only to one specic postsynaptic cell type in the target hyper-column but with a smaller synaptic weight to a range of types with similarorientation in the same target column

Recently two models have been presented that address the ontogenesisof Gestalt principles In Grossberg and Williamson (2001) a very detailedperceptual model emerges from learning It is remarkable how well theperformance of this model matches neurophysiological measurements InChoe (2001) PGLISSOM a variant of the self-organizing map leads to lat-eral connectivities similar to the ones shown in Figure 3 after learning frompatches of elongated gaussians Our model differs from those in the respectthat we consider only horizontal connections and show that these can belearned from real camera images Using real sensor data complicates things


considerably and consequently other details for example stability againstcontrast changes have not been addressed However we could show thatthe complexity of data required for learning can be taken from actual mo-tion It is clear that these connections are only one aspect of a completesystem but we believe it is worthwhile to study the isolated subsystem

There are several possible reasons for the result that the desired learningeffect could be achieved by using the transient rather than the sustained re-sponses The rst is the biased background The moving object is much morelikely to provide the rich variety of oriented edges required for learning Thetransient responses do not react to the static background and therefore thatbias cannot inuence the connection structure This also suggests that ob-ject motion is preferable to observer motion Pure observer motion against astatic background would learn exactly the background biases and the sameholds for saccadic eye movements Furthermore the common motion ofobject edges gives strong hints that these edges belong together This couldnot be achieved if the whole background moved consistently In order tolearn from sustained responses the background would have to vary a lotin order for the biases of individual backgrounds to average out

Using only one sequence may be regarded as a weakness of our systemand of course it would provide far too little data to cover the environmen-tal properties This problem is greatly alleviated by assuming translationinvariance and therefore employing massive weight sharing Keeping thisin mind our results indicate that concentrating on the moving parts of ascene provides excellent preprocessing for learning of collinearity Actuallythe fact that one sequence is sufcient clearly demonstrates the power ofour preprocessing to select the data relevant for learning Also the numberof images in the collections is clearly rather small but we tried to keep aboutas many as we had movie frames for a fair comparison

It may be argued that a biased background is an unrealistic assumptionin the visual world of ambulatory system However there is considerableorientation bias in collections of natural images at least in the environ-ment given by the campus of Duke University (Coppola Purves McCoyamp Purves 1998) Our results indicate that moving objects yield a more evendistribution of orientations although we have not studied that systemati-cally The assumption that persons moving about can providea major sourceof data for visual learning of young infants seems safe It may also be ar-gued that 100 static images are too few to learn However even large imagesets would show the bias described by Coppola et al (1998) Learning fromstatic images imprints this bias into the connection structure as can be seenin Figure 6 At least our experiments show that learning is faster when basedon the transient responses to image sequences showing real motion

Regarding the biological relevance of our model many details havebeen omitted and many simplications been made in order to achieve acomputationally tractable size However good results have been reachedwith a combination of basic mechanisms spatiotemporal retinal ltering


topology preservation in the cortical map the transient nature of corticalcell responses and Hebbian learning It is a well-established fact that allthose single building blocks of our model exist in the brain so the completemodel should be regarded as a valid approximation to one of the processesthat organize perception during early development

The problem of learning relevant feature constellations from natural im-ages has been known at least since the models of Marr (1982) were in-troduced We think that the main reason for the difculties faced by theapproach is the concentration on feature constellations that are present inthe whole image Due to the nature of Hebbian learning (or other second-order correlation rules) the useful second-order feature constellations aremuch harder to detect in the whole image than in the part of the imagerepresenting a single object Of course usefulness is not a physical entitybut arises from the natural desire of living creatures to be able to distinguishobjects for various purposes like grabbing escape or ingestion which inturn yields considerable evolutionary advantage

The problemprobably gets harder the morecomplexthe features becomebecause of their diminishing statistical signicance in whole images of nat-ural scenes If one considers features like collinearity vertices or closedboundaries which dene a geometric object or combinations of closedboundaries which are listed here in order of their assumed complexity thenthe statistical signicance to nd those features in whole images probablynot only diminishes with rising complexity but the more complex featuresare probably not encountered often enough to make a difference statisticallyAnd even if there is a small signicance for those complex features it wouldtake a long time and a lot of different scenes to learn them using a statisticalalgorithm For the case of collinearity Kruger (1998) was able to show thatcollinearity and short-range parallelism are statistically signicant featuresof natural images if the set of images examined is large and varied enoughGeisler Perry Super and Gallogly (2001) link this to the psychophysicalperformance of contour grouping and conclude that this must be due to anunderlying neuronal structure Our system shows that the necessary varia-tion can be derived from a single scene with one object moving across it forlong enough that the movement covers all image locations Both studies areopposite extremes the reality for a newborn consists of neither snapshotsof many possible scenes nor a single moving object Both together showthat the neural circuits underlying the collinearity Gestalt principle can belearned from natural input

An important aspect pointed out by Geisler et al (2001) and Simoncelliand Olshausen (2001) is that simple linear correlations are not sufcient toextract interesting statistics from natural data Hebbian learning howeverrelies on linear correlations In our case it has been applied to nonlinearlypreprocessed data so there is no contradiction here Geisler et al (2001)also nd that curvilinearity cannot be learned from simple co-occurence butrequires Bayesian co-occurrence statistics Again this is not a contradiction


due to the nonlinearities in our model which are motivated biologicallyrather than statistically

Comparison of our results to the ones from Hoyer and Hyvarinen (2002)is made difcult by the fact that the assumptions about the underlying net-work are different Their model relies on a feedforward structure for contourcoding while our emphasis is on horizontal connections It seems that hori-zontal connections would hurt the statistical independence of the cells theyconnect so our model does not map naturally onto the ICA concept Thebiological evidence is certainly too sparse to make a decision in favor of oneof these assumptions This is clearly a point that requires further analysison the modeling as well as on the biological side

Further technical studies (Potzsch 1999) indicate that feature constella-tions of higher complexity such as vertices that seem to play an importantpart in object recognition cannot be learned by simple correlation learningrules that operate on the features of the whole image As indicated abovethese useful features are probably not statistically signicant features of nat-ural images If this is the case complex features can be learned from naturalimages only if there is purposeful behavior that carefully selects the dataworthwhile to be learned Concentrating on moving objects seems to be agood strategy even in the absence of a tracking mechanism It can be con-jectured that more sophisticated mechanisms like head and eye saccadescan boost learning further Reinagel and Zador (1999) show that effect onlearning image statistics therefore a positive effect on learning Gestalt rulesmay be expected

Appendix A Model of Subcortical Processing

A1 Retina Model We do not model the development of the retina it-self during the postnatal weeks Instead we extend and modify an existingretina model (Gaudiano 1994) to compute ON- and OFF-Y-ganglion cellresponses Once this is done we discretize the continuous differential equa-tions in time by values for each cell type (ON and OFF) The activity ratevst corresponds to the tonic or steady-state part in the continuous modeland the transient rate vtr to an upper bound for the maximum transient orphasic rate of the ganglion cell response (see Figure 2) These discrete ap-proximations of the retina model are essential because their output is usedin the model for the cortical layer To understand how these equations arederived we now go into the details of the continuous retina model Notethat the spatial dependency of the functions used is omitted to improvereadability

A2 The Photoreceptors Given the light intensity value pn for each pixelof the camera image recorded at time tn we dene a function pInt thatis equal to pn for all times t in the interval of [tn tnC1 The values of thisfunction pInt are the interval of [pmin

In pmaxIn ] which isgiven by the camera as


[0 255] We compute the nonlinear photoreceptor response rt according to

rt D ztpt with (A1)

pt D [pmax iexcl pmin]pInt=pmaxIn C pmin (A2)

Here the biological fact is taken into account that at extreme light intensitiesa photoreceptor can perform temporal high-pass ltering To achieve thisthe photoreceptor adapts its response rate under extreme constant light in-tensity toward its base rate activity by multiplying the visual input pt (seeequation A2) with an internal state zt The way this internal state is com-puted makes clear that it is something like a short-term memory for lightintensity By this light adaption mechanism a sudden change to a givenextreme light intensity level produces a short-term activity rate substan-tially different from the one produced under a constant light intensity ofthe same extreme magnitude The photoreceptor response rt is computedby using the transformed light intensity pt The simple linear rescalingin equation A2 to the interval [pmin pmax] is necessary because the increasedminimal value allows for dynamic photoreceptor behavior We have chosenpmin D 30 pmax D 255 The chosen value of pmin is not particularly criti-cal but should be well above zero because otherwise low light intensitiesin the visual world cannot be modulated by the internal state of the pho-toreceptor and therefore cannot trigger a dynamic photoreceptor responsedistinguishable from the response to constant low light intensity

The rate of change dz=dt of the internal parameter depends on the relativeintensity prelt (see equation A4) of the visual stimulus that the photore-ceptor receives

dz=dt D F[M iexcl zt] iexcl Hpreltzt with (A3)

prelt D pt iexcl pmin=pmax iexcl pmin (A4)

Here F M and H have the values chosen by Gaudiano (1994) M is themaximal value of zt and F is a gain parameter controlling how fast ztapproaches M in the absence of light (prelt acute 0) On the other hand Hpreltcontrols the decay of zt

A3 Bipolar and Ganglion Cells The next steps in retinal processing arethe integration of photoreceptor activity by horizontal cells and the feed-forward processing by bipolar cells We summarize horizontal inuences ofhorizontal and amacrine cells in a later step of the model (see equations A7and A8) and concentrate on the straightforward modeling of bipolar cellresponses bCiexcl It is assumed that one bipolar cell receives input from justone photoreceptor The value rmax D pmaxM is given by equation A1 as Mis the maximal value of the internal state zt of a photoreceptor and pmax


the maximal light intensity

bCt D rt and biexclt D rmax iexcl rt (A5)

The ganglion cell activity vt itself is modeled using the shunting equa-tion A6 rst used by Grossberg (1970) and Sperling (1970) In our particu-lar equation the ganglion activity rate vt is limited to the interval of [0 1]without passive decay

dvt=dt D [1 iexcl vt]ut iexcl [vt]wt (A6)

The excitatory ut and inhibitory wt inputs contribute to the ganglioncell response in a PUSH-PULL way which means that each bipolar cellcontributes to both center and surround mechanisms of the ganglion cell indifferent quantities (McGuire Stevens amp Sterling 1986) For ON-ganglioncells these inuences have been modeled by the following equations wherewe use the gaussians c (central) and s (surround) with the parameters of theGaudiano model for Y-ganglion cells extended to two dimensions

ut D c curren bCt C s curren biexclt (A7)

wt D s curren bCt C c curren biexclt (A8)

Biologically these inuences come from lateral integration mediated byhorizontal and amacrine cells For OFF-ganglion cells equations A7 andA8 have been used with bCt and biexclt interchanged In our simulationsthe center gaussian has a value of frac34 equal to 318 pixel and the surroundgaussian of 389 pixel

The analytical solutions of equation A6 for ON- and OFF-cells then are

vONt D E=pmax[c curren rt iexcl s curren rt] C FON (A9)

vOFFt D iexclE=pmax[c curren rt iexcl s curren rt] C FOFF (A10)

where the following abbreviations have been used for convenience of nota-tion

E D 1=MVs C MVc (A11)

FON D MVs=MVs C MVc (A12)

FOFF D MVc=MVs C MVc (A13)

The numbers used in these denitions are as follows Vs and Vc are theintegrals over the two gaussians used for modeling center and surroundinuences in the PUSH-PULL model


So far we have presented a continuous model for ON and OFF ganglioncells and one can discuss various aspects of the model like the validity ofthe used simplications for example where exactly in the retina the spatialintegration takes place how the temporal ltering is probably done andmore We refer the reader to Gaudiano (1994) for these arguments

As shown in Figure 2 the continuous time course is now approximatedby two values computed as follows

rtr D pnzniexcl1 (A14)

zn D MF=F C Hpn (A15)

rst D pnzn (A16)

With these approximations and using the continuous equations for the gan-glion cell responses (equations A9 and A10) we can approximate the ON-and OFF-ganglion cell activity discretely in time and represent each of themby two values that describe the spatiotemporal properties of the ganglioncell responses

vONtr D E=pmax[c curren rtr iexcl s curren rtr] C FON (A17)

vONst D E=pmax[c curren rst iexcl s curren rst] C FON (A18)

vOFFtr D iexclE=pmax[c curren rtr iexcl s curren rtr] C FOFF (A19)

vOFFst D iexclE=pmax[c curren rst iexcl s curren rst] C FOFF (A20)

Appendix B Cortex Model

How should the cortical layer be modeled Recall from section 13 thatafter birth 90 of all active cells are of simple type For that reason wewill not consider complex cells here as they probably play only a minorrole directly after birth and most likely develop afterward Additionallythe development of the long-range connection structure of different kindsof simple cells after birth probably occurs at different speeds The devel-opment of the long-range connection structure between edge detector (oddsymmetry) cells of all scales is expected to be more robust than the one of bardetector (even symmetry) cells immediately after birth One reason for thisis the increased sensitivity to low spatial frequency gratings of the corticalcells (see section 13) in the newborn This phenomenon assigns a specialrole to the edge detector cells An edge between two large areas of highand low luminance respectively remains an edge for all edge detector cellsindependent of their scale or spatial frequency This is not true for examplefor a bar detector cell as the strength of the response is determined by thepreferred spatial frequency and the size of the bar presented Therefore werestrict our model to simple cells with odd symmetry that are functionallyedge detectors


How can edge detector cells be modeled Equation 21 describes the pri-mary afferent response of a simple cell and the terms gON and gOFF are nowdened in detail The most important feature of an edge detector cell is itspreferred orientation Aacute and for each orientation there are two edge detectorcells one with positive (AacuteC) and one with negative (Aacuteiexcl) polarity They canbe modeled by using the sine part of a Gabor function equation B1 withdifferent signs In our simulations we use cells with eight different orien-tations Aacute and do not model other properties of simple cells like selectivityfor motion direction The formulas are as follows

gx y Aacute D expsup3

iexcl k2x2 C y2

2frac34 2

acutecent sinkx cos Aacute C ky sin Aacute (B1)

AacuteC gON D maxCg 0 gOFF D maxiexclg 0 (B2)

Aacuteiexcl gON D maxiexclg 0 gOFF D maxCg 0 (B3)

The parameters are frac34 lt 2 and k D 05 A value of frac34 above two leadsto receptive elds with more than two signicant ON- or OFF-subeldsin contrast to the data about simple cells Note that the resting activity of acortical neuron in equation 21 is nonzero despite the vanishing integral of gin equation B1 The reason is that the resting activity of the retinal ganglioncells is nonzero and there are afferences only to the cortex Biologicallyit is not likely that the synaptic strengths of the afferent thalamocorticalconnections are so nely tuned in the rst postnatal weeks as the Gabor-like receptive elds imply With the high specic connection structure givenin equations B2 and B3 the effects of the short-range intracortical inhibitoryand excitatory connections are implicitly modeled which are the probablebasis for sharp orientation tuning (Somers Nelson amp Sur 1995) However touse parts of a Gabor function is an easy alternative way to implement someform of orientation tuning without the computational cost of modeling theshort-range corticocortical excitatory and inhibitory feedback connections

B1 Cortical Organization We assume that the retinogeniculate andthalamocortical pathways already exist at birth and that their mapping isalready retinotopic (see section 13) Simple cells sensitive to low spatialfrequencies also exist and their theoretical primary afferent responses aremodeled using equation 21 The receptive eld center positions Ep 2 N2

for the simple cells are placed in the image at grid points with a distanceof four pixels in the horizontal and vertical directions For each of thosegrid positions a hypercolumn consisting of simple cells of eight differentreceptive eld orientations is modeled Within one hypercolumn each spe-cic orientation is represented by two cells AacuteC and Aacuteiexcl with receptive eldproperties dened by equations B2 or B3 All cells in one hypercolumnhave the same receptive eld center Ep0 in retinal coordinates and neighbor-ing hypercolumns have different receptive eld centers Epk Each cell of a


hypercolumn establishes connections with all neurons of its own hypercol-umn and with all neurons in the neighboring four hypercolumns in eachdirection of the cortical plane (9 pound 9 neighborhood) Each model cell rep-resents biologically a pool of cells with nearly the same properties andtherefore a model cell was allowed to make connections to itself To avoidborder artifacts learning of long-range connections was disabled in the rstfour hypercolumns at the border of the cortical plane

B2 Learning Horizontal Connections We now shift our focus to the or-ganization of the long-range cortical connections illustrated at the bottom ofFigure 1 and in more detail in Figure 7 in which the repeated regular struc-ture is representing a hypercolumn with eight orientations (only four areshown) and two polarities A small segment of the cortical plane is shownand the dashed arrows in it illustrate the range of the cortical horizontalconnections

To adapt the synaptic weights the difference in primary afferent inputin equation 22 is used in equation 24 to select only those cells that have atransient cortical response Equation 24 is a Hebbian learning rule 1wij isthe change of synaptic strength between neuron j and i and sup2 is a generallearning factor that controls the impact of each learning step The choiceof sup2 is not critical To illustrate the effect of transient responses we havealso examined a Hebbian learning rule that uses only the sustained (seeequation 23) cortical responses

1wij D sup2ocurreni ocurren

j (B4)

This corresponds to the classical interpretation of the Hebbian postulate inneural modeling because it is a correlation learning rule that operates di-rectly on the input All cells that have a primary afferent response above thebaseline activity micro of equation 23 may participate in the structuring process

After the increment of equation 24 or B4 is used to update the synap-tic strengths equation B5 two additional procedures are incorporated Toavoid articially high synaptic values a connection strength above one isreset to one by equation B6 To introduce competition between synapsesthe total synaptic strength for a given ensemble C is held constant at K bynormalizing the weights after each learning step and multiplying them withK equation B7

wij Atilde wij C 1wij (B5)

wij Atilde minwij 1 (B6)

wij Atilde wijK=X

Cwij (B7)

One of the steps to make the model applicable to natural image data isthe introduction of ensembles C of equivalent connections First we will


Figure 7 The cortical horizontal connection scheme A small segment of the cor-tical plane is shown to illustrate the range of the long-range connections (indi-cated by the dashed arrows) and some connections of an ensemble of equivalentconnections (indicated by the solid arrows and explained in the text) The re-peated regular structure represents a hypercolumn with 8 orientations (just fourare shown) and two polarities Refer to section B1 for a detailed explanation

give a technical description of such an ensemble and then explain the bi-ological motivation for building those ensembles All established connec-tions can be characterized by four parameters the receptive eld centerposition Ep 2 N2 of the presynaptic cell the spanned distance measured inhypercolumns Er 2 Z2 orientation and polarity of the presynaptic simplecell Aacute

preCiexcl and the orientation and polarity of the postsynaptic simple cell

AacutepostCiexcl Mathematically one can build the equivalence classes from the re-

lation that two connectionsmdashand with them their synaptic weightsmdashareequivalent if they have the same Er Aacute

preCiexcl and Aacute

postCiexcl but possibly different

centers


By doing this we have introduced translational invariance of the syn-apses that connect the same pre- and the same postsynaptic cell types overthe same distance of hypercolumns which is illustrated for a few connec-tions by the solid arrows in equation A3 As we have two polarities andeight different types of cells as pre- and postsynaptic cells and 9 pound 9 dif-ferent Er we have 1296 ensembles of connections for each presynaptic celltype of the reference hypercolumn and a total of 20736 ensembles for thishypercolumn as a whole As an example the four solid arrows in Figure 7are equivalent and are forced to have identical weights

What is the biological foundation for building these ensembles Considera newborn with a moving object in front of it The newborn will gaze in onedirection and the image of the objects moves over the retina Then on alarger timescale the newborn will shift its head trying to followmdashnot verysuccessfully because of the low tracking accuracy after birth (Hofsten et al1998 Piaget 1936)mdashthe stimulus and gaze again in the new direction Thishappens several times until the newborn has lost the object If we now thinkof the visual input the newborn receives in terms of object features projectedon the retina we see that the same features are shifted on the retina becauseof the object motion and because of the more or less randomly distributedeye or head saccades of the newborn One could argue now that this shouldmainly affect horizontally neighboring cells and not vertically neighboringcells because most movements are on the horizontal surface But when thenewborn is gazing in one direction and then moves his head or eyes togaze in another direction it is very unlikely that this can be done without avertical shift given the low accuracy of tracking movements in the newbornThe structuring of long-range connection strengths in the cortical region thatcorresponds to the retinal area covered by the object will of course averagein time over all presented stimuli and this average should be the samefor equivalent connections because the object features seen have been thesame on average The result should be connection strengths of roughly thesame magnitude for equivalent connections Therefore we can model justone synapse for each ensemble of connections and reduce the amount ofsynapses drastically

Appendix C Input Data

We have applied the model to two movies of a person moving and wav-ing his arms in front of a background from a seminar room The back-grounds showed different biases The sequence ldquomovingmpgrdquo consists of199 frames and the background contains a diagonal rectangle

In the course of revising this article we also applied the model to thesequence ldquofw carstenmpgrdquo which is in color and has a larger resolutionbecause it was collected for a different experiment This movie consists of100 frames and shows no strong diagonals in the background Sampling hasbeen adjusted and color ignored The results for the transient responses are


very similar to the ones for the other sequence The ones for the sustainedresponses reect the different background bias

To clarify the relationship to static natural images we have applied themodel to movies made out of 60 frames from a texture database from MIT(http==www-whitemediamitedu=vismod=imagery=VisionTexture) andone with 98 frames from a scene database by British Telecom (ftp==ftpvislistcom=IMAGERY=BT scenes)

All four sequences as well as the respective learning processes can be re-trieved on-line from (ftp==ftpneuroinformatikruhr-uni-bochumde=pub=

pictures=gestalt)

Acknowledgments

This work has been funded by the DFG in SFB509 and the KOGNET Grad-uiertenkolleg and by the BMBF in the LOKI project (01IB 001D) Partial fund-ing from the MUHCI project by the European Community is also gratefullyacknowledged We thank the reviewers for thoughtful comments and im-portant hints for improvement of the original manuscript

References

Albus K amp Wolf W (1984)Early post-natal development of neuronal functionin the kittenrsquos visual cortex A laminar analysis Journal of Physiology 348153ndash185

Atkinson J amp Braddick O (1992) Visual segmentation of oriented textures byinfants Behav Brain Res 49(1) 123ndash131

Attneave F (1954) Some informational aspects of visual perception Psycholog-ical Review 61 183ndash193

Barlow H (1961) Possible principles underlying the transformation of sensorymessages In W Rosenblith (Ed) Sensory communication (pp 217ndash234) Cam-bridge MA MIT Press

Barten S Birns B amp Ronch J (1971) Individual differences in visual pursuitbehavior of neonates Child Development 42 313ndash319

Bell A J amp Sejnowski T J (1997) The ldquoindependent componentrdquo of naturalscenes are edge lters Vision Research 37(23) 3327ndash3338

Birch E (1985) Infant interocular acuity differences and binocular vision VisionResearch 25(4) 571ndash576

Birch E amp Salomao S (1998) Infant random dot stereoacuity cards Journal ofPediatric Ophthalmology and Strabismus 35(2) 86ndash90

Bosking W Zhang Y Schoeld B amp Fitzpatrick D (1997) Orientation se-lectivity and the arrangement of horizontal connections in tree shrew striatecortex Journal of Neuroscience 17(6) 2112ndash2127

Braddick O amp Atkinson J (1979)A photorefractive study of infant accommo-dation Vision Research 19 1319ndash1330

Burkhalter A Bernardo K L amp Charles V (1993)Development of local circuitsin human visual cortex Journal of Neuroscience 13(5) 1916ndash1931


Callaway E M (1998) Prenatal development of layer-specic local circuits inprimary visual cortex of the macaque monkey Journal of Neuroscience 18(4)1505ndash1527

Choe Y (2001) Perceptual grouping in a self-organizing map of spiking neuronsUnpublished doctoral dissertation University of Texas at Austin Availableon-line http==wwwcstamuedu=faculty=choe=ftp=choedisspdf

Coppola D M Purves H R McCoy A N amp Purves D (1998) The distribu-tion of oriented contours in the real world PNAS 95 4002ndash4006

Courage M amp Adams R (1996) Infant peripheral vision The development ofmonocular visual acuity in the rst 3 months of postnatal life Vision Research36(8) 1207ndash1215

Field D J Hayes A amp Hess R F (1993) Contour integration by the hu-man visual system Evidence for local ldquoassociation eldrdquo Vision Res 33(2)173ndash193

Fitzpatrick D (1997) The functional organization of local circuits in visual cor-tex Insights from the study of tree shrew striate cortex CerebralCortex 385(6)535ndash538

Gaudiano P (1994) Simulations of x and y retinal ganglion cell behavior witha nonlinear push-pull model of spatiotemporal retinal processing Vision Re-search 34 1767ndash1784

Geisler W Perry J Super B amp Gallogly D (2001) Edge co-occurrence innatural images predicts contour grouping performance VisionResearch41(6)711ndash724

Grossberg S (1970)Neural pattern discrimination Journal of TheoreticalBiology27 291ndash337

Grossberg S amp Mingolla E (1985) Neural dynamics for perceptual group-ing Textures boundaries and emergent segmentation Perception and Psy-chophysics 38 141ndash171

Grossberg S amp Williamson J (2001) A neural model of how horizontal andinterlaminar connections of visual cortexdevelop into adult circuits that carryout perceptual grouping and learning Cerebral Cortex 11(1) 37ndash58

Guy M amp Medioni G (1996) Inferring global perceptual contours from localfeatures International Journal of Computer Vision 20 113ndash133

Hainline L Riddell J Grose-Fifer J J amp Abramow I (1992) Development ofaccommodation and convergence in infancy Behavioural Brain Research 4933ndash50 Special Issue

Hess R amp Field D (1999) Integration of contours New insights Trends inCognitive Sciences 3(12) 480ndash486

Hofsten C Vishton P Spelke E Feng Q amp Rosander K (1998) Predictiveaction in infancy Tracking and reaching for moving objects Cognition 67(3)255ndash285

Hoyer P O amp Hyvarinen A (2002)A multilayer sparse coding network learnscontour coding from natural images Vision Research 42(12) 1593ndash1605

Isaac J Crair M Nicoll R amp Malenka R (1997) Silent synapses during de-velopment of thalamocortical inputs Neuron 18(2) 269ndash280

KatzL Camp Callaway E M (1992)Development of localcircuits in mammalianvisual cortex Annual Review of Neuroscience 15 31ndash56


Kellman P J Gleitmann H amp Spelke E (1987) Object and observer motionin the perception of objects by infants J Exp Psychol Hum Percept Perform13(4) 586ndash593

Kellman P J amp Short K (1987) Development of three-dimensional form per-ception J Exp Psychol Hum Percept Perform 13(4) 545ndash557

Kellman P J Spelke E amp Short K (1986) Infant perception of object unityfrom translatory motion in depth and vertical translation Child Development57(1) 72ndash86

Koffka K (1935) Principles of gestalt psychology London Lund HumphriesKovacs I (2000) Human development of perceptual organization Vision Re-

search 40(12) 1301ndash1310Kruger N (1998) Collinearity and parallelism are statistically signicant

second-order relations of complex cell responses Neural Processing Letters8 117ndash129

Leopold D amp Logothetis N (1998)Microsaccadesdifferentially modulate neu-ral activity in the striate and extrastriate visual cortex Experimental BrainResearch 123(3) 341ndash345

Li Z (1998)A neural model of contour integration in the primary visual cortexNeural Computation 10 903ndash940

Lowel S amp Singer W (1992) Selection of intrinsic horizontal connections inthe visual cortex by correlated neuronal activity Science 255 209ndash212

Malach R Amir Y Harel M amp Grinvald A (1993) Relationship betweenintrinsic connections and functional architecture revealed by optical imag-ing and in-vivo targeted biocytin injections in primate striate cortex PNAS90(22) 10469ndash10473

Marr D (1982) Vision A computational investigation into the human representationand processing of visual information San Francisco Freeman

McGuire B A Stevens J amp Sterling P (1986) Microcircuitry of beta ganglioncells in cat retina Journal of Neuroscience 6(4) 907ndash918

Morrone M Burr D amp Fiorentini A (1990) Development of contrast sensi-tivity and acuity of the infant colour system Proc R Soc Lond B Biol Sci242(1304) 134ndash139

Norcia A Tyler C amp Hamer R (1990) Development of contrast sensitivity inthe human infant Vision Research 30(10) 1475ndash1486

Olshausen B A amp Field D J (1996)Wavelet-like receptive elds emerge froma network that learns sparse codes for natural images Nature 381 607ndash609

Piaget J (1936)La naissance de lrsquointelligence chez lrsquoenfant New York InternationalUniversities Press

Potzsch M (1999) Object-contour statistics extracted from natural image sequencesUnpublished doctoral dissertation Ruhr-Universitat-Bochum Availableon-line ftp==ftpneuroinformatikruhr-uni-bochumde=pub=manuscripts=

theses=poetzschpdfReinagel P amp Zador A M (1999)Natural scene statistics at the center of gaze

Network Computation in Neural Systems 10 341ndash350Ross W Grossberg S amp Mingolla E (2000)Visual cortical mechanisms of per-

ceptual grouping interacting layers networks columns and maps NeuralNetworks 13(6) 571ndash588


Schmidt K Goebel R Lowel S amp Singer W (1997) The perceptualgrouping criterion of colinearity is reected by anisotropies of connectionsin the primary visual cortex European Journal of Neuroscience 5(9) 1083ndash1084

Simoncelli E amp Olshausen B (2001) Natural image statistics and neural rep-resentation Annual Review of Neuroscience 24 1193ndash1216

Skoczenski A amp Norcia A (1998) Neural noise limitations on infant visualsensitivity Nature 391(6668) 697ndash700

Snider C Dehay C Berland M Kennedy H amp Chalupa L (1999) Prena-tal development of retinogeniculate axons in the macaque monkey duringsegregation of binocular inputs Journal of Neuroscience 19(1) 220ndash228

Somers D C Nelson S B amp Sur M (1995)An emergent model of orientationselectivity in cat visual cortical simple cells Journal of Neuroscience 15(8)5448ndash5465

Spelke E Breinlinger K Jacobson K amp Phillips A (1993) Gestalt relationsand object perception A developmental study Perception 22 1483ndash1501

Sperling G (1970) Models of visual perception and contrast detection Percep-tion and Psychophysics 8 143ndash157

Streri A amp Spelke E (1989) Effects of motion and gural goodness on hapticobject perception in infancy Child Development 60(5) 1111ndash1125

Tootle J (1993)Early postnatal development of visual function in ganglion cellsof the cat retina Journal of Neurophysiology 69(5) 1645ndash1660

van Hateren J amp Ruderman D (1998) Independent component analysisof natural image sequences yields spatiotemporal lters similar to simplecells in primary visual cortex Proceedings of the Royal Society London B 2652315ndash2320

Wertheimer M (1923) Untersuchungen zur Lehre von der Gestalt II Psycholo-gische Forschung 4 301ndash350

Yen S amp Finkel L (1998) Extraction of perceptually salient contours by striatecortical networks Vision Research 38(5) 719ndash741

Yonas A Arterberry M E amp Granrud C E (1987) Four-month-old infantsrsquosensitivity to binocular and kinetic infomation for three-dimensional-objectshape Child Development 84(4) 910ndash917

Zetzsche C amp Krieger G (2001) Nonlinear mechanisms and higher-orderstatistics in biological vision and electronic image processing Review andperspectives Journal of Electronic Imaging 10(1) 56ndash99

Received September 13 2001 accepted January 28 2003


with such environmental changes that happen too fast for evolutionarychanges

The most prominent rules for visual perception are the Gestalt principlesformulated by Wertheimer (1923) and Koffka (1935) Their basic postulateis that a percept is more than the sum of the constituent parts in that theseparts are linked to form a coherent whole The Gestalt principles are rulesthat govern what should be linked together to form a percept and whatshould not The most important ones are proximity similarity closure goodcontinuation common fate good form and global precedence

Recent research in computer vision has begun to recognize Gestalt princi-ples as important for object selection and segmentation and they have beenapplied in numerous models of visual perception where they are usuallytaken for granted To our knowledge there are two models for their devel-opment during maturation of the visual system (Grossberg amp Williamson2001 Choe 2001) but we know of none that relies on short sequences of asingle natural scene as the basis for learning

We will present a detailed model of how the principles of collinearityand curvilinearity can be learned from real visual stimuli assuming thatthe principle of common fate actively organizes perception already at birthWe will employ three assumptions which are to varying degree coveredby experimental data

sup2 Gestalt principles are implemented by the connectivity of the visualcortex

sup2 There is a hierarchy of Gestalt principles in the sense that some prin-ciples are already active at birth and others are developed later andinuenced by visual experience This hierarchy is reected in the tem-poral order in which the various principles become observable duringdevelopment

sup2 The higher principles are learned from experience while the lowerones assist in structuring the data to be learned

Concretely we will review the literature to present evidence that the prin-ciples of collinearity and curvilinearity correspond to structured horizontalconnections between simple cells in V1 and that the principle of commonfate is more fundamental than collinearity and curvilinearity Then we de-scribe a neuronal model that shows that collinearity and curvilinearity canbe learned from observing moving objects by structuring horizontal con-nections with a Hebbian learning rule

The letter is organized as follows In section 11 we review recent psy-chophysical work on the course of development of various Gestalt princi-ples in early infancy In section 12 we focus on the role of common fateIn section 13 we look in depth at recent neurophysiological data aboutwhich biological pathways and computations are probably present at birthor shortly afterward and which are structured by experience The more











































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References














































































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References









































































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References




































































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References






























































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References


























































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References



















































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References















































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References











































































tr n curren gOFF


























3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References


























































































3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References














































































3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References









































































3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References






































































3 Results



















4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References





















































































4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References











































































4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References







































































4 Discussion



























rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References





























































































rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References

























































































rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References



















































































rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References














































































rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References






































































rt D ztpt with (A1)






























rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References


























































































rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References









































































rst D pnzn (A16)











iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References







































































iexcl k2x2 C y2

2frac34 2












j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References









































































j (B4)





wij Atilde wijK=X

Cwij (B7)










centers











pictures=gestalt)

Acknowledgments


References












































































centers











pictures=gestalt)

Acknowledgments


References














































































pictures=gestalt)

Acknowledgments


References








































































pictures=gestalt)

Acknowledgments


References

















































































































































































Documents

Learning the Gestalt Rule of Collinearity from Object Motion · Learning the Gestalt Rule of Collinearity from Object Motion 1867 technically minded reader may want to skip this introduction