Int. J. Man-Machine Studies (1975) 7:511-528

Perception of Depth Surfaces in Random-dot Stereograms: a Neural Model*

PARVATI DEV

Neurosciences Research Program, Boston, Massachusetts 02130, U.S.A.

(Received 7 November 1974)

A model has been presented of a neural process that segments the visual field into spatially disjoint regions, each region characterized by a specific feature such as a texture or color. The neural connectivity hypothesized to be necessary for the segmentation process has been formulated in mathematical terms and the corresponding neural network has been simulated on the digital computer. The properties of the network that result from the postulated patterns of excitatory and inhibitory connectivity have been investigated. It is shown that the required connectivity is that of excitatory connections only between neurons detecting similar features and inhibitory connections between all feature-detecting neurons. The resulting segmentation model is used to model the phenomenon of stereopsis as investigated through the use of random-dot stereograms. The process of depth perception through stereopsis can be viewed as a segmentation process with each segment, that is, each surface at a specific depth, characterized by a specific retinal disparity. It is shown that the segmentation model suffices to detect the different depth surfaces embedded in the random-dot patterns.

Visual perception is based on the detection and recognition of a set of features, such as contours, color and texture, that characterize the object or the scene. In general, there are many objects present in the visual field and, to recognize any one of these objects, the recognition procedure should be applied to the subset of visual features generated by the object. The method of selection of such a subset is a problem that remains a thorny one in fields ranging from neurophysiology to artificial intelligence. One possible solution is that a single feature or a small group of features acts as a cue to interact with long-term memory so as to generate a "hypothesis" concerning the identity of the object (Gregory, 1969; Arbib, 1972). Repeated comparisons between the input features and the continually updated hypothesis serve to select the subset of features that belong to the object. An alternative solution, which forms the content of this paper, is that a preliminary separation of the features into subsets may occur through a much simpler mechanism based on excitatory and inhibitory interaction between neighboring feature detectors.

*An earlier version of this paper was presented at the Conference on Biologically Motivated Automata Theory held at the MITRE Corporation, McLean, Va., 19-21 June 1974.

A glance around one's visual world suggests that a cue by which one differentiates between objects is the difference in some characteristic feature such as color or texture. For example, the green of the trees contrasts with the blue of the sky; the roughness of the wall contrasts with the smoothness of the desk. To actually recognize an object such as a desk, many features besides its smoothness must be detected. However, the region of the visual field characterized by the feature of smoothness contains the subset of visual features necessary for this recognition process. Therefore, a process that could separate or "segment" the visual field into regions, each characterized by a single feature or a set of features, would provide a method for separating into subsets the features that were generated by different objects.

In investigating the phenomenon of stereopsis, Julesz (1962, 1971) discusses a similar problem. The visual feature used for the stereoscopic perception of depth is retinal disparity.* For example, a planar surface perpendicular to an observer's direction of gaze is characterized by the same retinal disparity over its entire surface. Through the use of random-dot stereograms, Julesz could constrain the observer to detect first the retinal disparity at each location in a random pattern of dots and then to synthesize this disparity information so as to perceive planar or curved surfaces at different depths. Since there were no other cues for the perception of these surfaces, localized cues such as retinal disparity must have been used to segment the visual field into these surfaces.

*The images of the visual world projected onto the two retinas are superimposed and integrated at the level of the visual cortex. Those portions of the images that superpose exactly are said to have zero retinal disparity. If the image must be translated laterally in order to superpose exactly, the amount of translation necessary is the retinal disparity of that input. The retinal disparity of the images of an object is a measure of the depth of the object from the observer. Zero retinal disparity corresponds to the fixation surface, which includes the point fixated by the observer.

Therefore, the spatially global process of segmentation may be based on spatially local processes involving the detection of features at different locations in the incoming visual pattern. However, a mechanism is required for the interaction of these local processes to generate the global percept. Again, suggestions for such mechanisms have developed from the investigation of stereopsis. The first such suggestion is contained in the difference-field model by Julesz (1962). It emphasizes that local processes, such as the matching of dots, could result in many erroneous matches and that a process is required for the detection of regions of similar retinal disparity in the visual input. Another model, involving dipoles coupled by springs (Julesz, 1971), simulates many of the temporal and spatial phenomena observed in stereopsis. A qualitative model that represents more directly the interaction between neural disparity detectors has been developed by Nelson (1975). The model postulates specific excitatory and inhibitory patterns of connectivity between the disparity detectors that result in the perception of depth.

This author, investigating computer simulations of segmentation processes requiring the detection of regions of similar feature input, has arrived at a neural network formulation of the segmentation process similar to that suggested by Nelson for stereopsis (Dev, 1974). In this communication a mathematical formulation of this segmentation process is presented whereby a network of feature-detecting neurons with specified excitatory and inhibitory connectivity can detect regions of the visual field characterized by a specific feature. The segmentation can be based on features such as color, movement, texture and retinal disparity. The properties of the segmentation model are investigated through computer simulation with emphasis on the effects of the postulated excitatory and inhibitory pattern of connectivity. The model is then applied to the specific segmentation phenomenon occurring in stereopsis. It is shown that the postulated network is capable of extracting the depth information available in random-dot stereograms, of suppressing the noise generated by erroneous matches and of detecting the pattern of surfaces embedded in the stereogram.

Formulation of the Model

Each feature detector processes local spatial information since it only responds to input in that restricted region of the visual field which comprises its receptive field. However, clustering of features is a global property of the incoming spatial pattern and detection of such clustering implies interaction between feature detectors.

The neural representation of cognitive processes such as "detection" is not explored in this paper. Instead, it is hypothesized that the detection of a cluster of similar features requires the development of high activity in the feature detectors involved, a higher level of activity than would be generated in any isolated feature detector activated only by its corresponding feature. An important fact is that the feature detector neurons in visual cortex are arranged in a spatial array such that as the retinal image projects to the visual cortex, the spatial relations within the retinal image are maintained. Therefore, a cluster of features in the visual input activates a localized region in the cortical array of feature detectors.

What network of feature detectors will display localized regions of high activity when detecting the occurrence of a cluster of similar features? Two types of interaction are postulated to occur in the network, and their characteristics are discussed below. Similar postulates have been advanced by Nelson (1975) for the neural mechanism underlying stereopsis.

(1) Neurons at different locations but detecting the same feature are connected by mutual excitatory or facilitatory interaction. This interaction falls off with increasing distance between the neurons. A cluster of similar features activates a set of feature detectors that are close to each other and hence generate strong, mutual excitatory interaction leading to an increase in the activity of this set of feature detectors. When the features are scattered rather than clustered, the increased distance between the feature-detecting neurons reduces the strength of the excitatory interaction and leads to correspondingly less increase in their activity.

(2) All neurons, including those detecting different features, inhibit each other and, again, the inhibition falls off with distance. Such inhibitory interaction leads to the interesting effect of competition between neurons. At each location in the visual cortex, all the activated feature detectors compete with each other by inhibiting the activity of the others. This effect is particularly prominent at a location where a cluster of features occurs; the increased activity at this location suppresses the activity of neurons detecting other features at that location, thus making the cluster the main input detected in that region.

These two modes of interaction are represented below in a mathematical formulation. The mathematical formulation is then used for the computer simulations described in the following sections.

There are (2M+1) arrays of neurons. Within each array, all neurons detect the same feature such as a specific orientation of a contour, a specific retinal disparity or a specific color. Neurons in different arrays detect different features. Therefore (2M+1) features are detected, with neurons in the jth array detecting the jth feature.

The spatial location of the neuron in the array determines the region of the visual field sampled by the neuron. There is a topographic representation of the retina, and hence the visual field, on to each array such that spatial relationships present between points in the retinal image are maintained in each array of feature detectors.

The interaction between and within the (2M+1) feature detector arrays and a single inhibitory array can be represented by the following set of equations (for one-dimensional arrays):

$$E_j(t+1,x) = a_{ee}(x) * E_j(t,x) - a_{ie}(x) * I(t,x) + P_j(t,x), \tag{1}$$

where j = -M, -(M-1), ..., 0, 1, 2, ..., M, and

$$I(t+1,x) = a_{ei}(x) * \sum_{j=-M}^{M} E_j(t,x) - a_{ii}(x) * I(t,x) + Q(t,x). \tag{2}$$

The subscript j indexes the set of (2M+1) features that can be detected by these neurons, and indicates that each array is limited to the detection of a single type of feature. Ej(t,x) is the activity, at time t, of a neuron at location x and detecting the jth feature. I(t,x) is the activity of a neuron in the inhibitory array. Pj(t,x) and Q(t,x) are the inputs to the feature detector arrays and to the inhibitory array respectively. The subscript ei, for example, indicates a connection from an excitatory neuron to an inhibitory neuron. Appropriate interpretations can be applied to the other subscripts, ee, ie and ii. aee(x), aie(x), aei(x) and aii(x) are "spread functions" or spatial weighting factors corresponding to the axonal or dendritic spread of the neural feature detectors. aee(x) determines the nature of the excitatory interaction between neurons detecting similar features. The other three spread functions determine the inhibitory interactions within and between arrays. Note that all inhibition between feature-detecting neurons is assumed to occur through a common pool of inhibitory neurons. This assumption is made because of the postulate that all neurons inhibit all neighboring neurons without regard to the nature of the features they detect. The pool of inhibitory neurons inhibiting all feature-detecting neurons embodies the required non-specific inhibition. (The symbol * indicates the operation of convolution.)
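As an illustration of how equations (1) and (2) might be iterated on a computer, the following is a minimal Python sketch of one synchronous time step. It is not the original simulation program: the function names `circular_convolve` and `step`, the representation of a spread function as a dictionary from spatial offset to weight, and the wrap-around treatment of the array ends are all assumptions made for this example.

```python
def circular_convolve(kernel, signal):
    """Convolve signal with kernel (dict: offset -> weight), wrapping at the ends."""
    n = len(signal)
    return [sum(w * signal[(x - dx) % n] for dx, w in kernel.items())
            for x in range(n)]


def step(E, I, P, Q, a_ee, a_ie, a_ei, a_ii):
    """Advance equations (1) and (2) by one time step.

    E and P are lists of arrays, one per feature j; I and Q are single arrays.
    All names and the data layout are illustrative assumptions.
    """
    n = len(I)
    inhibition = circular_convolve(a_ie, I)            # a_ie * I, shared by every j
    new_E = []
    for E_j, P_j in zip(E, P):
        excitation = circular_convolve(a_ee, E_j)      # a_ee * E_j
        new_E.append([excitation[x] - inhibition[x] + P_j[x] for x in range(n)])
    summed = [sum(E_j[x] for E_j in E) for x in range(n)]   # sum over j of E_j
    drive = circular_convolve(a_ei, summed)            # a_ei * sum_j E_j
    self_inhibition = circular_convolve(a_ii, I)       # a_ii * I
    new_I = [drive[x] - self_inhibition[x] + Q[x] for x in range(n)]
    return new_E, new_I
```

Both updates read only the activities at time t, so the feature-detector arrays and the inhibitory array advance together, as the two equations require. An empty dictionary passed for `a_ii` corresponds to a spread function that is zero everywhere.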

Analysis of the Segmentation Model

In this section, the effect of the two postulated modes of connectivity on the activity in the feature detector arrays is explored.

EXCITATORY INTERACTION

What is the effect of excitatory interaction between neurons detecting similar features? This is investigated by varying the intensity and the spatial extent of the excitatory spread function aee(x) with all the other spread functions removed, that is, aei(x), aie(x) and aii(x) are equal to zero. A simple rectangular spatial form is assumed for the excitatory spread function:

$$a_{ee}(x) = \begin{cases} \dfrac{A_{ee}}{2X_{ee}+1}, & -X_{ee} \le x \le X_{ee}, \\ 0, & \text{elsewhere.} \end{cases} \tag{3}$$

Therefore,

$$\sum_{x=-X_{ee}}^{X_{ee}} a_{ee}(x) = A_{ee}. \tag{4}$$

One advantage of permitting interaction only between neurons detecting similar features is that each array of similar feature detectors is not connected to any other array. Therefore the properties of any one such array can be investigated and the results apply to all the other arrays. The equation describing the activity in one array is:

$$E(t+1,x) = a_{ee}(x) * E(t,x) + P(t,x). \tag{5}$$

Since excitatory interaction between neurons results in positive feedback, the possibility exists that the system may be unstable and that the activity may increase without limit. It can be easily shown that, for a spatially uniform input which is constant over time, the condition under which the system is stable is:

$$A_{ee} < 1. \tag{6}$$
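The stability condition can be checked directly in the spatially uniform case. The steps below are a sketch added for illustration; they assume zero initial activity, a constant uniform input P, and a non-negative Aee, and the symbol E* for the limiting activity is introduced here rather than taken from the original. For uniform activity, convolution with the spread function multiplies the activity by the sum in equation (4), so that

$$
\begin{aligned}
E(t+1) &= \Big(\sum_{x=-X_{ee}}^{X_{ee}} a_{ee}(x)\Big) E(t) + P = A_{ee}\,E(t) + P, \\
E(t) &= P \sum_{k=0}^{t-1} A_{ee}^{k} = P\,\frac{1 - A_{ee}^{t}}{1 - A_{ee}}, \\
E^{*} &= \lim_{t \to \infty} E(t) = \frac{P}{1 - A_{ee}} \quad \text{for } 0 \le A_{ee} < 1 .
\end{aligned}
$$

For Aee = 0.9 the limiting activity is ten times the input, whereas for Aee greater than or equal to 1 the sum grows without bound.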

The same condition can be shown to hold for spatially patterned inputs. The system described by equation (5) is simulated on the digital computer.

An array of 100 elements is used, that is, x ranges from 1 to 100. The simulated connectivity is described by equation (3). To satisfy stability conditions the value of Aee is chosen as 0.9. Xee varies from 0 to 7 for different runs of the simulated system. The level of activity E(t,x) is represented by the value of the element at location x in the simulated array and at discrete time t. (Note: to avoid boundary conditions, a wrap-around method is used, that is, location 1 in the array also corresponds to location 101.)

The major effect of excitatory interaction between neighboring neurons in an array is to spatially smooth any pattern of input activity. The input pattern to the neuron array [Fig. 1(a)] consists of a high-intensity input to the central region of the array and an input of lower intensity to the flanking regions of the array. This input causes an increase of activity in the array. The final stable pattern of activity in the array, for different values of Xee, is illustrated in Fig. 1(b). Increasing the value of Xee results in increased smoothing of the input pattern and a corresponding reduction in the peak value of the stable activity pattern.
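The smoothing effect can be reproduced with a short Python sketch of equation (5) using the rectangular spread function of equation (3) and wrap-around indexing. The input amplitudes (1000 in the central block, 200 in the flanks), the block position, the number of iterations and all function names are assumptions chosen for illustration; they are not the values used for Fig. 1.

```python
def circular_convolve(kernel, signal):
    """Convolve signal with kernel (dict: offset -> weight), wrapping at the ends."""
    n = len(signal)
    return [sum(w * signal[(x - dx) % n] for dx, w in kernel.items())
            for x in range(n)]


def box_kernel(total, half_width):
    """Rectangular spread function of equation (3): weights sum to `total`."""
    weight = total / (2 * half_width + 1)
    return {dx: weight for dx in range(-half_width, half_width + 1)}


def simulate_excitatory(P, A_ee=0.9, X_ee=2, steps=300):
    """Iterate equation (5) from zero activity with a constant input P."""
    kernel = box_kernel(A_ee, X_ee)
    E = [0.0] * len(P)
    for _ in range(steps):
        spread = circular_convolve(kernel, E)
        E = [spread[x] + P[x] for x in range(len(P))]
    return E


# Assumed block input: strong central region, weaker flanking regions.
P = [1000.0 if 40 <= x < 60 else 200.0 for x in range(100)]

# Peak of the final activity pattern for each spread of excitation.
peaks = {X: max(simulate_excitatory(P, X_ee=X)) for X in (0, 1, 2, 4, 7)}
for X, peak in peaks.items():
    print(f"X_ee = {X}: peak activity = {peak:.1f}")
```

With no spread of excitation each element simply amplifies its own input, so the peak settles at the input divided by (1 - Aee); wider spread functions redistribute activity from the block to the flanks and lower the peak.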

[Figure 1: panels (a) and (b); horizontal axis: location in array.]

FIG. 1. The spatial pattern of activity developed in a single feature-detector array receiving a block input when the only interaction between neurons is excitatory. The array is one-dimensional and contains 100 feature-detectors. (a) Spatial pattern of input to the array. (b) Final pattern of activity in array for different values of the spread of excitation Xee. (Aee = 0.9, Xee = 0, 1, 2, 4, 7.)

By varying the intensity of the input to the central region of the array, the relationship between the input and the level of stable activity developed is found to be monotonic and linear.

Excitatory interaction between neurons detecting similar features may itself be sufficient for the detection of segments in the input pattern. To test this possibility, the simulated system is activated by an input consisting of a random pattern of binary noise, corresponding to the presence of scattered features, as well as a single segment, that is, a cluster of features [Fig. 2(a)]. The final pattern of activity developed in the array is illustrated in Fig. 2(b) for different values of Xee.

Detection of a segment is defined as requiring the development of a region of high activity in the neural array at a location corresponding to the location of a cluster of similar features in the input pattern. If the output of the simulated array is passed through a threshold device, the regions of high activity can be detected. It is clear from Fig. 2(b) that a threshold device would permit the detection of a segment at locations 65 through 90 in all the examples illustrated since any spatial spread of excitatory activity results in maximum activity in that region of the array activated by a cluster of features. Therefore, the postulated excitatory interaction permits the detection of segments in the input.
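The thresholding step can be sketched in Python as follows. The sketch is illustrative only: a fixed sparse pattern (features at three locations in every ten) stands in for the random binary noise so that the result is reproducible, and the input amplitude, the threshold value of 8500 and all function names are assumptions rather than values taken from the original simulations.

```python
def circular_convolve(kernel, signal):
    """Convolve signal with kernel (dict: offset -> weight), wrapping at the ends."""
    n = len(signal)
    return [sum(w * signal[(x - dx) % n] for dx, w in kernel.items())
            for x in range(n)]


def box_kernel(total, half_width):
    """Rectangular spread function of equation (3): weights sum to `total`."""
    weight = total / (2 * half_width + 1)
    return {dx: weight for dx in range(-half_width, half_width + 1)}


def simulate_excitatory(P, A_ee=0.9, X_ee=2, steps=300):
    """Iterate equation (5) from zero activity with a constant input P."""
    kernel = box_kernel(A_ee, X_ee)
    E = [0.0] * len(P)
    for _ in range(steps):
        spread = circular_convolve(kernel, E)
        E = [spread[x] + P[x] for x in range(len(P))]
    return E


def scattered_plus_cluster(n=100, cluster=range(65, 91), amplitude=1000.0):
    """Binary input: a cluster of features plus a sparse scattered pattern.

    The scattered pattern is deterministic (an assumption for reproducibility),
    not the random noise used in the paper.
    """
    return [amplitude if (x in cluster or (7 * x) % 10 < 3) else 0.0
            for x in range(n)]


def threshold_detect(activity, threshold):
    """Return the locations whose activity exceeds the threshold."""
    return [x for x, level in enumerate(activity) if level > threshold]


P = scattered_plus_cluster()
E = simulate_excitatory(P, X_ee=2)
detected = threshold_detect(E, threshold=8500.0)   # threshold is an assumed value
print("locations above threshold:", detected)
```

The locations that survive the threshold lie inside the cluster, while the scattered features, which reinforce one another only weakly, remain below it. The edges of the cluster may fall below the threshold as well, which reflects the blurring produced by the spread of excitation.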

[Figure 2: panels (a) and (b); horizontal axis: location in array.]

FIG. 2. The spatial pattern of activity developed in a single feature-detector array when the only interaction between neurons is excitatory. (a) The input pattern contains features randomly scattered in the array. There is a cluster of features at locations 65 through 90. (b) Final pattern of activity in array for different values of Xee. (Aee = 0.9, Xee = 1, 2, 4, 7.)

However, spatial spread of excitatory interaction does not suffice to completely suppress activity in those regions of the array which receive scattered feature input, such as locations 30-65 in Fig. 2(b). Increasing values of Xee do reduce activity in regions receiving scattered feature inputs but they also blur the outline of the segment. The role of inhibitory interaction between neurons and its effect on the response of the array to scattered feature input is discussed next.


INHIBITORY INTERACTION

Excitatory interaction between neurons detecting similar features has been shown to suffice for the generation of segments. What role is played by inhibitory interaction within and between arrays?

[Figure 3: panels (a), (b) and (c); horizontal axis: location in array.]

FIG. 3. The spatial pattern of activity developed in a single feature-detector array when it interacts with an inhibitory array. (a) Spatial pattern of input to the array. (b) Final pattern of activity in the feature-detector array for different values of spread of inhibition and with no spread of excitation. (Aee = 0.9, Aei = Aie = 0.5, Xee = 0, Xei = Xie = 0, 1, 2, 4.) (c) Final pattern of activity in the feature-detector array with spread of both excitation and inhibition. (Aee = 0.9, Aei = Aie = 0.5, Xee = 2, Xei = Xie = 0, 2, 4.)

The interaction between a single feature detector array and a single inhibitory array is explored first. The input to the feature detector array is indicated in Fig. 3(a). In Fig. 3(b), the well-known effect of lateral inhibition can be seen. There is no spread of excitatory interaction (Xee = 0) and each neuron only excites itself. When there is no spread of inhibitory interaction either (Xei = Xie = 0), the spatial pattern of array activity is rectangular and is similar to the input pattern. With spread of inhibitory interaction, contrast enhancement occurs, that is, the change in intensity of array activity at the edges of the input block is greatly increased. Increasing the values of Xei and Xie increases the contrast enhancement.

However, if there is any spread of excitatory interaction, as illustrated in Fig. 3(c) (Xee = 2), the effect of contrast enhancement through lateral inhibition almost disappears for low values of Xei and Xie (up to Xei = Xie = 3). With greater lateral spread of inhibition (Xei = Xie = 4), some contrast enhancement occurs. The effect of such contrast enhancement would be to increase the steepness of the edges of the segment. However, such contrast enhancement may also increase the amplitude of any noise activity present.
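The lateral-inhibition effect for a single feature-detector array coupled to the inhibitory array can be sketched in Python as below. The sketch iterates equations (1) and (2) with aii(x) = 0 and Q = 0, uses the parameter values quoted for Fig. 3(b) (Aee = 0.9, Aei = Aie = 0.5, Xee = 0), and assumes a block input of amplitude 1000 at locations 40 through 59; the input, the iteration count and the function names are assumptions made for this example.

```python
def circular_convolve(kernel, signal):
    """Convolve signal with kernel (dict: offset -> weight), wrapping at the ends."""
    n = len(signal)
    return [sum(w * signal[(x - dx) % n] for dx, w in kernel.items())
            for x in range(n)]


def box_kernel(total, half_width):
    """Rectangular spread function: weights sum to `total` over offsets -X..X."""
    weight = total / (2 * half_width + 1)
    return {dx: weight for dx in range(-half_width, half_width + 1)}


def simulate_with_inhibition(P, A_ee=0.9, X_ee=0, A_ei=0.5, A_ie=0.5,
                             X_inh=2, steps=300):
    """One feature-detector array E coupled to one inhibitory array I.

    X_inh is used for both X_ei and X_ie; a_ii and Q are taken as zero.
    """
    n = len(P)
    k_ee = box_kernel(A_ee, X_ee)
    k_ei = box_kernel(A_ei, X_inh)
    k_ie = box_kernel(A_ie, X_inh)
    E = [0.0] * n
    I = [0.0] * n
    for _ in range(steps):
        excitation = circular_convolve(k_ee, E)
        inhibition = circular_convolve(k_ie, I)
        new_I = circular_convolve(k_ei, E)      # uses E at time t, as in equation (2)
        E = [excitation[x] - inhibition[x] + P[x] for x in range(n)]
        I = new_I
    return E


P = [1000.0 if 40 <= x < 60 else 0.0 for x in range(100)]   # assumed block input
flat = simulate_with_inhibition(P, X_inh=0)     # no spread of inhibition
sharp = simulate_with_inhibition(P, X_inh=2)    # inhibition spreads to neighbors
print("no spread:   edge", round(flat[40], 1), "center", round(flat[50], 1))
print("with spread: edge", round(sharp[40], 1), "center", round(sharp[50], 1))
```

Without spread of inhibition every location is processed independently and the output is a scaled copy of the rectangular input. With spread, neurons just inside the edge of the block receive less inhibition than those at its center, and neurons just outside receive inhibition without input, which produces the overshoot and undershoot characteristic of contrast enhancement.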

[Figure 4: panels (a) and (b); horizontal axis: location in array.]

FIG. 4. The spatial pattern of activity developed in a single feature-detector array which receives both noise and segment input and interacts with an inhibitory array. (a) Spatial pattern of input. (b) Final pattern of activity in the feature-detector array for different amounts of inhibitory spread. (For the first example, Aee = 0.9, Xee = 2, that is, there is no inhibition. For the remaining three examples, Aee = 0.9, Aei = Aie = 0.5, Xee = 2, Xei = Xie = 0, 2, 4.)


In Fig. 4, the input is again a random pattern of binary noise, corresponding to the presence of scattered features, as well as a single segment corresponding to a cluster of features [Fig. 4(a)]. The final pattern of activity developed in the array is illustrated in Fig. 4(b). The first example in Fig. 4(b) indicates the pattern of activity developed when no inhibition is present. The excitatory interaction smooths and reduces the noise leaving maximum activity in the segment region (locations 65 through 90). Introducing inhibitory interaction reduces the overall level of activity and enhances the spatial contrast but does not significantly alter the pattern of activity in the array.

If more than one feature detector array is considered, the interaction between these arrays via the inhibitory array becomes important. The presence of a segment in one feature detector array should cause high activity in the inhibitory array at that location and should produce a depression of activity at that location in other feature detector arrays. The effect of such interaction between two feature detector arrays is illustrated in Fig. 5.

[Figure 5: panels (a), (b), (c) and (d); horizontal axis: location in array.]

FIG. 5. The spatial patterns of activity in two feature-detector arrays which interact with a single inhibitory array. (a) Inputs to the two arrays contain both scattered and clustered feature input. (b) Final patterns of activity developed in the feature-detector arrays when no inhibitory array is present. (Aee = 0.9, Xee = 2.) (c) Final patterns of activity in the feature-detector arrays when there is no spatial spread of inhibition. (Aee = 0.9, Aei = Aie = 0.5, Xee = 2, Xei = Xie = 0.) (d) Final patterns of activity in the feature-detector arrays for large spatial spread of inhibition. (Aee = 0.9, Aei = Aie = 0.5, Xee = 2, Xei = Xie = 4.)


Both feature detector arrays receive as input a random pattern of binary noise corresponding to the presence of scattered features [Fig. 5(a)]. Array 1 receives segment input at locations 30-60. Array 2 receives segment input at locations 65-90. When only excitatory interaction within an array is present, and there is no inhibitory interaction between arrays [Fig. 5(b)], the segments are clearly visible. Introduction of inhibition radically alters the pattern of activity in the arrays [Fig. 5(c)]. At the locations containing the segments, high activity is still visible in the appropriate arrays. In array 1, there is a suppression of activity at locations 65 through 90 because of the high activity in array 2. Correspondingly, in array 2, there is suppression of activity at locations 30 through 60 because of high activity in array 1. However, in array 1, at locations 10 through 30, high activity has developed. The pattern of activity could correspond to the presence of two segments from locations 10 through 25 and locations 35 through 60, or to a single segment extending from locations 10 through 60. Evidently, the noise input to array 1 at locations 10 through 30 was sufficiently high compared to the input to array 2 that competition between the two arrays resulted in the generation of a segment at that location. Note that a large spread of inhibition had little effect on the patterns of activity generated [Fig. 5(d)].

The effect of inhibitory interaction is therefore a complex one. The presence of a segment in one array does reduce activity at corresponding locations in other arrays. Thus, the presence of a cluster of similar features does tend to suppress spurious activation of other feature detectors in these locations. When no well-defined segment input is present, segments still tend to form through a process of competition between the arrays.
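The competition between two feature-detector arrays through a shared inhibitory array can be sketched in Python as follows. The parameter values follow those quoted for Fig. 5(c) (Aee = 0.9, Aei = Aie = 0.5, Xee = 2, Xei = Xie = 0), with aii(x) and Q taken as zero. The inputs are assumptions: each array receives a fixed sparse pattern in place of random noise, with a cluster at locations 30 through 60 in array 1 and 65 through 90 in array 2, and the function names are illustrative.

```python
def circular_convolve(kernel, signal):
    """Convolve signal with kernel (dict: offset -> weight), wrapping at the ends."""
    n = len(signal)
    return [sum(w * signal[(x - dx) % n] for dx, w in kernel.items())
            for x in range(n)]


def box_kernel(total, half_width):
    """Rectangular spread function: weights sum to `total` over offsets -X..X."""
    weight = total / (2 * half_width + 1)
    return {dx: weight for dx in range(-half_width, half_width + 1)}


def simulate_two_arrays(P1, P2, A_ee=0.9, X_ee=2, A_inh=0.5, X_inh=0, steps=300):
    """Two feature-detector arrays sharing one inhibitory array.

    A_inh and X_inh are used for both the E-to-I and I-to-E connections.
    """
    n = len(P1)
    k_ee = box_kernel(A_ee, X_ee)
    k_inh = box_kernel(A_inh, X_inh)
    E1, E2, I = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(steps):
        inhibition = circular_convolve(k_inh, I)
        summed = [E1[x] + E2[x] for x in range(n)]
        new_I = circular_convolve(k_inh, summed)
        exc1 = circular_convolve(k_ee, E1)
        exc2 = circular_convolve(k_ee, E2)
        E1 = [exc1[x] - inhibition[x] + P1[x] for x in range(n)]
        E2 = [exc2[x] - inhibition[x] + P2[x] for x in range(n)]
        I = new_I
    return E1, E2


def make_input(cluster, phase, n=100, amplitude=1000.0):
    """Cluster of features plus a deterministic sparse pattern (assumed noise)."""
    return [amplitude if (x in cluster or (7 * x + phase) % 10 < 3) else 0.0
            for x in range(n)]


P1 = make_input(range(30, 61), phase=0)
P2 = make_input(range(65, 91), phase=5)
E1, E2 = simulate_two_arrays(P1, P2)                     # with inhibition
free1, free2 = simulate_two_arrays(P1, P2, A_inh=0.0)    # excitation only
print("array 1 at location 77:", round(free1[77], 1), "->", round(E1[77], 1))
print("array 2 at location 45:", round(free2[45], 1), "->", round(E2[45], 1))
```

In this sketch each array keeps high activity at its own cluster, while the scattered-feature activity it would otherwise show at the location of the other array's cluster is pushed below zero by the shared inhibition.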

Varying the parameter aii(x) was also investigated but was found to have little effect on the final spatial patterns of activity. The general effect was to increase the overall level of activity. Therefore, the value of the parameter was assumed to be zero in all the simulations.

In summary, the model described has the segmentation capability required of it. Where clusters of similar features are present, the excitatory interaction within each array generates high levels of activity at the locations corresponding to each cluster. The inhibitory interaction between the arrays causes high activity in one array to suppress activity in other arrays. Thus a segment characterized by one feature will suppress spurious activity in other feature detectors. Where no clear cluster of similar features is present in the input, the inhibitory interaction results in competition and the generation of a segment.

The introduction of appropriate nonlinearities could result in a segmentation process that would be more effective than the one described above. For example, a nonlinear neural response, such as a sigmoid relationship between the activity developed in a neuron and its input, would result in more clearly defined segments and greater suppression of noise in the final pattern of activity in the arrays. As demonstrated by Wilson & Cowan (1972), such a nonlinearity may also lead to hysteresis phenomena. No hysteresis is observed in the model described above. The effects of such nonlinearities have not been described here because the major purpose of this investigation was to determine the efficacy of systems with excitatory and inhibitory interaction in the detection of clustering of similar feature input.

Application of the Segmentation Model to the Detection of Depth in Random-dot Stereograms

As has been discussed earlier, the segmentation process may be based on a variety of features. One such feature is retinal disparity (see footnotes earlier). Surfaces at different depths from the observer are characterized by different retinal disparities and hence the detection of these surfaces constitutes a segmentation problem. The detection of such surfaces in random-dot stereograms has been selected for study here because this phenomenon represents a clear example of segmentation on the basis of a single type of feature, retinal disparity. Further, random-dot stereograms and disparity detectors appear to be far more amenable to investigation through computer simulation than most other visual segmentation processes, as will be obvious below.

[Figure 6: block diagram with labeled elements: left retina, right retina, disparity arrays, inhibitory array.]

FIG. 6. A block diagram of a neural network for the stereoscopic perception of depth. Each disparity array receives from both retinas, with the output from one retina being shifted laterally relative to the other as indicated in the arrows. The spread functions aee(x), aei(x), aie(x) and aii(x) describe the convergence and divergence that occur in both excitatory and inhibitory interaction.


Figure 6 outlines a block diagram of a neural network for the perception of depth surfaces. The two retinas receive slightly different views of the world. For example, each receives one pattern of a random-dot stereogram pair. Five different disparities are detected corresponding to lateral displacements of the retinal image up to two units to the left or to the right. Corresponding to the five disparities detected are five arrays of neurons. Within an array, all neurons detect the same disparity. The location of each neuron in the array corresponds to the locations within the retinas from which it receives input. That is, there is a one-to-one map from each retina to each disparity array, with spatial relationships being maintained. Within an array, there is excitatory interaction between the neurons as characterized by the excitatory spread function aee(x). Inhibitory interaction is mediated by a single inhibitory array which receives excitatory input from all the disparity arrays and inhibits all the disparity arrays as well as itself. Again, the interaction is between neurons at corresponding locations, with the spread of interaction being described by the spread functions aei(x), aie(x) and aii(x).

A random-dot stereogram, first developed by Julesz (1960), consists of a pair of patterns, each pattern composed of randomly positioned dots. The two patterns are identical to each other in some regions and differ in others. In the regions where they differ, the difference simply consists of a lateral shift of one pattern with respect to the other. Each eye receives one pattern of the stereogram pair, and the two patterns can be matched at any location in the brain where the two retinal inputs converge. Identical regions in the two patterns will match exactly when superposed, but the remaining regions must be shifted laterally before a match can be obtained. This lateral shift corresponds to the retinal disparity generated by any surface that is at a depth other than the depth at which the observer is fixated. Thus the region of the pattern that requires lateral shift is perceived as at a depth different from that of the other regions.

Neurons capable of detecting retinal disparity have been shown to exist in the visual cortex of the cat (Barlow, Blakemore & Pettigrew, 1967) and the monkey (Hubel & Wiesel, 1970). When the visual input is a random-dot stereogram, a disparity detector can be thought of as a neuron that matches dots from the two retinal images by responding maximally to the occurrence of a specific pair of dots. The mismatch in the locations of the two dots is the disparity for which that detector is tuned. Since a dot in one retinal image may be matched with any neighboring dot in the other retinal image, many disparity detectors besides those detecting the desired disparities are activated. This spurious activation of disparity detectors generates "noise" which may mask the presence of the desired disparities.


A computer-generated random-dot pattern is illustrated in Fig. 7(a). The two patterns are the inputs to the two eyes. The locations of high intensity in these patterns represent white dots, and those of low intensity represent black dots. Five sequences of 20 points each in the first pattern have been shifted laterally by different amounts to produce the second pattern. The shifts range from -2 to +2 pattern points. As is obvious, no segments are visible in the retinal inputs.
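The construction just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used for Fig. 7: the function name `make_stereogram`, the 0/1 coding of black and white dots, the sign convention for the shift, and the choice to fill uncovered positions with fresh random dots are all assumptions.

```python
import random

def make_stereogram(n_points=100, segment_len=20, shifts=(-2, -1, 0, 1, 2), seed=0):
    """Return (left, right) lists of 0/1 dots.

    Each consecutive run of `segment_len` points in `left` is copied into
    `right` displaced by the corresponding entry of `shifts`.
    """
    rng = random.Random(seed)
    left = [rng.randint(0, 1) for _ in range(n_points)]
    # Assumption: positions left uncovered by the shifted segments get new random dots.
    right = [rng.randint(0, 1) for _ in range(n_points)]
    for k, shift in enumerate(shifts):
        start = k * segment_len
        for x in range(start, start + segment_len):
            target = x + shift
            if 0 <= target < n_points:  # dots shifted past the border are dropped
                right[target] = left[x]
    return left, right

left, right = make_stereogram()
print("".join("#" if v else "." for v in left))
print("".join("#" if v else "." for v in right))
```

Printed side by side, the two rows look equally random, which is the point made in the text: the segments are not visible in either retinal input alone.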

[Figure 7: panels (a), (b) and (c); horizontal axis: location in array (0 to 100).]

FIG. 7. The segmentation process as applied to the detection of surfaces in random-dot stereograms. (a) The input to each eye is one pattern of a random-dot stereogram pair. The peaks represent white dots and the troughs black dots. (b) The input to each of the five disparity detector arrays as a result of the appropriate matching of outputs from the two retinas. (c) Final patterns of activity in the disparity-detector arrays. ($A_{ee} = 0.9$, $A_{ei} = A_{ie} = 0.5$, $X_{ee} = X_{ei} = X_{ie} = 2$.)


Each disparity detector array receives the output from both the retinas, with one retinal image being shifted relative to the other. This lateral shift is different for different disparity detector arrays and it ranges from -2 to +2. At each location in a disparity detector array, the modulo-2 sum of the input is obtained. A disparity detector at any location in the array is considered activated if the inputs from the two retinas are similar, that is, if both the inputs are of high intensity or both are of low intensity. The resulting activation of the disparity detectors is illustrated in Fig. 7(b). In each array, a cluster of features is clearly visible, indicating the presence in the visual world of a surface at the corresponding depth. However, this cluster cannot be detected by a simple detector such as one that detects all activity above a certain threshold. A transformation is required that will result in high activity in the array at the location of the cluster and reduced activity elsewhere. That is, a segmentation process is required.
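The matching step can be illustrated with a short Python sketch. It follows the rule stated above (form the modulo-2 sum of the two inputs and treat the detector as active when they agree), but the function name `detector_input`, the 60-point toy input with a single shifted region, the sign convention for disparity, and the treatment of border locations as inactive are assumptions made for illustration.

```python
import random

def detector_input(left, right, disparities=(-2, -1, 0, 1, 2)):
    """For each disparity d, mark the locations x where left[x] matches right[x + d]."""
    n = len(left)
    arrays = {}
    for d in disparities:
        row = []
        for x in range(n):
            if 0 <= x + d < n:
                mismatch = (left[x] + right[x + d]) % 2  # modulo-2 sum of the two inputs
                row.append(1 - mismatch)                 # active when the inputs agree
            else:
                row.append(0)                            # assumption: no partner dot at the border
        arrays[d] = row
    return arrays

# Toy input: 60 random dots; the middle 20 are shifted by +1 in the right pattern.
rng = random.Random(1)
left = [rng.randint(0, 1) for _ in range(60)]
right = [rng.randint(0, 1) for _ in range(60)]
for x in range(20, 40):
    right[x + 1] = left[x]

arrays = detector_input(left, right)
for d, row in arrays.items():
    print(f"{d:+d}", "".join("#" if v else "." for v in row))
```

In the printed rows, the +1 array is solidly active over locations 20 to 39, while every other array is active at roughly half its locations by chance. Since all activity is either 0 or 1, no single threshold separates the cluster from this spurious activation.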

The result of a segmentation process is shown in Fig. 7(c). These are the final spatial patterns of activity achieved by the disparity detector arrays.* In each array, activity at the location of the segment is higher than activity elsewhere in the array, thus permitting a threshold detector to detect the location of the segments. Also, the activity at locations not containing the segments is clearly reduced. Thus, a complex input, containing five separate clusters of similar features, can be processed by the segmentation model. The output will result in the perception of five surfaces, each at a different depth from the observer and all arranged in a series of increasing depth.
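The qualitative effect can be illustrated with a deliberately simplified Python sketch. It is not an implementation of the segmentation equations (1) and (2): it replaces the dynamics with a single pass in which each detector pools activity from its neighbours in the same array (excitation) and subtracts the average pooled activity of all arrays at the same location (inhibition). The box-shaped spread of radius 3, the names `pooled` and `segment`, and the toy input are all assumptions.

```python
import random

def pooled(row, x, radius):
    """Sum of activity within `radius` of x (a box-shaped spread function)."""
    lo, hi = max(0, x - radius), min(len(row), x + radius + 1)
    return sum(row[lo:hi])

def segment(arrays, radius=3):
    """One pass of within-array excitation minus inhibition shared across arrays."""
    n = len(next(iter(arrays.values())))
    excite = {d: [pooled(row, x, radius) for x in range(n)] for d, row in arrays.items()}
    # Inhibition at x: pooled activity averaged over all disparity arrays.
    inhibit = [sum(e[x] for e in excite.values()) / len(excite) for x in range(n)]
    return {d: [max(0.0, e[x] - inhibit[x]) for x in range(n)] for d, e in excite.items()}

# Toy detector input: chance activity everywhere, plus a solid cluster in the +1 array.
rng = random.Random(2)
arrays = {d: [rng.randint(0, 1) for _ in range(60)] for d in (-2, -1, 0, 1, 2)}
arrays[1][20:40] = [1] * 20

out = segment(arrays)
inside = sum(out[1][20:40]) / 20
outside = (sum(out[1][:20]) + sum(out[1][40:])) / 40
print(f"mean activity inside cluster {inside:.2f}, outside {outside:.2f}")
```

Even this crude stand-in raises activity at the cluster relative to the rest of the array, so that a threshold detector applied to the output, rather than to the raw detector input, could locate the segment.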

Discussion

A model has been presented of a neural process that segments the visual field into spatially disjoint regions, each region characterized by a specific feature. Such segmentation may generate regions characterized by different colors, different textures, etc. Investigation of such a model helps to point out its limitations and possible extensions.

So far, the simplistic assumption has been made that the visual field can be separated into regions each characterized by a specific feature. However, a gradation of features frequently exists: a sloping surface generates a gradation in retinal disparity; it may also display a gradation in texture; light striking a surface at an angle may result in a gradation of brightness along that surface and a corresponding gradation in color tone. For the detection of segments characterized by a gradation of features, a modification of the model is required.

*Note that the modulo-2 sum is applied to retinal inputs but does not appear in the segmentation equations (1) and (2). Each disparity detector can be considered as consisting of two parts. One part receives the retinal input and transmits the modulo-2 sum to the second part. The second part also receives input from other disparity detectors, and the changes in its activity are described by the segmentation equations.

One possible modification is to permit spread of the excitatory interaction to include excitation between neurons detecting similar but not quite the same feature. Then, as long as the gradation is sufficiently gentle, the segment will be detected. Another possible modification includes the introduction of neurons capable of detecting such gradations and of exciting other neurons detecting a similar gradation. No such detector of feature gradations has been observed neurophysiologically and the suggestion remains an interesting possibility. A third possibility is to invoke an active self-organizing process which biases the excitatory connectivity so as to adapt the network to each new visual input. For example, if the excitatory connectivity includes spread both in the spatial and the feature dimensions, the extent of the spread should be modifiable by the input. If a region is entirely characterized by a single feature, the excitatory connectivity between those feature detectors can be collapsed in the feature dimension and expanded in the spatial dimension. If, instead, the region contains a gradation of features, the converse modification in connectivity can occur. Such a self-organizing process would require a second segmentation stage, operating on connectivity between neurons in the first segmentation stage, and generating segments of similar connectivity.

Another simplistic assumption made in the model is that the relevant parameter during the segmentation process is the absolute level of neural activity. During the computer simulations, it was obvious that the rate of increase of neural activity also clearly defined the segments. The rate of change of activity may prove of greater importance, neurophysiologically, than the absolute level of neural activity since it is known that neurons are extremely sensitive to changes in their input but adapt rapidly to a sustained input.

One problem the model does not address is the maintenance of segments during eye movements. An eye movement results in a spatial shift of the entire visual input. Further, the retinal disparity of a surface may not only shift spatially but may also change if the eye movement involves a change in vergence. Saccadic eye movements occur at a mean interval of 200 msec and hence any segmentation process must be completed within that period if it is not to be modified by the subsequent retinal input. Therefore, after every eye movement, the segmentation process must be repeated. A system where the segments are maintained during eye movements would probably involve higher level processes either for the movement of segments across arrays or for the detection of a segment independent of its current spatial location. Since the hypothesis presented here views segmentation on the basis of feature clustering as a low-level process, it is preferable to assume that the segments are recreated after every eye movement.

It should be noted that random-dot stereograms present a particularly difficult segmentation problem. When viewing them, there is an initial delay before the depth surfaces are perceived but, once the surfaces are acquired, they are maintained in spite of eye movements. If, as suggested above, the segmentation process is repeated after every eye movement, the delay in the first segmentation and the rapidity of all subsequent segmentations imply the existence of a powerful cognitive component influencing the generation of all segments after the first. The initial delay may also be the result of the tremendous ambiguity of retinal disparity input, an ambiguity which is far greater than that presented by most scenes in the visual world.

This work was supported in part by NIH Grant No. 5 R01 NA 09755-03 COM awarded to M. A. Arbib, Computer and Information Sciences, University of Massachusetts, Amherst, Mass., and by the Neurosciences Research Program, Boston, Mass.

References

ARBIB, M. A. (1972). The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intelligence and Brain Theory. New York: Wiley-Interscience.

BARLOW, H. B., BLAKEMORE, C. & PETTIGREW, J. D. (1967). The neural mechanism of binocular depth discrimination. J. Physiol. (London), 193, 327.

DEV, P. (1974). Segmentation processes in visual perception: a co-operative neural model. University of Massachusetts, Amherst. COINS Technical Report 74C-5.

GREGORY, R. L. (1969). The Intelligent Eye. New York: McGraw-Hill.

HUBEL, D. H. & WIESEL, T. N. (1970). Stereoscopic vision in macaque monkey. Cells sensitive to binocular depth in area 18 of the macaque monkey cortex. Nature (London), 225, 41.

JULESZ, B. (1960). Binocular depth perception of computer-generated patterns. Bell System Tech. J., 39, 1125.

JULESZ, B. (1962). Towards the automation of binocular depth perception (AUTOMAP-1). Proceedings of the IFIPS Congress, Munich 1962. Ed. C. M. Popplewell. Amsterdam: North-Holland.

JULESZ, B. (1971). Foundations of Cyclopean Perception. Chicago: University of Chicago Press.

NELSON, J. I. (1975). Globality and stereoscopic fusion in binocular vision. J. Theor. Biol., 49, 1.

WILSON, H. R. & COWAN, J. D. (1972). Excitatory and inhibitory interactions in localised populations of model neurons. Biophys. J., 12, 1.