

A biologically-inspired sparse, topographic recurrent neural network model for robust change detection

Jogendra Nath Kundu1, Srinivas K.1, R. Venkatesh Babu1, Devarajan Sridharan2,3

1Department of Computational and Data Sciences, 2Department of Computer Science and Automation, and 3Center for Neuroscience, Indian Institute of Science, Bangalore, India

[email protected], [email protected]

Abstract

We present a recurrent neural network (RNN) model for efficient change detection, specifically for the challenging scenario in which changes occur in an interrupted sequence of visual images (“change blindness”). Motivated by known neuroanatomy and physiology of the superior colliculus, a midbrain structure involved in driving attention to changes, we designed a neurobiologically-constrained RNN model with sparse, topographic connectivity. The model developed robust encoding and maintenance strategies despite discontinuities in its input stream and efficiently detected changes in its stable mnemonic representation. Compared to conventional RNNs, our model learned more rapidly and achieved identical change-detection performance with far fewer connections. In addition, it reproduced key physiological characteristics of biological neurons, including greater sensitivity to dynamic versus static inputs, and stable encoding and robust change detection despite marked heterogeneity in population dynamics. A mechanistic analysis of our model provides insights into putative neurobiological mechanisms of change detection.

1 Change detection and the Superior Colliculus

We live in a rapidly changing world. For adaptive survival, our brains must possess the ability to discriminate and localize changes in our environment. In the laboratory, the ability to detect and localize changes is tested with the help of “change blindness” tasks (Fig. 1A), in which successive images that differ in one important detail are presented alternately with an intervening blank image (flicker paradigm; [11, 10]). In such tasks, subjects must scan each image, moving their eyes successively to different targets of interest, to identify the object of change. Selective attention is an important cognitive capacity that permits overcoming change blindness [9]: previous studies have shown that directing the subject’s attention toward the location of change, either overtly (by moving the eyes) or covertly, renders the change more easily detectable.

The superior colliculus (SC) is a midbrain structure that is highly conserved across vertebrate species and is critically involved in the control of eye movements (saccades) [4]. Recent studies show that the SC also plays an important role in spatial attention tasks involving change detection [5, 13, 3]. For instance, inactivating the SC produces deficits in detecting and reporting changes in motion-direction change detection tasks [6, 19]. Moreover, microstimulating the SC enhances animals’ ability to detect changes in a change blindness task [2].

Here we develop a recurrent neural network (RNN) model of change detection based on known neural circuit properties of the SC. Specifically, we introduce a biologically-inspired variant of a conventional RNN that we term a sparse, topographic recurrent neural network (st-RNN), and train the model to robustly detect changes. By manipulating (silencing or enhancing) key layers within the st-RNN, we qualitatively replicate experimental effects of manipulating (inactivating or microstimulating) the SC in change detection tasks. Finally, we show that our st-RNN model learns more quickly and computes changes more efficiently than a conventional (fully-connected) RNN model.

Submitted to NIPS 2017 Workshop on Cognitively Informed Artificial Intelligence, Long Beach, CA, USA.


[Figure 1: input layer (8×8) projecting to a hidden layer of excitatory (16×16) and inhibitory (8×8) neurons with topographic E-E, E-I, I-E and I-I recurrent connections, projecting to an output layer (8×8).]

Figure 1: Task and network configuration. A. Change blindness paradigm: image (250 ms) alternating with blank (250 ms), repeated. Blue circle: location of change. B. Schematic of the sparse, topographic recurrent neural network. Blue: E neurons and connections; orange: I neurons and connections. C. Weight matrix of the hidden layer. White: non-zero connections. (Below) Typical topographic connection.

2 A sparse, topographic recurrent neural network (st-RNN) model

Biologically-constrained model. The SC receives direct visual input from the retina and projects to cortical and sub-cortical regions involved in controlling attention and eye movements. It is organized into distinct layers comprising excitatory (E) and inhibitory (I) neural populations with recurrent connections. In addition, neurons in the SC are known to exhibit restricted spatial receptive fields and topographic connectivity, such that each neuron encodes only a small portion of visual space and connects predominantly to neighboring neurons that encode overlapping spatial regions [4].

We implemented these constraints in an RNN model. First, the model received input directly from a localized patch of the image, representing spatially localized visual input. Second, following Dale’s law, each presynaptic neuron in the model provided either excitatory or inhibitory inputs to all of its postsynaptic partners; recurrent connections were made among both E and I neurons (Fig. 1B). As the precise ratio of E:I neurons in the SC is not known, we adopted the canonical ratio of 4:1 observed in mammalian cortex [18]. Third, each neuron connected only to a small neighborhood of neurons, such that mutually connected neurons encoded overlapping regions of the image. This connectivity pattern ensured that the weight matrix for all layers was sparse (e.g. Fig. 1C) and connections were topographic (Fig. 1C, lower). These constraints may be described as: W_eff = SM(W) × CM(W) ⊙ ⌊W⌋₊, where SM(·) is a sign matrix whose elements determine E versus I connectivity among the different neurons, CM(·) is a mask matrix that determines the connections permissible under topographic connectivity constraints (Fig. 1C), × refers to matrix multiplication, ⊙ denotes element-wise multiplication, and ⌊·⌋₊ denotes rectification [12]. Finally, RNN dynamics were simulated as follows [16]:
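As a minimal NumPy sketch of this weight construction: the layer sizes follow Fig. 1B, the orientation (rows postsynaptic, columns presynaptic) is our assumption, and the random sparse mask stands in for a genuine topographic neighborhood mask.

```python
import numpy as np

rng = np.random.default_rng(0)

n_e, n_i = 256, 64               # 16x16 E neurons, 8x8 I neurons (4:1 ratio)
n = n_e + n_i

# Raw trainable weights (assumed convention: rows postsynaptic, cols presynaptic).
W = rng.normal(scale=0.1, size=(n, n))

# Sign matrix SM (Dale's law): each presynaptic neuron is purely E (+1) or I (-1).
SM = np.diag(np.concatenate([np.ones(n_e), -np.ones(n_i)]))

# Connectivity mask CM: 1 where topography permits a connection. A real model
# would derive this from spatial neighborhoods; a random 5%-density mask is a
# placeholder here.
CM = (rng.random((n, n)) < 0.05).astype(float)

# Effective weights: rectify, mask, then apply the presynaptic (column) signs.
W_eff = (CM * np.maximum(W, 0.0)) @ SM
```

Because rectification zeroes roughly half the raw weights before masking, the resulting matrix is even sparser than the mask itself, matching the sparse structure visible in Fig. 1C.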

s_t = f(W · r_{t−1} + U · x_t),   r_{t−1} = ⌊s_{t−1}⌋₊,   o_t = g(V · r_t + b_o)

where x_t, r_t and o_t are the activities of the input, recurrent-layer and output neurons at time t (respectively); s_t represents a latent variable that can be construed as the net input into the recurrent layer; U, V and W are the input-hidden, hidden-output and recurrent hidden-layer connectivity matrices; b_o is the output bias; f(·) is the hyperbolic tangent function; and g(·) is a sigmoid non-linearity. Simulations were performed with the TensorFlow [1] framework.
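For illustration, these update equations can be iterated in a few lines of NumPy; the function and argument names are ours, and the weight matrices are assumed to be the pre-built effective matrices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run_rnn(x_seq, W, U, V, b_o):
    """Iterate s_t = tanh(W r_{t-1} + U x_t), r_t = relu(s_t),
    o_t = sigmoid(V r_t + b_o) over an input sequence.

    x_seq: (T, n_in) array of inputs; returns a (T, n_out) array of outputs."""
    s = np.zeros(W.shape[0])                 # latent state, initialized at rest
    outputs = []
    for x_t in x_seq:
        s = np.tanh(W @ relu(s) + U @ x_t)   # relu(s) here is r_{t-1}
        outputs.append(sigmoid(V @ relu(s) + b_o))
    return np.array(outputs)
```

The sigmoid output keeps each output pixel in (0, 1), matching the binary-patch targets used during training.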

Training and testing. Two key operations are involved in successfully solving the change blindness task (Fig. 1A). First, the network must maintain a representation of the image information over the duration of the blank. Second, this information, stored in memory, must be compared against the subsequent image to detect the change. We achieved these two operations with two different st-RNNs operating in sequence (Fig. 2A). The first st-RNN – the “Mnemonic-coding” RNN – was trained to maintain the representation of the image over the duration of one (or several) blanks, and to update its output upon the presentation of the second image. The second st-RNN – the “Change-detection” RNN – compared the current and past outputs of the mnemonic-coding RNN to detect changes.

Each st-RNN was trained separately, with separate training datasets comprising ∼200,000 synthetic 8×8 binary patches. For each binary patch A, an alternate (change) patch A′ was generated by uniformly turning on or off randomly-sized contiguous patches of input pixels (up to a maximum patch size of 4×4). During training, the input to each st-RNN comprised a sequence of 10 images,


[Figure 2 omitted: input sequences, network outputs and mean activity traces.]

Figure 2: Change detection in the st-RNN model. A. Exemplar binary input patch sequence (top row), its maintenance over the duration of the blanks by the mnemonic-coding RNN (middle row) and the output of the change-detection RNN (bottom row). Dashed gray circle: location of change (offset of pixels). B. Mean activity of the output (change-detection RNN) to static (left), moving (middle) and looming (right) stimuli. Gray bar: duration of stimulus. C. Change blindness task involving grating orientation change detection (see text for details). Other conventions as in (A). D. Performance of the network after simulated microstimulation (top row) or inactivation (bottom row). Other conventions as in (C).

beginning with a binary patch (A), succeeded by a variable number of blanks (median = 2, range = 1–8), and finally followed by the changed binary image (A′). We employed a variable number of blanks to ensure that the RNN learned a general strategy of change detection that did not depend on the precise temporal interval between the original and changed image. The order of presentation of A and A′ was counterbalanced to ensure that the numbers of “onset” versus “offset” type changes were matched in the training dataset (e.g. Fig. 2A shows an offset-type change). Each RNN was trained by optimising the sum of the L2 loss at the output layer (actual versus expected output) for each timestep [8], using stochastic gradient descent over time with a minibatch size of 100.
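The sequence-generation procedure above can be sketched as follows; function names, the 50% pixel density and the uniform on/off choice are our reading of the text, not the authors' exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_change_pair(size=8, max_patch=4):
    """Binary patch A and a changed patch A' that differs in one randomly
    placed contiguous sub-patch (up to max_patch x max_patch), uniformly
    turned fully on or fully off."""
    A = (rng.random((size, size)) < 0.5).astype(float)
    while True:
        A2 = A.copy()
        h = int(rng.integers(1, max_patch + 1))
        w = int(rng.integers(1, max_patch + 1))
        r = int(rng.integers(0, size - h + 1))
        c = int(rng.integers(0, size - w + 1))
        A2[r:r + h, c:c + w] = float(rng.random() < 0.5)
        if not np.array_equal(A, A2):       # re-draw if nothing changed
            return A, A2

def make_sequence(n_frames=10):
    """Training sequence: A, a variable number of blanks (1-8), then A',
    padded with A' to a fixed length of n_frames."""
    A, A2 = make_change_pair()
    n_blanks = int(rng.integers(1, 9))
    frames = [A] + [np.zeros_like(A)] * n_blanks + [A2]
    frames += [A2] * (n_frames - len(frames))
    return np.stack(frames[:n_frames])
```

Counterbalancing onset versus offset changes would then amount to swapping the roles of A and A′ on half the generated sequences.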

3 Robust change detection in a mnemonic representation

Performance evaluation. The st-RNNs were trained until the change in the loss function plateaued (Fig. 3A), and subsequently tested on a different corpus of 40,000 validation images. As shown for an exemplar patch in Figure 2A, the mnemonic-coding RNN correctly encoded and robustly maintained the first image patch across the blanks (Fig. 2A, middle row). Similarly, the change-detection RNN successfully detected and localized the change across the two image patches (Fig. 2A, last row).

As a by-product of this training, the model’s output neurons exhibited higher sensitivity for dynamic versus static inputs. When a static image (4×4 square patch) was presented to the model for four successive frames, output neurons of the change-detection RNN fired briefly at the onset of the image (Fig. 2B, left). On the other hand, when the same image was presented at contiguous locations on four successive frames, to mimic a moving stimulus, neurons responded at nearly 4× higher rates (Fig. 2B, middle). Similarly, when the stimulus loomed in across the four frames (matching the average number of input pixels), the neurons again fired at high rates (Fig. 2B, right). These results resemble the stronger activation of SC neurons by moving or looming, as compared to static, stimuli [4, 17].

Next, we tested the model in a change detection task commonly used in behavioral psychophysics experiments [15, 14]. In this task, subjects are presented with 4 oriented gratings of various contrasts (Fig. 2C, top row). The gratings briefly disappear, and following reappearance one of the four gratings’ orientations changes. The subject must detect and localize the grating that changed in orientation. We simulated the same task input as a 1000 × 800 image, encoded by tiling 50,000 st-RNN modules with spatially overlapping input representations. The model successfully detected and localized the changed grating (Fig. 2C, circled lower-left grating). Nevertheless, the model also incorrectly localized a change in a second, low-contrast grating (Fig. 2C, upper right). We found that this incorrect localization resulted from a gradual deterioration in the maintenance of this grating’s


[Figure 3 omitted: training curves, mnemonic-subspace trajectories and single-unit activity traces.]

Figure 3: Model comparison and analysis. A. Mean square training error over iterations for the st-RNN (blue) and fully connected RNN (red), for the mnemonic-coding and change-detection operations. B. Stable representation in a mnemonic coding subspace, defined by the top two principal components (stimulus PC1, stimulus PC2) of mnemonic-coding RNN hidden-unit activity during the blanks (delay), for exemplar patterns. C. Dynamics of the representation in the mnemonic-coding RNN hidden layer (stimulus PC1, stimulus PC2 and time PC axes). Blue to yellow: early to late time points during the delay. D. Heterogeneous activity dynamics in exemplar E (left) and I (right) hidden units of the mnemonic-coding RNN for four different input patterns (insets).

representation in the mnemonic-coding RNN, which led to a spurious “change” being detected upon reappearance of the gratings.

We sought to rescue this deficit by simulating “microstimulation” in the model. Specifically, we potentiated the recurrent layer weights of the mnemonic-coding RNN by scaling all of its weights by a constant factor (1.3). This produced a remarkable recovery of the ability of the network to localize the change accurately to the lower-left grating (Fig. 2D, upper row). Conversely, simulating inactivation of the hidden layer, by setting the recurrent weights of the mnemonic-coding RNN to zero, completely eliminated the ability of the model to detect and localize the change (Fig. 2D, lower row). These results closely resemble the experimental effects of microstimulating or inactivating the SC on change detection behaviors [2, 6, 19].
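Both perturbations act only on the recurrent weight matrix, so they can be expressed as a single helper; the function name and interface are illustrative, not the authors' implementation.

```python
import numpy as np

def perturb_recurrent(W, mode, gain=1.3):
    """Simulate SC perturbations on the mnemonic-coding RNN's recurrent weights.

    mode='microstimulation': potentiate every recurrent weight by a constant gain.
    mode='inactivation': silence the recurrent layer by zeroing its weights."""
    if mode == "microstimulation":
        return gain * W
    if mode == "inactivation":
        return np.zeros_like(W)
    raise ValueError(f"unknown mode: {mode!r}")
```

Running the unmodified network with the perturbed matrix then reproduces the rescue (microstimulation) or loss (inactivation) of change localization described above.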

Analysis of mechanisms. First, we compared the advantages of the sparse, topographic connectivity in our st-RNN against full connectivity in a conventional RNN. We discovered that the fully-connected RNN took considerably (∼5×) longer to train, both for the mnemonic-coding and change-detection operations (Fig. 3A, red versus blue curves), without any noticeable improvement in performance after training. Moreover, the proportion of non-zero weights in the hidden layer of the st-RNN (2.9%) was far lower than in the fully connected RNN (45.7%), indicating that the st-RNN was able to compute changes much more efficiently.

Next, we analyzed how the model was able to maintain image information robustly during the blank. We performed principal components analysis (PCA) on the time-averaged activity, during the blank period, of hidden-layer units in the mnemonic-coding RNN, and plotted the trajectory of their activities in a “mnemonic subspace” [7], whose z-axis (time PC) represents a dimension that captures variance in activity over time (Fig. 3B-C). Our model showed hallmark characteristics of the “stable subspace model”, a recently proposed framework for stable maintenance of information in the brain [7]. These included stable coding in the mnemonic subspace, with a partially aligned input coding vector, and strong temporal dynamics along the time PC dimension (Fig. 3B-C). The least stable mnemonic coding occurred for outlier patterns whose statistics were not representative of the training dataset (Fig. 3B). Stable mnemonic coding occurred despite individual neurons exhibiting strongly heterogeneous activity dynamics (Fig. 3D), mimicking observations in various brain regions including the prefrontal cortex and SC. The change-detection RNN was then able to reliably detect changes occurring in this stable representation in the mnemonic-coding RNN.
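This subspace analysis can be sketched as follows. The split into stimulus PCs (PCA across patterns of time-averaged delay activity) and a time PC (PCA across time of pattern-averaged activity) is one plausible reading of the analysis in [7]; the function name and array layout are our own.

```python
import numpy as np

def mnemonic_subspace(hidden_acts):
    """Estimate stimulus and time axes of a mnemonic subspace.

    hidden_acts: (n_patterns, T_delay, n_hidden) hidden-unit activity
    recorded during the blank (delay) period.
    Returns (stim_pcs, time_pc): the top-2 stimulus PCs and a time PC."""
    # Stimulus subspace: PCA across patterns of time-averaged activity.
    mean_over_time = hidden_acts.mean(axis=1)              # (patterns, hidden)
    Xs = mean_over_time - mean_over_time.mean(axis=0)
    _, _, Vs = np.linalg.svd(Xs, full_matrices=False)
    stim_pcs = Vs[:2]                                      # (2, n_hidden)

    # Time axis: PCA across time of pattern-averaged activity.
    mean_over_patterns = hidden_acts.mean(axis=0)          # (T_delay, hidden)
    Xt = mean_over_patterns - mean_over_patterns.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xt, full_matrices=False)
    time_pc = Vt[0]                                        # (n_hidden,)
    return stim_pcs, time_pc
```

Projecting each pattern's delay-period trajectory onto (stim_pcs, time_pc) yields plots in the style of Fig. 3B-C: stable separation between patterns in the stimulus plane, with dynamics confined largely to the time axis.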

In summary, we developed a biologically plausible recurrent neural network model with sparse, topographic connectivity. The model developed a robust strategy for encoding and maintaining image information in the absence of external input, efficiently detected changes in this stable mnemonic representation, and outperformed conventional RNN models. Future work will seek to validate the model with human behavioral and neuroimaging experiments.


Acknowledgments. This research was supported by a CSIR Fellowship (JNK), a Robert Bosch Centre for Cyber Physical Systems masters (MTech) project award (SK), a Wellcome Trust DBT-India Alliance Intermediate Fellowship (DS), a SERB Early Career Research award (DS), a Pratiksha Trust Young Investigator award (DS), and a DBT-IISc Partnership program grant (DS).

References

[1] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. URL: https://www.tensorflow.org/.

[2] James Cavanaugh and Robert H Wurtz. “Subcortical modulation of attention counters change blindness”. In: Journal of Neuroscience 24.50 (2004), pp. 11236–11243.

[3] James P Herman and Richard J Krauzlis. “Color-Change Detection Activity in the Primate Superior Colliculus”. In: eNeuro 4.2 (2017), p. 0046.

[4] Eric I Knudsen. “Control from below: the role of a midbrain network in spatial attention”. In: European Journal of Neuroscience 33.11 (2011), pp. 1961–1972.

[5] Richard J Krauzlis, Lee P Lovejoy, and Alexandre Zénon. “Superior colliculus and visual spatial attention”. In: Annual Review of Neuroscience 36 (2013), pp. 165–182.

[6] Lee P Lovejoy and Richard J Krauzlis. “Inactivation of primate superior colliculus impairs covert selection of signals for perceptual judgments”. In: Nature Neuroscience 13.2 (2010), pp. 261–266.

[7] John D Murray et al. “Stable population coding for working memory coexists with heterogeneous neural dynamics in prefrontal cortex”. In: Proceedings of the National Academy of Sciences 114.2 (2016), pp. 394–399.

[8] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. “On the difficulty of training recurrent neural networks”. In: International Conference on Machine Learning. 2013, pp. 1310–1318.

[9] Ronald A Rensink, J Kevin O’Regan, and James J Clark. “To see or not to see: The need for attention to perceive changes in scenes”. In: Psychological Science 8.5 (1997), pp. 368–373.

[10] Daniel J Simons and Michael S Ambinder. “Change blindness: Theory and consequences”. In: Current Directions in Psychological Science 14.1 (2005), pp. 44–48.

[11] Daniel J Simons and Ronald A Rensink. “Change blindness: Past, present, and future”. In: Trends in Cognitive Sciences 9.1 (2005), pp. 16–20.

[12] H Francis Song, Guangyu R Yang, and Xiao-Jing Wang. “Training excitatory-inhibitory recurrent neural networks for cognitive tasks: A simple and flexible framework”. In: PLoS Computational Biology 12.2 (2016), e1004792.

[13] Devarajan Sridharan, Jason S Schwarz, and Eric I Knudsen. “Selective attention in birds”. In: Current Biology 24.11 (2014), R510–R513.

[14] Devarajan Sridharan et al. “Distinguishing bias from sensitivity effects in multialternative detection tasks”. In: Journal of Vision 14.9 (2014), pp. 16–16.

[15] Nicholas A Steinmetz and Tirin Moore. “Eye movement preparation modulates neuronal responses in area V4 when dissociated from attentional demands”. In: Neuron 83.2 (2014), pp. 496–506.

[16] David Sussillo and Omri Barak. “Opening the black box: low-dimensional dynamics in high-dimensional recurrent neural networks”. In: Neural Computation 25.3 (2013), pp. 626–649.

[17] Brian J White et al. “Superior colliculus neurons encode a visual saliency map during free viewing of natural dynamic video”. In: Nature Communications 8 (2017), p. 14263.

[18] Mingshan Xue, Bassam V Atallah, and Massimo Scanziani. “Equalizing excitation-inhibition ratios across visual cortical neurons”. In: Nature 511.7511 (2014), p. 596.

[19] Alexandre Zénon and Richard J Krauzlis. “Attention deficits without cortical neuronal deficits”. In: Nature 489.7416 (2012), p. 434.
