Neural Models of Visual Attention John K. Tsotsos Center for Vision Research York University, Toronto, Canada Marc Pomplun Department of Computer Science

Neural Models of Visual Attention

John K. Tsotsos

Center for Vision Research, York University, Toronto, Canada

Marc Pomplun

Department of Computer Science, University of Massachusetts at Boston

Theories/Models

Müller (1873), Exner (1894), Wundt (1902), Pillsbury (1908)
Broadbent 1958 (Early Selection)
Deutsch, Deutsch & Norman 1963/68 (Late Selection)
Treisman 1964
Milner 1974 *
Grossberg 1976+ (Adaptive Resonance Theory) *
Treisman & Gelade 1980 (Feature Integration Theory)
von der Malsburg 1981+ (Correlation Theory) *
Crick 1984 *
Koch and Ullman 1985
Anderson and Van Essen 1987 (Shifter Circuits) *
Sandon 1989 ‡
Wolfe et al. 1989+ (Guided Search 1.0, 2.0, 3.0)
Phaf, Van der Heijden, Hudson 1990 (SLAM)
Tsotsos et al. 1990+ (Selective Tuning) * ‡
Mozer 1991 (MORSEL)
Ahmad 1991 (VISIT) *
Olshausen, Anderson & Van Essen 1993 * ‡
Niebur, Koch et al. 1993+ *
Desimone & Duncan 1995 (Biased Competition) *
Postma 1995 (SCAN) * ‡
Schneider 1995 (VAM) *
LaBerge 1995 *
Itti & Koch 1998 ‡
Cave et al. 1999 (FeatureGate)

The number of models that address the neurobiology of visual attention is small (* in the list). The number that have real computational tests on actual images is even smaller (‡ in the list). However, many relevant ideas have appeared in psychological models.

A selected historical perspective on the ideas important to the modelling task appears in the following slides.

Issues

Models of visual attention need to include solutions to or exhibit observed neurobiological/psychophysical performance for:

- computational complexity of visual processes
- information routing through the processing hierarchy
- attentional control
- time course of attentive modulation
- single cell attentive modulation
- attentive modulation in (apparently) all visual areas
- suppressive surround effects
- serial/"parallel" visual search performance
- binding of features to objects

Format of Overview

Not all models are included, only those that have historical importance or that claim neuro-psycho relevance.

Due to space and time limits, each model is described only with:
1. key references
2. key ideas
3. neurobiological relationship (where possible)
   (√ has supporting evidence, X does not have supporting evidence, ? open question)

Note that this can only be regarded as a partial review!

Koch and Ullman 1985

Koch, C., Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry, Human Neurobiology 4, 219-227.

Key ideas:
- saliency map (Treisman's map) ?
- winner-take-all competition √ (Findlay 1996, Lee et al. 1999)
- WTA selects items to route to central representation X
- inhibition of return for shifts ?
- time to move attention is proportional to, or logarithmic in, the distance between stimuli X (Krose & Julesz 1989)
- no single cell modulations X
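The interplay of saliency map, winner-take-all selection and inhibition of return can be sketched in a few lines. This is only an illustrative sketch, not the circuit proposed by Koch and Ullman: the function names, the square inhibition neighbourhood, the `ior_radius` parameter and the example map are all assumptions made for the illustration.

```python
def winner_take_all(saliency):
    """Return (row, col) of the most salient location."""
    best = None
    for r, row in enumerate(saliency):
        for c, value in enumerate(row):
            if best is None or value > saliency[best[0]][best[1]]:
                best = (r, c)
    return best


def scan(saliency, n_shifts, ior_radius=1):
    """Shift attention n_shifts times using WTA plus inhibition of return."""
    s = [list(row) for row in saliency]  # work on a copy of the map
    visited = []
    for _ in range(n_shifts):
        r0, c0 = winner_take_all(s)
        visited.append((r0, c0))
        # Inhibition of return (assumed square neighbourhood): suppress the
        # winner and its surroundings so the next WTA must pick a new place.
        for r in range(max(0, r0 - ior_radius), min(len(s), r0 + ior_radius + 1)):
            for c in range(max(0, c0 - ior_radius), min(len(s[0]), c0 + ior_radius + 1)):
                s[r][c] = float("-inf")
    return visited


# Hypothetical 5x5 saliency map with three conspicuous locations.
saliency_map = [
    [0.1, 0.2, 0.1, 0.0, 0.3],
    [0.2, 0.9, 0.2, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0, 0.2],
    [0.5, 0.0, 0.0, 0.0, 0.1],
]
fixations = scan(saliency_map, n_shifts=3)
print(fixations)
```

Without the inhibition step the same location would win every time, which is why inhibition of return is what turns a single selection into a serial scan.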

Anderson and Van Essen 1987 Shifter Circuits

Anderson, C., Van Essen, D. (1987). Shifter Circuits: a computational strategy for dynamic aspects of visual processing, Proc. Natl. Academy Sci. USA 84: 6297-6301.

Key ideas:
- information routing is accomplished by simple shifting circuits starting in the LGN and input layers of primate visual area V1 X
- realignment is based on the preservation of spatial relationships
- stages linked by diverging excitatory inputs
- direction of shift by inhibitory neurons that selectively suppress sets of ascending inputs
- stages are grouped into small and large scale shifts
- control comes from pulvinar ?
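A rough one-dimensional sketch may help to picture how diverging excitatory inputs and selective inhibition combine into a shift. It assumes, purely for illustration, that every cell projects to exactly two targets per stage and that the step size doubles from stage to stage; the function names and the `steps` and `directions` parameters are hypothetical and not taken from the paper.

```python
def shift_stage(signal, step, direction):
    """One shifter stage.

    Every input cell excites two targets, at offsets -step and +step
    (diverging excitatory inputs). The inhibitory control suppresses one
    branch, so only the branch matching `direction` (-1 or +1) survives.
    Activity shifted past the border is lost.
    """
    n = len(signal)
    out = [0.0] * n
    for i, value in enumerate(signal):
        for branch in (-1, +1):
            if branch != direction:      # this set of ascending inputs is vetoed
                continue
            j = i + branch * step
            if 0 <= j < n:
                out[j] += value
    return out


def shifter_circuit(signal, directions, steps=(1, 2, 4)):
    """Chain the stages; small shifts first, large shifts last (assumed order)."""
    for step, direction in zip(steps, directions):
        signal = shift_stage(signal, step, direction)
    return signal


pattern = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0]
# Net shift: -1 + 2 - 4 = -3 positions.
aligned = shifter_circuit(pattern, directions=(-1, +1, -1))
print(aligned)
```

The pattern 1, 2, 3 arrives three positions to the left with its internal order intact, which is the sense in which the realignment preserves spatial relationships.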

Tsotsos 1990+ Selective Tuning Model

Tsotsos, J.K. (1990). Analyzing Vision at the Complexity Level, Behavioral and Brain Sciences 13-3, p423-445.
Tsotsos, J.K. (1993). An Inhibitory Beam for Attentional Selection, in Spatial Vision in Humans and Robots, ed. by L. Harris and M. Jenkin, p313-331, Cambridge University Press.
Tsotsos, J.K., Culhane, S., Wai, W., Lai, Y., Davis, N., Nuflo, F. (1995). Modeling visual attention via selective tuning, Artificial Intelligence 78(1-2), p507-547.
Tsotsos, J.K. (1995). Towards a Computational Model of Visual Attention, in Early Vision and Beyond, ed. by T. Papathomas, C. Chubb, A. Gorea, E. Kowler, MIT Press/Bradford Books, p207-218.
Tsotsos, J.K., Culhane, S., Cutzu, F., From Theoretical Foundations to a Hierarchical Circuit for Selective Attention, Visual Attention and Cortical Circuits, ed. by J. Braun, C. Koch & J. Davis, MIT Press (in press).

[Figure: neuron 'sees' this receptive field; subject 'attends' to single item]

Key ideas:
- attention modulates neurons to earliest levels; wherever there is a many-to-one mapping √
- signal interference controlled by surround inhibition throughout processing network
- task knowledge biases computations throughout processing network
- inhibition of connections not units √ Hernandez-Peon, Scherrer, Jouvet (1956)
- attentional control is local, distributed and internal
- competition is based on WTA (different form than previous models)
- pyramid representation with reciprocal convergence and divergence √ Salin & Bullier (1995)

The basic idea (BBS 1990)

[Figure labels: attentional spotlight; effective receptive field of selected unit in unattended case; layers of input abstraction hierarchy; inhibitory attentional beam; "pass" zone; "inhibit" zone; processing pyramid; inhibited pathways; pass pathways; unit of interest at top; input]

Not the same as von der Malsburg: only connections leading to interference are inhibited; other unattended ones are left alone.

√ Caputo & Guerra 1998; Bahcall & Kowler 1999; Vanduffel, Tootell, Orban 2000; Smith et al. 2000

√ Kastner, De Weerd, Desimone, Ungerleider 1998

Hierarchical Winner-Take-All

- top-down, coarse-to-fine WTA hierarchy for incremental selection and localization
- unselected connections are inhibited
- WTA achieved through local gating networks
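The coarse-to-fine selection can be illustrated with a toy one-dimensional pyramid. The sketch below is a simplification under stated assumptions, not the Selective Tuning circuit itself: units simply sum their inputs, each local gating network is reduced to a single `max`, and the names `build_pyramid`, `hierarchical_wta` and `fan_in` are invented for the example.

```python
def build_pyramid(inputs, fan_in=2):
    """Bottom-up pass: each unit sums `fan_in` units of the layer below."""
    layers = [list(inputs)]
    while len(layers[-1]) > 1:
        below = layers[-1]
        layers.append([sum(below[i:i + fan_in]) for i in range(0, len(below), fan_in)])
    return layers


def hierarchical_wta(layers, fan_in=2):
    """Top-down pass: at each layer, compete only among the inputs of the
    winner chosen one layer above; the losing connections are inhibited."""
    winner = 0                       # the single unit at the top of the pyramid
    path = [winner]
    inhibited = []                   # (layer, unit) whose upward connection is gated off
    for level in range(len(layers) - 2, -1, -1):
        lo = winner * fan_in
        candidates = range(lo, min(lo + fan_in, len(layers[level])))
        # Stand-in for the local gating network: pick the strongest input.
        winner = max(candidates, key=lambda i: layers[level][i])
        inhibited += [(level, i) for i in candidates if i != winner]
        path.append(winner)
    return path, inhibited


layers = build_pyramid([1, 0, 2, 1, 0, 5, 1, 1])
path, inhibited = hierarchical_wta(layers)
print(layers)     # coarse responses sit at the end of the list
print(path)       # winning unit per layer, from top to input
print(inhibited)  # connections suppressed inside the beam
```

Only the connections that compete with the selected pathway are listed as inhibited; units outside the winner's receptive field are never touched, which mirrors the point that unattended connections not causing interference are left alone.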

Simulation

Selection Circuits

[Figure: selection circuit spanning three adjacent layers of the hierarchy (layer -1, layer, layer +1). Legend: unit and connection in the interpretive network; unit and connection in the gating network; unit and connection in the top-down bias network. Unit labels B, U, I, G, g, b and M are indexed by layer and position.]

Search for Blue Regions

Predictions from 1990 paper:

- attention in all visual areas, down to earliest
- competition can be biased by task
- inhibition of unselected connections within beam
- inhibitory surround impairs perception around attended item
- distractor effects depend on distractor-target separation

Olshausen, Anderson & Van Essen 1993

Olshausen, B., et al. (1993). A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information, J. of Neuroscience, 13(11): 4700-4719.

Key ideas:
- implementation of shifter circuits
- forms position and scale invariant representations at the output layer X
- control neurons, originating in the pulvinar, dynamically modify synaptic weights of intracortical connections to achieve routing ?
- the topography of the selected portion of the visual field is preserved
- uses Koch & Ullman mechanism (luminance saliency only) for selection
- associative recognition at output layer

Olshausen seeks to achieve translation-rotation invariant recognition

only attended item reaches output layer
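As a minimal sketch of what dynamic routing means here, the control signal can be reduced to two numbers, a window centre and a scale, that set the connection weights between an input array and a small output array. The single routing stage, the 0/1 weights and the names `routing_weights`, `route`, `center` and `scale` are assumptions made for illustration; the model itself routes through several stages.

```python
def routing_weights(n_in, n_out, center, scale):
    """Weights set by the control signal: output unit j listens to the input
    unit at center + scale * (j - middle), so neighbours stay neighbours."""
    middle = (n_out - 1) / 2
    weights = [[0.0] * n_in for _ in range(n_out)]
    for j in range(n_out):
        i = round(center + scale * (j - middle))
        if 0 <= i < n_in:
            weights[j][i] = 1.0
    return weights


def route(inputs, weights):
    """Output layer activity: weighted sum of the inputs for each output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# The same pattern, small on the left and twice as large further right.
retina_small = [0, 7, 8, 9, 0, 0, 0, 0, 0, 0, 0, 0]
retina_large = [0, 0, 0, 7, 7, 8, 8, 9, 9, 0, 0, 0]

out_small = route(retina_small, routing_weights(12, 3, center=2, scale=1))
out_large = route(retina_large, routing_weights(12, 3, center=6, scale=2))
print(out_small, out_large)
```

Both inputs produce the same output pattern, and everything outside the selected window has zero weight, so only the attended item reaches the output layer.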

Itti 1998

Itti, L., Koch, C., Niebur, E. (1998). A model for saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Analysis and Machine Intelligence 20, 1254-1259.

Key ideas:
- a newer implementation of Koch and Ullman's scheme
- fast and parallel pre-attentive extraction of visual features across 50 spatial maps (for orientation, intensity and color, at six spatial scales)
- features are computed using linear filtering and center-surround structures
- these features form a saliency map ?
- Winner-Take-All neural network to select the most conspicuous image location
- inhibition-of-return mechanism to generate attentional shifts
- saliency map topographically encodes for the local conspicuity in the visual scene, and controls where the focus of attention is currently deployed
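The step from center-surround features to a saliency map can be sketched for a single feature. This is a heavily simplified illustration and not the published implementation: it uses intensity only, replaces the linear filters by box averages, combines maps by plain summation, and the names `box_mean`, `center_surround`, `saliency` and the default `scale_pairs` are assumptions.

```python
def box_mean(image, r, c, radius):
    """Mean intensity in a square window, clipped at the image border."""
    rows = range(max(0, r - radius), min(len(image), r + radius + 1))
    cols = range(max(0, c - radius), min(len(image[0]), c + radius + 1))
    values = [image[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def center_surround(image, center_radius, surround_radius):
    """Feature map: absolute difference between a fine and a coarse average."""
    return [[abs(box_mean(image, r, c, center_radius)
                 - box_mean(image, r, c, surround_radius))
             for c in range(len(image[0]))]
            for r in range(len(image))]


def saliency(image, scale_pairs=((0, 1), (0, 2), (1, 2))):
    """Sum the center-surround maps over the scale pairs and rescale to [0, 1]."""
    maps = [center_surround(image, c, s) for c, s in scale_pairs]
    total = [[sum(m[r][c] for m in maps) for c in range(len(image[0]))]
             for r in range(len(image))]
    peak = max(max(row) for row in total) or 1.0   # avoid dividing by zero
    return [[v / peak for v in row] for row in total]


# Hypothetical 7x7 image: dark background with one bright pixel.
image = [[0.0] * 7 for _ in range(7)]
image[3][4] = 1.0
smap = saliency(image)
most_salient = max(((r, c) for r in range(7) for c in range(7)),
                   key=lambda rc: smap[rc[0]][rc[1]])
print(most_salient)
```

The resulting map is topographic: the location that differs most from its surround gets the highest value, and a winner-take-all stage with inhibition of return could then operate on it.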

Conclusions

Several ideas have endured:

- Winner-Take-All for selection (competition)
- Hierarchies
- Inhibition of return to force serial search
- Some kind of 'gating' process
- Inhibitory surrounds

However, modeling seems to be still in its early days.

Progress will depend on whether modelers and experimenters can work together.
