Studying Visual Attention with the Visual Search Paradigm Marc Pomplun

Department of Computer Science

University of Massachusetts at Boston

E-mail: [email protected]

http://www.cs.umb.edu/~marc/

Overview:

• The Feature Integration Theory

• Visual Search

• The Guided Search Theory

• The Area Activation Model

The Binding Problem

• Different features of the visual scene are coded by separate systems
– e.g., direction of motion, location, color, and orientation

• How do we know this?
– Anatomical & neurophysiological evidence
– Brain imaging (fMRI & PET)

• So how do we experience a coherent world?

Feature Integration Theory (Treisman et al.)

• Attention is used to bind features together

• Code one object at a time on the basis of its location

• Bind together whatever features are attended at that location

Feature Integration Theory

• Sensory “features” (color, size, orientation, etc.) are coded in parallel by specialized modules

• Modules form two kinds of “maps”
– Feature maps (e.g., color maps, orientation maps, etc.)
– A master map of locations


• Feature maps contain two kinds of information:

- presence of a feature anywhere in the field (“there’s something red out there”)

- implicit spatial information about the feature

• Activity in the feature maps can tell us which features are contained in the visual scene.

• It cannot tell us which other features a given item (e.g., the “green blob”) has.

• The master map codes the location of features.
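The division of labor between feature maps and the master map can be illustrated with a minimal Python sketch. The display format (tuples of location, color, and shape) and the function names `feature_map_activity` and `master_map` are assumptions made for illustration only; they are not part of FIT.

```python
from collections import Counter

# Hypothetical display: (location, color, shape) for each item.
DISPLAY = [
    ((0, 0), "red", "X"),
    ((1, 0), "green", "T"),
    ((2, 1), "green", "X"),
]

def feature_map_activity(display):
    """Pool activity per feature value across the whole field, ignoring location."""
    activity = Counter()
    for _location, color, shape in display:
        activity[("color", color)] += 1
        activity[("shape", shape)] += 1
    return activity

def master_map(display):
    """Record where items are, without saying what they are."""
    return {location for location, _color, _shape in display}

activity = feature_map_activity(DISPLAY)

# "There's something red out there" and "there's something T-shaped out there":
print(activity[("color", "red")] > 0)
print(activity[("shape", "T")] > 0)

# Both checks succeed although no item in DISPLAY is a red T: pooled feature
# activity does not reveal which features belong to the same object.
print(sorted(master_map(DISPLAY)))
```

In this sketch, the presence checks never consult a location, which is why they cannot distinguish a red T from a red X next to a green T.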

Feature Integration Theory

The basic idea of the FIT is that visual attention is used for

• Locating features

• Binding appropriate features together

There are two stages of object perception:

• Preattentive stage: Individual features are extracted in parallel across the whole visual scene.

• Attentive stage: When attention is directed to a location, the local features are combined to form a whole.


• Attention moves within the location map

• Focus of attention selects whatever features are linked to that location

• Features of other objects are excluded

• Attended features are then entered into the current temporary object representation
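The attentive stage described above can be sketched in a few lines of Python. The dictionary `LOCATION_LINKS` and the functions `attend` and `serial_scan` are hypothetical names chosen for this illustration, and the fixed scanning order is a simplifying assumption.

```python
# Hypothetical master map: location -> features linked to that location.
LOCATION_LINKS = {
    (0, 0): {"color": "red", "shape": "X"},
    (1, 0): {"color": "green", "shape": "T"},
    (2, 1): {"color": "red", "shape": "T"},
}

def attend(location, location_links):
    """Bind the features linked to the attended location into one
    temporary object representation; other objects are excluded."""
    return {"location": location, **location_links[location]}

def serial_scan(target, location_links):
    """Move attention from location to location until the bound object
    matches the target. Returns (object or None, locations attended)."""
    for n_attended, location in enumerate(location_links, start=1):
        obj = attend(location, location_links)
        if all(obj[dim] == value for dim, value in target.items()):
            return obj, n_attended
    return None, len(location_links)

found, n_attended = serial_scan({"color": "red", "shape": "T"}, LOCATION_LINKS)
print(found, n_attended)
```

Only the object returned by `attend` carries a color and a shape together, so the conjunction can be checked for one location at a time.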


Empirical evidence for the FIT has been obtained through

• Visual search tasks

• Illusory conjunctions

We will focus on the paradigm of visual search.

Visual Search

Feature Search

• Is there a red T in the display?

[Figure: search display of T’s]

• Target defined by a single feature

• According to FIT, this should not demand attention

• Target should “pop out”

Conjunction Search

• Is there a red T in the display?

[Figure: search display of T’s and X’s]

• Target is now defined by its shape and color

• This involves binding features and so should demand attention

• Need to attend to each item until target is found

Feature Search

Changing the number of distractors:

[Figure: feature search displays of T’s with different numbers of distractors]

Conjunction Search

Changing the number of distractors:

[Figure: conjunction search displays of T’s and X’s with different numbers of distractors]

Visual Search Experiments

• Record time taken to determine whether target is present or not

• Vary the number of distractors

• Search time for feature targets should be independent of the number of distractors

• Conjunction search should get slower with more distractors
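These two predictions can be written down as a small Python sketch. It assumes a serial, self-terminating scan for conjunction search (on average about half of the items are checked before the target is found), which is a common reading of FIT rather than something stated on the slide. The parameters `base_ms` and `ms_per_item` are made-up illustrative values.

```python
def expected_search_time(display_size, search_type, target_present=True,
                         base_ms=400.0, ms_per_item=50.0):
    """Expected search time under a simple two-stage reading of FIT.

    base_ms and ms_per_item are hypothetical parameters, not measurements.
    """
    if search_type == "feature":
        # Parallel, preattentive detection: independent of display size.
        return base_ms
    if search_type == "conjunction":
        # Serial, self-terminating scan: on average (N + 1) / 2 items are
        # checked when the target is present, all N items when it is absent.
        if target_present:
            items_checked = (display_size + 1) / 2
        else:
            items_checked = display_size
        return base_ms + ms_per_item * items_checked
    raise ValueError(f"unknown search type: {search_type}")

for n in (1, 5, 15, 30):
    print(n,
          expected_search_time(n, "feature"),
          expected_search_time(n, "conjunction"))
```

Under these assumptions, the feature prediction stays constant while the conjunction prediction grows linearly with display size.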

Visual Search

[Figure: search time (axis range 0 to 3000) as a function of display size (1, 5, 15, 30) for feature targets and conjunction targets]

• Conjunction targets demand serial search: significant slope

• Feature targets pop out: flat display size function
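The slope of the display size function is typically estimated with a straight-line fit. The following sketch uses ordinary least squares; the function name `search_slope` is an assumption, and the search times are made-up numbers chosen for illustration, not data from the lecture.

```python
def search_slope(display_sizes, mean_times):
    """Least-squares slope and intercept of search time over display size."""
    n = len(display_sizes)
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_times) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(display_sizes, mean_times))
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

sizes = [1, 5, 15, 30]

# Made-up mean search times in ms (NOT data from the lecture):
feature_times = [450, 455, 448, 452]
conjunction_times = [500, 700, 1200, 1950]

feature_slope, _ = search_slope(sizes, feature_times)
conjunction_slope, _ = search_slope(sizes, conjunction_times)
print(f"feature: {feature_slope:.2f} ms/item")
print(f"conjunction: {conjunction_slope:.2f} ms/item")
```

A slope near zero corresponds to pop-out, whereas a clearly positive slope is the signature of serial search.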

Problem with FIT: Pop-Out of Conjunction Targets

• A moving X pops out of a display of moving O’s and static X’s

[Figure: search display of O’s and X’s]

• Target is defined by a conjunction of movement and form

• At least some conjunctions do not require focal attention

Guided Search Theory

The Guided Search Theory (GST) is similar to the FIT in that it also assumes two successive stages of visual search performance:

• a preattentive, parallel stage

• an attentive, serial stage

However, the main difference from FIT is that GST assumes that the preattentive stage yields spatial saliency information, which is used to guide attention in the serial stage.


According to GST, saliency is encoded in an additional map, called the saliency map. The saliency map is created during the preattentive stage and can combine multiple features if necessary.

In the subsequent serial search process, attention is first directed to the highest “peak” in the saliency map, then to the second-highest, and so on. This visual guidance allows efficient search even for some conjunction targets.
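The guidance mechanism can be sketched as follows. This is a minimal illustration under simplifying assumptions, not an implementation of GST: saliency is taken to be a weighted count of the features an item shares with the target, and the names `saliency`, `guided_search`, and `weights` are hypothetical.

```python
def saliency(item, target, weights):
    """Activation of one item: weighted count of features shared with the target."""
    return sum(w for dim, w in weights.items() if item[dim] == target[dim])

def guided_search(items, target, weights):
    """Visit items in order of decreasing saliency until the target is found.

    Returns the indices of the visited items and the number of items attended.
    """
    order = sorted(range(len(items)),
                   key=lambda i: saliency(items[i], target, weights),
                   reverse=True)
    for n_attended, i in enumerate(order, start=1):
        if all(items[i][dim] == target[dim] for dim in target):
            return order[:n_attended], n_attended
    return order, len(order)

# A moving X among moving O's and static X's.
items = [
    {"motion": "moving", "form": "O"},
    {"motion": "static", "form": "X"},
    {"motion": "moving", "form": "O"},
    {"motion": "moving", "form": "X"},   # the target
    {"motion": "static", "form": "X"},
]
target = {"motion": "moving", "form": "X"}
visited, n_attended = guided_search(items, target, {"motion": 1.0, "form": 1.0})
print(visited, n_attended)
```

Because the target shares both features with itself, it forms the highest peak and is attended first, which is how guidance can make a conjunction search efficient.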


Support for the GST comes from eye-movement research. Eye-movement recording allows researchers to determine the items that a subject looks at during visual search.



In the previous example,

• 80% of fixations were closest to an item sharing color with the target,

• 20% of fixations were closest to an item sharing orientation with the target.

It seems that the color dimension is guiding the subject’s visual search process.

Of course, due to imprecision of eye movements and their measurement, better statistics are necessary to determine the guiding dimension.
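Percentages of this kind can be computed by assigning each fixation to its nearest item. The sketch below is one simple way to do this; the function name `saccadic_selectivity`, the item format, and all coordinates are assumptions, and the fixations are made up so that the result mirrors the 80%/20% split above.

```python
import math

def saccadic_selectivity(fixations, items, target, dimensions):
    """For each dimension, the share of fixations whose nearest item
    shares that feature with the target."""
    counts = {dim: 0 for dim in dimensions}
    for fx, fy in fixations:
        nearest = min(items,
                      key=lambda it: math.hypot(it["x"] - fx, it["y"] - fy))
        for dim in dimensions:
            if nearest[dim] == target[dim]:
                counts[dim] += 1
    return {dim: counts[dim] / len(fixations) for dim in dimensions}

# Hypothetical target and two distractors, each sharing one feature with it.
target = {"color": "red", "orientation": "vertical"}
items = [
    {"x": 0, "y": 0, "color": "red", "orientation": "horizontal"},
    {"x": 10, "y": 0, "color": "green", "orientation": "vertical"},
]

# Made-up fixation positions (NOT recorded data).
fixations = [(1, 1), (2, -1), (0, 3), (-1, 0), (9, 1)]

selectivity = saccadic_selectivity(fixations, items, target,
                                   ["color", "orientation"])
print(selectivity)
```

The nearest-item rule ignores measurement error, which is one reason why larger samples are needed before naming a guiding dimension.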


In visual search tasks, subjects are usually guided by one target feature or a combination of target features. This supports the idea of GST that preattentively derived information from multiple dimensions guides and thereby facilitates the subsequent serial search process.

Guided Search Theory

There are two problems with GST:

• According to GST, grouping the guiding distractors should result in reduced guidance (less bottom-up activation). However, the opposite happens.

• There is no quantitative implementation of a Guided Search model that could predict guidance, i.e., saccadic selectivity for a given search task.

To overcome these problems, we proposed the Area Activation Model of saccadic selectivity in visual search tasks.

Area Activation

Assumptions:

• Processing resources during a fixation are distributed like a two-dimensional Gaussian function centered at fixation.

• Fixation positions are chosen to allow a maximum of information processing according to the assumed processing resources.

• Scan paths are chosen in such a way that they connect the optimal fixation positions with minimal eye-movement cost (path length).
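The three assumptions can be combined into a rough Python sketch. This is one possible reading and not the published model: activation is taken to be the sum of Gaussians centered on the guiding items, fixation positions are the strongest local peaks on an integer grid, and the scan path is ordered with a greedy nearest-neighbor heuristic instead of a true minimum-length path. All names, the grid size, and `sigma` are assumptions.

```python
import math

def activation(x, y, guiding_items, sigma):
    """Summed Gaussian contribution of all guiding items at position (x, y)."""
    return sum(math.exp(-((x - ix) ** 2 + (y - iy) ** 2) / (2 * sigma ** 2))
               for ix, iy in guiding_items)

def fixation_candidates(guiding_items, sigma, width, height, n_fixations):
    """Return the n_fixations grid positions with the strongest local peaks."""
    grid = {(x, y): activation(x, y, guiding_items, sigma)
            for x in range(width) for y in range(height)}
    peaks = []
    for (x, y), value in grid.items():
        neighbours = [grid.get((x + dx, y + dy), 0.0)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        if all(value > n for n in neighbours):
            peaks.append(((x, y), value))
    peaks.sort(key=lambda p: p[1], reverse=True)
    return [pos for pos, _ in peaks[:n_fixations]]

def greedy_scan_path(start, positions):
    """Order positions by repeatedly moving to the nearest unvisited one
    (a heuristic for a short path, not a guaranteed minimum)."""
    path, current, remaining = [], start, list(positions)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nxt)
        path.append(nxt)
        current = nxt
    return path

# Hypothetical positions of guiding items: a cluster of three and one isolated item.
guiding_items = [(2, 2), (3, 2), (2, 3), (12, 10)]
candidates = fixation_candidates(guiding_items, sigma=1.5,
                                 width=16, height=14, n_fixations=2)
path = greedy_scan_path((8, 6), candidates)
print(candidates, path)
```

Note that `n_fixations` has to be supplied by the caller; the sketch has no way of deriving the number of fixations on its own.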

Area Activation - Strong Guidance

Area Activation - Weak Guidance

Area Activation - Empirical Results

Area Activation

Problems with the Area Activation Model:

• Empirical number of fixations per trial needs to be known in advance.

• Only very basic factors influencing visual search have been implemented so far.

Nevertheless, Area Activation can be considered a first step towards a quantitative model of visual search.

Conclusions

We have discussed how the visual search paradigm can be employed to investigate the mechanisms of visual attention.

Various models of attention have been developed and evaluated with visual search tasks; in more recent studies, this was done based on eye-movement data.

In the next lecture, we will look at slightly different paradigms, which are aimed at identifying factors that determine visual scan paths.

See you then!
