

Multisensory Cues Capture Spatial Attention Regardless of Perceptual Load

Valerio Santangelo, University of Oxford and University of Rome “La Sapienza”

Charles Spence, University of Oxford

We compared the ability of auditory, visual, and audiovisual (bimodal) exogenous cues to capture visuo-spatial attention under conditions of no load versus high perceptual load. Participants had to discriminate the elevation (up vs. down) of visual targets preceded by either unimodal or bimodal cues under conditions of high perceptual load (in which they had to monitor a rapidly presented central stream of visual letters for occasionally presented target digits) or no perceptual load (in which the central stream was replaced by a fixation point). The results of 3 experiments showed that all 3 cues captured visuo-spatial attention in the no-load condition. By contrast, only the bimodal cues captured visuo-spatial attention in the high-load condition, indicating for the first time that multisensory integration can play a key role in disengaging spatial attention from a concurrent perceptually demanding stimulus.

Keywords: multisensory integration, audiovisual, attentional capture, exogenous, spatial cuing

In the last decade, several studies have attempted to investigate the nature of the interaction between selective attention and perceptual load (see Lavie, 2005, for a review). However, the relation between the exogenous orienting of spatial attention and perceptual load (as well as other kinds of load, such as working memory load; e.g., Santangelo & Spence, 2006) has only recently been examined. This is a particularly important area for research in the context of any assessment of the genuine automaticity of exogenous spatial orienting (see Santangelo & Spence, 2007) and one that constitutes the principal aim of the present study. In particular, we wanted to investigate the role that multisensory integration plays in capturing spatial attention as a function of any concurrent increase of perceptual load.

Are peripheral onsets capable of reflexively capturing spatial attention when a person’s attentional resources are already engaged in another perceptually demanding task? This question was recently addressed by Santangelo, Olivetti Belardinelli, and Spence (2007). They conducted two experiments in which the orienting of spatial attention was measured by means of an orthogonal spatial cuing task (see Spence, McDonald, & Driver, 2004). The participants were instructed to discriminate as rapidly and accurately as possible the elevation (up vs. down) of a visual target cued shortly beforehand by a spatially nonpredictive peripheral (visual or auditory) stimulus whose location varied on an independent (i.e., orthogonal) dimension (i.e., left vs. right). Typically, participants’ target discrimination responses in orthogonal spatial cuing studies are faster and more accurate when the elevation discrimination is carried out on the cued as opposed to the uncued side. This result has been taken to indicate the beneficial effect of the shift of spatial attention toward the location in which the cue was presented (see Spence et al., 2004, for a review).

In Santangelo et al.’s (2007) study, the spatial orienting of participants’ attention was either measured alone or under conditions of concurrent increased perceptual load, provided by the rapid sequential visual (or auditory) presentation (RSV[A]P) of a stream of alphanumeric characters from fixation. The participants in Santangelo et al.’s study had to monitor the RSV(A)P stream in order to detect any digits that might be presented in the stream. In the high perceptual load condition, the participants performed both the orthogonal cuing task and the RSV(A)P task within the same block of experimental trials. By contrast, in the medium-load condition, the participants were only required to perform the orthogonal cuing task, although the RSV(A)P stream was also presented (but no response was required). Finally, in the no-load condition, only the orthogonal cuing task was presented and responded to (i.e., this constituted a baseline condition that did not involve any concurrent increase of perceptual load relative to performing the spatial cuing task by itself). Santangelo et al. argued that if exogenous spatial attentional orienting is influenced by the adoption by participants of a focused attentional state on the central perceptually demanding task, one would expect to find a suppression of cuing effects (i.e., no significant difference between performance on the cued vs. uncued trials) under high- and medium-load conditions. Otherwise, one should expect to find comparable performance (i.e., significant cuing effects) in all three conditions.

Santangelo et al. (2007) observed the same pattern of results when they varied both the sensory modality of the intramodal cuing task (either visual or auditory) and the modality of the central attention-demanding stream (RSVP or RSAP): namely, the suppression of both unimodal visual and unimodal auditory cuing effects whenever the central stream was presented (i.e., in both the medium- and high-load conditions). A similar result was also observed in a follow-up study involving the exogenous orienting of tactile spatial attention (i.e., using both tactile cue and target stimuli; Santangelo & Spence, in press). The fact that unimodal spatial cuing effects were eliminated under conditions of both unimodal (see Santangelo et al., 2007, Experiments 1A and 2B) and crossmodal stimulation (Experiments 1B and 2A; Santangelo & Spence, in press) is consistent with the view that auditory, visual, and tactile exogenous spatial orienting may be controlled by a common (i.e., shared, or at least linked) underlying neural substrate (Santangelo, Van der Lubbe, Olivetti Belardinelli, & Postma, 2006; Spence et al., 2004) and neural correlates (see Eimer, 2004; Macaluso & Driver, 2004, for reviews).

Editor’s Note. Alan Kingstone served as the action editor for this article. (GWH)

Valerio Santangelo, Department of Experimental Psychology, University of Oxford, Oxford, England, and Department of Psychology, University of Rome “La Sapienza,” Rome, Italy; Charles Spence, Department of Experimental Psychology, University of Oxford.

This study was supported by a postdoctoral research grant awarded to Valerio Santangelo by the Faculty of Psychology, University of Rome “La Sapienza.”

Correspondence concerning this article should be addressed to Valerio Santangelo, Department of Psychology, University of Rome “La Sapienza,” via dei Marsi 78, 00185, Roma, Italy. E-mail: [email protected]

Journal of Experimental Psychology: Human Perception and Performance, 2007, Vol. 33, No. 6, 1311–1321. Copyright 2007 by the American Psychological Association. 0096-1523/07/$12.00 DOI: 10.1037/0096-1523.33.6.1311

The Potential Role of Multisensory Cues

Although Santangelo and colleagues’ (2007) results suggested a modulation of exogenous orienting by spatial attention (which hence appears to be far from truly automatic; Santangelo & Spence, 2007; see also Jonides, 1981; Treisman, 2005), there are several reasons to believe that bimodal audiovisual cues might be more resistant to the elimination of cuing effects when a participant’s resources are otherwise engaged (i.e., when an RSVP stream is presented concurrently; e.g., Lavie, 2005). In fact, there is now considerable evidence that the simultaneous presentation of stimuli in different sensory modalities can result in multisensory interactions that are additive or even, on occasion, superadditive (see Calvert, Spence, & Stein, 2004; Stein & Meredith, 1993, for reviews). For example, in nonhuman mammals, neurons in the superior colliculus have been shown to respond to auditory, visual, and tactile stimuli (Stein, Meredith, & Wallace, 1993) and to respond superadditively to spatially and temporally coincident multisensory stimulation, especially when it is presented at near-threshold levels (see Holmes & Spence, 2005). In other words, superior colliculus neurons can exhibit responses to multisensory stimuli that are significantly higher than the sum of the responses evoked by the component unimodal stimuli (Wallace, Meredith, & Stein, 1998).

Given these neurophysiological data, one might expect that the presentation of bimodal audiovisual cues would also result in more robust spatial cuing effects. In fact, the behavioral enhancement of overt spatial orienting following the presentation of bimodal compared with unimodal stimuli has been shown by Stein, Meredith, Huneycutt, and McDade (1989; see also Stein, Stanford, Wallace, Vaughan, & Jiang, 2004). The cats in their study were trained to fixate directly ahead and then to orient overtly and approach a visual target that suddenly appeared in order to receive a reward. It is important to note that overt orienting was significantly more accurate when the visual stimulus was accompanied by a spatially coincident sound relative to when the visual stimulus was presented alone (although it is unclear whether orienting responses were also executed any more rapidly).

The animal data suggest that near-threshold bimodal stimuli may capture spatial attention more effectively than unimodal stimuli. This assumption is consistent with the results of several event-related brain potential studies in humans (e.g., Teder-Sälejärvi, McDonald, Di Russo, & Hillyard, 2002) and with the results of certain neuroimaging studies (e.g., Calvert, Campbell, & Brammer, 2000) that have reported enhanced neural activity for bimodal compared with unimodal stimulation. It is rather surprising, therefore, to find that all of the human behavioral studies that have been conducted to date on this topic have failed to demonstrate any advantage for bimodal (relative to unimodal) cues in their capacity to elicit covert exogenous spatial orienting (Santangelo et al., 2006; Spence & Driver, 1999; Ward, 1994; see also Ward, McDonald, & Golestani, 1998). In fact, if anything, combined auditory and visual (i.e., bimodal) cues appear to be somewhat less effective in capturing people’s attention (i.e., they give rise to numerically smaller cuing effects; e.g., see Spence & Driver, 1999) than are unimodal cues. However, no attempts to use bimodal stimulation to overcome a focused attentional state have been made previously. What’s more, those previous studies that have examined whether abrupt peripheral onsets are capable of attracting spatial attention when it is otherwise engaged (e.g., Berger, Henik, & Rafal, 2005; Mazza, Turatto, Rossi, & Umiltà, 2007; Müller & Rabbitt, 1989; Theeuwes, 1991; Van der Lubbe & Postma, 2005; Yantis & Jonides, 1990) have presented only unimodal onsets.

We therefore thought it possible that although any exogenous cuing effects elicited by the presentation of bimodal stimuli might not result in enlarged orienting effects per se, they may be more resistant to suppression than the unimodal auditory, visual, or tactile cuing effects that have been reported previously (Santangelo et al., 2007; Santangelo & Spence, in press). That is, bimodal stimuli might be able to disengage attention from another perceptually demanding task (such as monitoring an RSVP stream) more effectively and thus result in the orienting of spatial attention toward either side. Such a result would clearly be important from an applied perspective, given the recent growth of interest in using nonvisual and multisensory cues to capture driver attention (Ho, Reed, & Spence, 2006, in press; Ho & Spence, 2005), as it would suggest that bimodal warning signals might be much more effective than unimodal warning signals in capturing a driver’s attention.

Overview of the Present Study

The aim of the present study was therefore to examine whether any exogenous cuing effects elicited by bimodal (audiovisual) stimuli would be less affected by perceptual/attentional load manipulations than the spatial cuing effects elicited by unimodal stimuli. To this end, we conducted three experiments. In Experiment 1, participants’ spatial attentional orienting following multisensory (audiovisual) cues was assessed under conditions of different levels of perceptual load. To anticipate our main result, we observed no elimination of exogenous cuing effects elicited by our audiovisual cue regardless of the current level of perceptual load. In Experiment 2, we replicated this finding in an entirely within-participants design, that is, directly comparing the participants’ spatial attentional orienting following unimodal (i.e., either auditory or visual) or bimodal audiovisual cues under different levels of perceptual load. Finally, in Experiment 3, we show that cues consisting of the combination of either two redundant visual or two redundant auditory cues were still rendered ineffective under conditions of high perceptual load whereas the audiovisual cues were not, demonstrating once again that only the multisensory cues were able to disengage the participants’ spatial attention regardless of the concurrent perceptual load.

Experiment 1

Methods

Participants. Data were collected from 23 volunteers from Oxford University who reported normal or corrected-to-normal vision and were naïve as to the purpose of the study, which lasted for 35 min. The data from one participant were not analyzed because his performance fell below 75% correct overall, leaving a total of 22 participants (9 male; mean age = 23 years, range = 19–33 years). The horizontal eye position of the right eye of 10 of the participants was monitored using the IRIS eye-movement measurement system (Skalar, Cambridge Research Systems, Rochester, England) in order to check their adherence to the central fixation instructions (and thus to rule out an overt orienting account of any spatial cuing effects observed). An adjustable chin rest was used to minimize participants’ head movements. The participants were instructed to avoid making eye movements (or blinking) during the presentation of the RSVP stream (medium- and high-load conditions) or while the fixation cross was presented (no-load condition).

Apparatus and materials. The visual stimuli were displayed on a light grey background on a 17-in. (43-cm) computer monitor (refresh rate = 60 Hz) located in a dark and silent room. The distance between the computer monitor and the participant’s head was approximately 40 cm. The auditory stimuli were presented by means of two loudspeaker cones (7 × 8 cm) located on either side of the computer monitor. The distance between the central loudspeaker cones (aligned with the middle of the screen) and the center of the computer monitor was 25 cm (see Figure 1).

The bimodal cues used in the orthogonal cuing task consisted of the presentation of a black rectangle (18 × 12 mm, subtending a visual angle of 2.6° × 1.7°) presented on either the left or right side of the computer monitor (located 15 cm from the center of the screen, at an eccentricity of approximately 20°) at the same time as a pure tone (1100 Hz) delivered for 50 ms from the middle loudspeaker on the same side as the rectangle. (The visual and auditory cues were identical to those used by Santangelo et al., 2007, in their unimodal cuing studies.) The visual targets consisted of a black circle (14 mm in diameter, subtending a visual angle of 2° × 2°) presented from one of the four corners of the computer monitor (15 cm on the left/right and 11 cm above/below the fixation point). The distractor set in the RSVP task consisted of 17 letters (B, C, D, E, F, J, K, L, M, N, P, R, S, T, Y, X, Z), and the target set consisted of six digits (2, 3, 4, 5, 6, 9; 23 × 13 mm, subtending a visual angle of 3.3° × 1.9°).
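The visual angles quoted above follow from the stimulus sizes and the approximate 40-cm viewing distance. The sketch below is an illustrative check only, not part of the original method: the function names are hypothetical, and it assumes a flat screen viewed head-on at exactly 400 mm.

```python
import math

# Assumption: viewing distance of exactly 400 mm (the text says "approximately 40 cm").
VIEWING_DISTANCE_MM = 400.0


def visual_angle_deg(size_mm, distance_mm=VIEWING_DISTANCE_MM):
    """Angle subtended by an object of the given size centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))


def eccentricity_deg(offset_mm, distance_mm=VIEWING_DISTANCE_MM):
    """Angle between fixation and a point offset sideways from the screen center."""
    return math.degrees(math.atan(offset_mm / distance_mm))


# Stimulus sizes (width, height) in mm, taken from the apparatus description.
stimuli_mm = {
    "cue rectangle": (18, 12),
    "target circle": (14, 14),
    "RSVP character": (23, 13),
}

for name, (width, height) in stimuli_mm.items():
    print(f"{name}: {visual_angle_deg(width):.1f} x {visual_angle_deg(height):.1f} deg")

# The cue was located 15 cm (150 mm) from the center of the screen.
print(f"cue eccentricity: {eccentricity_deg(150):.1f} deg")
```

Under these assumptions the computed angles round to the values reported in the text, and the cue eccentricity comes out close to the stated approximate 20°.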

Procedure. Three different conditions of perceptual load (high-, medium-, and no-load) were presented in separate blocks of experimental trials, with the order of presentation of the conditions counterbalanced across participants. The participants were allowed to rest for a few minutes between each block of experimental trials.

In the high-load condition, the trials consisted of the presentation of a stream of 11 alphanumeric characters. Each character was presented for 100 ms with an ISI of 17 ms (a light grey screen filled the gaps between the presentation of successive stimuli). The distractor letters in the stream were chosen randomly before each trial, with the sole restriction that no distractor was repeated within a given stream. The target digit appeared equiprobably in the third, sixth, and ninth positions in the stream, whereas the bimodal spatial cue appeared equiprobably in the third and sixth stream positions, and equiprobably on either side of fixation. When presented, the visual target in the spatial cuing task appeared two positions after the spatial cue (i.e., in either the fifth or eighth stream positions). The stimulus onset asynchrony (SOA) between the onset of the spatial cue and target stimulus was 233 ms. The spatial target could either appear on the same (cued trials) or opposite (uncued trials) side as the spatial cue. A target digit was presented on 67% of the trials, whereas a peripheral target (requiring an elevation discrimination response) was presented on the remaining 33% of the trials. Note that target digits in the stream and spatial targets in the orthogonal cuing task were never presented in the same trial.
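The 233-ms SOA can be recovered from the stream timing. The sketch below assumes, as the text does not state explicitly, that each item and each gap lasted a whole number of video frames on the 60-Hz monitor (6 frames for the 100-ms item, 1 frame for the roughly 17-ms ISI). All names are hypothetical.

```python
from fractions import Fraction

REFRESH_HZ = 60
FRAME_MS = Fraction(1000, REFRESH_HZ)  # one video frame, about 16.7 ms

# Assumption: durations are whole numbers of frames.
ITEM_FRAMES = 6  # 100 ms per character
ISI_FRAMES = 1   # about 17 ms blank gap between characters


def onset_ms(position):
    """Onset of a 1-based stream position, relative to the start of the stream."""
    return (position - 1) * (ITEM_FRAMES + ISI_FRAMES) * FRAME_MS


def soa_ms(cue_position, lag=2):
    """Cue-to-target SOA when the target appears `lag` positions after the cue."""
    return onset_ms(cue_position + lag) - onset_ms(cue_position)


for cue_position in (3, 6):
    target_position = cue_position + 2
    print(f"cue at {cue_position}, target at {target_position}: "
          f"SOA = {float(soa_ms(cue_position)):.1f} ms")
```

With these assumptions, two stream positions span 14 frames, which is about 233 ms for both cue positions and matches the SOA reported above.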

The participants were informed about the proportion of each type of trial that would be presented prior to the start of the experiment. They were instructed to press one of three buttons on a keypad (as rapidly and accurately as possible) in response to either the target digit (in the center of the screen) or the spatial target (up vs. down discrimination of the target circles), regardless of its side of presentation. Each block of trials lasted for approximately 12 min and included 196 trials, that is, Digit Position (3) × Cue Position (2) × Cue Side (2) × Trial Repetition (11) = 132, 67% of trials; Cue Position (2) × Cue Side (2) × Target Position (4) × Trial Repetition (4) = 64, 33% of trials. Before starting the experiment, the participants completed a 24-trial high-load training session.
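The trial counts above can be reproduced by crossing the design factors. The following is a minimal sketch of one way to enumerate such a block; the dictionary keys, labels, and function name are hypothetical, and randomization of trial order is left out.

```python
from itertools import product

DIGIT_POSITIONS = (3, 6, 9)
CUE_POSITIONS = (3, 6)
CUE_SIDES = ("left", "right")
TARGET_LOCATIONS = ("upper left", "upper right", "lower left", "lower right")


def build_high_load_block():
    """Enumerate the 196 trials of one high-load block (unshuffled)."""
    digit_trials = [
        {"type": "digit", "digit_pos": d, "cue_pos": c, "cue_side": s}
        for _ in range(11)  # Trial Repetition (11)
        for d, c, s in product(DIGIT_POSITIONS, CUE_POSITIONS, CUE_SIDES)
    ]
    spatial_trials = [
        {
            "type": "spatial",
            "cue_pos": c,
            "cue_side": s,
            "target_loc": t,
            "target_pos": c + 2,     # target follows the cue by two positions
            "cued": t.endswith(s),   # same side as the cue
        }
        for _ in range(4)  # Trial Repetition (4)
        for c, s, t in product(CUE_POSITIONS, CUE_SIDES, TARGET_LOCATIONS)
    ]
    return digit_trials + spatial_trials


block = build_high_load_block()
n_digit = sum(t["type"] == "digit" for t in block)
n_spatial = sum(t["type"] == "spatial" for t in block)
print(len(block), n_digit, n_spatial)
print(f"{n_digit / len(block):.0%} digit trials, {n_spatial / len(block):.0%} spatial trials")
```

Because the four target locations are crossed with cue side, half of the spatial trials are cued and half are uncued, which is what makes the cue spatially nonpredictive.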

The medium-load condition was identical to the high-load condition with the sole exception of the particular experimental instructions given. Although the participants were instructed to pay attention to the central stream, they were required only to perform the visual elevation discrimination task, pressing one of two response buttons to indicate the elevation of each target regardless of its side of presentation. They were instructed not to respond to any digit target that might appear in the RSVP stream.

In the no-load condition, the rapid serial visual presentation (RSVP) task was replaced by a fixation point (a 6 × 6 mm cross, i.e., just as in a typical exogenous spatial cuing study; see Spence et al., 2004, for a review). After a random interval (ranging from 300 to 600 ms) starting from the onset of the fixation point, a bimodal spatial cue was presented, which in turn was followed (after an SOA of 233 ms) by a spatial target. The participants were instructed to maintain their fixation on the central fixation point and to perform the visual elevation discrimination task as rapidly and accurately as possible. This block lasted for approximately 5 min and included 64 trials, Cue Side (2) × Target Position (4) × Trial Repetition (8).

Design and data analysis. There was one between-participants factor, Eye Monitoring (monitored vs. unmonitored), and two within-participants factors: Perceptual Load (high-load, medium-load, and no-load) and Cuing (cued vs. uncued trials). Trials in which the participants responded in under 100 ms (premature responses) or failed to respond within 1,500 ms of target onset (misses) and trials in which participants responded erroneously were excluded from the analysis of the RT data. Trials in which a horizontal eye movement (greater than 2.5°) occurred or in which participants blinked were also removed from the analysis. The data were then collapsed across cue side and target position. Mean reaction times (RTs) and error rates derived from the spatial cuing task were analyzed using a mixed analysis of variance (ANOVA). A separate ANOVA was performed in order to test whether digit position and cue position affected participants’ performance. This ANOVA included the between-participants factor of Eye Monitoring (monitored vs. unmonitored) and the within-participants factors of Digit Position (third, sixth, or ninth) and Cue Position (third or sixth). Note that this analysis was only performed on the data from the high-load condition (given that the participants only had to perform the digit detection task in this condition).
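The exclusion rules and the cuing effect (mean uncued RT minus mean cued RT) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the trial record layout and helper names are hypothetical, and the demo RTs are made up purely to exercise the filter.

```python
from statistics import mean

RT_MIN_MS = 100          # faster responses count as premature
RT_MAX_MS = 1500         # slower or absent responses count as misses
MAX_EYE_SHIFT_DEG = 2.5  # larger horizontal eye movements are excluded


def is_valid(trial):
    """Apply the exclusion criteria described for the RT analysis."""
    rt = trial["rt_ms"]
    if rt is None or rt < RT_MIN_MS or rt > RT_MAX_MS:
        return False
    if not trial["correct"]:
        return False
    if trial["blink"] or abs(trial["eye_shift_deg"]) > MAX_EYE_SHIFT_DEG:
        return False
    return True


def cuing_effect_ms(trials):
    """Mean uncued RT minus mean cued RT over valid trials (positive = cue benefit)."""
    valid = [t for t in trials if is_valid(t)]
    cued = [t["rt_ms"] for t in valid if t["cued"]]
    uncued = [t["rt_ms"] for t in valid if not t["cued"]]
    return mean(uncued) - mean(cued)


def make_trial(rt_ms, cued, correct=True, blink=False, eye_shift_deg=0.0):
    """Hypothetical trial record used only for this demo."""
    return {"rt_ms": rt_ms, "cued": cued, "correct": correct,
            "blink": blink, "eye_shift_deg": eye_shift_deg}


# Made-up demo data, NOT values from the study.
demo = [
    make_trial(400, True), make_trial(420, True),
    make_trial(430, False), make_trial(440, False),
    make_trial(90, True),                      # premature response
    make_trial(None, False),                   # miss
    make_trial(410, False, correct=False),     # wrong elevation response
    make_trial(380, True, eye_shift_deg=3.0),  # eye movement above 2.5 deg
]
print(cuing_effect_ms(demo))
```

In the actual analysis the per-condition means would feed a mixed ANOVA; the sketch stops at the descriptive cuing effect because the text does not specify the software used.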

Results

Spatial cuing task. The mean RTs and error rates are displayed in Figure 2. Premature responses and misses occurred only rarely (less than 1% of trials overall), as did trials with a horizontal eye movement (or blink; 2.1% of trials). An ANOVA on the RT data with the factors of Eye Monitoring (2), Perceptual Load (3), and Cuing (2) revealed that the participants responded significantly more rapidly on cued (M = 412 ms) than on uncued (427 ms) target trials overall, F(1, 20) = 29.8, p < .001. Responses were also faster in the no-load condition (366 ms) than in the medium-load condition (426 ms; p < .001), which, in turn, resulted in faster responses than in the high-load condition (467 ms; p < .001), F(2, 40) = 80.9, p < .001. There was no interaction between Eye Monitoring, Perceptual Load, and Cuing, F(2, 40) = 1.2, p = .310, thus ruling out an overt orienting account of the data (mean cuing effects for eye movement monitored and unmonitored participants were 12 and 18 ms, respectively: t = −2.908, p = .017, and t = −4.897, p < .001). There was no interaction between Perceptual Load and Cuing either, F(2, 40) < 1, ns, indicating that the magnitude of the spatial cuing effect was comparable across all three conditions.

Figure 1. Schematic representation of the experimental set-up in the high-load condition, in which the participants had to perform both the digit detection task and the target discrimination task (i.e., the elevation discrimination, up vs. down, of a black circle presented in one of the corners of the computer monitor). A spatially nonpredictive peripheral cue (consisting of the simultaneous presentation of a pure tone and a black rectangle, presented for 50 ms on the same side) was presented on each trial. The to-be-detected digit was presented on 67% of the trials, whereas on the remaining trials, the visual target (i.e., the black circle) was presented. Digit targets and visual elevation targets were always presented on different trials. [Figure not reproduced. Panel labels: digit detection (67% of trials); target discrimination (33% of trials). Legend symbols mark the visual target locations, the loudspeakers, and the visual cue locations. Dimension labels in the diagram: 22.5 cm, 30 cm, 50 cm.]

The only significant term in the analysis of the error data was the main effect of Perceptual Load, F(2, 40) = 37.4, p < .001, which, as expected, revealed that the participants found the high-load condition (11%) significantly harder than either the no-load (2%; p = .002) or medium-load (2%; p = .004) conditions.

Digit detection task. Just as in Santangelo et al.’s (2007) recent study, the separate ANOVA on the digit detection data revealed that participants responded to the target digits significantly more rapidly when they were presented in the ninth stream position (M = 445 ms) than when they were presented in either the third (467 ms; p < .001) or sixth (461 ms; p < .001) positions, F(2, 40) = 5.1, p = .011, while cue position did not affect digit detection performance, F(1, 20) = 1.4, p = .259, suggesting that our participants prioritized the digit detection task over their performance of the elevation discrimination task (cf. Santangelo et al., 2007). Neither the main effect of eye monitoring, F(1, 20) = 1.6, p = .222, nor the three-way interaction between eye monitoring, perceptual load, and cuing was significant, F(2, 40) < 1, ns, thus ruling out an overt orienting account of our data. The ANOVA revealed no other significant terms in the analysis of the RT data, and no significant differences were found in a similar analysis of the error data (all of the participants performed the digit detection task at well over 90% correct).

Discussion

Overall, the results of the RT analysis would appear to suggest that the reflexive orienting effects elicited by the presentation of bimodal cues are more resistant to suppression by perceptually/attentionally demanding stimuli/tasks (such as the central presentation of the RSVP stream) than the reflexive cuing effects elicited by the presentation of unimodal cues (note that all relevant aspects of the experimental procedure and materials used were kept constant between the current study and the experiments reported by Santangelo et al., 2007; see Figure 3).¹ These results appear to show that the spatial cuing effects elicited by bimodal (audiovisual) cues are more resistant to abolition by simultaneously presented perceptually demanding stimuli/tasks than unimodal (either auditory or visual) cues. However, it should be noted that this conclusion is based on a comparison of the results of two different studies: the participants’ performance with bimodal cues in the present experiment and the participants’ performance with unimodal cues in Santangelo et al.’s (2007) study.

Although all of the relevant procedural features (e.g., the apparatus and materials, the intensity of the stimuli used, etc.) in the present experiment exactly matched those used in Santangelo et al.’s (2007) previous study, a more robust verification for our main finding (i.e., that multisensory cues capture visuo-spatial attention regardless of any concurrent increase in perceptual load) would obviously come from a within-participants experimental design, in which the cuing effects elicited by unimodal and bimodal cues

¹ In order to further examine this issue, we compared the mean spatial cuing effects observed in Santangelo et al.’s (2007) study elicited by unimodal visual (Experiment 1A) and auditory (Experiment 1B) cues (these cues were identical to those used here) with the magnitude of reflexive cuing effects elicited by bimodal audiovisual cues in the present study (see Figure 3). An ANOVA performed on the RT data with the between-participants factor of Cue Type (unimodal vs. bimodal) and the within-participants factor of Perceptual Load (no-load, medium-load, and high-load) revealed a significant interaction between Cue Type and Perceptual Load, F(2, 84) = 5.4, p = .006. Post hoc comparisons revealed that the magnitude of the spatial cuing effect in the high-load condition was significantly larger following the presentation of bimodal cues than following the presentation of unimodal cues (p = .011). In the no-load condition, significantly less spatial cuing was actually observed following the presentation of bimodal compared with unimodal cues (p = .029), although the cuing effects elicited by bimodal cues in the no-load and high-load conditions were comparable in magnitude (p = .944). No significant difference between the spatial cuing effects elicited by bimodal and unimodal cues was observed in the medium-load condition (p = .970). However, the orienting effects reported in the medium-load condition by Santangelo et al. (2007; Experiment 1) were far from significant (mean cuing effect < 11 ms, 95% confidence interval from −18 to 14 ms, t = −.555, p = .584). It would therefore seem possible that the trend toward a cuing effect might reflect nothing more than the normal fluctuation seen in RT data (and hence might, in principle, be smaller than the 11 ms observed in the bimodal cuing condition). Note also that the same number of trials was presented to each of the 48 participants in Santangelo et al.’s (2007) study compared with the 22 participants in the present study. The nonsignificant results reported in our previous study cannot therefore simply be attributed to fewer data having been collected.

Figure 2. Mean RTs (lines), error rates (bars), and standard errors (error bars) for cued and uncued trials in the no-load, medium-load, and high-load conditions in Experiment 1. [Axes: reaction time (ms) and error (%) as a function of perceptual load.]


were compared directly. Therefore, in Experiment 2 (which essentially constitutes an attempt to replicate the finding obtained in Experiment 1), we assessed the ability of visual, auditory, or bimodal (audiovisual) peripheral spatially nonpredictive cues to capture visuo-spatial attention under conditions of no-load or high perceptual load. The medium-load condition was dropped from our second experiment, given that it had not been particularly informative as regards the main aim of our study.

Experiment 2

Methods

Participants. Data were collected from 14 volunteers (8 male; mean age = 24.9 years, range = 20–30 years) from Oxford University who reported normal or corrected-to-normal vision and were naïve as to the purpose of the study, which lasted 35 min.

Apparatus and materials. The apparatus and materials were the same as for Experiment 1 except that a unimodal visual cue (a black rectangle of 18 × 12 mm, subtending a visual angle of 2.6° × 1.7°, presented on either the left or right side of the computer monitor) and a unimodal auditory cue (a 1100-Hz pure tone delivered for 50 ms from the left or right loudspeaker) were also used in the orthogonal cuing task used in this experiment. Note that the bimodal cues consisted of the simultaneous presentation of both the visual and auditory cue on the same side. In this experiment, the horizontal eye position of every participant’s right eye was monitored.
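The relation between the physical size of the visual cue and the visual angle it subtends can be checked with the standard formula, angle = 2 · atan(size / (2 · viewing distance)). The Python sketch below is illustrative only: the viewing distance of 400 mm is an assumption (it is not stated in this section), and the function name is hypothetical.

```python
import math


def visual_angle_deg(size_mm: float, distance_mm: float) -> float:
    """Visual angle (in degrees) subtended by an object of the given size
    when viewed head-on from the given distance."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


# ASSUMPTION: a 400 mm viewing distance, which is not given in this section.
VIEWING_DISTANCE_MM = 400.0

width_deg = visual_angle_deg(18, VIEWING_DISTANCE_MM)   # 18 mm wide rectangle
height_deg = visual_angle_deg(12, VIEWING_DISTANCE_MM)  # 12 mm high rectangle
print(f"{width_deg:.1f} x {height_deg:.1f} deg")
```

Under that assumed distance, the computed angles round to the 2.6° × 1.7° reported for the 18 × 12 mm rectangle.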

Procedure. Two different conditions of perceptual load (no-load and high-load) were presented in separate blocks of experimental trials (one for the no-load and two for the high-load condition), with the order of presentation of the conditions counterbalanced across participants. In both the no-load and high-load conditions, the procedure was the same as for Experiment 1, except that now the spatially nonpredictive cue could be visual, auditory, or bimodal. Each type of cue was presented with the same probability within each block of trials. The high-load block lasted for approximately 10 min and included 156 trials, that is, Digit Position (3) × Cue Position (2) × Cue Side (2) × Cue Type (3) × Trial Repetition (3) = 108 (70% of trials) plus Cue Type (3) × Cue Position (2) × Cue Side (2) × Target Position (4) = 48 (30% of trials). The no-load block lasted for approximately 6 min and included 96 trials instead, that is, Cue Type (3) × Cue Side (2) × Target Position (4) × Trial Repetition (4). Before starting the experiment, the participants completed 6 no-load training trials as well as 12 high-load training trials.
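The trial counts above follow from multiplying the number of levels of each crossed factor. The short Python sketch below reproduces that arithmetic; the dictionary and variable names are hypothetical labels introduced for illustration, while the factor levels are taken from the text.

```python
from math import prod

# Factor levels as listed in the Procedure; the key names are illustrative.
high_load_with_digit_factor = {
    "digit_position": 3, "cue_position": 2, "cue_side": 2,
    "cue_type": 3, "trial_repetition": 3,
}
high_load_with_target_factor = {
    "cue_type": 3, "cue_position": 2, "cue_side": 2, "target_position": 4,
}
no_load = {
    "cue_type": 3, "cue_side": 2, "target_position": 4, "trial_repetition": 4,
}

n_first = prod(high_load_with_digit_factor.values())    # 108 trials
n_second = prod(high_load_with_target_factor.values())  # 48 trials
n_high_load = n_first + n_second                        # 156 trials
n_no_load = prod(no_load.values())                      # 96 trials

share_first = 100 * n_first / n_high_load               # about 69%
print(n_first, n_second, n_high_load, n_no_load, round(share_first))
```

The first set makes up about 69% of the 156 high-load trials, which corresponds to the rounded 70%/30% split given in the text.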

Results

Spatial cuing task. The mean RTs and error rates are shown in Figure 4. Premature responses and misses occurred only rarely (0.5% of trials, overall), as did trials with a horizontal eye movement (or blink; 2.0% of trials). These trials were excluded from the analysis of the RT data, as were those trials on which the participants responded erroneously. A three-way within-participants ANOVA with the factors of Perceptual Load (no-load vs. high-load), Cue Type (visual, auditory, audiovisual), and Cuing (cued vs. uncued) performed on the RT data revealed a significant main effect of cuing, F(1, 13) = 5.3, p = .039, indicating that the participants responded more rapidly on cued (M = 438 ms) than on uncued (455 ms) target trials. Participants’ responses were also significantly faster in the no-load condition (376 ms) than in the high-load condition (518 ms), F(1, 13) = 59.0, p < .001. The ANOVA also revealed a main effect of cue type, F(2, 26) = 6.2, p = .006, indicating that the participants responded significantly more rapidly following the presentation of audiovisual cues (431 ms, p < .001, post hoc test) than following visual cues (452 ms), which, in turn, tended to elicit faster responses than auditory cues (458 ms, p = .064).
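The spatial cuing effect is the difference between the mean RT on uncued trials and the mean RT on cued trials, computed after discarding error trials and trials excluded for premature responses, misses, or eye movements. The Python sketch below illustrates that computation on a handful of made-up trials; the data, field names, and function names are hypothetical and are not taken from the study's data set or analysis software.

```python
from statistics import mean


def mean_rt(trials, cuing):
    """Mean RT (ms) over correct, non-excluded trials of one cuing type."""
    rts = [
        t["rt_ms"] for t in trials
        if t["cuing"] == cuing and t["correct"] and not t["excluded"]
    ]
    return mean(rts)


def cuing_effect(trials):
    """Cuing effect in ms: uncued mean RT minus cued mean RT."""
    return mean_rt(trials, "uncued") - mean_rt(trials, "cued")


# Made-up trials for illustration only.
trials = [
    {"cuing": "cued", "rt_ms": 430, "correct": True, "excluded": False},
    {"cuing": "cued", "rt_ms": 446, "correct": True, "excluded": False},
    {"cuing": "cued", "rt_ms": 300, "correct": False, "excluded": False},   # error trial
    {"cuing": "uncued", "rt_ms": 450, "correct": True, "excluded": False},
    {"cuing": "uncued", "rt_ms": 460, "correct": True, "excluded": False},
    {"cuing": "uncued", "rt_ms": 900, "correct": True, "excluded": True},   # e.g., eye movement
]

print(cuing_effect(trials))
```

A positive value indicates faster responses on the cued side; the made-up values were chosen so that the cued and uncued means equal the 438 ms and 455 ms reported above.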

Crucially, the ANOVA on the RT data also revealed a significant interaction between the three factors, F(2, 26) = 4.2, p = .026. Significant cuing effects were obtained for all the cues in the no-load condition, as shown by post hoc comparisons (visual cue: p = .007; auditory cue: p < .001; audiovisual cue: p < .001). By contrast, a spatial cuing effect was observed only for the audiovisual cue in the high-load condition (visual cue: p = .610; auditory cue: p = .129; audiovisual cue: p < .001). It is also worth noting that the magnitude of the cuing effect obtained following audiovisual (bimodal) cuing did not differ significantly across the no-load (31 ms) and high-load (27 ms) conditions (p = .701). There was also no significant difference in the magnitude of the cuing effects elicited by visual (23 ms), auditory (36 ms), and audiovisual (31 ms) cues in the no-load condition (p = .279 and p = .692, respectively; see Figure 5). It is important to note that the ANOVA on the RT data also revealed a significant interaction between condition and cue type, F(2, 26) = 10.0, p < .001, indicating that bimodal cues disengaged visuo-spatial attention from the central perceptually demanding task more rapidly (a mean of 491 ms) than either visual (523 ms; p < .001) or auditory (539 ms; p < .001) cues.
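The degrees of freedom attached to the F ratios reported for this experiment follow from the standard rule for a fully within-participants ANOVA: the effect df is the product of (levels − 1) over the factors involved in the effect, and the error df is the effect df multiplied by (number of participants − 1). The Python sketch below applies that rule to the 14 participants and the 2 × 3 × 2 design; the function name is hypothetical.

```python
from math import prod


def within_anova_df(n_participants, levels_in_effect):
    """Degrees of freedom (effect, error) for a main effect or interaction
    in a fully within-participants ANOVA."""
    df_effect = prod(k - 1 for k in levels_in_effect)
    df_error = df_effect * (n_participants - 1)
    return df_effect, df_error


N = 14  # participants in Experiment 2

print(within_anova_df(N, [2]))        # Cuing main effect
print(within_anova_df(N, [3]))        # Cue Type main effect
print(within_anova_df(N, [2, 3]))     # Perceptual Load x Cue Type
print(within_anova_df(N, [2, 3, 2]))  # three-way interaction
```

These values agree with the F(1, 13) and F(2, 26) terms given in the text.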

A similar ANOVA on the error data revealed a main effect of condition, F(1, 13) = 19.5, p < .001, indicating that the participants made significantly more errors in the high-load condition

Figure 3. Comparison of the magnitude of the mean spatial cuing effects (computed as the difference between RTs observed in uncued and cued trials) elicited by unimodal (Santangelo et al., 2007, Experiment 1) and bimodal cues in the no-load, medium-load, and high-load conditions in Experiment 1. The error bars represent the standard errors of the means.


(7.2%) than in the no-load (2.5%) condition. The analysis also revealed a significant interaction between condition and cuing, F(1, 13) = 7.5, p = .017, indicating that the participants made significantly fewer errors on the cued target trials than on the uncued target trials in the high-load condition (5.9% and 8.5%, respectively; p = .003) but not in the no-load condition (2.8% and 2.1%, respectively; p = .630). The ANOVA revealed no other significant terms in the analysis of the error data.

Digit detection task. Another three-way within-participants ANOVA with the factors of Digit Position in the stream (third, sixth, or ninth position), Cue Type (visual, auditory, and audiovisual), and Cue Position in the stream (third or sixth position) was performed on the RT data derived from the digit detection task. This analysis revealed a significant main effect of the position of the digit in the stream, F(2, 26) = 16.2, p < .001, indicating (in line with the results of Experiment 1; see also Santangelo et al., 2006, on this point) that the participants responded significantly more rapidly when the digit was presented in the ninth stream position (482 ms) than when it was presented in either the third (526 ms; p = .006) or sixth (512 ms; p = .012) positions. Moreover, this analysis also revealed a significant main effect of cue type, F(2, 26) = 3.5, p = .044, indicating that the participants detected the digit more slowly when the cue was bimodal (514 ms) than when it was either visual (504 ms; p = .012) or auditory (502 ms; p = .26). This result is consistent with the notion that, by capturing the participants’ visuo-spatial attention irrespective of the ongoing central task, bimodal cues disrupted the participants’ performance on the digit detection task. No other significant terms were revealed by the analysis of the RT data. Finally, no significant differences were found in a similar analysis of the error data (overall, the participants made very few errors in the digit detection task: 3%).

Discussion

These results replicate the main finding reported in Experiment 1, but now using a within-participants experimental design. Once again, the results showed that all three types of cues (auditory, visual, and audiovisual) captured visuo-spatial attention in the absence of any perceptual load (i.e., the no-load condition), thus

Figure 4. Mean RTs (lines), error rates (bars), and standard errors (error bars) elicited by visual, auditory, or audiovisual (bimodal) cues for cued and uncued trials in the no-load and high-load conditions in Experiment 2. [Axes: reaction time (ms) and error (%) for cued and uncued trials, by cue type.]

Figure 5. Comparison of the magnitude of the mean spatial cuing effect elicited by visual, auditory, and audiovisual (bimodal) cues in Experiment 2. The error bars represent the standard errors of the means. [Axes: mean cuing effect (ms) by cue type, for the no-load and high-load conditions.]


replicating the results of numerous previous studies (see Spence et al., 2004, for a review). However, in line with Santangelo et al.’s (2006) findings, unimodal visual and auditory cues failed to capture visuo-spatial attention under conditions of high perceptual load. By contrast, cuing effects elicited by bimodal (audiovisual) exogenous stimuli were unaffected by the perceptual load manipulation.

However, it might be argued that the resistance to disruption shown by spatial attentional orienting elicited by our audiovisual cues is not attributable to multisensory interaction per se but simply to an increase in the information provided by the bimodal cues relative to unimodal cues. In order to investigate this hypothesis, we conducted a final experiment in which we compared the ability of three different kinds of redundant double peripheral cues to disengage spatial attention from a perceptually demanding task (i.e., the RSVP stream). One of these cues was multisensory (i.e., audiovisual, just as in the first two experiments), whereas the other two were unimodal cues (i.e., redundant double-visual and redundant double-auditory; see Laurienti, Kraft, Maldjian, Burdette, & Wallace, 2004, for similar terminology). If multisensory integration does indeed play a key role in the disengagement of spatial attention from a centrally presented RSVP stream/task, one would expect to observe attentional orienting effects following audiovisual peripheral cuing but not following redundant double-visual or redundant double-auditory peripheral cues.

Experiment 3

Methods

Participants. Data were collected from 14 volunteers (6 male; mean age = 22.5 years, range = 18–28 years) from Oxford University who reported normal or corrected-to-normal vision and were naïve as to the purpose of the study, which lasted for 35 min.

Apparatus and materials. The apparatus and materials were the same as for Experiment 1 except that now two extra loudspeaker cones (identical to those utilized in the previous experiments) were used to present 50-ms white noise bursts. They were located on each side of the computer monitor, 8 cm center-to-center from the previous two loudspeaker cones (again used to present pure tones). Moreover, two LEDs located on each side of the computer monitor (between the inner loudspeakers and the monitor frame, lying on the midline of the screen, 20 cm from central fixation, at an eccentricity of approximately 26°) were used to present 50-ms light flashes.
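The eccentricity of a peripheral stimulus follows from its lateral offset and the viewing distance: eccentricity = atan(offset / distance). The Python sketch below uses the two values given above (20 cm and approximately 26°) to back out the viewing distance they imply; the function names are hypothetical, and the 40 cm distance used in the second call is an assumption introduced only for comparison.

```python
import math


def eccentricity_deg(offset_cm: float, distance_cm: float) -> float:
    """Eccentricity (degrees) of a point offset laterally from fixation."""
    return math.degrees(math.atan(offset_cm / distance_cm))


def implied_viewing_distance_cm(offset_cm: float, ecc_deg: float) -> float:
    """Viewing distance implied by a lateral offset and an eccentricity."""
    return offset_cm / math.tan(math.radians(ecc_deg))


# Values from the text: LEDs 20 cm from fixation at roughly 26 degrees.
distance_cm = implied_viewing_distance_cm(20, 26)

# ASSUMPTION: a round 40 cm viewing distance, for comparison only.
ecc_at_40_cm = eccentricity_deg(20, 40)

print(f"{distance_cm:.1f} cm, {ecc_at_40_cm:.1f} deg")
```

The stated offset and eccentricity are mutually consistent with a viewing distance of roughly 40 cm.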

Procedure. Two conditions (no-load and high-load) were presented in separate blocks of experimental trials (one for the no-load and two for the high-load condition), with the order of presentation of the conditions counterbalanced across participants. In both the no-load and the high-load condition, the procedure was the same as for Experiment 2, except that now the spatially nonpredictive cue could be redundant visual (consisting of the simultaneous presentation of both a black rectangle and a light flash), redundant auditory (consisting of the simultaneous presentation of both a pure tone and a noise burst), or audiovisual (which consisted of the simultaneous presentation of both a black rectangle and a pure tone, just as in the two previous experiments). As in Experiment 2, each type of cue was presented with the same probability within each block of trials. The duration of the experimental blocks (i.e., no-load or high-load) and the number of experimental and training trials were the same as for Experiment 2.

Results

Spatial cuing task. The mean RTs and error rates are shown in Figure 6. Premature responses and misses occurred only rarely (0.6% of trials, overall). These trials were excluded from the analysis of the RT data, as were the trials on which the participants responded erroneously. A three-way within-participants ANOVA with the factors of Perceptual Load (no-load vs. high-load), Cue Type (redundant visual, redundant auditory, and audiovisual), and Cuing (cued vs. uncued) performed on the RT data revealed a significant main effect of cuing, F(1, 13) = 34.9, p < .001, indicating that the participants responded more rapidly on cued (M = 434 ms) than on uncued (451 ms) target trials overall. Participants’ responses were also significantly faster in the no-load condition (385 ms) than in the high-load condition (500 ms), F(1, 13) = 36.1, p < .001, once again indicating the effectiveness of our perceptual load manipulation.

Crucially, the ANOVA on the RT data also revealed a significant interaction between the three factors, F(2, 26) = 3.4, p = .047. Significant cuing effects were obtained for all three cues in the no-load condition, as shown by post hoc comparisons (redundant visual cue: p < .001; redundant auditory cue: p < .001; audiovisual cue: p < .001). By contrast, a spatial cuing effect was observed only for the audiovisual cue in the high-load condition (redundant visual cue: p = .615; redundant auditory cue: p = .884; audiovisual cue: p = .003). As in Experiment 2, the magnitude of the spatial cuing effect obtained following audiovisual cuing did not differ significantly across the no-load (21 ms) and high-load (20 ms) conditions (p = .906). Nor was there any significant difference in the magnitude of the cuing effects elicited by the redundant visual (23 ms), redundant auditory (31 ms), and audiovisual (21 ms) cues in the no-load condition (p = .316 and p = .271, respectively; see Figure 7).

A similar ANOVA performed on the error data revealed a main effect of condition, F(1, 13) = 8.5, p = .012, indicating that the participants made significantly more errors in the high-load condition (5.8%) than in the no-load (2.5%) condition. The analysis also revealed a main effect of cuing, F(1, 13) = 5.3, p = .038, indicating that the participants made significantly fewer errors on the cued target trials (3.2%) than on the uncued target trials (5.1%). The ANOVA revealed no other significant terms in the analysis of the error data.

Digit detection task. A further three-way within-participants ANOVA with the factors of Digit Position in the stream (third, sixth, or ninth position), Cue Type (redundant visual, redundant auditory, and audiovisual), and Cue Position in the stream (third or sixth position) was performed on the RT data from the digit detection task. This analysis revealed a significant main effect of the position of the cue in the stream, F(1, 13) = 19.0, p < .001, indicating that the participants responded significantly more rapidly when the spatial cue was presented in the third stream position (484 ms) than when it was presented in the sixth position (505 ms). Moreover, the participants tended to respond more rapidly when the digit was presented in the ninth stream position (468 ms) than when it was presented in either the third or sixth position (an average of 507 ms), though this effect failed to reach significance, F(2, 26) = 2.0, p = .153. The ANOVA revealed no other significant terms in the analysis of the digit detection RT data.

Finally, a similar analysis performed on the error data revealed a significant main effect of cue type, F(2, 26) = 3.7, p = .037, indicating that the participants failed to detect the digit more frequently when the spatial cue was audiovisual (4%) than when it was either a redundant visual (3%; p = .024) or a redundant auditory signal (2%; p = .017). This result, in line with the results of Experiment 2, is consistent with the notion that, by capturing the participants’ visuo-spatial attention irrespective of the ongoing central task, audiovisual cues disrupted the participants’ performance on the digit detection task.

Discussion

The results of our final experiment rule out the possibility that the resistance to disruption shown by the spatial attentional orienting elicited by the presentation of audiovisual cues in our first two experiments can be attributed simply to an increase in the information provided by the audiovisual (bimodal) peripheral cues relative to the unimodal cues (see Experiment 2). Although one might argue that our redundant (either visual or auditory) cues were perceived by our participants as just single (i.e., more complex) unimodal cues, they nevertheless represented an increased amount of spatial information relative to the unimodal (visual or auditory) cue used in Experiment 2.

The results of Experiment 3 show that all three types of double cues (i.e., redundant visual, redundant auditory, and audiovisual) captured visuo-spatial attention to the same extent in the absence of any perceptual load (i.e., the no-load condition). However, only the multisensory bimodal cues still captured visuo-spatial attention under conditions of high perceptual load (i.e., the high-load condition), thus indicating that multisensory interaction (and not the increased amount of peripheral information, such as for the redundant-auditory or redundant-visual cues) can play a key role in disengaging the participant’s spatial attention from a perceptually demanding task. (See Laurienti et al., 2004, for similar evidence that the integration of sensory signals in their crossmodal Stroop

Figure 6. Mean RTs (lines), error rates (bars), and standard errors (error bars) elicited by the redundant visual, redundant auditory, and audiovisual (bimodal) cues for cued and uncued trials in the no-load and high-load conditions in Experiment 3. [Axes: reaction time (ms) and error (%) for cued and uncued trials, by cue type.]

Figure 7. Comparison of the magnitude of the mean spatial cuing effect elicited by redundant visual, redundant auditory, and audiovisual (bimodal) cues in Experiment 3. Redundant-Vis. = Redundant-Visual; Redundant-Aud. = Redundant-Auditory. The error bars represent the standard errors of the means. [Axes: mean cuing effect (ms) by cue type, for the no-load and high-load conditions.]


paradigm only occurred when the two stimuli were presented from different sensory modalities (audition and vision, just as here) but not when two redundant visual signals or two redundant auditory signals were presented at the same time.)

General Discussion

The primary aim of the three experiments reported in the present study was to compare the ability of unimodal (either visual or auditory) and bimodal (audiovisual) spatially nonpredictive peripheral stimuli to capture visuo-spatial attention under conditions of no perceptual load versus high perceptual load. Across all three experiments, our results highlighted a qualitative difference between exogenous orienting of visuo-spatial attention triggered by unimodal and bimodal (i.e., multisensory) cues. In fact, unimodal (including redundant unimodal) spatial exogenous cues failed to capture visuo-spatial attention under conditions of high perceptual load (i.e., when the participants had to perform the RSVP task). By contrast, audiovisual cues successfully captured visuo-spatial attention regardless of any concurrent increase in perceptual load. This was evidenced by a comparable magnitude of spatial cuing effects being reported in the no-load and high-perceptual load conditions (see Experiments 2 and 3), and also by the fact that the audiovisual cues resulted in a decrement in participants’ performance on the ongoing central task (i.e., the digit detection) with respect to either the single unimodal cues (i.e., their responses were slowed; see Experiment 2) or the redundant unimodal cues (i.e., participants made more errors; see Experiment 3).

According to perceptual load theory (see Lavie, 2005, for a recent review), perceptual resources are inevitably used to process stimuli until they have been depleted. Santangelo et al. (2007) recently argued that the RSVP task may consume most of a participant’s perceptual resources, thus leaving fewer resources spare for the processing of any task-irrelevant stimuli (such as the uninformative cues presented in their studies or in ours). The results of the present study demonstrate that peripherally presented audiovisual stimuli appear to retain their effectiveness in reflexively orienting spatial attention as long as they are capable of disengaging a participant’s attention from the main task (i.e., the task on which the participant’s attention happens to be currently focused). In other words, perceptually demanding tasks appear to raise the threshold that has to be overcome by peripheral stimulation in order to capture spatial attention, rather than simply consuming a person’s available processing resources. The multisensory cues used in the present study would therefore appear to have exceeded that threshold (see Santangelo & Spence, 2007).

The results of Experiment 2 indicate that multisensory integration can facilitate the disengagement of spatial attention from a concurrent perceptually demanding task. Support for this claim comes from the fact that participants’ responses to the visuo-spatial target were about 40 ms faster when it was preceded by the bimodal (audiovisual) cues rather than by the unimodal cues under conditions of high perceptual load (i.e., when participants had to monitor the RSVP stream; see the significant interaction between condition and cue type; Experiment 2). Notwithstanding the faster disengagement of participants’ attention from the central ongoing task, the magnitude of the spatial cuing effects elicited by the presentation of the bimodal (audiovisual) cues was comparable to those elicited by the presentation of the unimodal cues in the no-load condition (i.e., without the RSVP task). A possible explanation for this result is that the simultaneous presentation of auditory and visual stimuli may have resulted in an increase in the perceptual salience of the exogenous cues (compared with the presentation of unimodal exogenous cues), without necessarily leading to any larger cuing effect (see Lovelace, Stein, & Wallace, 2003; Odgaard, Arieh, & Marks, 2003; Stein, London, Wilkinson, & Price, 1996). Moreover, it is also worth mentioning here that superadditive effects following bimodal stimulation in animal research have typically been demonstrated using near-threshold stimuli, whereas crossmodal studies of spatial attention in humans have always used clearly suprathreshold stimuli (see Spence et al., 2004, for a review). This is perhaps the reason why bimodal cues have, to date, failed to elicit any larger spatial cuing effect in humans than unimodal cues. Whatever the reason, the extent to which multisensory integration affects the exogenous orienting of visuo-spatial attention (by increasing the perceptual salience of stimuli or by other mechanisms) is a nontrivial question awaiting future research.

In conclusion, the results of the present study provide the first empirical evidence showing a behavioral advantage for multisensory (audiovisual) over unimodal (either auditory or visual) stimuli in terms of their ability to capture spatial attention reflexively. Superadditive (enlarged) effects for bimodal compared with unimodal stimulation have previously been documented intracellularly in anaesthetized animals (e.g., Stein & Meredith, 1993), and, on occasion, behaviorally in awake animals (e.g., Stein et al., 1989; though see Populin & Yin, 2002), and in neuroimaging and ERP studies in humans (e.g., Calvert et al., 2000; Talsma & Woldorff, 2005; Teder-Sälejärvi et al., 2002; though see also Beauchamp, Argall, Bodurka, Duyn, & Martin, 2004; Dhamala, Assisi, Jirsa, Steinberg, & Kelso, 2007, for null results). However, the results outlined here provide the first empirical demonstration that multisensory integration derived from bimodal (audiovisual) stimulation can also qualitatively affect behavioral performance in the reflexive orienting of spatial attention in awake humans. Although the magnitude of the spatial cuing effect elicited by the presentation of the bimodal cues was no bigger than that elicited by the presentation of the unimodal cues in the no-load condition, bimodal stimulation appears to be more resistant to the disruption of spatial cuing observed in several recent cuing studies (Santangelo et al., 2007; Santangelo & Spence, in press) by facilitating the disengagement of attention from a perceptually/attentionally demanding task (such as the RSVP stream).

Overall, the findings reported here may also be very important in an applied context: for example, when considering the design of future warning signals capable of capturing driver attention (e.g., Ho et al., 2006, in press; Ho & Spence, 2005). Our results suggest that multisensory warning signals might be far more effective than unimodal cues at capturing driver attention when the driver (or any other interface operator) is engaged in other cognitively demanding tasks (such as driving and/or using a mobile phone).

References

Beauchamp, M. S., Argall, B. D., Bodurka, J., Duyn, J. H., & Martin, A. (2004). Unraveling multisensory integration: Patchy organization within human STS multisensory cortex. Nature Neuroscience, 7, 1190–1192.


Berger, A., Henik, A., & Rafal, R. (2005). Competition between endoge-nous and exogenous orienting of visual attention. Journal of Experimen-tal Psychology: General, 134, 207–221.

Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence fromfunctional magnetic resonance imaging of crossmodal binding in thehuman heteromodal cortex. Current Biology, 10, 649–657.

Calvert, G., Spence, C., & Stein, B. E. (2004). Introduction. In G. A.Calvert, C. Spence, & B. E. Stein (Eds.), The handbook of multisensoryprocessing (pp. xi–xvii). Cambridge, MA: MIT Press.

Dhamala, M., Assisi, C. G., Jirsa, V. K., Steinberg, F. L., & Kelso, J. A. S.(2007). Multisensory integration for timing engages different brain net-works. Neuroimage, 34, 764–773.

Eimer, M. (2004). Electrophysiology of human crossmodal spatial atten-tion. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodalattention (pp. 221–245). Oxford, England: Oxford University Press.

Ho, C., Reed, N. J., & Spence, C. (2006). Assessing the effectiveness of“intuitive” vibrotactile warning signals in preventing front-to-rear-endcollisions in a driving simulator. Accidents Analyses and Prevention, 38,988–996.

Ho, C., Reed, N., & Spence, C. (in press). Multisensory (audiotactile)in-car warning signals for collision avoidance. Human Factors.

Ho, C., & Spence, C. (2005). Assessing the effectiveness of variousauditory cues in capturing a driver’s visual attention. Journal of Exper-imental Psychology: Applied, 11, 157–174.

Holmes, N. P., & Spence, C. (2005). Multisensory integration: Space, timeand superadditivity. Current Biology, 15, R762–R764.

Jonides, J. (1981). Voluntary versus automatic control over the mind’seye’s movement. In J. Long & A. Baddeley (Eds.), Attention andperformance IX (pp. 187–203). Hillsdale, NJ: Erlbaum.

Laurienti, P. J., Kraft, R. A., Maldjian, J. A., Burdette, J. H., & Wallace,M. T. (2004). Semantic congruence is a critical factor in multisensorybehavioral performance. Experimental Brain Research, 158, 405–414.

Lavie, N. (2005). Distracted and confused?: Selective attention under load.Trends in Cognitive Sciences, 9, 76–82.

Lovelace, C. T., Stein, B. E., & Wallace, M. T. (2003). An irrelevant lightenhances auditory detection in humans: A psychophysical analysis ofmultisensory integration in stimulus detection. Cognitive Brain Re-search, 17, 447–453.

Macaluso, E., & Driver, J. (2004). Functional imaging of crossmodalspatial representation and crossmodal spatial attention. In C. Spence &J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 247–275). Oxford, England: Oxford University Press.

Mazza, V., Turatto, M., Rossi, M., & Umilta, C. (2007). How automatic areaudiovisual links in exogenous spatial attention? Neuropsychologia, 45,514–522.

Muller, H. J., & Rabbitt, P. M. A. (1989). Reflexive and voluntaryorienting of visual attention: Time course of activation and resistance tointerruption. Journal of Experimental Psychology: Human Perceptionand Performance, 15, 315–330.

Odgaard, E., Arieh, Y., & Marks, L. E. (2003). Cross-modal enhancementof perceived brightness: Sensory interaction versus response bias. Per-ception & Psychophysics, 65, 123–132.

Populin, L. C., & Yin, T. C. T. (2002). Bimodal interactions in the superiorcolliculus of the behaving cat. Journal of Neuroscience, 22, 2826–2834.

Santangelo, V., Olivetti Belardinelli, M., & Spence, C. (2007). The sup-pression of reflexive visual and auditory orienting when attention isotherwise engaged. Journal of Experimental Psychology: Human Per-ception and Performance, 33, 137–148.

Santangelo, V., & Spence, C. (2006). Assessing the effect of verbalworking memory load on visuo-spatial exogenous orienting. Neuro-science Letters, 413, 105–109.

Santangelo, V., & Spence, C. (2007). Is the exogenous orienting of visuo-spatial attention truly automatic? A multisensory perspective. Manuscript submitted for publication.

Santangelo, V., & Spence, C. (in press). Assessing the automaticity of the exogenous orienting of tactile attention. Perception.

Santangelo, V., Van der Lubbe, R. H. J., Olivetti Belardinelli, M., & Postma, A. (2006). Spatial attention triggered by unimodal, crossmodal, and bimodal exogenous cues: A comparison of reflexive orienting mechanisms. Experimental Brain Research, 173, 40–48.

Spence, C., & Driver, J. (1999). A new approach to the design of multimodal warning signals. In D. Harris (Ed.), Engineering psychology and cognitive ergonomics (Vol. 4, pp. 455–461). Aldershot, England: Ashgate.

Spence, C., McDonald, J., & Driver, J. (2004). Exogenous spatial cuing studies of human crossmodal attention and multisensory integration. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 277–320). Oxford, England: Oxford University Press.

Stein, B. E., London, N., Wilkinson, L. K., & Price, D. P. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497–506.

Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.

Stein, B. E., Meredith, M. A., Huneycutt, W. S., & McDade, L. (1989). Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1, 12–24.

Stein, B. E., Meredith, M. A., & Wallace, M. T. (1993). The visually responsive neuron and beyond: Multisensory integration in cat and monkey. Progress in Brain Research, 95, 79–90.

Stein, B. E., Stanford, T. R., Wallace, M. T., Vaughan, W. J., & Jiang, W. (2004). Crossmodal spatial interactions in subcortical and cortical circuits. In C. Spence & J. Driver (Eds.), Crossmodal space and crossmodal attention (pp. 25–50). Oxford, England: Oxford University Press.

Talsma, D., & Woldorff, M. G. (2005). Attention and multisensory integration: Multiple phases of effects on the evoked brain activity. Journal of Cognitive Neuroscience, 17, 1098–1114.

Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A. (2002). An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cognitive Brain Research, 14, 106–114.

Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83–90.

Treisman, A. (2005). Synesthesia: Implications for attention, binding, and consciousness—A commentary. In L. Robertson & N. Sagiv (Eds.), Synesthesia: Perspectives from cognitive neuroscience (pp. 239–254). Oxford, England: Oxford University Press.

Van der Lubbe, R. H. J., & Postma, A. (2005). Interruption from irrelevant auditory and visual onsets even when attention is in a focused state. Experimental Brain Research, 164, 464–471.

Wallace, M. T., Meredith, M. A., & Stein, B. E. (1998). Multisensory integration in the superior colliculus of the alert cat. Journal of Neurophysiology, 80, 1006–1010.

Ward, L. M. (1994). Supramodal and modality-specific mechanisms for stimulus-driven shifts of auditory and visual attention. Canadian Journal of Experimental Psychology, 48, 242–259.

Ward, L. M., McDonald, J. J., & Golestani, N. (1998). Cross-modal control of attention shifts. In R. D. Wright (Ed.), Visual attention (pp. 232–268). New York: Oxford University Press.

Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121–134.

Received August 15, 2006
Revision received March 8, 2007
Accepted March 30, 2007
