Neuropsychologia 40 (2002) 1396–1406

Audiovisual speech perception in Williams syndrome

M. Böhning a,b, R. Campbell b,*, A. Karmiloff-Smith c

a Department of Linguistics, University of Potsdam, Potsdam, Germany
b Department of Human Communication Science, University College London, Chandler House, 2 Wakefield Street, London, UK

c Neurocognitive Development Unit, Institute of Child Health, London, UK

Received 13 November 2000; received in revised form 26 September 2001; accepted 28 September 2001

Abstract

People with the genetic disorder of Williams syndrome (WS) show an anomalous cognitive profile, wherein some purely verbal and social communicative abilities are relatively proficient, while visuo-spatial skills can be extremely impaired. Face processing, while apparently relatively spared among visuo-spatial skills, can show deficits suggesting developmental immaturity. In this context, the exploration of visual and audiovisual speech perception in WS is of interest. A new test based on tokens from a single natural English speaker of the form /ˈba:/, /ˈva:/, /ˈða:/, /ˈda:/ and /ˈga:/, digitally manipulated and presented in unimodal (vision alone, audition alone) and audiovisual conditions, was presented for participants to identify each token.

Compared with age-matched controls, WS participants were impaired at visual but not auditory identification, and in audiovisual testing showed correspondingly reduced effects of vision on report of auditory token identity. Audiovisual integration was nevertheless demonstrable in WS. Speech-reading may require skills which do not reach age-appropriate levels in WS, despite their age-appropriate (auditory) phonological abilities. © 2002 Published by Elsevier Science Ltd.

Keywords: Williams syndrome; Audiovisual speech; Face processing; Magnocellular function

1. Introduction

Williams syndrome (WS) [38] is a rare genetic disorder with an incidence of one in 20,000 live births [3]. It is of interest to cognitive neuroscientists because of its uneven cognitive–linguistic profile. Some language and communicative skills appear relatively proficient, while many non-verbal skills such as visuo-spatial cognition, number, as well as planning, conceptual/semantic skills and problem solving are severely compromised (see [6,29] for recent reviews). However, one area of visuo-spatial cognition for which claims of spared function have been made is the processing of faces. People with WS ‘like to look at faces’, and can show appropriate responses to a range of facial acts that denote emotion and intention [23,33]. Several experimental studies had suggested that face processing in WS can be chronological age-appropriate, or at least may approach such levels. However, a detailed study by Deruelle et al. [14] has explored a range of face processing tasks in a WS population, attempting to delineate the pattern of face processing more closely. While normal adults are better

* Corresponding author. Tel.: +44-207-679-4232; fax: +44-207-713-0861. E-mail address: [email protected] (R. Campbell).

at matching upright than inverted faces, people with WS resembled normal younger controls in their relative insensitivity to vertical orientation in face-matching. The results of the other face processing tasks in the battery also supported the idea that face processing was not age-appropriate in this group, and reflected difficulties in configural processing of faces. Deruelle et al. also tested ability to match still face images for lipshape (/a/, /o/ and /i/). This was the only task on which the WS group performed as well as their age-matched control group. They concluded that lip-reading relies on local rather than global or configural processing, accounting for its sparing in WS.

This conclusion may be precipitate, for natural speech- or

lip-reading is highly sensitive to the normal facial configuration. For example, audiovisual speech illusions are affected by the orientation of the face. They are reduced for inverted faces, even when the mouth region itself is upright [20,35]. Moreover, unlike the static stimuli used by Deruelle et al. [14], the processing of faces for speech makes use of visual movement. Visual time-varying information in the absence of a recognisable image of the relevant face parts can affect auditory perception [34].

In hearing people, sensitivity to visual speech can be measured by the influence of seen on heard speech, when these are combined and presented as audiovisual tokens. This

0028-3932/02/$ – see front matter © 2002 Published by Elsevier Science Ltd. PII: S0028-3932(01)00208-1

M. Böhning et al. / Neuropsychologia 40 (2002) 1396–1406 1397

constitutes a more realistic way to assess speechreading in WS, especially since there are no reports that the perception of auditory speech is compromised. The first aim of the study reported here is to describe the pattern of influence of seen on heard speech identification in people with WS. A secondary aim is to explore the extent to which this task

might reflect abnormal integration of visual and auditory processing. A high-level integration deficit, which would impair the analysis of the products of perceptual processing, has been proposed as one principle underlying the anomalous cognitive profile in WS [22,36,37]. The extent to which relatively low level cross-modal perceptual integration might be implicated has not been explored hitherto, and audiovisual speech processing offers a suitable testbed, since audiovisual integration is established in infancy, is somewhat insensitive to attentional and strategic demands [28], follows a well defined integration metric [26,27] and has a clearly identified cortical basis involving auditory language circuitry [7,8].

Sensitivity to unimodal visual speech can be poor in dyslexic children [11] and in a number of patients with cortical lesions or anomalous face processing function [9,10,12]. Because visual performance can be poor while auditory performance is good, audiovisual performance may appear to show less evidence of integration (and is relatively more reliant on audition). However, it would be incorrect to assume that integration mechanisms per se are faulty in such cases. De Gelder et al. [13], however, suggest that children with autism may have defective audiovisual integration. In their study, nine youngsters with autism showed normal visual speech discrimination and auditory speech categorisation, but failed to show the consistent shift of the (auditory) discrimination function when bimodal audiovisual inputs were displayed.

In the present study, we explore the identification of auditory and of visual iambic disyllables (of the form /ˈCa:/). Using a single speaker, these varied in the perceived place of articulation of the consonant (C), from bilabial (‘ba’) to velar (‘ga’). All unimodal and bimodal combinations were tested in a WS group of adolescents and adults matched individually to controls on the basis of age.

Our experimental questions are as follows:

1. How good is visual speech discrimination in WS?
While lipshape matching was age-appropriate in WS, several studies suggest a configurational component to the identification of natural visual speech. Configurational processing appears to be relatively compromised in WS. Moreover, moving faces cannot readily be equated to still ones in terms of the information processing entailed. Our prediction concerning the visual identification of natural lipspoken syllables in WS is that it may show deficits compared with age-matched controls.

2. How good is auditory speech segment discrimination in WS?

In contrast to their high-level language anomalies, phonological and articulatory abilities in WS are believed to be age-appropriate, with good verbal repetition skills [6]. We predict that there should be no difference between WS and controls in their identification of the five auditory-only targets.

3. Do WS show reduced integration of seen and heard speech?
If WS show age-appropriate visual and auditory identification skills, the ‘failure of integration’ hypothesis may be tested definitively. It would predict that in reporting audiovisual syllables, WS would report either the visual or the auditory but (relatively) rarely a ‘combination’ percept or one showing some influence of vision on audition.

2. Method

2.1. Participants

Fifteen native English-speaking individuals with WS were screened for suitability for testing (adequate hearing, ability to follow instructions) by interview. One diagnosed with a permanent hearing loss was excluded, as was another whose response patterns were highly perseverative. This left 13 participants (five female, eight male) in the experiment.

Table 1 shows the chronological age and gender of the

individual participants, with their scores on a number of psychometric measures. These included the British Picture Vocabulary Scale II (BPVS II; receptive vocabulary test: [15]) and the British Ability Scales II (BAS II; [16]).1 Where fluorescence in situ hybridisation (FISH) had been performed, it is reported. It was positive in all the nine tested cases, confirming the genotype showing deletion of the elastin gene. The remaining participants had been diagnosed clinically from structural and functional criteria.

Klein et al. [24] report very high incidence of hyperacusis

in WS. This might affect the development of visual speech analysis skills. A version of Klein et al.’s questionnaire was administered by interview with caregiver or participant, and a hyperacusis score established (see Table 1).

The age range of the participants in the present study was 11;1 to 52;2 years. Some participants wore spectacles to correct (minor) optical defects.

2.2. Controls

The experimental hypotheses relate to age-appropriate skills. Therefore, an age-matched control group was chosen. These were native English speakers with no known cognitive anomalies, matched on an individual level for age and

1 The chronological age given in the BPVS II column corresponds with the age when the experiment of this study was carried out. The psychometric test results of the BAS II were obtained at different dates.



gender to the WS participants. They were recruited through a youth club and from volunteers in the London area.2

2.3. Materials

The stimuli consisted of five naturally spoken iambic VCV syllables (/ˈba:/, /ˈva:/, /ˈða:/, /ˈda:/, and /ˈga:/) derived from a female British–English talker, which were digitally captured as five audiovisual clips. The images were recorded under studio conditions and high-definition image and audio capture was used. The aim of the recording was to achieve optimal speech readability. Thus, the talker was side and front lit, affording both a quite strongly shadowed view of the whole head and some details of the inside of the mouth. The view was frontal, including head and shoulders. Each disyllable was matched for overall length and for perceived onset of the consonant, before being spliced into audio-only and visual-only segments. These were used to generate 35 tokens: 5 auditory, 5 visual and the complete set of 25 audiovisual combinations. In the auditory unimodal condition (n = 5) the screen remained black. In the visual unimodal condition (n = 5) a silent lip-movement was seen. Twelve 35-item lists, each comprising every token in random order, were constructed. Each list was titled separately and trials were individually numbered onscreen. Every trial in the list was preceded by a warning tone. Each trial comprised a 1 s speech segment followed by a 3 s blank screen during which the response was collected. A videotape was constructed from this material and participants in this experiment each viewed a total of five lists (total number of trials was 35 × 5 = 175 per participant) in the experimental condition.
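The token set and list construction described above can be sketched as follows. This is a hypothetical reconstruction for illustration only; the syllable spellings, tuple layout and list seeding are assumptions, not the authors' materials:

```python
from itertools import product
import random

# Five consonant places, front to back of the mouth (spellings assumed).
SYLLABLES = ["ba", "va", "tha", "da", "ga"]

# 5 auditory-only, 5 visual-only and 25 audiovisual tokens = 35 in total.
auditory_only = [("A", s, None) for s in SYLLABLES]
visual_only = [("V", None, s) for s in SYLLABLES]
audiovisual = [("AV", a, v) for a, v in product(SYLLABLES, SYLLABLES)]
tokens = auditory_only + visual_only + audiovisual

def make_list(seed):
    """One randomised 35-item presentation list (every token exactly once)."""
    rng = random.Random(seed)
    order = tokens[:]
    rng.shuffle(order)
    return order

# Twelve lists were constructed; each participant viewed five of them,
# giving 35 * 5 = 175 trials per participant.
lists = [make_list(i) for i in range(12)]
```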

2.4. Procedure

Participants were tested individually in a quiet room. The videotape was shown on a 22 in. colour video monitor, with audio level set to 60 dB. Participants’ behaviour was monitored by video and audio recorder. Each participant sat at a distance of 1 m from the television. Each testing session lasted about 25 min.

After each list the experimenter, sitting beside the participant, paused the video. This helped the WS participants, particularly, to concentrate on the task. At the beginning of each testing session the experimenter gave the following verbal instructions to WS participants and younger controls: “You will now watch a video of a woman speaking. I want you to repeat what the woman said. Sometimes she might be hidden behind a wall, so you can’t see her. And sometimes she might be whispering, so you can’t hear her. But each time I want you to repeat what she said. She will be

2 The greatest age difference among the under-18-year-olds was 1;2 years, the control subject being older than the WS subject, and the greatest age difference among the over-18-year-olds was 2;7 years, the control subject being older than the WS subject.

saying funny words that don’t mean anything”. For older controls, the instruction was: “You will now watch a video of a woman speaking. I want you to repeat what the woman said. Sometimes you can’t see her and sometimes you can’t hear her. But each time I want you to repeat what she said. She will be saying words that have no meaning”.

After instruction all the participants saw training stimuli

from the videotape. Pilot testing established that 10 practice trials were sufficient for reliable performance in both WS and controls. The experimental series started immediately after this.

All responses for WS participants were transcribed phonetically by the experimenter (MB), and checked independently by a second transcriber from audio and videotape. Agreement was >90%.

3. Results

3.1. Effects of hyperacusis

Hyperacusis questionnaire score measures varied widely in the WS group, but were significantly greater than in controls (P < 0.01). However, these scores failed to predict outcomes on the speech identification tasks, for no significant correlations were found between hyperacusis score (the number of sounds that currently bothered the participant; see Table 1) and any of the behavioural measures (below). In WS, another measure of hyperacusis (rated anxiety in relation to sudden noises) was related to the total number of ‘unpleasant reactions’ that participants reported to 10 different noises. However, this measure also failed to correlate with any of the experimental variables. Hyperacusis measures were ignored in further analyses.

3.1.1. Unimodal presentations
The first analysis explored whether WS showed anomalous processing of auditory and of visual tokens presented unimodally. The response measure was accuracy of report (/5). This was a repeated measures analysis (SPSS GLM procedure) in which the within-subject (repeated measures) factors were (a) modality and (b) place of articulation. The between-subjects factor was experimental group, and age was entered as a covariate.

There was a marginally significant effect of modality

(F(1, 23) = 4.08, P = 0.055): vision was superior to audition overall. There was a significant group × modality interaction (F(1, 23) = 4.76, P < 0.05). WS were inferior to controls at visual, but not at auditory identification (post hoc comparison, P < 0.01).

Modality affected place of articulation scores (modality × place interaction: F(4, 92) = 5.74, P < 0.01), an effect which was not further moderated by experimental group. Audition was superior for discriminating the alveolar consonant /d/; vision was better for more anterior places of articulation (labial, labiodental). In this test material, the


Fig. 1. Top panel: boxplots showing distribution of unimodal scores. The difference between groups for vision is significant (post hoc) at P < 0.01. Bottom panel: mean accuracy for each unimodal condition. Audition generates higher scores for /d/, vision for bilabial /b/ and labiodental /th/ and /v/.

velar /g/ is reasonably well identified by vision as well as by audition, possibly reflecting a response bias to /g/ in the visual condition. These findings are summarised in Fig. 1.

Age failed to affect the findings, with one exception.

The age × modality × place interaction was significant (F(4, 92) = 6.52, P < 0.01). This is considered further below, where age is explored as a categorical variable.

WS were less accurate than controls at discriminating visual, but not auditory tokens. The pattern for discrimination of tokens was affected by modality of presentation. However, experimental status (WS or control) did not affect this pattern.

3.1.2. Bimodal presentations
The next analysis (SPSS GLM) explored bimodal responses. The within-subject factors were: (a) auditory place of articulation (five levels); (b) visual place of articulation (five levels). The between-subjects factor was again experimental status, and age was again entered as a covariate.

The only two significant (within-subjects) effects were the auditory × visual interaction (F(16, 368) = 11.42, P < 0.001) and the auditory × visual × group interaction (F(16, 368) = 5.79, P < 0.001). These are summarised in Fig. 2. This shows that where audition and vision were congruent, peak accuracy (measured in terms of auditory accuracy) obtained. However, both this effect, and the converse (lowering of auditory accuracy by incongruent visual tokens), were reduced in WS compared with controls.

3.1.3. Audiovisual (congruent) compared with auditory responses
Analysis 2 suggests that the groups differ in the extent to which adding congruent visual information influenced auditory accuracy. In analysis 3, this was inspected directly


Fig. 2. Audiovisual responses. Labels show auditory (upper case), then visual (lower case) responses. Thus Vd is auditory /v/ with visual /d/. WS responses are significantly more accurate for Vd, Vg, Thb and Gv. There are no significant group differences for each of the congruent items (Bb, Vv, THth, Dd and Gg).

by contrasting congruent AV responses with auditory responses alone. Within-subject factors were (a) modality and (b) place. Once again, the group factor was experimental status, and age was entered into the analysis as a covariate. There was a main effect of modality: audiovisual scores were higher than those for audition alone (F(1, 23) = 8.64, P < 0.01). There was also a place × modality interaction (F(4, 23) = 7.94, P < 0.01), and a three-way interaction with experimental status (F(4, 23) = 2.97, P < 0.05).3

These findings are summarised in Fig. 3 which shows

the means of individual difference scores for audiovisual-congruent and auditory tokens. It can be seen that the effect of adding congruent vision is general for controls, but is limited to /b/ for WS.

The only effect of age was a three-way interaction between modality, place and age. This is discussed further below, where age is considered as a categorical variable.
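The Fig. 3 comparison is a simple per-place difference score. A minimal sketch, in which the accuracy values are invented for illustration and are not the study's data:

```python
def visual_enhancement(av_congruent, audio_alone):
    """Per place of articulation: (congruent audiovisual accuracy) minus
    (auditory-alone accuracy). Positive values mean adding congruent
    vision improved report of the auditory token."""
    return {p: av_congruent[p] - audio_alone[p] for p in audio_alone}

# Invented illustrative accuracies (out of 5), NOT the study's data:
audio = {"b": 4.0, "v": 3.0, "th": 3.5, "d": 4.5, "g": 4.0}
av = {"b": 4.8, "v": 3.2, "th": 3.6, "d": 4.6, "g": 4.1}
gains = visual_enhancement(av, audio)  # gains["b"] is about 0.8
```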

3.1.4. Age
Age generally did not moderate the (group) findings, yet two significant effects of age as a covariate did occur. Both in order to examine this, and also to explore the possibility that WS patterns may reflect a developmentally delayed pattern, age was explored as a categorical variable. Participants were divided into a younger (<16 years) group (n = 13, 6 WS)

3 All statistical probability levels were Bonferroni corrected for repeated variable testing in this and subsequent analyses.

Fig. 3. Congruent audiovisual scores compared with audition alone, difference scores. In WS, audiovisual enhancement is limited to bilabial (/b/). In controls it is more general.

and an older group (>16 years, n = 13, 7 WS),4 and analyses reported under 1 and 3 above were repeated with age as the grouping variable, experimental status (WS/control) as the covariate.

4 One participant (WS) was under 16 years old at first testing, and over 16 at second testing. Reclassification of age-group, including these data first in the older, then in the younger group, failed to affect any of the reported effects.


Fig. 4. Effects of age (age group analyses). Top panel: the distinction between /v/ and /th/ is sensitive to age for auditory, but less so (and in the opposite direction) for visual presentation. Bottom panel: /v/ is enhanced by adding vision in the younger group, /th/ in the older group.

In the first analysis (unimodal condition (2) × place (5) × age group (2)) all the effects reported above were again significant. In addition, there was a significant interaction of place of articulation with age group (F(4, 92) = 3.31, P < 0.05), which was further modulated by modality (F(4, 92) = 4.54, P < 0.01). Fig. 4 (top panel) summarises these effects. The age group influence is with respect to /v/ and /th/ discrimination. The younger group are poorer at /v/ discrimination and show a bias to /th/ responses for auditory presentations only.

In a further analysis contrasting audiovisual (congruent)

and audio-alone, there was a significant interaction of modality × place × age group (F(4, 92) = 9.51, P < 0.001). This is summarised in Fig. 4 (bottom panel), which shows that vision improved auditory /v/ discrimination in younger viewers (at the cost of /th/ responses), while in older viewers, /th/ was improved by vision (at the cost of /v/).

3.1.5. Are older WS like young controls?
A final analysis contrasted auditory, audiovisual (congruent) and visual responses for older WS (n = 7, >16 years) and younger controls (n = 7, <16 years). In this analysis the within-subject factors were modality (three levels) and place (five levels). No covariate was introduced. Despite the small numbers, there was a significant modality × group interaction (F(2, 24) = 8.13, P < 0.01), moderated by place of articulation (F(8, 96) = 4.83, P < 0.001).5 These effects are summarised in the three panels of Fig. 5.

Post hoc comparisons confirmed that WS were poorer

at vision-only than younger controls. In comparison with controls, visual /d/ and /g/ were particularly poorly

5 Only significant effects with calculated power value of >0.85 are reported. This vitiates, to some extent, the use of small sample sizes and the interpretation of higher order interactions.


Fig. 5. Older WS and younger controls contrasted. For audition, /v/ is reported differently in the two groups. For vision, the WS group is generally poorer. There is no significant difference in the bimodal scores.

discriminated (P < 0.02). They also confirmed that younger controls were poorer at auditory /v/ discrimination than at other places of articulation (P < 0.05), but that this contrast was not significant for older WS. In the bimodal condition, contrasts showed no significant differences.

4. Discussion

This experiment compared audiovisual and visual identification of speech tokens in people with WS and age-matched controls. These contrasts only allow restricted inferences to be drawn. In particular, since no mental-age matches were included, we cannot draw strong conclusions concerning the role of general intellectual capacity in the observed patterns. It could well be that silent speechreading, particularly, requires the participant to be prepared to use novel means to respond. Furthermore, these controls do not allow us to explore the effects of developmental status directly. However, age could be examined (see below) and offers some interesting insights into differences between WS and age-matched groups.

Overall, this task generated slightly more accurate reports

of speech segments by vision alone than audition alone. Place of articulation affected report differently for vision and for audition, with audition generally more accurate for speech articulated in more posterior parts of the vocal cavity. This pattern was not unexpected, given that place of articulation was varied systematically from front to back of the mouth, and place of articulation can often be accurately seen rather than heard: for example, the contrast between /p/ and /k/ or /th/ and /v/ [27,28]. Although more anterior productions (bilabials particularly) were relatively well reported from vision, these particular displays also allowed velar and alveolar consonants to be reasonably well discriminated. The excellent quality of the talker image, which afforded good views of the mouth in action, and the full range of head movements associated with speech, may also have played a part. While the distinction between /d/ and /g/ is generally thought to be hard to discriminate by eye, it can be perceived from a clearly illuminated speaker in this vowel context.

There was clear evidence of integration in bimodal report.

Both congruent (Fig. 3) and incongruent (Fig. 2) tokens showed marked effects. For example, the ‘classic’ McGurk paradigm, using auditory /b/ with visual /g/ (token Bg in Fig. 2), generated fewer than 10% ‘b’ responses in both tested groups.

4.1. Unimodal processing

People with WS were impaired at identifying consonants in visual speech tokens, despite showing no significant differences from age-matched controls in auditory identification abilities. Nor was their pattern of visual speech identification simply developmentally delayed, since a comparison of young controls (<16 years) with older WS participants still showed significant differences in visual identification. While even the young controls could make use of visual information to distinguish /d/ and /g/, WS participants were poor at this, which contributed to their overall lower score. Auditory /v/ presented difficulties for the young controls, who often reported this as /th/. WS participants showed no apparent difficulty with this contrast; this may go some way to explaining why visual /v/ and /th/ showed a reduced influence on auditory /v/ and /th/ in bimodal report by WS compared with controls (see Fig. 3).


4.2. Integration

Since WS were poorer at visual discrimination, vision should have a reduced effect on bimodal report, and so it proved. Nevertheless, there was extensive integration of vision and audition in WS (see Figs. 3 and 4), and young controls and older WS did not differ in this regard (Fig. 5, bottom panel). These data fail to offer support for the possibility that audiovisual integration at the level of speech segment identification is defective in WS. The question cannot be closed, however. A strong test of the ‘failure to integrate’ hypothesis would match WS with other controls on the basis of similar visual identification accuracy, and explore the outcome on a case-by-case basis (see Massaro [27], Chapter 5, for the rationale for individual psychometric analysis). Exactly what would constitute proof of ‘failure to integrate’ is not easy to assess. While Massaro [26,27] claims that bimodal performance is essentially predictable on the basis of unimodal performance, there are a number of cases that buck the trend. De Gelder et al. [12] report a neuropsychological case of visual agnosia with prosopagnosia who was able to discriminate visual speech tokens, but whose audiovisual performance was not reliably affected by vision, and a somewhat similar pattern in people with autism [13]. In the neuropsychological case, while online visual discrimination of general speech-class (bilabial versus ‘other’) could be performed with effort on the patient’s part, the identification (i.e. post-display report) of visual speech tokens was unreliable. Attentional processes may well modify behaviour in these cases. With effort, visual discrimination may be possible within a specific experimental task, but it may then make use of unusual processing mechanisms and not be recruited automatically in the bimodal condition.

At all events, while an integration deficit in WS may

be proposed for higher-level processes such as lexical andsemantic-syntactic processing in online speech analysis,or reliance on lexical knowledge in short term memory[17,36,37], there is no clear indication from these data thatlow level cross-modal processing is defective in WS.

4.3. Accounting for the visual deficit in speechreading in WS

People with WS are often observed to be socially sensitive and great ‘lookers at faces’ [23,33]. Nevertheless, Deruelle et al. [14] found impairments in a range of face processing tasks in WS when compared to age-matched controls, possibly reflecting configural deficits in WS [30,31]. The present study comes to a similar conclusion. Natural visual speech is identified abnormally in WS. This contrasts with Deruelle et al.’s claim that lip-reading is unimpaired in WS, which was based on a single task of matching static mouthshape. We have pointed out that the identification of natural speech segments is sensitive to information from other parts of the face, is orientation-sensitive, and requires the (separate) computation of face movement. Both these factors may have a role to play in the outcome of this study.

Small differences in facial configuration relating mouthshape to the position of tongue, teeth and lips may be critical for correctly identifying the consonants used in this experiment. In support of this, within the WS group the only significant correlation of any psychometric variable with visual speechreading (partialled for auditory performance) was with non-verbal tests of ability (the BAS non-verbal test and BAS non-verbal pattern construction subtest): /v/ scores correlated significantly with these test-scores (Spearman’s rho = 0.67, P < 0.02). The relation between visual movement processing and WS was not explored experimentally in this study, but a feasible model for the deficit in speechreading would implicate a developmental deficit in movement processing and also some problems in the normal development of face processing. In the final section of this paper we draw attention to a cortical area that may function suboptimally in WS.
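A partial rank correlation of the kind reported above can be sketched as follows. This is an illustration only, not the authors' analysis code, and the data in the usage example are hypothetical: rank-transform all three variables, regress the covariate's ranks out of each variable of interest, then correlate the residuals.

```python
import numpy as np

def partial_spearman(x, y, covariate):
    """Spearman rho between x and y with the covariate partialled out.

    Rank-transform all three variables (untied data assumed), regress
    the covariate's ranks out of x's and y's ranks by least squares,
    then take the Pearson correlation of the two residual vectors.
    """
    def ranks(a):
        order = np.argsort(a)
        r = np.empty(len(a))
        r[order] = np.arange(1, len(a) + 1)
        return r

    rx = ranks(np.asarray(x, dtype=float))
    ry = ranks(np.asarray(y, dtype=float))
    rz = ranks(np.asarray(covariate, dtype=float))

    def residuals(a, b):
        # residuals of the least-squares regression of a on b
        slope, intercept = np.polyfit(b, a, 1)
        return a - (slope * b + intercept)

    ex, ey = residuals(rx, rz), residuals(ry, rz)
    return float(ex @ ey / np.sqrt((ex @ ex) * (ey @ ey)))

# Hypothetical scores: visual /v/ identification, BAS non-verbal score,
# and auditory identification as the covariate to be partialled out.
visual_v = [3, 5, 2, 8, 6, 7, 1, 4]
bas_nonverbal = [40, 55, 35, 80, 60, 75, 30, 50]
auditory = [7, 3, 5, 6, 2, 8, 4, 1]
rho = partial_spearman(visual_v, bas_nonverbal, auditory)
```

Because only ranks enter the computation, the measure is insensitive to monotonic rescaling of the raw scores, which is why it suits psychometric variables measured on different scales.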

4.4. Neurophysiological speculations

Atkinson et al. [5] have demonstrated a deficit in cortical dorsal stream function in WS. 6 Within the lateral temporal lobe, the superior temporal sulcus (STS) has afferent connections from both the dorsal and ventral streams [1,18]. Processing in STS is motion-sensitive, suggesting the direct projection of magnocellular inputs [1], and is specifically implicated in the perception of biological motion [19]. STS appears to be specialised for the perception of meaningful facial movements [2] and is implicated in the processing of dynamic facial information, which is important for skills such as speechreading and detecting direction of eye gaze [7,32]. Although the inferior occipito-temporal regions are the main cortical areas implicated in face detection and processing, some brain-imaging studies also reveal activation of the STS for face-photograph identification tasks [18]. STS has back-projections to inferotemporal face processing regions, including those parts of the fusiform gyrus that support the processing of face images [21]. STS thus may be involved in moderating the development of these regions and their functions. There are some indications that the fusiform gyrus functions abnormally in face processing in WS [23]. STS is also involved in the attentionally modulated processing of hierarchically organised stimuli [25]. In such tasks, individuals with WS fail to show the typical bias for global processing.

Having highlighted the role of STS in relation to a number of tasks which are anomalous in WS, it should also be noted that there are several aspects of STS function that appear unaffected. STS supports the detection of direction of eye gaze [32], yet children with WS can be sensitive to direction of eye gaze and can use it to infer intention [33]. STS is strongly and specifically implicated in the bimodal integration of audiovisual speech [8], which appeared functional in our study. One possibility is that there are separable functional subsystems within STS. Alternatively, some of the functions that appear unimpaired in WS may be supported by compensatory systems.

The general point remains that, since dorsal and ventral systems develop at different rates [4], anomalous development of either of these systems must affect their integration in perceptual function, and thus affect the development of localised systems and subsystems equipped to perform a range of visual tasks. The delineation of these systems in WS and other developmental disorders is a future challenge for cognitive neurodevelopment.

6 The fact that this deficit may not be specific to WS is not critical to our argument. We do not know whether other developmentally anomalous groups than people with WS may show unusual patterns of visual speechreading.

Acknowledgements

This work was supported by a scholarship to MB from the German Academic Exchange Service (DAAD) and by MRC Programme Grant no. G9715642 to A. Karmiloff-Smith. It is based on a thesis completed as part of the first author's M.Sc. degree requirement (University College London). The study was approved by the ethical committees of UCLH and local health service trusts. We thank Julia Grant for help with selection of appropriate clinical participants, and the HGSS-Youth Club and the WS Foundation for their help in recruiting participants. Gemma Calvert and David Harwood are thanked for their help in the development of the audiovisual test material. R. Campbell acknowledges the support of the Belle van Zuylen Foundation (University of Utrecht) in preparing this paper.

References

[1] Ahlfors SP, Simpson GV, Dale AM, Belliveau JW, Liu AK, Korvenoja A, et al. Spatiotemporal activity of a cortical network for processing visual motion revealed by MEG and fMRI. Journal of Neurophysiology 1999;82:2545–55.

[2] Allison T, Puce A, McCarthy G. The neurobiology of social cognition. Trends in Cognitive Sciences 2000;4:267–78.

[3] Arnold R, Yule W, Martin N. The psychological characteristics of infantile hypercalcaemia: a preliminary investigation. Developmental Medicine and Child Neurology 1985;27:49–59.

[4] Atkinson J. Early visual development: differential functioning of parvocellular and magnocellular pathways. Eye 1992;6:129–35.

[5] Atkinson J, King J, Braddick O, Nokes L, Anker S, Braddick, et al. A specific deficit of dorsal stream function in Williams syndrome. NeuroReport 1997;8:1919–22.

[6] Bellugi U, Lichtenberger L, Mills D, Galaburda A, Korenberg J. Bridging cognition, brain and molecular genetics: evidence from Williams syndrome. Trends in Neurosciences 1999;22:197–207.

[7] Calvert GA, Bullmore ET, Brammer MJ, Campbell R, Williams SC, McGuire PK, et al. Activation of auditory cortex during silent lipreading. Science 1997;276:593–6.

[8] Calvert GA, Campbell R, Brammer MJ. Evidence from functional magnetic resonance imaging of cross-modal binding in the human heteromodal cortex. Current Biology 2000;10:649–57.

[9] Campbell R, Garwood J, Franklin S, Howard D, Landis T, Regard M. Neuropsychological studies of auditory-visual fusion illusions. Four case studies and their implications. Neuropsychologia 1990;28:787–802.

[10] Campbell R, Zihl J, Massaro DW, Munhall K, Cohen MM. Speechreading in the akinetopsic patient LM. Brain 1997;121:1794–803.

[11] De Gelder B, Vroomen J. Impaired speech perception in poor readers: evidence from hearing and speechreading. Brain and Language 1998;64:269–81.

[12] De Gelder B, Vroomen J, Bachoud-Levi A. Impaired speechreading and audiovisual speech integration in prosopagnosia. In: Campbell R, Dodd BJ, Burnham D, editors. Hearing by Eye II. Hove: Psychology Press, 1998.

[13] De Gelder B, Vroomen J, van der Heide L. Face recognition and lip-reading in autism. European Journal of Cognitive Psychology 1991;3(1):69–86.

[14] Deruelle C, Mancini J, Livet MO, Cassé-Perot C, de Schonen S. Configural and local processing of faces in children with Williams syndrome. Brain and Cognition 1999;41:276–98.

[15] Dunn LM, Whetton C, Burley J. The British Picture Vocabulary Scale. NFER-Nelson Publishing Company Ltd., 1997.

[16] Elliott CD, Smith P, McCulloch K. British Ability Scales. NFER-Nelson Publishing Company Ltd., 1997.

[17] Grant J, Karmiloff-Smith A, Gathercole SA, Paterson S, Howlin P, Davies M, et al. Phonological short-term memory and its relationship to language in Williams syndrome. Cognitive Neuropsychology 1997;15:81–99.

[18] Haxby J, Hoffman EA, Gobbini MI. The distributed human neural system for face perception. Trends in Cognitive Sciences 2000;4:223–32.

[19] Howard R, Brammer M, Wright I, Woodruff PWR, Bullmore ET, Zeki S, et al. A direct demonstration of functional specialisation within motion-related visual and auditory cortex of the human brain. Current Biology 1996;6(8):1015–9.

[20] Jordan TR, Bevan KM. Seeing and hearing rotated faces: influences of facial orientation on visual and audiovisual speech recognition. Journal of Experimental Psychology: Human Perception and Performance 1997;23:388–403.

[21] Kanwisher N, McDermott J, Chun MM. The fusiform face area: a module in human extrastriate cortex specialised for face perception. Journal of Neuroscience 1997;17:4302–11.

[22] Karmiloff-Smith A, Tyler L, Voice K, Sims K, Udwin O, Howlin P, et al. Linguistic dissociations in Williams syndrome: evaluating receptive syntax in on-line and off-line tasks. Neuropsychologia 1998;36:343–51.

[23] Karmiloff-Smith A, Klima E, Bellugi U, Grant J, Baron-Cohen S. Is there a social module? Language, face processing, and theory of mind in individuals with Williams syndrome. Journal of Cognitive Neuroscience 1995;7:196–208.

[24] Klein AJ, Armstrong BL, Greer MK, Brown FR. Hyperacusis and otitis media in individuals with Williams syndrome. Journal of Speech and Hearing Disorders 1990;55:339–44.

[25] Lamb MR, Robertson LC, Knight RT. Component mechanisms underlying the processing of hierarchically organised patterns: inferences from patients with unilateral cortical lesions. Journal of Experimental Psychology: Learning, Memory, and Cognition 1990;16:471–83.

[26] Massaro DW. Speech perception by ear and eye: a paradigm for psychological enquiry. Hillsdale, NJ: Lawrence Erlbaum, 1987.

[27] Massaro DW. Perceiving talking faces: from speech perception to a behavioural principle. Cambridge, MA: MIT Press, 1998.

[28] McGurk H, MacDonald JW. Hearing lips and seeing voices. Nature 1976;264:746–8.

[29] Mervis CB, Morris CA, Bertrand J, Robinson B. Williams syndrome: findings from an integrated program of research. In: Tager-Flusberg H, editor. Neurodevelopmental disorders. Cambridge, MA: MIT Press, 1999. p. 65–110.


[30] Mervis CB, Robinson BF, Pani JR. Visuo-spatial construction. American Journal of Human Genetics 1999;65:1222–9.

[31] Pani JR, Mervis CB, Robinson BF. Global spatial organisation by individuals with Williams syndrome. Psychological Science 1999;10:453–8.

[32] Puce A, Allison T, Bentin S, Gore JC, McCarthy G. Temporal cortex activation in humans viewing eye and mouth movements. Journal of Neuroscience 1998;18:2188–99.

[33] Reilly J, Klima ES, Bellugi U. Once more with feeling: affect and language in atypical populations. Development and Psychopathology 1990;2:367–91.

[34] Rosenblum LD, Johnson JA, Saldaña HM. Point-light facial displays enhance comprehension of speech in noise. Journal of Speech and Hearing Research 1996;39:1159–70.

[35] Rosenblum LD, Yakel DA, Green K. Face and mouth inversion effects on visual and audiovisual speech perception. Journal of Experimental Psychology: Human Perception and Performance 2000;26:806–19.

[36] Tyler LK, Karmiloff-Smith A, Voice KL, Stevens T, Grant J, Udwin O, et al. Do individuals with Williams syndrome have bizarre semantics? Evidence for lexical organisation using an on-line task. Cortex 1997;33:515–27.

[37] Vicari S, Carlesimo G, Brizzolara D, Pezzini G. Short-term memory in children with Williams syndrome: a reduced contribution of lexical-semantic knowledge to word span. Neuropsychologia 1996;34(9):919–25.

[38] Williams JCP, Barratt-Boyes BG, Lowe JB. Supravalvular aortic stenosis. Circulation 1961;24:1311–8.