

Available online at www.sciencedirect.com

System 36 (2008) 35–51

www.elsevier.com/locate/system


Revising segmentation hypotheses in first and second language listening

John Field

Department of Applied Linguistics, University of Reading, Whiteknights, PO Box 218, Reading, RG6 6AA, UK

Received 15 June 2007; received in revised form 2 September 2007; accepted 31 October 2007

Abstract

Any on-line processing that takes place while an utterance is unfolding is extremely tentative, with early-formed hypotheses having to be revised as the utterance proceeds. The hypotheses in question relate not only to the words that are present but also to where their boundaries fall. This study examines how first and second language listeners adjust their segmentation assumptions as new perceptual evidence comes in. It employs a variant of the gating task in which subjects transcribe a short utterance presented in sections of gradually increasing length. The first two presentations were phonetically ambiguous and could be segmented in any one of three ways. The third and fourth presentations provided disambiguating input; subjects’ responses were examined at these points to see how quickly they switched from a wrong interpretation to one that fitted the perceptual evidence. The results indicated a significant difference in the way in which first and second language listeners deal with incorrect segmentation hypotheses. Whereas native listeners are quick to change their interpretations on the basis of incoming evidence, non-native listeners are considerably more reluctant to do so.

© 2008 Published by Elsevier Ltd.

Keywords: Perseveration; Gating; Second language listening; Spoken word recognition; Auditory processing; On-line processing; Lexical segmentation

0346-251X/$ - see front matter © 2008 Published by Elsevier Ltd.

doi:10.1016/j.system.2007.10.003

E-mail address: [email protected]

36 J. Field / System 36 (2008) 35–51

1. Speech decoding as a tentative process

1.1. Decoding and the L1 listener

One way of viewing second language listening is as a form of expertise which the native listener possesses and the learner aims to acquire. The value of this perspective is that it provides us with a benchmark in the form of a model of expert listening against which the performance of learners can be measured. It also enables us to draw upon empirical evidence obtained by cognitive psychologists, phoneticians and others in order to identify distinct processes which can form the basis for listening practice (Field, in press).

But when we examine first language research, it becomes clear that it offers us two apparently contradictory accounts of how words, phrases and clauses are recognised. One view, widely held by psycholinguists, is that listeners do not wait until the end of a major syntactic constituent before attempting to assemble the words that a speaker has uttered. Instead, they analyse the speaker’s utterances while they are still unfolding. Early studies by Marslen-Wilson (1973, 1975) suggest that listeners are capable of accurately repeating (or ‘shadowing’) what a speaker says at a delay of only about 200–250 milliseconds behind the speaker’s voice. What is more, the shadowing process appears to entail matching at word level and not just the parroting of sounds, since incorrectly pronounced words are corrected in the process of reporting them. The delay is about the length of a syllable in English, suggesting that the syllable may be an important unit of analysis for the listener.

It is reasonable to express a few reservations about the original findings. The speakers were talking relatively slowly and the material was read-aloud rather than natural connected speech. In addition, just because listeners are capable of shadowing speakers in this way, it does not mean that they actually do so in practice when processing a piece of extended discourse. The demands upon attention may simply be too heavy. Nevertheless, the assumption that listening takes place ‘on line’ has become widely accepted, and is a tenet of many psycholinguistic models. It has also given rise to Marslen-Wilson’s own Cohort Theory (1987) in which the opening sounds of an utterance activate a range of possible word matches, which is gradually narrowed down as the utterance continues and more perceptual and contextual evidence becomes available.
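The narrowing of a cohort can be pictured with a minimal sketch. It is an illustration under stated assumptions rather than an implementation of Cohort Theory: the toy lexicon and the function names are hypothetical, ordinary spellings stand in for phoneme strings, and only perceptual (not contextual) evidence is modelled.

```python
from typing import Iterable, List

# Hypothetical toy lexicon; spellings stand in for phoneme strings.
TOY_LEXICON = ["hospital", "hospitality", "hospice", "hostel", "horse", "house"]


def cohort_after(heard: str, lexicon: Iterable[str]) -> List[str]:
    """Return the words that are still compatible with the input heard so far."""
    return [word for word in lexicon if word.startswith(heard)]


def narrowing_trace(utterance: str, lexicon: Iterable[str]) -> List[List[str]]:
    """Return the surviving cohort after each successive segment of the utterance."""
    words = list(lexicon)
    return [cohort_after(utterance[:i], words) for i in range(1, len(utterance) + 1)]


if __name__ == "__main__":
    target = "hospital"
    for i, cohort in enumerate(narrowing_trace(target, TOY_LEXICON), start=1):
        print(target[:i], "->", cohort)
```

In this toy lexicon two candidates are still alive when the last segment of the target has been heard, so the match has to remain provisional until later input arrives.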

A rather different picture is painted by the evidence of researchers who have examined the intelligibility of small groups of sounds within stretches of connected speech. In another early landmark study, Pickett and Pollack (1963) found that only 55% of words excised from a piece of connected speech could be recognised without accompanying phonetic context. We can find an obvious explanation for this phenomenon in the accounts provided by phoneticians (e.g. Cruttenden, 1986), who draw attention to the reductive effect of intonation. Intonation groups give prominence to one focally stressed syllable, at the expense of weakening the duration, loudness and/or vowel quality of those that surround it. It is thus unsurprising if it is difficult for listeners, even L1 listeners, to match excised syllables or pairs of syllables to known words. The word is, in effect, subsidiary to the group in which it occurs. There are also, of course, factors such as assimilation, resyllabification and elision, which cause the word to vary from its citation form in ways that are heavily dependent upon the context in which the word occurs (Brown, 1990).

How, then, is one to reconcile these two accounts: one suggesting that listeners process spoken input as it is heard, the other indicating that they are incapable of matching short extracts from connected speech to words? The only way of doing so is to assume that, even at the level of decoding, listening is a highly tentative process, with the listener constantly forming and revising hypotheses as the evidence accumulates. If a listener does indeed form a set of possible word matches on the strength of an initial syllable, then those matches must be provisional and subject to revision as time goes on. Current theory would represent the process as a type of competition, in which candidate words receive various degrees of activation to the extent that they are supported by perceptual evidence (Luce and McLennan, 2005, pp. 595–597). There finally arrives a point at which the activation of one so outstrips that of the others that it emerges as the best match and the word can be said to have been identified.

We have discussed this process so far in relation to individual words but, as Broersma & Cutler point out elsewhere in this issue, we need to extend the notion of competition across word boundaries. Just as hospital and hospitality might both be activated by the opening sequence [hɑspɪt], so we also have to accept that a sister and assist her might be competing candidates for a listener who encounters the sequence [əsɪstə]. The speech signal does not, like most written languages, have consistent gaps between words; it is the listener who has to determine where one word ends and the next begins (see also al Jaffer, this issue).
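Competition across word boundaries can be made concrete with a small sketch that lists every way of dividing an unbroken sequence into known words. Everything in it is an assumption made for illustration: the three-entry lexicon, the SAMPA-style ASCII transcriptions (`@` for schwa, `I` for the vowel of sit), and the treatment of both a and her as a bare schwa.

```python
from typing import Dict, List

# Hypothetical toy lexicon: simplified SAMPA-style forms mapped to spellings.
TOY_LEXICON: Dict[str, List[str]] = {
    "@": ["a", "her"],      # weak forms, both reduced to schwa for illustration
    "sIst@": ["sister"],
    "@sIst": ["assist"],
}


def segmentations(signal: str, lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Return every exhaustive division of `signal` into words from `lexicon`."""
    if not signal:
        return [[]]  # one way to segment nothing: the empty parse
    parses: List[List[str]] = []
    for form, spellings in lexicon.items():
        if signal.startswith(form):
            # Segment the remainder, then prefix each spelling of this form.
            for rest in segmentations(signal[len(form):], lexicon):
                for spelling in spellings:
                    parses.append([spelling] + rest)
    return parses


if __name__ == "__main__":
    for parse in segmentations("@sIst@", TOY_LEXICON):
        print(" ".join(parse))
```

Even with three lexical forms, the six-segment input supports several complete parses, which is why the listener cannot settle word boundaries on the strength of the signal alone.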

The process of lexical segmentation appears uncomplicated if one assumes that listening takes place in a linear fashion (Cole and Jakimik, 1980): i.e. that, once we have identified the first word in an utterance, we can locate the point at which that word ends and thus the point at which the next one begins. However, the lexicons of most languages do not permit this. Firstly, smaller words are often embedded in larger ones, making it difficult to tell when a word is complete. For example, hearing the word manager, we might assume when we came to the end of the first syllable (man) or of the second (manner) that we had identified the first word of the utterance and that another was about to begin.¹ The listener also has to deal with sequences that can be divided in different ways, like the a sister/assist her example just given. The sentence The captain’s in cabin sixteen might contain all of the following words besides the target ones: cap, capped, tin, in, zinc, cab, sick, six. The uncertainty may sometimes extend over several words: a listener who has decoded a piece of input as the waiter cut it . . . would have to suddenly backtrack and substitute the way to cut it if the sentence proved to end with the words . . . is like this.

¹ Examining a corpus of 20,000 English words, Luce (1986) calculated that, with frequency weighting, only 39% of words in normal speech are unique before their offsets and only another 23% at offset.

All of this reinforces the conclusion that decoding speech is a highly tentative process, with decisions between candidates still needing to be made at the level of the clause. It is not just word-level recognition that is subject to negotiation; but also the segmentation choices that learners make. An early assumption in favour of a particular word boundary distribution may be called into question as more perceptual evidence becomes available. It is this process of checking and revising initial segmentation preferences that the present study proposes to investigate.

1.2. Decoding and the L2 listener

If the outcomes of on-line decoding are tentative for first language listeners, how much more so must they be for those listening to a foreign language. Alongside the difficulties caused by the way in which the speech signal unfolds in time, L2 listeners, until quite a high level of proficiency, are very conscious of their limited expertise. Three factors in particular give rise to a lack of confidence in what they extract from the speech signal:

• Their incomplete vocabulary repertoire against which to match fragments of speech. Note that the issue here is not so much how many words a listener knows as how readily the listener can identify known words when they occur in connected speech;

• Their more limited experience of listening to the language: with consequences for their phoneme values and their ability to apply appropriate lexical segmentation strategies;

• Possible gaps in the co-text with which the listener has to work, where words have not been decoded or have been decoded inaccurately. The listener may not be able to fall back upon co-text to support uncertain word recognition in the way a native listener can.

Their confidence in any set of word candidates that is activated can be assumed to be much lower than that of a native listener. The activation of all candidates can be expected to be weaker. In these circumstances, the listener either will be slower to narrow the set down to a single candidate; or alternatively will reduce the level of evidence that is required for recognition, at the risk of achieving an incorrect match.

It thus seems apparent that an L2 listener handles the forming and checking of word recognition and segmentation hypotheses rather differently from the way a native listener goes about it. However, the difference might run in one of two directions:

• An L2 listener might focus narrowly upon the acoustic cues provided by the signal, and thus be quick to abandon any preferred working hypothesis once the evolving input provided counter-indications. This would accord with a received view of unskilled L2 listening as heavily dependent upon perceptual processing, to the possible exclusion of building larger-scale meaning.

• An L2 listener might be reluctant to abandon a preferred hypothesis once it has been favoured. This would indicate that the L2 listener relies heavily upon first impressions, even at the risk of achieving a decoding outcome that may not fit the evidence.

In the first case, the L2 listener might abandon a hypothesis as fast as a native listener – or even faster, given the degree of uncertainty that attended the hypothesis from the start. In the second case, the L2 listener might prove slower than a native one to abandon a hypothesis.

With this in mind, we look specifically in the present study at how second language listeners handle lexical segmentation preferences that are not substantiated by subsequent input. The aim is to make a comparison between their reaction at the point where their working hypothesis is overruled and the reactions of native listeners under similar circumstances. The research questions are thus: How flexible are native listeners in redistributing word boundaries once disambiguating information becomes available? Are non-native listeners equally flexible? An appropriate way of investigating them is afforded by the ‘gating’ method (Grosjean, 1980, 1985), which provides insights into the way in which listeners’ interpretations of spoken input develop over time and as the available perceptual information increases.

2. The relevance of ‘gating’

2.1. The gating paradigm

In gating, the researcher divides a piece of speech into short sections (‘gates’) which are presented to the listener incrementally. To take a simple example, the sentence I’ve never been here before might be divided into syllable-sized sections and presented to subjects in chunks of gradually increasing size, as shown below:

Gate 1: I

Gate 2: I’ve

Gate 3: I’ve nev

Gate 4: I’ve never

Gate 5: I’ve never been

Gate 6: I’ve never been here

Gate 7: I’ve never been here be

Gate 8: I’ve never been here before

At each gate, the listener attempts to record an impression of what the speaker has said so far. In this way, we can track the way in which hypotheses are formed, checked and rejected as more and more of the sentence becomes available.

The syllable-by-syllable example above is simply for illustration. Researchers have varied in deciding how to locate the gates, but most have chosen to divide the spoken input into time slices, usually of 50 ms. The first gate thus represents the first 50 ms of the utterance, the second the first 100 ms, and so on. Occasionally (Bard et al., 1988), researchers have presented material on a word-by-word basis; but this procedure is seriously open to question as it spares the listener from having to make the important lexical segmentation decisions that connected speech would normally demand. Insights into such decisions are among the more useful outcomes of the method.
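Both ways of locating gates can be sketched in a few lines. This is a hedged illustration, not a description of any published stimulus-preparation procedure: the function names are hypothetical, the syllable list is typed in by hand, and the time-based version only computes the end point of each gate rather than cutting audio.

```python
from typing import List, Sequence


def cumulative_gates(units: Sequence[str]) -> List[str]:
    """Gate n contains the first n units (e.g. syllables) of the utterance."""
    return ["".join(units[:n]) for n in range(1, len(units) + 1)]


def time_gate_ends(duration_ms: int, step_ms: int = 50) -> List[int]:
    """End points, in ms from onset, of gates that grow by a fixed time slice."""
    ends = list(range(step_ms, duration_ms, step_ms))
    ends.append(duration_ms)  # the final gate always presents the whole utterance
    return ends


if __name__ == "__main__":
    # Hand-divided syllables; leading spaces mark word boundaries in the spelling.
    syllables = ["I", "'ve", " nev", "er", " been", " here", " be", "fore"]
    for number, gate in enumerate(cumulative_gates(syllables), start=1):
        print(f"Gate {number}: {gate}")
    print(time_gate_ends(230))
```

Each gate restarts from the onset of the utterance, which is why listeners hear the early portions several times over.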

There are certain limitations to the gating method, fully acknowledged by Grosjean himself (1996). Firstly, it does not replicate the conditions of normal listening in that the listener hears the earlier parts of the utterance multiple times. Despite this, Tyler and Wessels (1985) and Cotton and Grosjean (1984) reported no task effects that were attributable to this aspect of the procedure. Furthermore, if a listener does not correctly identify a word after hearing it three or four times, it strengthens the evidence that on-line identification of the word is difficult. Indeed, this is one of the most striking findings that the method has produced. Using gating, Grosjean (1985) and Bard et al. (1988) demonstrated very effectively the extent to which a model of word recognition has to allow for retroactive processing. On the strength of his data, Grosjean (1985, p. 309) argues that ‘. . .word recognition is not a word-by-word, left-to-right process. Rather, . . . the process is very much a feed-forward, feed-back operation, where there are constant adjustments being made to early and/or partial analyses . . .’

Secondly, gating requires a forced decision on the part of the subject as to what has been heard. In this respect, its data is post-perceptual: it cannot be said to tap in to the kind of on-line processing that takes place when the signal first meets the listener’s ear. However, particularly if one is investigating second language listening, it is important not to ignore the importance of the final interpretation that a listener takes away. It can be assumed that, in the case of many listeners, the processes that are applied in analysing L2 input are inchoate and heavily influenced by long-established L1 routines. It is of interest to establish not just how listeners handle the signal but also what preferences they opt for in relation to the possibly conflicting cues with which they are confronted.

A third concern is that the method requires an all-or-nothing response. This aspect of it has been modified by some researchers (e.g. Grosjean, 1985), who have asked listeners to indicate their degree of confidence in the interpretation they provided. Nevertheless, gating might appear to be inconsistent with the activation models of speech processing that are widely accepted today, and which assume that a listener carries multiple competing candidates in his/her mind until any perceptual ambiguity is resolved (McQueen, 2004). In fact, there need not be any such conflict. The choices made during gating (including choices of word segmentation) can be treated as indicative of which candidate is, at any given moment, the most highly activated on the basis of the evidence available. With this shift of emphasis, one can still track the rise and fall of hypotheses. It simply has to be assumed that when a hypothesis is abandoned it is because it has received inhibitory activation from evidence in the input.

Researchers continue to make use of the gating paradigm. It has been used (Lindfield et al., 1999; Arciuli and Cupples, 2003) to investigate what minimal evidence a listener needs before word recognition is achieved. It has also been used in clinical contexts to investigate the effects of ageing (Stine and Wingfield, 1994), aphasia (Tyler, 1992) and specific language impairment (Montgomery, 1999).

Whatever reservations may have been expressed, the great strength of the method lies in the richness of the data it provides. It tracks the way in which, in on-line processing, listeners have to deal with incomplete fragments of speech. It provides insights into the types of preference that listeners exercise on the basis of this partial evidence and the factors that might influence them in forming preferences. It indicates whether listeners are guided in locating word boundaries by strategies that are characteristic of their first language or by those of the language that they are acquiring (Field, 2001). And very importantly from the point of view of the present research questions, it provides evidence of how quickly listeners react when a hypothesis is discredited. How confidently and how promptly do they revise their first assumptions?

2.2. Gating and the L2 listener

The gating method is, of course, dependent upon the ability of subjects to represent the sounds they have heard in the form either of an item of vocabulary or of an approximation that employs the GPC (grapheme–phoneme correspondence) rules of the target language. This might appear to rule out the use of the method with non-native listeners. However, Shockey (2003) demonstrates convincingly that gating can indeed be used to achieve insights into the processing of L2 listeners once they achieve a basic level of competence in their listening and writing skills. A pilot trial by the present author as a prelude to the present study also indicated that reliable results can be derived from the method, providing that due allowance is made for erratic spelling and that any ambivalent spellings are excluded from the data.

Shockey (2003, pp. 120–124) undertook three pilot studies of L2 auditory processing using the gating method. They all featured natural speech; and the subjects were quite advanced listeners studying at university, either in Hong Kong or in the UK. Shockey’s data indicated exactly the kind of retroactive process that has been observed with native listeners (Grosjean, 1985), where initial impressions are revised as more of the intonation group is heard. On the basis of these and earlier results with native listeners, Shockey (pp. 103–104), like Grosjean, expresses scepticism about our ability to detect words at all in the early moments of an utterance.² She also concludes that non-native listeners need a longer piece of input than natives in order to achieve recognition of individual words. The reason she proposes for this ‘processing lag’ is that (2003, p. 122) ‘They depend heavily on syntactico-semantic information to arrive at an understanding rather than using phonological context to disambiguate [reduced forms of words]’. Her conclusion thus runs counter to the idea that L2 listeners tend to focus excessively on the signal at the expense of wider meaning (though it should be noted that the subjects in question were advanced listeners). Her evidence also suggests (relevantly to the present study) that non-native listeners may be more likely than native ones to adhere to incorrect matches made early on in the intonation group, instead of adjusting their interpretation as more and more of the group is heard. However, Shockey’s concern is chiefly with word recognition rather than lexical segmentation.

² Though it is interesting to note that, with one of her more successful listeners, word identification became more accurate once 250 ms of the signal was available.

3. Research design

3.1. Method

The goal of the present enquiry was to establish how sensitively second language listeners respond to acoustic–phonetic evidence that disambiguates a previously ambiguous piece of speech – and, in particular, to evidence that requires the listener to redistribute word boundaries. It therefore made rather different use of the gating method from the studies described so far. It required subjects to transcribe short sequences which began with two syllables that were phonetically ambiguous. They were ambiguous not just in terms of the words they contained but also in terms of where the boundaries of those words fell. An example of this kind of two-syllable stem might be the sequence [draɪvə] which is potentially interpretable as driver, as drive a or as drive a- (the first syllable of a(way), a(long)). The ambiguity was only resolved for the listener when the following syllable was heard.

Part of the data that was obtained provided insights into the preferences that L1 and L2 listeners might manifest when presented with input that was open to more than one possible segmentation. However, we focus here solely on how subjects reacted once disambiguating input was provided (driver killed, drive a car, drive away) which overturned their original hypothesis.

3.2. Stimuli

In all, eight stems were employed. They shared the same strong–weak (SW) rhythmic pattern. Each permitted of three different segmentations: into a two-syllable word (driver), into two monosyllabic words (drive a) or into a monosyllabic word followed by the beginning of another word (drive a-). The three interpretations were controlled for word frequency by reference to the British National Corpus (Leech et al., 2001) to ensure that none was markedly less probable than the others. The items were also controlled for length. Table 1 lists the stems used in the study, together with the full phrases and clauses of which they formed part. Slashes indicate the point up to which the variants of the stem were deemed to be ambiguous.

Table 1
Ambiguous stems and complete utterances

Stem | SW | S + W | S + W–
1. [ˈi:tən] | Eaten/up | Eat an/egg | Eat en/ough
2. [ˈweɪtə] | The waiter/came | The way to/Cambridge | Weigh to/matoes
3. [ˈpækɪt] | Packet/of tea | Pack it/tightly | The pack it/self
4. [ˈʃɔtən] | Shorten/the coat | Short and/thin | Short an/nouncement
5. [ˈdraɪvə] | Driver/killed | Drive a/cab | Drive a/cross
6. [ˈselə] | Cele/brate | Sell a/brush | Sell a/broad
7. [ˈglɑ:sɪz] | The glasses/are broken | The glass is/broken | The glass es/caped damage
8. [ˈwi:kən] | Weaken/the drink | We can/try | We con/tinue

It was by no means easy to design three-way sequences of this kind, and two concessions had to be made. Firstly, the word the was included in two variants of Item 2 and one of Item 3, since its omission might have introduced a syntactic constraint against the formulaic sequences the way to and the . . . itself. Secondly, one variant from Item 7 (glass es/caped) was not entirely ambiguous.

The items were distributed into three sets (A, B and C). Each set consisted of one variant of each of the eight items, in the same order as above. The sets were then recorded at normal speed by a professional voice actress, a speaker of British RP. Sony DAT equipment was used. It was clearly vital to the experiment that the initial sequences should be perceptually very similar, if not identical, across conditions. The actress was asked to pronounce the words eaten and shorten and weaken with a final schwa + /n/ rather than a syllabic /n/; this variant is one that is increasingly common among younger users of British southern English. She was asked to use the forms [əˈnʌf] for enough and [ˈseləbreɪt] for celebrate.

The DAT recording was transferred to computer, and was edited using Sound Wave Studio, a sound editing program operating at 16 bits and 22 kHz. Each variant was divided into four gates. The gates were based on syllables rather than on timing:

Gate a: after the first S syllable, e.g. /ˈweɪ/

Gate b: after the first SW sequence, e.g. /ˈweɪtə/

Gate c: midway between gate b and the end, e.g. /ˈweɪtəˈmɑ:/

Gate d: whole item, e.g. /ˈweɪtəmɑ:təuz/

After gating, the stimulus available at each gate b was recorded separately and submitted to five native-listener judges, who had no specialised ear-training or knowledge of phonetics. The judges were supplied with a list of the 24 phrases used in the experiment and asked, where possible, to match what they heard to a phrase on the list. The judges expressed themselves unable to provide matches for most of the stimuli. However, for one, a correct match was identified by all five judges. This was the phrase the way to Cambridge, which proved to be distinctive because of vowel lengthening on way; it was therefore re-recorded and re-checked.

3.3. Procedure

The edited DAT recording was transferred from computer to audiocassette. The cassette was played to subjects in class sets, using high quality equipment designed for language teaching. The experiments took place in quiet classrooms of small size which were used for language lessons and were acoustically sound.

One set of gated items was played to each group of subjects. They were told that they would hear part of a word, phrase or sentence and that they were to listen carefully and write down what they thought they had heard at a. on the answer sheet. They would then hear a slightly longer part of the sentence which they should write down at b.; then a longer extract to be written down at c., and finally the whole sequence, which was to be transcribed at d. Subjects were asked not to alter any response once they had written it down – even if the final sequence proved to be different from what they had expected. The first item had longer pauses to ensure that the procedure was understood. At each gate, the experimenter called out a, b, c or d to indicate the slot on the answer sheet to be completed. Instructions to non-native groups were given in English; but the researcher checked to ensure that there had been complete understanding.

3.4. Subjects

Results are reported from three groups of subjects: native speakers of English, native speakers of French and a mixed-nationality group of non-native speakers of English.

The English native speakers were pupils aged between 14 and 18, attending secondary schools in the Cambridge area. Numbers for each of the three sets of items were as follows:

Set A: 15 sixth form pupils; 25 Year 10 pupils. Total: 40

Set B: 15 sixth form pupils; 22 Year 10 pupils. Total: 37

Set C: 15 sixth form pupils; 20 Year 10 pupils. Total: 35

The French native speakers were students attending classes at the Institut Britannique in Paris. They were mainly young adults in their mid-twenties, but a few were older. They were all at Intermediate or Upper Intermediate level. Numbers were as follows:

Set A: 14 subjects in 2 classes

Set B: 12 subjects in 2 classes

Set C: 13 subjects in 2 classes

The subjects in the French groups were asked to provide information about themselves; papers were excluded from subjects who had a language other than French as their first language. The period during which the subjects had studied English varied widely (from 1 to 10 years). However, they had only spent from 1 week to 3 months in total in English-speaking countries.

The experiment was also undertaken with a third group of subjects, non-native listeners with a range of first languages who were students of English at a private language school in Cambridge. These subjects were controlled for level of English: all were in classes labelled Intermediate or Upper Intermediate and had scores on the school’s entry test which indicated that they fell between levels 5 and 8 on the English Speaking Union criteria. Numbers were as follows:

Set A: 39 subjects in 3 classes

Set B: 27 subjects in 3 classes

Set C: 47 subjects in 3 classes

The most represented first languages featured were: Spanish (36), Portuguese (16), Italian (8), German (9), French (6), Korean (9), Japanese (8), Thai (5), Chinese (5).

Although these mixed NNL groups were not controlled for native language, results from them enable tentative conclusions to be drawn as to the extent to which the responses of the French subjects can be regarded as characteristic of those of non-native listeners in general. It was fortunate that, amongst this large population of NNL subjects, 36 were Spanish speakers distributed relatively evenly (N = 12, N = 11, N = 13) across the three sets. This enables separate results to be quoted for subjects whose native language is Spanish.

4. Results at gates c and d

The gating paradigm provides data both about the hypotheses formed by subjects as anutterance proceeds and about the ways in which those hypotheses are abandoned, revisedor adhered to as more information becomes available. In the experiment reported here, theresponses made at gates c and d afforded insights into the second process. Gate c partiallyor wholly resolved the word boundary ambiguity present at earlier gates; and at gate d theentire utterance became available.

Interest lay in establishing whether respondents adjusted their segmentation choice as new information became available at gates c and d. Were those who had chosen the wrong segmentation reluctant to abandon their initial hypothesis? And was there any difference in the speed with which native and non-native speakers adjusted word boundary distributions?

4.1. Achievement of the target segmentation

Responses for gates c and d were analysed to determine the number of subjects who had achieved the correct segmentation (i.e. the one indicated by the disambiguating input). Overall percentages were as shown in Table 2, with the Spanish sub-group included both in the NNL figure and separately.

Table 2. Subjects choosing target segmentation (mean percentage across items)

          NL (N = 112)   French (N = 39)   NNL (N = 111)   Spanish (N = 36)
Gate c    60.79          39.70             40.50           39.36
Gate d    91.32          64.43             58.66           56.03

A chi-square test on raw figures indicated a highly significant difference across groups at gate c for French, NL and NNL responses: χ²(2) = 88.23, p < 0.001. It was even more significant at gate d: χ²(2) = 260.66, p < 0.001. The results were also significant when Spanish responses were substituted for those of NNLs: χ²(2) = 62.08, p < 0.001 at gate c and χ²(2) = 225.58, p < 0.001 at gate d.

When the groups were paired, this effect proved to be entirely attributable to differences between, on the one hand, the NL group and, on the other, the French and NNL/Spanish ones. Thus, at gate c, there was a highly significant effect of group distinguishing NL and French subjects (z(1) = 6.52, p < 0.001) and NL and NNL subjects (z(1) = 8.68, p < 0.001), but there was none distinguishing French and NNL (z(1) = 1.18, n.s.). At gate d, the effect became even more marked for both NL/French (z(1) = 11.34, p < 0.001) and NL/NNL groups (z(1) = 15.98, p < 0.001); but it remained non-significant for French/NNL (z(1) = 1.81, n.s.).
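The text does not state how the pairwise z values were obtained. Purely as an illustration, the sketch below applies one common choice, a pooled two-proportion z-test, to a single pair of groups. The function name and the counts are hypothetical; they are not the raw figures of the study.

```python
import math

def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # Pooled proportion under the null hypothesis of no group difference.
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts of target segmentations out of all responses per group.
z, p = two_proportion_z(600, 1000, 400, 1000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A pooled standard error is used because the null hypothesis assumes that the two groups share a single underlying proportion.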

The accuracy with which an appropriate segmentation is adopted once disambiguation is available would thus appear to be a feature which sharply differentiates the routines of native listeners from those of non-native. The evidence of a heightened difference between NL and non-NL groups when the full utterance was available at gate d suggests a greater reluctance by non-native listeners to abandon a segmentation hypothesis once one has been formed and/or a slower response to new information.

4.2. Reaction to an incorrect segmentation

One of the most useful insights afforded by this experimental method lay in the evidence it provided of the way in which listeners deal with a segmentation hypothesis that proves to be unfounded.³ An analysis of the data suggested a number of different outcomes where there was a wrong choice of segmentation at gate b:

XT: Wrong segmentation X at b – target segmentation T achieved at gate c.
XPT: Wrong segmentation at b – same segmentation persists (P) at gate c – target achieved at gate d.
XPP: Wrong segmentation at b – same segmentation persists at gates c and d.
X0-: Wrong segmentation at b – no response at gate c – correct or incorrect choice of target at gate d.
XN-: Wrong segmentation at b – new (N) but wrong segmentation at gate c – correct or incorrect choice at gate d.

Responses for all items incorrectly segmented at gate b were classified according to the above categories. Two further minor categories were added: XP-, where persistence of a segmentation was followed by a zero answer or by a new but incorrect segmentation; and XTN, where a correct identification of the target segmentation was abandoned at gate d for an alternative one. Figures for the latter (which only occurred in mixed NNL results) were added to those for XT.
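The classification above can be read as a small decision procedure. The following sketch is one possible encoding of it and is not the procedure actually used in the study. The function name, the use of `None` for a blank answer and the segmentation labels are all assumptions made for illustration.

```python
def classify(gate_b, gate_c, gate_d, target):
    """Label the reaction to a wrong gate b segmentation.

    Each response is a segmentation label (a string) or None for no answer.
    """
    if gate_b == target:
        raise ValueError("only responses that were wrong at gate b are classified")
    if gate_c is None:
        return "X0-"   # hesitation at gate c, whatever happens at gate d
    if gate_c == target:
        # Target reached at gate c; XTN if it is then dropped for another choice.
        if gate_d is not None and gate_d != target:
            return "XTN"
        return "XT"
    if gate_c == gate_b:
        if gate_d == target:
            return "XPT"   # persistence at c, target at d
        if gate_d == gate_b:
            return "XPP"   # persistence at both c and d
        return "XP-"       # persistence at c, then blank or new wrong choice
    return "XN-"           # new but wrong segmentation at gate c

# Hypothetical labels standing in for three competing segmentations.
print(classify("seg1", "seg1", "seg2", target="seg2"))  # XPT
```

Because the text adds the XTN figures to those for XT, a tally built on this function would merge the two labels before reporting.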

³ The terms ‘unfounded’, ‘incorrect’ and ‘wrong’ are used loosely here. The premise was that the stimuli at gate b for the three different targets were indistinguishable. A choice of segmentation was thus only ‘incorrect’ to the extent that it had failed to predict the target one.

[Figure 1: bar chart of the percentage of incorrect segmentations (y-axis, 0–60) by reaction (x-axis: XT, XPT, XPP, XP-, XN-, X0) for the NL, F (French) and NNL groups.]

Fig. 1. Revision of incorrect hypothesis formed at gate b: six patterns.

The raw figures for each category were represented as a percentage of incorrect responses at gate b by each language group (NL: N = 503; French: N = 178; NNL: N = 532). Fig. 1 shows how subjects in each of the three groups reacted. It clearly indicates the greater degree to which NL subjects switched to the correct segmentational target. It also shows a consistency between responses by French and NNL subjects. French and NNL reactions over the categories displayed were found to correlate closely (r = 0.952, df = 5, p < 0.01) but to diverge from those of NL subjects (NL/French, r = 0.559, df = 5, n.s.; NL/NNL, r = 0.311, df = 5, n.s.). There was a particularly close correlation of 0.988 between results for the Spanish sub-group and those for the NNL group as a whole, suggesting that the NNL figures are indeed representative.

The different behaviour of native and non-native subjects becomes even more apparent when the data is recombined, as in Fig. 2, to reflect only four gate c reactions to an incorrect segmentation, namely Target (T), Persistence (P) of the gate b segmentation, New segmentation (N) and Hesitation (0, indicating no answer).

At gate c, non-native subjects were considerably more likely than NL subjects to persist with the segmentation which they had chosen at gate b, in spite of evidence to the contrary (French: 70.21%, NNL: 69.55%); though there is a marked reluctance even among NL subjects (55.66%) to abandon a particular segmentational hypothesis. The most striking difference is that, by gate d, NL subjects were indeed prepared to switch to a new word boundary distribution, with only 12.12% showing perseveration, whereas French (42.13%) and NNL subjects (50.75%) proved much more reluctant.

[Figure 2: bar chart of the percentage of incorrect segmentations (y-axis, 0–80) by reaction (x-axis: XT, XP-, XN-, X0) for the NL, F (French) and NNL groups.]

Fig. 2. Revision of incorrect hypothesis formed at gate b: four gate c responses.

In order to check the significance of these differences, figures for the six categories of response were combined to reflect three possible decisions at gate c: Change (XT plus XN) – Persist (XP) – Hesitate. The Hesitate set contained very few occurrences and was therefore omitted. Raw scores for the remaining two categories of response by three groups of subjects were then subjected to a chi-square test of association. The result, χ²(2) = 36.57, was significant at p < 0.001. When the data was broken down further, the effect was found to apply entirely between NL subjects and the other two groups. z(1) for NL and French subjects was 3.72, p < 0.001; for NL and NNL subjects, it was 5.70, p < 0.001. This compares with z(1) = 0.27 for French and NNL subjects. A marked difference between the segmentation processes of native and non-native listeners is thus indicated.
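The recombination and test described above can be sketched as follows. This is an illustration under stated assumptions rather than a reconstruction of the original analysis: the mapping of the six labels onto the three decisions follows the wording of the text, the function names are invented, and all counts are hypothetical.

```python
import math

# Assumed mapping of the six response categories onto three gate c decisions.
DECISION = {
    "XT": "Change", "XN-": "Change",
    "XPT": "Persist", "XPP": "Persist", "XP-": "Persist",
    "X0-": "Hesitate",
}

def collapse(category_counts):
    """Sum six-category counts into Change / Persist / Hesitate totals."""
    totals = {"Change": 0, "Persist": 0, "Hesitate": 0}
    for category, count in category_counts.items():
        totals[DECISION[category]] += count
    return totals

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts per group; NOT the raw scores of the study.
groups = {
    "NL": {"XT": 40, "XN-": 5, "XPT": 30, "XPP": 10, "XP-": 5, "X0-": 2},
    "French": {"XT": 10, "XN-": 3, "XPT": 15, "XPP": 18, "XP-": 2, "X0-": 1},
    "NNL": {"XT": 25, "XN-": 8, "XPT": 30, "XPP": 45, "XP-": 5, "X0-": 3},
}
collapsed = {name: collapse(counts) for name, counts in groups.items()}
# Rows are decisions, columns are groups; Hesitate is omitted, as in the text.
table = [[collapsed[g][d] for g in groups] for d in ("Change", "Persist")]
statistic, df = chi_square(table)
# For df = 2 the chi-square survival function reduces to exp(-x / 2).
p_value = math.exp(-statistic / 2)
print(f"chi-square({df}) = {statistic:.2f}, p = {p_value:.3g}")
```

The closed-form p-value holds only for two degrees of freedom, which is the case for a table of two decisions by three groups.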

5. Discussion

The experiment explored the hypothesis that different patterns of behaviour would characterise the way in which NLs and French subjects redistributed the lexical boundaries.

5.1. Native listeners

A first finding relates to the way in which native listeners responded once segmentational ambiguity was partly or wholly resolved. Unsurprisingly, they adjusted relatively quickly to evidence that the target segmentation was different from their original hypothesis. Of responses showing ‘incorrect’ boundary insertions at gate b, 38.37% switched to the appropriate segmentation when partial disambiguation occurred and a further 48.71% achieved the target segmentation when the whole utterance was available.

However, this flexibility is relative. Some responses manifested a reluctance to abandon an early segmentational hypothesis. Thus, 43.54% of those who had chosen an incorrect segmentation at gate b adhered to it in the face of indications at gate c (admittedly not always easy to interpret) that it might be inappropriate. Perseveration continued to mark 12.12% of these responses at gate d, when the whole utterance was available. This could be characterised as a task effect resulting from earlier presentations, and doubts might be raised as a consequence about the data obtained at gates c and d. But, if a task effect were indeed responsible, one would expect indications of a similar effect at gate b. There was no such evidence: the responses at gate b favoured disyllabic SW interpretations (driver), just as Cutler and Carter’s Metrical Segmentation Strategy (1987) would indicate, and were apparently uninfluenced by the fact that the S syllable had already been presented at gate a as a discrete monosyllabic word (drive). There is, then, some evidence, even among NLs, of reluctance to abandon word boundary hypotheses; but the effect is not nearly as marked as it is with non-native listeners.

5.2. Non-native listeners

There were similarities between native and non-native listeners in the segmentation preferences exhibited at gate b. The major difference identified between the groups lay in the ease with which a wrong segmentational hypothesis was abandoned. When native listeners make an incorrect segmentation, they seem able to resolve the problem relatively quickly; but non-natives seem far less flexible, tending to persist with their original word boundary allocation in spite of evidence that it is inappropriate. Here, French, mixed NNL and Spanish subjects all performed in the same way.

It might appear unremarkable that non-native listeners should be slower than native to adjust to a correct segmentation. The difference might reflect nothing more than their restricted familiarity with the language of acquisition. But the point should be made that this experiment did not tap into immediate on-line reactions but into segmentation decisions arrived at some seconds after the stimulus. Furthermore, the results recorded as ‘target not achieved’ do not reflect blank answers but incorrect ones. They show not a lack of any segmentation decision on the part of the mixed NNL and French subjects but an adherence, in the face of contrary evidence, to an inappropriate segmentation.

The observed difference between native and non-native subjects may indeed derive from a failure of recognition. It may be that NNL subjects were unaware that an alternative segmentation was possible because they had not succeeded in identifying the words which constituted it. The vocabulary in all the stimuli should have been within the range of learners of English at the stage represented by the subjects, but acquisition of a vocabulary item is not the same as the ability to recognise it when it occurs in running speech. For example, NNLs may have encountered the word abroad in isolation, but never in a connected sequence where they had to ascertain that the initial syllable was not the definite article or an -er suffix. At the very least, therefore, the results underline the gap which often exists between knowledge and recognition – a gap which is too often overlooked in classroom practice and in the design of ELT listening materials.

There are reasons for doubting whether the recognition account tells the whole story, however. It seems difficult to sustain in the face of subjects’ continuing failure to achieve correct segmentations at gate d. Even if recognition were slower for non-native listeners, one would expect it to have been achieved by this point, given that the whole context was available and that the subject had time to adjust to the partial disambiguation provided at gate c. As Table 2 shows, the differential between native and non-native subjects, far from narrowing at gate d, became wider.

An alternative interpretation is that the failure of around a third of French and NNL subjects to achieve the correct segmentation was, in part at least, the result of a perseveration effect, whereby an initial hypothesis, once formed, was adhered to in spite of evidence that it was wrong. This would seem to indicate an important difference between L1 and L2 listeners in terms of lexical processing: namely, that, once having formed a segmentational hypothesis, the latter are far more reluctant to abandon it. Having established a leading candidate in terms of word recognition or segmentation choice, they seem unwilling to revive the original competition process in the way that a native listener might. Certainly, if they do not do so under the easily-paced conditions of the gating paradigm, then it seems highly unlikely that they would under the time constraints of normal everyday listening.

One reason may be that a lack of confidence in their decoding skills makes L2 listeners cautious about using incoming perceptual information to overrule interpretations already established. The view that ‘a bird in the hand is worth two in the bush’ seems to prevail. In normal listening conditions, dropping a provisional segmentation might entail having to start all over again part way through listening.

Alternatively, one might explain this behaviour in terms of the greater cognitive demands that are imposed when one processes a language that is only partly familiar. They may limit the listener’s ability to carry forward a set of partially activated competitors once a lexical or segmentation decision has been made; or there may simply be insufficient spare attentional capacity to permit the listener to revisit the competition process.

6. Some implications

Failures of understanding by second language listeners can often be attributed not to general comprehension processes, but to an ill-matched word or inappropriate segmentation. There is nothing wrong with incorporating the most likely word or segmentation candidate into a meaning representation; but learners appear insufficiently sensitive to the danger that a wrong choice will distort later understanding. Teachers of listening need to design their pedagogy in a way that encourages learners to construct and carry forward provisional interpretations – but they also need to ensure that these interpretations are treated with some caution and tested against the perceptual evidence.

Learners thus need practice in segmenting short extracts of connected natural speech into their component words. This can constitute a simple self-access task, in which they are given the opportunity to listen and re-listen as often as they wish until they are satisfied that they have achieved a correct segmentation. But it can also take the form of a classroom transcription exercise in which learners are required not simply to write down what they think they hear but also to defend their answers. They should compare transcriptions with each other and argue in favour of their own, before re-listening to check whose version seems correct. They should also be encouraged to assess the probability of the interpretations they have favoured (an idea that draws upon the gating tradition). This entails a much less interventionist approach by the listening teacher than the one that is commonly adopted at present. Instead of judging responses as right or wrong, the teacher asks: How certain are you of that transcription? Ten per cent? Sixty per cent? The teacher’s role is to assist the learner to weigh possibilities, not just to provide the right answer.

The issue of lexical segmentation needs to be given more attention in the listening classroom than it currently attracts (Field, 2003). Instructors should raise awareness of cases where the perceptual evidence might match more than one segmentation candidate. One way of doing so is by means of simple transcription tasks which feature the kind of ambiguous material that was included in this study. Sequences might be dictated and possible segmentations compared, or the dictation might be presented in a ‘garden path’ fashion that requires the retroactive processing that has been evidenced in gating data:

T dictates: a nice cream . . . dress. S writes ‘an ice cream’ and has to revise it.
T dictates: some boxes have . . . arrived. S writes ‘some boxes of’ and has to revise it.
T dictates: I want to . . . drive a train. S writes ‘I want a driver’ and has to revise it.
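A teacher preparing such material may wish to check which stretches of input support more than one segmentation. The sketch below enumerates every division of an unspaced string into words from a small lexicon. It is a simplification offered for illustration only: it works on spelling rather than on a phonetic transcription, and the function name and the toy lexicon are assumptions.

```python
def segmentations(stream, lexicon):
    """Return every way of splitting `stream` into words drawn from `lexicon`."""
    if not stream:
        return [[]]  # one way to segment nothing: the empty segmentation
    results = []
    for end in range(1, len(stream) + 1):
        word = stream[:end]
        if word in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(stream[end:], lexicon):
                results.append([word] + rest)
    return results

# Toy lexicon for the first dictation example above.
lexicon = {"a", "an", "nice", "ice", "cream", "dress"}
for candidate in segmentations("anicecream", lexicon):
    print(" ".join(candidate))
```

Note that adding the final word does not remove either candidate on lexical grounds alone: "anicecreamdress" still yields both divisions, so the revision that the dictation task demands rests on meaning and syntax rather than on the word forms themselves.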

In addition, it is worthwhile devising exercises to make learners sensitive to segmentation cues that are specific to the target language. Languages appear to vary in the strategies that determine which segmentation is preferred. Cutler (1997) identifies a metrical principle in relation to English, a moraic one in relation to Japanese and a vowel harmony one in relation to Finnish. The segmentations indicated by these strategies may not always be the only ones possible but they have quite a high level of probability of being correct since they tend to reflect the characteristics of the lexicon in question.

Acknowledgments

I would like to express my gratitude for all the assistance given me in this study by the heads of department, staff and students at Eurocentre Cambridge, Hills Road Sixth Form College Cambridge and Long Road Sixth Form College Cambridge. Thanks also go to two reviewers for helpful comments.

References

Arciuli, J., Cupples, L., 2003. Stress typicality effects in native and non-native speakers of English. In: Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, pp. 2051–2054.

Bard, E.G., Shillcock, R.C., Altmann, G.T.M., 1988. The recognition of words after their acoustic offsets in spontaneous speech: effects of subsequent context. Perception and Psychophysics 44, 395–408.

Brown, G., 1990. Listening to Spoken English, second ed. Longman, Harlow.

Cole, R.A., Jakimik, J., 1980. A model of speech perception. In: Cole, R.A. (Ed.), Perception and Production of Fluent Speech. Erlbaum, Hillsdale NJ, pp. 133–163.

Cotton, S., Grosjean, F., 1984. The gating paradigm: a comparison of successive and individual presentation formats. Perception and Psychophysics 35, 41–48.

Cruttenden, A., 1986. Intonation. Cambridge University Press, Cambridge.

Cutler, A., 1997. The comparative perspective on spoken-language processing. Speech Communication 21, 3–15.

Cutler, A., Carter, D.M., 1987. The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language 2, 133–142.

Field, J., 2001. Lexical segmentation in first and foreign language listening. Unpublished PhD dissertation, Cambridge University.

Field, J., 2003. Promoting perception: lexical segmentation in L2 listening. ELT Journal 57/4, 325–334.

Field, J., in press. Listening in the Language Classroom. Cambridge University Press, Cambridge.

Grosjean, F., 1980. Spoken word-recognition processes and the gating paradigm. Perception and Psychophysics 28, 267–283.

Grosjean, F., 1985. The recognition of words after their acoustic offsets: evidence and implications. Perception and Psychophysics 38, 299–310.

Grosjean, F., 1996. Gating. Language and Cognitive Processes 11, 597–604.

Leech, G., Rayson, P., Wilson, A., 2001. Word Frequencies in Spoken and Written English. Longman, Harlow.

Lindfield, K.C., Wingfield, A., Goodglass, H., 1999. The role of prosody in the mental lexicon. Brain and Language 68, 312–317.

Luce, P.A., 1986. A computational analysis of uniqueness points in auditory word recognition. Perception and Psychophysics 39, 155–158.

Luce, P.A., McLennan, C.T., 2005. Spoken word recognition. In: Pisoni, D.B., Remez, R.E. (Eds.), The Handbook of Speech Perception. Blackwell, Oxford, pp. 591–609.

Marslen-Wilson, W., 1973. Linguistic structure and speech shadowing at very short latencies. Nature 244, 522–523.

Marslen-Wilson, W., 1975. Sentence perception as an interactive parallel process. Science 189, 226–228.

Marslen-Wilson, W., 1987. Functional parallelism in spoken word recognition. Cognition 25, 71–102.

McQueen, J., 2004. Speech perception. In: Lamberts, K., Goldstone, R. (Eds.), The Handbook of Cognition. Sage, London, pp. 255–275.

Montgomery, J., 1999. Recognition of gated words by children with specific language impairment: an examination of lexical mapping. Journal of Speech, Language and Hearing Research 42, 735–743.

Pickett, J.M., Pollack, I., 1963. Intelligibility of excerpts from fluent speech. Effects of rate of utterance and duration of excerpt. Language and Speech 6, 151–164.

Shockey, L., 2003. Sound Patterns of Spoken English. Blackwell, Oxford.

Stine, E.A., Wingfield, A., 1994. Older adults can inhibit high-probability competitors in speech recognition. Aging and Cognition 1, 152–157.

Tyler, L.K., 1992. Spoken Language Comprehension. MIT, Cambridge, MA, pp. 77–84.

Tyler, L.K., Wessels, J., 1985. Is gating an on-line task? Evidence from naming latency data. Perception and Psychophysics 38, 217–222.