

INFANT BEHAVIOR AND DEVELOPMENT 6, 263-285 (1983)

Perception of Auditory Equivalence Classes for Speech in Early Infancy*

PATRICIA K. KUHL Department of Speech and Hearing Sciences

University of Washington

This study focused on pre-verbal infants' ability to categorize speech sounds. Vowels were computer synthesized to simulate productions by men, women, and children. They were highly discriminable. The task examined infants' ability to perceive the similarity among these discriminably different but phonetically identical vowels while maintaining the perceptual differentiation between two vowel categories (/a/ and /ɔ/). In Experiment I, six-month-old infants were initially trained to discriminate a single /a/ and /ɔ/ vowel produced by the computer-simulated male voice. Novel vowels spoken by other talkers were then gradually introduced in a progressive transfer-of-learning task. In Experiment II, infants were initially trained on the same two vowels, but then were immediately tested with all of the novel stimuli. In both experiments, the data show that infants treat discriminably different members of the same vowel category as equivalent, thus suggesting that pre-verbal infants are capable of categorizing speech sounds, a prerequisite for the development of speech perception and speech production.

infant speech perception vowel perception categorization constancy

Categorization requires that discriminably different stimuli be treated as equivalent. This process depends not only on discrimination of the essential differences between stimuli from different categories, but on the recognition of perceptual similarity for stimuli in the same category. This study examined the extent to which young infants demonstrate the ability to categorize speech sounds. It was designed to test the infant's recognition of the phonetic equivalence among discriminably different instances representing a single phonetic category while preserving the distinction between two different phonetic categories.

Data concerning the infant's ability to categorize speech sounds are especially relevant to theoretical models of speech perception. Research on speech suggests that phonetic categories are multiply-cued and that no simple one-to-one relation exists between segments in the acoustic stream and perceived phonetic structure (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). The lack of acoustic invariance for exemplars perceived to be phonetically equivalent led authors to describe it as a "constancy" problem for speech (Kuhl, 1979a; 1980; Shankweiler, Strange, & Verbrugge, 1977) because it shares certain characteristics with the classic cases in vision (see Kuhl, in press a, for discussion). Data demonstrating the age at which infants recognize phonetic equivalence contribute to arguments about the basis of category recognition for speech (Kuhl, in press a).

* This research was supported by a grant to the author from the National Science Foundation (BNS 81-03581). The author also wishes to acknowledge the support provided by Central Institute for the Deaf (NS 03856 and RR 00396) and particularly A. Maynard Engebretson, who synthesized the stimuli and assisted in the preparation of the stimulus tapes for this experiment. The author thanks Kyum Ha Lee of the Child Development and Mental Retardation Center for the development of the logic device used to run the experiment, James Hillenbrand for assisting in the experiment, and Andrew Meltzoff for critical comments on an earlier draft of this manuscript. Correspondence and requests for reprints should be addressed to Patricia K. Kuhl, Department of Speech and Hearing Sciences, University of Washington, Seattle WA 98195.

Inquiries concerning the infant's recognition of speech-sound categories are equally important to theories of vocal learning, particularly in cases such as this one in which infants must recognize phonetic equivalence for speech produced by different talkers. To explain this further, the infant's vocal tract is not capable of producing the absolute formant frequencies represented in the speech of adult listeners (Liberman, 1980). An attempt to imitate adult speech by a direct match of the frequencies involved would not prove possible. An infant's recognition of the equivalence among the vowels produced by men, women, and children would suggest that they have access to a level of representation that is not frequency specific. If so, then infants attempting to imitate the vowels produced by adults would simply aim to produce a vowel whose abstract form matched that of the adult. Thus, the infant's recognition of phonetic equivalence for the productions of different talkers has additional importance: it is a prerequisite for vocal imitation.

Three lines of research are pertinent to this work: first, research on the particular categorization problem addressed in this study (vowel categorization); second, previous infant research on the perceptual organization of auditory stimuli; and third, the specific approach developed in this laboratory for the study of speech-sound categorization in infancy.

THE VOWEL CATEGORIZATION PROBLEM

Research on speech has shown that the acoustic cues for phonetic categories are frequently context, talker, position and/or rate dependent (see Liberman et al., 1967, for classic examples). The perception of vowel identity is a case in point. When a given talker produces different vowel sounds, he changes the overall configuration of his vocal tract. This change in configuration alters the resonant frequencies of the tract, which are directly reflected in the locations of the formants (frequency regions in which the concentration of energy is greatest) on the frequency axis. Early work in the field demonstrated that (a) vowel perception is directly related to the locations of the formants; and (b) the first two formants are sufficient to distinguish all English vowels (Delattre, Liberman, & Cooper, 1951). Thus, the critical acoustic cues governing the categorization of vowels were presumed to involve either the absolute values of the formant frequencies or some measure of the relationship among the formant frequencies.

Research has shown, however, that when vowels are produced by talkers whose vocal tracts have different overall dimensions (as when the identical vowel is produced by a male, a female, and a child), the formant frequencies are quite different (Peterson & Barney, 1952). Vowel categories adjacent in "vowel space," such as /a/ (as in "cot") and /ɔ/ (as in "caught"), are not separable on the basis of their first two formants. This is because the overall dimensions of the vocal tracts of men, women, and children are not proportional (Fant, 1973), so the resulting resonances (and therefore formant frequencies) are not related as ratio transforms. Fant (1973) has further demonstrated that the scale factors relating the formants produced by males, females, and children are not constant for any single formant across vowels nor across formants within vowels. Thus, the perception of equivalence among vowels spoken by different talkers was an interesting test of infants' categorization abilities.
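The point about non-constant scale factors can be illustrated with a short calculation. The sketch below is not part of the original study; it simply takes the Peterson and Barney (1952) average formant frequencies that this paper lists in its Table 1 (Stimuli section) and computes the ratio of each talker's formants to the male talker's. The dictionary keys ("a", "aw") and the function name are hypothetical labels chosen for illustration, with "aw" standing for /ɔ/.

```python
# Average formant center frequencies in Hz (F1, F2, F3), as listed in Table 1.
# "aw" is an illustrative ASCII label for the vowel in "caught".
FORMANTS_HZ = {
    "a": {"male": (730, 1090, 2440), "female": (850, 1220, 2810), "child": (1030, 1370, 3170)},
    "aw": {"male": (570, 840, 2410), "female": (590, 920, 2710), "child": (680, 1060, 3180)},
}


def scale_factors(vowel, talker, reference="male"):
    """Return the F1, F2, F3 ratios of `talker` relative to `reference`."""
    ref = FORMANTS_HZ[vowel][reference]
    tgt = FORMANTS_HZ[vowel][talker]
    return tuple(round(t / r, 3) for t, r in zip(tgt, ref))


for vowel in FORMANTS_HZ:
    for talker in ("female", "child"):
        print(vowel, talker, scale_factors(vowel, talker))
```

If the vocal tracts were related by a single ratio transform, every ratio for a given talker would be the same number. Under these tabled values the female-to-male ratios range from roughly 1.04 to 1.16, which is the kind of variation the paragraph above describes.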

CATEGORIZATION OF SPEECH BY INFANTS

Much of the research on speech perception in infancy has focused on the infant's ability to discriminate sounds rather than to recognize similarity among them. These discrimination studies (see Kuhl, 1979a, in press b; and Jusczyk, 1981, for reviews) have, however, provided convincing demonstrations of the young infant's ability to perceive the subtle acoustic differences among speech sounds. Moreover, these studies suggest that infants base their discrimination on the acoustic cues that have been demonstrated to be critical to discrimination in adult listeners.

There are very few experiments that measure directly the infant's perception of similarity among members of a speech-sound category. The classic work on the "categorical perception" of speech provided some evidence. These studies examined the extent to which infants partitioned speech stimuli on a physical continuum in ways that conformed to phonetic categorization by adults (Eimas, 1974; 1975; Eimas, Siqueland, Jusczyk, & Vigorito, 1971). The stimuli in these experiments were computer synthesized; a single acoustic parameter was varied gradually to create a continuum that ranged perceptually, for example, from /ba/ to /pa/. Discrimination tests on 1- and 4-month-old infants, involving a sucking-habituation technique, demonstrated that infants failed to discriminate stimuli given the same phonetic label by adults, while providing evidence of discrimination for stimuli separated by the same physical distance on the continuum, but given different phonetic labels by adults (Eimas et al., 1971). In other words, the infants responded to stimuli on the continuum as though they perceptually grouped those within a single phonetic category. However, this evidence of perceptual grouping can be interpreted in either of two ways. It can be attributed to the infant's recognition of category equivalence or simply to the infant's inability to discriminate within-category stimuli.

Another approach was taken by Fodor, Garrett, and Brill (1975). They examined the acquisition of a head-turn response for visual reinforcement in 14- to 18-month-old infants under two stimulus conditions. In both conditions, three syllables were randomly presented (/pi/, /ka/, /pu/) but only two of the three were reinforced. In one condition, the stimuli being reinforced were phonetically related (/pi/ and /pu/); in the other condition, the stimuli being reinforced were not phonetically related (/pi/ and /ka/). The authors hypothesized that if infants tend to hear the similarity between two syllables that share the initial consonant, in spite of the differences in the acoustic cues for that consonant and in spite of the irrelevant differences between the two syllables, then their tendencies to learn the association ought to differ in the two conditions. Their hypothesis was supported. While the analyses showed that the proportion of head turns was greater for the reinforced stimuli in both conditions, a significant interaction followed up by a simple-effects analysis demonstrated that the difference between the proportion of head turns to reinforced and nonreinforced stimuli was significant only for the phonetically similar group.

These data demonstrated that infants grouped syllables that shared the initial consonant more readily than they grouped syllables that did not share the initial consonant, but the data also demonstrated that neither task was easy. Two factors may have made the task inordinately difficult. First, observations in our own laboratory demonstrate that until 5½ months of age, a large percentage of infants are not easily conditioned to make head-turn responses for a visual reinforcer. At 5½ months, or older, infants make head-turn responses easily and it is our impression that over 90% of the infants are conditionable for contrasts that are relatively easy, such as /a/ vs. /i/. Second, the task involved a two-response differentiation. That is, a head turn either to the right or to the left, dependent upon which of two loudspeakers presented the reinforced stimulus, was required. Tasks which require two-response differentiation for auditory stimuli have been shown to be inordinately difficult for young infants and animals (see Burdick, 1979, and Miller and Bowe, 1982, for discussion). The approach taken in this laboratory (described in the next section) was an adaptation of a classic "go/no-go" technique. It requires the infant to make a single response which is differentiated from a state in which that response is inhibited. Studies directly comparing performance for animals tested in a go/no-go task versus a two-response task using the same auditory stimuli show the former to be far easier (Burdick, 1980).

AN APPROACH TO SPEECH-SOUND CATEGORIZATION IN INFANCY

An approach to the study of speech-sound categorization with infants was taken by Kuhl (1979b, 1980). We utilized a discrimination format coupled with a transfer-of-learning design in order to assess the degree to which training an infant to discriminate two single exemplars, each representing a different phonetic category, would result in a transfer-of-learning to novel, discriminably different, instances from those same two phonetic categories. Two main features distinguish this approach to the study of categorization from the categorical-perception approach: first, the nature and the diversity of the stimuli representing each of the categories; and second, a technique that derives evidence of categorization from the infant's production of an equivalent response to stimuli that are discriminably different but are nonetheless perceived to be similar.


Regarding the first, rather than differing on a single acoustic parameter, as in the studies involving discrimination of sounds drawn from a physical continuum, the stimuli varied along a number of dimensions. For example, in the first experiment using this design, Kuhl (1979b) examined the infant's discrimination of two vowel categories, /a/ (as in "pop") and /i/ (as in "peep"). The stimuli used in the experiment varied along three dimensions: phonetic identity (/a/ vs. /i/), pitch contour (interrogative vs. declarative), and talker identity (male, female, or child). Stimuli assigned to the same vowel category are highly discriminable to infants (Kuhl and Hillenbrand, 1979). Thus, infants' perception of similarity cannot be attributed to a failure to discriminate among members of the same category. Furthermore, each dimension is acoustically prominent, providing potential distraction effects when attempting to "sort" the stimuli along one particular dimension. Thus, categorization along the phonetic dimension required that the diverse acoustic events underlying phonetic identity be recognized as equivalent, while acoustically prominent but irrelevant acoustic events be ignored.

Regarding the second, categorization is inferred from the infant's production of an equivalent response to stimuli perceived to be similar. The technique therefore focuses on a measurement of the infant's perception of the similarity between stimuli, rather than on the infant's perception of the differences between stimuli.

Our discrimination task involved a head-turn response that was reinforced with the presentation of a visual stimulus when a constantly repeated vowel, such as /a/, was changed to /i/. In the first of two experiments, infants were trained to make a head-turn response when an /a/ vowel, synthesized to simulate a male voice with a falling intonation contour, was changed to an /i/ vowel. After this initial training, infants were tested with other exemplars of the two categories, ones which simulated the productions of female and child talkers, with either rising or falling intonation contours. The data demonstrated that infants readily transferred the correct response to novel vowels from both categories. In fact, an analysis of first-trial data demonstrated that infants performed significantly above chance for each of the novel vowels the first time it was presented. In other words, infants immediately produced the response to each novel variant from the category, indicating the perceived equivalence among exemplars in the category.

The purpose of this experiment was to replicate the two experiments described previously (Kuhl, 1979b), but using vowel categories that are adjacent in vowel space. The vowel categories /a/ (as in "cot") and /ɔ/ (as in "caught") were chosen for this purpose. Peterson and Barney's (1952) data on the formant frequencies of /a/ and /ɔ/ that were naturally produced by men, women, and children show that these two vowels are very similar, even when produced by a single talker, and that when the productions of many talkers are analyzed, there is considerable overlap for the values of the first and second formants of these vowel categories. We chose to test /a/ and /ɔ/ for two reasons. First, they represent one of the most difficult vowel contrasts in English. Should infants demonstrate equivalence classification for these vowel categories, one might reasonably suppose they could do so for other vowel contrasts as well. Second, Kuhl (1979b) argued that infants' perception of the vowel categories /a/ and /i/ might have been based on recognition of a simple property specifying the configuration of the formant frequencies. The vowel /i/ is "diffuse" with widely spaced formants while the vowel /a/ is "compact" with closely spaced formants (Jakobson, Fant, and Halle, 1969). The vowels /a/ and /ɔ/ test this basis for category recognition because the two vowels have similar spectral configurations (both are "compact"). Should infants differentiate these two vowel categories, the ability to categorize cannot be based on the recognition of the "compactness" property.

EXPERIMENT I

METHOD

Subjects

Four infants, aged 5.5 to 6.5 months, served as subjects for the experiment. One additional infant was tested for 3 days and was then dropped from the study due to his tendency to cry or fuss after about 10 minutes of testing. The infants were obtained by mail solicitation to the parents of all newborns in the Seattle area. Parents were questioned about familial histories of hearing loss and treatment for ear infections; infants at risk for hearing loss were not tested. The infants were full term and seemed to their parents to be developing normally. Parents were paid $5.00 per visit at the end of the experiment.

Stimuli

The stimuli were synthesized on a terminal analog serial synthesizer at Central Institute for the Deaf in St. Louis. Two exemplars of the vowel /a/ and the vowel /ɔ/, one with a falling pitch contour typical for declarative sentences and one with a rising pitch contour typical for interrogative sentences, were synthesized for each of three talkers, a male, a female, and a child. Table 1 lists the center frequencies and bandwidths of the first three formants for /a/ and /ɔ/ for each talker; these first three formants determine the identity of the vowel. The center frequencies were taken from Peterson and Barney's (1952) averages of the center-frequency measurements of naturally produced /a/ and /ɔ/ in an /h-d/ context. The bandwidths were taken from Dunn (1963). The upper formants (Table 2), included to produce more natural vowels, were held constant for the two vowels for each talker. The upper formants were obtained by taking Rabiner's (1968) estimates of the center frequencies for the upper formants of a male talker and modifying them using Fant's (1973) correction factors for the proportionally shorter vocal-tract lengths of female (.87) and child (.70) talkers. The stimuli were synthesized with equal amplitudes at a 20 kHz sampling rate; formants over 10 kHz were eliminated to avoid aliasing problems. All stimuli were 500 ms in duration.

TABLE 1
Center Frequencies and Bandwidths of the First Three Formants of the /a/ and /ɔ/ Vowels for Male, Female, and Child Talkers

Vowel   Formant   Male           Female         Child
/ɔ/     F3        2410 (81.9)    2710 (94.9)    3180 (121.7)
        F2         840 (45.8)     920 (53.0)    1060 (53.0)
        F1         570 (45.1)     590 (58.1)     680 (55.7)
/a/     F3        2440 (83.2)    2810 (99.9)    3170 (121.0)
        F2        1090 (48.4)    1220 (53.9)    1370 (55.6)
        F1         730 (45.2)     850 (53.4)    1030 (52.9)

TABLE 2
Center Frequencies and Bandwidths of the Upper Formants (F4-F10) for Male, Female, and Child Talkers (in kHz)

Formant   Male           Female          Child
F4        3.5 (0.175)    4.03 (0.225)    5.01 (0.35)
F5        4.5 (0.281)    5.18 (0.380)    6.44 (0.675)
F6        5.5 (0.458)    6.33 (0.640)    7.87 (1.5)
F7        6.5 (0.722)    7.48 (1.25)     9.3 (4.25)
F8        7.5 (1.25)     8.63 (2.40)     ...
F9        8.5 (2.125)    9.78 (7.0)      ...
F10       9.5 (4.75)     ...             ...
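The derivation of the upper formants described above can be sketched in a few lines. This is an illustration only, not the synthesis code used in the study. It assumes that the male center frequencies for F4 through F10 run from 3.5 to 9.5 kHz in 1 kHz steps, that the vocal-tract-length factors (.87 and .70) are applied by division because resonance frequencies rise as the tract shortens, and that the 10 kHz cutoff is the Nyquist limit of the 20 kHz sampling rate. All names are hypothetical.

```python
# Male center frequencies for F4..F10 in kHz (assumed: 3.5 to 9.5 in 1 kHz steps).
MALE_UPPER_KHZ = [3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5]

# Relative vocal-tract lengths quoted in the text (male taken as 1.0).
LENGTH_FACTORS = {"male": 1.00, "female": 0.87, "child": 0.70}

# Formants above this frequency were eliminated (half of the 20 kHz sampling rate).
CUTOFF_KHZ = 10.0


def upper_formants(talker):
    """Scale the male upper formants for `talker` and drop any above the cutoff."""
    factor = LENGTH_FACTORS[talker]
    scaled = (f / factor for f in MALE_UPPER_KHZ)  # shorter tract -> higher formants
    return [round(f, 2) for f in scaled if f <= CUTOFF_KHZ]


for talker in LENGTH_FACTORS:
    print(talker, upper_formants(talker))
```

Under these assumptions the sketch yields seven formants for the male talker, six for the female, and four for the child, and the scaled values agree with the tabled center frequencies to within about 0.01 kHz. The small residual suggests the original values were computed with rounded multipliers, but that is an inference rather than something the text states.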

Table 3 lists the pitch-contour specifications. The falling contours were synthesized by changing from the first value to the second value in the first 100 ms, remaining there for 40 ms, and then falling to the third value in the remaining 360 ms. The rising contours were linear over their entire course.
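The contour timing just described can be written as a piecewise function of time. The sketch below is a hedged illustration rather than the original synthesis procedure: it uses the fundamental-frequency values listed in Table 3 (assumed to be in Hz), and it assumes linear interpolation within the rising and falling segments of the falling contour, since the text states linearity only for the rising contours. The names `CONTOURS_HZ` and `f0_at` are hypothetical.

```python
# Fundamental-frequency breakpoints from Table 3 (assumed to be in Hz).
CONTOURS_HZ = {
    "male": {"fall": (112, 132, 92), "rise": (112, 132)},
    "female": {"fall": (189, 223, 155), "rise": (189, 223)},
    "child": {"fall": (224, 264, 184), "rise": (224, 264)},
}

DURATION_MS = 500  # all stimuli were 500 ms long


def f0_at(talker, contour, t_ms):
    """Return the fundamental frequency at time `t_ms` for the given stimulus."""
    if not 0 <= t_ms <= DURATION_MS:
        raise ValueError("t_ms must lie within the 500 ms stimulus")
    values = CONTOURS_HZ[talker][contour]
    if contour == "rise":
        start, end = values
        # Rising contours are linear over the entire stimulus.
        return start + (end - start) * t_ms / DURATION_MS
    start, peak, end = values
    if t_ms <= 100:
        # First 100 ms: move from the first value to the second (linearity assumed).
        return start + (peak - start) * t_ms / 100
    if t_ms <= 140:
        # Next 40 ms: hold at the second value.
        return float(peak)
    # Remaining 360 ms: fall to the third value (linearity assumed).
    return peak + (end - peak) * (t_ms - 140) / 360


print([round(f0_at("male", "fall", t), 1) for t in range(0, 501, 100)])
```

Sampling the function at a fixed step gives a contour that could drive a synthesizer's pitch parameter; the three segment durations (100, 40, and 360 ms) sum to the 500 ms stimulus length.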

The resulting stimuli were judged by adult listeners to be good exemplars of either the vowel /a/ or /ɔ/ when they were presented for identification. In a forced-choice task, the tokens were readily attributed to either a male, female, or child talker, and to having either a rising or falling intonation contour.

TABLE 3
Pitch Contour Specifications for the Rising and Falling Pitch-Contour Stimuli for the /a/ and /ɔ/ Vowels

        Male            Female           Child
Fall    112→132→92      189→223→155      224→264→184
Rise    112→132         189→223          224→264

Stimulus tapes for the experiment were made on a two-channel tape recorder (Sony model #4204). The /a/ and /ɔ/ vowels were recorded synchronously on two separate channels of the tape with stimulus onsets spaced 2 s apart. They were presented at 68 dB SPL (A scale) and were calibrated each day with a sound level meter (Bruel and Kjaer, model #2203) placed in the approximate position of the infant's head.

The Experimental Suite

The experimental/control suite was identical to that used by Kuhl (1979b). Briefly, the infant was held by a parent so that he or she faced an assistant. The assistant maintained the infant's attention at midline, or directly in front of the assistant, by manipulating a variety of silent toys. A loudspeaker (Electrovoice, SP12) was located at a 90° angle to the assistant; the visual reinforcer was placed directly in front of the loudspeaker. The visual reinforcer consisted of a commercially available battery-operated toy (a monkey clapping cymbals, a bear pounding a drum, or a dog wagging its tail) that was housed in a dark plexiglass box so that the animal could not be seen, and was not activated, until lights mounted inside the box were illuminated. The plexiglass box and the loudspeaker were mounted on stands so that both were at eye level for the infant. The camera, placed on the back wall of the room, fed a TV monitor located in an adjoining control room.

The control room contained a two-channel tape recorder (Teac, model #2300 SD) whose outputs fed a logic device. The TV monitor allowed observation of the infant. A cassette tape player fed music to two sets of earphones in the experimental room which were worn by the assistant and the parent. The music could be interrupted by an audio intercom which allowed the experimenter to communicate with the assistant.

The logic device contained a probability generator, set to .5 probability, which determined whether a given observation interval would be a change or a control trial. Since other work in the laboratory suggested that long strings of change or control trials greatly increased the probability that the infant would produce an error, the experimenter overrode the probability generator for one trial if three consecutive change trials or control trials occurred. The logic device automatically recorded the head-turn judgments made by the experimenter and the assistant, recorded the latency of the infant's response, scored the trial, activated and timed the visual reinforcer, and printed these data at the end of each trial.
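The trial-selection rule described above (a .5 probability generator with a one-trial override after three identical trials in a row) can be sketched as follows. This is an illustrative software analogue, not a description of the hardware logic device; the function names, the seeded random generator, and the reading that the override forces the opposite trial type are assumptions.

```python
import random


def next_trial(history, rng, p_change=0.5, max_run=3):
    """Choose the next trial type, forcing a switch after `max_run` identical trials."""
    last = history[-max_run:]
    if len(last) == max_run and len(set(last)) == 1:
        # Override the probability generator for this one trial (assumed behavior).
        return "control" if last[0] == "change" else "change"
    return "change" if rng.random() < p_change else "control"


def schedule(n_trials, seed=0):
    """Generate a sequence of trial types under the override rule."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    history = []
    for _ in range(n_trials):
        history.append(next_trial(history, rng))
    return history


print(schedule(20))
```

One consequence of this reading is that no run of change or control trials can exceed three, so the realized proportion of each trial type stays close to, but is not exactly governed by, the nominal .5 probability.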

Procedure

General Procedure. The technique employs a head turn for visual reinforcement adapted from one originally designed to obtain auditory thresholds for infants (see Kuhl, in press c, for review). Infants were trained to make a head turn whenever a speech sound, repeated once every 2 seconds as a "background" stimulus, was changed to a "comparison" speech sound. A head turn toward the loudspeaker which occurred during the presentation of the comparison stimulus was rewarded with the presentation of the visual reinforcer. Throughout the experiment, two kinds of trials, "change" and "control," were run. During the 6 sec observation interval on a change trial, the random presentation of stimuli from the background category was changed to a random presentation of stimuli from the comparison category (see Figure 1 for illustration). During the 6 sec observation interval of a control trial, stimuli from the background category were continuously presented.

[Figure 1 diagram not reproduced: trial structure for a change trial and a control trial, showing the stimulus sequence in the pre-interval, observation interval, and post-interval periods against time in seconds.]

Figure 1. Stimulus presentation format which occurred prior to, during, and after observation intervals for change trials and control trials. As shown, a random presentation of the tokens from the "background" category (shown here as the vowel /a/ with six subscripts indicating the six different vowels presented during the final stage of Experiment I) occurred at all times, other than during the observation interval of a "change" trial, when a random presentation of stimuli from the "comparison" category (shown here as the vowel /ɔ/) occurred. Head turns were judged during the 6 s observation intervals of both change and control trials. See text for additional details.

The experimenter initiated an observation interval when the infant was attending to the toys held by the assistant, and not crying, fussing, or babbling. During both change and control trials, the experimenter and the assistant judged whether a head turn occurred and had "vote" buttons with which to record their affirmative decisions. The assistant was cued that an observation interval was occurring by a vibrating pin located on the hand-held vote button. The pin vibrated for the duration of the observation interval.

If both the experimenter and the assistant judged that the infant had turned his/her head during the 6 sec observation interval on a change trial, the visual reinforcer was automatically activated by the logic device for a 3 sec period. If neither (or only one of them) judged that a head turn had occurred, the visual reinforcer was not activated and an error was scored. During a control trial, if neither the experimenter nor the assistant judged that a head turn had occurred, the trial was scored as correct. If both (or either) judged that a head turn had occurred, an error was scored. The visual reinforcer was never activated on control trials.
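The scoring rules in the preceding paragraph amount to a small decision table, sketched below. The function name and return convention are hypothetical; the logic follows the text: a change trial is correct (and reinforced) only when both judges vote that a head turn occurred, and a control trial is correct only when neither does.

```python
def score_trial(trial_type, experimenter_vote, assistant_vote):
    """Return (correct, reinforce) for one observation interval.

    Votes are True when that judge pressed the button to signal a head turn.
    """
    if trial_type == "change":
        # Both judges must agree that a head turn occurred.
        hit = experimenter_vote and assistant_vote
        return hit, hit  # the reinforcer is activated only on a scored hit
    if trial_type == "control":
        # Any vote for a head turn counts as an error; never reinforce.
        correct = not (experimenter_vote or assistant_vote)
        return correct, False
    raise ValueError(f"unknown trial type: {trial_type!r}")


for trial in ("change", "control"):
    for e in (True, False):
        for a in (True, False):
            print(trial, e, a, score_trial(trial, e, a))
```

Note the asymmetry: a single vote is insufficient to credit a head turn on a change trial but sufficient to count as a false alarm on a control trial, which makes the scoring conservative in both directions.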

In order to control for potential bias, both the parent and the assistant wore headphones and listened to music; therefore, neither could inadvertently cue the infant that a stimulus change occurred. The assistant did not know whether a given observation interval was a change or a control trial, so that neither the criterion for judging a head turn, nor a subtle change in the way in which the toys were manipulated, could differentially occur during the two kinds of trials. The experimenter did not know ahead of time whether a change or control trial was to occur, but did hear the stimuli during the trial. Since the votes of both the assistant and the experimenter were automatically recorded for both change and control trials, any systematic bias on the part of the experimenter could be observed.

Experimental Stages. The experiment progressed in five stages after a conditioning criterion was met. The stimulus ensembles for both the background and the comparison categories are listed in Table 4 for all five stages of the experiment. The vowel category specified as the background was counterbalanced across subjects. During conditioning and the initial-training stage of the experiment, each of the two categories was represented by a single stimulus, matched in every detail except for the critical cues (the first three formant frequencies) which differentiate the two categories.

During conditioning, only change trials occurred. The change from the background stimulus to the comparison stimulus was paired with the presentation of the visual reinforcer. Since activating the visual reinforcer resulted in both a prominent visual and auditory event, infants readily turned away from the assistant to look at it. Eventually, the infant anticipated the presentation of the visual reinforcer when the speech sound was changed from the background to the comparison stimulus. After three consecutive anticipatory head turns occurred during appropriate stimulus-change intervals, initial training was begun, during which change and control trials had a .5 probability of occurrence. The infant remained in the initial-training stage, and in all subsequent stages, until a performance criterion of 9 out of 10 consecutive trials correct was met.

TABLE 4
The Stimulus Ensembles for the Background and Comparison Categories for All Stages of the Experiment. The Talker and Pitch-Contour Values for Each Stimulus are Given in Parentheses.

| Experimental stage | Background | Comparison |
|---|---|---|
| Conditioning | /a/ (Male, fall) | /ɔ/ (Male, fall) |
| Initial training | /a/ (Male, fall) | /ɔ/ (Male, fall) |
| Pitch variation | /a/ (Male, fall); /a/ (Male, rise) | /ɔ/ (Male, fall); /ɔ/ (Male, rise) |
| Talker variation | /a/ (Male, fall); /a/ (Female, fall) | /ɔ/ (Male, fall); /ɔ/ (Female, fall) |
| Talker x pitch variation | /a/ (Male, fall); /a/ (Male, rise); /a/ (Female, fall); /a/ (Female, rise) | /ɔ/ (Male, fall); /ɔ/ (Male, rise); /ɔ/ (Female, fall); /ɔ/ (Female, rise) |
| Entire ensemble | /a/ (Male, fall); /a/ (Male, rise); /a/ (Female, fall); /a/ (Female, rise); /a/ (Child, fall); /a/ (Child, rise) | /ɔ/ (Male, fall); /ɔ/ (Male, rise); /ɔ/ (Female, fall); /ɔ/ (Female, rise); /ɔ/ (Child, fall); /ɔ/ (Child, rise) |

In Stage 2 (pitch variation), the pitch contour of the vowels in both categories was randomly changed from rising to falling. The pitch variation encouraged the infant to ignore an acoustically prominent difference among the background (and the comparison) stimuli, while attending to a dimension along which the stimuli in each category were similar, that is, vowel identity. Successful completion of the pitch-contour stage served to indicate that the infant was capable of the task, so that a failure to generalize along the talker dimension could be separated from a more general cognitive difficulty with the task.

In Stage 3 (talker variation), the talker producing the vowels was randomly changed from the male to the female voice. In Stage 4, both talkers produced the vowels with a randomly changing pitch contour. In the final stage, the child's vowels, also with pitch-contour variations, were added to the ensemble bringing the total number of stimuli in each category to six (3 talkers x 2 pitch contours).

Typically, 25 trials were run each day in a 20 minute session. However, sessions were always terminated when an infant began to fuss or not attend to the assistant's toys; if, on the other hand, an infant was alert, the session was extended. Infants were tested on consecutive days whenever that was possible.

RESULTS

All infants met the performance criterion of 9 out of 10 consecutive trials correct at each stage in the experiment. The range in the number of sessions required to complete the experiment was 4 to 7 sessions, with an average of 5.5 sessions. The total number of trials ranged from 96 to 204 with an average of 147.75. The criterion required to pass each stage in the experiment (9 out of 10 consecutive correct) mandates a minimum of 50 trials to complete the experiment.
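The 50-trial minimum follows from the criterion itself, as the sketch below illustrates. It assumes that a full window of 10 trials must be completed before the criterion can be met, which is consistent with the stated minimum of 10 trials per stage; the function name and the sliding-window formulation are assumptions, not a description of the original logic device.

```python
from typing import Optional, Sequence


def trials_to_criterion(outcomes: Sequence[bool],
                        needed: int = 9,
                        window: int = 10) -> Optional[int]:
    """Return the trial number at which `needed` of the last `window`
    consecutive trials were correct, or None if the criterion is never met."""
    for end in range(window, len(outcomes) + 1):
        # Count correct trials in the most recent full window.
        if sum(outcomes[end - window:end]) >= needed:
            return end
    return None


N_STAGES = 5  # initial training through entire ensemble

best_case = trials_to_criterion([True] * 10)
print(best_case)             # 10
print(N_STAGES * best_case)  # 50

# One early error still allows the criterion to be met on trial 10;
# two early errors push it back to trial 11.
print(trials_to_criterion([False] + [True] * 9))         # 10
print(trials_to_criterion([False, False] + [True] * 9))  # 11
```

Against that floor of 50, the observed range of 96 to 204 trials shows how far each infant was from error-free performance across the five stages.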

The conditioning criterion was met within the first session by all four of the infants, requiring 8.25 trials on the average. The average number of trials required to meet the criterion for each of the other five stages of the experiment are shown in Figure 2, along with their standard errors. The averages were: 50 (Initial Training), 27.25 (Pitch Variation), 13.75 (Talker Variation), 24 (Talker x Pitch Variation), and 23 (Entire Ensemble). As shown, a trend towards a decrease in the trials-to-criterion measure occurred as the experiment progressed. However, a one-way analysis of variance with repeated measures revealed no significant differences in the number of trials required across conditions, F(4,12) = 2.18, p < .10.

Figure 2. The mean number of trials to meet criterion (9 out of 10 consecutive trials correct) in the five stages of Experiment I (Initial Training, Pitch Variation, Talker Variation, Talker x Pitch Variation, and Entire Ensemble). Standard errors were 17.95, 6.16, 1.44, 6.62, and 6.58, respectively. [Plot not reproduced; horizontal axis: Stages.]

The average latencies of response for correct head-turn responses for each condition were as follows: Initial Training, 2.5 sec; Pitch Variation, 2.4 sec; Talker Variation, 2.3 sec; Talker x Pitch Variation, 2.4 sec; and Entire Ensemble, 2.4 sec. A one-way analysis of variance with repeated measures revealed no significant differences across conditions, F(4,12) = .2904, p < .25.

The agreement between the experimenter and the assistant on head-turn judgments for both change and control trials was 97.5%.

DISCUSSION

The assumption, using this particular experimental design, is that the extent to which infants demonstrate rapid transfer of learning to novel stimuli representing the two vowel categories reflects the extent to which the infant recognizes some criterial attribute which serves as a "sorting rule" for forming two equivalence classes. The most efficient sorting rule for these stimuli (and the most natural one for adult listeners) involves the recognition of some criterial attribute that predicts the membership of the vowel categories /a/ and /ɔ/.

The argument that each of the novel vowel tokens was recognized as belonging to the /a/ class or the /ɔ/ class of stimuli is strengthened by ruling out two alternative explanations for the infants' performance, both discussed by Kuhl (1979b). One strategy that might produce good performance without requiring that the infant recognize the equivalence among the /a/-vowels or the /ɔ/-vowels would be to produce the head-turn response whenever the training stimulus was presented. The latency data obtained in this experiment and others (see Kuhl, 1980) do not support the notion that infants adopt this strategy. The data indicate that in the first stage of the experiment, infants begin their head-turn responses just after the first stimulus in the observation interval is presented, and that they continue to do so throughout the experiment. Since, in the latter stages, each of the six stimuli has an equal chance of occurring first during the observation interval, the latency data support the notion that each of the six stimuli, rather than just the training stimulus, evokes the head-turn response.

A second strategy to rule out is a memorization strategy, whereby good performance is produced by simply memorizing each of the stimuli that is reinforced, without necessarily recognizing a perceptual similarity among the stimuli being reinforced. If a memorization strategy were applied, one would expect that the average trials-to-criterion measure would increase as the experiment progressed. The fact that a trend towards a decrease was observed in the trials-to-criterion measure, when comparing the initial-training stage to other stages, would tend to counter the memory argument and support the categorization hypothesis. The memory hypothesis has been directly investigated in experiments in our laboratory (Hillenbrand, 1980; Kuhl, in press b) and elsewhere (Miller, Younger, & Morse, 1982). In these experiments, performance was compared on two tasks: one in which the two categories of sounds were determined by rule (such as phonetic classification), and a second in which the same stimuli were separated into two categories at random. In these experiments, infants performed very poorly in the random condition, when the reinforcement contingencies were not predicted by a sorting rule that had a definable perceptual correlate.

These arguments notwithstanding, the claim that the infant is capable of recognizing equivalence classes that conform to the vowel categories /a/ and /ɔ/ would be further supported if infants performed significantly above chance when the intermediate stages in the experiment were eliminated. Even stronger evidence would be provided if performance was significantly above chance for each novel stimulus the first time it was presented. The second experiment, a replication of that run by Kuhl (1979b, Experiment II), represents an attempt to provide additional support for this interpretation.

EXPERIMENT II

METHOD

Subjects

Eight new infants, 5.5 to 6.5 months of age, were tested. An additional five infants were excluded from the experiment after failing to meet the conditioning or the initial-training criterion in a set number of trials (see below). Subjects were obtained in the same manner as that described for Experiment I.

Stimuli

The /a/ and /ɔ/ stimuli were identical to those used in Experiment I.

Procedure

Infants were tested using the head-turn technique in the same manner as that previously described. In this experiment, however, the infant was tested in only two stages, rather than in five. Conditioning and initial training were run exactly as before, with the exception that infants were required to meet the conditioning criterion (3 consecutive correct responses) within 40 trials and the initial-training criterion (9 out of 10 consecutive responses correct) within 90 trials, or they were excluded from the experiment. These limits were imposed to ensure that the infant would reach the final phase of the experiment before becoming satiated on the visual reinforcer.

All infants who met the criteria in the conditioning and initial-training stages within the specified number of trials advanced to the second (transfer-of-learning) stage. In this stage, all six of the stimuli in each of the two vowel categories were randomly presented, as in the final stage of Experiment I. However, unlike the final stage of Experiment I, each stimulus was repeated three times before the next stimulus in the random sequence was presented (Figure 3 illustrates). In this way, only a single stimulus was presented during a particular change or control trial. All infants were tested for 90 trials in the transfer-of-learning phase of the experiment, regardless of their performance. Ninety trials were considered adequate to sample each of the stimuli in the two vowel categories.

Figure 3. Stimulus presentation format which occurred prior to, during, and after observation intervals for change and control trials in Experiment II. The format was similar to that used in Experiment I, with the exception that each stimulus in the random sequence was repeated three times. The observation intervals of both change and control trials were synchronized to begin when the first sound in the triplet was presented and to end after the last sound in the triplet was presented, thus ensuring that the infant's response during the observation interval could be attributed to a specific stimulus. [Diagram not reproduced; panels: change trial and control trial, each showing the pre-interval, observation-interval, and post-interval periods; horizontal axis: time in seconds.]
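The triplet presentation format can be illustrated with a short sketch. The function name, the seeded generator, and the choice to sample tokens with replacement are all assumptions made for illustration; the source states only that the stimuli were presented randomly and that each was repeated three times.

```python
import random
from typing import List, Optional, Tuple

TALKERS = ("Male", "Female", "Child")
CONTOURS = ("fall", "rise")

Token = Tuple[str, str, str]  # (vowel, talker, pitch contour)


def triplet_sequence(vowel: str,
                     n_stimuli: int,
                     repeats: int = 3,
                     seed: Optional[int] = None) -> List[Token]:
    """Build a random sequence in which each drawn token is repeated
    `repeats` times before the next token is drawn (hypothetical helper)."""
    rng = random.Random(seed)
    # The six tokens of one vowel category: 3 talkers x 2 pitch contours.
    tokens = [(vowel, talker, contour)
              for talker in TALKERS for contour in CONTOURS]
    sequence: List[Token] = []
    for _ in range(n_stimuli):
        token = rng.choice(tokens)  # assumption: sampling with replacement
        sequence.extend([token] * repeats)
    return sequence


background = triplet_sequence("a", n_stimuli=4, seed=0)
for i in range(0, len(background), 3):
    print(background[i:i + 3])
```

Because an observation interval spans exactly one triplet, any head turn within it can be attributed to a single talker and pitch-contour combination.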

RESULTS

Training Phase

The number of trials-to-criterion in the conditioning and initial-training stages of the experiment are listed in the top of Table 5 for all eight infants. The number of trials required to pass the conditioning criteria ranged from 4 to 26 with a mean of 12.0. In the initial-training stage, the number of trials run before infants passed the criterion ranged from 41 to 94 with a mean of 69.8.

Transfer-of-Learning

Overall Performance. The data obtained in the transfer-of-learning phase of the experiment are summarized at the bottom of Table 5 for all eight infants. The percentage of head-turn responses that occurred during all change trials (Hits), the percentage of head-turn responses that occurred during all control trials (False Alarms), and the percent-correct score (% hits + % correct rejections divided by 2) is shown for each infant. The percent-correct scores for individuals ranged from 51.4% correct to 89.8% correct, with a mean of 67.3%.
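As a worked check of the percent-correct measure, the sketch below recomputes the overall scores of the first four infants from the hit and false-alarm percentages reported in Table 5. The function name is hypothetical; the only assumption is that the correct-rejection percentage equals 100 minus the false-alarm percentage.

```python
def percent_correct(hit_pct: float, false_alarm_pct: float) -> float:
    """Average of % hits and % correct rejections (hypothetical helper)."""
    correct_rejection_pct = 100.0 - false_alarm_pct
    return (hit_pct + correct_rejection_pct) / 2.0


# (hits, false alarms) for the first four infants, taken from Table 5.
table_5 = {
    "S1": (89.8, 10.2),
    "S2": (30.6, 19.5),
    "S3": (64.6, 16.7),
    "S4": (25.6, 12.8),
}

for subject, (hits, false_alarms) in table_5.items():
    print(subject, percent_correct(hits, false_alarms))

# An infant who never turns, or who turns on every trial, scores 50%.
print(percent_correct(0.0, 0.0))      # 50.0
print(percent_correct(100.0, 100.0))  # 50.0
```

The last two lines show why 50% is the chance level for this measure: a response bias in either direction cannot raise the score.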

Table 6 lists the percent-correct scores for each of the six stimuli for individual infants. Group performance ranges from 65.2% correct on the training stimulus (Male-fall) and on one other stimulus (Female-rise), to 74.0% correct on the Child-rise stimulus, with average performance across all stimuli at 67.8% correct. A two-factor analysis of variance on the Trial Type (Change vs. Control) and the Stimulus Type (six individual types) with repeated measures on both factors revealed a significant effect for the Trial factor, F(1,7) = 14.72, p < .01, and a nonsignificant effect on the Stimulus factor, F(5,35) = 1.37, p < .25, and no significant interaction.

TABLE 5
The Number of Trials-to-Criterion During the Training Phase of the Experiment for Each Subject (top); the Percent Head Turns on Change and Control Trials, and the Overall Percent-Correct Scores for Each Subject in the Transfer-of-Learning Phase of the Experiment (bottom).

| Measure | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Training phase: Conditioning | 6 | 15 | 4 | 15 | 26 | 7 | 16 | 7 | 12.0 |
| Training phase: Initial training | 75 | 51 | 70 | 78 | 87 | 41 | 62 | 94 | 69.8 |
| Transfer-of-learning: Percent head turns on change trials | 89.8 | 30.6 | 64.6 | 25.6 | 37.8 | 19.2 | 77.8 | 66.0 | 51.4 |
| Transfer-of-learning: Percent head turns on control trials | 10.2 | 19.5 | 16.7 | 12.8 | 33.3 | 16.3 | 6.8 | 20.0 | 17.0 |
| Transfer-of-learning: Overall percent correct | 89.8 | 55.6 | 74.0 | 56.4 | 52.3 | 51.4 | 85.5 | 73.0 | 67.3 |

TABLE 6
Total-Trial Data for Each of the Stimuli Presented During the Transfer-of-Learning Phase of Experiment II, for Each of the Eight Subjects. The Percent-Correct Measure Reflects Performance on Both Change and Control Trials (% Hits + % Correct Rejections ÷ 2).

| Subject | Male, fall | Male, rise | Female, fall | Female, rise | Child, fall | Child, rise | Mean (a) |
|---|---|---|---|---|---|---|---|
| 1 | 100.0 | 85.0 | 83.1 | 81.4 | 91.8 | 98.2 | 89.9 |
| 2 | 29.4 | 70.0 | 66.7 | 60.9 | 33.3 | 64.3 | 54.1 |
| 3 | 90.5 | 81.3 | 65.4 | 60.0 | 82.3 | 78.8 | 76.4 |
| 4 | 55.0 | 41.7 | 53.3 | 73.7 | 54.5 | 66.7 | 57.5 |
| 5 | 37.5 | 55.6 | 55.6 | 46.7 | 50.0 | 72.7 | 53.0 |
| 6 | 37.5 | 55.6 | 57.9 | 42.1 | 52.9 | 63.6 | 51.6 |
| 7 | 94.1 | 81.4 | 82.2 | 90.0 | 85.0 | 81.4 | 85.7 |
| 8 | 77.8 | 71.4 | 78.6 | 67.0 | 81.8 | 66.5 | 73.9 |
| Mean | 65.2 | 67.8 | 67.9 | 65.2 | 66.5 | 74.0 | 67.8 |

(a) The mean scores for each infant in this table do not always agree with the overall percent-correct measure in Table 5 because each of the stimuli listed above was not presented an equal number of times to each infant.

TABLE 7
The Mean Latency of Response for Each Subject on Correct Change Trials (Hits) During Initial Training, During the Transfer-of-Learning Phase, and on the Training Stimulus (Male, fall) During the Transfer-of-Learning Phase of the Experiment.

| Measure | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Hit latency, initial training | 2.8 | 3.0 | 2.7 | 3.2 | 2.5 | 3.1 | 2.9 | 3.2 | 2.9 |
| Hit latency, transfer phase | 1.9 | 3.2 | 2.4 | 3.3 | 2.8 | 3.7 | 2.9 | 3.0 | 2.9 |
| Hit latency, transfer phase, training token | 1.9 | 2.4 | 2.0 | 3.5 | 3.5 | 2.3 | 2.4 | 2.9 | 2.9 |

As Tables 5 and 6 show, performance differed substantially across infants. For four of the infants (S1, S3, S7, S8), overall performance was good, ranging from 73.9% correct to 89.9% correct. For the remaining four infants (S2, S4, S5, S6), performance was poor, ranging from 51.6% correct to 57.5% correct. Performance on the training stimulus (Male-fall) predicted overall performance. The eight infants were divided into two groups based on their performance on the training stimulus; infants who achieved better than 75% correct on the training stimulus were placed in one group, while infants who achieved less than 55% were placed in a second group. Calculations of the average scores for these two groups on each of the six stimuli showed that infants who performed well on the training stimulus (mean = 90.6%) also performed well on each of the other stimuli (mean performance ranged from 74.6% to 85.2%); infants who did not perform well on the training stimulus (mean = 39.9%) did not perform well on any of the other stimuli (mean performance ranged from 47.7% to 66.8%).
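The group means quoted above can be recomputed from the per-stimulus scores in Table 6. The sketch below is illustrative only: the function names are hypothetical, and it assumes a single cutoff of 75% on the training stimulus, with every infant at or below the cutoff assigned to the second group (this places S4, at 55.0%, with the poorer performers, as in the text).

```python
from statistics import mean
from typing import Dict, List, Tuple

STIMULI = ["Male-fall", "Male-rise", "Female-fall",
           "Female-rise", "Child-fall", "Child-rise"]

# Percent-correct scores from Table 6, in the column order of STIMULI.
TABLE_6: Dict[str, List[float]] = {
    "S1": [100.0, 85.0, 83.1, 81.4, 91.8, 98.2],
    "S2": [29.4, 70.0, 66.7, 60.9, 33.3, 64.3],
    "S3": [90.5, 81.3, 65.4, 60.0, 82.3, 78.8],
    "S4": [55.0, 41.7, 53.3, 73.7, 54.5, 66.7],
    "S5": [37.5, 55.6, 55.6, 46.7, 50.0, 72.7],
    "S6": [37.5, 55.6, 57.9, 42.1, 52.9, 63.6],
    "S7": [94.1, 81.4, 82.2, 90.0, 85.0, 81.4],
    "S8": [77.8, 71.4, 78.6, 67.0, 81.8, 66.5],
}


def split_by_training_score(
    scores: Dict[str, List[float]], cutoff: float = 75.0
) -> Tuple[Dict[str, List[float]], Dict[str, List[float]]]:
    """Split infants on their training-stimulus (first column) score."""
    good = {s: row for s, row in scores.items() if row[0] > cutoff}
    poor = {s: row for s, row in scores.items() if row[0] <= cutoff}
    return good, poor


def stimulus_means(group: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean percent correct per stimulus across the infants in a group."""
    columns = zip(*group.values())
    return dict(zip(STIMULI, (mean(col) for col in columns)))


good, poor = split_by_training_score(TABLE_6)
good_means = stimulus_means(good)
poor_means = stimulus_means(poor)

print(sorted(good), sorted(poor))
for name in STIMULI:
    print(f"{name}: good = {good_means[name]:.3f}, poor = {poor_means[name]:.3f}")
```

The per-stimulus means produced this way can be compared directly with the ranges reported in the paragraph above.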

The latencies of response on correct change trials (Hits) for individual infants are shown in Table 7. Latencies during the initial-training phase and the transfer-of-learning phase of the experiment are shown, as well as for the training stimulus during the transfer-of-learning phase. In each case, the average latency is 2.9 sec, and a one-way analysis of variance with repeated measures revealed that the differences among the latency scores were not significant (F = .316; p < .40).

The agreement between the assistant and the experimenter on head-turn responses for Experiment II was 96.2%.

First-Trial Data. Performance was analyzed for each novel stimulus the first time it was introduced. The data obtained for each of the novel stimuli introduced into the comparison category (presented during change trials) and introduced into the background category (presented during control trials) were considered. The group's overall performance on the first trial was significantly above chance, both when performance on the training stimulus was included (68 out of 96, p < .001, binomial test¹), and when performance on the training stimulus was eliminated (53 out of 80, p < .01).

We were most interested, however, in performance on individual stimuli. Infants performed best on the original training stimulus (Male-fall), producing the correct response in 15 out of 16 opportunities (p < .001, by binomial test). Performance on three other stimuli (Female-rise, Female-fall, and Child-fall) was also significantly above chance (p < .05), with 12 out of 16 correct responses.

¹ The probability of being correct on each trial is .5 because if the infant (a) fails to produce a head-turn response on any of the trials, or (b) produces one on every trial, he would be correct only half the time, since to be correct the infant must turn on change trials, but not on control trials. If the infant produces genuinely random head-turning responses, the probability of correct responding would be .5 as well.

Performance on the remaining two stimuli (Male-rise, Child-rise) did not exceed chance.
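The first-trial significance levels can be checked with an exact binomial tail probability under the chance level of .5. The sketch below assumes a one-tailed test (the probability of observing at least the reported number of correct responses), which the source does not state explicitly; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1.0 - p)**(trials - k)
               for k in range(successes, trials + 1))


# (correct responses, opportunities) as reported in the text.
reported = {
    "all stimuli": (68, 96),
    "novel stimuli only": (53, 80),
    "training stimulus (Male-fall)": (15, 16),
    "Female-rise, Female-fall, or Child-fall": (12, 16),
}

for label, (correct, n) in reported.items():
    print(f"{label}: {correct}/{n}, p = {binomial_upper_tail(correct, n):.5f}")
```

Under this one-tailed assumption, 15 of 16 gives p = 17/65536 (about .0003) and 12 of 16 gives p = 2517/65536 (about .038), in line with the reported p < .001 and p < .05.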

Thus there is evidence in the first-trial data for immediate generalization across the talker factor, the one of main interest here. That is, performance was significantly above chance for both of the female's stimuli and for one of the child's stimuli. The two novel stimuli in which performance failed to reach significance (Male-rise and Child-rise) were ones that did not preserve the value of the second dimension (pitch contour) that had been presented during initial training (i.e., a falling pitch contour). This trend was also noted by Kuhl (1979b).

GENERAL DISCUSSION

Most experiments on speech perception in early infancy have focused on the infant's ability to detect a difference between two single speech sounds. The main focus in this experiment was the perception of similarity among discriminably different stimuli representing a particular phonetic category. The task utilized in this experiment required the infant to do two things: perceive the similarity among phonetically equivalent vowel stimuli, while maintaining the perceptual distinction between two vowel categories.

The data reported here suggest that infants trained to discriminate two exemplars, one from each of two vowel categories, correctly transfer the discriminative response to novel stimuli representing the two categories. Support for the claim was produced in two experiments. In the first, the novel stimuli were introduced gradually in a five-stage experiment and performance was measured in terms of the number of trials required to meet a performance criterion. The trials-to-criterion measure demonstrated no significant increases as novel stimuli were introduced. In fact, a trend toward decreasing numbers of trials required to meet criterion at each stage was seen. This savings in the number of trials required at each stage is consistent with the notion that infants recognize the stimuli as members of two different categories. These findings differ from those obtained in studies in which infants are required to learn to respond to stimuli that are not perceptually similar (Hillenbrand, 1980; Kuhl, in press b; Miller, Younger, & Morse, 1982).

Experiment II was designed to allow statistical comparisons of performance for each of the novel vowel stimuli and to eliminate the gradual introduction of novel stimuli. The first-trial data of Experiment II demonstrated that infants immediately generalized the correct response to the novel stimuli produced by different talkers. This is the factor of main concern to the vowel categorization problem. Infants performed significantly above chance upon first exposure to both of the female's novel vowels and to one of the child's novel vowels (Child-fall), but failed to perform significantly above chance on two other stimuli (Male-rise and Child-rise). For both of these stimuli, the level of the second dimension (pitch contour) was not identical to that presented during the training phase (i.e., a falling pitch contour). A similar trend toward better performance on novel stimuli that preserved the value of the pitch dimension that had been represented by the training stimulus was noted by Kuhl (1979b) in the study involving the more easily discriminated /a-i/ contrast. This effect may be due to the separability of the vowel and pitch dimensions in perception by infants (see Kuhl and Miller, 1982, for discussion).

In summary, the data provide strong support for the notion that 6-month-old infants recognize equivalence classes that conform to vowel categories. Given that we have examined the perception of equivalence classes for an easily discriminated vowel pair such as /a/ vs. /i/ (Kuhl, 1979b) and here present data on a difficult vowel pair such as /a/ vs. /ɔ/, we interpret the data as providing evidence that infants preserve vowel "constancy" across transformations in the talker and pitch contour of the vowels, and probably do so for all vowel categories in English. Their ability to render discriminably different events equivalent suggests that they are capable of speech-sound categorization. These studies constitute the first evidence that young infants are capable of categorization for speech.

While we have demonstrated that infants are capable of "sorting" these sounds on the vowel identity dimension, it is nevertheless difficult to specify the sorting rule they use in acoustic terms. We cannot, in fact, identify the exact nature of the information used by adult listeners in the categorization of vowels produced by different talkers. While obviously associated with the locations of the formant frequencies, the information is not presumed to be frequency specific, but rather to involve the perception of similar spectral "patterns" or "shapes" (see Fant, 1973; Harshman, Lundy, & Disner, 1980; and Miller, Engebretson, & Vemula, 1980, for recent work).

Kuhl (1979b) argued that the categorization of vowels like /a/ and /i/ could be explained by the use of a sub-phonemic feature such as the distinction between "compact" and "diffuse" vowels (Jacobson, Fant, and Halle, 1969). The /a/ vowel is compact, meaning that its formant frequencies are spaced closely together, whereas the /i/ vowel is diffuse, meaning that its formant frequencies are widely spread. Infants' recognition of this property could lead to the correct categorization of vowels like /a/ and /i/ without indicating that infants base their perception of equivalence on the recognition of phonetic identity. The importance of the present experiment is that it tests infants' categorization skills for two vowels sharing a sub-phonemic feature. Both /a/ and /ɔ/ are compact. The infant's ability to distinguish these two vowels in a categorization task rules out the possibility that the categories were based on a compact-diffuse sorting rule. Rather, it suggests that infants' vowel categories are based on recognition of the properties underlying phonetic identity.

Infants in this experiment were 6 months old. Is it plausible to argue that experience plays a role in equivalence classification for vowels? Let us consider two ways in which experience could play a role. The first is learning by association. Two factors make such an explanation unlikely. First, the vowels /a/ and /ɔ/ are typically not distinguished phonemically (that is, used to indicate a meaningful difference in real words) in most dialects of American English. But more importantly, even if the vowels were phonemically distinct in the speech heard by these infants, an associative learning explanation would require that infants (a) recognize the association between a particular sound produced by a single talker and an object or event; (b) recognize that the two sounds are associated with two different objects or events; and (c) learn by exposure to different talkers that the sounds made when referencing those same two objects or events are functionally equivalent. While the fields of developmental phonology, linguistics, and cognition have not provided specific timelines for these kinds of behaviors, it is difficult to argue convincingly that such learning accounts for these behaviors.

A second, more general experiential influence that would be interesting to explore is the extent to which the infant's own production of speech, during the babbling phase, might facilitate categorization.² We argued earlier that the infant must solve the categorization problem in order to imitate speech. That is, imitation is presumed to be guided by the infant's perception of the degree to which his own productions match those of a model. Yet, this process of vocal imitation might facilitate speech-sound categorization by focusing the infant's attention on a representation of vowels that reveals the similarities between those produced by his/her own vocal tract and those of a model. We are not arguing on independent grounds that motor representations facilitate auditory equivalence perception, but since categorization studies have not been done on infants prior to babbling, these ideas remain to be explored experimentally.

The infant's ability to organize these stimuli along a particular dimension is not limited to the vowel dimension. In another experiment, Kuhl and Hillenbrand (1979) used these same stimuli and techniques to show that infants can sort these same vowel stimuli on the pitch-contour dimension. In that study, infants were reinforced for making a head-turn response when the speech sample had a rising pitch contour, but not when it had a falling pitch contour. The irrelevant dimensions were vowel identity and talker. We have not examined the infant's ability to use a sorting rule based on the talker dimension with these stimuli, but a recent study using techniques similar to those used here (Miller, Younger, & Morse, 1982) did demonstrate the 6-month-old's ability to distinguish male and female voices and transfer that discriminative response to productions by other male and female talkers.

Furthermore, the recognition of phonetic equivalence is not limited to vowels. Work in our laboratory has examined classes based on consonants such as the fricative distinctions /s/, as in "sip," vs. /ʃ/, as in "ship," and /f/, as in "fin," vs. /θ/, as in "thin" (Kuhl, 1980), as well as nasal (/m/ vs. /n/) consonants (Hillenbrand, 1980). Additional work in our laboratory (Hillenbrand, 1980) has demonstrated the infant's ability to recognize equivalence classes based on a phonetic feature, such as the distinctive feature "plosive" as opposed to "nasal."

² The experience provided by the babbling phase would not be specific to motor practice on these vowels, since /ɔ/ has not been reported to occur in infants at this age (Lieberman, 1980). Rather, it would refer to infants' general knowledge about articulation acquired during the babbling phase.

Thus, infants have demonstrated the ability to recognize auditory equivalence classes, not only at the level of the phonetic unit, as demonstrated in this experiment, but at a more abstract level of representation, that of the phonetic feature. It would be valuable, from a theoretical perspective, to determine both the level of abstractness, and within a particular level, the specific dimensions along which the infant most naturally sorts stimuli. We suspect that some sorting rules for equivalence classification are more salient than others. Since we have first-trial data in only the vowel categorization studies (Kuhl, 1979b, and the data presented here), we cannot claim that in all of the cases tested to date, the recognition of equivalence is as immediate as it appears to be for vowels. Our impressions are that it depends on the diversity and "goodness" of stimuli representing the target dimension, the nature and range of the irrelevant dimensions, and factors related to training and testing the infants (see Kuhl, in press a, for discussion).

We have also noted that as the categories under test become more similar acoustically, and thus more difficult to distinguish, individual differences are more pronounced. In this study, and in previous studies using difficult contrasts (Kuhl, 1980), some infants fail to perform when novel stimuli are introduced, particularly when introduced simultaneously as in the second experiment reported here. In these cases, performance decrements are not attributable to an average drop in the performance of all infants. Rather, as in this experiment, some infants continue to perform very well when novel stimuli are introduced while others perform near chance. Performance for both groups of infants was consistent across stimuli; that is, when infants were divided into a "good-" and a "poor-" performance group based on performance on the training stimulus, they were equally good or poor on the novel stimuli. These differences among the infants were not predicted by their first-trial performance, nor by the speed with which they completed the conditioning or initial-training phases of the experiment. To date, we do not know whether these differences are correlated with any measures of general intelligence, language-learning skills, or other more specific cognitive abilities.

These studies complement and add substantially to existing information concerning infants' perceptual organization of speech. We have employed a design that examines categorization, the ability to render discriminably different events equivalent, and provide the first strong evidence that 6-month-old infants are capable of recognizing the similarity among discriminably different instances representing the same phonetic category. The infant's preservation of phonetic identity over changes in the talker producing the utterance, and the pitch contour employed by that talker, is critical to the development of speech perception and vocal learning.

REFERENCES

Burdick, C. K. The effect of behavioral paradigm on auditory discrimination learning: A literature review. Journal of Auditory Research, 1979, 19, 59-82.

Burdick, C. K. Auditory discrimination learning by the chinchilla: Comparison of go/no-go and two-choice procedures. Journal of Auditory Research, 1980, 20, 1-29.

Carlson, R., Fant, G., & Granstrom, B. Two-formant models, pitch, and vowel perception. In G. Fant & M. A. A. Tatham (Eds.), Auditory analysis and perception of speech. London: Academic Press, 1975.

Carrell, T. D., Smith, L. B., & Pisoni, D. B. Some perceptual dependencies in speeded classification of vowel color and pitch. Perception and Psychophysics, 1981, 29, 1-10.

Delattre, P. C., Liberman, A. M., & Cooper, F. S. Voyelles synthétiques à deux formantes et voyelles cardinales. Maître Phonétique, 1951, 96, 30-36.

Dunn, H. K. Acoustic characteristics of vowels. Proceedings of the Engineering Summer Conference on Automatic Recognition, Ann Arbor, July 1963.

Eimas, P. D. Auditory and linguistic processing of cues for place of articulation by infants. Perception and Psychophysics, 1974, 16, 513-521.

Eimas, P. D. Auditory and phonetic coding of the cues for speech: Discrimination of the /r-l/ distinction in young infants. Perception and Psychophysics, 1975, 18, 341-347.

Eimas, P. D., Siqueland, E. R., Jusczyk, P., & Vigorito, J. Speech perception in infants. Science, 1971, 171, 303-306.

Fant, G. Speech sounds and features. Cambridge, MA: MIT Press, 1973.

Fodor, J. A., Garrett, M. F., & Brill, S. L. Pi ka pu: The perception of speech sounds by prelinguistic infants. Perception and Psychophysics, 1975, 18, 74-78.

Harshman, R. A., Lundy, M. E., & Disner, S. F. "Intelligent" (statistically-guided) algorithms for vowel normalization. Journal of the Acoustical Society of America, 1980, 68, Supplement 1, S 32(A).

Hillenbrand, J. M. Perceptual organization of speech sounds by young infants. Unpublished doctoral dissertation, University of Washington, 1980.

Jakobson, R., Fant, G., & Halle, M. Preliminaries to speech analysis. Cambridge: MIT Press.

Jusczyk, P. W. Infant speech perception: A critical appraisal. In P. D. Eimas & J. L. Miller (Eds.), Perspectives on the study of speech. Hillsdale, NJ: Lawrence Erlbaum Associates, 1981.

Kuhl, P. K. Speech perception in early infancy: The acquisition of speech-sound categories. In S. K. Hirsh, D. H. Eldredge, I. J. Hirsh, & S. R. Silverman (Eds.), Hearing and Davis: Essays honoring Hallowell Davis. St. Louis, MO: Washington University Press, 1976.

Kuhl, P. K. The perception of speech in early infancy. In N. J. Lass (Ed.), Speech and language: Advances in basic research and practice. New York: Academic Press, 1979. (a)

Kuhl, P. K. Speech perception in early infancy: Perceptual constancy for spectrally dissimilar vowel categories. Journal of the Acoustical Society of America, 1979, 66, 1668-1679. (b)

Kuhl, P. K. Perceptual constancy for speech-sound categories in early infancy. In G. Yeni-Komshian, J. Kavanagh, & C. Ferguson (Eds.), Child phonology (Vol. 2): Perception. New York: Academic Press, 1980.

Kuhl, P. K. The perception of speech and sound in early infancy. In P. Salapatek & L. Cohen (Eds.), Handbook of infant perception. New York: Academic Press, in press. (a)

Kuhl, P. K. Categorization of speech by infants. In J. Mehler & R. Fox (Eds.), Infant cognition: Beyond the blooming, buzzing, confusion. Hillsdale, NJ: Lawrence Erlbaum Associates, in press. (b)

Kuhl, P. K. Methods used in the study of infant speech perception. In G. Gottlieb & N. Krasnegor (Eds.), Measurement of audition and vision during the first year of life: A methodological overview. Norwood, NJ: Ablex, in press. (c)

Kuhl, P. K., & Hillenbrand, J. Speech perception by young infants: Perceptual constancy for categories based on pitch contour. Paper presented at the meeting of the Society for Research in Child Development, San Francisco, March 1979.

Kuhl, P. K., & Miller, J. D. Discrimination of auditory target dimensions in the presence or absence of variation in a second dimension by infants. Perception and Psychophysics, 1982, 31, 279-292.

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. Perception of the speech code. Psychological Review, 1967, 74, 431-461.

Lieberman, P. On the development of vowel production in young children. In G. Yeni-Komshian, J. Kavanagh, & C. Ferguson (Eds.), Child phonology (Vol. 1): Production. New York: Academic Press, 1980.

Miller, C., Younger, A., & Morse, P. Categorization of male and female voices in infancy. Infant Behavior and Development, 1982, 5, 143-159.

Miller, J. D., & Bowe, C. A. Roles of the qualities and locations of stimuli and responses in simple associative learning: The quality-location hypothesis. Pavlovian Journal of the Biological Sciences, 1982, 17, 129-139.

Miller, J. D., Engebretson, A. M., & Vemula, N. R. Vowel normalization: Differences between vowels spoken by children, women, and men. Journal of the Acoustical Society of America, 1980, 68, Supplement 1, S 33(A).

Miller, J. L. Interactions in processing segmental and suprasegmental features of speech. Perception and Psychophysics, 1978, 24, 175-180.

Peterson, G. E., & Barney, H. L. Control methods used in a study of the vowels. Journal of the Acoustical Society of America, 1952, 24, 175-184.

Rabiner, L. R. Digital-formant synthesizer for speech synthesis studies. Journal of the Acoustical Society of America, 1968, 43, 822-828.

Shankweiler, D., Strange, W., & Verbrugge, R. Speech and the problem of perceptual constancy. In R. Shaw & J. Bransford (Eds.), Perceiving, acting, and knowing: Toward an ecological psychology. Hillsdale, NJ: Lawrence Erlbaum Associates, 1977.

3 August 1981; Revised 17 March 1982