
SONORITY AND SYLLABICITY: ACOUSTIC CORRELATES OF PERCEPTION

Patti Jo Price+

Abstract. Evidence is accumulating that native speaker-hearers are not as consistent, confident, or in agreement on counting the number of syllables in natural utterances as is commonly assumed (Lebrun, 1966; Bell, 1975; Price, Note 1). There are, however, instances where speaker-hearers give clear, consistent syllable counts. It is the position of this paper that the unclear cases as well as the clear cases are phonetically classifiable in terms of sonority. The experiments presented here are intended to delimit what is meant by sonority in acoustic terms.

The syllable is more often appealed to than defined (see surveys in Bell, 1978; Price, Note 1). The problems arising from attempted definitions are sometimes "explained away" by positing the syllable as a "natural perceptual unit." In this view, native speakers have strong intuitions about syllables, but definitions cannot be developed from these intuitions due to complex interactions of morphology, phonology, orthography and phonetics. However, evidence is accumulating that even this weak claim for the syllable may not hold. Bell (1975) tried a variety of methods in attempting to elicit natural units. Lebrun (1966) asked for syllable counts of short sentences repeated as often as subjects wished. Price (Note 1) asked for syllable counts of very short utterances with dialect background strictly controlled. In all these studies, where the assumption about the intuitive status of the syllable was tested, the results converged: native speaker-hearers were not extremely consistent, confident or in agreement on syllable counts. Moreover, automatic segmenting algorithms tend to fail in areas phonetically similar to those where native speakers are inconsistent or disagree: neighboring segments of roughly the same degree of sonority. Mermelstein (1975) mentions cases of syllabic vs. non-syllabic /r/ or /n/ ("horizon" as Ihraj zanl or "apparently" as l,gpp£rnlil [sic]) and contiguous vowels as in "so 1." The inconsistency of listeners, the lack of agreement across listeners, and the failure of algorithms to match dictionary syllabications do not necessarily imply that the syllable does not exist or is useless. Clear cases exist, and, further, the unclear cases may be taken as evidence that a more flexible definition of the syllable is necessary, i.e., a definition that accounts for both the clear and the unclear cases. Such a definition in terms of sonority peaks will be outlined here. The experiments reported here investigate the acoustic correlates of the perceptual term "sonority."

+Also University of Pennsylvania.

Acknowledgment. I wish to thank Len Katz, Andrea Levitt, Leigh Lisker, and Michael Studdert-Kennedy for discussion, criticism and many helpful comments; and BRSG Grant RR-05596 to Haskins Laboratories for support.

[HASKINS LABORATORIES: Status Report on Speech Research SR-62 (1980)]


The terms "prominence" or "sonority" have been applied to various aspects of speech: as an overall feature of voice quality (see, e.g., Wedin, Leanderson, & Wedin, 1978), as a feature of stress or accent carried by certain syllables (see, among many others, Gaitenby & Mermelstein, 1977), and as a feature of segments forming the internal structure of syllables. Only the latter usage of the terms will be dealt with in the present study. Acoustic correlates for these terms that have been investigated, however, show a great deal of overlap. Most of the studies consider fundamental frequency (absolute value or extent of excursion), intensity, and duration. The relative roles of these acoustic attributes are under debate, perhaps largely because of methodological differences. I know of no studies that have investigated experimentally the acoustic features of sonority for segments smaller than the syllable, although there has been a fair amount of theorizing.

Since Sievers (1893) the internal structure of the syllable has been discussed in terms of sonority, strength or prominence (see, for example, Bloomfield, 1933, p. 120; Venneman, 1972; Hooper, 1976). Bell and Hooper (1978) provide a good survey of cross-linguistic evidence for such hierarchies. The basic notion is that syllabic peaks are peaks of sonority, and that segments increase in sonority before the peak and decrease in sonority after it. This implies that English /l/ is more sonorant than English /p/, because instances of /#plV_/ and /_Vlp#/ occur but not /#lpV_/ or /_Vpl#/. The two latter examples may in fact be realized in English, but only if the /l/ is sonorant enough to be a syllable peak itself, as in "I'll put it away" (UlpUDID~H'lei/), or "people" (/pipl/). The cross-linguistic evidence may be taken to mean that sonority hierarchies are language specific: Russians seem to feel that /#rtV_/ structures are one syllable, although in English a monosyllable of this structure is impossible. If utterance-initial Russian /rtV/ structures are monosyllabic structures, then, in terms of sonority, we are forced to say that Russian /t/ is more sonorant than Russian /r/, and we cannot say the same for English. However, this does not necessarily mean that Russian and English /r/s are of the same sonority, or that sonority has no explanatory value. If an acoustic definition of sonority is developed, the relative sonority of linguistic units from different languages can be compared without reference to the language-specific phonotactics of the two linguistic systems.
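The peak-counting idea in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration: the numeric sonority values, the two scales, and the function name `count_sonority_peaks` are assumptions made for the sketch, not values or procedures taken from the studies cited.

```python
# Illustrative sonority ranks; the numbers are assumptions for this sketch only.
ENGLISH = {"p": 1, "t": 1, "d": 1, "l": 3, "r": 3, "i": 5, "a": 5, "e": 5}
# Hypothetical variant in which /t/ outranks /r/, as the /#rtV_/ argument would require.
RUSSIAN_LIKE = dict(ENGLISH, t=4)


def count_sonority_peaks(segments, scale):
    """Count local sonority maxima in a sequence of segment symbols."""
    values = [scale[s] for s in segments]
    # Collapse runs of equal sonority so that a plateau is treated as one point.
    collapsed = [v for i, v in enumerate(values) if i == 0 or v != values[i - 1]]
    padded = [float("-inf")] + collapsed + [float("-inf")]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


print(count_sonority_peaks("pla", ENGLISH))       # rising onset, one peak
print(count_sonority_peaks("lpa", ENGLISH))       # /l/ forms a peak of its own
print(count_sonority_peaks("rta", ENGLISH))       # two peaks under the English-like scale
print(count_sonority_peaks("rta", RUSSIAN_LIKE))  # one peak if /t/ outranks /r/
```

Under these assumed ranks, the same string receives a different peak count depending on the scale, which is the sense in which a hierarchy inferred from phonotactics is language specific. Note also that two adjacent segments of equal rank collapse into a single point, so the sketch cannot decide the contiguous-vowel cases.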

A fundamental generalization of the sonority hierarchy is that vowels are more sonorant than consonants. A crucial aspect of the hierarchy, however, is that it is useful to divide the set of segments into a richer classification system than one involving only consonants and vowels. Even linguists who make no specific mention of sonority hierarchies may define the syllable in terms of a vocalic nucleus surrounded by consonantal margins (onset and/or coda) (Hjelmslev 1 ; Trager & Bloch, 1941; Trager & Smith, 1951; Hockett, 1958; Greenberg, 1962; Chomsky & Halle, 1968; Studdert-Kennedy, 1976). Peaks of sonority are, in general, vowels; the troughs are generally consonants. Defining syllables in terms of alternations of consonants and vowels works insofar as the classes of consonants and vowels are clear. By examining the cases of clear and unclear vowels, we can outline the classes of clear and unclear sonority peaks, and, hence, predict listener inconsistency, disagreement, or possible problems for automatic algorithms and the intuitions of a native

"Clear," "good" or "prototypical" vowels correspond to prototypical syllabic peaks and are fairly easy to describe in articulatory or in acoustic terms. They are characterized by an open vocal tract, vibrating vocal folds, and relatively long duration. Good consonants or syllabic margins are characterized by the opposite: a constricted or closed vocal tract, interrupted voicing, and relatively short duration (a "transient" as opposed to "steady-state" character of the more audible portions of the articulation). These three factors--degree of opening of the vocal tract (OPENING), glottal source characteristics (VOICING), and degree of transience (RATE OF CHANGE)--are all involved in sonority. All the experiments to be reported in this paper bear on the meaning of sonority and the role it plays in syllabicity judgments. In the acoustic domain, these three factors may correspond to the presence vs. absence of a clear formant structure, voice vs. hiss (or no) excitation source, and steady-state vs. changing patterns. The RATE OF CHANGE characteristic may apply to parameters other than formant structure (fundamental frequency or amplitude, for example), but other aspects will not be specifically investigated here. In the idealized situation, then, chains of syllables are series of vocal tract openings and closings with the open parts (syllabic peaks) corresponding to vowels and the closed parts corresponding to consonants. In the clearest cases this is true, with the exceptions that: (1) there is a tendency to think of the closing and/or opening gesture as constituting the consonantal component (rather than the "closed" part itself), and (2) there is a tendency to think of these opening and closing gestures as organized into discrete consonant-plus-vowel units.

It is important to notice that the characteristics OPENING, VOICING, and RATE OF CHANGE are all relative rather than absolute terms. Furthermore, there are many cases where only one or two of these qualities may appear. For example, insofar as openness of the vocal tract indicates degree of vocalicness, open vowels (say, [a]) are more vowel-like than close vowels (say, [i] or [u]). A number of linguists (e.g., Hockett, 1942; Pike, 1943, pp. 110-111; Jones, 1950, p. 15) treat [j] and [w] as non-syllabic counterparts of [i] and [u]. The orthography chosen may reflect this view: they are often written identically, sometimes with a diacritic added to distinguish them. Close vowels are not only similar to glides (sometimes called "semi-vowels" or "semi-consonants"), but they also risk confusion with segments that are not "semi" but "real" consonants: slight deviation in control of air supply for constricted vowels can produce friction noise similar to that of fricatives (as in, for example, American English "heed your" [hidyr]--[hidZr]). Some vowels are more vowel-like than others with respect to OPENING, VOICING, or RATE OF CHANGE. All three characteristics are a matter of degree. Voicing is a matter of degree both in its relative onset (see, e.g., Lisker & Abramson, 1964) and in the amount of accompanying friction noise (as in, for example, voiced vs. murmured vs. whispered vowels).

Furthermore, these three characteristics are relatively independent. That is, they may differ as indices of how vocalic a particular segment is. For example, in voiceless vowels, the mouth can be very open and steady state portions may be clear, but voicing is absent. Glides represent a case in which the vocal tract is relatively open and voicing is present, but there is a rapid rate of change. There are also cases in which a voiced steady state period occurs when the vocal tract is obstructed, as for nasals, voiced obstruents, and liquids. Voiceless fricatives may have a long steady state period, but rank low on the scales of VOICING and OPENING. In fact, the set of clear vowels or clear consonants is probably smaller than the set of unclear cases.
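The independence of the three characteristics can be made concrete with a small table of segment classes. The sketch below is a deliberate simplification: the text treats OPENING, VOICING, and RATE OF CHANGE as matters of degree, whereas the code reduces each to a yes/no flag. The class name `Profile` and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    open_tract: bool  # OPENING: vocal tract relatively open
    voiced: bool      # VOICING: voiced excitation present
    steady: bool      # RATE OF CHANGE: steady state rather than transient


# Binary ratings paraphrased from the text; real segments vary by degree.
SEGMENTS = {
    "prototypical vowel": Profile(open_tract=True, voiced=True, steady=True),
    "prototypical consonant": Profile(open_tract=False, voiced=False, steady=False),
    "voiceless vowel": Profile(open_tract=True, voiced=False, steady=True),
    "glide": Profile(open_tract=True, voiced=True, steady=False),
    "nasal": Profile(open_tract=False, voiced=True, steady=True),
    "voiceless fricative": Profile(open_tract=False, voiced=False, steady=True),
}


def vowel_like_count(profile):
    """Number of the three characteristics on which the segment is vowel-like."""
    return sum((profile.open_tract, profile.voiced, profile.steady))


def is_clear_case(profile):
    """A clear case agrees on all three characteristics (all or none vowel-like)."""
    return vowel_like_count(profile) in (0, 3)


unclear = sorted(name for name, p in SEGMENTS.items() if not is_clear_case(p))
print(unclear)
```

Even with this coarse coding, four of the six classes are mixed cases, which is in line with the remark that the unclear cases probably outnumber the clear ones.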

When one considers the combinatorial possibilities of these elements, the problem of syllables becomes more complex. In terms of the characteristics of prototypical vowels (OPENING, VOICING, and RATE OF CHANGE), prototypic syllables can be defined as alternations of prototypic vowels and prototypic consonants. This predicts that listeners will agree more on the number of syllables in utterances that consist of alternations of prototypic consonants and vowels than they will on alternations of the less clear cases. Support for this hypothesis is found in Price (Note 1).

The present study considers liquids (English /r/s and /l/s) in /C_V/ position. The degree of openness of the vocal tract cannot be systematically varied for most sounds, since we tend to define classes of phones largely with respect to this aspect. It is possible to vary relative and absolute duration, amplitude, and voice onset time. The present study investigates these aspects of sonority in the case of the pairs plight-polite and prayed-parade. There are many such pairs where the sonority or prominence of the /l/ or /r/ may be all that is needed to keep lexical items distinct: bray-beret, round-around, long-along, set lit-settle it. There are also more ambiguous pairs where it is not clear that there is a distinction at all: hire-higher, aisle-I'll, etc. Assuming that syllabic peaks are peaks of sonority, then increasing the sonority of certain segments of variable sonority should lead to an increase in the number of syllables perceived. Experiment 1 tests the roles of duration and amplitude for their contribution to the sonority of /r/ in natural productions of prayed and parade. Experiment 2 tests voice onset time and the relative roles of voicing, hiss and silence in a synthetic plight-polite continuum. Experiment 3 tests the roles of relative vs. absolute /l/ durations in the same plight-polite continuum.

Relative intensity and relative duration led to the best prediction of perceived syllable stress in Gaitenby and Mermelstein's (1977) study, with the value of intensity [...] that of duration and of fundamental frequency. One might expect some overlap in perception of prominence within and across syllables, but the two are not necessarily identical.

/r/ Duration vs. /r/ Amplitude in Natural Productions of prayed and parade

In this experiment, amplitude and duration of the /r/ portions (defined as the portions where F2 and F3 are close to each other, an acoustic indication of retroflexion) in natural productions of prayed and parade were manipulated by computer editing and presented to naive listeners for labeling.


Measurement Data

Ten productions each of prayed and parade by each of two talkers, one male and one female, were measured. Voice onset time was measured from waveforms, /r/ duration from spectral displays. Amplitude of aspiration and of the /r/ were measured in dB down from the peak by computer analysis.

While the duration of aspiration (VOT) was, on the average, about 10 msec longer for prayeds than for parades, the range of these durations for parade (40-60 msec) was wholly included in the range for prayed (40-80 msec). Thus, it was decided not to manipulate this parameter in the present experiment. The amplitudes of aspiration did not differ significantly either in range or in mean value. /r/ durations did vary significantly: the mean for prayed was found to be 80 msec with a 55 to 110 msec range, and the mean for parade, 145 msec with a range of 110 to 170 msec. While some tokens of parade were apparently pronounced by the male talker as /pəreyd/, as evidenced by spectral displays, amplitude envelopes (two humps in the display), and by listening, all productions by the female talker (and most productions by the male talker) were pronounced /preyd/, with syllabic /r/. The amplitude levels for the /r/s were measured at the /r/ peak in dB down from the peak for that token, where such a peak occurred. Where no such peak occurred, these levels were averaged over 12.6 msec intervals throughout the /r/. This ad hoc procedure may not result in a meaningful measurement. In fact, average amplitude levels by this measure differed by only 1 dB.
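The range comparisons in the measurement data can be checked with two small helper functions. The ranges are those reported above; the helper names `contains` and `overlap_ms` are assumptions of this sketch.

```python
# Reported ranges in msec (low, high), taken from the measurement data above.
VOT_RANGE = {"prayed": (40, 80), "parade": (40, 60)}
R_RANGE = {"prayed": (55, 110), "parade": (110, 170)}


def contains(outer, inner):
    """True if the inner range lies wholly within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlap_ms(a, b):
    """Width in msec of the region shared by two ranges (0 if they only touch)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


print(contains(VOT_RANGE["prayed"], VOT_RANGE["parade"]))  # VOT: parade inside prayed
print(overlap_ms(R_RANGE["prayed"], R_RANGE["parade"]))    # /r/: ranges meet only at 110
```

The VOT ranges are nested, which is the stated reason for leaving VOT unmanipulated in this experiment, whereas the /r/ duration ranges share only their common endpoint of 110 msec.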

Stimuli

In this experiment /r/ duration and amplitude were altered independently. Based on the measurement data, an "average" token for each of the source words was selected from among the productions of the male talker, since his formants were easier to track and measure, and his pitch periods made it easier to extend the /r/ by pitch pulse iteration without disturbing the naturalness of the tokens. From the parade chosen, /r/ duration was shortened by deleting pitch pulses after onset of voicing. Deleting 30 msec resulted in a derived stimulus with a 115 msec /r/. Similarly, deleting 60 msec resulted in a derived stimulus with an 85 msec /r/. The most drastically shortened stimulus, then, had an /r/ duration roughly equal to the mean for the set of prayeds. Amplitude of the /r/ portion was decreased by 6 dB for the original and for the remaining portion of the /r/ in the shortened versions. For the prayed chosen, /r/ duration was increased by iteration of the first pitch pulse. Two stimuli were derived in this fashion, one with /r/ duration increased by 30 msec (110 msec total /r/) and one with /r/ duration increased by 60 msec (140 msec total /r/). Adding 60 msec to the original token created a stimulus whose /r/ duration (140 msec) was roughly equal to the mean value for tokens of parade (145 msec). For each of these three /r/ durations two amplitude levels were used: the original, and 6 dB up from the original. Four tokens of each stimulus appeared in a randomized test sequence on separate tapes for each source word.
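The stimulus design just described amounts to a small factorial grid, which the following sketch enumerates. The durations and dB steps come from the text; the dictionary layout and function names are assumptions, and the conversion treats the 6 dB change as an amplitude (not power) ratio, which is also an assumption.

```python
from itertools import product

# /r/ durations and manipulation steps as given in the text.
SOURCES = {
    "parade": {"r_ms": 145, "duration_steps_ms": (0, -30, -60), "amp_steps_db": (0, -6)},
    "prayed": {"r_ms": 80, "duration_steps_ms": (0, 30, 60), "amp_steps_db": (0, 6)},
}


def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a linear amplitude ratio (20 * log10 convention)."""
    return 10 ** (db / 20)


def build_stimuli():
    """Enumerate every duration x amplitude combination for both source words."""
    stimuli = []
    for source, spec in SOURCES.items():
        for d_step, a_step in product(spec["duration_steps_ms"], spec["amp_steps_db"]):
            stimuli.append(
                {"source": source, "r_ms": spec["r_ms"] + d_step, "amp_db": a_step}
            )
    return stimuli


stimuli = build_stimuli()
print(len(stimuli))                         # 2 sources x 3 durations x 2 amplitudes
print(round(db_to_amplitude_ratio(6), 3))   # a 6 dB rise roughly doubles amplitude
```

On this reading, each source word contributes six stimuli, and with four tokens of each a tape contains 24 items.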

The subjects were 12 paid volunteers (all Yale undergraduates) who were asked to listen to the tapes twice, in counterbalanced order, once to count syllables and once to identify words by circling either "prayed" or "parade" on prepared answer sheets.

[Figure 1 plot: /r/ DURATION & AMPLITUDE, N = 96. Ordinate: percent "one syllable" or "prayed" responses (0 to 100). Abscissa: /r/ duration: short (80-85 ms), medium (110-115 ms), long (140-145 ms). Curves: source "prayed", (1) original amplitude, (2) /r/ +6 dB; source "parade", (3) original amplitude, (4) /r/ -6 dB.]

Figure 1. N = 96. "Short," "medium," and "long" refer to /r/ durations. The "short" condition corresponds to prayeds with original /r/ duration and to parades with 60 msec of the /r/ deleted. The "medium" condition corresponds to /r/ lengthened or shortened by 30 msec for sources prayed and parade respectively. The "long" condition refers to parade of original /r/ duration and to prayed with /r/ duration increased by 60 msec. Represented in plots 1 and 2 are "prayed" responses to stimuli derived from source prayed. Plots 3 and 4 correspond to source parade. Plots 1 and 3 correspond to original /r/ amplitude levels, plots 2 and 4 to manipulated /r/ amplitude levels. Note that /r/ duration has a decisive effect on labelings. Amplitude may have some effect for the "medium" condition.

Results and Discussion

The responses resulting from these two tasks (syllable counting and word identification) did not differ significantly: "one syllable" responses correspond to "prayed" responses to within 6% for every source stimulus. The two tasks combined yield an N of 96. Figure 1 plots percent one syllable or "prayed" responses vs. /r/ duration for both source words, and for both amplitude levels. It is clear from this figure that /r/ duration has a decisive effect on listener judgments for both source words. It appears that the effects of amplitude are negligible for short and for long /r/ durations. The case of the "medium" durations is less simple: for source prayed, amplitude affected judgments significantly, but this was not the case for source parade. This asymmetry may indicate a general difference between productions of prayed and productions of parade, at least for this talker, or it may be due to token-specific differences. Although an "average" token of each source word was chosen, these were natural productions and, hence, differ along many uncontrolled dimensions. In any case, the set of "medium" duration stimuli supports the conclusion that duration is a more effective cue than amplitude: when the duration used resulted in judgments split between the two words (curve 1), amplitude had a significant effect (curve 2); when, however, the duration used resulted in judgments strongly in favor of one word or the other ("parade," in this case, curve 3), amplitude had little effect (curve 4). The measurement data indicate that absolute /r/ durations may well be ambiguous as indices of prayed vs. parade in natural productions: the longest /r/ of prayed was of the same duration as the shortest /r/ for parade. When the words are embedded in sentences, it is likely that the ranges of the /r/ durations for the two source words will overlap. In sum, duration of /r/ for these words seems to be a sufficient cue to their distinction. Amplitude may play a role where this cue is neutralized. While more open vowels are generally louder (of higher amplitude level) than less open vowels, differences in formant frequencies, or vowel color, are also generally involved. If sonority is considered in articulatory terms, then the rather small effect of amplitude is reasonable, given that spectral information was unchanged. The independent testing of amplitude and spectral information as they relate to the openness of the vocal tract is left for future research.
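The N of 96 per data point follows from the design figures stated for this experiment (12 subjects, four tokens of each stimulus, two tasks pooled). A minimal sketch of that bookkeeping, with hypothetical function names and a purely illustrative response count, is:

```python
def responses_per_point(subjects, tokens_per_stimulus, tasks=1):
    """Number of labeling responses pooled into one plotted data point."""
    return subjects * tokens_per_stimulus * tasks


def percent(count, n):
    """Express a response count as a percentage of n."""
    return 100.0 * count / n


n_exp1 = responses_per_point(subjects=12, tokens_per_stimulus=4, tasks=2)
print(n_exp1)

# Hypothetical count, for illustration only: 48 "prayed" responses out of 96.
print(percent(48, n_exp1))
```

Pooling the two tasks in this way is justified in the text by the close agreement between syllable counts and word labels.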

Voice Onset Time in stimuli

The measurement data for the prayeds and parades revealed consistently longer mean values of voice onset time (VOT) for prayeds than for parades for both talkers. Although the ranges of these values overlap heavily for both talkers, the mean difference is 10 msec for the male talker, 20 msec for the female. Further, the longer VOTs are correlated with shorter /r/ durations. The situation is not entirely parallel to that of voiced vs. voiceless stops in initial position: (1) the duration of the segment following the initial stop (as well as VOT value) serves to distinguish "prayed"-"parade," "plight"-"polite," etc., but not initial /bdg/ vs. /ptk/; (2) the differences in VOT correspond to differences not in the voicing of the initial stop but in the syllabicity of the following liquid; and (3) increases in VOT are not necessarily correlated with increases in formant frequency onset values, since the liquid may be steady state throughout a wide range of VOT values.

However, both situations involve coordination of vocal tract opening and the onset of voicing. In other words, sonority is not merely a matter of opening and closing the vocal tract, but of vocal tract dynamics and their interaction with laryngeal control. Thus, a continuum that switches judgments from one to two syllables based on VOT alone is evidence that the perceptual significance of the relative timing of vocal tract gesture and laryngeal pulsing, as evidenced by VOT, is generalizable beyond the class of initial stop consonants.

The stimuli for this experiment were prepared on the OVE-3 synthesizer at Haskins Laboratories. Stimuli perceived as "plight" and "polite" were created. Spectrograms of endpoint stimuli are shown in Figure 2. In order to avoid the intrusion of initial /b/ percepts, the shortest VOT used was 49 msec. VOT was increased from 49 msec to 126 msec in 7 msec steps. The increase in VOT decreased the buzz-excited steady state /l/ duration from 91 msec to 14 msec, thereby increasing the hiss-excited steady state /l/ from 14 to 91 msec. A similar set of stimuli was created in which silence replaced the hiss between initial burst and voicing onset. These stimuli are probably less representative of actual articulations than the first set, but they do permit the investigation of the effect of hiss vs. silence. It is reasonable to use these stimuli, since hiss may not always be audible in speech contexts, and because they in fact result in a convincing "plight"-"polite" continuum. Four randomizations of these 24 stimuli were presented to 13 paid volunteers (Yale undergraduates) for labeling as "polite" or "plight." The graphs in Figure 3 thus represent N = 48.
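The trade-off between VOT and voiced /l/ duration along the continuum can be tabulated as follows. The endpoint values (VOT 49 to 126 msec, voiced /l/ 91 to 14 msec, hiss-excited /l/ 14 to 91 msec) are from the text. The 7 msec step is an inference rather than a legible figure in the source: it is the step that yields 12 stimuli per condition, and hence 24 over the hiss and silence conditions. The constant names are likewise assumptions of this sketch.

```python
VOT_PLUS_VOICED_MS = 140  # implied by 49 + 91 and 126 + 14
HISS_PLUS_VOICED_MS = 105  # implied by 14 + 91 at both endpoints


def continuum(vot_start=49, vot_end=126, step=7):
    """Return (VOT, voiced /l/, hiss-excited /l/) in msec for each stimulus."""
    rows = []
    for vot in range(vot_start, vot_end + 1, step):
        voiced = VOT_PLUS_VOICED_MS - vot
        hiss = HISS_PLUS_VOICED_MS - voiced
        rows.append((vot, voiced, hiss))
    return rows


rows = continuum()
for vot, voiced, hiss in rows:
    print(f"VOT {vot:3d}  voiced /l/ {voiced:2d}  hiss /l/ {hiss:2d}")
print(len(rows) * 2, "stimuli over the hiss and silence conditions")
```

In the silence condition the same timing applies, with the hiss interval simply left unexcited.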

In Figure 3 it is seen that VOT is an effective cue to the plight-polite distinction. Further, it appears to make a difference whether the period between burst and onset of voicing is noise-filled or silent: subjects, on the average, need about 16 msec longer voiced steady state /l/ to hear polite vs. plight when silence replaces hiss in this interval. The steady state portion of the /l/ is crucial to hear polite vs. plight, but the voiced part is more critical than the voiceless part. That is, the total steady state /l/ is not the critical factor here: all stimuli have the same duration in this respect. What appears to be critical is the overall sonority of the /l/. As is shown here, duration of voicing of the /l/ effectively switches judgments from "plight" to "polite" in both the hiss and the silent conditions. If, however, hiss is present between burst and voicing onset, the cross-over is realized with a shorter voiced /l/ duration (about 16 msec shorter) than if this interval is silent. This suggests a sonority hierarchy of voicing over hiss over silence.
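One common way to locate a cross-over point is linear interpolation between the two stimuli that straddle the 50% level. The sketch below illustrates that calculation only; the response percentages are invented for the example and are not the data of Figure 3, and the source does not state how its cross-over values were obtained.

```python
def crossover(x, pct, criterion=50.0):
    """Interpolate the x value at which the response curve crosses the criterion."""
    points = list(zip(x, pct))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 != p1 and (p0 - criterion) * (p1 - criterion) <= 0:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    return None  # the curve never crosses the criterion


# Hypothetical percent "plight" responses by voiced /l/ duration (msec).
voiced_ms = [14, 28, 42, 56, 70, 84]
hiss_pct = [100, 90, 60, 20, 5, 0]
silence_pct = [100, 100, 90, 60, 20, 5]

hiss_x = crossover(voiced_ms, hiss_pct)
silence_x = crossover(voiced_ms, silence_pct)
print(hiss_x, silence_x, silence_x - hiss_x)
```

With these made-up curves the silence condition needs a longer voiced /l/ to reach the cross-over, which is the direction of the effect reported above.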


Figure 2. SPECTROGRAMS OF SYNTHETIC STIMULI. At the left are the endpoints of the hiss condition of the synthetic "plight"-"polite" continuum. At the right are the endpoints for the silence condition of this continuum. The two displays at the top represent the stimuli with longest VOT values, and, hence, shortest voiced /l/ duration. The displays at the bottom represent the shortest VOT values used, which correspond to the longest duration of the voiced steady state /l/.

[Figure 3 plot: VOT OR DURATION OF VOICED /l/, N = 48. Ordinate: percent "plight" responses (0 to 100). Abscissa: voiced /l/ in msec, with VOT in msec in parentheses: 14 (112), 28 (98), 42 (84), 56 (70), 70 (56), 84 (42), 98. Solid line: hiss; dotted line: silence.]

Figure 3. N = 48. The solid line indicates percent "plight" responses to stimuli in which the interval between burst and voicing onset was hiss filled. The dotted line represents the condition in which silence filled this interval. The abscissa is labeled with duration of voiced /l/. Underneath the /l/ duration figures, the corresponding VOT values appear in parentheses. All stimuli have the same steady state /l/ duration (91 msec); they differ in the point at which formant excitation is switched from hiss to buzz. Note that the longer the voicing of the /l/ (i.e., the shorter the VOT), the more "polite" responses elicited (i.e., fewer "plight" responses). Further, the duration of voicing of the /l/ at the "plight"-"polite" cross-over point is about 16 msec longer when silence rather than hiss is present in the interval between burst and voicing onset. This suggests that, with respect to duration, voicing is more effective than hiss in cueing "polite" rather than "plight," and that hiss, in turn, is more effective than silence.


Relative vs. Absolute Duration in Synthetic Stimuli

Stimuli and Subjects

The third and final experiment to be reported here involves the issue of rate, or absolute vs. relative durations. In this experiment two factors are pitted against each other: the absolute duration of the steady state /l/ and the overall rates of the stimuli. Stimuli similar to those used in the hiss condition of Experiment 2, but with larger step sizes, were used in this experiment under four conditions:

(1) ORIGINAL: stimuli of Experiment 2 (hiss condition) with VOT varied from 42 to 126 in 14 msec steps (voiced /l/ duration thereby decreasing from 84 to 0 msec);

(2) EXTENDED: /l/ duration of the stimuli increased by 35 msec, thus adding two new stimuli;

(3) FAST: stimuli of condition (1) played out at a 40% faster rate, and

(4) EXTENDED FAST: stimuli of condition (2) played out at a 40% faster rate.

Conditions (1) and (4) thus represent stimuli with the same absolute /l/ duration (90 msec), but played out at different rates. Conditions (1) and (3), on the other hand, have the same /l/ durations relative to the duration of the entire stimulus. Likewise, stimuli in conditions (2) and (4) have /l/ durations of the same percentage of overall duration, though the two sets of stimuli are played out at different rates. Figure 4 shows spectrograms of the stimuli used for the shortest VOT value in the EXTENDED and EXTENDED FAST conditions. The step sizes were 10 msec in the fast conditions, 14 msec in the other two conditions. The extended conditions thus involved two more stimuli than the unextended conditions. Three tokens of the stimuli were presented to 10 paid volunteers (Yale undergraduates): N = 30.
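The arithmetic behind the four conditions can be sketched as follows, on the assumption that playing a stimulus 40% faster divides every duration by 1.4. The 91 msec steady state /l/ and the 35 msec extension are from the text; the overall stimulus duration of 400 msec is a hypothetical value used only to show that relative durations are unaffected by rate.

```python
RATE = 1.4            # "40% faster" playback
ORIGINAL_L_MS = 91    # steady state /l/ of the Experiment 2 stimuli
EXTENSION_MS = 35     # added /l/ duration in the EXTENDED conditions
TOTAL_MS = 400        # hypothetical overall stimulus duration, for illustration only


def at_rate(duration_ms, rate=1.0):
    """Duration after uniform playback at the given rate."""
    return duration_ms / rate


conditions = {
    "ORIGINAL": at_rate(ORIGINAL_L_MS),
    "EXTENDED": at_rate(ORIGINAL_L_MS + EXTENSION_MS),
    "FAST": at_rate(ORIGINAL_L_MS, RATE),
    "EXTENDED FAST": at_rate(ORIGINAL_L_MS + EXTENSION_MS, RATE),
}

for name, l_ms in conditions.items():
    print(f"{name:14s} /l/ = {l_ms:6.1f} msec")

# Uniform rate change leaves the /l/ share of the whole stimulus unchanged.
rel_original = conditions["ORIGINAL"] / at_rate(TOTAL_MS)
rel_fast = conditions["FAST"] / at_rate(TOTAL_MS, RATE)
print(round(rel_original, 4), round(rel_fast, 4))

# The 14 msec step becomes about 10 msec in the fast conditions.
print(round(at_rate(14, RATE), 1))
```

On these figures the EXTENDED FAST /l/ comes out at about 90 msec, close to the 91 msec of the ORIGINAL condition, which is what allows conditions (1) and (4) to be compared at roughly equal absolute duration.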

Results and Discussion

Figures 5a and 5b show "plight" responses as a function of duration of voiced /l/ expressed as a percentage of overall duration. VOT and voiced /l/ duration are here inversely correlated: longer VOT values correspond to shorter voiced /l/ durations. Note that for both the original (Figure 5a) and the extended (Figure 5b) stimuli, an increase in overall rate elicits more "plight" responses. That is, when relative durations are equated, an increase in rate does affect listener judgments. The effect of absolute duration is shown in Figure 6. In this figure "plight" responses are plotted as a function of the absolute duration of voiced /l/. It thus appears that, other things being equal, the absolute duration of the voiced portion of the /l/ has a greater effect on listener judgments than its relative duration, at least for the rates and the durations used here. Further research may reveal that for certain ranges of absolute duration, the relative duration of a segment


Figure 4. SPECTROGRAMS OF SYNTHETIC STIMULI USED IN EXPERIMENT 3. These displays represent the shortest VOT (longest voiced /l/) stimuli used for the EXTENDED and the EXTENDED FAST conditions.


[Figure 5 plot: RELATIVE DURATION OF VOICED /l/. Left panel (ORIGINAL): curves for ORIGINAL and FAST. Right panel (EXTENDED): curves for EXTENDED and EXTENDED FAST. Ordinate: percent "plight" responses. Abscissa: voiced /l/ as percentage of overall duration, with VOT in msec in parentheses.]

Figure 5. Figure 5a (left) and Figure 5b (right). N = 44. Plotted here are "plight" responses as a function of voiced /l/ duration. (VOT values are in parentheses.) /l/ durations are expressed here in percentages of overall stimulus duration. Note that for the same relative durations (either for original or extended stimuli), the faster overall rates elicit more "plight" responses.


[Figure 6 plot: ABSOLUTE DURATION OF VOICED /l/. Curves for ORIGINAL, EXTENDED, FAST, and EXTENDED FAST. Ordinate: percent "plight" responses. Abscissa: msec of voiced /l/.]

Figure 6. N = 44. The same data presented in Figure 5 are plotted here as a function of absolute /l/ duration. As is seen in this figure, for the /l/ durations and rates used, absolute rather than relative duration seems to be the crucial factor in listener judgments. The cross-over point here is 55 to 65 msec voiced /l/. This is consistent with the value found in Experiment 2.


with respect to its surround makes a difference, but these data so far indicate a different story: that absolute duration is a more effective indicator of sonority than is relative duration.
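A cross-over point such as the one reported for Figure 6 is the duration at which "plight" and "polite" responses are equally frequent. The sketch below shows one way such a point could be estimated from an identification function, by linear interpolation between adjacent stimuli. It is an assumption for illustration, not the procedure used in this study, and the function name and 50% criterion are hypothetical.

```python
def crossover(durations_ms, percent_plight, criterion=50.0):
    """Estimate the duration at which responses cross `criterion` percent.

    Interpolates linearly between the first pair of adjacent stimuli whose
    response percentages straddle the criterion. Returns None if the
    function never reaches or crosses it.
    """
    points = list(zip(durations_ms, percent_plight))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 == criterion:
            return float(d0)
        if (p0 - criterion) * (p1 - criterion) < 0:
            # Straight-line interpolation between the two straddling points.
            return d0 + (criterion - p0) * (d1 - d0) / (p1 - p0)
    if points and points[-1][1] == criterion:
        return float(points[-1][0])
    return None
```

With purely made-up values (not data from these experiments), crossover([0, 10, 20, 30], [100, 75, 25, 0]) falls midway between the second and third stimuli.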

CONCLUSIONS

If it is assumed that the auditory term "sonority" can account for the syllabic vs. non-syllabic distinction of certain segments, then this study provides evidence for acoustic correlates of this term. Sonority, like other perceptual terms such as "pitch" or "loudness," has multidimensional acoustic correlates. The experiments presented here bear on the roles of duration, amplitude, voicing, hiss, and silence as they relate to sonority. The results of these experiments support the following hypotheses:

(1) duration is a more effective cue to sonority than is amplitude,

(2) amplitude may play a role when duration is ambiguous,

(3) when duration is manipulated, voiced segments tend to be more sonorant than hiss-excited segments, which in turn appear more sonorant than silence,

(4) absolute duration is more important to perceived sonority than relative duration.

Acoustic or auditory correlates have been proposed (but not tested) for the perception of syllabic peaks vs. margins. Fischer-Jørgensen (1975) suggested that liquids are auditorily weaker than vowels since most of their energy is concentrated in the first formant. Fant (1969/1973) suggested a weighted sum of the intensities of F1 and F2 compared to that in adjoining segments. Gaitenby and Mermelstein's (1977) weighting function, which favors the frequencies between 500 and 4,000 Hz, implies a similar acoustic-auditory emphasis, although in this case the weighting is done in order to analyze syllabic stress rather than internal syllable structure. Left for further research is the implementation of these suggestions in a test of the hypotheses proposed in the present study.

REFERENCE NOTE

1. Price, P. The [title illegible]. Unpublished M.A. paper, University of Pennsylvania, 1978.

REFERENCES

Bell, A. If native speakers can't count syllables, what can they do? Bloomington, Ind.: Indiana University Linguistics Club, 1975.

Bell, A. Segment organization phenomena and their explanations. In A. Bell & J. Hooper (Eds.), Syllables and segments. New York: North Holland, 1978.


Bell, A., & Hooper, J. Issues and evidence in syllabic phonology. In A. Bell & J. Hooper (Eds.), Syllables and segments. New York: North Holland, 1978.

Bloomfield, L. Language. New York: Holt, 1933.

Chomsky, N., & Halle, M. The sound pattern of English. New York: Harper and Row, 1968.

Fant, G. Distinctive features and phonetic dimensions. In Speech sounds and features. Cambridge, Mass.: M.I.T. Press, 1973. (Originally published, 1969.)

Fischer-Jørgensen, E. Trends in phonological theory. Copenhagen: Akademisk Forlag, 1975.

Gaitenby, J., & Mermelstein, P. Acoustic correlates of perceived prominence in unknown utterances. Haskins Laboratories Status Report on Speech Research, 1977, SR-49, 201-21[?].

Greenberg, J. Is the vowel-consonant dichotomy universal? Word, 1962, 18, 73-81.

Hjelmslev, L. The syllable as a structural unit. In Proceedings of the Third International Congress of Phonetic Sciences, 1938, 266-272.

Hockett, C. A system of descriptive phonology. Language, 1942, 18, 3-21.

Hockett, C. A course in modern linguistics. New York: Macmillan, 1958.

Hooper, J. An introduction to natural generative phonology. New York: Academic Press, 197[?].

Jones, D. The phoneme. Cambridge, England: Heffer, 1950.

Lebrun, Y. Sur la syllabe, sommet de sonorité. Phonetica, 1966, 14, 1-15.

Lisker, L., & Abramson, A. A cross-linguistic study of voicing in initial stops. Word, 1964, 384-422.

Mermelstein, P. Automatic segmentation of speech into syllabic units. Journal of the Acoustical Society of America, 1975, 58, 880-883.

Pike, K. Phonetics. Ann Arbor, Mich.: University of Michigan Press, 1943.

Sievers, E. Grundzüge der Phonetik. Leipzig: Breitkopf & Härtel, 1893.

Studdert-Kennedy, M. Speech perception. In N. Lass (Ed.), Contemporary issues in experimental phonetics (Chap. 8). New York: Academic Press, 197[?].

Trager, G., & Bloch, B. The syllabic phonemes of English. Language, 1941, 17, 223-246.

Trager, G., & Smith, H., Jr. An outline of English structure. Studies in Linguistics: Occasional Papers 3. Norman, Ok.: Battenburg Press, 1951.

Venneman, T. On the theory of syllabic phonology. Linguistische Berichte, 1972, 18, 1-18.

Wedin, S., Leanderson, R., & Wedin, L. Evaluation of voice training. Folia Phoniatrica, 1978, 103-112.
