Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report
Swedish word accent as a function of word length
Alstermark, M. and Erikson, Y.
journal: STL-QPSR, volume: 12, number: 1, year: 1971, pages: 001-013
http://www.speech.kth.se/qpsr
STL-QPSR 1/1971
SPEECH ANALYSIS
A. SWEDISH WORD ACCENT AS A FUNCTION OF WORD LENGTH*
M. Alstermark and Y. Erikson
1. Problem
The aim of the study is to elucidate the realization of acute and grave accent in Swedish, in particular to study the fundamental frequency correlates of the numbers which are traditionally used to designate stress levels and accent type. What is the role of word length, i.e. the number of syllables?
2. Preliminary experiment
The results of a pilot experiment on the grave accent as a function of word length indicated that the maximum values in the F0 curve showed a systematic variation in their temporal positioning and that these values did not consistently occur at the temporal midpoint of the vowel. Since this fact, to our knowledge, had not been previously reported in the literature, we considered it an interesting object for further study.
3. Speech material
3.1 Model words. The word material is composed of 10 words with acute and 10 words with grave accent. The words with acute accent are mono-, di-, tri- and quadrisyllabic with varying stress patterns, which include the position of the stressed vowel in the first, second, third or fourth syllable (Appendix). These words were chosen so that the stressed vowel is always phonologically long and the unstressed vowels are always phonologically short.
3.2 Nonsense words. Starting with each model word as a reference, a nonsense word with the same stress pattern as the model word was constructed. The stressed vowel was replaced with [a:] and the unstressed vowels with [a]. The choice of this particular structure for the nonsense words can be motivated by the desire to control a number of irrelevant variables:
(1) The choice of the vowels can be motivated by the requirement that the duration of the vowel segments in the nonsense words should correspond to that of the model words. By consistently using the same vowel, the variation in inherent fundamental frequency among the vowels of the model words was eliminated (Lehiste 1970, p. 18).
* Thesis work associated with the Department of Phonetics, Stockholm University
(2.a) The choice of voiced plosives resulted in the exclusive use of voiced segments. This produced a fairly even and continuous F0 curve, which was appropriate for the fundamental frequency measurements.
(2.b) The plosive facilitated segmentation. Furthermore, it seemed desirable to counteract the effect that a voiceless plosive has on a following vowel, i.e. a peak F0 value immediately after the consonant (Lehiste 1970, p. 71). The most appropriate segment here would have been a sonorant, since it would have the least effect on the F0 curve. A [b] was chosen, however, for segmentation reasons. Both model words and nonsense words were set in the frame "Ät ... till lunch" (Eat ... for lunch). The segments immediately preceding and immediately following the nonsense words are therefore voiceless (this was done with the subsequent segmentation of the mingograms in mind).
4. Recording technique
Each word pair, consisting of a model word and a corresponding nonsense word in the above-mentioned frame, was recorded five times in random order with normal vocal effort by a phonetically trained male talker. The recording took place in the sound-proof room at the Department of Linguistics, Stockholm University. A Nagra tape recorder with a tape speed of 7½ in/sec was employed, and a microphone distance of 25 cm was maintained.
5. Analysis
5.1 Mingograms. Mingograms of the recorded material were made. The paper speed was 50 mm/sec and the low-pass filter cutoff frequency was 64 Hz, which meant an integration time of approximately 7.9 msec.
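Since the paper speed is fixed, distances read off the mingogram convert directly into durations. A minimal sketch, assuming only the 50 mm/sec paper speed stated above; the function names and the 17.2 mm reading are hypothetical:

```python
# Mingogram paper speed stated in 5.1; everything else here is illustrative.
PAPER_SPEED_MM_PER_S = 50.0


def mm_to_ms(distance_mm: float) -> float:
    """Convert a distance measured along the paper into milliseconds."""
    return distance_mm * 1000.0 / PAPER_SPEED_MM_PER_S


def ms_to_mm(duration_ms: float) -> float:
    """Convert a duration in milliseconds into millimetres of paper."""
    return duration_ms * PAPER_SPEED_MM_PER_S / 1000.0


if __name__ == "__main__":
    # A 10 msec interval expressed as a distance on the paper.
    print(ms_to_mm(10.0))
    # A hypothetical reading of 17.2 mm from the reference point.
    print(mm_to_ms(17.2))
```

At this speed one millimetre of paper corresponds to 20 msec, so an accuracy of ±10 msec amounts to reading positions to within about half a millimetre.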
5.2 Segmentation. The duplex oscillogram was segmented. Points were chosen at the boundaries between vowels and consonants and at the temporal midpoint of each vowel segment. Vertical lines were drawn from these points to the zero line of the F0 curve; this was possible because the time delay associated with the integration time is negligible. The fundamental frequency curve was then drawn in on these lines.
5.3 Measurements. Only the nonsense words were considered in the measurements. The material, then, consists of 100 words: 10 acute and 10 grave, with five observations of each word. In spite of the precautionary measures taken in the construction of the nonsense words, difficulties arose in identifying the explosion of the [b] in the first segment. The point of reference (t-ref) for the measurement of segment length in a word like [ba:bab] is therefore the point at which voicing first appears, and each segment is measured from this point of reference. The accuracy of the measurements is in the neighborhood of ±10 msec. Lehiste points out that a finer accuracy of measurement would fall below the perceptual threshold for discriminating between sounds of different duration, which is 10-40 msec. Fundamental frequency was measured with an accuracy of ca ±5 Hz.
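The measuring convention above, with every segment timed from the first appearance of voicing, can be sketched as follows. This assumes that "t-ref" in the tables denotes that reference point; the helper names and the durations in the example are hypothetical, not the authors' data.

```python
from itertools import accumulate


def segment_times(segments):
    """Return (label, start_ms, end_ms) for each segment, timed from t-ref.

    segments: (label, duration_ms) pairs in temporal order; the first segment
    is assumed to start at t-ref, the point where voicing first appears.
    """
    ends = list(accumulate(duration for _, duration in segments))
    starts = [0] + ends[:-1]
    return [(label, start, end)
            for (label, _), start, end in zip(segments, starts, ends)]


def vowel_midpoints(segments, vowels=("a", "a:")):
    """Temporal midpoints (ms from t-ref) of the vowel segments."""
    return [(label, (start + end) / 2)
            for label, start, end in segment_times(segments)
            if label in vowels]


if __name__ == "__main__":
    # Hypothetical durations for one token of [ba:bab] (not measured values).
    token = [("b", 60), ("a:", 220), ("b", 80), ("a", 90), ("b", 70)]
    print(segment_times(token))
    print(vowel_midpoints(token))
```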
6. Results
The values for segment length and fundamental frequency were tabulated (see Appendix). The averages of the five observations of each word were computed. From these averages, an average curve was drawn for each word by plotting the values for F0 and segment length at the segmental boundaries and then drawing straight lines between these points. With the help of the tables and the average curves, the following observations could be made (see Figs. I-A-1 to I-A-4).
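The averaging and curve-drawing procedure can be sketched in a few lines, assuming each token has been measured at the same sequence of points. The three tokens below are invented, and the function names are illustrative:

```python
from statistics import mean


def average_curve(observations):
    """Average repeated tokens point by point.

    observations: one list per token, each holding (time_ms, f0_hz) pairs
    measured at the same sequence of segment boundaries.
    """
    return [
        (mean(token[i][0] for token in observations),
         mean(token[i][1] for token in observations))
        for i in range(len(observations[0]))
    ]


def f0_at(curve, t_ms):
    """F0 at t_ms on the averaged curve, joining the points with straight lines."""
    for (t0, f0), (t1, f1) in zip(curve, curve[1:]):
        if t0 <= t_ms <= t1:
            if t1 == t0:
                return f0
            return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)
    raise ValueError("t_ms lies outside the averaged curve")


if __name__ == "__main__":
    # Three invented tokens, each measured at the same three points.
    tokens = [
        [(0, 110), (80, 120), (200, 125)],
        [(0, 112), (90, 122), (210, 127)],
        [(0, 108), (70, 118), (190, 123)],
    ]
    curve = average_curve(tokens)
    print(curve)
    print(f0_at(curve, 140))
```

Sampling F0 at a vowel midpoint, as done in 6.1.1, is then a call to `f0_at` at that point's time.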
6.1 Words with acute accent (Figs. I-A-1 and I-A-2)
6.1.1 The syllable which carries stress level 4 occurs finally and is preceded by one, two or three unstressed syllables (0 or 1)*.
The F0 maximum was found to occur close to the temporal midpoint of the vowel, which was therefore selected as the sampling point. The F0 value at this point was about equal for all the words. The fundamental frequency tends to decrease slightly during the vowels of the unstressed syllables. (This may be due to the choice of the consonant in the nonsense words, i.e. the voiced plosive.)
* Since no difference has been observed in the manifestation of stress between the syllables marked with stress levels 1 and 0, they will be considered as equivalent unstressed syllables.
[Table (Figs. I-A-1 and I-A-2): model words vin (4), ruin (04), anemon (104) and anomali (1004); columns: stress pattern, point of F0 max (segm. bound.), distance of F0 max from t-ref (msec), F0 value (Hz). The F0 values range from 125 to 132 Hz.]
6.1.2 The syllable which carries stress 4 occurs initially and is followed by one, two or three syllables.
The F0 maximum lies at the boundary between V2 and the preceding consonant; the fundamental frequency value here is, again, about the same for all the words. A comparison of the results in 6.1.1 and 6.1.2 shows that a shift of the maximum value towards the following segment has taken place. It also seems evident that this shift becomes larger as the number of unstressed syllables after the stressed syllable increases.
A t-test was made to check the difference in the temporal positioning of the F0 maxima. The difference was significant at the 95 % level between 40 and 4001, but not between 40 and 401 nor between 401 and 4001. Since the F0 maximum is stretched in time (i.e. extended over more segments) as the word gets longer, it is difficult to determine the magnitude of the aforementioned F0 maximum shift. Note that a higher F0 value can be observed in V2, which carries stress 0, than in V1, which carries stress 4. The rate at which F0 decreases after reaching its maximum seems to be related to the number of syllables in the word.
Note, therefore, that the F0 envelope of the syllable which carries stress 4 is not influenced by the addition of unstressed syllables before this syllable, but only by unstressed syllables added after it. A consequence of this fact is that the F0 contours for 'ruin' (04) and 'valen' (40) show a surprising similarity in the temporal positioning of the F0 maximum (see Fig. I-A-5, top graph).

Table (Figs. I-A-1 and I-A-2):
model word   stress pattern   point of F0 max (segm. bound.)   distance of F0 max from t-ref (msec)   F0 value (Hz)
valen        40               C2V2                             344                                    135
mumie        401              C2V2                             350                                    134
jonierna     4001             C2V2                             372                                    131
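The text does not say which form of t-test was applied. The sketch below assumes Student's two-sample test with pooled variance on the five tokens of each pattern, two-sided at the 5 % level, which for 8 degrees of freedom gives a critical value of 2.306. The input values are invented for illustration, not the authors' measurements:

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5 % critical value of Student's t for 8 degrees of freedom
# (two groups of five observations each).
T_CRIT_DF8 = 2.306


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    # Invented distances (msec) of the F0 maximum from t-ref, five tokens each.
    pattern_40 = [340, 350, 338, 346, 346]
    pattern_4001 = [365, 378, 370, 374, 373]
    t = pooled_t(pattern_40, pattern_4001)
    print(round(t, 2), abs(t) > T_CRIT_DF8)
```

Under these assumptions any |t| above 2.306 counts as significant; SciPy's `scipy.stats.ttest_ind` with its default `equal_var=True` computes the same statistic and also reports a p-value.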
6.1.3 The stressed syllable is preceded and followed by unstressed syllables.
The fundamental frequency maximum for 040 and 0401 lies at the boundary between the stressed vowel and the following consonant. A temporally stretched maximum value was observed in 0401, i.e. the same F0 value was observed at the boundary between the stressed vowel and the following consonant and at the boundary between this consonant and the following unstressed vowel. Here again, the F0 value at the segment boundary where the maximum occurred showed very little variation between the two stress patterns. The observed shift of the F0 maximum towards the following segments could be viewed as a function of the unstressed vowels which have been added after the stressed vowel in these words. Note that the shift has accordingly increased in magnitude for 0401, which, unlike 040 and 1040, has two unstressed syllables after the stressed syllable.
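A temporally stretched maximum, i.e. the same peak F0 value at more than one measuring point, can be detected mechanically. A minimal sketch with a hypothetical tolerance parameter and invented values for a 0401 word:

```python
def max_locations(points, tolerance_hz=0.0):
    """Labels of the points whose F0 lies within tolerance_hz of the maximum.

    points: (label, f0_hz) pairs, e.g. at segment boundaries and vowel
    midpoints. More than one label means the maximum is stretched in time.
    """
    peak = max(f0 for _, f0 in points)
    return [label for label, f0 in points if peak - f0 <= tolerance_hz]


if __name__ == "__main__":
    # Invented averaged values for a 0401 word such as [baba:babab].
    armenier_like = [("C1V1", 112), ("V1C2", 110), ("C2V2", 120), ("V2", 126),
                     ("V2C3", 131), ("C3V3", 131), ("V3C4", 118), ("C4V4", 110)]
    print(max_locations(armenier_like))
```

With `tolerance_hz=5`, the F0 measurement accuracy quoted in 5.3, V2 (126 Hz) would also qualify in this example, so the tolerance has to be chosen with care.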
Table (Figs. I-A-1 and I-A-2):
model word   stress pattern   point of F0 max (segm. bound.)   F0 value (Hz)
lavinen      040              V2C3                             131
armenier     0401             V2C3-C3V3                        131
violiner     1040             V3C4                             128

6.2 Words with grave accent
6.2.1 The two syllables which carry stresses 3 and 2 occur finally in the word and are preceded by two, one or no unstressed syllables.

Table (Figs. I-A-3 and I-A-4):
model word   stress pattern   point of F0 max 1 (segm. bound.)   F0 value (Hz)   point of F0 max 2 (segm. bound.)   F0 value (Hz)
valår        32               C1V1                               117             V2                                 133
miljövård    032              C2V2                               122             V3                                 132
idealmål     0032             C3V3                               114             V4                                 130
6.2.3 The stressed syllables occur initially (3) and finally (2) in the word and are separated by two, one or no unstressed syllables.
The first stressed syllable in the two words which contain unstressed syllables between the stressed syllables shows an F0 contour with a maximum value which is stretched in time. There may be reason to postulate that the unstressed syllables which follow the stressed one have influenced the F0 contour, causing a shift in the maximum fundamental frequency value. In 302, a common F0 value has been observed at the midpoint of the stressed vowel and at the boundary between this vowel and the consonant which precedes it. In 3002 the maximum value has shifted so that it occurs at the temporal midpoint of the stressed vowel. In the second stressed syllable, the F0 maximum has been observed at the midpoint of the vowel in all the words.
Table (Figs. I-A-3 and I-A-4):
model word   stress pattern   point of F0 max 1 (segm. bound.)   F0 value (Hz)   point of F0 max 2 (segm. bound.)   F0 value (Hz)
valår        32               C1V1                               117             V2                                 133
lamadjur     302              C1V1-V1                            113             V3                                 126
Ladogaö      3002             V1                                 111             V4                                 128

6.2.4 The two stressed syllables lie adjacent to each other, and an unstressed syllable has been added before, after or on either side of these syllables.
In the first stressed syllable, the fundamental frequency maximum lies at the boundary between the vowel and its preceding consonant. Note that the fundamental frequency value decreases to a minimum during the vowel. In the second stressed syllable, the F0 maximum has shifted towards the following segments as unstressed syllables were added after the 32 pattern in 320 and 0320. No shift was apparent in 032, where the stressed syllable occurs finally.
[Table (Figs. I-A-3 and I-A-4): model words miljövård (032), livlina (320) and bananlåda (0320); columns: stress pattern, point of F0 max 1 and of F0 max 2 (segm. bound.), F0 value (Hz).]
6.2.5 One unstressed syllable is placed between the two stressed syllables, and in two of the words one further unstressed syllable has been added, before (0302) or after (3020) this stress pattern.
The first stressed syllable shows an F0 contour with a maximum value which is stretched in time: the same fundamental frequency has been observed at two different segmental points, i.e. at the midpoint of the stressed vowel and at the boundary between this vowel and its preceding consonant. This shift is most probably a consequence of the unstressed syllable which immediately follows the stressed syllable. Again, the maximum F0 value in the second stressed syllable lies at the midpoint of the vowel if this syllable is not followed by an unstressed syllable. The F0 maximum has thus been observed in V3 and V4 in 302 and 0302, respectively. An obvious shift towards the following segments has occurred, however, in 3020, where the maximum value lies at the boundary between the final unstressed vowel and its preceding consonant. Finally, it can be noted that there is little variation in the maximum values for the three words: maximum 1 occurs consistently around 115 Hz and maximum 2 around 130 Hz.

Table (Figs. I-A-3 and I-A-4):
model word   stress pattern   point of F0 max 1 (segm. bound.)   F0 value (Hz)
lamadjur     302              C1V1-V1                            113
milanobo     0302             C2V2-V2                            115
Eva-Lena     3020             C1V1-V1                            112

6.3 Comparison of grave accented words with acute accented words
6.3.1 A comparison of an acute accented word with a grave accented word shows the following similarities and differences in the F0 envelopes of the two accent types (Fig. I-A-5, middle graph). The grave accented words are consistently 100-150 msec longer than the words with acute accent. This difference in duration seems to be a consequence of the fact that the words with grave accent contain two stressed syllables. (In Swedish, vowel segments in stressed syllables are longer than the same segments in unstressed syllables; see Lindblom (1970).)
As mentioned earlier, in both acute and grave accented words the F0 maximum has been sampled at the temporal midpoint of the vowel in the stressed syllable when this syllable occurs finally in a word. Because of the difference in duration, however, it is apparent that the temporal position of the F0 maximum in 04 does not correspond to that in 32. The time at which the maximum for the stressed syllable occurs in 04 corresponds more closely, in 32, to a point at which a minimum occurs (Fig. I-A-5, middle graph).
6.3.2 The above observation seemed to motivate a more thorough examination of the minimum values. This examination included only grave accented words, since the words with acute accent showed no consistent, definite minimum values.
It has been established earlier that unstressed syllables following the F0 maximum are important for the temporal positioning of the maximum. The fundamental frequency minimum seems consistently to occur at the vowel-consonant boundary following the first stressed vowel in all ten words. Note that the minimum F0 value is relatively constant at around 90 Hz and lower than F0 in the pretonic and post-tonic unstressed syllables.
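The check made here, whether the F0 minimum falls at the vowel-consonant boundary after the first stressed vowel, can be sketched like this. The boundary labels follow the CV notation of the tables; the helper names are hypothetical and the values are invented for a 32 word:

```python
def f0_minimum(points):
    """Return the (label, f0_hz) pair with the lowest F0."""
    return min(points, key=lambda point: point[1])


def boundary_after_vowel(vowel_index):
    """Label of the vowel-consonant boundary that follows vowel V<index>."""
    return f"V{vowel_index}C{vowel_index + 1}"


if __name__ == "__main__":
    # Invented values for a 32 word [ba:ba:b]: maximum 1, minimum, maximum 2.
    valar_like = [("C1V1", 117), ("V1", 104), ("V1C2", 90), ("C2V2", 101), ("V2", 133)]
    label, f0 = f0_minimum(valar_like)
    # The first stressed vowel of a 32 word is V1.
    print(label, f0, label == boundary_after_vowel(1))
```

For 032 or 0032 the first stressed vowel is V2 or V3, so the expected label would be `boundary_after_vowel(2)` or `boundary_after_vowel(3)`.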
[Table: Fundamental frequency minima for words with grave accent.]
6.3.3 A further comparison of grave with acute accented words shows that the syllable which carries stress 2 in words with grave accent has nearly the same F0 envelope as the stressed syllable in acute accented words (Fig. I-A-5, bottom graph). Note how well the contour for 401 corresponds to that of 200 in 3200.
7. Synthesis
7.1 Introduction. Let us start from the hypothesis that the acute word accent is fundamental and that the grave accent is derived from the acute. If stress 4 in words with acute accent corresponds to stress 2 in grave accented words (see above), it should be possible, with the help of two contours, one for an acute accented word and one for stress 3 in a grave accented word, to construct a correct contour for a compound word with grave accent.
7.2 Construction of stress contours for grave accented words
7.2.1 In the construction of the stress contour for 3200, the contour for 401 can be used from the segmental boundary C1V1 onwards; this part corresponds to 200 in 3200. The contour for stress 3, taken from the pattern 32, is then placed to the left of the 401 contour. Fig. I-A-6 (top graph) shows a comparison between the constructed contour and the actual measured values for 3200.
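The splicing step described above can be expressed as joining two averaged contours at a segment boundary. This is a sketch under stated assumptions: the 32 contour is cut at its C2V2 boundary (where the vowel of the stress-2 syllable begins), the 401 contour is taken from its C1V1 boundary onwards, and the function name and all values are hypothetical:

```python
def splice(left, left_cut_ms, right, right_cut_ms):
    """Join two (time_ms, f0_hz) contours at a segment boundary.

    Keeps `left` before left_cut_ms and appends `right` from right_cut_ms on,
    shifted in time so that right_cut_ms coincides with left_cut_ms.
    """
    head = [(t, f0) for t, f0 in left if t < left_cut_ms]
    shift = left_cut_ms - right_cut_ms
    tail = [(t + shift, f0) for t, f0 in right if t >= right_cut_ms]
    return head + tail


if __name__ == "__main__":
    # Hypothetical averaged contours (time from t-ref in msec, F0 in Hz).
    contour_32 = [(0, 105), (60, 117), (280, 90), (360, 100), (460, 133), (560, 120)]
    contour_401 = [(0, 95), (60, 100), (280, 125), (360, 134), (450, 120), (520, 110)]
    # Stress 3 from 32 up to its C2V2 boundary (360 ms); 401 from its C1V1 boundary (60 ms).
    print(splice(contour_32, 360, contour_401, 60))
```

The joint takes its F0 value from the acute contour, which matches the idea that the stress-2 part of the grave word is an acute contour in its own right.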
7.2.2 The construction of 3020 is similar to that described above: the contour for 040 corresponds to 020 in 3020, since the unstressed syllables are positioned in the same manner (Fig. I-A-6). In this case, however, the contour for the syllable which carries stress 3 cannot be taken from 32 as in the previous construction, since 3020 has an unstressed syllable after stress 3, which, as mentioned earlier, seems consistently to result in a temporally stretched maximum 1. Instead, the contour for stress 3 is taken from the pattern 302, where a stressed syllable is followed by an unstressed one.
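The choice between 32 and 302 as the source of the stress-3 contour amounts to a simple rule. The function below is a hypothetical encoding of that rule, treating 0 and 1 alike as unstressed:

```python
def stress3_source(pattern: str) -> str:
    """Measured pattern whose stress-3 contour is reused (7.2.1 and 7.2.2).

    If the stress-3 syllable is directly followed by an unstressed syllable
    (0 or 1), its maximum is stretched in time, so the contour is taken from
    302; otherwise it is taken from 32.
    """
    i = pattern.index("3")
    followed_by_unstressed = i + 1 < len(pattern) and pattern[i + 1] in "01"
    return "302" if followed_by_unstressed else "32"


if __name__ == "__main__":
    for p in ("3200", "3020", "0320"):
        print(p, stress3_source(p))
```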
7.2.3 0320 has been constructed in a similar manner. Here it is necessary to place an unstressed syllable at the beginning of the construction. It has been established previously that an unstressed syllable which precedes a stressed one does not influence the stress contour as a whole. The addition of a typical contour for an unstressed syllable, with a slight decrease in the F0 value during the vowel, would therefore be sufficient to construct this contour, which would also be appropriate for 320 (Fig. I-A-6).
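The extra step here is to put a typical unstressed syllable in front of an existing contour. A minimal sketch; the function name, the segment durations, the starting F0 and the size of the fall during the vowel are hypothetical defaults, not values from the measurements:

```python
def prepend_unstressed(contour, consonant_ms=70.0, vowel_ms=90.0,
                       f0_hz=110.0, fall_hz=4.0):
    """Put a typical unstressed [ba] syllable in front of a (time_ms, f0_hz) contour.

    F0 is held through the [b] and falls by fall_hz from the C1V1 boundary to
    the vowel midpoint; the original contour is shifted right by the length
    of the added syllable. All default values are hypothetical.
    """
    syllable = [
        (0.0, f0_hz),                                    # voicing onset (new t-ref)
        (consonant_ms, f0_hz),                           # C1V1 boundary
        (consonant_ms + vowel_ms / 2, f0_hz - fall_hz),  # vowel midpoint
    ]
    shift = consonant_ms + vowel_ms
    return syllable + [(t + shift, f0) for t, f0 in contour]


if __name__ == "__main__":
    # The first two points of a hypothetical 320 contour.
    print(prepend_unstressed([(0, 105), (60, 117)]))
```

Because the preceding unstressed syllable is taken not to affect the stress contour, the existing contour is only shifted in time, never reshaped.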
8. Summary
There is no one-to-one relationship between the numbers traditionally used to designate stress levels and accent type on the one hand, and the acoustic correlates of these numbers on the other. In a word with grave accent, the second stressed syllable, as well as any adjacent unstressed syllables, is identical to the corresponding syllables in words with acute accent in its realization of fundamental frequency values and position of maxima. No difference could be detected in the realization of syllables carrying the numbers 1 and 0. The realization of acute and grave accents depends on word length, i.e. on the number of syllables in a word. Unstressed syllables which precede the stressed syllable do not influence the stress contour, whereas unstressed syllables following the stressed syllable tend to stretch the maximum value in time towards the following segments.
9. Future work
This study implies some directions for future work.
(1) Replace the voiced consonant with a voiceless consonant, in order to find out which part of the fundamental frequency envelope is most important and must not be masked by a voiceless interval.
(2) Replace a long vowel with a short vowel and, at the same time, the voiced consonant with a voiceless one, i.e. [pappap] instead of [ba:ba:b].
(3) Vary speech tempo to test for the undershoot effects predicted by Öhman's model (STL-QPSR 2-3/1967).
(4) Study the 30 as opposed to the 32 contour, e.g. talat (30; past participle of tala = speak) and talakt (32; compound noun = speech act), which in traditional notation have different contours.
(5) Temporal measurements.
(6) Synthesis experiments using a computer program developed by Johan Liljencrants, Dept. of Speech Communication.
References:
G. Fant (1968): "Analysis and synthesis of speech processes", in Manual of Phonetics (ed. B. Malmberg; Amsterdam), pp. 173-277.
E. Gårding (1970): "Word tones and larynx muscles", Working Papers from Phon. Lab., Lund University, Nr 3, pp. 20-46.
K. Hadding-Koch (1961): "Word tones and intonation", Ch. 6 in Acoustico-Phonetic Studies in the Intonation of Southern Swedish (Lund), pp. 62-74.
I. Lehiste (1970): Suprasegmentals (Cambridge, Mass.).
B. Lindblom (1970): "Temporal organization of syllabic processes", invited paper read at the ASA meeting, Atlantic City, April.
B. Malmberg (1959): "Bemerkungen zum schwedischen Wortakzent", Z. f. Phonetik und allg. Sprachwiss. 12, No. 1-4, pp. 103-207.
S. Öhman (1967): "Word and sentence intonation: A quantitative model", STL-QPSR No. 2-3/1967, pp. 20-54.
APPENDIX
Stress patterns of model words
Acute
4      vin
40     valen
401    mumie
04     ruin
104    anemon
040    lavinen
4001   jonierna
1004   anomali
0401   armenier
1040   violiner
Grave
32     valår
320    livlina
3200   målgörare
032    miljövård
0032   idealmål
302    lamadjur
3002   Ladogaö
0302   milanobo
3020   Eva-Lena
[Figure I-A-1: averaged F0 (Hz) against time (0.1-1.0 sec); panels 40 valen, 4 vin [ba:b], 04 ruin [baba:b], 401 mumie [ba:babab], 040 lavinen [baba:bab], and a panel [bababa:b] without a legible label.]
Fig. I-A-1. Averaged F0 time variations for nonsense words. Consonant segments indicated by areas with slant lines. Midpoints of vowel segments also marked. Model words and standard notation of word accent and stress pattern at top right.
[Figure I-A-2: averaged F0 (Hz) against time (0.1-1.0 sec); panels 4001 jonierna [ba:bababab], 0401 armenier [baba:babab], 1040 violiner [bababa:bab], 1004 anomali [babababa:b].]
Fig. I-A-2. Averaged F0 time variations for nonsense words. Consonant segments indicated by areas with slant lines. Midpoints of vowel segments also marked. Model words and standard notation of word accent and stress pattern at top right.
[Figure I-A-3: averaged F0 (Hz) against time (0.1-1.0 sec); panels 32 valår, 302 lamadjur, 3200 målgörare [ba:ba:babab], 0320 bananlåda [baba:ba:bab].]
Fig. I-A-3. Averaged F0 time variations for nonsense words. Consonant segments indicated by areas with slant lines. Midpoints of vowel segments also marked. Model words and standard notation of word accent and stress pattern at top right.
[Figure I-A-4: averaged F0 (Hz) against time (0.1-1.0 sec); panels [bababa:ba:b] without a legible label, 3020 Eva-Lena [ba:baba:bab], 0302 milanobo [baba:baba:b], 3002 Ladogaö [ba:bababa:b].]
Fig. I-A-4. Averaged F0 time variations for nonsense words. Consonant segments indicated by areas with slant lines. Midpoints of vowel segments also marked. Model words and standard notation of word accent and stress pattern at top right.