Upload
others
View
13
Download
0
Embed Size (px)
Citation preview
Homophony and morphology:The acoustics of word-final S in English
Ingo Plag, Julia Homann & Gero Kunter
September 6, 2014
Abstract1
Recent research has shown that homophonous lexemes show striking phonetic2
differences (e.g. Gahl 2008, Drager 2011), with important consequences for mod-3
els of speech production such as Levelt et al. (1999). These findings also pose4
the question of whether similar problems hold for allegedly homophonous affixes5
(instead of free lexemes). Earlier experimental research found some evidence that6
morphemic and non-morphemic sounds may differ acoustically (Walsh & Parker7
1983, Losiewicz 1992). This paper investigates this question by analyzing the pho-8
netic realization of non-morphemic /s/ and /z/, and of six different English /s/9
and /z/ morphemes (plural, genitive, genitive-plural and 3rd person singular, as10
well as cliticized forms of has and is). The analysis is based on more than 60011
tokens extracted from natural conversations (Buckeye Corpus, Pitt et al. 2007).12
Two important results emerge. First, there are significant differences in acoustic13
duration between some morphemic /s/’s and /z/’s and non-morphemic /s/ and /z/,14
respectively. Second, there are significant differences in duration between some of15
the morphemes. These findings challenge standard assumptions in morphological16
theory, Lexical Phonology and models of speech production. Phonetic detail needs17
to have a place in models of morphology-phonology interaction.18
1 Introduction19
Recent studies on lexeme homophony have shown that seemingly homophonous words20
actually differ in phonetic detail such as length and vowel quality (Gahl 2008, Drager21
2011). Likewise, realizational differences can be found in complex words. Kemps et al.22
(2005b,a) showed that free and bound variants of a stem differ acoustically and that23
listeners make use of such phonetic cues in speech perception.24
Findings like these pose a challenge to traditional models of speech production which25
locate frequency information at the level of the phonological form, and which postulate26
that phonetic processing does not have access to morphological information (e.g. Chom-27
sky and Halle 1968, Levelt and Wheeldon 1994, Levelt et al. 1999).28
Homophony below the level of the lexeme, however, has not received much attention so29
far. There are two questions that warrant closer inspection. The first concerns seemingly30
homophonous segments that represent morphemes and those that do not. For example,31
the morphemic /s/ in laps might acoustically differ from the non-morphemic /s/ in32
1
lapse, as suggested by Walsh & Parker (1983). The second question of interest concerns33
morphemes that share one phonological form while representing different meanings or34
functions. There could be fine phonetic differences between such morphemes, as for35
example between plural /s/ (as in two heaps) and third person singular /s/ (as in she36
keeps).37
In this paper, we investigate both questions. Using data from the Buckeye Corpus38
(Pitt et al. 2007), we look into the issue of morpheme homophony by investigating the39
phonetic realization of non-morphemic final /s/ and /z/, and of different English /s/40
and /z/ morphemes, which can denote plural, genitive, genitive-plural and 3rd person41
singular, as well as cliticized forms of has and is. We show that there are significant42
differences between many of the different kinds of /s/ and /z/. Our regression models with43
a number of pertinent covariates (e.g. speech rate, position in the phonological phrase,44
frequency etc.) demonstrate, for example, that plural and non-morphemic /s/ and /z/45
are significantly longer than most other /s/ and /z/ morphemes. These findings are46
unexpected from a theoretical perspective and pose further challenges to extant theories47
of morphology and widely accepted models of speech production.48
The paper is structured as follows. In the next section we lay out in more detail the49
general problem of the relationship between morphological structure and the phonetic50
signal, and we will develop our research questions. Section 3 introduces the methodology51
of our corpus-based study. The results are presented in section 4, followed by a discussion52
and conclusion.53
2 Morphological structure and the phonetic signal54
Traditionally, morphemes are defined as linguistic units that consist of a meaning and a55
phonological form. In the case of allomorphy, one meaning is linked to several phonolog-56
ical forms whose choice depends on constraints that can be phonological, morphological57
or lexical in nature. A textbook example of such allomorphy are the allomorphs /s/,58
/z/ and /ız/ of the English plural morpheme. If the base noun ends in a sibilant, /ız/59
is chosen, if the base ends in a non-sibilant voiceless consonant, /s/ is chosen, and /z/60
occurs in all other contexts. This distribution is exactly the same for the English third61
person singular marker, i.e. the two morphemes are identical at the form level, both in62
terms of their exponents and in terms of their distribution. /s/, /z/ and /ız/ are also al-63
lomorphs of the genitive marker (though with slightly different distributional constraints,64
see Bauer et al. 2013:69, 129, 145). The plural morpheme, the genitive-plural morpheme,65
the genitive morpheme and the third person singular morpheme thus share homophonous66
exponents. Similarly, the cliticized forms of has and is can either be /s/ or /z/ depending67
on the presence of voicing in the context, which makes them homophones to each other68
and to a subset of the allomorphs of the plural, 3rd singular and genitive morphemes69
(Bauer et al. 2013:85). Crucially, there is nothing in the representation of the allomorphs70
that could cause systematic differences in phonetic implementation between the different71
morphemes, nor between the same segments when they do not represent morphemes.72
The said morphemes are treated in a similar way in standard formal theories of73
morphology-phonology interaction (e.g. Chomsky and Halle 1968, Kiparsky 1982). In74
such models the allomorphy is determined at a particular phonological cycle inside the75
2
lexicon, and at the level of underlying representations. Once the right underlying form76
is derived, the morphological boundary of the respective cycle is erased (a process called77
’bracket erasure’, see Chomsky and Halle 1968, Kiparsky 1982) and the form leaves the78
lexicon. All further phonological processes are relegated to another module called ’post-79
lexical phonology’ and later to the articulatory component, neither of which have access80
to morphological information. As in traditional structuralist accounts, there is nothing81
in the system that would allow for systematic phonetic differences between homophonous82
suffixes, or between morphemic and non-morphemic sounds.83
The distinction between lexical and post-lexical processes features also prominently in84
psycholinguistics. According to widely accepted models of speech production (e.g. Levelt85
et al. 1999), the aforementioned morphemes and clitics would not differ in their realiza-86
tion from corresponding non-morphemic /s/, /z/ and /ız/. In these models, meanings87
are stored in the mental lexicon, and their corresponding forms are represented phono-88
logically. Thus, what is used as a basis for articulation is the phonological form only,89
and the module called ’articulator’ does not have access to any information regarding90
the lexical origin of a sound. Leaving stylistic and accentual differences aside, a certain91
string of phonemes in a given context will therefore always be articulated in the same way,92
irrespective of its morphemic status, and only modulo the phonetic variation originating93
from purely phonetic sources such as speech rate or context.94
Recent research on the homophony of lexemes suggests, however, that such a model of95
speech production may be insufficient. Gahl (2008) investigated the acoustic realization of96
223 supposedly homophonous word pairs such as time and thyme and found that, quite97
consistently, the more frequent member of the pair, e.g. time, is significantly shorter98
than the respective less frequent one, e.g. thyme. This can be taken as evidence that two99
homophonous lexemes cannot be represented exclusively by one identical phonological100
form with information on their combined frequency, but that the individual frequencies101
must be stored with the respective lemmas and have an effect on their articulation.102
Similarly, Drager (2011) found that the different functions of like go together with103
different acoustic properties. Whether like is used as an adverbial, as a verb, as a discourse104
particle or as part of the quotative be like has an effect on several phonetic parameters,105
i.e. the ratio of the duration of /l/ to vowel duration, on the pitch level and on the degree106
of monophthongization of the vowel /aı/. These fine differences indicate that homophony107
of two or more lemmas at the phonetic level may not exist.108
Below the word level, there is evidence that phonemically identical strings may sys-109
tematically vary in their phonetic realization, depending on morphemic status. This runs110
counter to not only to standard models of speech production but also to the structuralist111
and formal theories of phonology-morphology interaction. Kemps et al. (2005b) found112
that free and bound variants of a base (e.g. help without a suffix as against help in helper)113
differ acoustically, even if no morpho-phonological alternations apply. Furthermore, the114
authors show that Dutch and German listeners do make use of such phonetic cues in115
speech perception (see also Kemps et al. 2005a).116
From an articulatory perspective, differences in the variability of intergestural timing117
in monomorphemic and complex words also point at incongruities in the representations118
of homophones. In an EPG study, Cho (2001) found that in Korean, timing of the gestures119
for [ti] and [ni] shows more variation when the sequence is heteromorphemic than when it120
is tautomorphemic, which indicates that morphological structure is reflected in the details121
3
of the articulatory gestures, with potential acoustic correlates in the speech signal.122
In view of these findings, homophony below the level of the lexeme seems worthy of123
closer inspection. If there are fine differences between supposedly homophonous free lex-124
emes, there could well be systematic differences between morphemic and non-morphemic125
sounds, and differences between supposedly homophonous bound morphemes. Not much126
research has been done in this area, but there are three experimental studies available127
that have looked at morphemic vs. non-morphemic (strings of) sounds.128
Walsh & Parker (1983) carried out a production experiment with three homophonous129
word pairs and measured the length of /s/ in monomorphemic words and in words that130
were homophonous to the monomorphemic ones but contained a final morphemic /s/ (e.g.131
lapse vs. laps). The experiment had three different conditions, and in each condition132
the word pairs were presented in a different context. The authors then compared the133
means of morphemic and non-morphemic /s/ across the three different conditions of their134
experiment. In two of the conditions there is a small difference of nine milliseconds in135
the means of the two different kinds of /s/, while the third condition shows no difference136
between morphemic and non-morphemic /s/. Based on these means the authors conclude137
that ”the durational differences of final /s/’s observed in Condition I and Condition II138
appear to be a function of the morphological status of the /s/.” (Walsh & Parker 1983:139
204).1 Given the very small data set and other methodological problems such as the lack140
of any inferential statistical analysis, or the integration of any phonetic covariates, Walsh141
& Parker’s results may be met with great scepticism.142
In a similar study, Losiewicz (1992) investigated the acoustic difference between mor-143
phemic, i.e. past tense, /d/ and /t/, and non-morphemic /t/ and /d/, and also found144
durational differences between the two sets of sound. As Hanique and Ernestus (2012)145
point out, however, Losiewicz’s study suffers from serious methodological problems, such146
as very small data sets and the use of insufficient frequency measures. Another problem147
that is not mentioned by Hanique & Ernestus is that Losiewicz tested both /t/ and /d/148
without including voicing as a co-variate.149
Baker et al. (2007), also using experimental data, found acoustic differences (in dura-150
tional and amplitude measurements) between morphemic and non-morphemic initial mis-151
and dis- (as in, e.g., distasteful vs. distinctive. Again there are methodological problems,152
such as the fact that morphemic and non-morphemic strings did not only differ in mor-153
phemic status but also in phonological properties that may have directly affected their154
acoustic realization, such as stress. For example, distasteful may have a secondary stress155
on the first syllable while distinctive may not, which may have an effect on duration and156
amplitude.157
In sum, there is some evidence that there might be systematic duration differences158
between affixes and the corresponding homophonous non-morphemic sounds, but the data159
sets are very small and do not represent natural conversational speech, and the effects160
found are not always convincing due to methodological shortcomings. Nevertheless the161
previous results are promising enough to warrant further inquiry into the homophony of162
morphemic and non-morphemic sounds.163
1A follow-up perception experiment reported briefly in Walsh & Parker (1983) did not yield evidencethat listeners were able to make use of the phonetic information. In this paper, we are exclusivelyconcerned with the acoustic properties of /s/ and /z/, the problem of perception remains a topic offuture research.
4
With regard to the acoustic differences between different homophonous suffixes, we164
have to say that to our knowledge, this question has never been systematically investi-165
gated, although it potentially has important theoretical implications, as any effect found166
in this area runs counter to established theories of morphology-phonology interaction and167
models of speech production. Systematic morpho-phonetic effects would call for a recon-168
sideration of the place of phonetic detail in lexical representation and lexical processing.169
A systematic study is therefore called for that investigates the two aspects of suffix170
homophony on a larger scale, preferably using data from natural conversations. This171
paper presents such a study, testing the two null hypotheses given in (1) and (2).172
(1) Null Hypothesis 1173
There is no durational difference between non-morphemic and morphemic sounds174
(e.g. between plural /s/ and non-morphemic word-final /s/)175
(2) Null Hypothesis 2176
There is no durational difference between homophonous suffixes (e.g. between177
plural /s/ and 3rd person singular /s/)178
3 Methodology179
3.1 Data180
Let us first look at the data to test Null Hypothesis 2, i.e. morphemic /s/ and /z/. We181
will investigate six morphemes that share the allomorphs /s/ and /z/, namely the 3rd182
person singular marker, the plural marker, the genitive marker, the combined genitive-183
plural marker, the cliticized form of has and the cliticized form of is. We focus in this184
paper on the duration of the two allomorphs /s/ and /z/. The allomorph /ız/, which is185
restricted to plural, genitive and third person singular marking, is not considered.2186
A note on our terminology is in order. We will use capitalized ’S’ as an umbrella term187
for the two segments /s/ and /z/ in word-final position. The term ’morphemic S’ will be188
used as an umbrella term for the clitics and suffixes. Furthermore we will used he term189
’base’ both for morphological bases as well as for hosts in cliticization, and for the string190
of sounds that precedes the final S in mono-morphemic words.191
As data source for this study we used the Buckeye Corpus of Conversational Speech192
(Pitt et al. 2007). This corpus comprises about 300,000 words from 40 long-time local193
residents of Columbus, Ohio, who were recorded conversing freely with an interviewer194
for about one hour each. In addition to the raw speech files, the Buckeye Corpus offers195
time-aligned written and phonetic transcriptions of the interviews.196
By using data from natural conversation instead of experimentally elicited data we197
take a rather bold approach. While experimental data may provide a better opportunity198
to control potentially intervening variables in various ways, data obtained in this way199
sometimes also raise concerns about their validity. It has been shown however, that,200
primarily due to modern statistical techniques such as mixed-effects regression (Baayen201
2Another potential candidate is cliticized us, as in let’s. As this form does not show any variation withregard to its host (it is always let, a comparison with the other /s/ and /z/ morphemes in a mixed-effectsregression model is problematic. Cliticized us was therefore excluded from this study.
5
et al. 2008), more natural data can be fruitfully employed to investigate issues of morpho-202
phonetic detail by including pertinent covariates that control for many sources of vari-203
ability in the data (see, for example, Ernestus and Warner (2011) for an overview).204
Examples of the different kinds of morphemic S from our dataset are given in (3).205
(3) a. It feels like they’re trying to tease me (3rd person singular)206
b. A grand total of six years as an undergrad (plural)207
c. Write a short story in that author’s style (genitive)208
d. The local senior citizens ’ home (genitive plural)209
e. That’s the way my family’s always been (has clitic)210
f. My brother’s twenty four (is clitic)211
For the analysis of length differences between different types of morphemic S we needed212
items that consisted of a base and of one of the types of morphemic S introduced above.213
While the clitics can take all sorts of bases, affixal S is more restricted in this respect.214
To keep the set of items as homogeneous as possible, only items with verbs, nouns, and215
pronouns (indefinite and personal) as bases entered the data set.216
Using the automatic annotations of the Buckeye corpus, 100 tokens of each morphemic217
S were randomly extracted. If a type occurred more than 12 times, all additional tokens218
were replaced by other, randomly sampled types. If less than 100 tokens were available219
for a certain kind of morphemic S, all available tokens were extracted. The overall set220
of suffixed items amounted to 460 (i.e. tokens), representing 293 types. Of these items,221
9 were excluded because acoustic and visual inspection revealed that the morphemic S222
was not realized as [s] or [z] in the speech signal (but e.g. as [ız] or [S], or it was omitted223
completely) and in one case the speaker purposefully lengthened the S for stylistic reasons.224
After inspection of the distribution of the duration measurements we also excluded as225
outliers items that were longer than 250 ms. Eventually, 447 items entered the acoustic226
analysis of morphemic S.227
In order to investigate Null Hypothesis 1, i.e. the potential difference between mor-228
phemic and non-morphemic S, we also sampled a set of non-morphemic word-final S229
tokens from the corpus. This sample was created as follows. In an initial step we ex-230
tracted all words from the corpus that ended in [s] or [z], irrespective of their expected231
standard pronunciation. Multimorphemic words, irregular third person singular forms232
(i.e. does, has and is), pronouns, and determiners were manually removed from this233
initial set. If a mono-morphemic word, i.e. type, was produced more than once by a234
speaker, only one randomly selected token of that type entered the pre-final data set235
(N=3057 tokens).3 From these words, 240 words were randomly sampled. Items with236
anomalies in the signal were excluded, as were outliers that were longer than 250 ms.237
The final data set consisted of 223 words with non-morphemic S, about half the number238
of items as in the set of morphemic S.239
In the final data set, the different types of S were distributed as shown in Figure 1.240
3We use the notion of ‘type’ here to refer to a specific pronunciation of a word as given in the Buckeyetranscriptions. That is, if a given word was pronounced in different ways, for example with variabledeletion of a particular segment, this was treated as different types.
6
Figure 1: Distribution of different types of S in the data set. (Abbreviations: s = non-morphemic S, GEN = genitive, PL-GEN = genitive plural)
3.2 Analysis241
3.2.1 Acoustic measurements242
With the help of LaBB-CAT (Fromont and Hay 2008), a freely available speech corpus243
management system formerly known as ONZE Miner, the Buckeye transcripts were con-244
verted into textgrid files. These files could then be used for further segmentation and245
analysis with the help of the acoustic analysis software Praat (Boersma and Weenink246
2013). Buckeye’s (partly automatic) phonetic annotations were checked for each item247
and adjusted where necessary. After manual checking and adjustment of all relevant248
intervals acoustic measurements were taken automatically with the help of a script.249
We wanted to model the length of S both in absolute terms and in relative terms, i.e.250
in relation to the length of its base. Studies of geminates (e.g. Oh and Redford 2012)251
have shown that differences in phonetic duration can be meaningfully interpreted as252
relative or absolute, depending, among other things, on the language under investigation.253
Given that very little is known about the absolute or relative length of affixes, it seemed254
reasonable to test both kinds of dependent variables. Therefore, the absolute duration255
of S in a given token was obtained from the duration of the segment in milliseconds, and256
the relative duration was calculated by dividing the absolute duration of the S by the257
duration of the whole word, i.e. by the sum of base duration and S duration.258
The distribution and means of the duration measurements by type of S are given in259
figure 2. Each dot represents one measurement and the lines indicate the means.260
7
Figure 2: Duration of different types of S in the data set. (Abbreviations: s = non-morphemic S, GEN = genitive, PL-GEN = genitive plural)
The distribution of the measurements shows rather clear differences between the dif-261
ferent types of S, and an anova shows a significant effect of type of S (F=20.196,262
p¡2.2e-16). The results of pair-wise comparisons of the means using Tukey contrasts are263
summarized in Table 1.4 We find ten significant contrasts. Non-morphemic S and plu-264
ral S do not differ from one another but are significantly longer than all other types of265
morphemic S.266
4For the comparisons the packages multcomp (Torsten Hothorn et al. 2008) and sandwich (Zeileis2004, 2006) were used, implementing the specific procedure suggested by Herberich et al. 2010 for sce-narios with unbalanced group sizes, non-normality and heteroscedasticity.
8
Table 1: Multiple comparison of means of duration of S (Tukey contrasts). (Significancecodes: ’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05)
Estimate std. Error t value Pr(>—t—)plural - s -0.0130316 0.0053468 -2.437 0.181463rdsg - s -0.0335883 0.0046443 -7.232 >0.001 ***GEN - s -0.0301727 0.0045179 -6.679 >0.001 ***has - s -0.0440993 0.0048154 -9.158 >0.001 ***is - s -0.0334490 0.0039892 -8.385 >0.001 ***PL-GEN - s -0.0332975 0.0049425 -6.737 >0.001 ***3rdsg - plural -0.0205567 0.0058582 -3.509 0.00844 **GEN - plural -0.0171410 0.0057585 -2.977 0.04664 *has - plural -0.0310676 0.0059948 -5.182 >0.001 ***is - plural -0.0204174 0.0053538 -3.814 0.00286 **PL-GEN - plural -0.0202659 0.0060974 -3.324 0.01594 *GEN - 3rdsg 0.0034157 0.0051129 0.668 0.99407has - 3rdsg -0.0105109 0.0053776 -1.955 0.43954is - 3rdsg 0.0001393 0.0046524 0.030 1.00000PL-GEN - 3rdsg 0.0002908 0.0054917 0.053 1.00000has - GEN -0.0139266 0.0052688 -2.643 0.11228is - GEN -0.0032764 0.0045262 -0.724 0.99087PL-GEN - GEN -0.0031249 0.0053852 -0.580 0.99727is - has 0.0106502 0.0048232 2.208 0.28772PL-GEN - has 0.0108017 0.0056372 1.916 0.46427PL-GEN - is 0.0001515 0.0049501 0.031 1.00000
While this may look already like a very interesting result, it has to be treated with267
great caution. Previous studies of the acoustic duration of words and individual sounds268
have shown that this acoustic parameter is subject to various acoustic and non-acoustic269
factors, such as speech rate, frequency, final lenghtening etc. Variable durations for the270
same word type may also arise from more general processes of phonetic reduction, which271
may affect duration but also non-durational parameters such as vowel quality or number272
of segments and syllables. In any study interested in one particular factor, in our case273
the influence of morphological status, these other influences need to be controlled, for274
example as covariates in a regression analysis.275
3.2.2 Covariates276
The specific set of covariates chosen for the present study is very similar to that of other277
studies that have investigated duration effects in morphologically complex words using278
speech corpus data (for example Pluymaekers et al. 2005b, 2010, Hanique et al. 2013).279
In the following we will briefly discuss each of the covariates.280
Local speech rate. An obvious variable that needs to be controlled for is speech281
rate. We used two measures of speech-rate. As one measure we calculated the speech rate282
in the discourse around each item (i.e. the ’local’ speech rate in syllables per second).283
This was done by counting the number of syllables in the context of up to ten seconds284
immediately preceding and succeeding the item, if not interrupted beforehand, e.g. by a285
9
pause or by the interviewer. The number of syllables was then divided by the number of286
seconds of context that was considered.287
Base duration. With regard to an even more local speech rate, we measured the288
length of the base. Other things being equal, a longer duration of the base will indicate289
a slower speech tempo. This measurement is also able to at least partially control for290
the lengthening effect of accentuation: as Turk and White (1999) show, word-final (non-291
morphemic) consonants are significantly longer if the word they belong to is accented.292
Number of syllables. We also included another measure of length, the number293
of syllables. As shown in Lindblom (1963) for Swedish vowels, and Nooteboom (1972)294
for Dutch vowels, vowel segments may tend to be shorter if they are followed by more295
syllables (but see Hanique et al. (2013) for somewhat different findings). It is unclear296
whether this effect can also be found in English, and whether it pertains to fricatives,297
but it seemed safe to include this covariate nevertheless. In any case, this effect can be298
conceptualized as a kind of compression effect, where words with more syllables undergo299
reduction. The syllable counts for the bases in our data set were extracted from CELEX300
where possible, in all other cases the number of vowels (or diphthongs) in a word was301
taken as an indicator for its number of syllables.302
Frequency. Frequency also affects phonetic duration, with more frequent words303
exhibiting more reduction, i.e. shorter segment durations (e.g. Bybee 2001:78, Jurafsky304
et al. 2001). We used two kinds of frequency, base frequency and form frequency. For base305
frequency we took the log-transformed frequency of the base in the spoken part of the306
Corpus of Contemporary American English (COCA, Davies 2008-), for the form frequency307
we calculated the log-transformed frequency of the word including the final S, ignoring308
what kind of S is attached in a given token. The frequencies were log-transformed to309
reduce the potentially harmful effect of skewed distributions in linear regression models.310
Previous mention. Another reduction effect may arise from online priming. The311
more often a complex word or its base have been mentioned in the previous discourse,312
the shorter we expect their durations to be in a given case (e.g. Fowler 1988, Fowler313
and Housum 1987, Gahl et al. 2012). We therefore included a covariate that counted the314
number of previous mentionings in a time window of 30 seconds preceding the token in315
question.316
Bigram frequency. Another potential factor influencing the duration of a word in317
running speech is the predictability of the word in its context (e.g. Jurafsky et al. 2001,318
Pluymaekers et al. 2005a, Bell et al. 2009, Torreira and Ernestus 2009). For content words,319
recent studies unanimously show that it is the upcoming context that has an effect on320
acoustic reduction (affecting also duration, e.g. Pluymaekers et al. 2005a, Bell et al.321
2009, Torreira and Ernestus 2009). To account for this durational effect we measured the322
bigram frequency of the item in question and its following word.323
Neighborhood density. Finally, we included phonological neighborhood densities as324
covariates, as it has been shown that the number of phonological neighbors may influence325
phonetic reduction (and hence word duration), as denser networks facilitate articulation326
(see, for example, Gahl et al. 2012). Neighborhood density measures were taken from the327
clearpond data base (Marian et al. 2012).328
Number of consonants in the rhyme. Another factor influencing the length of329
consonantal segments is their occurrence in a cluster. Consonants in clusters tend to330
be shorter (e.g. Klatt 1976). We therefore included the number of consonants in the331
10
rhyme of the final syllable of the base as a covariate, expecting that the more consonants332
precede the S the shorter the individual segments, including S, would become. Since this333
numerical variable had only three values (0, 1 and 2) the variable was transformed into a334
categorical variable to prevent the model to come up with nonsensical estimates (e.g for335
a word with 1.2 consonants in the rhyme).336
Voicing. We included voicing to account for the effect of allomorphy. It is also well-337
known that voicing affects the length of fricatives, with voiced fricatives being shorter338
(e.g. Klatt 1976). Based on an inspection of the distribution of the measurements taken339
by the algorithm used in Praat, an S was considered to be voiced if the algorithm could340
detect a periodic pitch pulse in more than 75 percent of the overall duration of the341
segment.342
Position in utterance. The position of an item in an utterance also plays a role343
for duration. Words and segments (especially fricatives) at the end of an utterance are344
subject to lenghtening (e.g. Oller 1973:1244, Berkovits 1993). Thus, we coded for each345
item whether it was found in the final position of an utterance, or in medial position. We346
would expect an S as the final segment of an utterance to be longer than an S we find in347
other positions.348
Syntactic position. Similarly, segments before a syntactic or prosodic boundary are349
lengthened (see, for example, Klatt 1976, Cooper 1976, Cooper et al. 1978, Byrd et al.350
2006). For our data, determining the boundaries of intonation phrases turned out to351
be highly problematic and unreliable, and we therefore opted for a syntax-based coding.352
We coded of each item whether it occurred at the right boundary of a syntactic phrase353
(e.g. an NP). Such a coding can at least partially control for differences that might occur354
due to syntax-based prosodic effects, for example, between phrase-final plural nouns and355
pre-head genitives.356
Following segment. We also coded the first segment of the word following the S357
(if it occurred in medial position), as this segment might also have an influence on the358
duration of the preceding S (e.g. Klatt 1976).359
3.2.3 Statistical analysis360
We devised two types of analysis, with different constellations of variables and different361
statistical models. In the first analysis we used absolute duration of S as the de-362
pendent variable (Model 1), whereas the second analysis had proportion of S as its363
dependent variable.364
Model 1 was fitted using mixed-effects regression, as implemented in the packages365
nlme4 (Bates et al. 2014) and lmerTest (Kuznetsova et al. 2014) for R (R Development366
coreteam 2011). Mixed-effects regression brings the variation of random effects such as367
subject or item under statistical control, and can deal with unbalanced data sets. The368
latter property is especially welcome since not all combinations of all values of the different369
predictors are represented in our data with equal frequency.370
The mixed-effects model was fitted adhering to the following strategy: In the initial371
model, alongside the explanatory variable type of S, we included the control variables372
discussed in the previous section.373
This initial model was then reduced through stepwise exclusion of insignificant factors374
(e.g. Baayen 2008). A factor was only considered significant if it proved to pass three375
11
tests: first, its t-statistics had to yield a t-value of greater than 2 (or less than -2) when376
included in the model. Second, the Akaike information criterion (AIC) of the model377
including the factor had to be lower than the AIC of the model without it. Third, an378
ANOVA comparing the model including the factor to a model without it had to yield a379
p-value lower than 0.05, thus showing that the inclusion of the factor did significantly380
improve the fit of the model. A variable under consideration was only retained in the381
model if it passed all three tests. When testing for interactions, numerical variables were382
first standardized to avoid skewed results (e.g. Jaccard et al. 1990).383
One of the central assumptions of any linear regression model is a linear relationship384
between the dependent and independent variables. If this assumption is not met, the385
estimated coefficients may be highly unreliable. In this case, the dependent variable may386
often be transformed to alleviate for any problem resulting from a lack of linearity. The387
Box-Cox transformation (Box and Cox 1964, Venables and Ripley 2002) can be employed388
to identify a suitable transformation parameter λ for a power transformation. For the389
transformation of the absolute duration measures used in Model 1 the optimal value of390
λ was λ =0.1414141.391
Let us turn to Model 2. In this analysis the dependent variable was the relative length392
of S (proportion of S, calculated as the (non-transformed) absolute duration of the S393
divided by the duration of the whole word). As this variable is bounded between 0 and 1394
and has a skewed distribution, linear regression is ruled out and a model is needed that395
can cope with these properties of the target variable. Beta regression (e.g. Ferrari and396
Cribari-Neto 2004) is such a model. We used the R package betareg (Cribari-Neto and397
Zeileis 2010) for this analysis. For the beta regression models a slightly different fitting398
strategy had to be adopted. This will be explained in more detail in section 4.2.399
Since number of syllables and length of the base are highly correlated variables, we400
residualized the number of syllables on the duration of the base and included this new401
predictor, resSylNum, in the model. The same procedure was applied to base frequency402
and form frequency, so that the residualized form frequency was included.403
3.2.4 Overview of the data404
An overview of variables and their distributions is given in Table 2. The names of the405
(sometimes transformed) variables that entered the analysis are given in small capitals.406
12
Table 2: Summary of the dependent variables and covariates used in the initial models.
Dependent variables N Mean St. Dev. Min Max
Absolute duration of S durationOfS 670 0.086 0.040 0.022 0.242Relative duration of S proportionS 670 0.220 0.088 0.042 0.674
Numerical predictors N Mean St. Dev. Min Max
Local speech rate sylSec 670 5.782 1.330 2.376 12.977Base duration baseDuration 670 0.323 0.132 0.030 0.929Number of syllables resSylNum 670 0.001 0.684 −1.662 3.092Base frequency logBaseFreq 670 8.921 2.159 1.792 14.493Form frequency resLogAllBoundFreq 670 −0.007 0.997 −3.662 2.579Previous mention baseRep 597 0.307 0.736 0 6Bigram frequency logrBigram 567 8.759 3.024 0.000 14.413Neighborhood density PND 657 14.460 14.889 0 60
Categorical predictors N Levels
No. of cons. in rhyme consonants 670 0: 354 1: 261 2: 55Voicing isVoiced 670 yes: 87 no: 583Position in utterance position 670 mid: 582 final: 88Syntactic position boundary 670 yes: 232 no: 438Following segment follSound 580 (40 levels)
Explanatory variable N Levels
670 S: 223 PL: 97 3rdsg: 99 GEN: 90has: 47 is: 93 PL-GEN: 21
4 Results407
4.1 Model 1: Absolute duration as dependent variable408
Model 1 was fitted according to the procedure described above. The inspection of the409
residuals showed a non-normal distribution at one end of the distribution. Following410
standard procedures (e.g. Crawley 2002, Baayen and Milin 2010), we removed outliers411
(defined as items with standardized residuals exceeding −2.5 or +2.5) and refitted the412
model (the removal of outliers resulted in the loss of 2 % of the observations). The final413
model showed a satisfactory deistribution of residuals.414
In the final model we find significant effects of type of S (typeOfS), the position415
of the item (position), the number of consonants in the rhyme of the item’s final syl-416
lable (consonants), the speech rate (sylSec) and voicing (isVoiced). In addition,417
both the duration of the base (baseDuration) and the residualized number of syllables418
(resSylNum) have significant and independent effects on the length of S. Regarding the419
random effects, we tested speaker and base, but only speaker-specific effects turned out420
to significantly improve model performance.421
We also tested for interactions. Testing all covariates in interaction with type of S422
did not yield any significant interactions. We also tested an interaction between type of423
S and voicing, since one might suspect clitics not to participate in voicing assimilation to424
the same extent as it is found with morphemes. We finally tested an interaction between425
voicing and the number of consonants in the rhyme of the base-final syllable, since the426
13
presence of more consonants increases the distance between the S and the voiced nucleus427
of the syllable, which might lead to less assimilation. However, neither interaction turned428
out to be statistically significant.429
The p-values for the analysis of variance (or deviance) as well as the proportion of430
deviance explained for each fixed-effect of Model 1 are documented in table 3.5431
Table 3: p-values of fixed effects in Model 1, fitted to the Box-Cox-transformed durationsof S.
Df Sum Sq Mean Sq F value p-value expl.dev.(%)
typeOfS 6.00 0.19 0.03 37.26 0.000 16.12resSylNum 1.00 0.02 0.02 19.09 0.000 1.38position 1.00 0.20 0.20 229.57 0.000 16.55sylSec 1.00 0.01 0.01 8.38 0.004 0.60isVoiced 1.00 0.02 0.02 23.32 0.000 1.68baseDuration 1.00 0.01 0.01 12.70 0.004 0.92consonants 2.00 0.07 0.03 37.76 0.000 5.44
All effects are very highly significant, with the effect of position being strongest. The432
type of S is the second strongest predictor and can account for about 16 percent of the433
variance. The estimates and their p-values are documented in table 4. The reference434
levels for the categorial predictors are the following: for typeOfS it is non-morphemic435
S, for position it is final position, for isVoiced it is yes, and for consonants it is 0.436
All coefficents can be interpreted as changes relative to these reference levels.6437
5This table is based on the function pamer.fnc of the LMERConvenienceFunctions package (Tremblayand Ransijn 2013). Upper-bound p-values are computed by using as denominator degrees of freedom thenumber of data points minus number of fixed effects including the intercept). Upper-bound p-values areanti-conservative. Lower-bound p-values are computed by using as denominator degrees of freedom thenumber of data points minus number of fixed effects minus the number of random effects. The resultingp-values are conservative. Since the analysis yielded identical upper- and lower-bound p-values, the tablegives these p-values.
6The values are computed using the summary function of the lmerTest package. This package calcu-lates p-values and degrees of freedom based on Satterthwaite approximation for denominator degrees offreedom (SAS Institute Inc. 1978).
14
Table 4: Fixed-effect coefficients and p-values in a mixed-effects model fitted to the Box-Cox-transformed durations of S
Estimate Std. Error t value Pr(>|t|)(Intercept) 0.750 0.009 79.712 0.000morphplural -0.008 0.006 -1.407 0.160morph3rdsg -0.017 0.006 -2.983 0.003morphGEN -0.020 0.004 -4.744 0.000morphhas -0.028 0.005 -5.355 0.000morphis -0.022 0.005 -4.828 0.000morphPL-GEN -0.007 0.007 -0.909 0.364resSylNum -0.005 0.002 -2.373 0.018positionmiddle -0.051 0.004 -13.244 0.000sylSec -0.002 0.001 -2.439 0.015isVoicedunvoiced 0.020 0.004 5.604 0.000baseDuration 0.032 0.010 3.214 0.001consonants1 -0.015 0.003 -4.883 0.000consonants2 -0.042 0.005 -8.587 0.000
Figure 3 illustrates the effects of the covariates in Model 1.438
15
Figure 3: Partial effects of covariates in Model 1, fitted to the Box-Cox-transformedabsolute durations of S
The covariates show the expected behavior. In the upper row the leftmost panel439
shows that with regard to the number of syllables (resSylNum), we find the same440
compression effect for S in English as attested for Dutch and Swedish vowels. The more441
syllables, the shorter the S. The second panel (position) shows that in utterance-final442
position S is longer , which can be interpreted as a clear final-lengthening effect. The443
rightmost panel of the upper row gives us the effect of speech rate (sylSec). With faster444
speech, S becomes shorter. Moving on to the lower row, the leftmost panel indicates445
that, as expected, a voiced S is generally shorter than an unvoiced S. In the mid panel446
of the lower row we see the effect of baseDuration. Words that have a longer duration447
(while keeping syllable number constant) also have longer S’s, which means that final448
S participates in overall lengthening or shortening effects that affect the word as such.449
Finally, the more consonants we find in the rhyme the shorter final S becomes (lower row,450
rightmost panel), which is equally expectable.451
Let us now turn to the variable of interest, i.e. type of S, whose effect is plotted in452
Figure 4. The lines indicate the predicted means, the dots give the residuals.453
16
Figure 4: Partial effect of type of S, Model 1 (Abbreviations: s = non-morphemic S, GEN= genitive, PL-GEN = genitive plural)
We can see that non-morphemic S has the longest duration and the has clitic is454
shortest. Testing all pair-wise contrasts yields the significant contrasts shown in table 5.455
To be on the conservative side, we report the upper bounds of the p-values.456
Table 5: Significant contrasts in duration between different types of S. Significance codes:’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05
S PL 3RDSG GEN HAS IS PL-GEN
S x ** *** *** ***PL x ** *3RDSG xGEN xHAS x **IS x *PL-GEN x
Of the overall 21 possible pair-wise contrasts eight are significant. Non-morphemic457
S is significantly longer than all types of morphemic S apart from plural and genitive458
plural. Plural S is longer than all other types of morphemic S apart from plural genitive459
S. In addition to these contrasts we also find some significant contrasts among genitive460
S, 3rdsg and the has and is clitics.7461
7Model 1 does not take into account the potential effect of the following segment on the duration of
17
The results from Model 1 clearly indicate that the null hypotheses in (1) and (2)462
need to be rejected. We find robust differences between morphemic and (most) non-463
morphemic S on the one hand, and between some types of morphemic S on the other.464
These differences are present in natural conversational speech. They cannot be attributed465
to purely phonetic or lexical effects since these effects were carefully controlled for.466
4.2 Model 2: Relative duration467
An inspection of the distribution of the relative duration measurements showed three468
outliers where the S was extremely long compared to the rest of the items, with propor-469
tions being larger than 54 percent. These three items were removed. The distribution of470
the measurements of relative length of S was substantially skewed, which is something471
that can be frequently observed with variables that are bounded between 0 and 1. Beta472
regression is a kind of statistical model that can cope with these distributional properties.473
Beta regression models can contain two components, one that predicts the mean, and a474
component for the precision (or dispersion) phi (see Ferrari and Cribari-Neto (2004) for475
introduction and discussion). Broadly speaking, a predictor with a low precision coeffi-476
cient means that the beta regression model estimates the values of this predictor to be477
more dispersed around the coefficient’s mean than in the case of a predictor with a high478
precision coefficent.479
We fitted a beta regression model with the proportion of S as the dependent variable,480
and the same initial fixed effects as in Model 1 apart from baseDuration, since this481
variable had been used in the computation of the dependent variable. For the beta re-482
gression models a slightly different fitting strategy from that of mixed effects regression483
had to be adopted (see Ferrari and Cribari-Neto 2004 for details). The initial models484
started with only the mean component and all predictors. The initial model was then485
simplified in a step-wise procedure, removing all insignificant predictors. After the com-486
pletion of simplification for the mean component, we added the precision component and487
then simplified this component applying again the standard step-wise procedures. After488
completion of this simplification procedure we finally tested whether the inclusion of the489
precision component as a whole was justified by comparing of the AIC of a model that490
had only the mean component with the model with both mean and precision component,491
and by using a likelihood ratio test. The inclusion of the precision parameters was jus-492
tified by lower AIC scores and higher log-likelihood. The estimated coefficients for the493
means and the phi-values of the final model are given in table 6. The reference levels are494
the same as with Model 1, i.e. for typeOfS it is non-morphemic S, for position it is495
word-final consonants (e.g. Klatt 1976). Given that our data set included also utterance-final tokens (i.e.tokens without an immediately following segment), this variable could not be incorporated into Model1. In order to address this variable, we also fitted another model on the tokens in medial position, usingthe same set of predictors but replacing position by follSound (i.e. following sound). The same fixedeffects as in Model 1 emerged as significant in this new model apart from the number of syllables (andapart from the obvious exception of position). The nature of the following segment also had a significanteffect. In the model with following sound as random effect we find six significant pair-wise contraststhat overlap with the contrasts known from Model 1. There are the four contrasts of non-morphemic S,and for plural S we find two significant contrasts (plural-3rdsg, plural-is) and one marginally significantcontrast (plural-has). In a model fitted with follSound as fixed effect the same contrasts emerged assignificant.
18
final position, for consonant it is 0 and for isVoiced it is yes, so that all coefficents496
can be interpreted as changes relative to these reference levels. The bottom part of table497
6 documents the precision model, which outputs a significant effect for type of S and498
for the two covariates position and number of consonants.499
Table 6: Coefficients of final beta regression model, Model 2 (Pseudo R-squared of themodel: 0.2544)
Coefficients(Mean model with logit link)
Estimate Std. Error z p (> |z|)
(Intercept) -1.523340 0.096769 -15.742 <2e-16typeOfSplural -0.073045 0.064239 -1.137 0.25550typeOfS3rdsg -0.053469 0.074752 -0.715 0.47443typeOfSGEN -0.335864 0.069704 -4.818 1.45e-06typeOfShas -0.272441 0.090657 -3.005 0.00265typeOfSis -0.324614 0.072663 -4.467 7.92e-06typeOfSPL-GEN -0.209794 0.081780 -2.565 0.01031positionmiddle -0.231629 0.051308 -4.514 6.35e-06consonants1 -0.062879 0.038944 -1.615 0.10640consonants2 -0.388557 0.059769 -6.501 7.98e-11isVoicedunvoiced 0.201449 0.048540 4.150 3.32e-05logBaseFreq 0.050043 0.009779 5.118 3.09e-07resLogAllBoundFreq 0.060222 0.022638 2.660 0.00781
Phi coefficients(Precision model with log link)
Estimate Std.Error z p (> |z|)
(Intercept) 2.844906 0.164818 17.261 <2e-16typeOfSplural 0.165790 0.193006 0.859 0.39035typeOfS3rdsg 0.008172 0.191191 0.043 0.96591typeOfSGEN 0.519059 0.184837 2.808 0.00498typeOfShas 0.098614 0.240067 0.411 0.68124typeOfSis 0.452886 0.181490 2.495 0.01258typeOfSPL-GEN 0.628123 0.337476 1.861 0.06271positionmiddle 0.320562 0.162377 1.974 0.04836consonants1 0.430005 0.133978 3.210 0.00133consonants2 0.706434 0.225737 3.129 0.00175
With regard to our explanatory variable, Model 2 tells us both the means and the500
dispersion of the relative length of S are dependent on which kind of S we look at. This501
is fully in line with the results from Model 1.502
The behavior of the covariates in the mean model is partly different from, partly the503
same as their behavior in the model that tried to predict the absolute duration of S. The504
19
effects of position, number of consonants in the rhyme and voicing are found in both505
models and they go in the same direction as before. In middle position the proportion of506
S is smaller (see the negative coefficient), which means that the effect of final lengthening507
is especially prevalent with the final sound of the word. This is in line with the results of508
investigations studying the effect of final lengthening (e.g. Turk and Shattuck-Hufnagel509
2007). With regard to rhyme structure, the more consonants there are in the rhyme510
the shorter the proportion of S becomes (see the negative coefficients), which is again511
an expectable effect, even if restricted to the difference between no consonant and two512
consonants preceding S. Voiced tokens of S are shorter than unvoiced tokens, which is513
again an expected effect.514
Two new effects emerge, however, when looking at relative duration. Both base515
frequency and form frequency contribute to the relative duration of S in that higher516
frequency leads to a longer relative duration of S. This positive effect of frequency on517
relative duration (as computed in this paper) is in fact expected. The pertinent literature518
consistently finds shorter absolute durations for more frequent words. This is also the519
case in our data (rho=-0.39, p=0 for the correlation between log base frequency and the520
duration of the base). Since the duration of the base (added to the duration of the S) is521
in the denominator of the fraction used to compute relative duration (see again section522
3.2.1), a shorter duration in the denominator, i.e. the correlate of higher frequency, leads523
to a higher fraction, i.e. to higher relative duration measurements.524
Having controlled for the influences of the covariates, we now present the estimated525
means (obtained from the first component of the beta regression) and phi-coefficients526
(obtained from the precision component) for the different values of type of S in Table527
7.528
Table 7: Means and phi-coefficients of the different morphemes, final beta regressionmodel, Model 2
type of S mean phi
non-morphemic S 0.2538 17.1999plural 0.2402 20.30153rdsg 0.2438 17.3411genitive 0.1955 28.9035has 0.2057 18.9825is 0.1973 27.0528genitive-plural 0.2161 32.2342
Figure 5 shows the estimated beta distributions of relative durations for each type of529
S. The dots indicate the different means, and the shaded area indicates the dispersion530
around the mean. Non-morphemic S (which is the reference level in the beta regression)531
has the highest mean, and consequently, the mean coefficents for all other types of S are532
smaller (and thus negative) in Table 6.533
As can be gleaned from the lower part of Table 6, non-morphemic S also has the lowest534
precision estimate in Table 7 (the coefficients of all other types of S are positive), but535
the precision estimates of plural S and 3rd singular S do not differ significantly. This low536
20
precision is reflected in the very elongated dispersion areas for these types in the figure.537
Apparently, there is considerable variation around the mean for these types, notably more538
so than for other types of S which have higher estimates for precision.539
Figure 5: The effect of morpheme type on the relative duration of S (Abbreviations: s =non-morphemic S, GEN = genitive, PL-GEN = genitive plural)
If we pair-wise compare the different types of S in the mean component of Model 2, we540
find eleven significant contrasts, which is an even higher number than in Model 1 (which541
had eight significant contrasts). For these pairs, the beta regression estimates that the542
means of the relative duration differ significantly from each other. Table 8 summarizes543
the contrasts.544
Table 8: Significant contrasts in relative duration between different types of S. Significancecodes: ’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05
S PL 3RDSG GEN HAS IS PL-GEN
S PL 3RDSG GEN HAS IS PL-GENS x *** ** *** *PL x *** * ***3RDSG x *** ** ***GEN xHAS xIS xPL-GEN x
21
The pair-wise contrasts yield an interesting picture. In terms of relative duration,545
non-morphemic S, plural S, and 3rd person singular S form a group that differs from the546
group that is formed by the three clitics, i.e. genitive S, clitic has and clitic is.547
The former three are longer than the clitics, and there are no significant duration548
differences within each group. The mean of the genitive plural S lies between both549
groups, and differs significantly only from non-morphemic S and shows two marginally550
significant differences to the plural and 3rd singular (p=0.09 and p=0.07, respectively).551
To summarize, the analysis of the relative duration of S has shown there are many552
significant differences between different types of S, in particular between the S clitics and553
non-morphemic, plural and 3rd singular S. The results of Model 2 thus complement the554
results of Model 1, for which absolute duration served as the dependent variable. Even555
though relative duration and absolute duration behave somewhat different with regard556
to the influence of type of S, the null hypotheses in (1) and (2) needs to be rejected557
for both kinds of duration measurements.558
5 Discussion and conclusion559
This study started out from two null hypotheses. The first stated that there are no560
durational differences between non-morphemic S and morphemic types of S. The second561
null hypothesis stated that there are no durational differences between the different ho-562
mophonous types of morphemic S. However, no matter whether we looked at absolute563
duration of S or at the relative duration of S, the type of S emerged as a significant pre-564
dictor. Also, non-morphemic S differed consistently from some of the morphemic types565
of S. This means that we have to reject the two null hypotheses.566
Across models, non-morphemic S emerges as the longest S. The significance of indi-567
vidual differences between pairs of different types of S varies slightly between the two568
models, but many of the differences are robust across models.569
Our analyses of absolute and relative duration of the different types of morphemic S570
included pertinent co-variates that controlled for purely phonetic effects known from the571
phonetic literature. The co-variates behave as expected, so that one can assume that the572
speech corpus data used for this study are generally reliable. Thus, the results, surprising573
though they may be, cannot be easily dismissed as due to an unsuitable sample. But574
how can these results then be understood?575
Let us first discuss the difference between morphemic S and non-morphemic S (Null576
Hypothesis 1). Our results concerning the absolute duration of S seem to agree with577
Walsh & Parker (1983), who also found differences between the two types of S. Such a578
statement would, however, be premature. As these authors did not employ any inferential579
statistics, nor consider any co-variates in their analysis, we cannot determine whether580
the small differences found in the means by Walsh & Parker are meaningful. It may be581
noteworthy, however, that the mean difference observed by Walsh & Parker goes in the582
opposite direction from ours: in their data, morphemic S is longer, not shorter, as it is583
in our sample.584
If we take this discrepancy between their data and ours seriously, we may speculate585
about the reasons for the discrepancy. One reason might be the different kinds of item.586
Thus, in Walsh & Parker’s experimental study, morphologically complex words were587
22
compared with their simplex homophones, while in our data we tested the duration of S588
across the board, i.e. across many different, non-homophonous, types. Furthermore, in589
the experiment the participants read out the stimuli, so that the experimental procedure590
may have influenced the results in a way that we do not find in spontaneous speech.591
Gahl et al. (2012), for example, discusses studies that have failed to reproduce the well-592
established effect that more frequent words generally show shorter duration and remarks593
that this failure may be due to the fact that speakers tend to read at a regular pace when594
asked to read word lists or words in short carrier phrases. This tendency has been found595
to override effects of lexical properties, such as word frequency.596
Let us turn to Null Hypothesis 2, which concerns suffix homophony. Following in the597
footsteps of recent studies by, for example, Gahl et al. (2012), Drager (2011), we tested the598
null hypothesis that there is no difference between phonologically homophonous suffixes in599
English. In particular, we investigated whether the morpheme type (i.e. plural, genitive,600
plural, genitive, 3rd person singular, cliticized has or cliticized is) has an influence on601
the acoustic duration of the segment.602
Both models found significant differences between some of the morphemes, but differ603
somewhat as to which morphemes differ from which other ones. In terms of absolute604
duration general, plural S is longest, and the auxiliary clitics are shortest, with a number605
of additional differences between individual pairs of morphemes. In terms of relative606
duration, non-morphemic, plural and 3rd singular S differ from the three clitics. We607
can therefore safely say that the Null Hypothesis 2 also needs to be rejected. Like in the608
afore-mentioned studies of whole lexemes that are taken to be homophones, homophonous609
bound morphemes can show systematic acoustic differences.610
What is especially striking is the patterning of the relative durations. The types of611
S that have a supposedly stronger boundary, i.e. the three clitics (genitive, has and612
is), show shorter relative duration than the types of S that have a weaker boundary (i.e.613
plural and 3rd person singular) or no boundary (i.e. non-morphemic S). These facts could614
be interpreted in such a way that morphological boundary strength may have a rather615
direct correlate in the acoustic signal.616
Our findings on the phonetics of S are in line with other studies that have produced617
evidence that lexical representations may include fine phonetic information, and that this618
information is used by speakers in word recognition (e.g. Gahl (2008), Johnson (2004),619
Kemps et al. (2005a), Salverda et al. (2003)).620
If there are systematic differences between the different types of S in speech produc-621
tion, one would like to know whether language users are influenced by these differences in622
perception. A natural extension of the present study would therefore be the investigation623
of S with listeners. The difference in absolute duration between the observed means of624
the shortest morphemic S (i.e. the has clitic) and non-morphemic S in our corpus data625
amounts to c. 44 ms, with the means of duration of S varying between 62 and 106 ms626
(difference as predicted by the model: 26 ms, range predicted by the model: 93-119ms).627
These differences seem large enough to be potentially perceptible: the perceptual thresh-628
old for durational differences in fricatives has been estimated at about 25 to 30 ms (e.g.629
Klatt & Cooper 1975, Schatzman & McQueen 2006).630
Our results raise important theoretical questions. Given that we find differences631
between different types of S, it is unclear why such differences would emerge in the first632
place. More specifically, one is tempted to ask the question why a certain type of S633
23
should be longer than some other S, and not shorter. For example, why are the clitics634
particularly short and plural S is rather long? Which theory would predict that?635
At a very general level, our findings indicate that there is morphological information in636
the phonetic signal. This means that morphological information is present in post-lexical637
stages of speech production. This calls into question the distinction between lexical and638
post-lexical phonology, which has featured prominently in both theoretical linguistics, i.e.639
phonology, and psycholinguistics.640
In phonology, there is a long tradition of theoretical mechanisms like bracket erasure641
and cyclic application of morpho-phonological rules, after whose application there is no642
possibility that one could trace any information about a sound’s origin or structural643
status in the acoustic signal. The findings of this paper thus seriously challenge central644
tenets of the traditional models of Lexical Phonology and Morphology in the wake of645
Kiparsky (1982) (see also, for example, Coetzee and Pater 2011 and Scheer (2010) for646
critical discussion).647
In psycholinguistics, well-established models of speech production and the mental lex-648
icon are equally unable to accomodate our findings. Levelt et al. (1999), for example,649
assume that phonological representations are composed discrete segments and syllables,650
and the articulator module makes use of pre-programmed gestures that are stored in a syl-651
labary (Levelt et al., 1999:5). The articulator cannot provide a pre-programmed gesture652
for each syllable of a language if different meanings cause differences in these gestures. In653
other words, in such models morphologically dependent subphonemic detail is not part654
of these representations and needs therefore be accounted for by purely phonetic factors655
that influence articulatory implementation such as speech rate (e.g. Levelt (1989)). For656
our data, such an account is ruled out.657
There is, however, another line of psycholinguistic research that has looked at dis-658
tinct processing properties of different kinds of morphemes. Using data from aphasia,659
second language acqusition and code switching Myers-Scotton and Jake (2000) propose660
a four-way distinction between different types of morpheme (the so-called ‘4-M model’).661
According to the 4-M model there are content morphemes and systems morphemes, and662
the system morphemes are further subdivided into ‘early system morphemes’ and ‘late663
system morphemes’. These classes are not all very well defined, but it seems clear that664
plural belongs to the early system morphemes while genitive and 3rd person singular665
marking belong to the late system morphemes. The late system morphemes further666
consist of two subclasses, ‘bridge system morphemes’ and ‘outsider system morphemes’.667
According to Myers-Scotton and Jake (2000) genitive ‘s is a bridge morpheme, while668
3rd person singular is an outsider morpheme. The proposed four-way classification of669
morphemes is supported by the differential behavior of these morphemes in aphasia, sec-670
ond language acquisition and code-switching, and these authors relate the differences in671
behavior to differences in lexical access. For content and early system morphemes, lexical672
access happens at the lemma level, for the late system morphemes at what they call the673
‘functional level’, which is located at the level of the formulator (see, for example, Levelt674
1989).675
How can this model account for our results? Even if we assume that Myers-Scotton676
& Jake’s taxonomy is on the right track, there are severe problems. The first one is677
that their model does not say anything about the details of phonetic implementation,678
and it is unclear what the model would predict for this stage of speech production. If679
24
the 4-M carries over to articulation (in ways still to be investigated), we would expect680
plural (being an early system morphemes) to pattern with the auxiliary clitics, which is681
obviously not the case. What is more, the 4-M model seems to rest on the assumption682
that the morphemes are accessed separately from their bases. More recent research in683
morphological processing has cast serious doubt on this idea, and the jury is still out on684
the question of the psycholinguistic status of individual inflectional morphemes.685
In can thus be stated that both phonological theory and many psycholinguistic models686
fail to provide a convincing explanation of the kind of morphologically-induced phonetic687
variation that we find in our data. Exemplar-based models (e.g. Bybee 2001,Gahl and688
Yu 2006, Goldinger 1998, Pierrehumbert 2001, 2002, Johnson 2004) seem to be better689
equipped to deal with the kind of variation we find in our data. In such models a690
given word is linked to a frequency distribution over phonetic outcomes, as found in the691
environment in the speaker. These distributions are updated with new experiences, and692
subtle subphonemic differences in these experiences may result in representations that693
reflect these properties. For example, Pierrehumbert (2002) demonstrates how such a694
model can deal with the phonetic variability of lexemes. It is conceivable that similar695
models can be implemented to account for the subtle phonetic properties of different696
bound morphemes. The details of such an account still need to be worked out in future697
studies, however, as previous exemplar-based approaches have not included the subtle698
phonetic differences involved in the differentiation of allegedly homophonous affixes.699
To summarize, this paper presents the first study that has systematically investigated700
the relationship of morphemic status and phonetic implementation. This was done using701
natural conversation data. The analysis has yielded important evidence on the question702
of affix homonymy, revealing that phonologically homophonous bound morphemes can703
be phonetically distinct, and that morphemic and non-morphemic S may differ, too.704
This is unpredicted by current linguistic and psycho-linguistic theories of the lexicon and705
grammar. Further studies are certainly called for to replicate the observed effects, and to706
develop new models of the mental lexicon and of the relationships between morphology,707
phonology, and phonetic implementation.708
References709
Baayen, R. H., Davidson, D. J., and Bates, D. M. (2008). Mixed-effects modeling with710
crossed random effects for subjects and items. Journal of Memory and Language,711
59(4):390–412.712
Baayen, R. H. and Milin, P. (2010). Analyzing reaction times. International Journal of713
Psychological Research, 3(2):12–28.714
Baker, R., Smith, R., and Hawkins, S. (2007). Phonetic differences between mis-and dis-715
in english prefixed and pseudo-prefixed words. In Proceedings of the 16h International716
Congress of Phonetic Sciences.717
Bates, D., Maechler, M., Bolker, B., and Walker, S. (2014). lme4: Linear mixed-effects718
models using eigen and s4.719
25
Bauer, L., Lieber, R., and Plag, I. (2013). The Oxford reference guide to English mor-720
phology. Oxford University Press, Oxford.721
Bell, A., Brenier, J. M., Gregory, M., Girand, C., and Jurafsky, D. (2009). Predictability722
effects on durations of content and function words in conversational english. Journal723
of Memory and Language, 60(1):92–111.724
Berkovits, R. (1993). Progressive utterance-final lengthening in syllables with final frica-725
tives. Language and Speech, 36(1):89.726
Boersma, P. and Weenink, D. J. (2013). Praat. doing phonetics by computer (version727
4.3. 14)(2005).728
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the729
Royal Statistical Society, Series B, 26(2):211–252.730
Bybee, J. L. (2001). Phonology and language use. Cambridge University Press, Cam-731
bridge.732
Byrd, D., Krivokapic, J., and Lee, S. (2006). How far, how long: On the temporal scope733
of prosodic boundary effects. Journal of the Acoustic Society of America, 120(3):1589–734
1599.735
Cho, T. (2001). Effects of morpheme boundaries on intergestural timing: Evidence from736
korean. Phonetica, 58:129–162.737
Chomsky, N. and Halle, M. (1968). The sound pattern of English. Harper and Row, New738
York.739
Coetzee, A. W. and Pater, J. (2011). The place of variation in phonological theory. The740
Handbook of Phonological Theory, Second Edition, pages 401–434.741
Cooper, W. E. (1976). Syntactic control of timing in speech production: A study of742
complement clauses. Journal of Phonetics, 4(2):151–171.743
Cooper, W. E., Paccia, J. M., and Lapointe, S. G. (1978). Hierarchical coding in speech744
timing. Cognitive Psychology, 10(2):154–177.745
Crawley, M. J. (2002). Statistical computing. An introduction to data analysis using746
S-Plus. John Wiley, Chichester.747
Cribari-Neto, F. and Zeileis, A. (2010). Beta regression in r. Journal of Statistical748
Software, 34(2):1–24.749
Davies, M. (2008). The corpus of contemporary american english: 400+ million words,750
1990-present.751
Drager, K. (2011). Sociophonetic variation and the lemma. Journal of Phonetics,752
39(4):694–707.753
Ernestus, M. and Warner, N. (2011). An introduction to reduced pronunciation variants.754
Journal of Phonetics, 39:253–260.755
26
Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and propor-756
tions. Journal of Applied Statistics, 31(7):799–815.757
Fowler, C. A. (1988). Differential shortening of repeated content words produced in758
various communicative contexts. Language and Speech, 31:307–317.759
Fowler, C. A. and Housum, J. (1987). Talkers’ signalling of new and old words in speech760
and listeners’ perception and use of the distinction. Journal of Memory and Language,761
26:489–504.762
Fromont, R. and Hay, J. (2008). Onze miner: the development of a browser-based research763
tool. Corpora, 3(2).764
Gahl, S. (2008). ¡i¿time¡/i¿ and ¡i¿thyme¡/i¿ are not homophones: The effect of lemma765
frequency on word durations in spontaneous speech. Language, 84(3):474–496.766
Gahl, S., Yao, Y., and Johnson, K. (2012). Why reduce? phonological neighborhood den-767
sity and phonetic reduction in spontaneous speech. Journal of Memory and Language,768
66(4):789–806.769
Gahl, S. and Yu, A. C. L., editors (2006). Special Issue on Exemplar-based Models in770
Linguistics, volume 23.771
Goldinger, S. D. (1998). Echoes of echos? an episodic theory of lexical access. Psycho-772
logical Review, 105(2):251–279.773
Hanique, I. and Ernestus, M. (2012). The role of morphology in acoustic reduction.774
Lingue e Linguaggio, 11:147–164.775
Hanique, I., Ernestus, M., and Schuppler, B. (2013). Informal speech processes can be776
categorical in nature, even if they affect many different words. The Journal of the777
Acoustical Society of America, 133(3):1644–1655.778
Herberich, E., Sikorski, J., Hothorn, T., and Rapallo, F. (2010). A robust procedure for779
comparing multiple means under heteroscedasticity in unbalanced designs. PLoS ONE,780
5(3):e9788.781
Jaccard, J., Wan, C. K., and Turrisi, R. (1990). The detection and interpretation of782
interaction effects between continuous variables in multiple regression. Multivariate783
Behavioral Research, 25(4):476–478.784
Johnson, K. (2004). Massive reduction in conversational american english. In Spontaneous785
speech: data and analysis. Proceedings of the 1st session of the 10th international786
symposium, pages 29–54, Tokyo and Japan.787
Jurafsky, D., Bell, A., Gregory, M., and Raymond, W. D. (2001). Probabilistic relations788
between words: Evidence from reduction in lexical production. In Bybee, J. and789
Hopper, P. J., editors, Frequency and the emergence of linguistic structure, pages 229–790
254. Benjamins, Amsterdam.791
27
Kemps, R., Ernestus, M., Schreuder, R., and Harald Baayen, R. (2005a). Prosodic cues792
for morphological complexity: The case of dutch plural nouns. Memory & Cognition,793
33(3):430–446.794
Kemps, R., Wurm, L. H., Ernestus, M., Schreuder, R., and Baayen, H. (2005b). Prosodic795
cues for morphological complexity in dutch and english. Language and Cognitive Pro-796
cesses, 20:43–73.797
Kiparsky, P. (1982). Lexical morphology and phonology. In Yang, I.-S., editor, Linguistics798
in the Morning Calm: Selected Papers from SICOL, pages 3–91. Hanshin, Seoul.799
Klatt, D. H. (1976). Linguistic uses of segmental duration in english: Acoustic and800
perceptual evidence. Journal of the Acoustical Society of America, 59(5):1208–1221.801
Kuznetsova, A., Brockhoff, P. B., and Bojesen Christensen, R. H. (2014). lmertest.802
Levelt, W. J. M. (1989). Speaking. From intention to articulation. The MIT Press,803
Cambridge and Mass.804
Levelt, W. J. M., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in805
speech production. Behavioral and Brain Sciences, 22:1–38.806
Levelt, W. J. M. and Wheeldon, L. R. (1994). Do speakers have access to a mental807
syllabary. Cognition, 50:239–269.808
Lindblom, B. (1963). Spectrographic study of vowel reduction. The Journal of the809
Acoustical Society of America, 35(11):1773–1781.810
Marian, V., Bartolotti, J., Chabal, S., Shook, A., and White, S. A. (2012). Clearpond:811
Cross-linguistic easy-access resource for phonological and orthographic neighborhood812
densities. PLoS ONE, 7(8):e43230.813
Myers-Scotton, C. and Jake, J. (2000). Four types of morpheme: evidence from aphasia,814
code switching, and second-language acquisition. Linguistics, 38(6; ISSU 370):1053–815
1100.816
Nooteboom, S. G. (1972). Production and perception of vowel duration: A study of the817
durational properties of vowels in Dutch. University of Utrecht, Utrecht.818
Oh, G. E. and Redford, M. A. (2012). The production and phonetic representation of819
fake geminates in english. Journal of Phonetics, 40(1):82–91.820
Oller, D. K. (1973). The effect of position in utterance on speech segment duration in821
english. Journal of the Acoustic Society of America, 54:1235–1247.822
Pierrehumbert, J. B. (2001). Exemplar dynamics: Word frequency, lenition and contrast.823
In Bybee, J. and Hopper, P. J., editors, Frequency and the emergence of linguistic824
structure, pages 137–158. Benjamins, Amsterdam.825
Pierrehumbert, J. B. (2002). Word-specific phonetics. In Gussenhoven, C. and Warner,826
N., editors, Laboratory Phonology VII, pages 101–140. Mouton de Gruyter, Berlin.827
28
Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., and Fosler-828
Lussier, E. (2007). Buckeye corpus of conversational speech (2nd release). Columbus,829
OH: Department of Psychology, Ohio State University.830
Pluymaekers, M., Ernestus, M., and Baayen, H. (2005a). Articulatory planning is con-831
tinuous and sensitive to informational redundancy. Phonetica, 62:146–159.832
Pluymaekers, M., Ernestus, M., and Baayen, R. H. (2005b). Lexical frequency and833
acoustic reduction in spoken dutch. Journal of the Acoustical Society of America,834
118(4):2561–2569.835
Pluymaekers, M., Ernestus, M., Baayen, R. H., and Booij, G. (2010). Morphological836
effects in fine phonetic detail: The case of dutch -igheid. In Fougeron, C., editor,837
Laboratory phonology 10, Phonology and phonetics, pages 511–531. Mouton de Gruyter,838
Berlin and New York.839
R Development coreteam (2011). R: A language and environment for statistical comput-840
ing.841
Salverda, A. P., Dahan, D., and McQueen, J. M. (2003). The role of prosodic boundaries842
in the resolution of lexical embedding in speech comprehension. Cognition, 90:51–89.843
SAS Institute Inc. (1978). Sas technical report r-101: Tests of hypotheses in fixed-effects844
linear models.845
Scheer, T. (2010). A guide to morphosyntax-phonology interface theories: how extra-846
phonological information is treated in phonology since Trubetzkoy’s Grenzsignale. Wal-847
ter de Gruyter.848
Torreira, F. and Ernestus, M. (2009). Probabilistic effects on french [t] duration. In849
INTERSPEECH, pages 448–451.850
Torsten Hothorn, Frank Bretz, and Peter Westfall (2008). Simultaneous inference in851
general parametric models. Biometrical Journal, 50(3):346–363.852
Tremblay, A. and Ransijn, J. (2013). Lmerconveniencefunctions.853
Turk, A. E. and Shattuck-Hufnagel, S. (2007). Multiple targets of phrase-final lengthening854
in american english words. Journal of Phonetics, 35(4):445–472.855
Turk, A. E. and White, L. (1999). Structural influences on accentual lengthening in856
english. Journal of Phonetics, 27:171–206.857
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S-Plus.858
Springer, New York, 4 edition.859
Zeileis, A. (2004). Econometric computing with hc and hac covariance matrix estimators.860
Journal of Statistical Software, 11(10):1–17.861
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of862
Statistical Software, 16(9):1–16.863
29