Homophony and morphology: The acoustics of …...Homophony and morphology: The acoustics of word- nal S in English Ingo Plag, Julia Homann & Gero Kunter September 6, 2014 1 Abstract

Homophony and morphology:The acoustics of word-final S in English

Ingo Plag, Julia Homann & Gero Kunter

September 6, 2014

Abstract1

Recent research has shown that homophonous lexemes show striking phonetic2

differences (e.g. Gahl 2008, Drager 2011), with important consequences for mod-3

els of speech production such as Levelt et al. (1999). These findings also pose4

the question of whether similar problems hold for allegedly homophonous affixes5

(instead of free lexemes). Earlier experimental research found some evidence that6

morphemic and non-morphemic sounds may differ acoustically (Walsh & Parker7

1983, Losiewicz 1992). This paper investigates this question by analyzing the pho-8

netic realization of non-morphemic /s/ and /z/, and of six different English /s/9

and /z/ morphemes (plural, genitive, genitive-plural and 3rd person singular, as10

well as cliticized forms of has and is). The analysis is based on more than 60011

tokens extracted from natural conversations (Buckeye Corpus, Pitt et al. 2007).12

Two important results emerge. First, there are significant differences in acoustic13

duration between some morphemic /s/’s and /z/’s and non-morphemic /s/ and /z/,14

respectively. Second, there are significant differences in duration between some of15

the morphemes. These findings challenge standard assumptions in morphological16

theory, Lexical Phonology and models of speech production. Phonetic detail needs17

to have a place in models of morphology-phonology interaction.18

1 Introduction19

Recent studies on lexeme homophony have shown that seemingly homophonous words20

actually differ in phonetic detail such as length and vowel quality (Gahl 2008, Drager21

2011). Likewise, realizational differences can be found in complex words. Kemps et al.22

(2005b,a) showed that free and bound variants of a stem differ acoustically and that23

listeners make use of such phonetic cues in speech perception.24

Findings like these pose a challenge to traditional models of speech production which25

locate frequency information at the level of the phonological form, and which postulate26

that phonetic processing does not have access to morphological information (e.g. Chom-27

sky and Halle 1968, Levelt and Wheeldon 1994, Levelt et al. 1999).28

Homophony below the level of the lexeme, however, has not received much attention so29

far. There are two questions that warrant closer inspection. The first concerns seemingly30

homophonous segments that represent morphemes and those that do not. For example,31

the morphemic /s/ in laps might acoustically differ from the non-morphemic /s/ in32

1

lapse, as suggested by Walsh & Parker (1983). The second question of interest concerns33

morphemes that share one phonological form while representing different meanings or34

functions. There could be fine phonetic differences between such morphemes, as for35

example between plural /s/ (as in two heaps) and third person singular /s/ (as in she36

keeps).37

In this paper, we investigate both questions. Using data from the Buckeye Corpus38

(Pitt et al. 2007), we look into the issue of morpheme homophony by investigating the39

phonetic realization of non-morphemic final /s/ and /z/, and of different English /s/40

and /z/ morphemes, which can denote plural, genitive, genitive-plural and 3rd person41

singular, as well as cliticized forms of has and is. We show that there are significant42

differences between many of the different kinds of /s/ and /z/. Our regression models with43

a number of pertinent covariates (e.g. speech rate, position in the phonological phrase,44

frequency etc.) demonstrate, for example, that plural and non-morphemic /s/ and /z/45

are significantly longer than most other /s/ and /z/ morphemes. These findings are46

unexpected from a theoretical perspective and pose further challenges to extant theories47

of morphology and widely accepted models of speech production.48

The paper is structured as follows. In the next section we lay out in more detail the49

general problem of the relationship between morphological structure and the phonetic50

signal, and we will develop our research questions. Section 3 introduces the methodology51

of our corpus-based study. The results are presented in section 4, followed by a discussion52

and conclusion.53

2 Morphological structure and the phonetic signal54

Traditionally, morphemes are defined as linguistic units that consist of a meaning and a55

phonological form. In the case of allomorphy, one meaning is linked to several phonolog-56

ical forms whose choice depends on constraints that can be phonological, morphological57

or lexical in nature. A textbook example of such allomorphy are the allomorphs /s/,58

/z/ and /ız/ of the English plural morpheme. If the base noun ends in a sibilant, /ız/59

is chosen, if the base ends in a non-sibilant voiceless consonant, /s/ is chosen, and /z/60

occurs in all other contexts. This distribution is exactly the same for the English third61

person singular marker, i.e. the two morphemes are identical at the form level, both in62

terms of their exponents and in terms of their distribution. /s/, /z/ and /ız/ are also al-63

lomorphs of the genitive marker (though with slightly different distributional constraints,64

see Bauer et al. 2013:69, 129, 145). The plural morpheme, the genitive-plural morpheme,65

the genitive morpheme and the third person singular morpheme thus share homophonous66

exponents. Similarly, the cliticized forms of has and is can either be /s/ or /z/ depending67

on the presence of voicing in the context, which makes them homophones to each other68

and to a subset of the allomorphs of the plural, 3rd singular and genitive morphemes69

(Bauer et al. 2013:85). Crucially, there is nothing in the representation of the allomorphs70

that could cause systematic differences in phonetic implementation between the different71

morphemes, nor between the same segments when they do not represent morphemes.72

The said morphemes are treated in a similar way in standard formal theories of73

morphology-phonology interaction (e.g. Chomsky and Halle 1968, Kiparsky 1982). In74

such models the allomorphy is determined at a particular phonological cycle inside the75

2

lexicon, and at the level of underlying representations. Once the right underlying form76

is derived, the morphological boundary of the respective cycle is erased (a process called77

’bracket erasure’, see Chomsky and Halle 1968, Kiparsky 1982) and the form leaves the78

lexicon. All further phonological processes are relegated to another module called ’post-79

lexical phonology’ and later to the articulatory component, neither of which have access80

to morphological information. As in traditional structuralist accounts, there is nothing81

in the system that would allow for systematic phonetic differences between homophonous82

suffixes, or between morphemic and non-morphemic sounds.83

The distinction between lexical and post-lexical processes features also prominently in84

psycholinguistics. According to widely accepted models of speech production (e.g. Levelt85

et al. 1999), the aforementioned morphemes and clitics would not differ in their realiza-86

tion from corresponding non-morphemic /s/, /z/ and /ız/. In these models, meanings87

are stored in the mental lexicon, and their corresponding forms are represented phono-88

logically. Thus, what is used as a basis for articulation is the phonological form only,89

and the module called ’articulator’ does not have access to any information regarding90

the lexical origin of a sound. Leaving stylistic and accentual differences aside, a certain91

string of phonemes in a given context will therefore always be articulated in the same way,92

irrespective of its morphemic status, and only modulo the phonetic variation originating93

from purely phonetic sources such as speech rate or context.94

Recent research on the homophony of lexemes suggests, however, that such a model of95

speech production may be insufficient. Gahl (2008) investigated the acoustic realization of96

223 supposedly homophonous word pairs such as time and thyme and found that, quite97

consistently, the more frequent member of the pair, e.g. time, is significantly shorter98

than the respective less frequent one, e.g. thyme. This can be taken as evidence that two99

homophonous lexemes cannot be represented exclusively by one identical phonological100

form with information on their combined frequency, but that the individual frequencies101

must be stored with the respective lemmas and have an effect on their articulation.102

Similarly, Drager (2011) found that the different functions of like go together with103

different acoustic properties. Whether like is used as an adverbial, as a verb, as a discourse104

particle or as part of the quotative be like has an effect on several phonetic parameters,105

i.e. the ratio of the duration of /l/ to vowel duration, on the pitch level and on the degree106

of monophthongization of the vowel /aı/. These fine differences indicate that homophony107

of two or more lemmas at the phonetic level may not exist.108

Below the word level, there is evidence that phonemically identical strings may sys-109

tematically vary in their phonetic realization, depending on morphemic status. This runs110

counter to not only to standard models of speech production but also to the structuralist111

and formal theories of phonology-morphology interaction. Kemps et al. (2005b) found112

that free and bound variants of a base (e.g. help without a suffix as against help in helper)113

differ acoustically, even if no morpho-phonological alternations apply. Furthermore, the114

authors show that Dutch and German listeners do make use of such phonetic cues in115

speech perception (see also Kemps et al. 2005a).116

From an articulatory perspective, differences in the variability of intergestural timing117

in monomorphemic and complex words also point at incongruities in the representations118

of homophones. In an EPG study, Cho (2001) found that in Korean, timing of the gestures119

for [ti] and [ni] shows more variation when the sequence is heteromorphemic than when it120

is tautomorphemic, which indicates that morphological structure is reflected in the details121

3

of the articulatory gestures, with potential acoustic correlates in the speech signal.122

In view of these findings, homophony below the level of the lexeme seems worthy of123

closer inspection. If there are fine differences between supposedly homophonous free lex-124

emes, there could well be systematic differences between morphemic and non-morphemic125

sounds, and differences between supposedly homophonous bound morphemes. Not much126

research has been done in this area, but there are three experimental studies available127

that have looked at morphemic vs. non-morphemic (strings of) sounds.128

Walsh & Parker (1983) carried out a production experiment with three homophonous129

word pairs and measured the length of /s/ in monomorphemic words and in words that130

were homophonous to the monomorphemic ones but contained a final morphemic /s/ (e.g.131

lapse vs. laps). The experiment had three different conditions, and in each condition132

the word pairs were presented in a different context. The authors then compared the133

means of morphemic and non-morphemic /s/ across the three different conditions of their134

experiment. In two of the conditions there is a small difference of nine milliseconds in135

the means of the two different kinds of /s/, while the third condition shows no difference136

between morphemic and non-morphemic /s/. Based on these means the authors conclude137

that ”the durational differences of final /s/’s observed in Condition I and Condition II138

appear to be a function of the morphological status of the /s/.” (Walsh & Parker 1983:139

204).1 Given the very small data set and other methodological problems such as the lack140

of any inferential statistical analysis, or the integration of any phonetic covariates, Walsh141

& Parker’s results may be met with great scepticism.142

In a similar study, Losiewicz (1992) investigated the acoustic difference between mor-143

phemic, i.e. past tense, /d/ and /t/, and non-morphemic /t/ and /d/, and also found144

durational differences between the two sets of sound. As Hanique and Ernestus (2012)145

point out, however, Losiewicz’s study suffers from serious methodological problems, such146

as very small data sets and the use of insufficient frequency measures. Another problem147

that is not mentioned by Hanique & Ernestus is that Losiewicz tested both /t/ and /d/148

without including voicing as a co-variate.149

Baker et al. (2007), also using experimental data, found acoustic differences (in dura-150

tional and amplitude measurements) between morphemic and non-morphemic initial mis-151

and dis- (as in, e.g., distasteful vs. distinctive. Again there are methodological problems,152

such as the fact that morphemic and non-morphemic strings did not only differ in mor-153

phemic status but also in phonological properties that may have directly affected their154

acoustic realization, such as stress. For example, distasteful may have a secondary stress155

on the first syllable while distinctive may not, which may have an effect on duration and156

amplitude.157

In sum, there is some evidence that there might be systematic duration differences158

between affixes and the corresponding homophonous non-morphemic sounds, but the data159

sets are very small and do not represent natural conversational speech, and the effects160

found are not always convincing due to methodological shortcomings. Nevertheless the161

previous results are promising enough to warrant further inquiry into the homophony of162

morphemic and non-morphemic sounds.163

1A follow-up perception experiment reported briefly in Walsh & Parker (1983) did not yield evidencethat listeners were able to make use of the phonetic information. In this paper, we are exclusivelyconcerned with the acoustic properties of /s/ and /z/, the problem of perception remains a topic offuture research.

4

With regard to the acoustic differences between different homophonous suffixes, we164

have to say that to our knowledge, this question has never been systematically investi-165

gated, although it potentially has important theoretical implications, as any effect found166

in this area runs counter to established theories of morphology-phonology interaction and167

models of speech production. Systematic morpho-phonetic effects would call for a recon-168

sideration of the place of phonetic detail in lexical representation and lexical processing.169

A systematic study is therefore called for that investigates the two aspects of suffix170

homophony on a larger scale, preferably using data from natural conversations. This171

paper presents such a study, testing the two null hypotheses given in (1) and (2).172

(1) Null Hypothesis 1173

There is no durational difference between non-morphemic and morphemic sounds174

(e.g. between plural /s/ and non-morphemic word-final /s/)175

(2) Null Hypothesis 2176

There is no durational difference between homophonous suffixes (e.g. between177

plural /s/ and 3rd person singular /s/)178

3 Methodology179

3.1 Data180

Let us first look at the data to test Null Hypothesis 2, i.e. morphemic /s/ and /z/. We181

will investigate six morphemes that share the allomorphs /s/ and /z/, namely the 3rd182

person singular marker, the plural marker, the genitive marker, the combined genitive-183

plural marker, the cliticized form of has and the cliticized form of is. We focus in this184

paper on the duration of the two allomorphs /s/ and /z/. The allomorph /ız/, which is185

restricted to plural, genitive and third person singular marking, is not considered.2186

A note on our terminology is in order. We will use capitalized ’S’ as an umbrella term187

for the two segments /s/ and /z/ in word-final position. The term ’morphemic S’ will be188

used as an umbrella term for the clitics and suffixes. Furthermore we will used he term189

’base’ both for morphological bases as well as for hosts in cliticization, and for the string190

of sounds that precedes the final S in mono-morphemic words.191

As data source for this study we used the Buckeye Corpus of Conversational Speech192

(Pitt et al. 2007). This corpus comprises about 300,000 words from 40 long-time local193

residents of Columbus, Ohio, who were recorded conversing freely with an interviewer194

for about one hour each. In addition to the raw speech files, the Buckeye Corpus offers195

time-aligned written and phonetic transcriptions of the interviews.196

By using data from natural conversation instead of experimentally elicited data we197

take a rather bold approach. While experimental data may provide a better opportunity198

to control potentially intervening variables in various ways, data obtained in this way199

sometimes also raise concerns about their validity. It has been shown however, that,200

primarily due to modern statistical techniques such as mixed-effects regression (Baayen201

2Another potential candidate is cliticized us, as in let’s. As this form does not show any variation withregard to its host (it is always let, a comparison with the other /s/ and /z/ morphemes in a mixed-effectsregression model is problematic. Cliticized us was therefore excluded from this study.

5

et al. 2008), more natural data can be fruitfully employed to investigate issues of morpho-202

phonetic detail by including pertinent covariates that control for many sources of vari-203

ability in the data (see, for example, Ernestus and Warner (2011) for an overview).204

Examples of the different kinds of morphemic S from our dataset are given in (3).205

(3) a. It feels like they’re trying to tease me (3rd person singular)206

b. A grand total of six years as an undergrad (plural)207

c. Write a short story in that author’s style (genitive)208

d. The local senior citizens ’ home (genitive plural)209

e. That’s the way my family’s always been (has clitic)210

f. My brother’s twenty four (is clitic)211

For the analysis of length differences between different types of morphemic S we needed212

items that consisted of a base and of one of the types of morphemic S introduced above.213

While the clitics can take all sorts of bases, affixal S is more restricted in this respect.214

To keep the set of items as homogeneous as possible, only items with verbs, nouns, and215

pronouns (indefinite and personal) as bases entered the data set.216

Using the automatic annotations of the Buckeye corpus, 100 tokens of each morphemic217

S were randomly extracted. If a type occurred more than 12 times, all additional tokens218

were replaced by other, randomly sampled types. If less than 100 tokens were available219

for a certain kind of morphemic S, all available tokens were extracted. The overall set220

of suffixed items amounted to 460 (i.e. tokens), representing 293 types. Of these items,221

9 were excluded because acoustic and visual inspection revealed that the morphemic S222

was not realized as [s] or [z] in the speech signal (but e.g. as [ız] or [S], or it was omitted223

completely) and in one case the speaker purposefully lengthened the S for stylistic reasons.224

After inspection of the distribution of the duration measurements we also excluded as225

outliers items that were longer than 250 ms. Eventually, 447 items entered the acoustic226

analysis of morphemic S.227

In order to investigate Null Hypothesis 1, i.e. the potential difference between mor-228

phemic and non-morphemic S, we also sampled a set of non-morphemic word-final S229

tokens from the corpus. This sample was created as follows. In an initial step we ex-230

tracted all words from the corpus that ended in [s] or [z], irrespective of their expected231

standard pronunciation. Multimorphemic words, irregular third person singular forms232

(i.e. does, has and is), pronouns, and determiners were manually removed from this233

initial set. If a mono-morphemic word, i.e. type, was produced more than once by a234

speaker, only one randomly selected token of that type entered the pre-final data set235

(N=3057 tokens).3 From these words, 240 words were randomly sampled. Items with236

anomalies in the signal were excluded, as were outliers that were longer than 250 ms.237

The final data set consisted of 223 words with non-morphemic S, about half the number238

of items as in the set of morphemic S.239

In the final data set, the different types of S were distributed as shown in Figure 1.240

3We use the notion of ‘type’ here to refer to a specific pronunciation of a word as given in the Buckeyetranscriptions. That is, if a given word was pronounced in different ways, for example with variabledeletion of a particular segment, this was treated as different types.

6

Figure 1: Distribution of different types of S in the data set. (Abbreviations: s = non-morphemic S, GEN = genitive, PL-GEN = genitive plural)

3.2 Analysis241

3.2.1 Acoustic measurements242

With the help of LaBB-CAT (Fromont and Hay 2008), a freely available speech corpus243

management system formerly known as ONZE Miner, the Buckeye transcripts were con-244

verted into textgrid files. These files could then be used for further segmentation and245

analysis with the help of the acoustic analysis software Praat (Boersma and Weenink246

2013). Buckeye’s (partly automatic) phonetic annotations were checked for each item247

and adjusted where necessary. After manual checking and adjustment of all relevant248

intervals acoustic measurements were taken automatically with the help of a script.249

We wanted to model the length of S both in absolute terms and in relative terms, i.e.250

in relation to the length of its base. Studies of geminates (e.g. Oh and Redford 2012)251

have shown that differences in phonetic duration can be meaningfully interpreted as252

relative or absolute, depending, among other things, on the language under investigation.253

Given that very little is known about the absolute or relative length of affixes, it seemed254

reasonable to test both kinds of dependent variables. Therefore, the absolute duration255

of S in a given token was obtained from the duration of the segment in milliseconds, and256

the relative duration was calculated by dividing the absolute duration of the S by the257

duration of the whole word, i.e. by the sum of base duration and S duration.258

The distribution and means of the duration measurements by type of S are given in259

figure 2. Each dot represents one measurement and the lines indicate the means.260

7

Figure 2: Duration of different types of S in the data set. (Abbreviations: s = non-morphemic S, GEN = genitive, PL-GEN = genitive plural)

The distribution of the measurements shows rather clear differences between the dif-261

ferent types of S, and an anova shows a significant effect of type of S (F=20.196,262

p¡2.2e-16). The results of pair-wise comparisons of the means using Tukey contrasts are263

summarized in Table 1.4 We find ten significant contrasts. Non-morphemic S and plu-264

ral S do not differ from one another but are significantly longer than all other types of265

morphemic S.266

4For the comparisons the packages multcomp (Torsten Hothorn et al. 2008) and sandwich (Zeileis2004, 2006) were used, implementing the specific procedure suggested by Herberich et al. 2010 for sce-narios with unbalanced group sizes, non-normality and heteroscedasticity.

8

Table 1: Multiple comparison of means of duration of S (Tukey contrasts). (Significancecodes: ’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05)

Estimate std. Error t value Pr(>—t—)plural - s -0.0130316 0.0053468 -2.437 0.181463rdsg - s -0.0335883 0.0046443 -7.232 >0.001 ***GEN - s -0.0301727 0.0045179 -6.679 >0.001 ***has - s -0.0440993 0.0048154 -9.158 >0.001 ***is - s -0.0334490 0.0039892 -8.385 >0.001 ***PL-GEN - s -0.0332975 0.0049425 -6.737 >0.001 ***3rdsg - plural -0.0205567 0.0058582 -3.509 0.00844 **GEN - plural -0.0171410 0.0057585 -2.977 0.04664 *has - plural -0.0310676 0.0059948 -5.182 >0.001 ***is - plural -0.0204174 0.0053538 -3.814 0.00286 **PL-GEN - plural -0.0202659 0.0060974 -3.324 0.01594 *GEN - 3rdsg 0.0034157 0.0051129 0.668 0.99407has - 3rdsg -0.0105109 0.0053776 -1.955 0.43954is - 3rdsg 0.0001393 0.0046524 0.030 1.00000PL-GEN - 3rdsg 0.0002908 0.0054917 0.053 1.00000has - GEN -0.0139266 0.0052688 -2.643 0.11228is - GEN -0.0032764 0.0045262 -0.724 0.99087PL-GEN - GEN -0.0031249 0.0053852 -0.580 0.99727is - has 0.0106502 0.0048232 2.208 0.28772PL-GEN - has 0.0108017 0.0056372 1.916 0.46427PL-GEN - is 0.0001515 0.0049501 0.031 1.00000

While this may look already like a very interesting result, it has to be treated with267

great caution. Previous studies of the acoustic duration of words and individual sounds268

have shown that this acoustic parameter is subject to various acoustic and non-acoustic269

factors, such as speech rate, frequency, final lenghtening etc. Variable durations for the270

same word type may also arise from more general processes of phonetic reduction, which271

may affect duration but also non-durational parameters such as vowel quality or number272

of segments and syllables. In any study interested in one particular factor, in our case273

the influence of morphological status, these other influences need to be controlled, for274

example as covariates in a regression analysis.275

3.2.2 Covariates276

The specific set of covariates chosen for the present study is very similar to that of other277

studies that have investigated duration effects in morphologically complex words using278

speech corpus data (for example Pluymaekers et al. 2005b, 2010, Hanique et al. 2013).279

In the following we will briefly discuss each of the covariates.280

Local speech rate. An obvious variable that needs to be controlled for is speech281

rate. We used two measures of speech-rate. As one measure we calculated the speech rate282

in the discourse around each item (i.e. the ’local’ speech rate in syllables per second).283

This was done by counting the number of syllables in the context of up to ten seconds284

immediately preceding and succeeding the item, if not interrupted beforehand, e.g. by a285

9

pause or by the interviewer. The number of syllables was then divided by the number of286

seconds of context that was considered.287

Base duration. With regard to an even more local speech rate, we measured the288

length of the base. Other things being equal, a longer duration of the base will indicate289

a slower speech tempo. This measurement is also able to at least partially control for290

the lengthening effect of accentuation: as Turk and White (1999) show, word-final (non-291

morphemic) consonants are significantly longer if the word they belong to is accented.292

Number of syllables. We also included another measure of length, the number293

of syllables. As shown in Lindblom (1963) for Swedish vowels, and Nooteboom (1972)294

for Dutch vowels, vowel segments may tend to be shorter if they are followed by more295

syllables (but see Hanique et al. (2013) for somewhat different findings). It is unclear296

whether this effect can also be found in English, and whether it pertains to fricatives,297

but it seemed safe to include this covariate nevertheless. In any case, this effect can be298

conceptualized as a kind of compression effect, where words with more syllables undergo299

reduction. The syllable counts for the bases in our data set were extracted from CELEX300

where possible, in all other cases the number of vowels (or diphthongs) in a word was301

taken as an indicator for its number of syllables.302

Frequency. Frequency also affects phonetic duration, with more frequent words303

exhibiting more reduction, i.e. shorter segment durations (e.g. Bybee 2001:78, Jurafsky304

et al. 2001). We used two kinds of frequency, base frequency and form frequency. For base305

frequency we took the log-transformed frequency of the base in the spoken part of the306

Corpus of Contemporary American English (COCA, Davies 2008-), for the form frequency307

we calculated the log-transformed frequency of the word including the final S, ignoring308

what kind of S is attached in a given token. The frequencies were log-transformed to309

reduce the potentially harmful effect of skewed distributions in linear regression models.310

Previous mention. Another reduction effect may arise from online priming. The311

more often a complex word or its base have been mentioned in the previous discourse,312

the shorter we expect their durations to be in a given case (e.g. Fowler 1988, Fowler313

and Housum 1987, Gahl et al. 2012). We therefore included a covariate that counted the314

number of previous mentionings in a time window of 30 seconds preceding the token in315

question.316

Bigram frequency. Another potential factor influencing the duration of a word in317

running speech is the predictability of the word in its context (e.g. Jurafsky et al. 2001,318

Pluymaekers et al. 2005a, Bell et al. 2009, Torreira and Ernestus 2009). For content words,319

recent studies unanimously show that it is the upcoming context that has an effect on320

acoustic reduction (affecting also duration, e.g. Pluymaekers et al. 2005a, Bell et al.321

2009, Torreira and Ernestus 2009). To account for this durational effect we measured the322

bigram frequency of the item in question and its following word.323

Neighborhood density. Finally, we included phonological neighborhood densities as324

covariates, as it has been shown that the number of phonological neighbors may influence325

phonetic reduction (and hence word duration), as denser networks facilitate articulation326

(see, for example, Gahl et al. 2012). Neighborhood density measures were taken from the327

clearpond data base (Marian et al. 2012).328

Number of consonants in the rhyme. Another factor influencing the length of329

consonantal segments is their occurrence in a cluster. Consonants in clusters tend to330

be shorter (e.g. Klatt 1976). We therefore included the number of consonants in the331

10

rhyme of the final syllable of the base as a covariate, expecting that the more consonants332

precede the S the shorter the individual segments, including S, would become. Since this333

numerical variable had only three values (0, 1 and 2) the variable was transformed into a334

categorical variable to prevent the model to come up with nonsensical estimates (e.g for335

a word with 1.2 consonants in the rhyme).336

Voicing. We included voicing to account for the effect of allomorphy. It is also well-337

known that voicing affects the length of fricatives, with voiced fricatives being shorter338

(e.g. Klatt 1976). Based on an inspection of the distribution of the measurements taken339

by the algorithm used in Praat, an S was considered to be voiced if the algorithm could340

detect a periodic pitch pulse in more than 75 percent of the overall duration of the341

segment.342

Position in utterance. The position of an item in an utterance also plays a role343

for duration. Words and segments (especially fricatives) at the end of an utterance are344

subject to lenghtening (e.g. Oller 1973:1244, Berkovits 1993). Thus, we coded for each345

item whether it was found in the final position of an utterance, or in medial position. We346

would expect an S as the final segment of an utterance to be longer than an S we find in347

other positions.348

Syntactic position. Similarly, segments before a syntactic or prosodic boundary are349

lengthened (see, for example, Klatt 1976, Cooper 1976, Cooper et al. 1978, Byrd et al.350

2006). For our data, determining the boundaries of intonation phrases turned out to351

be highly problematic and unreliable, and we therefore opted for a syntax-based coding.352

We coded of each item whether it occurred at the right boundary of a syntactic phrase353

(e.g. an NP). Such a coding can at least partially control for differences that might occur354

due to syntax-based prosodic effects, for example, between phrase-final plural nouns and355

pre-head genitives.356

Following segment. We also coded the first segment of the word following the S357

(if it occurred in medial position), as this segment might also have an influence on the358

duration of the preceding S (e.g. Klatt 1976).359

3.2.3 Statistical analysis360

We devised two types of analysis, with different constellations of variables and different361

statistical models. In the first analysis we used absolute duration of S as the de-362

pendent variable (Model 1), whereas the second analysis had proportion of S as its363

dependent variable.364

Model 1 was fitted using mixed-effects regression, as implemented in the packages365

nlme4 (Bates et al. 2014) and lmerTest (Kuznetsova et al. 2014) for R (R Development366

coreteam 2011). Mixed-effects regression brings the variation of random effects such as367

subject or item under statistical control, and can deal with unbalanced data sets. The368

latter property is especially welcome since not all combinations of all values of the different369

predictors are represented in our data with equal frequency.370

The mixed-effects model was fitted adhering to the following strategy: In the initial371

model, alongside the explanatory variable type of S, we included the control variables372

discussed in the previous section.373

This initial model was then reduced through stepwise exclusion of insignificant factors374

(e.g. Baayen 2008). A factor was only considered significant if it proved to pass three375

11

tests: first, its t-statistics had to yield a t-value of greater than 2 (or less than -2) when376

included in the model. Second, the Akaike information criterion (AIC) of the model377

including the factor had to be lower than the AIC of the model without it. Third, an378

ANOVA comparing the model including the factor to a model without it had to yield a379

p-value lower than 0.05, thus showing that the inclusion of the factor did significantly380

improve the fit of the model. A variable under consideration was only retained in the381

model if it passed all three tests. When testing for interactions, numerical variables were382

first standardized to avoid skewed results (e.g. Jaccard et al. 1990).383

One of the central assumptions of any linear regression model is a linear relationship384

between the dependent and independent variables. If this assumption is not met, the385

estimated coefficients may be highly unreliable. In this case, the dependent variable may386

often be transformed to alleviate for any problem resulting from a lack of linearity. The387

Box-Cox transformation (Box and Cox 1964, Venables and Ripley 2002) can be employed388

to identify a suitable transformation parameter λ for a power transformation. For the389

transformation of the absolute duration measures used in Model 1 the optimal value of390

λ was λ =0.1414141.391

Let us turn to Model 2. In this analysis the dependent variable was the relative length392

of S (proportion of S, calculated as the (non-transformed) absolute duration of the S393

divided by the duration of the whole word). As this variable is bounded between 0 and 1394

and has a skewed distribution, linear regression is ruled out and a model is needed that395

can cope with these properties of the target variable. Beta regression (e.g. Ferrari and396

Cribari-Neto 2004) is such a model. We used the R package betareg (Cribari-Neto and397

Zeileis 2010) for this analysis. For the beta regression models a slightly different fitting398

strategy had to be adopted. This will be explained in more detail in section 4.2.399

Since number of syllables and length of the base are highly correlated variables, we400

residualized the number of syllables on the duration of the base and included this new401

predictor, resSylNum, in the model. The same procedure was applied to base frequency402

and form frequency, so that the residualized form frequency was included.403

3.2.4 Overview of the data404

An overview of variables and their distributions is given in Table 2. The names of the405

(sometimes transformed) variables that entered the analysis are given in small capitals.406

12

Table 2: Summary of the dependent variables and covariates used in the initial models.

Dependent variables N Mean St. Dev. Min Max

Absolute duration of S durationOfS 670 0.086 0.040 0.022 0.242Relative duration of S proportionS 670 0.220 0.088 0.042 0.674

Numerical predictors N Mean St. Dev. Min Max

Local speech rate sylSec 670 5.782 1.330 2.376 12.977Base duration baseDuration 670 0.323 0.132 0.030 0.929Number of syllables resSylNum 670 0.001 0.684 −1.662 3.092Base frequency logBaseFreq 670 8.921 2.159 1.792 14.493Form frequency resLogAllBoundFreq 670 −0.007 0.997 −3.662 2.579Previous mention baseRep 597 0.307 0.736 0 6Bigram frequency logrBigram 567 8.759 3.024 0.000 14.413Neighborhood density PND 657 14.460 14.889 0 60

Categorical predictors N Levels

No. of cons. in rhyme consonants 670 0: 354 1: 261 2: 55Voicing isVoiced 670 yes: 87 no: 583Position in utterance position 670 mid: 582 final: 88Syntactic position boundary 670 yes: 232 no: 438Following segment follSound 580 (40 levels)

Explanatory variable N Levels

670 S: 223 PL: 97 3rdsg: 99 GEN: 90has: 47 is: 93 PL-GEN: 21

4 Results407

4.1 Model 1: Absolute duration as dependent variable408

Model 1 was fitted according to the procedure described above. The inspection of the409

residuals showed a non-normal distribution at one end of the distribution. Following410

standard procedures (e.g. Crawley 2002, Baayen and Milin 2010), we removed outliers411

(defined as items with standardized residuals exceeding −2.5 or +2.5) and refitted the412

model (the removal of outliers resulted in the loss of 2 % of the observations). The final413

model showed a satisfactory deistribution of residuals.414

In the final model we find significant effects of type of S (typeOfS), the position415

of the item (position), the number of consonants in the rhyme of the item’s final syl-416

lable (consonants), the speech rate (sylSec) and voicing (isVoiced). In addition,417

both the duration of the base (baseDuration) and the residualized number of syllables418

(resSylNum) have significant and independent effects on the length of S. Regarding the419

random effects, we tested speaker and base, but only speaker-specific effects turned out420

to significantly improve model performance.421

We also tested for interactions. Testing all covariates in interaction with type of S422

did not yield any significant interactions. We also tested an interaction between type of423

S and voicing, since one might suspect clitics not to participate in voicing assimilation to424

the same extent as it is found with morphemes. We finally tested an interaction between425

voicing and the number of consonants in the rhyme of the base-final syllable, since the426

13

presence of more consonants increases the distance between the S and the voiced nucleus427

of the syllable, which might lead to less assimilation. However, neither interaction turned428

out to be statistically significant.429

The p-values for the analysis of variance (or deviance) as well as the proportion of430

deviance explained for each fixed-effect of Model 1 are documented in table 3.5431

Table 3: p-values of fixed effects in Model 1, fitted to the Box-Cox-transformed durationsof S.

Df Sum Sq Mean Sq F value p-value expl.dev.(%)

typeOfS 6.00 0.19 0.03 37.26 0.000 16.12resSylNum 1.00 0.02 0.02 19.09 0.000 1.38position 1.00 0.20 0.20 229.57 0.000 16.55sylSec 1.00 0.01 0.01 8.38 0.004 0.60isVoiced 1.00 0.02 0.02 23.32 0.000 1.68baseDuration 1.00 0.01 0.01 12.70 0.004 0.92consonants 2.00 0.07 0.03 37.76 0.000 5.44

All effects are very highly significant, with the effect of position being strongest. The432

type of S is the second strongest predictor and can account for about 16 percent of the433

variance. The estimates and their p-values are documented in table 4. The reference434

levels for the categorial predictors are the following: for typeOfS it is non-morphemic435

S, for position it is final position, for isVoiced it is yes, and for consonants it is 0.436

All coefficents can be interpreted as changes relative to these reference levels.6437

5This table is based on the function pamer.fnc of the LMERConvenienceFunctions package (Tremblayand Ransijn 2013). Upper-bound p-values are computed by using as denominator degrees of freedom thenumber of data points minus number of fixed effects including the intercept). Upper-bound p-values areanti-conservative. Lower-bound p-values are computed by using as denominator degrees of freedom thenumber of data points minus number of fixed effects minus the number of random effects. The resultingp-values are conservative. Since the analysis yielded identical upper- and lower-bound p-values, the tablegives these p-values.

6The values are computed using the summary function of the lmerTest package. This package calcu-lates p-values and degrees of freedom based on Satterthwaite approximation for denominator degrees offreedom (SAS Institute Inc. 1978).

14

Table 4: Fixed-effect coefficients and p-values in a mixed-effects model fitted to the Box-Cox-transformed durations of S

Estimate Std. Error t value Pr(>|t|)(Intercept) 0.750 0.009 79.712 0.000morphplural -0.008 0.006 -1.407 0.160morph3rdsg -0.017 0.006 -2.983 0.003morphGEN -0.020 0.004 -4.744 0.000morphhas -0.028 0.005 -5.355 0.000morphis -0.022 0.005 -4.828 0.000morphPL-GEN -0.007 0.007 -0.909 0.364resSylNum -0.005 0.002 -2.373 0.018positionmiddle -0.051 0.004 -13.244 0.000sylSec -0.002 0.001 -2.439 0.015isVoicedunvoiced 0.020 0.004 5.604 0.000baseDuration 0.032 0.010 3.214 0.001consonants1 -0.015 0.003 -4.883 0.000consonants2 -0.042 0.005 -8.587 0.000

Figure 3 illustrates the effects of the covariates in Model 1.438

15

Figure 3: Partial effects of covariates in Model 1, fitted to the Box-Cox-transformedabsolute durations of S

The covariates show the expected behavior. In the upper row the leftmost panel439

shows that with regard to the number of syllables (resSylNum), we find the same440

compression effect for S in English as attested for Dutch and Swedish vowels. The more441

syllables, the shorter the S. The second panel (position) shows that in utterance-final442

position S is longer , which can be interpreted as a clear final-lengthening effect. The443

rightmost panel of the upper row gives us the effect of speech rate (sylSec). With faster444

speech, S becomes shorter. Moving on to the lower row, the leftmost panel indicates445

that, as expected, a voiced S is generally shorter than an unvoiced S. In the mid panel446

of the lower row we see the effect of baseDuration. Words that have a longer duration447

(while keeping syllable number constant) also have longer S’s, which means that final448

S participates in overall lengthening or shortening effects that affect the word as such.449

Finally, the more consonants we find in the rhyme the shorter final S becomes (lower row,450

rightmost panel), which is equally expectable.451

Let us now turn to the variable of interest, i.e. type of S, whose effect is plotted in452

Figure 4. The lines indicate the predicted means, the dots give the residuals.453

16

Figure 4: Partial effect of type of S, Model 1 (Abbreviations: s = non-morphemic S, GEN= genitive, PL-GEN = genitive plural)

We can see that non-morphemic S has the longest duration and the has clitic is454

shortest. Testing all pair-wise contrasts yields the significant contrasts shown in table 5.455

To be on the conservative side, we report the upper bounds of the p-values.456

Table 5: Significant contrasts in duration between different types of S. Significance codes:’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05

S PL 3RDSG GEN HAS IS PL-GEN

S x ** *** *** ***PL x ** *3RDSG xGEN xHAS x **IS x *PL-GEN x

Of the overall 21 possible pair-wise contrasts eight are significant. Non-morphemic457

S is significantly longer than all types of morphemic S apart from plural and genitive458

plural. Plural S is longer than all other types of morphemic S apart from plural genitive459

S. In addition to these contrasts we also find some significant contrasts among genitive460

S, 3rdsg and the has and is clitics.7461

7Model 1 does not take into account the potential effect of the following segment on the duration of

17

The results from Model 1 clearly indicate that the null hypotheses in (1) and (2)462

need to be rejected. We find robust differences between morphemic and (most) non-463

morphemic S on the one hand, and between some types of morphemic S on the other.464

These differences are present in natural conversational speech. They cannot be attributed465

to purely phonetic or lexical effects since these effects were carefully controlled for.466

4.2 Model 2: Relative duration467

An inspection of the distribution of the relative duration measurements showed three468

outliers where the S was extremely long compared to the rest of the items, with propor-469

tions being larger than 54 percent. These three items were removed. The distribution of470

the measurements of relative length of S was substantially skewed, which is something471

that can be frequently observed with variables that are bounded between 0 and 1. Beta472

regression is a kind of statistical model that can cope with these distributional properties.473

Beta regression models can contain two components, one that predicts the mean, and a474

component for the precision (or dispersion) phi (see Ferrari and Cribari-Neto (2004) for475

introduction and discussion). Broadly speaking, a predictor with a low precision coeffi-476

cient means that the beta regression model estimates the values of this predictor to be477

more dispersed around the coefficient’s mean than in the case of a predictor with a high478

precision coefficent.479

We fitted a beta regression model with the proportion of S as the dependent variable,480

and the same initial fixed effects as in Model 1 apart from baseDuration, since this481

variable had been used in the computation of the dependent variable. For the beta re-482

gression models a slightly different fitting strategy from that of mixed effects regression483

had to be adopted (see Ferrari and Cribari-Neto 2004 for details). The initial models484

started with only the mean component and all predictors. The initial model was then485

simplified in a step-wise procedure, removing all insignificant predictors. After the com-486

pletion of simplification for the mean component, we added the precision component and487

then simplified this component applying again the standard step-wise procedures. After488

completion of this simplification procedure we finally tested whether the inclusion of the489

precision component as a whole was justified by comparing of the AIC of a model that490

had only the mean component with the model with both mean and precision component,491

and by using a likelihood ratio test. The inclusion of the precision parameters was jus-492

tified by lower AIC scores and higher log-likelihood. The estimated coefficients for the493

means and the phi-values of the final model are given in table 6. The reference levels are494

the same as with Model 1, i.e. for typeOfS it is non-morphemic S, for position it is495

word-final consonants (e.g. Klatt 1976). Given that our data set included also utterance-final tokens (i.e.tokens without an immediately following segment), this variable could not be incorporated into Model1. In order to address this variable, we also fitted another model on the tokens in medial position, usingthe same set of predictors but replacing position by follSound (i.e. following sound). The same fixedeffects as in Model 1 emerged as significant in this new model apart from the number of syllables (andapart from the obvious exception of position). The nature of the following segment also had a significanteffect. In the model with following sound as random effect we find six significant pair-wise contraststhat overlap with the contrasts known from Model 1. There are the four contrasts of non-morphemic S,and for plural S we find two significant contrasts (plural-3rdsg, plural-is) and one marginally significantcontrast (plural-has). In a model fitted with follSound as fixed effect the same contrasts emerged assignificant.

18

final position, for consonant it is 0 and for isVoiced it is yes, so that all coefficents496

can be interpreted as changes relative to these reference levels. The bottom part of table497

6 documents the precision model, which outputs a significant effect for type of S and498

for the two covariates position and number of consonants.499

Table 6: Coefficients of final beta regression model, Model 2 (Pseudo R-squared of themodel: 0.2544)

Coefficients(Mean model with logit link)

Estimate Std. Error z p (> |z|)

(Intercept) -1.523340 0.096769 -15.742 <2e-16typeOfSplural -0.073045 0.064239 -1.137 0.25550typeOfS3rdsg -0.053469 0.074752 -0.715 0.47443typeOfSGEN -0.335864 0.069704 -4.818 1.45e-06typeOfShas -0.272441 0.090657 -3.005 0.00265typeOfSis -0.324614 0.072663 -4.467 7.92e-06typeOfSPL-GEN -0.209794 0.081780 -2.565 0.01031positionmiddle -0.231629 0.051308 -4.514 6.35e-06consonants1 -0.062879 0.038944 -1.615 0.10640consonants2 -0.388557 0.059769 -6.501 7.98e-11isVoicedunvoiced 0.201449 0.048540 4.150 3.32e-05logBaseFreq 0.050043 0.009779 5.118 3.09e-07resLogAllBoundFreq 0.060222 0.022638 2.660 0.00781

Phi coefficients(Precision model with log link)

Estimate Std.Error z p (> |z|)

(Intercept) 2.844906 0.164818 17.261 <2e-16typeOfSplural 0.165790 0.193006 0.859 0.39035typeOfS3rdsg 0.008172 0.191191 0.043 0.96591typeOfSGEN 0.519059 0.184837 2.808 0.00498typeOfShas 0.098614 0.240067 0.411 0.68124typeOfSis 0.452886 0.181490 2.495 0.01258typeOfSPL-GEN 0.628123 0.337476 1.861 0.06271positionmiddle 0.320562 0.162377 1.974 0.04836consonants1 0.430005 0.133978 3.210 0.00133consonants2 0.706434 0.225737 3.129 0.00175

With regard to our explanatory variable, Model 2 tells us both the means and the500

dispersion of the relative length of S are dependent on which kind of S we look at. This501

is fully in line with the results from Model 1.502

The behavior of the covariates in the mean model is partly different from, partly the503

same as their behavior in the model that tried to predict the absolute duration of S. The504

19

effects of position, number of consonants in the rhyme and voicing are found in both505

models and they go in the same direction as before. In middle position the proportion of506

S is smaller (see the negative coefficient), which means that the effect of final lengthening507

is especially prevalent with the final sound of the word. This is in line with the results of508

investigations studying the effect of final lengthening (e.g. Turk and Shattuck-Hufnagel509

2007). With regard to rhyme structure, the more consonants there are in the rhyme510

the shorter the proportion of S becomes (see the negative coefficients), which is again511

an expectable effect, even if restricted to the difference between no consonant and two512

consonants preceding S. Voiced tokens of S are shorter than unvoiced tokens, which is513

again an expected effect.514

Two new effects emerge, however, when looking at relative duration. Both base515

frequency and form frequency contribute to the relative duration of S in that higher516

frequency leads to a longer relative duration of S. This positive effect of frequency on517

relative duration (as computed in this paper) is in fact expected. The pertinent literature518

consistently finds shorter absolute durations for more frequent words. This is also the519

case in our data (rho=-0.39, p=0 for the correlation between log base frequency and the520

duration of the base). Since the duration of the base (added to the duration of the S) is521

in the denominator of the fraction used to compute relative duration (see again section522

3.2.1), a shorter duration in the denominator, i.e. the correlate of higher frequency, leads523

to a higher fraction, i.e. to higher relative duration measurements.524

Having controlled for the influences of the covariates, we now present the estimated525

means (obtained from the first component of the beta regression) and phi-coefficients526

(obtained from the precision component) for the different values of type of S in Table527

7.528

Table 7: Means and phi-coefficients of the different morphemes, final beta regressionmodel, Model 2

type of S mean phi

non-morphemic S 0.2538 17.1999plural 0.2402 20.30153rdsg 0.2438 17.3411genitive 0.1955 28.9035has 0.2057 18.9825is 0.1973 27.0528genitive-plural 0.2161 32.2342

Figure 5 shows the estimated beta distributions of relative durations for each type of529

S. The dots indicate the different means, and the shaded area indicates the dispersion530

around the mean. Non-morphemic S (which is the reference level in the beta regression)531

has the highest mean, and consequently, the mean coefficents for all other types of S are532

smaller (and thus negative) in Table 6.533

As can be gleaned from the lower part of Table 6, non-morphemic S also has the lowest534

precision estimate in Table 7 (the coefficients of all other types of S are positive), but535

the precision estimates of plural S and 3rd singular S do not differ significantly. This low536

20

precision is reflected in the very elongated dispersion areas for these types in the figure.537

Apparently, there is considerable variation around the mean for these types, notably more538

so than for other types of S which have higher estimates for precision.539

Figure 5: The effect of morpheme type on the relative duration of S (Abbreviations: s =non-morphemic S, GEN = genitive, PL-GEN = genitive plural)

If we pair-wise compare the different types of S in the mean component of Model 2, we540

find eleven significant contrasts, which is an even higher number than in Model 1 (which541

had eight significant contrasts). For these pairs, the beta regression estimates that the542

means of the relative duration differ significantly from each other. Table 8 summarizes543

the contrasts.544

Table 8: Significant contrasts in relative duration between different types of S. Significancecodes: ’***’ p<0.001 ’*’ p<0.01, ’*’ p<0.05

S PL 3RDSG GEN HAS IS PL-GEN

S PL 3RDSG GEN HAS IS PL-GENS x *** ** *** *PL x *** * ***3RDSG x *** ** ***GEN xHAS xIS xPL-GEN x

21

The pair-wise contrasts yield an interesting picture. In terms of relative duration,545

non-morphemic S, plural S, and 3rd person singular S form a group that differs from the546

group that is formed by the three clitics, i.e. genitive S, clitic has and clitic is.547

The former three are longer than the clitics, and there are no significant duration548

differences within each group. The mean of the genitive plural S lies between both549

groups, and differs significantly only from non-morphemic S and shows two marginally550

significant differences to the plural and 3rd singular (p=0.09 and p=0.07, respectively).551

To summarize, the analysis of the relative duration of S has shown there are many552

significant differences between different types of S, in particular between the S clitics and553

non-morphemic, plural and 3rd singular S. The results of Model 2 thus complement the554

results of Model 1, for which absolute duration served as the dependent variable. Even555

though relative duration and absolute duration behave somewhat different with regard556

to the influence of type of S, the null hypotheses in (1) and (2) needs to be rejected557

for both kinds of duration measurements.558

5 Discussion and conclusion559

This study started out from two null hypotheses. The first stated that there are no560

durational differences between non-morphemic S and morphemic types of S. The second561

null hypothesis stated that there are no durational differences between the different ho-562

mophonous types of morphemic S. However, no matter whether we looked at absolute563

duration of S or at the relative duration of S, the type of S emerged as a significant pre-564

dictor. Also, non-morphemic S differed consistently from some of the morphemic types565

of S. This means that we have to reject the two null hypotheses.566

Across models, non-morphemic S emerges as the longest S. The significance of indi-567

vidual differences between pairs of different types of S varies slightly between the two568

models, but many of the differences are robust across models.569

Our analyses of absolute and relative duration of the different types of morphemic S570

included pertinent co-variates that controlled for purely phonetic effects known from the571

phonetic literature. The co-variates behave as expected, so that one can assume that the572

speech corpus data used for this study are generally reliable. Thus, the results, surprising573

though they may be, cannot be easily dismissed as due to an unsuitable sample. But574

how can these results then be understood?575

Let us first discuss the difference between morphemic S and non-morphemic S (Null576

Hypothesis 1). Our results concerning the absolute duration of S seem to agree with577

Walsh & Parker (1983), who also found differences between the two types of S. Such a578

statement would, however, be premature. As these authors did not employ any inferential579

statistics, nor consider any co-variates in their analysis, we cannot determine whether580

the small differences found in the means by Walsh & Parker are meaningful. It may be581

noteworthy, however, that the mean difference observed by Walsh & Parker goes in the582

opposite direction from ours: in their data, morphemic S is longer, not shorter, as it is583

in our sample.584

If we take this discrepancy between their data and ours seriously, we may speculate585

about the reasons for the discrepancy. One reason might be the different kinds of item.586

Thus, in Walsh & Parker’s experimental study, morphologically complex words were587

22

compared with their simplex homophones, while in our data we tested the duration of S588

across the board, i.e. across many different, non-homophonous, types. Furthermore, in589

the experiment the participants read out the stimuli, so that the experimental procedure590

may have influenced the results in a way that we do not find in spontaneous speech.591

Gahl et al. (2012), for example, discusses studies that have failed to reproduce the well-592

established effect that more frequent words generally show shorter duration and remarks593

that this failure may be due to the fact that speakers tend to read at a regular pace when594

asked to read word lists or words in short carrier phrases. This tendency has been found595

to override effects of lexical properties, such as word frequency.596

Let us turn to Null Hypothesis 2, which concerns suffix homophony. Following in the597

footsteps of recent studies by, for example, Gahl et al. (2012), Drager (2011), we tested the598

null hypothesis that there is no difference between phonologically homophonous suffixes in599

English. In particular, we investigated whether the morpheme type (i.e. plural, genitive,600

plural, genitive, 3rd person singular, cliticized has or cliticized is) has an influence on601

the acoustic duration of the segment.602

Both models found significant differences between some of the morphemes, but differ603

somewhat as to which morphemes differ from which other ones. In terms of absolute604

duration general, plural S is longest, and the auxiliary clitics are shortest, with a number605

of additional differences between individual pairs of morphemes. In terms of relative606

duration, non-morphemic, plural and 3rd singular S differ from the three clitics. We607

can therefore safely say that the Null Hypothesis 2 also needs to be rejected. Like in the608

afore-mentioned studies of whole lexemes that are taken to be homophones, homophonous609

bound morphemes can show systematic acoustic differences.610

What is especially striking is the patterning of the relative durations. The types of611

S that have a supposedly stronger boundary, i.e. the three clitics (genitive, has and612

is), show shorter relative duration than the types of S that have a weaker boundary (i.e.613

plural and 3rd person singular) or no boundary (i.e. non-morphemic S). These facts could614

be interpreted in such a way that morphological boundary strength may have a rather615

direct correlate in the acoustic signal.616

Our findings on the phonetics of S are in line with other studies that have produced617

evidence that lexical representations may include fine phonetic information, and that this618

information is used by speakers in word recognition (e.g. Gahl (2008), Johnson (2004),619

Kemps et al. (2005a), Salverda et al. (2003)).620

If there are systematic differences between the different types of S in speech produc-621

tion, one would like to know whether language users are influenced by these differences in622

perception. A natural extension of the present study would therefore be the investigation623

of S with listeners. The difference in absolute duration between the observed means of624

the shortest morphemic S (i.e. the has clitic) and non-morphemic S in our corpus data625

amounts to c. 44 ms, with the means of duration of S varying between 62 and 106 ms626

(difference as predicted by the model: 26 ms, range predicted by the model: 93-119ms).627

These differences seem large enough to be potentially perceptible: the perceptual thresh-628

old for durational differences in fricatives has been estimated at about 25 to 30 ms (e.g.629

Klatt & Cooper 1975, Schatzman & McQueen 2006).630

Our results raise important theoretical questions. Given that we find differences631

between different types of S, it is unclear why such differences would emerge in the first632

place. More specifically, one is tempted to ask the question why a certain type of S633

23

should be longer than some other S, and not shorter. For example, why are the clitics634

particularly short and plural S is rather long? Which theory would predict that?635

At a very general level, our findings indicate that there is morphological information in636

the phonetic signal. This means that morphological information is present in post-lexical637

stages of speech production. This calls into question the distinction between lexical and638

post-lexical phonology, which has featured prominently in both theoretical linguistics, i.e.639

phonology, and psycholinguistics.640

In phonology, there is a long tradition of theoretical mechanisms like bracket erasure641

and cyclic application of morpho-phonological rules, after whose application there is no642

possibility that one could trace any information about a sound’s origin or structural643

status in the acoustic signal. The findings of this paper thus seriously challenge central644

tenets of the traditional models of Lexical Phonology and Morphology in the wake of645

Kiparsky (1982) (see also, for example, Coetzee and Pater 2011 and Scheer (2010) for646

critical discussion).647

In psycholinguistics, well-established models of speech production and the mental lex-648

icon are equally unable to accomodate our findings. Levelt et al. (1999), for example,649

assume that phonological representations are composed discrete segments and syllables,650

and the articulator module makes use of pre-programmed gestures that are stored in a syl-651

labary (Levelt et al., 1999:5). The articulator cannot provide a pre-programmed gesture652

for each syllable of a language if different meanings cause differences in these gestures. In653

other words, in such models morphologically dependent subphonemic detail is not part654

of these representations and needs therefore be accounted for by purely phonetic factors655

that influence articulatory implementation such as speech rate (e.g. Levelt (1989)). For656

our data, such an account is ruled out.657

There is, however, another line of psycholinguistic research that has looked at dis-658

tinct processing properties of different kinds of morphemes. Using data from aphasia,659

second language acqusition and code switching Myers-Scotton and Jake (2000) propose660

a four-way distinction between different types of morpheme (the so-called ‘4-M model’).661

According to the 4-M model there are content morphemes and systems morphemes, and662

the system morphemes are further subdivided into ‘early system morphemes’ and ‘late663

system morphemes’. These classes are not all very well defined, but it seems clear that664

plural belongs to the early system morphemes while genitive and 3rd person singular665

marking belong to the late system morphemes. The late system morphemes further666

consist of two subclasses, ‘bridge system morphemes’ and ‘outsider system morphemes’.667

According to Myers-Scotton and Jake (2000) genitive ‘s is a bridge morpheme, while668

3rd person singular is an outsider morpheme. The proposed four-way classification of669

morphemes is supported by the differential behavior of these morphemes in aphasia, sec-670

ond language acquisition and code-switching, and these authors relate the differences in671

behavior to differences in lexical access. For content and early system morphemes, lexical672

access happens at the lemma level, for the late system morphemes at what they call the673

‘functional level’, which is located at the level of the formulator (see, for example, Levelt674

1989).675

How can this model account for our results? Even if we assume that Myers-Scotton676

& Jake’s taxonomy is on the right track, there are severe problems. The first one is677

that their model does not say anything about the details of phonetic implementation,678

and it is unclear what the model would predict for this stage of speech production. If679

24

the 4-M carries over to articulation (in ways still to be investigated), we would expect680

plural (being an early system morphemes) to pattern with the auxiliary clitics, which is681

obviously not the case. What is more, the 4-M model seems to rest on the assumption682

that the morphemes are accessed separately from their bases. More recent research in683

morphological processing has cast serious doubt on this idea, and the jury is still out on684

the question of the psycholinguistic status of individual inflectional morphemes.685

In can thus be stated that both phonological theory and many psycholinguistic models686

fail to provide a convincing explanation of the kind of morphologically-induced phonetic687

variation that we find in our data. Exemplar-based models (e.g. Bybee 2001,Gahl and688

Yu 2006, Goldinger 1998, Pierrehumbert 2001, 2002, Johnson 2004) seem to be better689

equipped to deal with the kind of variation we find in our data. In such models a690

given word is linked to a frequency distribution over phonetic outcomes, as found in the691

environment in the speaker. These distributions are updated with new experiences, and692

subtle subphonemic differences in these experiences may result in representations that693

reflect these properties. For example, Pierrehumbert (2002) demonstrates how such a694

model can deal with the phonetic variability of lexemes. It is conceivable that similar695

models can be implemented to account for the subtle phonetic properties of different696

bound morphemes. The details of such an account still need to be worked out in future697

studies, however, as previous exemplar-based approaches have not included the subtle698

phonetic differences involved in the differentiation of allegedly homophonous affixes.699

To summarize, this paper presents the first study that has systematically investigated700

the relationship of morphemic status and phonetic implementation. This was done using701

natural conversation data. The analysis has yielded important evidence on the question702

of affix homonymy, revealing that phonologically homophonous bound morphemes can703

be phonetically distinct, and that morphemic and non-morphemic S may differ, too.704

This is unpredicted by current linguistic and psycho-linguistic theories of the lexicon and705

grammar. Further studies are certainly called for to replicate the observed effects, and to706

develop new models of the mental lexicon and of the relationships between morphology,707

phonology, and phonetic implementation.708

References709

Baayen, R. H., Davidson, D. J., and Bates, D. M. (2008). Mixed-effects modeling with710

crossed random effects for subjects and items. Journal of Memory and Language,711

59(4):390–412.712

Baayen, R. H. and Milin, P. (2010). Analyzing reaction times. International Journal of713

Psychological Research, 3(2):12–28.714

Baker, R., Smith, R., and Hawkins, S. (2007). Phonetic differences between mis-and dis-715

in english prefixed and pseudo-prefixed words. In Proceedings of the 16h International716

Congress of Phonetic Sciences.717

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2014). lme4: Linear mixed-effects718

models using eigen and s4.719

25

Bauer, L., Lieber, R., and Plag, I. (2013). The Oxford reference guide to English mor-720

phology. Oxford University Press, Oxford.721

Bell, A., Brenier, J. M., Gregory, M., Girand, C., and Jurafsky, D. (2009). Predictability722

effects on durations of content and function words in conversational english. Journal723

of Memory and Language, 60(1):92–111.724

Berkovits, R. (1993). Progressive utterance-final lengthening in syllables with final frica-725

tives. Language and Speech, 36(1):89.726

Boersma, P. and Weenink, D. J. (2013). Praat. doing phonetics by computer (version727

4.3. 14)(2005).728

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the729

Royal Statistical Society, Series B, 26(2):211–252.730

Bybee, J. L. (2001). Phonology and language use. Cambridge University Press, Cam-731

bridge.732

Byrd, D., Krivokapic, J., and Lee, S. (2006). How far, how long: On the temporal scope733

of prosodic boundary effects. Journal of the Acoustic Society of America, 120(3):1589–734

1599.735

Cho, T. (2001). Effects of morpheme boundaries on intergestural timing: Evidence from736

korean. Phonetica, 58:129–162.737

Chomsky, N. and Halle, M. (1968). The sound pattern of English. Harper and Row, New738

York.739

Coetzee, A. W. and Pater, J. (2011). The place of variation in phonological theory. The740

Handbook of Phonological Theory, Second Edition, pages 401–434.741

Cooper, W. E. (1976). Syntactic control of timing in speech production: A study of742

complement clauses. Journal of Phonetics, 4(2):151–171.743

Cooper, W. E., Paccia, J. M., and Lapointe, S. G. (1978). Hierarchical coding in speech744

timing. Cognitive Psychology, 10(2):154–177.745

Crawley, M. J. (2002). Statistical computing. An introduction to data analysis using746

S-Plus. John Wiley, Chichester.747

Cribari-Neto, F. and Zeileis, A. (2010). Beta regression in r. Journal of Statistical748

Software, 34(2):1–24.749

Davies, M. (2008). The corpus of contemporary american english: 400+ million words,750

1990-present.751

Drager, K. (2011). Sociophonetic variation and the lemma. Journal of Phonetics,752

39(4):694–707.753

Ernestus, M. and Warner, N. (2011). An introduction to reduced pronunciation variants.754

Journal of Phonetics, 39:253–260.755

26

Ferrari, S. and Cribari-Neto, F. (2004). Beta regression for modelling rates and propor-756

tions. Journal of Applied Statistics, 31(7):799–815.757

Fowler, C. A. (1988). Differential shortening of repeated content words produced in758

various communicative contexts. Language and Speech, 31:307–317.759

Fowler, C. A. and Housum, J. (1987). Talkers’ signalling of new and old words in speech760

and listeners’ perception and use of the distinction. Journal of Memory and Language,761

26:489–504.762

Fromont, R. and Hay, J. (2008). Onze miner: the development of a browser-based research763

tool. Corpora, 3(2).764

Gahl, S. (2008). ¡i¿time¡/i¿ and ¡i¿thyme¡/i¿ are not homophones: The effect of lemma765

frequency on word durations in spontaneous speech. Language, 84(3):474–496.766

Gahl, S., Yao, Y., and Johnson, K. (2012). Why reduce? phonological neighborhood den-767

sity and phonetic reduction in spontaneous speech. Journal of Memory and Language,768

66(4):789–806.769

Gahl, S. and Yu, A. C. L., editors (2006). Special Issue on Exemplar-based Models in770

Linguistics, volume 23.771

Goldinger, S. D. (1998). Echoes of echos? an episodic theory of lexical access. Psycho-772

logical Review, 105(2):251–279.773

Hanique, I. and Ernestus, M. (2012). The role of morphology in acoustic reduction.774

Lingue e Linguaggio, 11:147–164.775

Hanique, I., Ernestus, M., and Schuppler, B. (2013). Informal speech processes can be776

categorical in nature, even if they affect many different words. The Journal of the777

Acoustical Society of America, 133(3):1644–1655.778

Herberich, E., Sikorski, J., Hothorn, T., and Rapallo, F. (2010). A robust procedure for779

comparing multiple means under heteroscedasticity in unbalanced designs. PLoS ONE,780

5(3):e9788.781

Jaccard, J., Wan, C. K., and Turrisi, R. (1990). The detection and interpretation of782

interaction effects between continuous variables in multiple regression. Multivariate783

Behavioral Research, 25(4):476–478.784

Johnson, K. (2004). Massive reduction in conversational american english. In Spontaneous785

speech: data and analysis. Proceedings of the 1st session of the 10th international786

symposium, pages 29–54, Tokyo and Japan.787

Jurafsky, D., Bell, A., Gregory, M., and Raymond, W. D. (2001). Probabilistic relations788

between words: Evidence from reduction in lexical production. In Bybee, J. and789

Hopper, P. J., editors, Frequency and the emergence of linguistic structure, pages 229–790

254. Benjamins, Amsterdam.791

27

Kemps, R., Ernestus, M., Schreuder, R., and Harald Baayen, R. (2005a). Prosodic cues792

for morphological complexity: The case of dutch plural nouns. Memory & Cognition,793

33(3):430–446.794

Kemps, R., Wurm, L. H., Ernestus, M., Schreuder, R., and Baayen, H. (2005b). Prosodic795

cues for morphological complexity in dutch and english. Language and Cognitive Pro-796

cesses, 20:43–73.797

Kiparsky, P. (1982). Lexical morphology and phonology. In Yang, I.-S., editor, Linguistics798

in the Morning Calm: Selected Papers from SICOL, pages 3–91. Hanshin, Seoul.799

Klatt, D. H. (1976). Linguistic uses of segmental duration in english: Acoustic and800

perceptual evidence. Journal of the Acoustical Society of America, 59(5):1208–1221.801

Kuznetsova, A., Brockhoff, P. B., and Bojesen Christensen, R. H. (2014). lmertest.802

Levelt, W. J. M. (1989). Speaking. From intention to articulation. The MIT Press,803

Cambridge and Mass.804

Levelt, W. J. M., Roelofs, A., and Meyer, A. S. (1999). A theory of lexical access in805

speech production. Behavioral and Brain Sciences, 22:1–38.806

Levelt, W. J. M. and Wheeldon, L. R. (1994). Do speakers have access to a mental807

syllabary. Cognition, 50:239–269.808

Lindblom, B. (1963). Spectrographic study of vowel reduction. The Journal of the809

Acoustical Society of America, 35(11):1773–1781.810

Marian, V., Bartolotti, J., Chabal, S., Shook, A., and White, S. A. (2012). Clearpond:811

Cross-linguistic easy-access resource for phonological and orthographic neighborhood812

densities. PLoS ONE, 7(8):e43230.813

Myers-Scotton, C. and Jake, J. (2000). Four types of morpheme: evidence from aphasia,814

code switching, and second-language acquisition. Linguistics, 38(6; ISSU 370):1053–815

1100.816

Nooteboom, S. G. (1972). Production and perception of vowel duration: A study of the817

durational properties of vowels in Dutch. University of Utrecht, Utrecht.818

Oh, G. E. and Redford, M. A. (2012). The production and phonetic representation of819

fake geminates in english. Journal of Phonetics, 40(1):82–91.820

Oller, D. K. (1973). The effect of position in utterance on speech segment duration in821

english. Journal of the Acoustic Society of America, 54:1235–1247.822

Pierrehumbert, J. B. (2001). Exemplar dynamics: Word frequency, lenition and contrast.823

In Bybee, J. and Hopper, P. J., editors, Frequency and the emergence of linguistic824

structure, pages 137–158. Benjamins, Amsterdam.825

Pierrehumbert, J. B. (2002). Word-specific phonetics. In Gussenhoven, C. and Warner,826

N., editors, Laboratory Phonology VII, pages 101–140. Mouton de Gruyter, Berlin.827

28

Pitt, M. A., Dilley, L., Johnson, K., Kiesling, S., Raymond, W., Hume, E., and Fosler-828

Lussier, E. (2007). Buckeye corpus of conversational speech (2nd release). Columbus,829

OH: Department of Psychology, Ohio State University.830

Pluymaekers, M., Ernestus, M., and Baayen, H. (2005a). Articulatory planning is con-831

tinuous and sensitive to informational redundancy. Phonetica, 62:146–159.832

Pluymaekers, M., Ernestus, M., and Baayen, R. H. (2005b). Lexical frequency and833

acoustic reduction in spoken dutch. Journal of the Acoustical Society of America,834

118(4):2561–2569.835

Pluymaekers, M., Ernestus, M., Baayen, R. H., and Booij, G. (2010). Morphological836

effects in fine phonetic detail: The case of dutch -igheid. In Fougeron, C., editor,837

Laboratory phonology 10, Phonology and phonetics, pages 511–531. Mouton de Gruyter,838

Berlin and New York.839

R Development coreteam (2011). R: A language and environment for statistical comput-840

ing.841

Salverda, A. P., Dahan, D., and McQueen, J. M. (2003). The role of prosodic boundaries842

in the resolution of lexical embedding in speech comprehension. Cognition, 90:51–89.843

SAS Institute Inc. (1978). Sas technical report r-101: Tests of hypotheses in fixed-effects844

linear models.845

Scheer, T. (2010). A guide to morphosyntax-phonology interface theories: how extra-846

phonological information is treated in phonology since Trubetzkoy’s Grenzsignale. Wal-847

ter de Gruyter.848

Torreira, F. and Ernestus, M. (2009). Probabilistic effects on french [t] duration. In849

INTERSPEECH, pages 448–451.850

Torsten Hothorn, Frank Bretz, and Peter Westfall (2008). Simultaneous inference in851

general parametric models. Biometrical Journal, 50(3):346–363.852

Tremblay, A. and Ransijn, J. (2013). Lmerconveniencefunctions.853

Turk, A. E. and Shattuck-Hufnagel, S. (2007). Multiple targets of phrase-final lengthening854

in american english words. Journal of Phonetics, 35(4):445–472.855

Turk, A. E. and White, L. (1999). Structural influences on accentual lengthening in856

english. Journal of Phonetics, 27:171–206.857

Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S-Plus.858

Springer, New York, 4 edition.859

Zeileis, A. (2004). Econometric computing with hc and hac covariance matrix estimators.860

Journal of Statistical Software, 11(10):1–17.861

Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of862

Statistical Software, 16(9):1–16.863

29

Documents

Homophony and morphology: The acoustics of …...Homophony and morphology: The acoustics of word- nal S in English Ingo Plag, Julia Homann & Gero Kunter September 6, 2014 1 Abstract