Journal of Phonetics - University of Kansas

journal homepage: www.elsevier.com/locate/phonetics
Research Article
Structure-dependent tone sandhi in real and nonce disyllables in Shanghai Wu
Jie Zhang*, Yuanliang Meng
Department of Linguistics, The University of Kansas, USA
ARTICLE INFO
Article history: Received 20 August 2014; received in revised form 8 October 2015; accepted 13 October 2015
Keywords: Tone; Tone sandhi; Shanghai Wu; Productivity; Growth curve analysis
DOI: 10.1016/j.wocn.2015.10.004
ABSTRACT
Disyllabic sequences in Shanghai Wu undergo different types of tone sandhi depending on their structure: phonological words (e.g., modifier–nouns) spread the initial tone across the disyllable, while phrases (e.g., non-lexicalized verb–nouns) maintain the final tone and level the contour of the nonfinal tone. We investigated the productivity of the two tone sandhi types through 48 speakers’ productions of real and nonce disyllables. Our results show that (a) the word-level tone sandhi in Shanghai indeed involves tone spreading, while the phrase-level sandhi is better interpreted as phonetic contour reduction, (b) the spreading sandhi generally applies productively to nonce words, but there are some differences in tone production between real and nonce words that are attributable to both categorical non-application and gradient application of the sandhi in nonce words, and (c) the structure dependency of Shanghai tone sandhi is also productive, as the speakers produced qualitatively different f0 patterns in modifier–noun nonce words and verb–noun nonce phrases. These results indicate that in order to arrive at a full picture of tone sandhi patterning, experimental data that shed light on the generalizations that speakers make from the speech input are necessary.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
1.1. Tone and tone sandhi in Shanghai Wu
Shanghai is a Northern Wu dialect of Chinese spoken in a major metropolis in eastern China with a population of 23.5 million (2010 census data, from http://www.stats-sh.gov.cn/). Like other dialects of Chinese, Shanghai Wu is tonal, but two properties of Shanghai differentiate its tone system from the more familiar four-tone system of Mandarin. First, Shanghai has retained the historical checked syllables (syllables closed by a stop, realized in Shanghai as CVʔ) that Mandarin has lost. These syllables have considerably shorter duration than open or sonorant-closed syllables and a reduced tonal inventory: there are three tones on open or sonorant-closed syllables, transcribed by Xu, Tang, and Qian (1981) in Chao numbers (Chao, 1948, 1968) as 53 (T1), 34 (T2), and 13 (T3); but on CVʔ syllables, there are only two phonetic tones, 55 (T4) and 12 (T5).1 Second, Shanghai, like many Wu dialects of Chinese, has maintained the historical voicing/phonation distinction in syllable onsets, and the cooccurrence restriction between voicing/phonation and f0, which led to the yin-yang tone split in many Chinese dialects (Karlgren, 1915–1926; Haudricourt, 1954; Pulleyblank, 1978; Yip, 1990, among many others), is still synchronically relevant for Shanghai: the higher tones 53, 34, and 55 (the historical yin tones in Chinese) only occur after voiceless obstruents and modal sonorants and the lower tones 13 and 12 (the historical yang tones) only occur after voiced obstruents and murmured sonorants.2
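The tonal inventory and the onset–tone cooccurrence restriction described above can be written down as a small lookup. The following Python sketch is only an illustration: the tone values come from the text, but the four onset class labels and the function name `is_legal` are hypothetical names introduced here, and the sketch ignores everything except syllable type and onset class.

```python
# Citation tones in Chao numbers, as given in the text.
# "checked" marks tones that occur on stop-closed (checked) syllables.
TONES = {
    "T1": {"chao": "53", "checked": False, "register": "yin"},
    "T2": {"chao": "34", "checked": False, "register": "yin"},
    "T3": {"chao": "13", "checked": False, "register": "yang"},
    "T4": {"chao": "55", "checked": True, "register": "yin"},
    "T5": {"chao": "12", "checked": True, "register": "yang"},
}

# Assumed labels for the onset classes named in the text:
# yin tones follow voiceless obstruents and modal sonorants,
# yang tones follow voiced obstruents and murmured sonorants.
ONSET_REGISTER = {
    "voiceless_obstruent": "yin",
    "modal_sonorant": "yin",
    "voiced_obstruent": "yang",
    "murmured_sonorant": "yang",
}


def is_legal(tone, onset_class, checked_syllable):
    """Return True if the tone may occur on a syllable of this type and onset."""
    info = TONES[tone]
    return (
        info["checked"] == checked_syllable
        and ONSET_REGISTER[onset_class] == info["register"]
    )
```

For example, `is_legal("T3", "voiceless_obstruent", False)` is false under this sketch, because the low tone 13 is restricted to voiced obstruent and murmured sonorant onsets.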
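* Correspondence to: Department of Linguistics, The University of Kansas, 1541 Lilac Lane, Blake Hall, Room 427, Lawrence, KS 66045-3129, USA. Tel.: +1 785 864 2879; […]85 864 5724. E-mail address: [email protected] (J. Zhang).
1 In Chao numbers, a speaker’s tonal range from low to high is represented by a numerical scale from “1” to “5.” Contour tones are denoted by number concatenations; e.g., “13” indicates a rising tone in the low range (Chao, 1948, 1968). In the tradition of Chinese dialectology, we also use an underline to indicate tones that occur on syllables closed by an obstruent coda, in the case of Shanghai, a glottal stop.
2 Phonetically, the “voiced” stops in Shanghai are not realized with typical closure voicing, but were described as “voiceless with voiced aspiration” by Chao (1967). More recent […] showed that the voiced category has acoustic properties of breathy phonation such as higher H1–H2 (Cao & Maddieson, 1992; Ren, 1992; Chen, 2011; Gao, Hallé, Honda, Maeda, […] 2011) as well as a shorter closure duration than the voiceless category (Shen & Wang, 1995; Wang, 2011). On fricatives, the voicing distinction is truly reflected in voicing. On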
J. Zhang, Y. Meng / Journal of Phonetics 54 (2016) 169–201
Fig. 1 illustrates the five phonetic tones in Shanghai and their cooccurrence with syllable types and onsets. The data came from one female speaker, who read eight monosyllabic morphemes for each tone one time in isolation. The f0 values of the tones were measured using the ProsodyPro script (Xu, 2005–2011) in Praat (Boersma & Weenink, 2009), and the values in Hz were first converted into semitones and then z-score transformed. For more details of the stimuli and data analysis, see Section 2.
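The two-step normalization mentioned above (Hz to semitones, then z-scores) can be sketched as follows. This is a minimal illustration, not the authors' script: the 100 Hz reference, the use of the population standard deviation, and the function names are assumptions made here, since the text does not specify them.

```python
import math
from statistics import mean, pstdev


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert f0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical f0 samples for one speaker, in Hz.
samples_hz = [180.0, 220.0, 200.0, 240.0]
normalized = z_score(hz_to_semitones(samples_hz))
```

Changing the reference frequency only shifts every semitone value by the same constant, so the z-scores do not depend on that choice.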
As in the majority of Chinese dialects, tones in Shanghai participate in tone sandhi depending on the context in which they appear. Comprehensive descriptions of Shanghai tone sandhi in disyllables appeared in Sherard (1972), Zee and Maddieson (1980), Shen (1981), Xu et al. (1981), Xu and Tang (1988), and Zhu (1999, 2006). Two properties of Shanghai tone sandhi are particularly noteworthy. First, the sandhi pattern that occurs in compounds is the so-called “left-dominant sandhi” (Yue-Hashimoto, 1987; Chen, 2000; Zhang, 2007, 2014), which spreads the tone of the initial syllable across the entire word. Examples (1a) and (1b) show that the surface tones of the compounds “to catch a cold” and “popsicle,” 55-31 and 22-44, are derived by spreading the base tones of the initial syllables, 53 and 13, over the disyllables, respectively. This is a notably different pattern from the more familiar third tone sandhi in Mandarin whereby a T3 (213) changes into a T2 (35) before another T3.3 Yue-Hashimoto (1987) and Zhang (2007) termed the Mandarin-type tone sandhi “last-syllable dominant” and “right-dominant,” respectively, and showed from typological data that there is an asymmetry in how the sandhi behaves based on directionality, in that left-dominant sandhi tends to involve the extension of the initial tone rightward, while right-dominant sandhi tends to involve local or paradigmatic tone change. Shanghai and Mandarin, therefore, represent a typical pattern in their respective sandhi directionality.
(footnote continued) sonorants, the modal-murmured distinction, which corresponds to the voiceless-voiced distinction in obstruents, is only reported by a subset of the researchers […] Zhu (1999), who transcribed the sonorant distinction as CC and a qCC, respectively. We use this transcription practice here.
3 Acoustically, the third tone sandhi in Mandarin does not involve complete neutralization (Peng, 2000; Yuan & Chen, 2014; among others). But the small acoustic difference between a sandhi T3 and a base T2 cannot be reliably perceived by native speakers (Peng, 2000).
(1)
a. […] (initial base tone 53) → 55-31 “to catch a cold” (Xu et al., 1981: p. 151)
b. b13 “stick” + pin53 “ice” → 22-44 “popsicle” (Xu et al., 1981: p. 153)
Second, tone sandhi in Shanghai is sensitive to the morphosyntactic structure of the disyllabic sequence. According to Xu et al. (1981) and Xu and Tang (1988), modifier–noun combinations are invariably compounds and can only undergo left-dominant sandhi. Verb–noun, verb-modifier, subject–predicate combinations and coordinate structures that are less lexicalized and have lower frequency of occurrence, however, can undergo right-dominant sandhi, which retains the tone of the final syllable and reduces the tonal contour of the nonfinal syllable. The effects of syntactic structure and frequency of occurrence on Shanghai tone sandhi are illustrated by the examples in (2). In (2a), the same morphemes for “to fry” and “rice”, when concatenated as a modifier–noun compound “fried rice,” undergo left-dominant contour extension, but when concatenated as a verb–noun phrase “to fry rice”, may undergo either left-dominant contour extension or right-dominant contour reduction. In (2b), the verb “to pull” is concatenated with three different nouns – “river”, “grass”, and “tree”, which form an idiomatic expression for “tug-of-war”, a commonly used phrase “to pull out grass; to weed”, and a rarely used phrase “to pull out a tree”, respectively, and the tone sandhi patterns for these three concatenations are left-dominant only, variable left-dominant or right-dominant, and right-dominant only, respectively.
(2) The effects of syntactic structure and frequency in Shanghai tone sandhi:
a. ts34 “to fry” […]
b. b12 […] /b12-z13/ → [b22-z13] (Xu et al., 1981: p. 148)
Table 1
Left-dominant sandhi:
σ2 = CV or CVN
Base tones    Sandhi tones
53-X          55-31
34-X          33-44
13-X          22-44
55-X          33-44
12-X          11-13
σ2 = CVʔ
[…]
The complete patterns of left-dominant and right-dominant sandhis in Shanghai reported in Xu et al. (1981) are summarized in Table 1. Three observations can be made regarding the left-dominant sandhi. First, the tone on the second syllable is entirely determined by the tone on the first syllable and hence completely loses its contrastive status. Second, when the second syllable is open or closed by a nasal, the spreading pattern can be separated into two types depending on the tone on the first syllable: for Tones 1 to 4, the contour on the first syllable is extended across the disyllable, which can be termed contour extension; for Tone 5, however, the contour tone on the first syllable is displaced onto the second syllable, which can be termed contour displacement (see also Zhu, 1999). Third, when the second syllable is CVʔ, only level tones appear on the surface. For right-dominant sandhi, the general pattern is that the first syllable loses the tonal contour while maintaining the overall tone height, and Tones 1 (53) and 2 (34) are neutralized to 44.
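The patterns summarized in Table 1 amount to two lookup tables keyed by the base tone of the first syllable. The sketch below encodes the values reported for Xu et al. (1981) in this section; the function name `apply_sandhi` and the direction labels are hypothetical, and the left-dominant table covers only the case where the second syllable is open or nasal-closed.

```python
# Left-dominant sandhi: the surface tones of both syllables depend only on
# the base tone of the first syllable (second syllable open or nasal-closed).
LEFT_DOMINANT = {
    "53": ("55", "31"),
    "34": ("33", "44"),
    "13": ("22", "44"),
    "55": ("33", "44"),
    "12": ("11", "13"),
}

# Right-dominant sandhi: the first syllable is leveled, the second keeps its tone.
RIGHT_DOMINANT_FIRST = {"53": "44", "34": "44", "13": "33", "55": "44", "12": "22"}


def apply_sandhi(tone1, tone2, direction):
    """Return the surface tone pair for a disyllable with base tones tone1, tone2."""
    if direction == "left":
        return LEFT_DOMINANT[tone1]
    if direction == "right":
        return (RIGHT_DOMINANT_FIRST[tone1], tone2)
    raise ValueError("direction must be 'left' or 'right'")
```

With these tables, `apply_sandhi("12", "13", "right")` gives `("22", "13")`, matching the /12-13/ → [22-13] form in example (2b), and Tones 53 and 34 both surface as 44 in right-dominant position.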
Xu et al. (1981) argued that the left- vs. right-dominant sandhi directionality is determined by whether the disyllable forms a phonological word, and subsequent phonological analyses of Shanghai tone sandhi and prosodic domains (e.g., Selkirk & Shen, 1990; Duanmu, 1995) have adopted this position, but often assumed that phrases simply do not undergo tone sandhi and right-dominant sandhi only represents phonetic reduction of the nonfinal tones.
1.2. Goals of the current study
A goal of the current study is to provide an acoustic investigation of the two unique properties of Shanghai tone sandhi: rightward tone spreading and structure dependency. Descriptively, we aim to provide acoustic details of both left-dominant and right-dominant tone sandhi in Shanghai in order to (a) verify the spreading property of the left-dominant sandhi reported in earlier literature and (b) shed light on the nature of right-dominant sandhi – is it better interpreted as phonological leveling with prespecified, neutralized level targets or phonetic contour reduction? In so doing, the study offers a comprehensive acoustic description of disyllabic tone sandhi in a Chinese language with purported bidirectionality, a task hitherto rarely attempted (but see Takahashi, 2013, reviewed below).
But more importantly, the study aims to go beyond the sandhi patterns observed in existing words and phrases in Shanghai and test the productivity of both rightward spreading and structure dependency in Shanghai using a nonce-probe test (“wug” test; Berko, 1958). The productivity of a linguistic process refers to its ability to apply to new items (Bybee, 2001: pp. 12–13). The understanding of productivity is important to theoretical linguistics as it provides crucial evidence about the generalizations and cognitive abstractions that speakers make and hence directly addresses the issue of grammar in the sense of the tacit knowledge of the speaker (Bybee, 2001; Pierrehumbert, 2003; among many others). In the realm of phonology, productivity is a particularly timely issue as recent experimental research has shown that the speakers’ phonological knowledge as reflected in productivity patterns is often not identical to the lexical patterns of the language in question (e.g., Zuraw, 2007; Berent, Steriade, Lennertz, & Vaknin, 2007; Hayes, Zuraw, Siptár, & Londe, 2009; Becker, Ketrez, & Nevins, 2011; Hayes & White, 2013). One factor that has been shown to affect productivity is the phonetic basis of the phonological process. For instance, Zuraw (2007) showed through a corpus study on loans and a web-based survey on nonce words that Tagalog speakers possessed knowledge of the splittability of word-initial consonant clusters that was informed by perception, but could not be deduced from lexical statistics. Hayes et al. (2009) tested Hungarian speakers’ knowledge of suffixal vowel harmony through a nonce probe test and showed that although the speakers learned both phonetically natural (suffixal vowels correlated with properties of stem vowels) and unnatural patterns (suffixal vowels correlated with properties of the stem-final consonant), the unnatural patterns were undervalued and learned less robustly than the natural ones.
Specific to the rightward spreading tone sandhi in Shanghai, two hypotheses can be made regarding its productivity. First, based on the crosslinguistic prevalence of progressive, assimilatory tonal coarticulation (Mandarin: Xu, 1997; Tianjin: Zhang & Liu, 2011; Taiwanese Southern Min: Peng, 1997; Malaysian Southern Min: Chang & Hsieh, 2012; Vietnamese: Han & Kim, 1974; Brunelle, 2009; Thai: Gandour, Potisuk, & Dechongkit, 1994; Potisuk, Gandour, & Harper, 1997),4 Zhang (2007) argued that left-dominant spreading sandhi is conceivably a phonologized result of such coarticulation. The strong affinity between rightward spreading sandhi and progressive assimilatory coarticulation predicts that the spreading sandhi should be overall productive, and this hypothesis will be tested by the comparison of the spreading sandhi application between real and nonce words. Second, we expect a productivity difference between contour extension and contour displacement in that the latter would be less productive due to its more distant affinity with progressive coarticulation.
Table 1 (continued)
Right-dominant sandhi:
Base tones    Sandhi tones
53-X          44-X
34-X          44-X
13-X          33-X
55-X          44-X
12-X          22-X
Our other goal is to test the hypothesis that the structure sensitivity of the sandhi is productive – a hypothesis rooted in the productivity of morphosyntactic combinations. We test this hypothesis by comparing the tonal realization between two types of disyllabic nonce items – modifier–noun combinations, which form words and are expected to undergo left-dominant sandhi, and verb–noun combinations, which should form phrases due to their nonce nature and hence undergo right-dominant sandhi. If corroborated, this hypothesis will lend direct support to the interface analysis between syntactic structure and prosodic domain in Shanghai (Selkirk & Shen, 1990; Duanmu, 1995). The nonce-probe test used in the comparison may also serve as an additional method that offers empirical evidence for theoretical analyses of prosody–syntax interface in general.
2. Previous literature
2.1. Acoustic studies on Shanghai disyllabic tone sandhi
Despite the relative prestige that Shanghai Wu enjoys as one of the largest dialects of Chinese, there has been relatively little experimental data on the tone sandhi pattern of the dialect. Zee and Maddieson (1980), Toda (1990), Zhu (1999), Chen (2011), and Takahashi (2013) are the precious few exceptions. We restrict ourselves to a review of the disyllabic sandhi pattern – the focus of our study – in these works. Zee and Maddieson (1980) recorded one female speaker and found that the f0 contour of disyllabic compounds was similar in shape to that of the first syllable of the compound. But when the first syllable was a checked syllable with a low rising tone, the rising contour was realized on the second syllable of the disyllable, whose sandhi pattern was analyzed as [L-LM↑], where M↑ indicates a raised Mid. Toda (1990) specifically investigated the tonal realization of disyllables where the first syllable had a high falling tone (T1) as its base tone. Both of the speakers that she recorded showed a high level tone on the first syllable and a mid falling tone on the second syllable. Toda argued that this pattern was difficult to analyze as a simple contour extension from the base tone of the first syllable due to the difference in the time-normalized f0 contour between the disyllable and the first syllable. Zhu (1999) replicated Toda’s result in one of the two speakers that he recorded, but found that the other speaker’s T1+X pattern could indeed be interpreted as contour extension from the first syllable. Zhu further argued that, while T2+X and T3+X involved contour extension, T4+X and T5+X both involved contour displacement. But the T4+X result was difficult to evaluate due to the small f0 excursion on both the monosyllables and disyllables.
Chen’s (2011) primary goal was to investigate how the f0 perturbation from the onset consonant in noninitial position is affected by the phonological consonant-tone cooccurrence restriction, but her results did show that the f0 of the second syllable was primarily determined by the base tone of the first syllable, and the f0 difference associated with the laryngeal feature of the second syllable, which could potentially be linked to a base tone difference, was largely realized in the first 50 ms of the vowel and attributable to f0 perturbation. Takahashi (2013) was the only work we are aware of that investigated both left-dominant and right-dominant sandhi in Shanghai, although his left-dominant investigation focused on three- and four-syllable sequences. His data showed that in left-dominant contexts, the f0 contour of the polysyllabic sequence was indeed determined by the base tone of the first syllable, and younger speakers inserted a default Low tone on the third syllable of the sequence. This echoes Chen’s (2008) earlier finding on polysyllabic tone sandhi in Shanghai. For right-dominant sandhi, Takahashi investigated the f0 pattern on the initial syllable of disyllables under different speech rates and found that, at all speech rates, the contour shape of the initial tone was preserved and the falling T1 and rising T2 did not result in neutralization, thus supporting the position that right-dominant sandhi in Shanghai is gradient phonetic reduction rather than neutralizing phonological changes.
4 Regressive tonal coarticulation is commonly attested as well, but its nature may be either assimilatory or dissimilatory. The dissimilatory effect of a Low tone on a preceding High is particularly notable and has been shown in Mandarin (Shih, 1986; Shen, 1990; Xu, 1997), Thai (Gandour et al., 1994; Potisuk et al., 1997), Taiwanese (Peng, 1997), and Yoruba (Laniran, 1992). The duration and magnitude of progressive tonal coarticulation are typically reported to be greater than those of regressive tonal coarticulation, but the opposite effect has occasionally been found (e.g., Mandarin: Shen, 1990; Yoruba: Laniran, 1992; Vietnamese: Brunelle, 2009). In the modeling of prosody, researchers have treated the directionality of coarticulatory smoothing differently. For instance, Kochanski and Shih’s (2003) soft template model (Stem-ML) assumes bidirectional smoothing; Prom-on et al.’s (2009) quantitative target approximation (qTA) model as well as its predecessor, the parallel encoding and target approximation (PENTA) model (Xu, 2005), is sequential and allows only left-to-right coarticulatory influences.
2.2. Productivity studies on Chinese tone sandhi
Nonce-probe tests, in which speakers are asked to provide responses to novel words in contexts that are facilitative to the application of the phonological process in question, have been widely used to test the productivity of phonological alternations (e.g., Albright, Andrade, & Hayes, 2001; Hayes & Londe, 2006; Zuraw, 2007; Hayes et al., 2009; Becker et al., 2011; Hayes & White, 2013) as well as regular and irregular morphological rules (e.g., Bybee & Pardo, 1981; Albright, 2002; Albright & Hayes, 2003; Pierrehumbert, 2006). Using this method to investigate the productivity of tone sandhi can be traced back to the work of Hsieh (1970, 1975, 1976) on Taiwanese Southern Min. Subsequent works have investigated the productivity of tone sandhi in Mandarin (Zhang & Lai, 2010), Tianjin (Zhang & Liu, in press), Wuxi (Yan & Zhang, in press), as well as Taiwanese (Wang, 1993; Zhang & Lai, 2008; Zhang, Lai & Sailor, 2011). The major finding is that, similar to the works on segmental phonology cited in Section 1.2, the speakers’ phonological knowledge of tone sandhi is also not necessarily identical to the sandhi patterns reflected in lexical statistics. The works on Taiwanese, for example, have shown that when the tone sandhi involves a circular chain shift, the sandhi is not entirely productive in wug tests, indicating that despite the regularity of the sandhi in the language, the speakers have not completely internalized the pattern and likely rely on lexical and allomorph listings for the sandhi.5 The phonetic property of the sandhi has also been shown to have an influence on how it is internalized by speakers. For instance, Zhang and Lai (2010) tested the productivity of both the third tone sandhi (213 → 35/__213) and the half-third sandhi (213 → 21/__T, T≠213) in Mandarin and found that, although both applied categorically to nonce words, the application of the former was phonetically incomplete. They attributed this to the greater phonetic naturalness of the latter.6 Zhang and Liu (in press) replicated this result in the tonal cognates in Tianjin, a dialect closely related to Mandarin. In addition, Yan and Zhang (in press) showed that in Wuxi, a Wu dialect, the productivity of tone sandhi in nonce words is positively correlated with the phonetic similarity between the base tone and the sandhi tone – another effect of the phonetic nature of the sandhi.
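The two Mandarin rules cited above from Zhang and Lai (2010) can be stated as a single categorical mapping on a disyllable. The sketch below is illustrative only: the function name is hypothetical, tones are given as Chao-number strings, and the code models the categorical rules alone, not the phonetic incompleteness that the study reports for nonce words.

```python
def mandarin_t3_sandhi(tone1, tone2):
    """Apply the third tone and half-third sandhi rules to a disyllable.

    Third tone sandhi:  213 -> 35 before 213
    Half-third sandhi:  213 -> 21 before any tone other than 213
    """
    if tone1 == "213":
        if tone2 == "213":
            return ("35", tone2)
        return ("21", tone2)
    # Other first-syllable tones are left unchanged by these two rules.
    return (tone1, tone2)
```

For instance, `mandarin_t3_sandhi("213", "213")` yields `("35", "213")`, while `mandarin_t3_sandhi("213", "55")` yields `("21", "55")`.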
These previous works indicate that our understanding of tone sandhi can benefit considerably from productivity studies that shed more direct light on the speakers’ tacit knowledge of the sandhi patterns. The results of these productivity studies will then provide a firmer foundation from which formal analyses of tone sandhi can proceed.
The productivity studies so far, however, are limited in two respects. First, they have primarily focused on right-dominant sandhi, and we know little about the productivity of the left-dominant spreading pattern common in Northern Wu dialects like Shanghai. This is especially interesting as the rightward spreading pattern is the most closely related to progressive tonal coarticulation. If a strong phonetic basis of the sandhi facilitates its productivity, we would expect the rightward spreading pattern to be relatively productive. Second, previous studies have not investigated the structure sensitivity of tone sandhi. The research on Shanghai that we report here fills these two gaps and complements our current knowledge of tone sandhi productivity.
3. Methodology
The basic methodology of our study was to elicit disyllabic utterances from native speakers of Shanghai by presenting them with two separate monosyllables in their base tones and asking them to pronounce the syllables together as a real word or phrase in Shanghai. The tonal realization of the two syllables was then measured to quantify the application of the tone sandhi. The experiment was divided into two parts, one dealing with existing disyllables and the other with nonce disyllables. Each part tested both words and phrases, which were expected to undergo left- and right-dominant sandhi, respectively. We first describe our participants and the stimulus construction. The set-up and procedure for the two parts of the experiment are then discussed, followed by how we analyzed the f0 data and the statistical method that we used for f0 curve comparisons.
3.1. Participants
There is considerable dialect-internal variation within Shanghai, and due to close contact with other Wu dialects such as Suzhou and Ningbo as well as the dominant influence of Standard Chinese, Shanghai has undergone and is still undergoing fast changes, especially in its phonetics and lexicon.7 We focused on the variety of Shanghai spoken in the urban area by younger speakers in this study. Our experiment was conducted in the Phonetics Laboratory of the Department of Chinese Language and Literature at Fudan University, Shanghai. Forty-eight speakers (28 females) who grew up in one of the ten urban districts of Shanghai and self-identified as native, fluent speakers of Shanghai Wu participated in the experiment. The majority of the participants were undergraduates at Fudan University, and the participants’ mean age at the time of the experiment was 24.6.
5 There is a range of tone sandhi application rates that has been reported for Taiwanese wug tests, and the rate seems to (a) be task-dependent and (b) increase with continued exposure to the nonce items (e.g., Hsieh, 1975; Wang, 1993). Chuang, Chang, and Hsieh (2011) argued that the “foreignness” of the nonce items contributed to the unproductivity results in earlier wug tests and showed that when speakers were asked to undo tone sandhi in existing disyllabic monomorphemic words and Japanese loanwords, the sandhi productivity was considerably higher. They went on to argue that the method of wug tests in the study of productivity needed to be reevaluated. While we agree that the exact application rate of a phonological process in a wug test cannot be directly taken as the productivity of the process, the comparison of wug test results on tone sandhi patterns in different dialects under the same method still informs us that speakers internalize different types of sandhi differently. For instance, Taiwanese tone sandhi induces categorical non-application in nonce words, while Mandarin tone sandhi does not. Moreover, the incorporation of listed lexical items or allomorphs does not preclude the possibility of practice/learning effect as the nonce word becomes more familiar. Zuraw (2000, 2003), for instance, has proposed a model that allows the application rate of semi-productive phonological processes to increase in loanwords as they gradually become incorporated in the lexicon.
6 Zhang & Lai (2010) discussed a number of alternative interpretations for the result, including the low frequency of third tone sandhi cases, treating the low falling tone as the base tone for T3, and the syntactic dependency of the third tone sandhi. Without taking the discussion too far afield, we refer the reader to their article for a comparison of the interpretations.
7 For information about the diachronic changes, dialectal variation, and sociolinguistic situation of Shanghai, see Xu and Tang (1988), Qian (1997), and Zhu (1999, 2006).
3.2. Stimulus construction
3.2.1. Real disyllables
The first part of the experiment investigated the nature of left-dominant and right-dominant tone sandhi in real disyllabic sequences in Shanghai. To this end, we aimed to select 100 words and phrases, four for each of the 25 base-tone combinations. Among the four, two should undergo left-dominant sandhi, and two should undergo right-dominant sandhi. We also wanted to ensure that the directionality difference was not simply a function of usage frequency difference, but structure-related. We therefore aimed to match the overall frequency between the left-dominant and right-dominant items.
Given that there is no existing frequency corpus of Shanghai, we first designed and implemented an online subjective frequency rating pretest that estimated the usage frequency of 400 disyllabic words and phrases in Shanghai (Balota, Pilotti, & Cortese, 2001). Of the 400 items, 200 undergo left-dominant sandhi and 200 undergo right-dominant sandhi according to Xu et al. (1981) and Xu and Tao (1997). The pronunciation of these 400 items was further checked with two native speaker consultants (one female). Within the 200 in each directionality, each tonal combination was represented by eight items. Our female consultant recorded the entire list and the recording was used for the frequency pretest. The test was divided into four sessions, each with 100 words, and the four sessions were matched in numbers for tones and sandhi types. The test was implemented online in LimeSurvey hosted by the Ermal Garinger Academic Resource Center at the University of Kansas.
The test was advertised through Chinese dialect websites, the Linguist List, social media websites, and word of mouth. In the end, the numbers of complete responses for each session were 33, 30, 30, and 33, respectively. Some of our subjects participated in all sessions, some only in a subset of them.
During the test, participants were given the Chinese characters and the acoustic recording of an item and asked to respond whether the item was “very commonly used,” “commonly used,” “neither common nor rare,” “rarely used,” or “very rarely used” (given in Chinese characters). The subjects’ ratings were converted into a 1–5 scale, where 1 represents a “very rarely used” response and 5 a “very commonly used” response, and the ratings for each item were averaged across subjects. From the 400 items, 100 (50 left-dominant, 50 right-dominant, 4 for each tonal combination) were eventually selected so that the left- and right-dominant items had the same rating distribution (Kolmogorov–Smirnov test: D = 0.14, p = 0.7166) and the same rating mean (Wilcoxon test: W = 1341.5, p = 0.5304; the Wilcoxon test was used due to non-normal distributions of the samples). The syntactic structures of the left-dominant sandhi items were primarily modifier–noun, but also included modifier–verb, coordination, and lexicalized compounds and proper names. The syntactic structures of the right-dominant sandhi items were primarily verb–noun, but also included verb–adverb and subject–predicate. The segmental contents of the syllables were not actively controlled. The complete stimulus list is given in Appendix A.
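The rating conversion and the distribution comparison described above can be sketched in plain Python. This is a hedged illustration, not the authors' analysis: the text fixes only the endpoints of the scale (1 and 5), so the values 2 to 4 for the middle responses are an assumption, the English labels stand in for the Chinese response options, and the function computes only the two-sample Kolmogorov–Smirnov statistic D, not its p-value.

```python
from statistics import mean

# Assumed mapping: endpoints are from the text, intermediate values are inferred.
RATING_SCALE = {
    "very rarely used": 1,
    "rarely used": 2,
    "neither common nor rare": 3,
    "commonly used": 4,
    "very commonly used": 5,
}


def item_mean_rating(responses):
    """Average one item's ratings across subjects on the 1-5 scale."""
    return mean(RATING_SCALE[r] for r in responses)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    def ecdf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in points)
```

In use, `sample_a` and `sample_b` would hold the per-item mean ratings of the left-dominant and right-dominant candidates; a small D indicates closely matched rating distributions.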
Given that one of the main goals of the study was to investigate the productivity of tone sandhi by comparing the sandhi application in real and nonce words, we elicited the sandhi patterns in real words by providing the speakers with the base tones of individual syllables, as the base tones of the nonce syllables must be given to the subjects (see below). Otherwise, the real words would be read with no auditory priming of the base tones, while the nonce words would. The individual syllables were read in their base tones in isolation by our female consultant and recorded in an anechoic chamber at the University of Kansas. The acoustic files of the individual syllables were then used in the first part of the experiment.
To alleviate fatigue of our subjects, who participated in both parts of the experiment, we divided the stimulus list into two, each including one item for left-dominant sandhi and one item for right-dominant sandhi for each of the tonal combinations. One list was used for half of the subjects, and the other list was used for the other half.
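The list division described above can be sketched as follows. The item labels are placeholders, not the actual stimuli in Appendix A, and the alternating assignment of participants to lists is an assumption; the text states only that each list was used for half of the subjects.

```python
from itertools import product

TONES = ["T1", "T2", "T3", "T4", "T5"]
DIRECTIONS = ["left", "right"]


def placeholder_items():
    """Two hypothetical item labels for each (tone1, tone2, direction) cell."""
    return {
        (t1, t2, d): [f"{t1}+{t2}/{d}/item1", f"{t1}+{t2}/{d}/item2"]
        for t1, t2 in product(TONES, TONES)
        for d in DIRECTIONS
    }


def split_lists(items):
    """Put one item per cell on each list, so both lists cover every cell."""
    list_a, list_b = [], []
    for cell in sorted(items):
        first, second = items[cell]
        list_a.append(first)
        list_b.append(second)
    return list_a, list_b


def assign_list(participant_index):
    """Alternate participants between the two lists (assumed scheme)."""
    return "A" if participant_index % 2 == 0 else "B"
```

Each resulting list holds 50 items: one left-dominant and one right-dominant item for each of the 25 tonal combinations.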
3.2.2. Nonce disyllables
The second part of the experiment involved the subjects’ production of disyllabic nonce sequences, which were formed by combining a syllable accidentally missing from the Shanghai syllabary (legal segmentals and legal tone, whose combination does not violate voicing-tone cooccurrence restrictions, but happens to be missing in the syllabary) as the first syllable (σ1) and an existing syllable as the second syllable (σ2). The nonce σ1 was provided a meaning as either a nominal modifier or a verb to elicit left-dominant and right-dominant sandhi, respectively, and σ2 was always an existing noun. Ten nonce syllables, two in each tone, were used in σ1 position, and each syllable was associated with two meanings – a modifier meaning and a verb meaning. Each speaker, however, only heard one meaning for each syllable. These nonce syllables were arrived at by first consulting the complete Shanghai syllabary in Zhu (2006, pp. 22–23); the missing segmentals and tone combinations from the syllabary were then checked with both of our consultants for acceptability. Given the voicing-tone cooccurrence restrictions, which limited the number of logical combinations, there were relatively few items to choose from, and we were not able to match the segmental properties of these nonce syllables (e.g., consonant aspiration, vowel height) with those of σ1s in the real disyllables. Ten monosyllabic nouns, two in each tone, were used in σ2 position, and each speaker used one noun to combine with a modifier nonce σ1 and the other to combine with a verb nonce σ1. The two sets of nonce syllables in σ1 and their meanings are given in Table 2, and the two sets of existing nouns used in σ2 are given in Table 3.
For example, two nonce syllables with Tone 1 (53) were used in σ1 position: m~ 53 and mu53; half of the speakers would hear m~ 53 used as a modifier meaning “a special color” and mu53 used as a verb meaning “to shop online,” and the other half of the speakers would hear m~ 53 as the verb and mu53 as the modifier. Each nonce syllable in σ1 was combined with five monosyllabic nouns, one in each tone, in σ2. For example, m~53 was combined with s53 “book”, s34 “umbrella”, di13 “musical instrument”, pi55 “pen”, di12 “flute”, and mu53 was combined with ho53 “flower”, ts34 “grass”, zo13 “tea”, ty55 “chrysanthemum”, m12 “sock”.
Table 3
The two sets of nouns that were used to create modifier–noun and verb–noun combinations.

Tone   Set 1                        Set 2
T1     s53 “book”                   ho53 “flower”
T2     s34 “umbrella”               ts34 “grass”
T3     di13 “musical instrument”    zo13 “tea”
T4     pi55 “pen”                   ty55 “chrysanthemum”
T5     di12 “flute”                 m12 “sock”
Table 2
The two sets of nonce syllables, cued as modifiers or verbs, for half of the speakers. For the other half, the meaning columns for the nonce syllables were switched.

Tone   Cued as modifier                 Cued as verb
T1     m53 “a special color”            mu53 “to shop online”
T2     p34 “a city name”                to34 “to sell in a special way”
T3     b13 “a man-made material”        n13 “to transport via a spaceship”
T4     me55 “a smell”                   ne55 “to smuggle in a special way”
T5     ue12 “a shape”                   y12 “to give as a gift in a special way”
Therefore, half of the speakers described “book”, “umbrella”, “musical instrument”, “pen”, and “flute” in a special color while shopping online for “flower”, “grass”, “tea”, “chrysanthemum”, and “sock”, while the other half did the opposite, but the segmental contents pronounced by the two groups of speakers were identical. In the end, each speaker produced 25 modifier–noun and 25 verb–noun sequences. The entire stimulus list for this part of the experiment can be found in Appendix B. The cue sentences that provided the meanings for the nonce syllables as well as prompts for the subjects’ response were again recorded by our female consultant.
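The counterbalancing described above can be illustrated with a minimal Python sketch. It is not the script used to build the stimuli; the syllable and noun labels are hypothetical placeholders (two nonce syllables "A" and "B" and two noun sets per tone), and the only assumptions taken from the text are that each nonce syllable always combines with the same noun set and that the two speaker groups differ only in the cued meaning.

```python
from itertools import product

TONES = ["T1", "T2", "T3", "T4", "T5"]

# Hypothetical placeholder labels (not the actual stimuli): two nonce first
# syllables per tone and two sets of existing nouns per tone.
NONCE = {tone: (f"{tone}_nonceA", f"{tone}_nonceB") for tone in TONES}
NOUNS = {tone: (f"{tone}_noun1", f"{tone}_noun2") for tone in TONES}


def build_list(group):
    """Return (structure, sigma1, sigma2) triples for speaker group 0 or 1."""
    if group == 0:
        role_a, role_b = "modifier", "verb"
    else:
        role_a, role_b = "verb", "modifier"
    items = []
    for tone1, tone2 in product(TONES, TONES):
        syl_a, syl_b = NONCE[tone1]
        noun1, noun2 = NOUNS[tone2]
        # Syllable A always pairs with noun set 1 and syllable B with noun
        # set 2, so the segmental content is identical across groups; only
        # the cued meaning (modifier vs. verb) is switched.
        items.append((role_a, syl_a, noun1))
        items.append((role_b, syl_b, noun2))
    return items


group0 = build_list(0)
group1 = build_list(1)
```

Under these assumptions each group receives 25 modifier–noun and 25 verb–noun sequences, and the set of σ1+σ2 pairs is the same for both groups.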
3.3. Experimental procedure
Each participant first filled out a language background questionnaire and signed an informed consent form, then participated in the experiment. All participants did the real disyllable portion of the experiment first, then the nonce disyllable portion after a five-minute break. They were paid a nominal fee upon the completion of the experiment.
Both parts of the experiment were implemented in Paradigms (Tagliaferri, 2010). For the real disyllable portion, the subjects were given the two syllables in their base tones auditorily, separated by an 800 ms pause; the Chinese characters associated with the syllables also appeared on a computer screen as the syllables played. The subjects were then prompted to pronounce the words out loud in a clear and natural way. The stimuli were randomized for each speaker. The main experiment was preceded by an instruction read in Shanghai and a practice session that included four disyllabic items – two left-dominant and two right-dominant – that did not appear in the main experiment.
For the nonce disyllable portion, the subjects were given the meanings of the nonce syllables both auditorily and in written form. The nonce syllables were pronounced with their base tones twice during the verbal prompt and represented orthographically with a box in lieu of a Chinese character on the computer screen. For instance, the subjects would both hear and see “m~ 53; m~ 53, ___” (“If to shop online is called m~ 53; if a book has not been m~ 53-ed, then we can say that we have not ___”), with the nonce syllable “m~ 53” represented as a box on the screen. The subject was expected to reply with /m~ 53/ (“m~ 53-ed the book”) with right-dominant sandhi. For each nonce syllable, the five monosyllabic nouns that it was combined with appeared together in one block; i.e., once the speakers were given the meaning of m~ 53, they were asked to combine it with five different nouns one after another. Different nonce words appeared in random order for each speaker. The main experiment was also preceded by an instruction and a practice session. The practice session used two nonce syllables that were not used in the experiment, one cued as a modifier and one cued as a verb, and the subjects were asked to combine each nonce syllable with two different monosyllabic nouns. The subjects were encouraged to ask questions during and after the practice if they had trouble understanding the task. Some did, but all were judged to have comprehended the task before they moved on to the experiment.
For both portions of the experiment, the subjects’ responses were continuously recorded using a Marantz solid state recorder PMD 671 sampling at 22.05 kHz and an EV N/D 767a microphone.
3.4. Data analysis
All acoustic analyses of the data were conducted in Praat (Boersma & Weenink, 2009). The rimes of the syllables in the target stimuli were first identified and annotated in a text grid. We then took an f0 measurement at every 10% of the rime duration for each target syllable using ProsodyPro (Xu, 2005–2011), giving eleven f0 measurements for each syllable. ProsodyPro uses the automatic vocal pulse marking by Praat as well as a trimming algorithm that removes spikes and sharp edges (see Appendix 1 of Xu, 1999 for additional information on the trimming algorithm). The Maxf0 and Minf0 parameters in the script as well as the octave-jump cost were adjusted for each speaker, and the f0 measurements were hand-checked against narrow-band spectrograms in Praat to correct for octave and other errors in the measurements provided by the script.
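To make the time normalization concrete, the following Python sketch samples an f0 track at eleven equally spaced points (0%, 10%, ..., 100%) of a rime. It only illustrates the sampling step under the assumption of simple linear interpolation; it does not reproduce ProsodyPro's pulse marking or trimming, and the function names are hypothetical.

```python
def interpolate(times, f0, t):
    """Linearly interpolate an f0 track (times strictly increasing) at time t."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            return f0[i] + (f0[i + 1] - f0[i]) * (t - t0) / (t1 - t0)
    raise ValueError("time point lies outside the f0 track")


def sample_f0(times, f0, start, end, n_points=11):
    """Sample f0 at n_points equally spaced time points from rime start to end."""
    samples = []
    for k in range(n_points):
        frac = k / (n_points - 1)
        # Convex combination so the first and last points hit start and end.
        t = start * (1 - frac) + end * frac
        samples.append(interpolate(times, f0, t))
    return samples


# Hypothetical track: times in ms, f0 in Hz, rime annotated from 10 to 30 ms.
track_times = [0, 10, 20, 30, 40]
track_f0 = [100.0, 110.0, 120.0, 130.0, 140.0]
normalized = sample_f0(track_times, track_f0, start=10, end=30)
```

Because every syllable yields the same number of samples regardless of its duration, f0 curves from different tokens can be averaged and modeled point by point.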
The f0 measurements in Hz were converted to semitones relative to 50 Hz using the formula in (3a) to better reflect pitch perception (’t Hart, Collier, & Cohen, 1990; Rietveld & Chen, 2006). The semitone values were then z-score transformed using the formula in (3b) over all measurements from a speaker to normalize for between-speaker variations, especially male and female differences (Rose, 1987; Zhu, 2004).
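(3)
a. $\mathrm{ST} = 12 \log_{2}\left(\dfrac{f_0}{50}\right)$, with $f_0$ in Hz
b. $z = \dfrac{\mathrm{ST} - \overline{\mathrm{ST}}}{s_{\mathrm{ST}}}$, where $\overline{\mathrm{ST}}$ and $s_{\mathrm{ST}}$ are the mean and standard deviation of all semitone values $\mathrm{ST}_i$ from the speaker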
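The two-step normalization in (3) can be sketched in Python as follows. This is an illustration rather than the authors' script: the function names are hypothetical, the semitone conversion uses the standard definition of 12 semitones per octave relative to 50 Hz, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import math
import statistics


def hz_to_semitone(f0_hz, ref_hz=50.0):
    """Convert f0 in Hz to semitones relative to ref_hz (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def zscore_by_speaker(f0_hz_values):
    """Z-score all semitone values from one speaker (sample SD is an assumption)."""
    semitones = [hz_to_semitone(v) for v in f0_hz_values]
    mean = statistics.mean(semitones)
    sd = statistics.stdev(semitones)
    return [(st - mean) / sd for st in semitones]


# Hypothetical f0 values (Hz) from a single speaker.
z_values = zscore_by_speaker([100.0, 120.0, 150.0, 200.0])
```

After this transformation each speaker's values are centered on zero with unit spread, so speakers with different pitch ranges can be pooled.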
In addition to the data from the 48 experimental participants, the individual syllables recorded from our language consultant for the first part of the experiment were also analyzed, and these formed the basis for the data on the tonal inventory of Shanghai (see Fig. 1).
To compare the rightward spreading sandhi application between real and nonce words and the application of sandhi between modifier–noun words and verb–noun phrases in nonce items, we used growth curve analysis (Mirman, 2014) to model the f0 curves of the two syllables in the subjects’ responses. This analysis describes the functional form of the probability distribution of f0 over time by identifying model fit components for an f0 curve that captures this probability distribution. To capture the changes in f0 direction within a syllable while avoiding overfitting the segmental effect, we used quadratic orthogonal polynomials to model all f0 curves over a syllable. The time terms for orthogonal polynomials are uncorrelated; hence, their parameter estimates are independent of each other. The intercept term indicates the average height of the curve; the linear term indicates the overall slope of the curve; and the quadratic term indicates the sharpness of the centered peak of the curve. Detailed methods of the f0 comparisons are given together with the results to facilitate the interpretation of the results.
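The role of the three terms can be illustrated with a small Python sketch that builds orthogonal linear and quadratic time terms over eleven time points and projects a single f0 curve onto them. This is a fixed-effects-only illustration under stated assumptions: the function names are hypothetical, the terms are built by Gram-Schmidt orthogonalization, and their scaling and sign may differ from those produced by R; the mixed-effects structure of the actual models is not reproduced.

```python
def orthogonal_time_terms(n_points=11, degree=2):
    """Return orthonormal polynomial time terms [linear, quadratic, ...]."""
    xs = [float(i) for i in range(n_points)]
    basis = []
    for d in range(degree + 1):
        vec = [x ** d for x in xs]
        # Remove the components along all lower-order terms (Gram-Schmidt).
        for b in basis:
            proj = sum(v * w for v, w in zip(vec, b))
            vec = [v - proj * w for v, w in zip(vec, b)]
        norm = sum(v * v for v in vec) ** 0.5
        basis.append([v / norm for v in vec])
    return basis[1:]  # drop the constant term


def fit_curve(f0, time_terms):
    """Least-squares intercept plus one coefficient per orthogonal time term."""
    intercept = sum(f0) / len(f0)  # average height of the curve
    coefs = [sum(y * t for y, t in zip(f0, term)) for term in time_terms]
    return [intercept] + coefs


linear, quadratic = orthogonal_time_terms()
rising = [0.1 * i for i in range(11)]            # hypothetical rising curve
peaked = [-(i - 5.0) ** 2 for i in range(11)]    # hypothetical centered peak
rising_fit = fit_curve(rising, [linear, quadratic])
peaked_fit = fit_curve(peaked, [linear, quadratic])
```

Because the terms are mutually orthogonal, each coefficient can be read on its own: a purely rising curve loads only on the linear term, and a symmetric peak loads only on the quadratic term (negatively, with this sign convention).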
In addition, the participants’ tonal responses to each target stimulus were also classified by a phonetically trained Shanghai native speaker into “Spreading,” “No Sandhi,” and “Other” to further shed light on the productivity and structure dependency of the sandhi pattern. The speaker was a linguistics graduate student who specialized in tone research and felt comfortable performing the task. She was asked to classify a disyllabic response as “Spreading” if its tone pattern was perceptually equivalent to how she would pronounce an existing nominal compound, “No Sandhi” if she believed that the subject pronounced the disyllable in its base tones, and “Other” if the tone pattern did not fall under either of these two categories. She started from the real disyllables, where she reported the classification to be straightforward, then moved on to the nonce disyllables, where she felt that the classification was difficult for around 20% of the tokens. For these tokens, she used a combination of her perception and a pitch track comparison in Praat between the token in question and a real disyllable in the same syntactic structure by the same speaker to make the final decision. Generalized Linear Mixed-Effects models were then used to investigate the effects of word type (real vs. nonce) and structure on the classification.
The rime duration for all stimulus syllables was measured in Praat as well and Linear Mixed-Effects models were used to investigate how duration was affected by word type and structure.
All statistical analyses were carried out in R version 3.1.0 (R Core Team, 2014) using the lme4 package version 1.1-6 (Bates, Maechler, Bolker, & Walker, 2014).
4. Results
We first report in Section 4.1 the f0 result on the application of left-dominant and right-dominant sandhis in real disyllables (first part of the experiment). The goal is primarily descriptive: the f0 data shed light on the nature of the two types of sandhi and address the questions of whether the left-dominant sandhi truly involves the spreading of the initial tone rightward, and whether right-dominant sandhi is better interpreted as phonological leveling or phonetic contour reduction. We then report f0 and sandhi classification comparisons between real and nonce words for left-dominant sandhi (Section 4.2) and between modifier–noun words and verb–noun phrases in nonce items (Section 4.3) to address the productivity and structure dependency of the sandhi system. Relevant rime duration comparisons are given in each section as well.
4.1. Acoustic description of left- and right-dominant tone sandhi in real disyllables
The time-normalized f0 data for real disyllabic words expected to undergo left-dominant spreading sandhi, organized by base tone combinations, are given in Fig. 2, and the right-dominant sandhi undergoing counterparts are given in Fig. 3. f0 curves for the base tones from our female language consultant, averaged over the eight monosyllables used for each tone in the real word experiment, were overlaid onto each graph as thin solid lines for reference. All f0 graphs here and elsewhere were produced with the R package ggplot2 (Wickham, 2009).
We can compare the two sets of f0 graphs in two ways to understand the nature of the difference between left- and right-dominant sandhi. First, if we look across each row, in which all graphs share the same base tone on the first syllable but have different base tones on the second syllable, we can see that in left-dominant sandhi (Fig. 2), the base tone difference on the second syllable is considerably curtailed, and the overall f0 pattern over the disyllable indeed takes the shape of the contour of the first syllable. The f0 on the first syllable is little affected by the different base tones on the second syllable and is generally realized as a slightly falling tone. The spreading pattern is particularly clear in T1+X and T5+X: in the former, the falling contour of the initial base tone was spread over the two-syllable domain, and in the latter, the rising contour of the initial base tone was displaced onto the second syllable, leaving a low tone on the first syllable. For T2+X and T3+X, which had a rising tone as a base tone on the first syllable, only the low portion of the rise was realized on the first syllable, and the f0 was higher on the second syllable in all tonal combinations except for T2+T3, indicating a spreading of the first-syllable rise. In right-dominant sandhi (Fig. 3), however, the f0 on the second syllable remains close to the base tone shape. Like in left-dominant sandhi, the f0 on the first syllable is also little affected by the base tone of the second syllable, but instead of a falling tone, it retains the tonal properties of the base tone. These indicate that no sandhi has applied.
Fig. 2. F0 data (vertical lines indicate ±SE) for real disyllabic words expected to undergo left-dominant sandhi. Each graph represents a base-tone combination. Thin solid lines represent the average f0 curves for the base tones from the female language consultant. Each observed data point represents the average f0 at a particular normalized time point across participants.
Second, if we look down each column, in which all graphs share the same base tone on the second syllable but have different base tones on the first syllable, we can see that in left-dominant sandhi, the f0 on the second syllable is strongly affected by the different base tones on the first syllable and realized differently despite the same base tone. In right-dominant sandhi, however, the f0 on the second syllable within the same column remains constant by maintaining the tonal properties of the base tone. The f0 on the first syllable also maintains properties of the base tone in right-dominant sandhi. In particular, the two tones – T1 (53) and T2 (34) – that have been reported to be neutralized to the same level tone 44 in the literature retained their falling and rising contours on the first syllable, respectively.
These comparisons indicate that disyllabic words in Shanghai indeed undergo rightward spreading tone sandhi, but the so-called “right-dominant sandhi” for disyllabic phrases is better interpreted as phonetic contour reduction.
We can also note from Fig. 2 that in left-dominant sandhi, the second syllable preserves many of its base tone properties despite the strong influence of the first syllable tone spread. In T2+X, T3+X, and T4+X, the f0 on the second syllable corresponds to the base tones on the second syllable. The trace of the base tones is noticeable for T1+X and T5+X as well: in T1+X, the second syllable has a rise in T1+T2 that corresponds to the rise in base T2, and the second syllable is higher in T1+T4 than T1+T5, corresponding to the base tone difference between T4 and T5; in T5+X, the higher second syllable in T5+T4 than T5+T5 is also clearly observable. The higher f0 on T1, T2, and T4 than T3 and T5 on the second syllable also corresponds to a voicing difference in the onset of the second syllable: along with earlier results (Cao & Maddieson, 1992; Ren, 1992; Chen, 2011; Wang, 2011), our data showed clear voicing for stop closure and frication for obstruents that cooccurred with T3 and T5. These results, therefore, are likely due to a combination of the imitation effect from the exposure to the base tones (see Goldinger, 1998; Delvaux & Soquet, 2007; Tilsen, 2009; Nielsen, 2011, etc. on imitation effects) and the perturbation effect from the initial consonant of the second syllable.
Fig. 3. F0 data (vertical lines indicate ±SE) for real disyllabic phrases expected to undergo right-dominant sandhi. Each graph represents a base-tone combination. Thin solid lines represent the f0 curves for the base tones from the female language consultant. Each observed data point represents the average f0 at a particular normalized time point across participants.
The classification result of the f0 patterns into “Spreading,” “No Sandhi,” and “Other” for the real items is given in Fig. 4. The vast majority of forms that are expected to undergo left-dominant sandhi indeed underwent the spreading sandhi, and the vast majority of forms that are expected to undergo right-dominant sandhi underwent no sandhi, as judged by our phonetically trained Shanghai speaker. A Generalized Linear-Mixed Effects model on the “Spreading” pattern, with structure as a fixed effect and participant and item as random effects, showed that the M–N structure had significantly higher “Spreading” counts than the V–N structure (Estimate = 9.6711, S.E. = 0.8259, z = 11.709, p < 0.001; M–N as baseline).
The rime duration results for the two syllables in the two sandhi directions are given in a box and whiskers plot in Fig. 5. Given that the segmental content between the left- and right-dominant sandhi items was not matched, we coded the vowel height of the rime according to the lowest vocalic element during the rime as “High,” “Mid,” and “Low” (e.g., [tyø] was coded as “Mid” and [ia] was coded as “Low”), as vowel height is a known factor that affects duration (e.g., House & Fairbanks, 1953; Peterson & Lehiste, 1960; Maddieson, 1997). A likelihood-ratio comparison between a model of rime duration that only included participant and item as random effects and one with vowel height as an additional fixed effect showed that the addition of vowel height significantly improved the model (χ²(2) = 63.646, p < 0.001). So this nuisance factor was included in subsequent models, with structure (left- vs. right-dominant) and syllable (σ1 vs. σ2) as potential factors. Likelihood-ratio tests showed that among these models, the one that included both terms and their interaction provided the best fit with the data. From this model, we found that for the left-dominant structure, there was no rime duration difference between σ1 and σ2 (Estimate = 1.432, S.E. = 1.856, t = 0.772, p = 0.440), and for the right-dominant structure, σ2 had a significantly longer rime duration than σ1 (Estimate = 23.945, S.E. = 1.863, t = 12.853, p < 0.001). These results indicate that the structural difference is correlated with a difference in duration patterning.
Fig. 4. Tone pattern counts for “Spreading,” “No Sandhi,” and “Other” for real items as determined by a phonetically trained Shanghai speaker, organized by base tone combinations. “M–N” (modifier–noun) and “V–N” (verb–noun) represent forms that are expected to undergo left-dominant and right-dominant sandhi, respectively.
Fig. 5. Rime durations for the two syllables for the real items in the two sandhi directions. “S1” = first syllable; “S2” = second syllable. The black dot represents the median, the box represents the interquartile range (1st to 3rd quartile), and the whiskers represent maximally 1.5 times the interquartile range.
4.2. Productivity of the left-dominant sandhi
We focus in this section on the productivity of rightward tone spreading sandhi by comparing the f0 patterns between real and nonce words. The real word data were from the left-dominant-sandhi-undergoing words in the first part of the experiment, while the nonce word data were from the modifier–noun novel compound formations in the second part of the experiment. As stated earlier, the f0 curves were modeled using quadratic orthogonal polynomials. To investigate the effect of word type (real vs. nonce) on the f0 curve for a particular tonal combination, a base model that only included the linear and quadratic time terms and the participant and participant by word type random effects on the time terms was first constructed. Word type was then added onto this model as a factor, and word type’s interactions with the time terms (time, time²) were subsequently added step-wise. Their effects on model fit were evaluated using log-likelihood model comparison. The model fit comparisons for all 50 growth curve analyses (two syllables × 25 base-tone combinations) as well as the R code that generated the models are given in Appendix C.
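The logic of a single step in such a comparison can be sketched in Python: twice the difference in log-likelihood between two nested models is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below is an illustration only; the function names and the log-likelihood values in the usage example are hypothetical, and the chi-square tail probability is computed with the standard closed-form series for integer degrees of freedom rather than with any statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return tail + 2 * pdf * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (chi-square statistic, p-value) for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods for a base model and a model adding word type.
stat, p_value = likelihood_ratio_test(-105.0, -100.0, df_diff=1)
```

With one added parameter per step, each comparison has one degree of freedom, which is why the statistics in this section are reported as χ²(1).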
The observed f0 data for these two word types for each of the base-tone combinations together with the second-order orthogonal polynomial growth curve models for each of the syllables are given in Fig. 6. Although model comparisons indicate that the full model is not always justified in all f0 comparisons, it is in some cases. We therefore graphed the full models for all cases to allow for a consistent visual comparison. F0 curves for the base tones from our language consultant were again overlaid onto each graph to aid the assessment of sandhi productivity: a higher similarity between the sandhi tone and the base tone would indicate a lower productivity.
Fig. 6. Observed data (symbols, vertical lines indicate ±SE) and second-order orthogonal polynomial growth curve model fits (lines) for f0 on disyllabic words expected to undergo left-dominant sandhi. Each graph represents a base-tone combination. Filled circle and thick solid line represent real words; filled triangle and dotted line represent nonce words; f0 curves for the base tones from our language consultant are overlaid onto each graph as thin solid lines. Each observed data point represents the average f0 at a particular normalized time point across participants.
A visual inspection of the general shapes of the f0 curves in Fig. 6 indicates that the spreading sandhi has generally applied to both real and nonce words: the disyllable has an overall falling contour when the first syllable has a falling tone (T1) and an overall rising contour when the first syllable has a rising tone (T2, T3, T5). This indicates that the spreading sandhi is generally productive, supporting the hypothesis that a close affinity with a phonetic process, here progressive tonal coarticulation, facilitates a sandhi’s productivity. This is also supported by the observation that the f0 curves for the two syllables in both real and nonce words are quite different from those of the base tones overlaid onto the graphs in Fig. 6.
For T2+X, however, there was a consistently large yet unexpected difference in average f0 on the first syllable between real and nonce words, and model comparisons showed that the intercept term was significant for all T2+X combinations (χ²(1) > 30, p < 0.001). An analysis of the experimental stimuli recorded by our consultant indicated that she pronounced the nonce T2 syllables with a lower-than-expected f0, almost in the T3 range, and we believe that this was the cause for the unexpected difference.
Despite the general similarity in f0 shape between real and nonce words, model comparisons indicate that the f0 curves from the two types of words are usually significantly different from each other: in 42 out of 50 growth curve analyses, word type has a significant effect on the intercept, linear, or quadratic term of the model; and for all 25 base-tone combinations, the two word types have different f0 curves on at least one of the syllables. There is some evidence that the f0 curves in nonce words show more tonal characteristics of the base tone than those in real words do. This effect would be the most obvious when the expected sandhi tone is a clear falling tone while the base tone is a clear rising tone, or vice versa, a scenario found on the second syllable of T1+T2, T1+T3, and T1+T5 combinations. The corresponding graphs in Fig. 6 show that the nonce words indeed have a greater rising tendency on the second syllable than the real words, and this is supported by model comparisons that showed that the linear terms significantly improved the models (T1+T2: χ²(1) = 5.8341, p = 0.0157; T1+T3: χ²(1) = 30.0598, p < 0.001; T1+T5: χ²(1) = 28.5280, p < 0.001), and parameter estimates for the linear terms, with real words as the baseline, all showed positive values (T1+T2: 0.5741; T1+T3: 1.3487; T1+T5: 1.3237).
Fig. 7. Tone pattern counts for “Spreading,” “No Sandhi,” and “Other” for real and nonce items expected to undergo left-dominant sandhi as determined by a phonetically trained Shanghai speaker, organized by base tone combinations.
Another scenario where this effect could be observed is when the base tone and sandhi tone are expected to differ in f0 height. This is found when (a) the first syllable has a rising (T3) or high tone (T4) and the second syllable has a low-register tone (T3, T5), which would cause the sandhi tone to be higher than the base tone on the second syllable, or (b) the first syllable has a high falling tone (T1) and the second syllable has a high-register tone (T1, T4), which would cause the base tone to be higher than the sandhi tone on the second syllable. The f0 comparison results, however, are inconsistent. For all tonal combinations in (a), model comparisons showed that the intercept term significantly improved the model (T3+T3: χ²(1) = 6.8648, p = 0.0088; T3+T5: χ²(1) = 10.1586, p = 0.0014; T4+T3: χ²(1) = 10.2454, p = 0.0014; T4+T5: χ²(1) = 43.2113, p < 0.001), but the parameter estimates for the intercept, with real words as the baseline, showed negative values only for T4+T3 (−0.6875) and T4+T5 (−0.8529), and positive values for T3+T3 (0.2037) and T3+T5 (0.3661). For the tonal combinations in (b), the intercept term significantly improved the model for T1+T1 (χ²(1) = 7.7434, p = 0.0054), and the parameter estimate was positive (0.6185), but the intercept term was not significant for T1+T4 (χ²(1) = 0.0176, p = 0.8944). These effects can also be seen in the corresponding graphs in Fig. 6.
Another question on productivity we set out to address is whether contour displacement is as productive as contour extension. Contour displacement occurs on T5+X combinations, whereby the rising tone from base T5 is displaced to the second syllable. Given that in the tonal inventory, T1 is the only falling tone, if contour displacement did not apply productively to nonce words, but did apply to real words, we would expect the most marked difference to appear on the second syllable of T5+T1. The corresponding graph in Fig. 6 shows that in real words, the rising tone was indeed displaced onto the second syllable, but in nonce words, the sandhi tone on the second syllable was close to a level tone that was higher than the first syllable. Model comparisons showed that for the f0 on this syllable, adding word type and its interaction with the linear and quadratic time terms stepwise to the base model all significantly improved the previous model (intercept: χ²(1) = 5.6000, p = 0.0180; linear: χ²(1) = 10.5987, p = 0.0011; quadratic: χ²(1) = 6.1609, p = 0.0131). The parameter estimate for the linear term is negative and significant (Estimate = −1.5005, t = −4.2857, p < 0.001), supporting the claim that the real words had more of a rising contour on this syllable than the nonce words. There are two potential interpretations for this difference. One is that, instead of contour displacement, the more general contour extension has applied to the nonce words. The other is that the level f0 is a result of averaging rising tones from contour displacement and falling tones from the lack of sandhi application. A closer look at the sandhi behaviors from individual tokens showed that the latter interpretation is more accurate. In other words, the nature of the lower productivity for contour displacement is primarily non-application. The sandhi classification result and the f0 result from only the tokens classified as “Spreading” below provide further support for this.
The model comparisons for the second syllable of other T5+X combinations, however, showed little effect of word type. Except for the linear term for T5+T2 (χ²(1) = 5.5578, p = 0.0184) and the intercept term for T5+T3 (χ²(1) = 3.9629, p = 0.0465), adding word type or its interactions with time terms did not improve the models. This could mean that contour displacement applied productively to nonce words. But an alternative interpretation is that for T5+T2, T5+T3, and T5+T5, the second syllable had a rising tone as the base tone, and therefore, the application and non-application of contour displacement would both predict a rising tone on the second syllable; for T5+T4, the short duration of the final checked syllable might have prevented the rising contour from surfacing on the syllable in both real and nonce words.

Table 4
Parameter estimates for the fixed effect of word type (with real words as the baseline) on the “Spreading” pattern counts in the real and nonce items expected to undergo left-dominant sandhi in the Generalized Linear-Mixed Effects models for the 25 base-tone combinations. “***”: p<0.001; “**”: p<0.01; “*”: p<0.05. For T2+T4, T3+T3, T3+T4, and T4+T4, 100% of the real items exhibited the spreading pattern; to avoid complete separation in the Generalized Linear-Mixed Effects analysis, an artificial real-item data point that did not undergo spreading was added to the dataset before the analysis was run.

Tones    Estimate   S.E.     z        p        sig.
T1+T1    1.3451     0.5306   2.5350   0.0112   *
T1+T2    9.3582     2.2384   4.1807   <0.001   ***
T1+T3    3.7461     1.1916   3.1438   0.0017   **
T1+T4    3.2494     1.0547   3.0809   0.0021   **
T1+T5    1.6094     0.6080   2.6470   0.0081   **
T2+T1    0.9369     0.4603   2.0353   0.0418   *
T2+T2    2.7515     1.0641   2.5857   0.0097   **
T2+T3    0.7388     0.5857   1.2613   0.2072
T2+T4    2.5773     1.2203   2.1119   0.0347   *
T2+T5    1.5445     0.7428   2.0793   0.0376   *
T3+T1    1.9123     0.8767   2.1812   0.0292   *
T3+T2    1.1421     1.1734   0.9733   0.3304
T3+T3    1.1632     1.1732   0.9914   0.3215
T3+T4    1.4733     1.1374   1.2954   0.1952
T3+T5    0.4274     0.9366   0.4564   0.6481
T4+T1    1.5486     0.7962   1.9450   0.0518
T4+T2    3.4720     0.7794   4.4547   <0.001   ***
T4+T3    2.5233     1.0024   2.5172   0.0118   *
T4+T4    4.2077     1.0519   4.0000   <0.001   ***
T4+T5    3.4720     0.7794   4.4547   <0.001   ***
T5+T1    3.5850     1.3199   2.7162   0.0066   **
T5+T2    16.0643    6.3021   2.5490   0.0108   *
T5+T3    7.1026     3.8792   1.8310   0.0671
T5+T4    15.7873    5.1089   3.0901   0.0020   **
T5+T5    2.3327     0.8968   2.6011   0.0093   **
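The complete-separation issue mentioned in the caption of Table 4 can be illustrated with a simple log-odds calculation in Python. This is a simplified sketch, not the mixed-effects analysis itself: the function names are hypothetical, and the count of 48 real-item tokens is an assumption for illustration (one token per participant for a given tonal combination).

```python
import math


def log_odds(n_spreading, n_not_spreading):
    """Empirical log-odds of the "Spreading" pattern from raw counts."""
    if n_not_spreading == 0:
        return math.inf   # every token spread: the log-odds is unbounded
    if n_spreading == 0:
        return -math.inf
    return math.log(n_spreading / n_not_spreading)


def add_artificial_non_spreading(n_spreading, n_not_spreading):
    """Add one artificial token that did not undergo spreading."""
    return n_spreading, n_not_spreading + 1


# Hypothetical counts: all 48 real-item tokens classified as "Spreading".
before = log_odds(48, 0)
after = log_odds(*add_artificial_non_spreading(48, 0))
```

When one level of a predictor contains only one outcome, the corresponding logistic coefficient has no finite estimate; a single artificial counterexample makes it finite while biasing the comparison slightly against finding a word type effect.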
The classification result of the f0 patterns for the nonce words expected to undergo left-dominant sandhi is given in Fig. 7, and the real words’ result is replicated from Fig. 4 here for comparison purposes. A large proportion of the tone patterns in the nonce words has been classified as undergoing the spreading sandhi, indicating the general productivity of the sandhi pattern. But in general, the real words had more “Spreading” patterns and fewer “No Sandhi” patterns than the nonce words. Likelihood ratio comparisons among Generalized Linear-Mixed Effects models on the “Spreading” pattern showed that the inclusion of word type and tonal combination (T1+T2, T1+T3, etc.) both significantly improved upon the model that only included the random effects of participant and item (word type: χ²(1) = 39.393, p < 0.001; tonal combination: χ²(24) = 60.721, p < 0.001), and the model that includes the interaction term between word type and tonal combination also significantly improved upon the one without the interaction (χ²(24) = 75.046, p < 0.001). We therefore looked at the effect of word type on the “Spreading” pattern for each tonal combination, and these results are summarized in Table 4. For the 25 base-tone combinations, 18 showed a significant effect of word type. Compared with the f0 curve results in which all 25 tonal combinations showed a significant difference based on word type, these results indicate that the lower sandhi productivity in nonce words suggested by some of the f0 curve results (T1+T1, T1+T2, T1+T3, T1+T5, T4+T3, T4+T5) was caused by a combination of categorical non-application of the sandhi in nonce words and phonetically gradient sandhi application.
To further investigate the nature of the productivity difference between real and nonce words, we conducted the same growth curve analyses on only the f0 patterns that have been classified as “Spreading” by our trained Shanghai speaker. The full growth curve models together with the observed f0 data and the base tone data from our consultant are given in Fig. 8, and the model fit comparisons are given in Appendix D. Most of the differences between real and nonce words observed in the entire data set (Fig. 6) persist in Fig. 8. In 41 out of 50 growth curve analyses, word type still has a significant effect on the intercept, linear, or quadratic term of the model; and for all 25 base-tone combinations, the two word types still have different f0 curves on at least one of the syllables. This indicates that the lower productivity indeed partially stems from a gradient application of the sandhi. A particularly interesting comparison appears in the T5+T1 graphs in Fig. 6 and Fig. 8: in Fig. 8, where the f0 track only includes those tokens that have been classified as “Spreading,” the f0 on the second syllable is indeed rising, and the inclusion of the interaction between word type and the linear time term did not significantly improve the model (χ²(1) = 0.5685, p = 0.4508). This provides further support for our earlier claim that for T5+T1, the lower productivity of the contour displacement sandhi is primarily reflected in categorical non-application.
Fig. 8. Observed data (symbols, vertical lines indicate ±SE) and second-order orthogonal polynomial growth curve model fits (lines) for f0 on disyllabic words that have undergone the left-dominant spreading sandhi according to a phonetically-trained Shanghai speaker. Each graph represents a base-tone combination. Filled circle and thick solid line represent real words; filled triangle and dotted line represent nonce words; f0 curves for the base tones from our language consultant are overlaid onto each graph as thin solid lines. Each observed data point represents the average f0 at a particular normalized time point across participants.
A concern with the f0 comparison between real and nonce words above is that for each tonal combination, each participant only produced one real word and one nonce word. The results, therefore, are confounded with the segmental perturbation effects on f0 and may not generalize to different items. We investigated the potential effect of vowel height on tones for the tokens classified as “Spreading” as follows. We again coded each syllable according to the lowest vocalic element during the rime as “High,” “Mid,” and “Low”, and for each tone in each syllable position, we investigated the effect of vowel height on f0 by comparing a base model with only time terms and participant and participant by word type random effects on the time terms with a model that includes vowel height using log-likelihood tests. The results consistently showed that the effect of vowel height was significant, and parameter estimates showed that higher vowels generally had a higher f0 than lower vowels (High>Mid>Low), a finding consistent with earlier literature (e.g., Whalen & Levitt, 1995; Maddieson, 1997). To compensate for the vowel height effect, we calculated the average f0 for the high, mid, and low vowels for each tone in each syllable position at the eleven time points and subtracted these values from the original f0 data according to the vowel height of the item. We then reran the growth curve analyses on these values. The results to a large extent replicated the earlier analyses. Thirty-four of the 41 original f0 comparisons that showed a difference between real and nonce words maintained a difference between the two word types, and the vast majority of the effects that were shown to be significant by model comparison (intercept, linear, quadratic) in the original analysis maintained their significance in the new analysis (47 out of 61). This indicates that the word type effects are largely independent from the vowel height effect. The model fit comparisons for the 50 growth curve analyses based on f0 data corrected for vowel height are given in Appendix E.
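The vowel height correction described above amounts to subtracting a cell mean from each f0 value. The following is a minimal sketch under stated assumptions: the record layout, the field names, and the function `correct_for_vowel_height` are hypothetical, the means are pooled over all tokens in a cell regardless of word type (the text does not specify this), and the numbers are made up for illustration.

```python
from collections import defaultdict


def correct_for_vowel_height(records):
    """Subtract the mean f0 of each (tone, syllable, height, time) cell.

    `records` is a list of dicts with the keys tone, syllable, height,
    time, and f0. A new list is returned; the input is left unchanged.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        key = (r["tone"], r["syllable"], r["height"], r["time"])
        sums[key] += r["f0"]
        counts[key] += 1

    corrected = []
    for r in records:
        key = (r["tone"], r["syllable"], r["height"], r["time"])
        cell_mean = sums[key] / counts[key]
        corrected.append({**r, "f0_corrected": r["f0"] - cell_mean})
    return corrected


# Made-up tokens: two high-vowel tokens and one low-vowel token of T1
# on the first syllable at the first normalized time point.
toy = [
    {"tone": "T1", "syllable": 1, "height": "High", "time": 1, "f0": 220.0},
    {"tone": "T1", "syllable": 1, "height": "High", "time": 1, "f0": 200.0},
    {"tone": "T1", "syllable": 1, "height": "Low", "time": 1, "f0": 180.0},
]
for row in correct_for_vowel_height(toy):
    print(row["height"], row["f0_corrected"])
```

After this step, any remaining f0 difference between real and nonce words cannot be attributed to the average intrinsic f0 of the vowel height category, which is why the growth curve analyses were rerun on the corrected values.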
Another potential confound for the f0 comparison between real and nonce words is that the f0 difference may have arisen from a duration difference if the speakers produced the unfamiliar nonce words more slowly. The rime duration results for the two syllables in all of the real and nonce words are given in Fig. 9. We again included vowel height as a nuisance predictor, and model comparisons showed that adding word type (real vs. nonce) or syllable (σ1 vs. σ2) did not significantly improve the model (word type: χ²(1) = 0.1695, p = 0.6805; syllable: χ²(1) = 1.7184, p = 0.1899), nor did adding the interaction between the two improve the model without the interaction (χ²(1) = 0.4472, p = 0.5037). These results indicate that the duration pattern in the nonce words did not differ reliably from that in the real words, and that the f0 difference based on word type was unlikely to be caused by a duration difference.
4.3. Structure dependency
To test the hypothesis that the structure dependency of tone sandhi is productive, we compared the tonal realizations between disyllabic nonce items that have different morphosyntactic structures. We expected the modifier–noun (M–N) combinations to form words and undergo left-dominant sandhi and the verb–noun (V–N) combinations to form phrases and undergo right-dominant sandhi, which we have interpreted as phonetic contour reduction, not phonological sandhi. The data came from the second part of the experiment. The segmental contents of the M–N and V–N combinations were identical, as the nonce syllables in σ1 position were cued as modifiers for half of the participants and as verbs for the other half.
Fig. 9. Rime durations for the two syllables in real and nonce words expected to undergo left-dominant sandhi. “S1” = first syllable; “s2” = second syllable. The black dot represents the median, the box represents the interquartile range (1st to 3rd quartile), and the whiskers represent maximally 1.5 times the interquartile range.
Fig. 10. Observed data (symbols, vertical lines indicate ±SE) and second-order orthogonal polynomial growth curve model fits (lines) for f0 on disyllabic nonce items with different syntactic structures (M–N = modifier–noun; V–N = verb–noun). Each graph represents a base-tone combination. Filled circle and thick solid line represent M–N words; filled triangle and dotted line represent V–N phrases; f0 curves for the base tones from our language consultant are overlaid onto each graph as thin solid lines. Each observed data point represents the average f0 at a particular normalized time point across participants.
The observed f0 data for the M–N and V–N nonce words for each of the base-tone combinations, along with the second-order orthogonal polynomial growth curve models for the f0 and the f0 curves for the base tones from our language consultant, are given in Fig. 10. We again graphed the full models for all f0 comparisons here for consistency’s sake.
Fig. 11. Tone pattern counts for “Spreading,” “No Sandhi,” and “Other” for nonce items as determined by a phonetically trained Shanghai speaker, organized by base tone combinations. “M–N” (modifier–noun) and “V–N” (verb–noun) represent forms that are expected to undergo left-dominant and right-dominant sandhi, respectively.
Fig. 12. Rime durations for the two syllables for the nonce items in the two sandhi directions. “S1” = first syllable; “s2” = second syllable. The black dot represents the median, the box represents the interquartile range (1st to 3rd quartile), and the whiskers represent maximally 1.5 times the interquartile range.
From Fig. 10, we can see that the f0 curves for the V–N nonce items are consistently more similar to the base tones than those for the M–N nonce items. We have discussed in Section 4.2 that despite some differences from real words, the left-dominant spreading sandhi applied relatively productively in M–N nonce words. For the V–N nonce phrases, however, we generally observed nothing more than the gradient reduction of f0 contours on the first syllable. This is especially clear when the first syllable had a base rising tone (T2+X, T3+X, T5+X). Model comparisons of the f0 curves indicate that in 44 out of the 50 analyses, syntactic structure had a significant effect on the intercept, linear, or quadratic term of the model. When the first syllable has a base rising tone, model comparisons for the first syllable consistently showed that the effect of syntactic structure on the linear term was significant, and parameter estimates, with M–N as the baseline, consistently showed positive values for the linear term, indicating that the first syllables in V–N had greater rising slopes than the first syllables in M–N. Moreover, due to the matched segmental contents between the M–N and V–N combinations, the f0 comparisons here are not confounded with segmental effects, making the result more easily interpretable. The model fit comparisons for all 50 growth curve analyses as well as the R code are given in Appendix F.
The classification result of the f0 patterns for the nonce items with M–N and V–N structures is given in Fig. 11, with the M–N result replicated from Fig. 7. The overwhelming majority of the V–N items has been classified as undergoing no sandhi by our native speaker, and a Generalized Linear-Mixed Effects model on the “Spreading” pattern, with structure as a fixed effect and participant and item as random effects, showed that the V–N structure had a significantly lower “Spreading” count than the M–N structure (Estimate = 5.0304, S.E. = 0.2230, z = 22.559, p < 0.001, M–N as baseline). This result is consistent with the f0 curve result and suggests that the structure-sensitivity of tone sandhi is productive in Shanghai, and speakers are able to let the syntactic structure of a disyllabic sequence dictate the tonal outcome of the two syllables.
The rime duration results for the two syllables in the nonce items with the two structures are given in Fig. 12. We again included vowel height as a nuisance predictor, and likelihood-ratio tests showed that the model that included both the structure (left- vs. right-dominant) and syllable (σ1 vs. σ2) terms as well as their interaction provided the best fit with the data. From this model, we found that for the M–N (left-dominant) structure, the second syllable had a significantly longer duration than the first syllable (Estimate = 3.792, S.E. = 1.919, t = 1.976, p = 0.048), and for the V–N (right-dominant) structure, the second syllable had a significantly longer duration as well (Estimate = 20.165, S.E. = 1.919, t = 10.507, p < 0.001), but the duration difference between the two syllables was significantly greater for the V–N structure (Estimate = 16.374, S.E. = 2.594, t = 6.311, p < 0.001). This durational pattern is similar to that of the real items in Fig. 5 and supports the conclusion that the participants correctly interpreted the grammatical structures for the nonce items.
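The test statistics reported for the duration model follow from dividing each estimate by its standard error, and the interaction estimate corresponds to the difference between the two simple effects. The sketch below checks this arithmetic against the reported values. The helper names are hypothetical, and the two-sided p-value uses a normal approximation, which is an assumption: the text does not state how the p-values for the t statistics were obtained.

```python
import math


def wald_statistic(estimate: float, std_error: float) -> float:
    """Ratio of a parameter estimate to its standard error."""
    return estimate / std_error


def two_sided_normal_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Estimates and standard errors as reported in the text.
mn_effect = (3.792, 1.919)     # sigma2 vs. sigma1 within M-N
vn_effect = (20.165, 1.919)    # sigma2 vs. sigma1 within V-N
interaction = (16.374, 2.594)  # difference between the two effects

for label, (est, se) in [("M-N", mn_effect), ("V-N", vn_effect),
                         ("interaction", interaction)]:
    t = wald_statistic(est, se)
    print(label, round(t, 3), two_sided_normal_p(t))

# The interaction estimate should match the difference of the simple effects
# up to rounding in the reported values.
print(round(vn_effect[0] - mn_effect[0], 3))
```

Small discrepancies in the third decimal place are expected, because the reported estimates and standard errors are themselves rounded.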
5. Discussion
5.1. Productivity and structure dependency of Shanghai tone sandhi
Our descriptive results on the f0 patterns of left-dominant and right-dominant tone sandhi in existing disyllabic words provided some evidence that the left-dominant sandhi indeed involved spreading the f0 contour over the two syllables, while the right-dominant sandhi was better interpreted as phonetic contour reduction on the first syllable. The auditory priming of the base tones during the experiment prevented us from making conclusive claims about the nature of these sandhis, but the confound was necessitated by the more important goal of the study, which was to investigate the productivity of the left-dominant sandhi. Our claim that the left-dominant sandhi was relatively productive came from two sets of data, one on the f0 comparison between real and nonce words, one on the comparison between nonce words (M–N structure) and nonce phrases (V–N structure), the former of which would not have been possible had the real word data not been base-tone primed, as the nonce words were necessarily primed by their base tones in the setting up of the context.
The real vs. nonce comparison provided a direct test of productivity by showing whether the sandhi applied differently in nonce words than in real words. Despite statistical differences in the f0 curves, we have seen that the shapes of the curves over the disyllabic nonce words generally represent the base tone contours of the first syllable, indicating the productivity of the spreading sandhi. Some of the differences between real and nonce words could be interpreted as the nonce words preserving more tonal characteristics from the base tone than the real words, but not all of them could. The differences, we argue, came from two sources. One was a greater number of categorical non-applications of the sandhi, as shown in the classification result. The other lay in the gradient application of sandhi, a type of gradience that is akin to incomplete neutralization in production (e.g., Peng, 2000; Yu, 2007) and the lack of full productivity in T3 sandhi in Standard Chinese (Zhang & Lai, 2010) and Tianjin (Zhang & Liu, in press).
The f0 comparison between M–N nonce words and V–N nonce phrases indicates that the structure sensitivity of the sandhi system in Shanghai is productive, as their differences, both in f0 curves and tone pattern classification, are consistently interpretable by the stronger preservation of the base tones in the V–N structure. The comparison also indirectly supports the productivity of left-dominant sandhi: if the tones in the V–N structure only involved phonetic contour reduction of the base tones, then qualitatively different f0 curves on the M–N nonce words would indicate that phonological sandhi processes have applied to these words.
We have found some evidence that the contour displacement sandhi in T5+X has a different productivity pattern from contour extension, as the lower productivity of contour displacement seems to have primarily stemmed from categorical non-application of the sandhi, especially in T5+T1 where the context allows the effect to be the most clearly observed. This suggests a more substantial degree of underlearning of the sandhi. We conjectured earlier that this is due to the sandhi’s more distant affinity with progressive coarticulation than contour extension. Two additional phonetic properties of the sandhi may have also contributed to its lower productivity. One is that according to Zhu (1999), in T5+X combinations, the phonetic prominence falls on the final syllable due to the pronounced rising f0. This creates a mismatch between phonetic prominence and phonological prominence, which is on the initial syllable – the syntactic non-head that determines the sandhi tones (see Selkirk & Shen, 1990; Duanmu, 1995). The other is that pronounced rising tones are typologically disfavored (Zhang, 2002). Other factors identified in earlier literature that undermine tone sandhi productivity include phonological opacity (Hsieh, 1970, 1975, 1976; Wang, 1993; Zhang & Lai, 2008; Zhang et al., 2011), lexical variation (Zhang & Liu, in press), and low lexical frequency (Zhang & Lai, 2008, 2010; Zhang et al., 2011; Zhang & Liu, in press). Opacity and lexical variation are not relevant here, as the contour displacement pattern itself is transparent, and for T5+X combinations, contour displacement is the only sandhi form that has been reported, while the other four contour extension patterns have more reported variation (Xu et al., 1981; Xu & Tang, 1988; Zhu, 1999, 2006). 
Due to the lack of frequency data, we cannot rule out the possibility that the lower productivity is related to low lexical frequency, but the reported effects of frequency are typically smaller than what we found for T5+T1 combinations (e.g., Zhang & Lai, 2010; Zhang & Liu, in press).
An anonymous reviewer raised the concern that the real and nonce items were elicited under different contexts. In particular, the nonce items that varied in the second syllable were elicited in one block, which may have resulted in a contrastive reading on the second syllable and consequently more base tone readings (see, for example, Chen & Gussenhoven, 2008). But given that our hypothesis was that the spreading sandhi should be productive in the nonce items, putting these items in a context less conducive to tone sandhi stacked the deck against the hypothesis. Therefore, the fact that we found that the sandhi was generally productive in the nonce items provides even stronger support for the hypothesis.
Finally, a shortcoming in the design of our experiment is that it did not allow our results to generalize to different items, as each participant only produced one stimulus for each tonal combination. Item, therefore, could not be included as a random factor in our data analysis. Although our f0 comparisons recalibrated against the vowel height effect showed similar patterns, there are many other potential item effects, and we want to emphasize the importance of item generality for future studies.
5.2. Situating Shanghai tone sandhi in tone sandhi typology
Our data on Shanghai tone sandhi complement our knowledge on tone sandhi productivity in the following respects. It is the first of its kind to investigate the productivity of rightward spreading sandhi – a typologically common tone sandhi pattern. Its close affinity to progressive tonal coarticulation prompted the hypothesis that it should be relatively productive, and our results generally support this hypothesis. This result supports the earlier finding that the phonetic naturalness of the tone sandhi facilitates its productivity. A comparison between the Shanghai and Taiwanese results also shows that opacity is a strong cause for categorical unproductivity, as the transparent sandhis in Shanghai showed more gradient production in nonce words, while the opaque sandhis in Taiwanese showed only categorical application, non-application, or misapplication. An additional difference between the Shanghai and Taiwanese patterns not directly reflected in the results is the difficulty with which the tone patterns could be classified. As stated earlier, our Shanghai speaker tasked with classification found the task difficult in around 20% of nonce items; Zhang et al. (2011), on the other hand, did not report similar difficulties and stated that three phonetically trained transcribers agreed on the sandhi transcriptions for virtually all tokens. It is likely that the presence of gradient sandhi application caused the classification difficulty in Shanghai, but the lack of it made the task easier in Taiwanese. Finally, our results, both in f0 and duration, showed that Shanghai speakers made structure-dependent generalizations regarding tone sandhi; the phonological analysis of prosodic domains and prosodic heads in Shanghai and other Chinese dialects (see Duanmu, 1995, 2007, for example), therefore, does have psychological reality despite the fact that phonetic motivations for the analysis are sometimes hard to come by.
In general, our results echo Zhang’s (2010, 2014) point regarding Chinese tone sandhi that rushing into an analysis of a sandhi pattern before testing it experimentally is premature, as the speakers’ knowledge of the tone sandhi pattern may not be identical to the pattern in the lexicon, and impressionistic transcriptions, no matter how careful, have their limitations. If we situate our findings here in the recent works in experimental phonology that showed that differences between the speakers’ knowledge and the lexical patterns are informative of the nature of phonological grammar (e.g., Wilson, 2006; Zuraw, 2007; Moreton, 2008; Hayes et al., 2009; Becker et al., 2011), we can more clearly see that the study of Chinese tones has much to gain from experimental investigations of productivity, processing, and learning.
6. Conclusion
