Structure-dependent tone sandhi in real and nonce disyllables in
Shanghai Wu
Journal of Phonetics
Research Article
Jie Zhang*, Yuanliang Meng
Department of Linguistics, The University of Kansas, USA
ARTICLE INFO
Article history: Received 20 August 2014; received in revised form 8
October 2015; accepted 13 October 2015
Keywords: Tone; Tone sandhi; Shanghai Wu; Productivity; Growth curve
analysis
doi.org/10.1016/j.wocn.2015.10.004
ABSTRACT
Disyllabic sequences in Shanghai Wu undergo different types of tone
sandhi depending on their structure: phonological words (e.g.,
modifier–nouns) spread the initial tone across the disyllable,
while phrases (e.g., non-lexicalized verb–nouns) maintain the
final tone and level the contour of the nonfinal tone. We
investigated the productivity of the two tone sandhi types through
48 speakers’ productions of real and nonce disyllables. Our results
show that (a) the word-level tone sandhi in Shanghai indeed
involves tone spreading, while the phrase-level sandhi is better
interpreted as phonetic contour reduction, (b) the spreading sandhi
generally applies productively to nonce words, but there are some
differences in tone production between real and nonce words that
are attributable to both categorical non-application and gradient
application of the sandhi in nonce words, and (c) the structure
dependency of Shanghai tone sandhi is also productive, as the
speakers produced qualitatively different f0 patterns in
modifier–noun nonce words and verb–noun nonce phrases. These
results indicate that in order to arrive at a full picture of tone
sandhi patterning, experimental data that shed light on the
generalizations that speakers make from the speech input are
necessary.
© 2015 Elsevier Ltd. All rights reserved.
1. Introduction
1.1. Tone and tone sandhi in Shanghai Wu
Shanghai is a Northern Wu dialect of Chinese spoken in a major
metropolis in eastern China with a population of 23.5 million (2010
census data, from http://www.stats-sh.gov.cn/). Like other dialects
of Chinese, Shanghai Wu is tonal, but two properties of Shanghai
differentiate its tone system from the more familiar four-tone
system of Mandarin. First, Shanghai has retained the historical
checked syllables (syllables closed by a stop, realized in Shanghai
as CVʔ) that Mandarin has lost. These syllables have considerably
shorter duration than open or sonorant-closed syllables and a
reduced tonal inventory: there are three tones on open or
sonorant-closed syllables, transcribed by Xu, Tang, and Qian (1981)
in Chao numbers (Chao, 1948, 1968) as 53 (T1), 34 (T2), and 13 (T3);
but on CVʔ syllables, there are only two phonetic tones 55 (T4) and
12 (T5).1 Second, Shanghai, like many Wu dialects of Chinese, has
maintained the historical voicing/phonation distinction in syllable
onsets, and the cooccurrence restriction between voicing/phonation
and f0, which led to the yin-yang tone split in many Chinese
dialects (Karlgren, 1915–1926; Haudricourt, 1954; Pulleyblank,
1978; Yip, 1990, among many others), is still synchronically
relevant for Shanghai: the higher tones 53, 34, and 55 (the
historical yin tones in Chinese) only occur after voiceless
obstruents and modal sonorants and the lower tones 13 and 12 (the
historical yang tones) only occur after voiced obstruents and
murmured sonorants.2
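
The tone inventory and the onset-tone cooccurrence restriction described above can be written down as a small lookup. The following is only an illustrative Python sketch: the onset class labels, the dictionary layout, and the function name `is_legal` are hypothetical, and onsetless syllables are left out because the text does not discuss them.

```python
# Tone inventory as described in the text (Chao numbers after Xu et al., 1981).
# "register" records the historical yin/yang class; "checked" marks the tones
# that occur only on syllables closed by a glottal stop.
TONES = {
    "T1": {"chao": "53", "register": "yin", "checked": False},
    "T2": {"chao": "34", "register": "yin", "checked": False},
    "T3": {"chao": "13", "register": "yang", "checked": False},
    "T4": {"chao": "55", "register": "yin", "checked": True},
    "T5": {"chao": "12", "register": "yang", "checked": True},
}

# Onset classes named in the text; the string labels are illustrative only.
ONSETS_BY_REGISTER = {
    "yin": {"voiceless_obstruent", "modal_sonorant"},
    "yang": {"voiced_obstruent", "murmured_sonorant"},
}
ALL_ONSETS = ONSETS_BY_REGISTER["yin"] | ONSETS_BY_REGISTER["yang"]


def is_legal(tone, onset, checked):
    """Return True if the tone may occur with this onset class and syllable type."""
    if tone not in TONES:
        raise ValueError(f"unknown tone: {tone}")
    if onset not in ALL_ONSETS:
        raise ValueError(f"unknown onset class: {onset}")
    info = TONES[tone]
    # Checked tones occur only on checked syllables, and vice versa.
    if info["checked"] != checked:
        return False
    # Higher (yin) tones follow voiceless obstruents and modal sonorants;
    # lower (yang) tones follow voiced obstruents and murmured sonorants.
    return onset in ONSETS_BY_REGISTER[info["register"]]
```

Under these assumptions, `is_legal("T3", "voiceless_obstruent", False)` is rejected, which mirrors the statement that the lower tones only occur after voiced obstruents and murmured sonorants.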
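Correspondence to: Department of Linguistics, The University of
Kansas, 1541 Lilac Lane, Blake Hall, Room 427, Lawrence, KS
66045-3129, USA. Tel.: +1 785 864 2879. E-mail address:
[email protected] (J. Zhang).
1 In Chao numbers, a speaker’s tonal range from low to high is
represented by a numerical scale from “1” to “5.” Contour tones are
denoted by number concatenations; e.g., “13” indicates a rising tone
in the low range (Chao, 1948, 1968). In the tradition of Chinese
dialectology, we also use an underline to indicate tones that occur
on syllables closed by an obstruent coda, in the case of Shanghai, a
glottal stop.
2 Phonetically, the “voiced” stops in Shanghai are not realized with
typical closure voicing, but were described as “voiceless with voiced
aspiration” by Chao (1967). More recent studies showed that the
voiced category has acoustic properties of breathy phonation such as
higher H1–H2 (Cao & Maddieson, 1992; Ren, 1992; Chen, 2011; Gao et
al., 2011) as well as a shorter closure duration than the voiceless
category (Shen & Wang, 1995; Wang, 2011). On fricatives, the voicing
distinction is truly reflected in voicing. On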
J. Zhang, Y. Meng / Journal of Phonetics 54 (2016) 169–201
Fig. 1 illustrates the five phonetic tones in Shanghai and their
cooccurrence with syllable types and onsets. The data came from one
female speaker, who read eight monosyllabic morphemes for each tone
one time in isolation. The f0 values of the tones were measured
using the ProsodyPro script (Xu, 2005–2011) in Praat (Boersma &
Weenink, 2009), and the values in Hz were first converted into
semitones and then z-score transformed. For more details of the
stimuli and data analysis, see Section 3.
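
The two-step normalization mentioned above (Hz to semitones, then z-scores) can be sketched as follows. This is a minimal illustration and not the ProsodyPro procedure itself: the 100 Hz reference, the use of the population standard deviation, and the function names are assumptions that the text does not specify.

```python
import math
from statistics import mean, pstdev


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an f0 value in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def zscore(values):
    """Z-score a list of values using the population standard deviation (assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot z-score a constant sequence")
    return [(v - mu) / sd for v in values]


def normalize_f0(f0_hz_values, ref_hz=100.0):
    """Hz -> semitones -> z-scores, e.g. over all f0 samples of one speaker."""
    semitones = [hz_to_semitones(v, ref_hz) for v in f0_hz_values]
    return zscore(semitones)
```

Because changing the reference frequency only adds a constant to every semitone value, the z-scored result does not depend on the choice of `ref_hz`; the reference matters only if the semitone values are reported directly.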
As in the majority of Chinese dialects, tones in Shanghai
participate in tone sandhi depending on the context in which they
appear. Comprehensive descriptions of Shanghai tone sandhi in
disyllables appeared in Sherard (1972), Zee and Maddieson (1980),
Shen (1981), Xu et al. (1981), Xu and Tang (1988), and Zhu (1999,
2006). Two properties of Shanghai tone sandhi are particularly
noteworthy. First, the sandhi pattern that occurs in compounds is
the so-called “left-dominant sandhi” (Yue-Hashimoto, 1987; Chen,
2000; Zhang, 2007, 2014), which spreads the tone of the initial
syllable across the entire word. Examples (1a) and (1b) show that
the surface tones of the compounds “to catch a cold” and
“popsicle,” 55-31 and 22-44, are derived by spreading the base
tones of the initial syllables, 53 and 13, over the disyllables,
respectively. This is a notably different pattern from the more
familiar third tone sandhi in Mandarin whereby a T3 (213) changes
into a T2 (35) before another T3.3 Yue-Hashimoto (1987) and Zhang
(2007) termed the Mandarin-type tone sandhi “last-syllable
dominant” and “right-dominant,” respectively, and showed from
typological data that there is an asymmetry in how the sandhi
behaves based on directionality, in that left-dominant sandhi tends
to involve the extension of the initial tone rightward, while
right-dominant sandhi tends to involve local or paradigmatic tone
change. Shanghai and Mandarin, therefore, represent a typical
pattern in their respective sandhi directionality.
(footnote continued) sonorants, the modal-murmured distinction,
which corresponds to the voiceless-voiced distinction in obstruents,
is only reported by a subset of the researchers, such as Zhu (1999),
who transcribed the sonorant distinction as CC and qCC,
respectively. We use this transcription practice here.
3 Acoustically, the third tone sandhi in Mandarin does not involve
complete neutralization (Peng, 2000; Yuan & Chen, 2014; among
others). But the small acoustic difference between a sandhi T3 and a
base T2 cannot be reliably perceived by native speakers (Peng, 2000).
(1)
a. 53-X → 55-31: “to catch a cold” (Xu et al., 1981: p. 151)
b. b13 “stick” + pin53 “ice” → 22-44: “popsicle” (Xu et al., 1981: p. 153)
Second, tone sandhi in Shanghai is sensitive to the morphosyntactic
structure of the disyllabic sequence. According to Xu et al. (1981)
and Xu and Tang (1988), modifier–noun combinations are invariably
compounds and can only undergo left-dominant sandhi. Verb–noun,
verb–modifier, subject–predicate combinations, and coordinate
structures that are less lexicalized and have lower frequency of
occurrence, however, can undergo right-dominant sandhi, which
retains the tone of the final syllable and reduces the tonal
contour of the nonfinal syllable. The effects of syntactic
structure and frequency of occurrence on Shanghai tone sandhi are
illustrated by the examples in (2). In (2a), the same morphemes for
“to fry” and “rice”, when concatenated as a modifier–noun compound
“fried rice,” undergo left-dominant contour extension, but when
concatenated as a verb–noun phrase “to fry rice”, may undergo
either left-dominant contour extension or right-dominant contour
reduction. In (2b), the verb “to pull” is concatenated with three
different nouns – “river”, “grass”, and “tree”, which form an
idiomatic expression for “tug-of-war”, a commonly used phrase “to
pull out grass; to weed”, and a rarely used phrase “to pull out a
tree”, respectively, and the tone sandhi patterns for these three
concatenations are left-dominant only, variable left-dominant or
right-dominant, and right-dominant only, respectively.
(2) The effects of syntactic structure and frequency in Shanghai tone
sandhi:
a. ts34 “to fry”
b. b12: /b12-z13/ → [b22-z13] (Xu et al., 1981: p. 148)

Table 1
Left-dominant sandhi:
σ2 = CV or CVN: 53-X → 55-31, 34-X → 33-44, 13-X → 22-44,
55-X → 33-44, 12-X → 11-13
σ2 = CVʔ: level tones only
The complete patterns of left-dominant and right-dominant sandhis
in Shanghai reported in Xu et al. (1981) are summarized in Table 1.
Three observations can be made regarding the left-dominant sandhi.
First, the tone on the second syllable is entirely determined by
the tone on the first syllable and hence completely loses its
contrastive status. Second, when the second syllable is open or
closed by a nasal, the spreading pattern can be separated into two
types depending on the tone on the first syllable: for Tones 1 to
4, the contour on the first syllable is extended across the
disyllable, which can be termed contour extension; for Tone 5,
however, the contour tone on the first syllable is displaced onto
the second syllable, which can be termed contour displacement (see
also Zhu, 1999). Third, when the second syllable is CVʔ, only level
tones appear on the surface. For right-dominant sandhi, the general
pattern is that the first syllable loses the tonal contour while
maintaining the overall tone height, and Tones 1 (53) and 2 (34)
are neutralized to 44.
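
The patterns summarized in Table 1 amount to two lookup tables keyed by the base tone of the first syllable. The sketch below is a hypothetical Python illustration restricted to the values given in this section: left-dominant outputs are listed only for an open or nasal-closed second syllable, so checked second syllables are rejected rather than guessed. The names `LEFT_DOMINANT`, `RIGHT_DOMINANT`, and `apply_sandhi` are our own.

```python
# Left-dominant sandhi, second syllable open or nasal-closed
# (Table 1, after Xu et al., 1981). Key: base tone of syllable 1.
LEFT_DOMINANT = {
    "53": "55-31",
    "34": "33-44",
    "13": "22-44",
    "55": "33-44",
    "12": "11-13",
}

# Right-dominant sandhi: sandhi tone of syllable 1; syllable 2 keeps its base tone.
RIGHT_DOMINANT = {
    "53": "44",
    "34": "44",
    "13": "33",
    "55": "44",
    "12": "22",
}

# Tones that occur only on checked (glottal-stop-final) syllables.
CHECKED_TONES = {"55", "12"}


def apply_sandhi(tone1, tone2, direction):
    """Return the reported surface tones of a disyllable as 'xx-yy'."""
    if tone1 not in LEFT_DOMINANT:
        raise ValueError(f"unknown base tone: {tone1}")
    if direction == "left":
        if tone2 in CHECKED_TONES:
            # Values for checked second syllables are not given in this section.
            raise ValueError("checked second syllables are not covered by this sketch")
        # The second syllable's base tone plays no role in the output.
        return LEFT_DOMINANT[tone1]
    if direction == "right":
        return f"{RIGHT_DOMINANT[tone1]}-{tone2}"
    raise ValueError("direction must be 'left' or 'right'")
```

For instance, `apply_sandhi("12", "13", "right")` returns `"22-13"`, in line with the /b12-z13/ → [b22-z13] example in (2b), while the left-dominant branch ignores the second base tone, reflecting the loss of contrast on the second syllable.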
Xu et al. (1981) argued that the left- vs. right-dominant sandhi
directionality is determined by whether the disyllable forms a
phonological word, and subsequent phonological analyses of Shanghai
tone sandhi and prosodic domains (e.g., Selkirk & Shen, 1990;
Duanmu, 1995) have adopted this position, but often assumed that
phrases simply do not undergo tone sandhi and right-dominant
sandhi only represents phonetic reduction of the nonfinal
tones.
1.2. Goals of the current study
A goal of the current study is to provide an acoustic investigation
of the two unique properties of Shanghai tone sandhi: rightward
tone spreading and structure dependency. Descriptively, we aim to
provide acoustic details of both left-dominant and right-dominant
tone sandhi in Shanghai in order to (a) verify the spreading
property of the left-dominant sandhi reported in earlier literature
and (b) shed light on the nature of right-dominant sandhi – is it
better interpreted as phonological leveling with prespecified,
neutralized level targets or phonetic contour reduction? In so
doing, the study offers a comprehensive acoustic description of
disyllabic tone sandhi in a Chinese language with purported
bidirectionality, a task hitherto rarely attempted (but see
Takahashi, 2013, reviewed below).
But more importantly, the study aims to go beyond the sandhi
patterns observed in existing words and phrases in Shanghai and
test the productivity of both rightward spreading and structure
dependency in Shanghai using a nonce-probe test (“wug” test; Berko,
1958). The productivity of a linguistic process refers to its
ability to apply to new items (Bybee, 2001: pp. 12–13). The
understanding of productivity is important to theoretical
linguistics as it provides crucial evidence about the
generalizations and cognitive abstractions that speakers make and
hence directly addresses the issue of grammar in the sense of the
tacit knowledge of the speaker (Bybee, 2001; Pierrehumbert, 2003,
among many others). In the realm of phonology, productivity is a
particularly timely issue as recent experimental research has shown
that the speakers’ phonological knowledge as reflected in
productivity patterns is often not identical to the lexical
patterns of the language in question (e.g., Zuraw, 2007; Berent,
Steriade, Lennertz, & Vaknin, 2007; Hayes, Zuraw, Siptár, &
Londe, 2009; Becker, Ketrez, & Nevins, 2011; Hayes & White,
2013). One factor that has been shown to affect productivity is the
phonetic basis of the phonological process. For instance, Zuraw
(2007) showed through a corpus study on loans and a web-based
survey on nonce words that Tagalog speakers possessed knowledge of
the splittability of word-initial consonant clusters that was
informed by perception, but could not be deduced from lexical
statistics.

Table 1 (continued). Right-dominant sandhi: 53-X → 44-X, 34-X → 44-X,
13-X → 33-X, 55-X → 44-X, 12-X → 22-X

Hayes et al. (2009) tested Hungarian speakers’
knowledge of suffixal vowel harmony through a nonce probe test and
showed that although the speakers learned both phonetically natural
(suffixal vowels correlated with properties of stem vowels) and
unnatural patterns (suffixal vowels correlated with properties of
the stem-final consonant), the unnatural patterns were undervalued
and learned less robustly than the natural ones. Specific to the
rightward spreading tone sandhi in Shanghai, two hypotheses can be
made regarding its productivity. First, based on the
crosslinguistic prevalence of progressive, assimilatory tonal
coarticulation (Mandarin: Xu, 1997; Tianjin: Zhang & Liu, 2011;
Taiwanese Southern Min: Peng, 1997; Malaysian Southern Min: Chang
& Hsieh, 2012; Vietnamese: Han & Kim, 1974; Brunelle, 2009;
Thai: Gandour, Potisuk, & Dechongkit, 1994; Potisuk, Gandour,
& Harper, 1997),4 Zhang (2007) argued that left-dominant
spreading sandhi is conceivably a phonologized result of such
coarticulation. The
strong affinity between rightward spreading sandhi and progressive
assimilatory coarticulation predicts that the spreading sandhi
should be overall productive, and this hypothesis will be tested by
the comparison of the spreading sandhi application between real and
nonce words. Second, we expect a productivity difference between
contour extension and contour displacement in that the latter would
be less productive due to its more distant affinity with
progressive coarticulation.
Our other goal is to test the hypothesis that the structure
sensitivity of the sandhi is productive – a hypothesis rooted in
the productivity of morphosyntactic combinations. We test this
hypothesis by comparing the tonal realization between two types of
disyllabic nonce items – modifier–noun combinations, which form
words and are expected to undergo left-dominant sandhi, and
verb–noun combinations, which should form phrases due to their nonce
nature and hence undergo right-dominant sandhi. If corroborated,
this hypothesis will lend direct support to the interface analysis
between syntactic structure and prosodic domain in Shanghai
(Selkirk & Shen, 1990; Duanmu, 1995). The nonce-probe test used
in the comparison may also serve as an additional method that
offers empirical evidence for theoretical analyses of
prosody–syntax interface in general.
2. Previous literature
2.1. Acoustic studies on Shanghai disyllabic tone sandhi
Despite the relative prestige that Shanghai Wu enjoys as one of the
largest dialects of Chinese, there has been relatively little
experimental data on the tone sandhi pattern of the dialect. Zee
and Maddieson (1980), Toda (1990), Zhu (1999), Chen (2011), and
Takahashi (2013) are the precious few exceptions. We restrict
ourselves to a review of the disyllabic sandhi pattern – the focus
of our study – in these works. Zee and Maddieson (1980) recorded
one female speaker and found that the f0 contour of disyllabic
compounds was similar in shape to that of the first syllable of the
compound. But when the first syllable was a checked syllable with a
low rising tone, the rising contour was realized on the second
syllable of the disyllable, whose sandhi pattern was analyzed as
[L-LM↑], where M↑ indicates a raised Mid. Toda (1990) specifically
investigated the tonal realization of disyllables where the first
syllable had a high falling tone (T1) as its base tone. Both of the
speakers that she recorded showed a high level tone on the
first syllable and a mid falling tone on the second syllable. Toda
argued that this pattern was difficult to analyze as a simple
contour extension from the base tone of the first syllable due to
the difference in the time-normalized f0 contour between the
disyllable and the first syllable. Zhu (1999) replicated Toda’s
result in one of the two speakers that he recorded, but found that
the other speaker’s T1+X pattern could indeed be interpreted as
contour extension from the first syllable. Zhu further argued that,
while T2+X and T3+X involved contour extension, T4+X and T5+X both
involved contour displacement. But the T4+X result was difficult to
evaluate due to the small f0 excursion on both the monosyllables
and disyllables. Chen’s (2011) primary goal was to investigate how
the f0 perturbation from the onset consonant in noninitial position
is affected by the phonological consonant-tone cooccurrence
restriction, but her results did show that the f0 of the second
syllable was primarily determined by the base tone of the first
syllable, and the f0 difference associated with the laryngeal
feature of the second syllable, which could potentially be linked
to a base tone difference, was largely realized in the first 50 ms
of the vowel and attributable to f0 perturbation. Takahashi (2013)
was the only work we are aware of that investigated both
left-dominant and right-dominant sandhi in Shanghai, although his
left-dominant investigation focused on three- and four-syllable
sequences. His data showed that in left-dominant contexts, the f0
contour of the polysyllabic sequence was indeed determined by the
base tone of the first syllable, and younger speakers inserted a
default Low tone on the third syllable of the sequence. This echoes
Chen’s (2008) earlier finding on polysyllabic tone sandhi in
Shanghai. For right-dominant sandhi, Takahashi investigated the f0
pattern on the initial syllable of disyllables under different
speech rates and found that, at all speech rates, the contour shape
of the initial tone was preserved and the falling T1 and rising T2
did not result in neutralization, thus supporting the position that
right-dominant sandhi in Shanghai is gradient phonetic reduction
rather than neutralizing phonological changes.
4 Regressive tonal coarticulation is commonly attested as well, but
its nature may be either assimilatory or dissimilatory. The
dissimilatory effect of a Low tone on a preceding High is
particularly notable and has been shown in Mandarin (Shih, 1986;
Shen, 1990; Xu, 1997), Thai (Gandour et al., 1994; Potisuk et al.,
1997), Taiwanese (Peng, 1997), and Yoruba (Laniran, 1992). The
duration and magnitude of progressive tonal coarticulation are
typically reported to be greater than those of regressive tonal
coarticulation, but the opposite effect has occasionally been found
(e.g., Mandarin: Shen, 1990; Yoruba: Laniran, 1992; Vietnamese:
Brunelle, 2009). In the modeling of prosody, researchers have
treated the directionality of coarticulatory smoothing differently.
For instance, Kochanski and Shih’s (2003) soft template model
(Stem-ML) assumes bidirectional smoothing; Prom-on et al.’s (2009)
quantitative target approximation (qTA) model as well as its
predecessor, the parallel encoding and target approximation (PENTA)
model (Xu, 2005), is sequential and allows only left-to-right
coarticulatory influences.
2.2. Productivity studies on Chinese tone sandhi
Nonce-probe tests, in which speakers are asked to provide
responses to novel words in contexts that are facilitative to the
application of the phonological process in question, have been
widely used to test the productivity of phonological alternations
(e.g., Albright, Andrade, & Hayes, 2001; Hayes & Londe,
2006; Zuraw, 2007; Hayes et al., 2009; Becker et al., 2011; Hayes
& White, 2013) as well as regular and irregular morphological
rules (e.g., Bybee & Pardo, 1981; Albright, 2002; Albright
& Hayes, 2003; Pierrehumbert, 2006). Using this method to
investigate the productivity of tone sandhi can be traced back to
the work of Hsieh (1970, 1975, 1976) on Taiwanese Southern Min.
Subsequent works have investigated the productivity of tone sandhi
in Mandarin (Zhang & Lai, 2010), Tianjin (Zhang & Liu, in
press), Wuxi (Yan & Zhang, in press), as well as Taiwanese
(Wang, 1993; Zhang & Lai, 2008; Zhang, Lai, & Sailor, 2011).
The major finding is that, similar to the works on segmental
phonology cited in Section 1.2, the speakers’ phonological
knowledge of tone sandhi is also not necessarily identical to the
sandhi patterns reflected in lexical statistics. The works on
Taiwanese, for example, have shown that when the tone sandhi
involves a circular chain shift, the sandhi is not entirely
productive in wug tests, indicating that despite the regularity of
the sandhi in the language, the speakers have not completely
internalized the pattern and likely rely on lexical and allomorph
listings for the sandhi.5 The phonetic property of the sandhi has
also been shown to have an influence on how it is internalized by
speakers. For instance, Zhang and Lai (2010) tested the
productivity of both the third tone sandhi (213 → 35 / __213) and the
half-third sandhi (213 → 21 / __T, T ≠ 213) in Mandarin and found that,
although both applied categorically to nonce words, the application
of the former was phonetically incomplete. They attributed this to
the greater phonetic naturalness of the latter.6 Zhang and Liu (in
press) replicated this result in the tonal cognates in Tianjin, a
dialect closely related to Mandarin. In addition, Yan and Zhang (in
press) showed that in Wuxi, a Wu dialect, the productivity of tone
sandhi in nonce words is positively correlated with the phonetic
similarity between the base tone and the sandhi tone – another
effect of the phonetic nature of the sandhi.
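
For reference, the two Mandarin rules tested by Zhang and Lai (2010) can be stated as a categorical mapping over disyllables. The Python sketch below is only an illustration of the rule statements (213 → 35 before 213; 213 → 21 before any other tone). The values 55 and 51 for Mandarin T1 and T4 are the standard Chao transcriptions and are not given in this section, and the function name is hypothetical. A categorical mapping of this kind cannot represent the phonetically incomplete application reported for nonce words.

```python
# Mandarin citation tones in Chao numbers. 35 (T2) and 213 (T3) appear in the
# text; 55 (T1) and 51 (T4) are the standard values, added here as an assumption.
MANDARIN_TONES = {"55", "35", "213", "51"}


def mandarin_t3_sandhi(tone1, tone2):
    """Apply third tone sandhi and half-third sandhi to a disyllable.

    Third tone sandhi:  213 -> 35 / __ 213
    Half-third sandhi:  213 -> 21 / __ T, where T is not 213
    Only the first syllable can change; the second keeps its base tone.
    """
    for tone in (tone1, tone2):
        if tone not in MANDARIN_TONES:
            raise ValueError(f"unknown Mandarin tone: {tone}")
    if tone1 != "213":
        return (tone1, tone2)
    if tone2 == "213":
        return ("35", tone2)
    return ("21", tone2)
```

Both rules are right-dominant in the sense used here: the final syllable keeps its tone, and the nonfinal syllable undergoes a local change conditioned by what follows.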
These previous works indicate that our understanding of tone sandhi
can benefit considerably from productivity studies that shed more
direct light on the speakers’ tacit knowledge of the sandhi
patterns. The results of these productivity studies will then
provide a firmer foundation from which formal analyses of tone
sandhi can proceed.
The productivity studies so far, however, are limited in two
respects. First, they have primarily focused on right-dominant
sandhi, and we know little about the productivity of the
left-dominant spreading pattern common in Northern Wu dialects like
Shanghai. This is especially interesting as the rightward spreading
pattern is the most closely related to progressive tonal
coarticulation. If a strong phonetic basis of the sandhi
facilitates its productivity, we would expect the rightward
spreading pattern to be relatively productive. Second, previous
studies have not investigated the structure sensitivity of tone
sandhi. The research on Shanghai that we report here fills these
two gaps and complements our current knowledge of tone sandhi
productivity.
3. Methodology
The basic methodology of our study was to elicit disyllabic
utterances from native speakers of Shanghai by presenting them with
two separate monosyllables in their base tones and asking them to
pronounce the syllables together as a real word or phrase in
Shanghai. The tonal realization of the two syllables was then
measured to quantify the application of the tone sandhi. The
experiment was divided into two parts, one dealing with existing
disyllables and the other with nonce disyllables. Both parts tested
words and phrases, which were expected to undergo left- and
right-dominant sandhi, respectively. We first describe our
participants and the stimulus construction. The set-up and
procedure for the two parts of the experiment are then discussed,
followed by how we analyzed the f0 data and the statistical method
that we used for f0 curve comparisons.
3.1. Participants
There is considerable dialect-internal variation within Shanghai,
and due to close contact with other Wu dialects such as Suzhou and
Ningbo as well as the dominant influence of Standard Chinese,
Shanghai has undergone and is still undergoing fast changes,
especially in its phonetics and lexicon.7 We focused on the variety
of Shanghai spoken in the urban area by younger speakers in this
study. Our experiment was conducted in the Phonetics Laboratory of
the Department of Chinese Language and Literature at Fudan
University, Shanghai. Forty-eight speakers (28 females) who grew up
in one of the ten urban districts of Shanghai and self-identified
as native, fluent speakers of Shanghai Wu participated in the
experiment. The majority of the participants were undergraduates at
Fudan University, and the participants’ mean age at the time of
the experiment was 24.6.
5 There is a range of tone sandhi application rates that has been
reported for Taiwanese wug tests, and the rate seems to (a) be
task-dependent and (b) increase with continued exposure to the
nonce items (e.g., Hsieh, 1975; Wang, 1993). Chuang, Chang, and
Hsieh (2011) argued that the “foreignness” of the nonce items
contributed to the unproductivity results in earlier wug tests and
showed that when speakers were asked to undo tone sandhi in
existing disyllabic monomorphemic words and Japanese loanwords, the
sandhi productivity was considerably higher. They went on to argue
that the method of wug tests in the study of productivity needed to
be reevaluated. While we agree that the exact application rate of a
phonological process in a wug test cannot be directly taken as the
productivity of the process, the comparison of wug test results on
tone sandhi patterns in different dialects under the same method
still informs us that speakers internalize different types of
sandhi differently. For instance, Taiwanese tone sandhi induces
categorical non-application in nonce words, while Mandarin tone
sandhi does not. Moreover, the incorporation of listed lexical
items or allomorphs does not preclude the possibility of
practice/learning effect as the nonce word becomes more familiar.
Zuraw (2000, 2003), for instance, has proposed a model that allows
the application rate of semi-productive phonological processes to
increase in loanwords as they gradually become incorporated in the
lexicon.
6 Zhang and Lai (2010) discussed a number of alternative
interpretations for the result, including the low frequency of
third tone sandhi cases, treating the low falling tone as the base
tone for T3, and the syntactic dependency of the third tone sandhi.
Without taking the discussion too far afield, we refer the reader
to their article for a comparison of the interpretations.
7 For information about the diachronic changes, dialectal
variation, and sociolinguistic situation of Shanghai, see Xu and
Tang (1988), Qian (1997), and Zhu (1999, 2006).
3.2. Stimulus construction
3.2.1. Real disyllables
The first part of the experiment investigated the nature of
left-dominant and right-dominant tone sandhi in real disyllabic
sequences in Shanghai. To this end, we aimed to select 100 words
and phrases, four for each of the 25 base-tone combinations. Among
the four, two should undergo left-dominant sandhi, and two should
undergo right-dominant sandhi. We also wanted to ensure that the
directionality difference was not simply a function of usage
frequency difference, but structure-related. We therefore aimed to
match the overall frequency between the left-dominant and
right-dominant items.
Given that there is no existing frequency corpus of Shanghai, we
first designed and implemented an online subjective frequency
rating pretest that estimated the usage frequency of 400 disyllabic
words and phrases in Shanghai (Balota, Pilotti, & Cortese,
2001). Of the 400 items, 200 undergo left-dominant sandhi and 200
undergo right-dominant sandhi according to Xu et al. (1981) and Xu
and Tao (1997). The pronunciation of these 400 items was further
checked with two native speaker consultants (one female). Within
the 200 in each directionality, each tonal combination was
represented by eight items. Our female consultant recorded the
entire list and the recording was used for the frequency pretest.
The test was divided into four sessions, each with 100 words, and
the four sessions were matched in numbers for tones and sandhi
types. The test was implemented online in LimeSurvey hosted by the
Ermal Garinger Academic Resource Center at the University of
Kansas.
The test was advertised through Chinese dialect websites, the
Linguist List, social media websites, and word of mouth. In the
end, the numbers of complete responses for each session were 33,
30, 30, and 33, respectively. Some of our subjects participated in
all sessions, some only in a subset of them.
During the test, participants were given the Chinese characters and
the acoustic recording of an item and asked to respond whether the
item was “very commonly used,” “commonly used,” “neither common nor
rare,” “rarely used,” or “very rarely used” (given in Chinese
characters). The subjects’ ratings were converted into a 1–5 scale,
where 1 represents a “very rarely used” response and 5 a “very
commonly used” response, and the ratings for each item were
averaged across subjects. From the 400 items, 100 (50
left-dominant, 50 right-dominant, 4 for each tonal combination) were
eventually selected so that the left- and right-dominant items did
not differ significantly in rating distribution (Kolmogorov–Smirnov
test: D = 0.14, p = 0.7166) or in rating mean (Wilcoxon test:
W = 1341.5, p = 0.5304; the Wilcoxon test was used due to non-normal
distributions of the samples). The syntactic structures of the
left-dominant sandhi items were primarily modifier–noun, but also
included modifier–verb, coordination, and lexicalized compounds and
proper names. The syntactic structures of the right-dominant sandhi
items were primarily verb–noun, but also included verb–adverb and
subject–predicate. The segmental contents of the syllables were not
actively controlled. The complete stimulus list is given in
Appendix A.
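
The rating conversion and the distribution comparison described above can be illustrated with a short, standard-library-only Python sketch. It is not the authors' analysis script: the numeric values for the three intermediate response labels are assumed to follow their order on the scale (only the endpoints 1 and 5 are stated), the function names are hypothetical, and only the two-sample Kolmogorov–Smirnov D statistic is computed, without a p-value.

```python
from bisect import bisect_right
from statistics import mean

# Response labels mapped to the 1-5 scale. The endpoints are given in the text;
# the intermediate values are assumed to follow the order of the labels.
SCALE = {
    "very rarely used": 1,
    "rarely used": 2,
    "neither common nor rare": 3,
    "commonly used": 4,
    "very commonly used": 5,
}


def item_mean_rating(responses):
    """Average the converted ratings that one item received across subjects."""
    return mean(SCALE[r] for r in responses)


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

In use, `item_mean_rating` would be applied to each of the 400 items, and `ks_statistic` to the mean ratings of the candidate left-dominant and right-dominant sets; a small D indicates closely matched rating distributions.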
Given that one of the main goals of the study was to investigate
the productivity of tone sandhi by comparing the sandhi application
in real and nonce words, we elicited the sandhi patterns in real
words by providing the speakers with the base tones of individual
syllables, as the base tones of the nonce syllables must be given
to the subjects (see below). Otherwise, the real words would be
read with no auditory priming of the base tones, while the nonce
words would be read with such priming. The individual syllables
were read in their base tones
in isolation by our female consultant and recorded in an anechoic
chamber at the University of Kansas. The acoustic files of the
individual syllables were then used in the first part of the
experiment.
To alleviate fatigue of our subjects, who participated in both
parts of the experiment, we divided the stimulus list into two,
each including one item for left-dominant sandhi and one item for
right-dominant sandhi for each of the tonal combinations. One list
was used for half of the subjects, and the other list was used for
the other half.
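
The list-splitting step can be sketched as follows. This is a hypothetical Python illustration: the tuple layout `(label, tone_combination, direction)` and the function name are assumptions, and the text does not say which member of each pair went to which list, so the sketch simply follows input order.

```python
from collections import defaultdict


def split_stimuli(items):
    """Split stimuli into two lists with one item per cell in each list.

    items: iterable of (label, tone_combination, direction) tuples, with exactly
    two items for every (tone_combination, direction) cell.
    """
    cells = defaultdict(list)
    for item in items:
        _, tone_combination, direction = item
        cells[(tone_combination, direction)].append(item)

    list_a, list_b = [], []
    for key in sorted(cells):
        pair = cells[key]
        if len(pair) != 2:
            raise ValueError(f"cell {key} has {len(pair)} items, expected 2")
        # Assumption: the first item seen goes to list A, the second to list B.
        list_a.append(pair[0])
        list_b.append(pair[1])
    return list_a, list_b
```

With 25 tonal combinations, two sandhi directions, and two items per cell, each resulting list contains 50 items, 25 per direction.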
3.2.2. Nonce disyllables
The second part of the experiment involved the subjects’ production
of disyllabic nonce sequences, which were formed by
combining a syllable accidentally missing from the Shanghai
syllabary (legal segmentals and legal tone, whose combination does
not violate voicing-tone cooccurrence restrictions, but happens to
be missing in the syllabary) as the first syllable (σ1) and an
existing syllable as the second syllable (σ2). The nonce σ1 was
provided a meaning as either a nominal modifier or a verb to elicit
left-dominant and right-dominant sandhi, respectively, and σ2 was
always an existing noun. Ten nonce syllables, two in each tone,
were used in σ1 position, and each syllable was associated with two
meanings – a modifier meaning and a verb meaning. Each speaker,
however, only heard one meaning for each syllable. These nonce
syllables were arrived at by first consulting the complete Shanghai
syllabary in Zhu (2006, pp. 22–23); the missing segmentals and tone
combinations from the syllabary were then checked with both of our
consultants for acceptability. Given the voicing-tone cooccurrence
restrictions, which limited the number of logical combinations,
there were relatively few items to choose from, and we were not
able to match the segmental properties of these nonce syllables
(e.g., consonant aspiration, vowel height) with those of σ1s in the
real disyllables. Ten monosyllabic nouns, two in each tone, were
used in σ2 position, and each speaker used one noun to combine with
a modifier nonce σ1 and the other to combine with a verb nonce σ1.
The two sets of nonce syllables in σ1 and their meanings are given
in Table 2, and the two sets of existing nouns used in σ2 are given
in Table 3.
For example, two nonce syllables with Tone 1 (53) were used in σ1
position: m~ 53 and mu53; half of the speakers would hear m~ 53
used as a modifier meaning “a special color” and mu53 used as a
verb meaning “to shop online,” and the other half of the speakers
would hear m~ 53 as the verb and mu53 as the modifier. Each nonce
syllable in σ1 was combined with five monosyllabic nouns, one in
each tone, in σ2. For example, m~53 was combined with s53 “book”,
s34 “umbrella”, di13 “musical instrument”, pi55
“pen”, di12 “flute”, and mu53 was combined with ho53 “flower”, ts34
“grass”, zo13 “tea”, ty55 “chrysanthemum”, m12 “sock”.
Table 2 The two sets of nonce syllables, cued as modifiers or
verbs, for half of the speakers. For the other half, the meaning
columns for the nonce syllables were switched.
Tone  Cued as modifier             Cued as verb
T1    m53 “a special color”        mu53 “to shop online”
T2    p34 “a city name”            to34 “to sell in a special way”
T3    b13 “a man-made material”    n13 “to transport via a spaceship”
T4    me55 “a smell”               ne55 “to smuggle in a special way”
T5    ue12 “a shape”               y12 “to give as a gift in a special way”
Table 3 The two sets of nouns that were used to create
modifier–noun and verb–noun combinations.
Tone  Noun set 1                   Noun set 2
T1    s53 “book”                   ho53 “flower”
T2    s34 “umbrella”               ts34 “grass”
T3    di13 “musical instrument”    zo13 “tea”
T4    pi55 “pen”                   ty55 “chrysanthemum”
T5    di12 “flute”                 m12 “sock”
J. Zhang, Y. Meng / Journal of Phonetics 54 (2016) 169–201
Therefore, half of the speakers described “book”, “umbrella”,
“musical instrument”, “pen”, and “flute” in a special color while
shopping online for “flower”, “grass”, “tea”, “chrysanthemum”, and
“sock”, while the other half did the opposite, but the segmental
contents pronounced by the two groups of speakers were identical.
In the end, each speaker produced 25 modifier–noun and 25 verb–noun
sequences. The entire stimulus list for this part of the experiment
can be found in Appendix B. The cue sentences that provided the
meanings for the nonce syllables as well as prompts for the
subjects’ response were again recorded by our female
consultant.
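The counterbalancing just described can be made concrete with a short sketch. The syllable and noun labels below (for example `T1_nonceA`) are placeholders rather than the actual stimuli, and the group numbering is an assumption made for illustration; only the design logic (two nonce syllables and two nouns per tone, with the meanings swapped between the two halves of the speakers) comes from the text.

```python
TONES = ["T1", "T2", "T3", "T4", "T5"]

# Placeholder labels: two nonce syllables and two nouns per tone.
NONCE = {t: (f"{t}_nonceA", f"{t}_nonceB") for t in TONES}
NOUNS = {t: (f"{t}_nounA", f"{t}_nounB") for t in TONES}


def build_items(group):
    """Return (cue, nonce_syllable, noun) triples for speaker group 1 or 2."""
    # Group 1 hears syllable A as a modifier and B as a verb; group 2 the reverse.
    cues = ("modifier", "verb") if group == 1 else ("verb", "modifier")
    items = []
    for s1_tone in TONES:
        for set_index, nonce in enumerate(NONCE[s1_tone]):
            # Each nonce syllable combines with one noun per tone from its own noun set.
            for s2_tone in TONES:
                items.append((cues[set_index], nonce, NOUNS[s2_tone][set_index]))
    return items


group1 = build_items(1)
group2 = build_items(2)
counts = {cue: sum(1 for item in group1 if item[0] == cue)
          for cue in ("modifier", "verb")}
print(counts)  # {'modifier': 25, 'verb': 25}
```

Under these assumptions both groups produce the same 50 syllable-plus-noun strings, and only the cued meaning differs, which is what keeps the segmental content identical across the two groups.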
3.3. Experimental procedure
Each participant first filled out a language background
questionnaire and signed an informed consent form, then
participated in the experiment. All participants did the real
disyllable portion of the experiment first, then the nonce
disyllable portion after a five-minute break. They were paid a
nominal fee upon the completion of the experiment.
Both parts of the experiment were implemented in Paradigms
(Tagliaferri, 2010). For the real disyllable portion, the subjects
were given the two syllables in their base tones auditorily,
separated by an 800 ms pause; the Chinese characters associated
with the syllables also appeared on a computer screen as the
syllables played. The subjects were then prompted to pronounce the
words out loud in a clear and natural way. The stimuli were
randomized for each speaker. The main experiment was preceded by an
instruction read in Shanghai and a practice session that included
four disyllabic items – two left-dominant and two right-dominant –
that did not appear in the main experiment.
For the nonce disyllable portion, the subjects were given the
meanings of the nonce syllables both auditorily and in written
form. The nonce syllables were pronounced with their base tones
twice during the verbal prompt and represented orthographically
with a box “□” in lieu of a Chinese character on the computer
screen. For instance, the subjects would both hear and see “
m~ 53; m~ 53, ___” (“If to shop online is called m~ 53; if a book
has not been m~ 53-ed, then we can say that we have not ___”), with
the nonce syllable “m~ 53” represented as “□” on the screen. The
subject was expected to reply with /m~ 53/ (“m~ 53-ed the book”)
with right-dominant sandhi. For each nonce syllable, the five
monosyllabic nouns that it was combined with appeared together in
one block; i.e., once the speakers were given the meaning of m~ 53,
they were asked to combine it with five different nouns one after
another. Different nonce words appeared in random order for each
speaker. The main experiment was also preceded by an instruction
and a practice session. The practice session used two nonce
syllables that were not used in the experiment, one cued as a
modifier and one cued as a verb, and the subjects were asked to
combine each nonce syllable with two different monosyllabic nouns.
The subjects were encouraged to ask questions during and after the
practice if they had trouble understanding the task. Some did, but
all were judged to have comprehended the task before they moved on
to the experiment.
For both portions of the experiment, the subjects’ responses were
continuously recorded using a Marantz solid state recorder PMD 671
sampling at 22.05 kHz and an EV N/D 767a microphone.
3.4. Data analysis
All acoustic analyses of the data were conducted in Praat (Boersma
& Weenink, 2009). The rimes of the syllables in the target
stimuli were first identified and annotated in a text grid; we then
took an f0 measurement at every 10% of the rime duration for each
target syllable using ProsodyPro (Xu, 2005–2011), giving eleven f0
measurements for each syllable. ProsodyPro uses the automatic vocal
pulse marking by Praat as well as a trimming algorithm that removes
spikes and sharp edges (see Appendix 1 of Xu, 1999 for additional
information on the trimming algorithm). The Maxf0 and Minf0
parameters in the script as well as the octave-jump cost were
adjusted for each speaker, and the f0 measurements were
hand-checked against narrow-band spectrograms in Praat to correct
for octave and other errors in the measurements provided by the
script.
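The count of eleven measurements follows from sampling at 0%, 10%, ..., 100% of the rime. The sketch below only illustrates that arithmetic; the function name and the example rime boundaries are hypothetical, and it does not reproduce what ProsodyPro does internally.

```python
def sample_times(rime_start, rime_end, n_intervals=10):
    """Time points at every 1/n_intervals of the rime, both end points included."""
    duration = rime_end - rime_start
    return [rime_start + duration * k / n_intervals for k in range(n_intervals + 1)]


# Hypothetical rime boundaries in seconds.
points = sample_times(0.120, 0.320)
print(len(points))  # 11
```

Because each syllable contributes the same number of points regardless of its duration, the resulting f0 curves are time-normalized and can be averaged across tokens and speakers.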
The f0 measurements in Hz were converted to semitones relative to
50 Hz using the formula in (3a) to better reflect pitch perception
(’t Hart, Collier, & Cohen, 1990; Rietveld & Chen, 2006). The
semitone values were then z-score transformed using the formula in
(3b) over all measurements from a speaker to normalize for
between-speaker variation, especially male and female differences
(Rose, 1987; Zhu, 2004).
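(3)
a. $ST = 12 \log_{2}\left(\dfrac{f_0}{50}\right)$, with $f_0$ in Hz
b. $z_i = \dfrac{ST_i - \overline{ST}}{s_{ST}}$, where $\overline{ST}$ and $s_{ST}$ are the mean and standard deviation of all semitone values from the same speaker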
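The two transformations described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the f0 values are made up, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which variant was used.

```python
import math
from statistics import mean, stdev

REFERENCE_HZ = 50.0  # reference frequency stated in the text


def hz_to_semitone(f0_hz, ref_hz=REFERENCE_HZ):
    """Convert f0 in Hz to semitones relative to ref_hz (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def z_normalize(values):
    """z-score a speaker's values; the sample SD (n - 1) is an assumption."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Made-up f0 measurements (Hz) standing in for all measurements from one speaker.
speaker_f0_hz = [100.0, 200.0, 150.0, 180.0, 120.0]
semitones = [hz_to_semitone(f) for f in speaker_f0_hz]
z_scores = z_normalize(semitones)
```

After normalization each speaker's values have a mean of 0 and a standard deviation of 1, so that curves from speakers with different pitch ranges can be pooled.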
In addition to the data from the 48 experimental participants, the
individual syllables recorded from our language consultant for the
first part of the experiment were also analyzed, and these formed
the basis for the data on the tonal inventory of Shanghai (see Fig.
1).
To compare the rightward spreading sandhi application between real
and nonce words and the application of sandhi between modifier–noun
words and verb–noun phrases in nonce items, we used growth curve
analysis (Mirman, 2014) to model the f0 curves of the two syllables
in the subjects’ responses. This analysis describes the functional
form of the probability distribution of f0 over time by identifying
model fit components for an f0 curve that captures this probability
distribution. To capture the changes in f0 direction within a
syllable, but in the meantime avoid overfitting the segmental
effect, we used quadratic orthogonal polynomials to model all f0
curves over a syllable. The time terms for orthogonal polynomials
are uncorrelated, hence their parameter estimates are independent
of each other. The intercept term indicates the average height of
the curve; the linear term indicates the overall slope of the
curve; and the quadratic term indicates the sharpness of the
centered peak of the curve. Detailed methods of the f0 comparisons
are given together with the results to facilitate the
interpretation of the results.
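To make the roles of the intercept, linear, and quadratic terms concrete, the sketch below builds orthogonal polynomial time terms over eleven time points by Gram-Schmidt orthogonalization and projects a single made-up f0 curve onto them. It is an illustration under simplifying assumptions: it describes one curve with ordinary least squares and has no random effects, so it is not the mixed-effects growth curve model used in the study, and the function names are hypothetical.

```python
import math


def orthogonal_poly(n_points, degree=2):
    """Orthonormal polynomial time terms (linear, quadratic, ...) over equally spaced points."""
    basis = []
    for d in range(degree + 1):
        col = [float(t) ** d for t in range(n_points)]
        for b in basis:  # Gram-Schmidt: remove the components along lower-order terms
            proj = sum(c * v for c, v in zip(col, b))
            col = [c - proj * v for c, v in zip(col, b)]
        norm = math.sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])
    return basis[1:]  # drop the constant term; the intercept is handled separately


def describe_curve(f0, terms):
    """Least-squares coefficients; with orthonormal terms these are plain dot products."""
    intercept = sum(f0) / len(f0)
    return intercept, [sum(y * x for y, x in zip(f0, term)) for term in terms]


linear, quadratic = orthogonal_poly(11)
falling = [1.0 - 0.2 * k for k in range(11)]  # made-up straight falling contour
intercept, (slope, curvature) = describe_curve(falling, (linear, quadratic))
```

For this straight falling contour the linear coefficient is negative and the quadratic coefficient is zero up to rounding. Because the terms are uncorrelated, adding the quadratic term does not change the estimate for the linear term, which is the property the text relies on.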
In addition, the participants’ tonal responses to each target
stimulus were also classified by a phonetically trained Shanghai
native speaker into “Spreading,” “No Sandhi,” and “Other” to
further shed light on the productivity and structure dependency of
the sandhi pattern. The speaker was a linguistics graduate student
who specialized in tone research and felt comfortable performing
the task. She was asked to classify a disyllabic response as
“Spreading” if its tone pattern is perceptually equivalent to how
she would pronounce an existing nominal compound, “No Sandhi” if
she believed that the subject pronounced the disyllable in its base
tones, and “Other” if the tone pattern did not fall under either of
these two categories. She started from the real disyllables, where
she reported the classification to be straightforward, then moved on
to the nonce disyllables, where she felt that the classification
was difficult for around 20% of the tokens. For these tokens, she
used a combination of her perception and a pitch track comparison
in Praat between the token in question and a real disyllable in the
same syntactic structure by the same speaker to make the final
decision. Generalized Linear Mixed-Effects models were then used to
investigate the effects of word type (real vs. nonce) and structure
on the classification.
The rime duration for all stimulus syllables was measured in Praat
as well and Linear Mixed-Effects models were used to investigate
how duration was affected by word type and structure.
All statistical analyses were carried out in R version 3.1.0 (R
Core Team, 2014) using the lme4 package version 1.1-6 (Bates,
Maechler, Bolker, & Walker, 2014).
4. Results
We first report in Section 4.1 the f0 result on the application of
left-dominant and right-dominant sandhis in real disyllables (first
part of the experiment). The goal is primarily descriptive: the f0
data shed light on the nature of the two types of sandhi and
address the questions of whether the left-dominant sandhi truly
involves the spreading of the initial tone rightward, and whether
right-dominant sandhi is better interpreted as phonological
leveling or phonetic contour reduction. We then report f0 and
sandhi classification comparisons between real and nonce words for
left-dominant sandhi (Section 4.2) and between modifier–noun words
and verb–noun phrases in nonce items (Section 4.3) to address the
productivity and structure dependency of the sandhi system.
Relevant rime duration comparisons are given in each section as
well.
4.1. Acoustic description of left- and right-dominant tone sandhi
in real disyllables
The time-normalized f0 data for real disyllabic words expected to
undergo left-dominant spreading sandhi, organized by base tone
combinations, are given in Fig. 2, and their counterparts expected to
undergo right-dominant sandhi are given in Fig. 3. f0 curves for the base
tones from our female language consultant, averaged over the eight
monosyllables used for each tone in the real word experiment, were
overlaid onto each graph as thin solid lines for reference. All f0
graphs here and elsewhere were produced with the R package ggplot2
(Wickham, 2009).
Fig. 2. F0 data (vertical lines indicate ±SE) for real disyllabic
words expected to undergo left-dominant sandhi. Each graph
represents a base-tone combination. Thin solid lines represent the
average f0 curves for the base tones from the female language
consultant. Each observed data point represents the average f0 at a
particular normalized time point across participants.
We can compare the two sets of f0 graphs in two ways to understand
the nature of the difference between left- and right-dominant
sandhi. First, if we look across each row, in which all graphs
share the same base tone on the first syllable but have different
base tones on the second syllable, we can see that in left-dominant
sandhi (Fig. 2), the base tone difference on the second syllable
is considerably curtailed, and the overall f0 pattern over the
disyllable indeed takes the shape of the contour of the first
syllable. The f0 on the first syllable is little affected by the
different base tones on the second syllable and is generally
realized as a slightly falling tone. The spreading pattern is
particularly clear in T1+X and T5+X: in the former, the falling
contour of the initial base tone was spread over the two-syllable
domain, and in the latter, the rising contour of the initial base
tone was displaced onto the second syllable, leaving a low tone on
the first syllable. For T2+X and T3+X, which had a rising tone as a
base tone on the first syllable, only the low portion of the rise
was realized on the first syllable, and the f0 was higher on the
second syllable in all tonal combinations except for T2+T3,
indicating a spreading of the first-syllable rise. In
right-dominant sandhi (Fig. 3), however, the f0 on the second
syllable remains close to the base tone shape. As in
left-dominant sandhi, the f0 on the first syllable is also little
affected by the base tone of the second syllable, but instead of a
falling tone, it retains the tonal properties of the base tone.
These observations indicate that no sandhi has applied.
Second, if we look down each column, in which all graphs share the
same base tone on the second syllable but have different base tones
on the first syllable, we can see that in left-dominant sandhi, the
f0 on the second syllable is strongly affected by the different
base tones on the first syllable and realized differently despite
the same base tone. In right-dominant sandhi, however, the f0 on
the second syllable within the same column remains constant by
maintaining the tonal properties of the base tone. The f0 on the
first syllable also maintains properties of the base tone in
right-dominant sandhi. In particular, the two tones – T1 (53) and
T2 (34) – that have been reported to be neutralized to the same
level tone 44 in the literature retained their falling and rising
contours on the first syllable, respectively.
These comparisons indicate that disyllabic words in Shanghai indeed
undergo rightward spreading tone sandhi, but the so-called
“right-dominant sandhi” for disyllabic phrases is better
interpreted as phonetic contour reduction.
Fig. 3. F0 data (vertical lines indicate ±SE) for real disyllabic
phrases expected to undergo right-dominant sandhi. Each graph
represents a base-tone combination. Thin solid lines represent the
f0 curves for the base tones from the female language consultant.
Each observed data point represents the average f0 at a particular
normalized time point across participants.
We can also note from Fig. 2 that in left-dominant sandhi, the
second syllable preserves many of its base tone properties despite
the strong influence of the first syllable tone spread. In T2+X,
T3+X, and T4+X, the f0 on the second syllable corresponds to the
base tones on the second syllable. The trace of the base tones is
noticeable for T1+X and T5+X as well: in T1+X, the second syllable
has a rise in T1+T2 that corresponds to the rise in base T2, and
the second syllable is higher in T1+T4 than T1+T5, corresponding to
the base tone difference between T4 and T5; in T5+X, the higher
second syllable in T5+T4 than T5+T5 is also
clearly observable. The higher f0 on T1, T2, and T4 than T3 and T5
on the second syllable also corresponds to a voicing difference in
the onset of the second syllable: along with earlier results (Cao
& Maddieson, 1992; Ren, 1992; Chen, 2011; Wang, 2011), our data
showed clear voicing for stop closure and frication for obstruents
that cooccurred with T3 and T5. These results, therefore, are
likely due to a combination of the imitation effect from the
exposure to the base tones (see Goldinger, 1998; Delvaux &
Soquet, 2007; Tilsen, 2009; Nielsen, 2011, etc. on imitation
effects) and the perturbation effect from the initial consonant of
the second syllable.
The classification result of the f0 patterns into “Spreading,” “No
Sandhi,” and “Other” for the real items is given in Fig. 4. The
vast majority of forms that are expected to undergo left-dominant
sandhi indeed underwent the spreading sandhi, and the vast majority
of forms that are expected to undergo right-dominant sandhi
underwent no sandhi, as judged by our phonetically trained Shanghai
speaker. A Generalized Linear Mixed-Effects model on the
“Spreading” pattern, with structure as a fixed effect and
participant and item as random effects, showed that the M–N
structure had significantly higher “Spreading” counts than the V–N
structure (Estimate = 9.6711, S.E. = 0.8259, z = 11.709, p < 0.001; M–N
as baseline).
The rime duration results for the two syllables in the two sandhi
directions are given in a box and whiskers plot in Fig. 5. Given
that the segmental content between the left- and right-dominant
sandhi items was not matched, we coded the vowel height of the rime
according to the lowest vocalic element during the rime as “High,”
“Mid,” and “Low” (e.g., [tyø] was coded as “Mid” and [ia] was coded
as “Low”), as vowel height is a known factor that affects duration
(e.g., House & Fairbanks, 1953; Peterson & Lehiste, 1960;
Maddieson, 1997). A likelihood-ratio comparison between a model of
rime duration that only included participant and item as random
effects and one with vowel height as an additional fixed effect
showed that the addition of vowel height significantly improved the
model (χ²(2) = 63.646, p < 0.001). This nuisance factor was therefore
included in subsequent models, with structure (left- vs.
right-dominant) and syllable (σ1 vs. σ2) as potential factors.
Likelihood-ratio tests showed that among these models, the one that
included both terms and their interaction provided the best fit
with the data. From this model, we found that for the left-dominant
structure, there was no rime duration difference between σ1 and σ2
(Estimate = 1.432, S.E. = 1.856, t = 0.772, p = 0.440), and for the
right-dominant structure, σ2 had a significantly longer rime
duration than σ1 (Estimate = 23.945, S.E. = 1.863, t = 12.853,
p < 0.001). These results indicate that the structural difference
is correlated with a difference in duration patterning.
Fig. 4. Tone pattern counts for “Spreading,” “No Sandhi,” and
“Other” for real items as determined by a phonetically trained
Shanghai speaker, organized by base tone combinations. “M–N”
(modifier–noun) and “V–N” (verb–noun) represent forms that are
expected to undergo left-dominant and right-dominant sandhi,
respectively.
Fig. 5. Rime durations for the two syllables for the real items in
the two sandhi directions. “S1” = first syllable; “S2” = second
syllable. The black dot represents the median, the box represents
the interquartile range (1st to 3rd quartile), and the whiskers
represent maximally 1.5 times the interquartile range.
4.2. Productivity of the left-dominant sandhi
We focus in this section on the productivity of rightward tone
spreading sandhi by comparing the f0 patterns between real and
nonce words. The real word data were from the
left-dominant-sandhi-undergoing words in the first part of the
experiment, while the nonce word data were from the modifier–noun
novel compound formations in the second part of the experiment. As
stated earlier, the f0 curves were modeled using quadratic
orthogonal polynomials. To investigate the effect of word type
(real vs. nonce) on the f0 curve for a particular tonal
combination, a base model that only included the linear and
quadratic time terms and the participant and participant by word
type random effects on the time terms was first constructed. Word
type was then added onto this model as a factor, and word type’s
interactions with the time terms (time, time²) were subsequently
added stepwise. Their effects on model fit were evaluated using
log-likelihood model comparison. The model fit comparisons for all
50 growth curve analyses (two syllables × 25 base-tone combinations)
as well as the R code that generated the models are given in
Appendix C.
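The log-likelihood comparison of nested models reduces to a chi-square test on twice the difference in log-likelihood. The sketch below shows that arithmetic for the 1- and 2-degree-of-freedom cases that occur in this section. The log-likelihood values are hypothetical, and the function names are illustrative; in the study itself the comparisons were run in R.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic (closed forms for df = 1 and 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Compare nested models: statistic = 2 * (logLik_full - logLik_reduced)."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a base model and a model with word type added.
stat, p = likelihood_ratio_test(-1250.0, -1246.5, df=1)
print(stat)  # 7.0
```

As a sanity check on the closed form, `chi2_sf(5.8341, 1)` comes out close to 0.0157, in line with one of the linear-term comparisons reported in this section.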
The observed f0 data for these two word types for each of the
base-tone combinations together with the second-order orthogonal
polynomial growth curve models for each of the syllables are given
in Fig. 6. Although model comparisons indicate that the full model
is not always justified in all f0 comparisons, it is in some cases.
We therefore graphed the full models for all cases to allow for a
consistent visual comparison. F0 curves for the base tones from our
language consultant were again overlaid onto each graph to aid the
assessment of sandhi productivity: a higher similarity between the
sandhi tone and the base tone would indicate a lower
productivity.
Fig. 6. Observed data (symbols; vertical lines indicate ±SE) and
second-order orthogonal polynomial growth curve model fits (lines)
for f0 on disyllabic words expected to undergo left-dominant
sandhi. Each graph represents a base-tone combination. Filled
circle and thick solid line represent real words; filled triangle
and dotted line represent nonce words; f0 curves for the base tones
from our language consultant are overlaid onto each graph as thin
solid lines. Each observed data point represents the average f0 at
a particular normalized time point across participants.
A visual inspection of the general shapes of the f0 curves in Fig.
6 indicates that the spreading sandhi has generally applied to both
real and nonce words: the disyllable has an overall falling contour
when the first syllable has a falling tone (T1) and an overall
rising contour when the first syllable has a rising tone (T2, T3,
T5). This indicates that the spreading sandhi is generally
productive, supporting the hypothesis that a close affinity with a
phonetic process, here progressive tonal coarticulation,
facilitates a sandhi’s productivity. This is also supported by the
observation that the f0 curves for the two syllables in both real
and nonce words are quite different from those of the base tones
overlaid onto the graphs in Fig. 6.
For T2+X, however, there was a consistently large yet unexpected
difference in average f0 on the first syllable between real and
nonce words, and model comparisons showed that the intercept term
was significant for all T2+X combinations (χ²(1) > 30,
p < 0.001). An analysis of the experimental stimuli recorded by
our consultant indicated that she pronounced the nonce T2 syllables
with a lower-than-expected f0, almost in the T3 range, and we
believe that this was the cause of the unexpected
difference.
Despite the general similarity in f0 shape between real and nonce
words, model comparisons indicate that the f0 curves from the two
types of words are usually significantly different from each other:
in 42 out of 50 growth curve analyses, word type has a significant
effect on the intercept, linear, or quadratic term of the model;
and for all 25 base-tone combinations, the two word types have
different f0 curves on at least one of the syllables. There is some
evidence that the f0 curves in nonce words show more tonal
characteristics of the base tone than those in real words do. This
effect would be the most obvious when the expected sandhi tone is a
clear falling tone while the base tone is a clear rising tone, or
vice versa, a scenario found on the second syllable of T1+T2,
T1+T3, and T1+T5 combinations. The corresponding graphs in Fig. 6
show that the nonce words indeed have a greater rising tendency on
the second syllable than the real words, and this is supported by
model comparisons that showed that the linear terms significantly
improved the models (T1+T2: χ²(1) = 5.8341, p = 0.0157; T1+T3:
χ²(1) = 30.0598, p < 0.001; T1+T5: χ²(1) = 28.5280, p < 0.001), and
parameter estimates for the linear terms, with real words as the
baseline, all showed positive values (T1+T2: 0.5741; T1+T3: 1.3487;
T1+T5: 1.3237).
Fig. 7. Tone pattern counts for “Spreading,” “No Sandhi,” and
“Other” for real and nonce items expected to undergo left-dominant
sandhi as determined by a phonetically trained Shanghai speaker,
organized by base tone combinations.
Another scenario where this effect could be observed is when the
base tone and sandhi tone are expected to differ in f0 height. This
is found when (a) the first syllable has a rising (T3) or high tone
(T4) and the second syllable has a low-register tone (T3, T5),
which would cause the sandhi tone to be higher than the base tone
on the second syllable, or (b) the first syllable has a high
falling tone (T1) and the second syllable has a high-register tone
(T1, T4), which would cause the base tone to be higher than the
sandhi tone on the second syllable. The f0 comparison results,
however, are inconsistent. For all tonal combinations in (a), model
comparisons showed that the intercept term significantly improved
the model (T3+T3: χ²(1) = 6.8648, p = 0.0088; T3+T5: χ²(1) = 10.1586,
p = 0.0014; T4+T3: χ²(1) = 10.2454, p = 0.0014; T4+T5: χ²(1) = 43.2113,
p < 0.001), but the parameter estimates for the intercept, with
real words as the baseline, showed negative values only for T4+T3
(−0.6875) and T4+T5 (−0.8529), and positive values for T3+T3
(0.2037) and T3+T5 (0.3661). For the tonal combinations in (b), the
intercept term significantly improved the model for T1+T1
(χ²(1) = 7.7434, p = 0.0054), and the parameter estimate was positive
(0.6185), but the intercept term was not significant for T1+T4
(χ²(1) = 0.0176, p = 0.8944). These effects can also be seen in the
corresponding graphs in Fig. 6.
Another question on productivity we set out to address is whether
contour displacement is as productive as contour extension. Contour
displacement occurs on T5+X combinations, whereby the rising tone
from base T5 is displaced to the second syllable. Given that in the
tonal inventory, T1 is the only falling tone, if contour
displacement did not apply productively to nonce words, but did
apply to real words, we would expect the most marked difference to
appear on the second syllable of T5+T1. The corresponding graph in
Fig. 6 shows that in real words, the rising tone was indeed
displaced onto the second syllable, but in nonce words, the sandhi
tone on the second syllable was close to a level tone that was
higher than the first syllable. Model comparisons showed that for
the f0 on this syllable, adding word type and its interaction with
the linear and quadratic time terms stepwise to the base model all
significantly improved the previous model (intercept: χ²(1) = 5.6000,
p = 0.0180; linear: χ²(1) = 10.5987, p = 0.0011; quadratic: χ²(1) =
6.1609, p = 0.0131). The parameter estimate for the linear term is
negative and significant (Estimate = −1.5005, t = −4.2857, p < 0.001),
supporting the claim that the real words had more of a rising
contour on this syllable than the nonce words. There are two
potential interpretations for this difference. One is that, instead
of contour displacement, the more general contour extension has
applied to the nonce words. The other is that the level f0 is a
result of averaging rising tones from contour displacement and
falling tones from the lack of sandhi application. A closer look at
the sandhi behaviors from individual tokens showed that the latter
interpretation is more accurate. In other words, the nature of the
lower productivity for contour displacement is primarily
non-application. The sandhi classification result and the f0 result
from only the tokens classified as “Spreading” below provide
further support for this.
The model comparisons for the second syllable of other T5+X
combinations, however, showed little effect of word type. Except
for the linear term for T5+T2 (χ²(1) = 5.5578, p = 0.0184) and the
intercept term for T5+T3 (χ²(1) = 3.9629, p = 0.0465), adding word type
or its interactions with time terms did not improve the models.
This could mean that contour displacement applied productively to
nonce words. But an alternative interpretation is that for T5+T2,
T5+T3, and T5+T5, the second syllable had a rising tone as the base
tone, and therefore, the application and non-application of contour
displacement would both predict a rising tone on the second
syllable; for T5+T4, the short duration of the final checked
syllable might have prevented the rising contour from surfacing on
the syllable in both real and nonce words.
Table 4 Parameter estimates for the fixed effect of word type (with
real words as the baseline) on the “Spreading” pattern counts in
the real and nonce items expected to undergo left-dominant sandhi
in the Generalized Linear Mixed-Effects models for the 25 base-tone
combinations. “***”: p < 0.001; “**”: p < 0.01; “*”: p < 0.05.
For T2+T4, T3+T3, T3+T4, and T4+T4, 100% of the real items
exhibited the spreading pattern; to avoid complete separation in
the Generalized Linear Mixed-Effects analysis, an artificial
real-item data point that did not undergo spreading was added to
the dataset before the analysis was run.
Tones   Estimate   S.E.     z        p         sig.
T1+T1   1.3451     0.5306   2.5350   0.0112    *
T1+T2   9.3582     2.2384   4.1807   <0.001    ***
T1+T3   3.7461     1.1916   3.1438   0.0017    **
T1+T4   3.2494     1.0547   3.0809   0.0021    **
T1+T5   1.6094     0.6080   2.6470   0.0081    **
T2+T1   0.9369     0.4603   2.0353   0.0418    *
T2+T2   2.7515     1.0641   2.5857   0.0097    **
T2+T3   0.7388     0.5857   1.2613   0.2072
T2+T4   2.5773     1.2203   2.1119   0.0347    *
T2+T5   1.5445     0.7428   2.0793   0.0376    *
T3+T1   1.9123     0.8767   2.1812   0.0292    *
T3+T2   1.1421     1.1734   0.9733   0.3304
T3+T3   1.1632     1.1732   0.9914   0.3215
T3+T4   1.4733     1.1374   1.2954   0.1952
T3+T5   0.4274     0.9366   0.4564   0.6481
T4+T1   1.5486     0.7962   1.9450   0.0518
T4+T2   3.4720     0.7794   4.4547   <0.001    ***
T4+T3   2.5233     1.0024   2.5172   0.0118    *
T4+T4   4.2077     1.0519   4.0000   <0.001    ***
T4+T5   3.4720     0.7794   4.4547   <0.001    ***
T5+T1   3.5850     1.3199   2.7162   0.0066    **
T5+T2   16.0643    6.3021   2.5490   0.0108    *
T5+T3   7.1026     3.8792   1.8310   0.0671
T5+T4   15.7873    5.1089   3.0901   0.0020    **
T5+T5   2.3327     0.8968   2.6011   0.0093    **
The classification result of the f0 patterns for the nonce words
expected to undergo left-dominant sandhi is given in Fig. 7, and
the real words’ result is replicated from Fig. 4 here for
comparison purposes. A large proportion of the tone patterns in the
nonce words has been classified as undergoing the spreading sandhi,
indicating the general productivity of the sandhi pattern. But in
general, the real words had more “Spreading” patterns and fewer “No
Sandhi” patterns than the nonce words. Likelihood ratio comparisons
among Generalized Linear Mixed-Effects models on the “Spreading”
pattern showed that the inclusion of word type and tonal
combination (T1+T2, T1+T3, etc.) both significantly improved upon
the model that only included the random effects of participant and
item (word type: χ²(1) = 39.393, p < 0.001; tonal combination:
χ²(24) = 60.721, p < 0.001), and the model that includes the
interaction term between word type and tonal combination also
significantly improved upon the one without the interaction
(χ²(24) = 75.046, p < 0.001). We therefore looked at the effect of
word type on the “Spreading” pattern for each tonal combination,
and these results are summarized in Table 4. For the 25 base-tone
combinations, 18 showed a significant effect of word type. Compared
with the f0 curve results in which all 25 tonal combinations showed
a significant difference based on word type, these results indicate
that the lower sandhi productivity in nonce words suggested by some
of the f0 curve results (T1+T1, T1+T2, T1+T3, T1+T5, T4+T3, T4+T5)
was caused by a combination of categorical non-application of the
sandhi in nonce words and phonetically gradient sandhi
application.
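As a rough consistency check on Table 4, the reported z values appear to equal Estimate divided by S.E., and the reported p values match a two-sided test against a standard normal distribution. The sketch below recomputes both from the tabulated estimates and standard errors. The names TABLE4 and wald_test are illustrative and are not taken from the original analysis; small discrepancies from the printed values are expected because the inputs are rounded.

```python
import math

# (tones, estimate, standard error), copied from Table 4
TABLE4 = [
    ("T1+T1", 1.3451, 0.5306), ("T1+T2", 9.3582, 2.2384),
    ("T1+T3", 3.7461, 1.1916), ("T1+T4", 3.2494, 1.0547),
    ("T1+T5", 1.6094, 0.6080), ("T2+T1", 0.9369, 0.4603),
    ("T2+T2", 2.7515, 1.0641), ("T2+T3", 0.7388, 0.5857),
    ("T2+T4", 2.5773, 1.2203), ("T2+T5", 1.5445, 0.7428),
    ("T3+T1", 1.9123, 0.8767), ("T3+T2", 1.1421, 1.1734),
    ("T3+T3", 1.1632, 1.1732), ("T3+T4", 1.4733, 1.1374),
    ("T3+T5", 0.4274, 0.9366), ("T4+T1", 1.5486, 0.7962),
    ("T4+T2", 3.4720, 0.7794), ("T4+T3", 2.5233, 1.0024),
    ("T4+T4", 4.2077, 1.0519), ("T4+T5", 3.4720, 0.7794),
    ("T5+T1", 3.5850, 1.3199), ("T5+T2", 16.0643, 6.3021),
    ("T5+T3", 7.1026, 3.8792), ("T5+T4", 15.7873, 5.1089),
    ("T5+T5", 2.3327, 0.8968),
]


def wald_test(estimate, se):
    """Return (z, two-sided p) using a standard normal reference."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


results = {tones: wald_test(est, se) for tones, est, se in TABLE4}

# Combinations with a word type effect at the 0.05 level
significant = [tones for tones, (z, p) in results.items() if p < 0.05]

for tones, (z, p) in results.items():
    print(f"{tones}: z = {z:.4f}, p = {p:.4f}")
print(len(significant), "of", len(TABLE4), "combinations significant")
```

Under these assumptions the count of significant combinations agrees with the 18 reported in the text.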
To further investigate the nature of the productivity difference
between real and nonce words, we conducted the same growth curve
analyses on only the f0 patterns that have been classified as
“Spreading” by our trained Shanghai speaker. The full growth curve
models together with the observed f0 data and the base tone data
from our consultant are given in Fig. 8, and the model fit
comparisons are given in Appendix D. Most of the differences
between real and nonce words observed in the entire data set (Fig.
6) persist in Fig. 8. In 41 out of 50 growth curve analyses, word
type still has a significant effect on the intercept, linear, or
quadratic term of the model; and for all 25 base-tone combinations,
the two word types still have different f0 curves on at least one
of the syllables. This indicates that the lower productivity indeed
partially stems from a gradient application of the sandhi. A
particularly interesting comparison appears in the T5+T1 graphs in
Fig. 6 and Fig. 8: in Fig. 8, where the f0 track only includes
those tokens that have been classified as “Spreading,” the f0 on
the second syllable is indeed rising, and the inclusion of the
interaction between word type and the linear time term did not
significantly improve the model (χ²(1) = 0.5685, p = 0.4508). This
provides further support for our earlier claim that for T5+T1, the
lower productivity of the contour displacement sandhi is primarily
reflected in categorical non-application.
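To make the intercept, linear, and quadratic terms concrete, the following minimal sketch builds second-order orthogonal polynomial time terms over eleven normalized time points and projects a single f0 contour onto them. It is only an illustration of the time terms: the models in the paper are mixed-effects models with participant-level random effects, which this sketch does not attempt to reproduce. The function names and the example contour are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_time_terms(n_points=11):
    """Return unit-length linear and quadratic terms that are orthogonal
    to each other and to the constant (intercept) term."""
    t = [float(i) for i in range(n_points)]
    ones = [1.0] * n_points

    def remove(v, basis):
        # Subtract the projection of v onto basis (Gram-Schmidt step)
        c = dot(v, basis) / dot(basis, basis)
        return [x - c * b for x, b in zip(v, basis)]

    def unit(v):
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    linear = unit(remove(t, ones))
    quadratic = unit(remove(remove([x * x for x in t], ones), linear))
    return linear, quadratic


def contour_coefficients(f0, linear, quadratic):
    """Least-squares intercept, linear, and quadratic coefficients.
    Because the terms are orthonormal, each coefficient is a dot product."""
    intercept = sum(f0) / len(f0)
    return intercept, dot(f0, linear), dot(f0, quadratic)


linear, quadratic = orthogonal_time_terms(11)

# Hypothetical steadily rising contour, not data from the study
rising = [100.0 + 2.0 * i for i in range(11)]
intercept, lin_coef, quad_coef = contour_coefficients(rising, linear, quadratic)
print(intercept, lin_coef, quad_coef)
```

In this parameterization a rising contour yields a positive linear coefficient, and a purely linear contour yields a quadratic coefficient of essentially zero, so an effect of word type on each term can be read as a difference in overall height, slope, or curvature.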
Fig. 8. Observed data (symbols, vertical lines indicate ±SE) and
second-order orthogonal polynomial growth curve model fits (lines)
for f0 on disyllabic words that have undergone the left-dominant
spreading sandhi according to a phonetically-trained Shanghai
speaker. Each graph represents a base-tone combination. Filled
circle and thick solid line represent real words; filled triangle
and dotted line represent nonce words; f0 curves for the base tones
from our language consultant are overlaid onto each graph as thin
solid lines. Each observed data point represents the average f0 at
a particular normalized time point across participants.
A concern with the f0 comparison between real and nonce words above
is that for each tonal combination, each participant only produced
one real word and one nonce word. The results, therefore, are
confounded with the segmental perturbation effects on f0 and may
not generalize to different items. We investigated the potential
effect of vowel height on tones for the tokens classified as
“Spreading” as follows. We again coded each syllable according to
the lowest vocalic element during the rime as “High,” “Mid,” and
“Low”, and for each tone in each syllable position, we investigated
the effect of vowel height on f0 by comparing a base model with
only time terms and participant and participant by word type random
effects on the time terms with a model that includes vowel
height using log-likelihood tests. The results consistently showed that
the effect of vowel height was significant, and parameter estimates
showed that higher vowels generally had a higher f0 than lower
vowels (High>Mid>Low), a finding consistent with earlier
literature (e.g., Whalen & Levitt, 1995; Maddieson, 1997). To
compensate for the vowel height effect, we calculated the average
f0 for the high, mid, and low vowels for each tone in each syllable
position at the eleven time points and subtracted these values from
the original f0 data according to the vowel height of the item. We
then reran the growth curve analyses on these values. The results
to a large extent replicated the earlier analyses. Thirty-four of
the 41 original f0 comparisons that showed a difference between
real and nonce words maintained a difference between the two word
types, and the vast majority of the effects that were shown to be
significant by model comparison (intercept, linear, quadratic) in
the original analysis maintained their significance in the new
analysis (47 out of 61). This indicates that the word type effects
are largely independent from the vowel height effect. The model fit
comparisons for the 50 growth curve analyses based on f0 data
corrected for vowel height are given in Appendix E.
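The vowel height correction described above can be sketched as a group-mean subtraction. The code below is one possible reading of the procedure: tokens are grouped by tone, syllable position, and vowel height, the mean f0 at each time point is computed per group, and that mean is subtracted from every token in the group. The record layout, the function name, and the toy values are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def correct_for_vowel_height(tokens):
    """Subtract the per-group mean f0 at each time point.

    Each token is a dict with keys "tone", "position", "height", and
    "f0" (a list of f0 values, one per normalized time point).
    Groups are defined by (tone, position, height).
    """
    sums = {}
    counts = defaultdict(int)
    for tok in tokens:
        key = (tok["tone"], tok["position"], tok["height"])
        if key not in sums:
            sums[key] = [0.0] * len(tok["f0"])
        for i, value in enumerate(tok["f0"]):
            sums[key][i] += value
        counts[key] += 1

    means = {key: [s / counts[key] for s in total] for key, total in sums.items()}

    corrected = []
    for tok in tokens:
        key = (tok["tone"], tok["position"], tok["height"])
        residual = [v - m for v, m in zip(tok["f0"], means[key])]
        corrected.append({**tok, "f0": residual})
    return corrected


# Toy tokens with eleven time points each (hypothetical values)
tokens = [
    {"tone": "T1", "position": 1, "height": "High", "f0": [200.0] * 11},
    {"tone": "T1", "position": 1, "height": "High", "f0": [210.0] * 11},
    {"tone": "T1", "position": 1, "height": "Low", "f0": [180.0] * 11},
]
corrected = correct_for_vowel_height(tokens)
print(corrected[0]["f0"][0], corrected[1]["f0"][0], corrected[2]["f0"][0])
```

The corrected values are deviations from the height-specific mean contour, so any remaining difference between real and nonce words cannot be attributed to the average effect of vowel height.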
Another potential confound for the f0 comparison between real and
nonce words is that the f0 difference may have arisen from a
duration difference if the speakers produced the unfamiliar nonce
words more slowly. The rime duration results for the two syllables
in all of the real and nonce words are given in Fig. 9. We again
included vowel height as a nuisance predictor, and model
comparisons showed that adding word type (real vs. nonce) or
syllable (σ1 vs. σ2) did not significantly improve the model (word
type: χ²(1) = 0.1695, p = 0.6805; syllable: χ²(1) = 1.7184, p = 0.1899),
nor did adding the interaction between the two improve the model
without the interaction (χ²(1) = 0.4472, p = 0.5037). These results
indicate that the duration pattern in the nonce words did not differ
significantly from that in the real words, and that the f0
difference based on word type was unlikely to be caused by a
duration difference.
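The p values for these model comparisons can be recovered from the reported statistics, assuming each likelihood ratio statistic is referred to a chi-square distribution with one degree of freedom. For one degree of freedom the upper-tail probability has the closed form erfc(sqrt(x/2)), so no statistics library is needed. The function name below is illustrative.

```python
import math


def lrt_pvalue_df1(chi_sq):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


# (reported chi-square, reported p) for the duration model comparisons
reported = {
    "word type": (0.1695, 0.6805),
    "syllable": (1.7184, 0.1899),
    "interaction": (0.4472, 0.5037),
}

for label, (chi_sq, p_reported) in reported.items():
    p = lrt_pvalue_df1(chi_sq)
    print(f"{label}: chi-square = {chi_sq}, p = {p:.4f} (reported {p_reported})")
```

The recomputed values agree with the reported ones to about three decimal places, which is as close as the rounded inputs allow.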
4.3. Structure dependency
To test the hypothesis that the structure dependency of tone sandhi
is productive, we compared the tonal realizations between
disyllabic nonce items that have different morphosyntactic
structures. We expected the modifier–noun (M–N) combinations to
form words and undergo left-dominant sandhi and the verb–noun (V–N)
combinations to form phrases and undergo right-dominant sandhi,
which we have interpreted as phonetic contour reduction, not
phonological sandhi. The data came from the second part of the
experiment. The segmental contents of the M–N and V–N combinations
were identical, as the nonce syllables in σ1 position were cued as
modifiers for half of the participants and as verbs for the other
half.
Fig. 9. Rime durations for the two syllables in real and nonce
words expected to undergo left-dominant sandhi. “S1” = first
syllable; “s2” = second syllable. The black dot represents the
median, the box represents the interquartile range (1st to 3rd
quartile), and the whiskers represent maximally 1.5 times the
interquartile range.
Fig. 10. Observed data (symbols, vertical lines indicate ±SE) and
second-order orthogonal polynomial growth curve model fits (lines)
for f0 on disyllabic nonce items with different syntactic
structures (M–N = modifier–noun; V–N = verb–noun). Each graph
represents a base-tone combination. Filled circle and thick solid
line represent M–N words; filled triangle and dotted line represent
V–N phrases; f0 curves for the base tones from our language
consultant are overlaid onto each graph as thin solid lines. Each
observed data point represents the average f0 at a particular
normalized time point across participants.
The observed f0 data for the M–N and V–N nonce words for each of
the base-tone combinations, along with the second-order orthogonal
polynomial growth curve models for the f0 and the f0 curves for the
base tones from our language consultant, are given in Fig. 10. We
again graphed the full models for all f0 comparisons here for
consistency’s sake.
Fig. 11. Tone pattern counts for “Spreading,” “No Sandhi,” and
“Other” for nonce items as determined by a phonetically trained
Shanghai speaker, organized by base tone combinations. “M–N”
(modifier–noun) and “V–N” (verb–noun) represent forms that are
expected to undergo left-dominant and right-dominant sandhi,
respectively.
Fig. 12. Rime durations for the two syllables for the nonce items
in the two sandhi directions. “S1” = first syllable; “s2” = second
syllable. The black dot represents the median, the box represents
the interquartile range (1st to 3rd quartile), and the whiskers
represent maximally 1.5 times the interquartile range.
From Fig. 10, we can see that the f0 curves for the V–N nonce items
are consistently more similar to the base tones than those for the
M–N nonce items. We have discussed in Section 4.2 that despite some
differences from real words, the left-dominant spreading sandhi
applied relatively productively in M–N nonce words. For the V–N
nonce phrases, however, we generally observed nothing more than the
gradient reduction of f0 contours on the first syllable. This is
especially clear when the first syllable had a base rising tone
(T2+X, T3+X, T5+X). Model comparisons of the f0 curves indicate
that in 44 out of the 50 analyses, syntactic structure had a
significant effect on the intercept, linear, or quadratic term of
the model. When the first syllable has a base rising tone, model
comparisons for the first syllable consistently showed that the
effect of syntactic structure on the linear term was significant,
and parameter estimates, with M–N as the baseline, consistently
showed positive values for the linear term, indicating that the
first syllables in V–N had greater rising slopes than the first
syllables in M–N. Moreover, due to the matched segmental contents
between the M–N and V–N combinations, the f0 comparisons here are
not confounded with segmental effects, making the result more
easily interpretable. The model fit comparisons for all 50 growth
curve analyses as well as the R code are given in Appendix F.
The classification result of the f0 patterns for the nonce items
with M–N and V–N structures is given in Fig. 11, with the M–N
result replicated from Fig. 7. The overwhelming majority of the V–N
items have been classified as undergoing no sandhi by our native
speaker, and a Generalized Linear Mixed-Effects model on the
“Spreading” pattern, with structure as a fixed effect and
participant and item as random effects, showed that the V–N
structure had a significantly lower “Spreading” count than the M–N
structure (Estimate = 5.0304, S.E. = 0.2230, z = 22.559, p < 0.001, M–N
as baseline). This result is consistent with the f0 curve result
and
suggests that the structure-sensitivity of tone sandhi is
productive in Shanghai, and speakers are able to let the syntactic
structure of a disyllabic sequence dictate the tonal outcome of the
two syllables.
The rime duration results for the two syllables in the nonce items
with the two structures are given in Fig. 12. We again included
vowel height as a nuisance predictor, and likelihood-ratio tests
showed that the model that included both the structure (left- vs.
right-dominant) and syllable (σ1 vs. σ2) terms as well as their
interaction provided the best fit with the data. From this model,
we found that for the M–N (left-dominant) structure, the second
syllable had a significantly longer duration than the first
syllable (Estimate = 3.792, S.E. = 1.919, t = 1.976, p = 0.048), and
for the V–N (right-dominant) structure, the second syllable had a
significantly longer duration as well (Estimate = 20.165, S.E. = 1.919,
t = 10.507, p < 0.001), but the duration difference between the two
syllables was significantly greater for the V–N structure
(Estimate = 16.374, S.E. = 2.594, t = 6.311, p < 0.001). This durational
pattern is similar to that of the real items in Fig. 5 and
supports the view that the participants correctly interpreted
the grammatical structures for the nonce items.
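The three duration estimates are linked by simple arithmetic: the interaction estimate should equal the V–N syllable effect minus the M–N syllable effect, and each t value should equal its estimate divided by its standard error. The short check below uses only the figures reported in the text; the variable names are illustrative, and the tolerances allow for rounding in the published values.

```python
# (estimate, standard error, reported t), as given in the text
mn_effect = (3.792, 1.919, 1.976)    # sigma2 minus sigma1, M-N structure
vn_effect = (20.165, 1.919, 10.507)  # sigma2 minus sigma1, V-N structure
interaction = (16.374, 2.594, 6.311)

# Interaction as the difference between the two simple effects
difference = vn_effect[0] - mn_effect[0]
print(f"V-N effect minus M-N effect = {difference:.3f} "
      f"(reported interaction {interaction[0]})")

# t value as estimate divided by standard error
for label, (est, se, t_reported) in [
    ("M-N", mn_effect), ("V-N", vn_effect), ("interaction", interaction)
]:
    print(f"{label}: t = {est / se:.3f} (reported {t_reported})")
```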
5. Discussion
5.1. Productivity and structure dependency of Shanghai tone sandhi
Our descriptive results on the f0 patterns of left-dominant and
right-dominant tone sandhi in existing disyllabic words provided
some evidence that the left-dominant sandhi indeed involved
spreading the f0 contour over the two syllables, while the
right-dominant sandhi was better interpreted as phonetic contour
reduction on the first syllable. The auditory priming of the base
tones during the experiment prevented us from making conclusive
claims about the nature of these sandhis, but the confound was
necessitated by the more important goal of the study, which was to
investigate the productivity of the left-dominant sandhi. Our claim
that the left-dominant sandhi was relatively productive came from
two sets of data, one on the f0 comparison between real and nonce
words, one on the comparison between nonce words (M–N structure)
and nonce phrases (V–N structure), the former of which would not
have been possible had the real word data not been base-tone
primed, as the nonce words were necessarily primed by their base
tones in the setting up of the context.
The real vs. nonce comparison provided a direct test of
productivity by showing whether the sandhi applied differently in
nonce words than in real words. Despite statistical differences in
the f0 curves, we have seen that the shapes of the curves over the
disyllabic nonce words generally represent the base tone contours
of the first syllable, indicating the productivity of the spreading
sandhi. Some of the differences between real and nonce words could
be interpreted as the nonce words preserving more tonal
characteristics from the base tone than the real words, but not all
of them could. The differences, we argue, came from two sources.
One was a greater number of categorical non-applications of the
sandhi, as shown in the classification result. The other lay in the
gradient application of sandhi, a type of gradience that is akin to
incomplete neutralization in production (e.g., Peng, 2000; Yu,
2007) and the lack of full productivity in T3 sandhi in Standard
Chinese (Zhang & Lai, 2010) and Tianjin (Zhang & Liu, in
press).
The f0 comparison between M–N nonce words and V–N nonce phrases
indicates that the structure sensitivity of the sandhi system in
Shanghai is productive, as their differences, both in f0 curves and
tone pattern classification, are consistently interpretable by the
stronger preservation of the base tones in the V–N structure. The
comparison also indirectly supports the productivity of left-
dominant sandhi: if the tones in the V–N structure only involved
phonetic contour reduction of the base tones, then qualitatively
different f0 curves on the M–N nonce words would indicate that
phonological sandhi processes have applied to these words.
We have found some evidence that the contour displacement sandhi in
T5+X has a different productivity pattern from contour extension,
as the lower productivity of contour displacement seems to have
primarily stemmed from categorical non-application of the sandhi,
especially in T5+T1 where the context allows the effect to be the
most clearly observed. This suggests a more substantial degree of
underlearning of the sandhi. We conjectured earlier that this is
due to the sandhi’s more distant affinity with progressive
coarticulation than contour extension. Two additional phonetic
properties of the sandhi may have also contributed to its lower
productivity. One is that according to Zhu (1999), in T5+X
combinations, the phonetic prominence falls on the final syllable
due to the pronounced rising f0. This creates a mismatch between
phonetic prominence and phonological prominence, which is on the
initial syllable – the syntactic non-head that determines the
sandhi tones (see Selkirk & Shen, 1990; Duanmu, 1995). The
other is that pronounced rising tones are typologically disfavored
(Zhang, 2002). Other factors identified in earlier literature that
undermine tone sandhi productivity include phonological opacity
(Hsieh, 1970, 1975, 1976; Wang, 1993; Zhang & Lai, 2008; Zhang
et al., 2011), lexical variation (Zhang & Liu, in press), and
low lexical frequency (Zhang & Lai, 2008, 2010; Zhang et al.,
2011; Zhang & Liu, in press). Opacity and lexical variation are
not relevant here, as the contour displacement pattern itself is
transparent, and for T5+X combinations, contour displacement is the
only sandhi form that has been reported, while the other four
contour extension patterns have more reported variation (Xu et al.,
1981; Xu & Tang, 1988; Zhu, 1999, 2006). Due to the lack of
frequency data, we cannot rule out the possibility that the lower
productivity is related to low lexical frequency, but the reported
effects of frequency are typically smaller than what we found for
T5+T1 combinations (e.g., Zhang & Lai, 2010; Zhang & Liu,
in press).
An anonymous reviewer raised the concern that the real and nonce
items were elicited under different contexts. In particular, the
nonce items that varied in the second syllable were elicited in one
block, which may have resulted in a contrastive reading on the second
syllable and consequently more base tone readings (see, for
example, Chen & Gussenhoven, 2008). But given that our
hypothesis was that the spreading sandhi should be productive in
the nonce items, putting these items in a context less conducive
to
tone sandhi stacked the deck against the hypothesis. Therefore, the
fact that we found that the sandhi was generally productive in the
nonce items provides even stronger support for the
hypothesis.
Finally, a shortcoming in the design of our experiment is that it
did not allow our results to generalize to different items, as each
participant only produced one stimulus for each tonal combination.
Item, therefore, could not be included as a random factor in our
data analysis. Although our f0 comparisons recalibrated against the
vowel height effect showed similar patterns, there are many other
potential item effects, and we want to emphasize the importance of
item generality for future studies.
5.2. Situating Shanghai tone sandhi in tone sandhi typology
Our data on Shanghai tone sandhi complement our knowledge of tone
sandhi productivity in the following respects. This study is the
first of its kind to investigate the productivity of rightward
spreading sandhi – a typologically common tone sandhi pattern. Its close
affinity to progressive tonal coarticulation prompted the
hypothesis that it should be relatively productive, and our results
generally support this hypothesis. This result supports the earlier
finding that the phonetic naturalness of the tone sandhi
facilitates its productivity. A comparison between the Shanghai and
Taiwanese results also shows that opacity is a strong cause for
categorical unproductivity, as the transparent sandhis in Shanghai
showed more gradient production in nonce words, while the opaque
sandhis in Taiwanese showed only categorical application,
non-application, or misapplication. An additional difference
between the Shanghai and Taiwanese patterns not directly reflected
in the results is the difficulty with which the tone patterns could
be classified. As stated earlier, our Shanghai speaker tasked with
classification found the task difficult in around 20% of nonce
items; Zhang et al. (2011), on the other hand, did not report
similar difficulties and stated that three phonetically trained
transcribers agreed on the sandhi transcriptions for virtually all
tokens. It is likely that the presence of gradient sandhi
application caused the classification difficulty in Shanghai, but
the lack of it made the task easier in Taiwanese. Finally, our
results, both in f0 and duration, showed that Shanghai speakers
made structure-dependent generalizations regarding tone sandhi; the
phonological analysis of prosodic domains and prosodic heads in
Shanghai and other Chinese dialects (see Duanmu, 1995, 2007, for
example), therefore, does have psychological reality despite the
fact that phonetic motivations for the analysis are sometimes hard
to come by.
In general, our results echo Zhang’s (2010, 2014) point regarding
Chinese tone sandhi that rushing into an analysis of a sandhi
pattern before testing it experimentally is premature, as the
speakers’ knowledge of the tone sandhi pattern may not be identical
to the pattern in the lexicon, and impressionistic transcriptions,
no matter how careful, have their limitations. If we situate our
findings here in the recent works in experimental phonology that
showed that differences between the speakers’ knowledge and the
lexical patterns are informative of the nature of phonological
grammar (e.g., Wilson, 2006; Zuraw, 2007; Moreton, 2008; Hayes et
al., 2009; Becker et al., 2011), we can more clearly see that the
study of Chinese tones has much to gain from experimental
investigations of productivity, processing, and learning.
6. Conclusion