Fang Shi [email protected]
Tonogenesis from a Native Chinese Speaker’s Perspective
Abstract
This paper examines the mainstream tonogenesis model and proposes alternative
hypotheses on a number of possible sources of linguistic tone. The analysis and hypotheses
draw on general linguistic knowledge as well as personal experience and intuition as a native speaker of a tonal language, and they are intended both to provide competing explanations and to refine existing theories and their applications.
0 - Terminology and Notation
Tone, for the purpose of this paper, is defined as pitch variation over the domain of a morpheme or a word that systematically differentiates lexical or grammatical meanings, as exemplified by tonal minimal pairs. This definition is intended to exclude a major source of confusion: intonation, which usually spans an entire utterance (often multiple words) and conveys meanings such as speaker attitude and emotion. However, as I will explain in one of my hypotheses (4.2), the boundary between the two can blur in practice.
Examples will be drawn mostly from Chinese and English, the two languages I am most familiar with, so a basic knowledge of Mandarin Chinese is assumed. For convenience, the pronunciation of Chinese characters is transcribed in pinyin, with tones marked by numerals (1-4 for the four tones of Modern Standard Chinese, and 0 for the neutral/light tone, e.g. 拼音 pin1yin1).
1 - Introduction to Mainstream Tonogenesis
Initiated by A. G. Haudricourt’s work on Vietnamese tones, later rationalized by J. M. Hombert’s phonetic experiments and physiological explanations, and further supported by similar observations in some other tonal languages, the now-dominant tonogenesis paradigm (i.e. the theory that tries to explain the source of linguistic tone) can be summarized as follows:
Due to intrinsic articulatory constraints, consonants may affect the pitch of vowels that follow or precede them. These consonants are said to affect the mode of voicing of the neighboring vowels and thus to raise or lower their pitch. After the consonants merge or disappear through phonological change, their effects on vowel pitch remain as the contrastive feature and are perceived as tones. In particular, for prevocalic plosives, voicing lowers the pitch of the immediately following vowel. By many accounts, this is the most widely attested source of tonogenesis, and it accounts for a high vs. low tonal distinction in many languages. In addition, prevocalic and postvocalic influences may combine to produce more complex tonal systems.
2 - A Brief Examination of Hombert
Regarding the origin of tone in language, the two most cited sources are Haudricourt’s “De l’origine des tons en vietnamien” (1954) and Hombert’s “Consonant Types, Vowel Quality, and Tone” (1978). Since I do not read French well enough to study Haudricourt’s original work, I will instead point out a few issues in Hombert’s paper that concern the experimental basis of the general model.
Both before and after reading Hombert’s paper, I ran experiments on a native speaker of American English and on myself (a proficient, linguistically trained English speaker), testing the effect of prevocalic voicing on vowel pitch. For no obvious reason, the results from my own speech showed no significant correlation, only minor free variation in pitch, but a sample from the native English speaker supports Haudricourt’s and Hombert’s idea, showing at the vowel onset a depressing effect of less than 10 Hz within a fundamental frequency range of 130-150 Hz. The effect is well below 10% of the speaker’s normal pitch level (I use a percentage rather than an absolute value because the higher the pitch, the more it seems liable to be perturbed), consistent with the three test results by House and Fairbanks (1953), Lehiste and Peterson (1961), and Mohr (1968) as quoted by Hombert (79). Hombert’s own experiments, by contrast, all showed quite dramatic effects on vowel pitch, in most cases more than 10% and sometimes over 20% of the speaker’s average pitch. Such a large discrepancy between nearly identical experiments struck me as very odd, and some subtle details in Hombert’s graphs of his results raise further doubts as to whether his experiments were biased.
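To make the percentage comparison concrete, the short Python sketch below converts an absolute F0 shift into a percentage of the baseline pitch, and back. The inputs are only the figures given above (an effect of just under 10 Hz on a 130-150 Hz range, and Hombert’s upper figure of 20%); the function names are illustrative, and applying Hombert’s percentage to this speaker’s baseline is a hypothetical comparison, not a measurement.

```python
def relative_perturbation(delta_hz: float, baseline_hz: float) -> float:
    """Express an F0 shift as a percentage of the baseline F0."""
    if baseline_hz <= 0:
        raise ValueError("baseline F0 must be positive")
    return 100.0 * delta_hz / baseline_hz


def shift_for_percentage(percent: float, baseline_hz: float) -> float:
    """Absolute F0 shift (Hz) that a given percentage implies at a baseline."""
    return baseline_hz * percent / 100.0


# Upper bound of the effect observed for the native English speaker:
# just under 10 Hz within a 130-150 Hz F0 range.
for baseline in (130.0, 150.0):
    pct = relative_perturbation(10.0, baseline)
    print(f"10 Hz at {baseline:.0f} Hz = {pct:.1f}% of F0")

# Hypothetical: what a 20% effect would amount to on the same baselines.
for baseline in (130.0, 150.0):
    print(f"20% at {baseline:.0f} Hz = {shift_for_percentage(20.0, baseline):.0f} Hz")
```

On these figures, even the upper bound of the effect I measured stays under 8% of F0, whereas a 20% effect on the same voice would mean a shift of roughly 26-30 Hz.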
Suppose a speaker pronounces two syllables with the same pitch, and the different consonantal onsets (or codas) of these syllables have opposite effects on the pitch of the identical vowel nucleus. On a graph plotting vowel pitch against time from vowel onset (or time until closure, in the case of different codas), we would then expect two curves that start (or, with codas, end) at different pitches because of the opposite effects of the two consonants; as the curves extend away from the onset or closure, they should draw closer to each other and ideally merge into the speaker’s normal pitch level for that vowel (assuming the vowels last long enough to outlast the consonants’ influence). In Hombert’s experiments, this expectation is not merely met but exceeded: the two curves actually cross over (particularly visible for S3 in Figure 2, Experiment I, and for S1 in Figure 7, Experiment V). This unexpected crossover struck me as either a sign that the graphs were carelessly contrived or an indication of bias in Hombert’s experiments.
One possible source of bias, which could produce both the exaggerated pitch difference and the crossover, is the prompts Hombert gave his test subjects. Although Hombert did arrange the test words in random order (ruling out bias from a conventional intonation pattern), he does not state clearly whether the prompts were given in writing or in speech. If they were spoken, especially by the author or by someone who believed in the consonantal effects, the prompter could, intentionally or subconsciously, have pronounced each syllable with a pitch height reflecting that belief and thereby cued the subject to repeat it with the same pitch. A particularly likely cue for the crossover mentioned above is a low rising pitch contour on one (set of) syllable(s) and a high falling contour on the other. Ironically, this psychological process is well described by Hombert himself, in the words he used in a closely related paper to explain how minor pitch perturbations caused by consonants can induce tones:
“Since the listener does not have independent access to the mind of the speaker, and
thus may be unable to determine what parts of the received signal were intended and
what were not, he may intentionally reproduce and probably exaggerate these
distortions when he repeats the same utterances.” (Hombert et al. 1979:37)
Also worth mentioning is Hombert’s Experiment VI (94-95), designed to determine whether small changes in F0 (representing consonantal effects on vowel pitch) are large enough for listeners to perceive. The design of this experiment appears legitimate, but the reported outcome is lacking, and his interpretation and conclusion are unjustified. No specific data from the experiment are given at all; where at least some kind of correlation would be expected, the only outcome Hombert bothers to mention is a vague statement that the set of rising contours and the set of falling contours were perceived as statistically significantly different. The way Hombert interprets an important variable, ∆t (the duration of the pitch change), makes his conclusion even more questionable. Logically, if the magnitude of the fundamental frequency change (∆F in Hombert’s experiment) is held constant, the shorter the ∆t, the sharper the pitch change and the more likely listeners are to perceive it. Yet Hombert states that the perceived difference was significant for a certain ∆t value and all values greater than it. This yields the counter-intuitive implication that, in Hombert’s experiment, the sharper the pitch change, the worse it was perceived. Unless these doubts can be cleared, Hombert’s experiments cannot adequately show that consonantal influences exist and can be perceived, even under ideal laboratory conditions.
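The logic of the ∆t argument can be made explicit in a few lines of Python: for a fixed ∆F, the average slope ∆F/∆t grows as ∆t shrinks. The ∆F of 10 Hz and the ∆t values below are hypothetical round numbers chosen for illustration, not Hombert’s actual stimulus parameters.

```python
def mean_slope(delta_f_hz: float, delta_t_ms: float) -> float:
    """Average rate of F0 change (Hz per millisecond) over a transition."""
    if delta_t_ms <= 0:
        raise ValueError("duration must be positive")
    return delta_f_hz / delta_t_ms


DELTA_F_HZ = 10.0  # hypothetical F0 change, held constant
for delta_t_ms in (20.0, 40.0, 80.0):  # hypothetical transition durations
    slope = mean_slope(DELTA_F_HZ, delta_t_ms)
    print(f"dt = {delta_t_ms:.0f} ms -> {slope:.3f} Hz/ms")
```

If perceptibility grew with slope, significant results should cluster at the short end of the ∆t range rather than at and above a threshold value, which is the tension pointed out above.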
Some practical limitations of Hombert’s experiments may also undermine their credibility. The hardware pitch extractor he used to measure fundamental frequency is long outdated, and Hombert himself noted that most F0 extractors of the time performed poorly, making accurate measurement difficult (88). Another obvious limitation is the small number of test subjects: Hombert mostly analyzed the speech of 3-4 speakers per experiment, and in some cases only a single speaker per sex per language. This may reflect the painstaking effort that pitch measurement required at the time, but such a small sample cannot support anything beyond preliminary speculation.
Given the problems mentioned above, I would encourage anyone interested in this subject, especially anyone who intends to cite Hombert or any work that does so, to repeat the experiments and check the results, which can now be obtained quite conveniently through spectrogram analysis of digital recordings using software such as Praat.
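For readers who want to repeat such measurements, the core idea behind autocorrelation pitch tracking, the family of methods on which Praat’s standard pitch analysis is based, can be sketched in a few lines of Python. This is a deliberately crude, hypothetical implementation checked only on a synthetic sine wave; it is not Praat’s algorithm (which adds windowing, voicing decisions and candidate tracking), and real recordings should be analyzed with Praat or a comparable tool.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Rough F0 estimate (Hz) for one voiced frame via autocorrelation.

    Picks the lag, within the plausible pitch-period range, at which the
    frame best matches a shifted copy of itself. No windowing, voicing
    decision or octave-error correction: a teaching sketch only.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    if max_lag >= len(samples):
        raise ValueError("frame too short for the requested pitch floor")
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: fewer terms at longer lags slightly
        # favours shorter lags, which helps avoid picking a period multiple.
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


# Synthetic test frame: a 140 Hz tone (inside the 130-150 Hz range above),
# 50 ms long at an 8 kHz sampling rate.
SR = 8000
frame = [math.sin(2 * math.pi * 140.0 * n / SR) for n in range(int(0.05 * SR))]
print(f"estimated F0: {estimate_f0(frame, SR):.1f} Hz")
```

Because the lag is an integer number of samples, the estimate for the synthetic 140 Hz tone lands within about a hertz of the true value; at this resolution, a sub-10 Hz voicing effect would need a higher sampling rate or interpolation between lags to be measured reliably.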
3 - General Problems of the Model
Above I have used Hombert (1978), a popular source of phonetic support for the consonantal influence tonogenesis model (abbreviated CITGM for convenience), to identify some specific weaknesses in the model’s experimental foundation. Abramson (2004) also offers a concise summary of the development of CITGM and a number of representative publications, pointing out certain disputes within CITGM and discrepancies among related phonetic experiments. Though already widely accepted, CITGM lacks not only definitive experimental support but also explanatory power in some more general respects, on which I focus next.
3.1 - The Gap between Pitch Perturbations and Tonal Actualizations
Without referring to specific data, anyone who speaks or is at least familiar with a tonal language can tell how insignificant the supposed vowel pitch difference caused by consonants is (trivial indeed to my ears, which, being native to a tonal language, are already extra sensitive to pitch differences) compared to the clear pitch distinctions of real language tones. Moreover, there is as yet no firm answer to whether these minor consonantal effects are strong enough to be perceived in real language environments, as opposed to ideal laboratory conditions, where humans can technically discriminate pitch differences as small as +/- 1 Hz in the 80-160 Hz range (Laver: 451).
Even if we grant for a moment that the proposed consonantal pitch perturbations may be perceptually significant enough to induce tones, a paradox immediately arises. If voicing and mode of phonation can cause tones, why are other sources of pitch perturbation, most notably inherent vowel pitch, overlooked entirely by CITGM?
It is well established that there is a “systematic correlation between average pitch of vowels and vowel height … the higher the vowel, the higher the pitch”, and that the pitch difference can be “as much as 25 Hz” (Laver: 454). The measured intrinsic F0 of vowels can differ significantly while perceptually the vowels are spoken at a consistent pitch level, which indicates that natural vowel pitch differences due to physiological constraints are likely normalized in speakers’ perception and thus not perceived as differentiating cues. It is puzzling enough that CITGM bases its premise on the exact opposite assumption; it even goes as far as asserting that pitch differences caused by consonants can be perceived and induce tones while those inherent to vowels cannot. Chen (2000) expresses the same doubt about Hombert’s position:
“[S]urprisingly, despite the well-known intrinsic pitch variations associated with vowel height, tone split along the high/low vowel distinction is so rare that Hombert et al. (1979:52) state flatly: ‘It would seem that the interaction between tones and vowel height works in only one direction: tone can affect vowel height, but not vice-versa’” (11).
If the perturbing effect of adjacent consonants on vowel pitch is indeed significant enough to induce tones through the merger or loss of consonants, then the intrinsic pitch of vowels could equally have done so through the merger of vowels. For example, a merger of the adjacent vowels [o] and [u] is commonly observed across languages. Using data from Laver (454), if an [u] with an intrinsic pitch of 182 Hz merges with an [o] of 170 Hz, the most significant linguistic cue left to distinguish former minimal pairs with an [o/u] contrast would be the 12 Hz pitch difference. This difference is as large as the consonantal perturbations attested by most experiments in favor of CITGM, not to mention that vowels with even greater intrinsic pitch differences could merge. CITGM therefore owes a good explanation of why the same phonological process never happens to a highly comparable, if not more salient, set of linguistic cues.
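The arithmetic behind this comparison can be laid out explicitly. The Python sketch below uses only figures already cited (Laver’s 182 Hz and 170 Hz, and the under-10 Hz effect on a 130-150 Hz range from section 2) and, following the percentage convention adopted in section 2, also expresses both differences relative to F0; the variable names are illustrative.

```python
# Intrinsic vowel pitch values cited from Laver (454).
INTRINSIC_F0_HZ = {"u": 182.0, "o": 170.0}

# Upper bound of the consonantal voicing effect reported in section 2
# (just under 10 Hz within a 130-150 Hz F0 range).
CONSONANT_EFFECT_HZ = 10.0
CONSONANT_BASELINES_HZ = (130.0, 150.0)


def percent_of(delta_hz: float, baseline_hz: float) -> float:
    """Express a pitch difference as a percentage of a baseline F0."""
    return 100.0 * delta_hz / baseline_hz


# Pitch cue left over if [u] and [o] merge.
vowel_cue_hz = abs(INTRINSIC_F0_HZ["u"] - INTRINSIC_F0_HZ["o"])
vowel_cue_pct = percent_of(vowel_cue_hz, INTRINSIC_F0_HZ["o"])

consonant_pcts = [percent_of(CONSONANT_EFFECT_HZ, b) for b in CONSONANT_BASELINES_HZ]

print(f"[o]/[u] merger leaves {vowel_cue_hz:.0f} Hz ({vowel_cue_pct:.1f}% of 170 Hz)")
print(f"consonantal effect: under {CONSONANT_EFFECT_HZ:.0f} Hz "
      f"({min(consonant_pcts):.1f}-{max(consonant_pcts):.1f}% of F0)")
```

In relative terms, the residual vowel cue (about 7.1%) falls inside the 6.7-7.7% range of the consonantal effect, so on these figures the two cues are of the same order.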
3.2 - Over-reliance on Reconstructions and the Missing Intermediate Stage
Language change, however rapid, takes place gradually as a matter of social behavior, and different cultural groups often exemplify different stages of development, together offering a picture of a continuous process of change. Consider the common shift from a case-marking language to one that uses word order to mark semantic roles such as subject and object. Some languages use case marking extensively and allow free word order, and some are case-free and require fixed word order, but a significant percentage of languages use both strategies at once, even though either one alone would suffice. One linguistic principle we may abstract from this picture is that language is not perfectly logic-driven: not all redundancy in language is eliminated, and even when a redundant feature is eliminated, the change may take a long historical period to complete, which in turn contributes to the abundance of typologically diverse languages. The same principle can be seen in languages that require both verb conjugation and the corresponding subject pronouns, and it is even more common in phonetics and phonology. From a segmental perspective, adjacent phonetic segments often carry features that are not originally their own but are assimilated from each other, and both segments retain and thus share the same set of features.
Applied to the phonological process essential to CITGM, in which tones emerge from the merger of originally separate phonemes, the same principle should leave us with an abundance of languages in an intermediate stage of change, where tones coexist with the hypothesized consonant contrasts. CITGM, however, often skips this stage and assumes the change to be completed instantly, leaving no trace behind. Arguments for CITGM hardly ever cite living languages that have developed tone but have yet to shed the now-redundant consonantal contrast that induced it. Instead, most rely exclusively on reconstructions, which makes their validity depend on the quality of those reconstructions and, moreover, on the inherent variability of the reconstruction method.
This over-reliance can sometimes lead to serious logical fallacy. For instance, it is now widely held that Old Chinese was atonal, and that tones first emerged in Chinese as a result of postvocalic consonantal influences on vowel pitch. Since Haudricourt, scholars have reconstructed Old Chinese with codas such as [ʔ], [h] and [s] to account for the first round of tonogenesis in Chinese, following the reasoning of CITGM (Jacquet: 14-21). There is, however, no evidence whatsoever for the historical existence of such codas in Chinese other than that CITGM demands them. Once these unexamined assumptions are taken for granted, many theorists even cite the reconstructed Old Chinese codas as support for CITGM, which is simply circular reasoning.
3.3 - Cross-linguistic Variance Unexplained/Unexamined
If voicing in prevocalic consonants naturally lowers the pitch of the following vowel, why did the loss of the voicing distinction in some languages (e.g. Hawaiian) not cause tones to emerge? What are the critical differentiating factors? Additionally, the CITGM literature seems to be concerned exclusively with tones on monosyllabic morphemes. In reality, many tonal languages, especially those in Central and Southern Africa, have tonal patterns that range over multiple syllables: there can be a high-low vs. low-high tonal alternation that differentiates meaning, as in Dagaare, or, as in Chizigula, a tone placed consistently on the penultimate syllable of a verb (Yip: 2). Such tonal strategies can hardly be the results of mergers of contrasting consonants, or else they would require hyper-intricate explanations to fit into CITGM.
A curious phenomenon I mentioned in section 2 also calls for further inquiry, and research on it may give rise to fresh perspectives on cross-language variance in pitch perception. If the pitch-depressing effect of voiced plosives does exist and indeed results from articulatory constraints of the human vocal tract (as CITGM theorists argue), then I, even as a native speaker of a tonal language, should not be exempt from this physiological effect. Yet the small experiments I conducted on myself showed no such effect at all, as if the mentality of tonal language speakers automatically precluded it. Well-controlled experiments involving a larger number of speakers of diverse native languages would help verify this. If my hunch proves correct that consonantal perturbations do not apply to native speakers of tonal languages for some reason, then, among many possible implications, the tonal split (the second round of tonogenesis) in the history of Chinese and Vietnamese, which CITGM attributes to a prevocalic voicing distinction, would be disproved, since the causal mechanism would break down for languages that are already tonal. If experiments show otherwise, the consonantal effects can at least be claimed as a universal with more certainty.
4 - Alternative Hypotheses
Although CITGM offers some temporary relief from the complexities of linguistic tone, as I have pointed out, it still lacks explanatory power and experimental support, and therefore has a long way to go before becoming a truly satisfactory theory. Next, I propose a few other possible sources of linguistic tone, some of which may parallel or complement those suggested by CITGM.
4.1 - The “Tonocentric” View / Tonal as Default
Yip (2002:1) estimates that 60-70 percent of languages are tonal, The Cambridge Encyclopedia of Language gives “well over half” (Crystal: 174), and in WALS’s sample of 527 of the world’s languages, about 42% are tonal, a figure noted as an underrepresentation. Though the exact numbers vary, a considerable portion of human languages, whether counted by number of languages or by number of speakers, is tonal. This information surprised me at first, and perhaps surprises many others, most likely because of the prevalence of atonal Indo-European languages in the world. Even within the field of linguistics, tone tends to receive only marginal attention and is sometimes simply ignored (Yip: 1). Considering also the absence of input from native speakers of tonal languages in the theorizing phase of CITGM, it is no wonder that the model arose from an underlying mentality that, since tones do not exist in the predominant European languages and do not appear intuitive to the theorists, they must not have been there in the first place and must therefore derive from something more typical (to those theorists). The term “tonogenesis” itself, as coined by James Matisoff, carries a similar implication: that linguistic tone is not inherent to language but rather a remarkable phenomenon that arose outside the norm.
A particular class of uncommon consonants, clicks, is popularly thought to be a remnant of early human languages, since languages with clicks cluster heavily in Africa, where early humans originated, and since clicks are seen as complex sounds unlikely to have evolved from more common ones. Yet tones, which are also particularly rich in Africa’s indigenous languages, are instead given intricate explanations that make them fit into a Eurocentric paradigm.
In terms of mechanism, pitch variance easily qualifies as a crucial component of early human languages. While reconstructing early human languages is well beyond the scope of historical linguistics, we can nevertheless make inferences from relevant observations. Evolution is a continuous process, and the physiological and psychological capacities that enable complex modern human languages did not come into existence overnight.
Language itself is also likely to have evolved gradually from a primitive use of sounds to convey simple meanings. The features distinguishing proto-language sounds are unlikely to have been manner, place or mode of articulation, as these require highly specialized organs and related neural control, which were apparently not fully developed at first. A far more probable core controllable variable in primitive language is pitch, as most animal communication systems that use the vocal-auditory channel suggest. As in animal “languages” (consider those of birds, elephants and dolphins, in which pitch is clearly the main variable), our proto-language could have had very limited segmental variables, possibly just a single invariable sound, and relied primarily on pitch combinations together with rhythm to differentiate meaning. Such a language is technically capable of expressing complex meanings just as modern spoken languages are, and would have allowed language capacities to develop further during human evolution. Living whistled languages illustrate this phonological strategy of the hypothesized proto-human language well. Whistled languages can communicate through whistling alone by one of two strategies; the simpler and more common one, and the one relevant here, is to whistle only the pitch contour of the spoken language, and in tonal languages this evidently permits “effective communication of quite extended linguistic messages” (Laver: 481) and can “convey precise distinctions” (Crystal: 404).
Biological evidence may also shed light on our speculation about the proto-human language. By making plaster casts of the bony cavities within fossil skulls of early humans and comparing the reconstructed vocal tract with that of modern humans, anthropologists inferred that “Neanderthal man (70-35,000 BC) would have been able to utter only a few front consonant-like sounds and centralized vowel-like sounds, and may have been unable to make a contrast between nasal and oral sounds” (Crystal: 292). In addition, this reconstructed vocal tract is “remarkably similar to that of a newborn baby” (Crystal: 292). Considering also that early humans would have had limited psychological capacities for language (likely comparable to those of a baby), we may speculate that the proto-human language was very similar to the “speech” of a newborn baby, which phonetically consists of primitive sounds ambiguous with respect to articulatory constraints and phonologically employs pitch as the major controllable variable. Otto Jespersen made a very similar speculation regarding the origin of language (1922: 416-417), and he also noted a general trend in language toward the “gradual disappearance of tone or pitch accent” (419):
“[T]his has been the case in Danish, whereas Norwegian and Swedish have kept the old tones; so also in Russian as compared with Serbo-Croatian. In ... old Indian, Greek and Latin … pitch accent played a prominent part. ... In modern Greek and in the Romanic languages the tone element has been obscured, and now 'stress' is heard on the syllable where the ancients noted only a high or a low tone” (419).
Jespersen not only inferred from this that tone played an important part in our primitive languages, but also traced the chain of thought further back and posited a single source, a form of primitive singing, for both language and music (431-437).
Language may have originated as a by-product of a primitive “singing” with no meaningful lyrics, only pure emotional expression. The more emotional side of it captured the more abstract aspects of the singing and later branched off into what we now call “music” (consider the similarity between the emotional faculties of music and those retained in language intonation). A utilitarian side of the primitive singing also branched off and began to associate pitch patterns with certain emotions and meanings, eventually evolving into language. As physiological and psychological developments in human evolution enabled more advanced articulatory distinctions (e.g. consonants and vowels) to be made to differentiate meaning, this proto-human language gradually reduced the functional load of pitch variance (which was gradually taken over by the growing options of consonant and vowel distinctions) for more efficient coding. The mixed use of pitch variance and other phonetic features eventually reached an equilibrium: in some languages, the functional load of pitch was reduced to the same level as that of consonants and vowels, and these languages are now called tonal languages; at the other end of the spectrum, the reduction went further and left pitch with only the domain of intonation, and these languages are considered atonal.
If one finds this course of development of human language plausible, tone as a remnant of primitive language ought not to be overlooked. The arrogant presumption that current tonal languages must have developed from a toneless state should be seriously questioned. In particular, the kinds of tonal strategies that do not fit well into CITGM (cf. 3.3) are highly likely to have been in the language from the very beginning.
4.2 - Intonation
All languages, whether tonal or not, have intonation. In a highly tonal language like Mandarin Chinese, intonation is usually superimposed on lexical tones, but occasionally the two may interfere with each other. Many languages, including Chinese and English, share common intonational schemes such as the inquisitive up-stepping pitch contour and the confirmative/declarative down-drifting pitch contour. These patterns usually operate over multiple syllables in a stretch of utterance, but if the utterance is short, especially when it is just a single syllable, the domain of intonation coincides with that of lexical tone. When this happens in Chinese, the up-stepping intonation resembles the rising/2nd tone, and the down-drifting intonation resembles the falling/4th tone. By this principle, the intonation of certain characters and phrases commonly associated with questions or other emotions might have been perceived as part of the lexical items and then internalized as tone.
Chinese question words/particles such as 何 he2 “what?” (literary), 咦 yi2 and 啥 sha2 “what/huh?” (colloquial), 什么 shen2me0 “what?” (standard), 谁 shei2/shui2 “who?”, and the archaic question markers 邪/耶 ye2 and 欤 yu2 all carry the 2nd (rising) tone. Their intrinsic tone thus often coincides with the up-stepping intonational contour of the inquisitive utterance they occur in (the shorter the utterance, the more obvious this is).
Another very interesting example is 诶, a highly heterophonic Chinese character used as an exclamation, often on its own as a complete expression. Many dictionaries now list the following 4 pronunciations and their respective meanings (along with a few more that are unrelated to the discussion here and thus omitted):
- 诶 ei1 exclamation, to call attention
- 诶 ei2 exclamation, to express surprise
- 诶 ei3 exclamation, to express disdain/disagreement
- 诶 ei4 exclamation, affirmation
The pronunciation is invariant except for the tone, and with each tone the meaning matches the emotion usually expressed by a similar intonational contour.
Besides the profound connection between tone and intonation deeply rooted in the history of language (cf. 4.1), more recent interaction between the two may also have taken place. Supposing that Chinese was atonal at some point, as CITGM presumes, characters and phrases like those in this section could have developed tones first and then, by analogy, passed their tones on to characters and phrases of identical or similar syllable structure (e.g. those with the same prevocalic and/or postvocalic consonants). Alternatively, even if the examples above are merely isolated cases, intonation in already tonal languages can still serve as a limited but viable source of tone.
4.3 - Stress
Unstressed syllables in the initial position of polysyllabic English words (e.g. “po-” in “position”) have a pitch contour that impressionistically resembles the falling-rising tone (i.e. the 3rd tone) of Mandarin Chinese. The rationale may be that the unstressed, and thus low-pitched, syllable first weakens over its duration and then prepares to transition into the following stressed, and thus high-pitched, syllable, together producing a low-lower-high pitch contour. Suppose that, beyond all written records, Chinese once had polysyllabic words whose first syllable was unstressed (in the same stress paradigm as English), and that later phonological reduction resulted in the loss of everything after this unstressed first syllable; the original pitch contour over multiple syllables could then have been condensed onto the remaining syllable and preserved as a meaningful unit, giving rise to what is now the falling-rising/3rd tone.
Though a tonal language, Chinese also has stress patterns; most notably, certain grammatical particles and some characters in specific lexical contexts are unstressed (unmarked for tone, or conventionally said to carry a “light tone” / neutral tone), e.g. 我的 wo3de0 “my”, 东西 dong1xi0 “stuff”, 好得多 hao3de0duo1 “way better”.
Interestingly, the “tonotactics” of Chinese seems to preclude any unstressed syllable (neutral tone) in word-initial position (not even in transliterations of foreign words with an unstressed initial syllable), resulting in a complementary distribution between unstressed/0-tone characters and the hypothesized stress-induced 3rd tone characters in word-initial position. Another tonotactic pattern in Mandarin Chinese that may relate to this hypothesis is the systematic avoidance of consecutive falling-rising/3rd tones. While, purely phonetically speaking, any two tones can be pronounced side by side without a problem, when two 3rd tone characters occur next to each other in an utterance, one of them must change its tonal realization. For example:
你 ni3 “you” + 好 hao3 “good (adj.)” → 你好 ni2hao3 “greeting”;
好 hao3 “good (adj.)” + 好 hao3 “good (adj.)” → 好好 hao3hao1 “well (adv.)”.
This may be an implicit reflection of the absence of two stressed syllables within a word
before the hypothesized phonological reduction (a constraint also found in English, where a
phonological word can carry only one primary stress).
Besides the phonetic and phonological connections between certain tones and stress
patterns, tone and stress also share some morphological functions. In English, many
polysyllabic words have an invariable written form that can be pronounced with two distinct stress patterns to
denote related meanings in different word categories. For example (stressed syllable
marked in capitals):
‘REcord’ Noun - ‘reCORD’ Verb; ‘PREsent’ Adjective - ‘preSENT’ Verb.
This implicit phono-morphological knowledge sometimes extends by analogy to words
without a stress alternation and produces hypercorrections like the following:
‘deFEND’ Verb → ‘?DEfense’ Noun, which most dictionaries list with only one possible
stress: ‘deFENSE’; ‘deFAULT’ Verb → ‘?DEfault’ Noun.
In Chinese, tonal alternation in many “heterotonic” characters may play exactly the same role
as stress alternation in English, e.g. (only the pronunciations and meanings relevant to this
discussion are listed):
好 hao3 “good (adj.)” - hao4 “to like”; 处 chu3 “locate” - chu4 “location”;
差 cha1 “difference” - cha4 “differ by”; 冠 guan1 “headwear” - guan4 “to crown”.
(Note that 发 fa1 - fa4 and 只 zhi1 - zhi3 would not be good examples for the point made
here, since in each case the two tones belong to two characters that were distinct before the
Simplification and were merged by it.)
This phono-morphological pattern is thought to have been much more productive in earlier
spoken forms of Chinese. Characters that are no longer used as verbs in Modern Spoken
Chinese are pronounced with the falling/4th tone when they denote actions in Classical
Chinese and other literary contexts:
王 wang2 “king/lord” → wang4 “to rule”; 衣 yi1 “clothes” → yi4 “to put clothes on”.
(The basis for this is unclear and has never been explained, but the alternation pattern has
been passed down as an oral tradition in Classical Chinese instruction.)
In addition to the observations we can make about Modern Chinese, historical work also
supports the likelihood of phonological reduction in early Chinese, which is essential to the
hypothesis argued here. “As early as 1861, R. Lepsius, from a comparison of Chinese and
Tibetan, had derived the conviction that ‘the monosyllabic character of Chinese is not
original, but is a lapse from an earlier polysyllabic structure’” (Jespersen: 370). By
comparing reconstructions of Old Chinese and Proto-Austronesian, L. Sagart found a
systematic correlation between the two and argued that Chinese went through a
monosyllabicization process from their polysyllabic common ancestor (though Sagart tries
to fit the analysis into CITGM and does not argue for any connection between the
monosyllabicization and tones in Chinese).
Finally, different stress patterns and their interpretations may account for the differences
in tonal realization among the Chinese languages/dialects. To illustrate, suppose that an
imaginary atonal Dialect A of proto-Chinese borrowed the word “massage” from French and
thus pronounced it with a primary stress on the first syllable and a secondary stress on the
second syllable (in terms of English stress), and that another imaginary atonal Dialect B of
proto-Chinese borrowed the same word through English and thus pronounced it with an
English accent, i.e. with an unstressed first syllable and a stressed second syllable. Should
the hypothesized phonological reduction take place, the word in both dialects would be left
with the same syllable [ma] but with different tones reflecting the original pitch contours of
the two stress patterns. Specifically, Dialect A may end up with a high level tone (if only the
contour of the first syllable is retained) or a high falling tone (if the transition into the next
syllable is also included), so the product would be somewhat like modern Chinese 妈 ma1 or
骂 ma4; similarly, the product in Dialect B could have a low pitch or a falling-rising contour
like that of 马 ma3. Different systematic segmentations and interpretations of stress contours
in the formation of tones could thus yield related tonal systems under the same hypothetical
model, which may account for some cross-linguistic or dialectal variation.
5 - Conclusion
CITGM has stood out as the dominant tonogenesis paradigm partly due to the lack of other
satisfactory explanations (Jacquet: 20), and, as I have pointed out, it still lacks experimental
support (2.1) as well as explanatory power (3.1-3.3). In turn, I have suggested other possible
sources of linguistic tone, including the tonal default view (4.1), intonation (4.2), and stress
(4.3). These are only preliminary speculations on the subject and require further
examination, but we should certainly avoid hasty conclusions. While searching for
linguistic patterns and universals is necessary for a better understanding of language,
resorting to a single model to account for the wide variety of tonal systems in the world’s
languages may eventually prove unsuccessful. If every word has its own history, then every
current tonal system may also have developed differently, and so may each tone.
6 - References
Abramson, A.S. “The Plausibility of Phonetic Explanations of Tonogenesis.” From Traditional
Phonology to Modern Speech Processing: Festschrift for Professor Wu Zongji’s 95th
Birthday: 17-29. Beijing: Foreign Language Teaching and Research Press (2004).
Chen, Matthew Y. Tone Sandhi: Patterns across Chinese Dialects. Cambridge University Press
(2000).
Crystal, David. The Cambridge Encyclopedia of Language. Second Edition. Cambridge
University Press (1997).
Hombert, Jean-Marie. “Consonant Types, Vowel Quality, and Tone.” Tone: A Linguistic Survey:
77-111. Academic Press (1978).
Hombert, Jean-Marie, Ohala, John J. & Ewan, William G. “Phonetic Explanations for the
Development of Tones.” Language 55: 37-58 (1979).
Jacquet, Janus Bahs. “Tonogenesis in Early Chinese.” Electronic copy accessed May 2013 at
http://eithne.dk/ba.pdf.
Jespersen, Otto. Language: Its Nature, Development and Origin. Chapter XXI: The Origin of
Speech. London: George Allen & Unwin Ltd. (1922).
Laver, John. Principles of Phonetics. Chapter 15: The Prosodic Organization of Speech: Pitch
and Loudness. Cambridge University Press (1994).
Maddieson, Ian. The World Atlas of Language Structures Online. Chapter 13: Tone. Accessed
May 2013 at http://wals.info/chapter/13.
Sagart, Laurent. “Austronesian Final Consonants and the Origin of Chinese Tones.” Oceanic
Linguistics Special Publications, No. 24, Tonality in Austronesian Languages: 47-59.
University of Hawai’i Press (1993).
Yip, Moira. Tone. Chapter 1: Introduction. Cambridge University Press (2002).