Tonogenesis From a Native Chinese Speaker’s Perspective

Fang Shi

Abstract

This paper examines the mainstream tonogenesis model and proposes alternative

hypotheses on a number of possible sources of linguistic tone. The analysis and hypotheses

draw on general linguistic knowledge as well as personal experience and intuition as a

native tonal language speaker, and they are intended to both provide competing

explanations and refine existing theories and their applications.

0 - Terminology and Notation

Tone, for the purpose of this paper, is defined as pitch variation over the domain of a

morpheme or a word that systematically differentiates lexical or grammatical meanings, as

exemplified by tonal minimal pairs. This definition intends to rule out a major source of

confusion: intonation, which is usually over the domain of an entire utterance (i.e. often

multiple words) and denotes meanings such as speaker attitudes and emotions. However,

as I will explain in one of my hypotheses, the actual distinction can get blurry.

Examples will be demonstrated mostly in Chinese and English, the two languages I’m most

familiar with. Some fundamental knowledge of Mandarin Chinese is thus presumed. For

convenience, pronunciation of Chinese characters will be transcribed in pinyin, with tones

marked in corresponding numerals (1-4 for the four tones in Modern Standard Chinese, and

0 for the neutral/light tone, e.g. 拼音 pin1yin1).

1 - Introduction to Mainstream Tonogenesis

Initiated by A. G. Haudricourt’s work on Vietnamese tones, later rationalized by J. M.

Hombert’s phonetic experiments and physiological explanations, and further supported by

similar observations in some tonal languages, the currently dominant tonogenesis

paradigm (theory that tries to explain the source of linguistic tones) can be summarized as

follows:

Due to intrinsic articulatory constraints, consonants may affect the pitch of vowels that

follow or precede them. These consonants are said to affect the mode of voicing of the

neighboring vowels and thus raise or lower their pitch. After these consonants merge or

disappear through phonological changes, their effects on the vowel pitch remain as the


contrastive feature and are perceived as tones. In particular, in the case of prevocalic

plosives, voicing lowers the pitch of vowels that immediately follow. This, by various

accounts, is described as the most widely attested source of tonogenesis, and accounts for a

high vs. low tonal distinction in many languages. In addition, prevocalic and postvocalic

influences may combine to produce more complex tonal systems.

2 - A Brief Examination of Hombert

Regarding the origin of tone in language, the two most cited sources are Haudricourt’s “De

l’origine des tons en vietnamien” (1954) and Hombert’s “Consonant Types, Vowel Quality,

and Tone” (1978). I unfortunately do not know French well enough to read Haudricourt’s

original work, but I will instead point out a few things I noticed in Hombert’s paper that

concern the experimental basis of the general model.

Both before and after reading Hombert’s paper, I performed experiments on a native

American English speaker and myself (proficient English speaker and linguistically trained),

testing out the effect of prevocalic voicing on vowel pitch. For no obvious reason, the results

based on my own speech showed no significant correlation but only minor free variations of

the pitch, whereas a sample from the native English speaker favored Haudricourt and Hombert’s

idea and showed at the vowel onset a depressing effect of less than 10 Hz on a 130-150 Hz

fundamental frequency range. The effect is well below 10% of the speaker’s normal pitch

level (percentage is used here instead of absolute value since it seems that the higher the

pitch value, the more it can potentially get perturbed), consistent with the three test results

by House and Fairbanks (1953), Lehiste and Peterson (1961), and Mohr (1968) as quoted

by Hombert (79). Hombert’s experiments, on the other hand, all showed quite dramatic

effects on vowel pitch, in most cases more than 10%, and sometimes over 20%, of

the speaker’s average pitch value. The huge difference between almost identical experiments

struck me as very odd, and some subtle details in Hombert’s graphs of his experimental

results raise further doubt as to whether his experiments were free of bias.
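The percentage comparison used above can be made concrete with a small sketch. The function name is my own, and the inputs are only the figures already quoted in this section (a perturbation of at most 10 Hz against a 130-150 Hz range); nothing here reproduces Hombert's data.

```python
def relative_perturbation(delta_hz, baseline_hz):
    """Express a pitch perturbation as a percentage of the baseline F0."""
    if baseline_hz <= 0:
        raise ValueError("baseline F0 must be positive")
    return 100.0 * delta_hz / baseline_hz


# Upper bound of the observed effect (10 Hz) at both ends of the
# speaker's 130-150 Hz range, as quoted in the text.
at_low_end = relative_perturbation(10.0, 130.0)   # roughly 7.7 percent
at_high_end = relative_perturbation(10.0, 150.0)  # roughly 6.7 percent
```

Even at the bottom of the range the effect stays under 10%, which is the basis for the comparison with the 10-20% effects reported by Hombert.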

Suppose a speaker pronounces two syllables with the same pitch, and suppose the different

consonantal onsets (or codas) of these syllables have opposite effects on the pitch of the

identical vowel nucleus, we would expect, on a graph mapping the vowel pitches over vowel

onset time (or time till closure, as in the case of different codas), two curves that start (or end,


as with codas) at different pitches due to the inverse effects of the two consonants, and as

the curves extend away from the onset/closure, they would draw closer to each other and

ideally merge at the speaker’s normal pitch level for that vowel (supposing the vowels

last long enough and extend over the time range of the consonants’ influence). In Hombert’s

experiments, this expectation is not merely met, but far exceeded; the two curves actually

cross over (particularly visible in his S3 from Figure 2 for Experiment I and S1 from Figure

7 for Experiment V). The unexpected crossover struck me as either a sign of

careless preparation of the graphs or an indication of biases that might have existed in

Hombert’s experiments.

A possible source of bias that could explain the exaggerated pitch difference and the

crossover lies in the prompts Hombert gave to the test subjects. Though

Hombert did arrange the test words in random order (and thus rule out the possible bias

due to a conventional intonation pattern), he gives no unambiguous specification of

whether the prompt was given in text or speech. If given in speech, especially by the author

or someone who believed in the effects of the consonants, the prompter could either intentionally or

subconsciously pronounce the syllable with a pitch height that reflects that belief and

thereby cue the test subject to repeat it with the same pitch. A particularly likely cue to

have caused the above-mentioned crossover is a low rising pitch contour on one (set of)

syllable(s) and a high falling contour on the other. This psychological process, ironically, can

be very well explained by Hombert himself, using the exact words he used in a very similar

paper to justify how minor pitch perturbations caused by consonants can induce tones:

“Since the listener does not have independent access to the mind of the speaker, and

thus may be unable to determine what parts of the received signal were intended and

what were not, he may intentionally reproduce and probably exaggerate these

distortions when he repeats the same utterances.” (Hombert et al. 1979:37)

Also worth mentioning, Hombert’s Experiment VI (94-95) was designed to determine

whether small changes of F0 (which represent consonantal effects on vowel pitch) are

perceptually significant enough for listeners to notice. The design of this experiment itself appears

legitimate, but the reported outcome is lacking, and his interpretations and conclusion are

unjustifiable. No specific data from the experiment were given at all, and while at


least some kind of correlation is expected to be established, the only outcome Hombert

bothered to mention is a vague description saying that the set of rising contours and the set

of falling contours are statistically significantly different to listeners’ perception. The way

Hombert interprets an important variable ∆t (the duration of the pitch change) in the result

renders his conclusion even more questionable. Logically speaking, if the magnitude

of fundamental frequency change (∆F in Hombert’s experiment) is held constant, the

shorter the ∆t, the sharper the pitch change, and the more likely it is to be perceived by listeners.

Interestingly, Hombert states that for a certain ∆t value and all values greater than it, the

perceived difference was significant. This yields a counter-intuitive implication that in

Hombert’s experiment, the sharper the pitch changes, somehow the worse they were

perceived. Unless these doubts can be cleared, Hombert’s experiments cannot sufficiently

support the claim that the consonantal influences exist and can be perceived, even in ideal lab

conditions.
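The reasoning about ∆t above amounts to a simple rate: with ∆F held constant, the average rate of pitch change is ∆F/∆t, so a shorter ∆t means a steeper glide. The sketch below only illustrates that arithmetic; the function name and the three durations are hypothetical values of my own and are not taken from Hombert's paper.

```python
def glide_rate(delta_f_hz, delta_t_ms):
    """Average rate of F0 change, in Hz per millisecond."""
    if delta_t_ms <= 0:
        raise ValueError("duration must be positive")
    return delta_f_hz / delta_t_ms


# Hypothetical illustration: the same 10 Hz change spread over
# three different durations (in milliseconds).
rates = {dt: glide_rate(10.0, dt) for dt in (40.0, 100.0, 250.0)}
```

The shortest duration gives the steepest glide, which is why a result in which only the longer ∆t values reach significance appears counter-intuitive.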

Some physical restrictions of Hombert’s experiments may also impair their credibility. The

hardware pitch extraction method used in his experiments for measuring fundamental

frequency is now far outdated, and Hombert himself noted how most F0 extractors of

the time performed poorly and caused difficulty for obtaining accurate measurements (88).

Another obvious limitation is the shortage of test subjects. Hombert mostly had 3-4

speakers’ speech analyzed in each experiment, and in some of these cases, only a single

speaker per sex per language. This may also be due to the painstaking effort the pitch

measuring methods of the time required, but so small a sample is convincing only as a basis

for preliminary speculation.

With the problems mentioned above, I’d encourage anyone interested in this subject,

especially anyone who intends to cite Hombert or any work that does so, to repeat the

experiments and check on the results, which can now be quite conveniently obtained

through spectrogram analysis of digital recordings using computer software such as Praat.

3 - General Problems of the Model

Above I’ve used Hombert (1978) as a popular example of phonetic support for the

consonantal influence tonogenesis model (abbreviated CITGM for convenience) to identify

some specific weaknesses in the experimental foundation of the model. Abramson (2004) also


offers a concise summary of the development of CITGM and a number of representative

publications, also pointing out certain disputes within CITGM and discrepancies among

related phonetic experiments. The CITGM, though already widely accepted, lacks not only

definitive experimental support, but also explanatory power in some more general aspects,

which I’ll focus on next.

3.1 - The Gap between Pitch Perturbations and Tonal Actualizations

Without the need to refer to specific data, anyone who speaks a tonal language or is at

least familiar with one can tell how insignificant the supposed vowel pitch difference caused

by consonants is (trivial indeed, to my ears native to a tonal language, which are already

extra sensitive to pitch differences), compared to the clear pitch distinctions of real

language tones. Not to mention that there is not yet a firm answer to whether the minor consonantal

effects are strong enough to be perceived in real language environments, as opposed to in

ideal lab conditions where humans can technically discriminate pitch differences as small as

±1 Hz in an 80-160 Hz range (Laver: 451).

Even if we step back for a moment and grant the possibility that the proposed pitch

perturbations caused by consonants may be perceptually significant enough to induce tones,

a paradox would immediately arise. If the features of voicing and mode of

phonation can cause tones, why would the other sources of pitch perturbation, among

which the most notable is inherent vowel pitch, be overlooked entirely by CITGM?

It’s been well observed that there is a “systematic correlation between average pitch of

vowels and vowel height … the higher the vowel, the higher the pitch”, and the pitch

difference can be “as much as 25 Hz” (Laver: 454). The measured internal F0 of vowels can

differ significantly while perceptually they are spoken with a consistent pitch level,

indicating that natural vowel pitch difference due to physiological constraints is likely to be

psychologically normalized for speakers’ perception and thus not perceived as

differentiating cues. While it’s puzzling enough that CITGM bases its premise on the exact

opposite argument, it even goes as far as asserting the pitch differences caused by

consonants can be perceived to induce tones but those inherent to vowels cannot. Chen

(2000) expresses the same doubt on Hombert’s attitude:


“[S]urprisingly, despite the well-known intrinsic pitch variations associated with

vowel height, tone split along the high/low vowel distinction is so rare that Hombert

et al. (1979:52) state flatly: “It would seem that the interaction between tones and

vowel height works in only one direction: tone can affect vowel height, but not vice-versa”” (11).

If the pitch perturbative effect of adjacent consonants on vowels is somehow indeed

significant enough to induce tones through mergers or loss of consonants, then the inherent

pitch of vowels could also have done so through the merging of vowels. For example, a very

common merger of adjacent vowels [o] and [u] is observed in languages. Using data from

Laver (454), if a [u] of inherent pitch 182 Hz merges with an [o] of 170 Hz, the most

significant linguistic cue left to distinguish the previously minimal pairs with an [o/u]

distinction would be the 12 Hz pitch difference. This pitch difference is as large as the

consonantal perturbations attested by most experiments in favor of CITGM, not to

mention that vowels with even greater intrinsic pitch differences could merge. Evidently, CITGM

owes a good explanation of why the same phonological process never

happens to a highly comparable, if not more salient, set of linguistic cues.
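To place the [u]/[o] figures on a scale closer to pitch perception, the gap can be converted to semitones with the standard formula 12 · log2(f1/f2). This is only a sketch: the function name is my own, and the two frequencies are the values cited from Laver above.

```python
import math


def semitone_interval(f_high_hz, f_low_hz):
    """Size of the interval between two frequencies, in semitones."""
    if f_high_hz <= 0 or f_low_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Intrinsic pitch of [u] (182 Hz) against [o] (170 Hz), as cited from Laver.
vowel_gap = semitone_interval(182.0, 170.0)  # a little over one semitone
```

A gap of slightly more than one semitone is of the same order as a 10 Hz consonantal perturbation in the 130-150 Hz range, which is the comparison the argument above relies on.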

3.2 - Over-reliance on Reconstructions and the Missing Intermediate Stage

Language change, however rapid, as a matter of social behavior, takes place gradually, and

different cultural groups often exemplify a range of different stages in development,

together offering us a whole picture of a continuous changing process. Consider a common

change in language, in which a case-marking language shifts towards one that employs word

order to designate semantic roles such as subject and object. What we can observe is that

some languages use case-marking extensively and allow free word order, and some are

case-free and require fixed word orders, but also a significant percentage of languages use

both strategies at the same time, even though one would totally suffice. One linguistic

principle we may abstract from this picture is that language is not perfectly logic-driven: not

all redundancy in language is eliminated, and even if a redundant feature is to be eliminated,

the change may take a long period in history to complete, which in turn contributes to the

abundance of typologically diverse languages. The same principle can be seen in languages

that require both conjugation on the verb and corresponding subject pronouns, and it’s

even more common in the realm of phonetics and phonology. From a segmental perspective,

adjacent phonetic segments often carry features that are not originally of their own but


assimilated from each other, and both segments would retain and thus share the same set of

features.

The same principle, if applied to the phonological process essential to CITGM, in which

tones emerge out of the merger of originally separate phonemes, should leave us with an

abundance of languages in the intermediate stage of change, in which tones coexist with the

hypothesized consonant contrasts. CITGM, however, often skips this stage and assumes the

change to complete instantly, leaving no trace behind. Arguments for CITGM hardly ever

back themselves up with living examples of languages that have developed tone but have yet to

rid themselves of the now redundant consonantal contrast that induced the tone. Instead, most of them

rely exclusively on reconstructions, and thus subject their validity to the quality of the

reconstructions, and moreover, to the inherent variability of the method of reconstruction.

This over-reliance can sometimes lead to serious logical fallacies. For instance, it’s now widely

held that Old Chinese was atonal, and that tones first emerged in Chinese as a result of

postvocalic consonantal influences on vowel pitch. People since Haudricourt have

reconstructed Old Chinese with codas such as [ʔ], [h] and [s] to account for the first round of tonogenesis in Chinese, following the line of thought of CITGM (Jacquet: 14-21). There is,

however, no evidence whatsoever to support the historical existence of such codas in

Chinese other than that CITGM demands it. Now that these unexamined assumptions are

taken for granted, many theorists even cite the reconstructed Old Chinese codas as

support for CITGM, which simply results in circular logic.

3.3 - Cross-linguistic Variance Unexplained/Unexamined

If voicing in prevocalic consonants naturally induces a lower pitch in the vowel, why did the

loss of voicing distinction in some languages (e.g. Hawaiian) not cause tones to emerge?

What are the critical differentiating factors? Additionally, CITGM literature seems to be

exclusively concerned with tones over monosyllabic morphemes. In reality, many tonal

languages, especially those in Central and Southern Africa, may have tonal patterns that

range over multiple syllables: there could be a high-low vs. low-high tonal alternation to

differentiate meaning, as in the case of Dagaare; or as in Chizigula, there could be a tone

that’s placed consistently on the penultimate syllable of a verb (Yip: 2). These kinds of tonal


strategy can hardly be results of mergers of contrasting consonants, or else would certainly

require highly intricate explanations to fit into CITGM.

A curious phenomenon I mentioned in section 2 also calls for further inquiry, and research

may give rise to fresh perspectives on cross-language variance in pitch perception. If the

pitch depressing effect of voiced plosives does exist and indeed results from articulatory

constraints of the human vocal tract (as CITGM theorists have argued), I, even as a native

speaker of a tonal language, should have no reason to be exempt from this physiological effect.

Yet, mini experiments I conducted on myself showed no such effect at all, as if the mentality

of tonal language speakers automatically precludes it. Well-controlled experiments

involving a larger number of speakers of diverse native languages would help verify this. If

my hunch proves correct that perturbations by consonants do not apply to native tonal

language speakers for some reason, among many possible implications, the tonal split (the

second round of tonogenesis) in the history of Chinese and Vietnamese caused by prevocalic

voicing distinction as argued by CITGM would be disproved, since the causing mechanism

would break down for languages that are already tonal. If experiments show otherwise, at

least the consonantal effects can be claimed as a universal with more certainty.

4 - Alternative Hypotheses

Although CITGM provides some temporary relief to the complexities of linguistic tone, as I

have pointed out, it still lacks explanatory power and experimental support and therefore

has a long way to go before becoming a truly satisfactory theory. Next, I will propose a few other

possible sources of linguistic tone, some of which may parallel or complement those

suggested by CITGM.

4.1 - The “Tonocentric” View / Tonal as Default

Yip (2002:1) estimates that 60-70 percent of languages are tonal, The Cambridge

Encyclopedia of Language gives “well over half” (Crystal: 174), and in the WALS sample of 527 of the

world’s languages, about 42% are tonal, a figure noted as an underrepresentation. Though

the exact numbers may vary, a considerably large portion of human languages, either by the

number of languages or the population of speakers, are actually tonal languages. This

information appeared surprising to me at first and perhaps so to many others, most likely

due to the prevalence of atonal Indo-European languages on earth. Even within the field of


linguistics, tone tends to receive only marginal attention and sometimes is simply ignored

(Yip:1). Also considering the absence of native tonal language speaker input in the

theorizing phase of CITGM, it’s actually no wonder how it came into being with an

underlying mentality that, since tones do not exist in the predominant European languages

and do not appear intuitive to the theorists, they must not have been there in the first place

and thus must be derivative of something more typical (to those theorists). The term

“tonogenesis” itself, as coined by James Matisoff, actually carries a similar implication that

linguistic tone is not something inherent to language, but rather a remarkable phenomenon

that occurred outside the norm.

A particular class of uncommon consonants, clicks, is popularly thought to be a remnant of

early human languages, since languages with click sounds cluster heavily in Africa, where

early humans originated, and since click sounds are seen as rather complicated and unlikely

to have evolved from more common sounds. Yet with tones, which are also particularly rich

in Africa’s aboriginal languages, theorists would rather come up with intricate explanations

to make them fit into a Eurocentric paradigm.

In terms of mechanism, the variance of pitch easily qualifies as a crucial component of early

human languages. While reconstructing early human languages is well beyond the scope of

historical linguistics, we can nevertheless make inferences from relevant observations.

Evolution is a continuous process, and the physiological and psychological capacities that

enable the complex modern human languages did not come into existence overnight.

Language itself is also likely to have evolved gradually from the primitive use of sounds to

convey simple meanings. Distinguishing features that mark the proto-language sounds are

unlikely to have been manners and places or modes of articulation, as these would require

highly specialized organs and related neural controls, which were apparently not fully

developed at first. What appears much more probable to be a core controllable variable in

the primitive language is pitch, as can be demonstrated by most animal communication

systems that deploy the vocal-auditory channel. Similar to animal “languages” (consider

that of birds, elephants, and dolphins, for example, in which pitch is clearly the main

variable), our proto-language could have consisted of very limited segmental variables,

possibly just an invariable sound, and used primarily pitch combinations together with

rhythm to differentiate meaning. Such a language is technically capable of expressing


complex meanings just like regular modern spoken languages and would have allowed

further development of language capacities during the human evolution. The living

examples of whistled languages can well demonstrate this phonological strategy in the

hypothesized proto-‐human language. Whistled languages can employ two possible

strategies to communicate through whistling alone, the simpler and more common of which,

also being the more relevant strategy here, is by whistling the pitch contour of the spoken

language alone, and evidently in tonal languages this can conduct “effective communication

of quite extended linguistic messages” (Laver: 481) and “convey precise distinctions”

(Crystal: 404).

Biological evidence may also shed light on our speculation of the proto-human language.

By making plaster casts of the bony cavities within the fossil skulls of early humans

and comparing this reconstructed vocal tract to that of modern man, anthropologists

inferred that “Neanderthal man (70-35,000 BC) would have been able to utter only a few

front consonant-like sounds and centralized vowel-like sounds, and may have been unable

to make a contrast between nasal and oral sounds” (Crystal: 292). In addition, this

reconstructed vocal tract is “remarkably similar to that of a newborn baby” (Crystal: 292).

Also considering that the early humans would have had limited psychological capacities for

language (likely to be comparable to those of a baby), we may speculate that the

proto-human language could have been very similar to the “speech” of a newborn baby, which phonetically

consists of primitive sounds ambiguous of articulatory constraints and phonologically

employs pitch as the major controllable variable. Otto Jespersen made a very similar

speculation regarding the origin of language (1922: 416-417), and he also noted a general

trend in language of “gradual disappearance of tone or pitch accent” (419):

“[T]his has been the case in Danish, whereas Norwegian and Swedish have kept the

old tones; so also in Russian as compared with Serbo-Croatian. In ... old Indian, Greek

and Latin … pitch accent played a prominent part. ... In modern Greek and in the

Romanic languages the tone element has been obscured, and now 'stress' is heard on

the syllable where the ancients noted only a high or a low tone” (419).

Jespersen not only inferred from this that tone played an important part in our primitive

languages, but also traced further back along the chain of thought and posited a single

source, a form of primitive singing, for both language and music (431-437).


Language may have originated as a by-product of a primitive “singing” with no meaningful

lyrics but pure emotional expression. The more emotional side of it captured the more

abstract aspects of the singing and later branched off into what we now call “music”

(consider the similarity between the emotional faculties of music and those retained in

language intonation). A utilitarian side of the primitive singing also branched off and started

to associate pitch patterns with certain emotions and meanings, and this eventually evolved

into language. As physiological and psychological developments in human evolution

enabled more advanced articulatory distinctions (e.g. consonants and vowels) to be made to

differentiate meaning, this proto-human language gradually reduced the functional load of

pitch variance (which was gradually replaced by increasing options of consonant and vowel

distinctions) for more efficient coding. The mixed use of pitch variance and other phonetic

features eventually reached an equilibrium: in some languages, the functional load of pitch

reduced to the same level as that of consonants and vowels, and these languages are now

referred to as tonal languages; on the other end of the spectrum, the reduction went further

and left pitch with only the domain of intonation, and these languages are considered

atonal.

If one finds this course of development of human language plausible, tone as a remnant of

primitive language ought not to be overlooked. The arrogant presumption that current tonal

languages must have developed from a toneless state should be seriously questioned. In

particular, the kinds of tonal strategies that don’t fit into CITGM well (cf. 3.3) are highly

likely to have always been in the language since the very beginning.

4.2 - Intonation

All languages, whether tonal or not, have intonation. In a highly tonal language like

Mandarin Chinese, intonation is usually superimposed on lexical tones, but occasionally

they may interfere with each other. Many languages including Chinese and English share

common intonational schemes such as the inquisitive up-stepping pitch contour and the

confirmative/declarative down-drifting pitch contour. These patterns usually operate over

multiple syllables in a string of utterance, but if the utterance itself is short, especially when

it’s just a single syllable, the domain of intonation would coincide with that of lexical tone. When

this happens in Chinese, the up-stepping intonation resembles the rising/2nd tone, and the

down-stepping intonation resembles the falling/4th tone. Under this principle, the


intonation in certain characters and phrases that are commonly associated with questions

or other emotions might have been perceived as part of the lexical items and then

internalized as tone.

Chinese question words/particles such as 何 he2 “what?” (literary), 咦 yi2 and 啥 sha2

“what/huh?” (colloquial), 什么 shen2me0 “what?” (standard), 谁 shei2/shui2 “who?”, and

archaic question markers 邪/耶 ye2 and 欤 yu2 all carry the 2nd (rising) tone. Thus their

intrinsic tone often coincides with the up-stepping intonational pitch contour in the

inquisitive utterance that they are in (the shorter the whole utterance the more obvious).

Another very interesting example is a highly heterophonic character in Chinese: 诶, an

exclamation word often used on its own as a complete expression. Many dictionaries now

list the following 4 pronunciations and their respective meanings (along with a few more

that are unrelated to the discussion here and thus not listed):

- 诶 ei1 exclamation, to call attention

- 诶 ei2 exclamation, to express surprise

- 诶 ei3 exclamation, to express disdain/disagreement

- 诶 ei4 exclamation, affirmation

The pronunciation is invariable except for the tone, and in each tone, the meaning matches

the emotion that’s usually expressed by the similar intonational contour.

Besides the profound connection between tone and intonation deeply rooted in the history

of language (cf. 4.1), more recent interaction of the two may also have taken place. Supposing

that Chinese used to be atonal at some point as CITGM presumes, characters and phrases as

exemplified in this section could have developed tones first, and then, by analogy, they

could have assigned their tones to characters and phrases of identical or similar syllable

structures (e.g. those with the same prevocalic and/or postvocalic consonants). Otherwise,

even if the examples above are considered merely isolated cases, intonation in already tonal

languages can still serve as a limited but nevertheless viable source of tone.

4.3 - Stress

Unstressed syllables at the initial position of polysyllabic English words (e.g. “po-” in

“position”) have a pitch contour that impressionistically resembles that of a falling-rising tone

(i.e. 3rd tone) in Mandarin Chinese. The rationalization may be that the unstressed, thus

low-pitch, syllable first weakens over its duration, and then prepares to transition into the next

stressed, thus high-pitch, syllable, altogether generating a low-lower-high pitch contour.

Suppose that, beyond all written records, Chinese once had polysyllabic words in which the

first syllable was unstressed (in the same stress paradigm as that of English), and later

phonological reduction resulted in the loss of everything after this first unstressed syllable,

the original pitch contour over multiple syllables could have been condensed on the

remaining syllable and preserved as a meaningful unit, giving rise to what’s now a

falling-rising/3rd tone.

Though a tonal language, Chinese also has stress patterns; most notably, certain

grammatical particles and some characters in specific lexical contexts are unstressed

(unmarked for tone, or conventionally said to be marked with a “light tone” / neutral tone),

e.g. 我的 wo3de0 “my”, 东西 dong1xi0 “stuff”, 好得多 hao3de0duo1 “way better”.

Interestingly, the “tonotactics” of Chinese seems to preclude the occurrence of any

unstressed syllable (neutral tone) at the initial position of any word (not even in

transliteration of foreign words with an unstressed initial syllable), resulting in a

complementary distribution between unstressed / 0-tone characters and the hypothesized

stress-induced third tone characters at word-initial positions. Another piece of tonotactics

in Mandarin Chinese that may relate to this hypothesis is the systematic avoidance of

consecutive falling-‐rising/3rd tones. While purely phonetically speaking, any two tones can

be pronounced side to side without a problem, when two 3rd tone characters occur next to

each other in an utterance, one of them must alter its tonal realization. For example:

你 ni3 “you” + 好 hao3 “good (adj.)” → 你好 ni2hao3 “greeting”;

好 hao3 “good (adj.)” + 好 hao3 “good (adj.)” → 好好 hao3hao1 “well (adv.)”.

This may be an implicit reflection of the absence of two stressed syllables within a word

before the hypothesized phonological reduction (this pattern exists in English words, which

can only have one primary stress per phonological word).
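
The avoidance of consecutive 3rd tones described above can be stated as a simple rewrite rule

over the numeral notation used in this paper. The following is only a minimal sketch under

stated assumptions: the function name apply_third_tone_sandhi is hypothetical, syllables are

given as pinyin strings ending in a tone numeral, and every 3rd tone directly followed by an

underlying 3rd tone is realized as a 2nd tone. It does not model prosodic grouping in longer

sequences, nor lexically specific changes such as 好好 hao3hao1.

```python
def apply_third_tone_sandhi(syllables):
    """Return a copy of `syllables` in which each 3rd tone that directly
    precedes an underlying 3rd tone is realized as a 2nd tone.

    Assumption: each syllable is a pinyin string ending in a tone numeral
    (0-4), following the notation of this paper, e.g. "ni3".
    """
    result = list(syllables)
    # Compare against the underlying (input) tones, not the rewritten ones.
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result


print(apply_third_tone_sandhi(["ni3", "hao3"]))  # ['ni2', 'hao3']
print(apply_third_tone_sandhi(["wo3", "de0"]))   # ['wo3', 'de0']
```

Checking the underlying tones rather than the already altered ones is a simplifying choice;

actual realizations of three or more consecutive 3rd tones depend on how the syllables are

grouped into words and phrases.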

Besides the phonetic and phonological connections between certain tones and stress

patterns, they also share some morphological functions. In English, many polysyllabic

words have an invariable written form that can be pronounced with two distinct stress

patterns to denote related meanings of different word categories. For example (stressed

syllable marked in capitals):

‘REcord’ Noun - ‘reCORD’ Verb; ‘PREsent’ Adjective - ‘preSENT’ Verb.

This implicit phonomorphological knowledge sometimes extends by analogy to words

without a stress alternation and produces hypercorrections like the following:

‘deFEND’ Verb → ‘?DEfense’ Noun, which is listed in most dictionaries with only one possible

stress: ‘deFENSE’; ‘deFAULT’ Verb → ‘?DEfault’ Noun.

In Chinese, tonal alternation in many “heterotonic” characters may play the exact same role

as stress alternation in English, e.g. (only select pronunciations and corresponding

meanings relevant to the discussion here are listed):

好 hao3 “good (adj.)” - hao4 “to like”; 处 chu3 “locate” - chu4 “location”;

差 cha1 “difference” - cha4 “differ by”; 冠 guan1 “headwear” - guan4 “to crown”.

(Note that 发 fa1 - fa4 and 只 zhi1 - zhi3 would not be good examples for the point made

here, for their different tones result from the merger of characters that were originally

distinct before the Simplification.)

This phono-morphological pattern is thought to have been considerably more productive in

earlier spoken forms of Chinese. Characters that are no longer used as verbs in Modern

Spoken Chinese, when denoting actions in Classical Chinese and other literary contexts, are

pronounced with a falling/4th tone:

王 wang2 “king/lord” → wang4 “to rule”; 衣 yi1 “clothes” → yi4 “to put clothes on”.

(The basis of this is unclear and never explained, but the alternation pattern has been

passed on as an oral tradition in Classical Chinese instruction.)

In addition to the observations we can make about Modern Chinese, historical work also

favors the likelihood of a phonological reduction in early Chinese, which is essential to the

hypothesis argued here. “As early as 1861, R. Lepsius, from a comparison of Chinese and

Tibetan, had derived the conviction that ‘the monosyllabic character of Chinese is not

original, but is a lapse from an earlier polysyllabic structure’” (Jespersen: 370). By

comparing reconstructions of Old Chinese and Proto-Austronesian, L. Sagart found a

systematic correlation between the two and also argued that Chinese went through a

monosyllabicization process from their polysyllabic common ancestor (though Sagart tries

to fit the analysis into CITGM and does not argue for any connection between the

monosyllabicization and tones in Chinese).

Lastly, different stress patterns and their interpretations may account for the differences

in tonal realizations among the Chinese languages/dialects. To illustrate the point, suppose

that an imaginary atonal Dialect A of proto-Chinese borrowed the word “massage” from

French and thus pronounced it with a primary stress on the first syllable and a secondary

stress on the second syllable (in terms of English stress), and that another imaginary atonal

Dialect B of proto-Chinese borrowed the same word “massage” through English and thus

pronounced it with an English accent, i.e. with an unstressed first syllable and a stressed

second syllable. Should the phonological reduction as hypothesized take place, the word in

both dialects would be left with the same syllable [ma] but different tones reflecting the

original pitch contours due to the different stress patterns. Specifically, Dialect A may end

up having a high level tone (if only the contour of the first syllable is retained) or a high

falling tone (if the transition into the next syllable is also included), so the product is

somewhat like a modern Chinese 妈 ma1 or 骂 ma4; similarly, the product in Dialect B could

have a low pitch or falling-rising contour like that of 马 ma3. Different systematic

segmentations and interpretations of stress contours in the production of tones may give

rise to similar tonal systems under the same hypothetical model, which could account for some

cross-language or dialectal variations.

5 - Conclusion

CITGM stood out as the most dominant tonogenesis paradigm partly due to the lack of other

satisfactory explanations (Jacquet: 20), and as I have pointed out, it still lacks experimental

support (2.1) as well as explanatory power (3.1-3.3). I have in turn suggested other possible

sources of linguistic tone, including the tonal default view (4.1), intonation (4.2), and stress

(4.3). These are only preliminary speculations on the subject and definitely require further

examination, but we should certainly avoid hasty conclusions. While searching for

linguistic patterns and universals is definitely necessary for a better understanding of

language, resorting to only one model to account for the wide variety of tonal systems in

the world’s languages may eventually prove unsuccessful. If every word has its own history,

then every current tonal system may also have developed differently, and so may each tone.


6 - References

Abramson, A.S. “The Plausibility of Phonetic Explanations of Tonogenesis.” From Traditional

Phonology to Modern Speech Processing: Festschrift for Professor Wu Zongji’s 95th

Birthday: 17-29. Beijing: Foreign Language Teaching and Research Press (2004).

Chen, Matthew Y. Tone Sandhi: Patterns across Chinese Dialects. Cambridge University Press

(2000).

Crystal, David. The Cambridge Encyclopedia of Language. Second Edition. Cambridge

University Press (1997).

Hombert, Jean-Marie. “Consonant Types, Vowel Quality, and Tone.” Tone: A Linguistic Survey:

77-111. Academic Press (1978).

Hombert, Jean-Marie, Ohala, John J. & Ewan, William G. “Phonetic Explanations for the

Development of Tones.” Language, 55, 37-58 (1979).

Jacquet, Janus Bahs. “Tonogenesis in Early Chinese.” Electronic copy accessed May 2013 at

http://eithne.dk/ba.pdf.

Jespersen, Otto. Language: Its Nature, Development and Origin. Chapter XXI: The Origin of

Speech. London: George Allen & Unwin Ltd. (1922).

Laver, John. Principles of Phonetics. Chapter 15: The Prosodic Organization of Speech: Pitch

and Loudness. Cambridge University Press (1994).

Maddieson, Ian. The World Atlas of Language Structures Online. Chapter 13: Tone. Accessed

May 2013 at http://wals.info/chapter/13.

Sagart, Laurent. “Austronesian Final Consonants and the Origin of Chinese Tones.” Oceanic

Linguistics Special Publications, No. 24, Tonality in Austronesian Languages: 47-59.

University of Hawai’i Press (1993).

Yip, Moira. Tone. Chapter 1: Introduction. Cambridge University Press (2002).
