Cathryn Donohue

Fuzhou tonal acoustics and tonology

LINCOM Studies in Asian Linguistics 82

LINCOM EUROPA academic publications

ISBN 978 3 86288 522 0

The Fuzhou variety of Chinese belongs to the Min dialect group, spoken in the capital of Fujian province. It is known for its complex tonal system, 'alternating' vowels, and complicated right dominant tone sandhi. However, previous descriptions have typically been based on auditory impressions of a single speaker. This study presents the first multi-speaker acoustic quantification of the citation tones in Fuzhou. Using two male and two female speakers, mean fundamental frequency and duration data for the citation tones are presented and discussed before the data is normalized across speakers to factor out any between-speaker variation. The physiology of tone production in Fuzhou is explored through amplitude measurements, indirectly assessing the possible role of vocal cord tension (VCT) and subglottal pressure (Ps) through application of the model presented in Monsen et al. (1978) which extends the Ishizaka-Flanagan two-mass model of vocal-fold vibration. In this study, both VCT and Ps were found to be equally important for tonal production. The tonal phonology of Fuzhou is also examined. First, two major studies are reviewed (Chan 1985 and Yip 1990) before new data for the disyllabic tone sandhi is presented. Analyses of these data using two different models (Autosegmental Phonology and an approach using traditional Chinese tonal categories) are then explored and compared. All the data from the study are presented in the appendices.

FUZHOU TONAL ACOUSTICS AND TONOLOGY

Cathryn Donohue
The University of Hong Kong

For: LINCOM STUDIES IN ASIAN LINGUISTICS 82

Copyright information etc.

You have always lit the way for me with your bright smile and warm hugs,

your golden eyes sparkling with wit and humour. You have carried me along with your unfailing support, unconditional love,

your encouragement and optimism. Your gentle shepherding and voice of reason has always been

the beacon I needed on a foggy night; and our shared laughter and friendship, the sunshine in my day.

You have always believed in me, when I wasn’t even sure that I did.

This is not just for you, this is because of you. It is as much yours as it is mine.

Forever, my light, my love, my inspiration – My Mum.

Contents

Contents
    List of figures
    List of tables
Forward
Preface
Acknowledgements
Chapter 1. Introduction
    1.1 Fuzhou as a Chinese dialect
    1.2 Characteristics of the Min dialect group
    1.3 Peculiarities of Fuzhou
    1.4 Previous studies of Fuzhou
    1.5 Summary
Chapter 2. Previous analyses of Fuzhou tonology
    2.1 An overview of Fuzhou tones
    2.2 Chan’s analysis of Fuzhou tonology
    2.3 Yip’s analysis of Fuzhou tonology
    2.4 Two analyses compared
Chapter 3. Acoustic quantification of the citation tones
    3.1 Consultants
    3.2 The corpus and elicitation
    3.3 Acoustic instrumentation and mensural procedure
    3.4 Results
    3.5 Summary
Chapter 4. Acoustic characteristics of the citation tones
    4.1 Method of data interpretation
    4.2 Auditory characteristics
    4.3 Preliminary considerations
    4.4 Individual speaker results
    4.5 A comparison of the individual speaker’s results
    4.6 Normalization
    4.7 Distinctive features for Fuzhou tones
    4.8 Summary
Chapter 5. The physiology of tone production in Fuzhou
    5.1 Why investigate amplitude?
    5.2 What is amplitude?
    5.3 Received theories of F0 production
    5.4 Ar and F0 relationship
    5.5 VCT and Ps relationship
    5.6 Ar/F0 vs. VCT/Ps relationships
    5.7 Discussion
    5.8 Summary
Chapter 6. Fuzhou disyllabic tone sandhi
    6.1 Procedure
    6.2 Results
    6.3 Some views on tonological representation
    6.4 Fuzhou tonology: Autosegmental
    6.5 Fuzhou tonology: Categorical approach
    6.6 A comparison of the approaches
    6.7 Summary and conclusion
Chapter 7. Summary
References
Appendix A: Corpus: citation tones
Appendix B: Corpus: disyllabic expressions
Appendix C: Raw F0 and duration measurements
Appendix D: Mean F0 and duration measurements
Appendix E: Normalized F0 values
Appendix F: Raw amplitude measurements
Appendix G: Mean amplitude and duration
Appendix H: VCT/Ps and F0/Ar measurements
Appendix I: VCT/Ps and F0/Ar plots

List of figures

Figure 1.1. Geographical distribution of Chinese dialect groups.
Figure 1.2. A possible subgrouping of the Chinese dialect groups.
Figure 2.1. Chen & Norman: citation tones.
Figure 2.2. A list of near-minimal pairs in Fuzhou.
Figure 2.3. Chan: citation tones and tonemes.
Figure 2.4. The interaction of Register and Tone.
Figure 2.5. Yip: citation tones.
Figure 2.6. Yip: citation tones—feature assignment
Figure 3.1. Stimuli phonotactics
Figure 3.2. Narrow and wideband spectrograms of [tu] uttered by ZPW.
Figure 3.3. Spectrogram of tone 3 [pa] spoken by ZPW with non-modal phonation.
Figure 3.4. Average amplitude spectrogram on tone 1 [tu] spoken by ZPW.
Figure 4.1. WXQ: Citation tones [0–100% duration].
Figure 4.2. FM: Citation tones [0–100% duration].
Figure 4.3. LY: Citation tones [0–100% duration].
Figure 4.4. ZPW: Citation tones [0–100% duration].
Figure 4.5. WXQ mean F0 contours from 5 csec. to 95% duration.
Figure 4.6. FM mean F0 contours from 5 csec. to 95% duration.
Figure 4.7. LY mean F0 contours from 5 csec. to 95% duration.
Figure 4.8. ZPW mean F0 contours from 5 csec. to 95% duration.
Figure 4.9. Normalized tone 1
Figure 4.10. Normalized tone 2
Figure 4.11. Normalized tone 3
Figure 4.12. Normalized tone 4
Figure 4.13. Normalized tone 5
Figure 4.14. Normalized tone 6
Figure 4.15. Normalized tone 7
Figure 4.16. Mean normalized tone 1
Figure 4.17. Mean normalized tone 2
Figure 4.18. Mean normalized tone 3
Figure 4.19. Mean normalized tone 4
Figure 4.20. Mean normalized tone 5
Figure 4.21. Mean normalized tone 6
Figure 4.22. Mean normalized tone 7
Figure 4.23. Mean normalized tones in Fuzhou plotted against mean duration.
Figure 5.1. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of male dimensions.
Figure 5.2. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of female dimensions.
Figure 5.5. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 2 (FM).
Figure 5.6. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 3 (LY).
Figure 5.7. VCT and Ps plotted against equalized duration for tones 1–3, speaker 2 (FM).
Figure 5.8. VCT and Ps plotted against equalized duration for tones 1–3, speaker 3 (LY)

List of tables

Table 1.1. Min reflexes of MC ‘d’
Table 1.2. Fuzhou’s seven tones.
Table 1.3. Alternating vowel pairs in Fuzhou
Table 1.4. Previous descriptions of Fuzhou tones.
Table 2.1. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
Table 2.2. Fuzhou vowel alternations (Chen & Norman 1965)
Table 2.3. Fuzhou disyllabic tone sandhi forms (Chan 1985)
Table 2.4. Yip: Vowel alternations
Table 2.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
Table 4.1. Auditory description of the tones in Fuzhou.
Table 4.2. Phonological feature assignment for Fuzhou.
Table 6.1. Fuzhou tone sandhi forms.
Table 6.2. First syllable changes in the disyllabic tone sandhi.
Table 6.3. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
Table 6.4. Fuzhou disyllabic tone sandhi forms (Chan 1985)
Table 6.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
Table 6.6. Fuzhou tonal feature assignment
Table 6.7. Possible feature assignment to the sandhi tones.
Table 6.8. Middle Chinese tone categories/ Wenzhou tonal contours
Table 6.9. Fuzhou tones and traditional MC categories
Table 6.10. Fuzhou tone sandhi organized in terms of MC tonal categories.

1 Introduction

The aim of this book is to provide an investigation of the phonetics and phonology of tones in the Fuzhou variety of Chinese as they occur in citation and in disyllabic utterances. I present data to quantify the acoustic dimensions of fundamental frequency, duration and amplitude, and the physiological correlates of vocal cord tension and subglottal pressure for the citation tones. On the basis of the data obtained, I give an auditory account of the tone sandhi of disyllabic expressions and present the phonology of Fuzhou. The book is divided into two main parts: the first two chapters summarize relevant previous studies, while the remaining chapters present the results of my own investigations and data collection.

The remainder of chapter 1 introduces Fuzhou and its peculiarities. Chapter 2 gives an overview of Fuzhou tones based on previous work, and then presents two of the most detailed proposals for accounting for the tonological system. Chapter 3 describes the corpus used in the elicitation sessions for Fuzhou citation tones and my general methodology. Chapter 4 gives the fundamental frequency results, and in chapter 5 the amplitude is examined to see if it is possible to infer something about the way the tones are produced. Chapter 6 presents the phonetics and phonology of the disyllabic expressions, and finally chapter 7 concludes with a short summary.

1.1 Fuzhou as a Chinese dialect

Fuzhou (Fúzhōuhuà 福州話) is an Eastern Min (Mǐn Dōng 閩東) dialect of Chinese.¹,² This section provides an overview of Chinese and its subgroups in order to situate Fuzhou appropriately as a Min dialect.

According to Norman, Chinese consists of seven different dialect groups (1988:181). Each of these dialect groups likely constitutes a dialect continuum like West Germanic (Chambers & Trudgill 1980), but they are all heteronomous with respect to Standard Chinese, and possibly to a local standard as well. Yuan (1962) groups them as follows:

1 Mandarin or Běifānghuà (北方話)
2 Wú (吳)
3 Xiāng (湘)
4 Gàn (贛)
5 Hakka (Kèjiā 客家)
6 Yuè (粵)
7 Mǐn (閩)

1 I do not address the sociopolitical landscape here and use both dialect and variety throughout.
2 Throughout this book I use pinyin and characters at the first introduction of a Chinese term; thereafter I use the pinyin without the diacritics.

There are certain subdivisions within these seven groups that are sometimes treated as separate groups. For example, Jìn (晉語) is sometimes grouped together with Mandarin (as Guān 官) and sometimes listed as a separate group; Huīzhōu (徽州話) has been argued to belong to Wu, Gan or (Jiānghuái 江淮) Mandarin; and Pínghuà (平話) is sometimes used as an alternative name for the Yue dialect group, and sometimes described as an independent group in the Yue branch, or just as a dialect of Yue. Moreover, Gan and Hakka are sometimes treated as independent languages under the same main branch of Chinese. Figure 1.1 locates these dialect groups in China.

Figure 1.1. Geographical distribution of Chinese dialect groups.³

3 From Wikipedia, http://en.wikipedia.org/wiki/Chinese_dialects; accessed 3/3/13.


The dialect groups are conventionally based on the historical development of the tones from Middle Chinese. Middle Chinese (MC) was reconstructed using the Qièyùn (切韻) dictionary. This rhyming and pronunciation dictionary was compiled in 601 CE by a small group of scholars led by the poet Lù Fǎyán (陸法言) whose aim was to provide a guide to the ‘proper’ recitations of literary texts. The Qieyun records the pronunciation of Chinese characters arranged by tone and rhyme and is said to be the origin of the fǎnqiè (反切) – a way to segment the syllables into onsets and rhymes in order to include pronunciations of words without homophones. About a century before the Qieyun, it was discovered that there were four tonal categories in Chinese, traditionally called píng (平 ‘level’), shǎng (上 ‘rising’), qù (去 ‘departing’) and rù (入 ‘entering’). The dictionary was thus arranged into five volumes, two for the ping tone and one for each of the other tones. Within each volume, characters were divided into rhymes, defined by the nuclear vowel and final consonant.

This original four-way tonal distinction has developed into much more complex systems for most varieties of Chinese. The tonal splits are usually conditioned by features of the initial consonants including voicing, glottality, aspiration and prenasalization. In order to account for most of the dialects, a three-way distinction of the initial types is sufficient – these are traditionally called qīng (清), quánzhuó (全濁) and cìzhuó (次濁), literally ‘clear’, ‘fully muddy’ and ‘partially muddy’ (Norman 1973:222). These are generally interpreted to refer to voiceless, voiced obstruent and voiced sonorant respectively. It is the diachronic development of the MC voiced stops into the modern dialects that is usually the sole classificatory criterion for subgrouping. For example, the following is said to characterize a Mandarin dialect (Norman 1988:191):

MC voiced obstruents all become devoiced: Voiceless aspirate in the Ping tones and voiceless non-aspirate in the other tones. The Ru tones have been redistributed over the other tones.
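The devoicing rule just quoted can be sketched as a small function. This is only an illustration: the function name, the three-stop inventory and the tone labels are assumptions made for the example, and the redistribution of the Ru tones is not modelled.

```python
# Illustrative sketch of the Mandarin devoicing rule quoted above.
# The inventory and names below are assumptions for the example,
# not part of the source description.

# Assumed subset of MC voiced stops and their plain voiceless counterparts.
VOICED_TO_VOICELESS = {"b": "p", "d": "t", "g": "k"}


def mandarin_reflex(mc_initial: str, mc_tone: str) -> str:
    """Return the expected Mandarin initial for an MC initial.

    mc_tone is one of "ping", "shang", "qu", "ru" (labels assumed here).
    Voiced stops become voiceless aspirates in the ping tone and
    voiceless non-aspirates elsewhere; other initials are returned as-is.
    """
    if mc_initial not in VOICED_TO_VOICELESS:
        return mc_initial  # the rule only targets voiced obstruents
    plain = VOICED_TO_VOICELESS[mc_initial]
    if mc_tone == "ping":
        return plain + "h"  # aspirate, written "th" etc. as in the tables
    return plain


if __name__ == "__main__":
    print(mandarin_reflex("d", "ping"))  # th
    print(mandarin_reflex("d", "qu"))    # t
```

Under this sketch an MC *d word in the ping tone yields an aspirated initial (compare pinyin tóu ‘head’, where pinyin t is aspirated), while one in a non-ping tone yields an unaspirated initial (compare pinyin dòu ‘bean’, where pinyin d is unaspirated).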

A different approach to classification is adopted by Norman (1988:182), who bases the comparisons on synchronic data. He proposes ten features as further criteria, better illustrating, he claims, the internal relationships that hold between groups. These criteria are:

1. The third-person pronoun is tā 他 or cognate to it.
2. The subordinative particle is de (di) 的 or cognate to it.
3. The ordinary negative is bù 不 or cognate to it.
4. The gender marker for animals is prefixed, as in the word for ‘hen’ mǔjī 母雞 (literally ‘female chicken’).
5. There is a register distinction only in the ping tonal category.
6. Velars are palatalized before [i].
7. Zhàn 站 or words cognate to it are used for ‘to stand’.
8. Zǒu 走 or words cognate to it are used for ‘to walk’.
9. Érzi 兒子 or words cognate to it are used for ‘son’.
10. Fángzi 房子 or words cognate to it are used for ‘house’.


According to these criteria, the dialect groups fall into three larger groups, roughly corresponding to their geographical locations: Northern group (Beifanghua), Southern group (Kejia, Yue, Min), and the rest constituting the Central group which is understood as transitional, possessing features from both the Northern and Southern groups. The Northern group meets all of the above criteria, while the Southern group meets none. The geographical locations can be seen on the map given in figure 1.1.
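The three-way grouping described here amounts to counting how many of the ten criteria a dialect group meets. A minimal sketch, assuming each criterion can be coded as a simple yes/no value (a simplification, since real data may be gradient or mixed); the function name is hypothetical:

```python
# Illustrative sketch: grouping by the number of Norman's ten criteria met.
# Treating each criterion as a strict boolean is an assumption of this example.

def classify_group(criteria_met: list[bool]) -> str:
    """Return "Northern", "Southern" or "Central" for ten yes/no criteria."""
    if len(criteria_met) != 10:
        raise ValueError("expected exactly ten criteria")
    count = sum(criteria_met)
    if count == 10:
        return "Northern"  # meets all of the criteria
    if count == 0:
        return "Southern"  # meets none of the criteria
    return "Central"       # transitional: a mixture of features


if __name__ == "__main__":
    print(classify_group([True] * 10))               # Northern
    print(classify_group([False] * 10))              # Southern
    print(classify_group([True] * 4 + [False] * 6))  # Central
```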

1.2 Characteristics of the Min dialect group

The home of the Min dialects is southeast China: most of Fújiàn (福建) province and the northeastern corner of Guangdong. It is well known that dialects of this group differ from the other groups in their linguistic developments, exhibiting many archaisms as well as local innovations. This could be due to the relative geographical isolation: no major rivers and a very mountainous terrain would have made access in or out of this region difficult. Possibly because of this, the Min dialects are the second most distinctive (after Beifanghua) and easily characterized group of Chinese dialects.

The key diagnostic feature for determining Min dialects is the need to posit a tripartite division of the proto voiced stops: *voiced, *voiced aspirate and ‘*softened stops’ (the latter possibly arising from the influence of some type of voiced prefix, with the root consonant subsequently undergoing lenition) (Norman 1973:237). This can be observed by examining the MC voiced (quanzhuo) stops and their correspondences in Min. This is illustrated in table 1.1 (adapted from Norman 1988:229). The table compares a set of MC words with initial *d and their cognates in four Min dialects, showing that some Min varieties have three different correspondences to the MC ‘d’. The correspondence sets illustrating this with the same MC segmental forms are highlighted in bold print. Note also that the distinction is not dependent on tonal forms as some MC forms have the same tonal reflexes across dialects (as indicated by the superscript number). Nor can it be shown to be a conditioned split; it thus follows that three different phonemes were present in an earlier stage of Chinese and are still preserved in Min. The main distinction for Min is between aspirate th and unaspirate t. This two-way distinction is present in all Min dialects, but a further division of the unaspirate initial consonants into two types is only preserved in some northwestern dialects, with reflexes of voiced sonorants (or nothing). In table 1.1, Jiànyáng (建陽) is a representative of a northwestern Min dialect exhibiting the third correspondence set, as shown in the third bolded line, with reflexes for 荳 ‘bean’, 頭 ‘head’ and 脰 ‘neck’. These forms are all *d initial in MC, but correspond to an initial [t], [h] and [l] in Jianyang.

Table 1.1 shows that the major reflexes of MC ‘d’ are aspirate and unaspirate dental stops, and there is generally a high degree of correlation as to whether the initial is aspirate or not in any given word within this dialect group. In short, a Min dialect can be defined as “any Chinese dialect in which both aspirated and unaspirated stops occur in all the yang (lower register) tones, and in which the lexical incidence of the aspirated forms in any given word is in substantial agreement with that of the other dialects of the group” (Norman 1988:229).


Word (pinyin) (English)   MC       Fuzhou    Xiamen    Jianyang   Yongan
dòu ‘bean’                dǝu-     tau⁶      tau⁶      teu⁶       tø⁵
dí ‘hoof’                 diei     te²       tue²      tai²       te²
dì ‘brother’              diei:    tie⁶      ti⁶       tie⁵       te⁴
tóu ‘head’                dǝu      thau²     thau²     heu²       thø²
tí ‘weep’                 diei     thie²     thi²      hie²       the²
táng ‘sugar’              dâng     thouŋ²    thŋ²      hɔŋ²       tham²
dié ‘stack up’            diep     thak⁸     thaʔ⁸     ha⁸        thɔ⁴
dòu ‘neck’                dǝu-     tau⁶      tau⁶      lo⁶        ---
dài ‘bag’                 dậi-     toi⁶      ta⁶       lui⁶       tue⁵
dú ‘poison’               duok     tøik⁸     tak⁸      lo⁸        tau⁴
tóng ‘copper’             dung     tøiŋ²     taŋ²      loŋ²       tãw²

Table 1.1. Min reflexes of MC ‘d’

The linguistic situation in southern China is actually very complicated and rather difficult to illustrate in the typical tree diagram format, as there are many possible sub-strata. For Min, there is a possible sub-stratum of Austronesian, for Wu, a sub-stratum from Miao-Yao, and for the Yue dialect group, from Tai. Nevertheless, it is clear that Min represents a diachronically earlier stage of Chinese than that of Middle Chinese, with a tentative dialect subgrouping as shown in figure 1.2.⁴

[Tree diagram. Node labels: AC, MC, Min, Yue, Kejia, Gan, Xiang, Wu, Beifanghua. Branch labels: Southern, Northern.]

Figure 1.2. A possible subgrouping of the Chinese dialect groups.

4 AC = Ancient Chinese.

The lexicon is also a good source of Min peculiarities. It is here that we can observe the aforementioned preservations of archaisms and local innovations, where some words retain their original meanings, once used across the board, while undergoing a semantic shift in other groups. One example of this is that the Northern group underwent a semantic shift of ‘run’ → ‘walk’, but the Min dialects did not (Norman 1988:183). Another example of a semantic change is the reflex of MC ‘*tieng’, meaning a ‘three-legged cooking vessel’; this word has retained a much earlier meaning in Min, where it now simply refers to a ‘cooking pot, caldron’, whereas the Northern group now only associates it with the ritual bronze tripod common in Chinese art (Norman 1988:231).

1.3 Peculiarities of Fuzhou

Fuzhou is a typical Min dialect in terms of the aforementioned broad classificatory criteria, and is often chosen to represent the Eastern Min dialects. Fuzhou has seven lexical tones. While the various sources do not always agree on the exact pitch values reported for these tones, they may be roughly described as follows:

Tone      MC tonal category      Auditory description
Tone 1    陰平  Yin ping         high level
Tone 2    上    Shang            low level
Tone 3    陰去  Yin qu           low fall/fall-rise
Tone 4    陰入  Yin ru           low rise, final stop
Tone 5    陽平  Yang ping        high fall
Tone 6    陽去  Yang qu          rise-fall
Tone 7    陽入  Yang ru          high, final stop

Table 1.2. Fuzhou’s seven tones.

While not mentioned in previous descriptions of Fuzhou, there is a clear non-modal phonation (creaky/breathy) associated with tones 3, 4 and 6.

Fuzhou is perhaps most famous for its set of vowel alternations (e.g. Maddieson 1976a; Donohue 2011a, 2014). The realization of a vowel will vary depending on the tone with which it is realized in citation (or isolation/prepausal) form. Examples of the nature of the vowel alternations are given below in table 1.3, where the vowels are divided into two groups according to the tone they occur with. The A vowels are higher relative to the B vowels, which are lower/diphthongs. A given row constitutes a single phonological vowel, a grouping justified both historically and by the fact that the vowel differences are neutralized in sandhi position to the corresponding form from the Set A group.


Set A: Tones 1, 2, 5, 7     Set B: Tones 3, 4, 6
i                           ei
ei                          ai
u                           ou
ou                          au
y                           øy

Table 1.3. Alternating vowel pairs in Fuzhou

Fuzhou tone sandhi is right dominant, thus it is the final syllable in a given domain that remains unchanged. All of these prepausal syllables retain their citation tones and vowels. However, in sandhi (non-final) position, the tone changes and all Set B vowels become the corresponding vowel from Set A. The examples in (1)–(3) (Chao 1934:41) below illustrate how a syllable with an underlying tone from Set B (tone 3, 4 or 6) in prepausal or citation position will be realized with a vowel from Set B as shown in the (a) examples. However, when they are followed by another syllable and are thus in sandhi position, we see that their tone changes (depending on the following tone), and the vowel also changes to the corresponding variant from Set A, often characterized as a ‘raising’ of the vowels from the lower/diphthongal variant to the higher/monophthongal form.⁵

     Input tones:                      Surface form:
(1)  a. Tone 3 [21]:                   氣      [khei 21]         ‘air’
     b. Tone 3 [21] + Tone 4 [23]:     氣 壓   [khi 53 ɑʔ 23]    ‘air pressure’
(2)  a. Tone 4 [23]:                   竹      [tøyʔ 23]         ‘bamboo’
     b. Tone 4 [23] + Tone 4 [23]:     竹 節   [ty 5 ʒaiʔ 23]    ‘bamboo section’
(3)  a. Tone 6 [231]:                  護      [hou 231]         ‘protect’
     b. Tone 6 [231] + Tone 1 [44]:    護 兵   [hu 44 βiŋ 44]    ‘guards’
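The vowel side of the pattern in (1)–(3) can be sketched as a simple lookup. The following is only an illustrative sketch: the pairs are copied from table 1.3, while the function name `sandhi_vowel` and the decision to leave unlisted vowels unchanged are assumptions made for the example, not part of any published analysis.

```python
# Set B vowels (citation forms of tones 3, 4, 6) and their Set A counterparts,
# copied from table 1.3.
SET_B_TONES = {3, 4, 6}
SET_B_TO_SET_A = {"ei": "i", "ai": "ei", "ou": "u", "au": "ou", "øy": "y"}


def sandhi_vowel(vowel: str, citation_tone: int) -> str:
    """Vowel of a syllable once it stands in sandhi (non-final) position.

    Hypothetical helper: only Set B tones trigger the change, and vowels
    not listed in table 1.3 are returned unchanged (an assumption).
    """
    if citation_tone in SET_B_TONES:
        return SET_B_TO_SET_A.get(vowel, vowel)
    return vowel


# First syllables of examples (1)-(3): khei, tøyʔ, hou.
for vowel, tone in [("ei", 3), ("øy", 4), ("ou", 6)]:
    print(vowel, "->", sandhi_vowel(vowel, tone))
```

Note that the tone has to be passed in alongside the vowel: [ei] and [ou] appear in both sets, so the vowel alone does not determine whether it alternates.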

5 These examples continue to use Donohue (1992a)’s tone numbers rather than Chao’s original values to avoid confusion.

1.4 Previous studies of Fuzhou

A reasonable amount of descriptive work has been conducted on Fuzhou. Chan (1985) contains a list of many of the relevant sources and the values assigned to the tones. The tonal values from these, along with other more recent works, are included in table 1.4. One thing that all previous works share is that they are based on auditory data, or impressionistic transcriptions of pitch, and often from just one speaker. The original descriptions vary with respect to the representation of the tones, some even employing a musical note system. Chan, however, converted these values into the Chao ‘tone letters’ where the pitch contour is represented through combinations of the numbers 1 to 5, where 5 is high and 1 is low (Chao 1930). I use underscoring to indicate a syllable with an unreleased plosive in the coda (a ‘checked’ or ‘stopped’ tone), which typically corresponds to a syllable with a shorter duration, the original intent of the underscoring.

Author               Year   Tone 1   Tone 2   Tone 3   Tone 4   Tone 5   Tone 6   Tone 7
Beijing University   1962   44       31       213      23       52       242      4
Chan                 1985   44       32       213      13       51       131      5
Chao                 1933   44       22       312      24       52       242      55
Chen                 1967   44       22       312      24       52       232      5
Corbato              1945   44       21       25       24       52       232      5
Ergerod              1956   55       33       13       13       52       242      55
Lan                  1953   55       33       11       13       61       242      56
Liang                1982   55       31       213      13       53       353      55
Maccy & Baldwin      1929   44       33       13       13       53       341      4
Nakajima             1979   55       33       31       23       52       242      55
Norman               1988   55       22       13       24       41       342      55
Tao                  1930   55       31       13       34       52       342      5
Wang                 1969   5555     3333     1112     24       6--2     2342     56
Wright               1983   44       22       12       13       52       242      4
Yip                  1980   44       22       12       13       52       242      4
Yuan                 1980   44       31       213      23       52       353      4
Zhan                 1981   44       31       213      23       53       242      5
Zhang                1984   44       22       213      13       52       242      5
Zheng                1958   44       22       213      23       53       231      5

Table 1.4. Previous descriptions of Fuzhou tones.
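To make the spread of descriptions in table 1.4 easier to see, the tone-letter strings can be reduced to coarse contour shapes. The sketch below is an illustration only: the function name `contour` and the five shape labels are assumptions chosen for the example, and the sample values are the distinct digit-only transcriptions of tone 3 from the table.

```python
def contour(tone_letters: str) -> str:
    """Reduce a Chao tone-letter string (digits only) to a coarse shape."""
    levels = [int(digit) for digit in tone_letters]
    # Record the direction of each pitch step, ignoring flat stretches and
    # merging consecutive steps that go the same way.
    steps = []
    for previous, current in zip(levels, levels[1:]):
        direction = (current > previous) - (current < previous)
        if direction != 0 and (not steps or steps[-1] != direction):
            steps.append(direction)
    shapes = {
        (): "level",
        (1,): "rising",
        (-1,): "falling",
        (-1, 1): "dipping",
        (1, -1): "convex",
    }
    return shapes.get(tuple(steps), "complex")


# Distinct digit-only values reported for tone 3 in table 1.4.
for value in ["213", "312", "25", "13", "11", "31", "1112", "12"]:
    print(value, contour(value))
```

Even at this coarse level the reported values for tone 3 fall into four different shapes (dipping, rising, level and falling), while the height of the contour varies on top of that.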

There are certain uniformities that are obvious, such as tone 1 being a tone with an overall high level pitch, tone 5 a high falling pitch, and tone 7 a short tone with a final consonant and a high pitch. However, there are some differences that are hard to reconcile: tone 2 is variously described as a tone with a mid level, mid fall or low level pitch; tone 3 is described as a tone with a low dipping, low rising, low level, mid rising, or mid falling pitch. Tone 4 is clearly a tone with a rising stopped pitch and tone 6 with a convex pitch, though possibly in either the upper or lower part of the pitch range. Some even extend the 5 point scale to a 6 point scale to fully reflect their (differing) perceived pitch values. Such differences may result from a range of factors such as idiolectal, regional or social differences. Given that these descriptions are made from impressionistic data usually based on the speech of one person, it is perhaps not surprising to find such disagreements, and the differences may well reflect normal between-speaker variation. But with the speech of just one speaker and differing transcriptional skills, there is no way to know. Having to call into question the reliability and consistency of the linguists’ pitch transcriptions is the most troubling. Several major works, including Chen (2000) and Yip (2002), which aim to account for the representation of tones and the complex sandhi behavior resulting from tonal interactions that are observed in (especially) Chinese languages, rely almost exclusively on such impressionistic data as there is little else available.

Clearly one needs a way of standardizing the method of description of tones in general: a way of obtaining maximally objective descriptions. This is possible by quantifying the linguistically relevant aspects of the physical signal (e.g. Rose 1982a, 1982b, 1990a, 1990b; Zhu 1999 and others). The results obtained from an appropriately controlled instrumental investigation of the acoustic phase are not subject to as many sources of error, and would eliminate the errors resulting from possible inconsistencies in perception and transcriptions. By ensuring a multi-speaker approach and normalizing the results, one can also factor out between-speaker differences and arrive at a representation of the tonal fundamental frequency (F0) of the Fuzhou variety as a whole (e.g. Rose 1987). Yip herself notes that some of the disagreement, especially in the notation of contour tones, may “illustrate the real problem in tonal phonology posed by different field workers’ perceptions of the same facts: one man’s 24 is another man’s 35, or, more seriously, one man’s 22 is another man’s 32 and so forth. This is where instrumental work is indispensable but still almost entirely lacking” (1990: 338-339). Despite the time lapse since Yip’s observation, there are still relatively few instrumental descriptions of tone systems. Experts in the field continue to lament the lack of instrumental work on both the citation tones and the sandhi forms. Recently, Zhang writes “the most urgent and fruitful step … that Chinese tonologists should currently take is to rebuild an empirical foundation from which theoretical analyses may proceed… The field needs carefully designed acoustic studies that systematically look at the realizations of tones in tone sandhi behavior” (Zhang 2009:11–12).

1.5 Summary

This chapter introduced Fuzhou as a Chinese dialect and identified some key characteristics peculiar to Fuzhou and the Min dialect group to which Fuzhou belongs. The next chapter introduces Fuzhou tones in the context of previous tonological analyses.


2 Previous analyses of Fuzhou tonology

This chapter compares two different accounts of the tonology of Fuzhou: Yip (1980) and Chan (1985). Both of these approaches fall within the autosegmental framework, differing primarily in whether or not the author uses the concept of register. I then present an impressionistic summary of the tones in Fuzhou and their resulting sandhi forms before finally evaluating the pros and cons of the different approaches to Fuzhou tonology.

2.1 An overview of Fuzhou tones

This section serves to introduce the tone sandhi phenomena in Fuzhou, using the data presented in Chen & Norman (1965), taken from Chan (1985) who summarizes their work.

2.1.1 Citation tones

The seven citation tones in Fuzhou are represented by Chen & Norman as follows:

Tone 1  /55/     Tone 5  /52/
Tone 2  /22/     Tone 6  /342/
Tone 3  /12/     Tone 7  /55/
Tone 4  /24/

Figure 2.1. Chen & Norman: citation tones.

As previously mentioned, the underlining indicates a stopped tone. There is another tone, [35], which occurs only in sandhi forms. Below in figure 2.2 are some examples of the tones in citation form.

Tone 1  巴  [pa]   [affix]           Tone 5  [pa]   ‘climb’
Tone 2  把  [pa]   ‘handle (n.)’     Tone 6  [ta]   [ordinal prefix]
Tone 3  霸  [pa]   ‘tyrant’          Tone 7  [paʔ]  ‘white’
Tone 4  百  [paʔ]  ‘hundred’

Figure 2.2. A list of near-minimal pairs in Fuzhou.


2.1.2 Disyllabic tone sandhi

As noted, tone sandhi in Fuzhou is right dominant, so it is the final syllable in a given sandhi domain which retains its own citation value, and determines the observed form of the immediately preceding tone. The set of tonal values in non-final, or sandhi, position in disyllabic utterances in Fuzhou is illustrated in table 2.1.

Second syllable →→   Tone 1 [55]     Tones 5, 7 [52, 5]     Tones 2, 3, 4, 6 [22, 12, 24, 342]

First syllable ↓↓                                           Resulting sandhi tones (on first syllable)
Tone 1 [55], Tone 3 [12], Tone 6 [342], Tone 4 (*<h) [24]   55   52
Tone 5 [51], Tone 7 [5]                                     22
Tone 2 [22], Tone 4 (*<k) [24]                              22   35

Table 2.1. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)

According to Chan, the sandhi forms in table 2.1 are reasonably representative of the other sources consulted in her extensive survey, the major deviation being whether tone 2 is grouped together with tones 3 and 4 in final position, or whether it determines its own sandhi tones for immediately preceding syllables. Another point needing clarification is the split in tone 4 in its sandhi behavior. This tone patterns with two different groups of tones when changing to its sandhi tone. The most widely accepted explanation for this is that the final glottal stop has come from different consonants historically. Here, ‘*<h’ represents a glottal stop in proto-Min, and ‘*<k’ is the result of the merger of all other proto-Min stops (*p, *t, *k) syllable-finally (Chan 1985:150). This, however, is by no means a general consensus. It has also been suggested that the ‘k’ group represents an earlier development of the stops, before weakening to a glottal stop. The distinction made between the glottal stop and the ‘k’ is said to have been maintained in the literary readings of characters until quite recently. Now, Fuzhou speakers do not always make this distinction, and often use a glottal stop as for colloquial readings.

2.1.3 Trisyllabic tone sandhi

The penultimate syllable will change according to the rules for disyllabic tone sandhi, and the antepenultimate syllable will have a low-pitched tone; that is, [22] or [2]. There are, however, a few restrictions:


If syllable 2 has as its original (input) tone, tone 5 or tone 7 and
(i) the first syllable is either tone 2 or tone 4(<k), normal disyllabic tone sandhi will take place with the sandhi tone of the first syllable being determined by the changed form of the penultimate;
(ii) the first syllable is any of the other tones (1, 3, 4(<h), 5, 6, 7), the sandhi tone will be that of the ‘eighth’ tone [35].

That is, if the second syllable is underlyingly tone 5 or tone 7 (note that these tones constitute the third natural class of sandhi tones in the above table), there are two possibilities of output for the antepenultimate syllable, depending on whether or not that first syllable is either tone 2 or 4(<k) (the second natural class of sandhi tones). If the first syllable is one of these tones then normal disyllabic tone sandhi will occur between the first syllable and the penultimate, the changed form of the penultimate syllable being the input value for determining the output sandhi tone of the first syllable. If the first syllable is not one of these tones, then it will change to the so-called eighth tone, [35].

2.1.4 Quadrisyllabic tone sandhi

The maximum tone sandhi domain in Fuzhou is four syllables. Expressions of five or more syllables tend to be broken down into smaller domains. Quadrisyllabic tone sandhi simply involves any pre-antepenultimate syllables having a low tone: [22] or [2].

2.1.5 Vowel alternations

Fuzhou has tonally conditioned morphophonemic vowel alternations. Specifically, the tones divide into two natural classes on the basis of vowel alternation under tone sandhi. In one of the classes, group A (Tones 1, 2, 5, 7), the vowels do not alternate. In group B (Tones 3, 4, 6), however, the vowels do alternate. That is, when one of these tones changes to any sandhi tone, the vowel will also change to a form like that which occurs on the A-group tones (cf. section 1.3).

How many vowel pairs are involved in these alternations varies between sources; however, the environments in which they occur are consistent throughout the literature. The vowels are considered to have been either raised in the environment of the A-group tones, or lowered in the context of the B-group tones, but only in citation/prepausal position. Chen & Norman list the following finals as undergoing vowel alternations. In table 2.2 the second of the two vowels in the diphthongs is always the shorter of the two, more like a glide (Chan 1985:401). The final ‘ŋ’ is intended to cover all syllable types with a final consonant, thus both the velar nasal and the glottal stop.


Group A:             Group B:
Tones 1, 2, 5, 7     Tones 3, 4, 6
i                    ei
iŋ                   eiŋ
eiŋ                  aiŋ
u                    ou
uŋ                   ouŋ
ouŋ                  auŋ
y                    øy
yŋ                   øyŋ
øy                   œy
ɛ                    a

Table 2.2. Fuzhou vowel alternations (Chen & Norman 1965)

2.2 Chan’s analysis of Fuzhou tonology

One approach to tonology first proposed by Goldsmith (1976) is Autosegmental Phonology. The central idea in this approach is that features occur on separate ‘tiers’, allowing for interactions of non-adjacent features that are considered adjacent on a specific tier. This readily accounts for a lot of ‘long distance’ phonological phenomena, and an important consequence for tones is that tones, too, can be considered autosegmental, existing on their own tier. Both analyses of Fuzhou tonology presented in detail here are couched within Autosegmental Phonology.

2.2.1 Citation tones

Chan assigns features to the tones based on the contour and pitch height and sandhi behavior of the tone. She first considers the contour of the tones in Fuzhou: falling, rising, or rising-falling, claiming that the overall pitch range is secondary, with the key contrast being just high or non-high. Chan thus posits just one distinctive feature in the tonal phonology of Fuzhou, namely [highpitch]: [+highpitch] yields an H toneme, and [–highpitch] yields an L toneme. Chan also claims as evidence the varying transcriptions gathered from the different sources. For instance, the fact that tone 4 has been recorded as [13] and [24] she claims indicates that the tone can vary. It is possible that the differences may represent recent developments in the dialect. However, whether this is a fact to be accounted for or merely the result of within- or between-speaker differences, regional differences, or language change in progress remains an open question.

The pitch values in Chan’s data differ slightly from those given above, so I first list the values Chan uses along with the systematic tonemes that she proposes as their underlying representation. The ‘@’ that appears with tone 2 and one of the representations for tone 4 is meant to signify a ‘floating’ H tone.


Tone 1  /44/   H            Tone 5  /51/   HL
Tone 2  /32/   L@           Tone 6  /131/  LHL
Tone 3  /213/  LH           Tone 7  /5/    HL
Tone 4  /13/   L@ or LH

Figure 2.3. Chan: citation tones and tonemes.

That tones 1 and 5 be assigned /H/ and /HL/ is noncontroversial. Tone 2 is described as a low/mid or low-fall tone, rising in some sandhi contexts, where it is also sometimes a high-level, hence the assignment of /LH/. Chan claims that diachronically the rising pitch can be analyzed as a vestige of an original glottal stop final in all tone 2 words; that while the glottal stop has subsequently been lost, the contour resulting from it remains. She actually refers to it as being historically /LH/, but if the rise is a phonetic result of the syllable’s final consonant, it is not clear why the H was part of the tonemic representation historically. Chan explains that the evolution of this tone involved a rule delinking the H toneme, leaving it as a floating tone, to be relinked in certain sandhi contexts. She notes that a rising tone is unusual for a sandhi tone, that they are usually level or falling, but that the presence of one in Fuzhou may be accounted for on historical grounds due to this floating tone.

Tone 3 is allocated the value /LH/. In doing this Chan ignores the initial dip, yet she describes its fall as being longer in duration in citation tones, and about equal to that of the rise component in combinations with other syllables ending in tone 3. She states that because it behaves as though it were initially low in sandhi contexts, the initial fall must be insignificant. This, however, crucially hinges on the assignments of tonemes that have already been made, so is somewhat circular in reasoning. Chan continues by pointing out that in preterminal position, “it behaves as if it is tone 1a [=tone 1]” (1985:120, emphasis mine); /LH/ then accounts for this by deletion of the initial L as an early sandhi rule, a rule matching another posited to account for all tones like it, LH(L). Chan continues by stating that the rise is very audible, “sometimes the most salient feature” (1985:121), but then notes that due to a slightly creaky phonation type associated with this tone “the rising pitch that follows tends to be very slight, very short, and scarcely audible” (1985:124).

The assignment of /LHL/ to tone 6 does not need much explanation other than to note that because it patterns with tone 3 /LH/, it, too, requires that early deletion of initial L rule. Tone 4 is assigned two values: /L/@ and /LH/. The reason for it having a /LH/ representation is that it is identical in pitch shape and pitch level to the initial rising part of tone 6. The split, as noted above, is the result of two different sources for the final glottal stop, namely a proto-Min *-k for those with the floating high tone and a *-ʔ in the case of the underlying representation of /LH/. Further evidence for this split is that the [ʔ] is deleted in preterminal position in the former case and optionally in the latter.

Tone 7 is given the representation /HL/. It is described as being phonetically very short and very high with perhaps a slight rise. That it behaves like tone 5 in preterminal position is the justification for why it should be assigned the same tonemic representation. Chan sees this as non-problematic, as their syllable structures serve to distinguish them, with tone 7 and not tone 5 having a final glottal stop. Chan also suggests that, because a following atonic syllable will have roughly the same phonetic value as if it were following tone 5, there is all the more reason to assume that tone 7 is in fact an underlying /HL/. She suggests that there be a glottal rule delinking final Ls, leaving them as floating tones in prepausal position, or linking them to the atonic syllable that might follow.

2.2.2 Disyllabic tone sandhi

Chan describes the rules for disyllabic tone sandhi, first by illustrating the pitch values associated with the changed forms, then by using features as indicated by the H and L features in table 2.3 below.

Second syllable →→   Tone 1 [44] H     Tones 5, 7 [51, 5] HL     Tone 2 [32] L     Tones 3, 4, 6 [213, 13, 131] LH(L)

First syllable ↓↓                                                          Resulting sandhi tones (for first syllable)
Tone 1 [44] H, Tone 3 [213] LH, Tone 6 [131] LHL, Tone 4 (*<h) [13] LH     44   33   53   51
                                                                           H    H    HL   HL
Tone 5 [51] HL, Tone 7 [5] HL                                              33   22
                                                                           L    L
Tone 2 [32] L@, Tone 4 (*<k) [13] L@                                       22   13   44
                                                                           L    L    H    LH

Table 2.3. Fuzhou disyllabic tone sandhi forms (Chan 1985)

The rules that Chan posits to account for the output sandhi tones are given below. Note that W stands for ‘word’ and V for ‘vowel’.

1. Final L deletion rule. (Diagram: W | V over L H L, with the final L ⟶ Ø.)

2. Initial L deletion rule. (Diagram: W | V over L H, with the initial L ⟶ Ø.)

3. /HL/ deletion rule. (Diagram: W | V over H L, with H L ⟶ Ø.)

4. L-spreading rule. (Diagram: V V over H L.)

5. Sandhi H-docking rule. (Diagram: V V over L, H L.)

6. /LH/ dissimilation rule. (Diagram: V over L H, with Ø, before L H.)

To account for the pitch-lowering of the /H/ sandhi tones preceding /HL/ tones, Chan posits some ‘phonetic tone sandhi rules’ to account for the following:

V   V             V   V
|   |      ⟶     |   |
H   H L           M   H L

The rule proposed to account for this is:

Phonetic: H-delinking and lowering rule. (Diagram: V V over M, H L.)

There is also the Obligatory Contour Principle (OCP), to be applied after the sandhi rules but before the phonetic rules apply. It states that any contiguous identical (auto)segments within a sandhi domain must be collapsed. Two further phonetic rules formulated are:


Phonetic: H lowering. (Diagram: V over L H, with M.)

Phonetic: L raising. (Diagram: S | V over L H, with M.)

Chan states that the order in which these rules must apply is quite strict. First the sandhi rules 1–6, then the OCP, and finally the phonetic tone rules. She mentions that there is another rule which must be ordered before all of the above, which is the Final H-Docking Rule, linking the @ to stressed tone 4<k syllables. Some examples of the application of these rules are included at the end of the section.
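The strict ordering described above (sandhi rules, then the OCP, then the phonetic rules) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: tonemes are modelled as a flat list of strings, the OCP is taken in its usual sense of collapsing adjacent identical tonemes, and `apply_ocp` and `derive` are hypothetical names. Chan's individual rules are not implemented here; they would be supplied as functions by the caller.

```python
from itertools import groupby


def apply_ocp(tonemes):
    """Collapse each run of identical adjacent tonemes into a single one."""
    return [toneme for toneme, _run in groupby(tonemes)]


def derive(tonemes, sandhi_rules=(), phonetic_rules=()):
    """Apply the rule blocks in the order stated in the text.

    sandhi_rules and phonetic_rules are sequences of functions that take
    and return a list of tonemes (placeholders for Chan's actual rules).
    """
    for rule in sandhi_rules:
        tonemes = rule(tonemes)
    tonemes = apply_ocp(tonemes)
    for rule in phonetic_rules:
        tonemes = rule(tonemes)
    return tonemes


# With no rules supplied, only the OCP step has an effect.
print(derive(["L", "L", "H"]))
```

Keeping the OCP as a fixed step between the two rule lists makes the ordering explicit: a phonetic rule can never see an uncollapsed sequence, and a sandhi rule can never see the output of a phonetic rule.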

2.2.3 Trisyllabic tone sandhi

Chan generates the forms, which do not deviate in any way from Chen & Norman’s data, by using the disyllabic tone sandhi rules and just one more: the Antepenultimate Tone Lowering Rule, which is ordered first and represented below.

Antepenultimate Tone Lowering Rule

     V     V     V
     |     |     |
L ⟵ T1    T2    T3

Condition: T2 ≠ /HL/

If the tone on the second syllable in a trisyllabic sandhi domain is /HL/ then there are additional rules to account for the exceptional patterning in sandhi on the first syllable. I refer the reader to Chan 1985:326 for further explanation.

2.2.4 Quadrisyllabic tone sandhi

There are no differences to Chen & Norman’s data. Any syllables preceding the antepenultimate are uniformly L.


2.2.5 Vowel alternations

Chan recognizes three possibilities to account for these:

1. The A group is basic and the B group is generated from these for tones 3, 4, 6 in prepausal context.
2. The B group is basic and both groups A and B change in sandhi contexts.
3. Both A and B types are represented underlyingly and only the B group needs to undergo change for sandhi contexts.

The most noticeable difference between these proposals is that 1 implies vowel lowering in prepausal position, while 2 and 3 involve vowel raising in sandhi contexts. One may draw on many areas to help decide between the two possibilities. One such area is a possible correlation between vowel height and pitch height, the implication being that tones may affect the vowels. This was first noted by Wang (1968:10-11) as quoted at length below.

F0 is consistently influenced by intrinsic factors, all of which have either an acoustic or physiological motivation. … sometimes a small effect conditioned by intrinsic factors has grown in time into significant differences that participate in the morphophonemic alternations in the language. For example, high vowels are known to raise the F0 by a small increment … In a language like Foochow Chinese, this relation between F0 and vowel height has grown into morphophonemic alternations. Each time a tone with a lower F0 changes to one with a higher F0, the vowel also changes to a higher vowel. Since the tone sandhi is a more general phenomenon (i.e. there are vowel qualities like [a] which are not affected by the sandhi), it is descriptively more economical to let the sandhi environment condition the vowel raising rule. From the viewpoint of the physiology of speech production, however, it is perhaps more likely that the vowel raising brought about the tone sandhi, which was then generalized even to those vowels that do not yet get raised.

However, much of the work done on the inherent pitch/vowel correlations suggests that the changes in vowel quality due to the F0 are not of the same magnitude as those observed in Fuzhou (e.g. Maddieson 1976a; Zee 1980).

2.2.6 Sample derivations

Following are some examples illustrating the application of rules to arrive at the correct output tones following Chan’s rules.

1. Tone 5 + Tone 6 (平路 [píng lù] ‘level track’) [51] + [131] ⟶ [22 131]

Underlying:      V          V
                 H L        L H L

HL Deletion:     V          V
                 Ø ⟵ H L    L H L

WFC:             V          V
                    L H L

H Lowering:      V          V
                    L H L
                      M

Phonetic:        V          V
                    L M L

2. Tone 2 + Tone 6 (表弟 [biǎo dì] ‘cousin’) [32] + [131] ⟶ [44 131]

Underlying:          V          V
                     L @        L H L

Sandhi H docking:    V          V
                     L H        L H L

LH dissimilation:    V          V
                     Ø ⟵ L H    L H L

H lowering:          V          V
                     H          L H L
                                  M

Phonetic:            V          V
                     H          L M L

3. T 5 + T 2 + T 1 (揚子江 [yángzǐjiāng] ‘Yangzi river’) [52] + [22] + [55] ⟶ [22 22 55]

Underlying:                        V        V      V
                                   X        L @    H

Antepenultimate Tone Lowering:     V        V      V
                                   L ← X    L @    H

OCP:                               V        V      V
                                   L        L @    H

2.3 Yip’s analysis of Fuzhou tonology

Before presenting Yip’s representations of the citation tones and the data from which she worked in order to obtain the sandhi tones, I first outline the concept of register, an integral part of the tonal geometry employed by Yip, and one which has been subsequently recognized as one of the most capable of handling tonal representation and tone sandhi (e.g. Chen 2000).


2.3.1 The concept of Register

Yip (1990) adopts the idea that tones are represented by two features which bear a hierarchical relation in that one, [upper], is dominant and splits the pitch range into two registers, and the other, [high], is subservient and further sub-divides each register. So a tone, while realized phonetically as pitch, is phonologically represented by values for each of these features: Register [±upper] and Tone/melody [±high] (written H or L for short). Register and Tone interact to define four pitch levels, as illustrated in figure 2.4 below.

Register     Tone
+upper       +high   H
             –high   L
–upper       +high   H
             –high   L

Figure 2.4. The interaction of Register and Tone.

Each of the two features forms a separate autosegmental tier, and is therefore subject to the well-formedness condition. However, only Tone occurs in sequences underlyingly. Register remains constant over the morpheme, thus restricting the tonal inventory to no more than two of any given contour (e.g. HL/falling or LH/rising).
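The two-feature system can be illustrated with a short sketch. The numbering of the four levels from 1 (lowest) to 4 (highest) and the names `pitch_level` and `inventory` are assumptions made for the example; only the feature values and the restriction of Register to one value per morpheme come from the description above.

```python
from itertools import product


def pitch_level(upper: bool, high: bool) -> int:
    """Map [±upper] and [±high] to one of four levels (1 = lowest, 4 = highest)."""
    return 2 * int(upper) + int(high) + 1


for upper, high in product([True, False], repeat=2):
    register = "+upper" if upper else "-upper"
    tone = "H" if high else "L"
    print(register, tone, pitch_level(upper, high))

# Register is constant over the morpheme, so each melody (tone sequence)
# combines with exactly one of the two register values.
melodies = ["HL", "LH"]
inventory = list(product(["+upper", "-upper"], melodies))
print(inventory)
```

Because only the melody may vary within a morpheme, each contour shape appears exactly twice in the enumerated inventory, once per register, which is the restriction to at most two falling and two rising tones noted above.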

2.3.2. Vowel alternations

The vowels that Yip aims to account for are those taken from Wang (1969), given in table 2.4.

Group A:                 Group B:
Tones 1, 2, 5, 7, 8*     Tones 3, 4, 6
i                        ei
ei                       ai
y                        øy
œ                        ǝy
u                        ou
ou                       ɔu

Table 2.4. Yip: Vowel alternations

Yip emphasizes not the alternations of the vowels, but rather the correspondences between the vowels and tones. She also observes that these correspondences are preserved even in tone sandhi, such that if a morpheme bearing a tone from group B “changes into” (1990:276) one of the tones from group A, the vowel will also change. While a slightly different viewpoint from Chan’s, it does just so happen to be the case that all the sandhi syllables have group A vowels (as well as syllables with the ‘eighth’ tone, [35], that only occurs in sandhi).


2.3.3. Citation tones

The citation tones that Yip assumes in her work on Fuzhou are given below in figure 2.5.

Tone 1  /44/     Tone 5  /52/
Tone 2  /22/     Tone 6  /242/
Tone 3  /12/     Tone 7  /4/
Tone 4  /13/

Figure 2.5. Yip: citation tones.

Yip uses as criteria for the assignment of underlying representations to the tones the phenomenon of vowel and tone alternations, as well as the natural classes observed in the sandhi behavior. Let us first consider the sandhi data that Yip uses in order to understand the feature assignment.

Second syllable →→   Tones 1, 5, 7 [44, 52, 4]     Tone 2 [22]     Tones 3, 4, 6 [12, 13, 242]

First syllable ↓↓                                  Resulting sandhi tones (for first syllable)
Tone 1 [44], Tone 3 [12], Tone 6 [242], (13)       44   52
Tone 5 [52], Tone 7 [4]                            22
Tone 2 [22], Tone 4 [13]                           22   35   4

Table 2.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)

Yip states that the sources vary as to what the sandhi tone is when /52/ is the first syllable, followed by any of tones /22, 12, 13, 242/. It is represented here as [22], but has also been represented as [12]. If the value is [22], it fits easily into group A’s tones, but if it were [12], it should belong to group B, thus we would expect the lower vowel alternants, which we do not see. Because of this, Yip posits a rule raising the vowels in the environment of any of the tones from group A. Acknowledging that this is clearly an unsatisfactory environment, requiring further definition, Yip decides that the context for these changes be [+upper] register (with some segmental restrictions). That [+upper] register be proposed to define the tones in group A is not controversial, with the exception of tone 2: [22]. Yip’s justification for this is that the tones seem to predominantly lie in the lower half of the tonal configuration. When viewed in this way, what is notated as a /22/ could just about be the mid-level tone, which is (presumably, now) more significant than its notation. In the proposed phonological system [+upper, L] is immediately adjacent to [–upper, H], so the mid tone could feasibly be represented as either, and in this case, the first representation is chosen. The feature assignment for the citation tones is given in figure 2.6.

[+upper]           [–upper]
44      HH         12    LH
52, 5   HL         13    LL
22      LL         242   LHL
35      LH

Figure 2.6. Yip: citation tones, feature assignment

Because [44], [4] and [52] group together as a context for sandhi, Yip assumes that they have the same first toneme, necessarily a H due to [52] being a falling tone. She also observes that [44] differs from [52] and [4] in its sandhi forms, and that [52] and [4] always “merge”, thus forming her reasoning for assuming that these tones have the same second tone feature, L. [44] is then HH (leaving aside issues of the OCP that would contemporarily require that HH be H).

Yip offers an account of the motivation behind her assignment of features for those tones whose feature assignment may seem somewhat open to debate. From its features [4] may seem to be a falling tone, but is clearly phonetically high and level. Yip claims that there is a late rule of L-tone raising applying on stopped syllables only. If it were to be underlyingly a high level, the sandhi rules would need segmental as well as tonal contexts. Tone [13] is again split according to its origins as having glottal or velar coda. The former set undergoes the same sandhi as [12] suggesting a final H, although some sources report this as [11], conversely suggesting a final L. If it has a final L its sandhi behavior is indeed similar to [4] in that the phonologically final L is realized as a phonetic high tone (albeit in different registers). Thus Yip posits a rule raising the tone in the presence of a final glottal stop:

L ⟶ H / _____

[22] is also sometimes represented as falling, which calls into question the assignment of [+upper] register. Yip’s main justification for this is the vowel alternations. Moreover, if it were [–upper], it would necessarily have to be H to distinguish it from the other [–upper] tones, grouping it wrongly as a context for sandhi. She stresses that a feature like Upper is a relative one, and that [22] has a higher onset pitch than the other [–upper] tones.

Unfortunately Yip does not have her own data, but rather must rely on data gathered from other sources, which as she notes may introduce nonexistent differences of pitch due to the different researchers. When trying to account for the actual pitch values of the tones, Yip does not change the underlying representation to reflect the phonetic form; rather she follows standard phonological practice and posits rules which will enable the correct phonetic form to surface. However, Yip first describes Register as a phonetically based feature which ‘bifurcates the pitch range’, yet it is largely used as a phonological feature, which makes it necessary to justify the feature assignment of [+upper] to tone [22]. For more discussion on the definition of register, see Donohue (1992b).


Yip’s observation of the vowel alternations differs from previous ones in that she views them as having correspondences with tones, almost like co-occurrence restrictions: should the tone change from one in Group B to one in Group A, then the vowel would necessarily change as a result of this. Chan seems to think that it is the vowels which alternate. The tones, in some way or another, provide the context for the rule which may be formulated like any other phonological rule, such as palatalization. The differences in their viewpoints may be likened to the difference between classical phonemics and prosodic phonology: one views the difference in a static way, the other more as dynamic changes with a necessary cause and effect.

2.3.4 Disyllabic tone sandhi

As the table illustrating the pitch changes in sandhi position has already been given, I merely present the rules formulated to account for these, which, like Chan’s, must be ordered in their application.

(6)  LHL simplification      L ⟶ Ø / LH ___
(7)  [–upper] deletion       [–upper] ⟶ Ø / [+closed glottis]
(8)  Glottal stop deletion   ʔ ⟶ Ø / [+upper]
(9)  HL deletion             HL ⟶ Ø
(10) L dissimilation         L ⟶ H / L ___ L
(11) T deletion              T ⟶ Ø / [–upper]
(12) L spreading             [association-line diagram: $ $ over (H) H L]
(13) Register raising        R ⟶ [+upper]
(14) Vowel raising           V[αlow] ⟶ [–low, αhigh] / [+upper]


One last point of comparison: whereas Chan relied on diachronic developments as evidence for the occurrences of the glottal stop, Yip prefers to account for it solely from within her synchronic phonology.

2.4 Two analyses compared

The two analyses are based on different data, so it is hard to make a direct comparison. However, I first present a comparison of the way in which tonemes and features are assigned to the citation tones and then the rules given to account for the tone sandhi.

Chan claims that pitch height is not important in Fuzhou because there are no two tonal contours contrasting only in pitch height. This may, however, be a side effect of working with only one informant and not actually representative of the tonal possibilities in the whole variety of Fuzhou. Chan claims that phonetic output is part of her criteria for choosing a particular underlying representation, yet her account of tone 3 is not so straightforward: of this (allegedly) fall-rise tone, she claims that the fall component is longer in duration, then that the rise is more ‘salient’, and finally that the rise is scarcely audible.

On the other hand, Yip assigns features to the tones on a purely phonological basis. She first specifies that Register (initially described as bifurcating the pitch range) is the feature that will account for the vowel alternations. Then, within the domain of register, she assigns tone features, drawing on the contexts and outputs of sandhi domains and the consequent groupings of tones to guide her. Yip does not claim to have a phonetically transparent analysis. Instead, she posits rules to derive the observed phonetic forms. How abstract one makes an analysis varies. It is perhaps good not to be overly influenced by the phonetic form when the phonetics consists of the (reported) pitch transcriptions of a single speaker.

While the sandhi rules that Chan and Yip propose are necessarily different, many of the rules are equivalent in formulation and/or effect. This is perhaps the result of working within the same general framework of generative phonology, a model which tries to account for surface morphophonemic alternations derivationally by transforming a unique underlying representation of a morpheme into its surface forms by applying phonetically natural rules. Yip has nine straight rules to account for all vowel alternations and sandhi forms. Chan separates the rules into phonology-specific, general, and phonetic rules, and has six sandhi rules, the Obligatory Contour Principle and two phonetic rules, in addition to rules to account for the vowel alternations.

A flaw in both approaches has to do with language universals. Chan claims no need to represent more than one feature for distinguishing pitch height, as she considers it to be unimportant for Fuzhou phonology. However, this is a very language-specific claim. It is of course desirable that a proposal be as general as possible to be able to capture a universal representation of possible contrasting pitch heights cross-linguistically (e.g. Hyman 1986).

Yip manages to do this with her Register feature. However, this feature, originally proposed to account for different tonal possibilities, is used to capture a segmental property of Fuzhou. It is perfectly acceptable within the framework to use tonal phonological features to capture natural classes in the segmental phonology, but not having a rigorous definition of a feature prevents it being universally comparable.

3 Acoustic quantification of the citation tones

This chapter describes the methods and techniques I designed to obtain the desired data, and

the assumptions justifying their use.

3.1 Consultants

One of the most important selection criteria for my informants was that they come from the city of Fuzhou, rather than somewhere outside China where Fuzhou is spoken, such as Malaysia. This was done with the intention of controlling for regional variation that was evident in a pilot study. I was able to find four speakers who were remarkably homogeneous with respect to factors that might influence their speech: they were all from the city of Fuzhou with roughly equivalent age, educational, and socioeconomic backgrounds.⁶ I outline relevant personalia of each of the speakers in the following sections.

3.1.1 Speaker 1: WXQ

WXQ is male and at the time of the recording was aged 30 years. He is a native of Fuzhou city, and Fuzhou is his native language. It was the most frequently spoken language at his home, though Mandarin was also used occasionally. At school, the medium of instruction was Mandarin, but Fuzhou was spoken between students outside the classroom. He completed an undergraduate degree (where Mandarin was also the medium of instruction) before moving to Australia to pursue a PhD program, where he has mostly spoken Mandarin at home.

3.1.2 Speaker 2: FM

FM is female and was 30 at the time of recording. Fuzhou was her first language and was spoken at home by her mother and guardians (grandparents). Mandarin, however, was also acquired at an early age and occasionally spoken at home. She spent most of her life in the city of Fuzhou, where she completed all schooling. Although the language of instruction was Mandarin, she also said that Fuzhou was most commonly spoken outside the classroom. After finishing school, she went on to study at the Fujian Medical College for five years. She arrived in Australia in February 1991 and has mostly spoken Mandarin at home since then.

6 Recall that the data was collected in 1992, so the data and results are relevant to the variety of Fuzhou as it was spoken in 1992.


3.1.3 Speaker 3: LY

LY is another 30-year-old female native to Fuzhou city. Fuzhou was her first language and the dominant language at home, though Mandarin was also used on occasion. She spent most of her life in the city of Fuzhou where she completed all her schooling. Her experience was the same as the others: Mandarin was the medium of instruction, but the language of her peers was Fuzhou. She moved to Australia in 1987 and has since then mostly spoken Mandarin at home.

3.1.4 Speaker 4: ZPW

ZPW is male, aged 35, born in the city of Fuzhou. Fuzhou is his mother tongue and the dominant language at home, but Mandarin was also occasionally used. He never left Fuzhou during his youth, completing all schooling in the same city (with Mandarin as the medium of instruction). After finishing school he continued his studies for four years at Zhejiang University, where the medium of instruction was also Mandarin. Since moving to Australia in 1989 to pursue a PhD at the ANU he has mostly spoken Mandarin at home.

3.2 The corpus and elicitation

The corpus was designed to fulfill several objectives, including controlling for any intrinsic effects between segmentals and tone. It has been shown (Hombert 1978; Maddieson 1976; Rose 1990) that differing initial fundamental frequency perturbations result from different prevocalic consonant types, namely that there will be a higher fundamental frequency after a voiceless consonant, and a lower fundamental frequency after voiced consonants (Lehiste 1970:68). The corpus was designed to restrict the syllable types represented to those with unaspirated obstruent initial consonants and monophthongal vowel finals (where possible), thus also eliminating the undesirable effects of a nasal coda (Rose 1981, 1990). This is largely following Rose (1981) whose investigations for a similar study claimed that these particular specifications “interfered the least with the way the source features are reflected in the oral output” (p.92).

There are also intrinsic effects of vowel quality on fundamental frequency (F0) to be considered. Namely, there is a connection between the vowel quality and the average F0 associated with it. That is, all else being equal, higher vowels will have a higher average F0 (Lehiste 1970:68). Bearing this in mind, individual tokens were selected that would equally represent the different vowel qualities for each tone. Strictly speaking, to eliminate any intrinsic raising of F0 a reasonable balance should be maintained between open and close vowels. However, I decided that if every sample of syllable types for each tone consisted of equal numbers of the different vowel types, the only overall effect of the intrinsic raising of F0 would be evidenced in the average range of each speaker being raised slightly. This does not affect the tones and their functions within a given system, and as the final set of contours is not expressed in terms of hertz, this is actually a moot point, so is of no consequence to the results. Figure 3.1 below shows the phonotactically allowed syllable types and the actual syllable types chosen for the corpus.


i.  σ = (C)(G)V(G)(C)   where G = glide.
ii. σ = CV(ʔ)           where C = voiceless unaspirated plosive, V = i (~ei), u (~ou), and a

Figure 3.1. Stimuli phonotactics: (i) Possible syllable types in Fuzhou; (ii) Syllable types chosen for the study.

I used the Hànyǔ Fāngyīn Zìhuì (漢語方音字匯 ‘Chinese dialect syllabary’) to find characters for each tone representing syllable types which matched the outlined criteria. The set of characters used is included in Appendix A.

Once the appropriate characters had been found for each of the syllable types, they were written on 3″ x 5″ (7.5cm x 12.5cm) cards by a native speaker of Chinese to ensure that there were no subconscious effects of foreign handwriting during the elicitation sessions. I chose to use cards, rather than another method, such as a typed list, to avoid ‘listing intonation’, that is, a possibly higher F0 at the beginning of the page, dropping (and thus reducing the range) in anticipation of the end of the page, as well as to eliminate any sandhi effects which may occur as a consequence of two characters being read in quick succession, or as a result of the speaker merely anticipating the following character. Presenting cards to the speakers to read prevents the speaker from knowing what the following character is, and ensures that no two characters are read too close together in time. I also included ‘dummy’ characters at both the beginning and the end of the whole set of cards to avoid any possible effects on intonation patterns in these special positions. Finally, the whole set was read three times, each time in a newly randomized order or reverse sequence, again to ensure that the readings were those of the citation form and devoid of any influence from the preceding character. This was also done to ensure that any differing emotional states resulting from the relative stages of the session would be more or less eliminated once the arithmetical mean was determined.

The elicitation sessions were conducted in a sound-proof booth in the phonetics laboratory at the Australian National University. Using a Nakamichi microphone, the material was recorded on high-quality tape using a Nagra 4.2 reel-to-reel monotrack tape recorder at a speed of 7.5 ips. The recordings were made with manually set amplitude levels on the Nagra, to avoid any distortion to the amplitude that can occur when instruments are set in the automatic mode.

After setting an optimum level on the Nagra for the speaker in question, the whole set of characters was read three times (each time in a newly randomized order), two repetitions per character. This provided a corpus of data from which spectrograms could be made:

4 speakers × 7 tones × 3 vowels × 3 repetitions × 2 replicates = 504 tokens.
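The token count can be checked by enumerating the design cells. The following is only an illustrative sketch: the speaker initials and vowel types are taken from the text, while the variable and function names are hypothetical and do not correspond to anything used in the original study.

```python
from itertools import product

SPEAKERS = ["WXQ", "FM", "LY", "ZPW"]   # the four consultants of section 3.1
TONES = range(1, 8)                     # seven citation tones
VOWELS = ["i", "u", "a"]                # vowel types of Figure 3.1(ii)
READINGS = range(1, 4)                  # three passes through the card set
REPLICATES = range(1, 3)                # two utterances of each card per pass


def enumerate_tokens():
    """Return one tuple per expected token in the fully crossed design."""
    return list(product(SPEAKERS, TONES, VOWELS, READINGS, REPLICATES))


tokens = enumerate_tokens()
print(len(tokens))
```

Enumerating the cells in this way also gives a convenient checklist against which measured tokens could be ticked off.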

3.3 Acoustic instrumentation and mensural procedure

In earlier work (e.g. Donohue 1991), and during the elicitation sessions, it was quite clear that four of the tones were produced with a non-modal phonation: tones 3, 4 and 6 were consistently produced with a creaky/breathy voice and tone 2 was optionally slightly breathy. Given this, and based on the results of a pilot study (Donohue 1991) that was conducted using digital equipment, namely the pitch extraction program on the Interactive Laboratory System by Signal Technology Inc., I decided to use analogue instrumentation in the acoustic quantification of Fuzhou tones. Due to the phonation type associated with some of the syllable types, automatic pitch extraction was often unsuccessful. This meant that some of the tokens would have to be measured by hand, while others would be measured by automatic pitch extraction. I wanted the method to be uniform across tokens.

All spectrograms were thus made using the Voice-Print Laboratories series 700 Spectrum Analyser, and were measured by hand. This model has a narrow band-pass filter of 45 Hz and a wide band-pass filter of 300 Hz. Each spectrogram, using Hi-shaping, contains 2 kHz of narrowband, linear expanded (1 kHz/29.1mm) bar analysis and 2-3 kHz of wideband, linear (1 kHz/14.45mm) bar information. For every token, an average amplitude spectrogram was additionally made, containing about 1 kHz of wideband information at the top.

Wideband spectrograms were used because they enable the sampling of F0 and amplitude as a function of the segmental structure (i.e. the vowel/Rhyme). The basic data are thus F0 and amplitude as a function of absolute duration of the rhyme, giving us a polydimensional parameterization of tone. This approach follows earlier work (e.g. Kratochvil 1971, 1985; Howie 1974; Coster & Kratochvil 1984), which discussed the problems of sampling F0 without reference to segmentals, given the known effects of tones and segments. It is also the best orientation point available, and it relates to perception: e.g. F0 in onset consonant transitions is not perceived as pitch. Furthermore, wideband spectrograms have good time-domain resolution, making accurate and reliable segmentation possible.

Determining the points of onset and offset of the syllable, and thus the duration, was mostly straightforward from examining the wideband spectrograms. Following Rose (1981), the point of onset for these syllable types was taken to be equivalent to the phonation onset. Phonation offset was determined to occur at that point after which the glottal pulse stopped the otherwise regular increase in the period. These onset and offset points were then transferred onto the narrowband spectrographic information, allowing for the 2mm lag which previous experimentation with transients produced by recorder on-off clicks had evidenced. Fundamental frequency measurements were then made at 20% intervals of the duration, and also at 5% and 95%, a rate adjudged high enough to satisfactorily resolve details of the F0 time course for all of the tones. However, I sampled additionally at 50% of the duration for the tone 6 tokens produced by WXQ and FM as this seemed to be a significant point in the contour for some of their tokens.
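The sampling scheme can be sketched as a small helper that converts percentage points into absolute times for a token of known duration. This is a hypothetical illustration rather than the procedure actually used, which was carried out by hand on paper spectrograms. In particular, it assumes that the 20% intervals include the endpoints at 0% and 100%; the text does not state this explicitly.

```python
# Assumed default grid: 20% steps including both endpoints, plus 5% and 95%.
DEFAULT_PERCENTS = (0, 5, 20, 40, 60, 80, 95, 100)


def sampling_points(duration_csec, extra_percents=()):
    """Return (percent, time in csec) pairs at which F0 would be sampled.

    extra_percents allows additional points, such as the 50% point that was
    added for some tone 6 tokens.
    """
    percents = sorted(set(DEFAULT_PERCENTS) | set(extra_percents))
    return [(p, duration_csec * p / 100) for p in percents]
```

For a tone 6 token, `sampling_points(duration, extra_percents=(50,))` would add the mid-point to the default grid.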

A sample of the token [tu] with tone 1 uttered by speaker ZPW is illustrated in figure 3.2. Both narrow and wide band spectrograms are of the same utterance, showing the amount of energy as a function of time against frequency. The vertical axis, frequency, is calibrated at the expanded rate of 29.1mm per 1000 Hz for the narrow band and half that for the wideband. Time is displayed on the horizontal axis, at a rate of 1.27mm per centisecond. The different lines on the narrowband spectrogram each represent a different harmonic, that is, a whole number multiple of a speaker’s F0.


Figure 3.2. Narrow and wideband spectrograms of [tu] uttered by ZPW.

The easiest way to derive a speaker’s F0 at a given point in the speech wave would seem to be to simply measure the first harmonic. However, the measurement of a higher harmonic gives a much more accurate value for the F0 since there is less error per unit frequency. Therefore, when measuring the F0 at any of the points from the spectrogram, I measured the highest clear harmonic with sufficient energy in it to distinguish it properly. It was not possible to measure each token from the same corresponding harmonic due to the different vowel types, as some vowels lack energy in certain parts of the spectrum. Once the harmonic had been measured, the F0 at that point was calculated. For example, in figure 3.2 the distance between duration onset and offset points was measured to be 5.87cm. Thus the duration is (5.87 ÷ 0.127 =) 46.22 csecs. Next, the various intervals of effective vocalic duration were calculated so that the F0 could be sampled. At the 20% interval, the measurement was made from the fifth harmonic. The actual frequency at that point is (2.21cm x 1000 ÷ 2.91 ≈) 760 Hz. In order to obtain the F0, recalling that harmonics are whole number multiples of the F0, it is necessary to divide by the number of the harmonic. Thus the fundamental frequency is determined to be (760 ÷ 5 =) 152 Hz.
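The arithmetic of this worked example can be sketched as follows. The calibration constants are those quoted in the text (29.1mm per 1000 Hz on the narrowband display, 1.27mm per centisecond); the function names are hypothetical, and the sketch merely restates the hand calculation rather than describing any software used in the study.

```python
MM_PER_KHZ_NARROWBAND = 29.1   # vertical calibration of the narrowband display
MM_PER_CSEC = 1.27             # horizontal (time) calibration


def duration_csec(distance_mm):
    """Convert an onset-to-offset distance on the spectrogram to centiseconds."""
    return distance_mm / MM_PER_CSEC


def harmonic_frequency_hz(height_mm):
    """Convert the height of a harmonic above the baseline to a frequency in Hz."""
    return height_mm / MM_PER_KHZ_NARROWBAND * 1000


def f0_hz(height_mm, harmonic_number):
    """Derive F0 by dividing the measured harmonic by its harmonic number."""
    return harmonic_frequency_hz(height_mm) / harmonic_number


# Values from the figure 3.2 example: 5.87cm duration, fifth harmonic at 2.21cm.
print(round(duration_csec(58.7), 2))   # duration in csec
print(round(f0_hz(22.1, 5)))           # F0 in Hz, rounded to the nearest hertz
```

Without the intermediate rounding to 760 Hz, the division yields about 151.9 Hz, which rounds to the same 152 Hz reported above.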

Some difficulties arose when determining the offset points of the phonation for tones 3 and 6 given the different phonation type associated with these tones (also with tone 4, but as this is a stopped tone, the end point is quite clear). Tone 2 also has a seemingly optional change in phonation type; however, the phonation associated with tone 2 is somewhat more breathy than the creak/breath accompanying tones 3, 4 and 6. To illustrate these difficulties, I have included an example of the same speaker’s tone 3 uttered on the segments [pa], shown in figure 3.3. These tokens reflect a less extreme, though still reasonably typical instance of the phonation change for this tone. Onset points may be determined as previously described; however, the offset points are somewhat harder to estimate. I have circled the areas crucial in my determination of these points. It was often necessary to inspect both the formants and the periodicity at the baseline when choosing the appropriate point. In the first utterance, one can see that at the designated point of offset there is noise in the first formant region. However, inspection of the circled area and of the periodicity at the baseline should confirm that until this point the period was regularly increasing. It is after this point that one can see a fairly abrupt offset to phonation with a couple of irregular periods at the end. In the second token on the same spectrogram there is a breathy offset, with a noise excited second formant.

Figure 3.3. Spectrogram of tone 3 [pa] spoken by ZPW with non-modal phonation.

Figure 3.4. Average amplitude spectrogram on tone 1 [tu] spoken by ZPW.


The amplitude data were less problematic. The onset and offset points were transferred on to the spectrogram from the previous spectrograms. Each horizontal interval of 1.455cm represents an increase of 6 dB. Additional experimenting with transients this time determined there to be a lag of 1.5mm. I have included a sample of the amplitude spectrograms, for the token [tu] on tone 1 as uttered by speaker ZPW (figure 3.4). At the 40% interval of duration, the amplitude is determined to be 0.55cm up from the 24dB line. That is (24 + [0.55 x 1.455 =] 0.8 =) 24.8 dB.

Nowadays most speech analysis is done using Praat software (http://www.praat.org) or a similar program. However, it is not possible to measure amplitude so readily in this way with Praat, whose ‘amplitude’ extraction options are measures of intensity which depend on the ‘pitch’, and which have a time resolution that is too great to allow anything to be inferred about F0 production from them.

3.4 Results

The procedures described above resulted in F0 and amplitude quantified as functions of absolute Rhyme duration. These were taken to be the basic acoustic correlates of the Fuzhou tones. There are approximately 7000 measurement points for both F0 and amplitude. The raw values may be found in Appendices C and F respectively.

For every tone and for each speaker, the means were taken at each measurement point of duration. The mean duration for each tone for each speaker was also found. This was done using a Statview 2 package on an Apple computer. The results of these mean values are in Appendix D (F0) and Appendix G (Ar).
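The averaging step amounts to grouping the raw values by speaker, tone and measurement point and taking the arithmetic mean of each group. A minimal sketch in Python follows; the record layout and function name are assumptions made for illustration, and the sketch does not reproduce the Statview 2 workflow itself.

```python
from collections import defaultdict
from statistics import mean


def mean_by_point(measurements):
    """Average raw values per (speaker, tone, percent point).

    measurements: iterable of (speaker, tone, percent, value) tuples, where
    value is an F0 reading in Hz or an amplitude reading in dB.
    """
    groups = defaultdict(list)
    for speaker, tone, percent, value in measurements:
        groups[(speaker, tone, percent)].append(value)
    return {key: mean(values) for key, values in groups.items()}
```

The same function can be applied to durations by passing a single dummy percentage point for every token.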

3.5 Summary

This chapter described the procedures used to obtain the acoustic data from simple analog instrumentation. These data will be analyzed in the following chapter.

4 Acoustic characteristics of the citation tones

This chapter presents the data obtained following the method outlined in chapter 3. First I present the parameters chosen to describe the F0, and then give a brief auditory description of the isolation tones in Fuzhou in section 4.2. A preliminary discussion follows in section 4.3 before presenting the results for each speaker’s mean tonal F0 contours in section 4.4. Section 4.5 compares and contrasts these results. Section 4.6 discusses normalization: its motivation and the procedure, then illustrates and discusses the results of the normalization. Finally, section 4.7 summarizes the chapter.

4.1 Method of data interpretation

The results are presented graphically, plotting the raw mean values for the F0 against the raw mean values for the duration at the corresponding percentage points. Absolute duration for each tone is used in this study, not equalized duration as is often done, so that the importance of duration to the tonal system may be examined. Indeed, equalizing duration can obscure between-tone differences in tonal F0 shape (e.g. Rose 1993). However, before presenting the results, I first describe the parameters chosen for interpreting the data, necessarily implicit in my presentation and descriptions.

The underlying assumption is that the tonal system may function discretely with respect to actual (raw) values for duration and range of F0. The raw values are viewed more as variables dependent on outside factors such as the sex of the speaker and their emotional state. Both the parameters I have chosen are designed with the expectation that by observing each speaker in a similar way it will be possible to compare and contrast these observations. From this, one can find invariant features that should reflect significant distinguishing features of the whole variety.

I describe the duration in terms of relative lengths. This is so that we can determine whether a particular length is salient for a given tone, or perhaps part of an intrinsic gesture dependent on the nature of the F0 contour. The F0 will first be examined in terms of its onset and offset points. I next examine the contour of the tonal F0 in terms of gradient or derivative, that is, the rate of change of F0 (Hz) with respect to time (csecs). To further investigate the given contour of the tone, I observe the points that evidence a change in the gradient. The parameter for this is percentage points of duration. Finally, there is the possibility of there being significant points in the systems expressible within the aforementioned parameters, that is, points which may be considered to be significant to the whole configuration, for which the speaker might aim when producing the tones, so that there may be a clustering effect around these points. This could have importance should the tones be able to be described in terms of these, as this could be used as input data when either assessing a phonological representation for the whole tonal system or formulating a new one. However, it will first be necessary to determine which points are significantly different from one another and which may be considered to be the same. While this can be done for each speaker with ANOVA, I suggest that such testing is best left until after a comparison has been made with the other tones, which will indicate whether significance testing is even necessary. Another obvious criterion by which the tones may be assessed is whether the tone occurs on a syllable with a final stop (the so-called stopped tones). However, this does not need to be part of an explicit discussion, which is intended only to clarify those features that are less well delineated. Before embarking on the discussion, I first provide a brief auditory description of the results before some further preliminary considerations and the results for the individual speakers are given.
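The gradient parameter can be illustrated with a short sketch that computes the rate of change of F0 between successive sampling points and reports, as percentage points of duration, where the direction of the contour reverses. The function names are hypothetical, and treating a ‘change in the gradient’ as a change of sign is a simplifying assumption made here for illustration; it is narrower than the visual inspection described above.

```python
def gradients(contour):
    """Slope in Hz per csec for each pair of successive sampling points.

    contour: list of (time_csec, f0_hz) pairs sorted by time.
    """
    return [
        (f_next - f_prev) / (t_next - t_prev)
        for (t_prev, f_prev), (t_next, f_next) in zip(contour, contour[1:])
    ]


def turning_points(contour, duration_csec):
    """Percentage points of duration at which the gradient changes sign."""
    slopes = gradients(contour)
    points = []
    for i in range(1, len(slopes)):
        if slopes[i - 1] * slopes[i] < 0:
            # The sampling point shared by the two segments is the turning point.
            points.append(100 * contour[i][0] / duration_csec)
    return points
```

For a rise-fall contour such as that described for tone 6, `turning_points` would be expected to return a single percentage point near the F0 peak.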

4.2 Auditory characteristics

This section describes the tones in Fuzhou as illustrated by my speakers. The main reason for this is the lack of agreement of tonal pitch values between previous authors working with auditory-based observation data. Table 4.1 describes the tones in Fuzhou as perceived by the author. It will be seen that it appears to represent a tonal system that does not match up exactly with any of the previously published works.

Tone 1  Basically high level. A slight final rise for about the final third of duration. E.g. [pa] 巴 (an affix), [ki] 期 (a period of time)
Tone 2  Mid-falling. Seemingly optional change in phonation to slightly breathy for the final third or quarter of the duration (free variation). E.g. [pa] 把 ‘handle (n)’, [pi] 比 ‘to compare’
Tone 3  Low, falling slightly. Phonation usually slightly breathy/creaky. Tends to creak towards the end of the utterance. E.g. [pa] 霸 ‘tyrant’, [kei] 既 ‘even if’
Tone 4  Slight dipping (fall-rise). Short and with abrupt offset to phonation. Phonation usually slightly breathy/creaky. E.g. [paʔ] 百 ‘hundred’, [teiʔ] 滴 ‘drip’
Tone 5  High fall, falling to just beyond the mid pitch range. E.g. [pa] 爬 ‘to climb’, [ki] 奇 ‘odd’
Tone 6  Rise-fall in the lower part of the speaker’s range. Phonation usually slightly breathy/creaky. E.g. [ta] 第 (ordinal prefix), [tei] 治 ‘cure’
Tone 7  High, very short with abrupt phonation offset. E.g. [paʔ] 白 ‘white’, [tiʔ] 姪 ‘nephew’

Table 4.1. Auditory description of the tones in Fuzhou.


4.3 Preliminary considerations

Before beginning the individual descriptions of the tones, I must first discuss some considerations relevant to the interpretation of the data. It is ultimately the aim of this study to illustrate the citation tones, reflecting all and only the characteristics of extrinsic control. Thus these preliminary considerations have to do with some features found in the data which seem to be intrinsically produced effects, and not important in terms of linguistic tonal production. Specifically, I found there to be onset and offset perturbations that appear constant for all speakers.

Not all the F0 is tonal F0. Onset perturbations are evidenced by an initial drop in F0 that affects all tones roughly equally, regardless of length or contour. These effects cannot be characteristic of any individual tone and will therefore be ignored. I initially hypothesized that it was a percentage of the duration that may be ignored. However, considering that the effect is likely due to the coincident VOT stop, the amount to be ignored should rather be a fixed time period, a constant of (at least) 5 csecs. This is supported by the fact that despite the varying size of the speaker’s maximum duration, the effects are always similar in magnitude in absolute terms (raw csecs). When describing the tonal F0, I shall thus consider the onset point to be csec 5.

The next point to discuss is the offset perturbations. It is clear that there are also perturbations in the F0 derivative at the tail of the tones, evidenced by a final and abrupt drop (or rise) in F0. Rose (1990a) suggests that a suitable parameter would be a fixed constant expressed in centiseconds.⁷ Elaborating on this, I would rather have a set algorithm for determining the constant to be ignored at offset than just a fixed constant as was chosen for the onset perturbations, as the effect is not really a ‘fixed’ one, like that of the initial consonant. Moreover it seems that you would lose too much information from the really short tones if the offset perturbation were a fixed constant. More suitable perhaps is that the offset perturbations are expressed in terms of a fixed time constant that is sensitive to the tone’s duration. One suggestion might be a constant 5% of the duration as calculated from the 5 csec mark. So if a speaker has a maximum duration of e.g. 50 csecs, then the offset perturbations would be [(50-5) x 5% =] 2.25 csecs, or if a speaker had a maximum duration of 35 csecs, then the offset perturbations would only be 1.5 csecs. Accommodating the different durations is desirable as the four speakers vary greatly in their (mean) maximum durations: 31, 39, 42 and 47 csecs.
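The arithmetic of this suggestion can be checked with a short sketch. The function name `offset_window` and its parameters are illustrative assumptions, not part of the original analysis; the function simply takes 5% of the duration that remains after the 5 csec onset mark.

```python
def offset_window(max_duration_csec, onset_csec=5.0, fraction=0.05):
    """Candidate offset perturbation window: a fraction of the duration
    measured from the onset mark (all values in centiseconds)."""
    if max_duration_csec <= onset_csec:
        raise ValueError("duration must exceed the onset mark")
    return (max_duration_csec - onset_csec) * fraction

# The two worked cases from the text, then the four speakers' mean maxima.
for duration in (50, 35, 31, 39, 42, 47):
    print(duration, round(offset_window(duration), 2))
```

Under this candidate formula the window grows with the speaker's maximum duration, which is the property the paragraph argues for.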

However, while accommodating differences such as fast and slow speech, this method would not allow for intrinsic differences between tones, in particular the extremely short duration of the stopped tones. Rose does note, though, that the perturbations are much less extreme on the stopped tones (which are also much shorter in his data). This would seem to suggest that either the offset perturbations are sensitive to the syllable type, the speaker anticipating a final consonant, or that the difference results intrinsically from the final consonant, or else that the offset perturbations are sensitive to the different tonal targets. That is, the extrinsic gestures made in tonal production may seem to somehow accommodate phenomena such as offset perturbations (or vice versa), suggesting a fraction of the desired, or target, duration for the tone in question as that portion to be ignored. I will express these, then, in terms of percentage of duration of the specific tone (namely the final 5%) to avoid any disturbances in the perception of the target contour due to the relative differences in duration between the tones.

⁷ 1 centisecond = 10 milliseconds.

CHAPTER 4

The final 5% of duration shall thus be ignored in my individual speaker descriptions. Should the reader wish to check these perturbations, figures 4.1 to 4.4 show the results for the citation tones without any modifications, plotting the F0 from 0–100% duration.
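As a minimal sketch of the trimming just described, assume an F0 contour stored as (time in csec, F0 in Hz) pairs: samples before the 5 csec onset mark and after 95% of the tone's own duration are dropped. The function name `trim_contour`, the data layout and the sample values are hypothetical and serve only to illustrate the two cut-offs.

```python
def trim_contour(samples, onset_csec=5.0, offset_fraction=0.05):
    """Keep only samples between the fixed onset mark and the point where
    the final `offset_fraction` of the tone's total duration begins.

    `samples` is a list of (time_csec, f0_hz) pairs, with time measured
    from the start of the tone; the last sample marks the total duration.
    """
    if not samples:
        return []
    total = samples[-1][0]
    cutoff = total * (1.0 - offset_fraction)
    return [(t, f0) for (t, f0) in samples if onset_csec <= t <= cutoff]

# Invented contour sampled every 2.5 csec over a 30 csec tone.
contour = [(2.5 * i, 130.0 - 0.5 * i) for i in range(13)]
trimmed = trim_contour(contour)
```

The onset cut is a fixed time and the offset cut is relative to each tone's duration, mirroring the asymmetry argued for in the text.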

Figure 4.1. WXQ: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for WXQ, tones T1–T7.]

Figure 4.2. FM: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for FM, tones T1–T7.]


Figure 4.3. LY: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for LY, tones T1–T7.]

Figure 4.4. ZPW: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for ZPW, tones T1–T7.]


One important conclusion that may be drawn from the above discussion is that not all the F0 reflects the tones. It may be more appropriate to conceive of the F0 contours as tonal targets, as well as perhaps consonantal targets (allowing for consonant-tone interaction – both at onset and offset), rather than considering all of the F0 information relevant to the perception, or even the production of the tones. The next section presents the results for each of the four speakers.

4.4 Individual speaker results

This section presents and describes the results for each speaker. While these results have ignored the aforementioned 5 csec. onset and 5% duration offset perturbations, the percentage points are of the original duration.
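The gradients quoted in the descriptions that follow are rates of F0 change in Hz per centisecond between percentage points of a tone's duration. A minimal sketch of that calculation is given here; the function name `gradient_hz_per_csec` and the example values are invented for illustration and are not measurements from this study.

```python
def gradient_hz_per_csec(f0_start, f0_end, pct_start, pct_end, duration_csec):
    """Mean F0 slope between two percentage points of the original duration."""
    if pct_end <= pct_start:
        raise ValueError("pct_end must be greater than pct_start")
    elapsed_csec = (pct_end - pct_start) / 100.0 * duration_csec
    return (f0_end - f0_start) / elapsed_csec

# Hypothetical tone of 30 csec falling from 120 Hz at 20% to 114 Hz at 60%.
slope = gradient_hz_per_csec(120.0, 114.0, 20, 60, 30.0)
```

A negative value indicates a fall and a positive value a rise, which matches the sign convention used for most of the rates reported below.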

4.4.1 Speaker 1: WXQ

The plotted mean F0 contours for speaker 1 are given in figure 4.5. WXQ has a small F0 range (distance between the lowest and highest points in the tonal configuration relative to the y-axis), just 37 Hz, lying between 95 and 135 Hz. His maximum duration finishes just short of the 30 csec mark. The relative durations of each tone for the first speaker are:

T6 > T2 > T3, T5 > T1 > T4 > T7

The rise-fall has the longest duration, though its duration is only slightly greater than all of the falling tones, which cluster together at the bottom of the speaker’s range, tone 2 being slightly longer than tones 3 and 5. Next is the level tone, tone 1, and the shortest of all tones are the stopped tones. The higher of the two stopped tones, with the slightly less complex contour – rising, not dipping – is the shortest tone.

Tone 1: Starting at the highest onset point, this is basically a level tone. However, after an initial drop of nearly 10 Hz between 5 and 20%, it steadily rises at a rate of 0.16 Hz/csec for the rest of its duration.

Tone 2: Tone 2 has the third highest onset point, just about in the middle of the speaker’s range. It is a falling tone, though the derivative is not constant throughout the tone’s duration. From 20-40% of duration, the gradient is -0.43 Hz/csec, increasing to -0.77 for the remainder of the duration. This gives an overall derivative of -0.8 Hz/csec for this tone.

Tone 3: This commences at the third lowest onset point, at about a third of the way up the speaker’s range. This tone has a concave fall – that is, the gradient decreases with duration, to 90% of its total duration, after which it drops off more steeply.

Tone 5: The onset point is just 2 Hz higher than that of tone 2. These tones appear to be very similar, with close onset and offset points. However, there is a difference in the actual contours, and a further difference evidenced in the overall derivatives for these tones. Tone 5, while also being slightly shorter in duration, has a consistently steeper gradient than that of tone 2. In fact, the gradient for tone 5 is nearly constant throughout the tone’s duration, at a rate of 1.05 Hz/csec. Despite their similar F0 shapes, these tones are audibly different.

Tone 6: The onset point for this tone is about 2.5 Hz below that of tone 3. Its F0 contour is convex. The rise component starts at 20% duration and peaks at 50%, after which it steadily falls for the rest of the tone. The rise has a derivative of 0.7 Hz/csec and the fall is -1.06 Hz/csec.

Tone 4: This has the lowest onset point and is a dipping tone. It is a stopped tone and the second shortest of all tones, and has a comparable contour to tone 7 for WXQ. Dropping at a rate of -0.65 Hz/csec from the onset point to 20% duration, this tone then rises at a rate of 0.35 to 40%, after which it steepens to 2.45 Hz/csec.

Tone 7: This has the mid onset point, about 1 Hz less than that of tone 2. It has a similar contour to that of tone 4, though the corresponding derivatives are much greater and the duration is (consequently?) shorter. After dropping about 1 Hz in the first 2.5 csecs, this tone begins to rise at a rate of 0.35 Hz/csec for 20% duration, at which point it steepens to 2.45 Hz/csec for the remainder.

Cluster points: It is possible to distinguish certain levels within this system. The most striking impression is the clustering of tones in terms of their onset points into two groups of three and one separate to both groupings. The highest onset point is defined by tone 1. The second and mid range point groups tones 2, 5, and 7. Tones 3, 4 and 6 are among the lowest of the onset points. The first point, 5, is the highest level to which tone 7 rises. Point 4 is the level on which tone 1 lies, to which tone 4 rises and from which tone 5 falls. Point 3 is the level at which tones 2, 5 and 7 have their onset and to which tone 6 rises. Point 2 is the level from which tones 3, 4 and 6 start and level 1 is the point of offset for all the falling tones. Given these points, we may quantify the tonal F0 as follows:

T1: 44, T2: 31, T3: 21, T4: 24, T5: 31, T6: 231, T7: 35
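One way to picture this kind of quantification is to assign each F0 turning point of a contour to the nearest cluster level and read off the resulting digits. The sketch below assumes that nearest-level assignment, which is an illustrative simplification rather than the procedure used in this study; the level frequencies and contour values are invented and are not WXQ's measurements.

```python
def to_level_digits(turning_points_hz, levels_hz):
    """Map each F0 turning point to the 1-based index of the nearest level.

    `levels_hz` lists the cluster levels from lowest (level 1) upwards.
    """
    digits = []
    for f0 in turning_points_hz:
        nearest = min(range(len(levels_hz)),
                      key=lambda i: abs(levels_hz[i] - f0))
        digits.append(str(nearest + 1))
    return "".join(digits)

# Hypothetical five-level system (Hz) and a rise-fall contour
# described by its onset, peak and offset.
levels = [100.0, 108.0, 116.0, 124.0, 132.0]
rise_fall = to_level_digits([107.0, 117.5, 101.0], levels)
```

With these invented values the rise-fall contour is labelled with three digits, one each for onset, peak and offset, in the same style as the T6 value above.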


Figure 4.5. WXQ mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.), tones T1–T7.]

4.4.2 Speaker 2: FM

Speaker 2’s range is about 70 Hz, lying between 165 and 240 Hz, and her maximum duration finishes at about 45 csecs, as can be seen in fig. 4.6. Her relative durations are:

T6 > T1, T2, T3 > T5 > T4 > T7

The rise-fall is the longest of all tones, then the level, mid and low tones. Next is the high fall and finally the stopped tones.

Tone 1: This tone starts at the top of the speaker’s range. The onset and offset points are just about equal, but strictly speaking, the tone is not level. After an initial drop of nearly 5 Hz in about 2.5 csecs (which I suggest is still part of the onset perturbation), the tonal F0 continues to fall from 20 to 60% of the duration at a rate of -0.23 Hz/csec. At this point it then rises at a rate of 1.1 until offset.


Tone 2: Tone 2 starts in the middle of the speaker’s range. The F0 contour is slightly concave (the reference points being onset and offset of duration), with a reasonably constant fall (-0.95 Hz/csec) for about 60% of its duration. Although this tone and tone 3 are both falling, their durations group together with the high (basically) level tone, whereas the high fall, traversing a much larger portion of the whole range, is also distinguished from these two falling tones by its shorter duration.

Tone 3: This tone shares the onset point with tone 6 at about one-third up in the speaker’s range. Its F0 contour is slightly concave and almost parallel in shape to that of tone 2, just described.

Tone 4: This has the lowest onset point, roughly 10 Hz lower than tone 3’s onset point. The F0 contour is slightly dipping, though mostly low-rising, and it is a stopped tone, which explains the short duration. It has a similar contour to the rising component of tone 6, and lasts for the same length of time.

Tone 5: This tone appears to have two different targets. For three speakers, tone 5 is a high falling tone, starting at the top of the speaker’s range, about 5 Hz higher than tone 1. It falls steeply and immediately, without any initial level component, traversing about two-thirds of the speaker’s range, and finishing at about the same place in the range as tone 2. For speaker 1, this is more like a mid-fall, with a very similar tonal F0 shape to (his) tone 2.

Tone 6: This tone has a low onset to a rise-fall contour. The rise peaks at 50% of duration, and this is maintained through to 60%. The derivatives either side of this plateau are just about the same, differing only in sign, with the rise at 1.6 and the fall -1.5 Hz/csec.

Tone 7: Onsetting about 10 Hz below tone 1, this tone drops nearly 15 Hz in less than 15 csecs. This is also a stopped tone, hence its short duration. Unlike speaker 1 (and like LY/ZPW), FM has a shorter, non-rising tone 7.

Cluster points: A similar clustering effect for the onset points as was found for WXQ can be noted here for FM, except that three tones (1, 5, 7) group together at the top, with tone 2 at the mid-range point, and the same three tones (3, 4, 6) in the lower half of the range. However, in order to describe the F0 contours, the top cluster point is broken down into tones 1 and 5 at the top and tone 7 just below. Six points are then needed to describe these F0 data. Point 6 is the point of onset for tone 5 and the point of onset and offset for tone 1. Point 5 is the point of onset for tone 7, and point 4 is its offset. The third cluster point defines the onset of tone 2, the offset of tone 4 and the peak of tone 6. Point 2 is the point to which tones 2, 5 and 6 fall and at which tones 3, 4 and 6 onset. The lowest point is the point to which tones 3 and 4 fall. This entails a tonal F0 value assignment as follows:

T1: 656, T2: 32, T3: 21, T4: 213, T5: 52, T6: 232, T7: 4.


Figure 4.6. FM mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.), tones T1–T7.]

4.4.3 Speaker 3: LY

Speaker 3 has a range of about 75 Hz between 145 and 225 Hz, and maximum duration finishing at about 37 csecs as seen in fig. 4.7. Relative durations are:

T6 > T1, T2 > T3, T5 > T4 > T7

Again, the rise-fall is the longest of all tones. Next come the high, mostly level and the mid tones. The next are the high fall and the low fall tones. Finally the stopped tones are the shortest, the low rise being almost double the duration of the high stopped tone.

Tone 1: This tone onsets about one quarter from the top of the speaker’s range. After a gentle fall (0.3 Hz/csec) to 40% duration, the F0 rises steadily for the rest of the duration at a rate of 0.5 Hz/csec.


Tone 2: This onsets in the middle of the speaker’s range, and falls steadily at a rate of -0.3 Hz/csec for 80% of the duration. The last part of the duration evidences a rise in the F0 contour at a rate of 1.0 Hz/csec.

Tone 3: Onsetting just less than 10 Hz below tone 2, the F0 contour then falls to the lowest point in the speaker’s range. The contour is not similar to that of tone 2; rather, this tone falls at a rate of -0.9 Hz/csec until 40% duration, after which the contour becomes concave, falling at a much higher rate that then decreases with time.

Tone 4: This tone has the lowest onset point and the F0 falls to nearly the bottom of the speaker’s range until 40% of duration. After this point, the contour begins to rise gradually for the next 20% (0.3 Hz/csec), then steepens to 3.1 Hz/csec for the remainder of the duration. This is a stopped tone, so the duration is very short.

Tone 5: This high falling tone onsets at the top end of the speaker’s range, and falls to about two-thirds of the way down the range. It falls without an initial level component and its overall derivative is -2.4 Hz/csec.

Tone 6: This rise-fall has the second lowest onset point, just 6 Hz above that of tone 4. This F0 contour falls at a rate of -0.35 Hz/csec for 40% of the duration, after which the rise component begins. The rise lasts for only 20% duration, but is a large increase of 24/2 Hz. The final fall component steepens with duration and has an overall derivative of -2.3 Hz/csec, falling just short of the bottom of the speaker’s range.

Tone 7: The tone has the highest onset point and offsets just above the onset point for tone 1. This is a stopped tone and the shortest of all the tones. Its F0 contour is a slight fall until 80% duration, after which it drops off rapidly.

Cluster points: The onset points for LY group into either high or low parts of the speaker’s range. In the top half are tones 1, 5 and 7, and from the mid-range point down lie the other tones. Salient points include the highest point, “5”, which is the point to which tone 1 rises and from which tones 5 and 7 fall. Point 4 is the onset point for tone 1 and offset for tone 7. Tone 2 lies on the cluster point 3, also the point to which tones 4 and 6 rise. Tone 3 could be interpreted to start at either 3 or 2. Cluster point 2 is also the point to which tone 5 falls and from which tone 6 rises. The final cluster point is the point from which tone 4 rises and to which tones 3 and 6 fall. This gives rise to the following tonal F0 values:

T1: 45, T2: 33, T3: 31/21, T4: 13, T5: 52, T6: 231, T7: 54.


Figure 4.7. LY mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.), tones T1–T7.]

4.4.4 Speaker 4: ZPW

ZPW’s F0 contours are given in fig. 4.8 below. The speaker’s range is about 100 Hz, lying between 70 and 170 Hz. His maximum duration falls just short of the 40 csec mark. Relative durations are:

T1, T2 > T3, T6 > T5, T4 > T7

The high mostly level and mid fall tones are the longest. The low fall and the rise fall are the next longest tones, with almost the same duration. The high falling tone is quite short, effectively the same as the rising stopped tone. The shortest of all tones is the high stopped tone, less than half the duration of the other stopped tone.


Tone 1: Tone 1 onsets at the third onset point, about a third of the way from the top of the speaker’s range. The gradient is steady for most of the duration, rising at a rate of 0.21 Hz/csec.

Tone 2: This tone has its onset point in the middle of the speaker’s range. The F0 falls steadily for all of the duration, at a rate of -0.58 Hz/csec, stopping at about one-third of the way from the bottom of the speaker’s range.

Tone 3: This tone has the next lowest onset point, about 10 Hz lower than that of tone 2. This tone falls more steeply than tone 2, to the lowest point in the speaker’s range at a rate of -1.28 Hz/csec.

Tone 4: This rising stopped tone has the lowest onset point, at about one-third from the bottom of the speaker’s range. Not much of a fall is obvious, though the gradient steadily increases with duration, giving this tonal F0 more of a level-rise appearance. The contour rises to about level with the onset of tone 1, and just past the peak in tone 6’s rise.

Tone 5: The high fall has the highest onset. It falls rapidly, at a rate of -3.9 Hz/csec, traversing two-thirds of the whole range.

Tone 6: This convex tone starts about 2 Hz higher than tone 4. The rise component peaks at 60% of the total duration, rising at a rate of 2.0 Hz/csec. After this point it falls at a rate of -2.64 Hz/csec to just above the corresponding place in the range for the point of onset.

Tone 7: This tone has an onset point halfway between those for tones 1 and 5. This is a very short tone, so there is not much detail in the contour. However, it is basically level with a final drop.

Cluster points: Just like the clustering for speaker 3, ZPW’s onset points may be grouped into higher and lower parts of the speaker’s range, with tones 1, 5 and 7 in the higher part. However, like speaker 2, six points are necessary to properly describe the F0 contours in this speaker’s system. These may be defined as follows: point 6, the highest point, is that from which tone 5 falls. Point 5 is the point on which tone 7 lies, and to which tone 1 rises. The fourth point is the onset of tone 1, and that to which tones 4 and 6 rise. The cluster point 3 defines the onset of tone 2 and the point 2, the onset of tones 3, 4 and 6. The lowest point defines the offset of tone 3. Following from this are the tonal value assignments:

T1: 45, T2: 32, T3: 21, T4: 24, T5: 62, T6: 242, T7: 5.


Figure 4.8. ZPW mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.), tones T1–T7.]

4.4.5 Summary

This section has presented and described the results for each of the four speakers individually, following the parameters outlined earlier. However, it should be obvious that defining cluster points for individual speakers is not that productive. It is more productive to do so once the between-speaker differences have been factored out and a representation of the variety as a whole has been obtained. The next section compares and contrasts the descriptions, which is a useful exercise in that it serves to illustrate the problems associated with descriptions based only on one speaker; the type of idiosyncratic differences between speakers in the production of tonal targets; and the similarities, specifically of contour, for which all speakers appear to be aiming when they produce their tones.


4.5 A comparison of the individual speakers’ results

This section compares and contrasts the tonal F0 configurations for all the speakers. The goal of this is to illustrate characteristics of extrinsic control by identifying the similarities (and differences). This will be useful for the following sections that assess which of the tones to take as the parameters for normalization. After first comparing the relative lengths for duration, I compare and contrast individual tones or pairs of tones.

4.5.1 Duration comparisons

The results of the duration comparisons reflect what would be expected both in terms of intrinsic and extrinsic control. A falling tone is expected to be intrinsically shorter than a non-falling tone, and a rising tone is expected to be intrinsically longer than a non-rising tone. The short duration of the stopped tones reflects extrinsic control. These facts are well corroborated by my results.

For all speakers except ZPW the rise-fall tone is the longest, followed by the high level and then the mid and low falls. While ZPW has his level tone as the longest, the other speakers’ tones 1, 2, 3 and 6 are longer than tones 4, 5 and 7 (high fall, stopped tones). All speakers are similar in that the high fall is shorter than any of the other non-stopped tones (except WXQ whose actual target for this tone will be discussed below), and of the two stopped tones, the tone with the complex rising/dipping contour will have the longer duration.

4.5.2 Tonal F0 comparison

Tone 1: This tone is located in the top third of all speakers’ ranges. The F0 has a steady, though gradually increasing derivative for the whole duration. The female speakers, however, show more dynamic contours – FM dropping a little, and then increasing at a greater rate during the final 20% duration; LY showing a similar behavior, though with relatively smaller gradients. All speakers offset this tone at a point in their range just above, though very close to, their onset point for the same tone.

Tone 2: Tone 2 starts about mid range for all speakers and falls. While FM produces her tone 2 with a slightly concave F0 contour, the other speakers have contours with reasonably consistent gradients throughout the whole duration. LY does not fall as far as WXQ or ZPW (the male speakers), exhibiting a small rise during the final 5% duration, placing the offset point very close to the onset point in her range. FM and ZPW fall to comparable levels in their tonal systems, the tone finishing at a place very near to that of tone 6’s offset. LY’s tone 2 finishes much higher in her system than this. WXQ’s offsets before the end of his tone 6, the offset point of which is closer to that of tone 4. His tonal F0 is much steeper than that of the other speakers, making it very like the F0 contour of (his) tone 5.

Tone 3: This tone always has the third lowest onset point, in the bottom third of the speaker’s range. The F0 falls to the lowest point in the speaker’s range, identifying the bottom of the range.


Tone 4: This is a dipping stopped tone. For all speakers this is located in the bottom half of their F0 range, with the lowest onset point. It generally rises to near the highest point in tone 6, the rise-fall tone, and is roughly equal in duration to the rise component as well. It is possible that the initial fall in the dipping tonal F0 contour may be partly consonantally induced.

Tone 5: WXQ seems to have a different target for this tone. His level tone is shorter in duration than this tone, but most noticeable is the difference in derivative for this tone between WXQ and the other speakers. Most speakers’ tone 5 has a derivative at least twice as steep (FM) as that of their tone 2, even up to eight times as steep (ZPW). WXQ’s tone 5, however, has a derivative only 1.3 times as steep as that of his tone 2. WXQ’s tone 5 also falls to the very bottom of his range. It could be a consequence of his condensed range, the peripheral points seeming to become more centralized. The end result is that there is little difference between WXQ’s tones 2 and 5. It should be recalled that despite their acoustic similarity they are in fact audibly different, and also may be perceived as a different target. For the other speakers, this high fall starts at (FM, ZPW) or second from (LY, WXQ) the top of the speaker’s range. The F0 always falls without a level component, traversing approximately two-thirds of the speaker’s range. This is usually the shortest of all the unstopped tones.

Tone 6: All speakers exhibit a convex contour for this tonal F0. Starting at the second lowest onset point, the peak is reached between 50 and 60% duration. Both female speakers exhibit a small drop before commencing the rise; however, the point at which they offset is roughly equal to the point at which they onset. ZPW, while not exhibiting the initial drop, still offsets this tone at roughly the same point as the onset. WXQ also does not drop in F0 after onset, and his offset point is much lower in his range than that of the onset. While most speakers rise consistently between 20 and 50/60% duration, LY only rises between 40 and 60% of her duration. This shorter rise component in the pitch contour for LY’s tone 6 is audibly different from the pitch contours for the tone 6s produced by the other speakers.

Tone 7: This tone shows some variation between speakers. WXQ clearly has a different target for this tone. With the duration nearly equal to half of his maximum duration, the F0 appears to have a target contour similar to that of tone 4. All other speakers have a level or falling contour for this tonal F0, with the onset point at or near the top of the speaker’s range. Crucially, the duration for most speakers is less than one-third of their maximum duration.

4.5.3 Importance of multi-speaker data

Parallel contours may be found within individual systems, such as between tones 2 and 3 in FM’s system, tones 1 and 2, and also tones 3 and 5 in LY’s system. WXQ parallels his tones 4 and 7. While none of ZPW’s tones show any striking parallels in their contours, one could consider that either tones 1 and 2, or tones 2 and 3 parallel the same contour. From this it is immediately obvious why a multi-speaker approach is superior to investigating the speech of only one person. Given the above information, together with a ‘single speaker’ approach, one could conclude that the Fuzhou variety had, as phonologically significant characteristics:

i. Two falling tones
ii. Two level tones and two falling tones
iii. Two rising stopped tones, one high and one low
iv. Either two level tones and two falling tones, OR one level and two falls

The advantages of a multi-speaker approach are thus clear: the possibility of factoring out between-speaker differences to get at the features of extrinsic control which are representative of the variety as a whole is clearly desirable. The next section discusses how to do this through normalization.

4.6 Normalization

This section introduces the concept of normalization, discusses its benefits and techniques for carrying it out.

4.6.1 The importance of normalization

Rose (1991) discusses the need to eliminate non-linguistic factors when attempting to characterize a whole variety: “One of the major aims of linguistic phonetics is the identification of the phonetic features which specify the sounds within a given language or variety. However linguistic phonetic sameness is often instrumentally elusive. On the acoustic level of description, differences between speakers in acoustic output caused by differences in their vocal tract anatomy will often be large enough to swamp not only the linguistic content, which is signalled by the particular sound contrasts involved, but also a fortiori the phonetic detail which characterises the sounds of one particular variety against another.” (Rose 1991:230).

Every speaker’s acoustic output will be different for what is perceived to be the same sound because the size of the individual’s vocal tract is also different, and the acoustic properties of radiated speech waves are a unique function of a speaker’s vocal tract anatomy. The length and mass of the vocal cords is one of the main physiological differences resulting in different acoustic outputs. For example, in the previous section we saw that the male speakers have ranges roughly 100 Hz lower than the female speakers, and there was also quite a difference between the male speakers. One can assume this is due to the difference in the length and mass of the vocal cords: the longer, more massive vocal cords result in a lower F0.

The fact remains, however, that these sounds are actually perceived to have the same linguistic content. The linguistic content has thus to be mediated by a process separating it from the components determined by the individual speaker’s physiology. “Normalisation is a mathematical analogue of this perceptual process, aiming to extract and specify the invariant acoustic correlates of the Accentual and Linguistic features of a particular variety, and then to compare varieties for typological and universal purposes” (Rose 1987:343).

While there have been a number of investigations into vowel normalization (e.g. Disner 1980), Earle (1975) was the first to normalize the acoustic correlates of tone. Rose (1987) then examined some considerations in tonal normalization in greater detail. This work was a breakthrough for tonal description, because without normalization it is not possible to take the first step in defining the phonetic and phonological features of a particular variety, namely determining the features which serve to characterize the whole variety and discarding the idiosyncrasies.

4.6.2 Normalization procedural techniques

The aim of normalization is to maximally reduce the between-speaker variance while still making sense perceptually. The notion of perceptual sense, which serves to evaluate the numerical strategy, can be understood in two ways. Firstly, that the normalized values should correctly reflect the transcriber’s auditory impression, and secondly that the normalization should ideally model the actual process of the listener’s perceptual parameters. However, not enough is known about the relationship between linguistic pitch and its acoustic correlates to facilitate this. F0 is thus taken to be the parameter for normalization, as it is considered to be the primary acoustic correlate of pitch (Lehiste 1970: 54). Considering that a normalization strategy is meant to reflect the speaker’s perception, it is desirable that between-speaker differences in pitch be reflected as differences in normalized F0.

The normalization procedure employed in this study is that of the z-score transform, following Rose (1987), who reports its superiority in numerical performance over a competing method, the Fraction of Range. He also notes that another advantage of using the z-score is its use of many F0 values as normalization parameters, not just two as employed by the latter technique. He further notes that the root-mean-square basis of the function will ensure a globally distributed reduction in between-speaker variance. That is, the individual values are squared, then averaged, and finally the square root is taken. However, normalization parameters should only be calculated from samples that are comparable in order to avoid biasing. The ‘high fall’ is a good example. It is not clear whether all speakers have the same extrinsic target, so if we were to include the ‘high falls’ in the normalization parameters, it may skew the other tones. That is, all the tones of a speaker with a lower high fall would be rendered slightly higher. It is thus preferable to exclude it from the normalization parameters. When considering which samples would be suitable candidates for normalization parameters, it is also necessary to exclude those parts of the signal already ‘discarded’ with respect to conveying tonally relevant distinctive features, namely the onset and offset perturbations. As for transcriptional equivalence, I chose tones 1, 3, 4 and 6 as comparable tones for this purpose, that is, the high level, the low fall, the rising stopped tone and the rise-fall. Thus my parameters for normalization were these tones from all four speakers at the 20, 40, 60 and 80% duration sampling points.

The z-score normalization procedure is as follows:

$$z_i = \frac{F0_i - \overline{F0}}{SD}$$

where $F0_i$ is the sample point, $\overline{F0}$ is the arithmetic mean of all the points chosen to be normalization parameters, and $SD$ is the standard deviation of those points about that mean (all values calculated to three decimal places). The results are in Appendix E. Other strategies for normalization include taking the log of the F0 (e.g. see Nearey 1989; Zhu 1999).
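The z-score transform described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `zscore_normalize`, the choice of a population standard deviation, the per-speaker computation of mean and SD, and all the F0 values are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev

def zscore_normalize(f0_values, reference_values):
    """Express each F0 value in standard deviations from the reference mean.

    reference_values are the points chosen as normalization parameters
    (assumed here to come from one speaker's comparable tones).
    """
    ref_mean = mean(reference_values)
    # Population SD is an assumption; a sample SD is an equally plausible reading.
    ref_sd = pstdev(reference_values)
    if ref_sd == 0:
        raise ValueError("reference values must not all be equal")
    # Three decimal places, as stated in the text.
    return [round((f0 - ref_mean) / ref_sd, 3) for f0 in f0_values]

# Hypothetical F0 values in Hz for one speaker (illustrative only).
reference = [150.0, 148.0, 110.0, 105.0, 120.0, 135.0, 125.0, 140.0]
tone_contour = [151.0, 149.0, 148.0, 147.0]
normalized = zscore_normalize(tone_contour, reference)
print(normalized)
```

Passing the reference points separately from the values to be transformed mirrors the point made above: a tone such as the high fall can be left out of the parameters and still be normalized with them.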


4.6.3 Results of the normalization

The results are shown in the figures below, which plot the normalized tones for each speaker by tone. Most of the transformed shapes cluster together quite closely, and some of the tones with different between-speaker pitches have been kept nicely separate: for example, the larger derivative in WXQ’s tone 2 and his lower onset point for tone 5, LY’s delay of about 20% duration for the rise in tone 6, and WXQ’s different contour for tone 7. Recall that the normalization parameters were F0 values at percentage points of duration, so the x-axis is now expressed as a percentage of total duration for that tone. The normalized F0 range is now quantified in units of standard deviations from the mean.

Figure 4.9. Normalized tone 1.

Figure 4.10. Normalized tone 2.

Figure 4.11. Normalized tone 3.

Figure 4.12. Normalized tone 4.

Figure 4.13. Normalized tone 5.

Figure 4.14. Normalized tone 6.

Figure 4.15. Normalized tone 7.

(Each figure plots normalized F0, in SD from the mean, against percentage of duration.)

It is worth mentioning a few things about these representations. All tones will have reached what is presumably the target contour by 20% duration, most noticeably for tone 1. This is just about equal to the 5 csec onset perturbation I proposed to ignore for the F0 descriptions in section 4.3. Aside from the already mentioned between-speaker differences, there seems to be a possible between-sex difference in the third tone. The males have lower, more level tones than the females, whose tones fall slightly, about 0.5 of a standard deviation above the males.

Now it is possible to specify part of a linguistic-phonetic-acoustical representation of the Fuzhou citation tones, representative of the whole variety. Such a representation is shown in figures 4.16 – 4.22, for each individual tone, and figure 4.23, for all tones.


Figures 4.16 – 4.22 plot the mean of the normalized tonal F0 values together with one standard deviation above and below the mean, as indicated by the vertical lines. Assuming that the normalized data are normally distributed, about 68% of Fuzhou normalized F0 would be expected to lie within the one SD range above and below the mean, as indicated in the plots. In this way, the plots indicate the degree of expected variation in Fuzhou, with roughly two-thirds of speakers’ tonal F0s falling within the limits set by these bars (given 100 speakers). The points deemed not comparable for normalization are excluded from these mean values. These plots then exclude WXQ’s tones 5 and 7 and the 40% point of LY’s tone 6. Strictly speaking, the between-sex difference found in tone 3 should perhaps also be kept separate, but I await further investigation and thus confirmation of this as a real between-sex difference before excluding it. Unless otherwise specified, the mean was taken from all four speakers.
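As a rough check on the proportion expected within one SD of the mean under a normal distribution, and as an illustration of how a mean with a one-SD band might be computed at a single sampling point, consider the following sketch. The function names, the use of a sample SD across speakers, and the four normalized values are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev

def normal_coverage(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return math.erf(k / math.sqrt(2))

def mean_and_band(values):
    """Mean of the speakers' normalized values at one point, with a +/- 1 SD band."""
    m = mean(values)
    sd = stdev(values)  # sample SD across speakers (an assumption)
    return m - sd, m, m + sd

print(round(normal_coverage(1), 4))  # 0.6827

# Hypothetical normalized F0 values (SD units) for four speakers at one point.
low, mid, high = mean_and_band([1.8, 2.1, 2.0, 1.7])
print(low, mid, high)
```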

Figure 4.16. Mean normalized tone 1.

Figure 4.17. Mean normalized tone 2.

Figure 4.18. Mean normalized tone 3.

Figure 4.19. Mean normalized tone 4.

Figure 4.20. Mean normalized tone 5.

Figure 4.21. Mean normalized tone 6.

Figure 4.22. Mean normalized tone 7.

(Each figure plots mean normalized F0, in SD from the mean, against percentage of duration.)

Figure 4.23 shows these mean values plotted together to represent the tonal configuration of Fuzhou citation tones, but without illustrating the standard deviations away from the mean. Now we have a representation of the whole Fuzhou variety, having factored out the between-speaker differences.

Figure 4.23. Mean normalized tones in Fuzhou, plotted against mean duration.

At this stage I should note that normalized F0 values should be expressed as functions of normalized duration parameters (e.g. Rose 2000). However, I can only quantify the magnitude of onset and offset perturbations in terms of centiseconds and percentage of duration respectively. While it is desirable to normalize duration, it is also desirable to be able to represent Fuzhou tones as only those tonal contours used to distinguish linguistically relevant tonal features. This entails factoring out the between-speaker differences, as I have just done, and also the onset and offset perturbations. Finding a suitable way of converting the parameter used in duration normalization to centiseconds is beyond the scope of this study. I thus decided to take the arithmetic means of the tones in order to be able to represent duration in terms of centiseconds and thus exclude what I have found to be the perturbations.

[Figure 4.23 plot: mean normalized tones T1–T7, from 5 csec. to 95% of duration; x-axis: mean duration (csec.); y-axis: SD from mean.]

4.7 Distinctive features for Fuzhou tones

From the tonal configuration given in figure 4.23, we can now discuss the distinctive features required to describe these data using one of the many models that have been proposed. Let us consider one of the first models for such features: Wang (1967). Wang considers contour tones to be part of the whole syllable and proposed features to describe these, such as [±fall] and [±rise]. Applying these to the set of normalized tones in Fuzhou, the following specifications can be made in order to classify (not describe) these tones.

• The primary distinction is made by the presence or absence of a final stop, aptly called (though not after Wang) [±short]. If the distinctive feature of the tone is that it is [+short], then the best way to obtain this is to truncate it with a glottal stop, which may be considered part of the tone and not a phonological segment. Of course, it may also be short if the glottal stop is phonologically segmental and consumes a large part of the rhyme duration, for which only a quantum of duration is specified, thus abbreviating the duration available for tonal F0.

• All the tones that are [–short] may be divided into [±contour]. Tones 1, 2 and 3 can be considered to be [–contour] and then may be minimally distinguished from each other by [+high], [–high, –low] and [+low] respectively. If the tone is [+contour] (tones 5 and 6), we may further describe them as [+fall] and [+rise, +fall].

• The short tones may be distinguished by either [±high] or [±contour], though the latter goes some way towards explaining the difference in duration, with the rising stopped tone taking longer than the simple high stopped tone.

Table 4.2 summarizes this feature assignment. For feature assignment using Yip’s model, please see section 6.4.

Tone 1       Tone 2       Tone 3       Tone 4       Tone 5       Tone 6       Tone 7
[–short]     [–short]     [–short]     [+short]     [–short]     [–short]     [+short]
[–contour]   [–contour]   [–contour]   [+contour]   [+contour]   [+contour]   [–contour]
[+high]      [–high]      [–high]      [–high]      [–rise]      [+rise]      [+high]
[–low]       [–low]       [+low]                    [+fall]      [+fall]

Table 4.2. Phonological feature assignment for Fuzhou.
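The feature assignment in Table 4.2 can also be written down as a small lookup structure, which makes it easy to check that the seven feature bundles are mutually distinct. The sketch below is illustrative only: the column alignment of the table is my reading of it, and the names `TONE_FEATURES`, `tones_matching` and `bundles_are_distinct` are hypothetical.

```python
# Feature bundles as read from Table 4.2 (True = '+', False = '-').
# Features left unspecified for a tone are simply absent from its bundle.
TONE_FEATURES = {
    1: {"short": False, "contour": False, "high": True, "low": False},
    2: {"short": False, "contour": False, "high": False, "low": False},
    3: {"short": False, "contour": False, "high": False, "low": True},
    4: {"short": True, "contour": True, "high": False},
    5: {"short": False, "contour": True, "rise": False, "fall": True},
    6: {"short": False, "contour": True, "rise": True, "fall": True},
    7: {"short": True, "contour": False, "high": True},
}

def tones_matching(**spec):
    """Return the tones whose bundle agrees with every feature value in spec."""
    return [tone for tone, feats in TONE_FEATURES.items()
            if all(feats.get(name) == value for name, value in spec.items())]

def bundles_are_distinct():
    """True if no two tones share an identical feature bundle."""
    frozen = [frozenset(feats.items()) for feats in TONE_FEATURES.values()]
    return len(set(frozen)) == len(frozen)

print(tones_matching(short=True))                 # [4, 7]
print(tones_matching(short=False, contour=True))  # [5, 6]
print(bundles_are_distinct())                     # True
```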

4.8 Summary

This chapter first described the algorithms designed to analyze the data. After giving a brief description of the pitch of the citation tones in Fuzhou, the tonal F0 values were discussed, first for each speaker, then across speakers, thus paving the way for an introduction to the concept of normalization and the desire to factor out between-speaker differences. This procedure was then described and the results from the normalization were presented. The next chapter explores the relationship between F0 and amplitude and presents an assessment of the importance of radiated amplitude in tonal production.

This section introduced the concept of normalization and described the method by which it can be achieved. After applying the outlined methods to my data, the results were presented and discussed.

5 The physiology of tone production in Fuzhou

The previous chapter investigated the acoustic dimension of tonal F0. This chapter uses the

acoustics to infer something about the physiology of tone production and provides an insight

into the physiological factors involved in F0 production.

I first discuss the reasons why it is necessary to investigate more than one acoustic dimension and why I chose amplitude. Section 5.3 is a brief description of received attitudes towards F0 production, and is followed by a discussion of the relationships between the two acoustic parameters and the two physiological parameters. Section 5.6 compares these relationships and 5.7 discusses the implications for tonal F0 production given my results.

5.1 Why investigate amplitude?

Pitch is usually assumed to be the perceptual correlate of fundamental frequency. This is the result of the assumption that the physiological correlate of tone features is the vibration of the vocal folds in phonation, and that the acoustic correlate of the vocal fold vibration is the fundamental frequency of the sound wave generated at the glottis (Lehiste 1970:54). The rate of vibration of the vocal cords depends on a number of interdependent factors, such as the mass of the vibrating part of the vocal folds, the vocal cord tension, the area of the glottis during the cycle, which determines the effective resistance of the glottis, and the value of the Bernoulli effect, the value of the subglottal pressure and the damping of the vocal cords. Rose (1988:11) remarks on this:

Tonal pitch is [indicated to be] the perceptual result of both general auditory and speech specific processes operating on F0. The speech specific processes typically involve productionally mediated perception, and include speaker and context normalisation of F0. Both of these (but especially the latter) involve interaction, in ways mostly not yet clear, between F0 and all the other main acoustic dimensions of Ar (radiated amplitude), duration and spectrum. Thus tonal pitch cannot be an exclusive function of F0, although consideration of the magnitude of perceptual effects indicates that F0 constitutes the basic term in the function relating acoustics to pitch.

This illustrates the many-to-one relationship between acoustics and auditory features, as pitch is perceived as just one feature, but is actually the result of many different acoustic dimensions co-occurring. This relationship can even be found with features such as [voice]. Rose (1988:21) found that his perception of the feature [voice] is dependent on sufficient Ar level as well as F0, serving as a reminder of both “the inferential nature of the articulatory features with which we describe the auditory responses encoded in a phonetic transcription, and of the fallibility of the process: just because we hear [–voice] does not imply that the vocal cords are not vibrating.” Let us consider another feature which illustrates how phonology does not adequately distinguish auditory from articulatory features. Consider the feature [±nasal]: hearing an oral (i.e. [–nasal]) vowel does not necessarily mean that the soft palate is fully raised (certainly not for low vowels, for example).

In the previous chapter the two acoustic dimensions of F0 and duration were investigated. Amplitude is another of the acoustic dimensions involved in the acoustic correlation of what is perceived as pitch, and will be investigated in this chapter in terms of its relationship with F0 and how these two parameters may be used to infer something about the production of the tones. First, however, follows a brief introduction to what is meant by amplitude and why it is important and useful to investigate this dimension.

5.2 What is amplitude?

Amplitude is said to be the primary acoustic correlate of the percept of loudness (Lieberman and Blumstein 1991:28). Peak-to-peak amplitude is the extent of the maximum variation in air pressure from the zero line during a sound (Ladefoged 1962:15), that is, the maximum displacement of a particle set in vibratory motion, marking the extreme limit of its motion of oscillation. This, however, is not a good correlate of what is perceived as ‘loudness’. What is more appropriate is the average displacement of the particle. So the root-mean-square (RMS) amplitude is calculated, that is, a form of the average of the amplitude which is particularly useful for complex wave forms (the values are individually squared, then averaged, and finally the square root is taken).
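The three steps of the RMS calculation (square, average, take the square root) can be illustrated with a short sketch. The function names and the sampled sine wave are assumptions chosen for illustration; they are not the measurement routines used in this study.

```python
import math

def rms(samples):
    """Root-mean-square: square each value, average the squares, take the square root."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def peak(samples):
    """Largest absolute deviation from the zero line."""
    return max(abs(x) for x in samples)

# One cycle of a sine wave sampled at 1000 points (illustrative signal only).
wave = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(round(peak(wave), 3))  # 1.0
print(round(rms(wave), 3))   # 0.707
```

For this simple wave the RMS value is about 0.707 of the peak value; for complex wave forms the two can diverge much more, which is why the RMS measure is the more useful average.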

Distinguished from intensity, which is a measure of energy or power (i.e. the ability to knock a particle sideways), amplitude is a measurement of pressure and, unlike intensity, is not a function of frequency. Amplitude is a function of subglottal pressure, all things being equal. Other major influencing factors are: the vowel quality, specifically the effect of the supralaryngeal filter, and the interaction of the source and filter with respect to harmonics and formant frequencies, resulting in local fluctuations in amplitude (namely increasing with the frequency peaks).

Subglottal pressure (Ps), however, is not just a function of pulmonic effort, but also of laryngeal activity, and thus vocal cord tension, and also glottal resistance (i.e. the average glottal area). As the time-varying amplitude of the glottal source occurs extrinsically as the result of articulatory gestures that affect the Ps, it should be possible to infer something about the physiological factors involved in the F0 production. It is this aspect of Ar that I will be primarily concerned with in this chapter. I will next briefly review the physiological factors involved in tonal production.

5.3 Received theories of F0 production

This section provides an overview of tonal production, in order to investigate the physiological factors involved in the production of the tones and determine any connection between the production of F0 and Ar.


The relative importance of vocal cord tension (VCT) and Ps as “mechanisms of dynamic F0 control” (Rose 1982:160) has long been an issue of controversy. It is generally assumed that changes in F0 are the result of differences in VCT due to changes in the intrinsic and extrinsic laryngeal musculature; specifically, passive implementation from the cricothyroid and strap muscles, and active implementation from the vocalis, as has been shown in electromyographic studies on Tai (Erickson 1976).

However, Monsen et al. (1978) proposed a method of quantifying the separate contributions of VCT and Ps factors to the production of observed F0 and RMS glottal amplitude (Glot Amp = volume velocity wave generated at the glottis) in English. They found a way of indirectly assessing the contributions of Ps and VCT to changing F0 by comparing human glottal-source data with synthetic glottal waveforms generated by the Ishizaka-Flanagan two-mass model of vocal-fold vibration. “The two-mass model duplicates many of the essential features of vocal-fold vibration” (Monsen et al. 1978:66) and provides a means of investigating the changes that occur in the glottal source when a frequency change is caused by changes in VCT and/or Ps. By comparing the acoustic characteristics of natural phonation with those of synthetic phonation generated by known values of Ps and VCT, inferences can be made about the sources of the F0 variation in normal speech given the two observed variables of F0 and Glot Amp.

The F0 and Glot Amp values of sampled periods were selected and plotted against each other. This can be seen in figures 5.1 and 5.2, which contain the data synthesized for vocal folds of typical male and female dimensions respectively. Each point on these graphs represents a glottal period of known F0 and Glot Amp and corresponds uniquely to a specific value of both Ps and VCT. “These graphs are thus ‘maps’ which can be used to infer changes in air pressure and vocal-fold tension when the frequency and intensity of individual glottal periods are known” (Monsen et al. 1978:69).

I have enlarged the original graphs (Monsen et al. 1978:69, figs 3 and 4) and modified them slightly to accommodate the larger amplitude ranges of my speakers. The grid is composed of intersecting contours of equal Ps and VCT settings, which connect the values of the sampled glottal wave periods mentioned above. Ps settings, which are shown by the solid triangles, increase almost vertically, over a range from 3 to 18 cm H2O; VCT settings, shown by the open circles, increase almost horizontally, in a range from 0.5 to 1.5. No unit is given for VCT as it is not an absolute number measured in dyn/cm². Instead, the value of Q = 1.0 is the approximate setting for VCT typical of phonation in speech. The range represents the variation in tension 50% above and below this typical value. Referring to these maps, Monsen et al. point out that an increase in Ps produces a large increase in Glot Amp, but only a moderate increase in F0. Similarly, an increase in VCT produces a large increase in F0, and a small decrease in amplitude.

The acoustic results of any contribution of Ps and VCT settings can be read off along the horizontal axis, which shows F0 in 20 Hz increments from 80 to 300 Hz, and along the vertical axis, which shows Glot Amp in 1 dB increments. For example, at 60% duration for tone 1, ZPW has a mean F0 value of 148.6 Hz and a mean Ar value of 23.9 dB. These have been plotted and are marked on the map with an ‘X’. From this the VCT can be found to be 0.87, and the Ps to be 8.0 cm H2O. Similarly for the map of female dimensions, an ‘X’ marks the 40% duration for LY’s tone 1, where the mean F0 value is 200 Hz and the Ar value is 25.8 dB. The VCT is determined to be 1.05 and the Ps to be 8.48 cm H2O.


I have applied these maps to my F0 and Ar data to find out the contribution of Ps and VCT to changes in Ar and F0 in Fuzhou citation tones. There are, however, a few differences between the two approaches which must be made explicit so that the reader is aware of the limitations of my application of this model.

The amplitude measurements I made are of RMS radiated amplitude (Ar), differing from those used by Monsen et al., who use RMS glottal amplitude values. When the Rhyme is monophthongal there is not a problem associated with this difference, as the supralaryngeal filter can be assumed to remain constant throughout the Rhyme and the amplitude radiated at the mouth should be a reasonably true reflection of the amplitude generated at the glottis, allowing for interaction of formants and harmonics. However, as has been discussed, the presence of tonally conditioned vowel alternations in Fuzhou means that one cannot control for all finals to be monophthongal except with [a]. The changing shape of the filter, then, is a possible source of error for my results (high vowels tend to have a lower amplitude) and they should be observed with this in mind. However, at least the slight difference applies across the board, for all speakers.

The second problem is associated with the slight change in phonation type associated with some of the tones. This implies that there will be differences in glottal aperture associated with these changes. The Monsen et al. maps are based on normal modal phonation and thus do not account for differences associated with a change to creaky or breathy voice. However, when faced with this problem and the problem of the speakers’ Ar ranges being much greater than the Glot Amp range on the maps, I decided to restrict my modification of the model to accommodate primarily those values of Ar which were audibly produced in a normal phonation type, thus deliberately letting many of those values clearly audibly uttered in a non-modal phonation fall below the range on the map. Thus, the Ar was aligned with the Glot Amp on this basis (see figures 5.1 and 5.2). This, however, introduces two other possible error sources. The first is that the values of VCT and Ps, as determined by the map readings, are not correct for those values with differing glottal apertures. It should be noted that the possibility of error due to the differing phonation type is an error source that also cannot be controlled for, as the model does not include this possible third dimension of glottal aperture.

This is, however, a desirable dimension to be able to capture, and one which could be captured by the original two-mass model devised by Ishizaka & Flanagan (1972). Through controlled experimental implementation, the third dimension may be added in the same way as the dimensions of Ps and VCT were derived, to relate Ar and F0 to the three dimensions of Glot Amp, Ps and VCT. Unfortunately this is beyond the scope of this work.

The final possible source of error mentioned above is due to the fact that the map has been modified slightly without direct reference to the two-mass model to obtain precise values. Values obtained from areas outside of the original map as presented by Monsen et al. are thus subject to error, albeit minimal. One last point to note is that the maps are based on vocal cords of typical dimensions, and so the very different ranges of the male speakers (95–135 Hz for WXQ, 70–170 Hz for ZPW) probably reflect different length and mass of their vocal cords. Therefore, as the models are based on typical vocal cord dimensions, the results for one of the male speakers are likely not comparable with those for the other. It is for this reason that the male speakers will be excluded. Instead, we focus here on the two female speakers.

Figure 5.1. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of male dimensions.

Figure 5.2. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of female dimensions.

5.4 Ar and F0 relationship

Before relating the acoustics to the physiology of tonal production, I first demonstrate the positive relationship that holds between the F0 and the Ar.

I do not present all the data here. I have chosen to present some of the results for the female speakers as they most clearly indicate the relationship and are most comparable due to their similar F0 ranges. Differences in the production of tones would be most obvious from a comparison of highly comparable speakers. Therefore, I do not present data from the male speakers here because their F0 ranges are not comparable, as noted above. For the values and plots of each speaker’s mean F0 and mean Ar values against equalized duration, I refer the reader to Appendices H and I. In this section I compare the high, mid and low tones together and then the high fall and rise-fall tones. These will be termed the static and dynamic tones respectively, from their assignment of [–contour] and [+contour] as suggested in section 4.7. They are plotted together against equalized duration in figures 5.3 to 5.6. The dotted lines indicate the Ar values and the solid lines the F0 values.

Figure 5.3. Ar/F0 plotted against equalized duration for Tones 1–3, speaker 2 (FM).

Figure 5.4. Ar/F0 plotted against equalized duration for Tones 1–3, speaker 3 (LY).

Figure 5.5. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 2 (FM).

Figure 5.6. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 3 (LY).

(Each figure plots fundamental frequency (Hz) and radiated amplitude (dB) against percentage of duration of the Final.)

5.4.1 The static tones

Figures 5.3 and 5.4 show the values obtained for the static tones plotted together for each of the female speakers; that is, tones 1, 2 and 3 for speakers FM and LY respectively. There are great similarities in the Ar/F0 relationship between the two speakers. In terms of relative height, the Ar can be seen to reflect the F0 with respect to the relative position in the speaker’s range. Tone 1 is consistently higher in the range than tone 2, which in turn is consistently higher in the range than tone 3 for both F0 and Ar contours.

5.4.2 The dynamic tones

Figures 5.5 and 5.6 refer to the dynamic tones. Tones 5 and 6 show similar relationships to tones 1–3 in that the Ar can be seen to reflect the F0 contour to a large extent, especially in terms of relative height in the speaker’s range. Tone 6 for LY, however, does not have a rising Ar. Rather, it is level when the F0 is rising. The Ar contour for tone 5 does not have such a steep fall for LY as it does for FM, but this is also the case with the F0. Again, a positive correlation between the Ar contour and the F0 can (mostly) be observed.

5.4.3 Summary of the Ar/F0 relationship

The relationship between the Ar and the F0 was examined for the two female speakers’ static and dynamic (non-stopped) tones. There was found to be a positive correlation between these two acoustic dimensions for both speakers. This correlation exists in terms of the relative position (height) in the F0 and Ar ranges of the tones. This relationship is also shown to hold between the tones as determined by their relationship to one another. It should be noted that inspection of the male speakers’ graphs also reflects this positive Ar/F0 relationship.
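The positive relationship described here is established by inspection of the plots. Were one to quantify it, a Pearson correlation coefficient between the F0 and Ar values of a contour is one possible measure. The sketch below is a suggestion only, not the procedure used in this study, and the function name and sample values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical falling contour sampled at 20, 40, 60, 80 and 100% of duration.
f0_hz = [230.0, 225.0, 210.0, 190.0, 175.0]
ar_db = [29.0, 28.5, 27.0, 24.0, 21.0]
r = pearson_r(f0_hz, ar_db)
print(r > 0)  # True
```

A coefficient near +1 would correspond to the Ar contour closely tracking the F0 contour; a level Ar accompanying a rising F0, as in LY’s tone 6, would lower it.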

Having found a positive relationship between the Ar and the F0, it is reasonable to assume that they are probably being produced in the same way. Next I investigate the extent to which this is reflected in the physiology of the tonal production by using the values derived for VCT and Ps from the Monsen et al. maps.

5.5 VCT and Ps relationship

This section examines the values of VCT and Ps for the static tones for the two female speakers, as derived from the Monsen et al. maps. Again, similar values were derived and plotted together against equalized duration for every speaker and every tone, but I refer the reader to Appendix I for these. This section is restricted to the examination of the relationship between the VCT and Ps of the static tones whose Ar and F0 relationship was examined in the previous section.

On the graphs given in figures 5.7 and 5.8, the dotted lines indicate values of Ps and the solid lines indicate values of VCT. When the values of Ar and F0 were beyond the range of the maps, estimates were made by extrapolating the model in the appropriate direction and estimating an increase or decrease in value from the preceding point. This is indicated on the graph by the slashed lines. These physiological parameters are first examined together to determine the relationship holding between them. In the subsequent sections I discuss what can be inferred about the physiology of the tonal production, and how this compares with received theories of tonal production.

5.5.1 VCT and Ps for the females’ static tones

In figure 5.8 it can be seen that LY has a clear relationship between her VCT and Ps. Tone 1 has the highest VCT contour, followed closely by tone 2 then tone 3. The same can be said for her Ps. Tone 3 does actually start at a place higher in the range than tone 2, but by 40% of the duration the distance relationships parallel those of the VCT. After this point, while still very similar, there are small differences in the two contours. This could be due to the slight change in phonation type associated with about half of tone 3 and the end of tone 2 (cf. section 4.2). Like LY, FM (figure 5.7) also exhibits the same relationships reflected in the relative height differences of the values within the range.


Figure 5.7. VCT and Ps plotted against equalized duration for tones 1–3, speaker 2 (FM).

Figure 5.8. VCT and Ps plotted against equalized duration for tones 1–3, speaker 3 (LY).

[Both figures plot vocal cord tension (VCT) and subglottal pressure (Ps, cm H2O) for T1, T2 and T3 against the percentage of duration of the Final.]


5.6 Ar/F0 vs. VCT/Ps relationships

Figures 5.1–5.4 clearly demonstrate that F0 and Ar are positively correlated, reflecting the relative differences of height within the given range. Similarly, the values of VCT and Ps (figures 5.7 and 5.8) show an identical positive correlation reflecting the same thing.

5.7 Discussion

The acoustic and physiological parameters show parallel relationships that reflect the same height distinctions within the given range, that is, the distinctive tone heights. In section 4.7, tones 1, 2 and 3 were given the distinctive features [+high], [–high, –low], and [+low] respectively. This is clearly reflected in the physiological production of the tones.

It was pointed out above in section 5.3 that the received theories hold that VCT is the primary factor involved in F0 production and that Ps is secondary. From this, when comparing the static tones, we would expect to find a constant Ps, the differences in F0 being controlled by the VCT. Instead, what we get is more complicated than this. In particular, there are different Ps contours for all three static tones. Indeed, a generally positive correlation between Ar and F0 could reflect extrinsic subglottal involvement (that is, the speaker is increasing F0 by both VCT and Ps). But it could also be that with increased tension you get a longer closed phase, with a concomitant increase in intrinsic Ps, as a result of the increased glottal resistance. It is impossible to decide on the proper explanation of this correlation without the appropriate modeling.

We can, however, conclude that both VCT and Ps are important in the production of tones in Fuzhou. This is an important finding as it is contrary to the received theories of tonal production, but supports earlier work showing a clear F0-Ar relationship (congruence) for some languages (e.g. Zhenhai; Rose 1984).

5.8 Summary

This chapter has explored the possibility of investigating the acoustic dimension of radiated amplitude to shed light on characteristics of tonal production in Fuzhou. This was done by applying the Monsen et al. map, based on the Ishizaka-Flanagan two-mass model, to use the parameters of F0 and amplitude to determine the degree of involvement of the corresponding physiological factors, VCT and Ps, in the production of F0. A between-speaker comparison demonstrated the relationship between F0 and Ar to be positively correlated, suggesting that the tones were produced in the same way by both speakers. This was compared with the relationship between VCT and Ps, as derived by application of the Monsen et al. maps. The relationship between the physiological parameters was found to parallel that of the acoustic parameters. From this we can see the equal importance of both Ps and VCT in producing tones in Fuzhou. This is an important finding because Ps has previously been credited with little or no involvement in tonal F0 production.

The next chapter reviews the tone sandhi phenomena in light of new data on disyllabic expressions.


6

Fuzhou disyllabic tone sandhi

This chapter gives an analysis of disyllabic tone sandhi in Fuzhou based on material collected from the same four speakers used for the data obtained for the citation tones. Section 6.1 outlines the procedure involved before presenting the results in section 6.2. Section 6.3 briefly summarizes some current views on tonology, to illustrate the advantages of an autosegmental approach. Fuzhou is then analyzed within this framework in section 6.4. Section 6.5 analyses the tone sandhi using a different, more diachronically motivated approach, which has been shown to be suitable for other Chinese dialects; I adopt it to assess its adequacy in accounting for Fuzhou tone sandhi. The two analyses are compared in section 6.6.

6.1 Procedure

This section describes the corpus and the method used to obtain the data; the results are presented in the following section.

6.1.1 The corpus and elicitation

The corpus consisted of disyllabic expressions taken from the Hànyǔ fāngyán gàiyào (漢語方言概要) (1960). This ‘survey of Chinese dialects’ includes a section on Fuzhou (pp. 296–299), part of which lists fifty-seven different disyllabic expressions found in Fuzhou, allegedly exhaustive of all the tonal combinations. These are listed in appendix B and largely consist of disyllabic nouns (with a few nominal phrases such as ‘Shang dynasty’ or ‘little box’).

All four informants read the whole set of disyllabic expressions three times, repeating each token once or twice. Finally they read a list of characters that was composed of all the different characters used in the disyllabic expressions, so that I could check that all of the input tones in the combinations were as they were reported to be.

Auditory transcriptions were made of the recordings and were corroborated by a professional phonetician, Dr Phil Rose. Together, we listened to the utterances many times to ensure the correct pitch values for each tone. I have ‘translated’ the transcriptions into the typical Chao number scale of 1–5 (cf. section 1.6).


6.2 Results

Table 6.1 presents the results. It is important to note that there appeared to be little between-speaker difference in pitch targets for the disyllabic tonal outputs. However, while the auditory impressions were remarkably uniform, the sandhi patterns showed some variation.

Table 6.1 thus summarizes all four speakers’ tones. The table shows the pitch values for both syllables in a given utterance. The tones along the top of the table are the citation tones of the second syllable, and those down the left-hand side, the citation tones of the first. A given disyllabic form will be located in the same row as the input tone on its first syllable and in the same column as the input tone for its second syllable.

For example, two tone 1s have the output sandhi form [44 44]. The combination of tone 4 + tone 6 gives two possible forms, [45 231] and [42 231], for three of the four speakers. The place in the table corresponding to the combination of tone 1 plus tone 4 has only a question mark. A combination of tone 3 + tone 7 has the output values [3 5]. I will remark on the latter three examples below.
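The row-and-column reading of table 6.1 can be sketched as a simple lookup. The following is only an illustration: it contains just the four cells quoted in the paragraph above, and the names `SANDHI_EXCERPT` and `sandhi_forms` are hypothetical, introduced here for exposition.

```python
# Illustrative excerpt of table 6.1: only the cells quoted in the text are included.
# Keys are (citation tone of syllable 1, citation tone of syllable 2);
# values are the attested output forms in Chao numerals.
SANDHI_EXCERPT = {
    (1, 1): ["44 44"],
    (4, 6): ["45 231", "42 231"],  # two variants for three of the four speakers
    (1, 4): [],                    # shown as "?" in the table (invalid combination)
    (3, 7): ["3 5"],
}


def sandhi_forms(first_tone, second_tone):
    """Return the output forms listed in the excerpt for a tone combination.

    An empty list means either a gap in the table or a cell that is
    simply not part of this excerpt.
    """
    return SANDHI_EXCERPT.get((first_tone, second_tone), [])


print(sandhi_forms(1, 1))
print(sandhi_forms(4, 6))
```

The first tone selects the row and the second tone the column, exactly as in the table.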

It will be recalled from chapter 2 that there is sometimes more than one sandhi form for tone 4. This can be seen in my data when tone 4 combines with any of tones 3, 4, 5, 6 or 7. Chan (1985) reasoned that this is due to diachronically different final stops; however, I do not venture reasons for this difference in terms of diachrony here. The extent to which the difference is explicable will be seen later in sections 6.4 and 6.5. For the other tones following tone 4, however, there was no second sandhi form.

Finally, as mentioned in the procedural section, after reading the disyllabic expressions, each speaker read each character individually, on a separate list, in order that the tones might be checked from their citation readings. From this, it was found that three of the forms were not the correct combinations and are therefore invalid. That is, for the tone 1 + tone 4 combination, the speakers gave monosyllabic forms that had something other than tone 1 and/or tone 4. The same was true for the tone 3 + tone 4 and tone 3 + tone 6 combinations. These gaps are shown by a question mark in the table, and are blacked out.

As mentioned, the combination of tones 3 and 7 has the output of [3 5]. The difference between what has been represented as [3] and as [33] is that of duration. The tone represented by only a single digit is audibly shorter than the other sandhi tones. I reserve the use of underlining to indicate those syllables that are short stopped tones.


Syllable 2 → (context)
               Tone 1   Tone 2   Tone 3   Tone 4   Tone 5   Tone 6     Tone 7
Syllable 1 ↓   [44]     [32]     [21]     [23]     [51]     [231]      [5]
(resulting sandhi forms below)

Tone 1 [44]    44 44    43 32    42 21    ?        33 51    42 231     3 5
                        34 33

Tone 2 [32]    21 44    44 51    34 51    33 13    33 51    44 231     3 5
                                 44 31
                                 44 41

Tone 3 [21]    44 44    43 32    42 21    ?        44 51    ?          3 5

Tone 4 [23]    21 44    23 31    34 21    53 13    21 51    45 231     3 5
               22 44             42 21    5 13     33 51    42 231     4 5
                                                            [42 231]

Tone 5 [52]    44 44    33 32    21 21    33 13    33 51    21 231     3 5

Tone 6 [231]   44 44    43 33    42 21    31 13    33 51    32 231     3 5
                        43 32

Tone 7 [5]     44 44    33 32    42 21    31 13    33 51    33 231     3 5
                                          33 13

Table 6.1. Fuzhou tone sandhi forms.

S1 refers to speaker WXQ. He has three combinations that differ from those of the other speakers. Firstly, in the combination of tone 1 plus tone 2, he has, as well as the observed [43 32], the form [34 33] for all tokens, suggesting a lexicalized difference. Secondly, in the combination of tone 4 plus tone 1, rather than falling on the first syllable, the pitch on the first syllable is level, that is, [22 44]. Thirdly, he does not have the variant with the falling pitched sandhi tone in the combination of tone 4 and tone 4, just the output form [5 13].

S3 refers to LY. She does not have the falling variant for the combination of tone 4 and tone 6 as the other speakers do; rather, she only has the output form [45 231]. Another difference is that her tone 7 becomes a mid level [33] before tone 4, rather than [31], as is found for the other speakers.

S4 refers to ZPW. In the combination of tone 6 plus tone 2, rather than falling on the first syllable and remaining level on the second, as is found for the other three speakers and represented in the table by [43 33], ZPW falls gradually over both syllables, finishing the utterance between 2 and 3 on the given scale, represented as [43 32] (though [43 33↓] may be a better representation).

The final syllable has a pitch contour that mostly mirrors that of the citation tone. However, in the combination of tone 2 and tone 3, both FM and WXQ appear to have a higher target peak for the second syllable, evidenced by a slightly higher fall on the second syllable (and a consequent slight rise on the first syllable and/or a short rise or level component to the fall on the second syllable). This is indicated in the table with S1/S2 [34 51] and [44 41].


As mentioned in chapter 1, tone sandhi in Fuzhou is right dominant. That is, it is the last syllable in a given domain that remains unchanged. So in a disyllabic expression the second syllable maintains its citation form but provides the context for the first syllable, which changes accordingly. This is corroborated by my results. All but two of the expressions evidenced what appears to be the appropriate citation tone in the second syllable position. This can be noted from the vertical orientation of the changes. That is, there is more correspondence between the sandhi tones given the same tones as context for change, than between the tones that are input for the sandhi changes.

6.2.2 Discussion of the tone sandhi data

I have modified table 6.1 to omit the second syllable forms, as they have been seen to mirror the citation tones in all but two of the cases, as noted above. The first syllable tones are shown in table 6.2, which illustrates only those forms to which the first syllable changes. Table 6.2 has also been reorganized to reflect the groupings of similar output sandhi tones. When the first syllable in disyllabic expressions is tone 4, there are two output sandhi forms in all but a few cases. Tone 4 has thus been split into two rows in table 6.2, grouping the sandhi forms with the other sandhi forms around them. However, for T4+T1, T4+T3, and T4+T4, there are not two sandhi forms available. I have thus shaded out these cells in the table to indicate this.

The high falling tones found on a first syllable sandhi tone ([42, 43]) have also been grouped together and called [42], as the point of their offset is clearly conditioned by the onset of the following tone.

The table may be read in the same way as table 6.1, but the tone occurring on the second syllable must be taken from along the top of the table where all second syllable tones can be found, just like the sandhi tables in chapter 2. For example, when tone 6 is on the first syllable, it will change to a [42] before a tone 3. Thus the combination of tone 6 plus tone 3 is [42 21]. In the bottom right of the table, there is a grouping of tone 2 plus tones 2, 3 and 6, and tone 4 plus tone 6. These all have [44] as their sandhi tone. The ‘#’ indicates that in two of these instances, the second syllable tones do not reflect the citation tone, but that, in these combinations, tone 2 becomes [51], and tone 3, [31].

Note that:

# The second syllable tones 2 and 3 both change to a fall following this tone:

Tone 2 → [51] / tone 2 ____

Tone 3 → [31] / tone 2 ____

The same exceptions as noted in table 6.1 are included here, using the same symbols to indicate to which speaker they applied, with the relevant form (or lack of one) in parentheses next to the symbol.


Syllable 2 →

(context)

Tone 1

[44]

Tone 5

[51]

Tone 7

[5]

Tone 2

[32]

Tone 3

[21]

Tone 6

[231]

Tone 4

[23]

Syllable 1 ↓↓

output

sandhi tones

below

Tone 3 [21]

? ?

Tone 1 [44]

(34) 42

?

Tone 6 [231] 44

33

31

Tone 7 [5]

33 (5)

Tone 4 [23] n/a

n/a

(n/a) n/a

Tone 5 [52] 44

21

33

Tone 2 [32]

# #

44

Tone 4 [23] (22)

21 44

23 34

(5) 53

Table 6.2. First syllable changes in the disyllabic tone sandhi.

In this table we see extensive neutralization in the sandhi forms. There are now just eight different tonal pitch values (ten if you include the exceptions for speakers S1 and S3). There are four level tones: 5, 44, 33, 22 and four falling pitch tonal values: 42, 21, 53, 31. There are also two rises: 23, 34.
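The grouping into level, falling and rising values follows mechanically from the Chao numerals. A minimal sketch, assuming that a tone's shape can be read off from the direction of movement between successive digits (the function name `contour_shape` and the label "complex" are illustrative, not terms from this study):

```python
def contour_shape(chao):
    """Classify a Chao tone numeral (e.g. "44", "42", "231") by pitch movement."""
    digits = [int(d) for d in chao]
    if len(set(digits)) == 1:
        return "level"          # a single digit or repeated digits, e.g. "5", "33"
    steps = [b - a for a, b in zip(digits, digits[1:])]
    if all(s >= 0 for s in steps):
        return "rising"
    if all(s <= 0 for s in steps):
        return "falling"
    return "complex"            # convex or concave contours such as "231"


# The ten first-syllable sandhi values listed in the text for table 6.2.
SANDHI_TONES = ["5", "44", "33", "22", "42", "21", "53", "31", "23", "34"]

by_shape = {}
for tone in SANDHI_TONES:
    by_shape.setdefault(contour_shape(tone), []).append(tone)

print(by_shape)
```

Under this assumption none of the first-syllable sandhi values is classed as complex, whereas the citation tone [231] is.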

Out of the 50 available forms, the most common forms are level or falling: [33] (19 cells), [44] (10 cells) and [42] (11 cells).

Finally, it is worth noting that there is no clear relationship between the sandhi tone and the citation tone of the first syllable.

6.2.3 Comparison with previous data

This section compares my data with some of the available published data. In particular, I will compare the data with the works outlined in chapter 1: Chen & Norman (1965), Chan (1985) and Yip (1990). The sandhi tables employed by these authors are tables 2.1, 2.3 and 2.5 respectively; they are reproduced here for convenience as tables 6.3, 6.4 and 6.5.


Second syllable →→

Tone 1 [55]

Tones 5, 7 [52, 5]

Tones 2, 3, 4, 6 [22, 12, 24, 342]

First syllable ↓↓

Resulting sandhi tones (on first syllable) given below

Tone 1 [55]

Tone 3 [12]

55 52

Tone 6 [342]

Tone 4 (*<h) [24]

Tone 5 [51]

55

22

Tone 7 [5]

Tone 2 [22]

22 35

Tone 4 (*<k) [24]

Table 6.3. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)

Second syllable →→

Tone 1

[44] H

Tones 5, 7 [51, 5]

HL

Tone 2 [32] L

Tones 3, 4, 6 [213, 13, 131]

LH

(L)

First syllable ↓↓

Resulting sandhi tones (for first syllable) given below

Tone 1 [44]

H

Tone 3 [213]

LH

44

33 53

51 Tone 6

[131] L

HL

H

H

H

L

HL

Tone 4 (*<h)

[13] L

H

Tone 5 [51]

HL

33

22 Tone 7

[5] H

L

L

L

Tone 2 [32]

L@

22

13 44

Tone 4 (*<k)

[13] L

@

L

LH

L

H

Table 6.4. Fuzhou disyllabic tone sandhi forms (Chan 1985)


Second syllable →→

Tones 1, 5, 7

[44, 52, 4]

Tone 2

[22]

Tones 3, 4, 6

[12, 13, 242]

First syllable ↓↓

Resulting sandhi tones (for first syllable) given below

.

Tone 1 [44]

Tone 3 [12]

44

52

Tone 6 [242]

(13)

Tone 5 [52]

22

Tone 7 [4]

Tone 2 [22]

22

35

Tone 4 [13]

4

Table 6.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)

It must first be recalled that the means of gathering the data was different for each of the sources. Chan based her tone sandhi data on the speech of a single informant and Yip bases her data on those presented in a couple of different sources. The Chen & Norman data were taken from Chan (1985), who did not mention the source of their data. There are clearly discrepancies between the published works: Chen & Norman’s data have only four different sandhi tones (22, 35, 52, 55), Yip shows four sandhi forms (22, 35, 44, 52) and Chan’s data have six different sandhi tones (22, 13, 33, 44, 53, 51), though after feature assignment she has only four contrasts. My data, as represented in table 6.2, have eight different sandhi tones (21, 31, 23, 33, 34, 42, 44, 53).
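The overlap between these inventories can be made explicit with set intersection. This is a sketch only: it compares the transcription labels exactly as listed above and does not model the possibility that two labels (such as [22] and [21]) represent the same target tone. The names `INVENTORIES` and `shared` are hypothetical.

```python
# Sandhi tone inventories as listed in the text, keyed by source.
INVENTORIES = {
    "Chen & Norman 1965": {"22", "35", "52", "55"},
    "Yip 1990": {"22", "35", "44", "52"},
    "Chan 1985": {"22", "13", "33", "44", "53", "51"},
    "this study": {"21", "31", "23", "33", "34", "42", "44", "53"},
}


def shared(source_a, source_b):
    """Return the sandhi tone labels that two sources have in common, sorted."""
    return sorted(INVENTORIES[source_a] & INVENTORIES[source_b])


for source in ("Chen & Norman 1965", "Yip 1990", "Chan 1985"):
    print(source, "vs. this study:", shared(source, "this study"))
```

On the labels alone, the earlier sources share more with one another than any of them shares with the present data.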

In comparing the sandhi tones, general consensus may be found in the forms to which the first syllable changes when preceding a tone 1 ([44]); that is, all tones become a high level [44] or [55] before the high level, except tone 2 and one of the sandhi forms of tone 4, which become [22] or [21]. These exceptions ([22], [21]) could easily be the same target tones, transcribed slightly differently. However, apart from this, the data differ a lot, although there is much more general agreement among the different authors’ works than between their works and mine. Some of these points are mentioned below.

In my data, before tones 5 and 7 all tones have the sandhi pitch value of [33]. The other authors give a range of pitch values for these sandhi tones ([22], [33], [44] and [55]), but crucially, there are at least two contrasting pitch values for these sandhi tones in these previous studies. Before a tone 2, my data evidence the forms [42, 33, 44, 23]. The other authors, however, have fewer and somewhat different changes, such as [52, 53, 35, 13, 22, or 33]. The greatest disparity between the previous descriptions and the data obtained for this work occurs in the sandhi forms preceding tones 3, 4 and 6. My data have many different changes, and no particularly obvious natural classes to describe the changes. All of the other sources, however, group these tones together and show just a few changes. For example, Chan has [51, 22, 44] as the three possible sandhi tones occurring before a tone with a low pitch onset (tone 3, 4 or 6), while my data show [42, 31, 33, 21, 44, 5, 34, 53] in the same contexts.


It can be seen that the data I have obtained from my four speakers are more complex than the data from the other sources mentioned here. This is likely the result of having data from more than one speaker, as there is clearly between-speaker variation. And despite the relative homogeneity between speakers, there are exceptions as well, not all of which were idiosyncratic. This is important, as sandhi data are complicated and variation is a natural outcome given the different learning environments people have. This variation, however, should not be ignored, but rather explained by the model.

The complexity of the new data suggests that such an elegant phonological analysis (e.g. Yip’s; cf. chapter 2) is unlikely. That is, it is possible that the elegance of the phonological analysis is a function of the relative lack of complexities in the observation data (or perhaps of phonetic abstraction), so it is interesting to see whether such an analysis will work with more complex data. I will explore this by analyzing the data within the same framework as Chan and Yip, that of Autosegmental Phonology (see also Zee & Maddieson 1979). However, I will first introduce some different approaches to the phonological representation of tones in order to demonstrate more fully the advantages of Autosegmental Phonology.

6.3 Some views on tonological representation

This section gives a brief overview of some ways of phonologically representing tone, specifically contour tones, defined by Madison (1977:337) as “a pitch glide which cannot be predicted naturally from factors such as co-articulation and intonation, … and cannot be generated by rule from the environment.”

In 1967, Wang proposed that tones are a property of the whole syllable, and that all tones should be considered as units. This gave rise to contour features such as [±rise] or [±fall]. However, despite the advantages this system may have had in terms of classifying contour tones, it could not properly account for all of the properties which have since been noted for tones, in particular tonal stability and the existence of floating tones. Tonal stability is when there is a change on the segmental level of a language (including segmental deletion), but the tones remain unaffected by the change. Floating tones are morphemes that consist only of tone, and show up because of the effect they may have on other tones in word derivations.

An example of tonal stability comes from a Bantu language, Lomongo. In this language we find instances where the segmentals are reduced or deleted, but the effects of the tones in the underlying forms remain. For example, ‘they search’ consists of three morphemes, each with its own level tone indicated by H (high) or L (low): [ba H] + [as L] + [a L]. When these morphemes join together, the two medial [a]s simplify to one, but the effect of the two tones is apparent from a falling tone surfacing on the first syllable of the output [basa HL L]:

[ba – as – a]  ⟶  [ba – sa]   (fall+low)
 H    L    L        HL   L

An example of a floating tone comes from the Chinese dialect Cantonese. In certain environments such as the “familiar vocative” (Yip 1990:65), the tone on the given syllable may change to a high rising tone, without any change in the segmentals (e.g. 陳 Chan (surname), where a low fall becomes a high rise in the familiar vocative use).

Another system of features, proposed by Woo (1969), is also unable to account for these properties, yet unlike Wang’s, her proposal assumed contour tones to be sequences of level tones. An important approach to phonology was Autosegmental Phonology, as originally proposed by Goldsmith (1976). The main aspect of this approach is that features can occur on tiers separate from the segments, allowing for assimilation of any feature. A more important consequence of this for tones, however, is that tone features can be viewed on separate tiers, and thus may be regarded as autosegmental. This accounts, then, for phenomena such as the aforementioned floating tones and tonal stability by enabling rules to act on the units on any of these tiers without interfering with the other tiers. For instance, the Lomongo problem is solved by saying that while the segments are reduced at one level of representation, the tones are not affected by this process. Sequences of two identical tones are redundant and reduced by the Obligatory Contour Principle (OCP), which states that “at the melodic level, adjacent identical elements are prohibited” (McCarthy 1986:208). Indeed, tones can associate with one or more syllables according to some general principles to account for a lot of non-linear facts. In the Cantonese example, a floating H tone is posited as underlying the specification of ‘familiar vocative’, which is associated to the syllable when this specification is realized with the given syllable.
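The independence of the tiers, and the effect of the OCP on the tonal tier, can be sketched in a few lines. This is a deliberately simplified illustration: tones are single letters in a list, the function name `apply_ocp` is hypothetical, and the re-association of the surviving tones to syllables is not modelled.

```python
from itertools import groupby


def apply_ocp(tonal_tier):
    """Collapse each run of adjacent identical tones into a single tone."""
    return [tone for tone, _run in groupby(tonal_tier)]


# Lomongo 'they search': the segmental tier changes, the tonal tier does not.
segments_underlying = ["ba", "as", "a"]
segments_surface = ["ba", "sa"]      # the two medial vowels simplify to one
tonal_tier = ["H", "L", "L"]         # untouched by the segmental change

print(apply_ocp(tonal_tier))
print(apply_ocp(["H", "H", "L", "L", "H"]))
```

Because the rule refers only to the tonal tier, it applies in the same way whatever happens to the segments.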

An autosegmental framework is, then, considered a desirable framework for describing tonal languages. However, the features used in tonal description still have to be chosen. Goldsmith notes that features should be able to capture both the classificatory and component functions of tones; that is, features should be “a way of establishing what a ‘natural class’ in phonological statements will be, … [and] a way of specifying the several and simultaneous characteristics that comprise what is, from the point of view of the flow of time, a single articulatory or acoustic event” (1989:274-275). Hyman (1986) states that a feature system must be able to account for four contrasting tone heights, and that the features in the system must “capture the natural relationships that exist between tone heights within a language” (1986:110). Following from this, Hyman explains that tones that constitute a natural class must also share a feature, and furthermore, that the features must make it easier to explain these natural classes and natural rules, thus reflecting the markedness property of tones.

In her 1980 thesis, which was an extensive study on Chinese tonology, Yip basically follows an autosegmental approach, concluding that “only a theory in which tone is suprasegmental, and contour tones are represented by sequences of level tones, will satisfactorily account for the observed properties of tone” (1990:21; see also Yip 2002). Her system has already been outlined in chapter 2, though to recall, Yip employs a Register feature. In accord with universals that have been found for tones as stated by Hyman, this feature restricts the tonal inventory to four tone heights and a maximum of two of any given contour. However, Yip (1989) modifies her system in terms of the relationship between the two binary features proposed for tones: [upper], the Register feature, and [high] (now called [raised] after Pulleyblank (1986)), specifying the Tone or melody. She argues that contour tones in East Asian languages “show all the behavior predicted by a theory in which they are melodic units consisting of a root node [upper] dominating a branching specification for [raised]” (Yip 1989:171). The evidence comes from initial tone association, spreading, and OCP effects, which Yip shows affect tones as whole units. She also argues that by identifying [upper] as the tonal root node, the fact that no more than two of any given contour may contrast underlyingly is explained, as six is the logically possible number of contrasting contour tones in a system for which there are four tone levels specified. She distinguishes between branching tones, or single melodic units, and tonal clusters. These are best represented diagrammatically, as shown for a rising tone below:

σ

σ

°

Tonal root level

°

°

L

H

L

H

Branching tone/single m

elodic unit

Tone cluster

Yip also argues that the ‘extra-complex’ tones, concave or convex contours, are usually simplified utterance-internally, to either rising/falling or level tones, which is the result of associating two melodic units. Two examples of these are taken from Suzhou, which has both a concave and a convex tone. These may be represented as follows:

σ

σ

°

°

°

°

H

L H

L

H

L

The branching tone appears on the left-hand side of the complex tone, because phonological behavior shows this to be the correct grouping. For example, the concave tone /HLH/, when spread over two syllables, evidences a fall on the first syllable and a mid-high tone on the second, suggesting [HL.H]. Similarly, the convex tone /LHL/ evidences a rise on the first syllable and a low tone on the second, [LH.L] (Yip 1989:155).

In the next two sections, I analyze the tonological system of Fuzhou in two ways. The first uses an autosegmental approach. A second, more diachronically motivated analysis is then explored and compared with the autosegmental analysis to see which may better account for the tonal alternations found in Fuzhou.

6.4 Fuzhou tonology: Autosegmental

In this section I explore the capability of Autosegmental Phonology to model the sandhi forms found in disyllabic expressions in Fuzhou. Although this has already been done by a few scholars, including Chan (1985) and Yip (1990), we observed in section 6.2.3 that the data from which they worked were quite different from those which I have obtained, and relatively simpler. Thus simple, elegant solutions were facilitated. I will explore the extent to which the methods used to derive these solutions will be as elegant in the face of some greater complexities.

In accordance with the generative approach, I posit underlying forms and phonetically plausible rules to derive the correct surface forms. I employ two binary features to describe and account for the tones in the way which best captures what can be determined to be natural classes. I also believe it is important to be able to account for the individual and between-speaker variations which were found in the data, along with the other possible tonal variations (i.e. tone 4). Although the data are complex, no matter what can be explained in terms of diachrony, the fact remains that the four speakers nearly all produced the same forms for each of the sandhi tones. Thus the complexity is not an accident and must be accounted for in the synchronic phonology of Fuzhou. However, whether the changes are really the result of rule-governed behavior (and if so whether or not they are generative rules), or the result of memorized tone shapes or substitution, needs to be investigated further.

6.4.1 Tonal feature assignment

Following Yip, each of the tones is made up of two features: Register and Tone. My tonal feature assignment also follows Yip in that the [upper] feature is the tonal root node, and that the ‘extra complex’ tone, [231], is the result of associating two melodic units. For ease of exposition, I abbreviate the Tone feature [±high/raised] to simply H or L. As noted above, it is desirable that the tonal features capture natural classes that exist in the system. So I will primarily assign features on the grounds of the tone’s phonological behavior, and secondarily on its phonetic form. However, compared with the other published data on the tone sandhi (cf. section 6.3), there are not many natural classes in the data.

The most apparent natural class is that of tones 5 and 7, before which all tones become a [33]. This natural class is captured by both of these tones having the same feature assignment, [+upper, HL].

Tones 5 and 7

σ

[+upper]

H

L

Tones 5 and 7 are then distinguished from each other by a rule that deletes the last branching

Tone in a syllable in the presence of a final glottal stop:


σ

°

X →

Ø / _____ʔ

X

This rule also applies to tone 4, in deriving the low-rising stopped tone from /LHL/.

Tone 6 is a convex tone in the lower half of both the F0 range and the pitch range. Thus it should have the features [–upper, L(HL)]. Following Yip (1989), this is a combination of two melodic units, the second of which is branching, structured as follows:

Tones 4 and 6

σ

[–upper] [–upper]

L H

L

Tone 4, like tone 7, undergoes final Tone deletion in the presence of a final glottal stop, and has the same feature assignment as tone 6, [–upper, L(HL)].

The assignment of features to tone 1 is unproblematic, as it is basically level and high. It therefore has to be in the upper register, and the Tone feature has to be H.

Tone 1

σ

[+upper]

H

On the basis of its pitch shape, the mid level/fall tone 2 could feasibly be [+upper, L], [–upper, H], or [–upper, HL]. Tones 2 and 5 have the same sandhi tones before tone 4, just as tones 2 and 4 have the same sandhi tones before tones 1 and 6. I suggest that the shared feature sequence HL is the common denominator in these cases, and will assign tone 2 the features [–upper, HL].


Tone 2

[–upper]

H

L

Tone 3 has been given the features [–upper, L]. This is because if tone 2 is assigned the features [–upper, HL], then tone 3 must be featurally distinct in order to be distinguished from it.

Tone 3

σ

[–upper]

L

To recap then, the tonal feature assignment is as shown in table 6.6:

Pitch        Tone      [+upper]     Pitch          Tone      [–upper]
[44]         T1        H            [32]           T2        HL
[51], [5]    T5, T7    HL           [21]           T3        L
                                    [23], [231]    T4, T6    L(HL)

Table 6.6. Fuzhou tonal feature assignment

The sandhi table may now be modified to show the possible features of the sandhi tone in question as well as the pitch values. I have done this by writing the Tone features of upper-register tones in upper case and those of lower-register tones in lower case.

The same exceptions are noted:

# The second syllable tones 2 and 3 both change to a fall following this tone, i.e.
Tone 2 → [51] / tone 2 ____
Tone 3 → [31] / tone 2 ____


Second syllable →

T 5, 7 H

L

Tone 1 H

Tone 2

h l Tone 3

l Tone 6

lhl Tone 4

lhl First Syllable ↓↓

O

utput sandhi tones below

Tone 3 (h)l H

/__T5

? ?

Tone 1 H

LH H

L

?

Tone 6 lhl

hl Tone 7 H

L

h H

h

h

h

Tone 4 lhl

n/a n/a

*HL

n/a

Tone 5 HL

hl

h

Tone 2 hl

#

# H

Tone 4 lhl H

/__T7 hl /__T5

hl h

LH

4

HL

Table 6.7. Possible feature assignment to the sandhi tones.

Notes on the feature assignment for the sandhi tones:

i. [44] is [+upper, H], exactly the same as the citation tone 1.
ii. All sandhi tones [33] have been specified as [–upper, H], as has the sandhi tone [23]. Yip (2002) notes that mid level tones could equally be [+upper, L] or [–upper, H].
iii. The features [–upper, HL] have been given to tones with values of either [31] or [21].
iv. The tones with pitch values of [34] and [53] have been specified as [+upper, LH] and [+upper, HL], respectively.
v. In the bottom left of the table, it can be seen that tone 4 will become [+upper, H] (= [44]) before a tone 5.
vi. Recall the individual exceptions, indicated on the table with S1 and S3:
    a. S1: T1 + T2 = H + hl ⟶ LH hl
    b. S3: T7 + T4 = HL + lhl ⟶ h lhl
    c. S3: T4 + T6 = lhl + lhl ⟶ H lhl only; no second option of HL lhl (*HL)

Next I present the rules needed to derive the surface forms from these underlying representations.

6.4.2 Rules and derivations

In this section I propose rules to derive surface forms. These rules are organized into two parts: first the rules for deriving the correct Tone features, then those for deriving the correct Register. I propose only five rules to account for most of the Tone feature derivations, as well as using the phonetic rules mentioned in section 6.2.2 for small differences in surface forms, like the offset of the fall, [42, 43]. The five rules are:

Tone Rule 1: Tone deletion.
All features on the tone are deleted before a tone with the upper Register feature.

[±upper] (= TRN, the tonal root node) → Ø / _______ [+upper]

Tone Rule 2: Sandhi H docking.
The floating H on tone 3 attaches to the TRN.

Tone Rule 3: Final L deletion.
If the rightmost branch of a TRN is an L ([–raised]), delete it when in non-final (sandhi) position.

TRN

but

TRN

H

L → Ø

L

Tone Rule 4: Complex tone simplification.
Should a tone have two TRNs, delete the leftmost TRN when in non-final position.

σ

Ø ←

° °

L H

L

Tone Rule 5: Tone association.
Left-to-right association of melody: associate the tone melody feature from the second syllable to the first.

σ

σ

X

Y

One very general exception is that rule 5 is never applied to tone 2, that is, it never associates with the following syllable. There is also the application of the phonetic rule, as presented above (most recently below table 6.6), to account for small surface differences in the offset point of the falling sandhi tone [43, 42].
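The regular application of the five Tone rules can be sketched in a few lines of Python. This is only an illustrative reading, not part of the analysis itself: the names `UNDERLYING`, `UPPER_REGISTER`, `FLOATING_H` and `sandhi_tone_features` are hypothetical, each tonal root node (TRN) is modelled as a plain list of Tone features, the floating H of tone 3 is assumed to dock to the left of its L, and Rule 5 is assumed to require a single TRN and to be vacuous when the spreading feature is already present. The exceptional combinations (for instance tone 2 before tone 1) are deliberately not modelled.

```python
# Hypothetical encoding of the underlying Tone features of Table 6.6.
# Each tone is a list of tonal root nodes (TRNs); each TRN is a list of
# Tone features. Tones 4 and 6 have two TRNs.
UNDERLYING = {
    1: [["H"]],
    2: [["H", "L"]],
    3: [["L"]],
    4: [["L"], ["H", "L"]],
    5: [["H", "L"]],
    6: [["L"], ["H", "L"]],
    7: [["H", "L"]],
}
UPPER_REGISTER = {1, 5, 7}  # citation tones that are [+upper]
FLOATING_H = {3}            # tone 3 is (h)l: an L plus a floating H


def sandhi_tone_features(first, second, simplify=True, associate=True):
    """Return the Tone features of the first (sandhi) syllable.

    simplify=False skips Tone Rule 4 and associate=False skips Tone Rule 5;
    this is how the speaker variants are modelled in this sketch.
    """
    trns = [list(trn) for trn in UNDERLYING[first]]

    if second in UPPER_REGISTER:
        # Tone Rule 1: delete all Tone features before a [+upper] tone.
        trns = [[]]
    elif first in FLOATING_H:
        # Tone Rule 2: the floating H docks onto the TRN
        # (assumption: to the left of the L, giving HL).
        trns[0].insert(0, "H")

    # Tone Rule 3: delete a final L from any branching TRN.
    for trn in trns:
        if len(trn) > 1 and trn[-1] == "L":
            trn.pop()

    # Tone Rule 4: with two TRNs, delete the leftmost one.
    if simplify and len(trns) == 2:
        trns = trns[1:]

    # Tone Rule 5: associate the first Tone feature of the second syllable
    # to the sandhi tone. Never applies when tone 2 is the input.
    if associate and first != 2 and len(trns) == 1:
        spreading = UNDERLYING[second][0][0]
        if not trns[0] or trns[0][-1] != spreading:
            trns[0].append(spreading)

    return "".join(feature for trn in trns for feature in trn)
```

Under these assumptions, `sandhi_tone_features(3, 1)` gives `"H"` and `sandhi_tone_features(4, 3)` gives `"HL"`, while `sandhi_tone_features(4, 3, simplify=False)` gives the rising variant `"LH"`. Register is handled separately and is not part of this function.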

While these rules will correctly derive the surface Tone features, the situation is not as simple for the Register feature. This is because there are necessarily both upper and lower register features in the output sandhi tones, given that there are forms such as [53] and [21]. However, which feature will occur on the sandhi tone cannot be predicted simply by taking the feature from either syllable, as there are cases where the feature may be that of the first syllable, the second, both or neither: T2 + T1 has the [–upper] feature of the first syllable for the sandhi tone; T5 + T6 has the [–upper] feature of the second syllable for the sandhi tone; T6 + T4 both have the feature [–upper], as does the sandhi tone; and T3 + T2 both have the [–upper] feature underlyingly, but the sandhi tone surfaces with the [+upper] feature. The major natural classes of context tones for sandhi are listed first, because as context they determine the output value of the Register feature of the sandhi tone:

All sandhi tones → [+upper] before a tone 1
All sandhi tones → [–upper] before tones 5 and 7

However, for the remaining changes of values for the Register feature on the sandhi tone, generalities are most easily seen in terms of the input tones:

Tone 1 → [+upper]
Tone 2 → [+upper]
Tone 3 → [+upper]
Tone 4 → [+upper], except: T4 + T2 → [–upper]
Tone 5 → [–upper]
Tone 6 → [+upper], except: T6 + T4 → [–upper]
Tone 7 → [–upper], except: T7 + T3 → [+upper]

The generality seems to be for the tone to be in the upper register, but this does not apply across the board. This is a crucial difference between Yip’s data set and the one presented here: with these data it is not possible to claim that all tones in sandhi position have the Register feature [+upper]. However, I will generalize the above rules to the following.

For [+upper] tones, there are clear generalizations:
• The sandhi tone in an expression containing a tone 1 as either input or context is [+upper].
• The sandhi tone in an expression containing tone 5 or tone 7 as either input or context is [–upper]. Exception: tone 7 + tone 3 yields a [+upper] sandhi tone.

The resulting Register values for the sandhi tones for all tones as input tones are listed below.

Register Rule 1: All tones → [+upper]

but, as mentioned above,

Register Rule 2: Before Tones 5, 7 → [–upper]

Three exceptions to these rules must be noted:

Register Rule 3: Tone 4 + Tone 2 → [–upper]
Register Rule 4: Tone 6 + Tone 4 → [–upper]
Register Rule 5: Tone 7 + Tone 3 → [+upper]
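One way to read the Register rules as a decision procedure is sketched below. The function name `sandhi_register` and the ordering of the checks are my own assumptions rather than anything stated in the rules: the three listed exceptions are consulted first, a following tone 1 is then taken to force [+upper], tones 5 and 7 (as context or as input, as in the sample derivations) force [–upper], and everything else falls under Register Rule 1. The sketch does not cover the tone 2 + tone 1 case, where the sandhi tone keeps the [–upper] feature of the first syllable.

```python
def sandhi_register(first, second):
    """Register of the sandhi (first-syllable) tone for citation tones 1-7.

    The ordering of the checks is an assumption made for illustration.
    """
    # Register Rules 3-5: combination-specific exceptions.
    exceptions = {(4, 2): "-upper", (6, 4): "-upper", (7, 3): "+upper"}
    if (first, second) in exceptions:
        return exceptions[(first, second)]

    # All sandhi tones are [+upper] before a tone 1.
    if second == 1:
        return "+upper"

    # Register Rule 2: tones 5 and 7, as context or as input, give [-upper].
    if second in (5, 7) or first in (5, 7):
        return "-upper"

    # Register Rule 1: all remaining sandhi tones are [+upper].
    return "+upper"
```

For example, `sandhi_register(3, 1)` returns `"+upper"` and `sandhi_register(5, 2)` returns `"-upper"`, in line with the Register steps of the corresponding sample derivations.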

I illustrate the application of the relevant rules by giving six sample derivations before presenting the problems involved with this analysis.

1. Tone 3 + Tone 1 (退婚

[tuì hūn] ‘annulment’)

[21] + [44] ⟶ [44 44]

Rule 1: Tone deletion

σ

σ

σ

σ

[–upper] [+upper]

[+upper]

L

H

H

Rule 5: Tone association

σ

σ

H

Register Rule 1: all tones→[+upper]

Surface form: /H H/ [44 44]

σ

σ

H

H

Note: The exception is the combination of tone 2 and tone 1, where the resulting sandhi form of tone 2 is [21], not [44]. I propose that this is due to tone 2 failing to lose its features.


2. Tone 4 + Tone 3 (答應

[dā yìng] ‘promise’)

[23] + [21] ⟶ [42 21]

[34 21]

Variant 1

σ

σ

L

H L

L

Rule 3: Final L deletion

σ

σ

L

H L →

Ø

L

Rule 4: Complex tone simplification

σ

σ

Ø ←

L H

L

Rule 5: Tone association

σ

σ

H

L

Register Rule 1: all tones→[+upper]

Surface form: /HL l/ [42 21]

σ

σ

H

L

L


Variant 2

σ

σ

L

H L

L

Rule 3: Final L deletion

σ

σ

L

H L →

Ø

L

Rule 4: Complex tone simplification – does not apply here.

Rule 5: Tone association – n/a due to nonapplication of Rule 4

Register Rule 1: all tones→[+upper]

Surface form: /LH l/ [34 21]

σ

σ

L H

L


3. Tone 4 + Tone 6 (不憤

‘careless [Fuzhou]’) [23] + [231] ⟶

[42 231]

[42 231]

Variant 1

σ

σ

L

H L

L

H L

Rule 3: Final L deletion

σ

σ

L

H L →

Ø

L H

L

Rule 4: Complex tone simplification

σ

σ

Ø ←

L H

L H

L

Rule 5: Tone association

σ

σ

H

L H

L

Register Rule 1: all tones→[+upper]

Surface form: /HL lhl/ [42 231]

σ

σ

H

L

L

H L


Variant 2

σ

σ

L

H L

L

H L

Rule 3: Final L deletion

σ

σ

L

H L →

Ø

L H

L

Rule 4: Complex tone simplification

σ

σ

Ø ←

L H

L H

L

Rule 5: Tone association – did not apply

Register Rule 1: all tones→[+upper]

Surface form: /HL lhl/ [42 231]

σ

σ

H

L

L

H L

Note: S3 (LY)’s failure to produce the variant with the falling pitched sandhi tone suggests that she has generalized the absence of Rule 5 in this combination to all instances of tone 4 plus tone 6.


4. Tone 5 + Tone 2 (蘋果

píng guŏ ‘apple’) [52] + [32] ⟶

[33 32]

σ

σ

H

L

H L

Rule 3: Final L deletion

σ

σ

H

L → Ø

H L

Rule 5: Tone association

σ

σ

H

H L

Register Rule 2: tones 5, 7→[−upper]

Surface form: /h hl/

[33 32]

σ

σ

H

H L


5. Tone 7 + Tone 4 (物質 [wù zhì] ‘material’)

[5] + [23] ⟶ [42 23]

σ

σ

H

L

L

H L

Rule 3: Final L deletion

σ

σ

H

L →

Ø

L H

L

Rule 5: Tone association

σ

σ

H

L

H L

Register Rule 2: tones 5, 7→[−upper]

Surface form: /hl lhl/

[42 23]

σ

σ

H

L

L

H L

Note: LY again failed to apply rule 5, thus she has a surface form of /h lhl/.


6. Tone 2 + Tone 3 (解放

[jiě fang] ‘emancipate’)

[32] + [21] ⟶ [44 21]

σ

σ

H

L

L

Rule 3: Final L deletion

σ

σ

H

L → Ø

L

Rule 5 does not apply when tone 2 is the input tone.

Register Rule 1: all tones→[+upper]

Surface form: /H hl/ [44 21]

σ

σ

H

L

Above I have given some sample derivations to illustrate the application of the rules. I shall now indicate which combinations cannot be correctly derived given this set of rules.

6.4.3 Explaining the variation

There are a few cases where the application of these rules deviates slightly. These are presented below, before discussing solutions to what seem to be problems.

Explanations of some of the sandhi forms:

i. The combination of tone 3 and tone 2, /(h)l + hl/, fails to undergo rule 3 (Final L deletion), resulting in HL.
ii. Tone 4 does not undergo rule 1 when preceding tones 1 and 5 (resulting sandhi form of hl). No other rules apply except the complex tone simplification. (There is another T4 + T5 that is derived following the rules as expected, with a sandhi form h.)
iii. Tone 4 + Tone 7 can result in the sandhi tone H or hl. hl is straightforwardly derived from the rules given. H can be explained as the spread of register with the tone feature in rule 5 (Tone association).
iv. The combination of tone 5 and tone 4 produces a /h/, rather than an expected /hl/. This may be explained in terms of rule 5 (Tone association) failing to apply. The combination of tone 7 and tone 6 also fails to undergo rule 5.
v. Tone 6, when preceding tone 2, fails to undergo rule 3 (Final L deletion), thus barring the application of rule 5, due to the tonal root node already having two features attached to it, resulting in HL.
vi. Tone 7 + Tone 4 is produced as H or HL before lhl. The H is produced straightforwardly from the rules. The HL results from not applying rule 3 (Final L deletion).
vii. Tone 1 + Tone 2 is the only really problematic case. H + hl results in the sandhi form HL, but LH for speaker 1. Both forms differ from the expected H sandhi form. This could be through associating the tonal node (before register) for the HL, thus copying both tonal features, and undergoing metathesis for S1.

Problems:

The forms on the second syllable in the combinations of tone 2 with either tone 2 or tone 3 appear to be a problem. Recall that these are the combinations in which the second syllable does not keep its citation form: tone 2 becomes [51], and tone 3 becomes [31]. I suggest that in the combination of tone 2 plus tone 2, after normal rule application, the Register value of the sandhi tone spreads onto the second syllable, thus changing the citation tone to [+upper] register as well, giving /H HL/. As for tone 2 plus tone 3, whose output is [44 31], I suggest that the normal rules apply, and that the rise from [21] to [31] on the second-syllable tone 3 is phonetically conditioned. Alternatively, it could be suggested that the floating ‘H’ on tone 3, which normally docks in non-final position, underwent the docking rule in final position.

6.5 Fuzhou tonology: Categorical approach

It has been shown (e.g. Ballard 1980) for the Wu dialect group of Chinese, just north of Min, that many of the alternations in tone sandhi are not phonetically motivated and that ordered rules only serve to “complicate the description of the synchronic phenomena and obscure the diachronic relationships among the grammars of these dialects” (Ballard 1980:83). Ballard’s solution to this, which I have called the ‘categorical approach’, involves expressing the sandhi changes in terms of abstract diachronic categories.

As was mentioned in chapter 1, by the sixth century AD, Chinese already “exhibited four classes of morphemes (monosyllables) distinguished solely by tone and/or syllable final consonants: ping ‘level’, shang ‘rising’, qu ‘departing’, and ru ‘entering’” (Ballard 1980:84). The first three of these groups end in sonorant segments, and the fourth group ends in a voiceless stop. A split then occurred, whereby each tone developed two allotones, one higher (yin) and one lower (yang), the environment being determined by a voiceless or a voiced initial consonant respectively. Since then, many dialects have lost the initial voicing distinction and the final stops, thus giving rise to tonal splits. In addition, many mergers have subsequently occurred. Nonetheless, it has been found that tone sandhi operates in terms of these diachronic categories, which define natural classes as either input or context for sandhi, despite the phonetic dissimilarity of their values in citation. An example of this is taken from the Wu dialect of Wenzhou, which has two identifiable natural classes in disyllabic lexical tone sandhi: 1. {high fall, low level} and 2. {mid fall, mid level}. While phonetically these do not appear to be readily identifiable natural classes, in terms of diachronic categories they are 1. the Qu and 2. the Ping tones (Rose 2004). Specifically, Rose (2004) showed that the Ping tones form a natural class on the second syllable of disyllabic words as conditioning environments for first syllable changes, and that the Qu tones undergo the same changes on the second syllable. The Ping tones comprise Ia and Ib, and the Qu tones IIIa and IIIb; the phonetic forms of the tones in Wenzhou are given in table 6.8.

        Ping        Shang        Qu            Ru
Yin     Ia [33]     IIa [34]     IIIa [51]     IVa [3312]
Yang    Ib [331]    IIb [114]    IIIb [222]    IVb [2212]

Table 6.8. Middle Chinese tone categories / Wenzhou tonal contours

(1) Natural class for conditioning the form on the first syllable (Rose 2004:238)

Ia + Ia      feɪ tsz̩ʔ       [32 33]    ‘aeroplane’
Ib + Ia      nĩ tɕʰɐŋʔ      [32 33]    ‘young’
Ia + Ib      tʰi dɔ         [21 11]    ‘paradise’
Ib + Ib      beɪ dʑaʊ       [21 11]    ‘ball’

(2) Natural class undergoing same changes on the second syllable (Rose 2004:240)

Ia + IIIa    ˈtsʰɐŋ tsʰěʔ   [22 4]     ‘vegetable’
Ib + IIIa    ˈbeɪ tsʰěʔ     [11 4]     ‘temper’
Ia + IIIb    ˈsa dʊ̌ŋʔ       [22 4]     ‘cave’
Ib + IIIb    ˈdʑjo dø̌ʔ      [11 34]    ‘silks and satins’

The data show that the phonetically implausible grouping of [51] and [222] forms a clear natural class in (2), and that the mid level and mid fall group together, to the exclusion of the other mid tones [34] and [3312], as a natural class for determining sandhi tones.


In his work on the Wu dialect of Shaoxing, Ballard (1980:110) captures some key (and phonetically opaque) phonological behavior through the use of categories. Ballard (1980:143) takes this to be an indication of the psychological reality of tonal categories in the mind of the native speaker. He proposes writing the rules first in terms of categories, not phonetic values. This is reinforced by the fact that the sandhi does not make phonetic sense, although conversely, it can be seen that some of the simple categorical shifts have been “perturbed by phonetic shifts towards phonetic naturalness and plausibility” (1980:144). Ballard thus proposes that tone sandhi phenomena, for Wu dialects at least, are most elegantly described in terms of two types of phenomena: categorical shifts and realization rules, a finding corroborated by Rose (2004) for Wenzhou.

I apply this approach to the Fuzhou data, to see to what extent, if at all, Fuzhou tone sandhi operates in terms of categories, and whether a simpler, more elegant account of the tone sandhi can be achieved, as it can for Wu. This follows in the next section.

6.5.1 Categorically Fuzhou

This section presents an analysis of Fuzhou tone sandhi using the categories as input and context for the rules. I will reproduce the sandhi table, but this time using the categories. First, though, I will show to which categories the different tones belong.

        Ping           Shang          Qu              Ru
Yin     Tone 1 [44]    Tone 2 [32]    Tone 3 [21]     Tone 4 [23ʔ]
Yang    Tone 5 [51]                   Tone 6 [231]    Tone 7 [5ʔ]

Table 6.9. Fuzhou tones and traditional MC categories

Note that the Yangshang tone did split: the Yangshang morphemes with obstruent initials merged tonally with Yangqu, and the Yangshang morphemes with initial sonorants merged with Yinshang (Norman 1988:240). The sandhi table can now be illustrated thus:

Second syllable →     Ping            Shang      Qu                 Ru
                      Yin      Yang              Yin        Yang    Yin    Yang
First syllable ↓      Resulting sandhi tones (on first syllable) given below
Ping    Yin           44       33     43         42         42      ?      3
        Yang          44       33     33         21         21      33     3
Shang                 21       33     44 #       44/33 #    44      33     3/2
Qu      Yin           44       44     43         42         ?       ?      3
        Yang          44       33     43         42         42      31     3
Ru      Yin           44/21    33     23         43/34      42/44   53     3/4
        Yang          44       33     3          42         33      31     3

Table 6.10. Fuzhou tone sandhi organized in terms of MC tonal categories.

# The second syllable tones 2 and 3 both change to a fall following this tone.
Shang → Yang Ping / Shang ____
Yin Qu → [31] / Shang ____

The tone sandhi rules are expressed below in a list. They are not ordered rules. I have used the categories for both the inputs and the conditioning contexts, but I have put the output sandhi tones in rather broad terms such as ‘level’ and ‘fall’, which are further specified at the end of the rules. I will then speculate on the possible tone category shifts, that is, on which of the citation tones the output sandhi tones may be related to.

First syllable changes:
1 Ping → level / _____ Ping, Shang, Ru
         fall / _____ Qu
2 Shang → fall / _____ Yin Ping
          level / _____ Yang Ping, Shang, Qu, Ru
3 Qu → level / _____ Ping, Yang Ru
       fall / _____ Shang, Qu, Yin Ru
4 Ru → level / _____ Ping, Yang Ru
       fall / _____ Qu, Yin Ru
5 Yin Ru → rise / _____ Shang
6 Yang Ru → level / _____ Shang


Second syllable changes:
7 Shang → fall / Shang _____
8 Yin Qu → fall / Shang _____
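Because the first-syllable rules 1 to 6 are unordered statements over categories, they can be read as a simple lookup. The sketch below is one hypothetical encoding: the function name `first_syllable_contour` and the representation of a tone as a (category, register) pair are assumptions made for illustration, and only the broad contour labels ‘level’, ‘fall’ and ‘rise’ are returned, not pitch values.

```python
def first_syllable_contour(first, second):
    """Broad contour of the first-syllable sandhi tone.

    Tones are (category, register) pairs such as ("Ping", "Yin").
    Shang has no Yin/Yang split here, so its register is None.
    """
    category, register = first
    context_category = second[0]

    if category == "Ping":                       # rule 1
        return "fall" if context_category == "Qu" else "level"
    if category == "Shang":                      # rule 2
        return "fall" if second == ("Ping", "Yin") else "level"
    if category == "Ru" and context_category == "Shang":
        # rules 5 and 6: Yin Ru rises, Yang Ru is level before Shang
        return "rise" if register == "Yin" else "level"
    # rules 3 and 4: Qu and Ru share the remaining contexts
    if context_category == "Ping" or second == ("Ru", "Yang"):
        return "level"
    return "fall"
```

For instance, `first_syllable_contour(("Ping", "Yang"), ("Qu", "Yin"))` returns `"fall"`, and `first_syllable_contour(("Ru", "Yin"), ("Shang", None))` returns `"rise"`. The second-syllable changes (rules 7 and 8) are not covered by this function.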

To derive the correct place in the pitch range for the different sandhi tones, the following rules

must apply.

First syllable:
1. When Ping is the context, its Yin/Yang distinction determines the relative pitch height, except for the Yin Qu, which is always high.
2. When Qu is the context, it is the Yin/Yang distinction of the input tone which determines the relative pitch height for the Ping tones.
3. When Ru is the context, the only difference in pitch height realization is when they follow the other Ru tones, in which case it is the Yin/Yang distinction of the input tones which determines the relative pitch height.
4. When Shang and Qu are the context for the input Shang tone, the values are relatively higher than when Ping and Ru are the context. As a context for Shang, Yin Ping results in the lowest value of the output sandhi tone.

Second syllable:
1. When Shang provides the context for the second syllable to change, the Shang will have a much higher sandhi tone than the Qu.

These are the rules that can produce the different sandhi tones, as seen in table 6.10. Next I will speculate on the possible citation tones which could replace the very general terms ‘level’ and ‘fall’.

That Yin Ping is substituted for the higher level tones is unproblematic. The lower level tone [33] could arguably be identified with the mid tone, Shang. The falls ([42, 43]) must be equated with the Yang Ping tone, both on the basis of its phonetic similarity and because it is the only falling tone in the citation tone system. The rise [23] should probably be likened to the Yin Ru tone, as this is the only rising tone; and since this is the sandhi form of the Yin Ru, it is very likely that the tone did not change at all, other than losing its final stop. I present these rules rewritten in terms of categorical shifts below.

First syllable changes:
1 All tones → Yin Ping / _____ Yin Ping
              Shang / _____ Yang Ping, Yang Ru
2 Yang Ping → Yin Qu / _____ Qu
              Shang / _____ Shang, Ru
3 Shang → Shang / _____ Yin Qu, Ru
          Yin Ping / _____ Yang Qu, Shang
4 Yin Ping, Qu, Yin Ru → Yang Ping / _____ Qu, Shang
5 Ru → Yang Ping / _____ Yin Qu
6 Yang Ru → Shang / _____ Shang, Yang Qu
            Yin Qu / _____ Yin Ru
7 Yin Ru → Yang Ping / _____ Yin Ru
           Yin Ru / _____ Shang

Second syllable changes:
8 Shang → Yang Ping / Shang _____

The realization of these tones can be read from the sandhi table; just a few more specific realization rules are as follows:

Yang Ping → [51] / X _____
            [42] / _____ X
[2] → [3] / 4 __# 3 X
Yin Qu → [31] / [44] _____

While this generates the correct forms, it is not otherwise very revealing. In the following section I compare the two analyses of Fuzhou tonology to see which of the two approaches better accounts for the data.


6.6 A comparison of the approaches

It is hard to compare these two approaches as they are so different in their aims and consequently in their method. However, I shall assess them on what is a primary criterion for a phonological analysis, namely simplicity. Natural classes as they are conceived of by autosegmentalists are of great importance to the phonology. However, ‘natural class’ has a different significance in a categorical analysis, as the natural classes are presumed to correspond to the tonal categories, which form the basis of the analysis. Nonetheless, the natural classes that are in my data, and how well the phonologies capture these, will also be considered in the comparison.

One really positive aspect of the autosegmental approach is that it can capture the natural class consisting of tones 5 and 7, as they group together as context for the sandhi, and for Register features in sandhi. This is nicely accounted for in terms of their sharing the same features. The categorical approach cannot do this.

In terms of simplicity, neither is really good. Both the autosegmental analysis and the categorical analysis have a lot of rules. The former, however, captures more generalities and the major natural classes in the data. It does this by explaining the natural classes in terms of shared features and by having five rules for each of the tonal features, which apply to all of the underlying representations before they surface. This analysis can also explain the few between-speaker differences in terms of the application or non-application of these rules. The explanations of these differences are treated as further rules. The main drawback of this analysis is that the Register feature cannot be analyzed more neatly.

While not having any “further rules”, the categorical approach has a longer list of different rules. This approach, however, shows the neutralization clearly. For example, when all tones become [44] before another [44], this can be stated as all tones becoming a tone 1 before another tone 1. Whilst this is more appealing to our idea of what seems to be going on, the difference is in the level of abstraction from the observed data: the generative model posits changes to underlying forms to account for the surface forms, while the categorical model simply replaces tones.

It is certainly appealing to be able to generalize that all tones do something, as this is what we can easily see on the surface. In other circumstances (i.e. when working in a different framework which aims to represent the competence, not just the performance, of the speaker) it may be better to say that there are ‘X’ number of rules which are used to derive the sandhi tones, the generalization being that they all apply where they can. Which analysis is preferable, then, depends on the aims of the description and thus lies with the goals of the researcher (and thus the choice of the theory).

6.7 Summary and conclusion

In this chapter, I presented the data for the disyllabic tone sandhi in Fuzhou. It was found that, contrary to previous studies, the data obtained for the sandhi forms were very complicated, in that there are more natural classes, which are not, however, phonetically motivated. As all four speakers produced very nearly the same sandhi forms, we can be sure that this set of alternations is real, at least for these words. These data were then analyzed in terms of two different frameworks: a generative-autosegmental approach and a categorical approach. While both can be manipulated to capture the data, neither of these theoretical frameworks produced a very neat or concise account of the data.

The lack of phoneticity suggests that the sandhi is possibly not rule-governed. Work with native speakers testing whether or not the sandhi is actually rule-governed, perhaps along the lines of Hsieh (1970), would have to be done to determine this. This addresses a very important question, as it is now generally assumed that tone sandhi is a process. The categorical analysis is a more ‘static’ account. That is, it explains things in terms of substitution, not in terms of dynamic processes of change, be they with features or phonetic values. However, it could just be that there are surface phonetic constraints on tones in prepausal position in Fuzhou not at all related to the underlying forms.

Before any further work can be done on theories of Fuzhou tone sandhi, it is imperative that instrumental data be obtained. Such data could make initial grouping a lot easier. They could determine whether or not what has been deemed a 31 is actually significantly different from a 21, or a 22 from a 23, a 34 from a 44, and so on. Rose (1990) has demonstrated the usefulness of this approach. So, while the importance of instrumental phonetics in phonology is often seen to be in the assessment of competing theories (e.g. Ohala 1986; Ladefoged 1990; and other papers in Beckman & Kingston 1990), it can now be seen a fortiori that the tonologist’s observation data would preferably be instrumentally verified in the first place.

7 Summary

This thesis has investigated the phonetics and phonology of tones in Fuzhou, as they occur in citation and in disyllabic utterances. It was motivated by the fact that although there have been a number of analyses of Fuzhou phonology, no previous descriptions of the phonetics of Fuzhou tones existed.

For the citation tones, I quantified the acoustic dimensions of fundamental frequency, duration, and amplitude. From this I derived the acoustics of the tones of the variety of Fuzhou as a whole, as opposed to previous auditory descriptions, or descriptions of an individual speaker. These results showed significantly more variation than previously described, not just within the variety (across all speakers), but also consistent differences between speakers. There were also more sandhi forms than previously reported, and the actual phonetics of the data differed from those in previous studies, so that some of the previous feature assignments and phonological generalizations cannot be applied to these data (such as Tone 2 being [+upper] or all sandhi tones being [+upper]).

The relationship between Ar and F0 was also investigated to infer the relative importance of the physiological features of vocal cord tension and subglottal pressure in the production of tones. In addition to vocal cord tension, subglottal pressure was found to have a possible role in tone production in Fuzhou.

In chapter 6, I presented the auditory data for the tone sandhi of disyllabic expressions, and presented and discussed two possible analyses of Fuzhou phonology. The data were found to differ in several important respects from published data, with more complexities than any of the systems of Fuzhou tone sandhi previously described. There were more sandhi forms and more natural classes that could not be phonetically motivated. In fact, there was little phoneticity obvious in the data. While it would obviously be an advantage to have quantified data as the observation data for a phonological analysis, the results I obtained were consistently the same for all four speakers. This allowed us to infer that there are few or no between-speaker differences and that the data presented are really representative of the actual system of tone sandhi. This highlights the need for a multi-speaker approach, not just to unearth the variation in sandhi forms, but also to corroborate the data. It certainly suggests that the notion of an exceptionless sandhi system for a given variety is rather idealistic.

The present data may now be used for future research, be it a tonetic comparison with other tonal languages, input data for universals of tonology, or input data for tonological theories.


Suggestions for further study

There are many ideas for further study arising from the present investigations. I shall list a few of these.

Phonetics of the citation tones

Now that we have the normalized acoustics, the perception of the tones needs to be investigated to ascertain which acoustic cues are perceptually most important, including features other than pitch such as phonation type and changes in vowel quality. From the point of view of tonal production, it would be interesting to know what the physiological correlates are, but this will definitely entail a modification of the Monsen et al. model to incorporate the third dimension of glottal aperture due to the changing phonation types (cf. Chapter 5).

Phonetics of the disyllabic tone sandhi

Having seen the complex system of tone sandhi in Fuzhou, it is even more pressing to quantify the disyllabic expressions in the same way the citation tones were quantified. As noted earlier, Zhang writes that “the field needs carefully designed acoustic studies that systematically look at the realizations of tones in tone sandhi behavior” (Zhang 2009:11–12). Slight falls and slight rises would no longer be mere impressions, and any phoneticity, if present, would be more obvious. It is also a good start to determining whether the sandhi tones similar in form to the citation tones are in fact the ‘same’ tones, at least phonetically.

Psychological reality of tone sandhi

It must be determined whether or not the sandhi changes are rule-governed, and if so, whether or not they are generative rules. If they are not rule-governed, the question still remains of how to account for the changes. Is it purely substitution, or suppletion? That is, do the forms that are substituted for the citation tones already exist in the tonal inventory, or are they specific to that context (and thus individually learned)? There are many methods that could be explored to test the psychological reality of the tone sandhi: word games, neologisms, and observing natural speech errors and the incorporation of loan words. The acquisition of Fuzhou tone sandhi is certainly a topic that deserves future attention, and perhaps one that would shed light on the question of whether the sandhi is rule-governed or simply a set of learned stipulations.

Finally, I think it is important to examine the perceptual salience of these phonetic cues, to see if tonology can be better motivated by using these (e.g. Donohue 2012).


References

Anderson, Stephen R. 1978. Tone features. In Victoria Fromkin (ed.) Tone: A linguistic survey, pp. 133–175. New York: Academic Press.

Ballard, William L. 1980. On some aspects of Wu tone sandhi. Ajia Afurika Gengo Bunka Kenkyū (アジア・アフリカ言語文化研究) [Journal of Asian and African Studies] 19: 83–163.

Beckman, Mary E. and John Kingston (eds) 1990. Papers in Laboratory Phonology I: Between the grammar and physics of speech. Cambridge: Cambridge University Press.

Běijīng dàxué zhōngguóyǔyánwénxuéxì yǔyánxué jiàoyánshì (北京大學中國語言文學系語言學教研室) [Peking University Chinese Department]. 1962. Hànyǔ Fāngyīn Zìhuì (漢語方音字匯) [A Chinese dialect syllabary]. Běijīng: Wénzì gǎigé chūbǎn shè (北京:文字改革出版社).

Chambers, J. K. and Peter Trudgill. 1980. Dialectology. Cambridge: Cambridge University Press.

Chan, Lee Lee L. 1998. Fuzhou tone sandhi. PhD Dissertation, UCSD.

Chan, Marjorie K. M. 1985. Fuzhou Phonology: A non-linear analysis of tone and stress. PhD Dissertation, University of Washington.

Chao, Yuen-Ren. 1930. A system of tone letters. Le Maître Phonétique 45: 24–27.

Chao, Yuen-Ren. 1934. On the non-uniqueness of phonemic solutions of phonetic systems. Bulletin of the Institute of History and Philology, Academia Sinica 4: 363–97. Reproduced in Martin Joos (ed.) 1966. Readings in Linguistics I, pp. 38–55. Chicago: University of Chicago Press.

Chen, Leo and Jerry Norman. 1965. An introduction to the Foochow dialect. San Francisco State College in cooperation with the US Office of Education.

Chen, Matthew. 2000. Tone sandhi: patterns across Chinese dialects. Cambridge Studies in Linguistics 92. Cambridge: Cambridge University Press.

Coster, D.C. and Paul Kratochvil. 1984. Tone and stress discrimination in normal Beijing dialect speech. In Beverly Hong (ed.), New papers on Chinese language use, 119–132. Contemporary China Centre, ANU.

Disner, Sandra Ferrari. 1980. Evaluation of vowel normalisation procedures. Journal of the Acoustical Society of America 67 (1): 253–261.

Donohue, Cathryn. 1991. Fuzhou tones. ms, Australian National University.

Donohue, Cathryn. 1992a. The phonetics and phonology of Fuzhou tones. Honours thesis, Australian National University.