Cathryn Donohue
Fuzhou tonal acoustics and tonology
LINCOM Studies in Asian Linguistics 82
ISBN 978 386288 522 0
LINCOM EUROPA academic publications
The Fuzhou variety of Chinese belongs to the Min dialect group and is spoken in the capital of Fujian province. It is known for its complex tonal system, 'alternating' vowels, and complicated right-dominant tone sandhi. However, previous descriptions have typically been based on auditory impressions of a single speaker. This study presents the first multi-speaker acoustic quantification of the citation tones in Fuzhou. Using two male and two female speakers, mean fundamental frequency and duration data for the citation tones are presented and discussed before the data are normalized across speakers to factor out any between-speaker variation. The physiology of tone production in Fuzhou is explored through amplitude measurements, indirectly assessing the possible role of vocal cord tension (VCT) and subglottal pressure (Ps) through application of the model presented in Monsen et al. (1978), which extends the Ishizaka-Flanagan two-mass model of vocal-fold vibration. In this study, both VCT and Ps were found to be equally important for tonal production. The tonal phonology of Fuzhou is also examined. First, two major studies are reviewed (Chan 1985 and Yip 1990) before new data for the disyllabic tone sandhi are presented. Analyses of these data using two different models (Autosegmental Phonology and an approach using traditional Chinese tonal categories) are then explored and compared. All the data from the study are presented in the appendices.
FUZHOU TONAL ACOUSTICS AND TONOLOGY
Cathryn Donohue
The University of Hong Kong
For: LINCOM STUDIES IN ASIAN LINGUISTICS 82
Copyright information etc.
You have always lit the way for me with your bright smile and warm hugs,
your golden eyes sparkling with wit and humour. You have carried me along with your unfailing support, unconditional love,
your encouragement and optimism. Your gentle shepherding and voice of reason has always been
the beacon I needed on a foggy night; and our shared laughter and friendship, the sunshine in my day.
You have always believed in me, when I wasn’t even sure that I did.
This is not just for you, this is because of you. It is as much yours as it is mine.
Forever, my light, my love, my inspiration – My Mum.
Contents
List of figures
List of tables
Foreword
Preface
Acknowledgements
Chapter 1. Introduction
1.1 Fuzhou as a Chinese dialect
1.2 Characteristics of the Min dialect group
1.3 Peculiarities of Fuzhou
1.4 Previous studies of Fuzhou
1.5 Summary
Chapter 2. Previous analyses of Fuzhou tonology
2.1 An overview of Fuzhou tones
2.2 Chan’s analysis of Fuzhou tonology
2.3 Yip’s analysis of Fuzhou tonology
2.4 Two analyses compared
Chapter 3. Acoustic quantification of the citation tones
3.1 Consultants
3.2 The corpus and elicitation
3.3 Acoustic instrumentation and mensural procedure
3.4 Results
3.5 Summary
Chapter 4. Acoustic characteristics of the citation tones
4.1 Method of data interpretation
4.2 Auditory characteristics
4.3 Preliminary considerations
4.4 Individual speaker results
4.5 A comparison of the individual speaker’s results
4.6 Normalization
4.7 Distinctive features for Fuzhou tones
4.8 Summary
Chapter 5. The physiology of tone production in Fuzhou
5.1 Why investigate amplitude?
5.2 What is amplitude?
5.3 Received theories of F0 production
5.4 Ar and F0 relationship
5.5 VCT and Ps relationship
5.6 Ar/F0 vs. VCT/Ps relationships
5.7 Discussion
5.8 Summary
Chapter 6. Fuzhou disyllabic tone sandhi
6.1 Procedure
6.2 Results
6.3 Some views on tonological representation
6.4 Fuzhou tonology: Autosegmental
6.5 Fuzhou tonology: Categorical approach
6.6 A comparison of the approaches
6.7 Summary and conclusion
Chapter 7. Summary
References
Appendix A: Corpus: citation tones
Appendix B: Corpus: disyllabic expressions
Appendix C: Raw F0 and duration measurements
Appendix D: Mean F0 and duration measurements
Appendix E: Normalized F0 values
Appendix F: Raw amplitude measurements
Appendix G: Mean amplitude and duration
Appendix H: VCT/Ps and F0/Ar measurements
Appendix I: VCT/Ps and F0/Ar plots
List of figures
Figure 1.1. Geographical distribution of Chinese dialect groups.
Figure 1.2. A possible subgrouping of the Chinese dialect groups.
Figure 2.1. Chen & Norman: citation tones.
Figure 2.2. A list of near-minimal pairs in Fuzhou.
Figure 2.3. Chan: citation tones and tonemes.
Figure 2.4. The interaction of Register and Tone.
Figure 2.5. Yip: citation tones.
Figure 2.6. Yip: citation tones – feature assignment
Figure 3.1. Stimuli phonotactics
Figure 3.2. Narrow and wideband spectrograms of [tu] uttered by ZPW.
Figure 3.3. Spectrogram of tone 3 [pa] spoken by ZPW with non-modal phonation.
Figure 3.4. Average amplitude spectrogram on tone 1 [tu] spoken by ZPW.
Figure 4.1. WXQ: Citation tones [0–100% duration].
Figure 4.2. FM: Citation tones [0–100% duration].
Figure 4.3. LY: Citation tones [0–100% duration].
Figure 4.4. ZPW: Citation tones [0–100% duration].
Figure 4.5. WXQ mean F0 contours from 5 csec. to 95% duration.
Figure 4.6. FM mean F0 contours from 5 csec. to 95% duration.
Figure 4.7. LY mean F0 contours from 5 csec. to 95% duration.
Figure 4.8. ZPW mean F0 contours from 5 csec. to 95% duration.
Figure 4.9. Normalized tone 1
Figure 4.10. Normalized tone 2
Figure 4.11. Normalized tone 3
Figure 4.12. Normalized tone 4
Figure 4.13. Normalized tone 5
Figure 4.14. Normalized tone 6
Figure 4.15. Normalized tone 7
Figure 4.16. Mean normalized tone 1
Figure 4.17. Mean normalized tone 2
Figure 4.18. Mean normalized tone 3
Figure 4.19. Mean normalized tone 4
Figure 4.20. Mean normalized tone 5
Figure 4.21. Mean normalized tone 6
Figure 4.22. Mean normalized tone 7
Figure 4.23. Mean normalized tones in Fuzhou plotted against mean duration.
Figure 5.1. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of male dimensions.
Figure 5.2. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of female dimensions.
Figure 5.5. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 2 (FM).
Figure 5.6. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 3 (LY).
Figure 5.7. VCT and Ps plotted against equalized duration for tones 1–3, speaker 2 (FM).
Figure 5.8. VCT and Ps plotted against equalized duration for tones 1–3, speaker 3 (LY)
List of tables
Table 1.1. Min reflexes of MC ‘d’
Table 1.2. Fuzhou’s seven tones.
Table 1.3. Alternating vowel pairs in Fuzhou
Table 1.4. Previous descriptions of Fuzhou tones.
Table 2.1. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
Table 2.2. Fuzhou vowel alternations (Chen & Norman 1965)
Table 2.3. Fuzhou disyllabic tone sandhi forms (Chan 1985)
Table 2.4. Yip: Vowel alternations
Table 2.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
Table 4.1. Auditory description of the tones in Fuzhou.
Table 4.2. Phonological feature assignment for Fuzhou.
Table 6.1. Fuzhou tone sandhi forms.
Table 6.2. First syllable changes in the disyllabic tone sandhi.
Table 6.3. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
Table 6.4. Fuzhou disyllabic tone sandhi forms (Chan 1985)
Table 6.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
Table 6.6. Fuzhou tonal feature assignment
Table 6.7. Possible feature assignment to the sandhi tones.
Table 6.8. Middle Chinese tone categories/ Wenzhou tonal contours
Table 6.9. Fuzhou tones and traditional MC categories
Table 6.10. Fuzhou tone sandhi organized in terms of MC tonal categories.
1 Introduction
The aim of this book is to investigate the phonetics and phonology of tones in the Fuzhou variety of Chinese as they occur in citation and in disyllabic utterances. I present data to quantify the acoustic dimensions of fundamental frequency, duration and amplitude, and the physiological correlates of vocal cord tension and subglottal pressure for the citation tones. On the basis of the data obtained, I give an auditory account of the tone sandhi of disyllabic expressions and present the phonology of Fuzhou. The book is divided into two main parts: the first two chapters summarize relevant previous studies, while the remaining chapters present the results of my own investigations and data collection.

The remainder of chapter 1 introduces Fuzhou and its peculiarities. Chapter 2 gives an overview of Fuzhou tones based on previous work, and then presents two of the most detailed proposals for accounting for the tonological system. Chapter 3 describes the corpus used in the elicitation sessions for Fuzhou citation tones and my general methodology. Chapter 4 gives the fundamental frequency results, and chapter 5 examines the amplitude to see whether it is possible to infer something about the way the tones are produced. Chapter 6 presents the phonetics and phonology of the disyllabic expressions, and finally chapter 7 concludes with a short summary.
1.1 Fuzhou as a Chinese dialect
Fuzhou (Fúzhōuhuà 福州話) is an Eastern Min (Mǐn Dōng 閩東) dialect of Chinese.¹,² This section provides an overview of Chinese and its subgroups in order to situate Fuzhou appropriately as a Min dialect.
According to Norman, Chinese consists of seven different dialect groups (1988:181). Each of these dialect groups likely constitutes a dialect continuum like West Germanic (Chambers & Trudgill 1980), but they are all heteronomous with respect to Standard Chinese, and possibly to a local standard as well. Yuan (1962) groups them as follows:

1 Mandarin or Běifānghuà (北方話)
2 Wú (吳)
3 Xiāng (湘)
4 Gàn (贛)
5 Hakka (Kèjiā 客家)
6 Yuè (粵)
7 Mǐn (閩)

¹ I do not address the sociopolitical landscape here and use both dialect and variety throughout.
² Throughout this book I use pinyin and characters at the first introduction of a Chinese term; thereafter I use the pinyin without the diacritics.
There are certain subdivisions within these seven groups that are sometimes treated as separate groups. For example, Jìn (晉語) is sometimes grouped together with Mandarin (as Guān 官) and sometimes listed as a separate group; Huīzhōu (徽州話) has been argued to belong to Wu, Gan or (Jiānghuái 江淮) Mandarin; and Pínghuà (平話) is sometimes used as an alternative name for the Yue dialect group, and sometimes described as an independent group in the Yue branch, or just as a dialect of Yue. Moreover, Gan and Hakka are sometimes treated as independent languages under the same main branch of Chinese. Figure 1.1 locates these dialect groups in China.
Figure 1.1. Geographical distribution of Chinese dialect groups.³

³ From Wikipedia, http://en.wikipedia.org/wiki/Chinese_dialects; accessed 3/3/13.
The dialect groups are conventionally based on the historical development of the tones from Middle Chinese. Middle Chinese (MC) was reconstructed using the Qièyùn (切韻) dictionary. This rhyming and pronunciation dictionary was compiled in 601 CE by a small group of scholars led by the poet Lù Fǎyán (陸法言), whose aim was to provide a guide to the ‘proper’ recitations of literary texts. The Qieyun records the pronunciation of Chinese characters arranged by tone and rhyme and is said to be the origin of the fǎnqiè (反切) – a way to segment the syllables into onsets and rhymes in order to include pronunciations of words without homophones. About a century before the Qieyun, it was discovered that there were four tonal categories in Chinese, traditionally called píng (平 ‘level’), shǎng (上 ‘rising’), qù (去 ‘departing’) and rù (入 ‘entering’). The dictionary was thus arranged into five volumes, two for the ping tone and one for each of the other tones. Within each volume, characters were divided into rhymes, defined by the nuclear vowel and final consonant.
This original four-way tonal distinction has developed into much more complex systems for most varieties of Chinese. The tonal splits are usually conditioned by features of the initial consonants including voicing, glottality, aspiration and prenasalization. In order to account for most of the dialects, a three-way distinction of the initial types is sufficient – these are traditionally called qīng (清), quánzhuó (全濁) and cìzhuó (次濁), literally ‘clear’, ‘fully muddy’ and ‘partially muddy’ (Norman 1973:222). These are generally interpreted to refer to voiceless, voiced obstruent and voiced sonorant respectively. It is the diachronic development of the MC voiced stops into the modern dialects that is usually the sole classificatory criterion for subgrouping. For example, the following is said to characterize a Mandarin dialect (Norman 1988:191):
MC voiced obstruents all become devoiced: Voiceless aspirate in the Ping tones and voiceless non-aspirate in the other tones. The Ru tones have been redistributed over the other tones.
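The devoicing part of this criterion can be written as a small rule. The following is a minimal sketch only: the function name, the three-consonant inventory and the tone labels are illustrative assumptions, and the redistribution of the Ru tones is not modelled.

```python
# Illustrative subset of MC voiced obstruent initials, each mapped to its
# (voiceless unaspirated, voiceless aspirated) counterpart.
# The inventory and the names used here are assumptions for this sketch.
DEVOICED = {
    "b": ("p", "ph"),
    "d": ("t", "th"),
    "g": ("k", "kh"),
}


def mandarin_initial(mc_initial: str, mc_tone: str) -> str:
    """Return the Mandarin-type reflex of an MC voiced obstruent initial."""
    plain, aspirated = DEVOICED[mc_initial]
    # Aspirated in the ping tone, unaspirated in shang, qu and ru.
    return aspirated if mc_tone == "ping" else plain


print(mandarin_initial("d", "ping"))  # compare 頭 tóu 'head' (pinyin t is aspirated)
print(mandarin_initial("d", "qu"))    # compare 荳 dòu 'bean' (pinyin d is unaspirated)
```

The two calls correspond to the Mandarin (pinyin) forms of 頭 and 荳 listed in table 1.1, both of which have MC initial *d.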
A different approach to classification is adopted by Norman (1988:182), who bases the comparisons on synchronic data. He proposes ten features as further criteria, better illustrating, he claims, the internal relationships that hold between groups. These criteria are:
1. The third-person pronoun is tā 他 or cognate to it.
2. The subordinative particle is de (di) 的 or cognate to it.
3. The ordinary negative is bù 不 or cognate to it.
4. The gender marker for animals is prefixed, as in the word for ‘hen’ mǔjī 母雞 (literally ‘female chicken’).
5. There is a register distinction only in the ping tonal category.
6. Velars are palatalized before [i].
7. Zhàn 站 or words cognate to it are used for ‘to stand’.
8. Zǒu 走 or words cognate to it are used for ‘to walk’.
9. Érzi 兒子 or words cognate to it are used for ‘son’.
10. Fángzi 房子 or words cognate to it are used for ‘house’.
According to these criteria, the dialect groups fall into three larger groups, roughly corresponding to their geographical locations: Northern group (Beifanghua), Southern group (Kejia, Yue, Min), and the rest constituting the Central group, which is understood as transitional, possessing features from both the Northern and Southern groups. The Northern group meets all of the above criteria, while the Southern group meets none. The geographical locations can be seen on the map given in figure 1.1.
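The three-way grouping can be sketched as a simple scoring function. This is an illustration under stated assumptions, not Norman's procedure: the function name is hypothetical, and treating every mixed score as "Central" is my reading of the description of that group as transitional.

```python
def norman_group(criteria_met: list[bool]) -> str:
    """Place a dialect group by how many of the ten criteria it satisfies."""
    if len(criteria_met) != 10:
        raise ValueError("expected one value per criterion (ten in total)")
    score = sum(criteria_met)
    if score == 10:
        return "Northern"   # meets all of the criteria
    if score == 0:
        return "Southern"   # meets none of the criteria
    return "Central"        # assumption: any mixed profile counts as transitional


print(norman_group([True] * 10))
print(norman_group([False] * 10))
print(norman_group([True] * 4 + [False] * 6))
```

A mixed profile such as the third call is a made-up input used only to show the transitional case.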
1.2 Characteristics of the Min dialect group
The home of the Min dialects is southeast China: most of Fújiàn (福建) province and the northeastern corner of Guangdong. It is well known that dialects of this group differ from the other groups in their linguistic developments, exhibiting many archaisms as well as local innovations. This could be due to the relative geographical isolation: no major rivers and a very mountainous terrain would have made access in or out of this region difficult. Possibly because of this, the Min dialects are the second most distinctive (after Beifanghua) and easily characterized group of Chinese dialects.
The key diagnostic feature for determining Min dialects is the need to posit a tripartite division of the proto voiced stops: *voiced, *voiced aspirate and ‘*softened stops’ (the latter possibly arising from the influence of some type of voiced prefix, with the root consonant subsequently undergoing lenition) (Norman 1973:237). This can be observed by examining the MC voiced (quanzhuo) stops and their correspondences in Min, as illustrated in table 1.1 (adapted from Norman 1988:229). The table compares a set of MC words with initial *d and their cognates in four Min dialects, showing that some Min varieties have three different correspondences to the MC ‘d’. The correspondence sets illustrating this with the same MC segmental forms are highlighted in bold print. Note also that the distinction is not dependent on tonal forms, as some MC forms have the same tonal reflexes across dialects (as indicated by the superscript number). Nor can it be shown to be a conditioned split; it follows that three different phonemes were present in an earlier stage of Chinese and are still preserved in Min. The main distinction for Min is between aspirate th and unaspirate t. This two-way distinction is present in all Min dialects, but a further division of the unaspirate initial consonants into two types is only preserved in some northwestern dialects, with reflexes of voiced sonorants (or nothing). In table 1.1, Jiànyáng (建陽) is a representative of a northwestern Min dialect exhibiting the third correspondence set, as shown in the third bolded line, with reflexes for 荳 ‘bean’, 頭 ‘head’ and 脰 ‘neck’. These forms are all *d initial in MC, but correspond to an initial [t], [h] and [l] in Jianyang.
Table 1.1 shows that the major reflexes of MC ‘d’ are aspirate and unaspirate dental stops, and there is generally a high degree of correlation as to whether the initial is aspirate or not in any given word within this dialect group. In short, a Min dialect can be defined as “any Chinese dialect in which both aspirated and unaspirated stops occur in all the yang (lower register) tones, and in which the lexical incidence of the aspirated forms in any given word is in substantial agreement with that of the other dialects of the group” (Norman 1988:229).
Word  (pinyin) (English)   MC      Fuzhou   Xiamen   Jianyang  Yongan
荳    dòu ‘bean’           dǝu-    tau⁶     tau⁶     teu⁶      tø⁵
蹢    dí ‘hoof’            diei    te²      tue²     tai²      te²
弟    dì ‘brother’         diei:   tie⁶     ti⁶      tie⁵      te⁴
頭    tóu ‘head’           dǝu     thau²    thau²    heu²      thø²
啼    tí ‘weep’            diei    thie²    thi²     hie²      the²
糖    táng ‘sugar’         dâng    thouŋ²   thŋ²     hɔŋ²      tham²
疊    dié ‘stack up’       diep    thak⁸    thaʔ⁸    ha⁸       thɔ⁴
脰    dòu ‘neck’           dǝu-    tau⁶     tau⁶     lo⁶       ---
袋    dài ‘bag’            dậi-    toi⁶     ta⁶      lui⁶      tue⁵
毒    dú ‘poison’          duok    tøik⁸    tak⁸     lo⁸       tau⁴
銅    tóng ‘copper’        dung    tøiŋ²    taŋ²     loŋ²      tãw²

Table 1.1. Min reflexes of MC ‘d’
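The three Jianyang correspondence sets can be recovered mechanically from the Jianyang column of table 1.1. The sketch below is illustrative only: the function name is hypothetical, tone numbers are omitted, and taking the first character as the initial is an assumption that happens to hold for these forms because each Jianyang initial here is a single symbol.

```python
from collections import defaultdict

# Jianyang forms copied from table 1.1, with the tone numbers left out.
JIANYANG = {
    "荳": "teu", "蹢": "tai", "弟": "tie",
    "頭": "heu", "啼": "hie", "糖": "hɔŋ", "疊": "ha",
    "脰": "lo", "袋": "lui", "毒": "lo", "銅": "loŋ",
}


def correspondence_sets(forms: dict[str, str]) -> dict[str, list[str]]:
    """Group words by the initial consonant of their reflex."""
    sets: dict[str, list[str]] = defaultdict(list)
    for word, form in forms.items():
        # Assumption: the initial is the first character of the transcription.
        sets[form[0]].append(word)
    return dict(sets)


for initial, words in correspondence_sets(JIANYANG).items():
    print(initial, " ".join(words))
```

All eleven words have MC initial *d, so the grouping by [t], [h] and [l] shows the three-way split directly.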
The linguistic situation in southern China is actually very complicated and rather difficult to illustrate in the typical tree diagram format, as there are many possible sub-strata. For Min, there is a possible sub-stratum of Austronesian, for Wu, a sub-stratum from Miao-Yao, and for the Yue dialect group, from Tai. Nevertheless, it is clear that Min represents a diachronically earlier stage of Chinese than that of Middle Chinese, with a tentative dialect subgrouping as shown in figure 1.2.⁴
[Tree diagram with the node labels AC, MC, Min, Yue, Kejia, Gan, Xiang, Wu, Beifanghua, Southern and Northern]

Figure 1.2. A possible subgrouping of the Chinese dialect groups.
⁴ AC = Ancient Chinese.

The lexicon is also a good source of Min peculiarities. It is here that we can observe the aforementioned preservations of archaisms and local innovations, where some words retain their original meanings, once used across the board, while undergoing a semantic shift in other groups. One example of this is that the Northern group underwent a semantic shift of ‘run’ → ‘walk’, but the Min dialects did not (Norman 1988:183). Another example of a semantic change is the reflex of MC ‘*tieng’, meaning a ‘three-legged cooking vessel’; this word has retained a much earlier meaning in Min, where it now simply refers to a ‘cooking pot, caldron’, whereas the Northern group now only associates it with the ritual bronze tripod common in Chinese art (Norman 1988:231).
1.3 Peculiarities of Fuzhou
Fuzhou is a typical Min dialect in terms of the aforementioned broad classificatory criteria, and is often chosen to represent the Eastern Min dialects. Fuzhou has seven lexical tones. While the various sources do not always agree on the exact pitch values reported for these tones, they may be roughly described as follows:
Tone     MC tonal category    Auditory description
Tone 1   陰平 Yin ping        high level
Tone 2   上 Shang             low level
Tone 3   陰去 Yin qu          low fall/fall-rise
Tone 4   陰入 Yin ru          low rise, final stop
Tone 5   陽平 Yang ping       high fall
Tone 6   陽去 Yang qu         rise-fall
Tone 7   陽入 Yang ru         high, final stop

Table 1.2. Fuzhou’s seven tones.
While not mentioned in previous descriptions of Fuzhou, there is a clear non-modal phonation (creaky/breathy) associated with tones 3, 4 and 6.
Fuzhou is perhaps most famous for its set of vowel alternations (e.g. Maddieson 1976a; Donohue 2011a, 2014). The realization of a vowel will vary depending on the tone with which it is realized in citation (or isolation/prepausal) form. Examples of the nature of the vowel alternations are given below in table 1.3, where the vowels are divided into two groups according to the tone they occur with. The A vowels are higher relative to the B vowels, which are lower/diphthongs. A given row constitutes a single phonological vowel, a grouping justified both historically and by the fact that the vowel differences are neutralized in sandhi position to the corresponding form from the Set A group.
Set A: Tones 1, 2, 5, 7    Set B: Tones 3, 4, 6
i                          ei
ei                         ai
u                          ou
ou                         au
y                          øy

Table 1.3. Alternating vowel pairs in Fuzhou
Fuzhou tone sandhi is right dominant, thus it is the final syllable in a given domain that remains unchanged. All of these prepausal syllables retain their citation tones and vowels. However, in sandhi (non-final) position, the tone changes and all Set B vowels become the correspondent vowel from Set A. The examples in (1)–(3) (Chao 1934:41) below illustrate how a syllable with an underlying tone from Set B (tone 3, 4 or 6) in prepausal or citation position will be realized with a vowel from Set B as shown in the (a) examples. However, when they are followed by another syllable and are thus in sandhi position, we see that their tone changes (depending on the following tone), and the vowel also changes to the corresponding variant from Set A, often characterized as a ‘raising’ of the vowels from the lower/diphthongal variant to the higher/monophthongal form.⁵
    Input tones:                              Surface form:
(1) a. Tone 3 [21]:                  氣       [khei 21]          ‘air’
    b. Tone 3 [21] + Tone 4 [23]:    氣 壓    [khi 53 ɑʔ 23]     ‘air pressure’
(2) a. Tone 4 [23]:                  竹       [tøyʔ 23]          ‘bamboo’
    b. Tone 4 [23] + Tone 4 [23]:    竹 節    [ty 5 ʒaiʔ 23]     ‘bamboo section’
(3) a. Tone 6 [231]:                 護       [hou 231]          ‘protect’
    b. Tone 6 [231] + Tone 1 [44]:   護 兵    [hu 44 βiŋ 44]     ‘guards’
⁵ These examples continue to use Donohue (1992a)’s tone numbers rather than Chao’s original values to avoid confusion.

1.4 Previous studies of Fuzhou
A reasonable amount of descriptive work has been conducted on Fuzhou. Chan (1985) contains a list of many of the relevant sources and the values assigned to the tones. The tonal values from these, along with other more recent works, are included in table 1.4. One thing that all previous works share is that they are based on auditory data, or impressionistic transcriptions of pitch, and often from just one speaker. The original descriptions vary with respect to the representation of the tones, some even employing a musical note system. Chan, however, converted these values into the Chao ‘tone letters’, where the pitch contour is represented through combinations of the numbers 1 to 5, where 5 is high and 1 is low (Chao 1930). I use underscoring to indicate a syllable with an unreleased plosive in the coda (a ‘checked’ or ‘stopped’ tone), which typically corresponds to a syllable with a shorter duration, the original intent of the underscoring.
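Because Chao tone letters are simply digit strings, the contour they encode can be read off mechanically. The following is a minimal sketch, not part of any cited analysis: the function name `contour_shape` and the shape labels are assumptions made for illustration, and the function accepts digits only, so a value written with other symbols (such as a dash) raises a `ValueError`.

```python
def contour_shape(tone_letters: str) -> str:
    """Describe the pitch contour encoded by a string of Chao tone digits."""
    levels = [int(digit) for digit in tone_letters]
    # Pitch movement between each pair of successive digits.
    steps = [later - earlier for earlier, later in zip(levels, levels[1:])]
    moves = ["rise" if step > 0 else "fall" for step in steps if step != 0]
    # Merge repeated directions so that e.g. 1112 counts as a single rise.
    turns = [m for i, m in enumerate(moves) if i == 0 or m != moves[i - 1]]
    return "-".join(turns) if turns else "level"


for value in ["44", "52", "13", "213", "242"]:
    print(value, contour_shape(value))
```

A helper of this kind makes it easy to see when two transcriptions of the same tone agree in shape but differ in height (24 versus 13) and when they disagree in shape altogether (213 versus 13).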
Author              Year   Tone 1  Tone 2  Tone 3  Tone 4  Tone 5  Tone 6  Tone 7
Beijing University  1962   44      31      213     23      52      242     4
Chan                1985   44      32      213     13      51      131     5
Chao                1933   44      22      312     24      52      242     55
Chen                1967   44      22      312     24      52      232     5
Corbato             1945   44      21      25      24      52      232     5
Ergerod             1956   55      33      13      13      52      242     55
Lan                 1953   55      33      11      13      61      242     56
Liang               1982   55      31      213     13      53      353     55
Maccy & Baldwin     1929   44      33      13      13      53      341     4
Nakajima            1979   55      33      31      23      52      242     55
Norman              1988   55      22      13      24      41      342     55
Tao                 1930   55      31      13      34      52      342     5
Wang                1969   5555    3333    1112    24      6--2    2342    56
Wright              1983   44      22      12      13      52      242     4
Yip                 1980   44      22      12      13      52      242     4
Yuan                1980   44      31      213     23      52      353     4
Zhan                1981   44      31      213     23      53      242     5
Zhang               1984   44      22      213     13      52      242     5
Zheng               1958   44      22      213     23      53      231     5
Table 1.4. Previous descriptions of Fuzhou tones.
There are certain obvious uniformities: tone 1 is a tone with an overall high level pitch, tone 5 has a high falling pitch, and tone 7 is a short tone with a final consonant and a high pitch. However, there are some differences that are hard to reconcile: tone 2 is variously described as having a mid level, mid falling and low level pitch, and tone 3 as having a low dipping, low rising, low level, mid rising and mid falling pitch. Tone 4 is clearly a tone with a rising stopped pitch and tone 6 a tone with a convex pitch, though possibly in either the upper or lower part of the pitch range. Some even extend the 5 point scale to a 6 point scale to fully reflect their (differing) perceived pitch values. Such differences may result from a range of factors such as idiolectal, regional or social differences. Given that these descriptions are made from impressionistic data usually based on the speech of one person, it is perhaps not surprising to find such disagreements, and the differences may well reflect normal between-speaker variation. But with the speech of just one speaker and differing transcriptional skills, there is no way to know. Having to call into question the reliability and consistency of the linguists’ pitch transcriptions is the most troubling aspect. Several major works, including Chen (2000) and Yip (2002), which aim to account for the representation of tones and the complex sandhi behavior resulting from tonal interactions that are observed in (especially) Chinese languages, rely almost exclusively on such impressionistic data as there is little else available.
Clearly one needs a way of standardizing the method of description of tones in general: a way of obtaining maximally objective descriptions. This is possible by quantifying the linguistically relevant aspects of the physical signal (e.g. Rose 1982a, 1982b, 1990a, 1990b; Zhu 1999 and others). The results obtained from an appropriately controlled instrumental investigation of the acoustic phase are not subject to as many sources of error, and would eliminate the errors resulting from possible inconsistencies in perception and transcription. By ensuring a multi-speaker approach and normalizing the results, one can also factor out between-speaker differences and arrive at a representation of the tonal fundamental frequency (F0) of the Fuzhou variety as a whole (e.g. Rose 1987). Yip herself notes that some of the disagreement, especially in the notation of contour tones, may “illustrate the real problem in tonal phonology posed by different field workers’ perceptions of the same facts: one man’s 24 is another man’s 35, or, more seriously, one man’s 22 is another man’s 32 and so forth. This is where instrumental work is indispensable but still almost entirely lacking” (1990: 338-339). Despite the time lapse since Yip’s observation, there are still relatively few instrumental descriptions of tone systems. Experts in the field continue to lament the lack of instrumental work on both the citation tones and the sandhi forms. Recently, Zhang writes “the most urgent and fruitful step … that Chinese tonologists should currently take is to rebuild an empirical foundation from which theoretical analyses may proceed… The field needs carefully designed acoustic studies that systematically look at the realizations of tones in tone sandhi behavior” (Zhang 2009:11–12).
1.5 Summary
This chapter introduced Fuzhou as a Chinese dialect and identified some key characteristics peculiar to Fuzhou and the Min dialect group to which Fuzhou belongs. The next chapter introduces Fuzhou tones in the context of previous tonological analyses.
2 Previous analyses of Fuzhou tonology
This chapter compares two different accounts of the tonology of Fuzhou: Yip (1980) and Chan (1985). Both of these approaches fall within the autosegmental framework, differing primarily in whether or not the author uses the concept of register. I then present an impressionistic summary of the tones in Fuzhou and their resulting sandhi forms before finally evaluating the pros and cons of the different approaches to Fuzhou tonology.
2.1 An overview of Fuzhou tones
This section serves to introduce the tone sandhi phenomena in Fuzhou, using the data presented in Chen & Norman (1965), taken from Chan (1985), who summarizes their work.
2.1.1 Citation tones
The seven citation tones in Fuzhou are represented by Chen & Norman as follows:
Tone 1 /55/     Tone 5 /52/
Tone 2 /22/     Tone 6 /342/
Tone 3 /12/     Tone 7 /55/
Tone 4 /24/
Figure 2.1. Chen & Norman: citation tones.
As previously mentioned, the underlining indicates a stopped tone. There is another tone, [35], which occurs only in sandhi forms. Below in figure 2.2 are some examples of the tones in citation form.
Tone 1  巴  [pa]   [affix]           Tone 5  爬  [pa]   ‘climb’
Tone 2  把  [pa]   ‘handle (n.)’     Tone 6  第  [ta]   [ordinal prefix]
Tone 3  霸  [pa]   ‘tyrant’          Tone 7  白  [paʔ]  ‘white’
Tone 4  百  [paʔ]  ‘hundred’
Figure 2.2. A list of near-minimal pairs in Fuzhou.
2.1.2 Disyllabic tone sandhi
As noted, tone sandhi in Fuzhou is right dominant, so it is the final syllable in a given sandhi domain which retains its own citation value, and determines the observed form of the immediately preceding tone. The set of tonal values in non-final, or sandhi, position in disyllabic utterances in Fuzhou is illustrated in table 2.1.
Second syllable →    Tone 1 [55] | Tones 5, 7 [52, 5] | Tones 2, 3, 4, 6 [22, 12, 24, 342]
First syllable ↓     Resulting sandhi tones (on first syllable) given below

Tone 1 [55]
Tone 3 [12]              55   52
Tone 6 [342]
Tone 4 (*<h) [24]

Tone 5 [51]              22
Tone 7 [5]

Tone 2 [22]              22   35
Tone 4 (*<k) [24]
Table 2.1. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
According to Chan, the sandhi forms in table 2.1 are reasonably representative of the other sources consulted in her extensive survey, the major deviation being whether tone 2 is grouped together with tones 3 and 4 in final position, or whether it determines its own sandhi tones for immediately preceding syllables. Another point needing clarification is the split in tone 4 in its sandhi behavior. This tone patterns with two different groups of tones when changing to its sandhi tone. The most widely accepted explanation for this is that the final glottal stop has come from different consonants historically. Here, the ‘*<h’ represents a glottal stop in proto-Min, and the ‘*<k’ is the result of the merger of all other proto-Min stops (*p, *t, *k) syllable-finally (Chan 1985:150). This, however, is by no means a general consensus. It has also been suggested that the ‘k’ group represents an earlier development of the stops, before weakening to a glottal stop. The distinction made between the glottal stop and the ‘k’ is said to have been maintained in the literary readings of characters until quite recently. Now, Fuzhou speakers do not always make this distinction, and often use a glottal stop as for colloquial readings.
2.1.3 Trisyllabic tone sandhi
The penultimate syllable will change according to the rules for disyllabic tone sandhi, and the antepenultimate syllable will have a low-pitched tone; that is, [22] or [2]. There are, however, a few restrictions:
If syllable 2 has as its original (input) tone either tone 5 or tone 7, and
(i) the first syllable is either tone 2 or tone 4(<k)
    → normal disyllabic tone sandhi will take place, with the sandhi tone of the first syllable being determined by the changed form of the penultimate;
(ii) the first syllable is any of the other tones (1, 3, 4(<h), 5, 6, 7)
    → the sandhi tone will be that of the ‘eighth’ tone [35].
That is, if the second syllable is underlyingly tone 5 or tone 7 (note that these tones constitute the third natural class of sandhi tones in the above table), there are two possible outputs for the third-from-last (antepenultimate) syllable, depending on whether or not the first syllable is either tone 2 or 4(<k) (the second natural class of sandhi tones). If the first syllable is one of these tones, then normal disyllabic tone sandhi will occur between the first syllable and the penultimate, the changed form of the penultimate syllable being the input value for determining the output sandhi tone of the first syllable. If the first syllable is not one of these tones, then it will change to the so-called eighth tone, [35].
2.1.4 Quadrisyllabic tone sandhi
The maximum tone sandhi domain in Fuzhou is four syllables. Expressions of five or more syllables tend to be broken down into smaller domains. Quadrisyllabic tone sandhi simply involves any pre-antepenultimate syllables having a low tone: [22] or [2].
2.1.5 Vowel alternations
Fuzhou has tonally conditioned morphophonemic vowel alternations. Specifically, the tones divide into two natural classes on the basis of vowel alternation under tone sandhi. In one of the classes, group A (Tones 1, 2, 5, 7), the vowels do not alternate. In group B (Tones 3, 4, 6), however, the vowels do alternate. That is, when one of these tones changes to any sandhi tone, the vowel will also change to a form like that which occurs on the A-group tones (cf. section 1.3).
How many vowel pairs are involved in these alternations varies between sources; however, the environments in which they occur are consistent throughout the literature. The vowels are considered to have been either raised in the environment of the A-group tones, or lowered in the context of the B-group tones, but only in citation/prepausal position. Chen & Norman list the following finals as undergoing vowel alternations. In table 2.2 the second of the two vowels in the diphthongs is always the shorter of the two, more like a glide (Chan 1985:401). The final ‘ŋ’ is intended to cover all syllable types with a final consonant, thus both the velar nasal and the glottal stop.
Group A:             Group B:
Tones 1, 2, 5, 7     Tones 3, 4, 6
i                    ei
iŋ                   eiŋ
eiŋ                  aiŋ
u                    ou
uŋ                   ouŋ
ouŋ                  auŋ
y                    øy
yŋ                   øyŋ
øy œ
y ɛ
a
Table 2.2. Fuzhou vowel alternations (Chen & Norman 1965)
2.2 Chan’s analysis of Fuzhou tonology
One approach to tonology, first proposed by Goldsmith (1976), is Autosegmental Phonology. The central idea in this approach is that features occur on separate ‘tiers’, allowing for interactions of non-adjacent features that are considered adjacent on a specific tier. This readily accounts for a lot of ‘long distance’ phonological phenomena, and an important consequence for tones is that tones, too, can be considered autosegmental, existing on their own tier. Both analyses of Fuzhou tonology presented in detail here are couched within Autosegmental Phonology.
2.2.1 Citation tones
Chan assigns features to the tones based on the contour, pitch height and sandhi behavior of the tone. She first considers the contour of the tones in Fuzhou (falling, rising, or rising-falling), claiming that the overall pitch range is secondary, with the key contrast being just high or non-high. Chan thus posits just one distinctive feature in the tonal phonology of Fuzhou, namely [highpitch]: [+highpitch] yields an H toneme, and [–highpitch] yields an L toneme. Chan also claims as evidence the varying transcriptions gathered from the different sources. For instance, she claims that the fact that tone 4 has been recorded as both [13] and [24] indicates that the tone can vary. It is possible that the differences may represent recent developments in the dialect. However, whether this is a fact to be accounted for or merely the result of within- or between-speaker differences, regional differences, or language change in progress remains an open question.
The pitch values in Chan’s data differ slightly from those given above, so I first list the values Chan uses along with the systematic tonemes that she proposes as their underlying representation. The ‘@’ that appears with tone 2 and one of the representations for tone 4 is meant to signify a ‘floating’ H tone.
Tone 1 /44/    H             Tone 5 /51/    HL
Tone 2 /32/    L@            Tone 6 /131/   LHL
Tone 3 /213/   LH            Tone 7 /5/     HL
Tone 4 /13/    L@ or LH
Figure 2.3. Chan: citation tones and tonemes.
That tones 1 and 5 be assigned /H/ and /HL/ is noncontroversial. Tone 2 is described as a low/mid or low-fall tone, rising in some sandhi contexts, where it is also sometimes a high level, hence the assignment of /LH/. Chan claims that diachronically the rising pitch can be analyzed as a vestige of an original glottal stop final in all tone 2 words; that while the glottal stop has subsequently been lost, the contour resulting from it remains. She actually refers to it as being historically /LH/, but if the rise is a phonetic result of the syllable’s final consonant, it is not clear why the H was part of the tonemic representation historically. Chan explains that the evolution of this tone involved a rule delinking the H toneme, leaving it as a floating tone, to be relinked in certain sandhi contexts. She notes that a rising tone is unusual for a sandhi tone, that they are usually level or falling, but that the presence of one in Fuzhou may be accounted for on historical grounds due to this floating tone.
Tone 3 is allocated the value /LH/. In doing this Chan ignores the initial dip, yet she describes its fall as being longer in duration in citation tones, and about equal to that of the rise component in combinations with other syllables ending in tone 3. She states that because it behaves as though it were initially low in sandhi contexts, the initial fall must be insignificant. This, however, crucially hinges on the assignments of tonemes that have already been made, so is somewhat circular in reasoning. Chan continues by pointing out that in preterminal position, “it behaves as if it is tone 1a [=tone 1]” (1985:120, emphasis mine). /LH/ then accounts for this by deletion of the initial L as an early sandhi rule, a rule matching another posited to account for all tones like it, LH(L). Chan continues by stating that the rise is very audible, “sometimes the most salient feature” (1985:121), but then notes that due to a slightly creaky phonation type associated with this tone “the rising pitch that follows tends to be very slight, very short, and scarcely audible” (1985:124).
The assignment of /LHL/ to tone 6 does not need much explanation other than to note that because it patterns with tone 3 /LH/, it, too, requires the early rule deleting initial L. Tone 4 is assigned two values: /L@/ and /LH/. The reason for it having a /LH/ representation is that it is identical in pitch shape and pitch level to the initial rising part of tone 6. The split, as noted above, is the result of two different sources for the final glottal stop, namely a proto-Min *-k for those with the floating high tone and a *-ʔ in the case of the underlying representation of /LH/. Further evidence for this split is that the [ʔ] is deleted in preterminal position in the former case and optionally in the latter.
Tone 7 is given the representation /HL/. It is described as being phonetically very short and very high with perhaps a slight rise. That it behaves like tone 5 in preterminal position is the justification for why it should be assigned the same tonemic representation. Chan sees this as non-problematic, as their syllable structures serve to distinguish them, with tone 7 and not tone 5 having a final glottal stop. Chan also suggests that, because a following atonic syllable will have roughly the same phonetic value as if it were following tone 5, there is all the more reason to assume that tone 7 is in fact an underlying /HL/. She suggests that there be a glottal rule delinking final Ls, leaving them as floating tones in prepausal position, or linking them to the atonic syllable that might follow.
2.2.2 Disyllabic tone sandhi
Chan describes the rules for disyllabic tone sandhi, first by illustrating the pitch values associated with the changed forms, then by using features as indicated by the H and L features in table 2.3 below.
Second syllable →    Tone 1 [44] H | Tones 5, 7 [51, 5] HL | Tone 2 [32] L | Tones 3, 4, 6 [213, 13, 131] LH(L)
First syllable ↓     Resulting sandhi tones (for first syllable) given below

Tone 1 [44] H
Tone 3 [213] LH              44   33   53   51
Tone 6 [131] LHL             H    H    HL   HL
Tone 4 (*<h) [13] LH

Tone 5 [51] HL               33   22
Tone 7 [5] HL                L    L

Tone 2 [32] L@               22   13   44
Tone 4 (*<k) [13] L@         L    L    H    LH
Table 2.3. Fuzhou disyllabic tone sandhi forms (Chan 1985)
The rules that Chan posits to account for the output sandhi tones are given below. Note that W stands for ‘word’ and V for ‘vowel’.
1. Final L deletion rule.
    W
    |
    V
    |
    L H L ⟶ Ø
2. Initial L deletion rule.
    W
    |
    V
    Ø ⟵ L H
∥
3. /HL/ deletion rule.
    W
    |
    V
    H L ⟶ Ø
4. L-spreading rule.
    V   V
    |   |
    H   L
5. Sandhi H-docking rule.
    V   V
    |
    L   H L
6. /LH/ dissimilation rule.
    V
    Ø ⟵ L H   L H
To account for the pitch-lowering of the /H/ sandhi tones preceding /HL/ tones, Chan posits some ‘phonetic tone sandhi rules’ to account for the following:
    V   V     ⟶    V   V
    |   |
    H   H L        M   H L
The rule proposed to account for this is:
Phonetic: H-delinking and lowering rule
    V   V
    |
    M   H L
There is also the Obligatory Contour Principle (OCP), to be applied after the sandhi rules but before the phonetic rules apply. It states that any contiguous identical (auto)segments within a sandhi domain must be collapsed. Two further phonetic rules formulated are:
x
Phonetic: H lowering
    V
    L H ⟶ M
Phonetic: L raising
    S
    |
    V
    L H ⟶ M
Chan states that the order in which these rules must apply is quite strict: first the sandhi rules 1–6, then the OCP, and finally the phonetic tone rules. She mentions that there is another rule which must be ordered before all of the above, the Final H-Docking Rule, which links the @ to stressed tone 4<k syllables. Some examples of the application of these rules are included at the end of the section.
2.2.3 Trisyllabic tone sandhi
Chan generates the forms, which do not deviate in any way from Chen & Norman’s data, by using the disyllabic tone sandhi rules and just one more: the Antepenultimate Tone Lowering Rule, which is ordered first and represented below.
Antepenultimate Tone Lowering Rule
         V    V    V ]§
         |    |    |
    L ⟵ T1   T2   T3
Condition: T2 ≠ /HL/
If the tone on the second syllable in a trisyllabic sandhi domain is /HL/ then there are additional rules to account for the exceptional patterning in sandhi on the first syllable. I refer the reader to Chan 1985:326 for further explanation.
2.2.4 Quadrisyllabic tone sandhi
There are no differences to Chen & Norman’s data. Any syllables preceding the antepenultimate are uniformly L.
2.2.5 Vowel alternations
Chan recognizes three possibilities to account for these:
1. The A group is basic and the B group is generated from these for tones 3, 4, 6 in prepausal context.
2. The B group is basic and both groups A and B change in sandhi contexts.
3. Both A and B types are represented underlyingly and only the B group needs to undergo change for sandhi contexts.
The most noticeable difference between these proposals is that 1 implies vowel lowering in prepausal position, while 2 and 3 involve vowel raising in sandhi contexts. One may draw on many areas to help decide between the two possibilities. One such area is a possible correlation between vowel height and pitch height, the implication being that tones may affect the vowels. This was first noted by Wang (1968:10-11), as quoted at length below.

    F0 is consistently influenced by intrinsic factors, all of which have either an acoustic or physiological motivation. … sometimes a small effect conditioned by intrinsic factors has grown in time into significant differences that participate in the morphophonemic alternations in the language. For example, high vowels are known to raise the F0 by a small increment …
    In a language like Foochow Chinese, this relation between F0 and vowel height has grown into morphophonemic alternations. Each time a tone with a lower F0 changes to one with a higher F0, the vowel also changes to a higher vowel. Since the tone sandhi is a more general phenomenon (i.e. there are vowel qualities like [a] which are not affected by the sandhi), it is descriptively more economical to let the sandhi environment condition the vowel raising rule. From the viewpoint of the physiology of speech production, however, it is perhaps more likely that the vowel raising brought about the tone sandhi, which was then generalized even to those vowels that do not yet get raised.

However, much of the work done on the inherent pitch/vowel correlations suggests that the changes in vowel quality due to the F0 are not of the same magnitude as those observed in Fuzhou (e.g. Maddieson 1976a; Zee 1980).
2.2.6 Sample derivations
Following are some examples illustrating the application of rules to arrive at the correct output tones following Chan’s rules.
1. Tone 5 + Tone 6 (平路 [píng lù] ‘level track’) [51] + [131] ⟶ [22 131]
Underlying:      V   V
                 H L   L H L
HL Deletion:     V   V
                 Ø ⟵ H L   L H L
WFC:             V   V
                 L H L
H Lowering:      V   V
                 L H L
                   ↓
                   M
Phonetic:        V   V
                 L M L
2. Tone 2 + Tone 6 (表弟 [biǎo dì] ‘cousin’) [32] + [131] ⟶ [44 131]
Underlying:          V   V
                     L @   L H L
Sandhi H docking:    V   V
                     L H   L H L
LH dissimilation:    V   V
                     Ø ← L H   L H L
H lowering:          V   V
                     H   L H L
                           ↓
                           M
Phonetic:            V   V
                     H   L M L
3. T 5 + T 2 + T 1 (揚子江 [yángzǐjiāng] ‘Yangzi river’) [52] + [22] + [55] ⟶ [22 22 55]
Underlying:                       V   V   V
                                  X   L @   H
Antepenultimate Tone Lowering:    V   V   V
                                  L ← X   L @   H
OCP:                              V   V   V
                                  L   L @   H
2.3 Yip’s analysis of Fuzhou tonology
Before presenting Yip’s representations of the citation tones and the data from which she worked in order to obtain the sandhi tones, I first outline the concept of register, an integral part of the tonal geometry employed by Yip, and one which has been subsequently recognized as one of the most capable of handling tonal representation and tone sandhi (e.g. Chen 2000).
2.3.1 The concept of Register
Yip (1990) adopts the idea that tones are represented by two features which bear a hierarchical relation in that one, [upper], is dominant and splits the pitch range into two registers, and the other, [high], is subservient and further sub-divides each register. So a tone, while realized phonetically as pitch, is phonologically represented by values for each of these features: Register [±upper] and Tone/melody [±high] (written H or L for short). Register and Tone interact to define four pitch levels, as illustrated in figure 2.4 below.
Register     Tone
+upper       +high   H
             –high   L
–upper       +high   H
             –high   L
Figure 2.4. The interaction of Register and Tone.
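The way two binary features yield four pitch levels can be illustrated with a short sketch. The ranking function `pitch_level` and its 1 to 4 scale are assumptions introduced here purely for illustration; they are ordinal labels for the four rows of figure 2.4 and are not Chao tone digits or part of Yip's formalism.

```python
from itertools import product


def pitch_level(upper: bool, high: bool) -> int:
    """Rank a Register/Tone combination from 1 (lowest) to 4 (highest).

    Register is the dominant feature, so it contributes the larger step;
    Tone subdivides whichever register has been selected.
    """
    return 1 + 2 * int(upper) + int(high)


for upper, high in product([True, False], repeat=2):
    register = "+upper" if upper else "-upper"
    tone = "H" if high else "L"
    print(f"[{register}, {tone}] -> level {pitch_level(upper, high)}")
```

On this ranking [+upper, L] and [–upper, H] occupy neighbouring levels, which is why a mid tone could in principle be assigned to either register.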
Each of the two features forms a separate autosegmental tier, and is therefore subject to the well-formedness condition. However, only Tone occurs in sequences underlyingly. Register remains constant over the morpheme, thus restricting the tonal inventory to no more than two of any given contour (e.g. HL/falling or LH/rising).
2.3.2. Vowel alternations
The vowels that Yip aims to account for are those taken from Wang (1969), given in table 2.4.
Group A:                  Group B:
Tones 1, 2, 5, 7, 8*      Tones 3, 4, 6
i                         ei
ei                        ai
y                         øy
œ                         ǝy
u                         ou
ou                        ɔu
Table 2.4. Yip: Vowel alternations
Yip emphasizes not the alternations of the vowels, but rather the correspondences between the vowels and tones. She also observes that these correspondences are preserved even in tone sandhi, such that if a morpheme bearing a tone from group B “changes into” (1990:276) one of the tones from group A, the vowel will also change. While a slightly different viewpoint from Chan’s, it does just so happen to be the case that all the sandhi syllables have group A vowels (as well as syllables with the ‘eighth’ tone, [35], that only occurs in sandhi).
2.3.3. Citation tones
The citation tones that Yip assumes in her work on Fuzhou are given below in figure 2.5.
Tone 1 /44/     Tone 5 /52/
Tone 2 /22/     Tone 6 /242/
Tone 3 /12/     Tone 7 /4/
Tone 4 /13/
Figure 2.5. Yip: citation tones.
Yip uses as criteria for the assignment of underlying representations to the tones the phenomenon of vowel and tone alternations, as well as the natural classes observed in the sandhi behavior. Let us first consider the sandhi data that Yip uses in order to understand the feature assignment.
Second syllable →→
Tones 1, 5, 7
[44, 52, 4]
Tone 2
[22]
Tones 3, 4, 6
[12, 13, 242]
First syllable ↓↓
Resulting sandhi tones (for first syllable) given below.
Tone 1 [44]
Tone 3 [12]
44
52
Tone 6 [242]
(13)
Tone 5 [52]
22
Tone 7 [4]
Tone 2 [22]
22
35
Tone 4 [13]
4
Table 2.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
Yip states that the sources vary as to what the sandhi tone is when /52/ is the first syllable, followed by any of tones /22, 12, 13, 242/. It is represented here as [22], but has also been represented as [12]. If the value is [22], it fits easily into group A’s tones, but if it were [12], it should belong to group B, and thus we would expect the lower vowel alternants, which we do not see. Because of this, Yip posits a rule raising the vowels in the environment of any of the tones from group A. Acknowledging that this is clearly an unsatisfactory environment, requiring further definition, Yip decides that the context for these changes be [+upper] register (with some segmental restrictions). That [+upper] register be proposed to define the tones in group A is not controversial, with the exception of tone 2: [22]. Yip’s justification for this is that the tones seem to predominantly lie in the lower half of the tonal configuration. When viewed in this way, what is notated as a /22/ could just about be the mid-level tone, which is (presumably, now) more significant than its notation. In the proposed phonological system [+upper, L] is immediately adjacent to [–upper, H], so the mid tone could feasibly be represented as either, and in this case, the first representation is chosen. The feature assignment for the citation tones is given in figure 2.6.
[+upper]            [–upper]
44      HH          12      LH
52, 5   HL          13      LL
22      LL          242     LHL
35      LH
Figure 2.6. Yip: citation tones, feature assignment
Because [44], [4] and [52] group together as a context for sandhi, Yip assumes that they have the same first toneme, necessarily an H due to [52] being a falling tone. She also observes that [44] differs from [52] and [4] in its sandhi forms, and that [52] and [4] always “merge”, thus forming her reasoning for assuming that these tones have the same second tone feature, L. [44] is then HH (leaving aside issues of the OCP that would contemporarily require that HH be H).
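The effect of the OCP on a tonal tier, collapsing adjacent identical tonemes so that HH becomes H, can be shown in a few lines. This is a minimal sketch under stated assumptions: the function name `apply_ocp` is hypothetical, tonemes are represented as plain strings, and association lines between tones and vowels are ignored entirely.

```python
from itertools import groupby


def apply_ocp(tonemes):
    """Collapse each run of adjacent identical tonemes into a single toneme."""
    return [toneme for toneme, _run in groupby(tonemes)]


print(apply_ocp(["H", "H"]))            # the HH posited for [44]
print(apply_ocp(["L", "L", "H"]))
print(apply_ocp(["H", "L", "H", "L"]))  # no adjacent identical tonemes
```

Only adjacent repetitions are merged, so a sequence such as H L H L passes through unchanged.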
Yip offers an account of the motivation behind her assignment of features for those tones whose feature assignment may seem somewhat open to debate. From its features [4] may seem to be a falling tone, but is clearly phonetically high and level. Yip claims that there is a late rule of L-tone raising applying on stopped syllables only. If it were to be underlyingly a high level, the sandhi rules would need segmental as well as tonal contexts. Tone [13] is again split according to its origins as having glottal or velar coda. The former set undergoes the same sandhi as [12], suggesting a final H, although some sources report this as [11], conversely suggesting a final L. If it has a final L its sandhi behavior is indeed similar to [4] in that the phonologically final L is realized as a phonetic high tone (albeit in different registers). Thus Yip posits a rule raising the tone in the presence of a final glottal stop:
              Vʔ
    L ⟶ H / _____
[22] is also sometimes represented as falling, which calls into question the assignment of [+upper] register. Yip’s main justification for this is the vowel alternations. Moreover, if it were [–upper], it would necessarily have to be H to distinguish it from the other [–upper] tones, grouping it wrongly as a context for sandhi. She stresses that a feature like Upper is a relative one, and that [22] has a higher onset pitch than the other [–upper] tones.
Unfortunately Yip does not have her own data, but rather must rely on data gathered from other sources, which as she notes may introduce nonexistent differences of pitch due to the different researchers. When trying to account for the actual pitch values of the tones, Yip does not change the underlying representation to reflect the phonetic form; rather she follows standard phonological practice and posits rules which will enable the correct phonetic form to surface. However, Yip first describes Register as a phonetically based feature which ‘bifurcates the pitch range’, yet it is largely used as a phonological feature, hence the need to justify the feature assignment of [+upper] to tone [22]. For more discussion on the definition of register, see Donohue (1992b).
Yip’s observation of the vowel alternations differs from previous ones in that she views them as having correspondences with tones, almost like co-occurrence restrictions: should the tone change from one in Group B to one in Group A, then the vowel would necessarily change as a result of this. Chan seems to think that it is the vowels which alternate. The tones, in some way or another, provide the context for the rule, which may be formulated like any other phonological rule, such as palatalization. The differences in their viewpoints may be likened to the difference between classical phonemics and prosodic phonology: one views the difference in a static way, the other more as dynamic changes with a necessary cause and effect.
2.3.4 Disyllabic tone sandhi
As the table illustrating the pitch changes in sandhi position has already been given, I merely present the rules formulated to account for these, which, like Chan’s, must be ordered in their application.
(6) LHL simplification        L ⟶ Ø / LH ___
(7) [–upper] deletion         [–upper] ⟶ Ø / [+closed glottis]
(8) Glottal stop deletion     ʔ ⟶ Ø / [+upper]
(9) HL deletion               HL ⟶ Ø
(10) L dissimilation          L ⟶ H / L ___ L
(11) T deletion               T ⟶ Ø / [–upper]
(12) L spreading              $    $
                              (H)  H  L
(13) Register raising         R ⟶ [+upper]
(14) Vowel raising            V[αlow] ⟶ [–low, αhigh] / [+upper]
One last point of comparison: whereas Chan relied on diachronic developments as evidence for the occurrences of the glottal stop, Yip prefers to account for it solely from within her synchronic phonology.
2.4 Two analyses compared
The two analyses are based on different data, so it is hard to make a direct comparison. However, I first present a comparison of the way in which tonemes and features are assigned to the citation tones and then the rules given to account for the tone sandhi.
Chan claims that pitch height is not important in Fuzhou because there are no two tonal contours contrasting only in pitch height. This may, however, be a side effect of working with only one informant and not actually representative of the tonal possibilities in the whole variety of Fuzhou. Chan claims that phonetic output is part of her criteria for choosing a particular underlying representation, yet her account of tone 3 is not so straightforward: of this (allegedly) fall-rise tone, she claims that the fall component is longer in duration, then that the rise is more ‘salient’, and finally that the rise is scarcely audible.
On the other hand, Yip assigns features to the tones on a purely phonological basis. She first specifies that Register (initially described as bifurcating the pitch range) is the feature that will account for the vowel alternations. Then, within the domain of register, she assigns tone features, drawing on the contexts and outputs of sandhi domains and the consequent groupings of tones to guide her. Yip does not claim to have a phonetically transparent analysis. Instead, she posits rules to derive the observed phonetic forms. How abstract one makes an analysis varies. It is perhaps good not to be overly influenced by the phonetic form when the phonetics consists of the (reported) pitch transcriptions of a single speaker.
While the sandhi rules that Chan and Yip propose are necessarily different, many of the rules are equivalent in formulation and/or effect. This is perhaps the result of working within the same general framework of generative phonology, a model which tries to account for surface morphophonemic alternations derivationally by transforming a unique underlying representation of a morpheme into its surface forms by applying phonetically natural rules. Yip has nine straight rules to account for all vowel alternations and sandhi forms. Chan separates the rules into phonology-specific, general, and phonetic rules, and has six sandhi rules, the Obligatory Contour Principle and two phonetic rules, in addition to rules to account for the vowel alternations.
A flaw in both approaches has to do with language universals. Chan claims no need to represent more than one feature for distinguishing pitch height, as she considers it to be unimportant for Fuzhou phonology. However, this is a very language-specific claim. It is of course desirable that a proposal be as general as possible, so as to capture a universal representation of possible contrasting pitch heights cross-linguistically (e.g. Hyman 1986). Yip manages to do this with her Register feature. However, this feature, originally proposed to account for different tonal possibilities, is used to capture a segmental feature of Fuzhou. It is perfectly acceptable within the framework to use tonal phonological features to capture natural classes in the segmental phonology, but not having a rigorous definition of a feature prevents it being universally comparable.
3 Acoustic quantification of the citation tones
This chapter describes the methods and techniques I designed to obtain the desired data, and
the assumptions justifying their use.
3.1 Consultants
One of the most important selection criteria for my informants was that they come from the city of Fuzhou, rather than somewhere outside China where Fuzhou is spoken, such as Malaysia. This was done with the intention of controlling for regional variation that was evident in a pilot study. I was able to find four speakers who were remarkably homogeneous with respect to factors that might influence their speech: they were all from the city of Fuzhou with roughly equivalent age, educational, and socioeconomic backgrounds.6 I outline relevant personalia of each of the speakers in the following sections.
3.1.1 Speaker 1: WXQ
WXQ is male and at the time of the recording was aged 30 years. He is a native of Fuzhou city, and Fuzhou is his native language. It was the most frequently spoken language at his home, though Mandarin was also used occasionally. At school, the medium of instruction was Mandarin, but Fuzhou was spoken between students outside the classroom. He completed an undergraduate degree (where Mandarin was also the medium of instruction) before moving to Australia to pursue a PhD program, speaking mostly Mandarin at home.
3.1.2 Speaker 2: FM
FM is female and was 30 at the time of recording. Fuzhou was her first language and was spoken at home by her mother and guardians (grandparents). Mandarin, however, was also acquired at an early age and occasionally spoken at home. She spent most of her life in the city of Fuzhou, where she completed all schooling. Although the language of instruction was Mandarin, she also said that Fuzhou was most commonly spoken outside the classroom. After finishing school, she went on to study at the Fujian Medical College for five years. She arrived in Australia in February 1991 and has mostly spoken Mandarin at home since then.
6 Recall that the data was collected in 1992, so the data and results are relevant to the variety of Fuzhou as it was spoken in 1992.
3.1.3 Speaker 3: LY
LY is another 30-year-old female native to Fuzhou city. Fuzhou was the first language to be spoken and the dominant language at home, though Mandarin was also used on occasion. She spent most of her life in the city of Fuzhou where she completed all her schooling. Her experience was the same as the others: Mandarin was the medium of instruction, but the language of her peers was Fuzhou. She moved to Australia in 1987 and has since then mostly spoken Mandarin at home.
3.1.4 Speaker 4: ZPW
ZPW is male, aged 35, born in the city of Fuzhou. Fuzhou is his mother tongue and the dominant language at home, but Mandarin was also occasionally used. He never left Fuzhou during his youth, completing all schooling in the same city (with Mandarin as the medium of instruction). After finishing school he continued his studies for four years at Zhejiang University, where the medium of instruction was also Mandarin. Since moving to Australia in 1989 to pursue a PhD at the ANU he has mostly spoken Mandarin at home.
3.2 The corpus and elicitation
The corpus was designed to fulfill several objectives, including controlling for any intrinsic effects between segmentals and tone.
It has been shown (Hombert 1978; Maddieson 1976; Rose 1990) that differing initial fundamental frequency perturbations result from different prevocalic consonant types, namely that there will be a higher fundamental frequency after a voiceless consonant, and a lower fundamental frequency after voiced consonants (Lehiste 1970:68). The corpus was designed to restrict the syllable types represented to those with unaspirated obstruent initial consonants and monophthongal vowel finals (where possible), thus also eliminating the undesirable effects of a nasal coda (Rose 1981, 1990). This is largely following Rose (1981) whose investigations for a similar study claimed that these particular specifications “interfered the least with the way the source features are reflected in the oral output” (p.92).
There are also intrinsic effects of vowel quality on fundamental frequency (F0) to be considered. Namely, there is a connection between the vowel quality and the average F0 associated with it. That is, all else being equal, higher vowels will have a higher average F0 (Lehiste 1970:68). Bearing this in mind, individual tokens were selected that would equally represent the different vowel qualities for each tone. Strictly speaking, to eliminate any intrinsic raising of F0 a reasonable balance should be maintained between open and close vowels. However, I decided that if every sample of syllable types for each tone consisted of equal numbers of the different vowel types, the only overall effect of the intrinsic raising of F0 would be evidenced in the average range of each speaker being raised slightly. This does not affect the tones and their functions within a given system, and as the final set of contours is not expressed in terms of hertz, this is actually a moot point, so is of no consequence to the results. Figure 3.1 below shows the phonotactically allowed syllable types and the actual syllable types chosen for the corpus.
i.  σ = (C)(G)V(G)(C)   where G = glide.
ii. σ = CV(ʔ)           where C = voiceless unaspirated plosive, V = i (~ei), u (~ou), and a
Figure 3.1. Stimuli phonotactics: (i) Possible syllable types in Fuzhou; (ii) Syllable types chosen for the study.
I used the Hànyǔ Fāngyīn Zìhuì (漢語方音字匯 ‘Chinese dialect syllabary’) to find characters for each tone representing syllable types which matched the outlined criteria. The set of characters used is included in Appendix A.
Once the appropriate characters for each of the syllable types had been found, they were written on 3″ x 5″ (7.5cm x 12.5cm) cards by a native speaker of Chinese to ensure that there were no subconscious effects of foreign handwriting during the elicitation sessions. I chose to use cards, rather than another method such as a typed list, to avoid ‘listing intonation’: a possibly higher F0 at the beginning of the page, dropping (and thus reducing the range) in anticipation of the end of the page. Cards also eliminate any sandhi effects which may occur as a consequence of two characters being read in quick succession, or as a result of the speaker merely anticipating the following character. Presenting cards to the speakers to read prevents the speaker from knowing what the following character is, and ensures that no two characters are read too close together in time. I also included ‘dummy’ characters at both the beginning and the end of the whole set of cards to avoid any possible effects on intonation patterns in these special positions. Finally, the whole set was read three times, each time in a newly randomized order or reverse sequence, again to ensure that the readings were those of the citation form and devoid of any influence from the preceding character. This was also done to ensure that any differing emotional states resulting from the relative stages of the session would be more or less eliminated once the arithmetical mean was determined.
The elicitation sessions were conducted in a sound-proof booth in the phonetics laboratory at the Australian National University. Using a Nakamichi microphone, the material was recorded on high-quality tape using a Nagra 4.2 reel-to-reel monotrack tape recorder at a speed of 7.5 ips. The recordings were made with manually set amplitude levels on the Nagra, to avoid any distortion to the amplitude that can occur when instruments are set in the automatic mode.
After setting an optimum level on the Nagra for the speaker in question, the whole set of characters was read three times (each time in a newly randomized order), two repetitions per character. This provided a corpus of data from which spectrograms could be made:
4 speakers × 7 tones × 3 vowels × 3 repetitions × 2 replicates = 504 tokens.
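As a quick check on the corpus size, the design just described can be enumerated directly. This is only a minimal sketch: the speaker initials, tone numbers and vowels come from the text, while the variable names and the tuple layout are illustrative assumptions.

```python
from itertools import product

speakers = ["WXQ", "FM", "LY", "ZPW"]
tones = range(1, 8)          # citation tones 1-7
vowels = ["i", "u", "a"]     # i (~ei), u (~ou), a
readings = range(1, 4)       # three readings of the whole card set
replicates = range(1, 3)     # two repetitions per character

# One tuple per recorded token (hypothetical layout, for counting only).
tokens = list(product(speakers, tones, vowels, readings, replicates))
print(len(tokens))  # 504
```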
3.3 Acoustic instrumentation and mensural procedure
In earlier work (e.g. Donohue 1991), and during the elicitation sessions, it was quite clear that four of the tones were produced with a non-modal phonation: tones 3, 4 and 6 were consistently produced with a creaky/breathy voice and tone 2 was optionally slightly breathy. Given this, I decided to use analogue instrumentation in the acoustic quantification of Fuzhou tones based on the results of a pilot study (Donohue 1991) that was conducted using digital equipment, namely the pitch extraction program on the Interactive Laboratory System by Signal Technology Inc. In that study, due to the phonation type associated with some of the syllable types, automatic pitch extraction was often unsuccessful. This meant that some of the tokens would have to be measured by hand, while others would be measured by automatic pitch extraction. I wanted the method to be uniform across tokens.
All spectrograms were thus made using the Voice-Print Laboratories series 700 Spectrum Analyser, and were measured by hand. This model has a narrow band-pass filter of 45 Hz and a wide band-pass filter of 300 Hz. Each spectrogram, using Hi-shaping, contains 2 kHz of narrowband, linear expanded (1 kHz/29.1mm) bar analysis and 2-3 kHz of wideband, linear (1 kHz/14.45mm) bar information. For every token, an average amplitude spectrogram was additionally made, containing about 1 kHz of wideband information at the top.
Wideband spectrograms were used because they enable the sampling of F0 and amplitude as a function of the segmental structure (i.e. the vowel/Rhyme). The basic data are thus F0 and amplitude as a function of absolute duration of the rhyme, giving us a polydimensional parameterization of tone. This approach follows earlier work (e.g. Kratochvil 1971, 1985; Howie 1974; Coster & Kratochvil 1984), which discussed the problems of sampling F0 without reference to segmentals, given the known effects of tones and segments. It is also the best orientation point available, and it relates to perception: e.g. F0 in onset consonant transitions is not perceived as pitch. Furthermore, wideband spectrograms have good time-domain resolution, making accurate and reliable segmentation possible.
Determining the points of onset and offset of the syllable, and thus the duration, was mostly straightforward from examining the wideband spectrograms. Following Rose (1981), the point of onset for these syllable types was taken to be equivalent to the phonation onset. Phonation offset was determined to occur at that point after which the glottal pulse stopped the otherwise regular increase in the period. These onset and offset points were then transferred onto the narrowband spectrographic information, allowing for the 2mm lag which previous experimentation with transients produced by recorder on-off clicks had evidenced. Fundamental frequency measurements were then made at 20% intervals of the duration, and also at 5% and 95%, a rate adjudged high enough to satisfactorily resolve details of the F0 time course for all of the tones. However, I sampled additionally at 50% of the duration for the tone 6 tokens produced by WXQ and FM as this seemed to be a significant point in the contour for some of their tokens.
A sample of the token [tu] with tone 1 uttered by speaker ZPW is illustrated in figure 3.2. Both narrow and wide band spectrograms are of the same utterance, showing the amount of energy as a function of time against frequency. The vertical axis, frequency, is calibrated at the expanded rate of 29.1mm per 1000 Hz for the narrow band and half that for the wideband. Time is displayed on the horizontal axis, at a rate of 1.27mm per centisecond. The different lines on the narrowband spectrogram each represent a different harmonic, that is, a whole number multiple of a speaker’s F0.
Figure 3.2. Narrow and wideband spectrograms of [tu] uttered by ZPW.
The easiest way to derive a speaker’s F0 at a given point in the speech wave would seem to be to simply measure the first harmonic. However, the measurement of a higher harmonic gives a much more accurate value for the F0 since there is less error per unit frequency. Therefore, when measuring the F0 at any of the points from the spectrogram, I measured the highest clear harmonic with sufficient energy in it to distinguish it properly. It was not possible to measure each token from the same corresponding harmonic due to the different vowel types, as some vowels lack energy in certain parts of the spectrum. Having measured the harmonic, the F0 at that point was calculated. For example, in figure 3.2 the distance between duration onset and offset points was measured to be 5.87cm. Thus the duration is (5.87 ÷ 0.127 =) 46.22 csecs. Next, the various intervals of effective vocalic duration were calculated so that the F0 could be sampled. At the 20% interval, the measurement was made from the fifth harmonic. The actual frequency at that point is (2.21cm x 1000 ÷ 2.91 =) 760 Hz. In order to obtain the F0, recalling that harmonics are whole number multiples of the F0, it is necessary to divide by the number of the harmonic. Thus the fundamental frequency is determined to be (760 ÷ 5 =) 152 Hz.
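The worked example above can be reproduced with a few lines of code. This is a minimal sketch assuming the calibration stated in the text (1.27mm per centisecond; 29.1mm per 1000 Hz on the narrowband display); the function and constant names are illustrative and not part of the original procedure, which was carried out by hand.

```python
TIME_SCALE_CM_PER_CSEC = 0.127   # 1.27 mm per centisecond
FREQ_SCALE_CM_PER_KHZ = 2.91     # 29.1 mm per 1000 Hz (narrowband)

def duration_csec(distance_cm):
    """Convert an onset-to-offset distance on the spectrogram to centiseconds."""
    return distance_cm / TIME_SCALE_CM_PER_CSEC

def f0_from_harmonic(height_cm, harmonic_number):
    """Convert the measured height of a harmonic to F0 in Hz."""
    harmonic_hz = height_cm * 1000 / FREQ_SCALE_CM_PER_KHZ
    # Harmonics are whole-number multiples of F0, so divide by the harmonic
    # number; any measurement error is divided by the same factor.
    return harmonic_hz / harmonic_number

print(round(duration_csec(5.87), 2))     # 46.22
print(round(f0_from_harmonic(2.21, 5)))  # 152
```

Dividing by the harmonic number is also why a higher harmonic is preferable: the same ruler error in centimetres contributes only one fifth as much to F0 when the fifth harmonic is measured as when the first is.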
Some difficulties arose when determining the offset points of the phonation for tones 3 and 6 given the different phonation type associated with these tones (also with tone 4, but as this is a stopped tone, the end point is quite clear). Tone 2 also has a seemingly optional change in phonation type; however, the phonation associated with tone 2 is somewhat more breathy than the creak/breath accompanying tones 3, 4 and 6. To illustrate these difficulties, I have included an example of the same speaker’s tone 3 uttered on the segments [pa], shown in figure 3.3. These tokens reflect a less extreme, though still reasonably typical, instance of the phonation change for this tone. Onset points may be determined as previously described; the offset points, however, are somewhat harder to estimate. I have circled the areas crucial in my determination of these points. It was often necessary to inspect both the formants and the periodicity at the baseline when choosing the appropriate point. In the first utterance, one can see that at the designated point of offset there is noise in the first formant region. However, inspection of the circled area and of the periodicity at the baseline should confirm that until this point the period was regularly increasing. It is after this point that one can see a fairly abrupt offset to phonation with a couple of irregular periods at the end. In the second token on the same spectrogram there is a breathy offset, with a noise-excited second formant.
Figure 3.3. Spectrogram of tone 3 [pa] spoken by ZPW with non-modal phonation.
Figure 3.4. Average amplitude spectrogram on tone 1 [tu] spoken by ZPW.
The amplitude data were less problematic. The onset and offset points were transferred on to the spectrogram from the previous spectrograms. Each horizontal interval of 1.455cm represents an increase of 6 dB. Additional experimenting with transients this time determined there to be a lag of 1.5mm. I have included a sample of the amplitude spectrograms, for the token [tu] on tone 1 as uttered by speaker ZPW (figure 3.4). At the 40% interval of duration, the amplitude is determined to be 0.55cm up from the 24dB line. That is (24 + [0.55 x 1.455 =] 0.8 =) 24.8 dB.
Nowadays most speech analysis is done using Praat software (http://www.praat.org) or a similar program. However, it is not possible to measure amplitude so readily in this way with Praat, whose ‘amplitude’ extraction options are measures of intensity that depend on the ‘pitch’ and that have a time resolution too great for anything to be inferred about F0 production from them.
3.4 Results
The procedures described above resulted in F0 and amplitude quantified as functions of absolute Rhyme duration. These were taken to be the basic acoustic correlates of the Fuzhou tones. There are approximately 7000 measurement points for both F0 and amplitude. The raw values may be found in Appendices C and F respectively.
For every tone and for each speaker, the means were taken at each measurement point of duration. The mean duration for each tone for each speaker was also found. This was done using a Statview 2 package on an Apple computer. The results of these mean values are in Appendix D (F0) and Appendix G (Ar).
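The averaging step amounts to grouping the raw measurements by speaker, tone and measurement point and taking the arithmetic mean of each group. The sketch below is a present-day illustration only; the original analysis used Statview, and the F0 values and the record layout here are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (speaker, tone, percent point, F0 in Hz).
measurements = [
    ("ZPW", 1, 20, 152.0),
    ("ZPW", 1, 20, 150.0),
    ("ZPW", 1, 20, 154.0),
    ("ZPW", 1, 40, 149.0),
    ("ZPW", 1, 40, 151.0),
]

def mean_by_point(rows):
    """Arithmetic mean of the measured value for each (speaker, tone, percent)."""
    groups = defaultdict(list)
    for speaker, tone, percent, value in rows:
        groups[(speaker, tone, percent)].append(value)
    return {key: mean(values) for key, values in groups.items()}

means = mean_by_point(measurements)
print(means[("ZPW", 1, 20)])  # 152.0
```

The same grouping applies unchanged to the amplitude values and, with a (speaker, tone) key, to the mean durations.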
3.5 Summary
This chapter described the procedures used to obtain the acoustic data from simple analog instrumentation. These data will be analyzed in the following chapter.
4 Acoustic characteristics of the citation tones
This chapter presents the data obtained following the method outlined in chapter 3. First I present the parameters chosen to describe the F0, and then give a brief auditory description of the isolation tones in Fuzhou in section 4.2. A preliminary discussion follows in section 4.3 before the results for each speaker’s mean tonal F0 contours are presented in section 4.4. Section 4.5 compares and contrasts these results. Section 4.6 discusses normalization: its motivation and the procedure, then illustrates and discusses the results of the normalization. Finally, section 4.7 summarizes the chapter.
4.1 Method of data interpretation
The results are presented graphically, plotting the raw mean values for the F0 against the raw mean values for the duration at the corresponding percentage points. Absolute duration for each tone is used in this study, not equalized duration as is often done, so that the importance of duration to the tonal system may be examined. Indeed, equalizing duration can obscure between-tone differences in tonal F0 shape (e.g. Rose 1993). However, before presenting the results, I first describe the parameters chosen for interpreting the data, necessarily implicit in my presentation and descriptions.
The underlying assumption is that the tonal system may function discretely with respect to actual (raw) values for duration and range of F0. The raw values are viewed more as variables dependent on outside factors such as the sex of the speaker and their emotional state. Both the parameters I have chosen are designed with the expectation that by observing each speaker in a similar way it will be possible to compare and contrast these observations. From this, one can find invariant features that should reflect significant distinguishing features of the whole variety.
I describe the duration in terms of relative lengths. This is so that we can determine whether a particular length is salient for a given tone, or perhaps part of an intrinsic gesture dependent on the nature of the F0 contour. The F0 will first be examined in terms of its onset and offset points. I next examine the contour of the tonal F0 in terms of gradient or derivative, that is, the rate of change of F0 (Hz) with respect to time (csecs). To further investigate the given contour of the tone, I observe the points that evidence a change in the gradient. The parameter for this is percentage points of duration. Finally, there is the possibility of there being significant points in the systems expressible within the aforementioned parameters. These are points which may be considered to be significant to the whole configuration, for which the speaker might aim when producing the tones, so that there may be a clustering effect around them. This could have importance should the tones be able to be described in terms of these, as this could be used as input data when either assessing a phonological representation for the whole tonal system or formulating a new one. However, it will first be necessary to determine which points are significantly different from one another and which may be considered to be the same. While this can be done for each speaker with ANOVA, I suggest that such testing is best left until after a comparison has been made with the other tones, which will indicate whether significance testing is even necessary. Another obvious criterion by which the tones may be assessed is whether the tone occurs on a syllable with a final stop (the so-called stopped tones). However, this does not need to be part of an explicit discussion, which is intended only to clarify those features that are less well delineated. Before embarking on the discussion, I first provide a brief auditory description of the results; some further preliminary considerations and the results for the individual speakers then follow.
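The gradient parameter described in this section can be made concrete with a short sketch. The contour values below are invented for illustration, the function names are hypothetical, and the change points are reported here in centiseconds for simplicity, whereas the descriptions in this chapter express them as percentage points of duration.

```python
def gradients(points):
    """Rate of change of F0 (Hz/csec) between successive (time, F0) samples."""
    return [(f1 - f0) / (t1 - t0)
            for (t0, f0), (t1, f1) in zip(points, points[1:])]

def direction_changes(points):
    """Sample times at which the gradient changes sign (fall to rise or vice versa)."""
    slopes = gradients(points)
    return [points[i + 1][0]
            for i, (before, after) in enumerate(zip(slopes, slopes[1:]))
            if before * after < 0]

# Hypothetical contour: (time in csec, mean F0 in Hz).
contour = [(5, 130.0), (10, 120.0), (15, 122.0), (20, 124.0)]

print(gradients(contour))          # a fall of 2 Hz/csec, then a rise of 0.4 Hz/csec
print(direction_changes(contour))  # [10]
```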
4.2 Auditory characteristics
This section describes the tones in Fuzhou as illustrated by my speakers. The main reason for this is the lack of agreement of tonal pitch values between previous authors working with auditory-based observation data. Table 4.1 describes the tones in Fuzhou as perceived by the author. It will be seen that it appears to represent a tonal system that does not match up exactly with any of the previously published works.
Tone 1  Basically high level. A slight final rise for about the final third of duration. E.g. [pa] 巴 (an affix), [ki] 期 (a period of time)
Tone 2  Mid-falling. Seemingly optional change in phonation to slightly breathy for the final third or quarter of the duration (free variation). E.g. [pa] 把 ‘handle (n)’, [pi] 比 ‘to compare’
Tone 3  Low, falling slightly. Phonation usually slightly breathy/creaky. Tends to creak towards the end of the utterance. E.g. [pa] 霸 ‘tyrant’, [kei] 既 ‘even if’
Tone 4  Slight dipping (fall-rise). Short and with abrupt offset to phonation. Phonation usually slightly breathy/creaky. E.g. [paʔ] 百 ‘hundred’, [teiʔ] 滴 ‘drip’
Tone 5  High fall, falling to just beyond the mid pitch range. E.g. [pa] 爬 ‘to climb’, [ki] 奇 ‘odd’
Tone 6  Rise-fall in the lower part of the speaker’s range. Phonation usually slightly breathy/creaky. E.g. [ta] 第 (ordinal prefix), [tei] 治 ‘cure’
Tone 7  High, very short with abrupt phonation offset. E.g. [paʔ] 白 ‘white’, [tiʔ] 姪 ‘nephew’
Table 4.1. Auditory description of the tones in Fuzhou.
4.3 Preliminary considerations
Before beginning the individual descriptions of the tones, I must first discuss some considerations relevant to the interpretation of the data. It is ultimately the aim of this study to illustrate the citation tones, reflecting all and only the characteristics of extrinsic control. Thus these preliminary considerations have to do with some features found in the data which seem to be intrinsically produced effects, and not important in terms of linguistic tonal production. Specifically, I found there to be onset and offset perturbations that appear constant for all speakers.
Not all the F0 is tonal F0. Onset perturbations are evidenced by an initial drop in F0 that affects all tones roughly equally, regardless of length or contour. These effects may thus be eliminated on the grounds that they cannot be characteristic of any individual tone and will be ignored. I initially hypothesized that it was a percentage of the duration that may be ignored. However, considering that the effect is likely due to the coincident VOT stop, the amount to be ignored should rather be a fixed time period, a constant of (at least) 5 csecs. This is supported by the fact that despite the varying size of the speaker’s maximum duration, the effects are always similar in magnitude in absolute terms (raw csecs). When describing the tonal F0, I shall thus consider the onset point to be csec 5.
The next point to discuss is the offset perturbations. It is clear that there are also perturbations in the F0 derivative at the tail of the tones, evidenced by a final and abrupt drop (or rise) in F0. Rose (1990a) suggests that a suitable parameter would be a fixed constant expressed in centiseconds.7 Elaborating on this, I would rather have a set algorithm for determining the constant to be ignored at offset than just a fixed constant as was chosen for the onset perturbations, as the effect is not really a ‘fixed’ one, like that of the initial consonant. Moreover it seems that you would lose too much information from the really short tones if the offset perturbation were a fixed constant. More suitable perhaps is that the offset perturbations are expressed in terms of a fixed time constant that is sensitive to the tone’s duration. One suggestion might be a constant 5% of the duration as calculated from the 5 csec mark. So if a speaker has a maximum duration of e.g. 50 csecs, then the offset perturbations would be [(50-5) x 5% =] 2.25 csecs, or if a speaker had a maximum duration of 35 csecs, then the offset perturbations would only be 1.5 csecs. Accommodating the different durations is desirable as the four speakers vary greatly in their (mean) maximum durations: 31, 39, 42 and 47 csecs.
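The suggestion in this paragraph, an offset window of 5% of the duration measured from the 5 csec mark, reduces to a one-line calculation. The sketch below simply reproduces the two worked figures and applies the same formula to the four mean maximum durations quoted above; the function name is an illustrative assumption.

```python
ONSET_CSEC = 5.0  # fixed onset window, in centiseconds

def offset_window(max_duration_csec, fraction=0.05):
    """Offset perturbation window: a fraction of the duration beyond the onset mark."""
    return (max_duration_csec - ONSET_CSEC) * fraction

# The two worked examples from the text.
print(round(offset_window(50), 2))  # 2.25
print(round(offset_window(35), 2))  # 1.5

# The same formula applied to the speakers' mean maximum durations.
for duration in (31, 39, 42, 47):
    print(duration, round(offset_window(duration), 2))
```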
However, while accommodating differences such as fast and slow speech, this method would not allow for intrinsic differences between tones, in particular the extremely short duration of the stopped tones. Rose does note, though, that the perturbations are much less extreme on the stopped tones (which are also much shorter in his data). This would seem to suggest that either the offset perturbations are sensitive to the syllable type, the speaker anticipating a final consonant, or that the difference results intrinsically from the final consonant, or else that the offset perturbations are sensitive to the different tonal targets. That is, the extrinsic gestures made in tonal production may seem to somehow accommodate phenomena such as offset perturbations (or vice versa), suggesting a fraction of the desired, or target, duration for the tone in question as that portion to be ignored. I will express these, then, in terms of percentage of duration of the specific tone (namely the final 5%) to avoid any disturbances in the perception of the target contour due to the relative differences in duration between the tones.
7 1 centisecond = 10 milliseconds.
The final 5% of duration shall thus be ignored in my individual speaker descriptions. Should the reader wish to check these perturbations, figures 4.1 to 4.4 show the results for the citation tones without any modifications, plotting the F0 from 0–100% duration.
Figure 4.1. WXQ: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for WXQ, tones T1–T7.]
Figure 4.2. FM: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for FM, tones T1–T7.]
Figure 4.3. LY: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for LY, tones T1–T7.]
Figure 4.4. ZPW: Citation tones 0–100% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for ZPW, tones T1–T7.]
One important conclusion that may be drawn from the above discussion is that not all the F0 reflects the tones. It may be more appropriate to conceive of the F0 contours as tonal targets, as well as perhaps consonantal targets (allowing for consonant-tone interaction – both at onset and offset), rather than considering all of the F0 information relevant to the perception, or even the production of the tones. The next section presents the results for each of the four speakers.
4.4 Individual speaker results
This section presents and describes the results for each speaker. While these results have ignored the aforementioned 5 csec. onset and 5% duration offset perturbations, the percentage points are of the original duration.
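The trimming convention just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trim_perturbations`, the inclusive boundaries, and the sample values are hypothetical and do not come from the study's own processing.

```python
def trim_perturbations(times_csec, f0_hz, onset_csec=5.0, offset_fraction=0.05):
    """Drop samples in the first `onset_csec` centiseconds and in the final
    `offset_fraction` of the tone's total duration.

    times_csec: ascending sample times measured from tone onset (csec).
    f0_hz:      F0 values in Hz, one per sample time.
    Returns (percent_of_duration, f0) for the retained samples; percentages
    are relative to the ORIGINAL, untrimmed duration.
    """
    if len(times_csec) != len(f0_hz):
        raise ValueError("times and F0 values must have the same length")
    total = times_csec[-1]
    cutoff = (1.0 - offset_fraction) * total
    # Assumption: both boundaries are inclusive.
    kept = [(100.0 * t / total, f) for t, f in zip(times_csec, f0_hz)
            if onset_csec <= t <= cutoff]
    return [p for p, _ in kept], [f for _, f in kept]


# Hypothetical 20 csec tone sampled every 2.5 csec (illustrative values only).
times = [0.0, 2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0]
f0 = [140.0, 133.0, 130.0, 129.0, 128.5, 128.0, 128.0, 127.5, 120.0]
percent, trimmed = trim_perturbations(times, f0)
```

In this made-up example the first two samples and the last one are discarded, while the retained samples keep their positions as percentages of the full 20 csec duration.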
4.4.1 Speaker 1: WXQ
The plotted mean F0 contours for speaker 1 are given in figure 4.5. WXQ has a small F0 range (distance between the lowest and highest points in the tonal configuration relative to the y-axis), just 37 Hz, lying between 95 and 135 Hz. His maximum duration finishes just short of the 30 csec mark. The relative durations of each tone for the first speaker are:
T6 > T2 > T3, T5 > T1 > T4 > T7
The rise-fall has the longest duration, though its duration is only slightly greater than all of the falling tones, which cluster together at the bottom of the speaker’s range, tone 2 being slightly longer than tones 3 and 5. Next is the level tone, tone 1, and the shortest of all tones are the stopped tones. The higher of the two stopped tones, with the slightly less complex contour – rising, not dipping – is the shortest tone.
Tone 1: Starting at the highest onset point, this is basically a level tone. However, after an initial drop of nearly 10 Hz between 5 and 20%, it steadily rises at a rate of 0.16 Hz/csec for the rest of its duration.
Tone 2: Tone 2 has the third highest onset point, just about in the middle of the speaker’s range. It is a falling tone, though the derivative is not constant throughout the tone’s duration. From 20-40% of duration, the gradient is -0.43 Hz/csec, increasing to -0.77 for the remainder of the duration. This gives an overall derivative of -0.8 Hz/csec for this tone.
Tone 3: This commences at the third lowest onset point, at about a third of the way up the speaker’s range. This tone has a concave fall – that is, the gradient decreases with duration, to 90% of its total duration, after which it drops off more steeply.
Tone 5: The onset point is just 2 Hz higher than that of tone 2. These tones appear to be very similar, with close onset and offset points. However, there is a difference in the actual contours, and a further difference evidenced in the overall derivatives for these tones. Tone 5, while also being slightly shorter in duration, has a consistently steeper gradient than that of tone 2. In fact, the gradient for tone 5 is nearly constant throughout the tone’s duration, at a rate of 1.05 Hz/csec. Despite their similar F0 shapes, these tones are audibly different.
Tone 6: The onset point for this tone is about 2.5 Hz below that of tone 3. Its F0 contour is convex. The rise component starts at 20% duration and peaks at 50%, after which it steadily falls for the rest of the tone. The rise has a derivative of 0.7 Hz/csec and the fall is -1.06 Hz/csec.
Tone 4: This has the lowest onset point and is a dipping tone. It is a stopped tone and the second shortest of all tones, and has a comparable contour to tone 7 for WXQ. Dropping at a rate of -0.65 Hz/csec from the onset point to 20% duration, this tone then rises at a rate of 0.35 to 40%, after which it steepens to 2.45 Hz/csec.
Tone 7: This has the mid onset point, about 1 Hz less than that of tone 2. It has a similar contour to that of tone 4, though the corresponding derivatives are much greater and the duration is (consequently?) shorter. After dropping about 1 Hz in the first 2.5 csecs, this tone begins to rise at a rate of 0.35 Hz/csec for 20% duration, at which point it steepens to 2.45 Hz/csec for the remainder.
Cluster points: It is possible to distinguish certain levels within this system. The most striking impression is the clustering of tones in terms of their onset points into two groups of three, with one tone separate from both groupings. The highest onset point is defined by tone 1. The second and mid range point groups tones 2, 5, and 7. Tones 3, 4 and 6 are among the lowest of the onset points. The first point, 5, is the highest level to which tone 7 rises. Point 4 is the level on which tone 1 lies, to which tone 4 rises and from which tone 5 falls. Point 3 is the level at which tones 2, 5 and 7 have their onset and to which tone 6 rises. Point 2 is the level from which tones 3, 4 and 6 start and level 1 is the point of offset for all the falling tones. Given these points, we may quantify the tonal F0 as follows:
T1: 44, T2: 31, T3: 21, T4: 24, T5: 31, T6: 231, T7: 35
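The gradients quoted in the tone descriptions above are expressed in Hz/csec between percentage points of duration. As a hedged illustration of how such a figure might be derived (the helper name and the sample values are hypothetical and not taken from the study's data), the calculation is simply the F0 change divided by the elapsed time:

```python
def gradient_hz_per_csec(f0_start_hz, f0_end_hz, pct_start, pct_end, duration_csec):
    """Mean F0 gradient (Hz/csec) between two percentage points of a tone."""
    if pct_end <= pct_start:
        raise ValueError("pct_end must be greater than pct_start")
    # Convert the percentage span into elapsed centiseconds.
    elapsed_csec = (pct_end - pct_start) / 100.0 * duration_csec
    return (f0_end_hz - f0_start_hz) / elapsed_csec


# Hypothetical tone lasting 25 csec that falls from 118 Hz at 20% of its
# duration to 108 Hz at 100% (illustrative values only).
slope = gradient_hz_per_csec(118.0, 108.0, 20, 100, 25.0)
```

A negative result corresponds to a fall and a positive one to a rise, matching the sign convention used in the descriptions.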
Figure 4.5. WXQ mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for tones T1–T7.]
4.4.2 Speaker 2: FM
Speaker 2’s range is about 70 Hz, lying between 165 and 240 Hz, and her maximum duration finishes at about 45 csecs as can be seen in fig. 4.6. Her relative durations are:
T6 > T1, T2, T3 > T5 > T4 > T7
The rise-fall is the longest of all tones, then the level, mid and low tones. Next is the high fall and finally the stopped tones.
Tone 1: This tone starts at the top of the speaker’s range. The onset and offset points are just about equal, but strictly speaking, the tone is not level. After an initial drop of nearly 5 Hz in about 2.5 csecs (which I suggest is still part of the onset perturbation), the tonal F0 continues to fall from 20 to 60% of the duration at a rate of -0.23 Hz/csec. At this point it then rises at a rate of 1.1 until offset.
Tone 2: Tone 2 starts in the middle of the speaker’s range. The F0 contour is slightly concave (the reference points being onset and offset of duration), with a reasonably constant fall (-0.95 Hz/csec) for about 60% of its duration. Although this tone and tone 3 are both falling, their durations group together with the high (basically) level tone, whereas the high fall, traversing a much larger portion of the whole range, is also distinguished from these two falling tones by its shorter duration.
Tone 3: This tone shares the onset point with tone 6 at about one-third up in the speaker’s range. Its F0 contour is slightly concave and almost parallel in shape to that of tone 2, just described.
Tone 4: This has the lowest onset point, roughly 10 Hz lower than tone 3’s onset point. The F0 contour is slightly dipping, though mostly low-rising, and it is a stopped tone, which explains the short duration. It has a similar contour to the rising component of tone 6, and lasts for the same length of time.
Tone 5: This tone appears to have two different targets. For three speakers, tone 5 is a high falling tone, starting at the top of the speaker’s range, about 5 Hz higher than tone 1. It falls steeply and immediately, without any initial level component, traversing about two-thirds of the speaker’s range, and finishing at about the same place in the range as tone 2. For speaker 1, this is more like a mid-fall, with a very similar tonal F0 shape to (his) tone 2.
Tone 6: This tone has a low onset to a rise-fall contour. The rise peaks at 50% of duration, and this is maintained through to 60%. The derivatives either side of this plateau are just about the same, differing only in sign, with the rise at 1.6 and the fall -1.5 Hz/csec.
Tone 7: Onsetting about 10 Hz below tone 1, this tone drops nearly 15 Hz in less than 15 csecs. This is also a stopped tone, hence its short duration. Unlike speaker 1 (and like LY/ZPW), FM has a shorter, non-rising tone 7.
Cluster points: A similar clustering effect for the onset points as was found for WXQ can be noted here for FM, except three tones (1, 5, 7) group together at the top, with tone 2 at the mid-range point, and the same three tones (3, 4, 6) in the lower half of the range. However, in order to describe the F0 contours, the top cluster point is broken down into tones 1 and 5 at the top and tone 7 just below. Six points are then needed to describe these F0 data. Point 6 is the point of onset for tone 5 and the point of onset and offset for tone 1. Point 5 is the point of onset for tone 7, and point 4 is its offset. The third cluster point defines the onset of tone 2, the offset of tone 4 and the peak of tone 6. Point 2 is the point to which tones 2, 5 and 6 fall and at which tones 3, 4 and 6 onset. The lowest point is the point to which tones 3 and 4 fall. This entails a tonal F0 value assignment as follows:
T1: 656, T2: 32, T3: 21, T4: 213, T5: 52, T6: 232, T7: 4.
Figure 4.6. FM mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for tones T1–T7.]
4.4.3 Speaker 3: LY
Speaker 3 has a range of about 75 Hz between 145 and 225 Hz, and maximum duration finishing at about 37 csecs as seen in fig. 4.7. Relative durations are:
T6 > T1, T2 > T3, T5 > T4 > T7
Again, the rise-fall is the longest of all tones. Next come the high, mostly level and the mid tones. The next are the high fall and the low fall tones. Finally the stopped tones are the shortest, the low rise being almost double the duration of the high stopped tone.
Tone 1: This tone onsets about one quarter from the top of the speaker’s range. After a gentle fall (0.3 Hz/csec) to 40% duration, the F0 rises steadily for the rest of the duration at a rate of 0.5 Hz/csec.
Tone 2: This onsets in the middle of the speaker’s range, and falls steadily at a rate of -0.3 Hz/csec for 80% of the duration. The last part of the duration evidences a rise in the F0 contour at a rate of 1.0 Hz/csec.
Tone 3: Onsetting just less than 10 Hz below tone 2, the F0 contour then falls to the lowest point in the speaker’s range. The contour is not similar to that of tone 2; rather, this tone falls at a rate of -0.9 Hz/csec until 40% duration, after which the contour becomes concave, falling at a much higher rate that then decreases with time.
Tone 4: This tone has the lowest onset point and the F0 falls to nearly the bottom of the speaker’s range until 40% of duration. After this point, the contour begins to rise gradually for the next 20% (0.3 Hz/csec), then steepens to 3.1 Hz/csec for the remainder of the duration. This is a stopped tone, so the duration is very short.
Tone 5: This high falling tone onsets at the top end of the speaker’s range, and falls to about two-thirds of the way down the range. It falls without an initial level component and its overall derivative is -2.4 Hz/csec.
Tone 6: This rise-fall has the second lowest onset point, just 6 Hz above that of tone 4. This F0 contour falls at a rate of -0.35 Hz/csec for 40% of the duration, after which the rise component begins. The rise lasts for only 20% duration, but is a large increase of 24/2 Hz. The final fall component steepens with duration and has an overall derivative of -2.3 Hz/csec, falling just short of the bottom of the speaker’s range.
Tone 7: The tone has the highest onset point and offsets just above the onset point for tone 1. This is a stopped tone and the shortest of all the tones. Its F0 contour is a slight fall until 80% duration, after which it drops off rapidly.
Cluster points: The onset points for LY group into either high or low parts of the speaker’s range. In the top half are tones 1, 5 and 7, and from the mid-range point down lie the other tones. The highest of the salient points, “5”, is the point to which tone 1 rises and from which tones 5 and 7 fall. Point 4 is the onset point for tone 1 and offset for tone 7. Tone 2 lies on the cluster point 3, also the point to which tones 4 and 6 rise. Tone 3 could be interpreted to start at either 3 or 2. Cluster point 2 is also the point to which tone 5 falls and from which tone 6 rises. The final cluster point is the point from which tone 4 rises and to which tones 3 and 6 fall. This gives rise to the following tonal F0 values:
T1: 45, T2: 33, T3: 31/21, T4: 13, T5: 52, T6: 231, T7: 54.
Figure 4.7. LY mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for tones T1–T7.]
4.4.4 Speaker 4: ZPW
ZPW’s F0 contours are given in fig. 4.8 below. The speaker’s range is about 100 Hz, lying between 70 and 170 Hz. His maximum duration falls just short of the 40 csec mark. Relative durations are:
T1, T2 > T3, T6 > T5, T4 > T7
The high mostly level and mid fall tones are the longest. The low fall and the rise fall are the next longest tones, with almost the same duration. The high falling tone is quite short, effectively the same as the rising stopped tone. The shortest of all tones is the high stopped tone, less than half the duration of the other stopped tone.
Tone 1: Tone 1 onsets at the third onset point, about a third of the way from the top of the speaker’s range. The gradient is steady for most of the duration, rising at a rate of 0.21 Hz/csec.
Tone 2: This tone has its onset point in the middle of the speaker’s range. The F0 falls steadily for all of the duration, at a rate of -0.58 Hz/csec, stopping at about one-third of the way from the bottom of the speaker’s range.
Tone 3: This tone has the next lowest onset point, about 10 Hz lower than that of tone 2. This tone falls more steeply than tone 2, to the lowest point in the speaker’s range at a rate of -1.28 Hz/csec.
Tone 4: This rising stopped tone has the lowest onset point, at about one-third from the bottom of the speaker’s range. Not much of a fall is obvious, though the gradient steadily increases with duration, giving this tonal F0 more of a level-rise appearance. The contour rises to about level with the onset of tone 1, and just past the peak in tone 6’s rise.
Tone 5: The high fall has the highest onset. It falls rapidly, at a rate of -3.9 Hz/csec, traversing two-thirds of the whole range.
Tone 6: This convex tone starts about 2 Hz higher than tone 4. The rise component peaks at 60% of the total duration, rising at a rate of 2.0 Hz/csec. After this point it falls at a rate of -2.64 Hz/csec to just above the corresponding place in the range for the point of onset.
Tone 7: This tone has an onset point halfway between those for tones 1 and 5. This is a very short tone, so there’s not much detail in the contour. However, it is basically level with a final drop.
Cluster points: Just like the clustering for speaker 3, ZPW’s onset points may be grouped into higher and lower parts of the speaker’s range, with tones 1, 5 and 7 in the higher part. However, like speaker 2, six points are necessary to properly describe the F0 contours in this speaker’s system. These may be defined as follows: point 6, the highest point, is that from which tone 5 falls. Point 5 is the point on which tone 7 lies, and to which tone 1 rises. The fourth point is the onset of tone 1, and that to which tones 4 and 6 rise. The cluster point 3 defines the onset of tone 2 and the point 2, the onset of tones 3, 4 and 6. The lowest point defines the offset of tone 3. Following from this are the tonal value assignments:
T1: 45, T2: 32, T3: 21, T4: 24, T5: 62, T6: 242, T7: 5.
Figure 4.8. ZPW mean F0 contours from 5 csec. to 95% duration. [Plot: mean F0 (Hz) against mean duration (csec.) for tones T1–T7.]
4.4.5 Summary
This section has presented and described the results for each of the four speakers individually following the parameters outlined earlier. The next section will compare and contrast these results. However, it should be obvious that defining cluster points for individual speakers is not that productive. It is more productive to first factor out the between-speaker differences and obtain a representation of the variety as a whole. The comparison is nonetheless a useful exercise in that it serves to illustrate the problems associated with descriptions based only on one speaker; the type of idiosyncratic differences between speakers in the production of tonal targets; and the similarities, specifically of contour, for which all speakers appear to be aiming when they produce their tones.
4.5 A comparison of the individual speakers’ results
This section compares and contrasts the tonal F0 configurations for all the speakers. The goal of this is to illustrate characteristics of extrinsic control by identifying the similarities (and differences). This will be useful for the following sections that assess which of the tones to take as the parameters for normalization. After first comparing the relative lengths for duration, I compare and contrast individual tones or pairs of tones.
4.5.1 Duration comparisons
The results of the duration comparisons reflect what would be expected both in terms of intrinsic and extrinsic control. A falling tone is expected to be intrinsically shorter than a non-falling tone, and a rising tone is expected to be intrinsically longer than a non-rising tone. The short duration of the stopped tones reflects extrinsic control. These facts are well corroborated by my results.
For all speakers except ZPW the rise-fall tone is the longest, followed by the high level and then the mid and low falls. While ZPW has his level tone as the longest, the other speakers’ tones 1, 2, 3 and 6 are longer than tones 4, 5 and 7 (high fall, stopped tones). All speakers are similar in that the high fall is shorter than any of the other non-stopped tones (except WXQ, whose actual target for this tone will be discussed below), and of the two stopped tones, the tone with the complex rising/dipping contour will have the longer duration.
4.5.2 Tonal F0 comparison
Tone 1: This tone is located in the top third of all speakers’ ranges. The F0 has a steady, though gradually increasing derivative for the whole duration. The female speakers, however, show more dynamic contours – FM dropping a little, and then increasing at a greater rate during the final 20% duration; LY showing a similar behavior, though with relatively smaller gradients. All speakers offset this tone at a point in their range just above, though very close to, their onset point for the same tone.
Tone 2: Tone 2 starts about mid range for all speakers and falls. While FM produces her tone 2 with a slightly concave F0 contour, the other speakers have contours with reasonably consistent gradients throughout the whole duration. LY does not fall as far as WXQ or ZPW (the male speakers), exhibiting a small rise during the final 5% duration, placing the offset point very close to the onset point in her range. FM and ZPW fall to comparable levels in their tonal systems, the tone finishing at a place very near to that of tone 6’s offset. LY’s tone 2 finishes much higher in her system than this. WXQ’s offsets before the end of his tone 6, the offset point of which is closer to that of tone 4. His tonal F0 is much steeper than that of the other speakers, making it very like the F0 contour of (his) tone 5.
Tone 3: This tone always has the third lowest onset point, in the bottom third of the speaker’s range. The F0 falls to the lowest point in the speaker’s range, identifying the bottom of the range.
Tone 4: This is a dipping stopped tone. For all speakers this is located in the bottom half of their F0 range, with the lowest onset point. It generally rises to near the highest point in tone 6, the rise-fall tone, and is roughly equal in duration to the rise component as well. It is possible that the initial fall in the dipping tonal F0 contour may be partly consonantally induced.
Tone 5: WXQ seems to have a different target for this tone. His level tone is shorter in duration than this tone, but most noticeable is the difference in derivative for this tone between WXQ and the other speakers. Most speakers’ tone 5 has a derivative at least twice as steep (FM) as that of their tone 2, even up to eight times as steep (ZPW). WXQ’s tone 5, however, has a derivative only 1.3 times as steep as that of his tone 2. WXQ’s tone 5 also falls to the very bottom of his range. It could be a consequence of his condensed range, the peripheral points seeming to become more centralized. The end result is that there is little difference between WXQ’s tones 2 and 5. It should be recalled that despite their acoustic similarity they are in fact audibly different, and also may be perceived as a different target. For the other speakers, this high fall starts at (FM, ZPW) or second from (LY, WXQ) the top of the speaker’s range. The F0 always falls without a level component, traversing approximately two-thirds of the speaker’s range. This is usually the shortest of all the unstopped tones.
Tone 6: All speakers exhibit a convex contour for this tonal F0. Starting at the second lowest onset point, the peak is reached between 50 and 60% duration. Both female speakers exhibit a small drop before commencing the rise; however, the point at which they offset is roughly equal to the point at which they onset. ZPW, while not exhibiting the initial drop, still offsets this tone at roughly the same point as the onset. WXQ also does not drop in F0 after onset, and his offset point is much lower in his range than that of the onset. While most speakers rise consistently between 20 and 50/60% duration, LY only rises between 40 and 60% of her duration. This shorter rise component in the pitch contour for LY’s tone 6 is audibly different from the pitch contours for the tone 6s produced by the other speakers.
Tone 7: This tone shows some variation between speakers. WXQ clearly has a different target for this tone. With the duration nearly equal to half of his maximum duration, the F0 appears to have a target contour similar to that of tone 4. All other speakers have a level or falling contour for this tonal F0, with the onset point at or near the top of the speaker’s range. Crucially, the duration for most speakers is less than one-third of their maximum duration.
4.5.3 Importance of multi-speaker data
Parallel contours may be found within individual systems, such as between tones 2 and 3 in FM’s system, tones 1 and 2, and also tones 3 and 5 in LY’s system. WXQ parallels his tones 4 and 7. While none of ZPW’s tones show any striking parallels in their contours, one could consider that either tones 1 and 2, or tones 2 and 3 parallel the same contour. From this it is immediately obvious why a multi-speaker approach is superior to investigating the speech of only one person. Given the above information, together with a ‘single speaker’ approach, one could conclude that the Fuzhou variety had, as phonologically significant characteristics:
i. Two falling tones
ii. Two level tones and two falling tones
iii. Two rising stopped tones, one high and one low
iv. Either two level tones and two falling tones, OR one level and two falls
The advantages of a multi-speaker approach are thus clear: the possibility of factoring out between-speaker differences to get at the features of extrinsic control which are representative of the variety as a whole is clearly desirable. The next section discusses how to do this through normalization.
4.6 Normalization
This section introduces the concept of normalization, discusses its benefits and techniques for carrying it out.
4.6.1 The importance of normalization
Rose (1991) discusses the need to eliminate non-linguistic factors when attempting to characterize a whole variety: “One of the major aims of linguistic phonetics is the identification of the phonetic features which specify the sounds within a given language or variety. However linguistic phonetic sameness is often instrumentally elusive. On the acoustic level of description, differences between speakers in acoustic output caused by differences in their vocal tract anatomy will often be large enough to swamp not only the linguistic content, which is signalled by the particular sound contrasts involved, but also a fortiori the phonetic detail which characterises the sounds of one particular variety against another.” (Rose 1991:230).
Every speaker’s acoustic output will be different for what is perceived to be the same sound because the size of the individual’s vocal tract is also different, and the acoustic properties of radiated speech waves are a unique function of a speaker’s vocal tract anatomy.
The length and mass of the vocal cords is one of the main physiological differences resulting in different acoustic outputs. For example, in the previous section we saw that the male speakers have ranges roughly 100 Hz lower than the female speakers, and there was also quite a difference between the male speakers. One can assume this is due to the difference in the length and mass of the vocal cords: the longer, more massive vocal cords result in a lower F0.
The fact remains, however, that these sounds are actually perceived to have the same linguistic content. The linguistic content has thus to be mediated by a process separating it from the components determined by the individual speaker’s physiology. “Normalisation is a mathematical analogue of this perceptual process, aiming to extract and specify the invariant acoustic correlates of the Accentual and Linguistic features of a particular variety, and then to compare varieties for typological and universal purposes” (Rose 1987:343).
While there have been a number of investigations into vowel normalization (e.g. Disner 1980), Earle (1975) was the first to normalize the acoustic correlates of tone. Rose (1987) then examined some considerations in tonal normalization in greater detail. This work was a breakthrough for tonal description, because without normalization it is not possible to take the first step in defining the phonetic and phonological features of a particular variety, namely determining the features which serve to characterize the whole variety and discarding the idiosyncrasies.
4.6.2 Normalization procedural techniques
The aim of normalization is to maximally reduce the between-speaker variance while still making sense perceptually. The notion of perceptual sense, which serves to evaluate the numerical strategy, can be understood in two ways. Firstly, that the normalized values should correctly reflect the transcriber’s auditory impression, and secondly that the normalization should ideally model the actual process of the listener’s perceptual parameters. However, not enough is known about the relationship between linguistic pitch and its acoustic correlates to facilitate this. F0 is thus taken to be the parameter for normalization, as it is considered to be the primary acoustic correlate of pitch (Lehiste 1970: 54). Considering that a normalization strategy is meant to reflect the speaker’s perception, it is desirable that between-speaker differences in pitch be reflected as differences in normalized F0.
The normalization procedure employed in this study is that of the z-score transform, following Rose (1987), who reports its superiority in numerical performance over a competing method, the Fraction of Range. He also notes that another advantage to using the z-score is its use of many F0 values as normalization parameters, not just two as employed by the latter technique. He further notes that the root-mean-square basis of the function will ensure a globally distributed reduction in between-speaker variance. That is, the individual deviations from the mean are squared, then averaged, and finally the square root is taken. However, normalization parameters should only be calculated from samples that are comparable in order to avoid biasing. The ‘high fall’ is a good example. It is not clear whether all speakers have the same extrinsic target, so if we were to include the ‘high falls’ in the normalization parameters, it may skew the other tones. That is, all the tones of a speaker who had the lower high fall would be rendered slightly higher. It is thus preferable to exclude it from the normalization parameters. When considering which samples would be suitable candidates for normalization parameters, it is also necessary to exclude those parts of the signal already ‘discarded’ with respect to conveying tonally relevant distinctive features, namely the onset and offset perturbations. As for transcriptional equivalence, I chose tones 1, 3, 4 and 6 as comparable tones for this purpose. That is, the high level, low fall, the rising stopped tone and the rise-fall. Thus my parameters for normalization were these tones from all four speakers at the 20, 40, 60 and 80% duration sampling points.
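For contrast with the z-score, the two-parameter character of a range-based normalization can be sketched as follows. This assumes the common formulation in which each F0 value is expressed as a fraction of the distance between a speaker's minimum and maximum F0; the function name and the sample values are hypothetical, and the exact definition used by Rose (1987) should be checked against that source.

```python
def fraction_of_range(f0_hz, f0_min_hz, f0_max_hz):
    """Express an F0 value as a fraction (0 to 1) of a speaker's F0 range."""
    if f0_max_hz <= f0_min_hz:
        raise ValueError("f0_max_hz must exceed f0_min_hz")
    return (f0_hz - f0_min_hz) / (f0_max_hz - f0_min_hz)


# Hypothetical speaker with a 95-135 Hz range (illustrative values only).
mid = fraction_of_range(115.0, 95.0, 135.0)

# Changing just one of the two parameters rescales every normalized value,
# which is why a single atypical extreme can skew the whole transform.
mid_shifted = fraction_of_range(115.0, 95.0, 155.0)
```

Only the two range endpoints enter the calculation, whereas the z-score draws on every F0 value chosen as a normalization parameter.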
The z-score normalization procedure is as follows:

$$z = \frac{F0_i - \overline{F0}}{SD}$$

where $F0_i$ is the sample point, $\overline{F0}$ is the arithmetic mean of all the points chosen to be normalization parameters, and $SD$ is the standard deviation of those points about that mean (all values calculated to three decimal places). The results are in Appendix E. Other strategies for normalization include taking the log of the F0 (e.g. see Nearey 1989; Zhu 1999).
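The z-score transform can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the study's own procedure: it assumes that the mean and SD are computed separately for each speaker from tones 1, 3, 4 and 6 at the 20, 40, 60 and 80% points, it uses the population SD because the text does not say which estimator was applied, and all names and F0 values are hypothetical.

```python
from statistics import mean, pstdev

# Tones and sampling points used as normalization parameters in the text.
PARAM_TONES = ("T1", "T3", "T4", "T6")
PARAM_POINTS = (20, 40, 60, 80)  # percent of duration


def zscore_parameters(speaker_f0):
    """Return (mean, SD) of one speaker's F0 over the parameter tones/points."""
    values = [speaker_f0[tone][pct] for tone in PARAM_TONES for pct in PARAM_POINTS]
    # Assumption: population SD; the estimator is not specified in the text.
    return mean(values), pstdev(values)


def normalize(speaker_f0):
    """z-score every sampled F0 value of one speaker, to three decimal places."""
    mu, sd = zscore_parameters(speaker_f0)
    return {
        tone: {pct: round((f0 - mu) / sd, 3) for pct, f0 in points.items()}
        for tone, points in speaker_f0.items()
    }


# Hypothetical F0 values in Hz (illustrative only, not the study's data).
speaker = {
    "T1": {20: 130.0, 40: 130.0, 60: 130.0, 80: 130.0},
    "T2": {20: 120.0, 40: 115.0, 60: 110.0, 80: 105.0},
    "T3": {20: 110.0, 40: 110.0, 60: 110.0, 80: 110.0},
    "T4": {20: 110.0, 40: 110.0, 60: 110.0, 80: 110.0},
    "T6": {20: 130.0, 40: 130.0, 60: 130.0, 80: 130.0},
}
mu, sd = zscore_parameters(speaker)
z = normalize(speaker)
```

Note that tone 2 is normalized with the resulting mean and SD even though it does not contribute to them, which mirrors the way non-parameter tones are treated in the text.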
4.6.3 Results of the normalization
The results are shown in the figures below which plot the normalized tones for each speaker by tone. Most of the transformed shapes cluster together quite closely, and some of the tones with different between-speaker pitches have been kept separate nicely, for example, the larger derivative in WXQ’s tone 2 and his lower onset point for tone 5, LY’s delay of about 20% duration for the rise in tone 6, and WXQ’s different contour for tone 7.
Recall that the normalization parameters were F0 values at percentage points of duration, so the x-axis is now expressed as a percentage of total duration for that tone. The normalized F0 range is now quantified in units of standard deviations from the mean.
Figure 4.9. Normalized tone 1. [Plot: SD from mean against percentage of duration.]
Figure 4.10. Normalized tone 2. [Plot: SD from mean against percentage of duration.]
Figure 4.11. Normalized tone 3. [Plot: SD from mean against percentage of duration.]
Figure 4.12. Normalized tone 4. [Plot: SD from mean against percentage of duration.]
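Figure 4.13. Normalized tone 5. [Plot: SD from mean against percentage of duration.]
Figure 4.14. Normalized tone 6. [Plot: SD from mean against percentage of duration.]
Figure 4.15. Normalized tone 7. [Plot: SD from mean against percentage of duration.]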
It is worth mentioning a few things about these representations. All tones will have reached what is presumably the target contour by 20% duration, most noticeable for tone 1. This is just about equal to the 5 csec onset perturbation I proposed to ignore for the F0 descriptions in section 4.3. Aside from the already mentioned between-speaker differences, there seems to be a possible between-sex difference in the third tone. The males have lower, more level tones than the females, whose tones fall slightly, about 0.5 of a standard deviation above the males.
Now it is possible to specify part of a linguistic-phonetic-acoustical representation of the Fuzhou citation tones, representative of the whole variety. Such a representation is shown in figures 4.16 – 4.22, for each individual tone and figure 4.23, for all tones.
[Figures 4.13–4.15: x-axis, percentage of duration (0–100); y-axis, SD from mean.]
Figures 4.16 – 4.22 plot the mean of the normalized tonal F0 values together with one standard deviation above and below the mean, as indicated by the vertical lines. Assuming that the normalized data are normally distributed, about 68% of Fuzhou normalized F0 would be expected to lie within the one SD range above and below the mean, as indicated in the plots.
In this way, the plots indicate the degree of expected variation in Fuzhou, with roughly 2/3 of speakers’ tonal F0s falling within the limits set by these bars (given 100 speakers). The points deemed not comparable for normalization are excluded from these mean values. These plots then exclude WXQ’s tones 5 and 7 and the 40% point of LY’s tone 6. Strictly speaking the between-sex difference found in tone 3 should perhaps also be kept separate, but I await further investigation and thus confirmation of this as a real between-sex difference before excluding it. Unless otherwise specified, the mean was taken from all four speakers.
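The bars in these plots can be sketched as follows. The example computes the mean and the one SD band across speakers at a single sampling point, and separately checks the proportion of a normal distribution that lies within one SD of its mean. The helper names `band` and `normal_coverage`, the use of the sample standard deviation, and the four input values are assumptions made for illustration only.

```python
from math import erf, sqrt
from statistics import mean, stdev


def band(values_by_speaker):
    """Return (mean - SD, mean, mean + SD) across speakers at one point."""
    m = mean(values_by_speaker)
    sd = stdev(values_by_speaker)  # sample SD; an assumed convention
    return m - sd, m, m + sd


def normal_coverage(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return erf(k / sqrt(2))


# Hypothetical normalized F0 values for four speakers at one point.
low, mid, high = band([1.0, 2.0, 3.0, 2.0])
coverage = normal_coverage(1.0)
print(low, mid, high, round(coverage, 4))
```

Under the normality assumption the one SD band covers a little over two-thirds of the distribution, which is the sense in which the bars indicate expected variation.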
Figure 4.16. Mean normalized tone 1.
Figure 4.17. Mean normalized tone 2.
Figure 4.18. Mean normalized tone 3.
Figure 4.19. Mean normalized tone 4.
[Figures 4.16–4.19: x-axis, percentage of duration (0–100); y-axis, SD from mean.]
Figure 4.20. Mean normalized tone 5.
Figure 4.21. Mean normalized tone 6.
Figure 4.22. Mean normalized tone 7.
Figure 4.23 shows these mean values plotted together to represent the tonal configuration of Fuzhou citation tones, but without illustrating the standard deviations away from the mean.
Now we have a representation of the whole Fuzhou variety, having factored out the between-speaker differences.
[Figures 4.20–4.22: x-axis, percentage of duration (0–100); y-axis, SD from mean.]
Figure 4.23. Mean normalized tones in Fuzhou, plotted against mean duration.
At this stage I should note that normalized F0 values should be expressed as functions of normalized duration parameters (e.g. Rose 2000). However, I can only quantify the magnitude of onset and offset perturbations in terms of centiseconds and percentage of duration respectively. While it is desirable to normalize duration, it is also desirable to be able to represent Fuzhou tones as only those tonal contours used to distinguish linguistically relevant tonal features. This entails factoring out the between-speaker differences, as I have just done, and also the onset and offset perturbations. To find a suitable way of converting the parameter used in duration normalization to centiseconds is beyond the scope of this study. I thus decided to take the arithmetic means of the tones in order to be able to represent duration in terms of centiseconds and thus exclude what I have found to be the perturbations.
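Excluding the perturbations amounts to a simple windowing step, sketched below. The sketch assumes that the onset perturbation is the first 5 csec and that the offset perturbation is everything after 95% of the tone's duration, and that the last time point equals the total duration. The function name `trim_perturbations` and the sample values are hypothetical.

```python
def trim_perturbations(times_csec, values, onset_csec=5.0, offset_fraction=0.95):
    """Keep only samples between the onset cut-off and the offset cut-off.

    times_csec are sample times in centiseconds; the final entry is
    taken to be the total duration of the tone (an assumption).
    """
    duration = times_csec[-1]
    return [(t, v) for t, v in zip(times_csec, values)
            if onset_csec <= t <= offset_fraction * duration]


# Hypothetical tone of 40 csec sampled every 4 csec, with a flat contour.
times = list(range(0, 41, 4))
kept = trim_perturbations(times, [0.0] * len(times))
print([t for t, _ in kept])
```

For this 40 csec example the samples at 0 and 4 csec fall inside the onset perturbation and the sample at 40 csec falls beyond the 95% point, so the points from 8 to 36 csec remain.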
[Figure 4.23: mean normalized tones T1–T7, from 5 csec. to 95% duration; x-axis, mean duration (csec.); y-axis, SD from mean.]
4.7 Distinctive features for Fuzhou tones
From the tonal configuration given in figure 4.23, we can now discuss the distinctive features required to describe these data from one of the many models that have been proposed. Let us consider one of the first models for such features: Wang (1967). Wang considers contour tones to be part of the whole syllable and proposed features to describe these such as [±fall] and [±rise]. Applying these to the set of normalized tones in Fuzhou, the following specifications can be made in order to classify (not describe) these tones.
• The primary distinction is made by the presence or absence of a final stop, aptly called (though not after Wang) [±short]. If the distinctive feature of the tone is that it is [+short], then the best way to obtain this is to truncate it with a glottal stop, which may be considered part of the tone and not a phonological segment. Of course, it may also be short if the glottal stop is phonologically segmental and consumes a large part of the rhyme duration, for which only a quantum of duration is specified, thus abbreviating the duration available for tonal F0.
• All the tones that are [–short] may be divided into [±contour]. Tones 1, 2 and 3 can be considered to be [–contour] and then may be minimally distinguished from each other by [+high], [–high, –low] and [+low] for each of the tones respectively. If the tone is [+contour] – tones 5 and 6 – we may further describe them as [+fall] and [+rise, +fall].
• The short tones may be distinguished by either [±high] or [±contour], though the latter goes some way towards explaining the difference in duration – the rising stopped tone taking longer than the simple high stopped tone.
Table 4.2 summarizes this feature assignment. For feature assignment using Yip’s model, please see section 6.4.
Tone 1      Tone 2      Tone 3      Tone 4      Tone 5      Tone 6      Tone 7
[–short]    [–short]    [–short]    [+short]    [–short]    [–short]    [+short]
[–contour]  [–contour]  [–contour]  [+contour]  [+contour]  [+contour]  [–contour]
[+high]     [–high]     [–high]     [–high]     [–rise]     [+rise]     [+high]
[–low]      [–low]      [+low]                  [+fall]     [+fall]
Table 4.2. Phonological feature assignment for Fuzhou.
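A feature assignment of this kind can be checked mechanically: every tone should receive a distinct bundle, and the natural classes named in the text should be recoverable from the features. The sketch below encodes my reading of Table 4.2 as a dictionary (the placement of the last row's values under tones 1–3, 5 and 6 is an assumption about the table layout), and the names `FEATURES`, `all_distinct` and `natural_class` are hypothetical.

```python
# Feature bundles per tone, read from Table 4.2 (layout partly assumed).
FEATURES = {
    1: {"short": False, "contour": False, "high": True, "low": False},
    2: {"short": False, "contour": False, "high": False, "low": False},
    3: {"short": False, "contour": False, "high": False, "low": True},
    4: {"short": True, "contour": True, "high": False},
    5: {"short": False, "contour": True, "rise": False, "fall": True},
    6: {"short": False, "contour": True, "rise": True, "fall": True},
    7: {"short": True, "contour": False, "high": True},
}


def all_distinct(features):
    """True if no two tones share exactly the same feature bundle."""
    bundles = [tuple(sorted(f.items())) for f in features.values()]
    return len(set(bundles)) == len(bundles)


def natural_class(features, **spec):
    """Tones whose bundle matches every feature value in spec."""
    return sorted(tone for tone, f in features.items()
                  if all(f.get(name) == value for name, value in spec.items()))


print(all_distinct(FEATURES))
print(natural_class(FEATURES, short=True))
print(natural_class(FEATURES, short=False, contour=False))
```

The two classes printed correspond to the stopped tones and to the static tones discussed in the list above.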
4.8 Summary
This chapter first described the algorithms designed to analyze the data. After giving a brief description of the pitch of the citation tones in Fuzhou, the tonal F0 were discussed, first for each speaker, then across speakers, thus paving the way for an introduction to the concept of normalization and the desire to factor out between-speaker differences. This procedure was then described and the results from the normalization were presented. The next chapter explores the relationship between F0 and amplitude and presents an assessment of the importance of radiated amplitude in tonal production.
This section introduced the concept of normalization and described the method by which it can be achieved. After applying the outlined methods to my data, the results were presented and discussed.
5
The physiology of tone production in Fuzhou
The previous chapter investigated the acoustic dimension of tonal F0. This chapter uses the
acoustics to infer something about the physiology of tone production and provides an insight
into the physiological factors involved in F0 production.
I first discuss the reasons why it is necessary to investigate more than one acoustic dimension and why I chose amplitude. Section 5.3 is a brief description of received attitudes towards F0 production, and is followed by a discussion of the relationships between the two acoustic parameters and the two physiological parameters. Section 5.6 compares these relationships and 5.7 discusses the implications for tonal F0 production given my results.
5.1 Why investigate amplitude?
Pitch is usually assumed to be the perceptual correlate of fundamental frequency. This is the result of the assumption that the physiological correlate of tone features is the vibration of the vocal folds in phonation, and that the acoustic correlate of the vocal fold vibration is the fundamental frequency of the sound wave generated at the glottis (Lehiste 1970:54). The rate of vibration of the vocal cords depends on a number of interdependent factors, such as the mass of the vibrating part of the vocal folds, the vocal cord tension, the area of the glottis during the cycle, which determines the effective resistance of the glottis, and the value of the Bernoulli effect, the value of the subglottal pressure and the damping of the vocal cords. Rose (1988:11) remarks on this:
    Tonal pitch is [indicated to be] the perceptual result of both general auditory and speech specific processes operating on F0. The speech specific processes typically involve productionally mediated perception, and include speaker and context normalisation of F0. Both of these (but especially the latter) involve interaction, in ways mostly not yet clear, between F0 and all the other main acoustic dimensions of Ar (radiated amplitude), duration and spectrum. Thus tonal pitch cannot be an exclusive function of F0, although consideration of the magnitude of perceptual effects indicates that F0 constitutes the basic term in the function relating acoustics to pitch.
This illustrates the many-to-one relationship between acoustics and auditory features, as pitch is perceived as just one feature, but is actually the result of many different acoustic dimensions co-occurring. This relationship can even be found with features such as [voice]. Rose (1988:21) found that his perception of the feature [voice] is dependent on sufficient Ar level as well as F0, serving as a reminder of both “the inferential nature of the articulatory features with which we describe the auditory responses encoded in a phonetic transcription, and of the fallibility of the process: just because we hear [–voice] does not imply that the vocal cords are not vibrating.” Let us consider another feature which illustrates how phonology does not adequately distinguish auditory from articulatory features. Consider the feature [±nasal]: hearing an oral (i.e. [–nasal]) vowel does not necessarily mean that the soft palate is fully raised (certainly not for low vowels, for example).
In the previous chapter the two acoustic dimensions of F0 and duration were investigated. Amplitude is another of the acoustic dimensions involved in the acoustic correlation of what is perceived as pitch, and will be investigated in this chapter in terms of its relationship with F0 and how these two parameters may be used to infer something about the production of the tones. First, however, follows a brief introduction to what is meant by amplitude and why it is important and useful to investigate this dimension.
5.2 What is amplitude?
Amplitude is said to be the primary acoustic correlate of the percept of loudness (Lieberman and Blumstein 1991:28). Peak-to-peak amplitude is the extent of the maximum variation in air pressure from the zero line during a sound (Ladefoged 1962:15), that is, the maximum displacement of a particle set in vibratory motion, marking the extreme limit of its motion of oscillation. This, however, is not a good correlate of what is perceived as ‘loudness’. What is more appropriate is the average displacement of the particle. So the root-mean-square (RMS) amplitude is calculated, that is a form of the average of the amplitude which is particularly useful for complex wave forms (the values are individually squared, then averaged, and finally the square root is taken).
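The RMS calculation just described (square, average, take the square root) can be written out as a short sketch and contrasted with the maximum displacement from the zero line. The function names are hypothetical and the test signal is a synthetic sine wave of unit amplitude sampled over exactly one period, not speech data.

```python
import math


def peak_amplitude(samples):
    """Maximum displacement from the zero line."""
    return max(abs(s) for s in samples)


def rms_amplitude(samples):
    """Square each value, average the squares, then take the square root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# One full period of a unit-amplitude sine wave.
n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]
peak = peak_amplitude(sine)
rms = rms_amplitude(sine)
print(peak, rms)
```

For a pure sine wave the RMS value is the peak value divided by the square root of two; for complex wave forms no such fixed ratio holds, which is why the RMS value is computed directly from the samples.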
Distinguished from intensity, which is a measure of energy or power (i.e. the ability to knock a particle sideways), amplitude is a measurement of pressure and, unlike intensity, is not a function of frequency. Amplitude is a function of subglottal pressure, all things being equal. Other major influencing factors are: the vowel quality, specifically the effect of the supralaryngeal filter, and the interaction of the source and filter with respect to harmonics and formant frequencies, resulting in local fluctuations in amplitude (namely increasing with the frequency peaks).
Subglottal pressure (Ps), however, is not just a function of pulmonic effort, but also of laryngeal activity, and thus vocal cord tension, and also glottal resistance (i.e. the average glottal area). As the time-varying amplitude of the glottal source occurs extrinsically as the result of articulatory gestures that affect the Ps, it should be possible to infer something about the physiological factors involved in the F0 production. It is this aspect of Ar that I will be primarily concerned with in this chapter. I will next briefly review the physiological factors involved in tonal production.
5.3 Received theories of F0 production
This section provides an overview of tonal production, in order to investigate the physiological factors involved in the production of the tones and determine any connection between the production of F0 and Ar.
The relative importance of vocal cord tension (VCT) and Ps as “mechanisms of dynamic F0 control” (Rose 1982:160) has long been an issue of controversy. It is generally assumed that changes in F0 are the result of differences in VCT due to changes in the intrinsic and extrinsic laryngeal musculature; specifically, passive implementation from the cricothyroid and strap muscles; and active implementation from the vocalis, as has been shown in electromyographic studies on Tai (Erickson 1976).
However, Monsen et al. (1978) proposed a method of quantifying the separate contributions of VCT and Ps factors to the production of observed F0 and RMS glottal amplitude (Glot Amp = volume velocity wave generated at the glottis) in English. They found a way of indirectly assessing the contributions of Ps and VCT to changing F0 by comparing human glottal-source data with synthetic glottal waveforms generated by the Ishizaka-Flanagan two-mass model of vocal-fold vibration. “The two-mass model duplicates many of the essential features of vocal-fold vibration” (Monsen et al. 1978:66) and provides a means of investigating the changes that occur in the glottal source when a frequency change is caused by changes in VCT and/or Ps. By comparing the acoustic characteristics of natural phonation with those of synthetic phonation generated by known values of Ps and VCT, inferences can be made about the sources of the F0 variation in normal speech given the two observed variables of F0 and Glot Amp.
The F0 and Glot Amp values of sampled periods were selected and plotted against each other. This can be seen in figures 5.1 and 5.2, which contain the data synthesized for vocal folds of typical male and female dimensions respectively. Each point on these graphs represents a glottal period of known F0 and Glot Amp and corresponds uniquely to a specific value of both Ps and VCT. “These graphs are thus ‘maps’ which can be used to infer changes in air pressure and vocal-fold tension when the frequency and intensity of individual glottal periods are known” (Monsen et al. 1978:69).
I have enlarged the original graphs (Monsen et al. 1978:69 figs 3 and 4) and modified them slightly to accommodate the larger amplitude ranges of my speakers. The grid is composed of intersecting contours of equal Ps and VCT settings, which connect the values of the sampled glottal wave periods mentioned above. Ps settings, which are shown by the solid triangles, increase almost vertically, over a range of 3–18 cm H2O; VCT settings, shown by the open circles, increase almost horizontally, in a range from 0.5 to 1.5. No unit is given for VCT as it is not an absolute number measured in dyn/cm². Instead, the value of Q = 1.0 is the approximate setting for VCT typical of phonation in speech. The range represents the variation in tension 50% above and below this typical value. Referring to these maps, Monsen et al. point out that an increase in Ps produces a large increase in Glot Amp, but only a moderate increase in F0. Similarly an increase in VCT produces a large increase in F0, and a small decrease in amplitude.
The acoustic results of any contribution of Ps and VCT settings can be read off along the horizontal axis, which shows F0 in 20 Hz increments from 80 to 300 Hz, and along the vertical axis, which shows Glot Amp in 1 dB increments. For example, at 60% duration for tone 1, ZPW has a mean F0 value of 148.6 Hz and a mean Ar value of 23.9 dB. These have been plotted and are marked on the map with an ‘X’. From this the VCT can be found to be 0.87, and the Ps to be 8.0 cm H2O. Similarly for the map of female dimensions, an ‘X’ marks the 40% duration for LY’s tone 1, where the mean F0 value is 200 Hz and the Ar value is 25.8 dB. The VCT is determined to be 1.05 and the Ps to be 8.48 cm H2O.
I have applied these maps to my F0 and Ar data to find out the contribution of Ps and VCT to changes in Ar and F0 in Fuzhou citation tones. There are, however, a few differences between the two approaches which must be made explicit so that the reader is aware of the limitations of my application of this model.
The amplitude measurements I made are of RMS radiated amplitude (Ar), differing from those used by Monsen et al., who use RMS glottal amplitude values. When the Rhyme is monophthongal there is not a problem associated with this difference as the supralaryngeal filter can be assumed to remain constant throughout the Rhyme and the amplitude radiated at the mouth should be a reasonably true reflection of the amplitude generated at the glottis, allowing for interaction of formants and harmonics. However, as has been discussed, the presence of tonally conditioned vowel alternations in Fuzhou means that one cannot control for all finals to be monophthongal except with [a]. The changing shape of the filter, then, is a possible source of error for my results (high vowels tend to have a lower amplitude) and they should be observed with this in mind. However, at least the slight difference applies across the board, for all speakers.
The second problem is associated with the slight change in phonation type associated with some of the tones. This implies that there will be differences in glottal aperture associated with these changes. The Monsen et al. maps are based on normal modal phonation and thus do not account for differences associated with a change to creaky or breathy voice. However, when faced with this problem and the problem of the speakers’ Ar ranges being much greater than the Glot Amp range on the maps, I decided to restrict my modification of the model to accommodate primarily those values of Ar which were audibly produced in a normal phonation type, thus deliberately letting many of those values clearly audibly uttered in a non-modal phonation fall below the range on the map. Thus, the Ar was aligned with the Glot Amp on this basis (see figures 5.1 and 5.2). This, however, introduces two other possible error sources. The first is that the values of VCT and Ps as determined by the map readings are not correct for those values with differing glottal apertures. It should be noted that the possibility of error due to the differing phonation type is an error source that also cannot be controlled for, as the model doesn’t include this possible third dimension of glottal aperture. This is, however, a desirable dimension to be able to capture, and one which could be captured by the original two-mass model devised by Ishizaka & Flanagan (1972). Through controlled experimental implementation, the third dimension may be added in the same way as the dimensions of Ps and VCT were derived to relate Ar and F0 to the three dimensions of Glot Amp, Ps and VCT. Unfortunately this is beyond the scope of this work.
The final possible source of error mentioned above is due to the fact that the map has been modified slightly without direct reference to the two-mass model to obtain precise values. Values obtained from areas outside of the original map as presented by Monsen et al. are thus subject to error, albeit minimal. One last point to note is that the maps are based on vocal cords of typical dimensions, and so the very different ranges of the male speakers (95–135 Hz for WXQ, 70–170 Hz for ZPW) probably reflect different lengths and masses of their vocal cords. Therefore, as the models are based on typical vocal cord dimensions, the results for one of the male speakers are likely not comparable with the other speaker. It is for this reason that the male speakers will be excluded. Instead, we focus here on the two female speakers.
Figure 5.1. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of male dimensions.
Figure 5.2. Modified map from Monsen et al. (1978). Data synthesized for vocal folds of female dimensions.
5.4 Ar and F0 relationship
Before relating the acoustics to the physiology of tonal production, I first demonstrate the positive relationship that holds between the F0 and the Ar.
I do not present all the data here. I have chosen to present some of the results for the female speakers as they most clearly indicate the relationship and are most comparable due to their similar F0 ranges. Differences in the production of tones would be most obvious from a comparison of highly comparable speakers. Therefore, I do not present data from the male speakers here because their F0 ranges are not comparable, as noted above. For the values and plots of each speaker’s mean F0 and mean Ar values against equalized duration, I refer the reader to Appendices H and I. In this section I compare the high, mid and low tones together and then the high fall and rise-fall tones. These will be termed the static and dynamic tones respectively from their assignment of [–contour] and [+contour] as suggested in section 4.7. They are plotted together against equalized duration in figures 5.3 to 5.6. The dotted lines indicate the Ar values and the solid lines, the F0 values.
Figure 5.3. Ar/F0 plotted against equalized duration for Tones 1–3, speaker 2 (FM).
Figure 5.4. Ar/F0 plotted against equalized duration for Tones 1–3, speaker 3 (LY).
Figure 5.5. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 2 (FM).
Figure 5.6. Ar/F0 plotted against equalized duration for Tones 5–6, speaker 3 (LY).
[Figures 5.3–5.6: fundamental frequency (Hz) and radiated amplitude (dB) plotted against percentage of duration of Final.]
5.4.1 The static tones
Figures 5.3 and 5.4 show the values obtained for the static tones plotted together for each of the female speakers; that is, tones 1, 2 and 3 for speakers FM and LY respectively. There are great similarities in the Ar/F0 relationship between the two speakers. In terms of relative height, the Ar can be seen to reflect the F0 with respect to the relative position in the speaker’s range. Tone 1 is consistently higher in the range than tone 2, which in turn is consistently higher in the range than tone 3 for both F0 and Ar contours.
5.4.2 The dynamic tones
Figures 5.5 and 5.6 refer to the dynamic tones. Tones 5 and 6 show similar relationships to tones 1–3 in that the Ar can be seen to reflect the F0 contour to a large extent, especially in terms of relative height in the speaker’s range. Tone 6 for LY, however, does not have a rising Ar. Rather it is level when the F0 is rising. The Ar contour for tone 5 doesn’t have such a steep fall for LY as it does for FM, but this is also the case with the F0. Again, a positive correlation between the Ar contour and the F0 can (mostly) be observed.
5.4.3 Summary of the Ar/F0 relationship
The relationship between the Ar and the F0 was examined for the two female speakers’ static and dynamic (non-stopped) tones. There was found to be a positive correlation between these two acoustic dimensions for both speakers. This correlation exists in terms of the relative position (height) in the F0 and Ar ranges of the tones. This relationship is also shown to hold between the tones as determined by their relationship to one another. It should be noted that inspection of the male speakers’ graphs also reflects this positive Ar/F0 relationship.
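The relationship described here was assessed by inspecting the plots. One way it could additionally be quantified, which is not something done in this study, is with a Pearson correlation coefficient between the F0 and Ar values sampled at the same percentage points of duration. The function `pearson` and the two contours below are hypothetical illustrations, not measurements from the speakers.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical falling tone sampled at 20%, 40%, 60%, 80% and 100% duration.
f0_hz = [220.0, 210.0, 200.0, 190.0, 180.0]
ar_db = [30.0, 29.0, 27.0, 26.0, 24.0]
r = pearson(f0_hz, ar_db)
print(round(r, 3))
```

A coefficient close to +1 would correspond to the Ar contour closely following the F0 contour; a level Ar under a rising F0, as noted for one speaker's tone 6, would pull the value towards zero.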
Having found a positive relationship between the Ar and the F0 it is reasonable to assume that they are probably being produced in the same way. Next I investigate the extent to which this is reflected in the physiology of the tonal production by using the values derived for VCT and Ps from the Monsen et al. maps.
5.5 VCT and Ps relationship
This section examines the values of VCT and Ps for the static tones for the two female speakers, as derived from the Monsen et al. maps. Again, similar values were derived and plotted together against equalized duration for every speaker and every tone, but I refer the reader to Appendix I for these. This section is restricted to the examination of the relationship between the VCT and Ps of the static tones whose Ar and F0 relationship was examined in the previous section.
On the graphs given in figures 5.7 and 5.8, the dotted lines indicate values of Ps and the solid lines indicate values of VCT. When the values of Ar and F0 were beyond the range of the maps, estimates were made by extrapolating the model in the appropriate direction and estimating an increase or decrease in value from the preceding point. This is indicated on the graph by the slashed lines. These physiological parameters are first examined together to determine the relationship holding between them. In the subsequent sections I discuss what can be inferred about the physiology of the tonal production, and how this compares with received theories of tonal production.
5.5.1 VCT and Ps for the females’ static tones
In figure 5.8 it can be seen that LY has a clear relationship between her VCT and Ps. Tone 1 has the highest VCT contour, followed closely by tone 2 then tone 3. The same can be said for her Ps. Tone 3 does actually start at a place higher in the range than tone 2, but by 40% of the duration the distance relationships parallel those of the VCT. After this point, while still very similar, there are small differences in the two contours. This could be due to the slight change in phonation type associated with about half of tone 3 and the end of tone 2 (cf. section 4.2). Like LY, FM (figure 5.7) also exhibits the same relationships reflected in the relative height differences of the values within the range.
Figure 5.7. VCT and Ps plotted against equalized duration for tones 1–3, speaker 2 (FM).
Figure 5.8. VCT and Ps plotted against equalized duration for tones 1–3, speaker 3 (LY).
[Figures 5.7 and 5.8: vocal cord tension and subglottal pressure (cm H2O) plotted against percentage of duration of Final.]
5.6 Ar/F0 vs. VCT/Ps relationships
Figures 5.3–5.6 clearly demonstrate that F0 and Ar are positively correlated, reflecting the relative differences of height within the given range. Similarly, the values of VCT and Ps (figures 5.7 and 5.8) show an identical positive correlation reflecting the same thing.
5.7 Discussion
The acoustic and physiological parameters show parallel relationships that reflect the same height distinctions within the given range, reflecting the distinctive tone heights. In section 4.7, tones 1, 2 and 3 were given the distinctive features [+high], [–high, –low], and [+low] respectively. This is clearly reflected in the physiological production of the tones.
It was pointed out above in section 5.3 that the received theories say that VCT is the primary factor involved in F0 production and that Ps is secondary. From this, when comparing the static tones, we would expect to find a constant Ps, the differences in F0 being controlled by the VCT. Instead, what we get is more complicated than this. In particular, there are different Ps contours for all of the three static tones.
Indeed, a generally positive correlation between Ar and F0 could reflect extrinsic subglottal involvement (that is, the speaker is increasing F0 by both VCT and Ps). But it could also be that with increased tension you get a longer closed phase, with concomitant increase in intrinsic Ps, as a result of the increased glottal resistance. It is impossible to decide the proper explanation of this correlation without the appropriate modeling.
We can, however, conclude that both VCT and Ps are important in the production of tones in Fuzhou. This is an important finding as it is contrary to the received theories of tonal production, but supports earlier work showing a clear F0-Ar relationship (congruence) for some languages (e.g. Zhenhai; Rose 1984).
5.8 Summary
This chapter has explored the possibility of investigating the acoustic dimension of radiated amplitude to shed light on characteristics of tonal production in Fuzhou. This was done by applying the Monsen et al. map, based on the Ishizaka-Flanagan two-mass model, to use parameters of F0 and amplitude to determine the degree of involvement of the corresponding physiological factors, VCT and Ps, in the production of the F0. A between-speaker comparison demonstrated the relationship between F0 and Ar to be positively correlated, suggesting that the tones were produced in the same way by both speakers. This was compared with the relationship between the VCT and Ps, as derived by application of the Monsen et al. maps. The relationship between the physiological parameters was found to parallel that of the acoustic parameters. From this we can see the equal importance of both Ps and VCT in producing tones in Fuzhou. This is an important finding because Ps has previously been credited with little or no involvement in tonal F0 production.
The next chapter reviews the tone sandhi phenomena in light of new data on disyllabic expressions.
6
Fuzhou disyllabic tone sandhi
This chapter gives an analysis of disyllabic tone sandhi in Fuzhou based on material collected from the same four speakers used for the data obtained for the citation tones. Section 6.1 outlines the procedure involved before presenting the results in section 6.2. Section 6.3 briefly summarizes some current views on tonology, to illustrate the advantages of an autosegmental approach. Fuzhou is then analyzed within this framework in section 6.4. Section 6.5 analyses the tone sandhi in a different, more diachronically motivated approach which has been shown to be suitable for other Chinese dialects, so I adopt this approach to assess its adequacy to account for Fuzhou tone sandhi. The two analyses are compared in section 6.6.
6.1 Procedure This section first describes the corpus and m
ethod for obtaining the data before presenting the results in the follow
ing section.
6.1.1 The corpus and elicitation
The corpus consisted of disyllabic expressions taken from the Hànyǔ fāngyán gàiyào (漢語方言概要) (1960). This ‘survey of Chinese dialects’ includes a section on Fuzhou (pp. 296–299), part of which lists fifty-seven different disyllabic expressions found in Fuzhou, allegedly exhaustive of all the tonal combinations. These are listed in appendix B and largely consist of disyllabic nouns (with a few nominal phrases such as ‘Shang dynasty’ or ‘little box’).
All four informants read the whole set of disyllabic expressions three times, repeating each token once or twice. Finally they read a list of characters that was composed of all the different characters used in the disyllabic expressions, so that I could check that all of the input tones in the combinations were as they were reported to be.
Auditory transcriptions were made of the recordings and were corroborated by a professional phonetician, Dr Phil Rose. Together, we listened to the utterances many times to ensure the correct pitch values for each tone. I have ‘translated’ the transcriptions into the typical Chao number scale of 1–5 (cf. section 1.6).
6.2 Results
Table 6.1 presents the results. It is important to note that there appeared to be little between-speaker difference in pitch targets for the disyllabic tonal outputs. However, while the auditory impressions were remarkably uniform, the sandhi patterns showed some variation.
Table 6.1 thus summarizes all four speakers’ tones. The table shows the pitch values for both syllables in a given utterance. The tones along the top of the table are the citation tones of the second syllable, and those down the left-hand side, the citation tones of the first. A given disyllabic form will be located in the same row as the input tone on its first syllable and in the same column as the input tone for its second syllable.
For example, two tone 1s have the output sandhi form [44 44]. The combination of tone 4 + tone 6 gives two possible forms, [45 231] and [42 231], for three of the four speakers. The place in the table corresponding to the combination of tone 1 plus tone 4 has only a question mark. A combination of tone 3 + tone 7 has the output values [3 5]. I will remark on the latter three examples below.
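The row-and-column reading of the table can be sketched as a simple lookup. This is only an illustration: the dictionary holds just the four combinations quoted in the paragraph above, and the names `SANDHI` and `lookup` are hypothetical, not part of the study.

```python
# Sandhi outputs keyed by (first-syllable tone, second-syllable tone).
# Only combinations quoted in the text are entered; None stands for a
# "?" cell, i.e. a combination judged invalid.
SANDHI = {
    (1, 1): [("44", "44")],
    (4, 6): [("45", "231"), ("42", "231")],
    (3, 7): [("3", "5")],   # single digits are the short stopped tones
    (1, 4): None,
}

def lookup(first, second):
    """Return the attested output forms for a combination, or None for '?'."""
    return SANDHI[(first, second)]

print(lookup(4, 6))
```

Keeping a list per cell allows for combinations such as tone 4 + tone 6 that have more than one output form.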
It will be recalled from chapter 2 that there is sometimes more than one sandhi form for tone 4. This can be seen in my data when tone 4 combines with any of tones 3, 4, 5, 6 or 7. Chan (1985) reasoned that it is due to diachronically different final stops; however, I do not venture reasons for this difference in terms of diachrony here. It will be seen to what extent the difference is explicable later in sections 6.4 and 6.5. However, for the other tones there was not a second sandhi form following tone 4.
Finally, as mentioned in the procedural section, after reading the disyllabic expressions, each speaker read each character individually, on a separate list, in order that the tones may be checked from their citation readings. From this, it was found that three of the forms were not the correct combinations and are therefore invalid. That is, for the tone 1 + tone 4 combination, the speakers gave monosyllabic forms that had something other than tone 1 and/or tone 4. The same was true for the tone 3 + tone 4 and tone 3 + tone 6 combinations. These gaps have been shown by a question mark in the table, and are blacked out.
As mentioned, the combination of tones 3 and 7 has the output of [3 5]. The difference between what has been represented as [3] and as [33] is that of duration. The tone represented by only a single digit is audibly shorter than the other sandhi tones. I reserve the use of the underlining to indicate those syllables that are short stopped tones.
Syllable 2 (context) → | Tone 1 [44] | Tone 2 [32] | Tone 3 [21] | Tone 4 [23] | Tone 5 [51] | Tone 6 [231] | Tone 7 [5]
Syllable 1 ↓ (resulting sandhi forms below)
Tone 1 [44] | 44 44 | 43 32; 34 33 (S1) | 42 21 | ? | 33 51 | 42 231 | 3 5
Tone 2 [32] | 21 44 | 44 51; 34 51 (S1, S2) | 44 31; 44 41 (S1, S2) | 33 13 | 33 51 | 44 231 | 3 5
Tone 3 [21] | 44 44 | 43 32 | 42 21 | ? | 44 51 | ? | 3 5
Tone 4 [23] | 21 44; 22 44 (S1) | 23 31 | 34 21; 42 21 | 53 13; 5 13 (S1: [5 13] only) | 21 51; 33 51 | 45 231; 42 231 (S3: [45 231] only) | 3 5; 4 5
Tone 5 [52] | 44 44 | 33 32 | 21 21 | 33 13 | 33 51 | 21 231 | 3 5
Tone 6 [231] | 44 44 | 43 33; 43 32 (S4) | 42 21 | 31 13 | 33 51 | 32 231 | 3 5
Tone 7 [5] | 44 44 | 33 32 | 42 21 | 31 13; 33 13 (S3) | 33 51 | 33 231 | 3 5
Table 6.1. Fuzhou tone sandhi forms.
S1 refers to speaker WXQ. He has three combinations that are different to the other speakers. Firstly, in the combination of tone 1 plus tone 2, he has, as well as the observed [43 32], the form [34 33] for all tokens, suggesting a lexicalized difference. In the combination of tone 4 plus tone 1, rather than falling on the first syllable, the pitch on the first syllable is level, that is, [22 44]. Thirdly, he does not have the variant with the falling pitched sandhi tone of the combination of tone 4 and tone 4, just the output form [5 13].
S3 refers to LY. She does not have the falling variant for the combination of tone 4 and tone 6 as the other speakers do; rather, she only has the output form [45 231]. Another difference is that her tone 7 becomes a mid level [33] before tone 4, rather than [31], as is found for the other speakers.
S4 refers to ZPW. In the combination of tone 6 plus tone 2, rather than falling on the first syllable and remaining level on the second as is found for the other three speakers and represented in the table by [43 33], ZPW falls gradually over both syllables, finishing the utterance between 2 and 3 on the given scale, represented as [43 32] (though [43 33↓] may be a better representation).
The final syllable has a pitch contour that mostly mirrors the citation tone. However, in the combination of tone 2 and tone 3, both FM and WXQ appear to have a higher target peak for the second syllable, evidenced by a slightly higher fall on the second syllable (and consequent slight rise on the first syllable and/or short rise or level component to the fall on the second syllable). This is indicated in the table with [34 51] and [44 41].
As mentioned in chapter 1, tone sandhi in Fuzhou is right dominant. That is, it is the last syllable in a given domain that remains unchanged. So in a disyllabic expression the second syllable maintains its citation form but provides the context for the first syllable, which changes accordingly. This is corroborated by my results. All but two of the expressions evidenced what appears to be the appropriate citation tone in the second syllable position. This can be noted by the vertical orientation of the changes. That is, there is more correspondence between the sandhi tones given the same tones as context for change, than between the tones that are input for the sandhi changes.
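Right dominance can be checked mechanically by comparing each second-syllable output with its citation value. The sketch below is illustrative only: it uses a handful of forms quoted in this chapter rather than the full data set, and the names `CITATION`, `FORMS` and `second_syllable_changed` are hypothetical.

```python
# Citation pitch values for the tones used in this small sample.
CITATION = {1: "44", 2: "32", 3: "21"}

# (first tone, second tone) -> (sandhi tone, second-syllable pitch),
# restricted to forms quoted in the chapter.
FORMS = {
    (1, 1): ("44", "44"),
    (6, 3): ("42", "21"),
    (2, 2): ("44", "51"),
    (2, 3): ("44", "31"),
}

def second_syllable_changed(forms, citation):
    """List the combinations whose second syllable departs from citation form."""
    return sorted(
        pair for pair, (_, out) in forms.items() if out != citation[pair[1]]
    )

print(second_syllable_changed(FORMS, CITATION))
```

On this sample the only departures are tone 2 + tone 2 and tone 2 + tone 3, the two cases the text singles out.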
6.2.2 Discussion of the tone sandhi data
I have modified table 6.1 to omit the second syllable forms, as they have been seen to mirror the citation tones in all but two of the cases as noted above. The first syllable tones are shown in table 6.2, which illustrates only those forms to which the first syllable changes. Table 6.2 has also been reorganized to reflect the groupings of similar output sandhi tones. When the first syllable in disyllabic expressions is tone 4, there are two output sandhi forms in all but a few cases. Tone 4 has thus been split into two rows in table 6.2, grouping the sandhi forms with the other sandhi forms around them. However, for T4+T1, T4+T3, and T4+T4, there are not two sandhi forms available. I have thus shaded out these cells in the table to indicate this.
The high falling tones found on a first syllable sandhi tone ([42, 43]) have also been grouped together and called [42], as the point of their offset is clearly conditioned by the onset of the following tone.
The table may be read in the same way as table 6.1, but the tone occurring on the second syllable must be taken from along the top of the table where all second syllable tones can be found, just like the sandhi tables in chapter 2. For example, when tone 6 is on the first syllable, it will change to a [42] before a tone 3. Thus the combination of tone 6 plus tone 3 is [42 21]. In the bottom right of the table, there is a grouping of tone 2 plus tones 2, 3 and 6, and tone 4 plus tone 6. These all have [44] as their sandhi tone. The ‘#’ indicates that in two of these instances, the second syllable tones do not reflect the citation tone, but that, in these combinations, tone 2 becomes [51], and tone 3, [31].
Note that:
# The second syllable tones 2 and 3 both change to a fall following this tone:
Tone 2 → [51] / tone 2 ____
Tone 3 → [31] / tone 2 ____
The same exceptions as noted in table 6.1 are included here, using the same symbols to indicate to which speaker they applied and the relevant form (or lack) in parentheses next to the symbol.
Syllable 2 (context) → | Tone 1 [44] | Tone 5 [51] | Tone 7 [5] | Tone 2 [32] | Tone 3 [21] | Tone 6 [231] | Tone 4 [23]
Syllable 1 ↓ (output sandhi tones below)
Tone 3 [21]    ?  ?
Tone 1 [44]    (34)  42  ?
Tone 6 [231]   44  33  31
Tone 7 [5]     33  (5)
Tone 4 [23]    n/a  n/a  (n/a)  n/a
Tone 5 [52]    44  21  33
Tone 2 [32]    #  #  44
Tone 4 [23]    (22)  21  44  23  34  (5)  53
Table 6.2. First syllable changes in the disyllabic tone sandhi.
In this table we see extensive neutralization in the sandhi forms. There are now just eight different tonal pitch values (ten if you include the exceptions for speakers S1 and S3). There are four level tones: 5, 44, 33, 22 and four falling pitch tonal values: 42, 21, 53, 31. There are also two rises: 23, 34.
Out of the 50 available forms, the most common forms are level or falling: [33] (19 cells), [44] (10 cells) and [42] (11 cells).
Finally, it is worth noting that there is no clear relationship between the sandhi tone and the citation tone of the first syllable.
6.2.3 Comparison with previous data
This section compares my data with some of the available published data. In particular, I will compare the data with the works outlined in chapter 1, namely Chen & Norman (1965), Chan (1985) and Yip (1990). The sandhi tables employed by each of these authors are tables 2.1, 2.3 and 2.5 respectively. I will reproduce these here for convenience as tables 6.3, 6.4, 6.5.
Second syllable → | Tone 1 [55] | Tones 5, 7 [52, 5] | Tones 2, 3, 4, 6 [22, 12, 24, 342]
First syllable ↓ (resulting sandhi tones on first syllable given below)
Tone 1 [55], Tone 3 [12], Tone 6 [342], Tone 4 (*<h) [24]:   55  52
Tone 5 [51], Tone 7 [5]:   55  22
Tone 2 [22], Tone 4 (*<k) [24]:   22  35
Table 6.3. Fuzhou disyllabic tone sandhi forms (Chen & Norman 1965)
Second syllable → | Tone 1 [44] H | Tones 5, 7 [51, 5] HL | Tone 2 [32] L | Tones 3, 4, 6 [213, 13, 131] LH (L)
First syllable ↓ (resulting sandhi tones for first syllable given below)
Tone 1 [44]
H
Tone 3 [213]
LH
44
33 53
51 Tone 6
[131] L
HL
H
H
H
L
HL
Tone 4 (*<h)
[13] L
H
Tone 5 [51]
HL
33
22 Tone 7
[5] H
L
L
L
Tone 2 [32]
L@
22
13 44
Tone 4 (*<k)
[13] L
@
L
LH
L
H
Table 6.4. Fuzhou disyllabic tone sandhi forms (Chan 1985)
Second syllable → | Tones 1, 5, 7 [44, 52, 4] | Tone 2 [22] | Tones 3, 4, 6 [12, 13, 242]
First syllable ↓ (resulting sandhi tones for first syllable given below)
Tone 1 [44]
Tone 3 [12]
44
52
Tone 6 [242]
(13)
Tone 5 [52]
22
Tone 7 [4]
Tone 2 [22]
22
35
Tone 4 [13]
4
Table 6.5. Fuzhou disyllabic tone sandhi forms (Yip 1990)
It must first be recalled that the means of gathering the data was different for each of the sources. Chan based her tone sandhi data on the speech of a single informant and Yip bases her data on those presented in a couple of different sources. The Chen & Norman data were taken from Chan (1985), who did not mention the source of their data. There are clearly discrepancies between the published works: Chen & Norman’s data have only four different sandhi tones (22, 35, 52, 55), Yip shows four sandhi forms (22, 35, 44, 52) and Chan’s data have six different sandhi tones (22, 13, 33, 44, 53, 51), though after feature assignment she has only four contrasts. My data, as represented in table 6.2, have eight different sandhi tones (21, 31, 23, 33, 34, 42, 44, 53).
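The overlap between these inventories can be made explicit with set intersections. This is a minimal sketch over the pitch values listed in the paragraph above; the labels and the helper `shared` are illustrative names only.

```python
# Sandhi tone inventories as listed in the text for each source.
INVENTORIES = {
    "Chen & Norman 1965": {"22", "35", "52", "55"},
    "Yip 1990": {"22", "35", "44", "52"},
    "Chan 1985": {"22", "13", "33", "44", "53", "51"},
    "present study": {"21", "31", "23", "33", "34", "42", "44", "53"},
}

def shared(a, b):
    """Return the sandhi tones that two sources transcribe identically."""
    return sorted(INVENTORIES[a] & INVENTORIES[b])

print(shared("Chen & Norman 1965", "Yip 1990"))
print(shared("present study", "Chan 1985"))
```

Identical digits do not guarantee identical targets, and different digits (such as [22] and [21]) may reflect the same target transcribed differently, so the comparison is only a first approximation.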
In comparing the sandhi tones, general consensus may be found in the forms to which the first syllable changes when preceding a tone 1 ([44]); that is, all tones become a high level [44] or [55] before the high level, except tone 2 and one of the sandhi forms of tone 4, which become [22] or [21]. These exceptions ([22], [21]) could easily be the same target tones, transcribed slightly differently. However, apart from this, the data differ a lot, although there is much more general agreement between the different authors’ works than with mine. Some of these points are mentioned below.
In my data, before tones 5 and 7 all tones have the sandhi pitch value of [33]. The other authors give a range of pitch values for these sandhi tones ([22], [33], [44] and [55]), but crucially, there are at least two contrasting pitch values for these sandhi tones in these previous studies. Before a tone 2, my data evidence the forms [42, 33, 44, 23]. The other authors, however, have fewer and somewhat different changes, such as [52, 53, 35, 13, 22, or 33]. The greatest disparity between the previous descriptions and the data obtained for this work occurs in the sandhi forms preceding tones 3, 4 and 6. My data have many different changes, and no particular obvious natural classes to describe the changes. All of the other sources, however, group these tones together and show just a few changes. For example, Chan has the sandhi tones [51, 22, 44] as the three possible different sandhi tones occurring before a low pitch-onset tone (tone 3, 4 or 6), while my data show [42, 31, 33, 21, 44, 5, 34, 53] for the same sandhi tones.
It can be seen that the data I have obtained from my four speakers is more complex than the data from other sources mentioned here. This is likely the result of having data from more than one speaker, as there is clearly between-speaker variation. And despite the relative homogeneity between speakers, there are exceptions as well, not all of which were idiosyncratic. This is important as sandhi data is complicated and variation is a natural outcome given the different learning environments people have. This variation, however, should not be ignored, but rather explained by the model.
The complexity of the new data suggests that such an elegant phonological analysis (e.g. Yip’s; cf. chapter 2) is unlikely. That is, it is possible that the elegance of the phonological analysis is a function of the relative lack of complexities in the observation data (or perhaps phonetic abstraction), so it is interesting to see whether such an analysis will work with more complex data. I will explore this by analyzing the data within the same framework as Chan and Yip, that of Autosegmental Phonology (see also Zee & Maddieson 1979). However, I first will introduce some different approaches to the phonological representation of tones in order to more fully demonstrate the advantages of Autosegmental Phonology.
6.3 Some views on tonological representation
This section gives a brief overview of some ways of phonologically representing tone, specifically contour tones, defined by Madison (1977:337) as “a pitch glide which cannot be predicted naturally from factors such as co-articulation and intonation, … and cannot be generated by rule from the environment.”
In 1967, Wang proposed that tones are a property of the whole syllable, and that all tones should be considered as units. This gave rise to contour features such as [±rise] or [±fall]. However, despite the advantages this system may have had in terms of classifying contour tones, it could not properly account for all of the properties which have since been noted for tones, in particular tonal stability and the existence of floating tones. Tonal stability is when there is a change on the segmental level of a language (including segmental deletion), but the tones remain unaffected by the change. Floating tones are morphemes that consist of only tone, and show up because of the effect they may have on other tones in word derivations.
An example of tonal stability comes from a Bantu language, Lomongo. In this language we find instances where the segmentals are reduced or deleted, but the effects of the tones in the underlying forms remain. For example, ‘they search’ consists of three morphemes each with its own level tone indicated by H (high) or L (low): [ba H] + [as L] + [a L]. When these morphemes join together, the two medial [a]s simplify to one, but the effect of the two tones is apparent by a falling tone surfacing on the first syllable of the output [basa HL L]:
    [ba – as – a]   ⟶   [ba – sa]    (fall+low)
      H    L   L          HL    L
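The Lomongo example can be sketched by keeping tones in a list of their own while the segments are merged. This is a simplified illustration under stated assumptions: each morpheme carries one tone, only adjacent identical vowels merge, syllables are taken to be CV, and the function name `join_morphemes` is hypothetical.

```python
VOWELS = set("aeiou")

def join_morphemes(morphemes):
    """Join (segments, tone) morphemes into syllables, keeping every tone."""
    units = []  # [segment, tones]; consonants carry an empty tone list
    for segments, tone in morphemes:
        for seg in segments:
            if seg in VOWELS:
                if units and units[-1][0] == seg and units[-1][1]:
                    # adjacent identical vowels merge; the tone survives
                    units[-1][1].append(tone)
                else:
                    units.append([seg, [tone]])
            else:
                units.append([seg, []])
    syllables, onset = [], ""
    for seg, tones in units:
        if tones:  # a vowel closes the current CV syllable
            syllables.append((onset + seg, "".join(tones)))
            onset = ""
        else:
            onset += seg
    return syllables

print(join_morphemes([("ba", "H"), ("as", "L"), ("a", "L")]))
```

The merged vowel ends up with both H and L, which is the falling tone on the first syllable; nothing in the segmental merger touches the tone list.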
An example of a floating tone comes from the Chinese dialect Cantonese. In certain environments such as the “familiar vocative” (Yip 1990:65), the tone on the given syllable may change to a high rising tone, without any change in the segmentals (e.g. 陳 Chan (surname), low fall, becomes a high rise with a familiar vocative use). Another system of
features, proposed by Woo (1969), is also unable to account for these properties, yet unlike Wang, her proposal assumed contour tones to be sequences of level tones.
An important approach to phonology was Autosegmental Phonology, as originally proposed by Goldsmith (1976). The main aspect of this approach is that features can occur on tiers separate from the segments, allowing for assimilation of any feature. A more important consequence of this for tones, however, is that tone features can be viewed on separate tiers, and thus may be regarded as autosegmental. This accounts, then, for phenomena such as the aforementioned floating tones and tonal stability by enabling rules to act on the unit on any of these tiers without interfering with the other tiers. For instance, the Lomongo problem is solved by saying that while the segments are reduced at one level of representation, the tones are not affected by this process. Sequences of two identical tones are redundant and reduced by the Obligatory Contour Principle (OCP), which states that “at the melodic level, adjacent identical elements are prohibited” (McCarthy 1986:208). Indeed, tones can associate with one or more syllables according to some general principles to account for a lot of non-linear facts. In the Cantonese example, a floating H tone is posited as underlying the specification of ‘familiar vocative’, which is associated to the syllable when this specification is realized with the given syllable.
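The effect of the OCP on a tonal melody can be sketched as collapsing runs of identical adjacent tones. This is only an illustration of the principle as quoted above, applied to a bare melody string; the function name `apply_ocp` is hypothetical.

```python
from itertools import groupby

def apply_ocp(melody):
    """Collapse adjacent identical tones on the melodic tier into one."""
    return [tone for tone, _ in groupby(melody)]

print(apply_ocp("HLL"))
```

Only adjacent repeats are merged, so a melody such as H L H is left intact.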
An autosegmental framework is, then, considered a desirable framework for describing tonal languages. However, the features used in tonal description still have to be chosen. Goldsmith notes that features should be able to capture both the classificatory and component functions of tones; that is, features should be “a way of establishing what a ‘natural class’ in phonological statements will be, … [and] a way of specifying the several and simultaneous characteristics that comprise what is, from the point of view of the flow of time, a single articulatory or acoustic event” (1989:274-275). Hyman (1986) states that a feature system must be able to account for four contrasting tone heights, and that the features in the system must “capture the natural relationships that exist between tone heights within a language” (1986:110). Following from this, Hyman explains that tones that constitute a natural class must also share a feature, and furthermore, that the features must make it easier to explain these natural classes and natural rules, thus reflecting the markedness property of tones.
In her 1980 thesis, which was an extensive study on Chinese tonology, Yip basically follows an autosegmental approach, concluding that “only a theory in which tone is suprasegmental, and contour tones are represented by sequences of level tones, will satisfactorily account for the observed properties of tone” (1990:21; see also Yip 2002). Her system has already been outlined in chapter 2, though to recall, Yip employs a Register feature. In accord with universals that have been found for tones as stated by Hyman, this feature restricts the tonal inventory to four tone heights and a maximum of two of any given contour. However, Yip (1989) modifies her system in terms of the relationship between the two binary features proposed for tones: [upper], the Register feature, and [high] (now called [raised] after Pulleyblank (1986)) specifying the Tone or melody. She argues that contour tones in East Asian languages “show all the behavior predicted by a theory in which they are melodic units consisting of a root node [upper] dominating a branching specification for [raised]” (Yip 1989:171). The evidence comes from initial tone association, spreading, and OCP effects, which Yip shows affect tones as whole units. She also argues that by identifying [upper] as the tonal root node, the fact that no more than two of any given contour may contrast underlyingly is explained, as six is the logically possible number of contrasting contour tones in a system for which there are four tone levels specified. She distinguishes
between branching tones or single melodic units, and tonal clusters. These are best represented diagrammatically, as shown on a rising tone below:
          σ                        σ
          |                       / \
          °                      °   °        Tonal root level
         / \                     |   |
        L   H                    L   H
  Branching tone/           Tone cluster
  single melodic unit
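The counting argument behind the Register feature (four tone heights, six logically possible rises, but only two once [upper] is the root of the contour) can be sketched by enumeration. The encoding of features as strings and tuples is an assumption made for illustration.

```python
from itertools import combinations

REGISTERS = ("-upper", "+upper")
TONES = ("L", "H")  # [-raised], [+raised]

# Four tone heights: every Register value paired with every Tone value,
# ordered from lowest to highest.
levels = [(register, tone) for register in REGISTERS for tone in TONES]

# With no register root, any lower height followed by any higher one is a rise.
free_rises = list(combinations(levels, 2))

# With [upper] as the root node, both halves of a contour share one register,
# so there is one rise (L H) per register.
rooted_rises = [(register, ("L", "H")) for register in REGISTERS]

print(len(levels), len(free_rises), len(rooted_rises))
```

The same count holds for falls, which is why the system predicts at most two contrasting contours of any given shape.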
Yip also argues that the ‘extra-complex’ tones, concave or convex contours, which are the result of associating two melodic units, are usually simplified utterance-internally to either rising/falling or level tones. Two examples of these are taken from Suzhou, which has both a concave and a convex tone. These may be represented as follows:
          σ                        σ
         / \                      / \
        °   °                    °   °
       / \  |                   / \  |
      H   L H                  L   H L
The branching tone appears on the left-hand side of the complex tone, because phonological behavior shows this to be the correct grouping. For example, the concave tone /HLH/, when spread over two syllables, evidences a fall on the first syllable and a mid-high tone on the second, suggesting [HL.H]. Similarly, the convex tone /LHL/ evidences a rise on the first syllable and a low tone on the second, [LH.L] (Yip, 1989:155).
In the next two sections, I analyze the tonological system in Fuzhou in two ways. The first will be using an autosegmental approach. Another, more diachronically motivated analysis will be explored and then compared with the autosegmental analysis to see which may better account for the tonal alternations found in Fuzhou.
6.4 Fuzhou tonology: Autosegmental
In this section I explore the capability of Autosegmental Phonology to model the sandhi forms found in disyllabic expressions in Fuzhou. Although this has already been done by a few scholars, including Chan (1985) and Yip (1990), we observed in section 6.2.3 that the data from which they worked was quite different to that which I have obtained, and relatively simpler. Thus simple, elegant solutions were facilitated. I will explore the extent to which the
methods used to derive these solutions will be as elegant in the face of some greater complexities.
In accordance with the generative approach, I posit underlying forms and phonetically plausible rules to derive the correct surface forms. I employ two binary features to describe and account for the tones in the way which best captures what can be determined to be natural classes. I also believe it is important to be able to account for the individual and between-speaker variations which were found in the data, along with the other possible tonal variations (i.e. tone 4). Although the data is complex, no matter what can be explained in terms of diachrony, the fact remains that all four speakers produced nearly all the same forms for each of the sandhi tones. Thus the complexity is not an accident and must be accounted for in the synchronic phonology of Fuzhou. However, whether the changes are really the result of rule-governed behavior (and if so whether or not they are generative rules), or the result of memorized tone shapes or substitution, needs to be investigated further.
6.4.1 Tonal feature assignment
Following Yip, each of the tones is made up of two features: Register and Tone. My tonal feature assignment also follows Yip in that the [upper] feature is the tonal root node, and that the ‘extra complex’ tone, [231], is the result of associating two melodic units. For ease of exposition, I abbreviate the Tone feature [±high/raised] to simply H or L. As noted above, it is desirable that the tonal features capture natural classes that exist in the system. So I will primarily assign features on the grounds of the tone’s phonological behavior, and secondarily on its phonetic form. However, compared with the other published data on the tone sandhi (cf. section 6.3), there are not many natural classes in the data.
The most apparent natural class is that of tones 5 and 7, before which all tones become a [33]. This natural class is captured by both of these tones having the same feature assignment, [+upper, HL].
Tones 5 and 7
          σ
          |
      [+upper]
        /  \
       H    L
Tones 5 and 7 are then distinguished from each other by a rule that deletes the last branching Tone in a syllable in the presence of a final glottal stop:
          σ
          |
          °
         / \
        X   X → Ø / _____ʔ
This rule also applies to tone 4, in deriving the low-rising stopped tone from /LHL/.
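The deletion rule can be sketched as an operation on a list of melodic units. This is an illustrative reading, not the author's formalization: a tone is assumed to be a non-empty list of tuples, one tuple per melodic unit, and `final_tone_deletion` is a hypothetical name.

```python
def final_tone_deletion(melody, final_glottal_stop):
    """Delete the last Tone of a final branching unit before a glottal stop.

    melody: non-empty list of melodic units, each a tuple of Tone values,
    e.g. [("H", "L")] for tones 5 and 7 or [("L",), ("H", "L")] for tones 4 and 6.
    """
    if not final_glottal_stop:
        return melody
    *rest, last = melody
    if len(last) > 1:  # only a branching unit loses its final Tone
        last = last[:-1]
    return [*rest, last]

print(final_tone_deletion([("H", "L")], True))          # tone 7
print(final_tone_deletion([("L",), ("H", "L")], True))  # tone 4
```

Without a final glottal stop the melody is returned unchanged, which keeps tones 5 and 6 distinct from tones 7 and 4.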
Tone 6 is a convex tone in the lower half of both the F0 range and the pitch range. Thus it should have the features [–upper, L(HL)]. This is then, following Yip (1989), a combination of two melodic units the second of which is branching, structured as follows:
Tones 4 and 6
              σ
            /   \
     [–upper]   [–upper]
         |        /  \
         L       H    L
Tone 4, like tone 7, undergoes final Tone deletion in the presence of a final glottal stop, and has the same feature assignment as tone 6, [–upper, L(HL)].
The assignment of features to tone 1 is non-problematic, as it is basically level and high. Therefore it has to be in the upper register, and the Tone feature has to be H.
Tone 1
          σ
          |
      [+upper]
          |
          H
The mid level/fall tone 2 could feasibly be [+upper, L], [–upper, H], or [–upper, HL] on the basis of its pitch shape. Tones 2 and 5 have the same sandhi tones before tone 4, just as tones 2 and 4 have the same sandhi tones before tones 1 and 6. I suggest that it is the shared feature sequence of HL that is the common denominator in these cases, and will assign tone 2 the features [–upper, HL].
Tone 2
      [–upper]
        /  \
       H    L
Tone 3 has been given the features [–upper, L]. This is because if tone 2 is assigned the features [–upper, HL], then tone 3 must be featurally distinct in order to be distinguished from it.
Tone 3
          σ
          |
      [–upper]
          |
          L
To recap then, the tonal feature assignment is as shown in table 6.6:
Register | Pitch | Tone | Features
[+upper] | [44] | T1 | H
[+upper] | [51], [5] | T5, T7 | HL
[–upper] | [32] | T2 | HL
[–upper] | [21] | T3 | L
[–upper] | [23], [231] | T4, T6 | L(HL)
Table 6.6. Fuzhou tonal feature assignment
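The feature assignment of table 6.6 can be written as a mapping from tone number to a (Register, Tone) pair, from which tones with identical specifications fall out as groups. The string encoding and the names `FEATURES` and `shared_specifications` are illustrative assumptions.

```python
# Tone number -> (Register, Tone melody), following table 6.6.
FEATURES = {
    1: ("+upper", "H"),
    2: ("-upper", "HL"),
    3: ("-upper", "L"),
    4: ("-upper", "L(HL)"),
    5: ("+upper", "HL"),
    6: ("-upper", "L(HL)"),
    7: ("+upper", "HL"),
}

def shared_specifications(features):
    """Group tones with identical feature specifications (two or more members)."""
    groups = {}
    for tone, spec in features.items():
        groups.setdefault(spec, []).append(tone)
    return {spec: tones for spec, tones in groups.items() if len(tones) > 1}

print(shared_specifications(FEATURES))
```

The two groups returned are tones 5 and 7 and tones 4 and 6, the pairs that are told apart only by the final glottal stop.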
The sandhi table may now be modified to show the possible features of the sandhi tone in question as well as the pitch values. I have done this by putting the tone features of the upper register in upper case letters, and those of the lower register in lower case.
The same exceptions are noted:
# The second syllable tones 2 and 3 both change to a fall following this tone, i.e.
Tone 2 → [51] / tone 2 ____
Tone 3 → [31] / tone 2 ____
Second syllable →
T 5, 7 H
L
Tone 1 H
Tone 2
h l Tone 3
l Tone 6
lhl Tone 4
lhl First Syllable ↓↓
O
utput sandhi tones below
Tone 3 (h)l H
/__T5
? ?
Tone 1 H
LH H
L
?
Tone 6 lhl
hl Tone 7 H
L
h H
h
h
h
Tone 4 lhl
n/a n/a
*HL
n/a
Tone 5 HL
hl
h
Tone 2 hl
#
# H
Tone 4 lhl H
/__T7 hl /__T5
hl h
LH
4
HL
Table 6.7. Possible feature assignment to the sandhi tones.
Notes on the feature assignment for the sandhi tones:
i. [44] is [+upper, H], exactly the same as the citation tone 1.
ii. All sandhi tones [33] have been specified as [–upper, H], as has the sandhi tone [23]. Yip (2002) notes that mid level tones could equally be [+upper, L] or [–upper, H].
iii. The features [–upper, HL] have been given to tones with values of either [31] or [21].
iv. The tones with pitch values of [34] and [53] have been specified as [+upper LH] and [+upper HL], respectively.
v. In the bottom left of the table, it can be seen that tone 4 will become [+upper H] (=[44]), before a tone 5.
vi. Recall the individual exceptions, indicated on the table with S1 and S3:
  a. S1: T1 + T2 = H + hl ⟶ LH hl
  b. S3: T7 + T4 = HL + lhl ⟶ h lhl
  c. S3: T4 + T6 = lhl + lhl ⟶ H lhl only; no second option of HL lhl (*HL)
Next I present the rules needed to derive the surface forms from these underlying representations.
6.4.2 Rules and derivations
In this section I propose rules to derive surface forms. These rules will be organized into two parts; firstly the rules for deriving the correct Tone features, then those for deriving the correct Register. I propose only five rules to account for most of the Tone feature derivations, as well
as using the phonetic rules mentioned in section 6.2.2 for small differences in surface forms, like the offset of the fall, [42, 43]. The five rules are:
Tone Rule 1: Tone deletion.
All features on the tone are to be deleted before a tone with the upper Register feature.
      [±upper] (= TRN) → Ø / _______ [+upper]
Tone Rule 2: Sandhi H docking.
The floating H on tone 3 attaches to the TRN.
Tone Rule 3: Final L deletion.
If the most rightward branch of a TRN is an L ([–raised]), delete it when in non-final (sandhi) position.
         TRN              but       TRN
         / \                         |
        H   L → Ø                    L
Tone Rule 4: Complex tone simplification.
Should a tone have two TRNs, delete the most leftwards TRN when in non-final position.
              σ
             / \
       Ø ← °   °
           |  / \
           L H   L
Tone Rule 5: Tone association.
Left-to-right association of melody: associate the tone melody feature from the second syllable to the first.
        σ     σ
        |     |
        X     Y
One very general exception is that rule 5 is never applied to tone 2, that is, it never associates with the following syllable. There is also the application of the phonetic rule, as presented
above (most recently below
table 6.6) to account for small surface difference in the offset
point of the falling sandhi tone [43, 42].
While these rules w
ill correctly derive the surface Tone features, the situation is not as sim
ple for the Register feature. This is because there are necessarily both upper and low
er register features in the output sandhi tones, given that there are form
s such as [53] and [21]. H
owever, the predictability of w
hich feature will occur on the sandhi tone is not as easy as
taking the feature from either syllable as there are cases w
hen the feature may be that of the
first syllable, the second, both or neither (i.e. T2 + T1 has the [–upper] feature of the first syllable for the sandhi tone; T5 + T6 has the [–upper] feature of the second syllable for the sandhi tone; T6 + T4 both have the feature [–upper], as does the sandhi tone; and T3 + T2 both have the [–upper] feature underlyingly, but the sandhi tone surfaces w
ith the [+upper] feature). The m
ajor natural classes of context tones for sandhi are listed first, because they determ
ine the output value of the register feature of the sandhi tone as context:
A
ll sandhi tones →
[+
upper] before a tone 1
A
ll sandhi tones →
[–upper] before tones 5 and 7
However, for the remaining changes of values for the Register feature on the sandhi tone, generalities are most easily seen in terms of the input tones:
Tone 1 → [+upper]
Tone 2 → [+upper]
Tone 3 → [+upper]
Tone 4 → [+upper], except: T4 + T2 → [–upper]
Tone 5 → [–upper]
Tone 6 → [+upper], except: T6 + T4 → [–upper]
Tone 7 → [–upper], except: T7 + T3 → [+upper]
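The two steps just listed (context tones first, then input tones with their exceptions) can be sketched as a small lookup. This is an illustrative sketch only: the function and table names are hypothetical, and the ordering (context before input) is an assumption based on the statement that the context classes are listed first. The T2 + T1 combination, whose sandhi tone stays [–upper], is not modeled.

```python
# Register of the sandhi (first-syllable) tone, following the lists above.
# Names are illustrative; tones are numbered 1-7 as in the text.

INPUT_DEFAULT = {
    1: "+upper", 2: "+upper", 3: "+upper", 4: "+upper",
    5: "-upper", 6: "+upper", 7: "-upper",
}

# (input tone, context tone) pairs listed as exceptions.
EXCEPTIONS = {
    (4, 2): "-upper",
    (6, 4): "-upper",
    (7, 3): "+upper",
}

def sandhi_register(input_tone, context_tone):
    """Return the register value of the sandhi tone for input + context."""
    if context_tone == 1:            # all sandhi tones are [+upper] before tone 1
        return "+upper"
    if context_tone in (5, 7):       # all sandhi tones are [-upper] before tones 5 and 7
        return "-upper"
    if (input_tone, context_tone) in EXCEPTIONS:
        return EXCEPTIONS[(input_tone, context_tone)]
    return INPUT_DEFAULT[input_tone]

print(sandhi_register(5, 2))
print(sandhi_register(7, 3))
```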
The generality seems to be for the tone to be in the upper register, but this does not apply across the board. This is a crucial difference between Yip’s data set and the one presented here: it is not possible to claim that all sandhi tones have the Register feature [+upper] in sandhi position with these data. However, I will generalize the above rules to the following.
For [+upper] tones, there are clear generalizations:
• The sandhi tone in an expression containing a tone 1 as either input or context is [+upper].
• The sandhi tone in an expression containing tone 5 or tone 7 as either input or context is [–upper]. Exception: tone 7 + tone 3 yields a [+upper] sandhi tone.
The resulting Register values for the sandhi tones for all tones as input tones are listed below.
Register Rule 1: All tones → [+upper]
but, as mentioned above,
Register Rule 2: Before Tones 5, 7 → [–upper]
Three exceptions to these rules must be noted:
Register Rule 3: Tone 4 + Tone 2 → [–upper]
Register Rule 4: Tone 6 + Tone 4 → [–upper]
Register Rule 5: Tone 7 + Tone 3 → [+upper]
I illustrate the application of the relevant rules by giving six sample derivations before presenting the problems involved with this analysis.
1. Tone 3 + Tone 1 (退婚 [tuì hūn] ‘annulment’)
[21] + [44] ⟶ [44 44]
Input: first syllable [–upper] L; second syllable [+upper] H
Rule 1: Tone deletion. The tonal features of the first syllable are deleted; the second syllable keeps [+upper] H.
Rule 5: Tone association. The H of the second syllable associates to the first syllable.
Register Rule 1: all tones → [+upper]
Surface form: /H H/ [44 44]
Note: The exception is the combination of tone 2 and tone 1, where the resulting sandhi form of tone 2 is [21], not [44]. I propose that this is due to tone 2 failing to lose its features.
2. Tone 4 + Tone 3 (答應 [dā yìng] ‘promise’)
[23] + [21] ⟶ [42 21] (variant 1) or [34 21] (variant 2)
Variant 1
Input: first syllable L, H L; second syllable L
Rule 3: Final L deletion. First syllable: L, H (L → Ø); second syllable: L
Rule 4: Complex tone simplification. First syllable: H (leftmost L → Ø); second syllable: L
Rule 5: Tone association. First syllable: H L (L associated from the second syllable); second syllable: L
Register Rule 1: all tones → [+upper]
Surface form: /HL l/ [42 21]
Variant 2
Input: first syllable L, H L; second syllable L
Rule 3: Final L deletion. First syllable: L, H (L → Ø); second syllable: L
Rule 4: Complex tone simplification – does not apply here.
Rule 5: Tone association – n/a due to nonapplication of Rule 4
Register Rule 1: all tones → [+upper]
Surface form: /LH l/ [34 21]
3. Tone 4 + Tone 6 (不憤 ‘careless [Fuzhou]’)
[23] + [231] ⟶ [42 231] ⟶ [42 231]
Variant 1
Input: first syllable L, H L; second syllable L, H L
Rule 3: Final L deletion. First syllable: L, H (L → Ø); second syllable: L, H L
Rule 4: Complex tone simplification. First syllable: H (leftmost L → Ø); second syllable: L, H L
Rule 5: Tone association. First syllable: H L (L associated from the second syllable); second syllable: L, H L
Register Rule 1: all tones → [+upper]
Surface form: /HL lhl/ [42 231]
Variant 2
Input: first syllable L, H L; second syllable L, H L
Rule 3: Final L deletion. First syllable: L, H (L → Ø); second syllable: L, H L
Rule 4: Complex tone simplification. First syllable: H (leftmost L → Ø); second syllable: L, H L
Rule 5: Tone association – did not apply
Register Rule 1: all tones → [+upper]
Surface form: /HL lhl/ [42 231]
Note: S3 (LY)’s failure to produce the variant with the falling pitched sandhi tone suggests that she has generalized the absence of Rule 5 in this combination to all instances of tone 4 plus tone 6.
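4. Tone 5 + Tone 2 (蘋果 [píng guŏ] ‘apple’)
[52] + [32] ⟶ [33 32]
Input: first syllable H L; second syllable H L
Rule 3: Final L deletion. First syllable: H (L → Ø); second syllable: H L
Rule 5: Tone association. First syllable: H (shared with the H of the second syllable); second syllable: H L
Register Rule 2: tones 5, 7 → [–upper]
Surface form: /h hl/ [33 32]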
5. Tone 7 + Tone 4 (物質 [wù zhì] ‘material’)
[5] + [23] ⟶ [42 23]
Input: first syllable H L; second syllable L, H L
Rule 3: Final L deletion. First syllable: H (L → Ø); second syllable: L, H L
Rule 5: Tone association. First syllable: H L (L associated from the second syllable); second syllable: L, H L
Register Rule 2: tones 5, 7 → [–upper]
Surface form: /hl lhl/ [42 23]
Note: LY again failed to apply rule 5, thus she has a surface form of /h lhl/.
6. Tone 2 + Tone 3 (解放 [jiě fang] ‘emancipate’)
[32] + [21] ⟶ [44 21]
Input: first syllable H L; second syllable L
Rule 3: Final L deletion. First syllable: H (L → Ø); second syllable: L
Rule 5 does not apply when tone 2 is the input tone.
Register Rule 1: all tones → [+upper]
Surface form: /H hl/ [44 21]
Above I have given some sample derivations to illustrate the application of the rules. I shall now indicate which combinations cannot be correctly derived given this set of rules.
6.4.3 Explaining the variation
There are a few cases where the application of these rules deviates slightly. These are presented below, before discussing solutions to what seem to be problems.
Explanations of some of the sandhi forms:
i. The combination of tone 3 and tone 2, /(h)l + hl/, fails to undergo rule 3 (Final L deletion), resulting in HL.
ii. Tone 4 does not undergo rule 1 when preceding tones 1 and 5 (resulting in a sandhi form of hl). No other rules apply except the complex tone simplification. (There is another T4 + T5 that is derived following the rules as expected, with a sandhi form h.)
iii. Tone 4 + Tone 7 can result in the sandhi tone H or hl. hl is straightforwardly derived from the rules given. H can be explained as the spread of register with the tone feature in rule 5 (Tone association).
iv. The combination of tone 5 and tone 4 produces a /h/, rather than an expected /hl/. This may be explained in terms of rule 5 (Tone association) failing to apply. The combination of tone 7 and tone 6 also fails to undergo rule 5.
v. Tone 6, when preceding tone 2, fails to undergo rule 3 (Final L deletion), thus barring the application of rule 5, due to the tonal root node already having two features attached to it, resulting in HL.
vi. Tone 7 + Tone 4 is produced as H or HL before lhl. The H is produced straightforwardly from the rules. The HL results from not applying rule 3 (Final L deletion).
vii. Tone 1 + Tone 2 is the only really problematic case. H + hl results in the sandhi form HL, but LH for speaker 1. Both forms differ from the expected H sandhi form. This could be through associating the tonal node (before register) for the HL, thus copying both tonal features, and undergoing metathesis for S1.
Problems:
The forms on the second syllable in the combinations of tone 2 and either tone 2 or tone 3 appear to be a problem. To recall, these are the combinations for which the second syllable is not that of citation, but rather tone 2 becomes a [51], and tone 3, a [31]. I suggest that in the combination of tone 2 plus tone 2, having undergone normal rule application, the Register value for the sandhi tone spreads onto the second syllable, thus changing the citation tone to [+upper] register also, giving /H HL/. As for tone 2 and tone 3, whose output is [44 31], I suggest that normal rules apply, and that the rise from [21] to [31] on the second syllable tone 3 is phonetically conditioned. Alternatively, it could be suggested that the floating ‘H’ on tone 3, which normally docks in non-final position, underwent the docking rule in final position.
6.5 Fuzhou tonology: Categorical approach
It has been shown (e.g. Ballard 1980) for the Wu dialect group of Chinese, just north of Min, that many of the alternations in tone sandhi are not phonetically motivated and that ordered rules only serve to “complicate the description of the synchronic phenomena and obscure the diachronic relationships among the grammars of these dialects” (Ballard 1980:83). Ballard’s solution to this, which I have called the ‘categorical approach’, involves expressing the sandhi changes in terms of abstract diachronic categories.
As was mentioned in chapter 1, by the Sixth Century AD, Chinese already “exhibited four classes of morphemes (monosyllables) distinguished solely by tone and/or syllable final consonants: ping ‘level’, shang ‘rising’, qu ‘departing’, and ru ‘entering’ ” (Ballard 1980:84). The first three of these groups end in sonorant segments, and the fourth group ends in a voiceless stop. A split then occurred, whereby each tone developed two allotones, one higher (yin) and one lower (yang), the environment being determined by a voiceless or a voiced initial consonant respectively. Since then, many dialects have lost the initial voicing distinction, and the final stops, thus giving rise to tonal splits. As well as this, many mergers have subsequently occurred. Nonetheless, it has been found that tone sandhi operates in terms of these diachronic categories, defining natural classes as either input or as contexts for sandhi, despite the phonetic dissimilarity of their values in citation. An example of this is taken from the Wu dialect Wenzhou, which has two identifiable natural classes in disyllabic lexical tone sandhi: 1. {high fall, low level} and 2. {mid fall, mid level}. While phonetically these do not appear to be readily identifiable natural classes, in terms of diachronic categories these are the 1. Qu and 2. Ping tones (Rose 2004). Specifically, in his 2004 study, Rose showed that the Ping tones form a natural class on the second syllable of disyllabic words as conditioning environments for first syllable changes. The Qu tones undergo the same changes on the second syllable. The Ping tones include Ia and Ib, and the Qu tones are IIIa and IIIb; the phonetic forms of the tones in Wenzhou are given in table 6.8.
        Ping        Shang       Qu           Ru
Yin     Ia [33]     IIa [34]    IIIa [51]    IVa [3312]
Yang    Ib [331]    IIb [114]   IIIb [222]   IVb [2212]
Table 6.8. Middle Chinese tone categories / Wenzhou tonal contours
(1) Natural class for conditioning the form on the first syllable (Rose 2004:238)
Ia + Ia      feɪ tsz̩ʔ        [32 33]   ‘aeroplane’
Ib + Ia      nĩ tɕʰɐŋʔ       [32 33]   ‘young’
Ia + Ib      tʰi dɔ          [21 11]   ‘paradise’
Ib + Ib      beɪ dʑaʊ        [21 11]   ‘ball’
(2) Natural class undergoing same changes on the second syllable (Rose 2004:240)
Ia + IIIa    ˈtsʰɐŋ tsʰěʔ    [22 4]    ‘vegetable’
Ib + IIIa    ˈbeɪ tsʰěʔ      [11 4]    ‘temper’
Ia + IIIb    ˈsa dʊ̌ŋʔ        [22 4]    ‘cave’
Ib + IIIb    ˈdʑjo dø̌ʔ       [11 34]   ‘silks and satins’
The data show that the phonetically implausible grouping of [51] and [222] is a clear natural class in (2) and that the mid level and mid fall group together, to the exclusion of the other mid tones [34], [3312], as a natural class for determining sandhi tones.
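The point can be made concrete with a small lookup over the values in table 6.8. The sketch below is illustrative only (the dictionary and function names are hypothetical): grouping the Wenzhou tones by their diachronic category yields exactly the classes that pattern together in sandhi, even though the contours in each class are phonetically dissimilar.

```python
# Wenzhou tones from table 6.8: label -> (diachronic category, citation contour).
WENZHOU_TONES = {
    "Ia": ("Ping", "33"),    "Ib": ("Ping", "331"),
    "IIa": ("Shang", "34"),  "IIb": ("Shang", "114"),
    "IIIa": ("Qu", "51"),    "IIIb": ("Qu", "222"),
    "IVa": ("Ru", "3312"),   "IVb": ("Ru", "2212"),
}

def contours_in(category):
    """Return the set of citation contours belonging to one diachronic category."""
    return {contour for cat, contour in WENZHOU_TONES.values() if cat == category}

print(sorted(contours_in("Qu")))    # high fall and low level
print(sorted(contours_in("Ping")))  # mid level and mid fall
```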
In his work on the Wu dialect Shaoxing, Ballard (1980:110) captures some key (and phonetically opaque) phonological behavior through the use of categories. Ballard (1980:143) takes this to be an indication of the psychological reality of tonal categories in the mind of the native speaker. He proposes writing the rules first in terms of categories, not phonetic values. This is reinforced by the fact that the sandhi does not make phonetic sense, although conversely, it can be seen that some of the simple categorical shifts have been “perturbed by phonetic shifts towards phonetic naturalness and plausibility” (1980:144). Ballard thus proposes that tone sandhi phenomena, for Wu dialects at least, are most elegantly described in terms of two types of phenomena: categorical shifts and realization rules, a finding corroborated by Rose (2004) for Wenzhou.
I apply this approach to the Fuzhou data, to see to what extent, if at all, Fuzhou tone sandhi operates in terms of categories, and whether a simpler, more elegant account of the tone sandhi can be achieved, as it can for Wu. This follows in the next section.
6.5.1 Categorically Fuzhou
This section presents an analysis of Fuzhou tone sandhi using the categories as input and context for the rules. I will reproduce the sandhi table, but this time using the categories. Firstly, though, I will show to which categories the different tones belong.
        Ping           Shang          Qu              Ru
Yin     Tone 1 [44]    Tone 2 [32]    Tone 3 [21]     Tone 4 [23ʔ]
Yang    Tone 5 [51]                   Tone 6 [231]    Tone 7 [5ʔ]
Table 6.9. Fuzhou tones and traditional MC categories
Note that the Yangshang tone did split, and that the resulting Yangshang morphemes with obstruent initials merged tonally with Yangqu, and the Yangshang morphemes with initial sonorants merged again with Yinshang (Norman 1988:240). The sandhi table can now be illustrated thus:
Second syllable →    Ping             Shang      Qu                Ru
                     Yin     Yang                Yin       Yang    Yin     Yang
First syllable ↓     Resulting sandhi tones (on first syllable) given below
Ping    Yin          44      33       43         42        42      ?       3
        Yang         44      33       33         21        21      33      3
Shang                21      33       44 #       44/33 #   44      33      3/2
Qu      Yin          44      44       43         42        ?       ?       3
        Yang         44      33       43         42        42      31      3
Ru      Yin          44/21   33       23         43/34     42/44   53      3/4
        Yang         44      33       3          42        33      31      3
Table 6.10. Fuzhou tone sandhi organized in terms of MC tonal categories.
# The second syllable tones 2 and 3 both change to a fall following this tone.
Shang → Yang Ping / Shang ____
Yin Qu → [31] / Shang ____
The tone sandhi rules will be expressed below, in a list. They are not ordered rules. I have used the categories for both the inputs and conditioning contexts, but I have put the output sandhi tones in rather broad terms such as ‘level’ and ‘fall’, which are further specified at the end of the rules. I will speculate on the possible tone category shifts, that is, to which of the citation tones the output sandhi tones may be related.
First syllable changes:
1  Ping      → level / _____ Ping, Shang, Ru
             → fall  / _____ Qu
2  Shang     → fall  / _____ Yin Ping
             → level / _____ Yang Ping, Shang, Qu, Ru
3  Qu        → level / _____ Ping, Yang Ru
             → fall  / _____ Shang, Qu, Yin Ru
4  Ru        → level / _____ Ping, Yang Ru
             → fall  / _____ Qu, Yin Ru
5  Yin Ru    → rise  / _____ Shang
6  Yang Ru   → level / _____ Shang
Second syllable changes:
7  Shang     → fall / Shang _____
8  Yin Qu    → fall / Shang _____
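Because the first-syllable rules are unordered, they can be checked as a simple table of (input categories, context categories, contour) triples. The following is a minimal sketch under that reading; the category strings, the set names, and the function name are assumptions made for illustration, and the pitch-height details are left out.

```python
# Broad first-syllable contours (rules 1-6), encoded as unordered triples.
PING = {"Yin Ping", "Yang Ping"}
SHANG = {"Shang"}
QU = {"Yin Qu", "Yang Qu"}
RU = {"Yin Ru", "Yang Ru"}
ALL_CATEGORIES = PING | SHANG | QU | RU

RULES = [
    (PING, PING | SHANG | RU, "level"),               # rule 1
    (PING, QU, "fall"),
    (SHANG, {"Yin Ping"}, "fall"),                    # rule 2
    (SHANG, {"Yang Ping"} | SHANG | QU | RU, "level"),
    (QU, PING | {"Yang Ru"}, "level"),                # rule 3
    (QU, SHANG | QU | {"Yin Ru"}, "fall"),
    (RU, PING | {"Yang Ru"}, "level"),                # rule 4
    (RU, QU | {"Yin Ru"}, "fall"),
    ({"Yin Ru"}, SHANG, "rise"),                      # rule 5
    ({"Yang Ru"}, SHANG, "level"),                    # rule 6
]

def first_syllable_contour(input_cat, context_cat):
    """Return the broad sandhi contour; fail if the rules are silent or conflict."""
    outcomes = {contour for inputs, contexts, contour in RULES
                if input_cat in inputs and context_cat in contexts}
    if len(outcomes) != 1:
        raise ValueError(f"no unique contour for {input_cat} + {context_cat}")
    return outcomes.pop()

print(first_syllable_contour("Yin Ru", "Shang"))
print(first_syllable_contour("Yang Ping", "Yin Qu"))
```

Encoding the rules this way makes it easy to confirm that every combination of input and context category receives exactly one broad contour, which is what an unordered rule list requires.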
To derive the correct place in the pitch range for the different sandhi tones, the following rules must apply.
First syllable:
1. When Ping is the context, its Yin/Yang distinction determines the relative pitch height, except for the Yin Qu, which is always high.
2. When Qu is the context, it is the Yin/Yang distinction of the input tone which determines the relative pitch height for the Ping tones.
3. When Ru is the context, the only difference in pitch height realization is when they are following the other Ru tones, in which case it is the Yin/Yang distinction of the input tones which determines the relative pitch height.
4. When Shang and Qu are the context for the input Shang tone, the values are relatively higher than when Ping and Ru are the context. As a context for Shang, Yin Ping results in the lowest value of the output sandhi tone.
Second syllable:
1. When Shang provides the context for the second syllable to change, the Shang will have a much higher sandhi tone than the Qu.
These are the rules that can produce the different sandhi tones, as seen in table 6.10. Next I will speculate on the possible citation tones which could replace the very general terms of ‘level’ and ‘fall’.
That Yin Ping is substituted for the higher level tones is non-problematic. The lower level tone [33] could arguably be identified with the mid tone, Shang. The falls ([42, 43]) must be equated with the Yang Ping tone, both on the basis of its phonetic similarity and because it is the only falling tone in the citation tone system. The rise [23] should probably be likened to the Yin Ru tone, as this is the only rising tone, and as this is the sandhi form of the Yin Ru, it is very likely that the tone didn’t change at all, other than losing its final stop. I will now present these rules rewritten in terms of categorical shifts below.
First syllable changes:
1  All tones                  → Yin Ping   / _____ Yin Ping
                              → Shang      / _____ Yang Ping, Yang Ru
2  Yang Ping                  → Yin Qu     / _____ Qu
                              → Shang      / _____ Shang, Ru
3  Shang                      → Shang      / _____ Yin Qu, Ru
                              → Yin Ping   / _____ Yang Qu, Shang
4  Yin Ping, Qu, Yin Ru       → Yang Ping  / _____ Qu, Shang
5  Ru                         → Yang Ping  / _____ Yin Qu
6  Yang Ru                    → Shang      / _____ Shang, Yang Qu
                              → Yin Qu     / _____ Yin Ru
7  Yin Ru                     → Yang Ping  / _____ Yin Ru
                              → Yin Ru     / _____ Shang
Second syllable changes:
8  Shang                      → Yang Ping  / Shang _____
The realization of these tones can be read from the sandhi table, with just a few more specific rules, as follows:
Yang Ping → [51] / X _____
          → [42] / _____ X
[2] → [3] / 4 __# 3 X
Yin Qu → [31] / [44] _____
While this generates the correct forms, it is not otherwise very revealing. In the following section I compare the two analyses of Fuzhou tonology to see which of the two approaches better accounts for the data.
6.6 A comparison of the approaches
It is hard to compare these two approaches as they are so different in their aims and consequently in their method. However, I shall assess them on what is a primary criterion for a phonological analysis, and that is simplicity. Natural classes as they are conceived of by Autosegmentalists are of great importance to the phonology. However, ‘natural class’ has a different significance in a categorical analysis, as the natural classes are presumed to correspond to the tonal categories, which form the basis of the analysis. Nonetheless, the natural classes that are in my data, and how well the phonologies capture these, will also be considered in the comparison.
One really positive aspect of the autosegmental approach is that it can capture the natural class consisting of tones 5 and 7, as they group as context for the sandhi, and for Register features in sandhi. This is nicely accounted for in terms of them sharing the same features. The categorical approach cannot do this.
In terms of simplicity, neither is really good. Both the autosegmental analysis and the categorical analysis have a lot of rules. The former, however, captures more generalities and the major natural classes in the data. This is done by being able to explain the natural classes in terms of shared features and having five rules for each of the tonal features, which apply to all of the underlying representations before they surface. This analysis can also explain the few between-speaker differences, still in terms of the application or not of these rules. The explanations of these differences are treated as further rules. The main drawback of this analysis is that the Register feature cannot be analyzed more neatly.
While not having any “further rules”, the categorical approach has a longer list of different rules. This approach, however, shows the neutralization clearly. For example, when all tones become a [44] before another [44], this can be stated as all tones becoming a tone 1 before another tone 1. Whilst this is more appealing to our idea of what seems to be going on, the difference is in the level of abstraction from the observed data: the generative model posits changes to underlying forms to account for the surface forms, while the categorical model simply replaces tones.
It is certainly appealing to be able to generalize that all tones do something, as this is what we can easily see on the surface. In other circumstances (i.e. when working in a different framework which aims to represent the competence, not just the performance, of the speaker) it may be better to say that there are ‘X’ number of rules which are used to derive the sandhi tones, the generalization being that they all apply where they can. Which analysis is preferable, then, depends on the aims of the description and thus lies with the goals of the researcher (and thus the choice of the theory).
6.7 Summary and conclusion
In this chapter, I presented the data for the disyllabic tone sandhi in Fuzhou. It was found that, contrary to previous studies, the data obtained for the sandhi forms were very complicated, in that there are more natural classes, which are not phonetically motivated. As all four speakers produced very nearly the same sandhi forms, however, we can be sure that this set of alternations is real, at least for these words. These data were then analyzed in terms of two different frameworks: a generative-autosegmental approach and a categorical approach. While both can be manipulated to capture the data, neither of these theoretical frameworks produced very neat or concise accounts of the data.
The lack of phoneticity suggests that the sandhi is possibly not rule-governed. Work with native speakers testing whether or not the sandhi is actually rule-governed, perhaps along the lines of Hsieh (1970), would have to be done to determine this. This, then, addresses a very important question, as it is now generally assumed that tone sandhi is a process. The categorical analysis is a more ‘static’ account. That is, it explains things in terms of substitution, not in terms of dynamic processes of change, be they with features or phonetic values. However, it could just be that there are surface phonetic constraints on tones in prepausal position in Fuzhou not at all related to the underlying forms.
Before any further work can be done on theories of Fuzhou tone sandhi, it is imperative that instrumental data be obtained. Such data could make initial grouping a lot easier. It could determine whether or not what has been deemed a 31 is actually significantly different from a 21, or a 22 from a 23, a 34 from a 44 and so on. Rose (1990) has demonstrated the usefulness of this approach. So, while the importance of instrumental phonetics in phonology is often seen to be in the assessment of competing theories (e.g. Ohala 1986; Ladefoged 1990; and other papers in Beckman & Kingston 1990), it can now be seen a fortiori that the tonologist’s observation data would preferably be instrumentally verified in the first place.
7 Summary
This thesis has investigated the phonetics and phonology of tones in Fuzhou, as they occur in citation and in disyllabic utterances. It was motivated by the fact that although there have been a number of analyses of Fuzhou phonology, no previous descriptions of the phonetics of Fuzhou tones existed.
For the citation tones, I quantified the acoustic dimensions of fundamental frequency, duration, and amplitude. From this I derived the acoustics of tones of the variety of Fuzhou as a whole, as opposed to previous auditory descriptions, or descriptions of an individual speaker. These results showed significantly more variation than previously described, not just within the variety (across all speakers), but also consistent differences between speakers. There were also more sandhi forms than previously reported, and the actual phonetics of the data differed from that in previous studies, so that some of the previous feature assignments and phonological generalizations cannot be applied to these data (such as Tone 2 being [+upper] or all sandhi tones being [+upper]).
The relationship between Ar and F0 was also investigated to infer the relative importance of the physiological features of vocal cord tension and subglottal pressure in the production of tones. In addition to vocal cord tension, subglottal pressure was found to have a possible role in tone production in Fuzhou.
In chapter 6, I presented the auditory data for the tone sandhi of disyllabic expressions, and presented and discussed two possible analyses of Fuzhou phonology. The data were found to differ in several important respects from published data, with more complexities than any of the systems of Fuzhou tone sandhi previously described. There were more sandhi forms and more natural classes that could not be phonetically motivated. In fact, there was little phoneticity obvious in the data. While it would obviously be an advantage to have quantified data as the observation data for a phonological analysis, the results I obtained were consistently the same for all four speakers. This allowed us to infer that there is little or no between-speaker difference and that the data presented are really representative of the actual system of tone sandhi. This highlights the need for a multi-speaker approach, not just to unearth the variation in sandhi forms, but also to corroborate the data. It certainly suggests that the notion of an exceptionless sandhi system for a given variety is rather idealistic.
The present data may now be used for future research, be it a tonetic comparison with other tonal languages, input data for universals on tonology, or input data for tonological theories.
Suggestions for further study
There are many ideas for further study arising from the present investigations. I shall list a few of these.
Phonetics of the citation tones
Now that we have the normalized acoustics, an investigation into the perception of the tones needs to be done to try to ascertain which are the most perceptually important acoustic cues, including perceptual features other than pitch, such as the phonation type and changes in vowel quality. From the point of view of tonal production, it would be interesting to know what the physiological correlates are, but this will definitely entail a modification of the Monsen et al. model to incorporate the third dimension of glottal aperture due to the changing phonation types (cf. Chapter 5).
Phonetics of the disyllabic tone sandhi
Having seen the complex system of tone sandhi in Fuzhou, it is even more pressing to quantify the disyllabic expressions in the same way the citation tones were quantified. As noted earlier, Zhang writes that “the field needs carefully designed acoustic studies that systematically look at the realizations of tones in tone sandhi behavior” (Zhang 2009:11–12). Slight falls and slight rises would no longer be mere impressions, and any phoneticity, if present, would be more obvious. It is also a good start to determining whether the sandhi tones similar in form to the citation tones are in fact the ‘same’ tones, at least phonetically.
Psychological reality of tone sandhi
It must be determined whether or not the sandhi changes are rule-governed, and if so, whether or not they are generative rules. If the sandhi is not rule-governed, the question still remains of how to account for the changes. Is it purely substitution, or suppletion? That is, are the forms for which the citation tones are substituted already existing in the tonal inventory, or are they specifically only for that context (and thus individually learned)? There are many methods that could be explored to test the psychological reality of the nature of the tone sandhi: word games, neologisms, and observing natural speech errors and the incorporation of loan words. The acquisition of Fuzhou tone sandhi is certainly a topic that deserves future attention, and perhaps one that would shed light on the question of whether the sandhi is rule-governed or simply learned stipulations.
Finally, I think it is important to examine the perceptual salience of these phonetic cues, to see if tonology can be better motivated by using these (e.g. Donohue 2012).
References
Anderson, Stephen R. 1978. Tone features. In Victoria Fromkin (ed.) Tone: A linguistic survey, pp. 133–175. New York: Academic Press.
Ballard, William L. 1980. On some aspects of Wu tone sandhi. Ajia Afurika Gengo Bunka Kenkyū (アジア・アフリカ言語文化研究) [Journal of Asian and African Studies] 19: 83–163.
Beckman, Mary E. and John Kingston (eds) 1990. Papers in Laboratory Phonology I: Between the grammar and physics of speech. Cambridge: Cambridge University Press.
Běijīng dàxué zhōngguóyǔyánwénxuéxì yǔyánxué jiàoyánshì (北京大學中國語言文學系語言學教研室) [Peking University Chinese Department]. 1962. Hànyǔ Fāngyīn Zìhuì (漢語方音字匯) [A Chinese dialect syllabary]. Běijīng: Wénzì gǎigé chūbǎn shè (北京:文字改革出版社).
Chambers, J. K. and Peter Trudgill. 1980. Dialectology. Cambridge: Cambridge University Press.
Chan, Lee Lee L. 1998. Fuzhou tone sandhi. PhD Dissertation, UCSD.
Chan, Marjorie K. M. 1985. Fuzhou Phonology: A non-linear analysis of tone and stress. PhD Dissertation: University of Washington.
Chao, Yuen-Ren. 1930. A system of tone letters. Le Maître Phonétique 45: 24–27.
Chao, Yuen-Ren. 1934. On the non-uniqueness of phonemic solutions of phonetic systems. Bulletin of the Institute of History and Philology, Academia Sinica 4: 363–97. Reproduced in Martin Joos (ed.) 1966. Readings in Linguistics I, pp. 38–55. Chicago: University of Chicago Press.
Chen, Leo and Jerry Norman. 1965. An introduction to the Foochow dialect. San Francisco State College in cooperation with the US Office of Education.
Chen, Matthew. 2000. Tone sandhi: patterns across Chinese dialects. Cambridge Studies in Linguistics 92. Cambridge: Cambridge University Press.
Coster, D.C. and Paul Kratochvil. 1984. Tone and stress discrimination in normal Beijing dialect speech. Beverly Hong (ed.), New papers on Chinese language use, 119–132. Contemporary China Centre, ANU.
Disner, Sandra Ferrari. 1980. Evaluation of vowel normalisation procedures. Journal of the Acoustical Society of America 67 (1): 253–261.
Donohue, Cathryn. 1991. Fuzhou tones. ms, Australian National University.
Donohue, Cathryn. 1992a. The phonetics and phonology of Fuzhou tones. Honours thesis, Australian National University.