Dept. for Speech, Music and Hearing
Quarterly Progress and Status Report

Voice source dynamics in connected speech

Gobl, C.

journal: STL-QPSR, volume: 29, number: 1, year: 1988, pages: 123-159

http://www.speech.kth.se/qpsr


STL-QPSR 1/1988

II. SPEECH PRODUCTION

A. VOICE SOURCE DYNAMICS IN CONNECTED SPEECH

Christer Gobl*

Abstract

Dynamic variations of the voice source in connected speech were studied by means of inverse filtering and waveform parameterization. The speech materials consisted of a few utterances spoken by three adult males and one 10-year-old male. A four-parameter source model (the LF model) was utilized to describe the temporal changes of the glottal pulse shape. The properties of this model are briefly explained. Some problems associated with inverse filtering for deriving the volume velocity through the glottis are also discussed.

The results of the analysis provide a general idea of the range of pulse shape variation in normal speech. Different stress environments show major effects on the glottal excitation. Significant changes in the glottal pulse shape are typically found at the onset and, particularly, at the termination of the voice source, and also at many of the boundaries between vowels and consonants. Strong interdependencies are often found between the different parameters of the model, and substantial information about the pulse shape can be inferred directly from the amplitude of the speech waveform. Resynthesized utterances show that the improved source model is more important for the child's voice than for the voices of the adult males. Typical parameter values for different contexts are tabulated, and suggestions for developing a preliminary source rule system are presented.

1. Introduction

Most speech synthesizers are based upon the source-filter theory of speech production, in which the source is assumed to be linearly separable from the vocal tract filter. The conventional source of voiced sounds in a formant synthesizer is a sequence of constant-shape pulses with a spectral slope of -6 dB per octave, controlled in terms of fundamental frequency and amplitude only. Accordingly, much more attention has been paid to filter than to source functions. Systems for speech synthesis by rule, such as the one developed at KTH by R. Carlson and B. Granström (see Carlson, Granström, & Hunnicutt, 1981, and references therein), have advanced steadily through improved phonetic and prosodic rules. However, the sound-generating system (the synthesizer) is basically the same, which means that the conventional source is still used.

* Swedish Telecom Administration (Televerket), Technology Dept., Section for Research and Development, S-123 86 Farsta, Sweden. Graduate student at KTH.


The demand for improved quality and naturalness of synthetic speech is growing with current expectations of a more extended use of text-to-speech systems in information services and handicap aids. Studies of the human voice source have therefore attracted renewed interest. The present experimental study supplements the considerable amount of theoretical work that has been pursued by others in this field.

In order to convincingly synthesize women's and children's voices and different types of phonation such as breathiness, tense (pressed) voice, loud voice, shouting, etc., it is important to gain a more profound insight into the properties and dynamics of the human voice source.

There are (at least) two different approaches to the problem of improving the voice source. One is to reject the linear source-filter concept of speech production, with its inherent limitations, in favour of an interactive synthesis. (For basic theory and description of interaction, see, for instance, Ananthapadmanabha & Fant, 1982; Fant & Lin, 1987; and references therein. A brief summary of interaction phenomena can be found in Klatt & Klatt, forthcoming.) However, a fully interactive synthesis would be very complex, especially if a complete articulatory coding were adopted. We still lack strategies and rules for controlling articulators rather than formants, so a completely new system of rules would have to be developed.

The perceptual importance of source-filter interaction is as yet largely unknown (see, however, Nord, Ananthapadmanabha, & Fant, 1984). Even so, I find it reasonable to believe that the types of source-filter interaction that cannot be simulated within the formant synthesizer structure are not likely to be all that important. HiFi-synthesis (using a non-interactive formant synthesizer), carried out at the Dept. of Speech Communication and Music Acoustics and elsewhere, has shown that it is possible to achieve very close imitations of natural utterances, perceptually almost indistinguishable from the originals (Holmes, 1982; Fant, Gobl, Karlsson, & Lin, 1987a; Klatt, 1987; Klatt & Klatt, forthcoming).

The approach outlined in the present paper is to retain and optimize the source function within the linear source-filter theory. It is important to utilize a parameterized model that is capable of preserving the main shape (disregarding any kind of ripple components due to source-filter interaction) of almost any glottal pulse that is likely to occur. Further, one must study how the glottal pulse shape varies in a linguistic context. Several different voice source models (e.g., Ananthapadmanabha, 1984; Fant, 1979a; 1979b; 1982b; Fant, Liljencrants, & Lin, 1985a; Hedelin, 1984; Klatt & Klatt, forthcoming; Ljungqvist & Fujisaki, 1985a; Rosenberg, 1971; Rothenberg, Carlson, Granström, & Lindqvist-Gauffin, 1974) and numerous methods of analysing the glottal airflow (Ananthapadmanabha, 1984; Cranen & Boves, 1985; Fant & Sonesson, 1962; Hunt, 1987; Lindqvist-Gauffin, 1965; 1970; Ljungqvist, 1986; Ljungqvist & Fujisaki, 1985b; Rothenberg, 1973; Sondhi, 1975) have been described in the literature. However, these models and methods have essentially been utilized for studying the voice source under static conditions such as steady-state vowels (Gauffin & Sundberg, 1980; Holmberg, Hillman, & Perkell, 1987; ~ktony, 1965; Sundberg & Gauffin, 1979, etc.). Relatively little attention has been paid to the dynamic properties of the voice source in connected speech (see, however, Ananthapadmanabha, 1984; Fant, 1980; Gobl, 1985). It is the context-dependent temporal variations of the voice source that are the centre of interest in the present paper. The ultimate objective is a set of rules for the variation of the source model parameters according to features such as intonation, stress patterns, onset and termination of utterances, phonetic differences, etc. These rules could then be incorporated in text-to-speech synthesis.

2. The Voice Source Model

The model that was utilized for simulating the volume velocity flow is a four-parameter model, named the LF model. Its properties have been fully described in "A four-parameter model of glottal flow" (Fant et al., 1985a). The four parameters are used to model the differentiated flow rather than the real glottal flow. Differentiating the source function emphasizes the higher frequencies of the source spectrum (+6 dB/octave) and therefore allows a more precise spectral match in this frequency region. The differentiated flow is commonly used in speech synthesis, and includes the effect of radiation at the lips.
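A quick numerical check of the +6 dB/octave figure, offered as an illustration only: the sketch assumes the discrete first difference y[n] = x[n] - x[n-1] as a stand-in for differentiation (the text does not say how the differentiation is realized), and the function name `octave_step_db` is hypothetical. The first difference has the magnitude response 2 sin(πf/fs), so each doubling of frequency adds 20 log10(2 cos(πf/fs)) dB, which is close to 20 log10 2 ≈ 6.02 dB well below the Nyquist frequency.

```python
import math


def octave_step_db(f_hz: float, fs_hz: float) -> float:
    """Gain increase (dB) of the first difference y[n] = x[n] - x[n-1] from f to 2f.

    |H(f)| = 2*sin(pi*f/fs), so doubling f multiplies the gain by 2*cos(pi*f/fs),
    which is close to 2 (i.e. +6.02 dB) well below fs/2.
    """
    def gain_db(f: float) -> float:
        return 20.0 * math.log10(2.0 * math.sin(math.pi * f / fs_hz))

    return gain_db(2.0 * f_hz) - gain_db(f_hz)


fs = 16000.0  # sampling frequency of the speech data (Section 3)
for f in (125.0, 250.0, 500.0, 1000.0):
    print(f"{f:6.0f} -> {2 * f:6.0f} Hz: {octave_step_db(f, fs):+.2f} dB")
```

At 16 kHz the step is about 6.02 dB from 125 to 250 Hz and shrinks to about 5.85 dB from 1 to 2 kHz, where the discrete difference starts to depart from an ideal differentiator.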

The model consists of two parts (cf. Fig. 1). The first part is an exponentially growing sinusoid to which three of the four parameters of the model pertain. This segment models the flow from glottal opening until the main excitation occurs (the moment of maximum discontinuity in the glottal airflow function, which normally coincides with the moment of maximum negative flow derivative). As opposed to most other models of glottal flow, the LF model is a continuous function until the main excitation, and therefore does not introduce additional excitations. In comparison, the Fant model (Fant, 1979a; 1979b) is composed of two different segments: a rising branch up to maximum flow and a falling branch down to complete closure. The discontinuity between the two segments introduces a secondary weak excitation at the flow peak.

[Figure 1: LF model waveform; time axis 0 to 8 ms.]

Segment 1: $$E(t) = E_0\, e^{\alpha t} \sin \omega_g t, \qquad t_0 \le t \le t_e$$

Segment 2: $$E(t) = -\frac{E_e}{\varepsilon t_a}\left[e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right], \qquad t_e \le t \le t_c$$

Fig. 1. The LF model. A four-parameter voice source model.

The three parameters pertaining to the first segment of the LF model are:

(1) E0, which is merely a scale factor.

(2) α = -πB, where B is the "negative bandwidth" of the exponentially growing amplitude.

(3) ωg = 2πFg, where Fg = 1/(2tp) and tp is the rise time (the time from glottal opening to maximum flow).
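To make the three first-segment parameters concrete, the following sketch evaluates Segment 1 for hypothetical values (E0 = 1, B = -100 Hz, tp = 3 ms; none of these come from the analysed data). It applies α = -πB and ωg = 2πFg with Fg = 1/(2tp), so ωg = π/tp and the sine term completes its positive half-cycle exactly at tp, where the flow derivative crosses zero and the flow reaches its maximum. The function name `lf_segment1` is hypothetical.

```python
import math


def lf_segment1(t: float, e0: float, b_hz: float, tp_s: float) -> float:
    """First LF segment E(t) = E0 * exp(alpha*t) * sin(omega_g*t), with t0 = 0 and t <= te."""
    alpha = -math.pi * b_hz       # alpha = -pi*B; a negative B gives a growing envelope
    fg = 1.0 / (2.0 * tp_s)       # Fg = 1/(2 tp)
    omega_g = 2.0 * math.pi * fg  # omega_g = 2*pi*Fg = pi/tp
    return e0 * math.exp(alpha * t) * math.sin(omega_g * t)


# Hypothetical values for illustration only: E0 = 1, B = -100 Hz, tp = 3 ms.
tp = 0.003
for t in (0.5 * tp, tp, 1.3 * tp):
    print(f"t = {t * 1000:.2f} ms: E(t) = {lf_segment1(t, 1.0, -100.0, tp):+.4f}")
```

Whatever the value of B, E(t) is positive before tp, zero at tp, and negative just after it; B only shapes the envelope of the pulse.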


The second part of the model is an exponential segment that allows a residual flow (dynamic leakage) after the main discontinuity at time te, when the vocal folds close. The segment used for this "return phase" is

$$E(t) = -\frac{E_e}{\varepsilon t_a}\left[e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right], \qquad t_e \le t \le t_c,$$

where ta is the fourth parameter of the model. ta is the time constant of the exponential curve and is determined by the projection on the time axis of the derivative at time te.

ε can be determined iteratively from

$$\varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)},$$

and for small values of ta, ε is approximately equal to 1/ta. Ee is the negative amplitude of the excitation spike, and tc is the moment when complete closure is reached.
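The implicit equation εta = 1 - e^(-ε(tc - te)) has no closed-form solution, so it must be solved iteratively. One simple option, sketched below under the assumption that a fixed-point iteration is acceptable (the text does not say which iterative scheme was used), starts from the small-ta approximation ε ≈ 1/ta and repeatedly sets ε ← (1 - e^(-ε(tc - te)))/ta. The helper name `solve_epsilon` and the example values are hypothetical.

```python
import math


def solve_epsilon(ta: float, tc_minus_te: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Solve eps*ta = 1 - exp(-eps*(tc - te)) by fixed-point iteration from eps = 1/ta."""
    if not 0.0 < ta < tc_minus_te:
        raise ValueError("need 0 < ta < tc - te for a non-trivial solution")
    eps = 1.0 / ta  # small-ta approximation used as the starting guess
    for _ in range(max_iter):
        new_eps = (1.0 - math.exp(-eps * tc_minus_te)) / ta
        if abs(new_eps - eps) <= tol * eps:
            return new_eps
        eps = new_eps
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical values: ta = 0.5 ms, tc - te = 3 ms.
eps = solve_epsilon(0.0005, 0.003)
print(eps, 1.0 / 0.0005)  # eps lies slightly below the small-ta approximation 1/ta
```

Starting from 1/ta, the iterates decrease monotonically towards the solution, which lies just below 1/ta whenever ta is small compared with tc - te; ε = 0 is also a trivial root, and this starting point avoids it.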

Because of its exponential waveshape, the effect of the return phase on the source spectrum is approximately that of a first-order low-pass filter with a cutoff frequency Fa = 1/(2πta). This means that the longer the return phase, the lower the cutoff frequency and the larger the high-frequency reduction. The attenuation ΔLa(f), in decibels at frequency f (as compared with a constant spectral slope of -6 dB/oct), is approximately:

$$\Delta L_a(f) = 10 \log_{10}\left[1 + \left(\frac{f}{F_a}\right)^2\right]$$
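The sketch below evaluates the cutoff Fa = 1/(2πta) and the approximate extra attenuation for a few return-phase durations. It assumes that the attenuation has the first-order low-pass form ΔLa(f) = 10 log10[1 + (f/Fa)^2]; the ta values and the 3 kHz evaluation point are illustrative only, and the function names are hypothetical.

```python
import math


def return_phase_cutoff_hz(ta_s: float) -> float:
    """Cutoff Fa = 1/(2*pi*ta) of the first-order low-pass approximating the return phase."""
    return 1.0 / (2.0 * math.pi * ta_s)


def return_phase_attenuation_db(f_hz: float, ta_s: float) -> float:
    """Approximate attenuation (dB) at f relative to a constant -6 dB/oct slope,
    assuming the first-order low-pass form 10*log10(1 + (f/Fa)^2)."""
    fa = return_phase_cutoff_hz(ta_s)
    return 10.0 * math.log10(1.0 + (f_hz / fa) ** 2)


for ta_ms in (0.1, 0.3, 1.0):  # illustrative return-phase durations
    ta = ta_ms / 1000.0
    print(f"ta = {ta_ms} ms: Fa = {return_phase_cutoff_hz(ta):.0f} Hz, "
          f"attenuation at 3 kHz = {return_phase_attenuation_db(3000.0, ta):.1f} dB")
```

At f = Fa the attenuation is about 3 dB, and a longer ta both lowers Fa and deepens the high-frequency reduction.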

By convention, tc = t0, the time of glottal opening for the forthcoming pulse period. This implies that the model lacks a closed phase. In practice this is no drawback: for normal (small) values of ta, the exponential curve will fit closely to the zero line, providing, for all intents and purposes, a closed phase. The smaller number of parameters makes the implementation of the model simpler.

Apart from the four parameters, there is a requirement of area balance (the net area under E(t) over one period must be zero, so that the flow returns to its starting level), which keeps the zero flow line from drifting.

One of the properties of this model is the possibility of achieving a smooth and gradual change from a sharp excitation to a perfect sinusoid (α = 0).

For the analysis, the parameters Ee, rg, rk, and ra were used, as these are more convenient and more closely related to different properties of the voice source (cf. Fig. 2).

Fig. 2. U(t) = amplitude of true glottal flow as a function of time.
E(t) = amplitude of differentiated glottal flow as a function of time.
Up = peak flow.
Ei = maximum positive rate of change in the flow function (the inflexion point).
Ee = negative level of the flow derivative at maximum flow discontinuity (the excitation).
ti, tp, and te = time points of Ei, Up, and Ee, respectively.
t0 = time of glottal opening.
tc = time of complete (or maximum) closure.
Note: ti, tp, and te are also used to denote the durations ti - t0, tp - t0, and te - t0, where t0 is defined as equal to zero.
tk = duration of te - tp.
For explanation of ta, see text.


The model parameters E0, α, ωg, and ta can then be calculated from the analysis parameters by means of an iterative method. Alternatively, other quantities, such as the open quotient, defined as Oq = te/T0, the cutoff frequency of the return phase, Fa = 1/(2πta), etc., may be used as analysis parameters. Such alternative parameters will be discussed further in Section 7.
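The iterative conversion from analysis parameters to model parameters is not spelled out here, so the following is only a sketch of one way to do it, not the KTH program. It assumes the conventional definitions rg = T0/(2tp), rk = (te - tp)/tp and ra = ta/T0 (none of which is defined in this section), takes tc = T0 per the convention tc = t0, solves the ε equation by fixed-point iteration, fixes E0 so that E(te) = -Ee, and then finds α by bisection so that the areas of the two segments cancel (area balance). It also returns the alternative quantities Oq = te/T0 and Fa = 1/(2πta). The function name `lf_from_analysis` and the ±50/te search bracket for α are assumptions.

```python
import math


def lf_from_analysis(f0: float, ee: float, rg: float, rk: float, ra: float) -> dict:
    """Sketch: convert analysis parameters (Ee, rg, rk, ra) into LF model parameters.

    ASSUMED definitions: rg = T0/(2*tp), rk = (te - tp)/tp, ra = ta/T0; tc = T0.
    """
    t0 = 1.0 / f0                  # period T0
    tp = t0 / (2.0 * rg)
    te = tp * (1.0 + rk)
    ta = ra * t0
    d = t0 - te                    # tc - te
    wg = math.pi / tp              # omega_g = 2*pi*Fg with Fg = 1/(2*tp)
    theta = wg * te                # pi < theta < 2*pi, so sin(theta) < 0
    if not (math.pi < theta < 2.0 * math.pi and 0.0 < ta < d):
        raise ValueError("parameters outside the range handled by this sketch")

    # epsilon from eps*ta = 1 - exp(-eps*(tc - te)), fixed-point iteration from 1/ta
    eps = 1.0 / ta
    for _ in range(100):
        eps = (1.0 - math.exp(-eps * d)) / ta

    # Closed-form area of the return phase (segment 2) between te and tc
    area2 = -(ee / (eps * ta)) * ((1.0 - math.exp(-eps * d)) / eps - d * math.exp(-eps * d))

    def area1(alpha: float) -> float:
        # Area of E0*exp(alpha*t)*sin(wg*t) over [0, te], with E0 chosen so that E(te) = -Ee
        scale = -ee / math.sin(theta)  # equals E0*exp(alpha*te), positive
        num = alpha * math.sin(theta) - wg * math.cos(theta) + wg * math.exp(-alpha * te)
        return scale * num / (alpha ** 2 + wg ** 2)

    # Area balance: bisection for alpha such that area1(alpha) + area2 = 0
    lo, hi = -50.0 / te, 50.0 / te
    if not (area1(lo) + area2 > 0.0 > area1(hi) + area2):
        raise ValueError("area balance root not bracketed")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if area1(mid) + area2 > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    e0 = -ee / (math.exp(alpha * te) * math.sin(theta))
    return {"E0": e0, "alpha": alpha, "wg": wg, "ta": ta, "te": te, "tc": t0,
            "eps": eps, "Oq": te / t0, "Fa": 1.0 / (2.0 * math.pi * ta)}


# Hypothetical analysis values: F0 = 125 Hz, Ee = 1, rg = 1.1, rk = 0.3, ra = 0.03.
print(lf_from_analysis(125.0, 1.0, 1.1, 0.3, 0.03))
```

Bisection is used because it only needs a sign change of the net area; a Newton step on α would converge faster but needs the derivative of the area with respect to α.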

3. Speech Materials

The speech materials used for the analysis were recorded in an anechoic chamber on an FM tape recorder. The speech was picked up by a 1" B&K condenser microphone. This procedure ensured a high-quality recording with correct phase response even at the lowest frequencies, thereby minimizing the risk of distorting the glottal pulse shape.

Three adult male Swedish speakers with rather different voice qualities were studied. The utterances analysed were:

behålla, from the sentence vi vill behålla honom /vɪvɪlbɛhɔl:ahɔn:ɔm/ ("we want to keep him"), read with three different stress patterns: focus on vill /'vɪl/, focus on behålla /bɛ'hɔl:a/, and focus on honom /`hɔn:ɔm/.

The two sentences en alldeles utmärkt idé /&nald~l~$w:t,~narktIde:/ ("a perfectly marvellous idea") and inte i detta århundrade /~nt~~d&,t:ao:rhendrad~/ ("not in this century").

ja /ja:/ ("yes"), adjö /a'jø:/ ("goodbye"), and, only for one of the speakers, Jag heter Johan /ja:he:tɛr`ju:an/ ("my name is Johan").

The voice of a ten-year-old child (male) was also studied. The utterances analysed were:

adjö and Jag heter Johan.

(In the transcriptions, the following symbols were used to denote stress: ' indicates that the following syllable has acute accent, ` indicates that the following syllable has grave accent, and , indicates that the following syllable has the secondary stress of a grave accent.)

The speech data was low-pass filtered at 6.3 kHz and sampled at 16 kHz. It was further processed as follows: a high-pass filter at 20 Hz (linear phase) was introduced to remove variations of the zero line due to possible superimposed low-frequency pressure fluctuations in the recording room. Oscillograms of the speech waveform, spectrograms, and (for some of the materials) spectral sections were plotted.
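A linear-phase high-pass of this kind can be realized as a symmetric FIR filter. The sketch below assumes a Hamming-windowed sinc low-pass turned into a high-pass by spectral inversion, with 2001 taps (about 0.125 s at 16 kHz) so that the transition band around 20 Hz stays narrow; the filter actually used is not described, and the function names are hypothetical.

```python
import numpy as np


def linear_phase_highpass(cutoff_hz: float, fs_hz: float, num_taps: int = 2001) -> np.ndarray:
    """Hamming-windowed-sinc high-pass FIR with symmetric taps (hence linear phase)."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the delay is a whole number of samples")
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    lowpass = 2.0 * cutoff_hz / fs_hz * np.sinc(2.0 * cutoff_hz / fs_hz * n)
    lowpass *= np.hamming(num_taps)
    lowpass /= lowpass.sum()          # exact unity gain at DC for the low-pass prototype
    highpass = -lowpass               # spectral inversion: delta minus low-pass
    highpass[(num_taps - 1) // 2] += 1.0
    return highpass


def apply_linear_phase(x: np.ndarray, taps: np.ndarray) -> np.ndarray:
    """Filter x and remove the (num_taps - 1)/2-sample delay so the output lines up with x."""
    delay = (len(taps) - 1) // 2
    return np.convolve(x, taps, mode="full")[delay:delay + len(x)]


h = linear_phase_highpass(20.0, 16000.0)
```

Because the taps are symmetric, every frequency is delayed by the same (num_taps - 1)/2 samples, which `apply_linear_phase` removes; the glottal pulse shape is therefore not distorted by frequency-dependent phase shifts.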

4. Procedure

The source was studied by means of inverse filtering, employing a number of complex-conjugate zeros to cancel the filtering effect of the supraglottal system. Each zero should correspond to a formant of the vocal tract. The complete zero function then represents the inverse supraglottal transfer function at a particular moment in time. With a sampling frequency of 16 kHz and assuming a constant vocal tract length of 17.5 cm for the male subjects, the theoretical number of formants would be eight. No higher-pole correction is needed; it is inherent in the digital realization. In practice, nine zeros were consistently used. The reason for the ninth zero and its effect will be discussed later in this section. To get the "true" glottal flow, the radiation at the lips has to be compensated for by a simple real pole at zero frequency (an integrator). Otherwise, the differentiated glottal flow is obtained, this being, one might note, almost the exclusive object of study in this paper.
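The formant count and the zero sections can be illustrated as follows. This sketch assumes a speed of sound of 350 m/s (not stated in the text) and a uniform tube closed at the glottis, whose resonances lie at odd multiples of c/(4L); for L = 17.5 cm these are 500, 1500, 2500 Hz and so on, giving eight below the 8 kHz Nyquist frequency and a fifth resonance at 4500 Hz. Each formant (F, B) is cancelled by the standard digital anti-resonance 1 - 2r cos θ z^-1 + r^2 z^-2 with r = e^(-πB/fs) and θ = 2πF/fs, and the lip radiation is undone by a running sum. The function names are hypothetical, and the exact filter realization in INA is not described in the text.

```python
import numpy as np

C_SOUND_M_S = 350.0  # assumed speed of sound (m/s); the text does not state the value


def uniform_tube_formants(length_m: float, fs_hz: float, c: float = C_SOUND_M_S) -> list:
    """Resonances (2n - 1) c / (4 L) of a uniform tube closed at the glottis, below fs/2."""
    formants = []
    n = 1
    while (2 * n - 1) * c / (4.0 * length_m) < fs_hz / 2.0:
        formants.append((2 * n - 1) * c / (4.0 * length_m))
        n += 1
    return formants


def tube_length_for(n_formants: int, fs_hz: float, c: float = C_SOUND_M_S) -> float:
    """Length L = N c / fs for which exactly N tube resonances fall below fs/2."""
    return n_formants * c / fs_hz


def inverse_filter(x: np.ndarray, formants, fs_hz: float) -> np.ndarray:
    """Cancel each (F, B) resonance with a complex-conjugate zero pair (2nd-order FIR)."""
    y = np.asarray(x, dtype=float)
    for f_hz, b_hz in formants:
        r = np.exp(-np.pi * b_hz / fs_hz)    # zero radius set by the bandwidth B
        theta = 2.0 * np.pi * f_hz / fs_hz   # zero angle set by the frequency F
        taps = np.array([1.0, -2.0 * r * np.cos(theta), r * r])
        y = np.convolve(y, taps)[: len(y)]   # causal filtering, trimmed to input length
    return y


def integrate(dflow: np.ndarray) -> np.ndarray:
    """Real pole at zero frequency (running sum): differentiated flow -> flow, unscaled."""
    return np.cumsum(dflow)


print(len(uniform_tube_formants(0.175, 16000.0)))   # number of resonances below 8 kHz
print(round(tube_length_for(9, 16000.0) * 100, 1))  # tube length in cm for nine zeros
```

With these assumptions, a 17.5-cm tube has 8 resonances below 8 kHz, and nine zeros correspond to about 19.7 cm, in line with the vocal tract length quoted for the nine-zero configuration. The running sum is left unscaled because only relative flow variations are of interest.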

One of the main problems with this method is, of course, that of finding the correct transfer function of the supraglottal system. Any error in the parameters of the inverse filter will distort the glottal pulse to a greater or lesser degree. The pulse shape is very sensitive to erroneous settings of the frequency and bandwidth of the first formant, especially when F1 is low. Minor errors in the higher formants have little effect on the main pulse shape. One limitation of the technique used here is that, as zero airflow is not indicated, the absolute volume velocity level cannot be calculated. This problem can be solved using the circumferentially vented pneumotachograph mask developed by M. Rothenberg (Rothenberg, 1973). Only the variations in airflow were studied here; for synthesis purposes, DC components caused by constant glottal leakage are of minor importance.

To achieve the best possible results, it was thought best not to use automatic formant tracking based on linear prediction techniques. Existing methods are unreliable, as they generate frequent errors in formant locations and bandwidths and, often, spurious poles with very large bandwidths. Instead, the time-varying filter was manually adapted to the speech waveform utilizing an interactive computer program (INA, by J. Liljencrants, KTH, cf. Fig. 3), which permits manipulation (with a joy-stick) of the frequencies and bandwidths of the zeros (or poles) used for filtering the speech input. Whenever a change is made in the filter settings, the filter output is updated, and the spectrum of the new filter output is seen directly on the screen. When satisfactory inverse filter settings have been obtained, these may be stored in a separate data file. Filter functions can be specified for as many frames as are needed to follow the dynamic variations of the vocal tract. The data file for a whole utterance is used for calculating the differentiated glottal output. Filter values in between analysed frames are linearly interpolated, and the formant trajectories are smoothed with a low-pass filter. The inverse filter output of a complete utterance is depicted in Fig. 4. Notice the interaction ripple in the glottal open phase of the /ø:/ vowel and the often non-flat "closed" phase (cf. Fant, Lin, & Gobl, 1985b).

[Figure 3: screen display of the INA program (speaker JS, utterance behålla, sampling frequency 16 kHz, window 6.00 ms, pitch 167 Hz).]

Fig. 3. Program for interactive filtering of speech waveform (INA). (a) Speech waveform. (b) Log FFT-spectrum of (a). (c) Filtered waveform. (d) Log FFT-spectrum of (c). (e) Filter configuration. The filter can be manipulated interactively in the s-domain using a joy-stick. The filter output is updated in real time both in the time domain (the filtered waveform) and in the frequency domain (the FFT-spectrum).

Fig. 4. The inverse filter output (differentiated glottal flow) of the utterance /a'jø:/. Subject JS. Notice the source-filter interaction ripple in the glottal open phase (particularly in frames 34-47) and the frequently non-flat "closed" phase.
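The interpolation and smoothing of the filter values between analysed frames can be sketched as below. Linear interpolation uses NumPy's `np.interp`; for the smoothing, the text only says "a low-pass filter", so the one-pole filter run forwards and backwards (zero phase) and its 20 Hz default cutoff are assumptions, as are the helper names and the example frame values.

```python
import numpy as np


def interpolate_track(frame_times_s, frame_values, sample_times_s):
    """Linearly interpolate per-frame filter values (e.g. F1 in Hz) to other time points."""
    return np.interp(sample_times_s, frame_times_s, frame_values)


def smooth_track(track, rate_hz: float, cutoff_hz: float = 20.0) -> np.ndarray:
    """Zero-phase smoothing: a one-pole low-pass applied forwards, then backwards.

    The filter type and the 20 Hz default cutoff are assumptions, not taken from the text.
    """
    a = np.exp(-2.0 * np.pi * cutoff_hz / rate_hz)  # pole position of the one-pole filter

    def one_pass(x: np.ndarray) -> np.ndarray:
        y = np.empty(len(x))
        state = x[0]                                 # start at the first value: no start-up transient
        for i, v in enumerate(x):
            state = (1.0 - a) * v + a * state
            y[i] = state
        return y

    track = np.asarray(track, dtype=float)
    return one_pass(one_pass(track)[::-1])[::-1]


# Hypothetical frames: F1 = 500 Hz at 0 ms and 700 Hz at 10 ms, resampled every 1 ms.
f1 = interpolate_track([0.0, 0.010], [500.0, 700.0], np.arange(0.0, 0.0105, 0.001))
print(np.round(smooth_track(f1, 1000.0), 1))
```

For example, F1 values of 500 Hz and 700 Hz in frames 10 ms apart give 600 Hz halfway between them, and a constant trajectory passes through the smoother unchanged. Running the filter in both directions cancels its phase shift, so the smoothed trajectory is not delayed relative to the speech waveform.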


The next step, once the inverse filter output (differentiated glottal flow) is deemed satisfactory, is to fit the parameter model as closely as possible to the glottal pulses. This was done with a computer program developed at KTH by T.V. Ananthapadmanabha following suggestions by G. Fant, and involved the following procedure. With a joy-stick, six time points in the glottal period being analysed are marked out. These are (cf. Fig. 2):

(1) glottal opening (t0);
(2) point of inflexion (maximum positive derivative, ti);
(3) point of maximum flow (tp);
(4) point of excitation (te);
(5) the projection on the time axis of the derivative at the beginning of the return phase (ta);
(6) next glottal opening (t0).

When all points have been set, the program asks whether ti or tp is reliable and whether area balance is desired. From these data, the curve form and the parameter values of the glottal model are calculated (cf. Fig. 5).
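The step from marked time points to analysis parameters can be sketched as follows. The definitions rg = T0/(2tp), rk = (te - tp)/tp, ra = ta/T0 and Oq = te/T0 are the conventional ones and are assumed here, since this section does not define rg, rk and ra; ta is taken as the difference between point (5) and te, following the definition of ta given for the return phase. The function name and its arguments are hypothetical, and the computation inside the KTH program is not described.

```python
def analysis_params(t_open: float, ti: float, tp: float, te: float,
                    te_plus_ta: float, t_open_next: float) -> dict:
    """Turn the six marked time points (seconds) into durations and analysis parameters.

    ASSUMED definitions: rg = T0/(2*tp), rk = (te - tp)/tp, ra = ta/T0, Oq = te/T0,
    with durations measured from glottal opening (t0 = 0), as in Fig. 2.
    """
    if not (t_open < ti < tp < te < te_plus_ta < t_open_next):
        raise ValueError("time points must satisfy t0 < ti < tp < te < te + ta < next t0")
    period = t_open_next - t_open        # T0
    tp_d, te_d = tp - t_open, te - t_open
    ta = te_plus_ta - te                 # point (5) minus te
    return {
        "F0": 1.0 / period,
        "ti": ti - t_open, "tp": tp_d, "te": te_d, "ta": ta,
        "tk": te_d - tp_d,
        "rg": period / (2.0 * tp_d),
        "rk": (te_d - tp_d) / tp_d,
        "ra": ta / period,
        "Oq": te_d / period,
    }


# Hypothetical marks (ms): 0, 1.5, 3, 4, 4.3 and 8.
print(analysis_params(0.0, 0.0015, 0.003, 0.004, 0.0043, 0.008))
```

With these hypothetical marks the sketch gives F0 = 125 Hz, rg ≈ 1.33, rk ≈ 0.33, ra = 0.0375 and Oq = 0.5. ti is returned but not used in any parameter, in line with the choice of tp as the more reliable point.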

[Figure 5: source pulse matching display; panel (e) lists F0 136, Ei 553, Ee 1513, rg 117, rk 33, ra (illegible); spectrum panels on a kHz axis.]

Fig. 5. Program for source pulse matching. (a) Inverse filter output (differentiated glottal waveform). (b) Log FFT-spectrum of (a). (c) Calculated curve form of LF model. (d) Log FFT-spectrum of (c). (e) Values of analysis parameters of LF model. The example shows the matching of the second pulse in frame 39 of Fig. 4. Notice that the model is not capable of capturing the source-filter interaction ripple in the glottal open phase.


The position of the maximum positive derivative, ti, is often very much affected by a source-filter interaction ripple. Since the LF model is not aimed at modelling interaction ripples, I have consistently regarded tp as more reliable than ti. The optional area balance was used throughout the materials.

The curve form of the model and the inverse filter output are then compared in both the time and the frequency domain. If the divergence between the two is too large, the parameter values can be edited manually and a new calculation and comparison can be made. Alternatively, the procedure can be restarted with a new set of time points. When the glottal matching is finally satisfactory, the source parameter values can be stored in the same data file as the filter parameters. This data file was used for plotting the dynamically varying source parameters and (for some of the materials) for creating resynthesized versions of the original utterances. A synthetic voice source corresponding to the inverse filter output in Fig. 4 is depicted in Fig. 6.

Fig. 6. Synthetic voice source corresponding to the inverse filtered waveform in Fig. 4. The time-varying LF model parameters were estimated by means of source pulse matching.
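The text does not say whether the time- and frequency-domain comparison of model and inverse filter output was quantified or judged visually. If one wanted a numerical measure of the divergence, two simple hypothetical choices are a relative RMS error between the waveforms and an RMS log-spectral distance between their FFT spectra, sketched below; neither measure, nor the FFT length of 512, nor the function names come from the source.

```python
import numpy as np


def relative_rms_error(model, observed) -> float:
    """Time domain: RMS of (model - observed), relative to the RMS of the observed pulse."""
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((model - observed) ** 2) / np.mean(observed ** 2)))


def log_spectral_distance_db(model, observed, n_fft: int = 512) -> float:
    """Frequency domain: RMS difference (dB) between the log-magnitude FFT spectra."""
    floor = 1e-12  # avoids taking the log of zero in empty bins
    m = 20.0 * np.log10(np.abs(np.fft.rfft(model, n_fft)) + floor)
    o = 20.0 * np.log10(np.abs(np.fft.rfft(observed, n_fft)) + floor)
    return float(np.sqrt(np.mean((m - o) ** 2)))
```

Scaling a pulse by 2 leaves its spectral shape unchanged but shifts the log spectrum by 20 log10 2 ≈ 6.02 dB in every bin, so this distance reports a level difference as well as a shape difference; a level-normalized variant would subtract the mean dB difference first.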


Almost consistently for all of the three adult male subjects studied, the harmonics around 3 kHz (in the region of F4) were found to be enhanced (often by more than 10 dB) in comparison with what would be expected from the theory of an "ideal" acoustic tube with an effective length of 17.5 cm. However, the nomograms in Fant (1960, Fig. 1.4-11) show that F4 and F5 are often relatively close and that F5 is considerably lower than for a uniform tube (4500 Hz). The fact that the levels of formants coming into proximity are increased is only in part an explanation for the enhancement. There are other possible contributing factors: it could be the consequence of an extra long vocal tract combined with what has been termed the "singer's formant" (Sundberg, 1972). Additionally, or alternatively, it might be that cross-resonances are interfering with plane wave propagation. One of the subjects, JS, is considerably above average height, and his vocal tract is therefore very likely to be longer than 17.5 cm. Moreover, he is a trained singer, a fact which is likely to affect his speech. The position of his larynx may be lower compared with speakers without such training, and this would further increase the vocal tract length. Pharyngeal widening is likely to co-occur with larynx lowering (cf. Sundberg, 1972). Together, these factors could result in lower and more salient cross-resonance frequencies. The enhancement is definitely the largest for this subject. However, it is also clearly found for the two other subjects, BG and LN, who are not trained singers. Of these two, BG (who is also the taller) exhibits this tendency to a greater degree.
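The 4500 Hz figure for F5 follows from the quarter-wavelength resonances of a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c/(4L). The sketch below is a minimal illustration, assuming a speed of sound of 35 000 cm/s (an assumed value, not stated in the text). It reproduces 500, 1500, ..., 4500 Hz for L = 17.5 cm and shows how a longer tract lowers every resonance, F5 included.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed speed of sound (about 350 m/s)


def uniform_tube_formants(length_cm: float, count: int = 5) -> list:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other:
    F_n = (2n - 1) * c / (4 * L), for n = 1..count."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    for length in (17.5, 19.7):
        formants = uniform_tube_formants(length)
        print(f"L = {length} cm: " + ", ".join(f"{f:.0f}" for f in formants) + " Hz")
```

For L = 17.5 cm this gives 500, 1500, 2500, 3500 and 4500 Hz; at 19.7 cm, F5 drops to about 4000 Hz. A uniform tube cannot, however, produce the close F4-F5 spacing described above; that requires the non-uniform shapes covered by the nomograms.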

It was because of the enhancement of energy in this region that nine zeros were used in the inverse filter function. Without an anti-resonance for F5 close to F4, there would be a considerable amount of residual ripple in the flow function which could not be explained as acoustic interaction between source and filter. The effects with and without the F5 cancellation (nine and eight zeros, respectively) are illustrated in Fig. 7. One might think that it would be sufficient to cancel F1-F5 correctly (as in (b), Fig. 7), and only use three zeros for the region above F5. However, this strategy would also result in ripple components and noise (but of higher frequencies than F5) in the flow function. Because of the missing anti-resonance, levels above F5 would be erroneously emphasized. Disregarding ripple components, the only important effect the ninth zero has on the main pulse shape is a slight increase in the duration of the return phase. For convenience, this filter configuration (nine zeros for the adult males, which theoretically corresponds to a vocal tract length of 19.7 cm) was used for all voiced sounds, and this may have introduced small errors for sounds other than oral (non-nasalized) vowels. But, as pointed out, this mainly affects the return phase when glottal matching is carried out.
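In a digital inverse filter, each "zero" of the kind discussed above can be realized as a second-order FIR section whose complex-conjugate zero pair sits at a formant frequency. Cascading the sections gives the complete inverse filter. The sketch below assumes that one "zero" means one such conjugate pair, and uses the common bandwidth-to-radius mapping r = exp(-πB/fs). The function names and the (frequency, bandwidth) values are hypothetical illustrations, not the settings used in this study.

```python
import numpy as np


def antiresonance(freq_hz: float, bw_hz: float, fs_hz: float) -> np.ndarray:
    """Second-order FIR section 1 - 2r*cos(theta)*z^-1 + r^2*z^-2, i.e. a
    complex-conjugate zero pair at freq_hz with bandwidth bw_hz."""
    r = np.exp(-np.pi * bw_hz / fs_hz)       # zero radius from the bandwidth
    theta = 2.0 * np.pi * freq_hz / fs_hz     # zero angle from the frequency
    return np.array([1.0, -2.0 * r * np.cos(theta), r * r])


def inverse_filter(formants, fs_hz: float) -> np.ndarray:
    """Cascade one anti-resonance per (frequency, bandwidth) pair into one FIR."""
    coeffs = np.array([1.0])
    for freq_hz, bw_hz in formants:
        coeffs = np.convolve(coeffs, antiresonance(freq_hz, bw_hz, fs_hz))
    return coeffs


def apply_inverse_filter(speech: np.ndarray, formants, fs_hz: float) -> np.ndarray:
    """Causal FIR filtering of the speech signal (output has the input's length)."""
    b = inverse_filter(formants, fs_hz)
    return np.convolve(speech, b)[: len(speech)]


if __name__ == "__main__":
    fs = 16_000.0
    # Hypothetical (frequency, bandwidth) pairs in Hz: nine zeros, including an
    # anti-resonance at 3500 Hz for an F5 lying close to F4 (3200 Hz).
    nine_zeros = [(600, 60), (1100, 70), (2500, 100), (3200, 150), (3500, 200),
                  (4300, 250), (5300, 300), (6300, 350), (7300, 400)]
    b = inverse_filter(nine_zeros, fs)
    print(len(b))  # 2 * 9 + 1 = 19 coefficients
```

Removing the (3500, 200) pair from the list leaves an eight-zero filter with no anti-resonance close to F4. That is analogous to the comparison in Fig. 7, where the uncancelled F5 shows up as ripple in the closed phase.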

For the analysis of the child, six zeros were used in the inverse filter function, which corresponds to a vocal tract length of 13.1 cm.
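A simple rule appears to reproduce the lengths quoted above. For a uniform tube closed at the glottis, the formants are spaced c/(2L) apart, so about fs·L/c of them fall below the Nyquist frequency fs/2. Inverting this gives L = N·c/fs for N zeros. The sketch assumes fs = 16 kHz (the sampling frequency shown for Fig. 7) and c = 35 000 cm/s; that the child's recording was also sampled at 16 kHz is an assumption.

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # assumed speed of sound (about 350 m/s)


def tract_length_cm(n_zeros: int, fs_hz: float) -> float:
    """Tube length whose average formant spacing c/(2L) places n_zeros
    formants between 0 Hz and fs/2: L = N * c / fs."""
    return n_zeros * SPEED_OF_SOUND_CM_S / fs_hz


def formants_below_nyquist(length_cm: float, fs_hz: float) -> int:
    """Count quarter-wave resonances (2n - 1) * c / (4L) below fs/2."""
    half_spacing = SPEED_OF_SOUND_CM_S / (4.0 * length_cm)  # c/(4L) = F1
    count = 0
    while (2 * count + 1) * half_spacing < fs_hz / 2.0:
        count += 1
    return count


if __name__ == "__main__":
    for n in (8, 9, 6):
        print(n, round(tract_length_cm(n, 16_000.0), 1))  # 17.5, 19.7, 13.1 cm
```

With eight zeros the rule gives the 17.5 cm "ideal" tube; nine and six zeros give 19.7 and 13.1 cm, matching the figures in the text.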


[Fig. 7, panels (a) and (b): speaker BG, identity BEHÅLLA; sampling frequency 16 kHz; analysis window 7.81 ms; pitch 128 Hz.]

Fig. 7. The filter output (a) without and (b) with F5 cancellation (using eight and nine zeros, respectively) in the inverse filter function for /&/ from the word behålla, subject BG. Notice the residual ripple in the glottal closed phase and the remaining peak in the source spectrum when the F5 anti-resonance is lacking.


[Tables Ia and Ib: subjects JS, BG and LN; segments /b ɛ h ɔ lː a/ of behålla; focal, postfocal and prefocal contexts; Table Ib compares focal/postfocal, focal/prefocal and postfocal/prefocal, in dB.]

Table Ia. Values of Ee (excitation strength) for the segments in the word behålla in the three stress contexts.

Table Ib. The relative difference in Ee between the three stress contexts for the segments in the word behålla.

[Tables IIa and IIb: subjects JS, BG and LN; adjacent-segment pairs of behålla and the vowel pair /ɔ/-/ɛ/; focal, postfocal and prefocal contexts; values in dB.]

Table IIa. The relative difference in Ee between adjacent segments and between the vowels /ɔ/ and /ɛ/.

Table IIb. The relative difference between the three stress contexts in the degree of contrast between the segments in Table IIa.

Fig. 8. Temporal variations of the analysis parameters for the Swedish word behålla from the sentence vi vill behålla honom. Subject JS. (a) Postfocal context (emphasis on vill); (b) focal context (emphasis on behålla); (c) prefocal context (emphasis on honom).


A tendency was mentioned above of a stronger excitation for the vowels in focal context. This, it must be pointed out, is not absolute. One exception is the final /a/, where one finds that Ee is slightly stronger in prefocal than in focal context for all three speakers. This is perhaps not too surprising. It is the second syllable in behålla which carries the word stress. Thus, when the word occurs in prefocal context, the unstressed /a/ immediately precedes the emphatically stressed first syllable of honom, and is clearly somewhat affected by it. Note, however, that /ɛ/ in behålla is also unstressed; yet, when it is immediately preceded by the emphatically stressed vill (postfocal context), it is not similarly affected. A very tentative hypothesis at this point might be that an unstressed vowel is more affected by a stressed element to its right than to its left.*

The other gross corollary of stress mentioned above was a weakening of Ee for voiced consonants. Again, this statement needs to be refined somewhat. The effect is quite different for the three consonants:

(1) For all three subjects, /h/ exhibits the largest weakening. For JS and LN it is very extensive; Ee is about 20-25 dB weaker compared to the non-focal contexts.

(2) In the case of /lː/, the effect is not always consistent, and the weakening is of a considerably lesser degree (it is never more than ca. 4 dB here, as can be seen in Table Ib).

(3) For /b/ there is no clear evidence of a weakening (the reason for this could be that /b/ does not belong to a potentially stressed syllable in any case). The differences found are small and inconsistent.
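The weakenings above, and the entries in Table Ib, are expressed in dB. Ee is an amplitude (the negative peak of the differentiated glottal flow), so a natural assumption is that a dB difference is 20·log10 of the Ee ratio. The paper does not spell out its formula, so treat this as a sketch. It converts the quoted figures into linear terms.

```python
import math


def relative_difference_db(ee_a: float, ee_b: float) -> float:
    """Level of ee_a relative to ee_b in dB, treating Ee as an amplitude."""
    return 20.0 * math.log10(ee_a / ee_b)


def amplitude_ratio(db: float) -> float:
    """Inverse mapping: the amplitude ratio corresponding to a dB difference."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # Weakenings quoted in the text: /h/ (20-25 dB) and /l:/ (at most ca. 4 dB).
    for db in (-20.0, -25.0, -4.0):
        print(f"{db:+.0f} dB -> x{amplitude_ratio(db):.3f}")
```

Under this assumption, a 20-25 dB weakening of /h/ means Ee falls to between about 1/10 and 1/18 of its non-focal value. The weakening of at most ca. 4 dB for /lː/ leaves roughly 63% of the amplitude.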

At this point, it might be worth considering the possible production factors that would give rise to the observed variations of Ee. The stronger Ee associated particularly with the vowels of the word in focal context could be explained by either (or a combination) of two factors:

(1) Increased respiratory effort and consequently higher subglottal pressure. Increased activity of the internal intercostals was demonstrated to immediately precede the stressed syllable (Ladefoged, Draper, & Whitteridge, 1958).

(2) Increased medial compression of the vocal folds. By bringing about a more abrupt closing phase of the glottal cycle, this mechanism could conceivably yield a higher Ee without any necessary contribution from respiratory factors. Data from the word "adjö" in Fant (1979a) showed evidence of increased medial compression in the stressed syllable (for a discussion, see also Fant 1981; 1982a).

* For subject JS, Ee for /I/ is also stronger in prefocal than in focal context. Note, however, that the focal case is nevertheless characterized by a greater contrast between the vowel and the adjacent consonant segments (Table IIb).

Fig. 9. (a)(b)(c) Subject BG. Otherwise as in Fig. 8.

Of the two possible explanations, the present data suggest the former to be the more likely. If the latter were implicated, one should probably find (along with increased Ee) a reduced ra value (less dynamic leakage) and a smaller open quotient, and these were not attested here.

As regards the very great weakening of Ee for /h/ in the focal context, it would appear to reflect the fact that /h/ tends to be substantially voiceless in this environment (in Swedish, /h/ is normally assumed to be voiced in intervocalic position), whereas it is fully voiced in the non-focal contexts.* From this we can infer that there is likely to be a greater degree of vocal fold abduction associated with /h/ in the focal context. See also Fant (1987) for a discussion of this point. A greater degree of glottal abduction for voiceless consonants in stressed syllables was also reported by Ní Chasaide (1985; 1987), a conclusion based on photo-electroglottographic data and more recently supported by EMG data (personal communication).

In the literature there is evidence suggesting that stress may involve increased muscular activity at every level of production: respiratory, laryngeal and articulatory (see, for instance, Ní Chasaide 1987, and references therein). In the case of /lː/, respiratory and articulatory correlates of stress may somewhat counteract each other in determining the Ee level. Increased respiratory levels should yield a higher Ee, as for the vowels. On the other hand, if the lateral is articulated with greater force, one might expect it to be executed more rapidly, to involve (perhaps) greater oral occlusion, and to have a longer duration. A greater occlusion and increased duration should reduce the transglottal pressure drop and thus Ee. That /lː/ is longer in focal context can be ascertained from Figs. 8-10. Note also in these figures that in the stressed cases there is a sharper, more abrupt drop in Ee from /ɔ/ to the following /lː/. This suggests that the articulatory gesture is indeed more rapidly executed. Furthermore (regardless of environment), the minimum tends to occur at the end of the segment, an indication that the aerodynamic consequences of oral occlusion do seem to cause a depression of Ee.

The combined effect of stronger Ee for vowels and weaker Ee for voiced consonants in focal context means that the contrast between these types of sounds is enhanced (see Table IIb). Put another way, the nucleus of the stressed syllable is much more prominent relative to the surrounding consonants. Past researchers have tended to

* For speaker BG, /h/ is still fully voiced in focal context and the weakening is consequently much smaller, only 2-4 dB, as can be seen in Fig. 9 and Table Ib. The tendency towards a weakening of consonants in focal position appears to be generally less salient and also less consistent for this speaker.

Fig. 10. (a)(b)(c) Subject LN. Otherwise as in Fig. 8.


Besides the variation in excitation strength, Ee, it is difficult to find reliable source properties related to stress. One might have expected a shorter return phase in focal context because of greater medial compression of the vocal folds. However, as was discussed above, this expectation was not really borne out in these results. Subject LN shows higher ra values for the vowels in focal context (Fig. 10); for the other two subjects the differences sometimes go in the expected direction but are small. Further analyses of more speakers and contexts are needed.

5.1.2. FURTHER OBSERVATIONS MADE FROM THE WORD BEHÅLLA

There are further observations to be made from the temporal parameter variations of this word besides those pertaining to the stress pattern. For instance, one might note that, regardless of speaker or stress environment, Ee drops during the /b/, reaching a minimum just before release. This is naturally a consequence of the oral closure, and of the consequently increasing oral pressure and decreasing transglottal pressure drop. The diminishing transglottal pressure makes the vocal fold vibration more and more difficult to sustain and is directly reflected in the excitation strength. The decrease of Ee is generally accompanied by an increase of ra and rk. Specifically for ra, the initial value is relatively high (6-8%) compared with the typical value for the adjacent vowel (2-3%), and it then gradually increases until release (10-13%), where it drops rapidly. The correlation between ra and rk is considerable here, as well as in many other positions.

Also, notice the higher dynamic leakage for /h/ as compared to the surrounding vowels. Here the duration of the return phase is often three to four times as long. There is also generally less skewing of the pulses (larger rk).

Occasionally ra increases during /lː/ and then decreases rapidly at the transition to the vowel. The increase is analogous to (but much less pronounced than) that of the voiced stop /b/; the peak value for /lː/ is here typically 4-6%, and never higher than 8%. In focal context this is found for all three speakers. In postfocal context it is found for speakers BG and JS but not for LN. In prefocal context, where the distinctions between the lateral and the adjacent vowels are much more reduced, it is not clearly found for any of the speakers (sometimes there is a slight increase, but without the sharp drop at the transition to the vowel). The increase of ra is typically concomitant with the decrease of Ee that was mentioned in Section 5.1.1.

What about the "rising-time" parameter, rg? This parameter varies irregularly and within a narrow range, and is sometimes almost constant, as in focal context, subject JS (Fig. 8b). Excepting the rise in /lː/ and the initial low part in /b/, the rg value is very stable, just around 100%, for this speaker. As a possible explanation for the generally narrow range of rg variation, I would suggest that the opening time in normal phonation may be more or less conditioned by the time constant of the mass and compliance of the vocal folds, whereas the closing time is affected by the Bernoulli forces. However, this would not be a complete explanation; the reasoning above concerns the glottal area and not the flow. The inertive loading of the vocal tract (Rothenberg, 1981) and the glottal inductance delay the peak glottal airflow as compared to the peak glottal area (skewing to the right). Furthermore, Cranen and Boves (1985) have found the vertical phase lag between the upper and the lower parts of the vocal folds to contribute to the skewing of the flow pulse. For these speakers, rg values are typically ca. 100-120%, which means that the glottal frequency, $F_g = 1/(2t_p)$, is normally slightly higher than the fundamental frequency. The only subject that occasionally shows a systematic variation for this parameter is LN. For instance, in focal context there is some covariation between rg and Ee (Fig. 10b). Except for the final /a/, rg is higher for the vowels than for the consonants. But the variations are small and probably not perceptually important.
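The ratio parameters discussed here can be computed from the time points of a single glottal cycle. The sketch below assumes the usual LF-model definitions, since Section 2, where the model is defined, is not part of this excerpt: rg = T0/(2tp), which is consistent with Fg = 1/(2tp) above; rk = (te - tp)/tp; and ra = ta/T0. The class and function names and the example timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LFTiming:
    """Time points of one glottal cycle in seconds, measured from glottal opening."""
    t0: float  # fundamental period T0
    tp: float  # instant of peak glottal flow
    te: float  # instant of main excitation (maximum negative flow derivative)
    ta: float  # effective duration of the return phase


def lf_ratios(t: LFTiming) -> dict:
    """Dimensionless LF shape parameters as fractions (multiply by 100 for %)."""
    fg = 1.0 / (2.0 * t.tp)             # glottal frequency Fg = 1/(2 tp)
    return {
        "rg": fg * t.t0,                # rg = Fg/F0 = T0/(2 tp)
        "rk": (t.te - t.tp) / t.tp,     # falling-branch vs rising-branch duration
        "ra": t.ta / t.t0,              # return phase relative to the period
    }


if __name__ == "__main__":
    # Hypothetical adult-male cycle at F0 = 128 Hz (T0 = 7.8125 ms).
    cycle = LFTiming(t0=1 / 128, tp=3.5e-3, te=4.6e-3, ta=0.2e-3)
    for name, value in lf_ratios(cycle).items():
        print(f"{name} = {100 * value:.1f} %")
```

For this hypothetical cycle the sketch gives rg ≈ 112%, rk ≈ 31% and ra ≈ 2.6%, all within the adult ranges reported in the text. An rg above 100% simply means that tp is shorter than T0/2, i.e. that Fg lies above F0.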

5.2. TWO COMPLETE SENTENCES

The two following Swedish sentences were studied: en alldeles utmärkt idé, with sentence stress on the word utmärkt, and inte i detta århundrade, with sentence stress on the word detta (Figs. 11 and 12). The subjects were the same as in Section 5.1. Here only the most general observations will be mentioned.

Throughout these sentences, and for all three speakers, there is a strong correlation between the excitation strength, Ee, and the negative amplitude of the speech waveform.
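One way to quantify the correlation described above is to take, for each pitch period, the magnitude of the most negative speech sample and compare it with that period's Ee using Pearson's r. This is a sketch under stated assumptions. The period boundaries are taken as already known (for example from pitch marks), the recording polarity is assumed to make the main excitation appear as a negative peak, and the demonstration signal is synthetic. The text does not state how the correlation was assessed.

```python
import numpy as np


def negative_peak_per_period(speech: np.ndarray, boundaries) -> np.ndarray:
    """Magnitude of the most negative sample within each period [b_i, b_(i+1))."""
    return np.array([-speech[a:b].min() for a, b in zip(boundaries[:-1], boundaries[1:])])


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ee = np.linspace(1.0, 3.0, 20)                 # hypothetical per-period Ee values
    period = 125                                   # samples per period (128 Hz at 16 kHz)
    template = np.sin(2.0 * np.pi * np.arange(period) / period)  # minimum is about -1
    # Toy waveform: each period scaled by its Ee, plus a little noise.
    speech = np.concatenate([e * template for e in ee])
    speech += 0.05 * rng.standard_normal(speech.size)
    bounds = list(range(0, speech.size + 1, period))
    print(f"r = {pearson_r(ee, negative_peak_per_period(speech, bounds)):.3f}")
```

Real speech would give a lower r than this toy signal because, for example, formant ringing from the previous period also contributes to the negative peak.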

The values of Ee decrease, and the values of ra and rk increase, before the voiceless consonants /s/, /t/ and /k/ and at the termination of the utterance. There is reason to believe that these changes, which produce a smoother and more sinusoidal glottal pulse shape, occur whenever voicing is terminated, whether before a voiceless consonant, a pause, or the end of an utterance (disregarding special cases such as glottal stops and vocal fry). One could argue that this is a direct consequence of glottal abduction occurring prior to devoicing. In a study combining airflow recordings and inverse filtering (Ní Chasaide & Gobl, 1987), there is strong evidence of a correlation between glottal abduction and a change towards such voice source characteristics for devoicing prior to voiceless consonants. Furthermore, earlier airflow recordings of female voices (Ananthapadmanabha, 1984) show evidence of similar source characteristics for prepausal devoicing.

Usually, ra and rk values are higher at the voice onset as well, both initially and within an utterance after voiceless sounds. However, the onset differences are not as consistent as for the devoicing and are normally not of the same magnitude and duration.

Typical values before a voiceless fricative are 8-10% for ra and 40-50% for rk. One might expect slightly lower values of these two parameters before voiceless stops: as the stops involve complete oral closure, this might conceivably lessen the extent to which glottal abduction is needed to effect voicelessness. This is sometimes seen, sometimes not. The ra values before voiceless stops vary widely, from 3% to 18%, but a reasonable rule would be 3 times the value of the preceding vowel, say 6-9%.
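The targets quoted in this paragraph can be collected into a small rule of the kind a preliminary source rule system might use. The function below is a hypothetical sketch. It returns the midpoint of the 8-10% range for fricatives and applies the "3 times the preceding vowel" rule for stops, ignoring the large 3-18% spread actually observed before stops.

```python
FRICATIVE_RA = 0.09     # midpoint of the reported 8-10 % before voiceless fricatives
STOP_RA_FACTOR = 3.0    # rule of thumb: ra before a voiceless stop ~ 3 x vowel ra


def ra_before_voiceless(manner: str, vowel_ra: float) -> float:
    """Hypothetical target ra (fraction of T0) at voicing offset before a
    voiceless consonant of the given manner ("fricative" or "stop")."""
    if manner == "fricative":
        return FRICATIVE_RA
    if manner == "stop":
        return STOP_RA_FACTOR * vowel_ra
    raise ValueError(f"unknown manner: {manner!r}")


if __name__ == "__main__":
    for vowel_ra in (0.02, 0.03):
        print(vowel_ra, ra_before_voiceless("stop", vowel_ra))
```

For a preceding vowel with ra = 2-3%, the stop rule yields the 6-9% quoted above. A fuller rule would also set rk (40-50% before fricatives).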

Maximum values of ra are normally found at the termination of the sentence. The increase extends over a longer time interval than increases occurring within the utterance. This more gradual termination of voicing may be related to the fact that final segments are lengthened in any case. Devoicing often starts in the middle of the final vowel, which is here ca. 200-300 ms long. Within an utterance, the time interval over which ra increases is about 20-50 ms.

The above comments on the voice source do not hold for cases where an utterance ends in vocal fry. The glottal flow for creaky phonation differs considerably and was not dealt with here.
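The time course of the ra increase before devoicing can be sketched as a ramp toward the offset target. Linear interpolation, the 5 ms frame step, and the final target value in the example are assumptions for illustration. The text gives only the durations: about 20-50 ms within an utterance, and a much longer rise at the end of the utterance, where devoicing often starts in the middle of a 200-300 ms final vowel. The sketch does not apply to utterances ending in vocal fry.

```python
import numpy as np


def ra_ramp(ra_start: float, ra_end: float, ramp_ms: float,
            frame_ms: float = 5.0) -> np.ndarray:
    """ra values at frame_ms spacing, rising linearly from ra_start to ra_end
    over ramp_ms (at least two frames: start and end)."""
    n_frames = max(2, int(round(ramp_ms / frame_ms)) + 1)
    return np.linspace(ra_start, ra_end, n_frames)


if __name__ == "__main__":
    # Within an utterance: a 30 ms rise towards a fricative target of 9 %.
    print(ra_ramp(0.025, 0.09, 30.0).round(3))
    # Utterance-final: e.g. over ~125 ms of a 250 ms vowel, to a hypothetical 15 %.
    print(ra_ramp(0.025, 0.15, 125.0).round(3))
```

A smoother curve (for example a raised cosine) could replace the linear ramp without changing the overall durations.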

5.3. COMPARISON OF ADULT MALE AND CHILD VOICES.

The Swedish utterance jag heter Johan was analysed for an adult male (JS) and a child (AN) (Figs. 13 and 14). The subjects were told to read the utterance in a natural way. Otherwise, no further instructions were given.

The most striking difference is perhaps in reading styles: for JS, sentence stress falls on the last word, the name Johan, whereas for AN all three words are equally (fairly strongly) stressed. For the child this means longer phonemes and less reduced pronunciation, especially for the first two words.

Apart from distinctions pertaining to reading styles, there are obviously differences in the glottal pulse shape as well. Again we can see that the return phase, ra, plays an important role. Except before the stop consonant, the adult has ra values ranging between 1% and 4% for the vowels, but the child's values are much higher: 5-12%.

Disregarding the overall level differences, the gross pattern of dynamic variation is roughly similar. At first glance the terminations can look rather dissimilar. However, this is a consequence of the way the model works and of the fact that the source function becomes approximately sinusoidal, particularly for the child. The glottal pulses towards the termination become less and less skewed, which means that rk values become larger. For small values of rk there is a positive correlation between rk and ra. As rk increases beyond a certain threshold (normally around 50%, when rg is around 100%), the correlation is reversed and ra drops. For rk values of more than 50%, the main excitation, defined as the maximum discontinuity, does not coincide with the maximum negative flow derivative. The excitation occurs later and weakens as rk increases (and ra decreases). Here, reduced values of ra are accompanied by a reduction in excitation strength and therefore will not lead to stronger higher harmonics. When rk and rg are both equal to 100%, there is no excitation at all, and there cannot, by definition, be a return phase (ra = 0) (cf. Section 2). For source pulses which are almost sinusoidal it is hard to locate the main excitation, and it is also very unlikely that there is only one dominating excitation. In these cases the source matching was done to give the best spectral resemblance and a reasonable continuity of parameter values. In other words, one will not always find an increase in ra where one might have expected it (e.g. when rk is increasing). If ra is already high (as is typical for the child's voice) and if the glottal pulse is becoming more sinusoidal, further change in the same direction may yield little increase, or even a decrease, in ra.

The onset for subject JS shows an exception to the rule of covariation between ra and rk that is not due to any property of the model. Here we have a hard (glottalized) onset, which is reflected by the relatively low value of ra (1%) initially. Even though ra is small and rising rather than falling towards /a/, rk is falling (though not very much). In the child's voice, where the onset is much softer, ra falls from 20% to 5% and rk from 57% to 38%. Generally, rk is slightly higher for the child than for the adult. Typical values for vowels are 25-35% for the three adult males and 30-40% for the child.

6. Summary and Conclusions

The distribution of the values of the analysis parameters (except Ee) of the LF model is shown in Fig. 15. These results are from 12 seconds of adult male speech and 2.5 seconds of a child's voice. Two alternative parameters, Fa and Oq, were also calculated. Fa is inversely proportional to the absolute duration of the return phase and is thus independent of the fundamental period. Except in cases where there are very large variations in F0, it can be assumed that Fa increases when ra decreases. The "open quotient", Oq, is here defined as the ratio between the time from glottal opening to the main excitation and the fundamental period, $O_q = t_e/T_0 = (1 + r_k)/(2r_g)$. This means that small values of rg and large values of rk correlate with large values of Oq.

The histograms in Fig. 15 illustrate the range of values for the parameters in normal speech. Parameter values are mostly confined to a narrow part of the distribution. Glottal pulse shapes that differ considerably from the "typical" shape constitute only a small percentage of the total number. It is most likely, however, that they contribute significantly to the perceived naturalness, an assumption corroborated by the resynthesis that was carried out. I do not claim that these limited samples of speech are necessarily representative of all normal speech. There might indeed be significant differences between this type of speech material and more casual or spontaneous speech. Nevertheless, they provide a good initial estimate of the necessary control range of these parameters.

Fig. 15. Distribution of rg, rk, ra, Fa, and Oq values, as a percentage of the total duration. (a) 12 seconds of male voices, three subjects. (b) 2.5 seconds of a child's voice. Note that in several of the histograms the x-axis is not consistently linear.
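The two derived parameters, Oq and Fa, can be computed directly from the LF ratios. The sketch takes Oq = (1 + rk)/(2rg) from the definition in this section. For Fa it assumes the form Fa = 1/(2πTa) with Ta = ra·T0. That is the usual LF-literature definition and is consistent with Fa being inversely proportional to the duration of the return phase, but the constant is not given in this section.

```python
import math


def open_quotient(rg: float, rk: float) -> float:
    """Oq = te/T0 = (1 + rk) / (2 rg), with rg and rk given as fractions."""
    return (1.0 + rk) / (2.0 * rg)


def fa_hz(ra: float, f0_hz: float) -> float:
    """Fa = 1 / (2 pi Ta), where Ta = ra * T0 = ra / F0 (assumed LF-literature form)."""
    ta_s = ra / f0_hz
    return 1.0 / (2.0 * math.pi * ta_s)


if __name__ == "__main__":
    # Adult ranges quoted in the text: rg ~ 1.0-1.2, rk ~ 0.25-0.35.
    print(open_quotient(1.2, 0.25), open_quotient(1.0, 0.35))
    # The same ra at twice the F0: Ta halves, so Fa doubles.
    print(fa_hz(0.03, 120.0), fa_hz(0.03, 240.0))
```

With the adult ranges quoted earlier (rg about 100-120%, rk about 25-35%), Oq comes out between about 0.52 and 0.68, consistent with Oq rarely falling below 50%. Setting rk = rg = 100% gives Oq = 1, the limiting case described in Section 5.3, where the excitation vanishes. The example also shows why ra and Fa can diverge when F0 changes.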

The limited data in this study obviously preclude a statistical analysis of the differences between adult male voices and children's voices in general. Bearing this in mind, the differences between the values of the parameters ra and Fa are nevertheless striking. It is very likely that a larger dynamic leakage is a characteristic feature of children's voices compared to adult male voices. Other tendencies found in these data are higher values of rk and Oq for the child's voice; the very high values of rk (> 60%) in particular are more frequent for the child. One might also note, for both the adult males and the child, that Oq values of less than 50% are very infrequent, i.e. the closed portion (or the flat portion, if there is a constant leakage through the glottis) of the glottal cycle is almost always shorter than half the fundamental period. The values of rg in these data are slightly lower for the child than for the adult males.

It transpires that a lot of information about the glottal pulse shape can be inferred directly from the negative amplitude of the speech waveform. There is a strong correlation between it and the excitation strength, Ee (as one might expect intuitively). Ee is the main determinant of the initial amplitude of F1, which in turn is the main determinant of the negative amplitude of the speech waveform. (Phenomena such as superposition of formant oscillations from previous periods mean that the relationship will not be perfectly linear.)

The return phase ra, for its part, varies inversely with the excitation, which can be understood in the following way. If, at the main excitation, the vocal folds do not make contact completely along their entire length, this will cause a residual flow. When the excitation is strong, the vocal folds are closing rapidly, and therefore the time from excitation to complete closure (or maximum closure, if there is constant leakage) will be short. Assuming a similar closing gesture, a weak excitation implies a slower rate of vocal fold closure, and therefore the time from excitation to complete closure will be longer.
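The inverse relation between ra and Ee can be made concrete with the exponential return phase of the LF model. There the flow derivative recovers from -Ee at te towards zero at tc, and the flow still to be shut off at te works out to roughly Ee·ta when ta is small compared with tc - te. Read the other way, a given residual flow U that is shut off by a stronger excitation needs a shorter ta, i.e. ta ≈ U/Ee. The sketch below checks that approximation numerically. The return-phase equation is the standard LF form, assumed here because this section does not restate it; the variable names, tb = tc - te, and all numeric values are illustrative assumptions, not data from this study.

```python
import math


def lf_epsilon(ta: float, tb: float, iters: int = 100) -> float:
    """Solve eps * ta = 1 - exp(-eps * tb) for the LF return-phase constant.

    tb = tc - te is the time from the main excitation to the end of the
    return phase. Fixed-point iteration started at 1/ta converges to the
    non-trivial root when ta is well below tb.
    """
    eps = 1.0 / ta
    for _ in range(iters):
        eps = (1.0 - math.exp(-eps * tb)) / ta
    return eps


def residual_flow(Ee: float, ta: float, tb: float) -> float:
    """Flow shut off during the return phase, i.e. the flow still present at te.

    Closed-form integral of -E(t) over [te, tc] for the LF return phase
    E(t) = -(Ee / (eps * ta)) * (exp(-eps * (t - te)) - exp(-eps * tb)).
    """
    eps = lf_epsilon(ta, tb)
    return Ee / eps - Ee * tb * math.exp(-eps * tb) / (eps * ta)


T0 = 0.010    # fundamental period (s), F0 = 100 Hz
tb = 0.003    # assumed time from te to tc (s)
ta = 0.0003   # return-phase time constant (s), i.e. ra = 3 %
Ee = 1.0      # excitation strength (arbitrary units)

U = residual_flow(Ee, ta, tb)
print(f"residual flow {U:.3e} vs Ee*ta {Ee * ta:.3e}")

# The same residual flow shut off by weaker or stronger excitations: ta ~ U / Ee.
for Ee_k in (0.5, 1.0, 2.0):
    ta_k = U / Ee_k
    print(f"Ee = {Ee_k:.1f}: ta ~ {1e3 * ta_k:.2f} ms, ra ~ {100 * ta_k / T0:.1f} %")
```

Doubling Ee halves the estimated ta, and thus ra, for an unchanged residual flow. This matches the qualitative argument in the text.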

Table III. Typical values for the parameters ra, rk, and rg for different sounds in different contexts, generalized from the inverse filtered data. See text for comments.

As stated earlier, there is a covariation between ra and rk, which can be explained if rg is fairly constant. When ra is small (and, from the reasoning above, Ee is large), the closing time from maximum flow to the excitation will be short, causing a skewed pulse and thus a low rk value. Assuming similar opening and closing gestures and about the same peak flow, a large value of ra and a small value of Ee imply a longer falling time and therefore a high value of rk. For one of the subjects (LN, cf. Fig. 10b), rg covaried somewhat with Ee, which means that the opening time gets shorter as the excitation gets stronger. This will of course counteract the correlation between rk and ra and make the skewing more independent of the excitation strength. However, in these materials the covariation between rk and ra is the dominant trend.
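The argument for the covariation of ra and rk can be illustrated with a deliberately crude toy model. This model is a hypothetical simplification for illustration, not the LF model itself and not the author's method. It holds tp fixed (constant rg), keeps peak flow and residual flow unchanged, and lets both the falling time te - tp and the return time ta scale as 1/Ee, following the closing-rate reasoning above. Under these assumptions rk and ra rise and fall together. The scale constants are arbitrary illustrative values.

```python
def toy_rk_ra(Ee: float, T0: float = 0.010, tp: float = 0.004,
              fall_scale: float = 1.6e-3, return_scale: float = 3e-4) -> tuple[float, float]:
    """Hypothetical toy model relating rk and ra to the excitation strength.

    te - tp = fall_scale / Ee     (faster closure for a stronger excitation)
    ta      = return_scale / Ee   (cf. residual flow ~ Ee * ta)
    tp, and therefore rg = T0 / (2 tp), is held constant.
    """
    t_fall = fall_scale / Ee
    ta = return_scale / Ee
    rk = t_fall / tp
    ra = ta / T0
    return rk, ra


for Ee in (0.5, 1.0, 2.0):
    rk, ra = toy_rk_ra(Ee)
    print(f"Ee = {Ee:.1f}: rk = {100 * rk:.0f} %, ra = {100 * ra:.1f} %")
```

Making tp shrink as Ee grows would mimic subject LN. rk would then fall less steeply with Ee, which weakens the rk/ra correlation in the way the text describes.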

The above discussion of the interdependencies between the different glottal parameters is somewhat simplified and is mainly intended to give the reader an intuitively graspable explanation of the covariations found in these data. A linear relationship between the glottal area and the airflow through the glottis is assumed, i.e. influences of the vocal tract load and the glottal inductance are not taken into account. Furthermore, the peak flow, Up, is assumed to be proportional to the glottal opening time (and consequently also to tp). For some of the analysed materials, the parameters did not covary exactly in accordance with the general trends just described, showing that the simplifying assumptions made here will not always hold. Further, for other voice types (and for non-modal voice qualities) one might well find other correlations between the model parameters.

In Table III, typical values for the parameters of the LF model in different contexts are presented. They may serve as a starting point for a useful dynamic voice source governed by rules. A first tentative approach to controlling the LF model in a text-to-speech system could simply be to let Ee be controlled by the amplitude parameter of the conventional exponential pulse source. The other parameters should then be determined from the value of Ee, with different proportionality coefficients chosen to give the best fit with the typical parameter values in Table III. As the main purpose of the table is to lay down the broad outlines for developing a source rule system, I have also included parameter values for sounds that were not analysed. These are of course only an approximate first guess based on the values of similar sounds, and will eventually need to be specified more precisely.
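One possible reading of this first approach is sketched below. Each remaining parameter is written as a coefficient times a chosen function of Ee, and the coefficient for a given context is fitted by least squares to target values such as those in Table III. Several details are hypothetical: the functional form (1/Ee for ra, following the inverse relation discussed above), the function names, and the synthetic data, which are generated from a known coefficient so that the fit can be checked. Table III itself is not reproduced here.

```python
from typing import Callable, Sequence


def fit_coefficient(ee: Sequence[float], target: Sequence[float],
                    basis: Callable[[float], float]) -> float:
    """Least-squares c minimizing sum((target - c * basis(Ee))**2).

    Closed form: c = sum(t * g) / sum(g * g), with g = basis(Ee).
    """
    g = [basis(e) for e in ee]
    num = sum(t * gi for t, gi in zip(target, g))
    den = sum(gi * gi for gi in g)
    if den == 0.0:
        raise ValueError("basis is zero for every Ee value")
    return num / den


def rule_value(Ee: float, coeff: float, basis: Callable[[float], float]) -> float:
    """Parameter value predicted by the (hypothetical) rule for a given Ee."""
    return coeff * basis(Ee)


def inverse(e: float) -> float:
    """Assumed basis for ra: inversely proportional to the excitation strength."""
    return 1.0 / e


ee_samples = [0.5, 1.0, 1.5, 2.0]             # synthetic Ee values (arbitrary units)
ra_samples = [0.03 / e for e in ee_samples]   # synthetic targets generated with c = 0.03
c_ra = fit_coefficient(ee_samples, ra_samples, inverse)
print(f"fitted coefficient {c_ra:.4f}; ra at Ee = 1.2: {rule_value(1.2, c_ra, inverse):.4f}")
```

Keeping the basis function separate from the fitted coefficient lets each parameter (ra, rk, rg) and each context have its own form and constant, while Ee remains the single driving input.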

Comments on Table III:

(1) If devoicing is effected by a glottal stop, ra should be reduced by a factor of 2-4, rk should be increased by a factor of 1.5-2, and rg should be increased by a factor of 1.3-2.

(2) The /l/ sound may exhibit a pattern similar to the voiced stops. In those cases, the ra value at oral release should be 1-3 times the initial ra value. This pattern is more likely to be found in stressed syllables, and is normally accompanied by a decrease in the excitation, Ee. The ra value for nasals may be increased by a factor of 1-2.

(3) rg may covary moderately with the excitation, Ee.
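The adjustment rules in comments (1) and (2) can be written as simple multiplicative ranges. The sketch below encodes only the factors stated in those comments and returns the lower and upper bounds of each adjusted parameter. The function names, the dictionary layout, and the example base values are illustrative assumptions, since the table values themselves are not reproduced here.

```python
# Multiplicative ranges taken from comments (1) and (2) on Table III.
GLOTTAL_STOP_DEVOICING = {
    "ra": (1 / 4, 1 / 2),   # reduced by a factor of 2-4
    "rk": (1.5, 2.0),       # increased by a factor of 1.5-2
    "rg": (1.3, 2.0),       # increased by a factor of 1.3-2
}
L_ORAL_RELEASE_RA = (1.0, 3.0)   # ra at oral release of /l/: 1-3 times the initial ra
NASAL_RA = (1.0, 2.0)            # ra for nasals: increased by a factor of 1-2


def scaled_range(value: float, factors: tuple[float, float]) -> tuple[float, float]:
    """Apply a (low, high) pair of multiplicative factors to one parameter value."""
    lo, hi = factors
    return value * lo, value * hi


def glottal_stop_devoicing(params: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Range of ra, rk, and rg when devoicing is effected by a glottal stop."""
    return {name: scaled_range(params[name], f)
            for name, f in GLOTTAL_STOP_DEVOICING.items()}


base = {"ra": 0.04, "rk": 0.35, "rg": 1.1}   # hypothetical base values, not from Table III
print(glottal_stop_devoicing(base))
print("/l/ release ra:", scaled_range(base["ra"], L_ORAL_RELEASE_RA))
print("nasal ra:", scaled_range(base["ra"], NASAL_RA))
```

Returning ranges rather than single values leaves the choice within each range to the rule system. For example, the choice could follow stress, which comment (2) links to the /l/ pattern.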


7. Discussion

The materials analysed here mainly shed light on the general behaviour of the voice source in a normal dynamic context. For future studies it would be interesting to investigate whether there are any significant inherent differences in glottal pulse shape between different voiced sounds. Obviously vowels differ considerably from most voiced consonants, but there could conceivably also be physiologically related dependencies that lead to differences in voice source characteristics from one vowel to another, or at least to differences between groups of vowels, say, back and front vowels (cf. footnote on p. 144).

(For effects of various degrees of vocal tract constriction on the voice source, see Bickley & Stevens, 1986.) One might also ask whether there are inherent voice source differences between nasalized and non-nasalized vowels, as well as source differences which correlate with voice loudness (see, for instance, Holmberg, Hillman, & Perkell, 1987), vocal fold tension, F0, etc. (For F0 dependencies, see Fant, 1982a; Fant & Ananthapadmanabha, 1982; Monsen & Engebretson, 1977.) To answer some of these questions, controlled speech materials will be needed in which variables that affect the glottal airflow, other than the one under investigation, are held as constant as possible. This procedure will ensure that the differences found are really related to the feature studied, and not caused by other covarying properties.

Alternative analysis parameters have been mentioned in Sections 2 and 6. The question remains as to which quantities are phonetically the most relevant. Unfortunately, it is not a simple decision. To give an example: originally we defined the open quotient as $(t_e + t_a)/T_0 = O_q + r_a$, according to the traditional definition. However, with ra included, the resulting open quotient values depended very much on the return phase. Because of the importance of the return phase for the source spectrum, it seems reasonable to describe it with one independent parameter (ra, Fa, or ta). Using another parameter that includes the return phase would give redundant information and obscure other dynamic source properties. Therefore the open quotient is defined as $O_q = t_e/T_0$. Furthermore, the values of Oq seem closer to earlier data on the ratio between the glottal open period and the fundamental period, where the return phase was normally disregarded.
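The difference between the two open-quotient definitions is easy to see numerically. Under the traditional definition, (te + ta)/To = Oq + ra, any change in the return phase shifts the open quotient. Oq = te/To is unaffected by ta. The timing values and function names below are illustrative only.

```python
def oq_current(te: float, T0: float) -> float:
    """Open quotient as used in this study: te / T0 (return phase excluded)."""
    return te / T0


def oq_traditional(te: float, ta: float, T0: float) -> float:
    """Earlier definition including the return phase: (te + ta) / T0 = Oq + ra."""
    return (te + ta) / T0


T0, te = 0.010, 0.0055   # F0 = 100 Hz; main excitation at 5.5 ms
for ta in (0.0002, 0.0005, 0.0010):
    ra = ta / T0
    print(f"ra = {100 * ra:4.1f} %: te/T0 = {oq_current(te, T0):.3f}, "
          f"(te+ta)/T0 = {oq_traditional(te, ta, T0):.3f}")
```

Only the second column moves as ra changes. This is the redundancy with the return-phase parameter that the text argues against.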

The excitation strength is of paramount importance and should naturally be described by a single parameter, here Ee. The return phase is also very important, but it is not obvious which of the representations ra, Fa, and ta is preferable. ra has the advantage of providing intuitively graspable information on the glottal pulse shape. Fa, which is the inverse of ta multiplied by a constant, is independent of F0 but instead directly correlated to the source spectrum. For the two remaining parameters there exist several candidates. Three have already been mentioned above (rk, rg, and Oq), but any of the following might have been used: Ai = Ei/Ee, tp/tn = 1/rk, closed quotient Cq = 1 - Oq, Fg, etc. The two parameters rk and rg mainly affect the low-frequency energy of the source spectrum. Simplistically, one might say they represent the level and frequency of the "glottal formant", even though the properties "level" and "frequency" cannot be described by independent time-domain parameters.
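The trade-off between ra and Fa can be checked directly. The sketch assumes the usual LF-model relations ra = ta/To, Fa = 1/(2π ta), rg = To/(2tp), and rk = (te - tp)/tp; the constant in Fa is not stated in this section. Under these relations, the same return-phase time ta gives one Fa value at any F0 but different ra values. The sketch also derives two of the alternative parameters listed above, tp/tn = 1/rk and the closed quotient Cq = 1 - Oq, reading tn as te - tp, which is what tp/tn = 1/rk implies. The function name and example values are illustrative.

```python
import math


def source_descriptors(T0: float, tp: float, te: float, ta: float) -> dict[str, float]:
    """Alternative LF-type descriptors from one cycle's timing (seconds).

    Assumed definitions: rg = T0/(2 tp), rk = (te - tp)/tp, ra = ta/T0,
    Fa = 1/(2 pi ta), Oq = te/T0, Cq = 1 - Oq, tn = te - tp.
    """
    tn = te - tp
    oq = te / T0
    return {
        "rg": T0 / (2 * tp),
        "rk": tn / tp,
        "ra": ta / T0,
        "Fa_Hz": 1.0 / (2 * math.pi * ta),
        "Oq": oq,
        "Cq": 1.0 - oq,
        "tp/tn": tp / tn,
    }


ta = 0.0003   # 0.3 ms return phase, kept fixed while F0 changes
for f0 in (100.0, 200.0):
    T0 = 1.0 / f0
    d = source_descriptors(T0, tp=0.4 * T0, te=0.55 * T0, ta=ta)
    print(f"F0 = {f0:.0f} Hz: ra = {100 * d['ra']:.1f} %, Fa = {d['Fa_Hz']:.0f} Hz, "
          f"Cq = {d['Cq']:.2f}, tp/tn = {d['tp/tn']:.2f}, 1/rk = {1 / d['rk']:.2f}")
```

Doubling F0 doubles ra while Fa stays the same. This illustrates why Fa ties more directly to the source spectrum and ra more directly to the pulse shape within the period.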

Some of the utterances were resynthesized, using formant values taken from the inverse filtered data and source parameter values from the matched source data. The child's voice is most strikingly improved with the more sophisticated model, but the synthesis quality of the adult male voices is also considerably enhanced.

Most of the work here was concerned with adult male voices. It would be useful in the future to focus on the voices of women and children if we are to get a more complete picture of both voice source properties and the versatility of the LF model.

Acknowledgments

The presented work was supported by the Swedish Telecom Administration (Televerket). The research was carried out at the Dept. of Speech Communication and Music Acoustics, KTH.

I wish to thank Prof. Gunnar Fant for his invaluable help and guidance. I am also indebted to Johan Liljencrants and T.V. Ananthapadmanabha for providing the computer programs, to Lennart Nord, who did the recordings, and to Gudrun Tannergård, who drew most of the figures. I am grateful to Ailbhe Ní Chasaide for discussing many aspects of this work and for critical comments on the manuscript.

References

Ananthapadmanabha, T.V. (1984): "Acoustic analysis of voice source dynamics", STL-QPSR 2-3/1984, pp. 1-24.

Ananthapadmanabha, T.V. & Fant, G. (1982): "Calculation of true glottal flow and its components", Speech Communication 1, pp. 167-184; also in STL-QPSR 1/1982, pp. 1-30.

Bickley, C.A. & Stevens, K.N. (1986): "Effects of a vocal-tract constriction on the glottal source: experimental and modelling studies", J. of Phonetics 14, No. 3/4, pp. 373-382.

Carlson, R., Granström, B., & Hunnicutt, S. (1981): "A multi-language text-to-speech module", STL-QPSR 4/1981, pp. 18-28.

Cranen, B. & Boves, L. (1985): "Pressure measurements during speech production using semiconductor miniature pressure transducers: impact on models for speech production", J.Acoust.Soc.Am. 77:4, pp. 1543-1551.


Fant, G. (1960): The Acoustic Theory of Speech Production, Mouton, The Hague (2nd edition 1970).

Fant, G. (1979a): "Glottal source and excitation analysis", STL-QPSR 1/1979, pp. 85-107.

Fant, G. (1979b): "Vocal source analysis - a progress report", STL-QPSR 3-4/1979, pp. 31-54.

Fant, G. (1980): "Voice source dynamics", STL-QPSR 2-3/1980, pp. 17-37.

Fant, G. (1981): "The source filter concept in voice production", STL-QPSR 1/1981, pp. 21-37.

Fant, G. (1982a): "Preliminaries to analysis of the human voice source", STL-QPSR 4/1982, pp. 1-27.

Fant, G. (1982b): "The voice source - acoustic modeling", STL-QPSR 4/1982, pp. 28-48.

Fant, G. (1987): "Interactive phenomena in speech production", pp. 376-381 in Proc. XIth ICPhS, Tallinn, Estonia, USSR, Aug. 1987, Vol. 3, Estonian Academy of Sciences.

Fant, G. & Ananthapadmanabha, T.V. (1982): "Truncation and superposition", STL-QPSR 2-3/1982, pp. 1-17.

Fant, G. & Lin, Q. (1987): "Glottal source - vocal tract acoustic interaction", STL-QPSR 1/1987, pp. 13-27.

Fant, G. & Sonesson, B. (1962): "Indirect studies of glottal cycles by synchronous inverse filtering and photo-electrical glottography", STL-QPSR 4/1962, pp. 1-3.

Fant, G., Liljencrants, J., & Lin, Q. (1985a): "A four-parameter model of glottal flow", French-Swedish Seminar on Speech, Grenoble, April 1985; also in STL-QPSR 4/1985, pp. 1-13.

Fant, G., Lin, Q., & Gobl, C. (1985b): "Notes on glottal flow interaction", STL-QPSR 2-3/1985, pp. 21-45.

Fant, G., Nord, L., & Kruckenberg, A. (1987a): "Segmental and prosodic variabilities in connected speech. An applied data-bank study", pp. 102-105 in Proc. XIth ICPhS, Tallinn, Estonia, USSR, Aug. 1987, Vol. 6, Estonian Academy of Sciences.

Fant, G., Gobl, C., Karlsson, I., & Lin, Q. (1987b): "The female voice - experiments and overview", J.Acoust.Soc.Am. 82, S92(A).

Gauffin, J. & Sundberg, J. (1980): "Data on the glottal voice source behavior in vowel production", STL-QPSR 2-3/1980, pp. 61-70.

Gobl, C. (1985): "Röstkällans variation i tal" [Variation of the voice source in speech], unpublished thesis work.


Ní Chasaide, A. (1985): "Preaspiration in phonological stop contrasts", unpublished Ph.D. thesis, University College of North Wales, Bangor, March 1985.

Ní Chasaide, A. (1987): "Glottal control of aspiration and of voicelessness", pp. 28-31 in Proc. XIth ICPhS, Tallinn, Estonia, USSR, Vol. 6, Estonian Academy of Sciences.

Ní Chasaide, A. & Gobl, C. (1987): "Cross-language study of the effects of voiced/voiceless consonants on the vowel voice source characteristics", J.Acoust.Soc.Am. 82, S116(A).

Nord, L., Ananthapadmanabha, T.V., & Fant, G. (1984): "Signal analysis and perceptual tests of vowel responses with an interactive source filter model", STL-QPSR 2-3/1984, pp. 25-52.

Rosenberg, A.E. (1971): "Effect of glottal pulse shape on the quality of natural vowels", J.Acoust.Soc.Am. 49:2, pp. 583-598.

Rothenberg, M. (1973): "A new inverse-filtering technique for deriving the glottal air flow waveform during voicing", J.Acoust.Soc.Am. 53, pp. 1632-1645.

Rothenberg, M. (1981): "Acoustic interaction between the glottal source and the vocal tract", pp. 305-323 in (K.N. Stevens & M. Hirano, eds.) Vocal Fold Physiology, University of Tokyo Press.

Rothenberg, M. (1983): "An interactive model for the voice source", pp. 155-165 in (D.M. Bless & J.H. Abbs, eds.) Vocal Fold Physiology, College-Hill, San Diego, CA; also in STL-QPSR 4/1981, pp. 1-17.

Rothenberg, M., Carlson, R., Granström, B., & Lindqvist-Gauffin, J. (1974): "A three-parameter voice source for speech synthesis", pp. 235-243 in (G. Fant, ed.) Speech Communication, Vol. 2, Almqvist & Wiksell Int., Stockholm.

Sondhi, M.M. (1975): "Measurement of the glottal waveform", J.Acoust.Soc.Am. 57:1, pp. 228-232.

Sundberg, J. (1972): "An articulatory interpretation of the 'singing formant'", STL-QPSR 1/1972, pp. 45-53; also in J.Acoust.Soc.Am. 55:4, pp. 838-844.

Sundberg, J. & Gauffin, J. (1979): "Waveform and spectrum of the glottal voice source", pp. 301-322 in (B. Lindblom & S. Öhman, eds.) Frontiers of Speech Communication, Academic Press, London; also in STL-QPSR 2-3/1978, pp. 35-50.