SPPA 403 Speech Science 1
Unit 3 outline
• The Vocal Tract (VT)
• Source-Filter Theory of Speech Production
• Capturing Speech Dynamics
• The Vowels
• The Diphthongs
• The Glides
• The Liquids
Source-Filter Theory: Modeling Vowels
Vowels are
• voiced (the VT shapes the glottal source spectrum)
• oral (velopharyngeal port closed)
• produced with an “open” vocal tract (VT)
– P_oral ≈ P_atmos (oral pressure is about equal to atmospheric pressure)
• classified according to tongue position in the VT
– front/central/back
– high/mid/low
Vowel Quadrilateral
Source-Filter Theory: Modeling Vowels
• Mobile articulators change the VT area function so that it is not constant
• A non-constant area function adds complexity to determining the VT transfer function
• However, the VT transfer function is still based on tube acoustics
[Figure: articulatory configuration → area function (glottis to lips) → transfer function (gain vs. frequency)]
Source-Filter Theory: Modeling Vowels
• The VT has an infinite number of resonances/formants
• Identification of vowel quality seems most dependent upon the location of F1, F2 & F3
• These observations are based on
– studies of vowel perception
– modeling efforts which suggest F4-F6 are relatively static
– the observation that the glottal source spectrum rolls off with increasing frequency
[Figure: VT transfer functions (gain vs. frequency) for /i/, /u/, /a/ and /ae/, compared with the mid central vowel: F1 = 500 Hz, F2 = 1500 Hz, F3 = 2500 Hz]
Tongue “Rules” for vowel formant values
Tongue height ~ F1
• /i/ & /u/ have a low F1
• /a/ & /ae/ have a high F1
• Higher tongue → lower F1; lower tongue → higher F1
Tongue A-P (anterior-posterior) position ~ F2
• /u/ & /a/ have a low F2
• /i/ & /ae/ have a high F2
• Tongue front → higher F2; tongue back → lower F2
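The tongue rules can be read in reverse, as a rough guess at tongue position from measured formants. The sketch below is only an illustration, not a real vowel classifier: it compares F1 and F2 with the mid central reference values used on these slides (500 Hz and 1500 Hz). The function name, the use of those values as hard thresholds, and the two test vowels are assumptions made for the example.

```python
# Reference (mid central vowel) formants from the slides, in Hz.
NEUTRAL_F1 = 500.0
NEUTRAL_F2 = 1500.0


def tongue_position(f1_hz, f2_hz):
    """Guess tongue height and advancement from F1 and F2 (hypothetical helper).

    Rule 1: tongue height varies inversely with F1.
    Rule 2: a fronted tongue raises F2, a backed tongue lowers it.
    """
    height = "high" if f1_hz < NEUTRAL_F1 else "low"
    advancement = "front" if f2_hz > NEUTRAL_F2 else "back"
    return height, advancement


# Illustrative formant values (assumed, roughly adult male).
print(tongue_position(270, 2290))  # an /i/-like vowel
print(tongue_position(730, 1090))  # an /a/-like vowel
```

A two-way split ignores mid and central vowels, so treat the output as a direction relative to the neutral vowel rather than a category.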
Lip “Rules” for vowel formant values
• Lip rounding (for /u/) lowers F2
• Note: lip protrusion increases the overall length of the vocal tract, which decreases all formant values
IMPORTANT
• Tongue and lip rules are based on how these articulations change the VT area function (shape)
• VT area function ultimately determines the VT filter properties
Tf32 example
F1-F2 values for English Vowels
Vowels: Stylized Spectrograms
/a/ - low back (high F1, low F2)
/i/ - high front (low F1, high F2)
/u/ - high back (low F1, low F2)
/ae/ - low front (high F1, high F2)
How important are F1-F3 in speech production & perception?
Sinewave Speech Demonstration
[Audio: sinewave speech examples from the HINT sentence intelligibility test]
Problem
• Recall: F1 = c/(4l), where c is the speed of sound and l is the VT length
• VT length influences the exact frequency location of formants
• Speakers vary in their vocal tract length
• men > women > children
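To see how much VT length matters, the quarter-wavelength relation can be evaluated for a few tube lengths. For a uniform tube closed at one end, F1 = c/4l generalizes to F_n = (2n - 1)c/4l. The sketch below assumes c = 35,000 cm/s and uses made-up, illustrative tract lengths for the three speaker groups (they are not taken from the slides). A 17.5 cm tube reproduces the mid central vowel values of 500, 1500 and 2500 Hz.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for the speed of sound


def tube_formants(length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * l)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


# Assumed, illustrative vocal tract lengths in cm (not from the slides).
speakers = {"man": 17.5, "woman": 15.0, "child": 12.0}

for name, length_cm in speakers.items():
    formants = [round(f) for f in tube_formants(length_cm)]
    print(name, length_cm, "cm ->", formants, "Hz")
```

Shorter tubes give higher resonances across the board, which is the problem the slide raises: the same vowel does not have the same absolute formant frequencies for every speaker.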
[Figure: the problem illustrated for /i/ and /u/]
How do we know that a child, a man and a woman all say /i/, when the acoustic values of the formants are quite different?
A possible answer??
[Figure: F1-F2 plot with separate vowel spaces for children, women and men]
A possible answer??
• The relative locations of formants are similar across speakers even though absolute values differ
• Perhaps we ‘rescale’ our expectations depending upon factors such as gender and age
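One way to picture "relative locations" is with formant ratios. Under the uniform-tube simplification, a shorter vocal tract scales every formant up by the same factor, so ratios such as F2/F1 do not change. The sketch below assumes that simplification. The /i/-like adult values and the 0.8 length ratio are illustrative numbers, not data from the slides, and real speakers do not scale this uniformly.

```python
def scale_formants(formants_hz, length_ratio):
    """Scale formants for a vocal tract whose length is length_ratio times the original.

    Uniform-tube assumption: formant frequency is inversely proportional to length.
    """
    return [f / length_ratio for f in formants_hz]


def formant_ratios(formants_hz):
    """Express each formant relative to F1, so the absolute scale drops out."""
    f1 = formants_hz[0]
    return [round(f / f1, 3) for f in formants_hz]


adult_i = [270.0, 2290.0, 3010.0]        # assumed /i/-like values, Hz
child_i = scale_formants(adult_i, 0.8)   # assumed: tract 80% as long

print([round(f) for f in child_i])       # absolute values are higher
print(formant_ratios(adult_i) == formant_ratios(child_i))
```

This is only one candidate for how a listener might 'rescale'; the slides leave the mechanism open.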
Vowel articulation and vowel acoustics
• Vowel quadrilateral: articulatory plane
is similar to
• F1-F2 plot: acoustic plane
[Figure: vowel plot with axes running back to front and low to high]
Diphthongs
• Slow gliding movement between two vowel qualities
• characterized by an articulatory transition
• articulatory transition = formant transitions
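The link between articulatory and formant transitions can be sketched as a formant track that moves from one vowel target to another. The example below interpolates linearly between two targets, which is a simplification (real transitions are not straight lines). The F1/F2 targets for the two ends of /ai/ are assumed, illustrative values, and the function name is hypothetical.

```python
def formant_track(start_hz, end_hz, steps):
    """Linearly interpolate each formant from the first vowel target to the second.

    steps must be at least 2 (one onset frame and one offset frame).
    """
    track = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at onset, 1.0 at offset
        track.append([round(s + t * (e - s)) for s, e in zip(start_hz, end_hz)])
    return track


# Assumed, illustrative F1/F2 targets in Hz for the two ends of /ai/.
a_target = [730, 1090]
i_target = [270, 2290]

for frame in formant_track(a_target, i_target, steps=5):
    print(frame)
```

Across the frames F1 falls while F2 rises, matching a tongue that moves from low back toward high front.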
Diphthongs
• /ai/ - “bye”
• /au/ - “bough”
• /oi/ - “boy”
• /ei/ - “bay”
[Figure: diphthong /ai/, moving from /a/ to /i/]
[Figure: diphthong /au/, moving from /a/ to /u/]
Glides (/w/, /j/) & Liquids (/l/, /r/)
• often termed sonorants
• Associated with
– a high degree of vocal tract constriction
– articulatory transition = formant transition
Glides (/w/, /j/) & Liquids (/l/, /r/)
Degree of Constriction
• Greater than vowels
– P_oral slightly greater than P_atmos
• Less than fricatives
– P_oral for glides/liquids < P_oral for fricatives
• Constriction lasts ~ 100 msec
• Constriction results in a loss in energy
– weaker formants
Glides (/w/, /j/) & Liquids (/l/, /r/)
Transition rate
• faster than the diphthongs
• slower than the stops
• lasts ~ 75 msec
/w/
• Place: labial
• Acoustics
– /u/-like formant frequencies
– Constriction lowers formant values
– F1 ~ 330 Hz
– F2 ~ 730 Hz
– weak F3 (~ 2300 Hz)
[Figure: /w/ in “uh w ae”, with F1, F2 and F3 tracks; frequency axis marked at 1000, 2000 and 3000 Hz]
/j/
• Place: palatal
• Acoustics
– /i/-like formant frequencies
– F1 ~ 300 Hz
– F2 ~ 2200 Hz
– F3 ~ 3000 Hz
[Figure: /j/ in “uh j ae”, with F1, F2 and F3 tracks; frequency axis marked at 1000, 2000 and 3000 Hz]
[Figure: /j/ in “uh j ae”]
Liquids (/l/, /r/)
• lateral /l/
• Retroflex /r/
• Pickett (1999) considers these consonants glides as well
/l/
• Place: alveolar
• Articulatory phonetics:
– Tongue tip contacts the alveolar ridge
– Constriction is on each side of this obstruction, hence the term lateral
– Vocal tract is split, so it is not modeled with a single tube
/l/
• Acoustics
– F1 ~ 360 Hz
– F2 ~ 1300 Hz
– F3 ~ 2700 Hz
– F2 is variable and affected by vowel environment
– Position in word will affect acoustic features of /l/
– Final /l/ will have a higher F1 & lower F2
[Figure: /l/ in “uh l ae”, with F1, F2 and F3 tracks; frequency axis marked at 1000, 2000 and 3000 Hz]
[Figure: /l/ in “uh l ae”]
/r/
• Place: palatal
• Articulatory phonetics
– /r/ can take on a wide variety of articulator positions
– Tongue can be “bunched” together
– Tongue can be “retroflexed”, tipping back toward the palate
– Clearly illustrates that many articulatory configurations can result in the same acoustic product
/r/
• Acoustics
– Hallmark of /r/ is a low F3
– F1 ~ 350 Hz
– F2 ~ 1050 Hz
– F3 ~ 1550 Hz
– Vowels have F3 above 2200 Hz
– Vowels around /r/ are colored by it and exhibit lowered F3 values
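The approximate formant values quoted for /w/, /j/, /l/ and /r/ can be lined up to show how far apart the four sounds sit acoustically. The sketch below is a toy nearest-prototype lookup, not a recognizer: it treats the slide values as fixed prototypes and uses plain Euclidean distance in Hz, both of which are assumptions made for illustration. The function name and the test measurement are hypothetical.

```python
import math

# Approximate formant values (Hz) quoted on the slides for each sonorant.
PROTOTYPES = {
    "w": (330, 730, 2300),
    "j": (300, 2200, 3000),
    "l": (360, 1300, 2700),
    "r": (350, 1050, 1550),
}


def nearest_sonorant(f1, f2, f3):
    """Return the prototype closest to (f1, f2, f3) by Euclidean distance in Hz."""
    measured = (f1, f2, f3)
    return min(PROTOTYPES, key=lambda label: math.dist(measured, PROTOTYPES[label]))


print(nearest_sonorant(340, 1100, 1600))  # the low F3 pulls this toward /r/
```

F1 is nearly the same for all four sounds, so F2 and F3 do most of the work. For /r/ it is the low F3 that separates it from /l/, whose F1 and F2 are close to those of /r/.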
[Figure: /r/ in “uh r ae”, with F1, F2 and F3 tracks; frequency axis marked at 1000, 2000 and 3000 Hz]
[Figure: “bunched” /r/ in “uh r ae”]
[Figure: “retroflexed” /r/ in “uh r ae”]
A digression…
• /r/ demonstrates that there isn’t a single way to make a speech sound
• /r/ serves to remind us that our source-filter theory allows educated guesses that may not always be right.
• For example, how would you know from acoustics (i.e. formants) if the person is bunching or retroflexing?
A digression…
• /r/ is a problematic sound for many youngsters to learn
• Why might that be?