
SPEECH AND SPECTRAL ANALYSIS


Sound waves: production

• in general:
  • acoustic interference
  • vibration (carried by some propagation medium)
  • variations in air pressure
• speech:
  • actions of the articulatory organs -> vibrations
  • propagation medium -> airstream

Representation of fluctuations in air pressure caused by a vibrating tuning fork (from P. Ladefoged, Elements of acoustic phonetics).


Sound waves: perception

A schematic diagram of the mechanism of the ear (from P. Ladefoged, Elements of acoustic phonetics).


Distinctive features of sound waves

• Frequency
  • measured in cycles per second (Hz): a sound wave whose frequency is 100 Hz completes 100 cycles in one second
  • cycle: the distance between two successive peaks (C) or rests (B) in the movement of the wave (i.e. how close together the two points are)
  • period: the time required to complete one cycle of vibration, e.g. if 20 cycles are completed in 1 second, the period is 1/20 of a second, or 0.05 s
• Amplitude
  • the maximum distance between the peak (C) and the trough (A), i.e. the peak-to-peak amplitude
• Fundamental frequency (of a voiced speech sound)
  • 1/fundamental period (the fundamental period being the time required to complete one cycle of the pattern as a whole)
  • the frequency of vocal fold vibration
  • depending on the size of the vocal apparatus, the human voice produces fundamental frequencies within these ranges: 80-220 Hz (male), 120-300 Hz (female), 200-500 Hz (children)

A wave of 20 Hz frequency (from Davenport & Hannahs, Introducing phonetics and phonology).
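
The relation between frequency and period can be checked with a few lines of Python. This is only an illustrative sketch; the function names are made up for this example and do not come from the slides.

```python
def period_from_frequency(frequency_hz):
    """Return the period in seconds of a wave with the given frequency in Hz."""
    return 1.0 / frequency_hz


def frequency_from_period(period_s):
    """Return the frequency in Hz of a wave with the given period in seconds."""
    return 1.0 / period_s


# 20 cycles per second -> each cycle lasts 1/20 of a second.
print(period_from_frequency(20.0))

# A fundamental period of 10 ms corresponds to an f0 of about 100 Hz,
# which falls inside the male range quoted above.
print(frequency_from_period(0.010))
```

The same reciprocal relation gives the fundamental frequency from the fundamental period.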


Simple and complex waves

Two simple waves (pure tones, harmonics) of frequencies 100 and 500 cps (cycles per second, i.e. Hz).

The complex wave resulting from the superposition of two simple waves of 100 and 500 cps (from P. Ladefoged, Elements of acoustic phonetics).
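
A minimal NumPy sketch of this superposition follows. The sampling rate, the duration and the relative amplitude of the 500 cps component are assumptions chosen for the example; the figure does not specify them.

```python
import numpy as np

SAMPLE_RATE = 8000   # samples per second (assumed)
DURATION = 0.02      # seconds: two full cycles of the 100 Hz wave (assumed)

n_samples = int(round(SAMPLE_RATE * DURATION))
t = np.arange(n_samples) / SAMPLE_RATE

# Two simple waves (pure tones).
wave_100 = np.sin(2 * np.pi * 100 * t)
wave_500 = 0.5 * np.sin(2 * np.pi * 500 * t)   # amplitude 0.5 is an assumption

# Superposition: the complex wave is the sample-by-sample sum.
complex_wave = wave_100 + wave_500

# The complex wave repeats once per cycle of the lower component (100 Hz),
# because 500 Hz is a whole-number multiple of 100 Hz.
period_samples = SAMPLE_RATE // 100
first_cycle = complex_wave[:period_samples]
second_cycle = complex_wave[period_samples:2 * period_samples]
print(np.allclose(first_cycle, second_cycle))
```

The complex wave therefore has a fundamental period of 1/100 s even though it contains a faster component.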


Distinctive features of sounds (1)

Two sounds of the same duration (length) can differ with respect to:

• Pitch
  • the subjective impression of the “height” of the sound
  • related to the fundamental frequency of the vibration, which is an acoustic (objective) measure indicating the “height” of the sound
  • two sounds with different fundamental frequencies (f0) can be perceived as having the same pitch
• Loudness
  • related to the amplitude of the sound: the higher the amplitude, the louder the sound is perceived to be
  • affected by the efficiency of the propagating medium and by distance:
    • the larger the distance, the less audible the sound becomes
    • some materials, e.g. wood, are more efficient in carrying sounds than air


Distinctive features of sounds (2)

• Quality (or colouring)
  • results from differences in the shape of the propagation medium (hence differences in the perception of the same phoneme produced by different speakers, as well as differences in vowel quality resulting from different shapes of the vocal tract) and in the material enclosing that medium (in the case of musical instruments, e.g. a flute made of metal vs. a wooden violin)
  • depending on the features (shape, size and material) of the propagation medium, some harmonics of the sound will be emphasized and others will be damped


Source-filter theory (1)

• speech production: a two-stage process
  • 1) the generation of a sound source
  • 2) shaping/filtering of the sound source by the resonant properties of the vocal tract
• the input (source of sound): the glottis or the supralaryngeal vocal tract
• the output: the lips or the nose (or both)
• the vocal tract filters the sound source; its acoustic response depends on its length and shape


Source-filter theory (2)

• the effect of the vocal tract shape on the characteristics of the output sound:
  • it determines whether there is a supralaryngeal sound source
  • it determines the resonance frequencies (formant frequencies) of the vocal tract

Examples of different types of source and vocal tract shape.


Source-filter theory (3)

• A resonator acts as a filter on the original source of sound: it rearranges the input energy so that frequencies at or near the resonance frequencies are amplified, at the expense of frequencies that are not near the resonance frequencies (these are reduced).

• We can calculate the resonances given the length of the vocal tract (assume 17.5 cm for now) and the speed of sound (assume 35,000 cm/s):

F1 = c/(4L), where c = the speed of sound and L = the length of the tube

• For example, for a 17.5 cm tube, F1 = c/(4L) = 35000/70 = 500 Hz.
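
The calculation can be sketched in Python as below. The slide only gives F1 = c/4L; the higher resonances in this sketch rely on the standard quarter-wavelength result for a uniform tube closed at one end and open at the other, Fn = (2n - 1)c/4L, which is added here as an assumption about the tube model. The function name is hypothetical.

```python
def tube_resonances(length_cm, speed_of_sound_cm_s=35000.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Assumes the quarter-wavelength model: Fn = (2n - 1) * c / (4 * L).
    For n = 1 this reduces to the slide's F1 = c / 4L.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# A 17.5 cm tube with c = 35,000 cm/s.
print(tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]

# A shorter tube (assumed 14 cm, for illustration) has higher resonances.
print(tube_resonances(14.0, count=1))  # [625.0]
```

A real vocal tract is not a uniform tube, so these values only approximate the formants of a neutral, schwa-like vowel.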


Periodic and aperiodic waves

• complex waves can be:
  • periodic: a regularly repeating pattern, in which each complete cycle, or period, is like the last one
  • aperiodic: irregular, with no regularly repeating pattern and thus no clear cycles or periods
• the type of the complex waveform is determined by the sound source (excitation source):
  • periodic: when the vocal folds vibrate regularly
  • aperiodic: every other sound source, laryngeal and supralaryngeal


Periodic sound source in speech

1. Regular vibration of the vocal folds produces many different frequencies in a single glottal cycle, which results in a complex periodic waveform -> a periodic (= regularly repeating) sound source.

2. All periodic speech sounds are phonated, i.e. phonetically voiced. The source of periodic sound is always in the larynx, at the glottis.

3. The period is the duration of one cycle of the pattern of a periodic wave (one glottal cycle).

4. The fundamental frequency (f0) is the reciprocal of the period: 1/period.

5. The percept of pitch is closely related to f0. A higher pitch has a higher f0, and hence faster glottal pulses. (Periodic sounds have pitch; aperiodic sounds do not.)
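
Points 3 and 4 can be illustrated by measuring the period of a synthetic periodic signal and taking its reciprocal. The sketch below uses autocorrelation to find the period; this is one common technique chosen for the example, not a method named in the slides, and the test signal, sampling rate and search range are all assumptions.

```python
import numpy as np

SAMPLE_RATE = 16000  # samples per second (assumed)


def estimate_f0(signal, sample_rate, f0_min=60.0, f0_max=500.0):
    """Estimate f0 (Hz) as 1/period, with the period found by autocorrelation."""
    signal = signal - np.mean(signal)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # Search only lags that correspond to plausible f0 values.
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    period_s = best_lag / sample_rate
    return 1.0 / period_s


# Synthetic "voiced" signal: three harmonics of an assumed 125 Hz fundamental.
t = np.arange(1600) / SAMPLE_RATE  # 0.1 s
voiced = sum(np.sin(2 * np.pi * k * 125.0 * t) / k for k in (1, 2, 3))

print(estimate_f0(voiced, SAMPLE_RATE))  # close to 125 Hz
```

For an aperiodic signal such as noise the autocorrelation has no clear peak, which matches the statement that aperiodic sounds have no pitch.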


Aperiodic sound sources in speech

1. An aperiodic sound source results in turbulence noise or implosion noise (random noise = many frequencies, but forming irregular patterns). The vocal folds do not vibrate: such sounds are phonetically voiceless.

2. The aperiodic source may be laryngeal (located at the glottis) or supralaryngeal (located higher in the vocal tract):
  • when the glottis is narrowed enough to produce aperiodic noise (but too wide to let the vocal folds vibrate), the result is whisper, [h] (= a voiceless vowel) or breathy voice
  • for other aperiodic speech sounds, the source of sound is at a constriction in the oral cavity that is narrow enough to cause air to rush through it. These supralaryngeal constrictions result in voiceless stops, fricatives and affricates, e.g. [f s t ʧ].


Mixed voiced and aperiodic sound source

• Periodic and aperiodic sources can be generated simultaneously to produce mixed voiced and aperiodic speech typical of sounds such as voiced fricatives.
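
A rough sketch of such a mixed source is given below: an idealised train of glottal pulses (periodic) added to white noise (aperiodic). The pulse shape, f0, noise level and sampling rate are all assumptions made for illustration; real glottal pulses and frication noise are more complex.

```python
import numpy as np

SAMPLE_RATE = 16000   # samples per second (assumed)
F0 = 125.0            # fundamental frequency in Hz (assumed)
DURATION = 0.1        # seconds (assumed)

rng = np.random.default_rng(0)
n_samples = int(round(SAMPLE_RATE * DURATION))

# Periodic source: one impulse per glottal cycle (a strong idealisation).
period_samples = int(round(SAMPLE_RATE / F0))
periodic = np.zeros(n_samples)
periodic[::period_samples] = 1.0

# Aperiodic source: white noise standing in for turbulence noise.
aperiodic = 0.1 * rng.standard_normal(n_samples)

# Mixed source, as in voiced fricatives: both generated simultaneously.
mixed = periodic + aperiodic

# The periodic part repeats exactly from cycle to cycle; the noise does not.
print(np.array_equal(periodic[:period_samples],
                     periodic[period_samples:2 * period_samples]))
print(np.allclose(aperiodic[:period_samples],
                  aperiodic[period_samples:2 * period_samples]))
```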


Acoustic representations of sounds: spectrogram, waveform, spectrum (1)

• waveform
  • variations in the air pressure associated with speech sounds
  • changes in amplitude through time
  • pulses corresponding to the vibrations of the vocal folds

Waveform of a Polish utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).


Acoustic representations of sounds… (2): waveforms

• What kind of information can we derive from a waveform?
  • amplitude, F0 and, to some extent, the manner of articulation:
    • vowels, approximants and nasals: pulses (voicing), high amplitude and energy (highest for vowels, then approximants, and lowest for nasals)
    • voiced obstruents (plosives, fricatives and affricates): pulses with low energy and amplitude (fricative segments, plosives)
    • voiceless obstruents: empty spaces in the case of stops; aperiodic variation in amplitude in the case of fricatives and the fricative component of an affricate


Acoustic representations of sounds… (3): spectrograms

• spectrogram
  • variation in the frequency domain over time
  • vertical lines -> pulsations of the vocal folds
  • frequency domain: certain frequencies are emphasized (dark marks) -> formants
• The frequency of a formant depends on the size and shape of the vocal tract, so in a spectrographic analysis it provides information on the place and manner of articulation.

Spectrogram of a Polish utterance: Ostatnie przygody Korowiowa i Behemota (male speaker).


Acoustic representations of sounds… (4): spectrograms

• In the analysis of speech, the first four formants are taken into account; they are marked as F1, F2, F3 and F4 (from the lowest to the highest on the frequency scale).

• F1 and F2 are the most important indicators of vowel quality, whereas the higher formants reflect the speaker’s characteristics (voice quality).

• In the flow of articulation, the changes in formant frequencies that occur when the setting of the vocal tract changes from one sound to another are called transitions.

• Spectrograms: optimal for the analysis of duration, F0 and phonetic features (e.g. aspiration), and for the identification of different speech sounds (-> formant frequencies, transitions and vocal fold pulsations)


Acoustic representations of sounds… (5): spectra

• spectrum (pl. spectra) is static: it shows the amplitude of each frequency present in the sound, usually during a single short section of the signal, e.g. 25 or 50 ms

• you can obtain a spectrogram by arranging together a series of spectra

• types of spectral analysis:
  • Fourier analysis (fft [fast Fourier transform] or dft [discrete Fourier transform])
  • Linear Predictive Coding (lpc)

• harmonics: each component frequency in a periodic wave: H1, H2 (= 2 x H1), H3 (= 3 x H1), etc.

• the frequency of the lowest harmonic (the first harmonic) is equivalent to the fundamental frequency of the voice -> f0 = H1

• harmonics ≠ formants

Dft (jagged line) and lpc (smooth line) spectra of [uː] in It’s too much.
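
The link between spectra and spectrograms can be sketched with NumPy's FFT routines. The example computes the spectrum of successive 50 ms sections of a synthetic periodic signal and stacks them into a spectrogram-like array. The test signal (harmonics of an assumed 120 Hz f0), the sampling rate, the Hann window and the 50% frame overlap are all assumptions for this illustration; this is not the analysis used to produce the figure above.

```python
import numpy as np

SAMPLE_RATE = 16000   # samples per second (assumed)
F0 = 120.0            # fundamental frequency in Hz (assumed)
FRAME_LEN = 800       # 50 ms at 16 kHz

# Synthetic periodic signal with harmonics H1, H2 = 2 x H1, H3 = 3 x H1.
t = np.arange(SAMPLE_RATE // 2) / SAMPLE_RATE   # 0.5 s
signal = sum(np.sin(2 * np.pi * k * F0 * t) / k for k in (1, 2, 3))


def magnitude_spectrum(frame):
    """Static spectrum of one short section: amplitude per frequency bin."""
    windowed = frame * np.hanning(len(frame))
    return np.abs(np.fft.rfft(windowed))


# A spectrogram is a series of spectra arranged along the time axis.
hop = FRAME_LEN // 2
frames = [signal[s:s + FRAME_LEN]
          for s in range(0, len(signal) - FRAME_LEN + 1, hop)]
spectrogram = np.array([magnitude_spectrum(f) for f in frames])

# Frequency (Hz) of each bin; the spacing is SAMPLE_RATE / FRAME_LEN = 20 Hz.
freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / SAMPLE_RATE)

# The strongest component of the first spectrum is the first harmonic, H1 = f0.
strongest = freqs[np.argmax(spectrogram[0])]
print(spectrogram.shape)   # (number of frames, number of frequency bins)
print(strongest)           # about 120 Hz
```

Each row of `spectrogram` is one static spectrum; plotting the rows side by side, with magnitude shown as darkness, gives the familiar spectrogram display.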
