Pitch-Synchronous Spectrogram: Principles and Applications

C. Julian Chen

Department of Applied Physics and Applied Mathematics

May 24, 2018

Outline

• The traditional spectrogram
• Observations with the electroglottograph (EGG)
• Process of human voice production
• Pitch-synchronous segmentation of voice signals
• Pitch-synchronous spectrogram
• Display of timbre spectrum within each period
• Display of power evolution within each period
• Free evaluation version and full versions

The traditional spectrogram

The graph is always a mixture of pitch and timbre.

(A) With a wide window, the overtones of the fundamental frequency dominate.

(B) With a narrow window, a mixture of formant peaks and details within each pitch period dominates.

Display of timbre spectrum

The curve is always a mixture of pitch and timbre. It is very difficult to decipher formant frequencies and peak profiles.

The Source-Filter Theory

Source: the Fourier transform of the glottal airflow waveform, falling at -12 dB per octave. Filter: an all-pole transfer function. Radiation factor: +6 dB per octave, which violates the law of energy conservation.

Pitch-Asynchronous Speech Parameterization (1)

The speech signal is blocked into overlapping frames with a fixed window size (25 msec) and a fixed shift (10 msec), and then multiplied by a processing window, typically a Hamming window. The windows often cross phoneme boundaries. Timbre and pitch cannot be separated.
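
As a rough illustration of the fixed-frame blocking described above, the following Python sketch cuts a signal into 25 msec frames with a 10 msec shift and applies a Hamming window. The function names, the 16 kHz sample rate, and the 125 Hz test tone are assumptions made for this example and do not come from the slides.

```python
import math

def hamming(n_samples):
    """Hamming window: w[k] = 0.54 - 0.46 * cos(2*pi*k / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n_samples - 1))
            for k in range(n_samples)]

def fixed_frames(signal, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Cut `signal` into overlapping, Hamming-weighted frames of fixed size."""
    size = int(round(sample_rate * window_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    window = hamming(size)
    frames = []
    for start in range(0, len(signal) - size + 1, shift):
        frames.append([signal[start + k] * window[k] for k in range(size)])
    return frames

# Assumed test input: one second of a 125 Hz tone sampled at 16 kHz.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 125.0 * n / sample_rate)
          for n in range(sample_rate)]
frames = fixed_frames(signal, sample_rate)

# Number of pitch periods covered by one 25 msec frame at 125 Hz.
periods_per_frame = 25.0 * 125.0 / 1000.0
```

In this toy case each frame spans 3.125 pitch periods, so the frame edges fall at arbitrary points within a period, which is the situation the slide describes.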

Pitch-Asynchronous Speech Parameterization (2)

Using an all-pole filter model from LPC analysis, the formants of speech signals can be extracted. But the process is not convergent.
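
To make the all-pole idea concrete, here is a minimal Python sketch of LPC analysis by the autocorrelation method with the Levinson-Durbin recursion, applied to a toy signal with a single resonance. The second-order model, the 500 Hz resonance, the 8 kHz sample rate, and all function names are assumptions chosen for illustration. Real speech has several formants and needs a higher model order plus polynomial root finding, which this sketch does not attempt.

```python
import math

def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations. Returns a[0..order] with a[0] = 1,
    for the all-pole model 1 / (1 + a[1] z^-1 + ... + a[order] z^-order)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def resonance_hz(a1, a2, sample_rate):
    """Frequency of the complex pole pair of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    angle = math.atan2(math.sqrt(4.0 * a2 - a1 * a1), -a1)
    return angle * sample_rate / (2.0 * math.pi)

# Toy signal (assumed): impulse response of one 500 Hz resonance.
sample_rate = 8000
radius, theta = 0.95, 2.0 * math.pi * 500.0 / sample_rate
true_a1, true_a2 = -2.0 * radius * math.cos(theta), radius * radius
x = [1.0, -true_a1]
for n in range(2, 400):
    x.append(-true_a1 * x[n - 1] - true_a2 * x[n - 2])

a = levinson_durbin(autocorrelation(x, 2), 2)
formant_hz = resonance_hz(a[1], a[2], sample_rate)
```

On this idealized input the estimated pole pair sits very close to 500 Hz. The example says nothing about how the method behaves on real speech frames.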

Anatomy of voice-production organs

Observation of Speech Signals (1)

Vowel [a], King-TTS-012, 050007, 2.23-2.28 sec.

Observation of Speech Signals (2)

Vowel [i], King-TTS-012, 004419, 1.938 – 1.968 sec.

Observation of Speech Signals (3)

Vowel [u], King-TTS-012, 005044, 1.06 – 1.11 sec.

Observation of Speech Signals (4)

Vowel [e], King-TTS-012, 050053, 2.535 – 2.585 sec.

Observation of Speech Signals (5)

Vowel [o], King-TTS-012, 051022, 1.827 – 1.877 sec.

The Electroglottograph (EGG)

A non-invasive instrument that detects changes in the electric conductance between the two vocal cords, and thus monitors the opening and closing of the glottis (circa 1956).

What Does the Correlation of EGG Signals and Voice Signals Tell Us?

A voice waveform is triggered by a glottal closing, starting with an impulse. The acoustic wave is strong in the closed phase, and weak in the open phase. (Fig. 5.6, Resonance in Singing, D. G. Miller).

The Handclap Analogy (Robert Sataloff)

“Sound is actually produced by the closing of the vocal folds, in a manner similar to the sound generated by hand clapping. … (T)he more frequent they open and close, the higher the pitch.” (Sataloff).

The Water-Hammer Analogy (Ronald Baken)

“The sharp cutoff of flow is particularly crucial, because it is this relatively sudden stoppage of the air flow that is the raw material of voice. An impulse-like shock wave is produced that “excites” air molecules in the vocal tract.” (R. Baken)

Principle of Superposition (Peter Ladefoged)

The voice signal is a superposition of elementary decaying waves, each starting at a glottal closing event. Pitch is the repetition rate of glottal closing. (Ladefoged)
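
The superposition picture can be sketched in a few lines of Python. Each glottal closing launches one decaying wave, and the voice signal is their sum. The single 700 Hz resonance, the decay rate, the 8 kHz sample rate, and the function names are all assumptions for illustration. A real elementary wave contains several resonances.

```python
import math

def elementary_wave(n, sample_rate, freq_hz=700.0, decay_per_s=400.0):
    """One decaying resonance, n samples after its glottal closing."""
    t = n / sample_rate
    return math.exp(-decay_per_s * t) * math.sin(2.0 * math.pi * freq_hz * t)

def superpose(closing_samples, n_total, sample_rate):
    """Sum one elementary wave per glottal closing instant."""
    voice = [0.0] * n_total
    for start in closing_samples:
        for n in range(start, n_total):
            voice[n] += elementary_wave(n - start, sample_rate)
    return voice

sample_rate = 8000
closings = [0, 80, 160, 240]        # one closing every 80 samples
voice = superpose(closings, 320, sample_rate)

# Pitch is the repetition rate of the closings.
pitch_hz = sample_rate / (closings[1] - closings[0])
```

Changing the spacing of `closings` changes the pitch while the elementary wave, and hence the timbre, stays the same.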

What Is the Timbre Spectrum?

• As the glottis closes, the air moving in the vocal tract at that moment maintains its momentum.
• The kinetic energy of the moving air in the vocal tract is converted into acoustic energy.
• The impulse resonates in the vocal tract.
• The decaying elementary wave in each pitch period is determined by the geometry of the vocal tract, and thus represents the instantaneous timbre.
• An accurate timbre spectrum must be computed from the waveform in each pitch period.

Process Within Each Pitch Period

• A glottal closing starts a pitch period.
• The acoustic wave decays exponentially during the closed phase.
• A glottal opening connects the vocal tract with the lungs and thus accelerates power decay.
• A glottal opening also generates random noise.
• The excitation at a glottal opening is mostly weaker than that at a glottal closing.
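
One simple way to look at the power decay within a pitch period is to measure the mean power of short consecutive segments and express it in dB. The Python sketch below does this for a toy period consisting of a single exponentially decaying 1 kHz wave. The segment length, decay rate, sample rate, and function name are assumptions for illustration, and the code is not taken from the program described in these slides.

```python
import math

def segment_power_db(period, segment_len):
    """Mean power (in dB) of consecutive segments of one pitch period."""
    levels = []
    for start in range(0, len(period) - segment_len + 1, segment_len):
        chunk = period[start:start + segment_len]
        mean_square = sum(v * v for v in chunk) / segment_len
        levels.append(10.0 * math.log10(mean_square))
    return levels

# Toy pitch period (assumed): 160 samples at 16 kHz, i.e. a 100 Hz pitch.
sample_rate = 16000
decay_per_s = 300.0
period = [math.exp(-decay_per_s * n / sample_rate)
          * math.sin(2.0 * math.pi * 1000.0 * n / sample_rate)
          for n in range(160)]

levels = segment_power_db(period, 32)   # 32 samples = two cycles of 1 kHz
drops = [levels[i + 1] - levels[i] for i in range(len(levels) - 1)]
```

A purely exponential decay gives the same dB drop from each segment to the next, about 5.2 dB here. An accelerated decay after the glottal opening would appear as a steeper drop in the later segments.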

Pitch-Synchronous Segmentation Using EGG

The sharp peaks in the EGG derivative occur about 1 msec before the starting impulse. They fall in the weakly varying section of a pitch period, which makes them suitable as segmentation points.
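
A minimal Python sketch of this idea: differentiate the EGG signal and take the sharp positive peaks of the derivative as segmentation points. The toy EGG waveform, the threshold, the minimum gap, the sample rate, and the function names are assumptions for illustration. The sketch also assumes a polarity in which the EGG signal rises quickly at glottal closing.

```python
def toy_egg(n_periods, period=80):
    """Toy EGG: fast rise at glottal closing, slow linear fall afterwards."""
    rise = [0.0, 0.1, 0.6, 0.9]
    fall_len = period - len(rise)
    fall = [1.0 - k / fall_len for k in range(fall_len)]
    return (rise + fall) * n_periods

def egg_derivative(egg):
    """First difference of the EGG signal."""
    return [egg[n + 1] - egg[n] for n in range(len(egg) - 1)]

def derivative_peaks(degg, threshold, min_gap):
    """Local maxima of `degg` above `threshold`, at least `min_gap` apart."""
    peaks = []
    for n in range(1, len(degg) - 1):
        local_max = degg[n] >= degg[n - 1] and degg[n] > degg[n + 1]
        far_enough = not peaks or n - peaks[-1] >= min_gap
        if local_max and degg[n] > threshold and far_enough:
            peaks.append(n)
    return peaks

sample_rate = 8000
degg = egg_derivative(toy_egg(5))
marks = derivative_peaks(degg, threshold=0.4, min_gap=40)

# Duration of each pitch period between consecutive segmentation points.
periods_ms = [1000.0 * (b - a) / sample_rate for a, b in zip(marks, marks[1:])]
```

With the toy input the marks are 80 samples apart, giving 10 msec periods (a 100 Hz pitch).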

Pitch-Synchronous Segmentation from Voice

By multiplying the voice signal with an asymmetric window, an excitation profile function is generated. The peaks of the excitation profile function generate pitch marks.
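
The slide does not give the window or the exact operation, so the following Python sketch rests on two loudly stated assumptions: the window is taken to be the odd-symmetric function w(n) = sin(πn/N)(1 + cos(πn/N)) for -N < n < N, and "multiplying" is read as sliding that window along the signal and summing the products at each position. The toy voice signal, the half-length N = 20, and the function names are likewise hypothetical.

```python
import math

def asymmetric_window(half_len):
    """Assumed window: w(n) = sin(pi*n/N) * (1 + cos(pi*n/N)), -N < n < N."""
    return {n: math.sin(math.pi * n / half_len)
               * (1.0 + math.cos(math.pi * n / half_len))
            for n in range(-half_len + 1, half_len)}

def excitation_profile(signal, half_len):
    """p[m] = sum over n of w(n) * signal[m + n]; zero where the window
    does not fit inside the signal."""
    w = asymmetric_window(half_len)
    profile = [0.0] * len(signal)
    for m in range(half_len, len(signal) - half_len):
        profile[m] = sum(w[n] * signal[m + n] for n in w)
    return profile

# Toy voice (assumed): one decaying pulse per glottal closing, no ripple.
closings = [100, 180, 260, 340]
signal = [0.0] * 420
for c in closings:
    for n in range(c, len(signal)):
        signal[n] += math.exp(-0.05 * (n - c))

window = asymmetric_window(20)
profile = excitation_profile(signal, half_len=20)
strongest = max(range(len(profile)), key=lambda m: profile[m])
```

Because the window is negative before its center and positive after it, the profile is largest where a weak stretch of signal is followed by a strong excitation. In the toy signal the strongest value falls within one window half-length before the first closing.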

Ends-meeting procedure to make waveform cyclic

After an ends-meeting procedure, the waveform of each pitch period becomes a sample of a smooth periodic function.
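
The slides do not spell out the ends-meeting procedure. One plausible reading, sketched below in Python as an assumption, is a linear cross-fade: the last few samples of the period are blended with the samples that immediately precede the period start, so that the end of the period leads smoothly back into its beginning. The function name, the blend length, and the ramp test signal are hypothetical.

```python
def ends_meeting(signal, start, period_len, blend_len):
    """Return one pitch period whose end cross-fades into the samples that
    precede `start`, so the period can be repeated cyclically."""
    if blend_len > start or blend_len > period_len:
        raise ValueError("blend_len must fit before start and inside the period")
    period = list(signal[start:start + period_len])
    for k in range(1, blend_len + 1):
        keep = k / blend_len                 # weight of the original sample
        period[period_len - k] = (keep * signal[start + period_len - k]
                                  + (1.0 - keep) * signal[start - k])
    return period

# A ramp is deliberately non-periodic: its raw ends are far apart.
ramp = [float(n) for n in range(40)]
cyclic = ends_meeting(ramp, start=20, period_len=10, blend_len=4)

raw_jump = abs(ramp[20] - ramp[29])          # wrap-around jump before
new_jump = abs(cyclic[0] - cyclic[-1])       # wrap-around jump after
```

For the ramp, the wrap-around jump shrinks from 9 to 1.5. For a signal that is already exactly periodic, the cross-fade leaves the period unchanged apart from rounding.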

Example of a pitch-synchronous spectrogram

For voiced sections, the vertical lines represent glottal closing instants. In each pitch period, the amplitude timbre spectrum is displayed. Unvoiced sections have no glottal closings.
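
A minimal Python sketch of how such a display could be assembled: given pitch marks, compute one amplitude spectrum per pitch period with a direct DFT and store it as one column. The function names, the 8 kHz sample rate, and the 400 Hz test tone are assumptions. The sketch omits the ends-meeting step and the handling of unvoiced sections, and it is not the program described in these slides.

```python
import cmath
import math

def amplitude_spectrum(period, sample_rate):
    """Return (frequencies_hz, amplitudes) of one pitch period (direct DFT)."""
    n = len(period)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(period[m] * cmath.exp(-2j * math.pi * k * m / n)
                    for m in range(n))
        # DC and Nyquist bins appear once; all other bins appear twice.
        one_sided = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * sample_rate / n)
        amps.append(one_sided * abs(coeff) / n)
    return freqs, amps

def pitch_synchronous_spectrogram(signal, pitch_marks, sample_rate):
    """One (start_time_s, frequencies_hz, amplitudes) column per pitch period."""
    columns = []
    for start, end in zip(pitch_marks, pitch_marks[1:]):
        freqs, amps = amplitude_spectrum(signal[start:end], sample_rate)
        columns.append((start / sample_rate, freqs, amps))
    return columns

# Assumed test input: a 400 Hz tone cut into periods of unequal length.
sample_rate = 8000
signal = [math.sin(2.0 * math.pi * 400.0 * n / sample_rate) for n in range(180)]
columns = pitch_synchronous_spectrogram(signal, [0, 80, 180], sample_rate)
peak_hz = [freqs[amps.index(max(amps))] for _, freqs, amps in columns]
```

The two periods have different lengths (80 and 100 samples), so their frequency grids differ (100 Hz and 80 Hz spacing), yet both columns peak at 400 Hz. A display would need to map each column onto a common frequency axis.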

Display of Timbre Spectrum and Power Decay

By left-clicking the spectrogram at a pitch period, its timbre spectrum is displayed. By right-clicking at a pitch period, a graph of power decay in that pitch period is displayed.

Examples: Timbre spectra of some vowels

Examples: Consistency of Timbre Spectra

Six examples of timbre spectra of vowel [i]. All show a strong peak at about 300 Hz and a group of peaks around 2-4 kHz.

Examples: Timbre spectra of some consonants

Examples: Power decay in a single pitch period

A Free Evaluation Version

• Includes pitch-synchronous segmentation of voice signals, spectrogram generation, timbre spectrum generation, and power decay computation.
• Only works on Mac OS.
• Requires an installation of Tcl/Tk.
• Partially open-source: the C++ program is compiled, the Tcl/Tk source code is open.
• Includes two sets of standard speech data: the CMU ARCTIC databases for US English speakers, male speaker bdl and female speaker slt.
• Manually corrected phoneme label files for the two sets of speech data are also included.

Input panel of the evaluation version

The entire package is in a single directory, PSS. In that directory, type

IMAC: PSS username$ wish pss.tcl <enter>

An input panel appears:

References

1. D. G. Miller, Resonance in Singing, Inside View Press, 2008.
2. R. T. Sataloff, The Human Voice, Scientific American, December 1992, Vol. 108.
3. R. J. Baken, Electroglottography, Journal of Voice, Vol. 6, pages 98-110 (1992).
4. R. J. Baken, An Overview of Laryngeal Function for Voice Production, in Professional Voice, Third Edition, edited by R. T. Sataloff, Plural Publishing, Vol. 1, pages 237-256 (2005).
5. P. Ladefoged, Elements of Acoustic Phonetics, University of Chicago Press, 1966.
6. C. J. Chen, Elements of Human Voice, World Scientific Publishing, 2016.